GitHub - bioinfoUGR/genomecluster: GenomeCluster: Finding clusters of genome elements

DNA clustering and genome complexity

Web supplement of the paper:

Dios F., G. Barturen, R. Lebrón, A. Rueda, M. Hackenberg and J.L. Oliver. 2014. DNA clustering and genome complexity. Computational Biology and Chemistry: http://www.sciencedirect.com/science/article/pii/S1476927114000905

Early global measures of genome complexity (power spectra, the analysis of fluctuations in DNA walks or compositional segmentation) uncovered a high degree of complexity in eukaryotic genome sequences. The main evolutionary mechanisms leading to increases in genome complexity (i.e. gene duplication and transposon proliferation) can all potentially produce increases in DNA clustering. To quantify such clustering and provide a genome-wide description of the resulting clusters, we developed GenomeCluster, an algorithm able to detect clusters of any genome element identified by chromosome coordinates. We obtained a detailed description of clusters for ten categories of human genome elements, including functional (genes, exons, introns), regulatory (CpG islands, TFBSs, enhancers), variant (SNPs) and repeat (Alus, LINE1) elements, as well as DNase hypersensitivity sites. For each category, we located its clusters in the human genome, quantified cluster length and composition, and estimated the clustering level as the proportion of clustered genome elements. On average, we found 27% of elements in clusters, although there is considerable variation among categories. Genes form the lowest number of clusters, but these are the longest ones, both in bp and in the average number of components, while the shortest clusters are formed by SNPs. Functional and regulatory elements (genes, CpG islands, TFBSs, enhancers) show the highest clustering level, as compared to DNase sites, repeats (Alus, LINE1) or SNPs. Many of the genome elements we analyzed are known to be composed of clusters of low-level entities. In addition, we found that the clusters generated by GenomeCluster can in turn be clustered into high-level super-clusters.
The observation of ‘clusters-within-clusters’ parallels the ‘domains within domains’ phenomenon previously detected through global statistical methods in eukaryotic sequences, and reveals a complex human genome landscape dominated by hierarchical clustering.
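
To make the two quantities in the abstract concrete (clusters of elements given by chromosome coordinates, and the clustering level as the proportion of clustered elements), here is a minimal Python sketch. It is not the GenomeCluster algorithm: it assumes a simple fixed-distance rule in which neighbouring elements on the same chromosome are joined when the gap between them is at most `max_gap`. The names `Element`, `cluster_by_distance`, `clustering_level`, `max_gap` and `min_size` are hypothetical and do not come from GenomeCluster.pl; see tutorial.pdf for the actual procedure and parameters.

```python
from collections import defaultdict
from typing import NamedTuple


class Element(NamedTuple):
    """A genome element identified by chromosome coordinates (illustrative type)."""
    chrom: str
    start: int
    end: int


def cluster_by_distance(elements, max_gap, min_size=2):
    """Group same-chromosome elements separated by gaps of at most max_gap.

    Assumption: a fixed distance threshold stands in for whatever criterion
    GenomeCluster really applies. Groups smaller than min_size are discarded.
    """
    by_chrom = defaultdict(list)
    for el in elements:
        by_chrom[el.chrom].append(el)

    clusters = []
    for chrom in sorted(by_chrom):
        current = []        # elements of the cluster being built
        current_end = None  # rightmost coordinate covered so far
        for el in sorted(by_chrom[chrom], key=lambda e: (e.start, e.end)):
            if current and el.start - current_end <= max_gap:
                current.append(el)
                current_end = max(current_end, el.end)
            else:
                if len(current) >= min_size:
                    clusters.append(current)
                current = [el]
                current_end = el.end
        if len(current) >= min_size:
            clusters.append(current)
    return clusters


def clustering_level(elements, clusters):
    """Proportion of input elements that belong to some cluster."""
    if not elements:
        return 0.0
    return sum(len(c) for c in clusters) / len(elements)


# Toy coordinates, invented for illustration only.
example = [
    Element("chr1", 100, 200),
    Element("chr1", 250, 300),
    Element("chr1", 320, 400),
    Element("chr1", 5000, 5100),
    Element("chr2", 10, 20),
]
example_clusters = cluster_by_distance(example, max_gap=100)
print(clustering_level(example, example_clusters))  # 0.6 (3 of 5 elements clustered)
```

Under the same assumption, super-clusters could be explored by turning each cluster into a single `Element` spanning its first start to its last end and running `cluster_by_distance` again with a larger `max_gap`.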

Repository files:

.gitattributes
.gitignore
GenomeCluster.pl
N.py
README.md
tutorial.pdf
