GenomeScope plots overview

Based on "Interpreting GenomeScope profiles for VGP genome assemblies" by Tanya Lama.

GenomeScope estimates the overall characteristics of a genome, including genome size, heterozygosity rate and repeat content, from unprocessed short reads using a kmer-based statistical approach.

The results are summarized in two plots, each showing kmer coverage (x) against kmer frequency (y). Each point in a plot therefore shows how many distinct kmers occur at a given coverage. For example, repetitive regions are usually made up of a small number of particular kmers, so few distinct kmers appear at high coverage (i.e. a single kmer that is repeated many times has high coverage). In the VGP pipeline, a kmer length of 31 bp is used for the meryl+genomescope workflow.
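To make the two axes concrete, here is a minimal sketch of how such a histogram could be built from reads. It is an illustration only, not how meryl works internally: the function name `kmer_spectrum` and the toy reads are hypothetical, k is set to 3 for readability, and the sketch ignores reverse complements, which real kmer counters typically collapse.

```python
from collections import Counter

def kmer_spectrum(reads, k):
    """Return {coverage: number of distinct kmers seen exactly that many times}."""
    kmer_counts = Counter()
    for read in reads:
        # Slide a window of length k along the read and count each kmer
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    # Histogram of the counts: x = coverage, y = number of distinct kmers
    return Counter(kmer_counts.values())

# Toy data (hypothetical reads; k=3 here, whereas the text above uses 31)
reads = ["ACGTACGT", "CGTACGTA", "TTTTTTTT"]
spectrum = kmer_spectrum(reads, k=3)
for coverage in sorted(spectrum):
    print(coverage, spectrum[coverage])
```

In this toy input the repeated read `TTTTTTTT` contributes a single kmer (`TTT`) with the highest coverage, which mirrors the point about repetitive regions: few distinct kmers, each seen many times.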

The main result is the genome size estimate, which in this example can be rounded to 1.273 Gbp. In addition, interpreting the profiles shows where the haploid and diploid peaks are expected and gives an idea of the sequencing performance. In short, the average kmer coverage is similar to the effective coverage of the reads across the genome, which makes it a useful estimate of the sequencing depth.
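The link between kmer coverage, genome size and read depth can be sketched with a back-of-the-envelope calculation. This is not GenomeScope's model fit, only a common rough approximation: the total number of kmers divided by the coverage of the main (homozygous) peak approximates the haploid genome size, and kmer coverage relates to base coverage through the read length. The function names, the histogram values and the read length of 150 bp are all hypothetical.

```python
def estimate_genome_size(spectrum, peak_coverage, min_coverage=1):
    """Rough estimate: total kmers observed divided by the main peak coverage.

    Kmers below min_coverage are skipped, since very low coverage kmers
    are mostly sequencing errors.
    """
    total_kmers = sum(cov * n for cov, n in spectrum.items() if cov >= min_coverage)
    return total_kmers / peak_coverage

def base_coverage(kmer_coverage, read_length, k):
    """Convert kmer coverage to base coverage: C = C_k * L / (L - k + 1)."""
    return kmer_coverage * read_length / (read_length - k + 1)

# Hypothetical histogram: {coverage: number of distinct kmers}
spectrum = {1: 500, 19: 40, 20: 100, 21: 40, 40: 10}

size = estimate_genome_size(spectrum, peak_coverage=20, min_coverage=5)
depth = base_coverage(kmer_coverage=20, read_length=150, k=31)
print(size)   # 200.0
print(depth)  # 25.0
```

The second function shows why kmer coverage is only similar to, and slightly lower than, the read coverage: each read of length L yields L - k + 1 kmers rather than L.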

The log plot makes it possible to show the 31 bp kmers that are highly repeated and therefore have high coverage (e.g. those from repetitive regions).

Remember that you can always discuss your results and ask questions in the "training" channel on Slack.
