Effective Genome Size

A number of tools can accept an "effective genome size". This is defined as the length of the "mappable" genome. There are two common ways to calculate it:

1. The number of non-N bases in the genome.
2. The number of regions (of some size) in the genome that are uniquely mappable (possibly given some maximal edit distance).
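As a rough illustration of option 1, the following Python sketch counts the non-N bases in a FASTA file. It is not faCount and may differ from it in edge cases. In this sketch, lowercase (soft-masked) bases count as real sequence, and IUPAC ambiguity codes other than N are also counted as non-N bases. The function names are hypothetical.

```python
import gzip
from typing import Iterable


def count_non_n_bases(lines: Iterable[str]) -> int:
    """Count bases that are not N/n across all sequences in FASTA-formatted lines."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        # Soft-masked (lowercase) bases are real sequence, so compare case-insensitively.
        seq = line.upper()
        # Assumption: only N is treated as unknown; other IUPAC codes count as bases.
        total += len(seq) - seq.count("N")
    return total


def count_non_n_bases_in_file(path: str) -> int:
    """Open a plain or gzip-compressed FASTA file and count its non-N bases."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_non_n_bases(handle)


if __name__ == "__main__":
    example = [">chr1", "ACGTNNNNacgt", "nnAC", ">chr2", "GGGG"]
    print(count_non_n_bases(example))  # 8 + 2 + 4 = 14
```

For a real genome you would call `count_non_n_bases_in_file("genome.fa")` (the path is a placeholder). The file is streamed line by line, so memory use stays small even for large assemblies.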

Option 1 can be computed using faCount from Kent's tools. The effective genome sizes of a number of genomes, computed with this method, are given below:

=============== ==============
Genome          Effective size
=============== ==============
GRCh37          2864785220
GRCh38          2913022398
T2T/CHM13CAT_v2 3117292070
GRCm37          2620345972
GRCm38          2652783500
dm3             162367812
dm6             142573017
GRCz10          1369631918
GRCz11          1368780147
WBcel235        100286401
TAIR10          119482012
=============== ==============

These values are only appropriate if multimapping reads are included. If they are excluded (or if any MAPQ filter is applied), values derived from option 2 are more appropriate, and these depend on the read length. We can approximate them for various read lengths using the khmer program, in particular its unique-kmers.py script. A table of effective genome sizes by read length, computed with this method, is provided below:

=========== ========== ========== =============== ========== ========== ========= ========= ========== ========== ========= =========
Read length GRCh37     GRCh38     T2T/CHM13CAT_v2 GRCm37     GRCm38     dm3       dm6       GRCz10     GRCz11     WBcel235  TAIR10
=========== ========== ========== =============== ========== ========== ========= ========= ========== ========== ========= =========
50          2685511454 2701495711 2725240337      2304947876 2308125299 130428510 125464678 1195445541 1197575653 95159402  114339094
75          2736124898 2747877702 2786136059      2404646149 2407883243 135004387 127324557 1251132611 1250812288 96945370  115317469
100         2776919708 2805636231 2814334875      2462480910 2467481008 139647132 129789773 1280188944 1280354977 98259898  118459858
150         2827436883 2862010428 2931551487      2489384085 2494787038 144307658 129940985 1312207019 1311832909 98721103  118504138
200         2855463800 2887553103 2936403235      2513019076 2520868989 148523810 132508963 1321355041 1322366338 98672558  117723393
250         2855044784 2898802627 2960856300      2528988583 2538590322 151901455 132900923 1339205109 1342093482 101271756 119585546
=========== ========== ========== =============== ========== ========== ========= ========= ========== ========== ========= =========
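To illustrate the idea behind option 2, the sketch below uses a Python set to count exactly the distinct k-mers of length k, with k set to the read length. As far as we understand, khmer's unique-kmers.py instead estimates this count with a probabilistic (HyperLogLog) counter, and that is what makes whole-genome runs feasible. An exact set like this one is only practical for small sequences. Two behaviours here are assumptions of the sketch, not documented khmer behaviour: k-mers containing N are skipped, and each k-mer is collapsed with its reverse complement (`canonical=True`). The function name is hypothetical.

```python
from typing import Iterable


def count_distinct_kmers(sequences: Iterable[str], k: int, canonical: bool = True) -> int:
    """Exactly count distinct k-mers across sequences.

    Assumptions of this sketch: input is upper-cased, k-mers containing N are
    skipped, and with canonical=True a k-mer and its reverse complement are
    counted once (a read can map to either strand).
    """
    complement = str.maketrans("ACGT", "TGCA")
    seen = set()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of length k along the sequence.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" in kmer:
                continue  # unknown bases cannot be mapped uniquely
            if canonical:
                reverse_complement = kmer.translate(complement)[::-1]
                kmer = min(kmer, reverse_complement)
            seen.add(kmer)
    return len(seen)


if __name__ == "__main__":
    # k-mers: ACGT, CGTA, GTAC, TACG, ACGT; CGTA and TACG are reverse complements.
    print(count_distinct_kmers(["ACGTACGT"], 4))  # 3
```

With `canonical=False` the same example gives 4, because CGTA and TACG are then counted separately.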