Effective Genome Size

A number of tools can accept an "effective genome size". This is defined as the length of the "mappable" genome. There are two common alternative ways to calculate this:

1. The number of non-N bases in the genome.
2. The number of regions (of some size) in the genome that are uniquely mappable (possibly given some maximal edit distance).
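As a rough illustration of option 1, the count of non-N bases can be sketched in a few lines of Python. This is a minimal sketch and not the method used to produce any values in this document: the function name `count_non_n_bases` is hypothetical, and it assumes a plain-text FASTA input in which every character other than `N`/`n` counts as a mappable base.

```python
import io


def count_non_n_bases(fasta_handle):
    """Return the number of sequence characters that are not N or n.

    Hypothetical helper: header lines (starting with '>') are skipped and
    every other character, including IUPAC ambiguity codes, is counted.
    """
    total = 0
    for line in fasta_handle:
        if line.startswith(">"):
            continue  # FASTA header, not sequence
        seq = line.strip()
        total += len(seq) - seq.count("N") - seq.count("n")
    return total


# Toy input: 20 sequence characters in total, 8 of which are N/n
example = io.StringIO(">chr1\nACGTNNNNacgt\nNNAC\n>chr2\nnnGG\n")
print(count_non_n_bases(example))  # prints 12
```

For a real genome, the handle would come from something like `open("genome.fa")`, where the file name is a placeholder.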

Option 1 can be computed using faCount from Kent's tools. The effective genome sizes for a number of genomes using this method are given below:

Genome           Effective size
GRCh37           2864785220
GRCh38           2913022398
T2T/CHM13CAT_v2  3117292070
GRCm37           2620345972
GRCm38           2652783500
dm3              162367812
dm6              142573017
GRCz10           1369631918
GRCz11           1368780147
WBcel235         100286401
TAIR10           119482012

These values are only appropriate if multimapping reads are included. If they are excluded (or if any MAPQ filter is applied), then values derived from option 2 are more appropriate. These values depend on the read length. We can approximate them for various read lengths using the khmer program, and unique-kmers.py in particular. A table of effective genome sizes for a given read length using this method is provided below:

Read length GRCh37 GRCh38 T2T/CHM13CAT_v2 GRCm37 GRCm38 dm3 dm6 GRCz10 GRCz11 WBcel235 TAIR10
50 2685511454 2701495711 2725240337 2304947876 2308125299 130428510 125464678 1195445541 1197575653 95159402 114339094
75 2736124898 2747877702 2786136059 2404646149 2407883243 135004387 127324557 1251132611 1250812288 96945370 115317469
100 2776919708 2805636231 2814334875 2462480910 2467481008 139647132 129789773 1280188944 1280354977 98259898 118459858
150 2827436883 2862010428 2931551487 2489384085 2494787038 144307658 129940985 1312207019 1311832909 98721103 118504138
200 2855463800 2887553103 2936403235 2513019076 2520868989 148523810 132508963 1321355041 1322366338 98672558 117723393
250 2855044784 2898802627 2960856300 2528988583 2538590322 151901455 132900923 1339205109 1342093482 101271756 119585546
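The idea behind the read-length-dependent values can be illustrated with a small sketch. It assumes that the approximation amounts to counting the distinct k-mers in the genome, with k set to the read length. The function name `count_distinct_kmers` is hypothetical, the sketch uses an exact in-memory set on the forward strand only, and it skips any k-mer containing an N. khmer estimates the count probabilistically instead, so this is not expected to reproduce the table above and would not scale to a full genome.

```python
def count_distinct_kmers(sequences, k):
    """Return the number of distinct k-mers across the given sequences.

    Hypothetical helper for illustration: exact counting with a set,
    forward strand only, k-mers containing N are ignored.
    """
    seen = set()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k along the sequence
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                seen.add(kmer)
    return len(seen)


# Toy example: ACGT occurs three times but is counted once
toy_genome = ["ACGTACGTNNACGT"]
print(count_distinct_kmers(toy_genome, 4))  # prints 4
```

In the toy example the distinct 4-mers are ACGT, CGTA, GTAC and TACG. Repeated sequence contributes only once, which is why these values fall below the non-N base counts.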