In [1]:
import numpy as np
from IPython.display import HTML
from bokeh.plotting import output_notebook, show
import genomes_dnj.lct_interval.lct_tree_plots as dm
output_notebook(hide_banner=True)

<h2>History of Lactase Persistence</h2>

Lactase persistence in European populations has been associated with the SNP rs4988235. This notebook focuses on establishing four events in the history of 11_765, the series that includes rs4988235, and on understanding how the population specificity of those events fits the history of human expansion out of Africa.

Those events are:

    Association of the SNP series 4_1699 and 26_1414 that are at the root of lactase persistence

    The generation of the SNP series 6_1503 in the context of the SNP series 4_1699 and 26_1414

    The generation of the SNP series 4_911 in the context of 6_1503, 4_1699, and 26_1414

    The generation of the lactase persistence SNP series 11_765 in the context of 6_1503, 4_1699, 26_1414, and 4_911
    
Attention will also be paid to other hierarchies rooted in 4_1699 or 26_1414. These examples illustrate the two other major hierarchies that share the same region of chromosome 2 as the one related to lactase persistence. They establish a history of crossing over between series within this region. They also illustrate a few of the cases where a hierarchy of series has established linkage disequilibrium across this whole region.

The studies in this notebook all explore the 1,029,311 base region of chromosome 2 that starts at position 135,757,320 and ends at position 136,786,630. These end points are the locations surrounding the lactase persistence series 11_765 where the count of active series goes to zero.
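The quoted region length can be checked from the two end points. This is a minimal sketch that assumes both end positions are counted as part of the region (inclusive coordinates); the helper name `inclusive_length` is introduced here only for illustration.

```python
# Region bounds quoted in the text (positions on chromosome 2).
REGION_START = 135_757_320
REGION_END = 136_786_630


def inclusive_length(start: int, end: int) -> int:
    """Number of bases from start to end when both end points are counted."""
    return end - start + 1


region_length = inclusive_length(REGION_START, REGION_END)
print(f"{region_length:,}")  # prints 1,029,311
```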

<h3>4_1699 basis plot</h3>
<p>
This figure shows a series plot for the most specific series expressed by any chromosome that expresses 4_1699.  In the hierarchy towards lactase persistence, 206 chromosomes express 26_1414 as their basis series, that is, as the series in the region expressed by the smallest number of chromosomes.  At the next level, 4_911 is expressed as a basis series by 134 chromosomes.  At the level after that, the lactase persistence series 11_765 is a basis series for all 765 chromosomes that express it.  The rest of the 1699 chromosomes that express 4_1699 are scattered over a variety of series.  In several cases, as with 11_765, the chromosomes that express 4_1699 are all or almost all of the chromosomes that express the series.  All of them appear to be cases where some genetic event established a multiple SNP series.
</p>

In [2]:
plt0_obj = dm.basis_4_1699()
plt0_obj.do_plot()
plt0 = plt0_obj.plot_figure
show(plt0)

The first figure is a basis plot for all of the chromosomes that express the series 4_1699.  For each chromosome in the analysis set, a basis plot finds the series expressed by that chromosome that is shared with the smallest number of other chromosomes.  The id on top of a series gives the number of SNPs in it and the number of chromosomes that express it.  The number to the side of the series gives the number of chromosomes for which it is the interval basis series.  Colors represent population specificity:

    Red Africa
    
    Blue South Asia
    
    Green Europe
    
    Yellow Asia
    
    Purple America

The colors in this figure represent the whole population that expresses the series and not just those for which it is the basis series.
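The basis series idea described above can be sketched with plain sets. This is only an illustration under stated assumptions: the function `basis_series`, the toy series ids, and the chromosome ids are hypothetical and are not taken from `lct_tree_plots`, whose implementation is not shown in this notebook.

```python
from typing import Dict, Set


def basis_series(expressed_by: Dict[str, Set[int]]) -> Dict[int, str]:
    """Map each chromosome to the series it expresses that has the fewest carriers."""
    basis: Dict[int, str] = {}
    for series_id, carriers in expressed_by.items():
        for chrom in carriers:
            current = basis.get(chrom)
            # Keep the series with the smaller carrier count; ties keep the first seen.
            if current is None or len(carriers) < len(expressed_by[current]):
                basis[chrom] = series_id
    return basis


# Hypothetical toy data: series id -> ids of the chromosomes that express it.
toy_series = {
    "A": {1, 2, 3, 4, 5},
    "B": {2, 3, 4},
    "C": {4},
}

toy_basis = basis_series(toy_series)
# Number of chromosomes for which each series is the basis series.
basis_counts = {s: sum(1 for b in toy_basis.values() if b == s) for s in toy_series}
print(basis_counts)
```

In this toy case series "A" is expressed by five chromosomes but is the basis series for only two of them, which mirrors the difference between the id on top of a series and the number to its side in the plot.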

In [3]:
bs0_obj = plt0_obj.plot_context.basis_obj
HTML(bs0_obj.html_sorted_by_allele_count())

index,first,last,length,snps,alleles,basis,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353797,136398174,136474098,75924,26,1414,206,6,0.14,2,0.15,41,1.44,81,1.95,20,0.48,20,1.34,36,1.43
353791,136393658,136486342,92684,32,1361,7,0,0.0,0,0.0,1,1.03,6,4.26,0,0.0,0,0.0,0,0.0
354129,136652953,136761175,108222,5,1296,3,0,0.0,0,0.0,0,0.0,3,4.97,0,0.0,0,0.0,0,0.0
353919,136500475,136542560,42085,13,1227,1,0,0.0,0,0.0,1,7.22,0,0.0,0,0.0,0,0.0,0,0.0
353849,136447707,136490266,42559,4,1149,17,0,0.0,0,0.0,5,2.12,12,3.51,0,0.0,0,0.0,0,0.0
354127,136652491,136732772,80281,6,1114,10,0,0.0,0,0.0,0,0.0,5,2.48,1,0.5,2,2.75,2,1.63
353984,136556805,136747085,190280,39,1014,15,0,0.0,0,0.0,2,0.96,11,3.64,1,0.33,1,0.92,0,0.0
353807,136403994,136485096,81102,13,911,3,0,0.0,0,0.0,0,0.0,1,1.66,2,3.32,0,0.0,0,0.0
353604,136092061,136407479,315418,4,911,134,0,0.0,0,0.0,55,2.96,0,0.0,35,1.3,16,1.64,28,1.7
353902,136494985,136773638,278653,81,857,13,4,1.53,0,0.0,0,0.0,5,1.91,1,0.38,0,0.0,3,1.88


This table gives the population counts for the chromosomes in the analysis set that express a particular series as a basis series.  Note that the 206 chromosomes that express 26_1414 as a basis have only a very small African representation.  But they do have a significant East Asian representation.  By contrast, the 134 chromosomes that express 4_911 as a basis have no African representation and no East Asian representation.  All 765 chromosomes that express 11_765 express it as a basis series.  Those chromosomes include only 2 true Africans and 0 East Asians.  But 32 chromosomes from Caribbean or American Southwest populations that the thousand genomes data classifies as having African ancestry do express 11_765.

The series 6_57, 6_35, 12_29, 8_28, 15_24, and 7_22 are examples of cases where all chromosomes that express the series express it as a basis and also express 4_1699.  The presence of several African series in this group points to a history for 4_1699 that precedes the exit of some human populations from Africa.

### 4_1699 Series Subset Plot

This plot shows all the series where at least 90% of the chromosomes expressing the series also express 4_1699.
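The 90% criterion can be sketched as a set computation. The function names and toy chromosome ids below are hypothetical; this is a sketch of the selection rule as described in the text, not the code behind `dm.subset_4_1699()`.

```python
def subset_fraction(series_carriers: set, parent_carriers: set) -> float:
    """Fraction of the chromosomes expressing a series that also express the parent."""
    if not series_carriers:
        return 0.0
    return len(series_carriers & parent_carriers) / len(series_carriers)


def is_subset_series(series_carriers: set, parent_carriers: set,
                     threshold: float = 0.9) -> bool:
    """True when at least `threshold` of the series carriers also carry the parent."""
    return subset_fraction(series_carriers, parent_carriers) >= threshold


# Hypothetical toy data: chromosome ids.
parent = set(range(100))
mostly_inside = set(range(19)) | {200}   # 19 of 20 carriers also express the parent
half_inside = set(range(95, 105))        # 5 of 10 carriers also express the parent

print(subset_fraction(mostly_inside, parent), is_subset_series(mostly_inside, parent))
print(subset_fraction(half_inside, parent), is_subset_series(half_inside, parent))
```

By the same measure, the 26_1414 row in the table below (1307 matches out of 1414 chromosomes) gives a fraction of about 0.92, just above the threshold.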

In [4]:
plt1_obj = dm.subset_4_1699()
plt1 = plt1_obj.do_plot()
show(plt1)

In [5]:
HTML(plt1_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353462,135915358,79721,4,1699,1.0,1699,1.0,151,0.44,78,0.73,320,1.36,309,0.9,609,1.78,110,0.89,122,0.59
353283,135771974,368330,6,1503,1.0,1499,0.88,14,0.05,37,0.39,305,1.47,305,1.01,608,2.02,109,1.0,121,0.66
353797,136398174,75924,26,1414,0.92,1307,0.77,37,0.14,43,0.52,281,1.55,116,0.44,602,2.29,108,1.14,120,0.75
353604,136092061,315418,4,911,1.0,909,0.54,2,0.01,31,0.54,204,1.62,0,0.0,524,2.87,71,1.07,77,0.69
353380,135837906,870076,11,765,1.0,765,0.45,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
353787,136392474,250856,95,176,0.93,163,0.1,0,0.0,0,0.0,16,0.71,147,4.48,0,0.0,0,0.0,0,0.0
353282,135771752,790527,18,60,0.98,59,0.03,38,3.2,16,4.33,5,0.61,0,0.0,0,0.0,0,0.0,0,0.0
353444,135889881,493055,6,57,1.0,57,0.03,2,0.17,1,0.28,17,2.15,0,0.0,29,2.53,5,1.21,3,0.43
353407,135859157,121781,6,35,1.0,35,0.02,1,0.14,1,0.46,13,2.68,0,0.0,17,2.42,2,0.79,1,0.23
353252,135759628,683922,12,32,1.0,32,0.02,21,3.26,6,2.99,5,1.13,0,0.0,0,0.0,0,0.0,0,0.0


This table shows the number of matches for the different subset series and the population distribution of the chromosomes
that match.  Many series show a complete or very near complete match where all chromosomes that express the subset series also express 4_1699.  The population distribution includes a column that gives a count of the number of chromosomes from the population that matched and a ratio of that number to the value that would be predicted from the number of samples of the population in the thousand genome data.
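The ratio column can be sketched as an observed share divided by an expected share. The sample sizes below are assumptions that are not stated in this notebook: 5008 chromosomes in total and 1008 East Asian chromosomes, as in 1000 Genomes phase 3. They are at least consistent with the value of 4.97 that the tables report whenever every matching chromosome is East Asian. The function name `enrichment_ratio` is introduced here for illustration.

```python
def enrichment_ratio(pop_matches: int, total_matches: int,
                     pop_chromosomes: int, total_chromosomes: int) -> float:
    """Observed share of a population among the matches divided by its sampled share."""
    observed_share = pop_matches / total_matches
    expected_share = pop_chromosomes / total_chromosomes
    return observed_share / expected_share


# Assumed sample sizes (not stated in the notebook).
TOTAL_CHROMOSOMES = 5008
EAS_CHROMOSOMES = 1008

# A series matched by 3 chromosomes, all of them East Asian.
all_eas = enrichment_ratio(3, 3, EAS_CHROMOSOMES, TOTAL_CHROMOSOMES)
# The 4_1699 row: 309 East Asian chromosomes among 1699 matches.
row_4_1699_eas = enrichment_ratio(309, 1699, EAS_CHROMOSOMES, TOTAL_CHROMOSOMES)

print(round(all_eas, 2), round(row_4_1699_eas, 2))  # prints 4.97 0.9
```

A ratio near 1 means the population appears among the matches about as often as its share of the samples would predict; values well above 1 indicate over representation.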

All but 4 of the chromosomes that express 6_1503 also express 4_1699. Only 14 true African chromosomes express 6_1503.  But that number is large enough to suggest that 6_1503 appeared as a genetic event in some African population.  That event specialized chromosomes that already expressed 4_1699 and 26_1414.  The large presence of 6_1503 in the thousand genome data results from the success of this series outside of Africa.  The selective advantage of lactase persistence is certainly a large part of that story.  But almost half of the 1503 chromosomes don't express the lactase persistence SNPs.

Additional investigation documented below makes it clear that two of the series, 26_1414 and 95_176, are not true subsets of 4_1699.  The series 26_1414 has an independent history from 4_1699 that stretches back into the African past.  Its apparent association with 4_1699 is the result of the human population growth during the expansion outside of Africa.  Perhaps that growth is partly just a chance result of the presence of this series in a small population at the root of that expansion.  But at least the 765 chromosomes that express both 11_765 and 26_1414 appear to have realized a significant selective advantage from their genetic variation.  The fact that 206 chromosomes express just 26_1414 as a basis series gives good reason to suspect that the context of 6_1503, 4_1699, and 26_1414 already offered some kind of advantage to individuals with a chromosome that expressed these series.

The source of the almost completely East Asian 95_176 series and the 51_176 series that is usually associated with it is the most obscure for any of the series in this interval that appeared during the out of Africa expansion.  But it appears very likely that the association of 6_1503 and 4_1699 with these series came from a crossover event somewhere in the interval between 4_1699 and 26_1414.  That crossover event associated 6_1503 and 4_1699, but not 26_1414, with 95_176 and 51_176.  Most of the surviving instances of 95_176 derive from descendants of this crossover event.  But some of the 95_176 chromosomes likely are descended from the association of series that preceded this event.

### 6_1503, 4_1699, and 26_1414 Series Subset Plot

This plot shows the series that are expressed by subsets of the chromosomes that express 6_1503, 4_1699, and 26_1414.

In [6]:
plt2_obj = dm.subset_yes_no([dm.di_6_1503, dm.di_4_1699, dm.di_26_1414])
plt2 = plt2_obj.do_plot()
show(plt2)

This figure shows the series expressed by chromosomes that are a subset of the chromosomes that express 6_1503, 4_1699, and 26_1414.  The genetic event that generated 11_765 happened in the context of chromosomes that expressed 4_911 as well as the other parent series.  But the generation of 6_57, 6_35, and 7_22 appears to have involved independent genetic events in each case.

In [7]:
plt3_obj = dm.subset_yes_no([dm.di_4_911])
plt3 = plt3_obj.do_plot()
show(plt3)

This figure shows that 11_765 is a subset of 4_911.

### 6_1503, 4_1699, and 26_1414 Basis Plot

This plot shows the most specific series expressed by chromosomes that express 6_1503, 4_1699, and 26_1414.

In [8]:
plt4_obj = dm.basis_yes_no([dm.di_4_1699, dm.di_6_1503, dm.di_26_1414])
plt4 = plt4_obj.do_basis_plot()
show(plt4)

This figure shows a basis plot for the chromosomes that express 6_1503, 4_1699, and 26_1414.

In [9]:
HTML(plt4_obj.get_basis_html())

index,first,last,length,snps,alleles,basis,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353797,136398174,136474098,75924,26,1414,201,6,0.15,2,0.16,39,1.4,80,1.98,20,0.5,19,1.3,35,1.42
354129,136652953,136761175,108222,5,1296,3,0,0.0,0,0.0,0,0.0,3,4.97,0,0.0,0,0.0,0,0.0
354127,136652491,136732772,80281,6,1114,7,0,0.0,0,0.0,0,0.0,2,1.42,1,0.71,2,3.93,2,2.33
353984,136556805,136747085,190280,39,1014,9,0,0.0,0,0.0,0,0.0,7,3.86,1,0.55,1,1.53,0,0.0
353604,136092061,136407479,315418,4,911,134,0,0.0,0,0.0,55,2.96,0,0.0,35,1.3,16,1.64,28,1.7
353902,136494985,136773638,278653,81,857,2,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,2,8.16
353938,136514709,136543147,28438,6,820,5,0,0.0,0,0.0,1,1.44,0,0.0,0,0.0,2,5.5,2,3.26
353380,135837906,136707982,870076,11,765,765,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
354061,136603638,136632125,28487,16,511,1,0,0.0,0,0.0,1,7.22,0,0.0,0,0.0,0,0.0,0,0.0
354064,136605402,136624686,19284,8,328,3,0,0.0,0,0.0,1,2.41,0,0.0,2,3.32,0,0.0,0,0.0


This table shows the population data for the 6_1503, 4_1699, and 26_1414 basis plot.  The 201 chromosomes that express 26_1414 as a basis series include 6 true Africans.  The 2 chromosomes with African ancestry show a consistent expression level with the true African samples.  The 20 European chromosomes show an under representation.  The 80 East Asian chromosomes are the most over represented.  But the American and South Asian chromosomes are also modestly over represented.  This pattern appears consistent with a genetic feature that existed in the base African population but which did better in the out of Africa expansion.  The reason could just be its frequency in a small population that left Africa.  Or it could be due to some kind of selective advantage.

The 134 chromosomes that express 4_911 as a basis series show a much different population pattern.  There are no African or East Asian samples.  European samples show a modest over representation.  South Asian samples show a larger over representation.  But the largest over representation is the 55 American chromosomes.  The evidence argues strongly that 4_911 was generated by an event out of Africa after the separation of the East Asian population.  It is hard to understand how that event would have happened without some kind of selective advantage for the chromosomes that expressed the 4_911 series.  The path followed by the 55 American samples remains uncertain.  Europeans with these samples do exist and they could have brought them to America.  But, a path East through Siberia in an early out of Africa population seems to be a better fit with the data.

The 11_765 chromosomes show yet a different population distribution.  The 484 European samples are highly over represented.  The 32 samples among American populations of African ancestry appear to show the effect of strong selection.  The effect of that selection makes plausible the idea that the high representation of American samples is due to contributions from modern European migrations.

### 4_911 Supersets

Most of the chromosomes that express 4_911 but not 11_765 do still express the standard 11_765 supersets 6_1503, 4_1699, 26_1414, 64_1575, 10_2206, and 7_1868.  But a significant portion of the 4_911 only chromosomes show a history of a crossover event.  Even with the kind of linkage disequilibrium displayed by many chromosomes throughout this region, crossovers are possible.

In [10]:
plt5_obj = dm.superset_yes_no([dm.di_4_911, dm.di_64_1575], [dm.di_11_765], min_match=0.1)
plt5 = plt5_obj.do_plot()
show(plt5)

Most of the chromosomes that express 4_911 but not 11_765 express 64_1575, 10_2206, and 7_1868 like those that do express 11_765.  The genetic event that generated 4_911 appears very likely to have happened in chromosomes that expressed all of these series.  But a significant number of the 4_911 only chromosomes show a history of crossover events.  This figure shows the series expressed by chromosomes that express 4_911 and 64_1575 but not 11_765.  A superset series must be matched by at least some threshold fraction of the chromosomes in the analysis set.  Here the match threshold was lowered to 0.1 so that any series expressed by a minority of the chromosomes can appear.  The colors of the series have changed because the excluded chromosomes that express 11_765 were not candidates to be selected in the plot.
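The selection described here, a list of required series, a list of excluded series, and a match threshold, can be sketched with set operations. Everything in this block is an assumption made for illustration: the function names, the toy series ids, and the chromosome ids are hypothetical, and the sketch does not claim to reproduce how `dm.superset_yes_no` is implemented.

```python
from typing import Dict, Iterable, Set


def analysis_set(expressed_by: Dict[str, Set[int]],
                 yes: Iterable[str], no: Iterable[str] = ()) -> Set[int]:
    """Chromosomes that express every series in `yes` and none of the series in `no`."""
    selected = set.intersection(*(expressed_by[s] for s in yes))
    for series_id in no:
        selected -= expressed_by[series_id]
    return selected


def superset_candidates(expressed_by: Dict[str, Set[int]],
                        yes: Iterable[str], no: Iterable[str] = (),
                        min_match: float = 0.1) -> Dict[str, float]:
    """Series expressed by at least `min_match` of the analysis set, with the fraction."""
    selected = analysis_set(expressed_by, yes, no)
    result: Dict[str, float] = {}
    if not selected:
        return result
    for series_id, carriers in expressed_by.items():
        fraction = len(carriers & selected) / len(selected)
        if fraction >= min_match:
            result[series_id] = fraction
    return result


# Hypothetical toy data: series id -> ids of the chromosomes that express it.
toy_series = {
    "parent": set(range(20)),
    "child": set(range(10)),
    "excluded": {0, 1, 2, 3, 4},
    "common": set(range(5, 30)),
    "rare": {5},
}

strict = superset_candidates(toy_series, ["parent", "child"], ["excluded"], min_match=0.5)
loose = superset_candidates(toy_series, ["parent", "child"], ["excluded"], min_match=0.1)
print(sorted(strict), sorted(loose))
```

Lowering the threshold lets the minority series "rare" appear, which is the effect the text describes for a threshold of 0.1.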

In [11]:
HTML(plt5_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.04,80,0.99,0,0.0,0,0.0,39,3.77,0,0.0,24,2.44,5,0.86,12,1.12
354170,136682274,93624,7,1868,0.04,76,0.94,0,0.0,0,0.0,34,3.46,0,0.0,25,2.67,5,0.91,12,1.18
353462,135915358,79721,4,1699,0.05,79,0.98,0,0.0,0,0.0,38,3.72,0,0.0,25,2.57,5,0.87,11,1.04
353901,136494186,271765,64,1575,0.05,81,1.0,0,0.0,0,0.0,39,3.73,0,0.0,25,2.51,5,0.85,12,1.1
353283,135771974,368330,6,1503,0.05,80,0.99,0,0.0,0,0.0,39,3.77,0,0.0,25,2.54,5,0.86,11,1.03
353797,136398174,75924,26,1414,0.06,81,1.0,0,0.0,0,0.0,39,3.73,0,0.0,25,2.51,5,0.85,12,1.1
353604,136092061,315418,4,911,0.09,81,1.0,0,0.0,0,0.0,39,3.73,0,0.0,25,2.51,5,0.85,12,1.1


Requiring the series 64_1575 selects the 4_911 chromosomes that follow the 11_765 pattern.  There are 81 chromosomes in this class.

In [12]:
plt6_obj = dm.superset_yes_no([dm.di_4_911, dm.di_10_2206], [dm.di_64_1575, dm.di_11_765], min_match=0.1)
plt6 = plt6_obj.do_plot()
show(plt6)

In [13]:
HTML(plt6_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.03,57,1.0,0,0.0,0,0.0,16,2.21,0,0.0,13,1.86,11,2.65,17,2.27
353462,135915358,79721,4,1699,0.03,57,1.0,0,0.0,0,0.0,16,2.21,0,0.0,13,1.86,11,2.65,17,2.27
353283,135771974,368330,6,1503,0.04,57,1.0,0,0.0,0,0.0,16,2.21,0,0.0,13,1.86,11,2.65,17,2.27
353797,136398174,75924,26,1414,0.04,57,1.0,0,0.0,0,0.0,16,2.21,0,0.0,13,1.86,11,2.65,17,2.27
353604,136092061,315418,4,911,0.06,57,1.0,0,0.0,0,0.0,16,2.21,0,0.0,13,1.86,11,2.65,17,2.27


There are 57 chromosomes that express 4_911 and 10_2206 but express neither 64_1575 nor 11_765.

In [14]:
plt7_obj = dm.superset_yes_no([dm.di_4_911], [dm.di_10_2206, dm.di_11_765], min_match=0.1)
plt7 = plt7_obj.do_plot()
show(plt7)

In [15]:
HTML(plt7_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
354033,136588031,5647,7,1760,0.0,3,0.25,0,0.0,0,0.0,0,0.0,0,0.0,1,2.68,0,0.0,2,4.85
353462,135915358,79721,4,1699,0.01,12,1.0,0,0.0,0,0.0,4,2.58,0,0.0,3,2.01,1,1.16,4,2.43
354130,136653925,107928,24,1504,0.0,3,0.25,0,0.0,0,0.0,0,0.0,0,0.0,1,2.68,0,0.0,2,4.85
353283,135771974,368330,6,1503,0.01,12,1.0,0,0.0,0,0.0,4,2.58,0,0.0,3,2.01,1,1.16,4,2.43
353925,136506375,32564,4,1442,0.0,6,0.5,0,0.0,0,0.0,3,3.87,0,0.0,1,1.34,0,0.0,2,2.43
353797,136398174,75924,26,1414,0.01,12,1.0,0,0.0,0,0.0,4,2.58,0,0.0,3,2.01,1,1.16,4,2.43
353958,136535876,19014,7,1303,0.0,3,0.25,0,0.0,0,0.0,0,0.0,0,0.0,1,2.68,0,0.0,2,4.85
354129,136652953,108222,5,1296,0.0,3,0.25,0,0.0,0,0.0,0,0.0,0,0.0,1,2.68,0,0.0,2,4.85
353919,136500475,42085,13,1227,0.0,6,0.5,0,0.0,0,0.0,3,3.87,0,0.0,1,1.34,0,0.0,2,2.43
353906,136496493,57432,9,1170,0.01,7,0.58,0,0.0,0,0.0,4,4.42,0,0.0,0,0.0,1,1.98,2,2.08


There are 12 chromosomes that express 4_911 but express neither 10_2206 nor 11_765.  These chromosomes appear to result from several crossover events that have led to the expression of a variety of series by different 4_911 instances.

### 4_1699 African History

Most of the chromosomes that express 4_1699 also express 6_1503.  The evidence strongly supports the idea that the six SNPs of 6_1503 appeared as a genetic event in the context of 4_1699 and 26_1414.  But exploration of the series that express 4_1699 but not 6_1503 reveals a history of 4_1699 that likely preceded the appearance of 6_1503.  Some of that history can be understood by examining the series expressed by chromosomes that express both 13_1696 and 4_1699.  The series 13_1696 is the first one in the interval and a root for the history of many of the interval's series.  The larger part of that history includes an association between 13_1696 and 32_1361.  The series 32_1361 overlaps the same part of chromosome 2 as 26_1414 and does not generally coexist with it.  So the history of 4_1699 with 13_1696 and 32_1361 appears to be separated by at least two crossover events from the history of 4_1699 with 6_1503 and 26_1414.

In [16]:
plt8_obj = dm.superset_yes_no([dm.di_4_1699, dm.di_13_1696], min_match=0.1)
plt8 = plt8_obj.do_plot()
show(plt8)

This figure shows series matched by chromosomes that express 4_1699 and 13_1696.  These chromosomes express series in a completely different hierarchy from those that express 4_1699 and 6_1503.

In [17]:
HTML(plt8_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
354033,136588031,5647,7,1760,0.08,140,0.88,98,3.48,33,3.76,9,0.46,0,0.0,0,0.0,0,0.0,0,0.0
353462,135915358,79721,4,1699,0.09,159,1.0,112,3.5,35,3.51,10,0.45,0,0.0,1,0.03,1,0.09,0,0.0
353240,135757320,20184,13,1696,0.09,159,1.0,112,3.5,35,3.51,10,0.45,0,0.0,1,0.03,1,0.09,0,0.0
354130,136653925,107928,24,1504,0.07,106,0.67,74,3.47,24,3.61,8,0.54,0,0.0,0,0.0,0,0.0,0,0.0
353925,136506375,32564,4,1442,0.1,137,0.86,94,3.41,34,3.96,9,0.47,0,0.0,0,0.0,0,0.0,0,0.0
353791,136393658,92684,32,1361,0.11,149,0.94,105,3.5,34,3.64,9,0.44,0,0.0,1,0.03,0,0.0,0,0.0
353958,136535876,19014,7,1303,0.11,138,0.87,95,3.42,34,3.93,9,0.47,0,0.0,0,0.0,0,0.0,0,0.0
354129,136652953,108222,5,1296,0.08,104,0.65,73,3.49,23,3.53,8,0.56,0,0.0,0,0.0,0,0.0,0,0.0
353919,136500475,42085,13,1227,0.11,136,0.86,94,3.43,33,3.87,9,0.48,0,0.0,0,0.0,0,0.0,0,0.0
354127,136652491,80281,6,1114,0.09,103,0.65,72,3.47,23,3.56,8,0.56,0,0.0,0,0.0,0,0.0,0,0.0


The series 12_29, 8_28, 24_24, and 15_24 are completely or almost completely expressed by chromosomes that express 4_1699 and 13_1696.  Their generation belongs to a completely different history than that which generated 6_1503, 4_911, and 11_765.  The involvement of series expressed only by African chromosomes suggests that this history was an earlier one.

In [18]:
di_8_28 = 353368
plt8_1_obj = dm.superset_yes_no([di_8_28], min_match=0.001)
plt8_1 = plt8_1_obj.do_plot()
show(plt8_1)

This figure shows the series expressed by chromosomes that express 8_28.

In [19]:
HTML(plt8_1_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
354033,136588031,5647,7,1760,0.02,28,1.0,18,3.19,9,5.13,1,0.26,0,0.0,0,0.0,0,0.0,0,0.0
353462,135915358,79721,4,1699,0.02,28,1.0,18,3.19,9,5.13,1,0.26,0,0.0,0,0.0,0,0.0,0,0.0
353240,135757320,20184,13,1696,0.02,28,1.0,18,3.19,9,5.13,1,0.26,0,0.0,0,0.0,0,0.0,0,0.0
354130,136653925,107928,24,1504,0.02,28,1.0,18,3.19,9,5.13,1,0.26,0,0.0,0,0.0,0,0.0,0,0.0
353925,136506375,32564,4,1442,0.02,28,1.0,18,3.19,9,5.13,1,0.26,0,0.0,0,0.0,0,0.0,0,0.0
353791,136393658,92684,32,1361,0.02,28,1.0,18,3.19,9,5.13,1,0.26,0,0.0,0,0.0,0,0.0,0,0.0
353958,136535876,19014,7,1303,0.02,28,1.0,18,3.19,9,5.13,1,0.26,0,0.0,0,0.0,0,0.0,0,0.0
354129,136652953,108222,5,1296,0.02,28,1.0,18,3.19,9,5.13,1,0.26,0,0.0,0,0.0,0,0.0,0,0.0
353919,136500475,42085,13,1227,0.02,28,1.0,18,3.19,9,5.13,1,0.26,0,0.0,0,0.0,0,0.0,0,0.0
354127,136652491,80281,6,1114,0.02,27,0.96,17,3.13,9,5.32,1,0.27,0,0.0,0,0.0,0,0.0,0,0.0


All of the series are expressed completely or almost completely by all of the 28 chromosomes that express 8_28.  The series 8_28 provides a strong example of a case where a series rooted in a completely different hierarchy from 11_765 has also established linkage disequilibrium across the same region of chromosome 2.

### 26_1414 African History

Of the chromosomes that express 26_1414, 107 do not express 4_1699.  81 of these chromosomes express 117_1685.  This series and the overlapping, most commonly expressed series 123_1561 are at the root of a substantial part of the chromosome region's African history.  The length of these series and their large number of SNPs suggest that they are likely to be the foundation of this region's unusual characteristics.  The crossover event that separates the association of 26_1414 with 117_1685 from its association with 4_1699 is one of many for 117_1685.  Although 117_1685 expresses strong linkage disequilibrium across its more than 600,000 base length, crossover events that alter the other interval series associated with it are common.

In [20]:
plt9_obj = dm.superset_yes_no([dm.di_26_1414, dm.di_117_1685], min_match=0.1)
plt9 = plt9_obj.do_plot()
show(plt9)

This figure shows series expressed by chromosomes that express 26_1414 and 117_1685.  The series 117_1685 and 123_1561 appear likely to have been at the root of the process that generated the exceptional characteristics of the series in this region of chromosome 2.

In [21]:
HTML(plt9_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.04,80,0.99,52,3.23,16,3.19,4,0.36,3,0.19,4,0.25,1,0.17,0,0.0
354170,136682274,93624,7,1868,0.01,27,0.33,16,2.94,5,2.95,0,0.0,2,0.37,4,0.74,0,0.0,0,0.0
353244,135758231,618284,117,1685,0.05,81,1.0,53,3.25,16,3.15,4,0.36,3,0.18,4,0.25,1,0.17,0,0.0
353901,136494186,271765,64,1575,0.02,25,0.31,14,2.78,5,3.19,0,0.0,2,0.4,4,0.8,0,0.0,0,0.0
353478,135933921,434642,123,1561,0.05,77,0.95,52,3.36,15,3.11,4,0.37,3,0.19,2,0.13,1,0.18,0,0.0
354130,136653925,107928,24,1504,0.02,36,0.44,26,3.59,6,2.66,4,0.8,0,0.0,0,0.0,0,0.0,0,0.0
353797,136398174,75924,26,1414,0.06,81,1.0,53,3.25,16,3.15,4,0.36,3,0.18,4,0.25,1,0.17,0,0.0
353269,135766890,509095,62,1265,0.06,77,0.95,52,3.36,15,3.11,4,0.37,3,0.19,2,0.13,1,0.18,0,0.0
353729,136309239,52321,9,887,0.08,71,0.88,51,3.57,15,3.37,4,0.41,0,0.0,0,0.0,1,0.19,0,0.0
353312,135784351,117869,8,718,0.03,21,0.26,14,3.31,6,4.56,0,0.0,0,0.0,0,0.0,1,0.66,0,0.0


81 chromosomes that express 26_1414 also express 117_1685.  77 of them express 123_1561.

In [22]:
di_14_48 = 353833
plt10_0_obj = dm.superset_yes_no([dm.di_14_48, dm.di_123_1561], min_match=0.001)
plt10_0 = plt10_0_obj.do_plot()
show(plt10_0)

All 48 chromosomes that express the series 14_48 also express 26_1414 and 117_1685.  This figure shows all of the series that are expressed by any of those chromosomes.

In [23]:
HTML(plt10_0_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.02,46,1.0,33,3.56,9,3.12,4,0.63,0,0.0,0,0.0,0,0.0,0,0.0
353244,135758231,618284,117,1685,0.03,46,1.0,33,3.56,9,3.12,4,0.63,0,0.0,0,0.0,0,0.0,0,0.0
353478,135933921,434642,123,1561,0.03,46,1.0,33,3.56,9,3.12,4,0.63,0,0.0,0,0.0,0,0.0,0,0.0
354130,136653925,107928,24,1504,0.02,36,0.78,26,3.59,6,2.66,4,0.8,0,0.0,0,0.0,0,0.0,0,0.0
353797,136398174,75924,26,1414,0.03,46,1.0,33,3.56,9,3.12,4,0.63,0,0.0,0,0.0,0,0.0,0,0.0
353269,135766890,509095,62,1265,0.04,46,1.0,33,3.56,9,3.12,4,0.63,0,0.0,0,0.0,0,0.0,0,0.0
353729,136309239,52321,9,887,0.05,46,1.0,33,3.56,9,3.12,4,0.63,0,0.0,0,0.0,0,0.0,0,0.0
353764,136364916,22977,5,588,0.08,46,1.0,33,3.56,9,3.12,4,0.63,0,0.0,0,0.0,0,0.0,0,0.0
353349,135810535,87488,9,545,0.08,46,1.0,33,3.56,9,3.12,4,0.63,0,0.0,0,0.0,0,0.0,0,0.0
354189,136704466,27748,5,212,0.17,36,0.78,26,3.59,6,2.66,4,0.8,0,0.0,0,0.0,0,0.0,0,0.0


Many of the series expressed by any 14_48 chromosome are expressed by 46 of them.  Many of the rest are expressed by 36 of them.

In [24]:
plt10_1_obj = dm.superset_yes_no([dm.di_14_48, dm.di_123_1561, dm.di_24_1504], min_match=0.001)
plt10_1 = plt10_1_obj.do_plot()
show(plt10_1)

With the exception of 10_43, all of the series in this figure are expressed by the 36 chromosomes that express 14_48, 123_1561, and 24_1504.  This is another very different expression of linkage disequilibrium involving 26_1414, a series that is central to the generation of the lactase persistence series 11_765.

In [25]:
HTML(plt10_1_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.02,36,1.0,26,3.59,6,2.66,4,0.8,0,0.0,0,0.0,0,0.0,0,0.0
353244,135758231,618284,117,1685,0.02,36,1.0,26,3.59,6,2.66,4,0.8,0,0.0,0,0.0,0,0.0,0,0.0
353478,135933921,434642,123,1561,0.02,36,1.0,26,3.59,6,2.66,4,0.8,0,0.0,0,0.0,0,0.0,0,0.0
354130,136653925,107928,24,1504,0.02,36,1.0,26,3.59,6,2.66,4,0.8,0,0.0,0,0.0,0,0.0,0,0.0
353797,136398174,75924,26,1414,0.03,36,1.0,26,3.59,6,2.66,4,0.8,0,0.0,0,0.0,0,0.0,0,0.0
353269,135766890,509095,62,1265,0.03,36,1.0,26,3.59,6,2.66,4,0.8,0,0.0,0,0.0,0,0.0,0,0.0
353729,136309239,52321,9,887,0.04,36,1.0,26,3.59,6,2.66,4,0.8,0,0.0,0,0.0,0,0.0,0,0.0
353764,136364916,22977,5,588,0.06,36,1.0,26,3.59,6,2.66,4,0.8,0,0.0,0,0.0,0,0.0,0,0.0
353349,135810535,87488,9,545,0.07,36,1.0,26,3.59,6,2.66,4,0.8,0,0.0,0,0.0,0,0.0,0,0.0
354189,136704466,27748,5,212,0.17,36,1.0,26,3.59,6,2.66,4,0.8,0,0.0,0,0.0,0,0.0,0,0.0


In [26]:
plt10_2_obj = dm.superset_yes_no([dm.di_14_48], [dm.di_123_1561], min_match=0.001)
plt10_2 = plt10_2_obj.do_plot()
show(plt10_2)

This figure shows the series expressed by the 2 chromosomes that express 14_48 but not 123_1561.

In [27]:
HTML(plt10_2_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.0,2,1.0,1,3.71,1,11.19,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353244,135758231,618284,117,1685,0.0,2,1.0,1,3.71,1,11.19,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353797,136398174,75924,26,1414,0.0,2,1.0,1,3.71,1,11.19,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353312,135784351,117869,8,718,0.0,2,1.0,1,3.71,1,11.19,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
354209,136729290,55170,4,50,0.04,2,1.0,1,3.71,1,11.19,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353833,136428402,221239,14,48,0.04,2,1.0,1,3.71,1,11.19,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
354151,136661788,74619,4,46,0.04,2,1.0,1,3.71,1,11.19,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0


### 6_1503 and 95_176 East Asian History

Of the chromosomes that express 6_1503, 233 do not express 26_1414.  The series expressed by these 233 chromosomes almost certainly result from one or more crossover events following the appearance of the six 6_1503 SNPs in a chromosome that expressed 4_1699 and 26_1414.  The largest association among these 233 chromosomes is with the 163 chromosomes that express 6_1503 and 95_176.  The likely genesis of this association was a crossover event between 4_1699 and 95_176 in some East Asian population that already included some small number of chromosomes that expressed 95_176 and 51_176.  The source of the large number of SNPs in these two East Asian series is unclear.  But the varying associations of 6_1503 and 4_1699 with these series can be traced through a sequence of crossover events.

In [28]:
plt7_1_obj = dm.superset_yes_no([dm.di_6_1503], [dm.di_26_1414], min_match=0.001)
plt7_1 = plt7_1_obj.do_plot()
show(plt7_1)

This plot shows another complex series structure that involves chromosomes that express 6_1503 and 4_1699 but not 26_1414.

In [29]:
HTML(plt7_1_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.01,22,0.09,0,0.0,0,0.0,3,1.21,18,3.34,1,0.41,0,0.0,0,0.0
354170,136682274,93624,7,1868,0.01,27,0.12,1,0.15,0,0.0,6,1.97,19,2.87,1,0.33,0,0.0,0,0.0
354033,136588031,5647,7,1760,0.11,191,0.82,1,0.02,0,0.0,22,1.02,162,3.46,4,0.19,1,0.07,1,0.04
353462,135915358,79721,4,1699,0.14,231,0.99,1,0.02,0,0.0,30,1.15,190,3.36,6,0.23,2,0.12,2,0.06
353901,136494186,271765,64,1575,0.01,16,0.07,0,0.0,0,0.0,3,1.66,13,3.32,0,0.0,0,0.0,0,0.0
354130,136653925,107928,24,1504,0.11,165,0.71,0,0.0,0,0.0,3,0.16,157,3.89,2,0.11,1,0.09,2,0.09
353283,135771974,368330,6,1503,0.16,233,1.0,1,0.02,0,0.0,31,1.18,191,3.35,6,0.23,2,0.12,2,0.06
353814,136406646,31432,5,1460,0.02,23,0.1,1,0.17,0,0.0,4,1.54,14,2.49,3,1.18,1,0.62,0,0.0
353925,136506375,32564,4,1442,0.01,21,0.09,0,0.0,0,0.0,7,2.95,8,1.56,5,2.15,0,0.0,1,0.35
353791,136393658,92684,32,1361,0.01,17,0.07,0,0.0,0,0.0,3,1.56,10,2.4,3,1.59,0,0.0,1,0.43


Of the 233 chromosomes that express 6_1503 but not 26_1414, 191 are East Asian.  95_176 and 51_176 are the series most over expressed by East Asian chromosomes.

In [30]:
plt13_obj = dm.superset_yes_no([dm.di_6_1503, dm.di_95_176], min_match=0.001)
plt13 = plt13_obj.do_plot()
show(plt13)

This figure shows all of the series expressed by any chromosome that expresses both 6_1503 and 95_176.

In [31]:
HTML(plt13_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
354170,136682274,93624,7,1868,0.0,1,0.01,0,0.0,0,0.0,0,0.0,1,4.97,0,0.0,0,0.0,0,0.0
354033,136588031,5647,7,1760,0.09,163,1.0,0,0.0,0,0.0,17,0.75,146,4.45,0,0.0,0,0.0,0,0.0
353462,135915358,79721,4,1699,0.09,161,0.99,0,0.0,0,0.0,16,0.72,145,4.47,0,0.0,0,0.0,0,0.0
354130,136653925,107928,24,1504,0.1,146,0.9,0,0.0,0,0.0,3,0.15,143,4.87,0,0.0,0,0.0,0,0.0
353283,135771974,368330,6,1503,0.11,163,1.0,0,0.0,0,0.0,17,0.75,146,4.45,0,0.0,0,0.0,0,0.0
354129,136652953,108222,5,1296,0.11,146,0.9,0,0.0,0,0.0,3,0.15,143,4.87,0,0.0,0,0.0,0,0.0
353849,136447707,42559,4,1149,0.14,163,1.0,0,0.0,0,0.0,17,0.75,146,4.45,0,0.0,0,0.0,0,0.0
354127,136652491,80281,6,1114,0.0,5,0.03,0,0.0,0,0.0,3,4.33,2,1.99,0,0.0,0,0.0,0,0.0
354061,136603638,28487,16,511,0.32,162,0.99,0,0.0,0,0.0,17,0.76,145,4.45,0,0.0,0,0.0,0,0.0
354123,136651773,122854,51,176,0.8,140,0.86,0,0.0,0,0.0,0,0.0,140,4.97,0,0.0,0,0.0,0,0.0


In [32]:
plt14_obj = dm.superset_yes_no([dm.di_95_176, dm.di_51_176, dm.di_6_1503, dm.di_4_1699], min_match=0.001)
plt14 = plt14_obj.do_plot()
show(plt14)

This figure shows a common set of series expressed by all chromosomes that express 6_1503, 4_1699, 95_176, and 51_176.

In [33]:
HTML(plt14_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
354033,136588031,5647,7,1760,0.08,139,1.0,0,0.0,0,0.0,0,0.0,139,4.97,0,0.0,0,0.0,0,0.0
353462,135915358,79721,4,1699,0.08,139,1.0,0,0.0,0,0.0,0,0.0,139,4.97,0,0.0,0,0.0,0,0.0
354130,136653925,107928,24,1504,0.09,139,1.0,0,0.0,0,0.0,0,0.0,139,4.97,0,0.0,0,0.0,0,0.0
353283,135771974,368330,6,1503,0.09,139,1.0,0,0.0,0,0.0,0,0.0,139,4.97,0,0.0,0,0.0,0,0.0
354129,136652953,108222,5,1296,0.11,139,1.0,0,0.0,0,0.0,0,0.0,139,4.97,0,0.0,0,0.0,0,0.0
353849,136447707,42559,4,1149,0.12,139,1.0,0,0.0,0,0.0,0,0.0,139,4.97,0,0.0,0,0.0,0,0.0
354061,136603638,28487,16,511,0.27,139,1.0,0,0.0,0,0.0,0,0.0,139,4.97,0,0.0,0,0.0,0,0.0
354123,136651773,122854,51,176,0.79,139,1.0,0,0.0,0,0.0,0,0.0,139,4.97,0,0.0,0,0.0,0,0.0
353787,136392474,250856,95,176,0.79,139,1.0,0,0.0,0,0.0,0,0.0,139,4.97,0,0.0,0,0.0,0,0.0


All 139 of the chromosomes express all of the series.  Those series are the only ones in the region expressed by any of these chromosomes.  All of the chromosomes come from East Asian populations.  The most likely explanation for this instance of linkage disequilibrium is a crossover event in an East Asian population between 4_1699 and 26_1414 that joined 6_1503 and 4_1699 with the other series shown in the figure.  The large number of chromosomes in this group would be the result of some kind of selective advantage for the carriers of these series.  The source of 95_176 and 51_176 is less clear.  The number of SNPs involved is much larger than the number observed in other cases in this region for genetic events that created a series of variants.  Nor is there any obvious African series that could provide the root for this process.
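The claim that every chromosome expresses every series, and that all of them are East Asian, can be read off the table by checking that the `matches` and `eas` columns are constant. The sketch below does that on a hand-copied subset of the columns (SNP count, allele count, matches, East Asian count); the shortened column names are chosen here for illustration and are not produced by `get_html()`.

```python
import csv
import io

# Four columns copied from the table above: snps, alleles, matches, eas.
table = """snps,alleles,matches,eas
7,1760,139,139
4,1699,139,139
24,1504,139,139
6,1503,139,139
5,1296,139,139
4,1149,139,139
16,511,139,139
51,176,139,139
95,176,139,139
"""

rows = list(csv.DictReader(io.StringIO(table)))

# Every series is carried by the same number of chromosomes.
match_counts = {int(r["matches"]) for r in rows}
same_carriers = len(match_counts) == 1

# Every carrier of every series is East Asian.
all_east_asian = all(r["matches"] == r["eas"] for r in rows)

print(same_carriers, all_east_asian, match_counts)
```

Identical counts are consistent with, but do not by themselves prove, that the same 139 chromosomes carry every series; that follows from how the plot selects chromosomes.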

In [34]:
plt15_obj = dm.superset_yes_no([dm.di_5_18], min_match=.001)
plt15 = plt15_obj.do_plot()
show(plt15)

All chromosomes that express the series 5_18 come from American populations.  The series appears to have been generated from a chromosome that expressed 6_1503 and 95_176.  The first step in the process was probably the crossover that generated a chromosome expressing 6_1503, 95_176, and 51_176.  The second step was another crossover between 95_176 and 51_176 that lost 51_176.  The final step was the generation of 5_18, probably sometime after the American population had separated from the source of its East Asian gene flow.

In [35]:
HTML(plt15_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.0,1,0.06,0,0.0,0,0.0,1,7.22,0,0.0,0,0.0,0,0.0,0,0.0
354170,136682274,93624,7,1868,0.0,1,0.06,0,0.0,0,0.0,1,7.22,0,0.0,0,0.0,0,0.0,0,0.0
354033,136588031,5647,7,1760,0.01,17,0.94,0,0.0,0,0.0,17,7.22,0,0.0,0,0.0,0,0.0,0,0.0
353462,135915358,79721,4,1699,0.01,17,0.94,0,0.0,0,0.0,17,7.22,0,0.0,0,0.0,0,0.0,0,0.0
353901,136494186,271765,64,1575,0.0,1,0.06,0,0.0,0,0.0,1,7.22,0,0.0,0,0.0,0,0.0,0,0.0
354130,136653925,107928,24,1504,0.0,3,0.17,0,0.0,0,0.0,3,7.22,0,0.0,0,0.0,0,0.0,0,0.0
353283,135771974,368330,6,1503,0.01,18,1.0,0,0.0,0,0.0,18,7.22,0,0.0,0,0.0,0,0.0,0,0.0
353797,136398174,75924,26,1414,0.0,1,0.06,0,0.0,0,0.0,1,7.22,0,0.0,0,0.0,0,0.0,0,0.0
354129,136652953,108222,5,1296,0.0,3,0.17,0,0.0,0,0.0,3,7.22,0,0.0,0,0.0,0,0.0,0,0.0
353849,136447707,42559,4,1149,0.01,17,0.94,0,0.0,0,0.0,17,7.22,0,0.0,0,0.0,0,0.0,0,0.0


In [36]:
cntx15 = plt15_obj.plot_context
HTML(cntx15.get_country_html())

pop,cnt,otp
ACB,0,0.0
ASW,0,0.0
BEB,0,0.0
CDX,0,0.0
CEU,0,0.0
CHB,0,0.0

pop,cnt,otp
CHS,0,0.0
CLM,4,5.92
ESN,0,0.0
FIN,0,0.0
GBR,0,0.0

pop,cnt,otp
GIH,0,0.0
GWD,0,0.0
IBS,0,0.0
ITU,0,0.0
JPT,0,0.0

pop,cnt,otp
KHV,0,0.0
LWK,0,0.0
MSL,0,0.0
MXL,6,13.04
PEL,8,13.09

pop,cnt,otp
PJL,0,0.0
PUR,0,0.0
STU,0,0.0
TSI,0,0.0
YRI,0,0.0


This table shows the populations of the chromosomes that express 5_18.

In [37]:
plt16_obj = dm.superset_yes_no([dm.di_5_18, dm.di_7_1760], min_match=0.9)
plt16 = plt16_obj.do_plot()
show(plt16)

In [38]:
HTML(plt16_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
354033,136588031,5647,7,1760,0.01,17,1.0,0,0.0,0,0.0,17,7.22,0,0.0,0,0.0,0,0.0,0,0.0
353462,135915358,79721,4,1699,0.01,16,0.94,0,0.0,0,0.0,16,7.22,0,0.0,0,0.0,0,0.0,0,0.0
353283,135771974,368330,6,1503,0.01,17,1.0,0,0.0,0,0.0,17,7.22,0,0.0,0,0.0,0,0.0,0,0.0
353849,136447707,42559,4,1149,0.01,17,1.0,0,0.0,0,0.0,17,7.22,0,0.0,0,0.0,0,0.0,0,0.0
354061,136603638,28487,16,511,0.03,17,1.0,0,0.0,0,0.0,17,7.22,0,0.0,0,0.0,0,0.0,0,0.0
353787,136392474,250856,95,176,0.1,17,1.0,0,0.0,0,0.0,17,7.22,0,0.0,0,0.0,0,0.0,0,0.0
353692,136243714,107818,5,18,0.94,17,1.0,0,0.0,0,0.0,17,7.22,0,0.0,0,0.0,0,0.0,0,0.0


A common set of series is expressed by 17 of the chromosomes that express 5_18.

In [39]:
plt17_obj = dm.superset_yes_no([dm.di_5_18, dm.di_24_1504], min_match=.001)
plt17 = plt17_obj.do_plot()
show(plt17)

In [40]:
HTML(plt17_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
354033,136588031,5647,7,1760,0.0,3,1.0,0,0.0,0,0.0,3,7.22,0,0.0,0,0.0,0,0.0,0,0.0
353462,135915358,79721,4,1699,0.0,3,1.0,0,0.0,0,0.0,3,7.22,0,0.0,0,0.0,0,0.0,0,0.0
354130,136653925,107928,24,1504,0.0,3,1.0,0,0.0,0,0.0,3,7.22,0,0.0,0,0.0,0,0.0,0,0.0
353283,135771974,368330,6,1503,0.0,3,1.0,0,0.0,0,0.0,3,7.22,0,0.0,0,0.0,0,0.0,0,0.0
354129,136652953,108222,5,1296,0.0,3,1.0,0,0.0,0,0.0,3,7.22,0,0.0,0,0.0,0,0.0,0,0.0
353849,136447707,42559,4,1149,0.0,3,1.0,0,0.0,0,0.0,3,7.22,0,0.0,0,0.0,0,0.0,0,0.0
354127,136652491,80281,6,1114,0.0,3,1.0,0,0.0,0,0.0,3,7.22,0,0.0,0,0.0,0,0.0,0,0.0
354061,136603638,28487,16,511,0.01,3,1.0,0,0.0,0,0.0,3,7.22,0,0.0,0,0.0,0,0.0,0,0.0
353787,136392474,250856,95,176,0.02,3,1.0,0,0.0,0,0.0,3,7.22,0,0.0,0,0.0,0,0.0,0,0.0
353692,136243714,107818,5,18,0.17,3,1.0,0,0.0,0,0.0,3,7.22,0,0.0,0,0.0,0,0.0,0,0.0


A more extensive set is expressed by 3 chromosomes, probably as a result of another crossover that joined 24_1504, 5_1296, and 6_1114 with the more common set of series expressed by 17 chromosomes.

In [41]:
cntx17 = plt17_obj.plot_context
HTML(cntx17.get_country_html())

pop,cnt,otp
ACB,0,0.0
ASW,0,0.0
BEB,0,0.0
CDX,0,0.0
CEU,0,0.0
CHB,0,0.0

pop,cnt,otp
CHS,0,0.0
CLM,0,0.0
ESN,0,0.0
FIN,0,0.0
GBR,0,0.0

pop,cnt,otp
GIH,0,0.0
GWD,0,0.0
IBS,0,0.0
ITU,0,0.0
JPT,0,0.0

pop,cnt,otp
KHV,0,0.0
LWK,0,0.0
MSL,0,0.0
MXL,0,0.0
PEL,3,29.46

pop,cnt,otp
PJL,0,0.0
PUR,0,0.0
STU,0,0.0
TSI,0,0.0
YRI,0,0.0


All 3 of these chromosomes come from the Peruvian population sampled in Lima (PEL).

In [42]:
plt18_obj = dm.superset_yes_no([dm.di_5_18, dm.di_26_1414], min_match=.001)
plt18 = plt18_obj.do_plot()
show(plt18)

This figure shows the series expressed by the 1 remaining 5_18 chromosome.  It appears to involve a crossover between 5_18 and 95_176 that joined one instance of 5_18 with the series 26_1414, 64_1575, 10_2206, and 7_1868, which are common series associated with the 11_765 lactase persistence series.

In [43]:
cntx18 = plt18_obj.plot_context
HTML(cntx18.get_country_html())

pop,cnt,otp
ACB,0,0.0
ASW,0,0.0
BEB,0,0.0
CDX,0,0.0
CEU,0,0.0
CHB,0,0.0

pop,cnt,otp
CHS,0,0.0
CLM,1,26.64
ESN,0,0.0
FIN,0,0.0
GBR,0,0.0

pop,cnt,otp
GIH,0,0.0
GWD,0,0.0
IBS,0,0.0
ITU,0,0.0
JPT,0,0.0

pop,cnt,otp
KHV,0,0.0
LWK,0,0.0
MSL,0,0.0
MXL,0,0.0
PEL,0,0.0

pop,cnt,otp
PJL,0,0.0
PUR,0,0.0
STU,0,0.0
TSI,0,0.0
YRI,0,0.0


The 1 chromosome expressing this set of series comes from the Colombian population sampled in Medellin (CLM).
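The three population tables for 5_18 can be combined to see where the remaining chromosomes sit. This is a rough sketch using `collections.Counter`, with counts copied by hand from the tables above; it assumes, as the text does, that the 26_1414 chromosome is the one not included in the common set of 17.

```python
from collections import Counter

# Population counts copied from the three 5_18 population tables above.
all_5_18 = Counter({"CLM": 4, "MXL": 6, "PEL": 8})   # all 18 chromosomes
with_24_1504 = Counter({"PEL": 3})                   # extended set
with_26_1414 = Counter({"CLM": 1})                   # the 1 remaining chromosome

# Chromosomes that carry only the common set of series.
common_only = all_5_18 - with_24_1504 - with_26_1414

total = sum(all_5_18.values())
common_only_total = sum(common_only.values())

print(dict(common_only), common_only_total, total)
```

Under that assumption, the 14 chromosomes carrying only the common set are spread over all three American populations in which 5_18 appears.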

In [44]:
di_6_57 = 353444
di_6_35 = 353407
di_7_22 = 353315
plt11_obj = dm.superset_yes_no([dm.di_6_1503, dm.di_26_1414], 
                             [dm.di_4_911, dm.di_11_765, di_6_57, di_6_35, di_7_22])
plt11_obj.do_plot()
HTML(plt11_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.11,243,0.98,7,0.11,3,0.18,45,1.6,94,1.56,33,1.24,22,1.28,39,1.2
353462,135915358,79721,4,1699,0.15,248,1.0,8,0.13,3,0.17,46,1.61,96,1.56,33,1.21,24,1.37,38,1.15
353283,135771974,368330,6,1503,0.17,249,1.0,8,0.13,3,0.17,46,1.6,96,1.55,33,1.21,24,1.37,39,1.17
353797,136398174,75924,26,1414,0.18,249,1.0,8,0.13,3,0.17,46,1.6,96,1.55,33,1.21,24,1.37,39,1.17


In [45]:
cntx11 = plt11_obj.plot_context
HTML(cntx11.get_country_html())

pop,cnt,otp
ACB,0,0.0
ASW,3,0.49
BEB,17,1.99
CDX,20,2.16
CEU,7,0.71
CHB,10,0.98

pop,cnt,otp
CHS,12,1.15
CLM,16,1.71
ESN,0,0.0
FIN,4,0.41
GBR,3,0.33

pop,cnt,otp
GIH,10,0.98
GWD,4,0.36
IBS,13,1.22
ITU,15,1.48
JPT,35,3.38

pop,cnt,otp
KHV,19,1.93
LWK,4,0.41
MSL,0,0.0
MXL,14,2.2
PEL,13,1.54

pop,cnt,otp
PJL,7,0.73
PUR,3,0.29
STU,14,1.38
TSI,6,0.56
YRI,0,0.0


### 4_911 populations

The populations of the 149 chromosomes that express 4_911 but not 11_765 provide the best available evidence for the population where 11_765 appeared.  None of those chromosomes come from African populations, American populations considered by the 1000 Genomes Project to be of African ancestry, or East Asian populations.  The surprising result is that the 59 American chromosomes are the largest component.  Perhaps some chance gene flow brought this series to America through some kind of bottleneck event.  The other possibility is a gene flow across Siberia that merged with an East Asian gene flow as part of the initial human settlement of America.

In [46]:
plt12_obj = dm.superset_yes_no([dm.di_4_911], [dm.di_11_765])
plt12_obj.do_plot()
HTML(plt12_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.06,137,0.92,0,0.0,0,0.0,55,3.11,0,0.0,37,2.2,16,1.61,29,1.58
353462,135915358,79721,4,1699,0.09,147,0.99,0,0.0,0,0.0,58,3.05,0,0.0,40,2.21,17,1.59,32,1.62
353283,135771974,368330,6,1503,0.1,148,0.99,0,0.0,0,0.0,59,3.09,0,0.0,40,2.2,17,1.58,32,1.61
353797,136398174,75924,26,1414,0.11,149,1.0,0,0.0,0,0.0,59,3.07,0,0.0,40,2.18,17,1.57,33,1.65
353604,136092061,315418,4,911,0.16,149,1.0,0,0.0,0,0.0,59,3.07,0,0.0,40,2.18,17,1.57,33,1.65


In [47]:
cntx12 = plt12_obj.plot_context
HTML(cntx12.get_country_html())

pop,cnt,otp
ACB,0,0.0
ASW,0,0.0
BEB,8,1.56
CDX,0,0.0
CEU,7,1.19
CHB,0,0.0

pop,cnt,otp
CHS,0,0.0
CLM,6,1.07
ESN,0,0.0
FIN,7,1.19
GBR,7,1.29

pop,cnt,otp
GIH,7,1.14
GWD,0,0.0
IBS,7,1.1
ITU,13,2.14
JPT,0,0.0

pop,cnt,otp
KHV,0,0.0
LWK,0,0.0
MSL,0,0.0
MXL,6,1.58
PEL,22,4.35

pop,cnt,otp
PJL,9,1.58
PUR,25,4.04
STU,13,2.14
TSI,12,1.88
YRI,0,0.0


The exceptional concentration of these chromosomes is in the Peruvian and Puerto Rican populations.
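The population counts can be rolled up to regions to confirm the 59 American chromosomes and to rank the individual populations. The sketch below assumes the standard 1000 Genomes super-population assignment for the population codes; it does not reproduce the separate `sas`/`sax` split used in the series tables, so all South Asian populations are grouped together.

```python
# Assumption: standard 1000 Genomes super-population assignment for the
# populations that have non-zero counts in the table above.
SUPER_POP = {
    "CLM": "AMR", "MXL": "AMR", "PEL": "AMR", "PUR": "AMR",
    "CEU": "EUR", "FIN": "EUR", "GBR": "EUR", "IBS": "EUR", "TSI": "EUR",
    "BEB": "SAS", "GIH": "SAS", "ITU": "SAS", "PJL": "SAS", "STU": "SAS",
}

# Non-zero counts copied from the population table above.
counts = {
    "BEB": 8, "CEU": 7, "CLM": 6, "FIN": 7, "GBR": 7, "GIH": 7, "IBS": 7,
    "ITU": 13, "MXL": 6, "PEL": 22, "PJL": 9, "PUR": 25, "STU": 13, "TSI": 12,
}

# Sum the population counts within each region.
by_region = {}
for pop, n in counts.items():
    region = SUPER_POP[pop]
    by_region[region] = by_region.get(region, 0) + n

# The two populations with the most chromosomes.
top_two = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:2]

print(by_region, top_two, sum(counts.values()))
```

The South Asian total of 50 equals the sum of the `sas` and `sax` counts (17 and 33) in the 4_911 row of the series table.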