In [1]:
import numpy as np
from IPython.display import HTML
from bokeh.plotting import output_notebook, show
import genomes_dnj.lct_interval.series_plots as dm
output_notebook(hide_banner=True)

<h3>EUR Tree</h3>
<div style="width:700px">
<p>
The plot below shows the association of series that forms the root of the EUR tree.  The
626,000 base lower region of the EUR tree root that covers the genes rab3gap1, 
zranb3, and the first half of r3hdm1 includes only the 10 SNPs in the series 6_1503 and 
4_1699.  By contrast this region of the SAS tree root includes more than 495 SNPs.
The 80,000 base upper half of the gene r3hdm1 is covered by the 26 SNP
series 26_1414.  The coverage of that series is roughly equivalent to the 32 SNP series
32_1361 in the root of the EAS tree.  The 270,000 base upper region that covers the
genes ubxn4, lct, mcm6, and dars includes the series 64_1575, 10_2206, and 
7_1868.
<p>
The individual series of the EUR tree root all have some significant presence in other
contexts.  So do some of the associations of those series.  Only 4 samples of the
1503 that express 6_1503 fail to express 4_1699.  It is possible that 6_1503 and 
4_1699 have a history as a single series and that 4_1699 as a distinct series is the result
of some genetic process that led to its fragmentation.  The overexpressed association of
6_1503 and 4_1699 without 26_1414 includes the series 91_176 and 51_176.  That association
is expressed by 139 East Asian samples.
<p>
The series 26_1414 is strongly associated with 10_2206.  Only 29 of the samples
that express 26_1414 fail to express 10_2206.  Another 204 fail to express
64_1575.  But most of those are associated with fragments of 64_1575 that cover its overlap
with 10_2206.
<p>
The series 64_1575 is strongly associated both with 10_2206 and with 7_1868.  Only
1 of the 1575 chromosomes that express 64_1575 fails to express 10_2206.  Only
29 of them fail to express 7_1868.  It seems likely that 26_1414, 64_1575, 10_2206,
and 7_1868 have some history as a single series of SNPs.
The existence of 64_1575, 10_2206, and 7_1868 as a unit without 26_1414 comes mainly
from recombination events that associated them with 32_1361 in the EAS tree.
The hierarchies that resulted from that recombination appear to have experienced some
kind of selective advantage that led to the large number of samples that express
the recombinant association.  But there are also several samples that express recombinant
associations of 64_1575 with a variety of other hierarchies.
<p>
The root series for this tree are overexpressed in European populations by more than a factor of two.
Only 12 samples that express this root come from African populations native to Africa.  The
samples expressing 6_1503 are the limiting factor.  The root series are underexpressed in East Asian
populations by about a factor of two.  The 37 samples in African populations from the Caribbean or the American
Southwest that express these series appear to be the result of gene flow selected for lactase persistence.
</div>
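
<div style="width:700px">
<p>
The "fail to express" counts quoted above are set differences: the samples that carry one
series minus the samples that carry another.  The sketch below illustrates the idea with toy
data.  The helper name count_missing and the sample identifiers are hypothetical; they are not
part of genomes_dnj and do not reproduce the real counts.
</div>

```python
def count_missing(carriers_a, carriers_b):
    """Return how many samples express series A but not series B."""
    return len(set(carriers_a) - set(carriers_b))


# Toy data: hypothetical chromosome identifiers, not real samples.
expresses_a = {"s1", "s2", "s3", "s4", "s5"}
expresses_b = {"s1", "s2", "s3", "s9"}

missing = count_missing(expresses_a, expresses_b)
print(missing, "of", len(expresses_a), "carriers of A fail to express B")
```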

In [2]:
plt_obj = dm.superset_yes_no([dm.di_6_1503, dm.di_26_1414], min_match=0.8)
plt = plt_obj.do_plot()
show(plt)

In [3]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.57,1251,0.99,12,0.05,37,0.47,271,1.56,113,0.45,598,2.38,104,1.14,116,0.76
354170,136682274,93624,7,1868,0.59,1094,0.86,12,0.05,36,0.52,220,1.45,81,0.37,564,2.57,88,1.11,93,0.69
353462,135915358,79721,4,1699,0.75,1268,1.0,13,0.05,37,0.47,275,1.57,115,0.45,602,2.36,107,1.16,119,0.77
353901,136494186,271765,64,1575,0.7,1106,0.87,12,0.05,36,0.52,224,1.46,87,0.39,566,2.55,86,1.07,95,0.7
353283,135771974,368330,6,1503,0.84,1270,1.0,13,0.05,37,0.46,276,1.57,115,0.45,602,2.36,107,1.16,120,0.77
353797,136398174,75924,26,1414,0.9,1270,1.0,13,0.05,37,0.46,276,1.57,115,0.45,602,2.36,107,1.16,120,0.77
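
<div style="width:700px">
<p>
The second column of each population pair in the table (for example eur.1) appears to be
consistent with an enrichment ratio: the population's share of the matching chromosomes divided
by its share of the whole panel.  The sketch below checks that reading for one row.  The panel
totals (5008 chromosomes, 1006 of them European) are assumptions that are not stated in this
notebook, and the function name enrichment is hypothetical.  Under these assumptions the
result rounds to the eur.1 value shown for the root series.
</div>

```python
def enrichment(pop_matches, total_matches, pop_size, panel_size):
    """Observed population share among matches divided by its share of the panel."""
    observed = pop_matches / total_matches
    expected = pop_size / panel_size
    return observed / expected


# Assumed panel sizes (not given in the notebook).
PANEL_CHROMOSOMES = 5008
EUR_CHROMOSOMES = 1006

# Row for 6_1503: 602 European chromosomes among 1270 matches.
ratio = enrichment(602, 1270, EUR_CHROMOSOMES, PANEL_CHROMOSOMES)
print(round(ratio, 2))
```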


<div style="width:700px">
<p>
The plot below shows the descendant series of the EUR tree root.  The 13 African samples that
express the root series are a small number.  But the number is large enough to suggest that the root series
does have some history that extends back into the African past.  Otherwise, the very small number
of African samples suggests that the rest of the tree appeared during the expansion out of Africa.
</div>

In [4]:
plt_obj = dm.subset_yes_no([dm.di_6_1503, dm.di_26_1414], min_match=0.8)
plt = plt_obj.do_plot()
show(plt)

In [5]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353283,135771974,368330,6,1503,0.84,1270,1.0,13,0.05,37,0.46,276,1.57,115,0.45,602,2.36,107,1.16,120,0.77
353797,136398174,75924,26,1414,0.9,1270,1.0,13,0.05,37,0.46,276,1.57,115,0.45,602,2.36,107,1.16,120,0.77
353604,136092061,315418,4,911,1.0,910,0.72,2,0.01,31,0.54,205,1.63,0,0.0,524,2.87,71,1.07,77,0.69
353380,135837906,870076,11,765,1.0,765,0.6,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
353444,135889881,493055,6,57,0.93,53,0.04,2,0.19,1,0.3,14,1.91,0,0.0,28,2.63,5,1.3,3,0.46
353407,135859157,121781,6,35,0.97,34,0.03,1,0.15,1,0.47,12,2.55,0,0.0,17,2.49,2,0.81,1,0.24
353315,135786061,800623,7,22,1.0,22,0.02,0,0.0,0,0.0,0,0.0,19,4.29,0,0.0,3,1.88,0,0.0
353570,136039834,309728,7,20,0.85,17,0.01,0,0.0,1,0.94,4,1.7,0,0.0,11,3.22,0,0.0,1,0.48


<div style="width:700px">
<p>
The plot below shows the most specific series in the region expressed by samples that express
the root of the EUR tree.  The most specific series is the one expressed by the smallest number of
samples.  None of the 765 samples that express the lactase persistence series 11_765 express
a more specific series in this region.
</div>
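
<div style="width:700px">
<p>
One way to read the definition of the most specific series is: for each sample, among the
series it expresses, pick the one with the fewest carriers overall.  The sketch below applies
that rule to toy data.  The function most_specific and the series names are hypothetical, and
this is not a description of how basis_yes_no is implemented.
</div>

```python
from collections import Counter


def most_specific(series_carriers):
    """Map each sample to the series it expresses that has the fewest carriers."""
    best = {}
    for name, carriers in series_carriers.items():
        for sample in carriers:
            current = best.get(sample)
            # Keep the series with the smaller carrier count; ties keep the first seen.
            if current is None or len(carriers) < len(series_carriers[current]):
                best[sample] = name
    return best


# Toy hierarchy: every carrier of "leaf" also carries "child" and "root".
toy = {
    "root": {"s1", "s2", "s3", "s4"},
    "child": {"s1", "s2"},
    "leaf": {"s1"},
}

basis = most_specific(toy)
basis_counts = Counter(basis.values())  # samples per basis series
print(dict(basis_counts))
```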

In [6]:
plt_obj = dm.basis_yes_no([dm.di_6_1503, dm.di_26_1414])
plt = plt_obj.do_basis_plot()
show(plt)

In [7]:
HTML(plt_obj.get_basis_html())

index,first,last,length,snps,alleles,basis,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353797,136398174,136474098,75924,26,1414,202,6,0.15,2,0.16,39,1.39,80,1.97,20,0.49,19,1.29,36,1.45
354129,136652953,136761175,108222,5,1296,3,0,0.0,0,0.0,0,0.0,3,4.97,0,0.0,0,0.0,0,0.0
354127,136652491,136732772,80281,6,1114,7,0,0.0,0,0.0,0,0.0,2,1.42,1,0.71,2,3.93,2,2.33
353984,136556805,136747085,190280,39,1014,9,0,0.0,0,0.0,0,0.0,7,3.86,1,0.55,1,1.53,0,0.0
353604,136092061,136407479,315418,4,911,135,0,0.0,0,0.0,56,2.99,0,0.0,35,1.29,16,1.63,28,1.69
353902,136494985,136773638,278653,81,857,2,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,2,8.16
353938,136514709,136543147,28438,6,820,5,0,0.0,0,0.0,1,1.44,0,0.0,0,0.0,2,5.5,2,3.26
353380,135837906,136707982,870076,11,765,765,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
354061,136603638,136632125,28487,16,511,1,0,0.0,0,0.0,1,7.22,0,0.0,0,0.0,0,0.0,0,0.0
354064,136605402,136624686,19284,8,328,3,0,0.0,0,0.0,1,2.41,0,0.0,2,3.32,0,0.0,0,0.0


<div style="width:700px">
<p>
The plot below shows the series in the region expressed by the samples that express
26_1414 as a basis series and also express 6_1503.  Most of these samples express
the full set of six series associated with the EUR tree root.  But about 20% of
the samples have lost 64_1575 and 7_1868.
</div>

In [8]:
plt_obj = dm.superset_basis_yes_no(dm.di_26_1414, [dm.di_6_1503], min_match=0.1)
plt = plt_obj.do_plot()
show(plt)

In [9]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.09,202,1.0,6,0.15,2,0.16,39,1.39,80,1.97,20,0.49,19,1.29,36,1.45
354170,136682274,93624,7,1868,0.09,162,0.8,6,0.18,1,0.1,20,0.89,63,1.93,20,0.61,18,1.53,34,1.71
353462,135915358,79721,4,1699,0.12,201,1.0,6,0.15,2,0.16,39,1.4,80,1.98,20,0.5,19,1.3,35,1.42
353901,136494186,271765,64,1575,0.1,164,0.81,6,0.18,1,0.1,19,0.84,65,1.97,20,0.61,18,1.51,35,1.74
353283,135771974,368330,6,1503,0.13,202,1.0,6,0.15,2,0.16,39,1.39,80,1.97,20,0.49,19,1.29,36,1.45
353797,136398174,75924,26,1414,0.14,202,1.0,6,0.15,2,0.16,39,1.39,80,1.97,20,0.49,19,1.29,36,1.45
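
<div style="width:700px">
<p>
The "about 20%" figure quoted for this plot follows from the matches column: 202 samples have
26_1414 as their basis series, and 164 and 162 of them express 64_1575 and 7_1868.  The sketch
below repeats that arithmetic on three rows copied from the table, reduced to the columns it
needs.  The variable names are illustrative only.
</div>

```python
import csv
import io

# Three rows from the table above, trimmed to the relevant columns.
TABLE = """snps,alleles,matches
64,1575,164
7,1868,162
26,1414,202
"""

rows = list(csv.DictReader(io.StringIO(TABLE)))
total = max(int(r["matches"]) for r in rows)  # samples selected by the query

# Fraction of the selected samples that do not express each series.
lost = {
    r["snps"] + "_" + r["alleles"]: 1 - int(r["matches"]) / total
    for r in rows
}
for name, frac in lost.items():
    print(f"{name}: {frac:.1%} of the selected samples lack it")
```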


<div style="width:700px">
<p>
The plot below shows the series expressed by samples that express the EUR root and series 4_911
but not series 11_765.  Almost half of these samples have lost the series 64_1575 and 7_1868.
</div>

In [10]:
plt_obj = dm.superset_yes_no([dm.di_26_1414, dm.di_6_1503, dm.di_4_911], [dm.di_11_765], min_match=0.5)
plt = plt_obj.do_plot()
show(plt)

In [11]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.06,136,0.92,0,0.0,0,0.0,55,3.13,0,0.0,37,2.21,16,1.62,28,1.54
354170,136682274,93624,7,1868,0.04,76,0.51,0,0.0,0,0.0,34,3.46,0,0.0,25,2.67,6,1.09,11,1.08
353462,135915358,79721,4,1699,0.09,147,0.99,0,0.0,0,0.0,58,3.05,0,0.0,40,2.21,17,1.59,32,1.62
353901,136494186,271765,64,1575,0.05,80,0.54,0,0.0,0,0.0,39,3.77,0,0.0,25,2.54,5,0.86,11,1.03
353283,135771974,368330,6,1503,0.1,148,1.0,0,0.0,0,0.0,59,3.09,0,0.0,40,2.2,17,1.58,32,1.61
353797,136398174,75924,26,1414,0.1,148,1.0,0,0.0,0,0.0,59,3.09,0,0.0,40,2.2,17,1.58,32,1.61
353604,136092061,315418,4,911,0.16,148,1.0,0,0.0,0,0.0,59,3.09,0,0.0,40,2.2,17,1.58,32,1.61


<div style="width:700px">
<p>
This plot shows the most common series expressed by samples that express the series
11_765 associated with lactase persistence.  That series extends over more than 870,000
DNA bases.  Most of the associated series are expressed by all of the 765 samples that
express 11_765.  The exceptions are the 3 samples that don't express 4_911 and the
1 sample that does not express 7_1868.
</div>

In [12]:
plt_obj = dm.superset_yes_no([dm.di_11_765], min_match=0.5)
plt = plt_obj.do_plot()
show(plt)

In [13]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.35,765,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
354170,136682274,93624,7,1868,0.41,764,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,44,0.47
353462,135915358,79721,4,1699,0.45,765,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
353901,136494186,271765,64,1575,0.49,765,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
353283,135771974,368330,6,1503,0.51,765,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
353797,136398174,75924,26,1414,0.54,765,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
353604,136092061,315418,4,911,0.84,762,1.0,2,0.01,31,0.65,146,1.38,0,0.0,484,3.16,54,0.97,45,0.48
353380,135837906,870076,11,765,1.0,765,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48


<div style="width:700px">
<p>
The hierarchies selected by the series 6_57 show a more substantial history of
recombination than the other EUR tree descendants.  The 57 samples that express
6_57 do include 23 that express 64_1575 and 22 that express all of the series in the
standard EUR tree root.  But, the other 34 samples express associations with 6_57
that reflect some kind of recombination event.
</div>

In [14]:
plt_obj = dm.superset_yes_no([dm.di_6_57, dm.di_64_1575], min_match=0.5)
plt = plt_obj.do_plot()
show(plt)

In [15]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.01,23,1.0,2,0.43,1,0.69,6,1.88,0,0.0,10,2.16,2,1.2,2,0.71
354170,136682274,93624,7,1868,0.01,22,0.96,2,0.45,1,0.72,6,1.97,0,0.0,9,2.04,2,1.25,2,0.74
353462,135915358,79721,4,1699,0.01,23,1.0,2,0.43,1,0.69,6,1.88,0,0.0,10,2.16,2,1.2,2,0.71
353901,136494186,271765,64,1575,0.01,23,1.0,2,0.43,1,0.69,6,1.88,0,0.0,10,2.16,2,1.2,2,0.71
353283,135771974,368330,6,1503,0.02,23,1.0,2,0.43,1,0.69,6,1.88,0,0.0,10,2.16,2,1.2,2,0.71
353797,136398174,75924,26,1414,0.02,23,1.0,2,0.43,1,0.69,6,1.88,0,0.0,10,2.16,2,1.2,2,0.71
353444,135889881,493055,6,57,0.4,23,1.0,2,0.43,1,0.69,6,1.88,0,0.0,10,2.16,2,1.2,2,0.71


In [16]:
plt_obj = dm.superset_yes_no([dm.di_6_57, dm.di_39_1014, dm.di_26_1414], min_match=0.5)
plt = plt_obj.do_plot()
show(plt)

In [17]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.01,25,1.0,0,0.0,0,0.0,8,2.31,0,0.0,14,2.79,2,1.1,1,0.33
353462,135915358,79721,4,1699,0.01,25,1.0,0,0.0,0,0.0,8,2.31,0,0.0,14,2.79,2,1.1,1,0.33
353283,135771974,368330,6,1503,0.02,25,1.0,0,0.0,0,0.0,8,2.31,0,0.0,14,2.79,2,1.1,1,0.33
353797,136398174,75924,26,1414,0.02,25,1.0,0,0.0,0,0.0,8,2.31,0,0.0,14,2.79,2,1.1,1,0.33
353984,136556805,190280,39,1014,0.02,25,1.0,0,0.0,0,0.0,8,2.31,0,0.0,14,2.79,2,1.1,1,0.33
353444,135889881,493055,6,57,0.44,25,1.0,0,0.0,0,0.0,8,2.31,0,0.0,14,2.79,2,1.1,1,0.33


In [18]:
plt_obj = dm.superset_yes_no([dm.di_6_57, dm.di_39_1014], [dm.di_26_1414], min_match=0.5)
plt = plt_obj.do_plot()
show(plt)

In [19]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353462,135915358,79721,4,1699,0.0,3,1.0,0,0.0,0,0.0,3,8.85,0,0.0,0,0.0,0,0.0,0,0.0
353283,135771974,368330,6,1503,0.0,3,1.0,0,0.0,0,0.0,3,8.85,0,0.0,0,0.0,0,0.0,0,0.0
353814,136406646,31432,5,1460,0.0,3,1.0,0,0.0,0,0.0,3,8.85,0,0.0,0,0.0,0,0.0,0,0.0
353925,136506375,32564,4,1442,0.0,3,1.0,0,0.0,0,0.0,3,8.85,0,0.0,0,0.0,0,0.0,0,0.0
353919,136500475,42085,13,1227,0.0,3,1.0,0,0.0,0,0.0,3,8.85,0,0.0,0,0.0,0,0.0,0,0.0
353790,136393157,48253,10,1218,0.0,3,1.0,0,0.0,0,0.0,3,8.85,0,0.0,0,0.0,0,0.0,0,0.0
353906,136496493,57432,9,1170,0.0,3,1.0,0,0.0,0,0.0,3,8.85,0,0.0,0,0.0,0,0.0,0,0.0
353849,136447707,42559,4,1149,0.0,3,1.0,0,0.0,0,0.0,3,8.85,0,0.0,0,0.0,0,0.0,0,0.0
353907,136496805,55824,9,1023,0.0,3,1.0,0,0.0,0,0.0,3,8.85,0,0.0,0,0.0,0,0.0,0,0.0
353984,136556805,190280,39,1014,0.0,3,1.0,0,0.0,0,0.0,3,8.85,0,0.0,0,0.0,0,0.0,0,0.0


<div style="width:700px">
<p>
This plot shows the series most commonly expressed by samples
that express the series 6_35.
</div>

In [20]:
plt_obj = dm.superset_yes_no([dm.di_6_35], min_match=0.5)
plt = plt_obj.do_plot()
show(plt)

In [21]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.02,34,0.97,1,0.15,1,0.47,12,2.55,0,0.0,17,2.49,2,0.81,1,0.24
354170,136682274,93624,7,1868,0.02,30,0.86,1,0.17,1,0.53,10,2.41,0,0.0,15,2.49,2,0.92,1,0.27
353462,135915358,79721,4,1699,0.02,35,1.0,1,0.14,1,0.46,13,2.68,0,0.0,17,2.42,2,0.79,1,0.23
353901,136494186,271765,64,1575,0.02,31,0.89,1,0.16,1,0.51,10,2.33,0,0.0,16,2.57,2,0.89,1,0.26
353283,135771974,368330,6,1503,0.02,35,1.0,1,0.14,1,0.46,13,2.68,0,0.0,17,2.42,2,0.79,1,0.23
353797,136398174,75924,26,1414,0.02,34,0.97,1,0.15,1,0.47,12,2.55,0,0.0,17,2.49,2,0.81,1,0.24
353407,135859157,121781,6,35,1.0,35,1.0,1,0.14,1,0.46,13,2.68,0,0.0,17,2.42,2,0.79,1,0.23


<div style="width:700px">
<p>
This plot shows the most common series expressed by samples that express the series
7_22.  The series 7_22 extends over more than 800,000 DNA bases.  Its SNPs are concentrated
at the lower and higher ends of the series.  The chromosomes that express 7_22 include
21 that express the full set of EUR tree root series.  The 22 chromosomes include 19
from East Asian populations and 3 from South Asian populations.
</div>

In [22]:
plt_obj = dm.superset_yes_no([dm.di_7_22], min_match=0.5)
plt = plt_obj.do_plot()
show(plt)

In [23]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.01,22,1.0,0,0.0,0,0.0,0,0.0,19,4.29,0,0.0,3,1.88,0,0.0
354170,136682274,93624,7,1868,0.01,21,0.95,0,0.0,0,0.0,0,0.0,18,4.26,0,0.0,3,1.97,0,0.0
353462,135915358,79721,4,1699,0.01,22,1.0,0,0.0,0,0.0,0,0.0,19,4.29,0,0.0,3,1.88,0,0.0
353901,136494186,271765,64,1575,0.01,21,0.95,0,0.0,0,0.0,0,0.0,18,4.26,0,0.0,3,1.97,0,0.0
353283,135771974,368330,6,1503,0.01,22,1.0,0,0.0,0,0.0,0,0.0,19,4.29,0,0.0,3,1.88,0,0.0
353797,136398174,75924,26,1414,0.02,22,1.0,0,0.0,0,0.0,0,0.0,19,4.29,0,0.0,3,1.88,0,0.0
353315,135786061,800623,7,22,1.0,22,1.0,0,0.0,0,0.0,0,0.0,19,4.29,0,0.0,3,1.88,0,0.0


<div style="width:700px">
<p>
This plot shows the most common series expressed by samples that express the series
7_20.
</div>

In [24]:
plt_obj = dm.superset_yes_no([dm.di_7_20], min_match=0.5)
plt = plt_obj.do_plot()
show(plt)

In [25]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.01,18,0.9,0,0.0,1,0.89,5,2.0,0,0.0,11,3.04,0,0.0,1,0.45
354170,136682274,93624,7,1868,0.01,18,0.9,0,0.0,1,0.89,5,2.0,0,0.0,11,3.04,0,0.0,1,0.45
353462,135915358,79721,4,1699,0.01,18,0.9,0,0.0,1,0.89,4,1.6,0,0.0,12,3.32,0,0.0,1,0.45
353901,136494186,271765,64,1575,0.01,18,0.9,0,0.0,1,0.89,5,2.0,0,0.0,11,3.04,0,0.0,1,0.45
353283,135771974,368330,6,1503,0.01,18,0.9,0,0.0,1,0.89,4,1.6,0,0.0,12,3.32,0,0.0,1,0.45
353797,136398174,75924,26,1414,0.01,18,0.9,0,0.0,1,0.89,5,2.0,0,0.0,11,3.04,0,0.0,1,0.45
353570,136039834,309728,7,20,1.0,20,1.0,1,0.25,1,0.8,5,1.8,0,0.0,12,2.99,0,0.0,1,0.41
