In [1]:
import numpy as np
from IPython.display import HTML
from bokeh.plotting import output_notebook, show
import genomes_dnj.lct_interval.lct_plots as dm
output_notebook(hide_banner=True)

<h3>Lactase Persistence SNP Series</h3>
<div style="width:700px">
<p>
The original reason for focusing on the region of chromosome 2 analyzed in these notebooks
was the location of the gene LCT, which encodes lactase, the enzyme needed to digest lactose.  The selective advantage of
mutations that cause lactase production to persist into adulthood is a particularly well
known subject of human genetics.  Most studies and analyses have assumed that lactase
persistence is the result of a single SNP and have paid little attention to
SNP correlation and haplotypes.  The results presented in this notebook come
from an initial analysis done before the structure covered in the other notebooks had been identified.
Given both the structure identified in this notebook and the other structures
identified in the rest of the notebooks in this analysis, it seems likely that
the lactase persistence phenotype is associated with more than one SNP.  This initial notebook
was retained in the overview section of the full set of notebooks because it gives an overview
of the best-known genetic variations in the studied interval of chromosome 2.
<p>
The SNP rs4988235 has been identified as the source of lactase persistence in European populations.
The method used in this analysis to group SNPs places rs4988235 in a series with 10 other SNPs,
and that series is expressed by 765 of the 5,008 sample chromosomes in the 1000 Genomes phase 3 data.  All or
nearly all of the chromosomes that express this series, 11_765, also express seven other series: 6_1503, 4_1699,
4_911, 26_1414, 64_1575, 10_2206, and 7_1868.  In these IDs, the first number is the number of SNPs
in the series and the second is the number of chromosomes that express the series.  This plot
shows these series, with a black line drawn at the location of each SNP.  Green
indicates an overexpressed European population.  The aggregate series at the bottom of the
plot shows a line for each of the 132 SNPs that are strongly associated with the expression of rs4988235.
</div>
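The notebook's grouping code is not shown here, but the ID scheme and the idea of one series being expressed by the chromosomes of another can be sketched in Python. This is a minimal sketch under stated assumptions: each series is modelled as a SNP count plus the set of chromosome indices that express it, the function names are hypothetical, the data are toy values, and the `min_fraction` tolerance is an assumption (the series table counts 4_911 even though only 762 of the 765 chromosomes express it).

```python
from __future__ import annotations


def series_id(n_snps: int, chromosomes: set[int]) -> str:
    """Build an ID of the form '<number of SNPs>_<number of chromosomes>'."""
    return f"{n_snps}_{len(chromosomes)}"


def containing_series(target: tuple[int, set[int]],
                      candidates: list[tuple[int, set[int]]],
                      min_fraction: float = 0.99) -> list[str]:
    """Return the IDs of candidate series expressed by (nearly) every
    chromosome that expresses the target series.

    min_fraction is a hypothetical tolerance, not taken from the notebook.
    """
    _, target_chroms = target
    found = []
    for n_snps, chroms in candidates:
        # Chromosomes that express both the target and the candidate series.
        matches = len(target_chroms & chroms)
        if matches / len(target_chroms) >= min_fraction:
            found.append(series_id(n_snps, chroms))
    return found


# Toy data: chromosome indices are made up for illustration.
target = (3, set(range(10)))
candidates = [
    (5, set(range(25))),             # contains every target chromosome
    (2, set(range(5, 30))),          # contains only half of them
    (4, set(range(10)) | {40, 41}),  # contains every target chromosome
]
print(series_id(*target))                     # 3_10
print(containing_series(target, candidates))  # ['5_25', '4_12']
```

Using sets makes the "also expresses" test a plain intersection, which also yields the match count shown in the series table.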

In [2]:
plt0_obj = dm.lct_agg()
plt0 = plt0_obj.do_plot()
show(plt0)

<div style="width:700px">
<p>
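This table shows the data for the individual series.  Most were expressed by all
765 chromosomes; 7_1868 was expressed by 764 of them and 4_911 by 762.  None of those chromosomes came from East Asian samples, and only 2
came from true African samples.  However, 32 came from the African-ancestry populations of the American Southwest and the
Caribbean that the 1000 Genomes data counts as part of the African population.
The column labeled "sax" represents the part of the South Asian population that was sampled
in the United States or the United Kingdom.  In each population column pair, the first value is a chromosome count and the
second (suffix ".1") is the ratio of that count to the count expected if the same number of chromosomes were spread across
populations in proportion to their share of the 5,008 sample chromosomes.  The "matches" column gives how many of the 765
lactase persistence chromosomes also express the series; "alleles.1" is matches divided by alleles, and "matches.1" is matches divided by 765.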
</div>

In [3]:
plt1_obj = dm.superset_11_765()
plt1 = plt1_obj.do_plot()
HTML(plt1_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.35,765,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
354170,136682274,93624,7,1868,0.41,764,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,44,0.47
353462,135915358,79721,4,1699,0.45,765,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
353901,136494186,271765,64,1575,0.49,765,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
353283,135771974,368330,6,1503,0.51,765,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
353797,136398174,75924,26,1414,0.54,765,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
353604,136092061,315418,4,911,0.84,762,1.0,2,0.01,31,0.65,146,1.38,0,0.0,484,3.16,54,0.97,45,0.48
353380,135837906,870076,11,765,1.0,765,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
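The ".1" columns next to each population count appear to be observed-to-expected ratios: the count divided by the number of chromosomes expected if the series were spread over the populations in proportion to their share of the 5,008 phase 3 chromosomes. That reading is inferred rather than documented in the notebook, but the sketch below reproduces the 11_765 row. The group sizes use the 1000 Genomes phase 3 sample counts (two chromosomes per individual); assigning ACB and ASW to afx and GIH, ITU and STU to sax is inferred from the country table, and the function name is hypothetical.

```python
# Chromosomes per group: 2 x the phase 3 individuals in each population.
# The afx / sax split is an assumption inferred from the country table.
GROUP_CHROMOSOMES = {
    "afr": 2 * (108 + 99 + 113 + 85 + 99),   # YRI, LWK, GWD, MSL, ESN
    "afx": 2 * (96 + 61),                    # ACB, ASW
    "amr": 2 * (64 + 104 + 94 + 85),         # MXL, PUR, CLM, PEL
    "eas": 2 * (103 + 104 + 105 + 93 + 99),  # CHB, JPT, CHS, CDX, KHV
    "eur": 2 * (99 + 107 + 99 + 91 + 107),   # CEU, TSI, FIN, GBR, IBS
    "sas": 2 * (96 + 86),                    # PJL, BEB
    "sax": 2 * (103 + 102 + 102),            # GIH, ITU, STU
}
TOTAL_CHROMOSOMES = sum(GROUP_CHROMOSOMES.values())  # 5008


def over_expression(count: int, series_size: int, group: str) -> float:
    """Observed count divided by the count expected from the group's share."""
    expected = series_size * GROUP_CHROMOSOMES[group] / TOTAL_CHROMOSOMES
    return count / expected


# Population counts for the 765 chromosomes expressing series 11_765.
row_11_765 = {"afr": 2, "afx": 32, "amr": 146, "eas": 0,
              "eur": 484, "sas": 56, "sax": 45}
ratios = {g: round(over_expression(n, 765, g), 2) for g, n in row_11_765.items()}
print(ratios)
```

The same function reproduces other rows as well, for example the amr ratio of 1.62 for the 205 chromosomes in series 4_911.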


<h3>Series SNPs</h3>
<div style="width:700px">
<p>
For each series, a one-row table gives summary data for the series, followed by a table of data for its SNPs; for series with more than 10 SNPs, only the first 10 are listed.
<p>
In the SNP data tables, the column labeled "niv" stands for not_expressed_is_variant.  The SNP allele
with the lower frequency in the 1000 Genomes data is always treated as the variant,
even when the reference genome carries that allele and labels the more common allele as the variant.  An niv value
of 1 means that the allele treated as the variant in this analysis is the one the reference
genome considers standard.  Note that the word "allele" is used both for an instance
of a series and for the individual chromosomes that express an instance of a series;
the "alleles" columns count those chromosomes.
</div>
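The niv rule can be written as a small function. This is a hypothetical sketch, not the notebook's code: it assumes a biallelic SNP whose reference-allele and alternate-allele counts have already been tallied over the 5,008 chromosomes, and both the function name and the tie-breaking rule (the alternate allele wins a tie) are assumptions.

```python
from __future__ import annotations


def analysis_variant(ref_count: int, alt_count: int) -> tuple[str, int]:
    """Return (allele treated as the variant, niv flag).

    The less frequent allele is the variant.  niv is 1 when that allele is
    the reference-genome allele and 0 when it is the alternate allele.
    On a tie the alternate allele is kept as the variant (an assumption).
    """
    if ref_count < alt_count:
        return "ref", 1
    return "alt", 0


# rs4988235 (niv 0): assume its 808 "alleles" carry the alternate allele.
print(analysis_variant(5008 - 808, 808))  # ('alt', 0)
# rs7570971 (niv 1): assume its 785 "alleles" carry the reference allele.
print(analysis_variant(785, 5008 - 785))  # ('ref', 1)
```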

In [4]:
import genomes_dnj.lct_interval.lct_series_html as dh
HTML(dh.lct_html)

index,first,length,snps,alleles,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353380,135837906,870076,11,765,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48

index,pos,id,niv,alleles,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
875450,135837906,rs7570971,1,785,2,0.01,32,0.65,149,1.37,0,0.0,493,3.13,60,1.05,49,0.51
875770,135907088,rs6730157,1,791,2,0.01,32,0.65,151,1.38,0,0.0,493,3.1,61,1.06,52,0.54
875981,135954797,rs1375131,1,789,2,0.01,32,0.65,149,1.36,0,0.0,492,3.1,61,1.06,53,0.55
876953,136138627,rs3940549,1,785,2,0.01,32,0.65,150,1.38,0,0.0,492,3.12,61,1.07,48,0.5
877231,136176540,rs13384711,1,795,3,0.02,32,0.64,151,1.37,0,0.0,493,3.09,63,1.09,53,0.54
877937,136328890,rs56369224,1,793,2,0.01,32,0.64,151,1.37,2,0.01,492,3.09,62,1.08,52,0.53
878126,136381348,rs12465802,1,795,2,0.01,32,0.64,151,1.37,0,0.0,496,3.11,62,1.07,52,0.53
878351,136429366,rs62168795,1,807,2,0.01,34,0.67,154,1.38,0,0.0,506,3.12,60,1.02,51,0.52
879308,136608646,rs4988235,0,808,2,0.01,34,0.67,150,1.34,0,0.0,511,3.15,60,1.02,51,0.51
879345,136616754,rs182549,0,818,2,0.01,34,0.66,153,1.35,0,0.0,512,3.12,63,1.06,54,0.54

index,first,length,snps,alleles,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353604,136092061,315418,4,911,2,0.01,31,0.54,205,1.62,0,0.0,524,2.86,71,1.07,78,0.7

index,pos,id,niv,alleles,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
876701,136092061,rs1561277,1,962,2,0.01,32,0.53,207,1.55,0,0.0,534,2.76,85,1.22,102,0.86
877683,136273578,rs6735329,1,938,2,0.01,33,0.56,207,1.59,0,0.0,533,2.83,78,1.14,85,0.74
877903,136322676,rs6759321,1,968,2,0.01,32,0.53,207,1.54,1,0.01,534,2.75,89,1.26,103,0.87
878243,136407479,rs1446585,1,951,2,0.01,34,0.57,209,1.59,0,0.0,542,2.84,79,1.14,85,0.73

index,first,length,snps,alleles,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353283,135771974,368330,6,1503,14,0.05,37,0.39,307,1.47,306,1.01,608,2.01,109,1.0,122,0.66

index,pos,id,niv,alleles,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
875075,135771974,rs10187402,1,1528,14,0.05,37,0.39,308,1.45,311,1.01,615,2.0,113,1.02,130,0.69
875541,135859371,rs13413101,1,1528,9,0.03,38,0.4,313,1.48,313,1.02,613,2.0,113,1.02,129,0.69
875644,135877562,rs1869829,1,1544,38,0.12,43,0.44,314,1.47,305,0.98,608,1.96,111,0.99,125,0.66
876349,136022798,rs935613,1,1560,41,0.13,43,0.44,315,1.46,312,0.99,609,1.94,113,1.0,127,0.66
876551,136058820,rs2016636,1,1559,39,0.12,43,0.44,316,1.46,312,0.99,611,1.95,111,0.98,127,0.66
876971,136140304,rs6430571,1,1564,40,0.13,44,0.45,317,1.46,316,1.0,610,1.94,111,0.98,126,0.66

index,first,length,snps,alleles,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353462,135915358,79721,4,1699,151,0.44,78,0.73,320,1.36,309,0.9,609,1.78,110,0.89,122,0.59

index,pos,id,niv,alleles,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
875801,135915358,rs6747073,1,1724,160,0.46,82,0.76,324,1.36,313,0.9,609,1.76,110,0.88,126,0.6
875978,135954405,rs1375132,1,1727,159,0.46,79,0.73,327,1.37,315,0.91,610,1.76,111,0.88,126,0.6
876197,135995073,rs12471508,1,1742,164,0.47,81,0.74,328,1.36,316,0.9,610,1.74,113,0.89,130,0.61
876198,135995079,rs7609517,1,1840,187,0.5,89,0.77,345,1.35,325,0.88,633,1.71,122,0.91,139,0.62

index,first,length,snps,alleles,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353797,136398174,75924,26,1414,93,0.33,60,0.68,288,1.47,128,0.45,608,2.14,111,1.08,126,0.73

index,pos,id,niv,alleles,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
878194,136398174,rs12619365,1,1402,78,0.28,56,0.64,289,1.49,130,0.46,608,2.16,112,1.1,129,0.75
878208,136402117,rs4954276,1,1399,78,0.28,56,0.64,289,1.49,129,0.46,607,2.16,112,1.1,128,0.75
878220,136403749,rs13413639,1,1424,95,0.33,61,0.68,289,1.46,128,0.45,609,2.13,112,1.08,130,0.74
878251,136409073,rs4954279,1,1423,95,0.33,61,0.68,289,1.47,130,0.45,607,2.12,112,1.08,129,0.74
878256,136410299,rs10928542,1,1423,95,0.33,61,0.68,289,1.47,129,0.45,608,2.13,112,1.08,129,0.74
878278,136413359,rs7608045,1,1421,95,0.33,60,0.67,289,1.47,128,0.45,608,2.13,112,1.08,129,0.74
878294,136416855,rs6715856,1,1421,95,0.33,60,0.67,289,1.47,129,0.45,608,2.13,112,1.08,128,0.73
878295,136416941,rs6755383,1,1421,95,0.33,60,0.67,289,1.47,129,0.45,608,2.13,112,1.08,128,0.73
878304,136418348,rs6720287,1,1420,95,0.33,60,0.67,289,1.47,128,0.45,608,2.13,112,1.09,128,0.74
878308,136419961,rs1446584,1,1421,96,0.34,60,0.67,289,1.47,128,0.45,608,2.13,112,1.08,128,0.73

index,first,length,snps,alleles,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,274,0.62,109,0.79,332,1.09,443,1.0,657,1.48,162,1.01,229,0.85

index,pos,id,niv,alleles,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
878705,136501840,rs10928545,0,2208,275,0.62,109,0.79,332,1.09,443,1.0,659,1.49,162,1.01,228,0.84
878719,136505546,rs3820794,0,2406,274,0.57,109,0.72,352,1.06,612,1.26,657,1.36,168,0.96,234,0.79
878786,136516748,rs12616520,0,2206,274,0.62,109,0.79,332,1.09,443,1.0,657,1.48,162,1.01,229,0.85
878813,136522710,rs9287442,0,2207,274,0.62,109,0.79,332,1.09,443,1.0,658,1.48,162,1.01,229,0.85
878846,136528004,rs2304599,0,2415,280,0.58,111,0.73,352,1.05,612,1.26,659,1.36,168,0.96,233,0.79
878905,136539513,rs10188066,0,2406,275,0.57,109,0.72,353,1.06,611,1.26,657,1.36,168,0.96,233,0.79
878929,136544752,rs6760329,0,2407,275,0.57,109,0.72,351,1.05,612,1.26,659,1.36,168,0.96,233,0.79
878937,136546110,rs2278544,0,2211,275,0.62,110,0.79,332,1.08,443,1.0,660,1.49,162,1.01,229,0.84
878979,136553529,rs1030764,0,2414,275,0.57,110,0.73,355,1.06,614,1.26,660,1.36,168,0.96,232,0.78
878998,136555659,rs2322659,0,2217,284,0.64,114,0.82,324,1.05,466,1.04,642,1.44,161,1.0,226,0.83

index,first,length,snps,alleles,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353901,136494186,271765,64,1575,55,0.17,49,0.5,257,1.18,354,1.12,584,1.85,114,1.0,162,0.84

index,pos,id,niv,alleles,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
878662,136494186,rs10202489,0,1803,95,0.26,61,0.54,310,1.24,386,1.06,618,1.71,139,1.06,194,0.88
878669,136495300,rs12469709,0,1848,127,0.34,67,0.58,315,1.23,394,1.06,617,1.66,138,1.03,190,0.84
878670,136495619,rs10195620,0,1758,62,0.18,50,0.45,307,1.26,394,1.11,617,1.75,138,1.08,190,0.88
878692,136499166,rs1438307,0,1806,97,0.27,61,0.54,311,1.24,393,1.08,616,1.7,138,1.05,190,0.86
878710,136502792,rs1438305,0,1808,97,0.27,61,0.54,311,1.24,394,1.08,617,1.7,138,1.05,190,0.86
878711,136503157,rs1438304,0,1809,98,0.27,61,0.54,311,1.24,394,1.08,617,1.7,138,1.05,190,0.86
878724,136507039,rs12998387,0,1809,98,0.27,61,0.54,311,1.24,394,1.08,616,1.7,138,1.05,191,0.86
878753,136511575,rs3213889,0,1809,97,0.27,61,0.54,311,1.24,394,1.08,617,1.7,138,1.05,191,0.86
878793,136518103,rs12994270,0,1815,97,0.27,61,0.54,311,1.24,400,1.09,616,1.69,138,1.05,192,0.86
878803,136521514,rs10192827,0,1734,43,0.12,45,0.41,307,1.28,394,1.13,616,1.77,138,1.09,191,0.9

index,first,length,snps,alleles,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
354170,136682274,93624,7,1868,238,0.63,113,0.96,266,1.03,375,1.0,589,1.57,119,0.88,168,0.73

index,pos,id,niv,alleles,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
879698,136682274,rs192822,1,1870,238,0.63,113,0.96,266,1.03,376,1.0,589,1.57,120,0.88,168,0.73
879743,136691825,rs309164,1,2026,274,0.67,119,0.94,282,1.0,429,1.05,620,1.52,128,0.87,174,0.7
879775,136698098,rs309149,1,2036,274,0.67,119,0.93,282,1.0,438,1.07,620,1.52,128,0.86,175,0.7
879914,136721603,rs12615624,1,2027,274,0.67,119,0.94,282,1.0,430,1.05,620,1.52,127,0.86,175,0.7
879917,136721995,rs13404551,1,1868,238,0.63,113,0.96,267,1.03,375,1.0,588,1.57,118,0.87,169,0.74
880075,136755684,rs309134,1,2043,281,0.68,120,0.94,282,1.0,438,1.07,620,1.51,127,0.86,175,0.7
880178,136775898,rs2322818,1,1846,231,0.62,113,0.98,263,1.03,359,0.97,586,1.58,121,0.9,173,0.76


<h3>Lactase Persistence Chromosome Countries</h3>
<div style="width:700px">
<p>
This table shows the country populations of the 765 lactase persistence chromosomes; "cnt" is the number of
chromosomes and "otp" is the ratio of that count to the count expected from the population's share of the sample.  One interesting
result is the high number of instances in the Colombians from Medellin (CLM) population: 57 chromosomes, nearly twice the
expected count.  Both of the true African instances come from the Gambian in Western Divisions (GWD) population and seem almost
certain to be the result of some kind of back migration.  The African Caribbeans in Barbados (ACB) and the Americans of
African Ancestry in SW USA (ASW) populations, which the 1000 Genomes data considers part of the African region, both
include significant numbers of lactase persistence chromosomes (12 and 20).  That observation was the reason for creating
a separate African External region (afx) for those two populations.
</div>

In [5]:
cntx = plt1_obj.plot_context
HTML(cntx.get_country_html())


pop,cnt,otp
ACB,12,0.41
ASW,20,1.07
BEB,7,0.27
CDX,0,0.0
CEU,138,4.56
CHB,0,0.0
CHS,0,0.0
CLM,57,1.98
ESN,0,0.0
FIN,115,3.8
GBR,122,4.39
GIH,27,0.86
GWD,2,0.06
IBS,91,2.78
ITU,12,0.39
JPT,0,0.0
KHV,0,0.0
LWK,0,0.0
MSL,0,0.0
MXL,30,1.53
PEL,18,0.69
PJL,49,1.67
PUR,41,1.29
STU,6,0.19
TSI,18,0.55
YRI,0,0.0
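The regional columns used in the series tables can be recovered from these population counts. The sketch below assumes the standard 1000 Genomes super-population assignments, with ACB and ASW moved to afx and the South Asian populations sampled in the US or UK (GIH, ITU, STU) moved to sax. That grouping is inferred from the text and from the matching totals, and the names `REGION_OF` and `region_counts` are hypothetical. With the counts from this table it reproduces the 11_765 row: 2, 32, 146, 0, 484, 56 and 45.

```python
# Assumed mapping from 1000 Genomes population code to the notebook's regions.
REGION_OF = {
    "YRI": "afr", "LWK": "afr", "GWD": "afr", "MSL": "afr", "ESN": "afr",
    "ACB": "afx", "ASW": "afx",
    "MXL": "amr", "PUR": "amr", "CLM": "amr", "PEL": "amr",
    "CHB": "eas", "JPT": "eas", "CHS": "eas", "CDX": "eas", "KHV": "eas",
    "CEU": "eur", "TSI": "eur", "FIN": "eur", "GBR": "eur", "IBS": "eur",
    "PJL": "sas", "BEB": "sas",
    "GIH": "sax", "ITU": "sax", "STU": "sax",
}


def region_counts(pop_counts: dict) -> dict:
    """Sum per-population chromosome counts into the notebook's regions."""
    totals = {}
    for pop, count in pop_counts.items():
        region = REGION_OF[pop]
        totals[region] = totals.get(region, 0) + count
    return totals


# "cnt" column of the country table for the 765 lactase persistence chromosomes.
country_counts = {
    "ACB": 12, "ASW": 20, "BEB": 7, "CDX": 0, "CEU": 138, "CHB": 0,
    "CHS": 0, "CLM": 57, "ESN": 0, "FIN": 115, "GBR": 122, "GIH": 27,
    "GWD": 2, "IBS": 91, "ITU": 12, "JPT": 0, "KHV": 0, "LWK": 0,
    "MSL": 0, "MXL": 30, "PEL": 18, "PJL": 49, "PUR": 41, "STU": 6,
    "TSI": 18, "YRI": 0,
}
print(region_counts(country_counts))
```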
