In [1]:
from IPython.display import HTML
import genomes_dnj.lct_interval.plot_colors as clr
from bokeh.plotting import output_notebook, show
import genomes_dnj.lct_interval.series_plots as dm
output_notebook(hide_banner=True)

<h3>Methods</h3>
<div style="width:700px">
<p>
Full details of all of the methods are available in the source code.  This notebook describes
them and provides some examples to illustrate their use.
</div>

<h3>Identification of Series</h3>
<div style="width:700px">
<p>
SNPs were grouped into series through an algorithmic process.  That process started with an SNP
located somewhere on chromosome 2 and a test boolean mask of the 1000 genomes chromosome 2
samples that express it.  SNPs farther along chromosome 2 were scanned to identify more SNPs
that met two criteria.  For each candidate SNP, a boolean mask was constructed for the chromosome
samples that expressed it, and that mask was compared with the test mask.  The first criterion was
that at least 90% of the samples in each mask also appeared in the other mask.  The second was that 90%
of the chromosome samples that matched any SNP in the series matched at least 90% of all the SNPs
in the series.  This process was carried out recursively by making the last identified SNP in a
series the start of another attempt to find more SNPs in the series.  The process was continued
until a 1,000,000 base interval was scanned without finding any additional SNPs in the series.
</div>
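
<div style="width:700px">
<p>
The first match criterion can be sketched as follows.  This is only an illustration, not the
code from the source repository: the function name masks_match is hypothetical, and Python sets
of sample indices stand in for the boolean masks.  The second, series-wide criterion is not shown.
</div>

```python
def masks_match(test_samples, candidate_samples, min_match=0.9):
    """Return True when the two sample sets overlap by at least
    min_match of each set (hypothetical helper, not the source API)."""
    if not test_samples or not candidate_samples:
        return False
    shared = len(test_samples & candidate_samples)
    # The overlap must cover min_match of BOTH sets, not just one of them.
    return (shared >= min_match * len(test_samples)
            and shared >= min_match * len(candidate_samples))


test = set(range(0, 100))   # samples that express the starting SNP
near = set(range(5, 103))   # 98 samples, 95 of them shared with test
far = set(range(50, 150))   # 100 samples, only 50 shared with test

near_matches = masks_match(test, near)
far_matches = masks_match(test, far)
```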

<h3>Yes No Comparisons</h3>
<div style="width:700px">
<p>
Plots were constructed by selecting chromosome samples that satisfied some filter criteria.
The criteria are based on a yes list of series and a no list of series.  To be selected
a chromosome sample must express all of the series in the yes list and none of the series in the no list.
</div>
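
<div style="width:700px">
<p>
A yes no comparison amounts to set intersection followed by set difference.  The sketch below
assumes a hypothetical mapping from series id to the set of sample indices that express it;
the names select_samples and series_samples are illustrative and are not part of the notebook's modules.
</div>

```python
def select_samples(series_samples, yes, no, all_samples):
    """Keep samples that express every series in yes and no series in no."""
    selected = set(all_samples)
    for series_id in yes:
        selected &= series_samples[series_id]   # must express this series
    for series_id in no:
        selected -= series_samples[series_id]   # must not express this series
    return selected


# Toy data: three made-up series over six samples.
series_samples = {"A": {0, 1, 2, 3}, "B": {2, 3, 4}, "C": {3, 5}}
chosen = select_samples(series_samples, yes=["A", "B"], no=["C"],
                        all_samples=range(6))
```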

<h3>Series Representation</h3>
<div style="width:700px">
<p>
Series are identified by the number of SNPs in the series and the number of 1000 genomes chromosome
samples that express it.  The series 11_765 associated with lactase persistence includes 11 SNPs
and is expressed by 765 of the 5008 1000 genomes samples of chromosome 2.
</div>
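
<div style="width:700px">
<p>
As a small illustration of the naming convention, a series id can be split back into its
two counts.  The helper parse_series_id is hypothetical; only the id format and the total of
5008 samples come from the text above.
</div>

```python
TOTAL_SAMPLES = 5008  # chromosome 2 samples in the 1000 genomes phase 3 data


def parse_series_id(series_id):
    """Split an id such as '11_765' into (SNP count, sample count)."""
    snps, samples = series_id.split("_")
    return int(snps), int(samples)


snps, samples = parse_series_id("11_765")
fraction = samples / TOTAL_SAMPLES  # share of all samples expressing the series
```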

In [2]:
HTML(clr.pop_colors)

0,1
,African
,American
,East Asian
,European
,South Asian


<div style="width:700px">
<p>
Series are represented in a plot by a colored rectangle.  The hue of the rectangle represents
the most overexpressed population for the chromosome samples that express the series.
More saturated colors represent more overexpressed populations.  The table above shows the fully
saturated colors used to represent overexpression of series by different 1000 genomes populations.
The plot x axis maps to locations on chromosome 2.  The height of the rectangle represents the log
of the number of 1000 genome chromosome 2 samples that express the series.  Each SNP in the series
is represented with a vertical line.  Height and color are properties of the series regardless
of how many chromosome samples express the series in a particular plot.
</div>
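
<div style="width:700px">
<p>
The choice of hue and height can be illustrated with the relative expression values reported
for 11_765 in the plot data tables of this notebook.  The function names are hypothetical, and
the base of the logarithm is an assumption: the text says only that the height is the log of
the sample count.  The mapping from overexpression to saturation is not sketched here.
</div>

```python
import math


def most_overexpressed(relative_expression):
    """Return the population label with the largest relative expression."""
    return max(relative_expression, key=relative_expression.get)


def rectangle_height(sample_count):
    # Assumption: base-10 logarithm; the notebook does not state the base.
    return math.log10(sample_count)


# Relative expression values for 11_765 as listed in the plot data tables.
row_11_765 = {"afr": 0.01, "afx": 0.67, "amr": 1.38, "eas": 0.0,
              "eur": 3.15, "sas": 1.01, "sax": 0.48}
hue_population = most_overexpressed(row_11_765)
height = rectangle_height(765)
```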

<h3>Superset Plots</h3>
<div style="width:700px">
<p>
Most plots in these notebooks are superset plots.  The intent of a superset plot is to find
all of the series in the studied region expressed by some set of chromosome samples including
series that are also expressed by additional samples.  A yes no comparison is used to select
input chromosome samples.  The output series are identified by trying to match the input samples
with the samples that express each of the series in the studied region.  Those series that
meet the match criterion are added to the result set.  When the intent was to limit
the series to those expressed by all of the selected chromosome samples, the match
criterion was generally 90%.  When the intent was to identify all of the series most commonly
expressed by the selected chromosome samples, the match criterion was generally 50%.  The use
of a 50% match prevents the selection of overlapping series that are expressed by different
subsets of the selected chromosome samples.  When the intent was to display an hierarchy of
series that are all expressed by all the chosen set of chromosome samples, the match criterion
was generally 0.1%.  The data associated with each plot shows the number of input samples
that actually express each of the result series.
</div>
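
<div style="width:700px">
<p>
The superset match can be sketched as a filter on the fraction of input samples that express
each series.  This is an illustration under assumed data structures (sets of sample indices
keyed by series id), not the implementation behind dm.superset_yes_no.
</div>

```python
def superset_series(input_samples, series_samples, min_match=0.9):
    """Return {series id: match count} for series expressed by at least
    min_match of the input samples (hypothetical helper)."""
    results = {}
    if not input_samples:
        return results
    for series_id, expressing in series_samples.items():
        matches = len(input_samples & expressing)
        if matches >= min_match * len(input_samples):
            results[series_id] = matches
    return results


selected = set(range(10))                 # samples chosen by a yes no comparison
series_samples = {
    "big": set(range(40)),                # expressed by all 10 selected samples
    "half": set(range(5)) | {100},        # expressed by 5 of them
    "rare": {0},                          # expressed by 1 of them
}
strict = superset_series(selected, series_samples, min_match=0.9)
common = superset_series(selected, series_samples, min_match=0.5)
loose = superset_series(selected, series_samples, min_match=0.001)
```

<div style="width:700px">
<p>
In this toy data the 90% criterion keeps only the series shared by nearly every selected sample,
while the 0.1% criterion keeps any series expressed by at least one of them.
</div>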

In [3]:
lct_plt_obj = dm.superset_yes_no([dm.di_11_765], min_match=0.9)
plt = lct_plt_obj.do_plot()
show(plt)

<div style="width:700px">
<p>
This plot is an example of a superset plot.  The only criterion for selecting chromosome samples is
the expression of the series 11_765.  The series in the plot are ones that are expressed by
at least 90% of the 1000 genomes phase 3 chromosome 2 samples that express the series 11_765.
</div>

In [4]:
HTML(lct_plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.35,765,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
354170,136682274,93624,7,1868,0.41,764,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,44,0.47
353462,135915358,79721,4,1699,0.45,765,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
353901,136494186,271765,64,1575,0.49,765,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
353283,135771974,368330,6,1503,0.51,765,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
353797,136398174,75924,26,1414,0.54,765,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
353604,136092061,315418,4,911,0.84,762,1.0,2,0.01,31,0.65,146,1.38,0,0.0,484,3.16,54,0.97,45,0.48
353380,135837906,870076,11,765,1.0,765,1.0,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48


<div style="width:700px">
<p>
This table shows the plot data for the 11_765 superset plot.  Each matched series has a row
in the table.  Series are ordered by the number of 1000 genomes chromosome samples that express
the series.  Matches give the number of those samples selected for the plot that express the
particular series.
<p>
For example, the top line indicates that the 10 SNP series expressed by 2206 chromosome 2 samples
is expressed by all 765 of the chromosome samples that satisfied this plot's yes no criteria.
Those samples were 35% of the total 2206 1000 genomes chromosome 2 samples that express the series
10_2206.  Those samples were 100% of the 765 chromosome samples that satisfied this plot's yes no
criteria.
<p>
Number of matches and relative population expression are broken out by population.  Both the
African and South Asian populations are divided between those that came from the native area
and those who live externally.  The data in this table was the reason for that choice.  The 32
instances of 11_765 found in American Southwest and Caribbean African populations are very unlikely
to have come from Africa and significantly over represent the frequency of lactase persistence
in African populations.
<p>
For calculations of population overexpression chromosome samples that failed the no test
were excluded from the population under consideration.  The relative expression calculation
is based on a ratio of the samples that matched to the population size minus the samples that
failed the no test.
</div>
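
<div style="width:700px">
<p>
A worked example may help with the relative expression columns.  The sketch below assumes that
relative expression is a population's share of the matched samples divided by its share of all
eligible samples, and that the European population contributes 1006 chromosome samples
(503 individuals).  Neither assumption is stated in this notebook, but together they reproduce
the 3.15 reported for 11_765 in the table above.  The function name is hypothetical.
</div>

```python
def relative_expression(pop_matches, total_matches, pop_size, total_size,
                        pop_excluded=0, total_excluded=0):
    """Share of matches divided by share of eligible samples.

    pop_excluded and total_excluded are the samples that failed the no test;
    they are removed from the population sizes before the ratio is taken.
    """
    share_of_matches = pop_matches / total_matches
    share_of_samples = (pop_size - pop_excluded) / (total_size - total_excluded)
    return share_of_matches / share_of_samples


# 11_765 row: 484 of the 765 matching samples are European.
# Assumption: 1006 European chromosome samples out of 5008 in total.
eur = relative_expression(484, 765, 1006, 5008)

# The alleles.1 column of the top row: 765 matches out of 2206 samples.
share_of_10_2206 = 765 / 2206
```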

<h3>Subset Plots</h3>
<div style="width:700px">
<p>
A subset plot starts with a test sample set and identifies series that are expressed by a
subset of that sample set.  The samples that express each of the series in the result set
must be a subset of the samples in the test set up to some match criterion.
Generally, a 90% match criterion was used for these plots.
</div>
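
<div style="width:700px">
<p>
The subset match reverses the direction of the superset test: the fraction is taken over the
samples that express each series rather than over the test set.  The following sketch uses
hypothetical names and toy data and is not the code behind dm.subset_yes_no.
</div>

```python
def subset_series(test_samples, series_samples, min_match=0.9):
    """Return {series id: match count} for series whose expressing samples
    fall inside test_samples at a rate of at least min_match."""
    results = {}
    for series_id, expressing in series_samples.items():
        if not expressing:
            continue
        matches = len(test_samples & expressing)
        if matches >= min_match * len(expressing):
            results[series_id] = matches
    return results


test_set = set(range(100))
series_samples = {
    "inside": set(range(20)),          # all 20 samples are in the test set
    "mostly": set(range(90, 102)),     # 10 of 12 samples are in the test set
    "outside": set(range(200, 230)),   # no samples are in the test set
}
strict = subset_series(test_set, series_samples, min_match=0.9)
relaxed = subset_series(test_set, series_samples, min_match=0.8)
```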

In [5]:
plt_obj = dm.subset_yes_no([dm.di_193_843], min_match=0.8)
plt = plt_obj.do_plot()
show(plt)

<div style="width:700px">
<p>
The plot above shows series that are expressed by chromosome 2 samples that are subsets of
the chromosome samples that express the series 193_843.  Those series are all part of the
hierarchy that has been named the SAS tree.  The match criterion has been relaxed to 0.8
because some of the chromosome samples that express the series 8_267 have experienced a
recombination event that caused the loss of enough 193_843 SNPs for those samples not to match
193_843.
</div>

In [6]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353504,135964764,136368,6,946,0.88,832,0.99,78,0.47,26,0.5,141,1.22,165,0.99,82,0.49,114,1.89,226,2.22
353248,135759095,628204,193,843,1.0,843,1.0,79,0.47,26,0.49,143,1.22,167,0.98,83,0.49,117,1.91,228,2.21
353384,135839805,84662,4,815,0.82,667,0.79,47,0.35,18,0.43,135,1.46,147,1.09,71,0.53,79,1.63,170,2.08
353521,135989333,275298,6,713,0.94,668,0.79,47,0.35,18,0.43,135,1.46,146,1.09,73,0.54,79,1.63,170,2.08
353511,135977595,306204,4,416,1.0,415,0.49,33,0.4,9,0.35,77,1.34,96,1.15,24,0.29,53,1.76,123,2.42
353358,135818487,471078,8,267,0.84,223,0.26,1,0.02,3,0.21,56,1.81,47,1.05,48,1.07,24,1.48,44,1.61
353320,135789764,571688,4,91,1.0,91,0.11,0,0.0,0,0.0,64,5.08,25,1.36,2,0.11,0,0.0,0,0.0
353285,135772482,525048,7,90,1.0,90,0.11,24,1.32,3,0.53,1,0.08,19,1.05,0,0.0,17,2.6,26,2.36
353396,135846793,454499,7,79,1.0,79,0.09,0,0.0,0,0.0,66,6.03,11,0.69,2,0.13,0,0.0,0,0.0
353292,135775651,610782,29,68,1.0,68,0.08,1,0.07,1,0.23,5,0.53,0,0.0,10,0.73,20,4.05,31,3.72


<div style="width:700px">
<p>
The data in the table above shows 19 series where 100% of the chromosome samples that
express the series also express 193_843.
<p>
Three other series, 4_815, 6_713, and 8_267,
are of particular interest.  All three are contained within the interval of chromosome 2
covered by 193_843.  Recombination events that lose association with 193_843
must be somewhere within the DNA covered by the 193_843 series.  More than 80% of the
chromosome samples that express any of the three series also express 193_843.  But a
significant number of those chromosome samples do not.
<p>
The reasons are two recombination events that are analyzed in the part of this notebook
focused on allele masks.  One involved a chromosome that expressed the series 8_267 and
6_713.  It resulted from a recombination event that lost the last 23 SNPs of 193_843.
The result of that recombination event became overexpressed through a selection 
process associated with the appearance of the series 9_39.
<p>
The other recombination event involved a chromosome that expressed 193_843 and the
other major series that form the root of the SAS tree with a chromosome that expressed
the series 5_684 that is a major component of the EAS tree hierarchy.  That recombination
event resulted in a number of overexpressed variations on the association of 4_815
and 5_684 that appear to have conferred some kind of selective advantage on the
recombinant chromosomes.
<p>
Another class of cases includes 15_59, 20_56, 5_28, and 5_25.  All of these series
cover parts of the studied region of chromosome 2 that are beyond the
end of 193_843.  As is commonly the case, hierarchies selected by these series have
experienced recombination events between the more stable lower part of the region
and the less stable upper part.
<p>
The two other cases are the series 10_25 and 8_17.  Both have experienced recombination
events within the region covered by 193_843.  But neither case has generated an
overexpressed result.
</div>

<h3>Basis Plots</h3>
<div style="width:700px">
<p>
A basis plot is used to match samples to the most specific series in an hierarchy
that the samples express.  Basis filtering can be used in combination with yes no comparisons
to explore recombination events for samples that express particular parts of an hierarchy.
</div>

In [7]:
plt_obj = dm.basis_yes_no([dm.di_6_1503, dm.di_26_1414])
plt = plt_obj.do_basis_plot()
show(plt)

<div style="width:700px">
<p>
This plot shows the basis series for all of the chromosome 2 samples in the phase 3 1000 genomes data
that express both the series 6_1503 and 26_1414.  These criteria identify the chromosome 2 samples
that express hierarchies that are part of the EUR tree.
<p>
Series are labeled in the top left corner by their id.  The number at the end of the series is the
number of chromosome 2 samples that do express it and do not express any other more specific series
in the lactase persistence region of chromosome 2.  For example the 135 chromosome 2 samples that express
the series 4_911 as a basis series do not express the series 11_765.  All but 3 of the chromosome samples
that express 11_765 also express 4_911.  But, since 911 chromosome samples express 4_911 and only 765 chromosome
samples express 11_765, 11_765 is the more specific series.
<p>
The color used to represent the basis series is the color used to represent the whole series.  But the height
of the basis series rectangles is a function of the log of the number of chromosome samples that actually
express it as a basis series.
</div>
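
<div style="width:700px">
<p>
Basis assignment can be sketched as picking, for each sample, the expressed series with the
fewest expressing samples overall.  That reading of "most specific" follows the 4_911 and
11_765 example above, but it is an assumption; the real implementation may also take the
hierarchy or the SNP locations into account.  All names below are hypothetical.
</div>

```python
from collections import Counter


def basis_counts(sample_series, series_size):
    """Count, per series, the samples for which it is the most specific
    (smallest) series they express."""
    counts = Counter()
    for expressed in sample_series.values():
        if expressed:
            basis = min(expressed, key=lambda series_id: series_size[series_id])
            counts[basis] += 1
    return counts


series_size = {"26_1414": 1414, "4_911": 911, "11_765": 765}
# Toy data: four samples and the series each one expresses.
sample_series = {
    0: {"26_1414"},
    1: {"26_1414", "4_911"},
    2: {"26_1414", "4_911", "11_765"},
    3: {"26_1414", "11_765"},
}
counts = basis_counts(sample_series, series_size)
```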

In [8]:
HTML(plt_obj.get_basis_html())

index,first,last,length,snps,alleles,basis,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353797,136398174,136474098,75924,26,1414,202,6,0.15,2,0.16,39,1.39,80,1.97,20,0.49,19,1.29,36,1.45
354129,136652953,136761175,108222,5,1296,3,0,0.0,0,0.0,0,0.0,3,4.97,0,0.0,0,0.0,0,0.0
354127,136652491,136732772,80281,6,1114,7,0,0.0,0,0.0,0,0.0,2,1.42,1,0.71,2,3.93,2,2.33
353984,136556805,136747085,190280,39,1014,9,0,0.0,0,0.0,0,0.0,7,3.86,1,0.55,1,1.53,0,0.0
353604,136092061,136407479,315418,4,911,135,0,0.0,0,0.0,56,2.99,0,0.0,35,1.29,16,1.63,28,1.69
353902,136494985,136773638,278653,81,857,2,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,2,8.16
353938,136514709,136543147,28438,6,820,5,0,0.0,0,0.0,1,1.44,0,0.0,0,0.0,2,5.5,2,3.26
353380,135837906,136707982,870076,11,765,765,2,0.01,32,0.67,146,1.38,0,0.0,484,3.15,56,1.01,45,0.48
354061,136603638,136632125,28487,16,511,1,0,0.0,0,0.0,1,7.22,0,0.0,0,0.0,0,0.0,0,0.0
354064,136605402,136624686,19284,8,328,3,0,0.0,0,0.0,1,2.41,0,0.0,2,3.32,0,0.0,0,0.0


<div style="width:700px">
<p>
The data accompanying the basis plots needs to be consulted to understand the population
distribution of the chromosome samples that express a particular series as a basis.  The data
in this table for the series 26_1414, 4_911, and 11_765 provide some insight into the evolution
of lactase persistence.
<p>
The 202 chromosome samples that express 26_1414 as a basis suggest
that the root series of the EUR tree already had some kind of selective advantage.  Those
samples are heavily under expressed in African populations, modestly under expressed in
European populations, modestly over expressed in American and South Asian populations,
and most heavily over expressed in East Asian populations.  The root series of the EUR
tree does seem to have been favored by the expansion out of Africa.  But, it appears
to have been distributed over all of the other 1000 genomes population regions.
<p>
The 135 chromosomes that express 4_911 as a basis do not include any African or East Asian
chromosomes.  They are modestly over expressed for European populations, more over expressed
for South Asian populations and most over expressed for American populations.  It seems unlikely
that the American samples got to America through the migration either of East Asian or European
populations.
</div>

<h3>Allele Mask Plots</h3>
<div style="width:700px">
<p>
Another kind of analysis is based on the use of an allele mask to keep track of the 1000 genomes
chromosome samples that express some characteristic.  An allele mask is an array with a boolean value
for each of the 5008 chromosome samples in the 1000 genomes phase 3 data.  The value is true for those
chromosome samples that satisfy some logical test.  This kind of analysis is particularly useful in
analyzing the expression of series fragments.
</div>

In [9]:
import genomes_dnj.lct_interval.anal_series as an
from genomes_dnj.lct_interval.series_masks import am_193_843, nam_193_843
an.sa_193_843.unique_snps_per_allele(am_193_843)

array([(174, 4), (175, 1), (181, 3), (185, 1), (186, 1), (188, 2),
       (191, 15), (192, 120), (193, 696)], 
      dtype=[('count', '<u2'), ('snps', '<u2')])

<div style="width:700px">
<p>
The an.sa_193_843 object is an instance of the series_anal_cls.  It is used for analysis
of the samples that express any subset of the 193 SNPs in the series 193_843.
The allele mask am_193_843 is a boolean mask that identifies the 1000 genomes
chromosome 2 samples that express the series 193_843.  The unique_snps_per_allele method
returns a list of the number of SNPs and the number of chromosome samples that express
that number of 193_843 SNPs.  This data shows that 696 of the 843 chromosome samples
that express 193_843 express all 193 of its SNPs.  Another 120 of the chromosomes express
192 of them.  Smaller numbers of chromosome samples do express smaller numbers of SNPs
that are fragments of the whole series.
</div>
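
<div style="width:700px">
<p>
The kind of tally that unique_snps_per_allele reports can be sketched with a small histogram.
This is not the method's implementation: the genotype layout (one row of 0 and 1 values per
sample, one column per SNP) and the function name are assumptions made for illustration.
</div>

```python
from collections import Counter


def snps_per_sample_histogram(genotypes, mask):
    """Return sorted (SNP count, number of samples) pairs for masked samples."""
    counts = Counter(sum(row) for row, keep in zip(genotypes, mask) if keep)
    return sorted(counts.items())


# Toy data: five samples, four SNPs in the series.
genotypes = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
mask = [True, True, True, False, True]   # the fourth sample is excluded
histogram = snps_per_sample_histogram(genotypes, mask)
```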

In [10]:
an.sa_193_843.unique_snps_per_allele(nam_193_843)

array([(0, 3458), (1, 215), (2, 174), (3, 2), (4, 3), (5, 52), (6, 8),
       (7, 3), (8, 2), (9, 1), (10, 1), (11, 1), (12, 39), (15, 1),
       (17, 1), (18, 3), (19, 1), (21, 1), (23, 3), (26, 1), (27, 1),
       (28, 12), (29, 94), (30, 1), (44, 2), (54, 1), (55, 27), (63, 2),
       (65, 2), (98, 1), (100, 1), (101, 1), (139, 1), (158, 1), (164, 3),
       (165, 3), (169, 1), (170, 41)], 
      dtype=[('count', '<u2'), ('snps', '<u2')])

<div style="width:700px">
<p>
The nam_193_843 boolean mask identifies chromosome 2 samples that do not
express 193_843.  The results are a list of numbers of 193_843 SNPs and
the number of chromosome 2 samples that express that number of 193_843 SNPs.
The results show that 3458 samples express none of the 193 SNPs.
<p>
The data does show that a significant number of samples express fragments containing only 1, 2,
or 3 of the 193 SNPs.  It also shows that 5, 6, 12, 28, 29, 55
and 170 SNP fragments are expressed by numbers of samples that indicate
substantial non random overexpression.  It also shows a background of
other fragments of the 193 SNPs that could be available for some kind
of future selection process.
</div>

In [11]:
aps_29, am_29 = an.sa_193_843.snps_from_aps_value(29, nam_193_843)
aps_29

array([93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93,
       93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  0,  0,  0,  0,  0,
        0,  1,  0,  0,  0,  0,  0,  0,  1,  0,  0,  1,  0,  1,  0,  0,  1,
        0,  0,  0,  0,  0,  0,  0,  0,  1,  1,  0,  0,  1,  1,  1,  0,  0,
        0,  0,  1,  0,  1,  0,  0,  0,  0,  1,  0,  1,  0,  0,  1,  1,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  1,  1,  0,  0,  0,  0,  1,
        1,  0,  0,  1,  0,  0,  1,  0,  1,  0,  1,  1,  0,  0,  1,  0,  0,
        0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0])

<div style="width:700px">
<p>
This data looks at the 29 SNP fragments that are expressed by 94 chromosome samples.
The method snps_from_aps_value takes a count of the number of SNPs and a
boolean mask of chromosome 2 samples.  It returns an array of the number of
samples that express each of the 193 SNPs and a boolean mask of
samples that express 29 SNP fragments of the 193 SNPs.  Only samples
identified in the input mask are eligible for the output mask.
<p>
In this case 93 of the chromosome samples express the first 29 SNPs.
That pattern reflects the result of some recombination event.  The other chromosome
sample expresses a very irregular collection of 29 different SNPs generated
by some kind of unknown process.
</div>
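
<div style="width:700px">
<p>
The behavior described for snps_from_aps_value can be sketched as below.  The function name,
the list-based genotype layout, and the toy data are assumptions; the sketch only mirrors the
inputs and outputs described in the text.
</div>

```python
def snps_from_count(genotypes, mask, snp_count):
    """Select masked samples that express exactly snp_count SNPs.

    Returns (per-SNP sample counts, output mask); samples excluded by the
    input mask are never selected.
    """
    out_mask = [keep and sum(row) == snp_count
                for row, keep in zip(genotypes, mask)]
    n_snps = len(genotypes[0]) if genotypes else 0
    per_snp = [0] * n_snps
    for row, keep in zip(genotypes, out_mask):
        if keep:
            for index, value in enumerate(row):
                per_snp[index] += value
    return per_snp, out_mask


# Toy data: five samples, four SNPs; the last sample is not in the input mask.
genotypes = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
]
mask = [True, True, True, True, False]
per_snp, out_mask = snps_from_count(genotypes, mask, snp_count=2)
```

<div style="width:700px">
<p>
As with the 29 SNP fragments above, most of the selected toy samples share one contiguous
pattern while one sample expresses a different pair of SNPs, which shows up as smaller
counts in the per SNP totals.
</div>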

In [12]:
plt_obj = dm.superset_allele_mask(am_29, min_match=0.5)
plt = plt_obj.do_plot()
am_4_815 = plt_obj.plot_context.yes_allele_mask
show(plt)

<div style="width:700px">
<p>
This plot shows the series expressed by at least 50% of the chromosome samples that express a 29
SNP fragment of the series 193_843 SNPs.  It identifies the result of recombination events that
have associated the series 4_815 from the South Asian tree with 5_684 and 32_1361 from the East Asian
tree and with 64_1575, 10_2206, and 7_1868 from the European tree.
</div>

In [13]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.03,60,0.64,0,0.0,0,0.0,0,0.0,59,4.89,0,0.0,1,0.23,0,0.0
354170,136682274,93624,7,1868,0.03,62,0.66,1,0.08,0,0.0,0,0.0,59,4.73,1,0.08,1,0.22,0,0.0
353901,136494186,271765,64,1575,0.04,60,0.64,0,0.0,0,0.0,0,0.0,59,4.89,0,0.0,1,0.23,0,0.0
353791,136393658,92684,32,1361,0.06,79,0.84,1,0.06,0,0.0,0,0.0,71,4.47,1,0.06,4,0.7,2,0.21
353384,135839805,84662,4,815,0.11,93,0.99,2,0.11,0,0.0,0,0.0,84,4.49,1,0.05,4,0.59,2,0.18
353498,135959272,416583,5,684,0.13,87,0.93,0,0.0,0,0.0,0,0.0,82,4.68,0,0.0,4,0.63,1,0.09


<div style="width:700px">
<p>
The data from the plot show that 93 out of the 94 chromosome samples that express 29 193_843
SNPs express the series 4_815 and that 87 of them express the series 5_684.
</div>

In [42]:
an.sa_193_843.alleles_per_snp(am_4_815)

array([93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93,
       93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0])

<div style="width:700px">
<p>
This data confirms that the 93 chromosome samples that express 4_815 are the
93 chromosome samples that express the first 29 SNPs of the series 193_843.
</div>

In [43]:
plt_obj = dm.superset_allele_mask(am_29, [dm.di_4_815, dm.di_5_684, dm.di_32_1361, dm.di_64_1575], min_match=0.001)
plt = plt_obj.do_plot()
show(plt)

<div style="width:700px">
<p>
This plot limits the chromosome samples to those that express all of the 6 series shown in the plot and
no others.
</div>

In [44]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.03,58,1.0,0,0.0,0,0.0,0,0.0,57,4.88,0,0.0,1,0.24,0,0.0
354170,136682274,93624,7,1868,0.03,58,1.0,0,0.0,0,0.0,0,0.0,57,4.88,0,0.0,1,0.24,0,0.0
353901,136494186,271765,64,1575,0.04,58,1.0,0,0.0,0,0.0,0,0.0,57,4.88,0,0.0,1,0.24,0,0.0
353791,136393658,92684,32,1361,0.04,58,1.0,0,0.0,0,0.0,0,0.0,57,4.88,0,0.0,1,0.24,0,0.0
353384,135839805,84662,4,815,0.07,58,1.0,0,0.0,0,0.0,0,0.0,57,4.88,0,0.0,1,0.24,0,0.0
353498,135959272,416583,5,684,0.08,58,1.0,0,0.0,0,0.0,0,0.0,57,4.88,0,0.0,1,0.24,0,0.0


<div style="width:700px">
<p>
The data shows that 58 chromosomes satisfy the plot's criteria.  All but 1 of them are East Asian.
<p>
The next 6 plots show the series expressed by the 6 chromosome samples that express the first 29
193_843 SNPs, series 4_815, but not series 5_684.  Every one of those chromosome samples expresses
a different association of series.
</div>

In [45]:
plt_obj = dm.superset_allele_mask(am_29, [dm.di_4_815, dm.di_32_1361, dm.di_81_857, dm.di_5_47], [dm.di_5_684], min_match=0.5)
plt = plt_obj.do_plot()
show(plt)

In [46]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
354033,136588031,5647,7,1760,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,1,8.16
354130,136653925,107928,24,1504,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,1,8.16
353925,136506375,32564,4,1442,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,1,8.16
353791,136393658,92684,32,1361,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,1,8.16
353958,136535876,19014,7,1303,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,1,8.16
354129,136652953,108222,5,1296,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,1,8.16
353919,136500475,42085,13,1227,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,1,8.16
354127,136652491,80281,6,1114,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,1,8.16
353902,136494985,278653,81,857,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,1,8.16
353384,135839805,84662,4,815,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,1,8.16


In [47]:
plt_obj = dm.superset_allele_mask(am_29, [dm.di_4_815, dm.di_32_1361, dm.di_81_857], [dm.di_5_47, dm.di_5_684], min_match=0.5)
plt = plt_obj.do_plot()
show(plt)

In [20]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
354033,136588031,5647,7,1760,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
354130,136653925,107928,24,1504,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353925,136506375,32564,4,1442,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353791,136393658,92684,32,1361,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353958,136535876,19014,7,1303,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
354129,136652953,108222,5,1296,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353919,136500475,42085,13,1227,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
354127,136652491,80281,6,1114,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353902,136494985,278653,81,857,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353384,135839805,84662,4,815,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0


In [21]:
plt_obj = dm.superset_allele_mask(am_29, [dm.di_4_815, dm.di_32_1361], [dm.di_81_857, dm.di_5_684], min_match=0.5)
plt = plt_obj.do_plot()
show(plt)

In [22]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
354170,136682274,93624,7,1868,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353791,136393658,92684,32,1361,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353384,135839805,84662,4,815,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0


In [23]:
plt_obj = dm.superset_allele_mask(am_29, [dm.di_4_815, dm.di_5_1460, dm.di_117_1685], 
                                  [dm.di_32_1361, dm.di_5_684], min_match=0.5)
plt = plt_obj.do_plot()
show(plt)

In [24]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
354170,136682274,93624,7,1868,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
354033,136588031,5647,7,1760,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353244,135758231,618284,117,1685,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353478,135933921,434642,123,1561,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353814,136406646,31432,5,1460,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353790,136393157,48253,10,1218,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353906,136496493,57432,9,1170,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353935,136511874,21321,5,976,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353729,136309239,52321,9,887,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353384,135839805,84662,4,815,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0


In [25]:
plt_obj = dm.superset_allele_mask(am_29, [dm.di_4_815, dm.di_5_1460], 
                                  [dm.di_117_1685, dm.di_32_1361, dm.di_5_684], min_match=0.5)
plt = plt_obj.do_plot()
show(plt)

In [26]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353814,136406646,31432,5,1460,0.0,1,1.0,0,0.0,0,0.0,0,0.0,1,4.97,0,0.0,0,0.0,0,0.0
353906,136496493,57432,9,1170,0.0,1,1.0,0,0.0,0,0.0,0,0.0,1,4.97,0,0.0,0,0.0,0,0.0
353849,136447707,42559,4,1149,0.0,1,1.0,0,0.0,0,0.0,0,0.0,1,4.97,0,0.0,0,0.0,0,0.0
353907,136496805,55824,9,1023,0.0,1,1.0,0,0.0,0,0.0,0,0.0,1,4.97,0,0.0,0,0.0,0,0.0
353984,136556805,190280,39,1014,0.0,1,1.0,0,0.0,0,0.0,0,0.0,1,4.97,0,0.0,0,0.0,0,0.0
353935,136511874,21321,5,976,0.0,1,1.0,0,0.0,0,0.0,0,0.0,1,4.97,0,0.0,0,0.0,0,0.0
353807,136403994,81102,13,911,0.0,1,1.0,0,0.0,0,0.0,0,0.0,1,4.97,0,0.0,0,0.0,0,0.0
353938,136514709,28438,6,820,0.0,1,1.0,0,0.0,0,0.0,0,0.0,1,4.97,0,0.0,0,0.0,0,0.0
353384,135839805,84662,4,815,0.0,1,1.0,0,0.0,0,0.0,0,0.0,1,4.97,0,0.0,0,0.0,0,0.0


In [27]:
plt_obj = dm.superset_allele_mask(am_29, [dm.di_4_815], [dm.di_5_1460, dm.di_32_1361, dm.di_5_684], min_match=0.5)
plt = plt_obj.do_plot()
show(plt)

In [28]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353921,136501840,53819,10,2206,0.0,1,1.0,0,0.0,0,0.0,0,0.0,1,4.97,0,0.0,0,0.0,0,0.0
354170,136682274,93624,7,1868,0.0,1,1.0,0,0.0,0,0.0,0,0.0,1,4.97,0,0.0,0,0.0,0,0.0
353901,136494186,271765,64,1575,0.0,1,1.0,0,0.0,0,0.0,0,0.0,1,4.97,0,0.0,0,0.0,0,0.0
353797,136398174,75924,26,1414,0.0,1,1.0,0,0.0,0,0.0,0,0.0,1,4.97,0,0.0,0,0.0,0,0.0
353384,135839805,84662,4,815,0.0,1,1.0,0,0.0,0,0.0,0,0.0,1,4.97,0,0.0,0,0.0,0,0.0


<div style="width:700px">
<p>
The following plot shows the series expressed by the 1 chromosome sample that expresses
29 193_843 SNPs but does not express the series 4_815.
</div>

In [48]:
plt_obj = dm.superset_allele_mask(am_29, [dm.di_117_1685], [dm.di_4_815], min_match=0.001)
plt = plt_obj.do_plot()
am_not_4_815 = plt_obj.plot_context.yes_allele_mask
show(plt)

In [49]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353244,135758231,618284,117,1685,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353478,135933921,434642,123,1561,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
354130,136653925,107928,24,1504,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
354189,136704466,27748,5,212,0.0,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
354181,136696010,68554,18,172,0.01,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
354143,136658238,19681,6,167,0.01,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
354156,136666402,102253,17,121,0.01,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
353866,136468307,15591,5,18,0.06,1,1.0,1,4.97,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0


In [50]:
an.sa_193_843.alleles_per_snp(am_not_4_815)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0,
       0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0])

<div style="width:700px">
<p>
This data confirms that this chromosome sample is the one that expresses the
divergent series of 29 193_843 SNPs.
</div>

In [32]:
aps_170, am_170 = an.sa_193_843.snps_from_aps_value(170, nam_193_843)
aps_170

array([41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41,
       41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41,
       41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41,
       41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41,
       41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41,
       41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41,
       41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41,
       41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41,
       41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41,
       41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0])

<div style="width:700px">
<p>
This call returns the per-SNP counts and the allele mask for the
chromosome samples that express 170 of the series 193_843 SNPs.  In this case, all 41
of those chromosome samples express the first 170 SNPs.
</div>
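
<div style="width:700px">
<p>
The selection described above can be sketched as follows.  This is an illustration on toy data
under stated assumptions, not the code behind snps_from_aps_value: it assumes that the first
argument is the number of series SNPs a chromosome sample must express and that the second
argument is a mask restricting which chromosome samples are considered.  The function name and
the data are hypothetical.
</div>

```python
import numpy as np

def snps_from_count_sketch(snp_by_sample, n_snps, base_mask):
    """Select samples in base_mask expressing n_snps SNPs; return per-SNP counts and the mask."""
    per_sample = snp_by_sample.sum(axis=0)          # SNPs expressed by each sample
    selected = base_mask & (per_sample == n_snps)   # assumed meaning of the count argument
    per_snp = snp_by_sample[:, selected].sum(axis=1)
    return per_snp, selected

# Hypothetical toy data: 3 SNPs (rows) by 4 chromosome samples (columns).
snp_by_sample = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
], dtype=bool)
base_mask = np.array([True, True, True, False])

per_snp, selected = snps_from_count_sketch(snp_by_sample, 2, base_mask)
print(per_snp)   # same count for every SNP shared by all selected samples
print(selected)  # boolean mask over chromosome samples
```

<div style="width:700px">
<p>
When every selected chromosome sample expresses the same SNPs, the counts are constant over those
SNPs and 0 elsewhere, which is the pattern of 41s followed by 0s in the output above.
</div>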

In [53]:
plt_obj = dm.superset_allele_mask(am_170, min_match=0.5)
plt = plt_obj.do_plot()
show(plt)

<div style="width:700px">
<p>
This plot indicates that a selection process associated with the emergence of the
series 9_39 has resulted in overexpression of the first 170 193_843 SNPs.  The series
9_39 appears to be rooted in a hierarchy formed by a recombination of 8_267 from the
South Asian tree and the series including 32_1361 and 81_857 at the root of the upper region
of the East Asian tree.
</div>

In [54]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
354033,136588031,5647,7,1760,0.02,34,0.83,1,0.15,0,0.0,9,1.91,0,0.0,21,3.07,0,0.0,3,0.72
353244,135758231,618284,117,1685,0.02,41,1.0,1,0.12,0,0.0,13,2.29,0,0.0,24,2.91,0,0.0,3,0.6
354130,136653925,107928,24,1504,0.02,31,0.76,1,0.16,0,0.0,9,2.1,0,0.0,18,2.89,0,0.0,3,0.79
353925,136506375,32564,4,1442,0.02,34,0.83,1,0.15,0,0.0,9,1.91,0,0.0,21,3.07,0,0.0,3,0.72
353791,136393658,92684,32,1361,0.03,41,1.0,1,0.12,0,0.0,13,2.29,0,0.0,24,2.91,0,0.0,3,0.6
353958,136535876,19014,7,1303,0.03,34,0.83,1,0.15,0,0.0,9,1.91,0,0.0,21,3.07,0,0.0,3,0.72
354129,136652953,108222,5,1296,0.02,31,0.76,1,0.16,0,0.0,9,2.1,0,0.0,18,2.89,0,0.0,3,0.79
353269,135766890,509095,62,1265,0.03,41,1.0,1,0.12,0,0.0,13,2.29,0,0.0,24,2.91,0,0.0,3,0.6
353919,136500475,42085,13,1227,0.03,34,0.83,1,0.15,0,0.0,9,1.91,0,0.0,21,3.07,0,0.0,3,0.72
354127,136652491,80281,6,1114,0.03,31,0.76,1,0.16,0,0.0,9,2.1,0,0.0,18,2.89,0,0.0,3,0.79


<div style="width:700px">
<p>
The data associated with the plot shows that 38 of the chromosome samples that
express the series 9_39 express the first 170 SNPs of series 193_843.
</div>
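
<div style="width:700px">
<p>
The fraction columns in the table can be reproduced from the counts.  The sketch below assumes
that matches.1 is matches divided by the number of selected chromosome samples (41 here), that
alleles.1 is matches divided by alleles, and that min_match is a lower bound on matches.1.
These readings agree with the values shown, but they are inferences from the output rather
than statements taken from the source code.
</div>

```python
selected_samples = 41  # chromosome samples in the am_170 mask

# (snps, alleles, matches) copied from the first three rows of the table above
rows = [(7, 1760, 34), (117, 1685, 41), (24, 1504, 31)]

def match_fractions(alleles, matches, selected):
    """Return (matches / alleles, matches / selected), each rounded to 2 places."""
    return round(matches / alleles, 2), round(matches / selected, 2)

fractions = [match_fractions(a, m, selected_samples) for _, a, m in rows]
print(fractions)

# Assumed role of min_match: keep a series only if enough selected samples express it.
min_match = 0.5
kept = [row for row, (_, frac) in zip(rows, fractions) if frac >= min_match]
print(len(kept))
```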

In [55]:
aps_169, am_169 = an.sa_193_843.snps_from_aps_value(169, nam_193_843)
aps_169

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0])

<div style="width:700px">
<p>
This data is for the 1 chromosome sample that expresses 169 series
193_843 SNPs.  It expresses all but 1 of the first 170 SNPs.
</div>

In [36]:
plt_obj = dm.superset_allele_mask(am_169, min_match=0.5)
plt = plt_obj.do_plot()
show(plt)

<div style="width:700px">
<p>
This plot shows that the 1 chromosome sample that expresses 169 series 193_843 SNPs is the
remaining chromosome sample that expresses the series 9_39.
</div>

In [37]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353244,135758231,618284,117,1685,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353791,136393658,92684,32,1361,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353269,135766890,509095,62,1265,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353906,136496493,57432,9,1170,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353907,136496805,55824,9,1023,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353984,136556805,190280,39,1014,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353935,136511874,21321,5,976,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353504,135964764,136368,6,946,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353938,136514709,28438,6,820,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353384,135839805,84662,4,815,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0


<div style="width:700px">
<p>
This data confirms that the 1 chromosome sample has matched all of the plotted series.
</div>

In [56]:
plt_obj = dm.superset_allele_mask(am_170, [dm.di_8_267, dm.di_81_857], [dm.di_9_39], min_match=0.001)
plt = plt_obj.do_plot()
show(plt)

<div style="width:700px">
<p>
This plot shows the series associations for chromosome samples that express 170 series 193_843 SNPs,
the series 8_267, and the series 81_857, but not the series 9_39.
</div>
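
<div style="width:700px">
<p>
The yes and no lists used in the call above amount to a chain of boolean mask operations.  The
following is a minimal sketch on toy masks, assuming each series is represented by a boolean
mask over chromosome samples.  The function name and the data are hypothetical, and the sketch
does not reproduce superset_allele_mask.
</div>

```python
import numpy as np

def yes_no_mask_sketch(base_mask, yes_masks, no_masks):
    """Keep samples in base_mask that express every yes series and no no series."""
    selected = base_mask.copy()
    for mask in yes_masks:
        selected &= mask    # must express each series in the yes list
    for mask in no_masks:
        selected &= ~mask   # must not express any series in the no list
    return selected

# Hypothetical toy masks over 6 chromosome samples.
base     = np.array([1, 1, 1, 1, 0, 0], dtype=bool)  # stands in for am_170
series_a = np.array([1, 1, 1, 0, 1, 0], dtype=bool)  # first yes series
series_b = np.array([1, 1, 0, 0, 1, 1], dtype=bool)  # second yes series
series_c = np.array([1, 0, 0, 0, 0, 1], dtype=bool)  # no series

selected = yes_no_mask_sketch(base, [series_a, series_b], [series_c])
print(selected)
print(int(selected.sum()))  # number of chromosome samples that pass the filter
```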

In [39]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
354033,136588031,5647,7,1760,0.0,2,1.0,0,0.0,0,0.0,0,0.0,0,0.0,2,4.98,0,0.0,0,0.0
353244,135758231,618284,117,1685,0.0,2,1.0,0,0.0,0,0.0,0,0.0,0,0.0,2,4.98,0,0.0,0,0.0
354130,136653925,107928,24,1504,0.0,2,1.0,0,0.0,0,0.0,0,0.0,0,0.0,2,4.98,0,0.0,0,0.0
353925,136506375,32564,4,1442,0.0,2,1.0,0,0.0,0,0.0,0,0.0,0,0.0,2,4.98,0,0.0,0,0.0
353791,136393658,92684,32,1361,0.0,2,1.0,0,0.0,0,0.0,0,0.0,0,0.0,2,4.98,0,0.0,0,0.0
353958,136535876,19014,7,1303,0.0,2,1.0,0,0.0,0,0.0,0,0.0,0,0.0,2,4.98,0,0.0,0,0.0
354129,136652953,108222,5,1296,0.0,2,1.0,0,0.0,0,0.0,0,0.0,0,0.0,2,4.98,0,0.0,0,0.0
353269,135766890,509095,62,1265,0.0,2,1.0,0,0.0,0,0.0,0,0.0,0,0.0,2,4.98,0,0.0,0,0.0
353919,136500475,42085,13,1227,0.0,2,1.0,0,0.0,0,0.0,0,0.0,0,0.0,2,4.98,0,0.0,0,0.0
354127,136652491,80281,6,1114,0.0,2,1.0,0,0.0,0,0.0,0,0.0,0,0.0,2,4.98,0,0.0,0,0.0


<div style="width:700px">
<p>
This data shows that this plot accounts for 2 of the chromosome samples that express 170 series
193_843 SNPs but do not express series 9_39.
</div>

In [40]:
plt_obj = dm.superset_allele_mask(am_170, [dm.di_8_267], [dm.di_9_39, dm.di_81_857], min_match=0.001)
plt = plt_obj.do_plot()
show(plt)

<div style="width:700px">
<p>
This plot shows the series expressed by the 1 remaining chromosome sample that expresses 170
series 193_843 SNPs but does not express the series 9_39.
</div>

In [41]:
HTML(plt_obj.get_html())

index,first,length,snps,alleles,alleles.1,matches,matches.1,afr,afr.1,afx,afx.1,amr,amr.1,eas,eas.1,eur,eur.1,sas,sas.1,sax,sax.1
353244,135758231,618284,117,1685,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353791,136393658,92684,32,1361,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353269,135766890,509095,62,1265,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353906,136496493,57432,9,1170,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353907,136496805,55824,9,1023,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353984,136556805,190280,39,1014,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353935,136511874,21321,5,976,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353504,135964764,136368,6,946,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353938,136514709,28438,6,820,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0
353384,135839805,84662,4,815,0.0,1,1.0,0,0.0,0,0.0,0,0.0,0,0.0,1,4.98,0,0.0,0,0.0


<div style="width:700px">
<p>
The data confirms that the plotted association of series is expressed by 1 chromosome sample.
</div>
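
<div style="width:700px">
<p>
The counts reported in the cells above can be tallied to check that every chromosome sample is
accounted for.  The numbers below are copied from the text and tables.  The total of 39 for the
series 9_39 is read from its name, following the naming convention described earlier, and the
variable names are illustrative only.
</div>

```python
# Counts taken from the text and tables above.
expressing_170 = 41            # chromosome samples expressing 170 series 193_843 SNPs
expressing_170_and_9_39 = 38   # of those, samples that also express series 9_39
expressing_170_not_9_39 = {
    "8_267 and 81_857": 2,     # yes: 8_267, 81_857; no: 9_39
    "8_267 without 81_857": 1, # yes: 8_267; no: 9_39, 81_857
}
expressing_169_and_9_39 = 1    # the single sample expressing 169 SNPs
series_9_39_samples = 39       # second number in the series name 9_39

remaining = expressing_170 - expressing_170_and_9_39
assert remaining == sum(expressing_170_not_9_39.values())
assert expressing_170_and_9_39 + expressing_169_and_9_39 == series_9_39_samples
print(remaining)
```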