In [1]:
%%html
<!-- Improve the styling of the Notebook. -->
<link href="https://fonts.googleapis.com/css2?family=Source+Code+Pro&family=Source+Sans+3&family=Source+Serif+4:opsz@8..60&display=swap" rel="stylesheet">
<style>
   div.jp-MarkdownOutput p { font-family: 'Source Serif 4', serif; width: 50em; }
   div.jp-MarkdownOutput h1,h2,h3,h4,h5,h6 { font-family: 'Source Sans 3', sans-serif; }
   div.cm-line { font-family: 'Source Code Pro', monospace; }
</style>

In [2]:
import hail as hl

# Importing a VCF File as a Hail Matrix Table

Matrix tables are a feature unique to Hail; other distributed, partitioned dataframe systems have no equivalent. Matrix tables were inspired by the VCF format, which represents one or more genomic sequences. Each row is a genomic locus, like "chr1:123". Each column is a sample identified by a string of characters and numbers, like "NA12345".

[`hl.import_vcf`](https://hail.is/docs/0.2/methods/impex.html#hail.methods.import_vcf) imports a VCF file as a Hail Matrix Table.

In [3]:
mt = hl.import_vcf('data/sample.vcf', reference_genome='GRCh38', min_partitions=2)

Initializing Hail with default parameters...
SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
SLF4J: Class path contains SLF4J bindings targeting slf4j-api versions 1.7.x or earlier.
SLF4J: Ignoring binding found at [jar:file:/Users/dking/miniconda3/lib/python3.10/site-packages/pyspark/jars/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See https://www.slf4j.org/codes.html#ignoredBindings for an explanation.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 3.3.2
SparkUI available at http://wm28c-761.broadinstitute.org:4041
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.120-f00f916faf78
LOGGING: writing to /Users/dking/projects/ww2023/notebooks/hail-20230829-1636

In [4]:
mt

<hail.matrixtable.MatrixTable at 0x174cd5060>

# Showing the Row, Column, and Entry Fields

Matrix tables, just like Tables, are recipes. Their printed form provides no useful information. [`MatrixTable.show`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.show), which is an action, displays the entry field values of the first few rows and columns of the matrix.

In [5]:
mt.show(n_rows=3, n_cols=3)

2023-08-29 16:37:09.833 Hail: INFO: scanning VCF for sortedness...
2023-08-29 16:37:16.139 Hail: INFO: Coerced sorted VCF - no additional import work to do


Unnamed: 0_level_0,Unnamed: 1_level_0,'Sample1','Sample1','Sample1','Sample1','Sample2','Sample2','Sample2','Sample2','Sample3','Sample3','Sample3','Sample3'
locus,alleles,GT,DP,PL,AD,GT,DP,PL,AD,GT,DP,PL,AD
locus<GRCh38>,array<str>,call,int32,array<int32>,array<int32>,call,int32,array<int32>,array<int32>,call,int32,array<int32>,array<int32>
chr1:100,"[""A"",""G""]",0/1,15,"[50,10,80]","[10,5]",0/1,17,"[60,40,0]","[5,12]",0/0,12,"[100,0,120]","[12,0]"
chr1:200,"[""C"",""T""]",0/0,10,"[90,0,120]","[10,0]",1/1,15,"[0,30,100]","[5,10]",0/1,14,"[60,30,0]","[10,4]"
chr1:300,"[""G"",""A""]",0/1,20,"[40,20,80]","[15,10]",0/0,22,"[100,0,120]","[22,0]",0/1,18,"[70,10,0]","[9,9]"


This is the top-left corner of this matrix table.

Each column represents a sample and is shown with its sample identifier: "Sample1", "Sample2", and "Sample3". Each row represents a variant and is shown with the variant's locus and alleles. Each entry represents a sequenced genotype. This sequenced genotype comprises four fields: the genotype call "GT", the total depth "DP", the phred-scaled genotype likelihoods "PL", and the per-allele depth "AD". See the [VCF Specification version 4.3](https://samtools.github.io/hts-specs/VCFv4.3.pdf) for details.

Seven fields are visible: two row fields (locus and alleles), four entry fields (GT, DP, PL, and AD), and one column field (s).

[`MatrixTable.show`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.show) only shows the entry fields, the row key fields, and the column key fields. The matrix table usually has other row and column fields that are not displayed by show. [`MatrixTable.describe`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.describe) lists all the fields.

In [6]:
mt.describe()

----------------------------------------
Global fields:
    None
----------------------------------------
Column fields:
    's': str
----------------------------------------
Row fields:
    'locus': locus<GRCh38>
    'alleles': array<str>
    'rsid': str
    'qual': float64
    'filters': set<str>
    'info': struct {
        DP: int32
    }
----------------------------------------
Entry fields:
    'GT': call
    'DP': int32
    'PL': array<int32>
    'AD': array<int32>
----------------------------------------
Column key: ['s']
Row key: ['locus', 'alleles']
----------------------------------------


[`MatrixTable.rows`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.rows) returns a Hail table with all the row fields. [`MatrixTable.cols`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.cols) returns a Hail table with all the column fields. We can, of course, use [`Table.describe`](https://hail.is/docs/0.2/hail.Table.html#hail.Table.describe) and [`Table.show`](https://hail.is/docs/0.2/hail.Table.html#hail.Table.show) to interrogate either table.

⚠️⚠️⚠️ Confusing Behavior Alert ⚠️⚠️⚠️

Matrix table columns are ordered in the same way as they are in the imported VCF. In contrast, every Hail table, including the `mt.cols()` table, is _always_ ordered by its key field.

This is indeed confusing; however, it is a necessary compromise that avoids the great cost of sorting the columns of a VCF.
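The key ordering is ordinary lexicographic string order, which is why "Sample10" sorts between "Sample1" and "Sample2". A plain-Python sketch (no Hail involved; it assumes the ten sample IDs that appear in this notebook's outputs) illustrates the difference between the two orderings:

```python
# Sample IDs in VCF header order, as they appear in this notebook's outputs.
vcf_order = [f"Sample{i}" for i in range(1, 11)]

# Sorting strings compares them character by character, so "Sample10"
# sorts before "Sample2" because "1" < "2".
key_order = sorted(vcf_order)

print(vcf_order[:3])  # matrix table column order
print(key_order[:3])  # order of a table keyed by the sample ID
```

This only models the ordering; it says nothing about how Hail stores or sorts the data internally.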

In [7]:
mt.cols().show(n=3)

2023-08-29 16:38:02.425 Hail: WARN: cols(): Resulting column table is sorted by 'col_key'.
    To preserve matrix table column order, first unkey columns with 'key_cols_by()'


str
"""Sample1"""
"""Sample10"""
"""Sample2"""


We can preserve the ordering of the cols table by setting the column key to the empty key (which requires no particular ordering).

In [8]:
mt.key_cols_by().cols().show(n=3)

str
"""Sample1"""
"""Sample2"""
"""Sample3"""


In [9]:
mt.rows().show(n=3)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,info
locus,alleles,rsid,qual,filters,DP
locus<GRCh38>,array<str>,str,float64,set<str>,int32
chr1:100,"[""A"",""G""]","""rs1""",50.0,{},20
chr1:200,"[""C"",""T""]","""rs2""",30.0,{},15
chr1:300,"[""G"",""A""]","""rs3""",40.0,{},25

"info.DP" is our first example of a _nested_ field. The "info" field contains a "DP" field, the total depth summed across all samples. There are many ways to access a nested field:

In [10]:
mt.info.DP
mt['info'].DP
mt.info['DP']
mt['info']['DP']

<Int32Expression of type int32>

### Exercise

It's also possible to show individual fields. Try showing the info.DP field.

In [12]:
mt.info.DP.show(n=3)

locus,alleles,<expr>
locus<GRCh38>,array<str>,int32
chr1:100,"[""A"",""G""]",20
chr1:200,"[""C"",""T""]",15
chr1:300,"[""G"",""A""]",25


# Adding Row, Column, and Entry Fields with Annotate

## Row Fields

These "sum total depths" look fishy: they're too small. Let's compute the actual sum with [`hl.agg.sum`](https://hail.is/docs/0.2/aggregators.html#hail.expr.aggregators.sum) and add it as a new row field with [`MatrixTable.annotate_rows`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.annotate_rows).

In [13]:
mt = mt.annotate_rows(
    the_actual_sum_total_DP = hl.agg.sum(mt.DP)
)
mt.rows().show(n=3)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,info,Unnamed: 6_level_0
locus,alleles,rsid,qual,filters,DP,the_actual_sum_total_DP
locus<GRCh38>,array<str>,str,float64,set<str>,int32,int64
chr1:100,"[""A"",""G""]","""rs1""",50.0,{},20,166
chr1:200,"[""C"",""T""]","""rs2""",30.0,{},15,133
chr1:300,"[""G"",""A""]","""rs3""",40.0,{},25,210


Not even close to correct! Let's replace the "info.DP" value with the actual sum. Notice that we use [`StructExpression.annotate`](https://hail.is/docs/0.2/hail.expr.StructExpression.html#hail.expr.StructExpression.annotate) to overwrite the existing "DP" field inside the "info" struct; `annotate` adds a field when the name is new and replaces it when the name already exists.

In [14]:
mt = mt.annotate_rows(
    info = mt.info.annotate(
        DP = hl.agg.sum(mt.DP)
    )
)
mt.rows().show(n=3)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,info,Unnamed: 6_level_0
locus,alleles,rsid,qual,filters,DP,the_actual_sum_total_DP
locus<GRCh38>,array<str>,str,float64,set<str>,int64,int64
chr1:100,"[""A"",""G""]","""rs1""",50.0,{},166,166
chr1:200,"[""C"",""T""]","""rs2""",30.0,{},133,133
chr1:300,"[""G"",""A""]","""rs3""",40.0,{},210,210
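As a sanity check on the row aggregation, the same sums can be reproduced by hand. This is a plain-Python sketch, not Hail code; the per-sample DP values are copied from the outputs in this notebook (Sample1 through Sample10, in column order), and the dictionary layout is only an illustrative assumption.

```python
# Per-sample DP values for the first two rows, copied from this notebook's outputs.
dp_by_row = {
    "chr1:100": [15, 17, 12, 14, 20, 16, 19, 18, 14, 21],
    "chr1:200": [10, 15, 14, 12, 16, 13, 11, 14, 18, 10],
}

# A row aggregation collapses all the entries of a row into one value per row.
row_sums = {locus: sum(dps) for locus, dps in dp_by_row.items()}

print(row_sums)
```

The result is one number per row, which is why Hail stores it in a row field.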


## Column Fields

Hail has an extensive [library of random functions](https://hail.is/docs/0.2/functions/random.html) as well as a [library of statistical distributions and tests](https://hail.is/docs/0.2/functions/stats.html). Let's use [`MatrixTable.annotate_cols`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.annotate_cols) to randomly generate a height field for each sample.

In [15]:
mt = mt.annotate_cols(
    height_ft = hl.rand_norm(5 + 8/12, 2/12)
)
mt.key_cols_by().cols().show(n=3)

s,height_ft
str,float64
"""Sample1""",5.43
"""Sample2""",5.88
"""Sample3""",5.92


## Entry Fields

[`MatrixTable.annotate_entries`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.annotate_entries) adds new entry fields. We can show just one field (with its relevant keys) using [`Expression.show`](https://hail.is/docs/0.2/hail.expr.Expression.html#hail.expr.Expression.show).

In [16]:
mt = mt.annotate_entries(
    low_DP = mt.DP < 15
)
mt.low_DP.show(n_rows=3, n_cols=3)

Unnamed: 0_level_0,Unnamed: 1_level_0,'Sample1','Sample2','Sample3'
locus,alleles,low_DP,low_DP,low_DP
locus<GRCh38>,array<str>,bool,bool,bool
chr1:100,"[""A"",""G""]",False,False,True
chr1:200,"[""C"",""T""]",True,False,True
chr1:300,"[""G"",""A""]",False,False,False


[`Expression.show`](https://hail.is/docs/0.2/hail.expr.Expression.html#hail.expr.Expression.show) also works with compound expressions, such as a struct expression. A struct expression combines multiple values into one struct value.

In [17]:
hl.struct(low_DP=mt.low_DP, DP=mt.DP, GT=mt.GT).show(n_rows=3, n_cols=3)

Unnamed: 0_level_0,Unnamed: 1_level_0,'Sample1','Sample1','Sample1','Sample2','Sample2','Sample2','Sample3','Sample3','Sample3'
Unnamed: 0_level_1,Unnamed: 1_level_1,<expr>,<expr>,<expr>,<expr>,<expr>,<expr>,<expr>,<expr>,<expr>
locus,alleles,low_DP,DP,GT,low_DP,DP,GT,low_DP,DP,GT
locus<GRCh38>,array<str>,bool,int32,call,bool,int32,call,bool,int32,call
chr1:100,"[""A"",""G""]",False,15,0/1,False,17,0/1,True,12,0/0
chr1:200,"[""C"",""T""]",True,10,0/0,False,15,1/1,True,14,0/1
chr1:300,"[""G"",""A""]",False,20,0/1,False,22,0/0,False,18,0/1


### Exercise

Add an entry field which is the sum of the AD array. See [collection functions](https://hail.is/docs/0.2/functions/collections.html).

In [23]:
mt.annotate_entries(
    sum_AD = hl.sum(mt.AD)
).select_entries('AD', 'sum_AD').show(n_rows=3, n_cols=3)

Unnamed: 0_level_0,Unnamed: 1_level_0,'Sample1','Sample1','Sample2','Sample2','Sample3','Sample3'
locus,alleles,AD,sum_AD,AD,sum_AD,AD,sum_AD
locus<GRCh38>,array<str>,array<int32>,int32,array<int32>,int32,array<int32>,int32
chr1:100,"[""A"",""G""]","[10,5]",15,"[5,12]",17,"[12,0]",12
chr1:200,"[""C"",""T""]","[10,0]",10,"[5,10]",15,"[10,4]",14
chr1:300,"[""G"",""A""]","[15,10]",25,"[22,0]",22,"[9,9]",18


# Filtering Rows, Columns, and Entries

[`MatrixTable.filter_rows`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.filter_rows), [`MatrixTable.filter_cols`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.filter_cols), and [`MatrixTable.filter_entries`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.filter_entries) respectively remove rows, columns, and entries from a matrix table.

We make a copy of our recipe as `xx` so that we can return to the full dataset later.

In [24]:
xx = mt

In [25]:
xx = xx.filter_rows(
    xx.locus.contig != 'chr1'
)
xx.show(n_rows=3, n_cols=2)

Unnamed: 0_level_0,Unnamed: 1_level_0,'Sample1','Sample1','Sample1','Sample1','Sample1','Sample2','Sample2','Sample2','Sample2','Sample2'
locus,alleles,GT,DP,PL,AD,low_DP,GT,DP,PL,AD,low_DP
locus<GRCh38>,array<str>,call,int32,array<int32>,array<int32>,bool,call,int32,array<int32>,array<int32>,bool
chr2:150,"[""T"",""C""]",1/1,14,"[0,30,100]","[5,9]",True,0/0,15,"[100,0,120]","[15,0]",False
chr2:250,"[""A"",""G""]",0/0,22,"[120,0,60]","[20,2]",False,0/1,21,"[50,30,0]","[15,6]",False
chr2:350,"[""C"",""T""]",0/1,16,"[70,10,0]","[8,8]",False,0/0,15,"[100,0,120]","[15,0]",False


In [26]:
xx = xx.filter_cols(
    xx.s > 'Sample4'
)
xx.show(n_rows=3, n_cols=2)

Unnamed: 0_level_0,Unnamed: 1_level_0,'Sample5','Sample5','Sample5','Sample5','Sample5','Sample6','Sample6','Sample6','Sample6','Sample6'
locus,alleles,GT,DP,PL,AD,low_DP,GT,DP,PL,AD,low_DP
locus<GRCh38>,array<str>,call,int32,array<int32>,array<int32>,bool,call,int32,array<int32>,array<int32>,bool
chr2:150,"[""T"",""C""]",1/1,16,"[0,30,100]","[5,11]",False,0/1,17,"[0,30,100]","[5,12]",False
chr2:250,"[""A"",""G""]",0/1,20,"[0,30,100]","[5,15]",False,1/1,22,"[0,30,100]","[5,17]",False
chr2:350,"[""C"",""T""]",0/1,19,"[0,30,100]","[6,13]",False,1/1,18,"[0,30,100]","[6,12]",False


A filtered entry is like a hole in the matrix. The other entries in a row or column are still present, so Hail still treats that row and that column as part of the dataset; however, the filtered entry itself is shown as if all its entry fields are missing.

In [27]:
xx = xx.filter_entries(
    xx.GT.is_het()
)
xx.show(n_rows=3, n_cols=2)

Unnamed: 0_level_0,Unnamed: 1_level_0,'Sample5','Sample5','Sample5','Sample5','Sample5','Sample6','Sample6','Sample6','Sample6','Sample6'
locus,alleles,GT,DP,PL,AD,low_DP,GT,DP,PL,AD,low_DP
locus<GRCh38>,array<str>,call,int32,array<int32>,array<int32>,bool,call,int32,array<int32>,array<int32>,bool
chr2:150,"[""T"",""C""]",,,,,,0/1,17.0,"[0,30,100]","[5,12]",False
chr2:250,"[""A"",""G""]",0/1,20.0,"[0,30,100]","[5,15]",False,,,,,
chr2:350,"[""C"",""T""]",0/1,19.0,"[0,30,100]","[6,13]",False,,,,,
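The "hole in the matrix" idea can be modelled in plain Python by replacing filtered entries with `None` while keeping every row and column. This sketch is only an analogy for the behavior described above (it is not how Hail represents entries); the genotype calls are copied from the output of the previous cell, and `is_het` is a hypothetical helper for unphased, diploid call strings.

```python
# Genotype calls for two samples at three chr2 loci, before filter_entries.
gt = {
    "chr2:150": {"Sample5": "1/1", "Sample6": "0/1"},
    "chr2:250": {"Sample5": "0/1", "Sample6": "1/1"},
    "chr2:350": {"Sample5": "0/1", "Sample6": "1/1"},
}

def is_het(call):
    """True when the two alleles of an unphased diploid call differ."""
    first, second = call.split("/")
    return first != second

# Entries that fail the predicate become holes (None); no row or column is dropped.
filtered = {
    locus: {sample: (call if is_het(call) else None) for sample, call in row.items()}
    for locus, row in gt.items()
}

for locus, row in filtered.items():
    print(locus, row)
```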


### Exercise

Filter to rows in chromosome 2.

In [30]:
mt.filter_rows(
    mt.locus.contig == 'chr2'
).show(n_rows=3, n_cols=2)

Unnamed: 0_level_0,Unnamed: 1_level_0,'Sample1','Sample1','Sample1','Sample1','Sample1','Sample2','Sample2','Sample2','Sample2','Sample2'
locus,alleles,GT,DP,PL,AD,low_DP,GT,DP,PL,AD,low_DP
locus<GRCh38>,array<str>,call,int32,array<int32>,array<int32>,bool,call,int32,array<int32>,array<int32>,bool
chr2:150,"[""T"",""C""]",1/1,14,"[0,30,100]","[5,9]",True,0/0,15,"[100,0,120]","[15,0]",False
chr2:250,"[""A"",""G""]",0/0,22,"[120,0,60]","[20,2]",False,0/1,21,"[50,30,0]","[15,6]",False
chr2:350,"[""C"",""T""]",0/1,16,"[70,10,0]","[8,8]",False,0/0,15,"[100,0,120]","[15,0]",False


# Head and Tail of the Dataset

[`MatrixTable.head`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.head) and [`MatrixTable.tail`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.tail) filter the dataset to corners of the matrix.

In [31]:
mt.head(n_rows=3, n_cols=3).GT.show()

Unnamed: 0_level_0,Unnamed: 1_level_0,'Sample1','Sample2','Sample3'
locus,alleles,GT,GT,GT
locus<GRCh38>,array<str>,call,call,call
chr1:100,"[""A"",""G""]",0/1,0/1,0/0
chr1:200,"[""C"",""T""]",0/0,1/1,0/1
chr1:300,"[""G"",""A""]",0/1,0/0,0/1


There is currently a bug in `tail`: the parameter that should be called `n_rows` is instead called `n`. This will be fixed in 0.2.121.

In [32]:
mt.tail(n=3, n_cols=3).GT.show()

Unnamed: 0_level_0,Unnamed: 1_level_0,'Sample8','Sample9','Sample10'
locus,alleles,GT,GT,GT
locus<GRCh38>,array<str>,call,call,call
chr3:120,"[""G"",""A""]",0/0,0/1,1/1
chr3:220,"[""T"",""C""]",0/0,0/1,0/1
chr3:320,"[""A"",""G""]",0/0,0/1,1/1


Head and tail can be combined to filter to the top-right or bottom-left corners of the matrix.

In [33]:
mt.head(n_rows=3, n_cols=None).tail(n=None, n_cols=3).GT.show()

Unnamed: 0_level_0,Unnamed: 1_level_0,'Sample8','Sample9','Sample10'
locus,alleles,GT,GT,GT
locus<GRCh38>,array<str>,call,call,call
chr1:100,"[""A"",""G""]",0/1,0/0,0/1
chr1:200,"[""C"",""T""]",0/0,1/1,0/0
chr1:300,"[""G"",""A""]",0/0,0/1,1/1


# Aggregating Rows, Columns, and Entries

[`MatrixTable.aggregate_entries`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.aggregate_entries) aggregates the entire dataset into a single Python value. [`hl.agg.group_by`](https://hail.is/docs/0.2/aggregators.html#hail.expr.aggregators.group_by) partitions values into groups and aggregates each group separately.

In [34]:
mt.aggregate_entries(
    hl.agg.group_by(mt.GT, hl.agg.count())
)

{Call(alleles=[0, 0], phased=False): 30,
 Call(alleles=[0, 1], phased=False): 36,
 Call(alleles=[1, 1], phased=False): 24}
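Grouping and counting over every entry is conceptually the same as flattening the matrix and tallying each distinct value. A plain-Python sketch using `collections.Counter` on the top-left 3x3 corner of genotype calls shown in the `head` example (an illustration only; it covers nine entries, not the whole dataset):

```python
from collections import Counter

# GT values for the first three rows and first three samples.
gt_corner = [
    ["0/1", "0/1", "0/0"],
    ["0/0", "1/1", "0/1"],
    ["0/1", "0/0", "0/1"],
]

# Flatten the matrix and count how often each genotype call occurs.
counts = Counter(call for row in gt_corner for call in row)

print(dict(counts))
```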

[`MatrixTable.aggregate_rows`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.aggregate_rows) and [`MatrixTable.aggregate_cols`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.aggregate_cols) respectively aggregate all the row fields or all the column fields into a single Python value.

[`hl.agg.stats`](https://hail.is/docs/0.2/aggregators.html#hail.expr.aggregators.stats) computes the mean, standard deviation, min, max, count, and sum of a numeric field.

In [35]:
mt.aggregate_rows(
    hl.agg.stats(mt.info.DP)
)

Struct(mean=184.33333333333334, stdev=29.154759474226502, min=133.0, max=230.0, n=9, sum=1659.0)

In [36]:
mt.aggregate_cols(
    hl.agg.stats(mt.height_ft)
)

Struct(mean=5.721663734584305, stdev=0.23554996630829225, min=5.40914812286674, max=6.16587388761934, n=10, sum=57.21663734584305)

[`MatrixTable.annotate_rows`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.annotate_rows) permits aggregating the entries of each row separately. This produces a single Hail value for each row which is necessarily stored in a row field. [`CallExpression.n_alt_alleles`](https://hail.is/docs/0.2/hail.expr.CallExpression.html#hail.expr.CallExpression.n_alt_alleles) returns the number of alternate alleles in the genotype call. For example, `0/0` has zero alternate alleles and `1/1` has two.

In [37]:
mt.annotate_rows(
    alternate_allele_frequency = hl.agg.mean(mt.GT.n_alt_alleles()) / 2.0
).alternate_allele_frequency.show(n=10)

locus,alleles,alternate_allele_frequency
locus<GRCh38>,array<str>,float64
chr1:100,"[""A"",""G""]",0.45
chr1:200,"[""C"",""T""]",0.45
chr1:300,"[""G"",""A""]",0.45
chr2:150,"[""T"",""C""]",0.5
chr2:250,"[""A"",""G""]",0.45
chr2:350,"[""C"",""T""]",0.45
chr3:120,"[""G"",""A""]",0.5
chr3:220,"[""T"",""C""]",0.5
chr3:320,"[""A"",""G""]",0.45
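The division by 2 reflects that each diploid call carries two alleles, so the mean number of alternate alleles per sample is twice the alternate allele frequency. A plain-Python check for the row at chr1:100, with genotype calls copied from this notebook's outputs (`n_alt_alleles` here is a hypothetical stand-in for the Hail method, valid only for call strings like "0/1"):

```python
# Genotype calls for chr1:100, Sample1 through Sample10.
calls = ["0/1", "0/1", "0/0", "0/1", "1/1", "0/0", "1/1", "0/1", "0/0", "0/1"]

def n_alt_alleles(call):
    """Count the alleles in a call string that are not the reference allele "0"."""
    return sum(allele != "0" for allele in call.split("/"))

# Mean alternate alleles per sample, then halved because each sample has two alleles.
mean_alt = sum(n_alt_alleles(call) for call in calls) / len(calls)
alternate_allele_frequency = mean_alt / 2.0

print(alternate_allele_frequency)
```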


[`MatrixTable.annotate_cols`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.annotate_cols) likewise permits aggregating each column of entries.

In [38]:
mt.annotate_cols(
    mean_sample_depth = hl.agg.mean(mt.DP)
).key_cols_by().cols().select('s', 'mean_sample_depth').show(n=10)

s,mean_sample_depth
str,float64
"""Sample1""",17.7
"""Sample2""",18.0
"""Sample3""",17.8
"""Sample4""",17.3
"""Sample5""",19.9
"""Sample6""",19.0
"""Sample7""",19.2
"""Sample8""",17.4
"""Sample9""",18.4
"""Sample10""",19.6


### Exercise

Filter to rows with more heterozygous calls than homozygous reference calls.

In [46]:
n_hets = hl.agg.count_where(mt.GT.is_het())
n_hom_refs = hl.agg.count_where(mt.GT.is_hom_ref())

# Keep rows where heterozygous calls outnumber homozygous reference calls.
mt.filter_rows(
    n_hets > n_hom_refs
).select_entries('GT').show(3, 10)

Unnamed: 0_level_0,Unnamed: 1_level_0,'Sample1','Sample2','Sample3','Sample4','Sample5','Sample6','Sample7','Sample8','Sample9','Sample10'
locus,alleles,GT,GT,GT,GT,GT,GT,GT,GT,GT,GT
locus<GRCh38>,array<str>,call,call,call,call,call,call,call,call,call,call
chr1:100,"[""A"",""G""]",0/1,0/1,0/0,0/1,1/1,0/0,1/1,0/1,0/0,0/1
chr1:300,"[""G"",""A""]",0/1,0/0,0/1,0/1,0/0,1/1,0/1,0/0,0/1,1/1
chr2:150,"[""T"",""C""]",1/1,0/0,0/1,0/0,1/1,0/1,1/1,0/0,0/1,0/1


### Exercise

Filter to samples whose mean depth across all variants is greater than 18.

In [48]:
mean_depth = hl.agg.mean(mt.DP)

mt.filter_cols(mean_depth > 18).key_cols_by().cols().show()

s,height_ft
str,float64
"""Sample5""",5.8
"""Sample6""",5.67
"""Sample7""",5.41
"""Sample9""",6.17
"""Sample10""",5.86


# Aggregating within Groups of Rows or Groups of Columns

[`MatrixTable.group_rows_by`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.group_rows_by) aggregates groups of rows to produce a new matrix table whose rows are the groups. 

In [49]:
mt.group_rows_by(
    contig=mt.locus.contig
).aggregate(
    n_alt_alleles = hl.agg.sum(mt.GT.n_alt_alleles())
).show(n_rows=3, n_cols=10)

2023-08-29 16:48:23.644 Hail: INFO: Coerced sorted dataset


Unnamed: 0_level_0,'Sample1','Sample2','Sample3','Sample4','Sample5','Sample6','Sample7','Sample8','Sample9','Sample10'
contig,n_alt_alleles,n_alt_alleles,n_alt_alleles,n_alt_alleles,n_alt_alleles,n_alt_alleles,n_alt_alleles,n_alt_alleles,n_alt_alleles,n_alt_alleles
str,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64
"""chr1""",2,3,2,2,3,4,4,1,3,3
"""chr2""",3,1,3,2,4,5,3,1,3,3
"""chr3""",3,2,2,2,3,4,5,0,3,5


[`MatrixTable.group_cols_by`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.group_cols_by) is the column analogue.

In [50]:
mt.group_cols_by(
    is_shorter_than_5_8 = mt.height_ft < (5 + 8/12)
).aggregate(
    n_alt_alleles = hl.agg.sum(mt.GT.n_alt_alleles())
).show(n_rows=10, n_cols=2)

Unnamed: 0_level_0,Unnamed: 1_level_0,<col 0>,<col 1>
locus,alleles,n_alt_alleles,n_alt_alleles
locus<GRCh38>,array<str>,int64,int64
chr1:100,"[""A"",""G""]",4,5
chr1:200,"[""C"",""T""]",8,1
chr1:300,"[""G"",""A""]",6,3
chr2:150,"[""T"",""C""]",6,4
chr2:250,"[""A"",""G""]",8,1
chr2:350,"[""C"",""T""]",5,4
chr3:120,"[""G"",""A""]",7,3
chr3:220,"[""T"",""C""]",6,4
chr3:320,"[""A"",""G""]",6,3
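Grouping rows by contig, as in the `group_rows_by` example earlier in this section, amounts to bucketing rows by a derived key and aggregating within each bucket, separately for every sample. A plain-Python sketch for a single sample (Sample1, with calls copied from this notebook's outputs; the list-of-tuples layout is an illustrative assumption):

```python
from collections import defaultdict

# (locus, GT) pairs for Sample1 across all nine rows.
sample1_calls = [
    ("chr1:100", "0/1"), ("chr1:200", "0/0"), ("chr1:300", "0/1"),
    ("chr2:150", "1/1"), ("chr2:250", "0/0"), ("chr2:350", "0/1"),
    ("chr3:120", "0/0"), ("chr3:220", "1/1"), ("chr3:320", "0/1"),
]

# Bucket by contig (the part of the locus before ":") and sum alternate alleles.
n_alt_by_contig = defaultdict(int)
for locus, call in sample1_calls:
    contig = locus.split(":")[0]
    n_alt_by_contig[contig] += sum(allele != "0" for allele in call.split("/"))

print(dict(n_alt_by_contig))
```

The three totals correspond to the Sample1 column of the grouped matrix table.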


### Exercise

Calculate the mean depth for each contig.

In [51]:
mt.group_rows_by(
    contig=mt.locus.contig
).aggregate(
    mean_depth = hl.agg.mean(mt.DP)
).show(n_rows=3, n_cols=10)

2023-08-29 16:49:04.728 Hail: INFO: Coerced sorted dataset


Unnamed: 0_level_0,'Sample1','Sample2','Sample3','Sample4','Sample5','Sample6','Sample7','Sample8','Sample9','Sample10'
contig,mean_depth,mean_depth,mean_depth,mean_depth,mean_depth,mean_depth,mean_depth,mean_depth,mean_depth,mean_depth
str,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64
"""chr1""",15.0,18.0,14.7,15.7,20.3,17.3,18.0,16.7,17.0,17.0
"""chr2""",17.3,17.0,18.3,17.3,18.3,19.0,19.0,17.3,18.7,21.3
"""chr3""",20.7,19.0,20.3,19.0,21.0,20.7,20.7,18.3,19.7,20.3


# Writing and Reading Matrix Tables in Hail Native Format

Hail has a partitioned, indexed, binary file format for quickly reading and writing matrix tables. [`MatrixTable.write`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.write) is the action which writes a matrix table in Hail native format. We use the ".mt" file extension by convention.

In [54]:
mt.write('output/sample_vcf.mt')

2023-08-29 16:49:28.454 Hail: INFO: wrote matrix table with 9 rows and 10 columns in 2 partitions to output/sample_vcf.mt


Unless you're using BGEN, a binary format for which Hail has excellent support, you should always read from a Hail native format file instead of importing.

[`hl.read_matrix_table`](https://hail.is/docs/0.2/methods/impex.html#hail.methods.read_matrix_table) reads matrix tables stored in Hail native format.

In [55]:
mt = hl.read_matrix_table('output/sample_vcf.mt')

# Exporting a Matrix Table

A matrix table supports [export to many formats](https://hail.is/docs/0.2/methods/impex.html#export) including VCF, BGEN, and PLINK.

Always export VCFs using block GZIP compression (a ".bgz" file extension) and with `parallel` set to "header_per_shard" or "separate_header". VCF does not support boolean FORMAT fields, so we must recode `low_DP` to an integer using [`hl.if_else`](https://hail.is/docs/0.2/functions/core.html#hail.expr.functions.if_else).

In [56]:
xx = mt
xx = xx.annotate_entries(low_DP=hl.if_else(xx.low_DP, 1, 0))
hl.export_vcf(xx, 'output/sample_vcf.vcf.bgz', parallel='header_per_shard')

2023-08-29 16:49:40.958 Hail: WARN: export_vcf: ignored the following fields:
    'height_ft' (column)
    'the_actual_sum_total_DP' (row)


In [57]:
!ls output/sample_vcf.vcf.bgz/

_SUCCESS
part-0-8820347e-74ea-4bc0-b1d3-b9af3560cb78.bgz
part-1-bd59c3a9-e27a-40c3-8ad7-4be8d7b4ab5c.bgz
shard-manifest.txt


⚠️⚠️⚠️ Confusing Behavior Alert ⚠️⚠️⚠️

BGEN datasets are usually two files: a .bgen file and a .sample file. [`hl.export_bgen`](https://hail.is/docs/0.2/methods/impex.html#hail.methods.export_bgen) expects a file path _without_ an extension and adds the extensions itself. The file ending in `.sample` contains the sample IDs. The file or folder ending in `.bgen` contains the genotype probabilities in BGEN format.

In [58]:
xx = mt
xx = xx.annotate_entries(
    GP=(hl.case()
        .when(xx.GT.is_hom_ref(), [1, 0, 0])
        .when(xx.GT.is_het(), [0, 1, 0])
        .when(xx.GT.is_hom_var(), [0, 0, 1])
        .or_error(hl.format('Unexpected GT: %s', xx.GT))
       )
)
hl.export_bgen(xx, 'output/sample_vcf', gp=xx.GP, parallel='header_per_shard')

In [59]:
!ls output/sample_vcf.bgen/

part-0-ad1ac6dc-60ac-4821-806a-ecf721d7f567
part-1-2171c9c6-f653-4b59-929f-742d666e9579
shard-manifest.txt


In [60]:
!head -n 4 output/sample_vcf.sample

ID_1 ID_2 missing
0 0 0
Sample1 Sample1 0
Sample2 Sample2 0
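The `hl.case()` chain in the export cell turns each hard genotype call into a genotype probability triple that puts all the probability on the called genotype. A plain-Python sketch of the same mapping, assuming unphased diploid call strings such as "0/1" (`hard_call_gp` is a hypothetical helper, not a Hail function):

```python
def hard_call_gp(call):
    """Map a call string to [P(hom ref), P(het), P(hom var)]."""
    alleles = call.split("/")
    if len(alleles) != 2 or "." in alleles:
        # Mirrors the or_error branch: anything unexpected is rejected.
        raise ValueError(f"Unexpected GT: {call}")
    first, second = alleles
    if first == "0" and second == "0":
        return [1, 0, 0]  # homozygous reference
    if first != second:
        return [0, 1, 0]  # heterozygous
    return [0, 0, 1]      # homozygous variant

for call in ["0/0", "0/1", "1/1"]:
    print(call, hard_call_gp(call))
```

Real genotype probabilities are rarely this certain; the one-hot encoding is only a way to export hard calls in a format that expects probabilities.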


# Collecting a Matrix Table

Matrix tables do not support `collect` because there is no obvious Python analogue to the matrix table: a list of lists and a NumPy matrix both seem reasonable. Matrix tables do not support `to_pandas` because Pandas DataFrames have a large per-column overhead and most matrix tables have many columns, each with many entry fields.

Instead, matrix tables provide methods for producing tables which can be converted to lists or Pandas DataFrames.

[`MatrixTable.make_table`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.make_table) creates a table with one field for every combination of column and entry field, named like "Sample1.GT".

In [61]:
xx = mt
xx = xx.make_table()
xx.show(n=3)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,info,Unnamed: 6_level_0,Unnamed: 7_level_0,Unnamed: 8_level_0,Unnamed: 9_level_0,Unnamed: 10_level_0,Unnamed: 11_level_0,Unnamed: 12_level_0,Unnamed: 13_level_0,Unnamed: 14_level_0,Unnamed: 15_level_0,Unnamed: 16_level_0,Unnamed: 17_level_0,Unnamed: 18_level_0,Unnamed: 19_level_0,Unnamed: 20_level_0,Unnamed: 21_level_0,Unnamed: 22_level_0,Unnamed: 23_level_0,Unnamed: 24_level_0,Unnamed: 25_level_0,Unnamed: 26_level_0,Unnamed: 27_level_0,Unnamed: 28_level_0,Unnamed: 29_level_0,Unnamed: 30_level_0,Unnamed: 31_level_0,Unnamed: 32_level_0,Unnamed: 33_level_0,Unnamed: 34_level_0,Unnamed: 35_level_0,Unnamed: 36_level_0,Unnamed: 37_level_0,Unnamed: 38_level_0,Unnamed: 39_level_0,Unnamed: 40_level_0,Unnamed: 41_level_0,Unnamed: 42_level_0,Unnamed: 43_level_0,Unnamed: 44_level_0,Unnamed: 45_level_0,Unnamed: 46_level_0,Unnamed: 47_level_0,Unnamed: 48_level_0,Unnamed: 49_level_0,Unnamed: 50_level_0,Unnamed: 51_level_0,Unnamed: 52_level_0,Unnamed: 53_level_0,Unnamed: 54_level_0,Unnamed: 55_level_0,Unnamed: 56_level_0
locus,alleles,rsid,qual,filters,DP,the_actual_sum_total_DP,Sample1.GT,Sample1.DP,Sample1.PL,Sample1.AD,Sample1.low_DP,Sample2.GT,Sample2.DP,Sample2.PL,Sample2.AD,Sample2.low_DP,Sample3.GT,Sample3.DP,Sample3.PL,Sample3.AD,Sample3.low_DP,Sample4.GT,Sample4.DP,Sample4.PL,Sample4.AD,Sample4.low_DP,Sample5.GT,Sample5.DP,Sample5.PL,Sample5.AD,Sample5.low_DP,Sample6.GT,Sample6.DP,Sample6.PL,Sample6.AD,Sample6.low_DP,Sample7.GT,Sample7.DP,Sample7.PL,Sample7.AD,Sample7.low_DP,Sample8.GT,Sample8.DP,Sample8.PL,Sample8.AD,Sample8.low_DP,Sample9.GT,Sample9.DP,Sample9.PL,Sample9.AD,Sample9.low_DP,Sample10.GT,Sample10.DP,Sample10.PL,Sample10.AD,Sample10.low_DP
locus<GRCh38>,array<str>,str,float64,set<str>,int64,int64,call,int32,array<int32>,array<int32>,bool,call,int32,array<int32>,array<int32>,bool,call,int32,array<int32>,array<int32>,bool,call,int32,array<int32>,array<int32>,bool,call,int32,array<int32>,array<int32>,bool,call,int32,array<int32>,array<int32>,bool,call,int32,array<int32>,array<int32>,bool,call,int32,array<int32>,array<int32>,bool,call,int32,array<int32>,array<int32>,bool,call,int32,array<int32>,array<int32>,bool
chr1:100,"[""A"",""G""]","""rs1""",50.0,{},166,166,0/1,15,"[50,10,80]","[10,5]",False,0/1,17,"[60,40,0]","[5,12]",False,0/0,12,"[100,0,120]","[12,0]",True,0/1,14,"[20,30,100]","[5,9]",True,1/1,20,"[0,30,100]","[5,15]",False,0/0,16,"[110,0,40]","[16,0]",False,1/1,19,"[0,30,100]","[6,13]",False,0/1,18,"[30,20,0]","[10,8]",False,0/0,14,"[100,0,120]","[14,0]",True,0/1,21,"[40,20,0]","[8,13]",False
chr1:200,"[""C"",""T""]","""rs2""",30.0,{},133,133,0/0,10,"[90,0,120]","[10,0]",True,1/1,15,"[0,30,100]","[5,10]",False,0/1,14,"[60,30,0]","[10,4]",True,0/0,12,"[100,0,120]","[12,0]",True,0/1,16,"[0,30,100]","[5,11]",False,1/1,13,"[0,30,100]","[4,9]",True,0/1,11,"[60,30,0]","[5,6]",True,0/0,14,"[100,0,120]","[14,0]",True,1/1,18,"[0,30,100]","[5,13]",False,0/0,10,"[100,0,120]","[10,0]",True
chr1:300,"[""G"",""A""]","""rs3""",40.0,{},210,210,0/1,20,"[40,20,80]","[15,10]",False,0/0,22,"[100,0,120]","[22,0]",False,0/1,18,"[70,10,0]","[9,9]",False,0/1,21,"[20,30,100]","[11,10]",False,0/0,25,"[100,0,120]","[25,0]",False,1/1,23,"[0,40,80]","[5,18]",False,0/1,24,"[20,30,100]","[10,14]",False,0/0,18,"[100,0,120]","[18,0]",False,0/1,19,"[20,30,100]","[8,11]",False,1/1,20,"[0,30,100]","[5,15]",False
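The width of this table follows directly from the naming scheme: every sample contributes one field per entry field. A plain-Python sketch of how the field names multiply, using the row fields, entry fields, and sample IDs visible in this notebook (the lists are typed in by hand for illustration, not read from Hail):

```python
# Row fields as displayed above ("info" is shown through its nested DP field).
row_fields = ["locus", "alleles", "rsid", "qual", "filters", "info",
              "the_actual_sum_total_DP"]
entry_fields = ["GT", "DP", "PL", "AD", "low_DP"]
samples = [f"Sample{i}" for i in range(1, 11)]

# One flattened field per (sample, entry field) pair, joined with ".".
flattened = [f"{sample}.{field}" for sample in samples for field in entry_fields]
all_fields = row_fields + flattened

print(flattened[:5])
print(len(all_fields))
```

With ten samples and five entry fields the table already has 57 displayed columns, which is why this approach suits only matrix tables with few columns.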


In [62]:
xx.to_pandas()

Unnamed: 0,locus,alleles,rsid,qual,filters,info.DP,the_actual_sum_total_DP,Sample1.GT,Sample1.DP,Sample1.PL,...,Sample9.GT,Sample9.DP,Sample9.PL,Sample9.AD,Sample9.low_DP,Sample10.GT,Sample10.DP,Sample10.PL,Sample10.AD,Sample10.low_DP
0,chr1:100,"[A, G]",rs1,50.0,{},166,166,0/1,15,"[50, 10, 80]",...,0/0,14,"[100, 0, 120]","[14, 0]",True,0/1,21,"[40, 20, 0]","[8, 13]",False
1,chr1:200,"[C, T]",rs2,30.0,{},133,133,0/0,10,"[90, 0, 120]",...,1/1,18,"[0, 30, 100]","[5, 13]",False,0/0,10,"[100, 0, 120]","[10, 0]",True
2,chr1:300,"[G, A]",rs3,40.0,{},210,210,0/1,20,"[40, 20, 80]",...,0/1,19,"[20, 30, 100]","[8, 11]",False,1/1,20,"[0, 30, 100]","[5, 15]",False
3,chr2:150,"[T, C]",rs4,60.0,{},168,168,1/1,14,"[0, 30, 100]",...,0/1,15,"[0, 30, 100]","[5, 10]",False,0/1,19,"[20, 30, 100]","[9, 10]",False
4,chr2:250,"[A, G]",rs5,55.0,{},217,217,0/0,22,"[120, 0, 60]",...,1/1,24,"[0, 30, 100]","[5, 20]",False,0/0,25,"[100, 0, 120]","[25, 0]",False
5,chr2:350,"[C, T]",rs6,45.0,{},166,166,0/1,16,"[70, 10, 0]",...,0/0,17,"[100, 0, 120]","[17, 0]",False,1/1,20,"[0, 30, 100]","[5, 15]",False
6,chr3:120,"[G, A]",rs7,38.0,{},173,173,0/0,18,"[110, 0, 50]",...,0/1,18,"[30, 30, 0]","[10, 8]",False,1/1,15,"[0, 30, 100]","[5, 10]",False
7,chr3:220,"[T, C]",rs8,42.0,{},196,196,1/1,19,"[0, 40, 80]",...,0/1,19,"[20, 30, 0]","[7, 12]",False,0/1,22,"[20, 30, 100]","[11, 11]",False
8,chr3:320,"[A, G]",rs9,48.0,{},230,230,0/1,25,"[30, 20, 0]",...,0/1,22,"[20, 30, 0]","[10, 12]",False,1/1,24,"[0, 30, 100]","[5, 20]",False


[`MatrixTable.localize_entries`](https://hail.is/docs/0.2/hail.MatrixTable.html#hail.MatrixTable.localize_entries) is a confusingly named method which converts the entries into array row fields:

In [63]:
xx = mt
xx = xx.localize_entries('entries', 'columns')
xx.show()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,info,Unnamed: 6_level_0,Unnamed: 7_level_0
locus,alleles,rsid,qual,filters,DP,the_actual_sum_total_DP,entries
locus<GRCh38>,array<str>,str,float64,set<str>,int64,int64,"array<struct{GT: call, DP: int32, PL: array<int32>, AD: array<int32>, low_DP: bool}>"
chr1:100,"[""A"",""G""]","""rs1""",50.0,{},166,166,"[(0/1,15,[50,10,80],[10,5],False),(0/1,17,[60,40,0],[5,12],False),(0/0,12,[100,0,120],[12,0],True),(0/1,14,[20,30,100],[5,9],True),(1/1,20,[0,30,100],[5,15],False),(0/0,16,[110,0,40],[16,0],False),(1/1,19,[0,30,100],[6,13],False),(0/1,18,[30,20,0],[10,8],False),(0/0,14,[100,0,120],[14,0],True),(0/1,21,[40,20,0],[8,13],False)]"
chr1:200,"[""C"",""T""]","""rs2""",30.0,{},133,133,"[(0/0,10,[90,0,120],[10,0],True),(1/1,15,[0,30,100],[5,10],False),(0/1,14,[60,30,0],[10,4],True),(0/0,12,[100,0,120],[12,0],True),(0/1,16,[0,30,100],[5,11],False),(1/1,13,[0,30,100],[4,9],True),(0/1,11,[60,30,0],[5,6],True),(0/0,14,[100,0,120],[14,0],True),(1/1,18,[0,30,100],[5,13],False),(0/0,10,[100,0,120],[10,0],True)]"
chr1:300,"[""G"",""A""]","""rs3""",40.0,{},210,210,"[(0/1,20,[40,20,80],[15,10],False),(0/0,22,[100,0,120],[22,0],False),(0/1,18,[70,10,0],[9,9],False),(0/1,21,[20,30,100],[11,10],False),(0/0,25,[100,0,120],[25,0],False),(1/1,23,[0,40,80],[5,18],False),(0/1,24,[20,30,100],[10,14],False),(0/0,18,[100,0,120],[18,0],False),(0/1,19,[20,30,100],[8,11],False),(1/1,20,[0,30,100],[5,15],False)]"
chr2:150,"[""T"",""C""]","""rs4""",60.0,{},168,168,"[(1/1,14,[0,30,100],[5,9],True),(0/0,15,[100,0,120],[15,0],False),(0/1,18,[40,30,0],[10,8],False),(0/0,20,[100,0,120],[20,0],False),(1/1,16,[0,30,100],[5,11],False),(0/1,17,[0,30,100],[5,12],False),(1/1,18,[0,30,100],[6,13],False),(0/0,16,[100,0,120],[16,0],False),(0/1,15,[0,30,100],[5,10],False),(0/1,19,[20,30,100],[9,10],False)]"
chr2:250,"[""A"",""G""]","""rs5""",55.0,{},217,217,"[(0/0,22,[120,0,60],[20,2],False),(0/1,21,[50,30,0],[15,6],False),(1/1,23,[0,30,100],[5,18],False),(0/0,19,[100,0,120],[19,0],False),(0/1,20,[0,30,100],[5,15],False),(1/1,22,[0,30,100],[5,17],False),(0/0,23,[100,0,120],[23,0],False),(0/1,18,[20,30,0],[8,10],False),(1/1,24,[0,30,100],[5,20],False),(0/0,25,[100,0,120],[25,0],False)]"
chr2:350,"[""C"",""T""]","""rs6""",45.0,{},166,166,"[(0/1,16,[70,10,0],[8,8],False),(0/0,15,[100,0,120],[15,0],False),(0/0,14,[100,0,120],[14,0],True),(1/1,13,[0,30,100],[4,9],True),(0/1,19,[0,30,100],[6,13],False),(1/1,18,[0,30,100],[6,12],False),(0/1,16,[20,30,0],[7,9],False),(0/0,18,[100,0,120],[18,0],False),(0/0,17,[100,0,120],[17,0],False),(1/1,20,[0,30,100],[5,15],False)]"
chr3:120,"[""G"",""A""]","""rs7""",38.0,{},173,173,"[(0/0,18,[110,0,50],[18,0],False),(1/1,16,[0,30,100],[5,11],False),(0/0,17,[100,0,120],[17,0],False),(0/1,15,[30,30,0],[5,10],False),(0/1,20,[0,30,100],[5,14],False),(0/1,19,[0,30,100],[6,13],False),(1/1,18,[0,30,100],[6,12],False),(0/0,17,[100,0,120],[17,0],False),(0/1,18,[30,30,0],[10,8],False),(1/1,15,[0,30,100],[5,10],False)]"
chr3:220,"[""T"",""C""]","""rs8""",42.0,{},196,196,"[(1/1,19,[0,40,80],[6,13],False),(0/0,18,[100,0,120],[18,0],False),(0/1,20,[60,30,0],[12,8],False),(0/0,22,[100,0,120],[22,0],False),(1/1,21,[0,30,100],[5,16],False),(0/1,20,[0,30,100],[6,14],False),(1/1,18,[0,30,100],[6,12],False),(0/0,17,[100,0,120],[17,0],False),(0/1,19,[20,30,0],[7,12],False),(0/1,22,[20,30,100],[11,11],False)]"
chr3:320,"[""A"",""G""]","""rs9""",48.0,{},230,230,"[(0/1,25,[30,20,0],[14,11],False),(0/0,23,[100,0,120],[23,0],False),(0/1,24,[40,20,0],[13,11],False),(0/1,20,[20,30,100],[11,9],False),(0/0,22,[100,0,120],[22,0],False),(1/1,23,[0,30,100],[5,18],False),(0/1,26,[20,30,100],[12,14],False),(0/0,21,[100,0,120],[21,0],False),(0/1,22,[20,30,0],[10,12],False),(1/1,24,[0,30,100],[5,20],False)]"
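
To make the reshaping concrete, here is a minimal plain-Python sketch of the idea, not Hail's implementation. The `localize` helper and the sample IDs `sample_a` and `sample_b` are hypothetical; the entry values are taken from the first two entries of the first two rows shown above, and the column key `s` is assumed to be the one produced by `import_vcf`.

```python
def localize(rows, cols, entries, entries_field="entries", cols_field="columns"):
    """Fold a (row, column)-keyed grid of entries into per-row arrays.

    Returns (localized_rows, globals_dict). This is a conceptual model only.
    """
    localized = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        # One array element per column, in column order.
        new_row[entries_field] = [entries[(row["locus"], col["s"])] for col in cols]
        localized.append(new_row)
    # The column values no longer have an axis of their own, so they are
    # kept once, as a single global array.
    return localized, {cols_field: list(cols)}


rows = [{"locus": "chr1:100", "rsid": "rs1"}, {"locus": "chr1:200", "rsid": "rs2"}]
cols = [{"s": "sample_a"}, {"s": "sample_b"}]  # hypothetical sample IDs
entries = {
    ("chr1:100", "sample_a"): {"GT": "0/1", "DP": 15},
    ("chr1:100", "sample_b"): {"GT": "0/1", "DP": 17},
    ("chr1:200", "sample_a"): {"GT": "0/0", "DP": 10},
    ("chr1:200", "sample_b"): {"GT": "1/1", "DP": 15},
}

table_rows, table_globals = localize(rows, cols, entries)
for r in table_rows:
    print(r["locus"], r["entries"])
print(table_globals)
```

The position of an element in a row's `entries` array identifies its column, so the array only makes sense when read alongside the global `columns` array.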


In [64]:
xx.collect()

[Struct(locus=Locus(contig=chr1, position=100, reference_genome=GRCh38), alleles=['A', 'G'], rsid='rs1', qual=50.0, filters=set(), info=Struct(DP=166), the_actual_sum_total_DP=166, entries=[Struct(GT=Call(alleles=[0, 1], phased=False), DP=15, PL=[50, 10, 80], AD=[10, 5], low_DP=False), Struct(GT=Call(alleles=[0, 1], phased=False), DP=17, PL=[60, 40, 0], AD=[5, 12], low_DP=False), Struct(GT=Call(alleles=[0, 0], phased=False), DP=12, PL=[100, 0, 120], AD=[12, 0], low_DP=True), Struct(GT=Call(alleles=[0, 1], phased=False), DP=14, PL=[20, 30, 100], AD=[5, 9], low_DP=True), Struct(GT=Call(alleles=[1, 1], phased=False), DP=20, PL=[0, 30, 100], AD=[5, 15], low_DP=False), Struct(GT=Call(alleles=[0, 0], phased=False), DP=16, PL=[110, 0, 40], AD=[16, 0], low_DP=False), Struct(GT=Call(alleles=[1, 1], phased=False), DP=19, PL=[0, 30, 100], AD=[6, 13], low_DP=False), Struct(GT=Call(alleles=[0, 1], phased=False), DP=18, PL=[30, 20, 0], AD=[10, 8], low_DP=False), Struct(GT=Call(alleles=[0, 0], phased

Hail converts this table into a pandas DataFrame incorrectly: each element of the `entries` column shows the struct's field names (`GT`, `DP`, `PL`, `AD`, `low_DP`) rather than the field values. This is a [known bug](https://github.com/hail-is/hail/issues/13512) which will be fixed in a future version of Hail.

In [65]:
xx.to_pandas()

,locus,alleles,rsid,qual,filters,info.DP,the_actual_sum_total_DP,entries
0,chr1:100,"[A, G]",rs1,50.0,{},166,166,"[(GT, DP, PL, AD, low_DP), (GT, DP, PL, AD, lo..."
1,chr1:200,"[C, T]",rs2,30.0,{},133,133,"[(GT, DP, PL, AD, low_DP), (GT, DP, PL, AD, lo..."
2,chr1:300,"[G, A]",rs3,40.0,{},210,210,"[(GT, DP, PL, AD, low_DP), (GT, DP, PL, AD, lo..."
3,chr2:150,"[T, C]",rs4,60.0,{},168,168,"[(GT, DP, PL, AD, low_DP), (GT, DP, PL, AD, lo..."
4,chr2:250,"[A, G]",rs5,55.0,{},217,217,"[(GT, DP, PL, AD, low_DP), (GT, DP, PL, AD, lo..."
5,chr2:350,"[C, T]",rs6,45.0,{},166,166,"[(GT, DP, PL, AD, low_DP), (GT, DP, PL, AD, lo..."
6,chr3:120,"[G, A]",rs7,38.0,{},173,173,"[(GT, DP, PL, AD, low_DP), (GT, DP, PL, AD, lo..."
7,chr3:220,"[T, C]",rs8,42.0,{},196,196,"[(GT, DP, PL, AD, low_DP), (GT, DP, PL, AD, lo..."
8,chr3:320,"[A, G]",rs9,48.0,{},230,230,"[(GT, DP, PL, AD, low_DP), (GT, DP, PL, AD, lo..."
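
Until the bug is fixed, one possible workaround is to collect the rows and unpack the nested structs into plain Python containers before handing them to pandas. The sketch below is plain Python and rests on an assumption: that a collected Hail struct behaves like a read-only mapping. `FakeStruct` and `to_plain` are hypothetical names used only for illustration. The sketch also shows that iterating over a mapping yields its field names, which resembles the output above, although whether that is the actual cause of the bug is not established here.

```python
from collections.abc import Mapping


class FakeStruct(Mapping):
    """Hypothetical stand-in for a collected struct (a read-only mapping)."""

    def __init__(self, **fields):
        self._fields = fields

    def __getitem__(self, key):
        return self._fields[key]

    def __iter__(self):
        return iter(self._fields)

    def __len__(self):
        return len(self._fields)


def to_plain(value):
    """Recursively turn mappings into dicts and lists/tuples into lists."""
    if isinstance(value, Mapping):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    return value  # scalars and any other objects are passed through unchanged


row = FakeStruct(
    rsid="rs1",
    entries=[
        FakeStruct(DP=15, AD=[10, 5], low_DP=False),
        FakeStruct(DP=17, AD=[5, 12], low_DP=False),
    ],
)

# Iterating over a mapping yields its keys, i.e. the field names only.
names_only = tuple(row["entries"][0])
# Unpacking explicitly keeps the field values.
plain = to_plain(row)

print(names_only)
print(plain["entries"][0])
```

With real data, something along the lines of `pd.DataFrame([to_plain(r) for r in xx.collect()])` might serve as a stopgap. Because `collect()` brings every row to the local machine, this is only suitable for small tables.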
