# Pfizer 2023 Hail Workshop
## 04: Advanced Hail Functionality

This notebook is a grab bag of more advanced Hail functionality.

### Approximate CDF

Normally, computing quantiles or the median requires sorting the entire dataset. Hail instead uses a sophisticated data structure to compute provably good approximations of all quantiles without sorting the data, asking you to supply buckets, or using unbounded memory.

In [1]:
import os
# Give Hail a bunch of RAM
os.environ['PYSPARK_SUBMIT_ARGS'] = '--executor-memory 16G --driver-memory 16G pyspark-shell'

In [2]:
import hail as hl
hl.plot.output_notebook()

In [3]:
mt = hl.read_matrix_table('resources/1kg.mt')
mt = hl.variant_qc(mt)
cdf = mt.aggregate_rows(hl.agg.approx_cdf(mt.variant_qc.call_rate))
cdf

Initializing Hail with default parameters...


2023-01-11 14:45:40.378 WARN  NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


2023-01-11 14:45:42.185 WARN  Utils:69 - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
2023-01-11 14:45:42.186 WARN  Utils:69 - Service 'SparkUI' could not bind on port 4041. Attempting port 4042.


Running on Apache Spark version 3.1.3
SparkUI available at http://148.168.60.57:4042
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.107-2387bb00ceee
LOGGING: writing to /Users/dking/projects/pfizer-2023-01-11/04/hail-20230111-1445-0.2.107-2387bb00ceee.log

Struct(values=[0.5633802816901409, 0.6161971830985915, 0.6795774647887324, 0.6971830985915493, 0.8028169014084507, 0.8556338028169014, 0.8661971830985915, 0.9049295774647887, 0.9225352112676056, 0.9330985915492958, 0.9401408450704225, 0.9436619718309859, 0.9471830985915493, 0.9507042253521126, 0.954225352112676, 0.9577464788732394, 0.9612676056338029, 0.9647887323943662, 0.9683098591549296, 0.971830985915493, 0.9753521126760564, 0.9788732394366197, 0.9823943661971831, 0.9859154929577465, 0.9894366197183099, 0.9929577464788732, 0.9964788732394366, 1.0], ranks=[0, 1, 2, 3, 5, 9, 25, 57, 58, 122, 186, 250, 314, 378, 379, 444, 572, 700, 828, 956, 1212, 1597, 2109, 2621, 3645, 5053, 6909, 9149, 10879], _compaction_counts=[165, 61, 23, 8, 3, 1, 0, 0])
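One way to read the struct above, as a minimal sketch: `values` holds sorted sample points and `ranks` has one extra entry. I am assuming that `ranks[i + 1]` estimates how many observations are less than or equal to `values[i]` and that `ranks[-1]` is the total count. Under that assumption a quantile can be looked up with a binary search. The helper name `approx_quantile` is hypothetical and is not part of Hail.

```python
from bisect import bisect_left

def approx_quantile(values, ranks, q):
    """Look up the q-quantile from an approx_cdf-style summary.

    Assumes ranks[i + 1] estimates the number of observations <= values[i]
    and ranks[-1] is the total number of observations.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    target = q * ranks[-1]
    # Smallest i such that at least `target` observations are <= values[i].
    i = bisect_left(ranks, target, 1) - 1
    return values[min(i, len(values) - 1)]

# Toy summary: 10 observations, 2 of them <= 1.0, 5 <= 2.0, all 10 <= 3.0.
toy_values = [1.0, 2.0, 3.0]
toy_ranks = [0, 2, 5, 10]
print(approx_quantile(toy_values, toy_ranks, 0.5))
```

The same lookup could be applied to `cdf['values']` and `cdf['ranks']` from the cell above, with the caveat that the result is only as good as the summary's approximation.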

In [4]:
import bokeh.plotting as bp

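def plot_cdf(cdf, title):
    # `ranks` has one more entry than `values`, so repeat the last value
    # to give the step plot x and y arrays of equal length.
    values = cdf['values']
    values = values + [values[-1]]
    ranks = cdf['ranks']
    # Convert ranks to cumulative fractions by dividing by the total count.
    ranks = [x / ranks[-1] for x in ranks]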

    p = bp.figure(title=title, plot_width=400, plot_height=400)
    p.step(x=[0] + values, y=[0] + ranks, line_width=2, line_color='black')

    hl.plot.show(p)
    
plot_cdf(cdf, 'Approximate CDF of Call Rate')

In [5]:
mt = hl.read_matrix_table('resources/1kg.mt')
mt = hl.variant_qc(mt)
cdf = mt.aggregate_rows(hl.agg.approx_cdf(mt.variant_qc.AF[0]))
plot_cdf(cdf, 'Approximate CDF of Reference AF')

You can also ask directly for the median:

In [6]:
mt.aggregate_rows(hl.agg.approx_median(mt.variant_qc.AF[0]))

0.7307692307692307

### PCA on Unusual Values

Flexible, general-purpose methods enable analysts to explore data sets with novel statistics.

In [7]:
mt = hl.read_matrix_table('resources/1kg.mt')
mt = mt.filter_rows(hl.agg.any(hl.is_missing(mt.GT)))
mt = mt.annotate_entries(
    is_missing = hl.is_missing(mt.GT)
)
mt = mt.annotate_rows(
    is_missing_stats = hl.agg.stats(mt.is_missing)
)
mt = mt.annotate_entries(
    normed_is_missing = (mt.is_missing - mt.is_missing_stats.mean) / mt.is_missing_stats.stdev
)
_, scores, _ = hl.pca(mt.normed_is_missing, k=2)
hl.plot.show(hl.plot.scatter(scores.scores[0], scores.scores[1]))

2023-01-11 14:47:26.961 Hail: INFO: pca: running PCA with 2 components...
2023-01-11 14:47:32.481 Hail: INFO: wrote table with 0 rows in 0 partitions to /tmp/persist_tablet5ConpVaEV
    Total size: 6.26 KiB
    * Rows: 0.00 B
    * Globals: 6.26 KiB
    * Smallest partition: N/A
    * Largest partition:  N/A


### LD Prune

In [9]:
?hl.ld_prune

In [8]:
mt = hl.read_matrix_table('resources/qced-hgdp-1kg.mt')
print(f'Before pruning we have: {mt.count_rows()}')
pruned_variants = hl.ld_prune(mt.GT)
pruned_variants.write('output/pruned_variants.ht', overwrite=True)
pruned_variants = hl.read_table('output/pruned_variants.ht')

mt = mt.filter_rows(hl.is_defined(pruned_variants[mt.row_key]))
print(f'After pruning we have: {mt.count_rows()}')

Before pruning we have: 19151


2023-01-11 14:48:09.127 Hail: INFO: ld_prune: running local pruning stage with max queue size of 61681 variants
2023-01-11 14:48:19.298 Hail: INFO: wrote table with 2389 rows in 100 partitions to /tmp/persist_tabletUZWOBnbAf
    Total size: 95.93 KiB
    * Rows: 95.92 KiB
    * Globals: 11.00 B
    * Smallest partition: 0 rows (21.00 B)
    * Largest partition:  126 rows (4.49 KiB)
2023-01-11 14:48:24.414 Hail: INFO: wrote table with 2389 rows in 100 partitions to /tmp/1GbEd3sczwrKGSu4mV68JG
2023-01-11 14:48:38.503 Hail: INFO: Wrote all 2 blocks of 2389 x 4151 matrix with block size 4096.
2023-01-11 14:48:48.126 Hail: INFO: wrote table with 1027 rows in 1 partition to /tmp/ipcajvgdPpB5J74dF3IrLO
    Total size: 49.82 KiB
    * Rows: 15.05 KiB
    * Globals: 34.77 KiB
    * Smallest partition: 1027 rows (15.05 KiB)
    * Largest partition:  1027 rows (15.05 KiB)
2023-01-11 14:48:54.138 Hail: INFO: wrote table with 1920 rows in 100 partitions to /tmp/persist_tablehddJ5XwGbw
2023-01-11 14

After pruning we have: 1920
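To give a feel for what LD pruning does, here is a deliberately simplified greedy sketch in plain Python. It is not Hail's algorithm (the log above shows a local pruning stage followed by further block-matrix work), and the names `r_squared` and `greedy_prune` are hypothetical. The default threshold and window are chosen to match what I believe are the defaults of `hl.ld_prune`; treat them as assumptions.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # monomorphic variant: correlation undefined, treat as 0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

def greedy_prune(variants, r2=0.2, bp_window_size=1_000_000):
    """Keep a variant only if it is not in LD with an already-kept neighbour.

    `variants` is a position-sorted list of (position, genotypes) on one contig.
    """
    kept = []
    for pos, gts in variants:
        in_ld = any(
            pos - kept_pos <= bp_window_size and r_squared(gts, kept_gts) > r2
            for kept_pos, kept_gts in kept
        )
        if not in_ld:
            kept.append((pos, gts))
    return [pos for pos, _ in kept]

toy_variants = [
    (100, [0, 1, 2, 1]),
    (200, [0, 1, 2, 1]),        # identical to the first variant: pruned
    (300, [2, 0, 1, 0]),        # weakly correlated with the first: kept
    (5_000_000, [0, 1, 2, 1]),  # identical, but outside the window: kept
]
pruned_positions = greedy_prune(toy_variants)
print(pruned_positions)
```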


### Kinship Estimators

Hail supports a number of different kinship estimators.

Getting PC Relate to produce good-looking results is tricky! Here we see what happens when the variants are not quality-controlled well enough.

In [10]:
mt = hl.read_matrix_table('resources/qced-hgdp-1kg.mt')

pc_kin = hl.pc_relate(mt.GT, 0.1, k=4, statistics='kin20', min_kinship=0.1)
pc_kin.write('output/pc_kin.ht', overwrite=True)
pc_kin = hl.read_table('output/pc_kin.ht')

hl.plot.show(
    hl.plot.scatter(
        pc_kin.kin,
        pc_kin.ibd0,
        width=400,
        height=400,
        size=3
    )
)

2023-01-11 14:49:56.437 Hail: INFO: hwe_normalize: found 19151 variants after filtering out monomorphic sites.
2023-01-11 14:50:00.494 Hail: INFO: pca: running PCA with 4 components...
2023-01-11 14:50:14.420 Hail: INFO: wrote table with 0 rows in 0 partitions to /tmp/persist_tablerWSLMmzZmL
    Total size: 155.63 KiB
    * Rows: 0.00 B
    * Globals: 155.63 KiB
    * Smallest partition: N/A
    * Largest partition:  N/A
2023-01-11 14:50:23.619 Hail: INFO: Wrote all 10 blocks of 19151 x 4151 matrix with block size 4096.
2023-01-11 14:50:24.968 Hail: INFO: wrote matrix with 5 rows and 19151 columns as 5 blocks of size 4096 to /tmp/pcrelate-write-read-7w2g7F0nv3EpAzKdGPmnfo.bm
2023-01-11 14:50:25.467 Hail: INFO: wrote matrix with 19151 rows and 4151 columns as 10 blocks of size 4096 to /tmp/pcrelate-write-read-6ojsB8LWNAOEUtbs29nrcK.bm
2023-01-11 14:50:27.113 Hail: INFO: wrote matrix with 19151 rows and 4151 columns as 10 blocks of size 4096 to /tmp/pcrelate-write-read-su6UN1FqnMY

In [11]:
mt = hl.read_matrix_table('resources/qced-hgdp-1kg.mt')

king_kin = hl.king(mt.GT)
king_kin = king_kin.filter_entries(king_kin.phi > 0.1).entries()
king_kin.write('output/king_kin.ht', overwrite=True)
king_kin = hl.read_table('output/king_kin.ht')

hl.plot.show(
    hl.plot.histogram(
        king_kin.phi
    )
)

2023-01-11 14:51:36.891 Hail: INFO: Wrote all 10 blocks of 19151 x 4151 matrix with block size 4096.
2023-01-11 14:51:42.811 Hail: INFO: Wrote all 10 blocks of 19151 x 4151 matrix with block size 4096.
2023-01-11 14:51:49.255 Hail: INFO: Wrote all 10 blocks of 19151 x 4151 matrix with block size 4096.
2023-01-11 14:51:54.697 Hail: INFO: Wrote all 10 blocks of 19151 x 4151 matrix with block size 4096.
2023-01-11 14:52:02.657 Hail: INFO: wrote matrix with 4151 rows and 4151 columns as 4 blocks of size 4096 to /tmp/icEy71iOgKWyKp1oEnTS1n
2023-01-11 14:52:10.180 Hail: INFO: wrote matrix with 4151 rows and 4151 columns as 4 blocks of size 4096 to /tmp/d3IhDPkISALQjZkBvHRzbj
2023-01-11 14:52:17.277 Hail: INFO: wrote matrix with 4151 rows and 4151 columns as 4 blocks of size 4096 to /tmp/DBKFebgFovXoyisCBglL5x
2023-01-11 14:52:18.300 Hail: INFO: wrote matrix with 4151 rows and 4151 columns as 4 blocks of size 4096 to /tmp/bbhFh5OIsvuDH2qTG6UNGz
2023-01-11 14:52:18.993 Hail: INFO: wrote matrix

In [19]:
king_kin.filter(king_kin.phi < 0.45).show()

s_1,s,phi
str,str,float64
"""CHMI_CHMI3_WGS2""","""HG00114""",0.13
"""CHMI_CHMI3_WGS2""","""HG00183""",0.106
"""CHMI_CHMI3_WGS2""","""HG00243""",0.101
"""CHMI_CHMI3_WGS2""","""HG00246""",0.101
"""CHMI_CHMI3_WGS2""","""HG00282""",0.103
"""CHMI_CHMI3_WGS2""","""HG00308""",0.116
"""CHMI_CHMI3_WGS2""","""HG00311""",0.104
"""CHMI_CHMI3_WGS2""","""HG00351""",0.125
"""CHMI_CHMI3_WGS2""","""HG00639""",0.116
"""CHMI_CHMI3_WGS2""","""HG00640""",0.1


Hail also supports identity-by-descent calculation, but it is currently broken on the new Apple M1 chips because it relies on fast native code that has not yet been compiled for M1. Expect a fix soon!
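Returning to the KING estimate computed earlier in this section, the following plain-Python sketch shows roughly what a pairwise `phi` measures. I am assuming the KING-robust between-family form, in which the squared genotype differences are scaled by the smaller of the two samples' heterozygote counts; the function name `king_robust_phi` is hypothetical and this is not Hail's implementation.

```python
def king_robust_phi(g_i, g_j):
    """Pairwise kinship from alt-allele counts (0, 1, 2; None = missing)."""
    sq_diff = 0
    het_i = 0
    het_j = 0
    for a, b in zip(g_i, g_j):
        if a is None or b is None:
            continue  # use only variants called in both samples
        sq_diff += (a - b) ** 2
        het_i += a == 1
        het_j += b == 1
    denom = min(het_i, het_j)
    if denom == 0:
        return None  # undefined without heterozygous calls
    return 0.5 - sq_diff / (4 * denom)

same = [0, 1, 2, 1, 0]
print(king_robust_phi(same, same))
print(king_robust_phi([1, 1, 0, 2], [1, 0, 0, 2]))
```

Under this form, identical genotype vectors give 0.5, which is presumably why the `phi < 0.45` filter above is useful for hiding self-pairs and duplicates.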

### Polygenic Score Calculation

In this section, I import a height polygenic score from the [PGS Catalog](https://www.pgscatalog.org/score/PGS000297/), and use it to calculate the polygenic score in our toy dataset. Our toy dataset does not have enough shared variants with the score to produce useful estimates, but the code below could be effectively applied to a larger, quality-controlled dataset.
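Before the Hail version, here is the scoring logic as a minimal plain-Python sketch for a single sample. The function name `polygenic_score`, the dictionary keys, and the toy numbers are all hypothetical. The idea is to count effect alleles (flipping the count when the effect allele is the reference allele), impute missing calls with the cohort mean, and skip variants whose alleles do not match the score.

```python
def polygenic_score(variants, sample_calls):
    """Weighted sum of effect-allele counts for one sample.

    `variants`: dicts with keys ref, alt, alt_af, effect_allele, effect_weight.
    `sample_calls`: alt-allele counts (0, 1, 2) per variant, None when missing.
    """
    total = 0.0
    for variant, n_alt in zip(variants, sample_calls):
        if variant["effect_allele"] == variant["ref"]:
            dosage = None if n_alt is None else 2 - n_alt
            mean_dosage = 2 * (1 - variant["alt_af"])
        elif variant["effect_allele"] == variant["alt"]:
            dosage = n_alt
            mean_dosage = 2 * variant["alt_af"]
        else:
            continue  # effect allele matches neither allele: skip the variant
        if dosage is None:
            dosage = mean_dosage  # impute a missing call with the cohort mean
        total += variant["effect_weight"] * dosage
    return total

toy_score = [
    {"ref": "A", "alt": "G", "alt_af": 0.25, "effect_allele": "G", "effect_weight": 0.02},
    {"ref": "C", "alt": "T", "alt_af": 0.5, "effect_allele": "C", "effect_weight": 0.01},
    {"ref": "G", "alt": "T", "alt_af": 0.1, "effect_allele": "A", "effect_weight": 0.5},
]
print(polygenic_score(toy_score, [1, 0, 2]))
print(polygenic_score(toy_score, [None, None, 1]))
```

The sketch assumes biallelic variants and does not attempt strand-flip handling.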

In [13]:
ht = hl.import_table('resources/height-polygenic-score.txt', comment='#', impute=True)
ht = ht.key_by(
    locus = hl.locus(hl.str(ht.chr_name), ht.chr_position)
)
ht.write('output/height-polygenic-score.ht', overwrite=True)

2023-01-11 14:53:25.098 Hail: INFO: wrote table with 3291 rows in 1 partition to /tmp/persist_tableivSmIMEc2i
2023-01-11 14:53:26.114 Hail: INFO: Reading table to impute column types
2023-01-11 14:53:27.469 Hail: INFO: Finished type imputation
  Loading field 'rsID' as type str (imputed)
  Loading field 'chr_name' as type int32 (imputed)
  Loading field 'chr_position' as type int32 (imputed)
  Loading field 'effect_allele' as type str (imputed)
  Loading field 'other_allele' as type str (imputed)
  Loading field 'effect_weight' as type float64 (imputed)
  Loading field 'variant_description' as type str (imputed)
2023-01-11 14:53:28.156 Hail: INFO: Coerced sorted dataset
2023-01-11 14:53:28.995 Hail: INFO: wrote table with 3290 rows in 1 partition to output/height-polygenic-score.ht


In [14]:
ht = hl.read_table('output/height-polygenic-score.ht')

In [15]:
ht.show()

rsID,chr_name,chr_position,effect_allele,other_allele,effect_weight,variant_description,locus
str,int32,int32,str,str,float64,str,locus<GRCh37>
"""rs385039""",1,2077409,"""G""","""A""",0.0205,"""lead SNPs""",1:2077409
"""rs12743493""",1,2224836,"""A""","""G""",0.0139,"""lead SNPs""",1:2224836
"""rs2843146""",1,2265969,"""T""","""G""",0.00537,"""secondary""",1:2265969
"""rs4648832""",1,2286127,"""G""","""C""",0.00918,"""secondary""",1:2286127
"""rs16823193""",1,2847985,"""T""","""A""",0.0111,"""lead SNPs""",1:2847985
"""rs6704012""",1,3038530,"""C""","""T""",0.00842,"""secondary""",1:3038530
"""rs10909940""",1,3316756,"""A""","""G""",0.0113,"""lead SNPs""",1:3316756
"""rs17404435""",1,3766286,"""G""","""T""",0.00908,"""lead SNPs""",1:3766286
"""rs12046884""",1,6594561,"""G""","""A""",0.015,"""lead SNPs""",1:6594561
"""rs28624""",1,8084355,"""C""","""A""",0.0164,"""lead SNPs""",1:8084355


In [16]:
mt = hl.read_matrix_table('resources/1kg.mt')
mt = hl.variant_qc(mt)
mt = mt.annotate_rows(score=ht[mt.locus])

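# The score's effect allele may be our reference allele ("flipped") or our
# alternate allele; variants matching neither get a missing flag.
mt = mt.annotate_rows(is_flipped = (
    hl.case()
    .when(mt.score.effect_allele == mt.alleles[0], True)
    .when(mt.score.effect_allele == mt.alleles[1], False)
    .or_missing()
))
# Expected effect-allele count per sample (twice the effect-allele frequency),
# used below to impute missing genotypes.
mt = mt.annotate_rows(
    mean_gt=2 * hl.if_else(mt.is_flipped, mt.variant_qc.AF[0], mt.variant_qc.AF[1])
)
mt = mt.annotate_entries(
    n_effect_alleles = hl.if_else(
        mt.is_flipped,
        2 - mt.GT.n_alt_alleles(),
        mt.GT.n_alt_alleles()
    )
)
# n_useful_variants counts every variant whose locus appears in the score,
# including any whose alleles match neither allele of the score.
mt = mt.annotate_cols(
    prs = hl.agg.sum(mt.score.effect_weight * hl.coalesce(mt.n_effect_alleles, mt.mean_gt)),
    n_useful_variants = hl.agg.sum(hl.is_defined(mt.score.effect_weight))
)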
mt.cols().show()


s,prs,n_useful_variants
str,float64,int64
"""HG00096""",0.297,13
"""HG00099""",0.327,13
"""HG00105""",0.227,13
"""HG00118""",0.283,13
"""HG00129""",0.316,13
"""HG00148""",0.259,13
"""HG00177""",0.341,13
"""HG00182""",0.403,13
"""HG00242""",0.359,13
"""HG00254""",0.231,13


### LD Score

Hail also has utilities for simulating phenotypes, calculating LD Scores, and running LD Score regression.
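As a rough illustration of what simulating a phenotype with a target heritability means, here is a plain-Python sketch. I am assuming an infinitesimal model in which every standardized variant gets a small normally distributed effect with variance `h2 / M` and the remaining variance is noise; whether this matches the default behaviour of Hail's simulator is an assumption on my part. The function name `simulate_phenotype` and the toy genotypes are hypothetical.

```python
import random
from statistics import mean, pstdev

def simulate_phenotype(genotypes, h2, seed=0):
    """Simulate y = genetic component + noise with target heritability h2.

    `genotypes[s][v]` is the alt-allele count of sample s at variant v.
    """
    rng = random.Random(seed)
    n_samples, n_variants = len(genotypes), len(genotypes[0])
    # Standardize each variant to mean 0 and variance 1 across samples.
    cols = []
    for v in range(n_variants):
        col = [genotypes[s][v] for s in range(n_samples)]
        mu, sd = mean(col), pstdev(col)
        cols.append([(g - mu) / sd if sd > 0 else 0.0 for g in col])
    # Each variant gets a small effect with variance h2 / n_variants.
    betas = [rng.gauss(0.0, (h2 / n_variants) ** 0.5) for _ in range(n_variants)]
    noise_sd = (1.0 - h2) ** 0.5
    return [
        sum(betas[v] * cols[v][s] for v in range(n_variants)) + rng.gauss(0.0, noise_sd)
        for s in range(n_samples)
    ]

# Toy genotypes: 50 samples by 20 variants with random allele counts.
toy_rng = random.Random(1)
toy_genotypes = [[toy_rng.choice([0, 1, 2]) for _ in range(20)] for _ in range(50)]
y = simulate_phenotype(toy_genotypes, h2=0.5, seed=42)
print(y[:5])
```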

In [20]:
mt = hl.read_matrix_table('resources/qced-hgdp-1kg.mt')
mt = hl.experimental.ldscsim.simulate_phenotypes(mt, mt.GT, h2=0.5)
mt.y.show()

calculating phenotype




s,y
str,float64
"""CHMI_CHMI3_WGS2""",-0.461
"""LP6005441-DNA_F08""",-1.4
"""HGDP00843""",0.159
"""HGDP00392""",0.312
"""LP6005441-DNA_H03""",-0.834
"""HGDP00544""",0.566
"""HGDP01053""",0.00525
"""HGDP00191""",-0.126
"""HGDP01399""",2.07
"""LP6005441-DNA_C05""",-0.609


In [21]:
betas = hl.linear_regression_rows(y=mt.y, x=mt.GT.n_alt_alleles(), covariates=[1.0])

2023-01-11 14:56:13.312 Hail: INFO: linear_regression_rows: running on 4151 samples for 1 response variable y,
    with input variable x, and 1 additional covariate...
2023-01-11 14:56:20.075 Hail: INFO: wrote table with 19147 rows in 100 partitions to /tmp/persist_table4gVjdpdl74
    Total size: 1.22 MiB
    * Rows: 1.22 MiB
    * Globals: 11.00 B
    * Smallest partition: 0 rows (21.00 B)
    * Largest partition:  517 rows (32.20 KiB)


In [22]:
betas.show()

locus,alleles,n,sum_x,y_transpose_x,beta,standard_error,t_stat,p_value
locus<GRCh38>,array<str>,int32,float64,float64,float64,float64,float64,float64
chr1:14464,"[""A"",""T""]",4151,999.0,27.5,0.0515,0.0352,1.46,0.143
chr1:16298,"[""C"",""T""]",4151,4450.0,-115.0,-0.111,0.0404,-2.75,0.00604
chr1:16378,"[""T"",""C""]",4151,4040.0,-71.4,-0.0881,0.0577,-1.53,0.127
chr1:16487,"[""T"",""C""]",4151,706.0,-51.8,-0.0745,0.0401,-1.86,0.063
chr1:16495,"[""G"",""C""]",4151,4080.0,-55.3,-0.122,0.118,-1.03,0.302
chr1:16841,"[""G"",""T""]",4151,501.0,-7.69,-0.00444,0.0468,-0.0948,0.924
chr1:17020,"[""G"",""A""]",4151,1760.0,-22.8,-0.00258,0.0311,-0.083,0.934
chr1:17147,"[""G"",""A""]",4151,473.0,-3.86,0.00397,0.0484,0.082,0.935
chr1:17385,"[""G"",""A""]",4151,1860.0,-12.4,0.00905,0.0306,0.296,0.768
chr1:17407,"[""G"",""A""]",4151,538.0,18.0,0.0523,0.0451,1.16,0.246


In [28]:
?hl.experimental.ld_score

In [23]:
ht_scores = hl.experimental.ld_score(entry_expr=mt.GT.n_alt_alleles(),
                                     locus_expr=mt.locus,
                                     radius=1e6)


betas = betas.annotate(z_score = betas.beta / betas.standard_error)
betas = betas.annotate(chi_sq_statistic = betas.z_score ** 2)

ht = mt.rows()

ht_results = hl.experimental.ld_score_regression(
    weight_expr=ht_scores[ht.locus].univariate,
    ld_score_expr=ht_scores[ht.locus].univariate,
    chi_sq_exprs=betas[ht.key].chi_sq_statistic,
    n_samples_exprs=betas[ht.key].n
)

2023-01-11 14:56:36.015 Hail: INFO: Wrote all 10 blocks of 19147 x 4151 matrix with block size 4096.
2023-01-11 14:57:01.615 Hail: INFO: wrote matrix with 19147 rows and 19147 columns as 15 blocks of size 4096 to /tmp/xSPnI5xVVcn6XeJAiRzWim
2023-01-11 14:57:02.961 Hail: INFO: wrote matrix with 19147 rows and 1 column as 5 blocks of size 4096 to /tmp/UNmxNYTaoSg6YI5sKo7HFZ
2023-01-11 14:57:03.654 Hail: INFO: merging 5 files totalling 343.7K...
2023-01-11 14:57:03.683 Hail: INFO: while writing:
    /tmp/4fGMYYqi3jxceDnVUkwocF
  merge time: 27.676ms
2023-01-11 14:57:04.704 Hail: INFO: wrote table with 19147 rows in 1 partition to /tmp/persist_tablekM4gtc1o6v
2023-01-11 14:57:05.116 Hail: INFO: Reading table to impute column types
2023-01-11 14:57:05.619 Hail: INFO: Finished type imputation
  Loading field 'f0' as type float64 (imputed)
2023-01-11 14:57:14.284 Hail: INFO: Coerced sorted dataset
2023-01-11 14:57:21.470 Hail: INFO: Coerced sorted dataset
2023-01-11 14:5

In [25]:
ht_results.write('output/ldsr.ht', overwrite=True)
ldsr = hl.read_table('output/ldsr.ht')
ldsr.show()

2023-01-11 14:58:29.997 Hail: INFO: wrote table with 1 row in 1 partition to output/ldsr.ht


phenotype,mean_chi_sq,intercept.estimate,intercept.standard_error,snp_heritability.estimate,snp_heritability.standard_error
int32,float64,float64,float64,float64,float64
0,10.8,1.53,0.189,0.345,0.0258
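To connect the `intercept` and `snp_heritability` columns to the inputs, here is a stripped-down sketch of the regression idea. Under the usual LD Score regression model, the expected chi-squared statistic of a variant is linear in its LD score, with slope `N * h2 / M` for `N` samples and `M` variants. The sketch fits that line by ordinary, unweighted least squares; as far as I understand, Hail's `ld_score_regression` additionally applies regression weights and estimates standard errors by resampling, none of which is reproduced here. The name `ldsc_ols` and the toy numbers are hypothetical.

```python
def ldsc_ols(chi_sq, ld_scores, n_samples, n_variants):
    """Unweighted least-squares fit of chi_sq ~ intercept + slope * ld_score.

    Under the LD Score regression model, slope = n_samples * h2 / n_variants.
    """
    m = len(chi_sq)
    mean_score = sum(ld_scores) / m
    mean_chi = sum(chi_sq) / m
    sxx = sum((s - mean_score) ** 2 for s in ld_scores)
    sxy = sum((s - mean_score) * (c - mean_chi) for s, c in zip(ld_scores, chi_sq))
    slope = sxy / sxx
    intercept = mean_chi - slope * mean_score
    h2 = slope * n_variants / n_samples
    return intercept, h2

# Toy data generated exactly from the model with intercept 1.0 and h2 = 0.5.
n, m_variants = 1000, 4
toy_ld = [1.0, 2.0, 4.0, 8.0]
toy_chi = [1.0 + n * 0.5 / m_variants * s for s in toy_ld]
intercept, h2 = ldsc_ols(toy_chi, toy_ld, n, m_variants)
print(intercept, h2)
```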


### Annotation Database

The Hail team maintains a database of common variant annotations in Google Cloud Storage and S3. These commands will only work when executed inside a cluster with access to Google Cloud Storage or S3.

A full list of available annotations can be found [in the Hail docs](https://hail.is/docs/0.2/annotation_database_ui.html).

In [None]:
mt = hl.read_matrix_table('resources/1kg.mt')

db = hl.experimental.DB(region='us', cloud='aws')
mt = db.annotate_rows_db(
    mt,
    'CADD', 'GTEx_eQTL_Adipose_Subcutaneous_all_snp_gene_associations', 'gnomad_ld_scores_afr'
)
mt.rows().show()

### VEP

Hail also supports VEP annotation. This requires a specially configured cluster.

In [None]:
mt = hl.read_matrix_table('resources/1kg.mt')
mt = hl.vep(mt)
mt.vep.show()

In [26]:
mt = hl.read_matrix_table('resources/qced-hgdp-1kg.mt')
mt.vep.show()

locus,alleles,vep.assembly_name,vep.allele_string,vep.ancestral,vep.context,vep.end,vep.id,vep.input,vep.intergenic_consequences,vep.most_severe_consequence,vep.motif_feature_consequences,vep.regulatory_feature_consequences,vep.seq_region_name,vep.start,vep.strand,vep.transcript_consequences,vep.variant_class
locus<GRCh38>,array<str>,str,str,str,str,int32,str,str,"array<struct{allele_num: int32, consequence_terms: array<str>, impact: str, minimised: int32, variant_allele: str}>",str,"array<struct{allele_num: int32, consequence_terms: array<str>, high_inf_pos: str, impact: str, minimised: int32, motif_feature_id: str, motif_name: str, motif_pos: int32, motif_score_change: float64, strand: int32, variant_allele: str}>","array<struct{allele_num: int32, biotype: str, consequence_terms: array<str>, impact: str, minimised: int32, regulatory_feature_id: str, variant_allele: str}>",str,int32,int32,"array<struct{allele_num: int32, amino_acids: str, appris: str, biotype: str, canonical: int32, ccds: str, cdna_start: int32, cdna_end: int32, cds_end: int32, cds_start: int32, codons: str, consequence_terms: array<str>, distance: int32, domains: array<struct{db: str, name: str}>, exon: str, gene_id: str, gene_pheno: int32, gene_symbol: str, gene_symbol_source: str, hgnc_id: str, hgvsc: str, hgvsp: str, hgvs_offset: int32, impact: str, intron: str, lof: str, lof_flags: str, lof_filter: str, lof_info: str, minimised: int32, polyphen_prediction: str, polyphen_score: float64, protein_end: int32, protein_start: int32, protein_id: str, sift_prediction: str, sift_score: float64, strand: int32, swissprot: str, transcript_id: str, trembl: str, tsl: int32, uniparc: str, variant_allele: str}>",str
chr1:14464,"[""A"",""T""]","""GRCh38""","""A/T""",,,14464,""".""","""chr1	14464	.	A	T	.	.	GT""",,"""non_coding_transcript_exon_variant""",,,"""chr1""",14464,1,"[(1,NA,NA,""transcribed_unprocessed_pseudogene"",NA,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],794,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000450305"",NA,NA,NA,""T""),(1,NA,NA,""processed_transcript"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],55,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000456328"",NA,1,NA,""T""),(1,NA,NA,""unprocessed_pseudogene"",1,NA,1291,1291,NA,NA,NA,[""non_coding_transcript_exon_variant""],NA,NA,""11/11"",""ENSG00000227232"",NA,""WASH7P"",""HGNC"",""HGNC:38034"",""ENST00000488147.1:n.1291T>A"",NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000488147"",NA,NA,NA,""T""),(1,NA,NA,""miRNA"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2905,NA,NA,""ENSG00000278267"",NA,""MIR6859-1"",""HGNC"",""HGNC:50039"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000619216"",NA,NA,NA,""T""),(1,NA,NA,""transcribed_pseudogene"",1,NA,1667,1667,NA,NA,NA,[""non_coding_transcript_exon_variant""],NA,NA,""11/11"",""653635"",NA,""WASH7P"",""EntrezGene"",""HGNC:38034"",""NR_024540.1:n.1667T>A"",NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_024540.1"",NA,NA,NA,""T""),(1,NA,NA,""transcribed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],55,NA,NA,""100287102"",NA,""DDX11L1"",""EntrezGene"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""NR_046018.2"",NA,NA,NA,""T""),(1,NA,NA,""miRNA"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2905,NA,NA,""102466751"",NA,""MIR6859-1"",""EntrezGene"",""HGNC:50039"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_106918
.1"",NA,NA,NA,""T"")]","""SNV"""
chr1:16298,"[""C"",""T""]","""GRCh38""","""C/T""",,,16298,""".""","""chr1	16298	.	C	T	.	.	GT""",,"""intron_variant""","[(1,[""TF_binding_site_variant""],""Y"",""MODIFIER"",NA,""ENSM00205579305"",""ENSPFM0042"",4,-4.40e-02,-1,""T"")]","[(1,""CTCF_binding_site"",[""regulatory_region_variant""],""MODIFIER"",NA,""ENSR00000344266"",""T""),(1,""TF_binding_site"",[""regulatory_region_variant""],""MODIFIER"",NA,""ENSR00000918273"",""T"")]","""chr1""",16298,1,"[(1,NA,NA,""transcribed_unprocessed_pseudogene"",NA,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2628,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000450305"",NA,NA,NA,""T""),(1,NA,NA,""processed_transcript"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],1889,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000456328"",NA,1,NA,""T""),(1,NA,NA,""unprocessed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""intron_variant"",""non_coding_transcript_variant""],NA,NA,NA,""ENSG00000227232"",NA,""WASH7P"",""HGNC"",""HGNC:38034"",""ENST00000488147.1:n.1067+309G>A"",NA,NA,""MODIFIER"",""8/10"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000488147"",NA,NA,NA,""T""),(1,NA,NA,""miRNA"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],1071,NA,NA,""ENSG00000278267"",NA,""MIR6859-1"",""HGNC"",""HGNC:50039"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000619216"",NA,NA,NA,""T""),(1,NA,NA,""transcribed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""intron_variant"",""non_coding_transcript_variant""],NA,NA,NA,""653635"",NA,""WASH7P"",""EntrezGene"",""HGNC:38034"",""NR_024540.1:n.1080+309G>A"",NA,NA,""MODIFIER"",""8/10"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_024540.1"",NA,NA,NA,""T""),(1,NA,NA,""transcribed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],1889,NA,NA,""100287102"",NA,""DDX11L1"",""EntrezGene"",""HGN
C:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""NR_046018.2"",NA,NA,NA,""T""),(1,NA,NA,""miRNA"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],1071,NA,NA,""102466751"",NA,""MIR6859-1"",""EntrezGene"",""HGNC:50039"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_106918.1"",NA,NA,NA,""T"")]","""SNV"""
chr1:16378,"[""T"",""C""]","""GRCh38""","""T/C""",,,16378,""".""","""chr1	16378	.	T	C	.	.	GT""",,"""intron_variant""",,"[(1,""CTCF_binding_site"",[""regulatory_region_variant""],""MODIFIER"",NA,""ENSR00000344266"",""C""),(1,""TF_binding_site"",[""regulatory_region_variant""],""MODIFIER"",NA,""ENSR00000918273"",""C"")]","""chr1""",16378,1,"[(1,NA,NA,""transcribed_unprocessed_pseudogene"",NA,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2708,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000450305"",NA,NA,NA,""C""),(1,NA,NA,""processed_transcript"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],1969,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000456328"",NA,1,NA,""C""),(1,NA,NA,""unprocessed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""intron_variant"",""non_coding_transcript_variant""],NA,NA,NA,""ENSG00000227232"",NA,""WASH7P"",""HGNC"",""HGNC:38034"",""ENST00000488147.1:n.1067+229A>G"",NA,NA,""MODIFIER"",""8/10"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000488147"",NA,NA,NA,""C""),(1,NA,NA,""miRNA"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],991,NA,NA,""ENSG00000278267"",NA,""MIR6859-1"",""HGNC"",""HGNC:50039"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000619216"",NA,NA,NA,""C""),(1,NA,NA,""transcribed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""intron_variant"",""non_coding_transcript_variant""],NA,NA,NA,""653635"",NA,""WASH7P"",""EntrezGene"",""HGNC:38034"",""NR_024540.1:n.1080+229A>G"",NA,NA,""MODIFIER"",""8/10"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_024540.1"",NA,NA,NA,""C""),(1,NA,NA,""transcribed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],1969,NA,NA,""100287102"",NA,""DDX11L1"",""EntrezGene"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""NR_046018.2"",NA,NA,NA,""C""),(1,NA,NA
,""miRNA"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],991,NA,NA,""102466751"",NA,""MIR6859-1"",""EntrezGene"",""HGNC:50039"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_106918.1"",NA,NA,NA,""C"")]","""SNV"""
chr1:16487,"[""T"",""C""]","""GRCh38""","""T/C""",,,16487,""".""","""chr1	16487	.	T	C	.	.	GT""",,"""intron_variant""",,"[(1,""CTCF_binding_site"",[""regulatory_region_variant""],""MODIFIER"",NA,""ENSR00000344266"",""C"")]","""chr1""",16487,1,"[(1,NA,NA,""transcribed_unprocessed_pseudogene"",NA,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2817,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000450305"",NA,NA,NA,""C""),(1,NA,NA,""processed_transcript"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2078,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000456328"",NA,1,NA,""C""),(1,NA,NA,""unprocessed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""intron_variant"",""non_coding_transcript_variant""],NA,NA,NA,""ENSG00000227232"",NA,""WASH7P"",""HGNC"",""HGNC:38034"",""ENST00000488147.1:n.1067+120A>G"",NA,NA,""MODIFIER"",""8/10"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000488147"",NA,NA,NA,""C""),(1,NA,NA,""miRNA"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],882,NA,NA,""ENSG00000278267"",NA,""MIR6859-1"",""HGNC"",""HGNC:50039"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000619216"",NA,NA,NA,""C""),(1,NA,NA,""transcribed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""intron_variant"",""non_coding_transcript_variant""],NA,NA,NA,""653635"",NA,""WASH7P"",""EntrezGene"",""HGNC:38034"",""NR_024540.1:n.1080+120A>G"",NA,NA,""MODIFIER"",""8/10"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_024540.1"",NA,NA,NA,""C""),(1,NA,NA,""transcribed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2078,NA,NA,""100287102"",NA,""DDX11L1"",""EntrezGene"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""NR_046018.2"",NA,NA,NA,""C""),(1,NA,NA,""miRNA"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],882,NA,NA,""102466751"",NA,""MIR6859-
1"",""EntrezGene"",""HGNC:50039"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_106918.1"",NA,NA,NA,""C"")]","""SNV"""
chr1:16495,"[""G"",""C""]","""GRCh38""","""G/C""",,,16495,""".""","""chr1	16495	.	G	C	.	.	GT""",,"""intron_variant""",,"[(1,""CTCF_binding_site"",[""regulatory_region_variant""],""MODIFIER"",NA,""ENSR00000344266"",""C"")]","""chr1""",16495,1,"[(1,NA,NA,""transcribed_unprocessed_pseudogene"",NA,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2825,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000450305"",NA,NA,NA,""C""),(1,NA,NA,""processed_transcript"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2086,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000456328"",NA,1,NA,""C""),(1,NA,NA,""unprocessed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""intron_variant"",""non_coding_transcript_variant""],NA,NA,NA,""ENSG00000227232"",NA,""WASH7P"",""HGNC"",""HGNC:38034"",""ENST00000488147.1:n.1067+112C>G"",NA,NA,""MODIFIER"",""8/10"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000488147"",NA,NA,NA,""C""),(1,NA,NA,""miRNA"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],874,NA,NA,""ENSG00000278267"",NA,""MIR6859-1"",""HGNC"",""HGNC:50039"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000619216"",NA,NA,NA,""C""),(1,NA,NA,""transcribed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""intron_variant"",""non_coding_transcript_variant""],NA,NA,NA,""653635"",NA,""WASH7P"",""EntrezGene"",""HGNC:38034"",""NR_024540.1:n.1080+112C>G"",NA,NA,""MODIFIER"",""8/10"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_024540.1"",NA,NA,NA,""C""),(1,NA,NA,""transcribed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2086,NA,NA,""100287102"",NA,""DDX11L1"",""EntrezGene"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""NR_046018.2"",NA,NA,NA,""C""),(1,NA,NA,""miRNA"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],874,NA,NA,""102466751"",NA,""MIR6859-
1"",""EntrezGene"",""HGNC:50039"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_106918.1"",NA,NA,NA,""C"")]","""SNV"""
chr1:16841,"[""G"",""T""]","""GRCh38""","""G/T""",,,16841,""".""","""chr1	16841	.	G	T	.	.	GT""",,"""intron_variant""",,,"""chr1""",16841,1,"[(1,NA,NA,""transcribed_unprocessed_pseudogene"",NA,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],3171,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000450305"",NA,NA,NA,""T""),(1,NA,NA,""processed_transcript"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2432,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000456328"",NA,1,NA,""T""),(1,NA,NA,""unprocessed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""intron_variant"",""non_coding_transcript_variant""],NA,NA,NA,""ENSG00000227232"",NA,""WASH7P"",""HGNC"",""HGNC:38034"",""ENST00000488147.1:n.908+17C>A"",NA,NA,""MODIFIER"",""7/10"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000488147"",NA,NA,NA,""T""),(1,NA,NA,""miRNA"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],528,NA,NA,""ENSG00000278267"",NA,""MIR6859-1"",""HGNC"",""HGNC:50039"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000619216"",NA,NA,NA,""T""),(1,NA,NA,""transcribed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""intron_variant"",""non_coding_transcript_variant""],NA,NA,NA,""653635"",NA,""WASH7P"",""EntrezGene"",""HGNC:38034"",""NR_024540.1:n.921+17C>A"",NA,NA,""MODIFIER"",""7/10"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_024540.1"",NA,NA,NA,""T""),(1,NA,NA,""transcribed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2432,NA,NA,""100287102"",NA,""DDX11L1"",""EntrezGene"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""NR_046018.2"",NA,NA,NA,""T""),(1,NA,NA,""miRNA"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],528,NA,NA,""102466751"",NA,""MIR6859-1"",""EntrezGene"",""HGNC:50039"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_106918.1"",NA,NA,NA,""T"")]","""SNV"""
chr1:17020,"[""G"",""A""]","""GRCh38""","""G/A""",,,17020,""".""","""chr1	17020	.	G	A	.	.	GT""",,"""non_coding_transcript_exon_variant""",,,"""chr1""",17020,1,"[(1,NA,NA,""transcribed_unprocessed_pseudogene"",NA,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],3350,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000450305"",NA,NA,NA,""A""),(1,NA,NA,""processed_transcript"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2611,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000456328"",NA,1,NA,""A""),(1,NA,NA,""unprocessed_pseudogene"",1,NA,746,746,NA,NA,NA,[""non_coding_transcript_exon_variant""],NA,NA,""7/11"",""ENSG00000227232"",NA,""WASH7P"",""HGNC"",""HGNC:38034"",""ENST00000488147.1:n.746C>T"",NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000488147"",NA,NA,NA,""A""),(1,NA,NA,""miRNA"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],349,NA,NA,""ENSG00000278267"",NA,""MIR6859-1"",""HGNC"",""HGNC:50039"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000619216"",NA,NA,NA,""A""),(1,NA,NA,""transcribed_pseudogene"",1,NA,759,759,NA,NA,NA,[""non_coding_transcript_exon_variant""],NA,NA,""7/11"",""653635"",NA,""WASH7P"",""EntrezGene"",""HGNC:38034"",""NR_024540.1:n.759C>T"",NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_024540.1"",NA,NA,NA,""A""),(1,NA,NA,""transcribed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2611,NA,NA,""100287102"",NA,""DDX11L1"",""EntrezGene"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""NR_046018.2"",NA,NA,NA,""A""),(1,NA,NA,""miRNA"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],349,NA,NA,""102466751"",NA,""MIR6859-1"",""EntrezGene"",""HGNC:50039"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_106918.1"",NA,NA,NA,""A"")]","""SNV"""
chr1:17147,"[""G"",""A""]","""GRCh38""","""G/A""",,,17147,""".""","""chr1	17147	.	G	A	.	.	GT""",,"""intron_variant""",,,"""chr1""",17147,1,"[(1,NA,NA,""transcribed_unprocessed_pseudogene"",NA,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],3477,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000450305"",NA,NA,NA,""A""),(1,NA,NA,""processed_transcript"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2738,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000456328"",NA,1,NA,""A""),(1,NA,NA,""unprocessed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""intron_variant"",""non_coding_transcript_variant""],NA,NA,NA,""ENSG00000227232"",NA,""WASH7P"",""HGNC"",""HGNC:38034"",""ENST00000488147.1:n.710+86C>T"",NA,NA,""MODIFIER"",""6/10"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000488147"",NA,NA,NA,""A""),(1,NA,NA,""miRNA"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],222,NA,NA,""ENSG00000278267"",NA,""MIR6859-1"",""HGNC"",""HGNC:50039"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000619216"",NA,NA,NA,""A""),(1,NA,NA,""transcribed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""intron_variant"",""non_coding_transcript_variant""],NA,NA,NA,""653635"",NA,""WASH7P"",""EntrezGene"",""HGNC:38034"",""NR_024540.1:n.723+86C>T"",NA,NA,""MODIFIER"",""6/10"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_024540.1"",NA,NA,NA,""A""),(1,NA,NA,""transcribed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2738,NA,NA,""100287102"",NA,""DDX11L1"",""EntrezGene"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""NR_046018.2"",NA,NA,NA,""A""),(1,NA,NA,""miRNA"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],222,NA,NA,""102466751"",NA,""MIR6859-1"",""EntrezGene"",""HGNC:50039"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_106918.1"",NA,NA,NA,""A"")]","""SNV"""
chr1:17385,"[""G"",""A""]","""GRCh38""","""G/A""",,,17385,""".""","""chr1	17385	.	G	A	.	.	GT""",,"""mature_miRNA_variant""",,,"""chr1""",17385,1,"[(1,NA,NA,""transcribed_unprocessed_pseudogene"",NA,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],3715,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000450305"",NA,NA,NA,""A""),(1,NA,NA,""processed_transcript"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2976,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000456328"",NA,1,NA,""A""),(1,NA,NA,""unprocessed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""intron_variant"",""non_coding_transcript_variant""],NA,NA,NA,""ENSG00000227232"",NA,""WASH7P"",""HGNC"",""HGNC:38034"",""ENST00000488147.1:n.575-17C>T"",NA,NA,""MODIFIER"",""5/10"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000488147"",NA,NA,NA,""A""),(1,NA,NA,""miRNA"",1,NA,52,52,NA,NA,NA,[""mature_miRNA_variant""],NA,NA,""1/1"",""ENSG00000278267"",NA,""MIR6859-1"",""HGNC"",""HGNC:50039"",""ENST00000619216.1:n.52C>T"",NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000619216"",NA,NA,NA,""A""),(1,NA,NA,""transcribed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""intron_variant"",""non_coding_transcript_variant""],NA,NA,NA,""653635"",NA,""WASH7P"",""EntrezGene"",""HGNC:38034"",""NR_024540.1:n.588-17C>T"",NA,NA,""MODIFIER"",""5/10"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_024540.1"",NA,NA,NA,""A""),(1,NA,NA,""transcribed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2976,NA,NA,""100287102"",NA,""DDX11L1"",""EntrezGene"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""NR_046018.2"",NA,NA,NA,""A""),(1,NA,NA,""miRNA"",1,NA,52,52,NA,NA,NA,[""non_coding_transcript_exon_variant""],NA,NA,""1/1"",""102466751"",NA,""MIR6859-1"",""EntrezGene"",""HGNC:50039"",""NR_106918.1:n.52C>T"",NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_106918.1"",NA,NA,NA,""A"")]","""SNV"""
chr1:17407,"[""G"",""A""]","""GRCh38""","""G/A""",,,17407,""".""","""chr1	17407	.	G	A	.	.	GT""",,"""non_coding_transcript_exon_variant""",,,"""chr1""",17407,1,"[(1,NA,NA,""transcribed_unprocessed_pseudogene"",NA,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],3737,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000450305"",NA,NA,NA,""A""),(1,NA,NA,""processed_transcript"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2998,NA,NA,""ENSG00000223972"",NA,""DDX11L1"",""HGNC"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""ENST00000456328"",NA,1,NA,""A""),(1,NA,NA,""unprocessed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""intron_variant"",""non_coding_transcript_variant""],NA,NA,NA,""ENSG00000227232"",NA,""WASH7P"",""HGNC"",""HGNC:38034"",""ENST00000488147.1:n.575-39C>T"",NA,NA,""MODIFIER"",""5/10"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000488147"",NA,NA,NA,""A""),(1,NA,NA,""miRNA"",1,NA,30,30,NA,NA,NA,[""non_coding_transcript_exon_variant""],NA,NA,""1/1"",""ENSG00000278267"",NA,""MIR6859-1"",""HGNC"",""HGNC:50039"",""ENST00000619216.1:n.30C>T"",NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""ENST00000619216"",NA,NA,NA,""A""),(1,NA,NA,""transcribed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""intron_variant"",""non_coding_transcript_variant""],NA,NA,NA,""653635"",NA,""WASH7P"",""EntrezGene"",""HGNC:38034"",""NR_024540.1:n.588-39C>T"",NA,NA,""MODIFIER"",""5/10"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_024540.1"",NA,NA,NA,""A""),(1,NA,NA,""transcribed_pseudogene"",1,NA,NA,NA,NA,NA,NA,[""downstream_gene_variant""],2998,NA,NA,""100287102"",NA,""DDX11L1"",""EntrezGene"",""HGNC:37102"",NA,NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,""NR_046018.2"",NA,NA,NA,""A""),(1,NA,NA,""miRNA"",1,NA,30,30,NA,NA,NA,[""non_coding_transcript_exon_variant""],NA,NA,""1/1"",""102466751"",NA,""MIR6859-1"",""EntrezGene"",""HGNC:50039"",""NR_106918.1:n.30C>T"",NA,NA,""MODIFIER"",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,""NR_106918.1"",NA,NA,NA,""A"")]","""SNV"""


In [31]:
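# For each variant, keep the first transcript consequence tagged "stop_gained";
# `find` yields a missing value when no consequence matches.
mt = mt.annotate_rows(
    interesting_cnsq=mt.vep.transcript_consequences.find(
        lambda x: x.consequence_terms.contains("stop_gained")
    )
)
# Drop variants that have no stop-gained consequence.
mt = mt.filter_rows(hl.is_defined(mt.interesting_cnsq))
mt.interesting_cnsq.show(n=30)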



Unnamed: 0_level_0,Unnamed: 1_level_0,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq,interesting_cnsq
locus,alleles,allele_num,amino_acids,appris,biotype,canonical,ccds,cdna_start,cdna_end,cds_end,cds_start,codons,consequence_terms,distance,domains,exon,gene_id,gene_pheno,gene_symbol,gene_symbol_source,hgnc_id,hgvsc,hgvsp,hgvs_offset,impact,intron,lof,lof_flags,lof_filter,lof_info,minimised,polyphen_prediction,polyphen_score,protein_end,protein_start,protein_id,sift_prediction,sift_score,strand,swissprot,transcript_id,trembl,tsl,uniparc,variant_allele
locus<GRCh38>,array<str>,int32,str,str,str,int32,str,int32,int32,int32,int32,str,array<str>,int32,"array<struct{db: str, name: str}>",str,str,int32,str,str,str,str,str,int32,str,str,str,str,str,str,int32,str,float64,int32,int32,str,str,float64,int32,str,str,str,int32,str,str
chr1:1318875,"[""G"",""A""]",1,"""Q/*""",,"""protein_coding""",,,908,908,850,850,"""Cag/Tag""","[""stop_gained""]",,,"""4/4""","""ENSG00000127054""",,"""INTS11""","""HGNC""","""HGNC:26052""","""ENST00000429572.5:c.850C>T""","""ENSP00000481275.1:p.Gln284Ter""",,"""HIGH""",,"""HC""","""PHYLOCSF_WEAK""",,"""PERCENTILE:0.977011494252874,GERP_DIST:-51.8999999761581,BP_DIST:20,DIST_FROM_LAST_EXON:-649,50_BP_RULE:FAIL,ANN_ORF:-1477.72,MAX_ORF:-1477.72""",,,,284,284,"""ENSP00000481275""",,,-1,,"""ENST00000429572""",,2.0,,"""A"""
chr1:1320653,"[""C"",""T""]",1,"""W/*""",,"""nonsense_mediated_decay""",,,151,151,153,153,"""tgG/tgA""","[""stop_gained"",""NMD_transcript_variant""]",,,"""3/4""","""ENSG00000127054""",,"""INTS11""","""HGNC""","""HGNC:26052""","""ENST00000470679.3:c.153G>A""","""ENSP00000434782.1:p.Trp51Ter""",,"""HIGH""",,,,,,,,,51,51,"""ENSP00000434782""",,,-1,,"""ENST00000470679""",,5.0,,"""T"""
chr1:2430625,"[""C"",""CGTGGGTGAGTGAGGCCCTGGCT""]",1,"""-/VGE*GPGX""",,"""protein_coding""",,,346,347,112,111,"""-/GTGGGTGAGTGAGGCCCTGGCT""","[""stop_gained"",""frameshift_variant""]",,"[(""PANTHER"",""PTHR10336""),(""PANTHER"",""PTHR10336"")]","""2/4""","""ENSG00000149527""",,"""PLCH2""","""HGNC""","""HGNC:29037""","""ENST00000609981.5:c.112_115+18dup""",,,"""HIGH""",,"""HC""","""PHYLOCSF_WEAK""",,"""PERCENTILE:0.333333333333333,GERP_DIST:89.3913670599461,BP_DIST:219,DIST_FROM_LAST_EXON:149,50_BP_RULE:PASS,ANN_ORF:-121.354,MAX_ORF:-121.354""",,,,38,37,"""ENSP00000476436""",,,1,,"""ENST00000609981""",,5.0,,"""GTGGGTGAGTGAGGCCCTGGCT"""
chr1:2559459,"[""C"",""A""]",1,"""C/*""",,"""protein_coding""",,,326,326,291,291,"""tgC/tgA""","[""stop_gained""]",,,"""1/5""","""8764""",,"""TNFRSF14""","""EntrezGene""","""HGNC:11912""","""XM_017002718.1:c.291C>A""","""XP_016858207.1:p.Cys97Ter""",,"""HIGH""",,"""LC""",,"""ANC_ALLELE""","""PERCENTILE:0.267955801104972,GERP_DIST:-2622.01975149512,BP_DIST:1473,DIST_FROM_LAST_EXON:1470,50_BP_RULE:PASS,PHYLOCSF_TOO_SHORT""",,,,97,97,"""XP_016858207.1""",,,1,,"""XM_017002718.1""",,,,"""A"""
chr1:3783866,"[""G"",""T""]",1,"""Y/*""",,"""nonsense_mediated_decay""",,,98,98,99,99,"""taC/taA""","[""stop_gained"",""NMD_transcript_variant""]",,,"""2/5""","""ENSG00000130764""",,"""LRRC47""","""HGNC""","""HGNC:29207""","""ENST00000479239.1:c.99C>A""","""ENSP00000462103.1:p.Tyr33Ter""",,"""HIGH""",,,,,,,,,33,33,"""ENSP00000462103""",,,-1,,"""ENST00000479239""",,3.0,,"""T"""
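
The find-then-filter pattern in the cell above can be mimicked in plain Python to make its semantics concrete. This is only a sketch: `first_matching`, `toy_rows`, and every record in it are hypothetical stand-ins invented for illustration, not Hail API or real VEP output, and `None` plays the role of Hail's missing value.

```python
from typing import Callable, Optional


def first_matching(items: list, predicate: Callable[[dict], bool]) -> Optional[dict]:
    """Return the first item for which predicate is true, or None if there is none."""
    return next((item for item in items if predicate(item)), None)


# Hypothetical toy data: variant name -> list of transcript consequences.
toy_rows = {
    "variant_a": [
        {"transcript_id": "T1", "consequence_terms": ["intron_variant"]},
        {"transcript_id": "T2", "consequence_terms": ["stop_gained"]},
        {"transcript_id": "T3", "consequence_terms": ["stop_gained", "frameshift_variant"]},
    ],
    "variant_b": [
        {"transcript_id": "T4", "consequence_terms": ["downstream_gene_variant"]},
    ],
}

# Analogue of annotate_rows: attach the first stop-gained consequence (or None).
annotated = {
    name: first_matching(csqs, lambda c: "stop_gained" in c["consequence_terms"])
    for name, csqs in toy_rows.items()
}

# Analogue of filter_rows(hl.is_defined(...)): keep only variants with a match.
kept = {name: csq for name, csq in annotated.items() if csq is not None}

for name, csq in kept.items():
    print(name, csq["transcript_id"], csq["consequence_terms"])
```

Only the first matching consequence is kept per variant, so a variant with several stop-gained transcripts still contributes a single row, which is why each locus appears once in the table above.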
