# Downstream analysis
We create a vcf file containing small and structral variants annotated with their AF and predicted functional impact. Here we are going to go through how to explore those variants.

Let's start by visualizing structral variants to make sure that the callers did a good job.


I am going to


In [None]:
%%bash 
grep "cuteSV-25-8240662-DEL-0-1405" results/cuteSV/ERR5043144.hifi.pbmm2.phased.vcf

# Samplot
The variant of length(1405) starts at 8240662 and ends at 8242067. We are going to use samplot to visualize this variant. 
Plotting using our snakemake workflow is very simple. 

just run 'snakemake -j1 -p results/samplot/{sv\_type}\_{chrom}\_{start}\_{end}.png'

In [None]:
%%bash
snakemake -j1 -p results/samplot/DEL_25_8240662_8242067.png

In [None]:
from IPython.display import Image
Image(filename='results/samplot/DEL_25_8240662_8242067.png') 

# Visualize SV Benchmark
Benchmarking SV is not an easy job because tools always disagree about the positions of breakpoints. Therefore, we can expect that some SV in our benchmarks tagged as FP while it was correct but the breakpoint wasnt matching.

Let's view a SV called by pbsv but was tagged as FP.


In [None]:
%%bash
grep "pbsv-25-9733349-DEL-0-663" results/pbsv/ERR5043144.hifi.pbmm2.phased.vcf

In [None]:
%%bash
snakemake -j1 -p results/samplot/DEL_25_9733349_9734012.png

In [None]:
Image(filename='results/samplot/DEL_25_9733349_9734012.png') 

Looks like that SV is actually correct. Lesson here is that You have to visualize SV to make sure that everything is correct.

# Variant Effect predictor
We are going to use VEP to predict the effect of the variants. The following figgure summarizes the annotations produced by VEP. More information is available on their [website](https://uswest.ensembl.org/info/genome/variation/prediction/predicted_data.html)
![VEP](https://uswest.ensembl.org/info/genome/variation/prediction/consequences.jpg)

### Run VEP using snakemake
to get the output file for vep: replace the extnesion(".vcf.gz") of any compressed vcf file  with ".vep.vcf.gz".

for example: 

results/cuteSV/ERR7091271.ont.minimap2.phased.vcf.gz 

                will be

results/cuteSV/ERR7091271.ont.minimap2.phased.vep.vcf.gz 

In [None]:
%%bash 
snakemake -j4 --use-conda "results/cuteSV/ERR7091271.ont.minimap2.phased.vep.vcf.gz"

### View VEP report
Lets first, look at the summary results they produced. 

1. Browse the folders using the panel on the left to "results/cuteSV/"

2. Download the report "ERR7091271.ont.minimap2.phased.vep.html": right click on the file then click downloand

### Let's visualize a high impact variant
We need first to get the coordinates of a high impact variant to visualize

In [None]:
%%bash
zgrep "coding_sequence_variant"  results/cuteSV/ERR7091271.ont.minimap2.phased.vep.vcf.gz

Let's visualize the first deletion(396bp) starting from 2585287 to 2585683 on chromsome

In [None]:
%%bash
snakemake -j1 -p results/samplot/DEL_25_2585287_2585683.png

In [None]:
Image(filename='results/samplot/DEL_25_2585287_2585683.png')

## Population Frequency analysis
We calculated the AF for our VCFs in 10 samples. Here, I am providing a vcf file produced using the sample workflow but I ran the population genotyper against 428 samples. you will find the result file "final.vep.vcf.bgz" contianing all the vairants and "final.SV.vep.vcf.bgz" containing only the SV. The following table describes the metadata tagged for each variant.


| Metadata      | Description |
| -- |:-----------:|
| AC | Allele count in genotypes|
| AC_Het | Allele counts in homozygous genotypes|
| AC_Hom | Allele counts in heterozygous genotypes|
| AC_Hemi | Allele counts in hemizygous genotypes|
| AF | Allele frequency |
| MAF | Minor Allele frequency |
| NS | Number of samples with data   |
| AN | Total number of alleles in called genotypes |
| HWE | Hardy-Weinberg equilibrium |
| ExcHet | Test excess heterozygosity; 1=good, 0=bad |


Let's first check a file called "samples.csv" containing breed information of the 428 animal. The following command print the first 10 animals

In [None]:
%%bash
cat samples.csv |head|tr -s ',' $'\t' | ../tools/prettytable 3 

The commands below count the number of samples per breed

In [None]:
%%bash
cut -f2 -d, samples.csv |sort |uniq -c| awk '{print $2"\t"$1}' |sort -k2,2nr > tmp
cat <(echo -e "Breed\tcount") tmp | ../tools/prettytable 2

### Find Rare variants

Bcftools is very helpful in filtering vcf files using the variants metadata. For example, We can query the novel varaints using the following command

In [None]:
%%bash
bcftools view  -Q 0.001 final.SV.vep.vcf.bgz  | grep -vP "^#"  |head -n 4

### Finding common variants
on the other hand we can select the most common variants

In [None]:
%%bash
bcftools view  -q 0.9 final.SV.vep.vcf.bgz  | grep -vP "^#"  |head -n 4

# Hail
Although bcftools is very helpful and fast but it is hard to do complex tasks with it. Here we are suggesting using Hail to be able explore the population genotyping results and get meaningful results. Hail is a python library for genomic data expoloration. It creates a matrix table for vcf files which is very similar to R dataframes.

So let's do some coding by intializing Hail engine

In [1]:
import hail as hl
hl.init()
from hail.plot import show
from pprint import pprint
hl.plot.output_notebook()



2023-01-10 20:18:11.182 WARN  NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 3.1.3
SparkUI available at http://c6-89.farm.cse.ucdavis.edu:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.107-2387bb00ceee
LOGGING: writing to /home/mshokrof/workshop_12Jan_2023/SV_calling_LR/hail-20230110-2018-0.2.107-2387bb00ceee.log


Now we are going to load the vcf and samples information to create Hail Matrix table

In [2]:
ref="/home/mshokrof/workshop_12Jan_2023_data/ARS-UCD1.2_Btau5.0.1Y.25.fa"
index="/home/mshokrof/workshop_12Jan_2023_data/ARS-UCD1.2_Btau5.0.1Y.25.fa.fai"
vcf="final.SV.vep.vcf.bgz"
samplesInfo="samples.csv"
hlRef=hl.ReferenceGenome.from_fasta_file("ARSUCD",ref,index)

mt = hl.import_vcf(vcf,reference_genome=hlRef)
table = (hl.import_table('samples.csv', impute=True,delimiter=",")
         .key_by('BioSample'))
mt = mt.annotate_cols(breed = table[mt.s])

2023-01-10 20:18:22.116 Hail: INFO: wrote table with 429 rows in 1 partition to /tmp/persist_tableALSnst1A9F
2023-01-10 20:18:23.436 Hail: INFO: Reading table to impute column types
2023-01-10 20:18:24.462 Hail: INFO: Finished type imputation
  Loading field 'BioSample' as type str (imputed)
  Loading field 'CompositeBreed' as type str (imputed)
  Loading field 'Cohort' as type str (imputed)


Lets see how the hail matrix table is organized

In [None]:
mt_sv.rows().show(5)

In [None]:
mt.GT.show(5)

In [3]:
samplesPercohort=mt.aggregate_cols(hl.agg.counter(mt.breed.Cohort))
print(samplesPercohort)

2023-01-10 20:18:31.630 Hail: INFO: scanning VCF for sortedness...
2023-01-10 20:18:32.567 Hail: INFO: Coerced sorted VCF - no additional import work to do


{'bosoutgroup': 36, 'indicus': 24, 'taurus': 368}


### Stratify population allele frequency
Here we are trying to answer questions like which variants are frequent in the Indicus breeds only. We are going to calculate allele frequencies per cohort.


In [4]:
mt=mt.annotate_rows(AF_indicus=hl.agg.filter(mt.breed.Cohort =="indicus",
                                     hl.agg.sum(mt.GT.n_alt_alleles())
                                     / samplesPercohort["indicus"]*2 ))
mt=mt.annotate_rows(AF_taurus=hl.agg.filter(mt.breed.Cohort =="taurus",
                                     hl.agg.sum(mt.GT.n_alt_alleles())
                                     / samplesPercohort["taurus"]*2 ))
mt=mt.annotate_rows(AF_bosoutgroup=hl.agg.filter(mt.breed.Cohort =="bosoutgroup",
                                     hl.agg.sum(mt.GT.n_alt_alleles())
                                     / samplesPercohort["bosoutgroup"]*2 ))
mt.rows().show(5)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,info,info,info,info,info,info,info,info,info,info,info,info,info,info,info,info,Unnamed: 21_level_0,Unnamed: 22_level_0,Unnamed: 23_level_0
locus,alleles,rsid,qual,filters,AC,AC_Hemi,AC_Het,AC_Hom,AF,AK,AN,DP,ExcHet,HWE,ID,MA,MAF,NS,UK,CSQ,AF_indicus,AF_taurus,AF_bosoutgroup
locus<ARSUCD>,array<str>,str,float64,set<str>,array<int32>,array<int32>,array<int32>,array<int32>,array<float64>,array<int32>,int32,int32,array<float64>,array<float64>,array<str>,int32,float64,int32,int32,array<str>,float64,float64,float64
25:3531,"[""G"",""GTATGTATGTATGTATTATGTATACATACACATATGTATACATACATATATATTGACTTAATGTCAAACCCAGTTAAAATAGCAAGAGTTTGTATGTATGTATGTATGT""]","""cuteSV-25-3531-INS-0-108""",-10.0,{},[4],[0],[0],[4],[4.67e-03],"[80,126]",856,,[1.00e+00],[4.11e-06],"[""cuteSV-25-3531-INS-0-108""]",0,0.00467,428,186,"[""TATGTATGTATGTATTATGTATACATACACATATGTATACATACATATATATTGACTTAATGTCAAACCCAGTTAAAATAGCAAGAGTTTGTATGTATGTATGTATGT|intergenic_variant|MODIFIER|||||||||||||||||||||""]",0.0,0.0217,0.0
25:8572,"[""G"",""GCGTGTGTGTGTGCGTCTGTCTCGGAATCTTCATGTATTTCTGCCTCTGTATCTCCTGACAGCCGCGCACCTGCCGTTCGTCTGGCCCGGGCCTCTCCCAGGTGCTCTGGCTCTGGCTGAGAGCTTGCACAGAACGGCTGTCTCCCACTTTGCGTGGGTTCGTGCGAGTGTCTGTGTGTCTGTCTGTGTGCCTGCCTGTGGCTCATCTGCTCTCTCAGGAATGTCGAGGGGAGGGCAGCGTGGGGGCAAAGGGGCGTCGGTAGTGCGAAGGGCGGGGCTGAGGCGCTCAGTCGGCTGCGCCTCCTGGCTTCCAGTGGGTTTGGGCACGGACAGGTGCCGAGAGCAGGGCACTCAAGTTTCGCTGCGCCTAGGCCCGCAGGAGGTTCAGAGGCCCCTGGGCCGCGGGGGTTGGGACGGCGGGCCCCGAAAGACAGATACCTCCGGGGCCGCGCGGGCCTGAGAAGCGAAAGGGATGCGGGGAGGCCCCGGCTCAGCGGCCCGGCCGGGTCTGGGTGCCCGGGGGTGGGAGGGCGCCGGGGCCTCCGCACCCCTCAGCCCAGCCCCCAGGCCCGCACGCACGCATGCACGCATGCGCCCTCCCGGTCAATGGGGTCCTCGAAGCCCCTCGGGCCCAGGCCCGGCCAGGTGCTGCTGGTCTGGTGTTGGCGGTGTTAGGGGCGGGCGGCGTCAGCAGGCTGGAGTCCGGTCGCGGGTCATGCGGCTCAAGTGCAGCACGCCCCCGAAGAGAGGTGCGCGGCCCATTGGCCCGACCTGCAGGTCGGAGCGAAAGGAGAGAGCGACGCAAAGCTCTTGAGACCTTGCGCCAACCCGCCTCAATCTCCGGCTGTATCAGCCTGAAGGCTCCCGATCTCGTCTGATCTCGGAAGCTAAGCAGGGTCCTAAGCAGGTCGGGCCTGGTTAGTATCTTGGATAAGGAGACCACCTGGGAATCCCGGGTGCTCGAGGCTTTGCCTGCAGAAGGTCCTTTAGCCCCCGCCTCGCGTCCAGATTCTTGGATCCCCCCGCCTTTAGCCCTCCCGGGCGGGGGGCACCGATGTGGGGCCCCGACTCCCTTTGCAGACCTGGGTTGCCTCCGCGCGGCCTGCGGGCCTCCGCATTCGCATTCGGCTTCAGGACACCCGACGACCGGCAGTCCGTCTCCATCTGGGGCTCCTTTGGCTTGCCTCAGGCAATCCCGGGGCTTACACCGACTCCAGCCATAGCCCCAGCCCTTCCCCTGCCTCGTGATCCATTAGCAGCATGTGCATGTGTAGCAGCGCAAGCACAGATAGATGTGCCCAAGCAGGCATCTATCTTGTTCACCTATACTGTGGGTGGATCTATGCCTGGGAGTCGGACTGCTGAACCATAGGCTGCTTCCTTTTAGTTCCTTGGGGAACTTCCATAGGCTTTTCCATACCGGCTCTGCCGCCATTCCTGCTCACCGTGGAGAGTTCCCTTTGCTCACATCTTCTCCAGCATTTGTTATTAACCAGACATTGCAGTGAGGCCCAATCGGACTCATGTGAAGTGGTCTCTCCTGGGAGTTTGGAGTTGAGCCTCTCCAATCATTAGTGATGGAGCAGATGACCTTTGGAAATGCAGATGCAATGCTGTCACCTCCCTGCTCCAAGCAGCTCTGCGTATTTTCCCAGCACATATTGGCTCCAAACCACTTGCAGTTCAGGCTTGCTCCTCCACATTCTTCTCCTGATGTCTTTTGCCTGGAATGCCCTCTGCCTCTTTAGCTTCCATCACGGAGCATTTTTTACGTTTCTTGAATGTCCCTGGAGTTCCCTGGTCACATAGCCTTCAAGATGCTATTTTCCCTGCTGAAATTCTTCCTGCATCTCAGCTGGTTTACATAGGTTCAACCTACTATGGACCTCAATCACAATCAGTTCTCTGAAAGCCTTCCCTGACGCAGGTTTTGGGGAAGGCCCCTCTATACGTTCTCACAGGATCTGCTTGTGATGTATCATTGAGTCGTCTCAGTGAAAGCAGGAACTCTGTGTCTTCTTCTCCACATTCTATTTCCAGAAACTTATTACTTGCCTCTAGCTCATCGTAGGTGCTCAGTATTTACTTAATTAATGTGAAAGACAAGAAAAGCTCTGGGAAGCCTCAAAATTGAAACTTTTGATCTCAGTAGGGTCAAATTCTTTCCTTTCCTGGGTTGATGTGTGTACACTATATTGAGGGCACAATCTACGGCATGAAGACACATCCATACATATAGGTTCTTTCAAGGAACTCGTACAAGCAGATTTCTACAAGTTAGGGCTTTTCAAATGCTCTAGAAGAATGTGAACGTTCAGTATGCCATTTTCATTTCAAAAGAAAGTCGTTTGACTTCCATAGCAAGGTGGGGAGACCTTCCAGCACGTTTAGTCCAATACTTATACCTTGACCATTTTCTGGCAAATCTTGTAGTTCCTTCTGGAGGTGTGTGGCTCTTGCAGAAATGTGAGGTATTCCATCGTCCTCAGACTTTGGTGTCATTTGCAATCTCAAGACAGCTTCGGGTGTAGGTTCAGACCACCACCACACACAGTCATTAGCCAGGTGGGATTCAGGAATATGCGTTGTTAACCAGCATCACCGTTGATACTGTTGCTGCACACACAATGCAGTGGGCCGGGCCTGGATTGTCCACTGGGTCTCATGTGGAAAGGAGAGAAGGTTCCCCACACACAAACCCCACCCCCTCTGTTGACAGCCTTGTGACTAAACGGTCCCGAAACAAAGAATCAGAAAAACCAGAAGCCAAGATGGTAGTCCTTGAGATGAAGGAACTAAGCAGCCTGGTTGAGATGGCACCAAGGGGAAAGGTTATGCTCTTTCTGTCTGCCCCTCTACTCCTTTCTTGCCTGGACGCCTTAGGGGTTCATCATGGAAATTTGAAATAAAGTTCACAAAAGTCATATGTGTTGAATCAAGATTTTGCATTTAACCATCTCTCAAAAAAGGAAAGAGAGCTTCGCCTGCATGTTTCTTTCAAAAACGTCAGTTGCTTTTCAATTCTGGGGTTTCTCTCTTGCCTTGTTCTAGTAGCCTTAATAGTAAAAATCAACAAAAACCACATAAAGATTCACAGATACCAACAAAGGACAGCTTCTGTTTCTGAAATCATCTTTGCATTTTAGTAGAAAAACGAGCTCTATTTTTTGTATTTTGCCTAGGGAATTAGCGATGTCCTTGTTCAGATGGCAAGGCATCCGCCCGCGGGAGACTTCGGGTTTGATCCCTGGGTCGGGAAGATCCCCCGGAGGAAAGTGGCATCCCTCAGTACATTGCAAACGCAGAATGCCACGGACAGAGGAGCCTGATGGACGCAGTCCATGGGTCGTATCAAGTTGGGCTCAACCCAGCAACCCTAGAACAGAAGTATTTCCAAAGACCACACTCAAGTATGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTGATATTATCTCCTCTTGCCGTCTTA""]","""cuteSV-25-8572-INS-0-3469""",-10.0,{},[0],[0],[0],[0],[0.00e+00],"[0,301]",856,,[1.00e+00],[1.00e+00],"[""cuteSV-25-8572-INS-0-3469""]",0,0.0,428,301,"[""CGTGTGTGTGTGCGTCTGTCTCGGAATCTTCATGTATTTCTGCCTCTGTATCTCCTGACAGCCGCGCACCTGCCGTTCGTCTGGCCCGGGCCTCTCCCAGGTGCTCTGGCTCTGGCTGAGAGCTTGCACAGAACGGCTGTCTCCCACTTTGCGTGGGTTCGTGCGAGTGTCTGTGTGTCTGTCTGTGTGCCTGCCTGTGGCTCATCTGCTCTCTCAGGAATGTCGAGGGGAGGGCAGCGTGGGGGCAAAGGGGCGTCGGTAGTGCGAAGGGCGGGGCTGAGGCGCTCAGTCGGCTGCGCCTCCTGGCTTCCAGTGGGTTTGGGCACGGACAGGTGCCGAGAGCAGGGCACTCAAGTTTCGCTGCGCCTAGGCCCGCAGGAGGTTCAGAGGCCCCTGGGCCGCGGGGGTTGGGACGGCGGGCCCCGAAAGACAGATACCTCCGGGGCCGCGCGGGCCTGAGAAGCGAAAGGGATGCGGGGAGGCCCCGGCTCAGCGGCCCGGCCGGGTCTGGGTGCCCGGGGGTGGGAGGGCGCCGGGGCCTCCGCACCCCTCAGCCCAGCCCCCAGGCCCGCACGCACGCATGCACGCATGCGCCCTCCCGGTCAATGGGGTCCTCGAAGCCCCTCGGGCCCAGGCCCGGCCAGGTGCTGCTGGTCTGGTGTTGGCGGTGTTAGGGGCGGGCGGCGTCAGCAGGCTGGAGTCCGGTCGCGGGTCATGCGGCTCAAGTGCAGCACGCCCCCGAAGAGAGGTGCGCGGCCCATTGGCCCGACCTGCAGGTCGGAGCGAAAGGAGAGAGCGACGCAAAGCTCTTGAGACCTTGCGCCAACCCGCCTCAATCTCCGGCTGTATCAGCCTGAAGGCTCCCGATCTCGTCTGATCTCGGAAGCTAAGCAGGGTCCTAAGCAGGTCGGGCCTGGTTAGTATCTTGGATAAGGAGACCACCTGGGAATCCCGGGTGCTCGAGGCTTTGCCTGCAGAAGGTCCTTTAGCCCCCGCCTCGCGTCCAGATTCTTGGATCCCCCCGCCTTTAGCCCTCCCGGGCGGGGGGCACCGATGTGGGGCCCCGACTCCCTTTGCAGACCTGGGTTGCCTCCGCGCGGCCTGCGGGCCTCCGCATTCGCATTCGGCTTCAGGACACCCGACGACCGGCAGTCCGTCTCCATCTGGGGCTCCTTTGGCTTGCCTCAGGCAATCCCGGGGCTTACACCGACTCCAGCCATAGCCCCAGCCCTTCCCCTGCCTCGTGATCCATTAGCAGCATGTGCATGTGTAGCAGCGCAAGCACAGATAGATGTGCCCAAGCAGGCATCTATCTTGTTCACCTATACTGTGGGTGGATCTATGCCTGGGAGTCGGACTGCTGAACCATAGGCTGCTTCCTTTTAGTTCCTTGGGGAACTTCCATAGGCTTTTCCATACCGGCTCTGCCGCCATTCCTGCTCACCGTGGAGAGTTCCCTTTGCTCACATCTTCTCCAGCATTTGTTATTAACCAGACATTGCAGTGAGGCCCAATCGGACTCATGTGAAGTGGTCTCTCCTGGGAGTTTGGAGTTGAGCCTCTCCAATCATTAGTGATGGAGCAGATGACCTTTGGAAATGCAGATGCAATGCTGTCACCTCCCTGCTCCAAGCAGCTCTGCGTATTTTCCCAGCACATATTGGCTCCAAACCACTTGCAGTTCAGGCTTGCTCCTCCACATTCTTCTCCTGATGTCTTTTGCCTGGAATGCCCTCTGCCTCTTTAGCTTCCATCACGGAGCATTTTTTACGTTTCTTGAATGTCCCTGGAGTTCCCTGGTCACATAGCCTTCAAGATGCTATTTTCCCTGCTGAAATTCTTCCTGCATCTCAGCTGGTTTACATAGGTTCAACCTACTATGGACCTCAATCACAATCAGTTCTCTGAAAGCCTTCCCTGACGCAGGTTTTGGGGAAGGCCCCTCTATACGTTCTCACAGGATCTGCTTGTGATGTATCATTGAGTCGTCTCAGTGAAAGCAGGAACTCTGTGTCTTCTTCTCCACATTCTATTTCCAGAAACTTATTACTTGCCTCTAGCTCATCGTAGGTGCTCAGTATTTACTTAATTAATGTGAAAGACAAGAAAAGCTCTGGGAAGCCTCAAAATTGAAACTTTTGATCTCAGTAGGGTCAAATTCTTTCCTTTCCTGGGTTGATGTGTGTACACTATATTGAGGGCACAATCTACGGCATGAAGACACATCCATACATATAGGTTCTTTCAAGGAACTCGTACAAGCAGATTTCTACAAGTTAGGGCTTTTCAAATGCTCTAGAAGAATGTGAACGTTCAGTATGCCATTTTCATTTCAAAAGAAAGTCGTTTGACTTCCATAGCAAGGTGGGGAGACCTTCCAGCACGTTTAGTCCAATACTTATACCTTGACCATTTTCTGGCAAATCTTGTAGTTCCTTCTGGAGGTGTGTGGCTCTTGCAGAAATGTGAGGTATTCCATCGTCCTCAGACTTTGGTGTCATTTGCAATCTCAAGACAGCTTCGGGTGTAGGTTCAGACCACCACCACACACAGTCATTAGCCAGGTGGGATTCAGGAATATGCGTTGTTAACCAGCATCACCGTTGATACTGTTGCTGCACACACAATGCAGTGGGCCGGGCCTGGATTGTCCACTGGGTCTCATGTGGAAAGGAGAGAAGGTTCCCCACACACAAACCCCACCCCCTCTGTTGACAGCCTTGTGACTAAACGGTCCCGAAACAAAGAATCAGAAAAACCAGAAGCCAAGATGGTAGTCCTTGAGATGAAGGAACTAAGCAGCCTGGTTGAGATGGCACCAAGGGGAAAGGTTATGCTCTTTCTGTCTGCCCCTCTACTCCTTTCTTGCCTGGACGCCTTAGGGGTTCATCATGGAAATTTGAAATAAAGTTCACAAAAGTCATATGTGTTGAATCAAGATTTTGCATTTAACCATCTCTCAAAAAAGGAAAGAGAGCTTCGCCTGCATGTTTCTTTCAAAAACGTCAGTTGCTTTTCAATTCTGGGGTTTCTCTCTTGCCTTGTTCTAGTAGCCTTAATAGTAAAAATCAACAAAAACCACATAAAGATTCACAGATACCAACAAAGGACAGCTTCTGTTTCTGAAATCATCTTTGCATTTTAGTAGAAAAACGAGCTCTATTTTTTGTATTTTGCCTAGGGAATTAGCGATGTCCTTGTTCAGATGGCAAGGCATCCGCCCGCGGGAGACTTCGGGTTTGATCCCTGGGTCGGGAAGATCCCCCGGAGGAAAGTGGCATCCCTCAGTACATTGCAAACGCAGAATGCCACGGACAGAGGAGCCTGATGGACGCAGTCCATGGGTCGTATCAAGTTGGGCTCAACCCAGCAACCCTAGAACAGAAGTATTTCCAAAGACCACACTCAAGTATGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTGATATTATCTCCTCTTGCCGTCTTA|upstream_gene_variant|MODIFIER|5S_rRNA|ENSBTAG00000048872|Transcript|ENSBTAT00000076247|rRNA|||||||||||895|1||RFAM||""]",0.0,0.0,0.0
25:9608,"[""C"",""CCGCTGGCCCCGCCTCGCGTCAGATTCTTGGATCCCCCGCAACCCCAGCCCCTCCCGGGCGGGGGGAGCCGGTGGCCAGGGCCCCGACTCCTTTGCAGACCTGGGTTGCCTCCCGCGCGGCCTGCGAGGCCTCGTGCGTCGGCTTCCCAGGCTGCCCGACGACCGGCAGTCCCGTCTCCATCTGGGGCTCCTTTGGCTTGCCTCAGGCAATCCCAGGAGCTTACCGACTCCAGCCATAGCCCCCGCCCCACCCCCTGCCTCGCAGTCATTGCGCATTGCATGTGCGCGCGCAAGCACAGATAGTGTGCCCAAACGGGGCATCTATCTTGTTCACCTATACTGTGGGTGGATCTATGCCTGGGAGTTGGACTGCTGAACCATAGGCTACTTCCACTTTTTGGTTCCTTCGGGGAACCTCCATAGGCTTTCCATACCGGCTCTGCACGCCCATTCCTACTCACCGTGGAGGAGAGTTCCCTTTGCTCCACATCTTCTCCAGCATTTGTTATTGACGGAGCAGCATTGCAGTGGGCCCAATCGGACTCATGTGAAGTGGTCTCTCCTGGGAGTTTGGGTTGAGCCTCTCCAATCATTAGTGATGTGGAGCAAGATGACCTTTTGGAAATGCAGATGCAATGCTGTCACCTCCCTGCTCAACGGCTCTCGTATTTTCCCAGCATATTGAAGATACGACCTTCCTTCAGGCTTGCTCCTCCACGTTCTTCTCTGATGTCTTTGCCTGGAATGCCCTCTGCTCTTTGCTTCCATCCCATAGGGCATTTTTAAGTTTCTTGAATGTCCTAAGGTTCCTGAGTCACATAGCCTTCCAAAGATGCTATTTTCCTGCCTGAAATTCTTCCTGCATCTCACGCTGGTTTACATGGAGGTTCAACCTATGGACCTCAATGCAATAGTTCCTCTGAAAAAGCCTGACACCCAAGGTTTTTGGTGAGGGAAAGAGCCCCTCTATACGTTCTCACAGGATCTGCTTGTGATGTATCATTAGAGTCGTCTCAGTGAAAGCAGGAACTCTGTGTCTTCTTTACTCTCCACATTCTATTTCCAGAAACTTATTACTTGCCTCCCTAGCTCATCGTAGGTGCTCAGTATTTACTTAGTGTGTGAAAGACAGAAAAGCTCTGGGAAGCCTCAGGTGGAAACTTTTGATCTCTTTGGCGATTCAATTCTTTCCTTTCCTGGGTTGATGTGTACACTATATTCGAGGCACAATCTACGGCATGAAGACACATCCATACATATGGGTTCTTGCTTTCAAGGAACTCGTACAGCAGATTTCTACAAGTTAGGGCTTTTCAAATGCTCTAGAAGAATGTGAACATTCAGTATGCTCCATTTCATTTCAAAGAAAAGATCGTTTGACTTCCATAGCAAAGGTGGGAACCTTCCAACGTTTAGTCCAATACTTATACCTTGACCATTTCAGGCAAGTCTTGTGGTTCCTTCTGGAGAGGTGTGTGGCTCTTGCAGAAATGTGAAGGTCTTCCATAGATCCTCAGACTTTGGTGTCATTTGCAATCATCAAGACAGCTTCCGGGTGTAGGTTCCAGACCACCACCACACTGATCATTAGTAGCGGAGTGGATTCCAGGAATATCGTTGTTAACCAGCATCACCGTTGATACTCTTGCTACACAATGCAGTGGGCGGAGCCTGGATTGTCCACTGGGTCTCATGTGGAAAGGAGAAGAGGTTTCCCCACTAAACCCCCCTCTGTTGACAGCCTTGTGACTAGAAACGGTCCCGAAACAAAGTCAAGAAAAACCAGAGAAGCCAAGATGGTAGTCCTTGAGATGAAGGAACTAAGCAGCTGGTTAGGATAGCACCAAGGAAAGGTTGCTCTTTCTGTCTGCCCCTCTACTCCCTTTCTTCCTGGGCCCGCCTTATGGGTTCATCATGGAATTTGAAATAAAGTTCACAAAAGTCATATGTGTTGAATCAAGATTTTGCATTTAACCATCTCTCAAAAAGGAAGAGAGCTTCGCCTGCACATGTTTCTTTCAAAAAATCAGTTGCTTTTCAATTCTGGGGTTTCTCTCTTGCCTTGTTCTAGTAGCCTTAATAGTAAAAATCAACAAAAACCACATAAAGATTCACAGATACCAACAAAGGACAGCTTCTGTTTCTGAAATCATCTTTGCATTTAGTAGAAAACGAGCTCTATTCTTTGTGTTGCCTAGGGAATTAGCGATGTCCTGGTTCAGATGGCAAGGCATCTGGCCATGGAGACTCGAGTTTGATCCTGGGTGGGAAGATCCCGAGGAGAAGGAAATGGCATCCCACTCCAGTATTCTCGCCTGGGAAATGCCACGGACAGAGGAGCCTGATGGGCCGCAGTCCACATGGGTCGTATCAAAGTTAGGCTCAACCCAGCAACTACGAATAGAAGTATTTCCAAAGACCACACTCAAGTATGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTCTCCTCTTGCCGTCTTGGAGTCCTGTCTGGAATCTTCATGTATTTCTGCCTCTGTATCTCCTGACAGCCGGCCTTCCGGCCGTTCGTCTGGCCCTGGCCTCTGCCAGGTGCTCTGGCTCTGGCTGAGGGCTTCGCAGGCCAGCTGTCTCCCTTGGCGTGGGTGCGTGCCAGTGTCTGTGTGTCTGTCTGTGTGCTTGCCTGTGGCTCATCTGCTCTCTCCAGAGAGTCCGAGTGCTTGGCGGGCAGCGTGGGGAAAAGGGGGCCTTGGCGTGTGAAGGGGCGGGGCTGGAGGCACTCAGTCGACCAACGCCTCCGTGGCTTCCCGGTGGGTTTGGGCACTGGGCAGGTGCCGGGCAAGTTGGGCACTCAAGTTTCACTGCACCTAGGCCCGCAGGATGTTCAGAGGTCCCTGAGGCCGCAGGGAGTCGGGGCGGCGGGCCCTGAAGACCGACACCTCCAGGGCCGTGCGGGCCTGAGCAGAGAAATGGGATCCGGGGGCAGCCCGGCGCTCAGCGGCGCGGCCCCGGGTCTCGTCGGCCGGGGGTGGGAGGTCGCGAGACCTCCGCACCCCTCAGCCCACCCCAGGCCGGCGTGACCCCCACCCACACACAAGCCCTCCCGGTCACTGGGGGGTCAGGGAAAGCCCTTCGGGCCCCCGGGCCCGGCCAGGCGCTTGCTGGGTCTGGTGTTGGCGGTGTTCGGGGCGGGCCGGCGTCAGCAGGCTGGAGTCCGGTCATGAGTCGTGCAGCTCAAATACAGCACGCCCCCGAAGAGAGGTGCACGGCCCGTTGTCCCGACCTGCAGGTCGGAGGCGAAAGGGAGGCGAAAGCGCAAGGCTGTTAGGGCGCCCCCGCGCCAACCGCCTCTGTCTCCGGACATATCACCTGAAGCGCCCGATCTCATCTGCTCTCCGAAGCCTAAGCAGGGTCCTGCCTGGATTAGTATCTTGGATGGGAGGACCACCTGGAAATCCTGGGTGCTGGGCTTT""]","""cuteSV-25-9608-INS-0-3454""",-10.0,{},[4],[0],[0],[4],[4.67e-03],"[10,292]",856,,[1.00e+00],[4.11e-06],"[""cuteSV-25-9608-INS-0-3454""]",0,0.00467,428,301,"[""CGCTGGCCCCGCCTCGCGTCAGATTCTTGGATCCCCCGCAACCCCAGCCCCTCCCGGGCGGGGGGAGCCGGTGGCCAGGGCCCCGACTCCTTTGCAGACCTGGGTTGCCTCCCGCGCGGCCTGCGAGGCCTCGTGCGTCGGCTTCCCAGGCTGCCCGACGACCGGCAGTCCCGTCTCCATCTGGGGCTCCTTTGGCTTGCCTCAGGCAATCCCAGGAGCTTACCGACTCCAGCCATAGCCCCCGCCCCACCCCCTGCCTCGCAGTCATTGCGCATTGCATGTGCGCGCGCAAGCACAGATAGTGTGCCCAAACGGGGCATCTATCTTGTTCACCTATACTGTGGGTGGATCTATGCCTGGGAGTTGGACTGCTGAACCATAGGCTACTTCCACTTTTTGGTTCCTTCGGGGAACCTCCATAGGCTTTCCATACCGGCTCTGCACGCCCATTCCTACTCACCGTGGAGGAGAGTTCCCTTTGCTCCACATCTTCTCCAGCATTTGTTATTGACGGAGCAGCATTGCAGTGGGCCCAATCGGACTCATGTGAAGTGGTCTCTCCTGGGAGTTTGGGTTGAGCCTCTCCAATCATTAGTGATGTGGAGCAAGATGACCTTTTGGAAATGCAGATGCAATGCTGTCACCTCCCTGCTCAACGGCTCTCGTATTTTCCCAGCATATTGAAGATACGACCTTCCTTCAGGCTTGCTCCTCCACGTTCTTCTCTGATGTCTTTGCCTGGAATGCCCTCTGCTCTTTGCTTCCATCCCATAGGGCATTTTTAAGTTTCTTGAATGTCCTAAGGTTCCTGAGTCACATAGCCTTCCAAAGATGCTATTTTCCTGCCTGAAATTCTTCCTGCATCTCACGCTGGTTTACATGGAGGTTCAACCTATGGACCTCAATGCAATAGTTCCTCTGAAAAAGCCTGACACCCAAGGTTTTTGGTGAGGGAAAGAGCCCCTCTATACGTTCTCACAGGATCTGCTTGTGATGTATCATTAGAGTCGTCTCAGTGAAAGCAGGAACTCTGTGTCTTCTTTACTCTCCACATTCTATTTCCAGAAACTTATTACTTGCCTCCCTAGCTCATCGTAGGTGCTCAGTATTTACTTAGTGTGTGAAAGACAGAAAAGCTCTGGGAAGCCTCAGGTGGAAACTTTTGATCTCTTTGGCGATTCAATTCTTTCCTTTCCTGGGTTGATGTGTACACTATATTCGAGGCACAATCTACGGCATGAAGACACATCCATACATATGGGTTCTTGCTTTCAAGGAACTCGTACAGCAGATTTCTACAAGTTAGGGCTTTTCAAATGCTCTAGAAGAATGTGAACATTCAGTATGCTCCATTTCATTTCAAAGAAAAGATCGTTTGACTTCCATAGCAAAGGTGGGAACCTTCCAACGTTTAGTCCAATACTTATACCTTGACCATTTCAGGCAAGTCTTGTGGTTCCTTCTGGAGAGGTGTGTGGCTCTTGCAGAAATGTGAAGGTCTTCCATAGATCCTCAGACTTTGGTGTCATTTGCAATCATCAAGACAGCTTCCGGGTGTAGGTTCCAGACCACCACCACACTGATCATTAGTAGCGGAGTGGATTCCAGGAATATCGTTGTTAACCAGCATCACCGTTGATACTCTTGCTACACAATGCAGTGGGCGGAGCCTGGATTGTCCACTGGGTCTCATGTGGAAAGGAGAAGAGGTTTCCCCACTAAACCCCCCTCTGTTGACAGCCTTGTGACTAGAAACGGTCCCGAAACAAAGTCAAGAAAAACCAGAGAAGCCAAGATGGTAGTCCTTGAGATGAAGGAACTAAGCAGCTGGTTAGGATAGCACCAAGGAAAGGTTGCTCTTTCTGTCTGCCCCTCTACTCCCTTTCTTCCTGGGCCCGCCTTATGGGTTCATCATGGAATTTGAAATAAAGTTCACAAAAGTCATATGTGTTGAATCAAGATTTTGCATTTAACCATCTCTCAAAAAGGAAGAGAGCTTCGCCTGCACATGTTTCTTTCAAAAAATCAGTTGCTTTTCAATTCTGGGGTTTCTCTCTTGCCTTGTTCTAGTAGCCTTAATAGTAAAAATCAACAAAAACCACATAAAGATTCACAGATACCAACAAAGGACAGCTTCTGTTTCTGAAATCATCTTTGCATTTAGTAGAAAACGAGCTCTATTCTTTGTGTTGCCTAGGGAATTAGCGATGTCCTGGTTCAGATGGCAAGGCATCTGGCCATGGAGACTCGAGTTTGATCCTGGGTGGGAAGATCCCGAGGAGAAGGAAATGGCATCCCACTCCAGTATTCTCGCCTGGGAAATGCCACGGACAGAGGAGCCTGATGGGCCGCAGTCCACATGGGTCGTATCAAAGTTAGGCTCAACCCAGCAACTACGAATAGAAGTATTTCCAAAGACCACACTCAAGTATGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTCTCCTCTTGCCGTCTTGGAGTCCTGTCTGGAATCTTCATGTATTTCTGCCTCTGTATCTCCTGACAGCCGGCCTTCCGGCCGTTCGTCTGGCCCTGGCCTCTGCCAGGTGCTCTGGCTCTGGCTGAGGGCTTCGCAGGCCAGCTGTCTCCCTTGGCGTGGGTGCGTGCCAGTGTCTGTGTGTCTGTCTGTGTGCTTGCCTGTGGCTCATCTGCTCTCTCCAGAGAGTCCGAGTGCTTGGCGGGCAGCGTGGGGAAAAGGGGGCCTTGGCGTGTGAAGGGGCGGGGCTGGAGGCACTCAGTCGACCAACGCCTCCGTGGCTTCCCGGTGGGTTTGGGCACTGGGCAGGTGCCGGGCAAGTTGGGCACTCAAGTTTCACTGCACCTAGGCCCGCAGGATGTTCAGAGGTCCCTGAGGCCGCAGGGAGTCGGGGCGGCGGGCCCTGAAGACCGACACCTCCAGGGCCGTGCGGGCCTGAGCAGAGAAATGGGATCCGGGGGCAGCCCGGCGCTCAGCGGCGCGGCCCCGGGTCTCGTCGGCCGGGGGTGGGAGGTCGCGAGACCTCCGCACCCCTCAGCCCACCCCAGGCCGGCGTGACCCCCACCCACACACAAGCCCTCCCGGTCACTGGGGGGTCAGGGAAAGCCCTTCGGGCCCCCGGGCCCGGCCAGGCGCTTGCTGGGTCTGGTGTTGGCGGTGTTCGGGGCGGGCCGGCGTCAGCAGGCTGGAGTCCGGTCATGAGTCGTGCAGCTCAAATACAGCACGCCCCCGAAGAGAGGTGCACGGCCCGTTGTCCCGACCTGCAGGTCGGAGGCGAAAGGGAGGCGAAAGCGCAAGGCTGTTAGGGCGCCCCCGCGCCAACCGCCTCTGTCTCCGGACATATCACCTGAAGCGCCCGATCTCATCTGCTCTCCGAAGCCTAAGCAGGGTCCTGCCTGGATTAGTATCTTGGATGGGAGGACCACCTGGAAATCCTGGGTGCTGGGCTTT|downstream_gene_variant|MODIFIER|5S_rRNA|ENSBTAG00000048872|Transcript|ENSBTAT00000076247|rRNA|||||||||||21|1||RFAM||""]",0.0,0.0217,0.0
25:10789,"[""G"",""GGCGATTCAATTCTTTTCCTTTCTGGGTTGATGTGTACACACTATATTCGAGAGGCACAATCTACGCATGAAGGCCATCCATACACTTATAGGTTCTTGCTTTCAAGGAACTAGAAACCACAACCCAATTTCTACAAGTTGGGGCTTTTCAAATGCTCTGAAGAATGTGAACATTCAGTATGCTCCATTTTCATTTCAAAGAAAGATCGTTTGACTTCCATAGCAAAGGTGGAGACCTTTTCCAGCGTTTAGTCAATACTTATACCTTGACCATTTCAGGCAAGTCTTGTGGTTCCTTCTGGAGGTGTGGCTCTTACCCAGAAATGTGAAGAGTCTTCCATACGTCCTCAGACTTTGGTGTCATTTGCAATCATCAAGACAGCTTCCGGGTGGGTTGAACCACCTACACACAGTCATTAGCCATGTAGGTGGATTCCAGGAATATGCGTTGTTAAGCAGCATCACCGTTGATACTGTTGCTGCACACACAATGCAGTGGGCCGGGCCTGGATTGTCCACTGGATCATGTGGAGATGGAGAGAAGGGTTCCCCCACACACAAACCCCACCTCTGTTGTGTGTGACTAGAAACGATTAAAACAAGAATCAGAAAAACCAGAGAAGCCAAGATGGTAGTCCTTGAGATGAAAGACTAAGCAGCTGGACAGAGATGGCACCAAGGGGAAAGGTTGCTCTTTCTGTCTGCCCTCTACTCCTTTCTTCCTGGACTGCCCATGGGTTCATCATGGAGATTTGAAATAAAGTTCACAAAAGTCATATGTGTTGAATCAAGATTTTGCATTTAACCATCTCTCAAAAAGGAAGAGAGCTTCGCTACTGTATTTCTTTCAAAAACGTCAGTTGCTTTTCAATTGGGGTTTCTCTCTTGCCTTGTTCTAGTAGCCTTAATAGTAAAAATCAACAAAAACCACATAAAGATTCACAGATACCAACAAAGGACAGCTTCTGTTTCTGAAATCATCTTTGCATTTTAGTAAACCAGGCTCTATTCTTTGTATTTTGCCTAGGAATTAGCGATGTCCTGGTTCAGATGGCAAGGCATCCAAGCACAGGAGACTGGGTTTGATCCCTGGTCCGGAAGATCCCCCGGAGAAGGAAATGGCATCCCACTCCAGTATCCTCGCTGGGAAATGCCACGGACAGAGGAGCCTGATGGGTGCAGTCCATGGGTCGTATCAAAGTTGGGCTCAACCCAGCAACCAACGAACAGAAGTATTTCAGACCACACTCAAGTATGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTATCTCCTCTTGCCGTCTTGAGTCTGTCTGAATCTTCATGTATTTCTGCCTCTGTCTCCTGACAGCCGGCCGCCAGGCGTTACGTCTGGCCCTGGCCTCTGCCAGGTGCTCTGGCTCTGGCTGAGGGAGCTTGCTCGCGACCAGCTGTCTCCCTTGGCATTAGGTGCGTGCAGTGTCTGTGTGTCTGTCTGTGCTTGCCTGTGGCTCATCTTGCTCTCTCAGGAAGGTCGCGGTGGCAGGCGCGGGCGGCGTGGGGAAAGGGGGGCCTTGGTAGTGTGAAGGGGCAGGGCTGGGGGCACTCCGGTCGACCACGCCTCGGCTGGCTTCCCGGTGGGTTTGGGCACTGGGCAGGTGCCGGGCAAGTTGGGCACTCAAGTTTCCTGCACCTAGGCCCGCAGGATGTTCAGGGTCCTTGGGCCGCAGGGGTCGGAGCGGCGGGCCCTGAGGCCGACACCTCCAGGGCCGTACGGGCCTGAGGCAGAGAATGGGATCCGGGGGCAAGCCCCGCTCAGAAAACACGGCCGGGTCTGGAGTGGCCGGGGGGTGGGAGGTCGCCGAGACCTCACACCCCTCAGCCACCCCCCAGGCCCGGCGTGGGCCCCCAAGCACAAGCCCTCGGTCACTGCAGGAGTCCGGGGGAAGCCTTGAACCCAGACCCGACAGGCGCTGCTGGTCTGGTGTTGGCGGTGTTCAGGGGGCGGGCCGGCGTCAGCAGAGCTGCTGGAGTCCGGTCGCCCAGGTCGTGCGGCTCCAATCTGTGCGCCCCCGAAGAGAGTGCGGCCCGTTGTCCCGGCCTGCAGGTCGGAGCGAAAGGGGCGAAGCGCAAGGCTGTTAGGCGCCCATGGCCGCCTCTGTCTCCGGACATATCACCTGAAGGCGCCGATCTCATCTGCTCTCCGAAGCTAAGCAGGGTCCTGCCTGGTTAGTATCTTGGATGGGAGACCACCTGGAAATCCTGGGTGCTGGGCTTTTTGCCTACCAGAAGAGGTCCAACCCAACCCCCACCTCAACCCCCCCCCACCGCGTCCCAATTCTTGGATCCCCCACCAGCCCCAACCCCGCCAGAGCGGGGGGAGTGGGTGGCCGGGTGACTCCCGCTGCAGACCTGGGTTGCCTCCGTCCCGGCCCGCGAGCCCCTCCGCATTCTCAAGGCCGGCTTCAGGACACCAGACCGGCAGTCCCCTCTCCATCTGGGGCTCAACTGGCTTCTGCCTCAGGCAATCCGGGCTTACACCGACTCCAGCCATAGCCCAGCCCCGCCTCACCCTGCCTCGCAGGCGATCGCATGCGCTGTGCAGATGTGCCCAAGCGGGGCATCTGTCTGTTCACTTATACTGTGGGTGGATCTATGCCTGGGAGTCGGACTGCTGAACCATAGGCTACTTCCACTTTTAGTTCCTTCGAAGAACCTCATACGCTTTTCCATACCGGCTCTACCCACACGGCCATTCCTACTCACCGTGGAGGAGGGGTTCCCCTTTGCTCACATCTTCTCCAGCATTTGTTATTGACAGACAATGCAGTGAGGCCCAATCGGACTCATGGTATGGTCTCTCCTGGGAGTTTGGGTTGAGCCTCTCCAATCATTAGTGATGTGAGCAAGATGACCTTTGGAAATGCAGATACAATGCTGTCACCTCCCTGCTCAACGGCTCTCGTATTTTTAGCATGTTGAAGGTACGACCACTTCCCTTCAAACACTCCTCATGTTCTTCTCAGTGTCTTTTTACCTGGAATGCCCTCTGCACTCTTTGACTTCCATCACCAGGAGCATTTCGTTCTTGAATGTCCTAGGTTCCCTGGTCATAGCCTTCCAAGATGTATTTTCCACTGCCTGAAATTCTTCCTGCATCTGTGCCGGTTTACGTAGGTTCAACCTATGGACCTCAATGCAGTGGTTCCTCTGAAAAGCCTTCCTGACACCCAAGTTTTGGTGAGGAGGCCCCTCTATAAGTTCTCACAGGATCTGCCTGTGACGTATCATTAGAGTCGTCTCAGTGAAAGCAGGAATAATGTCTTTGTACTCTCCATTCCTATTTCCAGAAACTTATTACTTGCCTCCTAGCTCATCGTGGAGGTGCTCAGTATTTACTTAGTGTGAAAGGCAGAAAAGCTCTGGGAAGCCTCAAGGTGAAACTTTT""]","""cuteSV-25-10789-INS-0-3435""",-10.0,{},[0],[0],[0],[0],[0.00e+00],"[1,300]",856,,[1.00e+00],[1.00e+00],"[""cuteSV-25-10789-INS-0-3435""]",0,0.0,428,301,"[""GCGATTCAATTCTTTTCCTTTCTGGGTTGATGTGTACACACTATATTCGAGAGGCACAATCTACGCATGAAGGCCATCCATACACTTATAGGTTCTTGCTTTCAAGGAACTAGAAACCACAACCCAATTTCTACAAGTTGGGGCTTTTCAAATGCTCTGAAGAATGTGAACATTCAGTATGCTCCATTTTCATTTCAAAGAAAGATCGTTTGACTTCCATAGCAAAGGTGGAGACCTTTTCCAGCGTTTAGTCAATACTTATACCTTGACCATTTCAGGCAAGTCTTGTGGTTCCTTCTGGAGGTGTGGCTCTTACCCAGAAATGTGAAGAGTCTTCCATACGTCCTCAGACTTTGGTGTCATTTGCAATCATCAAGACAGCTTCCGGGTGGGTTGAACCACCTACACACAGTCATTAGCCATGTAGGTGGATTCCAGGAATATGCGTTGTTAAGCAGCATCACCGTTGATACTGTTGCTGCACACACAATGCAGTGGGCCGGGCCTGGATTGTCCACTGGATCATGTGGAGATGGAGAGAAGGGTTCCCCCACACACAAACCCCACCTCTGTTGTGTGTGACTAGAAACGATTAAAACAAGAATCAGAAAAACCAGAGAAGCCAAGATGGTAGTCCTTGAGATGAAAGACTAAGCAGCTGGACAGAGATGGCACCAAGGGGAAAGGTTGCTCTTTCTGTCTGCCCTCTACTCCTTTCTTCCTGGACTGCCCATGGGTTCATCATGGAGATTTGAAATAAAGTTCACAAAAGTCATATGTGTTGAATCAAGATTTTGCATTTAACCATCTCTCAAAAAGGAAGAGAGCTTCGCTACTGTATTTCTTTCAAAAACGTCAGTTGCTTTTCAATTGGGGTTTCTCTCTTGCCTTGTTCTAGTAGCCTTAATAGTAAAAATCAACAAAAACCACATAAAGATTCACAGATACCAACAAAGGACAGCTTCTGTTTCTGAAATCATCTTTGCATTTTAGTAAACCAGGCTCTATTCTTTGTATTTTGCCTAGGAATTAGCGATGTCCTGGTTCAGATGGCAAGGCATCCAAGCACAGGAGACTGGGTTTGATCCCTGGTCCGGAAGATCCCCCGGAGAAGGAAATGGCATCCCACTCCAGTATCCTCGCTGGGAAATGCCACGGACAGAGGAGCCTGATGGGTGCAGTCCATGGGTCGTATCAAAGTTGGGCTCAACCCAGCAACCAACGAACAGAAGTATTTCAGACCACACTCAAGTATGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTATCTCCTCTTGCCGTCTTGAGTCTGTCTGAATCTTCATGTATTTCTGCCTCTGTCTCCTGACAGCCGGCCGCCAGGCGTTACGTCTGGCCCTGGCCTCTGCCAGGTGCTCTGGCTCTGGCTGAGGGAGCTTGCTCGCGACCAGCTGTCTCCCTTGGCATTAGGTGCGTGCAGTGTCTGTGTGTCTGTCTGTGCTTGCCTGTGGCTCATCTTGCTCTCTCAGGAAGGTCGCGGTGGCAGGCGCGGGCGGCGTGGGGAAAGGGGGGCCTTGGTAGTGTGAAGGGGCAGGGCTGGGGGCACTCCGGTCGACCACGCCTCGGCTGGCTTCCCGGTGGGTTTGGGCACTGGGCAGGTGCCGGGCAAGTTGGGCACTCAAGTTTCCTGCACCTAGGCCCGCAGGATGTTCAGGGTCCTTGGGCCGCAGGGGTCGGAGCGGCGGGCCCTGAGGCCGACACCTCCAGGGCCGTACGGGCCTGAGGCAGAGAATGGGATCCGGGGGCAAGCCCCGCTCAGAAAACACGGCCGGGTCTGGAGTGGCCGGGGGGTGGGAGGTCGCCGAGACCTCACACCCCTCAGCCACCCCCCAGGCCCGGCGTGGGCCCCCAAGCACAAGCCCTCGGTCACTGCAGGAGTCCGGGGGAAGCCTTGAACCCAGACCCGACAGGCGCTGCTGGTCTGGTGTTGGCGGTGTTCAGGGGGCGGGCCGGCGTCAGCAGAGCTGCTGGAGTCCGGTCGCCCAGGTCGTGCGGCTCCAATCTGTGCGCCCCCGAAGAGAGTGCGGCCCGTTGTCCCGGCCTGCAGGTCGGAGCGAAAGGGGCGAAGCGCAAGGCTGTTAGGCGCCCATGGCCGCCTCTGTCTCCGGACATATCACCTGAAGGCGCCGATCTCATCTGCTCTCCGAAGCTAAGCAGGGTCCTGCCTGGTTAGTATCTTGGATGGGAGACCACCTGGAAATCCTGGGTGCTGGGCTTTTTGCCTACCAGAAGAGGTCCAACCCAACCCCCACCTCAACCCCCCCCCACCGCGTCCCAATTCTTGGATCCCCCACCAGCCCCAACCCCGCCAGAGCGGGGGGAGTGGGTGGCCGGGTGACTCCCGCTGCAGACCTGGGTTGCCTCCGTCCCGGCCCGCGAGCCCCTCCGCATTCTCAAGGCCGGCTTCAGGACACCAGACCGGCAGTCCCCTCTCCATCTGGGGCTCAACTGGCTTCTGCCTCAGGCAATCCGGGCTTACACCGACTCCAGCCATAGCCCAGCCCCGCCTCACCCTGCCTCGCAGGCGATCGCATGCGCTGTGCAGATGTGCCCAAGCGGGGCATCTGTCTGTTCACTTATACTGTGGGTGGATCTATGCCTGGGAGTCGGACTGCTGAACCATAGGCTACTTCCACTTTTAGTTCCTTCGAAGAACCTCATACGCTTTTCCATACCGGCTCTACCCACACGGCCATTCCTACTCACCGTGGAGGAGGGGTTCCCCTTTGCTCACATCTTCTCCAGCATTTGTTATTGACAGACAATGCAGTGAGGCCCAATCGGACTCATGGTATGGTCTCTCCTGGGAGTTTGGGTTGAGCCTCTCCAATCATTAGTGATGTGAGCAAGATGACCTTTGGAAATGCAGATACAATGCTGTCACCTCCCTGCTCAACGGCTCTCGTATTTTTAGCATGTTGAAGGTACGACCACTTCCCTTCAAACACTCCTCATGTTCTTCTCAGTGTCTTTTTACCTGGAATGCCCTCTGCACTCTTTGACTTCCATCACCAGGAGCATTTCGTTCTTGAATGTCCTAGGTTCCCTGGTCATAGCCTTCCAAGATGTATTTTCCACTGCCTGAAATTCTTCCTGCATCTGTGCCGGTTTACGTAGGTTCAACCTATGGACCTCAATGCAGTGGTTCCTCTGAAAAGCCTTCCTGACACCCAAGTTTTGGTGAGGAGGCCCCTCTATAAGTTCTCACAGGATCTGCCTGTGACGTATCATTAGAGTCGTCTCAGTGAAAGCAGGAATAATGTCTTTGTACTCTCCATTCCTATTTCCAGAAACTTATTACTTGCCTCCTAGCTCATCGTGGAGGTGCTCAGTATTTACTTAGTGTGAAAGGCAGAAAAGCTCTGGGAAGCCTCAAGGTGAAACTTTT|downstream_gene_variant|MODIFIER|5S_rRNA|ENSBTAG00000048872|Transcript|ENSBTAT00000076247|rRNA|||||||||||1202|1||RFAM||""]",0.0,0.0,0.0
25:11316,"[""C"",""CTCATGTGGAATGGAGAGAAGGGCTCCCCCCCACACAAACCCCACCCCCTCTGTTGACAGCACCTTTTCAGACCAGGTAACCTGTCTCAAAACAAGAATCGGAAAAACCAGAGAAGCCAAGATGGTAGTCCTTGAGATGAAGGAACTACAGCTGGTTGAGATGGCACCAAGGGGAAAAGGTTGCTCTTTCTGTCTGCCCCTCTACTCCTTTCTTCCTGGACGCCTTATGGGTTCATCATGGAGATTTGAAATAAAGTTCACAAAAGTCATATGTGCTGAATCAAGATTTTGCATTTAACCATCTCTCAAAAAGGAAGAGAGCTTCGCCCTGCACATGTTTCTTTCCAAAACGTCAGTTGCTTTTCAATTCTGGGGTTTCTTCTCTTGCCTTGTTCTAGTAGCCTTAATAGTAAAATCAACAAAAACCACATAAAGATTCACAGATACCAACAAAGGACAGCTTCTGTTTCTGAAATCATCTTTGCATTTAGTAGAAAACGAGCTCTATTCTCAGATGCCTAGGGATTTATTGCTTTTAAACCCAATCTGATGAGCATCCCCCCACAGAGATCGTTGGTTTGATTCCAATGGGAAGATTCCCTGAGAAGGAGAAATGGCATCCACTCCAGTATCTCACAGAAATGCACGAGGAGCCATGGGCTCAGTCCGGTAAGGTCTTCCCACTGTTATCTAGTCCAGCAATGCAACGAACAGAAGTATTTCAAGTCCGCTCCAAGTATGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTATCTCCTCTGCCGTCTTGGAGTCTGTCTCGGAATCTTCATGTATTTCGGCTTGTATCTCCTGACAGCCGGCCGCCGGCCCGTTCGTCTGGCCCTGGCCTCTAGCGGTGCTCTGGCTCTGGGCTGAGGAGCTGCGCAGGCCAGCTGTCTCCTTGGCGTGGGTGCGTGCCAGTGTCTGTGTGTCTGTCTGTGTGCTTGCCTGTGGCTCATCTGCTCTCTCAGGAAGGTCGAGTGCTTGGCGGGCGGCGTGGGGGAAAAGGGGGCCTTGGTAGTGTGAAGGGGCGGGGCTGAGGCACTGCCGACCACGCTCCGTGGCTTCCCGGTGGGTTTGGGCACTGGGCAGGTGCCGGGCAAGTTGGGCACTCAAGTTTCACTGCACCTAGGCCCGCAGGATGTTCAGAGGTCCTTGGGCCGCAGGGGTCGGGAGGCGTGGCCCTGAAGACCGGACACCTCCAGGGCCGTAAGGGCCTGAGCAGAGAATGGGATCCGGGGGCAAGCCCCGCCTCAGCGAGCACGGCCGGGTCTGGAGTGGCCGGGGGTGGAGGTCGCCGAGACCTCCTCGCACCCTCAGCCCACCCCCCAGGCCCGGCGTGAGCCCACGCACACAAGCCCTCCCGGTCACTGCGGGGTCCGGGAAGCCCTTCGGGCCCCCCGGGCCCGGCCAGGCGCTGCTGGTCTGGTGTTGGCGGTGTTCGGGGGCGGGCCGCCAGCAGGCTGGAGTCCGGTCGCGGGTCGTGCGGCTCTGGGCCACAGTACGCCCCCGAAGAGGTGCCCGGTTGTCCCGACTCTGCAGGTCGGAGCGAAGGGAGGCGAAGCTAAGGCTGCCAGGGCGCCCGCGCCAACCGCCTCTGTCTCCGGACATATCAATTCGAAGGCGCCCGATCTCATCTGCTCTCCGAAGCTAAGCAGGGTCCTGCCTGGTTAGTATCTTGGATGGGAGACCACCTGGAAATCCTGGGTGCTGGAGGCTTTTTGCCTACCAGTCCTTTAGCCCCACCCTCAACCCCCCCCCCACGCCGCGTCCCAATTCTTGGATCCCCCCGCAGCCCCAGCTCCCGCCCAGGAAGTGGGGAGTGGGTGGCCGGGGCCCCGACTCCCGCTGCAGACCTGGGGTTGCTCCGCCTGCCCGCGAGCCCTCCGCATTCTCATTCGGTTTCCAGGACACCAGACAACCGGCAGTCCCCTCTCCATCTGGGGCTCCTTTCTGCCTCAGGCAATCCCGGGCTTACACCGACTCCAGCCATAGTCCCAGCCCCCGCCTCACCCACTGCCTCGCAGGCGATCGCGCGCATGCACAGATGTGCCCAAACGCATCTATCTTGTTCACTTATACTGTGGGTGGATCTATGCCTGGGAGTCGGACTGCTGAACCATAGGCTACTTCCACTTTTAGTTCCTTCGGGAACCTCCATACGCTTTTCCATACCGCTCTACCCAGGCCATTCCTACTCACCGTGGAGGAGAGTTCCCTTTGCTCCACATCTTCTCCAGCATTTGTTATTGACAGACAATGCAGAGTGGCCCAATCGGACTCATGTGAAGTGGTCTCTCCTGGGAGTTTGAGTTGAGCCTCTCCAATCATTAGTGATGTGGAGCAAGATGACCTTTTGGAAATGCAGATGCAATGCTGTCACCTCCCTGCTCCAACGGCTCTCGTATTTTCCCAGCATATTGAAGATACGACCACTTCCCTTCAAGCTTGCTCCTCATGTTCTTCTCATGTCTTTTGCCTGGAATGCCCTCTGCACTTTGACTTCCATCCCATAGGGCATTTTTCGTTTCTTGAATGTCCTAAGGTTCCCTGGTCACATAGCCTTCCAAGATGCTATTTTCCTGCTCAGAAATTCTTCCTGCATCTCAATGCCGGTTTACGTAGGTTCAACCTATGGACCTCAATGCAGTGGTTCCTCTGAAAAGCCTTCCCTGACACCCAAGTTTTGGTGAGGGAGAGGCCCTCTATAAGTTCCTCACAGGATCTGCCTGTGACGATCATTAGAGTCGTCTCAGTGAAAGCAGGAACCTGTGTCTTCTTCTCTCCACATTCTATTTCCAGAAACTTATTACTTGCCTCTAGCTCATCGTAGGTGCTCAGTATTTACTTATGTGTGAAAGACAGAAAAGCTCTGGGAAGCCTCAAGGTGAAACTTTGATCTCTTGGCGATTCACCTCTTTCCTTTCCTGGGTTGATGTGTACACTATATTCGAGGCACAATCTACGGCATGAAGACACATCCATACACATAGGTTCTTGCTTTCAAGGAACTCGTACAGCAGATTTCTACAAGTTAGGGCTTTTCAAATGCTCTAGAAGAATGTGAACGTTCAGTATACCATTTTCATTTCAAAGAAAGATCGTTTGACTTCCATAGCAAAGGTGGAGACCTTCCAACGTTTAACTAATATTATACCTTTATCCATTTCTGGCAAGTCTTGTGGTTTCTTCTGGAGGTGTGTGGCACCTTGCAGAAATGTGAAGGTCTTCCATCGTCCTCAGACTTTGGTGTCATTTGCAATCATCAAGACAGCTTCCGGGTGTAGGTTCCAGACCACCACCACACACAGTCATTAGCCAACGGAGGATTCCAGGAATATGCGTTTGTTAACCAGCATCACCGTTGATACTCTTGCTGCACACTCA""]","""cuteSV-25-11316-INS-0-3415""",-10.0,{},[0],[0],[0],[0],[0.00e+00],"[16,290]",856,,[1.00e+00],[1.00e+00],"[""cuteSV-25-11316-INS-0-3415""]",0,0.0,428,301,"[""TCATGTGGAATGGAGAGAAGGGCTCCCCCCCACACAAACCCCACCCCCTCTGTTGACAGCACCTTTTCAGACCAGGTAACCTGTCTCAAAACAAGAATCGGAAAAACCAGAGAAGCCAAGATGGTAGTCCTTGAGATGAAGGAACTACAGCTGGTTGAGATGGCACCAAGGGGAAAAGGTTGCTCTTTCTGTCTGCCCCTCTACTCCTTTCTTCCTGGACGCCTTATGGGTTCATCATGGAGATTTGAAATAAAGTTCACAAAAGTCATATGTGCTGAATCAAGATTTTGCATTTAACCATCTCTCAAAAAGGAAGAGAGCTTCGCCCTGCACATGTTTCTTTCCAAAACGTCAGTTGCTTTTCAATTCTGGGGTTTCTTCTCTTGCCTTGTTCTAGTAGCCTTAATAGTAAAATCAACAAAAACCACATAAAGATTCACAGATACCAACAAAGGACAGCTTCTGTTTCTGAAATCATCTTTGCATTTAGTAGAAAACGAGCTCTATTCTCAGATGCCTAGGGATTTATTGCTTTTAAACCCAATCTGATGAGCATCCCCCCACAGAGATCGTTGGTTTGATTCCAATGGGAAGATTCCCTGAGAAGGAGAAATGGCATCCACTCCAGTATCTCACAGAAATGCACGAGGAGCCATGGGCTCAGTCCGGTAAGGTCTTCCCACTGTTATCTAGTCCAGCAATGCAACGAACAGAAGTATTTCAAGTCCGCTCCAAGTATGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTATCTCCTCTGCCGTCTTGGAGTCTGTCTCGGAATCTTCATGTATTTCGGCTTGTATCTCCTGACAGCCGGCCGCCGGCCCGTTCGTCTGGCCCTGGCCTCTAGCGGTGCTCTGGCTCTGGGCTGAGGAGCTGCGCAGGCCAGCTGTCTCCTTGGCGTGGGTGCGTGCCAGTGTCTGTGTGTCTGTCTGTGTGCTTGCCTGTGGCTCATCTGCTCTCTCAGGAAGGTCGAGTGCTTGGCGGGCGGCGTGGGGGAAAAGGGGGCCTTGGTAGTGTGAAGGGGCGGGGCTGAGGCACTGCCGACCACGCTCCGTGGCTTCCCGGTGGGTTTGGGCACTGGGCAGGTGCCGGGCAAGTTGGGCACTCAAGTTTCACTGCACCTAGGCCCGCAGGATGTTCAGAGGTCCTTGGGCCGCAGGGGTCGGGAGGCGTGGCCCTGAAGACCGGACACCTCCAGGGCCGTAAGGGCCTGAGCAGAGAATGGGATCCGGGGGCAAGCCCCGCCTCAGCGAGCACGGCCGGGTCTGGAGTGGCCGGGGGTGGAGGTCGCCGAGACCTCCTCGCACCCTCAGCCCACCCCCCAGGCCCGGCGTGAGCCCACGCACACAAGCCCTCCCGGTCACTGCGGGGTCCGGGAAGCCCTTCGGGCCCCCCGGGCCCGGCCAGGCGCTGCTGGTCTGGTGTTGGCGGTGTTCGGGGGCGGGCCGCCAGCAGGCTGGAGTCCGGTCGCGGGTCGTGCGGCTCTGGGCCACAGTACGCCCCCGAAGAGGTGCCCGGTTGTCCCGACTCTGCAGGTCGGAGCGAAGGGAGGCGAAGCTAAGGCTGCCAGGGCGCCCGCGCCAACCGCCTCTGTCTCCGGACATATCAATTCGAAGGCGCCCGATCTCATCTGCTCTCCGAAGCTAAGCAGGGTCCTGCCTGGTTAGTATCTTGGATGGGAGACCACCTGGAAATCCTGGGTGCTGGAGGCTTTTTGCCTACCAGTCCTTTAGCCCCACCCTCAACCCCCCCCCCACGCCGCGTCCCAATTCTTGGATCCCCCCGCAGCCCCAGCTCCCGCCCAGGAAGTGGGGAGTGGGTGGCCGGGGCCCCGACTCCCGCTGCAGACCTGGGGTTGCTCCGCCTGCCCGCGAGCCCTCCGCATTCTCATTCGGTTTCCAGGACACCAGACAACCGGCAGTCCCCTCTCCATCTGGGGCTCCTTTCTGCCTCAGGCAATCCCGGGCTTACACCGACTCCAGCCATAGTCCCAGCCCCCGCCTCACCCACTGCCTCGCAGGCGATCGCGCGCATGCACAGATGTGCCCAAACGCATCTATCTTGTTCACTTATACTGTGGGTGGATCTATGCCTGGGAGTCGGACTGCTGAACCATAGGCTACTTCCACTTTTAGTTCCTTCGGGAACCTCCATACGCTTTTCCATACCGCTCTACCCAGGCCATTCCTACTCACCGTGGAGGAGAGTTCCCTTTGCTCCACATCTTCTCCAGCATTTGTTATTGACAGACAATGCAGAGTGGCCCAATCGGACTCATGTGAAGTGGTCTCTCCTGGGAGTTTGAGTTGAGCCTCTCCAATCATTAGTGATGTGGAGCAAGATGACCTTTTGGAAATGCAGATGCAATGCTGTCACCTCCCTGCTCCAACGGCTCTCGTATTTTCCCAGCATATTGAAGATACGACCACTTCCCTTCAAGCTTGCTCCTCATGTTCTTCTCATGTCTTTTGCCTGGAATGCCCTCTGCACTTTGACTTCCATCCCATAGGGCATTTTTCGTTTCTTGAATGTCCTAAGGTTCCCTGGTCACATAGCCTTCCAAGATGCTATTTTCCTGCTCAGAAATTCTTCCTGCATCTCAATGCCGGTTTACGTAGGTTCAACCTATGGACCTCAATGCAGTGGTTCCTCTGAAAAGCCTTCCCTGACACCCAAGTTTTGGTGAGGGAGAGGCCCTCTATAAGTTCCTCACAGGATCTGCCTGTGACGATCATTAGAGTCGTCTCAGTGAAAGCAGGAACCTGTGTCTTCTTCTCTCCACATTCTATTTCCAGAAACTTATTACTTGCCTCTAGCTCATCGTAGGTGCTCAGTATTTACTTATGTGTGAAAGACAGAAAAGCTCTGGGAAGCCTCAAGGTGAAACTTTGATCTCTTGGCGATTCACCTCTTTCCTTTCCTGGGTTGATGTGTACACTATATTCGAGGCACAATCTACGGCATGAAGACACATCCATACACATAGGTTCTTGCTTTCAAGGAACTCGTACAGCAGATTTCTACAAGTTAGGGCTTTTCAAATGCTCTAGAAGAATGTGAACGTTCAGTATACCATTTTCATTTCAAAGAAAGATCGTTTGACTTCCATAGCAAAGGTGGAGACCTTCCAACGTTTAACTAATATTATACCTTTATCCATTTCTGGCAAGTCTTGTGGTTTCTTCTGGAGGTGTGTGGCACCTTGCAGAAATGTGAAGGTCTTCCATCGTCCTCAGACTTTGGTGTCATTTGCAATCATCAAGACAGCTTCCGGGTGTAGGTTCCAGACCACCACCACACACAGTCATTAGCCAACGGAGGATTCCAGGAATATGCGTTTGTTAACCAGCATCACCGTTGATACTCTTGCTGCACACTCA|downstream_gene_variant|MODIFIER|5S_rRNA|ENSBTAG00000048872|Transcript|ENSBTAT00000076247|rRNA|||||||||||1729|1||RFAM||""]",0.0,0.0,0.0


Now we calculated startified AF per cohort lets find the frequent variants in Indicus samples

In [6]:
indicusFrequent=mt.filter_rows(mt.AF_indicus > 0.7)
indicusFrequent.rows().show()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,info,info,info,info,info,info,info,info,info,info,info,info,info,info,info,info,Unnamed: 21_level_0,Unnamed: 22_level_0,Unnamed: 23_level_0
locus,alleles,rsid,qual,filters,AC,AC_Hemi,AC_Het,AC_Hom,AF,AK,AN,DP,ExcHet,HWE,ID,MA,MAF,NS,UK,CSQ,AF_indicus,AF_taurus,AF_bosoutgroup
locus<ARSUCD>,array<str>,str,float64,set<str>,array<int32>,array<int32>,array<int32>,array<int32>,array<float64>,array<int32>,int32,int32,array<float64>,array<float64>,array<str>,int32,float64,int32,int32,array<str>,float64,float64,float64
25:37471,"[""ATATATTTATTATATCATATTATAATATATATTTATTATATACATAAATACATGTTTATATT"",""A""]","""cuteSV-25-37471-DEL-0-61""",-10.0,{},[339],[0],[195],[144],[3.96e-01],"[50,25]",856,,[8.67e-01],[3.14e-01],"[""cuteSV-25-37471-DEL-0-61""]",0,0.396,428,75,"[""-|upstream_gene_variant|MODIFIER||ENSBTAG00000053570|Transcript|ENSBTAT00000078325|processed_pseudogene|||||||||||850|1||||""]",1.58,1.52,2.22
25:79994,"[""TCCCTTCCTGGGAGCTCCAAGGAGGTGATCAGTCACAATGCTCCCTTCTCCCGAGTCCCCATGTTTCCCTAACTGTTG"",""T""]","""cuteSV-25-79994-DEL-0-77""",-10.0,{},[117],[0],[9],[108],[1.37e-01],"[115,64]",856,,[1.00e+00],[0.00e+00],"[""cuteSV-25-79994-DEL-0-77""]",0,0.137,428,182,"[""-|intergenic_variant|MODIFIER|||||||||||||||||||||""]",1.33,0.174,3.83
25:108289,"[""GGATGTGGGGGGCAGGGGCACTTAGGATGTGGGGGGCGGGGGGCACTTAGGATGTGGGGGGTGGGGGGCAC"",""G""]","""cuteSV-25-108289-DEL-0-70""",-10.0,{},[498],[0],[28],[470],[5.82e-01],"[98,45]",856,,[1.00e+00],[0.00e+00],"[""cuteSV-25-108289-DEL-0-70""]",0,0.418,428,143,"[""-|upstream_gene_variant|MODIFIER|IL9R|ENSBTAG00000007558|Transcript|ENSBTAT00000009944|protein_coding|||||||||||1129|-1||VGNC||0.748""]",3.25,2.28,2.17
25:175780,"[""G"",""GCCTCCTGCCCGATCTGAGCAAACCCACCACTACTCCGCACCCTGTGACTCAGGACA""]","""cuteSV-25-175780-INS-0-56""",-10.0,{},[31],[0],[7],[24],[3.62e-02],"[22,99]",856,,[1.00e+00],[6.22e-18],"[""cuteSV-25-175780-INS-0-56""]",0,0.0362,428,142,"[""CCTCCTGCCCGATCTGAGCAAACCCACCACTACTCCGCACCCTGTGACTCAGGACA|intron_variant|MODIFIER|NPRL3|ENSBTAG00000016564|Transcript|ENSBTAT00000022032|protein_coding||2/12||||||||||-1||VGNC||"",""CCTCCTGCCCGATCTGAGCAAACCCACCACTACTCCGCACCCTGTGACTCAGGACA|intron_variant|MODIFIER|NPRL3|ENSBTAG00000016564|Transcript|ENSBTAT00000072039|protein_coding||2/2||||||||||-1||VGNC||""]",1.67,0.0435,0.167
25:211573,"[""ATATATATATACATATATATATGTATATATGTATATATATGTATATGTGTATATATATATATA"",""A""]","""cuteSV-25-211573-DEL-0-62""",-10.0,{},[135],[0],[21],[114],[1.58e-01],"[53,35]",856,,[1.00e+00],[0.00e+00],"[""cuteSV-25-211573-DEL-0-62""]",0,0.158,428,86,"[""-|upstream_gene_variant|MODIFIER|HBA|ENSBTAG00000051412|Transcript|ENSBTAT00000037545|protein_coding|||||||||||4813|1||EntrezGene||"",""-|upstream_gene_variant|MODIFIER||ENSBTAG00000034024|Transcript|ENSBTAT00000048243|protein_coding|||||||||||130|1||||"",""-|upstream_gene_variant|MODIFIER|HBM|ENSBTAG00000038757|Transcript|ENSBTAT00000054450|protein_coding|||||||||||1795|1||VGNC||""]",2.42,0.201,3.83
25:449211,"[""GTCACGTGTGTTCTCTGTGTGTCACGTGTGTTCTGTGTGTGACGTGTGTGGCCTGTTCTGTGTGT"",""G""]","""cuteSV-25-449211-DEL-0-64""",-10.0,{},[75],[0],[13],[62],[8.76e-02],"[67,30]",856,,[1.00e+00],[8.03e-35],"[""cuteSV-25-449211-DEL-0-64""]",0,0.0876,428,97,"[""-|intron_variant|MODIFIER|RAB11FIP3|ENSBTAG00000016591|Transcript|ENSBTAT00000022068|protein_coding||5/13||||||||||1||VGNC||"",""-|intron_variant|MODIFIER|RAB11FIP3|ENSBTAG00000016591|Transcript|ENSBTAT00000075339|protein_coding||5/14||||||||||1||VGNC||""]",1.5,0.261,0.5
25:746258,"[""GACGTCAGTGTGGCGCCGTCTGCACGCGCTGACCCGCCCGTCGCTGCAGCCTCACCT"",""G""]","""cuteSV-25-746258-DEL-0-56""",-10.0,{},[25],[0],[1],[24],[2.92e-02],"[31,46]",856,,[1.00e+00],[6.05e-23],"[""cuteSV-25-746258-DEL-0-56""]",0,0.0292,428,82,"[""-|intron_variant|MODIFIER|LMF1|ENSBTAG00000019745|Transcript|ENSBTAT00000004206|protein_coding||3/9||||||||||-1||VGNC||0.865"",""-|intron_variant|MODIFIER|LMF1|ENSBTAG00000019745|Transcript|ENSBTAT00000066947|protein_coding||4/10||||||||||-1||VGNC||0.865"",""-|intron_variant|MODIFIER|LMF1|ENSBTAG00000019745|Transcript|ENSBTAT00000079430|protein_coding||4/9||||||||||-1||VGNC||0.865"",""-|intron_variant|MODIFIER|LMF1|ENSBTAG00000019745|Transcript|ENSBTAT00000082383|protein_coding||4/10||||||||||-1||VGNC||0.865""]",1.17,0.0272,0.333
25:886449,"[""AAGGGGAGCCCAAGGTAGGAAGGGGAAGCCCCTGGTAGGAAGGGGAAACCCCTGGTAGG"",""A""]","""cuteSV-25-886449-DEL-0-58""",-10.0,{},[142],[0],[34],[108],[1.66e-01],"[95,41]",856,,[1.00e+00],[1.58e-38],"[""cuteSV-25-886449-DEL-0-58""]",0,0.166,428,143,"[""-|downstream_gene_variant|MODIFIER||ENSBTAG00000049617|Transcript|ENSBTAT00000069032|protein_coding|||||||||||4892|1||||""]",1.92,0.364,2.89
25:1000282,"[""TCTCTGCTCCCTTGTTGCTCTGGTTTTCTTTGAATCCAACATCTTGACACAATGTGAGACATACCCTGGAAACAAACTTGAGACCTAACCTATCATGTCATTGAAGTGTAATAATGTAAACATGATGCAAGATACATAATATAATATATGTGATACACAAGATATAAAAGAAAATGTATCATACACTACATTGTGAGGCTTGCCCACAATGTAAGGGGGTTTGCCTTAGGGCAGGTCTGCGCTTAGGTGGCGCTAGTGGTAAAGAATCTGCCTGCCGATGCAGAACGTAA"",""T""]","""cuteSV-25-1000282-DEL-0-289""",-10.0,{},[122],[0],[14],[108],[1.43e-01],"[0,30]",856,,[1.00e+00],[0.00e+00],"[""cuteSV-25-1000282-DEL-0-289""]",0,0.143,428,301,"[""-|downstream_gene_variant|MODIFIER|TPSB2|ENSBTAG00000007325|Transcript|ENSBTAT00000037579|protein_coding|||||||||||809|1||EntrezGene||"",""-|downstream_gene_variant|MODIFIER|TPSB2|ENSBTAG00000007325|Transcript|ENSBTAT00000074010|protein_coding|||||||||||66|1||EntrezGene||"",""-|downstream_gene_variant|MODIFIER|TPSB2|ENSBTAG00000007325|Transcript|ENSBTAT00000085087|protein_coding|||||||||||66|1||EntrezGene||""]",1.58,0.179,3.89
25:1110371,"[""CTAAGCCTTGGGTCCCAGGATTTATAACCCTCTTGCAATGAACAGACTTTAA"",""C""]","""cuteSV-25-1110371-DEL-0-51""",-10.0,{},[124],[0],[12],[112],[1.45e-01],"[0,30]",856,,[1.00e+00],[0.00e+00],"[""cuteSV-25-1110371-DEL-0-51""]",0,0.145,428,111,"[""-|intron_variant|MODIFIER|UNKL|ENSBTAG00000014126|Transcript|ENSBTAT00000018773|protein_coding||1/13||||||||||-1||VGNC||"",""-|intron_variant|MODIFIER|UNKL|ENSBTAG00000014126|Transcript|ENSBTAT00000070499|protein_coding||2/14||||||||||-1||VGNC||""]",2.92,0.179,3.11


We can easily get the ids of the common variants

In [7]:
indicusFrequent.rows().rsid.collect()[:10]

['cuteSV-25-37471-DEL-0-61',
 'cuteSV-25-79994-DEL-0-77',
 'cuteSV-25-108289-DEL-0-70',
 'cuteSV-25-175780-INS-0-56',
 'cuteSV-25-211573-DEL-0-62',
 'cuteSV-25-449211-DEL-0-64',
 'cuteSV-25-746258-DEL-0-56',
 'cuteSV-25-886449-DEL-0-58',
 'cuteSV-25-1000282-DEL-0-289',
 'cuteSV-25-1110371-DEL-0-51']

Similarily we find common variants for the Holstein breed only.

In [8]:
numSamples=mt.aggregate_cols(hl.agg.filter(mt.breed.CompositeBreed == "Holstein" ,hl.agg.count()))
mt=mt.annotate_rows(AF_Holstein=hl.agg.filter(mt.breed.CompositeBreed =="Holstein",
                                     hl.agg.sum(mt.GT.n_alt_alleles())
                                     / numSamples*2 ))

mt.filter_rows(mt.AF_Holstein > 0.8).rows().show()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,info,info,info,info,info,info,info,info,info,info,info,info,info,info,info,info,Unnamed: 21_level_0,Unnamed: 22_level_0,Unnamed: 23_level_0,Unnamed: 24_level_0
locus,alleles,rsid,qual,filters,AC,AC_Hemi,AC_Het,AC_Hom,AF,AK,AN,DP,ExcHet,HWE,ID,MA,MAF,NS,UK,CSQ,AF_indicus,AF_taurus,AF_bosoutgroup,AF_Holstein
locus<ARSUCD>,array<str>,str,float64,set<str>,array<int32>,array<int32>,array<int32>,array<int32>,array<float64>,array<int32>,int32,int32,array<float64>,array<float64>,array<str>,int32,float64,int32,int32,array<str>,float64,float64,float64,float64
25:37471,"[""ATATATTTATTATATCATATTATAATATATATTTATTATATACATAAATACATGTTTATATT"",""A""]","""cuteSV-25-37471-DEL-0-61""",-10.0,{},[339],[0],[195],[144],[3.96e-01],"[50,25]",856,,[8.67e-01],[3.14e-01],"[""cuteSV-25-37471-DEL-0-61""]",0,0.396,428,75,"[""-|upstream_gene_variant|MODIFIER||ENSBTAG00000053570|Transcript|ENSBTAT00000078325|processed_pseudogene|||||||||||850|1||||""]",1.58,1.52,2.22,1.63
25:108289,"[""GGATGTGGGGGGCAGGGGCACTTAGGATGTGGGGGGCGGGGGGCACTTAGGATGTGGGGGGTGGGGGGCAC"",""G""]","""cuteSV-25-108289-DEL-0-70""",-10.0,{},[498],[0],[28],[470],[5.82e-01],"[98,45]",856,,[1.00e+00],[0.00e+00],"[""cuteSV-25-108289-DEL-0-70""]",0,0.418,428,143,"[""-|upstream_gene_variant|MODIFIER|IL9R|ENSBTAG00000007558|Transcript|ENSBTAT00000009944|protein_coding|||||||||||1129|-1||VGNC||0.748""]",3.25,2.28,2.17,2.47
25:1663149,"[""GACGGTGGCGTGGGGGGGTTGGGCGGGGGCAGGACAGGCAAGGATGATGGCTGGGGCAGTCCACAGAGAACAG"",""G""]","""cuteSV-25-1663149-DEL-0-72""",-10.0,{},[460],[0],[104],[356],[5.37e-01],"[28,0]",856,,[1.00e+00],[5.01e-27],"[""cuteSV-25-1663149-DEL-0-72""]",0,0.463,428,28,"[""-|intron_variant&non_coding_transcript_variant|MODIFIER|PKD1|ENSBTAG00000020619|Transcript|ENSBTAT00000027480|pseudogene||1/45||||||||||-1||VGNC||""]",3.67,1.95,3.17,1.93
25:1893344,"[""TATATGGAACCTTAATATATGGAATCCATTTTGAGTTTATTTTTGTGAATGTTGTTAGAAAGTATTCTAGCTTCATTCTTTTACAAGTGGTTGACCAGTTTTCCCAGCACCACTTGTTAAAGAGATTGTCTGTAATCCATTGTATATTCTTGCCTCCTTTGTCAAAGATAAGGTGTCCATAGGTGCGTGGATTTATCTCTGGGCTTTCTATTTCGTTCCATTTAAGTGTCTAGTTTTTATTGGAGTTTCATTACATAGGTTCATGAGTGAATCATGAACCTCGTGACTAAACTTAATCTGCAGTCTCTCCCTCCTGGCAGATCGGGAGATCAGATTGATAACCTGGCTCAAAGCCCCAACTCTCAAACTACATGGTTGATCTTTCTGATATGACCAGCCAGGATCTTGAACACTTCAATAGCAGGTGTACTCAGTACCTGAGGTAGTACTCAGGTACTACCAGGAGTAACAAAGACATACCATCCCTCCAAGGATTTAGAGGCTCCTCCCAGTAACTGGGGACAGACACCAGGCAGATTCTTTATTATTTAATACATACAATGATT"",""T""]","""cuteSV-25-1893344-DEL-0-565""",-10.0,{},[678],[0],[16],[662],[7.92e-01],"[141,33]",856,,[1.00e+00],[0.00e+00],"[""cuteSV-25-1893344-DEL-0-565""]",0,0.208,428,127,"[""-|intron_variant|MODIFIER||ENSBTAG00000049585|Transcript|ENSBTAT00000081063|protein_coding||1/2||||||||||1||||""]",3.67,3.13,3.22,2.4
25:2011284,"[""GGAGACACTCAGAGAAGATTCCTCTTTCCAGAAGTCAACACCAGCAACCTGCTAGGGCAGCAGGCTGGGGAGGAGCTG"",""G""]","""cuteSV-25-2011284-DEL-0-77""",-10.0,{},[575],[0],[77],[498],[6.72e-01],"[31,0]",856,,[1.00e+00],[1.90e-34],"[""cuteSV-25-2011284-DEL-0-77""]",0,0.328,428,31,"[""-|upstream_gene_variant|MODIFIER||ENSBTAG00000050859|Transcript|ENSBTAT00000073497|protein_coding|||||||||||1136|1||||""]",3.92,2.48,3.94,2.17
25:2188249,"[""ACCGAGCCCGAGTGTTGCAACTACTGAAGCCTAAGCCCATGCGCCACAAGT"",""A""]","""cuteSV-25-2188249-DEL-0-50""",-10.0,{},[365],[0],[85],[280],[4.26e-01],"[93,43]",856,,[1.00e+00],[1.83e-36],"[""cuteSV-25-2188249-DEL-0-50""]",0,0.426,428,135,"[""-|downstream_gene_variant|MODIFIER||ENSBTAG00000050298|Transcript|ENSBTAT00000083765|lncRNA|||||||||||4355|-1||||""]",3.0,1.44,3.56,1.68
25:2497379,"[""CAGTGCCGGATTCTCTTCTTTTGTGGAGAAGGCTATGGCACCCCACTCCAGTACTCTTGCCTGGAAAATCCCATGGACCGAGGAGCCTGGTAGGCTGCAGTCCATGGGGTGGCAAAGAGTCCGACACGACTGAGGGACTTCACTTTTACTTTCCACTTTCCTGCATTGGAGAGGGAAATGGCAACCCACTCCAGTGTTCTTACCTGGAGAATCCCAGGGATGGGGGAGCCTGGTGGGCTGCCGTCTATGGAGTCGCACAGAGTCGGACAGGACTGAAGCGACTTAGCAGCAGTAGCAGGGTCCCGGTGGTGGCGGGCTAGCGCGGGGCACGCGCAGCGCGGCCTACCCGGCCCCCGCCCCGCTCGCAGGCCCGCAGTTCTCGTCGTTCCAGGACCGGCTGCAAGAGGGCTCCGCACGCCCGCTCACCGAGCCGGGGCTGCCCCCCGGGGCGGAGGTGGACGCCGTGTCTCCTGGCGGCTCAACGAGAAGACCTACCTGATCCGGGGGCGGCAGTACTGGCGGTAGGAGGAGGCGGCGGCGCGCCCGGACCCCGGCTACCCGCGCGATCAGAGCCTCTGGGAGGGCGCGCTGCCGGCCCCCGACGATGTCACCGTCAGCAACAGGGGTGGGAGAGCTCGCGCACCCGGGGGGCCGGGCCTTGGGGGCCCAGTGGCCGTGGGGTGGTGGGGCCTGCAGAGCCCGGCGTGGGCGGAGAAGCGGTGCGGAGAGAGGGGTTTGGAGGGTGGGCAGGCGTGCACCAGCCCCTCAAACCTCTCCCTCCCCGCAGGTGACACCTACTTCTTCAAGGGCCTCCACTACTGGCGCTTCCCCAAGGGCAGCATCAGAGCCGAGCCAGACTCCCCCCAACCCATGGGTCCCAAGTTCCTGGACTGCCCCGCCCCCACTGTGGGCCCTCGGGCCCCCAGACCCCCCAGAGCGACCCTTAAGCCAGAGACTGCGACTGTCAATGTGGGATCAGTCCGGCTGCGGGGCGTCCTTCAGTGCCCCTCCTGCCTCTTCTGTCCCTGCTGGTGGGGGGCCTGGCCTCCCCCTGAAGGGAACCTGCGGCTGTATGGAAGGGATCGGATCTCCCACCGGGCCCCCTCGGGTGGTGGCTGTGGGAAGTCTGTCCCAGGGGTGGGTCTCAGCCAGGATGCAGCTGGGGATGCAGGTGGACCTAAATCCAGGAGGGCAGAGACACAGGGCCAGCCCTGGCACTGAATGTCCTGACTGCCGCCTCCTGGGCCTTCCTTCCTCCCCTGTCCCTGGCCATGACCCCGCAGGACTCTCCTTTTCCCGGAACACCTGGCCTTTCCCGGAGCTCAGGTGGCTGAATTCCTGCAGCCCGCCCCGCCTGCCAACAGCAGCTGTCCCAAGTGCTTGGCTCTTTACAGGACAGCCCCACTCCCCCTTGCTGAGCAGGCTCTCCAGCCTCTGGTCTAAGGTTTCTCATCCTGTCATCCCCTCACGAAGTGCCAATCGAAGCACATTGGTGGGGTCCCAAACCCAGCTCTGTGACCACCCCTGAGCATGTCCCAAGTCCGGGGTGACATCCATGGTTCACATGGGTGTCACATGAGCATCACGTGGGTGATTCTCCTGAGACGGCCCCTGTGGCCCATCCTTGGTTCTAATCAGCACGCTGGTAACCAGCACTGCCCACATTCAGTACGACACCCTCCTGCTCCTAAAACACCTCTCCCACCTCCTCACCTGAGCCTCTGCCACCCCCTCCAGAACCGCTTGGCTCCCATCAGCTCGATGTTGGTGGGAAGAAAGTGGACTGGTCCACTCTCAGTCTGGGGTCACTGTCTCCTCAGTGCCCCTGGGTGACAAGTGGATGGAACCTGGCCACCGCAATGACCTCCCTCCCCACTGAATTCAGGATCTA"",""C""]","""cuteSV-25-2497379-DEL-0-1890""",-10.0,{},[259],[0],[87],[172],[3.03e-01],"[273,28]",856,,[1.00e+00],[5.25e-26],"[""cuteSV-25-2497379-DEL-0-1890""]",0,0.303,428,301,"[""-|downstream_gene_variant|MODIFIER||ENSBTAG00000050846|Transcript|ENSBTAT00000067071|protein_coding|||||||||||2227|1||||"",""-|downstream_gene_variant|MODIFIER||ENSBTAG00000050846|Transcript|ENSBTAT00000078926|protein_coding|||||||||||2227|1||||"",""-|upstream_gene_variant|MODIFIER||ENSBTAG00000052397|Transcript|ENSBTAT00000085693|protein_coding|||||||||||3244|1||||"",""-|downstream_gene_variant|MODIFIER||ENSBTAG00000050846|Transcript|ENSBTAT00000085827|protein_coding|||||||||||2227|1||||""]",2.08,0.935,3.44,0.988
25:2507316,"[""GATAACAAAGGTCCATCTAGTCAAGGCTATGGTTTTTCCTGTGGTCATGTATGGATGTGAGAGTTGGACTGTGAAGAAGGCTGAGTGCCGAAGAATTGATGCTTTTGAACTGTGGTGTTGGAGAAGACTCTTGAGAGTCCCTTGGACTGCAGGGAGATCCAACCAGTCCATTCTGAAGGAGATCACGCCTGGGATTTCTTTGGAAGGAATGATGCTAAAGCTGAAATTCCAGTACTTTGGCCACCTCATGCGAGGAGTTGACTCATTGGAAAAGACTCTGATGCTGGGAGGGATTGGGGGCAGGAGGAGAAGGGGACGACAGAGGATGAGATGGCTGGATGGTATCACTGACTCGATCGACGTGAGTCTGAGTGCACTCTGGGAGTTGGTGATGGACAGGGAGGCCTGCTTTCTGCGATTGATGGGGTCGGAAAGAGTCGGACACGACTGAGCGACTGATCTGATCTGATCT"",""G""]","""cuteSV-25-2507316-DEL-0-471""",-10.0,{},[203],[0],[135],[68],[2.37e-01],"[129,119]",856,,[9.97e-01],[1.06e-02],"[""cuteSV-25-2507316-DEL-0-471""]",0,0.237,428,301,"[""-|downstream_gene_variant|MODIFIER||ENSBTAG00000052397|Transcript|ENSBTAT00000085693|protein_coding|||||||||||466|1||||""]",2.08,0.946,0.222,0.815
25:3373281,"[""A"",""AATATGGCAGTTCCTCAAACACTTAAAAATAGAATTATCAATTACACACTATCCA""]","""cuteSV-25-3373281-INS-0-54""",-10.0,{},[467],[0],[103],[364],[5.46e-01],"[27,78]",856,,[1.00e+00],[1.65e-27],"[""cuteSV-25-3373281-INS-0-54""]",0,0.454,428,108,"[""ATATGGCAGTTCCTCAAACACTTAAAAATAGAATTATCAATTACACACTATCCA|upstream_gene_variant|MODIFIER||ENSBTAG00000026400|Transcript|ENSBTAT00000037519|pseudogene|||||||||||4725|1||||""]",3.42,2.27,0.444,1.48
25:3507271,"[""GTGGGGCACTGGACCACGTGGTCTCAGAGAGGCTTCCACTCTGAGACCACGTAGTCCAGTGCCCCACGGAGGCGA"",""G""]","""cuteSV-25-3507271-DEL-0-74""",-10.0,{},[347],[0],[35],[312],[4.05e-01],"[117,43]",856,,[1.00e+00],[0.00e+00],"[""cuteSV-25-3507271-DEL-0-74""]",0,0.405,428,160,"[""-|downstream_gene_variant|MODIFIER|GLIS2|ENSBTAG00000009199|Transcript|ENSBTAT00000012119|protein_coding|||||||||||3045|1||VGNC||0.0975"",""-|upstream_gene_variant|MODIFIER|PAM16|ENSBTAG00000009200|Transcript|ENSBTAT00000012120|protein_coding|||||||||||547|-1||VGNC||"",""-|downstream_gene_variant|MODIFIER|CORO7|ENSBTAG00000009201|Transcript|ENSBTAT00000012121|protein_coding|||||||||||4719|-1||VGNC||0.89"",""-|intron_variant|MODIFIER|PAM16|ENSBTAG00000009200|Transcript|ENSBTAT00000080249|protein_coding||1/3||||||||||-1||VGNC||""]",3.25,1.43,2.5,0.889


## Explore population genotypes of a specfic variant

Let's explore the population data of the high impact variant that we visualized earlier


In [9]:
HighImpactSV=mt.filter_rows(mt.rsid=="cuteSV-25-2585287-DEL-0-396")
HighImpactSV.rows().show()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,info,info,info,info,info,info,info,info,info,info,info,info,info,info,info,info,Unnamed: 21_level_0,Unnamed: 22_level_0,Unnamed: 23_level_0,Unnamed: 24_level_0
locus,alleles,rsid,qual,filters,AC,AC_Hemi,AC_Het,AC_Hom,AF,AK,AN,DP,ExcHet,HWE,ID,MA,MAF,NS,UK,CSQ,AF_indicus,AF_taurus,AF_bosoutgroup,AF_Holstein
locus<ARSUCD>,array<str>,str,float64,set<str>,array<int32>,array<int32>,array<int32>,array<int32>,array<float64>,array<int32>,int32,int32,array<float64>,array<float64>,array<str>,int32,float64,int32,int32,array<str>,float64,float64,float64,float64
25:2585287,"[""AGCCCTGTCTGTAGCCAGAGCCCGGCCCCAGCGCTCAGTATGACCTGTCCGGGGCCAGGGCCGCCCTCCTCCTGGCTGTGACCCAGGGCCGGCTGGGGGCCCAGCACGATGTGGAGGCGCTGGAGGGCTTGTGCCAGGCCCTGGGCTTCGAGACCACCCTGAGGACAGACCCTACAGCCCAGGTGAGGGGAAGCCCAGAACCTCTGAAGGTCCTCTGAAGGAAGGGTACCCCCCACCCAGACCCTGGGGACTCTGTCCGGGGCCTCTTACCAATTGTGGGCAGAAATGCACCCCCAGCCTCCCTGTTGCATGCATGCATCTATGCGAACCCACTTCCCTGCTCCTGCAAGTCCAGTCTCCCGGGCCGCTGCCACCTCTCACGCCGGCCCCTGCTCAA"",""A""]","""cuteSV-25-2585287-DEL-0-396""",-10.0,{},[81],[0],[31],[50],[9.46e-02],"[282,19]",856,,[1.00e+00],[7.82e-20],"[""cuteSV-25-2585287-DEL-0-396""]",0,0.0946,428,301,"[""-|splice_acceptor_variant&splice_donor_variant&coding_sequence_variant&intron_variant|HIGH|CASP16|ENSBTAG00000021407|Transcript|ENSBTAT00000048500|protein_coding|5/9|4-5/8||||||||||1||VGNC||"",""-|downstream_gene_variant|MODIFIER||ENSBTAG00000052480|Transcript|ENSBTAT00000067899|lncRNA|||||||||||4958|-1||||"",""-|splice_acceptor_variant&splice_donor_variant&coding_sequence_variant&intron_variant|HIGH|CASP16|ENSBTAG00000021407|Transcript|ENSBTAT00000075295|protein_coding|4/8|3-4/7||||||||||1||VGNC||""]",0.333,0.332,0.889,0.395


In [10]:
print("Indicus Freq =%.2f"%       HighImpactSV.rows().AF_indicus.collect()[0])
print("Taurus Freq =%.2f"%        HighImpactSV.rows().AF_taurus.collect()[0])
print("Bos out group Freq =%.2f"% HighImpactSV.rows().AF_bosoutgroup.collect()[0])

Indicus Freq =0.33
Taurus Freq =0.33
Bos out group Freq =0.89


### Here we are showing the sum of alleles found per each breed.


In [16]:
entries = HighImpactSV.entries()
results = (entries.group_by(breed = entries.breed.CompositeBreed)
      .aggregate(alleleCount = hl.agg.sum(entries.GT.n_alt_alleles())))
results=results.order_by(-results.alleleCount)
results.show()

2023-01-10 20:23:20.617 Hail: INFO: Ordering unsorted dataset with network shuffle
2023-01-10 20:23:20.832 Hail: INFO: Ordering unsorted dataset with network shuffle


pop,alleleCount
str,int64
"""Holstein""",16
"""Yak""",12
"""Simmental""",7
"""Cross-Holstein-Jersey""",6
"""taurus""",6
"""SantaGertrudis""",4
"""Angus""",3
"""BrownSwiss""",3
"""Banteng""",2
"""Bison""",2


Finally get the ids of the samples that have this variant

In [19]:
entries = HighImpactSV.entries()
results = entries.filter(entries.GT.is_non_ref())
print(results.s.collect())

['SAMEA3390143', 'SAMEA3390161', 'SAMEA5159810', 'SAMEA5159818', 'SAMEA5159889', 'SAMEA6163185', 'SAMEA7573539', 'SAMEA7573648', 'SAMEA7589752', 'SAMEA8924040', 'SAMN01915352', 'SAMN01915355', 'SAMN02671521', 'SAMN02671580', 'SAMN02671595', 'SAMN02671598', 'SAMN02671621', 'SAMN02671658', 'SAMN02671659', 'SAMN02671660', 'SAMN02671677', 'SAMN02671725', 'SAMN02671729', 'SAMN03765682', 'SAMN05199572', 'SAMN05199680', 'SAMN05199691', 'SAMN05199699', 'SAMN05199708', 'SAMN05199766', 'SAMN05788491', 'SAMN05788512', 'SAMN05945846', 'SAMN06699025', 'SAMN08166128', 'SAMN08323868', 'SAMN08324143', 'SAMN08612380', 'SAMN08612388', 'SAMN08612428', 'SAMN08612456', 'SAMN09087153', 'SAMN09087199', 'SAMN09737047', 'SAMN09737155', 'SAMN10525452', 'SAMN10531953', 'SAMN10963688', 'SAMN11569574', 'SAMN12881846', 'SAMN15779711', 'SAMN15779738', 'SAMN15779990', 'SAMN15780061', 'SAMN19491905', 'SAMN19491956']


# Run principal component analysis (PCA) on the Hardy-Weinberg-normalized genotype call matrix.
Finally lets run pca on the genotypes and visualize how the samples are related to each others

In [38]:
eigenvalues, pcs, _ = hl.hwe_normalized_pca(mt.GT)
mt = mt.annotate_cols(scores = pcs[mt.s].scores)


2023-01-10 20:57:18.788 Hail: INFO: hwe_normalize: found 599 variants after filtering out monomorphic sites.
2023-01-10 20:57:19.121 Hail: INFO: Coerced sorted dataset
2023-01-10 20:57:19.465 Hail: INFO: pca: running PCA with 10 components...
2023-01-10 20:57:22.174 Hail: INFO: wrote table with 0 rows in 0 partitions to /tmp/persist_tableGXudIvjJAt
    Total size: 36.79 KiB
    * Rows: 0.00 B
    * Globals: 36.79 KiB
    * Smallest partition: N/A
    * Largest partition:  N/A


In [39]:
from bokeh.models import  CategoricalColorMapper
from bokeh.palettes import Category10

pallete=Category10[3]
colors={
    'taurus': pallete[0],
    'indicus': pallete[1],
    'bosoutgroup': pallete[2]
    
}

colorTable={}
for s in table.collect():
    colorTable[s.CompositeBreed]=colors[s.Cohort]
factors=[]
pallete=[]
for k,v in colorTable.items():
    factors.append(k)
    pallete.append(v)
    
color_mapper = CategoricalColorMapper(factors=factors, palette=pallete)    

p = hl.plot.scatter(mt.scores[0],
                    mt.scores[1],
                    label=mt.breed.CompositeBreed,
                    colors=color_mapper,
                    title='PCA', xlabel='PC1', ylabel='PC2')
show(p)