<b>Author:</b> ...

<b>Contributors:</b> ...


<div class="alert alert-block alert-danger">
Before you start running this notebook, make sure you are using the Hail Genomics Analysis Environment. To do so,
<br/>
    
<ul>
    <li>Click on the <b>cloud analysis environment</b> icon on the right-hand side of the screen.</li>
    <li>Inside <b>Recommended environments</b>, select <b>Hail Genomics Analysis</b>, which creates a cloud environment for your analyses.</li>
    <li>This analysis can be run with <b>high compute</b> (e.g. 96 CPUs, 624 GB of RAM, 300 workers and 300 preemptibles with 4 CPUs, 15 GB of RAM).</li>
    <li>Click on <b>Next</b>.</li>
</ul>
    
</div>

<h1>Notebook Objectives</h1>

This notebook subsets the short-read v7 VDS to the ~1,027 long-read samples and the ~989 samples for which GATK-SV calls are available.

<b>How to Use this Notebook...</b>

<b>As a tutorial:</b>

...

<b>As a resource:</b>

...

<h2>Relevant Information:</h2>

...

In [5]:
import pandas as pd
import numpy as np
import os
import re
import json

In [6]:
import pysam
from pysam import VariantFile

from collections import defaultdict
from collections import Counter

from tqdm.notebook import tqdm

In [7]:
from google.cloud import storage

In [8]:
import hail as hl
from hail.plot import show
from pprint import pprint

## Define helper functions

In [9]:
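def mt_exists(gcs_path):
    """Return True if gcs_path contains a README.txt, used here as the marker for a written MatrixTable."""
    # Split 'gs://bucket/path/to/object' into the bucket name and the object prefix.
    (gcs_bucket_name, gcs_obj) = re.split("/", re.sub("gs://", "", gcs_path), maxsplit=1)
    
    storage_client = storage.Client()
    gcs_bucket = storage_client.bucket(gcs_bucket_name)
    stats = storage.Blob(bucket=gcs_bucket, name=f'{gcs_obj}/README.txt').exists(storage_client)
    
    return stats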

In [10]:
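def vds_exists(gcs_path):
    """Return True if gcs_path contains reference_data/README.txt, used here as the marker for a written VDS."""
    # Split 'gs://bucket/path/to/object' into the bucket name and the object prefix.
    (gcs_bucket_name, gcs_obj) = re.split("/", re.sub("gs://", "", gcs_path), maxsplit=1)
    
    storage_client = storage.Client()
    gcs_bucket = storage_client.bucket(gcs_bucket_name)
    stats = storage.Blob(bucket=gcs_bucket, name=f'{gcs_obj}/reference_data/README.txt').exists(storage_client)
    
    return stats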
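
Both helpers above begin by splitting a `gs://` URI into a bucket name and an object prefix. A minimal sketch of that step as a standalone function follows. The name `split_gcs_path` is hypothetical; it is not part of this notebook or of `google-cloud-storage`.

```python
def split_gcs_path(gcs_path):
    """Split 'gs://bucket/some/object' into ('bucket', 'some/object').

    Hypothetical helper for illustration; not part of google-cloud-storage.
    """
    prefix = "gs://"
    if not gcs_path.startswith(prefix):
        raise ValueError(f"Expected a gs:// URI, got: {gcs_path}")
    # Split only on the first '/', so the object name keeps its own slashes.
    bucket_name, _, obj_name = gcs_path[len(prefix):].partition("/")
    # Drop any trailing slash so callers can append '/README.txt' safely.
    return bucket_name, obj_name.rstrip("/")


# Toy URI, not a real bucket.
demo_bucket, demo_obj = split_gcs_path("gs://my-bucket/path/to/table.mt")
print(demo_bucket, demo_obj)
```

Rejecting paths that do not start with `gs://` makes a mistyped local path fail early instead of producing an unexpected bucket name.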

In [11]:
bucket = os.environ['WORKSPACE_BUCKET']
workspace = os.environ['WORKSPACE_NAME']
namespace = os.environ['WORKSPACE_NAMESPACE']

In [12]:
if not os.path.exists("cohort_AoUSVPhaseII.v7.LRsamples.vcf.gz"):
    !gsutil cp gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/yulia/cohort_AoUSVPhaseII.v7.LRsamples.vcf.gz .

Copying gs://fc-secure-8e5a6fd7-16ae-4796-80ed-8f0463af5ff1/yulia/cohort_AoUSVPhaseII.v7.LRsamples.vcf.gz...
| [1 files][679.8 MiB/679.8 MiB]   56.8 MiB/s                                   
Operation completed over 1 objects/679.8 MiB.                                    


In [13]:
sr_sv_samples = !zgrep -m1 '^#CHROM' cohort_AoUSVPhaseII.v7.LRsamples.vcf.gz | cut -f10- | sed 's/\t/\n/g'

In [14]:
len(sr_sv_samples)

990
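
The sample list above comes from the `#CHROM` header line, whose first nine columns are fixed VCF fields. As a sketch, the same list could be read in pure Python. The function `read_vcf_samples` and the demo file below are hypothetical and used only for illustration.

```python
import gzip
import os
import tempfile


def read_vcf_samples(vcf_gz_path):
    """Return sample names from the #CHROM header line of a gzipped VCF."""
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Columns 1-9 are fixed VCF fields; samples start at column 10.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached records without finding a #CHROM line
    return []


# Toy file with two made-up samples, written to a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.vcf.gz")
with gzip.open(demo_path, "wt") as out:
    out.write("##fileformat=VCFv4.2\n")
    out.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n")

demo_samples = read_vcf_samples(demo_path)
print(len(demo_samples), demo_samples)
```

Reading stops at the header line, so only the start of the file is decompressed.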

In [15]:
if not os.path.exists("concat_annotated.sens_09.vcf.gz"):
    !gsutil cp gs://fc-secure-fd873afb-038d-44ed-b113-623c141cb95f/releases/sv_integration/GRCh38/v1/concat_annotated.sens_09.vcf.gz .
        
if not os.path.exists("concat_annotated.sens_07.vcf.gz"):        
    !gsutil cp gs://fc-secure-fd873afb-038d-44ed-b113-623c141cb95f/releases/sv_integration/GRCh38/v1/concat_annotated.sens_07.vcf.gz .

Copying gs://fc-secure-fd873afb-038d-44ed-b113-623c141cb95f/releases/sv_integration/GRCh38/v1/concat_annotated.sens_09.vcf.gz...
- [1 files][732.3 MiB/732.3 MiB]   46.7 MiB/s                                   
Operation completed over 1 objects/732.3 MiB.                                    
Copying gs://fc-secure-fd873afb-038d-44ed-b113-623c141cb95f/releases/sv_integration/GRCh38/v1/concat_annotated.sens_07.vcf.gz...
- [1 files][451.2 MiB/451.2 MiB]                                                
Operation completed over 1 objects/451.2 MiB.                                    


In [16]:
sv_sens_09_vcf = 'concat_annotated.sens_09.vcf.gz'
sv_sens_07_vcf = 'concat_annotated.sens_07.vcf.gz'

In [17]:
!zcat {sv_sens_09_vcf} | head -n 2000 | grep -v '^#' | head -n 3 | cut -f1-9

chr1	10147	0	C	CCCTAACCCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACAACCCTAACCCTAACAACCCTAACAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCAACCCAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAA	.	.	TRUVARI_ID=chr1-10148-INS-330;SVTYPE=INS;SVLEN=330;GTCNT=1073,0,0,1;F_MISSING=0.999069;NS=1;AN=2;AF=1;MAF=0;AC=2;AC_Het=0;AC_Hom=2;AC_Hemi=0;HWE=1;ExcHet=1	GT:GQ:DR:DV:SCORE:CALIBRATION_SENSITIVITY:SUPP_PBSV:SUPP_SNIFFLES:SUPP_PAV
chr1	10231	1	C	CCCTAACCCTAACCCCTACCCCAACCCCAACCCCAACCCCAACCCCAACCCTTAACCCTAA	.	.	TRUVARI_ID=chr1-10232-INS-60;SVTYPE=INS;SVLEN=60;GTCNT=1073,0,1,0;F_MISSING=0.999069;NS=1;AN=2;AF=0.5;MAF=0.5;AC=1;AC_Het=1;AC_Hom=0;AC_Hemi=0;HWE=1;ExcHet=1	GT:GQ:DR:DV:SCORE:CALIBRATION_SENSITIVITY:SUPP_PBSV:SUPP_SNIFFLES:SUPP_PAV
chr1	10280	2	AACCCTAACCCCAACCCCAACCCCAACCCCAACCCCAACCCCAACCCTAAC	A	.	.	TRUVARI_ID=chr1-10281-DEL-50;SV

In [18]:
sv_sens_09_in = VariantFile(sv_sens_09_vcf)  # auto-detect input format

for i, rec in enumerate(sv_sens_09_in):
    print(f'{i} {rec.chrom} {rec.pos} {rec.info.values()}')
    
    if i > 10:
        break

0 chr1 10147 ['chr1-10148-INS-330', 'INS', 330, (1073, 0, 0, 1), (0.9990689754486084,), 1, 2, (1.0,), 0.0, (2,), (0,), (2,), (0,), (1.0,), (1.0,)]
1 chr1 10231 ['chr1-10232-INS-60', 'INS', 60, (1073, 0, 1, 0), (0.9990689754486084,), 1, 2, (0.5,), 0.5, (1,), (1,), (0,), (0,), (1.0,), (1.0,)]
2 chr1 10280 ['chr1-10281-DEL-50', 'DEL', 50, (1074, 0, 0, 0), (1.0,), 0, 0, (None,), None, (0,), (0,), (0,), (0,), (1.0,), (1.0,)]
3 chr1 10300 ['chr1-10301-DEL-103', 'DEL', 103, (1073, 0, 0, 1), (0.9990689754486084,), 1, 2, (1.0,), 0.0, (2,), (0,), (2,), (0,), (1.0,), (1.0,)]
4 chr1 10306 ['chr1-10307-INS-102', 'INS', 102, (1073, 0, 0, 1), (0.9990689754486084,), 1, 2, (1.0,), 0.0, (2,), (0,), (2,), (0,), (1.0,), (1.0,)]
5 chr1 10309 ['chr1-10310-INS-106', 'INS', 106, (1073, 0, 1, 0), (0.9990689754486084,), 1, 2, (0.5,), 0.5, (1,), (1,), (0,), (0,), (1.0,), (1.0,)]
6 chr1 10310 ['chr1-10311-INS-91', 'INS', 91, (1073, 0, 0, 1), (0.9990689754486084,), 1, 2, (1.0,), 0.0, (2,), (0,), (2,), (0,), (1.0,)

[E::idx_find_and_load] Could not retrieve index file for 'concat_annotated.sens_09.vcf.gz'


In [19]:
def count_samples_in_vcf(vcf_file_path):
    """
    Counts the number of samples in a VCF file.

    Parameters:
        vcf_file_path (str): Path to the VCF file.
    
    Returns:
        int: Number of samples in the VCF.
    """
    try:
        # Open the VCF file
        vcf = pysam.VariantFile(vcf_file_path)
        
        # Get the sample names
        sample_names = list(vcf.header.samples)
        
        # Return the number of samples
        return len(sample_names)
    except FileNotFoundError:
        raise FileNotFoundError(f"VCF file not found at: {vcf_file_path}")

In [20]:
count_samples_in_vcf(sv_sens_09_vcf)

[E::idx_find_and_load] Could not retrieve index file for 'concat_annotated.sens_09.vcf.gz'


1074

In [21]:
def count_variants_by_svtype(vcf_file_path, field='SVTYPE'):
    """
    Opens a VCF file and counts the number of variants by the value of an INFO field.
    
    Parameters:
        vcf_file_path (str): Path to the VCF file.
        field (str): INFO field to count by (default: 'SVTYPE').
    
    Returns:
        dict: A dictionary where keys are the field's values and values are counts.
    """
    # Open the VCF file
    try:
        vcf = pysam.VariantFile(vcf_file_path)
    except FileNotFoundError:
        raise FileNotFoundError(f"VCF file not found at: {vcf_file_path}")
    
    # Counter for SVTYPE occurrences
    svtype_counts = Counter()
    
    # Iterate through each record in the VCF
    for record in vcf.fetch():
        # Access the INFO field and get SVTYPE, if available
        svtype = record.info.get(field, None)
        if svtype:
            svtype_counts[svtype] += 1
    
    return dict(svtype_counts)

In [24]:
count_variants_by_svtype(sv_sens_09_vcf)

[E::idx_find_and_load] Could not retrieve index file for 'concat_annotated.sens_09.vcf.gz'


{'INS': 934491, 'DEL': 279385, 'UNK': 1684}

In [25]:
count_variants_by_svtype(sv_sens_07_vcf)

[E::idx_find_and_load] Could not retrieve index file for 'concat_annotated.sens_07.vcf.gz'


{'INS': 499272, 'DEL': 166597, 'UNK': 401}

In [22]:
!ls *.vcf.gz

cohort_AoUSVPhaseII.v7.LRsamples.vcf.gz  concat_annotated.sens_09.vcf.gz
concat_annotated.sens_07.vcf.gz


In [23]:
def count_svtype_per_sample(vcf_file_path, field='SVTYPE', passonly=False):
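    """
    Iterates over each record in a VCF file and counts, for each sample, the
    number of variants of each SVTYPE at which the sample carries at least one
    non-reference allele.

    Parameters:
        vcf_file_path (str): Path to the VCF file.
        field (str): INFO field to group by (default: 'SVTYPE').
        passonly (bool): If True, count only records whose FILTER is PASS.

    Returns:
        dict: A nested dictionary where the outer keys are sample names,
              inner keys are SVTYPEs, and values are counts.
    """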
    # Open the VCF file
    try:
        vcf = pysam.VariantFile(vcf_file_path)
    except FileNotFoundError:
        raise FileNotFoundError(f"VCF file not found at: {vcf_file_path}")
        
    num_records_line = !zgrep -vc ^'#' {vcf_file_path}
    num_records = int(num_records_line[0])
    
    # Initialize the dictionary for counts
    svtype_counts = defaultdict(lambda: defaultdict(int))
    
    # Iterate through each record in the VCF
    for record in tqdm(vcf.fetch(), total=num_records, desc="Processing VCF records"):
        # Skip non-PASS variants
        if passonly and list(record.filter.keys()) != ['PASS']:
            continue

        # Get the SVTYPE from the INFO field
        svtype = record.info.get(field, None)
        if not svtype:
            continue  # Skip if SVTYPE is not present
            
        svtype = svtype[0] if isinstance(svtype, tuple) else svtype
        
        # Update counts for each sample
        for sample in record.samples:
            genotype = record.samples[sample].get('GT')
            if genotype is None or any(gt is None for gt in genotype):
                continue  # Skip no-calls (./.)
            
            if any(gt != 0 for gt in genotype):  # Check for non-reference alleles
                svtype_counts[sample][svtype] += 1
                
    return svtype_counts

In [87]:
if not os.path.exists("sv_sens_09_counts.json"):
    sv_sens_09_counts = count_svtype_per_sample(sv_sens_09_vcf)

    # Write to a JSON file
    with open('sv_sens_09_counts.json', 'w') as json_file:
        json.dump(sv_sens_09_counts, json_file, indent=4)

with open('sv_sens_09_counts.json', 'r') as json_file:
    sv_sens_09_counts = json.load(json_file)

[E::idx_find_and_load] Could not retrieve index file for 'concat_annotated.sens_09.vcf.gz'

Processing VCF records:   0%|          | 0/1215560 [00:00<?, ?it/s]



In [88]:
sv_sens_09_counts

{}
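
A cache-then-load cell returns whatever is stored in the JSON file, so an empty or stale file silently yields an empty result like the one above. The notebook does not say why this result is empty. The sketch below is one possible guard that recomputes when the cached file is missing or empty. The helper `load_or_compute` is hypothetical.

```python
import json
import os
import tempfile


def load_or_compute(json_path, compute_fn):
    """Load cached JSON; recompute and rewrite it if missing or empty."""
    if os.path.exists(json_path):
        with open(json_path) as fh:
            cached = json.load(fh)
        if cached:  # an empty dict is treated as "no usable cache"
            return cached
    result = compute_fn()
    with open(json_path, "w") as fh:
        json.dump(result, fh, indent=4)
    return result


# Toy demonstration: an empty cache file triggers recomputation.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_counts.json")
with open(demo_path, "w") as fh:
    json.dump({}, fh)

first = load_or_compute(demo_path, lambda: {"s1": {"INS": 1}})
second = load_or_compute(demo_path, lambda: {"s1": {"INS": 99}})  # served from cache
print(first, second)
```

In this notebook, the call might look like `load_or_compute('sv_sens_09_counts.json', lambda: count_svtype_per_sample(sv_sens_09_vcf))`.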



In [29]:
if not os.path.exists("sv_sens_07_counts.json"):
    sv_sens_07_counts = count_svtype_per_sample(sv_sens_07_vcf)

    # Write to a JSON file
    with open('sv_sens_07_counts.json', 'w') as json_file:
        json.dump(sv_sens_07_counts, json_file, indent=4)

with open('sv_sens_07_counts.json', 'r') as json_file:
    sv_sens_07_counts = json.load(json_file)

[E::idx_find_and_load] Could not retrieve index file for 'concat_annotated.sens_07.vcf.gz'


Processing VCF records:   0%|          | 0/666270 [00:00<?, ?it/s]

In [30]:
sv_sens_07_counts

{'1706456': {'INS': 10832, 'DEL': 7391, 'UNK': 6},
 '2342167': {'INS': 11811, 'DEL': 8001, 'UNK': 3},
 '1584178': {'INS': 10830, 'DEL': 7702, 'UNK': 15},
 '1897748': {'INS': 10776, 'DEL': 7623, 'UNK': 12},
 '1833803': {'INS': 10560, 'DEL': 7482, 'UNK': 13},
 '1551813': {'DEL': 6942, 'INS': 10167, 'UNK': 13},
 '1632606': {'INS': 10252, 'DEL': 7237, 'UNK': 15},
 '1223114': {'INS': 10846, 'DEL': 7618, 'UNK': 16},
 '1918452': {'INS': 11001, 'DEL': 7739, 'UNK': 11},
 '1375941': {'INS': 11058, 'DEL': 7463, 'UNK': 13},
 '2063899': {'INS': 10940, 'DEL': 7757, 'UNK': 12},
 '1199196': {'INS': 11421, 'DEL': 7838, 'UNK': 5},
 '1091467': {'INS': 11020, 'DEL': 7264, 'UNK': 21},
 '1731008': {'INS': 10930, 'DEL': 7691, 'UNK': 9},
 '1604558': {'INS': 11679, 'DEL': 7953, 'UNK': 6},
 '1797026': {'INS': 11113, 'DEL': 7831, 'UNK': 17},
 '3438535': {'INS': 11077, 'DEL': 7767, 'UNK': 9},
 '1173769': {'INS': 10653, 'DEL': 7619, 'UNK': 10},
 '1313697': {'INS': 11023, 'DEL': 7661, 'UNK': 13},
 '1371964': {'INS'
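
Per-sample dictionaries like the one above are easier to compare across callsets once reduced to a few summary statistics per SVTYPE. The sketch below uses only the standard library. The helper `summarize_counts` is hypothetical, and the toy input is the first three samples printed above.

```python
from statistics import mean, median


def summarize_counts(counts):
    """Return {svtype: {'n', 'min', 'median', 'mean', 'max'}} across samples.

    Samples that lack an SVTYPE key are not counted as zero; 'n' reports how
    many samples contributed to each SVTYPE.
    """
    by_type = {}
    for per_sample in counts.values():
        for svtype, n in per_sample.items():
            by_type.setdefault(svtype, []).append(n)
    return {
        svtype: {
            "n": len(values),
            "min": min(values),
            "median": median(values),
            "mean": mean(values),
            "max": max(values),
        }
        for svtype, values in by_type.items()
    }


toy = {
    "1706456": {"INS": 10832, "DEL": 7391, "UNK": 6},
    "2342167": {"INS": 11811, "DEL": 8001, "UNK": 3},
    "1584178": {"INS": 10830, "DEL": 7702, "UNK": 15},
}
summary = summarize_counts(toy)
for svtype, stats in summary.items():
    print(svtype, stats)
```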

In [31]:
with open('sv_sens_07_counts.json', 'w') as json_file:
    json.dump(sv_sens_07_counts, json_file, indent=4) 

In [32]:
!gsutil cp sv_sens_0*_counts.json $WORKSPACE_BUCKET/scratch/kvg/

Copying file://sv_sens_07_counts.json [Content-Type=application/json]...
Copying file://sv_sens_09_counts.json [Content-Type=application/json]...        
/ [2 files][178.1 KiB/178.1 KiB]                                                
Operation completed over 2 objects/178.1 KiB.                                    


## List long-read samples

In [24]:
lr_sv_samples = !zgrep -m1 '^#CHROM' concat_annotated.sens_09.vcf.gz | cut -f10- | sed 's/\t/\n/g'

In [25]:
len(lr_sv_samples)

1074

In [26]:
!zgrep -m1 '^#CHROM' concat_annotated.sens_09.vcf.gz | cut -f10- | sed 's/\t/\n/g' > samples_1074.txt



gzip: stdout: Broken pipe


In [27]:
with open('samples_1074.txt', 'r') as file:
    sample_names = file.readlines()

sample_names = [name.strip() for name in sample_names]

In [28]:
len(sample_names)

1074

## List long-read samples without HPRC samples

In [29]:
common_samples_1027 = [element for element in lr_sv_samples if not (element.startswith('HG') or element.startswith('NA'))]
len(common_samples_1027)

1027

## List long-read samples with GATK-SV calls available

In [30]:
common_samples_990 = list(set(sr_sv_samples) & set(lr_sv_samples))
len(common_samples_990)

990

In [31]:
with open('samples_990.txt', 'w') as file:
    for sample_name in common_samples_990:
        file.write(f'{sample_name}\n')
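
Because a `set` intersection does not preserve order, the line order of `samples_990.txt` can differ between runs, which may matter if downstream steps depend on sample order. The sketch below is an order-preserving alternative that keeps samples in the order of the first list. The helper `ordered_intersection` and the toy lists are hypothetical.

```python
def ordered_intersection(first, second):
    """Return items of `first` that also occur in `second`, in `first`'s order."""
    lookup = set(second)
    seen = set()
    shared = []
    for item in first:
        # Keep each shared item once, at its first position in `first`.
        if item in lookup and item not in seen:
            shared.append(item)
            seen.add(item)
    return shared


# Toy lists for illustration only.
lr = ["HG002", "1706456", "NA12878", "2342167", "1584178"]
sr = ["1584178", "1706456", "9999999"]
shared = ordered_intersection(lr, sr)
print(shared)
```

Applied to `lr_sv_samples` and `sr_sv_samples`, this would return the same members as the set intersection, in the column order of the long-read VCF.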

In [42]:
!wget -O bcftools-1.21.tar.bz2 https://github.com/samtools/bcftools/releases/download/1.21/bcftools-1.21.tar.bz2

--2025-03-27 22:18:33--  https://github.com/samtools/bcftools/releases/download/1.21/bcftools-1.21.tar.bz2
Resolving github.com (github.com)... 140.82.114.3
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
...

In [43]:
!bunzip2 -c bcftools-1.21.tar.bz2 | tar xvf -

bcftools-1.21/
bcftools-1.21/filter.c
bcftools-1.21/smpl_ilist.h
...

In [44]:
!cd bcftools-1.21 && make

cd htslib-1.21 && make htslib.pc.tmp
make[1]: Entering directory '/home/jupyter/AoU_DRC_WGS_LongReads_PacBio/edit/bcftools-1.21/htslib-1.21'
echo '# Default htscodecs.mk generated by Makefile' > htscodecs.mk
echo 'include $(HTSPREFIX)htscodecs_bundled.mk' >> htscodecs.mk
./hts_probe_cc.sh 'gcc' '-g -Wall -O2 -fvisibility=hidden ' '-fvisibility=hidden' >> htscodecs.mk
sed -e '/^static_libs=/s/@static_LIBS@/-lz -lm -lbz2 -llzma -lcurl/;s#@[^-][^@]*@##g' htslib.pc.in > htslib.pc.tmp
make[1]: Leaving directory '/home/jupyter/AoU_DRC_WGS_LongReads_PacBio/edit/bcftools-1.21/htslib-1.21'
cd htslib-1.21 && make htslib_static.mk
make[1]: Entering directory '/home/jupyter/AoU_DRC_WGS_LongReads_PacBio/edit/bcftools-1.21/htslib-1.21'
sed -n '/^static_libs=/s/[^=]*=/HTSLIB_static_LIBS = /p;/^static_ldflags=/s/[^=]*=/HTSLIB_static_LDFLAGS = /p' htslib.pc.tmp > htslib_static.mk
make[1]: Leaving directory '/home/jupyter/AoU_DRC_WGS_LongReads_PacBio/edit/bcftools-1.21/htslib-1.21'
...

In [45]:
!./bcftools-1.21/bcftools


Program: bcftools (Tools for variant calling and manipulating VCFs and BCFs)
Version: 1.21 (using htslib 1.21)

Usage:   bcftools [--version|--version-only] [--help] <command> <argument>

Commands:

 -- Indexing
    index        index VCF/BCF files

 -- VCF/BCF manipulation
    annotate     annotate and edit VCF/BCF files
    concat       concatenate VCF/BCF files from the same set of samples
    convert      convert VCF/BCF files to different formats and back
    head         view VCF/BCF file headers
    isec         intersections of VCF/BCF files
    merge        merge VCF/BCF files files from non-overlapping sample sets
    norm         left-align and normalize indels
    plugin       user-defined plugins
    query        transform VCF/BCF into user-defined formats
    reheader     modify VCF/BCF header, change sample names
    sort         sort VCF/BCF file
    view         VCF/BCF conversion, view, subset and filter VCF/BCF files

 -- VCF/BCF analysis
 

In [46]:
!./bcftools-1.21/bcftools view -S samples_990.txt --force-samples -o concat_annotated.sens_07.subset.vcf.gz -O z concat_annotated.sens_07.vcf.gz


In [47]:
!./bcftools-1.21/bcftools view -S samples_990.txt --force-samples -o concat_annotated.sens_09.subset.vcf.gz -O z concat_annotated.sens_09.vcf.gz
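
With `--force-samples`, `bcftools view` only warns about requested samples that are absent from the header instead of stopping, so it can be worth confirming that each subset contains every requested sample. The sketch below shows such a check on toy lists. The helper `check_subset` is hypothetical. In this notebook, the `present` list could come from `pysam.VariantFile(path).header.samples`.

```python
def check_subset(requested, present):
    """Compare requested sample IDs with those found in a subset VCF header."""
    requested_set, present_set = set(requested), set(present)
    return {
        # Requested but not in the output header (skipped with only a warning).
        "missing": sorted(requested_set - present_set),
        # In the output header but never requested.
        "unexpected": sorted(present_set - requested_set),
    }


# Toy sample IDs for illustration only.
report = check_subset(["S1", "S2", "S3"], ["S1", "S3"])
print(report)
```

An empty `missing` list and an empty `unexpected` list would indicate that the subset matches the requested sample file.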



In [48]:
!ls -lh *.vcf.gz

-rw-rw-r-- 1 jupyter users 3.1G Mar 27 20:21 AoU_srWGS_SV_PhaseI.vcf.gz
-rw-rw-r-- 1 jupyter users 680M Mar 27 20:33 cohort_AoUSVPhaseII.v7.LRsamples.vcf.gz
-rw-rw-r-- 1 jupyter users 414M Mar 27 22:29 concat_annotated.sens_07.subset.vcf.gz
-rw-rw-r-- 1 jupyter users 452M Mar 27 20:35 concat_annotated.sens_07.vcf.gz
-rw-rw-r-- 1 jupyter users 668M Mar 27 22:42 concat_annotated.sens_09.subset.vcf.gz
-rw-rw-r-- 1 jupyter users 733M Mar 27 20:35 concat_annotated.sens_09.vcf.gz


In [98]:
sv_sr_vcf = 'cohort_AoUSVPhaseII.v7.LRsamples.vcf.gz'



In [102]:
if not os.path.exists("sv_sr_counts.json"):
    sv_sr_counts = count_svtype_per_sample(sv_sr_vcf, passonly=True)

    # Write to a JSON file
    with open('sv_sr_counts.json', 'w') as json_file:
        json.dump(sv_sr_counts, json_file, indent=4)

with open('sv_sr_counts.json', 'r') as json_file:
    sv_sr_counts = json.load(json_file)

[E::idx_find_and_load] Could not retrieve index file for 'cohort_AoUSVPhaseII.v7.LRsamples.vcf.gz'

Processing VCF records:   0%|          | 0/355612 [00:00<?, ?it/s]



In [103]:
sv_sr_counts



{'1000151': {'INS': 4497, 'DUP': 2901, 'DEL': 3889, 'CPX': 53, 'INV': 12},
 '1000513': {'INS': 4772, 'DUP': 2839, 'DEL': 3802, 'CPX': 47, 'INV': 10},
 '1001980': {'INS': 3896, 'DUP': 2465, 'DEL': 3559, 'CPX': 38, 'INV': 9},
 '1008775': {'INS': 4175, 'DUP': 2520, 'DEL': 3746, 'CPX': 58, 'INV': 8},
 '1013536': {'INS': 4633, 'DEL': 3672, 'DUP': 2847, 'CPX': 56, 'INV': 9},
 '1014823': {'INS': 4142, 'DEL': 3718, 'DUP': 2347, 'CPX': 42, 'INV': 12},
 '1016985': {'INS': 4410, 'DUP': 2522, 'DEL': 3863, 'CPX': 49, 'INV': 14},
 '1024761': {'INS': 4468, 'DUP': 2920, 'DEL': 3902, 'CPX': 53, 'INV': 8},
 '1026351': {'INS': 4172, 'DEL': 3879, 'DUP': 2601, 'CPX': 49, 'INV': 9},
 '1026529': {'INS': 4558, 'DUP': 3199, 'DEL': 3896, 'CPX': 58, 'INV': 9},
 '1029520': {'INS': 4480, 'DEL': 3884, 'DUP': 2634, 'CPX': 46, 'INV': 10},
 '1036042': {'INS': 4805, 'DEL': 3936, 'DUP': 3057, 'CPX': 69, 'INV': 11},
 '1037774': {'INS': 4161, 'DEL': 3717, 'DUP': 2652, 'CPX': 57, 'INV': 12},
 '1046956': {'INS': 4008, 'DUP'
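The nested `{sample: {svtype: count}}` dictionary above is easier to scan once it is collapsed to one number per SV type. The following is a minimal sketch, not part of the original notebook: `mean_counts_by_svtype` is a hypothetical helper name, and the three samples are copied from the output above purely for illustration.

```python
def mean_counts_by_svtype(counts):
    """Average the per-sample counts for every SV type seen in `counts`.

    `counts` maps sample ID -> {svtype: count}. A sample that lacks a
    given SV type contributes 0 for that type.
    """
    svtypes = sorted({svtype for per_sample in counts.values() for svtype in per_sample})
    n_samples = len(counts)
    return {
        svtype: sum(per_sample.get(svtype, 0) for per_sample in counts.values()) / n_samples
        for svtype in svtypes
    }


# First three samples from the sv_sr_counts output above (illustration only).
example_sr_counts = {
    "1000151": {"INS": 4497, "DUP": 2901, "DEL": 3889, "CPX": 53, "INV": 12},
    "1000513": {"INS": 4772, "DUP": 2839, "DEL": 3802, "CPX": 47, "INV": 10},
    "1001980": {"INS": 3896, "DUP": 2465, "DEL": 3559, "CPX": 38, "INV": 9},
}

example_means = mean_counts_by_svtype(example_sr_counts)
for svtype, value in example_means.items():
    print(f"{svtype}\t{value:.1f}")
```

In the notebook itself, the same function could presumably be called on the full `sv_sr_counts` dictionary loaded from `sv_sr_counts.json`.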



In [104]:
!gsutil cp sv_sr_counts.json $WORKSPACE_BUCKET/scratch/kvg/



Copying file://sv_sr_counts.json [Content-Type=application/json]...
/ [1 files][119.6 KiB/119.6 KiB]                                                -
Operation completed over 1 objects/119.6 KiB.                                    



## Download Sniffles2 calls for HPRC samples

In [2]:
!gcloud config configurations activate second-user
!gcloud auth login --brief

ERROR: (gcloud.config.configurations.activate) Cannot activate configuration [second-user], it does not exist.

You are running on a Google Compute Engine virtual machine.
It is recommended that you use service accounts for authentication.

You can run:

  $ gcloud config set account `ACCOUNT`

to switch accounts if necessary.

Your credentials may be visible to others with access to this
virtual machine. Are you sure you want to authenticate with
your personal account?

Do you want to continue (Y/n)?  ^C


Command killed by keyboard interrupt



In [1]:
!gsutil ls -lh gs://fc-a90ab401-9c4b-43d1-b891-f0410c667ff2/submissions/d036c6db-4ab3-49e7-b9a4-c4d07e9b26d8/FilterLength/8eead21b-d941-405e-98c4-f50cc00f83d0/call-FilterImpl/hapestry/HG002_filtered.vcf.gz
    

AccessDeniedException: 403 pet-2657124278799dbb18d95@terra-7a376e4e.iam.gserviceaccount.com does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).


## Download Sniffles2 calls for 50 ONT samples

In [51]:
sniffles_vcfs = !gsutil ls gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/*.sniffles.vcf.gz

In [66]:
sv_ont_counts = {}

for sniffles_vcf in sniffles_vcfs:
    # Escape the dots and anchor at the end so only the literal suffix is stripped.
    bn = re.sub(r"\.sniffles\.vcf\.gz$", "", os.path.basename(sniffles_vcf))
    
    if bn in common_samples_1027:
        !gsutil cp {sniffles_vcf} .
        sv_ont_sample_counts = count_svtype_per_sample(f'{bn}.sniffles.vcf.gz')
        sv_ont_counts[bn] = sv_ont_sample_counts[bn]

sv_ont_counts

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1000513.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1000513.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29291 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1000920.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1000920.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29402 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1001399.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1001399.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29477 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1001980.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1001980.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/28519 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1002322.sniffles.vcf.gz...
/ [1 files][  2.8 MiB/  2.8 MiB]                                                
Operation completed over 1 objects/2.8 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1002322.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29415 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1002826.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1002826.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29590 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1004266.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1004266.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29864 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1005038.sniffles.vcf.gz...
/ [1 files][  3.0 MiB/  3.0 MiB]                                                
Operation completed over 1 objects/3.0 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1005038.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29571 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1005444.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1005444.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29295 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1005938.sniffles.vcf.gz...
/ [1 files][  2.8 MiB/  2.8 MiB]                                                
Operation completed over 1 objects/2.8 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1005938.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/28197 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1007198.sniffles.vcf.gz...
/ [1 files][  3.0 MiB/  3.0 MiB]                                                
Operation completed over 1 objects/3.0 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1007198.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29604 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1008775.sniffles.vcf.gz...
/ [1 files][  2.8 MiB/  2.8 MiB]                                                
Operation completed over 1 objects/2.8 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1008775.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29123 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1010384.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1010384.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29497 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1012440.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1012440.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29055 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1012736.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1012736.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/28911 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1013536.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1013536.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29367 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1014457.sniffles.vcf.gz...
/ [1 files][  3.0 MiB/  3.0 MiB]                                                
Operation completed over 1 objects/3.0 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1014457.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/30098 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1014625.sniffles.vcf.gz...
/ [1 files][  2.8 MiB/  2.8 MiB]                                                
Operation completed over 1 objects/2.8 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1014625.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29384 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1014694.sniffles.vcf.gz...
/ [1 files][  3.0 MiB/  3.0 MiB]                                                
Operation completed over 1 objects/3.0 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1014694.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29727 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1014764.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1014764.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29321 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1014823.sniffles.vcf.gz...
/ [1 files][  2.8 MiB/  2.8 MiB]                                                
Operation completed over 1 objects/2.8 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1014823.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/28295 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1015059.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1015059.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29686 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1015507.sniffles.vcf.gz...
/ [1 files][  3.0 MiB/  3.0 MiB]                                                
Operation completed over 1 objects/3.0 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1015507.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29583 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1016971.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1016971.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29090 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1016985.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1016985.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29580 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1019345.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1019345.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29447 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1024761.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1024761.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29481 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1025136.sniffles.vcf.gz...
/ [1 files][  2.8 MiB/  2.8 MiB]                                                
Operation completed over 1 objects/2.8 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1025136.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/28915 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1025342.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1025342.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29285 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1025566.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1025566.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29747 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1025694.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1025694.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29802 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1026351.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1026351.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29443 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1026529.sniffles.vcf.gz...
/ [1 files][  3.0 MiB/  3.0 MiB]                                                
Operation completed over 1 objects/3.0 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1026529.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29888 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1026622.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1026622.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29472 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1027488.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1027488.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29971 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1027673.sniffles.vcf.gz...
/ [1 files][  3.0 MiB/  3.0 MiB]                                                
Operation completed over 1 objects/3.0 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1027673.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29880 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1029520.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1029520.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29747 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1029873.sniffles.vcf.gz...
/ [1 files][  2.8 MiB/  2.8 MiB]                                                
Operation completed over 1 objects/2.8 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1029873.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/28057 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1032052.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1032052.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29762 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1032684.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1032684.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29673 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1036042.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1036042.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29409 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1037292.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1037292.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29312 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1037774.sniffles.vcf.gz...
/ [1 files][  3.0 MiB/  3.0 MiB]                                                
Operation completed over 1 objects/3.0 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1037774.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29515 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1037790.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1037790.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29745 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1037792.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1037792.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29765 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1037950.sniffles.vcf.gz...
/ [1 files][  2.8 MiB/  2.8 MiB]                                                
Operation completed over 1 objects/2.8 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1037950.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29610 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1041753.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1041753.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29406 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1042609.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1042609.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29233 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1044452.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1044452.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29254 [00:00<?, ?it/s]

Copying gs://fc-secure-8f7d6a20-04ce-40d7-8c88-aececeac3e09/ONT/terra-eab2de06/outputs/GRCh38/variants/sv/1048940.sniffles.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      


[E::idx_find_and_load] Could not retrieve index file for '1048940.sniffles.vcf.gz'


Processing VCF records:   0%|          | 0/29500 [00:00<?, ?it/s]

{'1000513': defaultdict(int,
             {'INS': 15424, 'DEL': 11817, 'BND': 76, 'DUP': 9, 'INV': 32}),
 '1000920': defaultdict(int,
             {'DEL': 11829, 'INS': 15554, 'BND': 74, 'DUP': 11, 'INV': 31}),
 '1001399': defaultdict(int,
             {'INS': 15434, 'DEL': 11967, 'BND': 87, 'INV': 37, 'DUP': 8}),
 '1001980': defaultdict(int,
             {'INS': 14968, 'DEL': 11417, 'BND': 100, 'INV': 29, 'DUP': 8}),
 '1002322': defaultdict(int,
             {'DEL': 11909, 'INS': 15532, 'BND': 94, 'INV': 26, 'DUP': 6}),
 '1002826': defaultdict(int,
             {'INS': 15547, 'DEL': 12039, 'BND': 93, 'INV': 42, 'DUP': 10}),
 '1004266': defaultdict(int,
             {'INS': 15677, 'DEL': 12133, 'BND': 86, 'INV': 27, 'DUP': 11}),
 '1005038': defaultdict(int,
             {'INS': 15641, 'DEL': 11807, 'BND': 71, 'INV': 27, 'DUP': 10}),
 '1005444': defaultdict(int,
             {'DEL': 11818, 'INS': 15348, 'BND': 87, 'INV': 45, 'DUP': 12}),
 '1005938': defaultdict(int,
             {'INS':

In [68]:
if not os.path.exists("sv_ont_counts.json"):
    # Write to a JSON file
    with open('sv_ont_counts.json', 'w') as json_file:
        json.dump(sv_ont_counts, json_file, indent=4)

with open('sv_ont_counts.json', 'r') as json_file:
    sv_ont_counts = json.load(json_file)

In [72]:
!gsutil cp sv_ont_counts.json $WORKSPACE_BUCKET/scratch/kvg/

Copying file://sv_ont_counts.json [Content-Type=application/json]...
/ [1 files][  6.0 KiB/  6.0 KiB]                                                
Operation completed over 1 objects/6.0 KiB.                                      


In [69]:
sv_ont_counts

{'1000513': {'INS': 15424, 'DEL': 11817, 'BND': 76, 'DUP': 9, 'INV': 32},
 '1000920': {'DEL': 11829, 'INS': 15554, 'BND': 74, 'DUP': 11, 'INV': 31},
 '1001399': {'INS': 15434, 'DEL': 11967, 'BND': 87, 'INV': 37, 'DUP': 8},
 '1001980': {'INS': 14968, 'DEL': 11417, 'BND': 100, 'INV': 29, 'DUP': 8},
 '1002322': {'DEL': 11909, 'INS': 15532, 'BND': 94, 'INV': 26, 'DUP': 6},
 '1002826': {'INS': 15547, 'DEL': 12039, 'BND': 93, 'INV': 42, 'DUP': 10},
 '1004266': {'INS': 15677, 'DEL': 12133, 'BND': 86, 'INV': 27, 'DUP': 11},
 '1005038': {'INS': 15641, 'DEL': 11807, 'BND': 71, 'INV': 27, 'DUP': 10},
 '1005444': {'DEL': 11818, 'INS': 15348, 'BND': 87, 'INV': 45, 'DUP': 12},
 '1005938': {'INS': 14787, 'DEL': 11311, 'BND': 71, 'DUP': 14, 'INV': 34},
 '1007198': {'INS': 15561, 'DEL': 11933, 'BND': 94, 'INV': 34, 'DUP': 17},
 '1008775': {'INS': 15390, 'DEL': 11745, 'INV': 31, 'DUP': 12, 'BND': 86},
 '1010384': {'INS': 15450, 'DEL': 11971, 'BND': 90, 'INV': 36, 'DUP': 13},
 '1012440': {'INS': 15412, '
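With both dictionaries in hand, the short-read and ONT call sets can be lined up per sample. The sketch below is an assumption about how one might do that, not something the notebook does: `compare_counts` is a hypothetical helper, and the values are copied from the two outputs above for three samples that appear in both. Note that the short-read counts were produced with `passonly=True` while the ONT counts were not, so the comparison is only indicative.

```python
def compare_counts(sr_counts, ont_counts, svtypes=("INS", "DEL")):
    """Pair short-read and ONT counts for samples present in both call sets.

    Returns {sample: {svtype: (short_read_count, ont_count)}}.
    Missing SV types are reported as 0.
    """
    shared = sorted(set(sr_counts) & set(ont_counts))
    return {
        sample: {
            svtype: (sr_counts[sample].get(svtype, 0), ont_counts[sample].get(svtype, 0))
            for svtype in svtypes
        }
        for sample in shared
    }


# Values copied from the sv_sr_counts and sv_ont_counts outputs (illustration only).
example_sr = {
    "1000151": {"INS": 4497, "DUP": 2901, "DEL": 3889, "CPX": 53, "INV": 12},
    "1000513": {"INS": 4772, "DUP": 2839, "DEL": 3802, "CPX": 47, "INV": 10},
    "1001980": {"INS": 3896, "DUP": 2465, "DEL": 3559, "CPX": 38, "INV": 9},
    "1008775": {"INS": 4175, "DUP": 2520, "DEL": 3746, "CPX": 58, "INV": 8},
}
example_ont = {
    "1000513": {"INS": 15424, "DEL": 11817, "BND": 76, "DUP": 9, "INV": 32},
    "1001980": {"INS": 14968, "DEL": 11417, "BND": 100, "INV": 29, "DUP": 8},
    "1008775": {"INS": 15390, "DEL": 11745, "INV": 31, "DUP": 12, "BND": 86},
}

comparison = compare_counts(example_sr, example_ont)
for sample, by_type in comparison.items():
    for svtype, (n_sr, n_ont) in by_type.items():
        print(f"{sample}\t{svtype}\tshort-read={n_sr}\tONT={n_ont}\tONT/short-read={n_ont / n_sr:.2f}")
```

Sample `1000151` is dropped here because it has no entry in the ONT dictionary; taking the intersection of keys avoids a `KeyError` when the two call sets cover different samples.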

## Initialize Hail

In [32]:
#spark_conf_more_ram = dict()
#spark_conf_more_ram["spark.executor.memory"] = "8g"
#spark_conf_more_ram["spark.driver.memory"] = "196g"

#hl.init(idempotent=True, spark_conf=spark_conf_more_ram)

hl.init(idempotent=True)


Reading spark-defaults.conf to determine GCS requester pays configuration. This is deprecated. Please use `hailctl config set gcs_requester_pays/project` and `hailctl config set gcs_requester_pays/buckets`.

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 3.3.0
SparkUI available at http://saturn-f75e1fa5-6fbc-4dc6-ae19-602e6c4dd082-m.us-central1-c.c.terra-7a376e4e.internal:36305
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.130.post1-c69cd67afb8b
LOGGING: writing to /home/jupyter/AoU_DRC_WGS_LongReads_PacBio/edit/hail-20250328-0434-0.2.130.post1-c69cd67afb8b.log


In [33]:
hl.default_reference('GRCh38')

## Load Hail MT subset to samples with long reads

In [34]:
mt_1027 = hl.read_matrix_table(f'{bucket}/scratch/kvg/srs-subset.1027.mt')

In [35]:
mt_qc_1027 = hl.sample_qc(mt_1027)

In [36]:
mt_qc_1027.describe()

----------------------------------------
Global fields:
    'tranche_data': array<struct {
        model: str, 
        truth_sensitivity: float64, 
        min_vqslod: float64, 
        filter_name: str
    }>
    'truth_sensitivity_snp_threshold': float64
    'truth_sensitivity_indel_threshold': float64
    'snp_vqslod_threshold': float64
    'indel_vqslod_threshold': float64
----------------------------------------
Column fields:
    's': str
    'sample_qc': struct {
        gq_stats: struct {
            mean: float64, 
            stdev: float64, 
            min: float64, 
            max: float64
        }, 
        call_rate: float64, 
        n_called: int64, 
        n_not_called: int64, 
        n_filtered: int64, 
        n_hom_ref: int64, 
        n_het: int64, 
        n_hom_var: int64, 
        n_non_ref: int64, 
        n_singleton: int64, 
        n_snp: int64, 
        n_insertion: int64, 
        n_deletion: int64, 
        n_transition: int64, 
        n_transversi

In [37]:
mt_qc_1027.cols().show()

2025-03-28 04:36:08.552 Hail: WARN: cols(): Resulting column table is sorted by 'col_key'.
    To preserve matrix table column order, first unkey columns with 'key_cols_by()'

s,sample_qc.gq_stats.mean,sample_qc.gq_stats.stdev,sample_qc.gq_stats.min,sample_qc.gq_stats.max,sample_qc.call_rate,sample_qc.n_called,sample_qc.n_not_called,sample_qc.n_filtered,sample_qc.n_hom_ref,sample_qc.n_het,sample_qc.n_hom_var,sample_qc.n_non_ref,sample_qc.n_singleton,sample_qc.n_snp,sample_qc.n_insertion,sample_qc.n_deletion,sample_qc.n_transition,sample_qc.n_transversion,sample_qc.n_star,sample_qc.r_ti_tv,sample_qc.r_het_hom_var,sample_qc.r_insertion_deletion
str,float64,float64,float64,float64,float64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,float64,float64,float64
"1000151",38.5,13.8,0.0,99.0,0.994,72716066,0,445914,66762013,4097921,1856132,5954053,30091,6355492,792269,793474,4241709,2113783,0,2.01,2.21,0.998
"1000513",37.9,13.2,0.0,99.0,0.994,72731515,0,430465,66875438,4076768,1779309,5856077,29923,6211279,777158,774227,4147162,2064117,0,2.01,2.29,1.0
"1000920",38.5,13.8,0.0,99.0,0.994,72718025,0,443955,66785718,4111733,1820574,5932307,28180,6304855,791619,789904,4211349,2093506,0,2.01,2.26,1.0
"1001399",38.8,13.6,0.0,99.0,0.994,72714732,0,447248,66797836,3996124,1920772,5916896,27806,6368180,800807,799820,4251516,2116664,0,2.01,2.08,1.0
"1001980",38.3,13.4,0.0,99.0,0.994,72746269,0,415711,67217496,3725614,1803159,5528773,36753,5946433,751562,747736,3970485,1975948,0,2.01,2.07,1.01
"1002322",38.0,13.4,0.0,99.0,0.994,72719932,0,442048,66828626,4026064,1865242,5891306,28609,6305735,790262,790175,4213357,2092378,0,2.01,2.16,1.0
"1002826",38.7,13.8,0.0,99.0,0.994,72712771,0,449209,66752399,4121247,1839125,5960372,32002,6342616,796135,796047,4233637,2108979,0,2.01,2.24,1.0
"1004266",38.4,13.7,0.0,99.0,0.994,72716202,0,445778,66724227,4108162,1883813,5991975,32626,6407634,800916,802674,4277358,2130276,0,2.01,2.18,0.998
"1005038",38.7,13.6,0.0,99.0,0.994,72716615,0,445365,66807892,4123780,1784943,5908723,28700,6257487,786112,783431,4174980,2082507,0,2.0,2.31,1.0
"1005444",38.4,13.5,0.0,99.0,0.994,72729426,0,432554,66994011,3874298,1861117,5735415,49216,6164321,777891,775620,4117269,2047052,0,2.01,2.08,1.0


In [38]:
mt_qc_1027.aggregate_cols(hl.agg.stats(mt_qc_1027.sample_qc.r_ti_tv))



Struct(mean=2.0096927696262306, stdev=0.0026847729639420154, min=2.0012562412889787, max=2.0173091551027644, n=1027, sum=2063.9544744061386)

In [None]:
stats = mt_qc_1027.aggregate_cols(hl.agg.stats(mt_qc_1027.sample_qc.r_ti_tv))
print(f"Mean r_ti_tv: {stats['mean']:.4f}")
print(f"Standard deviation: {stats['stdev']:.4f}")

In [30]:
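# Count rows by variant class. Each row is classified from its reference allele
# (alleles[0]) and its first alternate allele (alleles[1]); any further alternate
# alleles at a multi-allelic site are not considered.
snv_count_1027 = mt_1027.filter_rows(hl.is_snp(mt_1027.alleles[0], mt_1027.alleles[1])).count_rows()
insertion_count_1027 = mt_1027.filter_rows(hl.is_insertion(mt_1027.alleles[0], mt_1027.alleles[1])).count_rows()
deletion_count_1027 = mt_1027.filter_rows(hl.is_deletion(mt_1027.alleles[0], mt_1027.alleles[1])).count_rows()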



In [None]:
mt_qc_1027.cols()

In [29]:
print(f'# SNV: {snv_count_1027}')
print(f'# INS: {insertion_count_1027}')
print(f'# DEL: {deletion_count_1027}')

# SNV: 63821008
# INS: 3194211
# DEL: 6146761


## Load short-read Hail MT subset to the 990 long-read samples with GATK-SV calls

In [39]:
mt_990 = hl.read_matrix_table(f'{bucket}/scratch/kvg/srs-subset.990.mt')

In [40]:
mt_qc_990 = hl.sample_qc(mt_990)

In [41]:
mt_qc_990.describe()

----------------------------------------
Global fields:
    'tranche_data': array<struct {
        model: str, 
        truth_sensitivity: float64, 
        min_vqslod: float64, 
        filter_name: str
    }>
    'truth_sensitivity_snp_threshold': float64
    'truth_sensitivity_indel_threshold': float64
    'snp_vqslod_threshold': float64
    'indel_vqslod_threshold': float64
----------------------------------------
Column fields:
    's': str
    'sample_qc': struct {
        gq_stats: struct {
            mean: float64, 
            stdev: float64, 
            min: float64, 
            max: float64
        }, 
        call_rate: float64, 
        n_called: int64, 
        n_not_called: int64, 
        n_filtered: int64, 
        n_hom_ref: int64, 
        n_het: int64, 
        n_hom_var: int64, 
        n_non_ref: int64, 
        n_singleton: int64, 
        n_snp: int64, 
        n_insertion: int64, 
        n_deletion: int64, 
        n_transition: int64, 
        n_transversi

In [42]:
mt_qc_990.cols().show()


s,sample_qc.gq_stats.mean,sample_qc.gq_stats.stdev,sample_qc.gq_stats.min,sample_qc.gq_stats.max,sample_qc.call_rate,sample_qc.n_called,sample_qc.n_not_called,sample_qc.n_filtered,sample_qc.n_hom_ref,sample_qc.n_het,sample_qc.n_hom_var,sample_qc.n_non_ref,sample_qc.n_singleton,sample_qc.n_snp,sample_qc.n_insertion,sample_qc.n_deletion,sample_qc.n_transition,sample_qc.n_transversion,sample_qc.n_star,sample_qc.r_ti_tv,sample_qc.r_het_hom_var,sample_qc.r_insertion_deletion
str,float64,float64,float64,float64,float64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,float64,float64,float64
"""1000151""",38.5,13.9,0.0,99.0,0.994,71688299,0,442259,65734246,4097921,1856132,5954053,30631,6355492,792269,793474,4241709,2113783,0,2.01,2.21,0.998
"""1000513""",38.0,13.2,0.0,99.0,0.994,71703647,0,426911,65847570,4076768,1779309,5856077,30487,6211279,777158,774227,4147162,2064117,0,2.01,2.29,1.0
"""1000920""",38.5,13.8,0.0,99.0,0.994,71690324,0,440234,65758017,4111733,1820574,5932307,28636,6304855,791619,789904,4211349,2093506,0,2.01,2.26,1.0
"""1001399""",38.8,13.7,0.0,99.0,0.994,71687046,0,443512,65770150,3996124,1920772,5916896,28384,6368180,800807,799820,4251516,2116664,0,2.01,2.08,1.0
"""1001980""",38.3,13.5,0.0,99.0,0.994,71718274,0,412284,66189501,3725614,1803159,5528773,37208,5946433,751562,747736,3970485,1975948,0,2.01,2.07,1.01
"""1002322""",38.0,13.4,0.0,99.0,0.994,71692130,0,438428,65800824,4026064,1865242,5891306,29080,6305735,790262,790175,4213357,2092378,0,2.01,2.16,1.0
"""1002826""",38.7,13.8,0.0,99.0,0.994,71685077,0,445481,65724705,4121247,1839125,5960372,32533,6342616,796135,796047,4233637,2108979,0,2.01,2.24,1.0
"""1004266""",38.4,13.7,0.0,99.0,0.994,71688426,0,442132,65696451,4108162,1883813,5991975,33104,6407634,800916,802674,4277358,2130276,0,2.01,2.18,0.998
"""1005038""",38.7,13.7,0.0,99.0,0.994,71688874,0,441684,65780151,4123780,1784943,5908723,29133,6257487,786112,783431,4174980,2082507,0,2.0,2.31,1.0
"""1005444""",38.4,13.6,0.0,99.0,0.994,71701598,0,428960,65966183,3874298,1861117,5735415,49661,6164321,777891,775620,4117269,2047052,0,2.01,2.08,1.0


In [44]:
snv_count_990 = mt_990.filter_rows(hl.is_snp(mt_990.alleles[0], mt_990.alleles[1])).count_rows()
insertion_count_990 = mt_990.filter_rows(hl.is_insertion(mt_990.alleles[0], mt_990.alleles[1])).count_rows()
deletion_count_990 = mt_990.filter_rows(hl.is_deletion(mt_990.alleles[0], mt_990.alleles[1])).count_rows()



In [45]:
print(f'# SNV: {snv_count_990}')
print(f'# INS: {insertion_count_990}')
print(f'# DEL: {deletion_count_990}')

# SNV: 62895482
# INS: 3161161
# DEL: 6073915


In [50]:
mt_qc_990.aggregate_cols(hl.agg.stats(mt_qc_990.sample_qc.r_ti_tv))



Struct(mean=2.0096745497622273, stdev=0.002684913051356153, min=2.0012562412889787, max=2.0173091551027644, n=990, sum=1989.577804264605)

In [51]:
mt_qc_990.aggregate_cols(hl.agg.stats(mt_qc_990.sample_qc.r_het_hom_var))



Struct(mean=2.183973992295268, stdev=0.10190033852555179, min=1.4475272934819965, max=2.588171391746939, n=990, sum=2162.1342523723156)

## Load Hail MT for the 1027 long-read samples

In [52]:
lr_mt = hl.read_matrix_table('gs://fc-secure-f7d80b48-be60-426f-aa6b-f037a1bf7f34/outputs/T2T/JointCallGVCFs/cohort_for_GLNexus_2023Q1_1027/cohort_for_GLNexus_2023Q1_1027.mt')

In [53]:
lr_qc_mt = hl.sample_qc(lr_mt)

In [54]:
lr_qc_mt.describe()

----------------------------------------
Global fields:
    None
----------------------------------------
Column fields:
    's': str
    'sample_qc': struct {
        dp_stats: struct {
            mean: float64, 
            stdev: float64, 
            min: float64, 
            max: float64
        }, 
        gq_stats: struct {
            mean: float64, 
            stdev: float64, 
            min: float64, 
            max: float64
        }, 
        call_rate: float64, 
        n_called: int64, 
        n_not_called: int64, 
        n_filtered: int64, 
        n_hom_ref: int64, 
        n_het: int64, 
        n_hom_var: int64, 
        n_non_ref: int64, 
        n_singleton: int64, 
        n_snp: int64, 
        n_insertion: int64, 
        n_deletion: int64, 
        n_transition: int64, 
        n_transversion: int64, 
        n_star: int64, 
        r_ti_tv: float64, 
        r_het_hom_var: float64, 
        r_insertion_deletion: float64
    }
----------------------------

In [55]:
lr_qc_mt.cols().show()



s,sample_qc.dp_stats.mean,sample_qc.dp_stats.stdev,sample_qc.dp_stats.min,sample_qc.dp_stats.max,sample_qc.gq_stats.mean,sample_qc.gq_stats.stdev,sample_qc.gq_stats.min,sample_qc.gq_stats.max,sample_qc.call_rate,sample_qc.n_called,sample_qc.n_not_called,sample_qc.n_filtered,sample_qc.n_hom_ref,sample_qc.n_het,sample_qc.n_hom_var,sample_qc.n_non_ref,sample_qc.n_singleton,sample_qc.n_snp,sample_qc.n_insertion,sample_qc.n_deletion,sample_qc.n_transition,sample_qc.n_transversion,sample_qc.n_star,sample_qc.r_ti_tv,sample_qc.r_het_hom_var,sample_qc.r_insertion_deletion
str,float64,float64,float64,float64,float64,float64,float64,float64,float64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,float64,float64,float64
"""1000151""",0.674,2.45,0.0,177.0,4.07,11.9,0.0,99.0,0.952,71016388,3566152,0,65449207,3646603,1920578,5567181,25222,6314769,569662,604436,4142451,2172318,0,1.91,1.9,0.942
"""1000513""",0.585,2.21,0.0,129.0,3.77,11.2,0.0,99.0,0.95,70835528,3747012,0,65572011,3454265,1809252,5263517,23816,5976990,532695,555708,3927140,2049850,0,1.92,1.91,0.959
"""1000920""",0.689,2.48,0.0,80.0,4.1,11.9,0.0,99.0,0.951,70902787,3679753,0,65287170,3725514,1890103,5615617,25312,6344156,585688,575779,4157775,2186381,0,1.9,1.97,1.02
"""1001399""",0.576,2.2,0.0,133.0,3.72,11.0,0.0,99.0,0.951,70953943,3628597,0,65646738,3362528,1944677,5307205,23810,6127464,569358,545175,4021541,2105923,0,1.91,1.73,1.04
"""1001980""",0.613,2.35,0.0,99.0,3.7,11.1,0.0,99.0,0.95,70820558,3761982,0,65754296,3367319,1698943,5066262,39749,5728087,516796,510707,3734518,1993569,0,1.87,1.98,1.01
"""1002322""",0.745,2.67,0.0,88.0,4.23,12.3,0.0,99.0,0.954,71150954,3431586,0,65497021,3740993,1912940,5653933,25811,6370796,587514,614908,4187665,2183131,0,1.92,1.96,0.955
"""1002826""",0.7,2.52,0.0,92.0,4.17,12.1,0.0,99.0,0.955,71227604,3354936,0,65595891,3730705,1901008,5631713,28471,6337105,596832,601786,4159753,2177352,0,1.91,1.96,0.992
"""1004266""",0.638,2.34,0.0,134.0,4.0,11.7,0.0,99.0,0.952,70984009,3598531,0,65468440,3571908,1943661,5515569,28443,6286093,573038,597599,4131507,2154586,0,1.92,1.84,0.959
"""1005038""",0.644,2.37,0.0,160.0,3.95,11.6,0.0,99.0,0.949,70784532,3798008,0,65351380,3619817,1813335,5433152,24817,6141901,558631,542880,4028031,2113870,0,1.91,2.0,1.03
"""1005444""",0.665,2.48,0.0,188.0,3.91,11.6,0.0,99.0,0.951,70922020,3660520,0,65530472,3527068,1864480,5391548,44251,6126240,549421,577428,4017345,2108895,0,1.9,1.89,0.951


In [56]:
snv_count_lr = lr_mt.filter_rows(hl.is_snp(lr_mt.alleles[0], lr_mt.alleles[1])).count_rows()
insertion_count_lr = lr_mt.filter_rows(hl.is_insertion(lr_mt.alleles[0], lr_mt.alleles[1])).count_rows()
deletion_count_lr = lr_mt.filter_rows(hl.is_deletion(lr_mt.alleles[0], lr_mt.alleles[1])).count_rows()



In [57]:
print(f'# SNV: {snv_count_lr}')
print(f'# DEL: {deletion_count_lr}')
print(f'# INS: {insertion_count_lr}')

# SNV: 66467539
# DEL: 4714659
# INS: 3212616


In [67]:
lr_qc_mt.aggregate_cols(hl.agg.stats(lr_qc_mt.sample_qc.r_ti_tv))



Struct(mean=1.90710303237283, stdev=0.011946641255821865, min=1.8365416835172454, max=1.938882514963198, n=1027, sum=1958.5948142468965)

In [68]:
lr_qc_mt.aggregate_cols(hl.agg.stats(lr_qc_mt.sample_qc.r_het_hom_var))



Struct(mean=1.9231303458717768, stdev=0.13736295020533787, min=1.2990189471593447, max=2.520075331691657, n=1027, sum=1975.0548652103148)

## Subset long-read Hail MT to samples with GATK-SV calls

In [58]:
lr_subset_path = f'{bucket}/scratch/kvg/lrs-subset.990.mt'

# Build the subset only if it has not been written yet.
if not mt_exists(lr_subset_path):
    # Keep only the samples that also have GATK-SV calls.
    lr_samples_990 = hl.literal(set(common_samples_990))
    lr_subset_mt = lr_mt.filter_cols(lr_samples_990.contains(lr_mt.s))
    # Drop sites with no non-reference genotype among the remaining samples.
    lr_subset_mt = lr_subset_mt.filter_rows(hl.agg.any(lr_subset_mt.GT.is_non_ref()))

    lr_subset_mt.write(lr_subset_path, overwrite=True)

# Always work from the stored copy.
lr_subset_mt = hl.read_matrix_table(lr_subset_path)

2025-03-28 09:36:51.334 Hail: INFO: wrote matrix table with 72847914 rows and 990 columns in 490 partitions to gs://fc-secure-f7d80b48-be60-426f-aa6b-f037a1bf7f34/scratch/kvg/lrs-subset.990.mt
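
The cell above follows a write-once, read-back pattern: build the subset only when it is missing from storage, then always read the stored copy. Below is a minimal, storage-agnostic sketch of that pattern. The helper name `load_or_build`, its parameters, and the in-memory `fake_bucket` are hypothetical and exist only for illustration; they are not part of Hail or of this notebook.

```python
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def load_or_build(
    path: str,
    exists: Callable[[str], bool],
    build: Callable[[], T],
    write: Callable[[T, str], None],
    read: Callable[[str], T],
    force: bool = False,
) -> T:
    """Build and write the object only if it is missing (or force=True), then read it back."""
    if force or not exists(path):
        write(build(), path)
    return read(path)


# In-memory stand-in for a bucket, used only to exercise the helper.
fake_bucket: Dict[str, List[str]] = {}
build_calls: List[int] = []


def build_subset() -> List[str]:
    build_calls.append(1)  # record that the expensive step ran
    return ["1000151", "1000513"]


def load(force: bool = False) -> List[str]:
    return load_or_build(
        "scratch/subset.mt",
        exists=lambda p: p in fake_bucket,
        build=build_subset,
        write=lambda obj, p: fake_bucket.__setitem__(p, obj),
        read=lambda p: fake_bucket[p],
        force=force,
    )


first = load()   # nothing stored yet, so the subset is built and written
second = load()  # the stored copy is reused; build_subset is not called again
print(f"build_subset ran {len(build_calls)} time(s)")
```

In this notebook, the same roles could plausibly be filled by `mt_exists` for `exists`, `lambda mt, p: mt.write(p, overwrite=True)` for `write`, and `hl.read_matrix_table` for `read`. An explicit `force` flag makes a deliberate rebuild visible at the call site.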


In [59]:
lr_subset_mt.count_cols()

990

In [60]:
lr_subset_qc_mt = hl.sample_qc(lr_subset_mt)

In [61]:
lr_subset_qc_mt.describe()

----------------------------------------
Global fields:
    None
----------------------------------------
Column fields:
    's': str
    'sample_qc': struct {
        dp_stats: struct {
            mean: float64, 
            stdev: float64, 
            min: float64, 
            max: float64
        }, 
        gq_stats: struct {
            mean: float64, 
            stdev: float64, 
            min: float64, 
            max: float64
        }, 
        call_rate: float64, 
        n_called: int64, 
        n_not_called: int64, 
        n_filtered: int64, 
        n_hom_ref: int64, 
        n_het: int64, 
        n_hom_var: int64, 
        n_non_ref: int64, 
        n_singleton: int64, 
        n_snp: int64, 
        n_insertion: int64, 
        n_deletion: int64, 
        n_transition: int64, 
        n_transversion: int64, 
        n_star: int64, 
        r_ti_tv: float64, 
        r_het_hom_var: float64, 
        r_insertion_deletion: float64
    }
----------------------------

In [62]:
lr_subset_qc_mt.cols().show()


s,sample_qc.dp_stats.mean,sample_qc.dp_stats.stdev,sample_qc.dp_stats.min,sample_qc.dp_stats.max,sample_qc.gq_stats.mean,sample_qc.gq_stats.stdev,sample_qc.gq_stats.min,sample_qc.gq_stats.max,sample_qc.call_rate,sample_qc.n_called,sample_qc.n_not_called,sample_qc.n_filtered,sample_qc.n_hom_ref,sample_qc.n_het,sample_qc.n_hom_var,sample_qc.n_non_ref,sample_qc.n_singleton,sample_qc.n_snp,sample_qc.n_insertion,sample_qc.n_deletion,sample_qc.n_transition,sample_qc.n_transversion,sample_qc.n_star,sample_qc.r_ti_tv,sample_qc.r_het_hom_var,sample_qc.r_insertion_deletion
str,float64,float64,float64,float64,float64,float64,float64,float64,float64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,float64,float64,float64
"""1000151""",0.689,2.47,0.0,177.0,4.14,12.0,0.0,99.0,0.962,70049385,2798529,0,64482204,3646603,1920578,5567181,25844,6314769,569662,604436,4142451,2172318,0,1.91,1.9,0.942
"""1000513""",0.598,2.23,0.0,129.0,3.84,11.3,0.0,99.0,0.959,69868927,2978987,0,64605410,3454265,1809252,5263517,24314,5976990,532695,555708,3927140,2049850,0,1.92,1.91,0.959
"""1000920""",0.704,2.51,0.0,80.0,4.17,12.1,0.0,99.0,0.96,69936848,2911066,0,64321231,3725514,1890103,5615617,25757,6344156,585688,575779,4157775,2186381,0,1.9,1.97,1.02
"""1001399""",0.589,2.22,0.0,133.0,3.79,11.2,0.0,99.0,0.961,69985557,2862357,0,64678352,3362528,1944677,5307205,24432,6127464,569358,545175,4021541,2105923,0,1.91,1.73,1.04
"""1001980""",0.627,2.37,0.0,99.0,3.76,11.2,0.0,99.0,0.959,69855211,2992703,0,64788949,3367319,1698943,5066262,40407,5728087,516796,510707,3734518,1993569,0,1.87,1.98,1.01
"""1002322""",0.762,2.7,0.0,88.0,4.3,12.4,0.0,99.0,0.963,70183972,2663942,0,64530039,3740993,1912940,5653933,26273,6370796,587514,614908,4187665,2183131,0,1.92,1.96,0.955
"""1002826""",0.715,2.55,0.0,92.0,4.24,12.2,0.0,99.0,0.964,70258577,2589337,0,64626864,3730705,1901008,5631713,28982,6337105,596832,601786,4159753,2177352,0,1.91,1.96,0.992
"""1004266""",0.652,2.36,0.0,134.0,4.07,11.8,0.0,99.0,0.961,70016665,2831249,0,64501096,3571908,1943661,5515569,28863,6286093,573038,597599,4131507,2154586,0,1.92,1.84,0.959
"""1005038""",0.658,2.39,0.0,160.0,4.02,11.7,0.0,99.0,0.958,69819093,3028821,0,64385941,3619817,1813335,5433152,25264,6141901,558631,542880,4028031,2113870,0,1.91,2.0,1.03
"""1005444""",0.68,2.5,0.0,188.0,3.98,11.7,0.0,99.0,0.96,69956399,2891515,0,64564851,3527068,1864480,5391548,44772,6126240,549421,577428,4017345,2108895,0,1.9,1.89,0.951


In [63]:
snv_count_lr_subset = lr_subset_mt.filter_rows(hl.is_snp(lr_subset_mt.alleles[0], lr_subset_mt.alleles[1])).count_rows()
insertion_count_lr_subset = lr_subset_mt.filter_rows(hl.is_insertion(lr_subset_mt.alleles[0], lr_subset_mt.alleles[1])).count_rows()
deletion_count_lr_subset = lr_subset_mt.filter_rows(hl.is_deletion(lr_subset_mt.alleles[0], lr_subset_mt.alleles[1])).count_rows()



In [64]:
print(f'# SNV: {snv_count_lr_subset}')
print(f'# DEL: {deletion_count_lr_subset}')
print(f'# INS: {insertion_count_lr_subset}')

# SNV: 65541669
# DEL: 4072836
# INS: 3045864
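
The row counts printed for the four callsets can be put side by side to compare totals and class proportions. The sketch below only re-uses the numbers printed in this notebook; the dictionary labels and the `summarize` helper are hypothetical names introduced for illustration.

```python
# Row counts copied from the printed cell outputs in this notebook.
counts = {
    "short-read, 1027 samples": {"SNV": 63821008, "INS": 3194211, "DEL": 6146761},
    "short-read, 990 samples": {"SNV": 62895482, "INS": 3161161, "DEL": 6073915},
    "long-read, 1027 samples": {"SNV": 66467539, "INS": 3212616, "DEL": 4714659},
    "long-read, 990 samples": {"SNV": 65541669, "INS": 3045864, "DEL": 4072836},
}


def summarize(c):
    """Return the classified-row total, the SNV fraction, and the INS/DEL row ratio."""
    total = c["SNV"] + c["INS"] + c["DEL"]
    return {
        "total": total,
        "snv_fraction": c["SNV"] / total,
        "ins_del_ratio": c["INS"] / c["DEL"],
    }


summary = {name: summarize(c) for name, c in counts.items()}

for name, row in summary.items():
    print(
        f"{name}: total={row['total']:,} "
        f"SNV fraction={row['snv_fraction']:.3f} "
        f"INS/DEL={row['ins_del_ratio']:.3f}"
    )
```

These are row-level ratios based on the first alternate allele of each site, so they need not match the per-sample `r_insertion_deletion` values reported by `hl.sample_qc`.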


In [65]:
lr_subset_qc_mt.aggregate_cols(hl.agg.stats(lr_subset_qc_mt.sample_qc.r_ti_tv))



Struct(mean=1.9071749495102823, stdev=0.011895138479839754, min=1.8365416835172454, max=1.938882514963198, n=990, sum=1888.1032000151795)

In [66]:
lr_subset_qc_mt.aggregate_cols(hl.agg.stats(lr_subset_qc_mt.sample_qc.r_het_hom_var))



Struct(mean=1.923181754736893, stdev=0.13819465912202106, min=1.2990189471593447, max=2.520075331691657, n=990, sum=1903.949937189524)
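
The aggregate statistics for the two 990-sample callsets can be compared directly. The sketch below copies the means and standard deviations from the `Struct` outputs above and reports, for each metric, the long-read minus short-read difference in means and that difference expressed in units of the larger of the two standard deviations. This is a rough descriptive comparison under the assumption that summary statistics suffice; it is not a formal test, and it ignores the fact that the same individuals appear in both callsets. The names `qc_stats` and `compare` are hypothetical.

```python
# (mean, stdev) pairs copied from the hl.agg.stats outputs for the 990-sample tables.
qc_stats = {
    "r_ti_tv": {
        "short_read": (2.0096745497622273, 0.002684913051356153),
        "long_read": (1.9071749495102823, 0.011895138479839754),
    },
    "r_het_hom_var": {
        "short_read": (2.183973992295268, 0.10190033852555179),
        "long_read": (1.923181754736893, 0.13819465912202106),
    },
}


def compare(metric):
    """Return (long-read mean minus short-read mean, |difference| / larger stdev)."""
    sr_mean, sr_sd = qc_stats[metric]["short_read"]
    lr_mean, lr_sd = qc_stats[metric]["long_read"]
    diff = lr_mean - sr_mean
    return diff, abs(diff) / max(sr_sd, lr_sd)


comparison = {metric: compare(metric) for metric in qc_stats}

for metric, (diff, scaled) in comparison.items():
    print(f"{metric}: difference={diff:+.4f} ({scaled:.2f} x larger stdev)")
```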