# Extract HiFi QC Data

**This notebook reads the outputs of the NTSM and ReadStats WDLs (referenced from Terra data tables) and checks them as part of the HiFi QC process.**

**Below are the steps taken in this notebook:**
1. Import Statements & Global Variable Definitions
2. Extract NTSM Data (read the `ntsm` data table and collect each row's NTSM score and result)
3. Extract ReadStats Data (read the `readstats` data table, collect each row's total Gbp and estimate coverage)

# Import Statements & Global Variable Definitions

## Installs

In [1]:
%%capture
%pip install gcsfs
## capture CANNOT have comments above it
## For reading CSVs stored in Google Cloud (without downloading them first)
## May need to restart kernel after install 

In [2]:
%%capture
%pip install --upgrade --no-cache-dir --force-reinstall terra-pandas
%pip install --upgrade --no-cache-dir --force-reinstall git+https://github.com/DataBiosphere/terra-notebook-utils
## For reading/writing data tables into pandas data frames
## May need to restart kernel after install 

## Import Statements

In [3]:
from firecloud import fiss
import pandas as pd 
import numpy as np
import terra_pandas as tp
import os                 
import subprocess       
import re                 
import io
import gcsfs

from typing import Any, Callable, List, Optional
from terra_notebook_utils import table, WORKSPACE_NAME, WORKSPACE_GOOGLE_PROJECT


## Global Variable Declarations

In [4]:
# Get the Google billing project, workspace name and storage bucket for the current workspace
# (the workspace name is the name of the parent directory of the notebook's working directory)
PROJECT = os.environ['WORKSPACE_NAMESPACE']
WORKSPACE = os.path.basename(os.path.dirname(os.getcwd()))
bucket = os.environ['WORKSPACE_BUCKET'] + "/"


# Verify that we've captured the environment variables
print("Billing project: " + PROJECT)
print("Workspace: " + WORKSPACE)
print("Workspace storage bucket: " + bucket)

Billing project: human-pangenome-ucsc
Workspace: HPRC_WRANGLING_WUSTL_HPRC_HiFi_Year3
Workspace storage bucket: gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/


# Extract NTSM Data

## Read in NTSM Data Table

In [5]:
ntsm_df = tp.table_to_dataframe("ntsm", workspace=WORKSPACE, workspace_namespace=PROJECT)

ntsm_df.head()

ntsm_id,ntsv_count_2,read_2_fastq,read_1_fastq,sample,ntsv_count_1,hifi,1000g_cram,ntsm_eval_out
0,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,HG00140,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-56ac46ea-efc4-4683-b6d5-6d95bed41c5e/C...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...
1,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,HG00140,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-56ac46ea-efc4-4683-b6d5-6d95bed41c5e/C...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...
10,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,HG00408,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-56ac46ea-efc4-4683-b6d5-6d95bed41c5e/C...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...
11,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,HG00597,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-56ac46ea-efc4-4683-b6d5-6d95bed41c5e/C...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...
12,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,HG00597,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-56ac46ea-efc4-4683-b6d5-6d95bed41c5e/C...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...


## Read NTSM Output & Write To DataFrame

In [6]:
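ntsm_df['ntsm_score'] = np.nan
ntsm_df['result']     = None

for index, row in ntsm_df.iterrows():

    sample_ntsm_fp = row['ntsm_eval_out']
    sample_ntsm_fn = os.path.basename(sample_ntsm_fp)

    ## Copy the eval output locally (the eval files share one basename, so each copy overwrites the last)
    ! gsutil cp {sample_ntsm_fp} .

    sample_ntsm_df = pd.read_csv(sample_ntsm_fn, header=None, sep='\t')

    ## Column 2 holds the score and column 3 the result (e.g. "Similar") in the first row.
    ## .at writes into ntsm_df itself, avoiding chained assignment (SettingWithCopyWarning)
    ntsm_df.at[index, 'ntsm_score'] = float(sample_ntsm_df.iloc[0, 2])
    ntsm_df.at[index, 'result']     = str(sample_ntsm_df.iloc[0, 3])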


Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/9dd83ae9-c5c3-4906-a2c2-1b52cc13b1a3/ntsm_workflow/a8e6d636-c700-4da9-982c-32d3a7bc1667/call-ntsm_eval/cacheCopy/sample_1000genome_Illumina_vs_hifi.txt...
/ [1 files][  424.0 B/  424.0 B]                                                
Operation completed over 1 objects/424.0 B.                                      



Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/9dd83ae9-c5c3-4906-a2c2-1b52cc13b1a3/ntsm_workflow/58de4918-166d-471e-8424-1c57188fd5c0/call-ntsm_eval/sample_1000genome_Illumina_vs_hifi.txt...
/ [1 files][  444.0 B/  444.0 B]                                                
Operation completed over 1 objects/444.0 B.                                      
Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/9dd83ae9-c5c3-4906-a2c2-1b52cc13b1a3/ntsm_workflow/70a8d07b-1e85-41fe-835b-cc136ca947c0/call-ntsm_eval/sample_1000genome_Illumina_vs_hifi.txt...
/ [1 files][  434.0 B/  434.0 B]                                                
Operation completed over 1 objects/434.0 B.                                      
Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/9dd83ae9-c5c3-4906-a2c2-1b52cc13b1a3/ntsm_workflow/055776fb-d7ed-4d35-b553-fc9d519ad3ed/call-ntsm_eval/sample_1000genome_Illumina_vs_hifi.txt...
/ [1 files][  434.0 B/  434.0 B]            

Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/9dd83ae9-c5c3-4906-a2c2-1b52cc13b1a3/ntsm_workflow/5e8c364e-9b99-4612-be0b-ccad878526ed/call-ntsm_eval/sample_1000genome_Illumina_vs_hifi.txt...
/ [1 files][  434.0 B/  434.0 B]                                                
Operation completed over 1 objects/434.0 B.                                      
Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/9dd83ae9-c5c3-4906-a2c2-1b52cc13b1a3/ntsm_workflow/9d596662-d42b-4891-93a9-71ce7623a42c/call-ntsm_eval/sample_1000genome_Illumina_vs_hifi.txt...
/ [1 files][  424.0 B/  424.0 B]                                                
Operation completed over 1 objects/424.0 B.                                      
Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/9dd83ae9-c5c3-4906-a2c2-1b52cc13b1a3/ntsm_workflow/d32de91e-29fd-495d-97a3-9a72afda9efb/call-ntsm_eval/sample_1000genome_Illumina_vs_hifi.txt...
/ [1 files][  424.0 B/  424.0 B]            

Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/9dd83ae9-c5c3-4906-a2c2-1b52cc13b1a3/ntsm_workflow/5cd39fbd-5dfa-4bdd-bd9d-2f7b1661e8cb/call-ntsm_eval/sample_1000genome_Illumina_vs_hifi.txt...
/ [1 files][  424.0 B/  424.0 B]                                                
Operation completed over 1 objects/424.0 B.                                      
Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/9dd83ae9-c5c3-4906-a2c2-1b52cc13b1a3/ntsm_workflow/f3a49e7a-0609-4650-8d64-9517cc8b3100/call-ntsm_eval/sample_1000genome_Illumina_vs_hifi.txt...
/ [1 files][  424.0 B/  424.0 B]                                                
Operation completed over 1 objects/424.0 B.                                      
Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/9dd83ae9-c5c3-4906-a2c2-1b52cc13b1a3/ntsm_workflow/73d2cc18-dfcd-44c5-9238-170b1b21f126/call-ntsm_eval/sample_1000genome_Illumina_vs_hifi.txt...
/ [1 files][  424.0 B/  424.0 B]            

Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/9dd83ae9-c5c3-4906-a2c2-1b52cc13b1a3/ntsm_workflow/417237d7-3fa0-4e21-84b9-1a8a37f50daf/call-ntsm_eval/sample_1000genome_Illumina_vs_hifi.txt...
/ [1 files][  424.0 B/  424.0 B]                                                
Operation completed over 1 objects/424.0 B.                                      
Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/9dd83ae9-c5c3-4906-a2c2-1b52cc13b1a3/ntsm_workflow/30308978-2750-4399-9c83-69f6204691fe/call-ntsm_eval/sample_1000genome_Illumina_vs_hifi.txt...
/ [1 files][  424.0 B/  424.0 B]                                                
Operation completed over 1 objects/424.0 B.                                      
Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/9dd83ae9-c5c3-4906-a2c2-1b52cc13b1a3/ntsm_workflow/554b6911-a283-4b20-a0d2-e429ef8b6136/call-ntsm_eval/sample_1000genome_Illumina_vs_hifi.txt...
/ [1 files][  424.0 B/  424.0 B]            
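
The notebook installs gcsfs so that files in Google Cloud can be read without downloading them first, yet the loop above still copies each file with `gsutil cp`. A minimal sketch of the direct-read alternative, assuming gcsfs is installed (pandas passes `gs://` URLs to fsspec/gcsfs) and that every ntsm eval file has the layout the loop relies on: tab-separated, no header, score in column 2 and result in column 3 of the first row. The helper names `read_ntsm_eval` and `add_ntsm_columns` are hypothetical.

```python
from typing import Tuple

import pandas as pd


def read_ntsm_eval(path) -> Tuple[float, str]:
    """Return (score, result) from the first row of one ntsm eval output.

    Assumption: tab-separated, no header, score in column 2 and the result
    call (e.g. "Similar") in column 3. `path` may be a local file or, with
    gcsfs installed, a gs:// URL that pandas reads directly.
    """
    eval_df = pd.read_csv(path, header=None, sep='\t')
    return float(eval_df.iloc[0, 2]), str(eval_df.iloc[0, 3])


def add_ntsm_columns(df: pd.DataFrame, path_col: str = 'ntsm_eval_out') -> pd.DataFrame:
    """Return a copy of df with ntsm_score and result filled in from each eval file."""
    parsed = [read_ntsm_eval(path) for path in df[path_col]]
    out = df.copy()
    out['ntsm_score'] = [score for score, _ in parsed]
    out['result'] = [result for _, result in parsed]
    return out
```

Used as `ntsm_df = add_ntsm_columns(ntsm_df)`, this replaces both the copy step and the row-by-row writes, and returning a copy leaves the table read from Terra untouched.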

In [7]:
ntsm_df

ntsm_id,ntsv_count_2,read_2_fastq,read_1_fastq,sample,ntsv_count_1,hifi,1000g_cram,ntsm_eval_out,ntsm_score,result
0,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,HG00140,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-56ac46ea-efc4-4683-b6d5-6d95bed41c5e/C...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,0.663237,Similar
1,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,HG00140,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-56ac46ea-efc4-4683-b6d5-6d95bed41c5e/C...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,0.597848,Similar
10,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,HG00408,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-56ac46ea-efc4-4683-b6d5-6d95bed41c5e/C...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,0.555651,Similar
11,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,HG00597,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-56ac46ea-efc4-4683-b6d5-6d95bed41c5e/C...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,0.583395,Similar
12,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,HG00597,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-56ac46ea-efc4-4683-b6d5-6d95bed41c5e/C...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,0.577262,Similar
...,...,...,...,...,...,...,...,...,...,...
76,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,NA20805,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-56ac46ea-efc4-4683-b6d5-6d95bed41c5e/C...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,0.561480,Similar
77,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,NA20805,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-56ac46ea-efc4-4683-b6d5-6d95bed41c5e/C...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,0.590216,Similar
78,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,NA20805,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-56ac46ea-efc4-4683-b6d5-6d95bed41c5e/C...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,0.559741,Similar
8,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/1...,HG00408,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-56ac46ea-efc4-4683-b6d5-6d95bed41c5e/C...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,0.574429,Similar


In [8]:
## Count rows whose NTSM result is not 'Similar' (should be 0)
(ntsm_df['result'] != 'Similar').sum()

0
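
The check above only counts mismatches. A small sketch that also returns the failing rows, so a non-zero count can be traced to specific samples; `ntsm_mismatches` is a hypothetical helper name, and 'Similar' is assumed to be the only passing result. Rows whose result was never filled in (None/NaN) also compare unequal to 'Similar' and are therefore reported.

```python
import pandas as pd


def ntsm_mismatches(df: pd.DataFrame, expected: str = 'Similar') -> pd.DataFrame:
    """Return sample, score and result for every row whose result is not `expected`."""
    mask = df['result'] != expected
    return df.loc[mask, ['sample', 'ntsm_score', 'result']]
```

For example, `assert ntsm_mismatches(ntsm_df).empty` fails loudly, and displaying the returned frame shows which samples need a second look.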

# Extract ReadStats Data

## Read in ReadStats Data Table

In [9]:
readstats_df = tp.table_to_dataframe("readstats", workspace=WORKSPACE, workspace_namespace=PROJECT)

readstats_df.head()



readstats_id,ReadStatsTarball,hifi,ReadStatsReport,sample
0,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,HG00140
1,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,HG00140
10,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,HG00408
11,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,HG00597
12,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,HG00597


## Read ReadStats Output & Write To DataFrame

In [10]:
readstats_df['output'] = np.nan   ## total Gbp for each row

for index, row in readstats_df.iterrows():

    sample_readstats_fp = row['ReadStatsReport']
    sample_readstats_fn = os.path.basename(sample_readstats_fp)

    ! gsutil cp {sample_readstats_fp} .

    sample_readstats_df = pd.read_csv(sample_readstats_fn, header=None, sep='\t')

    ## Just look at sample-level metrics
    sample_readstats_df = sample_readstats_df[sample_readstats_df[0] == 'sample.fastq']

    ## Get rid of the extra (first) sample-level row
    sample_readstats_df = sample_readstats_df.iloc[1:, :]

    ## Total yield in Gbp; .at writes into readstats_df itself (no chained assignment)
    sample_total_gbp = sample_readstats_df.loc[sample_readstats_df[1] == 'total_Gbp', 2]
    readstats_df.at[index, 'output'] = float(sample_total_gbp.values[0])

## Approximate coverage: total Gbp divided by the human genome size (~3.1 Gbp)
readstats_df['coverage'] = readstats_df['output'] / 3.1

Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/ff0b41e6-a1ae-4131-ad0f-8297a69b0f89/runReadStats/99fb090c-5ee9-4180-96ca-dca5b8d5daf9/call-consolidateReadStats/glob-44edd1a7587a8a70d756f66d9e5e0ada/sample_all.report.tsv...
/ [1 files][  1.2 KiB/  1.2 KiB]                                                
Operation completed over 1 objects/1.2 KiB.                                      



Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/ff0b41e6-a1ae-4131-ad0f-8297a69b0f89/runReadStats/30d01b3c-3794-45e4-858d-736db6a15c04/call-consolidateReadStats/glob-44edd1a7587a8a70d756f66d9e5e0ada/sample_all.report.tsv...
/ [1 files][  1.2 KiB/  1.2 KiB]                                                
Operation completed over 1 objects/1.2 KiB.                                      
Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/ff0b41e6-a1ae-4131-ad0f-8297a69b0f89/runReadStats/b56403ca-8641-4b3d-9ab8-6a1c3d6c67b8/call-consolidateReadStats/glob-44edd1a7587a8a70d756f66d9e5e0ada/sample_all.report.tsv...
/ [1 files][  1.2 KiB/  1.2 KiB]                                                
Operation completed over 1 objects/1.2 KiB.                                      
Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/ff0b41e6-a1ae-4131-ad0f-8297a69b0f89/runReadStats/f98b15c9-7c47-4093-91da-5c33811a711d/call-consolidateReadStats/glob-44edd1a75

/ [1 files][  1.2 KiB/  1.2 KiB]                                                
Operation completed over 1 objects/1.2 KiB.                                      
Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/ff0b41e6-a1ae-4131-ad0f-8297a69b0f89/runReadStats/93d5a8c9-9463-48dc-a5b7-9cabcbc6dd99/call-consolidateReadStats/glob-44edd1a7587a8a70d756f66d9e5e0ada/sample_all.report.tsv...
/ [1 files][  1.2 KiB/  1.2 KiB]                                                
Operation completed over 1 objects/1.2 KiB.                                      
Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/ff0b41e6-a1ae-4131-ad0f-8297a69b0f89/runReadStats/7490fe12-4831-446e-90ff-2b47924ab462/call-consolidateReadStats/glob-44edd1a7587a8a70d756f66d9e5e0ada/sample_all.report.tsv...
/ [1 files][  1.2 KiB/  1.2 KiB]                                                
Operation completed over 1 objects/1.2 KiB.                                      
Copying gs://fc-a7e6ae6b-860b

Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/ff0b41e6-a1ae-4131-ad0f-8297a69b0f89/runReadStats/ef3994d1-5954-4a08-8caf-1fdc68cc8394/call-consolidateReadStats/glob-44edd1a7587a8a70d756f66d9e5e0ada/sample_all.report.tsv...
/ [1 files][  1.2 KiB/  1.2 KiB]                                                
Operation completed over 1 objects/1.2 KiB.                                      
Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/ff0b41e6-a1ae-4131-ad0f-8297a69b0f89/runReadStats/da4d98d4-7155-4eb4-b466-0a17c6b7fc75/call-consolidateReadStats/glob-44edd1a7587a8a70d756f66d9e5e0ada/sample_all.report.tsv...
/ [1 files][  1.2 KiB/  1.2 KiB]                                                
Operation completed over 1 objects/1.2 KiB.                                      
Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/ff0b41e6-a1ae-4131-ad0f-8297a69b0f89/runReadStats/dc52a40e-87bd-4659-a2d0-6c8cd563d3e2/call-consolidateReadStats/glob-44edd1a75

/ [1 files][  1.2 KiB/  1.2 KiB]                                                
Operation completed over 1 objects/1.2 KiB.                                      
Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/ff0b41e6-a1ae-4131-ad0f-8297a69b0f89/runReadStats/29b731a8-3a4d-40a1-8a56-87963d48e853/call-consolidateReadStats/glob-44edd1a7587a8a70d756f66d9e5e0ada/sample_all.report.tsv...
/ [1 files][  1.2 KiB/  1.2 KiB]                                                
Operation completed over 1 objects/1.2 KiB.                                      
Copying gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/submissions/ff0b41e6-a1ae-4131-ad0f-8297a69b0f89/runReadStats/db206fa9-983b-4f0e-8897-5b68607d0507/call-consolidateReadStats/glob-44edd1a7587a8a70d756f66d9e5e0ada/sample_all.report.tsv...
/ [1 files][  1.2 KiB/  1.2 KiB]                                                
Operation completed over 1 objects/1.2 KiB.                                      
Copying gs://fc-a7e6ae6b-860b
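
The per-report logic in the ReadStats loop can be pulled into a function and tested on its own. A sketch that mirrors the loop, assuming (as the loop does) a headerless TSV with the file label in column 0, the metric name in column 1 and its value in column 2; like the loop, it drops the first `sample.fastq` row before looking up `total_Gbp`. The function name `total_gbp_from_report` is hypothetical, and the explicit `ValueError` is an addition so that a malformed report names the offending file instead of failing with a bare `IndexError`.

```python
import pandas as pd


def total_gbp_from_report(path) -> float:
    """Return the sample-level total_Gbp value from one ReadStats report.

    Mirrors the notebook loop: keep rows labelled 'sample.fastq', drop the
    first of them, then read column 2 of the 'total_Gbp' row.
    """
    report = pd.read_csv(path, header=None, sep='\t')
    sample_rows = report[report[0] == 'sample.fastq'].iloc[1:, :]
    values = sample_rows.loc[sample_rows[1] == 'total_Gbp', 2]
    if values.empty:
        raise ValueError(f"no sample-level total_Gbp row in {path!r}")
    return float(values.iloc[0])
```

With gcsfs installed the same call can take the `ReadStatsReport` gs:// path directly, e.g. `readstats_df['output'] = readstats_df['ReadStatsReport'].map(total_gbp_from_report)`.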

In [11]:
readstats_df

readstats_id,ReadStatsTarball,hifi,ReadStatsReport,sample,output,coverage
0,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,HG00140,10.23,3.300000
1,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,HG00140,32.18,10.380645
10,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,HG00408,38.73,12.493548
11,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,HG00597,34.14,11.012903
12,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,HG00597,33.36,10.761290
...,...,...,...,...,...,...
76,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,NA20805,37.97,12.248387
77,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,NA20805,41.33,13.332258
78,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,NA20805,41.08,13.251613
8,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/s...,gs://fc-a7e6ae6b-860b-4519-80a5-277aeb967124/s...,HG00408,38.36,12.374194
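
The `coverage` column is an approximate sequencing depth: the total HiFi yield (`output`, in Gbp) divided by the constant 3.1, a rough human genome size in Gbp. Checked against the second HG00140 row (readstats_id 1) in the table above:

$$
\text{coverage} = \frac{\text{total\_Gbp}}{3.1\ \text{Gbp}}, \qquad \frac{32.18}{3.1} \approx 10.38\times
$$

Because one sample can have several rows, the per-sample check sums this value over all of a sample's rows before comparing it with 35×.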


In [27]:
# Sum coverage over all of each sample's rows and print any sample below 35X
total_coverage = readstats_df.groupby('sample')['coverage'].sum()
for sample, coverage in total_coverage[total_coverage < 35].items():
    print(sample, coverage)
# this should output nothing

# TODO: put these in a .csv
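
For the TODO, one possible sketch: collapse both tables to one row per sample (summed coverage, lowest NTSM score, whether every NTSM result was 'Similar') and write the result with `DataFrame.to_csv`. The helper name `summarize_hifi_qc`, the summary column names and the output filename are hypothetical; the 35× threshold is the one used in the coverage check above.

```python
import pandas as pd


def summarize_hifi_qc(ntsm_df: pd.DataFrame, readstats_df: pd.DataFrame,
                      min_coverage: float = 35.0) -> pd.DataFrame:
    """One row per sample combining ReadStats coverage and NTSM results."""
    coverage = readstats_df.groupby('sample')['coverage'].sum().rename('total_coverage')
    ntsm = ntsm_df.groupby('sample').agg(
        min_ntsm_score=('ntsm_score', 'min'),
        all_similar=('result', lambda r: bool((r == 'Similar').all())),
    )
    # Outer join so a sample missing from either table still appears (with NaN values)
    summary = pd.concat([coverage, ntsm], axis=1, join='outer')
    # NaN coverage compares as False, so missing samples are flagged as not OK
    summary['coverage_ok'] = summary['total_coverage'] >= min_coverage
    return summary.rename_axis('sample').reset_index()
```

In the notebook this could be written out with `summarize_hifi_qc(ntsm_df, readstats_df).to_csv('hifi_qc_summary.csv', index=False)`, where the filename is only a placeholder.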