# MAPT - GenoML Analysis

## GP2 NBA data release 7

## Project: Exploring MAPT-containing H1 and H2 haplotypes in Parkinson's Disease across diverse populations

Version: Python/3.10.12

Last Updated: MAY-2025

Gene coordinates for the region of 17q21.31 (containing MAPT) from the UCSC Browser: chr17:42,800,001-46,800,000 (GRCh38/hg38)

Notebook overview: In this notebook, we use GenoML to understand the relationship between 17q21.31 subhaplotypes and PD. Subhaplotypes are one-hot encoded and modeled with an Extra Trees Classifier, then ranked by predictive value using Gini importance (the decrease in Gini impurity attributable to each feature). A higher Gini importance indicates that a subhaplotype plays a stronger role in distinguishing PD cases from controls.
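
To make the ranking criterion concrete, the sketch below computes the Gini impurity decrease obtained by splitting on a single one-hot subhaplotype indicator. It is a hand-rolled illustration on made-up labels, not GenoML's internal code; tree ensembles such as Extra Trees accumulate decreases of this kind over many randomized splits and trees.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def impurity_decrease(feature, labels):
    """Weighted Gini decrease from splitting on a binary (one-hot) feature."""
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted

# Toy data (hypothetical): 1 = PD case, 0 = control
pheno = [1, 1, 1, 0, 0, 0, 1, 0]
# One-hot indicator: does the sample carry a given subhaplotype?
carries_haplotype = [1, 1, 1, 0, 0, 0, 0, 1]

decrease = impurity_decrease(carries_haplotype, pheno)
print(f"Gini decrease: {decrease:.3f}")
```

A feature that separates cases from controls more cleanly produces a larger decrease, which is why it ends up higher in the ranking.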


## Description:

* Load Python libraries, set paths to the GP2 data, and define functions
* Install packages
* Copy the files
* Create a covariate file
* Remove related individuals
* Remove non-PD cases and controls
* Extract the region of interest
* Prepare file with the SNPs in the subhaplotype
* Run GenoML
    * Model individual tagging SNPs association with PD
    * Model per-sample subhaplotype association with PD
    * Run GenoML for subhaplotype vs PD analysis
* Save output


### Getting Started

- Loading Python libraries and defining functions
- Installing packages
- Preparing input files:
    - Copying files
    - Removing related individuals
    - Removing non-PD case/control individuals

In [None]:
! pip install numba==0.60.0  joblib==1.4.2 pynndescent==0.5.13 matplotlib==3.9.2 numpy==1.26.4\
 tables==3.10.1 pandas==2.2.2 pandas-plink==2.3.1 requests==2.32.3 scikit-learn==1.5.1\
 scipy==1.14.1 seaborn==0.13.2 statsmodels==0.14.2 xgboost==2.0.3 umap-learn==0.5.6 xarray==2024.7.0 --user --force-reinstall --no-deps

In [None]:
# verify installations in notebook
! pip list

#### Loading Python libraries and defining functions

In [None]:
# Use the os package to interact with the environment
import os

# Bring in Pandas for Dataframe functionality
import pandas as pd

# Numpy for basics
import numpy as np

# Use StringIO for working with file contents
from io import StringIO

# Enable IPython to display matplotlib graphs
import matplotlib.pyplot as plt
%matplotlib inline

# Enable interaction with the FireCloud API
from firecloud import api as fapi

# Import the IPython HTML rendering for displaying links to Google Cloud Console
from IPython.display import display, HTML

# Import urllib modules for building URLs to Google Cloud Console
import urllib.parse

# BigQuery for querying data
from google.cloud import bigquery

# Import sys for writing messages to stderr
import sys

In [None]:
# Utility routine for printing a shell command before executing it
def shell_do(command):
    print(f'Executing: {command}', file=sys.stderr)
    !$command
    
def shell_return(command):
    print(f'Executing: {command}', file=sys.stderr)
    output = !$command
    return '\n'.join(output)

# Utility routine for printing a query before executing it
def bq_query(query):
    print(f'Executing: {query}', file=sys.stderr)
    return pd.read_gbq(query, project_id=BILLING_PROJECT_ID, dialect='standard')

# Utility routine for displaying a message and a link
def display_html_link(description, link_text, url):
    html = f'''
    <p>
    </p>
    <p>
    {description}
    <a target=_blank href="{url}">{link_text}</a>.
    </p>
    '''

    display(HTML(html))

# Utility routines for reading files from Google Cloud Storage
def gcs_read_file(path):
    """Return the contents of a file in GCS"""
    contents = !gsutil -u {BILLING_PROJECT_ID} cat {path}
    return '\n'.join(contents)
    
def gcs_read_csv(path, sep=None):
    """Return a DataFrame from the contents of a delimited file in GCS"""
    return pd.read_csv(StringIO(gcs_read_file(path)), sep=sep, engine='python')

# Utility routine for displaying a message and link to Cloud Console
def link_to_cloud_console_gcs(description, link_text, gcs_path):
    url = '{}?{}'.format(
        os.path.join('https://console.cloud.google.com/storage/browser',
                     gcs_path.replace("gs://","")),
        urllib.parse.urlencode({'userProject': BILLING_PROJECT_ID}))

    display_html_link(description, link_text, url)

In [None]:
# Set up billing project and data path variables
BILLING_PROJECT_ID = os.environ['GOOGLE_PROJECT']
WORKSPACE_NAMESPACE = os.environ['WORKSPACE_NAMESPACE']
WORKSPACE_NAME = os.environ['WORKSPACE_NAME']
WORKSPACE_BUCKET = os.environ['WORKSPACE_BUCKET']

WORKSPACE_ATTRIBUTES = fapi.get_workspace(WORKSPACE_NAMESPACE, WORKSPACE_NAME).json().get('workspace',{}).get('attributes',{})

## Print the information to check we are in the proper release and billing 
## This will be different for you, the user, depending on the billing project your workspace is on
print('Billing and Workspace')
print(f'Workspace Name: {WORKSPACE_NAME}')
print(f'Billing Project: {BILLING_PROJECT_ID}')
print(f'Workspace Bucket, where you can upload and download data: {WORKSPACE_BUCKET}')
print('')

## GP2 v7.0
## Explicitly define release v7.0 path 
GP2_RELEASE_PATH = 'gs://gp2tier2/path/to/release/7'
GP2_CLINICAL_RELEASE_PATH = f'{GP2_RELEASE_PATH}/clinical_data'
GP2_RAW_GENO_PATH = f'{GP2_RELEASE_PATH}/raw_genotypes'
GP2_IMPUTED_GENO_PATH = f'{GP2_RELEASE_PATH}/imputed_genotypes'
GP2_META_RELEASE_PATH = f'{GP2_RELEASE_PATH}/meta_data'
GP2_SUMSTAT_RELEASE_PATH = f'{GP2_RELEASE_PATH}/summary_statistics'

print('GP2 v7.0')
print(f'Path to GP2 v7.0 Clinical Data @ `GP2_CLINICAL_RELEASE_PATH`: {GP2_CLINICAL_RELEASE_PATH}')
print(f'Path to GP2 v7.0 Metadata @ `GP2_META_RELEASE_PATH`: {GP2_META_RELEASE_PATH}')
print(f'Path to GP2 v7.0 Raw Genotype Data @ `GP2_RAW_GENO_PATH`: {GP2_RAW_GENO_PATH}')
print(f'Path to GP2 v7.0 Imputed Genotype Data @ `GP2_IMPUTED_GENO_PATH`: {GP2_IMPUTED_GENO_PATH}')
print(f'Path to GP2 v7.0 summary statistics: {GP2_SUMSTAT_RELEASE_PATH}')

In [None]:
## Define ancestry
ANCESTRY = "AAC"

#### Installing packages and software

In [None]:
%%bash
#Installing plink

mkdir -p ~/tools
cd ~/tools

if test -e /home/jupyter/tools/plink; then
echo "Plink1.9 is already installed in /home/jupyter/tools/"

else
echo -e "Downloading plink \n    -------"
wget -N http://s3.amazonaws.com/plink1-assets/plink_linux_x86_64_20190304.zip 
unzip -o plink_linux_x86_64_20190304.zip
echo -e "\n plink downloaded and unzipped in /home/jupyter/tools \n "

fi


if test -e /home/jupyter/tools/plink2; then
echo "Plink2 is already installed in /home/jupyter/tools/"

else
echo -e "Downloading plink2 \n    -------"
wget -N https://s3.amazonaws.com/plink2-assets/alpha6/plink2_linux_avx2_20250129.zip
unzip -o plink2_linux_avx2_20250129.zip
echo -e "\n plink2 downloaded and unzipped in /home/jupyter/tools \n "

fi

In [None]:
%%bash
ls /home/jupyter/tools/

In [None]:
%%bash

# chmod plink 1.9 
chmod u+x /home/jupyter/tools/plink

In [None]:
%%bash

# chmod plink 2.0
chmod u+x /home/jupyter/tools/plink2

#### Preparing input files

In [None]:
# Make a directory
print("Making a working directory")
WORK_DIR = f'/home/jupyter/Team6_haplo/'
shell_do(f'mkdir -p {WORK_DIR}')

In [None]:
%%bash -s $ANCESTRY

WORK_DIR='/home/jupyter/Team6_haplo/'
cd $WORK_DIR
ls

##### Retrieve the files needed, including the genotype (using the imputed genotype files) and covariate files

In [None]:
shell_do(f'gsutil -mu {BILLING_PROJECT_ID} ls {GP2_IMPUTED_GENO_PATH}')

In [None]:
shell_do(f'gsutil -u {BILLING_PROJECT_ID} -m cp -r {GP2_IMPUTED_GENO_PATH}/{ANCESTRY}/chr17_{ANCESTRY}_* {WORK_DIR}')


Get the covariate file

In [None]:
shell_do(f'gsutil -u {BILLING_PROJECT_ID} ls {GP2_CLINICAL_RELEASE_PATH}')

In [None]:
shell_do(f'gsutil -u {BILLING_PROJECT_ID} -m cp -r {GP2_CLINICAL_RELEASE_PATH}/master_key_release7_final.csv {WORK_DIR}')


##### Remove related individuals

In [None]:
# Select the file that matches with your population
shell_do(f'gsutil -u {BILLING_PROJECT_ID} ls {GP2_META_RELEASE_PATH}/related_samples/')

In [None]:
shell_do(f'gsutil -u {BILLING_PROJECT_ID} -m cp -r {GP2_META_RELEASE_PATH}/related_samples/{ANCESTRY}_release7.related {WORK_DIR}')

In [None]:
shell_do(f'gsutil -u {BILLING_PROJECT_ID} -m cp -r {GP2_RAW_GENO_PATH}/{ANCESTRY}/{ANCESTRY}_release7.eigenvec {WORK_DIR}')

The `.related` file lists pairs of related individuals:
- ID1: individual ID of the first member of the pair
- ID2: individual ID of the second member of the pair

We remove the individuals listed under ID1, so only one person from each pair is excluded.
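
As a rough illustration of that rule, the sketch below reads a small comma-separated table of related pairs and collects one ID per pair for exclusion. The column names (`IID1`, `IID2`, `REL`) and the sample IDs are assumptions made for the example; check the header of the actual `.related` file before relying on a column name or position.

```python
import csv
from io import StringIO

# Hypothetical contents of a related-pairs file (column names are assumed)
related_text = """IID1,IID2,REL
S001,S002,first_degree
S003,S004,second_degree
S001,S005,first_degree
"""

def ids_to_remove(text, column="IID1"):
    """Return the unique IDs found in `column`, preserving first-seen order."""
    reader = csv.DictReader(StringIO(text))
    seen = []
    for row in reader:
        if row[column] not in seen:
            seen.append(row[column])
    return seen

remove_ids = ids_to_remove(related_text)
print(remove_ids)
```

Selecting the column by name rather than by position makes it harder to exclude the wrong member of each pair if the file layout differs between releases.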

In [None]:
%%bash -s $ANCESTRY

WORK_DIR='/home/jupyter/Team6_haplo/'
cd $WORK_DIR


cut -d, -f2 ${1}_release7.related > related_ids.txt


In [None]:
%%bash -s $ANCESTRY

WORK_DIR='/home/jupyter/Team6_haplo/'
cd $WORK_DIR

/home/jupyter/tools/plink2 \
--pfile chr17_${1}_release7 \
--remove related_ids.txt \
--make-pgen \
--out ${1}_release7_nonrelated

##### Remove non-PD case/control individuals

Double-check with the numbers found here for your ancestry group before moving on: https://gp2.org/the-components-of-gp2s-fifth-data-release/

The `--prune` flag keeps only individuals with a PLINK phenotype of 1 (control) or 2 (case). We need to do this because the MAF will differ if individuals with a missing phenotype are left in (for the group as a whole).
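
The effect on allele frequency can be pictured with a few lines of Python. The sample IDs, phenotype codes, and dosages below are made up; the only thing carried over from PLINK is the convention that 1 is a control, 2 is a case, and other values (such as -9) are treated as missing.

```python
# (sample ID, phenotype code, alt-allele dosage) - all values hypothetical
samples = [
    ("S001", 2, 0),
    ("S002", 1, 1),
    ("S003", -9, 2),
    ("S004", 2, 0),
    ("S005", -9, 2),
]

def alt_allele_freq(rows):
    """Alt allele frequency = total alt alleles / (2 * number of samples)."""
    return sum(dosage for _, _, dosage in rows) / (2 * len(rows))

# Keep only controls (1) and cases (2)
case_control = [row for row in samples if row[1] in (1, 2)]

freq_all = alt_allele_freq(samples)
freq_cc = alt_allele_freq(case_control)
print(f"All samples: {freq_all:.3f}, cases/controls only: {freq_cc:.3f}")
```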

In [None]:
%%bash -s $ANCESTRY

WORK_DIR='/home/jupyter/Team6_haplo/'
cd $WORK_DIR

/home/jupyter/tools/plink2 \
--pfile ${1}_release7_nonrelated \
--prune \
--make-pgen \
--out chr17_${1}_release7_nonrelated_pdc

##### Extract the region of interest (whole MAPT gene), update the variant IDs, and recode to PLINK v1.9 format (bed/bim/fam)

In [None]:
%%bash -s $ANCESTRY

WORK_DIR='/home/jupyter/Team6_haplo/'
cd $WORK_DIR

/home/jupyter/tools/plink2 \
--pfile chr17_${1}_release7_nonrelated_pdc \
--chr 17 \
--new-id-max-allele-len 64 \
--from-bp 45894527  \
--to-bp 48028334 \
--set-all-var-ids 'chr@_#_$r:$a' \
--make-pgen \
--out chr17_${1}_release7_MAPT 

In [None]:
%%bash -s $ANCESTRY

WORK_DIR='/home/jupyter/Team6_haplo/'
cd $WORK_DIR

/home/jupyter/tools/plink2 \
--pfile chr17_${1}_release7_MAPT \
--chr 17 \
--from-bp 45894554  \
--to-bp 48028334 \
--rm-dup force-first \
--make-bed \
--out chr17_${1}_release7_MAPT

In [None]:
WORK_DIR='/home/jupyter/Team6_haplo/'
! ls $WORK_DIR

### Extract the region of interest 

Here we are interested in the SNP rs1052553
- This SNP was the one that they used in the Nigerian MAPT paper
- This SNP will be used as a proxy for the H1/H2 haplotype
- rs1052553 coordinates in GRCh38: 17:45996523
- We will also add --mind to remove individuals that haven't been fully genotyped for this variant

- We want to extract the H1/H2 tagging SNP, rs1052553, and the 6 subhaplotype tagging SNPs
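
The variant IDs used for extraction follow the `chr@_#_$r:$a` template that was passed to `--set-all-var-ids` earlier in this notebook. A minimal sketch of how such an ID is assembled (the helper name `make_variant_id` is hypothetical):

```python
def make_variant_id(chrom, pos, ref, alt):
    """Mirror the 'chr@_#_$r:$a' template: chr<chrom>_<pos>_<ref>:<alt>."""
    return f"chr{chrom}_{pos}_{ref}:{alt}"

# Position and alleles of the first tagging SNP listed in this notebook
variant_id = make_variant_id(17, 45908813, "G", "A")
print(variant_id)
```

Because the ID embeds both alleles, a SNP whose reference and alternate alleles are swapped in a given dataset will not match the list and will be silently skipped by `--extract`.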

In [None]:
%%bash

# Define the working directory, adjust this as necessary
WORK_DIR='/home/jupyter/Team6_haplo/'
cd $WORK_DIR

# Create a file with the desired SNPs - need the coordinates
cat > snps_to_keep.txt << EOF
chr17_45908813_G:A
chr17_45942346_G:A
chr17_45977067_A:G
chr17_45998697_C:T
chr17_46003698_A:G
chr17_46028029_A:G
EOF

# Echo the contents of the file to confirm it was created correctly
cat snps_to_keep.txt

### Extract the SNPs
We will also add --mind to remove individuals that haven't been fully genotyped for these variants
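
`--mind 0.01` drops any sample whose missing-genotype rate exceeds 1%. Assuming the rate is evaluated over the six retained SNPs, as described above, a single missing call (1/6, about 17%) is already enough to exclude a sample. A minimal sketch on made-up genotypes:

```python
def missing_rate(genotypes):
    """Fraction of missing calls (None) in one sample's genotype list."""
    return sum(g is None for g in genotypes) / len(genotypes)

def apply_mind(samples, threshold=0.01):
    """Keep samples whose missing rate does not exceed `threshold`."""
    return {sid: g for sid, g in samples.items()
            if missing_rate(g) <= threshold}

# Hypothetical dosages at six SNPs; None marks a missing call
samples = {
    "S001": [0, 1, 0, 2, 1, 0],
    "S002": [0, None, 0, 2, 1, 0],
    "S003": [2, 2, 1, 0, 0, 1],
}

kept = apply_mind(samples)
print(sorted(kept))
```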

In [None]:
%%bash -s $ANCESTRY

WORK_DIR='/home/jupyter/Team6_haplo/'
cd $WORK_DIR

/home/jupyter/tools/plink \
--bfile chr17_${1}_release7_MAPT \
--extract snps_to_keep.txt \
--chr 17 \
--mind 0.01 \
--recode \
--out ${1}_release7_MAPT_snps

In [None]:
%%bash -s $ANCESTRY

WORK_DIR='/home/jupyter/Team6_haplo/'
cd $WORK_DIR

cat ${1}_release7_MAPT_snps.map

In [None]:
%%bash -s $ANCESTRY

WORK_DIR='/home/jupyter/Team6_haplo/'
cd $WORK_DIR

/home/jupyter/tools/plink2 \
--bfile chr17_${1}_release7_MAPT \
--extract snps_to_keep.txt \
--rm-dup force-first \
--chr 17 \
--mind 0.01 \
--make-bed \
--out ${1}_release7_MAPT_snps

In [None]:
! cat /home/jupyter/Team6_haplo/{ANCESTRY}_release7_MAPT_snps.bim

As you can see, there are two variants here with the same coordinates (at least for the CAS population). This is because there were multiple probes for the same variant during genotyping; the results for the two variants should be identical, though.
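
One way to spot such entries is to group the `.bim` rows by chromosome and position. The sketch below does this on a few made-up lines; the variant IDs are placeholders, and the six-column layout (chromosome, variant ID, cM, bp, allele 1, allele 2) is the standard `.bim` format.

```python
from collections import defaultdict

# Hypothetical .bim rows: chrom, variant ID, cM, bp, allele 1, allele 2
bim_text = """17 varA 0 45908813 A G
17 varB_probe1 0 45942346 A G
17 varB_probe2 0 45942346 A G
"""

def duplicated_positions(text):
    """Map (chrom, bp) -> variant IDs for positions that occur more than once."""
    by_pos = defaultdict(list)
    for line in text.strip().splitlines():
        chrom, var_id, _cm, bp, _a1, _a2 = line.split()
        by_pos[(chrom, int(bp))].append(var_id)
    return {pos: ids for pos, ids in by_pos.items() if len(ids) > 1}

dups = duplicated_positions(bim_text)
print(dups)
```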

#### Put together the covar file

In [None]:
clin = pd.read_csv('/home/jupyter/Team6_haplo/master_key_release7_final.csv')
clin.info()

In [None]:
gen = pd.read_csv(f'/home/jupyter/Team6_haplo/{ANCESTRY}_release7_nonrelated.psam', sep='\t')
gen.info()

In [None]:
pcs = pd.read_csv(f'/home/jupyter/Team6_haplo/{ANCESTRY}_release7.eigenvec', sep='\t')
pcs.info()

In [None]:
gen2 = pd.merge(gen, clin, left_on='#IID', right_on='GP2sampleID')
gen2.info()

In [None]:
gen3 = pd.merge(gen2, pcs, left_on='#IID', right_on='IID')
gen3.info()

In [None]:
plink_clin = gen3[['#IID', 'SEX','PHENO1', 'age_at_sample_collection', 'PC1', 'PC2', 'PC3', 'PC4', 'PC5','PC6', 'PC7', 'PC8', 'PC9','PC10' ]]
plink_clin.head()

In [None]:
# Drop samples without a phenotype, then set missing values to -9 (PLINK format)
# Work on an explicit copy to avoid pandas' SettingWithCopyWarning
plink_clin = plink_clin.dropna(axis=0, subset=["PHENO1"]).copy()
plink_clin['age_at_sample_collection'] = plink_clin['age_at_sample_collection'].fillna(-9)
plink_clin['SEX'] = plink_clin['SEX'].fillna(-9)

In [None]:
plink_clin["PHENO1"].value_counts(dropna=False)

In [None]:
#Rename age_at_sample_collection  
plink_clin = plink_clin.rename(columns={'age_at_sample_collection': 'AGE'})
plink_clin.head()

In [None]:
plink_clin.to_csv(f'/home/jupyter/Team6_haplo/{ANCESTRY}_covars.txt', sep='\t', index=False, na_rep='-9',)

In [None]:
covariate_file = pd.read_csv(f"/home/jupyter/Team6_haplo/{ANCESTRY}_covars.txt", sep='\t')
pheno_column = covariate_file[["#IID", "PHENO1"]].copy()
pheno_column

In [None]:
# take modified covariate file and use it for genoml confounders
covariate_file = pd.read_csv(f"/home/jupyter/Team6_haplo/{ANCESTRY}_covars.txt", sep='\t')
covariate_file.rename(columns={'#IID':'ID'}, inplace=True)
# not including age because of missingness
covariates_genoml = covariate_file[['ID', 'SEX','PC1', 'PC2', 'PC3', 'PC4', 'PC5']]
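# make sure the GenoML work directory exists before writing to it
os.makedirs("/home/jupyter/GenoML", exist_ok=True)
covariates_genoml.to_csv(f"/home/jupyter/GenoML/{ANCESTRY}_confounders.csv", index=False)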

In [None]:
# rename pheno column for processing with genoml
pheno_file = pheno_column.rename(columns={'#IID':'ID', "PHENO1": "PHENO"}).copy()

In [None]:
pheno_file["PHENO"] = pheno_file["PHENO"].astype(int)

pheno_file["ID"] = pheno_file["ID"].astype(str)



In [None]:
pheno_file["PHENO"].value_counts()

In [None]:
# map pheno to 0 for control, 1 for cases
pheno_mapping = {1: 0, 2: 1}
pheno_file['PHENO'] = pheno_file['PHENO'].map(pheno_mapping).astype('Int64')
pheno_file["PHENO"].value_counts()

In [None]:
# put it in the GenoML work directory
pheno_file.to_csv(f"/home/jupyter/GenoML/{ANCESTRY}_pheno.csv", index=False)

In [None]:
%%bash -s $ANCESTRY
# copy over bfiles and files for analysis
WORK_DIR='/home/jupyter/Team6_haplo/'
cd $WORK_DIR
cp ${1}_release7_MAPT_snps.fam ${1}_release7_MAPT_snps.bim ${1}_release7_MAPT_snps.bed /home/jupyter/GenoML

In [None]:
%%bash -s $ANCESTRY
WORK_DIR='/home/jupyter/GenoML/'
cd $WORK_DIR

ls ${1}*

## GenoML 

#### Create a results and working directory

In [None]:
# create a directory for GenoML results
! mkdir -p /home/jupyter/GenoML/results
! mkdir -p /home/jupyter/GenoML/results/{ANCESTRY}
RESULTS_PATH = f'/home/jupyter/GenoML/results/{ANCESTRY}/'
# move into outer project folder
%cd /home/jupyter/GenoML

### Model individual tagging snps association with PD

In [None]:
#### Run Genoml
## Geno Input: 6 tagging snps in MAPT region
## Feature Selection: 100 trees
## Pheno Input: PD cases and healthy controls
! genoml discrete supervised munge \
--geno {ANCESTRY}_release7_MAPT_snps \
--prefix results/{ANCESTRY}/{ANCESTRY} \
--skip_prune yes \
--feature_selection 100 \
--pheno {ANCESTRY}_pheno.csv

In [None]:
# Get the ranking of the tagging SNPs and their scores
! cp results/{ANCESTRY}/{ANCESTRY}.approx_feature_importance.txt results/{ANCESTRY}_snp_rank.txt 

In [None]:
! cat results/{ANCESTRY}_snp_rank.txt 

In [None]:
! cp /home/jupyter/Team6_haplo/snps_to_keep.txt /home/jupyter/GenoML

In [None]:
! cat snps_to_keep.txt

In [None]:
! cp /home/jupyter/Team6_haplo/chr17_{ANCESTRY}_release7_MAPT.* /home/jupyter/GenoML

In [None]:
%cd /home/jupyter/GenoML
! ls

### Model per-sample subhaplotype association with PD

In [None]:
%%bash
# recode the PLINK files
for i in AAC AFR AJ AMR CAH CAS EAS EUR FIN MDE SAS; do
/home/jupyter/tools/plink --bfile /home/jupyter/GenoML/${i}_release7_MAPT_snps --recode A --real-ref-alleles --out /home/jupyter/GenoML/${i}_r7_MAPT_snps_recode --output-missing-genotype 'N'
done

In [None]:
WORK_DIR = "/home/jupyter/GenoML"
ancestry = "SAS"
recode = pd.read_csv(f"{WORK_DIR}/{ancestry}_r7_MAPT_snps_recode.raw", sep = " ")
recode

In [None]:
recode.columns

In [None]:
# form subhaplotype group for each sample, using the recoded genotype file

WORK_DIR = "/home/jupyter/GenoML"
ancestry = "SAS"
recode = pd.read_csv(f"{WORK_DIR}/{ancestry}_r7_MAPT_snps_recode.raw", sep = " ")
snp_45908813 = {0: "G", 1: "A", 2:"A"}
snp_45942346 = {0: "G", 1: "A", 2:"A"}
snp_45977067 = {0: "A", 1: "G", 2:"G"}
snp_45998697 = {0: "C", 1: "T", 2:"T"}
snp_46003698 = {0: "A", 1: "G", 2:"G"}
snp_46028029 = {0: "A", 1: "G", 2:"G"}

def map_column(recode_df, column_list, map_list):
    """
    Forms a 6 SNP subhaplotype group based on a recoded genotype file, and adds it to the dataframe
    Params:
        recode_df: recoded data frame
        column_list: list of the tagging SNP columns
        map_list: list of mappings for each value in the recode column, to the appropriate genotype
    
    Returns:
        recoded_df: recoded dataframe with additional 'Haplotype' column 
    """
    recoded_df = recode_df.copy()
    for i in range(len(column_list)):
        col = column_list[i]
        snp = map_list[i]
        recoded_df[col] = recoded_df[col].map(snp).astype('str')
    recoded_df['Haplotype'] = recoded_df[column_list].agg(''.join, axis=1)
    return recoded_df

col_list = ['chr17_45908813_G:A_A','chr17_45942346_G:A_A', 'chr17_45977067_A:G_G', 'chr17_45998697_C:T_T','chr17_46003698_A:G_G', 'chr17_46028029_A:G_G']
map_list = [snp_45908813, snp_45942346, snp_45977067, snp_45998697, snp_46003698,snp_46028029]

recoded_df = map_column(recode, col_list, map_list)
recode_haplotype_df = pd.get_dummies(recoded_df, columns=["Haplotype"], dtype="int")
haplotype_numerical = recode_haplotype_df.iloc[:, [1] + list(range(12, recode_haplotype_df.shape[1]))]
haplotype_numerical = haplotype_numerical.rename(columns={'IID': 'ID'})
haplotype_numerical
haplotype_numerical.to_csv(f"{WORK_DIR}/{ancestry}_haplotypes.csv", index=False)


In [None]:
recoded_df

In [None]:
pd.set_option('display.max_rows', 100)
recoded_df["Haplotype"].value_counts(dropna=False)

#### Run GenoML for subhaplotype vs PD analysis
- Params:
    - addit_file = `{ancestry}_haplotypes.csv`
    - pheno_file = `{ancestry}_pheno.csv`
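
For orientation, the additional-features file is expected to hold one row per sample with an `ID` column followed by one indicator column per observed subhaplotype. The pure-Python sketch below shows that layout on made-up haplotype strings; the `Haplotype_` prefix follows the column naming that `pd.get_dummies` produces for a column called `Haplotype`, and the sample IDs are hypothetical.

```python
def one_hot(haplotype_by_id):
    """Build a header and rows with one 0/1 column per distinct haplotype."""
    labels = sorted(set(haplotype_by_id.values()))
    header = ["ID"] + [f"Haplotype_{label}" for label in labels]
    rows = [
        [sample_id] + [int(hap == label) for label in labels]
        for sample_id, hap in haplotype_by_id.items()
    ]
    return header, rows

# Hypothetical six-SNP haplotype strings per sample
haplotypes = {"S001": "GGACAA", "S002": "AAGTGG", "S003": "GGACAA"}

header, rows = one_hot(haplotypes)
print(",".join(header))
for row in rows:
    print(",".join(str(value) for value in row))
```

Each sample has exactly one indicator set to 1, so the feature ranking is effectively a ranking of the subhaplotype groups themselves.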

In [None]:
# create a directory for GenoML results
ancestry = "SAS"
! mkdir -p /home/jupyter/GenoML/haplo_results
! mkdir -p /home/jupyter/GenoML/haplo_results/{ancestry}
RESULTS_PATH = f'/home/jupyter/GenoML/haplo_results/{ancestry}/'
# move into outer project folder
%cd /home/jupyter/GenoML

In [None]:
! genoml discrete supervised munge \
--addit {ancestry}_haplotypes.csv \
--prefix {RESULTS_PATH} \
--skip_prune yes \
--feature_selection 100 \
--pheno {ancestry}_pheno.csv

In [None]:
! cat {RESULTS_PATH}/.approx_feature_importance.txt
! cp {RESULTS_PATH}/.approx_feature_importance.txt /home/jupyter/GenoML/haplo_results/{ancestry}_subhaplotype_feature_rank.txt
! ls /home/jupyter/GenoML/haplo_results

## Save files to workspace bucket

In [None]:
file_list = ['AAC_subhaplotype_feature_rank.txt',
 'AFR_subhaplotype_feature_rank.txt',
'AJ_subhaplotype_feature_rank.txt',
'AMR_subhaplotype_feature_rank.txt',
'CAH_subhaplotype_feature_rank.txt',
'CAS_subhaplotype_feature_rank.txt',
'EAS_subhaplotype_feature_rank.txt',
'EUR_subhaplotype_feature_rank.txt',
'FIN_subhaplotype_feature_rank.txt',
'MDE_subhaplotype_feature_rank.txt',
'SAS_subhaplotype_feature_rank.txt']

In [None]:
WORK_DIR = '/home/jupyter/GenoML/haplo_results'
for file in file_list:
    shell_do(f'gsutil -mu {BILLING_PROJECT_ID} cp -r {WORK_DIR}/{file} {WORKSPACE_BUCKET}/GenoML_subhaplotype_results/{file}')

In [None]:
! gsutil -u {BILLING_PROJECT_ID} ls $WORKSPACE_BUCKET/GenoML_subhaplotype_results