# Get the sample list from the matrix table

In this notebook, we use [Hail](https://hail.is/) to export the sample IDs from a matrix table to a TSV file.

This work is part of a larger project to [Demonstrate the Potential for Pooled Analysis of All of Us and UK Biobank Genomic Data](https://docs.google.com/document/d/19ZS0z_-7FEM37pNDAXaWaqBSLnqyd9MZEkiOmtF3n_0/edit#). Specifically, this notebook supports the **pooled** analysis portion of the project.

# Setup 

<div class="alert alert-block alert-warning">
    <b>Cloud Environment</b>: This notebook was written for use on the <i>All of Us</i> Workbench.
    <ul>
        <li>Use "Recommended Environment" <kbd><b>Hail Genomics Analysis</b></kbd> which creates compute type <kbd>Dataproc Cluster</kbd> with reasonable defaults for CPU, RAM, disk, and number of workers.</li>
        <li>This notebook takes about a minute to run.</li>
    </ul>
</div>

In [None]:
from datetime import datetime
import hail as hl
import os

## Define constants

In [None]:
# Papermill parameters. See https://papermill.readthedocs.io/en/latest/usage-parameterize.html

#---[ Inputs ]---
# Matrix table was created from UKB 200k exome release VCFs.
# Note: The UKB matrix table was created via notebook 'aou_workbench_pooled_analyses/matrix_table_creation/create_matrix_tables.ipynb'
# and then repartitioned via notebook 'aou_workbench_pooled_analyses/matrix_table_creation/redo_partitions'.
UKB_MT = 'gs://fc-secure-fd6786bf-6c28-4f33-ac30-3860fbeee5bb/data/ukb/exomes/full_dataset_fewer_partitions.mt'

#---[ Outputs ]---
UKB_SAMPLE_ID_TSV = f'{os.getenv("WORKSPACE_BUCKET")}/data/ukb/ukb_200k_exome_sample_ids.tsv'

In [None]:
UKB_SAMPLE_ID_TSV
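If `WORKSPACE_BUCKET` is unset, `os.getenv` returns `None` and the f-string above silently produces a path beginning with `None/`. The sketch below shows one way to fail early instead. The helper name `workspace_path` is hypothetical and not part of the original notebook.

```python
import os


def workspace_path(relative_path, env_var="WORKSPACE_BUCKET"):
    """Join a relative path onto the workspace bucket URL.

    Hypothetical helper: raises instead of returning a path that
    starts with 'None/' when the environment variable is missing.
    """
    bucket = os.getenv(env_var)
    if not bucket:
        raise EnvironmentError(f"{env_var} is not set; cannot build an output path.")
    # Normalize slashes so the result has exactly one separator at the join.
    return f"{bucket.rstrip('/')}/{relative_path.lstrip('/')}"
```

With this helper, the output constant could be written as `UKB_SAMPLE_ID_TSV = workspace_path('data/ukb/ukb_200k_exome_sample_ids.tsv')`.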

## Check access

In [None]:
!gsutil ls {UKB_MT}

## Start Hail 

In [None]:
hl.init(default_reference='GRCh38')

# Read UKB exomes matrix table

In [None]:
ukb_exomes = hl.read_matrix_table(UKB_MT)

# Write the sample id list to a TSV

In [None]:
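# cols() returns a Hail Table of the column (sample) fields, keyed by the column key.
# export() writes that table as a tab-separated file with a header row.
ukb_exomes.cols().export(UKB_SAMPLE_ID_TSV)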
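After the export, a quick sanity check on the TSV can catch an empty file or duplicated sample IDs. The following is a minimal sketch that is not part of the original notebook. It assumes the first column holds the sample ID and the first line is a header, and the function name `summarize_sample_ids` is hypothetical.

```python
import csv
from collections import Counter


def summarize_sample_ids(lines):
    """Summarize a sample-ID TSV given as an iterable of text lines.

    Assumes a header row and that the first column is the sample ID.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader, None)
    if header is None:
        raise ValueError("The TSV is empty; expected at least a header row.")
    # Skip blank lines, keep only the first field of each data row.
    ids = [row[0] for row in reader if row]
    counts = Counter(ids)
    duplicates = sorted(sample for sample, n in counts.items() if n > 1)
    return {"id_column": header[0], "n_samples": len(ids), "duplicates": duplicates}
```

In this notebook, the lines could come from `hl.hadoop_open(UKB_SAMPLE_ID_TSV)`, assuming the export produced a single uncompressed file. The resulting `n_samples` can then be compared with `ukb_exomes.count_cols()`.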

# Provenance

In [None]:
print(datetime.now())

In [None]:
!pip3 freeze