# Perform the METAL meta-analysis

In this notebook we perform the meta-analysis of aggregated *All of Us* and UK Biobank GWAS results.

Note that this work is part of a larger project to [Demonstrate the Potential for Pooled Analysis of All of Us and UK Biobank Genomic Data](https://github.com/all-of-us/ukb-cross-analysis-demo-project). Specifically, this notebook covers the portion of the project that meta-analyzes **siloed** GWAS results.

# Setup

<div class="alert alert-block alert-warning">
    <b>Cloud Environment</b>: This notebook was written for use on the <i>All of Us</i> Workbench.
    <ul>
        <li>Use "Recommended Environment" <kbd><b>General Analysis</b></kbd> which creates compute type <kbd><b>Standard VM</b></kbd> with reasonable defaults for CPU, RAM, and disk.</li>
        <li>This notebook takes several hours to run interactively. You can also run it in the background via <kbd>run_notebook_in_the_background</kbd> for the sake of provenance and reproducibility.</li>
    </ul>
</div>

In [None]:
from datetime import datetime
import os
import time

## Install METAL

In [None]:
%%bash

# Install METAL if it is not already installed.
if [ ! -f ./generic-metal/metal ] ; then

    curl -L -o metal.tar.gz "http://csg.sph.umich.edu/abecasis/Metal/download/Linux-metal.tar.gz"
    tar -xf metal.tar.gz

fi

## Define Constants

In [None]:
# Papermill parameters. See https://papermill.readthedocs.io/en/latest/usage-parameterize.html

#---[ Inputs ]---
AOU_AGGREGATED_GWAS_RESULTS = {
    'HDL': 'gs://fc-secure-471c1068-cd3d-4b43-9b5d-a618c85ceea5/data/aou/pheno/20220323/fc-secure-471c1068-cd3d-4b43-9b5d-a618c85ceea5_data_aou_regenie_20220318_aou_alpha3_lipids_regenie_step2_HDL_norm_aggregated.tsv',
    'LDL': 'gs://fc-secure-471c1068-cd3d-4b43-9b5d-a618c85ceea5/data/aou/pheno/20220323/fc-secure-471c1068-cd3d-4b43-9b5d-a618c85ceea5_data_aou_regenie_20220318_aou_alpha3_lipids_regenie_step2_LDL_adjusted_norm_aggregated.tsv',
    'TC': 'gs://fc-secure-471c1068-cd3d-4b43-9b5d-a618c85ceea5/data/aou/pheno/20220323/fc-secure-471c1068-cd3d-4b43-9b5d-a618c85ceea5_data_aou_regenie_20220318_aou_alpha3_lipids_regenie_step2_TC_adjusted_norm_aggregated.tsv',
    'TG': 'gs://fc-secure-471c1068-cd3d-4b43-9b5d-a618c85ceea5/data/aou/pheno/20220323/fc-secure-471c1068-cd3d-4b43-9b5d-a618c85ceea5_data_aou_regenie_20220318_aou_alpha3_lipids_regenie_step2_TG_adjusted_norm_aggregated.tsv'
}

UKB_AGGREGATED_GWAS_RESULTS = {
    'HDL': 'gs://fc-secure-471c1068-cd3d-4b43-9b5d-a618c85ceea5/data/aou/pheno/20220323/ukb_lipids_regenie_step2_HDL_norm_aggregated.tsv',
    'LDL': 'gs://fc-secure-471c1068-cd3d-4b43-9b5d-a618c85ceea5/data/aou/pheno/20220323/ukb_lipids_regenie_step2_LDL_adjusted_norm_aggregated.tsv',
    'TC': 'gs://fc-secure-471c1068-cd3d-4b43-9b5d-a618c85ceea5/data/aou/pheno/20220323/ukb_lipids_regenie_step2_TC_adjusted_norm_aggregated.tsv',
    'TG': 'gs://fc-secure-471c1068-cd3d-4b43-9b5d-a618c85ceea5/data/aou/pheno/20220323/ukb_lipids_regenie_step2_TG_adjusted_norm_aggregated.tsv'
}

#---[ Outputs ]---
# Create a datestamp (YYYYMMDD) naming the folder of results generated today.
DATESTAMP = time.strftime('%Y%m%d')
METAL_OUTPUTS = f'{os.getenv("WORKSPACE_BUCKET")}/data/metaanalysis/{DATESTAMP}/'
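
`os.getenv("WORKSPACE_BUCKET")` returns `None` when the variable is not set, in which case the f-string above would quietly produce a path that starts with `None/`. Below is a minimal sketch of a guard. `build_output_prefix` is a hypothetical helper, not part of the original notebook, and it assumes the bucket is given as a `gs://` path.

```python
import os
import time


def build_output_prefix(bucket, datestamp):
    """Return the bucket folder for one day's results, or raise if the bucket is unusable.

    Hypothetical helper for illustration; the notebook builds METAL_OUTPUTS inline.
    """
    if not bucket or not bucket.startswith('gs://'):
        raise ValueError(f'WORKSPACE_BUCKET is not set to a gs:// path: {bucket!r}')
    # Strip any trailing slash so the result never contains '//' after the bucket name.
    return f'{bucket.rstrip("/")}/data/metaanalysis/{datestamp}/'


# Example with a made-up bucket name.
example_prefix = build_output_prefix('gs://example-bucket', time.strftime('%Y%m%d'))
print(example_prefix)
```

In the notebook itself this could be called as `build_output_prefix(os.getenv('WORKSPACE_BUCKET'), DATESTAMP)`, so that a missing variable fails early rather than at upload time.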

## Transfer inputs to local disk

In [None]:
!gsutil -m cp {' '.join(AOU_AGGREGATED_GWAS_RESULTS.values())} .

In [None]:
!gsutil -m cp {' '.join(UKB_AGGREGATED_GWAS_RESULTS.values())} .

In [None]:
!ls -lh *aggregated.tsv
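
The METAL parameters used in this notebook refer to the columns `ID`, `ALLELE0`, `ALLELE1`, `BETA`, `SE`, `Pvalue`, and `A1FREQ` in tab-separated inputs. Before starting a run that takes hours, it may help to confirm that each copied file has those columns. The following is a sketch only; `missing_columns` is a hypothetical helper introduced here for illustration.

```python
import csv

# Column names taken from the METAL parameters used in this notebook.
REQUIRED_COLUMNS = ['ID', 'ALLELE0', 'ALLELE1', 'BETA', 'SE', 'Pvalue', 'A1FREQ']


def missing_columns(tsv_path, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the TSV header line."""
    with open(tsv_path, newline='') as tsv_file:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(tsv_file, delimiter='\t'), [])
    return [name for name in required if name not in header]
```

For instance, `missing_columns(os.path.basename(AOU_AGGREGATED_GWAS_RESULTS['HDL']))` would be expected to return an empty list if the file matches the parameters.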

In [None]:
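def run_metal(lipid):
    """Write a METAL parameter file for one lipid trait and run METAL on it.

    lipid: key into AOU_AGGREGATED_GWAS_RESULTS and UKB_AGGREGATED_GWAS_RESULTS
        ('HDL', 'LDL', 'TC', or 'TG').
    """
    # The inputs were copied to the current directory, so only the file names are needed.
    aou_file = os.path.basename(AOU_AGGREGATED_GWAS_RESULTS[lipid])
    ukb_file = os.path.basename(UKB_AGGREGATED_GWAS_RESULTS[lipid])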
    
    metal_parameters = f'''
SCHEME STDERR
AVERAGEFREQ ON
MINMAXFREQ ON

MARKER ID
ALLELE ALLELE0 ALLELE1
EFFECT BETA
STDERR SE
PVALUE Pvalue
FREQ A1FREQ
SEPARATOR TAB
PROCESS {aou_file}

MARKER ID
ALLELE ALLELE0 ALLELE1
EFFECT BETA
STDERR SE
PVALUE Pvalue
FREQ A1FREQ
SEPARATOR TAB
PROCESS {ukb_file}

OUTFILE METAANALYSIS_{lipid}_ .tbl
ANALYZE

QUIT
'''
    print(f'Metal parameters:\n{metal_parameters}')
    
    metal_parameters_filename = f'METAL_{lipid}.txt'
    with open(metal_parameters_filename, 'w') as param_file:
        param_file.write(metal_parameters)
        
    !./generic-metal/metal {metal_parameters_filename}
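
`SCHEME STDERR` asks METAL for a fixed-effects meta-analysis in which each study's effect estimate is weighted by the inverse of its squared standard error. The arithmetic for a single variant can be sketched as below. This illustrates the weighting only and is not METAL's own code. It assumes the two effects already refer to the same allele, and the numbers are made up.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Combine per-study effects for one variant using inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value


# Made-up effect sizes and standard errors for one variant in two studies.
beta, se, p_value = inverse_variance_meta([0.10, 0.14], [0.02, 0.04])
print(f'beta={beta:.4f} se={se:.4f} p={p_value:.3g}')
```

The study with the smaller standard error dominates the pooled estimate, so the combined effect here lands closer to 0.10 than to 0.14.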

# METAL - Meta Analysis

In [None]:
run_metal('HDL')

In [None]:
run_metal('LDL')

In [None]:
run_metal('TC')

In [None]:
run_metal('TG')

In [None]:
!ls -lth METAANALYSIS_*
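
With `OUTFILE METAANALYSIS_{lipid}_ .tbl`, METAL is expected to write `METAANALYSIS_{lipid}_1.tbl` plus a companion `.info` file for each trait. This assumes its usual naming, where the analysis number is placed between the prefix and the suffix. A sketch of a check that nothing is missing before uploading follows. `expected_metal_outputs` and `missing_outputs` are hypothetical helpers, not part of the original notebook.

```python
import os

LIPIDS = ['HDL', 'LDL', 'TC', 'TG']


def expected_metal_outputs(lipid):
    """File names assumed to be written by the first ANALYZE command for one trait."""
    table = f'METAANALYSIS_{lipid}_1.tbl'
    return [table, f'{table}.info']


def missing_outputs(lipids=LIPIDS, directory='.'):
    """Return the expected output file names that are not present in the directory."""
    return [name
            for lipid in lipids
            for name in expected_metal_outputs(lipid)
            if not os.path.isfile(os.path.join(directory, name))]
```

An empty list from `missing_outputs()` would suggest all four runs produced their result tables.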

# Store outputs in the workspace bucket

In [None]:
!gsutil -m cp METAANALYSIS_* {METAL_OUTPUTS}

In [None]:
!gsutil ls -lh {METAL_OUTPUTS}

# Provenance

In [None]:
!date