# Meta-analysis of antidepressant exposure GWAS

Conduct a GWAS meta-analysis using [Hail](https://hail.is). GWAS summary statistics (sumstats) are imported in [GWAS VCF](https://github.com/MRCIEU/gwas-vcf-specification) format, which Hail can read like any other VCF file. See also the [meta-analysis functions from the Pan UK Biobank](https://github.com/atgu/ukbb_pan_ancestry/blob/master/run_ma.py).
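For reference, the core calculation of a fixed-effect inverse-variance-weighted (IVW) meta-analysis for a single variant can be sketched in plain Python. The inputs are assumed to be the per-study effect size and standard error, i.e. the `ES` and `SE` FORMAT fields of a GWAS VCF, and `LP` is reported as -log10 p to match that specification. This is a generic sketch, not the Pan UK Biobank implementation, and the function name `ivw_meta` is hypothetical.

```python
import math
from typing import Optional, Sequence


def ivw_meta(betas: Sequence[Optional[float]],
             ses: Sequence[Optional[float]]) -> dict:
    """Hypothetical helper: fixed-effect IVW meta-analysis for one variant.

    betas/ses are per-study effect sizes and standard errors (e.g. GWAS VCF
    ES and SE). Studies with a missing value (None) are skipped, as happens
    for a variant that is absent from some studies.
    """
    pairs = [(b, s) for b, s in zip(betas, ses) if b is not None and s is not None]
    if not pairs:
        raise ValueError("no study has both an effect size and a standard error")
    weights = [1.0 / s ** 2 for _, s in pairs]  # w_i = 1 / SE_i^2
    total_w = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, pairs)) / total_w
    se = math.sqrt(1.0 / total_w)  # SE of the pooled estimate
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    # p underflows to 0 for extremely large |z|; report LP as infinite then
    lp = -math.log10(p) if p > 0 else math.inf
    return {"beta": beta, "se": se, "z": z, "p": p, "lp": lp,
            "n_studies": len(pairs)}


# Illustrative values only: two studies agree, a third lacks the variant
print(ivw_meta([0.1, 0.1, None], [0.1, 0.1, None]))
```

Weighting by 1/SE² gives more precise studies more influence, and the pooled SE shrinks as studies are added. In this example the pooled SE is 0.1/√2.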

In [5]:
import hail as hl
hl.init(local='local[8]')

Running on Apache Spark version 3.3.2
SparkUI available at http://waddington:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.119-ca0ff87b1687
LOGGING: writing to /home/madams23/Projects/antidep-gwas/hail-20230630-1026-0.2.119-ca0ff87b1687.log


Import each GWAS VCF and write it out as a Hail `MatrixTable`, skipping datasets that have already been converted.

In [8]:
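import os
import glob

# Output directory for the converted MatrixTables
os.makedirs("mt", exist_ok=True)

for gwas_vcf in glob.glob("vcf/*.vcf.gz"):
    # Dataset name is the file name up to the first dot, e.g. "FinnGen-EUR"
    dataset = os.path.basename(gwas_vcf).split(os.extsep)[0]
    gwas_mt = os.path.join("mt", os.extsep.join([dataset, "mt"]))
    # Convert only datasets that have not been written yet
    if not os.path.exists(gwas_mt):
        # GWAS VCFs are block-gzipped, so force_bgz lets Hail read the .gz file as BGZF
        hl.import_vcf(gwas_vcf, reference_genome="GRCh38", force_bgz=True).write(gwas_mt, overwrite=True)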


2023-06-30 10:29:52.722 Hail: INFO: scanning VCF for sortedness...
2023-06-30 10:30:08.744 Hail: INFO: Coerced sorted VCF - no additional import work to do
2023-06-30 10:30:37.893 Hail: INFO: wrote matrix table with 19364085 rows and 1 column in 5 partitions to mt/FinnGen-EUR.mt
2023-06-30 10:30:38.057 Hail: INFO: scanning VCF for sortedness...
2023-06-30 10:30:59.118 Hail: INFO: Coerced prefix-sorted VCF, requiring additional sorting within data partitions on each query.
2023-06-30 10:31:29.085 Hail: INFO: wrote matrix table with 16399430 rows and 1 column in 5 partitions to mt/GenScot-EUR.mt
2023-06-30 10:31:29.241 Hail: INFO: scanning VCF for sortedness...
2023-06-30 10:31:50.655 Hail: INFO: Coerced sorted VCF - no additional import work to do
2023-06-30 10:32:12.098 Hail: INFO: wrote matrix table with 13421318 rows and 1 column in 4 partitions to mt/BBJ-EAS.mt
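The naming rule in the conversion loop (dataset name = file name up to the first dot) can be checked without Hail using a small helper. `mt_path_for` is a hypothetical name, and the input paths below are assumed file names chosen to match the `MatrixTable`s written above.

```python
import os


def mt_path_for(vcf_path: str, out_dir: str = "mt") -> str:
    """Hypothetical helper: map a GWAS VCF path to its MatrixTable output path.

    Uses the same rule as the conversion loop: keep the file name up to the
    first extension separator and append ".mt".
    """
    dataset = os.path.basename(vcf_path).split(os.extsep)[0]
    return os.path.join(out_dir, os.extsep.join([dataset, "mt"]))


# Assumed input file names, for illustration only
for vcf in ["vcf/FinnGen-EUR.vcf.gz", "vcf/GenScot-EUR.vcf.gz", "vcf/BBJ-EAS.vcf.gz"]:
    print(vcf, "->", mt_path_for(vcf))
```

One consequence of splitting at the first dot is that a dataset name containing a dot is truncated: a hypothetical `vcf/UKB.v2-EUR.vcf.gz` would be written to `mt/UKB.mt`.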


Read all the sumstats `MatrixTable`s back in.

In [10]:
# Sort the paths so the column (study) order is reproducible
mts = [hl.read_matrix_table(mt) for mt in sorted(glob.glob("mt/*.mt"))]

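Merge the sumstats into a single `MatrixTable` with one column per study. `union_cols` requires every table to have the same entry, column key, and column field types; an outer row join keeps variants present in any study.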

In [52]:
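import functools

# One column per study; the outer row join keeps variants found in any study,
# leaving entries missing for studies that lack the variant.
gw = functools.reduce(lambda mt1, mt2: mt1.union_cols(mt2, row_join_type="outer"), mts)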
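The effect of folding an outer column union over a list of studies can be illustrated with plain dictionaries. This is a toy model, not Hail: each hypothetical "study" maps a variant key `(chrom, pos, ref, alt)` to a tuple of per-column values, and the effect sizes are made-up illustrative numbers, not results from these datasets.

```python
import functools
from typing import Dict, Optional, Tuple

Variant = Tuple[str, int, str, str]
Table = Dict[Variant, Tuple[Optional[float], ...]]  # variant -> one value per column


def outer_union(left: Table, right: Table) -> Table:
    """Toy analogue of union_cols(..., row_join_type="outer").

    Assumes both tables are non-empty, so their column counts can be read
    from any row. Variants absent from one side get None in its columns.
    """
    n_left = len(next(iter(left.values())))
    n_right = len(next(iter(right.values())))
    keys = sorted(set(left) | set(right))
    return {k: left.get(k, (None,) * n_left) + right.get(k, (None,) * n_right)
            for k in keys}


# Made-up effect sizes for three hypothetical studies
study_a = {("1", 1000, "A", "G"): (0.10,), ("1", 2000, "C", "T"): (0.05,)}
study_b = {("1", 1000, "A", "G"): (0.12,)}
study_c = {("1", 2000, "C", "T"): (0.07,), ("2", 500, "G", "A"): (-0.03,)}

gw = functools.reduce(outer_union, [study_a, study_b, study_c])
for variant, values in gw.items():
    print(variant, values)
```

With an inner join (the `union_cols` default) no variant in this toy example would survive, because none is present in all three studies; the outer join keeps all three variants and records the gaps as missing values.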