# Meta-analysis 3: AFR/AAC Meta-GWAS
- **Project:** GP2 AFR-AAC meta-GWAS 
- **Version:** Python/3.9
- **Status:** COMPLETE
- **Started:** 22-FEB-2023
- **Last Updated:** 22-FEB-2023
    - **Update Description:**  Notebook started

## Notebook Overview
- Meta-GWAS #3: Meta-analysis of GWAS summary statistics from African (AFR) and African admixed (AAC) individuals

### CHANGELOG
- 22-FEB-2023: Notebook started 

---
# Data Overview 

| ANCESTRY |     DATASET     | CASES | CONTROLS |  TOTAL  |           ARRAY           |          NOTES          |
|:--------:|:---------------:|:-----:|:--------:|:-------:|:-------------------------:|:-----------------------:|
|    AFR   | IPDGC – Nigeria |  304  |    285   |   589   |         NeuroChip         |            .            |
|    AFR   |       GP2       |  711  |   1,011  |  1,722  |        NeuroBooster       |            .            |
|    AAC   |       GP2       |  185  |   1,149  |  1,334  |        NeuroBooster       |            .            |
|    AAC   |     23andMe     |  288  |  193,985 | 194,273 | Omni Express & GSA & 550k | Just summary statistics |
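
As a quick sanity check on the table above, the per-cohort counts can be summed and an effective sample size computed for each cohort. This is a minimal sketch: the counts are copied from the table, and the formula `4 / (1/cases + 1/controls)` is a common convention for case-control GWAS, not something this notebook states that it uses. The helper name `effective_n` is hypothetical.

```python
# Counts copied from the Data Overview table: (ancestry, dataset, cases, controls)
cohorts = [
    ("AFR", "IPDGC - Nigeria", 304, 285),
    ("AFR", "GP2", 711, 1011),
    ("AAC", "GP2", 185, 1149),
    ("AAC", "23andMe", 288, 193985),
]


def effective_n(cases, controls):
    """Effective sample size for a case-control study (assumed convention)."""
    return 4.0 / (1.0 / cases + 1.0 / controls)


total_cases = sum(cases for _, _, cases, _ in cohorts)
total_controls = sum(controls for _, _, _, controls in cohorts)

print(f"Total cases:    {total_cases:,}")
print(f"Total controls: {total_controls:,}")
print(f"Total samples:  {total_cases + total_controls:,}")

# Per-cohort effective sample size, which is dominated by the smaller group
for ancestry, dataset, cases, controls in cohorts:
    print(f"{ancestry}\t{dataset}\tN_eff = {effective_n(cases, controls):.1f}")
```

The effective sample size shows why the 23andMe cohort, despite its very large control group, contributes on the order of its 288 cases rather than its 194,273 total samples.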

# Getting Started

## Importing packages

In [3]:
## Import the necessary packages 
import os
import numpy as np
import pandas as pd
import math
import numbers
import sys
import subprocess
import statsmodels.api as sm
import scipy
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

## Print out package versions
## List the packages loaded into this notebook and their versions, for reproducibility
## Adapted from this Stack Overflow answer: https://stackoverflow.com/questions/40428931/package-for-listing-version-of-packages-used-in-a-jupyter-notebook

## Import packages 
import pkg_resources
import types
from datetime import date
today = date.today()
date = today.strftime("%d-%b-%Y").upper()

## Define function 
def get_imports():
    for name, val in globals().items():
        if isinstance(val, types.ModuleType):
            # Split ensures you get root package, not just imported function
            name = val.__name__.split(".")[0]

        elif isinstance(val, type):
            name = val.__module__.split(".")[0]

        # Some packages are weird and have different imported names vs. system/pip names
        # Unfortunately, there is no systematic way to get pip names from a package's imported name. You'll have to add exceptions to this list manually!
        poorly_named_packages = {
            "PIL": "Pillow",
            "sklearn": "scikit-learn"
        }
        if name in poorly_named_packages.keys():
            name = poorly_named_packages[name]

        yield name

## Get a list of packages imported 
imports = list(set(get_imports()))

# The only way I found to get the version of the root package from only the name of the package is to cross-check the names of installed packages vs. imported packages
requirements = []
for m in pkg_resources.working_set:
    if m.project_name in imports and m.project_name!="pip":
        requirements.append((m.project_name, m.version))

## Print out packages and versions 
print(f"PACKAGE VERSIONS ({date})")
for r in requirements:
    print("\t{}=={}".format(*r))

PACKAGE VERSIONS (25-FEB-2023)
	matplotlib==3.5.2
	numpy==1.22.4
	scipy==1.8.1
	pandas==1.4.3
	statsmodels==0.13.2
	seaborn==0.11.2
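
The same version listing can be sketched with the standard library's `importlib.metadata` (available from Python 3.8), which avoids cross-checking against `pkg_resources.working_set`. This is an alternative sketch, not the approach used in this notebook: the helper name `package_versions` and the explicit `wanted` list are assumptions, and the names must be distribution (pip) names, not import names.

```python
from importlib import metadata


def package_versions(names):
    """Return {name: version} for each distribution name that is installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed under this distribution name, so skip it
            continue
    return versions


# Distribution names of the third-party packages imported above
wanted = ["numpy", "pandas", "scipy", "statsmodels", "matplotlib", "seaborn"]

for name, version in sorted(package_versions(wanted).items()):
    print(f"\t{name}=={version}")
```

Listing the packages explicitly trades the automatic discovery of the original cell for a shorter, more predictable lookup.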


---
# Write out METAL command (here for documentation -- needs to be run interactively!)

```bash
module load metal 
metal << EOT 

# USE THE STANDARD ERROR (INVERSE-VARIANCE) SCHEME WITH GenomicControl CORRECTION ENABLED
SCHEME STDERR
GENOMICCONTROL ON

# === DESCRIBE AND PROCESS THE FIRST INPUT FILE ===
MARKER markerID
ALLELE effect_allele other_allele
# FREQ   maf
EFFECT beta
STDERR LOG(OR)_SE
PVALUE P
# WEIGHT OBS_CT 
PROCESS /data/GP2/projects/2023_02_MBM_AFR_AAC_GWAS/data/AFR/GP2-v4-AFR-wNIGERIAN-NB/GP2-v4-AFR-wNIGERIAN-NB-GWAS-MAF005-FEB2023.UpdatedforMETAL.tab

# === DESCRIBE AND PROCESS THE SECOND INPUT FILE ===
MARKER markerID
ALLELE effect_allele other_allele
# FREQ   maf
EFFECT beta
STDERR LOG(OR)_SE
PVALUE P
# WEIGHT OBS_CT 
PROCESS /data/GP2/projects/2023_02_MBM_AFR_AAC_GWAS/data/AFR/NIGERIAN-NC/NIGERIAN-NEUROCHIP-AFR-GWAS-MAF005-FEB2023.UpdatedforMETAL.tab

# === DESCRIBE AND PROCESS THE THIRD INPUT FILE ===
MARKER markerID
ALLELE effect_allele other_allele
# FREQ   maf
EFFECT beta
STDERR LOG(OR)_SE
PVALUE P
# WEIGHT OBS_CT 
PROCESS /data/GP2/projects/2023_02_MBM_AFR_AAC_GWAS/data/AAC/GP2-v4-AAC/GP2-v4-AAC-GWAS-MAF005-FEB2023.UpdatedforMETAL.tab

# === DESCRIBE AND PROCESS THE FOURTH INPUT FILE ===
MARKER markerID
ALLELE effect_allele alt_allele
# FREQ   maf
EFFECT effect
STDERR stderr
PVALUE pvalue
# WEIGHT N
PROCESS /data/GP2/projects/2023_02_MBM_AFR_AAC_GWAS/data/23andMe/AAC_23andMe_MAF0.05.hg38.noindels.newMarkerIDs.tab

OUTFILE /data/GP2/projects/2023_02_MBM_AFR_AAC_GWAS/data/AFR-AAC-META/AFR-AAC-META-UpdatedforMETAL .tbl
ANALYZE HETEROGENEITY
QUIT
``` 
**Then press Control+D to end the here-document and submit the job!**
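
To make the `SCHEME STDERR` and `ANALYZE HETEROGENEITY` settings concrete, the calculation can be sketched in Python as a fixed-effects inverse-variance weighted meta-analysis with Cochran's Q and I². This is a minimal sketch of the standard formulas, not METAL's implementation: the function name `ivw_meta` and the example effect sizes are hypothetical, and the genomic control adjustment applied by `GENOMICCONTROL ON` is not reproduced here.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance weighted meta-analysis of one variant."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)

    # Pooled effect and its standard error
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)

    # Two-sided p-value from the normal distribution
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))

    # Cochran's Q and I^2 (percentage of variation due to heterogeneity)
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {"beta": beta, "se": se, "p": p, "het_chisq": q, "het_df": df, "het_isq": i2}


# Hypothetical per-cohort effects (log odds ratios) and standard errors
example = ivw_meta(betas=[0.30, 0.25, 0.35, 0.28], ses=[0.10, 0.12, 0.15, 0.08])
for key, value in example.items():
    print(f"{key}: {value:.4g}")
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is pulled toward the most precisely estimated effects.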


# Investigate Top Hits

In [4]:
%%bash 

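# WORK_DIR is assumed to be set in the environment before this cell runs
# Count the genome-wide significant hits (P-value is column 6; threshold 5e-8)
awk '$6 <= 5e-8' "${WORK_DIR}/data/AFR-AAC-META/AFR-AAC-META-UpdatedforMETAL1.tbl" | wc -l
# Print the header, then the genome-wide significant hits
head -1 "${WORK_DIR}/data/AFR-AAC-META/AFR-AAC-META-UpdatedforMETAL1.tbl"
awk '$6 <= 5e-8' "${WORK_DIR}/data/AFR-AAC-META/AFR-AAC-META-UpdatedforMETAL1.tbl"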

35
MarkerName	Allele1	Allele2	Effect	StdErr	P-value	Direction	HetISq	HetChiSq	HetDf	HetPVal
chr1:155235878:G:T	t	g	-0.4494	0.0589	2.397e-14	----	0.0	2.878	3	0.4108
chr1:155167551:T:C	t	c	0.2868	0.0509	1.749e-08	++++	54.0	6.520	3	0.08888
chr1:155165746:T:C	t	c	0.2902	0.0501	7.086e-09	++++	55.3	6.706	3	0.08189
chr1:155317797:C:T	t	c	-0.3894	0.0625	4.685e-10	----	29.2	4.236	3	0.2371
chr1:155394894:G:A	a	g	-0.3913	0.0627	4.317e-10	----	17.7	3.646	3	0.3023
chr1:155490050:A:G	a	g	0.4240	0.0631	1.871e-11	++++	0.0	1.806	3	0.6137
chr1:155060276:A:G	a	g	-0.3735	0.0641	5.688e-09	----	0.0	2.298	3	0.5129
chr1:155168172:C:T	t	c	-0.2854	0.0500	1.125e-08	----	49.7	5.959	3	0.1136
chr1:155160272:C:T	t	c	0.4367	0.0583	6.724e-14	++++	34.2	4.558	3	0.2072
chr1:155619507:T:C	t	c	0.3757	0.0641	4.663e-09	++++	0.0	1.521	3	0.6775
chr1:155166081:A:G	a	g	0.2902	0.0501	7.136e-09	++++	53.5	6.452	3	0.09158
chr1:155169753:G:A	a	g	-0.2898	0.0501	7.466e-09	----	51.8	6.229	3	0.101
chr1:155165974:C:T	t	c	-0.2908	0.0502	7.
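
The same filter can be sketched in Python, which also makes it easy to sort the hits by p-value. This is a minimal sketch using only the standard library: the function name `genome_wide_hits` is hypothetical, the first three data rows are copied from the output above, and the last row is a made-up non-significant variant included only to show that it is filtered out.

```python
import csv
import io

GENOME_WIDE_P = 5e-8  # same threshold as the awk filter


def genome_wide_hits(handle, threshold=GENOME_WIDE_P):
    """Return rows of a METAL .tbl file with P-value <= threshold, sorted by P-value."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = [row for row in reader if float(row["P-value"]) <= threshold]
    return sorted(hits, key=lambda row: float(row["P-value"]))


# In-memory stand-in for the .tbl file; the last row is hypothetical
sample = io.StringIO(
    "MarkerName\tAllele1\tAllele2\tEffect\tStdErr\tP-value\tDirection\tHetISq\tHetChiSq\tHetDf\tHetPVal\n"
    "chr1:155235878:G:T\tt\tg\t-0.4494\t0.0589\t2.397e-14\t----\t0.0\t2.878\t3\t0.4108\n"
    "chr1:155167551:T:C\tt\tc\t0.2868\t0.0509\t1.749e-08\t++++\t54.0\t6.520\t3\t0.08888\n"
    "chr1:155160272:C:T\tt\tc\t0.4367\t0.0583\t6.724e-14\t++++\t34.2\t4.558\t3\t0.2072\n"
    "chr1:100000000:A:G\ta\tg\t0.0100\t0.0500\t0.8415\t+-+-\t0.0\t1.000\t3\t0.8013\n"
)

hits = genome_wide_hits(sample)
for row in hits:
    print(row["MarkerName"], row["Effect"], row["P-value"])
```

To run it on the real results, an open file handle for the `.tbl` file can be passed in place of `sample`.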

# Investigate which hits were imputed and which were genotyped

In [5]:
%%bash
# Write the marker IDs (column 1) of the genome-wide significant hits to a file
awk '$6 <= 5e-8 {print $1}' "${WORK_DIR}/data/AFR-AAC-META/AFR-AAC-META-UpdatedforMETAL1.tbl" > "${WORK_DIR}/data/AFR-AAC-META/genomewide-hits.txt"

In [6]:
%%bash
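# Match marker IDs as fixed strings and whole words, so that an ID cannot match as a prefix of a longer ID
grep -F -w -f "${WORK_DIR}/data/AFR-AAC-META/genomewide-hits.txt" "${WORK_DIR}/data/AFR/GP2-v4-AFR-wNIGERIAN-NB/GP2-v4-AFR-wNIGERIAN-NB-noSEXPHENO.pvar" > "${WORK_DIR}/data/AFR-AAC-META/genomewide-hits-extractPVAR.txt"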
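
Once the matching `.pvar` lines are extracted, each hit can be labeled as imputed or genotyped. The sketch below rests on assumptions that should be checked against the actual file: that the variant ID is the third tab-separated column, that the last column is `INFO`, and that `INFO` carries `IMPUTED`, `TYPED`, or `TYPED_ONLY` flags in the style of imputation server output. The function name `classify_hits` and the example lines are hypothetical.

```python
def classify_hits(pvar_lines, hit_ids):
    """Map each hit ID to 'imputed', 'genotyped', or 'unknown' from .pvar lines."""
    status = {}
    for line in pvar_lines:
        # Skip header/meta lines and blank lines
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        variant_id = fields[2]  # assumed ID column
        if variant_id not in hit_ids:
            continue
        flags = set(fields[-1].split(";"))  # assumed INFO column
        if "IMPUTED" in flags:
            status[variant_id] = "imputed"
        elif "TYPED" in flags or "TYPED_ONLY" in flags:
            status[variant_id] = "genotyped"
        else:
            status[variant_id] = "unknown"
    return status


# Hypothetical .pvar lines; the INFO values are made up for illustration
example_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tFILTER\tINFO\n",
    "1\t155235878\tchr1:155235878:G:T\tG\tT\t.\tIMPUTED;AF=0.17\n",
    "1\t155160272\tchr1:155160272:C:T\tC\tT\t.\tTYPED;AF=0.21\n",
    "1\t155000000\tchr1:155000000:A:G\tA\tG\t.\tIMPUTED;AF=0.30\n",
]
hit_ids = {"chr1:155235878:G:T", "chr1:155160272:C:T"}

result = classify_hits(example_lines, hit_ids)
for variant_id, label in sorted(result.items()):
    print(variant_id, label)
```

Matching on the exact ID with a set lookup avoids the partial matches that a pattern-based search can produce.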