# SMMAT Analyses for rare variants (MAF < 0.01) on the updated AD Data

***Written by Bale, 2023***

This notebook documents the population-specific SMMAT analyses using the updated AD phenotype data (https://github.com/gaow/alzheimers-family/blob/master/notebook/20221121_AD_pheno_update.ipynb).

Major updates to the pheno data:

* Most of the missing age data have been filled in
* Missing APOE4 information was updated based on the sequence data
* Controls under 60 years of age were excluded
* For the European samples (n = 15), age values miscoded as, e.g., 999 or 8027 were replaced by the correct age
* Unaffected singletons were removed
* PCs were recalculated based on the updated pheno data

## File paths
Pheno data
 > /mnt/mfs/statgen/alzheimers-family/pheno/pheno_updated_20221121/
 
Geno data: the jointly called EFIGA and NIALOAD WGS data are available here
 > /mnt/mfs/statgen/alzheimers-family/normalized_bed/normalized_merged_autosome.*  
 
 QCed geno data used for the analyses of the African, European and Hispanic samples
 
 > /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/geno/$i.*
 
 > /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/geno/$i_{1..22}.*
 
The null models and GRM were taken from the common-variant analysis: ~/project_bst/notebook_Bale/AD_family/20221209_common_variants_AD_analysis.ipynb

# Rare Variants analyses

## Annotation

In [None]:
#Annotation
[submit_chrs]
import glob
bim_files = glob.glob("/mnt/mfs/hgrcgrid/shared/Family_WGS/plink_files/*.bim")
input: bim_files,group_by = 1
bash:expand = "${ }"
sos run ~/bioworkflows/variant-annotation/annovar.ipynb annovar \
    --cwd /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation_2 \
    --bim_name ${_input} \
    --humandb /mnt/mfs/statgen/isabelle/REF/humandb \
    --xref_path /mnt/mfs/statgen/isabelle/REF/humandb \
    --numThreads 1 \
    --name_prefix all \
    --container_annovar /mnt/mfs/statgen/containers/gatk4-annovar.sif \
    -c bioworkflows/admin/csg.yml -q csg -s force &> annovar.log

'Error: invalid record found in annovar outputfile: <gwasCatalog         12 56095129 56095129 G A 12:56095129:A:I_1'

Annotation of chromosome 12 failed with the error above. For now, the offending record is deleted from African_12.bim.

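Four criteria are used to filter variants:

* Model 1: loss of function (start loss, stop gain, splicing)
* Model 2: loss of function + damaging nonsynonymous variants (REVEL score >= 0.5)
* Model 3: loss of function + all nonsynonymous variants
* Model 4: all variants annotated to a gene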
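As a rough illustration of these four criteria, the sketch below classifies annotated variants in plain Python. The ANNOVAR field names and the REVEL cutoff of 0.5 follow the R code in this notebook; the function name `variant_models` and the toy records are hypothetical and only serve the example.

```python
# Sketch only: mirrors the four variant sets used in this notebook.
LOF_EXONIC = {"startloss", "stopgain"}
SPLICING = {"splicing", "exonic;splicing", "ncRNA_exonic;splicing", "ncRNA_splicing"}


def variant_models(variant, revel_cutoff=0.5):
    """Return the set of model numbers (1 to 4) that a variant belongs to."""
    exonic = variant.get("ExonicFunc.refGene")
    func = variant.get("Func.refGene")
    revel = variant.get("REVEL_score")  # None when REVEL is missing
    lof = exonic in LOF_EXONIC or func in SPLICING
    nonsyn = exonic == "nonsynonymous SNV"
    models = set()
    if lof:
        models.add(1)
    # Model 2 keeps nonsynonymous variants only when REVEL is present and high
    if lof or (nonsyn and revel is not None and revel >= revel_cutoff):
        models.add(2)
    if lof or nonsyn:
        models.add(3)
    if variant.get("Gene.refGene") is not None:
        models.add(4)
    return models


# Hypothetical records for illustration
stopgain = {"Gene.refGene": "GENE_A", "ExonicFunc.refGene": "stopgain",
            "Func.refGene": "exonic"}
benign_missense = {"Gene.refGene": "GENE_B", "ExonicFunc.refGene": "nonsynonymous SNV",
                   "Func.refGene": "exonic", "REVEL_score": 0.1}
print(variant_models(stopgain))
print(variant_models(benign_missense))
```

The models are nested (every model 1 variant is also in models 2 and 3), which is why the stop-gain record falls into all four sets while the low-REVEL missense record falls only into models 3 and 4.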

In [None]:
[group]
import glob
csv_files = glob.glob("/mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/*.csv")
input: csv_files,group_by = 1
output: f'{_input[0]:nnn}'
task: trunk_workers = 1, trunk_size = 1, walltime = '10h', mem = '32G', cores = 1, tags = f'{step_name}_{_output[0]:bn}'
R:  container= '/mnt/mfs/statgen/containers/lmm.sif',expand = "${ }"
    library('dplyr')
    data = read.csv('${_input}')
    data$Gene.refGene = sub("\\;.*", "", data$Gene.refGene)
    df = data[, c('Gene.refGene', 'Chr','Start','Alt','Ref','ExonicFunc.refGene','Func.refGene','AF_afr','AF_nfe','AF_amr','AF_afr.1','AF_nfe.1','AF_amr.1','REVEL_score')]
    df[,15]=1
    df$REVEL_score = as.numeric(as.character(df$REVEL_score))
    df$AF_afr = as.numeric(as.character(df$AF_afr))
    df$AF_nfe = as.numeric(as.character(df$AF_nfe))
    df$AF_amr = as.numeric(as.character(df$AF_amr))
    df$AF_afr.1 = as.numeric(as.character(df$AF_afr.1))
    df$AF_nfe.1 = as.numeric(as.character(df$AF_nfe.1))
    df$AF_amr.1 = as.numeric(as.character(df$AF_amr.1))  
    model1=df%>%filter((ExonicFunc.refGene %in% c('startloss','stopgain')) | (Func.refGene %in% c('splicing', 'exonic;splicing', 'ncRNA_exonic;splicing', 'ncRNA_splicing')))
    model2=df%>%filter((ExonicFunc.refGene %in% c('startloss','stopgain')) | (Func.refGene %in% c('splicing', 'exonic;splicing', 'ncRNA_exonic;splicing', 'ncRNA_splicing')) | (ExonicFunc.refGene=='nonsynonymous SNV' & REVEL_score>=0.5))
    model3=df%>%filter((ExonicFunc.refGene %in% c('nonsynonymous SNV','startloss','stopgain')) | (Func.refGene %in% c('splicing', 'exonic;splicing', 'ncRNA_exonic;splicing', 'ncRNA_splicing')))
    model4=df%>%filter(!is.na(Gene.refGene))
    #African
    model1_a1=model1[which((model1$AF_afr<=0.01&model1$AF_afr.1<=0.01)|is.na(model1$AF_afr)),c(1, 2, 3, 4, 5, 15)]
    model1_a5=model1[which((model1$AF_afr<=0.05&model1$AF_afr.1<=0.05)|is.na(model1$AF_afr)),c(1, 2, 3, 4, 5, 15)]
    model2_a1=model2[which((model2$AF_afr<=0.01&model2$AF_afr.1<=0.01)|is.na(model2$AF_afr)),c(1, 2, 3, 4, 5, 15)]
    model2_a5=model2[which((model2$AF_afr<=0.05&model2$AF_afr.1<=0.05)|is.na(model2$AF_afr)),c(1, 2, 3, 4, 5, 15)]
    model3_a1=model3[which((model3$AF_afr<=0.01&model3$AF_afr.1<=0.01)|is.na(model3$AF_afr)),c(1, 2, 3, 4, 5, 15)]
    model3_a5=model3[which((model3$AF_afr<=0.05&model3$AF_afr.1<=0.05)|is.na(model3$AF_afr)),c(1, 2, 3, 4, 5, 15)]
    model4_a1=model4[which((model4$AF_afr<=0.01&model4$AF_afr.1<=0.01)|is.na(model4$AF_afr)),c(1, 2, 3, 4, 5, 15)]
    model4_a5=model4[which((model4$AF_afr<=0.05&model4$AF_afr.1<=0.05)|is.na(model4$AF_afr)),c(1, 2, 3, 4, 5, 15)]
    write.table(model1_a1, '${_output[0]}.African.model1.af1.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model1_a5, '${_output[0]}.African.model1.af5.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model2_a1, '${_output[0]}.African.model2.af1.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model2_a5, '${_output[0]}.African.model2.af5.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model3_a1, '${_output[0]}.African.model3.af1.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model3_a5, '${_output[0]}.African.model3.af5.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model4_a1, '${_output[0]}.African.model4.af1.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model4_a5, '${_output[0]}.African.model4.af5.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    #European
    model1_e1=model1[which((model1$AF_nfe<=0.01&model1$AF_nfe.1<=0.01)|is.na(model1$AF_nfe)),c(1, 2, 3, 4, 5, 15)]
    model1_e5=model1[which((model1$AF_nfe<=0.05&model1$AF_nfe.1<=0.05)|is.na(model1$AF_nfe)),c(1, 2, 3, 4, 5, 15)]
    model2_e1=model2[which((model2$AF_nfe<=0.01&model2$AF_nfe.1<=0.01)|is.na(model2$AF_nfe)),c(1, 2, 3, 4, 5, 15)]
    model2_e5=model2[which((model2$AF_nfe<=0.05&model2$AF_nfe.1<=0.05)|is.na(model2$AF_nfe)),c(1, 2, 3, 4, 5, 15)]
    model3_e1=model3[which((model3$AF_nfe<=0.01&model3$AF_nfe.1<=0.01)|is.na(model3$AF_nfe)),c(1, 2, 3, 4, 5, 15)]
    model3_e5=model3[which((model3$AF_nfe<=0.05&model3$AF_nfe.1<=0.05)|is.na(model3$AF_nfe)),c(1, 2, 3, 4, 5, 15)]
    model4_e1=model4[which((model4$AF_nfe<=0.01&model4$AF_nfe.1<=0.01)|is.na(model4$AF_nfe)),c(1, 2, 3, 4, 5, 15)]
    model4_e5=model4[which((model4$AF_nfe<=0.05&model4$AF_nfe.1<=0.05)|is.na(model4$AF_nfe)),c(1, 2, 3, 4, 5, 15)]
    write.table(model1_e1, '${_output[0]}.European.model1.af1.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model1_e5, '${_output[0]}.European.model1.af5.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model2_e1, '${_output[0]}.European.model2.af1.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model2_e5, '${_output[0]}.European.model2.af5.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model3_e1, '${_output[0]}.European.model3.af1.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model3_e5, '${_output[0]}.European.model3.af5.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model4_e1, '${_output[0]}.European.model4.af1.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model4_e5, '${_output[0]}.European.model4.af5.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    #Hispanic
    model1_h1=model1%>%filter((AF_afr<=0.01&AF_afr.1<=0.01)|is.na(AF_afr))%>%filter((AF_afr<=0.01&AF_afr.1<=0.01)|is.na(AF_afr))%>%filter((AF_amr<=0.01&AF_amr.1<=0.01)|is.na(AF_amr))%>%select(c(1,2, 3, 4, 5, 15))
    model1_h5=model1%>%filter((AF_afr<=0.05&AF_afr.1<=0.05)|is.na(AF_afr))%>%filter((AF_afr<=0.05&AF_afr.1<=0.05)|is.na(AF_afr))%>%filter((AF_amr<=0.05&AF_amr.1<=0.05)|is.na(AF_amr))%>%select(c(1,2, 3, 4, 5, 15))
    model2_h1=model2%>%filter((AF_afr<=0.01&AF_afr.1<=0.01)|is.na(AF_afr))%>%filter((AF_afr<=0.01&AF_afr.1<=0.01)|is.na(AF_afr))%>%filter((AF_amr<=0.01&AF_amr.1<=0.01)|is.na(AF_amr))%>%select(c(1,2, 3, 4, 5, 15))
    model2_h5=model2%>%filter((AF_afr<=0.05&AF_afr.1<=0.05)|is.na(AF_afr))%>%filter((AF_afr<=0.05&AF_afr.1<=0.05)|is.na(AF_afr))%>%filter((AF_amr<=0.05&AF_amr.1<=0.05)|is.na(AF_amr))%>%select(c(1,2, 3, 4, 5, 15))
    model3_h1=model3%>%filter((AF_afr<=0.01&AF_afr.1<=0.01)|is.na(AF_afr))%>%filter((AF_afr<=0.01&AF_afr.1<=0.01)|is.na(AF_afr))%>%filter((AF_amr<=0.01&AF_amr.1<=0.01)|is.na(AF_amr))%>%select(c(1,2, 3, 4, 5, 15))
    model3_h5=model3%>%filter((AF_afr<=0.05&AF_afr.1<=0.05)|is.na(AF_afr))%>%filter((AF_afr<=0.05&AF_afr.1<=0.05)|is.na(AF_afr))%>%filter((AF_amr<=0.05&AF_amr.1<=0.05)|is.na(AF_amr))%>%select(c(1,2, 3, 4, 5, 15))
    model4_h1=model4%>%filter((AF_afr<=0.01&AF_afr.1<=0.01)|is.na(AF_afr))%>%filter((AF_afr<=0.01&AF_afr.1<=0.01)|is.na(AF_afr))%>%filter((AF_amr<=0.01&AF_amr.1<=0.01)|is.na(AF_amr))%>%select(c(1,2, 3, 4, 5, 15))
    model4_h5=model4%>%filter((AF_afr<=0.05&AF_afr.1<=0.05)|is.na(AF_afr))%>%filter((AF_afr<=0.05&AF_afr.1<=0.05)|is.na(AF_afr))%>%filter((AF_amr<=0.05&AF_amr.1<=0.05)|is.na(AF_amr))%>%select(c(1,2, 3, 4, 5, 15))
    write.table(model1_h1, '${_output[0]}.Hispanic.model1.af1.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model1_h5, '${_output[0]}.Hispanic.model1.af5.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model2_h1, '${_output[0]}.Hispanic.model2.af1.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model2_h5, '${_output[0]}.Hispanic.model2.af5.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model3_h1, '${_output[0]}.Hispanic.model3.af1.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model3_h5, '${_output[0]}.Hispanic.model3.af5.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model4_h1, '${_output[0]}.Hispanic.model4.af1.group.file', col.names = F, row.names = F, quote = F, sep = '\t')
    write.table(model4_h5, '${_output[0]}.Hispanic.model4.af5.group.file', col.names = F, row.names = F, quote = F, sep = '\t')

## Generate QCed geno data

In [None]:
# split the geno file per pop and qc
ml PLINK/2.0
for i in African European Hispanic; do   
 plink --bfile /mnt/mfs/statgen/alzheimers-family/normalized_bed/normalized_merged_autosome --geno 0.1 --hwe 5e-08  --keep /mnt/mfs/statgen/alzheimers-family/pheno/pheno_updated_20221121/$i.id --maf 0.00000000000000000000000000000000000001 --make-bed --memory 14400.0 --mind 0.1 --out /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/geno/$i --threads 20
done


In [None]:
# generate chromosome-specific genoFile for each ancestry
for i in African European Hispanic; do
    for chr in {1..22}; do
        plink --bfile /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/geno/$i \
              --chr $chr --make-bed \
              --out /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/geno/${i}_$chr
    done
done
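Before submitting the SMMAT jobs, it can help to confirm that every per-chromosome PLINK fileset was written. The sketch below is a hypothetical helper (`missing_plink_files` is not part of any workflow used here); it assumes the `<ancestry>_<chr>.{bed,bim,fam}` naming produced by the loop above.

```python
from pathlib import Path


def missing_plink_files(geno_dir, ancestry, chromosomes=range(1, 23)):
    """List the expected <ancestry>_<chr>.{bed,bim,fam} files that are absent."""
    missing = []
    for chrom in chromosomes:
        for ext in ("bed", "bim", "fam"):
            path = Path(geno_dir) / f"{ancestry}_{chrom}.{ext}"
            if not path.exists():
                missing.append(path)
    return missing
```

For example, `missing_plink_files("/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/geno", "European")` should return an empty list once all 22 autosomes have been split.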

## Population specific analyses

The analyses were based on two variant sets:
* Model 1: gnomAD AF <= 0.01 and pLOF (start loss, stop gain, splicing) variants
* Model 2: pLOF + missense

Note: in the scripts, model 2 is specified as model 3 (they are exactly the same).

Column names in the result files:

* CHR: Chr
* POS: Start position
* SNP: group or gene
* N: sample size
* P: O.pval (p-value from the SKAT-O test)
* BPVALUE: B.pval (p-value from the burden test)
* SCORE: B.score
* VAR: B.var
* NV: n.variants
* FREQ: freq.mean (mean MAF of the variants included in the aggregate test)
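The column correspondence described here can be written down as a small lookup table. This is only a sketch of the renaming, not the workflow's own code; the dictionary name `SMMAT_TO_RESULT`, the helper `rename_columns`, and the example row are hypothetical.

```python
# SMMAT output name -> column name used in the result files (as listed in this notebook)
SMMAT_TO_RESULT = {
    "group": "SNP",
    "O.pval": "P",
    "B.pval": "BPVALUE",
    "B.score": "SCORE",
    "B.var": "VAR",
    "n.variants": "NV",
    "freq.mean": "FREQ",
}


def rename_columns(row):
    """Rename the keys of one result row; unknown columns are kept unchanged."""
    return {SMMAT_TO_RESULT.get(name, name): value for name, value in row.items()}


# Hypothetical row for illustration
example = {"group": "GENE_A", "n.variants": 12, "O.pval": 0.03, "B.pval": 0.2}
print(rename_columns(example))
```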

## Non-Hispanic whites

### Model 1 

In [None]:
#!/bin/sh
#$ -l h_rt=23:00:00
#$ -l h_vmem=200G
#$ -N smmat_AD_European
#$ -o /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skatO/European-$JOB_ID.out
#$ -e /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skatO/European-$JOB_ID.err  
#$ -j y
#$ -S /bin/bash

export PATH=$HOME/miniconda3/bin:$PATH

module load Singularity/3.5.3

# with no APOE4 adjustment

sos run ~/project2022/bioworkflows/GWAS/LMM2.ipynb SMMAT \
--cwd  /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skatO \
--groupFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.European.model1.af1.group.file \
--posFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.hg38.hg38_multianno.csv \
--bfile /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/geno/European_{1..22}.bed \
--null_model /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/null2/geno_qced.European.European.pca.projected.noAPOE.rds \
--grmFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/kinship/geno_pruned_European.sXX.txt \
--phenoFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/PCA/plots/European.pca.projected.txt \
--formatFile   ~/project2022/bioworkflows/GWAS/data/smmat_template_2.yml \
--phenoCol AD \
--label_annotate SNP \
--maf_max_filter 1 \
--covarMaxLevels 10 \
--numThreads 1 \
--bgenMinMAF 0.0 \
--container_lmm /mnt/vast/hpc/csg/containers/lmm.sif \
--container_marp /mnt/vast/hpc/csg/containers/marp.sif \
--geno_filter 0.01 \
--nperbatch 100 \
-c ~/project2022/bioworkflows/admin/csg.yml -s force &> /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skatO/EUR_noapoe_model1.log

# with APOE4 adjustment

module load Singularity/3.5.3

sos run ~/project2022/bioworkflows/GWAS/LMM2.ipynb SMMAT \
--cwd  /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model1_skatO \
--groupFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.European.model1.af1.group.file \
--posFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.hg38.hg38_multianno.csv \
--bfile /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/geno/European_{1..22}.bed \
--null_model /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/null2/geno_qced.European.European.pca.projected.APOE.rds \
--grmFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/kinship/geno_pruned_European.sXX.txt \
--phenoFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/PCA/plots/European.pca.projected.txt \
--formatFile   ~/project2022/bioworkflows/GWAS/data/smmat_template_2.yml \
--phenoCol AD \
--label_annotate SNP \
--maf_max_filter 1 \
--covarMaxLevels 10 \
--numThreads 1 \
--bgenMinMAF 0.0 \
--container_lmm /mnt/vast/hpc/csg/containers/lmm.sif \
--container_marp /mnt/vast/hpc/csg/containers/marp.sif \
--geno_filter 0.01 \
--nperbatch 100 \
-c ~/project2022/bioworkflows/admin/csg.yml -s force &> /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model1_skatO/EUR_apoe_model1.log

### Model 2 

In [None]:
#!/bin/sh
#$ -l h_rt=23:00:00
#$ -l h_vmem=200G
#$ -N smmat_AD_European
#$ -o /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skatO/European-$JOB_ID.out
#$ -e /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skatO/European-$JOB_ID.err  
#$ -j y
#$ -S /bin/bash

export PATH=$HOME/miniconda3/bin:$PATH

module load Singularity/3.5.3

# with no APOE4 adjustment

sos run ~/project2022/bioworkflows/GWAS/LMM2.ipynb SMMAT \
--cwd  /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skatO \
--groupFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.European.model3.af1.group.file \
--posFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.hg38.hg38_multianno.csv \
--bfile /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/geno/European_{1..22}.bed \
--null_model /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/null2/geno_qced.European.European.pca.projected.noAPOE.rds \
--grmFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/kinship/geno_pruned_European.sXX.txt \
--phenoFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/PCA/plots/European.pca.projected.txt \
--formatFile   ~/project2022/bioworkflows/GWAS/data/smmat_template_2.yml \
--phenoCol AD \
--label_annotate SNP \
--maf_max_filter 1 \
--covarMaxLevels 10 \
--numThreads 1 \
--bgenMinMAF 0.0 \
--container_lmm /mnt/vast/hpc/csg/containers/lmm.sif \
--container_marp /mnt/vast/hpc/csg/containers/marp.sif \
--geno_filter 0.01 \
--nperbatch 100 \
-c ~/project2022/bioworkflows/admin/csg.yml -s force &> /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skatO/EUR_noapoe_model3.log

# with APOE4 adjustment

module load Singularity/3.5.3

sos run ~/project2022/bioworkflows/GWAS/LMM2.ipynb SMMAT \
--cwd  /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model3_skatO \
--groupFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.European.model3.af1.group.file \
--posFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.hg38.hg38_multianno.csv \
--bfile /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/geno/European_{1..22}.bed \
--null_model /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/null2/geno_qced.European.European.pca.projected.APOE.rds \
--grmFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/kinship/geno_pruned_European.sXX.txt \
--phenoFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/PCA/plots/European.pca.projected.txt \
--formatFile   ~/project2022/bioworkflows/GWAS/data/smmat_template_2.yml \
--phenoCol AD \
--label_annotate SNP \
--maf_max_filter 1 \
--covarMaxLevels 10 \
--numThreads 1 \
--bgenMinMAF 0.0 \
--container_lmm /mnt/vast/hpc/csg/containers/lmm.sif \
--container_marp /mnt/vast/hpc/csg/containers/marp.sif \
--geno_filter 0.01 \
--nperbatch 100 \
-c ~/project2022/bioworkflows/admin/csg.yml -s force &> /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model3_skatO/EUR_apoe_model3.log

## Caribbean Hispanics

### Model 1

In [None]:
#!/bin/sh
#$ -l h_rt=23:00:00
#$ -l h_vmem=200G
#$ -N smmat_AD_HIS
#$ -o /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skatO/Hispanic-$JOB_ID.out
#$ -e /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skatO/Hispanic-$JOB_ID.err  
#$ -j y
#$ -S /bin/bash

export PATH=$HOME/miniconda3/bin:$PATH

module load Singularity/3.5.3



#with no APOE4 adjustment

sos run ~/project2022/bioworkflows/GWAS/LMM2.ipynb SMMAT \
--cwd  /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skatO \
--groupFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.Hispanic.model1.af1.group.file \
--posFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.hg38.hg38_multianno.csv \
--bfile /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/geno/Hispanic_{1..22}.bed \
--null_model /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/null2/geno_qced.Hispanic.Hispanic.pca.projected.noAPOE.rds \
--grmFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/kinship/geno_pruned_Hispanic.sXX.txt \
--phenoFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/PCA/plots/Hispanic.pca.projected.txt \
--formatFile   ~/project2022/bioworkflows/GWAS/data/smmat_template_2.yml \
--phenoCol AD \
--label_annotate SNP \
--maf_max_filter 1 \
--covarMaxLevels 10 \
--numThreads 1 \
--bgenMinMAF 0.0 \
--container_lmm /mnt/vast/hpc/csg/containers/lmm.sif \
--container_marp /mnt/vast/hpc/csg/containers/marp.sif \
--geno_filter 0.01 \
--nperbatch 100 \
-c ~/project2022/bioworkflows/admin/csg.yml -s force &> /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skatO/HIS_noapoe_model1.log


# with APOE4 adjustment

module load Singularity/3.5.3

sos run ~/project2022/bioworkflows/GWAS/LMM2.ipynb SMMAT \
--cwd  /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model1_skatO \
--groupFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.Hispanic.model1.af1.group.file \
--posFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.hg38.hg38_multianno.csv \
--bfile /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/geno/Hispanic_{1..22}.bed \
--null_model /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/null2/geno_qced.Hispanic.Hispanic.pca.projected.APOE.rds \
--grmFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/kinship/geno_pruned_Hispanic.sXX.txt \
--phenoFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/PCA/plots/Hispanic.pca.projected.txt \
--formatFile   ~/project2022/bioworkflows/GWAS/data/smmat_template_2.yml \
--phenoCol AD \
--label_annotate SNP \
--maf_max_filter 1 \
--covarMaxLevels 10 \
--numThreads 1 \
--bgenMinMAF 0.0 \
--container_lmm /mnt/vast/hpc/csg/containers/lmm.sif \
--container_marp /mnt/vast/hpc/csg/containers/marp.sif \
--geno_filter 0.01 \
--nperbatch 100 \
-c ~/project2022/bioworkflows/admin/csg.yml -s force &> /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model1_skatO/HIS_apoe_model1.log

### Model 2

In [None]:
#!/bin/sh
#$ -l h_rt=23:00:00
#$ -l h_vmem=200G
#$ -N smmat_AD_HIS
#$ -o /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skatO/Hispanic-$JOB_ID.out
#$ -e /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skatO/Hispanic-$JOB_ID.err  
#$ -j y
#$ -S /bin/bash

export PATH=$HOME/miniconda3/bin:$PATH

module load Singularity/3.5.3



#with no APOE4 adjustment

sos run ~/project2022/bioworkflows/GWAS/LMM2.ipynb SMMAT \
--cwd  /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skatO \
--groupFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.Hispanic.model3.af1.group.file \
--posFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.hg38.hg38_multianno.csv \
--bfile /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/geno/Hispanic_{1..22}.bed \
--null_model /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/null2/geno_qced.Hispanic.Hispanic.pca.projected.noAPOE.rds \
--grmFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/kinship/geno_pruned_Hispanic.sXX.txt \
--phenoFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/PCA/plots/Hispanic.pca.projected.txt \
--formatFile   ~/project2022/bioworkflows/GWAS/data/smmat_template_2.yml \
--phenoCol AD \
--label_annotate SNP \
--maf_max_filter 1 \
--covarMaxLevels 10 \
--numThreads 1 \
--bgenMinMAF 0.0 \
--container_lmm /mnt/vast/hpc/csg/containers/lmm.sif \
--container_marp /mnt/vast/hpc/csg/containers/marp.sif \
--geno_filter 0.01 \
--nperbatch 100 \
-c ~/project2022/bioworkflows/admin/csg.yml -s force &> /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skatO/HIS_noapoe_model3.log


# with APOE4 adjustment

module load Singularity/3.5.3

sos run ~/project2022/bioworkflows/GWAS/LMM2.ipynb SMMAT \
--cwd  /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model3_skatO \
--groupFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.Hispanic.model3.af1.group.file \
--posFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.hg38.hg38_multianno.csv \
--bfile /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/geno/Hispanic_{1..22}.bed \
--null_model /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/null2/geno_qced.Hispanic.Hispanic.pca.projected.APOE.rds \
--grmFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/kinship/geno_pruned_Hispanic.sXX.txt \
--phenoFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/PCA/plots/Hispanic.pca.projected.txt \
--formatFile   ~/project2022/bioworkflows/GWAS/data/smmat_template_2.yml \
--phenoCol AD \
--label_annotate SNP \
--maf_max_filter 1 \
--covarMaxLevels 10 \
--numThreads 1 \
--bgenMinMAF 0.0 \
--container_lmm /mnt/vast/hpc/csg/containers/lmm.sif \
--container_marp /mnt/vast/hpc/csg/containers/marp.sif \
--geno_filter 0.01 \
--nperbatch 100 \
-c ~/project2022/bioworkflows/admin/csg.yml -s force &> /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model3_skatO/HIS_apoe_model3.log

## African Americans

### Model 1

In [None]:
#!/bin/sh
#$ -l h_rt=23:00:00
#$ -l h_vmem=120G
#$ -N smmat_AD_AA
#$ -o /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skatO/African-$JOB_ID.out
#$ -e /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skatO/African-$JOB_ID.err  
#$ -j y
#$ -S /bin/bash

export PATH=$HOME/miniconda3/bin:$PATH

module load Singularity/3.5.3

# with no APOE4 adjustment

sos run ~/project2022/bioworkflows/GWAS/LMM2.ipynb SMMAT \
--cwd  /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skatO \
--groupFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.African.model1.af1.group.file \
--posFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.hg38.hg38_multianno.csv \
--bfile /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/geno/African_{1..22}.bed \
--null_model /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/null2/geno_qced.African.African.pca.projected.noAPOE.rds \
--grmFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/kinship/geno_pruned_African.sXX.txt \
--phenoFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/PCA/plots/African.pca.projected.txt \
--formatFile   ~/project2022/bioworkflows/GWAS/data/smmat_template_2.yml \
--phenoCol AD \
--label_annotate SNP \
--maf_max_filter 1 \
--covarMaxLevels 10 \
--numThreads 1 \
--bgenMinMAF 0.0 \
--container_lmm /mnt/vast/hpc/csg/containers/lmm.sif \
--container_marp /mnt/vast/hpc/csg/containers/marp.sif \
--geno_filter 0.01 \
--nperbatch 100 \
-c ~/project2022/bioworkflows/admin/csg.yml -s force &> /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skatO/smmat_AA_noapoe_model1.log

# with APOE4 adjustment

sos run ~/project2022/bioworkflows/GWAS/LMM2.ipynb SMMAT \
--cwd  /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model1_skatO \
--groupFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.African.model1.af1.group.file \
--posFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.hg38.hg38_multianno.csv \
--bfile /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/geno/African_{1..22}.bed \
--null_model /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/null2/geno_qced.African.African.pca.projected.APOE.rds \
--grmFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/kinship/geno_pruned_African.sXX.txt \
--phenoFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/PCA/plots/African.pca.projected.txt \
--formatFile   ~/project2022/bioworkflows/GWAS/data/smmat_template_2.yml \
--phenoCol AD \
--label_annotate SNP \
--maf_max_filter 1 \
--covarMaxLevels 10 \
--numThreads 1 \
--bgenMinMAF 0.0 \
--container_lmm /mnt/vast/hpc/csg/containers/lmm.sif \
--container_marp /mnt/vast/hpc/csg/containers/marp.sif \
--geno_filter 0.01 \
--nperbatch 100 \
-c ~/project2022/bioworkflows/admin/csg.yml -s force &> /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model1_skatO/AA_apoe_model1.log

### Model 2

In [None]:
#!/bin/sh
#$ -l h_rt=23:00:00
#$ -l h_vmem=200G
#$ -N smmat_AD_AA_model3_af1
#$ -o /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skatO/African_model3_af1-$JOB_ID.out
#$ -e /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skatO/African_model3_af1-$JOB_ID.err  
#$ -j y
#$ -S /bin/bash

export PATH=$HOME/miniconda3/bin:$PATH

module load Singularity/3.5.3


# without APOE4 adjustment

sos run ~/project2022/bioworkflows/GWAS/LMM2.ipynb SMMAT \
--cwd  /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skatO \
--groupFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.African.model3.af1.group.file \
--posFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.hg38.hg38_multianno.csv \
--bfile /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/geno/African_{1..22}.bed \
--null_model /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/null2/geno_qced.African.African.pca.projected.noAPOE.rds \
--grmFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/kinship/geno_pruned_African.sXX.txt \
--phenoFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/PCA/plots/African.pca.projected.txt \
--formatFile   ~/project2022/bioworkflows/GWAS/data/smmat_template_2.yml \
--phenoCol AD \
--label_annotate SNP \
--maf_max_filter 1 \
--covarMaxLevels 10 \
--numThreads 1 \
--bgenMinMAF 0.0 \
--container_lmm /mnt/vast/hpc/csg/containers/lmm.sif \
--container_marp /mnt/vast/hpc/csg/containers/marp.sif \
--geno_filter 0.01 \
--nperbatch 100 \
-c ~/project2022/bioworkflows/admin/csg.yml -s force &> /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skatO/AA_noapoe_model3.log

# with APOE4 adjustment

sos run ~/project2022/bioworkflows/GWAS/LMM2.ipynb SMMAT \
--cwd  /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model3_skatO \
--groupFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.African.model3.af1.group.file \
--posFile /mnt/mfs/statgen/alzheimers-family/SMMAT/20210802/annotation/EFIGA_NIALOAD_chr{1..22}.hg38.hg38_multianno.csv \
--bfile /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/geno/African_{1..22}.bed \
--null_model /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/null2/geno_qced.African.African.pca.projected.APOE.rds \
--grmFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/gmmat/kinship/geno_pruned_African.sXX.txt \
--phenoFile /mnt/mfs/statgen/alzheimers-family/AD_common_variants/PCA/plots/African.pca.projected.txt \
--formatFile   ~/project2022/bioworkflows/GWAS/data/smmat_template_2.yml \
--phenoCol AD \
--label_annotate SNP \
--maf_max_filter 1 \
--covarMaxLevels 10 \
--numThreads 1 \
--bgenMinMAF 0.0 \
--container_lmm /mnt/vast/hpc/csg/containers/lmm.sif \
--container_marp /mnt/vast/hpc/csg/containers/marp.sif \
--geno_filter 0.01 \
--nperbatch 100 \
-c ~/project2022/bioworkflows/admin/csg.yml -s force &> /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model3_skatO/AA_apoe_model3.log

## Meta-analysis

### Model 1 without APOE4 adjustment

In [None]:
SCHEME SAMPLESIZE
MARKER   SNP
WEIGHT   N
EFFECT   BETA
PVAL     P


PROCESS /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skat/African.pca.projected_AD.SMMAT.snp_stats.gz
PROCESS /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skat/European.pca.projected_AD.SMMAT.snp_stats.gz
PROCESS /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skat/Hispanic.pca.projected_AD.SMMAT.snp_stats.gz

OUTFILE /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skat/AD.SMMAT_noAPOE4_META .TXT
ANALYZE
ANALYZE HETEROGENEITY
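With `SCHEME SAMPLESIZE`, METAL combines the per-study p-values and effect directions using sample-size weights. The sketch below illustrates that weighting (signed Z-scores weighted by the square root of N) in plain Python. It is not METAL's implementation; the function name `sample_size_meta` and the input tuples are hypothetical, and the effect sign is assumed to come from the column named in `EFFECT`.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def sample_size_meta(studies):
    """Combine (p_value, effect_sign, n) tuples with sqrt(N) weights.

    Returns the combined Z-score and its two-sided p-value.
    P-values must lie strictly between 0 and 1 (p = 1 is also accepted).
    """
    numerator = 0.0
    weight_sq_sum = 0.0
    for p_value, effect_sign, n in studies:
        # Convert the two-sided p-value to a Z-score carrying the effect direction
        z = _STD_NORMAL.inv_cdf(1.0 - p_value / 2.0)
        if effect_sign < 0:
            z = -z
        w = sqrt(n)
        numerator += w * z
        weight_sq_sum += w * w
    z_meta = numerator / sqrt(weight_sq_sum)
    p_meta = 2.0 * _STD_NORMAL.cdf(-abs(z_meta))
    return z_meta, p_meta


# Hypothetical per-ancestry inputs: (p-value, sign of effect, sample size)
print(sample_size_meta([(0.01, 1, 1500), (0.20, 1, 900), (0.60, -1, 700)]))
```

Because the weights depend only on N, a large study with a modest p-value can outweigh a small study with a strong one, and studies with opposite effect directions partly cancel.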

In [None]:
# Reformat the METAL output: attach CHR and POS to each marker
library(stringr)    # string replacement
library(tidyr)      # separate(), for splitting the marker name
library(data.table)
library(dplyr)      # inner_join()
data = read.table('/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skat/AD.SMMAT_noAPOE4_META2.TXT', header = T, sep = '\t')
his = read.table('/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skat/Hispanic.pca.projected_AD.SMMAT.snp_stats.gz', header = T, sep ='\t')[, c(1, 2, 3)]
eur = read.table('/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skat/European.pca.projected_AD.SMMAT.snp_stats.gz', header = T, sep ='\t')[, c(1, 2, 3)]
afr = read.table('/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skat/African.pca.projected_AD.SMMAT.snp_stats.gz', header = T, sep ='\t')[, c(1, 2, 3)]
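# Union of variant positions across the three populations (full outer merges)
pos = merge(his, eur, by = c('CHR', 'POS', 'SNP'), all = TRUE)
POS = merge(pos, afr, by = c('CHR', 'POS', 'SNP'), all = TRUE)
# Attach CHR and POS to the meta-analysis results by marker name
data = inner_join(data, POS, by = c('MarkerName'='SNP'))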
data$POS = as.numeric(data$POS)
data$CHR = as.numeric(data$CHR)
write.table(data,'/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model1_skat/meta.smmat_noAPOE4.txt', sep = '\t', quote = F, col.names = T, row.names = F)
lambda <- median(qchisq(1-data$P.value,1), na.rm=TRUE)/qchisq(0.5,1)
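
The `lambda` line above computes the genomic inflation factor: the median of the 1-df chi-square statistics implied by the p-values, divided by the median of the null chi-square distribution (about 0.455). Below is a minimal sketch on simulated p-values, assuming a hypothetical helper `gc_lambda`. It gives the same quantity as the expression above, but passes `lower.tail = FALSE` so that very small p-values are not rounded away when subtracted from 1.

```r
# Genomic inflation factor from a vector of p-values.
# gc_lambda is a hypothetical helper name, not defined elsewhere in this notebook.
gc_lambda <- function(p) {
  # Chi-square (1 df) statistic implied by each p-value
  chisq <- qchisq(p, df = 1, lower.tail = FALSE)
  median(chisq, na.rm = TRUE) / qchisq(0.5, df = 1)
}

set.seed(1)
p_null <- runif(1e5)             # simulated null p-values, for illustration only
lambda_null <- gc_lambda(p_null)
print(lambda_null)               # should be close to 1 under the null
```

Values well above 1 suggest inflated test statistics (for example, residual population structure), while values well below 1 suggest a conservative test.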


In [1]:
library(stringr)    # string replacement
library(tidyr)      # separate(), for splitting the marker name
library(data.table)
library(dplyr)      # inner_join()



### Model 2 without APOE4 adjustment

In [None]:
SCHEME SAMPLESIZE
MARKER   SNP
WEIGHT   N
EFFECT   BETA
PVAL     P


PROCESS /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skat/African.pca.projected_AD.SMMAT.snp_stats.gz
PROCESS /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skat/European.pca.projected_AD.SMMAT.snp_stats.gz
PROCESS /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skat/Hispanic.pca.projected_AD.SMMAT.snp_stats.gz

OUTFILE /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skat/AD.SMMAT_noAPOE4_META .TXT
ANALYZE
ANALYZE HETEROGENEITY

In [None]:
data = read.table('/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skat/AD.SMMAT_noAPOE4_META2.TXT', header = T, sep = '\t')
his = read.table('/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skat/Hispanic.pca.projected_AD.SMMAT.snp_stats.gz', header = T, sep ='\t')[, c(1, 2, 3)]
eur = read.table('/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skat/European.pca.projected_AD.SMMAT.snp_stats.gz', header = T, sep ='\t')[, c(1, 2, 3)]
afr = read.table('/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skat/African.pca.projected_AD.SMMAT.snp_stats.gz', header = T, sep ='\t')[, c(1, 2, 3)]
pos = merge(his, eur, by = c('CHR', 'POS', 'SNP'),all = TRUE)
POS = merge(pos, afr, by = c('CHR', 'POS', 'SNP'),all = TRUE)
data = inner_join(data, POS, by = c('MarkerName'='SNP'))
data$POS = as.numeric(data$POS)
data$CHR = as.numeric(data$CHR)
write.table(data,'/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/noapoe/af0.01/model3_skat/meta.smmat_noAPOE4.txt', sep = '\t', quote = F, col.names = T, row.names = F)
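
The reformat cells rely on two joins: full outer merges (`all = TRUE`) to collect the position of every variant seen in any population, then an inner join to attach `CHR` and `POS` to the METAL output by marker name. The toy sketch below illustrates the behaviour with invented variants. All data frames are made up, and base `merge` stands in for `inner_join` so the example needs no packages.

```r
# Toy position tables for three populations (invented variants)
his <- data.frame(CHR = c(1, 1), POS = c(100, 200),
                  SNP = c("1:100:A:G", "1:200:C:T"))
eur <- data.frame(CHR = c(1, 2), POS = c(200, 300),
                  SNP = c("1:200:C:T", "2:300:G:A"))
afr <- data.frame(CHR = 2, POS = 400, SNP = "2:400:T:C")

# Full outer merges keep variants observed in any population
pos_all <- merge(merge(his, eur, by = c("CHR", "POS", "SNP"), all = TRUE),
                 afr, by = c("CHR", "POS", "SNP"), all = TRUE)

# Toy METAL-style output keyed by MarkerName
meta <- data.frame(MarkerName = c("1:200:C:T", "2:400:T:C"),
                   P.value = c(0.03, 0.5))

# Inner join: keep only markers that have a known position
annotated <- merge(meta, pos_all, by.x = "MarkerName", by.y = "SNP")
print(annotated)
```

A variant shared by several populations appears once in `pos_all` because the merge key includes `CHR`, `POS`, and `SNP`. If the same `SNP` identifier were ever listed with different coordinates, the inner join would duplicate that marker, so it can be worth checking `anyDuplicated(pos_all$SNP)` before joining.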

### Model 1 with APOE4 adjustment

In [None]:
SCHEME SAMPLESIZE
MARKER   SNP
WEIGHT   N
EFFECT   BETA
PVAL     P


PROCESS /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model1_skat/African.pca.projected_AD.SMMAT.snp_stats.gz
PROCESS /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model1_skat/European.pca.projected_AD.SMMAT.snp_stats.gz
PROCESS /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model1_skat/Hispanic.pca.projected_AD.SMMAT.snp_stats.gz

OUTFILE /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model1_skat/AD.SMMAT_APOE4_META .TXT
ANALYZE
ANALYZE HETEROGENEITY

In [2]:
data = read.table('/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model1_skat/AD.SMMAT_APOE4_META2.TXT', header = T, sep = '\t')
his = read.table('/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model1_skat/Hispanic.pca.projected_AD.SMMAT.snp_stats.gz', header = T, sep ='\t')[, c(1, 2, 3)]
eur = read.table('/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model1_skat/European.pca.projected_AD.SMMAT.snp_stats.gz', header = T, sep ='\t')[, c(1, 2, 3)]
afr = read.table('/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model1_skat/African.pca.projected_AD.SMMAT.snp_stats.gz', header = T, sep ='\t')[, c(1, 2, 3)]
pos = merge(his, eur, by = c('CHR', 'POS', 'SNP'),all = TRUE)
POS = merge(pos, afr, by = c('CHR', 'POS', 'SNP'),all = TRUE)
data = inner_join(data, POS, by = c('MarkerName'='SNP'))
data$POS = as.numeric(data$POS)
data$CHR = as.numeric(data$CHR)
write.table(data,'/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model1_skat/meta.smmat_APOE4.txt', sep = '\t', quote = F, col.names = T, row.names = F)
lambda <- median(qchisq(1-data$P.value,1), na.rm=TRUE)/qchisq(0.5,1)

### Model 2 with APOE4 adjustment

In [None]:
SCHEME SAMPLESIZE
MARKER   SNP
WEIGHT   N
EFFECT   BETA
PVAL     P


PROCESS /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model3_skat/African.pca.projected_AD.SMMAT.snp_stats.gz
PROCESS /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model3_skat/European.pca.projected_AD.SMMAT.snp_stats.gz
PROCESS /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model3_skat/Hispanic.pca.projected_AD.SMMAT.snp_stats.gz

OUTFILE /mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model3_skat/AD.SMMAT_APOE4_META .TXT
ANALYZE
ANALYZE HETEROGENEITY

In [3]:
data = read.table('/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model3_skat/AD.SMMAT_APOE4_META2.TXT', header = T, sep = '\t')
his = read.table('/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model3_skat/Hispanic.pca.projected_AD.SMMAT.snp_stats.gz', header = T, sep ='\t')[, c(1, 2, 3)]
eur = read.table('/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model3_skat/European.pca.projected_AD.SMMAT.snp_stats.gz', header = T, sep ='\t')[, c(1, 2, 3)]
afr = read.table('/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model3_skat/African.pca.projected_AD.SMMAT.snp_stats.gz', header = T, sep ='\t')[, c(1, 2, 3)]
pos = merge(his, eur, by = c('CHR', 'POS', 'SNP'),all = TRUE)
POS = merge(pos, afr, by = c('CHR', 'POS', 'SNP'),all = TRUE)
data = inner_join(data, POS, by = c('MarkerName'='SNP'))
data$POS = as.numeric(data$POS)
data$CHR = as.numeric(data$CHR)
write.table(data,'/mnt/mfs/statgen/alzheimers-family/AD_rare_variants/SMMAT/apoe/af0.01/model3_skat/meta.smmat_APOE4.txt', sep = '\t', quote = F, col.names = T, row.names = F)
lambda <- median(qchisq(1-data$P.value,1), na.rm=TRUE)/qchisq(0.5,1)