# Regression Adjusted by Covariates 
* **Project:** Mitochondrial 2158T>C variant in PD
* **Version:** Python/3.9
* **Status:** COMPLETE
* **Last Updated:** 14-MARCH-2024

### Notebook Overview
1. Obtain the relevant covariates (age, sex, and principal components) for WGS samples from 922 PD cases and 229 controls in the GP2 monogenic cohort
2. Run a logistic regression adjusted for these covariates

## Loading necessary packages

In [None]:
## Load required packages
module load bcftools
module load plink/2.0-alpha

## Reading in files

In [None]:
# Preview the manifest of GP2 samples with EUR ancestry (includes the genome file ID for each sample)
head -3 manifest_eur

In [None]:
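# PCA estimation for GP2-EUR samples
# 1. Keep biallelic variants with MAF >= 0.05 and a missing call rate <= 0.01
plink2 --pfile ${WORK_DIR}/Monogenic_SV/all_chrs --make-bed --max-alleles 2 --maf 0.05 --geno 0.01 \
--out GP2-mono_005
# 2. LD-prune: 200-variant window, step of 50 variants, r^2 threshold 0.25
plink2 --bfile GP2-mono_005 --indep-pairwise 200 50 0.25 --out GP2_eur_005
# 3. Compute principal components on the pruned variant set
plink2 --bfile GP2-mono_005 --extract GP2_eur_005.prune.in --pca --out GP2_mono_pc
head -3 GP2_mono_pc.eigenvec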

In [None]:
# Check the covariate file
head -3 GP2_allcovar
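
Before running the regression, it can help to confirm that the covariate file carries every column that the `--covar-name` and `--pheno-name` options will ask for. The following is a minimal sketch, assuming `GP2_allcovar` is whitespace-delimited with a header row; the helper name `missing_columns` is hypothetical and not part of PLINK.

```shell
# Hypothetical helper: list required column names that are absent from the
# header row of a whitespace-delimited file.
# Usage: missing_columns <file> <comma-separated column names>
missing_columns() {
    awk -v req="$2" '
        NR == 1 {
            # Record every header name, dropping a leading "#" (e.g. "#IID")
            for (i = 1; i <= NF; i++) { sub(/^#/, "", $i); seen[$i] = 1 }
            # Print each required name that was not seen in the header
            n = split(req, want, ",")
            for (j = 1; j <= n; j++) if (!(want[j] in seen)) print want[j]
            exit
        }' "$1"
}

required="phenotype,age_covar,sex_for_qc,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10"

# Run the check only if the covariate file is present in the working directory
if [ -f GP2_allcovar ]; then
    missing_columns GP2_allcovar "$required"
fi
```

The helper prints nothing when every required column is present, so any output is a list of names to fix before the regression step.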

In [None]:
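# Merge all per-sample VCFs, restricted to chrM:2157-2159 around the variant
# --force-samples lets the merge proceed when sample names are duplicated across files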
bcftools merge *.vcf.gz -Oz -o merged_GP2_chrm2158.vcf.gz --force-samples -r chrM:2157-2159

In [None]:
# Convert the merged VCF to PLINK format and fill in reference alleles.
# Mutect2 does not produce joint calls, so a sample with no call at the site
# is missing after the merge; we assume those samples carry the reference allele.
plink2 --vcf merged_GP2_chrm2158.vcf.gz --max-alleles 2 --make-bed --double-id --out merged_GP2_chrm2158
# --fill-missing-a2 is a PLINK 1.9 flag, so switch modules for this step
module load plink/1.9.0-beta4.4
# Column 8 of the manifest holds the genome IDs used to select samples
cut -f8 manifest_eur > manifest_genomeid
plink --bfile merged_GP2_chrm2158 --keep-fam manifest_genomeid --fill-missing-a2 --make-bed --out merged_GP2_chrm2158_upd

In [None]:
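# Logistic regression adjusted for age, sex, and principal components (PC1-PC10)
# --glm firth-fallback: standard logistic regression, falling back to Firth regression if it fails to converge
# --covar-variance-standardize: rescale covariates to zero mean and unit variance
# --vif 300: set the variance inflation factor cutoff for the multicollinearity check to 300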
module load plink/2.0-alpha
plink2 --bfile merged_GP2_chrm2158_upd --glm firth-fallback --covar GP2_allcovar \
--covar-name age_covar,sex_for_qc,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10 \
--pheno GP2_allcovar --pheno-name phenotype --out GP2_chrM --covar-variance-standardize --ci 0.95 --vif 300
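
plink2 writes one row per model term for every variant (the `ADD` genotype term plus one row for each covariate), so the effect of the variant itself is the row where `TEST` is `ADD`. The following is a minimal sketch for pulling out that row by column name, assuming the tab-delimited `GP2_chrM.phenotype.glm.logistic.hybrid` output with `POS` and `TEST` header columns; the helper name `glm_add_rows` is hypothetical.

```shell
# Hypothetical helper: print the header line plus the ADD (genotype) rows
# at a given position from a tab-delimited plink2 --glm result file.
# Usage: glm_add_rows <glm_file> <position>
glm_add_rows() {
    awk -F'\t' -v pos="$2" '
        NR == 1 {
            # Map column names to indices so the filter does not depend on column order
            for (i = 1; i <= NF; i++) col[$i] = i
            print
            next
        }
        # Keep only the genotype term at the requested position
        $(col["POS"]) == pos && $(col["TEST"]) == "ADD"
    ' "$1"
}

# Run only if the regression output is present in the working directory
if [ -f GP2_chrM.phenotype.glm.logistic.hybrid ]; then
    glm_add_rows GP2_chrM.phenotype.glm.logistic.hybrid 2158
fi
```

Matching on the `POS` column rather than searching for the string `2158` anywhere in the line avoids accidental hits in other numeric fields such as counts or p-values.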

In [1]:
head -1 GP2_chrM.phenotype.glm.logistic.hybrid
grep 2158 GP2_chrM.phenotype.glm.logistic.hybrid

#CHROM	POS	ID	REF	ALT	A1	FIRTH?	TEST	OBS_CT	OR	LOG(OR)_SE	L95	U95	Z_STAT	P	ERRCODE
MT	[01;31m[K2158[m[K	.	T	C	C	N	ADD	1005	0.318592	1.72112	0.0109198	9.29513	-0.664593	0.506311	.
MT	[01;31m[K2158[m[K	.	T	C	C	N	sex_for_qc	1005	1.33996	0.0958628	1.11044	1.61693	3.05273	0.0022677	.
MT	[01;31m[K2158[m[K	.	T	C	C	N	age_covar	1005	1.89469	0.0944976	1.57435	2.28021	6.76264	1.35499e-11	.
MT	[01;31m[K2158[m[K	.	T	C	C	N	PC1	1005	2.0111	0.439749	0.849413	4.76156	1.58882	0.112101	.
MT	[01;31m[K2158[m[K	.	T	C	C	N	PC2	1005	0.304818	0.554686	0.102776	0.904045	-2.14183	0.0322074	.
MT	[01;31m[K2158[m[K	.	T	C	C	N	PC3	1005	0.553196	0.395982	0.254577	1.2021	-1.49512	0.134882	.
MT	[01;31m[K2158[m[K	.	T	C	C	N	PC4	1005	0.0508494	0.904457	0.00863795	0.299337	-3.29356	0.000989256	.
MT	[01;31m[K2158[m[K	.	T	C	C	N	PC5	1005	0.0930588	0.518395	0.0336898	0.257049	-4.58053	4.63809e-06	.
MT	[01;31m[K2158[m[K	.	T	C	C	N	PC6	1005	0.36071	0.579879	0.115762	1.12396	-1.75844	0.0786727	.
M