# GP2: Carrier Counts for GBA1
- **Project:** Large-scale Genetic Characterization of PD in the AFR and AAC
- **Last updated:** December 2024
- **Version:** Bash and Python 3.9
- **Data:** GP2 release 8

## Summary
The analysis uses PLINK v1.9 and PLINK 2.

# **Analysis of PD genes in GP2 cases and controls of African and African admixed ancestries**


Retrieve the lists of related samples among African and African admixed ancestry individuals from the kinship scores available in GP2.

In [11]:
!gsutil -u cardterra -m ls gs://gp2tier2/release8*/wgs/deepvariant_joint_calling/related_samples/ | head -4
!gsutil -u cardterra -m ls gs://gp2tier2/release7*/meta_data/related_samples/ | head -2

gs://gp2tier2/release8_13092024/wgs/deepvariant_joint_calling/related_samples/release8_AFR.related
gs://gp2tier2/release8_13092024/wgs/deepvariant_joint_calling/related_samples/release8_AJ.related
gs://gp2tier2/release8_13092024/wgs/deepvariant_joint_calling/related_samples/release8_AMR.related
gs://gp2tier2/release8_13092024/wgs/deepvariant_joint_calling/related_samples/release8_CAH.related
gs://gp2tier2/release7_30042024/meta_data/related_samples/AAC_release7.related
gs://gp2tier2/release7_30042024/meta_data/related_samples/AFR_release7.related


In [None]:
!gsutil -u cardterra -m cp gs://gp2tier2/release8*/wgs/deepvariant_joint_calling/related_samples/*AFR.related .
!gsutil -u cardterra -m cp gs://gp2tier2/release8*/wgs/deepvariant_joint_calling/related_samples/*CAH.related .
!gsutil -u cardterra -m cp gs://gp2tier2/release7_30042024/meta_data/related_samples/A*_release7.related .

Extract African and African admixed ancestry PD cases and controls with WGS data in release 8. Keep only the variants \
that are present in at least one individual (`--mac 1`) and recode the files as VCFs.

In [None]:
!for i in {1..22} X Y; do ./plink2 --pfile chr"$i"_AFR_release8 --recode vcf --mac 1 --keep-fam afr_case_cohort \
--out chr"$i"_AFR_release8_converted --threads 10 ; done
!for i in {1..22} X Y; do ./plink2 --pfile chr"$i"_AAC_release8 --recode vcf --mac 1 --keep-fam afr_case_cohort \
--out chr"$i"_AAC_release8_converted --threads 10 ; done

Prepare the VCF files for annotation by keeping only the required columns.

In [None]:
!grep -h '^chr' chr*_AAC_release8_binary_mac1.vcf | cut -f1-10 > AAC_allvariants
!grep -h '^chr' chr*_AFR_release8_binary_mac1.vcf | cut -f1-10 > AFR_allvariants

In [4]:
!wc -l AAC_allvariants
!wc -l AFR_allvariants

19616374 AAC_allvariants
30421306 AFR_allvariants


In [None]:
!head -7 chr10_AFR_release8_binary_mac1.vcf | cut -f1-10 > vcf_header
!cat vcf_header AAC_allvariants > AAC_allvariants.vcf
!cat vcf_header AFR_allvariants > AFR_allvariants.vcf
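
The merge above can also be sketched in Python, which avoids relying on a fixed number of header lines. This is a minimal sketch, not the method used in this notebook: the function name `merge_vcfs` and its parameters are hypothetical, and it assumes every input is an uncompressed, tab-delimited VCF whose header lines start with `#`.

```python
def merge_vcfs(vcf_paths, out_path, n_cols=10):
    """Write the header of the first VCF, then the first n_cols columns
    of every variant line from all VCFs. Returns the variant count."""
    n_variants = 0
    with open(out_path, "w") as out:
        for index, path in enumerate(vcf_paths):
            with open(path) as handle:
                for line in handle:
                    if line.startswith("#"):
                        # Header lines are copied from the first file only.
                        if index == 0:
                            if line.startswith("#CHROM"):
                                # Trim the column-name line to n_cols fields.
                                fields = line.rstrip("\n").split("\t")
                                line = "\t".join(fields[:n_cols]) + "\n"
                            out.write(line)
                        continue
                    # Variant line: keep the first n_cols tab-separated fields.
                    fields = line.rstrip("\n").split("\t")
                    out.write("\t".join(fields[:n_cols]) + "\n")
                    n_variants += 1
    return n_variants
```

For example, `merge_vcfs(sorted(glob.glob("chr*_AFR_release8_binary_mac1.vcf")), "AFR_allvariants.vcf")` (after `import glob`) would produce a file comparable to the one built with `cat` above.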

Extract African and admixed ancestry PD cases from the PDGENE clinical exome dataset

In [None]:
!./plink2 --pfile all_chrs --mac 1 --recode vcf --out all_chrs_pdgene_afr_black --keep-fam pdgene_afr --threads 10

Obtain rare protein-altering variants.

In [6]:
!grep protein_coding all_chrs_pdgene_afr_black.vcf \
| grep -Ev "AF=0\.(9|8|7|6|5|4|3|2)|non_coding_transcript_exon_variant|synonymous_variant|upstream|downstream|intron|5_prime_UTR_variant|3_prime_UTR_variant|splice_region_variant&synonymous_variant" \
| grep /1 > all_chrs_pdgene_afr_filtered.vcf
!wc -l all_chrs_pdgene_afr_filtered.vcf

149109
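
The same line filter can be expressed in Python, which makes each criterion explicit. This is a minimal sketch that mirrors the `grep` pipeline above on plain text. The names `EXCLUDE` and `keep_line` are hypothetical, and the sketch assumes, as the shell commands do, that the annotation strings (biotype, consequence, `AF=`) appear verbatim on each VCF line.

```python
import re

# Patterns that disqualify a line: an AF value of 0.2 or higher (up to 0.9...)
# or a consequence term that is not protein-altering.
EXCLUDE = re.compile(
    r"AF=0\.[2-9]|non_coding_transcript_exon_variant|synonymous_variant|"
    r"upstream|downstream|intron|5_prime_UTR_variant|3_prime_UTR_variant"
)

def keep_line(line):
    """Return True if a VCF line passes the three text filters:
    protein-coding annotation, no excluded pattern, and at least one
    genotype string containing '/1'."""
    return (
        "protein_coding" in line
        and EXCLUDE.search(line) is None
        and "/1" in line
    )
```

Applied to a file, `sum(keep_line(line) for line in open("all_chrs_pdgene_afr_black.vcf"))` would count the retained lines in the same way as `wc -l` does above.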


# Obtain the demographics for the individuals included in this study

In [17]:
!gsutil -u cardterra -m cp gs://gp2tier2/release8_13092024/clinical_data/master_key_release8_final_terra.csv .
!gsutil -u cardterra -m cp gs://gp2tier2/release7*/clinical_data/extended_clinical_data_release7.csv .

Copying gs://gp2tier2/release8_13092024/clinical_data/master_key_release8_final_terra.csv...
/ [1/1 files][  8.0 MiB/  8.0 MiB] 100% Done                                    
Operation completed over 1 objects/8.0 MiB.                                      
Copying gs://gp2tier2/release7_30042024/clinical_data/extended_clinical_data_release7.csv...
- [1/1 files][ 24.3 MiB/ 24.3 MiB] 100% Done                                    
Operation completed over 1 objects/24.3 MiB.                                     


# Analysis of the intronic rs3115534-G variant in African and admixed individuals genotyped by the NeuroBooster array

In [119]:
!gsutil -u cardterra -m cp gs://gp2tier2/release7*/imputed_genotypes/AAC/chr1_AAC_release7.pvar .
!gsutil -u cardterra -m cp gs://gp2tier2/release7*/imputed_genotypes/AFR/chr1_AFR_release7.pvar .

Copying gs://gp2tier2/release7_30042024/imputed_genotypes/AAC/chr1_AAC_release7.pvar...
- [1/1 files][418.0 MiB/418.0 MiB] 100% Done                                    
Operation completed over 1 objects/418.0 MiB.                                    


In [None]:
!./plink2 --bfile AFR_release7 --snp chr1_155235878_G_T --recode ped --out chr1_155235878_G_T_afr
!./plink2 --bfile AAC_release7 --snp chr1_155235878_G_T --recode ped --out chr1_155235878_G_T_aac

In [None]:
!./plink --file chr1_155235878_G_T_afr --freq case-control --pheno pheno.txt --out chr1_155235878_G_T_afr_freq
!./plink --file chr1_155235878_G_T_aac --freq case-control --pheno pheno.txt --out chr1_155235878_G_T_aac_freq

In [None]:
!./plink --file chr1_155235878_G_T_afr --assoc --pheno 

The number of heterozygous and homozygous carriers can be obtained from the PED files.

In [23]:
#!cat chr1_155235878_G_T_aac.ped
#!cat chr1_155235878_G_T_afr.ped
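
Counting carriers from a single-variant PED file could look like the sketch below. It is an illustration under stated assumptions rather than the procedure used here. The function name `count_carriers` is hypothetical. The sketch assumes the standard PED layout (family ID, individual ID, father, mother, sex, phenotype, then one allele pair per variant), a file that holds only the one extracted variant, and `G` as the allele of interest, following the rs3115534-G label in the heading. Counts are grouped by whatever value is in the PED phenotype column, which may be a missing code (such as `-9`) when phenotypes are supplied separately through `--pheno`.

```python
from collections import Counter

def count_carriers(ped_lines, allele="G"):
    """Count heterozygous and homozygous carriers of `allele`,
    grouped by the phenotype value in column 6 of each PED line."""
    counts = Counter()
    for line in ped_lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        phenotype, a1, a2 = fields[5], fields[6], fields[7]
        # Number of copies of the allele of interest (0, 1, or 2);
        # missing genotypes ("0 0") contribute zero copies.
        n_copies = (a1 == allele) + (a2 == allele)
        if n_copies == 1:
            counts[(phenotype, "het")] += 1
        elif n_copies == 2:
            counts[(phenotype, "hom")] += 1
    return counts
```

With a real file, this could be called as `count_carriers(open("chr1_155235878_G_T_afr.ped"))`. The result maps pairs such as `("2", "het")` to counts, where `2` and `1` are the usual PLINK codes for cases and controls.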