Step 1: What is the loss of genetic signal in down-sampled univariate GWASs (which may later be used as indicator phenotypes in Genomic SEM)?
Richard Karlsson Linnér prepared the summary statistics as described in the original paper and removed 23andMe data when applicable. The following section was written by Camille M. Williams.
1. Report the change in the (effective) sample size of the full and down-sampled summary statistics.
If you are interested in dichotomous traits, make sure you are correctly calculating the N effective (Neff) and the sum of Neff for meta-analyses by visiting this page!
2. Using LD Score regression (Bulik-Sullivan et al., 2015), compare the following parameters between the full and down-sampled summary statistics:
- The genetic signal (mean χ2, genomic inflation factor, and attenuation ratio)
- The SNP-based heritability
- The genetic correlations between univariate GWASs intended for inclusion in genomic structural equation modeling (Genomic SEM).
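For item 1, the bookkeeping can be sketched in bash. This is a minimal sketch, assuming the commonly used case-control formula Neff = 4 / (1/N_cases + 1/N_controls) and a meta-analysis Neff equal to the sum of the cohort-level values; the function names and cohort sizes are hypothetical, and the page linked above remains the reference for the recommended procedure.

```shell
#!/bin/bash
# Effective sample size of one case-control cohort (assumed formula):
#   Neff = 4 / (1/N_cases + 1/N_controls)
neff() {
  awk -v ca="$1" -v co="$2" 'BEGIN { printf "%.0f\n", 4 / (1 / ca + 1 / co) }'
}

# Sum of cohort-level Neff for a meta-analysis.
# Arguments: one "cases:controls" pair per cohort.
sum_neff() {
  printf '%s\n' "$@" |
    awk -F: '{ total += 4 / (1 / $1 + 1 / $2) } END { printf "%.0f\n", total }'
}

# Percentage of the full-sample (effective) N retained after down-sampling.
pct_retained() {
  awk -v full="$1" -v down="$2" 'BEGIN { printf "%.1f\n", 100 * down / full }'
}

# Hypothetical cohort sizes, for illustration only.
neff 20000 35000
sum_neff 20000:35000 5000:5000
pct_retained 60000 45000
```

Reporting the retained percentage next to the raw (effective) N makes it easier to compare the loss of sample size across indicators.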
First, make sure that you have downloaded ldsc and followed the instructions under getting started. Also download w_hm3.snplist into your data folder.
Activate the conda environment with LDSC’s dependencies.
conda activate ldsc
Munge the summary statistics of each of your indicators after replacing path_to_ldsc, path_to_sumstats, path_to_munged_output, and path_to_hm3. As an example, we used the ADHD summary statistics, which we save in a gwas_munged folder.
path_to_ldsc/ldsc/munge_sumstats.py --sumstats path_to_sumstats/ADHD_sumstats.txt --out path_to_munged_output/gwas_munged/ADHD_sumstats.sumstats.gz --chunksize 500000 --merge-alleles path_to_hm3/w_hm3.snplist
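When several indicators need munging, the same command can be generated in a loop. The sketch below only prints the commands (a dry run) so that they can be checked before anything is executed. The variable names, the helper munge_cmd, and the second file name are hypothetical; the paths are the placeholders used above and should point to the folders you actually use.

```shell
#!/bin/bash
# Placeholders; replace with your own paths.
ldsc_dir="path_to_ldsc/ldsc"
sumstats_dir="path_to_sumstats"
out_dir="path_to_munged_output"   # folder that holds your munged files
hm3="path_to_hm3/w_hm3.snplist"

# Print the munge command for one summary-statistics file.
# munge_sumstats.py appends ".sumstats.gz" to the --out prefix, so the prefix
# used here yields a file ending in ".sumstats.gz.sumstats.gz".
munge_cmd() {
  local infile="$1"
  local name
  name=$(basename "$infile")
  printf '%s --sumstats %s --out %s --chunksize 500000 --merge-alleles %s\n' \
    "${ldsc_dir}/munge_sumstats.py" "$infile" \
    "${out_dir}/${name%.txt}.sumstats.gz" "$hm3"
}

# Hypothetical input file names, for illustration only.
for f in ADHD_sumstats.txt SMOKE_EVER_sumstats.txt; do
  munge_cmd "${sumstats_dir}/${f}"
done
```

Once the printed commands look right, the output of the loop can be piped to bash to run them.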
The following script was adapted by Camille M. Williams from Richard Karlsson Linnér's script.
Create a bash script to examine the heritability and bivariate correlation of each indicator phenotype to be included in your factor model (e.g. age of first sex, number of sexual partners, ...).
Replace PATH with the path of the folder that contains your data, gwas_munged, and OUTPUT folders. Make sure the eur_w_ld_chr folder, which can be downloaded here, is in your data folder. Replace path_to_ldsc with the path to your ldsc installation.
#!/bin/bash
echo "Script executed:"
date
date_var=$(date +"%Y_%m_%d")
mkdir -p /PATH/OUTPUT/rg_${date_var}
mkdir -p /PATH/OUTPUT/h2_${date_var}
#==============================================================================#
# ESTIMATE LD Score regression
#==============================================================================#
cd /PATH/gwas_munged/
file_list="CLEANED.A_ADHD_PGC_2017_no_OR.sumstats.gz.sumstats.gz \
EXTERNALIZING_MA_EVER_CANNABIS_STRINGER+UKB_2022_03_03.sumstats.gz.sumstats.gz \
EXTERNALIZING_MA_SMOKE_EVER.GSCAN+UKB.2022_03_03.sumstats.gz.sumstats.gz \
CLEANED.UKB_G_NUM_SEX_PARTNERS.sumstats.gz.sumstats.gz \
CLEANED.UKB_A_AGE_FIRST_SEX.sumstats.gz.sumstats.gz \
EXTERNALIZING_MA_Problematic_drinking_2019_08_29.sumstats.gz.sumstats.gz \
EXTERNALIZING_MA_General_Risk_Tolerance_2019_10_03.sumstats.gz.sumstats.gz"
cd /PATH
# Loop over the indicators one at a time (file_list is left unquoted on
# purpose so that it splits into one file name per iteration).
for file in ${file_list}
do
  ## ESTIMATE h2 of indicator phenotype ##
  python /path_to_ldsc/ldsc/ldsc.py \
    --h2 /PATH/gwas_munged/${file} \
    --ref-ld-chr /PATH/data/eur_w_ld_chr/ \
    --w-ld-chr /PATH/data/eur_w_ld_chr/ \
    --out /PATH/OUTPUT/h2_${date_var}/h2.${file}
done
for file in ${file_list}
do
## ESTIMATE bivariate rG of indicator phenotype with other indicator phenotypes to be included in Genomic SEM ##
# The first file passed to --rg is correlated with every file that follows it.
# Keep the continuation lines of the comma-separated list unindented: leading
# whitespace would split the list into separate arguments.
python /path_to_ldsc/ldsc/ldsc.py \
--rg /PATH/gwas_munged/${file},\
/PATH/gwas_munged/CLEANED.A_ADHD_PGC_2017_no_OR.sumstats.gz.sumstats.gz,\
/PATH/gwas_munged/EXTERNALIZING_MA_EVER_CANNABIS_STRINGER+UKB_2022_03_03.sumstats.gz.sumstats.gz,\
/PATH/gwas_munged/EXTERNALIZING_MA_SMOKE_EVER.GSCAN+UKB.2022_03_03.sumstats.gz.sumstats.gz,\
/PATH/gwas_munged/CLEANED.UKB_G_NUM_SEX_PARTNERS.sumstats.gz.sumstats.gz,\
/PATH/gwas_munged/CLEANED.UKB_A_AGE_FIRST_SEX.sumstats.gz.sumstats.gz,\
/PATH/gwas_munged/EXTERNALIZING_MA_Problematic_drinking_2019_08_29.sumstats.gz.sumstats.gz,\
/PATH/gwas_munged/EXTERNALIZING_MA_General_Risk_Tolerance_2019_10_03.sumstats.gz.sumstats.gz \
--ref-ld-chr /PATH/data/eur_w_ld_chr/ \
--w-ld-chr /PATH/data/eur_w_ld_chr/ \
--out /PATH/OUTPUT/rg_${date_var}/rg.${file}.ALL_other
done
#==============================================================================#
echo "Script finished:"
date
#==============================================================================#
# END OF SCRIPT
#==============================================================================#
Save the script under a name of your choice (name_of_your_bash_script.sh below) and run it. The output will be located in the OUTPUT folder, in a subfolder named after the analysis and its date (e.g., OUTPUT/rg_2022_05_19).
bash name_of_your_bash_script.sh
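To compare the full and down-sampled runs, the key estimates can be collected from the .log files that ldsc.py writes next to each --out prefix. The sketch below is hedged: it assumes the --h2 log contains lines starting with "Total Observed scale h2:", "Lambda GC:", "Mean Chi^2:", "Intercept:", and "Ratio", so check these against your own logs. The function names are hypothetical.

```shell
#!/bin/bash
# Print one tab-separated row for an LDSC --h2 log:
# file, h2 (SE), lambda GC, mean chi2, intercept (SE), ratio (SE)
summarize_h2_log() {
  awk -v file="$1" '
    # Return the text after the first colon-space on a line.
    function value(line) { return substr(line, index(line, ": ") + 2) }
    index($0, "Total Observed scale h2:") == 1 { h2 = value($0) }
    index($0, "Lambda GC:") == 1               { lambda = value($0) }
    index($0, "Mean Chi^2:") == 1              { chi2 = value($0) }
    index($0, "Intercept:") == 1               { intercept = value($0) }
    # Keep the whole line if no "Ratio: value" pair is reported.
    index($0, "Ratio") == 1 { ratio = (index($0, ": ") ? value($0) : $0) }
    END { printf "%s\t%s\t%s\t%s\t%s\t%s\n", file, h2, lambda, chi2, intercept, ratio }
  ' "$1"
}

# Attenuation ratio recomputed by hand: (intercept - 1) / (mean chi2 - 1)
attenuation_ratio() {
  awk -v icpt="$1" -v chi2="$2" 'BEGIN { printf "%.4f\n", (icpt - 1) / (chi2 - 1) }'
}

# Summarize every h2 log found under the OUTPUT folder (PATH is a placeholder).
for log in /PATH/OUTPUT/h2_*/h2.*.log; do
  if [ -e "$log" ]; then
    summarize_h2_log "$log"
  fi
done
```

Running this once on the logs of the full summary statistics and once on the down-sampled ones gives two small tables that can be placed side by side.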
See this page for example scripts for data visualization.