
Step 1: How much genetic signal is lost in down-sampled univariate GWASs (which may later be used as indicator phenotypes in Genomic SEM)?

Camille M. Williams edited this page Feb 7, 2023 · 1 revision

Richard Karlsson Linnér prepared the summary statistics as described in the original paper and removed 23andMe data when applicable. The following section was written by Camille M. Williams.

1. Report the change in (effective) sample size between the full and down-sampled summary statistics.

If you are working with dichotomous traits, make sure that you calculate the effective sample size (Neff) correctly, including the sum of Neff across cohorts for meta-analyses, by visiting this page.
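For a case-control trait, a minimal Bash/awk sketch of these calculations uses the standard per-cohort effective sample size, Neff = 4 / (1/Ncases + 1/Ncontrols), sums it across cohorts for a meta-analysis, and then reports the percentage reduction from the full to the down-sampled GWAS. The input layout (one cohort per line, cases then controls) and the function names are hypothetical, and the counts are made up for illustration.

```shell
#!/bin/bash

# Sum of effective sample sizes across the cohorts of a case-control GWAS.
# Input (stdin, hypothetical layout): one cohort per line, "n_cases n_controls".
neff_sum() {
  awk '{ s += 4 / (1 / $1 + 1 / $2) } END { printf "%.1f\n", s }'
}

# Percentage reduction in (effective) sample size from the full to the down-sampled GWAS.
# Usage: pct_reduction FULL_N DOWNSAMPLED_N
pct_reduction() {
  awk -v full="$1" -v down="$2" 'BEGIN { printf "%.1f\n", 100 * (full - down) / full }'
}

# Example with made-up counts: the down-sampled GWAS drops the second cohort.
full_neff=$(printf '1000 1000\n500 1500\n' | neff_sum)   # 2000 + 1500
down_neff=$(printf '1000 1000\n' | neff_sum)             # 2000
echo "Full Neff: ${full_neff}"
echo "Down-sampled Neff: ${down_neff}"
echo "Reduction: $(pct_reduction "$full_neff" "$down_neff")%"
```

With these numbers the script prints a full Neff of 3500.0, a down-sampled Neff of 2000.0, and a reduction of 42.9%. For a continuous trait, skip neff_sum and pass the two sample sizes directly to pct_reduction.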

2. Using LD Score regression (Bulik-Sullivan et al., 2015), compare the following parameters between the full and down-sampled summary statistics:

  • The genetic signal (mean χ², genomic inflation factor λGC, and attenuation ratio)
  • The SNP-based heritability
  • The genetic correlations between the univariate GWASs intended for inclusion in genomic structural equation modeling (Genomic SEM).
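As a reminder of why down-sampling lowers these quantities, the LDSC model (Bulik-Sullivan et al., 2015) and the two inflation summaries can be written as follows. Here χ²_j = Z_j² for SNP j, ℓ_j is its LD score, N is the sample size, M is the number of SNPs, a captures confounding such as population stratification, b_0 is the estimated intercept (corresponding to N a + 1), and 0.4549 is the median of a χ² distribution with one degree of freedom. Every term except the 1 scales with N, so with h² held constant the excess mean χ² (mean χ² minus 1) is expected to shrink roughly in proportion to the sample size, whereas the heritability estimate should, in expectation, stay similar but become noisier. The attenuation ratio estimates the share of the inflation in mean χ² that is not attributable to polygenic signal.

```latex
% LDSC regression model for SNP j
E\left[\chi^2_j \mid \ell_j\right] = \frac{N\,h^2_{\mathrm{SNP}}}{M}\,\ell_j + N a + 1

% Genomic inflation factor
\lambda_{GC} = \frac{\operatorname{median}_j\left(\chi^2_j\right)}{0.4549}

% Attenuation ratio
\text{ratio} = \frac{b_0 - 1}{\overline{\chi^2} - 1}
```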

2.1. Munge each set of summary statistics in the terminal

First, make sure that you have downloaded LDSC and followed the instructions under Getting Started. Also download w_hm3.snplist into your data folder.

Activate the conda environment with LDSC’s dependencies.

conda activate ldsc

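Munge the summary statistics of each of your indicators after replacing path_to_ldsc, path_to_sumstats, path_to_munged_output, and path_to_hm3 with your own paths. As an example, we use the ADHD summary statistics and save the output in a munged_gwas folder. Note that munge_sumstats.py appends .sumstats.gz to the --out prefix, so this command writes ADHD_sumstats.sumstats.gz.sumstats.gz; this is why the file names in Step 2.2 end in .sumstats.gz.sumstats.gz.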

python path_to_ldsc/ldsc/munge_sumstats.py \
  --sumstats path_to_sumstats/ADHD_sumstats.txt \
  --out path_to_munged_output/munged_gwas/ADHD_sumstats.sumstats.gz \
  --chunksize 500000 \
  --merge-alleles path_to_hm3/w_hm3.snplist
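With several indicators, the same command can be looped over every raw file. A minimal sketch, assuming all raw summary statistics are .txt files in one folder and can be munged with the same options; the variable and function names are hypothetical. Unlike the ADHD example, this prefix drops the .txt extension, so each output ends in a single .sumstats.gz, and file_list in Step 2.2 would need those names. Files without a sample-size column may also need munge_sumstats.py's --N or --N-col option.

```shell
#!/bin/bash
# Hypothetical locations: replace with your own paths.
LDSC_DIR="path_to_ldsc/ldsc"
SUMSTATS_DIR="path_to_sumstats"
MUNGED_DIR="path_to_munged_output/munged_gwas"
HM3="path_to_hm3/w_hm3.snplist"

# Build the --out prefix for one raw file: drop the folder and a trailing .txt.
# munge_sumstats.py then writes <prefix>.sumstats.gz.
out_prefix() {
  local name
  name=$(basename "$1")
  echo "${MUNGED_DIR}/${name%.txt}"
}

# Munge every raw .txt file in SUMSTATS_DIR against the HapMap3 SNP list.
munge_all() {
  local f
  for f in "${SUMSTATS_DIR}"/*.txt; do
    [ -e "$f" ] || continue   # skip the literal pattern when no .txt file exists
    python "${LDSC_DIR}/munge_sumstats.py" \
      --sumstats "$f" \
      --out "$(out_prefix "$f")" \
      --chunksize 500000 \
      --merge-alleles "$HM3"
  done
}

# munge_all   # uncomment to run after "conda activate ldsc"
```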

2.2. Create the bash script

The following script was adapted by Camille M. Williams from Richard Karlsson Linnér's script.

Create a bash script that estimates the SNP-based heritability of each indicator phenotype to be included in your factor model (e.g., age at first sex, number of sexual partners, ...) and the genetic correlations between all pairs of indicators.

Replace PATH with the path to the folder that contains your data, munged_gwas, and OUTPUT folders. Make sure that your data folder contains the eur_w_ld_chr folder of LD scores, which can be downloaded here.

Also replace path_to_ldsc with the path to your LDSC installation.
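A missing or misnamed input otherwise typically shows up only as an error in the LDSC output partway through the run, so a quick pre-flight check can save time. A minimal sketch with a hypothetical function, check_inputs, which reports any munged file or LD score folder that cannot be found and returns a non-zero status if anything is absent:

```shell
#!/bin/bash
# Report inputs that the LDSC script expects but cannot find.
# Usage: check_inputs MUNGED_DIR LD_REF_DIR FILE [FILE ...]
check_inputs() {
  local munged_dir=$1 ld_ref=$2 missing=0 f
  shift 2
  if [ ! -d "$ld_ref" ]; then
    echo "Missing LD score folder: $ld_ref" >&2
    missing=1
  fi
  for f in "$@"; do
    if [ ! -f "${munged_dir}/${f}" ]; then
      echo "Missing munged file: ${munged_dir}/${f}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (hypothetical paths; pass the same names as file_list in the script):
# check_inputs /PATH/munged_gwas /PATH/data/eur_w_ld_chr \
#   CLEANED.A_ADHD_PGC_2017_no_OR.sumstats.gz.sumstats.gz \
#   CLEANED.UKB_A_AGE_FIRST_SEX.sumstats.gz.sumstats.gz
```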

#!/bin/bash

echo "Script executed:"
date

# Replace /PATH and /path_to_ldsc with your own directories.
LDSC=/path_to_ldsc/ldsc/ldsc.py
MUNGED=/PATH/munged_gwas
LD_REF=/PATH/data/eur_w_ld_chr/

date_var=$(date +"%Y_%m_%d")
mkdir -p /PATH/OUTPUT/rg_${date_var}
mkdir -p /PATH/OUTPUT/h2_${date_var}

#==============================================================================#
# ESTIMATE LD Score regression
#==============================================================================#

# Munged summary statistics of the indicator phenotypes (file names inside ${MUNGED})
file_list="CLEANED.A_ADHD_PGC_2017_no_OR.sumstats.gz.sumstats.gz \
EXTERNALIZING_MA_EVER_CANNABIS_STRINGER+UKB_2022_03_03.sumstats.gz.sumstats.gz \
EXTERNALIZING_MA_SMOKE_EVER.GSCAN+UKB.2022_03_03.sumstats.gz.sumstats.gz \
CLEANED.UKB_G_NUM_SEX_PARTNERS.sumstats.gz.sumstats.gz \
CLEANED.UKB_A_AGE_FIRST_SEX.sumstats.gz.sumstats.gz \
EXTERNALIZING_MA_Problematic_drinking_2019_08_29.sumstats.gz.sumstats.gz \
EXTERNALIZING_MA_General_Risk_Tolerance_2019_10_03.sumstats.gz.sumstats.gz"

# Build ",<file1>,<file2>,..." (every entry preceded by a comma) so the list
# can be appended to the focal file in the comma-separated --rg argument.
all_files=""
for file in ${file_list}
do
  all_files="${all_files},${MUNGED}/${file}"
done

for file in ${file_list}
do
  ## ESTIMATE h2 of indicator phenotype ##
  python ${LDSC} \
    --h2 ${MUNGED}/${file} \
    --ref-ld-chr ${LD_REF} \
    --w-ld-chr ${LD_REF} \
    --out /PATH/OUTPUT/h2_${date_var}/h2.${file}
done

for file in ${file_list}
do
  ## ESTIMATE bivariate rG of indicator phenotype with other indicator phenotypes to be included in Genomic SEM ##
  # LDSC estimates the rG of the first (focal) file with every file that follows.
  # ${all_files} also contains the focal file, so one result is its rG with itself.
  python ${LDSC} \
    --rg ${MUNGED}/${file}${all_files} \
    --ref-ld-chr ${LD_REF} \
    --w-ld-chr ${LD_REF} \
    --out /PATH/OUTPUT/rg_${date_var}/rg.${file}.ALL_other
done

#==============================================================================#
echo "Script finished:"
date
#==============================================================================#
# END OF SCRIPT
#==============================================================================#

2.3. Run the bash script in your terminal

Replace name_of_your_bash_script with the file name of your script. The output is written to the OUTPUT folder, in dated subfolders for the heritability and genetic correlation results (e.g., OUTPUT/h2_2022_05_19 and OUTPUT/rg_2022_05_19).

bash name_of_your_bash_script.sh
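LDSC writes each analysis to <--out>.log, so the heritability results for each file end up in /PATH/OUTPUT/h2_<date>/h2.<file>.log. A minimal sketch for putting the full and down-sampled results side by side, assuming the "Label: value (se)" lines that ldsc.py --h2 writes to its log (e.g., "Mean Chi^2:", "Lambda GC:", "Intercept:", "Ratio:"); the function names are hypothetical. A cell is left empty or shows NA when LDSC does not report a numeric value, for example a Ratio when mean χ² is below 1.

```shell
#!/bin/bash
# Print the first value after "Label: " in an LDSC --h2 log (the estimate, not its SE).
ldsc_value() {
  local log=$1 label=$2
  awk -v lab="$label" -F': ' '$1 == lab { split($2, v, " "); print v[1] }' "$log"
}

# Compare the genetic-signal metrics of a full and a down-sampled GWAS.
# Usage: compare_h2 FULL_LOG DOWNSAMPLED_LOG
compare_h2() {
  local label
  printf "%-24s\t%s\t%s\n" "metric" "full" "down_sampled"
  for label in "Total Observed scale h2" "Mean Chi^2" "Lambda GC" "Intercept" "Ratio"; do
    printf "%-24s\t%s\t%s\n" "$label" "$(ldsc_value "$1" "$label")" "$(ldsc_value "$2" "$label")"
  done
}

# Example (hypothetical file names):
# compare_h2 /PATH/OUTPUT/h2_2022_05_19/h2.FULL.sumstats.gz.log \
#            /PATH/OUTPUT/h2_2022_05_19/h2.DOWN.sumstats.gz.log
```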

3. Create a bivariate genetic correlation matrix

See this page for example scripts for data visualization.
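Before visualizing, the estimates have to be gathered from the rg logs written in Step 2.2. A minimal sketch with a hypothetical function, collect_rg, assuming the "Summary of Genetic Correlation Results" table that ldsc.py prints near the end of each --rg log (a column-header line starting with p1, one whitespace-separated row per pair, then a blank line). It keeps only p1, p2, rg, and se in long format. Because each log pairs one focal trait with every trait in the list, each pair appears twice (once per direction) and each trait also appears once with itself; drop these duplicates before reshaping the table into a square matrix.

```shell
#!/bin/bash
# Collect p1, p2, rg, and se from every LDSC --rg log in one folder into a
# single tab-separated table (long format: one row per pair of traits).
collect_rg() {
  local rg_dir=$1
  printf "p1\tp2\trg\tse\n"
  awk '
    FNR == 1 { in_table = 0 }                       # reset at the start of each log
    /^Summary of Genetic Correlation Results/ {
      in_table = 1
      getline                                       # skip the column-header line
      next
    }
    in_table && NF == 0 { in_table = 0 }            # a blank line ends the table
    in_table {
      p1 = $1; p2 = $2
      sub(/.*\//, "", p1)                           # keep file names, drop folders
      sub(/.*\//, "", p2)
      print p1 "\t" p2 "\t" $3 "\t" $4
    }
  ' "${rg_dir}"/*.log
}

# Usage (hypothetical dated folder):
# collect_rg /PATH/OUTPUT/rg_2022_05_19 > rg_long.tsv
```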