Step 3: What is the loss of genetic signal at the factor level of multivariate GWAS when the indicator phenotypes are down sampled univariate GWASs?

The following section was written by Camille M. Williams unless otherwise indicated.

1. Calculate the SNP effects associated with the latent factor

The code in this section is written in R. Please make sure you have the latest versions of the packages installed.

We assume that you already munged the summary statistics and calculated the multivariable LDSC in Step 2.

You'll need the HapMap3 reference file (w_hm3.snplist.bz2), and the 1000Genomes-based allele frequency reference files (reference.1000G.maf.0.005.txt) to run the following script in R.

You can download the w_hm3.snplist.bz2 file and the reference.1000G.maf.0.005.txt file here.

For a more detailed overview of estimating SNP effects please visit this page.

library(GenomicSEM)
library(data.table)
library(dplyr)

1.1. Prepare the summary statistics

# provide the path to the summary statistics
files = c("data/CLEANED.A_ADHD_PGC_2017_no_OR.tbl",
          "data/CLEANED.UKB_A_AGE_FIRST_SEX.gz",
          "data/EXTERNALIZING_MA_Problematic_drinking_2019_08_29.tbl.gz",
          "data/EXTERNALIZING_MA_EVER_CANNABIS_STRINGER+UKB_2022_03_03.tbl.gz",
          "data/EXTERNALIZING_MA_SMOKE_EVER.GSCAN+UKB.2022_03_03.tbl.gz",
          "data/CLEANED.UKB_G_NUM_SEX_PARTNERS.gz",
          "data/EXTERNALIZING_MA_General_Risk_Tolerance_2019_10_03.tbl.gz")

# set names for input traits.
trait.names = c("ADHD",
                "AGE_FIRST_SEX_REV_CODED",
                "ALCOHOL_PROB",
                "EVER_CANNABIS",
                "EVER_SMOKER",
                "NUM_SEX_PARTNER",
                "RISK_TOL")

# set sample size for input traits.
sample.prev = c(.36, NA, NA, .265, .499, NA, .26)
population.prev = c(.05, NA, NA, .22, .46, NA, .26)

# set the reference data
info.filter = 0.90
maf.filter = 0.01
hm3 = "data/w_hm3.snplist.bz2"

# munge the summary statistics files
munge(files = files,
      hm3 = hm3,
      trait.names = trait.names,
      info.filter = info.filter,
      maf.filter = maf.filter)

1.2. Create the summary statistics file

# set flags for multivariate GWAS.
se.logit = c(T, F, F, F, F, F, F)
OLS = c(T, T, T, F, F, T, F)
linprob = c(F, F, F, T, T, F, T)

#set the reference file
ref = "data/reference.1000G.maf.0.005.txt"

e_sumstats <- sumstats(files = files,
                       ref = ref,
                       trait.names = trait.names,
                       se.logit = se.logit,
                       OLS = OLS,
                       linprob = linprob,
                       info.filter = info.filter,
                       maf.filter = maf.filter,
                       keep.indel = FALSE,
                       parallel = FALSE,
                       cores = NULL)

saveRDS(e_sumstats, file="data/e_sumstats.rds")

1.3. Run genomic SEM

LDSCoutput <- readRDS("data/LDSCoutput_1.rds")
e_sumstats <- readRDS("data/e_sumstats.rds")

chunk <- 100000 # Number of SNPs in a chunk
n <- nrow(e_sumstats)
r  <- rep(1:ceiling(n/chunk),each=chunk)[1:n]
d <- split(e_sumstats,r) # Split data into chunks of 100,000

# There are 62 chunks of 100,000 SNPs. 
# Depending on your server's ability you may want to run small sets of chunks at a time and combine each rds file.

for (i in 1:length(d)) {
d1 <- d[i]
d1 <- as.data.frame(d1)
colnames(d1) <- sub('beta.', 'beta_', colnames(d1))
colnames(d1) <- sub('se.', 'se_', colnames(d1))
colnames(d1) <- sub('.*\\.', '', colnames(d1))
colnames(d1) <- sub('_', '.', colnames(d1))

#Specify the EXT GWAS model
extgwasmodel <- 'EXTERNALIZING =~ 1*NUM_SEX_PARTNER+ ADHD+ AGE_FIRST_SEX_REV_CODED+ ALCOHOL_PROB+ EVER_CANNABIS+ EVER_SMOKER+ RISK_TOL
                 ALCOHOL_PROB~~ EVER_SMOKER
                 EVER_CANNABIS~~ AGE_FIRST_SEX_REV_CODED
                 EXTERNALIZING ~ SNP'

# run the multivariate GWAS
extgwas <- userGWAS(covstruc = LDSCoutput,
                    SNPs = d1,
                    estimation = "DWLS",
                    model = extgwasmodel,
                    sub=c("EXTERNALIZING~SNP"),
                    GC = "standard")

file1 <- paste("data/Ext1.0_wo_23andme/LDSCoutput_SNPs_Prev_d", i, ".rds", sep = "")
saveRDS(extgwas, file= file1)
}

1.4. Merge the genomic SEM output

files <- list.files(path = "data/Ext1.0_wo_23andme", full.names = TRUE)
rds_combo <- as.data.frame(readRDS(files[1]))

for (i in 2:length(files)) {
  b <- as.data.frame(readRDS(files[i]))
  rds_combo <- rbind(rds_combo, b)
}

#remove label column which is empty
rds_combo1 <- as.data.frame(rds_combo)
rds_combo1 <- dplyr::select(rds_combo1, -label)

1.5. Prepare the genomic SEM output for sharing

## a. Calculate N hat Neff 
rds_combo1$N_Inflated<- (rds_combo1$Z_Estimate/rds_combo1$est)^2 / (2*rds_combo1$MAF*(1-rds_combo1$MAF))
rds_combo11 <- subset(rds_combo1, rds_combo1$MAF <= 0.4 & rds_combo1$MAF >= 0.1)
Neff <- mean(rds_combo11$N_Inflated)
rds_combo1$N <- Neff
setnames(rds_combo1, old = c("Pval_Estimate", "BP", "est", "Z_Estimate" ), new = c("PVAL","POS", "BETA.A1", "Z" ))

## b. Add A1 Allele Frequency
Ref_df <- fread("data/MAPFILE.1000G+UK10K.CEU+TSI+GBR.chr1_22_X.txt")
setnames(Ref_df, old = c("Chr", "Pos", "ID", "ChrPosID"), new = c("CHR","POS", "SNP", "CHR:POS"))
Ref_df_1 <- select(Ref_df, c(CHR, POS, SNP, ALT, "CHR:POS", "AF_ALT_1000G+UK10K")) # Select A1 and A1 FREQ
Ref_df_2 <- left_join(rds_combo1, Ref_df_1)
setnames(Ref_df_2, old = c("AF_ALT_1000G+UK10K"), new = c("FREQ.A1"))
Ref_df_2 <- select(Ref_df_2, -ALT)
Ref_df_2$FREQ.A1 <- (1-Ref_df_2$FREQ.A1)
Ref_df_3 <- select(Ref_df_2, -MAF)

## c. Select variables of interest 
A1<- select(Ref_df_3, c("SNP", "CHR", "POS", "CHR:POS", "A1", "A2", "FREQ.A1", "BETA.A1", "SE","Z","PVAL","N" ))
fwrite(A1,"data/GSEM.GWAS.EXTERNALIZING.excl23andMe.csv")

2. Estimate the genetic correlation between the latent factor of the original model and the latent factor of the model with down-sampled summary statistics using bivariate LDSC regression (`Bulik et al., 2015`)

2.1. Munge each of your summary statistics using terminal

First, make sure that you have downloaded ldsc and followed the instructions under getting started. Also download w_hm3.snplist in your data folder.

Activate the conda environment with LDSC’s dependencies.

conda activate ldsc

Munge statistics of each of your indicators after replacing your path_to_ldsc, path_to_sumstats, path_to_munged_output, and path_to_hm3. As an example, we used the ADHD summary statistics, which we save in a munged_gwas folder.

path_to_ldsc/ldsc/munge_sumstats.py   --sumstats path_to_sumstats/GSEM.GWAS.EXTERNALIZING.excl23andMe.csv   --out path_to_munged_output/munged_gwas/GSEM.GWAS.EXTERNALIZING.excl23andME.sumstats.gz   --chunksize 500000   --merge-alleles path_to_hm3/w_hm3.snplist

2.2. Create the bash script

The following script was adapted by Camille M. Williams from Richard Karlsson Linnér.

Create a bash script to estimate the heritability of each indicator and the bivariate correlation between the latent factor(s) that emerge from models with full and down-sampled summary statistics (e.g., EXT and EXTminus23andMe)

Replace PATH with the path of your folder with the data, gwas_munged, and OUTPUT folders. Make sure you have the (eur_w_ld_chr) folder in your folder, which can be downloaded here.

Input the path_to_ldsc.

#!/bin/bash

echo "Script executed:"
date

date_var=$(date  +"%Y_%m_%d")
mkdir -p /PATH/OUTPUT/rg_${date_var} 
mkdir -p /PATH/OUTPUT/h2_${date_var} 

#==============================================================================#
# ESTIMATE LD Score regression
#==============================================================================#

cd /PATH/munged_gwas/
file_list="GSEM.GWAS.EXTERNALIZING.excl23andME.sumstats.gz.sumstats.gz"
cd PATH

for file in ${file_list}
do(

## ESTIMATE the bivariate rG of between the factor from GSEM with restricted data and the factor from GSEM without restricted data ##
python /path_to_ldsc/ldsc/ldsc.py \
--rg /PATH/gwas_munged/${file},\
/PATH/gwas_munged/EXTERNALIZING_WITH_23ANDME.sumstats.gz.sumstats.gz,\
--ref-ld-chr /PATH/data/eur_w_ld_chr/ \
--w-ld-chr /PATH/data/eur_w_ld_chr/ \
--out /PATH/OUTPUT/rg_${date_var}/rg.${file}.ALL_other
wait

)
done
wait

#==============================================================================#
echo "Script finished:"
date
#==============================================================================#
# END OF SCRIPT 
#==============================================================================#

2.3. Run the bash script in your terminal

Decide on the name_of_your_bash_script. The output will be located in the OUTPUT folder, in a folder the date with the date of the analysis.

bash name_of_your_bash_script.sh

3. Estimated the decrease in genetic signal

Calculate the mean chi-square of the summary statistics from the original multivariate GWAS and the summary statistics of the multivariate GWAS with down-sampled data from the Z estimate of each genomic SEM output.

Genomic_SEM_output$ChiSQ <- (Genomic_SEM_output$Z)^2
mean(Genomic_SEM_output$ChiSQ)

4. Estimate the change in power using down-sampled data.

This section was written by Hyeokmoon Kweon.

You can use the function below to compute the statistical power for the genome-wide significance level (5e-8).

power_gw <- function(N, R2){
    # N: GWAS sample size
    # R2: Effect size in R-squared. Must be between 0 and 1 (not in % scale)        
pchisq(qchisq(5e-8, 1, lower.tail = FALSE), 1, ncp = N * R2, lower.tail = FALSE)
}

For example, to compute the power for the sample size = 1492085 and $R^2$ = 0.0038%,

power_gw(1492085, 0.000038)

which gives 0.981.

5.1. Examine the concordance of the Genomic SEM GWAS coefficients and Z-statistics for subsets of SNPs between the original and reduced summary statistics.

Concordance is examined using a binomial test, regression analyses, and scatter plots. We are interested in evaluating the concordance of SNPs associated with our trait of interest, here EXT. Therefore, we decided to examine the concordance of SNPs that either had a p-value threshold <= 1E-5 or were lead SNPs in the original GWAS.

1. Detect outliers for scatter plots: down-sampled SNPs for which the GWAS coefficient fell outside the 95% confidence interval of the full-data estimate

To detect outliers, we suggest evaluating whether the down-sampled GWAS coefficients fall outside the 95% confidence intervals of their full-data counterparts. If outliers are detected, then we recommend adding an extra indicator column to the down-sampled summary statistics to allow its users to filter out SNPs with deviating down-sampled GWAS coefficients.

First, prepare the summary statistics by making sure the alleles of the summary statistics are aligned. Make sure that down-sample GWAS are restricted to SNPs available in the original summary statistics.

Note that concordance.data.1Emin5, corresponds to a dataset with both the original and downsampled summary statistics but only includes SNPs that had a p-value threshold <= 1E-5 in the original GWAS. The concordance.data.sig corresponds to a dataset with both the original and downsampled summary statistics but only includes SNPs that are lead SNPs in the original summary statistics according to the FUMA analyses. The BETA.A1. variables correspond to the beta column of the sampled (BETA.A1.EXT.min23andMe) and original (BETA.A1.EXT.1.0) summary statistics.

## GEN PROPORTIONAL DEVIATIONS (FOR BETA DIFF FROM ZERO)
concordance.data.1Emin5[BETA.A1.EXT.1.0 != 0, ("prop.abs.dev") := round(((BETA.A1.EXT.min23andMe - BETA.A1.EXT.1.0) / BETA.A1.EXT.1.0), digits = 3) ]

concordance.data.sig[BETA.A1.EXT.1.0 != 0, ("prop.abs.dev") := round(((BETA.A1.EXT.min23andMe - BETA.A1.EXT.1.0) / BETA.A1.EXT.1.0), digits = 3) ]

## CHECK HIST CENTERED ON ZERO AND FOR OUTLIERS
hist(concordance.data.sig$prop.abs.dev)

## FIND OUTLIERS from SNPs in original GWAS (P < 1Emin5)
concordance.data.1Emin5[ , ("outlier95pCI") := 0]
concordance.data.1Emin5[ BETA.A1.EXT.min23andMe > BETA.A1.EXT.1.0 + 1.96 * SE.EXT.1.0 | BETA.A1.EXT.min23andMe < BETA.A1.EXT.1.0 - 1.96 * SE.EXT.1.0, ("outlier95pCI") := 1]
table(concordance.data.1Emin5$outlier95pCI)

## FIND OUTLIERS from SNPs in restricted GWAS (LEAD SNPS)
concordance.data.sig[ , ("outlier95pCI") := 0]
concordance.data.sig[ BETA.A1.EXT.min23andMe > BETA.A1.EXT.1.0 + 1.96 * SE.EXT.1.0 | BETA.A1.EXT.min23andMe < BETA.A1.EXT.1.0 - 1.96 * SE.EXT.1.0, ("outlier95pCI") := 1]
table(concordance.data.sig$outlier95pCI)

######## WRITE OUTLIER COLUMN FOR MERGE WITH THE DISSEMINATED SUMMARY STATISTICS ############
concordance.data[ , ("outlier95pCI") := "No"]
concordance.data[ BETA.A1.EXT.min23andMe > BETA.A1.EXT.1.0 + 1.96 * SE.EXT.1.0 | BETA.A1.EXT.min23andMe < BETA.A1.EXT.1.0 - 1.96 * SE.EXT.1.0, ("outlier95pCI") := "Yes"]

## MERGE with down-sampled summary statistics output from genomic SEM (i.e., all SNPs, not just lead SNPs)
EXT1.0min23andMe <- merge(EXT1.0min23andMe, concordance.data[, .(SNP, outlier95pCI)], by = "SNP", all.x = T, all.y = F, sort = F)

## check outliers in original GWAS that are not in the down-sampled GWAS
EXT1.0min23andMe[ is.na(outlier95pCI) , ("outlier95pCI") := "NotInEXT1.0" ]
EXT1.0min23andMe$BETAoutlier95pCI <- EXT1.0min23andMe$outlier95pCI
table(EXT1.0min23andMe$BETAoutlier95pCI)

## save
fwrite(x = EXT1.0min23andMe[, .(SNP, BETAoutlier95pCI)], file = "EXT1.0min23andMe.rsID.outlier.txt.gz", append = F, quote = F, sep = "\t", compress = "gzip")

5.2. Test the significance of the concordance between the SNPs of the down-sampled and original GWAS

Sign concordance can be evaluated by reporting the proportion of SNPs that have concordant direction of effect or by performing a binomial test. The binomial test requires an assumed null hypothesis of the true probability of success, which we set to 99% to make the test sensitive enough to detect minor deviations from near-perfect concordance (100% is too sensitive as a single discordant observation will reject the null). Power calculations show that 150 independent SNPs provide >80% power to reject this null even if the true, imperfect concordance is as high as 95%.

## FIND POWER OF BINOMIAL TEST
power.out <- round(binom.power(p.alt = 0.95, p = 0.99, n = 1:1000, alternative = "less"), digits = 2)
min(which(power.out == 0.8))

## TEST OF SIGN CONCORDANCE
concordance.data.1Emin5$sign_con <- ifelse(concordance.data.1Emin5$BETA.A1.EXT.1.0 * concordance.data.1Emin5$BETA.A1.EXT.min23andMe >= 0, 1, 0)
table(concordance.data.1Emin5$sign_con)

concordance.data.sig$sign_con <- ifelse(concordance.data.sig$BETA.A1.EXT.1.0 * concordance.data.sig$BETA.A1.EXT.min23andMe >= 0, 1, 0)
all(concordance.data.sig$sign_con == 1)
table(concordance.data.sig$sign_con)

binom.test(x = sum(concordance.data.sig$sign_con), n = nrow(concordance.data.sig), p = 0.99, alternative = "less")$p.value

5.3. Regression Analyses

The regression analysis of the down-sampled coefficients on the full-data coefficients should investigate whether (a) the intercept is zero, (b) whether the regression coefficient is unity (i.e., diagonal line), and (c) whether the adjusted coefficient of determination (adj. R2) is high.

## Figure S4a regression estimates
lm.out <- lm_robust(formula = as.formula("abs.BETA.A1.EXT.min23andMe ~ abs.BETA.A1.EXT.1.0"), data = concordance.data.sig)
summary(lm.out)
sig.intercept <- coefficients(lm.out)[1]
sig.beta <- coefficients(lm.out)[2]
sig.r2 <- summary(lm.out)$adj.r.squared

## Figure S4c regression estimates
lm.out <- lm_robust(formula = as.formula("abs.BETA.A1.EXT.min23andMe ~ abs.BETA.A1.EXT.1.0"), data = concordance.data.1Emin5)
summary(lm.out)
intercept.1Emin5 <- coefficients(lm.out)[1]
beta.1Emin5 <- coefficients(lm.out)[2]
sig.r2.1Emin5 <- summary(lm.out)$adj.r.squared

5.4. Scatter Plots

5.4.1. PLOT BETA P < 1E-5 EXT1.0 (Figure S4c)

jpeg(filename = "Scatter_abs_betas.EXTvsEXT.min23andMe.PvalE1min5EXT.smallcex.jpg", width = 6, height = 5, units = "in", res = 300)
par(las = 1)
plot(x = concordance.data.1Emin5[outlier95pCI == 1, (abs.BETA.A1.EXT.1.0)],
     y = concordance.data.1Emin5[outlier95pCI == 1, (abs.BETA.A1.EXT.min23andMe)],
     ylim = c(0, 0.05),
     xlim = c(0, 0.05),
     pch = 21,
     cex = 0.75,
     lwd = 0.3,
     type = "p",
     xaxs = "i", yaxs = "i",
     xlab = "", ylab = "",
     axes = T,
     bg = "deeppink",
     col = "deeppink3",
     bty = "n")

par(new = T)
plot(x = concordance.data.1Emin5[outlier95pCI == 0, (abs.BETA.A1.EXT.1.0)],
     y = concordance.data.1Emin5[outlier95pCI == 0, (abs.BETA.A1.EXT.min23andMe)],
     ylim = c(0, 0.05),
     xlim = c(0, 0.05),
     pch = 21,
     cex = 0.75,
     lwd = 0.3,
     type = "p",
     xaxs = "i", yaxs = "i",
     xlab = "", ylab = "",
     axes = T,
     bg = "skyblue1",
     col = "royalblue1",
     bty = "n")

## PLOT TITLES AND LABELS
title(main = paste0("Scatter plot of Genomic SEM GWAS coefficients \n (for all ", nrow(concordance.data.1Emin5), " SNPs with P<1E-5 in EXT1.0)"),
      xlab = "|Beta| on EXT",
      ylab = "|Beta| on EXT-minus-23andMe")

## ADD FITTED LINE
abline(a = sig.intercept, b = sig.beta, lty = 2)

## ADD DIAGONAL LINE AS REFERENCE
abline(a = 0, b = 1, lty = 1)

legend_text <- c(paste0("Intercept = ", round(intercept.1Emin5, digits = 3)),
                 paste0("Coefficient = ", round(beta.1Emin5, digits = 3)),
                 paste0("Adj. R-squared = ", round(sig.r2.1Emin5, digits = 3)),
                 "Outlier (>1.96 × SE)")

legend(x = "bottomright", bty = "y", legend = legend_text, adj = c(0), pch = rev(c(21, NA,NA,NA)), pt.bg = "deeppink", col = "deeppink3")

## CLOSE GRAPHICS DEVICE ##
dev.off()

5.4.2. PLOT BETA Lead SNPs (Figure S4a)

jpeg(filename = "Scatter_abs_betas.EXTvsEXT.min23andMe.leadSNPsEXT.smallcex.jpg", width = 6, height = 5, units = "in", res = 300)

par(las = 1)

plot(x = concordance.data.sig[outlier95pCI == 1, (abs.BETA.A1.EXT.1.0)],
     y = concordance.data.sig[outlier95pCI == 1, (abs.BETA.A1.EXT.min23andMe)],
     ylim = c(0, 0.05),
     xlim = c(0, 0.05),
     pch = 21,
     cex = 0.75,
     lwd = 0.3,
     type = "p",
     xaxs = "i", yaxs = "i",
     xlab = "", ylab = "",
     axes = T,
     bg = "deeppink",
     col = "deeppink3",
     bty = "n")

par(new = T)
plot(x = concordance.data.sig[outlier95pCI == 0, (abs.BETA.A1.EXT.1.0)],
     y = concordance.data.sig[outlier95pCI == 0, (abs.BETA.A1.EXT.min23andMe)],
     ylim = c(0, 0.05),
     xlim = c(0, 0.05),
     pch = 21,
     cex = 0.75,
     lwd = 0.3,
     type = "p",
     xaxs = "i", yaxs = "i",
     xlab = "", ylab = "",
     axes = T,
     bg = "skyblue1",
     col = "royalblue1",
     bty = "n")

## PLOT TITLES AND LABELS
title(main = "Scatter plot of Genomic SEM GWAS coefficients \n (for the lead SNPs in EXT1.0)",
      xlab = "|Beta| on EXT",
      ylab = "|Beta| on EXT-minus-23andMe")

## ADD FITTED LINE
abline(a = sig.intercept, b = sig.beta, lty = 2)

## ADD DIAGONAL LINE AS REFERENCE
abline(a = 0, b = 1, lty = 1)

legend_text <- c(paste0("Intercept = ", round(sig.intercept, digits = 3)),
                 paste0("Coefficient = ", round(sig.beta, digits = 3)),
                 paste0("Adj. R-squared = ", round(sig.r2, digits = 3)),
                 "Outlier (>1.96 × SE)")

legend(x = "bottomright", bty = "y", legend = legend_text, adj = c(0), pch = rev(c(21, NA,NA,NA)), pt.bg = "deeppink", col = "deeppink3")

## CLOSE GRAPHICS DEVICE ##
dev.off()

5.4.3. Example PLOT Z SNPs P < 1E-5 EXT1.0 (Figure S4d)

jpeg(filename = "Scatter_abs_Zs.EXTvsEXT.min23andMe.PvalE1min5EXT.smallcex.jpg", width = 6, height = 5, units = "in", res = 300)

par(las = 1)

plot(abs(concordance.data.1Emin5$Z.EXT.1.0),
     abs(concordance.data.1Emin5$Z.EXT.min23andMe),
     ylim = c(0, 20),
     xlim = c(0, 20),
     pch = 21,
     cex = 0.75,
     lwd = 0.3,
     type = "p",
     xaxs = "i", yaxs = "i",
     xlab = "", ylab = "",
     axes = T,
     bg = "chartreuse",
     col = "forestgreen",
     bty = "n")


## PLOT TITLES AND LABELS
title(main = paste0("Scatter plot of Genomic SEM GWAS Z-statistics \n (for all ", nrow(concordance.data.1Emin5), " SNPs with P<1E-5 in EXT1.0)"),
      xlab = "|Z| on EXT",
      ylab = "|Z| on EXT-minus-23andMe")

## ADD DIAGONAL LINE AS REFERENCE
abline(a = 0, b = 1, lty = 1)

## CLOSE GRAPHICS DEVICE ##
dev.off()

5.4.4. PLOT Z Lead SNPs (Figure S4b)

jpeg(filename = "Scatter_abs_Zs.EXTvsEXT.min23andMe.leadSNPsEXT.smallcex.jpg", width = 6, height = 5, units = "in", res = 300)

par(las = 1)

plot(x = abs(concordance.data.sig$Z.EXT.1.0),
     y = abs(concordance.data.sig$Z.EXT.min23andMe),
     ylim = c(0, 20),
     xlim = c(0, 20),
     pch = 21,
     cex = 0.75,
     lwd = 0.3,
     type = "p",
     xaxs = "i", yaxs = "i",
     xlab = "", ylab = "",
     axes = T,
     bg = "chartreuse",
     col = "forestgreen",
     bty = "n")


## PLOT TITLES AND LABELS
title(main = "Scatter plot of Genomic SEM GWAS Z-statistics \n (for the lead SNPs in EXT1.0)",
      xlab = "|Z| on EXT",
      ylab = "|Z| on EXT-minus-23andMe")

## ADD DIAGONAL LINE AS REFERENCE
abline(a = 0, b = 1, lty = 1)

## CLOSE GRAPHICS DEVICE ##
dev.off()

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Step 3: What is the loss of genetic signal at the factor level of multivariate GWAS when the indicator phenotypes are down sampled univariate GWASs?

1. Calculate the SNP effects associated with the latent factor

1.1. Prepare the summary statistics

1.2. Create the summary statistics file

1.3. Run genomic SEM

1.4. Merge the genomic SEM output

1.5. Prepare the genomic SEM output for sharing

2. Estimate the genetic correlation between the latent factor of the original model and the latent factor of the model with down-sampled summary statistics using bivariate LDSC regression (`Bulik et al., 2015`)

2.1. Munge each of your summary statistics using terminal

2.2. Create the bash script

2.3. Run the bash script in your terminal

3. Estimated the decrease in genetic signal

4. Estimate the change in power using down-sampled data.

5.1. Examine the concordance of the Genomic SEM GWAS coefficients and Z-statistics for subsets of SNPs between the original and reduced summary statistics.

1. Detect outliers for scatter plots: down-sampled SNPs for which the GWAS coefficient fell outside the 95% confidence interval of the full-data estimate

5.2. Test the significance of the concordance between the SNPs of the down-sampled and original GWAS

5.3. Regression Analyses

5.4. Scatter Plots

5.4.1. PLOT BETA P < 1E-5 EXT1.0 (Figure S4c)

5.4.2. PLOT BETA Lead SNPs (Figure S4a)

5.4.3. Example PLOT Z SNPs P < 1E-5 EXT1.0 (Figure S4d)

5.4.4. PLOT Z Lead SNPs (Figure S4b)

Clone this wiki locally

Step 3: What is the loss of genetic signal at the factor level of multivariate GWAS when the indicator phenotypes are down sampled univariate GWASs?

1. Calculate the SNP effects associated with the latent factor

1.1. Prepare the summary statistics

1.2. Create the summary statistics file

1.3. Run genomic SEM

1.4. Merge the genomic SEM output

1.5. Prepare the genomic SEM output for sharing

2. Estimate the genetic correlation between the latent factor of the original model and the latent factor of the model with down-sampled summary statistics using bivariate LDSC regression (Bulik et al., 2015)

2.1. Munge each of your summary statistics using terminal

2.2. Create the bash script

2.3. Run the bash script in your terminal

3. Estimated the decrease in genetic signal

4. Estimate the change in power using down-sampled data.

5.1. Examine the concordance of the Genomic SEM GWAS coefficients and Z-statistics for subsets of SNPs between the original and reduced summary statistics.

1. Detect outliers for scatter plots: down-sampled SNPs for which the GWAS coefficient fell outside the 95% confidence interval of the full-data estimate

5.2. Test the significance of the concordance between the SNPs of the down-sampled and original GWAS

5.3. Regression Analyses

5.4. Scatter Plots

5.4.1. PLOT BETA P < 1E-5 EXT1.0 (Figure S4c)

5.4.2. PLOT BETA Lead SNPs (Figure S4a)

5.4.3. Example PLOT Z SNPs P < 1E-5 EXT1.0 (Figure S4d)

5.4.4. PLOT Z Lead SNPs (Figure S4b)

Clone this wiki locally

2. Estimate the genetic correlation between the latent factor of the original model and the latent factor of the model with down-sampled summary statistics using bivariate LDSC regression (`Bulik et al., 2015`)