LD score regression trait correlation ERROR:invalid value encountered in sqrt #117

despinaman · 2018-05-14T16:32:29Z

Hi,

I am running LDSR correlation between two traits (t1d and w6), and when I run the analysis considering an intercept I get the following error:FloatingPointError: invalid value encountered in sqrt

Since there is no sample overlap in the T1D and w6 GWASes, when I constrain the intercept (--no-intercept), the analysis is done, and I get a result that seems expected.

I guess there is a problem with a negative number in a square root, but, why this appears? And why it does not appear when I constrain the intercept?

Thank you!

rkwalters · 2018-05-15T23:21:10Z

Hi, The issue with a negative number in the square root generally occurs when the heritability of one of the traits is negative in one or more of the jackknife estimates of the correlation (i.e. where the square root of h2 exists in the denominator of going from genetic covariance to correlation). This generally means the heritability of at least one of the traits is quite small and/or unstable (e.g. due to small sample size). Using "--no-intercept" will not only assume not sample overlap, but also no population stratification or other confounding within each GWAS (i.e. the univariate intercept is 1). If this is a correct assumption, the h2 estimates will be a bit more stable, which could resolve the appearance of negative h2 estimates in the jackknife. On the other hand, incorrectly assuming no stratification/confounding will tend to upwardly bias h2 estimates, also reducing the likelihood of negative h2 estimates. We normally recommend against LDSR genetic correlation analyses when the univariate h2 results for either phenotype isn't clearly significant (e.g. z-score > 4). We also recommend against constraining the intercepts, since the risk of bias tends to outweigh the gains in precision of the estimates. When you're hitting negative h2 estimates that generally means the unfortunate conclusion is that the current data is insufficient for LDSR genetic correlation analysis, which is unfortunate but seems to be the current reality for LDSR sample size/power requirements. Cheers, Raymond

…

On May 14, 2018, at 12:32 PM, despinaman ***@***.***> wrote: Hi, I am running LDSR correlation between two traits (t1d and w6), and when I run the analysis considering an intercept I get the following error:FloatingPointError: invalid value encountered in sqrt Since there is no sample overlap in the T1D and w6 GWASes, when I constrain the intercept (--no-intercept), the analysis is done, and I get a result that seems expected. I guess there is a problem with a negative number in a square root, but, why this appears? And why it does not appear when I constrain the intercept? Thank you! — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#117>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvbU786LH67zzipR99r2VLzqVh232ks5tybGegaJpZM4T-Hu1>.

despinaman · 2018-05-16T14:34:00Z

Dear Raymond, Thank you for your prompt answer. If I understand well, the problem that creates the error is the characteristics of the GWAS(es), i.e.low heritability of one or both traits and/or small number of individuals. What I tried to run as an analysis when the error appeared was a LDSR correlation between a GWAS trait called w6 (it’s a blood biomarker) and a trait called T1D (type 1 diabetes, N~20,000). When I perform the same analysis between a trait called w3 (a biomarker measured on the same GWAS cohort as the w6, with an N~7,800, with a heritability that is estimated to be very similar to the w6) and t1d, then I can complete the analysis without any problem. (N of SNPs~1,000,000). If the heritability and N of individuals was the problem, I would expect that the LDSR correlation analysis between w3 and t1d would produce the same error, as the analysis between w6 and t1d. The Z-scores of the successful analyses are not extreme (>4 or <-4). Below are the LDSR scripts and the results for the 1) w3-t1d, 2)w6-t1d, 3) w3-t1d (no intercept) and 4) w6-t1d (no intercept): 1) ldsc.py --rg w3.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w3_t1d Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se w3.sumstats.gz t1d.sumstats.gz -0.2635 0.1735 -1.5186 0.1289 0.1636 0.0362 1.1311 0.0084 0.0034 0.005 2) ldsc.py --rg w6.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w6_t1d Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se w6.sumstats.gz t1d.sumstats.gz NA NA NA NA NA NA NA NA NA NA 3)ldsc.py --rg w3.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w3_t1d --no-intercept Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se w3.sumstats.gz t1d.sumstats.gz -0.1105 0.0726 -1.5221 0.128 0.4492 0.0287 1 NA 0 NA 4)ldsc.py --rg w6.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w6_t1d --no-intercept Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se w6.sumstats.gz t1d.sumstats.gz -0.0514 0.0654 -0.7853 0.4323 0.4482 0.0291 1 NA 0 NA Based on the above results, is it possible to spot where the problem is? Thank you again for your help in this. Kindly, Despoina On May 15, 2018, at 7:21 PM, rkwalters <notifications@github.com<mailto:notifications@github.com>> wrote: Hi, The issue with a negative number in the square root generally occurs when the heritability of one of the traits is negative in one or more of the jackknife estimates of the correlation (i.e. where the square root of h2 exists in the denominator of going from genetic covariance to correlation). This generally means the heritability of at least one of the traits is quite small and/or unstable (e.g. due to small sample size). Using "--no-intercept" will not only assume not sample overlap, but also no population stratification or other confounding within each GWAS (i.e. the univariate intercept is 1). If this is a correct assumption, the h2 estimates will be a bit more stable, which could resolve the appearance of negative h2 estimates in the jackknife. On the other hand, incorrectly assuming no stratification/confounding will tend to upwardly bias h2 estimates, also reducing the likelihood of negative h2 estimates. We normally recommend against LDSR genetic correlation analyses when the univariate h2 results for either phenotype isn't clearly significant (e.g. z-score > 4). We also recommend against constraining the intercepts, since the risk of bias tends to outweigh the gains in precision of the estimates. When you're hitting negative h2 estimates that generally means the unfortunate conclusion is that the current data is insufficient for LDSR genetic correlation analysis, which is unfortunate but seems to be the current reality for LDSR sample size/power requirements. Cheers, Raymond

On May 14, 2018, at 12:32 PM, despinaman ***@***.******@***.***>> wrote: Hi, I am running LDSR correlation between two traits (t1d and w6), and when I run the analysis considering an intercept I get the following error:FloatingPointError: invalid value encountered in sqrt Since there is no sample overlap in the T1D and w6 GWASes, when I constrain the intercept (--no-intercept), the analysis is done, and I get a result that seems expected. I guess there is a problem with a negative number in a square root, but, why this appears? And why it does not appear when I constrain the intercept? Thank you! — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#117>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvbU786LH67zzipR99r2VLzqVh232ks5tybGegaJpZM4T-Hu1>.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOArNpyZCoYJmoEhuu9-6z1Fx1jbUks5ty2LngaJpZM4T-Hu1>.

despinaman · 2018-05-16T14:52:04Z

Hi again Raymond, As an addition to my previous email, I ran an analysis inverting the traits, i.e. t1d-w3 and t1d-w6. What I conclude from the results is that the h2 for w3 is 0.0997 and for w6 (no-intercept) is 0.1351 whereas the h2 for T1D is 0.1636 and 0.4492 (no intercept) respectively from the previous analysis. Of note, the heritability of T1D based on twin studies is ~50%, which is closer to the h2 of 0.4492 (no intercept analysis). Please let me know your thoughts on this. Kindly, Despoina Below are the results: 1)ldsc.py --rg t1d.sumstats.gz,w3.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w3 Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se t1d.sumstats.gz w3.sumstats.gz -0.2635 0.1735 -1.5186 0.1289 0.0997 0.0397 0.9986 0.0082 0.0034 0.005 2)ldsc.py --rg t1d.sumstats.gz,w6.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w6 Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se t1d.sumstats.gz w6.sumstats.gz NA NA NA NA NA NA NA NA NA NA 3)ldsc.py --rg t1d.sumstats.gz,w3.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w3 --no-intercept Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se t1d.sumstats.gz w3.sumstats.gz -0.1105 0.0726 -1.5221 0.128 0.0943 0.0347 1 NA 0 NA 4) ldsc.py --rg t1d.sumstats.gz,w6.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w6 --no-intercept Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se t1d.sumstats.gz w6.sumstats.gz -0.0514 0.0654 -0.7853 0.4323 0.1351 0.0358 1 NA 0 NA On May 16, 2018, at 10:33 AM, Despoina Manousaki <despoina.manousaki@mail.mcgill.ca<mailto:despoina.manousaki@mail.mcgill.ca>> wrote: Dear Raymond, Thank you for your prompt answer. If I understand well, the problem that creates the error is the characteristics of the GWAS(es), i.e.low heritability of one or both traits and/or small number of individuals. What I tried to run as an analysis when the error appeared was a LDSR correlation between a GWAS trait called w6 (it’s a blood biomarker) and a trait called T1D (type 1 diabetes, N~20,000). When I perform the same analysis between a trait called w3 (a biomarker measured on the same GWAS cohort as the w6, with an N~7,800, with a heritability that is estimated to be very similar to the w6) and t1d, then I can complete the analysis without any problem. (N of SNPs~1,000,000). If the heritability and N of individuals was the problem, I would expect that the LDSR correlation analysis between w3 and t1d would produce the same error, as the analysis between w6 and t1d. The Z-scores of the successful analyses are not extreme (>4 or <-4). Below are the LDSR scripts and the results for the 1) w3-t1d, 2)w6-t1d, 3) w3-t1d (no intercept) and 4) w6-t1d (no intercept): 1) ldsc.py --rg w3.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w3_t1d Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se w3.sumstats.gz t1d.sumstats.gz -0.2635 0.1735 -1.5186 0.1289 0.1636 0.0362 1.1311 0.0084 0.0034 0.005 2) ldsc.py --rg w6.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w6_t1d Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se w6.sumstats.gz t1d.sumstats.gz NA NA NA NA NA NA NA NA NA NA 3)ldsc.py --rg w3.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w3_t1d --no-intercept Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se w3.sumstats.gz t1d.sumstats.gz -0.1105 0.0726 -1.5221 0.128 0.4492 0.0287 1 NA 0 NA 4)ldsc.py --rg w6.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w6_t1d --no-intercept Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se w6.sumstats.gz t1d.sumstats.gz -0.0514 0.0654 -0.7853 0.4323 0.4482 0.0291 1 NA 0 NA Based on the above results, is it possible to spot where the problem is? Thank you again for your help in this. Kindly, Despoina On May 15, 2018, at 7:21 PM, rkwalters <notifications@github.com<mailto:notifications@github.com>> wrote: Hi, The issue with a negative number in the square root generally occurs when the heritability of one of the traits is negative in one or more of the jackknife estimates of the correlation (i.e. where the square root of h2 exists in the denominator of going from genetic covariance to correlation). This generally means the heritability of at least one of the traits is quite small and/or unstable (e.g. due to small sample size). Using "--no-intercept" will not only assume not sample overlap, but also no population stratification or other confounding within each GWAS (i.e. the univariate intercept is 1). If this is a correct assumption, the h2 estimates will be a bit more stable, which could resolve the appearance of negative h2 estimates in the jackknife. On the other hand, incorrectly assuming no stratification/confounding will tend to upwardly bias h2 estimates, also reducing the likelihood of negative h2 estimates. We normally recommend against LDSR genetic correlation analyses when the univariate h2 results for either phenotype isn't clearly significant (e.g. z-score > 4). We also recommend against constraining the intercepts, since the risk of bias tends to outweigh the gains in precision of the estimates. When you're hitting negative h2 estimates that generally means the unfortunate conclusion is that the current data is insufficient for LDSR genetic correlation analysis, which is unfortunate but seems to be the current reality for LDSR sample size/power requirements. Cheers, Raymond

On May 14, 2018, at 12:32 PM, despinaman ***@***.******@***.***>> wrote: Hi, I am running LDSR correlation between two traits (t1d and w6), and when I run the analysis considering an intercept I get the following error:FloatingPointError: invalid value encountered in sqrt Since there is no sample overlap in the T1D and w6 GWASes, when I constrain the intercept (--no-intercept), the analysis is done, and I get a result that seems expected. I guess there is a problem with a negative number in a square root, but, why this appears? And why it does not appear when I constrain the intercept? Thank you! — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#117>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvbU786LH67zzipR99r2VLzqVh232ks5tybGegaJpZM4T-Hu1>.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOArNpyZCoYJmoEhuu9-6z1Fx1jbUks5ty2LngaJpZM4T-Hu1>.

rkwalters · 2018-05-16T18:18:18Z

Hi Despoina, Yes, your understanding is correct that the problem of errors from negative h2 estimates is generally a property of one of the GWAS. Based on your set of analyses, it looks like the w6 GWAS is likely the problem. The key analysis you're missing to confirm that issue is: ldsc.py --h2 w6.sumstats.gz --ref-ld-chr ... in order to estimate h2 for w6 without needing to constrain the intercept. I wouldn't consider the h2=.135 from the constrained intercept rg analysis terribly meaningful until you have evidence that it's reasonable to constrain that intercept. As far as T1D, when there's that much difference between the h2 estimates with and without constrained intercept then the unconstrained estimate is expected to be much more reliable. Note that the intercept for T1D is highly significant when left unconstrained (intercept = 1.131, se=.008) suggesting the T1D GWAS has uncorrected population stratification, other confounding, or model misspecification, such that the non-null intercept is appropriate for LDSR analyses with those T1D sumstats. Cheers, Raymond

…

On May 16, 2018, at 10:52 AM, despinaman ***@***.***> wrote: Hi again Raymond, As an addition to my previous email, I ran an analysis inverting the traits, i.e. t1d-w3 and t1d-w6. What I conclude from the results is that the h2 for w3 is 0.0997 and for w6 (no-intercept) is 0.1351 whereas the h2 for T1D is 0.1636 and 0.4492 (no intercept) respectively from the previous analysis. Of note, the heritability of T1D based on twin studies is ~50%, which is closer to the h2 of 0.4492 (no intercept analysis). Please let me know your thoughts on this. Kindly, Despoina Below are the results: 1)ldsc.py --rg t1d.sumstats.gz,w3.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w3 Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se t1d.sumstats.gz w3.sumstats.gz -0.2635 0.1735 -1.5186 0.1289 0.0997 0.0397 0.9986 0.0082 0.0034 0.005 2)ldsc.py --rg t1d.sumstats.gz,w6.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w6 Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se t1d.sumstats.gz w6.sumstats.gz NA NA NA NA NA NA NA NA NA NA 3)ldsc.py --rg t1d.sumstats.gz,w3.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w3 --no-intercept Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se t1d.sumstats.gz w3.sumstats.gz -0.1105 0.0726 -1.5221 0.128 0.0943 0.0347 1 NA 0 NA 4) ldsc.py --rg t1d.sumstats.gz,w6.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w6 --no-intercept Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se t1d.sumstats.gz w6.sumstats.gz -0.0514 0.0654 -0.7853 0.4323 0.1351 0.0358 1 NA 0 NA On May 16, 2018, at 10:33 AM, Despoina Manousaki ***@***.******@***.***>> wrote: Dear Raymond, Thank you for your prompt answer. If I understand well, the problem that creates the error is the characteristics of the GWAS(es), i.e.low heritability of one or both traits and/or small number of individuals. What I tried to run as an analysis when the error appeared was a LDSR correlation between a GWAS trait called w6 (it’s a blood biomarker) and a trait called T1D (type 1 diabetes, N~20,000). When I perform the same analysis between a trait called w3 (a biomarker measured on the same GWAS cohort as the w6, with an N~7,800, with a heritability that is estimated to be very similar to the w6) and t1d, then I can complete the analysis without any problem. (N of SNPs~1,000,000). If the heritability and N of individuals was the problem, I would expect that the LDSR correlation analysis between w3 and t1d would produce the same error, as the analysis between w6 and t1d. The Z-scores of the successful analyses are not extreme (>4 or <-4). Below are the LDSR scripts and the results for the 1) w3-t1d, 2)w6-t1d, 3) w3-t1d (no intercept) and 4) w6-t1d (no intercept): 1) ldsc.py --rg w3.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w3_t1d Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se w3.sumstats.gz t1d.sumstats.gz -0.2635 0.1735 -1.5186 0.1289 0.1636 0.0362 1.1311 0.0084 0.0034 0.005 2) ldsc.py --rg w6.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w6_t1d Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se w6.sumstats.gz t1d.sumstats.gz NA NA NA NA NA NA NA NA NA NA 3)ldsc.py --rg w3.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w3_t1d --no-intercept Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se w3.sumstats.gz t1d.sumstats.gz -0.1105 0.0726 -1.5221 0.128 0.4492 0.0287 1 NA 0 NA 4)ldsc.py --rg w6.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w6_t1d --no-intercept Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se w6.sumstats.gz t1d.sumstats.gz -0.0514 0.0654 -0.7853 0.4323 0.4482 0.0291 1 NA 0 NA Based on the above results, is it possible to spot where the problem is? Thank you again for your help in this. Kindly, Despoina On May 15, 2018, at 7:21 PM, rkwalters ***@***.******@***.***>> wrote: Hi, The issue with a negative number in the square root generally occurs when the heritability of one of the traits is negative in one or more of the jackknife estimates of the correlation (i.e. where the square root of h2 exists in the denominator of going from genetic covariance to correlation). This generally means the heritability of at least one of the traits is quite small and/or unstable (e.g. due to small sample size). Using "--no-intercept" will not only assume not sample overlap, but also no population stratification or other confounding within each GWAS (i.e. the univariate intercept is 1). If this is a correct assumption, the h2 estimates will be a bit more stable, which could resolve the appearance of negative h2 estimates in the jackknife. On the other hand, incorrectly assuming no stratification/confounding will tend to upwardly bias h2 estimates, also reducing the likelihood of negative h2 estimates. We normally recommend against LDSR genetic correlation analyses when the univariate h2 results for either phenotype isn't clearly significant (e.g. z-score > 4). We also recommend against constraining the intercepts, since the risk of bias tends to outweigh the gains in precision of the estimates. When you're hitting negative h2 estimates that generally means the unfortunate conclusion is that the current data is insufficient for LDSR genetic correlation analysis, which is unfortunate but seems to be the current reality for LDSR sample size/power requirements. Cheers, Raymond > On May 14, 2018, at 12:32 PM, despinaman ***@***.******@***.***>> wrote: > > Hi, > > I am running LDSR correlation between two traits (t1d and w6), and when I run the analysis considering an intercept I get the following error:FloatingPointError: invalid value encountered in sqrt > > Since there is no sample overlap in the T1D and w6 GWASes, when I constrain the intercept (--no-intercept), the analysis is done, and I get a result that seems expected. > > I guess there is a problem with a negative number in a square root, but, why this appears? And why it does not appear when I constrain the intercept? > > Thank you! > > — > You are receiving this because you are subscribed to this thread. > Reply to this email directly, view it on GitHub <#117>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvbU786LH67zzipR99r2VLzqVh232ks5tybGegaJpZM4T-Hu1>. > — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOArNpyZCoYJmoEhuu9-6z1Fx1jbUks5ty2LngaJpZM4T-Hu1>. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#117 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvW2yu0XKk4eTUwuDqzvceTx6vCcaks5tzD0WgaJpZM4T-Hu1>.

despinaman · 2018-05-16T18:47:47Z

Hi Raymond, Thank you for your suggestions and explanations. I ran the analysis for the w6 and the results I get are the following: Read regression weight LD Scores for 1242190 SNPs. After merging with reference panel LD, 1148002 SNPs remain. After merging with regression SNP LD, 1041855 SNPs remain. Removed 85 SNPs with chi^2 > 80 (1041770 SNPs remain) Total Observed scale h2: 0.1713 (0.0316) ... Lambda GC: 1.1747 Mean Chi^2: 1.2042 Intercept: 1.1227 (0.0087) Ratio: 0.6009 (0.0428) So the analysis was performed and the h2 is 0.1713 for w6. Would that mean that the w6 GWAS is NOT the problem? I also have a comment on the T1D GWAS. I agree with you that there could be issues with population stratification, since it is an ascertained case-control GWAS (where 30% of the cohort were T1D cases and 70% were controls, which of course does not reflect the general population, where we expect a prevalence of 0.3% of T1D). At this point I was even wondering if estimation of h2 through LDSR can be performed for ascertained case control GWASes (if yes, I guess we should only consider the unconstrained estimate). Please let me know what you think. Cheers, Despoina On May 16, 2018, at 2:18 PM, rkwalters <notifications@github.com<mailto:notifications@github.com>> wrote: Hi Despoina, Yes, your understanding is correct that the problem of errors from negative h2 estimates is generally a property of one of the GWAS. Based on your set of analyses, it looks like the w6 GWAS is likely the problem. The key analysis you're missing to confirm that issue is: ldsc.py --h2 w6.sumstats.gz --ref-ld-chr ... in order to estimate h2 for w6 without needing to constrain the intercept. I wouldn't consider the h2=.135 from the constrained intercept rg analysis terribly meaningful until you have evidence that it's reasonable to constrain that intercept. As far as T1D, when there's that much difference between the h2 estimates with and without constrained intercept then the unconstrained estimate is expected to be much more reliable. Note that the intercept for T1D is highly significant when left unconstrained (intercept = 1.131, se=.008) suggesting the T1D GWAS has uncorrected population stratification, other confounding, or model misspecification, such that the non-null intercept is appropriate for LDSR analyses with those T1D sumstats. Cheers, Raymond

On May 16, 2018, at 10:52 AM, despinaman ***@***.******@***.***>> wrote: Hi again Raymond, As an addition to my previous email, I ran an analysis inverting the traits, i.e. t1d-w3 and t1d-w6. What I conclude from the results is that the h2 for w3 is 0.0997 and for w6 (no-intercept) is 0.1351 whereas the h2 for T1D is 0.1636 and 0.4492 (no intercept) respectively from the previous analysis. Of note, the heritability of T1D based on twin studies is ~50%, which is closer to the h2 of 0.4492 (no intercept analysis). Please let me know your thoughts on this. Kindly, Despoina Below are the results: 1)ldsc.py --rg t1d.sumstats.gz,w3.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w3 Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se t1d.sumstats.gz w3.sumstats.gz -0.2635 0.1735 -1.5186 0.1289 0.0997 0.0397 0.9986 0.0082 0.0034 0.005 2)ldsc.py --rg t1d.sumstats.gz,w6.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w6 Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se t1d.sumstats.gz w6.sumstats.gz NA NA NA NA NA NA NA NA NA NA 3)ldsc.py --rg t1d.sumstats.gz,w3.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w3 --no-intercept Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se t1d.sumstats.gz w3.sumstats.gz -0.1105 0.0726 -1.5221 0.128 0.0943 0.0347 1 NA 0 NA 4) ldsc.py --rg t1d.sumstats.gz,w6.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w6 --no-intercept Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se t1d.sumstats.gz w6.sumstats.gz -0.0514 0.0654 -0.7853 0.4323 0.1351 0.0358 1 NA 0 NA On May 16, 2018, at 10:33 AM, Despoina Manousaki ***@***.******@***.******@***.***>> wrote: Dear Raymond, Thank you for your prompt answer. If I understand well, the problem that creates the error is the characteristics of the GWAS(es), i.e.low heritability of one or both traits and/or small number of individuals. What I tried to run as an analysis when the error appeared was a LDSR correlation between a GWAS trait called w6 (it’s a blood biomarker) and a trait called T1D (type 1 diabetes, N~20,000). When I perform the same analysis between a trait called w3 (a biomarker measured on the same GWAS cohort as the w6, with an N~7,800, with a heritability that is estimated to be very similar to the w6) and t1d, then I can complete the analysis without any problem. (N of SNPs~1,000,000). If the heritability and N of individuals was the problem, I would expect that the LDSR correlation analysis between w3 and t1d would produce the same error, as the analysis between w6 and t1d. The Z-scores of the successful analyses are not extreme (>4 or <-4). Below are the LDSR scripts and the results for the 1) w3-t1d, 2)w6-t1d, 3) w3-t1d (no intercept) and 4) w6-t1d (no intercept): 1) ldsc.py --rg w3.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w3_t1d Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se w3.sumstats.gz t1d.sumstats.gz -0.2635 0.1735 -1.5186 0.1289 0.1636 0.0362 1.1311 0.0084 0.0034 0.005 2) ldsc.py --rg w6.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w6_t1d Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se w6.sumstats.gz t1d.sumstats.gz NA NA NA NA NA NA NA NA NA NA 3)ldsc.py --rg w3.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w3_t1d --no-intercept Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se w3.sumstats.gz t1d.sumstats.gz -0.1105 0.0726 -1.5221 0.128 0.4492 0.0287 1 NA 0 NA 4)ldsc.py --rg w6.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w6_t1d --no-intercept Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se w6.sumstats.gz t1d.sumstats.gz -0.0514 0.0654 -0.7853 0.4323 0.4482 0.0291 1 NA 0 NA Based on the above results, is it possible to spot where the problem is? Thank you again for your help in this. Kindly, Despoina On May 15, 2018, at 7:21 PM, rkwalters ***@***.******@***.******@***.***>> wrote: Hi, The issue with a negative number in the square root generally occurs when the heritability of one of the traits is negative in one or more of the jackknife estimates of the correlation (i.e. where the square root of h2 exists in the denominator of going from genetic covariance to correlation). This generally means the heritability of at least one of the traits is quite small and/or unstable (e.g. due to small sample size). Using "--no-intercept" will not only assume not sample overlap, but also no population stratification or other confounding within each GWAS (i.e. the univariate intercept is 1). If this is a correct assumption, the h2 estimates will be a bit more stable, which could resolve the appearance of negative h2 estimates in the jackknife. On the other hand, incorrectly assuming no stratification/confounding will tend to upwardly bias h2 estimates, also reducing the likelihood of negative h2 estimates. We normally recommend against LDSR genetic correlation analyses when the univariate h2 results for either phenotype isn't clearly significant (e.g. z-score > 4). We also recommend against constraining the intercepts, since the risk of bias tends to outweigh the gains in precision of the estimates. When you're hitting negative h2 estimates that generally means the unfortunate conclusion is that the current data is insufficient for LDSR genetic correlation analysis, which is unfortunate but seems to be the current reality for LDSR sample size/power requirements. Cheers, Raymond > On May 14, 2018, at 12:32 PM, despinaman ***@***.******@***.******@***.***>> wrote: > > Hi, > > I am running LDSR correlation between two traits (t1d and w6), and when I run the analysis considering an intercept I get the following error:FloatingPointError: invalid value encountered in sqrt > > Since there is no sample overlap in the T1D and w6 GWASes, when I constrain the intercept (--no-intercept), the analysis is done, and I get a result that seems expected. > > I guess there is a problem with a negative number in a square root, but, why this appears? And why it does not appear when I constrain the intercept? > > Thank you! > > — > You are receiving this because you are subscribed to this thread. > Reply to this email directly, view it on GitHub <#117>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvbU786LH67zzipR99r2VLzqVh232ks5tybGegaJpZM4T-Hu1>. > — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOArNpyZCoYJmoEhuu9-6z1Fx1jbUks5ty2LngaJpZM4T-Hu1>. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#117 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvW2yu0XKk4eTUwuDqzvceTx6vCcaks5tzD0WgaJpZM4T-Hu1>.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOPL_8z0up0IxM3Q7BC8wZMOTkGF4ks5tzG1vgaJpZM4T-Hu1>.

rkwalters · 2018-05-17T17:28:42Z

Hi Despoina, Yes, LDSR works for ascertained case/control GWAS. You'll generally want to specify the within-sample and population prevalences though (with --samp-prev and --pop-prev) to get h2 estimates on the liability scale rather than the observed scale, since the interpretation of the observed scale estimate is particularly limited in ascertained samples. That intercept result for w6 definitely indicates that constraining it to zero for the correlation analysis would be inappropriate. The h2 result for w6 is sufficiently strong though that it suggests something else is happening that's not the basic power/sample size issues. My hunch would be some form of model misspecification, possibly to due outlier loci with very strong effects for each phenotype? Biomarkers do often have a genetic architecture with stronger top hits, so it possible that's causing some extra instability in the correlation analysis. To test, I'd recommend excluding any loci with outsized effects for either GWAS and then re-running LDSR to see if you get more stable estimates. Cheers, Raymond

…

On May 16, 2018, at 2:47 PM, despinaman ***@***.***> wrote: Hi Raymond, Thank you for your suggestions and explanations. I ran the analysis for the w6 and the results I get are the following: Read regression weight LD Scores for 1242190 SNPs. After merging with reference panel LD, 1148002 SNPs remain. After merging with regression SNP LD, 1041855 SNPs remain. Removed 85 SNPs with chi^2 > 80 (1041770 SNPs remain) Total Observed scale h2: 0.1713 (0.0316) ... Lambda GC: 1.1747 Mean Chi^2: 1.2042 Intercept: 1.1227 (0.0087) Ratio: 0.6009 (0.0428) So the analysis was performed and the h2 is 0.1713 for w6. Would that mean that the w6 GWAS is NOT the problem? I also have a comment on the T1D GWAS. I agree with you that there could be issues with population stratification, since it is an ascertained case-control GWAS (where 30% of the cohort were T1D cases and 70% were controls, which of course does not reflect the general population, where we expect a prevalence of 0.3% of T1D). At this point I was even wondering if estimation of h2 through LDSR can be performed for ascertained case control GWASes (if yes, I guess we should only consider the unconstrained estimate). Please let me know what you think. Cheers, Despoina On May 16, 2018, at 2:18 PM, rkwalters ***@***.******@***.***>> wrote: Hi Despoina, Yes, your understanding is correct that the problem of errors from negative h2 estimates is generally a property of one of the GWAS. Based on your set of analyses, it looks like the w6 GWAS is likely the problem. The key analysis you're missing to confirm that issue is: ldsc.py --h2 w6.sumstats.gz --ref-ld-chr ... in order to estimate h2 for w6 without needing to constrain the intercept. I wouldn't consider the h2=.135 from the constrained intercept rg analysis terribly meaningful until you have evidence that it's reasonable to constrain that intercept. As far as T1D, when there's that much difference between the h2 estimates with and without constrained intercept then the unconstrained estimate is expected to be much more reliable. Note that the intercept for T1D is highly significant when left unconstrained (intercept = 1.131, se=.008) suggesting the T1D GWAS has uncorrected population stratification, other confounding, or model misspecification, such that the non-null intercept is appropriate for LDSR analyses with those T1D sumstats. Cheers, Raymond > On May 16, 2018, at 10:52 AM, despinaman ***@***.******@***.***>> wrote: > > Hi again Raymond, > > As an addition to my previous email, I ran an analysis inverting the traits, i.e. t1d-w3 and t1d-w6. What I conclude from the results is that the h2 for w3 is 0.0997 and for w6 (no-intercept) is 0.1351 whereas the h2 for T1D is 0.1636 and 0.4492 (no intercept) respectively from the previous analysis. > > Of note, the heritability of T1D based on twin studies is ~50%, which is closer to the h2 of 0.4492 (no intercept analysis). > > Please let me know your thoughts on this. > > Kindly, > > Despoina > > > Below are the results: > 1)ldsc.py --rg t1d.sumstats.gz,w3.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w3 > > Summary of Genetic Correlation Results > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > t1d.sumstats.gz w3.sumstats.gz -0.2635 0.1735 -1.5186 0.1289 0.0997 0.0397 0.9986 0.0082 0.0034 0.005 > > 2)ldsc.py --rg t1d.sumstats.gz,w6.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w6 > > Summary of Genetic Correlation Results > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > t1d.sumstats.gz w6.sumstats.gz NA NA NA NA NA NA NA NA NA NA > > 3)ldsc.py --rg t1d.sumstats.gz,w3.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w3 --no-intercept > > Summary of Genetic Correlation Results > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > t1d.sumstats.gz w3.sumstats.gz -0.1105 0.0726 -1.5221 0.128 0.0943 0.0347 1 NA 0 NA > > 4) ldsc.py --rg t1d.sumstats.gz,w6.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w6 --no-intercept > > Summary of Genetic Correlation Results > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > t1d.sumstats.gz w6.sumstats.gz -0.0514 0.0654 -0.7853 0.4323 0.1351 0.0358 1 NA 0 NA > > > > > On May 16, 2018, at 10:33 AM, Despoina Manousaki ***@***.******@***.******@***.***>> wrote: > > Dear Raymond, > > Thank you for your prompt answer. If I understand well, the problem that creates the error is the characteristics of the GWAS(es), i.e.low heritability of one or both traits and/or small number of individuals. > > What I tried to run as an analysis when the error appeared was a LDSR correlation between a GWAS trait called w6 (it’s a blood biomarker) and a trait called T1D (type 1 diabetes, N~20,000). When I perform the same analysis between a trait called w3 (a biomarker measured on the same GWAS cohort as the w6, with an N~7,800, with a heritability that is estimated to be very similar to the w6) and t1d, then I can complete the analysis without any problem. (N of SNPs~1,000,000). > > If the heritability and N of individuals was the problem, I would expect that the LDSR correlation analysis between w3 and t1d would produce the same error, as the analysis between w6 and t1d. The Z-scores of the successful analyses are not extreme (>4 or <-4). > > Below are the LDSR scripts and the results for the 1) w3-t1d, 2)w6-t1d, 3) w3-t1d (no intercept) and 4) w6-t1d (no intercept): > > 1) ldsc.py --rg w3.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w3_t1d > > Summary of Genetic Correlation Results > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > w3.sumstats.gz t1d.sumstats.gz -0.2635 0.1735 -1.5186 0.1289 0.1636 0.0362 1.1311 0.0084 0.0034 0.005 > > 2) ldsc.py --rg w6.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w6_t1d > > Summary of Genetic Correlation Results > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > w6.sumstats.gz t1d.sumstats.gz NA NA NA NA NA NA NA NA NA NA > > 3)ldsc.py --rg w3.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w3_t1d --no-intercept > > Summary of Genetic Correlation Results > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > w3.sumstats.gz t1d.sumstats.gz -0.1105 0.0726 -1.5221 0.128 0.4492 0.0287 1 NA 0 NA > > 4)ldsc.py --rg w6.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w6_t1d --no-intercept > > Summary of Genetic Correlation Results > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > w6.sumstats.gz t1d.sumstats.gz -0.0514 0.0654 -0.7853 0.4323 0.4482 0.0291 1 NA 0 NA > > Based on the above results, is it possible to spot where the problem is? > > Thank you again for your help in this. > > Kindly, > > Despoina > On May 15, 2018, at 7:21 PM, rkwalters ***@***.******@***.******@***.***>> wrote: > > Hi, > > The issue with a negative number in the square root generally occurs when the heritability of one of the traits is negative in one or more of the jackknife estimates of the correlation (i.e. where the square root of h2 exists in the denominator of going from genetic covariance to correlation). This generally means the heritability of at least one of the traits is quite small and/or unstable (e.g. due to small sample size). > > Using "--no-intercept" will not only assume not sample overlap, but also no population stratification or other confounding within each GWAS (i.e. the univariate intercept is 1). If this is a correct assumption, the h2 estimates will be a bit more stable, which could resolve the appearance of negative h2 estimates in the jackknife. On the other hand, incorrectly assuming no stratification/confounding will tend to upwardly bias h2 estimates, also reducing the likelihood of negative h2 estimates. > > We normally recommend against LDSR genetic correlation analyses when the univariate h2 results for either phenotype isn't clearly significant (e.g. z-score > 4). We also recommend against constraining the intercepts, since the risk of bias tends to outweigh the gains in precision of the estimates. When you're hitting negative h2 estimates that generally means the unfortunate conclusion is that the current data is insufficient for LDSR genetic correlation analysis, which is unfortunate but seems to be the current reality for LDSR sample size/power requirements. > > Cheers, > Raymond > > > > > On May 14, 2018, at 12:32 PM, despinaman ***@***.******@***.******@***.***>> wrote: > > > > Hi, > > > > I am running LDSR correlation between two traits (t1d and w6), and when I run the analysis considering an intercept I get the following error:FloatingPointError: invalid value encountered in sqrt > > > > Since there is no sample overlap in the T1D and w6 GWASes, when I constrain the intercept (--no-intercept), the analysis is done, and I get a result that seems expected. > > > > I guess there is a problem with a negative number in a square root, but, why this appears? And why it does not appear when I constrain the intercept? > > > > Thank you! > > > > — > > You are receiving this because you are subscribed to this thread. > > Reply to this email directly, view it on GitHub <#117>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvbU786LH67zzipR99r2VLzqVh232ks5tybGegaJpZM4T-Hu1>. > > > > > — > You are receiving this because you authored the thread. > Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOArNpyZCoYJmoEhuu9-6z1Fx1jbUks5ty2LngaJpZM4T-Hu1>. > > > — > You are receiving this because you commented. > Reply to this email directly, view it on GitHub <#117 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvW2yu0XKk4eTUwuDqzvceTx6vCcaks5tzD0WgaJpZM4T-Hu1>. > — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOPL_8z0up0IxM3Q7BC8wZMOTkGF4ks5tzG1vgaJpZM4T-Hu1>. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#117 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvdo7YNiGYGkQCeReUFSUp_KnX6BTks5tzHRWgaJpZM4T-Hu1>.

despinaman · 2018-05-17T17:35:43Z

Hi Raymond, Thank you for these suggestions, they are really helpful. I think we can remove the loci with the outsized effects and rerun the LDSR. Do you have any suggestions for OR or chi square cut-offs? In LD hub they suggest removing SNPs with X^2 > 80. Best, Despoina On May 17, 2018, at 1:28 PM, rkwalters <notifications@github.com<mailto:notifications@github.com>> wrote: Hi Despoina, Yes, LDSR works for ascertained case/control GWAS. You'll generally want to specify the within-sample and population prevalences though (with --samp-prev and --pop-prev) to get h2 estimates on the liability scale rather than the observed scale, since the interpretation of the observed scale estimate is particularly limited in ascertained samples. That intercept result for w6 definitely indicates that constraining it to zero for the correlation analysis would be inappropriate. The h2 result for w6 is sufficiently strong though that it suggests something else is happening that's not the basic power/sample size issues. My hunch would be some form of model misspecification, possibly to due outlier loci with very strong effects for each phenotype? Biomarkers do often have a genetic architecture with stronger top hits, so it possible that's causing some extra instability in the correlation analysis. To test, I'd recommend excluding any loci with outsized effects for either GWAS and then re-running LDSR to see if you get more stable estimates. Cheers, Raymond

On May 16, 2018, at 2:47 PM, despinaman ***@***.******@***.***>> wrote: Hi Raymond, Thank you for your suggestions and explanations. I ran the analysis for the w6 and the results I get are the following: Read regression weight LD Scores for 1242190 SNPs. After merging with reference panel LD, 1148002 SNPs remain. After merging with regression SNP LD, 1041855 SNPs remain. Removed 85 SNPs with chi^2 > 80 (1041770 SNPs remain) Total Observed scale h2: 0.1713 (0.0316) ... Lambda GC: 1.1747 Mean Chi^2: 1.2042 Intercept: 1.1227 (0.0087) Ratio: 0.6009 (0.0428) So the analysis was performed and the h2 is 0.1713 for w6. Would that mean that the w6 GWAS is NOT the problem? I also have a comment on the T1D GWAS. I agree with you that there could be issues with population stratification, since it is an ascertained case-control GWAS (where 30% of the cohort were T1D cases and 70% were controls, which of course does not reflect the general population, where we expect a prevalence of 0.3% of T1D). At this point I was even wondering if estimation of h2 through LDSR can be performed for ascertained case control GWASes (if yes, I guess we should only consider the unconstrained estimate). Please let me know what you think. Cheers, Despoina On May 16, 2018, at 2:18 PM, rkwalters ***@***.******@***.******@***.***>> wrote: Hi Despoina, Yes, your understanding is correct that the problem of errors from negative h2 estimates is generally a property of one of the GWAS. Based on your set of analyses, it looks like the w6 GWAS is likely the problem. The key analysis you're missing to confirm that issue is: ldsc.py --h2 w6.sumstats.gz --ref-ld-chr ... in order to estimate h2 for w6 without needing to constrain the intercept. I wouldn't consider the h2=.135 from the constrained intercept rg analysis terribly meaningful until you have evidence that it's reasonable to constrain that intercept. As far as T1D, when there's that much difference between the h2 estimates with and without constrained intercept then the unconstrained estimate is expected to be much more reliable. Note that the intercept for T1D is highly significant when left unconstrained (intercept = 1.131, se=.008) suggesting the T1D GWAS has uncorrected population stratification, other confounding, or model misspecification, such that the non-null intercept is appropriate for LDSR analyses with those T1D sumstats. Cheers, Raymond > On May 16, 2018, at 10:52 AM, despinaman ***@***.******@***.******@***.***>> wrote: > > Hi again Raymond, > > As an addition to my previous email, I ran an analysis inverting the traits, i.e. t1d-w3 and t1d-w6. What I conclude from the results is that the h2 for w3 is 0.0997 and for w6 (no-intercept) is 0.1351 whereas the h2 for T1D is 0.1636 and 0.4492 (no intercept) respectively from the previous analysis. > > Of note, the heritability of T1D based on twin studies is ~50%, which is closer to the h2 of 0.4492 (no intercept analysis). > > Please let me know your thoughts on this. > > Kindly, > > Despoina > > > Below are the results: > 1)ldsc.py --rg t1d.sumstats.gz,w3.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w3 > > Summary of Genetic Correlation Results > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > t1d.sumstats.gz w3.sumstats.gz -0.2635 0.1735 -1.5186 0.1289 0.0997 0.0397 0.9986 0.0082 0.0034 0.005 > > 2)ldsc.py --rg t1d.sumstats.gz,w6.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w6 > > Summary of Genetic Correlation Results > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > t1d.sumstats.gz w6.sumstats.gz NA NA NA NA NA NA NA NA NA NA > > 3)ldsc.py --rg t1d.sumstats.gz,w3.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w3 --no-intercept > > Summary of Genetic Correlation Results > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > t1d.sumstats.gz w3.sumstats.gz -0.1105 0.0726 -1.5221 0.128 0.0943 0.0347 1 NA 0 NA > > 4) ldsc.py --rg t1d.sumstats.gz,w6.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w6 --no-intercept > > Summary of Genetic Correlation Results > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > t1d.sumstats.gz w6.sumstats.gz -0.0514 0.0654 -0.7853 0.4323 0.1351 0.0358 1 NA 0 NA > > > > > On May 16, 2018, at 10:33 AM, Despoina Manousaki ***@***.******@***.******@***.******@***.***>> wrote: > > Dear Raymond, > > Thank you for your prompt answer. If I understand well, the problem that creates the error is the characteristics of the GWAS(es), i.e.low heritability of one or both traits and/or small number of individuals. > > What I tried to run as an analysis when the error appeared was a LDSR correlation between a GWAS trait called w6 (it’s a blood biomarker) and a trait called T1D (type 1 diabetes, N~20,000). When I perform the same analysis between a trait called w3 (a biomarker measured on the same GWAS cohort as the w6, with an N~7,800, with a heritability that is estimated to be very similar to the w6) and t1d, then I can complete the analysis without any problem. (N of SNPs~1,000,000). > > If the heritability and N of individuals was the problem, I would expect that the LDSR correlation analysis between w3 and t1d would produce the same error, as the analysis between w6 and t1d. The Z-scores of the successful analyses are not extreme (>4 or <-4). > > Below are the LDSR scripts and the results for the 1) w3-t1d, 2)w6-t1d, 3) w3-t1d (no intercept) and 4) w6-t1d (no intercept): > > 1) ldsc.py --rg w3.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w3_t1d > > Summary of Genetic Correlation Results > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > w3.sumstats.gz t1d.sumstats.gz -0.2635 0.1735 -1.5186 0.1289 0.1636 0.0362 1.1311 0.0084 0.0034 0.005 > > 2) ldsc.py --rg w6.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w6_t1d > > Summary of Genetic Correlation Results > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > w6.sumstats.gz t1d.sumstats.gz NA NA NA NA NA NA NA NA NA NA > > 3)ldsc.py --rg w3.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w3_t1d --no-intercept > > Summary of Genetic Correlation Results > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > w3.sumstats.gz t1d.sumstats.gz -0.1105 0.0726 -1.5221 0.128 0.4492 0.0287 1 NA 0 NA > > 4)ldsc.py --rg w6.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w6_t1d --no-intercept > > Summary of Genetic Correlation Results > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > w6.sumstats.gz t1d.sumstats.gz -0.0514 0.0654 -0.7853 0.4323 0.4482 0.0291 1 NA 0 NA > > Based on the above results, is it possible to spot where the problem is? > > Thank you again for your help in this. > > Kindly, > > Despoina > On May 15, 2018, at 7:21 PM, rkwalters ***@***.******@***.******@***.******@***.***>> wrote: > > Hi, > > The issue with a negative number in the square root generally occurs when the heritability of one of the traits is negative in one or more of the jackknife estimates of the correlation (i.e. where the square root of h2 exists in the denominator of going from genetic covariance to correlation). This generally means the heritability of at least one of the traits is quite small and/or unstable (e.g. due to small sample size). > > Using "--no-intercept" will not only assume not sample overlap, but also no population stratification or other confounding within each GWAS (i.e. the univariate intercept is 1). If this is a correct assumption, the h2 estimates will be a bit more stable, which could resolve the appearance of negative h2 estimates in the jackknife. On the other hand, incorrectly assuming no stratification/confounding will tend to upwardly bias h2 estimates, also reducing the likelihood of negative h2 estimates. > > We normally recommend against LDSR genetic correlation analyses when the univariate h2 results for either phenotype isn't clearly significant (e.g. z-score > 4). We also recommend against constraining the intercepts, since the risk of bias tends to outweigh the gains in precision of the estimates. When you're hitting negative h2 estimates that generally means the unfortunate conclusion is that the current data is insufficient for LDSR genetic correlation analysis, which is unfortunate but seems to be the current reality for LDSR sample size/power requirements. > > Cheers, > Raymond > > > > > On May 14, 2018, at 12:32 PM, despinaman ***@***.******@***.******@***.******@***.***>> wrote: > > > > Hi, > > > > I am running LDSR correlation between two traits (t1d and w6), and when I run the analysis considering an intercept I get the following error:FloatingPointError: invalid value encountered in sqrt > > > > Since there is no sample overlap in the T1D and w6 GWASes, when I constrain the intercept (--no-intercept), the analysis is done, and I get a result that seems expected. > > > > I guess there is a problem with a negative number in a square root, but, why this appears? And why it does not appear when I constrain the intercept? > > > > Thank you! > > > > — > > You are receiving this because you are subscribed to this thread. > > Reply to this email directly, view it on GitHub <#117>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvbU786LH67zzipR99r2VLzqVh232ks5tybGegaJpZM4T-Hu1>. > > > > > — > You are receiving this because you authored the thread. > Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOArNpyZCoYJmoEhuu9-6z1Fx1jbUks5ty2LngaJpZM4T-Hu1>. > > > — > You are receiving this because you commented. > Reply to this email directly, view it on GitHub <#117 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvW2yu0XKk4eTUwuDqzvceTx6vCcaks5tzD0WgaJpZM4T-Hu1>. > — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOPL_8z0up0IxM3Q7BC8wZMOTkGF4ks5tzG1vgaJpZM4T-Hu1>. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#117 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvdo7YNiGYGkQCeReUFSUp_KnX6BTks5tzHRWgaJpZM4T-Hu1>.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOPJzokQ60A1VdkvVpbosdPkt4NAhks5tzbNOgaJpZM4T-Hu1>.

rkwalters · 2018-05-17T18:28:52Z

Hi Despoina, Yes, our current heuristic for defining those strong loci is chi2 > 80 (or > .001*N if that is larger). Would probably be recommended to exclude the full locus though rather than just the significant SNPs (e.g. something like 500kb around the index SNP). Cheers, Raymond

…

On May 17, 2018, at 1:35 PM, despinaman ***@***.***> wrote: Hi Raymond, Thank you for these suggestions, they are really helpful. I think we can remove the loci with the outsized effects and rerun the LDSR. Do you have any suggestions for OR or chi square cut-offs? In LD hub they suggest removing SNPs with X^2 > 80. Best, Despoina On May 17, 2018, at 1:28 PM, rkwalters ***@***.******@***.***>> wrote: Hi Despoina, Yes, LDSR works for ascertained case/control GWAS. You'll generally want to specify the within-sample and population prevalences though (with --samp-prev and --pop-prev) to get h2 estimates on the liability scale rather than the observed scale, since the interpretation of the observed scale estimate is particularly limited in ascertained samples. That intercept result for w6 definitely indicates that constraining it to zero for the correlation analysis would be inappropriate. The h2 result for w6 is sufficiently strong though that it suggests something else is happening that's not the basic power/sample size issues. My hunch would be some form of model misspecification, possibly to due outlier loci with very strong effects for each phenotype? Biomarkers do often have a genetic architecture with stronger top hits, so it possible that's causing some extra instability in the correlation analysis. To test, I'd recommend excluding any loci with outsized effects for either GWAS and then re-running LDSR to see if you get more stable estimates. Cheers, Raymond > On May 16, 2018, at 2:47 PM, despinaman ***@***.******@***.***>> wrote: > > Hi Raymond, > > Thank you for your suggestions and explanations. I ran the analysis for the w6 and the results I get are the following: > > Read regression weight LD Scores for 1242190 SNPs. > After merging with reference panel LD, 1148002 SNPs remain. > After merging with regression SNP LD, 1041855 SNPs remain. > Removed 85 SNPs with chi^2 > 80 (1041770 SNPs remain) > Total Observed scale h2: 0.1713 (0.0316) > ... > Lambda GC: 1.1747 > Mean Chi^2: 1.2042 > Intercept: 1.1227 (0.0087) > Ratio: 0.6009 (0.0428) > > So the analysis was performed and the h2 is 0.1713 for w6. Would that mean that the w6 GWAS is NOT the problem? > > I also have a comment on the T1D GWAS. I agree with you that there could be issues with population stratification, since it is an ascertained case-control GWAS (where 30% of the cohort were T1D cases and 70% were controls, which of course does not reflect the general population, where we expect a prevalence of 0.3% of T1D). At this point I was even wondering if estimation of h2 through LDSR can be performed for ascertained case control GWASes (if yes, I guess we should only consider the unconstrained estimate). Please let me know what you think. > > Cheers, > > Despoina > On May 16, 2018, at 2:18 PM, rkwalters ***@***.******@***.******@***.***>> wrote: > > Hi Despoina, > > Yes, your understanding is correct that the problem of errors from negative h2 estimates is generally a property of one of the GWAS. Based on your set of analyses, it looks like the w6 GWAS is likely the problem. > > The key analysis you're missing to confirm that issue is: > ldsc.py --h2 w6.sumstats.gz --ref-ld-chr ... > in order to estimate h2 for w6 without needing to constrain the intercept. I wouldn't consider the h2=.135 from the constrained intercept rg analysis terribly meaningful until you have evidence that it's reasonable to constrain that intercept. > > As far as T1D, when there's that much difference between the h2 estimates with and without constrained intercept then the unconstrained estimate is expected to be much more reliable. Note that the intercept for T1D is highly significant when left unconstrained (intercept = 1.131, se=.008) suggesting the T1D GWAS has uncorrected population stratification, other confounding, or model misspecification, such that the non-null intercept is appropriate for LDSR analyses with those T1D sumstats. > > Cheers, > Raymond > > > > On May 16, 2018, at 10:52 AM, despinaman ***@***.******@***.******@***.***>> wrote: > > > > Hi again Raymond, > > > > As an addition to my previous email, I ran an analysis inverting the traits, i.e. t1d-w3 and t1d-w6. What I conclude from the results is that the h2 for w3 is 0.0997 and for w6 (no-intercept) is 0.1351 whereas the h2 for T1D is 0.1636 and 0.4492 (no intercept) respectively from the previous analysis. > > > > Of note, the heritability of T1D based on twin studies is ~50%, which is closer to the h2 of 0.4492 (no intercept analysis). > > > > Please let me know your thoughts on this. > > > > Kindly, > > > > Despoina > > > > > > Below are the results: > > 1)ldsc.py --rg t1d.sumstats.gz,w3.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w3 > > > > Summary of Genetic Correlation Results > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > t1d.sumstats.gz w3.sumstats.gz -0.2635 0.1735 -1.5186 0.1289 0.0997 0.0397 0.9986 0.0082 0.0034 0.005 > > > > 2)ldsc.py --rg t1d.sumstats.gz,w6.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w6 > > > > Summary of Genetic Correlation Results > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > t1d.sumstats.gz w6.sumstats.gz NA NA NA NA NA NA NA NA NA NA > > > > 3)ldsc.py --rg t1d.sumstats.gz,w3.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w3 --no-intercept > > > > Summary of Genetic Correlation Results > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > t1d.sumstats.gz w3.sumstats.gz -0.1105 0.0726 -1.5221 0.128 0.0943 0.0347 1 NA 0 NA > > > > 4) ldsc.py --rg t1d.sumstats.gz,w6.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w6 --no-intercept > > > > Summary of Genetic Correlation Results > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > t1d.sumstats.gz w6.sumstats.gz -0.0514 0.0654 -0.7853 0.4323 0.1351 0.0358 1 NA 0 NA > > > > > > > > > > On May 16, 2018, at 10:33 AM, Despoina Manousaki ***@***.******@***.******@***.******@***.***>> wrote: > > > > Dear Raymond, > > > > Thank you for your prompt answer. If I understand well, the problem that creates the error is the characteristics of the GWAS(es), i.e.low heritability of one or both traits and/or small number of individuals. > > > > What I tried to run as an analysis when the error appeared was a LDSR correlation between a GWAS trait called w6 (it’s a blood biomarker) and a trait called T1D (type 1 diabetes, N~20,000). When I perform the same analysis between a trait called w3 (a biomarker measured on the same GWAS cohort as the w6, with an N~7,800, with a heritability that is estimated to be very similar to the w6) and t1d, then I can complete the analysis without any problem. (N of SNPs~1,000,000). > > > > If the heritability and N of individuals was the problem, I would expect that the LDSR correlation analysis between w3 and t1d would produce the same error, as the analysis between w6 and t1d. The Z-scores of the successful analyses are not extreme (>4 or <-4). > > > > Below are the LDSR scripts and the results for the 1) w3-t1d, 2)w6-t1d, 3) w3-t1d (no intercept) and 4) w6-t1d (no intercept): > > > > 1) ldsc.py --rg w3.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w3_t1d > > > > Summary of Genetic Correlation Results > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > w3.sumstats.gz t1d.sumstats.gz -0.2635 0.1735 -1.5186 0.1289 0.1636 0.0362 1.1311 0.0084 0.0034 0.005 > > > > 2) ldsc.py --rg w6.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w6_t1d > > > > Summary of Genetic Correlation Results > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > w6.sumstats.gz t1d.sumstats.gz NA NA NA NA NA NA NA NA NA NA > > > > 3)ldsc.py --rg w3.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w3_t1d --no-intercept > > > > Summary of Genetic Correlation Results > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > w3.sumstats.gz t1d.sumstats.gz -0.1105 0.0726 -1.5221 0.128 0.4492 0.0287 1 NA 0 NA > > > > 4)ldsc.py --rg w6.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w6_t1d --no-intercept > > > > Summary of Genetic Correlation Results > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > w6.sumstats.gz t1d.sumstats.gz -0.0514 0.0654 -0.7853 0.4323 0.4482 0.0291 1 NA 0 NA > > > > Based on the above results, is it possible to spot where the problem is? > > > > Thank you again for your help in this. > > > > Kindly, > > > > Despoina > > On May 15, 2018, at 7:21 PM, rkwalters ***@***.******@***.******@***.******@***.***>> wrote: > > > > Hi, > > > > The issue with a negative number in the square root generally occurs when the heritability of one of the traits is negative in one or more of the jackknife estimates of the correlation (i.e. where the square root of h2 exists in the denominator of going from genetic covariance to correlation). This generally means the heritability of at least one of the traits is quite small and/or unstable (e.g. due to small sample size). > > > > Using "--no-intercept" will not only assume not sample overlap, but also no population stratification or other confounding within each GWAS (i.e. the univariate intercept is 1). If this is a correct assumption, the h2 estimates will be a bit more stable, which could resolve the appearance of negative h2 estimates in the jackknife. On the other hand, incorrectly assuming no stratification/confounding will tend to upwardly bias h2 estimates, also reducing the likelihood of negative h2 estimates. > > > > We normally recommend against LDSR genetic correlation analyses when the univariate h2 results for either phenotype isn't clearly significant (e.g. z-score > 4). We also recommend against constraining the intercepts, since the risk of bias tends to outweigh the gains in precision of the estimates. When you're hitting negative h2 estimates that generally means the unfortunate conclusion is that the current data is insufficient for LDSR genetic correlation analysis, which is unfortunate but seems to be the current reality for LDSR sample size/power requirements. > > > > Cheers, > > Raymond > > > > > > > > > On May 14, 2018, at 12:32 PM, despinaman ***@***.******@***.******@***.******@***.***>> wrote: > > > > > > Hi, > > > > > > I am running LDSR correlation between two traits (t1d and w6), and when I run the analysis considering an intercept I get the following error:FloatingPointError: invalid value encountered in sqrt > > > > > > Since there is no sample overlap in the T1D and w6 GWASes, when I constrain the intercept (--no-intercept), the analysis is done, and I get a result that seems expected. > > > > > > I guess there is a problem with a negative number in a square root, but, why this appears? And why it does not appear when I constrain the intercept? > > > > > > Thank you! > > > > > > — > > > You are receiving this because you are subscribed to this thread. > > > Reply to this email directly, view it on GitHub <#117>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvbU786LH67zzipR99r2VLzqVh232ks5tybGegaJpZM4T-Hu1>. > > > > > > > > > — > > You are receiving this because you authored the thread. > > Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOArNpyZCoYJmoEhuu9-6z1Fx1jbUks5ty2LngaJpZM4T-Hu1>. > > > > > > — > > You are receiving this because you commented. > > Reply to this email directly, view it on GitHub <#117 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvW2yu0XKk4eTUwuDqzvceTx6vCcaks5tzD0WgaJpZM4T-Hu1>. > > > > > — > You are receiving this because you authored the thread. > Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOPL_8z0up0IxM3Q7BC8wZMOTkGF4ks5tzG1vgaJpZM4T-Hu1>. > > — > You are receiving this because you commented. > Reply to this email directly, view it on GitHub <#117 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvdo7YNiGYGkQCeReUFSUp_KnX6BTks5tzHRWgaJpZM4T-Hu1>. > — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOPJzokQ60A1VdkvVpbosdPkt4NAhks5tzbNOgaJpZM4T-Hu1>. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#117 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvc7tE5TDph1fwdQyP8lH9JHXIPnhks5tzbTygaJpZM4T-Hu1>.

despinaman · 2018-05-18T21:02:48Z

Hi Raymond, Excluding SNPs with chi square >80 solved the problem- I could finally obtain an estimate from correlation of t1d vs w6. I have a last question for you, when I add the --samp-prev and the --pop-prev parameters (since the T1D GWAS is an ascertained case -control GWAS), then I get a liability h2 of 0.087 for t1d that is about half of what I get if I do not include this parameters in the h2 calculation (h2=0.17). Is that something expected? Also, is it a frequent phenomenon to have estimates of h2 based on LDSR that are significantly lower than estimates from twin studies (in case of T1D, twin studies report heritability ~50%)? Thank you again for being extremely helpful in solving this issue. Best, Despoina On May 17, 2018, at 2:28 PM, rkwalters <notifications@github.com<mailto:notifications@github.com>> wrote: Hi Despoina, Yes, our current heuristic for defining those strong loci is chi2 > 80 (or > .001*N if that is larger). Would probably be recommended to exclude the full locus though rather than just the significant SNPs (e.g. something like 500kb around the index SNP). Cheers, Raymond

On May 17, 2018, at 1:35 PM, despinaman ***@***.******@***.***>> wrote: Hi Raymond, Thank you for these suggestions, they are really helpful. I think we can remove the loci with the outsized effects and rerun the LDSR. Do you have any suggestions for OR or chi square cut-offs? In LD hub they suggest removing SNPs with X^2 > 80. Best, Despoina On May 17, 2018, at 1:28 PM, rkwalters ***@***.******@***.******@***.***>> wrote: Hi Despoina, Yes, LDSR works for ascertained case/control GWAS. You'll generally want to specify the within-sample and population prevalences though (with --samp-prev and --pop-prev) to get h2 estimates on the liability scale rather than the observed scale, since the interpretation of the observed scale estimate is particularly limited in ascertained samples. That intercept result for w6 definitely indicates that constraining it to zero for the correlation analysis would be inappropriate. The h2 result for w6 is sufficiently strong though that it suggests something else is happening that's not the basic power/sample size issues. My hunch would be some form of model misspecification, possibly to due outlier loci with very strong effects for each phenotype? Biomarkers do often have a genetic architecture with stronger top hits, so it possible that's causing some extra instability in the correlation analysis. To test, I'd recommend excluding any loci with outsized effects for either GWAS and then re-running LDSR to see if you get more stable estimates. Cheers, Raymond > On May 16, 2018, at 2:47 PM, despinaman ***@***.******@***.******@***.***>> wrote: > > Hi Raymond, > > Thank you for your suggestions and explanations. I ran the analysis for the w6 and the results I get are the following: > > Read regression weight LD Scores for 1242190 SNPs. > After merging with reference panel LD, 1148002 SNPs remain. > After merging with regression SNP LD, 1041855 SNPs remain. > Removed 85 SNPs with chi^2 > 80 (1041770 SNPs remain) > Total Observed scale h2: 0.1713 (0.0316) > ... > Lambda GC: 1.1747 > Mean Chi^2: 1.2042 > Intercept: 1.1227 (0.0087) > Ratio: 0.6009 (0.0428) > > So the analysis was performed and the h2 is 0.1713 for w6. Would that mean that the w6 GWAS is NOT the problem? > > I also have a comment on the T1D GWAS. I agree with you that there could be issues with population stratification, since it is an ascertained case-control GWAS (where 30% of the cohort were T1D cases and 70% were controls, which of course does not reflect the general population, where we expect a prevalence of 0.3% of T1D). At this point I was even wondering if estimation of h2 through LDSR can be performed for ascertained case control GWASes (if yes, I guess we should only consider the unconstrained estimate). Please let me know what you think. > > Cheers, > > Despoina > On May 16, 2018, at 2:18 PM, rkwalters ***@***.******@***.******@***.******@***.***>> wrote: > > Hi Despoina, > > Yes, your understanding is correct that the problem of errors from negative h2 estimates is generally a property of one of the GWAS. Based on your set of analyses, it looks like the w6 GWAS is likely the problem. > > The key analysis you're missing to confirm that issue is: > ldsc.py --h2 w6.sumstats.gz --ref-ld-chr ... > in order to estimate h2 for w6 without needing to constrain the intercept. I wouldn't consider the h2=.135 from the constrained intercept rg analysis terribly meaningful until you have evidence that it's reasonable to constrain that intercept. > > As far as T1D, when there's that much difference between the h2 estimates with and without constrained intercept then the unconstrained estimate is expected to be much more reliable. Note that the intercept for T1D is highly significant when left unconstrained (intercept = 1.131, se=.008) suggesting the T1D GWAS has uncorrected population stratification, other confounding, or model misspecification, such that the non-null intercept is appropriate for LDSR analyses with those T1D sumstats. > > Cheers, > Raymond > > > > On May 16, 2018, at 10:52 AM, despinaman ***@***.******@***.******@***.******@***.***>> wrote: > > > > Hi again Raymond, > > > > As an addition to my previous email, I ran an analysis inverting the traits, i.e. t1d-w3 and t1d-w6. What I conclude from the results is that the h2 for w3 is 0.0997 and for w6 (no-intercept) is 0.1351 whereas the h2 for T1D is 0.1636 and 0.4492 (no intercept) respectively from the previous analysis. > > > > Of note, the heritability of T1D based on twin studies is ~50%, which is closer to the h2 of 0.4492 (no intercept analysis). > > > > Please let me know your thoughts on this. > > > > Kindly, > > > > Despoina > > > > > > Below are the results: > > 1)ldsc.py --rg t1d.sumstats.gz,w3.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w3 > > > > Summary of Genetic Correlation Results > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > t1d.sumstats.gz w3.sumstats.gz -0.2635 0.1735 -1.5186 0.1289 0.0997 0.0397 0.9986 0.0082 0.0034 0.005 > > > > 2)ldsc.py --rg t1d.sumstats.gz,w6.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w6 > > > > Summary of Genetic Correlation Results > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > t1d.sumstats.gz w6.sumstats.gz NA NA NA NA NA NA NA NA NA NA > > > > 3)ldsc.py --rg t1d.sumstats.gz,w3.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w3 --no-intercept > > > > Summary of Genetic Correlation Results > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > t1d.sumstats.gz w3.sumstats.gz -0.1105 0.0726 -1.5221 0.128 0.0943 0.0347 1 NA 0 NA > > > > 4) ldsc.py --rg t1d.sumstats.gz,w6.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w6 --no-intercept > > > > Summary of Genetic Correlation Results > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > t1d.sumstats.gz w6.sumstats.gz -0.0514 0.0654 -0.7853 0.4323 0.1351 0.0358 1 NA 0 NA > > > > > > > > > > On May 16, 2018, at 10:33 AM, Despoina Manousaki ***@***.******@***.******@***.******@***.******@***.***>> wrote: > > > > Dear Raymond, > > > > Thank you for your prompt answer. If I understand well, the problem that creates the error is the characteristics of the GWAS(es), i.e.low heritability of one or both traits and/or small number of individuals. > > > > What I tried to run as an analysis when the error appeared was a LDSR correlation between a GWAS trait called w6 (it’s a blood biomarker) and a trait called T1D (type 1 diabetes, N~20,000). When I perform the same analysis between a trait called w3 (a biomarker measured on the same GWAS cohort as the w6, with an N~7,800, with a heritability that is estimated to be very similar to the w6) and t1d, then I can complete the analysis without any problem. (N of SNPs~1,000,000). > > > > If the heritability and N of individuals was the problem, I would expect that the LDSR correlation analysis between w3 and t1d would produce the same error, as the analysis between w6 and t1d. The Z-scores of the successful analyses are not extreme (>4 or <-4). > > > > Below are the LDSR scripts and the results for the 1) w3-t1d, 2)w6-t1d, 3) w3-t1d (no intercept) and 4) w6-t1d (no intercept): > > > > 1) ldsc.py --rg w3.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w3_t1d > > > > Summary of Genetic Correlation Results > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > w3.sumstats.gz t1d.sumstats.gz -0.2635 0.1735 -1.5186 0.1289 0.1636 0.0362 1.1311 0.0084 0.0034 0.005 > > > > 2) ldsc.py --rg w6.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w6_t1d > > > > Summary of Genetic Correlation Results > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > w6.sumstats.gz t1d.sumstats.gz NA NA NA NA NA NA NA NA NA NA > > > > 3)ldsc.py --rg w3.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w3_t1d --no-intercept > > > > Summary of Genetic Correlation Results > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > w3.sumstats.gz t1d.sumstats.gz -0.1105 0.0726 -1.5221 0.128 0.4492 0.0287 1 NA 0 NA > > > > 4)ldsc.py --rg w6.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w6_t1d --no-intercept > > > > Summary of Genetic Correlation Results > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > w6.sumstats.gz t1d.sumstats.gz -0.0514 0.0654 -0.7853 0.4323 0.4482 0.0291 1 NA 0 NA > > > > Based on the above results, is it possible to spot where the problem is? > > > > Thank you again for your help in this. > > > > Kindly, > > > > Despoina > > On May 15, 2018, at 7:21 PM, rkwalters ***@***.******@***.******@***.******@***.******@***.***>> wrote: > > > > Hi, > > > > The issue with a negative number in the square root generally occurs when the heritability of one of the traits is negative in one or more of the jackknife estimates of the correlation (i.e. where the square root of h2 exists in the denominator of going from genetic covariance to correlation). This generally means the heritability of at least one of the traits is quite small and/or unstable (e.g. due to small sample size). > > > > Using "--no-intercept" will not only assume not sample overlap, but also no population stratification or other confounding within each GWAS (i.e. the univariate intercept is 1). If this is a correct assumption, the h2 estimates will be a bit more stable, which could resolve the appearance of negative h2 estimates in the jackknife. On the other hand, incorrectly assuming no stratification/confounding will tend to upwardly bias h2 estimates, also reducing the likelihood of negative h2 estimates. > > > > We normally recommend against LDSR genetic correlation analyses when the univariate h2 results for either phenotype isn't clearly significant (e.g. z-score > 4). We also recommend against constraining the intercepts, since the risk of bias tends to outweigh the gains in precision of the estimates. When you're hitting negative h2 estimates that generally means the unfortunate conclusion is that the current data is insufficient for LDSR genetic correlation analysis, which is unfortunate but seems to be the current reality for LDSR sample size/power requirements. > > > > Cheers, > > Raymond > > > > > > > > > On May 14, 2018, at 12:32 PM, despinaman ***@***.******@***.******@***.******@***.******@***.***>> wrote: > > > > > > Hi, > > > > > > I am running LDSR correlation between two traits (t1d and w6), and when I run the analysis considering an intercept I get the following error:FloatingPointError: invalid value encountered in sqrt > > > > > > Since there is no sample overlap in the T1D and w6 GWASes, when I constrain the intercept (--no-intercept), the analysis is done, and I get a result that seems expected. > > > > > > I guess there is a problem with a negative number in a square root, but, why this appears? And why it does not appear when I constrain the intercept? > > > > > > Thank you! > > > > > > — > > > You are receiving this because you are subscribed to this thread. > > > Reply to this email directly, view it on GitHub <#117>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvbU786LH67zzipR99r2VLzqVh232ks5tybGegaJpZM4T-Hu1>. > > > > > > > > > — > > You are receiving this because you authored the thread. > > Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOArNpyZCoYJmoEhuu9-6z1Fx1jbUks5ty2LngaJpZM4T-Hu1>. > > > > > > — > > You are receiving this because you commented. > > Reply to this email directly, view it on GitHub <#117 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvW2yu0XKk4eTUwuDqzvceTx6vCcaks5tzD0WgaJpZM4T-Hu1>. > > > > > — > You are receiving this because you authored the thread. > Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOPL_8z0up0IxM3Q7BC8wZMOTkGF4ks5tzG1vgaJpZM4T-Hu1>. > > — > You are receiving this because you commented. > Reply to this email directly, view it on GitHub <#117 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvdo7YNiGYGkQCeReUFSUp_KnX6BTks5tzHRWgaJpZM4T-Hu1>. > — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOPJzokQ60A1VdkvVpbosdPkt4NAhks5tzbNOgaJpZM4T-Hu1>. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#117 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvc7tE5TDph1fwdQyP8lH9JHXIPnhks5tzbTygaJpZM4T-Hu1>.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOEDUFW_lclDbh0OUwcynHPEwvgpfks5tzcFngaJpZM4T-Hu1>.

rkwalters · 2018-05-18T21:26:15Z

Hi Despoina, Yes, for rare outcomes where the GWAS sample is strongly ascertained to over-sample cases the estimated liability-scale h2 will be lower than the observed scale. This is because the ascertainment induces much more phenotypic variability in the sample than exists in the population, making the standardized effect sizes look larger than they would in a population sample, so correcting for that difference leads to the smaller liability scale estimate. Yes, it's normal for LDSR h2 estimates to be much lower than twin study estimates. This is expected due to a combination of many factors, including that LDSR and twin studies aren't estimating the same type of h2 (as written about here <http://www.nealelab.is/blog/2017/9/13/heritability-201-types-of-heritability-and-how-we-estimate-it>), potential model misspecification in LDSR, potential downward bias in LDSR h2 at small sample sizes, and potential confounding in twin studies. Cheers, Raymond

…

On May 18, 2018, at 5:02 PM, despinaman ***@***.***> wrote: Hi Raymond, Excluding SNPs with chi square >80 solved the problem- I could finally obtain an estimate from correlation of t1d vs w6. I have a last question for you, when I add the --samp-prev and the --pop-prev parameters (since the T1D GWAS is an ascertained case -control GWAS), then I get a liability h2 of 0.087 for t1d that is about half of what I get if I do not include this parameters in the h2 calculation (h2=0.17). Is that something expected? Also, is it a frequent phenomenon to have estimates of h2 based on LDSR that are significantly lower than estimates from twin studies (in case of T1D, twin studies report heritability ~50%)? Thank you again for being extremely helpful in solving this issue. Best, Despoina On May 17, 2018, at 2:28 PM, rkwalters ***@***.******@***.***>> wrote: Hi Despoina, Yes, our current heuristic for defining those strong loci is chi2 > 80 (or > .001*N if that is larger). Would probably be recommended to exclude the full locus though rather than just the significant SNPs (e.g. something like 500kb around the index SNP). Cheers, Raymond > On May 17, 2018, at 1:35 PM, despinaman ***@***.******@***.***>> wrote: > > Hi Raymond, > > Thank you for these suggestions, they are really helpful. I think we can remove the loci with the outsized effects and rerun the LDSR. Do you have any suggestions for OR or chi square cut-offs? In LD hub they suggest removing SNPs with X^2 > 80. > > Best, > > Despoina > On May 17, 2018, at 1:28 PM, rkwalters ***@***.******@***.******@***.***>> wrote: > > Hi Despoina, > > Yes, LDSR works for ascertained case/control GWAS. You'll generally want to specify the within-sample and population prevalences though (with --samp-prev and --pop-prev) to get h2 estimates on the liability scale rather than the observed scale, since the interpretation of the observed scale estimate is particularly limited in ascertained samples. > > That intercept result for w6 definitely indicates that constraining it to zero for the correlation analysis would be inappropriate. > > The h2 result for w6 is sufficiently strong though that it suggests something else is happening that's not the basic power/sample size issues. My hunch would be some form of model misspecification, possibly to due outlier loci with very strong effects for each phenotype? Biomarkers do often have a genetic architecture with stronger top hits, so it possible that's causing some extra instability in the correlation analysis. > > To test, I'd recommend excluding any loci with outsized effects for either GWAS and then re-running LDSR to see if you get more stable estimates. > > Cheers, > Raymond > > > > > On May 16, 2018, at 2:47 PM, despinaman ***@***.******@***.******@***.***>> wrote: > > > > Hi Raymond, > > > > Thank you for your suggestions and explanations. I ran the analysis for the w6 and the results I get are the following: > > > > Read regression weight LD Scores for 1242190 SNPs. > > After merging with reference panel LD, 1148002 SNPs remain. > > After merging with regression SNP LD, 1041855 SNPs remain. > > Removed 85 SNPs with chi^2 > 80 (1041770 SNPs remain) > > Total Observed scale h2: 0.1713 (0.0316) > > ... > > Lambda GC: 1.1747 > > Mean Chi^2: 1.2042 > > Intercept: 1.1227 (0.0087) > > Ratio: 0.6009 (0.0428) > > > > So the analysis was performed and the h2 is 0.1713 for w6. Would that mean that the w6 GWAS is NOT the problem? > > > > I also have a comment on the T1D GWAS. I agree with you that there could be issues with population stratification, since it is an ascertained case-control GWAS (where 30% of the cohort were T1D cases and 70% were controls, which of course does not reflect the general population, where we expect a prevalence of 0.3% of T1D). At this point I was even wondering if estimation of h2 through LDSR can be performed for ascertained case control GWASes (if yes, I guess we should only consider the unconstrained estimate). Please let me know what you think. > > > > Cheers, > > > > Despoina > > On May 16, 2018, at 2:18 PM, rkwalters ***@***.******@***.******@***.******@***.***>> wrote: > > > > Hi Despoina, > > > > Yes, your understanding is correct that the problem of errors from negative h2 estimates is generally a property of one of the GWAS. Based on your set of analyses, it looks like the w6 GWAS is likely the problem. > > > > The key analysis you're missing to confirm that issue is: > > ldsc.py --h2 w6.sumstats.gz --ref-ld-chr ... > > in order to estimate h2 for w6 without needing to constrain the intercept. I wouldn't consider the h2=.135 from the constrained intercept rg analysis terribly meaningful until you have evidence that it's reasonable to constrain that intercept. > > > > As far as T1D, when there's that much difference between the h2 estimates with and without constrained intercept then the unconstrained estimate is expected to be much more reliable. Note that the intercept for T1D is highly significant when left unconstrained (intercept = 1.131, se=.008) suggesting the T1D GWAS has uncorrected population stratification, other confounding, or model misspecification, such that the non-null intercept is appropriate for LDSR analyses with those T1D sumstats. > > > > Cheers, > > Raymond > > > > > > > On May 16, 2018, at 10:52 AM, despinaman ***@***.******@***.******@***.******@***.***>> wrote: > > > > > > Hi again Raymond, > > > > > > As an addition to my previous email, I ran an analysis inverting the traits, i.e. t1d-w3 and t1d-w6. What I conclude from the results is that the h2 for w3 is 0.0997 and for w6 (no-intercept) is 0.1351 whereas the h2 for T1D is 0.1636 and 0.4492 (no intercept) respectively from the previous analysis. > > > > > > Of note, the heritability of T1D based on twin studies is ~50%, which is closer to the h2 of 0.4492 (no intercept analysis). > > > > > > Please let me know your thoughts on this. > > > > > > Kindly, > > > > > > Despoina > > > > > > > > > Below are the results: > > > 1)ldsc.py --rg t1d.sumstats.gz,w3.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w3 > > > > > > Summary of Genetic Correlation Results > > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > > t1d.sumstats.gz w3.sumstats.gz -0.2635 0.1735 -1.5186 0.1289 0.0997 0.0397 0.9986 0.0082 0.0034 0.005 > > > > > > 2)ldsc.py --rg t1d.sumstats.gz,w6.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w6 > > > > > > Summary of Genetic Correlation Results > > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > > t1d.sumstats.gz w6.sumstats.gz NA NA NA NA NA NA NA NA NA NA > > > > > > 3)ldsc.py --rg t1d.sumstats.gz,w3.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w3 --no-intercept > > > > > > Summary of Genetic Correlation Results > > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > > t1d.sumstats.gz w3.sumstats.gz -0.1105 0.0726 -1.5221 0.128 0.0943 0.0347 1 NA 0 NA > > > > > > 4) ldsc.py --rg t1d.sumstats.gz,w6.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out t1d_w6 --no-intercept > > > > > > Summary of Genetic Correlation Results > > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > > t1d.sumstats.gz w6.sumstats.gz -0.0514 0.0654 -0.7853 0.4323 0.1351 0.0358 1 NA 0 NA > > > > > > > > > > > > > > > On May 16, 2018, at 10:33 AM, Despoina Manousaki ***@***.******@***.******@***.******@***.******@***.***>> wrote: > > > > > > Dear Raymond, > > > > > > Thank you for your prompt answer. If I understand well, the problem that creates the error is the characteristics of the GWAS(es), i.e.low heritability of one or both traits and/or small number of individuals. > > > > > > What I tried to run as an analysis when the error appeared was a LDSR correlation between a GWAS trait called w6 (it’s a blood biomarker) and a trait called T1D (type 1 diabetes, N~20,000). When I perform the same analysis between a trait called w3 (a biomarker measured on the same GWAS cohort as the w6, with an N~7,800, with a heritability that is estimated to be very similar to the w6) and t1d, then I can complete the analysis without any problem. (N of SNPs~1,000,000). > > > > > > If the heritability and N of individuals was the problem, I would expect that the LDSR correlation analysis between w3 and t1d would produce the same error, as the analysis between w6 and t1d. The Z-scores of the successful analyses are not extreme (>4 or <-4). > > > > > > Below are the LDSR scripts and the results for the 1) w3-t1d, 2)w6-t1d, 3) w3-t1d (no intercept) and 4) w6-t1d (no intercept): > > > > > > 1) ldsc.py --rg w3.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w3_t1d > > > > > > Summary of Genetic Correlation Results > > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > > w3.sumstats.gz t1d.sumstats.gz -0.2635 0.1735 -1.5186 0.1289 0.1636 0.0362 1.1311 0.0084 0.0034 0.005 > > > > > > 2) ldsc.py --rg w6.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w6_t1d > > > > > > Summary of Genetic Correlation Results > > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > > w6.sumstats.gz t1d.sumstats.gz NA NA NA NA NA NA NA NA NA NA > > > > > > 3)ldsc.py --rg w3.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w3_t1d --no-intercept > > > > > > Summary of Genetic Correlation Results > > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > > w3.sumstats.gz t1d.sumstats.gz -0.1105 0.0726 -1.5221 0.128 0.4492 0.0287 1 NA 0 NA > > > > > > 4)ldsc.py --rg w6.sumstats.gz,t1d.sumstats.gz --ref-ld-chr $SHARE/DATA/GWAS/LDSC/1000G_EUR_Phase3_baseline/baseline. --w-ld-chr $SHARE/DATA/GWAS/LDSC/weights_hm3_no_hla/weights. --out w6_t1d --no-intercept > > > > > > Summary of Genetic Correlation Results > > > p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se > > > w6.sumstats.gz t1d.sumstats.gz -0.0514 0.0654 -0.7853 0.4323 0.4482 0.0291 1 NA 0 NA > > > > > > Based on the above results, is it possible to spot where the problem is? > > > > > > Thank you again for your help in this. > > > > > > Kindly, > > > > > > Despoina > > > On May 15, 2018, at 7:21 PM, rkwalters ***@***.******@***.******@***.******@***.******@***.***>> wrote: > > > > > > Hi, > > > > > > The issue with a negative number in the square root generally occurs when the heritability of one of the traits is negative in one or more of the jackknife estimates of the correlation (i.e. where the square root of h2 exists in the denominator of going from genetic covariance to correlation). This generally means the heritability of at least one of the traits is quite small and/or unstable (e.g. due to small sample size). > > > > > > Using "--no-intercept" will not only assume not sample overlap, but also no population stratification or other confounding within each GWAS (i.e. the univariate intercept is 1). If this is a correct assumption, the h2 estimates will be a bit more stable, which could resolve the appearance of negative h2 estimates in the jackknife. On the other hand, incorrectly assuming no stratification/confounding will tend to upwardly bias h2 estimates, also reducing the likelihood of negative h2 estimates. > > > > > > We normally recommend against LDSR genetic correlation analyses when the univariate h2 results for either phenotype isn't clearly significant (e.g. z-score > 4). We also recommend against constraining the intercepts, since the risk of bias tends to outweigh the gains in precision of the estimates. When you're hitting negative h2 estimates that generally means the unfortunate conclusion is that the current data is insufficient for LDSR genetic correlation analysis, which is unfortunate but seems to be the current reality for LDSR sample size/power requirements. > > > > > > Cheers, > > > Raymond > > > > > > > > > > > > > On May 14, 2018, at 12:32 PM, despinaman ***@***.******@***.******@***.******@***.******@***.***>> wrote: > > > > > > > > Hi, > > > > > > > > I am running LDSR correlation between two traits (t1d and w6), and when I run the analysis considering an intercept I get the following error:FloatingPointError: invalid value encountered in sqrt > > > > > > > > Since there is no sample overlap in the T1D and w6 GWASes, when I constrain the intercept (--no-intercept), the analysis is done, and I get a result that seems expected. > > > > > > > > I guess there is a problem with a negative number in a square root, but, why this appears? And why it does not appear when I constrain the intercept? > > > > > > > > Thank you! > > > > > > > > — > > > > You are receiving this because you are subscribed to this thread. > > > > Reply to this email directly, view it on GitHub <#117>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvbU786LH67zzipR99r2VLzqVh232ks5tybGegaJpZM4T-Hu1>. > > > > > > > > > > > > > — > > > You are receiving this because you authored the thread. > > > Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOArNpyZCoYJmoEhuu9-6z1Fx1jbUks5ty2LngaJpZM4T-Hu1>. > > > > > > > > > — > > > You are receiving this because you commented. > > > Reply to this email directly, view it on GitHub <#117 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvW2yu0XKk4eTUwuDqzvceTx6vCcaks5tzD0WgaJpZM4T-Hu1>. > > > > > > > > > — > > You are receiving this because you authored the thread. > > Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOPL_8z0up0IxM3Q7BC8wZMOTkGF4ks5tzG1vgaJpZM4T-Hu1>. > > > > — > > You are receiving this because you commented. > > Reply to this email directly, view it on GitHub <#117 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvdo7YNiGYGkQCeReUFSUp_KnX6BTks5tzHRWgaJpZM4T-Hu1>. > > > > > — > You are receiving this because you authored the thread. > Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOPJzokQ60A1VdkvVpbosdPkt4NAhks5tzbNOgaJpZM4T-Hu1>. > > — > You are receiving this because you commented. > Reply to this email directly, view it on GitHub <#117 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvc7tE5TDph1fwdQyP8lH9JHXIPnhks5tzbTygaJpZM4T-Hu1>. > — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub<#117 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AldHOEDUFW_lclDbh0OUwcynHPEwvgpfks5tzcFngaJpZM4T-Hu1>. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#117 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AILEvZtU_7fYEn7MP4ww0lTJzcX0z7D_ks5tzzb6gaJpZM4T-Hu1>.

jaamarks mentioned this issue Feb 17, 2020

NGC FOU LDSC Regression jaamarks/jaamarks_notebooks#10

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LD score regression trait correlation ERROR:invalid value encountered in sqrt #117

LD score regression trait correlation ERROR:invalid value encountered in sqrt #117

despinaman commented May 14, 2018

rkwalters commented May 15, 2018 via email

despinaman commented May 16, 2018 via email

despinaman commented May 16, 2018 via email

rkwalters commented May 16, 2018 via email

despinaman commented May 16, 2018 via email

rkwalters commented May 17, 2018 via email

despinaman commented May 17, 2018 via email

rkwalters commented May 17, 2018 via email

despinaman commented May 18, 2018 via email

rkwalters commented May 18, 2018 via email

LD score regression trait correlation ERROR:invalid value encountered in sqrt #117

LD score regression trait correlation ERROR:invalid value encountered in sqrt #117

Comments

despinaman commented May 14, 2018

rkwalters commented May 15, 2018 via email

despinaman commented May 16, 2018 via email

despinaman commented May 16, 2018 via email

rkwalters commented May 16, 2018 via email

despinaman commented May 16, 2018 via email

rkwalters commented May 17, 2018 via email

despinaman commented May 17, 2018 via email

rkwalters commented May 17, 2018 via email

despinaman commented May 18, 2018 via email

rkwalters commented May 18, 2018 via email