-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GWAS mean chi2 and MetaData mean chi2 #141
Comments
What's actually killing you in this case is that the LD score intercept is
greater than the mean chi2 stat. That is a bit surprising to me for a trait
like LDL. When you run LD score regression on those summary statistics, do
you get a very large LD score intercept?
…On Thu, Sep 9, 2021 at 7:28 AM mojolin2 ***@***.***> wrote:
Dear mtaggers,
First of all thank you for putting out such a cool and efficient tool.
I was running the latest code on some lipids and blood pressure GWAS data
from UK Biobank, GWAS from BOLT-LMM. While the job finished, I noticed
'Sigma matrix used is still non-positive-definite.' in the logs. I read in
the issues history that this could be related to too high of correlation
between the traits, say LDL and Total Chol, r > .95, so I took out the
Total Chol but still got the error, it seems like it was because of GWAS
mean chi2 for LDL was 0.99, though in the beginning of the log, the
MetaData had the chi2 value for LDL (trait 1) to be 1.73. The sample size
is about 300K. Any recommendations?
Best- jake
Beginning MTAG analysis...
MTAG will use the Z column for analyses.
Read in Trait 1 summary statistics (18116736 SNPs) from LDL2_mtag_in.txt
...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: a2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.
Reading sumstats from provided DataFrame into memory 10000000 SNPs at a
time.
Read 18116736 SNPs from --sumstats file.
Removed 1041 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 1685198 variants that were not SNPs. Note: strand ambiguous SNPs
were not dropped.
16430497 SNPs remain.
Removed 18699 SNPs with duplicated rs numbers (16411798 SNPs remain).
Removed 0 SNPs with N < 0.0 (16411798 SNPs remain).
Median value of SIGNED_SUMSTAT was -0.00642259715909, which seems sensible.
Dropping snps with null values
Metadata:
*Mean chi^2 = 1.728*
Lambda GC = 1.254
Max chi^2 = 1408.856
62701 Genome-wide significant SNPs (some may have been removed by
filtering).
Conversion finished at Thu Sep 9 10:22:05 2021
Total time elapsed: 4.0m:45.41s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 1 complete. SNPs remaining: 16430497
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Trait 1: Dropped 18699 SNPs for duplicate values in the "snp_name" column
Read in Trait 2 summary statistics (18116736 SNPs) from HDL_mtag_in.txt ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: a2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.
Reading sumstats from provided DataFrame into memory 10000000 SNPs at a
time.
Read 18116736 SNPs from --sumstats file.
Removed 1199 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 1685184 variants that were not SNPs. Note: strand ambiguous SNPs
were not dropped.
16430353 SNPs remain.
Removed 18699 SNPs with duplicated rs numbers (16411654 SNPs remain).
Removed 0 SNPs with N < 0.0 (16411654 SNPs remain).
Median value of SIGNED_SUMSTAT was -0.00403410078196, which seems sensible.
Dropping snps with null values
Metadata:
Mean chi^2 = 1.786
Lambda GC = 1.311
Max chi^2 = 1408.856
58307 Genome-wide significant SNPs (some may have been removed by
filtering).
Conversion finished at Thu Sep 9 10:28:44 2021
Total time elapsed: 4.0m:43.8s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 2 complete. SNPs remaining: 16430353
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Trait 2: Dropped 18699 SNPs for duplicate values in the "snp_name" column
Read in Trait 3 summary statistics (18116736 SNPs) from TRIG_mtag_in.txt
...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 3 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: a2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.
Reading sumstats from provided DataFrame into memory 10000000 SNPs at a
time.
Read 18116736 SNPs from --sumstats file.
Removed 1041 SNPs with missing values.
...
Sigma hat:
[[ 1.769 -0.134 0.288 0.131]
[-0.134 1.41 -0.469 -0.043]
[ 0.288 -0.469 1.087 0.123]
[ 0.131 -0.043 0.123 1.098]]
Mean chi^2 of SNPs used to estimate Omega is low for some SNPsMTAG may not
perform well in this situation.
Beginning estimation of Omega ...
Using GMM estimator of Omega ..
Checking for positive definiteness ..
matrix is not positive definite, performing adjustment..
Warning: max number of iterations reached in adjustment procedure. Sigma
matrix used is still non-positive-definite.
Completed estimation of Omega ...
Beginning MTAG calculations...
... Completed MTAG calculations.
Writing Phenotype 1 to file ...
Writing Phenotype 2 to file ...
Writing Phenotype 3 to file ...
Writing Phenotype 4 to file ...
Summary of MTAG results:
Trait # SNPs used N (max) N (mean) GWAS mean chi^2 MTAG mean chi^2 GWAS
equiv. (max) N
*1 LDL2_mtag_in.txt 13874381 340951 340951 0.999 0.943 14945144*
2 HDL_mtag_in.txt 13874381 290000 290000 1.285 1.193 196097
3 TRIG_mtag_in.txt 13874381 290000 290000 1.545 1.362 192295
4 SBP_mtag_in.txt 13874381 290000 290000 1.485 1.489 292886
Estimated Omega:
[[-6.758e-09 2.021e-11 4.492e-11 -5.942e-13]
[ 2.021e-11 1.387e-06 -4.635e-11 -1.009e-11]
[ 4.492e-11 -4.635e-11 2.044e-06 6.511e-12]
[-5.942e-13 -1.009e-11 6.511e-12 1.835e-06]]
(Correlation):
[[ nan nan nan nan]
[ nan 1.000e+00 -2.753e-05 -6.326e-06]
[ nan -2.753e-05 1.000e+00 3.361e-06]
[ nan -6.326e-06 3.361e-06 1.000e+00]]
Estimated Sigma:
[[ 1.769 -0.134 0.288 0.131]
[-0.134 1.41 -0.469 -0.043]
[ 0.288 -0.469 1.087 0.123]
[ 0.131 -0.043 0.123 1.098]]
(Correlation):
[[ 1. -0.085 0.208 0.094]
[-0.085 1. -0.379 -0.035]
[ 0.208 -0.379 1. 0.113]
[ 0.094 -0.035 0.113 1. ]]
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#141>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AFBUB5LP67RKSPB345MKZADUBCK5ZANCNFSM5DW5GONA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
Hi, thanks for the fast response and LDSC tips: LDSC LDL outputs: Lambda GC: 1.4729 Mean Chi^2: 2.4876 Intercept: 1.1487 (0.0201) Ratio: 0.0999 (0.0135) Analysis finished at Fri Sep 10 09:19:29 2021 Total time elapsed: 1.0m:12.13s HDL: Lambda GC: 1.7589 Mean Chi^2: 2.7523 Intercept: 1.1667 (0.0197) Ratio: 0.0951 (0.0113) Analysis finished at Fri Sep 10 09:27:20 2021 |
Hmm. Very puzzling. What is the command you are using to call MTAG?
…On Fri, Sep 10, 2021 at 4:34 AM mojolin2 ***@***.***> wrote:
Hi, thanks for the fast response and LDSC tips:
The LDSC Intercept for LDL was:
Intercept: 1.1487 (0.0201), smaller than its Mean Chi^2: 2.4876
For comparison, the HDL intercept was similar:
Intercept: 1.1667 (0.0197), Mean Chi^2: 2.7523
Within the MTAG results, which meta data value represents the the
intercept value?
LDSC LDL outputs:
Total Observed scale h2: 0.2703 (0.0643)
Lambda GC: 1.4729
Mean Chi^2: 2.4876
Intercept: 1.1487 (0.0201)
Ratio: 0.0999 (0.0135)
Analysis finished at Fri Sep 10 09:19:29 2021
Total time elapsed: 1.0m:12.13s
HDL:
Total Observed scale h2: 0.3104 (0.0593)
Lambda GC: 1.7589
Mean Chi^2: 2.7523
Intercept: 1.1667 (0.0197)
Ratio: 0.0951 (0.0113)
Analysis finished at Fri Sep 10 09:27:20 2021
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#141 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AFBUB5MVLX5UMA4GOFFB2O3UBG7KPANCNFSM5DW5GONA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
nohup python ./mtag.py --sumstats HDL_mtag_in.txt,LDL_mtag_in.txt,TRIG_mtag_in.txt,CHOL_mtag_in.txt,SBP_mtag_in.txt --out ./HDL_LDL_TRIG_CHOL_SBP --n_min 0.0 --stream_stdout & It is the latest version of MTAG |
Hmm. This is still puzzling to me. You seem to have a ton of SNPs in your
summary statistics, which makes me wonder if this is being caused by rare
variants. What happens if you first filter out SNPs that have a MAF<.01 or
.05?
…On Sun, Sep 12, 2021 at 2:16 PM mojolin2 ***@***.***> wrote:
nohup python ./mtag.py --sumstats
HDL_mtag_in.txt,LDL_mtag_in.txt,TRIG_mtag_in.txt,CHOL_mtag_in.txt,SBP_mtag_in.txt
--out ./HDL_LDL_TRIG_CHOL_SBP --n_min 0.0 --stream_stdout &
It is the latest version of MTAG
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#141 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AFBUB5L73MUQY34R3Q4BF7DUBTU6PANCNFSM5DW5GONA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
Hi, sorry it took so long to test this. running the LDL GWAS with MAF >= .01 fixed the issue where the LD score intercept is |
No worries. In general, I worry that MTAG's assumptions may be violated if
you include both common and rare SNPs. If you really would like to include
the rare SNPs in your PRS, you could always merge the MTAG summary
statistics with the GWAS summary statistics for SNPs that were omitted, but
I suspect that it wouldn't lead to large improvements in power.
…On Fri, Oct 8, 2021 at 5:44 AM mojolin2 ***@***.***> wrote:
Hi, sorry it took so long to test this. running the LDL GWAS with MAF >=
.01 fixed the issue where the LD score intercept is
greater than the mean chi2 stat. The warning message 'Sigma matrix used is
still non-positive-definite.' is no longer produced. Not sure how removal
of all rare variants does to PRS, but it is good to know and big thanks for
the support.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#141 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AFBUB5PNI3QIZBDMZWAHM2DUF24Q7ANCNFSM5DW5GONA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
Good to know and thanks again. |
Dear mtaggers,
First of all thank you for putting out such a cool and efficient tool.
I was running the latest code on some lipids and blood pressure GWAS data from UK Biobank, GWAS from BOLT-LMM. While the job finished, I noticed 'Sigma matrix used is still non-positive-definite.' in the logs. I read in the issues history that this could be related to too high of correlation between the traits, say LDL and Total Chol, r > .95, so I took out the Total Chol but still got the error, it seems like it was because of GWAS mean chi2 for LDL was 0.99, though in the beginning of the log, the MetaData had the chi2 value for LDL (trait 1) to be 1.73. The sample size is about 300K. Any recommendations?
Best- jake
Beginning MTAG analysis...
MTAG will use the Z column for analyses.
Read in Trait 1 summary statistics (18116736 SNPs) from LDL2_mtag_in.txt ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: a2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.
Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
Read 18116736 SNPs from --sumstats file.
Removed 1041 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 1685198 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
16430497 SNPs remain.
Removed 18699 SNPs with duplicated rs numbers (16411798 SNPs remain).
Removed 0 SNPs with N < 0.0 (16411798 SNPs remain).
Median value of SIGNED_SUMSTAT was -0.00642259715909, which seems sensible.
Dropping snps with null values
Metadata:
Mean chi^2 = 1.728
Lambda GC = 1.254
Max chi^2 = 1408.856
62701 Genome-wide significant SNPs (some may have been removed by filtering).
Conversion finished at Thu Sep 9 10:22:05 2021
Total time elapsed: 4.0m:45.41s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 1 complete. SNPs remaining: 16430497
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Trait 1: Dropped 18699 SNPs for duplicate values in the "snp_name" column
Read in Trait 2 summary statistics (18116736 SNPs) from HDL_mtag_in.txt ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: a2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.
Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
Read 18116736 SNPs from --sumstats file.
Removed 1199 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 1685184 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
16430353 SNPs remain.
Removed 18699 SNPs with duplicated rs numbers (16411654 SNPs remain).
Removed 0 SNPs with N < 0.0 (16411654 SNPs remain).
Median value of SIGNED_SUMSTAT was -0.00403410078196, which seems sensible.
Dropping snps with null values
Metadata:
Mean chi^2 = 1.786
Lambda GC = 1.311
Max chi^2 = 1408.856
58307 Genome-wide significant SNPs (some may have been removed by filtering).
Conversion finished at Thu Sep 9 10:28:44 2021
Total time elapsed: 4.0m:43.8s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 2 complete. SNPs remaining: 16430353
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Trait 2: Dropped 18699 SNPs for duplicate values in the "snp_name" column
Read in Trait 3 summary statistics (18116736 SNPs) from TRIG_mtag_in.txt ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 3 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: a2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.
Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
Read 18116736 SNPs from --sumstats file.
Removed 1041 SNPs with missing values.
...
Sigma hat:
[[ 1.769 -0.134 0.288 0.131]
[-0.134 1.41 -0.469 -0.043]
[ 0.288 -0.469 1.087 0.123]
[ 0.131 -0.043 0.123 1.098]]
Mean chi^2 of SNPs used to estimate Omega is low for some SNPsMTAG may not perform well in this situation.
Beginning estimation of Omega ...
Using GMM estimator of Omega ..
Checking for positive definiteness ..
matrix is not positive definite, performing adjustment..
Warning: max number of iterations reached in adjustment procedure. Sigma matrix used is still non-positive-definite.
Completed estimation of Omega ...
Beginning MTAG calculations...
... Completed MTAG calculations.
Writing Phenotype 1 to file ...
Writing Phenotype 2 to file ...
Writing Phenotype 3 to file ...
Writing Phenotype 4 to file ...
Summary of MTAG results:
Trait # SNPs used N (max) N (mean) GWAS mean chi^2 MTAG mean chi^2 GWAS equiv. (max) N
1 LDL2_mtag_in.txt 13874381 340951 340951 0.999 0.943 14945144
2 HDL_mtag_in.txt 13874381 290000 290000 1.285 1.193 196097
3 TRIG_mtag_in.txt 13874381 290000 290000 1.545 1.362 192295
4 SBP_mtag_in.txt 13874381 290000 290000 1.485 1.489 292886
Estimated Omega:
[[-6.758e-09 2.021e-11 4.492e-11 -5.942e-13]
[ 2.021e-11 1.387e-06 -4.635e-11 -1.009e-11]
[ 4.492e-11 -4.635e-11 2.044e-06 6.511e-12]
[-5.942e-13 -1.009e-11 6.511e-12 1.835e-06]]
(Correlation):
[[ nan nan nan nan]
[ nan 1.000e+00 -2.753e-05 -6.326e-06]
[ nan -2.753e-05 1.000e+00 3.361e-06]
[ nan -6.326e-06 3.361e-06 1.000e+00]]
Estimated Sigma:
[[ 1.769 -0.134 0.288 0.131]
[-0.134 1.41 -0.469 -0.043]
[ 0.288 -0.469 1.087 0.123]
[ 0.131 -0.043 0.123 1.098]]
(Correlation):
[[ 1. -0.085 0.208 0.094]
[-0.085 1. -0.379 -0.035]
[ 0.208 -0.379 1. 0.113]
[ 0.094 -0.035 0.113 1. ]]
The text was updated successfully, but these errors were encountered: