Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GWAS mean chi2 and MetaData mean chi2 #141

Closed
mojolin2 opened this issue Sep 9, 2021 · 8 comments
Closed

GWAS mean chi2 and MetaData mean chi2 #141

mojolin2 opened this issue Sep 9, 2021 · 8 comments

Comments

@mojolin2
Copy link

mojolin2 commented Sep 9, 2021

Dear mtaggers,

First of all thank you for putting out such a cool and efficient tool.
I was running the latest code on some lipids and blood pressure GWAS data from UK Biobank, GWAS from BOLT-LMM. While the job finished, I noticed 'Sigma matrix used is still non-positive-definite.' in the logs. I read in the issues history that this could be related to too high of correlation between the traits, say LDL and Total Chol, r > .95, so I took out the Total Chol but still got the error, it seems like it was because of GWAS mean chi2 for LDL was 0.99, though in the beginning of the log, the MetaData had the chi2 value for LDL (trait 1) to be 1.73. The sample size is about 300K. Any recommendations?

Best- jake

Beginning MTAG analysis...
MTAG will use the Z column for analyses.
Read in Trait 1 summary statistics (18116736 SNPs) from LDL2_mtag_in.txt ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: a2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.

Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
Read 18116736 SNPs from --sumstats file.
Removed 1041 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 1685198 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
16430497 SNPs remain.
Removed 18699 SNPs with duplicated rs numbers (16411798 SNPs remain).
Removed 0 SNPs with N < 0.0 (16411798 SNPs remain).
Median value of SIGNED_SUMSTAT was -0.00642259715909, which seems sensible.
Dropping snps with null values

Metadata:
Mean chi^2 = 1.728
Lambda GC = 1.254
Max chi^2 = 1408.856
62701 Genome-wide significant SNPs (some may have been removed by filtering).

Conversion finished at Thu Sep 9 10:22:05 2021
Total time elapsed: 4.0m:45.41s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 1 complete. SNPs remaining: 16430497
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Trait 1: Dropped 18699 SNPs for duplicate values in the "snp_name" column
Read in Trait 2 summary statistics (18116736 SNPs) from HDL_mtag_in.txt ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: a2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.

Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
Read 18116736 SNPs from --sumstats file.
Removed 1199 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 1685184 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
16430353 SNPs remain.
Removed 18699 SNPs with duplicated rs numbers (16411654 SNPs remain).
Removed 0 SNPs with N < 0.0 (16411654 SNPs remain).
Median value of SIGNED_SUMSTAT was -0.00403410078196, which seems sensible.
Dropping snps with null values

Metadata:
Mean chi^2 = 1.786
Lambda GC = 1.311
Max chi^2 = 1408.856
58307 Genome-wide significant SNPs (some may have been removed by filtering).

Conversion finished at Thu Sep 9 10:28:44 2021
Total time elapsed: 4.0m:43.8s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 2 complete. SNPs remaining: 16430353
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Trait 2: Dropped 18699 SNPs for duplicate values in the "snp_name" column
Read in Trait 3 summary statistics (18116736 SNPs) from TRIG_mtag_in.txt ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 3 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: a2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.

Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
Read 18116736 SNPs from --sumstats file.
Removed 1041 SNPs with missing values.
...
Sigma hat:
[[ 1.769 -0.134 0.288 0.131]
[-0.134 1.41 -0.469 -0.043]
[ 0.288 -0.469 1.087 0.123]
[ 0.131 -0.043 0.123 1.098]]
Mean chi^2 of SNPs used to estimate Omega is low for some SNPsMTAG may not perform well in this situation.
Beginning estimation of Omega ...
Using GMM estimator of Omega ..
Checking for positive definiteness ..
matrix is not positive definite, performing adjustment..
Warning: max number of iterations reached in adjustment procedure. Sigma matrix used is still non-positive-definite.
Completed estimation of Omega ...
Beginning MTAG calculations...
... Completed MTAG calculations.
Writing Phenotype 1 to file ...
Writing Phenotype 2 to file ...
Writing Phenotype 3 to file ...
Writing Phenotype 4 to file ...

Summary of MTAG results:

Trait # SNPs used N (max) N (mean) GWAS mean chi^2 MTAG mean chi^2 GWAS equiv. (max) N
1 LDL2_mtag_in.txt 13874381 340951 340951 0.999 0.943 14945144
2 HDL_mtag_in.txt 13874381 290000 290000 1.285 1.193 196097
3 TRIG_mtag_in.txt 13874381 290000 290000 1.545 1.362 192295
4 SBP_mtag_in.txt 13874381 290000 290000 1.485 1.489 292886

Estimated Omega:
[[-6.758e-09 2.021e-11 4.492e-11 -5.942e-13]
[ 2.021e-11 1.387e-06 -4.635e-11 -1.009e-11]
[ 4.492e-11 -4.635e-11 2.044e-06 6.511e-12]
[-5.942e-13 -1.009e-11 6.511e-12 1.835e-06]]

(Correlation):
[[ nan nan nan nan]
[ nan 1.000e+00 -2.753e-05 -6.326e-06]
[ nan -2.753e-05 1.000e+00 3.361e-06]
[ nan -6.326e-06 3.361e-06 1.000e+00]]

Estimated Sigma:
[[ 1.769 -0.134 0.288 0.131]
[-0.134 1.41 -0.469 -0.043]
[ 0.288 -0.469 1.087 0.123]
[ 0.131 -0.043 0.123 1.098]]

(Correlation):
[[ 1. -0.085 0.208 0.094]
[-0.085 1. -0.379 -0.035]
[ 0.208 -0.379 1. 0.113]
[ 0.094 -0.035 0.113 1. ]]

@paturley
Copy link
Collaborator

paturley commented Sep 9, 2021 via email

@mojolin2
Copy link
Author

Hi, thanks for the fast response and LDSC tips:
The LDSC Intercept for LDL was:
Intercept: 1.1487 (0.0201), smaller than its Mean Chi^2: 2.4876
For comparison, the HDL intercept was similar:
Intercept: 1.1667 (0.0197), Mean Chi^2: 2.7523
Within the MTAG results, which meta data value represents the the intercept value?

LDSC LDL outputs:
Total Observed scale h2: 0.2703 (0.0643)

Lambda GC: 1.4729

Mean Chi^2: 2.4876

Intercept: 1.1487 (0.0201)

Ratio: 0.0999 (0.0135)

Analysis finished at Fri Sep 10 09:19:29 2021

Total time elapsed: 1.0m:12.13s

HDL:
Total Observed scale h2: 0.3104 (0.0593)

Lambda GC: 1.7589

Mean Chi^2: 2.7523

Intercept: 1.1667 (0.0197)

Ratio: 0.0951 (0.0113)

Analysis finished at Fri Sep 10 09:27:20 2021

@paturley
Copy link
Collaborator

paturley commented Sep 10, 2021 via email

@mojolin2
Copy link
Author

nohup python ./mtag.py --sumstats HDL_mtag_in.txt,LDL_mtag_in.txt,TRIG_mtag_in.txt,CHOL_mtag_in.txt,SBP_mtag_in.txt --out ./HDL_LDL_TRIG_CHOL_SBP --n_min 0.0 --stream_stdout &

It is the latest version of MTAG

@paturley
Copy link
Collaborator

paturley commented Sep 15, 2021 via email

@mojolin2
Copy link
Author

mojolin2 commented Oct 8, 2021

Hi, sorry it took so long to test this. running the LDL GWAS with MAF >= .01 fixed the issue where the LD score intercept is
greater than the mean chi2 stat. The warning message 'Sigma matrix used is still non-positive-definite.' is no longer produced. Not sure how removal of all rare variants does to PRS, but it is good to know and big thanks for the support.

@paturley
Copy link
Collaborator

paturley commented Oct 8, 2021 via email

@mojolin2
Copy link
Author

Good to know and thanks again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants