error in log file #24

umcyh · 2018-04-06T17:49:28Z

I applied MTAG (https://github.com/omeed-maghzian/mtag) to some public data: http://csg.sph.umich.edu//abecasis/public/lipids2013/. I chose the Total Cholesterol and Triglycerides data to test MTAG.

When I run MTAG, the log file shows an error; please see the log file below. The only change I made to the original data was adding a column z = Beta/SE to the MTAG input file.
(1) Is this z-value calculation correct?
(2) Is the N value correct?
(3) The error in the log is: "ERROR converting summary statistics". Could you explain why there is an error in converting the summary statistics?

Log file:

2018/04/06/12:23:11 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/04/06/12:23:11 PM Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2018/04/06/12:23:11 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/04/06/12:23:11 PM Interpreting column names as follows:
2018/04/06/12:23:11 PM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.

2018/04/06/12:23:11 PM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2018/04/06/12:23:16 PM Read 2446981 SNPs from --sumstats file.
Removed 805 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
2446176 SNPs remain.
2018/04/06/12:23:17 PM Removed 0 SNPs with duplicated rs numbers (2446176 SNPs remain).
2018/04/06/12:23:18 PM Removed 33274 SNPs with N < 63063.3333333 (2412902 SNPs remain).
2018/04/06/12:24:37 PM
ERROR converting summary statistics:

2018/04/06/12:24:37 PM Traceback (most recent call last):
File "/mnt/speliotes-lab/Software/MTAG/mtag-master/ldsc_mod/munge_sumstats.py", line 718, in munge_sumstats
check_median(dat.SIGNED_SUMSTAT, signed_sumstat_null, 0.1, sign_cname))
File "/mnt/speliotes-lab/Software/MTAG/mtag-master/ldsc_mod/munge_sumstats.py", line 372, in check_median
raise ValueError(msg.format(F=name, M=expected_median, V=round(m, 2)))
ValueError: WARNING: median value of SIGNED_SUMSTATS is 0.71 (should be close to 0.0). This column may be mislabeled.

2018/04/06/12:24:37 PM
Conversion finished at Fri Apr 6 12:24:37 2018
2018/04/06/12:24:37 PM Total time elapsed: 1.0m:26.4s
2018/04/06/12:24:37 PM WARNING: median value of SIGNED_SUMSTATS is 0.71 (should be close to 0.0). This column may be mislabeled.
Traceback (most recent call last):
File "mtag.py", line 1348, in <module>
mtag(args)
File "mtag.py", line 1194, in mtag
DATA, args = load_and_merge_data(args)
File "mtag.py", line 229, in load_and_merge_data
GWAS_d[p], sumstats_format[p] = _perform_munge(args, GWAS_d[p], gwas_dat_gen, p)
File "mtag.py", line 149, in _perform_munge
munged_results = munge_sumstats.munge_sumstats(argnames, write_out=False, new_log=False)
File "/mnt/speliotes-lab/Software/MTAG/mtag-master/ldsc_mod/munge_sumstats.py", line 718, in munge_sumstats
check_median(dat.SIGNED_SUMSTAT, signed_sumstat_null, 0.1, sign_cname))
File "/mnt/speliotes-lab/Software/MTAG/mtag-master/ldsc_mod/munge_sumstats.py", line 372, in check_median
raise ValueError(msg.format(F=name, M=expected_median, V=round(m, 2)))
ValueError: WARNING: median value of SIGNED_SUMSTATS is 0.71 (should be close to 0.0). This column may be mislabeled.

GeneticResources · 2018-04-06T23:30:25Z

I think the public data flipped allele A1 to make all betas > 0, but mtag expects the mean of the betas to be 0. Randomly selecting half of the SNPs and, for those SNPs, flipping A1 and the sign of beta (and therefore z) may solve the problem.
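A minimal sketch of the random-flip idea above, in plain Python. The record fields (`a1`, `a2`, `beta`, `z`, `freq`) and the helper name `flip_random_half` are hypothetical and not part of MTAG, and the toy data are simulated. Each SNP is flipped with probability 0.5, which swaps the two alleles and negates the signed statistics; SE and N stay as they are.

```python
import random
import statistics

def flip_random_half(snps, seed=0):
    """Return copies of the SNP records, each flipped with probability 0.5.

    Flipping swaps a1/a2 and negates beta and z. If an allele frequency
    for a1 is present (hypothetical key "freq"), it becomes 1 - freq.
    """
    rng = random.Random(seed)
    out = []
    for snp in snps:
        rec = dict(snp)
        if rng.random() < 0.5:
            rec["a1"], rec["a2"] = rec["a2"], rec["a1"]
            rec["beta"] = -rec["beta"]
            rec["z"] = -rec["z"]
            if "freq" in rec:
                rec["freq"] = 1.0 - rec["freq"]
        out.append(rec)
    return out

# Simulated toy input in which every beta has been oriented to be positive.
toy_rng = random.Random(1)
toy = []
for i in range(2000):
    se = 0.02
    beta = abs(toy_rng.gauss(0.0, se))
    toy.append({"snpid": f"rs{i}", "a1": "A", "a2": "G",
                "beta": beta, "se": se, "z": beta / se})

flipped = flip_random_half(toy, seed=42)
median_before = statistics.median(r["z"] for r in toy)
median_after = statistics.median(r["z"] for r in flipped)
print(f"median z before flipping: {median_before:.2f}")
print(f"median z after flipping:  {median_after:.2f}")
```

For reference, the median of |z| for a standard normal z is about 0.67, so a median near that value is what one would expect if every SNP had been oriented to give a positive effect.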

Another related question about mtag Z-scores: in an XY plot of the raw Z(gwas) against Z(mtag), should the slope (beta) be around 1? I got 0.5 and don't know how to explain the shrunken Z-scores from mtag.

huilisabrina · 2018-04-07T03:54:31Z

Hi @umcyh ,

The error comes from one of the data validity checks built into the LDSC package, which MTAG uses to estimate Sigma.

To your questions:
(1) You’re right about the z=beta/SE.
(2) N is correct based on the table you sent.
(3) The message indicates the effect sizes in the input sumstats are skewed to be positive. This violates the underlying random effect assumption used in MTAG. We’d expect beta or Z to be zero on average, if the choice of reference allele is arbitrary.
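To illustrate the kind of check that fired here, the following is a minimal sketch, not LDSC's actual `check_median` code. The function names are hypothetical, the 0.1 tolerance is taken from the argument visible in the traceback above, and the toy betas and SEs are made up.

```python
import statistics

def signed_stat_median(betas, ses):
    """Median of z = beta / SE over all SNPs."""
    return statistics.median(b / s for b, s in zip(betas, ses))

def median_is_centered(median, expected=0.0, tol=0.1):
    """True when the median lies within `tol` of the expected null value."""
    return abs(median - expected) <= tol

ses = [0.01] * 5
balanced = [-0.03, -0.01, 0.0, 0.01, 0.03]   # arbitrary reference alleles
all_positive = [abs(b) for b in balanced]    # alleles oriented so beta >= 0

m_balanced = signed_stat_median(balanced, ses)
m_positive = signed_stat_median(all_positive, ses)
print(m_balanced, median_is_centered(m_balanced))
print(m_positive, median_is_centered(m_positive))
```

The balanced input passes the check, while the all-positive input has a median well away from zero and would be rejected.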

Thanks @GeneticResources for pointing out the allele issue in the source data. That should explain the error Yanhua was getting. In terms of the Z-score issue: are you running a single-trait MTAG? The input GWAS z and the output mtag_z should match if that is the case.

Thanks,
Hui

GeneticResources · 2018-04-07T10:50:20Z

Hi @huilisabrina ,

I ran mtag on a disease trait (binary) and a quantitative trait. The slope (beta) between Z(gwas of binary trait) and Z(mtag of binary trait) was 0.5 (not around 1). Do you know the reason? Thanks.

huilisabrina · 2018-04-07T14:23:25Z

Hi @GeneticResources ,

One possibility is that the N used for the case-control trait is not the "effective N". MTAG assumes SE = 1/sqrt(N_eff * 2p(1-p)), where p is the minor allele frequency. Can you try replacing the N column in the binary trait sumstats with 1/(2p(1-p) * SE^2) and see if that solves the problem? Also, there is some discussion in an older issue, #10, that might be helpful.
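The formula above can be sketched in a few lines of Python. The function names `effective_n` and `implied_se` are hypothetical helpers, not MTAG functions, and the example values are made up; the second function is simply the inverse relation, included to show that the two expressions are consistent.

```python
import math

def effective_n(se, maf):
    """Effective sample size N_eff = 1 / (2 p (1 - p) SE^2)."""
    if not 0.0 < maf < 1.0:
        raise ValueError("maf must be strictly between 0 and 1")
    if se <= 0.0:
        raise ValueError("se must be positive")
    return 1.0 / (2.0 * maf * (1.0 - maf) * se ** 2)

def implied_se(n_eff, maf):
    """Inverse relation: SE = 1 / sqrt(N_eff * 2 p (1 - p))."""
    return 1.0 / math.sqrt(n_eff * 2.0 * maf * (1.0 - maf))

# Made-up example values for one SNP.
se, maf = 0.01, 0.5
n_eff = effective_n(se, maf)
print(n_eff)                    # effective N for this SNP
print(implied_se(n_eff, maf))   # recovers the original SE
```

Applied row by row, this would give a per-SNP N column to use in place of the reported sample size for the binary trait.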

Best,
Hui

GeneticResources · 2018-04-09T07:20:17Z

Hi @huilisabrina ,
Issue #10 is very helpful. After I used the new "effective N", the lm slope between beta (gwas) and beta (mtag) is 1.0308840; previously it was 4.3747334.
However, the slope between z (gwas) and z (mtag) is still about 0.6259420; previously it was 0.619527.
And the slope between se (gwas) and se (mtag) is 1.6295029; previously it was 6.935761.
Do you know the potential reason for z(slope) ~ 0.63 and se(slope) ~ 1.63?

Thanks.

paturley · 2018-04-09T15:50:46Z

If you could share the log file from your analysis, it would help a lot to understand what you are seeing.


GeneticResources · 2018-04-10T01:50:41Z

out_mtag_trait_three.log

Hi @paturley ,

The attached is the log file.
The first trait is the binary trait with z(slope) ~ 0.63 and se(slope) ~ 1.63.
Traits 2 and 3 are quantitative traits, and their slopes between z (gwas) and z (mtag) are 1.004962 and 0.949929, respectively. However, the slopes for beta and se are not around 1 (~0.2 for trait 2 and ~3.5 for trait 3, for both beta and se).

Since the effective sample size is hard to determine, is it possible to run mtag based on beta and se rather than z?

Thanks.

paturley · 2018-04-10T18:01:27Z

MTAG outputs betas and coefficients assuming that the phenotype has been standardized to have a standard deviation of one. That appears to be the problem with traits 2 and 3, since the betas and SEs are inflated/deflated by the same amount. (Do you know the variance of the phenotype for those two data sets?) See Issue #10.
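As a small illustration of why the z slopes can be close to 1 while the beta and SE slopes are off by a common factor, here is a sketch of rescaling between standardized and original phenotype units. The helper name `rescale` and all numbers are hypothetical; the phenotype SD of 3.5 is only a made-up example value.

```python
def rescale(beta, se, factor):
    """Multiply an effect estimate and its standard error by the same factor."""
    return beta * factor, se * factor

# Estimates on the standardized scale (phenotype SD = 1); made-up values.
beta_std, se_std = 0.05, 0.01

# Hypothetical phenotype standard deviation in the original units.
pheno_sd = 3.5

beta_raw, se_raw = rescale(beta_std, se_std, pheno_sd)
z_std = beta_std / se_std
z_raw = beta_raw / se_raw
print(beta_raw, se_raw)
print(z_std, z_raw)   # the z-score does not depend on the phenotype scale
```

Because beta and SE scale together, their ratio z is unchanged, so a mismatch in phenotype units shows up in the beta and SE comparisons but not in the z comparison.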

For trait one, there may be a few things that are going on.

(1) Since the trait is binary, the results correspond to first standardizing the binary trait and then doing GWAS. If you want to convert the betas and SEs back into binary units, you need to multiply them by the standard deviation of the binary phenotype.
(2) Even if you correct the units of the estimates, the slope won't be one in expectation due to attenuation bias, since the betas are estimated with noise. (https://en.wikipedia.org/wiki/Regression_dilution)
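The attenuation point can be seen in a toy simulation. This is a sketch under simplifying assumptions and not MTAG's model: two estimates of the same true effects are given independent noise, and one is regressed on the other. All names and parameter values are made up for illustration.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
n_snps = 20000
true_sd = 1.0    # spread of the true effects (assumed)
noise_sd = 1.0   # estimation noise in each set of estimates (assumed)

true_effects = [rng.gauss(0.0, true_sd) for _ in range(n_snps)]
est_a = [t + rng.gauss(0.0, noise_sd) for t in true_effects]
est_b = [t + rng.gauss(0.0, noise_sd) for t in true_effects]

slope = ols_slope(est_a, est_b)
expected = true_sd ** 2 / (true_sd ** 2 + noise_sd ** 2)
print(f"observed slope: {slope:.3f}, attenuation factor: {expected:.3f}")
```

Both sets of estimates are unbiased for the same true effects, yet the regression slope is pulled toward zero by the factor var(true) / (var(true) + var(noise)), because the regressor itself is measured with noise.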

Re using beta and se rather than z and N, we hope to implement that soon. If you use the formula for N that is found in issue #10, however, that is equivalent to using the beta and se.

Xuemin-Wang mentioned this issue Aug 5, 2020

mtag beta estimates smaller than that in individual gwas? #101

Closed
