-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] parse nucleotides error in GWAS summary file #57
Comments
Hi, I'm not sure this is an error, as currently nucleotide encoding followed is the one provided in the validation genotypes. Hence, if it seems like a nucleotide in the validation genotypes is the complement, then the summary stats nucleotide is changed to its compliment, to match the validation genotypes. Could that explain what you're seeing. Best, |
I have thought about the compliment issue. However, a AC alleles should be
TG, but it coded as TC. This is the problem.
I checked many other sites it reported as mismatch by ldpred, it actually
can't be explained by switching to compliment.
And the same input in the python2 version has no problem but in the python3
has many problems. This is wired.
Best regards
Wallace
On Mon, Mar 4, 2019 at 2:25 AM Bjarni J. Vilhjalmsson < ***@***.***> wrote:
Hi,
I'm not sure this is an error, as currently nucleotide encoding followed
is the one provided in the validation genotypes. Hence, if it seems like a
nucleotide in the validation genotypes is the complement, then the summary
stats nucleotide is changed to its compliment, to match the validation
genotypes.
Could that explain what you're seeing.
Best,
Bjarni
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#57 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABZncjk90PeYxdUr4jQ_byLPHzMyPBOEks5vTMpKgaJpZM4bZP_S>
.
--
Best regards
Minxian Wang
---------------------------------
Research Fellow
Beth Israel Deaconess Medical Center
Harvard Medical School
99 Brookline Ave
Boston, MA 02215
|
Hi, I went through the coordination code, and refactored it, and made minor changes that I believe should improve performance.. I didn’t find the bug yet though? Best, |
Hi Bjarni, This is actually a real bug, It happened when the input GWAS summary has extreme P values (0 or 1). If this condition happened, the program will ignore add nucleotides and p-value derived beta to the system, which makes the system shifting the alleles and betas to its next record. This code makes this bug is in the file of "sum_stats_parsers.py":
If I move this section to the very begining (line 166), if we have extreme p value, just totally ignore loading any info. about this record, which fixed this problem. [Now you are loading half info., which messed up.]
Sorry, this code is based on the version of Feb272019. It's not the one you revised recently. I changed the code structure a little bit (add one more step for only compute LD score), to make it can be run in parallel chr by chr. Here! I am glad to work together with you to merge this to your repo to make LDpred can be run in parallel chr by chr if you are also happy with it. I think this is a great feature people like it, as it can save lots of memory needed, and also the waiting time. Best regards |
Thanks a lot Wallace! That's really helpful. I really only went carefully through the coordination code, and not the sum stats parsing code. But in hindsight, it makes lots of sense that the bug is in the sum stats parsing code, because I recently changed it a lot. I'll commit a fix today (or tomorrow). Regarding parallelising for multiple chromosomes, it's a good idea. I would like to add that feature, but as you know everything takes time. If you submit a merge request (that is not too difficult to merge), then that would be tremendous help. Otherwise, I'll probably get to that in the coming weeks. Finally, Wallace, it's great to work together on this, and thanks a lot for these contributions, I really appreciate them! Best, |
Hi Bjarni, Please check Here is my version of code running in parallel. Base on your code at Feb272019, with a few bugs I found fixed. You may have a try to merge to your repo, here a guide Merging 2 Different Git Repositories Without Losing your History Another bug I found yesteray for loading GWAS summary with user specified snp level sample size, Best regards |
I believe I just fixed this issue now. |
Hi Bjarni,
There seems nucleotides parsing error in the latest version(1.0) of LDpred, the exact same inputs, in the previous version(python2 version), there's no nucleotides non mathing error. However, in python3 version about 1/4 of nucleotides non mathing error. Then I checked the h5 coodfile, I found the recode is wrong in this file.
For example:
I have a recode like below in my GWAS summary:
The A/C alleles were coded as T/C in the cood file.
And there are many this type of errors.
Please have a check!
Best regards
Wallace
The text was updated successfully, but these errors were encountered: