Skip to content

LDpred2.R uses external ld matrix (partial fix for #108)#109

Merged
ofrei merged 15 commits intomainfrom
of_ldpred2
Dec 12, 2022
Merged

LDpred2.R uses external ld matrix (partial fix for #108)#109
ofrei merged 15 commits intomainfrom
of_ldpred2

Conversation

@ofrei
Copy link
Contributor

@ofrei ofrei commented Dec 9, 2022

Fixes first half of #108

Changes proposed in this pull request:

  • remove LD matrix computation from LDpred2.R script
  • instead, load reference data submitted to https://github.com/comorment/ldpred2_ref repository. For now the path to the reference was hard-coded (fell free to fix, or I can fix later)
  • allow to read subset of chromosomes using --chr2use flag
  • overlap the set of SNPs from (1) sumstats file (2) genotype (3) LD reference. Use snp_match function to perform this logic. Prefer merging on CHR:BP:A1:A2, give user a new --merge-by-rsid option to perform merging on rs# (SNP marker name). Test snp_match with a new unit-test, tests/extras/bigsnpr.R (the test passes)
  • implement new dataset (see reference/examples/ldpred2 folder), which allows to test how well LDpred2 matches SNPs from sumstats, genotypes and LD reference. The tool also runs very fast on that dataset because it only includes chr21 and chr22.
  • remove run_ldpred2.sh; usage example integrated into README.md

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

@ofrei
Copy link
Contributor Author

ofrei commented Dec 9, 2022

I'll fix Flake8 linter errors later today. The LD reference files are stil being uploaded to https://github.com/comorment/ldpred2_ref . Apart from this this PR is ready for review.

@ofrei
Copy link
Contributor Author

ofrei commented Dec 12, 2022

@deepchocolate @espenhgn I pushed few more changes, that's would be all I wanted to implement for ldpred2.R. Is it fine to merge? I'd like to ask our colleagues at CoMorMent to have a look before the meeting tomorrow.

@deepchocolate what do you think about adding --geno-file, --sumstats and --out as named arguments, rather than using them as positional arguments? I generally prefer named arguments over positional arguments, also for those arguments that are required. To me this is more readable and intuitive as one doesn't need to know that order in which positional arguments must be provided. Also it's fine to reorder without breaking backwards compatibility. Finally, with --chr2use argument with narg=Inf the behaviour is non-intuitive if --chr2use is the last argument before positional arguments, e.g. ... --chr2use 21 22 $genoFile $sumstatsFile $outFile. In this situation R parser doesn't have a way to know that there are only 2 arguments for --chr2use, the other three are positional. The command fails but that's really hard for the user to investigate.

@deepchocolate
Copy link
Contributor

@deepchocolate @espenhgn I pushed few more changes, that's would be all I wanted to implement for ldpred2.R. Is it fine to merge? I'd like to ask our colleagues at CoMorMent to have a look before the meeting tomorrow.

I'm looking at it now, I'll approve today, just checking if I need to do some changes somewhere. If there are any I'll put them in a separate branch

@deepchocolate what do you think about adding --geno-file, --sumstats and --out as named arguments, rather than using them as positional arguments? I generally prefer named arguments over positional arguments, also for those arguments that are required. To me this is more readable and intuitive as one doesn't need to know that order in which positional arguments must be provided. Also it's fine to reorder without breaking backwards compatibility. Finally, with --chr2use argument with narg=Inf the behaviour is non-intuitive if --chr2use is the last argument before positional arguments, e.g. ... --chr2use 21 22 $genoFile $sumstatsFile $outFile. In this situation R parser doesn't have a way to know that there are only 2 arguments for --chr2use, the other three are positional. The command fails but that's really hard for the user to investigate.

That's fine by me, good point! Also good to harmonize arguments to gwas.py, didn't think about that as I usually try to prefix the arguments after what they specify (--file-...).

@ofrei
Copy link
Contributor Author

ofrei commented Dec 12, 2022

@deepchocolate I agree about --file-, this sounds good idea to me, and currently this is also inconsistent in gwas.py (e.g. --sumstats also points to the file same as --geno-file, but for no good reason only one of them has -file). I suggest for now we keep things consistent with gwas.py, discuss the ideal names when there is more time, and change naming for in a later release.

@ofrei
Copy link
Contributor Author

ofrei commented Dec 12, 2022

I rebuilt r.sif container including stringr library, and fix other thing we discussed with @espenhgn
Also there is now a note on handling missing genotypes: https://github.com/comorment/containers/tree/of_ldpred2/usecases/LDpred2#handling-missing-genotypes
OK to merge?

Copy link
Contributor

@deepchocolate deepchocolate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@ofrei ofrei merged commit 0294899 into main Dec 12, 2022
@ofrei ofrei deleted the of_ldpred2 branch December 12, 2022 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants