Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LDpred2 imputation #131

Merged
merged 17 commits into from
Mar 6, 2023
Merged

LDpred2 imputation #131

merged 17 commits into from
Mar 6, 2023

Conversation

deepchocolate
Copy link
Contributor

@deepchocolate deepchocolate commented Feb 16, 2023

Fixes one part of #130. Closing #138

Changes proposed in this pull request:

  • Added an imputation script usecases/LDpred2/imputeGenotypes.R. Using bigsnpr::fastImputeSimple one can choose one of these methods, but I've also added zero "imputation" as described in the docs for this function.
  • Changed the allele flipping in ldpred2.R and calculateLD.R as described in LDpred2 issue with MoBa data(?) #124
  • Removed --geno-imputefrom ldpred2.R. --geno-impute-zero can be passed instead.
  • Moved functionality into a separate script usecases/LDpred2/fun.R to enable reuse and improved testing LDpred2_example/unittests/fun.R.

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

@deepchocolate
Copy link
Contributor Author

@ofrei @espenhgn I tried adding fastImpute (in addition to fastImputeSimple), but this depends on xgboost (there's an R-package for it). However, I only get errors when trying to use it. Shall I stick with the simple?

@espenhgn
Copy link
Contributor

@ofrei @espenhgn I tried adding fastImpute (in addition to fastImputeSimple), but this depends on xgboost (there's an R-package for it). However, I only get errors when trying to use it. Shall I stick with the simple?

I'm sure we can include xgboost in the r.sif build if this fixes the issue. Perhaps a specific version of this package is required for fastImpute?

@deepchocolate
Copy link
Contributor Author

@ofrei @espenhgn I tried adding fastImpute (in addition to fastImputeSimple), but this depends on xgboost (there's an R-package for it). However, I only get errors when trying to use it. Shall I stick with the simple?

I'm sure we can include xgboost in the r.sif build if this fixes the issue. Perhaps a specific version of this package is required for fastImpute?

I'll stick with fastImputeSimple for now as there are some other isues with the scripts, that needs a fix.

@deepchocolate deepchocolate linked an issue Feb 24, 2023 that may be closed by this pull request
@deepchocolate deepchocolate marked this pull request as ready for review February 24, 2023 12:23
@deepchocolate deepchocolate changed the title Deepchocolate/ldpred2 imputation LDpred2 imputation Feb 24, 2023
Copy link
Contributor

@espenhgn espenhgn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! I encountered some minor issues which were easily resolved (locally), as commented. You can merge #139 into this PR. Other than that:

  • in the LDpred2/README.md height example, add the line $RSCRIPT imputeGenotypes.R --impute-simple mean0 --geno-file-rds $fileGenoRDS after creating $fileGenoRDS itself. Otherwise it won't run. Either this, or add the --geno-impute-zero argument to $RSCRIPT ldpred2.R ... calls below
  • It was my bad suggestion to call the test directory for LDpred2_example. Perhaps this can by renamed LDpred2_test in this PR?

usecases/LDpred2_example/run.sh Outdated Show resolved Hide resolved
usecases/LDpred2_example/run.sh Outdated Show resolved Hide resolved
usecases/LDpred2_example/run.sh Outdated Show resolved Hide resolved
usecases/LDpred2/imputeGenotypes.R Outdated Show resolved Hide resolved
usecases/LDpred2/ldpred2.R Outdated Show resolved Hide resolved
usecases/LDpred2/README.md Outdated Show resolved Hide resolved
usecases/LDpred2_example/run.sh Outdated Show resolved Hide resolved
@@ -112,8 +146,9 @@ LDP="$RSCRIPT $DIR_SCRIPTS/ldpred2.R \
--ld-meta-file $DIR_TESTS/output/ld/map.rds \
--merge-by-rsid \
--col-stat beta --col-stat-se beta_se \
--col-snp-id rsid --col-chr chr --col-bp pos --col-A1 a0 --col-A2 a1 \
--col-snp-id rsid --col-chr chr --col-bp pos --col-A1 a1 --col-A2 a0 \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's a concern, but for this entire call a warning is thrown while merging dataframes, possibly on L220-222 in the LDpred2.R script:

Test no restrictions on snps (similar to tutorial)
Loading backingfile: /nrec/space/espenh/containers/usecases/LDpred2_example/data/public-data3.rds 

### Reading LD reference meta-file from  /nrec/space/espenh/containers/usecases/LDpred2_example/output/ld/map.rds 

### Reading summary statistics /nrec/space/espenh/containers/usecases/LDpred2_tutorial/tutorial_data/public-data3-sumstats.txt 
Loaded 50000 SNPs
Filtering SNPs based on --chr2use
Retained 50000 out of 50000 
Matching sumstats to genotypes
50,000 variants to be matched.
0 ambiguous SNPs have been removed.
45,337 variants have been matched; 22,758 were flipped and 15,092 were reversed.
Matching sumstats to LD reference
45,337 variants to be matched.
0 ambiguous SNPs have been removed.
45,337 variants have been matched; 0 were flipped and 0 were reversed.
Warning message:
In merge.data.table(as.data.table(sumstats4), as.data.table(info_snp),  :
  column names 'pos.ss' are duplicated in the result
...

usecases/LDpred2/README.md Outdated Show resolved Hide resolved
@deepchocolate
Copy link
Contributor Author

@espenhgn I think I've addressed all points. I renamed usecases/LDpred2_example to usecases/LDpred2_test as you suggested

Copy link
Contributor

@espenhgn espenhgn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a small unzip fix to /LDpred2_test/run.sh. Looks good to me!

@deepchocolate
Copy link
Contributor Author

Pushed a small unzip fix to /LDpred2_test/run.sh. Looks good to me!

Great, thanks!

@deepchocolate
Copy link
Contributor Author

@ofrei Are you ok with merging this?

@espenhgn espenhgn mentioned this pull request Mar 3, 2023
2 tasks
@deepchocolate
Copy link
Contributor Author

@espenhgn Shall I merge this or wait for an approval from @ofrei aswell?

@espenhgn
Copy link
Contributor

espenhgn commented Mar 6, 2023

@espenhgn Shall I merge this or wait for an approval from @ofrei aswell?

No need to wait for @ofrei. Just go ahead and squash-merge this one :)

@espenhgn
Copy link
Contributor

espenhgn commented Mar 6, 2023

Btw., it's fine if you merge this first. I expect a small conflict with #141 that I can fix before that one is merged.

@deepchocolate deepchocolate merged commit 6e91503 into main Mar 6, 2023
@deepchocolate deepchocolate deleted the deepchocolate/ldpred2_imputation branch March 6, 2023 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LDpred2: Character in chromosome column causes error
2 participants