Adding tests and data, new vignette, biocparallel, and more to remove BiocCheck errors/warnings#23
Conversation
eboyer221
left a comment
There was a problem hiding this comment.
I ran into two reproducibility issues to fix before merging (see comments on core_ml.R and run_ML.R). The set.seed() to withr::with_seed() swap means the seed now only covers the data split instead of the whole pipeline, so downstream steps no longer inherit it. I tested with a small toy dataset and the same seed on both branches and this one picks a different best model than main.
Everything else (BiocCheck cleanup, runnable examples, new vignette, biocViews) looks good.
eboyer221
left a comment
There was a problem hiding this comment.
Just tested with new changes and the results match main exactly now (fit_penalty 0.1, fit_mixture 1, nmcc 0.85, all the same on both branches). local_seed is a lot cleaner too. Approving on my end.
Summary
A pile of changes to clear
R CMD check,BiocCheck, and a lot of Bioconductor submission rules. The most substantial additions are the new vignette and tests for the exported functions.Code changes
**Parallelism: ** R/run_ML.R
runMLmodelsandrunMDRmodelsnow use bpapply and SnowParam. This is in line with what was done in amRdata. Droppedfutureandfuture.applyfromImports:.No
set.seed()inside package code.splitMLInputTibble(R/core_ml.R) andshuffleLabels(R/prep_ml.R) used to callset.seed()directly, which Bioconductor does not allow. Replaced withwithr::with_seedwrappers. This scopes the seed change and is bioconductor friendly while also allowing us to keep determinism for testing. AddedwithrtoImports:.Removed
<<-from.parquet2LOOMatrix. The leave-one-out function in R/generate_matrices_ml.R accumulated output filenames viawalk()+<<-(global assignment into the enclosing scope). Replaced withmap()returning the filenames +compact()/unlist()to collect them. The code gives the same outputs.is()for class checks. Bioc disallowsclass(x)[1] == "foo"patterns. Switched R/arg_check_ml.R and R/core_ml.R tomethods::is(x, "ClassName")and addedmethodstoImports:.Documentation
New vignette. vignettes/intro.Rmd replaces the previous vignette as it was too out of date (was calling functions that don't exist in the repo for example). Uses
BiocStyle::html_document, and the bundledSfl_parquet.duckdbfixtureRunnable examples on >= 80% of exported functions.
biocViewscleaned up. The original terms (ML,AMR,Pathogen,MicrobialGenomics) were mostly invalid. I replaced with valid ones (Software,Classification,Regression,StatisticalMethod,FeatureExtraction,MultipleComparison,FunctionalGenomics,Genetics,Visualization). Other terms can be searched for here https://bioconductor.org/packages/release/BiocViews.htmlDemo data for examples
New
demo_ml_tibbleanddemo_fitindata/. I created some small fixtures which were subsetted from the Sfl data.(60 rows × 80 features, plus a tuned LR fit on them). This data is used for some examples to run them quickly. For examplegenerateMLInputs()+tuneGrid()chain now load these viadata(demo_ml_tibble)/data(demo_fit).inst/scripts/make_demo_data.R generated this data
R/data.R documents both objects
How to test
Running BiocCheck should give 1 remaining error and 14 notes. The notes will be dealt with in another commit. The remaining error must be resolved by the maintainer. It involves setting this package to be watched on their account. ERROR: Add package to Watched Tags in your Support Site profile; visit
https://support.bioconductor.org/accounts/edit/profile
Future work
Most notes are solvable, miscellaneous changes. I will split that into a separate PR though as this is getting long and there is plenty to review here.
One important thing for Bioconductor is that currently the documentation stats its possible to run RF/BT, but this isn't actually implemented. We need to decide if these features need to be implemented before Bioconductor submission, and if not this documentation will need to be updated
A BiocCheck CI will be added but because of the error involving the maintainer, it would for sure just fail right now.
The vignette will be updated to either load from experimenthubs and/or data files rather than
inst/extdata.Checklist
Please ensure that all boxes are checked before indicating that this pull request is ready for review.