0.3.5 #345

Merged
merged 23 commits into from
Jul 16, 2024


sigmafelix
Collaborator

  • Base learner targets (draft; should be changed in accordance with Big Data Considerations #325)
  • Download targets
  • Post-checkout hooks
  • Utility functions for the new targets

- attach_xy: Retaining all columns
- generate_cv_index: Retaining all columns
- Follows amadeus changes (process_blackmarble)
- New GPU-enabled R torch library binding
- Path setting now uses if-else
- Pipeline error mode set to "abridge"
- removing irrelevant arguments passed to terra::rast
- Internal functions are not exported
- attach_xy logic fix: join to the "grand" data rather than the lean one
- fit_base_brulee and fit_base_xgb: added device to manually distribute GPU workload
- Branching base learner fitting
- shared interface for branching cv set generation
- Pipeline base learner: 3 CV strategies and hyperparameter tuning targets were added
- fit_base_* functions gain `return_best` and `tune_bayes_iter` for `workflow` compatibility
- prepare_cvindex assigns fold ids using function names when spatialsample functions are used
- restore_rset_full: added as a speed-up and disk-saving measure
  - rset objects are generated based on essential coordinates only; this function restores full data for subsequent steps
- TODO: size issues persist in hyperparameter tuning and in identifying the best model, because the entire workflow has to be saved; rsample always saves training/test data.
- TODO: duplicates in features; identify where the duplicates come from
- Data size reduction for memory / storage management: added trim_resamples argument in fit_base_*
- make_subdata: bootstrapping (currently 30%)
- restore_fit_best: restore full data with CV rsample rset objects, extract the best tuned results, then fit the data
  - Handles nested lists in tibbles from the tidymodels workflow/hyperparameter tuning
  - TODO/Q: Do we save fitted model object or just keep predictions?
- set_args_download and feature_raw_download are written
- targets_download.R is revised to reflect the structure of the two functions
- Base learner: lightgbm
- Added dependencies: bonsai, lightgbm
- Explicit definition of hyperparameter search controls for each
- set_args_download update
- _targets.R update to generate arglist_download
- README.md update
- setup_hook.sh now applies the permission change immediately
- targets_download.R: duplicate target names
- Roxygen2 documentation typo fix in fit_base_lightgbm
@kyle-messier
Collaborator

@sigmafelix It looks like main is one commit ahead of pipeline-compact and needs to be merged into it first. It's probably a minor README change. After that, I'll review ASAP. Thanks! 🚀

@sigmafelix
Collaborator Author

@kyle-messier I merged main into pipeline-compact. Most functions are still in nocov status, but I will work on improving actual coverage soon.

@sigmafelix sigmafelix merged commit a5d13bb into main Jul 16, 2024
4 checks passed

2 participants