Support holdout data splits to be used in validation #75

Merged
merged 5 commits into from May 11, 2021

micahjsmith
Contributor

You can now use the validation.split setting to specify which data split is used in validation (i.e. test or holdout). The setting's only effect is that, during validation, data is loaded using load_data(split=split). (Note that the pipeline and encoder are still fit on the training data.) If your load_data function does not support loading a split by name, ballet falls back to the training split.
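
The fallback described above can be sketched roughly as follows. This is an illustration only, not ballet's actual implementation: the PR text does not say how ballet detects that load_data lacks split support, so the signature inspection here is an assumption, and the helper names `supports_split` and `load_validation_data` are hypothetical.

```python
import inspect
from typing import Any, Callable, Optional, Tuple


def supports_split(load_data: Callable[..., Any]) -> bool:
    """Return True if load_data can be called with a `split` keyword.

    Assumption: support is detected by inspecting the signature.
    """
    params = inspect.signature(load_data).parameters
    accepts_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    return "split" in params or accepts_kwargs


def load_validation_data(
    load_data: Callable[..., Tuple[Any, Any]],
    split: Optional[str] = None,
) -> Tuple[Any, Any]:
    """Load the requested split, falling back to the training split."""
    if split is not None and supports_split(load_data):
        return load_data(split=split)
    # No split requested, or load_data cannot load a split by name:
    # use whatever load_data returns by default (the training data).
    return load_data()
```

With a project-level load_data that accepts `split`, `load_validation_data(load_data, "holdout")` would return the holdout data; with one that takes no such argument, the same call would return the training data.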

@codecov-commenter

codecov-commenter commented May 11, 2021

Codecov Report

Merging #75 (541bae1) into master (5e419d4) will decrease coverage by 4.71%.
The diff coverage is 85.00%.


@@            Coverage Diff             @@
##           master      #75      +/-   ##
==========================================
- Coverage   84.38%   79.66%   -4.72%     
==========================================
  Files          47       54       +7     
  Lines        2753     2946     +193     
  Branches      281      282       +1     
==========================================
+ Hits         2323     2347      +24     
- Misses        341      506     +165     
- Partials       89       93       +4     
| Impacted Files | Coverage Δ | |
|---|---|---|
| `ballet/client.py` | 45.83% <ø> (ø) | |
| `ballet/eng/misc.py` | 80.51% <ø> (ø) | |
| `ballet/validation/feature_acceptance/__init__.py` | 33.33% <0.00%> (ø) | |
| `ballet/validation/main.py` | 83.96% <84.00%> (-1.44%) | ⬇️ |
| `ballet/__init__.py` | 100.00% <100.00%> (ø) | |
| `ballet/validation/base.py` | 90.69% <100.00%> (+0.22%) | ⬆️ |
| `ballet/validation/feature_acceptance/validator.py` | 96.55% <100.00%> (ø) | |
| `ballet/validation/feature_pruning/validator.py` | 97.67% <100.00%> (ø) | |
| `ballet/validation/gfssf.py` | 93.84% <100.00%> (ø) | |
| `ballet/pipeline.py` | 83.33% <0.00%> (-12.50%) | ⬇️ |
| ... and 8 more | | |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f72a053...541bae1.

@micahjsmith micahjsmith merged commit e76e48d into master May 11, 2021
@micahjsmith micahjsmith deleted the validation-data-splits branch May 11, 2021 15:21