Added new CLOS train test split tutorial notebook #1071

Merged
merged 51 commits into master on Jul 8, 2024

@mturk24 mturk24 commented Mar 28, 2024

Summary

Added a new tutorial that shows how to improve ML performance using train-test splits on your data with CLOS.

An issue currently prevents me from fully building the docs, so I can't yet confirm whether the new tutorial builds successfully or how long the build takes.

Also modified the index files needed to include this tutorial in the main sidebar of the CLOS tutorials. It replaces the tabular Datalab tutorial.

Latest update: The bug in the tutorial has been fixed and the index files have been updated accordingly. The latest commits contain fixes and improvements to the tutorial, and the data in S3 has been updated.

codecov bot commented Mar 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.96%. Comparing base (e0b7615) to head (8860500).
Report is 65 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1071      +/-   ##
==========================================
- Coverage   96.20%   95.96%   -0.24%     
==========================================
  Files          76       82       +6     
  Lines        6005     6097      +92     
  Branches     1070     1069       -1     
==========================================
+ Hits         5777     5851      +74     
- Misses        135      147      +12     
- Partials       93       99       +6     
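
The percentages in the report above appear consistent with coverage computed as hits divided by total lines, where total lines are hits plus misses plus partials. The sketch below is only a sanity check under that assumption; the helper name `coverage_pct` is hypothetical and is not part of Codecov. The computed values agree with the reported 96.20% and 95.96% to within display precision.

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Coverage as a percentage, assuming total lines = hits + misses + partials."""
    total = hits + misses + partials
    return 100.0 * hits / total


# Figures copied from the coverage diff above.
base = coverage_pct(hits=5777, misses=135, partials=93)   # master, 6005 lines
head = coverage_pct(hits=5851, misses=147, partials=99)   # this PR, 6097 lines

print(f"base={base:.3f}%  head={head:.3f}%  change={head - base:+.3f}%")
```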


… iid issues and filtered training data based on exact duplicates between training and test sets
…revious version following the model eval on clean training + test data. Fixed section on using Datalab on training data to clean the data
…up notebook and added more on hyperparameter optimization section. This section still needs to be improved.
… and cleaned up some of the code, put data used into s3 bucket
…ar before DCAI workflow tutorial, and renamed it to improving_ml_performance, also removed datalab tabular tutorial since this tutorial is replacing that one
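
The first commit message above mentions filtering training data based on exact duplicates between the training and test sets. The sketch below illustrates one way that idea could look in plain Python. It is not the tutorial's actual code: the function name, the row format (dicts of hashable feature values), and the sample data are all assumptions made for illustration.

```python
def drop_train_rows_seen_in_test(train_rows, test_rows):
    """Return the training rows that do not exactly match any test row.

    Rows are dicts mapping feature name -> hashable value (an assumption).
    """
    # Build a hashable key per row so membership checks are O(1).
    test_keys = {tuple(sorted(row.items())) for row in test_rows}
    return [
        row for row in train_rows
        if tuple(sorted(row.items())) not in test_keys
    ]


# Hypothetical sample data: the second training row also appears in the test set.
train_rows = [
    {"age": 30, "grade": "A"},
    {"age": 41, "grade": "B"},
    {"age": 30, "grade": "C"},
]
test_rows = [{"age": 41, "grade": "B"}]

clean_train_rows = drop_train_rows_seen_in_test(train_rows, test_rows)
print(clean_train_rows)
```

Only the training side is filtered here, so the test set stays untouched and evaluation is not affected by rows the model has already seen.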
@mturk24 mturk24 requested review from jwmueller and elisno April 4, 2024 01:52
@mturk24 mturk24 changed the title from "Added WIP new CLOS train test split tutorial notebook" to "Added new CLOS train test split tutorial notebook" Apr 4, 2024
@mturk24 mturk24 requested a review from sanjanag April 4, 2024 01:56
mturk24 commented Apr 4, 2024

Also adding @sanjanag as a reviewer, since she was very helpful and involved in this.

@jwmueller jwmueller removed the request for review from elisno April 4, 2024 05:09
mturk24 commented Apr 4, 2024

I found a workaround for this issue using this approach, so I was able to build the docs successfully. I'm not sure what the expected build time is, but I'm going to compare build times with and without the new notebook more thoroughly.

@jwmueller
Member

Can you resolve the merge conflicts? Thanks!

@jwmueller jwmueller self-requested a review July 7, 2024 07:28
@jwmueller jwmueller left a comment

Pushed many edits. Please go over the whole thing carefully, then clear all the output cells and verify that the docs build correctly for you. After that, make the final PR commit with all cells cleared and merge.
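
For the step of clearing all output cells, a minimal sketch using only the standard library is shown below. It assumes the notebook uses the nbformat v4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`). The function name `clear_notebook_outputs` is hypothetical, and this is not necessarily how the maintainers clear outputs. The `jupyter nbconvert --clear-output --inplace` command is a common alternative.

```python
import json
from pathlib import Path


def clear_notebook_outputs(path):
    """Remove outputs and execution counts from every code cell, in place."""
    notebook_path = Path(path)
    notebook = json.loads(notebook_path.read_text(encoding="utf-8"))

    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []             # drop stored results and plots
            cell["execution_count"] = None   # reset the In[ ] counter

    notebook_path.write_text(
        json.dumps(notebook, indent=1, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
```

Calling `clear_notebook_outputs("path/to/tutorial.ipynb")` rewrites that file in place, where the path is a placeholder for the notebook being committed.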

@mturk24 mturk24 merged commit 38183e3 into master Jul 8, 2024
23 of 39 checks passed
jwmueller added a commit that referenced this pull request Jul 9, 2024

4 participants