Use 2015 data & remove holdout set #5

Merged 23 commits on Dec 7, 2022

roshankern

This PR is ready for review! It incorporates a newer version of mitocheck_data, which downloads the 2015 MitoCheck dataset and merges it with the older dataset. The pipeline is then rerun with this expanded dataset.
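As a rough illustration of the merge step described above, here is a minimal sketch that stacks two labeled-cell tables with pandas. The column names (`Cell_ID`, `Phenotype`, `Dataset_Source`), the example labels, and the `combine_datasets` helper are hypothetical and are not taken from mitocheck_data; the actual merge logic lives in that repository.

```python
import pandas as pd


def combine_datasets(original: pd.DataFrame, newer: pd.DataFrame) -> pd.DataFrame:
    """Stack two labeled-cell tables and record which release each row came from."""
    # Hypothetical provenance column so rows can be traced back to a release.
    original = original.assign(Dataset_Source="original")
    newer = newer.assign(Dataset_Source="2015")
    combined = pd.concat([original, newer], ignore_index=True)
    # Assumption: a cell ID present in both releases is the same cell,
    # so only its first occurrence is kept.
    deduplicated = combined.drop_duplicates(subset=["Cell_ID"], keep="first")
    return deduplicated.reset_index(drop=True)


# Toy tables standing in for the two releases (values are made up).
original = pd.DataFrame(
    {"Cell_ID": ["a", "b"], "Phenotype": ["Interphase", "Prometaphase"]}
)
newer = pd.DataFrame(
    {"Cell_ID": ["b", "c"], "Phenotype": ["Prometaphase", "Apoptosis"]}
)

combined = combine_datasets(original, newer)
print(combined)
```

Keeping a provenance column makes it possible to check later whether model performance differs between the two releases.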

The "holdout" dataset is also removed, leaving only the training and testing data subsets. Our logic here is that applying the final phenotypic profiling model to other datasets (e.g., Cell Health) will validate the model in the same way a holdout dataset would.
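With the holdout subset gone, every labeled cell falls into either the training or the testing subset. The sketch below shows one way such a two-way split could look, using only the standard library. The PR text does not state the split ratio or whether the split is stratified by phenotype, so the `test_fraction` value, the stratification, and the `stratified_split` helper are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.15, seed=0):
    """Return (train_indices, test_indices), sampling the test set per class."""
    # Group row indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # At least one test example per class; a class with a single
        # member therefore ends up entirely in the test subset.
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy labels standing in for phenotype classes (values are made up).
labels = ["Interphase"] * 20 + ["Apoptosis"] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
print(len(train_idx), len(test_idx))
```

No third subset is set aside here, which mirrors the reasoning above: external datasets take over the role a holdout set would otherwise play.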

@d33bs d33bs left a comment

Nice job! I left a few comments and suggestions with this review. Please don't hesitate to let me know if you have any questions or if I can clarify anything.

I also wanted to follow up with a general question:

  • I noticed that notebook files 2.train_model/train_model.ipynb and 4.interpret_model/interpret_model.ipynb did not have nbconverted .py pair files. Should those be included with this PR?

0.download_data/README.md
0.download_data/scripts/nbconverted/download_data.py Outdated
0.download_data/scripts/nbconverted/download_data.py Outdated
1.split_data/README.md
1.split_data/README.md Outdated
3.evaluate_model/README.md Outdated
utils/download_utils.py
utils/download_utils.py Outdated
utils/download_utils.py Outdated
utils/download_utils.py Outdated
@roshankern

Thank you for the review @d33bs!

Notebook files 2.train_model/train_model.ipynb and 4.interpret_model/interpret_model.ipynb do not need .py files in this PR: their Python files had no changes to track (the Jupyter notebooks were only rerun).
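For future reviews, a small check like the one below could flag notebooks that have no converted .py pair at all, as opposed to pairs that simply did not change. It is only a sketch: the `notebooks_missing_py_pair` helper is hypothetical, and it assumes every module keeps its converted scripts under `scripts/nbconverted/`, a layout this PR shows only for 0.download_data.

```python
import tempfile
from pathlib import Path


def notebooks_missing_py_pair(module_dir, script_subdir="scripts/nbconverted"):
    """List notebooks in module_dir with no same-named .py file in script_subdir."""
    module_dir = Path(module_dir)
    script_dir = module_dir / script_subdir
    missing = []
    for notebook in sorted(module_dir.glob("*.ipynb")):
        # A pair is a .py file sharing the notebook's stem, e.g.
        # download_data.ipynb <-> scripts/nbconverted/download_data.py
        if not (script_dir / f"{notebook.stem}.py").exists():
            missing.append(notebook.name)
    return missing


# Demo on a throwaway directory: one paired notebook, one unpaired.
with tempfile.TemporaryDirectory() as tmp:
    module = Path(tmp) / "0.download_data"
    (module / "scripts" / "nbconverted").mkdir(parents=True)
    (module / "download_data.ipynb").touch()
    (module / "scripts" / "nbconverted" / "download_data.py").touch()
    (module / "unpaired_example.ipynb").touch()
    missing = notebooks_missing_py_pair(module)

print(missing)
```

A notebook that was only rerun would still pass this check, since its existing .py pair stays in the repository even when the diff does not touch it.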

@roshankern roshankern merged commit 44e2741 into WayScience:main Dec 7, 2022
@roshankern roshankern deleted the use-2015-data branch December 7, 2022 22:33
@roshankern

Accidentally merged this PR without approval, but @d33bs and I discussed that everything was good to merge.

roshankern added a commit that referenced this pull request Dec 12, 2022
* Use 2015 data & remove holdout set (#5)

* finish download module changes

* download notebook

* rerun split data module

* rerun download module

* rerun train_model

* rerun evaluation module

* rerun interpretation module

* combine datasets

* combine datasets

* split changes

* update format

* format update

* format

* finish split data

* combine datasets, remove holdout

* formatting

* rerun pipelines

* remove folded class

* rerun pipeline

* Update utils/download_utils.py

Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* PR fixes

* module docstrings

Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* add PR curves

* get PR curves/data

* update docs, recreate py files

* greg recommendations

Co-authored-by: Dave Bunten <ekgto445@gmail.com>
roshankern added a commit that referenced this pull request Jan 4, 2023
* Use 2015 data & remove holdout set (#5)

* validation module

* correlation matrix

* reviz

* reformat

* pearson correlation

* spearman correlation

* documentation

* documentation

* docs

* docs, rerun

* docs

* docs

* Update 5.validate_model/validate_model.sh

Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* Update utils/validate_utils.py

Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* raw link clarification

* Update utils/validate_utils.py

Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* Update utils/validate_utils.py

Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* Update utils/validate_utils.py

Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* conditional to remove x, y columns

* clarify perturbation rename

* black formatting

Co-authored-by: Dave Bunten <ekgto445@gmail.com>
roshankern added a commit that referenced this pull request Feb 6, 2023
* Use 2015 data & remove holdout set (#5)

* move class PR curves

* use typing tuple return hint

* use tuple

* confusion matrix evaluation

* rename cm files

* update documentation

* code documentation

* get model scores

* undo last commit

* update documentation

* use correct env

* dave suggestions

---------

Co-authored-by: Dave Bunten <ekgto445@gmail.com>
roshankern added a commit that referenced this pull request Feb 7, 2023
* Use 2015 data & remove holdout set (#5)

* add score util

* add/run notebook

* update documentation/formatting

* update documentation

* black formatting

* rename function

* compile tidy data

* update documentation, dave suggestions

---------

Co-authored-by: Dave Bunten <ekgto445@gmail.com>
roshankern added a commit that referenced this pull request Feb 10, 2023
* Use 2015 data & remove holdout set (#5)

* create single cell images module

* rename_module

* finish module

* remove sample images from PR

* Co-authored-by: Jenna Tomkinson <jenna.tomkinson@ucdenver.edu>

* documentation

* documentation

* dave suggestions

* Update utils/single_cell_utils.py

Co-authored-by: Dave Bunten <ekgto445@gmail.com>

---------

Co-authored-by: Dave Bunten <ekgto445@gmail.com>
roshankern added a commit that referenced this pull request Feb 10, 2023
* Use 2015 data & remove holdout set (#5)

* upload files

---------

Co-authored-by: Dave Bunten <ekgto445@gmail.com>
roshankern added a commit that referenced this pull request Feb 13, 2023
* Use 2015 data & remove holdout set (#5)

* save interpretations

* docs, recreate py file

* fix typo

* PR suggestions

---------

Co-authored-by: Dave Bunten <ekgto445@gmail.com>
roshankern added a commit that referenced this pull request Feb 14, 2023
* Use 2015 data & remove holdout set (#5)

* get predictions

* delete unused file, compiled predictions

* rerun evaluate module

* docs

* dave suggestions

---------

Co-authored-by: Dave Bunten <ekgto445@gmail.com>
roshankern added a commit that referenced this pull request Feb 17, 2023
* Use 2015 data & remove holdout set (#5)

* restructure PR curves notebook

* dave suggestions

---------

Co-authored-by: Dave Bunten <ekgto445@gmail.com>
2 participants