Use 2015 data & remove holdout set #5

roshankern · 2022-12-06T18:50:43Z

This PR is ready for review! It incorporates a newer version of mitocheck_data, downloading the 2015 MitoCheck dataset and merging it with the older dataset. The pipeline is then rerun on this expanded dataset.
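As a rough illustration of the merge step described above, here is a minimal sketch that combines two sets of records into one. It is not the code from this PR: the function name `merge_datasets`, the `cell_id` key, the rule that newer rows replace older ones, and the example rows are all assumptions made for the sake of the example.

```python
from typing import Iterable


def merge_datasets(
    older: Iterable[dict], newer: Iterable[dict], key: str = "cell_id"
) -> list[dict]:
    """Combine two record collections, keeping one row per key.

    Assumption: rows from `newer` replace rows from `older` with the same key.
    """
    merged: dict = {}
    for row in older:
        merged[row[key]] = row
    for row in newer:
        merged[row[key]] = row  # overwrites any older row with this key
    return list(merged.values())


# Made-up rows, only to show the shape of the operation.
older_rows = [
    {"cell_id": "a1", "phenotype": "Interphase"},
    {"cell_id": "a2", "phenotype": "Prometaphase"},
]
rows_2015 = [
    {"cell_id": "a2", "phenotype": "Prometaphase"},
    {"cell_id": "b1", "phenotype": "Apoptosis"},
]

combined = merge_datasets(older_rows, rows_2015)
print(len(combined))
```

Whether the real datasets overlap, and how duplicates should be resolved, depends on the data; the replace-on-conflict rule here is only one possible choice.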

The "holdout" dataset is also removed, leaving only the training and testing data subsets. Our reasoning is that applying the final phenotypic profiling model to other datasets (e.g., Cell Health) will validate the model in the same way a holdout dataset would.
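For readers unfamiliar with the two-subset layout, the following is a minimal sketch of a train/test split with no separate holdout set. It is an assumption-laden illustration, not the split code from this PR: the function name `stratified_split`, the `test_fraction` and `seed` parameters, the choice to stratify by label, and the placeholder labels are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.15, seed=0):
    """Return (train_indices, test_indices), split separately within each label."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # At least one test sample per label (hypothetical policy).
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Placeholder labels: 20 samples of class "A" and 10 of class "B".
labels = ["A"] * 20 + ["B"] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
print(len(train_idx), len(test_idx))
```

Every sample lands in exactly one of the two subsets, so nothing is set aside; external validation on another dataset would then play the role the holdout set used to play.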

d33bs

Nice job! I left a few comments and suggestions with this review. Please don't hesitate to let me know if you have any questions or if I can clarify anything.

I also wanted to follow up with a general question:

I noticed that notebook files 2.train_model/train_model.ipynb and 4.interpret_model/interpret_model.ipynb did not have nbconverted .py pair files. Should those be included with this PR?

0.download_data/README.md

0.download_data/scripts/nbconverted/download_data.py

1.split_data/README.md

3.evaluate_model/README.md

utils/download_utils.py

roshankern · 2022-12-07T22:01:24Z

Thank you for the review @d33bs!

Notebook files 2.train_model/train_model.ipynb and 4.interpret_model/interpret_model.ipynb should not have .py files in this PR because their Python files had no changes to track (the Jupyter notebooks were just rerun).
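A reviewer who wants to confirm that every notebook has a converted script on disk could use a check along these lines. This is a minimal sketch, not part of the repository: the function name `notebooks_missing_scripts` is hypothetical, and the layout `<module>/scripts/nbconverted/<name>.py` is assumed from the file paths listed in the review. The demo runs against a throwaway directory with made-up module names, not the real repository.

```python
import tempfile
from pathlib import Path


def notebooks_missing_scripts(repo_root: Path) -> list[Path]:
    """List notebooks with no script at <module>/scripts/nbconverted/<name>.py."""
    missing = []
    for notebook in sorted(repo_root.rglob("*.ipynb")):
        if ".ipynb_checkpoints" in notebook.parts:
            continue  # skip Jupyter autosave copies
        script = notebook.parent / "scripts" / "nbconverted" / f"{notebook.stem}.py"
        if not script.exists():
            missing.append(notebook.relative_to(repo_root))
    return missing


# Demo on a temporary layout (module names are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    converted_dir = root / "module_a" / "scripts" / "nbconverted"
    converted_dir.mkdir(parents=True)
    (root / "module_a" / "analysis.ipynb").touch()
    (converted_dir / "analysis.py").touch()
    (root / "module_b").mkdir()
    (root / "module_b" / "train.ipynb").touch()
    missing = notebooks_missing_scripts(root)

print([path.as_posix() for path in missing])
```

Note that this only checks for presence. A script that exists but is unchanged, as described in the reply above, would correctly be absent from the PR diff while still passing this check.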

roshankern · 2022-12-07T22:55:42Z

I accidentally merged this PR without approval, but @d33bs and I discussed it and agreed that everything was good to merge.

Commit messages that include this PR's commit list. Each message repeats the following shared list and then adds its own entries.

Shared list:

* Use 2015 data & remove holdout set (#5)
* finish download module changes
* download notebook
* rerun split data module
* rerun download module
* rerun train_model
* rerun evaluation module
* rerun interpretation module
* combine datasets
* combine datasets
* split changes
* update format
* format update
* format
* finish split data
* combine datasets, remove holdout
* formatting
* rerun pipelines
* remove folded class
* rerun pipeline
* Update utils/download_utils.py Co-authored-by: Dave Bunten <ekgto445@gmail.com>
* PR fixes
* module docstrings Co-authored-by: Dave Bunten <ekgto445@gmail.com>

Additional entries, one group per commit message:

* add PR curves
* get PR curves/data
* update docs, recreate py files
* greg recommendations
Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* validation module
* correlation matrix
* reviz
* reformat
* pearson correlation
* spearman correlation
* documentation
* documentation
* docs
* docs, rerun
* docs
* docs
* Update 5.validate_model/validate_model.sh Co-authored-by: Dave Bunten <ekgto445@gmail.com>
* Update utils/validate_utils.py Co-authored-by: Dave Bunten <ekgto445@gmail.com>
* raw link clarification
* Update utils/validate_utils.py Co-authored-by: Dave Bunten <ekgto445@gmail.com>
* Update utils/validate_utils.py Co-authored-by: Dave Bunten <ekgto445@gmail.com>
* Update utils/validate_utils.py Co-authored-by: Dave Bunten <ekgto445@gmail.com>
* conditional to remove x, y columns
* clarify perturbation rename
* black formatting
Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* move class PR curves
* use typing tuple return hint
* use tuple
* confusion matrix evaluation
* rename cm files
* update documentation
* code documentation
* get model scores
* undo last commit
* update documentation
* use correct env
* dave suggestions
Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* add score util
* add/run notebook
* update documentation/formatting
* update documentation
* black formatting
* rename function
* compile tidy data
* update documentation, dave suggestions
Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* create single cell images module
* rename_module
* finish module
* remove sample images from PR
* Co-authored-by: Jenna Tomkinson <jenna.tomkinson@ucdenver.edu>
* documentation
* documentation
* dave suggestions
* Update utils/single_cell_utils.py Co-authored-by: Dave Bunten <ekgto445@gmail.com>
Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* upload files
Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* save interpretations
* docs, recreate py file
* fix typo
* PR suggestions
Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* get predictions
* delete unused file, compiled predictions
* rerun evaluate module
* docs
* dave suggestions
Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* restructure PR curves notebook
* dave suggestions
Co-authored-by: Dave Bunten <ekgto445@gmail.com>

roshankern added 19 commits November 2, 2022 15:34

* 864b030 finish download module changes
* 8164699 download notebook
* 30afbe6 rerun split data module
* 15ab014 rerun download module
* 8bf1b49 rerun train_model
* 346a6f5 rerun evaluation module
* 5100754 rerun interpretation module
* 9957907 combine datasets
* 7545b43 combine datasets
* ff9b450 split changes
* 9fce6c3 update format
* cd0cf84 format update
* b6532ef format
* 81d3f01 finish split data
* 186bf63 combine datasets, remove holdout
* 47038b1 formatting
* 2227eb3 rerun pipelines
* 2ba8a2a remove folded class
* f3641c5 rerun pipeline

d33bs reviewed Dec 7, 2022

roshankern and others added 4 commits December 7, 2022 16:42

* a84f4f9 Update utils/download_utils.py (Co-authored-by: Dave Bunten <ekgto445@gmail.com>)
* b0cb7a6 PR fixes
* ffbe170 Merge branch 'use-2015-data' of github.com:roshankern/phenotypic_profiling_model into use-2015-data
* 63dd268 module docstrings

roshankern merged commit 44e2741 into WayScience:main Dec 7, 2022

roshankern deleted the use-2015-data branch December 7, 2022 22:33
