
Required Steps for Depositing Profiles #4

Closed
gwaybio opened this issue Mar 8, 2020 · 8 comments

@gwaybio (Member) commented Mar 8, 2020

I am working towards processing all Drug Repurposing data and adding the results in this repository. The cell health project (https://github.com/broadinstitute/cell-health) now requires that the data are uniformly processed, documented, and made available here.

I will outline below the necessary steps required to get the data and processing pipelines uploaded.

  1. Confirm that there are only small floating point differences between cytominer-derived and pycytominer-derived profiles.
  2. Implement broad_sample-specific annotations.
  3. Rerun the "all" profiles pipeline described in broadinstitute/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad#3 (currently a private repo).
    • This needs to be rerun with the updated robustize_mad normalization strategy, which will also require a decision between whole-plate and DMSO-specific normalization.
  4. Rerun the 4.apply module in cell-health.
    • Only after steps 1-3 are complete can I rerun the 4.apply module.
    • I will explore whether or not to make the lincs-cell-painting profile repository a submodule of the cell-health project.
@shntnu (Collaborator) commented Apr 3, 2020

> Implement broad_sample-specific annotations

@gwaygenomics Can you remind me what input you need for this? I'll use cytotools/annotate as a reference to provide inputs.

@gwaybio (Member, Author) commented Apr 3, 2020

> Can you remind me what input you need for this? I'll use cytotools/annotate as a reference to provide inputs.

Ah, that is a good reference, thanks for the pointer.

I wasn't sure about the cytominer strategy of splitting core functionality from cyto-specific functionality, so I put cytominer progress on hold. The primary reason for the pause was so that the LINCS data could be processed with a more stable (and thus more reproducible) tool.

However, it sounds like cytominer (and pycytominer) will stabilize on a longer timeframe than we need for the LINCS profiles. A potential intermediate solution could be to freeze a pycytominer version using conda (after confirming the floating point differences) for LINCS-specific processing. What do you think?
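Freezing the tool could look roughly like the following conda environment file. This is a sketch: the environment name, Python version, and commit placeholder are all assumptions, not a real pin.

```yaml
# environment.yml -- sketch of pinning pycytominer to an exact commit
# (<commit-hash> below is a placeholder, not a real pycytominer commit)
name: lincs-profiling
channels:
  - conda-forge
dependencies:
  - python=3.7
  - pandas
  - pip
  - pip:
      # pinning to a specific commit makes reprocessing reproducible
      - git+https://github.com/cytomining/pycytominer@<commit-hash>
```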

@shntnu (Collaborator) commented Apr 3, 2020

> Rerun the "all" profiles pipeline described in broadinstitute/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad#3 (currently a private repo)
>
>   • This needs to be rerun with the updated robustize_mad normalization strategy, which will also require a decision on whole-plate or DMSO-specific normalization.

Going forward, we will very likely produce at least two different Level 4a profiles:

  • whole-well z-scored
  • DMSO z-scored

because, depending on the plate layout, one might be better than the other.

We will then produce corresponding 4b (normalized feature selected) versions of the two 4a profiles.

We will also produce corresponding 4w (normalized and whitened) versions of the two 4a profiles.

Which of these profiles is best for a given application is still an open research question; until then, we will simply produce them all.
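The two Level 4a variants amount to robust z-scoring against two different reference populations. A minimal sketch, assuming each plate is a DataFrame whose DMSO wells are labeled "DMSO" in a Metadata_broad_sample column (both names are assumptions, not the pipeline's API):

```python
# Sketch: MAD-robustized z-scores against two reference populations
# (whole plate vs. DMSO wells). Column names are assumptions.
import pandas as pd

def robustize_mad(plate_df, features, reference="all"):
    """Center by the reference median and scale by 1.4826 * MAD."""
    if reference == "dmso":
        ref = plate_df[plate_df["Metadata_broad_sample"] == "DMSO"]
    else:  # whole-plate ("whole-well") reference
        ref = plate_df

    center = ref[features].median()
    # 1.4826 makes the MAD consistent with the standard deviation for normal data
    scale = 1.4826 * (ref[features] - center).abs().median()

    out = plate_df.copy()
    out[features] = (plate_df[features] - center) / scale
    return out
```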

@gwaygenomics Does that sound reasonable?

This does complicate the analysis for cell-health because you now need to decide which of the two 4a profiles you should use for predictions. For that case, I'd go with whole-plate because that makes it similar to the way you've processed the CRISPR data, IIRC.

@shntnu (Collaborator) commented Apr 3, 2020

> A potential intermediate solution could be to freeze a pycytominer version using conda (after confirming floating point differences) for lincs-specific processing. What do you think?

That sounds good to me, and it will very likely be the strategy we use for all data processing with pycytominer, right?

@gwaybio (Member, Author) commented Apr 3, 2020

@shntnu and I chatted about this offline. I will summarize our decisions below:

  • I will confirm floating point differences in pycytominer (compared to current cytominer profiles)
  • I will apply the two normalization schemes (whole-well and DMSO)
  • These two normalization schemes will propagate to two separate feature selected files and two separate consensus files

Also, here are answers to the specific questions:

> For that case, I'd go with whole-plate because that makes it similar to the way you've processed the CRISPR data, IIRC.

I normalize profiles by EMPTY CRISPR perturbations. See here.

> That sounds good to me, and it will very likely be the strategy we use for all data processing with pycytominer, right?

Similar, but not exactly the same. Eventually pycytominer will be traditionally versioned on PyPI and conda. Currently, pycytominer is versioned by GitHub hash (see here). It is also worth noting that we can always reprocess the profiles again. This is the beauty of versioned data!

@gwaybio (Member, Author) commented Apr 28, 2020

@shntnu I have a couple of follow-up questions now that I've started adding the processing code in #21 (cc @niranjchandrasekaran)

Question 1 - Should we use z-score normalization or robustize_mad?

> Going forward, we will very likely produce at least two different Level 4a profiles:
>
>   • whole-well z-scored
>   • DMSO z-scored
>
> because, depending on the plate layout, one might be better than the other.

The default in cytominer_scripts/normalize.R is robustize. I assume that I should continue using this method.

Question 2 - Is it ok to leave the whitened version for a future update?

> We will also produce corresponding 4w (normalized and whitened) versions of the two 4a profiles.

Pycytominer currently does have a whiten implementation, and I applied it to the two 4a profiles in a test case. The test did not go smoothly, so I will likely need to tinker with the pycytominer implementation a bit (it is hard to estimate how long the delay will be).
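For reference, whitening a feature matrix can be sketched as a ZCA-style transform via an eigendecomposition of the covariance. This is a generic sketch and independent of pycytominer's own implementation, which may differ:

```python
# Sketch: ZCA whitening -- decorrelate features and scale to unit variance.
# Generic implementation; pycytominer's whiten may differ in detail.
import numpy as np

def zca_whiten(X, eps=1e-8):
    """Return X transformed so its features are decorrelated with unit variance."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eps guards against near-zero eigenvalues, a common failure mode in practice
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return X_centered @ W
```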

Question 3 - How should I form the level 5 consensus data?

My current plan is as follows:

  1. Process each plate independently.
  2. Generate an across-plate consensus signature grouped by broad_sample and dose.
  3. Base the consensus signature on the median.
  4. Output a single file for the full consensus signature.
  5. Output a separate file for a feature-selected consensus signature (derived after calculating the consensus).
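Steps 2-3 of this plan amount to a grouped median. A minimal sketch, where the metadata column names are assumptions:

```python
# Sketch: median consensus signatures grouped by perturbation and dose.
# Column names (Metadata_broad_sample, Metadata_dose) are assumptions.
import pandas as pd

def build_consensus(profiles, features,
                    group_cols=("Metadata_broad_sample", "Metadata_dose")):
    """Collapse replicate-level profiles to one median signature per group."""
    return (
        profiles
        .groupby(list(group_cols))[list(features)]
        .median()
        .reset_index()
    )
```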

@shntnu (Collaborator) commented Apr 29, 2020

> The default in cytominer_scripts/normalize.R is robustize. I assume that I should continue using this method.

Yes. The rationale is mostly empirical: robustize resulted in higher replicate correlations of Level 4 profiles (compared to standardize) across a few experiments in which we tested this.
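The empirical criterion mentioned here can be sketched as the median pairwise Pearson correlation among replicate profiles; the grouping column name below is an assumption:

```python
# Sketch: median within-group pairwise Pearson correlation of replicates --
# the kind of metric used to compare robustize vs. standardize.
import numpy as np
import pandas as pd

def replicate_correlation(profiles, features, group_col="Metadata_broad_sample"):
    """Median of pairwise Pearson correlations between replicate profiles."""
    corrs = []
    for _, group in profiles.groupby(group_col):
        if len(group) < 2:
            continue
        # Correlate replicate profiles (rows) against each other
        mat = np.corrcoef(group[features].to_numpy())
        corrs.extend(mat[np.triu_indices_from(mat, k=1)].tolist())
    return float(np.median(corrs))
```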

> Question 2 - Is it ok to leave the whitened version for a future update?

Yes, definitely ok.

> How should I form the level 5 consensus data?

Your plan sounds good.

There's an incompatibility that I need to address in the handbook (cytomining/profiling-handbook#53). Ugh. So glad we are thinking through provenance and reproducibility via this project!

@gwaybio (Member, Author) commented May 15, 2020

Closing this issue in favor of project management in https://github.com/broadinstitute/lincs-cell-painting/projects/1
