July 20 2020 Discussions (cell count confounders, cell health predictions) #47

Closed
shntnu opened this issue Jul 20, 2020 · 18 comments
Labels
Discussion and Notes Documenting ideas/discussions

Comments

@shntnu
Collaborator

shntnu commented Jul 20, 2020

Let's use this thread to discuss any questions from today @jatinarora-upmc.

@shntnu added the "Discussion and Notes" (Documenting ideas/discussions) label on Jul 20, 2020
@shntnu
Collaborator Author

shntnu commented Jul 20, 2020

I am copying @gwaygenomics's question here

From Gregory Way to Everyone: (11:55 AM)

> One thing regarding interpretation of morphology features: is it a bad idea to use the morphology feature selection approach to find candidate genes but then run followup tests on these 6 or so specific genes but use all morphology features


@shntnu
Collaborator Author

shntnu commented Jul 20, 2020

@jatinarora-upmc Recap of Zernike: See #32 (comment)

@gwaybio
Member

gwaybio commented Jul 20, 2020

I also had another question about the nearest genes to GWAS signals. Do we see any of these pop up? How about genes in the neighborhoods of GWAS signals?
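One way to make this question concrete, sketched under loud assumptions: given gene coordinates and GWAS lead-SNP positions, find the nearest gene to each signal and the genes within a window around it; intersecting those names with the burden-test hits is then a plain set intersection. The gene tuple layout, the toy coordinates, the 500 kb default window, and the function names are all hypothetical; real coordinates would have to come from a gene annotation on the same genome build as the GWAS.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical gene record: (name, chromosome, start, end), 1-based inclusive coordinates.
Gene = Tuple[str, str, int, int]


def distance_to_gene(chrom: str, pos: int, gene: Gene) -> Optional[int]:
    """Base-pair distance from a SNP to a gene body (0 if inside); None if on another chromosome."""
    _, g_chrom, start, end = gene
    if g_chrom != chrom:
        return None
    if start <= pos <= end:
        return 0
    return min(abs(pos - start), abs(pos - end))


def nearest_gene(chrom: str, pos: int, genes: Sequence[Gene]) -> Optional[str]:
    """Name of the gene closest to the SNP, or None if no gene is on that chromosome."""
    best: Optional[Tuple[int, str]] = None
    for g in genes:
        d = distance_to_gene(chrom, pos, g)
        if d is not None and (best is None or (d, g[0]) < best):
            best = (d, g[0])
    return best[1] if best is not None else None


def genes_in_window(chrom: str, pos: int, genes: Sequence[Gene], window: int = 500_000) -> List[str]:
    """Names of genes within `window` bp of the SNP (one possible 'GWAS gene neighborhood')."""
    names = []
    for g in genes:
        d = distance_to_gene(chrom, pos, g)
        if d is not None and d <= window:
            names.append(g[0])
    return sorted(names)


if __name__ == "__main__":
    toy_genes = [("GENE_A", "1", 1_000, 2_000), ("GENE_B", "1", 5_000, 6_000)]  # toy data
    print(nearest_gene("1", 2_600, toy_genes))            # GENE_A (600 bp vs 2,400 bp)
    print(genes_in_window("1", 3_000, toy_genes, 2_500))  # ['GENE_A', 'GENE_B']
```

The window size is the main knob: a larger window catches more candidate genes per signal but also more chance overlaps with the burden-test hits.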

@jatinarora-upmc
Collaborator

jatinarora-upmc commented Jul 20, 2020

Quick notes from today's meeting on the rare variant burden test on morphology features:

  1. Check SLFN12 (significantly associated with Cytoplasm_Areashape_Zernike_3_1 in all cells) in isolated cells as well
  2. Test the interaction between variant burden in a gene and iPSC source tissue or donor ancestry
  3. Cross-check associations against the images and the live cell counter
  4. Include doubling time as a covariate in the association analysis
  5. Optionally, also include the total number of cells in the well as a proxy for cell cycle
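Items 4 and 5 amount to adding columns to the regression design matrix. A minimal sketch, assuming a plain per-donor linear model (feature ~ burden + doubling time + cell count) fit by ordinary least squares; the burden test actually used in this project may differ, and all names and numbers here are hypothetical. The interaction in item 2 would be one more column, e.g. the product of burden and an ancestry indicator.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: covariates are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(columns: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """OLS fit of y on an intercept plus `columns` via the normal equations.

    Returns [intercept, beta_1, ..., beta_k], with betas in the order of `columns`.
    """
    n = len(y)
    design = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Toy per-donor values (hypothetical): burden, doubling time (h), cells per well (thousands).
    burden = [0, 1, 2, 0, 1, 3, 2, 0]
    doubling = [20, 22, 25, 30, 21, 24, 28, 26]
    cell_count_k = [1.0, 1.5, 0.9, 1.2, 1.1, 1.3, 0.8, 1.4]
    feature = [1 + 2 * b + 0.5 * d + 0.1 * c for b, d, c in zip(burden, doubling, cell_count_k)]
    print(fit_ols([burden, doubling, cell_count_k], feature))  # approximately [1.0, 2.0, 0.5, 0.1]
```

The burden coefficient (second entry) is then the burden effect adjusted for doubling time and cell count. This sketch returns only point estimates; a real analysis would also need standard errors and p-values.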

@jatinarora-upmc
Collaborator

> I also had another question about nearest gene to GWAS signals. Do we see any of these pop up? How about GWAS gene neighborhoods?

@gwaygenomics thanks for bringing this up. This is on the to-do list once we are done with the common and rare variant associations.
I was also wondering whether cell health could be incorporated as a covariate.

@jatinarora-upmc
Collaborator

> I am copying @gwaygenomics's question here
>
> From Gregory Way to Everyone: (11:55 AM)
>
> > One thing regarding interpretation of morphology features: is it a bad idea to use the morphology feature selection approach to find candidate genes but then run followup tests on these 6 or so specific genes but use all morphology features

Not a bad idea: as we saw, while a feature has only one or two associated genes, a single gene might affect many features. We could also use this at the end as a sanity check, to see whether highly correlated features are affected by the same genes.
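The sanity check could look like this: for every pair of highly correlated features, measure how much their sets of associated genes overlap. A minimal sketch, assuming per-donor feature values and a per-feature set of significant genes are already available; the 0.9 threshold, the Jaccard overlap measure, and the function names are illustrative choices, not the project's method.

```python
from itertools import combinations
from math import nan, sqrt
from typing import Dict, List, Sequence, Set, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else nan


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Fraction of genes shared by two gene sets (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def correlated_feature_gene_overlap(
    profiles: Dict[str, Sequence[float]],
    genes: Dict[str, Set[str]],
    min_abs_r: float = 0.9,
) -> List[Tuple[str, str, float, float]]:
    """For feature pairs with |r| >= min_abs_r, report (f1, f2, r, Jaccard of associated genes)."""
    out = []
    for f1, f2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[f1], profiles[f2])
        if abs(r) >= min_abs_r:  # NaN compares False, so constant features are skipped
            out.append((f1, f2, r, jaccard(genes.get(f1, set()), genes.get(f2, set()))))
    return out
```

Correlated pairs with a Jaccard overlap near zero would be the ones worth a closer look, since the expectation in the comment above is that they share genes.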

@jatinarora-upmc
Collaborator

@bethac07 @shntnu Hi Beth, Shantanu, could you help me get the live cell counter information per well?

@shntnu
Collaborator Author

shntnu commented Jul 22, 2020

Did you mean just the cell count (vs. the fraction of live cells)? For the former, see *_count.csv in https://github.com/broadinstitute/cmQTL/tree/master/1.profile-cell-lines/profiles. For the latter, we'd need to use the models from https://github.com/broadinstitute/cell-health, but that will take some effort. If the latter, can you remind me of the context?
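For the plain cell count, a per-well table could be pulled out of those files along these lines. A minimal sketch under stated assumptions: the column names `Metadata_Well` and `Count_Cells` are guesses (check the real header), and summing all rows that share a well assumes one row per imaging site; the function names are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path
from typing import Dict

# Assumed column names; check them against the header of the actual *_count.csv files.
WELL_COLUMN = "Metadata_Well"
COUNT_COLUMN = "Count_Cells"


def cells_per_well(csv_text: str) -> Dict[str, int]:
    """Sum the cell count over all rows (e.g. imaging sites) belonging to each well."""
    totals: Dict[str, int] = defaultdict(int)
    for row in csv.DictReader(csv_text.splitlines()):
        totals[row[WELL_COLUMN]] += int(float(row[COUNT_COLUMN]))
    return dict(totals)


def cells_per_well_by_file(profiles_dir: str) -> Dict[str, Dict[str, int]]:
    """Per-well cell counts for every *_count.csv file in a local checkout of the profiles folder."""
    return {
        path.name: cells_per_well(path.read_text())
        for path in sorted(Path(profiles_dir).glob("*_count.csv"))
    }
```

Results are keyed by file name so that wells from different plates are not merged.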

@jatinarora-upmc
Copy link
Collaborator

@shntnu actually, I meant the latter: the fraction of live cells. The idea was to know how many healthy cells we have in a condition like the one in this image. During the last presentation, I also wanted to ask your opinion on including cell health as a covariate in my model.
[Image from the original comment, not preserved]

@shntnu
Collaborator Author

shntnu commented Jul 22, 2020

@jatinarora-upmc Indeed, the fraction of live cells could be estimated using the Cell Health models, like this.

@gwaygenomics How do you feel about Jatin using these models directly? There's no way to evaluate them in this dataset, but we'll know if they're totally off (e.g., if we get crazy numbers). The results could well be off the charts because the models were trained on a very different cell line. But it's certainly worth testing IMO (assuming it takes Jatin no more than 2 days to apply and test).

@gwaybio
Member

gwaybio commented Jul 22, 2020

> @gwaygenomics What do you feel about Jatin using these models directly? There's no way to evaluate (in this dataset) but we'll know if it's totally off (e.g. if we get crazy numbers). The results could well be totally off the charts because the models were trained on a very different cell line. But certainly worth testing it out IMO (assuming it will take Jatin no more than 2 days to apply and test)

Sounds cool! @jatinarora-upmc and I chatted separately on Slack (sorry for not posting my thoughts earlier), but I will summarize below:

I won't be able to get to this for a couple of days, though, so let's brainstorm whether there's anything else I can do in the meantime (but please be gentle and wary of feature creep!)

@shntnu
Collaborator Author

shntnu commented Jul 22, 2020

Fantastic!

The only other request: also test a couple of well-performing models that can be easily validated using CellProfiler features. From the list below, I'd go with cc_all_n_objects and cc_all_nucleus_area_mean (feature mapping is here). Does that sound reasonable, @gwaygenomics?

[Image: list of Cell Health models referred to above, not preserved]
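Validating a prediction against a directly measured CellProfiler quantity could be as simple as a rank correlation across wells, e.g. predicted cc_all_n_objects against the measured cell count per well. A minimal sketch with hypothetical inputs (two equal-length lists in the same well order). Spearman's rho is one reasonable choice here: because the models were trained on a different cell line, the ordering of wells may transfer even if the absolute scale does not.

```python
from math import nan, sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else nan


def spearman(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(predicted) != len(measured):
        raise ValueError("inputs must be aligned well by well")
    return pearson(average_ranks(predicted), average_ranks(measured))


if __name__ == "__main__":
    predicted_n_objects = [0.2, -1.1, 0.9, 0.4]    # hypothetical model outputs per well
    measured_cell_count = [1400, 600, 2100, 1500]  # hypothetical CellProfiler counts per well
    print(spearman(predicted_n_objects, measured_cell_count))  # 1.0 (identical ordering)
```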

@gwaybio
Member

gwaybio commented Jul 22, 2020

that's perfect - will do!

@gwaybio
Member

gwaybio commented Aug 16, 2020

I started this analysis today and ran into a roadblock. It turns out there are 506 features measured in the Cell Health project that are not measured in the cmQTL project, and many of them have nonzero coefficients in the three models we proposed using. The cmQTL data I am using (a .tab file Jatin sent over on Dropbox) has 3,582 features. The missing features are all texture and correlation features.

Unless we can resolve this feature difference, the Cell Health models cannot easily be applied to the cmQTL data, and we should abandon this analysis.
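Why missing features block the analysis follows from how a linear model is applied: the prediction is an intercept plus a weighted sum of features, so every feature with a nonzero coefficient must be present, while zero-coefficient features can be absent. A minimal sketch, assuming each Cell Health model can be reduced to an intercept and a name-to-coefficient mapping (consistent with the mention of nonzero coefficients above); the function names and the feature names in the test are hypothetical.

```python
from typing import Iterable, List, Mapping


def missing_nonzero_features(coefficients: Mapping[str, float], available: Iterable[str]) -> List[str]:
    """Features the model actually uses (nonzero weight) that the new dataset lacks."""
    present = set(available)
    return sorted(name for name, w in coefficients.items() if w != 0 and name not in present)


def predict_linear(
    intercept: float, coefficients: Mapping[str, float], profile: Mapping[str, float]
) -> float:
    """intercept + sum_j w_j * x_j over nonzero-weight features; raises if any are missing."""
    missing = missing_nonzero_features(coefficients, profile.keys())
    if missing:
        raise KeyError(f"{len(missing)} nonzero-coefficient features missing, e.g. {missing[:3]}")
    return intercept + sum(w * profile[name] for name, w in coefficients.items() if w != 0)
```

Running `missing_nonzero_features` once per model against the cmQTL column list gives exactly the set that would need to be mapped (or dropped and the model refit) before predictions are possible.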

@gwaybio
Member

gwaybio commented Aug 16, 2020

I added my progress in #51. If we can resolve this, outputting predictions can happen very quickly.

@bethac07
Contributor

Many of those features may actually still be measured (*), just under different names, since IIRC Cell Health used CellProfiler 2 and cmQTL definitely used CellProfiler 3. Is there a list of the unique features from each set somewhere? We may be able to do a fair amount of cross-referencing.

(*) The implementation of Texture is quite different between CellProfiler 2 and 3, but one would HOPE that, even with a different implementation, Texture at a given angle and scale is still useful.
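The cross-referencing Beth suggests could start from the two feature lists: normalize each name with a rule describing the naming differences, pair names that map to the same key on both sides, and list the leftovers for manual review. A minimal sketch; the normalization rule below (dropping a trailing `_256` gray-level suffix and zero-padding in numeric tokens) is a hypothetical illustration of a CellProfiler 2 vs 3 naming difference and must be checked against the real lists before use. Ambiguous keys (several names mapping to one key) are deliberately left unmatched.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def normalize_name(name: str) -> str:
    """HYPOTHETICAL rule: strip a trailing '_256' and leading zeros in tokens like '_00'/'_01'."""
    name = re.sub(r"_256$", "", name)
    return re.sub(r"_0+(\d)", r"_\1", name)


def cross_reference(
    old_names: Sequence[str],
    new_names: Sequence[str],
    normalize: Callable[[str], str] = normalize_name,
) -> Tuple[Dict[str, str], List[str], List[str]]:
    """Return (old->new matches, unmatched old names, unmatched new names)."""
    old_by_key: Dict[str, List[str]] = defaultdict(list)
    new_by_key: Dict[str, List[str]] = defaultdict(list)
    for name in old_names:
        old_by_key[normalize(name)].append(name)
    for name in new_names:
        new_by_key[normalize(name)].append(name)
    matched: Dict[str, str] = {}
    for key, olds in old_by_key.items():
        news = new_by_key.get(key, [])
        if len(olds) == 1 and len(news) == 1:  # only unambiguous one-to-one pairs
            matched[olds[0]] = news[0]
    used_new = set(matched.values())
    unmatched_old = sorted(n for n in old_names if n not in matched)
    unmatched_new = sorted(n for n in new_names if n not in used_new)
    return matched, unmatched_old, unmatched_new
```

Passing a different `normalize` function makes it easy to iterate on the mapping rule as the actual naming differences become clear.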

@shntnu
Collaborator Author

shntnu commented Aug 18, 2020

Let's split off the cell health-related discussion to this thread #53

@jatinarora-upmc
Collaborator

@gwaygenomics @bethac07 @shntnu just following up on the cell health readouts: was it feasible to align the features?

@shntnu changed the title from "July 20 2020 Discussions" to "July 20 2020 Discussions (cell count confounders, cell health predictions)" on Dec 2, 2020
@shntnu closed this as completed on May 25, 2022
4 participants