Nov 2020 Discussions (associations with PRLR) #64

Closed
jatinarora-upmc opened this issue Dec 9, 2020 · 48 comments
Labels
Discussion and Notes Documenting ideas/discussions

Comments

@jatinarora-upmc
Collaborator

jatinarora-upmc commented Dec 9, 2020

Rare variation in the PRLR gene is associated with multiple traits, but only in isolate cells. One such trait is Cells_RadialDistribution_MeanFrac_ER_4of4.
image

Other associated traits are:
image

About PRLR:

  • PRLR is a membrane-bound receptor whose ligand is prolactin.
  • Prolactin-dependent signaling occurs as the result of dimerization of the prolactin receptor.
  • PRLR -/- cells have smaller mitochondria (Viengchareun 2008, Gorvin 2015).
  • Another interesting aspect of PRLR is that it contains (within its gene body) 4 promoters/enhancers of the AGXT2 gene, which encodes a mitochondrial aminotransferase.

So, can it be that differential regulation of AGXT2 might be linked to the localisation of mitochondria?
The variants used for the rare variant burden test (high- or moderate-impact protein-coding variants) in the PRLR gene did not overlap with the AGXT2 promoters/enhancers. But they could very possibly be in LD with low-impact variants.
I will check this. Please let me know if you have any thoughts meanwhile.
@shntnu @AnneCarpenter @bethac07 @raldanehme

@shntnu
Collaborator

shntnu commented Dec 9, 2020

Tagging @raldanehme separately because she may not have gotten the notification (GitHub does not notify if you mention someone in an edit to a comment, like you did above @jatinarora-upmc )

@AnneCarpenter

The fact that the feature list includes both Cells_RadialDistribution_MeanFrac_ER_4of4 plus the same feature with Mito in place of ER tells me this is more about cell shape than intensity patterns of those particular channels. The mean intensity of DNA stain at edge of the Cell compartment implies to me that the cells are rounded up. It would help to see images to confirm this.

Do you have any mechanism for normalizing features to try to remove the impact of cell count? @jatinarora-upmc

@shntnu shntnu added the Discussion and Notes Documenting ideas/discussions label Jan 5, 2021
@jatinarora-upmc
Collaborator Author

@shntnu @AnneCarpenter @raldanehme @bethac07 Dear all, in the final test, rare variant burden in PRLR is associated with 3 traits (column feat in screenshot) in isolate cells.
image

  • As we discussed above, the trait 'Cells_Intensity_MaxIntensityEdge_DNA' is about the roundness of the cells.
  • I am not sure about 'Cytoplasm_Texture_InfoMeas1_DNA_20_00'.
  • The trait 'Cells_RadialDistribution_RadialCV_Mito_1of4' describes the distribution of mito in the first ring around the nucleus. The association with rare variant burden is negative (effect = -1.18). I am making a schematic of this association so that readers can understand it. Is the depiction below of the distribution of mito inside the cell correct for this association?
    image

Btw, these 3 associations with PRLR are just nominally significant (p ~ 10^-4) in non-isolate cells (having any neighbor).

@AnneCarpenter

I suspect Cytoplasm_Texture_InfoMeas1_DNA_20_00 is related to the same issues of cell shape as Cells_Intensity_MaxIntensityEdge_DNA, given that it's again about DNA in the cytoplasm - @bethac07 should ponder it too and confirm (I don't recall what InfoMeas1 is, nor the impact of those scale numbers 20_00). @jatinarora-upmc you can solidify these hypotheses by plotting on a single cell level the relationship/correlation between those features and each other, and those features and cell area.

The third one might be interestingly about mitochondrial distribution but my suspicion again is that it may be directly related to the same issues of cell shape as the other two. If a cell is rounded up, the mito stain will be preferentially in the inner ring around the nucleus (ring 1 out of 4 concentric rings).

Another collaborator made a schematic about this (not to be published, but just to give you the concept):
Screen Shot 2021-03-08 at 8 22 47 AM

BUT I don't know what RadialCV means in the context of this CV so hopefully Beth can illuminate (or you can check CellProfiler's manual). It may be that in the innermost ring around the nucleus, the mitochondria have a higher CV, which would mean more bright and dim (aka contrasty) staining as opposed to smooth uniform staining.
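The single-cell check suggested above (how the associated features relate to each other and to cell area) could be sketched as a pairwise Pearson correlation. This is only a minimal sketch under assumptions: the `Cells_AreaShape_Area` column name is assumed, and the per-cell values are made-up toy numbers, not cmQTL data.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def pairwise_correlations(cells, features):
    """cells: list of dicts, one per cell. Returns {(feat_a, feat_b): r}."""
    return {
        (a, b): pearson([c[a] for c in cells], [c[b] for c in cells])
        for a, b in combinations(features, 2)
    }

# Toy single-cell rows (made-up numbers, for illustration only).
AREA = "Cells_AreaShape_Area"  # assumed column name
EDGE_DNA = "Cells_Intensity_MaxIntensityEdge_DNA"
cells = [
    {AREA: 900.0, EDGE_DNA: 0.30},
    {AREA: 1500.0, EDGE_DNA: 0.18},
    {AREA: 2100.0, EDGE_DNA: 0.11},
    {AREA: 2600.0, EDGE_DNA: 0.07},
]
correlations = pairwise_correlations(cells, [AREA, EDGE_DNA])
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

A strong correlation between a feature and cell area would support the idea that the feature mostly reflects cell size or shape.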

@bethac07
Contributor

bethac07 commented Mar 8, 2021

@jatinarora-upmc 's intuition about Radial CV is correct; essentially, within each "ring", is the staining uniformly distributed or unevenly distributed. I think you have it backwards, though: if it has a lower coefficient of variation, wouldn't that mean the variation is lower, and therefore the staining is more evenly distributed in the variant?

Per the CellProfiler documentation, InfoMeas1 is defined like this (mathematical definition here ).

InfoMeas1: A measure of the total amount of information contained within a region of pixels derived from the recurring spatial relationship between specific intensity values.

I would not have thought that either the Cytoplasm_Texture measure or the Cell_Intensity measure had anything to do with cell size or shape, except perhaps incidentally; both are measuring DNA in places where no DNA should be (within the cytoplasm, or at the outside edge of the cell, respectively), so honestly I'd say they are measures of either segmentation quality (i.e. segmentation is worse in those variants, perhaps due to size or shape) or crowding. As @AnneCarpenter said, it would really be nice to see the images.

@AnneCarpenter

Oh, Beth - there's no written trail but we discussed in our meetings - the link to cell size/shape is that we hypothesize the cells are packed in closely so that one's DNA overlaps another cell's cytoplasm.

Regarding the drawing, wouldn't both drawings have identical CV because it's more about the CV of pixel intensities within the ring, not paying attention to spatial arrangements? So to get a different CV we'd need the distribution of pixel intensities to be either more uniform or less uniform?

@bethac07
Contributor

bethac07 commented Mar 8, 2021

the link to cell size/shape is that we hypothesize the cells are packed in closely so that one's DNA overlaps another cell's cytoplasm

Sure, but this association is in theory in cells that have no neighbors, which is why my $ would be on segmentation errors.

The CV metric in question is "divide each ring into 8 'wedges' and look at coefficient of variation". I'd have to (and can, if need be) dig into the source code to say more, but I think it essentially does end up breaking down into "how evenly around the ring is the staining distributed".
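To make the "8 wedges" description concrete, here is a simplified sketch of a wedge-based coefficient of variation. It is not CellProfiler's source code: the function name `wedge_cv`, the use of the population standard deviation, and the toy rings are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev

def wedge_cv(pixels, center, n_wedges=8):
    """CV (std / mean) of the mean intensity across angular wedges of one ring.

    pixels: iterable of (x, y, intensity) tuples belonging to the ring.
    center: (cx, cy) of the object.
    """
    cx, cy = center
    wedges = [[] for _ in range(n_wedges)]
    for x, y, intensity in pixels:
        angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
        index = min(int(angle / (2 * math.pi) * n_wedges), n_wedges - 1)
        wedges[index].append(intensity)
    wedge_means = [mean(w) for w in wedges if w]
    overall = mean(wedge_means)
    return pstdev(wedge_means) / overall if overall else 0.0

def toy_ring(radius, n_points, intensity_fn):
    """Points evenly spaced on a circle, offset by half a step from wedge borders."""
    angles = [2 * math.pi * (k + 0.5) / n_points for k in range(n_points)]
    return [(radius * math.cos(t), radius * math.sin(t), intensity_fn(t)) for t in angles]

# Same total stain in both rings: spread evenly vs. packed into one wedge.
even = toy_ring(5.0, 64, lambda t: 1.0)
clustered = toy_ring(5.0, 64, lambda t: 8.0 if t < math.pi / 4 else 0.0)

print("even:", wedge_cv(even, (0.0, 0.0)))
print("clustered:", wedge_cv(clustered, (0.0, 0.0)))
```

In this toy setup the evenly stained ring gives a CV of zero and the clustered ring gives a much higher CV, which matches the reading that a lower RadialCV means staining is spread more evenly around the ring.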

@AnneCarpenter

Oh, GREAT catch on this being supposedly isolated cells. Fully agree, then.

And yes, I wasn't aware of the wedges so Jatin's original interpretation makes sense, that they would be clustered together spatially. Thanks for answering both definitively!

@jatinarora-upmc
Collaborator Author

Thanks much everyone for detailed comments.
Here I attach the RadialCV_mito association and images of 3 cell lines without and with rare variant burden in PRLR.
Isolate cells are much fewer than non-isolate cells, so you may have to look for them a bit.
Their segmentation seems to be fine.
prlr.pdf

@AnneCarpenter

I wish we could display individual isolated cells from the two classes instead of a field of view when so few cells are in that class. @shntnu is it too painful to pick random cells from the isolate class to make a montage of ~50 of each from across samples and fields?

Also, Jatin, have you looked at per cell histograms for these metrics to see whether in PRLR mutants the whole population shifts a bit higher or lower vs a few cells becoming outliers which causes the mean to shift (unless you're using median but still it might be nice to see the single cell data for the 3 features).
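One quick way to tell a whole-population shift from a few outliers, short of plotting histograms, is to compare how the mean and the median move. The sketch below uses made-up per-cell values and a hypothetical helper `shift_summary`; it illustrates the idea and is not an analysis of the actual data.

```python
from statistics import mean, median

def shift_summary(control, carrier):
    """Carrier-minus-control difference in mean and in median."""
    return {
        "mean_shift": mean(carrier) - mean(control),
        "median_shift": median(carrier) - median(control),
    }

# Toy per-cell feature values (made up for illustration).
control = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
whole_population_shift = [v + 0.5 for v in control]   # every cell moves
few_outliers = [1.0, 1.1, 0.9, 1.0, 1.05, 3.95]       # one extreme cell

population = shift_summary(control, whole_population_shift)
outliers = shift_summary(control, few_outliers)
print("whole population:", population)
print("few outliers:", outliers)
```

Both toy groups shift the mean by the same amount, but only the whole-population case also shifts the median, so a mean shift without a median shift points to outlier cells.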

@shntnu
Collaborator

shntnu commented Mar 8, 2021

is it too painful to pick random cells from the isolate class to make a montage of ~50 of each from across samples and fields?

It's likely painful :D But I've pinged profilers.

@bethac07
Contributor

bethac07 commented Mar 9, 2021

I don't see any segmentation outlines on that image, unless I'm missing something? So I'm not sure on what basis we could say segmentation is or isn't fine.

@AnneCarpenter

Once again awesome catch - to elaborate, Jatin: for example, if a single isolated cell is mistakenly segmented into two by splitting the nucleus down the middle, it would cause there to be DNA stain very close to the edge of each of the two half-cells and could explain the behavior we are seeing; it could explain all 3 features in fact.

Although this is a technical artifact, this is not to say that there is no actual phenotype here, it could be something like a lumpy nucleus that gets split into two. But if the numbers of isolated cells being analyzed is small, here, it could just be a technical artifact and not scientifically interesting. So Jatin, we would want to check the absolute numbers of isolated cells for these samples.
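Checking the absolute numbers of isolated cells per sample could look roughly like this. It is a sketch only: the `n_neighbors` and `cell_line` keys are hypothetical names for whatever the per-cell table actually uses, and the neighbor counts are made up.

```python
from collections import Counter

def isolated_cell_counts(cells, neighbor_key="n_neighbors", line_key="cell_line"):
    """Count cells with zero neighbors, grouped by cell line."""
    counts = Counter()
    for cell in cells:
        if cell[neighbor_key] == 0:
            counts[cell[line_key]] += 1
    return counts

# Toy rows: the neighbor counts are invented for illustration.
cells = [
    {"cell_line": 214, "n_neighbors": 0},
    {"cell_line": 214, "n_neighbors": 2},
    {"cell_line": 214, "n_neighbors": 0},
    {"cell_line": 112, "n_neighbors": 0},
    {"cell_line": 112, "n_neighbors": 3},
]
counts = isolated_cell_counts(cells)
for line, n in sorted(counts.items()):
    print(f"cell line {line}: {n} isolated cells")
```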

@AnneCarpenter

I imagine Jatin you need help to see the segmentation. You need the raw images (as their separate channels, not overlaid) + the original CellProfiler pipeline. Then it's easy to run (using the version of CP that was used originally). Let us know if you need help on any of those steps (Beth I cannot recall which person on your team to point to, if any ?)

Unless we saved the segmentation outlines, which I doubt.

@bethac07
Contributor

bethac07 commented Mar 9, 2021 via email

@jatinarora-upmc
Collaborator Author

Hi everyone, thanks again for the replies. Here is the comparison of the number of isolate cells (n on y axis) in cell lines with (1 on x axis) and without (0 on x axis) rare variant burden in the PRLR gene. This is a significant difference, potentially because of the unbalanced group sizes, but you can see that cell lines with rare variant burden do not really have that different a number of isolate cells.
image

I do have channel images, and will check for segmentation. But I would need help in this, because your eyes are trained to look at them :)

@AnneCarpenter

Great, the cell count is low but not egregiously low to the point where it's likely to be an artifactual result. That's a relief.

Super, if you cannot find segmentation it sounds like Beth can help if you tell her the details.

@jatinarora-upmc
Collaborator Author

@bethac07 hi Beth, could you please help check the segmentation outlines?
The images of 3 cell lines (shown above) having rare variant burden in PRLR are:

In the format of Plate:Image
BR00107338:r11c01f05p01
with_variant_BR00107338_r11c01f05p01 5chanels
BR00107339:r07c15f05p01
image
BR00106708:r06c04f03p01
image

The plate and images of all 7 cell lines (including 3 shown above) with rare variant burden are here.
r.*03p01 and r.*05p01 are 3rd or 5th field of view i guess.

Cell Line ID Plate:Image
214 BR00107339:r07c15f03p01
214 BR00107339:r07c15f05p01
181 BR00106708:r06c04f03p01
181 BR00106708:r06c04f05p01
32 cmqtlpl1.5-31-2019-mt:r06c05f03p01
32 cmqtlpl1.5-31-2019-mt:r06c05f05p01
238 BR00107338:r11c01f03p01
238 BR00107338:r11c01f05p01
29 cmqtlpl1.5-31-2019-mt:r03c23f03p01
29 cmqtlpl1.5-31-2019-mt:r03c23f05p01
153 BR00106708:r16c19f03p01
153 BR00106708:r16c19f05p01
260 BR00107338:r06c23f03p01
260 BR00107338:r06c23f05p01
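The image names in the table can be split into their parts with a small parser. This is a sketch under assumptions: `r`, `c` and `f` are read as row, column and field of view, `p` is assumed to be the plane, and row numbers are assumed to map to well letters (1 to A, 2 to B, and so on).

```python
import re

SITE_PATTERN = re.compile(r"r(\d{2})c(\d{2})f(\d{2})p(\d{2})")

def parse_site(name):
    """Split a name like 'r06c23f03p01' into well, row, column, field and plane."""
    match = SITE_PATTERN.search(name)
    if match is None:
        raise ValueError(f"unrecognised image name: {name!r}")
    row, col, field, plane = (int(g) for g in match.groups())
    # Assumption: row 1 maps to 'A', row 2 to 'B', and so on.
    well = f"{chr(ord('A') + row - 1)}{col:02d}"
    return {"well": well, "row": row, "column": col, "field": field, "plane": plane}

print(parse_site("BR00107338:r06c23f03p01"))
# {'well': 'F23', 'row': 6, 'column': 23, 'field': 3, 'plane': 1}
```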

Please let me know if I can help with anything else.

@bethac07
Contributor

bethac07 commented Mar 9, 2021

Those images are not sufficient; you would need the segmentation outlines, aka the outlines of what is called the nucleus and what is called the cell body. Since those cells are relatively rare, you'd likely want to look at all 4 (or 5? ) wells * 9 images per well for the segmentation to see if there are any trends (as well as probably a similar number of "control" images to see if the level of mistake is comparable between this and average or not).

@jatinarora-upmc
Collaborator Author

Thanks Beth. Alright. There would be 8 wells per cell line * 8 images per well.
I have no idea how and where to look for segmentation outlines. Do you have them already calculated?
Would the plate (batch) information and cell line IDs be sufficient to pull out the segmentation outlines?

@bethac07
Contributor

bethac07 commented Mar 9, 2021

We sometimes save them after generation and sometimes do not; I would need to know which batch(es) to check (but likely if it's on for one it's on for all).

If they are not already calculated, we'd need to re-run CellProfiler to re-generate them.

@jatinarora-upmc
Collaborator Author

Sure, here is the plate (batch) and image information for the case and control cell lines.

Cases    
Cell Line ID Plate:Image  
214 BR00107339:r07c15f03p01
214 BR00107339:r07c15f05p01
181 BR00106708:r06c04f03p01
181 BR00106708:r06c04f05p01
32 cmqtlpl1.5-31-2019-mt:r06c05f03p01
32 cmqtlpl1.5-31-2019-mt:r06c05f05p01
238 BR00107338:r11c01f03p01
238 BR00107338:r11c01f05p01
29 cmqtlpl1.5-31-2019-mt:r03c23f03p01
29 cmqtlpl1.5-31-2019-mt:r03c23f05p01
153 BR00106708:r16c19f03p01
153 BR00106708:r16c19f05p01
260 BR00107338:r06c23f03p01
260 BR00107338:r06c23f05p01
     
Controls    
Cell Line ID Plate:Image  
112 BR00107338:r04c16f03p01
112 BR00107338:r04c16f05p01
136 BR00106709:r03c08f03p01
136 BR00106709:r03c08f05p01
30 cmqtlpl1.5-31-2019-mt:r02c04f03p01
30 cmqtlpl1.5-31-2019-mt:r02c04f05p01
255 BR00107338:r16c15f03p01
255 BR00107338:r16c15f05p01
195 BR00107339:r01c05f03p01
195 BR00107339:r01c05f05p01
12 cmqtlpl1.5-31-2019-mt:r02c24f03p01
12 cmqtlpl1.5-31-2019-mt:r02c24f05p01
158 BR00106708:r16c10f03p01
158 BR00106708:r16c10f05p01
215 BR00107339:r03c11f03p01
215 BR00107339:r03c11f05p01
233 BR00107338:r01c01f03p01
233 BR00107338:r01c01f05p01
277 cmQTLplate7-7-22-20:r07c19f03p01
277 cmQTLplate7-7-22-20:r07c19f05p01
206 BR00107339:r09c10f03p01
206 BR00107339:r09c10f05p01
248 BR00107338:r02c19f03p01
248 BR00107338:r02c19f05p01
113 BR00106709:r10c24f03p01
113 BR00106709:r10c24f05p01
155 BR00106708:r13c19f03p01
155 BR00106708:r13c19f05p01
125 BR00106709:r08c17f03p01
125 BR00106709:r08c17f05p01
227 BR00107339:r12c05f03p01
227 BR00107339:r12c05f05p01
222 BR00107339:r14c03f03p01
222 BR00107339:r14c03f05p01
213 BR00107339:r06c02f03p01
213 BR00107339:r06c02f05p01
265 BR00107338:r16c17f03p01
265 BR00107338:r16c17f05p01

@bethac07
Contributor

bethac07 commented Mar 9, 2021

Plate information is not batch information, do you have easy access to the batch numbers? Otherwise I need to go hunting.

@bethac07
Contributor

bethac07 commented Mar 9, 2021

(Also, for your cases there is only one well (with two images) listed for each; you said there were 8 wells per line?)

@bethac07
Contributor

bethac07 commented Mar 9, 2021

I went ahead and spot checked, in at least a couple of batches the images are there so it's likely there for all of them, you can find them (assuming you have access, since you have some original images, @shntnu is this in fact the case?) at s3://imaging-platform/projects/2018_06_05_cmQTL/workspace/analysis/BATCH/PLATE/analysis/PLATE-WELL-SITE/outlines/WELL_sSITE--nuclei_outlines.png and s3://imaging-platform/projects/2018_06_05_cmQTL/workspace/analysis/BATCH/PLATE/analysis/PLATE-WELL-SITE/outlines/WELL_sSITE--cell_outlines.png. You'd want to ideally pull all images (raw AND outlines) from all wells of all the lines in question. When you have that pulled together, I can walk you through the next step. Hope that helps!
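Filling in the path template above can be scripted. The sketch below only substitutes the BATCH, PLATE, WELL and SITE placeholders; the helper name `outline_paths` and the example values (including the placeholder batch name and the well/site formats) are assumptions, so check them against a real listing of the bucket.

```python
BASE = "s3://imaging-platform/projects/2018_06_05_cmQTL/workspace/analysis"

def outline_paths(batch, plate, well, site):
    """Fill the BATCH/PLATE/WELL/SITE placeholders of the outline path template."""
    folder = f"{BASE}/{batch}/{plate}/analysis/{plate}-{well}-{site}/outlines"
    return {
        "nuclei": f"{folder}/{well}_s{site}--nuclei_outlines.png",
        "cells": f"{folder}/{well}_s{site}--cell_outlines.png",
    }

# Hypothetical values: the batch name and the well/site formats are guesses.
paths = outline_paths(batch="EXAMPLE_BATCH", plate="BR00107338", well="F23", site="3")
for kind, path in paths.items():
    print(kind, path)
```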

@jatinarora-upmc
Collaborator Author

Actually, i have only two random images per cell lines, which were kindly provided by Shantanu.
@shntnu could you please help in pulling out all images from all wells for the cell lines in the table above?

@shntnu
Collaborator

shntnu commented Mar 21, 2021

@jatinarora-upmc I've updated the files that we used in #35, and it now includes the outline files as well.

Recap:

  • Metadata is here
  • The sample_images.csv file used in the script below is here
  • Download them all like this:
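# Directory that will hold the downloaded images and outlines
IMAGE_DIR=/tmp/cmqtl

mkdir -p "$IMAGE_DIR"

# Collect the unique plate names (first CSV column), skipping the header row
cut -d"," -f1 data/sample_images.csv | grep -v Metadata_Plate | sort -u > /tmp/plates.txt

# Create one sub-directory per plate
parallel -a /tmp/plates.txt --no-run-if-empty mkdir -p "$IMAGE_DIR"/{}

# Download every row of the CSV: column 5 is fetched and saved under the
# name in column 4, inside the folder named by column 1
parallel \
 --header ".*\n" \
 -C "," \
 -a data/sample_images.csv \
 --eta \
 --joblog "${IMAGE_DIR}/download.log" \
 wget -q -O "${IMAGE_DIR}"/{1}/{4} {5}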

@jatinarora-upmc
Collaborator Author

Hi @shntnu , thanks much for the files.
Are these images and outlines randomly sampled or they are for the cell lines I listed in the table above?
I am asking because I see that images for one plate (cmqtlpl261-2019-mt) is missing.

@shntnu
Collaborator

shntnu commented Mar 24, 2021

they are for the cell lines I listed in the table above?

Yes

I am asking because I see that images for one plate (cmqtlpl261-2019-mt) is missing.

cmqtlpl261-2019-mt is not listed above

@jatinarora-upmc
Collaborator Author

@bethac07 hi Beth. Shantanu has provided me the images and outlines of all cell lines in question.
Could you please help me with the next steps to make sure that PRLR's associations are not technical artifacts?

@bethac07
Contributor

Sure, you'll want to do something so you can look at the outlines at the same time as the images (ideally, literally on top of the images); this could be an ImageJ script, a CellProfiler pipeline, something in your favorite scripting language, etc.

I'm happy to put together a quick CellProfiler pipeline for you to do that if that's helpful, just send me a zipped folder with all of the images (raw + outlines) from one field of view and LMK what version of CellProfiler you have.
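For the "favorite scripting language" route, the core of an overlay is just painting outline pixels onto the raw image. The sketch below works on plain nested lists so it stays self-contained; in practice the images would be loaded with an imaging library, and the function name `overlay_outlines` and the toy arrays are assumptions for illustration.

```python
def overlay_outlines(gray, outline, color=(255, 0, 0)):
    """Return an RGB image (nested lists) with outline pixels painted in `color`.

    gray: 2D list of 0-255 intensities.
    outline: 2D list of the same shape; any non-zero value marks an outline pixel.
    """
    if len(gray) != len(outline) or any(len(g) != len(o) for g, o in zip(gray, outline)):
        raise ValueError("image and outline must have the same shape")
    return [
        [color if o else (g, g, g) for g, o in zip(gray_row, outline_row)]
        for gray_row, outline_row in zip(gray, outline)
    ]

# Toy 2x2 example: one outline pixel in the top-right corner.
gray = [[10, 20], [30, 40]]
outline = [[0, 1], [0, 0]]
rgb = overlay_outlines(gray, outline)
print(rgb)
```

Painting the nuclei and cell outlines in different colors onto the same image makes it easier to spot a nucleus that was split or a cell boundary that cuts through DNA stain.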

@jatinarora-upmc
Collaborator Author

jatinarora-upmc commented Mar 25, 2021

@bethac07 Hi Beth, it would be really helpful to have a script, as I am almost entirely unfamiliar with ImageJ and CellProfiler pipelines. Thanks very much. Here is the link to the images from all 5 channels from two fields of view (f03 and f05) and all cell outline images. To note, this is for only 1 cell line with rare variants in the PRLR gene.
https://drive.google.com/file/d/1DAvhlAOOnaavqdRZlSnj08UY1eWuQPo7/view?usp=sharing
Please let me know if i am missing anything.

@bethac07
Contributor

I don't have any prewritten scripts to do that; if you want to do it in a script, I would suggest you do it by modifying whatever code you used to create the views above.

If you're willing to go to CellProfiler.org though and just download the program, I can send you a pipeline so that in theory you just drag the pipeline to where it says "drag and drop pipeline", drag and drop your images to where it says "drag and drop images", set the folder for output to go to, and then click "analyze".

@bethac07
Contributor

(if you want me to do that, let me know if you plan to include only the cell outlines or also the nuclear outlines; my suggestion would be to do both, but only the cell outlines were included in the folder you sent)

@jatinarora-upmc
Collaborator Author

jatinarora-upmc commented Mar 25, 2021

@bethac07 Hi Beth, yeah sure, it would also be great to have a pipeline that I can import, so I can try to explore CellProfiler by myself. I plan to use only cell outlines for now, as those are the only ones I have.

@jatinarora-upmc
Collaborator Author

@shntnu hello Shantanu, i noticed that png images for nuc profiles were empty (0kb). Could you please check?

@bethac07
Contributor

Pipeline is here. It is set to match outline images to raw images by well and site, not plate because I don't know how you're designating plate on your system; you may have to add metadata extraction for "Plate" as well. (You will need to do this if any individual well position (ie A01) is used more than once- you know you will have to, because the system will yell at you saying that some things in NamesAndTypes can't be matched; in that case, if you let me know how files are organized on your system I can quickly adjust the pipeline).
cmqtl_outline_overlay.cppipe.zip

@jatinarora-upmc
Collaborator Author

@bethac07 Hi Beth, thanks for the reply.

  • i do have all fields of view (n=9) per well for cell lines in question.
  • the CellProfiler version used for the images is 3.1.8. Is your pipeline also in this version?
  • yes, I do have the same well on different plates belonging to different cell lines. For example, well F23 on plate BR00107338 belongs to cell line 260 and the same well on plate BR00107339 belongs to cell line 214.
  • there are 7 plates which are named as cmqtlpl261-2019-mt, cmqtlpl1.5-31-2019-mt, cmQTLplate7-7-22-20, BR00106709, BR00106708, BR00107338, BR00107339. Could you please adjust the pipeline for plate also?

@bethac07
Contributor

This pipeline is the most recent version of CellProfiler (4.1.3), since that's what you'll be downloading if you don't currently have it on your computer. It doesn't matter if the versions aren't the same since it's literally just adding the existing outlines to the existing images, no calculations are being done.

I need to know how the folders are arranged on your system to capture plate in the pipeline.

@jatinarora-upmc
Collaborator Author

@bethac07 sure, the plates are arranged as individual folders (screenshot)
image

@bethac07
Contributor

Are the images then in those top-level plate folders? I.e. is it BR00106708/r00c00etc, or is it BR00106708/somesubfolder1/somesubfolder2/r00c00etc?

@jatinarora-upmc
Collaborator Author

the images are under these top-level plate folders.
An example of image would be BR00106708/r01c13f01p01-ch1sk1fk1fl1.tiff
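With this folder layout, the plate can be taken from the parent folder name, which is what keeps identical well and site names on different plates from being matched to each other. The sketch below shows that grouping in plain Python as an illustration of the idea, not of how the CellProfiler pipeline does it; the outline file name in the example is hypothetical.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

SITE = re.compile(r"r\d{2}c\d{2}f\d{2}p\d{2}")

def group_by_plate_and_site(paths):
    """Group file names by (plate folder, site token)."""
    groups = defaultdict(list)
    for path in paths:
        p = PurePosixPath(path)
        match = SITE.search(p.name)
        if match:
            groups[(p.parent.name, match.group())].append(p.name)
    return dict(groups)

paths = [
    "BR00106708/r01c13f01p01-ch1sk1fk1fl1.tiff",
    "BR00106708/r01c13f01p01-cell_outlines.png",   # hypothetical outline name
    "BR00107338/r01c13f01p01-ch1sk1fk1fl1.tiff",   # same site, different plate
]
groups = group_by_plate_and_site(paths)
for key, names in groups.items():
    print(key, names)
```

Grouping on the site token alone would merge the files from both plates into one set, which is the mismatch the NamesAndTypes warning is about.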

@bethac07
Contributor

@jatinarora-upmc
Collaborator Author

Thanks much @bethac07 for the pipeline. So basically i need to do following steps:

  1. put images (both outlines and all fields of view) under the plate-level directories (as shown above) for selected cell lines in question
  2. browse for images and select folder one level above plate-level directories in CP
  3. change/Set default output folder
  4. click on Analyze images in CP

is this correct?

@bethac07
Contributor

If you drag and drop in the whole folder containing your plate-level directories, it will grab any and all image files there, so I would only drag and drop in the plate level directories if any other subfolders are present that you DON'T want analyzed. You also need to load (via dragging and dropping or File -> Import) the pipeline file I sent at any point between steps 1 and 4. Otherwise, yes, correct

@jatinarora-upmc
Collaborator Author

Hi @bethac07, all worked well, and i have overlaid images now. Now, the idea is to inspect the overlaid images visually and look if there are any segmentation problems, is it right?

@bethac07
Contributor

bethac07 commented Apr 1, 2021 via email

@jatinarora-upmc
Collaborator Author

Great, segmentation seems fine across the many images I have checked so far, but I am still checking for patterns across all images. Thanks so much for your kind help Beth.

@shntnu shntnu closed this as completed May 25, 2022