initial draft of visualize cells #18

ErinWeisbart · 2020-06-08T19:26:26Z

Here's what I've got so far. Need a few things fixed/clarified before I can move on to the rest. See comments for specifics.

ErinWeisbart · 2020-06-08T19:30:13Z

0.preprocess-sites/7.visualize-cell-summary.py

+        cell_count_df.site,
+        categories=(
+            cell_count_df
+            .groupby("site")


1. I don't fully understand what all of the steps here are doing to the site column.

2. I'm having problems with .groupby (here and elsewhere). As I understand it, this should sort the sites in descending order of the sum of their cell_count, but it's not working.

The pd.Categorical() command is the key thing. It is a strange (but powerful) nuance that matters for plotting: with pd.Categorical() one can define the order of an axis containing categorical variables.

So the understanding in 2 is correct. What error are we getting with .groupby?

It doesn't throw an error but from what I can tell it's just not actually doing anything.
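
A minimal sketch of the ordering step discussed above, using made-up site names and counts (the data frame contents are assumptions, not values from this PR). It sums cell_count per site, sorts the sums in descending order, and passes that order as the categories of pd.Categorical(). The result has to be assigned back to the site column; building the categorical without assigning it leaves the data frame unchanged.

```python
import pandas as pd

# Hypothetical example data; real site names and counts will differ.
cell_count_df = pd.DataFrame(
    {
        "site": ["A1-1", "A1-2", "A2-1", "A1-1", "A2-1"],
        "cell_count": [10, 50, 30, 5, 40],
    }
)

# Sites ordered by their total cell_count, largest first.
site_order = (
    cell_count_df.groupby("site")["cell_count"]
    .sum()
    .sort_values(ascending=False)
    .index.tolist()
)

# Assign the ordered categorical back to the column so plots use this order.
cell_count_df["site"] = pd.Categorical(
    cell_count_df["site"], categories=site_order, ordered=True
)
```

A quick check is to inspect cell_count_df["site"].cat.categories and confirm it matches site_order.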

ErinWeisbart · 2020-06-08T19:59:29Z

0.preprocess-sites/7.visualize-cell-summary.py

+
+cell_count_gg
+
+#Same graph as above, separated by Well.


This graph isn't separating by well as desired.

The starting well_order I have here doesn't work. (It was hardcoded before, but this isn't the correct way to fix it.) This messes up the Well column in the df.

The row and ncol will need to be calculated instead of manually set.

well_order probably isn't encoded in the site column, correct? There should be some other column that has well information. If the graph still doesn't work after fixing this, let's set up a time to chat (today if possible).
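
One possible way to calculate the facet layout instead of hardcoding it, sketched with made-up well names (an assumption). It derives the well order from a Well column and picks a roughly square grid. If the plot is built with plotnine (also an assumption here), the resulting values could be passed to facet_wrap through its nrow and ncol arguments.

```python
import math

import pandas as pd

# Hypothetical data with a dedicated Well column.
all_well_count_df = pd.DataFrame(
    {
        "Well": ["A1", "A1", "A2", "A3", "B1", "B2"],
        "cell_count": [10, 5, 30, 12, 8, 20],
    }
)

# Derive the well order from the data rather than hardcoding it.
well_order = sorted(all_well_count_df["Well"].unique())

# Choose a roughly square grid that fits every well.
n_wells = len(well_order)
ncol = math.ceil(math.sqrt(n_wells))
nrow = math.ceil(n_wells / ncol)
```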

ErinWeisbart · 2020-06-08T20:11:32Z

0.preprocess-sites/7.visualize-cell-summary.py

+
+all_well_count_df
+
+a1_sum = all_well_count_df.groupby("Well")["cell_count"].sum()["A1"]


a1_sum and a2_sum should not be hardcoded here (this also affects .ggtitle below).

What is this logic doing? What is the plot for? Is it just the total cell count in each well? We probably don't need that info in the title.

Yes, it's just in the title. I can drop it.
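
If the per-well totals are ever needed again, a sketch like the following avoids hardcoding well names such as "A1". The data values and the title format are illustrative assumptions.

```python
import pandas as pd

# Hypothetical data; the real frame comes from the pipeline.
all_well_count_df = pd.DataFrame(
    {"Well": ["A1", "A1", "A2"], "cell_count": [10, 5, 30]}
)

# One total per well, for however many wells are present.
well_totals = all_well_count_df.groupby("Well")["cell_count"].sum()

# Build a title fragment from every well instead of naming wells by hand.
title = ", ".join(f"{well}: {total}" for well, total in well_totals.items())
```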

gwaybio · 2020-06-09T13:44:52Z

0.preprocess-sites/7.visualize-cell-summary.py

+quality_func = core_args["categorize_cell_quality"]
+
+barcode_cols = spot_args["barcode_cols"]
+barcode_cols = ["Metadata_Foci_" + col for col in barcode_cols]


We don't want to add the prefix if it is already there. We had code somewhere to do this check, right? It might be worth splitting out an add_prefix() function at some point.

In 2.process-cells, Metadata_Foci_ is added to the beginning of all columns in the foci_df before it is merged into the compartment csvs. For this step, we filter the columns in the df down to just a subset. But the column names in the config are set before they are renamed with Metadata_Foci_, so the prefix has to be added here.
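
A minimal sketch of what such an add_prefix() helper could look like. The function does not exist in the repository as far as this thread shows; its name, signature, and the example column names are assumptions.

```python
def add_prefix(columns, prefix="Metadata_Foci_"):
    """Return columns with prefix added, skipping names that already have it."""
    return [col if col.startswith(prefix) else prefix + col for col in columns]


# Hypothetical column names: one bare, one already prefixed.
barcode_cols = add_prefix(["Barcode_A", "Metadata_Foci_Barcode_B"])
```

Calling the helper twice on the same list gives the same result, so it is safe to use whether or not the config names were already renamed.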

gwaybio · 2020-06-09T13:48:08Z

0.preprocess-sites/7.visualize-cell-summary.py

+        cell_count_df.site,
+        categories=(
+            cell_count_df
+            .groupby("site")


The pd.Categorical() command is the key thing. It is a strange (but powerful) nuance that matters for plotting: with pd.Categorical() one can define the order of an axis containing categorical variables.

So the understanding in 2 is correct. What error are we getting with .groupby?

gwaybio · 2020-06-09T13:50:06Z

0.preprocess-sites/7.visualize-cell-summary.py

+
+cell_count_gg
+
+#Same graph as above, separated by Well.


well_order probably isn't encoded in the site column, correct? There should be some other column that has well information. If the graph still doesn't work after fixing this, let's set up a time to chat (today if possible).

gwaybio · 2020-06-09T13:51:34Z

0.preprocess-sites/7.visualize-cell-summary.py

+
+all_well_count_df
+
+a1_sum = all_well_count_df.groupby("Well")["cell_count"].sum()["A1"]


What is this logic doing? What is the plot for? Is it just the total cell count in each well? We probably don't need that info in the title.

ErinWeisbart · 2020-06-10T03:39:42Z

I left out a fair bit of what was included in the analysis of CP074B at this step. The sections I didn't include seemed to address the quality of cells with very large numbers of spots, which we have likely explored sufficiently already. We can always add them in later if it becomes a question we need to revisit.

gwaybio

Looks great, with a couple of minor tweaks to consider. Feel free to merge once you're happy!

gwaybio · 2020-06-10T15:12:20Z

0.preprocess-sites/7.visualize-cell-summary.py

+
+output_folder = pathlib.Path(output_resultsdir, "cells")
+os.makedirs(output_folder, exist_ok=True)
+output_file = pathlib.Path(output_folder, "cell_count.tsv")


Are we making this file elsewhere? Do we need to resave it? I do not know, but it feels like we already have this info.

We make it for each site in 2.process-cells, but I think this is the only time that we make it for the whole batch.
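
A rough sketch of writing one batch-level file from per-site counts. The per-site frames, site names, and temporary directory are stand-ins (assumptions) for the real pipeline outputs, and pathlib's mkdir is used here in place of os.makedirs.

```python
import pathlib
import tempfile

import pandas as pd

# Hypothetical per-site count tables.
site_counts = [
    pd.DataFrame({"site": ["site_1"], "cell_count": [120]}),
    pd.DataFrame({"site": ["site_2"], "cell_count": [95]}),
]

# Combine all sites into one batch-level table.
batch_count_df = pd.concat(site_counts, ignore_index=True)

with tempfile.TemporaryDirectory() as output_resultsdir:
    output_folder = pathlib.Path(output_resultsdir, "cells")
    output_folder.mkdir(parents=True, exist_ok=True)
    output_file = output_folder / "cell_count.tsv"

    # Write the batch table as tab-separated text and read it back.
    batch_count_df.to_csv(output_file, sep="\t", index=False)
    reloaded_df = pd.read_csv(output_file, sep="\t")
```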

gwaybio · 2020-06-10T15:21:31Z

0.preprocess-sites/7.visualize-cell-summary.py

+spot_score_mean_cols = ["Metadata_Foci_" + col + "_mean" for col in spot_score_cols]
+
+input_basedir = cell_args["output_basedir"]
+metadata_foci_col = cell_args["metadata_merge_columns"]["cell_quality_col"]


Do we need to add this to the example site_processing_config.yml?

Yes! I forgot to push the updated config file. It's added to this PR now.

gwaybio · 2020-06-10T15:27:51Z

0.preprocess-sites/7.visualize-cell-summary.py

+metadata_col_list.append(foci_site_col)
+
+input_dir = pathlib.Path(input_basedir, batch, "paint")
+sites = os.listdir(input_dir)


We should update this to exclude ignore_files:

https://github.com/broadinstitute/pooled-cell-painting-profiling-recipe/blob/master/0.preprocess-sites/2.process-cells.py#L87
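
A sketch of one way to skip ignored entries when listing sites. It is not copied from the linked line in 2.process-cells.py; the helper name list_sites, the folder names, and the ".DS_Store" entry are assumptions for illustration.

```python
import pathlib
import tempfile


def list_sites(input_dir, ignore_files):
    """Return sorted entry names in input_dir, skipping ignored names."""
    return sorted(
        entry.name
        for entry in pathlib.Path(input_dir).iterdir()
        if entry.name not in ignore_files
    )


with tempfile.TemporaryDirectory() as tmp_dir:
    input_dir = pathlib.Path(tmp_dir)
    # Two hypothetical site folders plus one file that should be ignored.
    for name in ["site_1", "site_2"]:
        (input_dir / name).mkdir()
    (input_dir / ".DS_Store").touch()

    sites = list_sites(input_dir, ignore_files=[".DS_Store"])
```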

initial draft of visualize cells

67a20ef

ErinWeisbart commented Jun 8, 2020

ErinWeisbart commented Jun 8, 2020

fix missing well

29e8912

ErinWeisbart commented Jun 8, 2020

ErinWeisbart requested a review from gwaybio June 8, 2020 20:12

gwaybio reviewed Jun 9, 2020

ErinWeisbart added 2 commits June 9, 2020 20:26

fixes for first draft

35c4990

parser for .py not .ipynb

04c487e

ErinWeisbart marked this pull request as ready for review June 10, 2020 03:37

gwaybio approved these changes Jun 10, 2020

ErinWeisbart and others added 8 commits June 10, 2020 09:30

config updates for summarize cells

eaf1fab

Format for .py from .ipynb

120b00f

Co-authored-by: Greg Way <gregory.way@gmail.com>

make metadata_col_list pythonic

159538e

Co-authored-by: Greg Way <gregory.way@gmail.com>

update output folders by batch

689df3d

Merge branch 'master' into vizualize-cell-summary

838bfb1

handle ignore files

b8593ad

save summary quality tsv

f0b8c6d

remove hardcoded row/cols from graph

088970a

ErinWeisbart merged commit 0874f14 into broadinstitute:master Jun 11, 2020

ErinWeisbart deleted the vizualize-cell-summary branch June 11, 2020 15:25

gwaybio mentioned this pull request Jul 22, 2020

Metadata in 7.visualize-cell-summary.py #32

Closed
