initial draft of visualize cells #18

Merged

ErinWeisbart
Member

Here's what I've got so far. Need a few things fixed/clarified before I can move on to the rest. See comments for specifics.

cell_count_df.site,
categories=(
cell_count_df
.groupby("site")
Member Author

  1. I don't fully understand what each of these steps does to the site column.
  2. I'm having problems with .groupby (here and elsewhere). As I understand it, this should sort the sites in descending order of their summed cell_count, but it's not working.

Member

The pd.Categorical() command is the key thing. It is a strange (but powerful) nuance that matters for plotting: with pd.Categorical() one can define the order of an axis containing categorical variables.

So the understanding in 2 is correct. What error are we getting with .groupby?

Member Author

It doesn't throw an error, but from what I can tell it's not actually doing anything.
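
The ordering discussed in this thread can be sketched as follows. This is a minimal illustration rather than the script's actual code: the toy `cell_count_df` values and the intermediate name `site_order` are assumptions. One common reason a chain like this appears to do nothing (not confirmed to be the cause here) is that the result of `pd.Categorical()` is never assigned back to the column.

```python
import pandas as pd

# Toy data standing in for the real cell_count_df (values are made up).
cell_count_df = pd.DataFrame(
    {
        "site": ["A1-1", "A1-2", "A2-1", "A1-1", "A2-1", "A1-2"],
        "cell_count": [10, 5, 30, 2, 1, 20],
    }
)

# Sum cell_count per site, then sort the sites from largest to smallest total.
site_order = (
    cell_count_df
    .groupby("site")["cell_count"]
    .sum()
    .sort_values(ascending=False)
    .index
    .tolist()
)

# Assign the ordered categorical back to the column; without the assignment
# the DataFrame is left unchanged and the plot axis keeps its default order.
cell_count_df["site"] = pd.Categorical(
    cell_count_df["site"], categories=site_order, ordered=True
)
```

Plotting libraries that respect categorical order will then draw the sites in the order given by `site_order`.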


cell_count_gg

#Same graph as above, separated by Well.
Member Author

  1. This graph isn't separating by well as desired.
  2. The starting well_order I have here doesn't work. (It was hardcoded before, but this isn't the correct way to fix it.) It messes up the Well column in the df.
  3. The row and ncol values will need to be calculated instead of set manually.

Member

well_order probably isn't encoded in the site column, correct? There should be some other column that has well information. If the graph still doesn't work after fixing this, let's set up a time to chat (today if possible).
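
As a rough sketch of the two fixes raised in this thread, the well order can be read from the Well column itself and the facet grid dimensions can be derived from the number of wells. The toy data, the near-square grid heuristic, and the names `n_wells`, `nrow`, and `ncol` are assumptions for illustration, not the PR's actual solution.

```python
import math

import pandas as pd

# Toy data standing in for the real all_well_count_df (values are made up).
all_well_count_df = pd.DataFrame(
    {
        "Well": ["A1", "A1", "A2", "A2", "B1"],
        "site": ["A1-1", "A1-2", "A2-1", "A2-2", "B1-1"],
        "cell_count": [10, 5, 30, 7, 12],
    }
)

# Derive the well order from the data instead of hardcoding it.
# Note: plain string sorting puts "A10" before "A2"; a real plate layout
# may need a custom sort key.
well_order = sorted(all_well_count_df["Well"].unique())
all_well_count_df["Well"] = pd.Categorical(
    all_well_count_df["Well"], categories=well_order, ordered=True
)

# Compute a near-square grid that is large enough to hold every well.
n_wells = len(well_order)
ncol = math.ceil(math.sqrt(n_wells))
nrow = math.ceil(n_wells / ncol)
```

The resulting `nrow` and `ncol` could then be passed to the faceting call in place of the manually set values.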


all_well_count_df

a1_sum = all_well_count_df.groupby("Well")["cell_count"].sum()["A1"]
Member Author

a1_sum and a2_sum here shouldn't be hardcoded (this also affects .ggtitle below).

Member

What is this logic doing? What is the plot for? Is it just the total cell count in each well? We probably don't need that info in the title.

Member Author

Yes, it's just in the title. I can drop it.
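
If per-well totals are ever wanted again, they can be computed for every well at once rather than indexing "A1" and "A2" by hand. The sketch below assumes toy data; `well_totals` and `title` are hypothetical names, not part of the script.

```python
import pandas as pd

# Toy data standing in for the real all_well_count_df (values are made up).
all_well_count_df = pd.DataFrame(
    {
        "Well": ["A1", "A1", "A2"],
        "cell_count": [10, 5, 30],
    }
)

# One total per well, whatever wells happen to be present in the batch.
well_totals = all_well_count_df.groupby("Well")["cell_count"].sum().to_dict()

# Build a title string from the totals without naming any well explicitly.
title = ", ".join(f"{well}: {total}" for well, total in well_totals.items())
```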

@ErinWeisbart ErinWeisbart requested a review from gwaybio June 8, 2020 20:12
quality_func = core_args["categorize_cell_quality"]

barcode_cols = spot_args["barcode_cols"]
barcode_cols = ["Metadata_Foci_" + col for col in barcode_cols]
Member

We don't want to add the prefix if it is already there. We had code somewhere to do this check, right? It might be worth splitting out an add_prefix() function at some point.

Member Author

In 2.process-cells, Metadata_Foci_ is prepended to all columns in the foci_df before it is merged into the compartment csvs. For this step, we filter the columns in the df down to just a subset. But the column names set in the config are the names from before they are renamed with Metadata_Foci_.
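
The add_prefix() helper suggested in this thread might look something like the sketch below. It is an assumption about the intended behavior (skip names that already carry the prefix), not code from the repository, and the example column names are hypothetical.

```python
def add_prefix(columns, prefix="Metadata_Foci_"):
    """Return columns with prefix prepended, skipping names that already have it."""
    return [col if col.startswith(prefix) else prefix + col for col in columns]


# Hypothetical column names: one bare, one already prefixed.
barcode_cols = ["Barcode_MatchedTo_GeneCode", "Metadata_Foci_Barcode_MatchedTo_ID"]
prefixed_cols = add_prefix(barcode_cols)
```

Because the check is per column, calling the helper twice on the same list gives the same result as calling it once.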

@ErinWeisbart ErinWeisbart marked this pull request as ready for review June 10, 2020 03:37
@ErinWeisbart
Member Author

I left out a fair bit of what was included in the analysis of CP074B at this step. The sections I left out seemed to address the quality of cells with very large numbers of spots, which we have likely explored sufficiently already. We can always add them later if that becomes a question we need to revisit.

Member

@gwaybio gwaybio left a comment

Looks great. A couple of minor tweaks should be considered. Feel free to merge once you're happy!

0.preprocess-sites/7.visualize-cell-summary.py

output_folder = pathlib.Path(output_resultsdir, "cells")
os.makedirs(output_folder, exist_ok=True)
output_file = pathlib.Path(output_folder, "cell_count.tsv")
Member

Are we making this file elsewhere? Do we need to resave it? I'm not sure, but it feels like we already have this info.

Member Author

We make it for each site in 2.process-cells, but I think this is the only time that we make it for the whole batch.

spot_score_mean_cols = ["Metadata_Foci_" + col + "_mean" for col in spot_score_cols]

input_basedir = cell_args["output_basedir"]
metadata_foci_col = cell_args["metadata_merge_columns"]["cell_quality_col"]
Member

do we need to add this to the example site_processing_config.yml?

Member Author

yes! I forgot to push the updated config file. It's added to this PR now.

metadata_col_list.append(foci_site_col)

input_dir = pathlib.Path(input_basedir, batch, "paint")
sites = os.listdir(input_dir)
@ErinWeisbart ErinWeisbart merged commit 0874f14 into broadinstitute:master Jun 11, 2020
@ErinWeisbart ErinWeisbart deleted the vizualize-cell-summary branch June 11, 2020 15:25