initial draft of visualize cells #18
cell_count_df.site,
categories=(
cell_count_df
.groupby("site")
- I don't fully understand what all of the steps here are doing to the site column.
- I'm having problems with .groupby (here and elsewhere). I believe this should sort the sites in descending order of the sum of their cell_count, but it isn't working.
The pd.Categorical() command is the key thing. It is a strange (but powerful) nuance that is important for plotting purposes: with pd.Categorical(), one can define the order of an axis containing categorical variables.
So your understanding in point 2 is correct. What error are we getting with .groupby?
It doesn't throw an error, but from what I can tell it isn't actually doing anything.
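A minimal sketch of the ordering step, using a small made-up cell_count_df in place of the real one (the site names and counts are hypothetical). One thing worth checking: pd.Categorical does not reorder the rows of the DataFrame; it only sets the category order that plotnine uses for the axis. Printing the df can therefore look unchanged even when the ordering worked, and inspecting `.cat.categories` is a more direct check. The result also has to be assigned back to the column.

```python
import pandas as pd

# Hypothetical per-site counts standing in for the real cell_count_df.
cell_count_df = pd.DataFrame(
    {
        "site": ["A1-1", "A1-2", "A2-1", "A1-1", "A2-1"],
        "cell_count": [10, 50, 30, 5, 40],
    }
)

# Sum cell_count per site, then list the sites from largest to smallest total.
site_order = (
    cell_count_df.groupby("site")["cell_count"]
    .sum()
    .sort_values(ascending=False)
    .index
    .tolist()
)

# Assign the categorical back; pd.Categorical alone does not modify the df.
cell_count_df = cell_count_df.assign(
    site=pd.Categorical(cell_count_df.site, categories=site_order, ordered=True)
)

# The row order is unchanged, but the category (axis) order now follows the totals.
print(cell_count_df.site.cat.categories.tolist())
# ['A2-1', 'A1-2', 'A1-1']
```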
cell_count_gg

# Same graph as above, separated by Well.
- This graph isn't separating by well as desired.
- The starting well_order I have here doesn't work. It was hardcoded before, but this isn't the correct way to fix it, and it messes up the Well column in the df.
- nrow and ncol will need to be calculated instead of set manually.
well_order probably isn't encoded in the site column, correct? There should be some other column that has well information. If the graph still doesn't work after fixing this, let's set up a time to chat (today if possible).
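A sketch of deriving the well order and the facet grid from the data instead of hardcoding them. It assumes the well labels live in a Well column (as in all_well_count_df) and follow a letter-plus-number pattern such as A1 or B12; `well_sort_key` and `facet_layout` are hypothetical helpers, not existing pipeline code. Note that pd.Categorical silently turns any value missing from `categories` into NaN, which is one way a hand-written well_order can corrupt the Well column; building the order from the column's own unique values avoids that.

```python
import math
import re

import pandas as pd


def well_sort_key(well):
    """Sort key for wells like A1, A2, ..., A10, B1: row letter, then column number.

    Assumes a letter-plus-number naming scheme (hypothetical); labels that
    don't match fall back to plain string order.
    """
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", well)
    if match is None:
        return (well, 0)
    return (match.group(1).upper(), int(match.group(2)))


def facet_layout(wells):
    """Return (well_order, nrow, ncol) for a roughly square facet grid."""
    well_order = sorted(pd.unique(pd.Series(wells).astype(str)), key=well_sort_key)
    n_wells = len(well_order)
    ncol = max(1, math.ceil(math.sqrt(n_wells)))
    nrow = math.ceil(n_wells / ncol)
    return well_order, nrow, ncol


# Hypothetical data; the real frame would be all_well_count_df.
df = pd.DataFrame({"Well": ["B1", "A10", "A2", "A1", "A2"], "cell_count": [1, 2, 3, 4, 5]})
well_order, nrow, ncol = facet_layout(df["Well"])

# Every observed well is in the categories, so no values become NaN.
df["Well"] = pd.Categorical(df["Well"], categories=well_order, ordered=True)
print(well_order, nrow, ncol)
```

The computed values could then be passed to plotnine as `facet_wrap("~Well", nrow=nrow, ncol=ncol)`; the near-square layout is just one reasonable choice.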
all_well_count_df

a1_sum = all_well_count_df.groupby("Well")["cell_count"].sum()["A1"]
a1_sum and a2_sum shouldn't be hardcoded here (this also affects the .ggtitle below).
What is this logic doing? What is the plot for? Is it just the total cell count in each well? We probably don't need that info in the title.
Yes, it's just in the title. I can drop it.
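If the per-well totals are still wanted somewhere (for example in a printed summary rather than the plot title), they can be computed for every well in one groupby instead of hardcoding a1_sum and a2_sum. A minimal sketch with a made-up all_well_count_df; the summary string format is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for all_well_count_df.
all_well_count_df = pd.DataFrame(
    {"Well": ["A1", "A1", "A2", "B1"], "cell_count": [10, 20, 5, 7]}
)

# One total per well, whatever wells are present (groupby sorts the keys).
well_totals = all_well_count_df.groupby("Well")["cell_count"].sum()

summary = ", ".join(f"{well}: {total}" for well, total in well_totals.items())
print(summary)
# A1: 30, A2: 5, B1: 7
```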
quality_func = core_args["categorize_cell_quality"]

barcode_cols = spot_args["barcode_cols"]
barcode_cols = ["Metadata_Foci_" + col for col in barcode_cols]
We don't want to append a prefix if it is already there. We had code somewhere to do this check, right? It might be worth splitting out an add_prefix() function at some point.
In 2.process-cells, Metadata_Foci_ is prepended to every column in foci_df before it is merged into the compartment CSVs. In this step we filter the df down to a subset of those columns, but the column names in the config are set before the Metadata_Foci_ rename, so the prefix has to be added here.
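A sketch of the add_prefix() helper suggested above, which skips names that already carry the prefix. The function and the example column names are hypothetical; the real inputs would be spot_args["barcode_cols"] and spot_score_cols. Because it is idempotent, it is safe whether or not the config names were already renamed.

```python
def add_prefix(cols, prefix="Metadata_Foci_"):
    """Prepend prefix to each column name that doesn't already start with it."""
    return [col if col.startswith(prefix) else prefix + col for col in cols]


# Hypothetical column names: one unprefixed, one already prefixed.
barcode_cols = add_prefix(["Barcode_A", "Metadata_Foci_Barcode_B"])
print(barcode_cols)
# ['Metadata_Foci_Barcode_A', 'Metadata_Foci_Barcode_B']

# The same helper covers the "_mean" columns built from spot_score_cols.
spot_score_cols = ["Score"]  # hypothetical
spot_score_mean_cols = add_prefix([col + "_mean" for col in spot_score_cols])
```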
I left out a fair bit of what was included at this step in the CP074B analysis. The omitted sections addressed the quality of cells with very large numbers of spots, which we have likely explored sufficiently already. We can always add them back later if we need to revisit that question.
Looks great! There are a couple of minor tweaks worth considering. Feel free to merge once you're happy!
output_folder = pathlib.Path(output_resultsdir, "cells")
os.makedirs(output_folder, exist_ok=True)
output_file = pathlib.Path(output_folder, "cell_count.tsv")
Are we making this file elsewhere? Do we need to re-save it? I'm not sure, but it feels like we already have this info.
We make it for each site in 2.process-cells, but I think this is the only time that we make it for the whole batch.
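If the batch-level file is kept, a minimal sketch of writing it with pathlib alone, using `Path.mkdir(parents=True, exist_ok=True)` in place of os.makedirs. The DataFrame and output_resultsdir below are stand-ins for the notebook's objects; the tab separator matches the .tsv extension, and `index=False` is an assumption about the desired layout.

```python
import pathlib
import tempfile

import pandas as pd

# Stand-ins for the batch-wide counts and the configured results directory.
cell_count_df = pd.DataFrame({"site": ["A1-1", "A2-1"], "cell_count": [15, 70]})
output_resultsdir = tempfile.mkdtemp()

# Create results/cells if needed, then write a tab-separated file without the index.
output_folder = pathlib.Path(output_resultsdir, "cells")
output_folder.mkdir(parents=True, exist_ok=True)
output_file = pathlib.Path(output_folder, "cell_count.tsv")
cell_count_df.to_csv(output_file, sep="\t", index=False)
```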
spot_score_mean_cols = ["Metadata_Foci_" + col + "_mean" for col in spot_score_cols]

input_basedir = cell_args["output_basedir"]
metadata_foci_col = cell_args["metadata_merge_columns"]["cell_quality_col"]
Do we need to add this to the example site_processing_config.yml?
yes! I forgot to push the updated config file. It's added to this PR now.
metadata_col_list.append(foci_site_col)

input_dir = pathlib.Path(input_basedir, batch, "paint")
sites = os.listdir(input_dir)
We should update this to exclude ignore_files.
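A minimal sketch of excluding ignore_files when listing sites. The contents of ignore_files and the temporary input_dir are hypothetical stand-ins; in the pipeline, ignore_files would presumably come from the config. Sorting is an extra step here, added because os.listdir returns entries in arbitrary order.

```python
import os
import pathlib
import tempfile

# Hypothetical list of entries to skip; the real one would come from the config.
ignore_files = [".DS_Store"]

# Stand-in for pathlib.Path(input_basedir, batch, "paint").
input_dir = pathlib.Path(tempfile.mkdtemp())
for name in ["site_1", "site_2"]:
    (input_dir / name).mkdir()
(input_dir / ".DS_Store").touch()

# Keep every entry except the ignored ones, in a stable order.
sites = sorted(x for x in os.listdir(input_dir) if x not in ignore_files)
print(sites)
# ['site_1', 'site_2']
```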
Co-authored-by: Greg Way <gregory.way@gmail.com>
Here's what I've got so far. Need a few things fixed/clarified before I can move on to the rest. See comments for specifics.