Frozen data version 1 #63
also adding recoded dose to all annotated profiles
and run black
@shntnu - this is good to go. Sorry for the HUGE number of files (most are just profiles). Please pay extra attention to any updated documentation. Any code changes will require a complete rerun (which I'll only do if absolutely necessary). If necessary, I can address #65 simultaneously. The next step will be to update to dvc!
]

# Output option
float_format = "%5g"
compression = "gzip"
compression_options = {"method": "gzip", "mtime": 1}
hooray!
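For readers wondering why `"mtime": 1` matters in `compression_options`: gzip stores a modification timestamp in its header, so writing identical data twice normally produces different bytes, which is noisy for files tracked in git lfs or dvc. The sketch below uses only the Python standard library to illustrate the effect; it is not the pipeline's code, and the helper name `gzip_bytes` and the toy CSV row are assumptions made for illustration. pandas documents passing a dict like the one above as `compression=` to `to_csv`.

```python
import gzip
import io


def gzip_bytes(payload: bytes, mtime=None) -> bytes:
    """Gzip a payload in memory; a fixed mtime makes the output reproducible."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb", mtime=mtime) as handle:
        handle.write(payload)
    return buffer.getvalue()


# Toy CSV row (hypothetical data) formatted with the same "%5g" float format.
csv_text = "Metadata_Well,feature\nA01,%5g\n" % 0.123456789
payload = csv_text.encode("utf-8")

first = gzip_bytes(payload, mtime=1)
second = gzip_bytes(payload, mtime=1)

# With the timestamp pinned, repeated writes are byte-identical.
print(first == second)
```

Without a pinned `mtime`, the header carries the current time, so two runs made at different seconds would differ even though the decompressed content is the same.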
python profiling_pipeline.py --batch "2017_12_05_Batch2" --plate_prefix "BR" --well_col "Metadata_Well" --plate_col "Metadata_Plate" --extract_cell_line
```bash
# Make sure you are in the profiles/ directory
./run.sh
```
cool!
if batch == "2017_12_05_Batch2":
    spherize_df = (
        profile_df
        .groupby("Metadata_cell_line")
No need to edit but pretty sure you don't need this logic (grouping by cell line is trivially valid in batch 1 as well)
Regarding spherizing overall, we will make a lot of changes over time which are likely to improve the quality:
- Also group by timepoint and not just cell line because if we don't we might be effectively factoring out subspaces we care about
- Figure out epsilon
- Drop outliers prior to spherizing
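A rough sketch of the first two points, assuming a plain ZCA whitening transform with an `epsilon` regularizer applied separately within each cell line and timepoint group. This is not the notebook's code and not pycytominer's implementation: the column name `Metadata_time_point`, the feature names, the helper names, and the `epsilon` value are all hypothetical.

```python
import numpy as np
import pandas as pd


def zca_whiten(x: np.ndarray, epsilon: float = 1e-6) -> np.ndarray:
    """ZCA-whiten a samples-by-features matrix; epsilon stabilizes small eigenvalues."""
    centered = x - x.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0, None)  # guard against tiny negative values
    transform = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + epsilon)) @ eigvecs.T
    return centered @ transform


def spherize_by_group(profile_df, group_cols, feature_cols, epsilon=1e-6):
    """Whiten features within each group so groups are not mixed (assumes a unique index)."""
    pieces = []
    for _, group_df in profile_df.groupby(group_cols):
        out = group_df.copy()
        out[feature_cols] = zca_whiten(group_df[feature_cols].to_numpy(), epsilon)
        pieces.append(out)
    return pd.concat(pieces).loc[profile_df.index]


# Hypothetical demo data: two cell lines, two timepoints, two correlated features.
rng = np.random.default_rng(0)
n = 200
demo_df = pd.DataFrame(
    {
        "Metadata_cell_line": np.repeat(["line_1", "line_2"], n // 2),
        "Metadata_time_point": np.tile(np.repeat([24, 48], n // 4), 2),
        "feat_a": rng.normal(size=n),
        "feat_b": rng.normal(size=n),
    }
)
demo_df["feat_b"] += 0.8 * demo_df["feat_a"]

group_cols = ["Metadata_cell_line", "Metadata_time_point"]
feature_cols = ["feat_a", "feat_b"]
spherized_df = spherize_by_group(demo_df, group_cols, feature_cols)
```

Grouping by both columns means each group's own covariance is removed, rather than a pooled covariance that could include the timepoint differences we care about. A larger `epsilon` shrinks low-variance directions less aggressively, which is the trade-off behind "figure out epsilon".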
> Also group by timepoint and not just cell line because if we don't we might be effectively factoring out subspaces we care about

Nice catch, I'll update this.

> Figure out epsilon

I think I'll skip this one for data freeze version 1. It seems too far off in the future to wait on.

> Drop outliers prior to spherizing

Yep, we already do this. We're discussing in #65 (comment) and once we decide there, I'll run this notebook again with all the suggested changes (I'll also add drop outliers to the consensus signature generation notebook).
Great! I agree you should skip epsilon optimization, and it's great you're updating 1. as well
And great that the outlier features will be gone! For dropping outlier samples, we need new functionality in pycytominer, right? See cytomining/pycytominer#140
pycytominer currently has a very crude outlier removal strategy (here), which can be specified as an operation in feature_select().
It should easily handle Michael's features as defined in #65, but you're right, the method needs to be improved in the future. Thanks for opening that pycytominer issue!
IIUC, it needs to be a different operation altogether (a row filter, not a column filter; drop_outlier_features is the latter), but we can discuss this in cytomining/pycytominer#140
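To make the row filter versus column filter distinction concrete, here is a minimal pandas sketch. It does not call pycytominer; the function names, the cutoff value, and the toy data are hypothetical, and the thresholding rule (any absolute value above a cutoff) is only an assumed stand-in for whatever criterion is eventually chosen.

```python
import pandas as pd


def drop_outlier_columns(df, feature_cols, cutoff):
    """Column filter: drop every feature that has any |value| above the cutoff."""
    bad_features = [col for col in feature_cols if df[col].abs().max() > cutoff]
    return df.drop(columns=bad_features)


def drop_outlier_rows(df, feature_cols, cutoff):
    """Row filter: drop every sample that has any |feature value| above the cutoff."""
    keep = (df[feature_cols].abs() <= cutoff).all(axis=1)
    return df.loc[keep]


# Hypothetical profiles: one extreme value in feat_b for well A02.
demo = pd.DataFrame(
    {
        "Metadata_Well": ["A01", "A02", "A03"],
        "feat_a": [0.1, -0.2, 0.3],
        "feat_b": [1.0, 50.0, -1.5],
    }
)
features = ["feat_a", "feat_b"]

by_column = drop_outlier_columns(demo, features, cutoff=15)  # loses feat_b, keeps all wells
by_row = drop_outlier_rows(demo, features, cutoff=15)  # loses well A02, keeps all features
```

The same extreme value costs a whole feature under the column filter but only one sample under the row filter, which is why the two need to be separate operations.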
All set here from my end.
I focused only on .py and .md files. I didn't see any documentation changes other than the README.md. Did I miss any documentation?
No need to do #65 except perhaps this thing you suggested:
Nope, I think you got it all. Thank you! I can make all of these changes, and we should be good to merge soon
adding dose info to profile readme, adding outlier feature drop to consensus readme
spherize based on both time and cell line, and perform blocklist feature selection beforehand
Alright @shntnu - this is ready for your eyes again. Here is what changed:
Yikes, I totally missed the notification 23 days ago, sorry!
Everything looks good. Thank you for documenting everything point-by-point.
Just one q:
- Adding the new blocklist features 510fe92
Do you want to update this as well? https://figshare.com/articles/dataset/Blacklist_Features_-_Cell_Profiler/10255811
(but no need to wait for that of course)
I don't think so... Although I do think that we want to update this figshare document to include other version-specific CellProfiler blocklists. At the very least, much more thought needs to go into updating it (much more thought for me at least!). As a separate but related note: I really want to do a deep dive into CellProfiler features... I think it's the first step to understanding generic morphology features, which we'll want to annotate with more interpretable biology. It'll also help us with interpreting DeepProfiler features in the future.
There are two resources that I can think of that will be relevant for this effort
I update pycytominer and add associated fixes as described in #62
TODO
In the next PR, I will migrate from git lfs to dvc