Add context-dependent spatial randomization #215
…nv with angelo packages helps
Still need to add testing; this code review is mainly so you can see what's going on.
Most of the logic makes sense. I'm sure there is more optimization to be had, but I noticed two main things. First, I believe the first two nested for loops can be combined; see my comment. Second, I don't think the code actually runs as written; thresh isn't defined, for example.
The way you wrote it is how it's implemented in MATLAB; we changed it for the Python version.
…On Thu, Sep 10, 2020 at 4:15 PM alex-l-kong ***@***.***> wrote:
***@***.**** commented on this pull request.
In ark/utils/spatial_analysis_utils.py:
> +
> + for j in range(num):
> +     # we need to regenerate the positive inds per marker so we can compare them
> +     # with the positive indices for each cell type, needed so we can bootstrap
> +     # in a context-based environment properly
> +     marker1posinds = current_fov_channel_data[current_fov_channel_data.columns[j]] > thresh
> +
> +     # generate the number of positive hits per cell type for a specific marker
> +     # in cell_type_facets or else for all the non-cell_type_facets cell types
> +     cell_type_nums_per_facet_1 = {}
> +     for cell_type, cell_type_data in cell_type_data_per_facet.values():
> +         cell_type_nums_per_facet_1[cell_type] = np.sum(
> +             np.logical_and(marker1posinds, cell_type_data.index))
> +
> +     # new iteration, needed to properly generate a pair of 1 vs 2 analysis
> +     # basically the same thing as generating cell_type_nums_per_facet_1
I'll double check Erin's code, but yeah, absolutely, I think we can collapse this loop.
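One possible way to collapse the per-marker counting loop discussed above, as a minimal sketch and not the code in this PR: threshold all marker columns at once, then group rows by facet and sum. The function name `count_positive_per_facet`, the argument names, the toy data, and the threshold value are all hypothetical.

```python
import numpy as np
import pandas as pd


def count_positive_per_facet(channel_data, facet_labels, thresh):
    """Count marker-positive cells per facet for every marker in one pass.

    channel_data: DataFrame with one row per cell and one column per marker.
    facet_labels: Series with one facet label per row of channel_data.
    thresh: scalar positivity threshold (an assumption for this sketch).
    """
    # 0/1 table of shape (cells, markers); integers so the sum is a count.
    positive = (channel_data > thresh).astype(int)
    # Group rows by facet label and sum: result has shape (facets, markers).
    return positive.groupby(facet_labels.to_numpy()).sum()


# Toy data: four cells, two markers, two facets.
channel_data = pd.DataFrame(
    {"marker_a": [0.9, 0.1, 0.7, 0.8], "marker_b": [0.2, 0.6, 0.9, 0.1]}
)
facet_labels = pd.Series(["tcell", "tcell", "else", "else"])
counts = count_positive_per_facet(channel_data, facet_labels, thresh=0.5)
```

The resulting table holds the per-facet positive counts for every marker, so a pairwise marker loop could look counts up instead of recomputing them per pair.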
Looks like a good start, but looking into optimizing the loops would probably be a good idea.
Added very basic testing for context spatial analysis. The thing to look at now is the updated logic, which is now (more) optimized. I'm unsure about one aspect, which I've described in a long TODO; if anyone happens to know anything about that, please feel free to comment! I will be working on integrating this with calculate_channel_spatial_enrichment, which would include a flag indicating whether we want context-randomization, as well as another argument to specify the FlowSOM IDs we want to randomize over.
Looking like it's getting there; I mostly have a few clarification questions
Agree with all of Adam's comments
The logic is sound now, at least when comparing Erin's data between her script and our optimized version. However, we will probably need to wait until next week for full verification. For the record, the context-randomized I do know for a fact that when running Erin's non-optimized MATLAB context-randomization script on her data to produce the real …
…y test_utils function called correctly
Resolved by #451
What is the purpose of this PR?
Addresses and closes #207. Now that we've built a foundation for context-dependent spatial randomization, we can begin work on the actual randomization process. Naive randomization assumes that we do not care about the cell types (i.e., the FlowSOM IDs) associated with the marker list we randomize over. We no longer make that assumption in a context-dependent environment.
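To illustrate the difference, here is a minimal sketch of one plausible reading of context-dependent randomization: marker-positive flags are permuted only among cells of the same type, so each cell type keeps its number of positive cells, whereas a naive shuffle would permute across all cells. The function name `shuffle_within_cell_type` and the toy data are hypothetical and are not taken from this PR.

```python
import numpy as np


def shuffle_within_cell_type(positive, cell_types, rng):
    """Permute marker-positive flags among cells of the same type only."""
    positive = np.asarray(positive, dtype=bool)
    cell_types = np.asarray(cell_types)
    shuffled = positive.copy()
    for cell_type in np.unique(cell_types):
        # Positions of all cells belonging to this cell type.
        idx = np.flatnonzero(cell_types == cell_type)
        # Reassign this type's flags at random among its own cells.
        shuffled[idx] = rng.permutation(positive[idx])
    return shuffled


rng = np.random.default_rng(0)
positive = np.array([True, False, False, True, True, False])
cell_types = np.array([1, 1, 1, 2, 2, 2])
shuffled = shuffle_within_cell_type(positive, cell_types, rng)
```

Each cell type retains the same count of positive cells after shuffling; only which cells of that type carry the flag changes.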
How did you implement your changes?
We will be working primarily out of a new function, compute_close_cell_num_random_context. The good news is that compute_close_cell_num does not need to change; only the randomization process we compare against does. For channel enrichment, the user can specify a list of FlowSOM IDs to facet over, and all non-specified IDs are grouped into an 'else' category. Erin would also like the user to be able to specify parameters that tune the randomization process. I'm not sure yet how this would be done, but we'll look into it once the basic logic is settled.
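The faceting rule described above can be sketched as follows. This is an illustration only: `assign_facets` and its arguments are hypothetical names, not part of the PR.

```python
def assign_facets(cell_type_ids, facet_ids):
    """Map each cell's FlowSOM ID to its own facet, or to 'else' if unlisted."""
    facet_ids = set(facet_ids)  # set membership keeps the lookup cheap
    return [cid if cid in facet_ids else "else" for cid in cell_type_ids]


# IDs 2 and 5 are faceted individually; every other ID falls into 'else'.
facets = assign_facets([1, 2, 3, 2, 5], facet_ids=[2, 5])
```

Randomization would then be carried out separately within each resulting facet.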
Remaining issues
Non-optimal code is the biggest issue right now. To keep everything clear, I brute-forced my way through the initial implementation; it will need to be optimized for efficiency.
Some code had to be duplicated from compute_close_cell_num to ensure markers by type were indexed properly. It would be good to remove some or all of that duplication.