compute_pixel_cluster_channel_avg kills the kernel #548

Closed
ngreenwald opened this issue May 17, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@ngreenwald
Member

Please refer to our FAQ and look at our known issues before opening a bug report.

Describe the bug
When running `apply_pixel_meta_cluster_remapping`, the kernel often dies when re-computing the channel averages. Given that this is computed across all of the FOVs, will this be prohibitive for analyzing larger cohorts? It won't matter how much we subset if this calculation needs to be computed from all of the underlying data.

@ngreenwald added the bug label May 17, 2022
@alex-l-kong
Contributor

Depending on what we find, this may be related to #518.

@alex-l-kong
Contributor

@ngreenwald I'll be doing some memprofing on this one to see if there's a bottleneck in the way this is being computed.

@alex-l-kong
Contributor

alex-l-kong commented May 20, 2022

@ngreenwald yeah, we do have a massive memory problem on our hands. I used the pympler memory profiler to check each iteration of compute_pixel_cluster_channel_avg, and here's what we're getting on Candace's dataset about 25 FOVs in:

[Screenshot, 2022-05-20: pympler memory profile of compute_pixel_cluster_channel_avg]
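
For anyone wanting to reproduce this kind of per-iteration check, here is a minimal sketch. It uses the standard-library `tracemalloc` module as a stand-in for pympler, and `profile_iterations` and `leaky_step` are hypothetical names made up for illustration, not functions from this repo:

```python
import tracemalloc


def profile_iterations(step, n_iterations):
    """Run step(i) n_iterations times and record traced memory after each call.

    Returns a list of (current_bytes, peak_bytes) tuples, one per iteration.
    """
    tracemalloc.start()
    readings = []
    try:
        for i in range(n_iterations):
            step(i)
            readings.append(tracemalloc.get_traced_memory())
    finally:
        tracemalloc.stop()
    return readings


# Hypothetical stand-in for one per-FOV iteration: it keeps every chunk
# alive, so traced memory grows from one iteration to the next.
retained = []


def leaky_step(i):
    retained.append(bytearray(1_000_000))


readings = profile_iterations(leaky_step, 5)
for i, (current, peak) in enumerate(readings):
    print(f"iteration {i}: current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")
```

If the "current" figure keeps climbing by roughly the size of one FOV's data per iteration, something is holding on to the per-FOV frames.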

It's understandable that the amount of pandas.DataFrame memory goes up per iteration: we are appending after all. However, there should not be this much memory usage: for a 10x10 SOM and 30 FOVs, there should only be at most 3000 rows of data, and we don't have any large blob objects or the like either.
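
As a rough back-of-the-envelope check on that expectation (the channel count of 50 is an assumption for illustration, not a number from this dataset):

```python
som_dim = 10        # 10x10 SOM, as above
n_fovs = 30
n_channels = 50     # assumed for illustration only
bytes_per_value = 8 # one float64 per channel average

n_clusters = som_dim * som_dim
n_rows = n_clusters * n_fovs
approx_bytes = n_rows * n_channels * bytes_per_value

print(f"{n_rows} rows, about {approx_bytes / 1e6:.1f} MB of float64 data")
```

Under those assumptions the per-FOV averages come to 3000 rows and roughly 1.2 MB, orders of magnitude below what would exhaust a 30+ GB instance.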

The problem may be that we need to read in the full pixel data frame for each FOV to compute the sums and counts, and the memory is not being cleared properly between iterations.
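
One possible way to keep only a single FOV in memory at a time is to accumulate running sums and counts instead of appending per-FOV frames. This is only a sketch of the idea, not the current implementation: `channel_avg_streaming`, `iter_fovs`, and the `pixel_som_cluster` column name are assumptions for illustration.

```python
import pandas as pd


def channel_avg_streaming(fov_frames, channels, cluster_col="pixel_som_cluster"):
    """Average each channel per cluster across FOVs without concatenating them.

    fov_frames is any iterable that yields one per-FOV DataFrame at a time.
    """
    total_sums = None
    total_counts = None
    for fov_df in fov_frames:
        grouped = fov_df.groupby(cluster_col)
        sums = grouped[channels].sum()   # per-cluster channel sums for this FOV
        counts = grouped.size()          # per-cluster pixel counts for this FOV
        if total_sums is None:
            total_sums, total_counts = sums, counts
        else:
            # fill_value=0 handles clusters missing from one side
            total_sums = total_sums.add(sums, fill_value=0)
            total_counts = total_counts.add(counts, fill_value=0)
    if total_sums is None:
        raise ValueError("no FOV data provided")
    return total_sums.div(total_counts, axis=0)


def iter_fovs():
    """Hypothetical loader: yields small toy frames in place of real FOV files."""
    yield pd.DataFrame({"pixel_som_cluster": [1, 1, 2], "chan_a": [1.0, 3.0, 10.0]})
    yield pd.DataFrame({"pixel_som_cluster": [1, 2, 2], "chan_a": [5.0, 20.0, 30.0]})


avg = channel_avg_streaming(iter_fovs(), ["chan_a"])
print(avg)
```

Because the accumulators only ever hold one row per cluster, their size does not grow with the number of FOVs; the generator also means each per-FOV frame can be garbage collected once the next one is loaded.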

Regardless of whether this should crash a 30+ GB Docker instance, this is a lot more memory than this function is expected to use. I'll be taking a deeper dive into compute_pixel_cluster_channel_avg.

@alex-l-kong
Contributor

@ngreenwald any status updates on these memory-related issues? Now that we've updated FlowSOM to not run MapDataToCodes when SOM is called, I think we can close this one, since that solved the internal memory issues.
