Clear unused intermediate DataFrame memory in channel averaging function #559

Closed
wants to merge 5 commits into from


alex-l-kong
Contributor

What is the purpose of this PR?

Closes #548. Pixel channel averaging has to run iteratively because the pixel data is batched per FOV. These individual pixel cluster files can be extremely large, and Python does not automatically release them from memory even if the variable is overwritten on the next iteration. This means that for massive datasets, Docker could easily run out of memory. This PR should prevent this from happening.

How did you implement your changes?

compute_pixel_cluster_channel_avg now explicitly flushes out these large, unused DataFrames with del. We do this for the intermediate fov_pixel_data, sum_by_cluster, count_by_cluster, and agg_results variables. This is especially important for the middle two (sum_by_cluster and count_by_cluster), because groupby objects can be very computationally expensive.
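As a rough illustration of the pattern described above, here is a minimal sketch of a per-FOV averaging loop that drops its intermediates with del. It is not the actual compute_pixel_cluster_channel_avg code: the function name channel_avg_per_cluster, the column name pixel_som_cluster, and the idea of passing the FOVs in as an iterable of DataFrames are all assumptions made for the example.

```python
import pandas as pd


def channel_avg_per_cluster(fov_frames, channels, cluster_col="pixel_som_cluster"):
    """Average channel expression per cluster across per-FOV DataFrames.

    Hypothetical sketch: `fov_frames` is any iterable (ideally a generator)
    yielding one DataFrame per FOV; `cluster_col` is an assumed column name.
    """
    sum_total = None
    count_total = None

    for fov_pixel_data in fov_frames:
        grouped = fov_pixel_data.groupby(cluster_col)
        sum_by_cluster = grouped[channels].sum()
        count_by_cluster = grouped.size()

        if sum_total is None:
            sum_total, count_total = sum_by_cluster, count_by_cluster
        else:
            # Align on cluster id; clusters missing from one side count as 0
            sum_total = sum_total.add(sum_by_cluster, fill_value=0)
            count_total = count_total.add(count_by_cluster, fill_value=0)

        # Drop this FOV's names before the next FOV is loaded, so the old
        # and new data do not both stay referenced at the same time
        del fov_pixel_data, grouped, sum_by_cluster, count_by_cluster

    if sum_total is None:
        raise ValueError("no FOV data was provided")

    # Divide each cluster's channel sums by that cluster's pixel count
    return sum_total.div(count_total, axis=0)
```

Passing a generator rather than a list matters here: with a list, every FOV stays referenced by the list itself, and del inside the loop cannot release it.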

Remaining issues

This may not be the only memory adjustment we need to make, but it's a start.

@alex-l-kong alex-l-kong self-assigned this May 20, 2022
@alex-l-kong
Contributor Author

alex-l-kong commented May 20, 2022

@ngreenwald can you see if this fixes your kernel dying error?

UPDATE: if so, I'll see if there are any other places we can use explicit del commands.

@ngreenwald
Member

ngreenwald commented May 20, 2022 via email

@alex-l-kong
Contributor Author

@ngreenwald it does on my end.

@ngreenwald
Member

Nope, still died
image

@ngreenwald
Member

I closed the Docker session, restarted, and it worked. So it seems like memory leaked by previously run cells is left over, which then causes the kernel to die when that cell is run. Can you run profiling on the entire pixel clustering workflow to try to figure out what's causing it?

@ngreenwald
Member

Different version of the trained SOM; this time it died on the first step, "using remapping scheme," even though I had just started up Docker. Given that all of these different versions of the clustering pipeline are using the same data, I don't understand why it sometimes dies and sometimes doesn't.

@ngreenwald
Member

Actually, this was after I switched to a different branch. So it seems like this change successfully addressed the issue underlying the remapping step, since that error cropped up again as soon as I switched, but it did not address the overall memory issues.

@alex-l-kong
Contributor Author

@ngreenwald yeah, I suspected this might happen, which is why I alluded to needing to flush out memory elsewhere using del. I just wanted to see if the remapping step's memory problems were addressed. It seems so, which means I'll now memory-profile the previous steps in the pipeline and add del (or equivalent commands in R, if needed) there.

@alex-l-kong
Contributor Author

alex-l-kong commented May 23, 2022

@ngreenwald I identified create_pixel_matrix as another offending function, and it might be the worst of the bunch. These are the stats after just 15 FOVs on Candace's smaller dataset:

[Screenshot: memory stats, captured 2022-05-23 at 9:58:38 AM]

These are much larger because we have to store both a pixel_mat and a pixel_mat_subset in memory for each FOV. Since we now know that Python doesn't automatically clear these on the next iteration (in spite of the variable being overwritten), this will blow up memory usage.

Explicitly calling del on the no-longer-needed DataFrames and DataArrays should eliminate this problem.
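One way to check whether del actually lowers the peak is to measure it with the standard-library tracemalloc module. The sketch below is only a stand-in for create_pixel_matrix: load_fov and peak_mb are hypothetical helpers, and plain bytearray buffers play the role of pixel_mat and pixel_mat_subset. What it shows is that, without del, the previous iteration's objects are still referenced while the next iteration's are being built, so both generations are resident at the peak.

```python
import tracemalloc

MB = 1024 * 1024


def load_fov(size_mb):
    """Hypothetical stand-in for reading one FOV's pixel matrix."""
    return bytearray(size_mb * MB)


def peak_mb(release_early, n_fovs=3, size_mb=8):
    """Return the peak traced memory (in MB) of a per-FOV loop."""
    tracemalloc.start()
    for _ in range(n_fovs):
        pixel_mat = load_fov(size_mb)
        # Slicing a bytearray copies, similar to materializing a subset
        pixel_mat_subset = pixel_mat[: len(pixel_mat) // 2]
        if release_early:
            # Drop both names before the next FOV is loaded
            del pixel_mat, pixel_mat_subset
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / MB


if __name__ == "__main__":
    print(f"peak without del: {peak_mb(False):.1f} MB")
    print(f"peak with del:    {peak_mb(True):.1f} MB")
```

The exact numbers depend on the interpreter, but the run with del should report the lower peak. Note that tracemalloc only sees allocations made through Python's allocators, so memory held by native libraries may not show up.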

@alex-l-kong
Contributor Author

@ngreenwald the update should ensure large intermediate DataFrames (especially during loops) are released. I tested it on my end without memory errors; can you see if it helps on your end?

@ngreenwald
Member

Crashed on the cluster_pixels function when computing average channel expression. I merged this in with my own branch that I'm working on, and named it combined_branch. I just pushed it; can you take a look to make sure all of your changes were included? If so, then there's still an issue.

@alex-l-kong
Contributor Author

@ngreenwald yeah, everything in combined_branch is there. Since the crash persists even when we free memory during the run, we might need to come up with a different way to address this.

One potential option is to pre-compute this average in R while run_pixel_som.R is processing each FOV. I'm not a huge fan of this option since it would require saving another intermediate file. However, it does remove one per-FOV loop, meaning one fewer pass in which each FOV has to be read into memory.

Another option I'm trying out right now is directly invoking the Python garbage collector using gc. This generally doesn't free memory immediately, but in certain cases with millions of objects, it can help avoid fragmented memory. I'm also not normally a huge fan of this option (since the garbage collector should know how to do its job), but in our case, where we have millions of objects in memory at once, it could help ease things up a bit.
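For reference, a minimal sketch of what invoking the collector between batches could look like. In CPython, objects whose reference count drops to zero are freed right away; gc.collect() only adds something for objects caught in reference cycles. The Node class, make_cycle, and process_batches below are hypothetical and exist purely to create such cycles; nothing here is taken from the pipeline itself.

```python
import gc


class Node:
    """Hypothetical object holding a small payload and a back-reference."""

    def __init__(self):
        self.payload = bytearray(1024)
        self.partner = None


def make_cycle():
    # a and b reference each other, so reference counting alone never
    # frees them once the local names go away
    a, b = Node(), Node()
    a.partner, b.partner = b, a


def process_batches(n_batches, collect_each_batch=True):
    """Create one cycle per batch; optionally force a collection each time.

    Returns the total number of unreachable objects gc.collect() reported.
    """
    collected = 0
    for _ in range(n_batches):
        make_cycle()
        if collect_each_batch:
            # Full collection; returns how many unreachable objects it found
            collected += gc.collect()
    return collected
```

If the large DataFrames are not part of any cycle, a call like this is unlikely to change much, which is consistent with the caveat above that it generally does not free memory immediately.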

@ngreenwald
Member

Just had the kernel die on a completely new Docker instance, running only the remapping function, not the matrix creation, SOM training, etc. This makes me think it's not an issue with leftover memory from other functions.

image

@alex-l-kong
Contributor Author

alex-l-kong commented May 24, 2022

@ngreenwald that's good to know. Let me focus on memory profiling that function to see if there's anything popping up. There wasn't anything when I ran it on my end, but I'll double-check on your branch.

@ngreenwald
Member

This issue was never replicated outside my laptop.

@ngreenwald ngreenwald closed this Aug 26, 2022
@ngreenwald ngreenwald deleted the pixel_channel_mem branch August 26, 2022 05:01
Successfully merging this pull request may close these issues.

compute_pixel_cluster_channel_avg kills the kernel
2 participants