Clustering datasets with >50 million cells #185
FYI, this data is already pre-processed, with dead cells, doublets, and debris excluded. |
Hi Abbey,
Are you seeing this at the FlowSOM step or earlier? This usually turns out to be a data-handling problem, and we have run datasets of that size through FlowSOM without issue. It has come up in a few other functions before, but there are a couple of options to work around it.
Tom
|
Hi Tom, |
@AbbeyFigliomeni Did you get a numerical value where the message says "xx"? Do you know how much RAM this computer has?
Alternatively, in RStudio (I am on Windows 10) there is a pie chart in the Environment tab showing the memory in use; clicking it and requesting a memory usage report shows how much RAM RStudio is using and how much the computer has.
Alternatively, the command sum(sapply(ls(), function(x) object.size(get(x))))/1024^3 reports the amount of RAM (in GiB) currently used by objects in the workspace.
My two cents...
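A minimal sketch of those checks, using only base R (the per-object breakdown is an addition to help spot what is eating memory):

```r
# Total size of all objects in the global environment, in GiB
sum(sapply(ls(), function(x) object.size(get(x)))) / 1024^3

# Per-object breakdown, largest first, to find the biggest offenders
sizes <- sort(sapply(ls(), function(x) object.size(get(x))), decreasing = TRUE)
head(sizes)

# gc() also reports R's current memory usage (the "used" column, in Mb)
gc()
```
|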
Hi Samuel, the number differed when deleting or including certain clustering marker channels, ranging from gigabytes down to megabytes. I have been looking at the pie chart in the Environment tab, and the program has been using a large amount of memory; I will check the RAM. Thanks!
|
I have run into a similar issue with the spatial package, using the stars method to generate the polygons and outlines for quite large IMC images (2000 x 2000 pixels). The error reads "vector memory exhausted (limit reached?)". I used the fix described at https://stackoverflow.com/questions/51295402/r-on-macos-error-vector-memory-exhausted-limit-reached. R was using around 55 GB of RAM, so this fix extended the memory available to R, including virtual memory. Hope this helps.
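For reference, a minimal sketch of that fix, assuming R on macOS (the 100Gb value is only an example; set it to what your machine and swap can support):

```r
# Open ~/.Renviron for editing (usethis is a convenience; any text editor works)
usethis::edit_r_environ()

# Add this line to the file, save, then restart R:
# R_MAX_VSIZE=100Gb

# After restarting, confirm the new limit is picked up
Sys.getenv("R_MAX_VSIZE")
```
|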
Hi All, |
I just tested running FlowSOM on around 50 million cells on my Mac with 24 GB of RAM, and I ran into the same issue. I'll look into it. In the meantime, if you don't have access to a computer with more RAM, as an alternative you can subsample the cells, cluster them, and map the rest onto the clusters. This is not ideal, as the subsample may miss some cell types, causing them to be merged into other cell types that were included in the subsample. Or you can compress your data into supercells using SuperCellCyto (https://github.com/phipsonlab/SuperCellCyto) and run FlowSOM on those supercells. Afterwards, you can expand the supercells back and assign each cell the cluster its supercell belongs to. Disclaimer: I'm the author of SuperCellCyto. A sketch of the subsample-and-map route is below.
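A minimal sketch of the subsample-and-map approach, using the FlowSOM package directly rather than Spectre's wrapper (cell_mat and cluster_cols are placeholder names; FlowSOM >= 2.x assumed):

```r
library(FlowSOM)

# cell_mat: numeric matrix of cells x markers; cluster_cols: clustering markers
set.seed(42)
sub_idx <- sample(nrow(cell_mat), 5e6)   # train on a 5 million cell subsample

fsom <- FlowSOM(cell_mat[sub_idx, ], colsToUse = cluster_cols,
                nClus = 20, seed = 42)

# Map the full dataset onto the trained SOM in chunks to limit peak memory
chunk_ids <- split(seq_len(nrow(cell_mat)),
                   ceiling(seq_len(nrow(cell_mat)) / 1e7))
meta <- unlist(lapply(chunk_ids, function(i)
  as.character(GetMetaclusters(NewData(fsom, cell_mat[i, ])))
))
```

The caveat above still applies: any population absent from the training subsample will be forced into the nearest existing cluster.
|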
Hi all, thanks to all those who weighed in re: the RAM issue, and @ghar1821 for your helpful feedback. Just an update for anyone who is interested or facing the same issue: I managed to substantially decrease the size of my data table by deleting all phenodata columns except patient identifiers, and by removing all other irrelevant objects from my workspace prior to clustering. I successfully clustered (although it took 4 hours with my trusty 16 GB of RAM!), and then re-added my phenodata columns. The rest of the workflow ran as normal. A sketch of this trimming approach is below.
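In case it helps others, a minimal sketch of that trimming approach, assuming the data lives in a data.table called cell_dt (all column and object names here are placeholders):

```r
library(data.table)

# Keep only the clustering markers plus a patient identifier
keep_cols  <- c("PatientID", cluster_cols)
pheno_cols <- setdiff(names(cell_dt), keep_cols)

pheno_backup <- cell_dt[, ..pheno_cols]   # set the phenodata aside
cell_dt      <- cell_dt[, ..keep_cols]

# Remove every other object from the workspace and reclaim memory
rm(list = setdiff(ls(), c("cell_dt", "pheno_backup", "cluster_cols")))
gc()

# ... run clustering on cell_dt ...

# Re-attach the phenodata columns afterwards (row order is unchanged)
cell_dt <- cbind(cell_dt, pheno_backup)
```
|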
@AbbeyFigliomeni nice solution! We'll keep this in mind for when this comes up in future. |
Hi,
Does anyone have experience clustering datasets in excess of 50 million cells? Mine has 59 million cells, an average of 1 million per person, and I keep getting the error "cannot allocate vector of size xx Gb/Mb". FYI, the dataset contains 14 clustering markers of interest.
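For a rough sense of scale, a back-of-the-envelope estimate (assuming the clustering matrix is stored as doubles) shows why the allocation fails:

```r
# Back-of-the-envelope size of the clustering matrix alone
cells   <- 59e6
markers <- 14
cells * markers * 8 / 1024^3   # ~6.2 GiB for a single copy (8 bytes/double)

# R routinely makes 2-4 temporary copies during clustering, so peak usage
# can exceed 16 GB of RAM even though one copy fits comfortably
```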
I understand the FlowSOM algorithm is not designed to handle datasets this large, but I would prefer not to subsample my data prior to clustering, to avoid any loss of data or effects due to random sampling.
Any suggestions would be greatly appreciated! :)