Ran out of memory on a 1.6M-point dataset with 300 dimensions. #56
Comments
Thank you! Which version of UMAP are you using? This should not be happening with the memory you have and the size of the dataset.
It's 0.4.6. It comes with Top2Vec when I install Top2Vec with pip.
Do you have a screenshot of the error?
There are no details for the error, but I am pretty sure it ran out of memory. I use PyCharm to run the notebook and also monitor the memory from the terminal. When UMAP runs it eats all the memory, and the notebook server reboots.
You could try UMAP with init='random'.
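For concreteness, the low-memory parameter combination being discussed might look like this. This is a sketch: `init='random'` and `low_memory` are real UMAP 0.4 keyword arguments, but the other values are assumptions based on Top2Vec's documented defaults.

```python
# Keyword arguments that reduce UMAP's peak memory use.
# init='random' skips the spectral initialization (which builds a large
# graph Laplacian); low_memory=True uses a slower nearest-neighbor-descent
# variant with a smaller footprint.
umap_kwargs = {
    "n_neighbors": 15,   # UMAP's default
    "n_components": 5,   # Top2Vec reduces 300-d doc vectors to 5-d
    "metric": "cosine",  # Top2Vec's default metric
    "init": "random",    # avoid the spectral init's memory spike
    "low_memory": True,  # trade speed for less RAM
}

# Usage (assumes umap-learn is installed):
# import umap
# embedding = umap.UMAP(**umap_kwargs).fit_transform(doc_vectors)
```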
And all this happens in 2 minutes. I know that Doc2Vec takes about 8 GB of memory, but there are still 24 GB left for UMAP to use. The author of UMAP mentioned it's memory hungry, but that hungry? OK, I will try with init='random'. Thank you, I will let you know the result.
Hi, I just tried with init='random'. The issue is the same: OOM.
I have run Top2Vec on my own laptop with a 1.2 million document dataset with no issues, and it has less RAM than your system. This seems to be a UMAP problem, so unfortunately I cannot help any further.
Hi, I think I just need more RAM. I just did some tests with fewer points: 20k points take 7 GB, and 80k points take 21 GB. Thus, 1.6M should take at least 42 GB. I think UMAP really needs to optimize its RAM usage in the next version.
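As a sanity check on the extrapolation above, here is a rough sketch. It assumes memory grows linearly with the number of points, which the two measurements quoted do not quite support (quadrupling the points only tripled the memory), so treat the number as an upper-end estimate:

```python
# Measured peak RAM use reported above, at two subsample sizes.
measurements = {20_000: 7, 80_000: 21}  # points -> GB

# Naive linear extrapolation from the larger measurement to 1.6M points.
points = 1_600_000
linear_estimate_gb = measurements[80_000] * points / 80_000
print(linear_estimate_gb)  # 420.0 -- far above both 42 GB and the 32 GB of RAM
```

Either way, the conclusion stands: 32 GB is not enough at this scale with these settings.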
If you don't mind, could you please share your rig's specs (CPU and RAM)? And how did you install UMAP: through pip install umap-learn, or through pip install top2vec? Thanks a lot!
I created a fresh conda environment, followed by
Hi, great work on Top2Vec! I am trying to apply it to my dataset, which has 1.6 million instances. I successfully trained Doc2Vec inside Top2Vec with 300 dimensions as the default, but I run out of memory in the UMAP step within 2 minutes. BTW, I have 32 GB of memory. I also tried low_memory=True; same OOM.
So, I wonder how much memory UMAP is going to take for 2M points with 300 dimensions? And as a precaution, how much more memory is HDBSCAN going to cost?
Thank you!
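For a rough lower bound on the numbers asked about here, the size of the input matrix alone can be estimated. This is a back-of-the-envelope sketch: it assumes float64 storage, and UMAP's peak working memory (k-NN graph, optimization buffers) is several times larger than this.

```python
# Rough size of the raw input matrix alone: n_points x n_dims floats.
n_points, n_dims = 1_600_000, 300
bytes_per_value = 8  # float64; Doc2Vec vectors are often float32 (4 bytes)
matrix_gb = n_points * n_dims * bytes_per_value / 1024**3
print(round(matrix_gb, 2))  # 3.58
```

So the vectors themselves fit easily in 32 GB; the memory pressure comes from UMAP's intermediate structures, not the data.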