Installing takes too much memory #64

Open
fschlatt opened this issue Oct 27, 2020 · 8 comments

Comments

@fschlatt

fschlatt commented Oct 27, 2020

When running python -m quickumls.install on an MRCONSO.RRF file with about 7M rows, the memory footprint grows continuously and at some point the process is killed for using too much memory. The two main culprits I could find are the processed set

processed = set()

and the simstring_terms set

simstring_terms = set()

I assume they are there to prevent duplicate entries in the SimString and CuiSemType DBs. When using the unqlite database, a check for duplicate entries is implemented on the insert call, so duplicate entries are a non-issue there. However, I am not sure whether the same is true for the SimString database. Is it safe to add duplicate terms/n-grams to the SimString database, or will that break anything? If it is safe, the large sets could be removed, eliminating their memory overhead for large UMLS subsets.
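To illustrate the idea of letting the database reject duplicates at insert time instead of tracking them in an in-memory set, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in, not unqlite or SimString, and the table name, column names, and helper functions are hypothetical; it says nothing about how QuickUMLS or SimString handle duplicates internally.

```python
import sqlite3


def insert_terms(conn, rows):
    """Insert (term, cui) pairs; the primary key makes SQLite skip duplicates.

    `rows` can be a generator, so no in-memory set of seen pairs is needed.
    Table and column names are hypothetical, chosen only for this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS terms ("
        "term TEXT NOT NULL, cui TEXT NOT NULL, PRIMARY KEY (term, cui))"
    )
    with conn:  # commit once at the end of the batch
        conn.executemany(
            "INSERT OR IGNORE INTO terms (term, cui) VALUES (?, ?)", rows
        )


def count_terms(conn):
    """Return the number of distinct (term, cui) pairs stored."""
    return conn.execute("SELECT COUNT(*) FROM terms").fetchone()[0]
```

With this approach the deduplication state lives on disk rather than in RAM, at the cost of an index lookup per insert.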

@CatalinaZ16

Hi!
Have you solved that problem? I have the same one :(

@fschlatt
Author

> Hi!
> Have you solved that problem? I have the same one :(

Sort of. At the cost of including some duplicates in the SimString database, I was able to reduce the RAM footprint significantly. The install now runs for the whole UMLS on my machine with 16 GB of RAM. Take a look at my fork of the repository for the fixes.

@soldni
Member

soldni commented Feb 12, 2021 via email

@fschlatt
Author

Hey Luca,

Sure thing. My fork also returns the preferred term, and I applied black formatting to the repo, so there are a couple of additional changes. I'll create a pull request with my entire fork, and we can discuss there which parts are necessary and which are superfluous.

Best,
Ferdinand

@soldni
Member

soldni commented Feb 13, 2021 via email

@jimhavrilla

Seems like this from @fschlatt may be the fix: 7651393. I had to drastically increase my RAM for the install as well.

@jmugan

jmugan commented Aug 15, 2021

I ran into this as well. I have 16 GB of memory. Is the recommended approach implementing the changes from the comment above?

@jmugan

jmugan commented Aug 16, 2021

I got it to work by being more selective about what I extracted from UMLS.
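The comment does not say how the extraction was narrowed. As one possible illustration, the sketch below pre-filters MRCONSO.RRF rows by language and source vocabulary before installation. The function name is hypothetical, and the column positions (LAT at index 1, SAB at index 11, STR at index 14) assume the standard pipe-delimited MRCONSO layout, which should be checked against the UMLS release in use.

```python
def filter_mrconso(lines, languages=("ENG",), sources=None):
    """Yield only the MRCONSO.RRF rows matching the given languages/sources.

    Assumes the standard pipe-delimited layout: LAT (language) at index 1
    and SAB (source vocabulary) at index 11. Rows are streamed one at a
    time, so the filter itself needs almost no memory.
    """
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 15:
            continue  # skip malformed or truncated rows
        if fields[1] not in languages:
            continue
        if sources is not None and fields[11] not in sources:
            continue
        yield line
```

A possible use is to open the original file, pass the file object to filter_mrconso, and write the yielded lines to a smaller MRCONSO.RRF in a separate directory that is then given to the installer.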
