Creating a LMDB bigger than RAM: Program gets killed when RAM gets full #113
Hi there, a quick fix for this would be passing writemap=True to lmdb.open(). Can you tell me how much RAM the machine has available?

On 27 April 2016 at 14:30, chemaf notifications@github.com wrote:

Hi David, first, thank you for answering me so soon. Passing writemap=True to lmdb.open() is working! I don't understand which problems using this flag may imply... but I will try with it, unless you can give me another solution for my problem. Thank you!
Using writemap=True causes LMDB to directly write your changes to a file-backed memory mapping, which behaves like a "write-back cache". This means Linux will keep changes in memory until memory is low, at which point it starts IO to write the dirty pages to disk to relieve the memory pressure. You can have file-backed mappings much larger than RAM, but due to how Linux classifies that memory, it can cause unfortunate thrashing when running on a machine that is also running certain kinds of RAM-heavy workloads. Without writemap, LMDB keeps dirty pages in the heap (anonymous memory mapping) until some dirty page limit is reached. Only then does it start to write those dirty pages to the backing disk file. I am not sure when LMDB decides to write spilled pages, but apparently it is too late in your case. Final question: what does 'free -m' report on your machine without your program running? It might be that your machine has 4GB of RAM, but most of it is in use. Thanks
Hi there, chatting to upstream, it seems LMDB should use little more than 512MB of heap at peak. That would suggest your machine is getting into trouble before LMDB gets a chance to allocate that much RAM. It seems possible your machine's free memory is quite low, and the OOM killer is jumping in because (I think) you have no swap configured. Howard also mentioned trying to set …
Another possibility is that your Python app itself is using tons of heap :)
Hi there, about your last comment: I don't think my Python app is using tons of heap, because I made a dummy test of this and it had the same problem. I have reserved 2GB of swap space. During the experiment I have been checking the … I am running right now the code with the … Thanks!
I am going to leave the program running all night and I will report tomorrow whether it is working... Thank you David for all your help ^^
It seems to be working. Just a detail to consider if someone has to use this workaround: when you set the map_size in the lmdb.open function, you will have to set it carefully, given that the final size of the file is going to be that. It is not going to fill that whole size, but after creating the LMDB, if you run a … A curious thing: when my process finished, the size of the LMDB files was bigger than the hard drive I am using; I think this is proof that it does not fill the whole size... It is a bit weird... Is there a way to set the size of the file to the real size of the LMDB data? Or just create the files again, fitting the size better? And in case you don't fill the whole map size, can that be a problem when a program tries to read the file? I mean, reading an empty part... Thanks
It's a sparse file: it has "holes" in it where no data has been written yet. You can see the real file size (in multiples of your block size) using ls -s.
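The apparent-vs-real size difference is easy to demonstrate with nothing but the standard library (the file name is arbitrary; st_blocks is a Unix-only field, so this sketch assumes Linux):

```python
import os
import tempfile

# Create a 100 MB sparse file: truncate() extends the apparent size
# without allocating any data blocks on disk.
path = os.path.join(tempfile.mkdtemp(), 'sparse.bin')
with open(path, 'wb') as f:
    f.truncate(100 * 1024 * 1024)

st = os.stat(path)
apparent = st.st_size          # what `ls -l` reports
on_disk = st.st_blocks * 512   # what `ls -s` / `du` report (POSIX 512-byte blocks)
print(apparent, on_disk)
```

An LMDB file created with a large map_size behaves the same way: `ls -l` shows the map_size, while `ls -s` shows only the blocks actually written.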
try passing writemap=True to lmdb.open() |
I solved this by splitting the write into smaller batches, kind of like this:

```python
batch_size = 1000  # set to whatever fits in your memory

# read your files list first, then write one transaction per batch
for batch in [files[x:x + batch_size] for x in range(0, len(files), batch_size)]:
    with db_data.begin(write=True) as txn:
        for img in batch:
            # your code for opening the image and storing it in the db
            # remember to keep a global counter for str_id!
            ...
```
Hi, I see that this is closed, but is there a solution to this other than using …?
You're going to have to be a little more specific. The original issue wasn't really a writemap issue. It was that they were writing a huge file as one massive transaction. There's generally not a problem if you chunk your data into reasonable-sized pieces and commit between the pieces. |
Hi!
I am trying to create an LMDB of a dataset quite a bit bigger than my RAM size. This is the code that attempts to create it: …
I would like to force the RAM to get flushed after a fixed number of iterations, to be able to control my RAM usage. I have tried calling
txn.commit()
every N iterations, but I get the following error after the commit is done: lmdb.Error: Attempt to operate on closed/deleted/dropped object.
My lmdb version is: …
Is there a way to control the RAM usage?
Thanks in advance for any help.