
Creating a LMDB bigger than RAM: Program gets killed when RAM gets full #113

Closed
jmfacil opened this issue Apr 27, 2016 · 13 comments
@jmfacil

jmfacil commented Apr 27, 2016

Hi!

I am trying to create an LMDB database for a dataset quite a bit bigger than my RAM. This is the code that attempts to create it:

import lmdb
import caffe

db_data = lmdb.open(lmdb_path, map_size=int(1e12))
i = 0
with db_data.begin(write=True) as txn:
    with open(complete_path) as f:
        for line in f:
            X = open_image(line)  # user-defined image loader
            datum = caffe.io.array_to_datum(X.astype(float))
            str_id = '{:010}'.format(i)
            txn.put(str_id.encode('ascii'), datum.SerializeToString())
            i = i + 1

I would like to force the RAM to be flushed after a fixed number of iterations, to be able to control my RAM usage. I have tried calling txn.commit() every N iterations, but after the commit I get the following error: lmdb.Error: Attempt to operate on closed/deleted/dropped object.

My lmdb version is:

>>>lmdb.version()
(0, 9, 14)

Is there a way to control the RAM usage?

Thanks in advance for any help.

@dw
Collaborator

dw commented Apr 27, 2016

Hi there,

A quick fix for this would be passing writemap=True to lmdb.open(),
though this may trigger IO thrashing later. Your 'with:' statement
implicitly commits the transaction; that's why you are receiving an
error when trying to call commit() a second time.

Can you tell me how much RAM the machine has available for your
process? LMDB has built-in spilling logic, although I'm unsure what
watermark it uses to decide whether to spill or keep dirty pages on
the heap.


@jmfacil
Author

jmfacil commented Apr 27, 2016

Hi David,

First, thank you for answering me so soon.
My computer has 4GB of RAM. I could use another with 8GB, but right now I am using it for other work.

Passing writemap=True to lmdb.open() is working! I don't understand which problems this flag may cause, but I will keep using it unless you can give me another solution for my problem.

Thank you!

@dw
Collaborator

dw commented Apr 27, 2016

Using writemap=True causes LMDB to write your changes directly to a file-backed memory mapping, which behaves like a write-back cache. This means Linux will keep the changes in memory until memory runs low, at which point it starts IO to write the dirty pages to disk to relieve the memory pressure.

You can have file-backed mappings much larger than RAM, but due to how Linux classifies that memory, it can cause unfortunate thrashing when running on a machine that is also running certain kinds of RAM-heavy workloads.

Without writemap, LMDB keeps dirty pages in the heap (an anonymous memory mapping) until some dirty-page limit is reached. Only then does it start writing those dirty pages to the backing disk file. I am not sure when LMDB decides to write spilled pages, but apparently it is too late in your case.

Final question: what does 'free -m' report on your machine without your program running? It might be that your machine has 4GB of RAM but most of it is already in use.

Thanks

@dw
Collaborator

dw commented Apr 27, 2016

Hi there,

Chatting to upstream, it seems LMDB should use little more than 512MB of heap at peak. That would suggest your machine is getting into trouble before LMDB gets a chance to allocate that much RAM.

It seems possible your machine's free memory is quite low, and the OOM killer is jumping in because (I think) you have no swap configured.

Howard also mentioned trying to set /proc/sys/vm/swappiness to 0.
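For anyone trying that suggestion, the setting can be inspected and changed like this on Linux (a sketch, not a guaranteed fix; changing it requires root):

```shell
# Inspect the current value; 0 tells the kernel to avoid swapping
# anonymous pages and prefer reclaiming page cache instead.
cat /proc/sys/vm/swappiness

# To lower it for the running system (needs root, not persistent across reboots):
#   sudo sysctl vm.swappiness=0
```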

@dw
Collaborator

dw commented Apr 27, 2016

Another possibility is that your Python app itself is using tons of heap :)

@jmfacil
Author

jmfacil commented Apr 27, 2016

Hi there,

About your last comment: I don't think my Python app is using tons of heap, because I made a dummy test and it had the same problem. I have reserved 2GB of swap space. During the experiment I was watching the htop output, and LMDB was using around 3GB before it got killed.

I am now running the code with writemap=True. The memory use of the app is no longer growing, so I think it is working, though I am not sure yet. The problem is that it now takes more time, but I can live with that. For this reason I cannot show you the free -m report right now, but I can assure you that htop showed around 3GB of free memory before I ran the experiment.

Thanks!

@jmfacil
Author

jmfacil commented Apr 27, 2016

I am going to let the program run all night and I will report tomorrow whether it is working...

Thank you David for all your help ^^

@jmfacil
Author

jmfacil commented Apr 29, 2016

It seems to be working. One detail to consider if someone has to use this workaround: when you set map_size in the lmdb.open function, set it carefully, because the reported size of the file will be exactly that. The database will not actually fill that much space, but after creating the LMDB, running ls -l will show that size for the file.

A curious thing: when my process finished, the reported size of the LMDB files was bigger than the hard drive I am using, which I think proves that the space is not actually filled... It is a bit weird...

Is there a way to shrink the file to the real size of the LMDB data? Or should I just create it again with a better-fitting map_size?

And if the map size is not fully used, can that be a problem when a program tries to read the file? I mean, reading an empty part...

Thanks

@mklemm2

mklemm2 commented Feb 22, 2017

It's a sparse file: it has "holes" in it where no data has been written yet. You can see the real on-disk size (in multiples of your block size) using ls -s.
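To see the difference mklemm2 describes, a quick experiment (the file name is made up):

```shell
# Create a 1 GB sparse file: the apparent size is 1 GB, but almost no
# disk blocks are allocated until data is actually written into it.
truncate -s 1G sparse_demo.db

ls -l sparse_demo.db   # apparent size: 1073741824 bytes
ls -s sparse_demo.db   # blocks actually allocated: close to zero
rm sparse_demo.db
```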

@nomader

nomader commented Mar 9, 2017

try passing writemap=True to lmdb.open()

@Noiredd

Noiredd commented Jun 1, 2018

I solved this by splitting the write into smaller batches, kind of like this:

# read your files list first
batch_size = 1000  # set to whatever fits in your memory
str_id = 0  # keep a global counter for the keys
for batch in [files[x:x + batch_size] for x in range(0, len(files), batch_size)]:
    with db_data.begin(write=True) as txn:  # one transaction per batch
        for img in batch:
            # your code for opening the image and storing it in the db,
            # using str_id as the key
            str_id += 1

@tuxxy

tuxxy commented Jun 25, 2020

Hi, I see that this is closed, but is there a solution to this other than using writemap=True? The difference in sparse-file support between Linux and OS X is a real pain for cross-platform development.

@jnwatson
Owner

You're going to have to be a little more specific. The original issue wasn't really a writemap issue. It was that they were writing a huge file as one massive transaction. There's generally not a problem if you chunk your data into reasonable-sized pieces and commit between the pieces.
