This repository was archived by the owner on Jan 13, 2022. It is now read-only.

Parallelized inserts using MonetDBLite are not persistent #40

@sedot42
  • MonetDBLite-Python version: 0.6.3
  • Python version: 3.7.3
  • Pip version: 19.0.3
  • Operating System: Arch Linux

Description

I'm importing point cloud data. Since some processing steps are pretty CPU-intensive, I'm parallelizing the processing. At the end of the preprocessing, the data is loaded into MonetDB from a Pandas data frame.

As long as the Python process is active, the size of the database on disk increases with each insert. But as soon as the worker process terminates, the on-disk size shrinks back to 1.5 MB.

How can I make the changes persistent?

What I Did

This is a rough simplification of the code:

import numpy
import pandas
import monetdblite
from multiprocessing import Pool

def process(item):
    # preprocessing...
    x, y = numpy.meshgrid(numpy.arange(1000), numpy.arange(1000))
    z = numpy.random.rand(1000000)
    # flatten the meshgrid output so all three columns are 1-D
    data = pandas.DataFrame({"x": x.ravel(), "y": y.ravel(), "z": z})
    conn = monetdblite.connectclient()
    monetdblite.insert('points', data, client=conn)
    del conn

datalist = [...]
monetdblite.init("./database/")
with Pool(processes=2, maxtasksperchild=1) as p:
    p.map(process, datalist, 1)
monetdblite.shutdown()
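
One variation I've been considering is sketched below, purely as an assumption on my part (the preprocess helper and the use of Pool.imap are mine, not from the real code): keep all database access in the parent process that owns the embedded database and calls shutdown(), and have the workers only return the preprocessed frames.

from multiprocessing import Pool

import monetdblite
import numpy
import pandas

def preprocess(item):
    # CPU-heavy work only; no database access inside the worker
    x, y = numpy.meshgrid(numpy.arange(1000), numpy.arange(1000))
    z = numpy.random.rand(1000000)
    return pandas.DataFrame({"x": x.ravel(), "y": y.ravel(), "z": z})

datalist = [...]
monetdblite.init("./database/")
with Pool(processes=2, maxtasksperchild=1) as p:
    for frame in p.imap(preprocess, datalist, 1):
        # insert from the parent, the same process that later calls shutdown()
        monetdblite.insert('points', frame)
monetdblite.shutdown()

This avoids the forked workers touching the embedded database at all, at the cost of shipping the frames back through the pool. I haven't verified whether this is the intended usage pattern, which is part of what I'm asking here.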

Related Stack Overflow question
