Python: In-memory duckdb memory keeps increasing indefinitely #2471
Comments
And I noticed that even after I deleted all the records from the table, the memory is not freed up. Thanks
Thanks for the report!
That is perhaps related to the fact that in-memory databases are not checkpointed/flushed, or perhaps related to string updates not being properly cleaned up. What kind of data are you inserting into the tables (types, size of values, etc)?
Could you share the query/data that triggers this crash please?
Will try to create a repeatable example later this week.
Hi. I found that there is different behavior between the two ways of opening a connection: the first one caused a segmentation fault during an UPDATE executemany on a large table. What is the difference between the two? Thanks
In the first one you are creating an on-disk database rather than an in-memory one. Could you create a reproducible example of the segmentation fault?
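For illustration, the two connection forms presumably being compared (the filename here is a placeholder):

```python
import duckdb

con_disk = duckdb.connect('mydb.db')   # on-disk database persisted to the file mydb.db
con_mem = duckdb.connect(':memory:')   # transient database held entirely in memory
```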
Here is the reproducible example:
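A minimal sketch of a reproducer along the lines described (the table name, column names, and row counts are assumptions; the VARCHAR columns are the essential ingredient):

```python
import duckdb

con = duckdb.connect(':memory:')
con.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, field1 VARCHAR, field2 VARCHAR)")
# Populate a reasonably large table with string columns.
con.execute("INSERT INTO tbl SELECT i, 'a-' || i, 'b-' || i FROM range(200000) t(i)")

cur = con.cursor()
for step in range(1000):
    # Repeatedly rewrite the VARCHAR columns of existing rows; per the report,
    # memory grows on every batch and the process eventually segfaults.
    params = [(f"new-{step}-{i}", f"alt-{step}-{i}", i) for i in range(500)]
    cur.executemany("UPDATE tbl SET field1 = ?, field2 = ? WHERE id = ?", params)
```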
I noticed that the VARCHAR/TEXT column type is the cause of the segmentation fault.
@Mytherin could you please take a look? Thanks!
Thanks for the update! I can indeed reproduce the problem here. I will have a look.
Fix #2471: correctly handle offset passed by ::UpdateSegment, and handle it earlier to clean up code
What happens?
I run an in-memory duckdb in Python (initialised with a table of 200K records, ~250MB of memory after inserting those, and the id column as the primary key), and the process subscribes to a stream of updates (pandas DataFrames) that keeps updating the table via cursor.executemany("UPDATE TABLE SET field1 = ?, field2 = ? WHERE id = ?", df.to_records()) for 500 records every second.
However, the memory of the Python program keeps increasing even though no new records are inserted (I keep reusing the cursor for the updates).
If I comment out the cursor.executemany statement and just print out the DataFrame, the memory doesn't increase while receiving updates from the data stream.
Therefore, I am quite sure the memory growth is due to the UPDATE statement. I also set the memory limit with PRAGMA memory_limit='1GB';
Moreover, I get a segmentation fault if I try to run an update-select statement (updating a big table with 20k records from a table with 500 records). If the big table only has, say, 5k records, then it runs fine.
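The pattern described above, as a runnable sketch (the table name, column names, and the update source are assumptions standing in for the real ones):

```python
import time
import duckdb
import pandas as pd

def update_stream():
    # Stand-in for the real subscription: yields a 500-row DataFrame of
    # (field1, field2, id) updates once per second. Purely illustrative.
    step = 0
    while True:
        yield pd.DataFrame({
            "field1": [f"new-{step}-{i}" for i in range(500)],
            "field2": [f"alt-{step}-{i}" for i in range(500)],
            "id": list(range(500)),
        })
        step += 1
        time.sleep(1)

con = duckdb.connect(':memory:')
con.execute("PRAGMA memory_limit='1GB'")
con.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, field1 VARCHAR, field2 VARCHAR)")
con.execute("INSERT INTO tbl SELECT i, 'a-' || i, 'b-' || i FROM range(200000) t(i)")

cur = con.cursor()
for df in update_stream():
    # Only existing rows are updated, no new rows inserted, yet resident
    # memory reportedly keeps growing batch after batch.
    cur.executemany(
        "UPDATE tbl SET field1 = ?, field2 = ? WHERE id = ?",
        df[["field1", "field2", "id"]].to_records(index=False).tolist(),
    )
```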
To Reproduce
I will try to create a sample program later. For now, I am wondering whether I am doing anything wrong with the in-memory database.
Environment (please complete the following information):
Before Submitting
Have you tried the steps on the master branch?
- Python: pip install duckdb --upgrade --pre
- R: install.packages("https://github.com/duckdb/duckdb/releases/download/master-builds/duckdb_r_src.tar.gz", repos = NULL)