What happens?
When performing a COPY to S3 using hive partitioning, memory usage is higher than expected.
For example, copying a table with two int64 columns, split into 30 partitions of 1,000 rows each, to S3 already fails when the memory limit is set to 2GiB, even though the entire table easily fits in memory. Copying to local disk, or copying to a single file in S3, both work fine. The format does not seem to matter: both Parquet and CSV exhibit the same issue.
Platforms tested (all with DuckDB CLI v0.10.2):
Linux x86-64
Linux aarch64
macOS aarch64
To Reproduce
SET memory_limit = '2GiB';
-- Settings below do not seem to make a difference, but trying to maximize reproducibility
SET threads = 1;
SET s3_uploader_thread_limit = 1;
SET preserve_insertion_order = false;
-- Create a table with 30 partitions of 1000 records each
CREATE TABLE test AS SELECT UNNEST(RANGE(30000)) x, x // 1000 AS y;
COPY test TO 's3://<bucket>/path' (FORMAT PARQUET, PARTITION_BY (y));
-- Out of Memory Error: could not allocate block of size 76.5 MiB (1.9 GiB/2.0 GiB used)
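For contrast, the variants described above as working can be sketched as follows. This assumes the same test table from the script above; the local directory and the single-file S3 path are illustrative placeholders, not paths from the original report.
-- Partitioned copy to local disk completes within the memory limit (placeholder path)
COPY test TO '/tmp/test_partitioned' (FORMAT PARQUET, PARTITION_BY (y));
-- Unpartitioned copy to a single S3 file also completes (placeholder path)
COPY test TO 's3://<bucket>/path/test.parquet' (FORMAT PARQUET);
-- Switching the format does not help: a partitioned CSV copy to S3 fails the same way
COPY test TO 's3://<bucket>/path' (FORMAT CSV, PARTITION_BY (y));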
OS:
Linux x86-64
DuckDB Version:
0.10.2
DuckDB Client:
CLI
Full Name:
Jan Kramer
Affiliation:
N/A
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
Yes, I have