Possible race-condition when using COPY with PARTITION_BY #9466

Closed
1 task done
colinbreame opened this issue Oct 25, 2023 · 1 comment · Fixed by #9473

colinbreame commented Oct 25, 2023

What happens?

Running the following script:

import duckdb
input = duckdb.sql('SELECT year(time) as year, month(time) as month FROM "updates.parquet"')
duckdb.sql('''COPY input TO 'output' (FORMAT PARQUET, COMPRESSION ZSTD, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE 1)''')

This results in:

duckdb.duckdb.IOException: IO Error: Could not create directory: 'output\year=2023'

To Reproduce

It is difficult to reproduce with static data and seems to only happen with a specific input.

import duckdb
input = duckdb.sql('SELECT year(time) as year, month(time) as month FROM "updates.parquet"')
duckdb.sql('''COPY input TO 'output' (FORMAT PARQUET, COMPRESSION ZSTD, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE 1)''')

It only happens if the directory has not previously been created.

Saving the input variable into a new parquet file, and then using it as the source of COPY does not trigger the error.
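
Since the error reportedly appears only when the directory does not yet exist, one possible stopgap is to create the Hive-style partition directories before running COPY. The sketch below is an untested assumption, not a confirmed workaround: the helper name `precreate_partition_dirs` is hypothetical, and it presumes the (year, month) values are known in advance and that the layout is `output/year=Y/month=M`.

```python
import os

def precreate_partition_dirs(root, partitions):
    """Create root/year=Y/month=M for each (year, month) pair.

    Hypothetical helper for illustration; exist_ok=True means an
    already-existing directory is not treated as an error.
    """
    created = []
    for year, month in partitions:
        path = os.path.join(root, f"year={year}", f"month={month}")
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created
```

It would be called before the COPY statement, for example `precreate_partition_dirs('output', [(2023, 10)])`, with OVERWRITE_OR_IGNORE left enabled so that writing into the existing directories is allowed.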

OS:

Windows

DuckDB Version:

0.9.1

DuckDB Client:

Python

Full Name:

Colin Breame

Affiliation:

Statkraft Germany GmbH

Have you tried this on the latest main branch?

I have tested with a main build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
@l1t1

This comment was marked as abuse.

Mytherin added a commit that referenced this issue Oct 30, 2023
Fix #9360, fix #9466: grab a lock before creating directories to fix race condition on Windows in partitioned write
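
The commit message describes the fix as taking a lock before creating directories. The following is a minimal Python sketch of that general pattern only, not DuckDB's actual implementation (which is not shown in this issue). The names `_dir_lock`, `create_partition_dir`, and `write_partitions` are hypothetical. Several writer threads try to create the same partition directory; without the lock, two threads could both see the directory as missing and one `mkdir` would fail.

```python
import os
import threading

# Hypothetical process-wide lock, for illustration only.
_dir_lock = threading.Lock()

def create_partition_dir(path):
    """Create `path` if it is missing.

    Holding the lock makes the check and the mkdir a single step
    from the point of view of the other threads.
    """
    with _dir_lock:
        if not os.path.isdir(path):
            os.mkdir(path)

def write_partitions(root, n_threads=8):
    """Let n_threads race to create the same directory; return any errors."""
    errors = []

    def worker():
        try:
            create_partition_dir(os.path.join(root, "year=2023"))
        except OSError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

With the lock in place, only the first thread performs the `mkdir` and the rest find the directory already present, so the returned error list stays empty.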