Fix #9360, fix #9466: grab a lock before creating directories to fix race condition on Windows in partitioned write #9473

Merged
merged 1 commit into from Oct 30, 2023


Mytherin
Collaborator

Fixes #9360
Fixes #9404
Fixes #9466
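The title describes the fix as grabbing a lock before creating directories. The following is a minimal Python sketch of that idea, not DuckDB's C++ code. It assumes a hypothetical writer in which every thread can touch every hour partition and writes its own data_<thread>.csv file into it. The names ensure_partition_dir, write_chunk and partitioned_write are illustrative only.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: one lock shared by all writer threads of a single partitioned write.
_dir_lock = threading.Lock()


def ensure_partition_dir(root: str, key: str, value: int) -> str:
    """Create root/key=value exactly once, even if many threads ask at once."""
    path = os.path.join(root, f"{key}={value}")
    # Check-then-create is racy without the lock: two threads can both see
    # "missing", and the slower os.mkdir() then raises FileExistsError.
    with _dir_lock:
        if not os.path.isdir(path):
            os.mkdir(path)
    return path


def write_chunk(root: str, thread_id: int, rows: list) -> None:
    """Write this thread's (hour, payload) rows into data_<thread_id>.csv per hour."""
    by_hour: dict = {}
    for hour, payload in rows:
        by_hour.setdefault(hour, []).append(payload)
    for hour, payloads in by_hour.items():
        part_dir = ensure_partition_dir(root, "h", hour)
        with open(os.path.join(part_dir, f"data_{thread_id}.csv"), "w") as f:
            f.write("payload\n")
            f.writelines(p + "\n" for p in payloads)


def partitioned_write(root: str, rows: list, threads: int = 4) -> None:
    """Split rows into contiguous chunks and write them from a thread pool."""
    os.makedirs(root, exist_ok=True)
    size = max(1, -(-len(rows) // threads))  # ceiling division, at least 1
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(write_chunk, root, t, c) for t, c in enumerate(chunks)]
        for fut in futures:
            fut.result()  # re-raise any exception from a worker thread
```

For example, partitioned_write("customer", [(i % 24, f"row{i}") for i in range(2400)]) creates the 24 folders h=0 to h=23, each holding data_0.csv to data_3.csv. In pure Python, os.makedirs(path, exist_ok=True) would already tolerate the race; the explicit lock is shown only to mirror the approach taken in this PR.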

…ries to fix race condition on Windows in partitioned write
@l1t1

This comment was marked as abuse.

@killerfurbel

The errors reported by @l1t1 look like a different problem, since they occur when writing the .parquet files, not when creating the folders.

In my tests, all the folder-specific errors are gone with the build artefact.

One possible cause of failed file writes, which I have run into myself: if the total path length exceeds about 260 characters, most Windows programs cannot read or write the file. Maybe @l1t1 could check whether that is the case? Using paths longer than 260 characters on Windows requires enabling a registry setting, and the application must also opt in (mostly by declaring the longPathAware setting in its application manifest): https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry#enable-long-paths-in-windows-10-version-1607-and-later
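A quick way to rule the path-length limit in or out is to measure the full paths being written. The Python sketch below is a heuristic only: it assumes plain absolute paths without the extended-length \\?\ prefix, and it ignores the stricter limit Windows applies when creating directories (room must be left for an 8.3 file name). The helper names are hypothetical.

```python
import os

MAX_PATH = 260  # legacy Win32 limit, counting the terminating NUL character


def exceeds_max_path(path: str) -> bool:
    """True if path is too long for Win32 APIs without long-path support."""
    # A path of MAX_PATH characters leaves no room for the terminating NUL.
    return len(path) >= MAX_PATH


def find_long_paths(root: str) -> list:
    """List files under root whose absolute path would hit MAX_PATH."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.abspath(os.path.join(dirpath, name))
            if exceeds_max_path(full):
                hits.append(full)
    return hits
```

Running find_long_paths on the output folder after a failed COPY shows whether any written file is close to the limit; an empty list points away from path length as the cause.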

@Mytherin Mytherin merged commit 182b824 into duckdb:main Oct 30, 2023
43 checks passed
@Mytherin
Collaborator Author

Thanks for having a look - I will merge this as-is then as it seems to at least fix the directory issue.

@l1t1

This comment was marked as abuse.

@bucweat
Contributor

bucweat commented Oct 31, 2023

Coming from #9360.

I ran my example script from #9360, which writes CSV files. The output is shown below.

I agree with @killerfurbel above that the reported error is now file-related (Cannot open file "customer\h=21\data_0.csv"), whereas before it was directory-related (IO Error: Could not create directory: 'customer\h=3'). However, the underlying problem still appears to be that some folders are not created: in the run below, only 18 of the 24 expected folders exist (h=15, h=16, h=18, h=19, h=21 and h=22 are missing), and the missing folder customer\h=21 is what causes the error when creating customer\h=21\data_0.csv.

Running with threads=1 works fine, and all 24 folders (h=0 to h=23) are created.

┌─────────────────┬────────────┐
│ library_version │ source_id  │
│     varchar     │  varchar   │
├─────────────────┼────────────┤
│ v0.9.2-dev231   │ 182b824f28 │
└─────────────────┴────────────┘

┌────────────────────────────┐
│ current_setting('threads') │
│           int64            │
├────────────────────────────┤
│                          4 │
└────────────────────────────┘

build the customer table (will fail if it already exists which is ok)
Error: near line 12: Catalog Error: Table with name "customer" already exists!

show the count, min and max duedate for customer table
┌──────────┬─────────────────────┬───────────────────────┐
│ count(1) │        start        │          end          │
│  int64   │      timestamp      │       timestamp       │
├──────────┼─────────────────────┼───────────────────────┤
│   864000 │ 2022-01-02 00:00:00 │ 2022-01-02 23:59:59.9 │
└──────────┴─────────────────────┴───────────────────────┘

make sure the customer folder is empty

check that customer folder is empty/doesn't exist

 Volume in drive M is Shared Folders
 Volume Serial Number is 0000-0000
File Not Found
System command returns 1

*****************************************************************
*****************************************************************
*****************************************************************
first time copy to empty partition customer folder (expect error)
Error: near line 50: IO Error: Cannot open file "customer\h=21\data_0.csv": The system cannot find the path specified.


read the resulting partition folder and get count (expect 864000)
┌──────────┐
│ count(1) │
│  int64   │
├──────────┤
│   648072 │
└──────────┘

display the contents of customer
 Volume in drive M is Shared Folders
 Volume Serial Number is 0000-0000

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer

10/31/2023  06:28 AM    <DIR>          h=0
10/31/2023  06:28 AM    <DIR>          h=1
10/31/2023  06:28 AM    <DIR>          h=10
10/31/2023  06:28 AM    <DIR>          h=11
10/31/2023  06:28 AM    <DIR>          h=12
10/31/2023  06:28 AM    <DIR>          h=13
10/31/2023  06:28 AM    <DIR>          h=14
10/31/2023  06:28 AM    <DIR>          h=17
10/31/2023  06:28 AM    <DIR>          h=2
10/31/2023  06:28 AM    <DIR>          h=20
10/31/2023  06:28 AM    <DIR>          h=23
10/31/2023  06:28 AM    <DIR>          h=3
10/31/2023  06:28 AM    <DIR>          h=4
10/31/2023  06:28 AM    <DIR>          h=5
10/31/2023  06:28 AM    <DIR>          h=6
10/31/2023  06:28 AM    <DIR>          h=7
10/31/2023  06:28 AM    <DIR>          h=8
10/31/2023  06:28 AM    <DIR>          h=9
               0 File(s)              0 bytes

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer\h=2

10/31/2023  06:28 AM         2,520,821 data_0.csv
10/31/2023  06:28 AM                20 data_1.csv
10/31/2023  06:28 AM                20 data_2.csv
10/31/2023  06:28 AM                20 data_3.csv
               4 File(s)      2,520,881 bytes

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer\h=5

10/31/2023  06:28 AM                20 data_0.csv
10/31/2023  06:28 AM         2,584,820 data_1.csv
10/31/2023  06:28 AM                20 data_2.csv
10/31/2023  06:28 AM                20 data_3.csv
               4 File(s)      2,584,880 bytes

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer\h=4

10/31/2023  06:28 AM                20 data_0.csv
10/31/2023  06:28 AM         2,584,820 data_1.csv
10/31/2023  06:28 AM                20 data_2.csv
10/31/2023  06:28 AM                20 data_3.csv
               4 File(s)      2,584,880 bytes

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer\h=3

10/31/2023  06:28 AM         1,053,524 data_0.csv
10/31/2023  06:28 AM         1,503,316 data_1.csv
10/31/2023  06:28 AM                20 data_2.csv
10/31/2023  06:28 AM                20 data_3.csv
               4 File(s)      2,556,880 bytes

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer\h=10

10/31/2023  06:28 AM                20 data_0.csv
10/31/2023  06:28 AM                20 data_1.csv
10/31/2023  06:28 AM           629,012 data_2.csv
10/31/2023  06:28 AM         1,991,828 data_3.csv
               4 File(s)      2,620,880 bytes

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer\h=17

10/31/2023  06:28 AM         2,446,100 data_0.csv
10/31/2023  06:28 AM           174,740 data_1.csv
10/31/2023  06:28 AM                20 data_2.csv
10/31/2023  06:28 AM                20 data_3.csv
               4 File(s)      2,620,880 bytes

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer\h=11

10/31/2023  06:28 AM                20 data_0.csv
10/31/2023  06:28 AM                20 data_1.csv
10/31/2023  06:28 AM                20 data_2.csv
10/31/2023  06:28 AM         2,620,820 data_3.csv
               4 File(s)      2,620,880 bytes

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer\h=20

10/31/2023  06:28 AM         1,258,004 data_0.csv
10/31/2023  06:28 AM                20 data_1.csv
10/31/2023  06:28 AM         1,362,836 data_2.csv
10/31/2023  06:28 AM                20 data_3.csv
               4 File(s)      2,620,880 bytes

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer\h=8

10/31/2023  06:28 AM                20 data_0.csv
10/31/2023  06:28 AM                20 data_1.csv
10/31/2023  06:28 AM         2,584,820 data_2.csv
10/31/2023  06:28 AM                20 data_3.csv
               4 File(s)      2,584,880 bytes

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer\h=6

10/31/2023  06:28 AM                20 data_0.csv
10/31/2023  06:28 AM         2,136,788 data_1.csv
10/31/2023  06:28 AM           448,052 data_2.csv
10/31/2023  06:28 AM                20 data_3.csv
               4 File(s)      2,584,880 bytes

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer\h=1

10/31/2023  06:28 AM         2,512,820 data_0.csv
10/31/2023  06:28 AM                20 data_1.csv
10/31/2023  06:28 AM                20 data_2.csv
10/31/2023  06:28 AM                20 data_3.csv
               4 File(s)      2,512,880 bytes

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer\h=0

10/31/2023  06:28 AM         2,501,714 data_0.csv
10/31/2023  06:28 AM                20 data_1.csv
10/31/2023  06:28 AM                20 data_2.csv
10/31/2023  06:28 AM                20 data_3.csv
               4 File(s)      2,501,774 bytes

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer\h=7

10/31/2023  06:28 AM                20 data_0.csv
10/31/2023  06:28 AM                20 data_1.csv
10/31/2023  06:28 AM         2,584,820 data_2.csv
10/31/2023  06:28 AM                20 data_3.csv
               4 File(s)      2,584,880 bytes

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer\h=9

10/31/2023  06:28 AM                20 data_0.csv
10/31/2023  06:28 AM                20 data_1.csv
10/31/2023  06:28 AM         2,584,820 data_2.csv
10/31/2023  06:28 AM                20 data_3.csv
               4 File(s)      2,584,880 bytes

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer\h=14

10/31/2023  06:28 AM                20 data_0.csv
10/31/2023  06:28 AM         2,620,820 data_1.csv
10/31/2023  06:28 AM                20 data_2.csv
10/31/2023  06:28 AM                20 data_3.csv
               4 File(s)      2,620,880 bytes

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer\h=13

10/31/2023  06:28 AM                20 data_0.csv
10/31/2023  06:28 AM           908,564 data_1.csv
10/31/2023  06:28 AM                20 data_2.csv
10/31/2023  06:28 AM         1,712,276 data_3.csv
               4 File(s)      2,620,880 bytes

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer\h=12

10/31/2023  06:28 AM                20 data_0.csv
10/31/2023  06:28 AM                20 data_1.csv
10/31/2023  06:28 AM                20 data_2.csv
10/31/2023  06:28 AM         2,620,820 data_3.csv
               4 File(s)      2,620,880 bytes

 Directory of M:\Scripts\appDuckDbEvaluation\dev\partition_example\customer\h=23

10/31/2023  06:28 AM                20 data_0.csv
10/31/2023  06:28 AM                20 data_1.csv
10/31/2023  06:28 AM         2,341,268 data_2.csv
10/31/2023  06:28 AM           279,572 data_3.csv
               4 File(s)      2,620,880 bytes

     Total Files Listed:
              72 File(s)     46,568,735 bytes
              18 Dir(s)  343,931,244,544 bytes free
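The missing folders can be read off a listing like the one above by comparing it with the expected partition values. A small Python helper for that check, hypothetical and not part of DuckDB, assuming Hive-style key=value directory names with integer values:

```python
import re


def missing_partitions(dir_names, key: str, expected) -> list:
    """Return the expected partition values that have no key=value directory."""
    pattern = re.compile(rf"^{re.escape(key)}=(\d+)$")
    present = set()
    for name in dir_names:
        m = pattern.match(name)
        if m:
            present.add(int(m.group(1)))
    return sorted(set(expected) - present)


# The 18 directories from the listing above (threads=4 run).
listing = ["h=0", "h=1", "h=10", "h=11", "h=12", "h=13", "h=14", "h=17", "h=2",
           "h=20", "h=23", "h=3", "h=4", "h=5", "h=6", "h=7", "h=8", "h=9"]
print(missing_partitions(listing, "h", range(24)))  # [15, 16, 18, 19, 21, 22]
```

On a live folder, os.listdir("customer") can be passed in place of the hard-coded listing.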

@l1t1

This comment was marked as abuse.

@carlopi
Contributor

carlopi commented Oct 31, 2023

Thanks @l1t1 and all. I have a reproduction, and this secondary problem will be addressed as well.

carlopi added a commit to carlopi/duckdb that referenced this pull request Nov 1, 2023
Fixes duckdblabs/duckdb-internal#588 improving on duckdb#9473.
The idea is to iterate over all global partitions instead of only the local ones.
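One plausible shape of this second bug, stated as an assumption rather than a description of DuckDB's actual code: partition keys are registered in a shared, append-only list, and a counter records how many of them already have directories. If a thread holding the lock creates directories only for its own local partitions but still advances the counter to the global size, partitions seen only by other threads are skipped forever. The Python sketch below (class and method names are hypothetical) iterates over the global list instead.

```python
import os
import threading


class PartitionDirectories:
    """Hypothetical shared state for a partitioned write (illustration only).

    Keys are registered globally in discovery order; created_upto counts how
    many of them (from the start of global_keys) already have a directory.
    """

    def __init__(self, root: str) -> None:
        self.root = root
        self.lock = threading.Lock()
        self.global_keys = []
        self.key_index = {}
        self.created_upto = 0

    def register(self, key: str) -> int:
        """Return the global partition index for key, adding it if new."""
        with self.lock:
            if key not in self.key_index:
                self.key_index[key] = len(self.global_keys)
                self.global_keys.append(key)
            return self.key_index[key]

    def create_missing(self) -> None:
        """Create directories for all globally known keys not yet created.

        Looping over only the caller's local keys here, while still setting
        created_upto to len(global_keys), would leave other threads' keys
        without a directory.
        """
        with self.lock:
            for key in self.global_keys[self.created_upto:]:
                os.makedirs(os.path.join(self.root, key), exist_ok=True)
            self.created_upto = len(self.global_keys)
```

Because created_upto only advances after every key below it has a directory, any thread that registers its keys and then calls create_missing is guaranteed to find all of its partition folders on disk.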
@carlopi
Contributor

carlopi commented Nov 1, 2023

The fix should be in #9535.

@Mytherin Mytherin deleted the issue9360 branch December 4, 2023 11:45