
[C++] exception caused by dataset_writer.cc:587: Check failed: (largest) != (nullptr) #38011

Closed
pawelz188 opened this issue Oct 4, 2023 · 8 comments · Fixed by #38030

Describe the bug, including details regarding any error messages, version, and platform.

The exception is reproducible every time. It can be triggered, for example, by modifying `CreateExampleParquetHivePartitionedDataset()` in `arrow/cpp/examples/arrow/dataset_documentation_example.cc` to set the following write options:

```diff
   write_options.base_dir = base_path;
   write_options.partitioning = partitioning;
   write_options.basename_template = "part{i}.parquet";
+  write_options.max_open_files = 2;
+  write_options.max_rows_per_file = 2;
+  write_options.max_rows_per_group = 2;
   ARROW_RETURN_NOT_OK(ds::FileSystemDataset::Write(write_options, scanner));
   return base_path;
```

Output error message:
vcpkg\buildtrees\arrow\src\e-arrow-13-0a65b46298.clean\cpp\src\arrow\dataset\dataset_writer.cc:587: Check failed: (largest) != (nullptr)
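
For reference, here is a condensed sketch of the modified write call, assuming `filesystem`, `base_path`, `partitioning`, and `scanner` are built exactly as in `dataset_documentation_example.cc`. The wrapper function name `WriteWithLimits` is invented for the sketch; only the three limit options differ from the stock example.

```cpp
// Condensed sketch of the modified example; `WriteWithLimits` is an invented
// wrapper, and the inputs are assumed to be prepared as in
// cpp/examples/arrow/dataset_documentation_example.cc.
#include <memory>
#include <string>

#include <arrow/dataset/api.h>
#include <arrow/dataset/file_parquet.h>
#include <arrow/filesystem/api.h>
#include <arrow/status.h>

namespace ds = arrow::dataset;

arrow::Status WriteWithLimits(const std::shared_ptr<arrow::fs::FileSystem>& filesystem,
                              const std::string& base_path,
                              const std::shared_ptr<ds::Partitioning>& partitioning,
                              const std::shared_ptr<ds::Scanner>& scanner) {
  auto format = std::make_shared<ds::ParquetFileFormat>();

  ds::FileSystemDatasetWriteOptions write_options;
  write_options.file_write_options = format->DefaultWriteOptions();
  write_options.filesystem = filesystem;
  write_options.base_dir = base_path;
  write_options.partitioning = partitioning;
  write_options.basename_template = "part{i}.parquet";
  write_options.max_open_files = 2;      // added relative to the example
  write_options.max_rows_per_file = 2;   // added relative to the example
  write_options.max_rows_per_group = 2;  // added relative to the example

  // With these limits the writer hits the open-file cap while every in-flight
  // file still has zero written rows, which trips the DCHECK reported above.
  return ds::FileSystemDataset::Write(write_options, scanner);
}
```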

The exception is triggered by `DCHECK_NE(largest, nullptr)` in the `CloseLargestFile()` procedure in dataset_writer.cc.

The failed check suggests a synchronization problem in how Arrow closes files once the limit on the number of open files is reached.
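
To make the failing check concrete, here is a simplified sketch of the selection logic the `DCHECK` guards. This is not the actual `dataset_writer.cc` code; `OpenFileState` and `CloseLargestFileSketch` are invented names. The writer looks for the open file with the most written rows, and when every in-flight file has written zero rows the candidate stays null.

```cpp
// Simplified illustration of the check that fires (not the actual Arrow
// source; names here are invented for the sketch).
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct OpenFileState {
  std::string path;
  uint64_t rows_written = 0;  // rows written to this open file so far
};

// Pick the open file with the most written rows and close it.
void CloseLargestFileSketch(std::vector<OpenFileState>& open_files) {
  OpenFileState* largest = nullptr;
  uint64_t largest_num_rows = 0;  // only files with > 0 written rows qualify
  for (auto& file : open_files) {
    if (file.rows_written > largest_num_rows) {
      largest_num_rows = file.rows_written;
      largest = &file;
    }
  }
  // When every in-flight file has written zero rows, no candidate is found,
  // `largest` stays nullptr, and the debug check aborts the process
  // (dataset_writer.cc:587 in the report above).
  assert(largest != nullptr);
  // ... close *largest and release its open-file slot ...
}
```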

Component(s)

C++

pawelz188 (Author)

My local commit with the changes described above, which triggers the exception: 0829fd5

kou (Member) commented Oct 4, 2023

Thanks for your report.
Do you want to work on this?

mapleFU (Member) commented Oct 5, 2023

Seems this is because the writer wants to close a file, but every in-flight file has written zero rows... 🤔

I can fix this if you don't want to work on it.

pawelz188 (Author)

Please fix it; I am not yet an expert in Arrow.
I'm currently trying to make a patch for the problem myself, but I'm not sure about its impact on the overall code.

mapleFU (Member) commented Oct 5, 2023

I'll submit a fix. It's the National Day holiday here, so I may be slow to reply.

pawelz188 (Author)

This patch eliminates the exception.
The files appear to be handled correctly.
Please confirm the impact of the patch on the dataset writer as a whole.
CloseLargestFile-nullptr-fix.patch

pawelz188 (Author)

> I'll submit a fix. It's the National Day holiday here, so I may be slow to reply.

If you submit a fix, I will test it within my scope.

mapleFU (Member) commented Oct 5, 2023

I've submitted a basic fix in #38030; I'll take review feedback and may do more work after the national holiday :-)

kou closed this as completed in #38030 on Oct 5, 2023.
kou pushed a commit that referenced this issue on Oct 5, 2023:
…#38030)

### Rationale for this change

`CloseLargestFile()` fails when none of the open files has written any rows.

### What changes are included in this PR?

Change `CloseLargestFile()` to `TryCloseLargestFile()`, which does not raise an error when it cannot find a file that has written any rows.

### Are these changes tested?

no

### Are there any user-facing changes?

bugfix

* Closes: #38011

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
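
As a rough illustration of the fix direction described in the commit message above (the authoritative change is the diff in #38030; this sketch reuses the invented `OpenFileState` struct from the sketch in the issue body), the helper becomes best-effort and returns cleanly when no file has written any rows:

```cpp
// Rough sketch of the "try" semantics described above (not the actual #38030
// diff). Assumes the invented OpenFileState struct from the earlier sketch.
#include <cstdint>
#include <vector>

#include <arrow/status.h>

arrow::Status TryCloseLargestFileSketch(std::vector<OpenFileState>& open_files) {
  OpenFileState* largest = nullptr;
  uint64_t largest_num_rows = 0;
  for (auto& file : open_files) {
    if (file.rows_written > largest_num_rows) {
      largest_num_rows = file.rows_written;
      largest = &file;
    }
  }
  if (largest == nullptr) {
    // No in-flight file has written any rows yet, so there is nothing worth
    // closing; return success instead of asserting.
    return arrow::Status::OK();
  }
  // ... close *largest and release its open-file slot ...
  return arrow::Status::OK();
}
```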
kou added this to the 14.0.0 milestone on Oct 5, 2023.
JerAguilon pushed a commit to JerAguilon/arrow that referenced this issue on Oct 23, 2023: … write (apache#38030)
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue on Nov 13, 2023: … write (apache#38030)
dgreiss pushed a commit to dgreiss/arrow that referenced this issue on Feb 19, 2024: … write (apache#38030)