GH-37969: [C++][Parquet] add more closed file checks for ParquetFileWriter #38390
(I didn't go through the issue carefully, but would e.g. https://github.com/apache/arrow/blob/main/cpp/src/arrow/filesystem/s3fs.cc#L1249
Thanks for the hint. I refactored the check into a separate method. I used "Assert" in the name because it throws instead of returning an arrow::Status, since operating on a closed object is not a normal application flow. Please take another look, @mapleFU
Oops, I think you're choosing the right approach, but that's maybe because
Status Close() override {
  if (!closed_) {
    // Make idempotent
    closed_ = true;
    if (row_group_writer_ != nullptr) {
      auto row_group_writer = std::move(row_group_writer_);
      PARQUET_CATCH_NOT_OK(row_group_writer->Close());
    }
    PARQUET_CATCH_NOT_OK(writer_->Close());
  }
Actually this change LGTM. But I don't fully understand why
Edit: You can leave this patch just checking here, find out why it would close twice, and then open a new issue for that? I'm a bit tired today; maybe I'll go through the code here tomorrow.
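The idempotent-close idea in the snippet above can be sketched in isolation. This is only an illustration under stated assumptions, not the Arrow code: `FakeRowGroupWriter`, `FakeFileWriter`, and `CloseTwiceAndCount` are hypothetical names, and the member is assumed to be a `std::unique_ptr` so that moving from it leaves it null (the real member's type is not shown in this thread).

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for a row group writer; counts how often Close() runs.
struct FakeRowGroupWriter {
  explicit FakeRowGroupWriter(int* counter) : close_calls(counter) {}
  void Close() { ++*close_calls; }
  int* close_calls;
};

// Hypothetical stand-in for the file writer discussed above.
class FakeFileWriter {
 public:
  explicit FakeFileWriter(int* counter)
      : row_group_writer_(std::make_unique<FakeRowGroupWriter>(counter)) {}

  void Close() {
    if (closed_) return;  // Make idempotent: a second call is a no-op
    closed_ = true;
    if (row_group_writer_ != nullptr) {
      // Moving from a unique_ptr leaves the member null, so no later call
      // can reach the already-closed row group writer through it.
      auto row_group_writer = std::move(row_group_writer_);
      row_group_writer->Close();
    }
  }

 private:
  std::unique_ptr<FakeRowGroupWriter> row_group_writer_;
  bool closed_ = false;
};

// Closes the same writer twice and returns how many times the
// row group writer's Close() actually ran.
int CloseTwiceAndCount() {
  int close_calls = 0;
  FakeFileWriter writer(&close_calls);
  writer.Close();
  writer.Close();
  return close_calls;
}
```

In this sketch the second `Close()` returns early, so the row group writer is closed only once. Note that `std::move` on a raw pointer would not null the member, which is why the `std::unique_ptr` assumption matters here.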
You are right: normally, it will not be called twice. However, in the test case for use-after-close with arrow/cpp/src/parquet/arrow/arrow_reader_writer_test.cc Lines 5227 to 5237 in 450175b
To be honest, I think I need to study a bit more to fully understand the state diagram of these three attributes { arrow/cpp/src/parquet/arrow/writer.cc Lines 479 to 483 in 450175b
Thank you so much @mapleFU
The logic here is not hard to understand, but it is a bit complex because so many pimpls are used and there are
2b0dcc4 to c718b8b
Update:
What do you think? Thank you so much for guiding me along the way. I don't mind documenting it and getting your review along with an MR for that. Thanks @mapleFU
Parquet consists of two parts in this library
The writer has the structure below:
Also, there are some
Oh, I previously approved because the change looked OK to me, but after re-reviewing I found the reason for the re-close.
Status NewRowGroup(int64_t chunk_size) override {
  if (row_group_writer_ != nullptr) {
    PARQUET_CATCH_NOT_OK(row_group_writer_->Close());
  }
  PARQUET_CATCH_NOT_OK(row_group_writer_ = writer_->AppendRowGroup());
  return Status::OK();
}

Status Close() override {
  if (!closed_) {
    // Make idempotent
    closed_ = true;
    if (row_group_writer_ != nullptr) {
      PARQUET_CATCH_NOT_OK(row_group_writer_->Close());
    }
    PARQUET_CATCH_NOT_OK(writer_->Close());
  }
  return Status::OK();
}
Also I think the
  Close();
  NewRowGroupWriter();
in
Solving: You can also add a
Here I think you should:
This will also prevent the issue, because
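One reading of the two snippets above is that `Close()` leaves `row_group_writer_` non-null, so a later `NewRowGroup()` closes that row group writer a second time. The sketch below reproduces that sequence with counting stand-ins. Every name in it (`CountingRowGroupWriter`, `SketchFileWriter`, `check_closed`, `CloseCallsAfterUseAfterClose`) is hypothetical, and the closed-file check shown is only one possible guard, not necessarily the one adopted in this PR.

```cpp
#include <memory>

// Hypothetical row group writer: bumps a shared counter on every Close().
struct CountingRowGroupWriter {
  explicit CountingRowGroupWriter(int* calls) : close_calls(calls) {}
  void Close() { ++*close_calls; }
  int* close_calls;
};

// Hypothetical writer mirroring the shape of NewRowGroup()/Close() above.
// `check_closed` toggles an extra closed-file check in NewRowGroup().
class SketchFileWriter {
 public:
  explicit SketchFileWriter(bool check_closed) : check_closed_(check_closed) {}

  // Returns false if the call was rejected because the writer is closed.
  bool NewRowGroup() {
    if (check_closed_ && closed_) return false;
    if (row_group_writer_ != nullptr) {
      row_group_writer_->Close();  // closes the previous row group
    }
    row_group_writer_ = std::make_unique<CountingRowGroupWriter>(&close_calls_);
    return true;
  }

  void Close() {
    if (closed_) return;
    closed_ = true;
    if (row_group_writer_ != nullptr) {
      row_group_writer_->Close();  // the pointer is left non-null here
    }
  }

  int close_calls() const { return close_calls_; }

 private:
  bool check_closed_;
  bool closed_ = false;
  int close_calls_ = 0;
  std::unique_ptr<CountingRowGroupWriter> row_group_writer_;
};

// Runs NewRowGroup(); Close(); NewRowGroup(); and reports how many
// times a row group writer's Close() was invoked in total.
int CloseCallsAfterUseAfterClose(bool check_closed) {
  SketchFileWriter writer(check_closed);
  writer.NewRowGroup();
  writer.Close();
  writer.NewRowGroup();
  return writer.close_calls();
}
```

Without the check, the same row group writer is closed twice. With the check, the second `NewRowGroup()` is rejected before it touches the stale pointer.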
Thank you, @mapleFU, for providing a clear and detailed guideline. I've applied these changes and added back two simple tests. I have one final comment regarding the
We currently have two options for handling errors in this function:
I personally lean towards option (B) for the sake of consistency. However, considering that use-after-close is not a typical flow (handling it fully usually requires revising the user code), using an exception might be a viable option here. I am uncertain about this part, so I would appreciate your input. Thanks @mapleFU.
Sorry for the delayed reply. Nice catch, I missed
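The two error-handling styles weighed in this thread, throwing versus returning a status, can be compared side by side in a small sketch. `SketchStatus` is a hypothetical stand-in for `arrow::Status`, `std::logic_error` stands in for whatever exception type the library would throw, and the function names are illustrative only.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical minimal status type, standing in for arrow::Status.
struct SketchStatus {
  bool ok;
  std::string message;
  static SketchStatus OK() { return {true, ""}; }
  static SketchStatus Invalid(std::string msg) { return {false, std::move(msg)}; }
};

// Style 1: throw on use-after-close. The caller needs a try/catch
// (or a macro that converts the exception into a status).
void AssertNotClosed(bool closed) {
  if (closed) throw std::logic_error("Operation on closed file");
}

// Style 2: report use-after-close through the return value, consistent
// with functions that already return a status.
SketchStatus CheckNotClosed(bool closed) {
  if (closed) return SketchStatus::Invalid("Operation on closed file");
  return SketchStatus::OK();
}

// Small helper: true if the throwing style raised for the given state.
bool ThrowsWhenClosed(bool closed) {
  try {
    AssertNotClosed(closed);
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

The status style composes directly with functions that already return a status, while the throwing style keeps call sites shorter where the surrounding code already translates exceptions.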
CI failed in unrelated
Will wait for other committers' review.
Why aren't you protecting all public methods in FileWriter? I don't think it makes sense to only protect a couple of them.
@pitrou Would it be better to add it on
I think other methods like
Everywhere where we rely on something that's destroyed by
Well, I think adding it in the functions I mentioned above is OK. Previously I think Maybe you can try to add
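The request to protect every public method that depends on state released at close time can be sketched as below. This is a hedged illustration and not the `ParquetFileWriter` implementation: `SketchPimplWriter`, its `Contents` struct, and `CheckOpen()` are hypothetical, and the sketch assumes the writer drops its pimpl in `Close()`, as the PR rationale describes.

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>

// Hypothetical writer whose Close() resets its pimpl.
class SketchPimplWriter {
 private:
  struct Contents {
    int num_row_groups = 0;
    std::int64_t num_rows = 0;
  };

 public:
  SketchPimplWriter() : contents_(std::make_unique<Contents>()) {}

  void Close() { contents_.reset(); }  // contents_ is nullptr afterwards

  // Every public method that dereferences contents_ goes through the
  // same guard, so none of them can dereference a null pointer.
  int num_row_groups() const {
    CheckOpen();
    return contents_->num_row_groups;
  }
  std::int64_t num_rows() const {
    CheckOpen();
    return contents_->num_rows;
  }
  void AppendRowGroup() {
    CheckOpen();
    ++contents_->num_row_groups;
  }

 private:
  void CheckOpen() const {
    if (!contents_) throw std::logic_error("Operation on closed file");
  }

  std::unique_ptr<Contents> contents_;
};

// Returns true if every guarded method throws once the writer is closed.
bool AllMethodsRejectUseAfterClose() {
  SketchPimplWriter writer;
  writer.AppendRowGroup();
  writer.Close();
  int rejected = 0;
  try { writer.num_row_groups(); } catch (const std::logic_error&) { ++rejected; }
  try { writer.num_rows(); } catch (const std::logic_error&) { ++rejected; }
  try { writer.AppendRowGroup(); } catch (const std::logic_error&) { ++rejected; }
  return rejected == 3;
}
```

Routing all accessors through one guard keeps the closed-file behavior uniform, which is the consistency concern raised in the review comment above.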
Thanks @mapleFU and @pitrou
Reviewed, safe to skip:
Update: I checked the 3 failing tests. I believe they are unrelated to this MR.
Indeed, the CI failures are unrelated.
Thank you @quanghgx ! I will merge this PR now.
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 3e0ca5b. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 7 possible false positives for unstable benchmarks that are known to sometimes produce them.
…tFileWriter (apache#38390)
### Rationale for this change
Operations on a closed ParquetFileWriter are not allowed, but they should not segfault. ParquetFileWriter::Close() also resets its pimpl, so any subsequent operation that needs this pointer leads to a segfault.
### What changes are included in this PR?
Adding more checks for a closed file.
### Are these changes tested?
Yes.
### Are there any user-facing changes?
No.
* Closes: apache#37969
Authored-by: Quang Hoang <quanghgx@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>