[R][C++] segfault when writing to ParquetFileWriter after closing #37969

Closed
amoeba opened this issue Oct 1, 2023 · 3 comments · Fixed by #38390

@amoeba
Member

amoeba commented Oct 1, 2023

Describe the bug, including details regarding any error messages, version, and platform.

Writing to a closed writer causes a segfault rather than an error. I ran into this while testing something unrelated, evaluating portions of a larger script in a REPL. Writing to a closed output stream errors as expected, so the key steps here are the writer$Close() call and the subsequent writer$WriteTable() call:

library(arrow)

outfile <- tempfile(fileext = ".parquet")
sink <- FileOutputStream$create(outfile)

my_schema <- schema(letters = string())
writer <- ParquetFileWriter$create(
  schema = my_schema,
  sink,
  properties = ParquetWriterProperties$create(
    column_names = names(my_schema),
    compression = arrow:::default_parquet_compression()
  )
)
tbl_arrow <- as_arrow_table(data.frame(letters=LETTERS), schema = my_schema)
writer$WriteTable(tbl_arrow, chunk_size = 1)

writer$Close()
sink$close()

tbl_arrow <- as_arrow_table(data.frame(letters=LETTERS), schema = my_schema)
writer$WriteTable(tbl_arrow, chunk_size = 1)

Result:

 *** caught segfault ***
address 0x0, cause 'invalid permissions'

Traceback:
 1: parquet___arrow___FileWriter__WriteTable(self, table, chunk_size)
 2: writer$WriteTable(tbl_arrow, chunk_size = 1)
An irrecoverable exception occurred. R is aborting now ...
fish: Job 1, 'Rscript arrow_memorypool_crashe…' terminated by signal SIGSEGV (Address boundary error)
  • OS/arch: macOS 14.0 (Sonoma), aarch64 (M2)
  • R: 4.3.1
  • arrow version: 13.0.0.1

Component(s)

R

@thisisnic
Member

Have replicated this on Ubuntu 23.04 with R 4.3.1 and arrow 13.0.0.1 too. Here's gdb output:

Thread 1 "R" received signal SIGSEGV, Segmentation fault.
parquet::ParquetFileWriter::properties (this=0x5555563a04b0)
    at /home/nic/arrow/cpp/src/parquet/file_writer.cc:663
warning: Source file is more recent than executable.
663	  return contents_->properties();

@thisisnic thisisnic changed the title [R] segfault when writing to ParquetFileWriter after closing [R][C++] segfault when writing to ParquetFileWriter after closing Oct 1, 2023
@quanghgx
Copy link
Contributor

Hi @thisisnic and @amoeba
I can take this ticket if it's unassigned. I'm running local tests to confirm the fix.

@thisisnic
Copy link
Member

Yep, nobody's assigned and there's no open PR, so it's free. Thanks @quanghgx!

pitrou pushed a commit that referenced this issue Nov 16, 2023
…riter (#38390)

### Rationale for this change
Operations on a closed ParquetFileWriter are not allowed, but they should not segfault. ParquetFileWriter::Close() also resets its pimpl, so any later operation that needs this pointer dereferences a null pointer and segfaults.

### What changes are included in this PR?
Adds more checks for a closed file.

### Are these changes tested?
Yes.

### Are there any user-facing changes?
No.

* Closes: #37969

Authored-by: Quang Hoang <quanghgx@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
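
To illustrate the rationale in the commit message above, here is a minimal, self-contained sketch of the pattern it describes: a writer whose Close() resets its pimpl, with a guard in front of every access to that pointer. This is not Arrow's code. `ToyFileWriter`, `WriterContents`, `CheckOpen`, and `WriteAfterCloseThrows` are hypothetical names invented for this example, and the actual patch may report the error through a different mechanism than `std::logic_error`.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the writer's private implementation (the "pimpl").
struct WriterContents {
  std::string properties = "default-properties";
  int rows_written = 0;
};

// Toy writer (assumption: not Arrow's real class). Close() releases the pimpl,
// and every member that needs it checks for the closed state first.
class ToyFileWriter {
 public:
  ToyFileWriter() : contents_(std::make_unique<WriterContents>()) {}

  // Resetting an already-empty unique_ptr is a no-op, so Close() may be
  // called more than once.
  void Close() { contents_.reset(); }

  bool is_closed() const { return contents_ == nullptr; }

  const std::string& properties() const {
    CheckOpen("properties");
    return contents_->properties;
  }

  void WriteRows(int n) {
    CheckOpen("WriteRows");
    contents_->rows_written += n;
  }

  int rows_written() const {
    CheckOpen("rows_written");
    return contents_->rows_written;
  }

 private:
  // Without this guard, contents_-> on a closed writer dereferences null.
  void CheckOpen(const char* op) const {
    if (!contents_) {
      throw std::logic_error(std::string("Cannot call ") + op +
                             " on a closed writer");
    }
  }

  std::unique_ptr<WriterContents> contents_;
};

// Returns true if writing after Close() raised an error instead of crashing.
bool WriteAfterCloseThrows() {
  ToyFileWriter writer;
  writer.WriteRows(26);
  writer.Close();
  try {
    writer.WriteRows(26);
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

In this sketch the write-after-close sequence from the R reproduction maps to `WriteRows`, `Close`, `WriteRows`: the second write raises a catchable error, which a binding layer could surface as an ordinary R error rather than aborting the session.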
@pitrou pitrou added this to the 15.0.0 milestone Nov 16, 2023
@amoeba amoeba added the Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. label Jan 13, 2024
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…tFileWriter (apache#38390)
