Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] arrow::ipc::StreamDecoder may return a broken RecordBatch with Consume(const uint8_t* data, int64_t size) API #39163

Closed
kou opened this issue Dec 10, 2023 · 0 comments · Fixed by #39164
Assignees
Milestone

Comments

@kou
Copy link
Member

kou commented Dec 10, 2023

Describe the bug, including details regarding any error messages, version, and platform.

arrow::ipc::StreamDecoder::Consume(const uint8_t* data, int64_t size) can't assume the given data is still valid after the next Consume(const uint8_t* data, int64_t size) call. So we need to copy the given data for metadata message. Because the metadata message is used in the next body message. The next body message may be happen in the next Consume(const uint8_t* data, int64_t size) call.

Component(s)

C++

@kou kou added the Type: bug label Dec 10, 2023
kou added a commit to kou/arrow that referenced this issue Dec 10, 2023
…(data)

We need to copy data for metadata message. Because it may be used in
subsequent `Consume(data)` calls. We can't assume that the given
`data` is still valid in subsequent `Consume(data)`.

We also need to copy buffered `data` because it's used in subsequent
`Consume(data)` calls.
kou added a commit to kou/arrow that referenced this issue Dec 10, 2023
…(data)

We need to copy data for metadata message. Because it may be used in
subsequent `Consume(data)` calls. We can't assume that the given
`data` is still valid in subsequent `Consume(data)`.

We also need to copy buffered `data` because it's used in subsequent
`Consume(data)` calls.
kou added a commit to kou/arrow that referenced this issue Dec 11, 2023
…(data)

We need to copy data for metadata message. Because it may be used in
subsequent `Consume(data)` calls. We can't assume that the given
`data` is still valid in subsequent `Consume(data)`.

We also need to copy buffered `data` because it's used in subsequent
`Consume(data)` calls.
@ianmcook ianmcook added this to the 15.0.0 milestone Jan 2, 2024
kou added a commit to kou/arrow that referenced this issue Jan 5, 2024
…(data)

We need to copy data for metadata message. Because it may be used in
subsequent `Consume(data)` calls. We can't assume that the given
`data` is still valid in subsequent `Consume(data)`.

We also need to copy buffered `data` because it's used in subsequent
`Consume(data)` calls.
@kou kou closed this as completed in #39164 Jan 6, 2024
kou added a commit that referenced this issue Jan 6, 2024
…#39164)

### Rationale for this change

We need to copy data for metadata message. Because it may be used in subsequent `Consume(data)` calls. We can't assume that the given `data` is still valid in subsequent `Consume(data)`.

We also need to copy buffered `data` because it's used in subsequent `Consume(data)` calls.

### What changes are included in this PR?

* Add missing copies.
* Clean up existing buffer copy codes.
* Change tests to use ephemeral `data` to detect this case.
* Add `copy_record_batch` option to `CollectListener` to copy decoded record batches.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* Closes #39163 
* Closes: #39163

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
clayburn pushed a commit to clayburn/arrow that referenced this issue Jan 23, 2024
…(data) (apache#39164)

### Rationale for this change

We need to copy data for metadata message. Because it may be used in subsequent `Consume(data)` calls. We can't assume that the given `data` is still valid in subsequent `Consume(data)`.

We also need to copy buffered `data` because it's used in subsequent `Consume(data)` calls.

### What changes are included in this PR?

* Add missing copies.
* Clean up existing buffer copy codes.
* Change tests to use ephemeral `data` to detect this case.
* Add `copy_record_batch` option to `CollectListener` to copy decoded record batches.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* Closes apache#39163 
* Closes: apache#39163

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…(data) (apache#39164)

### Rationale for this change

We need to copy data for metadata message. Because it may be used in subsequent `Consume(data)` calls. We can't assume that the given `data` is still valid in subsequent `Consume(data)`.

We also need to copy buffered `data` because it's used in subsequent `Consume(data)` calls.

### What changes are included in this PR?

* Add missing copies.
* Clean up existing buffer copy codes.
* Change tests to use ephemeral `data` to detect this case.
* Add `copy_record_batch` option to `CollectListener` to copy decoded record batches.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* Closes apache#39163 
* Closes: apache#39163

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this issue Feb 28, 2024
…(data) (apache#39164)

### Rationale for this change

We need to copy data for metadata message. Because it may be used in subsequent `Consume(data)` calls. We can't assume that the given `data` is still valid in subsequent `Consume(data)`.

We also need to copy buffered `data` because it's used in subsequent `Consume(data)` calls.

### What changes are included in this PR?

* Add missing copies.
* Clean up existing buffer copy codes.
* Change tests to use ephemeral `data` to detect this case.
* Add `copy_record_batch` option to `CollectListener` to copy decoded record batches.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* Closes apache#39163 
* Closes: apache#39163

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants