Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] RecordBatchStreamReader should use StreamDecoder #26153

Closed
asfimport opened this issue Sep 30, 2020 · 3 comments · Fixed by #36344
Closed

[C++] RecordBatchStreamReader should use StreamDecoder #26153

asfimport opened this issue Sep 30, 2020 · 3 comments · Fixed by #36344
Assignees
Labels
Breaking Change Includes a breaking change to the API Component: C++ Type: enhancement
Milestone

Comments

@asfimport
Copy link

There's no reason to duplicate some of the stream reading logic, and re-using StreamDecoder would ensure the behaviour of both classes matches.

Reporter: Antoine Pitrou / @pitrou
Assignee: Kouhei Sutou / @kou

Note: This issue was originally created as ARROW-10142. Please see the migration documentation for further details.

@asfimport
Copy link
Author

Antoine Pitrou / @pitrou:
@kou This is something you might want to look at, though not high priority.

@asfimport
Copy link
Author

Kouhei Sutou / @kou:
OK. I'll do it later.

@asfimport
Copy link
Author

Apache Arrow JIRA Bot:
This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned per project policy. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.

kou added a commit to kou/arrow that referenced this issue Jun 28, 2023
…and StreamDecoder

Because they (pull-based and push-based) must have the same behavior.

This PR extracts reusable codes to StreamDecoderInternal from
StreamDecoderImpl. External API isn't changed for
RecordBatchStreamReader and StreamDecoder.

This PR adds some external API to implement this:

* arrow::Status::ToStringWithoutContextLines(): This is only for
  testing. We can get stable result of ASSERT_RAISES_WITH_MESSAGE()
  with/without -DARROW_EXTRA_ERROR_CONTEXT=ON by this.

  We can extract this and related changes to separated PR if we want.

* arrow::ipc::Listener::OnRecordBatchWithMetadataDecoded(): Because
  RecordBatchStreamReader wants not only RecordBatch but also custom
  metadata. OnRecordBatchWithMetadataDecoded() receives
  RecordBatchWithMetadata. OnRecordBatchDecoded() still exists and
  it's used by default for backward compatibility.

* arrow::ipc::CollectListener::metadatas(),
  arrow::ipc::CollectListener::num_record_batches(),
  arrow::ipc::CollectListener::PopRecordBatch(),
  arrow::ipc::CollectListener::PopRecordBatchWithMetadat(): If we add
  these APIs, we can use CollectListner in RecordBatchStreamReader. We
  can create an internal listener only for RecordBatchStreamReader if
  don't want to extend CollectListener.
kou added a commit to kou/arrow that referenced this issue Jun 28, 2023
…and StreamDecoder

Because they (pull-based and push-based) must have the same behavior.

This PR extracts reusable codes to StreamDecoderInternal from
StreamDecoderImpl. External API isn't changed for
RecordBatchStreamReader and StreamDecoder.

This PR adds some external API to implement this:

* arrow::Status::ToStringWithoutContextLines(): This is only for
  testing. We can get stable result of ASSERT_RAISES_WITH_MESSAGE()
  with/without -DARROW_EXTRA_ERROR_CONTEXT=ON by this.

  We can extract this and related changes to separated PR if we want.

* arrow::ipc::Listener::OnRecordBatchWithMetadataDecoded(): Because
  RecordBatchStreamReader wants not only RecordBatch but also custom
  metadata. OnRecordBatchWithMetadataDecoded() receives
  RecordBatchWithMetadata. OnRecordBatchDecoded() still exists and
  it's used by default for backward compatibility.

* arrow::ipc::CollectListener::metadatas(),
  arrow::ipc::CollectListener::num_record_batches(),
  arrow::ipc::CollectListener::PopRecordBatch(),
  arrow::ipc::CollectListener::PopRecordBatchWithMetadat(): If we add
  these APIs, we can use CollectListner in RecordBatchStreamReader. We
  can create an internal listener only for RecordBatchStreamReader if
  don't want to extend CollectListener.
kou added a commit to kou/arrow that referenced this issue Jun 30, 2023
…and StreamDecoder

Because they (pull-based and push-based) must have the same behavior.

This PR extracts reusable codes to StreamDecoderInternal from
StreamDecoderImpl. External API isn't changed for
RecordBatchStreamReader and StreamDecoder.

This PR adds some external API to implement this:

* arrow::Status::ToStringWithoutContextLines(): This is only for
  testing. We can get stable result of ASSERT_RAISES_WITH_MESSAGE()
  with/without -DARROW_EXTRA_ERROR_CONTEXT=ON by this.

  We can extract this and related changes to separated PR if we want.

* arrow::ipc::Listener::OnRecordBatchWithMetadataDecoded(): Because
  RecordBatchStreamReader wants not only RecordBatch but also custom
  metadata. OnRecordBatchWithMetadataDecoded() receives
  RecordBatchWithMetadata. OnRecordBatchDecoded() still exists and
  it's used by default for backward compatibility.

* arrow::ipc::CollectListener::metadatas(),
  arrow::ipc::CollectListener::num_record_batches(),
  arrow::ipc::CollectListener::PopRecordBatch(),
  arrow::ipc::CollectListener::PopRecordBatchWithMetadat(): If we add
  these APIs, we can use CollectListner in RecordBatchStreamReader. We
  can create an internal listener only for RecordBatchStreamReader if
  don't want to extend CollectListener.
kou added a commit to kou/arrow that referenced this issue Jul 3, 2023
…and StreamDecoder

Because they (pull-based and push-based) must have the same behavior.

This PR extracts reusable codes to StreamDecoderInternal from
StreamDecoderImpl. External API isn't changed for
RecordBatchStreamReader and StreamDecoder.

This PR adds some external API to implement this:

* arrow::Status::ToStringWithoutContextLines(): This is only for
  testing. We can get stable result of ASSERT_RAISES_WITH_MESSAGE()
  with/without -DARROW_EXTRA_ERROR_CONTEXT=ON by this.

  We can extract this and related changes to separated PR if we want.

* arrow::ipc::Listener::OnRecordBatchWithMetadataDecoded(): Because
  RecordBatchStreamReader wants not only RecordBatch but also custom
  metadata. OnRecordBatchWithMetadataDecoded() receives
  RecordBatchWithMetadata. OnRecordBatchDecoded() still exists and
  it's used by default for backward compatibility.

* arrow::ipc::CollectListener::metadatas(),
  arrow::ipc::CollectListener::num_record_batches(),
  arrow::ipc::CollectListener::PopRecordBatch(),
  arrow::ipc::CollectListener::PopRecordBatchWithMetadat(): If we add
  these APIs, we can use CollectListner in RecordBatchStreamReader. We
  can create an internal listener only for RecordBatchStreamReader if
  don't want to extend CollectListener.
@kou kou closed this as completed in #36344 Jul 4, 2023
kou added a commit that referenced this issue Jul 4, 2023
…reamDecoder (#36344)

### Rationale for this change

Because they (pull-based and push-based) must have the same behavior.

### What changes are included in this PR?

This PR extracts reusable codes to StreamDecoderInternal from StreamDecoderImpl. External API isn't changed for
RecordBatchStreamReader and StreamDecoder.

This PR adds some external API to implement this:

* arrow::Status::ToStringWithoutContextLines(): This is only for testing. We can get stable result of ASSERT_RAISES_WITH_MESSAGE() with/without -DARROW_EXTRA_ERROR_CONTEXT=ON by this.

  We can extract this and related changes to separated PR if we want.

* arrow::ipc::Listener::OnRecordBatchWithMetadataDecoded(): Because RecordBatchStreamReader wants not only RecordBatch but also custom metadata. OnRecordBatchWithMetadataDecoded() receives RecordBatchWithMetadata. OnRecordBatchDecoded() still exists and it's used by default for backward compatibility.

* arrow::ipc::CollectListener::metadatas(), arrow::ipc::CollectListener::num_record_batches(), arrow::ipc::CollectListener::PopRecordBatch(), arrow::ipc::CollectListener::PopRecordBatchWithMetadat(): If we add these APIs, we can use CollectListner in RecordBatchStreamReader. We can create an internal listener only for RecordBatchStreamReader if don't want to extend CollectListener.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

**This PR includes breaking changes to public APIs.**

`arrow::ipc::CollectListener::record_batches()` returns `const std::vector<std::shared_ptr<RecordBatch>>&` instead of `std::vector<std::shared_ptr<RecordBatch>>`.

* Closes: #26153

Lead-authored-by: Sutou Kouhei <kou@clear-code.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@kou kou added this to the 13.0.0 milestone Jul 4, 2023
westonpace pushed a commit to westonpace/arrow that referenced this issue Jul 7, 2023
…and StreamDecoder (apache#36344)

### Rationale for this change

Because they (pull-based and push-based) must have the same behavior.

### What changes are included in this PR?

This PR extracts reusable codes to StreamDecoderInternal from StreamDecoderImpl. External API isn't changed for
RecordBatchStreamReader and StreamDecoder.

This PR adds some external API to implement this:

* arrow::Status::ToStringWithoutContextLines(): This is only for testing. We can get stable result of ASSERT_RAISES_WITH_MESSAGE() with/without -DARROW_EXTRA_ERROR_CONTEXT=ON by this.

  We can extract this and related changes to separated PR if we want.

* arrow::ipc::Listener::OnRecordBatchWithMetadataDecoded(): Because RecordBatchStreamReader wants not only RecordBatch but also custom metadata. OnRecordBatchWithMetadataDecoded() receives RecordBatchWithMetadata. OnRecordBatchDecoded() still exists and it's used by default for backward compatibility.

* arrow::ipc::CollectListener::metadatas(), arrow::ipc::CollectListener::num_record_batches(), arrow::ipc::CollectListener::PopRecordBatch(), arrow::ipc::CollectListener::PopRecordBatchWithMetadat(): If we add these APIs, we can use CollectListner in RecordBatchStreamReader. We can create an internal listener only for RecordBatchStreamReader if don't want to extend CollectListener.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

**This PR includes breaking changes to public APIs.**

`arrow::ipc::CollectListener::record_batches()` returns `const std::vector<std::shared_ptr<RecordBatch>>&` instead of `std::vector<std::shared_ptr<RecordBatch>>`.

* Closes: apache#26153

Lead-authored-by: Sutou Kouhei <kou@clear-code.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@raulcd raulcd added the Breaking Change Includes a breaking change to the API label Sep 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Breaking Change Includes a breaking change to the API Component: C++ Type: enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants