[C++] RecordBatchStreamReader should use StreamDecoder #26153
Comments
Antoine Pitrou / @pitrou: |
Kouhei Sutou / @kou: |
Apache Arrow JIRA Bot: |
…reamDecoder (#36344)

### Rationale for this change

RecordBatchStreamReader (pull-based) and StreamDecoder (push-based) must have the same behavior.

### What changes are included in this PR?

This PR extracts reusable code from StreamDecoderImpl into StreamDecoderInternal. The external API of RecordBatchStreamReader and StreamDecoder isn't changed.

This PR adds some external API to implement this:

* arrow::Status::ToStringWithoutContextLines(): This is only for testing. With it, ASSERT_RAISES_WITH_MESSAGE() gives a stable result with or without -DARROW_EXTRA_ERROR_CONTEXT=ON. We can extract this and the related changes to a separate PR if we want.
* arrow::ipc::Listener::OnRecordBatchWithMetadataDecoded(): RecordBatchStreamReader needs not only the RecordBatch but also its custom metadata. OnRecordBatchWithMetadataDecoded() receives a RecordBatchWithMetadata. OnRecordBatchDecoded() still exists and is used by default for backward compatibility.
* arrow::ipc::CollectListener::metadatas(), arrow::ipc::CollectListener::num_record_batches(), arrow::ipc::CollectListener::PopRecordBatch(), arrow::ipc::CollectListener::PopRecordBatchWithMetadata(): With these APIs, we can use CollectListener in RecordBatchStreamReader. We can create an internal listener only for RecordBatchStreamReader if we don't want to extend CollectListener.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

**This PR includes breaking changes to public APIs.**

`arrow::ipc::CollectListener::record_batches()` returns `const std::vector<std::shared_ptr<RecordBatch>>&` instead of `std::vector<std::shared_ptr<RecordBatch>>`.

* Closes: #26153

Lead-authored-by: Sutou Kouhei <kou@clear-code.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
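The backward-compatibility point about the new listener callback can be illustrated with a small stand-alone sketch. It does not use the Arrow headers: `DemoBatch`, `DemoBatchWithMetadata`, `DemoListener`, `LegacyCounter`, `MetadataRecorder` and `DeliverBatch` are hypothetical names invented here, and the real `arrow::ipc::Listener` callbacks return `arrow::Status` rather than `void`. The only idea taken from the commit message is that the newer callback falls back to the older one by default, so listeners written against the older callback keep working.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Arrow types.
struct DemoBatch {
  int num_rows = 0;
};

struct DemoBatchWithMetadata {
  std::shared_ptr<DemoBatch> batch;
  std::shared_ptr<std::string> custom_metadata;
};

class DemoListener {
 public:
  virtual ~DemoListener() = default;

  // Older callback: receives the batch only.
  virtual void OnBatchDecoded(std::shared_ptr<DemoBatch> /*batch*/) {}

  // Newer callback: receives the batch plus its custom metadata.
  // The default drops the metadata and forwards to the older callback,
  // so existing subclasses do not need to change.
  virtual void OnBatchWithMetadataDecoded(DemoBatchWithMetadata item) {
    OnBatchDecoded(std::move(item.batch));
  }
};

// A listener written before the newer callback existed.
class LegacyCounter : public DemoListener {
 public:
  void OnBatchDecoded(std::shared_ptr<DemoBatch> batch) override {
    total_rows += batch->num_rows;
  }
  int total_rows = 0;
};

// A listener that opts in to the metadata by overriding the newer callback.
class MetadataRecorder : public DemoListener {
 public:
  void OnBatchWithMetadataDecoded(DemoBatchWithMetadata item) override {
    seen.push_back(std::move(item.custom_metadata));
  }
  std::vector<std::shared_ptr<std::string>> seen;
};

// The decoding side only ever calls the newer callback.
void DeliverBatch(DemoListener& listener, int num_rows,
                  std::shared_ptr<std::string> metadata) {
  auto batch = std::make_shared<DemoBatch>();
  batch->num_rows = num_rows;
  listener.OnBatchWithMetadataDecoded({std::move(batch), std::move(metadata)});
}
```

Calling `DeliverBatch` with a `LegacyCounter` reaches its `OnBatchDecoded()` override through the default forwarding, while a `MetadataRecorder` sees the metadata directly.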
There's no reason to duplicate some of the stream reading logic, and re-using StreamDecoder would ensure the behaviour of both classes matches.
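As a rough illustration of the idea, here is a toy sketch of a pull-based reader layered on a push-based decoder, so that the framing logic lives in one place. It is not Arrow code: `ToyListener`, `ToyStreamDecoder`, `ToyCollectListener` and `ToyStreamReader` are hypothetical names, and a "message" is assumed to be a newline-terminated line rather than an IPC message.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical toy types for illustration only; not the Arrow API.
class ToyListener {
 public:
  virtual ~ToyListener() = default;
  virtual void OnMessageDecoded(std::string message) = 0;
};

// Push-based: the caller feeds arbitrary chunks, complete messages
// are reported to the listener as soon as they are available.
class ToyStreamDecoder {
 public:
  explicit ToyStreamDecoder(ToyListener* listener) : listener_(listener) {}

  void Consume(std::string_view chunk) {
    for (char c : chunk) {
      if (c == '\n') {
        listener_->OnMessageDecoded(std::move(pending_));
        pending_.clear();  // reset the moved-from buffer to a known state
      } else {
        pending_.push_back(c);
      }
    }
  }

 private:
  ToyListener* listener_;
  std::string pending_;
};

// Queues decoded messages so that a reader can pop them one at a time.
class ToyCollectListener : public ToyListener {
 public:
  void OnMessageDecoded(std::string message) override {
    queue_.push_back(std::move(message));
  }
  bool empty() const { return queue_.empty(); }
  std::string Pop() {
    std::string message = std::move(queue_.front());
    queue_.pop_front();
    return message;
  }

 private:
  std::deque<std::string> queue_;
};

// Pull-based: ReadNext() feeds the decoder until a message is queued.
// All framing logic stays in ToyStreamDecoder.
class ToyStreamReader {
 public:
  ToyStreamReader(std::string source, std::size_t chunk_size)
      : source_(std::move(source)),
        chunk_size_(std::max<std::size_t>(1, chunk_size)),
        decoder_(&listener_) {}

  // decoder_ holds a pointer to listener_, so copying would dangle.
  ToyStreamReader(const ToyStreamReader&) = delete;
  ToyStreamReader& operator=(const ToyStreamReader&) = delete;

  // Returns the next message, or std::nullopt at end of stream.
  std::optional<std::string> ReadNext() {
    while (listener_.empty() && offset_ < source_.size()) {
      const std::size_t n = std::min(chunk_size_, source_.size() - offset_);
      decoder_.Consume(std::string_view(source_).substr(offset_, n));
      offset_ += n;
    }
    if (listener_.empty()) return std::nullopt;
    return listener_.Pop();
  }

 private:
  std::string source_;
  std::size_t chunk_size_;
  std::size_t offset_ = 0;
  ToyCollectListener listener_;
  ToyStreamDecoder decoder_;
};
```

Because the reader contains no parsing of its own, any change to how the decoder frames messages is picked up by both the push and the pull path, which is the behaviour-matching property the issue asks for.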
Reporter: Antoine Pitrou / @pitrou
Assignee: Kouhei Sutou / @kou
Note: This issue was originally created as ARROW-10142. Please see the migration documentation for further details.