Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-34056: [C++] Add Utility function to simplify converting any row-based structure into an arrow::RecordBatchReader or an arrow::Table #34057

Merged
merged 20 commits into from
Feb 21, 2023

Conversation

gringasalpastor
Copy link
Contributor

@gringasalpastor gringasalpastor commented Feb 6, 2023

Are these changes tested?

The following tests are provided:

  • basic usage
  • const ranges
  • custom struct accessor
  • usage with std::variant

@github-actions
Copy link

github-actions bot commented Feb 6, 2023

⚠️ GitHub issue #34056 has been automatically assigned in GitHub to PR creator.

@gringasalpastor
Copy link
Contributor Author

@wjones127 If you could take a look, thanks.

@wjones127 wjones127 self-requested a review February 6, 2023 19:08
Copy link
Member

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting this! It would be nice to have at least one example with nested row data.

I'd suggest either adding another test, or to be more complete it would be interesting to see how the rapidjson example would be changed to use this.

cpp/src/arrow/util/rows_to_batches.h Outdated Show resolved Hide resolved
cpp/src/arrow/util/rows_to_batches.h Outdated Show resolved Hide resolved
cpp/src/arrow/util/rows_to_batches.h Outdated Show resolved Hide resolved
cpp/src/arrow/util/rows_to_batches.h Outdated Show resolved Hide resolved
cpp/src/arrow/util/rows_to_batches.h Outdated Show resolved Hide resolved
cpp/src/arrow/util/rows_to_batches.h Outdated Show resolved Hide resolved
cpp/src/arrow/util/rows_to_batches.h Outdated Show resolved Hide resolved
gringasalpastor and others added 9 commits February 9, 2023 10:29
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
@wjones127 wjones127 self-requested a review February 17, 2023 17:16
Copy link
Member

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two minor changes, then I think this is ready to go.

cpp/src/arrow/util/rows_to_batches.h Outdated Show resolved Hide resolved
cpp/src/arrow/util/rows_to_batches_test.cc Outdated Show resolved Hide resolved
gringasalpastor and others added 3 commits February 17, 2023 13:09
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
@gringasalpastor
Copy link
Contributor Author

Just two minor changes, then I think this is ready to go.

Committed the last 2 suggested changes, let me know if there is anything else you would like to see updated @wjones127

@trxcllnt trxcllnt removed their request for review February 17, 2023 19:23
@westonpace westonpace removed their request for review February 17, 2023 20:16
@wjones127
Copy link
Member

@gringasalpastor There seems to be one CI failure that is legitimate. Could you address that?

/arrow/cpp/src/arrow/util/rows_to_batches.h:107: error: The following parameters of arrow::util::RowsToBatches(const std::shared_ptr< Schema > &schema, const Range &rows, DataPointConvertor &&data_point_convertor, RowAccessor &&row_accessor=detail::MakeDefaultRowAccessor(), MemoryPool *pool=default_memory_pool(), const std::size_t batch_size=1024) are not documented:
  parameter 'pool'
  parameter 'batch_size' (warning treated as error, aborting now)
1

@gringasalpastor
Copy link
Contributor Author

@gringasalpastor There seems to be one CI failure that is legitimate. Could you address that?

/arrow/cpp/src/arrow/util/rows_to_batches.h:107: error: The following parameters of arrow::util::RowsToBatches(const std::shared_ptr< Schema > &schema, const Range &rows, DataPointConvertor &&data_point_convertor, RowAccessor &&row_accessor=detail::MakeDefaultRowAccessor(), MemoryPool *pool=default_memory_pool(), const std::size_t batch_size=1024) are not documented:
  parameter 'pool'
  parameter 'batch_size' (warning treated as error, aborting now)
1

@wjones127 Thanks, I just pushed a new commit to add documentation for these.

Copy link
Member

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wjones127 wjones127 merged commit 7828165 into apache:main Feb 21, 2023
@ursabot
Copy link

ursabot commented Feb 22, 2023

Benchmark runs are scheduled for baseline = 92d91f5 and contender = 7828165. 7828165 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.79% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.83% ⬆️0.06%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 7828165f ec2-t3-xlarge-us-east-2
[Failed] 7828165f test-mac-arm
[Finished] 7828165f ursa-i9-9960x
[Finished] 7828165f ursa-thinkcentre-m75q
[Finished] 92d91f53 ec2-t3-xlarge-us-east-2
[Finished] 92d91f53 test-mac-arm
[Finished] 92d91f53 ursa-i9-9960x
[Finished] 92d91f53 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

fatemehp pushed a commit to fatemehp/arrow that referenced this pull request Feb 24, 2023
… row-based structure into an `arrow::RecordBatchReader` or an `arrow::Table` (apache#34057)

*Are these changes tested?*

The following tests are provided:
- basic usage
- const ranges
- custom struct accessor
- usage with `std::variant`

* Closes: apache#34056

Lead-authored-by: Mike Hancock <mhancock34@bloomberg.net>
Co-authored-by: Michael Hancock <javaiscoolmike@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Signed-off-by: Will Jones <willjones127@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants