Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-36417: [C++] Add Buffer::data_as, Buffer::mutable_data_as #36418

Merged
merged 5 commits into from
Jul 12, 2023

Conversation

bkietz
Copy link
Member

@bkietz bkietz commented Jun 30, 2023

Rationale for this change

There's a lot of boilerplate in casting buffer bytes. A templated accessor will decrease that.

What changes are included in this PR?

  • {Buffer, BufferBuilder, BufferSpan}::{data_as<T>, mutable_data_as<T>}
  • rewriting a few files' worth of reinterpret_casts to use these accessors to showcase usage
  • Buffer::FromVector, a std::vector analog to Buffer::FromString
  • some tangential cleanup of unnecessary copies in the Array classes

Are these changes tested?

Not really. The rewritten files exercise the new accessors

Are there any user-facing changes?

No, but I've noticed Buffer::operator[] again and that should probably be deprecated.

if (!dictionary_) {
// FIXME this isn't thread safe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File issue and mention it here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or move dictionary_ init to constructor?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Atomic shared pointer tent to use mutex (folly has implement one using atomic swap). I guess it may hurts the performance

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, though an uncontended mutex should be reasonably fast.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up filed here #36503

will update the comment to

Suggested change
// FIXME this isn't thread safe
// TODO(GH-36503) this isn't thread safe

cpp/src/arrow/buffer.h Outdated Show resolved Hide resolved
cpp/src/arrow/buffer.h Outdated Show resolved Hide resolved
cpp/src/arrow/buffer.h Outdated Show resolved Hide resolved
Copy link
Contributor

@felipecrv felipecrv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jul 6, 2023
@bkietz bkietz requested a review from wgtmac as a code owner July 6, 2023 21:12
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jul 6, 2023
cpp/src/arrow/buffer.h Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review awaiting merge Awaiting merge labels Jul 11, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jul 11, 2023
@pitrou
Copy link
Member

pitrou commented Jul 12, 2023

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: 5946f60

Submitted crossbow builds: ursacomputing/crossbow @ actions-eb005cb286

Task Status
test-alpine-linux-cpp Github Actions
test-build-cpp-fuzz Github Actions
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-cuda-cpp Github Actions
test-debian-11-cpp-amd64 Github Actions
test-debian-11-cpp-i386 Github Actions
test-fedora-35-cpp Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-bundled Github Actions
test-ubuntu-20.04-cpp-minimal-with-formats Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions
test-ubuntu-22.04-cpp Github Actions
test-ubuntu-22.04-cpp-20 Github Actions

@pitrou pitrou merged commit 95a8bfb into apache:main Jul 12, 2023
33 of 34 checks passed
@pitrou pitrou removed the awaiting change review Awaiting change review label Jul 12, 2023
@bkietz bkietz deleted the 36417-buffer-data-as branch July 12, 2023 13:06
chelseajonesr pushed a commit to chelseajonesr/arrow that referenced this pull request Jul 20, 2023
…pache#36418)

### Rationale for this change

There's a lot of boilerplate in casting buffer bytes. A templated accessor will decrease that.

### What changes are included in this PR?

- `{Buffer, BufferBuilder, BufferSpan}::{data_as<T>, mutable_data_as<T>}`
- rewriting a few files' worth of reinterpret_casts to use these accessors to showcase usage
- `Buffer::FromVector`, a std::vector analog to Buffer::FromString
- some tangential cleanup of unnecessary copies in the Array classes

### Are these changes tested?

Not really. The rewritten files exercise the new accessors

### Are there any user-facing changes?

No, but I've noticed `Buffer::operator[]` again and that should probably be deprecated.

* Closes: apache#36417

Authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 95a8bfb.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

R-JunmingChen pushed a commit to R-JunmingChen/arrow that referenced this pull request Aug 20, 2023
…pache#36418)

### Rationale for this change

There's a lot of boilerplate in casting buffer bytes. A templated accessor will decrease that.

### What changes are included in this PR?

- `{Buffer, BufferBuilder, BufferSpan}::{data_as<T>, mutable_data_as<T>}`
- rewriting a few files' worth of reinterpret_casts to use these accessors to showcase usage
- `Buffer::FromVector`, a std::vector analog to Buffer::FromString
- some tangential cleanup of unnecessary copies in the Array classes

### Are these changes tested?

Not really. The rewritten files exercise the new accessors

### Are there any user-facing changes?

No, but I've noticed `Buffer::operator[]` again and that should probably be deprecated.

* Closes: apache#36417

Authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Add Buffer::data_as
5 participants