Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-32613: [C++] Simplify IPC writer for dense unions #33822

Merged
merged 10 commits into from
Feb 2, 2023

Conversation

rtadepalli
Copy link
Contributor

@rtadepalli rtadepalli commented Jan 21, 2023

JIRA: https://issues.apache.org/jira/browse/ARROW-17339
Closes: #32613

Rationale for this change

Dense union offsets are always non-strictly monotonic for any given child as mandated by the spec, The C++ implementation still assumes that the offsets may be in any order. This can be improved.

What changes are included in this PR?

Just a change to eliminate looping over the size of a DenseUnionArray twice.

Are these changes tested?

I am not functionally changing anything. All changes respect the spec, and behavior should be 1:1 with the existing implementation. I believe existing tests should suffice.

Are there any user-facing changes?

There are no user facing changes for this.

@github-actions
Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@kou kou changed the title ARROW-17339: [C++] Simplify IPC writer for dense unions GH-32613: [C++] Simplify IPC writer for dense unions Jan 22, 2023
@github-actions
Copy link

@github-actions
Copy link

⚠️ GitHub issue #32613 has been automatically assigned in GitHub to PR creator.

@rtadepalli
Copy link
Contributor Author

Hello @kou, I have fixed the broken clang-format test. Thanks.

@kou
Copy link
Member

kou commented Jan 24, 2023

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: d787277

Submitted crossbow builds: ursacomputing/crossbow @ actions-f892614f66

Task Status
test-alpine-linux-cpp Github Actions
test-build-cpp-fuzz Github Actions
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-cuda-cpp Github Actions
test-debian-10-cpp-amd64 Github Actions
test-debian-10-cpp-i386 Github Actions
test-debian-11-cpp-amd64 Github Actions
test-debian-11-cpp-i386 Github Actions
test-fedora-35-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-20 Github Actions
test-ubuntu-20.04-cpp-bundled Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions
test-ubuntu-22.04-cpp Github Actions

@kou
Copy link
Member

kou commented Jan 24, 2023

Do existing tests cover this case?

@rtadepalli
Copy link
Contributor Author

I think they should. I was looking at the implementation in Go as a reference, and it doesn't seem like the PR that introduced the change (#13806) added any additional tests for finding the offset. I am of the idea right now that existing tests should break if this change is wrong. Please let me know if this is not true, and a unit test is warranted.

@kou
Copy link
Member

kou commented Jan 30, 2023

If you add a bug in the target logic and there are tests for this case, these existing tests will be failed.
Could you try it? Or do you want me to do it?

@rtadepalli
Copy link
Contributor Author

But won't those tests run as part of the CI pipeline?

I can definitely try to run them, but would need some help 😅. Would running all tests in the ipc/ package be sufficient?

@kou
Copy link
Member

kou commented Jan 31, 2023

You can just push a commit that includes a bug to this branch.
Then CI runs all existing tests with the buggy code. If there are tests that cover this case, they will be failed.

@rtadepalli
Copy link
Contributor Author

rtadepalli commented Jan 31, 2023

Hello @kou , I have introduced a bug in the code. I added a +1 to the array index while accessing shifted offsets.

@rtadepalli
Copy link
Contributor Author

Looks like arrow-ipc-read-write-test is failing with my bug 👍

I will revert the bug if the test results seem satisfactory to you.

@kou
Copy link
Member

kou commented Feb 1, 2023

Thanks.
It seems that we have a test for this case.

Could you revert the bug?

@rtadepalli
Copy link
Contributor Author

rtadepalli commented Feb 1, 2023

I have reverted the three commits in which I pushed bugs.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@kou kou merged commit b413ac4 into apache:master Feb 2, 2023
@ursabot
Copy link

ursabot commented Feb 2, 2023

Benchmark runs are scheduled for baseline = 3ab246f and contender = b413ac4. b413ac4 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.55% ⬆️0.03%] test-mac-arm
[Finished ⬇️2.04% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.03% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] b413ac4f ec2-t3-xlarge-us-east-2
[Failed] b413ac4f test-mac-arm
[Finished] b413ac4f ursa-i9-9960x
[Finished] b413ac4f ursa-thinkcentre-m75q
[Finished] 3ab246f3 ec2-t3-xlarge-us-east-2
[Finished] 3ab246f3 test-mac-arm
[Finished] 3ab246f3 ursa-i9-9960x
[Finished] 3ab246f3 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@ursabot
Copy link

ursabot commented Feb 2, 2023

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x

sjperkins pushed a commit to sjperkins/arrow that referenced this pull request Feb 10, 2023
)

JIRA: https://issues.apache.org/jira/browse/ARROW-17339 
Closes: apache#32613

### Rationale for this change
Dense union offsets are always non-strictly monotonic for any given child as mandated by the spec, The C++ implementation still assumes that the offsets may be in any order. This can be improved.

### What changes are included in this PR?

Just a change to eliminate looping over the size of a `DenseUnionArray` twice.

### Are these changes tested?

I am not functionally changing anything. All changes respect the spec, and behavior should be 1:1 with the existing implementation. I believe existing tests should suffice.

### Are there any user-facing changes?

There are no user facing changes for this.

* Closes: apache#32613

Lead-authored-by: Ramasai Tadepalli <ramasai.tadepalli+3108@gmail.com>
Co-authored-by: Ramasai <ramasai.tadepalli+3108@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
gringasalpastor pushed a commit to gringasalpastor/arrow that referenced this pull request Feb 17, 2023
)

JIRA: https://issues.apache.org/jira/browse/ARROW-17339 
Closes: apache#32613

### Rationale for this change
Dense union offsets are always non-strictly monotonic for any given child as mandated by the spec, The C++ implementation still assumes that the offsets may be in any order. This can be improved.

### What changes are included in this PR?

Just a change to eliminate looping over the size of a `DenseUnionArray` twice.

### Are these changes tested?

I am not functionally changing anything. All changes respect the spec, and behavior should be 1:1 with the existing implementation. I believe existing tests should suffice.

### Are there any user-facing changes?

There are no user facing changes for this.

* Closes: apache#32613

Lead-authored-by: Ramasai Tadepalli <ramasai.tadepalli+3108@gmail.com>
Co-authored-by: Ramasai <ramasai.tadepalli+3108@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
fatemehp pushed a commit to fatemehp/arrow that referenced this pull request Feb 24, 2023
)

JIRA: https://issues.apache.org/jira/browse/ARROW-17339 
Closes: apache#32613

### Rationale for this change
Dense union offsets are always non-strictly monotonic for any given child as mandated by the spec, The C++ implementation still assumes that the offsets may be in any order. This can be improved.

### What changes are included in this PR?

Just a change to eliminate looping over the size of a `DenseUnionArray` twice.

### Are these changes tested?

I am not functionally changing anything. All changes respect the spec, and behavior should be 1:1 with the existing implementation. I believe existing tests should suffice.

### Are there any user-facing changes?

There are no user facing changes for this.

* Closes: apache#32613

Lead-authored-by: Ramasai Tadepalli <ramasai.tadepalli+3108@gmail.com>
Co-authored-by: Ramasai <ramasai.tadepalli+3108@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Simplify IPC writer for dense unions
3 participants