GH-32949: [Go] REE Array IPC read/write #14223

Merged: 25 commits merged into apache:master on Feb 6, 2023

@zeroshade zeroshade commented Sep 23, 2022

@zeroshade
Member Author

CC @zagto

@github-actions

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@github-actions

⚠️ GitHub issue #32949 has been automatically assigned in GitHub to PR creator.

@zeroshade zeroshade changed the title from "GH-32949: [Go] RLE Array IPC read/write" to "GH-32949: [Go] REE Array IPC read/write" Jan 30, 2023
Member

@westonpace westonpace left a comment

A few questions, most of them probably stemming from my unfamiliarity with the integration tests.

dev/archery/archery/integration/datagen.py
    def __init__(self, name, bit_width, *, nullable=False,
                 metadata=None):
        super().__init__(name, is_signed=True, bit_width=bit_width,
                         nullable=nullable, metadata=metadata, min_value=1)

Member

Why min_value=1?

Member Author

As I mentioned in my comment about starting on a non-zero value, it's technically meaningless and incorrect for run-ends to start at 0. The minimum value should be 1, since each run-end is always one past the last index of its run. Run-ends should never start at 0.
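
To make the constraint concrete, here is a minimal sketch of the check it implies. It assumes run-ends must also be strictly increasing, and `validateRunEnds` is a hypothetical helper written for illustration, not a function from the Arrow Go library.

```go
package main

import (
	"errors"
	"fmt"
)

// validateRunEnds is a hypothetical helper (not part of the Arrow Go API).
// It checks that the first run-end is at least 1 and that every later
// run-end is strictly greater than the one before it.
func validateRunEnds(runEnds []int32) error {
	prev := int32(0)
	for i, end := range runEnds {
		if end <= prev {
			if i == 0 {
				// A first run-end of 0 (or less) would describe an empty run.
				return errors.New("first run-end must be at least 1")
			}
			return fmt.Errorf("run-end %d at position %d does not exceed previous run-end %d", end, i, prev)
		}
		prev = end
	}
	return nil
}

func main() {
	fmt.Println(validateRunEnds([]int32{1, 2, 4, 6, 10}))
	fmt.Println(validateRunEnds([]int32{0, 2, 4}))
}
```

A data generator that draws run-ends with a minimum value of 1 can never produce the second, invalid case.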

@zeroshade
Member Author

I'll merge this EOD if no one has any objections.

// RunEnds: [ 1, 2, 4, 6, 10, 1000, 1750, 2000 ]
// Values: [ "a", "b", "c", "d", "e", "f", "g", "h" ]
//
// LogicalValuesArray will return the following array:

Contributor

The C++ implementation has an iterator class that allows a zero-copy iteration over the runs that match the logical (offset, length) slice. Do you have a similar thing in Go or is this function used instead?

Member Author

This function is used for producing the slice that is used for writing the IPC data. It's a zero-copy slice.
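
As a rough illustration of why such a slice can be zero-copy, the sketch below finds which physical runs overlap a logical (offset, length) window and then sub-slices the values without copying them. `physicalRange` is a hypothetical helper and the offset and length are made-up inputs; this is not the code in the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// physicalRange is a hypothetical helper (not part of the Arrow Go API).
// Given run-ends and a logical (offset, length) window, it returns the index
// of the first run that overlaps the window and how many runs overlap it.
func physicalRange(runEnds []int32, offset, length int) (start, count int) {
	// The run holding logical index i is the first run whose end is > i.
	findRun := func(i int) int {
		return sort.Search(len(runEnds), func(j int) bool {
			return int(runEnds[j]) > i
		})
	}
	start = findRun(offset)
	if length <= 0 {
		return start, 0
	}
	last := findRun(offset + length - 1)
	return start, last - start + 1
}

func main() {
	runEnds := []int32{1, 2, 4, 6, 10, 1000, 1750, 2000}
	values := []string{"a", "b", "c", "d", "e", "f", "g", "h"}

	// Logical indices 5 through 12 touch the runs for "d", "e" and "f".
	start, count := physicalRange(runEnds, 5, 8)

	// Sub-slicing shares the backing array, so no values are copied.
	fmt.Println(values[start : start+count])
}
```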

In encoded.go there's MergedRuns, which plays a similar role to your iterator class: it provides zero-copy iteration over runs using offset/length (and can find the common runs between two REE arrays). The only time a copy happens is when calling LogicalRunEndsArray with a non-zero offset, because the actual run-end values have to be rewritten so that they are zero-adjusted.
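
To show why a non-zero offset forces a copy, here is a minimal sketch of zero-adjusting run-ends for a logical window. `logicalRunEnds` is a hypothetical stand-in rather than the real LogicalRunEndsArray, and clamping the final run-end to the window length is an assumption of this sketch.

```go
package main

import "fmt"

// logicalRunEnds is a hypothetical helper (not part of the Arrow Go API).
// It returns the run-ends of the logical window [offset, offset+length),
// rebased so that the window starts at logical index 0.
func logicalRunEnds(runEnds []int32, offset, length int) []int32 {
	if length <= 0 {
		return nil
	}
	out := []int32{}
	for _, end := range runEnds {
		if int(end) <= offset {
			continue // this run lies entirely before the window
		}
		adjusted := int(end) - offset
		if adjusted >= length {
			// Assumption: clamp the last run so it ends with the window.
			out = append(out, int32(length))
			break
		}
		// The stored value changes, so a new buffer is needed here.
		out = append(out, int32(adjusted))
	}
	return out
}

func main() {
	runEnds := []int32{1, 2, 4, 6, 10, 1000, 1750, 2000}
	fmt.Println(logicalRunEnds(runEnds, 5, 8))
}
```

With an offset of 5 every surviving run-end differs from its stored value, which is why the original buffer cannot simply be re-sliced.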

Contributor

Got it.

Contributor

@felipecrv felipecrv left a comment

LGTM

@zeroshade zeroshade merged commit 9b4c972 into apache:master Feb 6, 2023
@ursabot

ursabot commented Feb 7, 2023

Benchmark runs are scheduled for baseline = 9e7b79b and contender = 9b4c972. 9b4c972 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.37% ⬆️0.06%] test-mac-arm
[Finished ⬇️0.26% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.19% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 9b4c9724 ec2-t3-xlarge-us-east-2
[Finished] 9b4c9724 test-mac-arm
[Finished] 9b4c9724 ursa-i9-9960x
[Finished] 9b4c9724 ursa-thinkcentre-m75q
[Finished] 9e7b79b3 ec2-t3-xlarge-us-east-2
[Finished] 9e7b79b3 test-mac-arm
[Finished] 9e7b79b3 ursa-i9-9960x
[Finished] 9e7b79b3 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

sjperkins pushed a commit to sjperkins/arrow that referenced this pull request Feb 10, 2023
* Closes: apache#32949

Lead-authored-by: Matt Topol <zotthewizard@gmail.com>
Co-authored-by: zagto <tobias@zagorni.eu>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
gringasalpastor pushed a commit to gringasalpastor/arrow that referenced this pull request Feb 17, 2023
* Closes: apache#32949

Lead-authored-by: Matt Topol <zotthewizard@gmail.com>
Co-authored-by: zagto <tobias@zagorni.eu>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
fatemehp pushed a commit to fatemehp/arrow that referenced this pull request Feb 24, 2023
* Closes: apache#32949

Lead-authored-by: Matt Topol <zotthewizard@gmail.com>
Co-authored-by: zagto <tobias@zagorni.eu>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Development

Successfully merging this pull request may close these issues.

[Go] REE Arrays IPC read/write
6 participants