ARROW-6307: [Java] Provide RLE vector #5163

Closed
wants to merge 1 commit into from

liyafan82
Contributor

RLE (run-length encoding) is a widely used encoding technique. Compared with other encodings, its encoded data is easier to work with directly.

We want to provide an RLE vector implementation in Arrow. The design details include:

  1. RleVector implements ValueVector.
  2. The data structure of RleVector consists of an inner vector plus a buffer that stores the end index of each run.
  3. Random access is supported, but each lookup costs O(log(n)) time, so it should be used sparingly.
  4. In the future, we will provide iterators for sequential access to the vector.
  5. RleVector does not support in-place updates, but it does support appending.
  6. In the future, we will provide an encoder/decoder to convert efficiently between encoded and decoded vectors.
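To make the design points above concrete, here is a minimal sketch of the data structure, not the code in this PR. It assumes plain `int[]` arrays in place of the inner vector and the end-index buffer, and it assumes the O(log(n)) lookup is a binary search over the run end indices. The class name `RleIntSketch` and all of its methods are hypothetical and are not part of the Arrow API.

```java
import java.util.Arrays;

/** Illustrative only: an append-only run-length-encoded int sequence (not Arrow's RleVector). */
final class RleIntSketch {
    // One entry per run; stands in for the "inner vector" of the design (assumption).
    private int[] values = new int[4];
    // Exclusive end index of each run, strictly increasing; stands in for the end-index buffer.
    private int[] runEnds = new int[4];
    private int runCount = 0;

    /** Appends one logical value, extending the last run when the value repeats. */
    void append(int value) {
        if (runCount > 0 && values[runCount - 1] == value) {
            runEnds[runCount - 1]++;
            return;
        }
        if (runCount == values.length) {
            // Grow both arrays together so they always hold the same number of runs.
            values = Arrays.copyOf(values, runCount * 2);
            runEnds = Arrays.copyOf(runEnds, runCount * 2);
        }
        values[runCount] = value;
        runEnds[runCount] = (runCount == 0 ? 0 : runEnds[runCount - 1]) + 1;
        runCount++;
    }

    /** Number of runs actually stored. */
    int getRunCount() {
        return runCount;
    }

    /** Number of logical (decoded) values represented. */
    int getLogicalCount() {
        return runCount == 0 ? 0 : runEnds[runCount - 1];
    }

    /** Random access: binary search for the first run whose end index is greater than {@code index}. */
    int get(int index) {
        if (index < 0 || index >= getLogicalCount()) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        int lo = 0;
        int hi = runCount - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (runEnds[mid] > index) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return values[lo];
    }
}
```

In this sketch, appending 7, 7, 7, 9, 9, 4 stores three runs with end indices 3, 5 and 6. There is deliberately no `set` method: overwriting a value in the middle of a run could split that run and shift every later entry, which is one plausible reason for an append-only design.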

Contributor

@emkornfield emkornfield left a comment

I took a quick glance over this PR and I think it raises several issues that it doesn't pay to tackle until the community decides to support RLE.

First, I'm pretty sure getValueCount should return the decoded value count.
Second, I'm not sure the semantics of the transfer pair are correct. How the vector is copied probably depends on whether the target vector is an RleVector or just a normal (non-RLE) vector.
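The two points above can be illustrated with a small sketch. It assumes the encoded form is a pair of plain `int[]` arrays (one value per run, plus the exclusive end index of each run); the class `RleCopySketch` and its methods are hypothetical and do not correspond to any Arrow API. The decoded count is the last run end rather than the number of stored runs, and copying into a normal target means expanding the runs, whereas copying into another RLE target could simply duplicate both arrays.

```java
import java.util.Arrays;

/** Illustrative only: decoded count and expansion of a run-length-encoded int sequence. */
final class RleCopySketch {
    /** Decoded (logical) value count: the exclusive end index of the last run. */
    static int decodedCount(int[] runEnds) {
        return runEnds.length == 0 ? 0 : runEnds[runEnds.length - 1];
    }

    /** Copy into a plain, non-RLE target: every run is expanded into repeated values. */
    static int[] decode(int[] values, int[] runEnds) {
        int[] out = new int[decodedCount(runEnds)];
        int start = 0;
        for (int run = 0; run < runEnds.length; run++) {
            // Fill the half-open range [start, runEnds[run]) with this run's value.
            Arrays.fill(out, start, runEnds[run], values[run]);
            start = runEnds[run];
        }
        return out;
    }
}
```

With values `{7, 9, 4}` and run ends `{3, 5, 6}`, the encoded form holds three entries while the decoded count is six, which is the distinction the first point draws.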

@emkornfield
Contributor

I'm going to close this until we come to consensus on the RLE Vectors/Arrays on the mailing list.

@liyafan82
Contributor Author

> I'm going to close this until we come to consensus on the RLE Vectors/Arrays on the mailing list.

Sure. Thanks for your effort.
There are some points that require further discussion.
