Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: Why are there 3 types of OffsetSizeTraits? #1638

Closed
HaoYang670 opened this issue May 2, 2022 · 5 comments
Closed

Question: Why are there 3 types of OffsetSizeTraits? #1638

HaoYang670 opened this issue May 2, 2022 · 5 comments
Labels
question Further information is requested

Comments

@HaoYang670
Copy link
Contributor

Which part is this question about
In our code, there are 3 different OffsetSizeTrait: OffsetSizeTrait, BinaryOffsetSizeTrait and StringOffsetSizeTrait. The only difference between them is that BinaryOffsetSizeTrait and StringOffsetSizeTrait have a const dataType.

My questions are:

  1. Why do we need three of them? Do we have to declare dataType as a const?
  2. Could we just use one OffsetSizeTrait and get the dataType like this:
pub struct GenericBinaryArray<OffsetSize: OffsetSizeTrait> {
    data: ArrayData,
    value_offsets: RawPtrBox<OffsetSize>,
    value_data: RawPtrBox<u8>,
}

impl<OffsetSize: OffsetSizeTrait> GenericBinaryArray<OffsetSize> {
    ...
    pub fn get_datatype() -> DataType {
        if OffsetSize::is_large() {
            DataType::LargeBinary
        } else {
            DataType::Binary
        }
    }
    ...
}
@HaoYang670 HaoYang670 added the question Further information is requested label May 2, 2022
@tustvold
Copy link
Contributor

tustvold commented May 3, 2022

I don't see an obvious reason why they need to be separate traits, especially given that get_datatype() could be a const fn...

Perhaps @alamb or @jorgecarleitao remember some context of why these are the way they are?

@alamb
Copy link
Contributor

alamb commented May 4, 2022

Perhaps @alamb or @jorgecarleitao remember some context of why these are the way they are?

They pre-date my time with the project. Maybe @nevi-me or @andygrove or @paddyhoran have some memories or context to share

@jorgecarleitao
Copy link
Member

I introduced them. At the time BinaryArray, LargeBinaryArray, StringArray and LargeStringArray were 4 different structs. The OffsetSizes were introduced to simplify the code via generics. At the time I did not realize that it could be even simpler. I agree that it can be simplified (arrow2 has only one).

@HaoYang670
Copy link
Contributor Author

So I guess we can simplify the code by just reserving one offsetTrait and adding function const fn get_datatype() for GenericBInaryArray and GenericStringArray to get the DataType of the array.

@HaoYang670
Copy link
Contributor Author

Close this issue as #1645 has been merged! Thank you for your all opinions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants