Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ergonomics of parquet::basic::LogicalType #1554

Closed
tustvold opened this issue Apr 13, 2022 · 5 comments · Fixed by #1612
Closed

Improve ergonomics of parquet::basic::LogicalType #1554

tustvold opened this issue Apr 13, 2022 · 5 comments · Fixed by #1612
Labels
good first issue Good for newcomers parquet Changes to the parquet crate

Comments

@tustvold
Copy link
Contributor

tustvold commented Apr 13, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

LogicalType is defined as

pub enum LogicalType {
    STRING(StringType),
    MAP(MapType),
    LIST(ListType),
    ENUM(EnumType),
    DECIMAL(DecimalType),
    DATE(DateType),
    TIME(TimeType),
    TIMESTAMP(TimestampType),
    INTEGER(IntType),
    UNKNOWN(NullType),
    JSON(JsonType),
    BSON(BsonType),
    UUID(UUIDType),
}

Where the inner structs are actually defined by parquet_format. This not only requires an additional crate in scope, but also a number of these are actually empty structs resulting in constructions like

LogicalType::String(Default::default)

Describe the solution you'd like

I would like to change the signature to be instead

pub enum LogicalType {
    STRING,
    MAP,
    LIST,
    ENUM,
    DECIMAL {
        scale: i32,
        precision: i32,
    },
    DATE,
    TIME {
        is_adjusted_to_u_t_c: bool,
        unit: TimeUnit,
    },
    TIMESTAMP {
        is_adjusted_to_u_t_c: bool,
        unit: TimeUnit,
    },
    INTEGER {
        bit_width: i8,
        is_signed: bool,
    },
    UNKNOWN,
    JSON,
    BSON,
    UUID,
}

I would also like it to implement Copy

Describe alternatives you've considered

We could leave the enumeration unchanged

@tustvold tustvold added enhancement Any new improvement worthy of a entry in the changelog good first issue Good for newcomers api-change Changes to the arrow API labels Apr 13, 2022
@alamb
Copy link
Contributor

alamb commented Apr 17, 2022

cc @nevi-me and @sunchao

@tustvold
Copy link
Contributor Author

We could also change the variants to be upper camel case, as is more idiomatic in Rust code 🤔

@sunchao
Copy link
Member

sunchao commented Apr 21, 2022

+1. I think the new proposal looks cleaner. Also +1 on changing to CamelCase.

@tfeda
Copy link
Contributor

tfeda commented Apr 23, 2022

Hi all, I've started implementing this. Would we want the string parser to also use CamelCase?
for example:

let message = " 
message test_schema {
    REQUIRED INT32   uint8 (INTEGER(8,false));
    REQUIRED INT32   uint16 (INTEGER(16,false));
} 
"

would be changed to

let message = " 
message test_schema {
    REQUIRED INT32   uint8 (Integer(8,false));
    REQUIRED INT32   uint16 (Integer(16,false));
} 
"

I originally came across this when starting to edit error messages, and wasn't sure if printing the enum's format or the string's format is more instructive to the user.

Ex:
general_err!("Failed to parse timeunit for Timestamp type")

@tustvold
Copy link
Contributor Author

The convention for expressing parquet schema appears to be SHOUTY CASE, I would therefore avoid making changes to the parser to not accept this. I suppose you might be able to make it case-insensitive, but I'm not sure if this is spec-compliant, we'd need to check 🤔

As for error messages, I think whatever is easiest. I personally prefer camel case as I don't like my error messages screaming at me, but the important thing is the meaning is clear 👍

@alamb alamb added parquet Changes to the parquet crate and removed enhancement Any new improvement worthy of a entry in the changelog api-change Changes to the arrow API labels Apr 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants