Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement FromStr for DataType / Parse DataType description #3821

Closed
2 tasks
alamb opened this issue Mar 8, 2023 · 9 comments · Fixed by #5994
Closed
2 tasks

Implement FromStr for DataType / Parse DataType description #3821

alamb opened this issue Mar 8, 2023 · 9 comments · Fixed by #5994
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog good first issue Good for newcomers

Comments

@alamb
Copy link
Contributor

alamb commented Mar 8, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
As part of implementing apache/datafusion#5016 in DataFusion I needed some way to convert from a string passed in by the user to a DataType.

Since we already had a function arrow_typeof that provides a useful human readable type name by calling DataType::to_string() I wanted the opposite: A way to take the output of DataType::to_string() and make a DataType

Describe the solution you'd like

I think having FromStr implementation that matches the https://docs.rs/arrow/34.0.0/arrow/datatypes/enum.DataType.html#impl-Display-for-DataType implementation for DataType would be very nice.

Example usage:

let data_type = DataType:Int32;

// use (existing) Display impl to get a string representaiton
let data_type_string = data_type.to_string();
assert_eq!(data_type_string, "Int32");

// use proposed FromStr impl to get the DataType back
let parsed_datatype: DataType = data_type_string.parse()?
assert_eq!(parsed_datatype, DataType::Int32);

Describe alternatives you've considered
@tustvold pointed out that there is already a way to encode data types in String for the IPC format https://arrow.apache.org/docs/format/CDataInterface.html#data-type-description-format-strings

While this format is (designed to be) easy to parse by computers, I don't think it is easy to parse by Humans (quick quiz, what type does tdD represent?)

Additional context
See apache/datafusion#5166 (comment)

We can probably lift the implementation from apache/datafusion#5166 into Arrow-rs

The implementation in apache/datafusion#5166 currently does not cover:

  • Parsing Timezones in Timestamp types (@waitingkuo expressed interest in this)
  • Parsing Struct, Union, Map or List types (though it does handle Dictionary)
@alamb alamb added enhancement Any new improvement worthy of a entry in the changelog arrow Changes to the arrow crate good first issue Good for newcomers labels Mar 8, 2023
@alamb
Copy link
Contributor Author

alamb commented Mar 8, 2023

I think this is a good first issue because the code and tests already exists, so this ticket would be a matter of porting the code, and adjusting the interface

@Weijun-H
Copy link
Contributor

Weijun-H commented Mar 9, 2023

This is nice extention. I want to take this ticket.

@Weijun-H
Copy link
Contributor

Weijun-H commented Mar 9, 2023

Should we also keep the part like,Int32, Int64 in arrow-rs?

@tustvold
Copy link
Contributor

Should we also keep the part like,Int32, Int64 in arrow-rs?

I'm not sure I follow what you mean, but it should be able to parse the output of DataType: Display

@Weijun-H
Copy link
Contributor

Should we also keep the part like,Int32, Int64 in arrow-rs?

I'm not sure I follow what you mean, but it should be able to parse the output of DataType: Display

Sorry for the ambiguity. I mean, should we migrate parts that have been completed in arrow-datafusion to arrow-rs?

@alamb
Copy link
Contributor Author

alamb commented Mar 10, 2023

Sorry for the ambiguity. I mean, should we migrate parts that have been completed in arrow-datafusion to arrow-rs?

that would be my suggestion

@alamb
Copy link
Contributor Author

alamb commented Mar 10, 2023

Maybe we can port the implementation in arrow-datafusion first and then add support for Timezones and struct types as follow on PRs

@opensourcegeek
Copy link
Contributor

If this issue is still open, can I take a stab at it please?

@alamb
Copy link
Contributor Author

alamb commented May 9, 2024

Yes please @opensourcegeek -- I think this would be a fairly straightforward addition (port code, add some docs and tests)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants