Improve GetIndexedFieldExpr adding utf8 key based access for struct v… #1204

Merged
merged 2 commits into from
Nov 2, 2021

Conversation

Igosuki
Contributor

@Igosuki Igosuki commented Oct 30, 2021

…alues

Which issue does this PR close?

Closes #119

Rationale for this change

Adds struct support for selecting nested fields. With this, users can access arbitrarily nested values in lists and structs, which should cover most use cases.

What changes are included in this PR?

GetIndexedFieldExpr gains StructArray support, allowing queries such as select struct_col["bar"][0] from structs
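
To illustrate the idea of chaining a utf8 key access on a struct with a positional access on a list, here is a minimal, std-only sketch. It is not the code from this PR and does not use Arrow or DataFusion types: `Value`, `Key`, `get_indexed_field`, and `get_path` are hypothetical names made up for this example, and zero-based list indexing is an assumption of the sketch.

```rust
// Hypothetical, simplified model of nested values.
// These are NOT Arrow's or DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Utf8(String),
    List(Vec<Value>),
    Struct(Vec<(String, Value)>),
}

// The two kinds of key an indexed-field access can carry in this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Key {
    Index(usize),  // positional access into a list, e.g. col[0] (zero-based here)
    Field(String), // utf8 key access into a struct, e.g. col["bar"]
}

// Resolve one level of access. Returns None when the key kind does not
// match the value kind, or when the index or field name is not present.
fn get_indexed_field<'a>(value: &'a Value, key: &Key) -> Option<&'a Value> {
    match (value, key) {
        (Value::List(items), Key::Index(i)) => items.get(*i),
        (Value::Struct(fields), Key::Field(name)) => fields
            .iter()
            .find(|(field_name, _)| field_name == name)
            .map(|(_, v)| v),
        _ => None,
    }
}

// Apply a chain of keys from left to right, as in struct_col["bar"][0].
fn get_path<'a>(value: &'a Value, path: &[Key]) -> Option<&'a Value> {
    path.iter()
        .try_fold(value, |current, key| get_indexed_field(current, key))
}

fn main() {
    // One row of a hypothetical column shaped like {"bar": [10, 20], "name": "a"}.
    let row = Value::Struct(vec![
        (
            "bar".to_string(),
            Value::List(vec![Value::Int(10), Value::Int(20)]),
        ),
        ("name".to_string(), Value::Utf8("a".to_string())),
    ]);

    // Equivalent in spirit to struct_col["bar"][0].
    let path = [Key::Field("bar".to_string()), Key::Index(0)];
    println!("{:?}", get_path(&row, &path)); // prints Some(Int(10))
}
```

In this sketch a mismatched access, such as a string key applied to a list, yields `None` rather than an error; how the real expression reports such cases is not covered here.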

Are there any user-facing changes?

Nope

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Oct 30, 2021
Member

@houqp houqp left a comment

LGTM, thanks @Igosuki !

@houqp houqp added sql SQL Planner enhancement New feature or request labels Nov 1, 2021
@Igosuki
Contributor Author

Igosuki commented Nov 1, 2021

@houqp Maybe I should add this feature to the docs? https://arrow.apache.org/datafusion/user-guide/sql/index.html
Edit: actually, that's not a good idea, because the Arrow Parquet reader currently doesn't support nested data structures. I ran into this when trying to run the same query over Parquet files that I run over Avro files.

@houqp
Member

houqp commented Nov 2, 2021

@Igosuki I think it's a good idea to add it to the SQL reference, actually. The fact that the Parquet reader doesn't support nested types is not relevant to the SQL reference. I think it should work just fine with the JSON reader at least?

@Igosuki
Contributor Author

Igosuki commented Nov 2, 2021

@houqp Yes, since the JSON reader supports nested data, it should work without issues.

ctx.register_table("structs", table_a)?;

// Original column is micros, convert to millis and check timestamp
let sql = "SELECT some_struct[\"bar\"] as l0 FROM structs LIMIT 3";
Contributor

this is so cool!

@alamb
Contributor

alamb commented Nov 2, 2021

Filed #1222 to track doc update

@alamb
Contributor

alamb commented Nov 2, 2021

Thanks again @Igosuki !

@alamb alamb merged commit 6a7dbbb into apache:master Nov 2, 2021
Development

Successfully merging this pull request may close these issues.

Add SQL support for referencing fields in structs
3 participants