Skip to content

Conversation

@jorgecarleitao
Copy link
Member

This adds a check, during logical planning, to the uniqueness of column names after a projection and aggregation.

This eliminates the risk that a user creates a wrong logical query by the first column named X when there are two columns named X.

This does not eliminate this risk when the column name is duplicated on a scan.

@github-actions
Copy link

github-actions bot commented Oct 3, 2020

@jorgecarleitao jorgecarleitao marked this pull request as draft October 3, 2020 21:22
@jorgecarleitao
Copy link
Member Author

The tests are currently failing due to ARROW-10170

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is good (and the well worded error message will be appreciated by users for years to come!)

The only thing I could think of would be to apply the validate_unique_names check to all logical plan nodes (specifically, I think there might be issues if an input table scan or parquet file) had duplicate field names.

However, this code is better than what is on master so I think it could be merged as is too

@jorgecarleitao jorgecarleitao marked this pull request as ready for review October 5, 2020 04:17
Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants