Skip to content

Spark: Parse sort orders with zorder expressions#5185

Closed
rdblue wants to merge 2 commits intoapache:masterfrom
rdblue:spark-parse-sort-order-zorder
Closed

Spark: Parse sort orders with zorder expressions#5185
rdblue wants to merge 2 commits intoapache:masterfrom
rdblue:spark-parse-sort-order-zorder

Conversation

@rdblue
Copy link
Contributor

@rdblue rdblue commented Jul 3, 2022

This updates the RewriteDataFilesProcedure to directly call the parser to parse sort orders, instead of creating an ALTER TABLE statement and extracting the sort order.

This also makes a few updates to allow creating a Term from a zorder expression when parsed by Spark, so that the parser can pass back a zorder sort order representation (using RawSortField because SortOrder cannot contain a Zorder expression).

@rdblue
Copy link
Contributor Author

rdblue commented Jul 3, 2022

@ajantha-bhat, this demonstrates how to use the Spark extensions parser to parse a Zorder expression. This also updates the current procedure to use the parser directly. You can use this for your update to rewrite_data_files to support Zorder.

@ajantha-bhat
Copy link
Member

@rdblue : Thank you so much. I will update my PR with this change and add more test cases and a co-authored by message.

import java.util.List;

class SortOrderParserUtil {
public class Zorder implements Term {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this implement UnboundTerm?

Because SortOrder only accepts UnboundTerm. https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/SortOrder.java#L258

Also, Now sortOrder needs some extra array or list that says which of the fields are Zordered in the sortOrder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. UnboundTerm expects to have a single ref.

@ajantha-bhat
Copy link
Member

I have updated my PR as per this.
#4902

@rdblue
Copy link
Contributor Author

rdblue commented Jul 5, 2022

Closing this in favor of #4902.

@rdblue rdblue closed this Jul 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants