@dtenedor dtenedor commented Nov 23, 2024

What changes were proposed in this pull request?

This PR adds SQL pipe syntax support for the SET operator.

This operator removes one or more existing columns from the input table and replaces each one with a new computed column whose value is the result of evaluating the specified expression.

This is equivalent to SELECT * EXCEPT (name), <newExpressions> AS name in the SQL compiler. It is provided as a convenience feature, and its functionality overlaps somewhat with lateral column aliases.

For example:

-- Setting with an expression.
values (0, 'pqr', 2), (3, 'tuv', 5) as tab(a, b, c)
|> set c = a + length(b);

0, 'pqr', 3
3, 'tuv', 6
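
To make the semantics above concrete, here is a minimal sketch in plain Scala that models `|> SET` on in-memory rows. It is not Spark's implementation: `PipeSetSketch`, `Row`, `column`, and `pipeSet` are hypothetical names introduced only for illustration. The sketch assumes that each expression is evaluated against the original input row and that a replaced column keeps its position; in the example above `c` is the last column, so the result is the same whether the new column stays in place or is appended.

```scala
// Toy model of the |> SET semantics; names and behavior are assumptions, not Spark code.
object PipeSetSketch {
  // A row is an ordered list of (columnName, value) pairs.
  type Row = Vector[(String, Any)]

  // Look up a column value by name, failing if the column is absent.
  def column(row: Row, name: String): Any =
    row.collectFirst { case (n, v) if n == name => v }
      .getOrElse(throw new NoSuchElementException(s"no column named '$name'"))

  // Replace each assigned column with the value of its expression.
  // Assumption: expressions see the original row, and SET only applies to existing columns.
  def pipeSet(row: Row, assignments: Seq[(String, Row => Any)]): Row = {
    val newValues: Map[String, Any] =
      assignments.map { case (name, expr) =>
        require(row.exists(_._1 == name), s"column '$name' does not exist in the input")
        name -> expr(row)
      }.toMap
    // Assumption: a replaced column keeps its original position.
    row.map { case (name, value) => name -> newValues.getOrElse(name, value) }
  }

  def main(args: Array[String]): Unit = {
    // Same data as: values (0, 'pqr', 2), (3, 'tuv', 5) as tab(a, b, c)
    val tab: Seq[Row] = Seq(
      Vector("a" -> 0, "b" -> "pqr", "c" -> 2),
      Vector("a" -> 3, "b" -> "tuv", "c" -> 5)
    )
    // Models: |> set c = a + length(b)
    val setC: Row => Any = row =>
      column(row, "a").asInstanceOf[Int] + column(row, "b").asInstanceOf[String].length
    val result = tab.map(row => pipeSet(row, Seq("c" -> setC)))
    result.foreach(row => println(row.map(_._2).mkString(", ")))
  }
}
```

Running `main` prints one line per row with the recomputed `c`, matching the values shown in the example above (3 and 6). Assigning to a column that is not in the input fails the `require` check, reflecting that the operator replaces existing columns.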

Why are the changes needed?

The SQL pipe operator syntax will let users compose queries in a more flexible fashion.

Does this PR introduce any user-facing change?

Yes, see above.

How was this patch tested?

This PR adds a few unit test cases, but mostly relies on golden file test coverage. Golden files verify that the query results are correct as the feature is implemented, and they also let us inspect the analyzer output plans to confirm that they look right.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Nov 23, 2024
@dtenedor dtenedor changed the title [WIP][SPARK-49566][SQL] Add SQL pipe syntax for the SET operator [SPARK-49566][SQL] Add SQL pipe syntax for the SET operator Nov 23, 2024
@dtenedor dtenedor marked this pull request as ready for review November 23, 2024 04:03
@dtenedor
Contributor Author

cc @cloud-fan @gengliangwang here is the |> SET operator :)

respond to code review comments

respond to code review comments

respond to code review comments
Contributor Author

@dtenedor dtenedor left a comment

Thanks @cloud-fan for your review! Please take another look.

@dtenedor dtenedor requested a review from cloud-fan November 25, 2024 19:01
Contributor Author

@dtenedor dtenedor left a comment

Thanks again @cloud-fan for your reviews!!

Contributor Author

@dtenedor dtenedor left a comment

Thanks @gengliangwang for your review!

respond to code review comments
Contributor Author

@dtenedor dtenedor left a comment

Thanks @cloud-fan for your review again.

@dtenedor dtenedor requested a review from cloud-fan November 27, 2024 19:15
case class UnresolvedStarExcept(target: Option[Seq[String]], excepts: Seq[Seq[String]])
case class UnresolvedStarExceptOrReplace(
    target: Option[Seq[String]],
    excepts: Seq[Seq[String]],
Contributor

The excepts here can be nested columns, so technically we can support nested columns in the SET clause. This can be done as a follow-up improvement.

Contributor Author

@cloud-fan yes, and we could also support SELECT * REPLACE(<column_name>, <new_expression>) (some other SQL engines support this). We can keep these features in mind for adding to Spark in the future.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 68be1da Nov 28, 2024
@cloud-fan
Contributor

@HyukjinKwon @allisonwang-db shall we implement df.withColumn this way?
