Skip to content

Conversation

@mihailotim-db
Copy link
Contributor

@mihailotim-db mihailotim-db commented Oct 22, 2024

What changes were proposed in this pull request?

This PR proposed a refactor to TypeCoercion rule in order to separate logic for transformation of a single node into separate files. Refactor is doing the following:

  • Split the interface TypeCoercionBase into TypeCoercionBase and TypeCoercionHelper in separate abstract classes
  • TypeCoercionHelper contains declarations of all utility virtual methods that are later overriden in TypeCoercion and AnsiTypeCoercion
  • transform methods that rely on these virtual methods are refactored so that the check whether children have been resolved remains as is while other match cases are refactored and moved into separate objects inside TypeCoercionHelper
  • transform methods that don't depend on these virtual methods are refactored to separate files in a similar manner to previous point
  • In cases where check whether the children are resolved is wrapped into another case, that case is split into 2 cases: one that checks children and one that does the other check
  • All type coercion rules shared between TypeCoercion and AnsiTypeCoercion remain in TypeCoercionBase.
  • TypeCoercionBase inherits TypeCoercionHelper
  • TypeCoercion and AnsiTypeCoercion inherit TypeCoercionBase

This PR doesn't refactor type coercion rules that don't have a separate transform method but traverse the tree from their apply method. This will be done in a separate PRs.

Why are the changes needed?

Refactoring to support Analyzer++ effort

Does this PR introduce any user-facing change?

How was this patch tested?

Existing tests

Was this patch authored or co-authored using generative AI tooling?

@github-actions github-actions bot added the SQL label Oct 22, 2024
@mihailotim-db mihailotim-db changed the title refactor [WIP] Refactor TypeCoercion and AnsiTypeCoercion to separate single node transformations Oct 22, 2024
@mihailotim-db mihailotim-db changed the title [WIP] Refactor TypeCoercion and AnsiTypeCoercion to separate single node transformations [WIP][SPARK-50068] Refactor TypeCoercion and AnsiTypeCoercion to separate single node transformations Oct 22, 2024
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/refactor_type_coercion branch from 1cfea26 to 0c1e6b2 Compare October 22, 2024 09:12
@github-actions github-actions bot added the CORE label Oct 22, 2024
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/refactor_type_coercion branch 2 times, most recently from 1fb8c74 to ed8f121 Compare October 22, 2024 09:21
@github-actions github-actions bot removed the CORE label Oct 22, 2024
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/refactor_type_coercion branch 3 times, most recently from 023c7d5 to 41cd1ab Compare October 22, 2024 10:54
@mihailotim-db mihailotim-db changed the title [WIP][SPARK-50068] Refactor TypeCoercion and AnsiTypeCoercion to separate single node transformations [SPARK-50068] Refactor TypeCoercion and AnsiTypeCoercion to separate single node transformations Oct 22, 2024
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/refactor_type_coercion branch 3 times, most recently from d4de3f0 to 945c86a Compare October 22, 2024 14:19
Copy link
Contributor

@dtenedor dtenedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the refactor!! This is covered by existing tests. The PR generally looks good, just a couple of naming and comment changes.

Copy link
Contributor

@dtenedor dtenedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after responding to remaining comments

@mihailotim-db mihailotim-db force-pushed the mihailotim-db/refactor_type_coercion branch 2 times, most recently from e94a12f to 1b6fd9d Compare October 23, 2024 07:36
Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this PR contains to much changes. Let's split it, and make the changes step-by-step. @cloud-fan @dongjoon-hyun @HyukjinKwon WDYT?

@cloud-fan cloud-fan changed the title [SPARK-50068] Refactor TypeCoercion and AnsiTypeCoercion to separate single node transformations [SPARK-50068][SQL] Refactor TypeCoercion and AnsiTypeCoercion to separate single node transformations Oct 23, 2024
@HyukjinKwon
Copy link
Member

yeah

@mihailotim-db
Copy link
Contributor Author

Reverted renaming changes

@mihailotim-db mihailotim-db force-pushed the mihailotim-db/refactor_type_coercion branch from 5c727f6 to 3457a9b Compare October 25, 2024 14:37
This reverts commit 1b6fd9d8b3f366df9582bda467e72aa95eecb740.
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/refactor_type_coercion branch from 3457a9b to 9b85a74 Compare October 28, 2024 08:58
Comment on lines +144 to +149
def apply(expression: Expression): Expression = {
decimalAndDecimal()
.orElse(integralAndDecimalLiteral)
.orElse(nondecimalAndDecimal(conf.literalPickMinimumPrecision))
.lift(expression).getOrElse(expression)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan I changed all partial functions to regular apply methods. For consistency, I also changed this DecimalPrecision method from partial to regular. Can we do it this way?

import org.apache.spark.sql.connector.catalog.procedures.BoundProcedure
import org.apache.spark.sql.types.DataType

abstract class TypeCoercionBase extends TypeCoercionHelper {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to confirm, do we just move it from TypeCoercion.scala to here without any change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly

* Type coercion helper that matches against function expression in order to type coerce function
* argument types to expected types.
*/
object FunctionArgumentTypeCoercion {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are they in the helper instead of individual files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because these objects depend on methods that are defined in Helper but are only implemented in concrete TypeCoercion objects. This makes it so that the inheritance chain needs to be Helper -> FunctionArgumentTypeCoercion -> TypeCoercion/AnsiTypeCoercion. So I think all these helper objects need to be bunched into a single class so I just left them at the root as it seemed to be the cleanest solution. Don't really love it, so if you have any suggestions we should change it.

@mihailotim-db mihailotim-db force-pushed the mihailotim-db/refactor_type_coercion branch from 2184ad5 to d1e0e6f Compare October 30, 2024 07:05
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 18bbddd Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants