Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-41726][SQL] Remove OptimizedCreateHiveTableAsSelectCommand #39263

Closed
wants to merge 3 commits into from

Conversation

ulysses-you
Copy link
Contributor

What changes were proposed in this pull request?

This pr removes OptimizedCreateHiveTableAsSelectCommand and move the code that tune InsertIntoHiveTable to InsertIntoHadoopFsRelationCommand into RelationConversions.

Why are the changes needed?

CTAS use a nested execution to do data writing, so it is unnecessary to have OptimizedCreateHiveTableAsSelectCommand. The inside InsertIntoHiveTable would be converted to InsertIntoHadoopFsRelationCommand if possible.

Does this PR introduce any user-facing change?

no

How was this patch tested?

fix test

@github-actions github-actions bot added the SQL label Dec 28, 2022
assert(commands.head.nodeName == "Execute CreateHiveTableAsSelectCommand")

val v1WriteCommand = commands(1)
if (isConverted && isConvertedCtas) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the removed test in HiveExplainSuite.scala has been covered in this test.

@ulysses-you
Copy link
Contributor Author

cc @cloud-fan

// `InsertIntoHiveTable` is derived from `CreateHiveTableAsSelectCommand`.
// This pattern would not cause conflicts because this rule is always applied before
// `HiveAnalysis` and both of these rules are running once.
case InsertIntoHiveTable(tableDesc, _, query, overwrite, ifPartitionNotExists, _)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, how do we optimize hive table insertion before? Given there is no case for InsertIntoHiveTable before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before, we optimize hive table insertion inside OptimizedCreateHiveTableAsSelectCommand. It returns InsertIntoHadoopFsRelationCommand in getWritingCommand. Now, CreateHiveTableAsSelectCommand always use InsertIntoHiveTable for data writing and we match this pattern to do optimize.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean normal table insertion, not table insertion inside CTAS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we match InsertIntoStatement and covert hive relation to HadoopFsRelation then datasource can tune it to InsertIntoHadoopFsRelationCommand. see line 223

That said, we can make CreateHiveTableAsSelectCommand use InsertIntoStatement instead of InsertIntoHiveTable but one issue is we can not aware of if the InsertIntoStatement is from CTAS. So here I use InsertIntoHiveTable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, can we make the code comments clearer? This only matches table insertion inside Hive CTAS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated comment

@ulysses-you ulysses-you force-pushed the SPARK-41726 branch 2 times, most recently from 00cf68f to 280c255 Compare December 29, 2022 01:14
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-41726][SQL] Remove OptimizedCreateHiveTableAsSelectCommand [SPARK-41726][SQL] Remove OptimizedCreateHiveTableAsSelectCommand Dec 30, 2022
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Merged to master for Apache Spark 3.4.
Thank you, @ulysses-you and @cloud-fan .

@ulysses-you
Copy link
Contributor Author

thank you @dongjoon-hyun and @cloud-fan !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants