New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-41726][SQL] Remove OptimizedCreateHiveTableAsSelectCommand
#39263
Conversation
assert(commands.head.nodeName == "Execute CreateHiveTableAsSelectCommand") | ||
|
||
val v1WriteCommand = commands(1) | ||
if (isConverted && isConvertedCtas) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the removed test in HiveExplainSuite.scala has been covered in this test.
cc @cloud-fan |
384a937
to
b91971b
Compare
// `InsertIntoHiveTable` is derived from `CreateHiveTableAsSelectCommand`. | ||
// This pattern would not cause conflicts because this rule is always applied before | ||
// `HiveAnalysis` and both of these rules are running once. | ||
case InsertIntoHiveTable(tableDesc, _, query, overwrite, ifPartitionNotExists, _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait, how do we optimize hive table insertion before? Given there is no case for InsertIntoHiveTable
before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before, we optimize hive table insertion inside OptimizedCreateHiveTableAsSelectCommand
. It returns InsertIntoHadoopFsRelationCommand
in getWritingCommand
. Now, CreateHiveTableAsSelectCommand
always use InsertIntoHiveTable
for data writing and we match this pattern to do optimize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean normal table insertion, not table insertion inside CTAS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we match InsertIntoStatement
and covert hive relation to HadoopFsRelation then datasource can tune it to InsertIntoHadoopFsRelationCommand
. see line 223
That said, we can make CreateHiveTableAsSelectCommand
use InsertIntoStatement
instead of InsertIntoHiveTable
but one issue is we can not aware of if the InsertIntoStatement
is from CTAS. So here I use InsertIntoHiveTable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, can we make the code comments clearer? This only matches table insertion inside Hive CTAS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated comment
00cf68f
to
280c255
Compare
280c255
to
8db2dbe
Compare
OptimizedCreateHiveTableAsSelectCommand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Merged to master for Apache Spark 3.4.
Thank you, @ulysses-you and @cloud-fan .
thank you @dongjoon-hyun and @cloud-fan ! |
What changes were proposed in this pull request?
This pr removes
OptimizedCreateHiveTableAsSelectCommand
and move the code that tuneInsertIntoHiveTable
toInsertIntoHadoopFsRelationCommand
intoRelationConversions
.Why are the changes needed?
CTAS use a nested execution to do data writing, so it is unnecessary to have
OptimizedCreateHiveTableAsSelectCommand
. The insideInsertIntoHiveTable
would be converted toInsertIntoHadoopFsRelationCommand
if possible.Does this PR introduce any user-facing change?
no
How was this patch tested?
fix test