-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-2919][VL]Fix incorrect hive scan fallback or offload for velox #2922
Conversation
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
if (!hasComplexType) { | ||
ValidationResult.ok | ||
} else { | ||
ValidationResult.notOk("does not support complex type") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "Gluten does not support complex type for ORC format" better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
// LocalRelation will exercise the optimization rules better by disabling it as | ||
// this rule may potentially block testing of other optimization rules such as | ||
// ConstantPropagation etc. | ||
.set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, ConvertToLocalRelation.ruleName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a new trait e.g., GlutenQueryHiveSuiteTrait
and make VeloxParquetWriteForHiveSuite
extend this as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VeloxParquetWriteForHiveSuite
is in a different module backends-velox
and these test utils are not passed as dependency in these modules. But we can add a GlutenHiveSQLQuerySuiteTrait
for Hive related test cases later.
spark.sessionState.catalog.dropTable( | ||
TableIdentifier("test_orc"), | ||
ignoreIfNotExists = true, | ||
purge = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hint: use purge = true
avoiding using Trash directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is following vanilla spark UTs like ParquetQuerySuite
/OrcQuerySuite
, etc.
purge = false) | ||
} | ||
|
||
test("hive orc scan") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, duplicated test cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
Thanks for updates. LGTM. |
Run Gluten Clickhouse CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
verified from customer queries
Hi binwei, this PR failed some CK UTs as changed some code in common path. I need do a update today. Please pending on merge. |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
ValidationResult.notOk("does not support complex type") | ||
} | ||
case _ => ValidationResult.notOk("Unknown file format") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
above check is specific for ClickHouse backend so moved to corresponding backend API instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC: @zzcclp this piece of code moved to CK backend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@zhouyuan please help take a look. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for improving on this!
} | ||
for (unsupportedDataType <- unsupportedDataTypes) { | ||
// scalastyle:off println | ||
println( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the println
for debug purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. This is necessary to print out the fallback reason directly as the info hard to pass back to FallbackReporter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use log to print the message?
@taiyang-li @lgbo-ustc please help to review thanks. |
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
|
What changes were proposed in this pull request?
Due to some code change for
HiveTableScanExecTransformer
, like PR and PR, etc, scan offloading behavior becomes incorrect, such as ORC format scan can't be offloaded anymore. The root cause is that some ClickHouse format/datatype check logic is added in this Op which is supposed to add in CKBackend API instead. This PR intends to fix it.How was this patch tested?
UT