[GLUTEN-2919][VL]Fix incorrect hive scan fallback or offload for velox #2922

Merged (4 commits) on Sep 25, 2023

Contributor

@yma11 yma11 commented Aug 28, 2023

What changes were proposed in this pull request?

Recent code changes to HiveTableScanExecTransformer (such as PR and PR) made scan offloading behave incorrectly; for example, ORC format scans can no longer be offloaded. The root cause is that ClickHouse-specific format/datatype check logic was added to this operator, when it belongs in the CKBackend API instead. This PR fixes that.
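
The idea of keeping backend-specific checks out of the shared operator can be sketched roughly as follows. This is a minimal, self-contained illustration and not Gluten's actual code: `BackendScanApi`, `ClickHouseLikeBackend`, `VeloxLikeBackend`, `HiveScanTransformerSketch`, the `FileFormat` cases, and the specific rules each backend applies are all hypothetical stand-ins chosen for the example.

```scala
// Hypothetical, simplified stand-ins; these are NOT Gluten's real classes.
final case class ValidationResult(isValid: Boolean, reason: Option[String])

object ValidationResult {
  val ok: ValidationResult = ValidationResult(isValid = true, reason = None)
  def notOk(reason: String): ValidationResult =
    ValidationResult(isValid = false, reason = Some(reason))
}

sealed trait FileFormat
case object Parquet extends FileFormat
case object Orc extends FileFormat
case object Text extends FileFormat

// Each backend decides for itself which formats and types it can scan natively.
trait BackendScanApi {
  def validateScan(format: FileFormat, hasComplexType: Boolean): ValidationResult
}

// Assumed rules, for illustration only.
object ClickHouseLikeBackend extends BackendScanApi {
  def validateScan(format: FileFormat, hasComplexType: Boolean): ValidationResult =
    format match {
      case Text if hasComplexType =>
        ValidationResult.notOk("does not support complex type")
      case Text | Parquet => ValidationResult.ok
      case _ => ValidationResult.notOk("Unknown file format")
    }
}

// Assumed rules, for illustration only.
object VeloxLikeBackend extends BackendScanApi {
  def validateScan(format: FileFormat, hasComplexType: Boolean): ValidationResult =
    format match {
      case Orc if hasComplexType =>
        ValidationResult.notOk("Gluten does not support complex type for ORC format")
      case Orc | Parquet => ValidationResult.ok
      case _ => ValidationResult.notOk("Unknown file format")
    }
}

// The shared operator carries no backend-specific rules; it only delegates.
final class HiveScanTransformerSketch(backend: BackendScanApi) {
  def doValidate(format: FileFormat, hasComplexType: Boolean): ValidationResult =
    backend.validateScan(format, hasComplexType)
}
```

With this shape, a rule added for one backend cannot silently change what another backend offloads: the same ORC scan validates under `VeloxLikeBackend` and falls back under `ClickHouseLikeBackend`, and the operator itself never changes.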

How was this patch tested?

UT

@github-actions

#2919

@github-actions

Run Gluten Clickhouse CI

if (!hasComplexType) {
  ValidationResult.ok
} else {
  ValidationResult.notOk("does not support complex type")
Contributor

@winningsix winningsix Aug 29, 2023

nit: "Gluten does not support complex type for ORC format" better?

Contributor Author

Updated.

// LocalRelation will exercise the optimization rules better by disabling it as
// this rule may potentially block testing of other optimization rules such as
// ConstantPropagation etc.
.set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, ConvertToLocalRelation.ruleName)
Contributor

Can we have a new trait e.g., GlutenQueryHiveSuiteTrait and make VeloxParquetWriteForHiveSuite extend this as well?

Contributor Author

VeloxParquetWriteForHiveSuite lives in a different module, backends-velox, and these test utils are not available as a dependency there. But we can add a GlutenHiveSQLQuerySuiteTrait for Hive-related test cases later.

spark.sessionState.catalog.dropTable(
  TableIdentifier("test_orc"),
  ignoreIfNotExists = true,
  purge = false)
Contributor

hint: use purge = true to avoid going through the Trash directory?

Contributor Author

This follows vanilla Spark UTs such as ParquetQuerySuite and OrcQuerySuite.

purge = false)
}

test("hive orc scan") {
Contributor

hmm, duplicated test cases?

Contributor Author

Removed.

@winningsix
Contributor

Thanks for updates. LGTM.

FelixYBW previously approved these changes Sep 19, 2023
Contributor

@FelixYBW FelixYBW left a comment

verified from customer queries

@yma11
Contributor Author

yma11 commented Sep 20, 2023

> verified from customer queries

Hi Binwei, this PR fails some CK UTs because it changes code in the common path. I need to push an update today, so please hold off on merging.

ValidationResult.notOk("does not support complex type")
}
case _ => ValidationResult.notOk("Unknown file format")
}
Contributor Author

The check above is specific to the ClickHouse backend, so it has been moved to the corresponding backend API instead.

Contributor

CC: @zzcclp this piece of code has been moved to the CK backend.

Contributor

LGTM

@yma11
Contributor Author

yma11 commented Sep 23, 2023

@zhouyuan please help take a look. Thanks.

Contributor

@zhouyuan zhouyuan left a comment

thanks for improving on this!

}
for (unsupportedDataType <- unsupportedDataTypes) {
// scalastyle:off println
println(
Contributor

Is the println for debugging purposes?

Contributor Author

Not really. It is needed to print the fallback reason directly, because the info is hard to pass back to FallbackReporter.

Contributor

Can we use a logger to print the message?
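
A rough sketch of what the logger suggestion could look like is below. It is an illustration under assumptions, not the code from this PR: `FallbackLogSketch`, `fallbackMessages`, `report`, and the message wording are hypothetical, and `java.util.logging` is used only to keep the example self-contained. In a Spark-based project, the `org.apache.spark.internal.Logging` trait would be the more natural choice.

```scala
import java.util.logging.{Level, Logger}

// Hypothetical helper; the names and the message text are assumptions.
object FallbackLogSketch {
  private val logger: Logger = Logger.getLogger("FallbackLogSketch")

  // Builds one message per unsupported data type. Returning the messages
  // (instead of only emitting them) keeps the helper easy to test.
  def fallbackMessages(unsupportedDataTypes: Seq[String]): Seq[String] =
    unsupportedDataTypes.map { dataType =>
      s"Validation failed: data type $dataType is not supported, falling back"
    }

  // Emits the messages through a logger rather than println, so the
  // scalastyle println rule does not need to be switched off.
  def report(unsupportedDataTypes: Seq[String]): Unit =
    fallbackMessages(unsupportedDataTypes).foreach(msg => logger.log(Level.WARNING, msg))
}
```

Going through a logger also lets the fallback reason be filtered or redirected by the normal logging configuration, which println does not allow.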

@yma11 yma11 changed the title [GLUTEN-2919][VL]Support orc format in hive scan [GLUTEN-2919][VL]Fix incorrect hive scan fallback or offload caused by code refactor Sep 25, 2023
@yma11 yma11 changed the title [GLUTEN-2919][VL]Fix incorrect hive scan fallback or offload caused by code refactor [GLUTEN-2919][VL]Fix incorrect hive scan fallback or offload for velox Sep 25, 2023
@zzcclp
Contributor

zzcclp commented Sep 25, 2023

@taiyang-li @lgbo-ustc please help review, thanks.

@lgbo-ustc
Contributor

LGTM

Contributor

@zhouyuan zhouyuan left a comment

👍

@zhouyuan zhouyuan merged commit 4bcd061 into apache:main Sep 25, 2023
19 checks passed
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

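| query | log/native_2922_time.csv | log/native_master_09_24_2023_e7152a05a_time.csv | difference | percentage |
|-------|--------------------------|--------------------------------------------------|------------|------------|
| q1    | 43.42   | 43.08   | -0.339 | 99.22%  |
| q2    | 24.76   | 24.37   | -0.392 | 98.42%  |
| q3    | 37.82   | 36.77   | -1.047 | 97.23%  |
| q4    | 41.56   | 41.00   | -0.566 | 98.64%  |
| q5    | 71.74   | 70.15   | -1.586 | 97.79%  |
| q6    | 6.69    | 5.11    | -1.584 | 76.32%  |
| q7    | 86.68   | 86.15   | -0.536 | 99.38%  |
| q8    | 82.41   | 78.81   | -3.597 | 95.64%  |
| q9    | 115.76  | 115.81  | 0.056  | 100.05% |
| q10   | 46.01   | 46.15   | 0.139  | 100.30% |
| q11   | 18.77   | 19.57   | 0.805  | 104.29% |
| q12   | 25.41   | 25.39   | -0.020 | 99.92%  |
| q13   | 49.65   | 50.31   | 0.661  | 101.33% |
| q14   | 17.78   | 19.95   | 2.171  | 112.21% |
| q15   | 30.94   | 30.64   | -0.299 | 99.03%  |
| q16   | 16.10   | 16.03   | -0.071 | 99.56%  |
| q17   | 120.85  | 120.09  | -0.765 | 99.37%  |
| q18   | 162.48  | 164.22  | 1.741  | 101.07% |
| q19   | 14.39   | 12.51   | -1.881 | 86.93%  |
| q20   | 30.12   | 30.33   | 0.214  | 100.71% |
| q21   | 237.74  | 238.20  | 0.467  | 100.20% |
| q22   | 15.76   | 15.33   | -0.434 | 97.25%  |
| total | 1296.84 | 1289.98 | -6.863 | 99.47%  |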
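
The report does not say how its last two columns are derived. Judging from the rows, the difference looks like the master time minus this PR's time, and the percentage like the master time divided by this PR's time. The sketch below encodes that reading; the relationship is an inference from the numbers, and `PerfReportSketch` and its method names are hypothetical.

```scala
// Assumed relationship, inferred from the rows rather than stated by the report.
object PerfReportSketch {
  // Negative values mean the master run took less time than the PR run.
  def difference(prTime: Double, masterTime: Double): Double =
    masterTime - prTime

  // Values below 100 mean the master run took less time than the PR run.
  def percentage(prTime: Double, masterTime: Double): Double =
    masterTime / prTime * 100.0
}
```

For q1 (43.42 and 43.08), this gives about -0.34 and about 99.22%, in line with the reported row. Small mismatches in the last digit for other rows are expected, since the report was presumably computed from unrounded timings.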

@yma11 yma11 deleted the orc-hive branch January 10, 2024 07:23
7 participants