[VL] Enable array_repeat & array_except function #4504

Merged
merged 2 commits into from
Feb 20, 2024

PHILO-HE commented Jan 24, 2024

What changes were proposed in this pull request?

These functions should have been supported: array_repeat, array_except, array_distinct, array_position. Two of them need only a small amount of code to be mapped correctly to the corresponding Velox functions. Tests were also added to verify that they are offloaded.

How was this patch tested?

Spark UTs and a new test.

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Run Gluten Clickhouse CI

PHILO-HE commented Feb 1, 2024

A Velox PR is fixing the test issue reported for array_repeat: facebookincubator/velox#8630.

- SPARK-36753: ArrayExcept should handle duplicated Double.NaN and Float.Nan *** FAILED ***
2024-01-24T07:34:28.7280340Z   Incorrect evaluation: array_except([NaN,1.0], [NaN]), actual: WrappedArray(NaN, 1.0), expected: List(1.0) (GlutenTestsTrait.scala:288)
- ArrayRepeat *** FAILED ***
2024-01-24T07:34:19.3744283Z   org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 347.0 failed 1 times, most recent failure: Lost task 1.0 in stage 347.0 (TID 695) (8c96ae9a310d executor driver): io.glutenproject.exception.GlutenException: java.lang.RuntimeException: Exception: VeloxUserError
2024-01-24T07:34:19.3746335Z Error Source: USER
2024-01-24T07:34:19.3746817Z Error Code: INVALID_ARGUMENT
2024-01-24T07:34:19.3747669Z Reason: (-1 vs. 0) Count argument of repeat function must be greater than or equal to 0
2024-01-24T07:34:19.3748471Z Retriable: False
2024-01-24T07:34:19.3748917Z Expression: count >= 0
2024-01-24T07:34:19.3749508Z Context: repeat(hi:VARCHAR, -1:INTEGER)
2024-01-24T07:34:19.3750128Z Top-Level Context: Same as context.
2024-01-24T07:34:19.3750667Z Function: checkCount
2024-01-24T07:34:19.3751222Z File: ../../velox/functions/prestosql/Repeat.cpp
2024-01-24T07:34:19.3751796Z Line: 52
2024-01-24T07:34:19.3752264Z Stack trace:
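
The two failures in this log come down to semantic differences that can be modeled in a few lines. What follows is a minimal Python sketch, not Gluten or Velox code, and every function name in it is hypothetical. The strict repeat mirrors the `count >= 0` check reported in the log, while the lenient variant assumes the Spark test expects a negative count to yield an empty array. For array_except, the sketch assumes the mismatch comes from comparing NaN with ordinary floating-point equality, which the log itself does not confirm.

```python
import math

_NAN_KEY = object()  # single stand-in so that every NaN compares as the same value


def _key(value):
    """Return a comparison key that treats all NaN values as equal."""
    if isinstance(value, float) and math.isnan(value):
        return _NAN_KEY
    return value


def array_except_ieee(left, right):
    """Hypothetical: plain == comparison (no dedup), so NaN never matches NaN."""
    return [x for x in left if not any(x == y for y in right)]


def array_except_nan_aware(left, right):
    """Hypothetical: distinct elements of left not in right, with NaN equal to NaN."""
    excluded = {_key(y) for y in right}
    seen, result = set(), []
    for x in left:
        k = _key(x)
        if k not in excluded and k not in seen:
            seen.add(k)
            result.append(x)
    return result


def repeat_strict(element, count):
    """Hypothetical: rejects negative counts, like the check shown in the log."""
    if count < 0:
        raise ValueError(
            "Count argument of repeat function must be greater than or equal to 0"
        )
    return [element] * count


def array_repeat_lenient(element, count):
    """Hypothetical: assumed Spark-style behavior, negative counts give []."""
    return [element] * max(count, 0)
```

With these definitions, `array_except_nan_aware([float("nan"), 1.0], [float("nan")])` drops the NaN, whereas `array_except_ieee` keeps it, matching the shape of the "actual" and "expected" values in the first failure.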

PHILO-HE merged commit 740746e into apache:main Feb 20, 2024
18 of 19 checks passed
- ENCODE -> EncodeDecodeValidator()
+ ENCODE -> EncodeDecodeValidator(),
+ ARRAY_EXCEPT -> DefaultValidator(),
+ ARRAY_REPEAT -> DefaultValidator()

cc @taiyang-li, I directly changed this for the CH backend so that these two unsupported functions fall back.
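
The fallback mechanism described in this comment can be pictured as a lookup from function name to validator. The Python sketch below is only an illustration under one assumption drawn from the comment: that a `DefaultValidator()` entry marks a function as unsupported by the CH backend. The class body, the `CH_VALIDATORS` registry, and the `should_fall_back` helper are hypothetical and do not reproduce Gluten's actual implementation.

```python
class DefaultValidator:
    """Hypothetical: rejects every call, so the function always falls back."""

    def validate(self, args):
        return False


# Hypothetical registry keyed by function name. Functions that are not
# listed are assumed to be offloaded without any extra check.
CH_VALIDATORS = {
    "array_except": DefaultValidator(),
    "array_repeat": DefaultValidator(),
}


def should_fall_back(name, args=()):
    """Return True when a registered validator rejects the call."""
    validator = CH_VALIDATORS.get(name)
    return validator is not None and not validator.validate(args)
```

In this model, `should_fall_back("array_repeat", ("hi", 2))` is true, while a name missing from the registry is not forced to fall back.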

@GlutenPerfBot

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_4504_time.csv | log/native_master_02_19_2024_72960de81_time.csv | difference | percentage |
|---|---|---|---|---|
| q1 | 34.49 | 33.43 | -1.054 | 96.94% |
| q2 | 24.41 | 24.73 | 0.322 | 101.32% |
| q3 | 37.26 | 37.94 | 0.684 | 101.84% |
| q4 | 37.21 | 37.71 | 0.501 | 101.35% |
| q5 | 70.14 | 70.80 | 0.659 | 100.94% |
| q6 | 7.38 | 7.31 | -0.074 | 98.99% |
| q7 | 84.36 | 82.80 | -1.559 | 98.15% |
| q8 | 84.84 | 86.79 | 1.956 | 102.31% |
| q9 | 124.07 | 121.45 | -2.624 | 97.88% |
| q10 | 42.49 | 41.30 | -1.196 | 97.19% |
| q11 | 20.55 | 20.02 | -0.526 | 97.44% |
| q12 | 28.40 | 26.06 | -2.338 | 91.77% |
| q13 | 45.05 | 45.68 | 0.636 | 101.41% |
| q14 | 20.10 | 15.10 | -5.007 | 75.09% |
| q15 | 27.51 | 27.77 | 0.259 | 100.94% |
| q16 | 13.48 | 14.10 | 0.611 | 104.53% |
| q17 | 102.98 | 102.85 | -0.125 | 99.88% |
| q18 | 149.02 | 149.16 | 0.134 | 100.09% |
| q19 | 14.04 | 13.92 | -0.114 | 99.19% |
| q20 | 26.87 | 26.87 | -0.001 | 99.99% |
| q21 | 223.05 | 226.51 | 3.455 | 101.55% |
| q22 | 13.96 | 13.67 | -0.290 | 97.92% |
| total | 1231.65 | 1225.96 | -5.692 | 99.54% |
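
The report does not say how its last two columns are derived. Judging from the rows, they appear consistent with difference = master time minus PR time and percentage = master time divided by PR time. The short Python sketch below encodes that reading; the formula is inferred from the numbers rather than taken from the bot's source, and the function name is hypothetical. Small deviations from the reported values are expected because the displayed times are rounded.

```python
def compare(pr_time, master_time):
    """Return (difference, percentage) under the inferred convention.

    difference = master_time - pr_time
    percentage = master_time / pr_time * 100
    """
    difference = master_time - pr_time
    percentage = master_time / pr_time * 100.0
    return difference, percentage


# Row q14 from the report: PR run 20.10, master run 15.10.
q14_difference, q14_percentage = compare(20.10, 15.10)
```

Under this reading, a percentage below 100% means the master run took less time than the PR run for that query.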
