Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-3951][CH]Bug fix floor diff #3956

Merged
merged 9 commits into from
Jan 2, 2024

Conversation

KevinyhZou
Copy link
Contributor

@KevinyhZou KevinyhZou commented Dec 6, 2023

What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

(Fixes: #3951)

How was this patch tested?

TEST BY UT

端到端性能测试

数据类型为Int64,表结构 test_tbl(d Int64), 测试SQL: select count(1) from test_tbl where floor(d) > 1 数据总量3000W, 测试三次
PR 改动前:1.13s, 0.92s, 0.985s
PR 改动后: 1.064s, 1.077s, 0.984s

数据类型为Float64, 表结构为test_tbl(d float64) , 测试SQL select count(1) from test_tbl where floor(d) > 1 数据总量3000W, 测试三次
PR 改动前: 1.417s, 1.386s 1.426s
PR 改动后:1.568s, 1.476s, 1.508s

可见对于Int64类型来说,改动前后性能基本持平;对于float64类型来说,大约有7.6%的性能回退,主要是来自于针对数据中可能出现NaN 以及INF 的情况进行了判断和赋值。

benchmark 性能测试

使用开发的 benchmark_spark_floor_function.cpp 来测试

  • Int64类型测试
    对于CH 的Floor函数, 结果如下
    image
    对于新开发的Floor函数,结果如下
    image

  • Float64类型测试
    对于CH的Floor函数,结果如下
    image
    对于新开发的Floor函数,结果如下
    image

可见对于Int64,大概有 3%左右的回退,对于Float64类型 大概有70%左右的回退

Spark UT 关于Floor 函数的测试,会通过 org.apache.spark.sql.GlutenMathFunctionsSuite 这个测试来完成,已开启

Copy link

github-actions bot commented Dec 6, 2023

#3951

Copy link

github-actions bot commented Dec 6, 2023

Run Gluten Clickhouse CI

2 similar comments
Copy link

github-actions bot commented Dec 6, 2023

Run Gluten Clickhouse CI

Copy link

github-actions bot commented Dec 7, 2023

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

3 similar comments
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

1 similar comment
Copy link

Run Gluten Clickhouse CI

@@ -180,6 +179,7 @@ static const std::map<std::string, std::string> SCALAR_FUNCTIONS
{"add_months", "addMonths"},
{"date_trunc", "dateTrunc"},
{"floor_datetime", "dateTrunc"},
{"floor", "spark_floor"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

保持驼峰风格

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link

Run Gluten Clickhouse CI

2 similar comments
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link
Contributor

@liuneng1994 liuneng1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@liuneng1994 liuneng1994 merged commit 9b6beac into apache:main Jan 2, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CH] floor functions throws exceptions when the arguments is INF or NaN
3 participants