[GLUTEN-3559][VL] Rewrite GlutenInsertSuite test cases with default values #4737

Merged 1 commit on Feb 23, 2024

Surbhi-Vijay
Contributor

What changes were proposed in this pull request?

Spark 3.4 added support for column default values in Parquet file scans.
https://issues.apache.org/jira/browse/SPARK-39265

During a scan, if a column with a default value has no stored value for a record, the reader fills in the default. As a result, even when such a column is added after the data was written, the scan returns a value for every record, existing and new.
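
The scan-time backfill described above can be modelled in a few lines. This is a toy sketch in plain Python, not Spark code: `scan_with_defaults`, the row dictionaries, and the `table_defaults` mapping are hypothetical names chosen for illustration, and the default value 42 is made up.

```python
from typing import Any

def scan_with_defaults(file_rows: list[dict[str, Any]],
                       schema_defaults: dict[str, Any]) -> list[dict[str, Any]]:
    """Toy model of a scan that backfills columns missing from older files."""
    result = []
    for row in file_rows:
        filled = {}
        for column, default in schema_defaults.items():
            # A column absent from the stored row was added after the row was written,
            # so the reader substitutes the column's default.
            filled[column] = row[column] if column in row else default
        result.append(filled)
    return result

# Two rows written before column "s" existed, then one row written afterwards.
old_and_new_rows = [{"i": 1}, {"i": 2}, {"i": 3, "s": 7}]
# Hypothetical schema: "i" has no default, "s" was added later with default 42.
table_defaults = {"i": None, "s": 42}

backfilled = scan_with_defaults(old_and_new_rows, table_defaults)
print(backfilled)
```

In this model the two older rows come back with `s` set to 42, while the newer row keeps its stored value.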

Velox does not support backfilling existing records during a scan. If a column with a default value is added later, Velox returns null for that column in existing records.
This is a behavior difference from vanilla Spark, not an inconsistency. Users can update the existing data by running DML commands.
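
A minimal sketch of this difference and of the DML workaround, again as a toy Python model rather than Velox or Gluten code. The function names, the row layout, and the default value 42 are assumptions made for illustration only.

```python
from typing import Any

def scan_without_backfill(file_rows: list[dict[str, Any]],
                          columns: list[str]) -> list[dict[str, Any]]:
    """Toy model of a scan that returns null (None) for columns missing from older files."""
    return [{column: row.get(column) for column in columns} for row in file_rows]

def rewrite_with_default(file_rows: list[dict[str, Any]],
                         column: str, default: Any) -> None:
    """Toy stand-in for a DML rewrite that stores the default in rows lacking the column."""
    for row in file_rows:
        row.setdefault(column, default)

# Two rows written before column "s" existed, then one row written afterwards.
stored_rows = [{"i": 1}, {"i": 2}, {"i": 3, "s": 7}]

before = scan_without_backfill(stored_rows, ["i", "s"])
rewrite_with_default(stored_rows, "s", 42)  # hypothetical default for "s"
after = scan_without_backfill(stored_rows, ["i", "s"])

print(before)
print(after)
```

Before the rewrite the older rows report `None` for `s`. Once the default is physically stored, the scan no longer depends on reader-side backfill.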

This PR rewrites the affected default-value test cases for Gluten.

(Fixes: #3559)

How was this patch tested?

Unit tests pass.

#3559

Run Gluten Clickhouse CI

@Surbhi-Vijay
Contributor Author

@JkSelf @PHILO-HE Please review!

@ayushi-agarwal
Contributor

@Surbhi-Vijay An issue can be opened to track this: Velox does not support backfilling existing records during a scan.

@PHILO-HE
Contributor

@Surbhi-Vijay, could you please rebase the code and resolve the conflicts? Thanks!

@PHILO-HE
Contributor

Thanks for your efforts!

@Surbhi-Vijay
Contributor Author

@PHILO-HE All checks have passed.

@zhli1142015 zhli1142015 merged commit b823592 into apache:main Feb 23, 2024
19 checks passed
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_4737_time.csv | log/native_master_02_23_2024_9ceade94a_time.csv | difference | percentage |
|---|---|---|---|---|
| q1 | 32.21 | 34.69 | 2.488 | 107.73% |
| q2 | 24.57 | 24.33 | -0.240 | 99.02% |
| q3 | 37.99 | 38.86 | 0.875 | 102.30% |
| q4 | 36.28 | 36.08 | -0.206 | 99.43% |
| q5 | 70.18 | 72.24 | 2.067 | 102.95% |
| q6 | 6.59 | 7.31 | 0.725 | 111.01% |
| q7 | 86.44 | 84.62 | -1.818 | 97.90% |
| q8 | 87.29 | 85.39 | -1.895 | 97.83% |
| q9 | 119.14 | 124.53 | 5.390 | 104.52% |
| q10 | 41.89 | 44.23 | 2.337 | 105.58% |
| q11 | 20.82 | 19.94 | -0.877 | 95.78% |
| q12 | 28.08 | 26.67 | -1.408 | 94.98% |
| q13 | 45.53 | 44.33 | -1.208 | 97.35% |
| q14 | 16.69 | 20.85 | 4.155 | 124.89% |
| q15 | 28.86 | 29.04 | 0.172 | 100.60% |
| q16 | 14.07 | 14.19 | 0.113 | 100.81% |
| q17 | 103.57 | 102.20 | -1.364 | 98.68% |
| q18 | 150.41 | 150.35 | -0.056 | 99.96% |
| q19 | 12.57 | 14.06 | 1.485 | 111.81% |
| q20 | 28.70 | 26.19 | -2.511 | 91.25% |
| q21 | 226.08 | 226.58 | 0.500 | 100.22% |
| q22 | 13.69 | 13.90 | 0.214 | 101.56% |
| total | 1231.64 | 1240.58 | 8.938 | 100.73% |
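
The report does not define its last two columns. Judging from the numbers, `difference` appears to be the master time minus the PR time, and `percentage` the master time divided by the PR time. The sketch below checks that reading against the `total` row. The function name `compare` is hypothetical, and the formulas are inferred from the numbers rather than taken from the bot's source.

```python
def compare(pr_time: float, master_time: float) -> tuple[float, float]:
    """Return (difference, percentage) under the inferred column definitions."""
    difference = master_time - pr_time          # positive when the PR run is faster
    percentage = master_time / pr_time * 100.0  # above 100 when the PR run is faster
    return difference, percentage

# Values from the "total" row of the report above.
diff, pct = compare(pr_time=1231.64, master_time=1240.58)
print(f"difference={diff:.2f} percentage={pct:.2f}%")
```

The recomputed difference (about 8.94) is close to the reported 8.938 but not identical, presumably because the report is computed from unrounded timings.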

@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_master_02_23_2024_time.csv | log/native_master_02_23_2024_9ceade94a_time.csv | difference | percentage |
|---|---|---|---|---|
| q1 | 33.31 | 34.69 | 1.379 | 104.14% |
| q2 | 22.67 | 24.33 | 1.663 | 107.34% |
| q3 | 39.20 | 38.86 | -0.343 | 99.13% |
| q4 | 37.79 | 36.08 | -1.713 | 95.47% |
| q5 | 70.31 | 72.24 | 1.936 | 102.75% |
| q6 | 7.14 | 7.31 | 0.172 | 102.40% |
| q7 | 84.33 | 84.62 | 0.287 | 100.34% |
| q8 | 86.39 | 85.39 | -0.997 | 98.85% |
| q9 | 124.14 | 124.53 | 0.386 | 100.31% |
| q10 | 42.39 | 44.23 | 1.834 | 104.33% |
| q11 | 21.03 | 19.94 | -1.094 | 94.80% |
| q12 | 26.45 | 26.67 | 0.227 | 100.86% |
| q13 | 45.22 | 44.33 | -0.892 | 98.03% |
| q14 | 16.73 | 20.85 | 4.124 | 124.66% |
| q15 | 27.23 | 29.04 | 1.811 | 106.65% |
| q16 | 14.68 | 14.19 | -0.498 | 96.61% |
| q17 | 101.39 | 102.20 | 0.814 | 100.80% |
| q18 | 148.35 | 150.35 | 2.000 | 101.35% |
| q19 | 13.42 | 14.06 | 0.634 | 104.72% |
| q20 | 26.61 | 26.19 | -0.423 | 98.41% |
| q21 | 226.57 | 226.58 | 0.006 | 100.00% |
| q22 | 13.69 | 13.90 | 0.205 | 101.50% |
| total | 1229.06 | 1240.58 | 11.518 | 100.94% |

Development

Successfully merging this pull request may close these issues.

[VL] Track all the failed unit test in Spark 3.4.
5 participants