-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-28224: Upgrade Orc version in Hive to 1.9.3 #5218
base: master
Are you sure you want to change the base?
Conversation
2ac4ebc
to
e438f4d
Compare
e438f4d
to
7b803ca
Compare
7b803ca
to
9195137
Compare
ql/src/test/queries/clientpositive/insert_values_orig_table_use_metadata.q
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general looks good, but I am not sure why the query results are changing, same query returning different results pre & post upgrade doesn't look f9 to me in general
Statistics: Num rows: 69 Data size: 13710 Basic stats: COMPLETE Column stats: NONE | ||
Statistics: Num rows: 70 Data size: #Masked# Basic stats: COMPLETE Column stats: NONE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did number of rows changed with ORC upgrade?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "Num rows" is calculated estimate by dividing data size by average row size.
Previously data size was 13,710 and now it is 13,720.
Average row size = 196
Before Orc upgrade it was: 13,710 / 196 = 69.??? - hence the value of "Num rows" was 69.
Now: 13720 / 196 = 70
So basically the reason for num rows change is that Data size value increased.
NULL 3072 9318.4351351351 -4298.1513513514 5018444.1081079808 1633.60810810806667 5695.483082135323 5696.410307714464 3072 11160.71538461538500 -5147.90769230769300 6010604.30769230735360 1956.576923076922966667 6821.495748565151 6822.606289190915 | ||
NULL 3072 9318.4351351351 -4298.1513513514 5018444.1081079808 1633.60810810806667 5695.483082135325 5696.4103077144655 3072 11160.71538461538500 -5147.90769230769300 6010604.30769230735360 1956.576923076922966667 6821.4957485651385 6822.606289190904 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
query result is changing here?
@@ -847,4 +847,4 @@ FROM (SELECT cint, | |||
POSTHOOK: type: QUERY | |||
POSTHOOK: Input: default@decimal_vgby_small | |||
#### A masked pattern was here #### | |||
95165244160 | |||
95767761728 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here also the result of the query is changing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This difference is caused by the previous differences in STDDEV_POP(cdecimal1) and STDDEV_SAMP(cdecimal1) fields
@@ -180,8 +180,8 @@ | |||
<mysql.version>8.0.31</mysql.version> | |||
<postgres.version>42.7.3</postgres.version> | |||
<oracle.version>21.3.0.0</oracle.version> | |||
<opencsv.version>2.3</opencsv.version> | |||
<orc.version>1.8.5</orc.version> | |||
<opencsv.version>5.9</opencsv.version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I ask why do you change opencsv
under the PR title, HIVE-28224: Upgrade Orc version in Hive to 1.9.3
, @difin ?
Could you proceed to upgrade opencsv
library independently before upgrading ORC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added opencsv changes because Hive wasn't able to compile without them. Hive has a code that depends on opencsv version that was coming from Orc 1.8.5 and it is not in 1.9.3.
I created a separate PR for upgrading opencsv #5240
Once it gets merged, I will update this PR and remove opencsv changes from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @difin .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are welcome, @dongjoon-hyun
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @dongjoon-hyun, opencsv changes were removed from this PR and were done separately. The differences in the results of tests are still present.
I did a few additional checks:
- The same differences are present when Hive statistics are enabled and when they are disabled.
- With the
decimal_vgby
table used as a source in the test in Parquet format instead of Orc, the differences are absent (so it is an additional confirmation that it is something related to Orc). - I dumped the content of the
decimal_vgby
table with and without Orc upgrade and they were the same, which is odd
ql/src/test/queries/clientpositive/materialized_view_create_rewrite_5.q
Outdated
Show resolved
Hide resolved
ql/src/test/results/clientpositive/llap/materialized_view_create_rewrite_5.q.out
Outdated
Show resolved
Hide resolved
ql/src/test/results/clientpositive/llap/insert_values_orig_table_use_metadata.q.out
Outdated
Show resolved
Hide resolved
Quality Gate passedIssues Measures |
What changes were proposed in this pull request?
Upgrading Orc version in Hive to the latest version that supports Java 8 - 1.9.3
Why are the changes needed?
To keep the Orc version in Hive updated.
Does this PR introduce any user-facing change?
No
Is the change a dependency upgrade?
How was this patch tested?
Pre-commit testing