Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-27847: Casting NUMERIC types to TIMESTAMP is prohibited #5278

Merged
merged 9 commits into from
Jun 6, 2024

Conversation

okumin
Copy link
Contributor

@okumin okumin commented Jun 4, 2024

What changes were proposed in this pull request?

Configure GenericUDFTimestamp.java correctly.

https://issues.apache.org/jira/browse/HIVE-27847

Why are the changes needed?

hive.strict.timestamp.conversion and hive.int.timestamp.conversion.in.seconds are not propagated to runtime.

Does this PR introduce any user-facing change?

Yes. This is a bug fix.

Is the change a dependency upgrade?

No.

How was this patch tested?

Integration tests are added.

@okumin okumin changed the title [WIP] HIVE-27847: Casting NUMERIC types to TIMESTAMP is prohibited HIVE-27847: Casting NUMERIC types to TIMESTAMP is prohibited Jun 5, 2024
<property>
<name>hive.strict.timestamp.conversion</name>
<value>false</value>
</property>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test HS2 clusters inherit this hive-site.xml, then this param is stored in where new HiveConf() picks up.
In order to make sure that test cases use the param not in hive-site but in session params, I'd like to remove it.

@@ -1,4 +1,6 @@
--! qt:dataset:src
set hive.strict.timestamp.conversion=false;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the removal of the param from data/conf/hive-site.xml.

@@ -84,6 +84,7 @@ public TestVectorFilterCompare() {
// Arithmetic operations rely on getting conf from SessionState, need to initialize here.
SessionState ss = new SessionState(new HiveConf());
ss.getConf().setVar(HiveConf.ConfVars.HIVE_COMPAT, "latest");
ss.getConf().setBoolVar(HiveConf.ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION, false);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the removal of the param from data/conf/hive-site.xml.

if (ss != null) {
final boolean strict = ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION);
final PrimitiveGrouping grouping = PrimitiveObjectInspectorUtils.getPrimitiveGrouping(tsInputTypes[0]);
if (strict && grouping == PrimitiveGrouping.NUMERIC_GROUP) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect this validation is required only at compile-time.

@@ -4,6 +4,8 @@ create table test_num_ts_input(begin string, ts string);

insert into test_num_ts_input values('1653209895687','2022-05-22T15:58:15.931+07:00'),('1653209938316','2022-05-22T15:58:58.490+07:00'),('1653209962021','2022-05-22T15:59:22.191+07:00'),('1653210021993','2022-05-22T16:00:22.174+07:00');

set hive.vectorized.execution.enabled=false;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem was not reproduced without this session param.

@okumin okumin marked this pull request as ready for review June 5, 2024 00:40
-- SHA512 is used
MASK_HASH(text_timestamp),
-- Java's Pattern doesn't support it, then it fails with hive.use.googleregex.engine=false
text_timestamp RLIKE '\\p{Katakana}+'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following files would touch SessionState or HiveConf.

git grep -l -e 'org.apache.hadoop.hive.ql.session.SessionState' -e 'org.apache.hadoop.hive.conf.HiveConf' | grep 'ql/src/java/org/apache/hadoop/hive/ql/udf/generic' 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAssertTrueOOM.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseArithmetic.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseCompare.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseNumeric.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCurrentAuthorizer.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCurrentCatalog.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCurrentDatabase.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCurrentDate.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCurrentGroups.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCurrentSchema.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCurrentTimestamp.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCurrentUser.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateFormat.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFromUnixTime.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLoggedInUser.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskHash.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPDivide.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRegExp.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRestrictInformationSchema.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTimestamp.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToUnixTimeStamp.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFUnixTimeStamp.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSQLSchema.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/InstantFormatter.java

I didn't add tests for the following UDFs because they are marked as "runtime constant" and the issue will unlikely happen. As another point, it is hard to configure some of them in qtest.

CURRENT_AUTHORIZER(),
CURRENT_CATALOG(),
CURRENT_DATABASE(),
CURRENT_DATE(),
CURRENT_GROUPS(),
CURRENT_SCHEMA(),
CURRENT_TIMESTAMP(),
CURRENT_USER(),
LOGGED_IN_USER(),
RESTRICT_INFORMATION_SCHEMA()

@okumin
Copy link
Contributor Author

okumin commented Jun 5, 2024

This seems to be unrelated and temporal.
https://github.com/apache/hive/actions/runs/9376224477/job/25815816131?pr=5278

Copy link
Member

@dengzhhu653 dengzhhu653 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a minor comment to clarify, the change looks good to me! Thank you @okumin for check and PR!

-- On HiveServer2
SELECT
-- DECIMAL because of hive.compat=latest
TYPEOF(cint1 / cint2),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this TYPEOF(cint1 / cint2) happens only at compile stage, should we change it to cint1 / cint2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right! I wonder what is the most reasonable way to identify if it is DOUBLE or DECIMAL...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you say, I found we can simply remove TYPEOF since the text expression or precision of DOUBLE and DECIMAL is different. This is the diff between hive.compat=latest vs default.

< 3333.3333333333335	false	1970-01-01 02:46:40	1970-01-01 01:02:03 4	1970-01-01 09:46:40 4	-21477	-21477	ca764b57c635a893c91b0edaea84bca142e065990e1a66b9c60cf855777af51070444210d30e359fa81e5a77c68a073224e0c7343d957f556816618a10baa37c	false
< 4000.0	false	1970-01-01 05:33:20	1970-01-02 04:05:06 5	1970-01-01 12:33:20 4	75906	75906	7882085a4ed36e7c675fcd12083eefa9a208f1fdcfba10b3a986bb5b1d43e23da3ba87b3933feebfed80ba9e05e4e73f72231cb8022453e562e034e8b5c603b9	false
< 4285.714285714285	false	1970-01-01 08:20:00	1970-01-03 07:08:09 6	1970-01-01 15:20:00 4	173289	173289	d9899e12fce35a4f65cd63aee1e953dacd9e91229d52dd17ddad8e44c17d2d081c0a5519a96637663bc6baa2d7c96d6f1a9d6944ac7668c7926ebc10c9f3b81b	false
---
> 3333.33333333333	false	1970-01-01 02:46:40	1970-01-01 01:02:03 4	1970-01-01 09:46:40 4	-21477	-21477	ca764b57c635a893c91b0edaea84bca142e065990e1a66b9c60cf855777af51070444210d30e359fa81e5a77c68a073224e0c7343d957f556816618a10baa37c	false
> 4000.00000000000	false	1970-01-01 05:33:20	1970-01-02 04:05:06 5	1970-01-01 12:33:20 4	75906	75906	7882085a4ed36e7c675fcd12083eefa9a208f1fdcfba10b3a986bb5b1d43e23da3ba87b3933feebfed80ba9e05e4e73f72231cb8022453e562e034e8b5c603b9	false
> 4285.71428571429	false	1970-01-01 08:20:00	1970-01-03 07:08:09 6	1970-01-01 15:20:00 4	173289	173289	d9899e12fce35a4f65cd63aee1e953dacd9e91229d52dd17ddad8e44c17d2d081c0a5519a96637663bc6baa2d7c96d6f1a9d6944ac7668c7926ebc10c9f3b81b	false

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed the change and am waiting for CI.
db8edc8

Copy link

sonarqubecloud bot commented Jun 5, 2024

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@dengzhhu653 dengzhhu653 merged commit 18f435a into apache:master Jun 6, 2024
5 checks passed
@okumin okumin deleted the HIVE-27847 branch June 6, 2024 03:24
dengzhhu653 pushed a commit to dengzhhu653/hive that referenced this pull request Jun 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants