ORC-554: Float to timestamp schema evolution handles time/nanoseconds incorrectly #431

abstractdog · 2019-09-16T13:12:06Z

No description provided.

jcamachor · 2019-09-19T04:15:53Z

java/core/src/java/org/apache/orc/impl/ConvertTreeReaderFactory.java

@@ -1411,7 +1411,7 @@ public void setConvertVectorElement(int elementNum) {
      long wholeSec = (long) Math.floor(seconds);
      timestampColVector.time[elementNum] = wholeSec * 1000;
      timestampColVector.nanos[elementNum] =
-          1_000_000 * (int) Math.round((seconds - wholeSec) * 1000);
+          Math.max(0, 1_000_000 * (int) Math.round((seconds - wholeSec) * 1000));


@abstractdog, I am not sure whether this is the correct fix for the issue. Is this negative value caused by long overflow? @omalley, @prasanthj, @t3rmin4t0r, what would be the expected behavior in this case for ORC? Setting the timestamp to null? Using Long.MAX_VALUE?

@jcamachor: Yes, this was caused by long overflow, and I agree it should be handled explicitly. The next commit, 9c2b909, addresses it.
Please take a look, @omalley, @prasanthj, @t3rmin4t0r.

The whole patch is available as a single file here:
https://issues.apache.org/jira/secure/attachment/12981069/ORC-554.02.patch
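
To make the overflow concrete, here is a minimal sketch that reruns the pre-patch arithmetic from the hunk above on plain values. It does not use any ORC classes; the class and method names are hypothetical and exist only for illustration. The point is that the double-to-long cast saturates at Long.MAX_VALUE, the multiplication by 1000 then wraps, and the (int) cast of a saturated Math.round result yields -1.

```java
// Reruns the pre-patch arithmetic on plain values (no ORC classes involved).
// Class and method names are hypothetical, chosen for this illustration only.
final class NaiveFloatToTimestamp {
  /** Milliseconds as the pre-patch code computed them: wholeSec * 1000. */
  static long naiveTimeMillis(double seconds) {
    // The cast saturates at Long.MAX_VALUE / Long.MIN_VALUE for huge inputs.
    long wholeSec = (long) Math.floor(seconds);
    // Long multiplication wraps silently on overflow.
    return wholeSec * 1000;
  }

  /** Nanoseconds as the pre-patch code computed them. */
  static int naiveNanos(double seconds) {
    long wholeSec = (long) Math.floor(seconds);
    // Math.round(double) saturates at Long.MAX_VALUE; narrowing that to int gives -1.
    return 1_000_000 * (int) Math.round((seconds - wholeSec) * 1000);
  }

  public static void main(String[] args) {
    double inRange = 1.5;
    double tooLarge = Float.MAX_VALUE;
    // prints "1000 ms, 500000000 ns"
    System.out.println(naiveTimeMillis(inRange) + " ms, " + naiveNanos(inRange) + " ns");
    // prints "-1000 ms, -1000000 ns"
    System.out.println(naiveTimeMillis(tooLarge) + " ms, " + naiveNanos(tooLarge) + " ns");
  }
}
```

Both fields come out negative for the out-of-range input, which is consistent with the negative value discussed above.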

The overflow case should become null. I know it feels weird, because in programming we use "null" to mean "no value." In SQL it means "error value," which you can see when you accidentally write "col1 = null" and get no matches.

Thanks @omalley! So, do you mean that isNull[elementNum] = true is enough for the overflow case (without any other assignments in the timestamp col vector)?

You need that as well as cv.noNulls = false.

omalley

Let me make the test case stronger along with the functional changes.

omalley · 2019-10-01T16:09:17Z

java/core/src/java/org/apache/orc/impl/ConvertTreeReaderFactory.java

+      if (seconds >= Long.MAX_VALUE || seconds <= Long.MIN_VALUE) { // overflow
+        timestampColVector.time[elementNum] = 0L;
+        timestampColVector.nanos[elementNum] = 0;
+        timestampColVector.isNull[elementNum] = true;


You also need to set timestampColVector.noNulls = false.
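
A minimal sketch of why both flags matter, assuming the usual column vector convention that noNulls = true lets a consumer skip the per-row isNull check. The TimestampVectorStandIn class below is a hypothetical stand-in that only mirrors the field names from the diff; it is not the real column vector class.

```java
// Hypothetical stand-in for a timestamp column vector. It only mirrors the
// field names used in the diff and is not the real Hive/ORC class.
final class NullMarkingSketch {
  static final class TimestampVectorStandIn {
    final long[] time;
    final int[] nanos;
    final boolean[] isNull;
    boolean noNulls = true; // "this batch contains no nulls" fast-path flag

    TimestampVectorStandIn(int size) {
      time = new long[size];
      nanos = new int[size];
      isNull = new boolean[size];
    }
  }

  /** Marks one element as null, including the batch-level flag. */
  static void markNull(TimestampVectorStandIn cv, int elementNum) {
    cv.time[elementNum] = 0L;
    cv.nanos[elementNum] = 0;
    cv.isNull[elementNum] = true;
    cv.noNulls = false; // without this, readers may never look at isNull
  }

  /** Assumed consumer-side check: isNull is consulted only when noNulls is false. */
  static boolean readsAsNull(TimestampVectorStandIn cv, int elementNum) {
    return !cv.noNulls && cv.isNull[elementNum];
  }

  public static void main(String[] args) {
    TimestampVectorStandIn cv = new TimestampVectorStandIn(2);
    cv.isNull[0] = true;                    // isNull alone, noNulls still true
    System.out.println(readsAsNull(cv, 0)); // prints "false"
    markNull(cv, 1);                        // sets both flags
    System.out.println(readsAsNull(cv, 1)); // prints "true"
  }
}
```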

omalley · 2019-10-01T16:09:53Z

java/core/src/java/org/apache/orc/impl/ConvertTreeReaderFactory.java

+      } else {
+        timestampColVector.time[elementNum] = wholeSec * 1000;
+        timestampColVector.nanos[elementNum] =
+            1_000_000 * (int) Math.round((seconds - wholeSec) * 1000);


I'm worried that your patch has removed the protection keeping the value above 0.
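
As a rough check of when the guard matters, the sketch below compares the unguarded nanos expression with the Math.max(0, ...) version from the earlier patch. It uses plain values and hypothetical method names. As far as this arithmetic goes, Math.floor keeps the fractional part non-negative for in-range inputs, including negative seconds, so the clamp only changes the result for inputs that have already overflowed.

```java
// Compares the unguarded nanos expression with the clamped one from the
// earlier patch. Method names are hypothetical; no ORC classes are used.
final class NanosSignSketch {
  /** Unguarded expression, as in the else branch of the diff. */
  static int nanosFor(double seconds) {
    long wholeSec = (long) Math.floor(seconds);
    return 1_000_000 * (int) Math.round((seconds - wholeSec) * 1000);
  }

  /** Clamped expression, as in the earlier Math.max(0, ...) patch. */
  static int clampedNanosFor(double seconds) {
    return Math.max(0, nanosFor(seconds));
  }

  public static void main(String[] args) {
    // floor(-1.25) is -2, so the fraction is 0.75 and nanos stay non-negative.
    System.out.println(nanosFor(-1.25));                      // prints "750000000"
    // Out-of-range input: the unguarded value is negative, the clamp hides it.
    System.out.println(nanosFor(Float.MAX_VALUE));            // prints "-1000000"
    System.out.println(clampedNanosFor(Float.MAX_VALUE));     // prints "0"
  }
}
```

If the overflow branch reliably catches every out-of-range input, the unguarded expression in the else branch should not go negative; the clamp is then a second line of defense rather than the fix itself.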

omalley · 2019-10-01T16:10:25Z

java/core/src/java/org/apache/orc/impl/ConvertTreeReaderFactory.java

-      timestampColVector.nanos[elementNum] =
-          1_000_000 * (int) Math.round((seconds - wholeSec) * 1000);
+
+      if (seconds >= Long.MAX_VALUE || seconds <= Long.MIN_VALUE) { // overflow


I think this should be seconds * 1000, since otherwise the time value can overflow.

Agreed, let's multiply first and then check.
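
A minimal sketch of the difference between checking seconds and checking seconds * 1000, using plain values and hypothetical method names. It is not the code that was merged; it only illustrates that a value such as 1e16 seconds passes the seconds-only check while its millisecond value no longer fits in a long.

```java
// Contrasts a range check on seconds with a range check on milliseconds.
// Method names are hypothetical; this is not the merged ORC code.
final class MillisOverflowCheck {
  /** The check as written in the reviewed diff: compares seconds to the long range. */
  static boolean overflowsBySeconds(double seconds) {
    return seconds >= Long.MAX_VALUE || seconds <= Long.MIN_VALUE;
  }

  /** Multiply first, then compare the millisecond value to the long range. */
  static boolean overflowsByMillis(double seconds) {
    double millis = seconds * 1000;
    return millis >= Long.MAX_VALUE || millis <= Long.MIN_VALUE;
  }

  public static void main(String[] args) {
    double seconds = 1e16; // fits in a long as seconds, but not as milliseconds
    System.out.println(overflowsBySeconds(seconds)); // prints "false"
    System.out.println(overflowsByMillis(seconds));  // prints "true"
    // The time field would wrap to a negative value for this input.
    System.out.println(((long) Math.floor(seconds)) * 1000 < 0); // prints "true"
  }
}
```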

omalley · 2019-10-01T16:11:05Z

java/core/src/test/org/apache/orc/impl/TestSchemaEvolution.java

+      RecordReader rows = reader.rows(rowOptions)) {
+      assertTrue(rows.nextBatch(batchTimeStamp));
+
+      assertTrue(String.format("nanos should be > 0, instead it's: %d", t1.nanos[0]),


This test case is much too loose. Let me take a pass at strengthening it.

Yeah, the test case is cleaner now and covers and checks many more values and use cases. +1


omalley · 2019-10-01T17:01:39Z

Ok, take a look at https://github.com/omalley/orc/tree/orc-554 to see if it looks ok.

abstractdog · 2019-10-02T20:05:39Z

I double-checked the test case locally and it works. I would only rename floatToTimeStampPositiveOverflow -> floatToTimeStampOverflow (similar to the double test).

LGTM


abstractdog added 6 commits September 12, 2019 09:21

repro

24b430d

testFloatToTimestampSample

cf37379

testFloatToTimestampSampleNegativeNano

f5d9197

ORC-554: wip2

2045330

revert test case

c906b47

fixed test

3666cfa

jcamachor reviewed Sep 19, 2019

proper overflow detection

9c2b909

omalley reviewed Oct 1, 2019

omalley pushed a commit to omalley/orc that referenced this pull request Oct 1, 2019

ORC-554: Float to timestamp schema evolution should handle overflow.

dd244fe

Fixes apache#431 Signed-off-by: Owen O'Malley <omalley@apache.org>

omalley pushed a commit to omalley/orc that referenced this pull request Oct 2, 2019

ORC-554: Float to timestamp schema evolution should handle overflow.

feb5c67

Fixes apache#431 Signed-off-by: Owen O'Malley <omalley@apache.org>

omalley closed this in 7de945b Oct 2, 2019

omalley pushed a commit that referenced this pull request Oct 2, 2019

ORC-554: Float to timestamp schema evolution should handle overflow.

2f1cc76

Fixes #431 Signed-off-by: Owen O'Malley <omalley@apache.org>
