Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ORC-554: Float to timestamp schema evolution handles time/nanoseconds incorrectly #431

Closed
wants to merge 7 commits into from

Conversation

abstractdog
Copy link
Contributor

No description provided.

@@ -1411,7 +1411,7 @@ public void setConvertVectorElement(int elementNum) {
long wholeSec = (long) Math.floor(seconds);
timestampColVector.time[elementNum] = wholeSec * 1000;
timestampColVector.nanos[elementNum] =
1_000_000 * (int) Math.round((seconds - wholeSec) * 1000);
Math.max(0, 1_000_000 * (int) Math.round((seconds - wholeSec) * 1000));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abstractdog , I am confused on whether this is the correct fix for the issue. Is this negative value because of long overflow? @omalley , @prasanthj , @t3rmin4t0r , what would be the expected behavior in this case for ORC? Setting timestamp to null? Using Long.MAX_VALUE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcamachor: yes, this was about long overflow, so I agree, it should be handled accordingly in my opinion, the next commit is about this:
9c2b909
please take a look @omalley, @prasanthj , @t3rmin4t0r

please find the whole patch in single file here:
https://issues.apache.org/jira/secure/attachment/12981069/ORC-554.02.patch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overflow case should become null. I know it feels weird, because in programming we use "null" to mean "no value." In SQL it means "error value," which you can see when you accidentally write "col1 = null" and get no matches.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @omalley! so, do you mean that isNull[elementNum] = true is enough for the overflow case (without any other assignments in the timestamp col vector)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need that as well as cv.noNulls = false

Copy link
Contributor

@omalley omalley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me make the test case stronger along with the functional changes.

if (seconds >= Long.MAX_VALUE || seconds <= Long.MIN_VALUE) { // overflow
timestampColVector.time[elementNum] = 0L;
timestampColVector.nanos[elementNum] = 0;
timestampColVector.isNull[elementNum] = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need to set timestampColVector.noNulls = false.

} else {
timestampColVector.time[elementNum] = wholeSec * 1000;
timestampColVector.nanos[elementNum] =
1_000_000 * (int) Math.round((seconds - wholeSec) * 1000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried that your patch has removed the protection keeping the value above 0.

timestampColVector.nanos[elementNum] =
1_000_000 * (int) Math.round((seconds - wholeSec) * 1000);

if (seconds >= Long.MAX_VALUE || seconds <= Long.MIN_VALUE) { // overflow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be seconds * 1000, since otherwise the time value can overflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, let's multiple first and then check

RecordReader rows = reader.rows(rowOptions)) {
assertTrue(rows.nextBatch(batchTimeStamp));

assertTrue(String.format("nanos should be > 0, instead it's: %d", t1.nanos[0]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case is much too loose. Let me take a pass at strengthening it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, test case is cleaner and covers+checks lots of values and use cases now, +1

omalley pushed a commit to omalley/orc that referenced this pull request Oct 1, 2019
Fixes apache#431

Signed-off-by: Owen O'Malley <omalley@apache.org>
@omalley
Copy link
Contributor

omalley commented Oct 1, 2019

Ok, take a look at https://github.com/omalley/orc/tree/orc-554 to see if it looks ok.

@abstractdog
Copy link
Contributor Author

double-checked testcase locally, it's working, I would only rename floatToTimeStampPositiveOverflow -> floatToTimeStampOverflow (similar to double)

LGTM

omalley pushed a commit to omalley/orc that referenced this pull request Oct 2, 2019
Fixes apache#431

Signed-off-by: Owen O'Malley <omalley@apache.org>
@omalley omalley closed this in 7de945b Oct 2, 2019
omalley pushed a commit that referenced this pull request Oct 2, 2019
Fixes #431

Signed-off-by: Owen O'Malley <omalley@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants