ORC-1147: Use isNaN instead of isFinite to determine the contain NaN values #1080
Oh, @guiyanakuang. This is not correct, @guiyanakuang.
Multiple NaN values exist; Double.NaN is just one of them.
Please note that the IEEE standard defines NaN as a range of values, not a single value.
Maybe I should use the JDK's own method.
That sounds better.
Objects.equals(dstas.getSum(), Double.NaN)
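The point about NaN being a range of bit patterns can be illustrated with a small sketch. The class name, the helper names, and the chosen payload value are assumptions made for this illustration and are not taken from the patch. The sketch compares three ways of testing a double for NaN: `Double.isNaN`, a boxed `Objects.equals` comparison against `Double.NaN`, and a primitive `==` comparison.

```java
import java.util.Objects;

final class NanPatterns {
    // Hypothetical example value: a NaN whose payload bits differ from the
    // canonical Double.NaN (0x7ff8000000000000L).
    static final double OTHER_NAN = Double.longBitsToDouble(0x7ff8000000000001L);

    // Detects any NaN, whatever its payload.
    static boolean viaIsNaN(double v) {
        return Double.isNaN(v);
    }

    // Boxed comparison: Double.equals treats all NaN values as equal to each other.
    static boolean viaObjectsEquals(Double v) {
        return Objects.equals(v, Double.NaN);
    }

    // Primitive == is never true when either operand is NaN,
    // so this check can never succeed.
    static boolean viaPrimitiveEquals(double v) {
        return v == Double.NaN;
    }

    public static void main(String[] args) {
        System.out.println("isNaN:          " + viaIsNaN(OTHER_NAN));
        System.out.println("Objects.equals: " + viaObjectsEquals(OTHER_NAN));
        System.out.println("primitive ==:   " + viaPrimitiveEquals(OTHER_NAN));
        System.out.println("isNaN(+Inf):    " + viaIsNaN(Double.POSITIVE_INFINITY));
    }
}
```

Both JDK-based checks recognize a non-canonical NaN, while neither treats an infinity as NaN. `Double.isNaN` avoids boxing and states the intent directly.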
// First two rows of data cause sum overflow, sum is not a finite value,
// but this does not prevent pushing down (range comparisons work fine)
// The same applies to the middle stripe
fcol.vector[0] = dbcol.vector[0] = Double.MAX_VALUE / 2 + Double.MAX_VALUE / 4;
Please define a constant variable once for Double.MAX_VALUE / 2 + Double.MAX_VALUE / 4 and use it.
Resolved
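The overflow that the test comments in this thread rely on can be reproduced in isolation. The following is a minimal sketch, assuming a hypothetical class name and constant name (neither is taken from the patch): two values of `Double.MAX_VALUE / 2 + Double.MAX_VALUE / 4` add up to positive infinity, which `isFinite` rejects but `isNaN` does not flag.

```java
final class SumOverflow {
    // Hypothetical constant name: roughly three quarters of Double.MAX_VALUE,
    // so a single value is finite but two of them overflow.
    static final double LARGE = Double.MAX_VALUE / 2 + Double.MAX_VALUE / 4;

    // Sum of two such values, as a column statistic would accumulate them.
    static double sumOfTwo() {
        return LARGE + LARGE;
    }

    public static void main(String[] args) {
        double sum = sumOfTwo();
        System.out.println("sum         = " + sum);
        System.out.println("isFinite    = " + Double.isFinite(sum));
        System.out.println("isNaN       = " + Double.isNaN(sum));
        // The individual values remain ordinary, comparable doubles.
        System.out.println("LARGE < MAX = " + (LARGE < Double.MAX_VALUE));
    }
}
```

An overflowed sum says nothing about whether the individual values are NaN, which is why a check based on `isFinite` is stricter than needed here.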
assertEquals(1000, batch.size);

rows.nextBatch(batch);
// Last strip should not be read, even if sum is not finite
strip -> stripe?
// First two rows of data cause sum overflow, sum is not a finite value,
// but this does not prevent pushing down (range comparisons work fine)
// The same applies to the middle stripe
Could you add some more illustration about how many stripes are used here?
cc @williamhyun

// Here we are writing 3500 rows of data, with stripeSize set to 400000
// and rowIndexStride set to 1000, so 1 stripe will be written,
// indexed in 4 strides.
Thank you for the details.
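The stride count in that comment follows from a ceiling division. Here is a small sketch of the arithmetic, assuming (as the comment states) that all 3500 rows fit in one stripe; the class and method names are hypothetical, and the number of stripes itself depends on the encoded byte size, which this sketch does not model.

```java
final class StrideCount {
    // Number of row-index strides needed to cover `rows` rows (rows > 0),
    // computed as ceil(rows / rowIndexStride).
    static long strides(long rows, long rowIndexStride) {
        return (rows + rowIndexStride - 1) / rowIndexStride;
    }

    // Rows held by the final, possibly partial, stride.
    static long rowsInLastStride(long rows, long rowIndexStride) {
        long remainder = rows % rowIndexStride;
        return remainder == 0 ? rowIndexStride : remainder;
    }

    public static void main(String[] args) {
        long rows = 3500;
        long rowIndexStride = 1000;
        System.out.println("strides          = " + strides(rows, rowIndexStride));
        System.out.println("rows in last one = " + rowsInLastStride(rows, rowIndexStride));
    }
}
```

With 3500 rows and a stride of 1000, this gives three full strides and a fourth holding the remaining 500 rows.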
+1, LGTM. Thank you, @guiyanakuang .
…values

### What changes were proposed in this pull request?
This PR uses `isNaN` instead of `isFinite` to determine whether a column contains NaN values. I want to exclude both the Double.POSITIVE_INFINITY and Double.NEGATIVE_INFINITY cases and match only NaN.

### Why are the changes needed?
In the case of a sum overflow, we can still push the predicate down and skip the corresponding stripe.

### How was this patch tested?
Added a unit test.

Closes #1080

Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 6b053d4)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 7763697)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
This PR uses isNaN instead of isFinite to determine whether a column contains NaN values. I want to exclude both the Double.POSITIVE_INFINITY and Double.NEGATIVE_INFINITY cases and match only NaN.
Why are the changes needed?
In the case of a sum overflow, we can still push the predicate down and skip the corresponding stripe.
How was this patch tested?
Added a unit test.
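The effect of the change on skipping can be sketched with a simplified model. Everything in this sketch is hypothetical: the class, the method, and the `useIsNaN` flag are not ORC's API, and the sketch assumes that a reader refuses to skip whenever the sum statistic suggests NaN may be present. It evaluates a `col > threshold` predicate against a stripe's maximum and sum.

```java
final class NaNAwareSkip {
    // Hypothetical model, not ORC's API: decides whether a stripe can be
    // skipped for the predicate "col > threshold", given the stripe's
    // max and sum statistics.
    static boolean canSkipGreaterThan(double max, double sum,
                                      double threshold, boolean useIsNaN) {
        // Old behavior (useIsNaN == false): any non-finite sum, including an
        // overflow to infinity, is treated as "may contain NaN".
        // New behavior (useIsNaN == true): only a NaN sum is.
        boolean nanSuspected = useIsNaN ? Double.isNaN(sum) : !Double.isFinite(sum);
        if (nanSuspected) {
            return false; // assumed conservative choice: read the stripe
        }
        // No row can satisfy col > threshold if even the max does not.
        return max <= threshold;
    }

    public static void main(String[] args) {
        double large = Double.MAX_VALUE / 2 + Double.MAX_VALUE / 4;
        double overflowedSum = large + large; // positive infinity
        System.out.println("isFinite check: "
            + canSkipGreaterThan(large, overflowedSum, Double.MAX_VALUE, false));
        System.out.println("isNaN check:    "
            + canSkipGreaterThan(large, overflowedSum, Double.MAX_VALUE, true));
    }
}
```

In this model, the stripe with an overflowed sum becomes skippable only under the `isNaN` check, while a stripe whose sum is NaN is read under both checks.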