Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ORC-1147: Use isNaN instead of isFinite to determine the contain NaN values #1080

Closed
wants to merge 2 commits into from

Conversation

guiyanakuang
Copy link
Member

@guiyanakuang guiyanakuang commented Apr 5, 2022

What changes were proposed in this pull request?

This pr is aimed at using isNaN instead of isFinite to determine the contain NaN values.
I want to exclude Double.POSITIVE_INFINITY / Double.NEGATIVE_INFINITY both cases, and only match NaN.

Why are the changes needed?

In the case of a sum overflow we can also predicate down to skip the corresponding strip.

How was this patch tested?

Added unit test.

@github-actions github-actions bot added the JAVA label Apr 5, 2022
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, @guiyanakuang . This is not correct, @guiyanakuang .

There exists multiple NaN values. Double.NaN is just one of them.

@dongjoon-hyun
Copy link
Member

Please note that IEEE Standard defines NaN as range, not a single value.

@guiyanakuang
Copy link
Member Author

Oh, @guiyanakuang . This is not correct, @guiyanakuang .

There exists multiple NaN values. Double.NaN is just one of them.

Maybe I should use the JDK's own method new Double(dstas.getSum()).isNaN(), where I want to exclude Double.POSITIVE_INFINITY / Double.NEGATIVE_INFINITY both cases, and only match NaN

@dongjoon-hyun
Copy link
Member

Thank sounds better.

@guiyanakuang guiyanakuang changed the title ORC-1147: Use Objects.equals(dstas.getSum(), Double.NaN) instead of isFinite to determine if there is a NaN write ORC-1147: Use isNaN instead of isFinite to determine the contain NaN values Apr 5, 2022
// First two rows of data cause sum overflow, sum is not a finite value,
// but this does not prevent pushing down (range comparisons work fine)
// The same applies to the middle stripe
fcol.vector[0] = dbcol.vector[0] = Double.MAX_VALUE / 2 + Double.MAX_VALUE / 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please define a constant variable once for Double.MAX_VALUE / 2 + Double.MAX_VALUE / 4 and use it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

assertEquals(1000, batch.size);

rows.nextBatch(batch);
// Last strip should not be read, even if sum is not finite
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strip -> stripe?


// First two rows of data cause sum overflow, sum is not a finite value,
// but this does not prevent pushing down (range comparisons work fine)
// The same applies to the middle stripe
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some more illustration about how many stripes are used here?

@dongjoon-hyun
Copy link
Member

cc @williamhyun


// Here we are writing 3500 rows of data, with stripeSize set to 400000
// and rowIndexStride set to 1000, so 1 stripe will be written,
// indexed in 4 strides.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the details.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @guiyanakuang .

@dongjoon-hyun dongjoon-hyun added this to the 1.7.4 milestone Apr 6, 2022
dongjoon-hyun pushed a commit that referenced this pull request Apr 6, 2022
…values

### What changes were proposed in this pull request?

This pr is aimed at using `isNaN` instead of `isFinite` to determine the contain NaN values.
I want to exclude Double.POSITIVE_INFINITY / Double.NEGATIVE_INFINITY both cases, and only match NaN.

### Why are the changes needed?

In the case of a sum overflow we can also predicate down to skip the corresponding strip.

### How was this patch tested?

Added unit test.

Closes #1080

Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 6b053d4)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Apr 6, 2022
…values

### What changes were proposed in this pull request?

This pr is aimed at using `isNaN` instead of `isFinite` to determine the contain NaN values.
I want to exclude Double.POSITIVE_INFINITY / Double.NEGATIVE_INFINITY both cases, and only match NaN.

### Why are the changes needed?

In the case of a sum overflow we can also predicate down to skip the corresponding strip.

### How was this patch tested?

Added unit test.

Closes #1080

Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 6b053d4)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 7763697)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
cxzl25 pushed a commit to cxzl25/orc that referenced this pull request Jan 11, 2024
…values

### What changes were proposed in this pull request?

This pr is aimed at using `isNaN` instead of `isFinite` to determine the contain NaN values.
I want to exclude Double.POSITIVE_INFINITY / Double.NEGATIVE_INFINITY both cases, and only match NaN.

### Why are the changes needed?

In the case of a sum overflow we can also predicate down to skip the corresponding strip.

### How was this patch tested?

Added unit test.

Closes apache#1080

Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants