Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-14518: [Ruby] Add support for Arrow::Array.new([BigDecimal]) #13377

Merged
merged 2 commits into from Jun 22, 2022

Conversation

kou
Copy link
Member

@kou kou commented Jun 14, 2022

This requires bigdecimal 3.1.0 or later for BigDecimal#scale.

Arrow::Array.new([BigDecimal]) detects the max precision and scale
from BigDecimals and creates suitable Arrow::Decimal{128,256}DataType
automatically.

This also truncates given BigDecimal when the specified
Arrow::Decimal{128,256}DataType doesn't have enough and scale. This
still doesn't check precision. If an user specifies data that have too
much precision, the data are used as-is.

This requires bigdecimal 3.1.0 or later for BigDecimal#scale.

Arrow::Array.new([BigDecimal]) detects the max precision and scale
from BigDecimals and creates suitable Arrow::Decimal{128,256}DataType
automatically.

This also truncates given BigDecimal when the specified
Arrow::Decimal{128,256}DataType doesn't have enough and scale. This
still doesn't check precision. If an user specifies data that have too
much precision, the data are used as-is.
@kou kou requested a review from mrkn June 14, 2022 07:57
@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

Copy link
Member

@mrkn mrkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kou I left some comments.

Comment on lines +136 to +137
precision = [builder_info[:precision] || 0, value.precision].max
scale = [builder_info[:scale] || 0, value.scale].max
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 lines don't permit overwriting the value's precision and the scale with smaller values by specifying the precision and the scale in builder_info. Is it intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide an example case?
Arrow::Array.new([BigDecimal("1.1"), BigDecimal("11.11")])?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misreading the code. It's OK now. But, I guess we may need to permit to pass precision and scale to Arrow::Array.new as optional arguments likeArrow::Array.new([BigDecimal("1.1"), BigDecimal("11.11")], scale: 1).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If users know an expected data type, they should use Arrow::XXXArray.new instead of Arrow::Array.new such as Arrow::Decimal128Array.new({scale: ..., precision: ...}, [BigDecimal(...). ...]).

Arrow::Array.new may not build a Arrow::Decimal128Array. In the case, the given optional arguments are just ignored. It may confuse users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

values: array.to_a,
})
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it unnecessary to test with BigDecimal::NAN and BigDecimal::INFINITY?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't notice them.
What is the expected behavior with them? It seems that decimal classes in C++ can't represent NAN and INFINITY.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throwing FloatDomainError?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should build an Apache Arrow array with Arrow::ArrayBuilder.build instead of raising an exception. I'll use Arrow::StringArray for the case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, It makes sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a validation for BigDecimal::NAN and BigDecimal::INFINITY to Arrow::Decimal{128,256}ArrayBuilder. If users append BigDecimal::NAN or BigDecimal::INFINITY, FloatDomainError is raised.

@kou
Copy link
Member Author

kou commented Jun 22, 2022

+1

@kou kou merged commit 6fd4d34 into apache:master Jun 22, 2022
@kou kou deleted the ruby-array-builder-decimal branch June 22, 2022 23:58
drin pushed a commit to drin/arrow that referenced this pull request Jun 23, 2022
…pache#13377)

This requires bigdecimal 3.1.0 or later for BigDecimal#scale.

Arrow::Array.new([BigDecimal]) detects the max precision and scale
from BigDecimals and creates suitable Arrow::Decimal{128,256}DataType
automatically.

This also truncates given BigDecimal when the specified
Arrow::Decimal{128,256}DataType doesn't have enough and scale. This
still doesn't check precision. If an user specifies data that have too
much precision, the data are used as-is.

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request Jul 5, 2022
…pache#13377)

This requires bigdecimal 3.1.0 or later for BigDecimal#scale.

Arrow::Array.new([BigDecimal]) detects the max precision and scale
from BigDecimals and creates suitable Arrow::Decimal{128,256}DataType
automatically.

This also truncates given BigDecimal when the specified
Arrow::Decimal{128,256}DataType doesn't have enough and scale. This
still doesn't check precision. If an user specifies data that have too
much precision, the data are used as-is.

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants