[SPARK-21552][SQL] Add DecimalType support to ArrowWriter. #18754
ueshin wants to merge 6 commits into apache:master
Conversation
Test build #80014 has finished for PR 18754 at commit
valueMutator.setIndexDefined(count)
val decimal = input.getDecimal(ordinal, precision, scale)
decimal.changePrecision(precision, scale)
DecimalUtility.writeBigDecimalToArrowBuf(decimal.toJavaBigDecimal, valueVector.getBuffer, count)
This will be easier in Arrow 0.7. There are now APIs for directly setting values with setSafe for BigDecimal.
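For illustration, a minimal sketch of what that looks like with the newer Arrow Java API (assuming the post-Mutator API, roughly Arrow 0.8 onward; this is an editor's sketch, not the PR's code):

```scala
import java.math.BigDecimal

import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.DecimalVector

object DecimalSetSafeDemo {
  def main(args: Array[String]): Unit = {
    val allocator = new RootAllocator(Long.MaxValue)
    // precision 38, scale 18: Spark's system default decimal type
    val vector = new DecimalVector("d", allocator, 38, 18)
    vector.allocateNew()
    // setSafe grows the buffer if needed and marks the slot as defined,
    // so the manual setIndexDefined / writeBigDecimalToArrowBuf pair
    // above collapses into a single call.
    vector.setSafe(0, new BigDecimal("123.450000000000000000"))
    vector.setValueCount(1)
    println(vector.getObject(0))
    vector.close()
    allocator.close()
  }
}
```

Note that the BigDecimal's scale must match the vector's declared scale, which is why the literal above carries 18 fractional digits.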
@BryanCutler Thanks, I'll update it to use setSafe after upgrading Arrow to 0.7.
Btw, when I tested upgrading to 0.7 locally, the ArrowConvertersSuite "string type conversion" test started failing. Do you have any ideas about that?
I haven't tried running those tests for 0.7 yet, let me see if I can reproduce
I got the same error, let me dig into it a little
It was an issue with StringWriter. I put the fix in #19284, please take a look, thanks!
I've confirmed it fixes the failure. Thanks!
    arrow_type = pa.float64()
elif type(dt) == DecimalType:
-    arrow_type = pa.decimal(dt.precision, dt.scale)
+    arrow_type = pa.decimal128(dt.precision, dt.scale)
@wesm @BryanCutler Is this the right way to define a decimal type for Arrow?
I also wonder if there is a limit on precision and scale.
Yes, that's the right way; it is now fixed at 128 bits internally. I believe the Arrow Java limit is the same as Spark's, 38/38. Not sure if pyarrow is the same, but I think so.
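For reference, a hedged sketch of the equivalent Java-side mapping (the helper name toArrowDecimal is hypothetical; the two-argument ArrowType.Decimal constructor matches the Arrow Java API of that era, with newer versions also taking an explicit bit width):

```scala
import org.apache.arrow.vector.types.pojo.ArrowType
import org.apache.spark.sql.types.{DataType, DecimalType}

// Illustrative helper mirroring the pyarrow mapping above. Arrow's
// 128-bit decimal supports precision up to 38, which lines up with
// Spark's DecimalType.MAX_PRECISION of 38.
def toArrowDecimal(dt: DataType): ArrowType = dt match {
  case d: DecimalType => new ArrowType.Decimal(d.precision, d.scale)
  case other =>
    throw new UnsupportedOperationException(s"Unsupported type: $other")
}
```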
Test build #85293 has finished for PR 18754 at commit
BryanCutler left a comment
LGTM, I just had one minor question about a call to Decimal.changePrecision
override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
  val decimal = input.getDecimal(ordinal, precision, scale)
  decimal.changePrecision(precision, scale)
Is it necessary to call changePrecision even though getDecimal already takes the precision/scale as input? Is it not guaranteed to return a decimal with that scale?
Unfortunately, it depends on the implementation of getDecimal for now.
Btw, I guess we need to check the return value of changePrecision() and set null when it returns false, which means overflow.
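A minimal sketch of that suggested overflow handling, assuming the writer context from the snippet above (valueVector, count, and setNull come from the surrounding class; this is illustrative, not necessarily the merged code):

```scala
val decimal = input.getDecimal(ordinal, precision, scale)
if (decimal.changePrecision(precision, scale)) {
  // The value fits in (precision, scale): write it into the Arrow buffer.
  valueVector.setSafe(count, decimal.toJavaBigDecimal)
} else {
  // changePrecision returned false: the value overflowed, store null.
  setNull()
}
```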
Just took a quick look and looks fine to me too.
Test build #85365 has finished for PR 18754 at commit
retest this please
Test build #85371 has finished for PR 18754 at commit
Merged to master.
What changes were proposed in this pull request?
Decimal type is not yet supported in ArrowWriter. This adds decimal type support.
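For context, a condensed sketch of the shape of such a writer, reconstructed from the snippets quoted in the review above (assuming Spark's internal ArrowFieldWriter abstraction; treat this as an approximation rather than the exact merged code):

```scala
import org.apache.arrow.vector.DecimalVector
import org.apache.spark.sql.catalyst.expressions.SpecializedGetters

// Plugged into ArrowWriter's type dispatch along the lines of:
//   case (DecimalType.Fixed(precision, scale), vector: DecimalVector) =>
//     new DecimalWriter(vector, precision, scale)
private[arrow] class DecimalWriter(
    val valueVector: DecimalVector,
    precision: Int,
    scale: Int) extends ArrowFieldWriter {

  override def setNull(): Unit = {
    valueVector.setNull(count)
  }

  override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
    val decimal = input.getDecimal(ordinal, precision, scale)
    // Normalize to the declared precision/scale; per the review thread,
    // a false return here means overflow and the slot should become null.
    if (decimal.changePrecision(precision, scale)) {
      valueVector.setSafe(count, decimal.toJavaBigDecimal)
    } else {
      setNull()
    }
  }
}
```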
How was this patch tested?
Added a test to ArrowConvertersSuite.