New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-24774][SQL] Avro: Support logical decimal type #22037

Closed
wants to merge 5 commits into
base: master
from

Conversation

Projects
None yet
5 participants
@gengliangwang
Contributor

gengliangwang commented Aug 8, 2018

What changes were proposed in this pull request?

Support Avro logical date type:
https://avro.apache.org/docs/1.8.2/spec.html#Decimal

How was this patch tested?

Unit test

@SparkQA

This comment has been minimized.

Show comment
Hide comment
@SparkQA

SparkQA Aug 8, 2018

Test build #94427 has finished for PR 22037 at commit bbc5434.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 8, 2018

Test build #94427 has finished for PR 22037 at commit bbc5434.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

This comment has been minimized.

Show comment
Hide comment
@SparkQA

SparkQA Aug 8, 2018

Test build #94428 has finished for PR 22037 at commit e9dc019.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 8, 2018

Test build #94428 has finished for PR 22037 at commit e9dc019.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

This comment has been minimized.

Show comment
Hide comment
@SparkQA

SparkQA Aug 9, 2018

Test build #94519 has finished for PR 22037 at commit 1c5d228.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 9, 2018

Test build #94519 has finished for PR 22037 at commit 1c5d228.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
}
updater.set(ordinal, bytes)
case (FIXED, d: DecimalType) => (updater, ordinal, value) =>
val bigDecimal = decimalConversions.fromFixed(value.asInstanceOf[GenericFixed], avroType,
LogicalTypes.decimal(d.precision, d.scale))

This comment has been minimized.

@cloud-fan

cloud-fan Aug 10, 2018

Contributor

parquet can convert binary to unscaled long directly, shall we follow?

@cloud-fan

cloud-fan Aug 10, 2018

Contributor

parquet can convert binary to unscaled long directly, shall we follow?

This comment has been minimized.

@gengliangwang

gengliangwang Aug 10, 2018

Contributor

Comparing to binaryToUnscaledLong, I think using the method from Avro library makes more sense.

Also the method binaryToUnscaledLong is using the underlying byte array of parquet Binary without copying it. (If we create a new Util method for both, then Parquet data source will lose this optimization.)

For performance consideration, we can create a similar method in Avro. I tried the function binaryToUnscaledLong in Avro and it works. I can change it if you insist.

@gengliangwang

gengliangwang Aug 10, 2018

Contributor

Comparing to binaryToUnscaledLong, I think using the method from Avro library makes more sense.

Also the method binaryToUnscaledLong is using the underlying byte array of parquet Binary without copying it. (If we create a new Util method for both, then Parquet data source will lose this optimization.)

For performance consideration, we can create a similar method in Avro. I tried the function binaryToUnscaledLong in Avro and it works. I can change it if you insist.

This comment has been minimized.

@cloud-fan

cloud-fan Aug 10, 2018

Contributor

ok let's leave it. We can always add later.

@cloud-fan

cloud-fan Aug 10, 2018

Contributor

ok let's leave it. We can always add later.

@HyukjinKwon

This comment has been minimized.

Show comment
Hide comment
@HyukjinKwon

HyukjinKwon Aug 10, 2018

Member

retest this please

Member

HyukjinKwon commented Aug 10, 2018

retest this please

@SparkQA

This comment has been minimized.

Show comment
Hide comment
@SparkQA

SparkQA Aug 10, 2018

Test build #94538 has finished for PR 22037 at commit 1c5d228.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 10, 2018

Test build #94538 has finished for PR 22037 at commit 1c5d228.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

This comment has been minimized.

Show comment
Hide comment
@SparkQA

SparkQA Aug 10, 2018

Test build #94546 has finished for PR 22037 at commit 9eefbe5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 10, 2018

Test build #94546 has finished for PR 22037 at commit 9eefbe5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

This comment has been minimized.

Show comment
Hide comment
@SparkQA

SparkQA Aug 10, 2018

Test build #94552 has finished for PR 22037 at commit 24dbada.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 10, 2018

Test build #94552 has finished for PR 22037 at commit 24dbada.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@gengliangwang

This comment has been minimized.

Show comment
Hide comment
@gengliangwang

gengliangwang Aug 10, 2018

Contributor

retest this please

Contributor

gengliangwang commented Aug 10, 2018

retest this please

@SparkQA

This comment has been minimized.

Show comment
Hide comment
@SparkQA

SparkQA Aug 10, 2018

Test build #94558 has finished for PR 22037 at commit 24dbada.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 10, 2018

Test build #94558 has finished for PR 22037 at commit 24dbada.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

This comment has been minimized.

Show comment
Hide comment
@SparkQA

SparkQA Aug 10, 2018

Test build #94579 has finished for PR 22037 at commit 0384a1a.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 10, 2018

Test build #94579 has finished for PR 22037 at commit 0384a1a.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
}""")
val datumWriter = new GenericDatumWriter[GenericRecord](schema)
val dataFileWriter = new DataFileWriter[GenericRecord](datumWriter)
dataFileWriter.create(schema, new File(s"$dir.avro"))

This comment has been minimized.

@cloud-fan

cloud-fan Aug 11, 2018

Contributor

Let's either always use python to write test files, or always use java.

@cloud-fan

cloud-fan Aug 11, 2018

Contributor

Let's either always use python to write test files, or always use java.

@viirya

This comment has been minimized.

Show comment
Hide comment
@viirya

viirya Aug 11, 2018

Contributor

retest this please.

Contributor

viirya commented Aug 11, 2018

retest this please.

@SparkQA

This comment has been minimized.

Show comment
Hide comment
@SparkQA

SparkQA Aug 11, 2018

Test build #94616 has finished for PR 22037 at commit 0384a1a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 11, 2018

Test build #94616 has finished for PR 22037 at commit 0384a1a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

This comment has been minimized.

Show comment
Hide comment
@SparkQA

SparkQA Aug 11, 2018

Test build #94623 has finished for PR 22037 at commit 6b11292.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 11, 2018

Test build #94623 has finished for PR 22037 at commit 6b11292.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

This comment has been minimized.

Show comment
Hide comment
@SparkQA

SparkQA Aug 11, 2018

Test build #94628 has finished for PR 22037 at commit 508a340.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 11, 2018

Test build #94628 has finished for PR 22037 at commit 508a340.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@cloud-fan

This comment has been minimized.

Show comment
Hide comment
@cloud-fan

cloud-fan Aug 13, 2018

Contributor

thanks, merging to master!

Contributor

cloud-fan commented Aug 13, 2018

thanks, merging to master!

@asfgit asfgit closed this in be2238f Aug 13, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment