
[SPARK-11827] [SQL] Adding java.math.BigInteger support in Java type inference for POJOs and Java collections #10125

Closed
kevinyu98 wants to merge 26 commits

Conversation

kevinyu98
Contributor

Hello: Can you help check this PR? I am adding support for java.math.BigInteger in the Java bean code path. I saw that Spark internally converts BigInteger to BigDecimal in ColumnType.scala and CatalystRowConverter.scala, so I use a similar approach and convert the BigInteger to a BigDecimal.
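The conversion the PR relies on can be sketched in plain Java. This is only a conceptual sketch (the helper name is hypothetical, not Spark's API): `new BigDecimal(BigInteger)` is the standard, lossless way to widen an arbitrary-precision integer into an unscaled decimal, which is essentially what wrapping the value in Catalyst's `Decimal` does.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class BigIntegerToDecimal {
    // Hypothetical helper mirroring the PR's idea: wrap the BigInteger
    // in a BigDecimal with scale 0, so no precision is lost.
    static BigDecimal toDecimal(BigInteger v) {
        return new BigDecimal(v); // exact conversion, scale 0
    }

    public static void main(String[] args) {
        BigInteger v = new BigInteger("1234567");
        BigDecimal d = toDecimal(v);
        System.out.println(d);         // 1234567
        System.out.println(d.scale()); // 0
    }
}
```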

@srowen
Member

srowen commented Dec 3, 2015

@kevinyu98 please write a meaningful title and description.
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

@kevinyu98 kevinyu98 changed the title Working on spark 11827 [SPARK-11827] [SQL] Adding java.math.BigInteger support in Java type inference for POJOs and Java collections Dec 3, 2015
@kevinyu98
Contributor Author

Hello Sean: I am sorry, I forgot to update the title and description. I have made the changes; please let me know if anything needs to be changed. Thanks.
Kevin

@andrewor14
Contributor

ok to test @yhuai @davies

@SparkQA

SparkQA commented Dec 15, 2015

Test build #47689 has finished for PR 10125 at commit 1f77804.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -326,6 +326,7 @@ object CatalystTypeConverters {
val decimal = scalaValue match {
case d: BigDecimal => Decimal(d)
case d: JavaBigDecimal => Decimal(d)
case d: BigInteger => Decimal(d)
Contributor


should we support both java and scala big integer?

Contributor Author


Hi Wenchen: Sure, I will add that.

@srowen
Member

srowen commented May 6, 2016

Ping @kevinyu98 -- update the PR or close it?

@kevinyu98
Contributor Author

@srowen: sorry for the long delay. I will work on it now.


case class ReflectData3(
scalaBigInt: scala.math.BigInt
)
Contributor


Can you move this to a single line?

Contributor Author


I just removed that code.

@kevinyu98
Contributor Author

@srowen @davies @cloud-fan I updated the code; can you help review? Sorry for the delay. Thanks.

@SparkQA

SparkQA commented May 13, 2016

Test build #58587 has finished for PR 10125 at commit ae0be70.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kevinyu98
Contributor Author

I just ran ./dev/mima locally and it works:
[info] Done packaging.
[info] spark-examples: previous-artifact not set, not analyzing binary compatibility
[info] spark-mllib: found 0 potential binary incompatibilities while checking against org.apache.spark:spark-mllib_2.11:1.6.0 (filtered 500)
[info] spark-sql: found 0 potential binary incompatibilities while checking against org.apache.spark:spark-sql_2.11:1.6.0 (filtered 752)
[success] Total time: 231 s, completed May 13, 2016 12:22:16 PM

@kevinyu98
Contributor Author

retest it please.

@SparkQA

SparkQA commented May 18, 2016

Test build #58754 has finished for PR 10125 at commit db4bb48.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kevinyu98
Contributor Author

@cloud-fan can you help take a look? I have made changes based on your comments. Thanks.

@@ -326,6 +327,7 @@ object CatalystTypeConverters {
val decimal = scalaValue match {
case d: BigDecimal => Decimal(d)
case d: JavaBigDecimal => Decimal(d)
case d: JavaBigInteger => Decimal(d)
Contributor


Can you hold on until #13008? Then we can revert this change as CatalystTypeConverter is not used when creating DataFrame.

@kevinyu98
Contributor Author

sure, I will do that.

@cloud-fan
Contributor

#13008 is merged, can you revert the CatalystTypeConverters changes and see if it still works? Thanks!

@kevinyu98
Contributor Author

@cloud-fan I tried, and it still fails. It didn't go through the createDataFrame you added in SparkSession.
It went through createDataFrame(data: java.util.List[], beanClass: Class[]): DataFrame
-> val rows = SQLContext.beansToRows(data.asScala.iterator, beanInfo, attrSeq)

beansToRows creates internal rows and it lives in SQLContext.

Should we add RowEncoder to the beansToRows call, or leave the code as it is? Thanks.

here is the trace

scala.MatchError: 1234567 (of class java.math.BigInteger)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toCatalystImpl(CatalystTypeConverters.scala:326)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toCatalystImpl(CatalystTypeConverters.scala:323)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:892)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:892)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:892)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:890)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.toStream(Iterator.scala:1322)
at scala.collection.AbstractIterator.toStream(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toSeq(TraversableOnce.scala:298)
at scala.collection.AbstractIterator.toSeq(Iterator.scala:1336)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:373)
at test.org.apache.spark.sql.JavaDataFrameSuite.testCreateDataFrameFromLocalJavaBeans(JavaDataFrameSuite.java:200)

@@ -109,6 +109,7 @@ object DecimalType extends AbstractDataType {
val MAX_SCALE = 38
val SYSTEM_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 18)
val USER_DEFAULT: DecimalType = DecimalType(10, 0)
val BIGINT_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 0)
Contributor

@cloud-fan cloud-fan May 19, 2016


please add a private[sql] val BigIntDecimal = DecimalType(38, 0) to the next section, instead of doing this.

Contributor Author


sure, I will do that.
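The suggested DecimalType(38, 0) works because Catalyst caps decimal precision at 38 digits, so a BigInteger maps by default to a 38-digit, scale-0 decimal. A small Java check (the helper name is hypothetical) makes the bound concrete:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class BigIntDecimalBounds {
    static final int MAX_PRECISION = 38; // Catalyst's DecimalType.MAX_PRECISION

    // Hypothetical check: does this integer fit a DecimalType(38, 0) column?
    static boolean fitsBigIntDecimal(BigInteger v) {
        return new BigDecimal(v).precision() <= MAX_PRECISION;
    }

    public static void main(String[] args) {
        BigInteger ok = BigInteger.TEN.pow(38).subtract(BigInteger.ONE); // 38 nines
        BigInteger tooBig = BigInteger.TEN.pow(38);                      // 39 digits
        System.out.println(fitsBigIntDecimal(ok));     // true
        System.out.println(fitsBigIntDecimal(tooBig)); // false
    }
}
```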

@kevinyu98
Contributor Author

retest it please.

Row first = df.select("a", "b", "c", "d").first();
Assert.assertEquals(new StructField("e", DataTypes.createDecimalType(38,0), true, Metadata.empty()),
schema.apply("e"));
Row first = df.select("a", "b", "c", "d","e").first();
Contributor


nit: add a space before "e"

Contributor Author


will add

@cloud-fan
Contributor

mostly LGTM, pending jenkins.

@kevinyu98
Contributor Author

I will push the latest one after Jenkins finishes. Thanks very much!

@SparkQA

SparkQA commented May 19, 2016

Test build #58869 has finished for PR 10125 at commit 43faed3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 19, 2016

Test build #58872 has finished for PR 10125 at commit 3b4e360.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request May 20, 2016
[SPARK-11827] [SQL] Adding java.math.BigInteger support in Java type inference for POJOs and Java collections

Hello: Can you help check this PR? I am adding support for java.math.BigInteger in the Java bean code path. I saw that Spark internally converts BigInteger to BigDecimal in ColumnType.scala and CatalystRowConverter.scala, so I use a similar approach and convert the BigInteger to a BigDecimal.

Author: Kevin Yu <qyu@us.ibm.com>

Closes #10125 from kevinyu98/working_on_spark-11827.

(cherry picked from commit 17591d9)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@asfgit asfgit closed this in 17591d9 May 20, 2016
@cloud-fan
Contributor

thanks, merging to master and 2.0!

@tedyu
Contributor

tedyu commented May 20, 2016

This seems to have broken the build for Java 7:

sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala:137: value longValueExact is not a member of java.math.BigInteger
[ERROR]       this.longVal = bigintval.longValueExact()
[ERROR]                                ^
[ERROR] one error found

@tedyu
Contributor

tedyu commented May 20, 2016

Looks like bigintval.longValue() should have been used.
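The distinction matters: BigInteger.longValueExact() was only added in Java 8 and throws ArithmeticException on overflow, while longValue() exists on Java 7 but silently truncates to the low 64 bits. So a plain longValue() substitution needs an explicit range check; a hedged sketch of such a Java 7-compatible replacement (the method name is hypothetical):

```java
import java.math.BigInteger;

public class LongValueExactBackport {
    // Java 7 has no BigInteger.longValueExact() (it arrived in Java 8).
    // longValue() silently truncates, so a Java 7 replacement needs an
    // explicit range check, mirroring what longValueExact() does internally.
    static long longValueExactCompat(BigInteger v) {
        if (v.bitLength() > 63) { // value does not fit in a signed long
            throw new ArithmeticException("BigInteger out of long range");
        }
        return v.longValue();
    }

    public static void main(String[] args) {
        System.out.println(longValueExactCompat(new BigInteger("1234567"))); // 1234567
        // Passing Long.MAX_VALUE + 1 would throw ArithmeticException
        // instead of silently wrapping to a negative number.
    }
}
```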

@tedyu
Contributor

tedyu commented May 20, 2016

See #13233

@tedyu
Contributor

tedyu commented May 21, 2016

When will the addendum be checked in?

For people using Java 7 it is inconvenient, because they have to modify Decimal.scala or the compilation fails.

8 participants