
[SPARK-11827] [SQL] Adding java.math.BigInteger support in Java type inference for POJOs and Java collections #10125

Closed
kevinyu98 wants to merge 26 commits

Conversation

kevinyu98
Contributor

Hello: Can you help check this PR? I am adding support for java.math.BigInteger in the Java bean code path. I saw that Spark internally converts BigInteger to BigDecimal in ColumnType.scala and CatalystRowConverter.scala, so I use a similar approach and convert the BigInteger to a BigDecimal.
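The conversion the PR relies on can be sketched in plain Java. This is only a conceptual sketch (the helper name is hypothetical, not Spark's API): `new BigDecimal(BigInteger)` is the standard, lossless way to widen an arbitrary-precision integer into an unscaled decimal, which is essentially what wrapping the value in Catalyst's `Decimal` does.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class BigIntegerToDecimal {
    // Hypothetical helper mirroring the PR's idea: wrap the BigInteger
    // in a BigDecimal with scale 0, so no precision is lost.
    static BigDecimal toDecimal(BigInteger v) {
        return new BigDecimal(v); // exact conversion, scale 0
    }

    public static void main(String[] args) {
        BigInteger v = new BigInteger("1234567");
        BigDecimal d = toDecimal(v);
        System.out.println(d);         // 1234567
        System.out.println(d.scale()); // 0
    }
}
```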

@srowen
Member

srowen commented Dec 3, 2015

@kevinyu98 please write a meaningful title and description.
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

@kevinyu98 kevinyu98 changed the title Working on spark 11827 [SPARK-11827] [SQL] Adding java.math.BigInteger support in Java type inference for POJOs and Java collections Dec 3, 2015
@kevinyu98
Contributor Author

Hello Sean: I am sorry, I forgot to update the title and description. I have made the changes; please let me know if anything needs to be changed. Thanks.
Kevin

@andrewor14
Contributor

ok to test @yhuai @davies

@SparkQA

SparkQA commented Dec 15, 2015

Test build #47689 has finished for PR 10125 at commit 1f77804.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -326,6 +326,7 @@ object CatalystTypeConverters {
val decimal = scalaValue match {
case d: BigDecimal => Decimal(d)
case d: JavaBigDecimal => Decimal(d)
case d: BigInteger => Decimal(d)
Contributor


should we support both java and scala big integer?

Contributor Author


Hi Wenchen: Sure, I will add that.

@srowen
Member

srowen commented May 6, 2016

Ping @kevinyu98 -- update the PR or close it?

@kevinyu98
Contributor Author

@srowen: sorry for the long delay. I will work on it now.


case class ReflectData3(
scalaBigInt: scala.math.BigInt
)
Contributor


Can you move this to a single line?

Contributor Author


I just removed that code.

@kevinyu98
Contributor Author

@srowen @davies @cloud-fan I updated the code; can you help review? Sorry for the delay. Thanks.

@SparkQA

SparkQA commented May 13, 2016

Test build #58587 has finished for PR 10125 at commit ae0be70.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kevinyu98
Contributor Author

I just ran ./dev/mima locally and it works:
[info] Done packaging.
[info] spark-examples: previous-artifact not set, not analyzing binary compatibility
[info] spark-mllib: found 0 potential binary incompatibilities while checking against org.apache.spark:spark-mllib_2.11:1.6.0 (filtered 500)
[info] spark-sql: found 0 potential binary incompatibilities while checking against org.apache.spark:spark-sql_2.11:1.6.0 (filtered 752)
[success] Total time: 231 s, completed May 13, 2016 12:22:16 PM

@kevinyu98
Contributor Author

retest it please.

@SparkQA

SparkQA commented May 18, 2016

Test build #58754 has finished for PR 10125 at commit db4bb48.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kevinyu98
Contributor Author

@cloud-fan can you help take a look? I have made changes based on your comments. Thanks.

@@ -326,6 +327,7 @@ object CatalystTypeConverters {
val decimal = scalaValue match {
case d: BigDecimal => Decimal(d)
case d: JavaBigDecimal => Decimal(d)
case d: JavaBigInteger => Decimal(d)
Contributor


Can you hold on until #13008? Then we can revert this change as CatalystTypeConverter is not used when creating DataFrame.

@kevinyu98
Contributor Author

sure, I will do that.

@cloud-fan
Contributor

#13008 is merged, can you revert the CatalystTypeConverters changes and see if it still works? Thanks!

@kevinyu98
Contributor Author

@cloud-fan I tried, and it still fails. It didn't go through the createDataFrame you added in SparkSession.
It went through createDataFrame(data: java.util.List[], beanClass: Class[]): DataFrame
-> val rows = SQLContext.beansToRows(data.asScala.iterator, beanInfo, attrSeq)

beansToRows creates internal rows and it lives in SQLContext.

Should we add RowEncoder to the beansToRows call, or leave the code as it is? Thanks.

here is the trace

scala.MatchError: 1234567 (of class java.math.BigInteger)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toCatalystImpl(CatalystTypeConverters.scala:326)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toCatalystImpl(CatalystTypeConverters.scala:323)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:892)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:892)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:892)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:890)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.toStream(Iterator.scala:1322)
at scala.collection.AbstractIterator.toStream(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toSeq(TraversableOnce.scala:298)
at scala.collection.AbstractIterator.toSeq(Iterator.scala:1336)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:373)
at test.org.apache.spark.sql.JavaDataFrameSuite.testCreateDataFrameFromLocalJavaBeans(JavaDataFrameSuite.java:200)

@@ -109,6 +109,7 @@ object DecimalType extends AbstractDataType {
val MAX_SCALE = 38
val SYSTEM_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 18)
val USER_DEFAULT: DecimalType = DecimalType(10, 0)
val BIGINT_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 0)
Contributor

@cloud-fan cloud-fan May 19, 2016


please add a private[sql] val BigIntDecimal = DecimalType(38, 0) to the next section, instead of doing this.

Contributor Author


sure, I will do that.
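The suggested DecimalType(38, 0) works because Catalyst caps decimal precision at 38 digits, so a BigInteger maps by default to a 38-digit, scale-0 decimal. A small Java check (the helper name is hypothetical) makes the bound concrete:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class BigIntDecimalBounds {
    static final int MAX_PRECISION = 38; // Catalyst's DecimalType.MAX_PRECISION

    // Hypothetical check: does this integer fit a DecimalType(38, 0) column?
    static boolean fitsBigIntDecimal(BigInteger v) {
        return new BigDecimal(v).precision() <= MAX_PRECISION;
    }

    public static void main(String[] args) {
        BigInteger ok = BigInteger.TEN.pow(38).subtract(BigInteger.ONE); // 38 nines
        BigInteger tooBig = BigInteger.TEN.pow(38);                      // 39 digits
        System.out.println(fitsBigIntDecimal(ok));     // true
        System.out.println(fitsBigIntDecimal(tooBig)); // false
    }
}
```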

@kevinyu98
Contributor Author

retest it please.

Row first = df.select("a", "b", "c", "d").first();
Assert.assertEquals(new StructField("e", DataTypes.createDecimalType(38,0), true, Metadata.empty()),
schema.apply("e"));
Row first = df.select("a", "b", "c", "d","e").first();
Contributor


nit: add a space before "e"

Contributor Author


will add

@cloud-fan
Contributor

mostly LGTM, pending jenkins.

@kevinyu98
Contributor Author

I will push the latest one after Jenkins finishes. Thanks very much!

@SparkQA

SparkQA commented May 19, 2016

Test build #58869 has finished for PR 10125 at commit 43faed3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 19, 2016

Test build #58872 has finished for PR 10125 at commit 3b4e360.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request May 20, 2016
[SPARK-11827] [SQL] Adding java.math.BigInteger support in Java type inference for POJOs and Java collections

Hello: Can you help check this PR? I am adding support for java.math.BigInteger in the Java bean code path. I saw that Spark internally converts BigInteger to BigDecimal in ColumnType.scala and CatalystRowConverter.scala, so I use a similar approach and convert the BigInteger to a BigDecimal.

Author: Kevin Yu <qyu@us.ibm.com>

Closes #10125 from kevinyu98/working_on_spark-11827.

(cherry picked from commit 17591d9)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@asfgit asfgit closed this in 17591d9 May 20, 2016
@cloud-fan
Contributor

thanks, merging to master and 2.0!

@tedyu
Contributor

tedyu commented May 20, 2016

This seems to have broken the build for Java 7:

sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala:137: value longValueExact is not a member of java.math.BigInteger
[ERROR]       this.longVal = bigintval.longValueExact()
[ERROR]                                ^
[ERROR] one error found

@tedyu
Contributor

tedyu commented May 20, 2016

Looks like bigintval.longValue() should have been used.
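The distinction matters: BigInteger.longValueExact() was only added in Java 8 and throws ArithmeticException on overflow, while longValue() exists on Java 7 but silently truncates to the low 64 bits. So a plain longValue() substitution needs an explicit range check; a hedged sketch of such a Java 7-compatible replacement (the method name is hypothetical):

```java
import java.math.BigInteger;

public class LongValueExactBackport {
    // Java 7 has no BigInteger.longValueExact() (it arrived in Java 8).
    // longValue() silently truncates, so a Java 7 replacement needs an
    // explicit range check, mirroring what longValueExact() does internally.
    static long longValueExactCompat(BigInteger v) {
        if (v.bitLength() > 63) { // value does not fit in a signed long
            throw new ArithmeticException("BigInteger out of long range");
        }
        return v.longValue();
    }

    public static void main(String[] args) {
        System.out.println(longValueExactCompat(new BigInteger("1234567"))); // 1234567
        // Passing Long.MAX_VALUE + 1 would throw ArithmeticException
        // instead of silently wrapping to a negative number.
    }
}
```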

@tedyu
Contributor

tedyu commented May 20, 2016

See #13233

@tedyu
Contributor

tedyu commented May 21, 2016

When will the addendum be checked in?

For people using Java 7 it is inconvenient, because they have to modify Decimal.scala or the compilation fails.

8 participants