[LIVY-754][THRIFT] Encode precision and scale for decimal type. #288

Closed · wants to merge 3 commits

Conversation

@wypoon (Contributor) commented Apr 2, 2020

What changes were proposed in this pull request?

When an org.apache.livy.thriftserver.session.DataType.DECIMAL is converted to an org.apache.hive.service.rpc.thrift.TTypeDesc for sending a Thrift response to a client request for result set metadata, the TTypeDesc contains a TPrimitiveTypeEntry(TTypeId.DECIMAL_TYPE) without TTypeQualifiers (which are needed to carry the precision and scale).
With this change, we include the qualifiers in the TPrimitiveTypeEntry. We use both the name and the DataType of a field type to construct the TTypeDesc. We can do this without changing the existing internal representation for data types, because the precision and scale can be obtained from the name of the decimal type.
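
For reference, a minimal sketch of how such qualifiers can be attached using the Hive Thrift API (an illustration under the assumptions above, not the exact patch; the decimalTypeDesc helper is hypothetical):

import java.util.{HashMap => JHashMap}
import org.apache.hive.service.rpc.thrift._

// Hypothetical helper: build a TTypeDesc for a decimal column whose
// precision and scale were parsed from a name like "decimal(9,2)".
def decimalTypeDesc(precision: Int, scale: Int): TTypeDesc = {
  // Qualifiers are a map from well-known keys to i32/string values.
  val qualifiers = new JHashMap[String, TTypeQualifierValue]()
  qualifiers.put(TCLIServiceConstants.PRECISION, TTypeQualifierValue.i32Value(precision))
  qualifiers.put(TCLIServiceConstants.SCALE, TTypeQualifierValue.i32Value(scale))

  // Attach the qualifiers to the primitive DECIMAL entry.
  val primitive = new TPrimitiveTypeEntry(TTypeId.DECIMAL_TYPE)
  primitive.setTypeQualifiers(new TTypeQualifiers(qualifiers))

  val typeDesc = new TTypeDesc()
  typeDesc.addToTypes(TTypeEntry.primitiveEntry(primitive))
  typeDesc
}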

How was this patch tested?

Used beeline to connect to the Thrift server and ran a select from a table with a column of decimal type.
Also extended an existing integration test; see the sketch below.
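
For illustration, a rough sketch of the kind of check such a test can make through JDBC metadata (the URL, table name, and the decimal(9,2) column are placeholder assumptions, not taken from the patch):

import java.sql.DriverManager

// Placeholder URL; a real test would point at the Livy Thrift server under test.
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10090/default")
try {
  // Assumes a table with a first column declared decimal(9,2).
  val rs = conn.createStatement().executeQuery("SELECT amount FROM decimals_test")
  val md = rs.getMetaData
  // With the qualifiers encoded, the client sees the declared precision
  // and scale rather than defaults.
  assert(md.getColumnTypeName(1) == "decimal")
  assert(md.getPrecision(1) == 9)
  assert(md.getScale(1) == 2)
} finally {
  conn.close()
}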

@wypoon changed the title from "[LIVY-754][THRIFT] Encode precision and decimal for decimal type." to "[LIVY-754][THRIFT] Encode precision and scale for decimal type." on Apr 2, 2020
@codecov-io commented Apr 2, 2020

Codecov Report

Merging #288 into master will increase coverage by 0.06%.
The diff coverage is n/a.


@@             Coverage Diff              @@
##             master     #288      +/-   ##
============================================
+ Coverage     68.19%   68.26%   +0.06%     
- Complexity      964      965       +1     
============================================
  Files           104      104              
  Lines          5952     5952              
  Branches        900      900              
============================================
+ Hits           4059     4063       +4     
+ Misses         1314     1310       -4     
  Partials        579      579              
Impacted Files                                           Coverage Δ               Complexity Δ
.../scala/org/apache/livy/sessions/SessionState.scala   61.11% <0.00%> (ø)        2.00% <0.00%> (ø%)
...ain/scala/org/apache/livy/utils/SparkYarnApp.scala    75.00% <0.00%> (+1.25%)   40.00% <0.00%> (ø%)
...in/java/org/apache/livy/rsc/driver/JobWrapper.java    88.57% <0.00%> (+5.71%)   9.00% <0.00%> (+1.00%)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ee7fdfc...6f5b968.

@andrasbeni (Contributor) left a comment

@wypoon thanks for the fix. The change in general looks good to me. However, I have a few comments you may want to consider.

// name can be one of
// 1. decimal
// 2. decimal(p)
// 3. decimal(p, s)
@andrasbeni (Contributor) commented on the snippet above:

Are decimal and decimal(p) actually possible? I understand these forms can be used to declare the type, but based on org.apache.spark.sql.types.DecimalType, I don't think the JSON omits scale or precision.

I might be wrong here. If so, then I believe the parsing logic should be tested for decimal and decimal(p) as well.
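
For example, a minimal spark-shell sketch with a hypothetical decimal(9,2) column shows both values serialized in the schema JSON:

import org.apache.spark.sql.types._

// A one-field schema with an explicit precision and scale.
val schema = StructType(Seq(StructField("amount", DecimalType(9, 2))))
println(schema.json)
// {"type":"struct","fields":[{"name":"amount","type":"decimal(9,2)","nullable":true,"metadata":{}}]}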

@wypoon (Contributor, Author) replied:

In the Hive that I used, I do not actually encounter decimal or decimal(p). I defined a Hive table with columns of each of those variants, and a "desc table" reports them as decimal(10,0) and decimal(p,0) respectively. So the JSON that Spark generates, which DataTypeUtils.schemaFromSparkJson consumes, only contains the third variant.
Nevertheless, I decided to handle all 3 variants purely as a defensive measure. It may be redundant, but it doesn't hurt.

@wypoon (Contributor, Author) commented Apr 2, 2020:

scala> def f(name: String): (Int, Int) = {
     |       if (name == "decimal") {
     |         (10, 0)
     |       } else {
     |         val suffix = name.substring(7)
     |         require(suffix.startsWith("(") && suffix.endsWith(")"),
     |           name + " is not of the form decimal(<precision>,<scale>)")
     |         val parts = suffix.substring(1, suffix.length - 1).split(",")
     |         if (parts.length == 1) {
     |           (parts(0).trim.toInt, 0)
     |         } else {
     |           (parts(0).trim.toInt, parts(1).trim.toInt)
     |         }
     |       }
     | }
f: (name: String)(Int, Int)

scala> f("decimal")
res0: (Int, Int) = (10,0)

scala> f("decimal(7)")
res1: (Int, Int) = (7,0)

scala> f("decimal(9, 2)")
res2: (Int, Int) = (9,2)

scala> f("decimal_type")
java.lang.IllegalArgumentException: requirement failed: decimal_type is not of the form decimal(<precision>,<scale>)
  at scala.Predef$.require(Predef.scala:224)
  at f(<console>:28)
  ... 49 elided

@wypoon (Contributor, Author) added:

I feel that it is overkill to write a unit test just for that block of code. The above suffices.

@wypoon commented Apr 3, 2020

@mgaido91 @jerryshao can you please review?

@mgaido91 (Contributor) left a comment

The change seems fine to me, just a minor style comment. Could you please add tests for all the possible cases, though? Thanks.

@wypoon commented Apr 6, 2020

Added a couple more cases to the integration test.

@mgaido91 (Contributor) left a comment

LGTM

@wypoon commented May 20, 2020

@mgaido91 can you please merge this (since you have already approved it)?
I thought it was already merged, but it appears that it isn't.

@mgaido91 closed this in 3b9bbef on May 20, 2020