[LIVY-754][THRIFT] Encode precision and scale for decimal type. #288
Conversation
Codecov Report
@@             Coverage Diff              @@
##             master     #288      +/-   ##
============================================
+ Coverage     68.19%   68.26%    +0.06%
- Complexity      964      965        +1
============================================
  Files           104      104
  Lines          5952     5952
  Branches        900      900
============================================
+ Hits           4059     4063        +4
+ Misses         1314     1310        -4
  Partials        579      579
Continue to review full report at Codecov.
@wypoon thanks for the fix. The change in general looks good to me. However, I had a few comments you may want to consider.
// name can be one of
// 1. decimal
// 2. decimal(p)
// 3. decimal(p, s)
Are decimal and decimal(p) actually possible? I understand these forms can be used to declare the type, but based on org.apache.spark.sql.types.DecimalType I don't think the JSON omits scale or precision. I might be wrong here. If so, then I believe the parsing logic should be tested for decimal and decimal(p) also.
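For reference, a quick spark-shell check along these lines should settle it (a sketch; I'm assuming DecimalType.typeName always renders both precision and scale, which is what the Spark source suggests):

scala> import org.apache.spark.sql.types.DecimalType
import org.apache.spark.sql.types.DecimalType

scala> DecimalType(10, 0).json  // json is derived from typeName
res0: String = "decimal(10,0)"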
In the Hive version that I used, I did not actually encounter decimal or decimal(p). I defined a Hive table with columns of each of those variants, and doing a "desc table" returned columns of type decimal(10,0) and decimal(p,0) for the first two variants. In this case, the JSON that Spark generates, which is consumed by DataTypeUtils.schemaFromSparkJson, only has the third variant.
Nevertheless, I decided to handle all 3 variants purely as a defensive measure. It may be redundant, but it doesn't hurt.
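To illustrate (a sketch with a hypothetical column name; the shape matches what Spark's DataType.json produces for a struct field):

scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala> StructType(Seq(StructField("price", DecimalType(9, 2)))).json
res0: String = {"type":"struct","fields":[{"name":"price","type":"decimal(9,2)","nullable":true,"metadata":{}}]}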
scala> def f(name: String): (Int, Int) = {
| if (name == "decimal") {
| (10, 0)
| } else {
| val suffix = name.substring(7)
| require(suffix.startsWith("(") && suffix.endsWith(")"),
| name + " is not of the form decimal(<precision>,<scale>)")
| val parts = suffix.substring(1, suffix.length - 1).split(",")
| if (parts.length == 1) {
| (parts(0).trim.toInt, 0)
| } else {
| (parts(0).trim.toInt, parts(1).trim.toInt)
| }
| }
| }
f: (name: String)(Int, Int)
scala> f("decimal")
res0: (Int, Int) = (10,0)
scala> f("decimal(7)")
res1: (Int, Int) = (7,0)
scala> f("decimal(9, 2)")
res2: (Int, Int) = (9,2)
scala> f("decimal_type")
java.lang.IllegalArgumentException: requirement failed: decimal_type is not of the form decimal(<precision>,<scale>)
at scala.Predef$.require(Predef.scala:224)
at f(<console>:28)
... 49 elided
I feel that it is overkill to write a unit test just for that block of code. The above suffices.
@mgaido91 @jerryshao can you please review?
The change seems fine to me, just a minor style comment. Could you please add tests for all the possible cases, though? Thanks.
Added a couple more cases to the integration test.
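Roughly along these lines (a hypothetical sketch rather than the actual test code; it assumes a JDBC connection conn to the Thrift server):

val rs = conn.createStatement().executeQuery("SELECT CAST(1.5 AS DECIMAL(9, 2))")
val md = rs.getMetaData
// With the qualifiers encoded, the driver reports the declared precision and scale.
assert(md.getPrecision(1) == 9)
assert(md.getScale(1) == 2)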
LGTM
@mgaido91 can you please merge this (since you have already approved it)?
What changes were proposed in this pull request?
When an org.apache.livy.thriftserver.session.DataType.DECIMAL is converted to an org.apache.hive.service.rpc.thrift.TTypeDesc for sending a Thrift response to a client request for result set metadata, the TTypeDesc contains a TPrimitiveTypeEntry(TTypeId.DECIMAL_TYPE) without TTypeQualifiers (which are needed to capture the precision and scale).
With this change, we include the qualifiers in the TPrimitiveTypeEntry. We use both the name and the DataType of a field type to construct the TTypeDesc. We are able to do this without changing the existing internal representation for data types, because we can obtain the precision and scale from the name of the decimal type.
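A minimal sketch of the idea (using Hive's generated Thrift classes; the helper name and structure here are illustrative, not the actual patch):

import java.util.{HashMap => JHashMap}
import org.apache.hive.service.rpc.thrift._

// Sketch: build a TTypeDesc for decimal, attaching precision/scale qualifiers.
def decimalTypeDesc(precision: Int, scale: Int): TTypeDesc = {
  val qualifiers = new JHashMap[String, TTypeQualifierValue]()
  qualifiers.put(TCLIServiceConstants.PRECISION, TTypeQualifierValue.i32Value(precision))
  qualifiers.put(TCLIServiceConstants.SCALE, TTypeQualifierValue.i32Value(scale))

  val primitiveEntry = new TPrimitiveTypeEntry(TTypeId.DECIMAL_TYPE)
  primitiveEntry.setTypeQualifiers(new TTypeQualifiers(qualifiers))

  val typeDesc = new TTypeDesc()
  typeDesc.addToTypes(TTypeEntry.primitiveEntry(primitiveEntry))
  typeDesc
}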
How was this patch tested?
Used beeline to connect to the Thrift server and did a select from a table with a column of decimal type.
Also extended an existing integration test.