[SPARK-5498][SQL]fix query exception when partition schema does not match table schema #4289
Conversation
ok to test
/cc @chenghao-intel
Test build #26479 has finished for PR 4289 at commit
Test build #26481 has finished for PR 4289 at commit
Test build #26484 has finished for PR 4289 at commit
Test build #26489 has finished for PR 4289 at commit
Test build #26592 has finished for PR 4289 at commit
Test build #26602 has finished for PR 4289 at commit
Test build #26611 has finished for PR 4289 at commit
val tmpDir = Files.createTempDir()
sql(s"CREATE TABLE table_with_partition(key int, value string) PARTITIONED BY (ds string) LOCATION '${tmpDir.toURI.toString}'")
sql("INSERT OVERWRITE TABLE table_with_partition PARTITION (ds='1') SELECT key, value FROM testData")
sql("ALTER TABLE table_with_partition CHANGE COLUMN key key BIGINT")
I just checked the Hive documentation. It says:
The CASCADE|RESTRICT clause is available in Hive 0.15.0. ALTER TABLE CHANGE COLUMN with CASCADE command changes the columns of a table's metadata, and cascades the same change to all the partition metadata. RESTRICT is the default, limiting column change only to table metadata.
I guess that in Hive 0.13.1, when the table schema is changed via ALTER TABLE, only the table metadata is updated. Can you double check whether the above query works for Hive 0.13.1?
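For reference, the Hive 0.15.0+ syntax described in the quoted documentation would look like this (a sketch based only on that quote, reusing the table from the test case above):

```sql
-- RESTRICT (the default): only the table metadata changes;
-- existing partitions keep the old column type in their metadata.
ALTER TABLE table_with_partition CHANGE COLUMN key key BIGINT RESTRICT;

-- CASCADE: the same change is also applied to all partition metadata.
ALTER TABLE table_with_partition CHANGE COLUMN key key BIGINT CASCADE;
```

On older Hive versions (0.13.1 and earlier) only the RESTRICT behavior exists, which is why a partition's schema can end up differing from the table schema.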
I checked this query in Hive 0.11 and Hive 0.12 and it works. I will check it in Hive 0.13.1 later.
Sorry for the late reply, @jeanlyn!
Thanks @chenghao-intel for the review and suggestions! We want to replace some Hive SQL with Spark SQL in our production environment, so I used some SQL from our production environment (running on Hive 0.12) to test Spark SQL and found this issue. I think making Spark SQL more compatible will help adoption. I will test the points @chenghao-intel listed in both Hive and Spark SQL.
Oh, @jeanlyn, I've also tested that in Hive 0.13; it seems to work.
 * @return An `Iterator[Row]` transformed from `iterator`
 */
def fillObject(
    iterator: Iterator[Writable],
    deserializer: Deserializer,
    nonPartitionKeyAttrs: Seq[(Attribute, Int)],
    mutableRow: MutableRow,
    convertdeserializer: Option[Deserializer] = None): Iterator[Row] = {
Instead of passing the deserializer, how about taking the converter as the argument? By the way, I think Hive provides the IdentityConverter, which means we can make the parameter an `ObjectInspectorConverters.Converter`; it doesn't need to be wrapped in an Option.
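The idea behind this suggestion — default to an identity converter so the parameter never needs to be optional — can be sketched independently of the Hive API. This Python sketch is purely illustrative; the function and type names here are hypothetical, not from the patch:

```python
from typing import Callable, Iterator, List

# A "converter" maps one deserialized field value to the target schema's type.
Converter = Callable[[object], object]

def identity_converter(value: object) -> object:
    """Default converter: partition and table schemas match, pass through."""
    return value

def fill_rows(raw_rows: Iterator[List[object]],
              converter: Converter = identity_converter) -> Iterator[List[object]]:
    """Apply the converter to every field of every row.

    Because the default is an identity converter, callers whose partition
    schema already matches the table schema pass nothing extra -- no
    Option/None wrapper is needed.
    """
    for row in raw_rows:
        yield [converter(v) for v in row]

# A widening converter for a partition stored as int while the table says bigint.
def int_to_long(value: object) -> object:
    return int(value)

rows = list(fill_rows(iter([[1, "a"], [2, "b"]])))
```

The design point is that the identity converter makes the "schemas already match" case a non-special case, which is what removes the need for the `Option` in the signature.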
But the val `soi` also needs a converted deserializer when the schema doesn't match.
OK, you're right, forget about my comment above. :)
Change the `convertdeserializer` to `outputStructObjectInspector`?
Variable names should be in camel case: `convertdeserializer` => `convertDeserializer`? Or change it to a better name?
In general I think the change looks reasonable to me, and we'd better use the Hive
Test build #26820 has finished for PR 4289 at commit
Hi @chenghao-intel, @marmbrus, any suggestions?
Test build #27010 has finished for PR 4289 at commit
Test build #27066 has finished for PR 4289 at commit
Thanks @chenghao-intel for the review and suggestions! I took some of your advice to simplify the code.
Test build #27267 has finished for PR 4289 at commit
Retest this please
Hi @marmbrus, @chenghao-intel, I have no idea why
@jeanlyn The HiveThriftServer unit test was disabled before #4486 was merged. From the log it's hard to tell the reason for the failure; can you try it locally?
@chenghao-intel, I passed all the unit tests locally. But I think the thrift-server unit test is unstable; it depends on the state of the machine, and when the machine is busy it may time out during the test.
/cc @marmbrus
test this please
Test build #28532 has finished for PR 4289 at commit
Test build #28561 has finished for PR 4289 at commit
Updated, @marmbrus @chenghao-intel. We have tested this patch in our environment over the past few days. Are there any more problems with this patch?
@@ -244,6 +244,11 @@ private[hive] object HiveShim {
    }
  }

  def getConvertedOI(inputOI: ObjectInspector,
      outputOI: ObjectInspector): ObjectInspector = {
Nit: wrapped parameters should all be on a new line indented 4 chars.
Minor comments, otherwise LGTM.
Test build #28768 has finished for PR 4289 at commit
Hi @marmbrus, I have updated the code as you mentioned above.
Thanks! Merged to master. |
In Hive, the schema of a partition may differ from the table schema. When we use Spark SQL to query data from a partition whose schema differs from the table schema, we get the exception described in the JIRA. For example:
we will get the cast exception
java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to org.apache.spark.sql.catalyst.expressions.MutableInt
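The root cause can be sketched outside Spark: a row deserialized with the partition's older schema must be converted to the table's current schema before it is written into a type-specific mutable row, otherwise a MutableInt/MutableLong-style cast error occurs. The following Python sketch is purely illustrative — the actual patch does this via Hive ObjectInspector converters, and every name below is hypothetical:

```python
# Illustrative sketch (not Spark code): a partition written with key: int
# is read under a table schema that was later altered to key: bigint.
table_schema = {"key": "bigint", "value": "string"}
partition_schema = {"key": "int", "value": "string"}

# Hypothetical stand-in for Hive's ObjectInspectorConverters.getConverter:
# returns a function converting a value from the source type to the target.
WIDENINGS = {("int", "bigint"): int}

def converter_for(src: str, dst: str):
    if src == dst:
        return lambda v: v  # identity: the two schemas already agree
    if (src, dst) in WIDENINGS:
        return WIDENINGS[(src, dst)]
    raise TypeError(f"cannot convert {src} to {dst}")

def convert_row(row: dict) -> dict:
    """Convert a row read with the partition schema to the table schema,
    so a type-specific mutable slot (think MutableLong) gets the right type."""
    return {col: converter_for(partition_schema[col], table_schema[col])(val)
            for col, val in row.items()}

converted = convert_row({"key": 238, "value": "val_238"})
```

Without the conversion step, the reader would hand an int-typed value to a slot allocated for the table's bigint column, which is the analogue of the ClassCastException above.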