[SPARK-32546][SQL] Get table names directly from Hive tables #29363
Conversation
```diff
@@ -759,15 +759,17 @@ private[hive] class HiveClientImpl(
       dbName: String,
       pattern: String,
       tableType: CatalogTableType): Seq[String] = withHiveState {
+    val hiveTableType = toHiveTableType(tableType)
```
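For readers without the full diff, here is a sketch of the post-change shape of the method (names such as `shim.getTablesByType` and `getRawTablesByName` are inferred from the surrounding discussion, and details may differ from the exact patch). The point is that table names are returned directly, without converting each Hive table to a `CatalogTable`:

```scala
override def listTablesByType(
    dbName: String,
    pattern: String,
    tableType: CatalogTableType): Seq[String] = withHiveState {
  val hiveTableType = toHiveTableType(tableType)
  try {
    // Hive 2.3+ can filter by table type on the metastore side.
    shim.getTablesByType(client, dbName, pattern, hiveTableType)
  } catch {
    case _: UnsupportedOperationException =>
      // Older Hive: fetch raw Hive tables and filter locally; only the
      // name and type are read, so no SerDe class is ever instantiated.
      val tableNames = client.getTablesByPattern(dbName, pattern).asScala
      getRawTablesByName(dbName, tableNames)
        .filter(_.getTableType == hiveTableType)
        .map(_.getTableName)
  }
}
```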
+1
LGTM pending Jenkins
What do you mean by the following, @MaxGekk? The existing test suite has already been passing without this PR.
### How was this patch tested?
By existing test suites like:
```
$ build/sbt -Phive-2.3 "test:testOnly org.apache.spark.sql.hive.client.VersionsSuite"
```
In that section, could you provide the manual reproduction steps described in your previous section?
@dongjoon-hyun I mean that all the lines modified by me are covered by the existing tests. Imagine a function which should increase an integer by one:

```scala
def plusOne(i: Int): Int = {
  downloadPage("http://www.blablabla.com")
  i + 1
}
```

If we remove the unnecessary `downloadPage(...)` call, the existing tests still pass. The same holds in our situation.
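To make the argument concrete, a hypothetical test in the spirit of that example (the values are illustrative, not from the PR): it passes whether or not the unnecessary `downloadPage` call is present, which is the sense in which the existing suites already cover the modified lines.

```scala
// Hypothetical: the assertion does not depend on the removed call,
// so it keeps passing after the cleanup.
assert(plusOne(41) == 42)
```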
Test build #127099 has finished for PR 29363 at commit
Why don't you put your comment into the PR description? The "How was this patch tested?" section is designed for that. In addition, please note that what I asked was the following.
I agree with @dongjoon-hyun's point here. It's best to describe how to test so people can just read and follow.
Looks good to me except for @dongjoon-hyun's point about the PR description.
The fix LGTM. This PR is mainly an improvement that skips the unnecessary table conversion, but it also fixes the SerDe class loading issue. I agree with @dongjoon-hyun; let's mention it in the PR description.
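For context, a rough sketch of the pre-change path this comment refers to (simplified; the helper names stand in for the real conversion code): converting every matched table to a `CatalogTable` forces Hive to instantiate the table's SerDe, which is exactly where the `ClassNotFoundException` in the stack trace below originates (`Table.getCols` calls `getDeserializer`).

```scala
// Pre-change shape (sketch): each matched table is fully converted
// just to read its name, and the conversion loads the SerDe class.
def listTablesByTypeOld(
    dbName: String,
    pattern: String,
    tableType: CatalogTableType): Seq[String] = {
  listTables(dbName, pattern)
    .map(name => getTable(dbName, name)) // triggers SerDe class loading
    .filter(_.tableType == tableType)
    .map(_.identifier.table)
}
```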
@cloud-fan @HyukjinKwon @dongjoon-hyun I have updated the PR description. Please take a look at this PR one more time.
thanks, merging to master!
FYI, branch-3.0 has the same issue. Should I backport it there?
### What changes were proposed in this pull request?

Get table names directly from a sequence of Hive tables in `HiveClientImpl.listTablesByType()` by skipping conversion of Hive tables to Catalog tables.

### Why are the changes needed?

A Hive metastore can be shared across many clients. A client can create tables using a SerDe which is not available on other clients, for instance `ROW FORMAT SERDE "com.ibm.spss.hive.serde2.xml.XmlSerDe"`. In the current implementation, other clients get the following exception while getting views:
```
java.lang.RuntimeException: MetaException(message:java.lang.ClassNotFoundException Class com.ibm.spss.hive.serde2.xml.XmlSerDe not found)
```
when `com.ibm.spss.hive.serde2.xml.XmlSerDe` is not available.

### Does this PR introduce _any_ user-facing change?

Yes. For example, `SHOW VIEWS` returns a list of views instead of throwing an exception.

### How was this patch tested?

- By existing test suites like:
  ```
  $ build/sbt -Phive-2.3 "test:testOnly org.apache.spark.sql.hive.client.VersionsSuite"
  ```
- And manually:
  1. Build Spark with Hive 1.2: `./build/sbt package -Phive-1.2 -Phive -Dhadoop.version=2.8.5`
  2. Run spark-shell with a custom Hive SerDe, for instance download [json-serde-1.3.8-jar-with-dependencies.jar](https://github.com/cdamak/Twitter-Hive/blob/master/json-serde-1.3.8-jar-with-dependencies.jar) from https://github.com/cdamak/Twitter-Hive:
     ```
     $ ./bin/spark-shell --jars ../Downloads/json-serde-1.3.8-jar-with-dependencies.jar
     ```
  3. Create a Hive table using this SerDe:
     ```scala
     scala> :paste
     // Entering paste mode (ctrl-D to finish)

     sql(s"""
       |CREATE TABLE json_table2(page_id INT NOT NULL)
       |ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
       |""".stripMargin)

     // Exiting paste mode, now interpreting.
     res0: org.apache.spark.sql.DataFrame = []

     scala> sql("SHOW TABLES").show
     +--------+-----------+-----------+
     |database|  tableName|isTemporary|
     +--------+-----------+-----------+
     | default|json_table2|      false|
     +--------+-----------+-----------+

     scala> sql("SHOW VIEWS").show
     +---------+--------+-----------+
     |namespace|viewName|isTemporary|
     +---------+--------+-----------+
     +---------+--------+-----------+
     ```
  4. Quit from the current `spark-shell` and run it without jars:
     ```
     $ ./bin/spark-shell
     ```
  5. Show views. Without the fix, it throws the exception:
     ```scala
     scala> sql("SHOW VIEWS").show
     20/08/06 10:53:36 ERROR log: error in initSerDe: java.lang.ClassNotFoundException Class org.openx.data.jsonserde.JsonSerDe not found
     java.lang.ClassNotFoundException: Class org.openx.data.jsonserde.JsonSerDe not found
       at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2273)
       at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:385)
       at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:276)
       at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:258)
       at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:605)
     ```
     After the fix:
     ```scala
     scala> sql("SHOW VIEWS").show
     +---------+--------+-----------+
     |namespace|viewName|isTemporary|
     +---------+--------+-----------+
     +---------+--------+-----------+
     ```

Closes apache#29363 from MaxGekk/fix-listTablesByType-for-views.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit dc96f2f)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Here is a backport for 3.0 #29377
…entImpl.listTablesByType`

### What changes were proposed in this pull request?

Explicitly convert `tableNames` to `Seq` in `HiveClientImpl.listTablesByType`, as was done by c28a6fa#diff-6fd847124f8eae45ba2de1cf7d6296feR769.

### Why are the changes needed?

See PR #29111: the conversion is needed to compile with Scala 2.13. The change was accidentally discarded by #29363.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

By compiling with Scala 2.13.

Closes #29379 from MaxGekk/fix-listTablesByType-for-views-followup.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
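For background, a small sketch of why the explicit `.toSeq` matters on Scala 2.13 (the method below is illustrative, not Spark's exact code): `asScala` over a Java list yields a `mutable.Buffer`, and on 2.13 `scala.Seq` is an alias for `immutable.Seq`, so the buffer no longer satisfies a `Seq[String]` return type without an explicit conversion.

```scala
import scala.collection.JavaConverters._  // deprecated on 2.13, but still compiles

// Illustrative only: mirrors the shape of the fix.
def tableNames(fromHive: java.util.List[String]): Seq[String] =
  fromHive.asScala.toSeq  // Buffer -> immutable Seq; required on Scala 2.13
```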
Thank you, @MaxGekk and all. +1, late LGTM.
### What changes were proposed in this pull request?

Get table names directly from a sequence of Hive tables in `HiveClientImpl.listTablesByType()` by skipping conversion of Hive tables to Catalog tables.

### Why are the changes needed?

A Hive metastore can be shared across many clients. A client can create tables using a SerDe which is not available on other clients, for instance `ROW FORMAT SERDE "com.ibm.spss.hive.serde2.xml.XmlSerDe"`. In the current implementation, other clients get the following exception while getting views:
```
java.lang.RuntimeException: MetaException(message:java.lang.ClassNotFoundException Class com.ibm.spss.hive.serde2.xml.XmlSerDe not found)
```
when `com.ibm.spss.hive.serde2.xml.XmlSerDe` is not available.

### Does this PR introduce _any_ user-facing change?

Yes. For example, `SHOW VIEWS` returns a list of views instead of throwing an exception.

### How was this patch tested?

- By existing test suites like:
  ```
  $ build/sbt -Phive-2.3 "test:testOnly org.apache.spark.sql.hive.client.VersionsSuite"
  ```
- And manually:
  1. Build Spark with Hive 1.2: `./build/sbt package -Phive-1.2 -Phive -Dhadoop.version=2.8.5`
  2. Run spark-shell with a custom Hive SerDe, for instance download [json-serde-1.3.8-jar-with-dependencies.jar](https://github.com/cdamak/Twitter-Hive/blob/master/json-serde-1.3.8-jar-with-dependencies.jar) from https://github.com/cdamak/Twitter-Hive:
     ```
     $ ./bin/spark-shell --jars ../Downloads/json-serde-1.3.8-jar-with-dependencies.jar
     ```
  3. Create a Hive table using this SerDe:
     ```scala
     scala> sql(s"""
       |CREATE TABLE json_table2(page_id INT NOT NULL)
       |ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
       |""".stripMargin)
     res0: org.apache.spark.sql.DataFrame = []
     ```
  4. Quit from the current `spark-shell` and run it without jars:
     ```
     $ ./bin/spark-shell
     ```
  5. Show views. Without the fix, `SHOW VIEWS` throws `java.lang.ClassNotFoundException: Class org.openx.data.jsonserde.JsonSerDe not found`. After the fix:
     ```scala
     scala> sql("SHOW VIEWS").show
     +---------+--------+-----------+
     |namespace|viewName|isTemporary|
     +---------+--------+-----------+
     +---------+--------+-----------+
     ```