Skip to content

Commit

Permalink
[SPARK-32546][SQL] Get table names directly from Hive tables
Browse files Browse the repository at this point in the history
Get table names directly from a sequence of Hive tables in `HiveClientImpl.listTablesByType()` by skipping conversions Hive tables to Catalog tables.

A Hive metastore can be shared across many clients. A client can create tables using a SerDe which is not available on other clients, for instance `ROW FORMAT SERDE "com.ibm.spss.hive.serde2.xml.XmlSerDe"`. In the current implementation, other clients get the following exception while getting views:
```
java.lang.RuntimeException: MetaException(message:java.lang.ClassNotFoundException Class com.ibm.spss.hive.serde2.xml.XmlSerDe not found)
```
when `com.ibm.spss.hive.serde2.xml.XmlSerDe` is not available.

Yes. For example, `SHOW VIEWS` returns a list of views instead of throwing an exception.

- By existing test suites like:
```
$ build/sbt -Phive-2.3 "test:testOnly org.apache.spark.sql.hive.client.VersionsSuite"
```
- And manually:

1. Build Spark with Hive 1.2: `./build/sbt package -Phive-1.2 -Phive -Dhadoop.version=2.8.5`

2. Run spark-shell with a custom Hive SerDe, for instance download [json-serde-1.3.8-jar-with-dependencies.jar](https://github.com/cdamak/Twitter-Hive/blob/master/json-serde-1.3.8-jar-with-dependencies.jar) from https://github.com/cdamak/Twitter-Hive:
```
$ ./bin/spark-shell --jars ../Downloads/json-serde-1.3.8-jar-with-dependencies.jar
```

3. Create a Hive table using this SerDe:
```scala
scala> :paste
// Entering paste mode (ctrl-D to finish)

sql(s"""
  |CREATE TABLE json_table2(page_id INT NOT NULL)
  |ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
  |""".stripMargin)

// Exiting paste mode, now interpreting.
res0: org.apache.spark.sql.DataFrame = []

scala> sql("SHOW TABLES").show
+--------+-----------+-----------+
|database|  tableName|isTemporary|
+--------+-----------+-----------+
| default|json_table2|      false|
+--------+-----------+-----------+

scala> sql("SHOW VIEWS").show
+---------+--------+-----------+
|namespace|viewName|isTemporary|
+---------+--------+-----------+
+---------+--------+-----------+
```

4. Quit from the current `spark-shell` and run it without jars:
```
$ ./bin/spark-shell
```

5. Show views. Without the fix, it throws the exception:
```scala
scala> sql("SHOW VIEWS").show
20/08/06 10:53:36 ERROR log: error in initSerDe: java.lang.ClassNotFoundException Class org.openx.data.jsonserde.JsonSerDe not found
java.lang.ClassNotFoundException: Class org.openx.data.jsonserde.JsonSerDe not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2273)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:385)
	at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:276)
	at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:258)
	at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:605)
```

After the fix:
```scala
scala> sql("SHOW VIEWS").show
+---------+--------+-----------+
|namespace|viewName|isTemporary|
+---------+--------+-----------+
+---------+--------+-----------+
```

Closes #29363 from MaxGekk/fix-listTablesByType-for-views.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit dc96f2f)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
  • Loading branch information
MaxGekk committed Aug 6, 2020
1 parent ab5034f commit 9f3bda9
Showing 1 changed file with 5 additions and 3 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -759,15 +759,17 @@ private[hive] class HiveClientImpl(
dbName: String,
pattern: String,
tableType: CatalogTableType): Seq[String] = withHiveState {
val hiveTableType = toHiveTableType(tableType)
try {
// Try with Hive API getTablesByType first, it's supported from Hive 2.3+.
shim.getTablesByType(client, dbName, pattern, toHiveTableType(tableType))
shim.getTablesByType(client, dbName, pattern, hiveTableType)
} catch {
case _: UnsupportedOperationException =>
// Fallback to filter logic if getTablesByType not supported.
val tableNames = client.getTablesByPattern(dbName, pattern).asScala
val tables = getTablesByName(dbName, tableNames).filter(_.tableType == tableType)
tables.map(_.identifier.table)
getRawTablesByName(dbName, tableNames)
.filter(_.getTableType == hiveTableType)
.map(_.getTableName)
}
}

Expand Down

0 comments on commit 9f3bda9

Please sign in to comment.