[SPARK-32546][SQL][3.0] Get table names directly from Hive tables #29377

Closed

Commits on Aug 6, 2020

  1. [SPARK-32546][SQL] Get table names directly from Hive tables

    Get table names directly from a sequence of Hive tables in `HiveClientImpl.listTablesByType()` by skipping the conversion of Hive tables to Catalog tables.
    
    A Hive metastore can be shared across many clients. A client can create tables using a SerDe which is not available on other clients, for instance `ROW FORMAT SERDE "com.ibm.spss.hive.serde2.xml.XmlSerDe"`. In the current implementation, other clients get the following exception while getting views:
    ```
    java.lang.RuntimeException: MetaException(message:java.lang.ClassNotFoundException Class com.ibm.spss.hive.serde2.xml.XmlSerDe not found)
    ```
    when `com.ibm.spss.hive.serde2.xml.XmlSerDe` is not available.
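    
    The failure mode and the fix can be illustrated with a minimal, self-contained sketch. It is not the code in `HiveClientImpl`: `RawTable`, `CatalogLike`, `convert`, `namesViaConversion` and `namesDirectly` are hypothetical names, and `Class.forName` only stands in for the SerDe lookup that the Hive-to-Catalog conversion triggers.
    ```scala
    // Hypothetical, simplified model; none of these names are Spark or Hive APIs.
    object ListTableNamesSketch {
      // Stand-in for a raw Hive metastore table record.
      final case class RawTable(name: String, tableType: String, serdeClass: String)
      // Stand-in for the converted Catalog table.
      final case class CatalogLike(name: String, tableType: String)

      // Stand-in for the conversion: it needs the table's SerDe class,
      // so the class must be loadable on this client.
      def convert(t: RawTable): CatalogLike = {
        Class.forName(t.serdeClass) // throws ClassNotFoundException if the SerDe is missing
        CatalogLike(t.name, t.tableType)
      }

      // Convert every table first, then filter by type:
      // a single unloadable SerDe fails the whole listing.
      def namesViaConversion(tables: Seq[RawTable], tableType: String): Seq[String] =
        tables.map(convert).filter(_.tableType == tableType).map(_.name)

      // Filter the raw tables and read their names directly: no SerDe is loaded.
      def namesDirectly(tables: Seq[RawTable], tableType: String): Seq[String] =
        tables.filter(_.tableType == tableType).map(_.name)

      def main(args: Array[String]): Unit = {
        val tables = Seq(
          // "org.example.MissingSerDe" is a made-up class that is assumed not to exist.
          RawTable("json_table2", "MANAGED_TABLE", "org.example.MissingSerDe"),
          RawTable("v1", "VIRTUAL_VIEW", "java.lang.String"))
        println(scala.util.Try(namesViaConversion(tables, "VIRTUAL_VIEW")).isFailure)
        println(namesDirectly(tables, "VIRTUAL_VIEW"))
      }
    }
    ```
    In this sketch, listing views through the conversion fails even though the table with the missing SerDe is not a view, while the direct path returns the view name.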
    
    This introduces a user-facing change: for example, `SHOW VIEWS` returns a list of views instead of throwing an exception.
    
    - Tested by existing test suites, for example:
    ```
    $ build/sbt -Phive-2.3 "test:testOnly org.apache.spark.sql.hive.client.VersionsSuite"
    ```
    - And manually:
    
    1. Build Spark with Hive 1.2: `./build/sbt package -Phive-1.2 -Phive -Dhadoop.version=2.8.5`
    
    2. Run spark-shell with a custom Hive SerDe. For instance, download [json-serde-1.3.8-jar-with-dependencies.jar](https://github.com/cdamak/Twitter-Hive/blob/master/json-serde-1.3.8-jar-with-dependencies.jar) from https://github.com/cdamak/Twitter-Hive and pass it via `--jars`:
    ```
    $ ./bin/spark-shell --jars ../Downloads/json-serde-1.3.8-jar-with-dependencies.jar
    ```
    
    3. Create a Hive table using this SerDe:
    ```scala
    scala> :paste
    // Entering paste mode (ctrl-D to finish)
    
    sql(s"""
      |CREATE TABLE json_table2(page_id INT NOT NULL)
      |ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
      |""".stripMargin)
    
    // Exiting paste mode, now interpreting.
    res0: org.apache.spark.sql.DataFrame = []
    
    scala> sql("SHOW TABLES").show
    +--------+-----------+-----------+
    |database|  tableName|isTemporary|
    +--------+-----------+-----------+
    | default|json_table2|      false|
    +--------+-----------+-----------+
    
    scala> sql("SHOW VIEWS").show
    +---------+--------+-----------+
    |namespace|viewName|isTemporary|
    +---------+--------+-----------+
    +---------+--------+-----------+
    ```
    
    4. Quit the current `spark-shell` and restart it without the jar:
    ```
    $ ./bin/spark-shell
    ```
    
    5. Show views. Without the fix, this throws an exception:
    ```scala
    scala> sql("SHOW VIEWS").show
    20/08/06 10:53:36 ERROR log: error in initSerDe: java.lang.ClassNotFoundException Class org.openx.data.jsonserde.JsonSerDe not found
    java.lang.ClassNotFoundException: Class org.openx.data.jsonserde.JsonSerDe not found
    	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2273)
    	at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:385)
    	at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:276)
    	at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:258)
    	at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:605)
    ```
    
    After the fix:
    ```scala
    scala> sql("SHOW VIEWS").show
    +---------+--------+-----------+
    |namespace|viewName|isTemporary|
    +---------+--------+-----------+
    +---------+--------+-----------+
    ```
    
    Closes apache#29363 from MaxGekk/fix-listTablesByType-for-views.
    
    Authored-by: Max Gekk <max.gekk@gmail.com>
    Signed-off-by: Wenchen Fan <wenchen@databricks.com>
    (cherry picked from commit dc96f2f)
    Signed-off-by: Max Gekk <max.gekk@gmail.com>
    MaxGekk committed Aug 6, 2020