Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-18989][SQL] DESC TABLE should not fail with format class not found #16388

Closed
wants to merge 1 commit into from

Conversation

cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

When we describe a table, we only wanna see the information of this table, not read it, so it's ok even if the format class is not present at the classpath.

How was this patch tested?

new regression test

@cloud-fan
Copy link
Contributor Author

cc @yhuai @rxin

@SparkQA
Copy link

SparkQA commented Dec 23, 2016

Test build #70548 has finished for PR 16388 at commit 7eda122.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

inputFormat = Option(h.getInputFormatClass).map(_.getName),
outputFormat = Option(h.getOutputFormatClass).map(_.getName),
inputFormat = Option(h.getTTable.getSd.getInputFormat),
outputFormat = Option(h.getTTable.getSd.getOutputFormat),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the impl of getInputFormatClass and getOutputFormatClass . It has extra logics if either getTTable.getSd.getInputFormat / getTTable.getSd.getOutputFormat is null.

I checked the change history. These extra logics were added in https://issues.apache.org/jira/browse/HIVE-1122, https://issues.apache.org/jira/browse/HIVE-705, and https://issues.apache.org/jira/browse/HIVE-5260.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we actually read the hive table, we still use getInputFormatClass. So this will only affect the DESC TABLE, and should be OK?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After more readings, getTTable.getSd.getInputFormat and getTTable.getSd.getOutputFormat will be null for non-native Hive tables, e.g., JDBC tables, HBase tables and Cassandra tables. See the link for more details: https://cwiki.apache.org/confluence/display/Hive/StorageHandlers

So far, this is OK. I am just afraid we might expand the usage of getTableOption in the future. Maybe at least document the restrictions?

@SparkQA
Copy link

SparkQA commented Dec 26, 2016

Test build #70581 has started for PR 16388 at commit 1f277f4.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Dec 26, 2016

Test build #70595 has finished for PR 16388 at commit 1f277f4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

LGTM

@gatorsmile
Copy link
Member

Thanks! Merging to master.

@asfgit asfgit closed this in dd724c8 Dec 26, 2016
// the class name directly. However, for non-native tables, there is no interface to get
// the format class name, so we may still throw ClassNotFound in this case.
inputFormat = Option(h.getTTable.getSd.getInputFormat).orElse {
Option(h.getStorageHandler).map(_.getInputFormatClass.getName)
Copy link
Contributor

@yhuai yhuai Dec 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it actually also fixed a bug (we did not look for storage handler)? Is it possible to have a test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it's not a bug, getInputFormatClass will look for storage handler.

cmonkey pushed a commit to cmonkey/spark that referenced this pull request Dec 29, 2016
…ound

## What changes were proposed in this pull request?

When we describe a table, we only wanna see the information of this table, not read it, so it's ok even if the format class is not present at the classpath.

## How was this patch tested?

new regression test

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#16388 from cloud-fan/hive.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…ound

## What changes were proposed in this pull request?

When we describe a table, we only wanna see the information of this table, not read it, so it's ok even if the format class is not present at the classpath.

## How was this patch tested?

new regression test

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#16388 from cloud-fan/hive.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants