[SPARK-19847][SQL] port hive read to FileFormat API #17187

Closed

cloud-fan
Contributor

What changes were proposed in this pull request?

Implement the read logic in HiveFileFormat to unify the table read path between data source tables and Hive serde tables.

The major change: each Hive partition may use a different serde, so the planner must put more information into PartitionedFile and send it to the executors.
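A minimal sketch of the idea, not the code in this patch: the planner attaches each partition's serde description to every file it plans, so the executor side can choose a deserializer per file. All names here (`SerdeInfo`, `FileWithSerde`, `HivePartitionSketch`, `PlannerSketch`) are hypothetical stand-ins and are not Spark classes.

```scala
// Hypothetical, simplified stand-ins; these are NOT Spark's actual classes.

// Describes how the rows of one partition are encoded.
final case class SerdeInfo(serdeClass: String, inputFormat: String)

// A file to read, carrying the serde of the partition that owns it.
final case class FileWithSerde(path: String, length: Long, serde: SerdeInfo)

// A partition as the planner sees it: (path, length) pairs plus its own serde.
final case class HivePartitionSketch(files: Seq[(String, Long)], serde: SerdeInfo)

object PlannerSketch {
  // Planner side: copy each partition's serde onto every file it contains,
  // so the reader of a file does not need to look the partition up again.
  def planFiles(partitions: Seq[HivePartitionSketch]): Seq[FileWithSerde] =
    partitions.flatMap { p =>
      p.files.map { case (path, len) => FileWithSerde(path, len, p.serde) }
    }
}
```

With one text partition and one ORC partition, every planned file keeps the serde of its own partition, which is the property the read path depends on.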

Two things need to be improved in the future:

  1. Because of the way we read Hive table files, we do not yet support reading part of a file, which may reduce parallelism for large files.
  2. Hive tables with a storage handler (non-file-based) still go through the old code path.
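To illustrate the first limitation, here is a rough sketch comparing how many independent read units a set of files yields when files can be split versus when each file must be read whole. It deliberately ignores how Spark packs files into partitions; `SplitSketch`, its methods, and the `maxSplitBytes` parameter are assumptions made for this example.

```scala
// Hypothetical helper for illustration only; not part of Spark.
object SplitSketch {
  // Read units when each file can be cut into chunks of at most maxSplitBytes.
  // An empty file still counts as one unit.
  def splittableUnits(fileLengths: Seq[Long], maxSplitBytes: Long): Long = {
    require(maxSplitBytes > 0, "maxSplitBytes must be positive")
    fileLengths.map { len =>
      math.max(1L, (len + maxSplitBytes - 1) / maxSplitBytes) // ceiling division
    }.sum
  }

  // Read units when a file cannot be read partially: exactly one per file.
  def wholeFileUnits(fileLengths: Seq[Long]): Long = fileLengths.size.toLong
}
```

For a 1 GiB file and a 64 MiB file with a 128 MiB split size, `splittableUnits` gives 9 while `wholeFileUnits` gives 2, so the large file is processed by a single task instead of eight.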

How was this patch tested?

Existing tests.

@cloud-fan
Contributor Author

cc @sameeragarwal @rxin @gatorsmile

@SparkQA

SparkQA commented Mar 7, 2017

Test build #74075 has started for PR 17187 at commit ad47887.

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented Mar 7, 2017

Test build #74080 has finished for PR 17187 at commit ad47887.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class PartitionDirectory(
      • case class PartitionPath(values: InternalRow, path: Path, @Nullable metadata: AnyRef = null)
      • class ReadHiveDataSource(session: SparkSession) extends Rule[LogicalPlan]
      • class HiveFileIndex(


override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
case c: CatalogRelation if DDLUtils.isHiveTable(c.tableMeta) &&
// Hive tables with storage handler with be handled in a different way, in `HiveTableScans`.
Contributor

with be -> will be

4 participants