Improve the performance of getDistinctPartitionReadFileFormats for HiveTableScanExecTransformer #11797

@beliefer

Description

The current implementation of getDistinctPartitionReadFileFormats has several performance problems:

  • It calls the expensive HiveClientImpl.fromHivePartition() on every partition.

  • It traverses the partitions twice, once with exists() and once with map().

  • It has no caching, so the same format conversion is recomputed for partitions that share a format.
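
One possible direction, shown as a minimal sketch rather than the actual Gluten code: walk the partitions once and memoize the conversion by input format class name. `PartitionMeta`, `FileFormat`, `toFileFormat`, and `distinctFormats` are hypothetical stand-ins introduced for illustration; the real method works with Hive partition objects, and keying the cache on the input format class name is an assumption.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the partition metadata needed to decide the format.
final case class PartitionMeta(inputFormatClass: String)

// Hypothetical stand-in for the read file format type.
sealed trait FileFormat
case object Parquet extends FileFormat
case object Orc extends FileFormat
case object Text extends FileFormat
case object Unknown extends FileFormat

object DistinctFormats {
  // Counts how often the conversion really runs, to make the caching visible.
  var conversions = 0

  // Stand-in for the expensive per-partition conversion.
  def toFileFormat(inputFormatClass: String): FileFormat = {
    conversions += 1
    inputFormatClass match {
      case c if c.contains("Parquet") => Parquet
      case c if c.contains("Orc") => Orc
      case c if c.contains("Text") => Text
      case _ => Unknown
    }
  }

  // Single traversal: each distinct input format class is converted at most once.
  def distinctFormats(partitions: Seq[PartitionMeta]): Set[FileFormat] = {
    val cache = mutable.HashMap.empty[String, FileFormat]
    val result = mutable.HashSet.empty[FileFormat]
    partitions.foreach { p =>
      // getOrElseUpdate evaluates toFileFormat only on a cache miss.
      result += cache.getOrElseUpdate(p.inputFormatClass, toFileFormat(p.inputFormatClass))
    }
    result.toSet
  }
}
```

With this shape, the cost of the conversion scales with the number of distinct formats instead of the number of partitions, which matters for tables with thousands of partitions that all use the same format.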

Gluten version

main branch

Labels

enhancement (New feature or request)
