[SPARK-23963][SQL] Properly handle large number of columns in query on text-based Hive table

## What changes were proposed in this pull request?

TableReader would get disproportionately slower as the number of columns in the query increased.

I fixed the way TableReader looks up metadata for each column in the row. Previously, it looked up this data in linked lists, accessing each list by an index (the column number), which is a linear-time operation. Now it looks up this data in arrays, where indexing by column number is constant time.
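To make the complexity difference concrete, here is a small standalone Scala sketch (the collection size and names are illustrative, not taken from the patch) comparing indexed access on a linked `List` with an `Array`:

```scala
object IndexingCost {
  def main(args: Array[String]): Unit = {
    val numCols = 10000
    // Simulate per-column metadata stored in a linked list vs. an array.
    val asList: List[Int] = List.tabulate(numCols)(i => i)
    val asArray: Array[Int] = asList.toArray

    def time(label: String)(body: => Unit): Unit = {
      val start = System.nanoTime()
      body
      println(f"$label: ${(System.nanoTime() - start) / 1e6}%.1f ms")
    }

    // One simulated "row": touch every column's metadata by ordinal.
    // list(i) walks i nodes from the head, so the whole pass is O(n^2);
    // array(i) is a constant-time offset, so the pass is O(n).
    time("List indexing") {
      var sink = 0L
      for (i <- 0 until numCols) sink += asList(i)
    }
    time("Array indexing") {
      var sink = 0L
      for (i <- 0 until numCols) sink += asArray(i)
    }
  }
}
```

With 10,000 columns the list pass does on the order of 5×10⁷ node hops per row while the array pass does 10⁴ loads, which mirrors the "disproportionately slower" behavior described above.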

## How was this patch tested?

- Manual testing
- All sbt unit tests
- Python SQL tests

Author: Bruce Robbins <bersprockets@gmail.com>

Closes #21043 from bersprockets/tabreadfix.
bersprockets authored and gatorsmile committed Apr 18, 2018
1 parent a902323 commit 041aec4
Showing 1 changed file with 1 addition and 1 deletion.
```diff
@@ -381,7 +381,7 @@ private[hive] object HadoopTableReader extends HiveInspectors with Logging {
 
     val (fieldRefs, fieldOrdinals) = nonPartitionKeyAttrs.map { case (attr, ordinal) =>
       soi.getStructFieldRef(attr.name) -> ordinal
-    }.unzip
+    }.toArray.unzip
 
     /**
      * Builds specific unwrappers ahead of time according to object inspector
```
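Why a single `.toArray` fixes it: `unzip` on a `Seq` returns sequences of the same (possibly linked) kind, while `unzip` on an `Array` returns two arrays, and both `fieldRefs` and `fieldOrdinals` are indexed once per column for every row downstream. A minimal sketch of that access pattern (simplified stand-in types, not the actual Spark code):

```scala
// Sketch of a per-row fill loop (hypothetical signature, not Spark's).
// Each row indexes fieldRefs(i) and fieldOrdinals(i) once per column, so
// O(1) Array indexing vs. O(i) linked-Seq indexing changes the per-row
// cost from O(n) to O(n^2) in the number of columns n.
def fillRow(
    rawColumns: Array[AnyRef],   // one deserialized row
    fieldRefs: Array[Int],       // stand-in for Hive StructField refs
    fieldOrdinals: Array[Int],   // target positions in the output row
    mutableRow: Array[AnyRef]): Unit = {
  var i = 0
  while (i < fieldRefs.length) {
    mutableRow(fieldOrdinals(i)) = rawColumns(fieldRefs(i))
    i += 1
  }
}
```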
