
[SPARK-33094][SQL][2.4] Make ORC format propagate Hadoop config from DS options to underlying HDFS file system #29987

Closed
MaxGekk wants to merge 2 commits into apache:branch-2.4 from MaxGekk:orc-option-propagation-2.4



MaxGekk (Member) commented Oct 9, 2020

What changes were proposed in this pull request?

Propagate ORC options to Hadoop configs in Hive `OrcFileFormat` and in the regular ORC datasource.

Why are the changes needed?

There is a bug: when running

```scala
spark.read.format("orc").options(conf).load(path)
```

the underlying file system does not receive the `conf` options.
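To make the fix concrete, here is a minimal, dependency-free Scala model of what propagating DS options to the Hadoop config means: the configuration handed to the file system is the session's Hadoop configuration with the data source options layered on top. Everything here is an assumption for illustration: a plain `Map[String, String]` stands in for Hadoop's `Configuration`, `confWithOptions` is a hypothetical helper (not a Spark API), and skipping `path`/`paths` keys and null values is an assumed policy.

```scala
// Dependency-free model (hypothetical): Map[String, String] stands in for a
// Hadoop Configuration, and confWithOptions is not a Spark API.
object OptionPropagationSketch {
  // Keys naming *what* to read rather than how to configure the file system.
  // Assumption: they are not copied into the configuration.
  private val nonConfigKeys = Set("path", "paths")

  /** Copy of `base` with every non-null data source option layered on top. */
  def confWithOptions(
      base: Map[String, String],
      options: Map[String, String]): Map[String, String] =
    options.foldLeft(base) { case (conf, (k, v)) =>
      if (v != null && !nonConfigKeys.contains(k)) conf.updated(k, v) else conf
    }

  def main(args: Array[String]): Unit = {
    val sessionConf = Map(
      "fs.defaultFS" -> "hdfs://nn:8020",
      "io.file.buffer.size" -> "4096")
    // Options as they might be passed via spark.read.options(...)
    val dsOptions = Map("io.file.buffer.size" -> "65536", "path" -> "/data/orc")

    val merged = confWithOptions(sessionConf, dsOptions)
    println(merged("io.file.buffer.size")) // 65536 (option overrides session)
    println(merged("fs.defaultFS"))        // hdfs://nn:8020 (inherited)
    println(merged.contains("path"))       // false
  }
}
```

In this model, the buggy behavior corresponds to handing the file system `sessionConf` alone, so a per-read override never reaches it; letting options win over session values is what makes per-read settings take effect.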

Does this PR introduce any user-facing change?

Yes. Options passed to the ORC datasource via `.options(...)` now reach the underlying file system.

How was this patch tested?

Added a unit test to `OrcSourceSuite`.
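The description only says a unit test was added; it does not show it. One common way to catch this kind of bug, sketched below purely as a hypothetical model (not the test that was added), is a fake file system that refuses to work unless a specific option reaches it, so any reader that drops the options fails. `FakeFileSystem`, the `ds_option` key, and both reader functions are illustrative names, not Spark or Hadoop classes.

```scala
// Hypothetical model of a regression test for option propagation.
// No Spark/Hadoop classes are used; "ds_option" is an illustrative key.
object PropagationTestSketch {
  // Stand-in for a file system that refuses to open files unless the
  // configuration it was created with carries a specific key/value.
  final class FakeFileSystem(conf: Map[String, String]) {
    def open(path: String): Either[String, String] =
      conf.get("ds_option") match {
        case Some("value") => Right(s"opened $path")
        case _             => Left(s"ds_option missing when opening $path")
      }
  }

  // Buggy reader: builds the file system from the session config only,
  // silently ignoring the data source options it was given.
  def readIgnoringOptions(
      session: Map[String, String],
      opts: Map[String, String],
      path: String): Either[String, String] =
    new FakeFileSystem(session).open(path)

  // Fixed reader: layers the data source options over the session config.
  def readWithOptions(
      session: Map[String, String],
      opts: Map[String, String],
      path: String): Either[String, String] =
    new FakeFileSystem(session ++ opts).open(path)

  def main(args: Array[String]): Unit = {
    val session = Map("fs.defaultFS" -> "file:///")
    val opts = Map("ds_option" -> "value")
    println(readIgnoringOptions(session, opts, "/tmp/t.orc")) // Left(ds_option missing when opening /tmp/t.orc)
    println(readWithOptions(session, opts, "/tmp/t.orc"))     // Right(opened /tmp/t.orc)
  }
}
```

Making the fake file system fail loudly, rather than falling back to defaults, is what turns a dropped option into a test failure.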

…DS options to underlying HDFS file system

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit c5f6af9)
Signed-off-by: Max Gekk <max.gekk@gmail.com>

Closes apache#29985 from MaxGekk/orc-option-propagation-3.0.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit 9892b3e)
Signed-off-by: Max Gekk <max.gekk@gmail.com>

SparkQA commented Oct 9, 2020

Test build #129586 has finished for PR 29987 at commit d7b8467.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Oct 9, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34191/


SparkQA commented Oct 9, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34191/


SparkQA commented Oct 9, 2020

Test build #129588 has finished for PR 29987 at commit 0f9f25e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

MaxGekk (Member, Author) commented Oct 9, 2020

@dongjoon-hyun @HyukjinKwon Please review this PR.

HyukjinKwon (Member) commented:

Merged to branch-2.4.

HyukjinKwon pushed a commit that referenced this pull request Oct 10, 2020
…DS options to underlying HDFS file system

Closes #29987 from MaxGekk/orc-option-propagation-2.4.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
MaxGekk deleted the orc-option-propagation-2.4 branch on December 11, 2020 20:28

3 participants