[SPARK-22635][SQL][ORC] FileNotFoundException while reading ORC files containing special characters #19844

Closed
wants to merge 2 commits into from

Conversation

mgaido91
Contributor

What changes were proposed in this pull request?

SPARK-22146 fixed the FileNotFoundException only in the inferSchema method, i.e. only for schema inference; it did not fix the problem when the data is actually read. As a result, nearly the same exception occurs when someone tries to use the data. This PR fixes the problem there as well.
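
To illustrate why a path with special characters can be found during one step and lost in another, here is a minimal sketch. It assumes the failure comes from a percent-encoded path string being reinterpreted as a raw path, which the description above does not spell out. It uses only `java.net.URI` rather than the Hadoop or Spark classes, and the object name, helper names, and file path are hypothetical.

```scala
import java.net.URI

object PathEncodingSketch {
  // Treat `raw` as a literal file-system path. The multi-argument URI
  // constructors always percent-encode a literal '%'.
  def encodeRawPath(raw: String): String =
    new URI(null, null, raw, null).toString

  // Problematic reopen: the already-encoded string is treated as a raw path,
  // so '%' is encoded a second time and a single decode does not undo it.
  def reopenAsRawPath(encoded: String): String =
    new URI(null, null, encoded, null).getPath

  // Safer reopen: parse the encoded string as a URI, then decode the path.
  def reopenAsUri(encoded: String): String =
    new URI(encoded).getPath

  def main(args: Array[String]): Unit = {
    val original = "/tmp/orc/sp%chars/part-0.orc" // hypothetical file
    val encoded = encodeRawPath(original)
    println(encoded)                  // /tmp/orc/sp%25chars/part-0.orc
    println(reopenAsRawPath(encoded)) // /tmp/orc/sp%25chars/part-0.orc (not the original)
    println(reopenAsUri(encoded))     // /tmp/orc/sp%chars/part-0.orc
  }
}
```

If the schema-inference code and the data-reading code disagree on which of these two interpretations to use, one of them would look for a file that does not exist, which is consistent with the exception described here.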

How was this patch tested?

Enhanced the existing unit test.

@mgaido91
Contributor Author

@dongjoon-hyun @gatorsmile you helped review SPARK-22146. Could you please help review this one too? Sorry for pinging you directly.

Member

@HyukjinKwon HyukjinKwon left a comment

LGTM

@@ -59,8 +59,9 @@ class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable
      sparkSession: SparkSession,
      options: Map[String, String],
      files: Seq[FileStatus]): Option[StructType] = {
    val fileNames = files.map(_.getPath.toString)
Member

Not a big deal, just curious: why did you name it fileNames? I thought it would simply be paths.

Contributor Author

Because they are Strings, not Paths. Actually, this change is not needed; I made it only because it was easier for me while debugging. I can put it back as it was.

Member

I mean it's not the name of a file though.

Contributor Author

yes, I reverted the change, thanks.

@SparkQA

SparkQA commented Nov 29, 2017

Test build #84292 has finished for PR 19844 at commit c7e817f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.
Thank you, @mgaido91 .

@SparkQA

SparkQA commented Nov 29, 2017

Test build #84304 has finished for PR 19844 at commit 8cda8c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

ping, @cloud-fan , too.

@HyukjinKwon
Member

Merged to master.

@asfgit asfgit closed this in 932bd09 Nov 30, 2017
@mgaido91
Contributor Author

mgaido91 commented Dec 1, 2017

Thanks @HyukjinKwon, may I kindly ask you to backport this to branch-2.2 as well? Thanks.

@HyukjinKwon
Member

I think SPARK-22146 is not backported though?

@HyukjinKwon
Member

Ah, the fix version in the JIRA was wrong. Sure, I will push it there too.

asfgit pushed a commit that referenced this pull request Dec 1, 2017
… containing special characters

## What changes were proposed in this pull request?

SPARK-22146 fixed the FileNotFoundException only in the `inferSchema` method, i.e. only for schema inference; it did not fix the problem when the data is actually read. As a result, nearly the same exception occurs when someone tries to use the data. This PR fixes the problem there as well.

## How was this patch tested?

Enhanced the existing unit test.

Author: Marco Gaido <mgaido@hortonworks.com>

Closes #19844 from mgaido91/SPARK-22635.
@HyukjinKwon
Member

Merged to branch-2.2 too.

@mgaido91
Contributor Author

mgaido91 commented Dec 1, 2017

thank you @HyukjinKwon

MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018