Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-19260] Spaces or "%20" in path parameter are not correctly handled with… #16614

Closed
wants to merge 4 commits into from

Conversation

zuotingbing
Copy link

@zuotingbing zuotingbing commented Jan 17, 2017

JIRA Issue: https://issues.apache.org/jira/browse/SPARK-19260

What changes were proposed in this pull request?

  1. “spark.history.fs.logDirectory” supports with space character and “%20” characters.
  2. As usually, if the run classpath includes hdfs-site.xml and core-site.xml files, the supplied path eg."/test" which does not contain a scheme should be taken as a HDFS path rather than a local path since the path parameter is a Hadoop dir.

How was this patch tested?

Update Unit Test and take some manual tests

local:
.sbin/start-history-server.sh "file:/a b"
.sbin/start-history-server.sh "/abc%20c" (without hdfs-site.xml,core-site.xml)
.sbin/start-history-server.sh "/a b" (without hdfs-site.xml,core-site.xml)
.sbin/start-history-server.sh "/a b/a bc%20c" (without hdfs-site.xml,core-site.xml)

hdfs:
.sbin/start-history-server.sh "hdfs:/namenode:9000/a b"
.sbin/start-history-server.sh "/a b" (with hdfs-site.xml,core-site.xml)
.sbin/start-history-server.sh "/a b/a bc%20c" (with hdfs-site.xml,core-site.xml)

.map { d => Utils.resolveURI(d).toString }
.getOrElse(DEFAULT_LOG_DIR)

private val logDir = URLDecoder.decode(logURIString, "UTF-8")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this is either a local path, in which case it shouldn't be URI-escaped, or it's a URI, in which case it can be parsed with java.net.URI right? I think we take the latter approach in most other places in the code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right. I think the logic in Utils:resolveURI(path: String) seems not correct. When the path is "hdfs://namenode:9000/a b", "new URI(path)" will throws URISyntaxException and return a URI with "file" scheme instead of “hdfs” scheme .
To avoid this maybe we can use Path.getFileSystem to get the right filesystem. Thank you.

@zuotingbing zuotingbing changed the title [SPARK-19260] Spaces in path parameter are not correctly handled with… [SPARK-19260] Spaces or "%20" in path parameter are not correctly handled with… Jan 22, 2017
@SparkQA
Copy link

SparkQA commented Jan 22, 2017

Test build #3543 has finished for PR 16614 at commit ebca3ed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 24, 2017

Test build #3549 has finished for PR 16614 at commit 23834a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -47,7 +47,7 @@ class FsHistoryProviderSuite extends SparkFunSuite with BeforeAndAfter with Matc
private var testDir: File = null

before {
testDir = Utils.createTempDir()
testDir = Utils.createTempDir(namePrefix = s"a b%20c+d")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, because this affects the whole suite, I wonder if it's more sensible to test this behavior once in a specific test case, rather than test the behavior everywhere. It works, and I don't have a big problem with it, just looks a little surprising to read.

@srowen
Copy link
Member

srowen commented Jan 30, 2017

Ping @zuotingbing

@srowen
Copy link
Member

srowen commented Feb 4, 2017

@zuotingbing see my last comment. I can fix this separately if you aren't able to follow up

@zuotingbing
Copy link
Author

@srowen Happy Chinese New Year !
Sorry for the late reply. It sounds more sensible to test this behavior everywhere since all the test cases used the directory, right? Thank you for your patience and guidance.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it's a little funny to me to not isolate the test for this particular corner case but it's not unreasonable.

@zuotingbing
Copy link
Author

zuotingbing commented Feb 7, 2017

It would be great if you could merge this to master and close this PR.
Thanks in advance for your help.

@srowen
Copy link
Member

srowen commented Feb 7, 2017

Merged to master

@asfgit asfgit closed this in 8fd178d Feb 7, 2017
@zuotingbing zuotingbing deleted the SPARK-19260 branch February 9, 2017 03:34
@zuotingbing
Copy link
Author

Thanks!

cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
…dled with…

JIRA Issue: https://issues.apache.org/jira/browse/SPARK-19260

## What changes were proposed in this pull request?

1. “spark.history.fs.logDirectory” supports with space character and “%20” characters.
2. As usually, if the run classpath includes hdfs-site.xml and core-site.xml files, the supplied path eg."/test" which does not contain a scheme should be taken as a HDFS path rather than a local path since the path parameter is a Hadoop dir.

## How was this patch tested?
Update Unit Test and take some manual tests

local:
.sbin/start-history-server.sh "file:/a b"
.sbin/start-history-server.sh "/abc%20c" (without hdfs-site.xml,core-site.xml)
.sbin/start-history-server.sh "/a b" (without hdfs-site.xml,core-site.xml)
.sbin/start-history-server.sh "/a b/a bc%20c" (without hdfs-site.xml,core-site.xml)

hdfs:
.sbin/start-history-server.sh "hdfs:/namenode:9000/a b"
.sbin/start-history-server.sh "/a b" (with hdfs-site.xml,core-site.xml)
.sbin/start-history-server.sh "/a b/a bc%20c" (with hdfs-site.xml,core-site.xml)

Author: zuotingbing <zuo.tingbing9@zte.com.cn>

Closes apache#16614 from zuotingbing/SPARK-19260.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants