LocalDataSegmentPusher: Fix for Hadoop + relative paths. #1761

Merged: 1 commit, Sep 22, 2015

@gianm gianm commented Sep 22, 2015

This makes getPathForHadoop work with local-mode Hadoop when storageDirectory is a relative local path (for example, simply "segments" if you want to store segments in segments/ under your current working directory).

@@ -54,7 +54,7 @@ public LocalDataSegmentPusher(
   @Override
   public String getPathForHadoop(String dataSource)
   {
-    return String.format("file://%s/%s", config.getStorageDirectory(), dataSource);
+    return String.format("file://%s/%s", config.getStorageDirectory().getAbsoluteFile(), dataSource);

Is it at all possible to use standard URI-building facilities here? Either Hadoop FS path building or the Java File and Path constructs?

I was avoiding the Hadoop classes since this isn't a Hadoop-related module. Could probably use the Java stuff a bit more though, let me see.

@gianm gianm force-pushed the local-path-for-hadoop branch 2 times, most recently from 6f9b990 to fc9f9b7 Compare September 22, 2015 02:37
@gianm gianm closed this Sep 22, 2015
@gianm gianm reopened this Sep 22, 2015

gianm commented Sep 22, 2015

@drcrallen .toURI().toString() works too; I've updated the patch. It generates different uris ("file:/x/y/z" rather than "file:///x/y/z") but they still work with hadoop.
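
For readers following along, here is a minimal sketch of the two approaches discussed so far. It is not the merged patch: the class name LocalPathForHadoopSketch, both method names, and the "wikipedia" data source are made up for illustration, and the exact form of the updated code is an assumption based on the comment above.

```java
import java.io.File;

// Hypothetical sketch; none of these names come from the actual patch.
public class LocalPathForHadoopSketch {

    // Mirrors the original line in the diff: the File is formatted as given,
    // so a relative directory such as "segments" yields "file://segments/<dataSource>".
    // A URI parser reads "segments" in that string as the authority, not as a path.
    static String formatted(File storageDirectory, String dataSource) {
        return String.format("file://%s/%s", storageDirectory, dataSource);
    }

    // Assumed shape of the toURI() variant: File.toURI() resolves the path
    // against the current working directory before building the URI.
    static String viaToUri(File storageDirectory, String dataSource) {
        return new File(storageDirectory, dataSource).toURI().toString();
    }

    public static void main(String[] args) {
        File relative = new File("segments");
        System.out.println(formatted(relative, "wikipedia"));
        System.out.println(viaToUri(relative, "wikipedia"));
    }
}
```

On a Unix-like system the second line printed starts with "file:/" followed by the absolute working directory, which is the single-slash form mentioned above.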

@himanshug

"It generates different uris ("file:/x/y/z" rather than "file:///x/y/z") but they still work with hadoop."
Is that a Hadoop bug :) or is that really supposed to work?

gianm commented Sep 22, 2015

Do you mean, is it a bug that "file:/blah/blah/blah" works?

I don't think it's a bug. That's what you get when you do a toURI() on a File in Java, so I think it makes sense that it should work in Hadoop. It also appears to be OK by RFC 3986, in that "file:/blah" is a URI with scheme "file", no authority, and path "/blah". "file:///blah" is the same thing with an empty-string authority instead of no authority, which should be interpreted the same way. At least in my reading of the thing.

      URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

      hier-part   = "//" authority path-abempty
                  / path-absolute
                  / path-rootless
                  / path-empty

      path          = path-abempty    ; begins with "/" or is empty
                    / path-absolute   ; begins with "/" but not "//"
                    / path-noscheme   ; begins with a non-colon segment
                    / path-rootless   ; begins with a segment
                    / path-empty      ; zero characters

and

   If the URI scheme defines a default for host, then that default
   applies when the host subcomponent is undefined or when the
   registered name is empty (zero length).  For example, the "file" URI
   scheme is defined so that no authority, an empty host, and
   "localhost" all mean the end-user's machine, whereas the "http"
   scheme considers a missing authority or empty host invalid.
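
One way to check this reading is to parse both forms with java.net.URI and compare the components. This is only a sketch: the class name FileUriFormsSketch and the describe helper are hypothetical, and it shows how Java's parser treats the two strings, not how Hadoop does.

```java
import java.net.URI;

// Hypothetical helper for comparing the two "file" URI spellings.
public class FileUriFormsSketch {

    // Summarizes the components java.net.URI extracts from the given text.
    static String describe(String text) {
        URI uri = URI.create(text);
        return "scheme=" + uri.getScheme()
            + ", authority=" + uri.getAuthority()
            + ", path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // No authority component at all.
        System.out.println(describe("file:/blah"));
        // Empty authority between "//" and the path.
        System.out.println(describe("file:///blah"));
    }
}
```

Both calls should report the same scheme and path, with Java treating the empty authority as undefined, which matches the interpretation above.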

@himanshug

LGTM
OK, sounds like it's not a Hadoop bug and is supposed to work.

@gianm gianm closed this Sep 22, 2015
@gianm gianm reopened this Sep 22, 2015
xvrl added a commit that referenced this pull request Sep 22, 2015
LocalDataSegmentPusher: Fix for Hadoop + relative paths.
@xvrl xvrl merged commit 35caa75 into apache:master Sep 22, 2015
@gianm gianm deleted the local-path-for-hadoop branch September 23, 2022 19:26

4 participants