HDFSFileFinder #2

Merged
merged 2 commits into from Jun 28, 2012

@brianmartin

A utility that returns a map from node -> (number of bytes) for a given path.

This is useful for applications that require data locality, that is, applications that want to schedule a job on the node where the majority of a file is located.
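As a rough illustration of that scheduling choice, here is a minimal sketch that picks the node holding the most bytes from a node -> bytes map. The class name `LocalityChooser`, the method `bestNode`, and the node names are all hypothetical and not part of this pull request; ties go to whichever node is seen first.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of this pull request.
class LocalityChooser {

    // Returns the node with the largest byte count, or null for an empty map.
    // On a tie, the first node encountered in iteration order is kept.
    static String bestNode(Map<String, Long> bytesPerNode) {
        String best = null;
        long bestBytes = -1L;
        for (Map.Entry<String, Long> entry : bytesPerNode.entrySet()) {
            if (entry.getValue() > bestBytes) {
                best = entry.getKey();
                bestBytes = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up node names and byte counts, for illustration only.
        Map<String, Long> bytesPerNode = new LinkedHashMap<>();
        bytesPerNode.put("node-a", 7L);
        bytesPerNode.put("node-b", 15L);
        bytesPerNode.put("node-c", 8L);
        System.out.println("Schedule on: " + bestNode(bytesPerNode));
    }
}
```

A real scheduler would likely weigh other factors as well (load, rack placement); this only shows the "most bytes wins" idea.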

Example test output for a 15-byte file with 8-byte blocks:

127.0.0.1:58283 : 7
127.0.0.1:58289 : 15
127.0.0.1:58286 : 15
127.0.0.1:58292 : 8
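These numbers are consistent with two blocks (8 bytes and 7 bytes) each replicated on three nodes: two nodes hold both blocks (15), one holds only the first (8), and one holds only the last (7). The sketch below reproduces that aggregation in plain Java. The `Block` type and the exact replica placement are assumptions made for illustration; in Hadoop the equivalent per-block lengths and hosts are available from `FileSystem.getFileBlockLocations`, though this sketch does not claim to mirror the code in the pull request.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the code from this pull request.
class BytesPerNode {

    // Stand-in for one block: its length and the nodes holding a replica.
    static final class Block {
        final long length;
        final List<String> hosts;

        Block(long length, String... hosts) {
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    // Sums, for each node, the lengths of all blocks it holds a replica of.
    static Map<String, Long> bytesPerNode(List<Block> blocks) {
        Map<String, Long> totals = new TreeMap<>();
        for (Block block : blocks) {
            for (String host : block.hosts) {
                totals.merge(host, block.length, Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Assumed placement that matches the totals shown above.
        List<Block> blocks = Arrays.asList(
            new Block(8, "127.0.0.1:58289", "127.0.0.1:58286", "127.0.0.1:58292"),
            new Block(7, "127.0.0.1:58283", "127.0.0.1:58289", "127.0.0.1:58286"));
        for (Map.Entry<String, Long> entry : bytesPerNode(blocks).entrySet()) {
            System.out.println(entry.getKey() + " : " + entry.getValue());
        }
    }
}
```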

p.s. I'm definitely open to a name change.

jwills Jun 28, 2012

The "extends Configured" bit handles the getConf/setConf stuff for you-- you just need to do:

ToolRunner.run(new Configuration(), new HDFSFileFinder(), args);

in the main method.

jwills Jun 28, 2012

Might be worth looking at FileSystem.globStatus here-- it seems like giving a glob argument on the commandline would be pretty common.

jwills added a commit that referenced this pull request Jun 28, 2012

@jwills jwills merged commit 662179b into cloudera:master Jun 28, 2012

jwills Jun 28, 2012

Thanks Brian!