Review DistributedCache usage (pretty sure it's very broken) #1052

The best explanation of how to use Hadoop's distributed cache, assuming the approach even works, is at https://stackoverflow.com/a/26421057/196405

However, I have not tested it.

Our code does nothing like this. We tell the distributed cache to add a cache file, but then we just loop through the list of cached URIs, pointlessly, and reach directly out to the DFS to read the file.

All of the Hadoop APIs for reading the local cache files are (rightfully) deprecated, because there is no way to map the local files back to the files that were added to the cache. The APIs for adding files to the cache are not deprecated, but they are not documented either.

We should test the above solution. If it works, we should use it; otherwise, we should completely remove any use of the distributed cache from our MapReduce code.
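
For anyone picking this up, here is a minimal, untested sketch of the symlink-style approach, under the assumption that a cache URI with a `#fragment` gets symlinked into the task's working directory under that fragment name (or under the file's own name when there is no fragment). `CacheLinks` and its methods are hypothetical helpers, not part of our code or of Hadoop. The only Hadoop calls involved are `Job.addCacheFile(URI)` and `JobContext.getCacheFiles()`, and they appear only in comments so the helper compiles on its own.

```java
import java.io.File;
import java.net.URI;

// Hypothetical helper: ties each cache URI to the local name it is
// ASSUMED to be linked under in the task's working directory.
final class CacheLinks {
    private CacheLinks() {}

    // Driver side: append "#linkName" so the local name is explicit.
    // Intended use (not verified here):
    //   job.addCacheFile(CacheLinks.withLinkName(dfsUri, "lookup"));
    static URI withLinkName(URI dfsUri, String linkName) {
        return URI.create(dfsUri.toString() + "#" + linkName);
    }

    // Task side: recover the local name for a URI returned by
    // context.getCacheFiles(). Uses the fragment if present,
    // otherwise the last segment of the path.
    static String linkNameOf(URI cacheUri) {
        String fragment = cacheUri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = cacheUri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Task side: a File relative to the current working directory.
    // Intended use in Mapper.setup (not verified here):
    //   for (URI u : context.getCacheFiles()) {
    //       File local = CacheLinks.localFile(u);
    //       // read 'local' instead of going back to the DFS
    //   }
    static File localFile(URI cacheUri) {
        return new File(linkNameOf(cacheUri));
    }
}
```

If the symlink assumption does not hold on our cluster, `localFile` will point at a file that does not exist, which is exactly what the test of this approach should check first.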
