Review DistributedCache usage (pretty sure it's very broken) #1052

@ctubbsii

The best explanation (if it even works) of how to use Hadoop's distributed cache is at https://stackoverflow.com/a/26421057/196405

However, I have not tested it.

Our code does not do anything like this. We tell the distributed cache to add a cache file, but then we just loop through the list of cached URIs, pointlessly, and reach directly out to the DFS to read the file.

All of the Hadoop APIs for reading the local cache files are (rightfully) deprecated, because there's no way to map them to the files that were added to the cache. The APIs for adding files to the cache are not deprecated, but they also aren't documented.

We should test the above solution. If it works, we should use it; otherwise, we should completely remove any use of the distributed cache from our MapReduce code.
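
As a starting point for that test, here is a minimal sketch of the naming rule such a solution would likely rely on. It assumes (unverified here, and not confirmed against the linked answer) that Hadoop links each cached file into the task's working directory under the URI fragment, or under the last path segment when the URI has no fragment. The class `CacheLinkNames`, the method `localName`, and the `hdfs://namenode/...` URIs are hypothetical names for illustration. Only `java.net.URI` is used; the Hadoop calls this would sit between (`Job.addCacheFile(URI)` at submission, `JobContext.getCacheFiles()` in the task) appear only in comments.

```java
import java.net.URI;

public class CacheLinkNames {

    /**
     * Returns the name under which a cached URI is ASSUMED to appear in the
     * task's working directory: the fragment if one is present, otherwise the
     * last segment of the path. Expects a hierarchical URI such as hdfs://...
     */
    static String localName(URI cacheUri) {
        String fragment = cacheUri.getFragment();
        if (fragment != null) {
            return fragment;
        }
        String path = cacheUri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // At job submission this URI would be passed to job.addCacheFile(uri).
        URI withAlias = URI.create("hdfs://namenode/data/splits.txt#splits");
        URI noAlias = URI.create("hdfs://namenode/data/splits.txt");

        // In the task, each URI from context.getCacheFiles() would be mapped
        // to a local file with new java.io.File(localName(uri)).
        System.out.println(localName(withAlias)); // prints "splits"
        System.out.println(localName(noAlias));   // prints "splits.txt"
    }
}
```

If this rule holds in a real job, the task can open the local copy by name instead of going back to the DFS, which is the behavior we would want to confirm before keeping the distributed cache.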

Labels: blocker, bug