The best explanation I have found of how to use Hadoop's distributed cache (assuming it even works) is this Stack Overflow answer: https://stackoverflow.com/a/26421057/196405
However, I have not tested it.
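For reference, the approach in that answer amounts to roughly the following. This is an untested sketch assuming Hadoop 2.x; the HDFS path, the `#lookup` symlink name, and the mapper class are placeholders, not our actual code:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheExample {
    // Driver side: register the file with the distributed cache.
    // The "#lookup" fragment asks Hadoop to create a symlink named
    // "lookup" in each task's working directory.
    static void configure(Job job) throws Exception {
        job.addCacheFile(new URI("/shared/lookup.txt#lookup"));
    }

    public static class ExampleMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        private final Map<String, String> lookup = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException {
            // Task side: read the node-local cached copy through the
            // symlink; no HDFS round trip is needed here.
            try (BufferedReader in = new BufferedReader(new FileReader("lookup"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] parts = line.split("\t", 2);
                    lookup.put(parts[0], parts[1]);
                }
            }
        }
    }
}
```

The key point of the answer is the `#name` fragment on the cached URI, which is what makes the localized file addressable by a known name from inside the task.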
Our code does nothing like this. We tell the distributed cache to add a cache file, but then we just loop through the list of cached URIs pointlessly and reach directly out to the DFS to read the file.
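As I read it, the current anti-pattern is roughly the following. This is an illustrative reconstruction of what the issue describes, not a copy of our code; the path and names are made up:

```java
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CurrentAntiPattern
        extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        // The loop inspects the cached URIs but never uses the
        // localized copies...
        for (URI uri : context.getCacheFiles()) {
            // ...nothing useful happens with `uri` here.
        }
        // ...because we then open the original path on the DFS
        // directly, which defeats the purpose of the cache.
        FileSystem fs = FileSystem.get(context.getConfiguration());
        try (FSDataInputStream in = fs.open(new Path("/shared/lookup.txt"))) {
            // parse the file over the network, once per task
        }
    }
}
```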
All of the Hadoop APIs for reading the local cache files are (rightfully) deprecated, because there is no reliable way to map them back to the files that were added to the cache. The APIs for adding files to the cache, however, are not deprecated, but they are also not documented.
We should test the above solution; if it works, use it. Otherwise, completely remove any use of the distributed cache from our MapReduce code.