Allows HDFS to correctly navigate to correct location of split file. #51

jzgithub1 wants to merge 1 commit into apache:master
Conversation
```diff
-    static String workDir = "tmp/bulkWork";
-    static String inputDir = "bulk";
+    static String workDir = "./tmp/bulkWork";
+    static String inputDir = "./bulk";
```
In combination with my branch for accumulo-1052, this validates that my changes to the RangePartitioner for HDFS caching will work as expected. Without the "./" in front of the workDir variable, Hadoop looks for the fragment created in RangePartitioner.addSplit in hdfs://localhost:8020/user/jzeiberg/tmp/{cutfilename} (I am doing this from memory, so I'm pretty sure that's where it was looking) instead of the local working directory where the link gets created. The BulkImportExample class still does not set the input directory correctly as far as HDFS is concerned; I am working on that now, but that is out of scope for what we are trying to get working with the RangePartitioner.
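As background on why a bare relative path ends up under /user/jzeiberg: a scheme-less path is resolved against the default filesystem's working directory, which for HDFS defaults to /user/&lt;name&gt;. A minimal JDK-only sketch of that resolution, using plain RFC 3986 URI semantics (the hostname and user directory are taken from the comment above; this is an illustration, not Hadoop's actual Path code):

```java
import java.net.URI;

public class HdfsPathResolution {
    public static void main(String[] args) {
        // For HDFS, a relative (scheme-less) path is resolved against the
        // filesystem's working directory, which defaults to /user/<name>.
        URI workingDir = URI.create("hdfs://localhost:8020/user/jzeiberg/");

        // A bare relative path such as "tmp/bulkWork/splits.txt" therefore
        // resolves into the HDFS home directory, not the local task directory.
        URI resolved = workingDir.resolve("tmp/bulkWork/splits.txt");
        System.out.println(resolved);
        // hdfs://localhost:8020/user/jzeiberg/tmp/bulkWork/splits.txt
    }
}
```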
|
I confirmed that adding the "./" did allow this example to work locally, and the RangePartitioner correctly created the HDFS fragment in the working directory of the test. This did show, however, that this example has some issues in utilizing HDFS properly, which is why it errored out originally. I am not sure whether it is just a pathing issue or not.
|
I am pushing a better pull request that runs a lot better than the present bulk import test, as it at least performs the map reduce correctly; that is to say, with my version of the new RangePartitioner.
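For context, the core of a range partitioner like the one discussed here is a binary search over sorted cut points read from the splits file. A minimal, self-contained sketch (the class and method names are illustrative, not Accumulo's actual RangePartitioner, whose cut points come from the cached cut file):

```java
import java.util.Arrays;

// Illustrative sketch of cut-point partitioning. In the real RangePartitioner
// the sorted cut points are loaded from the distributed-cache splits file.
public class CutPointPartitioner {
    private final String[] cutPoints; // must be sorted ascending

    public CutPointPartitioner(String[] sortedCutPoints) {
        this.cutPoints = sortedCutPoints;
    }

    // Returns a partition in [0, cutPoints.length]: keys <= cutPoints[i]
    // go to partition i; keys greater than every cut point go to the last one.
    public int getPartition(String key) {
        int index = Arrays.binarySearch(cutPoints, key);
        // binarySearch returns (-(insertionPoint) - 1) when the key is absent
        return index < 0 ? -(index + 1) : index;
    }

    public static void main(String[] args) {
        CutPointPartitioner p = new CutPointPartitioner(new String[] {"g", "n", "t"});
        System.out.println(p.getPartition("a")); // 0
        System.out.println(p.getPartition("n")); // 1
        System.out.println(p.getPartition("z")); // 3
    }
}
```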
|
This is not needed. I was running the classes in the debugger in IntelliJ and not in YARN, so it ran differently.
In order to use the new HDFS caching code in the fix for Accumulo-1052, the workDir variable in the BulkImportTest class needs to be defined explicitly as a path relative to the working directory; otherwise, later in the map reduce job, HDFS assumes the fragment link is located in the user's Linux home directory, which is not where the test is executing. Here is some debug output to verify the link creation:
```
[main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input files to process : 1
[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local936074918_0001
[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Executing with tokens: []
[main] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-jzeiberg/mapred/local/1559593685834/splits.txt <- /home/jzeiberg/github/my_accumulo_examples/org.apache.accumulo.hadoop.mapreduce.partition.RangePartitioner.cutFile
[main] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized file:/home/jzeiberg/github/my_accumulo_examples/tmp/bulkWork/splits.txt as file:/tmp/hadoop-jzeiberg/mapred/local/1559593685834/splits.txt
```
The link to the splits.txt file that is created is named org.apache.accumulo.hadoop.mapreduce.partition.RangePartitioner.cutFile. That link now actually gets used in RangePartitioner.getCutPoints.
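The symlink name above comes from the URI fragment on the cache-file URI: in Hadoop's distributed cache, the part after `#` becomes the name of the symlink created in the task's working directory, pointing at the localized copy of the file. A small JDK-only illustration of how such a URI splits apart (the HDFS path here is an assumption for the example; the fragment is the link name from the log above):

```java
import java.net.URI;

public class CacheFileFragment {
    public static void main(String[] args) {
        // A distributed-cache URI of the form <path>#<linkName>: Hadoop
        // creates a symlink named <linkName> in the task's working directory
        // pointing at the localized copy of <path>.
        URI cacheFile = URI.create(
            "hdfs://localhost:8020/user/jzeiberg/tmp/bulkWork/splits.txt"
            + "#org.apache.accumulo.hadoop.mapreduce.partition.RangePartitioner.cutFile");

        System.out.println(cacheFile.getPath());     // the file to localize
        System.out.println(cacheFile.getFragment()); // the symlink name from the log
    }
}
```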