Allows HDFS to correctly navigate to correct location of split file. #51

jzgithub1 wants to merge 1 commit into apache:master
Conversation
```diff
-    static String workDir = "tmp/bulkWork";
-    static String inputDir = "bulk";
+    static String workDir = "./tmp/bulkWork";
+    static String inputDir = "./bulk";
```
In combination with my branch for accumulo-1052, this validates that my changes to the RangePartitioner for HDFS caching will work as expected. Without the "./" in front of the workDir variable, Hadoop looks for the fragment created in RangePartitioner.addSplit in hdfs://localhost:8020/user/jzeiberg/tmp/{cutfilename} (I am doing this from memory, so I'm pretty sure that's where it was looking) instead of the local working directory where the link gets created. The BulkImportExample class still does not set the input directory correctly as far as HDFS is concerned; I am working on that now, but that is out of scope for what we are trying to get working with the RangePartitioner.
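As background on why a bare relative path ends up under /user/jzeiberg: a scheme-less path is resolved against the default filesystem's working directory, which for HDFS defaults to /user/&lt;name&gt;. A minimal JDK-only sketch of that resolution, using plain RFC 3986 URI semantics (the hostname and user directory are taken from the comment above; this is an illustration, not Hadoop's actual Path code):

```java
import java.net.URI;

public class HdfsPathResolution {
    public static void main(String[] args) {
        // For HDFS, a relative (scheme-less) path is resolved against the
        // filesystem's working directory, which defaults to /user/<name>.
        URI workingDir = URI.create("hdfs://localhost:8020/user/jzeiberg/");

        // A bare relative path such as "tmp/bulkWork/splits.txt" therefore
        // resolves into the HDFS home directory, not the local task directory.
        URI resolved = workingDir.resolve("tmp/bulkWork/splits.txt");
        System.out.println(resolved);
        // hdfs://localhost:8020/user/jzeiberg/tmp/bulkWork/splits.txt
    }
}
```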
|
I confirmed that adding the "./" did allow this example to work locally, and the RangePartitioner correctly created the HDFS fragment in the working directory of the test. This did show, however, that this example has some issues in utilizing HDFS properly, which is why it errored out originally. I am not sure whether it is just a pathing issue or not.
|
I am pushing a better pull request that runs a lot better than the present bulk import test, as it at least performs the map reduce correctly; that is to say, with my version of the new RangePartitioner.
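For context, the core of a range partitioner like the one discussed here is a binary search over sorted cut points read from the splits file. A minimal, self-contained sketch (the class and method names are illustrative, not Accumulo's actual RangePartitioner, whose cut points come from the cached cut file):

```java
import java.util.Arrays;

// Illustrative sketch of cut-point partitioning. In the real RangePartitioner
// the sorted cut points are loaded from the distributed-cache splits file.
public class CutPointPartitioner {
    private final String[] cutPoints; // must be sorted ascending

    public CutPointPartitioner(String[] sortedCutPoints) {
        this.cutPoints = sortedCutPoints;
    }

    // Returns a partition in [0, cutPoints.length]: keys <= cutPoints[i]
    // go to partition i; keys greater than every cut point go to the last one.
    public int getPartition(String key) {
        int index = Arrays.binarySearch(cutPoints, key);
        // binarySearch returns (-(insertionPoint) - 1) when the key is absent
        return index < 0 ? -(index + 1) : index;
    }

    public static void main(String[] args) {
        CutPointPartitioner p = new CutPointPartitioner(new String[] {"g", "n", "t"});
        System.out.println(p.getPartition("a")); // 0
        System.out.println(p.getPartition("n")); // 1
        System.out.println(p.getPartition("z")); // 3
    }
}
```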
|
This is not needed. I was running the classes in the debugger in IntelliJ and not in YARN, so it ran differently.
In order to use the new HDFS caching code in the fix for Accumulo-1052, the workDir variable in the BulkImportTest class needs to be defined explicitly as a path relative to the working directory; otherwise, later in the map reduce job, HDFS assumes the fragment link is located in the user's Linux home directory, which is not where the test is executing. Here is some debug output to verify the link creation:
```
[main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input files to process : 1
[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local936074918_0001
[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Executing with tokens: []
[main] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-jzeiberg/mapred/local/1559593685834/splits.txt <- /home/jzeiberg/github/my_accumulo_examples/org.apache.accumulo.hadoop.mapreduce.partition.RangePartitioner.cutFile
[main] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized file:/home/jzeiberg/github/my_accumulo_examples/tmp/bulkWork/splits.txt as file:/tmp/hadoop-jzeiberg/mapred/local/1559593685834/splits.txt
```
The link to the splits.txt file that is created is named org.apache.accumulo.hadoop.mapreduce.partition.RangePartitioner.cutFile. That link now actually gets used in RangePartitioner.getCutPoints.
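The symlink name above comes from the URI fragment on the cache-file URI: in Hadoop's distributed cache, the part after `#` becomes the name of the symlink created in the task's working directory, pointing at the localized copy of the file. A small JDK-only illustration of how such a URI splits apart (the HDFS path here is an assumption for the example; the fragment is the link name from the log above):

```java
import java.net.URI;

public class CacheFileFragment {
    public static void main(String[] args) {
        // A distributed-cache URI of the form <path>#<linkName>: Hadoop
        // creates a symlink named <linkName> in the task's working directory
        // pointing at the localized copy of <path>.
        URI cacheFile = URI.create(
            "hdfs://localhost:8020/user/jzeiberg/tmp/bulkWork/splits.txt"
            + "#org.apache.accumulo.hadoop.mapreduce.partition.RangePartitioner.cutFile");

        System.out.println(cacheFile.getPath());     // the file to localize
        System.out.println(cacheFile.getFragment()); // the symlink name from the log
    }
}
```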