
Allows HDFS to correctly navigate to correct location of split file.#51

Closed
jzgithub1 wants to merge 1 commit into apache:master from jzeiberg:fixes_hdfs_fragment

Conversation

@jzgithub1
Contributor

In order to use the new HDFS caching code in the fix for ACCUMULO-1052, the workDir variable in the BulkImportTest class needs to be defined explicitly as a path relative to the working directory. Otherwise, later in the MapReduce job, HDFS assumes the fragment link is located in the user's Linux home directory, which is not where the test is executing. Here is some debug output verifying the link creation:

[main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input files to process : 1
[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local936074918_0001
[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Executing with tokens: []
[main] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-jzeiberg/mapred/local/1559593685834/splits.txt <- /home/jzeiberg/github/my_accumulo_examples/org.apache.accumulo.hadoop.mapreduce.partition.RangePartitioner.cutFile
[main] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized file:/home/jzeiberg/github/my_accumulo_examples/tmp/bulkWork/splits.txt as file:/tmp/hadoop-jzeiberg/mapred/local/1559593685834/splits.txt

The link created for the splits.txt file is named org.apache.accumulo.hadoop.mapreduce.partition.RangePartitioner.cutFile, and that link is now what RangePartitioner.getCutPoints actually uses.
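For context, here is a minimal sketch (not the PR's actual code) of the URI-fragment mechanism behind that symlink: a distributed-cache file URI carrying a '#fragment' is localized under a symlink named after the fragment in the task's working directory. The file path below is a hypothetical stand-in for the one the test produces.

```java
import java.net.URI;

public class CacheFragmentDemo {
  public static void main(String[] args) {
    // Hypothetical cache-file URI: the part after '#' is the name of the
    // symlink that Hadoop creates in the task's working directory when it
    // localizes the file. getCutPoints can then open the file by that name.
    URI cacheFile = URI.create("file:/tmp/bulkWork/splits.txt"
        + "#org.apache.accumulo.hadoop.mapreduce.partition.RangePartitioner.cutFile");
    // The fragment is exactly the link name seen in the debug output above.
    System.out.println(cacheFile.getFragment());
  }
}
```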

Before:
static String workDir = "tmp/bulkWork";
static String inputDir = "bulk";

After:
static String workDir = "./tmp/bulkWork";
static String inputDir = "./bulk";
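A rough sketch of the resolution behavior being worked around (my paraphrase of Hadoop's path qualification, not Hadoop's actual code): an unqualified relative path is resolved against the filesystem's working directory, which for HDFS defaults to the user's home directory under /user/<username>, so "tmp/bulkWork" lands under the HDFS user home rather than the local working directory where the symlink is created.

```java
public class PathResolutionSketch {
  // Simplified stand-in for qualifying a path against a filesystem
  // working directory (assumption: for HDFS this defaults to
  // hdfs://<namenode>/user/<username>).
  static String qualify(String workingDir, String path) {
    if (path.contains("://") || path.startsWith("/")) {
      return path; // already fully qualified or absolute
    }
    return workingDir + "/" + path; // relative: resolved against workingDir
  }

  public static void main(String[] args) {
    String hdfsHome = "hdfs://localhost:8020/user/jzeiberg";
    // Without the fix, "tmp/bulkWork" resolves under the HDFS user home,
    // which is the wrong place as described in the comments below; the
    // "./" prefix in the fix keeps the path relative to the directory the
    // test is actually executing in.
    System.out.println(qualify(hdfsHome, "tmp/bulkWork"));
  }
}
```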
Member


What does this change?

@jzgithub1
Contributor Author

jzgithub1 commented Jun 4, 2019

In combination with my branch for ACCUMULO-1052, it validates that my changes to the RangePartitioner for HDFS caching will work as expected. Without the "./" in front of the outputDir variable, Hadoop looks for the fragment created in RangePartitioner.addSplit in hdfs://localhost:8020/user/jzeiberg/tmp/{cutfilename} (I am doing this from memory, so I'm pretty sure that's where it was looking) instead of in the local working directory where the link gets created.

The BulkImportExample class still does not set the input directory correctly as far as HDFS is concerned. I am working on that now, but it is out of the scope of what we are trying to get working with the RangePartitioner.

@Manno15
Contributor

Manno15 commented Jun 5, 2019

I confirmed that adding the "./" did allow this example to work locally, and the RangePartitioner correctly created the HDFS fragment in the working directory of the test.

This did show, however, that this example has some issues in utilizing HDFS properly, which is why it errored out originally. I am not sure whether it is just a pathing issue.

@jzgithub1 jzgithub1 closed this Jun 5, 2019
@jzgithub1 jzgithub1 reopened this Jun 5, 2019
@jzgithub1
Contributor Author

I am pushing a better pull request that runs much better than the present bulk import test, as it will at least perform the MapReduce correctly. That is to say, with my version of the new RangePartitioner.

@jzgithub1 jzgithub1 closed this Jun 6, 2019
@jzgithub1
Contributor Author

This is not needed. I was running the classes in the debugger in IntelliJ, not in YARN, so it ran differently.
