hello-hadoop

Hello World Hadoop project... Eclipse + Maven + unit tests

Project files & dirs

  • /conf: config to be used for running MR on the cluster
  • /conf-local: config to be used for running MR locally
  • /hadoop-1.2.1: Hadoop binaries (recommended WITHOUT the conf/ dir, to avoid mistakes)

To set up the files in the /conf directory, you can use core-site.xml.sample and mapred-site.xml.sample as a reference.

Before running

Before running the MR jobs, you need to build the JAR and generate some sample data.

Build the jar

$ mvn clean package

Generate sample data

$ find /usr/share/doc > /tmp/MYDATA.txt
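
The data generation step above can be wrapped in a small function that fails early when the input file would be empty. This is only a sketch: the name generate_sample_data is hypothetical and not part of this project.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): write one path per line
# from a source directory into a data file, as the `find` command above does.
generate_sample_data() {
    src_dir="$1"
    out_file="$2"
    # Fail early if the source directory is missing.
    [ -d "$src_dir" ] || { echo "no such directory: $src_dir" >&2; return 1; }
    find "$src_dir" > "$out_file"
    # -s is true only if the file exists and is not empty.
    [ -s "$out_file" ]
}

# Usage (commented out so the sketch has no side effects when sourced):
# generate_sample_data /usr/share/doc /tmp/MYDATA.txt
```

Any directory with enough files works as input; /usr/share/doc is just a convenient default on most Linux systems.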

Run the MR job locally using Ant

From Eclipse, run launch-local.xml as an Ant script. By default, the launch-local task is used, which does not clean the output directory. Use the clean-and-launch task to clean the output directory before launching the MR job.

Run the MR job locally from the CLI

Run locally

$ ./hadoop-1.2.1/bin/hadoop --config conf-local jar hello-hadoop-0.0.1-SNAPSHOT.jar \
	ar.com.datatsunami.hellohadoop.Launcher file:///tmp/MYDATA.txt file:///tmp/OUTPUT
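
When launching from the CLI, nothing cleans the output directory for you, and Hadoop's standard file output formats typically refuse to write into a directory that already exists. A minimal sketch of a cleanup step for local runs follows. The function names are hypothetical, and it assumes the output location is passed as a file:// URI like the one above.

```shell
#!/bin/sh
# Hypothetical helper: turn a file:// URI into a plain local path.
local_path_from_uri() {
    printf '%s\n' "${1#file://}"
}

# Hypothetical helper: remove a previous local output directory, if any.
clean_local_output() {
    out_path=$(local_path_from_uri "$1")
    # Refuse to touch an empty path or the filesystem root.
    case "$out_path" in
        ""|"/") echo "refusing to remove '$out_path'" >&2; return 1 ;;
    esac
    rm -rf -- "$out_path"
}

# Usage (commented out so the sketch has no side effects when sourced):
# clean_local_output file:///tmp/OUTPUT
# ./hadoop-1.2.1/bin/hadoop --config conf-local jar hello-hadoop-0.0.1-SNAPSHOT.jar \
#     ar.com.datatsunami.hellohadoop.Launcher file:///tmp/MYDATA.txt file:///tmp/OUTPUT
```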

Run the MR job in the cluster

Build the jar

$ mvn clean package

Copy the data to HDFS

$ ./hadoop-1.2.1/bin/hadoop --config conf fs -copyFromLocal /tmp/MYDATA.txt /

Launch!

$ ./hadoop-1.2.1/bin/hadoop --config conf jar hello-hadoop-0.0.1-SNAPSHOT.jar \
	ar.com.datatsunami.hellohadoop.Launcher /MYDATA.txt /OUTPUT

See the output directory:

$ ./hadoop-1.2.1/bin/hadoop --config conf fs -ls /OUTPUT
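
Besides listing the directory, you will probably want to read the results and, before a second run, remove the old output. The sketch below assumes the job writes the usual part-* files into /OUTPUT. The function names and the HADOOP_BIN variable are hypothetical; `fs -cat` and `fs -rmr` are the Hadoop 1.x shell commands.

```shell
#!/bin/sh
# Path to the Hadoop launcher script; override HADOOP_BIN to point elsewhere.
HADOOP_BIN="${HADOOP_BIN:-./hadoop-1.2.1/bin/hadoop}"

# Hypothetical helper: print the contents of every part file in an HDFS directory.
# The glob is quoted so that Hadoop, not the local shell, expands it.
show_job_output() {
    "$HADOOP_BIN" --config conf fs -cat "$1/part-*"
}

# Hypothetical helper: recursively remove an HDFS output directory before a re-run.
remove_job_output() {
    "$HADOOP_BIN" --config conf fs -rmr "$1"
}

# Usage (commented out so the sketch has no side effects when sourced):
# show_job_output /OUTPUT
# remove_job_output /OUTPUT
```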

Some Maven recipes

Set up the Eclipse classpath using Maven

$ mvn eclipse:eclipse

Hadoop sources for use in Eclipse (doesn't work)

Downloading the Hadoop sources or javadocs with Maven DOESN'T WORK, so you'll have to set them up in Eclipse yourself.

$ mvn dependency:sources -DincludeGroupIds=org.apache.hadoop 
(...)
[INFO] The following files were skipped:
[INFO]    org.apache.hadoop:hadoop-core:java-source:sources:1.2.0
[INFO]    org.apache.hadoop:hadoop-test:java-source:sources:1.2.0
(...)

$ mvn dependency:resolve -DincludeGroupIds=org.apache.hadoop -Dclassifier=javadoc 
(...)
[INFO] The following files have NOT been resolved:
[INFO]    org.apache.hadoop:hadoop-core:java-source:javadoc:1.2.0
[INFO]    org.apache.hadoop:hadoop-test:java-source:javadoc:1.2.0
(...)

Workaround: generate sources jar and install to local Maven repository

First, generate a jar containing the Hadoop sources, then install it into your local Maven repository:

$ mvn org.apache.maven.plugins:maven-install-plugin:2.5:install-file \
    -Dfile=hadoop-1.2.1-custom-sources.jar \
    -DgroupId=org.apache.hadoop \
    -DartifactId=hadoop-core \
    -Dversion=1.2.1 \
    -Dpackaging=jar \
    -Dclassifier=sources
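
The sources jar that the install command above expects could be generated along these lines. This is a sketch under assumptions: it supposes the Hadoop 1.2.1 distribution keeps its Java sources under src/core, src/mapred and src/hdfs, and the function name and JAR_BIN variable are hypothetical. Check the layout of your own download first.

```shell
#!/bin/sh
# The JDK's jar tool; override JAR_BIN if it is not on your PATH.
JAR_BIN="${JAR_BIN:-jar}"

# Hypothetical helper: package the source trees found under a src/ root into one jar.
build_sources_jar() {
    src_root="$1"   # e.g. hadoop-1.2.1/src (assumed layout)
    out_jar="$2"    # e.g. hadoop-1.2.1-custom-sources.jar
    # Build the jar argument list in the function's positional parameters.
    set -- cf "$out_jar"
    for module in core mapred hdfs; do
        if [ -d "$src_root/$module" ]; then
            set -- "$@" -C "$src_root/$module" .
        fi
    done
    # Only "cf <jar>" left means no source directory was found.
    [ "$#" -gt 2 ] || { echo "no sources under $src_root" >&2; return 1; }
    "$JAR_BIN" "$@"
}

# Usage (commented out so the sketch has no side effects when sourced):
# build_sources_jar hadoop-1.2.1/src hadoop-1.2.1-custom-sources.jar
```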

I also had to remove this file from my local Maven repository:

$ rm ~/.m2/repository/org/apache/hadoop/hadoop-core/1.2.1/hadoop-core-1.2.1-sources.jar-not-available
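
If you are not sure which of these "not-available" files exist in your repository, a quick search can list them before you delete anything. This is a minimal sketch; the function name is hypothetical and the default directory simply mirrors the path used above.

```shell
#!/bin/sh
# Hypothetical helper: list "*-not-available" files under a Maven repository directory.
list_unavailable_markers() {
    repo_dir="${1:-$HOME/.m2/repository/org/apache/hadoop}"
    # Nothing to list if the directory does not exist.
    [ -d "$repo_dir" ] || return 0
    find "$repo_dir" -type f -name '*-not-available'
}

# Usage (commented out so the sketch has no side effects when sourced):
# list_unavailable_markers
```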

The sources will be available to Eclipse after running:

$ mvn eclipse:eclipse

Web resources

Eclipse + Hadoop

Maven / Eclipse m2e

TODOs

License

Copyright 2013 (C) Horacio G. de Oro - hgdeoro@gmail.com

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
