hello-hadoop

A Hello World Hadoop project, set up with Eclipse, Maven and unit tests.

Project files & dirs

  • /conf: config to be used for running MR on the cluster
  • /conf-local: config to be used for running MR locally
  • /hadoop-1.2.1: Hadoop binaries (recommended WITHOUT the conf/ dir, to avoid mistakes)

To set up the files in the /conf directory, use core-site.xml.sample and mapred-site.xml.sample as a reference.
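
A minimal sketch of bootstrapping the config from the sample files. It assumes the .sample files sit inside the config directory itself, which this README does not confirm; init_conf is a hypothetical helper name. Existing files are never overwritten, so it is safe to re-run.

```shell
#!/bin/sh
# Copy every *.sample file in a config directory to its real name,
# without overwriting files that already exist.
# Assumption: the samples live inside the config directory itself.
init_conf() {
    conf_dir=$1
    for sample in "$conf_dir"/*.sample; do
        # If nothing matched, the glob stays literal and does not exist.
        [ -e "$sample" ] || continue
        target=${sample%.sample}
        if [ -e "$target" ]; then
            echo "keeping existing $target"
        else
            cp "$sample" "$target"
            echo "created $target (edit it to match your cluster)"
        fi
    done
}

# Usage: init_conf conf
```

The copied files still have to be edited by hand so they point at your own cluster.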

Before running

Before running the MR jobs you need to build the JAR and generate some data.

Build the jar

$ mvn clean package

Generate sample data

$ find /usr/share/doc > /tmp/MYDATA.txt

Run the MR job locally using Ant

From Eclipse, run launch-local.xml as an Ant script. The default target, launch-local, does not clean the output directory. Use the clean-and-launch target to clean the output directory before the MR job is launched.

Run the MR job locally from CLI

Run locally

$ ./hadoop-1.2.1/bin/hadoop --config conf-local jar hello-hadoop-0.0.1-SNAPSHOT.jar \
	ar.com.datatsunami.hellohadoop.Launcher file:///tmp/MYDATA.txt file:///tmp/OUTPUT
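
Hadoop's file output formats normally refuse to start a job when the output directory already exists, so a second run of the command above is likely to fail. The sketch below is a rough CLI counterpart of the clean-and-launch Ant target. The helper names clean_local_output and run_local are hypothetical; the paths are the ones used in the command above.

```shell
#!/bin/sh
# Remove a local job output directory given as a file:// URI or a plain path.
# Refuses an empty path or "/" so a typo cannot wipe the filesystem.
clean_local_output() {
    path=${1#file://}
    case "$path" in
        ""|"/") echo "refusing to remove '$path'" >&2; return 1 ;;
    esac
    rm -rf -- "$path"
}

# Clean the output directory, then launch the job with the local config.
# $1 = input URI, $2 = output URI.
run_local() {
    clean_local_output "$2" || return 1
    ./hadoop-1.2.1/bin/hadoop --config conf-local jar hello-hadoop-0.0.1-SNAPSHOT.jar \
        ar.com.datatsunami.hellohadoop.Launcher "$1" "$2"
}

# Usage: run_local file:///tmp/MYDATA.txt file:///tmp/OUTPUT
```

After a successful run the results should be in the part-* files under /tmp/OUTPUT.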

Run the MR job in the cluster

Build the jar

$ mvn clean package

Copy the data to HDFS

$ ./hadoop-1.2.1/bin/hadoop --config conf fs -copyFromLocal /tmp/MYDATA.txt /

Launch!

$ ./hadoop-1.2.1/bin/hadoop --config conf jar hello-hadoop-0.0.1-SNAPSHOT.jar \
	ar.com.datatsunami.hellohadoop.Launcher /MYDATA.txt /OUTPUT

See the output directory:

$ ./hadoop-1.2.1/bin/hadoop --config conf fs -ls /OUTPUT
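
To re-run the job on the cluster, the /OUTPUT directory from the previous run usually has to be removed first. A small sketch using the Hadoop 1.x fs shell (fs -rmr and fs -cat); the function names and the HADOOP and CONF variables are hypothetical conveniences, not part of this project.

```shell
#!/bin/sh
# Hadoop launcher and cluster config dir; override via the environment if needed.
HADOOP=${HADOOP:-./hadoop-1.2.1/bin/hadoop}
CONF=${CONF:-conf}

# Remove the HDFS output directory left by a previous run.
# "fs -rmr" is the recursive delete of the Hadoop 1.x shell.
clean_hdfs_output() {
    "$HADOOP" --config "$CONF" fs -rmr "$1"
}

# Print the job results. The glob is quoted so that Hadoop, not the
# local shell, expands it against HDFS.
show_hdfs_output() {
    "$HADOOP" --config "$CONF" fs -cat "$1/part-*"
}

# Usage:
#   clean_hdfs_output /OUTPUT    # before launching again
#   show_hdfs_output /OUTPUT     # after the job finishes
```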

Some Maven recipes

Setup Eclipse classpath using Maven

$ mvn eclipse:eclipse

Hadoop sources for use in Eclipse (doesn't work)

Downloading the Hadoop sources or javadocs with Maven DOESN'T WORK, so you'll have to set them up in Eclipse yourself.

$ mvn dependency:sources -DincludeGroupIds=org.apache.hadoop 
(...)
[INFO] The following files were skipped:
[INFO]    org.apache.hadoop:hadoop-core:java-source:sources:1.2.0
[INFO]    org.apache.hadoop:hadoop-test:java-source:sources:1.2.0
(...)

$ mvn dependency:resolve -DincludeGroupIds=org.apache.hadoop -Dclassifier=javadoc 
(...)
[INFO] The following files have NOT been resolved:
[INFO]    org.apache.hadoop:hadoop-core:java-source:javadoc:1.2.0
[INFO]    org.apache.hadoop:hadoop-test:java-source:javadoc:1.2.0
(...)

Workaround: generate sources jar and install to local Maven repository

First, generate a jar containing the Hadoop sources, then install it into your local Maven repository:

$ mvn org.apache.maven.plugins:maven-install-plugin:2.5:install-file \
    -Dfile=hadoop-1.2.1-custom-sources.jar \
    -DgroupId=org.apache.hadoop \
    -DartifactId=hadoop-core \
    -Dversion=1.2.1 \
    -Dpackaging=jar \
    -Dclassifier=sources

I needed to remove a file from my local repository:

$ rm ~/.m2/repository/org/apache/hadoop/hadoop-core/1.2.1/hadoop-core-1.2.1-sources.jar-not-available

The sources will be available to Eclipse after running:

$ mvn eclipse:eclipse
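
The workaround above assumes that hadoop-1.2.1-custom-sources.jar already exists. One possible way to build it is sketched below. It assumes your Hadoop download ships its Java sources under src/core, src/hdfs and src/mapred, so check your copy first. make_sources_jar is a hypothetical helper; it uses the JDK jar tool when available and falls back to zip, since a jar is a zip archive.

```shell
#!/bin/sh
# Pack one or more source trees into a single sources jar.
# $1 = output jar, remaining arguments = source root directories.
make_sources_jar() {
    out=$1; shift
    # zip runs after a cd, so make the output path absolute.
    case "$out" in /*) ;; *) out="$PWD/$out" ;; esac
    rm -f "$out"
    for dir in "$@"; do
        [ -d "$dir" ] || { echo "missing source dir: $dir" >&2; return 1; }
        if command -v jar >/dev/null 2>&1; then
            # "cf" creates the jar for the first tree, "uf" adds the others.
            if [ -f "$out" ]; then mode=uf; else mode=cf; fi
            jar "$mode" "$out" -C "$dir" . || return 1
        elif command -v zip >/dev/null 2>&1; then
            (cd "$dir" && zip -qr "$out" .) || return 1
        else
            echo "neither jar nor zip found" >&2; return 1
        fi
    done
}

# Usage (the src/ layout is an assumption):
#   make_sources_jar hadoop-1.2.1-custom-sources.jar \
#       hadoop-1.2.1/src/core hadoop-1.2.1/src/hdfs hadoop-1.2.1/src/mapred
```

Each source root is added at the top level of the jar, so the package directories (org/apache/hadoop/...) end up where Eclipse expects them.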

Web resources

Eclipse + Hadoop

Maven / Eclipse m2e

TODOs

License

Copyright 2013 (C) Horacio G. de Oro - hgdeoro@gmail.com

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.