Scalding with CDH3U2 in a Maven project

amimimor edited this page May 15, 2012 · 5 revisions

Introduction

Aim

This wiki page describes a procedure for building, with Maven, an executable jar file that uses Scalding and is ready for deployment on a CDH3U2 cluster.

Hadoop Flavors and Compatibility Issues

Hadoop versions are not necessarily compatible with each other. To deploy a MapReduce job on a Hadoop cluster, you therefore have to ensure that the core Hadoop libraries used by the client code are identical to those installed on every node of the cluster. In short, client code that will be deployed as an executable jar should be built against exactly the same Hadoop jars that the cluster's nodes run.
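As a rough illustration of pinning the client to the cluster's Hadoop version, a Maven project could declare the matching Hadoop core artifact along these lines. This is a minimal sketch, not necessarily the setup this page goes on to use: it assumes the `hadoop-core` artifact at version `0.20.2-cdh3u2` is resolved from Cloudera's public Maven repository, and the `hadoop.version` property name is only an illustrative choice. If your cluster runs a different release, change the version to match it.

```xml
<!-- Fragment of a pom.xml; assumes Cloudera's public repository is reachable. -->
<properties>
  <!-- Hypothetical property name; the value must match the cluster's Hadoop version. -->
  <hadoop.version>0.20.2-cdh3u2</hadoop.version>
</properties>

<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>

<dependencies>
  <!-- Core Hadoop library, pinned to the same release the cluster nodes run. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>
```

Keeping the version in a single property makes it easier to rebuild the same project against another cluster's Hadoop release.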

Prerequisites

  • Scalding source - here we used v0.5.3
  • SBT - to build Scalding
  • Cloudera's Hadoop (CDH) - binaries are fine, e.g. hadoop-0.20.2-cdh3u2.tar.gz. Other versions work too; just use the same version your cluster uses.
  • IDE with Maven support - here I use Eclipse. You do not need an IDE if you are a Maven wizard; I am not one of those.
