This repository has been archived by the owner on Jul 9, 2021. It is now read-only.

Add ability to compile against Cloudera or Apache Hadoop.
Added more thorough compilation instructions.

From: Aaron Kimball <aaron@cloudera.com>

git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149879 13f79535-47bb-0310-9956-ffa450edef68
Andrew Bayer committed Jul 22, 2011
1 parent 0fde7df commit 22190b9
Showing 5 changed files with 212 additions and 17 deletions.
130 changes: 130 additions & 0 deletions COMPILING.txt
@@ -0,0 +1,130 @@

= Compiling

This document explains how to compile Sqoop.

== Build Dependencies

Compiling Sqoop requires the following tools:

* Apache Ant (1.7.1)
* Java JDK 1.6

Additionally, building the documentation requires these tools:

* asciidoc
* make
* python 2.5+
* xmlto
* tar
* gzip

Furthermore, Sqoop's build can be instrumented with the following:

* FindBugs (1.3.9) for code quality checks
* Cobertura (1.9.4.1) for code coverage

== The Basics

Sqoop is compiled with Ant. Type +ant -p+ to see the list of available targets.

Type +ant jar+ to compile the Java sources into jar files. Type +ant package+
to produce a fully self-hosted build, which will appear in the
+build/sqoop-(version)/+ directory.

== Testing Sqoop

Sqoop has several unit tests which can be run with +ant test+. This command
will run all the "basic" checks against an in-memory database, HSQLDB.

Sqoop also has compatibility tests that check its ability to work with
several third-party databases. To enable these tests, you will need to install
and configure the databases, and download the JDBC drivers for each one.

=== MySQL

Install MySQL server and client 5.0. Download MySQL Connector/J 5.0.8 for
JDBC. Instructions for configuring the MySQL database are in MySQLAuthTest
and DirectMySQLTest.

=== Oracle

Install Oracle XE (Express edition) 10.2.0. Instructions for configuring the
database are in OracleManagerTest. Download the ojdbc6_g jar.

=== PostgreSQL

Install PostgreSQL 8.3.9. Download the PostgreSQL 8.4 JDBC driver. Instructions
for configuring the database are in PostgresqlTest.

=== Running the Third-party Tests

After the third-party databases are installed and configured, run:

++++
ant test -Dthirdparty=true -Dsqoop.thirdparty.lib.dir=/path/to/jdbc/drivers/
++++
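
Before starting a long third-party run, it can help to confirm that the
driver directory actually contains jar files. The following is a minimal
sketch, assuming a POSIX shell. The function name +thirdparty_test_cmd+ is
hypothetical and not part of Sqoop's build; it only prints the ant command
shown above rather than running it.

```shell
# Hypothetical helper (not part of Sqoop): print the third-party test
# command only if the driver directory contains at least one .jar file.
thirdparty_test_cmd() {
  dir="$1"
  # If the glob matches nothing it stays literal, so the -f test fails.
  for jar in "$dir"/*.jar; do
    if [ -f "$jar" ]; then
      printf 'ant test -Dthirdparty=true -Dsqoop.thirdparty.lib.dir=%s\n' "$dir"
      return 0
    fi
  done
  echo "no JDBC driver jars found in $dir" >&2
  return 1
}
```

Calling +thirdparty_test_cmd /path/to/jdbc/drivers/+ prints the command when
jars are present, and returns a non-zero status with a message on stderr
otherwise.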


== Multiple Hadoop Distributions

Sqoop can be compiled against different versions of Hadoop. Both the svn
trunk of Apache Hadoop and Cloudera's Distribution for Hadoop (CDH3)
can be used as the underlying Hadoop implementation.

By default, Sqoop will compile against the latest snapshot from Apache
(retrieved through Maven). You can specify the Hadoop distribution to
retrieve with the +hadoop.dist+ property. Valid values are "apache" and
"cloudera":

++++
ant jar -Dhadoop.dist=apache
ant jar -Dhadoop.dist=cloudera
++++

To switch between builds, you will need to clear Ivy's dependency
cache by running +ant veryclean+.
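
Switching distributions is therefore a two-step sequence: clear the cache,
then rebuild with the new +hadoop.dist+ value. A minimal sketch, assuming a
POSIX shell; +switch_hadoop_dist+ is a hypothetical helper name, and it only
prints the commands so they can be reviewed before being run.

```shell
# Hypothetical helper (not part of Sqoop): print the commands needed to
# rebuild against a different Hadoop distribution.
switch_hadoop_dist() {
  case "$1" in
    apache|cloudera)
      # Clear cached Hadoop artifacts first, then rebuild.
      echo "ant veryclean"
      echo "ant jar -Dhadoop.dist=$1"
      ;;
    *)
      echo "hadoop.dist must be 'apache' or 'cloudera', got '$1'" >&2
      return 1
      ;;
  esac
}
```

Rejecting unknown values up front avoids a failed Ivy resolve after the
cache has already been cleared.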


== Code Quality Analysis

We have two tools which can be used to analyze Sqoop's code quality.

=== FindBugs

FindBugs detects common errors in programming. New patches should not
trigger additional warnings in FindBugs.

Install FindBugs (1.3.9) according to its instructions. To use it,
run:

++++
ant findbugs -Dfindbugs.home=/path/to/findbugs/
++++

A report will be generated in +build/findbugs/+.

=== Cobertura

Cobertura runs code coverage checks. It instruments the build and
records which lines and which branches of each conditional expression
are exercised when the tests run.

Install Cobertura according to its instructions. Then run a test with:

++++
ant clean
ant cobertura -Dcobertura.home=/path/to/cobertura
ant cobertura -Dcobertura.home=/path/to/cobertura \
-Dthirdparty=true -Dsqoop.thirdparty.lib.dir=/path/to/thirdparty
++++

(You'll need to run the cobertura target twice; once against the regular
test targets, and once against the thirdparty targets.)

When complete, the report will be placed in +build/cobertura/+.

New patches should come with sufficient tests for their functionality
as well as their error recovery code paths. Cobertura can help assess
whether your tests are thorough enough, or where gaps lie.

1 change: 1 addition & 0 deletions README.txt
@@ -37,6 +37,7 @@ provided in the +build/sqoop-(version)/+ directory.

You can build just the jar by running +ant jar+.

See the COMPILING.txt document for more information.

== This is also an Asciidoc file!

53 changes: 45 additions & 8 deletions build.xml
@@ -48,6 +48,10 @@
<property name="javac.debug" value="on"/>
<property name="build.encoding" value="ISO-8859-1"/>

<!-- controlling the Hadoop source -->
<!-- valid values for ${hadoop.dist} are 'apache' and 'cloudera' -->
<property name="hadoop.dist" value="apache" />

<!-- testing with JUnit -->
<property name="test.junit.output.format" value="plain"/>
<property name="test.output" value="no"/>
@@ -107,11 +111,13 @@
<pathelement location="${build.classes}"/>
<path refid="lib.path"/>
<path refid="${name}.common.classpath"/>
<path refid="${name}.hadoop.classpath"/>
</path>

<!-- Classpath for unit tests (superset of compile.classpath) -->
<path id="test.classpath">
<pathelement location="${build.test.classes}" />
<path refid="${name}.hadooptest.classpath"/>
<path refid="${name}.test.classpath"/>
<path refid="compile.classpath"/>
</path>
@@ -126,7 +132,8 @@
<target name="init" />

<!-- Compile core classes for the project -->
-<target name="compile" depends="init, ivy-retrieve-common"
+<target name="compile"
+depends="init, ivy-retrieve-common, ivy-retrieve-hadoop"
description="Compile core classes for the project">
<!-- don't use an out-of-date instrumented build. -->
<delete dir="${cobertura.class.dir}" />
@@ -143,7 +150,8 @@
</javac>
</target>

-<target name="compile-test" depends="compile, ivy-retrieve-test"
+<target name="compile-test"
+depends="compile, ivy-retrieve-test, ivy-retrieve-hadoop-test"
description="Compile test classes">
<mkdir dir="${build.test.classes}" />
<javac
@@ -362,6 +370,13 @@
<delete dir="${build.dir}"/>
</target>

<target name="veryclean"
depends="clean"
description="Clean build and remove cached dependencies">
<delete dir="${user.home}/.ivy2/cache/org.apache.hadoop" />
<delete file="${ivy.jar}" />
</target>

<target name="findbugs" depends="check-for-findbugs,jar,compile-test"
if="findbugs.present" description="Run FindBugs">
<taskdef name="findbugs" classname="edu.umd.cs.findbugs.anttask.FindBugsTask"
@@ -467,37 +482,59 @@
<property name="ivy.configured" value="true" />
</target>


<!-- retrieve ivy-managed artifacts for the compile configuration -->
<target name="ivy-resolve-common" depends="ivy-init">
<ivy:resolve settingsRef="${name}.ivy.settings" conf="common" />
</target>

<!-- retrieve ivy-managed artifacts for the compile configuration -->
<target name="ivy-retrieve-common" depends="ivy-resolve-common">
<ivy:retrieve settingsRef="${name}.ivy.settings"
pattern="${build.ivy.lib.dir}/${ivy.artifact.retrieve.pattern}" sync="true" />
<ivy:cachepath pathid="${name}.common.classpath" conf="common" />
</target>


<!-- retrieve ivy-managed artifacts for the test configuration -->
<target name="ivy-resolve-test" depends="ivy-init">
<ivy:resolve settingsRef="${name}.ivy.settings" conf="test" />
</target>

<!-- retrieve ivy-managed artifacts for the test configuration -->
<target name="ivy-retrieve-test" depends="ivy-resolve-test">
<ivy:retrieve settingsRef="${name}.ivy.settings"
pattern="${build.ivy.lib.dir}/${ivy.artifact.retrieve.pattern}" sync="true" />
<ivy:cachepath pathid="${name}.test.classpath" conf="test" />
</target>


<!-- retrieve ivy-managed artifacts for the redist configuration -->
<target name="ivy-resolve-redist" depends="ivy-init">
<ivy:resolve settingsRef="${name}.ivy.settings" conf="redist" />
</target>

<!-- retrieve ivy-managed artifacts for the redist configuration -->
<target name="ivy-retrieve-redist" depends="ivy-resolve-redist">
<ivy:retrieve settingsRef="${name}.ivy.settings"
pattern="${build.ivy.lib.dir}/${ivy.artifact.retrieve.pattern}" sync="true" />
<ivy:cachepath pathid="${name}.redist.classpath" conf="redist" />
</target>

<!-- retrieve ivy-managed artifacts from the Hadoop distribution -->
<target name="ivy-resolve-hadoop" depends="ivy-init">
<ivy:resolve settingsRef="${name}.ivy.settings" conf="${hadoop.dist}" />
</target>
<target name="ivy-retrieve-hadoop" depends="ivy-resolve-hadoop">
<ivy:retrieve settingsRef="${name}.ivy.settings"
pattern="${build.ivy.lib.dir}/${ivy.artifact.retrieve.pattern}" sync="true" />
<ivy:cachepath pathid="${name}.hadoop.classpath" conf="${hadoop.dist}" />
</target>

<!-- retrieve ivy-managed test artifacts from the Hadoop distribution -->
<target name="ivy-resolve-hadoop-test" depends="ivy-init">
<ivy:resolve settingsRef="${name}.ivy.settings" conf="${hadoop.dist}test" />
</target>
<target name="ivy-retrieve-hadoop-test" depends="ivy-resolve-hadoop-test">
<ivy:retrieve settingsRef="${name}.ivy.settings"
pattern="${build.ivy.lib.dir}/${ivy.artifact.retrieve.pattern}"
sync="true" />
<ivy:cachepath pathid="${name}.hadooptest.classpath"
conf="${hadoop.dist}test" />
</target>

</project>
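
The new targets derive Ivy configuration names from +hadoop.dist+: the
compile configuration is the property value itself, and the test
configuration is that value with +test+ appended. A minimal sketch of the
mapping, assuming a POSIX shell; +ivy_confs_for+ is a hypothetical name used
only for illustration.

```shell
# Hypothetical helper (not part of Sqoop): show which Ivy configurations
# a given hadoop.dist value selects in build.xml.
ivy_confs_for() {
  dist="${1:-apache}"        # build.xml defaults hadoop.dist to "apache"
  echo "compile: $dist"      # conf used by ivy-resolve-hadoop
  echo "test: ${dist}test"   # conf used by ivy-resolve-hadoop-test
}
```

Each resulting name has to match a +conf+ declared in ivy.xml, which is why
that file gains the +apachetest+ and +clouderatest+ entries.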
34 changes: 28 additions & 6 deletions ivy.xml
@@ -32,7 +32,20 @@
<conf name="common" visibility="private"
extends="runtime"
description="artifacts needed to compile/test the application"/>
<conf name="apache" visibility="private"
extends="runtime"
description="artifacts from Apache for compile/test" />
<conf name="cloudera" visibility="private"
extends="runtime"
description="artifacts from Cloudera for compile/test" />

<conf name="test" visibility="private" extends="runtime"/>
<conf name="apachetest" visibility="private"
extends="test"
description="artifacts from Apache for testing" />
<conf name="clouderatest" visibility="private"
extends="test"
description="artifacts from Cloudera for testing" />

<!-- We don't redistribute everything we depend on (e.g., Hadoop itself);
anything which Hadoop itself also depends on, we do not ship.
@@ -46,18 +59,27 @@
<artifact conf="master"/>
</publications>
<dependencies>
<!-- Dependencies for Apache Hadoop -->
<dependency org="org.apache.hadoop" name="hadoop-core"
-rev="${hadoop-core.version}" conf="common->default"/>
+rev="${hadoop-core.apache.version}" conf="apache->default"/>
<dependency org="org.apache.hadoop" name="hadoop-core-test"
-rev="${hadoop-core.version}" conf="common->default"/>
+rev="${hadoop-core.apache.version}" conf="apachetest->default"/>
<dependency org="org.apache.hadoop" name="hadoop-hdfs"
-rev="${hadoop-hdfs.version}" conf="common->default"/>
+rev="${hadoop-hdfs.apache.version}" conf="apache->default"/>
<dependency org="org.apache.hadoop" name="hadoop-hdfs-test"
-rev="${hadoop-hdfs.version}" conf="test->default"/>
+rev="${hadoop-hdfs.apache.version}" conf="apachetest->default"/>
<dependency org="org.apache.hadoop" name="hadoop-mapred"
-rev="${hadoop-mapred.version}" conf="common->default"/>
+rev="${hadoop-mapred.apache.version}" conf="apache->default"/>
<dependency org="org.apache.hadoop" name="hadoop-mapred-test"
-rev="${hadoop-mapred.version}" conf="test->default"/>
+rev="${hadoop-mapred.apache.version}" conf="apachetest->default"/>

<!-- Dependencies for Cloudera's Distribution for Hadoop -->
<dependency org="org.apache.hadoop" name="hadoop-core"
rev="${hadoop-core.cloudera.version}" conf="cloudera->default"/>
<dependency org="org.apache.hadoop" name="hadoop-core-test"
rev="${hadoop-core.cloudera.version}" conf="clouderatest->default"/>

<!-- Common dependencies for Sqoop -->
<dependency org="commons-logging" name="commons-logging"
rev="${commons-logging.version}" conf="common->default"/>
<dependency org="log4j" name="log4j" rev="${log4j.version}"
11 changes: 8 additions & 3 deletions ivy/libraries.properties
@@ -19,9 +19,14 @@
commons-io.version=1.4
commons-logging.version=1.0.4

-hadoop-core.version=0.22.0-SNAPSHOT
-hadoop-hdfs.version=0.22.0-SNAPSHOT
-hadoop-mapred.version=0.22.0-SNAPSHOT
+# Apache Hadoop dependency version: use trunk.
+hadoop-core.apache.version=0.22.0-SNAPSHOT
+hadoop-hdfs.apache.version=0.22.0-SNAPSHOT
+hadoop-mapred.apache.version=0.22.0-SNAPSHOT
+
+# Cloudera Distribution dependency version
+hadoop-core.cloudera.version=0.20.2-CDH3b2-SNAPSHOT

hsqldb.version=1.8.0.10

ivy.version=2.0.0-rc2