This repository has been archived by the owner on Jul 9, 2021. It is now read-only.

642 lines (459 loc) · 25.6 KB

COMPILING.adoc

Compiling

This document explains how to compile and test Sqoop.

Build Dependencies

Compiling Sqoop requires the following tools:

  • Apache ant (1.9.7) or Gradle (5.4.1)

  • Java JDK 1.8

Additionally, building the documentation requires these tools:

  • asciidoc

  • make

  • python 2.5+

  • xmlto

  • tar

  • gzip
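Before building the documentation, it can help to confirm that the tools listed above are on the PATH. The following is a minimal sketch, not part of the Sqoop build; the function name missing_tools is hypothetical, and the executable names (for example python rather than python2) are assumptions that may differ on your system.

```shell
#!/bin/sh
# Print every tool from the argument list that is not found on PATH.
# Returns 1 if at least one tool is missing, 0 otherwise.
missing_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check the documentation tool chain listed above.
# missing_tools asciidoc make python xmlto tar gzip
```

The function only checks that an executable with the given name exists; it does not verify versions.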

Furthermore, Sqoop’s build can be instrumented with the following:

  • findbugs (1.3.9) for code quality checks

  • cobertura (1.9.4.1) for code coverage

  • checkstyle (5.x) for code style checks

The Basics

Sqoop can be compiled and tested with both Ant and Gradle. Type ant -p or ./gradlew tasks --all to see the list of available targets/tasks.

Type ant or ./gradlew jar to compile all java sources. You can then run Sqoop with bin/sqoop.

If you want to build everything (including the documentation), type ant package or ./gradlew package. The build output will appear in the build/sqoop-(version)/ directory.

Testing Sqoop

Sqoop supports both Ant and Gradle, but the test tasks differ slightly between the two build tools.

Testing with Ant

The Ant build defines two main test categories: unit and third party tests.

Classes with the Test prefix are unit tests and classes with the Test postfix are third party tests.

Sqoop unit tests can be run with ant test. This command will run all the "basic" checks against an in-memory database, HSQLDB.

Sqoop’s third party tests are compatibility tests that check its ability to work with several third-party databases. To enable these tests, you will need to install and configure the databases or run them in Docker containers, and download the JDBC drivers for each one.

For more information about how to run this suite see the 'Third party tests' section.

Note that the unit test suite contains several Amazon S3 test classes too which test the "RDBMS to Amazon S3" use case with an in-memory database, HSQLDB. To enable these tests, you will need to have either permanent or temporary Amazon S3 security credentials.

For more information about how to run this suite see the 'Amazon S3 tests' section.

Testing with Gradle

The Gradle build supports JUnit’s @Category annotation, so it offers much more fine-grained test categories. The test categories are defined as marker interfaces under the org.apache.sqoop.testcategories package.

The following table shows the currently supported test categories and their hierarchy:

Table 1. Available test categories
Category | Subcategories
SqoopTest | UnitTest, IntegrationTest, ManualTest
ThirdPartyTest | CubridTest, Db2Test, MainFrameTest, MysqlTest, NetezzaTest, OracleTest, OracleEeTest, PostgresqlTest, SqlServerTest, S3Test
KerberizedTest | (none)

SqoopTest

A general category including UnitTest, IntegrationTest and ManualTest.

UnitTest

A unit test shall test one class at a time, with its dependencies mocked. A unit test shall not start a mini cluster or an embedded database, and it shall not use a JDBC driver.

IntegrationTest

An integration test shall test if independently developed classes work together correctly. An integration test checks a whole scenario and thus may start mini clusters or embedded databases and may connect to external resources like RDBMS instances.

ManualTest

Deprecated category, shall not be used nor extended.

ThirdPartyTest

A third party test shall test a scenario where a third party side is required such as a JDBC driver or an external RDBMS instance. The subcategories define what kind of third party dependency is needed by the test.

KerberizedTest

A kerberized test shall run in a kerberized environment, so it starts a mini KDC server.

Categorizing tests

Note that if you add a new test, you have to make sure that it is categorized; otherwise Gradle will not execute it.

The categorizing steps are the following:

  • Decide if the test is a unit or an integration test, mark the test class with the @Category annotation and add the corresponding marker interface to it.

  • If the test needs a JDBC driver or an external service then add ThirdPartyTest or one of its subinterfaces to the @Category annotation. Try to use the most specific interface.

  • If the test starts a Mini KDC then add the KerberizedTest interface to the @Category annotation.
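Since an uncategorized test is silently skipped by Gradle, a quick scan for test sources without a @Category annotation can catch omissions. The following is a rough heuristic sketch, not part of the Sqoop build: the function name uncategorized_tests is hypothetical, the file-name patterns are an assumption based on the Test prefix/postfix naming convention described earlier, and grep -L is a widely available extension rather than a POSIX option.

```shell
#!/bin/sh
# List Java test sources under the given directory that contain no
# @Category annotation. The patterns Test*.java and *Test.java are an
# assumption based on the naming convention described earlier.
uncategorized_tests() {
    dir="$1"
    find "$dir" -type f \( -name 'Test*.java' -o -name '*Test.java' \) \
        -exec grep -L '@Category' {} +
}

# Usage (the src/test path is an assumption about the checkout layout):
# uncategorized_tests src/test
```

Abstract base classes and helpers that match the file-name patterns will also be reported, so the output is a list of candidates to review rather than a definitive result.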

Available test targets

  • unitTest: Runs unit tests which do not need a proprietary JDBC driver.

  • integrationTest: Runs integration tests which do not need a docker container or an external database/service.

  • kerberizedTest: Runs kerberized tests.

  • thirdPartyTest: Runs third-party tests. For more information see the 'Third party tests' section.

  • postgreSqlTest: Runs PostgreSql third-party tests.

  • mySqlTest: Runs MySql third-party tests.

  • cubridTest: Runs Cubrid third-party tests.

  • db2Test: Runs DB2 third-party tests.

  • oracleTest: Runs Oracle third-party tests.

  • oracleXeTest: Runs Oracle third-party tests which can be run with Oracle Express Edition too.

  • oracleEeTest: Runs Oracle third-party tests which require Oracle Enterprise Edition.

  • sqlServerTest: Runs SqlServer third-party tests.

  • test: Runs tests that do not need an external JDBC driver and/or a docker container. This is the same as running unitTest, integrationTest and kerberizedTest.

  • s3Test: Runs S3 tests. For more information see the 'Amazon S3 tests' section.

  • allTest: Runs all Sqoop tests.

The forkEvery parameter of most of the Gradle test tasks is set to 0, which means that all of the tests run in a single JVM. The only exception is the kerberizedTest task, which requires a new JVM for every test class. The benefit of this setup is that the test tasks finish much faster, since JVM creation is a slow operation. However, the Sqoop test framework seems to consume or leak too much memory, which can lead to an OutOfMemoryError during the build. To prevent the JVM from running out of memory, you can use the -DforkEvery.default property to set the forkEvery parameter for all the test tasks except kerberizedTest:

./gradlew -DforkEvery.default=30 test

The ignoreFailures parameter of the Gradle test tasks is set to false, which means that if a Gradle test task fails, the gradle process exits with a non-zero status. In some CI tools (e.g. Jenkins) this makes the status of the job red rather than yellow, which usually indicates a more serious issue than a test failure. To change this behavior, you can use the -DignoreTestFailures property to set the ignoreFailures parameter for all the test tasks:

./gradlew -DignoreTestFailures=true test

Third party tests

Installing the necessary databases

MySQL

Install MySQL server and client 5.0. Download MySQL Connector/J 5.0.8 for JDBC. Instructions for configuring the MySQL database are in MySQLAuthTest and DirectMySQLTest.

Use the following system properties to configure connection to the MySQL host used for testing: sqoop.test.mysql.connectstring.host_url, sqoop.test.mysql.databasename, sqoop.test.mysql.username and sqoop.test.mysql.password. Specify these properties on the command line or via the build.properties file. For example:

sqoop.test.mysql.connectstring.host_url=jdbc:mysql://host.example.com/
sqoop.test.mysql.databasename=MYDB
sqoop.test.mysql.username=MYUSR
sqoop.test.mysql.password=MYPWD

If not specified, the default value used for the sqoop.test.mysql.connectstring.host_url property is: jdbc:mysql://127.0.0.1:13306/

Oracle

Install Oracle Enterprise Edition 10.2.0+. Instructions for configuring the database are in OracleManagerTest. Download the ojdbc6_g jar.

If you run the tests against Oracle XE (Express Edition), many of them will fail because it does not include the partitioning feature.

Use the following system properties to configure connection to the Oracle XE host used for testing: sqoop.test.oracle.connectstring, sqoop.test.oracle.username and sqoop.test.oracle.password. You can configure the connection properties separately for tests requiring Oracle EE: sqoop.test.oracle-ee.connectstring, sqoop.test.oracle-ee.username and sqoop.test.oracle-ee.password. Specify these properties on the command line or via the build.properties file. For example:

sqoop.test.oracle.connectstring=jdbc:oracle:thin:@//host.example.com/xe
sqoop.test.oracle.username=MYUSR
sqoop.test.oracle.password=MYPWD

If not specified, the default value used for sqoop.test.oracle.connectstring property is: jdbc:oracle:thin:@//localhost:1521/xe

The default value used for sqoop.test.oracle-ee.connectstring property is: jdbc:oracle:thin:@//localhost:1522/sqoop

The users sqooptest and sqooptest2 should be created prior to running the tests. A SQL script for this is available in src/test/oraoop/create_users.sql.

PostgreSQL

Install PostgreSQL 8.3.9. Download the postgresql 8.4 jdbc driver. Instructions for configuring the database are in PostgresqlTest.

Use the following system properties to configure the connection to the PostgreSQL host used for testing: sqoop.test.postgresql.connectstring.host_url, sqoop.test.postgresql.database, sqoop.test.postgresql.username and sqoop.test.postgresql.password. Specify these properties on the command line or via the build.properties file. For example:

sqoop.test.postgresql.connectstring.host_url=jdbc:postgresql://sqoop-dbs.sf.cloudera.com/
sqoop.test.postgresql.database=MYDB
sqoop.test.postgresql.username=MYUSR
sqoop.test.postgresql.password=MYPWD

If not specified, the default value used for the sqoop.test.postgresql.connectstring.host_url property is: jdbc:postgresql://localhost:15432/

SQL Server

Install SQL Server Express 2012 and create a database instance and download the appropriate JDBC driver. Instructions for configuring the database can be found in SQLServerManagerImportManualTest.

Use the following system properties to configure the connection to the SQL Server host used for testing: sqoop.test.sqlserver.connectstring.host_url, sqoop.test.sqlserver.database, ms.sqlserver.username and ms.sqlserver.password. Specify these properties on the command line or via the build.properties file. For example:

sqoop.test.sqlserver.connectstring.host_url=jdbc:sqlserver://sqlserverhost:1433
sqoop.test.sqlserver.database=MYDB
ms.sqlserver.username=MYUSR
ms.sqlserver.password=MYPWD

If not specified, the default value used for the sqoop.test.sqlserver.connectstring.host_url property is: jdbc:sqlserver://localhost:1433

Cubrid

Install Cubrid 9.2.2.0003 and create a database instance and download the appropriate JDBC driver. Instructions for configuring the database are in CubridAuthTest, CubridCompatTest, CubridManagerImportTest and CubridManagerExportTest.

Use the following system properties to configure the connection to the Cubrid host used for testing: sqoop.test.cubrid.connectstring.host_url, sqoop.test.cubrid.connectstring.database, sqoop.test.cubrid.connectstring.username and sqoop.test.cubrid.connectstring.password. Specify these properties on the command line or via the build.properties file. For example:

sqoop.test.cubrid.connectstring.host_url=jdbc:cubrid:localhost
sqoop.test.cubrid.connectstring.database=MYDB
sqoop.test.cubrid.connectstring.username=MYUSR
sqoop.test.cubrid.connectstring.password=MYPWD

If not specified, the default value used for the sqoop.test.cubrid.connectstring.host_url property is: jdbc:cubrid:localhost:33000

DB2

Install DB2 9.74 Express C and download the appropriate JDBC driver. Instructions for configuring the server can be found in DB2ManagerImportManualTest.

Use the following system properties to configure the connection to the DB2 host used for testing: sqoop.test.db2.connectstring.host_url, sqoop.test.db2.connectstring.database, sqoop.test.db2.connectstring.username and sqoop.test.db2.connectstring.password. Specify these properties on the command line or via the build.properties file. For example:

sqoop.test.db2.connectstring.host_url=jdbc:db2://db2host:50000
sqoop.test.db2.connectstring.database=MYDB
sqoop.test.db2.connectstring.username=MYUSR
sqoop.test.db2.connectstring.password=MYPWD

If not specified, the default value used for the sqoop.test.db2.connectstring.host_url property is: jdbc:db2://localhost:50000

Running the Third-party Tests on native database servers

After the third-party databases are installed and configured, run:

ant test -Dthirdparty=true -Dsqoop.thirdparty.lib.dir=/path/to/jdbc/drivers/ [-DconnectionProperty1=propertyValue1
-DconnectionProperty2=propertyValue ...]

or

./gradlew -Dsqoop.thirdparty.lib.dir=/relative/path/to/jdbc/drivers/ thirdPartyTest [-DconnectionProperty1=propertyValue1
-DconnectionProperty2=propertyValue ...]

This command will run all third-party tests except some DB2 tests. To run these DB2 tests, specify the property "manual" instead of "thirdparty" as follows:

ant test -Dmanual=true -Dsqoop.thirdparty.lib.dir=/path/to/jdbc/drivers/ [-DconnectionProperty1=propertyValue1
-DconnectionProperty2=propertyValue ...]

or

./gradlew -Dsqoop.thirdparty.lib.dir=/relative/path/to/jdbc/drivers/ manualTest [-DconnectionProperty1=propertyValue1
-DconnectionProperty2=propertyValue ...]

Note that sqoop.thirdparty.lib.dir can also be specified in build.properties.
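When the connection properties are kept in a file but need to be passed on the command line, they can be converted to -D arguments mechanically. The following is a minimal sketch under stated assumptions: the function name props_to_flags and the file name connection.properties are hypothetical, and values containing whitespace are not handled.

```shell
#!/bin/sh
# Turn "key=value" lines from a properties file into -Dkey=value arguments,
# one per output line. Blank lines and lines starting with '#' are skipped.
# Values containing whitespace are not handled by this sketch.
props_to_flags() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | sed 's/^/-D/'
}

# Usage (connection.properties is a hypothetical file name):
# ./gradlew $(props_to_flags connection.properties) thirdPartyTest
```

The unquoted command substitution in the usage line relies on word splitting, which is why whitespace inside values would break it.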

Setting up and executing third-party tests with databases running in Docker containers

The easiest way to run the Sqoop third-party test pack is to start all the necessary database servers in Docker containers. This eliminates the need to install and set up 6 different RDBMSs on the development machine and provides a clean database environment every time the tests are executed.

Installing docker

The first step is to install a recent version (1.13.0+) of Docker and Docker Compose on your development machine. Please refer to the Docker documentation for the installation instructions for your OS environment.

Downloading docker images

MySQL, PostgreSQL, MSSQL, DB2, Oracle XE and Cubrid images are freely available on Docker Hub, so they will be pulled automatically by the startup command specified below. However, the Oracle EE image has to be built manually. Please refer to the README.md file on the below GitHub project for building instructions:

Please note that some Sqoop third party tests require Oracle Enterprise Edition and the startup command assumes version 12.2.0.1.

Starting the Docker containers

A startup script has been added to the Sqoop project to make the Docker container initialization easier:

<sqoop_workspace>/src/scripts/thirdpartytest/start-thirdpartytest-db-containers.sh

If it is executed without parameters it starts the following services:

Table 2. Third party test DB docker services
Service name | Database version
mysql | MySql 5.7.19
postgresql | PostgreSQL 9.6.4
mssql | MSSQL 14.0.1000.169 Developer Edition
cubrid | Cubrid 10.0
oracle-ee | Oracle EE 12.2.0.1
oracle | Oracle XE 11g Release 2
db2 | DB2 Express Edition 10.5.0.5-3.10.0

The script starts up all the services by default but you can override this behavior by specifying the service names in the parameter list. For example if you want to start up mysql and postgresql services only you can execute the script like this:

<sqoop_workspace>/src/scripts/thirdpartytest/start-thirdpartytest-db-containers.sh mysql postgresql

After the startup script is executed the containers need some time to initialize the databases. You can follow the status of the containers using the docker ps command. If a service is properly initialized you will see a healthy status in the output of the docker ps command. For example the below output means that Cubrid is started up successfully and it is ready to be used but DB2 is still starting up:

61c4ef871cbb        cubrid/cubrid:10.0                    "/entrypoint.sh"         43 seconds ago      Up 38 seconds (healthy)            0.0.0.0:33000->33000/tcp           sqoop_cubrid_container
158e5421d134        ibmcom/db2express-c:10.5.0.5-3.10.0   "/home/db2inst1/db..."   43 seconds ago      Up 40 seconds (health: starting)   22/tcp, 0.0.0.0:50000->50000/tcp   sqoop_db2_container

Most of the containers need less than 1 minute to start up, but DB2 needs ~5 minutes and Oracle EE needs ~15 minutes. The Docker images need ~17GB of free disk space and Docker requires ~5GB of memory to start all of them at the same time.
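Instead of watching the docker ps output by hand, the health status can be polled in a loop. The following is a minimal sketch, not part of the Sqoop scripts: the function name wait_until_healthy and its default timeout and interval are hypothetical, and it assumes the container defines a health check (as the healthy status in the docker ps output suggests).

```shell
#!/bin/sh
# Poll a container's health status until it reports "healthy" or the
# timeout (in seconds) expires. Returns 0 on success, 1 on timeout.
wait_until_healthy() {
    container="$1"
    timeout="${2:-900}"   # assumed default: 15 minutes
    interval="${3:-5}"    # assumed default: 5 seconds between polls
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        status="$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null || true)"
        if [ "$status" = "healthy" ]; then
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 1
}

# Usage, with a container name taken from the docker ps output shown earlier:
# wait_until_healthy sqoop_db2_container 600
```

A non-zero return value only means the container did not become healthy within the timeout; the container logs are the place to look for the cause.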

Stopping the Docker containers

You can stop and remove the Docker containers using the following command:

<sqoop_workspace>/src/scripts/thirdpartytest/stop-thirdpartytest-db-containers.sh

Running the third party tests using docker containers

You can execute the third-party tests against the DBs running in Docker containers using the following command (replace <path_to_thirdparty_lib_directory> with the path of the directory containing the necessary JDBC drivers, and <your-bucket-url> and <your-credential-generator-command> with the values described in the 'Amazon S3 tests' section):

ant clean test -Dthirdparty=true -Dsqoop.thirdparty.lib.dir=<path_to_thirdparty_lib_directory> \
    -Ds3.bucket.url=<your-bucket-url> \
    -Ds3.generator.command=<your-credential-generator-command>

or

./gradlew -Dsqoop.thirdparty.lib.dir=<path_to_thirdparty_lib_directory> \
    -Ds3.bucket.url=<your-bucket-url> \
    -Ds3.generator.command=<your-credential-generator-command> \
    thirdPartyTest

Gradle is able to download all of the necessary drivers, but the Ant build has to be provided with the drivers for all the databases.

Please note that even though you do not need to install RDBMSs to run the Sqoop third-party tests against the Docker containers, you still need to install the following tools:

  • mysqldump

  • mysqlimport

  • psql

Amazon S3 tests

To enable Amazon S3 tests you need to have Amazon S3 security credentials. To pass these credentials to Sqoop during test execution you need to have a generator command that writes Amazon S3 credentials to the first line of standard output in the following order: access key, secret key and session token (the latter one only in case of temporary credentials) having them separated by spaces.

You can then pass the bucket URL and the generator command to the tests via system properties as follows:

ant clean test -Ds3.bucket.url=<your-bucket-url> -Ds3.generator.command=<your-credential-generator-command>

or

./gradlew s3Test -Ds3.bucket.url=<your-bucket-url> -Ds3.generator.command=<your-credential-generator-command>
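A generator command in the format described above can be as small as a script that echoes the credentials. The following is a minimal sketch under stated assumptions: the function name print_s3_credentials is hypothetical, and reading the values from the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN environment variables is a choice made for this example, not something the Sqoop tests require.

```shell
#!/bin/sh
# Print S3 credentials on one line in the order described above:
# access key, secret key and, for temporary credentials only, the
# session token, separated by spaces.
# Reading them from AWS_* environment variables is an assumption of this
# sketch; any other source works as long as the output format matches.
print_s3_credentials() {
    : "${AWS_ACCESS_KEY_ID:?access key is not set}"
    : "${AWS_SECRET_ACCESS_KEY:?secret key is not set}"
    if [ -n "${AWS_SESSION_TOKEN:-}" ]; then
        printf '%s %s %s\n' "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY" "$AWS_SESSION_TOKEN"
    else
        printf '%s %s\n' "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY"
    fi
}

# When saved as a script, call the function on the last line:
# print_s3_credentials
```

The path of the saved script would then be passed as the value of -Ds3.generator.command.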

Code Quality Analysis

We have three tools which can be used to analyze Sqoop’s code quality.

Findbugs

Findbugs detects common errors in programming. New patches should not trigger additional warnings in Findbugs.

Install findbugs (1.3.9) according to its instructions. To use it, run:

ant findbugs -Dfindbugs.home=/path/to/findbugs/

or

./gradlew findbugsMain

A report will be generated in build/findbugs/

Code Coverage reports

For Ant, Cobertura runs code coverage checks. It instruments the build and checks that each line and conditional expression is evaluated along all possible paths.

Install Cobertura according to its instructions. Then run a test with:

ant clean
ant cobertura -Dcobertura.home=/path/to/cobertura
ant cobertura -Dcobertura.home=/path/to/cobertura \
    -Dthirdparty=true -Dsqoop.thirdparty.lib.dir=/path/to/thirdparty

For Gradle, we run Jacoco for code coverage checks. You can create single reports or a composite report, which shows the combined coverage of unit and third-party tests.

./gradlew clean
./gradlew test
./gradlew jacocoTestReport

./gradlew -Dsqoop.thirdparty.lib.dir=<path_to_thirdparty_lib_directory> thirdPartyTest
./gradlew jacocoThirdPartyReport

or generate the composite report after running test and thirdPartyTest

./gradlew jacocoCompositeReport

(With Ant, you’ll need to run the cobertura target twice: once against the regular test targets, and once against the thirdparty targets.)

When complete, the Cobertura report will be placed in build/cobertura/

New patches should come with sufficient tests for their functionality as well as their error recovery code paths. Cobertura can help assess whether your tests are thorough enough, or where gaps lie.

Checkstyle

Checkstyle enforces our style guide. There are currently a very small number of violations of this style in the source code, and hopefully this will remain the case. New code should not trigger additional checkstyle warnings.

Checkstyle does not need to be installed manually; it will be retrieved via Ivy when necessary.

To run checkstyle, execute:

ant checkstyle

or

./gradlew checkStyleMain

A report will be generated as build/checkstyle-errors.html

Deploying to Maven

To use Sqoop as a dependency in other projects, you can pull Sqoop into your dependency management system through Maven.

To install Sqoop in your local .m2 cache, run:

ant mvn-install

or

./gradlew publishToMavenLocal

This will install a pom and the Sqoop jar.

To deploy Sqoop to a public repository, use:

ant mvn-deploy

or

./gradlew publishSqoopPublicationToCloudera.snapshot.repoRepository
./gradlew -DmvnRepo=x publishSqoopPublicationToCloudera.x.repoRepository

By default, this deploys to repository.cloudera.com. You can choose the complete URL to deploy to with the mvn.deploy.url property. By default, this deploys to the "snapshots" repository. To deploy to "staging" or "releases" on repository.cloudera.com, set the mvn.repo property accordingly.

Releasing Sqoop

To build a full release of Sqoop, run ant release -Dversion=(somever). This will build a binary release tarball and the web-based documentation as well as run a release audit which flags any source files which may be missing license headers.

(The release audit can be run standalone with the ant releaseaudit (./gradlew rat) target.)

You must set the version property explicitly; you cannot release a snapshot. To simultaneously deploy this to a maven repository, include the mvn-install or mvn-deploy targets as well.

Using Eclipse

Running ant eclipse will generate .project and .classpath files that will allow you to edit Sqoop sources in Eclipse with all the library dependencies correctly resolved. To compile the jars, you should still use Ant or Gradle.

Building the documentation

Building the documentation requires that you have xmlto installed. You also need to set the XML_CATALOG_FILES environment variable.

export XML_CATALOG_FILES=/usr/local/etc/xml/catalog
ant docs

or

export XML_CATALOG_FILES=/usr/local/etc/xml/catalog
./gradlew docs

Other important Gradle commands

  • ./gradlew wrapper to generate gradle wrapper (to ensure you are using the correct, compatible version of gradle)

  • ./gradlew tasks to list all top-level gradle tasks or run ./gradlew tasks --all to show all tasks and subtasks

  • ./gradlew compileJava to compile the main Java source

  • ./gradlew test --debug-jvm to run remote debug on port 5005

  • ./gradlew test --tests TestSqoopOptions, ./gradlew thirdPartyTest --tests TestS3* to run one or more tests matching a pattern (use with test or thirdPartyTest)

  • ./gradlew build --refresh-dependencies to refresh dependencies for a build

  • ./gradlew build -x test to build while excluding a task (here the test task, so the tests are skipped)

  • ./gradlew dependencyInsight --configuration optionalConfiguration --dependency searchedForDependency to get a dependency tree

  • ./gradlew dependencies to get the list of the dependencies of the selected project, broken down by configuration

Setting up Travis CI

You can now set up a Travis CI job for your own Sqoop fork so you can easily test your patches before uploading them for review. The steps for setting up the CI job are the following:

  • Go to https://github.com/apache/sqoop and fork the project if it is not already forked for your GitHub account.

  • Go to https://travis-ci.com and Sign up with GitHub.

  • Accept the Authorization of Travis CI. You’ll be redirected to GitHub.

  • Click the green Activate button, and select the Sqoop repository.

The Travis CI job uses Gradle to build and test the project and it is able to execute all of the tests except the ones requiring an Oracle EE database.