Sqoop is a tool designed to import data from relational databases into Hadoop. Sqoop uses JDBC to connect to a database. It examines each table’s schema and automatically generates the necessary classes to import data into the Hadoop Distributed File System (HDFS). Sqoop then creates and launches a MapReduce job to read tables from the database in parallel.
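The parallel read described above can be illustrated with a small sketch. It is not Sqoop's implementation. It assumes an integer key column to split on, uses Python's built-in `sqlite3` module in place of a JDBC connection, and runs each range query serially where a MapReduce job would hand each range to a separate map task. The table name `employees` and the helpers `compute_splits` and `split_queries` are hypothetical.

```python
import sqlite3


def compute_splits(lo, hi, num_splits):
    """Divide the inclusive integer range [lo, hi] into at most num_splits
    contiguous, non-overlapping inclusive ranges of near-equal size."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    total = hi - lo + 1
    if total <= 0:
        return []
    n = min(num_splits, total)
    size, extra = divmod(total, n)
    splits = []
    start = lo
    for i in range(n):
        # The first `extra` ranges each absorb one leftover key.
        end = start + size - 1 + (1 if i < extra else 0)
        splits.append((start, end))
        start = end + 1
    return splits


def split_queries(conn, table, column, num_splits):
    """Return one bounded SELECT (with parameters) per key range.

    Hypothetical helper: table/column names are interpolated directly,
    so they must come from trusted metadata, not user input."""
    lo, hi = conn.execute(
        f"SELECT MIN({column}), MAX({column}) FROM {table}"
    ).fetchone()
    if lo is None:  # empty table: nothing to read
        return []
    sql = f"SELECT * FROM {table} WHERE {column} >= ? AND {column} <= ?"
    return [(sql, bounds) for bounds in compute_splits(lo, hi, num_splits)]


# Demo against an in-memory database (stand-in for a JDBC source).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(i, f"emp{i}") for i in range(1, 11)],
)

rows = []
for sql, params in split_queries(conn, "employees", "id", 3):
    # In a real parallel import, each range would be read by its own task.
    rows.extend(conn.execute(sql, params).fetchall())
```

Splitting on key ranges assumes the keys are spread fairly evenly; a heavily skewed key column would leave some tasks with far more rows than others.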
Sqoop can also import tables into Hive for further relational processing, and it can export tabular data from HDFS back to databases.
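Export runs in the opposite direction: delimited records stored in HDFS become rows in a database table. The sketch below is a simplified, single-process illustration, not Sqoop's export code. It assumes comma-delimited text records, represents the HDFS file as a list of lines, and again uses `sqlite3` in place of a JDBC target. The `export_delimited` helper and the `staff` table are hypothetical.

```python
import csv
import sqlite3


def export_delimited(conn, table, lines, delimiter=",", batch_size=100):
    """Insert delimited text records into `table` in batches.

    Each record must have one field per table column, in column order.
    Returns the number of rows inserted."""
    sql = None
    batch = []
    inserted = 0
    for record in csv.reader(lines, delimiter=delimiter):
        if not record:  # skip blank lines
            continue
        if sql is None:
            # Build the INSERT once, sized to the first record's field count.
            placeholders = ", ".join("?" for _ in record)
            sql = f"INSERT INTO {table} VALUES ({placeholders})"
        batch.append(record)
        if len(batch) >= batch_size:
            conn.executemany(sql, batch)
            inserted += len(batch)
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany(sql, batch)
        inserted += len(batch)
    conn.commit()
    return inserted


# Lines as they might appear in a comma-delimited text file.
hdfs_lines = ["1,alice", "2,bob", "3,carol"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (id INTEGER, name TEXT)")
count = export_delimited(conn, "staff", hdfs_lines, batch_size=2)
```

Batching reduces round trips to the database; the demo's batch size of 2 is chosen only so that both the full-batch and final-flush paths run.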
For certain databases, such as MySQL, Sqoop provides further performance enhancements by using database-specific tools to facilitate imports and exports.
Binary releases are available from the downloads page.
You can also download Sqoop from this repository and build it yourself: running `ant tar` creates a release package that you can use to interact with Hadoop. The repository version is under continuous development and is not as stable as a packaged release.
Sqoop relies on advanced features of Apache Hadoop. As such, it requires the latest beta of Cloudera’s Distribution for Hadoop (CDH3 beta 2). Sqoop may be compatible with the Apache 0.21.0 release, but this is considered experimental and should not be used in production. The COMPILING.txt file describes how to select a Hadoop distribution to target at compilation time.
Most people will probably want to use Sqoop within a Hadoop distribution. A release of Sqoop is packaged with Cloudera’s Distribution for Hadoop, which is based on Apache Hadoop 0.20. Instructions on downloading/installing CDH are available at http://docs.cloudera.com. In addition to the Hadoop core, CDH includes RPM- and Debian-based installation packages for Sqoop.
User and developer documentation, including manpages, is provided with Sqoop in the
Additionally, documentation for the latest release is hosted at the Cloudera software archive.
Sqoop has two mailing lists to facilitate interaction between developers and users.
- firstname.lastname@example.org – For questions and discussion about Sqoop’s usage, troubleshooting, etc. If you’re using Sqoop, please subscribe.
- email@example.com – Discussion regarding development of Sqoop itself. If you’re interested in contributing to Sqoop, please subscribe here.
- DevelopmentProcess – Notes on developing against Sqoop and what process to follow
- ReleaseProcess – Instructions for committers on how to cut a release
- SqoopImprovementProposals – Proposed design changes, large features, etc.
- How to compile (also included in your Git repository)
Sqoop is free software made available under the Apache License, version 2.0. Using Sqoop with a particular database may require that you install a database-specific JDBC driver, available from your database vendor under its own license. See your database vendor’s documentation for further details.