Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sqoop - Data transfer tool from RDBMS to Hadoop Box #62

Open
v5tech opened this issue Jan 16, 2015 · 0 comments
Open

Sqoop - Data transfer tool from RDBMS to Hadoop Box #62

v5tech opened this issue Jan 16, 2015 · 0 comments

Comments

@v5tech
Copy link
Owner

v5tech commented Jan 16, 2015

TO IMPORT & EXPORT DATA FROM RDBMS (MYSQL,ORACLE, etc) INTO HDFS / HIVE / HBASE

Pre-requisite

Apache Hadoop

Apache Sqoop (compatible with Hadoop version)

Apache Hive (optional)

Apache HBase (optional)

Apache HCatalog (optional)

JDBC/ODBC connector

For all RDBMS, Connection URL changes and remaining all command line arguments remains same. You need to download specific JDBC/ODBC connector JAR and copy it to $SQOOP_HOME/lib

MySQL

Download mysql-connector-java.jar and place in $SQOOP_HOME/lib folder

cp mysql-connector-java-5.1.18-bin.jar /usr/local/hadoop/sqoop-1.4.3-cdh4.4.0/lib/

Expecting you have data in MySQL tables.

Retrieving list of Databases available in MySQL from SQOOP

sqoop list-databases --connect jdbc:mysql://localhost:3306/  --username root -P
  • MySQL to HDFS Import

Have Primary key:

sqoop import -connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --target-dir /path/to/directoryName

No Primary key:

sqoop import -connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --target-dir /path/to/directoryName  -m 1
  • MySQL to Hive Import

Have Primary key:

sqoop-import  --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName  --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home

No Primary key:

sqoop-import  --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName  --hive-table tableName --create-hive-table --hive-import --hive-home  path/to/hive_home -m 1
  • MySQL to HBase Import

Have Import All columns:

sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName  --column-family hbase_table_col1 --hbase-create-table

HBase import few columns

sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col1 --hbase-create-table

To HBase with Primary key:

sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-row-key column1 –hbase-create-table

To Hbase with no primary key:

sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col --hbase-row-key column1 --hbase-create-table
  • Export from HDFS to MySQL:

Same for all Hive/HBase/HDFS: Because Hive tables are nothing but directories in HDFS. So you're just exporting a directory to MySQL

sqoop export --connect jdbc:mysql://localhost:3306/test_db --table tableName  --export-dir /user/hive/warehouse/tableName --username root --password password -m 1 --input-fields-terminated-by '\001'

SQL Server

Connection URL:

sqoop import --connect 'jdbc:sqlserver://<IP(or)hostname>;username=dbuser;password=dbpasswd;database=<DB>' --table <table> --target-dir /path/to/hdfs/dir --split-by <KEY> -m 1

Download Connector from Microsoft website

http://www.microsoft.com/en-us/download/confirmation.aspx?id=11774
Place it in $SQOOP_HOME/lib

Oracle

Connection URL:

sqoop import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=myhost)(port=1521))(connect_data=(service_name=myservice)))" \
--username USER --table SCHEMA.TABLE_NAME --hive-import --hive-table SCHEMA.TABLE_NAME \
--num-mappers 1 --verbose -P \

IBM DB2

Download the DB2Driver and place it in $SQOOP_HOME/lib

sqoop import --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://db2.my.com:50000/testdb --username db2user --db2pwd --table db2tbl --split-by tbl_primarykey --target-dir sqoopimports
sqoop export --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://db2.my.com:50000/myDB --username db2user --password db2pwd --table db2tbl --export-dir /sqoop/dataFile.csv

Different Connection Strings for Different RDBMS

Database version --direct support? connect string matches

HSQLDB 1.8.0+ No jdbc:hsqldb:*//

MySQL 5.0+ Yes jdbc:mysql://

Oracle 10.2.0+ No jdbc:oracle:*//

PostgreSQL 8.3+ Yes (import only) jdbc:postgresql://

https://coderwall.com/p/kgrwwq

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant