From a008125b2864c3acbc3b630030cb614a5ea2679f Mon Sep 17 00:00:00 2001 From: Lisa Owen Date: Tue, 18 Apr 2017 17:13:57 -0700 Subject: [PATCH 1/4] document new pxf jdbc plug-in --- .../source/subnavs/apache-hawq-nav.erb | 1 + .../pxf/HawqExtensionFrameworkPXF.html.md.erb | 4 + markdown/pxf/InstallPXFPlugins.html.md.erb | 4 +- markdown/pxf/JdbcPXF.html.md.erb | 213 ++++++++++++++++++ 4 files changed, 220 insertions(+), 2 deletions(-) create mode 100644 markdown/pxf/JdbcPXF.html.md.erb diff --git a/book/master_middleman/source/subnavs/apache-hawq-nav.erb b/book/master_middleman/source/subnavs/apache-hawq-nav.erb index 489f0c4..4d1f475 100644 --- a/book/master_middleman/source/subnavs/apache-hawq-nav.erb +++ b/book/master_middleman/source/subnavs/apache-hawq-nav.erb @@ -371,6 +371,7 @@
  • Accessing Hive Data
  • Accessing HBase Data
  • Accessing JSON Data
  • +
  • Accessing External SQL Databases
  • Writing Data to HDFS
  • Using Profiles to Read and Write Data
  • PXF External Tables and API
  • diff --git a/markdown/pxf/HawqExtensionFrameworkPXF.html.md.erb b/markdown/pxf/HawqExtensionFrameworkPXF.html.md.erb index 1a49576..8d82d8a 100644 --- a/markdown/pxf/HawqExtensionFrameworkPXF.html.md.erb +++ b/markdown/pxf/HawqExtensionFrameworkPXF.html.md.erb @@ -51,6 +51,10 @@ PXF allows users to create custom connectors to access other parallel data store This topic describes how to access JSON data using PXF. +- **[Accessing External SQL Databases](../pxf/JdbcPXF.html)** + + This topic describes how to access data in external SQL databases using PXF. + - **[Writing Data to HDFS](../pxf/HDFSWritablePXF.html)** This topic describes how to write to HDFS using PXF. diff --git a/markdown/pxf/InstallPXFPlugins.html.md.erb b/markdown/pxf/InstallPXFPlugins.html.md.erb index 4fbda74..8060e5e 100644 --- a/markdown/pxf/InstallPXFPlugins.html.md.erb +++ b/markdown/pxf/InstallPXFPlugins.html.md.erb @@ -21,7 +21,7 @@ specific language governing permissions and limitations under the License. --> -This topic describes how to install the built-in PXF service plug-ins that are required to connect PXF to HDFS, Hive, HBase, and JSON. +This topic describes how to install the built-in PXF service plug-ins that are required to connect PXF to HDFS, Hive, HBase, JDBC, and JSON. **Note:** PXF requires that you run Tomcat on the host machine. Tomcat reserves ports 8005, 8080, and 8009. If you have configured Oozie JXM reporting on a host that will run PXF, make sure that the reporting service uses a port other than 8005. This helps to prevent port conflict errors from occurring when you start the PXF service. @@ -48,7 +48,7 @@ Each PXF service plug-in resides in its own RPM. You may have built these RPMs Perform the following steps on **_each_** node in your cluster to install PXF: -1. Install the PXF software, including Apache, the PXF service, and all PXF plug-ins: HDFS, HBase, Hive, JSON: +1. 
Install the PXF software, including Apache, the PXF service, and all PXF plug-ins: HDFS, HBase, Hive, JDBC, JSON: ```shell $ sudo yum install -y pxf diff --git a/markdown/pxf/JdbcPXF.html.md.erb b/markdown/pxf/JdbcPXF.html.md.erb new file mode 100644 index 0000000..ba485b0 --- /dev/null +++ b/markdown/pxf/JdbcPXF.html.md.erb @@ -0,0 +1,213 @@ +--- +title: Accessing External SQL Databases with JDBC (Beta) +--- + + + +Some of your data may already reside in an external SQL database. The PXF JDBC plug-in reads data stored in SQL databases including MySQL, ORACLE, PostgreSQL, and Hive. + +This section describes how to use PXF with JDBC, including an example of creating and querying an external table that accesses data in a MySQL database table. + +## Prerequisites + +Before accessing external SQL databases using HAWQ and PXF, ensure that: + +- The JDBC plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information. +- The JDBC JAR files for the external SQL database are installed on all cluster nodes. +- The file locations of external SQL database JDBC JAR files are added to `pxf-public.classpath`. If you manage your HAWQ cluster with Ambari, add the JARS via the Ambari UI. If you managed your cluster from the command line, edit the `/etc/pxf/conf/pxf-public.classpath` file directly. + + +## Querying External SQL Data +The PXF JDBC plug-in supports the single profile named `Jdbc`. + +Use the following syntax to create a HAWQ external table representing external SQL database tables you access via JDBC:  + +``` sql +CREATE [READABLE | WRITABLE] EXTERNAL TABLE + ( [, ...] | LIKE ) +LOCATION ('pxf://[:]/.. + ?PROFILE=Jdbc[&=[...]]') +FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); +``` + +JDBC-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below. 

| Keyword | Value |
|-------|-------------------------------------|
| \<column\_name\> | The name of the PXF external table column. The PXF \<column\_name\> must exactly match the \<column\_name\> used in the external SQL table.|
| \<data\_type\> | The data type of the PXF external table column. The PXF \<data\_type\> must be equivalent to the data type used for \<column\_name\> in the SQL table.|
| \<host\> | The PXF host. While \<host\> may identify any PXF agent node, use the HDFS NameNode as it is guaranteed to be available in a running HDFS cluster. If HDFS High Availability is enabled, \<host\> must identify the HDFS NameService. |
| \<port\> | The PXF port. If \<port\> is omitted, PXF assumes \<host\> identifies a High Availability HDFS Nameservice and connects to the port number designated by the `pxf_service_port` server configuration parameter value. Default is 51200. |
| \<jdbc-schema-name\> | The schema name. The default schema name is `default`. |
| \<jdbc-database-name\> | The database name. The default database name is determined by the external SQL server. |
| \<jdbc-table-name\> | The table name. |
| PROFILE | The `PROFILE` keyword must specify `Jdbc`. |
| \<custom-option\> | The custom options supported by the `Jdbc` profile are discussed later in this section.|
| FORMAT 'CUSTOM' | The JDBC `CUSTOM` `FORMAT` supports only the built-in `'pxfwritable_import'` `FORMATTER` property. |

*Note*: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.


### JDBC Custom Options

You may include one or more custom options in the `LOCATION` URI. Preface each option with an ampersand `&`.

The JDBC plug-in `Jdbc` profile supports the following \<custom-option\>s:

| Option Name | Description |
|---------------|--------------------|
| JDBC_DRIVER | The JDBC driver class name. |
| DB_URL | The URL to the database; includes the hostname, port, and database name. |
| USER | The database user name. |
| PASS | The database password for USER. |
| PARTITION_BY | The partition column, \<column-name\>:\<column-type\>. The JDBC plug-in supports `date`, `int`, and `enum` \<column-type\>s. Use the `yyyy-MM-dd` format for the `date` \<column-type\>. 
A null `PARTITION_BY` defaults to a single fragment. |
| RANGE | The query range, \<start-value\>\[:\<end-value\>\]. `RANGE` may be empty for an `enum` \<column-type\>. \<end-value\> may be empty for an `int` \<column-type\>. |
| INTERVAL | The interval, \<interval-num\>\[:\<interval-unit\>\], of one fragment. `INTERVAL` may be empty for an `enum` \<column-type\>. \<interval-unit\> may be empty for an `int` \<column-type\>. |

**Note**: The partition `RANGE` interval is left closed, right open. That is, the range includes the \<start-value\> but does *not* include the \<end-value\>.

Example JDBC \<custom-option\> connection string:

``` pre
&JDBC_DRIVER=com.mysql.jdbc.Driver&DB_URL=jdbc:mysql://<host>:<port>/testdb&USER=user1&PASS=changeme
```

Example JDBC \<custom-option\> substrings identifying partitioning parameters:

``` pre
&PARTITION_BY=year:int&RANGE=2011:2013&INTERVAL=1
&PARTITION_BY=createdate:date&RANGE=2013-01-01:2016-01-01&INTERVAL=1:month
&PARTITION_BY=color:enum&RANGE=red:yellow:blue
```

### Example: Using the Jdbc Profile to Access a MySQL Database Table


#### Create a MySQL Table

Perform the following steps to create a MySQL table named `mysql_table1` in the default schema of a database named `mtestdb`:

1. Connect to the default MySQL database as the root user:

    ``` shell
    $ mysql --user=root -p
    ```

2. Create a database named `mtestdb`:

    ``` sql
    mysql> CREATE DATABASE mtestdb;
    ```

2. Assign privileges to `mtestdb` to a user named `muser1` and then exit:

    ``` sql
    mysql> GRANT ALL on mtestdb.* TO 'muser1'@'localhost' IDENTIFIED BY 'muser1';
    mysql> \q
    ```

2. Connect to `mtestdb` as user `muser1`:

    ``` shell
    $ mysql --user=muser1 mtestdb -pmuser1
    ```

2. Create a table named `mysql_table1` and insert some data into this table:

    ``` sql
    mysql> CREATE TABLE mysql_table1(id int);
    mysql> INSERT INTO mysql_table1 VALUES (1);
    mysql> INSERT INTO mysql_table1 VALUES (2);
    mysql> INSERT INTO mysql_table1 VALUES (3);
    ```

3. 
Determine the MySQL server hostname and port: + + ``` sql + mysql> SHOW VARIABLES LIKE 'hostname'; + +---------------+-----------------+ + | Variable_name | Value | + +---------------+-----------------+ + | hostname | mydb.server.com | + +---------------+-----------------+ + + mysql> SHOW VARIABLES LIKE 'port'; + +---------------+-------+ + | Variable_name | Value | + +---------------+-------+ + | port | 3306 | + +---------------+-------+ + 1 row in set (0.00 sec) + ``` + + Make note of the hostname and port values returned. + +4. Construct the JDBC connection string, substituting your MySQL server hostname and port number: + + ``` pre + &JDBC_DRIVER=com.mysql.jdbc.Driver&DB_URL=jdbc:mysql://mydb.server.com:3306/mtestdb&USER=muser1&PASS=muser1 + ``` + + Save this string for use later. + +#### Configure PXF + +If not already present on your system, download and copy the MySQL connector JAR file to your system and update `pxf-public.classpath` with the location of this file. + +For example, if you manage your HAWQ cluster with Ambari, and the MySQL connector jar file is located in the `/usr/share/java` directory, add the following line: + +``` pre +/usr/share/java/mysql-connector*.jar +``` + +in the Ambari **PXF** service **Configs > Advanced pxf-public.classpath** pane and then use the **Restart** button to restart PXF. + +If you manage your HAWQ cluster from the command line, directly edit the `/etc/pxf/conf/pxf-public.classpath` file, adding the MySQL connector JAR file path. Then restart PXF on *each* HAWQ node with the following command: + +``` shell +root@hawq-node$ sudo service pxf-service restart +``` + +#### Query Using the Jdbc Profile + +Perform the following steps to create and query an external PXF table to access the `mysql_table1` table you created in the previous section: + +1. Use the `Jdbc` profile to create an external table to access the MySQL `mysql_table1` table. 
For example: + + ``` sql + gpadmin=# CREATE EXTERNAL TABLE pxf_jdbc_mysql_table1(id int) + LOCATION ('pxf://:51200/mtestdb.mysql_table1?PROFILE=JDBC&JDBC_DRIVER=com.mysql.jdbc.Driver&DB_URL=jdbc:mysql://:/mtestdb&USER=hawquser1&PASS=hawquser1') + FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); + ``` + + Substitute your PXF \, as well as the `DB_URL` string you constructed in the previous exercise. + +2. Display all rows of the `pxf_jdbc_mysql_table1` table: + + ``` sql + gpadmin=# SELECT * FROM pxf_jdbc_mysql_table1; + id + ---- + 1 + 2 + 3 + (3 rows) + ``` + From 0cc020abbbbf8cbd7fc2126cd648c450ff501245 Mon Sep 17 00:00:00 2001 From: Lisa Owen Date: Thu, 22 Jun 2017 13:40:40 -0700 Subject: [PATCH 2/4] clarify that the external db jars are the jdbc driver --- markdown/pxf/JdbcPXF.html.md.erb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/markdown/pxf/JdbcPXF.html.md.erb b/markdown/pxf/JdbcPXF.html.md.erb index ba485b0..058e304 100644 --- a/markdown/pxf/JdbcPXF.html.md.erb +++ b/markdown/pxf/JdbcPXF.html.md.erb @@ -30,7 +30,7 @@ This section describes how to use PXF with JDBC, including an example of creatin Before accessing external SQL databases using HAWQ and PXF, ensure that: - The JDBC plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information. -- The JDBC JAR files for the external SQL database are installed on all cluster nodes. +- The JDBC driver JAR files for the external SQL database are installed on all cluster nodes. - The file locations of external SQL database JDBC JAR files are added to `pxf-public.classpath`. If you manage your HAWQ cluster with Ambari, add the JARS via the Ambari UI. If you managed your cluster from the command line, edit the `/etc/pxf/conf/pxf-public.classpath` file directly. 
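A missing or mis-registered driver JAR typically surfaces only at query time, as a `ClassNotFoundException` from the PXF service. As a quick sanity check that a driver class is actually present in the JAR files you listed in `pxf-public.classpath`, a sketch along these lines may help (the helper names and the `/usr/share/java` path below are illustrative, not part of PXF):

```python
import glob
import zipfile


def jar_contains_class(jar_path, class_name):
    # A compiled Java class appears inside a JAR as path/to/Name.class.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


def find_driver_jar(classpath_patterns, class_name):
    # Scan every JAR matched by the classpath glob patterns for the driver class.
    for pattern in classpath_patterns:
        for jar_path in sorted(glob.glob(pattern)):
            if jar_contains_class(jar_path, class_name):
                return jar_path
    return None  # driver class not found on the configured classpath
```

For example, `find_driver_jar(["/usr/share/java/mysql-connector*.jar"], "com.mysql.jdbc.Driver")` returns the path of the JAR containing the MySQL driver, or `None` if no matching JAR provides it.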
From ececae7fccdff8c5b770c79ea7b66fc00f85aaa4 Mon Sep 17 00:00:00 2001 From: Lisa Owen Date: Mon, 26 Jun 2017 18:23:00 -0700 Subject: [PATCH 3/4] range value cannot be empty --- markdown/pxf/JdbcPXF.html.md.erb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/markdown/pxf/JdbcPXF.html.md.erb b/markdown/pxf/JdbcPXF.html.md.erb index 058e304..ea06db0 100644 --- a/markdown/pxf/JdbcPXF.html.md.erb +++ b/markdown/pxf/JdbcPXF.html.md.erb @@ -78,7 +78,7 @@ The JDBC plug-in `Jdbc` profile supports the following \s: | USER | The database user name. | | PASS | The database password for USER. | | PARTITION_BY | The partition column, \:\. The JDBC plug-in supports `date`, `int`, and `enum` \s. Use the `yyyy-MM-dd` format for the `date` \. A null `PARTITION_BY` defaults to a single fragment. | -| RANGE | The query range, \[:\]. `RANGE` may be empty for an `enum` \. \ may be empty for an `int` \. | +| RANGE | The query range, \[:\]. \ may be empty for an `int` \. | | INTERVAL | The interval, \[:\], of one fragment. `INTERVAL` may be empty for an `enum` \. \ may be empty for an `int` \. | **Note**: The partition `RANGE` interval is left closed, right open. That is, the range includes the \ but does *not* include the \. From 208dbe103968c967774356a5f4ebcb629ec5420c Mon Sep 17 00:00:00 2001 From: Lisa Owen Date: Wed, 28 Jun 2017 10:38:46 -0600 Subject: [PATCH 4/4] incorporate edits from david --- markdown/pxf/JdbcPXF.html.md.erb | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/markdown/pxf/JdbcPXF.html.md.erb b/markdown/pxf/JdbcPXF.html.md.erb index ea06db0..337de66 100644 --- a/markdown/pxf/JdbcPXF.html.md.erb +++ b/markdown/pxf/JdbcPXF.html.md.erb @@ -62,26 +62,24 @@ JDBC-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](.. 
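The left-closed, right-open `RANGE` semantics determine how the plug-in splits an `int` partition column into fragments. The actual fragmenter runs in Java inside the PXF service; the following Python sketch only illustrates the arithmetic for a substring such as `&PARTITION_BY=year:int&RANGE=2011:2013&INTERVAL=1`:

```python
def int_partition_fragments(start, end, interval):
    # Split [start, end) into left-closed, right-open sub-ranges of width
    # `interval`. Each tuple corresponds to one query fragment, i.e.
    # WHERE col >= lo AND col < hi.
    fragments = []
    lo = start
    while lo < end:
        hi = min(lo + interval, end)
        fragments.append((lo, hi))
        lo = hi
    return fragments


# RANGE=2011:2013 with INTERVAL=1 yields two fragments:
print(int_partition_fragments(2011, 2013, 1))  # [(2011, 2012), (2012, 2013)]
```

Note that the end value 2013 itself falls outside every fragment, which is why `RANGE` must extend one interval past the last value you want included.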
| \ | The custom options supported by the `Jdbc` profile are discussed later in this section.| | FORMAT 'CUSTOM' | The JDBC `CUSTOM` `FORMAT` supports only the built-in `'pxfwritable_import'` `FORMATTER` property. | -*Note*: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification. +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification. ### JDBC Custom Options -You may include one or more custom options in the `LOCATION` URI. Preface each option with an ampersand `&`. +You include custom options in the `LOCATION` URI. Preface each option with an ampersand `&`. -The JDBC plug-in `Jdbc` profile supports the following \s: +The JDBC plug-in `Jdbc` profile supports the following \s. The JDBC connection set up options are required. | Option Name | Description |---------------|--------------------| -| JDBC_DRIVER | The JDBC driver class name. | -| DB_URL | The URL to the database; includes the hostname, port, and database name. | -| USER | The database user name. | -| PASS | The database password for USER. | +| JDBC_DRIVER | The JDBC driver class name. (Required) | +| DB_URL | The URL to the database; includes the hostname, port, and database name. (Required) | +| USER | The database user name. (Required) | +| PASS | The database password for USER. (Required) | | PARTITION_BY | The partition column, \:\. The JDBC plug-in supports `date`, `int`, and `enum` \s. Use the `yyyy-MM-dd` format for the `date` \. A null `PARTITION_BY` defaults to a single fragment. | -| RANGE | The query range, \[:\]. \ may be empty for an `int` \. | -| INTERVAL | The interval, \[:\], of one fragment. `INTERVAL` may be empty for an `enum` \. \ may be empty for an `int` \. | - -**Note**: The partition `RANGE` interval is left closed, right open. That is, the range includes the \ but does *not* include the \. +| RANGE | (Used only when `PARTITION_BY` is specified.) The query range, \[:\]. 
\<end-value\> may be empty for an `int` \<column-type\>. The `RANGE` is left closed, right open. That is, the range includes the \<start-value\> but does *not* include the \<end-value\>.|
| INTERVAL | (Used only when `PARTITION_BY` is specified.) The interval, \<interval-num\>\[:\<interval-unit\>\], of one fragment. `INTERVAL` may be empty for an `enum` \<column-type\>. \<interval-unit\> may be empty for an `int` \<column-type\>. |

 Example JDBC \<custom-option\> connection string:
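Each custom option is appended to the `LOCATION` URI as an `&`-prefixed `key=value` pair. A hypothetical helper that assembles such a string may make the URI shape clearer (the function names are illustrative only; PXF itself simply parses the URI it receives):

```python
def jdbc_custom_options(options):
    # Join options into the &key=value substring appended after PROFILE=Jdbc.
    return "".join("&{}={}".format(key, value) for key, value in options.items())


def jdbc_location(host, port, table, options):
    # Assemble the pxf:// LOCATION URI for a Jdbc-profile external table.
    return "pxf://{}:{}/{}?PROFILE=Jdbc{}".format(
        host, port, table, jdbc_custom_options(options))


# Mirrors the MySQL example from this section; dict insertion order is
# preserved (Python 3.7+), so options appear in the order given.
uri = jdbc_location("namenode.example.com", 51200, "mtestdb.mysql_table1",
                    {"JDBC_DRIVER": "com.mysql.jdbc.Driver",
                     "DB_URL": "jdbc:mysql://mydb.server.com:3306/mtestdb",
                     "USER": "muser1", "PASS": "muser1"})
```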