Merged
52 changes: 25 additions & 27 deletions docs/content/jdbc_cfg.html.md.erb
@@ -13,13 +13,11 @@ To access data in an external SQL database with the PXF JDBC Connector, you must
- Register a compatible JDBC driver JAR file
- Specify the JDBC driver class name, database URL, and client credentials

In previous releases of Greenplum Database, you may have specified the JDBC driver class name, database URL, and client credentials via options in the `CREATE EXTERNAL TABLE` command. PXF now supports file-based server configuration for the JDBC Connector. This configuration, described below, allows you to specify these options and credentials in a file.

**Note**: PXF external tables that you previously created that directly specified the JDBC connection options will continue to work. If you want to move these tables to use JDBC file-based server configuration, you must create a server configuration, drop the external tables, and then recreate the tables specifying an appropriate `SERVER=<server_name>` clause.
You can supply the JDBC driver class name, database URL, and client credentials in either of two ways: as options in the `CREATE EXTERNAL TABLE`/`CREATE SERVER` command, or in a file-based PXF server configuration for the JDBC Connector. The sections below describe the file-based approach.

## <a id="cfg_jar"></a>JDBC Driver JAR Registration

PXF is bundled with the `postgresql-42.4.1.jar` JAR file. If you require a different JDBC driver, ensure that you install the JDBC driver JAR file for the external SQL database in the `$PXF_BASE/lib` directory on each Greenplum host. Be sure to install JDBC driver JAR files that are compatible with your JRE version. See [Registering PXF Library Dependencies](reg_jar_depend.html) for additional information.
PXF is bundled with the `postgresql-42.7.2.jar` JAR file. If you require a different JDBC driver, ensure that you install the JDBC driver JAR file for the external SQL database in the `$PXF_BASE/lib` directory on each Apache Cloudberry host. Be sure to install JDBC driver JAR files that are compatible with your JRE version. See [Registering PXF Library Dependencies](reg_jar_depend.html) for additional information.

## <a id="cfg_server"></a>JDBC Server Configuration

@@ -158,7 +156,7 @@ The PXF JDBC Connector uses JDBC connection pooling implemented by [HikariCP](ht

One or more connection pools may exist for a given server configuration, and user access to different external tables specifying the same server may share a connection pool.

**Note**: If you have activated JDBC user impersonation in a server configuration, the JDBC Connector creates a separate connection pool for each Greenplum Database user that accesses any external table specifying that server configuration.
**Note**: If you have activated JDBC user impersonation in a server configuration, the JDBC Connector creates a separate connection pool for each Apache Cloudberry user that accesses any external table specifying that server configuration.

The `jdbc.pool.enabled` property governs JDBC connection pooling for a server configuration. Connection pooling is activated by default. To deactivate JDBC connection pooling for a server configuration, set the property to false:

@@ -184,13 +182,13 @@ You can set other HikariCP-specific connection pooling properties for a server c

#### <a id="jdbcconpool_tune"></a>Tuning the Maximum Connection Pool Size

To not exceed the maximum number of connections allowed by the target database, and at the same time ensure that each PXF JVM services a fair share of the JDBC connections, determine the maximum value of `maximumPoolSize` based on the size of the Greenplum Database cluster as follows:
To avoid exceeding the maximum number of connections allowed by the target database, while ensuring that each PXF JVM services a fair share of the JDBC connections, determine the maximum value of `maximumPoolSize` based on the size of the Apache Cloudberry cluster as follows:

``` pre
max_conns_allowed_by_remote_db / #_greenplum_segment_hosts
max_conns_allowed_by_remote_db / #_cloudberry_segment_hosts
```
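
The formula above can be sketched as a small helper. This is only an illustration, not part of PXF: the function name `max_pool_size` is hypothetical, and it assumes integer (floor) division so that the combined pools across all segment hosts never exceed the remote database limit.

```python
def max_pool_size(max_conns_allowed_by_remote_db: int, segment_hosts: int) -> int:
    """Upper bound for HikariCP `maximumPoolSize` on each PXF JVM.

    Hypothetical helper: divides the remote database's connection limit
    evenly across segment hosts, rounding down (an assumption) so that the
    total number of pooled connections stays within the limit.
    """
    if segment_hosts <= 0:
        raise ValueError("segment_hosts must be a positive integer")
    if max_conns_allowed_by_remote_db < 0:
        raise ValueError("max_conns_allowed_by_remote_db must not be negative")
    return max_conns_allowed_by_remote_db // segment_hosts


if __name__ == "__main__":
    # 160 allowed connections shared by 16 segment hosts
    print(max_pool_size(160, 16))
```

Rounding down is a conservative choice when the limit does not divide evenly; treat the result as a ceiling rather than a recommended setting.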

For example, if your Greenplum Database cluster has 16 segment hosts and the target database allows 160 concurrent connections, calculate `maximumPoolSize` as follows:
For example, if your Apache Cloudberry cluster has 16 segment hosts and the target database allows 160 concurrent connections, calculate `maximumPoolSize` as follows:

``` pre
160 / 16 = 10
@@ -203,9 +201,9 @@ In practice, you may choose to set `maximumPoolSize` to a lower value, since the

The PXF JDBC Connector uses the `jdbc.user` setting or information in the `jdbc.url` to determine the identity of the user to connect to the external data store. When PXF JDBC user impersonation is deactivated (the default), the behavior of the JDBC Connector is further dependent upon the external data store. For example, if you are using the JDBC Connector to access Hive, the Connector uses the settings of certain Hive authentication and impersonation properties to determine the user. You may be required to provide a `jdbc.user` setting, or add properties to the `jdbc.url` setting in the server `jdbc-site.xml` file. Refer to [Configuring Hive Access via the JDBC Connector](hive_jdbc_cfg.html) for more information on this procedure.

When you activate PXF JDBC user impersonation, the PXF JDBC Connector accesses the external data store on behalf of a Greenplum Database end user. The Connector uses the name of the Greenplum Database user that accesses the PXF external table to try to connect to the external data store.
When you activate PXF JDBC user impersonation, the PXF JDBC Connector accesses the external data store on behalf of an Apache Cloudberry end user. The Connector uses the name of the Apache Cloudberry user that accesses the PXF external table to try to connect to the external data store.

When you activate JDBC user impersonation for a PXF server, PXF overrides the value of a `jdbc.user` property setting defined in either `jdbc-site.xml` or `<greenplum_user_name>-user.xml`, or specified in the external table DDL, with the Greenplum Database user name. For user impersonation to work effectively when the external data store requires passwords to authenticate connecting users, you must specify the `jdbc.password` setting for each user that can be impersonated in that user's `<greenplum_user_name>-user.xml` property override file. Refer to [Configuring a PXF User](cfg_server.html#usercfg) for more information about per-server, per-Greenplum-user configuration.
When you activate JDBC user impersonation for a PXF server, PXF overrides the value of a `jdbc.user` property setting defined in either `jdbc-site.xml` or `<cloudberry_user_name>-user.xml`, or specified in the external table DDL, with the Apache Cloudberry user name. For user impersonation to work effectively when the external data store requires passwords to authenticate connecting users, you must specify the `jdbc.password` setting for each user that can be impersonated in that user's `<cloudberry_user_name>-user.xml` property override file. Refer to [Configuring a PXF User](cfg_server.html#usercfg) for more information about per-server, per-Cloudberry-user configuration.

The `pxf.service.user.impersonation` property in the `jdbc-site.xml` configuration file governs JDBC user impersonation.

@@ -214,7 +212,7 @@ The `pxf.service.user.impersonation` property in the `jdbc-site.xml` configurati

By default, PXF JDBC user impersonation is deactivated. Perform the following procedure to turn PXF user impersonation on or off for a JDBC server configuration.

1. Log in to your Greenplum Database coordinator host as the administrative user:
1. Log in to your Apache Cloudberry coordinator host as the administrative user:

``` shell
$ ssh gpadmin@<coordinator>
@@ -239,7 +237,7 @@ By default, PXF JDBC user impersonation is deactivated. Perform the following p

7. Save the `jdbc-site.xml` file and exit the editor.

8. Use the `pxf cluster sync` command to synchronize the PXF JDBC server configuration to your Greenplum Database cluster:
8. Use the `pxf cluster sync` command to synchronize the PXF JDBC server configuration to your Apache Cloudberry cluster:

``` shell
gpadmin@coordinator$ pxf cluster sync
@@ -266,9 +264,9 @@ After establishing the database connection, PXF implicitly runs the following co
SET SESSION_USER = bill
```

PXF recognizes a synthetic property value, `${pxf.session.user}`, that identifies the Greenplum Database user name. You may choose to use this value when you configure a property that requires a value that changes based on the Greenplum user running the session.
PXF recognizes a synthetic property value, `${pxf.session.user}`, that identifies the Apache Cloudberry user name. You may choose to use this value when you configure a property that requires a value that changes based on the Cloudberry user running the session.

A scenario where you might use `${pxf.session.user}` is when you authenticate to the remote SQL database with Kerberos, the primary component of the Kerberos principal identifies the Greenplum Database user name, and you want to run queries in the remote database using this effective user name. For example, if you are accessing DB2, you would configure your `jdbc-site.xml` to specify the Kerberos `securityMechanism` and `KerberosServerPrincipal`, and then set the `session_user` variable as follows:
A scenario where you might use `${pxf.session.user}` is when you authenticate to the remote SQL database with Kerberos, the primary component of the Kerberos principal identifies the Apache Cloudberry user name, and you want to run queries in the remote database using this effective user name. For example, if you are accessing DB2, you would configure your `jdbc-site.xml` to specify the Kerberos `securityMechanism` and `KerberosServerPrincipal`, and then set the `session_user` variable as follows:

``` xml
<property>
@@ -277,13 +275,13 @@ A scenario where you might use `${pxf.session.user}` is when you authenticate to
</property>
```

With this configuration, PXF `SET`s the DB2 `session_user` variable to the current Greenplum Database user name, and runs subsequent operations on the DB2 table as that user.
With this configuration, PXF `SET`s the DB2 `session_user` variable to the current Apache Cloudberry user name, and runs subsequent operations on the DB2 table as that user.

### <a id="sessauth_conpool"></a>Session Authorization Considerations for Connection Pooling

When PXF performs session authorization on your behalf and JDBC connection pooling is activated (the default), you may choose to set the `jdbc.pool.qualifier` property. Setting this property instructs PXF to include the property value in the criteria that it uses to create and reuse connection pools. In practice, you would not set this to a fixed value, but rather to a value that changes based on the user/session/transaction, etc. When you set this property to `${pxf.session.user}`, PXF includes the Greenplum Database user name in the criteria that it uses to create and re-use connection pools. The default setting is no qualifier.
When PXF performs session authorization on your behalf and JDBC connection pooling is activated (the default), you may choose to set the `jdbc.pool.qualifier` property. Setting this property instructs PXF to include the property value in the criteria that it uses to create and reuse connection pools. In practice, you would not set this to a fixed value, but rather to a value that changes based on the user/session/transaction, etc. When you set this property to `${pxf.session.user}`, PXF includes the Apache Cloudberry user name in the criteria that it uses to create and re-use connection pools. The default setting is no qualifier.

To make use of this feature, add or uncomment the following property block in `jdbc-site.xml` to prompt PXF to include the Greenplum user name in connection pool creation/reuse criteria:
To make use of this feature, add or uncomment the following property block in `jdbc-site.xml` to prompt PXF to include the Cloudberry user name in connection pool creation/reuse criteria:

``` xml
<property>
@@ -299,10 +297,10 @@ A PXF *named query* is a static query that you configure, and that PXF runs in t
To configure and use a PXF JDBC named query:

1. You [define the query](#namedquery_define) in a text file.
2. You provide the [query name](#namedquery_pub) to Greenplum Database users.
3. The Greenplum Database user [references the query](#namedquery_ref) in a Greenplum Database external table definition.
2. You provide the [query name](#namedquery_pub) to Apache Cloudberry users.
3. The Apache Cloudberry user [references the query](#namedquery_ref) in an Apache Cloudberry external table definition.

PXF runs the query each time the user invokes a `SELECT` command on the Greenplum Database external table.
PXF runs the query each time the user invokes a `SELECT` command on the Apache Cloudberry external table.


### <a id="namedquery_define"></a>Defining a Named Query
@@ -326,13 +324,13 @@ You may optionally provide the ending semicolon (`;`) for the SQL statement.

### <a id="namedquery_pub"></a>Query Naming

The Greenplum Database user references a named query by specifying the query file name without the extension. For example, if you define a query in a file named `report.sql`, the name of that query is `report`.
The Apache Cloudberry user references a named query by specifying the query file name without the extension. For example, if you define a query in a file named `report.sql`, the name of that query is `report`.

Named queries are associated with a specific JDBC server configuration. You will provide the available query names to the Greenplum Database users that you allow to create external tables using the server configuration.
Named queries are associated with a specific JDBC server configuration. You will provide the available query names to the Apache Cloudberry users that you allow to create external tables using the server configuration.
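
As a rough illustration of the naming rule, the sketch below derives a query name from a query file path and assembles the `pxf://query:<query_name>` location prefix with the `PROFILE=jdbc` and `SERVER` options. The helper names `named_query_name` and `named_query_location` are hypothetical and not part of PXF, and any further `LOCATION` options are left out.

```python
from pathlib import Path


def named_query_name(query_file: str) -> str:
    """Return the query name: the file name without its extension."""
    return Path(query_file).stem


def named_query_location(query_file: str, server: str) -> str:
    """Build the start of a LOCATION URI that references a named query.

    Hypothetical helper for illustration; it only covers the query name,
    the jdbc profile, and the server name.
    """
    return f"pxf://query:{named_query_name(query_file)}?PROFILE=jdbc&SERVER={server}"


if __name__ == "__main__":
    print(named_query_name("report.sql"))
    print(named_query_location("report.sql", "mydb"))
```

Because the query name comes from the file name alone, two query files with the same base name cannot coexist in one server configuration directory.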

#### <a id="namedquery_ref"></a>Referencing a Named Query

The Greenplum Database user specifies `query:<query_name>` rather than the name of a remote SQL database table when they create the external table. For example, if the query is defined in the file `$PXF_BASE/servers/mydb/report.sql`, the `CREATE EXTERNAL TABLE` `LOCATION` clause would include the following components:
The Apache Cloudberry user specifies `query:<query_name>` rather than the name of a remote SQL database table when they create the external/foreign table. For example, if the query is defined in the file `$PXF_BASE/servers/mydb/report.sql`, the `CREATE EXTERNAL TABLE` `LOCATION` clause would include the following components:

``` sql
LOCATION ('pxf://query:report?PROFILE=jdbc&SERVER=mydb ...')
@@ -352,15 +350,15 @@ You can use the JDBC Connector to access Hive. Refer to [Configuring the JDBC Co

## <a id="cfg_proc"></a>Example Configuration Procedure

In this procedure, you name and add a PXF JDBC server configuration for a PostgreSQL database and synchronize the server configuration(s) to the Greenplum Database cluster.
In this procedure, you name and add a PXF JDBC server configuration for a PostgreSQL database and synchronize the server configuration(s) to the Apache Cloudberry cluster.

1. Log in to your Greenplum Database coordinator host:
1. Log in to your Apache Cloudberry coordinator host:

``` shell
$ ssh gpadmin@<coordinator>
```

2. Choose a name for the JDBC server. You will provide the name to Greenplum users that you choose to allow to reference tables in the external SQL database as the configured user.
2. Choose a name for the JDBC server. You will provide the name to Cloudberry users that you choose to allow to reference tables in the external SQL database as the configured user.

**Note**: The server name `default` is reserved.

@@ -401,7 +399,7 @@ In this procedure, you name and add a PXF JDBC server configuration for a Postgr
```
6. Save your changes and exit the editor.

7. Use the `pxf cluster sync` command to copy the new server configuration to the Greenplum Database cluster:
7. Use the `pxf cluster sync` command to copy the new server configuration to the Apache Cloudberry cluster:

``` shell
gpadmin@coordinator$ pxf cluster sync