From cf27ae0a9791eaca8203304c96bbf3dcbd37d48a Mon Sep 17 00:00:00 2001 From: Nikolay Antonov Date: Wed, 15 Apr 2026 10:49:26 +0500 Subject: [PATCH 1/2] Docs: update JDBC-related docs --- docs/content/jdbc_cfg.html.md.erb | 52 ++++++------ docs/content/jdbc_pxf.html.md.erb | 85 +++++++++++++------- docs/content/jdbc_pxf_mysql.html.md.erb | 26 ++++-- docs/content/jdbc_pxf_named.html.md.erb | 30 +++++-- docs/content/jdbc_pxf_postgresql.html.md.erb | 18 ++++- docs/content/jdbc_pxf_trino.html.md.erb | 19 +++-- 6 files changed, 150 insertions(+), 80 deletions(-) diff --git a/docs/content/jdbc_cfg.html.md.erb b/docs/content/jdbc_cfg.html.md.erb index 1d572b562..b0923ec83 100644 --- a/docs/content/jdbc_cfg.html.md.erb +++ b/docs/content/jdbc_cfg.html.md.erb @@ -13,13 +13,11 @@ To access data in an external SQL database with the PXF JDBC Connector, you must - Register a compatible JDBC driver JAR file - Specify the JDBC driver class name, database URL, and client credentials -In previous releases of Greenplum Database, you may have specified the JDBC driver class name, database URL, and client credentials via options in the `CREATE EXTERNAL TABLE` command. PXF now supports file-based server configuration for the JDBC Connector. This configuration, described below, allows you to specify these options and credentials in a file. - -**Note**: PXF external tables that you previously created that directly specified the JDBC connection options will continue to work. If you want to move these tables to use JDBC file-based server configuration, you must create a server configuration, drop the external tables, and then recreate the tables specifying an appropriate `SERVER=` clause. +You can supply the JDBC driver class name, database URL, and client credentials in either of two ways: as options in the `CREATE EXTERNAL TABLE`/`CREATE SERVER` command, or in a file-based PXF server configuration for the JDBC Connector. The sections below describe the file-based approach. 
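To make the difference concrete, here is a hedged sketch of both approaches (the server name `mydb`, table names, and credentials are illustrative, not part of this patch):

``` sql
-- File-based: connection details live in the server configuration
-- (for example, $PXF_BASE/servers/mydb/jdbc-site.xml); the DDL names only the server.
CREATE EXTERNAL TABLE ext_names (id int, name text)
  LOCATION ('pxf://public.names?PROFILE=jdbc&SERVER=mydb')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');

-- Inline: the driver class, database URL, and credentials appear in the DDL itself.
CREATE EXTERNAL TABLE ext_names_inline (id int, name text)
  LOCATION ('pxf://public.names?PROFILE=jdbc&JDBC_DRIVER=org.postgresql.Driver&DB_URL=jdbc:postgresql://pgserverhost:5432/pgtestdb&USER=pguser1&PASS=changeme')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
```

Note that credentials embedded inline are visible in the table definition; the file-based approach keeps them out of the DDL.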
## JDBC Driver JAR Registration -PXF is bundled with the `postgresql-42.4.1.jar` JAR file. If you require a different JDBC driver, ensure that you install the JDBC driver JAR file for the external SQL database in the `$PXF_BASE/lib` directory on each Greenplum host. Be sure to install JDBC driver JAR files that are compatible with your JRE version. See [Registering PXF Library Dependencies](reg_jar_depend.html) for additional information. +PXF is bundled with the `postgresql-42.7.2.jar` JAR file. If you require a different JDBC driver, ensure that you install the JDBC driver JAR file for the external SQL database in the `$PXF_BASE/lib` directory on each Apache Cloudberry host. Be sure to install JDBC driver JAR files that are compatible with your JRE version. See [Registering PXF Library Dependencies](reg_jar_depend.html) for additional information. ## JDBC Server Configuration @@ -158,7 +156,7 @@ The PXF JDBC Connector uses JDBC connection pooling implemented by [HikariCP](ht One or more connection pools may exist for a given server configuration, and user access to different external tables specifying the same server may share a connection pool. -**Note**: If you have activated JDBC user impersonation in a server configuration, the JDBC Connector creates a separate connection pool for each Greenplum Database user that accesses any external table specifying that server configuration. +**Note**: If you have activated JDBC user impersonation in a server configuration, the JDBC Connector creates a separate connection pool for each Apache Cloudberry user that accesses any external table specifying that server configuration. The `jdbc.pool.enabled` property governs JDBC connection pooling for a server configuration. Connection pooling is activated by default. 
To deactivate JDBC connection pooling for a server configuration, set the property to false:

@@ -184,13 +182,13 @@ You can set other HikariCP-specific connection pooling properties for a server c

#### Tuning the Maximum Connection Pool Size

-To not exceed the maximum number of connections allowed by the target database, and at the same time ensure that each PXF JVM services a fair share of the JDBC connections, determine the maximum value of `maximumPoolSize` based on the size of the Greenplum Database cluster as follows:
+To not exceed the maximum number of connections allowed by the target database, and at the same time ensure that each PXF JVM services a fair share of the JDBC connections, determine the maximum value of `maximumPoolSize` based on the size of the Apache Cloudberry cluster as follows:

``` pre
-max_conns_allowed_by_remote_db / #_greenplum_segment_hosts
+max_conns_allowed_by_remote_db / #_cloudberry_segment_hosts
```

-For example, if your Greenplum Database cluster has 16 segment hosts and the target database allows 160 concurrent connections, calculate `maximumPoolSize` as follows:
+For example, if your Apache Cloudberry cluster has 16 segment hosts and the target database allows 160 concurrent connections, calculate `maximumPoolSize` as follows:

``` pre
160 / 16 = 10
@@ -203,9 +201,9 @@ In practice, you may choose to set `maximumPoolSize` to a lower value, since the

The PXF JDBC Connector uses the `jdbc.user` setting or information in the `jdbc.url` to determine the identity of the user to connect to the external data store. When PXF JDBC user impersonation is deactivated (the default), the behavior of the JDBC Connector is further dependent upon the external data store. For example, if you are using the JDBC Connector to access Hive, the Connector uses the settings of certain Hive authentication and impersonation properties to determine the user.
You may be required to provide a `jdbc.user` setting, or add properties to the `jdbc.url` setting in the server `jdbc-site.xml` file. Refer to [Configuring Hive Access via the JDBC Connector](hive_jdbc_cfg.html) for more information on this procedure.

-When you activate PXF JDBC user impersonation, the PXF JDBC Connector accesses the external data store on behalf of a Greenplum Database end user. The Connector uses the name of the Greenplum Database user that accesses the PXF external table to try to connect to the external data store.
+When you activate PXF JDBC user impersonation, the PXF JDBC Connector accesses the external data store on behalf of an Apache Cloudberry end user. The Connector uses the name of the Apache Cloudberry user that accesses the PXF external table to try to connect to the external data store.

-When you activate JDBC user impersonation for a PXF server, PXF overrides the value of a `jdbc.user` property setting defined in either `jdbc-site.xml` or `-user.xml`, or specified in the external table DDL, with the Greenplum Database user name. For user impersonation to work effectively when the external data store requires passwords to authenticate connecting users, you must specify the `jdbc.password` setting for each user that can be impersonated in that user's `-user.xml` property override file. Refer to [Configuring a PXF User](cfg_server.html#usercfg) for more information about per-server, per-Greenplum-user configuration.
+When you activate JDBC user impersonation for a PXF server, PXF overrides the value of a `jdbc.user` property setting defined in either `jdbc-site.xml` or `-user.xml`, or specified in the external table DDL, with the Apache Cloudberry user name. For user impersonation to work effectively when the external data store requires passwords to authenticate connecting users, you must specify the `jdbc.password` setting for each user that can be impersonated in that user's `-user.xml` property override file.
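For example, if impersonation is activated and a hypothetical Cloudberry user `bill` must authenticate to the external database with a password, you could supply it in a `bill-user.xml` override file in the server configuration directory (the file name and password value shown are illustrative):

``` xml
<configuration>
    <property>
        <name>jdbc.password</name>
        <value>bills-password</value>
    </property>
</configuration>
```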
Refer to [Configuring a PXF User](cfg_server.html#usercfg) for more information about per-server, per-Cloudberry-user configuration. The `pxf.service.user.impersonation` property in the `jdbc-site.xml` configuration file governs JDBC user impersonation. @@ -214,7 +212,7 @@ The `pxf.service.user.impersonation` property in the `jdbc-site.xml` configurati By default, PXF JDBC user impersonation is deactivated. Perform the following procedure to turn PXF user impersonation on or off for a JDBC server configuration. -1. Log in to your Greenplum Database coordinator host as the administrative user: +1. Log in to your Apache Cloudberry coordinator host as the administrative user: ``` shell $ ssh gpadmin@ @@ -239,7 +237,7 @@ By default, PXF JDBC user impersonation is deactivated. Perform the following p 7. Save the `jdbc-site.xml` file and exit the editor. -8. Use the `pxf cluster sync` command to synchronize the PXF JDBC server configuration to your Greenplum Database cluster: +8. Use the `pxf cluster sync` command to synchronize the PXF JDBC server configuration to your Apache Cloudberry cluster: ``` shell gpadmin@coordinator$ pxf cluster sync @@ -266,9 +264,9 @@ After establishing the database connection, PXF implicitly runs the following co SET SESSION_USER = bill ``` -PXF recognizes a synthetic property value, `${pxf.session.user}`, that identifies the Greenplum Database user name. You may choose to use this value when you configure a property that requires a value that changes based on the Greenplum user running the session. +PXF recognizes a synthetic property value, `${pxf.session.user}`, that identifies the Apache Cloudberry user name. You may choose to use this value when you configure a property that requires a value that changes based on the Cloudberry user running the session. 
-A scenario where you might use `${pxf.session.user}` is when you authenticate to the remote SQL database with Kerberos, the primary component of the Kerberos principal identifies the Greenplum Database user name, and you want to run queries in the remote database using this effective user name. For example, if you are accessing DB2, you would configure your `jdbc-site.xml` to specify the Kerberos `securityMechanism` and `KerberosServerPrincipal`, and then set the `session_user` variable as follows:
+A scenario where you might use `${pxf.session.user}` is when you authenticate to the remote SQL database with Kerberos, the primary component of the Kerberos principal identifies the Apache Cloudberry user name, and you want to run queries in the remote database using this effective user name. For example, if you are accessing DB2, you would configure your `jdbc-site.xml` to specify the Kerberos `securityMechanism` and `KerberosServerPrincipal`, and then set the `session_user` variable as follows:

``` xml
<property>
    <name>jdbc.session.property.session_user</name>
    <value>${pxf.session.user}</value>
</property>
```

-With this configuration, PXF `SET`s the DB2 `session_user` variable to the current Greenplum Database user name, and runs subsequent operations on the DB2 table as that user.
+With this configuration, PXF `SET`s the DB2 `session_user` variable to the current Apache Cloudberry user name, and runs subsequent operations on the DB2 table as that user.

### Session Authorization Considerations for Connection Pooling

-When PXF performs session authorization on your behalf and JDBC connection pooling is activated (the default), you may choose to set the `jdbc.pool.qualifier` property. Setting this property instructs PXF to include the property value in the criteria that it uses to create and reuse connection pools. In practice, you would not set this to a fixed value, but rather to a value that changes based on the user/session/transaction, etc.
When you set this property to `${pxf.session.user}`, PXF includes the Greenplum Database user name in the criteria that it uses to create and re-use connection pools. The default setting is no qualifier.
+When PXF performs session authorization on your behalf and JDBC connection pooling is activated (the default), you may choose to set the `jdbc.pool.qualifier` property. Setting this property instructs PXF to include the property value in the criteria that it uses to create and reuse connection pools. In practice, you would not set this to a fixed value, but rather to a value that changes based on the user/session/transaction, etc. When you set this property to `${pxf.session.user}`, PXF includes the Apache Cloudberry user name in the criteria that it uses to create and re-use connection pools. The default setting is no qualifier.

-To make use of this feature, add or uncomment the following property block in `jdbc-site.xml` to prompt PXF to include the Greenplum user name in connection pool creation/reuse criteria:
+To make use of this feature, add or uncomment the following property block in `jdbc-site.xml` to prompt PXF to include the Cloudberry user name in connection pool creation/reuse criteria:

``` xml
<property>
    <name>jdbc.pool.qualifier</name>
    <value>${pxf.session.user}</value>
</property>
```

## JDBC Named Query Configuration

A PXF *named query* is a static query that you configure, and that PXF runs in the remote SQL database.

To configure and use a PXF JDBC named query:

1. You [define the query](#namedquery_define) in a text file.
-2. You provide the [query name](#namedquery_pub) to Greenplum Database users.
-3. The Greenplum Database user [references the query](#namedquery_ref) in a Greenplum Database external table definition.
+2. You provide the [query name](#namedquery_pub) to Apache Cloudberry users.
+3. The Apache Cloudberry user [references the query](#namedquery_ref) in an Apache Cloudberry external table definition.

-PXF runs the query each time the user invokes a `SELECT` command on the Greenplum Database external table.
+PXF runs the query each time the user invokes a `SELECT` command on the Apache Cloudberry external table. ### Defining a Named Query @@ -326,13 +324,13 @@ You may optionally provide the ending semicolon (`;`) for the SQL statement. ### Query Naming -The Greenplum Database user references a named query by specifying the query file name without the extension. For example, if you define a query in a file named `report.sql`, the name of that query is `report`. +The Apache Cloudberry user references a named query by specifying the query file name without the extension. For example, if you define a query in a file named `report.sql`, the name of that query is `report`. -Named queries are associated with a specific JDBC server configuration. You will provide the available query names to the Greenplum Database users that you allow to create external tables using the server configuration. +Named queries are associated with a specific JDBC server configuration. You will provide the available query names to the Apache Cloudberry users that you allow to create external tables using the server configuration. #### Referencing a Named Query -The Greenplum Database user specifies `query:` rather than the name of a remote SQL database table when they create the external table. For example, if the query is defined in the file `$PXF_BASE/servers/mydb/report.sql`, the `CREATE EXTERNAL TABLE` `LOCATION` clause would include the following components: +The Apache Cloudberry user specifies `query:` rather than the name of a remote SQL database table when they create the external/foreign table. For example, if the query is defined in the file `$PXF_BASE/servers/mydb/report.sql`, the `CREATE EXTERNAL TABLE` `LOCATION` clause would include the following components: ``` sql LOCATION ('pxf://query:report?PROFILE=jdbc&SERVER=mydb ...') @@ -352,15 +350,15 @@ You can use the JDBC Connector to access Hive. 
Refer to [Configuring the JDBC Co ## Example Configuration Procedure -In this procedure, you name and add a PXF JDBC server configuration for a PostgreSQL database and synchronize the server configuration(s) to the Greenplum Database cluster. +In this procedure, you name and add a PXF JDBC server configuration for a PostgreSQL database and synchronize the server configuration(s) to the Apache Cloudberry cluster. -1. Log in to your Greenplum Database coordinator host: +1. Log in to your Apache Cloudberry coordinator host: ``` shell $ ssh gpadmin@ ``` -2. Choose a name for the JDBC server. You will provide the name to Greenplum users that you choose to allow to reference tables in the external SQL database as the configured user. +2. Choose a name for the JDBC server. You will provide the name to Cloudberry users that you choose to allow to reference tables in the external SQL database as the configured user. **Note**: The server name `default` is reserved. @@ -401,7 +399,7 @@ In this procedure, you name and add a PXF JDBC server configuration for a Postgr ``` 6. Save your changes and exit the editor. -7. Use the `pxf cluster sync` command to copy the new server configuration to the Greenplum Database cluster: +7. Use the `pxf cluster sync` command to copy the new server configuration to the Apache Cloudberry cluster: ``` shell gpadmin@coordinator$ pxf cluster sync diff --git a/docs/content/jdbc_pxf.html.md.erb b/docs/content/jdbc_pxf.html.md.erb index e562a6428..a016c2ed8 100644 --- a/docs/content/jdbc_pxf.html.md.erb +++ b/docs/content/jdbc_pxf.html.md.erb @@ -32,9 +32,9 @@ This section describes how to use the PXF JDBC connector to access data in an ex Before you access an external SQL database using the PXF JDBC connector, ensure that: - You can identify the PXF runtime configuration directory (`$PXF_BASE`). -- You have configured PXF, and PXF is running on each Greenplum Database host. See [Configuring PXF](instcfg_pxf.html) for additional information. 
-- Connectivity exists between all Greenplum Database hosts and the external SQL database.
-- You have configured your external SQL database for user access from all Greenplum Database hosts.
+- You have configured PXF, and PXF is running on each Apache Cloudberry host. See [Configuring PXF](instcfg_pxf.html) for additional information.
+- Connectivity exists between all Apache Cloudberry hosts and the external SQL database.
+- You have configured your external SQL database for user access from all Apache Cloudberry hosts.
- You have registered any JDBC driver JAR dependencies.
- (Recommended) You have created one or more named PXF JDBC connector server configurations as described in [Configuring the PXF JDBC Connector](jdbc_cfg.html).

@@ -68,11 +68,25 @@ PXF includes version 1.1.0 of the Hive JDBC driver. This version does **not** su

| BYTEA | N/A | N/A | Read, Write |

## Accessing an External SQL Database

-The PXF JDBC connector supports a single profile named `jdbc`. You can both read data from and write data to an external SQL database table with this profile. You can also use the connector to run a static, named query in external SQL database and read the results.
+The PXF JDBC connector supports a single profile named `jdbc` for external tables, and a foreign data wrapper named `jdbc_pxf_fdw` for foreign tables. You can both read data from and write data to an external SQL database table with either interface. You can also use the connector to run a static, named query in the external SQL database and read the results.

-To access data in a remote SQL database, you create a readable or writable Greenplum Database external table that references the remote database table. The Greenplum Database external table and the remote database table or query result tuple must have the same definition; the column names and types must match.
+To access data in a remote SQL database, you create a readable or writable Apache Cloudberry external table that references the remote database table.
The Apache Cloudberry external table and the remote database table or query result tuple must have the same definition; the column names and types must match.

-Use the following syntax to create a Greenplum Database external table that references a remote SQL database table or a query result from the remote database:
+Use the following syntax to create an Apache Cloudberry foreign table that references a remote SQL database table or a query result from the remote database:

+
+CREATE SERVER "my_server" FOREIGN DATA WRAPPER jdbc_pxf_fdw;
+CREATE USER MAPPING FOR CURRENT_USER SERVER "my_server";
+CREATE FOREIGN TABLE <table_name>
+    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+    SERVER "my_server"
+    OPTIONS (
+        resource '<external-table-name>|query:<query_name>'
+        [, <custom-option> '<value>' [, ...]]
+    )
+
+

+Or create an Apache Cloudberry external table:
 CREATE [READABLE | WRITABLE] EXTERNAL TABLE <table_name>
@@ -82,7 +96,7 @@ FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'|'pxfwritable_export');
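For instance, the foreign table syntax shown earlier can be instantiated as follows (the server name, table, and columns are illustrative):

``` sql
CREATE SERVER "mydb" FOREIGN DATA WRAPPER jdbc_pxf_fdw;
CREATE USER MAPPING FOR CURRENT_USER SERVER "mydb";
CREATE FOREIGN TABLE pxf_names (id int, name text)
    SERVER "mydb"
    OPTIONS (resource 'public.names');
```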
 
-The specific keywords and values used in the Greenplum Database [CREATE EXTERNAL TABLE](https://docs.vmware.com/en/VMware-Greenplum/6/greenplum-database/ref_guide-sql_commands-CREATE_EXTERNAL_TABLE.html) command are described in the table below. +The specific keywords and values used in the Apache Cloudberry [CREATE FOREIGN TABLE](https://cloudberry.apache.org/docs/sql-stmts/create-foreign-table) / [CREATE EXTERNAL TABLE](https://cloudberry.apache.org/docs/sql-stmts/create-external-table/) command are described in the table below. | Keyword | Value | |-------|-------------------------------------| @@ -97,7 +111,7 @@ The specific keywords and values used in the Greenplum Database [CREATE EXTERNAL ### JDBC Custom Options -You include JDBC connector custom options in the `LOCATION` URI, prefacing each option with an ampersand `&`. `CREATE EXTERNAL TABLE` \s supported by the `jdbc` profile include: +You include JDBC connector custom options in the `OPTIONS` or `LOCATION` URI (prefacing each option with an ampersand `&`). `CREATE FOREIGN TABLE` / `CREATE EXTERNAL TABLE` \s supported by the `jdbc` profile include: | Option Name | Operation | Description |---------------|------------|--------| @@ -111,7 +125,6 @@ You include JDBC connector custom options in the `LOCATION` URI, prefacing each | INTERVAL | Read | Required when `PARTITION_BY` is specified and of the `int`, `bigint`, or `date` type. The interval, \[:\], of one fragment. Used with `RANGE` as a hint to aid the creation of partitions. Specify the size of the fragment in \. If the partition column is a `date` type, use the \ to specify `year`, `month`, or `day`. PXF ignores `INTERVAL` when the `PARTITION_BY` column is of the `enum` type. | | QUOTE_COLUMNS | Read | Controls whether PXF should quote column names when constructing an SQL query to the external database. Specify `true` to force PXF to quote all column names; PXF does not quote column names if any other value is provided. 
If `QUOTE_COLUMNS` is not specified (the default), PXF automatically quotes *all* column names in the query when *any* column name:
- includes special characters, or
- is mixed case and the external database does not support unquoted mixed case identifiers. |
-
#### Batching Insert Operations (Write)

*When the JDBC driver of the external SQL database supports it*, batching of `INSERT` operations may significantly increase performance.

@@ -152,9 +165,9 @@ To deactivate or activate a thread pool and set the pool size, create the PXF ex

#### Partitioning (Read)

-The PXF JDBC connector supports simultaneous read access from PXF instances running on multiple Greenplum Database hosts to an external SQL table. This feature is referred to as partitioning. Read partitioning is not activated by default. To activate read partitioning, set the `PARTITION_BY`, `RANGE`, and `INTERVAL` custom options when you create the PXF external table.
+The PXF JDBC connector supports simultaneous read access from PXF instances running on multiple Apache Cloudberry hosts to an external SQL table. This feature is referred to as partitioning. Read partitioning is not activated by default. To activate read partitioning, set the `PARTITION_BY`, `RANGE`, and `INTERVAL` custom options when you create the PXF external table.

-PXF uses the `RANGE` and `INTERVAL` values and the `PARTITON_BY` column that you specify to assign specific data rows in the external table to PXF instances running on the Greenplum Database segment hosts. This column selection is specific to PXF processing, and has no relationship to a partition column that you may have specified for the table in the external SQL database.
+PXF uses the `RANGE` and `INTERVAL` values and the `PARTITION_BY` column that you specify to assign specific data rows in the external table to PXF instances running on the Apache Cloudberry segment hosts. This column selection is specific to PXF processing, and has no relationship to a partition column that you may have specified for the table in the external SQL database.
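As a hedged sketch, a partitioned read might be declared like this (the server and table names are illustrative):

``` sql
CREATE EXTERNAL TABLE pxf_sales (id int, amount int)
  LOCATION ('pxf://public.sales?PROFILE=jdbc&SERVER=mydb&PARTITION_BY=id:int&RANGE=1:5&INTERVAL=2')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
```

With `RANGE=1:5` and `INTERVAL=2`, PXF generates interval fragments for `id`, plus the implicitly-generated below-range, end-bounded, and `IS NULL` fragments.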
Example JDBC \ substrings that identify partitioning parameters: @@ -176,9 +189,9 @@ For example, when a user queries a PXF external table created with a `LOCATION` - Fragment 4: WHERE (id >= 5) - implicitly-generated fragment for RANGE end-bounded interval - Fragment 5: WHERE (id IS NULL) - implicitly-generated fragment -PXF distributes the fragments among Greenplum Database segments. A PXF instance running on a segment host spawns a thread for each segment on that host that services a fragment. If the number of fragments is less than or equal to the number of Greenplum segments configured on a segment host, a single PXF instance may service all of the fragments. Each PXF instance sends its results back to Greenplum Database, where they are collected and returned to the user. +PXF distributes the fragments among Apache Cloudberry segments. A PXF instance running on a segment host spawns a thread for each segment on that host that services a fragment. If the number of fragments is less than or equal to the number of Cloudberry segments configured on a segment host, a single PXF instance may service all of the fragments. Each PXF instance sends its results back to Apache Cloudberry, where they are collected and returned to the user. -When you specify the `PARTITION_BY` option, tune the `INTERVAL` value and unit based upon the optimal number of JDBC connections to the target database and the optimal distribution of external data across Greenplum Database segments. The `INTERVAL` low boundary is driven by the number of Greenplum Database segments while the high boundary is driven by the acceptable number of JDBC connections to the target database. The `INTERVAL` setting influences the number of fragments, and should ideally not be set too high nor too low. Testing with multiple values may help you select the optimal settings. 
+When you specify the `PARTITION_BY` option, tune the `INTERVAL` value and unit based upon the optimal number of JDBC connections to the target database and the optimal distribution of external data across Apache Cloudberry segments. The `INTERVAL` low boundary is driven by the number of Apache Cloudberry segments while the high boundary is driven by the acceptable number of JDBC connections to the target database. The `INTERVAL` setting influences the number of fragments, and should ideally not be set too high nor too low. Testing with multiple values may help you select the optimal settings. ## Examples @@ -197,12 +210,12 @@ The PXF JDBC Connector allows you to specify a statically-defined query to run a - You need to join several tables that all reside in the same external database. - You want to perform complex aggregation closer to the data source. - You would use, but are not allowed to create, a `VIEW` in the external database. -- You would rather consume computational resources in the external system to minimize utilization of Greenplum Database resources. +- You would rather consume computational resources in the external system to minimize utilization of Apache Cloudberry resources. - You want to run a HIVE query and control resource utilization via YARN. -The Greenplum Database administrator defines a query and provides you with the query name to use when you create the external table. Instead of a table name, you specify `query:` in the `CREATE EXTERNAL TABLE` `LOCATION` clause to instruct the PXF JDBC connector to run the static query named `` in the remote SQL database. +The Apache Cloudberry administrator defines a query and provides you with the query name to use when you create the external table. Instead of a table name, you specify `query:` in the `CREATE EXTERNAL TABLE` `LOCATION` clause to instruct the PXF JDBC connector to run the static query named `` in the remote SQL database. -PXF supports named queries only with readable external tables. 
You must create a unique Greenplum Database readable external table for each query that you want to run.
+PXF supports named queries only with readable external tables. You must create a unique Apache Cloudberry readable external table for each query that you want to run.

The names and types of the external table columns must exactly match the names, types, and order of the columns returned by the query result. If the query returns the results of an aggregation or other function, be sure to use the `AS` qualifier to specify a specific column name.

@@ -222,8 +235,14 @@ SELECT c.name, sum(o.amount) AS total, o.month
  GROUP BY c.name, o.month
```

-This query returns tuples of type `(name text, total int, month int)`. If the `order_rpt` query is defined for the PXF JDBC server named `pgserver`, you could create a Greenplum Database external table to read these query results as follows:
+This query returns tuples of type `(name text, total int, month int)`. If the `order_rpt` query is defined for the PXF JDBC server named `pgserver`, you could create an Apache Cloudberry external or foreign table to read these query results as follows:

+``` sql
+CREATE FOREIGN TABLE orderrpt_frompg(name text, total int, month int)
+    SERVER pgserver
+    OPTIONS ( resource 'query:order_rpt', PARTITION_BY 'month:int', RANGE '1:13', INTERVAL '3' );
+```
+Or, as an external table:
``` sql
CREATE EXTERNAL TABLE orderrpt_frompg(name text, total int, month int)
  LOCATION ('pxf://query:order_rpt?PROFILE=jdbc&SERVER=pgserver&PARTITION_BY=month:int&RANGE=1:13&INTERVAL=3')
@@ -238,7 +257,7 @@ The PXF JDBC connector automatically applies column projection and filter pushdo

## Overriding the JDBC Server Configuration with DDL

-You can override certain properties in a JDBC server configuration for a specific external database table by
directly specifying the custom option in the `CREATE SERVER` `OPTIONS` section or `CREATE EXTERNAL TABLE` `LOCATION` clause: | Custom Option Name | jdbc-site.xml Property Name | |----------------------|-----------------------------| @@ -251,18 +270,28 @@ You can override certain properties in a JDBC server configuration for a specifi | QUERY_TIMEOUT | jdbc.statement.queryTimeout | | DATE_WIDE_RANGE | jdbc.date.wideRange | -Example JDBC connection strings specified via custom options: - -``` pre -&JDBC_DRIVER=org.postgresql.Driver&DB_URL=jdbc:postgresql://pgserverhost:5432/pgtestdb&USER=pguser1&PASS=changeme -&JDBC_DRIVER=com.mysql.jdbc.Driver&DB_URL=jdbc:mysql://mysqlhost:3306/testdb&USER=user1&PASS=changeme +For foreign tables: +```sql +CREATE SERVER "pgserver" FOREIGN DATA WRAPPER jdbc_pxf_fdw + OPTIONS ( + jdbc_driver 'org.postgresql.Driver', + db_url 'jdbc:postgresql://pgserverhost:5432/pgtestdb', + user 'pxfuser1', + pass 'changeme' + ); +CREATE USER MAPPING FOR CURRENT_USER SERVER "pgserver"; +CREATE FOREIGN TABLE pxf_pgtbl(name varchar, age int) + SERVER "pgserver" + OPTIONS (resource 'public.forpxf_table1'); ``` -For example: -
CREATE EXTERNAL TABLE pxf_pgtbl(name text, orders int)
-  LOCATION ('pxf://public.forpxf_table1?PROFILE=jdbc&JDBC_DRIVER=org.postgresql.Driver&DB_URL=jdbc:postgresql://pgserverhost:5432/pgtestdb&USER=pxfuser1&PASS=changeme')
-FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
+For external tables: +```sql +CREATE EXTERNAL TABLE pxf_pgtbl(name text, orders int) + LOCATION ('pxf://public.forpxf_table1?PROFILE=jdbc&JDBC_DRIVER=org.postgresql.Driver&DB_URL=jdbc:postgresql://pgserverhost:5432/pgtestdb&USER=pxfuser1&PASS=changeme') +FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export'); +```
Warning: Credentials that you provide in this manner are visible as part of the external table definition. Do not use this method of passing credentials in a production environment.
-Refer to [Configuration Property Precedence](cfg_server.html#override) for detailed information about the precedence rules that PXF uses to obtain configuration property settings for a Greenplum Database user.
+Refer to [Configuration Property Precedence](cfg_server.html#override) for detailed information about the precedence rules that PXF uses to obtain configuration property settings for an Apache Cloudberry user.
diff --git a/docs/content/jdbc_pxf_mysql.html.md.erb b/docs/content/jdbc_pxf_mysql.html.md.erb
index 017be2846..dd4b55804 100644
--- a/docs/content/jdbc_pxf_mysql.html.md.erb
+++ b/docs/content/jdbc_pxf_mysql.html.md.erb
@@ -77,27 +77,27 @@ Perform the following steps to create a MySQL table named `names` in a database
 
 You must create a JDBC server configuration for MySQL, download the MySQL driver JAR file to your system, copy the JAR file to the PXF user configuration directory, synchronize the PXF configuration, and then restart PXF.
 
-This procedure will typically be performed by the Greenplum Database administrator.
+This procedure will typically be performed by the Apache Cloudberry administrator.
 
-1. Log in to the Greenplum Database coordinator host:
+1. Log in to the Apache Cloudberry coordinator host:
 
    ``` shell
   $ ssh gpadmin@
   ```
 
1. Download the MySQL JDBC driver and place it under `$PXF_BASE/lib`. If you [relocated $PXF_BASE](about_pxf_dir.html#movebase), make sure you use the updated location. You can download a MySQL JDBC driver from your preferred download location. The following example downloads the driver from Maven Central and places it under `$PXF_BASE/lib`:
 
-    1. If you did not relocate `$PXF_BASE`, run the following from the Greenplum coordinator:
+    1. 
If you did not relocate `$PXF_BASE`, run the following from the Cloudberry coordinator: ```shell gpadmin@gcoord$ cd /usr/local/pxf-gp/lib - gpadmin@coordinator$ wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.21/mysql-connector-java-8.0.21.jar + gpadmin@coordinator$ wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.26/mysql-connector-java-8.0.26.jar ``` - 2. If you relocated `$PXF_BASE`, run the following from the Greenplum coordinator: + 2. If you relocated `$PXF_BASE`, run the following from the Cloudberry coordinator: ```shell gpadmin@coordinator$ cd $PXF_BASE/lib - gpadmin@coordinator$ wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.21/mysql-connector-java-8.0.21.jar + gpadmin@coordinator$ wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.26/mysql-connector-java-8.0.26.jar ``` 1. Synchronize the PXF configuration, and then restart PXF: @@ -135,7 +135,7 @@ This procedure will typically be performed by the Greenplum Database administrat ``` -3. Synchronize the PXF server configuration to the Greenplum Database cluster: +3. Synchronize the PXF server configuration to the Apache Cloudberry cluster: ``` shell gpadmin@coordinator$ pxf cluster sync @@ -145,7 +145,7 @@ This procedure will typically be performed by the Greenplum Database administrat Perform the following procedure to create a PXF external table that references the `names` MySQL table that you created in the previous section, and reads the data in the table: -1. Create the PXF external table specifying the `jdbc` profile. For example: +1. Create the PXF foreign table specifying the `jdbc` profile. 
For example:
 
    ``` sql
    gpadmin=# CREATE EXTERNAL TABLE names_in_mysql (id int, name text, last text)
@@ -153,6 +153,15 @@ Perform the following procedure to create a PXF external table that references t
               FORMAT 'CUSTOM' (formatter='pxfwritable_import');
    ```
 
+    Or, as a foreign table:
+    ``` sql
+    gpadmin=# CREATE SERVER "mysql" FOREIGN DATA WRAPPER jdbc_pxf_fdw;
+    gpadmin=# CREATE USER MAPPING FOR CURRENT_USER SERVER "mysql";
+    gpadmin=# CREATE FOREIGN TABLE names_in_mysql (id int, name text, last text)
+               SERVER "mysql"
+               OPTIONS ( resource 'names' );
+    ```
+
 2. Display all rows of the `names_in_mysql` table:
 
    ``` sql
@@ -175,6 +184,7 @@ Perform the following procedure to insert some data into the `names` MySQL table
            LOCATION('pxf://names?PROFILE=jdbc&SERVER=mysql')
            FORMAT 'CUSTOM' (formatter='pxfwritable_export');
    ```
+    Or reuse the foreign table from the previous steps.
 
 4. Insert some data into the `names_in_mysql_w` table. For example:
 
diff --git a/docs/content/jdbc_pxf_named.html.md.erb b/docs/content/jdbc_pxf_named.html.md.erb
index 3ee10b403..7804450b6 100644
--- a/docs/content/jdbc_pxf_named.html.md.erb
+++ b/docs/content/jdbc_pxf_named.html.md.erb
@@ -82,11 +82,11 @@ Perform the following procedure to create PostgreSQL tables named `customers` an
 
 ## Configure the Named Query
 
-In this procedure you create a named query text file, add it to the `pgsrvcfg` JDBC server configuration, and synchronize the PXF configuration to the Greenplum Database cluster.
+In this procedure you create a named query text file, add it to the `pgsrvcfg` JDBC server configuration, and synchronize the PXF configuration to the Apache Cloudberry cluster.
 
-This procedure will typically be performed by the Greenplum Database administrator.
+This procedure will typically be performed by the Apache Cloudberry administrator.
 
-1. Log in to the Greenplum Database coordinator host:
+1. 
Log in to the Apache Cloudberry coordinator host: ``` shell $ ssh gpadmin@ @@ -109,7 +109,7 @@ This procedure will typically be performed by the Greenplum Database administrat 4. Save the file and exit the editor. -5. Synchronize these changes to the PXF configuration to the Greenplum Database cluster: +5. Synchronize these changes to the PXF configuration to the Apache Cloudberry cluster: ``` shell gpadmin@coordinator$ pxf cluster sync @@ -117,7 +117,7 @@ This procedure will typically be performed by the Greenplum Database administrat ## Read the Query Results -Perform the following procedure on your Greenplum Database cluster to create a PXF external table that references the query file that you created in the previous section, and then reads the query result data: +Perform the following procedure on your Apache Cloudberry cluster to create a PXF external table that references the query file that you created in the previous section, and then reads the query result data: 1. Create the PXF external table specifying the `jdbc` profile. For example: @@ -127,7 +127,21 @@ Perform the following procedure on your Greenplum Database cluster to create a P FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); ``` - With this partitioning scheme, PXF will issue 4 queries to the remote SQL database, one query per quarter. Each query will return customer names and the total amount of all of their orders in a given month, aggregated per customer, per month, for each month of the target quarter. Greenplum Database will then combine the data into a single result set for you when you query the external table. 
+    Or, as a foreign table:
+    ``` sql
+    CREATE SERVER "pgsrvcfg" FOREIGN DATA WRAPPER jdbc_pxf_fdw;
+    CREATE USER MAPPING FOR CURRENT_USER SERVER "pgsrvcfg";
+    CREATE FOREIGN TABLE pxf_queryres_frompg (name text, city text, total int, month int)
+        SERVER "pgsrvcfg"
+        OPTIONS (
+            resource 'query:pg_order_report',
+            PARTITION_BY 'month:int',
+            RANGE '1:13',
+            INTERVAL '3'
+        );
+    ```
+
+    With this partitioning scheme, PXF will issue 4 queries to the remote SQL database, one query per quarter. Each query will return customer names and the total amount of all of their orders in a given month, aggregated per customer, per month, for each month of the target quarter. Apache Cloudberry will then combine the data into a single result set for you when you query the external table.
 
 2. Display all rows of the query result:
 
@@ -154,7 +168,7 @@ Perform the following procedure on your Greenplum Database cluster to create a P
     (2 rows)
     ```
 
-    When you run this query, PXF requests and retrieves query results for only the `city` and `total` columns, reducing the amount of data sent back to Greenplum Database.
+    When you run this query, PXF requests and retrieves query results for only the `city` and `total` columns, reducing the amount of data sent back to Apache Cloudberry.
 
 4. Provide additional filters and aggregations to filter the `total` in PostgreSQL:
 
@@ -170,5 +184,5 @@ Perform the following procedure on your Greenplum Database cluster to create a P
     (2 rows)
     ```
 
-    In this example, PXF will add the `WHERE` filter to the subquery. This filter is pushed to and run on the remote database system, reducing the amount of data that PXF sends back to Greenplum Database. The `GROUP BY` aggregation, however, is not pushed to the remote and is performed by Greenplum.
+    In this example, PXF will add the `WHERE` filter to the subquery. This filter is pushed to and run on the remote database system, reducing the amount of data that PXF sends back to Apache Cloudberry. 
The `GROUP BY` aggregation, however, is not pushed to the remote and is performed by Cloudberry. diff --git a/docs/content/jdbc_pxf_postgresql.html.md.erb b/docs/content/jdbc_pxf_postgresql.html.md.erb index 11520a69a..d813adf4c 100644 --- a/docs/content/jdbc_pxf_postgresql.html.md.erb +++ b/docs/content/jdbc_pxf_postgresql.html.md.erb @@ -75,16 +75,16 @@ Perform the following steps to create a PostgreSQL table named `forpxf_table1` i With these privileges, `pxfuser1` can read from and write to the `forpxf_table1` table. -7. Update the PostgreSQL configuration to allow user `pxfuser1` to access `pgtestdb` from each Greenplum Database host. This configuration is specific to your PostgreSQL environment. You will update the `/var/lib/pgsql/pg_hba.conf` file and then restart the PostgreSQL server. +7. Update the PostgreSQL configuration to allow user `pxfuser1` to access `pgtestdb` from each Apache Cloudberry host. This configuration is specific to your PostgreSQL environment. You will update the `/var/lib/pgsql/pg_hba.conf` file and then restart the PostgreSQL server. ## Configure the JDBC Connector You must create a JDBC server configuration for PostgreSQL and synchronize the PXF configuration. The PostgreSQL JAR file is bundled with PXF, so there is no need to manually download it. -This procedure will typically be performed by the Greenplum Database administrator. +This procedure will typically be performed by the Apache Cloudberry administrator. -1. Log in to the Greenplum Database coordinator host: +1. Log in to the Apache Cloudberry coordinator host: ``` shell $ ssh gpadmin@ @@ -114,7 +114,7 @@ This procedure will typically be performed by the Greenplum Database administrat ``` -3. Synchronize the PXF server configuration to the Greenplum Database cluster: +3. 
Synchronize the PXF server configuration to the Apache Cloudberry cluster:
 
    ``` shell
    gpadmin@coordinator$ pxf cluster sync
@@ -132,6 +132,15 @@ Perform the following procedure to create a PXF external table that references t
               FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
    ```
 
+    Or, as a foreign table:
+    ``` sql
+    gpadmin=# CREATE SERVER "pgsrvcfg" FOREIGN DATA WRAPPER jdbc_pxf_fdw;
+    gpadmin=# CREATE USER MAPPING FOR CURRENT_USER SERVER "pgsrvcfg";
+    gpadmin=# CREATE FOREIGN TABLE pxf_tblfrompg (id int)
+               SERVER "pgsrvcfg"
+               OPTIONS ( resource 'public.forpxf_table1' );
+    ```
+
 2. Display all rows of the `pxf_tblfrompg` table:
 
    ``` sql
@@ -155,6 +164,7 @@ Perform the following procedure to insert some data into the `forpxf_table1` Pos
            LOCATION ('pxf://public.forpxf_table1?PROFILE=jdbc&SERVER=pgsrvcfg')
            FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
    ```
+    Or reuse the foreign table from the previous steps.
 
 4. Insert some data into the `pxf_writeto_postgres` table. For example:
 
diff --git a/docs/content/jdbc_pxf_trino.html.md.erb b/docs/content/jdbc_pxf_trino.html.md.erb
index ab30e01e2..601651c9d 100644
--- a/docs/content/jdbc_pxf_trino.html.md.erb
+++ b/docs/content/jdbc_pxf_trino.html.md.erb
@@ -30,9 +30,9 @@ Create a Trino table named `names` and insert some data into this table:
 
 You must create a JDBC server configuration for Trino, download the Trino driver JAR file to your system, copy the JAR file to the PXF user configuration directory, synchronize the PXF configuration, and then restart PXF.
 
-This procedure will typically be performed by the Greenplum Database administrator.
+This procedure will typically be performed by the Apache Cloudberry administrator.
 
-1. Log in to the Greenplum Database coordinator host:
+1. 
Log in to the Apache Cloudberry coordinator host:
 
    ```shell
    $ ssh gpadmin@
@@ -43,14 +43,14 @@ This procedure will typically be performed by the Greenplum Database administrat
 
    See [Trino Documentation - JDBC Driver](https://trino.io/docs/current/client/jdbc.html#installing) for instructions on downloading the Trino JDBC driver. The following example downloads the driver and places it under `$PXF_BASE/lib`:
 
-    1. If you did not relocate `$PXF_BASE`, run the following from the Greenplum coordinator:
+    1. If you did not relocate `$PXF_BASE`, run the following from the Cloudberry coordinator:
 
        ```shell
        gpadmin@coordinator$ cd /usr/local/pxf-gp/lib
        gpadmin@coordinator$ wget 
        ```
 
-    2. If you relocated `$PXF_BASE`, run the following from the Greenplum coordinator:
+    2. If you relocated `$PXF_BASE`, run the following from the Cloudberry coordinator:
 
        ```shell
        gpadmin@coordinator$ cd $PXF_BASE/lib
@@ -131,7 +131,7 @@ This procedure will typically be performed by the Greenplum Database administrat
 
    ```
 
-1. Synchronize the PXF server configuration to the Greenplum Database cluster:
+1. Synchronize the PXF server configuration to the Apache Cloudberry cluster:
 
    ```shell
    gpadmin@coordinator$ pxf cluster sync
@@ -149,6 +149,14 @@ Perform the following procedure to create a PXF external table that references t
        LOCATION('pxf://memory.default.names?PROFILE=jdbc&SERVER=trino')
        FORMAT 'CUSTOM' (formatter='pxfwritable_import');
    ```
+    Or, as a foreign table:
+    ``` sql
+    CREATE SERVER "trino" FOREIGN DATA WRAPPER jdbc_pxf_fdw;
+    CREATE USER MAPPING FOR CURRENT_USER SERVER "trino";
+    CREATE FOREIGN TABLE pxf_trino_memory_names (id int, name text, last text)
+        SERVER "trino"
+        OPTIONS ( resource 'memory.default.names' );
+    ```
 
1. Display all rows of the `pxf_trino_memory_names` table:
 
@@ -173,6 +181,7 @@ You must create a new external table for the write operation.
LOCATION('pxf://memory.default.names?PROFILE=jdbc&SERVER=trino')
        FORMAT 'CUSTOM' (formatter='pxfwritable_export');
    ```
+    Or reuse the foreign table from the previous steps.
 
1. Insert some data into the `pxf_trino_memory_names_w` table. For example:
 
From c85557292c9a8bd6b568c0d80f92be50a0be2fc8 Mon Sep 17 00:00:00 2001
From: Nikolay Antonov
Date: Wed, 15 Apr 2026 15:56:05 +0500
Subject: [PATCH 2/2] review

---
 docs/content/jdbc_pxf.html.md.erb       | 2 +-
 docs/content/jdbc_pxf_mysql.html.md.erb | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/content/jdbc_pxf.html.md.erb b/docs/content/jdbc_pxf.html.md.erb
index a016c2ed8..42898d2c9 100644
--- a/docs/content/jdbc_pxf.html.md.erb
+++ b/docs/content/jdbc_pxf.html.md.erb
@@ -289,7 +289,7 @@ For external tables:
 ```sql
 CREATE EXTERNAL TABLE pxf_pgtbl(name text, orders int)
     LOCATION ('pxf://public.forpxf_table1?PROFILE=jdbc&JDBC_DRIVER=org.postgresql.Driver&DB_URL=jdbc:postgresql://pgserverhost:5432/pgtestdb&USER=pxfuser1&PASS=changeme')
-FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export'); 
+FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
 ```
Warning: Credentials that you provide in this manner are visible as part of the external table definition. Do not use this method of passing credentials in a production environment.
diff --git a/docs/content/jdbc_pxf_mysql.html.md.erb b/docs/content/jdbc_pxf_mysql.html.md.erb index dd4b55804..2c0bc593c 100644 --- a/docs/content/jdbc_pxf_mysql.html.md.erb +++ b/docs/content/jdbc_pxf_mysql.html.md.erb @@ -89,7 +89,7 @@ This procedure will typically be performed by the Apache Cloudberry administrato 1. If you did not relocate `$PXF_BASE`, run the following from the Cloudberry coordinator: ```shell - gpadmin@gcoord$ cd /usr/local/pxf-gp/lib + gpadmin@coordinator$ cd /usr/local/pxf-gp/lib gpadmin@coordinator$ wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.26/mysql-connector-java-8.0.26.jar ``` @@ -145,7 +145,7 @@ This procedure will typically be performed by the Apache Cloudberry administrato Perform the following procedure to create a PXF external table that references the `names` MySQL table that you created in the previous section, and reads the data in the table: -1. Create the PXF foreign table specifying the `jdbc` profile. For example: +1. Create the PXF external table specifying the `jdbc` profile. For example: ``` sql gpadmin=# CREATE EXTERNAL TABLE names_in_mysql (id int, name text, last text)