Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADBDEV-5772 [C] Extend list of types to predicate pushdown #104

Open
wants to merge 33 commits into
base: feature/ADBDEV-5595
Choose a base branch
from

Conversation

bimboterminator1
Copy link
Member

@bimboterminator1 bimboterminator1 commented Jun 25, 2024

Extend the list of types for predicate pushdown

The recent changes to the pxffilters.c file in the external-table codebase and
in pxf_filter.c in the fdw codebase introduce several enhancements and
adjustments aimed at improving filter handling, particularly extending of
supported data types range.
Here is a summary of the key updates:

Array comparison support in the filters:
The operator map for pxf_supported_opr_op_expr now includes generic array
comparison operators like ARRAY_EQ_OP, enhancing the filter capabilities for
array types. Only = , <>, IN, IS NULL operators are supported.
A new oid list pxf_supported_array_types[] and a new function
supported_array_type(Oid type) are added to check if a given array type is
supported for filtering. It is used to process array as list constants
inside the opexpr_to_pxffilter() (OpExprToPxfFilter()) function.
pxf_serialize_filter_list() (PxfSerializeFilterList()) is updated to handle
scenarios where a filter operand is an attribute and the other is a list
constant.

List Constant Handling:
The function list_const_to_str() (ListConstToStr()) now includes an additional
bool parameter with_nulls, allowing it to handle NULL values in array constants
appropriately. For the case of IN operators (scalar_array_op_expr_to_pxffilter()
) the NULL value inside the enumeration is meaningless (the comparison with
NULLS can't be performed) and the list_const_to_str() is called with false
with_nulls argument. For the case of array comparison (like a =
ARRAY[TRUE, NULL]) the list_const_to_str() is called with true parameter
with_nulls.

BOOLARRAY type introduction:
New OID BOOLARRAYOID is defined as a new macro and is covered in #ifndef
because GPDB 6 does not have the definition for this type.
The patch makes updates to pxf_supported_opr_scalar_array_op_expr with new entry
BooleanEqualOperator to handle the IN operator for BOOL type. The
list_const_to_str() (ListConstToStr()) also uses predefined string constants
to encode the bool values in the final filter representation.

Other types are handled by extending pxf_supported_opr_op_expr[],
pxf_supported_types[], pxf_supported_array_types[],
pxf_supported_opr_scalar_array_op_expr[] lists and corresponding switch
operators inside the scalar_const_to_str() and list_const_to_str() functions.

The following new data types have been added or extended for array types:

BYTEA, BYTEAARRAY;
FLOAT4, FLOAT8, FLOAT4ARRAY, FLOAT8ARRAY;
BPCHARARRAY;
VARCHARARRAYOID (here the modification of scalar_array_op_expr_to_pxffilter()
was needed due to RelabelType occurence in queries like SELECT * from
test_varchar where t in ('aaa'::varchar(10), 'bbb'::varchar(10));
DATEARRAY;
TIME, TIMEARRAY;
TIMESTAMPARRAY;
TIMESTAMPTZ, TIMESTAMPTZARRAY;
INTERVAL, INTERVALARRAYOID;
NUMERICARRAY;
UUID, UUIDARRAY;
JSONB, JSONBARRAY;
JSON, JSONARRAY (scalar operators and IN are not supported);

In order to launch the regression tests the UserDataVerifyAccessor.java
has been extended to generate more fields of newly supported types.


Example How to test with jdbc:

First of all, build and install pxf server
.
Use a text editor to uncomment the following line in the $PXF_BASE/conf/pxf-application.properties file and set the desired log level pxf.log.level=debug

With external table:

  1. create jdbc profile in $PXF_BASE/conf/pxf-profiles.xml
Ex

<?xml version='1.0' encoding='UTF-8'?>
<profiles>
 <profile>
   <name>JDBC</name>
   <description>jdbc</description>
   <plugins>
     <fragmenter>org.greenplum.pxf.plugins.jdbc.JdbcPartitionFragmenter</fragmenter>
     <accessor>org.greenplum.pxf.plugins.jdbc.JdbcAccessor</accessor>
     <resolver>org.greenplum.pxf.plugins.jdbc.JdbcResolver</resolver>
   </plugins>
   </profile>
</profiles>

2. create server configuration in $PXF_BASE/servers/postgresql/jdbc-site.xml
Ex
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <property>
       <name>jdbc.driver</name>
       <value>org.postgresql.Driver</value>
       <description>Class name of the JDBC driver (e.g. org.postgresql.Driver)</description>
   </property>
   <property>
       <name>jdbc.url</name>
       <value>jdbc:postgresql://localhost:5432/postgres</value>
       <description>The URL that the JDBC driver can use to connect to the database (e.g. jdbc:postgresql://localhost/postgres)</description>
   </property>
   <property>
       <name>jdbc.user</name>
       <value>bimbo</value>
       <description>User name for connecting to the database (e.g. postgres)</description>
   </property>
</configuration>

create extension pxf;
CREATE readable EXTERNAL TABLE test_bytea(x bytea)
     LOCATION ('pxf://public.test_bytea?PROFILE=JDBC&SERVER=postgresql')
          FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');

select * from test_bytea where x = '\132greenplum\132';
select * from test_bytea where x is NULL;
select * from test_bytea where x in ('\132greenplum\132', 'salsmasdm');

CREATE readable EXTERNAL TABLE test_bytea_arr(x bytea[])
     LOCATION ('pxf://public.test_bytea_arr?PROFILE=JDBC&SERVER=postgresql')
          FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
select * from test_bytea_arr where x = array['\132greenplum\132'::bytea];

With fdw:

  1. Create configuration:
Ex
CREATE SERVER postgresql_server
    FOREIGN DATA WRAPPER jdbc_pxf_fdw
         OPTIONS ( 
        jdbc_driver 'org.postgresql.Driver', 
        db_url 'jdbc:postgresql://localhost:5432/postgres',
         batch_size '10000',
        fetch_size '2000'
    );
    
    
CREATE USER MAPPING FOR my_user
    SERVER postgresql_server
    OPTIONS ( user 'pg_user');

  1. The same queries:
            CREATE FOREIGN TABLE test_jsonb_fdw (t json)
          SERVER postgresql
          OPTIONS ( resource 'public.test_jsonb');

select * from test_jsonb_fdw where t in ('{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}'::jsonb, '{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}'::jsonb);

And then just look at pxf-service.log, or here, or at tcpdump

Supported operators for filter pushdown:

  • IN(single_value)
  • Comparison operators >,>=,=,etc
  • IS NULL
  • IN (several values)
  • expr1 BOOL_OP expr2

MANUAL REGRESSION TEST run

in regression directory run:
make FDW_FilterPushDownTest FilterPushDownTest

The recent changes to the pxffilters.c file in the external-table codebase and
in pxf_filter.c in the fdw codebase introduce several enhancements and
adjustments aimed at improving filter handling, particularly for array types.
Here is a summary of the key updates:

Array Type Support:
The operator map for pxf_supported_opr_op_expr now includes generic array
comparison operators like ARRAY_EQ_OP, enhancing the filter capabilities for
array types. It means that only = , <> operators are supported.
A new function supported_array_type(Oid type) is added to check if a given array
type is supported for filtering. It is used to process array as list constants
inside the opexpr_to_pxffilter() (OpExprToPxfFilter()) function.
pxf_serialize_filter_list() (PxfSerializeFilterList()) is updated to handle
scenarios where a filter operand is an attribute and the other is a list
constant.

BOOLARRAY type introduction:
New OID for array types such as BOOLARRAYOID are defined
if not available, ensuring compatibility with different versions of GPDB.
Updates to pxf_supported_opr_scalar_array_op_expr with new entry
BooleanEqualOperator to handle the IN operator for BOOL type. The
list_const_to_str() (ListConstToStr()) also uses predefined string costants
to encode the bool values in the final filter representation.

List Constant Handling:
The function list_const_to_str() (ListConstToStr()) now includes an additional
bool parameter with_nulls, allowing it to handle NULL values in array constants
appropriately. For the case of IN operators (scalar_array_op_expr_to_pxffilter()
) the NULL value inside the enumeration is meaningless (the comparison with
NULLS can't be performed) and the list_const_to_str() is called with false
with_nulls argument. For the case of array comparison (like a =
ARRAY[TRUE, NULL]) the list_const_to_str() is called with true parameter
with_nulls.
@bimboterminator1 bimboterminator1 marked this pull request as ready for review June 25, 2024 10:11
@red1452
Copy link

red1452 commented Jun 27, 2024

Few small corrections of the description:

  1. Extend the list
  2. function supported_array_type(Oid type) is added
  3. a new macro
  4. because GPDB 6 does not have

@whitehawk
Copy link

The tests haven't been added due to absence of working test
framework.

is there any way to use unit tests from external-table\test?

@whitehawk
Copy link

In the testing description:

CREATE readable EXTERNAL TABLE test_bytea_arr(x bytea[])
     LOCATION ('pxf://public.test_bytea_arr?PROFILE=JDBC&SERVER=postgresql')
          FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
select * from test_bytea where x = array['\132greenplum\132'::bytea];

shouldn't it be select * from test_bytea_arr?

@whitehawk
Copy link

In the description there is a typo 'string costants' -> 'string constants'.

appendStringInfo(interm_buf, "%s", NullConstValue);
}
else
appendStringInfo(interm_buf, "%s", extval);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is recommended to place the shortest branch first.

@bimboterminator1
Copy link
Member Author

bimboterminator1 commented Jun 28, 2024

is there any way to use unit tests from external-table\test?

I could try, but the tests version lagged behind the pxf extension version and can't be even compiled. The tests should be reviewed and recovered if possible. They also depend on gpdb source code and existed in pxf code base since the extension were the part of gpcontrib;

@whitehawk
Copy link

I could try, but the tests version lagged behind the pxf extension version and can't be even compiled. The tests should be reviewed and recovered if possible. They also depend on gpdb source code and existed in pxf code base since the extension were the part of gpcontrib;

Ok, let's drop it for now.

fdw/pxf_filter.c Outdated Show resolved Hide resolved
external-table/src/pxffilters.c Show resolved Hide resolved
external-table/src/pxffilters.c Show resolved Hide resolved
external-table/src/pxffilters.c Show resolved Hide resolved
external-table/src/pxffilters.c Show resolved Hide resolved
external-table/src/pxffilters.c Show resolved Hide resolved
@RekGRpth

This comment was marked as resolved.

@RekGRpth

This comment was marked as resolved.

@bimboterminator1
Copy link
Member Author

bimboterminator1 commented Jul 3, 2024

I meant, how good is it to call supported_array_type for non-array types?

I wanted to check somehow whether given Const is array or not. Because for the query like

CREATE readable EXTERNAL TABLE test_bytea_arr(x bytea[])
     LOCATION ('pxf://public.test_bytea_arr?PROFILE=JDBC&SERVER=postgresql')
          FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
select * from test_bytea_arr where x = array['\000\132greenplum\000\132'::bytea, NULL];

the execution will get to the opexpr_to_pxffilter(). And i want to serialize array types via list_const_to_str(), not via scalar_const_to_str(). How good? I just compare oids, supported_array_type() is designed for the cases like above.

external-table/src/pxffilters.c Outdated Show resolved Hide resolved
external-table/src/pxffilters.c Outdated Show resolved Hide resolved
@bimboterminator1
Copy link
Member Author

I've also tried to launch regression tests. Queries for most of the types can be run within regress framework by applying the patch in the PR description.

@RekGRpth

This comment was marked as resolved.

@bimboterminator1
Copy link
Member Author

It's not the only part which should be modified. The existing java tests, regression tests will also be affected, the diff will be huge enough. And the tests anyway won't cover all the types mentioned in this patch. At least there is a bunch of errors for the timestamp with time zone, jsonb, and overcoming those issues is quite burdensome

@bimboterminator1
Copy link
Member Author

Have added regression tests, just run make FDW_FilterPushDownTest FilterPushDownTest in regression directory.
I couldn't deal with automation tests and changed those files without extra checking.

fdw/pxf_filter.c Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants