Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support customized alignment and storage for pg_tle create type API #266

Merged
merged 11 commits into from
Feb 27, 2024

Conversation

lyupan
Copy link
Contributor

@lyupan lyupan commented Feb 7, 2024

Issue #263.

In the previous version, we used default int4 alignment and plain storage (non-TOASTable). The data will always be stored in-line and not compressed. An error will be reported if the value exceeds the limit:
ERROR: row is too big: size xxx, maximum size xxx

With this change, users can customize the alignment and storage strategies. Compression or out-of-line storage can be used depending on the storage strategy (https://www.postgresql.org/docs/current/storage-toast.html).

The default alignment is int4 while the default storage is plain, compatible with previous versions.

A new version 1.3.5 is created along with this change.

Lyu Pan added 5 commits February 7, 2024 19:25
In the previous version, we used default int4 alignment and plain storage (non-TOASTable). The data will always be stored in-line and not compressed. An error will be reported if the value exceeds the limit:
ERROR: row is too big: size xxx, maximum size xxx.

With this change, users can customize the alignment and storage strategies. Compression or out-of-line storage can be used depending on the storage strategy (https://www.postgresql.org/docs/current/storage-toast.html).

The default alignment is int4 while the default storage is plain, compatible with previous versions.

aws#263
@lyupan lyupan marked this pull request as draft February 8, 2024 00:58
@lyupan lyupan marked this pull request as ready for review February 9, 2024 00:26
Lyu Pan added 2 commits February 14, 2024 21:52
…rove readability.

When the dabase gets upgraded to a new version(solib is updated) but pg_tle version is not, the old 5-arg version SQL function is invoked by users, but the previous C function always 7 args.

In this chagne, a new pg_tle_create_base_type_7args C function is created and used to create the create_base_type SQL function.
src/datatype.c Outdated
*/
PG_FUNCTION_INFO_V1(pg_tle_create_base_type_7args);
Datum
pg_tle_create_base_type_7args(PG_FUNCTION_ARGS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is adding 7args to the name standard practice? Maybe pg_tle_create_base_type_with_storage would be more descriptive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is the standard practice, but I have seen this naming style in postgres source code, for example
gin_extract_tsquery_5args(PG_FUNCTION_ARGS) https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/tsginidx.c#L318

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this naming scheme. While there's precedent in upstream, we don't have to mimic it from a user-exposed interface.

+1 to @adamguo0 suggestion: pg_tle_create_base_type_with_storage, or maybe pg_tle_create_base_type_with_storage_type

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also - to expose this, would we be able to rewrite the existing function to support 7 arguments, and have defaults fo rthe ones that are not necessarily set?

Copy link
Contributor Author

@lyupan lyupan Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated function name to pg_tle_create_base_type_with_storage.

The C function pg_tle_create_base_type_7args/with_storage is not exposed to the users, only pg_tle_create_base_type function (used in the previous 1.3.4 version) is exposed to the user.

v1.3.4: create function pg_tle.create_base_type() .... AS 'pg_tle_create_base_type'
v1.4.0+: create function pg_tle.create_base_type() .... AS 'pg_tle_create_base_type_with_storage'

Adding an internal pg_tle_create_base_type_with_storage C function (without modifying the exist one) makes it cleaner and easier to maintain.

Background: in my first commit, I modified the pg_tle_create_base_type C function to add these 2 arguments, however, it has a hidden bug for upgrade (See commit message in f56985b), this was discovered by Josh Tiefenbach and I made this update based on the suggestion, we think this approach creates less confusion and maintenance issues.

Makefile Outdated
@@ -1,13 +1,13 @@
EXTENSION = pg_tle
EXTVERSION = 1.3.4
EXTVERSION = 1.3.5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get some opinions on whether to bump the version to 1.3.5 or 1.4.0. My initial thought was 1.3.5 since this is a small update to an existing feature, but it also introduces an API change which may warrant a minor version bump.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go strictly by semantic versioning, a new, backwards compatible change would require a minor version bump (1.4.0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Updated the new version number to 1.4.0

@@ -63,6 +63,8 @@ SELECT pgtle.create_shell_type_if_not_exists('public', 'test_citext');
* `infunc`: The name of a previously defined function to convert the type's external textual representation to the internal representation (`bytea`). The function must take one argument of type `text` and return `bytea`. The function must also be declared as `IMMUTABLE` and `STRICT`. It will not be called with a NULL parameter.
* `outfunc`: The name of a previously defined function to convert the type's internal binary representation (`bytea`) to the external textual representation. The function must take one argument of type `bytea` and return `text`. The function must also be declared as `IMMUTABLE` and `STRICT`. It will not be called with a NULL parameter.
* `internallength`: Total length of the base data type as bytes. Base data types can be fixed-length, in which case internallength is a positive integer, or variable-length, in which case internallength is -1.
* `alignment`: The optional alignment parameter specifies the storage alignment requirement of the data type. Allowed values are 'char', 'int2', 'int4', 'double'. The allowed values equate to alignment on 1, 2, 4, or 8 byte boundaries. The default is 'int4', alignment on 4 byte boundaries. Note that variable-length types must have an alignment of at least 4, since they necessarily contain an int4 as their first component.
Copy link
Contributor

@anth0nyleung anth0nyleung Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a sanity check, are the allowed values the same for all major version we support?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend giving more guidance on how this is used. Most builders are not going to be familiar with byte alignment.

Note that variable-length types must have an alignment of at least 4, since they necessarily contain an int4 as their first component.

I'd recommend rephrasing this:

A variable-length type (e.g., text) must have a byte-alignment of at least 4 due to the size of its header.

Copy link
Contributor Author

@lyupan lyupan Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a sanity check, are the allowed values the same for all major version we support?

Yes, this didn't change at least since PG11, https://www.postgresql.org/docs/11/sql-createtype.html

alignment
The storage alignment requirement of the data type. If specified, it must be char, int2, int4, or double; the default is int4.

storage
The storage strategy for the data type. If specified, must be plain, external, extended, or main; the default is plain.

Comment on lines 20 to 22
DROP FUNCTION pgtle.create_base_type;
DROP FUNCTION pgtle.create_base_type_if_not_exists;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need CASCADE to drop the GRANT we did here https://github.com/aws/pg_tle/blob/93e330cb146c4055a983b38a47d73df6ac924b33/pg_tle--1.1.1.sql#L555C1-L571 for the old versions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, updated!

@anth0nyleung
Copy link
Contributor

could you also test upgrading to this version works and share the results?

Makefile Outdated
@@ -1,13 +1,13 @@
EXTENSION = pg_tle
EXTVERSION = 1.3.4
EXTVERSION = 1.3.5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go strictly by semantic versioning, a new, backwards compatible change would require a minor version bump (1.4.0)

docs/09_datatypes.md Show resolved Hide resolved
@@ -63,6 +63,8 @@ SELECT pgtle.create_shell_type_if_not_exists('public', 'test_citext');
* `infunc`: The name of a previously defined function to convert the type's external textual representation to the internal representation (`bytea`). The function must take one argument of type `text` and return `bytea`. The function must also be declared as `IMMUTABLE` and `STRICT`. It will not be called with a NULL parameter.
* `outfunc`: The name of a previously defined function to convert the type's internal binary representation (`bytea`) to the external textual representation. The function must take one argument of type `bytea` and return `text`. The function must also be declared as `IMMUTABLE` and `STRICT`. It will not be called with a NULL parameter.
* `internallength`: Total length of the base data type as bytes. Base data types can be fixed-length, in which case internallength is a positive integer, or variable-length, in which case internallength is -1.
* `alignment`: The optional alignment parameter specifies the storage alignment requirement of the data type. Allowed values are 'char', 'int2', 'int4', 'double'. The allowed values equate to alignment on 1, 2, 4, or 8 byte boundaries. The default is 'int4', alignment on 4 byte boundaries. Note that variable-length types must have an alignment of at least 4, since they necessarily contain an int4 as their first component.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend giving more guidance on how this is used. Most builders are not going to be familiar with byte alignment.

Note that variable-length types must have an alignment of at least 4, since they necessarily contain an int4 as their first component.

I'd recommend rephrasing this:

A variable-length type (e.g., text) must have a byte-alignment of at least 4 due to the size of its header.

docs/09_datatypes.md Show resolved Hide resolved
src/datatype.c Outdated
*/
PG_FUNCTION_INFO_V1(pg_tle_create_base_type_7args);
Datum
pg_tle_create_base_type_7args(PG_FUNCTION_ARGS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this naming scheme. While there's precedent in upstream, we don't have to mimic it from a user-exposed interface.

+1 to @adamguo0 suggestion: pg_tle_create_base_type_with_storage, or maybe pg_tle_create_base_type_with_storage_type

src/datatype.c Outdated
*/
PG_FUNCTION_INFO_V1(pg_tle_create_base_type_7args);
Datum
pg_tle_create_base_type_7args(PG_FUNCTION_ARGS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also - to expose this, would we be able to rewrite the existing function to support 7 arguments, and have defaults fo rthe ones that are not necessarily set?

src/datatype.c Outdated
Comment on lines 260 to 271
if (pg_strcasecmp(alignmentStr, "double") == 0)
alignment = TYPALIGN_DOUBLE;
else if (pg_strcasecmp(alignmentStr, "int4") == 0)
alignment = TYPALIGN_INT;
else if (pg_strcasecmp(alignmentStr, "int2") == 0)
alignment = TYPALIGN_SHORT;
else if (pg_strcasecmp(alignmentStr, "char") == 0)
alignment = TYPALIGN_CHAR;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("alignment \"%s\" not recognized", alignmentStr)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: put this lookup into its own function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, updated!

src/datatype.c Outdated
Comment on lines 273 to 284
if (pg_strcasecmp(storageStr, "plain") == 0)
storage = TYPSTORAGE_PLAIN;
else if (pg_strcasecmp(storageStr, "external") == 0)
storage = TYPSTORAGE_EXTERNAL;
else if (pg_strcasecmp(storageStr, "extended") == 0)
storage = TYPSTORAGE_EXTENDED;
else if (pg_strcasecmp(storageStr, "main") == 0)
storage = TYPSTORAGE_MAIN;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("storage \"%s\" not recognized", storageStr)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: put this lookup into its own function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@lyupan
Copy link
Contributor Author

lyupan commented Feb 23, 2024

could you also test upgrading to this version works and share the results?

Yes, tested locally. See results below. (This PR only changes the API to allow extra type options, it doesn't change the type storage/alignment if already defined)

  1. Create pg_tle with old version library and version
postgres=# create extension pg_tle;
CREATE EXTENSION
postgres=# \dx
                        List of installed extensions
  Name   | Version |   Schema   |                Description                 
---------+---------+------------+--------------------------------------------
 pg_tle  | 1.3.4   | pgtle      | Trusted Language Extensions for PostgreSQL
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)
  1. Create custom types and data
...
-- Variable-length
postgres=# SELECT pgtle.create_base_type('public', 'test_citext', 'test_citext_in(text)'::regprocedure, 'test_citext_out(bytea)'::regprocedure, -1);
 create_base_type 
------------------
 
(1 row)

-- fixed length
postgres=# SELECT pgtle.create_base_type('public', 'test_int2', 'test_int2_in(text)'::regprocedure, 'test_int2_out(bytea)'::regprocedure, 2);
 create_base_type 
------------------
 
(1 row)
postgres=# INSERT INTO test_dt_int2 VALUES ('11,22');
INSERT 0 1
postgres=# SELECT * FROM test_dt_int2;
  c1   
-------
 11,22
(1 row)

postgres=# INSERT INTO test_dt VALUES ('SELECT'), ('INSERT'), ('UPDATE'), ('DELETE');
INSERT 0 4
postgres=# select * from test_dt;
   c1   
--------
 SELECT
 INSERT
 UPDATE
 DELETE
(4 rows)

  1. Install the new extension library (1.4.0), restart postgres and upgrade
postgres=# ALTER EXTENSION pg_tle UPDATE;
ALTER EXTENSION
postgres=# \dx
                        List of installed extensions
  Name   | Version |   Schema   |                Description                 
---------+---------+------------+--------------------------------------------
 pg_tle  | 1.4.0   | pgtle      | Trusted Language Extensions for PostgreSQL
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)
  1. Check the old types (tables) can still read/write
postgres=# select * from test_dt;
   c1   
--------
 SELECT
 INSERT
 UPDATE
 DELETE
(4 rows)

postgres=# select * from test_dt_int2;
  c1   
-------
 11,22
(1 row)

postgres=# delete from test_dt;
DELETE 4
postgres=# delete from test_dt_int2;
DELETE 1
postgres=# INSERT INTO test_dt VALUES ('SELECT'), ('INSERT'), ('UPDATE'), ('DELETE');
INSERT 0 4
postgres=# INSERT INTO test_dt_int2 VALUES ('11,22');
INSERT 0 1
postgres=# select * from test_dt;
   c1   
--------
 SELECT
 INSERT
 UPDATE
 DELETE
(4 rows)

postgres=# select * from test_dt_int2;
  c1   
-------
 11,22
(1 row)

Copy link
Contributor

@jkatz jkatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, a couple of minor comments. Nice job!

docs/09_datatypes.md Show resolved Hide resolved
Comment on lines +60 to +68
static Datum
pg_tle_create_base_type_internal(Oid typeNamespace,
char *typeName,
Oid inputFuncId,
Oid outputFuncId,
int16 internalLength,
char *alignmentStr,
char *storageStr,
char *funcProbin);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this the output from pgindent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is.

src/datatype.c Outdated Show resolved Hide resolved
Co-authored-by: Jonathan S. Katz <jkatz@users.noreply.github.com>
@lyupan
Copy link
Contributor Author

lyupan commented Feb 27, 2024

Thank you all for the review!

@adamguo0 adamguo0 merged commit 4e18a50 into aws:main Feb 27, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants