-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support customized alignment and storage for pg_tle create type API #266
Conversation
In the previous version, we used default int4 alignment and plain storage (non-TOASTable). The data will always be stored in-line and not compressed. An error will be reported if the value exceeds the limit: ERROR: row is too big: size xxx, maximum size xxx. With this change, users can customize the alignment and storage strategies. Compression or out-of-line storage can be used depending on the storage strategy (https://www.postgresql.org/docs/current/storage-toast.html). The default alignment is int4 while the default storage is plain, compatible with previous versions. aws#263
…rove readability. When the dabase gets upgraded to a new version(solib is updated) but pg_tle version is not, the old 5-arg version SQL function is invoked by users, but the previous C function always 7 args. In this chagne, a new pg_tle_create_base_type_7args C function is created and used to create the create_base_type SQL function.
src/datatype.c
Outdated
*/ | ||
PG_FUNCTION_INFO_V1(pg_tle_create_base_type_7args); | ||
Datum | ||
pg_tle_create_base_type_7args(PG_FUNCTION_ARGS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is adding 7args
to the name standard practice? Maybe pg_tle_create_base_type_with_storage
would be more descriptive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is the standard practice, but I have seen this naming style in postgres source code, for example
gin_extract_tsquery_5args(PG_FUNCTION_ARGS)
https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/tsginidx.c#L318
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of this naming scheme. While there's precedent in upstream, we don't have to mimic it from a user-exposed interface.
+1 to @adamguo0 suggestion: pg_tle_create_base_type_with_storage
, or maybe pg_tle_create_base_type_with_storage_type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also - to expose this, would we be able to rewrite the existing function to support 7 arguments, and have defaults fo rthe ones that are not necessarily set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated function name to pg_tle_create_base_type_with_storage
.
The C function pg_tle_create_base_type_7args/with_storage
is not exposed to the users, only pg_tle_create_base_type
function (used in the previous 1.3.4 version) is exposed to the user.
v1.3.4: create function pg_tle.create_base_type() .... AS 'pg_tle_create_base_type'
v1.4.0+: create function pg_tle.create_base_type() .... AS 'pg_tle_create_base_type_with_storage'
Adding an internal pg_tle_create_base_type_with_storage
C function (without modifying the exist one) makes it cleaner and easier to maintain.
Background: in my first commit, I modified the pg_tle_create_base_type
C function to add these 2 arguments, however, it has a hidden bug for upgrade (See commit message in f56985b), this was discovered by Josh Tiefenbach and I made this update based on the suggestion, we think this approach creates less confusion and maintenance issues.
Makefile
Outdated
@@ -1,13 +1,13 @@ | |||
EXTENSION = pg_tle | |||
EXTVERSION = 1.3.4 | |||
EXTVERSION = 1.3.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get some opinions on whether to bump the version to 1.3.5 or 1.4.0. My initial thought was 1.3.5 since this is a small update to an existing feature, but it also introduces an API change which may warrant a minor version bump.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we go strictly by semantic versioning, a new, backwards compatible change would require a minor version bump (1.4.0
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. Updated the new version number to 1.4.0
docs/09_datatypes.md
Outdated
@@ -63,6 +63,8 @@ SELECT pgtle.create_shell_type_if_not_exists('public', 'test_citext'); | |||
* `infunc`: The name of a previously defined function to convert the type's external textual representation to the internal representation (`bytea`). The function must take one argument of type `text` and return `bytea`. The function must also be declared as `IMMUTABLE` and `STRICT`. It will not be called with a NULL parameter. | |||
* `outfunc`: The name of a previously defined function to convert the type's internal binary representation (`bytea`) to the external textual representation. The function must take one argument of type `bytea` and return `text`. The function must also be declared as `IMMUTABLE` and `STRICT`. It will not be called with a NULL parameter. | |||
* `internallength`: Total length of the base data type as bytes. Base data types can be fixed-length, in which case internallength is a positive integer, or variable-length, in which case internallength is -1. | |||
* `alignment`: The optional alignment parameter specifies the storage alignment requirement of the data type. Allowed values are 'char', 'int2', 'int4', 'double'. The allowed values equate to alignment on 1, 2, 4, or 8 byte boundaries. The default is 'int4', alignment on 4 byte boundaries. Note that variable-length types must have an alignment of at least 4, since they necessarily contain an int4 as their first component. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a sanity check, are the allowed values the same for all major version we support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend giving more guidance on how this is used. Most builders are not going to be familiar with byte alignment.
Note that variable-length types must have an alignment of at least 4, since they necessarily contain an int4 as their first component.
I'd recommend rephrasing this:
A variable-length type (e.g.,
text
) must have a byte-alignment of at least4
due to the size of its header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a sanity check, are the allowed values the same for all major version we support?
Yes, this didn't change at least since PG11, https://www.postgresql.org/docs/11/sql-createtype.html
alignment
The storage alignment requirement of the data type. If specified, it must be char, int2, int4, or double; the default is int4.
storage
The storage strategy for the data type. If specified, must be plain, external, extended, or main; the default is plain.
pg_tle--1.3.4--1.3.5.sql
Outdated
DROP FUNCTION pgtle.create_base_type; | ||
DROP FUNCTION pgtle.create_base_type_if_not_exists; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need CASCADE to drop the GRANT we did here https://github.com/aws/pg_tle/blob/93e330cb146c4055a983b38a47d73df6ac924b33/pg_tle--1.1.1.sql#L555C1-L571 for the old versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, updated!
could you also test upgrading to this version works and share the results? |
Makefile
Outdated
@@ -1,13 +1,13 @@ | |||
EXTENSION = pg_tle | |||
EXTVERSION = 1.3.4 | |||
EXTVERSION = 1.3.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we go strictly by semantic versioning, a new, backwards compatible change would require a minor version bump (1.4.0
)
docs/09_datatypes.md
Outdated
@@ -63,6 +63,8 @@ SELECT pgtle.create_shell_type_if_not_exists('public', 'test_citext'); | |||
* `infunc`: The name of a previously defined function to convert the type's external textual representation to the internal representation (`bytea`). The function must take one argument of type `text` and return `bytea`. The function must also be declared as `IMMUTABLE` and `STRICT`. It will not be called with a NULL parameter. | |||
* `outfunc`: The name of a previously defined function to convert the type's internal binary representation (`bytea`) to the external textual representation. The function must take one argument of type `bytea` and return `text`. The function must also be declared as `IMMUTABLE` and `STRICT`. It will not be called with a NULL parameter. | |||
* `internallength`: Total length of the base data type as bytes. Base data types can be fixed-length, in which case internallength is a positive integer, or variable-length, in which case internallength is -1. | |||
* `alignment`: The optional alignment parameter specifies the storage alignment requirement of the data type. Allowed values are 'char', 'int2', 'int4', 'double'. The allowed values equate to alignment on 1, 2, 4, or 8 byte boundaries. The default is 'int4', alignment on 4 byte boundaries. Note that variable-length types must have an alignment of at least 4, since they necessarily contain an int4 as their first component. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend giving more guidance on how this is used. Most builders are not going to be familiar with byte alignment.
Note that variable-length types must have an alignment of at least 4, since they necessarily contain an int4 as their first component.
I'd recommend rephrasing this:
A variable-length type (e.g.,
text
) must have a byte-alignment of at least4
due to the size of its header.
src/datatype.c
Outdated
*/ | ||
PG_FUNCTION_INFO_V1(pg_tle_create_base_type_7args); | ||
Datum | ||
pg_tle_create_base_type_7args(PG_FUNCTION_ARGS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of this naming scheme. While there's precedent in upstream, we don't have to mimic it from a user-exposed interface.
+1 to @adamguo0 suggestion: pg_tle_create_base_type_with_storage
, or maybe pg_tle_create_base_type_with_storage_type
src/datatype.c
Outdated
*/ | ||
PG_FUNCTION_INFO_V1(pg_tle_create_base_type_7args); | ||
Datum | ||
pg_tle_create_base_type_7args(PG_FUNCTION_ARGS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also - to expose this, would we be able to rewrite the existing function to support 7 arguments, and have defaults fo rthe ones that are not necessarily set?
src/datatype.c
Outdated
if (pg_strcasecmp(alignmentStr, "double") == 0) | ||
alignment = TYPALIGN_DOUBLE; | ||
else if (pg_strcasecmp(alignmentStr, "int4") == 0) | ||
alignment = TYPALIGN_INT; | ||
else if (pg_strcasecmp(alignmentStr, "int2") == 0) | ||
alignment = TYPALIGN_SHORT; | ||
else if (pg_strcasecmp(alignmentStr, "char") == 0) | ||
alignment = TYPALIGN_CHAR; | ||
else | ||
ereport(ERROR, | ||
(errcode(ERRCODE_INVALID_PARAMETER_VALUE), | ||
errmsg("alignment \"%s\" not recognized", alignmentStr))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: put this lookup into its own function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion, updated!
src/datatype.c
Outdated
if (pg_strcasecmp(storageStr, "plain") == 0) | ||
storage = TYPSTORAGE_PLAIN; | ||
else if (pg_strcasecmp(storageStr, "external") == 0) | ||
storage = TYPSTORAGE_EXTERNAL; | ||
else if (pg_strcasecmp(storageStr, "extended") == 0) | ||
storage = TYPSTORAGE_EXTENDED; | ||
else if (pg_strcasecmp(storageStr, "main") == 0) | ||
storage = TYPSTORAGE_MAIN; | ||
else | ||
ereport(ERROR, | ||
(errcode(ERRCODE_INVALID_PARAMETER_VALUE), | ||
errmsg("storage \"%s\" not recognized", storageStr))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: put this lookup into its own function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Yes, tested locally. See results below. (This PR only changes the API to allow extra type options, it doesn't change the type storage/alignment if already defined)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, a couple of minor comments. Nice job!
static Datum | ||
pg_tle_create_base_type_internal(Oid typeNamespace, | ||
char *typeName, | ||
Oid inputFuncId, | ||
Oid outputFuncId, | ||
int16 internalLength, | ||
char *alignmentStr, | ||
char *storageStr, | ||
char *funcProbin); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this the output from pgindent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is.
Co-authored-by: Jonathan S. Katz <jkatz@users.noreply.github.com>
Thank you all for the review! |
Issue #263.
In the previous version, we used default int4 alignment and plain storage (non-TOASTable). The data will always be stored in-line and not compressed. An error will be reported if the value exceeds the limit:
ERROR: row is too big: size xxx, maximum size xxx
With this change, users can customize the alignment and storage strategies. Compression or out-of-line storage can be used depending on the storage strategy (https://www.postgresql.org/docs/current/storage-toast.html).
The default alignment is int4 while the default storage is plain, compatible with previous versions.
A new version 1.3.5 is created along with this change.