-
Notifications
You must be signed in to change notification settings - Fork 0
Developping a postgres C extension
Here are some advices and lessons learnt when trying to had C function to a postgres extension.
- Global organization of pluggin
- resources
- getting and returning data from/to Postgres
- getting data from postgres
- returning data from postgres
- memory
- Compilation
- Test
- Debug
- Documentation
An extension is a library with C functions, and some postgres side functions calling this library.
- Postgres side : As an extension, there are 2 files which control the postgres side of things (like creating the schema/type/functions, etc). One is a sql script, the other an admin file.
- lib side : The lib is C compiled code, it is organized as an api toward postgres (plpgsql), focusing on data input / output, and an api for function dealing wiht internal stuff and actually doing the things. Github propose an usefull tool to browse trough sources
Those internal functions are tested wiht CUNIT, the plpgsql function are tested via sql script, comparing the output.
- For global advice on C extension : doc page for C extension
- For advices on postgres extension framework : doc page for postgres extension framework
- Mainly, for technical details, the information is only available by browsing trough sources
This is the main difficulty, and is developer unfriendly because it uses a lot's of C macro, and there is no proper documentation.
It is particularly important to think the data model thoroughly, because it must have several layers :
postgres layers :
* what kind of type is used in plpgsql signature
* wat kind of type is defined for postgres ( example : PCPATCH
)
* postgres interface : everything is Datum
or pointer to Datum
, yet we can retrieve only a part of it (toast-ed data)
* Inter-plpgsql call : data should be transmitted between plpgsql function in a certain form.
* C level : the data struct used inside C library
- getting data from postgres First thing is to declare a plpgsql function with the correct signature which points to the C function. Example :
DROP FUNCTION IF EXISTS pc_patchsubset(p pcpatch, dimensions text[]);
CREATE OR REPLACE FUNCTION pc_patchsubset(p pcpatch, dimensions TEXT[])
RETURNS pcpatch AS '$libdir/pointcloud', 'pcpatch_subset'
LANGUAGE c IMMUTABLE STRICT;
Then on the C side, we get the argument passed by postgres using macros : Example :
Datum pcpatch_subset(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(pcpatch_subset);
Datum pcpatch_subset(PG_FUNCTION_ARGS)
{
PG_GETARG_SERPATCH_P(0);
...
}
In postgres data is stored as Hex encoded binary, of type cstring. This must be decoded into regular binary (according to the endianess of the hardware).
All the data is passed as a Datum
type , or a pointer to a Datum
.
Is is then possible to get all the data or only a part of it (because the Datum is possibly a toast-ed value, meaning it is written across several rows when the data is big).
(see pg_detoast_datum_slice
)
It is slightly more complicated when having a composite type as input, or an array. The best is then to find an existing function with same kind of signature, then to copy its code.
Some functions are implicitly called when casting the result into a database type, they are nammed typename_in and typname_out.
Example : pcpatch_in
* returning data from postgres
Data returned by function are a certain type, but daa written in base must be text or hex encoded binary. For a given return/input type, the data is implictyle preprocessed by the function typename_in and typename_out. Again we use C macro to return certain type. all functions have access so some contexte which store information about whet and when the function is called. This structure can then be used for caching. * Set returning functions For those very special function, there is a perticular macro, the main difference is that some of the function will be called only on the first call, then the rest will be called each time. There are some tools to switch memory context.
All memory must be allocated using postgres special allocation/freing, which allows for a kind of a garbage collector when the function end. There are some special meaning to switch between memory context for Set Returning Function
It is difficult to set the project, because the postgres sources have to be included for the interface part, and we still have to manage the regular includes for the library.
It is paramount to have a separate CUnit testbench, because testing functions from Database is very difficult (usually upon error, the server crashes, just saying segfault). So all the functionnality of lib must be tested in CUnit, this way when interfacing with postgres, we only need ot debug the interfacing part. Other test can be done on plpgsql function by comparing expected result vs actual result on known input.
This is very hard to debug if the postgres process is not attached ot a real debugger. Same, regular printf won't show in the database, special function must be used (ELOG) The process should be :
- develop function in the lib world, and test thoroughly with Cunit
- create an empty shell of interfacing function C
- create a correct signature plpgsql function referencing the empty shell
- when it works, plug the empty shell to the library and use special print to debug.
Several layers of doc are mandatory due to the complexity : user doc : for plpgsql function low level doc : doxygen data type schema : because of the numerous conversions necessary.