Skip to content

Developping a postgres C extension

Remi-C edited this page Jan 20, 2014 · 2 revisions

Abstract

Here are some advices and lessons learnt when trying to had C function to a postgres extension.

Summary

  1. Global organization of pluggin
  2. resources
  3. getting and returning data from/to Postgres
    • getting data from postgres
    • returning data from postgres
  4. memory
  5. Compilation
  6. Test
  7. Debug
  8. Documentation

Details

1. Global organization of an extension

An extension is a library with C functions, and some postgres side functions calling this library.

  • Postgres side : As an extension, there are 2 files which control the postgres side of things (like creating the schema/type/functions, etc). One is a sql script, the other an admin file.
  • lib side : The lib is C compiled code, it is organized as an api toward postgres (plpgsql), focusing on data input / output, and an api for function dealing wiht internal stuff and actually doing the things. Github propose an usefull tool to browse trough sources

Those internal functions are tested wiht CUNIT, the plpgsql function are tested via sql script, comparing the output.

2. resources

3. getting and returning data from/to Postgres

This is the main difficulty, and is developer unfriendly because it uses a lot's of C macro, and there is no proper documentation. It is particularly important to think the data model thoroughly, because it must have several layers : postgres layers : * what kind of type is used in plpgsql signature * wat kind of type is defined for postgres ( example : PCPATCH ) * postgres interface : everything is Datum or pointer to Datum, yet we can retrieve only a part of it (toast-ed data) * Inter-plpgsql call : data should be transmitted between plpgsql function in a certain form. * C level : the data struct used inside C library

  • getting data from postgres First thing is to declare a plpgsql function with the correct signature which points to the C function. Example :
    DROP FUNCTION IF EXISTS pc_patchsubset(p pcpatch, dimensions text[]);
    CREATE OR REPLACE FUNCTION pc_patchsubset(p pcpatch, dimensions TEXT[])
    RETURNS pcpatch AS '$libdir/pointcloud', 'pcpatch_subset'
    LANGUAGE c IMMUTABLE STRICT;

Then on the C side, we get the argument passed by postgres using macros : Example :

    Datum pcpatch_subset(PG_FUNCTION_ARGS);
    PG_FUNCTION_INFO_V1(pcpatch_subset);
Datum pcpatch_subset(PG_FUNCTION_ARGS)
{ 
PG_GETARG_SERPATCH_P(0);
              ...
}

In postgres data is stored as Hex encoded binary, of type cstring. This must be decoded into regular binary (according to the endianess of the hardware).

All the data is passed as a Datum type , or a pointer to a Datum. Is is then possible to get all the data or only a part of it (because the Datum is possibly a toast-ed value, meaning it is written across several rows when the data is big). (see pg_detoast_datum_slice) It is slightly more complicated when having a composite type as input, or an array. The best is then to find an existing function with same kind of signature, then to copy its code. Some functions are implicitly called when casting the result into a database type, they are nammed typename_in and typname_out. Example : pcpatch_in

 * returning data from postgres

Data returned by function are a certain type, but daa written in base must be text or hex encoded binary. For a given return/input type, the data is implictyle preprocessed by the function typename_in and typename_out. Again we use C macro to return certain type. all functions have access so some contexte which store information about whet and when the function is called. This structure can then be used for caching. * Set returning functions For those very special function, there is a perticular macro, the main difference is that some of the function will be called only on the first call, then the rest will be called each time. There are some tools to switch memory context.

4. memory

All memory must be allocated using postgres special allocation/freing, which allows for a kind of a garbage collector when the function end. There are some special meaning to switch between memory context for Set Returning Function

5. Compilation

It is difficult to set the project, because the postgres sources have to be included for the interface part, and we still have to manage the regular includes for the library.

6. Test

It is paramount to have a separate CUnit testbench, because testing functions from Database is very difficult (usually upon error, the server crashes, just saying segfault). So all the functionnality of lib must be tested in CUnit, this way when interfacing with postgres, we only need ot debug the interfacing part. Other test can be done on plpgsql function by comparing expected result vs actual result on known input.

7. Debug

This is very hard to debug if the postgres process is not attached ot a real debugger. Same, regular printf won't show in the database, special function must be used (ELOG) The process should be :

  • develop function in the lib world, and test thoroughly with Cunit
  • create an empty shell of interfacing function C
  • create a correct signature plpgsql function referencing the empty shell
  • when it works, plug the empty shell to the library and use special print to debug.

8. Documentation

Several layers of doc are mandatory due to the complexity : user doc : for plpgsql function low level doc : doxygen data type schema : because of the numerous conversions necessary.