Purely evil preprocessor macros adding anonymous functions and closures to ISO C99
C
Switch branches/tags
Nothing to show
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
README
c_lambda.h
cltest.c
cmacros.h
examples.c

README

#
# C99-Lambda: nested functions, lambdas, and closures, in ISO C99
# ---------------------------------------------------------------
#
# originally by Alex "Yasha" Gilding, 2012
# 
# This project is placed in the public domain, and may be used for any purpose,
# without any conditions attached or attribution necessary. You can find a copy
# of this licence here:
#
# ...oh wait.
#
# Files supplied AS IS. No warranty is implied and no liability is accepted for
# any consequences of using this code. Using code like this in a real project
# is highly likely to result in a sound thrashing from your colleagues.
#


Requirements:
-------------

C99 conformant (or later) preprocessor. Works with GCC, should work with Clang.
Does not appear to work with TCC or PCC. Unlikely to work at all with MSVC. The
continuation-machine is quite demanding on the preprocessor.

A willing suspension of disbelief. All code examples in this readme really are
standard-conformant ISO C, but it may be hard to believe. This is probably a
bad thing.


Usage:
------

#include "c_lambda.h"

That's all. It's just a tiny test project, after all. Now your namespace is
polluted with a bunch of bizarre macros.


Public syntax extension macros:
-------------------------------

The macros intended to be used at the top-level - i.e. in user code - are as
follows:


 namespace(<name>, ...<body>)
 func(<return type>, <name>, ...<body>)

Any code making use of the inline functions must appear inside one of these two
block constructs. These must appear at the global scope level, as they provide
the framework out of which the inner functions are lifted. They also provide a
root naming scheme for any contained functions. "func" is a convenience version
that links the namespace to the scope of a normal C function. Example:

 namespace(myFunctionList,
   typedef int(* fptr)(void);
   fptr functions[] = {
     fn(int, (void), { return 1; }),  // Simple function literals
     fn(int, (void), { return 2; }),
     fn(int, (void), { return 3; })
   };
 )
 
 func(int, myGlobalFunc, (int blah) {
     //...code here, presumably involving inner functions
 })

All namespaces must be unique. Tying namespacing to normal toplevel functions
helps enforce this, while also abstracting the need to think about a naming
scheme for anonymous objects. Since namespaces must be global, they should not
nest and should not appear within function scopes.


 fn(<return type>, <args>, ...<body>)

The main macro for generating an inline anonymous function. This cannot appear
outside a func() or namespace() block. Inner functions may nest to any depth
(within reason and the limits of the continuation machine). Example:

 typedef int(* fptr)(int);
 func(fptr, someFunc, (void) {
     return fn(int, (int a), {
         fptr f = fn(int, (int b), { return b * 6; });
         return a * f(a + 1);
     });
 })

Inline functions declared in this way may *not* appear within parentheses, for
any reason, even as a macro argument (don't do that). An exception exists, and
is below, but for the most part parentheses interfere with the macro expansion.
Anonymous functions using this form *may* appear in most other contexts:
 - as a return value (shown above)
 - in a braced initializer list, as above in an array, or for a struct
 - as the RHS of an assignment (shown above)
 - in the function-call position, as long as there are no containing parens.
   Example:

 //...
 foo = fn(int, (int a, int b), { return (a + b) * (b - a); })(4, 5);
 fn(void, (int c), { printf("Received c: %d\n", c); })(47);
 //...


 defn(<return type>, <name>, <args>, ...<body>)

Define a function, named and visible in the local scope. The same rules apply
as for fn() regarding parentheses and namespace blocks. Unlike fn(), defn() is
a statement, not an expression; as with a normal function declaration, it cannot
be called in-place or used as a value. Example:

 // Same code as above, using defn() instead of fn()
 defn(int, helper, (int a, int b), {
     return (a + b) * (b - a);
 })
 foo = helper(4, 5);
 //...

The name declared by defn() is a normal local constant containing a function
pointer, and can be used from then on as one would in normal C code.


 cl(<return type>, <args>, <free variables>, ...<body>)

This macro creates a closure object in-place, over the variables named in the
"second args list". The expression is subject to the same rules as fn() - must
appear within a namespace block, may not appear within parentheses - but also,
because it does not return a raw C function pointer, may not appear in the call
position (and therefore C syntax cannot call it like a function). The returned
value is a pointer to a stack-allocated struct (declared at the same external
site as the function body), containing the function pointer, the total size of
the object, and the values of the closed-over variables. Example:

 // User supplies a big enough buffer here...
 func(void, makeAdder, (void * out_buf, int add) {
     closure_type(int, (int)) c = cl(int, (int x), ((int, add)), {
                                      return x + _env->add;
                                  });
     memcpy(out_buf, c, c->_size);
 })
 
 // Close over two variables, no args
 int x = 42, y = 47;
 cl(void, (...), ((int, x), (int, y)), {
     printf("%d, %d\n", _env->x, _env->y);
 });

In the second example, the closure takes no arguments; to indicate this, the
variadic tail should be used instead of "void". Also note how the free variable
list has a slightly more complicated format than the argument list.

To invoke a closure, call its _fun field with itself as the first argument. Any
other arguments may follow. Example:

 // Invoke a closure
 closure_type(int, (int)) c = ...
 c->_fun(c, 6);


 closure_type(<return type>, <arg types>)

This macro expands to an anonymous struct pointer type suitable for punning as
the type of a closure object. The type exposes the _fun and _size fields needed
for general usage of the closure object. Example:

 typedef closure_type(int, (int, int)) Clos;
 Clos c = cl(int, (int x, int y), ((int,a)), { ... });
 Clos d = malloc(c->_size); memcpy(d, c, c->_size);
 int x = d->_fun(d, 8, 10);
 
 closure_type(int, (int)) c1 = cl(int, (int x), ...);

Note that C's typing rules mean that every type created by closure_type is
technically distinct, so it should mostly be used with typedef rather than
in-place.


 _fe1(<function to call>, <lambda>, <other args>...)
 _fe2(<function to call>, <arg>, <lambda>, <other args>...)
 _fe3(<function to call>, <arg>, <arg>, <lambda>, <other args>...)
 _fe4(<function to call>, <arg>, <arg>, <arg>, <lambda>, <other args>...)
 _cle1(<function to call>, <closure>, <other args>...)
 _cle2(<function to call>, <arg>, <closure>, <other args>...)
 _cle3(<function to call>, <arg>, <arg>, <closure>, <other args>...)
 _cle4(<function to call>, <arg>, <arg>, <arg>, <closure>, <other args>...)

These macros provide the sole exception to the parenthesis rule for anonymous
functions and inline closures. An anonymous function or closure may appear in
the respective position marked "lambda" or "closure" for each form, using only
the procedure body rather than the fn() or cl() macro (see below: basically
just drop the "fn" or "cl").

The function in position zero will be called with the lambda as an argument,
along with any arguments appearing in the positions before or after it. If the
_feN form is being used, the function should accept a function pointer; and if
the _cleN form is being used, it should accept a closure pointer.

No syntactic form is provided with the lambda in position zero, since a fn()
form lambda may be called using direct C syntax (shown above), while doing so
with a closure is frankly more or less pointless.

The _feN and _cleN forms are statements, not expressions, and do not return any
value. To return a value, the easiest option would be to use an out-parameter
(possibly defining a wrapper function to pass it through).

Examples:

 // Assume gforeach and gmap are higher-order functions...
 _fe2(gforeach, glist((char*[]){"foo", "bar", "baz"}), (void, (char * c), {
     printf("%s: %d\n", c, strlen(c));
 }));
 
 GList * out; int k = 2;
 _cle2(gmap, glist((int[]){ 1, 2, 3, 4 }), (int, (int n), ((int, k)), {
     return n * _env->k;
 }), &out);
 // "out" now contains { 2, 4, 6, 8 }


Implementation
--------------

The main goal of the project is to find a way to "lift" lambdas out of their
original context, and move them to the global scope to be declared, along with
any modifications and extras required for them to work, ahead of their point of
use. There are three main components to this.

First, code must be broken up into a list of lambdas and non-lambda sections.
The enclosing namespace block wraps its contents in a "non-lambda" marker. When
an inner function is encountered, it closes the wrapping marker, wraps the body
of the inner function in a function marker, and reopens the wrapping marker.

Thus this:

 func(void, foo, (void) {
   fptr g = fn(void, (void), { 0; });
 })

...initially becomes:

 ( 8blk((void) {
   fptr g = ), 8fn(void, (void), ( 8blk( { 0; } ) )), 8blk(;
 }) )

This is a flat list of plain code alternating with inner functions. Functions
nested within inner functions are wrapped in much the same way, using its own
main 8blk list, creating a vague tree structure of code block lists within code
block lists. Further expansion is paused for the moment by truncating the macro
names to begin with a numeral. The enclosing function's name and return type go
directly to the namespace builder macro (where they will provide a root name,
then begin to build the header of the emitted block).

Flattening and processing a tree is a fundamentally recursive task, making it
ill-suited to a language where recursion is forbidden! Therefore, the block is
next passed to a group of macros implementing a continuation machine. The tasks
of splitting the block into a list of functions and a list of code + names,
pulling nested functions out and attaching them to the end of the queue, and
finding the next task in the queue are handled by this continuation machine.

The machine itself is heavily ripped off from Vesa Karvonen's Order-PP library,
except much smaller and lighter. A continuation consists of a macro to run, a
list of arguments to pass it, and optionally some output to dump (output is put
at the end, meaning that everything is written in reverse - thus the first code
encountered, the enclosing namespace using the inner functions, will be last to
be written out). The macro is cut off with a numeral and is given a prefix with
the paste operator when it is time to run; recursion is not expressed directly,
but rather in continuation-passing style.

The queue of "next operations", that accepts nested inner function definitions
to write (and "extras" like the struct definitions for closures) is actually
not made explicitly visible to the continuation machine, but instead passed as
part of the argument list. When a function has been written, it hands control
over to the 8DO_Q continuation, which grabs the next item from the queue, and
repackages the rest into the argument list for it.

Because a continuation is returned as a parenthesised list of procedure, args,
and remainder, breaking out is as simple as returning something not contained
within parentheses, so that the next continuation step is not invoked. An "end"
macro, waiting at the end of the queue, does this in an easily-cleaned-up way.

Because the next macro to invoke is held in truncated form, the recursion issue
is bypassed, as no macro returns its own name in full. There is a maximum limit
to the number of steps the continuation machine can provide, but at 2^N where N
is the number of defined CM macros, this is very easily extended.

Simpler tasks, such as applying a macro to each element of a list, are handled
by more traditional macros in "cmacros.h". These are not recursion-safe, so are
only ever called on to return values by the CM macros. They have a much lower
maximum limit, since they grow in a linear fashion (32 by default), but not
integrating them into the CM should dramatically improve performance, and also
does not waste CM steps on non-recursive computation. Where the macros need to
be able to cull some values (e.g. to avoid calculating the unused branch of an
IF expression), select additional CM steps can be added to filter the amount of
work being demanded.


Final note on usage
-------------------

...you probably really shouldn't abuse poor innocent C in this way.