# High Level Programming in C

According to the TIOBE report available at http://www.tiobe.com, Java and C have been the two most popular languages for 15 years. Together they account for more than 30% of the ratings, while every other language holds 6% or less. Although C is categorized as a high level language, most average developers still see it as a low level one with a steep learning curve. One reason might be its excellent mapping to hardware, which makes it a good candidate for operating system design. But we believe this is not the only reason, nor even the main one.

In this document, we present a set of requirements for high level programming that, in our opinion, the C language does not fulfill. For each requirement, we propose a solution.


## Development Process

Starting a project from scratch (in C) requires adopting a coding style, coding conventions and packaging rules, among other things; these are briefly described below.

### Coding Convention

Unlike some other languages, C does not come with any coding convention *per se*, which has led to many different conventions: https://en.wikipedia.org/wiki/Indent_style. The [C FAQ](http://c-faq.com/style/layout.html) advises following the K&R coding style. We are even a bit stricter and follow this guideline: https://users.ece.cmu.edu/~eno/coding/CCodingStandard.html.

We also adopt the following conventions:
* most modules provide an object-oriented interface: for a given structure ``struct foo``, most module functions are declared as ``func(foo_p self, ...)``, where ``foo_p`` is a pointer to ``struct foo`` and represents the "object" on which the function ``func`` operates;
* enumerations are suffixed with ``_e``;
* static names defined in the .c file are prefixed with an underscore, as in ``_a_static_func(...)``;
* the internal API within a module implementation uses a double underscore to differentiate it from the external API.
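To make these conventions concrete, here is a minimal sketch of a hypothetical `counter` module; the module and all of its names are illustrative only. Note that the C standard reserves file-scope identifiers beginning with an underscore, so the single-underscore convention for static names assumes, in practice, no clash with the C implementation.

```c
#include <stdlib.h>

/* Enumerations are suffixed with _e. */
typedef enum {
    COUNTER_UP,
    COUNTER_DOWN,
} counter_direction_e;

/* struct counter is the "object"; counter_p is a pointer to it. */
struct counter {
    long value;
    counter_direction_e direction;
};
typedef struct counter *counter_p;

/* Static name, private to this .c file: single underscore prefix. */
static long _clamp_non_negative(long v) {
    return (v < 0) ? 0 : v;
}

/* Internal API shared by the module's own .c files: double underscore. */
long counter__step(counter_p self) {
    return (COUNTER_UP == self->direction) ? 1 : -1;
}

/* Public API: the first parameter is the object being operated on. */
counter_p counter_new(counter_direction_e direction) {
    counter_p self = calloc(1, sizeof(*self));
    if (NULL != self) self->direction = direction;
    return self;
}

long counter_next(counter_p self) {
    self->value = _clamp_non_negative(self->value + counter__step(self));
    return self->value;
}

/* Frees the object and resets the caller's pointer to NULL. */
void counter_destroy(counter_p *self_p) {
    free(*self_p);
    *self_p = NULL;
}
```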

We also provide two templates that ease structuring the code in a consistent manner:
* header file (TODO: provide the link)
* implementation file (TODO: provide the link)


### Code Quality Convention

Neither C11 nor POSIX defines code quality conventions. There are, however, at least two "standards" somewhat related to this topic:
* MISRA: initially for the automotive industry but now extended to embedded systems: https://en.wikipedia.org/wiki/MISRA_C;
* CERT C Coding Standard: defines a set of rules and recommendations for security and safety: https://www.securecoding.cert.org/confluence/x/HQE

The problem is the tooling required to ensure conformance to those standards: it might not be free of charge. At the very minimum, we recommend the following:
* Compilation flags: turn all warnings into errors; all warnings must be fixed before any release of the product, and the sooner they are taken into account, the easier the long-term maintenance of the product. For example, with gcc, CFLAGS are:

    CFLAGS="-Wall ..."  # TODO DEFINE
    
* Compile with various compilers (at the very least, gcc and clang): each compiler might raise issues unseen by the others;
* Use [valgrind](http://valgrind.org/) systematically to detect memory problems. We use the following options for valgrind:

    VALGRIND_OPTS="..." # TODO DEFINE 
 

### Packaging

The layout of a project is important, especially when the project is made of several subprojects. If all subprojects share the same layout, it becomes much easier to locate a given file. Therefore, we propose the following layout, which should fit the requirements of most projects:
TODO


### Versionning

Versioning a product is not so easy. First, one must adopt a versioning convention, and there are many ways to do that: https://en.wikipedia.org/wiki/Software_versioning. We propose following a fairly common convention:
``major.minor.fix-release`` where:
* ``major`` is incremented when incompatible changes have been introduced;
* ``minor`` is incremented when compatible changes have been introduced (new functions in the API for example);
* ``fix`` is incremented when only bug fixes have been introduced;
* ``release`` is incremented when only the packaging has changed, such as the layout, the makefile, or something else.
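As an illustration of how the scheme above could be checked in code, here is a minimal sketch. The `struct version` type and the `version_parse()` and `version_compatible()` functions are hypothetical; they are not the version management API that remains to be defined below. Following the rules above, a provided version is treated as compatible with a required one when both major numbers are equal and the provided minor number is not lower (a higher minor only adds compatible API).

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical representation of a major.minor.fix-release version. */
struct version {
    unsigned major, minor, fix, release;
};

/* Parses "major.minor.fix-release"; returns false on malformed input. */
bool version_parse(const char *text, struct version *v) {
    int consumed = 0;
    /* %n stores how many characters were read; it is not counted in the result. */
    int n = sscanf(text, "%u.%u.%u-%u%n",
                   &v->major, &v->minor, &v->fix, &v->release, &consumed);
    /* Require all four fields and no trailing characters. */
    return 4 == n && '\0' == text[consumed];
}

/* True if a component built as `provided` can be used where `required`
 * is expected: same major number, and no older minor number. */
bool version_compatible(const struct version *required,
                        const struct version *provided) {
    return required->major == provided->major
        && provided->minor >= required->minor;
}
```

Neither `fix` nor `release` takes part in the compatibility check, since by definition they change neither the API nor its behaviour as seen by callers.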

Once the versioning scheme has been defined, the version must be made available at the code level, for instance to implement the ``cmd --version`` option or the ``About...`` dialog box. This can be done using the following API:

TODO: define and implement an API for version management

### Dependency Management

Most modern languages provide a system (integrated or not) to retrieve dependencies. For example, Java uses [Maven](https://maven.apache.org/) or [Ivy](http://ant.apache.org/ivy/), Python uses [PyPI](https://pypi.python.org/pypi), Go has the [``go``](https://golang.org/cmd/go/) tool, and Ruby uses [``gem``](https://rubygems.org/), but C provides nothing.

Following the guidelines defined here, especially concerning [Packaging](#Packaging) and [Versionning](#Versionning), we propose a tool that could provide dependency management for C packages.

TODO: make ``bxivcs`` a product and describe it.

### Development Cycle

Probably the main difference between C and modern languages such as Java, Python, or Go is the development cycle. Since Python is an interpreted language, it naturally offers the fastest one: code -> run. C, Java and Go must be compiled, so their development cycle includes one more step: code -> compile -> run. However, Java is so well integrated within "modern" IDEs such as Eclipse or Netbeans that the compile phase is mostly transparent. Therefore, at least in Java, the development cycle remains as fast as for any interpreted language: code -> run. Although Eclipse supports the C language, it does not currently provide the same level of integration as for Java. This might change in the near future.


### Debugging

Most modern languages provide various mechanisms that ease debugging:
* no direct access to memory;
* smart bounds checking (that can sometimes be removed at runtime without harm);
* stack trace on error with dynamic message, including file, function and line number;
* assertion system that can be de-activated at runtime; 
* logging system that helps to understand what happens at runtime;
* monitoring system (at least in Java, with JMX).

C11 provides no way to produce a stack trace, and neither a logging system nor a monitoring system. POSIX does not provide anything comparable to what "modern" languages offer either (more on that [later](#Standard-Libraries)). Therefore, in C, a problem most of the time results in a segmentation fault at best (with the famous ``core dumped`` message), and goes undetected at worst (e.g. memory corruption).
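A stack trace can nevertheless be obtained on some platforms. The sketch below relies on `backtrace()` and `backtrace_symbols_fd()`, which are glibc extensions declared in `<execinfo.h>` (they are part of neither C11 nor POSIX); the helper name `print_stack_trace()` is ours. With the GNU linker, function names usually appear only if the program is linked with `-rdynamic`.

```c
#include <execinfo.h>
#include <unistd.h>   /* STDERR_FILENO, for callers */

#define MAX_FRAMES 64

/* Writes the current call stack to the file descriptor fd, one line per
 * frame, and returns the number of frames captured (glibc only). */
int print_stack_trace(int fd) {
    void *frames[MAX_FRAMES];
    int depth = backtrace(frames, MAX_FRAMES);
    /* Unlike backtrace_symbols(), this variant does not call malloc():
     * it writes the symbolic frames directly to fd. */
    backtrace_symbols_fd(frames, depth, fd);
    return depth;
}
```

Calling `print_stack_trace(STDERR_FILENO)` on an error path prints return addresses and, when the symbols are available, function names. This is still far from the structured stack traces of "modern" languages.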

A first solution to this problem is to guarantee code quality using the tools and techniques already presented in section [Code Quality Convention](#Code-Quality-Convention). But this is not sufficient to accelerate the development cycle: using ``gdb`` or ``valgrind`` for simple bugs is too costly and time-consuming.

The standard ``assert.h`` header is not sufficient either: assertions can only be disabled at compile time (by defining ``NDEBUG``), not at runtime, and the error message is quite minimal: the stack trace, for example, is not included. Finally, of course, ``printf()`` is not a good option for a logging system: it is too costly, it does not provide enough information (thread, module, function, line), and it cannot be disabled in one module while remaining enabled in another.
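For illustration, an assertion that can be switched off at runtime and reports the file, function and line can be sketched as follows. `RT_ASSERT`, `rt_assert_enabled` and `rt_assert_fail()` are hypothetical names, and how the switch is set (command-line option, environment variable, or other) is left open.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical global switch, typically set once at startup. */
bool rt_assert_enabled = true;

/* Reports the failed condition with its location, then aborts. */
_Noreturn void rt_assert_fail(const char *cond, const char *file,
                              const char *func, int line) {
    fprintf(stderr, "%s:%d: %s(): assertion '%s' failed\n",
            file, line, func, cond);
    abort();
}

/* Unlike assert(), this check can be switched off at runtime. When it is
 * disabled, the condition is not evaluated, so it must have no side effects. */
#define RT_ASSERT(cond)                                          \
    ((!rt_assert_enabled || (cond))                              \
         ? (void) 0                                              \
         : rt_assert_fail(#cond, __FILE__, __func__, __LINE__))
```

The cost of a disabled assertion is a single test of a global flag, which is the trade-off that lets such checks remain in production builds.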

Therefore, some important debugging libraries are missing, as will be discussed in [Standard Libraries](#Standard-Libraries).


### Documentation 

Finally, the standard documentation system for C is the venerable man page system. Although useful for commands or configuration files, it cannot compete with documentation systems specifically designed for APIs such as ``javadoc``, ``pydoc``, ``godoc`` and so on:
* searching with the ``apropos`` command returns too many items, most of them irrelevant (e.g. from the wrong section);
* relations between structures/types/modules are not dynamic/hypertext;
* no figure or schema can be provided since the system is text based.

We actually use [Doxygen](http://www.doxygen.org/) since it supports C natively, as well as C++, Java, Perl and Python. It has some drawbacks, but we did not find any good alternative suitable for projects that include both C code *and* Python code. One major advantage we found with Doxygen is its ability to include code snippets from examples and to link them with the documentation. One example here: [TODO: provide the link](http://somewhere.net/bxibase/examples).
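As an illustration, a Doxygen comment on a C function could look like the sketch below; the `count_char()` function and the `examples/count_char.c` file are hypothetical. The `@snippet` command pulls in the block delimited by `[count usage]` markers (for instance `//! [count usage]` comments) from that example file, which Doxygen looks up through its `EXAMPLE_PATH` setting.

```c
#include <stddef.h>

/**
 * @brief Counts the occurrences of a character in a string.
 *
 * @param[in] str the NUL-terminated string to scan
 * @param[in] c   the character to look for
 * @return the number of occurrences of @p c in @p str
 *
 * Example of use, taken from a compiled example file:
 * @snippet examples/count_char.c count usage
 */
size_t count_char(const char *str, char c) {
    size_t n = 0;
    for (; '\0' != *str; str++) {
        if (*str == c) n++;
    }
    return n;
}
```

Because the snippet comes from a real example file, the documented usage is compiled along with the examples and cannot silently drift from the API.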



## Standard Libraries

Despite its age (C was created in the early 1970s by Dennis Ritchie), C is very poor in general-purpose standard libraries. For example, while the current C standard, known as C11, was specified in 2011 (see the N1570 draft: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf), it still does not provide a generic list or hash table comparable to what can be found in "modern" languages such as Java, Python, Perl, Go, Haskell, among many others. In theory, one should distinguish the language (the set of syntactic and grammatical rules that define it) from the default set of libraries provided with it. But in practice, both form a whole that must fulfill the requirements of an industrial project. Most modern languages provide a very rich set of default libraries. Comparing the documentation of the C11 standard library (https://en.wikipedia.org/wiki/C_standard_library#Header_files) with that of at least one "modern" language makes this quite clear:

* Java: http://docs.oracle.com/javase/8/docs/api/index.html,
* Python: https://docs.python.org/2/library/index.html,
* Perl: https://perldoc.perl.org/,
* Go: https://golang.org/pkg/#stdlib,
* Haskell: https://www.haskell.org/onlinereport/haskell2010/haskellpa2.html#x20-192000II


Although POSIX (defined in http://pubs.opengroup.org/onlinepubs/9699919799/) provides many libraries in addition to C11 (a general-purpose hash table and list can be found in the ``search.h`` header, for example), it is far from sufficient to compete with what "modern" languages provide. For example, ``hcreate()`` provides a hash table API, but it is not reentrant, does not support a dynamic maximum number of elements, and only supports strings as keys.
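These limitations are visible in a short example. The wrappers `table_init()`, `table_put()` and `table_get()` are hypothetical; `hcreate()`, `hsearch()` and `hdestroy()` are the POSIX functions from `<search.h>`.

```c
#define _XOPEN_SOURCE 700  /* expose the XSI <search.h> functions */
#include <search.h>
#include <stdbool.h>
#include <stddef.h>

/* hcreate() manages a single, process-wide table: there is no handle, so
 * only one such table can exist at a time, and the functions are not
 * reentrant. The capacity is fixed once and for all at creation. */
bool table_init(size_t capacity) {
    return 0 != hcreate(capacity);
}

/* Keys can only be NUL-terminated strings. The table stores the key pointer,
 * not a copy, so the key must outlive the table. If the key is already
 * present, the existing entry is kept and its data is NOT updated. */
bool table_put(char *key, void *value) {
    ENTRY item = { .key = key, .data = value };
    /* NULL means the table is full (or memory is exhausted). */
    return NULL != hsearch(item, ENTER);
}

void *table_get(char *key) {
    ENTRY query = { .key = key, .data = NULL };
    ENTRY *found = hsearch(query, FIND);
    return (NULL != found) ? found->data : NULL;
}
```

The table is released with `hdestroy()`, after which a new one can be created, still only one at a time.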

In the following, we consider a minimal set of libraries that are strictly required for high level programming. They either enhance what C11 and POSIX provide or define a missing module.

### Memory Management

The C language supports low-level access to computer memory. Experts often see this as a strength. However, it is also probably one of the main reasons why bugs, memory leaks and security holes exist. For allocation, the functions ``malloc()``, ``calloc()`` and ``realloc()`` are used. Except for ``calloc()``, those functions do not initialize the memory. This might look strange to users of Java, Python, or similar languages that always initialize memory. We believe initialization should be the default behaviour most of the time, unless profiling of the application shows otherwise. The C standard also provides ``free()`` for releasing previously allocated memory. However, this function does not set the pointer given as argument to ``NULL`` (it cannot, since the pointer is passed by value). This often leads to problems when the same pointer is mistakenly passed twice to ``free()``. Most of the time, keeping the previous value of a pointer to released memory is error prone.

The program below illustrates all the problems mentioned above with the standard C API.

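```c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    size_t n = 10;
    char *ptr = malloc(n * sizeof(*ptr));
    if (NULL == ptr) return EXIT_FAILURE;
    printf("\nMemory content after malloc #1 (not guaranteed to be zeroed)\n");
    for (size_t i = 0; i < n; i++) printf("%02x", (unsigned char) ptr[i]);  // Reads indeterminate values on purpose
    memset(ptr, 'A', n * sizeof(*ptr));
    free(ptr);
    // Strictly, even reading ptr after free() is undefined behavior; it is done here only to show that free() leaves it unchanged
    printf("\nValue of ptr after a free() (not guaranteed to be NULL): %p", (void *) ptr);

    ptr = malloc(n * sizeof(*ptr));
    if (NULL == ptr) return EXIT_FAILURE;
    printf("\nMemory content after malloc #2 (not guaranteed to be zeroed)\n");
    for (size_t i = 0; i < n; i++) printf("%02x", (unsigned char) ptr[i]);
    printf("\n");

    n *= 2;
    char *tmp = realloc(ptr, n * sizeof(*ptr));  // Don't assign to ptr directly: it would leak if realloc() fails
    if (NULL == tmp) { free(ptr); return EXIT_FAILURE; }
    ptr = tmp;
    printf("\nMemory content after realloc (the new memory area is not guaranteed to be zeroed)\n");
    for (size_t i = 0; i < n; i++) printf("%02x", (unsigned char) ptr[i]);
    printf("\n");

    free(ptr);
    // free(ptr); // Double free: undefined behavior, typically an abort or heap corruption
    return EXIT_SUCCESS;
}
```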

We propose the [``bximem``](http://doc.bxi.hl/bxibase/bxi/base/mem.h) module, which mainly provides functions better suited to high level programming. It solves all the problems above, as shown by the example below:

```c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#include <bxi/base/mem.h>

int main(void) {
    size_t n = 10;
    char *ptr = bximem_calloc(n * sizeof(*ptr));
    printf("\nMemory content after bximem_calloc #1 (guaranteed to be zeroed)\n");
    for (size_t i = 0; i < n; i++) printf("%02x", (unsigned char) ptr[i]);
    memset(ptr, 'A', n * sizeof(*ptr));
    BXIFREE(ptr);
    printf("\nValue of ptr after a BXIFREE() (guaranteed to be NULL): %p", (void *) ptr);

    ptr = bximem_calloc(n * sizeof(*ptr));
    printf("\nMemory content after bximem_calloc #2 (guaranteed to be zeroed)\n");
    for (size_t i = 0; i < n; i++) printf("%02x", (unsigned char) ptr[i]);
    printf("\n");

    size_t old_size = n * sizeof(*ptr);  // Sizes are given in bytes
    n *= 2;
    ptr = bximem_realloc(ptr, old_size, n * sizeof(*ptr));
    printf("\nMemory content after realloc (the new memory area is not guaranteed to be zeroed)\n");
    for (size_t i = 0; i < n; i++) printf("%02x", (unsigned char) ptr[i]);
    printf("\n");

    BXIFREE(ptr);
    BXIFREE(ptr); // This won't produce any error since ptr is already NULL
}
```
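The semantics above can be approximated on top of the standard allocator. The sketch below is not the `bximem` implementation: `mem_calloc()`, `mem_realloc()` and `MEM_FREE()` are hypothetical names, and aborting on allocation failure is a simplification. Passing the old size to the reallocation function is what makes it possible to zero the newly added area, which is what `mem_realloc()` does here.

```c
#include <stdlib.h>
#include <string.h>

/* Allocates size bytes, always zeroed; aborts on failure. */
void *mem_calloc(size_t size) {
    void *p = calloc(1, size);
    if (NULL == p && 0 != size) abort();
    return p;
}

/* Like realloc(), but the bytes beyond old_size are zeroed; aborts on failure. */
void *mem_realloc(void *ptr, size_t old_size, size_t new_size) {
    void *p = realloc(ptr, new_size);
    if (NULL == p && 0 != new_size) abort();
    if (new_size > old_size) {
        memset((char *) p + old_size, 0, new_size - old_size);
    }
    return p;
}

/* Frees the memory and resets the variable to NULL. Since free(NULL) is a
 * no-op, using it twice on the same variable is harmless. A macro is needed
 * because a function cannot modify the caller's pointer variable. */
#define MEM_FREE(p) do { free(p); (p) = NULL; } while (0)
```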

### Error Management

While memory management in C can be seen as a powerful tool, when combined with C error management it definitely becomes the main source of buggy C programs. Error management in C is traditionally based on integer return codes, which makes it at the same time very poor and very weak:
* very poor: because a simple integer does not hold any context, it cannot tell much about the reason for an issue;
* very weak: because using the result of a function is not mandatory in C, it is very easy to just forget the returned code.

For the first problem, C provides `errno`, `perror()` and `strerror()`, but they are a pain to use, as shown by the code below:

```c
#include <errno.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    errno = 0;                                                   // (1) Don't forget that!
    FILE * file = fopen("/a/non/existent/file", "r");
    if (NULL == file) {                                          // (2) Don't forget that!
        char * msg = strerror(errno);                            // (3) Not thread-safe
        fprintf(stderr, "Something wrong happened: %s\n", msg);  // (4) Does the message hold all required information?
        // free(msg);                                            // (5) Don't do that: undefined behavior, typically a crash
    } else {
        printf("Strange: you really have such a file?\n");       // This should not happen
        fclose(file);
    }
}
```

* In instruction (1), `errno` is reset before the call. Strictly speaking, this is only required for functions whose return value cannot signal an error by itself (`strtol()`, for example), but the behavior of C standard functions regarding `errno` is not consistent, so resetting it is a common defensive habit. For example, most functions defined in `pthread.h` do not set `errno` at all: they return the error code directly. In this small example, the problem might not be visible at first, but in real software, it is very easy to forget this initialization. In such a case, the behavior of the process might become completely strange: it might report errors where none occurred.

* Instruction (2) is standard practice in C, and we will rely on that for error management: checking the return value of a function.

* Instruction (3) uses the `strerror()` function, defined in `string.h`, which is not thread-safe. It returns a string describing the error code given as argument, in our case `errno`.

* Instruction (4) is a very simple way to deal with the error: we just display it. However, the message returned by `strerror()` does not hold the context, in our case the file name. The context also includes the whole call stack. Therefore, the displayed message is not very useful. Of course, in this simple example, we can simply add the file name to the `fprintf()` call, and it solves the problem. In the general case, however, the fact that `strerror()` does not include the context is a problem: you cannot just return the message to the caller, for example; you have to create a specific structure that holds the context along with the error message (or the code, but not `errno` itself since it might change afterwards). Error reporting is also very poor: either you display the message on standard error with `perror()` or `error()` (a GNU extension), or you do it yourself with `fprintf()`.

* Instruction (5) is also dangerous: `strerror()` does not return an allocated string (which, by the way, is the main reason why it cannot include the context); therefore, the returned string must neither be freed nor modified.
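A thread-safe alternative to `strerror()` is the XSI version of `strerror_r()`, which writes into a caller-supplied buffer. The sketch below uses it in a hypothetical helper, `format_error()`, that keeps the context next to the message. On glibc, defining `_POSIX_C_SOURCE` to `200112L` or higher without `_GNU_SOURCE`, before any header is included, selects the XSI variant returning an `int` rather than the GNU variant returning a `char *`.

```c
#define _POSIX_C_SOURCE 200112L  /* must come first: selects the XSI strerror_r() */
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Formats "<context>: <error message>" into buf, which the caller owns.
 * strerror_r() writes into a caller-supplied buffer, so, unlike strerror(),
 * this can be called from several threads. Returns 0 on success and -1 if
 * buf was too small (the text is then truncated). */
int format_error(char *buf, size_t len, int errnum, const char *context) {
    char msg[256];
    if (0 != strerror_r(errnum, msg, sizeof(msg))) {
        snprintf(msg, sizeof(msg), "Unknown error %d", errnum);
    }
    int n = snprintf(buf, len, "%s: %s", context, msg);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

The resulting string still has to be carried around by hand, together with the error code if the caller needs it, which is precisely the burden described above.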

Most modern languages use [Exception Handling](https://en.wikipedia.org/wiki/Exception_handling) to solve most of those issues: an exception is an object, so it can hold a context that helps understand what the problem is all about. In some languages (e.g. Java), some exceptions cannot be ignored at all: they have to be handled by the caller or explicitly passed up the call stack. Although C libraries for exception handling [exist](https://github.com/guillermocalvo/exceptions4c/wiki/alternatives), they are of course not standard.

However, exception handling has some drawbacks (see [Exception Handling Criticism on Wikipedia](https://en.wikipedia.org/wiki/Exception_handling#Criticism) and [Why should I have written ZeroMQ in C, not C++ (part I)](http://250bpm.com/blog:4)); therefore, we prefer to stick with the C return-code tradition.

The `bxierr` module has been designed specifically to address those problems. It defines an object-like structure `bxierr_p` that represents the error. Following the C tradition, a function is supposed to return such a `bxierr_p` object in case of error or the special constant `BXIERR_OK` when no problem occurs. 

In short, the `bxierr.h` module is:

* efficient: determining whether an error occurred after a call requires just a comparison between two pointers, the returned `bxierr_p` object and the constant `BXIERR_OK`; this operation is very fast (as fast as comparing two integers in the traditional approach);
* rich: the returned `bxierr_p` object includes an error code, a message that can include part of the context (such as the file name in our example), and the complete call stack;
* safer: since a `bxierr_p` object is returned, tools such as [Valgrind](http://valgrind.org) or [Coverity](http://www.coverity.com) will complain if it is neither destroyed nor returned to the caller.

This last property prevents errors from being ignored, as is usually done in C (how many times do you really check the code returned by `printf()`?). However, handling the error after each function call is clearly a pain and eventually makes the code unreadable. Note that one of the features of exception handling is the ability to gather the error handling code in one location, the `except/catch` block. The `bxierr.h` module provides various features in this regard:

* error chaining: an error can be the cause of another error;
* error list: unrelated errors can be stored in a list;
* error set: unrelated errors can be stored in a set, that is, two errors of the same type (with the same error code) are stored only once; this makes it possible to report each kind of error only once.
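The sketch below shows why returning a pointer makes this design both cheap and rich. It is not `bxierr`: the `err_p` type, `ERR_OK` (here simply `NULL`), and the `err_new()`, `err_chain()` and `err_report()` functions are hypothetical. Only chaining is shown, with an earlier error becoming the cause of a later one.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error object: a code, a formatted message and a cause. */
typedef struct err {
    int code;
    char msg[128];
    struct err *cause;   /* error that led to this one, or ERR_OK */
} *err_p;

/* Success is a NULL pointer: checking a result is one pointer comparison. */
#define ERR_OK ((err_p) NULL)

err_p err_new(int code, const char *fmt, ...) {
    err_p e = calloc(1, sizeof(*e));
    if (NULL == e) abort();
    e->code = code;
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(e->msg, sizeof(e->msg), fmt, ap);
    va_end(ap);
    return e;
}

/* Makes `earlier` the root cause of `later` and returns the resulting
 * chain; either argument may be ERR_OK. */
err_p err_chain(err_p earlier, err_p later) {
    if (ERR_OK == later) return earlier;
    err_p last = later;
    while (ERR_OK != last->cause) last = last->cause;
    last->cause = earlier;
    return later;
}

/* Prints the whole chain, most recent error first, frees it, resets *errp
 * to ERR_OK and returns the number of errors reported. */
size_t err_report(err_p *errp, FILE *out) {
    size_t n = 0;
    for (err_p e = *errp; ERR_OK != e; n++) {
        fprintf(out, "%s[%d] %s\n", n ? "  caused by: " : "", e->code, e->msg);
        err_p cause = e->cause;
        free(e);
        e = cause;
    }
    *errp = ERR_OK;
    return n;
}
```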


The following example illustrates the use of the `bxierr.h` module for high level programming. 

```c
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>                                              // STDERR_FILENO
#include <bxi/base/err.h>

int main(void) {
    const char * filename = "/a/non/existent/file";
    errno = 0;                                                   // (1) Don't forget that!
    FILE * file = fopen(filename, "r");
    if (NULL == file) {
        bxierr_p err = bxierr_errno("Problem while opening %s",  // (2) Convenient function to create a bxierr_p
                                    filename);
        bxierr_report(&err, STDERR_FILENO);                      // (3) Convenient function to report an error
    } else {
        printf("Strange: you really have such a file?\n");       // This should not happen
        fclose(file);
    }
}
```

* Instruction (1) is kept for the same reason as in the previous example.
* Instruction (2) shows a convenient way to create a `bxierr_p` from an `errno` code. There are many other functions for `bxierr_p` creation.
* Instruction (3) shows how to report a `bxierr_p`; a more powerful way will be presented in [Logging System](#Logging-System).

Notice the output now provides much more information: 

* the error code;
* the message with the file name;
* the full call stack.

Of course, one might argue that if the file name is required in the output, one just has to add it to the `fprintf()` call, and the result will be somewhat equivalent. In this simple case, where the error is just reported, you would be right, except that the call stack would be missing. However, if the error is not only supposed to be reported, but also t

The `bxierr.h` module is a real change in C programming, and is one of the most important modules for high level programming. To understand why, a more complete example is required:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>                 // STDERR_FILENO
#include <bxi/base/mem.h>           // BXIFREE()
#include <bxi/base/err.h>

bxierr_p f(void) {
    // This function actually generates many independent errors: we store them in a list
    bxierr_list_p err_list = bxierr_list_new();
    for (size_t i = 0; i < 5; i++) {
        // For example, some files are opened
        // open(...)
        bxierr_p err = bxierr_simple(5, "Error returned from thread %zu", i); // An error type is given by an integer code
        bxierr_list_append(err_list, err);
    }
    return (0 == err_list->errors_nb) ? BXIERR_OK : bxierr_from_list(999, err_list, "At least one error occurred");
}

bxierr_p g(void) {
    // This function might generate many errors in a loop; instead of storing each one of them,
    // we only store distinct ones, limiting the number of reports.
    bxierr_set_p err_set = bxierr_set_new();
    while (3 > err_set->distinct_err.errors_nb) {
        // This loop is, for example, accessing a file on a regular basis and must never crash, even on error.
        // However, error reporting is important: we report an error only the first time it is seen
        // read(...) or write(...)
        long int err_type = rand() % 4; // Let's suppose there are only 4 types of errors
        bxierr_p err = bxierr_simple(err_type, "An error occurred while reading/writing");
        if (bxierr_set_add(err_set, &err)) {
            // First time such an error is seen: we report it
            char * err_str = bxierr_str(err);
            fprintf(stderr, "First time seen and reported error: %s", err_str);
            BXIFREE(err_str);
        } // Otherwise, this error has already been seen: no report is produced (note that it has already been destroyed)
    }
    fprintf(stderr, "\n--- End of first time seen error reporting ---\n");

    return bxierr_from_set(888, err_set, "Maximal number of distinct errors reached: %zu/%zu",
                                err_set->distinct_err.errors_nb, err_set->total_seen_nb);
}

int main(void) {
    bxierr_p err = BXIERR_OK, err2;

    srand((unsigned) time(NULL));
    err2 = f();
    BXIERR_CHAIN(err, err2); // We don't want to deal with the error here

    err2 = g();
    BXIERR_CHAIN(err, err2); // Here, we chain both errors: errors from f() might cause g()
                             // to return errors too

    bxierr_report(&err, STDERR_FILENO);
}
```

The `main()` function illustrates the *error chaining pattern*: errors from functions `f()` and `g()` are not ignored; the error-handling code (in this simple case, producing a report with `bxierr_report()`) is at the end of the program. If `f()` produces an error (which always happens in this simple case), it becomes the cause of the error produced by `g()`, if any. Basically, an error can be dealt with in three different ways:

* you report it directly, using `bxierr_report()` or, even better, `BXILOG_REPORT` as we will see in [Logging System](#Logging-System), or with `fprintf()` or any other reporting system (note: the error is normally destroyed after this step);
* you pass it to the caller using a simple `return` statement;
* you try to continue anyway; in this case, you chain it using `BXIERR_CHAIN()`.

Function `f()` uses the `bxierr_list_p` structure to store the different errors and turn them into a normal `bxierr_p` object.

Function `g()` uses the `bxierr_set_p` structure to store only distinct errors: the idea is to prevent a program from producing many identical reports such as "Can't write to file ...: disk full". With this simple but powerful mechanism, such a message is reported only once.
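The idea behind the error set fits in a few lines. The following hypothetical sketch is keyed only by the error code and has a fixed capacity; it is not how `bxierr_set_p` is implemented, and its field names merely echo the counters used in `g()`.

```c
#include <stdbool.h>
#include <stddef.h>

#define ERRSET_MAX 16  /* hypothetical fixed capacity */

/* Remembers which kinds of errors have been seen, and how many in total. */
struct errset {
    int codes[ERRSET_MAX];
    size_t distinct_nb;
    size_t total_seen_nb;
};

/* Records an error code; returns true only the first time a code is seen,
 * which is when the caller should report it. Once the set is full, new
 * codes are reported every time since they cannot be remembered. */
bool errset_add(struct errset *set, int code) {
    set->total_seen_nb++;
    for (size_t i = 0; i < set->distinct_nb; i++) {
        if (set->codes[i] == code) return false;
    }
    if (set->distinct_nb < ERRSET_MAX) {
        set->codes[set->distinct_nb++] = code;
    }
    return true;
}
```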

The `bxierr.h` module provides very limited reporting facilities: it is therefore only useful when nothing else is available. In particular, the `bxilog.h` module is much preferable for reporting, as we will see in [Logging System](#Logging-System).

### String Manipulation

String manipulation is usually a headache in C compared to Python, Perl or even Java, which is also compiled. One of the main reasons is probably the choice of the [NUL string terminator](https://en.wikipedia.org/wiki/Null-terminated_string), described as ["the most expensive one-byte mistake"](http://queue.acm.org/detail.cfm?id=2010365) by FreeBSD developer Poul-Henning Kamp. However, for compatibility reasons, changing this in C is not realistic. Therefore, the `bxistr` module proposes a few but very useful functions.

Among those functions:

* `bxistr_new()` provides a simple API for safely creating a new string. It is similar to `printf()` and defines the appropriate compiler attribute so that, if a mistake is made in the format string specifier, the compiler produces a warning;
* `bxistr_join()` allows multiple lines to be joined with a given separator, similarly to Python [str.join()](https://docs.python.org/2/library/stdtypes.html#str.join);
* `bxistr_apply_lines()` calls a given function for each line found in a given string;
* `bxistr_prefixer*()` functions allow a string to be prefixed.
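The first two functions can be approximated with the standard library alone. `str_new()` and `str_join()` below are hypothetical stand-ins, not the `bxistr` code. The `format` attribute is a GCC/Clang extension; it is what lets the compiler check `printf()`-style arguments against the format string.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The attribute tells the compiler that argument 1 is a printf() format
 * whose arguments start at position 2. */
char *str_new(const char *fmt, ...) __attribute__((format(printf, 1, 2)));

/* Returns a newly allocated, formatted string; the caller frees it. */
char *str_new(const char *fmt, ...) {
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);
    int len = vsnprintf(NULL, 0, fmt, ap);        /* first pass: measure */
    va_end(ap);
    char *s = (len < 0) ? NULL : malloc((size_t) len + 1);
    if (NULL != s) vsnprintf(s, (size_t) len + 1, fmt, ap2);  /* second pass: write */
    va_end(ap2);
    return s;
}

/* Joins n strings with sep, like Python's sep.join(items); the caller frees. */
char *str_join(const char *sep, const char *const *items, size_t n) {
    size_t sep_len = strlen(sep), total = 1;      /* 1 for the final NUL */
    for (size_t i = 0; i < n; i++) total += strlen(items[i]) + (i ? sep_len : 0);
    char *s = malloc(total);
    if (NULL == s) return NULL;
    char *p = s;
    for (size_t i = 0; i < n; i++) {
        if (i) { memcpy(p, sep, sep_len); p += sep_len; }
        size_t len = strlen(items[i]);
        memcpy(p, items[i], len);
        p += len;
    }
    *p = '\0';
    return s;
}
```

With the attribute in place, a call such as `str_new("%d", "text")` triggers a `-Wformat` warning (enabled by `-Wall`).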


### IPC for Concurrent Programming 

Concurrent programming is hard. Much harder than sequential programming. The main reason is that concurrent programming introduces non-determinism whereas sequential programming is deterministic.
Many solutions have been proposed in the past (see [Concurrent Object Oriented Programming with Asynchronous References](https://www.researchgate.net/publication/228905458_Concurrent_object_oriented_programming_with_Asynchronous_References)). However, none of them is currently used by the vast majority of developers: most of them still rely on threads and locks, probably the weakest solution. Nevertheless, a trend is emerging, and we share the opinion expressed in [Multithreading Magic](http://zeromq.org/blog:multithreading-magic): message passing is the way to go. Message passing is indeed massively used in parallel programming (MPI), and is also the basic abstraction of inherently concurrent languages such as [E](http://www.skyhunter.com/marcs/ewalnut.html#SEC18), [Go](https://golang.org/ref/spec#Channel_types) or [Erlang](http://erlang.org/doc/getting_started/conc_prog.html).


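POSIX provides many IPC mechanisms, each with a very different API:

* signals;
* FIFOs;
* message queues, shared memory and sockets;
* condition variables (`pthread_cond_wait()` and the like).

By contrast, message passing is a standard feature of modern languages: Go, Haskell and Eiffel provide standard mechanisms for IPC between threads and processes, local or remote. ZeroMQ provides a similar mechanism for C. It is efficient and, since ZeroMQ 4.x, it also provides security.

The `bxizmq` module provides IPC based on ZeroMQ: it is a small wrapper that adds error management.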
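To make the contrast concrete, here is a minimal bounded channel between threads built directly on a POSIX mutex and condition variables; this is the kind of plumbing a message-passing library hides. It is a hypothetical sketch (`struct chan` and the `chan_*()` functions are ours), limited to `int` messages within a single process; it is neither `bxizmq` nor based on ZeroMQ.

```c
#include <pthread.h>
#include <stddef.h>

#define CHAN_CAP 8  /* maximum number of pending messages */

struct chan {
    int buf[CHAN_CAP];
    size_t head;               /* index of the oldest message */
    size_t count;              /* number of pending messages */
    pthread_mutex_t lock;
    pthread_cond_t not_empty;  /* signaled after a send */
    pthread_cond_t not_full;   /* signaled after a receive */
};

void chan_init(struct chan *c) {
    c->head = 0;
    c->count = 0;
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->not_empty, NULL);
    pthread_cond_init(&c->not_full, NULL);
}

/* Blocks while the channel is full. */
void chan_send(struct chan *c, int msg) {
    pthread_mutex_lock(&c->lock);
    while (CHAN_CAP == c->count) {
        pthread_cond_wait(&c->not_full, &c->lock);  /* the loop handles spurious wakeups */
    }
    c->buf[(c->head + c->count) % CHAN_CAP] = msg;
    c->count++;
    pthread_cond_signal(&c->not_empty);
    pthread_mutex_unlock(&c->lock);
}

/* Blocks while the channel is empty; messages are received in FIFO order. */
int chan_recv(struct chan *c) {
    pthread_mutex_lock(&c->lock);
    while (0 == c->count) {
        pthread_cond_wait(&c->not_empty, &c->lock);
    }
    int msg = c->buf[c->head];
    c->head = (c->head + 1) % CHAN_CAP;
    c->count--;
    pthread_cond_signal(&c->not_full);
    pthread_mutex_unlock(&c->lock);
    return msg;
}

void chan_destroy(struct chan *c) {
    pthread_cond_destroy(&c->not_full);
    pthread_cond_destroy(&c->not_empty);
    pthread_mutex_destroy(&c->lock);
}
```

A producer thread only calls `chan_send()` and a consumer only calls `chan_recv()`: neither touches the lock directly. ZeroMQ offers this programming model without the hand-written locking, across threads, processes and hosts.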

### Logging System

Logging is useful for different kind of people:

* Core developers: *while* developing software, programmers place logging statements at various points in order to understand what is happening;
* Functional test writers: when writing a test, a tester tries to understand the logs produced by the software;
* End-users: some messages are targeted at end-users;
* Maintenance engineers: the whole set of messages produced by the software is often used by maintenance engineers to understand a problem in the software and to repair it when possible.

Given this list, it should be clear that logs are written from day 0 of a software project, and might be used in production or even in post-mortem diagnostics.

All modern languages provide a powerful logging API that ease high level programming:

* Java has its own [java.logging API](https://docs.oracle.com/javase/7/docs/api/java/util/logging/package-summary.html), and also some alternatives such as [`log4j`](http://logging.apache.org/log4j/2.x/), or [`logback`](http://logback.qos.ch/);
* Python includes the [logging](https://docs.python.org/2/library/logging.html) module;
* Go provides the [log](https://golang.org/pkg/log/) package.
* ...

However, C provides nothing. POSIX provides `syslog()` but, as its name implies, it has been designed for the system: messages sent by a program are received by the system logger, which decides what to do with them at the system level. Therefore, even if end-users can change the logging level of their software, they cannot change anything else: in particular, they usually do not have permission to read the file where the logs are written (e.g. `/var/log/messages`).
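The sketch below shows the extent of the control an application has with `syslog()`: an identity, a facility and a priority mask. `setup_syslog()` is a hypothetical helper, and `LOG_UPTO()` is a BSD/glibc macro rather than a POSIX one. Where the messages end up is decided by the system logger configuration.

```c
#include <syslog.h>

/* Hypothetical helper: the application only chooses an identity, a facility
 * and the priorities it sends. `ident` must remain valid while the log is
 * open, since openlog() may keep the pointer rather than a copy. */
void setup_syslog(const char *ident) {
    openlog(ident, LOG_PID, LOG_USER);
    /* Send only priorities from LOG_EMERG down to LOG_WARNING. */
    setlogmask(LOG_UPTO(LOG_WARNING));
}
```

After `setup_syslog(argv[0])`, a call such as `syslog(LOG_WARNING, "disk usage above %d%%", 90)` is handed to the system logger, whereas `LOG_INFO` messages are discarded by the mask; nothing in this API lets the application pick the destination file.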

To achieve high level programming, the logging system must be used for any message-oriented output, whether on screen, in a file, or over the network. Compared to using a logging API in addition to the standard `printf()` for console output, this has several advantages:

* simpler development (only one API to learn, although, admittedly, any C programmer already knows `printf()`);
* performance optimization: a single call might produce many outputs, instead of one call to the logging API and another to `printf()` for the console output;
* customizability: the software uses the logging API to produce messages, but how and where those messages are output is defined by the configuration.

As an example, with a correct configuration, one can guarantee that all messages produced on the console (and seen by end-users) are also recorded to a file (which can be hidden if required). The file might contain much more information (according to the configured verbosity), but always contains the messages that have been seen by the end-user. This can be quite useful for debugging and/or maintenance.
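The following hypothetical sketch (not the `bxilog` API) illustrates such a configuration: a single call dispatches each message to a console handler limited to `OUTPUT` and to a file handler that also keeps `DEBUG` messages, so the file always contains what the user saw on the console.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical levels, from most to least severe. */
typedef enum { LVL_ERROR, LVL_WARNING, LVL_OUTPUT, LVL_DEBUG } level_e;

/* A handler writes to one stream and drops messages above its level. */
struct handler {
    FILE *out;
    level_e max_level;
};

/* Sends one message to every handler that accepts its level and returns
 * the number of handlers that received it. */
int log_msg(const struct handler *handlers, size_t n, level_e level,
            const char *fmt, ...) {
    static const char *const names[] = {"ERROR", "WARNING", "OUTPUT", "DEBUG"};
    int received = 0;
    for (size_t i = 0; i < n; i++) {
        if (level > handlers[i].max_level) continue;
        va_list ap;
        va_start(ap, fmt);  /* restarted for each handler */
        fprintf(handlers[i].out, "[%s] ", names[level]);
        vfprintf(handlers[i].out, fmt, ap);
        fputc('\n', handlers[i].out);
        va_end(ap);
        received++;
    }
    return received;
}
```

In a real program, the console handler would write to `stdout` and the file handler to a file opened with `fopen()`; only the handler table changes, not the calls to `log_msg()`.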

Therefore, we propose the `bxilog.h` module.  

#### The `bxilog` API

When writing software, the programmer must focus on which messages must be produced and at what level.
Where messages are routed comes second, at configuration time. Note that in the case of a library, this last configuration step is entirely skipped: it is the responsibility of the `main()` program to configure the logging framework.

The example below presents a basic use of the API:

#include <errno.h>
#include <bxi/base/err.h>
#include <bxi/base/log.h>
#include <bxi/base/log/console_handler.h> 

// (1) Define static loggers
SET_LOGGER(MAIN_LOGGER, "ex.main");
SET_LOGGER(ANOTHER_LOGGER, "ex.other");
 
void output_level_names() {
    TRACE(ANOTHER_LOGGER, "Fetching level names"); // (2) Use loggers
    
    char ** level_names;
    size_t n = bxilog_get_all_level_names(&level_names);
    
    OUT(MAIN_LOGGER, "Found %zu level names", n);  // (3) Level OUT replaces printf() like statement
    
    for (size_t i = 0; i < n; i++) {
        DEBUG(MAIN_LOGGER, "Level %zu:\t %s", i, level_names[i]);
    }
}

void output_logger_names() {
    TRACE(ANOTHER_LOGGER, "Fetching logger names")
        
    bxilog_logger_p * loggers;
    size_t n = bxilog_registry_getall(&loggers);   
    OUT(MAIN_LOGGER, "Currently, %zu loggers are known:", n);
    
    for (size_t i = 0; i < n; i++) {
        DEBUG(MAIN_LOGGER, "%s", loggers[i]->name);
    }
        
}
 
int main() {
    // (4) Create a new empty configuration 
    bxilog_config_p config = bxilog_config_new("API Example"); // You might prefer to use argv[0]
    
    bxilog_filters_p filters;
    // (5) Define filters: comment next line, and uncomment the second line to see filtering in action
    bxierr_p err = bxilog_filters_parse(":lowest", &filters); // Show everything
    // bxierr_p err = bxilog_filters_parse(":output,ex.main:debug,~bxilog:off", &filters);
    bxierr_abort_ifko(err);
    
    
    // (6) Add a console handler to the configuration
    bxilog_config_add_handler(config, 
                              BXILOG_CONSOLE_HANDLER, 
                              filters, 
                              BXILOG_WARNING,
                              BXILOG_COLORS_NONE);
    
    // (7) Initialize the bxilog library
    err = bxilog_init(config);
    bxierr_abort_ifko(err);
    
    // From now, the logging library can be used as a full replacement for 
    // fprintf() and the like.
    INFO(MAIN_LOGGER, "Starting the program")
    
    output_level_names();
    
    // (8) You might also use dynamic loggers if required
    bxilog_logger_p logger;
    err = bxilog_registry_get("ex.dynamic", &logger);
    bxierr_abort_ifko(err);
    
    WARNING(logger, "Producing a log through a dynamic logger");
    
    
    output_logger_names();   
    
    // (10) Illustrate error reporting
    char * filename = "/tmp/non_existing_file_foo_bar";
    errno = 0;
    int rc = open(filename, O_RDONLY);
    BXIASSERT(MAIN_LOGGER, -1 == rc);
    err = bxierr_errno("An error happened while opening file: %s", filename);
    BXILOG_REPORT(MAIN_LOGGER, BXILOG_ERROR, err, "Non critical error -- just for testing");
    
    // (11) Finalize the bxilog library
    err = bxilog_finalize(true);
    bxierr_abort_ifko(err);
}

```
[W] ex.dynam Producing a log through a dynamic logger
[E] ex.main  Non critical error -- just for testing
[E] ex.main  ##code## 2
[E] ex.main  ##mesg## An error happened while opening file: /tmp/non_existing_file_foo_bar: No such file or directory

[D] ~bxilog  Initialization done
[I] ex.main  Starting the program
[T] ex.other Fetching level names
Found 13 level names
[D] ex.main  Level 0:	 off
[D] ex.main  Level 1:	 panic
[D] ex.main  Level 2:	 alert
[D] ex.main  Level 3:	 critical
[D] ex.main  Level 4:	 error
[D] ex.main  Level 6:	 notice
[D] ex.main  Level 7:	 output
[D] ex.main  Level 8:	 info
[D] ex.main  Level 9:	 debug
[D] ex.main  Level 10:	 fine
[D] ex.main  Level 11:	 trace
[D] ex.main  Level 12:	 lowest
[T] ex.other Fetching logger names
Currently, 10 loggers are known:
[D] ex.main  ~bxilog.logger
[D] ex.main  ~bxilog
[D] ex.main  ~bxilog.fork
[D] ex.main  ~bxilog.cfg
[D] ex.main  bxilog.remote
[D] ex.main  ~bxilog.signal
[D] ex.main  ex.main
[D] ex.main  ~bxilog.netsnmp
[D] ex.main  ex.dynamic
[D] ex.main  ex.other
[T] ex.main  ##trce## Backtrace of tid 14319: 7 function calls 
[T] ex.main  ##trce## [00] /home/pierre/dev/scm/bxibase/build/packaged/lib/.libs/libbxibase.so.0(bxierr_backtrace_str+0x1b6) [0xb7701dc4]
[T] 
```

##### Loggers

In order to produce a log, a thread needs a logger. A logger is basically defined by an arbitrary name. By convention, logger names form a pseudo tree hierarchy, using '.' as the separator between name components. This is important for filtering. More on that later.

Loggers can be defined statically as in (1) or dynamically as in (8). The only difference is that static loggers are created at load time (when the program or library is loaded into memory), whereas dynamic loggers are created the first time the `bxilog_registry_get()` function is called with a logger name unknown to the system. The set of loggers currently known by the system can be retrieved using `bxilog_registry_getall()`.
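The "create on first request" behavior of dynamic loggers is a classic get-or-create registry. A minimal, single-threaded sketch of the idea; `struct logger`, `registry_get()` and `registry_count()` are hypothetical and do not reflect the internals of `bxilog_registry_get()`:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical logger record: a name and a current level. */
struct logger {
    char *name;
    int level;
};

#define MAX_LOGGERS 64

static struct logger *registry[MAX_LOGGERS];
static size_t registry_size = 0;

/* Return the logger called `name`, creating it on first request.
 * Returns NULL if the registry is full or memory is exhausted.
 * Not thread-safe: a real registry would need synchronization. */
struct logger *registry_get(const char *name, int default_level) {
    for (size_t i = 0; i < registry_size; i++) {
        if (strcmp(registry[i]->name, name) == 0) return registry[i];
    }
    if (registry_size == MAX_LOGGERS) return NULL;
    struct logger *lg = malloc(sizeof *lg);
    if (lg == NULL) return NULL;
    lg->name = malloc(strlen(name) + 1);
    if (lg->name == NULL) {
        free(lg);
        return NULL;
    }
    strcpy(lg->name, name);
    lg->level = default_level;
    registry[registry_size++] = lg;
    return lg;
}

/* Number of loggers known so far, the analogue of listing them all. */
size_t registry_count(void) {
    return registry_size;
}
```

Asking twice for `"ex.dynamic"` returns the same object, so the logger keeps its level wherever it is used.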

Once you have a logger, producing a log can be done using various macros whose names represent the logging level, as shown in (2) with the level `TRACE`.

##### Logging levels

The bxilog module defines several logging levels that form a superset of the syslog ones. The set of logging levels can be retrieved using the `bxilog_get_all_level_names()` function.
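Being a superset means that every syslog priority has a counterpart, while the extra levels have to be folded onto the nearest syslog priority when logs are sent to syslog. A sketch of such a mapping, assuming the numbering printed by the example above (level 5 is assumed to be `warning`, since that line is missing from the output); the `BX_*` names and `to_syslog_priority()` are hypothetical, not the actual `BXILOG_*` constants or syslog handler:

```c
#include <syslog.h>

/* Hypothetical enum following the order of the printed level names. */
enum bx_level {
    BX_OFF, BX_PANIC, BX_ALERT, BX_CRITICAL, BX_ERROR, BX_WARNING, BX_NOTICE,
    BX_OUTPUT, BX_INFO, BX_DEBUG, BX_FINE, BX_TRACE, BX_LOWEST
};

/* Map a level onto the closest syslog priority; -1 means "never forwarded". */
int to_syslog_priority(enum bx_level lvl) {
    switch (lvl) {
    case BX_PANIC:    return LOG_EMERG;
    case BX_ALERT:    return LOG_ALERT;
    case BX_CRITICAL: return LOG_CRIT;
    case BX_ERROR:    return LOG_ERR;
    case BX_WARNING:  return LOG_WARNING;
    case BX_NOTICE:   return LOG_NOTICE;
    case BX_OUTPUT:                       /* console-style output */
    case BX_INFO:     return LOG_INFO;
    case BX_DEBUG:
    case BX_FINE:
    case BX_TRACE:
    case BX_LOWEST:   return LOG_DEBUG;   /* all verbose levels collapse here */
    case BX_OFF:
    default:          return -1;
    }
}
```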

The `OUTPUT` level (macro `OUT`) should be used as a replacement for `printf()`, as shown in (3). Similarly, levels from `NOTICE` up to `PANIC` should be used as a replacement for `fprintf(stderr, ...)`.

How and where those logs will be produced is given by the configuration.

##### Configuration of log handlers

In order to initialize the bxilog library (7), a configuration must be created as in (4) and a logging handler should be added to it as in (6). Several handlers can be added sequentially. A logging handler consumes logs produced by threads and normally outputs them somewhere: on the console, on the filesystem, over the network, and so on. 

For example, the `BXILOG_CONSOLE_HANDLER` displays logs with levels from 'LOWEST' up to 'NOTICE' on the process standard output (stdout), and logs with levels from 'WARNING' up to 'PANIC' on the process standard error (stderr); in the examples, this threshold is the `BXILOG_WARNING` level given when the handler is added. It also supports colorization of logs according to their level. Other logging handlers might behave differently.
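The stdout/stderr split and the optional colors amount to two small decisions per log. A sketch of that logic, assuming smaller values are more severe; the `LVL_*` values, `console_stream()`, `console_color()` and `console_log()` are hypothetical, and the ANSI escape sequences are one possible color scheme, not the one `bxilog` uses:

```c
#include <stdio.h>

/* Hypothetical levels (smaller = more severe), numbered as in the first example. */
enum { LVL_ERROR = 4, LVL_WARNING = 5, LVL_NOTICE = 6, LVL_DEBUG = 9 };

/* Logs at least as severe as `stderr_level` go to stderr, all others to stdout. */
FILE *console_stream(int level, int stderr_level) {
    return level <= stderr_level ? stderr : stdout;
}

/* Optional ANSI color escape per level; "" means the log stays uncolored. */
const char *console_color(int level) {
    if (level <= LVL_ERROR)   return "\033[1;31m";   /* panic .. error: bold red */
    if (level == LVL_WARNING) return "\033[33m";     /* warning: yellow */
    return "";
}

/* Print one log line on the right stream, colored only if requested. */
void console_log(int level, int stderr_level, int use_colors, const char *msg) {
    FILE *out = console_stream(level, stderr_level);
    const char *color = use_colors ? console_color(level) : "";
    fprintf(out, "%s%s%s\n", color, msg, color[0] != '\0' ? "\033[0m" : "");
}
```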

A simple API is provided to implement new logging handlers. Currently, bxilog provides:

* `BXILOG_CONSOLE_HANDLER`: produces logs on either stdout or stderr according to a customizable log
  level, with or without customizable colors;
* `BXILOG_FILE_HANDLER`: produces logs in a file with all details;
* `BXILOG_SYSLOG_HANDLER`: sends logs to the syslog daemon;
* `BXILOG_NETSNMP_HANDLER`: forwards logs to the net-snmp logging library;
* `BXILOG_REMOTE_HANDLER`: sends logs to another process.
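A common way to make handlers pluggable in C is a table of callbacks plus private state. The sketch below illustrates that pattern only; `struct handler`, `struct log_record`, `dispatch()` and the counting handler are hypothetical and are not bxilog's handler API:

```c
#include <stddef.h>

/* Hypothetical record handed to every handler. */
struct log_record {
    int level;
    const char *logger;
    const char *message;
};

/* Hypothetical handler interface: callbacks plus opaque state. */
struct handler {
    const char *name;
    int (*process)(void *state, const struct log_record *rec);  /* 0 on success */
    int (*flush)(void *state);                                   /* 0 on success */
    void *state;
};

/* Give one record to every handler; returns the number of handlers that failed. */
int dispatch(struct handler *handlers, size_t n, const struct log_record *rec) {
    int failures = 0;
    for (size_t i = 0; i < n; i++) {
        if (handlers[i].process(handlers[i].state, rec) != 0) failures++;
    }
    return failures;
}

/* Example handler: only counts the records it receives. */
static int count_process(void *state, const struct log_record *rec) {
    (void) rec;
    ++*(int *) state;
    return 0;
}

static int count_flush(void *state) {
    (void) state;
    return 0;
}
```

A new destination (a database, a message queue, ...) then only needs its own `process` and `flush` functions.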


In the example above, two logs have been output that were not produced by our code. Those logs have a logger name starting with `~bxilog`: they are produced by the logging library itself. This is a good example of a library that itself uses the logging system: such logs might help in the debugging process. However, most of the time, those logs should not be seen. This is where filtering comes in.

##### Filtering

Filtering occurs at two different places. First, each logger is given a logging level: when a thread requests the production of a log at the TRACE level as in (2) and the logger's current level is 'OUTPUT', the log is not produced at all. Second, each log handler filters every log it receives according to the name of the logger used to produce it and to the level of the log.
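The first stage can be almost free because it is a single integer comparison made by the calling thread before anything is formatted or handed over. A minimal sketch of level-named macros built on such a check; `struct logger`, `LOG_AT`, `MY_TRACE` and `MY_DEBUG` are hypothetical, and the real `TRACE()`/`DEBUG()` macros may be implemented differently:

```c
#include <stdio.h>

/* Hypothetical logger: a name and its current level (smaller = more severe). */
struct logger {
    const char *name;
    int level;
};

/* Hypothetical numeric levels, in the order printed by the first example. */
enum { LVL_OUTPUT = 7, LVL_DEBUG = 9, LVL_TRACE = 11 };

static int produced = 0;   /* number of records that passed the first stage */

static void emit(const struct logger *lg, int lvl, const char *msg) {
    produced++;
    printf("[%d] %-8s %s\n", lvl, lg->name, msg);
}

/* First stage: compare levels before doing any work. When the logger's level
 * is too low, nothing is formatted or handed to the handlers. */
#define LOG_AT(lg, lvl, msg)                  \
    do {                                      \
        if ((lvl) <= (lg)->level) {           \
            emit((lg), (lvl), (msg));         \
        }                                     \
    } while (0)

/* Level-named convenience macros, in the spirit of TRACE() and DEBUG(). */
#define MY_TRACE(lg, msg) LOG_AT((lg), LVL_TRACE, (msg))
#define MY_DEBUG(lg, msg) LOG_AT((lg), LVL_DEBUG, (msg))
```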

Filtering is expressed as a simple string, which is processed by the `bxilog_filters_parse()` function as in (5). For example, modify (5) by commenting out the first line and uncommenting the second: the output will be quite different.
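The strings used in (5) read as a comma-separated list of `prefix:level` items, where an empty prefix matches every logger. The following sketch shows one way such a string could be interpreted. It is not the implementation of `bxilog_filters_parse()`: `filter_level()` is hypothetical, and both the plain string-prefix match and the "longest matching prefix wins" rule are our assumptions.

```c
#define _POSIX_C_SOURCE 200809L   /* for strtok_r() */
#include <stdlib.h>
#include <string.h>

/* Level names in the order printed by the first example (index = level);
 * "warning" at index 5 is assumed, as that line is missing from the output. */
static const char *const LEVEL_NAMES[] = {
    "off", "panic", "alert", "critical", "error", "warning", "notice",
    "output", "info", "debug", "fine", "trace", "lowest"
};
#define NB_LEVELS (sizeof LEVEL_NAMES / sizeof LEVEL_NAMES[0])

static int level_from_name(const char *name) {
    for (size_t i = 0; i < NB_LEVELS; i++) {
        if (strcmp(LEVEL_NAMES[i], name) == 0) return (int) i;
    }
    return -1;
}

/* Return the level that a string such as ":output,ex.main:debug,~bxilog:off"
 * assigns to `logger`, or -1 if no item matches. Malformed items are ignored. */
int filter_level(const char *filters, const char *logger) {
    size_t len = strlen(filters);
    char *copy = malloc(len + 1);          /* strtok_r() modifies its input */
    if (copy == NULL) return -1;
    memcpy(copy, filters, len + 1);

    int best_level = -1;
    size_t best_len = 0;
    char *saveptr = NULL;
    for (char *item = strtok_r(copy, ",", &saveptr); item != NULL;
         item = strtok_r(NULL, ",", &saveptr)) {
        char *colon = strrchr(item, ':');
        if (colon == NULL) continue;
        *colon = '\0';                     /* item now holds the prefix */
        int level = level_from_name(colon + 1);
        size_t plen = strlen(item);
        if (level < 0 || strncmp(logger, item, plen) != 0) continue;
        if (best_level < 0 || plen >= best_len) {   /* longest prefix wins */
            best_level = level;
            best_len = plen;
        }
    }
    free(copy);
    return best_level;
}
```

Under these assumptions, the second filter of (5) keeps `ex.main` at `debug`, silences every `~bxilog*` logger and leaves all other loggers at `output`.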

This is a very simple yet very efficient mechanism to customize the logging system. 

#### `bxilog` Advanced Features

The previous section presented the main API of the `bxilog` module. This section presents some of the more advanced features it provides.

##### Error Reporting

We have seen that `bxierr` is a good error mechanism for high level programming. However, without an error reporting system, its usefulness is significantly reduced. Therefore, `bxilog` supports error reporting, as shown in (10). An error report is produced using the `BXILOG_REPORT` macro, which destroys the error. The report includes a lot of information, such as the error code, the logging message and the backtrace. Note that the actual error message and the backtrace are not produced at the same logging level. With well-defined filters, this allows the console to display a simple error message while a file on the filesystem contains both the error message and the backtrace.
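The "report, then destroy" pattern can be illustrated without the `bxierr`/`bxilog` types: capture `errno` immediately, then emit a short line for the user and a detailed record elsewhere before freeing the error. A minimal sketch; `struct error`, `error_from_errno()` and `report()` are hypothetical, and no backtrace is captured here:

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error object: an errno value and a formatted message. */
struct error {
    int code;
    char msg[256];
};

/* Capture errno first, before any call that might overwrite it. */
struct error *error_from_errno(const char *fmt, ...) {
    int saved = errno;
    struct error *e = malloc(sizeof *e);
    if (e == NULL) return NULL;
    e->code = saved;
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(e->msg, sizeof e->msg, fmt, ap);
    va_end(ap);
    if (n >= 0 && (size_t) n < sizeof e->msg) {
        snprintf(e->msg + n, sizeof e->msg - (size_t) n, ": %s", strerror(saved));
    }
    return e;
}

/* Report and destroy: a one-line summary for the user, details for the file. */
void report(struct error *e, const char *summary, FILE *user, FILE *details) {
    if (e == NULL) return;
    fprintf(user, "%s\n", summary);
    fprintf(details, "%s\ncode: %d\nmesg: %s\n", summary, e->code, e->msg);
    free(e);
}
```

Because `report()` frees the error, callers cannot accidentally report or leak it twice, which mirrors the "report destroys the error" rule above.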

`bxilog` also provides `BXIASSERT()` as a replacement for the standard `assert()` macro. The main advantage is that with `assert()`, which terminates the process through `abort()`, pending logs might not be flushed by the various handlers and might get lost. `BXIASSERT()` also provides more information, such as the backtrace.

In the same way, `BXIEXIT()` ensures all logs are flushed before actually exiting and provides more information.
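The reason is that `assert()` ends with `abort()`, which never runs `atexit()` handlers and, per the C standard, need not flush stdio buffers. A minimal sketch of an assertion that flushes before aborting; `flush_pending_logs()` and `FLUSHING_ASSERT` are hypothetical, and here the "flush" only covers stdio streams, whereas a logging library would also drain its handlers:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical hook standing for "flush every logging handler".
 * In this sketch it only flushes all open stdio output streams. */
void flush_pending_logs(void) {
    fflush(NULL);
}

/* Like assert(), but pending output is flushed before abort() is called.
 * Unlike assert(), it is not disabled by NDEBUG. */
#define FLUSHING_ASSERT(cond)                                           \
    do {                                                                \
        if (!(cond)) {                                                  \
            fprintf(stderr, "%s:%d: %s: assertion '%s' failed\n",       \
                    __FILE__, __LINE__, __func__, #cond);               \
            flush_pending_logs();                                       \
            abort();                                                    \
        }                                                               \
    } while (0)
```

An exit helper in the spirit of `BXIEXIT()` follows the same shape: log the reason, flush, then call `exit()` with the requested status.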

##### Fork Support

Of course, the `bxilog` module is thread-safe (and lock-free), but it is also *fork-safe*. This means that it supports the standard POSIX `fork()` system call, which creates a new process. For a multi-threaded program, which is always the case when the `bxilog` module is used, this is very hard to implement and usually unsupported. If you don't believe this statement, have a look at the RATIONALE section of the POSIX documentation of the [`fork()` system call](http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html) and at the following articles:

* http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them
* http://thorstenball.com/blog/2014/10/13/why-threads-cant-fork/

When a child is forked, the `bxilog` configuration remains unchanged in the parent, and the module is still configured as before the `fork()` call. In the child, however, the `bxilog` module is not initialized. This is somewhat in opposition to the semantics of the `fork()` system call, which is supposed to clone the entire process and thus produce an exact copy of the parent. However, as the POSIX documentation explains, this no longer holds for multi-threaded programs: only one thread is replicated, the one calling `fork()`. In any case, the main reason is to avoid the overhead of initializing the logging system with all its handlers when the only purpose of the `fork()` is to call `execve()`, which overwrites the whole process with a new program.
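POSIX provides `pthread_atfork()` to register callbacks that run just before `fork()` in the parent and just after it in both the parent and the child. A hedged sketch of how a library could use it to obtain the behavior described above (flush before forking, leave the child uninitialized); `lib_init()`, `lib_is_initialized()` and the callbacks are hypothetical names, not bxilog's internals:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical library state. */
static bool initialized = false;

/* Parent, just before fork(): flush pending output so that it is neither
 * lost nor duplicated in the child. */
static void before_fork(void) {
    fflush(NULL);
}

/* Parent, just after fork(): a real library would resume whatever
 * before_fork() paused; this sketch has nothing to undo. */
static void after_fork_parent(void) {
}

/* Child, just after fork(): only the forking thread exists, so background
 * threads are gone. Mark the library as uninitialized; the child must call
 * lib_init() again if it wants to log, and pays nothing if it only execs. */
static void after_fork_child(void) {
    initialized = false;
}

int lib_init(void) {
    static bool hooks_registered = false;
    if (!hooks_registered) {
        int rc = pthread_atfork(before_fork, after_fork_parent, after_fork_child);
        if (rc != 0) return rc;
        hooks_registered = true;
    }
    initialized = true;
    return 0;
}

bool lib_is_initialized(void) {
    return initialized;
}
```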

So, when using `bxilog`, keep in mind the following rules after a `fork()`:

* the parent can continue to log as usual;
* the child can call `execve()` as usual with no overhead, since `bxilog` has not been initialized;
* if the child wants to produce logs itself, it can configure `bxilog` as usual.

Note that with such a mechanism, the child can either use the same configuration as its parent, or use a distinct configuration. 

Consider the following example:

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>   // waitpid()
#include <fcntl.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>       // CLOCK_MONOTONIC
#include <sysexits.h>

#include <bxi/base/err.h>
#include <bxi/base/time.h>
#include <bxi/base/log.h>
#include <bxi/base/log/console_handler.h>
#include <bxi/base/log/file_handler.h>

SET_LOGGER(PARENT_LOGGER, "parent");
SET_LOGGER(CHILD_LOGGER, "child");

int main(void) {
    bxilog_config_p config = bxilog_config_new("Fork Example");
    bxilog_config_add_handler(config,
                              BXILOG_CONSOLE_HANDLER,
                              BXILOG_FILTERS_ALL_ALL,
                              BXILOG_WARNING,
                              BXILOG_COLORS_NONE);

    bxierr_p err = bxilog_init(config);
    bxierr_abort_ifko(err);

    INFO(PARENT_LOGGER, "In the parent process");

    size_t loop_nb = 5;
    errno = 0;
    pid_t pid = fork();
    if (-1 == pid) {
        err = bxierr_errno("Calling fork() failed");
        BXIEXIT(EX_OSERR, err, PARENT_LOGGER, BXILOG_CRITICAL);
    }
    if (0 == pid) {  // In the child: bxilog is not initialized here
        // Configure the logging system: we use the same config as the parent
        err = bxilog_init(config);
        bxierr_abort_ifko(err);

        INFO(CHILD_LOGGER, "In the child process");
        for (size_t i = 0; i < loop_nb; i++) {
            INFO(CHILD_LOGGER, "Logging something at step %zu", i);
            err = bxitime_sleep(CLOCK_MONOTONIC, 0, 4e8);   // sleep for 0.4 s
            BXILOG_REPORT(CHILD_LOGGER, BXILOG_NOTICE, err, "Continuing...");
        }
    } else { // In the parent: logging continues as before the fork()
        for (size_t i = 0; i < loop_nb; i++) {
            INFO(PARENT_LOGGER, "Logging something at step %zu", i);
            err = bxitime_sleep(CLOCK_MONOTONIC, 0, 4e8);
            BXILOG_REPORT(PARENT_LOGGER, BXILOG_NOTICE, err, "Continuing...");
        }
        waitpid(pid, NULL, 0);   // reap the child so it does not linger as a zombie
    }

    err = bxilog_finalize(true);
    bxierr_abort_ifko(err);
    return 0;
}
```
```
[D] ~bxilog  Initialization done
[I] parent   In the parent process
[F] ~bxilog. Preparing for a fork() (state == 3)
[F] ~bxilog  Requesting a flush()
[F] ~bxilog  flush() done succesfully on all 1 handlers
[F] ~bxilog. Ready after a fork()
[I] parent   Logging something at step 0
[I] parent   Logging something at step 1
[I] parent   Logging something at step 2
[D] ~bxilog  Initialization done
[I] child    In the child process
[I] child    Logging something at step 0
[I] child    Logging something at step 1
[I] child    Logging something at step 2
[I] child    Logging something at step 3
[I] child    Logging something at step 4
[D] ~bxilog  Exiting bxilog
[I] parent   Logging something at step 3
[I] parent   Logging something at step 4
[D] ~bxilog  Exiting bxilog
```


Note that logs from the child and the parent do not get mixed up, even though they share the same configuration.
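One standard technique for keeping records from several processes apart in a shared file is to format each record completely and emit it with a single `write()` on a descriptor opened with `O_APPEND`: POSIX specifies that the offset is moved to the end of the file and the data written with no intervening modification. On local filesystems this keeps small records intact (NFS, for instance, does not provide this guarantee). We do not claim this is exactly how `bxilog` does it; `append_record()` and `open_log()` are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Format the whole record first, then emit it with a single write().
 * Returns 0 on success, -1 on error, truncation or short write. */
int append_record(int fd, const char *logger, const char *msg) {
    char line[512];
    int n = snprintf(line, sizeof line, "%ld %s %s\n", (long) getpid(), logger, msg);
    if (n < 0 || (size_t) n >= sizeof line) return -1;
    ssize_t w = write(fd, line, (size_t) n);
    return w == (ssize_t) n ? 0 : -1;
}

/* O_APPEND: every write() lands at the current end of the file. */
int open_log(const char *path) {
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}
```

Calling `fprintf()` on a shared `FILE *` instead would leave the split into system calls to the stdio buffering, which is exactly what lets lines from different processes interleave.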


##### Signal Catching

* Support for signal catching (SIGSEGV, SIGBUS, SIGTERM, but not SIGQUIT)
* Support for fork (the child must initialize the logging system; fork/exec has no overhead)
* Atomic writes: no mix-up between different processes writing to the same file
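Catching fatal signals lets a logging library record what happened before the process dies. A handler may only call async-signal-safe functions, so the sketch below writes with `write()` and then re-raises the signal with its default action restored, so the process still terminates as it normally would. `on_fatal_signal()` and `install_fatal_handlers()` are hypothetical; a real library would also try to flush its pending records, which this sketch does not do:

```c
#define _XOPEN_SOURCE 700   /* SA_RESETHAND, SIGBUS */
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Async-signal-safe handler: only write() and raise() are used. */
static void on_fatal_signal(int sig) {
    static const char msg[] = "fatal signal caught\n";
    ssize_t rc = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void) rc;
    /* SA_RESETHAND restored the default action on entry, so re-raising the
     * signal terminates the process (with a core dump where that is the default). */
    raise(sig);
}

/* Install the handler for SIGSEGV, SIGBUS and SIGTERM; SIGQUIT is left alone. */
int install_fatal_handlers(void) {
    const int sigs[] = {SIGSEGV, SIGBUS, SIGTERM};
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fatal_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++) {
        if (sigaction(sigs[i], &sa, NULL) != 0) return -1;
    }
    return 0;
}
```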

* Rich `BXIASSERT()` mechanism:
  * ensures all logs are flushed
  * rich stack trace
  * all loggers can see the assertion
* Rich error reporting mechanism:
  * errors can be treated separately by handlers

#### Performance

* Low overhead: a thread that produces a log does not perform any I/O operation; it spends the
  minimum amount of time producing the log, then the system takes care of transporting it to its destination
  (console, filesystem, network, ...).
* Low latency and high throughput: the design is thread-safe but lock-free.
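A classic building block for "the producing thread does no I/O, without locks" is a single-producer/single-consumer ring buffer: the logging thread only stores a pointer and bumps an index, and a background thread pops records and performs the I/O. The sketch below uses C11 atomics to illustrate the idea; we do not claim this is the transport `bxilog` actually uses, and `struct ring`, `ring_push()` and `ring_pop()` are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAP 8   /* must be a power of two so index wrap-around stays correct */

/* One producer thread pushes, one consumer thread pops; no locks are taken. */
struct ring {
    const char *slots[RING_CAP];
    atomic_size_t head;   /* next slot to read, written only by the consumer */
    atomic_size_t tail;   /* next slot to write, written only by the producer */
};

void ring_init(struct ring *r) {
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: returns false when the ring is full. */
bool ring_push(struct ring *r, const char *msg) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAP) return false;
    r->slots[tail % RING_CAP] = msg;
    /* release: the slot content is visible before the new tail */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: returns NULL when the ring is empty. */
const char *ring_pop(struct ring *r) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail) return NULL;
    const char *msg = r->slots[head % RING_CAP];
    /* release: the slot is read before it can be reused by the producer */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return msg;
}
```

When the ring is full, the producer has to choose between dropping the record, blocking or retrying; that policy is left to the caller here.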



### Unit Testing

By the way, industrial programming does not come without unit testing nowadays, and there is no real equivalent of the JUnit framework in C, whereas one exists in most other languages. [CUnit](http://cunit.sourceforge.net/) is the closest one, but it is no longer maintained (since 2015) and it has several drawbacks, the most important being the lack of a clear message when a test fails. Therefore, while in 

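```c
#include <stdio.h>
#include <stdlib.h>   // free()
#include <bxi/base/mem.h>

int main(void) {
    char * str = bximem_calloc(10);   // 10 zero-initialized bytes
    printf("Hello\n");
    free(str);   // assumption: bximem_calloc() memory can be released with free()
    return 0;
}
```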