# High Level Programming in C

According to the TIOBE report available at http://www.tiobe.com, Java and C have been the two most popular languages for 15 years. Together they account for more than 30% of the ratings; every other language holds 6% or less. Although C is categorized as a high level language, average developers mostly see it as a low level one with a steep learning curve. One reason might be its close mapping to hardware, which makes it a good candidate for operating system design. But we believe this is neither the only reason nor the main one.

In this document, we present a set of requirements for high level programming that, in our opinion, the C language does not fulfill. For each requirement, we propose a solution.


## Development Process

Starting a project from scratch in C requires adopting, among other things, a coding style, coding conventions and packaging rules. These are briefly described below.

### Coding Convention

Unlike other languages, C does not come with any coding convention *per se*, which has led to a variety of styles: https://en.wikipedia.org/wiki/Indent_style. The [C FAQ](http://c-faq.com/style/layout.html) advises following the K&R coding style. We are a bit stricter and follow this guideline: https://users.ece.cmu.edu/~eno/coding/CCodingStandard.html.

We also adopt the following conventions:
* most modules provide an object-oriented interface: for a given structure ``struct foo``, most module functions are declared as ``func(foo_p self, ...)`` where ``foo_p`` is a pointer to ``struct foo`` and represents the "object" on which the function ``func`` operates;
* enumerations are suffixed with ``_e``;
* static names defined in the .c file are prefixed with an underscore, as in ``_a_static_func(...)``;
* internal APIs within a module implementation use a double underscore to differentiate them from the external API.
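
The conventions above could look as follows on a small module. This is only a sketch: the `counter` module and every name in it are invented for illustration and do not come from any existing library.

```c
#include <stdlib.h>

/* Enumeration: the type name carries the `_e` suffix. */
typedef enum {
    COUNTER_UP,
    COUNTER_DOWN,
} counter_dir_e;

/* The "object": a structure and a pointer type on it (`counter_p`). */
struct counter {
    long value;
    counter_dir_e dir;
};
typedef struct counter *counter_p;

/* Static helper, private to the .c file: underscore prefix. */
static long _step(counter_dir_e dir) {
    return (COUNTER_UP == dir) ? 1 : -1;
}

/* Constructor: returns a zero-initialized object, or NULL on failure. */
counter_p counter_new(counter_dir_e dir) {
    counter_p self = calloc(1, sizeof(*self));
    if (NULL != self) self->dir = dir;
    return self;
}

/* "Method": the object it operates on is always the first parameter. */
long counter_tick(counter_p self) {
    self->value += _step(self->dir);
    return self->value;
}

/* Destructor: releases the object and resets the caller's pointer. */
void counter_destroy(counter_p *self_p) {
    free(*self_p);
    *self_p = NULL;
}
```

A caller only manipulates a `counter_p` through `counter_new()`, `counter_tick()` and `counter_destroy()`. One caveat worth knowing: the C standard reserves file-scope identifiers that begin with an underscore for the implementation, so the underscore prefix relies on the chosen names not clashing with the compiler or the C library.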

We also provide two templates that ease structuring the code in a consistent manner:
* header file (TODO: provide the link)
* implementation file (TODO: provide the link)


### Code Quality Convention

Neither C11 nor POSIX defines a code quality convention. There are, however, at least two "standards" related to this topic:
* MISRA C: initially written for the automotive industry but now extended to embedded systems in general: https://en.wikipedia.org/wiki/MISRA_C;
* CERT C Coding Standard: defines a set of rules and recommendations for security and safety: https://www.securecoding.cert.org/confluence/x/HQE.

The problem is the tooling required to check conformance to those standards: it might not be free of charge. At the very minimum, we recommend the following:
* Compilation flags: turn all warnings into errors, so that every warning is dealt with before any release of the product; the sooner warnings are taken into account, the easier the long-term maintenance of the product. For example, with gcc, CFLAGS are:

    CFLAGS="-Wall -Werror ..."  # TODO DEFINE
    
* Compile with various compilers ---  at the very least, gcc and clang: each compiler might raise some issues unseen by others;
* Use [valgrind](http://valgrind.org/) systematically to detect memory problems. We use the following option for valgrind:

    VALGRIND_OPTS="..." # TODO DEFINE 
 

### Packaging

The layout of a project is important, especially when the project is made of several subprojects. If every subproject follows the same layout, it becomes much easier to locate a given file. Therefore, we propose the following layout, which should fit the requirements of most projects:
TODO


### Versioning

Versioning a product is not so easy. First, one must adopt a versioning convention, and there are many of them: https://en.wikipedia.org/wiki/Software_versioning. We propose to follow a quite common convention:
``major.minor.fix-release`` where:
* ``major`` is incremented when incompatible changes have been introduced;
* ``minor`` is incremented when compatible changes have been introduced (new functions in the API for example);
* ``fix`` is incremented when only bug fixes have been introduced;
* ``release`` is incremented when only the packaging has changed, such as the layout, the makefile, or something else.

Once the versioning scheme has been chosen, the version must be made available at the code level in order to implement ``cmd --version`` or the ``About...`` dialog box. This can be done using the following API:

TODO: define and implement an API for version management
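
Until that API is defined, the following sketch shows one possible shape for it. All names (`version_s`, `version_format()`, `version_cmp()`, `version_is_compatible()`) are hypothetical, and the compatibility rule is only our reading of the ``major``/``minor`` definitions given above.

```c
#include <stdio.h>

/* Hypothetical descriptor for the major.minor.fix-release scheme.
 * Fields are prefixed to avoid the major()/minor() macros that some
 * system headers define. */
typedef struct {
    unsigned ver_major;
    unsigned ver_minor;
    unsigned ver_fix;
    unsigned ver_release;
} version_s;

/* Write "major.minor.fix-release" into buf; returns the snprintf() result. */
int version_format(const version_s *v, char *buf, size_t len) {
    return snprintf(buf, len, "%u.%u.%u-%u",
                    v->ver_major, v->ver_minor, v->ver_fix, v->ver_release);
}

/* Three-way comparison of two fields: -1, 0 or 1. */
static int cmp_field(unsigned a, unsigned b) {
    return (a > b) - (a < b);
}

/* Compare two versions field by field, most significant first. */
int version_cmp(const version_s *a, const version_s *b) {
    int r;
    if (0 != (r = cmp_field(a->ver_major, b->ver_major))) return r;
    if (0 != (r = cmp_field(a->ver_minor, b->ver_minor))) return r;
    if (0 != (r = cmp_field(a->ver_fix, b->ver_fix))) return r;
    return cmp_field(a->ver_release, b->ver_release);
}

/* `have` satisfies `need` when the major numbers are equal (no
 * incompatible change) and `have` offers at least the same minor. */
int version_is_compatible(const version_s *have, const version_s *need) {
    return have->ver_major == need->ver_major
        && have->ver_minor >= need->ver_minor;
}
```

A ``cmd --version`` option would then only need to call `version_format()` on a constant filled in by the build system.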

### Dependency Management

Most modern languages provide a system (integrated or not) to retrieve dependencies. For example, Java uses [Maven](https://maven.apache.org/) or [Ivy](http://ant.apache.org/ivy/), Python uses [PyPI](https://pypi.python.org/pypi), Go has the [``go``](https://golang.org/cmd/go/) tool and Ruby uses [``gem``](https://rubygems.org/), but C provides nothing.

Following the guidelines defined here, especially those concerning [Packaging](#Packaging) and [Versioning](#Versioning), we propose a tool that might be used for dependency management of C packages.

TODO: make ``bxivcs`` a product and describe it.

### Development Cycle

Probably the main difference between C and modern languages such as Java, Python, or Go is the development cycle. Since Python is an interpreted language, it naturally offers the fastest one: code -> run. C, Java and Go must be compiled, so their development cycle includes one more step: code -> compile -> run. Java, however, is so well integrated within "modern" IDEs such as Eclipse or NetBeans that the compile phase is mostly transparent. Therefore, at least in Java, the development cycle remains as fast as for any interpreted language: code -> run. Although Eclipse supports the C language, it does not currently provide the same level of integration as it does for Java. This might change in the near future.


### Debugging

Most modern languages provide various mechanisms that ease debugging:
* no direct access to memory;
* smart boundary checking (which can sometimes be removed at runtime without harm);
* stack trace on error with a dynamic message, including file, function and line number;
* assertion system that can be deactivated at runtime;
* logging system that helps to understand what happens at runtime;
* monitoring system (at least in Java, with JMX).

By default, C11 provides no way to produce a stack trace, no logging system and no monitoring system. POSIX does not provide anything comparable to what "modern" languages provide (more on that [later](#Standard-Libraries)). Therefore, most of the time in C, a problem results in a segmentation fault at best (with the famous ``core dumped`` message), and goes undetected at worst (e.g. memory corruption).

A first solution to this problem is to guarantee code quality using the tools and techniques already presented in section [Code Quality Convention](#Code-Quality-Convention). But this is not sufficient to accelerate the development cycle: using ``gdb`` or ``valgrind`` for simple bugs is too time consuming.

The POSIX ``assert.h`` module is not sufficient either: assertions can only be disabled at compile time, not at runtime, and the error message is quite minimal: it does not include the stack trace, for example. Finally, ``printf()`` is of course not a good option for a logging system: it is too costly, it does not provide enough information (thread, module, function, line), and it cannot be disabled in one module while remaining enabled in another.
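
To make the first point concrete, here is a minimal sketch of an assertion that can be switched off while the program runs and that accepts a dynamic message. `RT_ASSERT()` and `rt_assert_enable()` are hypothetical names used only for this illustration; producing a stack trace is deliberately left out.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Runtime switch: assertions are evaluated only while this flag is true. */
static bool rt_assert_enabled = true;

void rt_assert_enable(bool on) {
    rt_assert_enabled = on;
}

/* Check `expr`; on failure, report the location and a printf()-like
 * message, then abort. When the switch is off, `expr` is not evaluated. */
#define RT_ASSERT(expr, ...)                                        \
    do {                                                            \
        if (rt_assert_enabled && !(expr)) {                         \
            fprintf(stderr, "%s:%d in %s(): `%s` failed: ",         \
                    __FILE__, __LINE__, __func__, #expr);           \
            fprintf(stderr, __VA_ARGS__);                           \
            fputc('\n', stderr);                                    \
            abort();                                                \
        }                                                           \
    } while (0)
```

For instance, `RT_ASSERT(n < max, "n=%zu exceeds max=%zu", n, max);` reports both values on failure, and a call to `rt_assert_enable(false)` (triggered by a command-line option, say) removes the cost of the checks without recompiling.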

Therefore, some important libraries are missing for debugging, as will be discussed in [Standard Libraries](#Standard-Libraries).


### Documentation 

Finally, the standard documentation of the C language is the venerable man page system. Although useful for commands or configuration files, it does not compete with documentation systems specifically designed for APIs such as ``javadoc``, ``pydoc``, ``godoc`` and so on:
* searching with the ``apropos`` command returns too many items, most of them useless (e.g. wrong section);
* relations between structures/types/modules are not dynamic/hypertext;
* no figure or diagram can be provided since the system is text based.

We currently use [Doxygen](http://www.doxygen.org/) since it supports C natively, and also C++, Java, Perl and Python. It has some drawbacks, but we did not find any other good alternative suitable for projects that include both C code *and* Python code. One major advantage we found with Doxygen is its ability to include code snippets from examples and to link them with the documentation. One example here: [TODO: provide the link](http://somewhere.net/bxibase/examples).



## Standard Libraries

Despite its age (C was invented in the early 70's by Dennis Ritchie), the language is very poor in general purpose standard libraries. For example, while the current C standard, known as C11, was specified in 2011 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf), it still does not provide a generic list or hashtable comparable to what can be found in "modern" languages such as Java, Python, Perl, Go, Haskell, among many others. In theory, one should distinguish the language (the set of syntactic and grammar rules that define it) from the default set of libraries provided with it. But in practice, both form a whole that must fulfill the requirements of an industrial project. Most modern languages provide a very rich set of default libraries. Comparing the documentation of the C11 standard library (https://en.wikipedia.org/wiki/C_standard_library#Header_files) with that of at least one "modern" language makes this statement quite clear:

* Java: http://docs.oracle.com/javase/8/docs/api/index.html,
* Python: https://docs.python.org/2/library/index.html,
* Perl: https://perldoc.perl.org/,
* Go: https://golang.org/pkg/#stdlib,
* Haskell: https://www.haskell.org/onlinereport/haskell2010/haskellpa2.html#x20-192000II.


Even if the POSIX standard defined in http://pubs.opengroup.org/onlinepubs/9699919799/ adds many libraries to C11 (a general purpose hashtable and list can be found in the `search.h` header, for example), it is far from comparable to what "modern" languages provide. For example, `hcreate()` provides an API for a hashtable, but it is not reentrant, its maximal number of elements is fixed at creation, and it only supports strings as keys.
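
These limitations show up even in a minimal use of the `search.h` API. In the sketch below, `table_put()` and `table_get()` are hypothetical helper names; `hsearch()` and the `ENTRY` type are the POSIX ones.

```c
#include <search.h>
#include <stddef.h>

/* Insert `data` under `key` in THE table: there is only one per process,
 * which is why the API is not reentrant. The key must be a C string.
 * Returns 1 on success, 0 when the table is full. */
int table_put(char *key, void *data) {
    ENTRY item = { .key = key, .data = data };
    return NULL != hsearch(item, ENTER);
}

/* Return the data stored under `key`, or NULL if the key is absent. */
void *table_get(const char *key) {
    ENTRY query = { .key = (char *) key, .data = NULL };
    ENTRY *found = hsearch(query, FIND);
    return (NULL == found) ? NULL : found->data;
}
```

A caller must first create the single global table with `hcreate(n)`, where `n` is the maximal number of elements, chosen once and for all, and must release it with `hdestroy()`. Neither the keys nor the data are copied or freed by the library.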

In the following, we consider a minimal set of libraries that are strictly required for high level programming. They either enhance what is provided by C11 and POSIX or they define a new missing module.

### Memory Management

The C language supports low-level access to computer memory. Experts often see this as a strength. However, it is also probably one of the main reasons for bugs, memory leaks and security holes. For allocation, the functions ``malloc()``, ``calloc()`` and ``realloc()`` are used. Except for ``calloc()``, those functions do not initialize the memory. This might look strange to users of Java, Python, or other similar languages that always initialize their memory. We believe initialization should be the default behaviour most of the time, unless profiling the application shows otherwise. POSIX also provides ``free()`` for releasing previously allocated memory. However, this function does not reset the pointer given as argument to ``NULL``. This often leads to a problem when the same pointer is passed twice (by mistake) to ``free()``. Most of the time, keeping the previous value of a pointer to released memory is error prone.

Below is a program that illustrates all the problems mentioned above with the POSIX API.

```c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    size_t n = 10;
    char *ptr = malloc(n*sizeof(*ptr));
    printf("\nMemory content after malloc #1 (not guaranteed to be zeroed)\n");
    // Where char is signed, a byte above 0x7f is sign-extended by the
    // promotion to int, hence values such as ffffffe9 in the output.
    for (size_t i = 0; i < n; i++) printf("%02x", ptr[i]);
    memset(ptr, 'A', n*sizeof(*ptr));
    free(ptr);
    // Inspecting ptr after free() is exactly the hazard being demonstrated.
    printf("\nValue of ptr after a free() (not guaranteed to be NULL): %p", (void *) ptr);
    
    ptr = malloc(n*sizeof(*ptr));
    printf("\nMemory content after malloc #2 (not guaranteed to be zeroed)\n");
    for (size_t i = 0; i < n; i++) printf("%02x", ptr[i]);
    printf("\n");
    
    n *= 2;
    ptr = realloc(ptr, n*sizeof(*ptr));
    printf("\nMemory content after realloc (the new memory area is not guaranteed to be zeroed)\n");
    for (size_t i = 0; i < n; i++) printf("%02x", ptr[i]);
    printf("\n");
    
    free(ptr);
    // free(ptr); // A second free() of the same pointer is undefined behavior
}
```




```text
Memory content after malloc #1 (not guaranteed to be zeroed)
00000000000000000000
Value of ptr after a free() (not guaranteed to be NULL): 0xa00c110
Memory content after malloc #2 (not guaranteed to be zeroed)
00000000414141414141

Memory content after realloc (the new memory area is not guaranteed to be zeroed)
000000004141414141410000ffffffe90e020000000000
```


We propose the [``bximem``](http://doc.bxi.hl/bxibase/bxi/base/mem.h) module, which mainly provides functions better suited to high level programming. It solves all the problems above, as shown by the example below:

```c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#include <bxi/base/mem.h>

int main(void) {
    size_t n = 10;
    char *ptr = bximem_calloc(n*sizeof(*ptr));
    printf("\nMemory content after bximem_calloc #1 (guaranteed to be zeroed)\n");
    for (size_t i = 0; i < n; i++) printf("%02x", ptr[i]);
    memset(ptr, 'A', n*sizeof(*ptr));
    BXIFREE(ptr);
    printf("\nValue of ptr after a BXIFREE() (guaranteed to be NULL): %p", (void *) ptr);
    
    ptr = bximem_calloc(n*sizeof(*ptr));
    printf("\nMemory content after bximem_calloc #2 (guaranteed to be zeroed)\n");
    for (size_t i = 0; i < n; i++) printf("%02x", ptr[i]);
    printf("\n");
    
    size_t old_size = n;
    n *= 2;
    // The old size is passed along with the new one
    ptr = bximem_realloc(ptr, old_size, n*sizeof(*ptr));
    printf("\nMemory content after bximem_realloc (the new memory area is guaranteed to be zeroed)\n");
    for (size_t i = 0; i < n; i++) printf("%02x", ptr[i]);
    printf("\n");
    
    BXIFREE(ptr);
    BXIFREE(ptr); // This won't produce any error since ptr is already NULL 
}
```

```text
Memory content after bximem_calloc #1 (guaranteed to be zeroed)
00000000000000000000
Value of ptr after a BXIFREE() (guaranteed to be NULL): (nil)
Memory content after bximem_calloc #2 (guaranteed to be zeroed)
00000000000000000000

Memory content after bximem_realloc (the new memory area is guaranteed to be zeroed)
0000000000000000000000000000000000000000
```
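
For readers who do not have `bximem` at hand, the two ideas can be sketched in plain C. The names `FREE_AND_NULL()` and `zrealloc()` are hypothetical and this is not the `bximem` implementation; it only shows why the old size is needed to zero the newly obtained area.

```c
#include <stdlib.h>
#include <string.h>

/* Release the memory and reset the caller's pointer, so that calling
 * the macro a second time is harmless: free(NULL) does nothing. */
#define FREE_AND_NULL(ptr) do { free(ptr); (ptr) = NULL; } while (0)

/* Resize a block and zero the bytes located beyond old_size.
 * Returns NULL on failure, in which case the original block is untouched. */
void *zrealloc(void *ptr, size_t old_size, size_t new_size) {
    void *result = realloc(ptr, new_size);
    if (NULL == result) return NULL;
    if (new_size > old_size) {
        memset((char *) result + old_size, 0, new_size - old_size);
    }
    return result;
}
```

The macro has to be a macro (or take a pointer to the pointer): a plain function receiving `ptr` by value could not reset the caller's variable.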


### Error Management

While memory management in C can be seen as a powerful tool, once combined with C error management it definitely becomes the main cause of buggy C programs. Error management in C is traditionally based on integer return codes, which makes it both very poor and very weak:
* very poor: because a simple integer does not hold any context, it says little about the cause of an issue;
* very weak: because using the result of a function is not mandatory in C, it is very easy to just ignore the returned code.

For the first problem, `errno` and the `perror()` and `strerror()` functions are provided, but they are a pain to use, as shown by the code below:

```c
#include <errno.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    errno = 0;                                                   // (1) Don't forget that!
    FILE * file = fopen("/a/non/existent/file", "r");
    if (NULL == file) {                                          // (2) Don't forget that!
        char * msg = strerror(errno);                            // (3) Not thread-safe
        fprintf(stderr, "Something wrong happened: %s\n", msg);  // (4) Does the message hold all required information?
        // free(msg);                                            // (5) Don't do that: msg was not allocated by us!
    } else {
        printf("Strange: you really have such a file?\n");       // This should not happen
        fclose(file);
    }
    
}
```

```text
Something wrong happened: No such file or directory
```




* In instruction (1), `errno` must be reset before any call to a function that may set it. Unfortunately, C standard functions are not consistent in their usage of `errno`. For example, most functions defined in `pthread.h` do not modify `errno`. In this small example the problem might not be apparent, but in real software it is very easy to forget this reset. In such a case, the behavior of the process might become very confusing: it might report an error at a place where no error occurred.

* Instruction (2) is standard practice in C, and we will rely on that for error management: checking the return value of a function.

* Instruction (3) uses the POSIX function `strerror()` defined in `string.h`, which is not thread safe. It returns a string representing the error message related to the given error code, in our case `errno`.

* Instruction (4) is a very simple way to deal with the error: we just display it. However, the message returned by `strerror()` does not hold all the context, in our case the file name. Therefore, the displayed message is not very useful. Of course, in this simple example, we can just add the file name to the `fprintf()` call to solve the problem. In the general case, however, the fact that `strerror()` does not include the context is a problem: you cannot simply return its value to the caller, for example; you have to create a specific structure that includes the context along with the error message (or the code, but not `errno` itself since it might change afterwards).

* Instruction (5) is also dangerous: `strerror()` does not return an allocated string (this is, by the way, the main reason why it cannot include the context), therefore the string must be neither freed nor modified.

Most modern languages use [Exception Handling](https://en.wikipedia.org/wiki/Exception_handling) to solve most of those issues: an exception is an object, therefore it can hold a context that helps in understanding what the problem is all about. In some languages (e.g. Java), some exceptions cannot be ignored at all: they have to be dealt with by the caller or passed up the call stack.

Although C libraries for exception handling [exist](https://github.com/guillermocalvo/exceptions4c/wiki/alternatives), they are of course not standard. Since exception handling has some drawbacks (see [Exception Handling Criticism on Wikipedia](https://en.wikipedia.org/wiki/Exception_handling#Criticism) and [Why should I have written ZeroMQ in C, not C++ (part I)](http://250bpm.com/blog:4)), we propose a solution that fits better with the C tradition.

The `bxierr` module addresses the following shortcomings:

* `errno` does not provide any context;
* `strerror()` does not provide data or a cause;
* an error code alone is insufficient, and nothing forces the caller to handle it;
* exception libraries exist but are not standard;
* exceptions are not as good as they seem to be (see the ZeroMQ blog post cited above);
* there is no real standard and easy reporting tool.

`bxierr` is:

* efficient: checking for an error is an address comparison, which is fast (it is a little more expensive when an error actually occurs);
* rich: it provides a backtrace, a dynamic error message string (compared to `errno`) and data in the error that can be used by the caller;
* not exception based: see the ZeroMQ blog post on the problems with exceptions for details (more powerful?);
* a provider of low-level assertions, with more details than `assert.h`.
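
The general idea of an error object that carries its context can be sketched as follows. This is not the `bxierr` API: `err_s`, `err_new()`, `err_destroy()`, `ERR_OK` and `open_for_reading()` are hypothetical names, and `__attribute__((format))` is a GCC/Clang extension.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error object: a saved code plus a message with context. */
typedef struct {
    int code;   /* copy of errno taken when the failure was detected */
    char *msg;  /* heap-allocated message, owned by the error object */
} err_s;
typedef err_s *err_p;

#define ERR_OK ((err_p) NULL)  /* success: checking is a pointer comparison */

/* Build an error from a printf()-like format; aborts on memory exhaustion. */
__attribute__((format(printf, 2, 3)))
err_p err_new(int code, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int len = vsnprintf(NULL, 0, fmt, ap);   /* first pass: compute the length */
    va_end(ap);
    err_p self = malloc(sizeof(*self));
    if (len < 0 || NULL == self) abort();
    self->code = code;
    self->msg = malloc((size_t) len + 1);
    if (NULL == self->msg) abort();
    va_start(ap, fmt);
    vsnprintf(self->msg, (size_t) len + 1, fmt, ap);  /* second pass: write */
    va_end(ap);
    return self;
}

/* Release an error and reset the caller's variable to ERR_OK. */
void err_destroy(err_p *self_p) {
    if (ERR_OK == *self_p) return;
    free((*self_p)->msg);
    free(*self_p);
    *self_p = ERR_OK;
}

/* The caller gets the file name and the cause, not just an integer. */
err_p open_for_reading(const char *path, FILE **file_p) {
    *file_p = fopen(path, "r");
    if (NULL == *file_p) {
        int code = errno;  /* save it before another call overwrites it */
        /* strerror() is used for brevity; it is not thread-safe */
        return err_new(code, "Cannot open '%s': %s", path, strerror(code));
    }
    return ERR_OK;
}
```

On the success path the only cost is the comparison with `ERR_OK`; the allocation and formatting happen only when something actually went wrong.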

### String Manipulation

String manipulation is usually a headache in C compared to Python, Perl or even Java, which is also compiled. One of the main reasons is probably the choice of the [NUL string terminator](https://en.wikipedia.org/wiki/Null-terminated_string), seen as ["the most expensive one-byte mistake"](http://queue.acm.org/detail.cfm?id=2010365) by FreeBSD developer Poul-Henning Kamp. However, for compatibility reasons, changing this in C is not reasonable. Therefore, the `bxistr` module proposes a few, but very useful, functions.

Among those functions:

* `bxistr_new()` provides a simple API for creating a new string safely. The `bxistr_new()` function is similar to `printf()` and it defines the appropriate compiler attribute so if a mistake is made in the format string specifier, the compiler produces a warning;
* `bxistr_join()` allows multiple lines to be joined with a given separator, similarly to Python [str.join()](https://docs.python.org/2/library/stdtypes.html#str.join);
* `bxistr_apply_lines()` calls a given function for each line found in a given string;
* `bxistr_prefixer*()` allows a string to be prefixed.
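
The first two items can be illustrated in plain C. The functions below are not the `bxistr` ones: `str_new()` and `str_join()` are hypothetical names, and the `format` attribute is the GCC/Clang extension that lets the compiler check the arguments against the format string.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* format(printf, 1, 2): parameter 1 is the format string and the
 * variadic arguments to check start at parameter 2.
 * Returns a newly allocated string, or NULL on error. */
__attribute__((format(printf, 1, 2)))
char *str_new(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int len = vsnprintf(NULL, 0, fmt, ap);   /* compute the required length */
    va_end(ap);
    if (len < 0) return NULL;
    char *result = malloc((size_t) len + 1);
    if (NULL == result) return NULL;
    va_start(ap, fmt);
    vsnprintf(result, (size_t) len + 1, fmt, ap);
    va_end(ap);
    return result;
}

/* Join n strings with `sep` between consecutive items.
 * Returns a newly allocated string, or NULL on error. */
char *str_join(const char *sep, const char *const parts[], size_t n) {
    size_t sep_len = strlen(sep);
    size_t total = 1;                        /* room for the final NUL */
    for (size_t i = 0; i < n; i++) {
        total += strlen(parts[i]) + (i > 0 ? sep_len : 0);
    }
    char *result = malloc(total);
    if (NULL == result) return NULL;
    char *cursor = result;
    for (size_t i = 0; i < n; i++) {
        if (i > 0) {
            memcpy(cursor, sep, sep_len);
            cursor += sep_len;
        }
        size_t len = strlen(parts[i]);
        memcpy(cursor, parts[i], len);
        cursor += len;
    }
    *cursor = '\0';
    return result;
}
```

With the attribute in place and format warnings enabled, a call such as `str_new("%d", "oops")` is reported at compile time instead of misbehaving at runtime. In both functions the caller owns the returned string and must release it.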


String handling must be simple and safe: compile with the flags that check `printf()`-like functions, and check all error codes. `bxistr` does all this.



### Logging System


In C, by default:

* an error leads to a SEGFAULT, a SIGBUS, ...: hard to debug;
* there is no logging system (syslog impacts the whole system).

What is wanted instead:

* an error leads to a stack trace with the code line, which eases debugging;
* a logging system helps to understand what happened and can be deactivated at runtime.

syslog is not a good candidate:

* too coarse grained;
* inefficient (to be checked);
* not flexible enough.

`bxilog` provides:

* low overhead, high throughput, low latency;
* thread safety without locks (except in the underlying libraries: malloc and zeromq);
* signal catching if required (SEGFAULT, SIGBUS, SIGTERM, except SIGQUIT);
* fork support (the child must initialize the logging system; fork/exec has no overhead);
* atomic writes: no mix-up between different processes writing to the same file;
* an unlimited number of loggers;
* a simple but powerful filtering mechanism;
* multiple handlers already provided: syslog, console (colored or B&W, customizable), file, remote, net-snmp;
* a file handler whose output `bxilogparser` makes easier for humans to process;
* a simple API for new handler implementations;
* a rich `BXIASSERT()` mechanism;
* the guarantee that all logs are flushed;
* a rich stack trace;
* assertions that all loggers can see;
* a rich error reporting mechanism;
* errors that can be treated separately by handlers.
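
Two of the properties above, per-logger filtering that can be changed at runtime and records that carry their location, can be sketched in a few lines. This is not the `bxilog` API and it ignores threads, handlers and performance entirely; `logger_s`, `level_e`, `log_emit()` and `LOG()` are hypothetical names.

```c
#include <stdarg.h>
#include <stdio.h>

typedef enum { LVL_OFF, LVL_ERROR, LVL_INFO, LVL_DEBUG } level_e;

/* Hypothetical logger: each module owns one, with its own threshold. */
typedef struct {
    const char *name;
    level_e level;   /* may be changed at any time while the program runs */
} logger_s;

/* Write the record to stderr if `lvl` passes the logger's threshold.
 * Returns 1 when the record was written, 0 when it was filtered out. */
__attribute__((format(printf, 6, 7)))
int log_emit(const logger_s *logger, level_e lvl, const char *file,
             const char *func, int line, const char *fmt, ...) {
    if (LVL_OFF == lvl || lvl > logger->level) return 0;
    fprintf(stderr, "[%s] %s:%d %s(): ", logger->name, file, line, func);
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    return 1;
}

/* Capture the call site automatically. */
#define LOG(logger, lvl, ...) \
    log_emit((logger), (lvl), __FILE__, __func__, __LINE__, __VA_ARGS__)
```

A module declared with `logger_s net = {"net", LVL_ERROR};` stays silent on `LOG(&net, LVL_DEBUG, ...)` until `net.level` is raised, while another module can keep its own, more verbose, threshold.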

### IPC

C offers many IPC systems, and they are too different from one another:

* signals;
* FIFOs;
* msg, shm, sockets;
* `pthread_cond`/wait.

IPC is a standard feature of modern languages: Go, Haskell and Eiffel provide a standard mechanism for IPC between threads and processes, remote or local.

zeromq provides a similar mechanism:

* it is efficient;
* security is also provided since zmq 4.x.

`bxizmq` provides IPC based on zeromq: a small wrapper with error management.

### Unit Testing

Industrial programming does not come without unit testing nowadays, and there is no real equivalent to the JUnit framework in C, whereas one exists in most other languages. [CUnit](http://cunit.sourceforge.net/) is the closest, but it is no longer maintained (since 2015) and has several drawbacks, the most important being the lack of a clear message when a test fails. Therefore, while in 

```c
#include <bxi/base/mem.h>
#include <stdio.h>

int main(void) {
    char * str = bximem_calloc(10);
    printf("Hello\n");
    BXIFREE(str);  // release the buffer before leaving
}
```

```text
Hello
```
