# High Level Programming in C 

## Motivation


According to the [TIOBE index](http://www.tiobe.com/tiobe-index) report, Java and C have been the two most popular languages for 15 years. Although C is categorized as a high-level language, most developers still see it as a low-level one with a steep learning curve. One reason might be its close mapping to hardware, which makes it a good candidate for operating system design. But we believe this is neither the only reason nor the main one.

In this document, we present a set of requirements for high-level programming that, in our opinion, the C language does not fulfill *per se*. For each requirement, a solution is proposed.


## Development Process

Starting a project from scratch (in C) requires adopting, among other things, a coding style, coding conventions, and packaging rules, which are briefly described below.

### Coding Convention

Unlike many other languages, C does not come with any coding convention *per se*. This has led to a variety of [coding conventions](https://en.wikipedia.org/wiki/Indent_style). The [C FAQ](http://c-faq.com/style/layout.html) advises following the K&R coding style. We are a bit stricter and follow [this guideline](https://users.ece.cmu.edu/~eno/coding/CCodingStandard.html). We also adopt the following conventions:

* most modules provide an object-oriented interface: for a given structure ``struct foo_s_t``, most module functions are defined as ``func(foo_p self, ...)``, where ``foo_p`` is a pointer to that structure and represents the "object" on which the function ``func`` operates;
* enumerations are suffixed by ``_e``;
* static names defined in the .c file are prefixed with an underscore, as in ``_a_static_func(...)``;
* internal APIs within a module implementation use a double underscore to differentiate them from the external API.
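
As an illustration, the conventions above might look as follows in a small, hypothetical `counter` module. Every name here is an assumption made for the example (including the `_s` suffix chosen for the structure); the header and implementation parts are shown in a single listing for brevity.

```c
#include <stdlib.h>

/* ---- counter.h: public API (hypothetical module) ---- */

/* Enumerations are suffixed by _e. */
typedef enum {
    COUNTER_OK,
    COUNTER_FULL
} counter_status_e;

/* The "object" type and its pointer type, passed as `self`. */
typedef struct counter_s {
    unsigned value;
    unsigned max;
} counter_s;
typedef counter_s *counter_p;

counter_p counter_new(unsigned max);
counter_status_e counter_increment(counter_p self);
void counter_destroy(counter_p self);

/* Internal API, shared between the module's own files only:
 * marked by a double underscore. */
int counter__is_full(counter_p self);

/* ---- counter.c: implementation ---- */

/* Static helper, visible in this file only: leading underscore. */
static unsigned _remaining(counter_p self) {
    return self->max - self->value;
}

int counter__is_full(counter_p self) {
    return _remaining(self) == 0;
}

counter_p counter_new(unsigned max) {
    counter_p self = malloc(sizeof(*self));
    if (self != NULL) {
        self->value = 0;
        self->max = max;
    }
    return self;
}

counter_status_e counter_increment(counter_p self) {
    if (counter__is_full(self)) return COUNTER_FULL;
    self->value++;
    return COUNTER_OK;
}

void counter_destroy(counter_p self) {
    free(self);
}
```

Callers then write `counter_increment(c)` where an object-oriented language would write `c.increment()`. Note that ISO C reserves file-scope identifiers beginning with an underscore for the implementation, so the static-name convention relies on not clashing with the C library in practice.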

We also provide two templates that ease structuring the code in a consistent manner:

* header file (TODO: provide the link)
* implementation file (TODO: provide the link)


### Code Quality Convention

Neither C11 nor POSIX defines a code quality convention. There are, however, at least two "standards" related to this topic:

* [MISRA](https://en.wikipedia.org/wiki/MISRA_C): initially for the automotive industry but now extended to embedded systems in general;
* [CERT C Coding Standard](https://www.securecoding.cert.org/confluence/x/HQE): defines a set of rules and recommendations for security and safety.

The problem is the tooling required to check conformance to those standards: it might not be free of charge. At the very minimum, we recommend the following:

* Compilation flags: turn all warnings into errors; all warnings must be dealt with before any release of the product. The sooner those warnings are taken into account, the easier the long-term maintenance of the product. For example, with gcc, CFLAGS are:

    CFLAGS="-Wall ..."  # TODO DEFINE

* Compile with several compilers, at the very least gcc and clang: each compiler might raise issues unseen by the others;
* Use [valgrind](http://valgrind.org/) systematically to detect memory problems. We use the following options for valgrind:

    VALGRIND_OPTS="..." # TODO DEFINE
 

### Packaging

The layout of a project is important, especially when the project is made of several subprojects. If every subproject has the same layout, it becomes much easier to locate a given file. Therefore, we propose the following layout, which should fit the requirements of most projects:

TODO


### Versionning

Versioning a product is not so easy. First, one must adopt a versioning convention. There are many of them (see [Wikipedia](https://en.wikipedia.org/wiki/Software_versioning)). We propose to follow the `major.minor.fix-release` convention, where:

* `major` is incremented when incompatible changes have been introduced;
* `minor` is incremented when compatible changes have been introduced (new functions in the API for example);
* `fix` is incremented when only bug fixes have been introduced;
* `release` is incremented when only the packaging has changed, such as the layout, the makefile, or something else.

Once the versioning scheme has been chosen, the version must be made available at the code level in order to implement ``cmd --version`` or the ``About...`` dialog box. This can be done using the following API:

TODO: define and implement an API for version management
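
Pending that definition, here is a minimal sketch of what such an API could look like. The type and function names (`version_s`, `version_format`, `version_compare`) are hypothetical and do not belong to any existing library.

```c
#include <stdio.h>

/* Hypothetical descriptor for the major.minor.fix-release scheme. */
typedef struct version_s {
    unsigned major;
    unsigned minor;
    unsigned fix;
    unsigned release;
} version_s;
typedef const version_s *version_p;

/* Writes "major.minor.fix-release" into buf, e.g. for `cmd --version`.
 * Returns the value of snprintf(): the length of the full string. */
int version_format(version_p self, char *buf, size_t size) {
    return snprintf(buf, size, "%u.%u.%u-%u",
                    self->major, self->minor, self->fix, self->release);
}

/* Three-way comparison of two unsigned fields. */
static int _cmp(unsigned a, unsigned b) {
    return (a > b) - (a < b);
}

/* Returns a negative, zero or positive value, like strcmp().
 * Fields are compared from the most to the least significant. */
int version_compare(version_p a, version_p b) {
    int r;
    if ((r = _cmp(a->major, b->major)) != 0) return r;
    if ((r = _cmp(a->minor, b->minor)) != 0) return r;
    if ((r = _cmp(a->fix, b->fix)) != 0) return r;
    return _cmp(a->release, b->release);
}
```

A program could define one `static const version_s` per package and print it from its `--version` handler, while a dependency tool could use `version_compare` to check that an installed package is recent enough.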

### Dependency Management

Most modern languages provide a system (integrated or not) to retrieve dependencies. For example, Java uses [Maven](https://maven.apache.org/) or [Ivy](http://ant.apache.org/ivy/), Python uses [PyPI](https://pypi.python.org/pypi), Go has the [``go``](https://golang.org/cmd/go/) tool, and Ruby uses [``gem``](https://rubygems.org/), but C provides nothing.

Following the guidelines defined here, especially those concerning [Packaging](#Packaging) and [Versioning](#Versionning), we propose a tool that could be used for dependency management of C packages.

TODO: make ``bxivcs`` a product and describe it.

### Development Cycle

Probably the main difference between C and modern languages such as Java, Python, or Go is the development cycle. Since Python is an interpreted language, it naturally offers the fastest one: code -> run. C, Java and Go must be compiled, and their development cycle therefore includes one more step: code -> compile -> run. However, Java is so well integrated within "modern" IDEs such as Eclipse or Netbeans that the compile phase is mostly transparent. Therefore, at least in Java, the development cycle remains as fast as for any interpreted language: code -> run. Although Eclipse supports the C language, it does not currently provide the same level of integration as it does for Java. This might change in the near future.


### Debugging

Most modern languages provide various mechanisms that ease debugging:

* no direct access to memory;
* smart boundary checking (which can sometimes be disabled at runtime without harm);
* stack trace on error with dynamic message, including file, function and line number;
* assertion system that can be deactivated at runtime;
* logging system that helps to understand what happens at runtime;
* monitoring system (at least in Java, with JMX).

By default, C11 does not provide any way to produce a stack trace, a logging system, or a monitoring system. POSIX does not provide anything comparable to what "modern" languages provide (more on that [later](#Standard Libraries)). Therefore, most of the time in C, when a problem occurs, it results in a segmentation fault at best (with the famous ``core dumped`` message) and goes undetected at worst (e.g. memory corruption).

A first solution to this problem is to guarantee code quality using the tools and techniques already presented in section [Code Quality Convention](#Code Quality Convention). But this is not sufficient to accelerate the development cycle: using ``gdb`` or ``valgrind`` for simple bugs is too costly and time-consuming.

The standard ``assert.h`` module is not sufficient either: assertions can only be disabled at compile time (by defining ``NDEBUG``), not at runtime, and the error message is quite minimal: the stack trace, for example, is not included. Finally, ``printf()`` is of course not a good option for a logging system: it is too costly, it does not provide enough information (thread, module, function, line), and it cannot be disabled in one module while remaining enabled in another.
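
To illustrate the first point, here is a minimal sketch of an assertion that can be switched off at runtime and that reports file, function and line. The names `CHECK`, `check_set_enabled` and `check_is_enabled` are hypothetical; a stack trace is deliberately left out, since neither C11 nor POSIX specifies a way to obtain one.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical runtime switch; a real library might initialise it
 * from the environment or from a configuration file. */
static bool check_enabled = true;

void check_set_enabled(bool enabled) {
    check_enabled = enabled;
}

bool check_is_enabled(void) {
    return check_enabled;
}

/* Reports where the failed check is located, then aborts. */
void check_fail(const char *expr, const char *file, const char *func,
                int line, const char *msg) {
    fprintf(stderr, "%s:%d: %s(): check `%s` failed: %s\n",
            file, line, func, expr, msg);
    abort();
}

/* Unlike assert(), the decision is taken at runtime: when checks are
 * disabled, the expression is not even evaluated. */
#define CHECK(expr, msg)                                             \
    do {                                                             \
        if (check_is_enabled() && !(expr))                           \
            check_fail(#expr, __FILE__, __func__, __LINE__, (msg));  \
    } while (0)
```

A call such as `CHECK(ptr != NULL, "buffer must be allocated")` costs one function call when checks are disabled, which is the price of deciding at runtime rather than at compile time.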

Therefore, some important libraries are missing for debugging, as will be discussed in [Standard Libraries](#Standard Libraries).
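
The logging requirements mentioned above (module, function and line information, and per-module filtering) can be sketched as follows. The `logger_s` type, the `LOG` macro and the level names are assumptions made for this example; the thread identifier is omitted to keep the sketch simple.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical severity levels, from least to most verbose. */
typedef enum { LVL_OFF, LVL_ERROR, LVL_INFO, LVL_DEBUG } log_level_e;

/* One logger per module, so that the verbosity of each module can be
 * changed at runtime independently of the others. */
typedef struct logger_s {
    const char *module;
    log_level_e level;
} logger_s;
typedef logger_s *logger_p;

/* Returns 1 when the message was emitted, 0 when it was filtered out.
 * The level test comes first, so filtered messages are never formatted. */
int logger_log(logger_p self, log_level_e level, const char *file,
               const char *func, int line, const char *fmt, ...) {
    if (level > self->level) return 0;
    fprintf(stderr, "[%s] %s:%d %s(): ", self->module, file, line, func);
    va_list args;
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    fputc('\n', stderr);
    return 1;
}

/* Captures the call site automatically. */
#define LOG(logger, level, ...) \
    logger_log((logger), (level), __FILE__, __func__, __LINE__, __VA_ARGS__)
```

Each module declares its own `logger_s`, so raising the level of one module at runtime does not affect the others. Arguments are still evaluated when a message is filtered out; only the formatting and the output are skipped.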


### Documentation 

Finally, the standard documentation system of the C language is the venerable [man page system](https://en.wikipedia.org/wiki/Man_page). Although useful for commands or configuration files, it does not compete with documentation systems specifically designed for APIs, such as [JavaDoc](https://docs.oracle.com/javase/8/docs/api/), [Pydoc/Sphinx](https://docs.python.org/3/), [Go Doc](https://godoc.org/) and so on:

* searching with the ``apropos`` command returns too many items, most of them useless (e.g. wrong section);
* relations between structures/types/modules are not dynamic/hypertext;
* no figure or schema can be provided since the system is text-based.

We currently use [Doxygen](http://www.doxygen.org/) since it supports C natively, as well as C++, Java, Perl and Python. It has some drawbacks, but we did not find any good alternative suitable for projects that include both C code *and* Python code. One major advantage we found with Doxygen, however, is its ability to include code snippets from examples and to link them with the documentation. One example here: [TODO: provide the link](http://somewhere.net/bxibase/examples).



## Standard Libraries

Despite its age (C was invented in the early 1970s by Dennis Ritchie), C is very poor in general-purpose standard libraries. For example, although the current C standard, known as [C11](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf), was specified in 2011, it still does not provide a generic hash table comparable to what can be found in "modern" languages such as Java, Python, Perl, Go, and Haskell, among many others. In theory, one should distinguish the language (the set of syntactic and grammatical rules that defines it) from the default set of libraries provided with it. But in practice, both form a whole that must fulfill the requirements of an industrial project. Most modern languages provide a very rich set of default libraries. Comparing the documentation of the [C11 default standard library](https://en.wikipedia.org/wiki/C_standard_library#Header_files) with that of at least one "modern" language makes this quite clear:

* [Java documentation](http://docs.oracle.com/javase/8/docs/api/index.html)
* [Python documentation](https://docs.python.org/2/library/index.html)
* [Go documentation](https://golang.org/pkg/)
* [Haskell documentation](http://hackage.haskell.org/packages/)

Even though the [POSIX standard](http://pubs.opengroup.org/onlinepubs/9699919799/) adds many libraries to C11 (general-purpose hash tables and lists can be found in the ``search.h`` header, for example), it is far from comparable to what "modern" languages provide. For example, ``hcreate()`` provides an API for a hash table, but it is not reentrant, its maximum number of elements is fixed at creation, and it only supports strings as keys.
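
The following sketch wraps the POSIX `hcreate()`/`hsearch()` calls in two hypothetical helpers, `table_put` and `table_get`, to make these limits visible: there is a single process-wide table, its capacity is fixed by `hcreate()`, and keys must be strings.

```c
#include <search.h>
#include <stddef.h>

/* Hypothetical helpers over the single, process-wide POSIX table.
 * hcreate(capacity) must have been called once beforehand. */

/* Inserts value under key; returns 0 on success, -1 when the table is
 * full. The key is not copied, so it must outlive the table. If the key
 * is already present, the existing entry is kept unchanged. */
int table_put(const char *key, void *value) {
    ENTRY item = { .key = (char *)key, .data = value };
    return hsearch(item, ENTER) != NULL ? 0 : -1;
}

/* Returns the value stored under key, or NULL when it is absent. */
void *table_get(const char *key) {
    ENTRY query = { .key = (char *)key, .data = NULL };
    ENTRY *found = hsearch(query, FIND);
    return found != NULL ? found->data : NULL;
}
```

A caller would run `hcreate(16)` once, then use `table_put("answer", &x)` and `table_get("answer")`. Needing a second table, non-string keys, or growth beyond the initial capacity each requires another library.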

In the following, we consider a minimal set of libraries that are strictly required for high level programming. They either enhance what is provided by C11 and POSIX or they define new missing modules.

* [Memory Management](Memory Management.ipynb)
* [Error Management](Error Management.ipynb)
* [String Manipulation](String Manipulation.ipynb)
* [Concurrent, Parallel and Distributed Programming](Concurrent, Parallel and Distributed Programming.ipynb)
* [Logging System](Logging System.ipynb)

### Unit Testing

Nowadays, high-level programming does not come without unit testing, and C has no real equivalent to the JUnit framework, whereas one exists in most other languages. [CUnit](http://cunit.sourceforge.net/) is the closest, but it has not been maintained since 2015 and has several drawbacks, the most important being the lack of a clear message when a test fails.

This is the framework we currently use, but we expect to enhance it in the future.

## About

Pierre Vignéras received a PhD degree in computer science from the University of Bordeaux 1, France, in 2004. He has been the technical leader of BXI Fabric Management since 2011 and is the main architect and co-developer of the BXI routing component. His research interests include high-performance interconnects, routing algorithms and their effective implementation in real-life systems. He is a regular reviewer for the journal Concurrency and Computation: Practice and Experience. He is also a part-time lecturer at the University of Paris-Sud, Orsay, France.








