## Error Management

Memory management in C can be seen as a powerful tool, but combined with C error management it becomes the main source of bugs in C programs. Error management in C is traditionally based on integer return codes, which makes it both very poor and very weak:

* very poor: a simple integer holds no context, so it says little about the cause of a problem;
* very weak: using the result of a function is not mandatory in C, so it is very easy to simply ignore the returned code.

For the first problem, C provides `errno`, `perror()` and `strerror()`, but they are painful to use, as the code below shows:

In [12]:
#include <errno.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    errno = 0;                                                   // (1) Don't forget that!
    FILE * file = fopen("/a/non/existent/file", "r");
    if (NULL == file) {                                          // (2) Don't forget that!
        char * msg = strerror(errno);                            // (3) Not thread-safe
        fprintf(stderr, "Something wrong happened: %s\n", msg);  // (4) Does the message hold all required information?
        // free(msg);                                            // (5) Don't do that, segfault guaranteed!
    } else {
        printf("Strange: you really have such a file?\n");       // This should not happen
        fclose(file);
    }
    return 0;
}

Something wrong happened: No such file or directory




* In instruction (1), `errno` is reset before calling a function that may set it. Unfortunately, C standard and POSIX functions are not consistent in their use of `errno`. For example, most functions declared in `pthread.h` do not set `errno` at all: they return the error code directly. Resetting `errno` is essential for functions such as `strtol()`, whose failure can only be detected through `errno`; for functions like `fopen()`, which also signal failure through their return value, it is a defensive habit. In this small example the problem might not be apparent, but in real software it is very easy to forget the reset. The process may then behave strangely: it may report an error in a place where none occurred.

* Instruction (2) is standard practice in C, and we will rely on that for error management: checking the return value of a function.

* Instruction (3) uses the `strerror()` function declared in `string.h`, which is not required to be thread-safe (POSIX provides `strerror_r()` as a thread-safe alternative). It returns a string describing the given error code, in our case `errno`.

* Instruction (4) is a very simple way to deal with the error: we just display it. However, the message returned by `strerror()` does not hold the whole context, such as the file name in our case. The context should also include the whole call stack. The displayed message is therefore not very useful. Of course, in this simple example we can just pass the file name to `fprintf()`, which solves the problem. In the general case, however, the fact that `strerror()` does not include the context is a problem: you cannot simply return its value to the caller, for example. You have to create a specific structure that holds the context along with the error message (or the code, but not `errno` itself, since it might change afterwards). Error reporting is also very poor: either you display the error on standard error using `perror()` or the GNU-specific `error()`, or you do it yourself with `fprintf()`.

* Instruction (5) is also dangerous: `strerror()` does not return a string that the caller owns (which is, by the way, the main reason why it cannot include the context); therefore it must be neither freed nor modified.

Most modern languages use [Exception Handling](https://en.wikipedia.org/wiki/Exception_handling) to solve most of those issues: an exception is an object, so it can hold a context that helps in understanding what the problem is. In some languages (e.g. Java), some exceptions cannot be ignored at all: they must either be dealt with by the caller or be passed up the call stack. Although C libraries for exception handling [exist](https://github.com/guillermocalvo/exceptions4c/wiki/alternatives), they are of course not standard.

That said, exception handling has drawbacks of its own (see [Exception Handling Criticism on Wikipedia](https://en.wikipedia.org/wiki/Exception_handling#Criticism) and [Why should I have written ZeroMQ in C, not C++ (part I)](http://250bpm.com/blog:4)); we therefore prefer to stick with the C return-code tradition.

The `bxierr` module has been designed specifically to address those problems. It defines an object-like structure `bxierr_p` that represents the error. Following the C tradition, a function is supposed to return such a `bxierr_p` object in case of error or the special constant `BXIERR_OK` when no problem occurs. 

In short, the `bxierr.h` module is:

* efficient: determining whether an error occurred after a call requires just a comparison between two pointers, the returned `bxierr_p` object and the constant `BXIERR_OK`; this operation is very fast (as fast as comparing two integers, as done in the traditional approach);
* rich: the returned `bxierr_p` object includes an error code, a message that can include part of the context (such as the file name in our example), and the complete call stack;
* safer: since a `bxierr_p` object is returned, proper tools such as [Valgrind](valgrind.org) or [Coverity](www.coverity.com) will complain if it is neither destroyed nor returned to the caller.

This last property prevents errors from being ignored, as usually happens in C (how often do you really check the code returned by `printf()`?). However, handling the error after each function call is clearly painful and ends up making the code unreadable. One of the benefits of exception handling is the ability to gather the error-handling code in one place, the `except/catch` block. The `bxierr.h` module provides several features in this regard:

* error chaining: an error can be the cause of another error;
* error list: unrelated errors can be stored in a list;
* error set: unrelated errors can be stored in a set, in which two errors of the same type (that is, with the same error code) are stored only once; this makes it possible to report each kind of error only once.


The following example illustrates the use of the `bxierr.h` module for high-level programming.

In [13]:
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <bxi/base/err.h>

int main(void) {
    char * filename = "/a/non/existent/file";
    errno = 0;                                                   // (1) Don't forget that!
    FILE * file = fopen(filename, "r");
    if (NULL == file) {
        bxierr_p err = bxierr_errno("Problem while opening %s",  // (2) Convenient function to create a bxierr_p
                                    filename);
        bxierr_report(&err, STDERR_FILENO);                      // (3) Convenient function to report an error
    } else {
        printf("Strange: you really have such a file?\n");       // This should not happen
    }
}
    

##code## 2
##mesg## Problem while opening /a/non/existent/file: No such file or directory
##trce## Backtrace of tid 100: 7 function calls 
##trce## [00] /lib64/libbxibase.so.0(bxierr_backtrace_str+0x195) [0x7fd8adc95f15]
##trce## [01] /lib64/libbxibase.so.0(bxierr_new+0x99) [0x7fd8adc96239]
##trce## [02] /lib64/libbxibase.so.0(bxierr_vfromidx+0x97) [0x7fd8adc963c7]
##trce## [03] /lib64/libbxibase.so.0(bxierr_fromidx+0x94) [0x7fd8adc96504]
##trce## [04] /tmp/tmpqovgxgve.out() [0x400828]
##trce## [05] /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fd8ad8edb35]
##trce## [06] /tmp/tmpqovgxgve.out() [0x400709]
##trce## Backtrace end




* Instruction (1) is still required since `fopen()` is called.
* Instruction (2) shows a convenient way to create a `bxierr_p` from an `errno` code. There are many other functions for `bxierr_p` creation.
* Instruction (3) shows how to report a `bxierr_p`; another, more powerful way will be presented in [Logging System](#Logging System).

Notice the output now provides much more information: 

* the error code;
* the message with the file name;
* the full call stack.

Of course, one might argue that if the file name is required in the output, it just has to be passed to `fprintf()`, and the result will be roughly equivalent. In this simple case, where the error is only reported, that is right (though the call stack will be missing). However, if the error must also be handled by the caller, the file name must be present in the error itself, as data.

The `bxierr.h` module is a real change in C programming, and is one of the most important modules for high-level programming. To understand why, a more complete example is required:

In [14]:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <bxi/base/err.h>

bxierr_p f(void) {
    // This function generates several independent errors; we store them in a list
    bxierr_list_p err_list = bxierr_list_new();
    for (size_t i = 0; i < 5; i++) {
        // For example, some files are opened
        // open(...)
        bxierr_p err = bxierr_simple(5, "Error returned from thread %zu", i); // An error type is given by an integer code
        bxierr_list_append(err_list, err);
    }
    return (0 == err_list->errors_nb) ? BXIERR_OK : bxierr_from_list(999, err_list, "At least one error occurs");
}

bxierr_p g(void) {
    // This function might generate many errors in a loop; instead of storing each one of them,
    // we only store the distinct ones, limiting the number of reports.
    bxierr_set_p err_set = bxierr_set_new();
    while (3 > err_set->distinct_err.errors_nb) {
        // This loop is, for example, accessing a file on a regular basis and must never crash, even on error.
        // However, error reporting is important. We report an error only the first time it is seen.
        // read(...) or write(...)
        long int err_type = rand() % 4; // Let's suppose there are only 4 types of errors
        bxierr_p err = bxierr_simple(err_type, "An error occured while reading/writing");
        if (bxierr_set_add(err_set, &err)) {
            // First time such an error is seen, we might report it
            char * err_str = bxierr_str(err);
            fprintf(stderr, "First time seen and reported error: %s", err_str);
            BXIFREE(err_str);
        } // Otherwise the error has already been seen, so we don't produce any report; note that it has already been destroyed
    }
    fprintf(stderr, "\n--- End of first time seen error reporting ---\n");

    return bxierr_from_set(888, err_set, "Maximal number of distinct errors reached: %zu/%zu",
                                err_set->distinct_err.errors_nb, err_set->total_seen_nb);
}

int main(void) {
    bxierr_p err = BXIERR_OK, err2;

    srand((unsigned int) time(NULL));
    err2 = f();
    BXIERR_CHAIN(err, err2); // We don't want to deal with the error here

    err2 = g();
    BXIERR_CHAIN(err, err2); // Here, we chain both errors, meaning that errors from f() might cause g()
                             // to return errors too

    bxierr_report(&err, STDERR_FILENO);
}

First time seen and reported error: ##code## 2
##mesg## An error occured while reading/writing
##trce## Backtrace of tid 106: 6 function calls 
##trce## [00] /lib64/libbxibase.so.0(bxierr_backtrace_str+0x195) [0x7fa8d401ff15]
##trce## [01] /lib64/libbxibase.so.0(bxierr_new+0x99) [0x7fa8d4020239]
##trce## [02] /tmp/tmpya_j8fyx.out() [0x400d31]
##trce## [03] /tmp/tmpya_j8fyx.out() [0x400e49]
##trce## [04] /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fa8d3c77b35]
##trce## [05] /tmp/tmpya_j8fyx.out() [0x400b59]
##trce## Backtrace end
First time seen and reported error: ##code## 1
##mesg## An error occured while reading/writing
##trce## Backtrace of tid 106: 6 function calls 
##trce## [00] /lib64/libbxibase.so.0(bxierr_backtrace_str+0x195) [0x7fa8d401ff15]
##trce## [01] /lib64/libbxibase.so.0(bxierr_new+0x99) [0x7fa8d4020239]
##trce## [02] /tmp/tmpya_j8fyx.out() [0x400d31]
##trce## [03] /tmp/tmpya_j8fyx.out() [0x400e49]
##trce## [04] /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fa8d3c77b35]



The `main()` function illustrates the *error chaining pattern*: errors from `f()` and `g()` are not ignored, and the error-handling code (in this simple case, a report produced by `bxierr_report()`) sits at the end of the program. If `f()` produces an error, which always happens in this simple case, it becomes the cause of the error produced by `g()`, if any. Basically, an error can be dealt with in three different ways:

* you report it directly using `bxierr_report()` or, even better, using `BXILOG_REPORT` as we will see in the next section, or with `fprintf()` or any other reporting system (note: the error is normally destroyed after this step);
* you pass it to the caller using a simple `return` statement;
* you try to continue anyway; in this case, you chain using `BXIERR_CHAIN()`.

Function `f()` uses the `bxierr_list_p` structure to store the different errors and turn them into a normal `bxierr_p` object.

Function `g()` uses the `bxierr_set_p` structure to store only distinct errors: the idea is to prevent software from producing many identical reports such as "Can't write to file ...: disk full". With this simple but powerful mechanism, such a message is reported only once.

The `bxierr.h` module provides only very limited reporting facilities, which are useful mainly when nothing else is available. In particular, the `bxilog.h` module is much preferable for reporting, as we will see in [Logging System](Logging System.ipynb).