Skip to content

SATE 6 Classic DARPA CGC report

André Maroneze edited this page Feb 25, 2019 · 2 revisions

SATE 6 Classic Track - DARPA CGC test suite

This file contains an (ongoing) report about cases of undefined behavior and deviations from the ISO C standard found in the DARPA CGC corpus as part of the SATE 6 Classic Track.

These issues have been found with Frama-C/Eva, and manual inspection to propose patches for the code in some cases.

The issues were separated in the following categories:

  1. Syntactic issues: either current Frama-C limitations (e.g. uncommon situations related to flexible array members), or deviations from the C standard (e.g. usage of Clang's blocks extension)

  2. Unintentional undefined behavior in the challenges: each challenge contains a README.md file describing one or several intentional vulnerabilities in the code. The issues reported here seem unrelated to those vulnerabilities and are likely unintended bugs present in the code. When possible, a patch is suggested to fix the issue. Otherwise, the analysis is often unable to advance further.

  3. Intentional undefined behavior: in a few cases, the code deliberately violates the standard (often with comments indicating why). Such cases are not currently handled, although some patches could be proposed to ensure Frama-C/Eva could proceed with the analysis. Such patches would likely affect the intended exploits of such vulnerabilities.

  4. Possible undefined behavior due to absence of checks: some challenges contain code that does not perform some validation tests, which leads to possible undefined behavior under specific circumstances. It is not entirely clear whether such cases are intended (to make the vulnerabilities exploitable) or accidental. They are nevertheless listed as spots for possible improvements in the code.

  5. Undefined behavior in the "glue" code: in at least one file, there is a situation where the undefined behavior is present not in the challenges, but in the "glue" code written by Trail Of Bits to port the testcases to "common" operating systems: Windows, macOS and Linux.

Note: due to the large amount of test cases and manual effort required to analyze them, this report is not definitive. New cases may be periodically added to this report.

Syntactic issues

  • cgc_regexp.h: cannot parse: contains a field declared with a type containing a flexible array member

  • challenges/Corinth/lib/cgc_protocol.h: prototype for function cgc_protocol_with_recv_frame uses Clang's blocks extension, which is non-portable. The same also happens with the Query_Calculator challenge.

  • challenges/ShoutCTF/src/service.c:478, function main: The following expression performs arithmetic on a pointer to void, which is a non-C99 GNU extension:

      r = *(unsigned int *)secret_page ^ *(unsigned int *)&secret_page[20];
    

    Replacing void * with char * in the code should not change its behavior (since it is later cast into unsigned int * anyway).

  • challenges/middleware_handshake/src/cgc_modes.h: This program defines, in cgc_modes.h, a type mode_t which is a struct with several fields. There is an unfortunate collision with the mode_t type defined in POSIX, which is defined in sys/types.h and included by several other headers. Renaming this type to cgc_mode_t or some other name is sufficient to avoid possible collisions.

  • challenges/middleware_handshake/r_lcg.c and challenges/middleware_handshake/r_system.c: There are two incoherences in the renaming of global variables cgc_lcg_rng and cgc_system_rng: they have been renamed in the files which define and initialize them, but not in rng.c, which declares and uses them. Comparing the code with the original samples confirms that both had the same name. Otherwise, the program will try to dereference null pointers when using them (the variables in rng.c will be zero-initialized by default), leading to undefined behavior.

Unintentional undefined behavior in challenges

challenges/XStore/src/service.c

In function handle_delete, the following suspect code is present:

void handle_delete(size_t size) {

        ...

        free_object(data);

        for (i = 0; i < data->len; i++)

        {

            ...

        }

        ...

}

In this code, after calling free_object(data), the len field is used to free some inner elements of the data structure. This has not been confirmed 100% by Frama-C (due to instrumentation used to normalize code among all challenges), but it is very likely that len is being read from a dangling pointer, which could work in practice (since it has just been deallocated) but constitutes undefined behavior.

challenges/EternalPass/src/service.c

totalLen is declared in the stack, but not initialized before use, in total_len += len;. This is a UB, and totalLen is likely to contain a garbage value.

challenges/CNMP/src/joke.c:82

jokedb->count is read without having been initialized (jokedb is a stack variable local to main). This is a UB, and count is likely to contain a garbage value.

challenges/WhackJack/src/service.c:38

Local array playerInfoType players[8] leaves its elements uninitialized, then function add_player tests the first element of array player_name (before any initialization).

challenges/CGC_Image_Parser/src/fpai_image_data.c:244

In function cgc_fpai_read_nbits, local variable bits is and-ed without having been initialized: bits &= 0.

challenges/WhackJack/src/algorithms.c:144

Local variable hard is declared without initializer; a loop with a conditional branch optionally initializes it to 0, then it is tested. If the variable has not been set during the loop, the access is UB.

Fixing it (by initializing hard = 1) reveals another issue, later in the code.

challenges/WhackJack/src/player.c:40

An interesting issue in the following piece of code:

int i = 0;
while (playerList[i].player_name[0] != 0 && i < MAX_PLAYERS)
    ++i;
if (i == MAX_PLAYERS) {
    cgc_printf("Too many players\n");
    return -1;
}

The playerList array contains 8 (== MAX_PLAYERS) elements.

The order of the tests is inverted: i < MAX_PLAYERS is tested after accessing playerList[i], so when all players are filled, the loop will access playerList[8], causing an array out-of-bounds access and therefore UB.

This is visible in Frama-C thanks to the Red Alarms panel, since otherwise only a yellow alarm is emitted, due to the 7 first iterations being ok.

challenges/WordCompletion/src/main.c:199, function cgc_init

There is a non-conforming initialization of each byte in a memory page, after the cgc_gWords array is allocated. The array is allocated to its exact size (the number of words it will contain) and then each element is initialized to point to the corresponding word. However, after the loop, the array is iterated from its last position up to PAGE_SIZE/sizeof(char*), presumably to ensure the rest of the memory page contains nothing. However, since this memory region has never been explicitly allocated, such accesses lead to undefined behavior.

challenges/NarfRPN/src/rpncalc.c:679, function cgc_push

There is a signed overflow on a left shift ((0xff << 24)) which seems unintentional.

challenges/anagram_game/src/io.c:75, function cgc_write_int

In the loop, when bytes is 0, the code performs a right shift of a negative amount, which is UB.

challenges/ShoutCTF/src/service.c:478, function cgc_handle_view_challenge_detail

There are two goto fail in the code, one of which does not initialize variable err, which is accessed while printing an error message in the block following the label. In this code:

if (cgc_freaduntil(buf, sizeof(buf), '\n', cgc_stdin) < 0)
    goto fail;

It seems that the intention would have been to write instead:

if ((err = cgc_freaduntil(buf, sizeof(buf), '\n', cgc_stdin)) < 0)
    goto fail;

This requires moving the declaration of error_t err a few lines above its current location.

challenges/Estadio/src/handler.c:62, function HANDLER(rand)

(Unconfirmed: possibly a false positive due to instrumentation used to run Frama-C.)

Variable payload contains a field value pointing to buf near the end of the function. The call to cgc_free_frame(payload) deallocates the value field. The following call to cgc_deallocate thus performs a double free of payload.

challenges/hawaii_sets/src/service.c:162, function cgc_copy_element

This function allocates memory for a psetElement, which is pointer to a struct setElement, defined as:

struct setElement {
    char *value;
    int type;
};

Assuming a 32-bit architecture, struct setElement has sizeof 8, while psetElement, a pointer, has sizeof 4.

The allocated variable, copy, has its fields value and type set by the function. However, since only the 4 first bytes were allocated, instruction copy->type = element->type tries to write outside allocated memory and thus invokes undefined behavior.

The likely intended allocation should have been of a struct setElement instead of a psetElement, to ensure that both fields have their memory allocated.

challenges/Facilities_Access_Control_System/src/device.c:269, function cgc_GrantAccess

(This could be related to the intentional bug present in the code, but the description in README.md mentions a different function). The validation of the UserId variable in the code below misses value 128, which could lead to an out-of-bounds access, since cgc_Users has size 128:

// validate the userid
if (UserId > MAX_USERS) {
    return(0);
}
if (cgc_Users[UserId].Username[0] == '\0') {
    return(0);
}

challenges/Multi_Arena_Pursuit_Simulator/src/map.c:57, function cgc_newMap

There is a mismatch (related to a off-by-one error) between the memory allocated for maps and their usage. In the following code, both xWidth and yWidth equal 1. Therefore the allocation reserves space for a single int, but the loops iterate over 4 elements, triggering undefined behavior:

    if(!(map->data = cgc_malloc(xWidth * yWidth * sizeof(unsigned int))))
        cgc__terminate(ALLOCATE_ERROR);

    for(int c=0; c<=xWidth; c++) {
        for(int r=0; r<=yWidth; r++) {
            *(map->data + r*xWidth + c) = -1;
        }
    }

Allocating (xWidth+1) * (yWidth+1) * sizeof(unsigned int)) bytes fixes it.

challenges/Multicast_Chat_Server/src/subscription.c:205, function cgc_newChannel

The following part of the function allocates insufficient memory for the string (forgetting the terminating \0 character), meaning that the strcpy at the end overruns the buffer:

nameSize = cgc_strlen(name);
if(!(channel->name = cgc_malloc(nameSize))) {
    cgc_free(channel);
    return NULL;
}
cgc_memset(channel->name, 0, nameSize);
cgc_strcpy(channel->name, name);

For instance, the code is called with name pointing to "FLAG", so only 4 bytes are allocated, instead of 5.

challenges/Pattern_Finder/src/search_machine.c:143, function cgc_FindMatches

There is a missing branch in this function: a local variable Matches is declared and initialized to NULL, and then only referenced as the source argument of a call to cgc_memcpy (it is supposed to be copied into NewMatches, during growth reallocation). Since Matches is never assigned after it has been initialized, it still points to NULL during the call to memcpy, triggering undefined behavior. The intended code would likely check if Matches were still null (in which case it would simply assign it to NewMatches), otherwise do what is currently present in the code: copy the data in Matches to NewMatches, then free Matches, before updating it to the newly allocated zone.

challenges/Document_Rendering_Engine/src/service.c:1279, function cgc_initMacros

Function cgc_initMacros(Macro** macro_ptr) dereferences the macro_ptr pointer. It points to macros, a local variable in the caller which has not been initialized, thus triggering undefined behavior. Initializing macros to NULL should avoid the issue.

challenges/router_simulator/src/main.c:596, function cmd_query_route

The expression 1 << 31 overflows on machines with 32-bit integers. The same happens in function route_bit, and similar situations in other functions. In most cases, replacing 1 with 1U, to ensure the operations are performed on unsigned integers, avoids the problem.

challenges/router_simulator/src/main.c:138, function mask_to_length

This function relies on undefined behavior to work. For completeness, we present below the entire code of the function:

static int mask_to_length(unsigned int mask)
{
    int length;
    if (mask == 0)
        return 0;
    if (mask == 0xFFFFFFFF)
        return 32;
    for (length = 0; length <= 32; length++)
        if (0xFFFFFFFF << (32 - length) == mask)
            break;
    return length;
}

The undefined behavior happens inside the loop, as soon as the if is reached:

    if (0xFFFFFFFF << (32 - length) == mask)
        break;

By reading the code of the function, we assume the following intention: it is supposed to return the length, from left to right, of the number of bits set to 1 in mask. Valid masks consist exclusively of 1's followed by 0's. Therefore, there are 33 valid values (0x00000000, 0x80000000, 0xC0000000, etc.) and all other values are invalid, in which case the function returns 33.

However, for the code to work, it requires that, whenever a number is left-shifted, higher-order bits are discarded and lower-order bits are replaced with 0. For instance, given as input the mask 0xFFFFFFFE, the 32nd loop iteration will test 0xFFFFFFFF << (32 - 31), which is 0xFFFFFFFF << 1, which is, according to the intended logic, 0xFFFFFFE. This is equal to the provided mask, therefore we exit the loop and return 31.

Unfortunately, C99 does not provide such a portable shift operator with the desired behavior; the result is that, when compiling the code with a modern GCC, it is capable of statically inferring that 0xFFFFFFFF << (32 - length) results in undefined behavior for all values of length, and therefore decides to simply return true, making the function return 0 for every value of mask except 0xFFFFFFFF, which is handled before the loop.

Here, reverse engineering the function behavior (since it is undocumented) is relatively easy, but it cannot be reliably done via testing, since each compiler may decide on a different behavior: even exhaustive testing of the 232 possible values may lead to erroneous results, because the test oracle may vary depending on the compilation flags and on the compiler version.

Here is a modified version of the function which does not trigger undefined behavior:

int ub_free_mask_to_length(unsigned int mask) {
  int length;
  if (mask == 0) return 0;
  if (mask == 0xFFFFFFFF) return 32;
  for (length = 1; length < 32; length++)
    if (~((1U << (32 - length))-1) == mask)
      break;
  return length >= 32 ? 33 : length;
}

It is not so clear to read, but it only performs shifts smaller than 32, and the value never overflows. A subtraction followed by a bitwise negation produce the required mask pattern.

Similar patches are necessary in other parts of the code; for instance, functions route_mask, cmd_add_route and cmd_query_route also rely on the same style of left shifts that always overflow.

challenges/middleware_handshake/src/c_faith.c:90, function fK

Another case of a left shift which overflows, due to the fact that f0 has type unit8_t, and therefore in the expression f0 << 24 it is promoted to signed int and not unsigned int. Adding a cast to unsigned is sufficient to prevent the UB:

return ((unsigned)f0 << 24) | (f1 << 16) | (f2 << 8) | f3;

challenges/middleware_handshake/src/bn.c:122, function cgc_bn_from_bytes

Same situation as the previous one:

x |= bytes[i-1] << 24;

bytes has type char *, promoted to int, leading to possible overflow. Casting bytes[i-1] to unsigned once again prevents UB from happening.

Another very similar situation happens in bn.c:80, function cgc_bn_length:

if (d & (1 << (i - 1)))

Here, the first 1 forces type int for the shift expression, leading to UB when i == 32. Adding a U suffix to the constant prevents UB from happening.

Intentional undefined behavior

Besides the "accidental UB" cases above, there is at least one case of a deliberate UB, which is problematic for Frama-C.

challenges/TAINTEDLOVE/src/service.c:299

// Constify via float
float infinity = 1.0 / 0.0;
// NOTE: this should always constify to 0, but *technically* the behavior
// is undefined.
rx_buf[FLOAT_CONST_OFF] = (unsigned char)((float)rx_buf[FLOAT_CONST_OFF] * infinity);

Indeed, the multiplication of infinity with a value containing zero and non-zero results in either infinity or NaN; casting that to an integer value always results in undefined behavior.

challenges/Character_Statistics/src/charstats.c

Manipulates hardcoded addresses to memory zones which were not allocated, to more easily trigger intended segmentation faults. This prevents Frama-C from analyzing the program, since it assumes the strict C semantics from the standard.

Possible undefined behavior due to absence of checks

In some cases, undefined behavior seems possible in the code, due to absence of defensive programming. There is no guarantee that any given execution will reach it, but good programming practice should perform extra checks to avoid it. Such checks are also very helpful for analysis tools to rule out such possibilities.

challenges/Griswold (several files)

The comments of many functions, for instance in assemble.c, state that they may return NULL in case of elements not found. However, their callers never check for this possibility. One of the intended vulnerabilities in the code does mention dereferencing of null pointers, which might explain why no checks are present. However, it's not clear if all such checks are related to it, or if some of them were forgotten.

challenges/Pipelined/cb_1/src/service.c

cgc_get_function_by_position may return -1 if the service is not found, but the caller never checks for this possibility, using the index directly.

Undefined behavior in the "glue" code

There is also a case of a very suspicious code that appears in the "glue" code developed by Trail Of Bits, to port the challenges from the DECREE operating system to a Linux.

In include/ansi_x931_aes128.c, function cgc_gen_block, we have:

uint8_t i = BLOCK_SIZE - 1;
while (i >= 0) {
    ++prng->state.datetime[i];
    if (prng->state.datetime[i] != 0) break;
    i -= 1;
}

Because i is unsigned and always fits into an int, the while condition is equivalent to while (1), which is not necessarily the authors' intention. In particular, if the loop wraps around (supposing all prng values just got to zero), then i will become 255, and the access to state.datetime[255] will lead to UB.

Clone this wiki locally
You can’t perform that action at this time.