Skip to content

New pointer and array types

Katherine Kjeer edited this page May 14, 2021 · 3 revisions

Summary

The Checked C extension adds new array and pointer types. The new types include:

  • Single and multi-dimensional arrays with bounds checking. These are specified by placing the keyword _Checked before the dimension of the array. For example, int x _Checked[10] declares a 10-element integer array with bounds checking.
  • The _Ptr type for pointers to single objects. Pointer arithmetic is not allowed on these types. For example, _Ptr<int> p declares a pointer to a single object.
  • The _Array_ptr type for pointers to elements of arrays. Pointer arithmetic is allowed on these types. When _Array_ptr pointers are used to access memory, they are bounds checked and checked for non-nullness. Programmers must declare the bounds of _Array_ptr variables to be able to use them to access memory. The compiler constructs bounds for _Array_ptr typed expressions based on the bounds for variables.
  • The _Nt_array_ptr type for pointers to elements of null-terminated arrays. These are similar to _Array_ptr types, with two differences. The element just beyond the declared bounds can be read or have a zero written to it. If the element is non-null, the bounds for the pointer can be widened by 1 element.

Checked single-dimensional arrays

A programmer can declare a checked array by prefixing the dimension of the array with the keyword _Checked. Here is an example:

double a _Checked[10]

This declares an array with 10 elements. Accesses to this array will be bounds checked at runtime. The index to the array must be be >= 0 and < 10.
If a runtime bounds check fails, a runtime error is signalled. This typically causes the program to exit immediately:

a[0] = e;   // works fine.
a[-1] = e;  // program exits

Existing C array types do not check out-of-bounds array accesses, so they are called unchecked array types. In contrast to Checked C, they have undefined behavior when an out-of-bounds access occurs (typically they read or write the memory location that would correspond to the out-of-bounds array element, unless the memory is not mapped by the operating system).

Checked multi-dimensional arrays

Multi-dimensional arrays in C are simply arrays of arrays. The _Checked prefix can be placed before the first array dimension size. It propagates to the nested arrays in the declaration.

// checked 10 dimensional array of checked 5 element arrays
double m _Checked[10][5];
m[0][0] = e;      // works fine.
int x = m[-1][0]; // program exits

The prefix can be specified explicitly if desired:

double m _Checked[10]_Checked[5];  // equivalent to original declaration.

The prefix does not propagate into typedef'ed types. It only propagates to arrays declared as part of the declaration.

The runtime bounds checking checks that multi-dimensional array accesses are within the entire bounds of the array. It does not check that accesses for an individual dimension are within the range for that dimension. This allows runtime bounds checking to be efficient for multi-dimensional arrays, while preserving type safety. (in our experience, the cost of checking individual dimensions can cause performance problems in Java and C# programs.)

m[-1][10];   // runtime check does not fail because this accesses m[0][0]

A multi-dimensional array is either checked or unchecked. If an array has an element type that is also an array, the array and the element array both must be checked or unchecked. If a typedef'ed array type is used an element of another array, the checkedness must match.

typedef double vec[10];
vec a _Checked[10];   // incorrect - mismatch in checked property.

Null-terminated checked single-dimensional arrays

Strings in C are arrays of characters that are null-terminated. A programmer can declare a checked null-terminated array type by prefixing the dimension with _Nt_checked:

char s _Nt_checked[6] = "hello";

The array element type of an _Nt_checked array must be an integer type, an enumeration type, or a pointer type. The null terminator is defined to be 0:

int a _Nt_checked[5];    // OK
double b _Nt_checked[5]; // compile-time error
struct S {
    int x;
    int y;
};
S _Nt_checked[5];        // compile-time error

The bounds checking for null-terminated array accesses is slightly different than for regular arrays. All elements of the array can be read. For all elements of the array except the last element, any value can be written. For the element, only a null value can be written:

char c;
c = s[0];   // OK
c = s[5];   // OK.
s[0] = 'c'; // OK
s[5] = 'c'; // Program exits. Cannot ovewrite last element with non-zero value.

By themselves, null-terminated array types are not terribly useful. The sizes of the arrays are known, after all. Their usefulness will become apparent when we discuss pointers to null-terminated arrays.

Pointer types without pointer arithmetic

There are many uses of pointers in C programs where pointers are just that: pointers. The pointer may point to a single object or be null. Checked C provides a new pointer type _Ptr that declares a pointer to a singleton object. The same syntax as C++ templates is used to declare these pointers:

int x = 0;
_Ptr<int> x = &x;

struct binary_tree_node {
  int val;
  _Ptr<struct binary_tree_node> left, right;
};

Pointer arithmetic is not allowed on these pointers:

 x = x + 1;  // compile-time error: pointer arithmetic not allowed on _Ptr

Measurements show that 30-60% of pointers declared in C programs could be declared as _Ptr types.

A runtime null check is done before accessing memory with a _Ptr type. If the check fails, a runtime error is signalled. This typically causes the program to exit immediately.

_Ptr<int> p = 0;
*p = 0;      // program exits
int y = *p;  // program exits

Pointer types with pointer arithmetic

In C, most uses of array variables are converted implicitly to pointers (there are a few cases, such as the & and sizeof operators, where they aren't). It makes sense to allow pointer arithmetic on such pointers. Checked C provides a new pointer type _Array_ptr that declares a pointer to an array element. Pointer arithmetic is allowed on these pointers.

int a[5];
_Array_ptr<int> p = a;
_Array_ptr<int> t = p + 1;

A programmer must declare the bounds of an _Array_ptr variable to be able to access memory using the pointer or a pointer derived from the pointer. We'll explain bounds declarations in detail elsewhere. We introduce a simple bounds declaration here so that we can explain how _Array_ptrs work. The count bounds expression declares the number of array elements that a pointer points to:

int a[5] = { 0, 1, 2, 3, 4};
_Array_ptr<int> p : count(5) = a;  // p points to 5 elements.

The bounds are part of the declaration of a variable and are placed after the declarator using a :. A count of 5 means that it is valid to access memory at or above p and below p + 5 (the lower bound is included and the upper bound is excluded).

p     // valid to access
p + 4 // valid to access
p + 5 // not valid to access

When the indirection operator (*) is used to access memory and the pointer value is an _Array_ptr, the memory access is bounds checked. The compiler constructs expressions that represent at runtime the upper and lower bounds of memory that can be accessed using the ponter. Those bounds are use to check that the memory access is in range.

If the operand of the * operator is a variable, the compiler uses the declared bounds for the variable:

int y = *p;  // bounds of p are (p, p + 5), so this access is  OK.
*p = y + 1;  // same here.

If the operand is a pointer arithmetic expression, the compiler uses the bounds of the pointer-typed subexpression. Pointer arithmetic does not change the range of memory that is valid to access.

For example, given the operand p + i, it has the same bounds as p, these bounds will be used to bounds check the memory access:

int sum = 0;
for (int = 0; i < 5; i++) {
  int v = *(p + i);  // check that p + i is within (p, p + 5)
  sum += v;
}

If there was on off-by-one error in the for-loop, the program would terminate when i == 5.

for (int = 0; i < 6; i++) {
  int v = *(p + i);  // check that p + i is within (p, p + 5)
  ...
}

Checking array accesses

In C, arrays are converted implicitly to pointers at most uses. When a _Checked array is converted to a pointer, it is converted to an _Array_ptr. The computed bounds for the converted value are the bounds of the _Checked array:

int a _Checked[5];
 ... a ...  // bounds are (a, a + 5)

C defines e1[e2] to be the same as *(e1 + e2). Given:

int a _Checked[5];
a[i] = ...

this is the same as:

int a _Checked[5];
*(a + i) = ...

Because * is a memory access using a checked pointer, it will be bounds checked. The calculated bounds of a + i are (a, a + 5)`, leading to:

int a _Checked[5];
*(a + i) = ...  // check that a + i is within (a, a + 5).

Pointers to null-terminated arrays

C represents strings using null-terminated arrays of characters. A null-terminated array has a null element stored in its last elements. For strings, this is a null character. For strings, this avoids the need for a length field. C uses "pointer to character" types as string types, w with the convention that these pointers point to elements of null-terminated arrays.

This is a problem because "pointer to character" types can also point to elements of non-null-terminated arrays. A confusion about which convention applies to a specific pointer can result in a buffer overrun. A null terminator can be missing, if a pointer to a non-null terminated array is passed to a function expecting a null-terminated array. A null-terminator can be overwrriten if the reverse happens.

Strings are fundamental enough to C that the Checked C extension introduces a special pointer type for pointers to null-terminated arrays called _Nt_array_ptr. This way, instead of relying on programming conventions, the information is represented directly in the program and can be checked at compile time. A use of an _Nt_checked array converts implicitly to an _Nt_array_ptr:

char s _Nt_checked[] = "hello";
s[0] = 'c';  // s converts to a _Nt_checked array.

If a programmer does not declare bounds for an Nt_array_ptr variable, it has a bounds of count(0). The bounds checking rules for _Nt_array_ptr differ from the rules for _Array_ptr. Given a bounds of (lower, upper),

  • The memory locations that are >= lower and < upper can be read or written just like a regular _Array_ptr memory locations.
  • The memory location at upper can be read and the null value can be written to it.

When a use of an_Nt_checked array is converted implicitly to an _Nt_array_ptr, the count is one less than the number of elements in the array. The null element is excluded.

Here are some examples:

char s _Nt_checked[1] = "";
char t _Nt_checked[6] = "hello";
_Nt_array_ptr<char> m : count(0) = s; 
_Nt_array_ptr<char> n = t;   // defaults to count(0)
_Nt_array_ptr<char> p : count(5) = t;
if (*m) { }  // reading element at upper bound allowed
*p = 'c';    // OK.
*m = 'a';    // error - writing non-null value at upper bound.
p[5] = 'a';  // error - writing non-null value at upper bound.

_Nt_array_ptr pointers have a special property that _Array_ptr pointers do not: their bounds can be widened. If control flow establishes that the value at the upper bound is non-null, the bounds can be widened by 1.

_Nt_array_ptr<char> get_id_at(_Nt_array_ptr<char> str : count(i), unsigned i) : count(3) {
  if (*(str + i) && *(str + i + 1) && *(str + i + 2)) {
    return str + i;
  }
  return NULL;
}