@system Variables

Field Value
DIP: 1035
Review Count: 2
Author: Dennis Korpel (dkorpel@gmail.com), Paul Backus
Implementation:
Status: Final Review

Abstract

The memory-safety of a program depends on the ability of the programmer and the language implementation to maintain the run-time invariants of the program's data.

The D compiler is aware of the run-time invariants of built-in types, like arrays and pointers, and can use compile-time checks to ensure they are maintained. These checks are not always sufficient for user-defined types. In order to reliably maintain invariants beyond those of which the compiler has hard-coded knowledge, D programmers must resort to manual verification of @safe code and defensive run-time checks.

This DIP proposes a new language feature, @system variables, to address this lack of expressiveness in D's memory-safety system. In @safe code, @system variables cannot be directly written to and cannot have their values altered in uncontrolled ways via casting, overlapping, void-initialization, etc. As such, they can be relied upon to store data subject to arbitrary run-time invariants.

Background

D's memory safety system distinguishes between safe values, which can be used freely in @safe code without causing undefined behavior, and unsafe values, which cannot. A type that has only safe values is a safe type; one that has both safe and unsafe values is an unsafe type. (For more detailed definitions of these and other related terms, refer to the Function Safety section of the D language spec.)

The D compiler has built-in knowledge of which types are safe and which are not. In broad terms, pointers, arrays, and other reference types are unsafe; integers, characters, and floating-point numbers are safe; and the safety of aggregate types is determined by the safety of their members.

A run-time invariant (or just "invariant") of a type is a rule that distinguishes between that type's safe and unsafe values. (N.B. "invariant" in this DIP does not refer to invariant blocks in contract programming.) The values that satisfy the invariant are safe; those that do not are unsafe. It follows that any type with a run-time invariant is unsafe, and that a safe type has no run-time invariants.

To ensure that their invariants are not violated, the use of unsafe types is restricted in @safe code:

  • They cannot be void-initialized.
  • They cannot be overlapped in a union.
  • A T[] cannot be cast to a U[] when U is an unsafe type.
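
As a rough illustration of these restrictions, the sketch below uses `__traits(compiles, ...)` to ask the compiler whether small `@safe` lambdas are accepted. The union name `IntOrPtr` is hypothetical and exists only for this example.

```d
// Hypothetical union used only to demonstrate the overlap rule.
union IntOrPtr
{
    int* p;
    size_t n;
}

// Pointers are an unsafe type: void-initialization is rejected in @safe code.
static assert(!__traits(compiles, () @safe { int* p = void; }));

// Integers are a safe type: void-initialization is accepted.
static assert( __traits(compiles, () @safe { int i = void; }));

// Accessing a pointer that overlaps another field is rejected in @safe code.
static assert(!__traits(compiles, () @safe { IntOrPtr u; int* q = u.p; }));

// Casting an array of integers to an array of pointers is rejected in @safe code.
static assert(!__traits(compiles, () @safe { size_t[] a; auto b = cast(int*[]) a; }));

void main() {}
```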

Rationale

Though the system described above works well for built-in types and their invariants, it does not provide any way for the programmer to indicate that a user-defined type has additional invariants of which the compiler may not be aware. As a result, maintaining such invariants requires extra effort from the programmer. For unsafe types, the programmer may be required to manually verify that those invariants are maintained in @safe code. For safe types, the programmer may additionally be required to insert defensive run-time checks to ensure that those invariants are maintained.

Example: User-Defined Slice

module intslice;

struct IntSlice
{
    private int* ptr;
    private size_t length;

    @safe
    this(int[] src)
    {
        ptr = &src[0];
        length = src.length;
    }

    @trusted
    ref int opIndex(size_t i)
    {
        if (i >= length) assert(0);
        return ptr[i];
    }
}

Invariant: The value of length must be equal to the length of the array pointed to by ptr.

First, observe that this code is memory-safe as-written (modulo bugs in the compiler). There are only two functions that directly access ptr and length, and both correctly maintain the invariant.

However, in order to prove that this code is memory-safe, it is not sufficient for the programmer to verify the correctness of its @trusted functions. Instead, every function that touches ptr and length must be checked manually.

If ptr and length were @system variables, then all code that directly accessed them would have to be @trusted, and the programmer would not need to manually verify any @safe code in order to prove that IntSlice's invariant is maintained.
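
A minimal sketch of how IntSlice might look with @system fields follows. It assumes the semantics proposed in this DIP; on a compiler that does not implement them, the @system annotations on the fields have no effect. The constructor is marked @trusted to stay conservative, although the DIP also permits initialization of @system fields in @safe constructors. It uses `src.ptr` rather than `&src[0]` so that an empty array is accepted.

```d
struct IntSlice
{
    @system private int* ptr;
    @system private size_t length;

    // Establishes the invariant: `length` equals the length of the array behind `ptr`.
    this(int[] src) @trusted
    {
        ptr = src.ptr;
        length = src.length;
    }

    // Relies on the invariant: the manual bounds check makes `ptr[i]` valid.
    ref int opIndex(size_t i) @trusted
    {
        if (i >= length) assert(0);
        return ptr[i];
    }
}

void main() @safe
{
    auto s = IntSlice([10, 20, 30]);
    s[1] = 25;
    assert(s[1] == 25);
}
```

With this layout, only the two @trusted functions need manual review; any other function in the module that tried to touch ptr or length directly would have to be @trusted or @system as well.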

The same general pattern occurs with other user-defined types whose invariants involve the relationship between two or more variables, such as tagged unions and reference-counted smart pointers.

Example: Short String

module shortstring;

struct ShortString
{
    private ubyte length;
    private char[15] data;

    @safe
    this(const(char)[] src)
    {
        assert(src.length <= data.length);

        length = cast(ubyte) src.length;
        data[0 .. src.length] = src[];
    }

    @trusted
    const(char)[] opIndex() const
    {
        // should be ok to skip the bounds check here
        return data.ptr[0 .. length];
    }
}

Invariant: length <= 15

Once again, there is a constructor that establishes an invariant, and a member function that relies on the invariant to do its work. Unlike in the previous example, however, this code is not memory-safe as-written, though it may appear to be at first glance.

To understand why, consider the following program, which uses ShortString to cause undefined behavior in @safe code:

@safe
void main()
{
    import shortstring;
    import std.stdio;

    ShortString oops = void;
    writeln(oops[]);
}

void-initializing a ShortString will very likely produce an instance that violates its invariant. Because opIndex relies on that invariant to skip the bounds check, this results in an out-of-bounds memory access rather than a safe, predictable crash.

Why does the compiler allow a ShortString to be void-initialized in @safe code? Because, according to the rules in the language spec, a struct containing only ubyte and char data is a safe type, and therefore must not have any invariants. It follows that @safe code is free to initialize a ShortString to any value, including an unspecified one, without risking memory corruption.

In order to make this code memory-safe, the programmer must include an additional bounds check in opIndex:

@safe
const(char)[] opIndex() const
{
    return data[0 .. length];
}

This solution is unsatisfying: the program must do redundant work at run-time to compensate for the language's lack of expressiveness, or give up on the guarantees of @safe. If ShortString.length could be marked as @system, this dilemma would not exist.
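
The following sketch shows what marking ShortString.length as @system could look like, assuming the rules proposed in this DIP. The constructor is @trusted and uses an explicit check instead of `assert` so that the invariant is established even when assertions are compiled out; that choice is an assumption of this sketch, not part of the original example.

```d
struct ShortString
{
    @system private ubyte length; // invariant: length <= data.length
    private char[15] data;

    this(const(char)[] src) @trusted
    {
        // Establish the invariant unconditionally.
        if (src.length > data.length) assert(0, "source too long");
        length = cast(ubyte) src.length;
        data[0 .. src.length] = src[];
    }

    const(char)[] opIndex() const @trusted
    {
        // No bounds check: relies on the invariant protected by @system.
        return data.ptr[0 .. length];
    }
}

void main() @safe
{
    auto s = ShortString("hello");
    assert(s[] == "hello");
    // Under this proposal, the following would be rejected in @safe code,
    // because ShortString becomes an unsafe type:
    // ShortString oops = void;
}
```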

The same general pattern occurs with other user-defined types that attempt to impose invariants on types the compiler considers "safe", such as enum types used in final switch statements and integer "handles" used as array indices by external libraries.

Example: int as pointer

Sometimes it is desirable to have scope semantics enforced on handles used by external libraries [1][2]. Examples of such types are:

These are represented as simple int or uint types, but act like pointers since they refer to a resource that can be allocated or freed. However, the scope keyword is ignored when the type has no pointers. Because an int is a safe type, any int value can be created from @safe code, so any memory corruption that could follow from escaping a scope int could also result from creating the same int value without accessing the variable. Even when the int is wrapped in a struct, no scope checking is done:

struct File
{
    private int fd;
}

File gFile;

@safe
void escape(scope File f)
{
    gFile = f; // allowed
}

A workaround could be to put the handle in a union with a pointer, but that would unnecessarily increase the size of the struct to size_t.sizeof. It is desirable to express that int fd; should be treated like a pointer when it comes to restrictions in @safe code.
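
The size cost of that workaround can be seen in a short sketch. The struct names `PlainFile` and `PaddedFile` and the member `unused` are hypothetical and chosen only for illustration.

```d
struct PlainFile
{
    private int fd;
}

struct PaddedFile
{
    private union
    {
        int fd;
        void* unused; // present only so the type contains a pointer
    }
}

// The plain wrapper is as small as the handle itself.
static assert(PlainFile.sizeof == int.sizeof);

// The union grows the struct to pointer size (8 bytes on a 64-bit target).
static assert(PaddedFile.sizeof == size_t.sizeof);

void main() {}
```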

Initial value of global variables

Allowing fields of an aggregate to be marked @system helps the compiler maintain run-time invariants on user-defined types, but it is also important to ensure the variable was not constructed with an unsafe value to begin with. Constructing unsafe values in a @safe function is not allowed, and constructing them in @system or @trusted functions leaves the responsibility of memory safety up to the programmer. When accessing global variables of an unsafe type in a @safe function, the compiler should be either conservative and reject any access, or do basic taint-checking:

int* x = cast(int*) 0xDEADBEEF;
extern int* y;
int* z = new int(20);

void main() @safe
{
    *x = 10; // Not allowed
    *y = 10; // Not allowed
    *z = 10; // Maybe allowed
}

Since the initialization expression cast(int*) 0xDEADBEEF would not be allowed in a @safe function, and since the initial value of y is unknown, the compiler should annotate variables x and y as possibly containing an unsafe value, so they cannot be accessed in a @safe function. Only z is known to have a safe initial value in this case, so the compiler could allow access to it in @safe code.

Allowing @trusted and @safe to be applied to variables is useful when the programmer wants to relax the constraints, and applying @system is useful to tighten the constraints:

@trusted int* x = cast(int*) 0xD000; // Assumed to be a good address
@safe    extern int* y0; // Assumed to always have safe value
@system  extern int* y1; // May have unsafe value
@system int* z = new int(20); // Starts out safe, but may be set to unsafe value in @trusted code

enum Opt {a, b, c}
@system  Opt opt = Opt.a; // @trusted code relies on this being in range and not e.g. `cast(Opt) 100`

Prior work

The need for encapsulation of data / restricted access to data in order to achieve memory safety has been mentioned in several discussions:

Other languages

Many other languages either do not allow systems programming at all (e.g. Java, Python) or do not support language-enforced memory safety (e.g. C/C++).

A notable exception is Rust, where the equivalent of this DIP has been proposed multiple times: Unsafe fields #381

Some excerpts from the discussion there are:

OTOH, privacy is primarily intended for abstraction (preventing users from depending on incidental details), not for protection (ensuring that invariants always hold). The fact that it can be used for protection is basically an happy accident. To clarify the difference, C strings have no abstraction whatever - they are a raw pointer to memory. However, they do have an invariant - they must point to a valid NUL-terminated string. Every place that constructs such a string must ensure it is valid, and every place that consumes it can rely on it. OTOH, a safe, say, buffered reader needs abstraction but doesn't need protection - it does not hold any critical invariant, but may want to change its internal representation.

source

This doesn't seem very useful to me. Within a module I would expect the authors to know what they're doing, and the unit-tests to save them when they do not. For other users, you could simply introduce getters and setters, and functions/methods can already be marked unsafe.

source

Ultimately the proposal has not yet been accepted. The idea of using private instead of @system variables for D is discussed in the alternatives section. More information about Rust's stand on unsafe functions can be found here:

Description

Existing rules for @system

Before describing the proposed changes, here is an overview of the existing rules governing which declarations can have the @system attribute.

@system int w = 2; // compiles, does nothing
@system enum int x = 3; // compiles, does nothing
enum E
{
    @system x, // error: @system is not a valid attribute for enum members
    y,
}
@system alias A = E; // compiles, does nothing
@system template T() {} // compiles, does nothing

void func(@system int x) // error: @system attribute for function parameter is not supported
{
    @system int x; // compiles, does nothing
}
template Temp(@system int x) {} // error: basic type expected, not @

Any function attribute can be attached to a variable declaration, but they cannot be retrieved:

@system @nogc pure nothrow int x;
pragma(msg, __traits(getFunctionAttributes, x)); // Error: first argument is not a function
pragma(msg, __traits(getAttributes, x)); // tuple()

Proposed changes

(0) Accessing variables or fields marked @system is not allowed in @safe code

This includes both writing to and reading the variable. Though reading a @system variable could be allowed when it has a safe type, it was decided to restrict this as well to make the rules less complex.

Examples:

@system int x;

struct S
{
    @system int y;
}

S s;

@safe
void main()
{
    x += 10; // error: cannot modify @system variable 'x'
    s.y += 10; // error: cannot modify @system field 'y'
    int y = x; // error: cannot read @system variable 'x'
    @system int z;
    z += 1; // error: cannot modify @system variable 'z'
}

// inferred as a @system function
auto foo()
{
    x = 0;
}

Further operations disallowed in @safe code on a @system variable or field are:

  • creating a mutable pointer to it by using &
  • passing it as an argument to a function parameter marked ref without const
  • returning it by ref without const

When using an alias to a @system variable, that alias has the same restrictions as the symbol to which it aliases.

@system int x = 3;
alias xAlias = x;

void increment(ref int x) @safe
{
    x++;
}

void checkX(const(int)* x) @safe
{
    assert(*x < 10);
}

@safe
void main()
{
    xAlias += 1; // error, cannot modify `@system` variable `x`
    increment(xAlias); // error, cannot take mutable reference of `@system` variable `x`
    checkX(&x); // error, even though pointer is const and `typeof(x)` is a safe type.
}
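
One way to work with a @system variable from @safe code, assuming the rules above, is to route every access through small @trusted functions that maintain the invariant. The names `openHandles`, `acquire`, `release`, and `handleCount` below are hypothetical, as is the invariant that the counter always equals the number of outstanding acquisitions.

```d
// Assumed invariant: equals the number of acquire() calls not yet released.
@system int openHandles;

void acquire() @trusted
{
    openHandles += 1;
}

void release() @trusted
{
    // Refuse to break the invariant instead of letting the count go negative.
    if (openHandles == 0) assert(0, "release without matching acquire");
    openHandles -= 1;
}

int handleCount() @trusted
{
    return openHandles;
}

void main() @safe
{
    acquire();
    acquire();
    release();
    assert(handleCount() == 1);
}
```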

Initialization of a @system variable or field is allowed in @safe code. This includes static initialization, the automatically generated constructor, user-defined constructors, and the .init value of a type.

@system int x;

shared static this() @safe
{
    x = 3; // allowed, this is initialization
    x = 3; // second time disallowed, this is assignment to a `@system` variable
}

struct T
{
    @system int y;
    @system int z = 3; // allowed
    this(int y, int z) @safe
    {
        this.y = y; // allowed, this is initialization
        this.y = y; // second time disallowed, this is assignment to a `@system` variable
        this.z = z; // disallowed, this is assignment
    }
}

struct S
{
    @system int y = 2;
}

void main() @safe
{
    S s0 = {y: 3}; // static initialization
    S s1 = S(3); // automatically generated constructor
    S s2 = S.init; // .init value
    S s3; // same as above
    s3 = s2; // disallowed
}

Note that while it may be desirable to require a @trusted annotation near initialization of @system variables, realizing this is problematic since there is no syntax for @trusted assignment. @trusted as a function annotation has its limitations:

  • it does not work for global or local variables since a @trusted lambda there would move the declaration to that function's scope.
  • it not only trusts initialization of the variable on the left-hand side of the =, but also the initialization expression on right-hand side. Using a @trusted function to return a variable by ref and assigning it does not count as initialization of that variable.
  • it disables the scope/return scope checks of -dip1000

struct S
{
    this(ref scope S s) @system
    {
        *(cast(int*) 0xDEADBEEF) = 0;
    }
}

struct Wrapper(T)
{
    @system T t;
    this(T t) @trusted
    {
        this.t = t; // Oops! Calls a `@system` copy constructor
    }
}

void main() @safe
{
    auto w = Wrapper!S(S.init); // program killed by signal 11

    () @trusted {@system int x = 3;}();
    // x is not in scope anymore
}

@system int x = (() @trusted => 3)(); // this still does not mark the assignment `@trusted`
//() @trusted {@system int x = 3;}(); // does not work

(1) An aggregate with at least one @system field is an unsafe type

Such an aggregate receives the same restrictions as pointer types in @safe code, making implicit writes to @system variables (e.g., via array casting) impossible. The scope keyword is not stripped away, even when the aggregate has no members that contain pointers.

struct Handle
{
    @system int handle;
}

void main() @safe
{
    Handle h = void; // error
    union U
    {
        Handle h;
        int i;
    }
    U u;
    u.i = 3; // error

    ubyte[Handle.sizeof] storage;
    auto array = cast(Handle[]) storage[]; // error

    scope Handle h0;
    static Handle h1 = h0; // disallowed
}

(2) Variables and fields without annotation are @safe unless their initial value is not @safe

The rules regarding variables and fields are as follows:

  • An initialization expression x is @system when the function (() => x) is inferred as @system.
  • When marked @system, the result is always @system regardless of the type.
  • When marked @trusted, the initialization expression x is treated as (() @trusted => x).
  • When marked @safe, the initialization expression must be @safe.
  • In the absence of an annotation, the result is @system only if the type is unsafe and the initialization expression is @system.

int* getPtr() @system {return cast(int*) 0x8035FDF0;}
int  getVal() @system {return -1;}

extern int* x0;                   // @safe by default
int* x1 = x0;                     // @safe, (() => x0) is @safe
int* x2 = cast(int*) 0x8035FDF0;  // @system, (() => cast(int*) 0x8035FDF0) is @system
int* x3 = getPtr();               // @system, (() => getPtr()) is @system
int  x4 = getVal();               // @safe, int is not an unsafe type
@system int x5 = 1;               // @system as requested
@trusted int* x6 = getPtr();      // @safe, the getPtr call gets trusted
@safe int* x7 = getPtr();         // error: cannot initialize @safe variable with @system initializer

struct S {
    // same rules for fields:
    int* x9 = x3; // @system
    int  x8 = x5; // @safe
}
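
The first rule above, which classifies an initializer x by the safety inferred for (() => x), can be approximated on current compilers by asking whether the same expression is accepted inside an explicitly @safe lambda. This is only a sketch of the idea, not the compiler's actual inference; `getZero` is a hypothetical helper added for contrast.

```d
int* getPtr() @system { return cast(int*) 0x8035FDF0; }
int  getZero() @safe  { return 0; } // hypothetical @safe counterpart

// A call to a @system function makes the initializer @system.
static assert(!__traits(compiles, () @safe => getPtr()));

// An integer-to-pointer cast makes the initializer @system.
static assert(!__traits(compiles, () @safe => cast(int*) 0x8035FDF0));

// A call to a @safe function and a plain literal are @safe initializers.
static assert( __traits(compiles, () @safe => getZero()));
static assert( __traits(compiles, () @safe => 1 + 1));

void main() {}
```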

An exception to the above rules is made on unsafe types when the compiler knows the resulting value is safe.

int* getNull() pure @system {return null;}
int* n = getNull(); // despite unsafe type with @system initialization expression, inferred as @safe

Annotations with a scope (@system {}) or colon (@system:) affect variables just like they do functions.

@system
{
    int y0; // @system
}

@system:
int y1; // @system

Grammar changes

Placing @system annotations is already allowed in the places where it's needed for this DIP, so there is no grammar change.

Alternatives

Using private

It has been suggested before that bypassing private using e.g. .tupleof or __traits(getMember) should not be allowed in @safe code. While the need for a way to ensure struct invariants in @safe code is in line with this DIP, this DIP argues against using private for that purpose.

First of all, disallowing bypassing private in @safe code is not sufficient for ensuring run-time invariants on user-defined types. When an aggregate has no members with an unsafe type, the private fields can still be indirectly written to via overlap in a union, void-initialization, or array casting.
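
A short sketch of this first point: the struct below keeps its only field private, yet @safe code can still produce instances that violate the intended invariant without ever naming the field. The names `Percent` and `Overlap` and the invariant `value <= 100` are hypothetical.

```d
struct Percent
{
    private ubyte value; // intended invariant: value <= 100

    ubyte get() const @safe { return value; }
}

union Overlap
{
    Percent p;
    ubyte raw;
}

void main() @safe
{
    // Array casting: allowed because Percent contains no pointers.
    ubyte[] bytes = [200];
    auto ps = cast(Percent[]) bytes;
    assert(ps[0].get == 200); // invariant violated

    // Union overlap: allowed for the same reason.
    Overlap u;
    u.raw = 250;
    assert(u.p.get == 250); // invariant violated
}
```

Under this DIP, marking value as @system would make Percent an unsafe type, so both operations would be rejected in @safe code.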

Second, private only acts on the module level, so a @trusted member function cannot assume that a struct's invariants are upheld unless all other @safe code in the module has been manually certified not to violate them. This undermines the ability of the programmer to easily distinguish code requiring manual verification from code that can be checked automatically, especially since certain member functions like constructors, destructors, and operator overloads must be defined in the same module as the data on which they operate.

Finally, disallowing bypassing visibility with __traits(getMember, ...) or .tupleof would break @safe code that relied on this, and issue 15371 explicitly requested this behavior.

Using invariant blocks to specify unsafe types

Some have suggested that a struct can be made into an unsafe type by adding an invariant block.

struct Handle
{
    invariant
    {
        // no run-time checks, just marking `Handle` as an unsafe type
    }
    private int fd;
}

However, Contract Programming is currently a separate feature from Memory Safety, and an empty invariant {} block looks like something that can be safely removed. Suddenly introducing @safe restrictions and scope semantics to types with invariant blocks can be undesirable. On top of that, it still does not protect from modifications outside of @trusted code.

Breaking Changes and Deprecations

Attaching the @system attribute to variables is already permitted, but doing so adds no compiler checks. The additional checks for @system variables in this proposal can cause existing @safe code to break (note that @system code is completely unaffected by everything in this DIP). However, since @system on variables does not currently do anything, the author suspects that users generally do not add this attribute to any variables at all, let alone variables that are meant to be used in @safe code. The biggest risk here is that variables accidentally fall inside a @system {} block or under a @system: section.

@system:

int x; // suddenly not writable in @safe code anymore
void unsafeFuncA() {};
void unsafeFuncB() {};

void main() @safe
{
    x++; // not allowed anymore
}

Misconstructed pointers can be inferred @system under the new rules.

struct S
{
    int* a = cast(int*) 0x8035FDF0;
}

void main() @safe
{
    S s;
    *s.a = 0; // this gives an error now
}

Whenever this happens, there is a risk of memory corruption, so a compiler error is appropriate.

Still, a two-year deprecation period is proposed where instead of raising an error, a deprecation message is given whenever the new memory safety rules are broken. A preview flag -preview=systemVariables can additionally be added that immediately raises errors for violations while leaving other deprecation messages as warnings. At the end of the preview period, there will also be a flag to revert it, -revert=systemVariables, so that users can choose to keep the old behavior for a little longer.

Reference

Copyright & License

Copyright (c) 2020-2022 by the D Language Foundation

Licensed under Creative Commons Zero 1.0

Reviews

Community Review Round 1

Reviewed Version

Discussion

Feedback

In the Feedback Thread, most of the feedback was related to details such as terminology, whether to use assert(x) in the examples, etc.

The one structural piece of criticism was that making initialization of @system variables safe is unsound, to wit, "Memory safety cannot depend on the correctness of a @safe constructor." The DIP author replied that this boils down to "@trusted assumptions about @safe code", on which there is no consensus, and he has yet to determine a satisfactory design.

Of note, a detailed list of feedback was misplaced in the Discussion Thread. In short, the reviewer asserted that this proposal is essentially a response to bugs in the implementation of @safe, and those bugs should be fixed rather than a new feature added to the language. Subsequent discussion appears to have led to consensus among the participants that the DIP is necessary.

Community Review Round 2

Reviewed Version

Discussion

Feedback

Only two items of actionable feedback were provided in the Feedback Thread:

  • The Rationale and Description appear to have a conflict: the Rationale states that module-level extern variables should be disallowed in @safe code, but a later example declares such "@safe by default". A DIP author responded that there is no conflict; the Rationale describes existing behavior, the example describes the behavior following implementation of this DIP.
  • A specific quote from the DIP ("...when the compiler knows the resulting value is safe") raises the question of what the compiler considers to be safe; the DIP should elaborate on this. A DIP author replied that the language spec already provides that information.