Skip to content
Permalink
master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Go to file
 
 
Cannot retrieve contributors at this time

@system Variables

Field Value
DIP: 1035
Review Count: 3
Author: Dennis Korpel (dkorpel@gmail.com)
Paul Backus
Implementation:
Status: Accepted

Abstract

The memory-safety of a program depends on the ability of the programmer and the language implementation to maintain the run-time invariants of the program's data.

The D compiler is aware of the run-time invariants of built-in types, like arrays and pointers, and can use compile-time checks to ensure they are maintained. These checks are not always sufficient for user-defined types. In order to reliably maintain invariants beyond those of which the compiler has hard-coded knowledge, D programmers must resort to manual verification of @safe code and defensive run-time checks.

This DIP proposes a new language feature, @system variables, to address this lack of expressiveness in D's memory-safety system. In @safe code, @system variables cannot be directly written to and cannot have their values altered in uncontrolled ways via casting, overlapping, void-initialization, etc. As such, they can be relied upon to store data subject to arbitrary run-time invariants.

Contents

Background

D's memory safety system distinguishes between safe values, which can be used freely in @safe code without causing undefined behavior, and unsafe values, which cannot. A type that has only safe values is a safe type; one that has both safe and unsafe values is an unsafe type. (For more detailed definitions of these and other related terms, refer to the Function Safety section of the D language spec.)

The D compiler has built-in knowledge of which types are safe and which are not. In broad terms, pointers, arrays, and other reference types are unsafe; integers, characters, and floating-point numbers are safe; and the safety of aggregate types is determined by the safety of their members.

A run-time invariant (or just "invariant") of a type is a rule that distinguishes between that type's safe and unsafe values. (N.B. "invariant" in this DIP is not referring to invariant blocks in contract programming) The values that satisfy the invariant are safe; those that do not are unsafe. It follows that any type with a run-time invariant is unsafe, and that a safe type has no run-time invariants.

To ensure that their invariants are not violated, the use of unsafe types is restricted in @safe code:

  • They cannot be void-initialized.
  • They cannot be overlapped in a union.
  • A T[] cannot be cast to a U[] when U is an unsafe type.

Rationale

Though the system described above works well for built-in types and their invariants, it does not provide any way for the programmer to indicate that a user-defined type has additional invariants of which the compiler may not be aware. As a result, maintaining such invariants requires extra effort from the programmer. For unsafe types, the programmer may be required to manually verify that those invariants are maintained in @safe code. For safe types, the programmer may additionally be required to insert defensive run-time checks to ensure that those invariants are maintained.

Example: User-Defined Slice

module intslice;

struct IntSlice
{
    private int* ptr;
    private size_t length;

    @safe
    this(int[] src)
    {
        ptr = &src[0];
        length = src.length;
    }

    @trusted
    ref int opIndex(size_t i)
    {
        if (i >= length) assert(0);
        return ptr[i];
    }
}

Invariant: The value of length must be equal to the length of the array pointed to by ptr.

First, observe that this code is memory-safe as-written (modulo bugs in the compiler). There are only two functions that directly access ptr and length, and both correctly maintain the invariant.

However, in order to prove that this code is memory-safe, it is not sufficient for the programmer to verify the correctness of its @trusted functions. Instead, every function that touches ptr and length must be checked manually.

If ptr and length were @system variables, then all code that directly accesed them would have to be @trusted, and the programmer would not need to manually verify any @safe code in order to prove that IntSlice's invariant is maintained.

The same general pattern occurs with other user-defined types whose invariants involve the relationship between two or more variables, such as tagged unions and reference-counted smart pointers.

Example: Short String

module shortstring;

struct ShortString
{
    private ubyte length;
    private char[15] data;

    @safe
    this(const(char)[] src)
    {
        assert(src.length <= data.length);

        length = cast(ubyte) src.length;
        data[0 .. src.length] = src[];
    }

    @trusted
    const(char)[] opIndex() const
    {
        // should be ok to skip the bounds check here
        return data.ptr[0 .. length];
    }
}

Invariant: length <= 15

Once again, there is a constructor that establishes an invariant, and a member function that relies on the invariant to do its work. Unlike in the previous example, however, this code is not memory-safe as-written, though it may appear to be at first glance.

To understand why, consider the following program, which uses ShortString to cause undefined behavior in @safe code:

@safe
void main()
{
    import shortstring;
    import std.stdio;

    ShortString oops = void;
    writeln(oops[]);
}

void-initializing a ShortString will very likely produce an instance that violates its invariant. Because opIndex relies on that invariant to skip the bounds check, this results in an out-of-bounds memory access rather than a safe, predictable crash.

Why does the compiler allow a ShortString to be void-initialized in @safe code? Because, according to the rules in the language spec, a struct containing only ubyte and char data is a safe type, and therefore must not have any invariants. It follows that @safe code is free to initialize a ShortString to any value, including an unspecified one, without risking memory corruption.

In order to make this code memory-safe, the programmer must include an additional bounds check in opIndex:

@safe
const(char)[] opIndex() const
{
    return data[0 .. length];
}

This solution is unsatisfying: the program must do redundant work at run-time to compensate for the language's lack of expressiveness, or give up on the guarantees of @safe. If ShortString.length could be marked as @system, this dilemma would not exist.

The same general pattern occurs with other user-defined types that attempt to impose invariants on types the compiler considers "safe", such as enum types used in final switch statements and integer "handles" used as array indices by external libraries.

Initial value of global variables

Allowing fields of an aggregate to be marked @system helps the compiler maintain run-time invariants on user-defined types, but it is also important to ensure the variable was not constructed with an unsafe value to begin with. Constructing unsafe values in a @safe function is not allowed, and constructing them in @system or @trusted functions leaves the responsibility of memory safety up to the programmer. When accessing global variables of an unsafe type in a @safe function, the compiler should be either conservative and reject any access, or do basic taint-checking:

int* x = cast(int*) 0xDEADBEEF;
extern int* y;
int* z = new int(20);

void main() @safe
{
    *x = 10; // Not allowed
    *y = 10; // Not allowed
    *z = 10; // Maybe allowed
}

Since the initialization expression cast(int*) 0xDEADBEEF would not be allowed in a @safe function, and since the initial value of y is unknown, the compiler should annotate variables x and y as possibly containing an unsafe value, so they cannot be accessed in a @safe function. Only z is known to have a safe initial value in this case, so the compiler could allow access to it in @safe code.

Allowing @trusted and @safe to be applied to variables is useful when the programmer wants to relax the constraints, and applying @system is useful to tighten the constraints:

@trusted int* x = cast(int*) 0xD000; // Assumed to be a good address
@safe    extern int* y0; // Assumed to always have safe value
@system  extern int* y1; // May have unsafe value
@system int* z = new int(20); // Starts out safe, but may be set to unsafe value in @trusted code

enum Opt {a, b, c}
@system  Opt opt = Opt.a; // @trusted code relies on this being in range and not e.g. `cast(Opt) 100`

Prior work

The need for encapsulation of data / restricted access to data in order to achieve memory safety has been mentioned in several discussions:

Other languages

Many other languages either do not allow systems programming at all (e.g. Java, Python) or do not support language-enforced memory safety (e.g. C/C++).

A notable exception is Rust, where the equivalent of this DIP has been proposed multiple times: Unsafe fields #381

Some excerpts from the discussion there are:

OTOH, privacy is primarily intended for abstraction (preventing users from depending on incidental details), not for protection (ensuring that invariants always hold). The fact that it can be used for protection is basically an happy accident. To clarify the difference, C strings have no abstraction whatever - they are a raw pointer to memory. However, they do have an invariant - they must point to a valid NUL-terminated string. Every place that constructs such a string must ensure it is valid, and every place that consumes it can rely on it. OTOH, a safe, say, buffered reader needs abstraction but doesn't need protection - it does not hold any critical invariant, but may want to change its internal representation.

source

This doesn't seem very useful to me. Within a module I would expect the authors to know what they're doing, and the unit-tests to save them when they do not. For other users, you could simply introduce getters and setters, and functions/methods can already be marked unsafe.

source

Ultimately the proposal has not yet been accepted. The idea of using private instead of @system variables for D is discussed in the alternatives section. More information about Rust's stand on unsafe functions can be found here:

Description

Existing rules for @system

Before the proposed changes, here is an overview of the relevant existing rules of which declarations can have the @system attribute.

@system int w = 2; // compiles, does nothing
@system enum int x = 3; // compiles, does nothing
enum E
{
    @system x, // error: @system is not a valid attribute for enum members
    y,
}
@system alias x = E; // compiles, does nothing
@system template T() {} // compiles, does nothing

void func(@system int x) // error: @system attribute for function parameter is not supported
{
    @system int x; // compiles, does nothing
}
template Temp(@system int x) {} // error: basic type expected, not @

Any function attribute can be attached to a variable declaration, but they cannot be retrieved:

@system @nogc pure nothrow int x;
pragma(msg, __traits(getFunctionAttributes, x)); // Error: first argument is not a function
pragma(msg, __traits(getAttributes, x)); // tuple()

Proposed changes

(0) Accessing variables or fields marked @system is not allowed in @safe code

Even though read-only access of a @system variable with a safe type could still be allowed without breaking @safe, it is decided to restrict any 'access' (as defined by the specification) for simplicity.

Examples:

@system int x;

struct S
{
    @system int y;
}

S s;

@safe
void main()
{
    x += 10; // error: cannot modify @system variable 'x'
    s.y += 10; // error: cannot modify @system field 'y'
    int y = x; // error: cannot read @system variable 'x'
    @system int z;
    z += 1; // error: cannot modify @system variable 'z'
}

// inferred as a @system function
auto foo()
{
    x = 0;
}

When using an alias to a @system variable, that alias has the same restrictions as the symbol to which it aliases.

Initialization of a @system variable or field is allowed in @safe code. This includes static initialization, the automatically generated constructor, user-defined constructors, and the .init value of a type.

@system int x;

shared static this() @safe
{
    x = 3; // allowed, this is initialization
    x = 3; // second time disallowed, this is assignment to a `@system` variable
}

struct T
{
    @system int y;
    @system int z = 3; // allowed
    this(int y, int z) @safe
    {
        this.y = y; // allowed, this is initialization
        this.y = y; // second time disallowed, this is assignment to a `@system` variable
        this.z = z; // allowed
    }
}

struct S
{
    @system int y = 2;
}

void main() @safe
{
    S s0 = {y: 3}; // static initialization
    S s1 = S(3); // automatically generated constructor
    S s2 = S.init; // .init value
    S s3; // same as above
    s3 = s2; // disallowed
}

Note that while it may be desirable to require a @trusted annotation near initialization of @system variables, realizing this is problematic since there is no syntax for @trusted assignment. @trusted as a function annotation has its limitations:

  • it does not work for global or local variables since a @trusted lambda there would move the declaration to that function's scope.
  • it not only trusts initialization of the variable on the left-hand side of the =, but also the initialization expression on right-hand side. Using a @trusted function to return a variable by ref and assigning it does not count as initialization of that variable.
  • it disables the scope/return scope checks of -dip1000
struct S
{
    this(ref scope S s) @system
    {
        *(cast(int*) 0xDEADBEEF) = 0;
    }
}

struct Wrapper(T)
{
    @system T t;
    this(T t) @trusted
    {
        this.t = t; // Oops! Calls a `@system` copy constructor
    }
}

void main() @safe
{
    auto w = Wrapper!S(S.init); // program killed by signal 11

    () @trusted {@system int x = 3;}();
    // x is not in scope anymore
}

@system int x = (() @trusted => 3)(); // this still does not mark the assignment `@trusted`
//() @trusted {@system int x = 3;}(); // does not work

(1) An aggregate with at least one @system field is an unsafe type

Such an aggregate receives the same restrictions as pointer types in @safe code, making implicit writes to @system variables using e.g., array casting, impossible.

struct Handle
{
    @system int handle;
}

void main() @safe
{
    Handle h = void; // error
    union U
    {
        Handle h;
        int i;
    }
    U u;
    u.i = 3; // error

    ubyte[Handle.sizeof] storage;
    auto array = cast(Handle[]) storage[]; // error
}

(2) Variables and fields without annotation are @safe unless their initial value is not @safe

The rules regarding variables and fields are as follows:

  • An initialization expression x is @system when the function (() => x) is inferred as @system.
  • When marked @system, the result is always @system regardless of the type.
  • When marked @trusted, the initialization expression x is treated as (() @trusted => x).
  • When marked @safe, the initialization expression must be @safe.
  • In the absence of an annotation, the result is @system if the type is unsafe and the initialization expression is @system, or if the type is unsafe and the variable is extern.
int* getPtr() @system {return cast(int*) 0x8035FDF0;}
int  getVal() @system {return -1;}

int* x1 = x0;                     // @safe, (() => x0) is @safe
int* x2 = cast(int*) 0x8035FDF0;  // @system, (() => cast(int*) 0x8035FDF0) is @system
int* x3 = getPtr();               // @system, (() => getPtr()) is @system
int  x4 = getVal();               // @safe, int is not an unsafe type
extern int* ext;                  // @system, unsafe type and initial value unknown
@system int x5 = 1;               // @system as requested
@trusted int* x6 = getPtr();      // @safe, the getPtr call gets trusted
@safe int* x7 = getPtr();         // error: cannot initialize @safe variable with @system initializer

struct S {
    // same rules for fields:
    int* x9 = x3; // @system
    int  x8 = x5; // @safe
}

An exception to the above rules is made on unsafe types when the compiler knows the resulting value is safe.

int* getNull() pure @system {return null;}
int* n = getNull(); // despite unsafe type with @system initialization expression, inferred as @safe

Annotations with a scope (@system {}) or colon (@system:) affect variables just like they do functions.

@system
{
    int y0; // @system
}

@system:
int y1; // @system

Grammar changes

Placing @system annotations is already allowed in the places where it's needed for this DIP, so there is no grammar change.

Alternatives

Using private

It has been suggested before that bypassing private using e.g. .tupleof or __traits(getMember) should not be allowed in @safe code. While the need for giving a way of ensuring struct invariants in @safe code is in line with this DIP, the idea to use private for it is argued against.

First of all, disallowing bypassing private in @safe code is not sufficient for ensuring run-time invariants on user-defined types. When an aggregate has no members with an unsafe type, the private fields can still be indirectly written to via overlap in a union, void-initialization, or array casting.

Second, private only acts on the module level, so a @trusted member function cannot assume that a struct's invariants are upheld unless all other @safe code in the module has been manually certified not to violate them. This undermines the ability of the programmer to easily distinguish code requiring manual verification from code that can be checked automatically, especially since certain member functions like constructors, destructors, and operator overloads must be defined in the same module as the data on which they operate.

Finally, disallowing bypassing visibility with __traits(getMember, ...) or .tupleof would break @safe code that relied on this, and issue 15371 explicitly requested this behavior.

Using invariant blocks to specify unsafe types

Some have suggested that a struct can be made into an unsafe type by adding an invariant block.

struct Handle
{
    invariant
    {
        // no run-time checks, just marking `Handle` as an unsafe type
    }
    private int fd;
}

However, Contract Programming is currently a separate feature from Memory Safety, and an empty invariant {} block looks like something that can be safely removed. Suddenly introducing @safe restrictions and scope semantics to types with invariant blocks can be undesirable. On top of that, it still does not protect from modifications outside of @trusted code.

Breaking Changes and Deprecations

Attaching the @system attribute to variables is already permitted, but doing so adds no compiler checks. The additional checks for @system variables in this proposal can cause existing @safe code to break (note that @system code is completely unaffected by everything in this DIP). However, since @system on variables does not currently do anything, the author suspects that users generally do not add this attribute to any variables at all, let alone variables that are meant to be used in @safe code. The biggest risk here is that variables accidentily fall inside a @system {} block or under a @system: section.

@system:

int x; // suddenly not writable in @safe code anymore
void unsafeFuncA() {};
void unsafeFuncB() {};

void main() @safe
{
    x++; // not allowed anymore
}

Misconstructed pointers can be inferred @system under the new rules.

struct S
{
    int* a = cast(int*) 0x8035FDF0;
}

void main() @safe
{
    S s;
    *s.a = 0; // this gives an error now
}

Whenever this happens, there is a risk of memory corruption, so a compiler error would be in its place.

Still, a two-year deprecation period is proposed where instead of raising an error, a deprecation message is given whenever the new memory safety rules are broken. A preview flag -preview=systemVariables can additionally be added that immediately raises errors for violations while leaving other deprecation messages as warnings. At the end of the preview period, there will also be a flag to revert it, -revert=systemVariables, so that users can choose to keep the old behavior for a little longer.

Reference

Copyright & License

Copyright (c) 2020-2022 by the D Language Foundation

Licensed under Creative Commons Zero 1.0

Reviews

Community Review Round 1

Reviewed Version

Discussion

Feedback

In the Feedback Thread, most of the feedback was related to details such as terminology, whether to use assert(x) in the examples, etc.

The one structural piece of criticism was that making initialization of @system variables safe is unsound, to wit, "Memory safety cannot depend on the correctness of a @safe constructor." The DIP author replied that this boils down to "@trusted assumptions about @safe code", on which there is no consensus, and he has yet to determine a satisfactory design.

Of note, a detailed list of feedback was misplaced in the Discussion Thread. In short, the reviewer asserted that this proposal is essentially a response to bugs in the implementation of @safe, and those bugs should be fixed rather than a new feature added to the language. Subsequent discussion appears to have led to consensus among the particpants that the DIP is necessary.

Community Review Round 2

Reviewed Version

Discussion

Feedback

Only two items of actionable feedback were provided in the Feedback Thread:

  • The Rationale and Description appear to have a conflict: the Rationale states that module-level extern variables should be disallowed in @safe code, but a later example declares such "@safe by default". A DIP author responded that there is no conflict; the Rationale describes existing behavior, the example describes the behavior following implementation of this DIP.
  • A specifc quote from the DIP ("...when the compiler knows the resulting value is safe") leads one to the question of what the compiler considers to be safe; the DIP should elaborate on this. A DIP author replied that the language spec already provides that information.

Final Review

Reviewed Version

Discussion

Feedback

The following actionable feedback was provided in the Feedback Thread:

  • extern variables should not be @safe by default. The DIP author responded that this was originally intended to be consistent with the @safe-by-default DIP. That DIP was rejected, so he will change this.

  • The DIP provides an example, "int as pointer", and is correct in stating that there is no memory-safe risk in allowing a value without indirections to escape from a function. This undermines the motivation of the example: if there is no benefit from applying scope checking to data without indirections, then there is no justification to enabling such checks in @safe code. This example should be removed. The DIP author responded that the motivation here was not to add scope checking to plain integers for non-memory safe reasons, but to be able to create custom types that represent an indirection under the hood. However, the DIP does not demonstrate what kind of @trusted code this enables, and there are cases where one might want unsafe values without scope checking, so maybe it shouldn't be an effect of @system members.

  • The description says that scope "is not stripped away from an aggregate with at least one @system field, even when the aggregate has no members that contain pointers." The example noted in the item above appears to be the only justification, so this sentence should be removed. See the DIP author's response to the previous item.

  • Relating to the workaround described in the "int as pointer" example, wouldn't it work to put the handle in the union with void[1]? The DIP said no, as void[1] is not a type with unsafe values.

  • In the "Proposed changes" section, the "Further operations disallowed" appears to allow a case that defeats the purpose of disallowing reads of @system variables in @safe code:

      ref const identity(T)(return ref const T var){ return var; }
    
      @safe void main()
      {
          auto x = someContainer.internalRepresentation.identity;
      }
    

    The DIP author agreed.

  • In one of the examples in the "Proposed changes" section, the comment in the line this.z = z; // disallowed, this is assignment is wrong; this is not assignment, but construction. The DIP author agreed.

Formal Assessment

Reviewed Version

The language maintainers accepted this DIP, saying that it correctly identifies a loophole in the @safe checks and provides a reasonable solution.