# Memory safety

Memory bugs are another major problem in languages like C++. In the best case, you get a segmentation fault, fire up gdb, and debug it like an exception. In the worst case, it becomes a remote code execution vulnerability.

The state-of-the-art methods for preventing memory issues in C++ are:
* Static methods: compiler warnings, clang-tidy, ... can catch some common mistakes, but they are quite limited due to the lack of language-level support.
* Dynamic methods: Valgrind, ASan, ... can catch some memory problems while running the test suite. They are effective at finding memory bugs, but, as Dijkstra put it, "Program testing can be used to show the presence of bugs, but never to show their absence."
* Runtime methods: hardware tagging, CHERI, bounds checking, ... can catch memory issues at runtime and raise an exception. These methods can prevent memory issues from becoming security vulnerabilities or from causing spurious problems in other parts of the code, but they come with a performance and/or hardware cost.

None of these tools is complete, though, and there are C/C++ programs with memory problems in the wild. In safe Rust, the compiler rejects programs that contain memory safety problems, and where safety cannot be established statically, it inserts runtime checks to ensure it. Let's see this in action.

In [42]:
fn dangling_reference(some_input: &i32) -> &i32 {
    let x = 5 + *some_input;
    &x
}

Error: cannot return reference to local variable `x`

That was easy, and C++ linters will catch that as well. Here is a harder one:

In [43]:
fn reference_to_second(input: &Vec<i32>) -> &i32 {
    &input[1]
}
{ // Blocks are needed due to Jupyter limitations
    let v = vec![1, 2, 3];
    let r = reference_to_second(&v);
    let z = *r;
    z
}

2

Rust didn't stop us, because everything was OK. But we can make it problematic by dropping the vector before reading the reference.

In [44]:
use std::mem::drop;

{
    let v = vec![1, 2, 3];
    let r = reference_to_second(&v);
    drop(v);
    let z = *r;
    z
}

Error: cannot move out of `v` because it is borrowed

We can see that Rust magically catches the use-after-free bug here, with a detailed explanation. But how does Rust know that `r` is related to `v`?
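For contrast, here is a minimal sketch of a variant that the borrow checker accepts: the value behind the reference is copied out before the vector is dropped, so no borrow of `v` is still in use at the `drop` call. The function name `read_then_drop` is made up for this illustration.

```rust
fn reference_to_second(input: &Vec<i32>) -> &i32 {
    &input[1]
}

// Hypothetical helper: the same steps as the rejected cell, in a safe order.
fn read_then_drop() -> i32 {
    let v = vec![1, 2, 3];
    let r = reference_to_second(&v);
    let z = *r; // last use of `r`, so the borrow of `v` ends here
    drop(v); // accepted: nothing refers to `v` anymore
    z
}
```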

## Lifetimes

A reference type in Rust has two parameters: `T`, the type of the value that the reference refers to, and `'x`, the lifetime of the reference. In its full form, it is written as `&'x T`.

The syntax `&T`, which you have seen before, is syntactic sugar for `&'_ T`, which means infer or elide the lifetime.
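As a small sketch of these spellings (the function names are made up for illustration), the three signatures below are meant to express the same thing: one names the lifetime, one uses the `'_` placeholder, and one omits the lifetime entirely.

```rust
// Full form: the lifetime gets a name, `'x`.
fn first_named<'x>(items: &'x [i32]) -> &'x i32 {
    &items[0]
}

// Placeholder form: `'_` asks the compiler to fill in the lifetime.
fn first_placeholder(items: &'_ [i32]) -> &'_ i32 {
    &items[0]
}

// Sugared form: no lifetime is written at all.
fn first_sugared(items: &[i32]) -> &i32 {
    &items[0]
}
```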

Let's start with the `'static` lifetime, which means the reference is valid for the whole program.

In [None]:
// str literals are included in the binary, so references to them are valid for the whole program
let ref_to_str: &'static str = "hello";
*ref_to_str

"hello"

Taking a reference to a constant rvalue (such as a literal) is another way to create static references in Rust (which isn't possible in C++):

In [None]:
let ref_to_i32: &'static i32 = &42;
*ref_to_i32

42

But the compiler will catch us if we try to give a reference to a local variable the `'static` lifetime:

In [None]:
{
    let x = 5;
    let y: &'static i32 = &x;
    y
}

Error: `x` does not live long enough
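If a `'static` reference to an integer is really what we want, one option is to borrow from a `static` item, which lives for the whole program, instead of from a local variable. This is only a sketch; the names `ANSWER` and `static_ref` are made up for the example.

```rust
// A `static` item exists for the entire run of the program.
static ANSWER: i32 = 42;

// Hypothetical helper: borrowing from a `static` yields a `'static` reference.
fn static_ref() -> &'static i32 {
    &ANSWER
}
```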

`'static` is not the only possible lifetime. Look at the code below, which looks fine but doesn't compile:

In [None]:
fn some_fn(vec: &Vec<i32>, index: &usize) -> &i32 {
    &vec[*index]
}

Error: missing lifetime specifier

The compiler wants a lifetime specifier, so we will provide it.

In [None]:
fn some_fn<'a, 'b>(vec: &'a Vec<i32>, index: &'b usize) -> &'a i32 {
    &vec[*index]
}

let v = vec![1, 2, 3];
*some_fn(&v, &2)

3

The syntax `&'a Vec<i32>` is the type of a reference that is valid for at least `'a`. The `<'a, 'b>` means that the function is generic over lifetimes: it works for any lifetimes that satisfy the conditions on its arguments. The compiler can now understand that the result lives as long as the first input, and will reject this code:

In [None]:
{
    let i = 2;
    let r = {
        let v = vec![1, 2, 3];
        some_fn(&v, &i)
    };
    *r
}

Error: `v` does not live long enough

But not this code:

In [None]:
{
    let v = vec![1, 2, 3];
    let r = {
        let i = 2;
        some_fn(&v, &i)
    };
    *r // reference to i is invalid here, but it doesn't matter
}

3

We can (incorrectly) make the signature of `some_fn` more restrictive, and force the result of the function to live no longer than the reference to the index:

In [None]:
fn some_fn<'a>(vec: &'a Vec<i32>, index: &'a usize) -> &'a i32 {
    &vec[*index]
}

Here we used `'a` for both parameters. This doesn't mean they have exactly the same lifetime; it means `'a` is a lifetime during which both parameters are valid. The compiler will choose the longest possible option for `'a`. Remember, `&'x T` means the reference is valid for at least `'x`.

So the above example will no longer compile with the new `some_fn`:

In [None]:
{
    let v = vec![1, 2, 3];
    let r = {
        let i = 2;
        some_fn(&v, &i)
    };
    *r
}

Error: `i` does not live long enough

You: So can we add a wrong lifetime annotation to mislead the compiler into accepting the first example, which dropped the vector?

No, the compiler will catch that. Let's try it.

In [None]:
fn some_fn<'a, 'b>(vec: &'b Vec<i32>, index: &'a usize) -> &'a i32 {
    &vec[*index]
}

Error: lifetime may not live long enough

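The bound `'b: 'a` means `'b` outlives `'a`, that is, `'b` lasts at least as long as `'a` (`'b >= 'a`). The function above is rejected because nothing guarantees that the vector's lifetime `'b` outlives the returned `'a`. Using such a bound, we can write the original function this way: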

In [None]:
fn some_fn<'a: 'c, 'b, 'c>(vec: &'a Vec<i32>, index: &'b usize) -> &'c i32 {
    &vec[*index]
}

This is just more verbose and makes no difference. We can write it even more verbosely:

In [None]:
fn some_fn<'a, 'b, 'c>(vec: &'a Vec<i32>, index: &'b usize) -> &'c i32
where
    'a: 'c,
{
    &vec[*index]
}
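A quick way to convince ourselves that the bound only ties the result to the vector: with this signature, the earlier call where the index dies first should still compile. This is a sketch; the wrapper name `index_dies_first` is made up for the example.

```rust
fn some_fn<'a, 'b, 'c>(vec: &'a Vec<i32>, index: &'b usize) -> &'c i32
where
    'a: 'c, // the vector must outlive the returned reference
{
    &vec[*index]
}

// Hypothetical wrapper around the earlier notebook cell.
fn index_dies_first() -> i32 {
    let v = vec![1, 2, 3];
    let r = {
        let i = 2;
        some_fn(&v, &i)
    }; // `i` is gone here, but `r` only borrows from `v`
    *r
}
```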

## Lifetime elision

Let's get back to the first example:

In [None]:
fn reference_to_second(input: &Vec<i32>) -> &i32 {
    &input[1]
}

As said before, `&Vec<i32>` or `&'_ Vec<i32>` means it's the compiler's job to fill in the lifetime. In this case, it will fill it in like this:

In [None]:
fn reference_to_second<'a>(input: &'a Vec<i32>) -> &'a i32 {
    &input[1]
}

The rationale is that you can't create references out of thin air, and you can't return references to local variables or owned parameters. So if there is exactly one reference among the inputs and the output is a reference, the output can live as long as the input (and no longer).

There are some more heuristics like this, called "lifetime elision rules".
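One of those rules covers methods: when a method takes `&self` or `&mut self`, elided lifetimes in the return type are tied to `self`, even if there are other reference parameters. A sketch with a made-up `Inventory` type and made-up method names:

```rust
struct Inventory {
    items: Vec<i32>,
}

impl Inventory {
    // Elided form: there are two reference inputs, yet no annotation is
    // needed, because the output lifetime is taken from `&self`.
    fn item_at(&self, index: &usize) -> &i32 {
        &self.items[*index]
    }

    // What the elided signature above stands for.
    fn item_at_explicit<'a, 'b>(&'a self, index: &'b usize) -> &'a i32 {
        &self.items[*index]
    }
}

fn lookup_demo() -> (i32, i32) {
    let inventory = Inventory { items: vec![10, 20, 30] };
    let r = {
        let i = 1;
        inventory.item_at(&i)
    }; // `i` is gone, but `r` still borrows from `inventory`
    (*r, *inventory.item_at_explicit(&2))
}
```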

But when there are two inputs, the user's intention is not clear. The result might borrow from the first argument (`some_fn` above), from the second argument (`some_fn` with arguments swapped), or even from both:

In [None]:
fn last_element_of_longer_vec<'a>(v1: &'a Vec<i32>, v2: &'a Vec<i32>) -> &'a i32 {
    if v1.len() > v2.len() {
        v1.last().unwrap()
    } else {
        v2.last().unwrap()
    }
}
{
    let v1 = vec![1, 2, 3];
    let v2 = vec![4, 5, 6, 7];
    let r = last_element_of_longer_vec(&v1, &v2);
    // dropping either v1 or v2 can potentially invalidate this reference
    *r
}

7

## Null pointers

We can't create null pointers in Rust:

In [None]:
let null_pointer: &i32 = 0; // You can try NUL, NIL, NULL, nullptr, ...

Error: mismatched types

But what can we do if we really need a null pointer? For example, suppose we want to write a function that adds the values behind two pointers, treating a null pointer as zero. This is the implementation in C:
```C++
int f(int *x, int *y) {
    if (x == NULL) return *y;
    if (y == NULL) return *x;
    return *x + *y;
}
```

In Rust we use `Option`. `Option` is similar to C++'s `std::optional`, but not identical. It is not specific to pointers, so we can use it with `i32`, for example:

In [52]:
let o1: Option<i32> = Some(5); // An option with `Some` value
let o2: Option<i32> = None; // An option without value

`.unwrap()` will return the inner value if there is any:

In [46]:
o1.unwrap()

5

But will panic otherwise:

In [100]:
o2.unwrap();

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', src/lib.rs:146:4
stack backtrace:
   0: rust_begin_unwind
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:142:14
   2: core::panicking::panic
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:48:5
   3: run_user_code_55
   4: evcxr::runtime::Runtime::run_loop
   5: evcxr::runtime::runtime_hook
   6: evcxr_jupyter::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Segmentation fault.
   0: evcxr::runtime::Runtime::install_crash_handlers::segfault_handler
   1: <unknown>
   2: mi_free
   3: alloc::alloc::dealloc
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/alloc/src/alloc.rs:107:14
      <alloc::alloc::Global as core::alloc:

Error: Subprocess terminated with status: signal: 6 (core dumped)

We can use a default to prevent the panic:

In [54]:
(o1.unwrap_or(42), o2.unwrap_or(24))

(5, 24)

We can check if they are none:

In [55]:
(o1.is_none(), o2.is_none())

(false, true)

So we can implement `.unwrap_or` manually:

In [56]:
if o1.is_none() { 42 } else { o1.unwrap() }

5

But the above code is the wrong way to do that. The right way is to use pattern matching:

In [57]:
match o1 {
    Some(x) => x,
    None => 42,
}

5

Or an `if let`, which is syntactic sugar for the former (or `.unwrap_or` itself, which is good practice in Rust):

In [58]:
if let Some(x) = o2 { x } else { 42 }

42

`Option<T>` usually uses more bytes than `T`, because it holds more information. There are `2^32` valid values for `i32`, but `2^32+1` valid values for `Option<i32>`, so it needs at least 33 bits:

In [59]:
use std::mem::size_of;

(size_of::<i32>(), size_of::<Option<i32>>())

(4, 8)

But Rust uses 64 bits. The exact difference depends on alignment. For example, these have equal size:

In [60]:
(size_of::<(i32, i32)>(), size_of::<i64>())

(8, 8)

But their `Option`s have different sizes:

In [61]:
(size_of::<Option<(i32, i32)>>(), size_of::<Option<i64>>())

(12, 16)
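A sketch of where those numbers come from, using `std::mem::align_of`: the extra tag has to be padded up to the alignment of the payload, so a type with a larger alignment pays more. The helper `option_overhead` is made up for this example, and the exact numbers depend on the target platform and compiler version.

```rust
use std::mem::{align_of, size_of};

// Hypothetical helper: returns (alignment of T, extra bytes Option<T> needs over T).
fn option_overhead<T>() -> (usize, usize) {
    (align_of::<T>(), size_of::<Option<T>>() - size_of::<T>())
}
```

On a typical 64-bit target such as x86-64, this is expected to give `(4, 4)` for `(i32, i32)` and `(8, 8)` for `i64`, which matches the sizes above.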

Because references in Rust are guaranteed to be valid, they are always non-null. So the compiler can put the `None` case in the `NULL` position of the pointer space, and both types have equal size:

In [62]:
(size_of::<Option<&i32>>(), size_of::<&i32>())

(8, 8)

So `Option<&i32>` is either null or a valid reference to an `i32`, and `None` is null at the binary level:

In [66]:
let null_pointer: Option<&i32> = None;
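The same trick is not limited to references. As a sketch, other types with a forbidden bit pattern, such as `Box<T>` (never null) and `std::num::NonZeroU32` (never zero), also leave room for `None`. The function name `niche_sizes` is made up for this example.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

// Each pair is (size of Option<T>, size of T) for a type T that has an
// unused bit pattern the compiler can reuse for `None`.
fn niche_sizes() -> [(usize, usize); 3] {
    [
        (size_of::<Option<&i32>>(), size_of::<&i32>()),
        (size_of::<Option<Box<i32>>>(), size_of::<Box<i32>>()),
        (size_of::<Option<NonZeroU32>>(), size_of::<NonZeroU32>()),
    ]
}
```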

We can't dereference nullable pointers, even if they are not null:

In [67]:
fn some_fn(maybe_null: Option<&i32>) -> i32 {
    *maybe_null
}
some_fn(Some(&5));

Error: type `Option<&i32>` cannot be dereferenced

We can use `unwrap`, which will panic if the pointer is null. A panic is way better than UB. We can translate the C code above to equivalent Rust code that checks for memory problems at runtime:

In [68]:
fn f_translated_to_rust(x: Option<&i32>, y: Option<&i32>) -> i32 {
    if x.is_none() {
        return *y.unwrap();
    }
    if y.is_none() {
        return *x.unwrap();
    }
    *x.unwrap() + *y.unwrap()
}

But as said before, `.is_none()` followed by `.unwrap()` is very unidiomatic Rust, and we should use pattern matching:

In [70]:
fn f_the_rusty_way(x: Option<&i32>, y: Option<&i32>) -> i32 {
    match (x, y) {
        (Some(x), Some(y)) => *x + *y,
        (Some(a), None) | (None, Some(a)) => *a,
    }
}

Error: non-exhaustive patterns: `(None, None)` not covered

Oh, we didn't handle the case where both pointers are null! And this was the case for all the previous versions (did you notice that?). In C it was UB, and in unidiomatic Rust it was a logic error that panics at runtime. But in idiomatic Rust it is a compiler error. Let's fix that bug:

In [73]:
fn f_the_rusty_way(x: Option<&i32>, y: Option<&i32>) -> i32 {
    match (x, y) {
        (Some(x), Some(y)) => *x + *y,
        (Some(a), None) | (None, Some(a)) => *a,
        (None, None) => 0,
    }
}

f_the_rusty_way(Some(&5), None)

5
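To double check the fixed version, here is a small sketch that runs it on all four combinations of null and non-null inputs. The helper name `all_cases` is made up for this example.

```rust
fn f_the_rusty_way(x: Option<&i32>, y: Option<&i32>) -> i32 {
    match (x, y) {
        (Some(x), Some(y)) => *x + *y,
        (Some(a), None) | (None, Some(a)) => *a,
        (None, None) => 0,
    }
}

// Hypothetical helper: one call per combination of inputs.
fn all_cases() -> [i32; 4] {
    [
        f_the_rusty_way(Some(&5), Some(&7)), // both present: 12
        f_the_rusty_way(Some(&5), None),     // only x: 5
        f_the_rusty_way(None, Some(&7)),     // only y: 7
        f_the_rusty_way(None, None),         // both null: 0
    ]
}
```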

At this point, you might be confused by pattern matching, especially if you don't have a functional programming background. Let's take a closer look at it.

## Pattern matching

Pattern matching is a tool for destructuring values. Patterns are language constructs that can match values, and fill some bindings if they match.

Simple bindings are patterns, which will match anything:

In [74]:
match 5 {
    some_value => format!("hello {some_value}"),
}

"hello 5"

Literals are patterns, which match only if the value is equal to them. And `_` is a pattern that matches everything and ignores the value.

In [80]:
match 43 {
    1 => "one",
    43 => "a great number",
    _ => "other",
}

"a great number"

Ranges are patterns:

In [85]:
match 10_u32 {
    1..=10 => "A number between 1 .. 10",
    11..=100 => "A number between 11 .. 100",
    101.. => "A number bigger than 100",
    0 => "Zero",
}

"A number between 1 .. 10"

A tuple of patterns is a pattern, which matches if every part of the tuple matches:

In [89]:
match (5, 12) {
    (1..=9, 1..=9) => "one digit - one digit",
    (1..=9, 10..=99) => "one digit - two digit",
    (10..=99, 10..=99) => "two digit - two digit",
    (100..=999, _) => "three digit - unknown",
    _ => "unknown - unknown",
}

"one digit - two digit"

Patterns can overlap with each other. The first matching pattern is chosen, like the first true branch of an `if`/`else if` chain in C:

In [91]:
match (5, 12) {
    (_, 1..=10) => "this won't be selected, because it doesn't match",
    (3..=7, _) => "this will be selected",
    (1..=100, 1..=100) => "this won't be selected, even though it matches",
    _ => "rest cases",
}

"this will be selected"

If a match arm is unreachable, the compiler will warn us. Jupyter doesn't show warnings, so I will turn the warning into an error:

In [4]:
#[deny(unreachable_patterns)]
fn some_function_that_will_disallow_this_warning() -> &'static str {
    match 5 {
        1..=10 => "A number between 1 and 10",
        5 => "5, the great number",
        _ => "A bad number",
    }
}

Error: unreachable pattern

But we can ignore warnings as well:

In [5]:
match 5 {
    1..=10 => "A number between 1 and 10",
    5 => "5, the great number",
    _ => "A bad number",
}

"A number between 1 and 10"

On the other hand, if the patterns don't cover the whole type space, it is a hard error (we saw this before, when we didn't cover `(None, None)`).

In [8]:
match 5 {
    1..=10 => "1-10",
    11..=20 => "11-20",
}

Error: non-exhaustive patterns: `i32::MIN..=0_i32` and `21_i32..=i32::MAX` not covered

This is a hard error, and not a warning, for the sake of memory safety! Ignoring it could lead to an uninitialized value:

In [10]:
let s = match 5 {
    1 => "hello",
    2 => "world",
};
// What could s be? A string out of thin air?
s

Error: non-exhaustive patterns: `i32::MIN..=0_i32` and `3_i32..=i32::MAX` not covered
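A sketch of one way to make this compile: add a catch-all arm, so that the result gets a value on every path. The function name `greeting` and the fallback string are made up for this example.

```rust
// Hypothetical function wrapping the match from the cell above.
fn greeting(n: i32) -> &'static str {
    match n {
        1 => "hello",
        2 => "world",
        _ => "something else", // covers every remaining i32 value
    }
}
```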

We'll get back to pattern matching in later chapters. For now, let's stick to memory safety.

## Uninitialized memory