# Memory safety

Memory bugs are another major problem in languages like C++. In the best case, you get a segmentation fault, fire up gdb, and debug it like an exception. In the worst case, the bug becomes a remote code execution vulnerability.

The state-of-the-art methods for preventing memory issues in C++ are:
* Static methods: compiler warnings, clang-tidy, ... can catch some common mistakes, but they are pretty limited due to the lack of language-level support.
* Dynamic methods: Valgrind, ASan, ... can catch (some of the) memory problems exercised by a test suite. They are effective at finding memory bugs, but "Program testing can be used to show the presence of bugs, but never to show their absence."
* Runtime methods: hardware tagging, CHERI, bounds checking, ... can catch memory issues at runtime and raise an exception. These methods can stop memory issues from becoming security vulnerabilities or from causing spurious problems in other parts of the code, but they have a performance and/or hardware cost.

But none of these tools is complete, and there are C/C++ programs with memory problems in the wild. In safe Rust, the compiler rejects programs that contain memory safety problems, and where safety can't be established statically, it inserts runtime checks to ensure memory safety. Let's see this in action.

In [42]:
fn dangling_reference(some_input: &i32) -> &i32 {
    let x = 5 + *some_input;
    &x
}

Error: cannot return reference to local variable `x`

That was easy, and C++ linters will catch that as well. Here is a harder one:

In [43]:
fn reference_to_second(input: &Vec<i32>) -> &i32 {
    &input[1]
}
{ // The block is only here because of Jupyter limitations
    let v = vec![1, 2, 3];
    let r = reference_to_second(&v);
    let z = *r;
    z
}

2

Rust didn't stop us, because everything was OK. But we can make it problematic by dropping the vector before reading the reference.

In [44]:
use std::mem::drop;

{
    let v = vec![1, 2, 3];
    let r = reference_to_second(&v);
    drop(v);
    let z = *r;
    z
}

Error: cannot move out of `v` because it is borrowed

We can see that Rust catches the use-after-free bug here, as if by magic, and with a detailed explanation. But how does Rust know that `r` is related to `v`?

## Lifetimes

A reference type in Rust has two parameters: `T`, the type of the value the reference refers to, and `'x`, the lifetime of the reference. In its full form, it is written as `&'x T`.

The syntax `&T`, which you have seen before, is syntactic sugar for `&'_ T`, which means the lifetime is inferred or elided.

Let's start with the `'static` lifetime, which means the reference is valid for the whole program.

In [None]:
// str literals are included in the binary, so reference to them is valid for the whole program
let ref_to_str: &'static str = "hello";
*ref_to_str

"hello"

References to constant expressions are another way to create static references in Rust, because the compiler promotes the constant to static storage (which isn't possible in C++):

In [None]:
let ref_to_i32: &'static i32 = &42;
*ref_to_i32

42

But the compiler will catch us if we try to give a reference to a local variable the `'static` lifetime:

In [None]:
{
    let x = 5;
    let y: &'static i32 = &x;
    y
}

Error: `x` does not live long enough

`'static` is not the only possible lifetime. Look at the code below, which looks fine but doesn't compile:

In [None]:
fn some_fn(vec: &Vec<i32>, index: &usize) -> &i32 {
    &vec[*index]
}

Error: missing lifetime specifier

The compiler wants a lifetime specifier, so we will provide it.

In [None]:
fn some_fn<'a, 'b>(vec: &'a Vec<i32>, index: &'b usize) -> &'a i32 {
    &vec[*index]
}

let v = vec![1, 2, 3];
*some_fn(&v, &2)

3

The syntax `&'a Vec<i32>` is the type of a reference that lives at least as long as `'a`. And `<'a, 'b>` means that the signature of this function is generic over every pair of lifetimes that satisfies its arguments' conditions. The compiler can now understand that the result is alive as long as the first input, and will reject this code:

In [None]:
{
    let i = 2;
    let r = {
        let v = vec![1, 2, 3];
        some_fn(&v, &i)
    };
    *r
}

Error: `v` does not live long enough

But not this code:

In [None]:
{
    let v = vec![1, 2, 3];
    let r = {
        let i = 2;
        some_fn(&v, &i)
    };
    *r // reference to i is invalid here, but it doesn't matter
}

3

We can (incorrectly) make the signature of `some_fn` more restrictive, and force the result of function to live no longer than reference to the index:

In [None]:
fn some_fn<'a>(vec: &'a Vec<i32>, index: &'a usize) -> &'a i32 {
    &vec[*index]
}

Here we used `'a` for both parameters. It doesn't mean they will have exactly the same lifetime; it means `'a` is a lifetime during which both parameters are valid. The compiler will choose the longest `'a` that fits, which is the region where both references are valid at once. Remember, `&'x T` means this reference is valid for at least `'x`.

So the above example will no longer compile with the new `some_fn`:

In [None]:
{
    let v = vec![1, 2, 3];
    let r = {
        let i = 2;
        some_fn(&v, &i)
    };
    *r
}

Error: `i` does not live long enough

You: So we can add a wrong lifetime annotation to mislead the compiler into accepting the first example, which dropped the vector?

No, the compiler will catch that. Let's try it.

In [None]:
fn some_fn<'a, 'b>(vec: &'b Vec<i32>, index: &'a usize) -> &'a i32 {
    &vec[*index]
}

Error: lifetime may not live long enough

The full error message suggests adding a `'b: 'a` bound. `'b: 'a` means `'b` lives at least as long as `'a` (`'b >= 'a`). With such a bound, we can write the original function this way:

In [None]:
fn some_fn<'a: 'c, 'b, 'c>(vec: &'a Vec<i32>, index: &'b usize) -> &'c i32 {
    &vec[*index]
}

Which is just more verbose and makes no difference. We can write it even more verbosely:

In [None]:
fn some_fn<'a, 'b, 'c>(vec: &'a Vec<i32>, index: &'b usize) -> &'c i32
where
    'a: 'c,
{
    &vec[*index]
}
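To see an outlives bound doing real work, here is a small sketch. The name `some_fn_with_bound` is made up for this illustration. It keeps the "wrong" annotation from the rejected attempt above, but adds a `'b: 'a` bound, so the body type checks:

```rust
// Hypothetical variant of `some_fn`: the result is tied to 'a (the index),
// and the bound 'b: 'a promises that the vector lives at least as long as 'a.
fn some_fn_with_bound<'a, 'b: 'a>(vec: &'b Vec<i32>, index: &'a usize) -> &'a i32 {
    &vec[*index]
}

fn main() {
    let v = vec![1, 2, 3];
    let i = 2;
    let r = some_fn_with_bound(&v, &i);
    println!("{}", *r);
}
```

This signature is accepted, but it is more restrictive than necessary: callers can't keep the result after `i` is gone, even though the result only points into the vector.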

## Lifetime elision

Let's get back to the first example:

In [None]:
fn reference_to_second(input: &Vec<i32>) -> &i32 {
    &input[1]
}

As said before, `&Vec<i32>` or `&'_ Vec<i32>` means it's the compiler's job to fill in the lifetime. In this case, it will fill it in like this:

In [None]:
fn reference_to_second<'a>(input: &'a Vec<i32>) -> &'a i32 {
    &input[1]
}

The rationale is that you can't create references out of thin air, and you can't return references to local variables or owned parameters. So if there is exactly one reference among the inputs and the output is a reference, the output can live as long as the input (and no longer).

There are some more heuristics like this, called "lifetime elision rules".

But when there are two inputs, the user's intention is not clear. The result might borrow from the first argument (`some_fn` above), from the second argument (`some_fn` with arguments swapped), or even from both:

In [None]:
fn last_element_of_longer_vec<'a>(v1: &'a Vec<i32>, v2: &'a Vec<i32>) -> &'a i32 {
    if v1.len() > v2.len() {
        v1.last().unwrap()
    } else {
        v2.last().unwrap()
    }
}
{
    let v1 = vec![1, 2, 3];
    let v2 = vec![4, 5, 6, 7];
    let r = last_element_of_longer_vec(&v1, &v2);
    // dropping either v1 or v2 can potentially invalidate this reference
    *r
}

7
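One of the other lifetime elision rules covers methods: when one of the inputs is `&self` or `&mut self`, the output borrows from `self`, however many other reference parameters there are. The sketch below uses a made-up `Playlist` type to show the elided form next to what the compiler fills in:

```rust
// Hypothetical type, used only to illustrate the `&self` elision rule.
struct Playlist {
    songs: Vec<String>,
}

impl Playlist {
    // Elided form: two reference inputs, but the output is tied to `self`.
    fn find(&self, prefix: &str) -> Option<&String> {
        self.songs.iter().find(|s| s.starts_with(prefix))
    }

    // The same signature with the lifetimes spelled out.
    fn find_explicit<'a, 'b>(&'a self, prefix: &'b str) -> Option<&'a String> {
        self.songs.iter().find(|s| s.starts_with(prefix))
    }
}

fn main() {
    let p = Playlist { songs: vec!["alpha".to_string(), "beta".to_string()] };
    let found = {
        let prefix = String::from("be");
        p.find(&prefix) // `prefix` is gone after this block, the result only borrows from `p`
    };
    println!("{:?}", found);
    println!("{:?}", p.find_explicit("al"));
}
```

Because the result is tied to `p` only, it stays usable after `prefix` goes out of scope, just like the `some_fn` example with the index.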

## Null pointers

References in Rust can never be null:

In [None]:
let null_pointer: &i32 = 0; // You can try NUL, NIL, NULL, null_ptr, ...

Error: mismatched types

But what can we do if we really need a null pointer? For example, we want to write a function that adds the values behind two pointers and treats a null pointer as zero. This is the implementation in C:
```C++
int f(int *x, int *y) {
    if (x == NULL) return *y;
    if (y == NULL) return *x;
    return *x + *y;
}
```

In Rust we will use `Option`. `Option` is similar to C++'s `std::optional`, but not identical. It is not specific to pointers, so we can use it with `i32`, for example:

In [52]:
let o1: Option<i32> = Some(5); // An option with `Some` value
let o2: Option<i32> = None; // An option without value

`.unwrap()` will return the inner value if there is any:

In [46]:
o1.unwrap()

5

But will panic otherwise:

In [100]:
o2.unwrap();

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', src/lib.rs:146:4
stack backtrace:
   0: rust_begin_unwind
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:142:14
   2: core::panicking::panic
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:48:5
   3: run_user_code_55
   4: evcxr::runtime::Runtime::run_loop
   5: evcxr::runtime::runtime_hook
   6: evcxr_jupyter::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Segmentation fault.
   0: evcxr::runtime::Runtime::install_crash_handlers::segfault_handler
   1: <unknown>
   2: mi_free
   3: alloc::alloc::dealloc
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/alloc/src/alloc.rs:107:14
      <alloc::alloc::Global as core::alloc:

Error: Subprocess terminated with status: signal: 6 (core dumped)

We can use a default to prevent panic:

In [54]:
(o1.unwrap_or(42), o2.unwrap_or(24))

(5, 24)

We can check if they are none:

In [55]:
(o1.is_none(), o2.is_none())

(false, true)

So we can implement `.unwrap_or` manually:

In [56]:
if o1.is_none() { 42 } else { o1.unwrap() }

5

But the above code is the wrong way to do that. The right way is to use pattern matching:

In [57]:
match o1 {
    Some(x) => x,
    None => 42,
}

5

Or an `if let`, which is syntactic sugar for the former (or `.unwrap_or` itself, which is good practice in Rust):

In [58]:
if let Some(x) = o2 { x } else { 42 }

42

`Option<T>` usually uses more bytes than `T`, because it has to store more information. There are `2^32` valid values for `i32`, but `2^32+1` valid values for `Option<i32>`, so it needs at least 33 bits:

In [14]:
use std::mem::size_of;

(size_of::<i32>(), size_of::<Option<i32>>())

(4, 8)

But Rust is using 64 bits. The exact difference depends on alignment. For example, these have equal size:

In [60]:
(size_of::<(i32, i32)>(), size_of::<i64>())

(8, 8)

But the `Option`s of them have different sizes:

In [61]:
(size_of::<Option<(i32, i32)>>(), size_of::<Option<i64>>())

(12, 16)

Because references in Rust are guaranteed to be valid, they are always non-null. So the compiler can put the `None` case in the `NULL` position of the pointer space, and the two types will have equal size:

In [62]:
(size_of::<Option<&i32>>(), size_of::<&i32>())

(8, 8)

So `Option<&i32>` is either null or a valid reference to an `i32`, and `None` is null at the binary level:

In [66]:
let null_pointer: Option<&i32> = None;
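The same trick is not limited to shared references. As a sketch, the standard library documents the same size guarantee for a few other types that have a forbidden bit pattern, such as `&mut T` and the `NonZero` integers:

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

fn main() {
    // `&mut i32` is never null, so `None` can reuse the null bit pattern.
    assert_eq!(size_of::<Option<&mut i32>>(), size_of::<&mut i32>());
    // `NonZeroU32` can never be 0, so `None` can be stored as 0.
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>());
    // A plain `u32` has no unused bit pattern, so its `Option` needs extra room.
    assert!(size_of::<Option<u32>>() > size_of::<u32>());
    println!("all size checks passed");
}
```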

We can't dereference nullable pointers, even if they are not null:

In [67]:
fn some_fn(maybe_null: Option<&i32>) -> i32 {
    *maybe_null
}
some_fn(Some(&5));

Error: type `Option<&i32>` cannot be dereferenced

We can use `unwrap`, which will panic if the pointer is null. A panic is way better than UB. We can translate our C code above to equivalent Rust code, which checks for memory problems at runtime:

In [68]:
fn f_translated_to_rust(x: Option<&i32>, y: Option<&i32>) -> i32 {
    if x.is_none() {
        return *y.unwrap();
    }
    if y.is_none() {
        return *x.unwrap();
    }
    *x.unwrap() + *y.unwrap()
}

But as said before, `.is_none()` followed by `.unwrap()` is very unidiomatic Rust, and we should use pattern matching:

In [70]:
fn f_the_rusty_way(x: Option<&i32>, y: Option<&i32>) -> i32 {
    match (x, y) {
        (Some(x), Some(y)) => *x + *y,
        (Some(a), None) | (None, Some(a)) => *a,
    }
}

Error: non-exhaustive patterns: `(None, None)` not covered

Oh, we didn't handle the case where both pointers are null! And this was the case for all the previous versions (did you notice that?). In C it was UB, and in unidiomatic Rust it was a panic at runtime. But in idiomatic Rust it is a compile error. Let's fix that bug:

In [73]:
fn f_the_rusty_way(x: Option<&i32>, y: Option<&i32>) -> i32 {
    match (x, y) {
        (Some(x), Some(y)) => *x + *y,
        (Some(a), None) | (None, Some(a)) => *a,
        (None, None) => 0,
    }
}

f_the_rusty_way(Some(&5), None)

5
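For this particular function, the `Option` methods we saw earlier give an even shorter version. This is only a sketch of an alternative (the name `f_with_combinators` is made up here), and it relies on the fact that treating a missing value as `0` happens to cover all four cases:

```rust
// Hypothetical alternative to `f_the_rusty_way`: a null pointer counts as 0.
fn f_with_combinators(x: Option<&i32>, y: Option<&i32>) -> i32 {
    // `copied` turns Option<&i32> into Option<i32>; `unwrap_or` supplies the default.
    x.copied().unwrap_or(0) + y.copied().unwrap_or(0)
}

fn main() {
    println!("{}", f_with_combinators(Some(&5), None));
    println!("{}", f_with_combinators(Some(&5), Some(&7)));
    println!("{}", f_with_combinators(None, None));
}
```

The `match` version is still the more general tool: it keeps working when the cases need different logic, and the compiler checks that none of them is forgotten.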

At this point, you might be confused by pattern matching, especially if you don't have a functional programming background. Let's take a closer look at it.

## Pattern matching

Pattern matching is a tool for destructuring values. Patterns are language constructs that can match values and fill some bindings if they match.

Simple bindings are patterns that match anything:

In [74]:
match 5 {
    some_value => format!("hello {some_value}"),
}

"hello 5"

Literals are patterns that match only if the value is equal to them. And `_` is a pattern that matches everything and ignores the value.

In [80]:
match 43 {
    1 => "one",
    43 => "a great number",
    _ => "other",
}

"a great number"

Ranges are patterns:

In [85]:
match 10_u32 {
    1..=10 => "A number between 1 .. 10",
    11..=100 => "A number between 11 .. 100",
    101.. => "A number bigger than 100",
    0 => "Zero",
}

"A number between 1 .. 10"

Tuples of patterns are patterns, which match if every part of the tuple matches:

In [89]:
match (5, 12) {
    (1..=9, 1..=9) => "one digit - one digit",
    (1..=9, 10..=99) => "one digit - two digit",
    (10..=99, 10..=99) => "two digit - two digit",
    (100..=999, _) => "three digit - unknown",
    _ => "unknown - unknown",
}

"one digit - two digit"

Patterns can overlap with each other. The first matching pattern will be chosen (arms are tried from top to bottom, like an `if` / `else if` chain in C):

In [91]:
match (5, 12) {
    (_, 1..=10) => "this won't be selected, because it doesn't match",
    (3..=7, _) => "this will be selected",
    (1..=100, 1..=100) => "this won't be selected, even though it matches",
    _ => "rest cases",
}

"this will be selected"

If a match arm is unreachable, the compiler will warn us. Jupyter doesn't show warnings, so I will turn the warning into an error:

In [4]:
#[deny(unreachable_patterns)]
fn some_function_that_will_disallow_this_warning() -> &'static str {
    match 5 {
        1..=10 => "A number between 1 and 10",
        5 => "5, the great number",
        _ => "A bad number",
    }
}

Error: unreachable pattern

But we can ignore warnings as well:

In [2]:
match 5 {
    1..=10 => "A number between 1 and 10",
    5 => "5, the great number",
    _ => "A bad number",
}

"A number between 1 and 10"

On the other hand, if patterns don't cover the whole type space, it will be a hard error (we saw this before, when we didn't cover `(None, None)`).

In [8]:
match 5 {
    1..=10 => "1-10",
    11..=20 => "11-20",
}

Error: non-exhaustive patterns: `i32::MIN..=0_i32` and `21_i32..=i32::MAX` not covered

This is a hard error, and not a warning, for the sake of memory safety! Ignoring it could lead to an uninitialized value:

In [10]:
let s = match 5 {
    1 => "hello",
    2 => "world",
};
// What could s be? A string out of thin air?
s 

Error: non-exhaustive patterns: `i32::MIN..=0_i32` and `3_i32..=i32::MAX` not covered
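Before moving on, here is a sketch that combines the pattern kinds from this section (literals, bindings, ranges, `_` and tuples) in one exhaustive `match`. The function `describe` and its messages are invented for this illustration:

```rust
// Hypothetical helper: describe a point given as an (x, y) tuple.
fn describe(point: (i32, i32)) -> String {
    match point {
        (0, 0) => "origin".to_string(),
        // A binding in one position, a literal in the other.
        (x, 0) => format!("on the x axis at {x}"),
        (0, y) => format!("on the y axis at {y}"),
        // A range in one position, `_` in the other.
        (1..=9, _) => "x is a single positive digit".to_string(),
        // Two bindings match every remaining tuple, so the match is exhaustive.
        (x, y) => format!("somewhere else: ({x}, {y})"),
    }
}

fn main() {
    println!("{}", describe((0, 0)));
    println!("{}", describe((3, 0)));
    println!("{}", describe((5, 12)));
    println!("{}", describe((-2, 4)));
}
```

Removing the last arm would bring back the non-exhaustive patterns error, because tuples such as `(-2, 4)` would no longer be covered.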

We'll get back to pattern matching in the next chapters. For now, let's stick to memory safety.

## Uninitialized memory

We usually initialize variables at declaration.

In [3]:
let x = 2;
x

2

But it is not necessary:

In [4]:
let x; // not even let mut x!
x = 2;
x

2

Using an uninitialized value is a compile error:

In [5]:
let x;
let y = x + 3;
x = 2;
y

Error: use of possibly-uninitialized variable: `x`

Rust runs a complex control flow analysis to be permissive:

In [6]:
let x;
let some_condition = true;
if some_condition {
    x = 5;
} else {
    panic!("some condition doesn't met, exiting");
}
let y = x + 3;
y

8

And maintain memory safety at the same time:

In [9]:
let x;
let some_condition = true;
let some_other_condition = false;
if some_condition {
    x = 5;
} else if some_other_condition {
    panic!("some condition doesn't met, exiting");
}
let y = x + 3;
y

Error: use of possibly-uninitialized variable: `x`

Even in loops:

In [12]:
fn last_member_stupid(nums: &Vec<i32>) -> i32 {
    let mut last;
    for x in nums {
        last = x;
    }
    *last
}

Error: use of possibly-uninitialized variable: `last`
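One way to make this function compile is to say explicitly what happens when the loop body never runs. The sketch below (the name `last_member` is made up here) starts from `None`, so the variable is initialized on every path and the empty case becomes visible in the return type:

```rust
// Hypothetical fixed version: the "no element seen yet" state is spelled out as `None`.
fn last_member(nums: &[i32]) -> Option<i32> {
    let mut last = None;
    for x in nums {
        last = Some(*x);
    }
    last
}

fn main() {
    println!("{:?}", last_member(&[1, 2, 3]));
    println!("{:?}", last_member(&[]));
}
```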

## Move and copy

There are no move constructors, copy constructors, overloaded assignment operators, or similar things. The assignment operator, returning from blocks and functions, function calls, and every other operation that needs a move will always do the following:
* Run the destructor of the value in the destination. If it is uninitialized, do nothing.
* `memcpy` the source bits to the destination. In the case of `Vec`, this means copying the allocation pointer, length, and capacity: 24 bytes on a 64-bit system.
* The compiler marks the source as uninitialized memory, so you can no longer use it.
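The steps above can be observed from safe code. This is a minimal sketch (the helper name `buffer_survives_move` is made up): it records the address of the vector's heap buffer before a move and compares it with the address seen through the new owner.

```rust
use std::mem::size_of;

// Hypothetical helper: returns true if the heap buffer is the same before and after a move.
fn buffer_survives_move() -> bool {
    let x = vec![1, 2, 3];
    let before = x.as_ptr(); // address of the heap allocation
    let y = x; // move: only the (pointer, length, capacity) triple is copied
    before == y.as_ptr()
}

fn main() {
    // The elements are not copied: the new owner points at the same buffer.
    assert!(buffer_survives_move());
    // The `Vec` handle itself is three machine words (24 bytes on a 64-bit target).
    assert_eq!(size_of::<Vec<i32>>(), 3 * size_of::<usize>());
    println!("move checks passed");
}
```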

In [18]:
let x = vec![1, 2, 3];
let y = x;
let z = x;

Error: use of moved value: `x`

The third step is super important for memory safety. Without it, there would be two vectors that own the same allocation, and both could reallocate or free it.

If you read the full compiler error, it says `Vec<i32>` doesn't implement the `Copy` trait. The point is that for types that implement the `Copy` trait, the source is still valid after a move, so a move is effectively a copy.

Integers are `Copy` types. We can verify that with this:

In [19]:
let x = 2;
let y = x;
let z = x;
y + z

4

Or by using a static assertion crate:

In [27]:
:dep assert-impl = "0.1.3"

use assert_impl::assert_impl;

In [29]:
assert_impl!(Copy: i32, i64, u8);

References are also `Copy`:

In [30]:
assert_impl!(Copy: &i32, &Vec<i32>, &str);

But mutable references are not:

In [31]:
assert_impl!(Copy: &mut i32);

Error: the function or associated item `assert` exists for struct `Helper<&mut i32>`, but its trait bounds were not satisfied

The error message is not very helpful, because `assert_impl` is a hacky solution. But why are mutable references not `Copy`? If you remember from the previous chapter, mutable references are unique. If they were `Copy`, they would lose this property and Rust would lose its safety.

User-defined structs are not `Copy` by default, but if all of their fields implement `Copy`, they can implement `Copy` with a derive:

In [32]:
#[derive(Clone, Copy)]
struct Human {
    name: &'static str,
    age: u32,
}

let x = Human { name: "hamid", age: 20 };
let y = x;
let z = x;
println!("{} {}", y.name, z.age);

hamid 20
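When a field is not `Copy`, the struct can't derive `Copy` either, but it can still derive `Clone` and be duplicated explicitly. A minimal sketch, with a made-up `Pet` struct whose name is an owned `String`:

```rust
// Hypothetical struct: `String` owns a heap buffer, so it is not `Copy`,
// and `Pet` cannot derive `Copy`. Deriving `Clone` alone is fine.
#[derive(Clone)]
struct Pet {
    name: String,
    age: u32,
}

fn main() {
    let x = Pet { name: String::from("tom"), age: 3 };
    let y = x.clone(); // explicit copy, allocates a new buffer for the name
    let z = x; // move: `x` can no longer be used after this line
    println!("{} {}", y.name, z.age);
}
```

The difference from `Copy` is that the duplication is visible in the source as a `.clone()` call, which matters when it involves a heap allocation.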


## Bounds checking

Vector index lookups are always bounds checked. We get a panic in case of an out-of-bounds access:

In [6]:
:preserve_vars_on_panic 1

Preserve vars on panic: true


In [25]:
let v = vec![1, 2, 3];
v[5]

thread '<unnamed>' panicked at 'index out of bounds: the len is 3 but the index is 5', src/lib.rs:124:40
stack backtrace:
   0: rust_begin_unwind
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:142:14
   2: core::panicking::panic_bounds_check
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:84:5
   3: <unknown>
   4: <unknown>
   5: evcxr::runtime::Runtime::run_loop
   6: evcxr::runtime::runtime_hook
   7: evcxr_jupyter::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


Panic occurred, the following variables have been lost: v

This was easy, and even C++ is able to do that, because a vector knows its length. But what about array pointers?

In [3]:
fn foo(x: &i32) -> i32 {
    x[3]
}


Error: cannot index into a value of type `&i32`

Rust is not C, and can't use arbitrary pointers as arrays. It has a special type for array pointers:

In [7]:
fn foo(x: &[i32]) -> i32 {
    x[3]
}

foo(&[1, 2])

thread '<unnamed>' panicked at 'index out of bounds: the len is 2 but the index is 3', src/lib.rs:3:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:142:14
   2: core::panicking::panic_bounds_check
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:84:5
   3: run_user_code_2


But how is Rust able to handle that?

## Fat pointers

Pointers to arrays are twice the size of normal pointers:

In [8]:
use std::mem::size_of;

(size_of::<&i32>(), size_of::<&[i32]>())

(8, 16)

This is because they store the length in the pointer as well. This allows Rust to bounds check accesses to slices too.

If you know the size at compile time, you can provide it:
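The two halves of such a pointer can be inspected from safe code. The following is only a sketch: `as_ptr` and `len` are standard slice methods, while the variable names are made up for the example.

```rust
use std::mem::size_of;

fn main() {
    let v = vec![10, 20, 30, 40];
    let s: &[i32] = &v[1..];

    // The two halves of the fat pointer: where the data starts, and how many elements follow.
    let start = s.as_ptr();
    let len = s.len();

    assert_eq!(len, 3);
    // The slice starts at the second element of the vector's buffer.
    assert_eq!(start, &v[1] as *const i32);
    // Two machine words on current targets: one for the address, one for the length.
    assert_eq!(size_of::<&[i32]>(), 2 * size_of::<usize>());
    println!("slice has {len} elements");
}
```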

In [9]:
(size_of::<&i32>(), size_of::<&[i32]>(), size_of::<&[i32; 5]>())

(8, 16, 8)

So the function below will accept only pointers to arrays of size 5.

In [10]:
fn foo(x: &[i32; 5]) -> i32 {
    x[3]
}
foo(&[1, 2, 3, 4, 5])

4

In [11]:
foo(&[1, 2])

Error: mismatched types

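Out-of-bounds indexing into such an array is still checked. With an index computed at runtime it is a normal bounds checking panic, and with a constant index like the one below, recent compilers may even reject the code at compile time: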

In [15]:
fn foo(x: &[i32; 5]) -> i32 {
    x[7]
}

As a bonus of including the size in the pointer, we can iterate over it:

In [16]:
fn foo(ar: &[i32]) {
    for x in ar {
        print!("{x} ");
    }
    println!();
}
let v = vec![1, 2, 3, 4, 5];
foo(&v[2..]);
foo(&v[..3]);
foo(&v[2..=3]);

3 4 5 
1 2 3 
3 4 


The `for` loop above is guaranteed not to have any bounds checks (besides what is needed to detect that the loop is finished), even in debug builds. But even a naive C-like `for` loop will usually be optimized in release builds, so we are not worried about the performance effect of bounds checks.

In [17]:
fn foo(ar: &[i32]) {
    for i in 0..ar.len() {
        let x = ar[i];
        print!("{x} ");
    }
    println!();
}
let v = vec![1, 2, 3, 4, 5];
foo(&v[2..]);
foo(&v[..3]);
foo(&v[2..=3]);

3 4 5 
1 2 3 
3 4 
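When a panic is not the behavior we want for an out-of-bounds index, slices also offer `get`, which does the same bounds check but reports the failure as an `Option`. A small sketch, with a made-up helper name:

```rust
// Hypothetical helper: read an element, falling back to a default when out of range.
fn element_or(ar: &[i32], index: usize, default: i32) -> i32 {
    // `get` performs the same bounds check as `ar[index]`,
    // but returns `None` instead of panicking.
    match ar.get(index) {
        Some(x) => *x,
        None => default,
    }
}

fn main() {
    println!("{}", element_or(&[1, 2], 1, -1));
    println!("{}", element_or(&[1, 2], 3, -1));
}
```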


## Unsized types

Fat pointers are not specific to arrays. A pointer to every unsized type is fat.

`str` is an unsized type:

In [13]:
size_of::<&str>()

16

`str` is the type of a sequence of bytes that are UTF-8 encoded. You have seen it in string literals:

In [23]:
let x: &'static str = "(😀)";
// number of bytes, not count of characters, grapheme clusters, ...
x.len()

6

And so is every struct whose last field is unsized:

In [17]:
struct UnsizedStruct {
    name: &'static str,
    age: u32,
    dna: [u8], // only last field can be unsized
}

size_of::<&UnsizedStruct>()

16

And so are trait objects, which we are not ready for yet.

## Pointers in structs

References without lifetimes are disallowed in struct fields:

In [2]:
struct VecMutatorLog {
    data: &mut Vec<i32>,
    number_of_push: usize,
}

Error: missing lifetime specifier

We should listen to the compiler and make it generic over lifetimes:

In [5]:
struct VecMutatorLog<'a> {
    data: &'a mut Vec<i32>,
    number_of_push: usize,
}

Things work similarly to functions:

In [7]:
fn push(mutator: &mut VecMutatorLog<'_>, data: i32) {
    mutator.number_of_push += 1;
    mutator.data.push(data);
}
{
    let mut v = vec![1, 2, 3];
    let mut mv = VecMutatorLog {
        data: &mut v,
        number_of_push: 0,
    };
    push(&mut mv, 10);
    push(&mut mv, 20);
    push(&mut mv, 30);
    (mv.number_of_push, v)
}

(3, [1, 2, 3, 10, 20, 30])

But if we push through `mv` again after reading `v`, the compiler will complain. (Remember: there can be only one mutable pointer and no other pointer to something at the same time.)

In [11]:
{
    let mut v = vec![1, 2, 3];
    let mut mv = VecMutatorLog {
        data: &mut v,
        number_of_push: 0,
    };
    push(&mut mv, 10);
    push(&mut mv, 20);
    push(&mut mv, 30);
    println!("{:?}", v);
    push(&mut mv, 50);
    (mv.number_of_push, v)
}

Error: cannot borrow `v` as immutable because it is also borrowed as mutable

So it isn't possible to work around the borrow checker with structs, since they have lifetime annotations similar to functions.
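The usual way to satisfy the compiler here is to finish using the struct before touching the vector directly. The sketch below redefines `VecMutatorLog` so that it stands on its own, and (as an assumption of this example, not something from the code above) turns `push` into a method:

```rust
struct VecMutatorLog<'a> {
    data: &'a mut Vec<i32>,
    number_of_push: usize,
}

impl<'a> VecMutatorLog<'a> {
    // Hypothetical method version of the free function `push`.
    fn push(&mut self, value: i32) {
        self.number_of_push += 1;
        self.data.push(value);
    }
}

fn main() {
    let mut v = vec![1, 2, 3];
    let mut mv = VecMutatorLog { data: &mut v, number_of_push: 0 };
    mv.push(10);
    mv.push(20);
    let pushes = mv.number_of_push; // last use of `mv`: the mutable borrow of `v` ends here
    println!("{:?}", v); // reading `v` is fine now
    v.push(50); // and so is mutating it directly
    println!("{} {:?}", pushes, v);
}
```

The borrow checker looks at where `mv` is last used, not at the end of the block, so no extra scope is needed.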

## Smart pointers

References can model pointers that refer to a resource that someone else owns. That is, someone else is responsible for freeing it. With lifetime annotations, the compiler can guarantee that we don't use references after the owner frees the resource.

But this is not the only way pointers are used in C. Sometimes a pointer is the one that owns the pointee. This is useful when moving the original data is costly and we want to change its owner frequently.

In C++, this kind of pointer is wrapped in a `unique_ptr`, which frees it when it goes out of scope. `Box` does the same in Rust:

In [13]:
let x: Box<i32> = Box::new(5); // will allocate 4 bytes on the heap and store the value there
// Box overloads the deref (*) operator
*x

5

Heap allocation is mandatory, because things on the stack have an owner (the stack frame) which will free them (at the end of the function). Heap allocation is not free, so use `Box` (and `unique_ptr` in C++ and `malloc` in C) cautiously.

Box is exactly a pointer at the binary level:

In [16]:
// size of box and nullable box:
(size_of::<Box<i32>>(), size_of::<Option<Box<i32>>>())

(8, 8)

Boxes of unsized types are automatically fat:

In [17]:
size_of::<Box<[i32]>>()

16

We can get a reference to the inside of a box:

In [18]:
fn print_i32_ref(x: &i32) {
    println!("{x}");
}
let x = Box::new(12);
print_i32_ref(&*x);
print_i32_ref(&x); // * will be added by the compiler
*x

12
12


12

`Box` is not `Copy`, even if its content is `Copy`. This is similar to `&mut` references:

In [19]:
let x = Box::new(3);
let y = x;
let z = x;

Error: use of moved value: `x`

Box will free its resource at the end of scope. To see this in action, we will define a helper struct, which will print its name at the end:

In [48]:
struct DebugDrop(&'static str);
impl Drop for DebugDrop {
    fn drop(&mut self) {
        println!("{} has been dropped", self.0);
    }
}
{
    let x = DebugDrop("test");
    // it should print here
}
println!("finished");

test has been dropped
finished


Now let's see box behavior:

In [24]:
fn consume_box_for_fun(b: Box<DebugDrop>) {
    println!("Box {} moved here", b.0); // Remember: (*b).0 is not necessary
}
{
    let b1 = Box::new(DebugDrop("b1"));
    let b2 = Box::new(DebugDrop("b2"));
    let mut b3 = Box::new(DebugDrop("b3"));
    println!("position 1");
    b3 = b1; // original b3 will be dropped here
    println!("position 2");
    consume_box_for_fun(b2);
    println!("position 3");
}
println!("position 4");

position 1
b3 has been dropped
position 2
Box b2 moved here
b2 has been dropped
position 3
b1 has been dropped
position 4


Box can store arrays and slices as well:

In [25]:
let x: Box<[i32]> = Box::new([1, 2, 3]);
x.len()

3

But it can't change the length of the owned array. Another smart pointer, `Vec`, can:

In [26]:
let mut x: Vec<i32> = vec![1, 2, 3];
x.push(4);
x.push(5);
x.len()

5

`Vec` is a smart pointer in Rust (unlike C++) because it overloads the deref operator:

In [35]:
{
    let x: Vec<i32> = vec![1, 2, 3];
    let y: &[i32] = &*x;
    let z: &[i32] = &x; // compiler will add the *
    println!("{}", y.len());
    println!("{:?}", z);
};

3
[1, 2, 3]


`Vec<T>` is very similar to `Box<[T]>`. Both are pointers that own an array on the heap, but `Vec` has an additional resize capability.

Since `Vec<T>` itself is a pointer, `&Vec<T>` is not very useful as a parameter type and we should use `&[T]`. It has better performance (because it is one level of indirection, not two) and is more general:

In [43]:
fn foo(x: &[i32]) {
    println!("{}, {:?}", x.len(), x);
}

let array_on_stack: [i32; 4] = [1, 2, 3, 4];
let vector: Vec<i32> = vec![5, 6, 7];
let boxed_array: Box<[i32]> = Box::new([9, 10]);
foo(&array_on_stack);
foo(&vector);
foo(&boxed_array);
foo(&array_on_stack[1..]);
foo(&vector[..2]);

4, [1, 2, 3, 4]
3, [5, 6, 7]
2, [9, 10]
3, [2, 3, 4]
2, [5, 6]


Similarly, `&&T`, `&Box<T>`, `&&mut T` and such should be avoided in favor of `&T`.

`&mut Vec<T>` might be useful sometimes:

In [44]:
fn vec_pusher(v: &mut Vec<i32>) {
    v.push(1); // not possible with &mut [i32]
    v.push(2);
    v.push(3);
}
let mut v = vec![4, 5, 6];
vec_pusher(&mut v);
vec_pusher(&mut v);
vec_pusher(&mut v);
v

[4, 5, 6, 1, 2, 3, 1, 2, 3, 1, 2, 3]

Rust also has an equivalent to `shared_ptr`, called `Arc`. This is useful when the resource doesn't have a specific owner. It moves the resource to the heap and counts its users. It increases the count on clone, decreases the count on drop, and frees the resource when the count reaches zero.

For example, suppose we want to share a resource between some threads, but we don't want to join them in the main thread. Then we can't use scoped threads and keep the shared resource on the stack of the main thread, so we should use `Arc`:
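The counting can be watched directly with `Arc::strong_count`. The following single-threaded sketch only illustrates the bookkeeping, it is not a realistic use of `Arc`:

```rust
use std::sync::Arc;

fn main() {
    let a = Arc::new(vec![1, 2, 3]);
    assert_eq!(Arc::strong_count(&a), 1);

    let b = Arc::clone(&a); // same as a.clone(): bumps the counter, no deep copy
    assert_eq!(Arc::strong_count(&a), 2);
    // Both handles point at the same heap allocation.
    assert!(Arc::ptr_eq(&a, &b));

    drop(b); // decrements the counter
    assert_eq!(Arc::strong_count(&a), 1);
    println!("final count: {}", Arc::strong_count(&a));
    // `a` is dropped at the end of main; the count reaches zero and the vector is freed.
}
```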

In [57]:
use std::{thread, sync::Arc, time::Duration};
{
    let a = Arc::new(DebugDrop("a"));
    for i in 1..10 {
        let a_clone = a.clone(); // increment the reference counter
        thread::spawn(move || {
            println!("thread {}: shared resource is {}", i, a_clone.0);
            // a_clone will be dropped here
        });
    }
    println!("goodbye from main thread");
    // original a is dropped here, decreasing the counter
};
// wait for other threads to finish
thread::sleep(Duration::from_millis(500));

thread 1: shared resource is a
thread 2: shared resource is a
thread 4: shared resource is a
thread 7: shared resource is a
goodbye from main thread
thread 3: shared resource is a
thread 6: shared resource is a
thread 9: shared resource is a
thread 8: shared resource is a
thread 5: shared resource is a
a has been dropped


Atomic operations are not free, especially when several cores contend for the same counter. It is better to avoid `.clone` where we can.

Moving the `Arc` is one way to avoid clone:

In [58]:
fn some_function_that_needs_arc(x: Arc<Vec<i32>>, print_len_as_well: bool, id: i32) {
    thread::spawn(move || {
        if print_len_as_well {
            // Remember: Arc is a smart pointer (overloads deref) and the compiler inserts the * operator
            // implicitly in many places (like around method calls), so `.len` here is `Vec::len`, not a method of Arc.
            println!("thread {}: {:?}. len is {}", id, x, x.len());
        } else {
            println!("thread {}: {:?}", id, x);
        }
    });
}
let a = Arc::new(vec![1, 2, 3]);
some_function_that_needs_arc(a.clone(), false, 1);
some_function_that_needs_arc(a.clone(), true, 2);
// we no longer need a after this function call, so we can move a to it and not clone
some_function_that_needs_arc(a, false, 3);
// wait for threads
thread::sleep(Duration::from_millis(500));

thread 2: [1, 2, 3]. len is 3
thread 1: [1, 2, 3]
thread 3: [1, 2, 3]


If a function doesn't need to do anything `Arc<T>`-specific, it can take `&T`.

In [66]:
fn some_function_that_does_not_need_arc(x: &[i32], print_len_as_well: bool, id: i32) {
    if print_len_as_well {
        println!("thread {}: {:?}. len is {}", id, x, x.len());
    } else {
        println!("thread {}: {:?}", id, x);
    }
}

fn some_function_that_needs_arc(x: Arc<Vec<i32>>, print_len_as_well: bool, id: i32) {
    thread::spawn(move || {
        some_function_that_does_not_need_arc(&x, print_len_as_well, id);
        some_function_that_does_not_need_arc(&x, print_len_as_well, id);
        // we here used x two times, but without increasing the reference counter
    });
}
let a = Arc::new(vec![1, 2, 3, 4]);
some_function_that_needs_arc(a.clone(), false, 1);
some_function_that_needs_arc(a.clone(), true, 2);
some_function_that_needs_arc(a, false, 3);
// wait for threads
thread::sleep(Duration::from_millis(500));

thread 2: [1, 2, 3, 4]. len is 4
thread 1: [1, 2, 3, 4]
thread 1: [1, 2, 3, 4]
thread 3: [1, 2, 3, 4]
thread 3: [1, 2, 3, 4]
thread 2: [1, 2, 3, 4]. len is 4


`Arc` stands for "atomically reference counted". There is a non-atomic version in the standard library as well:

In [70]:
use std::rc::Rc;
struct Human {
    room: Rc<DebugDrop>,
}
let hamid = Human { room: Rc::new(DebugDrop("TV Room")) }; // hamid turns on the TV room lights
let majid = Human { room: hamid.room.clone() }; // majid joins hamid
let mut saeed = Human { room: Rc::new(DebugDrop("Living Room")) };
println!("position 1");
saeed.room = majid.room.clone(); // saeed goes to the TV room and leaves the living room
drop(hamid); // hamid leaves, but the TV room stays alive because others still hold it
println!("position 2");
drop(majid);
drop(saeed);
println!("position 3");

position 1
Living Room has been dropped
position 2
TV Room has been dropped
position 3
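
The reference count behind this behaviour can be observed directly. The following is a minimal sketch, assuming a plain `String` in place of `DebugDrop`; the helper name `observe_counts` is made up for illustration. It records `Rc::strong_count` after each clone and drop:

```rust
use std::rc::Rc;

// Hypothetical helper: returns the strong count seen after each step.
fn observe_counts() -> Vec<usize> {
    let mut counts = Vec::new();
    let room = Rc::new(String::from("TV Room"));
    counts.push(Rc::strong_count(&room)); // only `room` owns the value
    let second = Rc::clone(&room);
    counts.push(Rc::strong_count(&room)); // `room` and `second`
    let third = Rc::clone(&room);
    counts.push(Rc::strong_count(&room)); // three owners
    drop(second);
    counts.push(Rc::strong_count(&room)); // back to two owners
    drop(third);
    counts.push(Rc::strong_count(&room)); // only `room` again
    counts
}

fn main() {
    println!("{:?}", observe_counts()); // prints [1, 2, 3, 2, 1]
}
```

The value itself is freed only when the count reaches zero, which is why "TV Room" was dropped only after the last holder went away.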


Rust enforces thread safety at compile time, so using `Rc` across multiple threads results in a compiler error. (This is one of the reasons the C++ standard library has no equivalent for it: people would shoot themselves in the foot by sharing it between threads.)

In [71]:
{
    let a = Rc::new(DebugDrop("a"));
    for i in 1..10 {
        let a_clone = a.clone();
        thread::spawn(move || {
            println!("thread {}: shared resource is {}", i, a_clone.0);
        });
    }
    println!("goodbye from main thread");
};

Error: `Rc<DebugDrop>` cannot be sent between threads safely
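
One way to make such a program compile is to switch from `Rc` to `Arc`. The sketch below assumes a `String` as the shared resource instead of `DebugDrop`, and the helper `share_across_threads` is a hypothetical name. It also joins the threads rather than sleeping, so the main thread reliably waits for them:

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical helper: spawns `n` threads that read the shared string
// and return its length.
fn share_across_threads(n: usize) -> Vec<usize> {
    let shared = Arc::new(String::from("a"));
    let handles: Vec<_> = (0..n)
        .map(|i| {
            // Each thread gets its own handle; the count is updated atomically.
            let shared_clone = Arc::clone(&shared);
            thread::spawn(move || {
                println!("thread {}: shared resource is {}", i, shared_clone);
                shared_clone.len()
            })
        })
        .collect();
    // Wait for every thread and gather the results.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let lens = share_across_threads(3);
    println!("goodbye from main thread, results: {:?}", lens);
}
```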

In C++, smart pointers are encouraged, because raw C pointers are very dangerous. In Rust they are discouraged, because of their performance cost (heap allocation, reference count updates, ...). Smart pointers are one of the defensive programming tactics of C++ that we would like to avoid in Rust. But sometimes they are unavoidable, and we can't make the compiler accept our (usually wrong) program without them.

## Memory leaks

Preventing memory leaks is not among Rust's memory safety guarantees. Rust even has a dedicated function for leaking values.

In [72]:
let leaked_ref: &'static [i32];
{
    let array_to_leak = [1, 2, 3];
    let array_on_heap = Box::new(array_to_leak);
    // array_on_heap is a box, so it would normally be dropped at the end of the block
    leaked_ref = Box::leak(array_on_heap);
}
leaked_ref

[1, 2, 3]

Or we can see that with `DebugDrop`:

In [77]:
let x = Box::new(DebugDrop("will run"));
let y = Box::new(DebugDrop("will never run"));
Box::leak(y);

will run has been dropped


You: But why doesn't Rust guarantee the absence of memory leaks? Isn't a memory leak a bad thing?

Yes, it is a bad thing, but it isn't very clear what counts as a memory leak. Look at this example:

In [83]:
let mut v = vec![1];
for _ in 1..100 {
    let x = v.iter().rev().take(2).sum::<i32>() % 500;
    v.push(x);
}
println!("{:?}", &v[0..10]); // for showing that it is fibonacci
v.last().unwrap()

[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]


75

This computes the 100th Fibonacci number modulo 500. But it does not use memory well: it only ever needs the last two elements of `v`, so the rest could be freed. Now imagine that the loop above is handling requests, pushing the results of previous requests into a vector (or hashmap) and using the recent results to answer the current request. Over time that vector will keep growing until the program eventually runs out of memory. This is effectively a memory leak, but tools like Valgrind won't report it, because the vector is dropped at the end of its scope.
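
For comparison, here is a sketch of the same computation that keeps only the last two values, so its memory use stays constant no matter how many steps it runs. The function name `fib_mod` and its parameters are made up for this illustration:

```rust
// Hypothetical helper: the n-th Fibonacci number (1-based, F1 = F2 = 1)
// modulo `m`, keeping only the last two values.
fn fib_mod(n: u32, m: i32) -> i32 {
    let (mut prev, mut curr) = (1 % m, 1 % m);
    for _ in 2..n {
        let next = (prev + curr) % m;
        prev = curr;
        curr = next;
    }
    curr
}

fn main() {
    println!("{}", fib_mod(10, 500)); // prints 55
    println!("{}", fib_mod(100, 500)); // prints 75
}
```

Neither the compiler nor a leak detector can tell that the vector version holds on to data it no longer needs; that judgment depends on what the program is meant to do.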

## Mutability and memory safety

In the thread safety chapter, we saw that Rust disallows multiple mutable references and achieves thread safety that way. Here we will show that this rule is also necessary in single-threaded code, for memory safety.

Look at this example, which doesn't compile:

In [85]:
{
    let mut v = vec![1, 2, 3, 4];
    let ref_to_0 = &v[0];
    v.push(5);
    *ref_to_0
}

Error: cannot borrow `v` as mutable because it is also borrowed as immutable

Although this code is single-threaded and can't have thread safety issues, it does have a memory safety issue. The `push` call can (and in this case will) reallocate the vector's memory, but the reference points into the first allocation and won't be updated automatically. It would therefore be invalid on the last line, and the compiler correctly prevents us from using it.
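
Two common ways to satisfy the borrow checker here are sketched below, under the assumption that we only need the value of the first element. The function name `first_after_push` is hypothetical:

```rust
// Hypothetical helper showing two ways to avoid holding a reference
// across the mutation.
fn first_after_push() -> i32 {
    let mut v = vec![1, 2, 3, 4];
    // Option 1: copy the value out, so no borrow is alive during `push`.
    let copy_of_0 = v[0];
    v.push(5);
    // Option 2: take the reference only after the mutation is done,
    // so it points into the current allocation.
    let ref_to_0 = &v[0];
    copy_of_0 + *ref_to_0
}

fn main() {
    println!("{}", first_after_push()); // prints 2
}
```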

## Runtime borrow checking

In the previous chapter, we saw that `Mutex` lets us enforce unique mutable access at runtime through locking. We may want to enforce the borrow checking rules at runtime even in single-threaded programs, because the borrow checker's analysis is sometimes stricter than necessary. For example:

In [3]:
enum Log {
    Read,
    Write,
}
#[derive(Default)]
struct VecLogger {
    data: Vec<i32>,
    log: Vec<Log>,
}
fn read(x: &mut VecLogger) -> &Vec<i32> {
    x.log.push(Log::Read);
    &x.data
}
fn write(x: &mut VecLogger) -> &mut Vec<i32> {
    x.log.push(Log::Write);
    &mut x.data
}
{
    let mut logger = VecLogger::default(); // will set both vectors to empty
    let read1 = read(&mut logger);
    let read2 = read(&mut logger);
    assert!(read1 == read2);
}

Error: cannot borrow `logger` as mutable more than once at a time

The problem is that the compiler assumes the mutable borrow must stay alive at least as long as the result of `read`. There is nothing wrong with the compiler here: the signature of `read` says exactly that (via lifetime elision), and the compiler's analysis is "local". That is, the compiler looks only at the current function body and the signatures of other functions, never at their bodies. This is a very important property: it guarantees that changing a function's implementation won't break other people's code. C++ doesn't have this property; you can break other people's code by changing the implementation of a template function, which is a semver hazard.
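
To make the elided signature visible, the sketch below writes the lifetime out by hand. It uses a simplified `VecLogger` with a hypothetical `reads` counter instead of the `Log` vector, and shows that two reads are accepted as long as the first result is no longer used when the second borrow starts:

```rust
#[derive(Default)]
struct VecLogger {
    data: Vec<i32>,
    reads: usize, // simplified stand-in for the log vector (an assumption)
}

// What lifetime elision expands to: the result is tied to the same
// lifetime 'a as the *mutable* borrow of `x`.
fn read<'a>(x: &'a mut VecLogger) -> &'a Vec<i32> {
    x.reads += 1;
    &x.data
}

// Hypothetical helper: reads twice, but never keeps both results alive.
fn read_twice_sequentially() -> (usize, usize) {
    let mut logger = VecLogger::default();
    logger.data.push(7);
    // Each result is consumed immediately, so each mutable borrow ends
    // before the next one begins.
    let first_len = read(&mut logger).len();
    let second_len = read(&mut logger).len();
    (first_len + second_len, logger.reads)
}

fn main() {
    println!("{:?}", read_twice_sequentially()); // prints (2, 2)
}
```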

Now that we understand the problem, let's fix it:

In [10]:
use std::cell::RefCell;
#[derive(Debug)]
enum Log {
    Read,
    Write,
}
#[derive(Default, Debug)]
struct VecLogger {
    data: Vec<i32>,
    log: RefCell<Vec<Log>>,
}
fn read(x: &VecLogger) -> &Vec<i32> {
    // a shared reference to a RefCell is enough to mutate its contents
    let mut log_mut = x.log.borrow_mut();
    log_mut.push(Log::Read);
    &x.data
}
fn write(x: &mut VecLogger) -> &mut Vec<i32> {
    // here ^^^^ is needed anyway, because we hand out mutable access to `data`
    let mut log_mut = x.log.borrow_mut();
    log_mut.push(Log::Write);
    &mut x.data
}
{
    let mut logger = VecLogger::default(); // will set both vectors to empty
    let writer = write(&mut logger);
    writer.push(3);
    writer.push(5);
    let read1 = read(&logger);
    let read2 = read(&logger);
    assert!(read1 == read2);
    logger
}

VecLogger { data: [3, 5], log: RefCell { value: [Write, Read, Read] } }
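
The example above never violates the borrowing rules, so the runtime check stays invisible. The following sketch shows the check firing, using a bare `RefCell` and the hypothetical helper name `overlapping_borrows`. It uses `try_borrow_mut`, which returns an error on conflict; `borrow_mut` would panic in the same situation:

```rust
use std::cell::RefCell;

// Hypothetical helper: returns (conflict detected while the first borrow
// is alive, borrow succeeds after it is released).
fn overlapping_borrows() -> (bool, bool) {
    let log = RefCell::new(vec!["Write"]);
    let first = log.borrow_mut(); // exclusive borrow is now active
    // A second exclusive borrow is refused at runtime.
    let conflict_detected = log.try_borrow_mut().is_err();
    drop(first); // release the first borrow
    let ok_after_release = log.try_borrow_mut().is_ok();
    (conflict_detected, ok_after_release)
}

fn main() {
    println!("{:?}", overlapping_borrows()); // prints (true, true)
}
```

So `RefCell` does not relax the rule of unique mutable access; it moves the enforcement from compile time to runtime.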

## Things we learnt
* These things are impossible in safe Rust:
  * Use after free
  * Double free
  * Dereferencing an invalid pointer (including null)
  * Out-of-bounds access (and thus buffer overflows)
* To prevent them, Rust needs lifetime annotations
  * Rust-like borrow checker analysis is not applicable to C++, because it lacks such annotations
* Pattern matching and its usage in preventing bugs
* `Option`
* Smart pointers: `Rc`, `Arc`, `Vec`, `Box`
