# Unsafe, C and foreign function interface

Safe Rust guarantees memory safety, prevents data races, and enforces type safety. It does this in a world full of unsafe system calls, racy and nondeterministic hardware, and code in other languages like C/C++ that doesn't guarantee any of these but that we still need to interoperate with. Rust achieves this through safe abstractions over the unsafe things in the world.

## Safe abstraction

Imagine we have a database that we can connect to and store an integer in:

In [29]:
use std::{sync::atomic::{AtomicI32, Ordering}, time::Duration, thread};

static MOCK_STORAGE: AtomicI32 = AtomicI32::new(5);

pub struct Database;
pub struct Connection;

fn mock_network_operation() {
    std::thread::sleep(Duration::from_millis(100));
}

impl Database {
    fn connect(&self) -> Connection {
        mock_network_operation();
        Connection
    }
}

impl Connection {
    fn get(&self) -> i32 {
        mock_network_operation();
        MOCK_STORAGE.load(Ordering::SeqCst)
    }

    fn set(&self, value: i32) {
        mock_network_operation();
        MOCK_STORAGE.store(value, Ordering::SeqCst);
    }
}

let connection = Database.connect();
connection.set(12);
connection.get()

12

Now we want to send some commands to the database sequentially. For example, send a `get` to read the current value `prev`, and then `set` it to `prev + 1`:

In [15]:
fn increase(conn: &Connection) {
    conn.set(conn.get() + 1);
}

connection.set(23);
increase(&connection);
increase(&connection);
connection.get()

25

But now users of our `increase` function can call it from multiple threads and hit a race condition:

In [16]:
use std::thread;

connection.set(0);

thread::scope(|s| {
    for _ in 0..10 {
        s.spawn(|| {
            increase(&connection);
        });
    }
});

// ideally should be 10
connection.get()

1

Rust doesn't prevent race conditions in general (it only rules out data races), but we can help our users by providing a safe abstraction:

In [26]:
mod safe_connection { // Remember: Rust has module level privacy
    use crate::{Connection, Database};
    
    pub struct SafeAbstractConnection {
        inner_connection: Connection, // private field
    }

    impl SafeAbstractConnection {
        pub fn new(db: &Database) -> Self {
            Self {
                inner_connection: db.connect(),
            }
        }

        pub fn get(&self) -> i32 {
            self.inner_connection.get()
        }
    
        pub fn set(&self, value: i32) {
            self.inner_connection.set(value);
        }

        pub fn increase(&mut self) { // It doesn't need to be &mut, but we use &mut to enforce that
            // the caller has unique access to this connection, either because they are in a
            // single-threaded setup or because they are using a Mutex or similar.
            self.set(self.get() + 1);
        }
    }
}

let mut safe_connection = safe_connection::SafeAbstractConnection::new(&Database);
safe_connection.set(12);
safe_connection.increase();
safe_connection.get()

13

Now misusing the `increase` method results in a compile error:

In [28]:
thread::scope(|s| {
    for _ in 0..10 {
        s.spawn(|| {
            safe_connection.increase();
        });
    }
});

Error: cannot borrow `safe_connection` as mutable more than once at a time

Although this won't prevent deliberate misuse (a user can still create a `Connection` manually, and in a real-world setting they could open a TCP connection to the database directly), it makes accidental mistakes hard.
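
The comment in `increase` mentions `Mutex` as the way to share such a connection between threads. A minimal sketch of that, assuming an in-memory stand-in type (`Counter` and `increase_from_threads` are hypothetical names, not part of the code above): locking hands each thread the unique `&mut` access that `increase` demands, so the ten increments no longer race.

```rust
use std::sync::Mutex;
use std::thread;

// Hypothetical stand-in for `SafeAbstractConnection`: `increase` takes `&mut self`.
struct Counter {
    value: i32,
}

impl Counter {
    fn increase(&mut self) {
        self.value += 1;
    }
}

fn increase_from_threads(n_threads: usize) -> i32 {
    let counter = Mutex::new(Counter { value: 0 });
    thread::scope(|s| {
        for _ in 0..n_threads {
            s.spawn(|| {
                // Locking gives this thread the unique `&mut` access `increase` requires.
                counter.lock().unwrap().increase();
            });
        }
    });
    // All scoped threads are joined when `scope` returns, so we can take the value out.
    counter.into_inner().unwrap().value
}
```

With the real connection, the get and set would happen while the lock is held, so no other thread can interleave between them.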

All of Rust's safety is a safe abstraction over the unsafe world. For example, `Vec` and `Box` are abstractions over heap allocation. But if they are built on something lower-level that is not safe, can we use that directly and violate Rust's safety guarantees? Something must prevent us.

## Calling C functions from Rust

Since Rust is a low-level language, it can link to and call C functions. On many targets (a compilation target roughly means a pair of OS and CPU architecture), Rust binaries link to the C standard library by default (because Rust's `std` uses it), and we can call it directly through the `libc` crate (which is not special in any way):

In [52]:
:dep libc = "0.2"

In [32]:
unsafe {
    let mut result = 0;
    libc::time(&mut result);
    let tm_ptr = libc::localtime(&result);
    let tm = *tm_ptr;
    println!("{}:{}:{}", tm.tm_hour, tm.tm_min, tm.tm_sec);
};

21:42:37


`time` and `localtime` are C functions, and an `unsafe` block is needed to call C functions:

In [33]:
let mut result = 0;
libc::time(&mut result);

Error: call to unsafe function is unsafe and requires unsafe function or block

We will see what `unsafe` actually is later. Before that, let's see the signature of the `time` function by tricking the compiler into telling us:

In [40]:
libc::time == libc::time

Error: binary operation `==` cannot be applied to type `unsafe extern "C" fn(*mut i64) -> i64 {time}`

There are many new things here. Let's break them down:
* `unsafe`: this function is unsafe to call. More on this later.
* `extern "C"`: this function must be called with the C calling convention and ABI.
* `*mut i64`: the first argument is a mutable raw pointer.
* `-> i64`: the function returns a 64-bit signed integer (nothing new here).
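
Both `unsafe` and `extern "C"` are part of the function's type, so they also appear in function pointer types. A small sketch (the function names are made up for illustration) that defines a C-ABI function in Rust and calls it through a safe and through an `unsafe` function pointer:

```rust
use std::os::raw::c_int;

// A Rust function compiled with the C calling convention.
extern "C" fn add_one(x: c_int) -> c_int {
    x + 1
}

// The ABI is part of the function pointer type.
fn call_c_abi(f: extern "C" fn(c_int) -> c_int, x: c_int) -> c_int {
    f(x) // calling a safe `extern "C" fn` pointer needs no `unsafe`
}

fn call_unsafe_c_abi(f: unsafe extern "C" fn(c_int) -> c_int, x: c_int) -> c_int {
    // Calling through an `unsafe` function pointer (the kind of type `libc::time` has)
    // requires an `unsafe` block.
    unsafe { f(x) }
}
```

A safe function can be passed where an `unsafe` one is expected (`call_unsafe_c_abi(add_one, 1)`), but not the other way around.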

## Raw pointer

Rust has the bad old C pointers too. It calls them `raw pointers`, written `*const T` and `*mut T`, and needs them for interoperability with C code.

You can create them from a reference, using the `&` operator (the reference coerces to a raw pointer):

In [41]:
let x: i32 = 2;
let x_raw_ptr: *const i32 = &x;
x_raw_ptr

0x7ffd70bb25ac

Or out of thin air, from an arbitrary address:

In [43]:
let y_raw_ptr: *const i32 = 12 as *const i32;
y_raw_ptr

0xc

Or as a null pointer, for which the standard library even has a function:

In [4]:
let null_raw_ptr: *const i32 = std::ptr::null();
null_raw_ptr

0x0

As in C, the pointee type and the mutability of a raw pointer are not enforced: you can cast freely between them, so they are mostly a hint for the programmer:

In [2]:
let x: i32 = 2;
let x_raw_ptr: *const i32 = &x;
let ptr_with_bad_type: *mut () = x_raw_ptr as *mut (); // cast from `const int*` to `void*`, similar to C
ptr_with_bad_type

0x7ffc761972ec

And this was all safe Rust! So were memory safety and type safety a lie? Fortunately, safe Rust doesn't let us dereference these pointers:

In [5]:
let a_bad_i32: i32 = *null_raw_ptr;

Error: dereference of raw pointer is unsafe and requires unsafe function or block

## Unsafe Rust

Some operations, like calling C functions and dereferencing raw pointers, can't be checked by the compiler against Rust's rules. Safe Rust is conservative and rejects such operations with hard compile-time errors. But Rust users do need them. An `unsafe` block is a way to tell the compiler "I know what I'm doing". By using `unsafe`, you become responsible for guaranteeing that the code is free of memory bugs and data races, and not only your own code, but also all the safe code that interacts with it. Unsafe code can cause undefined behavior (UB) that only shows up later, for example as a segmentation fault in safe code. Safe code on its own can't cause a segmentation fault, so the blame always lies with some unsafe code.
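
In practice, an `unsafe` block should be as small as possible, and the conditions it relies on should be written down. A sketch of the usual convention, with a made-up helper `read_or_none`: a `# Safety` doc section states what callers must guarantee, and `<*const T>::as_ref` handles the null case for us. It cannot check that a non-null pointer is valid, though, so the helper itself stays `unsafe`.

```rust
/// Hypothetical helper: returns the pointee, or `None` for a null pointer.
///
/// # Safety
/// If `p` is non-null, it must point to a valid, initialized `i32`.
unsafe fn read_or_none(p: *const i32) -> Option<i32> {
    // `as_ref` turns a null pointer into `None` and anything else into `Some(&i32)`.
    // It is `unsafe` because it can't check that a non-null pointer is valid.
    unsafe { p.as_ref().copied() }
}
```

Calling it with `&x` or with `std::ptr::null()` is fine; calling it with `12 as *const i32` would be UB, and the blame would be on the caller's `unsafe` block.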

Let's abuse our power and violate everything safe Rust guarantees. We will try not to cause a segmentation fault, because that would crash the Jupyter kernel process and we would have to restart the notebook. But we can still do bad things, like use-after-free:

In [10]:
{
    let a_box: Box<i32> = Box::new(5);
    let a_raw_ptr: *mut i32 = Box::into_raw(a_box); // this consumes the box and returns it as a raw pointer
    // `Box::into_raw` is safe, but it can cause a memory leak.
    let a_value_valid = unsafe { *a_raw_ptr }; // reading through a raw pointer is unsafe
    unsafe {
        drop(Box::from_raw(a_raw_ptr)); // we recover the box from the pointer and immediately drop it.
        // `Box::from_raw` is unsafe, since it can be called with a garbage pointer
    }
    let b_box = Box::new(12); // some allocation to replace the previous freed one
    let a_value_invalid = unsafe { *a_raw_ptr };
    (a_value_valid, a_value_invalid) 
}

(5, -1429802653)

The first read was valid, but the second one is a use-after-free, which is undefined behavior. The same goes for double free, out-of-bounds access, and every other memory bug.
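
For contrast, a sketch of the correct pattern (the function `round_trip` is hypothetical): ownership is handed out as a raw pointer, as you might when passing a value to C, and reclaimed with `Box::from_raw` exactly once, after which the pointer is never touched again.

```rust
/// Hypothetical example: hand ownership of a heap value out as a raw pointer
/// and take it back exactly once.
fn round_trip(value: i32) -> i32 {
    let raw: *mut i32 = Box::into_raw(Box::new(value));
    // SAFETY: `raw` came from `Box::into_raw` and has not been freed yet.
    unsafe {
        *raw += 1;
    }
    // SAFETY: reclaim ownership exactly once; the Box frees the memory when it is dropped.
    let boxed: Box<i32> = unsafe { Box::from_raw(raw) };
    // From here on `raw` is dangling: reading through it again would be the use-after-free above.
    *boxed
}
```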

Similarly, `unsafe` can break thread safety and cause a data race:

In [36]:
let x = 0;
thread::scope(|s| {
    for _ in 0..10 {
        s.spawn(|| {
            let x_raw_ptr: *const i32 = &x;
            let x_raw_ptr_mut: *mut i32 = x_raw_ptr as *mut i32;
            unsafe {
                *x_raw_ptr_mut += 1;
            }
        });
    }
});
x

10

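It happened to work, but without proper synchronization it is not guaranteed to (and writing through a pointer derived from a shared reference to an immutable variable is already UB on its own). That's one reason why UB bugs are so bad: they may work in development and then break in production.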
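
A race-free sketch of the same experiment (the function name `count_with_threads` is ours): replacing the plain `i32` with an `AtomicI32` makes every increment a synchronized read-modify-write, with no `unsafe` at all.

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::thread;

fn count_with_threads(n_threads: usize) -> i32 {
    let x = AtomicI32::new(0);
    thread::scope(|s| {
        for _ in 0..n_threads {
            s.spawn(|| {
                // `fetch_add` is atomic, so concurrent increments can't be lost.
                x.fetch_add(1, Ordering::Relaxed);
            });
        }
    });
    // Joining the scoped threads makes all their increments visible here.
    x.load(Ordering::Relaxed)
}
```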

`unsafe` can break not only thread safety and memory safety, but type safety and privacy as well. Here, we will use `transmute` (similar to `reinterpret_cast` in C++) to read the private fields of a vector:

In [39]:
use std::mem::transmute;

let mut v = vec![1, 2, 3];
v.push(4);
let v_fields: [usize; 3] = unsafe {
    transmute(v)
};
v_fields

[93930869322592, 6, 4]

As you can probably guess, the first field here is the allocation pointer, the second is the capacity, and the third is the actual length. The vector started with an allocation for 3 elements; the push reallocated it with capacity 6 and set the length to 4. Keep in mind that `Vec`'s layout is unspecified, so the field order may be different with another compiler version.

Not only can we see the private fields, we can mutate them as well:

In [41]:
let mut v = vec![1, 2, 3];
v.push(4);
let v_fields: &mut [usize; 3] = unsafe {
    transmute(&mut v)
};
v_fields[0] += 4; // move the pointer 4 bytes (one i32) ahead
let (a, b, c, d) = (v[0], v[1], v[2], v[3]);
v_fields[0] -= 4; // undo our shenanigans, to prevent a segmentation fault while dropping the vector
(a, b, c, d)

(2, 3, 4, 32)

The 32 comes from uninitialized memory. We broke the vector's invariants, so it can no longer protect us from out-of-bounds access by panicking.
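
For comparison, the pointer, capacity, and length are all available through `Vec`'s safe public API, without depending on its unspecified layout. A tiny sketch (`vec_parts` is a hypothetical name):

```rust
/// Returns (allocation pointer, capacity, length) through public, stable methods.
fn vec_parts(v: &Vec<i32>) -> (*const i32, usize, usize) {
    (v.as_ptr(), v.capacity(), v.len())
}
```

The exact capacity after a `push` is an implementation detail, so only `capacity() >= len()` should be relied on.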

So if `unsafe` Rust is so dangerous, why does Rust even allow it? `unsafe` is just a tool, and it has good, bad, correct, and incorrect uses.
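
A classic good use, similar to the example in The Rust Programming Language book, is splitting one mutable slice into two. The borrow checker can't see that the two halves are disjoint, but we can check it ourselves and wrap the `unsafe` part in a safe function. A sketch (`my_split_at_mut` is our own name; the standard library already provides `<[T]>::split_at_mut`):

```rust
use std::slice;

fn my_split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();
    // This check is what makes the unsafe block below sound.
    assert!(mid <= len);
    // SAFETY: both halves are inside the original slice and do not overlap,
    // so handing out two `&mut` slices cannot create aliasing mutable references.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}
```

Callers never write `unsafe`: a bad `mid` panics instead of causing UB.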

## Implementing vector

As an example of a good use of `unsafe`, we will implement a vector. This implementation differs vastly from the one in the standard library: it is not generic (it only supports `i32`), and it uses C's `realloc` and `free` instead of Rust's allocation API, to keep things simple and familiar for you, a reader who knows C and C++ memory management.


In [53]:
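mod my_vec {
    use std::mem::size_of;
    use std::ops::{Deref, DerefMut};
    use std::ptr;
    use libc::{c_void, free, realloc};

    pub struct MyVec {
        ptr: *mut i32, // null until the first allocation
        cap: usize,    // capacity, in elements
        len: usize,    // number of initialized elements
    }

    impl MyVec {
        pub fn new() -> MyVec {
            MyVec {
                ptr: ptr::null_mut(),
                cap: 0,
                len: 0,
            }
        }

        // private function
        fn grow(&mut self) {
            // Doubling a capacity of 0 would stay 0 forever, so start with 4.
            let new_cap = if self.cap == 0 { 4 } else { self.cap * 2 };
            // `realloc` takes a size in bytes, not in elements.
            let new_size = new_cap.checked_mul(size_of::<i32>()).expect("capacity overflow");
            // *mut c_void is void* in C. Rust's () is not exactly equal to C's void.
            let new_ptr = unsafe { realloc(self.ptr as *mut c_void, new_size) } as *mut i32;
            if new_ptr.is_null() {
                // We should pay attention to this detail, otherwise it is UB, since safe code
                // could dereference this pointer through our API. The old block is still
                // owned by `self.ptr`, so `Drop` will free it.
                panic!("Out of memory");
            }
            self.ptr = new_ptr;
            self.cap = new_cap;
        }

        pub fn push(&mut self, elem: i32) {
            if self.len == self.cap {
                self.grow();
            }
            unsafe {
                // `len < cap` here, so this slot is inside the allocation.
                *self.ptr.add(self.len) = elem;
            }
            self.len += 1;
        }

        /// # Safety
        /// `new_len` must be at most the capacity, and the first `new_len`
        /// elements must be initialized.
        pub unsafe fn set_len(&mut self, new_len: usize) {
            self.len = new_len;
        }
    }

    impl Deref for MyVec {
        type Target = [i32];
        fn deref(&self) -> &[i32] {
            if self.ptr.is_null() {
                // `from_raw_parts` requires a non-null pointer, even for an empty slice.
                return &[];
            }
            unsafe {
                // create a fat pointer manually from a raw pointer and a len
                std::slice::from_raw_parts(self.ptr, self.len)
            }
        }
    }

    impl DerefMut for MyVec {
        fn deref_mut(&mut self) -> &mut [i32] {
            if self.ptr.is_null() {
                return &mut [];
            }
            unsafe {
                // create a fat pointer manually from a raw pointer and a len
                std::slice::from_raw_parts_mut(self.ptr, self.len)
            }
        }
    }

    impl Drop for MyVec {
        fn drop(&mut self) {
            unsafe {
                free(self.ptr as *mut c_void); // deallocate the memory (free(NULL) is a no-op)
            }
        }
    }
}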

Our vector supports push:

In [54]:
use my_vec::MyVec;

let mut v = MyVec::new();
v.push(1);
v.push(2);
v.push(3);
v.push(4);
v.push(5);

It converts to a slice with the dereference operator:

In [55]:
{
    let v_slice: &[i32] = &*v;
    v_slice
}

[1, 2, 3, 4, 5]

Slice methods like `len` and `get` are available thanks to auto-deref:

In [56]:
v.len()

5

In [57]:
v.get(3)

Some(4)

Indexing works through the same auto-deref mechanism as methods:

In [58]:
v[0] + v[1] + v[2]

6

Mutation works because of the `DerefMut` implementation:

In [59]:
v[2] = 12;
v[1..=3]

[2, 12, 4]

And all of that is safe! Our fields are private, so safe code outside the module can't break the vector's invariants. And because `push` and `grow` require a mutable reference, only one thread can access the vector during the whole execution of those methods.

Safe code inside the module could still break memory safety (for example, by setting `len` larger than `cap`), so the fact that the module's safe code preserves the invariants is part of why our use of `unsafe` is valid.

But there is one public function that doesn't look very good: `set_len`. It changes `len` unconditionally. We can use it to simulate `pop`:

In [63]:
println!("{:?}", &*v);
unsafe {
    v.set_len(3);
}
println!("{:?}", &*v);

[1, 2, 12, 4, 5]
[1, 2, 12]


But misusing it can cause UB:

In [66]:
unsafe {
    v.set_len(7);
}
println!("{:?}", &*v);

[1, 2, 12, 4, 5, 0, 137857]


But our implementation is still sound, because an `unsafe` block is needed to trigger the UB. Without the `unsafe` block, the code above wouldn't compile:

In [67]:
v.set_len(7);
println!("{:?}", &*v);

Error: call to unsafe function is unsafe and requires unsafe function or block

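There is nothing inherently unsafe about `set_len`; it just sets a field in a struct. But since we marked it `unsafe` in its signature, it can't be called from safe Rust. This is how we tell callers that the function has preconditions the compiler can't check, and that upholding them is their job.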
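
The standard library's `Vec` has the same kind of function, `Vec::set_len`, and its documentation states the contract: the new length must be at most the capacity, and the elements up to it must be initialized. A sketch of using it soundly to implement a pop (the helper name `pop_last` is hypothetical; real code would just call `Vec::pop`):

```rust
fn pop_last(v: &mut Vec<i32>) -> Option<i32> {
    if v.is_empty() {
        return None;
    }
    let new_len = v.len() - 1;
    let last = v[new_len];
    // SAFETY: `new_len` is smaller than the old length, so it is <= capacity and every
    // element below it is still initialized. `i32` has no destructor, so "forgetting"
    // the last element leaks nothing.
    unsafe { v.set_len(new_len) };
    Some(last)
}
```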

We can declare any function `unsafe`:

In [69]:
{
    unsafe fn hello_world() {
        println!("hello world");
    }

    hello_world();
}

Error: call to unsafe function is unsafe and requires unsafe function or block

## Avoid unsafe

That vector implementation was easy, right? As a C/C++ programmer, you will sometimes find Rust's borrow-checking rules annoying and want to do things the C way, and `unsafe` will look very handy. Don't use `unsafe` there; look for the Rusty way of doing it instead.

Writing unsafe code that is provably fine is outright impossible today, since exactly what is fine and what is UB hasn't been fully defined yet. Worse, something that is widely considered fine today might be declared UB tomorrow. This has happened in the past: notably, `mem::uninitialized()` was considered fine to use, but now calling it for almost any type, including integers, is considered immediate UB.

That means this code was considered fine in old Rust versions:

In [5]:
{
    use std::mem::uninitialized;
    // SAFETY: we assign to it before using it:
    let mut x: i32 = unsafe { uninitialized() };
    x = 5;
    x
}

5

But now it is considered UB. The correct version today is: 

In [73]:
use std::mem::MaybeUninit;
let mut x: MaybeUninit<i32> = MaybeUninit::uninit();
x.write(5);
unsafe { x.assume_init() }

5
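
`MaybeUninit` is also the usual tool for out-parameters, which are common in C APIs: you reserve uninitialized space, let a function write into it, and only then assume it is initialized. A sketch with a hypothetical C-style function `write_answer` standing in for a real FFI call:

```rust
use std::mem::MaybeUninit;

/// Hypothetical C-style function that reports its result through an out-pointer.
///
/// # Safety
/// `out` must be valid for writing one `i32`.
unsafe fn write_answer(out: *mut i32) {
    unsafe { out.write(42) };
}

fn get_answer() -> i32 {
    let mut slot = MaybeUninit::<i32>::uninit();
    // SAFETY: `as_mut_ptr` points to space for exactly one i32.
    unsafe { write_answer(slot.as_mut_ptr()) };
    // SAFETY: `write_answer` has initialized the value.
    unsafe { slot.assume_init() }
}
```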

If you don't understand what is wrong with the `uninitialized` version, then you are not qualified to write unsafe Rust. Writing correct unsafe Rust requires a lot of experience with Rust, plus reading the Rustonomicon, the Unsafe Code Guidelines, and much more. It is a good idea to let the people who write the standard library and trusted third-party crates do that hard work for you.

You can disallow unsafe code with `#![forbid(unsafe_code)]` globally in a crate, or `#[forbid(unsafe_code)]` for a single function:

In [6]:
#[forbid(unsafe_code)]
fn do_bad_things() -> i32 {
    let a_pointer = 12 as *mut i32;
    unsafe { *a_pointer }
}

Error: usage of an `unsafe` block

For tracking unsafe code in your dependencies, you can use `cargo-geiger`.

Hopefully these suggestions keep you from using `unsafe` unnecessarily. Let's continue with FFI.

## FFI

We have seen how we can use the `libc` crate to call C functions. But the `libc` crate is nothing special: it just declares `extern` functions without bodies, and the linker resolves them against the actual C library. We can do that ourselves:

In [8]:
extern "C" {
    #[link_name="time"]
    fn c_time_in_rust(output: *mut i64) -> i64; // must match C's signature; time_t is i64 on 64-bit Linux
}

let mut result = 0;
unsafe {
    // calling extern functions requires unsafe
    c_time_in_rust(&mut result);
}
result

1662058529
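
The same approach works for any C function, and the next step is usually a safe wrapper. A sketch that declares `strlen` by hand and wraps it (`c_strlen` is our own name). It assumes the C standard library is linked, as it is by default on common targets, and writes the block as `unsafe extern "C"`, which the 2024 edition requires and recent compilers accept in older editions; the notebook's older compiler uses plain `extern "C"` as above.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

unsafe extern "C" {
    // From the C standard library: size_t strlen(const char *s);
    fn strlen(s: *const c_char) -> usize;
}

/// Safe wrapper: a `&CStr` is guaranteed to be NUL-terminated, which is exactly
/// what `strlen` needs.
fn c_strlen(s: &CStr) -> usize {
    // SAFETY: `s.as_ptr()` points to a valid, NUL-terminated string.
    unsafe { strlen(s.as_ptr()) }
}
```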

We can also export Rust functions to C. Since this notebook can't compile and link C code, there is nothing interesting to run here, so we just show the syntax:

In [10]:
#[no_mangle] // the compiler must not mangle (rename) this symbol in the object file
extern "C" fn a_callable_function_from_c(x: i32) -> i32 {
    // extern "C" means this function uses C's calling convention (ABI)
    x + 5
}
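
A common place where C calls back into Rust is a callback. As a sketch, here is C's `qsort`, declared by hand, sorting a Rust slice with a comparison function written in Rust (`compare_i32` and `c_sort` are hypothetical names). The callback is passed by pointer, so it needs `extern "C"` but not `#[no_mangle]`.

```rust
use std::mem::size_of;
use std::os::raw::{c_int, c_void};

unsafe extern "C" {
    // void qsort(void *base, size_t nmemb, size_t size,
    //            int (*compar)(const void *, const void *));
    fn qsort(
        base: *mut c_void,
        nmemb: usize,
        size: usize,
        compar: extern "C" fn(*const c_void, *const c_void) -> c_int,
    );
}

// Uses the C ABI so that C's qsort can call it.
extern "C" fn compare_i32(a: *const c_void, b: *const c_void) -> c_int {
    // SAFETY: qsort only passes pointers to elements of the array we gave it (i32s).
    let (a, b) = unsafe { (*(a as *const i32), *(b as *const i32)) };
    a.cmp(&b) as c_int // Ordering::Less/Equal/Greater cast to -1/0/1
}

fn c_sort(values: &mut [i32]) {
    // SAFETY: pointer, element count, and element size describe exactly `values`.
    unsafe {
        qsort(
            values.as_mut_ptr() as *mut c_void,
            values.len(),
            size_of::<i32>(),
            compare_i32,
        );
    }
}
```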

We can even take or return Rust types like `Vec` or `Option` in such functions (the compiler only warns, through the `improper_ctypes_definitions` lint):

In [12]:
#[no_mangle]
extern "C" fn some_fn(x: Vec<i32>) -> i32 {
    x.iter().sum()
}

But since their layout is unspecified, we can't do anything useful with them on the C side without shooting ourselves in the foot. To give our structs the same layout as in C, we can use `#[repr(C)]`:

In [13]:
use std::mem::size_of;

#[repr(C)]
struct ReprC {
    field1: u8,
    field2: u16,
    field3: u8,
}

struct ReprRust {
    field1: u8,
    field2: u16,
    field3: u8,
}

// By default, the compiler may reorder fields, e.g. to reduce padding.
(size_of::<ReprC>(), size_of::<ReprRust>())

(6, 4)
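
Where do the 6 bytes of `ReprC` come from? With C layout, fields stay in declaration order and each one is aligned to its own alignment, and the total size is rounded up to a multiple of the struct's alignment. A sketch that checks this arithmetic with `std::mem::offset_of!` (stable since Rust 1.77):

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
struct ReprC {
    field1: u8,  // offset 0
    field2: u16, // offset 2: one padding byte first, to align the u16 to 2
    field3: u8,  // offset 4
}                // plus one trailing padding byte: size rounds up to a multiple of 2, so 6

/// (offset of field1, offset of field2, offset of field3, size, alignment)
fn repr_c_layout() -> (usize, usize, usize, usize, usize) {
    (
        std::mem::offset_of!(ReprC, field1),
        std::mem::offset_of!(ReprC, field2),
        std::mem::offset_of!(ReprC, field3),
        size_of::<ReprC>(),
        align_of::<ReprC>(),
    )
}
```

`ReprRust` fits in 4 bytes because the compiler is free to put the two `u8` fields next to each other.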

By default (`#[repr(Rust)]`), the binary layout of Rust types is unspecified, to give the compiler and libraries the flexibility to change internals and optimize things. There are other `#[repr(...)]` attributes, like the primitive representations `#[repr(u8)]` or `#[repr(u32)]` for enums:

In [16]:
#[repr(u32)]
enum ReprU32 {
    Variant1 = 1,
    Variant2 = 100000,
}

size_of::<ReprU32>()

4
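
With a fixed representation, converting an enum to its integer is a plain cast. The other direction needs care: any `u32` other than 1 or 100000 would be an invalid `ReprU32`, so transmuting an arbitrary integer received from C would be UB. A sketch of a checked conversion (`to_raw` and `from_raw` are hypothetical names):

```rust
#[repr(u32)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum ReprU32 {
    Variant1 = 1,
    Variant2 = 100000,
}

fn to_raw(v: ReprU32) -> u32 {
    v as u32 // yields the declared discriminant
}

fn from_raw(raw: u32) -> Option<ReprU32> {
    // Check the value instead of transmuting it: unknown values become `None`, not UB.
    match raw {
        1 => Some(ReprU32::Variant1),
        100000 => Some(ReprU32::Variant2),
        _ => None,
    }
}
```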

By default, the compiler chooses the size of an enum, but you can specify it yourself, along with the discriminant value of each variant. There is also `#[repr(transparent)]` for structs with a single non-zero-sized field (plus, optionally, any number of zero-sized fields):

In [17]:
#[repr(transparent)]
struct Id(i64);

// Now Id has exactly the same layout and ABI as i64
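
Because `Id` has the same ABI as `i64`, a sketch like the following can use the type-safe newtype directly in a C-ABI signature where the C side sees a plain `int64_t` (`next_id` is a hypothetical function):

```rust
use std::mem::{align_of, size_of};

#[repr(transparent)]
struct Id(i64);

// To C, this looks like `int64_t next_id(int64_t id)`.
extern "C" fn next_id(id: Id) -> Id {
    Id(id.0 + 1)
}
```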

You can also use `#[repr(C)]` on enums with fields, to force them to have a specified, stable tagged-union layout. `#[repr(C)]` enums can't use niche optimizations, which is one reason standard library types like `Option` are not `#[repr(C)]`:

In [19]:
enum OptionLike {
    Var1(Vec<i32>),
    Var2, // stored in the niche of Vec's non-null pointer, so no separate tag is needed
}

#[repr(C)]
enum OptionLikeReprC {
    Var1(Vec<i32>),
    Var2,
}

(size_of::<Vec<i32>>(), size_of::<OptionLike>(), size_of::<OptionLikeReprC>())

(24, 24, 32)
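
For most types, whether the compiler uses a niche like this is an optimization it is free to change. For a few types, though, the documentation of `Option` guarantees that `Option<T>` has the same size as `T`, including `&T`, `Box<T>`, `NonNull<T>`, and function pointers. That guarantee is why `Option<extern "C" fn(...)>` is the usual Rust type for a nullable C function pointer. A sketch that checks it:

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Pairs of (size of T, size of Option<T>) for types with a guaranteed niche.
fn niche_sizes() -> [(usize, usize); 4] {
    [
        (size_of::<&i32>(), size_of::<Option<&i32>>()),
        (size_of::<Box<i32>>(), size_of::<Option<Box<i32>>>()),
        (size_of::<NonNull<i32>>(), size_of::<Option<NonNull<i32>>>()),
        (size_of::<extern "C" fn()>(), size_of::<Option<extern "C" fn()>>()),
    ]
}
```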

So Rust supports C pointers and structs. C has another data type, `union`, which poorly does the job of an enum with fields. Rust initially didn't have unions, but they were added later for FFI purposes.

As you probably know, a `union` lets you access the same block of memory as one of several different types. Since unions have no type safety, reading a union field requires `unsafe`:

In [20]:
#[repr(C)] // needed if you use the union for FFI purposes
union U8OrBool {
    int: u8,
    boolean: bool,
}

let x = U8OrBool { int: 1 };
unsafe { x.boolean }

true

Although `union` was added for FFI, it is now also used in the implementation of `MaybeUninit` and many other things in the `unsafe` world.
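
In C, a union is usually paired with a separate tag that records which field is live. The Rust Reference specifies the layout of a `#[repr(C)]` enum with fields in essentially this form: a tag followed by a union of the variants. A hand-written sketch (the type names are hypothetical) shows why reading stays `unsafe`: nothing but convention ties the tag to the field.

```rust
#[repr(C)]
union IntOrFloat {
    i: i32,
    f: f32,
}

#[repr(C)]
struct Tagged {
    is_float: bool,
    value: IntOrFloat,
}

impl Tagged {
    fn as_f64(&self) -> f64 {
        // SAFETY: by convention, `is_float` says which field was written.
        // The compiler cannot enforce that, which is why reading a union field is `unsafe`.
        unsafe {
            if self.is_float {
                self.value.f as f64
            } else {
                self.value.i as f64
            }
        }
    }
}
```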

So Rust can express almost any C type and function. But writing a Rust equivalent for each C function is hard and boring work. The `bindgen` crate can do it automatically for us:

In [21]:
:dep bindgen = "0.53.1"

In [50]:
use std::fs;

fs::write("/tmp/our_header.h", "#include<stdio.h>").unwrap();

let bindings = bindgen::Builder::default()
    .header("/tmp/our_header.h")
    .generate()
    .unwrap();
let result = bindings.to_string();
println!("{}", result.len());
println!("{}", &result[21700..22200]);

35524
ingify!(_unused2)
        )
    );
}
pub type off_t = __off_t;
pub type ssize_t = __ssize_t;
pub type fpos_t = __fpos_t;
extern "C" {
    pub static mut stdin: *mut FILE;
}
extern "C" {
    pub static mut stdout: *mut FILE;
}
extern "C" {
    pub static mut stderr: *mut FILE;
}
extern "C" {
    pub fn remove(__filename: *const ::std::os::raw::c_char) -> ::std::os::raw::c_int;
}
extern "C" {
    pub fn rename(
        __old: *const ::std::os::raw::c_char,
        __new: *const ::std::os::raw::c_c


As you can see, `bindgen` generated about 35 KB of code from `stdio.h`, containing types, functions, constants, and some layout assertions to make sure the generated bindings are correct. Normally you would run `bindgen` in your `build.rs`, and then import the generated bindings like a normal Rust module.

## Mixing C++ and Rust

Rust, as a language, only knows how to interoperate with C. So in order to call C++ from Rust, you should provide some `extern "C"` functions in C++, and then use them like normal C functions in Rust.

Like `bindgen`, there are crates that ease this process. [`cxx`](https://cxx.rs/) lets you declare a bridge between C++ and Rust consisting of shared structs, opaque types (which can be complex classes with template parameters) passed by reference or smart pointer (`&`, `&mut`, `Box`, `shared_ptr`, ...), and functions. It then generates the C-ABI glue on both the C++ and the Rust side, which is safe and straightforward to use. [`autocxx`](https://github.com/google/autocxx/) can generate `cxx` bindings automatically. [`moveit`](https://github.com/google/moveit/) deals with C++ constructors and self-referential types. Rust and C++ interoperability is a moving target that many people are actively working on.

## Calling Rust from other languages

Rust is a great language for writing libraries and tooling for other, slower languages, because of its performance and its lack of a heavyweight runtime. Those languages can usually call C functions through some interface, and Rust can pretend to be a C library, so the foundation is already there. Similar to `cxx`, there are tools that make this safe and easy:

* Python: `pyo3` provides bindings to use Rust from Python and vice versa. `cryptography`, which was the 18th most downloaded package on PyPI at the time of writing, uses it.
* WebAssembly: WebAssembly (Wasm) is a portable binary format, designed for use in web browsers. Rust has best-in-class Wasm support, because the people working on Wasm and those working on Rust were either the same people or worked together at Mozilla. The Rust compiler supports Wasm as a compilation target, and `wasm-bindgen` provides great ergonomics for calling JS functions from Rust and Rust functions from JS. Not only can you write compute-heavy code in Rust to speed up your web application, you can also write the whole front end in Rust if you don't like JS as a language. There are multiple front-end frameworks in Rust, like `yew` and `dioxus`.
* [This website](https://www.hobofan.com/rust-interop/) indexes binding libraries for many languages, including `Java`, `Ruby`, `R`, `Julia`, `PHP`, and more.

## Adding Rust to a C/C++ project can make it more unsafe

There are some safety problems that can only happen at the boundary between Rust and C/C++. For example, a C++ exception or a Rust panic unwinding across the boundary is UB (recent Rust versions abort the process instead when a panic tries to leave an `extern "C"` function), and freeing a Rust-allocated `Box` with C's `free`, or vice versa, is UB. Also, Rust is stricter about UB than C and uses that for more optimizations: it assumes that `&mut` references don't alias (`noalias`), and it treats some values as invalid (like `null` for references, or anything other than 0 and 1 for `bool`). So if you want to improve safety by adding Rust to a C/C++ project, keep the boundary small, well-scoped, and carefully checked, and minimize `unsafe` code in Rust (or even eliminate it by using something like `cxx`), since writing UB-free `unsafe` Rust is harder than writing UB-free C.
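
One standard defense at the boundary is to catch panics before they reach C. A sketch, assuming the default `panic = "unwind"` strategy (with `panic = "abort"` there is nothing to catch) and a hypothetical exported function `checked_divide`, which reports failure through a status code instead of unwinding:

```rust
use std::os::raw::c_int;
use std::panic;

/// Hypothetical function meant to be called from C (a real export would also carry
/// `#[no_mangle]`). Returns 0 on success and -1 if the Rust code panicked.
///
/// # Safety
/// `out` must be valid for writing one `c_int`.
pub unsafe extern "C" fn checked_divide(a: c_int, b: c_int, out: *mut c_int) -> c_int {
    // Catch any panic here (e.g. division by zero) so it never unwinds into the C caller.
    match panic::catch_unwind(|| a / b) {
        Ok(value) => {
            unsafe { *out = value };
            0
        }
        Err(_) => -1,
    }
}
```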

## Final words

Safe Rust and unsafe Rust are almost two separate languages. One uses `&`, `&mut`, `Box`, `Arc`, `enum`, ..., and the other uses `*const`, `*mut`, `MaybeUninit`, `union`, .... Safe Rust is stable, safe, and beautiful. Unsafe Rust is none of those, but it can call C and squeeze out the last bits of performance. Use it only when you know what you are doing.