From 8484aca046336416be4f5121a59c371831d9914a Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Sat, 11 Jul 2015 12:38:50 +0530 Subject: [PATCH] Move wrapper types blog post into trpl --- src/doc/rust.css | 4 + src/doc/trpl/SUMMARY.md | 1 + src/doc/trpl/choosing-your-guarantees.md | 356 +++++++++++++++++++++++ 3 files changed, 361 insertions(+) create mode 100644 src/doc/trpl/choosing-your-guarantees.md diff --git a/src/doc/rust.css b/src/doc/rust.css index cd158283180a2..6204f38a37753 100644 --- a/src/doc/rust.css +++ b/src/doc/rust.css @@ -221,6 +221,10 @@ a > code { color: #428BCA; } +.section-header > a > code { + color: #8D1A38; +} + /* Code highlighting */ pre.rust .kw { color: #8959A8; } pre.rust .kw-2, pre.rust .prelude-ty { color: #4271AE; } diff --git a/src/doc/trpl/SUMMARY.md b/src/doc/trpl/SUMMARY.md index ca3381ffba465..5698cb1123aa3 100644 --- a/src/doc/trpl/SUMMARY.md +++ b/src/doc/trpl/SUMMARY.md @@ -16,6 +16,7 @@ * [Iterators](iterators.md) * [Concurrency](concurrency.md) * [Error Handling](error-handling.md) + * [Choosing your Guarantees](choosing-your-guarantees.md) * [FFI](ffi.md) * [Borrow and AsRef](borrow-and-asref.md) * [Release Channels](release-channels.md) diff --git a/src/doc/trpl/choosing-your-guarantees.md b/src/doc/trpl/choosing-your-guarantees.md new file mode 100644 index 0000000000000..db28ce6f42861 --- /dev/null +++ b/src/doc/trpl/choosing-your-guarantees.md @@ -0,0 +1,356 @@ +% Choosing your Guarantees + +One important feature of Rust as language is that it lets us control the costs and guarantees +of a program. + +There are various “wrapper type” abstractions in the Rust standard library which embody +a multitude of tradeoffs between cost, ergonomics, and guarantees. Many let one choose between +run time and compile time enforcement. This section will explain a few selected abstractions in +detail. + +Before proceeding, it is highly recommended that one reads about [ownership][ownership] and +[borrowing][borrowing] in Rust. + +[ownership]: ownership.html +[borrowing]: references-and-borrowing.html + +# Basic pointer types + +## `Box` + +[`Box`][box] is pointer which is “owned”, or a “box”. While it can hand +out references to the contained data, it is the only owner of the data. In particular, when +something like the following occurs: + +```rust +let x = Box::new(1); +let y = x; +// x no longer accessible here +``` + +Here, the box was _moved_ into `y`. As `x` no longer owns it, the compiler will no longer allow the +programmer to use `x` after this. A box can similarly be moved _out_ of a function by returning it. + +When a box (that hasn't been moved) goes out of scope, destructors are run. These destructors take +care of deallocating the inner data. + +This is a zero-cost abstraction for dynamic allocation. If you want to allocate some memory on the +heap and safely pass around a pointer to that memory, this is ideal. Note that you will only be +allowed to share references to this by the regular borrowing rules, checked at compile time. + +[box]: ../std/boxed/struct.Box.html + +## `&T` and `&mut T` + +These are immutable and mutable references respectively. They follow the &lquo;read-write lock&rquo; +pattern, such that one may either have only one mutable reference to some data, or any number of +immutable ones, but not both. This guarantee is enforced at compile time, and has no visible cost at +runtime. In most cases these two pointer types suffice for sharing cheap references between sections +of code. + +These pointers cannot be copied in such a way that they outlive the lifetime associated with them. + +## `*const T` and `*mut T` + +These are C-like raw pointers with no lifetime or ownership attached to them. They just point to +some location in memory with no other restrictions. The only guarantee that these provide is that +they cannot be dereferenced except in code marked `unsafe`. + +These are useful when building safe, low cost abstractions like `Vec`, but should be avoided in +safe code. + +## `Rc` + +This is the first wrapper we will cover that has a runtime cost. + +[`Rc`][rc] is a reference counted pointer. In other words, this lets us have multiple "owning" +pointers to the same data, and the data will be dropped (destructors will be run) when all pointers +are out of scope. + +Internally, it contains a shared “reference count” (also called “refcount”), +which is incremented each time the `Rc` is cloned, and decremented each time one of the `Rc`s goes +out of scope. The main responsibility of `Rc` is to ensure that destructors are called for shared +data. + +The internal data here is immutable, and if a cycle of references is created, the data will be +leaked. If we want data that doesn't leak when there are cycles, we need a garbage collector. + +#### Guarantees + +The main guarantee provided here is that the data will not be destroyed until all references to it +are out of scope. + +This should be used when we wish to dynamically allocate and share some data (read-only) between +various portions of yur program, where it is not certain which portion will finish using the pointer +last. It's a viable alternative to `&T` when `&T` is either impossible to statically check for +correctness, or creates extremely unergonomic code where the programmer does not wish to spend the +development cost of working with. + +This pointer is _not_ thread safe, and Rust will not let it be sent or shared with other threads. +This lets one avoid the cost of atomics in situations where they are unnecessary. + +There is a sister smart pointer to this one, `Weak`. This is a non-owning, but also non-borrowed, +smart pointer. It is also similar to `&T`, but it is not restricted in lifetime—a `Weak` +can be held on to forever. However, it is possible that an attempt to access the inner data may fail +and return `None`, since this can outlive the owned `Rc`s. This is useful for cyclic +data structures and other things. + +#### Cost + +As far as memory goes, `Rc` is a single allocation, though it will allocate two extra words (i.e. +two `usize` values) as compared to a regular `Box` (for "strong" and "weak" refcounts). + +`Rc` has the computational cost of incrementing/decrementing the refcount whenever it is cloned +or goes out of scope respectively. Note that a clone will not do a deep copy, rather it will simply +increment the inner reference count and return a copy of the `Rc`. + +[rc]: ../std/rc/struct.Rc.html + +# Cell types + +&lquo;Cell&rquo;s provide interior mutability. In other words, they contain data which can be manipulated even +if the type cannot be obtained in a mutable form (for example, when it is behind an `&`-ptr or +`Rc`). + +[The documentation for the `cell` module has a pretty good explanation for these][cell-mod]. + +These types are _generally_ found in struct fields, but they may be found elsewhere too. + +## `Cell` + +[`Cell`][cell] is a type that provides zero-cost interior mutability, but only for `Copy` types. +Since the compiler knows that all the data owned by the contained value is on the stack, there's +no worry of leaking any data behind references (or worse!) by simply replacing the data. + +It is still possible to violate your own invariants using this wrapper, so be careful when using it. +If a field is wrapped in `Cell`, it's a nice indicator that the chunk of data is mutable and may not +stay the same between the time you first read it and when you intend to use it. + +```rust +# use std::cell::Cell; +let x = Cell::new(1); +let y = &x; +let z = &x; +x.set(2); +y.set(3); +z.set(4); +println!("{}", x.get()); +``` + +Note that here we were able to mutate the same value from various immutable references. + +This has the same runtime cost as the following: + +```rust,ignore +let mut x = 1; +let y = &mut x; +let z = &mut x; +x = 2; +*y = 3; +*z = 4; +println!("{}", x); +``` + +but it has the added benefit of actually compiling successfully. + +#### Guarantees + +This relaxes the “no aliasing with mutability” restriction in places where it's +unnecessary. However, this also relaxes the guarantees that the restriction provides; so if your +invariants depend on data stored within `Cell`, you should be careful. + +This is useful for mutating primitives and other `Copy` types when there is no easy way of +doing it in line with the static rules of `&` and `&mut`. + +`Cell` does not let you obtain interior references to the data, which makes it safe to freely +mutate. + +#### Cost + +There is no runtime cost to using `Cell`, however if you are using it to wrap larger (`Copy`) +structs, it might be worthwhile to instead wrap individual fields in `Cell` since each write is +otherwise a full copy of the struct. + + +## `RefCell` + +[`RefCell`][refcell] also provides interior mutability, but isn't restricted to `Copy` types. + +Instead, it has a runtime cost. `RefCell` enforces the read-write lock pattern at runtime (it's +like a single-threaded mutex), unlike `&T`/`&mut T` which do so at compile time. This is done by the +`borrow()` and `borrow_mut()` functions, which modify an internal reference count and return smart +pointers which can be dereferenced immutably and mutably respectively. The refcount is restored when +the smart pointers go out of scope. With this system, we can dynamically ensure that there are never +any other borrows active when a mutable borrow is active. If the programmer attempts to make such a +borrow, the thread will panic. + +```rust +# use std::cell::RefCell; +let x = RefCell::new(vec![1,2,3,4]); +{ + println!("{:?}", *x.borrow()) +} + +{ + let mut my_ref = x.borrow_mut(); + my_ref.push(1); +} +``` + +Similar to `Cell`, this is mainly useful for situations where it's hard or impossible to satisfy the +borrow checker. Generally we know that such mutations won't happen in a nested form, but it's good +to check. + +For large, complicated programs, it becomes useful to put some things in `RefCell`s to make things +simpler. For example, a lot of the maps in [the `ctxt` struct][ctxt] in the rust compiler internals +are inside this wrapper. These are only modified once (during creation, which is not right after +initialization) or a couple of times in well-separated places. However, since this struct is +pervasively used everywhere, juggling mutable and immutable pointers would be hard (perhaps +impossible) and probably form a soup of `&`-ptrs which would be hard to extend. On the other hand, +the `RefCell` provides a cheap (not zero-cost) way of safely accessing these. In the future, if +someone adds some code that attempts to modify the cell when it's already borrowed, it will cause a +(usually deterministic) panic which can be traced back to the offending borrow. + +Similarly, in Servo's DOM there is a lot of mutation, most of which is local to a DOM type, but some +of which crisscrosses the DOM and modifies various things. Using `RefCell` and `Cell` to guard all +mutation lets us avoid worrying about mutability everywhere, and it simultaneously highlights the +places where mutation is _actually_ happening. + +Note that `RefCell` should be avoided if a mostly simple solution is possible with `&` pointers. + +#### Guarantees + +`RefCell` relaxes the _static_ restrictions preventing aliased mutation, and replaces them with +_dynamic_ ones. As such the guarantees have not changed. + +#### Cost + +`RefCell` does not allocate, but it contains an additional "borrow state" +indicator (one word in size) along with the data. + +At runtime each borrow causes a modification/check of the refcount. + +[cell-mod]: ../std/cell/ +[cell]: ../std/cell/struct.Cell.html +[refcell]: ../std/cell/struct.RefCell.html +[ctxt]: ../rustc/middle/ty/struct.ctxt.html + +# Synchronous types + +Many of the types above cannot be used in a threadsafe manner. Particularly, `Rc` and +`RefCell`, which both use non-atomic reference counts (_atomic_ reference counts are those which +can be incremented from multiple threads without causing a data race), cannot be used this way. This +makes them cheaper to use, but we need thread safe versions of these too. They exist, in the form of +`Arc` and `Mutex`/`RWLock` + +Note that the non-threadsafe types _cannot_ be sent between threads, and this is checked at compile +time. + +There are many useful wrappers for concurrent programming in the [sync][sync] module, but only the +major ones will be covered below. + +[sync]: ../std/sync/index.html + +## `Arc` + +[`Arc`][arc] is just a version of `Rc` that uses an atomic reference count (hence, "Arc"). +This can be sent freely between threads. + +C++'s `shared_ptr` is similar to `Arc`, however in the case of C++ the inner data is always mutable. +For semantics similar to that from C++, we should use `Arc>`, `Arc>`, or +`Arc>`[^4] (`UnsafeCell` is a cell type that can be used to hold any data and has +no runtime cost, but accessing it requires `unsafe` blocks). The last one should only be used if we +are certain that the usage won't cause any memory unsafety. Remember that writing to a struct is not +an atomic operation, and many functions like `vec.push()` can reallocate internally and cause unsafe +behavior, so even monotonicity may not be enough to justify `UnsafeCell`. + +[^4]: `Arc>` actually won't compile since `UnsafeCell` isn't `Send` or `Sync`, but we can wrap it in a type and implement `Send`/`Sync` for it manually to get `Arc>` where `Wrapper` is `struct Wrapper(UnsafeCell)`. + +#### Guarantees + +Like `Rc`, this provides the (thread safe) guarantee that the destructor for the internal data will +be run when the last `Arc` goes out of scope (barring any cycles). + +#### Cost + +This has the added cost of using atomics for changing the refcount (which will happen whenever it is +cloned or goes out of scope). When sharing data from an `Arc` in a single thread, it is preferable +to share `&` pointers whenever possible. + +[arc]: ../std/sync/struct.Arc.html + +## `Mutex` and `RwLock` + +[`Mutex`][mutex] and [`RwLock`][rwlock] provide mutual-exclusion via RAII guards (guards are +objects which maintain some state, like a lock, until their destructor is called). For both of +these, the mutex is opaque until we call `lock()` on it, at which point the thread will block +until a lock can be acquired, and then a guard will be returned. This guard can be used to access +the inner data (mutably), and the lock will be released when the guard goes out of scope. + +```rust,ignore +{ + let guard = mutex.lock(); + // guard dereferences mutably to the inner type + *guard += 1; +} // lock released when destructor runs +``` + + +`RwLock` has the added benefit of being efficient for multiple reads. It is always safe to have +multiple readers to shared data as long as there are no writers; and `RwLock` lets readers acquire a +"read lock". Such locks can be acquired concurrently and are kept track of via a reference count. +Writers must obtain a "write lock" which can only be obtained when all readers have gone out of +scope. + +#### Guarantees + +Both of these provide safe shared mutability across threads, however they are prone to deadlocks. +Some level of additional protocol safety can be obtained via the type system. +#### Costs + +These use internal atomic-like types to maintain the locks, which are pretty costly (they can block +all memory reads across processors till they're done). Waiting on these locks can also be slow when +there's a lot of concurrent access happening. + +[rwlock]: ../std/sync/struct.RwLock.html +[mutex]: ../std/sync/struct.Mutex.html +[sessions]: https://github.com/Munksgaard/rust-sessions + +# Composition + +A common gripe when reading Rust code is with types like `Rc>>` (or even more more +complicated compositions of such types). It's not always clear what the composition does, or why the +author chose one like this (and when one should be using such a composition in one's own code) + +Usually, it's a case of composing together the guarantees that you need, without paying for stuff +that is unnecessary. + +For example, `Rc>` is one such composition. `Rc` itself can't be dereferenced mutably; +because `Rc` provides sharing and shared mutability can lead to unsafe behavior, so we put +`RefCell` inside to get dynamically verified shared mutability. Now we have shared mutable data, +but it's shared in a way that there can only be one mutator (and no readers) or multiple readers. + +Now, we can take this a step further, and have `Rc>>` or `Rc>>`. These +are both shareable, mutable vectors, but they're not the same. + +With the former, the `RefCell` is wrapping the `Vec`, so the `Vec` in its entirety is +mutable. At the same time, there can only be one mutable borrow of the whole `Vec` at a given time. +This means that your code cannot simultaneously work on different elements of the vector from +different `Rc` handles. However, we are able to push and pop from the `Vec` at will. This is +similar to an `&mut Vec` with the borrow checking done at runtime. + +With the latter, the borrowing is of individual elements, but the overall vector is immutable. Thus, +we can independently borrow separate elements, but we cannot push or pop from the vector. This is +similar to an `&mut [T]`[^3], but, again, the borrow checking is at runtime. + +In concurrent programs, we have a similar situation with `Arc>`, which provides shared +mutability and ownership. + +When reading code that uses these, go in step by step and look at the guarantees/costs provided. + +When choosing a composed type, we must do the reverse; figure out which guarantees we want, and at +which point of the composition we need them. For example, if there is a choice between +`Vec>` and `RefCell>`, we should figure out the tradeoffs as done above and pick +one. + +[^3]: `&[T]` and `&mut [T]` are _slices_; they consist of a pointer and a length and can refer to a portion of a vector or array. `&mut [T]` can have its elements mutated, however its length cannot be touched.