Guide: motivate Box and Rc pointers with need, uses, benefits, and ex…

…amples. Explain that Rust has different pointer types because there is a tradeoff between flexibility and efficiency. Motivate boxes as fixed-size containers of variable-sized objects. Clarify that Box and Rc are pointer types that you deref with * just like references. Stick to explaining the semantics and avoid implementation details. Scope isn't the most accurate framework to think about deallocation (since you return boxes and otherwise move values out of scopes); it's more "when the value is done being used," i.e., lifetime. Provide a connection between Rust's pointer types by locating them on a flexibiltiy / performance scale. Explain the compiler can't statically analyze lifetimes with multiple owners; hence the need for (runtime) reference counting.
rust-lang · Oct 27, 2014 · d257b37 · d257b37
1 parent f037452
commit d257b37
Showing 1 changed file with 78 additions and 44 deletions.
diff --git a/src/doc/guide.md b/src/doc/guide.md
@@ -3637,40 +3637,72 @@ pub fn as_maybe_owned(&self) -> MaybeOwned<'a> { ... }
 
 ## Boxes
 
-All of our references so far have been to variables we've created on the stack.
-In Rust, the simplest way to allocate heap variables is using a *box*.  To
-create a box, use the `box` keyword:
+Most of the types we've seen so far have a fixed size or number of components.
+The compiler needs this fact to lay out values in memory. However, some data
+structures, such as a linked list, do not have a fixed size. You might think to
+implement a linked list with an enum that's either a `Node` or the end of the
+list (`Nil`), like this:
+
+```{rust,ignore}
+enum List {             // error: illegal recursive enum type
+    Node(u32, List),
+    Nil
+}
+```
+
+But the compiler complains that the type is recursive, that is, it could be
+arbitrarily large. To remedy this, Rust provides a fixed-size container called
+a **box** that can hold any type. You can box up any value with the `box`
+keyword. Our boxed List gets the type `Box<List>` (more on the notation when we
+get to generics):
 
 ```{rust}
-let x = box 5i;
+enum List {
+    Node(u32, Box<List>),
+    Nil
+}
+
+fn main() {
+    let list = Node(0, box Node(1, box Nil));
+}
 ```
 
-This allocates an integer `5` on the heap, and creates a binding `x` that
-refers to it. The great thing about boxed pointers is that we don't have to
-manually free this allocation! If we write
+A box dynamically allocates memory to hold its contents. The great thing about
+Rust is that that memory is *automatically*, *efficiently*, and *predictably*
+deallocated when you're done with the box.
+
+A box is a pointer type, and you access what's inside using the `*` operator,
+just like regular references. This (rather silly) example dynamically allocates
+an integer `5` and makes `x` a pointer to it:
 
 ```{rust}
 {
     let x = box 5i;
-    // do stuff
+    println!("{}", *x);     // Prints 5
 }
 ```
 
-then Rust will automatically free `x` at the end of the block. This isn't
-because Rust has a garbage collector -- it doesn't. Instead, when `x` goes out
-of scope, Rust `free`s `x`. This Rust code will do the same thing as the
-following C code:
+The great thing about boxes is that we don't have to manually free this
+allocation! Instead, when `x` reaches the end of its lifetime -- in this case,
+when it goes out of scope at the end of the block -- Rust `free`s `x`. This
+isn't because Rust has a garbage collector (it doesn't). Instead, by tracking
+the ownership and lifetime of a variable (with a little help from you, the
+programmer), the compiler knows precisely when it is no longer used.
+
+The Rust code above will do the same thing as the following C code:
 
 ```{c,ignore}
 {
     int *x = (int *)malloc(sizeof(int));
-    // do stuff
+    if (!x) abort();
+    *x = 5;
+    printf("%d\n", *x);
     free(x);
 }
 ```
 
-This means we get the benefits of manual memory management, but the compiler
-ensures that we don't do something wrong. We can't forget to `free` our memory.
+We get the benefits of manual memory management, while ensuring we don't
+introduce any bugs. We can't forget to `free` our memory.
 
 Boxes are the sole owner of their contents, so you cannot take a mutable
 reference to them and then use the original box:
@@ -3706,48 +3738,50 @@ let mut x = box 5i;
 *x;
 ```
 
-## Rc and Arc
-
-Sometimes, you need to allocate something on the heap, but give out multiple
-references to the memory. Rust's `Rc<T>` (pronounced 'arr cee tee') and
-`Arc<T>` types (again, the `T` is for generics, we'll learn more later) provide
-you with this ability.  **Rc** stands for 'reference counted,' and **Arc** for
-'atomically reference counted.' This is how Rust keeps track of the multiple
-owners: every time we make a new reference to the `Rc<T>`, we add one to its
-internal 'reference count.' Every time a reference goes out of scope, we
-subtract one from the count. When the count is zero, the `Rc<T>` can be safely
-deallocated. `Arc<T>` is almost identical to `Rc<T>`, except for one thing: The
-'atomically' in 'Arc' means that increasing and decreasing the count uses a
-thread-safe mechanism to do so. Why two types? `Rc<T>` is faster, so if you're
-not in a multi-threaded scenario, you can have that advantage. Since we haven't
-talked about threading yet in Rust, we'll show you `Rc<T>` for the rest of this
-section.
+Boxes are simple and efficient pointers to dynamically allocated values with a
+single owner. They are useful for tree-like structures where the lifetime of a
+child depends solely on the lifetime of its (single) parent. If you need a
+value that must persist as long as any of several referrers, read on.
 
-To create an `Rc<T>`, use `Rc::new()`:
+## Rc and Arc
 
-```{rust}
-use std::rc::Rc;
+Sometimes, you need a variable that is referenced from multiple places
+(immutably!), lasting as long as any of those places, and disappearing when it
+is no longer referenced. For instance, in a graph-like data structure, a node
+might be referenced from all of its neighbors. In this case, it is not possible
+for the compiler to determine ahead of time when the value can be freed -- it
+needs a little run-time support.
 
-let x = Rc::new(5i);
-```
+Rust's **Rc** type provides shared ownership of a dynamically allocated value
+that is automatically freed at the end of its last owner's lifetime. (`Rc`
+stands for 'reference counted,' referring to the way these library types are
+implemented.) This provides more flexibility than single-owner boxes, but has
+some runtime overhead.
 
-To create a second reference, use the `.clone()` method:
+To create an `Rc` value, use `Rc::new()`. To create a second owner, use the
+`.clone()` method:
 
 ```{rust}
 use std::rc::Rc;
 
 let x = Rc::new(5i);
 let y = x.clone();
+
+println!("{} {}", *x, *y);      // Prints 5 5
 ```
 
-The `Rc<T>` will live as long as any of its references are alive. After they
-all go out of scope, the memory will be `free`d.
+The `Rc` will live as long as any of its owners are alive. After that, the
+memory will be `free`d.
+
+**Arc** is an 'atomically reference counted' value, identical to `Rc` except
+that ownership can be safely shared among multiple threads. Why two types?
+`Arc` has more overhead, so if you're not in a multi-threaded scenario, you
+don't have to pay the price.
 
-If you use `Rc<T>` or `Arc<T>`, you have to be careful about introducing
-cycles. If you have two `Rc<T>`s that point to each other, the reference counts
-will never drop to zero, and you'll have a memory leak. To learn more, check
-out [the section on `Rc<T>` and `Arc<T>` in the pointers
-guide](guide-pointers.html#rc-and-arc).
+If you use `Rc` or `Arc`, you have to be careful about introducing cycles. If
+you have two `Rc`s that point to each other, they will happily keep each other
+alive forever, creating a memory leak. To learn more, check out [the section on
+`Rc` and `Arc` in the pointers guide](guide-pointers.html#rc-and-arc).
 
 # Patterns