Tip of the Week #77: Temporaries, Moves, and Copies

Originally posted as totw/77 on Jul 9, 2014
Revised Oct 10, 2017

By Titus Winters

In the ongoing attempt to figure out how to explain to non-language-lawyers how C++11 changed things, we present yet another entry in the series "When are copies made?" This is part of a general attempt to take the subtle rules that have surrounded copies in C++ and replace them with a simpler set of rules.

Can You Count to 2?

You can? Awesome. Remember that the "name rule" means that each unique name you can assign to a certain resource affects how many copies of the object are in circulation.

Name Counting, in Brief

If you’re worrying about a copy being created, presumably you’re worried about some line of code in particular. So, look at that point. How many names exist for the data you think is being copied? There are only 3 cases to consider:

Two Names: It’s a Copy

This one is easy: if you’re giving a second name to the same data, it’s a copy.

std::vector<int> foo;
FillAVectorOfIntsByOutputParameterSoNobodyThinksAboutCopies(&foo);
std::vector<int> bar = foo; // Yep, this is a copy.

std::map<int, std::string> my_map;
std::string forty_two = "42";
my_map[5] = forty_two; // Also a copy: my_map[5] counts as a name.
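
One way to convince yourself that two names really means two objects is to change one and look at the other. The following is a minimal sketch; the function name `TwoNamesMeanTwoObjects` is made up for illustration and is not part of the original tip.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper (not from the tip): returns true if changing the map
// entry leaves the original string alone.
bool TwoNamesMeanTwoObjects() {
  std::string forty_two = "42";
  std::map<int, std::string> my_map;
  my_map[5] = forty_two;           // Two names for "42": copy assignment.
  assert(my_map[5] == forty_two);  // Both names hold equal data right now.
  my_map[5] += "!";                // Mutate only the copy.
  return forty_two == "42" && my_map[5] == "42!";
}
```

If the assignment had shared the data instead of copying it, the edit to `my_map[5]` would have shown up in `forty_two` as well.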

One Name: It’s a Move (a new thing)

This one is a little surprising: C++11 recognizes that if you can’t refer to a name anymore, you also don’t care about that data anymore. The language had to be careful not to break cases where you were relying on the destructor (say, MutexLock), so return is the easy case to identify.

std::vector<int> GetSomeInts() {
  std::vector<int> ret = {1, 2, 3, 4};
  return ret;
}

// Just a move (or the compiler elides even that): either "ret" or "foo"
// has the data, but never both at once.
std::vector<int> foo = GetSomeInts();

The other way to tell the compiler that you’re done with a name is calling std::move:

std::vector<int> foo = GetSomeInts();
// Not a copy: std::move allows the compiler to treat foo as a
// temporary, so this invokes the move constructor for
// std::vector<int>.
// Note that it isn’t the call to std::move that does the moving,
// it’s the constructor. The call to std::move just allows foo to
// be treated as a temporary (rather than as an object with a name).
std::vector<int> bar = std::move(foo);
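
To see which constructor actually runs, it can help to instrument a type. The sketch below assumes a hypothetical wrapper called `Tracked` (invented here, not part of the tip) that records how each instance was constructed.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical instrumented type: remembers whether this instance was
// built by the copy constructor or by the move constructor.
struct Tracked {
  std::vector<int> data;
  bool was_copied = false;
  bool was_moved = false;

  explicit Tracked(std::vector<int> d) : data(std::move(d)) {}
  Tracked(const Tracked& other) : data(other.data), was_copied(true) {}
  Tracked(Tracked&& other) noexcept
      : data(std::move(other.data)), was_moved(true) {}
};

// Two names: the copy constructor runs and foo keeps its data.
bool CopyConstructorRuns() {
  Tracked foo(std::vector<int>{1, 2, 3, 4});
  Tracked bar = foo;
  return bar.was_copied && !bar.was_moved && foo.data.size() == 4;
}

// One name: std::move lets the move constructor be selected instead.
bool MoveConstructorRuns() {
  Tracked foo(std::vector<int>{1, 2, 3, 4});
  assert(foo.data.size() == 4);  // foo owns the data before the move.
  Tracked bar = std::move(foo);
  return bar.was_moved && !bar.was_copied && bar.data.size() == 4;
}
```

Note that the second function deliberately says nothing about `foo` after the move; only `bar` is inspected.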

Zero Names: It’s a Temporary

Temporaries are also special: if you want to avoid copies, avoid providing names to variables.

void OperatesOnVector(const std::vector<int>& v);

// No copies: the values in the vector returned by GetSomeInts()
// will be moved (O(1)) into the temporary constructed between these
// calls and passed by reference into OperatesOnVector().
OperatesOnVector(GetSomeInts());
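
The "zero names" case can be checked the same way, by counting copy constructions. This is a sketch under stated assumptions: `Ints`, its static `copies` counter, and the helper functions are hypothetical stand-ins for `std::vector<int>`, `GetSomeInts()`, and `OperatesOnVector()`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical instrumented wrapper: counts copy constructions only.
struct Ints {
  static int copies;
  std::vector<int> values;

  explicit Ints(std::vector<int> v) : values(std::move(v)) {}
  Ints(const Ints& other) : values(other.values) { ++copies; }
  Ints(Ints&& other) noexcept : values(std::move(other.values)) {}
};
int Ints::copies = 0;

Ints GetSomeInts() {
  Ints ret(std::vector<int>{1, 2, 3, 4});
  return ret;  // Moved or elided, never copied.
}

std::size_t OperatesOnInts(const Ints& v) {
  assert(!v.values.empty());
  return v.values.size();
}

// Zero names: the temporary is passed straight through.
int CopiesWhenPassingTemporary() {
  Ints::copies = 0;
  OperatesOnInts(GetSomeInts());
  return Ints::copies;
}

// Giving the data a second name forces exactly one copy.
int CopiesWhenNamingTwice() {
  Ints::copies = 0;
  Ints first = GetSomeInts();
  Ints second = first;  // Two names: a copy.
  OperatesOnInts(second);
  return Ints::copies;
}
```

The first helper should report no copies and the second exactly one, which matches the name-counting rule.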

Beware: Zombies

The above (other than std::move itself) is hopefully pretty intuitive; it’s just that we all built up weird notions of copies in the years pre-dating C++11. For a language without garbage collection, this type of accounting gives us an excellent mix of performance and clarity. However, it’s not without dangers, and the big one is this: What is left in a value after it has been moved from?

T bar = std::move(foo);
CHECK(foo.empty());        // Is this valid? Maybe, but don’t count on it.

This is one of the major difficulties: what can we say about these leftover values? The C++ specification effectively says that such a value is left in a "valid but unspecified state." The safe approach is to stay away from these objects: you are allowed to re-assign to them, or let them go out of scope, but don’t make any other assumptions about their state.

We’re hoping to provide compiler-based checking to prevent you from inappropriately relying on the state of such a moved-from object. We’re not there yet, so for now, please be careful. Call these out in code review, and avoid them in your own code. Stay away from the zombies.
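
As a sketch of the "re-assign or let it go out of scope" advice, the hypothetical helper below (the name `DrainAndRefill` is invented for illustration) moves out of a vector and then gives it a fresh, known value before anyone reads it again.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: takes the contents of foo, then resets foo to a
// known state by assignment instead of inspecting the moved-from value.
std::vector<std::string> DrainAndRefill(std::vector<std::string>& foo) {
  std::vector<std::string> bar = std::move(foo);
  // foo is now "valid but unspecified": no reads of its contents here.
  foo = std::vector<std::string>{"fresh"};  // Re-assignment is allowed.
  assert(foo.size() == 1);                  // Known state again.
  return bar;
}
```

The only operation applied to the moved-from `foo` is assignment, so nothing here depends on what the move left behind.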

Wait, std::move Doesn’t Move?

Yeah, one other thing to watch for is that a call to std::move isn’t actually a move itself; it’s just a cast to an rvalue-reference. It’s only the use of that reference by a move constructor or move assignment that does the work.

std::vector<int> foo = GetSomeInts();
std::move(foo);                        // Does nothing.

// Invokes std::vector<int>’s move-constructor.
std::vector<int> bar = std::move(foo);

This should almost never happen, and you probably shouldn’t waste a lot of mental storage on it. I really only mention it in case the connection between std::move and a move constructor was confusing you.
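
If you would like to verify the "just a cast" claim, the following sketch binds the result of std::move to a reference and checks that nothing has changed. The function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// std::move on an lvalue yields an rvalue reference to the same type.
static_assert(
    std::is_same<decltype(std::move(std::declval<std::vector<int>&>())),
                 std::vector<int>&&>::value,
    "std::move is a cast to an rvalue reference");

// Binding the result to a reference constructs nothing, so nothing moves.
bool CastAloneDoesNotMove() {
  std::vector<int> foo = {1, 2, 3, 4};
  std::vector<int>&& still_foo = std::move(foo);  // Only a cast.
  assert(&still_foo == &foo);                     // Same object.
  return foo.size() == 4;
}

// The move constructor is what transfers the data.
std::size_t ConstructorDoesTheMove() {
  std::vector<int> foo = {1, 2, 3, 4};
  std::vector<int> bar = std::move(foo);  // Move constructor runs here.
  return bar.size();
}
```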

Aaaagh! It’s All Complicated! Why!?!

First: it’s really not so bad. Once we’ve got move operations in all of our value types (hopefully coming soon), we can do away with all of the discussions of "Is this a copy? Is this efficient?" and just rely on name counting: two names, a copy. Fewer than that: no copy.

Ignoring the issue of copies, value semantics are clearer and simpler to reason about. Consider these two operations:

void Foo(std::vector<std::string>* paths) {
  ExpandGlob(GenerateGlob(), paths);
}

std::vector<std::string> Bar() {
  std::vector<std::string> paths;
  ExpandGlob(GenerateGlob(), &paths);
  return paths;
}

Are these the same? What about if there is existing data in *paths? How can you tell? Value semantics are easier for a reader to reason about than input/output parameters, where you need to think about (and document) what happens to existing data, and potentially whether there is a pointer ownership transfer.
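
To make the difference concrete, here is a sketch with stand-in implementations. `GenerateGlob` and `ExpandGlob` are not defined in this tip, so the versions below are pure assumptions; in particular, this `ExpandGlob` is assumed to append to `*paths`, which is exactly the kind of detail a reader of `Foo` cannot see from the signature.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins: the real functions may behave differently.
std::string GenerateGlob() { return "*.txt"; }

// Assumption: appends matches to *paths without clearing it first.
void ExpandGlob(const std::string& glob, std::vector<std::string>* paths) {
  assert(paths != nullptr);
  paths->push_back("a" + glob.substr(1));  // "a.txt"
  paths->push_back("b" + glob.substr(1));  // "b.txt"
}

void Foo(std::vector<std::string>* paths) {
  ExpandGlob(GenerateGlob(), paths);
}

std::vector<std::string> Bar() {
  std::vector<std::string> paths;
  ExpandGlob(GenerateGlob(), &paths);
  return paths;
}
```

Under that assumption, calling `Foo` on a vector that already holds one element leaves three elements behind, while `Bar` always returns exactly the two matches. The by-value version has only one possible answer to "what about existing data?".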

Because of the simpler guarantees about lifetime and usage when dealing with values (instead of pointers), it is easier for the compiler’s optimizers to operate on code in that style. Well-managed value semantics also minimize hits on the allocator (which is cheap but not free). Once we understand how move semantics help rid us of copies, the compiler’s optimizers can better reason about object types, lifetimes, virtual dispatch, and a host of other issues that help generate more efficient machine code.

Efforts are underway to get all of the containers and value types in google3 to work with C++11's move semantics. Once we’ve managed this, we can stop worrying about copies and pointer semantics, and focus on writing simple easy-to-follow code. Please make sure you understand the new rules: not all google3 interfaces will be updated to return by value (instead of by output parameter), so there will always be a mix of styles. It’s important that you understand when one is more appropriate than the other.