Tip of the Week #77: Temporaries, Moves, and Copies
Originally posted as totw/77 on Jul 9, 2014
Revised Oct 10, 2017
In the ongoing attempt to figure out how to explain to non-language-lawyers how C++11 changed things, we present yet another entry in the series "When are copies made?" This is part of a general attempt to simplify the subtle rules that have surrounded copies in C++ and replace them with a simpler set of rules.
If you can count to two, you can apply the "name rule": the number of distinct names you can use to refer to a given resource determines how many copies of the object are in circulation.
If you’re worrying about a copy being created, presumably you’re worried about some line of code in particular. So, look at that point. How many names exist for the data you think is being copied? There are only 3 cases to consider:
This one is easy: if you’re giving a second name to the same data, it’s a copy.
std::vector<int> foo;
FillAVectorOfIntsByOutputParameterSoNobodyThinksAboutCopies(&foo);
std::vector<int> bar = foo; // Yep, this is a copy.
std::map<int, std::string> my_map;
std::string forty_two = "42";
my_map[5] = forty_two; // Also a copy: my_map[5] counts as a name.
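As a minimal sketch of what "two names, a copy" means in practice, the hypothetical helper below (`SecondNameIsACopy` is not part of the original tip) checks that the two vectors own separate buffers and that changing one leaves the other untouched.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper for illustration: returns true when giving the data a
// second name produced an independent copy.
bool SecondNameIsACopy() {
  std::vector<int> foo = {1, 2, 3, 4};
  std::vector<int> bar = foo;  // Two names for the same values: a copy.
  assert(foo.size() == bar.size());  // The copy has the same element count.

  bar[0] = 99;  // Changing one copy must not affect the other.

  // Each vector owns its own buffer, and foo still holds its original value.
  return foo.data() != bar.data() && foo[0] == 1 && bar[0] == 99;
}
```

Calling `SecondNameIsACopy()` returns `true`, because both names stay usable and each refers to its own data.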
This one is a little surprising: C++11 recognizes that if you can’t refer to a
name anymore, you also don’t care about that data anymore. The language had to
be careful not to break cases where you were relying on the destructor (say,
`MutexLock`), so `return` is the easy case to identify.
std::vector<int> GetSomeInts() {
  std::vector<int> ret = {1, 2, 3, 4};
  return ret;
}
// Just a move: either "ret" or "foo" has the data, but never both at
// once.
std::vector<int> foo = GetSomeInts();
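One way to convince yourself that returning a local is not a copy is to compare heap buffer addresses. The sketch below uses a hypothetical variant of `GetSomeInts()` with an extra out-parameter (an assumption added only for observation) that records where the local vector's elements live.

```cpp
#include <cassert>
#include <vector>

// Same shape as GetSomeInts() above, plus a hypothetical out-parameter that
// records where the local vector's heap buffer lives.
std::vector<int> GetSomeIntsAndRecordBuffer(const int** buffer) {
  std::vector<int> ret = {1, 2, 3, 4};
  *buffer = ret.data();
  return ret;  // Moved (or elided entirely), never copied.
}

// Returns true when the caller's vector owns the very buffer that "ret"
// allocated, i.e. the elements were handed over rather than duplicated.
bool ReturnedVectorReusesBuffer() {
  const int* buffer_inside = nullptr;
  std::vector<int> foo = GetSomeIntsAndRecordBuffer(&buffer_inside);
  assert(foo.size() == 4);
  return foo.data() == buffer_inside;
}
```

Whether the compiler moves `ret` or elides the move altogether, `foo` ends up owning the buffer that `ret` allocated, so `ReturnedVectorReusesBuffer()` returns `true`.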
The other way to tell the compiler that you’re done with a name
is calling `std::move`:
std::vector<int> foo = GetSomeInts();
// Not a copy: std::move allows the compiler to treat foo as a
// temporary, so this invokes the move constructor for
// std::vector<int>.
// Note that it isn’t the call to std::move that does the moving,
// it’s the constructor. The call to std::move just allows foo to
// be treated as a temporary (rather than as an object with a name).
std::vector<int> bar = std::move(foo);
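To see which constructor actually runs, one option is a small instrumented type. `Tracked`, `Counts`, and `CountAfterMoveThenCopy` below are hypothetical names introduced for this sketch; they simply count constructor calls.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type for illustration: counts which constructor ran.
struct Tracked {
  static int copies;
  static int moves;
  Tracked() = default;
  Tracked(const Tracked&) { ++copies; }
  Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

struct Counts {
  int copies;
  int moves;
};

Counts CountAfterMoveThenCopy() {
  Tracked::copies = 0;
  Tracked::moves = 0;

  Tracked foo;
  Tracked bar = std::move(foo);  // foo gives up its data: move constructor.
  assert(Tracked::copies == 0);  // No copy has happened so far.

  Tracked baz = bar;             // bar and baz both stay usable: a copy.
  static_cast<void>(baz);        // Silence unused-variable warnings.

  return Counts{Tracked::copies, Tracked::moves};
}
```

Under these assumptions the function reports one move (for `bar`) and one copy (for `baz`), which matches the name-counting rule.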
Temporaries are also special: if you want to avoid copies, avoid providing names to variables.
void OperatesOnVector(const std::vector<int>& v);
// No copies: the values in the vector returned by GetSomeInts()
// will be moved (O(1)) into the temporary constructed between these
// calls and passed by reference into OperatesOnVector().
OperatesOnVector(GetSomeInts());
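A compile-time way to check the "no copies" claim is to try the same call pattern with a type that cannot be copied at all. The sketch below uses a hypothetical `MoveOnlyInts` wrapper and hypothetical function names; if any step required a copy, the code would fail to compile.

```cpp
#include <cassert>
#include <numeric>
#include <type_traits>
#include <vector>

// Hypothetical move-only wrapper: copying is a compile error, so any code
// that compiles with it provably makes no copies.
struct MoveOnlyInts {
  MoveOnlyInts() = default;
  MoveOnlyInts(MoveOnlyInts&&) = default;
  MoveOnlyInts& operator=(MoveOnlyInts&&) = default;
  MoveOnlyInts(const MoveOnlyInts&) = delete;
  MoveOnlyInts& operator=(const MoveOnlyInts&) = delete;

  std::vector<int> values;
};

static_assert(!std::is_copy_constructible<MoveOnlyInts>::value,
              "MoveOnlyInts must not be copyable");

MoveOnlyInts GetSomeMoveOnlyInts() {
  MoveOnlyInts ret;
  ret.values = {1, 2, 3, 4};
  return ret;  // Only needs the move constructor (or is elided).
}

int SumOf(const MoveOnlyInts& v) {
  return std::accumulate(v.values.begin(), v.values.end(), 0);
}

int SumOfTemporary() {
  // The unnamed result binds directly to the const reference parameter.
  const int sum = SumOf(GetSomeMoveOnlyInts());
  assert(sum == 10);
  return sum;
}
```

The fact that `SumOf(GetSomeMoveOnlyInts())` compiles is the point of the sketch: passing an unnamed temporary by reference never needs the copy constructor.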
The above (other than `std::move` itself) is hopefully pretty intuitive; it’s
just that we all built up weird notions of copies in the years pre-dating C++11.
For a language without garbage collection, this type of accounting gives us an
excellent mix of performance and clarity. However, it’s not without dangers, and
the big one is this: What is left in a value after it has been moved from?
T bar = std::move(foo);
CHECK(foo.empty()); // Is this valid? Maybe, but don’t count on it.
This is one of the major difficulties: what can we say about these leftover values? The C++ specification effectively says that such a value is left in a "valid but unspecified state." The safe approach is to stay away from these objects: you are allowed to re-assign to them, or let them go out of scope, but don’t make any other assumptions about their state.
We’re hoping to provide compiler-based checking to prevent you from inappropriately relying on the state of such a moved-from object. We’re not there yet, so for now, please be careful. Call these out in code review, and avoid them in your own code. Stay away from the zombies.
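As a small sketch of the safe pattern described above, the hypothetical function below moves from a vector, avoids inspecting the leftover object, and gives it a fresh value before using it again.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the size of "foo" after it has been moved from and then given a
// fresh value.
std::size_t ReuseAfterMove() {
  std::vector<int> foo = {1, 2, 3};
  std::vector<int> bar = std::move(foo);
  assert(bar.size() == 3);  // bar now owns the elements.

  // Don't inspect foo here: its state is valid but unspecified.
  foo = {7, 8};  // Re-assignment is allowed and gives foo a known value.
  return foo.size();
}
```

After the re-assignment `foo` is an ordinary vector again, so `ReuseAfterMove()` returns 2 regardless of what the moved-from state happened to be.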
Yeah, one other thing to watch for is that a call to `std::move` isn’t actually
a move itself; it’s just a cast to an rvalue reference. It’s only the use of
that reference by a move constructor or move assignment that does the work.
std::vector<int> foo = GetSomeInts();
std::move(foo); // Does nothing.
// Invokes std::vector<int>’s move-constructor.
std::vector<int> bar = std::move(foo);
This should almost never happen, and you probably shouldn’t waste a lot of
mental storage on it. I really only mention it in case the connection between
`std::move` and a move constructor was confusing you.
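If it helps, the "just a cast" behavior can be checked directly. In this sketch (the function name `SizeAfterBareMove` is hypothetical), the result of `std::move` is never handed to a constructor or assignment, so the vector keeps its elements.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

std::size_t SizeAfterBareMove() {
  std::vector<int> foo = {1, 2, 3, 4};

  // std::move(foo) is only an expression of type std::vector<int>&&.
  static_assert(
      std::is_same<decltype(std::move(foo)), std::vector<int>&&>::value,
      "std::move yields an rvalue reference");

  // Nothing consumes the reference, so no move constructor or move
  // assignment runs. The cast to void only silences unused-result warnings.
  static_cast<void>(std::move(foo));

  assert(!foo.empty());  // foo was never moved from.
  return foo.size();
}
```

`SizeAfterBareMove()` returns 4: without a move constructor or move assignment on the receiving end, nothing changes.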
First: it’s really not so bad. Once we’ve got move operations in all of our value types (hopefully coming soon), we can do away with all of the discussions of "Is this a copy? Is this efficient?" and just rely on name counting: two names, a copy. Fewer than that: no copy.
Ignoring the issue of copies, value semantics are clearer and simpler to reason about. Consider these two operations:
void Foo(std::vector<std::string>* paths) {
  ExpandGlob(GenerateGlob(), paths);
}
std::vector<std::string> Bar() {
  std::vector<std::string> paths;
  ExpandGlob(GenerateGlob(), &paths);
  return paths;
}
Are these the same? What about if there is existing data in `*paths`? How can
you tell? Value semantics are easier for a reader to reason about than
input/output parameters, where you need to think about (and document) what
happens to existing data, and potentially whether there is a pointer ownership
transfer.
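To make the difference concrete, here is a sketch with stand-in definitions. `GenerateGlob()` and `ExpandGlob()` are not defined in this tip, so the bodies below are purely hypothetical, and the sketch assumes `ExpandGlob()` appends to `*paths` rather than overwriting it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins. This sketch ASSUMES ExpandGlob appends to *paths;
// the real function's behavior is not stated in the text.
std::string GenerateGlob() { return "*.txt"; }

void ExpandGlob(const std::string& glob, std::vector<std::string>* paths) {
  assert(!glob.empty());
  paths->push_back("a" + glob.substr(1));  // Pretend match: "a.txt".
  paths->push_back("b" + glob.substr(1));  // Pretend match: "b.txt".
}

void Foo(std::vector<std::string>* paths) {
  ExpandGlob(GenerateGlob(), paths);
}

std::vector<std::string> Bar() {
  std::vector<std::string> paths;
  ExpandGlob(GenerateGlob(), &paths);
  return paths;
}

// Output parameter: the stale entry survives alongside the new matches.
std::size_t SizeAfterFooOnUsedVector() {
  std::vector<std::string> paths = {"stale.txt"};
  Foo(&paths);
  return paths.size();
}

// Return by value: the old contents are replaced (by move assignment).
std::size_t SizeAfterBarOnUsedVector() {
  std::vector<std::string> paths = {"stale.txt"};
  paths = Bar();
  return paths.size();
}
```

Under the append assumption, `Foo` leaves three entries in a vector that already held one, while `Bar` always yields exactly the two matches. With the by-value form, the reader does not need to know what `ExpandGlob()` does with existing data to predict the result.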
Because of the simpler guarantees about lifetime and usage when dealing with values (instead of pointers), it is easier for the compiler’s optimizers to operate on code in that style. Well-managed value semantics also minimize hits on the allocator (which is cheap but not free). Once we understand how move semantics help rid us of copies, the compiler’s optimizers can better reason about object types, lifetimes, virtual dispatch, and a host of other issues that help generate more efficient machine code.
Efforts are underway to get all of the containers and value types in google3 to work with C++11's move semantics. Once we’ve managed this, we can stop worrying about copies and pointer semantics, and focus on writing simple, easy-to-follow code. Please make sure you understand the new rules: not all google3 interfaces will be updated to return by value (instead of by output parameter), so there will always be a mix of styles. It’s important that you understand when one is more appropriate than the other.