Serialization improvements #10785

Open, 26 commits to merge

Owner

sipa commented Jul 10, 2017 edited

This PR improves correctness (removing potentially unsafe const_casts) and flexibility of the serialization code.

The main issue is that use of the current ADD_SERIALIZE_METHODS macro (which is the only way to not duplicate serialization and deserialization code) only expands to a single class method, and thus can only be qualified as either const or non-const - not both. In many cases, serialization needs to work on const objects however, and preferably that is done without casts that could hide const-correctness bugs.

To deal with that, this PR introduces a new approach that includes a SERIALIZE_METHODS(obj) macro, where obj is a variable name. It expands to some boilerplate and a static method to which the object itself is an argument. The advantage is that its type can be templated, and be const when serializing.
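As a rough illustration of the idea (not the PR's actual macro), the sketch below uses a hypothetical `TOY_SERIALIZE_METHODS(obj)` macro, a toy `ToyStream`, and made-up `WriteAction`/`ReadAction` tags. All of these names are assumptions chosen for the example. The point it tries to show is that a single static method template sees the object as const when called from `Serialize` and as non-const when called from `Unserialize`, with no casts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stream for illustration only: stores whole 32-bit values, not bytes.
struct ToyStream {
    std::vector<uint32_t> data;
    std::size_t pos = 0;
    void write(uint32_t v) { data.push_back(v); }
    uint32_t read() { assert(pos < data.size()); return data[pos++]; }
};

// Tag types select the direction at compile time.
struct WriteAction {};
struct ReadAction {};

// Writing only needs a const reference; reading needs a mutable one.
inline void ReadWrite(ToyStream& s, const uint32_t& v, WriteAction) { s.write(v); }
inline void ReadWrite(ToyStream& s, uint32_t& v, ReadAction) { v = s.read(); }

// Hypothetical stand-in for SERIALIZE_METHODS(obj): some boilerplate plus a
// static method template whose Type is deduced as const when serializing.
#define TOY_SERIALIZE_METHODS(obj)                                          \
    void Serialize(ToyStream& s) const { SerOps(*this, s, WriteAction{}); } \
    void Unserialize(ToyStream& s) { SerOps(*this, s, ReadAction{}); }      \
    template <typename Type, typename Action>                               \
    static void SerOps(Type& obj, ToyStream& s, Action action)

struct Point {
    uint32_t x = 0;
    uint32_t y = 0;

    TOY_SERIALIZE_METHODS(obj)
    {
        // One body serves both directions; obj is const Point& when writing.
        ReadWrite(s, obj.x, action);
        ReadWrite(s, obj.y, action);
    }
};

// Serializes a const object and reads it back into a fresh one.
inline Point RoundTrip(const Point& in)
{
    ToyStream s;
    in.Serialize(s);
    Point out;
    out.Unserialize(s);
    return out;
}
```

If the body tried to assign to `obj.x` unconditionally, the `Serialize` instantiation would fail to compile, which is the kind of const-correctness bug a cast would have hidden.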

Another issue is the various serialization-wrapping macros (VARINT, COMPACTSIZE, FLATDATA and LIMITED_STRING). They all const_cast their argument in order to construct a wrapper object, which supports both serialization and deserialization. This PR makes them templated in the underlying data type (for example, CompactSizeWrapper<uint64_t>). This has the advantage that we can make the template type const when invoked on a const variable (so it would be CompactSizeWrapper<const uint64_t> in that case).
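A minimal sketch of a wrapper templated on the underlying type might look as follows. The names `CompactSizeSketch`, `WrapCompactSize` and `ByteStream` are hypothetical, and canonical-encoding and size-limit checks that real code would want are left out. The helper deduces `I` as `const uint64_t` for a const variable, so the wrapper can be built without a `const_cast`, and `Unserialize` is only instantiated (and only compiles) for non-const `I`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy byte stream for illustration only.
struct ByteStream {
    std::vector<uint8_t> buf;
    std::size_t pos = 0;
    void put(uint8_t b) { buf.push_back(b); }
    uint8_t get() { assert(pos < buf.size()); return buf[pos++]; }
};

inline void WriteLE(ByteStream& s, uint64_t v, int nbytes)
{
    for (int i = 0; i < nbytes; ++i) s.put(static_cast<uint8_t>(v >> (8 * i)));
}

inline uint64_t ReadLE(ByteStream& s, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; ++i) v |= static_cast<uint64_t>(s.get()) << (8 * i);
    return v;
}

// Hypothetical wrapper: I may be const (serialize only) or non-const.
template <typename I>
class CompactSizeSketch
{
    I& m_val;

public:
    explicit CompactSizeSketch(I& val) : m_val(val) {}

    void Serialize(ByteStream& s) const
    {
        const uint64_t v = m_val;
        if (v < 253) {
            s.put(static_cast<uint8_t>(v));
        } else if (v <= 0xFFFFu) {
            s.put(253);
            WriteLE(s, v, 2);
        } else if (v <= 0xFFFFFFFFu) {
            s.put(254);
            WriteLE(s, v, 4);
        } else {
            s.put(255);
            WriteLE(s, v, 8);
        }
    }

    void Unserialize(ByteStream& s)
    {
        const uint8_t tag = s.get();
        uint64_t v = tag;
        if (tag == 253) v = ReadLE(s, 2);
        else if (tag == 254) v = ReadLE(s, 4);
        else if (tag == 255) v = ReadLE(s, 8);
        m_val = static_cast<I>(v); // only compiles when I is non-const
    }
};

// Deduces I = const uint64_t for const arguments, uint64_t otherwise.
template <typename I>
CompactSizeSketch<I> WrapCompactSize(I& v) { return CompactSizeSketch<I>(v); }
```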

A last issue is the persistent use of the REF macro to deal with temporary expressions being passed in. Since C++11, this is not needed anymore as temporaries are explicitly represented as rvalue references. Thus we can remove REF invocations and instead just make the various classes and helper functions deal correctly with references.
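To make the point about temporaries concrete, here is a small sketch under assumed names (`ToyStream`, `NarrowingReader` and `Deserialize` are hypothetical and not taken from the PR). A function template taking `T&&` accepts both a named wrapper and a temporary one; inside the function the parameter is an ordinary lvalue, so nothing needs to be forwarded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stream for illustration only: stores whole 32-bit values.
struct ToyStream {
    std::vector<uint32_t> data;
    std::size_t pos = 0;
    void write(uint32_t v) { data.push_back(v); }
    uint32_t read() { assert(pos < data.size()); return data[pos++]; }
};

// Hypothetical wrapper: holds a reference to the caller's variable and
// reads a 32-bit wire value into it.
class NarrowingReader
{
    uint16_t& m_ref;

public:
    explicit NarrowingReader(uint16_t& ref) : m_ref(ref) {}
    void Unserialize(ToyStream& s) { m_ref = static_cast<uint16_t>(s.read()); }
};

// T&& binds to lvalues and to temporaries alike. With a plain T& parameter,
// passing a temporary wrapper would be rejected by the compiler.
template <typename T>
void Deserialize(ToyStream& s, T&& obj)
{
    obj.Unserialize(s); // obj is an lvalue here; no std::forward needed
}
```

Usage would be `Deserialize(s, NarrowingReader(port));` for a temporary, or `NarrowingReader w(port); Deserialize(s, w);` for a named wrapper.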

The above changes permit a fully const-correct version of all serialization code. However, it is cumbersome. Any existing ADD_SERIALIZE_METHODS instance that does more than just (conditionally) serializing/deserializing some fields (in particular, one containing branches that assign to some of the variables) needs to be split up into explicit Serialize and Unserialize methods instead. In some cases this is inevitable (wallet serializers do some crazy transformations while serializing!), but in many cases it is just annoying duplication.

To improve upon this, a few more primitives that are currently inlined are turned into serialization wrappers:

  • BigEndianWrapper: Serializes/deserializes an integer as big endian rather than little endian (only for 16-bit). This permits the CService serialization to become a one-liner.
  • Uint48Wrapper: Serializes/deserializes only the lower 48 bits of an integer (used in BIP152 code).
  • VectorApplyWrapper: Serializes/deserializes a vector while using a custom serializer for its elements. This simplifies the undo and blockencoding serializers a lot.
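The first two wrappers in the list above could be sketched roughly like this. The class and helper names (`BigEndian16Sketch`, `Uint48Sketch`, `WrapBigEndian16`, `WrapUint48`, `ByteStream`) are hypothetical, and the 48-bit layout of six little-endian bytes is an assumption made for this example rather than something stated in the PR text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy byte stream for illustration only.
struct ByteStream {
    std::vector<uint8_t> buf;
    std::size_t pos = 0;
    void put(uint8_t b) { buf.push_back(b); }
    uint8_t get() { assert(pos < buf.size()); return buf[pos++]; }
};

// 16-bit integer written most significant byte first.
template <typename I>
class BigEndian16Sketch
{
    I& m_val;

public:
    explicit BigEndian16Sketch(I& v) : m_val(v) {}
    void Serialize(ByteStream& s) const
    {
        s.put(static_cast<uint8_t>(m_val >> 8));
        s.put(static_cast<uint8_t>(m_val & 0xFF));
    }
    void Unserialize(ByteStream& s)
    {
        const unsigned hi = s.get();
        const unsigned lo = s.get();
        m_val = static_cast<I>((hi << 8) | lo);
    }
};

// Only the lower 48 bits of a 64-bit integer, as six little-endian bytes
// (layout assumed for this sketch).
template <typename I>
class Uint48Sketch
{
    I& m_val;

public:
    explicit Uint48Sketch(I& v) : m_val(v) {}
    void Serialize(ByteStream& s) const
    {
        const uint64_t v = m_val;
        for (int i = 0; i < 6; ++i) s.put(static_cast<uint8_t>(v >> (8 * i)));
    }
    void Unserialize(ByteStream& s)
    {
        uint64_t v = 0;
        for (int i = 0; i < 6; ++i) v |= static_cast<uint64_t>(s.get()) << (8 * i);
        m_val = static_cast<I>(v);
    }
};

template <typename I>
BigEndian16Sketch<I> WrapBigEndian16(I& v) { return BigEndian16Sketch<I>(v); }

template <typename I>
Uint48Sketch<I> WrapUint48(I& v) { return Uint48Sketch<I>(v); }
```

Because each wrapper handles both directions, a class with a big-endian port field can list that field once in a combined read/write body instead of branching on the direction.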

Best of all, it removes 147 lines of code while adding a bunch of comments (though the increased use of vararg READWRITE is probably cheating a bit).

The commits are ordered into 3 sections:

  • First, introduce new classes that permit const-correct serialization.
  • Then one by one transform the various files to use the new serializers.
  • Finally, remove the old serializers.

This may be too much to go in at once. I'm happy to split things up as needed.

Member

gmaxwell commented Jul 10, 2017

Should probably be tested on big endian. :)

@sipa

Here is a self-review in which I point out some of the things reviewers may want to be aware of.

+ void Unserialize(Stream& s)
+ {
+ std::vector<uint64_t> tmp;
+ s >> blockhash >> VectorApply<CompactSizeWrapper>(tmp);

sipa Jul 10, 2017 edited

Owner

For ease of implementation, deserialization first happens into a std::vector<uint64_t>, and is then converted. This means a temporary is created and allocated, which is an overhead that the old implementation didn't have.
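The conversion step after reading into the temporary could look roughly like the sketch below. The helper name `NarrowToUint16` and its reject-on-overflow policy are assumptions for illustration; any further decoding the real code applies to the values is not shown.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Copies 64-bit values read from the wire into a 16-bit vector.
// Returns false (leaving out partially filled) if any value does not fit.
inline bool NarrowToUint16(const std::vector<uint64_t>& tmp, std::vector<uint16_t>& out)
{
    out.clear();
    out.reserve(tmp.size());
    for (const uint64_t v : tmp) {
        if (v > std::numeric_limits<uint16_t>::max()) return false;
        out.push_back(static_cast<uint16_t>(v));
    }
    return true;
}
```

The extra cost mentioned in the comment is visible here: `tmp` is a second allocation that exists only for the duration of the deserialization.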

src/test/serialize_tests.cpp
- READWRITEMANY(intval, boolval, stringval, FLATDATA(charstrval), txval);
+ SERIALIZE_METHODS(obj)
+ {
+ READWRITE(obj.intval, obj.boolval, obj.stringval, FlatData(obj.charstrval), obj.txval);

sipa Jul 10, 2017

Owner

This whole test is somewhat less valuable now, as both cases use READWRITE.

- inline void SerializationOp(Stream& s, Operation ser_action) {
- if (ser_action.ForRead())
- Init(NULL);
+ template<typename Stream>

sipa Jul 10, 2017

Owner

This is one of the more involved changes, as it's both splitting the serializer into two versions, and the Serialize code no longer modifies mapValue in-place (wtf?).

- ReadOrderPos(nOrderPos, mapValue);
+ template<typename Stream>

sipa Jul 10, 2017 edited

Owner

Here is another big change, that avoids modifying mapValue and strAccount and then later fixing it up before returning (wtf?).

+ * V is not required to be an std::vector type. It works for any class that
+ * exposes a value_type, iteration, and resize method that behave like vectors.
+ */
+template<template <typename> class W, typename V> class VectorApplyWrapper

sipa Jul 10, 2017

Owner

Notice the unusual construction of a template that takes a template as parameter here. See "Template template parameter" here: http://en.cppreference.com/w/cpp/language/template_parameters
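For readers unfamiliar with the construct, here is a self-contained sketch of a wrapper that takes another wrapper template as a template template parameter. `VectorApplySketch`, `ApplyToVector`, `BigEndian16Sketch` and `ByteStream` are hypothetical names, and the one-byte length prefix is a simplification for this example, not the encoding used by the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy byte stream for illustration only.
struct ByteStream {
    std::vector<uint8_t> buf;
    std::size_t pos = 0;
    void put(uint8_t b) { buf.push_back(b); }
    uint8_t get() { assert(pos < buf.size()); return buf[pos++]; }
};

// Example element wrapper: 16-bit integer, most significant byte first.
template <typename I>
class BigEndian16Sketch
{
    I& m_val;

public:
    explicit BigEndian16Sketch(I& v) : m_val(v) {}
    void Serialize(ByteStream& s) const
    {
        s.put(static_cast<uint8_t>(m_val >> 8));
        s.put(static_cast<uint8_t>(m_val & 0xFF));
    }
    void Unserialize(ByteStream& s)
    {
        const unsigned hi = s.get();
        const unsigned lo = s.get();
        m_val = static_cast<I>((hi << 8) | lo);
    }
};

// W is itself a template: it is instantiated per element, with a const
// element type when serializing and a non-const one when deserializing.
template <template <typename> class W, typename V>
class VectorApplySketch
{
    using Elem = typename V::value_type;
    V& m_vec;

public:
    explicit VectorApplySketch(V& v) : m_vec(v) {}

    void Serialize(ByteStream& s) const
    {
        assert(m_vec.size() < 256); // toy one-byte length prefix
        s.put(static_cast<uint8_t>(m_vec.size()));
        for (const Elem& e : m_vec) W<const Elem>(e).Serialize(s);
    }

    void Unserialize(ByteStream& s)
    {
        m_vec.resize(s.get());
        for (Elem& e : m_vec) W<Elem>(e).Unserialize(s);
    }
};

// W is given explicitly; V is deduced from the argument.
template <template <typename> class W, typename V>
VectorApplySketch<W, V> ApplyToVector(V& v) { return VectorApplySketch<W, V>(v); }
```

A call then reads `ApplyToVector<BigEndian16Sketch>(ports).Serialize(s);`, where only the wrapper template is named and the element type is filled in by the vector wrapper.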

{
- ::Serialize(s, std::forward<Arg>(arg));
+ ::Serialize(s, arg);

sipa Jul 10, 2017

Owner

The reason for removing the std::forward calls here is explained in the commit message (there is no benefit in passing down the rvalue-ness).

laanwj requested a review from jonasschnelli Jul 11, 2017

Member

jonasschnelli commented Jul 11, 2017

Concept ACK.
Binaries: https://bitcoin.jonasschnelli.ch/build/210 (Currently running on a fresh node)
Agree with @gmaxwell that some BE testing would be good.

Will code-review soon.

sipa added some commits Jul 7, 2017

@sipa sipa Introduce new serialization macros without casts
This new approach uses a static method which takes the object as
an argument. This has the advantage that its constness can be a
template parameter, allowing a single implementation that sees the
object as const for serialization and non-const for deserialization,
without casts.

More boilerplate is included in the new macro as well.
e7208b9
@sipa sipa Support deserializing into temporaries
Currently, the READWRITE macro cannot be passed any non-const temporaries, as
the SerReadWrite function only accepts lvalue references.

Deserializing into a temporary is very common, however. See for example
things like 's >> VARINT(n)'. The VARINT macro produces a temporary wrapper
that holds a reference to n.

Fix this by accepting non-const rvalue references instead of lvalue references.
We don't propagate the rvalue-ness down, as there are no useful optimizations
that only apply to temporaries.
beeebe6
@sipa sipa Add READWRITEAS, a macro to serialize safely as a different type 4b0b7df
@sipa sipa Generalize CompactSize wrapper
This makes it const-correct and usable for other integer types.
f80d6b5
@sipa sipa Generalize VarInt wrappers a1cfa72
@sipa sipa Generalize FlatData wrapper 8db2354
@sipa sipa Generalize LimitedString wrapper 6020efb
@sipa sipa Merge READWRITEMANY into READWRITE
READWRITEMANY is more general, and its single-argument form is
identical to READWRITE.

After this, only a variable-argument READWRITE remains.
07a965c
@sipa sipa Add BigEndian serialization wrapper c891032
@sipa sipa Add custom vector-element serialization wrapper
This allows a very compact notation for serialization of vectors whose
elements are not serialized using their default encoding.
d2de054
@sipa sipa Convert primitives to new serialization 31f8a21
@sipa sipa Convert addrdb/addrman to new serialization 3b0cfd6
@sipa sipa Convert blockencodings to new serialization f2a45e5
@sipa sipa Convert merkleblock/bloom to new serialization 447ed4b
@sipa sipa Convert chain to new serialization 15cf59d
@sipa sipa Convert feerate to new serialization 4e038b0
@sipa sipa Convert protocol to new serialization 9977f5e
@sipa sipa Move compressor utility functions out of class 523973e
@sipa sipa Convert compressor/txdb/coins/undo/script to new serialization eb156cf
@sipa sipa Convert Qt to new serialization db96db5
@sipa sipa Convert rest to new serialization 115e545
@sipa sipa Convert dbwrapper tests to new serialization 2dc7350
@sipa sipa Convert serialize_tests to new serialization 30ecea4
@sipa sipa Convert wallet/walletdb/crypter to new serialization fbaa24b
@sipa sipa Convert netaddress to new serialization fab2228
@sipa sipa Remove old serialization primitives b0652ac
Owner

sipa commented Jul 30, 2017

Made some changes to reduce the size of the overall diff.
