Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unused statements in serialization #8658

Merged

Conversation

paveljanik
Copy link
Contributor

@paveljanik paveljanik commented Sep 2, 2016

As the line

nVersion = this->nVersion;

seems to have no meaning in READ and also in WRITE serialization op, let's remove it and see what our tests/travis will tell us. See #8468 for previous discussion.

@dcousens
Copy link
Contributor

dcousens commented Sep 3, 2016

I'd suspect the binaries to be the same if any optimization flags are enabled.

@paveljanik
Copy link
Contributor Author

Tests/travis is OK. Can anyone test binaries, please?

@paveljanik paveljanik force-pushed the 20160902_nVersion_serialization_cleanup branch from 89d7305 to 64d9507 Compare September 9, 2016 12:00
@paveljanik
Copy link
Contributor Author

Rebased to match net.h-> addrdb.h commit from upstream.

@sipa
Copy link
Member

sipa commented Sep 9, 2016

The only place where the implicitly-passed-around nVersion variable is actually used is in CAddress::SerializationOp, and there it is just comparing with a constant directly.

@paveljanik
Copy link
Contributor Author

@sipa Exactly. And there is no such line like in the proposed change.

@sipa
Copy link
Member

sipa commented Sep 9, 2016

utACK 64d9507

After this I think we can actually go further and completely remove the nType and nVersion arguments from all SerializationOp methods, and replace them with calls to s.GetType() and s.GetVersion().

@maflcko maflcko changed the title WIP/DO NOT MERGE: Remove unused statements in serialization Remove unused statements in serialization Sep 9, 2016
@maflcko
Copy link
Member

maflcko commented Sep 9, 2016

Tests/travis is OK. Can anyone test binaries, please?

@paveljanik No need to test them. Apparently I get the same binaries, which means this is indeed dead code. (Instead of deleting the lines, you can replace them with empty lines to get rid of the offsets in the objdump). Can you check this?

@maflcko
Copy link
Member

maflcko commented Sep 9, 2016

bitcoind-matches-ACK 64d9507 (qt binaries do not match, though)

@luke-jr
Copy link
Member

luke-jr commented Sep 10, 2016

Eh, isn't this intended to allow the serialised version to override the parameter?

@sipa
Copy link
Member

sipa commented Sep 10, 2016 via email

@luke-jr
Copy link
Member

luke-jr commented Sep 10, 2016

These are all implementation-specific objects, so decentralisation is less of an issue. But I guess it's fine so long as it's never been used before (but I'm not certain of that).

@laanwj
Copy link
Member

laanwj commented Sep 14, 2016

Concept ACK.

Eh, isn't this intended to allow the serialised version to override the parameter?

It can always be added back for a certain serialization op in the oft case that that needs to be used. No need to have dead code around 'just in case'.

@paveljanik
Copy link
Contributor Author

I think this is ready for more ACKs. I volunteer for the next steps pointed out by @sipa above.

@laanwj
Copy link
Member

laanwj commented Sep 27, 2016

bitcoind-matches-ACK 64d9507 (qt binaries do not match, though)

Cannot reproduce this, I detect differences in the following functions between 64d9507 and 6898213:

void CBanEntry::SerializationOp<CDataStream, CSerActionSerialize>(CDataStream&, CSerActionSerialize, int, int)
void CBanEntry::SerializationOp<CDataStream, CSerActionUnserialize>(CDataStream&, CSerActionUnserialize, int, int)
void CHDChain::SerializationOp<CDataStream, CSerActionSerialize>(CDataStream&, CSerActionSerialize, int, int)
void CHDChain::SerializationOp<CDataStream, CSerActionUnserialize>(CDataStream&, CSerActionUnserialize, int, int)
void CKeyMetadata::SerializationOp<CDataStream, CSerActionSerialize>(CDataStream&, CSerActionSerialize, int, int)
void CKeyMetadata::SerializationOp<CDataStream, CSerActionUnserialize>(CDataStream&, CSerActionUnserialize, int, int)
void CMerkleTx::SerializationOp<CDataStream, CSerActionSerialize>(CDataStream&, CSerActionSerialize, int, int)
void CMerkleTx::SerializationOp<CDataStream, CSerActionUnserialize>(CDataStream&, CSerActionUnserialize, int, int)

@maflcko
Copy link
Member

maflcko commented Sep 27, 2016

35: mov [-$0x34a,%r8d-]{+$0x34b,%r8d+}

Make sure to replace the deleted lines by empty lines to get rid of the offset?

@paveljanik
Copy link
Contributor Author

BTW - this is not -Wshadow PR...

I think that we should compare binaries in some higher -O...

@sipa
Copy link
Member

sipa commented Sep 28, 2016

I think it's unlikely that this can result in identical binaries. It requires some cross-function reasoning across multiple modules to see this has no effect. We are changing the actual values of arguments passed down - they're just not used.

@laanwj
Copy link
Member

laanwj commented Sep 28, 2016

Make sure to replace the deleted lines by empty lines to get rid of the offset?

So these are line numbers? Ok, that'd make sense, will try again and w/ disabled LINE.

I think it's unlikely that this can result in identical binaries.

Yes the changes in the constructors are more involved.

I think that we should compare binaries in some higher -O...

Increasing -O generally makes it worse, not better. A lot of optimization settings make the output extremely sensitive to small change in the input, as well as cause a change in one function to move another (little related) function, making per-function comparison useless. This is why I went out of my way to find specific flags that work for this in my build/compare script: https://github.com/laanwj/bitcoin-maintainer-tools/blob/master/build-for-compare.py#L33 . It'd be possible to add specific flags if they're known to be safe.

@paveljanik
Copy link
Contributor Author

It is very unlikely to produce the same binary, but to be safe, we should fully understand the difference.

Can template inline play some role here?

As a side note: in the "gcc set" -Wshadow I automatically changed nVersion to _nVersion in class CBlock : public CBlockHeader. It was not used at all in the function, so I thought without testing it is OK. But p2p-segwit.py started reproducibly failing. I used that binary to test -rescan on testnet and it always "failed" - see #8808 (review) for details. Thus I had to revert it. Simple rename! Can you shed some light on it? Maybe it is relevant with this one.

@laanwj
Copy link
Member

laanwj commented Sep 28, 2016

So these are line numbers? Ok, that'd make sense, will try again and w/ disabled LINE.

They had to do with __LINE__ usage in LOCK/TRY_LOCK, which is interesting because without lock debugging the line numbers aren't used at all. So our sync.h macros/wrappers do incur a slight overhead even in that case.

In any case I've updated the above list, they no longer show up after removing line number sensitivity there.

@laanwj
Copy link
Member

laanwj commented Sep 28, 2016

We are changing the actual values of arguments passed down - they're just not used.

Right: READWRITE expands to

#define READWRITE(obj)      (::SerReadWrite(s, (obj), nType, nVersion, ser_action))

Thus silently passes nVersion to ::SerReadWrite. This function may or may not use the argument, but it is used.

This does change my opinion on this change from "harmless" to "hard to verify for correctness".

As a side note: in the "gcc set" -Wshadow I automatically changed nVersion to _nVersion in class CBlock : public CBlockHeader.
It was not used at all in the function, so I thought without testing it is OK.

Indeed, if you change the argument name, the name binding will change and functions called will suddenly receive this->nVersion instead of the function argument. this->nVersion obviously gets updated by the READWRITE(this->nVersion).
This will change the executable, but should not change the behavior, which is essentially the same as nVersion = this->nVersion. So by renaming you'd keep the behavior that there is right now, instead of changing it as in this PR.

At least that's what I would infer. I don't see why it would lead to a crashing segwit test though... This makes it kind of scary.

@laanwj
Copy link
Member

laanwj commented Sep 28, 2016

in class CBlock : public CBlockHeader.

OHH I get it, maybe.
Unlike the serialization functions changed in this PR, CBlockHeader::SerializationOp does not have the nVersion = this->nVersion line.
This means that the behavior does change if you rename the argument nVersion to _nVersion. After all,

READWRITE(this->nVersion);
READWRITE(hashPrevBlock);  

expands to

(::SerReadWrite(s, (this->nVersion), nType, nVersion, ser_action))
(::SerReadWrite(s, (hashPrevBlock), nType, nVersion, ser_action))

So the first line will changes this->nVersion, the second line passes nVersion. Which, after your argument renaming, refers to this->nVersion too. SO you effectively added nVersion = this->nVersion, which is a behavior change.

Do you really need that change for shadowing? I'd prefer not to do that, it looks to me that we're taking a huge risk just to avoid some compiler warnings. CBlock is as deep in consensus code as you can get.

@dcousens
Copy link
Contributor

dcousens commented Sep 28, 2016

@laanwj IMHO, this reveals that READWRITE is hiding intricacies which should instead be in plain sight.

@laanwj
Copy link
Member

laanwj commented Sep 28, 2016

@laanwj IMHO, this reveals that READWRITE is hiding intricacies which should instead be in plain sight.

IMO it'd have been better to just write those macros out., they make the code shorter apparently but much more obfuscated.
(but that's an issue for another time)

@sipa
Copy link
Member

sipa commented Sep 28, 2016 via email

@maflcko
Copy link
Member

maflcko commented Sep 28, 2016

Guess I've been doing it wrong then.

git checkout bitcoin/master && \
git reset --hard HEAD && \
curl https://raw.githubusercontent.com/laanwj/bitcoin-maintainer-tools/6e4425587736144b067f67ad792d9ee904e74fd7/patches/stripbuildinfo.patch | patch -p 1 && \
make -j 2 && \
objdump -d -r -C --no-show-raw-insn src/bitcoind > /tmp/d_old && \
curl https://github.com/bitcoin/bitcoin/commit/64d9507ea5724634783cdaa290943292132086a9.diff | patch -p 1 && \
make -j 2 && \
objdump -d -r -C --no-show-raw-insn src/bitcoind > /tmp/d_new && \
diff /tmp/d_old /tmp/d_new | wc

      0       0       0

@laanwj
Copy link
Member

laanwj commented Sep 29, 2016

@MarcoFalke the differences there would be:

  • I check the commits themselves, you merge the patch on top of master
  • I use a specific set of optimization flags, you use default optimization flags (-O2)

I don't think the first would make executables suddenly match. I'll retry with "-O2" and see if I can get a match.

@laanwj
Copy link
Member

laanwj commented Sep 29, 2016

Good news: using -O2 gives a complete match on bitcoind:

317c917ed86f7c32113598876ac3f48f84ea73281d01f69df1de78f429a019ed  /tmp/compare/bitcoind.64d9507.stripped
317c917ed86f7c32113598876ac3f48f84ea73281d01f69df1de78f429a019ed  /tmp/compare/bitcoind.6898213.stripped

as well as on bitcoin-qt

8675a67f3c2e5d4fc76f0083caeaa50585ba612e9b881ca22deea35de7f62c8b  /tmp/compare/bitcoin-qt.64d9507.stripped
8675a67f3c2e5d4fc76f0083caeaa50585ba612e9b881ca22deea35de7f62c8b  /tmp/compare/bitcoin-qt.6898213.stripped

ACK 64d9507

@paveljanik
Copy link
Contributor Author

@laanwj Thanks!

@laanwj
Copy link
Member

laanwj commented Sep 29, 2016

This still has a [WIP] tag on the commit. However I'm going to merge nevertheless, as rebasing to change the commit message would have us all re-check executables again...

@laanwj laanwj merged commit 64d9507 into bitcoin:master Sep 29, 2016
laanwj added a commit that referenced this pull request Sep 29, 2016
64d9507 [WIP] Remove unused statement in serialization (Pavel Janík)
codablock pushed a commit to codablock/dash that referenced this pull request Jan 12, 2018
64d9507 [WIP] Remove unused statement in serialization (Pavel Janík)
zkbot added a commit to zcash/zcash that referenced this pull request Apr 18, 2018
Upstream serialization improvements

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#5264
- bitcoin/bitcoin#6914
- bitcoin/bitcoin#6215
- bitcoin/bitcoin#8068
  - Only the `COMPACTSIZE` wrapper commit
- bitcoin/bitcoin#8658
- bitcoin/bitcoin#8708
  - Only the serializer variadics commit
- bitcoin/bitcoin#9039
- bitcoin/bitcoin#9125
  - Only the first two commits (the last two block on other upstream PRs)

Part of #2074.
zkbot added a commit to zcash/zcash that referenced this pull request Apr 19, 2018
Upstream serialization improvements

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#5264
- bitcoin/bitcoin#6914
- bitcoin/bitcoin#6215
- bitcoin/bitcoin#8068
  - Only the `COMPACTSIZE` wrapper commit
- bitcoin/bitcoin#8658
- bitcoin/bitcoin#8708
  - Only the serializer variadics commit
- bitcoin/bitcoin#9039
- bitcoin/bitcoin#9125
  - Only the first two commits (the last two block on other upstream PRs)

Part of #2074.
andvgal pushed a commit to energicryptocurrency/gen2-energi that referenced this pull request Jan 6, 2019
64d9507 [WIP] Remove unused statement in serialization (Pavel Janík)
furszy added a commit to PIVX-Project/PIVX that referenced this pull request May 29, 2020
249cc9d Avoid -Wshadow errors (random-zebra)
8e1ec9e Use fixed preallocation instead of costly GetSerializeSize (random-zebra)
9b801d0 Add optimized CSizeComputer serializers (random-zebra)
0035a54 Make CSerAction's ForRead() constexpr (random-zebra)
9730a3f Get rid of nType and nVersion (random-zebra)
25ce2bb Make GetSerializeSize a wrapper on top of CSizeComputer (random-zebra)
1b479db Make nType and nVersion private and sometimes const (random-zebra)
35f1755 Make streams' read and write return void (random-zebra)
a395914 Remove unused ReadVersion and WriteVersion (random-zebra)
52e614c [WIP] Remove unused statement in serialization (random-zebra)
82a2021 Add COMPACTSIZE wrapper similar to VARINT for serialization (random-zebra)
13ad779 add bip32 pubkey serialization (random-zebra)
9e9b7b5 [QA] Update json files with sig hash type in ASM for bitcoin-util-test (random-zebra)
3383983 Resolve issue bitcoin#3166 (random-zebra)

Pull request description:

  -Based on top of
  - [x] #1629

  Backports the following serialization improvements from upstream and adds the required changes for the 2nd layer network and the legacy zerocoin code.

  - bitcoin#5264
    > show scriptSig signature hash types. fixes bitcoin#3166
    >
    > The fix basically appends the scriptSig signature hash types, within parentheses, onto the end of the signature(s) in the various "asm" json outputs. That's just the first formatting idea that came to my mind.
    >
    > Added some tests for this too.

  - bitcoin#6215
    > CExtPubKey should be serializable like CPubKey.
    > This would allow storing extended private and public key to support BIP32/HD wallets.

  - bitcoin#8068 (only commit 5249dac)
     This adds COMPACTSIZE wrapper similar to VARINT for serialization

  - bitcoin#8658
    > As the line
    > ```
    > nVersion = this->nVersion;
    > ```
    > seems to have no meaning in READ and also in WRITE serialization op, let's remove it and see what our tests/travis will tell us. See bitcoin#8468 for previous discussion.

  - bitcoin#9039
    > The commits in this pull request implement a sequence of changes:
    >
    > - Simplifications:
    >   - **Remove unused ReadVersion and WriteVersion** CDataStream and CAutoFile had a ReadVersion and WriteVersion method that was never used. Remove them.
    >   - **Make nType and nVersion private and sometimes const** Make the various stream implementations' nType and nVersion private and const (except in CDataStream where we really need a setter).
    >   - **Make streams' read and write return void** The stream implementations have two layers (the upper one with operator<< and operator>>, and a lower one with read and write). The lower layer's return values are never used (nor should they, as they should only be used from the higher layer), so make them void.
    >   - **Make GetSerializeSize a wrapper on top of CSizeComputer** Given that in default GetSerializeSize implementations we're already using CSizeComputer(), get rid of the specialized GetSerializeSize methods everywhere, and just use CSizeComputer. This removes a lot of code which isn't actually used anywhere. In a few places, this removes an actually more efficient size computing algorithm, which we'll bring back in the "Add optimized CSizeComputer serializers" commit later.
    >   - **Get rid of nType and nVersion** The big change: remove the nType and nVersion as parameters to all serialization methods and functions. There is only one place where it's read and has an impact (in CAddress), and even there it does not impact any of the member objects' serializations. Instead, the few places that need nType or nVersion read it directly from the stream, through GetType and GetVersion calls which are added to all streams.
    >   - **Avoid -Wshadow errors** As suggested by @paveljanik, remove the few remaining cases of variable shadowing in the serialization code.
    > - Optimizations:
    >   - **Make CSerAction's ForRead() constexpr** The CSerAction's ForRead() method does not depend on any runtime data, so guarantee that requests to it can be optimized out by making it constexpr (suggested by @theuni in bitcoin#8580).
    >   - **Add optimized CSizeComputer serializers** To get the advantages of faster GetSerializeSize implementations back, reintroduce them in the few places where they actually make a difference, in the form of a specialized Serialize implementation. This actually gets us in a better state than before, as these even get used when they're nested inside the serialization of another object.
    >   - **Use fixed preallocation instead of costly GetSerializeSize** dbwrapper uses GetSerializeSize to compute the size of the buffer to preallocate. For some cases (specifically: CCoins) this requires a costly compression call. Avoid this by just using fixed size preallocations instead.
    >
    > This will make it easier to address @TheBlueMatt's comments in bitcoin#8580, resulting is a simpler and more efficient way to simultaneously deserialize+construct objects with const members from streams.

ACKs for top commit:
  furszy:
    Long and nice PR 👌 , code review ACK 249cc9d .
  Fuzzbawls:
    ACK 249cc9d
  furszy:
    tested ACK 249cc9d and merging.

Tree-SHA512: 56b07634b1e18871e7c9a99d412282c83b85f77f1672ec56330a1131fc7c234cd1ba3a053bdd210cc29f1e636ee374477ff614fa9a930329a7f8f912c5006232
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants