build: comparison tool swap #6305

Merged
merged 1 commit into from Jun 19, 2015

Conversation

Projects
None yet
4 participants
@theuni
Member

theuni commented Jun 19, 2015

As suggested here: #6278 (comment)

This should be functionally identical to what's in place now. It was built from
theuni/bitcoinj@be0eef7

That commit is the same as this pruned commit in TheBlueMatt's repo:
TheBlueMatt/bitcoinj@0f7b5d8

Now we'll be able to trust the line numbers in the stack traces.

@theuni

This comment has been minimized.

Show comment
Hide comment
@theuni

theuni Jun 19, 2015

Member

While there are no other builds in the queue, I'll force-push a few times to try to trigger the NPE

Member

theuni commented Jun 19, 2015

While there are no other builds in the queue, I'll force-push a few times to try to trigger the NPE

build: comparison tool swap
This should be functionally identical to what's in place now. It was built from
theuni/bitcoinj@be0eef7

That commit is the same as this pruned commit in TheBlueMatt's repo:
TheBlueMatt/bitcoinj@0f7b5d8

Now we'll be able to trust the line numbers in the stack traces.
@theuni

This comment has been minimized.

Show comment
Hide comment
@theuni

theuni Jun 19, 2015

Member

Grr, 7 forced rebuilds and no errors. I'm not sure if that's a good thing or not :)

Member

theuni commented Jun 19, 2015

Grr, 7 forced rebuilds and no errors. I'm not sure if that's a good thing or not :)

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Jun 19, 2015

Member

Well it tends to happen in droves, as if it depends on the load on the Travis server.
I wouldn't applaud yet :)

Member

laanwj commented Jun 19, 2015

Well it tends to happen in droves, as if it depends on the load on the Travis server.
I wouldn't applaud yet :)

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Jun 19, 2015

Member

Going ahead and merging this for troubleshooting improvement.

Member

laanwj commented Jun 19, 2015

Going ahead and merging this for troubleshooting improvement.

@laanwj laanwj merged commit a4d9f95 into bitcoin:master Jun 19, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request Jun 19, 2015

Merge pull request #6305
a4d9f95 build: comparison tool swap (Cory Fields)
@dexX7

This comment has been minimized.

Show comment
Hide comment
@dexX7

dexX7 Jun 19, 2015

Contributor

The issue is still present. I looped it locally, until it failed (took around 30 min):

06:34:40 15 BitcoindComparisonTool$1.onPreMessageReceived: Requested 230386124745549b4a14cb801c16c308568fcc5679b5afd9df0d8601cf938f7e
06:34:40 15 Peer.processMessage: Received unhandled message: Reject: block 0f863347c378794f3a04e2899158a2f952c3720159fe71b6e13d23cca538f4d2 for reason 'bad-txns-inputs-missingorspent' (16)
06:34:40 15 Peer.processMessage: Received unhandled message: Reject: block 230386124745549b4a14cb801c16c308568fcc5679b5afd9df0d8601cf938f7e for reason 'bad-blk-sigops' (16)
06:34:40 1 BitcoindComparisonTool.main: Sent inv with block 0f863347c378794f3a04e2899158a2f952c3720159fe71b6e13d23cca538f4d2
06:34:40 15 BitcoindComparisonTool$1.onPreMessageReceived: Got empty header message from bitcoind
Exception in thread "main" java.lang.NullPointerException
    at com.google.bitcoin.core.BitcoindComparisonTool.main(BitcoindComparisonTool.java:311)

debug.log:

2015-06-19 16:34:40 received: getheaders (69 bytes) peer=1
2015-06-19 16:34:40 getheaders -1 to 0000000000000000000000000000000000000000000000000000000000000000 from peer=1
2015-06-19 16:34:40 sending: headers (1 bytes) peer=1
2015-06-19 16:34:40 received: ping (8 bytes) peer=1
2015-06-19 16:34:40 sending: pong (8 bytes) peer=1
2015-06-19 16:34:41 trying connection 0.0.0.0 lastseen=0.0hrs

It's caused by BitcoindComparisonTool.java#L311:

bitcoind.ping().get();

More specifically, because future of PendingPing (Peer.java#L1424) was null, which was returned by ping(), presumably because a pong was received, and processed via PendingPing#complete() (Peer.java#L1383), before it went through:

/**
* Sends the peer a ping message and returns a future that will be invoked when the pong is received back.
* The future provides a number which is the number of milliseconds elapsed between the ping and the pong.
* Once the pong is received the value returned by {@link com.google.bitcoin.core.Peer#getLastPingTime()} is
* updated.
* @throws ProtocolException if the peer version is too low to support measurable pings.
*/
public ListenableFuture<Long> ping()
Contributor

dexX7 commented Jun 19, 2015

The issue is still present. I looped it locally, until it failed (took around 30 min):

06:34:40 15 BitcoindComparisonTool$1.onPreMessageReceived: Requested 230386124745549b4a14cb801c16c308568fcc5679b5afd9df0d8601cf938f7e
06:34:40 15 Peer.processMessage: Received unhandled message: Reject: block 0f863347c378794f3a04e2899158a2f952c3720159fe71b6e13d23cca538f4d2 for reason 'bad-txns-inputs-missingorspent' (16)
06:34:40 15 Peer.processMessage: Received unhandled message: Reject: block 230386124745549b4a14cb801c16c308568fcc5679b5afd9df0d8601cf938f7e for reason 'bad-blk-sigops' (16)
06:34:40 1 BitcoindComparisonTool.main: Sent inv with block 0f863347c378794f3a04e2899158a2f952c3720159fe71b6e13d23cca538f4d2
06:34:40 15 BitcoindComparisonTool$1.onPreMessageReceived: Got empty header message from bitcoind
Exception in thread "main" java.lang.NullPointerException
    at com.google.bitcoin.core.BitcoindComparisonTool.main(BitcoindComparisonTool.java:311)

debug.log:

2015-06-19 16:34:40 received: getheaders (69 bytes) peer=1
2015-06-19 16:34:40 getheaders -1 to 0000000000000000000000000000000000000000000000000000000000000000 from peer=1
2015-06-19 16:34:40 sending: headers (1 bytes) peer=1
2015-06-19 16:34:40 received: ping (8 bytes) peer=1
2015-06-19 16:34:40 sending: pong (8 bytes) peer=1
2015-06-19 16:34:41 trying connection 0.0.0.0 lastseen=0.0hrs

It's caused by BitcoindComparisonTool.java#L311:

bitcoind.ping().get();

More specifically, because future of PendingPing (Peer.java#L1424) was null, which was returned by ping(), presumably because a pong was received, and processed via PendingPing#complete() (Peer.java#L1383), before it went through:

/**
* Sends the peer a ping message and returns a future that will be invoked when the pong is received back.
* The future provides a number which is the number of milliseconds elapsed between the ping and the pong.
* Once the pong is received the value returned by {@link com.google.bitcoin.core.Peer#getLastPingTime()} is
* updated.
* @throws ProtocolException if the peer version is too low to support measurable pings.
*/
public ListenableFuture<Long> ping()
@theuni

This comment has been minimized.

Show comment
Hide comment
@theuni

theuni Jun 19, 2015

Member

@dexX7 Thanks for having a look. The line numbers didn't change, so it's the same as before.

I came to the same conclusion when I looked into it last time, but @mikehearn was confident that the issue couldn't be here. My guess was that it could be caused by a non-so-random PRNG causing ping lookups to race. In that case, replacing the bitcoind.ping().get() with bitcoind.ping(counter++).get() would avoid the issue I think.

@mikehearn Mind having another look, now that we're sure the line#s are good? Even better, since @dexX7 can reproduce locally, maybe he'll be able to test proposed changes more easily.

Member

theuni commented Jun 19, 2015

@dexX7 Thanks for having a look. The line numbers didn't change, so it's the same as before.

I came to the same conclusion when I looked into it last time, but @mikehearn was confident that the issue couldn't be here. My guess was that it could be caused by a non-so-random PRNG causing ping lookups to race. In that case, replacing the bitcoind.ping().get() with bitcoind.ping(counter++).get() would avoid the issue I think.

@mikehearn Mind having another look, now that we're sure the line#s are good? Even better, since @dexX7 can reproduce locally, maybe he'll be able to test proposed changes more easily.

@dexX7

This comment has been minimized.

Show comment
Hide comment
@dexX7

dexX7 Jun 19, 2015

Contributor

My guess was that it could be caused by a non-so-random PRNG causing ping lookups to race.

Ah, just to clarify: I'm not sure, how it's caused, and this was just a guess.

But regarding the nonce lookup "collision": I gave it a try and it looks like the nonce can be anything, there always seems to be only one ping in the pending pings list (at least in these tests, for what I saw).

I quickly created a stripped ping-pong only version of the tool with additional log output, and when running via:

qa/pull-tester/run-bitcoind-for-test.sh /usr/bin/java -jar /path/to/bitcoinj/core/target/pull-tests.jar qa/tmp/compTool 0 2>&1

Then the flipped created/clearing log entries further indicate the pong was indeed processed, before ping() finished:

09:44:37 1 Peer.ping: created ping with nonce: 1102
09:44:37 11 Peer.processPong: clearing pong with nonce: 1102 (pending: 1)
09:44:37 1 Peer.ping: created ping with nonce: 1103
09:44:37 11 Peer.processPong: clearing pong with nonce: 1103 (pending: 1)
09:44:37 1 Peer.ping: created ping with nonce: 1104
09:44:37 11 Peer.processPong: clearing pong with nonce: 1104 (pending: 1)
09:44:37 11 Peer.processPong: clearing pong with nonce: 1105 (pending: 1) <-- uh oh
09:44:37 1 Peer.ping: created ping with nonce: 1105
09:44:37 1 BitcoindComparisonTool.main: ERROR: ping() with nonce 1105 returned null
Contributor

dexX7 commented Jun 19, 2015

My guess was that it could be caused by a non-so-random PRNG causing ping lookups to race.

Ah, just to clarify: I'm not sure, how it's caused, and this was just a guess.

But regarding the nonce lookup "collision": I gave it a try and it looks like the nonce can be anything, there always seems to be only one ping in the pending pings list (at least in these tests, for what I saw).

I quickly created a stripped ping-pong only version of the tool with additional log output, and when running via:

qa/pull-tester/run-bitcoind-for-test.sh /usr/bin/java -jar /path/to/bitcoinj/core/target/pull-tests.jar qa/tmp/compTool 0 2>&1

Then the flipped created/clearing log entries further indicate the pong was indeed processed, before ping() finished:

09:44:37 1 Peer.ping: created ping with nonce: 1102
09:44:37 11 Peer.processPong: clearing pong with nonce: 1102 (pending: 1)
09:44:37 1 Peer.ping: created ping with nonce: 1103
09:44:37 11 Peer.processPong: clearing pong with nonce: 1103 (pending: 1)
09:44:37 1 Peer.ping: created ping with nonce: 1104
09:44:37 11 Peer.processPong: clearing pong with nonce: 1104 (pending: 1)
09:44:37 11 Peer.processPong: clearing pong with nonce: 1105 (pending: 1) <-- uh oh
09:44:37 1 Peer.ping: created ping with nonce: 1105
09:44:37 1 BitcoindComparisonTool.main: ERROR: ping() with nonce 1105 returned null
@theuni

This comment has been minimized.

Show comment
Hide comment
@theuni

theuni Jun 19, 2015

Member

@dexX7 Ah, great work. Looks like we have an answer :)

Member

theuni commented Jun 19, 2015

@dexX7 Ah, great work. Looks like we have an answer :)

@mikehearn

This comment has been minimized.

Show comment
Hide comment
@mikehearn

mikehearn Jun 21, 2015

Contributor

@dexX7 Thanks! The race is fixed in this commit:

bitcoinj/bitcoinj@ce50e0b

Running "mvn package" inside a git master checkout of bitcoinj should create a file core/target/pull-tests.jar that contains a fresh pull tester.

Contributor

mikehearn commented Jun 21, 2015

@dexX7 Thanks! The race is fixed in this commit:

bitcoinj/bitcoinj@ce50e0b

Running "mvn package" inside a git master checkout of bitcoinj should create a file core/target/pull-tests.jar that contains a fresh pull tester.

theuni added a commit to theuni/bitcoin that referenced this pull request Jun 21, 2015

build: comparison tool NPE fix
This should fix the spurious comparison tool failures.
See discussion here: bitcoin#6305

The race fix was cherry-picked on top of the version we're currently using, so
it should be functionally identical otherwise.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment