Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

abandonconflict.py randomly fails on 'dev' and 'release' branches #537

Closed
ftrader opened this issue May 7, 2017 · 17 comments
Closed

abandonconflict.py randomly fails on 'dev' and 'release' branches #537

ftrader opened this issue May 7, 2017 · 17 comments

Comments

@ftrader
Copy link
Collaborator

ftrader commented May 7, 2017

This test fails randomly. Results of repetitive testing for 'abandonconflict.py' on Debian 7 x86_64 native:

'dev' branch: 43/100 failed
'release' branch: 42/100 failed

Release tag v1.01.4 : also fails 2/10
Release tag 1.0.1.3 tested ok - 10/10 PASS

On Ubuntu 16.10: 57/100 failed on 'dev'. Didn't bother to run repetivite on 'release', went straight for a bisection using the bisect.sh script from #532 .
Command was:
git bisect run bisect.sh 20 300 'qa/pull-tester/rpc-tests.py abandonconflict' (with 20 iterations, 300s timeout and RECONFIGURE_BETWEEN_RUNS=1). Starting commit (bad revision) was e78ce5e , good revision (through manual test above) was 1.0.1.3
Result:
first bad commit: [4e55fdde78e3406aa02d535fd928e3fa5fac4f72] Disconnect and ban a node if they fail to provide the inital HEADERS

A bisection for 'dev' is still in progress.

@ptschip
Copy link
Collaborator

ptschip commented May 7, 2017 via email

@ftrader
Copy link
Collaborator Author

ftrader commented May 7, 2017

@ptschip : Travis also has a problem with this test (this is how I initially noticed this test not passing well).

My system is old, but I'm pretty sure it's not THAT slow that it can hardly run a Bitcoin regression test :-)

I'll try to confirm though whether it's entirely timeout related. Thanks for the hint.

@ftrader
Copy link
Collaborator Author

ftrader commented May 7, 2017

Example failure on 'dev' from ongoing bisection:

JSONRPC error: non-JSON HTTP response with '503 Service Unavailable' from server
  File "/home/ftrader/bu/qa/rpc-tests/test_framework/test_framework.py", line 184, in main
    self.run_test()
  File "/home/ftrader/bu/qa/rpc-tests/abandonconflict.py", line 27, in run_test
    sync_mempools(self.nodes)
  File "/home/ftrader/bu/qa/rpc-tests/test_framework/util.py", line 170, in sync_mempools
    pool = set(rpc_connections[0].getrawmempool())
  File "/home/ftrader/bu/qa/rpc-tests/test_framework/coverage.py", line 51, in __call__
    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
  File "/home/ftrader/bu/qa/rpc-tests/test_framework/authproxy.py", line 156, in __call__
    response = self._request('POST', self.__url.path, postdata)
  File "/home/ftrader/bu/qa/rpc-tests/test_framework/authproxy.py", line 135, in _request
    return self._get_response()
  File "/home/ftrader/bu/qa/rpc-tests/test_framework/authproxy.py", line 179, in _get_response
    'code': -342, 'message': 'non-JSON HTTP response with \'%i %s\' from server' % (http_response.status, http_response.reason)})
Stopping nodes
Traceback (most recent call last):
  File "/home/ftrader/bu/qa/rpc-tests/abandonconflict.py", line 157, in <module>
    AbandonConflictTest().main()
  File "/home/ftrader/bu/qa/rpc-tests/test_framework/test_framework.py", line 203, in main
    stop_nodes(self.nodes)
  File "/home/ftrader/bu/qa/rpc-tests/test_framework/util.py", line 376, in stop_nodes
    node.stop()
  File "/home/ftrader/bu/qa/rpc-tests/test_framework/coverage.py", line 51, in __call__
    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
  File "/home/ftrader/bu/qa/rpc-tests/test_framework/authproxy.py", line 156, in __call__
    response = self._request('POST', self.__url.path, postdata)
  File "/home/ftrader/bu/qa/rpc-tests/test_framework/authproxy.py", line 134, in _request
    self.__conn.request(method, path, postdata, headers)
  File "/home/ftrader/lib/python3.5/http/client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "/home/ftrader/lib/python3.5/http/client.py", line 1151, in _send_request
    self.endheaders(body)
  File "/home/ftrader/lib/python3.5/http/client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "/home/ftrader/lib/python3.5/http/client.py", line 934, in _send_output
    self.send(msg)
  File "/home/ftrader/lib/python3.5/http/client.py", line 877, in send
    self.connect()
  File "/home/ftrader/lib/python3.5/http/client.py", line 849, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "/home/ftrader/lib/python3.5/socket.py", line 711, in create_connection
    raise err
  File "/home/ftrader/lib/python3.5/socket.py", line 702, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

@ftrader
Copy link
Collaborator Author

ftrader commented May 7, 2017

Bisection result from 'dev' on Debian 7 native: 1df110e is the first bad commit
(same principal change as on 'release')
Bisection log:

$ git bisect log
# bad: [9d85241826f2521eec56b5d95f6b009bdf260eb5] Merge pull request #525 from kyuupichan/exp-unify
# good: [0824e741b3ed47a689ae4490b4206dda378da7ef] Improve tor documentation
git bisect start 'HEAD' '0824e741b3ed47a689ae4490b4206dda378da7ef'
# bad: [8708ac1bde8fe3bf8129b96af7050601723df962] Merge #9008: [net] Remove assert(nMaxInbound > 0)
git bisect bad 8708ac1bde8fe3bf8129b96af7050601723df962
# bad: [2982f4753f953a250b5e8b9ed7e4a36a504df6b5] Merge pull request #422 from AllanDoensen/devGuiFix
git bisect bad 2982f4753f953a250b5e8b9ed7e4a36a504df6b5
# bad: [b0790846dd08509f308e1e135f41d29b8f2aca14] Merge pull request #401 from gandrewstone/block_sync_polling_loop
git bisect bad b0790846dd08509f308e1e135f41d29b8f2aca14
# good: [fccefa5b2eaa5625547c8ebc58cd19e14672c154] Remove BU-specific alert system bits
git bisect good fccefa5b2eaa5625547c8ebc58cd19e14672c154
# good: [87392b448973b284d05400ef778023f288d6fee0] Merge pull request #360 from kyuupichan/remove-alerts
git bisect good 87392b448973b284d05400ef778023f288d6fee0
# bad: [101502f859321be2fbd5c5bd29bfa539ba88a911] Merge pull request #356 from ptschip/dev_sync
git bisect bad 101502f859321be2fbd5c5bd29bfa539ba88a911
# good: [0ba700c6b22a83d5961106ebd4d62a2b3f38c1d5] Merge pull request #351 from ptschip/dev_ui
git bisect good 0ba700c6b22a83d5961106ebd4d62a2b3f38c1d5
# bad: [1df110e02cfb4643fc61d74636cdd8968f827e62] Disconnect and ban a node if they fail to provide the inital HEADERS
git bisect bad 1df110e02cfb4643fc61d74636cdd8968f827e62
# first bad commit: [1df110e02cfb4643fc61d74636cdd8968f827e62] Disconnect and ban a node if they fail to provide the inital HEADERS

@gandrewstone
Copy link
Collaborator

The problem is that a node makes a HEADERs request, and the other node must respond or get booted.

But GETHEADERs does not always respond. For example:

if (IsInitialBlockDownload() && !pfrom->fWhitelisted) {
LogPrint("net", "Ignoring getheaders from peer=%d because node is in initial block download\n", pfrom->id);
return true;
}

We must either ensure that GETHEADERs always responds (in all clients) or remove this check.

@ptschip
Copy link
Collaborator

ptschip commented May 8, 2017 via email

@gandrewstone
Copy link
Collaborator

while whitelisting may work, on the real network we will still be kicking legit nodes out

@ptschip
Copy link
Collaborator

ptschip commented May 8, 2017 via email

@ftrader
Copy link
Collaborator Author

ftrader commented May 8, 2017

With whitelisting the test passed 16 times in a row (on dev @ 73a9cfe) , then it failed on iteration 17 failed with a '503 Service Unavailable' error after being timed out (I set it to 300s - the test usually takes 120-140s when it passes).

@ftrader
Copy link
Collaborator Author

ftrader commented May 9, 2017

Decided to whitelist this test as it does not interfere with the functionality tested.
The remaining timeout problem is something which apparently randomly affects multiple tests - seems a separate problem.

@ftrader ftrader changed the title abandonconflict.py is broken on 'dev' and 'release' branches abandonconflict.py randomly fails on 'dev' and 'release' branches May 9, 2017
ftrader added a commit to ftrader-bitcoinunlimited/BitcoinUnlimited that referenced this issue May 9, 2017
Issue: BitcoinUnlimited#537

Whitelisting them does not interfere with this test, and seems to lower the
probability of this test failing randomly. The remaining random hanging failure seems to have a different cause which remains to be investigated.
@sickpig
Copy link
Collaborator

sickpig commented May 11, 2017

@ftrader test pass for me 10 times in a row bot on current dev and release.

@ftrader
Copy link
Collaborator Author

ftrader commented May 12, 2017

@sickpig : Did another test run on current 1.0.2.0 release (9fcdbf5) and it timed out again on run 17.

Random seed: 1494567800
Initializing test directory  /tmp/testxzxvr2n4 Bitcoin conf:  None walletfiles:  None
[202, 300]
[210, 300]
[218, 300]
[239, 300]
[300, 300]
Unexpected exception caught during testing: timeout('timed out',)
Stopping nodes
Cannot send request: Request-sent
Connection broken:  [Errno 32] Broken pipe
Cleaning up
Failed
Running testscript ESC[1mabandonconflict.pyESC[0m ...
CalledProcessError for test abandonconflict: Command '/home/ftrader/bu/qa/rpc-tests/abandonconflict.py --srcdir /home/ftrader/bu/src  ' returned non-zero exit status 1
Duration: 70 s

On current 'dev' (76fd9a6) I had a pass rate of 17 / 20 .
First failure (little bit different than on 'release'):

[301, 301]
JSONRPC error: Node not found in connected nodes
Stopping nodes
Cleaning up
Failed

and two seemed to have failed with
JSONRPC error: non-JSON HTTP response with '503 Service Unavailable' from server

So unfortunately I don't think this problem has really been fixed on current 'dev' / 'release' .

@sickpig
Copy link
Collaborator

sickpig commented May 14, 2017

@ftrader

tested release @ 9fcdbf5 40 times in a row in 2 different systems and I didn't get any failures.

tested also dev @ 76fd9a6 30 times still no failures.

Tomorrow I'll try to test with a higher numbers of retries.

@ftrader
Copy link
Collaborator Author

ftrader commented May 15, 2017

Noticed bitcoin/bitcoin#10344 .

I will try the approach taken there.

@ptschip
Copy link
Collaborator

ptschip commented May 15, 2017 via email

@ftrader
Copy link
Collaborator Author

ftrader commented May 15, 2017

Raised #582 , but am doing some more repetitive testing on my system to get a feel for whether it's effective for me.

@sickpig
Copy link
Collaborator

sickpig commented Aug 14, 2017

abandonconflict.py does not fail on a series of 80 retries in a row, both on release and dev branches.

Closing, @ftrader please feel free to reopen if you are able to reproduce the problem.

@sickpig sickpig closed this as completed Aug 14, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants