abandonconflict.py randomly fails on 'dev' and 'release' branches #537
@freetrader sounds like you might have a timing issue... initial headers need to come back within 30 seconds. You might want to try 60 and see if the problem goes away. (Just FYI, I never have any issues running this test on my system.)
Go to line 116 of main.h and set INITIAL_HEADERS_TIMEOUT = 60; perhaps even try 600, just to see if the problem is the timeout period.
@ptschip : Travis also has a problem with this test (this is how I initially noticed this test not passing well). My system is old, but I'm pretty sure it's not THAT slow that it can hardly run a Bitcoin regression test :-) I'll try to confirm whether it's entirely timeout related, though. Thanks for the hint.
Example failure on 'dev' from ongoing bisection:
Bisection result from 'dev' on Debian 7 native: 1df110e is the first bad commit
The problem is that a node makes a HEADERS request, and the other node must respond or get booted. But GETHEADERS does not always respond. For example:

```cpp
if (IsInitialBlockDownload() && !pfrom->fWhitelisted) {
    LogPrint("net", "Ignoring getheaders from peer=%d because node is in initial block download\n", pfrom->id);
    return true;
}
```

We must either ensure that GETHEADERS always responds (in all clients) or remove this check.
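To make the failure mode concrete, here is a small Python model of that check (a sketch with descriptive names, not the actual node code):

```python
# Sketch of the C++ check above: a GETHEADERS request is ignored when the
# serving node is in initial block download (IBD), unless the requesting
# peer is whitelisted.
def responds_to_getheaders(in_ibd: bool, peer_whitelisted: bool) -> bool:
    """Mirrors `if (IsInitialBlockDownload() && !pfrom->fWhitelisted)`."""
    return not in_ibd or peer_whitelisted
```

This is why the test can hang: a freshly started regtest node is in IBD, so its peer's GETHEADERS goes unanswered and the 30-second timeout trips.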
has anyone tried just whitelisting the nodes during the tests?
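For reference, a minimal sketch of what whitelisting in the test setup could look like. `-whitelist=127.0.0.1` is a standard bitcoind option; the helper name and the two-node layout are assumptions, not the actual abandonconflict.py code:

```python
# Sketch: build per-node startup arguments that whitelist localhost peers, so
# GETHEADERS is answered even while a node is still in initial block download.
# (Hypothetical helper; the real qa framework wires extra_args differently.)
def whitelisted_args(num_nodes):
    return [["-whitelist=127.0.0.1"] for _ in range(num_nodes)]

extra_args = whitelisted_args(2)  # assuming a two-node test topology
```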
while whitelisting may work, on the real network we will still be kicking legit nodes out
On the real network we don't ban a node unless it claims its chain height is longer than ours AND we tried a getheaders from it to sync our own chain. So we would never ban a node if, for instance, it was doing IBD and couldn't provide us the headers anyway.
On the other hand, I think you might be partly right here in one special case: if they are also doing IBD at the same time we are AND are ahead of us, then we could potentially get into a ban. So perhaps we need to issue a disconnect instead of a ban, so that we can continue our IBD and get headers from the next node, and also give the other node a chance to reconnect later. It would be safer anyway... a disconnect here is enough.
But all that said, it won't fix the problem with this script... I still think the only solution for this script may be to whitelist.
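The disconnect-instead-of-ban policy described above can be sketched as a small Python decision function (hypothetical names, not the actual BU source):

```python
# Sketch of the proposed peer policy: a peer that claimed a longer chain but
# never answered our initial GETHEADERS (e.g. because it is itself in IBD)
# should only be disconnected, so it can reconnect later, not banned.
DISCONNECT, BAN, KEEP = "disconnect", "ban", "keep"

def initial_headers_policy(peer_claims_longer_chain, headers_arrived, timed_out,
                           ban_on_timeout=False):
    """Decide what to do with a peer we asked for initial headers."""
    if headers_arrived or not timed_out:
        return KEEP
    if not peer_claims_longer_chain:
        return KEEP  # we never asked this peer to sync our chain
    return BAN if ban_on_timeout else DISCONNECT
```

With `ban_on_timeout=False` (the suggestion above), two nodes doing IBD at the same time merely drop the connection instead of locking each other out.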
With whitelisting the test passed 16 times in a row (on dev @ 73a9cfe), then on iteration 17 it failed with a '503 Service Unavailable' error after being timed out (I set the timeout to 300s; the test usually takes 120-140s when it passes).
Decided to whitelist the nodes in this test, as it does not interfere with the functionality tested.
Issue: BitcoinUnlimited#537 Whitelisting them does not interfere with this test, and seems to lower the probability of this test failing randomly. The remaining random hanging failure seems to have a different cause which remains to be investigated.
@ftrader the test passed for me 10 times in a row on current.
@sickpig : Did another test run on the current 1.0.2.0 release (9fcdbf5) and it timed out again on run 17.
On current 'dev' (76fd9a6) I had a pass rate of 17/20, and two runs seemed to have failed for a different reason. So unfortunately I don't think this problem has really been fixed on current 'dev' / 'release'.
Noticed bitcoin/bitcoin#10344. I will try the approach taken there.
Looks good. That connect_nodes_bi thing is pretty crazy, I never understood why they introduced it in the first place.
Raised #582, but am doing some more repetitive testing on my system to get a feel for whether it's effective for me.
Closing, @ftrader please feel free to reopen if you are able to reproduce the problem.
This test fails randomly. Results of repetitive testing for 'abandonconflict.py' on Debian 7 x86_64 native:
'dev' branch: 43/100 failed
'release' branch: 42/100 failed
Release tag v1.01.4: also fails 2/10
Release tag 1.0.1.3 tested ok - 10/10 PASS
On Ubuntu 16.10: 57/100 failed on 'dev'. Didn't bother to run repetitive testing on 'release', went straight for a bisection using the bisect.sh script from #532.
Command was:
git bisect run bisect.sh 20 300 'qa/pull-tester/rpc-tests.py abandonconflict'
(with 20 iterations, 300s timeout and RECONFIGURE_BETWEEN_RUNS=1). Starting commit (bad revision) was e78ce5e, good revision (through manual test above) was 1.0.1.3.
Result:
first bad commit: [4e55fdde78e3406aa02d535fd928e3fa5fac4f72] Disconnect and ban a node if they fail to provide the inital HEADERS
A bisection for 'dev' is still in progress.
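The core of a bisect.sh-style wrapper can be sketched in Python: run the test repeatedly and map the outcome onto the exit codes `git bisect run` understands (0 = good, non-zero = bad, 125 = skip). This is a sketch of the idea, not the actual script from #532:

```python
# Sketch: repeat a test command and report good/bad for `git bisect run`.
# A flaky test needs many iterations per commit to classify it reliably.
import subprocess

GOOD, BAD = 0, 1  # exit codes for `git bisect run` (125 would mean "skip")

def run_iterations(cmd, iterations=20, timeout_s=300):
    """Return GOOD only if every iteration passes within the timeout."""
    for _ in range(iterations):
        try:
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return BAD  # a hang counts as a failure, like the 300s cap above
        if result.returncode != 0:
            return BAD
    return GOOD
```

A driver script would call `run_iterations(["qa/pull-tester/rpc-tests.py", "abandonconflict"])` and exit with the returned code so git bisect can classify the commit.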