Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpc_net.py intemittent failure / assert_equal(len(getrawaddrman[table_name]), len(table_info["entries"])) #28964

Closed
maflcko opened this issue Nov 29, 2023 · 11 comments · Fixed by #28998

Comments

@maflcko
Copy link
Member

maflcko commented Nov 29, 2023

https://cirrus-ci.com/task/5884375213015040?logs=ci#L4022

 test  2023-11-28T23:28:31.543000Z TestFramework (ERROR): Assertion failed 
                                   Traceback (most recent call last):
                                     File "/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 131, in main
                                       self.run_test()
                                     File "/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/rpc_net.py", line 71, in run_test
                                       self.test_getrawaddrman()
                                     File "/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/rpc_net.py", line 506, in test_getrawaddrman
                                       check_getrawaddrman_entries(expected)
                                     File "/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/rpc_net.py", line 424, in check_getrawaddrman_entries
                                       assert_equal(len(getrawaddrman[table_name]), len(table_info["entries"]))
                                     File "/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/util.py", line 57, in assert_equal
                                       raise AssertionError("not(%s)" % " == ".join(str(arg) for arg in (thing1, thing2) + args))
                                   AssertionError: not(3 == 2)
@maflcko maflcko added the Tests label Nov 29, 2023
@0xB10C
Copy link
Contributor

0xB10C commented Nov 29, 2023

I'm taking a look at this.

@0xB10C
Copy link
Contributor

0xB10C commented Nov 29, 2023

node1 [httpworker.2] [rpc/request.cpp:181] [parse] [rpc] ThreadRPCServer method=addpeeraddress user=__cookie__ 
node1 [httpworker.2] [logging/timer.h:58] [Log] [addrman] CheckAddrman: new 2, tried 1, total 3 started 
node1 [httpworker.2] [logging/timer.h:58] [Log] [addrman] CheckAddrman: completed (0.60ms) 
node1 [httpworker.2] [addrman.cpp:615] [AddSingle] [addrman] Added nrfj6inpyf73gpkyool35hcmne5zwfmse3jl3aw23vk7chdemalyaqad.onion:0 to new[238][20] 
node1 [httpworker.2] [addrman.cpp:687] [Add_] [addrman] Added 1 addresses (of 1) from nrfj6inpyf73gpkyool35hcmne5zwfmse3jl3aw23vk7chdemalyaqad.onion: 1 tried, 3 new 
node1 [httpworker.2] [logging/timer.h:58] [Log] [addrman] CheckAddrman: new 3, tried 1, total 4 started 
node1 [httpworker.2] [logging/timer.h:58] [Log] [addrman] CheckAddrman: completed (0.87ms) 
node1 [httpworker.2] [logging/timer.h:58] [Log] [addrman] CheckAddrman: new 3, tried 1, total 4 started 
node1 [httpworker.2] [logging/timer.h:58] [Log] [addrman] CheckAddrman: completed (0.87ms) 
node1 [httpworker.2] [addrman.cpp:668] [Good_] [addrman] Collision with 1.2.3.4:8333 while attempting to move nrfj6inpyf73gpkyool35hcmne5zwfmse3jl3aw23vk7chdemalyaqad.onion:0 to tried table. Collisions=1 
node1 [httpworker.2] [logging/timer.h:58] [Log] [addrman] CheckAddrman: new 3, tried 1, total 4 started 
test TestFramework (DEBUG): Added nrfj6inpyf73gpkyool35hcmne5zwfmse3jl3aw23vk7chdemalyaqad.onion to tried table 
test TestFramework (DEBUG): Test that the newly added addresses appear in getrawaddrman 

seems like this didn't work:

for (table_name, table_info) in expected.items():
# There's a slight chance that the to-be-added address collides with an already
# present table entry. To avoid this, we increment the port until an address has been
# added. Incrementing the port changes the position in the new table bucket (bucket
# stays the same) and changes both the bucket and the position in the tried table.
while True:
if node.addpeeraddress(address=table_info["entries"][1]["address"], port=port, tried=table_name == "tried")["success"]:
table_info["entries"][1]["port"] = port
self.log.debug(f"Added {table_info['entries'][1]['address']} to {table_name} table")
break
else:
port += 1

Did addpeeraddress return {"success": true} even if there was a collision while moving to tried?

(still investigating)

@mzumsande
Copy link
Contributor

I think that the problem is a rare collision in the tried table:
node1 2023-11-28T23:28:31.447418Z (mocktime: 2023-11-28T23:28:28Z) [httpworker.2] [addrman.cpp:668] [Good_] [addrman] Collision with 1.2.3.4:8333 while attempting to move nrfj6inpyf73gpkyool35hcmne5zwfmse3jl3aw23vk7chdemalyaqad.onion:0 to tried table. Collisions=1:

addpeeraddress reports success if the first part (adding to new) succeeded, but the second part (moving to tried) fails due to a collision (so that the addr stays in new).

@0xB10C
Copy link
Contributor

0xB10C commented Nov 29, 2023

@mzumsande is there a reason why addpeeraddress should return true if it fails to put in in the table we asked it to do? Sounds like a bug to me. Or, it should at least be documented clearly.

bitcoin/src/rpc/net.cpp

Lines 981 to 987 in 7bc8c53

if (addrman.Add({address}, address)) {
success = true;
if (tried) {
// Attempt to move the address to the tried addresses table.
addrman.Good(address);
}
}

@mzumsande
Copy link
Contributor

mzumsande commented Nov 29, 2023

I don't think there is a reason, but returning false wouldn't be great either. It might be best could return an error code with 3 values (not added to new; added to new but failed to move to tried; everything successful).

@0xB10C
Copy link
Contributor

0xB10C commented Dec 5, 2023

Improving addpeeraddress tried=true failure behavior in #28998

@m3dwards
Copy link
Contributor

Same or similar issue?

 test  2024-02-28T10:08:07.041000Z TestFramework (ERROR): Assertion failed 
                                   Traceback (most recent call last):
                                     File "/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 131, in main
                                       self.run_test()
                                     File "/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/rpc_net.py", line 63, in run_test
                                       self.test_getpeerinfo()
                                     File "/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/rpc_net.py", line 123, in test_getpeerinfo
                                       assert_equal(
                                     File "/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/util.py", line 57, in assert_equal
                                       raise AssertionError("not(%s)" % " == ".join(str(arg) for arg in (thing1, thing2) + args))
                                   AssertionError: not({'id': 2, 'network': 'not_publicly_routable', 'services': '0000000000000000', 'servicesnames': [], 'relaytxes': False, 'lastsend': 1709114886, 'lastrecv': 1709114886, 'last_transaction': 0, 'last_block': 0, 'conntime': 1709114886, 'timeoffset': 0, 'version': 0, 'subver': '', 'inbound': True, 'bip152_hb_to': False, 'bip152_hb_from': False, 'startingheight': -1, 'presynced_headers': -1, 'synced_headers': -1, 'synced_blocks': -1, 'inflight': [], 'addr_relay_enabled': False, 'addr_processed': 0, 'addr_rate_limited': 0, 'permissions': [], 'minfeefilter': Decimal('0E-8'), 'bytessent_per_msg': {}, 'bytesrecv_per_msg': {}, 'connection_type': 'inbound', 'transport_protocol_type': 'detecting', 'session_id': ''} == {'addr_processed': 0, 'addr_rate_limited': 0, 'addr_relay_enabled': False, 'bip152_hb_from': False, 'bip152_hb_to': False, 'bytesrecv_per_msg': {}, 'bytessent_per_msg': {}, 'connection_type': 'inbound', 'conntime': 1709114886, 'id': 2, 'inbound': True, 'inflight': [], 'last_block': 0, 'last_transaction': 0, 'lastrecv': 1709114886, 'lastsend': 1709114886, 'minfeefilter': Decimal('0E-8'), 'network': 'not_publicly_routable', 'permissions': [], 'presynced_headers': -1, 'relaytxes': False, 'services': '0000000000000000', 'servicesnames': [], 'session_id': 'b6552478f74e655b236cd192c85c9831b69221681ad1f971fb99a1451391b159', 'startingheight': -1, 'subver': '', 'synced_blocks': -1, 'synced_headers': -1, 'timeoffset': 0, 'transport_protocol_type': 'v2', 'version': 0})

CI run: https://cirrus-ci.com/task/4559353030639616?logs=ci#L3823

@maflcko
Copy link
Member Author

maflcko commented Feb 28, 2024

It is in a different line, so it seems different.

Unrelated, I think it would be useful if the assert helper would sort and pretty-print dictionaries. Otherwise it is hard to spot the difference.

@mzumsande
Copy link
Contributor

yes, it's a different issue, most likely related to #29358 - @stratospher and I are looking into it.

@m3dwards
Copy link
Contributor

Gotcha.

I have since been able to get what looks like the same issue.

 node1 2024-02-28T14:39:06.105966Z (mocktime: 2024-02-28T14:39:05Z) [C:\Users\Max\source\bitcoin\src\logging\timer.h:58] [Log] [addrman] CheckAddrman: completed (0.03ms)
 node1 2024-02-28T14:39:06.105977Z (mocktime: 2024-02-28T14:39:05Z) [C:\Users\Max\source\bitcoin\src\logging\timer.h:58] [Log] [addrman] CheckAddrman: new 3, tried 1, total 4 started

 test  2024-02-28T14:39:06.106000Z TestFramework (ERROR): Assertion failed
                                   Traceback (most recent call last):
                                     File "C:\Users\Max\source\bitcoin\test\functional\test_framework\test_framework.py", line 131, in main
                                       self.run_test()
                                     File "C:\Users\Max\source\bitcoin/test/functional/rpc_net.py", line 72, in run_test
                                       self.test_getrawaddrman()
                                     File "C:\Users\Max\source\bitcoin/test/functional/rpc_net.py", line 512, in test_getrawaddrman
                                       check_getrawaddrman_entries(expected)
                                     File "C:\Users\Max\source\bitcoin/test/functional/rpc_net.py", line 430, in check_getrawaddrman_entries
                                       assert_equal(len(getrawaddrman[table_name]), len(table_info["entries"]))
                                     File "C:\Users\Max\source\bitcoin\test\functional\test_framework\util.py", line 57, in assert_equal
                                       raise AssertionError("not(%s)" % " == ".join(str(arg) for arg in (thing1, thing2) + args))
                                   AssertionError: not(3 == 2)
 node1 2024-02-28T14:39:06.106021Z (mocktime: 2024-02-28T14:39:05Z) [C:\Users\Max\source\bitcoin\src\logging\timer.h:58] [Log] [addrman] CheckAddrman: completed (0.04ms)
 node1 2024-02-28T14:39:06.106033Z (mocktime: 2024-02-28T14:39:05Z) [C:\Users\Max\source\bitcoin\src\logging\timer.h:58] [Log] [addrman] CheckAddrman: new 3, tried 1, total 4 started

@mzumsande
Copy link
Contributor

This can be closed after #29007 merge (#28998 could still make sense, but the CI failure cannot happen anymore).

achow101 added a commit that referenced this issue Mar 22, 2024
99954f9 test: fix test to ensure hidden RPC is present in detailed help (stratospher)
0d01f6f test: remove unused mocktime in test_addpeeraddress (0xb10c)
6205466 rpc: "addpeeraddress tried" return error on failure (0xb10c)

Pull request description:

  When trying to add an address to the IP address manager tried table, it's first added to the new table and then moved to the tried table. Previously, adding a conflicting address to the address manager's tried table with test-only `addpeeraddress tried=true` RPC would return `{ "success": true }`. However, the address would not be added to the tried table, but would remain in the new table. This caused, e.g., issue #28964.

  This is fixed by new returning `{ "success":  false, "error": "..." }` for failed tried table additions. Since the address remaining in the new table can't be removed (the address manager interface does not support removing addresses at the moment and adding this seems to be a bigger effort), an error message is returned. This indicates to a user why the RPC failed and allows accounting for the extra address in the new table. This is done in the functional test for the `getrawaddrman` RPC.

  Fixes #28964

ACKs for top commit:
  achow101:
    ACK 99954f9
  stratospher:
    reACK 99954f9. 🚀
  brunoerg:
    utACK 99954f9

Tree-SHA512: 2f1299410c0582ebc2071271ba789a8abed905f9a510821f77afbcf2a555ec31397578ea55cbcd162fb828be27afedd3246c7b13ad8883f2f745bb8e04364a76
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants