net: Continuous ASMap health check #27581
Concept ACK.
baacef4 to 9b1d32e
I think this is ready for review. I have incorporated the feedback so far, and I think the current extent of the checks and stats is good enough as a first step. I am still going through all the ways an ASMap file could be harmful to a node, which may lead to additional checks and stats being logged in the future, but I need more time for this and it can be done as a follow-up.
Concept ACK.
Concept ACK, but I'm not convinced about the approach. In 9b1d32e "Net: Add continuous ASMap health check logging": you're basically fetching all addresses (IPv4/IPv6) from addrman and doing a sanity check every 24h. Considering we don't change the asmap file frequently (we need to restart the node to do it), do we really need to do it for all addresses every 24h? Perhaps we could do it when:
Instead of creating

```cpp
void CConnman::ASMapHealthCheck()
{
    const std::vector<CAddress> v4_addrs{GetAddresses(0, 0, Network::NET_IPV4)};
    const std::vector<CAddress> v6_addrs{GetAddresses(0, 0, Network::NET_IPV6)};
    std::vector<CNetAddr> clearnet_addrs;
    clearnet_addrs.reserve(v4_addrs.size() + v6_addrs.size());
    std::transform(v4_addrs.begin(), v4_addrs.end(), std::back_inserter(clearnet_addrs),
                   [](const CAddress& addr) { return static_cast<CNetAddr>(addr); });
    std::transform(v6_addrs.begin(), v6_addrs.end(), std::back_inserter(clearnet_addrs),
                   [](const CAddress& addr) { return static_cast<CNetAddr>(addr); });
    m_netgroupman.ASMapHealthCheck(clearnet_addrs);
}
```

and then you could do in

```cpp
if (m_netgroupman.UsingASMap()) {
    scheduler.scheduleFromNow([this] { ASMapHealthCheck(); }, ASMAP_HEALTH_CHECK);
}
```

with ASMAP_HEALTH_CHECK being the period, e.g.:

```cpp
static constexpr std::chrono::hours ASMAP_HEALTH_CHECK{24};
```

Obs: I didn't test it, just a suggestion that came to my mind.
9b1d32e to 93808b9
@brunoerg Thanks for the review! I like the idea of using the scheduler and I applied that change. However, I am unsure about tying the check to adding new addresses to the new table. It seems arbitrary to put it there, I would also like to run it on removals if we were going this route. From looking at some debug logs from a node I resynced the day before, I see changes in the address tables happening every couple of minutes. So we would still need to track the elapsed time to not run the check more often than necessary. So I think just using the 24h scheduler is cleaner and just as good.
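For comparison, the elapsed-time tracking that an event-driven trigger would still need could look roughly like the sketch below. `HealthCheckThrottle` and `ShouldRun` are hypothetical names invented for illustration and are not part of this PR or of Bitcoin Core; the caller passes in the current time so the logic can be exercised without waiting.

```cpp
#include <chrono>
#include <optional>

// Hypothetical helper (not a Bitcoin Core class): allows an action to run
// at most once per `min_interval`, no matter how often it is triggered.
class HealthCheckThrottle
{
public:
    using Clock = std::chrono::steady_clock;

    explicit HealthCheckThrottle(std::chrono::hours min_interval)
        : m_min_interval{min_interval} {}

    // Returns true and records `now` if no run has been accepted yet, or if
    // at least `min_interval` has elapsed since the last accepted run.
    bool ShouldRun(Clock::time_point now)
    {
        if (m_last_run && now - *m_last_run < m_min_interval) return false;
        m_last_run = now;
        return true;
    }

private:
    std::chrono::hours m_min_interval;
    std::optional<Clock::time_point> m_last_run;
};
```

Every address table change would have to call `ShouldRun` and hold this extra state, which is the bookkeeping a plain 24h scheduled task avoids.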
I agree.
7fdb557 to b87045d
Addressed the feedback and rebased, which seems to have finally fixed the CI issues.
Rebased and addressed feedback from @brunoerg and @mzumsande. After some offline discussion with @mzumsande I have added the possibility to use
77648a1 to 8c18138
8c18138 to acd1759
Rebased
CI times out
acd1759 to 8f7dfe7
Doesn't really look related? One job failed after 120m, one after 16s, and most succeeded. I also don't see any relation to the test changes here. Pushed a new rebase to see if the issues persist...
8f7dfe7 to 3ea54e5
ACK 3ea54e5
crACK 3ea54e5
code lgtm, will test in practice soon!
ACK 3ea54e5
ACK 3ea54e5
There are certain statistics we can collect by running all our known clearnet addresses against the ASMap file. This could show issues with a maliciously manipulated file or with an old file that has decayed with time.
This is just a proof of concept for now. My idea currently is to run the analysis once per day and print the results to logs if an ASMap file is used.