Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

contrib: Use asmap for ASN lookup in makeseeds #24864

Merged
merged 3 commits into from
Jun 16, 2022

Conversation

laanwj
Copy link
Member

@laanwj laanwj commented Apr 15, 2022

Add an argument -a to provide a asmap file to do the IP to ASN lookups.

This speeds up the script greatly, and makes the output deterministic. Also removes the dependency on dns.lookup.

I've annotated the output with ASxxxx comments to provide a way to verify the functionality.

For now I've added instructions in README.md to download and use the demo.map from the asmap repository. When we have some other mechanism for distributing asmap files we could switch to that.

This continues #24824. I've removed the fallbacks and extra complexity, as everyone will be using the same instructions anyway.

Co-authored-by: Pieter Wuille pieter.wuille@gmail.com
Co-authored-by: russeree reese.russell@ymail.com

@jonatack
Copy link
Contributor

Concept ACK.

(Python linter says to remove unused headers.)

@laanwj laanwj force-pushed the 2022-04-makeseeds-use-asmap branch 2 times, most recently from b48d2b0 to 08ecc04 Compare April 15, 2022 16:08
@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 16, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

# Copyright (c) 2013-2020 The Bitcoin Core developers
# Distributed under the MIT software license, see the accompanying
# file COPYING or http://www.opensource.org/licenses/mit-license.php.
import ipaddress
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I didn't realize before that Python had built-in functionality to parse/manipulate IP addresses (both v4 and v6). Might make sense (but not in this PR) to use this in more places and replace all the ad-hoc hacks.

def DecodeBytes(byts):
return [(byt >> i) & 1 for byt in byts for i in range(8)]

def DecodeBits(stream, bitpos, minval, bit_sizes):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code from here came from testmap which doesn't come with a license. It looks like @sipa and @jamesob contributed to it, so to include it we would need you to sign off that it's acceptable to include this here under the MIT license.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK MIT license from me.

Also, I'm currently rewriting the Python asmap code from scratch actually (to have a single class that can do creation of asmap files, decoding, lookup, diffing, have tests, ...), and it's close to being done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK MIT license

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Also, I'm currently rewriting the Python asmap code from scratch actually (to have a single class that can do creation of asmap files, decoding, lookup, diffing, have tests, ...), and it's close to being done.

OK, we can wait for that, there's no hurry here. It would be nice to have this before 0.24 branch-off.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm currently rewriting the Python asmap code from scratch actually (to have a single class that can do creation of asmap files, decoding, lookup, diffing, have tests, ...), and it's close to being done.

/start slightly off-topic:

Nice! Is the diffing for encoded mappings or just output from asmap-rs? Or would this even make asmap-rs obsolete? I'm in the process of making some improvements to UX there to show progress on RIPE data dump downloads and finding the ASN bottlenecks. I was also thinking of including a command there to compare non-encoded maps. I was going to use that to run some sort of monthly historical analysis over the past year to get an idea of how much the maps tend to change. Just don't want to duplicate effort.

Just some of the things to help get asmaps distributed with releases and enabled by default. @sipa, just let me know if you want to continue this chat off GitHub or on some other issue. :)

/end slightly off-topic

Copy link
Member

@sipa sipa May 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dunxen Yeah, possibly. At this point the new asmap python code can do everything asmap-rs can and more (encode, decode, diff, bottleneck, ...), but there is some more work around making it nicer to use, and of course questions around review and integration into processes.

@jonatack
Copy link
Contributor

Tested that both -a and --asmap run, but if makeseeds no longer works without the added arg, can it be removed?

$ python3 makeseeds.py < seeds_main.txt > nodes_main.txt
usage: makeseeds.py [-h] -a ASMAP
makeseeds.py: error: the following arguments are required: -a/--asmap

@@ -11,18 +11,7 @@ to addrman with).
The seeds compiled into the release are created from sipa's DNS seed data, like this:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe update here

Suggested change
The seeds compiled into the release are created from sipa's DNS seed data, like this:
The seeds compiled into the release are created from sipa's DNS seed and AS map data, like this:


print(f'Loading asmap database {args.asmap}... ', end='', file=sys.stderr, flush=True)
asmap = ASMap(args.asmap)
print(f'Done.', file=sys.stderr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few nitty suggestions here

-    print(f'Loading asmap database {args.asmap}... ', end='', file=sys.stderr, flush=True)
+    print(f'Loading asmap database "{args.asmap}"... ', end='', file=sys.stderr, flush=True)
     asmap = ASMap(args.asmap)
-    print(f'Done.', file=sys.stderr)
+    print('done.\n', file=sys.stderr)

@laanwj
Copy link
Member Author

laanwj commented Apr 18, 2022

Tested that both -a and --asmap run, but if makeseeds no longer works without the added arg, can it be removed?

I considered this, but I have a slight preference for explicit named instead of positional arguments that's why I kept the -a/--asmap.

@fanquake fanquake added this to the 24.0 milestone Apr 18, 2022
@@ -201,7 +178,18 @@ def ip_stats(ips: List[Dict]) -> str:

return f"{hist['ipv4']:6d} {hist['ipv6']:6d} {hist['onion']:6d}"

def parse_args():
argparser = argparse.ArgumentParser(description=f'Generates a list of bitcoin node seed ip addresses.')
argparser.add_argument("-a","--asmap", help=f'The location of the asmap asn database file (required)', required=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits

-    argparser = argparse.ArgumentParser(description=f'Generates a list of bitcoin node seed ip addresses.')
-    argparser.add_argument("-a","--asmap", help=f'The location of the asmap asn database file (required)', required=True)
+    argparser = argparse.ArgumentParser(description='Generates a list of bitcoin node seed ip addresses.')
+    argparser.add_argument("-a", "--asmap", help='the location of the asmap asn database file (required)', required=True)
$  python3 makeseeds.py -h
usage: makeseeds.py [-h] -a ASMAP

Generates a list of bitcoin node seed ip addresses.

optional arguments:
  -h, --help            show this help message and exit
  -a ASMAP, --asmap ASMAP
                        the location of the asmap asn database file (required)

@sipa
Copy link
Member

sipa commented Apr 21, 2022

I've put more up-to-date and generic asmap files on https://bitcoin.sipa.be/asmap-unfilled.dat and https://bitcoin.sipa.be/asmap-filled.dat (the latter one has subnets with no actual ASN assigned to nearby ASNs to minimize the file size). The input data was sourced through https://github.com/rrybarczyk/asmap-rs, and compiled to asmap format by new code I'm working on at https://github.com/sipa/asmap/tree/nextgen. I hope that's in a presentable state in the next few days somewhere, but I'm put it online already to experiment with.

@laanwj laanwj force-pushed the 2022-04-makeseeds-use-asmap branch from 08ecc04 to 4f33193 Compare April 23, 2022 09:11
@laanwj
Copy link
Member Author

laanwj commented Apr 23, 2022

Re-pushed:

  • Addressed @jonatack 's comments
  • Changed curl URL to @sipa's new "asmap-filled.dat" asmap database (sorry for providing link to a corrupted one)
  • Mention @jamesob as co-author

Copy link
Contributor

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lightly tested ACK 4f33193 per git range-diff 10a626a1 08ecc04 4f33193

Loading asmap database "asmap-filled.dat"…Done.
  IPv4   IPv6  Onion Pass                                               
470769  73264      0 Initial
470769  73264      0 Skip entries with invalid address
470769  73264      0 After removing duplicates
  6321   1728      0 Enforce minimal number of blocks
  5492   1496      0 Require service bit 1
  3861    872      0 Require minimum uptime
  3782    849      0 Require a known and recent user agent
  3757    841      0 Filter out hosts with multiple bitcoin ports
   512    281      0 Look up ASNs and limit results per ASN and per net

contrib/seeds/README.md Outdated Show resolved Hide resolved
contrib/seeds/README.md Show resolved Hide resolved
@laanwj
Copy link
Member Author

laanwj commented Apr 26, 2022

Cherry-picked a commit from @jonatack with improvements.

@laanwj laanwj force-pushed the 2022-04-makeseeds-use-asmap branch from 3d7f68c to 7edacdc Compare April 26, 2022 11:58
@jonatack
Copy link
Contributor

re-ACK 7edacdc

Copy link
Contributor

@dunxen dunxen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK 7edacdc

With this version of the map:

₿ sha256sum asmap-filled.dat 
f479906ce1731281616a235a85a79c9c5085c36d88689939ef7fcc5196d30874  asmap-filled.dat

I get:

₿ python3 makeseeds.py -a asmap-filled.dat < seeds_main.txt > nodes_main.txt
Loading asmap database "asmap-filled.dat"…Done.
Loading and parsing DNS seeds…Done.
  IPv4   IPv6  Onion Pass                                               
471267  73410      0 Initial
471267  73410      0 Skip entries with invalid address
471267  73410      0 After removing duplicates
  6927   1908      0 Enforce minimal number of blocks
  5967   1617      0 Require service bit 1
  3636    856      0 Require minimum uptime
  3244    771      0 Require a known and recent user agent
  3221    767      0 Filter out hosts with multiple bitcoin ports
   512    238      0 Look up ASNs and limit results per ASN and per net

@laanwj
Copy link
Member Author

laanwj commented May 23, 2022

I'm kind of in doubt here, with the testing and review this has got, should we go ahead and merge this and leave updating to asmap-nextgen for a follow-up PR?

@dunxen
Copy link
Contributor

dunxen commented May 23, 2022

should we go ahead and merge this and leave updating to asmap-nextgen for a follow-up PR?

I think this is fair :)
Definitely adds a lot of value already.

@sipa
Copy link
Member

sipa commented May 23, 2022

The current module there in asmap.py has an ASMap class with tests, and a from_binary and lookup method, which should be sufficient for the purposes here. I don't think much is going to change there, and it's already much better at handling error conditions than the current code, so feel free to switch to that.

I'm going to still clean things up more and have some cli tools using it, but the asmap module is pretty much done.

Copy link
Contributor

@w0xlt w0xlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK 7edacdc

@laanwj
Copy link
Member Author

laanwj commented May 25, 2022

I'm going to still clean things up more and have some cli tools using it, but the asmap module is pretty much done.

I've replaced asmap.py with the new one. I didn't find a way to look up an IP address directly so I did need to port over the decode_ip function. Please let me know if anything can be done more efficiently.

Result is the same, but it's a bit slower (I think the bottleneck is in the loading phase):

Before:

$ time python3 makeseeds.py -a asmap-filled.dat < seeds_main.txt > /dev/null 
Loading asmap database "asmap-filled.dat"…Done.
Loading and parsing DNS seeds…Done.
  IPv4   IPv6  Onion Pass                                               
470789  73296      0 Initial
470789  73296      0 Skip entries with invalid address
470789  73296      0 After removing duplicates
  6318   1747      0 Enforce minimal number of blocks
  5456   1504      0 Require service bit 1
  3866    887      0 Require minimum uptime
  3789    864      0 Require a known and recent user agent
  3760    856      0 Filter out hosts with multiple bitcoin ports
   512    295      0 Look up ASNs and limit results per ASN and per net

real    0m9.826s
user    0m9.349s
sys     0m0.468s

After:

Loading asmap database "asmap-filled.dat"…Done.
Loading and parsing DNS seeds…Done.
  IPv4   IPv6  Onion Pass                                               
470789  73296      0 Initial
470789  73296      0 Skip entries with invalid address
470789  73296      0 After removing duplicates
  6318   1747      0 Enforce minimal number of blocks
  5456   1504      0 Require service bit 1
  3866    887      0 Require minimum uptime
  3789    864      0 Require a known and recent user agent
  3760    856      0 Filter out hosts with multiple bitcoin ports
   512    295      0 Look up ASNs and limit results per ASN and per net

real    0m35.717s
user    0m35.050s
sys     0m0.628s

@sipa
Copy link
Member

sipa commented May 25, 2022

@laanwj Ah yes, there are only functions for converting between net ranges and prefixes, not addresses. I'll add functions for those too. I don't think that's holding up anything here though; I can PR an update when it's done.

The slowdown is indeed somewhat expected, as it's converting the whole map to a lookup tree format rather than interpreting the binary data directly, which is faster per lookup but slower to load. Is that a concern? I could add interpretation based logic too.

@laanwj
Copy link
Member Author

laanwj commented May 26, 2022

Is that a concern? I could add interpretation based logic too.

No, not for me at least. It's not a script to be run often and ~30s still a lot better than when it had to DNS query for every address. I also think the error checking implied by parsing instead of interpretation is useful here. And maybe there's scope for optimization of the loading (if anyone does care).

Edit: our linter did find a few problems, not sure you care about these as our coding style isn't necessarily the asmap one, but here goes:

contrib/seeds/asmap.py:5:1: F407 future feature annotations is not defined
contrib/seeds/asmap.py:284:9: E306 expected 1 blank line before a nested definition, found 0
contrib/seeds/asmap.py:349:9: E306 expected 1 blank line before a nested definition, found 0
contrib/seeds/asmap.py:370:9: E306 expected 1 blank line before a nested definition, found 0
contrib/seeds/asmap.py:383:17: E125 continuation line with same indent as next logical line
contrib/seeds/asmap.py:471:13: E306 expected 1 blank line before a nested definition, found 0
contrib/seeds/asmap.py:532:9: E306 expected 1 blank line before a nested definition, found 0
contrib/seeds/asmap.py:638:9: E306 expected 1 blank line before a nested definition, found 0
^---- failure generated from lint-python.py

@jonatack
Copy link
Contributor

Between 17 and 20 seconds total makeseeds.py time for me, which seems fine. The slower parts are preceded by a "Loading..." message, so all good.

@laanwj laanwj force-pushed the 2022-04-makeseeds-use-asmap branch from b499e2d to fdfa4c3 Compare May 30, 2022 16:24
@laanwj
Copy link
Member Author

laanwj commented May 30, 2022

Re-pushed to fix linter errors (upstream in sipa/asmap#5).

However there's still one left which I don't really know how to get rid of:

contrib/seeds/asmap.py:5:1: F407 future feature annotations is not defined

I don't get this locally. It must have to do with Python version differences.

Edit: trying to simply remove the from __future__ import annotatinos line.

Edit.2: that gives me an error while running

Traceback (most recent call last):
  File "…/bitcoin/contrib/seeds/makeseeds.py", line 16, in <module>
    from asmap import ASMap
  File "…/bitcoin/contrib/seeds/asmap.py", line 171, in <module>
    class _BinNode:
  File "…/bitcoin/contrib/seeds/asmap.py", line 179, in _BinNode
    def __init__(self, ins: _Instruction, arg1: _BinNode, arg2: _BinNode): ...
NameError: name '_BinNode' is not defined

@laanwj laanwj force-pushed the 2022-04-makeseeds-use-asmap branch 2 times, most recently from b00c08d to ed418f6 Compare May 30, 2022 16:37
@sipa
Copy link
Member

sipa commented May 30, 2022

FWIW the asmap code runs significantly faster inside of pypy3.

@jonatack
Copy link
Contributor

jonatack commented May 30, 2022

Was just wondering about this. I'm not seeing the issue locally either with Python 3.10.4.

laanwj and others added 2 commits May 31, 2022 11:57
Add an argument `-a` to provide a asmap file to do the IP to ASN
lookups.

This speeds up the script greatly, and makes the output deterministic.
Also removes the dependency on `dns.lookup`.

I've annotated the output with ASxxxx comments to provide a way to
verify the functionality.

For now I've added instructions in README.md to download and use the
`demo.map` from the asmap repository. When we have some other mechanism
for distributing asmap files we could switch to that.

This continues bitcoin#24824. I've removed all the fallbacks and extra
complexity, as everyone will be using the same instructions anyway.

Co-authored-by: Pieter Wuille <pieter.wuille@gmail.com>
Co-authored-by: James O'Beirne <james.obeirne@pm.me>
Co-authored-by: russeree <reese.russell@ymail.com>
@laanwj laanwj force-pushed the 2022-04-makeseeds-use-asmap branch from ed418f6 to bc23f34 Compare May 31, 2022 09:59
@laanwj
Copy link
Member Author

laanwj commented May 31, 2022

I added a commit that disables the Python linter for asmap.py.

    test: Exclude asmap.py from Python linter checks
    
    asmap.py, a third-party dependency, is excluded from Python linter checks
    because of `from __future__ import annotations` (PEP 563 postponed annotations)
    this workaround can be removed when the Python version used is upgraded to 3.7+.

I decided to do this instead of removing the annotations because I think the postponed annotations are useful and we should use them as well (in the future, when the Python version allows for this).

This should hopefully make this pass the CI again.

@laanwj laanwj force-pushed the 2022-04-makeseeds-use-asmap branch from bc23f34 to 1038342 Compare May 31, 2022 10:06
contrib/seeds/makeseeds.py Outdated Show resolved Hide resolved
contrib/seeds/makeseeds.py Outdated Show resolved Hide resolved
@laanwj laanwj force-pushed the 2022-04-makeseeds-use-asmap branch from 1038342 to 1df513a Compare May 31, 2022 15:33
@laanwj
Copy link
Member Author

laanwj commented May 31, 2022

Re-pushed, updated for @sipa's suggestions.

@jamesob
Copy link
Member

jamesob commented May 31, 2022

Concept ACK, will review soon.

@sipa
Copy link
Member

sipa commented May 31, 2022

@laanwj I've pushed another change, dropping the future annotations. It turns out you can use the string name of forward-declared types as well as typing annotations, which suffices here. I've also merged your sipa/asmap#5. Hopefully it now passes all linter requirements?

@laanwj laanwj force-pushed the 2022-04-makeseeds-use-asmap branch from 1df513a to 667e316 Compare June 1, 2022 12:40
@laanwj
Copy link
Member Author

laanwj commented Jun 1, 2022

Ok, thanks for adding a license too, updated to the new asmap.py.

@sipa
Copy link
Member

sipa commented Jun 1, 2022

ACK 667e316

Copy link
Contributor

@dunxen dunxen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK 667e316

@laanwj laanwj merged commit 7f2c983 into bitcoin:master Jun 16, 2022
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jun 17, 2022
@brunoerg
Copy link
Contributor

For now I've added instructions in README.md to download and use the demo.map from the asmap repository. When we have some other mechanism for distributing asmap files we could switch to that.

https://github.com/brunoerg/asmapy can be a good alternative!

@bitcoin bitcoin locked and limited conversation to collaborators Dec 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants