In the last lesson we encountered the Bitcoin protocol's [Version Handshake](https://en.bitcoin.it/wiki/Version_Handshake). We saw how Bitcoin network peers won't respond if you don't start the conversion with a `version` message.

But we cheated. I gave you a serialize `version` message and didn't tell you how I created it.

We were also lazy: we didn't parse the cryptic `payload` of the `version` message that our peer sent us.

We, too, we rude! After listening for our peer's `version` message we stopped listening and never received or responded to their `verack` message -- completing the handshake.

So you see, we have much to fix!

To begin, I'm going to redefine everything from last lesson. I'm going to rename `Message` -> `Packet`.

In [186]:
from hashlib import sha256
from io import BytesIO

NETWORK_MAGIC = 0xD9B4BEF9

def bytes_to_int(b, byte_order='little'):
    return int.from_bytes(b, byte_order)

def int_to_bytes(i, length, byte_order='little'):
    return int.to_bytes(i, length, byte_order)

def read_magic(sock):
    magic_bytes = sock.recv(4)
    magic = bytes_to_int(magic_bytes)
    return magic

def read_command(sock):
    raw = sock.recv(12)
    # remove empty bytes
    command = raw.replace(b"\x00", b"")
    return command

def read_length(sock):
    raw = sock.recv(4)
    length = bytes_to_int(raw)
    return length

def read_checksum(sock):
    # FIXME: protocol documentation says this should be an integer ...
    raw = sock.recv(4)
    return raw

def calculate_checksum(payload_bytes):
    first_round = sha256(payload_bytes).digest()
    second_round = sha256(first_round).digest()
    first_four_bytes = second_round[:4]
    return first_four_bytes

def read_payload(sock, length):
    payload = sock.recv(length)
    return payload


class Packet:

    def __init__(self, command, payload):
        self.command = command
        self.payload = payload

    @classmethod
    def from_socket(cls, sock):
        magic = read_magic(sock)
        if magic != NETWORK_MAGIC:
            raise RuntimeError(f'Network magic "{magic}" is wrong')

        command = read_command(sock)
        payload_length = read_length(sock)
        checksum = read_checksum(sock)
        payload = read_payload(sock, payload_length)
        
        calculated_checksum = calculate_checksum(payload)
        if calculated_checksum != checksum:
            raise RuntimeError("Checksums don't match")

        if payload_length != len(payload):
            raise RuntimeError("Tried to read {payload_length} bytes, only received {len(payload)} bytes")

        return cls(command, payload)

    def to_bytes(self):
        pass

    def to_message(self):
        message_class = command_to_message_class(self.command)
        return message_class.from_payload(self.payload)

    def __repr__(self):
        return f"<Message command={self.command} payload={self.payload}>"

In [2]:
import socket

PEER_IP = "35.187.200.6"
PEER_PORT = 8333

# magic "version" bytestring
VERSION = b'\xf9\xbe\xb4\xd9version\x00\x00\x00\x00\x00j\x00\x00\x00\x9b"\x8b\x9e\x7f\x11\x01\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x93AU[\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00rV\xc5C\x9b:\xea\x89\x14/some-cool-software/\x01\x00\x00\x00\x01'

sock = socket.socket()
sock.connect((PEER_IP, PEER_PORT))

# initiate the "version handshake"
sock.send(VERSION)

# receive their "version" response
version_message = Packet.from_socket(sock)

print(version_message.payload)

b'\x7f\x11\x01\x00\r\x04\x00\x00\x00\x00\x00\x00\xb4\x9dZ[\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\r\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00b\x8f\xc9N~]\x00\xb2\x10/Satoshi:0.16.0/p%\x08\x00\x01'


Our next task is to parse this payload. Parsing the payload will work almost exactly the same as the `Packet.from_socket` method defined above. 

[This chart](https://en.bitcoin.it/wiki/Protocol_documentation#version) from the protocol documentation will act as our blueprint.

![image](./images/version-message.png)

### Exercise #1: parse the version field contained within the payload of the version (a mouthful, I know!)

In [5]:
# binary_stream will look something like this:
# binary_stream = BytesIO(b'\x7f\x11\x01\x00\r\x04\x00\x00\x00\x00\x00\x00\xb4\x9dZ[\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\r\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00b\x8f\xc9N~]\x00\xb2\x10/Satoshi:0.16.0/p%\x08\x00\x01'
)

def read_version(binary_stream):
    ### your code here ###
    # read and interpret bytes from the stream
    # fixme
    bytes_ = binary_stream.read(4)
    int_ = bytes_to_int(bytes_)
    return int_

In [4]:
import ipytest, pytest
import test_data

# ipytest.clean_tests("test_read_version*")

version_streams = test_data.make_version_streams()

def test_read_version_0():
    n = read_version(version_streams[0])
    assert n == 70015

def test_read_version_1():
    n = read_version(version_streams[1])
    assert n == 60001

def test_read_version_2():
    n = read_version(version_streams[2])
    assert n == 106
    
ipytest.run_tests(doctest=True)
ipytest.clean_tests("test_read_version*")

unittest.case.FunctionTestCase (test_read_version_0) ... ok
unittest.case.FunctionTestCase (test_read_version_1) ... ok
unittest.case.FunctionTestCase (test_read_version_2) ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.001s

OK



### Exercise #2: Given a version message binary stream, tell me whether the node that sent it can send a `pong` message ([hint](https://bitcoin.org/en/developer-reference#protocol-versions)).

In [6]:
def can_send_pong(binary_stream):
    ### your code here ###
    return read_version(binary_stream) >= 60001

In [7]:
version_streams = test_data.make_version_streams()

def test_can_send_pong_0():
    result = can_send_pong(version_streams[0])
    assert result == True

def test_can_send_pong_1():
    result = can_send_pong(version_streams[1])
    assert result == True

def test_can_send_pong_2():
    result = can_send_pong(version_streams[2])
    assert result == False
    
ipytest.run_tests(doctest=True)
ipytest.clean_tests("test_can_send_pong*")

unittest.case.FunctionTestCase (test_can_send_pong_0) ... ok
unittest.case.FunctionTestCase (test_can_send_pong_1) ... ok
unittest.case.FunctionTestCase (test_can_send_pong_2) ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.002s

OK


Now we've got the hang of it.

Here's the outline of a `VersionMessage` class.

* It will have an `__init__` constructor method, which allows us to pass custom attrivutes like `timestamp` which may never take the same value twice if this code were running in production.
* It will have a hard-coded ("class variable")[https://www.toptal.com/python/python-class-attributes-an-overly-thorough-guide] of `command = b"version"`. With this decision we are setting a convention: any instance of this class or the other 26 XMessage classes we still have to implement will have a `.command` attribute so we can always know why kind of message we're dealing with.
* `VersionMessage.from_bytes` is also a convention that all 26 other `XMessage` classes will implement. Let's assume we are trying to handle an incoming `Packet` insance, which we will call `packet`. The purpose of this `VersionMessage.from_bytes` classmethod is to take the value of `packet.payload`, and based on `packet.command`, correctly read the payload values and instantiate an instance of the `XMessage` class. So, if `packet.command` is `b"block"` then we will need to read [these values](https://en.bitcoin.it/wiki/Protocol_documentation#block) and use them to instantiate a `BlockMessage` object. `packet.payload`s are raw bytes, which are much easier to read `n` bytes at a time if we first turn them into an `io.BytesIO` object -- so that's what we'll do.
* Lastly, you will notice `read_int`, `read_var_str`, `read_var_int` and `read_bool` methods, which we need to implement. Since we will doing some of these operations over and over again -- such as reading `n` bytes and interpreting them as a Python `int`, it makes sense to implement so-called "helper methods" to simplify our code, make it more testable and readable. 

In [8]:
class VersionMessage:

    command = b"version"

    def __init__(self, version, services, timestamp, addr_recv, addr_from, 
                 nonce, user_agent, start_height, relay):
        self.version = version
        self.services = services
        self.timestamp = timestamp
        self.addr_recv = addr_recv
        self.addr_from = addr_from
        self.nonce = nonce
        self.user_agent = user_agent
        self.start_height = start_height
        self.relay = relay

    @classmethod
    def from_bytes(cls, payload):
        stream = BytesIO(payload)
        
        version = read_int(stream, 4)
        services = read_int(stream, 8)
        timestamp = read_int(stream, 8)
        addr_recv = stream.read(26)
        addr_from = stream.read(26)
        nonce = read_int(stream, 8)
        user_agent = read_var_str(stream)
        start_height = read_int(stream, 4)
        relay = read_bool(stream)
        
        return cls(version, services, timestamp, addr_recv, addr_from, 
                   nonce, user_agent, start_height, relay)
    


![image](./images/version-message.png)

The `from_bytes` classmethod just translates the `Description` and `Data Type` columns of the above protocol documentation chart into python code. 

Here we encounter some "types" that we're familiar with abstracting the Bitcoin protocol's [general message structure](https://en.bitcoin.it/wiki/Protocol_documentation#Message_structure) with our `Packet` class -- `int32_t` / `uint64_t` / `int64_t` -- which are different types in a "low-level" language like C++, but are all equivalent to the `int` type in Python. Our previously implemented `bytes_to_int` can handle these just fine.

But we also encounter some new types: `net_addr`, `varstr`, and `bool`. Even worse, if we click on the [`varstr` link](https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_string) we see that it contains one additional type: `varint`. Worse still, the [`net_addr` link](https://en.bitcoin.it/wiki/Protocol_documentation#Network_address) contains a `time`, `services` and `IPv6/4` fields nominally of types `uint32`, `uint64_t` and `char[16]` but in order for us to make sense of what they hell them mean each requires parsing: the `time` integer as a Unix timestamp, the `services` integer as a damn "bitfield" (whatever that is!), and `IPv6/4` IP address as a 16 digit bytestring where the first 12 digits are always `00 00 00 00 00 00 00 00 00 00 FF FF` and only the last 4 matter! Oh, and remember how I mentioned that Satoshi usually, but not always, encoded his integers with "little endian" byte order (most significant digits is on the left)? Well the `port` attribute of `net_addr` is encoded "big endian", where the most significant digit is on the right. Yes, the exact opposite of everything else!!!

Hunker down for a looooooong lesson!

# "Integer" fields

In the last lesson we implemented `bytes_to_int(n)`. We'll start tackling all this mountain of work by implementing a small helper method `read_int(stream)` atop `bytes_to_int(n)` which first reads `n` from `stream` and then calls `bytes_to_int` with the bytes it read.

Adn we're going to create an argument`byte_order` which will default to `little` because almost every integer our program deals with will be little-endian encoded. But IP ports -- and soon other -- are big-endian encoded so we must allow callers to override this `bytes_order="little"` default value if they have a big-endian endcoded integer on their hands.

In [12]:
def read_int(stream, n, byte_order='little'):
    b = stream.read(n)
    return bytes_to_int(b, byte_order)

# "Boolean" fields

`bool` is the next simplest: it's a `1` or it's `0`. Actually, it's even simpler, huh? But we're going to resuse the code above so I'm introducing it second.

In fact, we could just use `read_int` and pass around `1`'s and `0`'s and our program would work just fine. After all, in Python the statement `1 == True and 0 == False` evaluates to `True` in Python. But Python gives us a built-in `bool` class for dealing with true-or-false, 1-or-0 values because it gives our programs greater clarity and readability.

### Exercise #3: implement `read_bool`

In [72]:
def read_bool(stream):
    bytes_ = stream.read(1)
#     if len(bytes_) != 1:
#         raise RuntimeError("Stream ran dry")
    integer =  bytes_to_int(bytes_)
    boolean = bool(integer)
    return boolean

In [18]:
import test_data

def test_read_bool_0():
    stream = test_data.make_stream(test_data.true_bytes)
    result = read_bool(stream)
    assert type(result) == bool
    assert result is True
    
def test_read_bool_1():
    stream = test_data.make_stream(test_data.false_bytes)
    result = read_bool(stream)
    assert type(result) == bool
    assert result is False
    
ipytest.run_tests(doctest=True)
ipytest.clean_tests("test_read_bool_*")

unittest.case.FunctionTestCase (test_read_bool_0) ... ok
unittest.case.FunctionTestCase (test_read_bool_1) ... ok

----------------------------------------------------------------------
Ran 2 tests in 0.001s

OK


Once you get your `read_bool` function to pass these tests by successfully reading `True` and `False` values, I want you to implement one more thing.

I want you to raise a RuntimeError if `stream.read(n)` doesn't return a byte string of length `n`. This is just a check to make sure that our program is running correctly.

In [19]:
def test_read_bool_2():
    stream = test_data.make_stream(b"")
    with pytest.raises(RuntimeError) as e_info:
        result = read_bool(stream)

ipytest.run_tests(doctest=True)
ipytest.clean_tests("test_read_bool_*")

unittest.case.FunctionTestCase (test_read_bool_2) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.001s

OK


# Timestamp fields

Network messages us Unix timestamps whenever they communicate a timestamp.

Here's how we interpret a Unix timestamp in Python

In [127]:
from datetime import datetime

def read_timestamp(stream):
    timestamp = read_int(stream, 8)
    return datetime.fromtimestamp(timestamp)


# "Variable Length" fields

Next comes `var_str`, the type of the "User Agent", which is basically an advertisement of the Bitcoin software implementation that the node is using. You can see a listing of popular values [here](https://bitnodes.earn.com/nodes/).

["Variable Length Strings"](https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_string) are used for values that string fields undependable length. This technique strives to use only the space it needs. It does this by prepending a "variable length integer" in front of the string value being communicated. This `var_int` tells the recipient how many bytes they need to read to get the string value being communicated. This is kind of similar to hopw the payload bytes are handled in our `Packet.from_bytes` -- first we read `length` and then we read `lenght`-many bytes to get our raw payload. Same idea here, but now the length of the string isn't an integer, but a "variable length integer".

How does `var_int` work?

The first byte of a `var_int` is a marker which says how many bytes come after it:
* `0xFF`: 8 byte integer follows
* `0xFE`: 4 byte integer follows
* `0xFD`: 2 byte integer follows
* < `0xFD`: interpret first byte as a 1 byte integer

### Exercise #4:  implement `var_int`, since `var_str` depends on it.

In [137]:
def read_var_int(stream):
    i = read_int(stream, 1)
    if i == 0xff:
        return bytes_to_int(stream.read(8))
    elif i == 0xfe:
        return bytes_to_int(stream.read(4))
    elif i == 0xfd:
        return bytes_to_int(stream.read(2))
    else:
        return i

In [99]:
import ipytest, pytest
import test_data as td

enumerated = (
    (td.eight_byte_int, td.eight_byte_var_int),
    (td.four_byte_int, td.four_byte_var_int),
    (td.two_byte_int, td.two_byte_var_int),
    (td.one_byte_int, td.one_byte_var_int),
)

def test_read_var_int():
    for correct_int, var_int in enumerated:
        stream = td.make_stream(var_int)
        calculated_int = read_var_int(stream)
        assert correct_int == calculated_int

ipytest.run_tests(doctest=True)
ipytest.clean_tests("test_read_var_int*")

unittest.case.FunctionTestCase (test_read_var_int) ... ok
runTest (ipytest._DocTestCase) ... ok
runTest (ipytest._DocTestCase) ... ok
runTest (ipytest._DocTestCase) ... ok
runTest (ipytest._DocTestCase) ... ok
runTest (ipytest._DocTestCase) ... ok
runTest (ipytest._DocTestCase) ... ok
runTest (ipytest._DocTestCase) ... ok
runTest (ipytest._DocTestCase) ... ok
runTest (ipytest._DocTestCase) ... ok
runTest (ipytest._DocTestCase) ... ok
runTest (ipytest._DocTestCase) ... ok
runTest (ipytest._DocTestCase) ... ok
runTest (ipytest._DocTestCase) ... ok
runTest (ipytest._DocTestCase) ... 

var int first bytes 255
var int first bytes 254
var int first bytes 253
var int first bytes 7


ok

----------------------------------------------------------------------
Ran 15 tests in 0.036s

OK


Now that we have that out of the way:

### Exercise #5: Implement `read_var_str`

In [138]:
def read_var_str(stream):
    length = read_var_int(stream)
    string = stream.read(length)
    return string

In [46]:
import ipytest, pytest
import test_data as td

enumerated = (
    (td.short_str, td.short_var_str),
    (td.long_str, td.long_var_str),
)

def test_read_var_str():
    for correct_byte_str, var_str in enumerated:
        stream = td.make_stream(var_str)
        calculated_byte_str = read_var_str(stream)
        assert correct_byte_str == calculated_byte_str

ipytest.run_tests(doctest=True)
ipytest.clean_tests("test_read_var_str*")

unittest.case.FunctionTestCase (test_read_var_str) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.001s

OK


# `services` field

[The version section of the protocol docs](https://en.bitcoin.it/wiki/Protocol_documentation#version) provides us with the following guide for interpreting the `services` field of the `version` payload:

![image](images/services.png)

The type of this field is "bitfield". [Check out the wikipedia entry](https://en.wikipedia.org/wiki/Bit_field) for a better explanation that I can provide.

In a bitfield every bit holds some pre-defined meaning. This is a 8 byte / 64 bit bitfield (remember, a byte is just a collection of 8 bits so 8 bytes is 8*8=64 bits).


From the table above we can see that the least significant digit in the binary representation (decimal value `2^0=1`) represents `NODE_NETWORK`, or whether this peer "can be asked for full blocks or just headers".

The second least-significant digit (decimal value `2^1=2`) NODE_GETUTXO

The third least-significant digit (decimal value `2^2=4`)NODE_BLOOM

The fourth least-significant digit (decimal value `2^3=8`) NODE_WITNESS

The eleventh least-significant digit (decimal value `2^10=1024`) NODE_NETWORK_LIMITED

The rest of the bits (decimal values `2*n` where n in {4, 5, 6, 7, 8, 9, 11, 12, ..., 63} have no meaning, yet.

So, in order to interpret this field we need to look up the nth bit in the table above and see if it means anything.

So, in python we want to be able to basically produce a dictionary like this for every node we connect to. This would allow us to look up what services that node offers _by name_.

```
{
    'NODE_NETWORK': True,
    'NODE_GETUTXO': False,
    'NODE_BLOOM': True,
    'NODE_WITNESS': False,
    'NODE_NETWORK_LIMITED': True,
}
```

Furthermore, we could write a function that produces this lookup table for us given an integer bitfield and a magical `check_bit(n)` function:

```
def read_services(stream):
    n = read_int(stream, 4)
    return {
        'NODE_NETWORK': check_bit(services_int, 0),           # 1    = 2**0
        'NODE_GETUTXO': check_bit(services_int, 1),           # 2    = 2**1
        'NODE_BLOOM': check_bit(services_int, 2),             # 4    = 2**2
        'NODE_WITNESS': check_bit(services_int, 3),           # 8    = 2**3
        'NODE_NETWORK_LIMITED': check_bit(services_int, 10),  # 1024 = 2**10
    }
```

For now, I'm just going to give you a definition of the magical `check_bit` function:

In [89]:
def check_bit(number, index):
    """See if the bit at `index` in binary representation of `number` is on"""
    mask = 1 << index
    return bool(number & mask)

def services_int_to_dict(services_int):
    return {
        'NODE_NETWORK': check_bit(services_int, 0),           # 1    = 2**0
        'NODE_GETUTXO': check_bit(services_int, 1),           # 2    = 2**1
        'NODE_BLOOM': check_bit(services_int, 2),             # 4    = 2**2
        'NODE_WITNESS': check_bit(services_int, 3),           # 8    = 2**3
        'NODE_NETWORK_LIMITED': check_bit(services_int, 10),  # 1024 = 2**10
    }

def read_services(stream):
    services_int = read_int(stream, 8)
    return services_int_to_dict(services_int)

Here's some `read_services` outputs for some possible inputs:

In [41]:
from pprint import pprint

bitfields = [
    1,
    8,
    1 + 8,
    1024,
    8 + 1024,
    1 + 2 + 4 + 8 + 1024,
    2**5 + 2**9 + 2**25,
]

for bitfield in bitfields:
    pprint(f"(n={bitfield})")
    stream = BytesIO(int_to_bytes(bitfield, 4))
    pprint(read_services(stream))
    print()

'(n=1)'
{'NODE_BLOOM': False,
 'NODE_GETUTXO': False,
 'NODE_NETWORK': True,
 'NODE_NETWORK_LIMITED': False,
 'NODE_WITNESS': False}

'(n=8)'
{'NODE_BLOOM': False,
 'NODE_GETUTXO': False,
 'NODE_NETWORK': False,
 'NODE_NETWORK_LIMITED': False,
 'NODE_WITNESS': True}

'(n=9)'
{'NODE_BLOOM': False,
 'NODE_GETUTXO': False,
 'NODE_NETWORK': True,
 'NODE_NETWORK_LIMITED': False,
 'NODE_WITNESS': True}

'(n=1024)'
{'NODE_BLOOM': False,
 'NODE_GETUTXO': False,
 'NODE_NETWORK': False,
 'NODE_NETWORK_LIMITED': True,
 'NODE_WITNESS': False}

'(n=1032)'
{'NODE_BLOOM': False,
 'NODE_GETUTXO': False,
 'NODE_NETWORK': False,
 'NODE_NETWORK_LIMITED': True,
 'NODE_WITNESS': True}

'(n=1039)'
{'NODE_BLOOM': True,
 'NODE_GETUTXO': True,
 'NODE_NETWORK': True,
 'NODE_NETWORK_LIMITED': True,
 'NODE_WITNESS': True}

'(n=33554976)'
{'NODE_BLOOM': False,
 'NODE_GETUTXO': False,
 'NODE_NETWORK': False,
 'NODE_NETWORK_LIMITED': False,
 'NODE_WITNESS': False}



Hopefully that kind of makes sense.

### Exercise #6: complete these useless functions to hammer home you understanding of this strange `services` "bitfield"

In [42]:
def offers_node_network_service(services_bitfield):
    # given integer services_bitfield, return whether the NODE_NETWORK bit is on
    return services_int_to_dict(services_bitfield)['NODE_NETWORK']

def offers_node_bloom_and_node_witness_services(services_bitfield):
    # given integer services_bitfield, return whether the 
    # NODE_BLOOM and NODE_WITNESS bits are on
    return services_int_to_dict(services_bitfield)['NODE_BLOOM'] \
        and services_int_to_dict(services_bitfield)['NODE_WITNESS']

In [43]:
import ipytest, pytest

def test_services_0():
    assert offers_node_network_service(1) is True
    assert offers_node_network_service(1 + 8) is True
    assert offers_node_network_service(4) is False
    

def test_services_1():
    assert offers_node_bloom_and_node_witness_services(1) is False
    assert offers_node_bloom_and_node_witness_services(1 + 8) is False
    assert offers_node_bloom_and_node_witness_services(4 + 8) is True
    
ipytest.run_tests(doctest=True)
ipytest.clean_tests("test_services*")

unittest.case.FunctionTestCase (test_services_0) ... ok
unittest.case.FunctionTestCase (test_services_1) ... ok

----------------------------------------------------------------------
Ran 2 tests in 0.002s

OK


In [299]:
def services_offered(bitfield):
    return [k for k, v in services_int_to_dict(bitfield).items() if v]

In [44]:
import ipytest, pytest

raw_version_to_services_offered = {
    b'\x7f\x11\x01\x00\r\x04\x00\x00\x00\x00\x00\x00\xa3\xfcY[\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04\x9b\xf8\r\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x88\x88\x18\xfaf\x93\x92\x97\x10/Satoshi:0.16.0/\x0eq\x07\x00\x01': ['NODE_NETWORK', 'NODE_BLOOM', 'NODE_WITNESS', 'NODE_NETWORK_LIMITED'],
    b'r\x11\x01\x00\x01\x00\x00\x00\x00\x00\x00\x00\xf8\xfcY[\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04\xc3N\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffR\x16x\xe8 \x8d+L\xd2\xab\xc1\xd8\x0c\xeb\x0f/Satoshi:0.9.3/n\x80\x06\x00\x01': ['NODE_NETWORK'],
    b'\x7f\x11\x01\x00\r\x00\x00\x00\x00\x00\x00\x00|\xfcY[\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04\xc9.\r\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x8b3;\xd3\xde-Y\x99\x10/Satoshi:0.15.0/&%\x08\x00\x01': ['NODE_NETWORK', 'NODE_BLOOM', 'NODE_WITNESS'],
    b'\x7f\x11\x01\x00\x0c\x04\x00\x00\x00\x00\x00\x00y\xfcY[\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04\x8e\xa4\x0c\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00`4"\xe2B\xc6\x82\r\x10/Satoshi:0.16.0/&%\x08\x00\x01': ['NODE_BLOOM', 'NODE_WITNESS', 'NODE_NETWORK_LIMITED'],
}

def test_services_offered():
    for raw_version, services_offered in raw_version_to_services_offered.items():
        v = VersionMessage.from_bytes(raw_version)
        calculated_services_offered = [k for k, v in services_int_to_dict(v.services).items() if v]
        assert set(services_offered) == set(calculated_services_offered)

ipytest.run_tests(doctest=True)
ipytest.clean_tests("test_services_offered*")

unittest.case.FunctionTestCase (test_services_offered) ... ERROR

ERROR: unittest.case.FunctionTestCase (test_services_offered)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-44-2d1def9011d5>", line 12, in test_services_offered
    v = VersionMessage.from_bytes(raw_version)
  File "<ipython-input-9-1a5fbd798093>", line 28, in from_bytes
    user_agent = read_var_str(stream)
NameError: name 'read_var_str' is not defined

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (errors=1)


# "Network Address" Type

[`net_addr`](https://en.bitcoin.it/wiki/Protocol_documentation#Network_address) is the most complicated, so we'll handle it last.

Network addresses require we interpret 4 new kinds of data:

1. `time`: Unix timestamp. Already done.
2. `services`: integer bitfield. Already done.
3. `IP address`: complicated ...
4. `port`: big-endian encoded `int`

Here's a Python class abstracting this "Network Address" type. Just two methods remain to be implemented: `read_ip` and `read_port`:

In [None]:
class Address:

    def __init__(self, services, ip, port, time):
        self.services = services
        self.ip = ip
        self.port = port
        self.time = time

    @classmethod
    def from_bytes(cls, bytes_, version_msg=False):
        stream = BytesIO(bytes_)
        return cls.from_stream(stream, version_msg)
    
    @classmethod
    def from_stream(cls, stream, version_msg=False):
        if version_msg:
            time = None
        else:
            time = read_timestamp(stream)
        services = read_services(stream)
        ip = stream.read(16)
        port = read_int(stream, 2, byte_order='big')
        return cls(services, ip, port, time)
    
    def __repr__(self):
        return f"<Address {self.ip}:{self.port}>"

## "Network Addess > IPv6/4" field

Since these "network addresses" are sort of their own "data type", let's make a class to abstract them.

Here's my best translation of the docs:

![image](images/network-address.png)

In [None]:
def read_timestamp(stream):
    timestamp = read_int(stream, 8)
    return datetime.fromtimestamp(timestamp)

## "Network Address > Port" Field

This is just 2 byte integer -- but it's encoded with the opposite byte order of what we usually read using `read_bytes`. But have no fear, `read_bytes` takes an optional `byte_order` parameter which defaults to `little` -- since we're usually reading little-endian encoded messages. But if we set it to `big`, then `read_int` will successfully read the "big endian" / "network byte order" port number.

In order to have clean, testable code we will define another helper method: `read_port`

In [140]:
def read_port(stream):
    return read_int(stream, 2, byte_order="big")

# Let's Parse 1000 Version Messages From Real Bitcoin Nodes

So it seems like we correctly translated all the relevant tables from the protocol documentation into Python code. But how can we be sure we didn't make any mistakes. We can probably never be completely sure, but a good way for us to get started would be to send `version` messages to a large number of Bitcoin nodes, listen for and decode their `version` replies using `VersionMessage` classmethods and seeing if things "make sense".

`earn.com` offers [a free, unauthenticated API](https://bitnodes.earn.com/api/#list-nodes) where you can get a list of all visible Bitcoin full nodes.

Execute this command in your terminal to see what kind of data this API gives us:

```
curl -H "Accept: application/json; indent=4" https://bitnodes.earn.com/api/v1/snapshots/latest/
```

We can call this API directly from Python using the `requests` module:

In [1]:
import requests
from pprint import pprint

def get_nodes():
    url = "https://bitnodes.earn.com/api/v1/snapshots/latest/"
    response = requests.get(url)
    return response.json()["nodes"]

In [4]:
nodes = get_nodes()
pprint(nodes)

{'1.119.141.170:8333': [70015,
                        '/Satoshi:0.15.1/',
                        1532693338,
                        13,
                        534018,
                        '1.119.141.170',
                        'Beijing',
                        'CN',
                        39.9289,
                        116.3883,
                        'Asia/Harbin',
                        'AS4847',
                        'China Networks Inter-Exchange'],
 '1.122.92.199:8333': [70015,
                       '/Satoshi:0.16.1/',
                       1532707844,
                       1037,
                       534019,
                       'cpe-1-122-92-199.bpe6-r-961.pie.wa.bigpond.net.au',
                       'Parkerville',
                       'AU',
                       -31.8747,
                       116.138,
                       'Australia/West',
                       'AS1221',
                       'Telstra Pty Ltd'],
 '1.234.51.91:8333': [70002,
   

                        'Pacific/Auckland',
                        'AS9790',
                        'CallPlus Services Limited'],
 '102.177.134.63:8333': [70015,
                         '/Satoshi:0.14.2/UASF-Segwit:1.0(BIP148)/',
                         1532714673,
                         134217741,
                         534018,
                         '102.177.134.63',
                         'Johannesburg',
                         'ZA',
                         -26.2309,
                         28.0583,
                         'Africa/Johannesburg',
                         'AS328239',
                         'EvoNet-AS'],
 '103.1.210.27:8333': [70015,
                       '/Satoshi:0.16.0/',
                       1531728084,
                       1037,
                       534018,
                       '103.1.210.27',
                       'Hanoi',
                       'VN',
                       21.0333,
                       105.85,
                      

                         'AS8100',
                         'QuadraNet, Inc'],
 '104.231.36.186:8333': [70015,
                         '/Satoshi:0.16.1/',
                         1532465027,
                         1037,
                         534018,
                         'cpe-104-231-36-186.cinci.res.rr.com',
                         'Middletown',
                         'US',
                         39.5151,
                         -84.3983,
                         'America/New_York',
                         'AS10796',
                         'Time Warner Cable Internet LLC'],
 '104.234.220.135:8333': [70015,
                          '/Satoshi:0.16.99/',
                          1530858116,
                          1037,
                          534018,
                          '104.234.220.135',
                          'Brampton',
                          'CA',
                          43.7196,
                          -79.6854,
                          'Am

                         '/Satoshi:0.16.0/',
                         1532667105,
                         1037,
                         534018,
                         '110.173.51.114',
                         None,
                         'HK',
                         22.25,
                         114.1667,
                         'Asia/Hong_Kong',
                         'AS45753',
                         'NETSEC NOC'],
 '110.173.59.106:8333': [70015,
                         '/Satoshi:0.16.0/',
                         1530858117,
                         1037,
                         534018,
                         '110.173.59.106',
                         None,
                         'HK',
                         22.25,
                         114.1667,
                         'Asia/Hong_Kong',
                         'AS45753',
                         'NETSEC NOC'],
 '110.21.244.176:8333': [70015,
                         '/Satoshi:0.16.0/',
                 

                         'CN',
                         34.7725,
                         113.7266,
                         None,
                         'AS4837',
                         'CHINA UNICOM China169 Backbone'],
 '116.148.136.7:8333': [70015,
                        '/Satoshi:0.15.1/',
                        1532716885,
                        13,
                        534018,
                        '116.148.136.7',
                        None,
                        'CN',
                        34.7725,
                        113.7266,
                        None,
                        'AS4837',
                        'CHINA UNICOM China169 Backbone'],
 '116.206.80.33:8333': [70015,
                        '/Bitcoin ABC:0.16.1(EB8.0)/',
                        1530858117,
                        37,
                        530361,
                        'bch-node-1.solnode.cloud',
                        None,
                        'AU',
                  

 '122.96.180.181:8333': [70015,
                         '/Satoshi:0.15.1/',
                         1532716885,
                         13,
                         534018,
                         '122.96.180.181',
                         'Nanjing',
                         'CN',
                         32.0617,
                         118.7778,
                         'Asia/Shanghai',
                         'AS4837',
                         'CHINA UNICOM China169 Backbone'],
 '122.96.198.139:8333': [70015,
                         '/Satoshi:0.15.1/',
                         1532728001,
                         13,
                         534018,
                         '122.96.198.139',
                         'Nanjing',
                         'CN',
                         32.0617,
                         118.7778,
                         'Asia/Shanghai',
                         'AS4837',
                         'CHINA UNICOM China169 Backbone'],
 '122.96.198.55:

                        1531397979,
                        1036,
                        534018,
                        '13.94.112.231',
                        'Dublin',
                        'IE',
                        53.3331,
                        -6.2489,
                        'Europe/Dublin',
                        'AS8075',
                        'Microsoft Corporation'],
 '13.95.209.31:8333': [70015,
                       '/Satoshi:0.15.1/',
                       1531784876,
                       13,
                       534018,
                       '13.95.209.31',
                       'Amsterdam',
                       'NL',
                       52.35,
                       4.9167,
                       'Europe/Amsterdam',
                       'AS8075',
                       'Microsoft Corporation'],
 '13.95.232.174:8333': [70015,
                        '/Satoshi:0.16.0(Tecknoworks)/',
                        1531787228,
                        10

                       'Frankfurt',
                       'DE',
                       50.1167,
                       8.6833,
                       'Europe/Berlin',
                       'AS200130',
                       'Digital Ocean, Inc.'],
 '138.68.79.25:8333': [70015,
                       '/Satoshi:0.16.0/',
                       1532739088,
                       1037,
                       534019,
                       'p-037.coinage.space',
                       'Frankfurt',
                       'DE',
                       50.1167,
                       8.6833,
                       'Europe/Berlin',
                       'AS200130',
                       'Digital Ocean, Inc.'],
 '138.68.79.27:8333': [70015,
                       '/Satoshi:0.16.0/',
                       1532739088,
                       1037,
                       534018,
                       'p-038.coinage.space',
                       'Frankfurt',
                       'DE',
       

 '144.76.172.156:8333': [70015,
                         '/Satoshi:0.16.0/',
                         1532589871,
                         1037,
                         534018,
                         'globe.groupsecure.com',
                         None,
                         'DE',
                         51.2993,
                         9.491,
                         'Europe/Berlin',
                         'AS24940',
                         'Hetzner Online GmbH'],
 '144.76.172.176:8333': [70015,
                         '/Satoshi:0.16.0/',
                         1530858116,
                         1037,
                         534018,
                         'globe.groupsecure.com',
                         None,
                         'DE',
                         51.2993,
                         9.491,
                         'Europe/Berlin',
                         'AS24940',
                         'Hetzner Online GmbH'],
 '144.76.175.139:8333': [70012,
  

                        1532715793,
                        13,
                        534018,
                        '153.37.12.170',
                        'Nanjing',
                        'CN',
                        32.0617,
                        118.7778,
                        'Asia/Shanghai',
                        'AS4837',
                        'CHINA UNICOM China169 Backbone'],
 '153.37.12.183:8333': [70015,
                        '/Satoshi:0.15.1/',
                        1532728001,
                        13,
                        534018,
                        '153.37.12.183',
                        'Nanjing',
                        'CN',
                        32.0617,
                        118.7778,
                        'Asia/Shanghai',
                        'AS4837',
                        'CHINA UNICOM China169 Backbone'],
 '153.37.12.187:8333': [70015,
                        '/Satoshi:0.15.1/',
                        1532725387,
        

                        1530858116,
                        1037,
                        534018,
                        '159.65.85.189',
                        'London',
                        'GB',
                        51.5142,
                        -0.0931,
                        'Europe/London',
                        'AS14061',
                        'DigitalOcean, LLC'],
 '159.65.95.133:8333': [70015,
                        '/Satoshi:0.16.1/',
                        1531959272,
                        1036,
                        534018,
                        '159.65.95.133',
                        'London',
                        'GB',
                        51.5142,
                        -0.0931,
                        'Europe/London',
                        'AS14061',
                        'DigitalOcean, LLC'],
 '159.69.14.65:8333': [70015,
                       '/Satoshi:0.16.1/',
                       1532466990,
                       1037,
      

KeyboardInterrupt: 

In particular, we can get a list of addresses using the `nodes.keys()`:

In [2]:
def get_addr_tuples():
    nodes = get_nodes()
    raw_addrs = nodes.keys()
    addr_tuples = []
    for raw_addr in raw_addrs:
        ip, port = raw_addr.rsplit(":", 1)
        addr_tuple = (ip, int(port))
        addr_tuples.append(addr_tuple)
    return addr_tuples

addr_tuples = get_addr_tuples()
print(addr_tuples)

[('31.185.140.227', 8333), ('220.75.229.130', 3927), ('185.25.48.184', 8333), ('95.213.145.52', 8333), ('111.8.128.194', 8333), ('47.97.174.230', 8333), ('78.46.32.92', 8333), ('71.114.57.136', 8333), ('46.197.190.196', 8333), ('[2001:41d0:1008:2bed::]', 8333), ('73.149.21.171', 8333), ('188.218.66.173', 8333), ('109.68.136.85', 8333), ('212.124.170.74', 8333), ('67.205.137.64', 8333), ('176.10.136.143', 8333), ('52.29.42.170', 8333), ('35.190.133.32', 8333), ('34.194.208.92', 8333), ('93.70.51.221', 8333), ('52.73.147.181', 8333), ('[2a01:7c8:aaba:3f5:5054:ff:fe5d:2122]', 8333), ('98.5.4.22', 8333), ('155.4.153.21', 8333), ('122.100.186.58', 8333), ('97.95.169.43', 8333), ('95.216.12.243', 8333), ('216.10.169.254', 8333), ('103.37.233.211', 8333), ('13.59.42.238', 8333), ('213.239.201.46', 8333), ('124.248.227.62', 8333), ('67.169.94.193', 8333), ('120.78.188.194', 8334), ('88.191.247.11', 8333), ('188.40.121.97', 8333), ('158.69.250.4', 8333), ('62.210.94.63', 8333), ('93.105.187.121

In [1]:
import downloader

downloader.cleanup()
addrs = downloader.get_addr_tuples()
downloader.connect_many(addrs)

1 / 9488 tasks succeeded (0 failed)
2 / 9488 tasks succeeded (0 failed)
3 / 9488 tasks succeeded (0 failed)
4 / 9488 tasks succeeded (0 failed)
5 / 9488 tasks succeeded (0 failed)
6 / 9488 tasks succeeded (0 failed)
7 / 9488 tasks succeeded (0 failed)
8 / 9488 tasks succeeded (1 failed)
9 / 9488 tasks succeeded (1 failed)
10 / 9488 tasks succeeded (1 failed)
11 / 9488 tasks succeeded (2 failed)
12 / 9488 tasks succeeded (2 failed)
13 / 9488 tasks succeeded (2 failed)
14 / 9488 tasks succeeded (2 failed)
15 / 9488 tasks succeeded (2 failed)
16 / 9488 tasks succeeded (2 failed)
17 / 9488 tasks succeeded (2 failed)
18 / 9488 tasks succeeded (2 failed)
19 / 9488 tasks succeeded (3 failed)
20 / 9488 tasks succeeded (3 failed)
21 / 9488 tasks succeeded (3 failed)
22 / 9488 tasks succeeded (3 failed)
23 / 9488 tasks succeeded (3 failed)
24 / 9488 tasks succeeded (3 failed)
25 / 9488 tasks succeeded (4 failed)
26 / 9488 tasks succeeded (4 failed)
27 / 9488 tasks succeeded (4 failed)
28 / 9488 

KeyboardInterrupt: 

Do you notice how slow this is?

My machine received 9513 addresses from earn.com, and is processes about 5 messages per second. This is going to take about 30 seconds to process everything. 

TOO SLOW!!!

Now let's thing for a second. Why's it so slow? In fact, it's because we're spending almost all our time waiting for `sock.connect` or `sock.recv` to give us a return value. Our Python program is just sitting on its hands while packets fly across the world, one at a time.

Isn't there something we could have our Python program work on while it waits? Couldn't we perhaps have it send a few messages at a time?

The answer, or course, is "yes". But this requires "asynchronous programming". FIXME: insert youtube link

I'm not going to attempt to fully explain how this works, but I'll once again give you a magical program that does what we want.

In [2]:
import downloader

downloader.cleanup()
addrs = downloader.get_addr_tuples()
downloader.async_connect_many(addrs)

RuntimeError: This event loop is already running

500 / 9488 tasks succeeded (160 failed)
1000 / 9488 tasks succeeded (276 failed)
1500 / 9488 tasks succeeded (371 failed)
2000 / 9488 tasks succeeded (475 failed)
2500 / 9488 tasks succeeded (572 failed)
3000 / 9488 tasks succeeded (680 failed)
3500 / 9488 tasks succeeded (755 failed)
4000 / 9488 tasks succeeded (840 failed)
4500 / 9488 tasks succeeded (937 failed)
5000 / 9488 tasks succeeded (1029 failed)
5500 / 9488 tasks succeeded (1143 failed)
6000 / 9488 tasks succeeded (1237 failed)
6500 / 9488 tasks succeeded (1338 failed)
7000 / 9488 tasks succeeded (1429 failed)
7500 / 9488 tasks succeeded (1530 failed)


So what the hell is going on here?

These strings don't look like port numbers, and 

In [12]:
from collections import Counter
from library import VersionMessage, Address

def get_versions():
    with open('versions.txt', 'rb') as f:
        lines = f.readlines()
        lines[:] = (value.strip() for value in lines if value != b'\n')
        return lines

In [34]:
from collections import Counter

vms = []

for raw in get_versions():
    try:
        vm = VersionMessage.from_bytes(raw)
        vms.append(vm)
    except Exception as e:
        print(e)
        continue

name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 're

name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 're

name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 'read_int' is not defined
name 're

In [17]:
len(vms)

7179

In [139]:
for vm in vms:
    print(vm.addr_recv)

<Address b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04':48334>
<Address b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04':37948>
<Address b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04':58338>
<Address b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04':38104>
<Address b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04':44134>
<Address b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04':40206>
<Address b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04':41446>
<Address b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04':36032>
<Address b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04':37736>
<Address b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04':42252>
<Address b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04':39038>
<Address b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04':41086>
<Address b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05

In [62]:
ports_counter = Counter([addr.port for addr in addrs])
ports_counter.most_common(10)

[]

In [127]:
ip_counter = Counter([addr.ip for addr in addrs])
ip_counter.most_common(10)

[(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04', 853),
 (b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xc6\x1bd\t', 53)]

In [191]:
ips = Counter([addr.formatted_ip for addr in addrs])
ips

Counter({'104:5:61:4': 854, '198:27:100:9': 53})

I get 

```
{IPv4Address('104.5.61.4'), IPv4Address('198.27.100.9')}
```

'104.5.61.4' is my public ip address



In [None]:
# all 53 which report 8333 also report the wrong ip address ...
set([interpret_raw_ip(addr.ip) for addr in addrs if addr.port == 8333])

In [20]:
raw_wrong_ip = b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xc6\x1bd\t'

for line in get_versions():
    try:
        vm = VersionMessage.from_bytes(line)
        if vm.addr_recv.ip == raw_wrong_ip:
            print(vm.user_agent)
    except:
        continue

So there's something funky goin gon with that version of the bitcoin software. I would guess that it's hardcoding the port and ip. The reason I'm guessing it's hardcoded is because 8333 is the port that bitcoin core runs on. 

But not all node reporting this user agent get my ip / port wrong:

In [229]:
satoshi_16_user_agent = b'/Satoshi:0.16.0/'

for line in lines:
    try:
        vm = VersionMessage.from_bytes(line)
        a = Address.from_bytes(vm.addr_recv, version_msg=True)
        if vm.user_agent == satoshi_16_user_agent:
            print(interpret_raw_ip(a.ip), a.port)
    except:
        continue

104.5.61.4 38104
104.5.61.4 40206
104.5.61.4 36032
104.5.61.4 42252
104.5.61.4 41086
104.5.61.4 60476
104.5.61.4 40444
104.5.61.4 50582
104.5.61.4 33992
104.5.61.4 41322
104.5.61.4 56678
104.5.61.4 34268
104.5.61.4 54494
104.5.61.4 49878
104.5.61.4 38566
104.5.61.4 34762
104.5.61.4 39368
104.5.61.4 59762
104.5.61.4 54394
104.5.61.4 38494
104.5.61.4 60240
104.5.61.4 57094
104.5.61.4 57306
104.5.61.4 35640
104.5.61.4 48560
104.5.61.4 47394
104.5.61.4 53204
104.5.61.4 60322
104.5.61.4 45656
104.5.61.4 46622
104.5.61.4 37308
104.5.61.4 46610
104.5.61.4 45784
104.5.61.4 38942
104.5.61.4 41368
104.5.61.4 51998
104.5.61.4 42938
104.5.61.4 37044
104.5.61.4 40808
104.5.61.4 56424
104.5.61.4 41222
104.5.61.4 38006
198.27.100.9 8333
104.5.61.4 55146
104.5.61.4 51558
104.5.61.4 56934
104.5.61.4 59922
104.5.61.4 47928
104.5.61.4 47572
104.5.61.4 41672
104.5.61.4 58354
104.5.61.4 33416
104.5.61.4 37798
104.5.61.4 51558
104.5.61.4 60962
104.5.61.4 35142
104.5.61.4 44140
104.5.61.4 44658
104.5.61.4 35

At this point I think we can be reasonably confident that we've figured out how to parse ip addresses. But along the way it seems that we've also learned to not trust them!

In [178]:
right_ip = b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04'

formatted = ":".join([str(b) for b in right_ip[-4:]])

formatted

'104:5:61:4'

OK, here's another test that puts everything together. Given a raw version message payload, tell me what services it offers:

# Parsing a complete Version response

In [305]:
# Don't look, ugly code ...

def version_report(version_payload):
    vm = VersionMessage.from_bytes(version_payload)
    for attr in ['command', 'nonce', 'relay', 'start_height', 'user_agent', 'version']:
        print(f"{attr}: {getattr(vm, attr)}")

    print(f"timestamp: {interpret_timestamp(vm.timestamp)}")
    for outside_attr in ['addr_from', 'addr_recv']:
        for inside_attr in ['formatted_ip', 'port', 'services', 'time']:
            print(f"{outside_attr}.{inside_attr}: {getattr(getattr(vm, outside_attr), inside_attr)}")
              
    print(f"services offered: {services_offered(vm.services)}")

In [306]:
payload = b'\x7f\x11\x01\x00\r\x00\x00\x00\x00\x00\x00\x00z\xfcY[\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x05=\x04\xcc\xd6\r\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00x/w\xa2\xfc2\x8d)\x10/Satoshi:0.15.1/&%\x08\x00\x01'

version_report(payload)

command: b'version'
nonce: 2994105387910115192
relay: True
start_height: 533798
user_agent: b'/Satoshi:0.15.1/'
version: 70015
timestamp: 2018-07-26 11:53:14
addr_from.formatted_ip: 0:0:0:0
addr_from.port: 0
addr_from.services: 0
addr_from.time: 13
addr_recv.formatted_ip: 61:4:204:214
addr_recv.port: 0
addr_recv.services: 0
addr_recv.time: 0
services offered: ['NODE_NETWORK', 'NODE_BLOOM', 'NODE_WITNESS']


In [292]:
from datetime import datetime

datetime.fromtimestamp(1532623994)

datetime.datetime(2018, 7, 26, 11, 53, 14)

In [310]:
import socket

PEER_IP = "35.187.200.6"

PEER_PORT = 8333

# magic "version" bytestring
VERSION = b'\xf9\xbe\xb4\xd9version\x00\x00\x00\x00\x00j\x00\x00\x00\x9b"\x8b\x9e\x7f\x11\x01\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x93AU[\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00rV\xc5C\x9b:\xea\x89\x14/some-cool-software/\x01\x00\x00\x00\x01'

sock = socket.socket()
sock.connect((PEER_IP, PEER_PORT))

# initiate the "version handshake"
sock.send(VERSION)

# receive their "version" response
msg = Packet.from_socket(sock)

version_report(msg.payload)

command: b'version'
nonce: 17531512664508898302
relay: True
start_height: 533834
user_agent: b'/Satoshi:0.16.0/'
version: 70015
timestamp: 2018-07-26 17:17:48
addr_from.formatted_ip: 0:0:0:0
addr_from.port: 0
addr_from.services: 0
addr_from.time: 1037
addr_recv.formatted_ip: 0:0:0:0
addr_recv.port: 0
addr_recv.services: 0
addr_recv.time: 1039
services offered: ['NODE_NETWORK', 'NODE_BLOOM', 'NODE_WITNESS', 'NODE_NETWORK_LIMITED']


Boom! This is basically the same code we finished the last lesson with, but our magical `version_report` function and all the functions it calls are able to decipher what this cryptic message _means_!

# `to_bytes()`

There are stil 3 large problems with the code above:
* I still supply our `VERSION` message, when I should construct it.
* I don't listen for our peer's `verack` response.
* I dont's send my `verack` upon receipt of our peer's `verack`

To fix the first problem we need to implement two methods:
* `Packet.to_bytes()`: This method will allow us to construct packets to be sent across the internet via socket
* `VersionMessage.to_bytes()`. This method will allow us to specify a valid payload for a `Packet` instance with `command` attribute set to `b"version"` -- our very own, hand-rolled version message.

So no we just need to reverse what `Packet.from_socket` and `VersionMessage.from_bytes` do.

In [96]:
from datetime import datetime
from time import mktime
from io import BytesIO
from ipaddress import ip_address

### OLD ###
def bytes_to_int(b, byte_order="little"):
    return int.from_bytes(b, byte_order)

def int_to_bytes(i, length, byte_order='little'):
    return int.to_bytes(i, length, byte_order)

def encode_command(cmd):
    padding_needed = 12 - len(cmd)
    padding = b"\x00" * padding_needed
    return cmd + padding


def read_magic(sock):
    magic_bytes = sock.recv(4)
    magic = bytes_to_int(magic_bytes)
    if magic != NETWORK_MAGIC:
        raise ValueError(f'Network magic "{magic}" is wrong')
    return magic


def read_command(sock):
    raw = sock.recv(12)
    # remove empty bytes
    command = raw.replace(b"\x00", b"")
    return command


def read_length(sock):
    raw = sock.recv(4)
    length = bytes_to_int(raw)
    return length


def read_checksum(sock):
    # FIXME: protocol documentation says this should be an integer ...
    raw = sock.recv(4)
    return raw


def calculate_checksum(payload_bytes):
    """First 4 bytes of sha256(sha256(payload))"""
    first_round = sha256(payload_bytes).digest()
    second_round = sha256(first_round).digest()
    first_four_bytes = second_round[:4]
    return first_four_bytes


def read_payload(sock, length, checksum):
    payload = sock.recv(length)
    calculated_checksum = calculate_checksum(payload)
    if calculated_checksum != checksum:
        raise RuntimeError("Checksums don't match")
    if length != len(payload):
        raise RuntimeError(
            "Tried to read {length} bytes, only received {len(payload)} bytes"
        )
    return payload


def read_int(stream, n, byte_order="little"):
    b = stream.read(n)
    return bytes_to_int(b, byte_order)


def read_bool(stream):
    bytes_ = stream.read(1)
    if len(bytes_) != 1:
        raise RuntimeError("Stream ran dry")
    integer = bytes_to_int(bytes_)
    boolean = bool(integer)
    return boolean


def read_var_int(stream):
    i = read_int(stream, 1)
    if i == 0xff:
        return bytes_to_int(stream.read(8))
    elif i == 0xfe:
        return bytes_to_int(stream.read(4))
    elif i == 0xfd:
        return bytes_to_int(stream.read(2))
    else:
        return i


def read_var_str(stream):
    length = read_var_int(stream)
    string = stream.read(length)
    return string


def read_timestamp(stream):
    timestamp = read_int(stream, 8)
    return datetime.fromtimestamp(timestamp)


def read_ip(stream):
    bytes_ = stream.read(16)
    return ip_address(bytes_)


def read_port(stream):
    return read_int(stream, 2, byte_order="big")


def port_to_bytes(port):
    return int_to_bytes(port, 2, byte_order="big")


def check_bit(number, index):
    """See if the bit at `index` in binary representation of `number` is on"""
    mask = 1 << index
    return bool(number & mask)


def services_int_to_dict(services_int):
    return {
        "NODE_NETWORK": check_bit(services_int, 0),  # 1    = 2**0
        "NODE_GETUTXO": check_bit(services_int, 1),  # 2    = 2**1
        "NODE_BLOOM": check_bit(services_int, 2),  # 4    = 2**2
        "NODE_WITNESS": check_bit(services_int, 3),  # 8    = 2**3
        "NODE_NETWORK_LIMITED": check_bit(services_int, 10),  # 1024 = 2**10
    }


def read_services(stream):
    services_int = read_int(stream, 8)
    return services_int_to_dict(services_int)

### end OLD ###


def bool_to_bytes(boolean):
    return int_to_bytes(int(boolean), 1)

def services_to_int(services):
    key_to_multiplier = (
        ("NODE_NETWORK", 2**0),
        ("NODE_GETUTXO", 2**1),
        ("NODE_BLOOM", 2**2),
        ("NODE_WITNESS", 2**3),
        ("NODE_NETWORK_LIMITED", 2**10),
    )
    return sum([
        int(services[key]) * multiplier
        for key, multiplier in key_to_multiplier
    ])

def services_to_bytes(services):
    return int_to_bytes(services_to_int(services), 8)

def timestamp_to_bytes(timestamp):
    unix_seconds = int(mktime(timestamp.timetuple()))
    return int_to_bytes(unix_seconds, 4)

def ip_to_bytes(ip):
    # FIXME correct?
    return ip.packed

def int_to_var_int(i):
    '''encodes an integer as a varint'''
    if i < 0xfd:
        return bytes([i])
    elif i < 0x10000:
        return b'\xfd' + int_to_little_endian(i, 2)
    elif i < 0x100000000:
        return b'\xfe' + int_to_little_endian(i, 4)
    elif i < 0x10000000000000000:
        return b'\xff' + int_to_little_endian(i, 8)
    else:
        raise RuntimeError('integer too large: {}'.format(i))

def str_to_var_str(s):
    length = len(s)
    return int_to_var_int(length) + s


class Address:

    def __init__(self, services, ip, port, time):
        self.services = services
        self.ip = ip
        self.port = port
        self.time = time

    @classmethod
    def from_bytes(cls, bytes_, version_msg=False):
        stream = BytesIO(bytes_)
        return cls.from_stream(stream, version_msg)

    @classmethod
    def from_stream(cls, stream, version_msg=False):
        if version_msg:
            time = None
        else:
            time = read_timestamp(stream)
        services = read_services(stream)
        ip = read_ip(stream)
        port = read_port(stream)
        return cls(services, ip, port, time)
    
    def to_bytes(self, version_msg=False):
        # FIXME: don't call this msg
        msg = b""
        # FIXME: What's the right condition here
        if self.time:
            msg += int_to_bytes(self.time, 4)
        msg += services_to_bytes(self.services)
        msg += ip_to_bytes(self.ip)
        msg += port_to_bytes(self.port)
        return msg


    def __repr__(self):
        return f"<Address {self.ip}:{self.port}>"

class VersionMessage:

    command = b"version"

    def __init__(self, version, services, timestamp, addr_recv, addr_from, 
                 nonce, user_agent, start_height, relay):
        self.version = version
        self.services = services
        self.timestamp = timestamp
        self.addr_recv = addr_recv
        self.addr_from = addr_from
        self.nonce = nonce
        self.user_agent = user_agent
        self.start_height = start_height
        self.relay = relay

    @classmethod
    def from_bytes(cls, payload):
        stream = BytesIO(payload)
        version = read_int(stream, 4)
        services = read_services(stream)
        timestamp = read_timestamp(stream)
        addr_recv = Address.from_stream(stream, version_msg=True)
        addr_from = Address.from_stream(stream, version_msg=True)
        nonce = read_int(stream, 8)
        user_agent = read_var_str(stream)
        start_height = read_int(stream, 4)
        relay = read_bool(stream)
        return cls(
            version,
            services,
            timestamp,
            addr_recv,
            addr_from,
            nonce,
            user_agent,
            start_height,
            relay,
        )

    def to_bytes(self):
        msg = int_to_bytes(self.version, 4)
        msg += services_to_bytes(self.services)
        msg += timestamp_to_bytes(self.timestamp)
        msg += self.addr_recv.to_bytes()
        msg += self.addr_from.to_bytes()
        msg += int_to_bytes(self.nonce, 8)
        msg += str_to_var_str(self.user_agent)
        msg += int_to_bytes(self.start_height, 4)
        msg += bool_to_bytes(self.relay)
        return msg

# FIXME: should i just define like Packet.to_bytes?

class Packet:

    def __init__(self, command, payload):
        self.command = command
        self.payload = payload

    @classmethod
    def from_socket(cls, sock):
        magic = read_magic(sock)
        if magic != NETWORK_MAGIC:
            raise RuntimeError(f'Network magic "{magic}" is wrong')

        command = read_command(sock)
        payload_length = read_length(sock)
        checksum = read_checksum(sock)
        payload = read_payload(sock, payload_length)
        
        calculated_checksum = calculate_checksum(payload)
        if calculated_checksum != checksum:
            raise RuntimeError("Checksums don't match")

        if payload_length != len(payload):
            raise RuntimeError("Tried to read {payload_length} bytes, only received {len(payload)} bytes")

        return cls(command, payload)

    def to_message(self):
        message_class = command_to_message_class(self.command)
        return message_class.from_payload(self.payload)

    def to_bytes(self):
        # FIXME make sure this is the bytes value ...
        result = NETWORK_MAGIC
        result += command_to_bytes(self.command)
        result += int_to_bytes(len(self.payload), 4)
        result += calculate_checksum(self.payload)
        result += self.payload
        return result    
    
    def __repr__(self):
        return f"<Message command={self.command} payload={self.payload}>"
    
class Verack:

    command = b'verack'

    @classmethod
    def from_bytes(cls, s):
        return cls()

    def to_bytes(self):
        return b""

In [97]:
from collections import Counter

vms = []

for raw in get_versions():
    try:
        vm = VersionMessage.from_bytes(raw)
        vms.append(vm)
    except Exception as e:
        print(e)
        continue

Stream ran dry
[Errno 75] Value too large for defined data type
Stream ran dry
[Errno 75] Value too large for defined data type
Stream ran dry
[Errno 75] Value too large for defined data type
Stream ran dry
[Errno 75] Value too large for defined data type
Stream ran dry
[Errno 75] Value too large for defined data type
Stream ran dry
[Errno 75] Value too large for defined data type
Stream ran dry
[Errno 75] Value too large for defined data type
b'' does not appear to be an IPv4 or IPv6 address
b'\x01' does not appear to be an IPv4 or IPv6 address
Stream ran dry
[Errno 75] Value too large for defined data type
Stream ran dry
Stream ran dry
[Errno 75] Value too large for defined data type
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffd\x06' does not appear to be an IPv4 or IPv6 address
b'' does not appear to be an IPv4 or IPv6 address
[Errno 75] Value too large for defined data type
Stream ran dry
b'' does not appear to be an IPv4 or IPv6 address
Stream ran dry
[Errno 75] Value too la

Stream ran dry
[Errno 75] Value too large for defined data type
Stream ran dry
[Errno 75] Value too large for defined data type
Stream ran dry
[Errno 75] Value too large for defined data type
Stream ran dry
[Errno 75] Value too large for defined data type
Stream ran dry
[Errno 75] Value too large for defined data type
Stream ran dry
Stream ran dry
[Errno 75] Value too large for defined data type
Stream ran dry
[Errno 75] Value too large for defined data type
Stream ran dry
[Errno 75] Value too large for defined data type
b'' does not appear to be an IPv4 or IPv6 address
b'.2/\x1c\x1b\x08\x00\x01' does not appear to be an IPv4 or IPv6 address
Stream ran dry
[Errno 75] Value too large for defined data type
Stream ran dry
[Errno 75] Value too large for defined data type
Stream ran dry
[Errno 75] Value too large for defined data type
b'' does not appear to be an IPv4 or IPv6 address
timestamp out of range for platform time_t
Stream ran dry
[Errno 75] Value too large for defined data type
S

In [98]:
len(vms)

7179

In [99]:
vm = vms[0]

In [101]:
pkt = Packet(encode_command(vm.command), vm.to_bytes())

In [102]:
print(vm == VersionMessage.from_bytes(pkt.payload))

RuntimeError: Stream ran dry

In [103]:
import test_data as td

vbs = td.version_byte_strings[0]

print(vbs)

print(VersionMessage.from_bytes(vbs).to_bytes())

b'\x7f\x11\x01\x00\r\x04\x00\x00\x00\x00\x00\x003\xc2X[\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\r\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00{\xc5\xa7\x80\xa1\x87\xc1\xda\x10/Satoshi:0.16.0/\x8d$\x08\x00\x01'
b'\x7f\x11\x01\x00\r\x04\x00\x00\x00\x00\x00\x003\xc2X[\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\r\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00{\xc5\xa7\x80\xa1\x87\xc1\xda\x10/Satoshi:0.16.0/\x8d$\x08\x00\x01'


In [41]:
from collections import Counter

vms = []

for raw in get_versions():
    try:
        vm = VersionMessage.from_bytes(raw)
        vms.append(vm)
    except Exception as e:
        print(e)
        continue

Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran dry
Stream ran