# Recap

In the first lesson we learned to interpret the outermost layer of the Bitcoin Protocol onion: the [message structure](https://en.bitcoin.it/wiki/Protocol_documentation#Message_structure). We learned to send a `version` message to Bitcoin Network peers and listen for their `version` response. We learned to read this response: to check that the correct `magic` bytes came at the beginning of the message; to interpret which of the 27 `command` types the message is; and to read the `payload` data associated with the command and check it against the `checksum` given to us by our remote peer.
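As a quick refresher, here is a minimal sketch of that outer layer. The helper names `checksum` and `parse_message` are made up for this recap (the lessons themselves use the `Packet` class), and the sketch assumes the whole message is already in memory rather than being read from a socket.

```python
import hashlib
import struct

MAINNET_MAGIC = b"\xf9\xbe\xb4\xd9"

def checksum(payload):
    """First 4 bytes of the double SHA-256 hash of the payload."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]

def parse_message(raw):
    """Split a raw message into (command, payload), checking magic and checksum."""
    # 24-byte header: magic (4), command (12, null-padded), length (uint32 LE), checksum (4)
    magic, command, length, check = struct.unpack("<4s12sI4s", raw[:24])
    if magic != MAINNET_MAGIC:
        raise ValueError("unexpected magic bytes")
    payload = raw[24:24 + length]
    if checksum(payload) != check:
        raise ValueError("checksum mismatch")
    return command.rstrip(b"\x00"), payload

# A `verack` message has an empty payload, which makes it a handy smoke test.
verack = (
    MAINNET_MAGIC
    + b"verack".ljust(12, b"\x00")
    + struct.pack("<I", 0)
    + checksum(b"")
)
print(parse_message(verack))
```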

In the second lesson we learned to peel another layer or two of the onion. We learned to read the [payload](https://en.bitcoin.it/wiki/Protocol_documentation#version) of `version` messages. Along the way we had to figure out how to interpret all the sub-structures of the data, such as variable-length strings, variable-length integers, network addresses, `services` bitfields, Unix timestamps, and big-endian encoded port numbers.
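Two of those sub-structures are easy to mix up, so here is a small reminder sketch. The function names `read_varint` and `read_port` are hypothetical, not the ones from the earlier lessons.

```python
import struct

def read_varint(data, offset=0):
    """Return (value, new_offset) for a variable-length integer starting at offset."""
    prefix = data[offset]
    if prefix < 0xFD:
        # small values fit in the prefix byte itself
        return prefix, offset + 1
    # larger values: the prefix says how many little-endian bytes follow
    fmt, size = {0xFD: ("<H", 2), 0xFE: ("<I", 4), 0xFF: ("<Q", 8)}[prefix]
    (value,) = struct.unpack_from(fmt, data, offset + 1)
    return value, offset + 1 + size

def read_port(data):
    """Ports are the odd one out: two bytes, big-endian."""
    return int.from_bytes(data, "big")

print(read_varint(b"\x14"))          # single-byte form
print(read_varint(b"\xfd\x2c\x01"))  # 0xfd prefix, then a little-endian uint16
print(read_port(b"\x20\x8d"))        # the default Bitcoin port
```

The `\x14` byte is the same one that precedes `/some-cool-software/` in the `version` message we send later in this lesson: it is the length of that 20-character user agent string.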

We've come a long way, but we still have a long way to go.

# Getting To Know Each Other

Now that we can talk to our peers, let's be friendly neighbors and introduce ourselves.

In this lesson we will connect to the nearly 10,000 Bitcoin Network peers that operate out in the open. We'll send each a `version` message and we'll record their responses. Our first attempts at this will be far too slow and we will learn about "concurrent programming" -- a technique that frees our program to work on many things at once, in our case talking to Bitcoin Network peers.

Lastly we'll do some "data science" to find patterns in this sea of bytes.

Let's get started!

# bitnodes.earn.com

The first thing we did in the first lesson was to pull up [this website](https://bitnodes.earn.com/nodes/) and look for the IP address of some other node to talk to. 

Now we're going to write some Python code to do this for us.

bitnodes.earn.com offers [a free, unauthenticated API](https://bitnodes.earn.com/api/#list-nodes) to help us do this. You've probably heard this word before -- API -- and you probably don't know exactly what it means. The acronym [API](https://en.wikipedia.org/wiki/Application_programming_interface) stands for "Application Programming Interface". An "Application Programming Interface" is a description of how a programmer can interact with a piece of software. For example, Python has an API for converting `bytes` to `int`s: [int.from_bytes(bytes, byteorder, \*, signed=False)](https://docs.python.org/3/library/stdtypes.html#int.from_bytes). Python defines this exact function, allowing programmers to accomplish this exact operation. There are multiple different "implementations" of Python -- CPython, PyPy, MicroPython, etc. -- and they all implement this same API.
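To make the `int.from_bytes` example concrete, here is a short sketch. The four bytes are the payload-length field from the `version` message we send later in this lesson; the variable names are just for illustration.

```python
length_field = b"\x6a\x00\x00\x00"

# Same bytes, two different answers depending on the byte order we ask for.
as_little_endian = int.from_bytes(length_field, "little")
as_big_endian = int.from_bytes(length_field, "big")

# `signed` is keyword-only; that is what the `*` in the signature means.
as_signed = int.from_bytes(b"\xff", "big", signed=True)

print(as_little_endian, as_big_endian, as_signed)
```

Every Python implementation that follows this API has to give the same three answers, no matter how it computes them internally.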

So that's the original meaning of the term "application programming interface". But it's most frequently used to describe this sort of thing in a specific domain: web programming. Please read this [explainer](https://medium.freecodecamp.org/what-is-an-api-in-english-please-b880a3214a82) of this narrower definition of the term. The [earn.com API](https://bitnodes.earn.com/api/) is one such example of "API" in this sense of the word.

The earn.com API is free and also "unauthenticated", which means we don't have to present any kind of credential in order to use it -- stock market data APIs, for one, aren't so kind!

The API has this specific [List Nodes endpoint](https://bitnodes.earn.com/api/#list-nodes) which will give a list of every node they are aware of at present or some specific point in the past. We are able to specify 

To "exercise" this API we need to send a GET HTTP request. This is the same sort of request that your browser sends every time you load a webpage. It just fetches data.
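If you're curious what goes into such a request, here is a small sketch that builds one with Python's standard library without sending it. It uses `urllib.request.Request` purely for illustration; the rest of this lesson uses cURL and the `requests` library instead.

```python
from urllib.request import Request

url = "https://bitnodes.earn.com/api/v1/snapshots/latest/"
request = Request(url, headers={"Accept": "application/json"})

print(request.get_method())          # GET is the default when no body is attached
print(request.host)                  # which server to talk to
print(request.selector)              # which resource to ask that server for
print(request.get_header("Accept"))  # the format we would like back
```

A method, a host, a path, and a few headers: that is essentially all a GET request is.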

### cURL: A Terminal Utility

Go to your command line and type this in:

```
$ curl -H "Accept: application/json; indent=4" https://bitnodes.earn.com/api/v1/snapshots/latest/
```

(If you get any error you probably need to install the cURL program. Google it!)

This should spit a huge amount of "JSON" out onto your terminal. This is a complete list of all Bitcoin Network nodes which earn.com has been able to find.
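JSON is just text that maps neatly onto Python dictionaries and lists. The sketch below parses a tiny, made-up stand-in for the real response: the only thing taken from the actual API is the `nodes` key with `ip:port` keys inside it, and the addresses themselves are invented.

```python
import json

# A hand-written miniature of the response; the real one is far larger.
sample = """
{
    "nodes": {
        "192.168.0.1:8333": [],
        "10.0.0.7:8333": []
    }
}
"""

snapshot = json.loads(sample)
print(type(snapshot))            # a plain Python dict
print(list(snapshot["nodes"]))   # the ip:port keys
```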

### Requests: A Python Library

This is great, but we need to find a way to do this from Python. This is where the `requests` library comes in. Watch [this video](https://www.youtube.com/watch?v=_8HPCToXdAk) to learn how to use `requests`.

##### Exercise #1: Use `requests.get` to make the same https request we made using cURL above.

Return a dictionary of the JSON response from the API.

hint: `.json()` gets the JSON response

another hint: [Relevant part](https://youtu.be/_8HPCToXdAk?t=3m12s) of the YouTube video above.

In [3]:
%load_ext autoreload
%autoreload 2

In [4]:
import requests

def get_bitnodes_api_response():
    BITNODES_URL = "https://bitnodes.earn.com/api/v1/snapshots/latest/"
    ### YOUR CODE ###
    raise NotImplementedError()

In [5]:
def get_bitnodes_api_response():
    BITNODES_URL = "https://bitnodes.earn.com/api/v1/snapshots/latest/"
    return requests.get(BITNODES_URL).json()

In [6]:
import json

nodes_json = open("ibd/four/response.json").read()
nodes_dict = json.loads(nodes_json)

In [7]:
import ipytest
import requests_mock

def test_get_bitnodes_api_response():
    BITNODES_URL = "https://bitnodes.earn.com/api/v1/snapshots/latest/"
    # intercept the HTTP call so the test never touches the network
    with requests_mock.mock() as mock:
        mock.get(BITNODES_URL, json=nodes_dict)
        response = get_bitnodes_api_response()
        assert response == nodes_dict

ipytest.run_tests(doctest=True)
ipytest.clean_tests("test_get_bitnodes_api_response*")

##### Exercise #2: Call the bitnodes API and return just the `"nodes"` part of the JSON response

hint: relevant part of the YouTube video, where you grab the value corresponding to the `name` key from the `r.json()` response JSON dictionary. We're doing the same thing in this exercise, just looking up the `nodes` key instead of the `name` key.
```
r = requests.get("http://swapi.co/api/people/1")
r.json()['name']
```

In [8]:
def get_nodes():
    BITNODES_URL = "https://bitnodes.earn.com/api/v1/snapshots/latest/"
    ### YOUR CODE ###
    raise NotImplementedError()

In [9]:
def get_nodes():
    data = get_bitnodes_api_response()
    return data['nodes']

In [10]:
def test_get_nodes():
    BITNODES_URL = "https://bitnodes.earn.com/api/v1/snapshots/latest/"
    # intercept the HTTP call so the test never touches the network
    with requests_mock.mock() as mock:
        mock.get(BITNODES_URL, json=nodes_dict)
        nodes = get_nodes()
        assert nodes == nodes_dict['nodes']

ipytest.run_tests(doctest=True)
ipytest.clean_tests("test_get_nodes*")

##### Exercise #3: Turn the `nodes` object into a list of `ip:port` string addresses

_Notice that the keys of the `nodes` object are `ip:port`_

This exercise just asks you to turn a dictionary into a list of its keys. There's a built-in `dict` method to do this. Look it up.

In [11]:
def nodes_to_address_strings(nodes):
    raise NotImplementedError()    

In [12]:
def nodes_to_address_strings(nodes):
    return list(nodes.keys())

In [13]:
mock_nodes = {
    "192.168.0.1:8333": {}, # ipv4
    "FE80:CD00:0:CDE:1257:0:211E:729C:8333": {}, # ipv6
}

def test_nodes_to_address_strings():
    address_strings = nodes_to_address_strings(mock_nodes)
    solution_set = {"192.168.0.1:8333", "FE80:CD00:0:CDE:1257:0:211E:729C:8333"}
    assert set(address_strings) == solution_set

ipytest.run_tests(doctest=True)
ipytest.clean_tests("test_nodes_to_address_strings*")

##### Exercise #4: Turn the `nodes` object into a list of `(ip, port)` tuples where ip is a string and port is an integer

If you recall, [`socket.connect`](https://docs.python.org/3/library/socket.html#socket.socket.connect) takes such a tuple as its argument. This is why I want you to do this. Once we have a list of every such tuple we can iterate across it and connect to every node.

note: this is a challenging exercise
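Part of what makes it challenging is that IPv6 addresses contain colons themselves. The sketch below shows one possible way to deal with that, using the same address format as the test data in this lesson; there may well be other approaches.

```python
ipv4 = "192.168.0.1:8333"
ipv6 = "FE80:CD00:0:CDE:1257:0:211E:729C:8333"

# A plain split works for IPv4 ...
print(ipv4.split(":"))

# ... but chops an IPv6 address into many pieces.
print(len(ipv6.split(":")))

# Splitting once, starting from the right, separates only the port.
host, port = ipv6.rsplit(":", 1)
print(host, int(port))
```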

FIXME: explain this as the gameplan / objective at the beginning.

In [14]:
def nodes_to_address_tuples(nodes):
    raise NotImplementedError()

In [15]:
def nodes_to_address_tuples(nodes):
    address_strings = nodes.keys()
    address_tuples = []
    for address_string in address_strings:
        ip, port = address_string.rsplit(":", 1)
        address_tuple = (ip, int(port))
        address_tuples.append(address_tuple)
    return address_tuples

In [16]:
mock_nodes = {
    "192.168.0.1:8333": {}, # ipv4
    "FE80:CD00:0:CDE:1257:0:211E:729C:8333": {}, # ipv6
}
solution_set = {
    ("192.168.0.1", 8333), 
    ("FE80:CD00:0:CDE:1257:0:211E:729C", 8333),
}

def test_nodes_to_address_tuples():
    address_tuples = nodes_to_address_tuples(mock_nodes)
    assert set(address_tuples) == solution_set

ipytest.run_tests(doctest=True)
ipytest.clean_tests("test_nodes_to_address_tuples*")

# Calling All Nodes!

Now we have a list of address tuples -- just like the `socket.connect` API uses. Let's iterate over them and download version messages from every node.

In [22]:
import socket
from ibd.two.complete import Packet, VersionMessage # get the final version ...

OUR_VERSION = b'\xf9\xbe\xb4\xd9version\x00\x00\x00\x00\x00j\x00\x00\x00\x9b"\x8b\x9e\x7f\x11\x01\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x93AU[\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00rV\xc5C\x9b:\xea\x89\x14/some-cool-software/\x01\x00\x00\x00\x01'

def get_version_message(address_tuple):
    sock = socket.socket()
    sock.settimeout(1) # only wait 1 second for connections / responses
    
    sock.connect(address_tuple)

    # initiate the "version handshake"
    sock.send(OUR_VERSION)

    # receive their "version" response
    packet = Packet.from_socket(sock)

    version_message = VersionMessage.from_bytes(packet.payload)
    return version_message
    
def get_version_messages(address_tuples):
    version_messages = []
    exceptions = []
    total = len(address_tuples)
    for address_tuple in address_tuples:
        try:
            version_message = get_version_message(address_tuple)
        except Exception as e:
            exceptions.append(e)
        else:
            version_messages.append(version_message)

        # report progress after every attempt, successful or not
        successes = len(version_messages)
        failures = len(exceptions)
        remaining = total - (successes + failures)
        progress = (successes + failures) / total
        print(f"{successes} Received | {failures} Failures | {remaining} Remaining | {progress:.3%} Complete")
    return version_messages
        

In [23]:
nodes = get_nodes()
address_tuples = nodes_to_address_tuples(nodes)
get_version_messages(address_tuples)

KeyboardInterrupt: 

After about 10 seconds of waiting for this cell to finish executing, I hope you start to wonder whether our code might be running too slowly. What's going on? Are we progressing? Are we stuck?

It's time to add a little logging to better understand what's happening.
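The most useful number to log is an estimate of how long is left. A minimal sketch of the arithmetic, using a hypothetical helper `estimate_seconds_remaining` and made-up numbers: divide the time spent so far by the number of nodes handled to get an average cost per node, then multiply by the number of nodes still to go.

```python
def estimate_seconds_remaining(elapsed, completed, total):
    """Extrapolate the remaining time from the average time per completed item."""
    if completed == 0:
        return None  # nothing finished yet, so there is nothing to extrapolate from
    seconds_per_item = elapsed / completed
    return seconds_per_item * (total - completed)

# 50 of 10,000 nodes handled in 25 seconds -> 0.5 seconds per node on average
print(estimate_seconds_remaining(elapsed=25, completed=50, total=10_000))
```

The estimate assumes the nodes still to come are about as slow as the ones already handled, so treat it as a rough guide.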

In [24]:
import time

def get_version_messages_logger(address_tuples, version_messages, exceptions, start_time):
    successes = len(version_messages)
    total = len(address_tuples)
    failures = len(exceptions)
    now = time.time()
    elapsed = now - start_time
    
    remaining = total - (successes + failures)
    progress = (successes + failures) / total
    # elapsed / progress estimates the total run time; subtract what has already passed
    seconds_remaining = elapsed / progress - elapsed
    minutes_remaining = seconds_remaining / 60
    
    print(f"{successes} Received | {failures} Failures | {remaining} Remaining | {progress:.3%} Complete | ~{minutes_remaining:.0f} Minutes Left")

def get_version_messages(address_tuples, logger=None):
    version_messages = []
    exceptions = []
    start_time = time.time()
    for address_tuple in address_tuples:
        try:
            version_message = get_version_message(address_tuple)
        except Exception as e:
            exceptions.append(e)
        else:
            version_messages.append(version_message)
        # log after every attempt so failures show up in the progress too
        if logger:
            logger(address_tuples, version_messages, exceptions, start_time)
    return version_messages

In [None]:
nodes = get_nodes()
address_tuples = nodes_to_address_tuples(nodes)
get_version_messages(address_tuples, logger=get_version_messages_logger)

# Too Slow

Do you feel like waiting around for an hour for all these version messages to download? I don't ...

There must be a better way, right?

Well, there is. It's called "concurrency". And it ain't easy!

![image](./images/this-tall.jpg)

Let's write some multi-threaded code!

Please read [this tutorial](https://code.tutsplus.com/articles/introduction-to-parallel-and-concurrent-programming-in-python--cms-28612), and stop at the "Gevent" section.
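To get a feel for the payoff before wiring it into our downloader, here is a small self-contained sketch. It assumes `concurrent.futures.ThreadPoolExecutor` as the threading tool and uses `time.sleep` as a stand-in for waiting on a remote peer; `fake_request` is a made-up function, not part of this lesson's code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(node_id):
    """Stand-in for a network call: the thread just waits, as it would on a socket."""
    time.sleep(0.1)
    return node_id

node_ids = list(range(20))

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=20) as pool:
    # map() hands each node to a worker thread and returns results in input order
    results = list(pool.map(fake_request, node_ids))
elapsed = time.perf_counter() - start

print(results == node_ids)
print(f"{elapsed:.2f}s elapsed; one at a time would take about {0.1 * len(node_ids):.1f}s")
```

All twenty "requests" wait at the same time, so the total is close to the duration of a single wait instead of the sum of all of them.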

# Profiling -- Exactly Where is our Code Slow?

You'll notice that we're basically doing the same thing as the tutorial: sending internet requests to many different computers and waiting for their responses. In practice, almost all of our time is spent waiting for a response.

In order to convince you this is the case, let's profile our code.

We're going to use a tool called [line_profiler](https://github.com/rkern/line_profiler/). [Here is a nice tutorial](https://jakevdp.github.io/PythonDataScienceHandbook/01.07-timing-and-profiling.html) that describes a few methods of profiling Python code, including line_profiler. Please read it.

First, we load line_profiler as a Jupyter extension. Next, we run our `get_version_message` function through it:

In [64]:
%load_ext line_profiler

The line_profiler extension is already loaded. To reload it, use:
  %reload_ext line_profiler


In [66]:
%lprun -f get_version_message get_version_message(address_tuples[1])

You should see something like this at the bottom of your Jupyter window:

![image](images/profiler.png)

If you look in the "% Time" column, you will see that the `sock.connect` and `sock.recv` (within `Packet.from_socket`) calls are each taking up about 50% of the time. It's not because these functions are "slow" or "unoptimized" -- no, it's because they're waiting for a response from our peer. Meanwhile, our beautiful Python program is stuck and can't do any other work while we're waiting.
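If you don't have line_profiler handy, the same point can be illustrated with a stopwatch. This is only a rough sketch: `timed` is a hypothetical helper, `time.sleep` stands in for waiting on a peer, and a small sum stands in for parsing the response.

```python
import time

def timed(label, func, timings):
    """Run func() and record how long it took under the given label."""
    start = time.perf_counter()
    result = func()
    timings[label] = time.perf_counter() - start
    return result

timings = {}
timed("wait for peer (simulated)", lambda: time.sleep(0.2), timings)
timed("parse response (simulated)", lambda: sum(range(10_000)), timings)

total = sum(timings.values())
for label, seconds in timings.items():
    print(f"{label}: {seconds / total:.0%} of the time")
```

The waiting dwarfs the actual work, which is the same picture the profiler paints for our real sockets.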

Concurrency allows our computer to get unstuck. We'll give our computer a whole boatload of work and allow it to switch between tasks as it sees fit.

Downloader TODO
* make a test file with my version of the return addresses. this would be good for testing.
* grab the ip addresses. convert them to tuples like the socket API likes.
* synchronous `get_version(addr)` method
* synchronous `get_versions_synchronous(addrs)`
* progress printing function for ^^
* extract start and stop times for ^^ and print using matplotlib
    * wouldn't it be dope if more than 1 task ran at a time?
    * maybe also pull out some other timestamps demonstrating that all time is spent waiting for the remote peer.
* `get_versions_threaded(addrs)`
    * first just prints
    * second uses queue to communicate with main thread
* `get_versions_multiprocess(addrs)`
* Print out the cool display showing how threads / processes are fast and how they're actually doing work in different threads / processes
* Graph how they're doing work concurrently
* (optional) give an example of a cpu-intensive task where multiprocessing excels (fib in a nod to David Beazley?)
* async / await version with `curio`
    * let's avoid having to use TaskGroup initially ... just obscures what's going on

Data Science TODO
* 