# MIT BlockSci Workshop

Hello and welcome to our workshop on exploring the Bitcoin blockchain with BlockSci!  
This notebook contains a number of assignments that you can work on during the next few hours.



## A few helpful tips

- You can split your solution into several code cells, which is useful when some tasks take much longer to run than others. You can add another cell by clicking the `+` symbol in the toolbar at the top.
- Cells can contain code or text. To write text (such as this one), change the cell type using the dropdown menu from "Code" to "Markdown".
- If you are not certain which methods or attributes an object `obj` provides, you can inspect it using the function `dir(obj)`.
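
As a small illustration of the last tip, the sketch below filters the output of `dir` down to the public names of an object. The helper name `public_names` is made up for this example, and a built-in `complex` number is used only so that the cell runs without any blockchain data.

```python
def public_names(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]

# Works on any object; a complex number is just a convenient stand-in here.
print(public_names(complex(1, 2)))
```

The same call can be applied to any object you encounter in the assignments, for example a block or a transaction.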

## 0) Setup

You need to run the following two cells (e.g., by clicking "Run" in the toolbar on top or by moving the cursor into the cell and pressing `shift` + `enter` on your keyboard) in order to load and configure the required packages.

In [None]:
import blocksci
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [None]:
# Plots are non-interactive - change this only if you are familiar with the behavior of interactive plots
%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 8)

Now, we are going to initialize the Bitcoin blockchain.

In [None]:
chain = blocksci.Blockchain("/blocksci/bitcoin")

## 1) Block intervals

Bitcoin’s targeted block interval is 10 minutes. However, as the amount of computational effort that miners invest changes over time, the actual block interval can deviate from this target.

1. Compute the average time between two blocks
    - over the entire blockchain
    - for each coinbase reward epoch (i.e., each period during which the coinbase reward stays constant)
2. Plot the average block interval over time (aggregated on a monthly basis) and interpret your findings

*Note:* Use `block.timestamp` instead of `block.time` to avoid having to deal with Python's datetime objects. (If you use `block.time`, note that subtracting one datetime object from another returns a `timedelta` object. To get the length of this time span in seconds, call its `.total_seconds()` method.)
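
The difference between the two approaches can be sketched with a few made-up Unix timestamps (the values below are hypothetical and not taken from the blockchain). Both variants compute the average interval as the total time span divided by the number of intervals.

```python
from datetime import datetime, timezone

# Hypothetical Unix timestamps (in seconds) of four consecutive blocks
timestamps = [1_600_000_000, 1_600_000_540, 1_600_001_260, 1_600_001_800]

# With integer timestamps, the average interval is plain arithmetic:
# (last - first) / number of intervals
avg_interval = (timestamps[-1] - timestamps[0]) / (len(timestamps) - 1)

# With datetime objects, the difference is a timedelta,
# which has to be converted back to seconds explicitly
first = datetime.fromtimestamp(timestamps[0], tz=timezone.utc)
last = datetime.fromtimestamp(timestamps[-1], tz=timezone.utc)
avg_interval_dt = (last - first).total_seconds() / (len(timestamps) - 1)

print(avg_interval, avg_interval_dt)
```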

In [None]:
# Start your solution here

## 2) Mining Reward

As a reward for solving the proof-of-work puzzle, miners receive a fixed block reward plus all transaction fees included in a block.
As the block reward is halved every 210,000 blocks, transaction fees gradually become more and more important for miners.
Assess how important transaction fees are for a miner's income today by plotting the block reward (in BTC) against the total amount of fees included in a block, aggregated as

1. the maximum amount per day
2. the average over a month

Interpret your results.
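
For reference, the fixed part of the reward can be computed from the block height alone. The sketch below assumes the well-known initial reward of 50 BTC (a value not stated in this notebook) and works in satoshi to avoid floating-point rounding. The function names `block_subsidy` and `fee_share` are made up for this example.

```python
HALVING_INTERVAL = 210_000                   # blocks between two halvings
INITIAL_SUBSIDY_SATOSHI = 50 * 100_000_000   # assumption: 50 BTC at height 0

def block_subsidy(height):
    """Fixed block reward in satoshi at the given block height."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        return 0
    return INITIAL_SUBSIDY_SATOSHI >> halvings

def fee_share(height, total_fees_satoshi):
    """Fraction of the miner's income in a block that comes from fees."""
    income = block_subsidy(height) + total_fees_satoshi
    if income == 0:
        return 0.0
    return total_fees_satoshi / income

print(block_subsidy(0), block_subsidy(210_000), fee_share(630_000, 625_000_000))
```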

In [None]:
# Start your solution here

## 3) UTXO Growth

An important metric (e.g., in the block size debate) is the size of the UTXO pool (the data structure that keeps track of all unspent transaction outputs).
Compute and visualize the relative change in the size of the UTXO pool as well as its absolute size over time.
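
One possible way to think about this task: every block adds its newly created outputs to the pool and removes the outputs it spends, so the absolute size is a running sum of the net change per block. The sketch below uses hypothetical per-block counts instead of real chain data. Whether provably unspendable outputs should be counted is a decision it leaves open.

```python
from itertools import accumulate

# Hypothetical per-block counts: (outputs created, outputs spent)
blocks = [(1, 0), (3, 1), (5, 2), (2, 4)]

# Net change of the UTXO pool caused by each block
net_change = [created - spent for created, spent in blocks]

# Absolute size of the pool after each block (running sum)
utxo_size = list(accumulate(net_change))

# Relative change compared to the previous block
relative_change = [
    (curr - prev) / prev for prev, curr in zip(utxo_size, utxo_size[1:])
]

print(utxo_size)
print(relative_change)
```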

In [None]:
# Start your solution here

## 4) Transaction Fees

In this task we evaluate the transaction fees paid by users.

1. What is the average transaction fee (per byte) per block? Plot it over time. Can you make sense of the changes you see in the plot?
2. Next, look at outliers that paid extremely high transaction fees. How many blocks contain transactions that pay more than 1 BTC in transaction fees?
3. Plot the transaction fee of these transactions against their timestamp (ideally as a scatterplot using pyplot, i.e. `plt.scatter`).
4. Make another scatterplot using the same data as before, but this time convert the BTC values into USD or EUR. What is the largest fee a user has (erroneously) paid?

Users are free to choose their own transaction fees based on their time preference (e.g., they can pay more to have their transaction confirmed faster).
Most wallets offer fee estimation algorithms that are intended to help the user to select an appropriate fee.
Unfortunately, most of these are not very good at predicting the necessary fee level.
As a result, users sometimes drastically overpay their fees.

1. Plot the spread of transaction fees paid (per byte), starting at block height 200000.
2. How does the plot change once you account for outliers? (Suggestion: Use the numpy function `np.percentile` to calculate percentiles)
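
To illustrate how a percentile cutoff removes outliers, here is a minimal pure-Python sketch. The helper `percentile` is written for this example and uses linear interpolation between ranks, which is also the default behavior of `np.percentile`. The fee values are made up.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

# Hypothetical fees per byte with one extreme outlier
fees_per_byte = [1, 2, 2, 3, 4, 5, 5, 6, 8, 5000]

# Keep only values at or below the 90th percentile
cutoff = percentile(fees_per_byte, 90)
trimmed = [fee for fee in fees_per_byte if fee <= cutoff]

print(cutoff, trimmed)
```

In the actual solution you would typically compute several percentiles per block (or per day) and plot those instead of the raw minimum and maximum.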

In [None]:
# Start your solution here

## 5) SPV Mining

SPV mining refers to mining on top of a new block right after it is announced but before the miner has been able to verify the block itself.
The block that is mined on top therefore cannot include any transactions other than its own coinbase transaction (as all other transactions may have already been included in the block that was just discovered). Find evidence for SPV mining on the blockchain and quantify the frequency with which it occurs.

1. Find blocks that do not include any transactions apart from the coinbase transaction (limit your analysis to blocks found after height 400000)
2. For each of these blocks, compute the time difference to the previous block to identify SPV mining
3. Visualize the number of SPV-mined blocks per month
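
The three steps can be sketched on plain data as follows. The block list is hypothetical, and the 60-second cutoff is an assumption chosen for illustration only; picking a defensible threshold is part of the task. Note that block timestamps are reported by miners, so individual differences can be imprecise.

```python
from collections import Counter
from datetime import datetime, timezone

MAX_INTERVAL_SECONDS = 60  # assumed cutoff, not an established constant

# Hypothetical blocks: (Unix timestamp, number of transactions incl. coinbase)
blocks = [
    (1_500_000_000, 1800),
    (1_500_000_020, 1),   # coinbase only, 20 s after its predecessor
    (1_500_000_900, 1),   # coinbase only, but 880 s after its predecessor
    (1_502_700_000, 2100),
    (1_502_700_030, 1),   # coinbase only, 30 s after its predecessor
]

def spv_candidates(blocks, max_interval=MAX_INTERVAL_SECONDS):
    """Timestamps of coinbase-only blocks found shortly after the previous block."""
    candidates = []
    for (prev_ts, _), (ts, tx_count) in zip(blocks, blocks[1:]):
        if tx_count == 1 and ts - prev_ts <= max_interval:
            candidates.append(ts)
    return candidates

def month_key(ts):
    """Format a Unix timestamp as 'YYYY-MM' in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")

per_month = Counter(month_key(ts) for ts in spv_candidates(blocks))
print(per_month)
```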

In [None]:
# Start your solution here

## 6) CoinJoin Detection

Write a function `is_coinjoin(tx)` to detect CoinJoin transactions. The function takes a transaction and should return `True` if the transaction is likely a CoinJoin transaction, and `False` otherwise. Describe the characteristics you use to detect CoinJoin transactions.
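
As a starting point, one commonly used characteristic is that a CoinJoin combines inputs from several participants and pays each of them an output of the same value. The sketch below checks only this property on plain numbers. The function name `looks_like_coinjoin`, its parameters, and the threshold of three participants are assumptions made for this example. It is not a complete detector and does not operate on BlockSci transaction objects.

```python
from collections import Counter

def looks_like_coinjoin(input_count, output_values, min_participants=3):
    """Heuristic: enough inputs and enough outputs sharing one value."""
    if input_count < min_participants or not output_values:
        return False
    # Size of the largest group of equal-valued outputs
    most_common_count = Counter(output_values).most_common(1)[0][1]
    return most_common_count >= min_participants

# Three equal outputs plus change outputs, funded by three inputs
print(looks_like_coinjoin(3, [100, 100, 100, 7, 12, 30]))
# A single input cannot come from several participants
print(looks_like_coinjoin(1, [100, 100, 100]))
```

A heuristic this simple will produce false positives (for example, batched payouts of identical amounts), so additional conditions are likely needed.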

In [None]:
def is_coinjoin(tx):
    # Start your solution here
    pass  # placeholder so that the cell runs; replace it with your implementation

You can use the following hashes of CoinJoin transactions to test your algorithm (taken from the *Joinmarket* Twitter account https://twitter.com/joinmarket). You may need to test your method with other transactions to prevent *false positives*.

In [None]:
is_coinjoin(blocksci.Tx("b877b4f66a7e8fa847e8a775024be20a2b76d10c1b9300f0b95b3757a1448d4d"))

In [None]:
is_coinjoin(blocksci.Tx("7d588d52d1cece7a18d663c977d6143016b5b326404bbf286bc024d5d54fcecb"))

In [None]:
is_coinjoin(blocksci.Tx("55eac9d4a4159d4ba355122c6c18f85293c19ae358306a3773ec3a5d053e2f1b"))

## 7) Get Creative

Now it's your turn to start exploring the blockchain using BlockSci. While many of the previous examples were longitudinal analyses, analyzing specific use cases can be even more exciting. Here are a couple of suggestions:

- Use your CoinJoin heuristic from part 6 to identify chains of CoinJoins in the blockchain
- Analyze the behavior of different entities in the ecosystem. For example, you could compare the advertised win rate of a specific SatoshiDice address to the win rate actually observed on the blockchain. You can also use a list of addresses from [WalletExplorer.com](https://www.walletexplorer.com/) as a starting point for similar analyses.
- Try to identify cold wallet addresses based on an exchange's known hot wallets.