# Introduction
---

**Author:** Ties de Kok ([Personal Website](http://www.tiesdekok.com))  
**Last updated:** 18 June 2018  
**Python version:** Python 3.6  
**License:** MIT License   

This Jupyter Notebook contains the Python code equivalent of the R code provided in the paper *"Introduction to Blockchain With R"* written by **Theophanis Stratopoulos** and **Jesus Calderon**. 

This paper is available on SSRN here:  

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3189518

### New to Python? Check out my `LearnPythonforResearch` repository!

<a href='https://github.com/TiesdeKok/LearnPythonforResearch'><img style="position: relative; top: 0px; left: 60px;" src="http://www.tiesdekok.com/EAA_2018_NLP/images/learnpython.PNG" width="50%"></a>

### Imports

In [1]:
import re, copy
import pandas as pd
import numpy as np
import hashlib

To make some of the prints easier to read:

In [2]:
import pprint
pp = pprint.PrettyPrinter(indent=0)

# Traditional ledger
---

### A time stamped transaction

In [3]:
txn_example = pd.DataFrame([{'timestamp' : pd.Timestamp.now(),
                            'to_account' : 'Alice',
                            'from_account' : 'Bob',
                            'amount' : 900}])

In [4]:
txn_example

| | amount | from_account | timestamp | to_account |
|---|---|---|---|---|
| 0 | 900 | Bob | 2018-06-18 18:03:51.979629 | Alice |


### A function to generate transactions on the ledger

In [5]:
def transaction(from_account, to_account, amount):
    ## `str.isnumeric()` is a string method and returns False for strings such as '1.5',
    ## so check the type of `amount` directly instead
    if isinstance(amount, (int, float)):
        new_txn = pd.DataFrame([{'timestamp' : pd.Timestamp.now(),
                            'from_account' : from_account,
                            'to_account' : to_account,
                            'amount' : amount}])
        return new_txn
    else:
        print('Amount must be numeric')
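
One caveat with the type check above: `bool` is a subclass of `int` in Python, so `True` would pass as an amount, and an invalid amount only prints a message while the function silently returns `None`. Below is a minimal sketch of a stricter check. The helper name `validate_amount` is an assumption made for illustration and is not part of the paper or of the rest of this notebook.

```python
import numbers

def validate_amount(amount):
    """Hypothetical helper: return the amount if it is a real number, else raise."""
    # bool is a subclass of int, so it has to be excluded explicitly.
    if isinstance(amount, bool) or not isinstance(amount, numbers.Real):
        raise TypeError('Amount must be numeric')
    return amount

print(validate_amount(900))     # an int passes
print(validate_amount(12.5))    # a float passes

try:
    validate_amount('900')      # a string is rejected
except TypeError as err:
    print(err)
```

Raising an exception instead of printing makes a bad amount stop the notebook right away, rather than surfacing later as a `None` ledger entry.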

In [6]:
ldgr_txn_genesis = transaction(from_account='Genesis Endowment', 
                               to_account = 'Bob',
                               amount = 2000
                              )

In [7]:
ledger = ldgr_txn_genesis
ledger

| | amount | from_account | timestamp | to_account |
|---|---|---|---|---|
| 0 | 2000 | Genesis Endowment | 2018-06-18 18:03:52.040468 | Bob |


### Account balances on ledger

In [8]:
def get_balance_ldgr(ledger, account):
    deposits = ledger[ledger.to_account == account]['amount']
    total_deposits = deposits.sum()
    
    withdrawals = ledger[ledger.from_account == account]['amount']
    total_withdrawals = withdrawals.sum()
    
    balance = total_deposits - total_withdrawals
    
    return balance

In [9]:
get_balance_ldgr(ldgr_txn_genesis, 'Bob')

2000

### Create a block on the ledger

In [10]:
ldgr_txn_bob2alice = transaction(from_account = 'Bob',
                                  to_account = 'Alice',
                                  amount = 900
                                 )
ldgr_txn_bob2alice

| | amount | from_account | timestamp | to_account |
|---|---|---|---|---|
| 0 | 900 | Bob | 2018-06-18 18:03:52.113272 | Alice |


In [11]:
## `DataFrame.append` was removed in pandas 2.0, `pd.concat` works in old and new versions
ledger = pd.concat([ledger, ldgr_txn_bob2alice])
ledger

| | amount | from_account | timestamp | to_account |
|---|---|---|---|---|
| 0 | 2000 | Genesis Endowment | 2018-06-18 18:03:52.040468 | Bob |
| 0 | 900 | Bob | 2018-06-18 18:03:52.113272 | Alice |


In [12]:
for account in ['Alice', 'Bob', 'Christie']:
    print('Balance of {}: {}'.format(account, 
                                     get_balance_ldgr(ledger, account)))

Balance of Alice: 900
Balance of Bob: 1100
Balance of Christie: 0


#### Add another transaction

In [13]:
ldgr_txn_alice2christie = transaction(from_account = 'Alice',
                                       to_account = 'Christie',
                                       amount = 300
                                      )

In [14]:
## `DataFrame.append` was removed in pandas 2.0, `pd.concat` works in old and new versions
ledger = pd.concat([ledger, ldgr_txn_alice2christie])
ledger

| | amount | from_account | timestamp | to_account |
|---|---|---|---|---|
| 0 | 2000 | Genesis Endowment | 2018-06-18 18:03:52.040468 | Bob |
| 0 | 900 | Bob | 2018-06-18 18:03:52.113272 | Alice |
| 0 | 300 | Alice | 2018-06-18 18:03:52.180092 | Christie |


In [15]:
for account in ['Alice', 'Bob', 'Christie']:
    print('Balance of {}: {}'.format(account, 
                                     get_balance_ldgr(ledger, account)))

Balance of Alice: 600
Balance of Bob: 1100
Balance of Christie: 300
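
As a cross-check on the "deposits minus withdrawals" logic, the sketch below computes the balance of every account in one pass with `groupby`. The helper `all_balances` and the `toy_ledger` frame are assumptions made for illustration; the toy values mirror the three transactions used in this section, without the timestamps.

```python
import pandas as pd

# Toy ledger with the same three transactions as above (timestamps omitted).
toy_ledger = pd.DataFrame([
    {'from_account': 'Genesis Endowment', 'to_account': 'Bob', 'amount': 2000},
    {'from_account': 'Bob', 'to_account': 'Alice', 'amount': 900},
    {'from_account': 'Alice', 'to_account': 'Christie', 'amount': 300},
])

def all_balances(ledger):
    """Hypothetical helper: balance per account as deposits minus withdrawals."""
    deposits = ledger.groupby('to_account')['amount'].sum()
    withdrawals = ledger.groupby('from_account')['amount'].sum()
    # Accounts that appear on only one side get 0 for the missing side.
    return deposits.sub(withdrawals, fill_value=0)

balances = all_balances(toy_ledger)
print(balances)
```

The result also contains the `Genesis Endowment` account with a negative balance, since it only ever pays out.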


## Blockchain ledger
---

A clarification:  

1. Each block in this example contains only one transaction. See footnote 3 in the paper for an explanation.  


### Modified transaction function

**Note:** to make the hashing more predictable, the modified `transaction_bc` function returns the data as a dictionary (`dict`) instead of a Pandas `DataFrame`.

In [16]:
def transaction_bc(from_account, to_account, amount):
    ## `str.isnumeric()` is a string method and returns False for strings such as '1.5',
    ## so check the type of `amount` directly instead
    if isinstance(amount, (int, float)):
        transaction_dict = {'timestamp' : pd.Timestamp.now(),
                            'from_account' : from_account,
                            'to_account' : to_account,
                            'amount' : amount}
        return transaction_dict
    else:
        print('Amount must be numeric')
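
Hashing an object in this notebook relies on first turning it into a string, which is why a plain dictionary is convenient. The sketch below illustrates one thing to keep in mind: the string form of a dictionary depends on the order in which keys were inserted. Using `json.dumps(..., sort_keys=True)` is shown as one possible way to get an order-independent string; it is an assumption for illustration and not what the paper or this notebook does. The fixed timestamp string is also just a placeholder.

```python
import json

txn_a = {'timestamp': '2018-06-18 18:03:52', 'from_account': 'Bob',
         'to_account': 'Alice', 'amount': 900}
# Same content, but the keys were inserted in a different order.
txn_b = {'amount': 900, 'to_account': 'Alice',
         'from_account': 'Bob', 'timestamp': '2018-06-18 18:03:52'}

print(txn_a == txn_b)              # the dictionaries compare equal
print(str(txn_a) == str(txn_b))    # but their string forms differ

# Sorting the keys gives the same string for both dictionaries.
canonical_a = json.dumps(txn_a, sort_keys=True)
canonical_b = json.dumps(txn_b, sort_keys=True)
print(canonical_a == canonical_b)
```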

### Create blockchain with genesis block

In [17]:
genesis_transaction = transaction_bc(from_account='Genesis Endowment', 
                                              to_account = 'Bob',
                                              amount = 2000
                                     )

genesis_block = {'transactions' : genesis_transaction,
                 'prev_hash' : None,
                 'proof_of_work' : 0
                }

In [18]:
blockchain = [genesis_block]

In [19]:
pp.pprint(blockchain)

[{'prev_hash': None,
'proof_of_work': 0,
'transactions': {'amount': 2000,
                'from_account': 'Genesis Endowment',
                'timestamp': Timestamp('2018-06-18 18:03:52.259882'),
                'to_account': 'Bob'}}]


### Hash function

A convenience function, since `hashlib` requires the string to be encoded to bytes first.

In [20]:
def hash_sha256(obj):
    return hashlib.sha256(str(obj).encode('utf-8')).hexdigest()
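
To see why the `.encode('utf-8')` step is needed: `hashlib.sha256` operates on bytes, and passing a `str` directly raises a `TypeError`. A minimal sketch:

```python
import hashlib

try:
    hashlib.sha256('banana')    # a str is not accepted
except TypeError as err:
    print('TypeError:', err)

# Encoding the string to bytes first works.
digest = hashlib.sha256('banana'.encode('utf-8')).hexdigest()
print(len(digest))              # a SHA-256 hex digest has 64 characters
```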

**Note:** `hashlib` in Python yields the same value as `digest` in R only if serialization is turned off in R:
```
> digest('banana', 'sha256')
[1] '351efb5a0bc748f203667a09aa40278b165f5a50742af3dbc1b7077c6293010b'
> digest('banana', 'sha256', serialize=FALSE)
[1] 'b493d48364afe44d11c0165cf470a4164d1e2609911ef998be868d46ade3de4e'
```

In [21]:
hash_sha256('banana')

'b493d48364afe44d11c0165cf470a4164d1e2609911ef998be868d46ade3de4e'

In [22]:
hash_sha256('bananas')

'e4ba5cbd251c98e6cd1c23f126a3b81d8d8328abc95387229850952b3ef9f904'

In [23]:
hash_sha256('Banana')

'f9782dd7999dc14b39c1329735e6e4ef72e77a3cf5fa32f2f57bf8d5493f0fc5'

### Hash the genesis block

In [24]:
genesis_hash = hash_sha256(genesis_block)
genesis_hash

'b7b1bae7d16d202880edecdcf51b7c23ede397a43ba8a755d1e29b9628f7b541'

**Practice Problem:**  
Will the `genesis_hash` remain the same if you re-run the
above sequence of commands (genesis block, blockchain, and hashing)? Why?

### Validation function

In [25]:
def is_valid_proof(last_proof, this_proof):
    guess = str(last_proof) + str(this_proof)
    guess_hash = hash_sha256(guess)
    is_valid = bool(re.search('0{3}$',guess_hash))
    return is_valid

**Note:** the variable `test` is renamed to `is_valid` to make it more intuitive.

**Clarification:** validation works by defining a pattern that marks a hash as valid. The pattern itself is arbitrary, but it matters because it determines how difficult it is to find a "valid" combination. In this example a hash is "valid" if it ends with three zeros.
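
To make the link between the pattern and the difficulty concrete: each character of a hexadecimal hash takes one of 16 values, so if we treat the hash as uniformly random, it ends with `n` zeros with probability `16 ** -n`. The sketch below prints the expected number of attempts for a few values of `n` and checks the one-zero case empirically. The helper `ends_with_zeros` is an assumption made for illustration; it uses `str.endswith` instead of the regular expression used above.

```python
import hashlib

def ends_with_zeros(hex_hash, n_zeros):
    """Hypothetical helper: True if the hex string ends with n_zeros zeros."""
    return hex_hash.endswith('0' * n_zeros)

# Expected number of attempts if hashes behave like uniform random hex strings.
for n_zeros in (1, 2, 3, 4):
    print(n_zeros, 16 ** n_zeros)

# Empirical check for a single trailing zero.
n_trials = 20000
hits = sum(
    ends_with_zeros(hashlib.sha256(str(i).encode('utf-8')).hexdigest(), 1)
    for i in range(n_trials)
)
print(hits / n_trials)   # should be close to 1/16 = 0.0625
```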

### Proof of work

In [26]:
def proof_of_work(last_proof):
    candidate_proof = 0
    while not is_valid_proof(last_proof, candidate_proof):
        candidate_proof += 1
    return candidate_proof

**Clarification:** the `proof_of_work` function tries successive candidate numbers (here simply counting up from 0) until it finds one that, combined with the previous proof, produces a "valid" hash. Because the hash is unpredictable, finding that number is essentially a matter of chance, and with a harder pattern it can take a long time!
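
The same brute-force search can be written with the number of required zeros as a parameter. This is only a sketch: `proof_of_work_n` and its `n_zeros` argument are assumptions made for illustration, and `hash_sha256` is repeated here so that the cell runs on its own.

```python
import hashlib

def hash_sha256(obj):
    return hashlib.sha256(str(obj).encode('utf-8')).hexdigest()

def proof_of_work_n(last_proof, n_zeros):
    """Hypothetical variant: search for a proof whose hash ends with n_zeros zeros."""
    target = '0' * n_zeros
    candidate = 0
    # Count upwards until the combined string hashes to the required ending.
    while not hash_sha256(str(last_proof) + str(candidate)).endswith(target):
        candidate += 1
    return candidate

for n_zeros in (1, 2):
    proof = proof_of_work_n(0, n_zeros)
    # Show the proof and the last characters of the resulting hash.
    print(n_zeros, proof, hash_sha256('0' + str(proof))[-6:])
```

Since the search starts at 0 and counts upwards, the returned proof is also the number of failed attempts before the first valid one.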

In [27]:
%%time
second_proof = proof_of_work(genesis_block['proof_of_work'])
print(second_proof)

5735
Wall time: 26.9 ms


In [42]:
is_valid_proof(0, 5735)

True

In [43]:
hash_sha256(str(0) + str(second_proof))

'3a4d5ba8ce221c92771b35cec354df30de468c7e67a994c2bfc08fa437fa0000'

**Practice Problem:**  
1. Change the number of zeros in the validation function (`is_valid_proof`) from three to four.
2. Run the proof of work on the genesis block with the updated requirements and measure how long it takes (you can use `%%time` as demonstrated above).
3. How does it compare to the one above?

## Account balance on Blockchain

In [30]:
def get_balance(blockchain, account):
    ldgr_txns = []
    for block in blockchain:
        ldgr_txns.append(block['transactions'])
        
    balance = get_balance_ldgr(pd.DataFrame(ldgr_txns), account)
    
    return balance

In [31]:
get_balance(blockchain, 'Bob')

2000
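
`get_balance` collects the transaction dictionaries from all blocks and turns them into a `DataFrame` so that the ledger function can be reused. For comparison, here is a sketch of the same computation in plain Python. The helper `get_balance_plain` and the two-block `toy_chain` are assumptions made for illustration.

```python
# Toy chain with only the fields needed for the balance (no hashes, no proofs).
toy_chain = [
    {'transactions': {'from_account': 'Genesis Endowment',
                      'to_account': 'Bob', 'amount': 2000}},
    {'transactions': {'from_account': 'Bob',
                      'to_account': 'Alice', 'amount': 900}},
]

def get_balance_plain(blockchain, account):
    """Hypothetical helper: deposits minus withdrawals, without pandas."""
    balance = 0
    for block in blockchain:
        txn = block['transactions']
        if txn['to_account'] == account:
            balance += txn['amount']
        if txn['from_account'] == account:
            balance -= txn['amount']
    return balance

print(get_balance_plain(toy_chain, 'Bob'))     # 2000 received, 900 sent
print(get_balance_plain(toy_chain, 'Alice'))
```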

### Create and add new block to blockchain

#### Create new block

In [32]:
def create_block(txns, prev_hash = None, proof_of_work = None):
    new_block = {'transactions' : txns, 
                 'prev_hash' : prev_hash,
                 'proof_of_work' : proof_of_work}
    return new_block

In [33]:
txn_bob = transaction_bc(from_account = 'Bob',
                         to_account = 'Alice',
                         amount = 900
                        )  

In [34]:
second_block = create_block(txns = txn_bob, 
                            prev_hash = genesis_hash,
                            proof_of_work = second_proof)

#### Add block function

In [35]:
def add_block_if_valid(blockchain, new_block):
    new_blockchain = copy.deepcopy(blockchain)
    
    check_proof = is_valid_proof(blockchain[-1]['proof_of_work'], 
                                 new_block['proof_of_work'])
    
    last_hash = hash_sha256(blockchain[-1])
    check_hash = last_hash == new_block['prev_hash']
        
    acct_balance = get_balance(blockchain, new_block['transactions']['from_account'])
    check_balance = acct_balance > new_block['transactions']['amount']
    
    if check_proof and check_hash and check_balance:
        new_blockchain.append(new_block)
    else:
        print('This block is not valid.')
        
    return new_blockchain

**Note:** without the first line, `new_blockchain = copy.deepcopy(blockchain)`, appending inside the function would modify the caller's `blockchain` list in place. We want to avoid that and instead explicitly overwrite the `blockchain` variable in the next cell. It is usually bad practice to modify a variable directly from within a function.
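
A small sketch of the difference, using two hypothetical helper functions (`append_in_place` and `append_to_copy`) and a toy one-block list:

```python
import copy

def append_in_place(chain, block):
    chain.append(block)                  # mutates the caller's list
    return chain

def append_to_copy(chain, block):
    new_chain = copy.deepcopy(chain)     # work on an independent copy
    new_chain.append(block)
    return new_chain

original = [{'proof_of_work': 0}]

result = append_to_copy(original, {'proof_of_work': 5735})
print(len(original), len(result))         # 1 2: the original is untouched

result = append_in_place(original, {'proof_of_work': 5735})
print(len(original), result is original)  # 2 True: the original was modified
```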

In [36]:
blockchain = add_block_if_valid(blockchain, second_block)
pp.pprint(blockchain)

[{'prev_hash': None,
'proof_of_work': 0,
'transactions': {'amount': 2000,
                'from_account': 'Genesis Endowment',
                'timestamp': Timestamp('2018-06-18 18:03:52.259882'),
                'to_account': 'Bob'}},
{'prev_hash': 'b7b1bae7d16d202880edecdcf51b7c23ede397a43ba8a755d1e29b9628f7b541',
'proof_of_work': 5735,
'transactions': {'amount': 900,
                'from_account': 'Bob',
                'timestamp': Timestamp('2018-06-18 18:03:52.596980'),
                'to_account': 'Alice'}}]
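
Each block stores the hash of its predecessor in `prev_hash`, which is what makes tampering with earlier blocks detectable. The sketch below re-hashes every block and compares the result with the `prev_hash` of the next block. The helper `links_are_valid` and the toy blocks (with fixed string timestamps) are assumptions made for illustration, and `hash_sha256` is repeated so that the cell runs on its own.

```python
import copy
import hashlib

def hash_sha256(obj):
    return hashlib.sha256(str(obj).encode('utf-8')).hexdigest()

def links_are_valid(blockchain):
    """Hypothetical helper: check that every prev_hash matches the previous block."""
    for prev_block, block in zip(blockchain, blockchain[1:]):
        if hash_sha256(prev_block) != block['prev_hash']:
            return False
    return True

block_1 = {'transactions': {'timestamp': '2018-06-18 18:03:52',
                            'from_account': 'Genesis Endowment',
                            'to_account': 'Bob', 'amount': 2000},
           'prev_hash': None, 'proof_of_work': 0}
block_2 = {'transactions': {'timestamp': '2018-06-18 18:03:53',
                            'from_account': 'Bob',
                            'to_account': 'Alice', 'amount': 900},
           'prev_hash': hash_sha256(block_1), 'proof_of_work': 5735}
toy_chain = [block_1, block_2]
print(links_are_valid(toy_chain))     # True

# Changing an amount in the first block changes its hash and breaks the link.
tampered = copy.deepcopy(toy_chain)
tampered[0]['transactions']['amount'] = 5000
print(links_are_valid(tampered))      # False
```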


## Example: add a third block to the blockchain

In [37]:
txn_alice = transaction_bc(from_account = 'Alice',
                        to_account = 'Christie',
                        amount = 300)

In [38]:
third_block = create_block(txns = txn_alice,
                           prev_hash = hash_sha256(blockchain[-1]),
                           proof_of_work = proof_of_work(blockchain[-1]['proof_of_work'])
                          )

In [39]:
third_block

{'prev_hash': '22b463b32bc7143c482384d7745bcf76a0f84777e9b760d3999ffa834cfeae50',
 'proof_of_work': 626,
 'transactions': {'amount': 300,
  'from_account': 'Alice',
  'timestamp': Timestamp('2018-06-18 18:03:52.672780'),
  'to_account': 'Christie'}}

In [40]:
blockchain = add_block_if_valid(blockchain, third_block)
pp.pprint(blockchain)

[{'prev_hash': None,
'proof_of_work': 0,
'transactions': {'amount': 2000,
                'from_account': 'Genesis Endowment',
                'timestamp': Timestamp('2018-06-18 18:03:52.259882'),
                'to_account': 'Bob'}},
{'prev_hash': 'b7b1bae7d16d202880edecdcf51b7c23ede397a43ba8a755d1e29b9628f7b541',
'proof_of_work': 5735,
'transactions': {'amount': 900,
                'from_account': 'Bob',
                'timestamp': Timestamp('2018-06-18 18:03:52.596980'),
                'to_account': 'Alice'}},
{'prev_hash': '22b463b32bc7143c482384d7745bcf76a0f84777e9b760d3999ffa834cfeae50',
'proof_of_work': 626,
'transactions': {'amount': 300,
                'from_account': 'Alice',
                'timestamp': Timestamp('2018-06-18 18:03:52.672780'),
                'to_account': 'Christie'}}]


#### Check balance:

In [41]:
for account in ['Alice', 'Bob', 'Christie']:
    print('Balance of {}: {}'.format(account, 
                                     get_balance(blockchain, account)))

Balance of Alice: 600
Balance of Bob: 1100
Balance of Christie: 300


## Final practice problems

1) How do we know if the above proof of work (626) is correct?

2) Bob wants to transfer 500 to David. Is this a valid transaction/block? If yes, show the updated blockchain. 

3) Alice wants to transfer 700 to Eric. Is this a valid transaction/block? If yes, show the updated blockchain.