# Extract Kitten Data from the Blockchain

### Setup

1. Install and sync an Ethereum node such as Geth or Parity. I used Parity because it syncs very quickly. Using Docker:

    ```
    docker pull parity/parity:stable
    ```

    Run the container, persisting the chain data to `~/.local/share/io.parity.ethereum/docker/` on the host and mapping all the relevant ports. I think only port 8545 needs to be mapped, but I have not tried that. This command is straight out of the Parity Docker docs.

    ```
    docker run -ti -p 8180:8180 -p 8545:8545 -p 8546:8546 -p 30303:30303 -p 30303:30303/udp -v ~/.local/share/io.parity.ethereum/docker/:/root/.local/share/io.parity.ethereum/ parity/parity:stable --base-path /root/.local/share/io.parity.ethereum/ --ui-interface all --jsonrpc-interface all
    ```

2. Set up web3.py, a Python API that mimics the JavaScript API for talking to the Ethereum blockchain.

    ```
    pip install web3
    ```

3. Download the contract ABI, a JSON file that describes how to interact with the contract. You can get it from https://etherscan.io/address/0x06012c8cf97bead5deae237070f9587f8e7a266d#code and save it as `kitty_abi.json`.
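
To get a feel for what the ABI file contains, here is a minimal sketch that lists the callable functions in an ABI. `SAMPLE_ABI` is a hand-written, abbreviated excerpt in the standard ABI JSON layout, not the real CryptoKitties ABI, and `function_signatures` is a hypothetical helper name. With the real file you would pass in the list loaded from `kitty_abi.json` instead.

```python
import json

# Hand-written excerpt in the standard ABI JSON layout (NOT the full CryptoKitties ABI);
# getKitty's outputs are abbreviated to two of the ten fields it returns.
SAMPLE_ABI = json.loads("""
[
  {"type": "function", "name": "totalSupply", "constant": true,
   "inputs": [], "outputs": [{"name": "", "type": "uint256"}]},
  {"type": "function", "name": "getKitty", "constant": true,
   "inputs": [{"name": "_id", "type": "uint256"}],
   "outputs": [{"name": "isGestating", "type": "bool"},
               {"name": "genes", "type": "uint256"}]},
  {"type": "event", "name": "Birth", "inputs": []}
]
""")


def function_signatures(abi):
    """Return {name: 'name(input types) -> (output types)'} for function entries only."""
    signatures = {}
    for entry in abi:
        if entry.get("type") != "function":
            continue  # skip events, constructors and fallback entries
        args = ", ".join(arg["type"] for arg in entry.get("inputs", []))
        outs = ", ".join(out["type"] for out in entry.get("outputs", []))
        signatures[entry["name"]] = "%s(%s) -> (%s)" % (entry["name"], args, outs)
    return signatures


for signature in function_signatures(SAMPLE_ABI).values():
    print(signature)
```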

The node might take several hours to sync, and the CryptoKitties contract sits at around block 4.6 million, so the node has to get at least that far before the contract can be queried. It took my computer about 12 hours on a fairly slow connection.
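
While waiting, you can check sync progress without web3.py by talking to the node's JSON-RPC endpoint directly. The sketch below uses the standard `eth_syncing` method and assumes the node is reachable at `http://localhost:8545`. The helper names (`rpc_payload`, `hex_to_int`, `rpc_call`, `sync_summary`) are my own, not part of any library.

```python
import json
import urllib.request

NODE_URL = "http://localhost:8545"  # assumption: same endpoint the notebook connects to


def rpc_payload(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request body as bytes."""
    body = {"jsonrpc": "2.0", "method": method, "params": params or [], "id": request_id}
    return json.dumps(body).encode("utf-8")


def hex_to_int(quantity):
    """Ethereum JSON-RPC encodes integers as 0x-prefixed hex strings."""
    return int(quantity, 16)


def rpc_call(method, params=None, url=NODE_URL):
    """POST one JSON-RPC request to the node and return its 'result' field."""
    request = urllib.request.Request(
        url, data=rpc_payload(method, params), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())["result"]


def sync_summary(syncing_result):
    """Turn an eth_syncing result into a short human-readable string."""
    if syncing_result is False:
        return "node reports it is not syncing"
    current = hex_to_int(syncing_result["currentBlock"])
    highest = hex_to_int(syncing_result["highestBlock"])
    return "block %d of %d" % (current, highest)
```

With a node running, `print(sync_summary(rpc_call("eth_syncing")))` prints the current progress. Nothing in the block touches the network until `rpc_call` is invoked.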


In [42]:
from web3 import Web3, HTTPProvider
import json
import pandas as pd
from tqdm import tqdm

In [43]:
#connect to the local node
web3 = Web3(HTTPProvider('http://localhost:8545'))

In [44]:
# how far have we synced? (newer web3.py releases spell this web3.eth.block_number)
web3.eth.blockNumber

4688085

In [45]:
kitty_addr = '0x06012c8cf97BEaD5deAe237070F9587f8E7A266d'

In [46]:
with open('kitty_abi.json', 'r') as f:
    kitty_abi = json.load(f)

In [47]:
kitty_contract = web3.eth.contract(abi=kitty_abi, address=kitty_addr)

In [48]:
# newer web3.py releases spell this kitty_contract.functions.totalSupply().call()
num_kitties = kitty_contract.call().totalSupply()
print(num_kitties)

119072


### Extract the data

The contract exposes a function called `getKitty(kitten_id)`, which we can simply call once for each kitten id. This is very slow, however, and there is probably a better way. It might take an hour or so.

#### Result from getKitty

The `getKitty` function returns the following data (example):
```
0  isGestating    bool    :  false
1  isReady        bool    :  true
2  cooldownIndex  uint256 :  6
3  nextActionAt   uint256 :  0
4  siringWithId   uint256 :  0
5  birthTime      uint256 :  1512092101
6  matronId       uint256 :  16336
7  sireId         uint256 :  15486
8  generation     uint256 :  12
9  genes          uint256 :  531881267885876605735019567258882137769398970896872218627788409434126733
```
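
Positional indexes like `cat[9]` are easy to mix up, so it can help to give the fields names. This is a small sketch, not part of the original notebook: the `Kitty` namedtuple is my own naming, the values are the example from the listing above, and I am assuming `birthTime` is a Unix timestamp in seconds.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Field names in the order getKitty returns them (taken from the listing above)
Kitty = namedtuple("Kitty", [
    "is_gestating", "is_ready", "cooldown_index", "next_action_at", "siring_with_id",
    "birth_time", "matron_id", "sire_id", "generation", "genes",
])

# The example values from the listing above
raw = [False, True, 6, 0, 0, 1512092101, 16336, 15486, 12,
       531881267885876605735019567258882137769398970896872218627788409434126733]

kitty = Kitty(*raw)

# Assumption: birthTime is Unix seconds, so it converts directly to a UTC datetime
born = datetime.fromtimestamp(kitty.birth_time, tz=timezone.utc)

print(kitty.generation, kitty.matron_id, kitty.sire_id)
print(born.isoformat())
```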

In [52]:
!ls *.csv

103148_to_119072_cats.csv  smaller_sample2.csv
1_to_103147_cats.csv	   smaller_sample.csv


In [50]:
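cat_db = []
START = 1                # first kitty id; raise this to resume a partial run
END = num_kitties + 1    # range() excludes END, so add 1 to include the last kitty

for cat_id in tqdm(range(START, END)):
    # newer web3.py releases spell this kitty_contract.functions.getKitty(cat_id).call()
    cat = kitty_contract.call().getKitty(cat_id)
    cat[9] = str(cat[9])  # genes is a uint256; keep it as a string so no digits are lost
    cat_db.append([cat_id] + cat)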

100%|██████████| 15925/15925 [15:16<00:00, 17.38it/s]
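
The loop above issues one call at a time, which is where the roughly 17 calls per second comes from. One possible speed-up, which I have not tried against a real node, is to run the calls from a thread pool. The sketch below assumes the provider tolerates concurrent requests. `fetch_many` is a hypothetical helper, and `fake_get_kitty` is a stand-in for the real contract call so the example runs without a node.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_many(fetch_one, ids, max_workers=8):
    """Call fetch_one(id) for every id from a thread pool; results keep the order of ids."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, ids))


def fake_get_kitty(cat_id):
    """Stand-in for the contract call: returns a 10-field list shaped like getKitty's result."""
    return [False, True, 0, 0, 0, 1512092101, 0, 0, 0, cat_id * 10**30]


ids = range(1, 6)
rows = []
for cat_id, fields in zip(ids, fetch_many(fake_get_kitty, ids)):
    fields[9] = str(fields[9])        # same genes-to-string step as the loop above
    rows.append([cat_id] + fields)

print(len(rows), rows[0][0], rows[-1][0])
```

To use it for real, `fake_get_kitty` would be replaced by a function that wraps the `getKitty` call from the loop above.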


In [51]:
columns=['_id','is_gestating','is_ready','cooldown_index','next_action_at',
         'siring_with_id','birth_time','matron_id','sire_id','generation','genes']
df=pd.DataFrame(cat_db, columns=columns)

df.to_csv('%s_to_%s_cats.csv'%(df._id.min(),df._id.max()), index=False)

df.head()

| | _id | is_gestating | is_ready | cooldown_index | next_action_at | siring_with_id | birth_time | matron_id | sire_id | generation | genes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 103148 | False | True | 3 | 4684158 | 0 | 1512534961 | 101482 | 100776 | 2 | 6216765256253898498647653478943414584391110774... |
| 1 | 103149 | False | True | 5 | 4683790 | 0 | 1512534961 | 62300 | 49752 | 9 | 5111761325625045209208445116394741302184963640... |
| 2 | 103150 | False | True | 6 | 4684185 | 0 | 1512534961 | 102466 | 100752 | 8 | 6285596844569006952754794083584465441832967318... |
| 3 | 103151 | False | True | 8 | 0 | 0 | 1512534961 | 23215 | 28267 | 17 | 5164097337500594131134018293304447194756063914... |
| 4 | 103152 | False | True | 3 | 0 | 0 | 1512534961 | 78266 | 37475 | 6 | 4630239810681127245802440597048180964650767225... |
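
The `genes` column is converted to a string before it goes into the DataFrame. The notebook does not say why, but a likely reason is that a uint256 is far too large for a 64-bit integer column and loses digits if it passes through a float. The check below illustrates this with the example genes value from the `getKitty` listing. The variable names are mine.

```python
# Example genes value from the getKitty listing earlier in this notebook
genes = 531881267885876605735019567258882137769398970896872218627788409434126733

INT64_MAX = 2**63 - 1

fits_in_int64 = genes <= INT64_MAX            # would a 64-bit integer column hold it?
survives_float = int(float(genes)) == genes   # is it exact after a float round trip?
survives_str = int(str(genes)) == genes       # is it exact after a string round trip?

print("fits in int64:", fits_in_int64)
print("exact after float round trip:", survives_float)
print("exact after str round trip:", survives_str)
```

When reading the CSV files back, it is worth making sure the column is parsed as text again so the values stay exact.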
