# Playing around with Blockchain Graphs
We want to explore blockchain data in a more intuitive way using graphs. We use the Flipside Crypto API to run queries, then use graph libraries to explore the results. To get access to Flipside, visit [https://docs.flipsidecrypto.com/](https://docs.flipsidecrypto.com/), create an account, and generate an API key; assign it to the `my_key` variable below.

The approach is to follow connections between wallets for as many steps as it takes to reach a labelled wallet. The result is a graph that combines known entities with a set of unlabelled addresses that might be of interest.
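The stopping rule described above can be sketched as a breadth-first expansion over a toy transfer list. Everything here (the `expand_until_labelled` name, the toy addresses and labels, the `max_steps` cap) is hypothetical and only illustrates the idea; it is not the logic inside `grow_df`.

```python
from collections import deque

def expand_until_labelled(seeds, transfers, labels, max_steps=3):
    """Follow transfers outward from the seeds, stopping at labelled addresses.

    transfers: iterable of (from_address, to_address) pairs.
    labels: dict mapping address -> label for known entities.
    Returns the set of visited addresses.
    """
    # Build an undirected neighbour map: a transfer links both parties.
    neighbours = {}
    for src, dst in transfers:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)

    visited = set(seeds)
    queue = deque((s, 0) for s in seeds)
    while queue:
        address, depth = queue.popleft()
        # Labelled wallets are end points: record them but do not expand further.
        if address in labels or depth >= max_steps:
            continue
        for other in neighbours.get(address, ()):
            if other not in visited:
                visited.add(other)
                queue.append((other, depth + 1))
    return visited

# Toy data: 0xccc is a known entity, so 0xddd behind it is never reached.
transfers = [("0xaaa", "0xbbb"), ("0xbbb", "0xccc"), ("0xccc", "0xddd")]
labels = {"0xccc": "binance"}
reached = expand_until_labelled(["0xaaa"], transfers, labels)
```

The `max_steps` cap is there because an unlabelled chain could otherwise grow without bound.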

## General flow
- Input a list of seed addresses from which to grow a graph of transactions and associated wallets.
- Run SQL queries with ShroomDK (Flipside's SDK) to find all transactions involving those seed addresses, look up any address labels, and check whether each address is a contract.
- Use NetworkX to build a graph from these transactions, creating node and edge objects from a dataframe of transactions.
- Using the address labels, simplify the graph: e.g. if 10 different addresses are associated with "Binance", represent them as a single node.
- Enrich the graph with data about the nodes and edges (e.g. total transaction volume between nodes, and net volume/direction).
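As a rough illustration of the NetworkX step in the flow above, the sketch below builds a directed multigraph from a toy dataframe with `nx.from_pandas_edgelist`. The column names match the dataframe shown later in this notebook, but the rows are made up, and this is not necessarily how `draw_graph` builds its graph.

```python
import networkx as nx
import pandas as pd

# Toy transfers using the column names that appear in the notebook's dataframe.
toy_df = pd.DataFrame(
    {
        "from_address": ["0xaaa", "0xaaa", "0xbbb"],
        "to_address": ["0xbbb", "0xbbb", "0xccc"],
        "symbol": ["ETH", "LDO", "ETH"],
        "amount_usd": [100.0, 50.0, 25.0],
    }
)

# A MultiDiGraph keeps direction and allows several transfers between the same pair.
graph = nx.from_pandas_edgelist(
    toy_df,
    source="from_address",
    target="to_address",
    edge_attr=["symbol", "amount_usd"],
    create_using=nx.MultiDiGraph,
)

# Total USD volume sent from 0xaaa to 0xbbb, summed over the parallel edges.
volume_ab = sum(data["amount_usd"] for data in graph["0xaaa"]["0xbbb"].values())
```

A multigraph is used here so that repeated transfers between the same two wallets stay as separate edges until they are aggregated.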

### First, initialize the API

In [8]:
%load_ext autoreload
%autoreload 2

import pandas as pd
from flipside import Flipside
import networkx as nx
from utils import grow_df, draw_graph

# your Flipside API key
my_key = 'YOUR_API_KEY'
sdk = Flipside(my_key, "https://api-v2.flipsidecrypto.xyz")

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
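One way to keep the API key out of the notebook itself is to read it from an environment variable. This is a minimal sketch; the `load_api_key` helper and the `FLIPSIDE_API_KEY` variable name are assumptions made for illustration, not something the Flipside SDK requires.

```python
import os

def load_api_key(env_var="FLIPSIDE_API_KEY"):
    """Return the API key stored in an environment variable, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Flipside API key")
    return key
```

With the variable set in your shell, `my_key = load_api_key()` can replace the hard-coded string before the `Flipside(...)` call.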


### Grow the dataframe from the seed addresses

In [2]:
# list of starting address(es) of interest
seed_addresses = ['0x64e9B9cD74A46f71e7631CB033afA6E7849a8683']

# grow_df is designed to work the same for every step of growing the graph, so we must start with some initializations of empty objects
nogrow_addresses = [] # this is a list of addresses that we won't continue to grow from, we can update this as we go.
spam_symbols = ['VOLT','CYFM'] # list of problematic token symbols e.g ones that have incorrect usd amounts (like trillions of dollars)
contracts = [] # This is a list of contract addresses that we won't grow the graph from. Can also update as we go.
address_label_dict = {} # initialize an empty dictionary for the address to label dict.
df = pd.DataFrame() # initialize an empty pandas dataframe 

# make sure all addresses are lowercase, otherwise we run into problems
seed_addresses = [x.lower() for x in seed_addresses]
nogrow_addresses = [x.lower() for x in nogrow_addresses]
contracts = [x.lower() for x in contracts]

limit_connections = 500 # max number of transactions to return from the set of seed addresses
rank_by = "amount_usd" # column used to rank transactions, e.g. with rank_by="amount_usd" and limit_connections=500 we get the top 500 transactions from the seed addresses by USD amount (note: rank_by is not passed to grow_df in the call below)

out = grow_df(seed_addresses,
              nogrow_addresses,
              sdk,
              address_label_dict=address_label_dict,
              contracts=contracts,
              spam_symbols=spam_symbols,
              df=df,
              limit_connections=limit_connections)

seed_addresses, nogrow_addresses, address_label_dict, label_address_dict, contracts, df  = out

# some entries in df
df.head(5)

Running address query
Running contract query
Running label query


Unnamed: 0,symbol,decimals,amount,amount_usd,tx_hash,from_address,to_address,block_timestamp,__row_index
0,ETH,18,593.055365,717027.66,0xa3edc7316839c90280fa24edb73f0dfb64c49a208fdd...,0x68b3465833fb72a70ecdf485e0e4c7bd8665fc45,0x64e9b9cd74a46f71e7631cb033afa6e7849a8683,2022-12-27T21:18:11.000Z,0
1,LDO,18,678760.68074,699123.501162,0xfbc259adbb19af1299e2fece3bafff7bc6dd56a226a2...,0xe5d0ef77aed07c302634dc370537126a2cd26590,0x64e9b9cd74a46f71e7631cb033afa6e7849a8683,2022-12-27T20:54:47.000Z,1
2,LDO,18,638569.225332,657726.302092,0xa3edc7316839c90280fa24edb73f0dfb64c49a208fdd...,0x64e9b9cd74a46f71e7631cb033afa6e7849a8683,0xa3f558aebaecaf0e11ca4b2199cc5ed341edfd74,2022-12-27T21:18:11.000Z,2
3,ETH,18,500.0,605615.0,0xd55d60beeab7f3bb8205c21ec915144d4a8f5bb2277d...,0x7386df2cf7e9776bce0708072c27d6a7135d51cb,0x64e9b9cd74a46f71e7631cb033afa6e7849a8683,2022-12-28T00:48:59.000Z,3
4,ETH,18,499.100669,604525.7,0x9dd6a7de29e7fe6a838ae659a86649bc2c5d004861cd...,0x64e9b9cd74a46f71e7631cb033afa6e7849a8683,0x68b3465833fb72a70ecdf485e0e4c7bd8665fc45,2022-12-28T00:50:59.000Z,4
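To make the `limit_connections` and `rank_by` settings concrete, here is a minimal pandas sketch of "keep the top N transactions by a ranking column" on made-up rows. How `grow_df` applies these settings internally is not shown in this notebook; the `top_transactions` helper is hypothetical.

```python
import pandas as pd

def top_transactions(df, rank_by="amount_usd", limit_connections=500):
    """Keep the `limit_connections` rows with the largest values in `rank_by`."""
    return df.nlargest(limit_connections, rank_by).reset_index(drop=True)

# Made-up rows; only the column names follow the notebook's dataframe.
toy = pd.DataFrame(
    {
        "symbol": ["ETH", "LDO", "ETH"],
        "amount_usd": [500.0, 700.0, 100.0],
    }
)
top2 = top_transactions(toy, limit_connections=2)
```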


### Use the dataframe to make a graph, exported as HTML
Currently set up such that:

- Addresses are circles
- Contracts are squares
- Labelled addresses are yellow and large
- Unlabelled addresses are red and small
- Edges connecting to at least one labelled node are yellow
- All other edges are red (connecting only to unlabelled addresses)
- Edge widths are determined by total transaction volume

The HTML file is saved to the current directory under the name given by `filename`.

`draw_graph` also returns the network object (a pyvis class) and a dataframe enriched with additional summary statistics (net USD volume, etc.).
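The styling rules listed above can be written as two small lookup functions. This is only a sketch: the function names and the sizes 30 and 10 are hypothetical, and `"dot"` and `"square"` are vis.js shape names that pyvis accepts. The actual values used by `draw_graph` may differ.

```python
def node_style(address, address_label_dict, contracts):
    """Return display attributes for one node, following the rules listed above."""
    labelled = address in address_label_dict
    return {
        "shape": "square" if address in contracts else "dot",
        "color": "yellow" if labelled else "red",
        "size": 30 if labelled else 10,  # hypothetical sizes for "large" and "small"
    }

def edge_color(from_address, to_address, address_label_dict):
    """Yellow if at least one end of the edge is labelled, red otherwise."""
    touches_label = from_address in address_label_dict or to_address in address_label_dict
    return "yellow" if touches_label else "red"
```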

In [3]:
filename = "test1"
net, df_enriched = draw_graph(df,address_label_dict,label_address_dict,contracts,filename)

# some entries in df_enriched
df_enriched.head(5)

# You can also inspect e.g. net.nodes or net.edges

Unnamed: 0,symbol,decimals,amount,amount_usd,tx_hash,from_address,to_address,block_timestamp,__row_index,from_label,to_label,usd_net_vol_out,usd_vol,n_transactions
0,ETH,18,593.055365,717027.66,0xa3edc7316839c90280fa24edb73f0dfb64c49a208fdd...,0x68b3465833fb72a70ecdf485e0e4c7bd8665fc45,0x64e9b9cd74a46f71e7631cb033afa6e7849a8683,2022-12-27T21:18:11.000Z,0,uniswap,0x6683,-17542.57,4284839.0,17.0
1,LDO,18,678760.68074,699123.501162,0xfbc259adbb19af1299e2fece3bafff7bc6dd56a226a2...,0xe5d0ef77aed07c302634dc370537126a2cd26590,0x64e9b9cd74a46f71e7631cb033afa6e7849a8683,2022-12-27T20:54:47.000Z,1,alameda research,0x6683,1016743.0,1016743.0,10.0
2,LDO,18,638569.225332,657726.302092,0xa3edc7316839c90280fa24edb73f0dfb64c49a208fdd...,0x64e9b9cd74a46f71e7631cb033afa6e7849a8683,0xa3f558aebaecaf0e11ca4b2199cc5ed341edfd74,2022-12-27T21:18:11.000Z,2,0x6683,uniswap,17542.57,4284839.0,17.0
3,ETH,18,500.0,605615.0,0xd55d60beeab7f3bb8205c21ec915144d4a8f5bb2277d...,0x7386df2cf7e9776bce0708072c27d6a7135d51cb,0x64e9b9cd74a46f71e7631cb033afa6e7849a8683,2022-12-28T00:48:59.000Z,3,0x71cb,0x6683,629737.5,629737.5,2.0
4,ETH,18,499.100669,604525.7,0x9dd6a7de29e7fe6a838ae659a86649bc2c5d004861cd...,0x64e9b9cd74a46f71e7631cb033afa6e7849a8683,0x68b3465833fb72a70ecdf485e0e4c7bd8665fc45,2022-12-28T00:50:59.000Z,4,0x6683,uniswap,17542.57,4284839.0,17.0
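Reading the table above, `usd_vol` and `n_transactions` look like totals over both directions between two labelled nodes, while `usd_net_vol_out` flips sign with the direction of the row. Unlabelled addresses appear to be abbreviated to their first three and last three characters (`0x64e9...8683` becomes `0x6683`). The sketch below reproduces that reading in plain Python; it is an interpretation of the output, not the code inside `draw_graph`, and the USD amounts are made up.

```python
from collections import defaultdict

def short_label(address):
    """Abbreviate an unlabelled address using its first three and last three characters."""
    return address[:3] + address[-3:]

def pair_stats(transfers, address_label_dict):
    """Aggregate transfers between labelled nodes.

    transfers: iterable of (from_address, to_address, amount_usd).
    Returns {(from_label, to_label): {"usd_vol", "usd_net_vol_out", "n_transactions"}}.
    """
    sent = defaultdict(float)  # directed USD volume per (from_label, to_label)
    count = defaultdict(int)   # directed transfer count per (from_label, to_label)
    for src, dst, usd in transfers:
        a = address_label_dict.get(src, short_label(src))
        b = address_label_dict.get(dst, short_label(dst))
        sent[(a, b)] += usd
        count[(a, b)] += 1

    stats = {}
    for a, b in list(sent):
        out_vol, in_vol = sent[(a, b)], sent.get((b, a), 0.0)
        stats[(a, b)] = {
            "usd_vol": out_vol + in_vol,          # both directions combined
            "usd_net_vol_out": out_vol - in_vol,  # positive if a sent more than it received
            "n_transactions": count[(a, b)] + count.get((b, a), 0),
        }
    return stats

wallet = "0x64e9b9cd74a46f71e7631cb033afa6e7849a8683"
router_a = "0x68b3465833fb72a70ecdf485e0e4c7bd8665fc45"
router_b = "0xa3f558aebaecaf0e11ca4b2199cc5ed341edfd74"
labels = {router_a: "uniswap", router_b: "uniswap"}
transfers = [(router_a, wallet, 100.0), (wallet, router_b, 130.0), (wallet, router_a, 20.0)]
stats = pair_stats(transfers, labels)
```

Because both router addresses share the "uniswap" label, their transfers collapse onto a single pair of nodes, which is the graph simplification described in the general flow.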


### Now grow the graph by another step, i.e. for every new address, find all connections to that address too.

In [4]:
out = grow_df(seed_addresses,
              nogrow_addresses,
              sdk,
              address_label_dict=address_label_dict,
              contracts=contracts,
              spam_symbols=spam_symbols,
              df=df,
              limit_connections=limit_connections)

seed_addresses, nogrow_addresses, address_label_dict, label_address_dict, contracts, df = out

Running address query
Running contract query
Running label query


In [5]:
# make another graph including the second order connections from our starting point
filename = "test2"
net, df_enriched = draw_graph(df,address_label_dict,label_address_dict,contracts,filename)

### And one more step

In [6]:
out = grow_df(seed_addresses,
              nogrow_addresses,
              sdk,
              address_label_dict=address_label_dict,
              contracts=contracts,
              spam_symbols=spam_symbols,
              df=df,
              limit_connections=limit_connections)

seed_addresses, nogrow_addresses, address_label_dict, label_address_dict, contracts, df = out

Running address query
Running contract query
Running label query


In [7]:
# make another graph including the third-order connections from our starting point
filename = "test3"
net, df_enriched = draw_graph(df,address_label_dict,label_address_dict,contracts,filename)

Open test1.html, test2.html, and test3.html in Chrome to view and interact with each of the graphs. They should have been written to the current directory.
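The repeated grow cells above all follow the same pattern: call `grow_df`, unpack its six outputs, and feed them back in. A small wrapper can express that as a loop. This is a sketch under the assumption that the function passed in has the same call signature and return order as the `grow_df` calls in this notebook; `grow_steps` and `fake_grow` are hypothetical names, and `fake_grow` is only a stand-in so the example runs without an API key.

```python
def grow_steps(grow_fn, n_steps, seed_addresses, nogrow_addresses, sdk,
               address_label_dict, contracts, df, **options):
    """Call a grow_df-like function n_steps times, feeding each output into the next call."""
    state = (seed_addresses, nogrow_addresses, address_label_dict, {}, contracts, df)
    for _step in range(n_steps):
        seeds, nogrow, labels, _label_to_addresses, known_contracts, frame = state
        state = grow_fn(seeds, nogrow, sdk,
                        address_label_dict=labels,
                        contracts=known_contracts,
                        df=frame,
                        **options)
    return state

def fake_grow(seeds, nogrow, sdk, address_label_dict, contracts, df, limit_connections=500):
    # Stand-in for grow_df: each seed "discovers" one new address, and df is a plain list.
    new_seeds = [s + "1" for s in seeds]
    return new_seeds, nogrow + seeds, address_label_dict, {}, contracts, df + seeds

result = grow_steps(fake_grow, 3, ["a"], [], None, {}, [], [], limit_connections=10)
```

The wrapper only grows the data; drawing a graph after each step, as done in the cells above, would still need a `draw_graph` call inside or after the loop.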

### Issues
- Some spam tokens are still a problem: they distort the visualization of comparative transfer volume because the API reports them as being worth billions or trillions of dollars.
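One possible mitigation, sketched below, is to drop transfers whose reported USD value is implausibly large in addition to filtering by symbol. The `drop_implausible_transfers` helper, the one-billion-dollar cap, and the `XYZ` symbol are all hypothetical; a sensible threshold depends on the wallets being studied.

```python
def drop_implausible_transfers(rows, max_usd=1e9, spam_symbols=()):
    """Remove rows whose symbol is blacklisted or whose USD amount exceeds max_usd.

    rows: list of dicts with "symbol" and "amount_usd" keys.
    max_usd: hypothetical cap; transfers priced above it are treated as bad data.
    """
    blocked = {s.upper() for s in spam_symbols}
    return [
        row for row in rows
        if str(row["symbol"]).upper() not in blocked
        and row["amount_usd"] is not None
        and row["amount_usd"] <= max_usd
    ]

rows = [
    {"symbol": "ETH", "amount_usd": 717027.66},
    {"symbol": "VOLT", "amount_usd": 5.0},    # blacklisted symbol
    {"symbol": "XYZ", "amount_usd": 3.2e12},  # made-up token with an implausible price
]
clean = drop_implausible_transfers(rows, spam_symbols=["VOLT", "CYFM"])
```

A value cap catches mispriced tokens that are not yet on the `spam_symbols` list, at the cost of also hiding any genuine transfer above the threshold.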