<img src="https://secureservercdn.net/160.153.137.210/86v.eb1.myftpupload.com/wp-content/uploads/2020/09/Logos-3.png?time=1625746717" align="right" width="120"/>

# Open Trust Fabric (OTF)
# Digital Platform Use Case
# 09-ContractNetwork
June 2021

Exploring the Airbnb contract network

**TODO**: Add a general description of the objectives related to this digital contract science notebook.

This notebook uses Graphistry for visual network analysis. Follow the instructions in README.md to configure Graphistry.

In [1]:
import json
with open('../config.json') as f:
    config = json.load(f)

Registering Graphistry:

In [2]:
import graphistry
graphistry.register(
    api=3, protocol="https", server="hub.graphistry.com", 
    username=config['graphistry_username'], password=config['graphistry_password'])

In [8]:
import pandas as pd
import networkx as nx

Reading the policy network for aggregate data:

In [4]:
df = pd.read_csv('data/POLICY_NETWORK.csv.gzip', compression='gzip', low_memory=False)

In [5]:
df.head()

Unnamed: 0,MODEL_INSTANCE,TIME_STAMP,PARTIES_PROPOSER_ID,PARTIES_PARTICIPANT_ID,ASSET_ID,ASSET_PRICE,ASSET_MIGRATION,ASSET_LOCATION,ASSETS_REVIEW_SCORES_ACCURACY,ASSET_NUMBER_REVIEWS,ASSET_AVAILABILITY_30,YEAR,MONTH
0,PJNs4Fpyg4,2015-04-10,62142,30537860,15883,,,,,,,2015,4
1,rGvohKWuaC,2016-06-19,62142,37529754,15883,$85.00,0.0,vienna,10.0,1.0,9.0,2016,6
2,O1ZE3GDKxL,2016-07-29,62142,3147341,15883,$85.00,0.0,vienna,10.0,2.0,17.0,2016,7
3,2t2vxoLsH5,2016-08-13,62142,29518067,15883,$85.00,0.0,vienna,9.0,3.0,9.0,2016,8
4,KBHJiEPuFf,2016-11-21,62142,36016357,15883,$85.00,0.0,vienna,10.0,4.0,9.0,2016,11


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6640646 entries, 0 to 6640645
Data columns (total 13 columns):
 #   Column                         Dtype  
---  ------                         -----  
 0   MODEL_INSTANCE                 object 
 1   TIME_STAMP                     object 
 2   PARTIES_PROPOSER_ID            int64  
 3   PARTIES_PARTICIPANT_ID         int64  
 4   ASSET_ID                       int64  
 5   ASSET_PRICE                    object 
 6   ASSET_MIGRATION                float64
 7   ASSET_LOCATION                 object 
 8   ASSETS_REVIEW_SCORES_ACCURACY  float64
 9   ASSET_NUMBER_REVIEWS           float64
 10  ASSET_AVAILABILITY_30          float64
 11  YEAR                           int64  
 12  MONTH                          int64  
dtypes: float64(4), int64(5), object(4)
memory usage: 658.6+ MB


Let's move straight to creating the first network view. Social network analysis is straightforward in principle. Here is the famous Karate Club graph in Graphistry:

In [9]:
graphistry.bind(source='src', destination='dst', node='nodeid').plot(nx.karate_club_graph())

However, as the size of the network increases, things get much more difficult. How large is the network under investigation, then?

In [10]:
# Build a directed, weighted graph: each repeated proposer -> participant
# pair increments the weight of the corresponding edge.
G = nx.DiGraph()
for edge in df.itertuples():
    s = edge.PARTIES_PROPOSER_ID
    t = edge.PARTIES_PARTICIPANT_ID

    if G.has_edge(s, t):
        G[s][t]['weight'] += 1
    else:
        G.add_edge(s, t, weight=1)
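
Iterating row by row over several million records can be slow. A minimal sketch of an equivalent vectorized construction, shown on a small hypothetical stand-in frame (`toy`) rather than the real data, and assuming the same two ID columns:

```python
import pandas as pd
import networkx as nx

# Hypothetical stand-in for df, using the same two ID columns
toy = pd.DataFrame({
    'PARTIES_PROPOSER_ID':    [1, 1, 1, 2],
    'PARTIES_PARTICIPANT_ID': [10, 10, 11, 10],
})

# Count repeated proposer -> participant pairs; the count becomes the edge weight
edges = (
    toy.groupby(['PARTIES_PROPOSER_ID', 'PARTIES_PARTICIPANT_ID'])
       .size()
       .reset_index(name='weight')
)

# Build the directed graph in one call from the aggregated edge list
G_toy = nx.from_pandas_edgelist(
    edges,
    source='PARTIES_PROPOSER_ID',
    target='PARTIES_PARTICIPANT_ID',
    edge_attr='weight',
    create_using=nx.DiGraph,
)
print(G_toy.number_of_edges(), G_toy[1][10]['weight'])
```

The aggregated `edges` frame could also be passed directly to Graphistry, which avoids building the NetworkX graph when only a plot is needed.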

First, we want to know the size of the network.

In [11]:
nx.info(G)

'Name: \nType: DiGraph\nNumber of nodes: 5402538\nNumber of edges: 6562837\nAverage in degree:   1.2148\nAverage out degree:   1.2148'
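
Note that `nx.info` was removed in NetworkX 3.0, so on newer installations the same summary has to be assembled by hand. A minimal sketch on a toy graph (`G_demo` is a hypothetical stand-in for `G`):

```python
import networkx as nx

# Hypothetical toy graph standing in for the full contract network
G_demo = nx.DiGraph()
G_demo.add_edges_from([(1, 2), (1, 3), (2, 3)])

n_nodes = G_demo.number_of_nodes()
n_edges = G_demo.number_of_edges()

# In a directed graph every edge contributes one in-degree and one
# out-degree, so both averages equal edges / nodes
avg_degree = n_edges / n_nodes
print(f"nodes={n_nodes} edges={n_edges} avg in/out degree={avg_degree:.4f}")
```

The same arithmetic applied to the figures reported above, 6562837 / 5402538, gives approximately 1.2148, which matches the average in and out degree in the output.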

It is impossible to visually explore the full network of actors. However, to give a quick demo of Graphistry, let's create a sample of the dataset to create a network visualization.

In [12]:
graphistry.bind(
        source='PARTIES_PROPOSER_ID', destination='PARTIES_PARTICIPANT_ID'
    ).edges(
        df.sample(frac=0.001)
    ).plot()

Calculating PageRank to identify key nodes or supernodes in the network:

In [13]:
pr = nx.pagerank(G)
# Store PageRank and in-degree centrality (normalized in-degree) as node attributes
nx.set_node_attributes(G, values=pr, name='pagerank')
nx.set_node_attributes(G, values=nx.in_degree_centrality(G), name='indegree')
# nx.set_node_attributes(G, values=dict(G.out_degree()), name='outdegree')
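
`nx.in_degree_centrality` returns the in-degree divided by n - 1 rather than the raw count, so the `indegree` attribute holds small fractions on a network of this size. A small sketch on a toy graph (hypothetical data) shows the relationship:

```python
import networkx as nx

# Hypothetical toy graph: node 3 has two incoming edges, node 4 has one
G_demo = nx.DiGraph([(1, 3), (2, 3), (3, 4)])

centrality = nx.in_degree_centrality(G_demo)
n = G_demo.number_of_nodes()

# Multiplying by (n - 1) recovers the raw in-degree counts
raw = {node: round(c * (n - 1)) for node, c in centrality.items()}
print(raw)
```

If raw counts are preferred for the node table, `dict(G.in_degree())` provides them directly.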

Creating a data frame for nodes:

In [14]:
nodes = dict(G.nodes(data=True))

df_nodes = pd.DataFrame.from_dict(nodes, orient='index')
df_nodes['nodeid'] = nodes.keys()
df_nodes = df_nodes.sort_values('pagerank', ascending=False)
df_nodes['nodeid'] = df_nodes.nodeid.astype(int)
df_nodes.head()

Unnamed: 0,pagerank,indegree,nodeid
115767079,1.053131e-06,2.5e-05,115767079
3479650,8.615874e-07,9e-06,3479650
292913665,7.038633e-07,3e-06,292913665
20787352,6.799564e-07,4e-06,20787352
13690581,6.78193e-07,1e-06,13690581


In [15]:
df_nodes.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5402538 entries, 115767079 to 62142
Data columns (total 3 columns):
 #   Column    Dtype  
---  ------    -----  
 0   pagerank  float64
 1   indegree  float64
 2   nodeid    int64  
dtypes: float64(2), int64(1)
memory usage: 164.9 MB


In [17]:
# Exploring the top node according to PageRank

top_pr = df_nodes.sort_values('pagerank', ascending=False).iloc[0]
int(top_pr.nodeid)

115767079

In [18]:
ego_g = nx.ego_graph(G, int(top_pr.nodeid), radius=2, undirected=True)
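
To illustrate what `radius=2` and `undirected=True` do in the call above, here is a minimal sketch on a hypothetical toy chain. With `undirected=True` the neighbourhood is collected in both edge directions, while the returned subgraph stays directed:

```python
import networkx as nx

# Toy directed chain: 1 -> 2 -> 3 -> 4 -> 5
G_demo = nx.DiGraph([(1, 2), (2, 3), (3, 4), (4, 5)])

# Following edge direction only, node 3 reaches 4 and 5 within two steps
ego_directed = nx.ego_graph(G_demo, 3, radius=2)

# With undirected=True, predecessors 1 and 2 are included as well
ego_both = nx.ego_graph(G_demo, 3, radius=2, undirected=True)

print(sorted(ego_directed.nodes()), sorted(ego_both.nodes()))
```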

We drew inspiration from the demonstration [Exploring €1.3 trillion in public contracts with graph visualization](https://linkurio.us/blog/exploring-e1-3-trillion-in-public-contracts-with-graph-visualization/), which draws on data about European Union contracts and uses the graph database Neo4j and the commercial graph analysis tool Linkurious to explore the contract ecosystem.
 
However, our existing analytical infrastructure does not support a similarly dynamic and interactive analysis.  
 
Demonstrating an alternative tool for egocentric network exploration:

In [20]:
import ipycytoscape
import ipywidgets as widgets
# import networkx as nx


In [None]:
directed = ipycytoscape.CytoscapeWidget()
directed.graph.add_graph_from_networkx(ego_g, directed=True)
# directed.set_layout(name='grid')
display(directed)

In [None]:
graphistry.bind(source='src', destination='dst', node='nodeid').plot(ego_g)

## Moving forward

To move forward with a full contract analysis, we would aggregate the value that each actor has accumulated and build a list of top hosts whose potential interconnections we could then explore. The steps are:

1. read the policy network,
1. join it with the events and assets tables,
1. clean the price data (converting to euros), and
1. multiply price by occupancy.

As a first step, we need to clean the price data before calculating the total value of each host. Perhaps the price data should be cleaned upstream so that this phase can be skipped?

In [None]:
import numpy as np

# ASSET_PRICE: normalize the reported price by stripping the currency symbol
# and thousands separators; values that cannot be parsed become NaN.
def fix_price(value):
    try:
        return float(value.replace("$", "").replace(",", ""))
    except (AttributeError, ValueError):
        return np.nan

df['ASSET_PRICE'] = df['ASSET_PRICE'].apply(fix_price)
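
The last two steps of the plan could be sketched as follows. This is only an illustration on hypothetical rows, not the intended pipeline. It assumes that `PARTIES_PROPOSER_ID` identifies the host and, as a crude proxy, that occupancy over the next 30 days equals 30 minus `ASSET_AVAILABILITY_30` (unavailable nights may be blocked rather than booked). The derived columns `PRICE`, `OCCUPANCY_30` and `VALUE` are introduced here for illustration, and currency conversion to euros is left out.

```python
import pandas as pd

# Hypothetical stand-in rows using column names from POLICY_NETWORK
toy = pd.DataFrame({
    'PARTIES_PROPOSER_ID':   [62142, 62142, 70000],
    'ASSET_PRICE':           ['$85.00', '$1,200.00', None],
    'ASSET_AVAILABILITY_30': [9.0, 17.0, 30.0],
})

# Vectorized price cleaning: strip "$" and ",", coerce failures to NaN
toy['PRICE'] = pd.to_numeric(
    toy['ASSET_PRICE'].str.replace(r'[$,]', '', regex=True),
    errors='coerce',
)

# ASSUMPTION: nights occupied in the next 30 days = 30 - availability
toy['OCCUPANCY_30'] = 30 - toy['ASSET_AVAILABILITY_30']
toy['VALUE'] = toy['PRICE'] * toy['OCCUPANCY_30']

# Total estimated value per host, largest first (missing prices count as 0)
top_hosts = (
    toy.groupby('PARTIES_PROPOSER_ID')['VALUE']
       .sum()
       .sort_values(ascending=False)
)
print(top_hosts)
```

The resulting ranking could then seed the ego-network exploration shown earlier, starting from the highest-value hosts instead of the top PageRank node.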