# Learning from networks - Stonks

Start by importing the libraries we will use throughout the notebook.

In [1]:
import networkx as nx
import extended_networkx as ex

## Load graph and compute market capitalization

First, we load the graph file. Then we compute what we call the "market capitalization" of each node, which measures how much that node has been bought.

In [2]:
# Load Graph
G = nx.read_gml("out_graph.gml")

print(f"The graph contains {len(G.nodes())} nodes and {len(G.edges())} edges.")
print(f"There are {len(list(nx.connected_components(G)))} connected components.")
print(f"The diameter of the graph is {nx.diameter(G)}.")


def compute_capitalization(G: nx.DiGraph):
    """
    Adds the 'capitalization' attribute to every node, which is the sum of the weights of its incoming edges.
    """
    for node in G.nodes():
        capitalization = 0
        # in_edges is only defined on directed graphs
        for edge in G.in_edges(node):
            capitalization += G.get_edge_data(*edge)["weight"]
        G.nodes[node]["capitalization"] = capitalization

compute_capitalization(G)
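
As a cross-check, the same quantity can be obtained from the weighted in-degree that NetworkX already provides. The snippet below is a minimal sketch on a toy graph; the node names and the reading of an edge `u -> v` as "u bought v" are assumptions made for illustration, not taken from `out_graph.gml`.

```python
import networkx as nx

# Toy directed graph (hypothetical data): edge u -> v with weight w
# is read here as "u bought an amount w of v".
toy = nx.DiGraph()
toy.add_weighted_edges_from([
    ("alice", "AAPL", 3.0),
    ("bob", "AAPL", 2.0),
    ("alice", "MSFT", 1.5),
])

# The weighted in-degree is the sum of the weights of the incoming edges,
# which matches the definition of 'capitalization' used above.
capitalization = dict(toy.in_degree(weight="weight"))
nx.set_node_attributes(toy, capitalization, "capitalization")

print(toy.nodes["AAPL"]["capitalization"])
```

Nodes without incoming edges simply get a capitalization of 0.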

Now let's print the 20 nodes with the highest capitalization.

In [4]:
k = 20
print(f"Top {k} nodes with highest capitalization: {ex.max_k_nodes(G, k, 'capitalization')}")

Top 20 nodes with highest capitalization: ['CPIN', 'AAPL', 'MSFT', 'AMZN', 'ADRO', 'FB', 'GOOGL', 'GOOG', 'TSLA', 'NVDA', 'JPM', 'JNJ', 'UNVR', 'V', 'UNH', 'PG', 'HD', 'PYPL', 'ADBE', 'BAC']
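
`ex.max_k_nodes` comes from our helper module and its code is not shown in this notebook. A function with the same behaviour could look roughly like the sketch below; the name `top_k_nodes` and the default of 0 for missing attributes are assumptions, not the actual implementation.

```python
import heapq
import networkx as nx


def top_k_nodes(G, k, attribute):
    """Hypothetical stand-in for ex.max_k_nodes.

    Returns the k node names with the largest value of `attribute`,
    in decreasing order. Nodes missing the attribute count as 0.
    """
    return heapq.nlargest(k, G.nodes, key=lambda n: G.nodes[n].get(attribute, 0))


# Small usage example on made-up data.
toy = nx.DiGraph()
toy.add_nodes_from([
    ("A", {"capitalization": 5.0}),
    ("B", {"capitalization": 1.0}),
    ("C", {"capitalization": 3.0}),
])
top_two = top_k_nodes(toy, 2, "capitalization")
print(top_two)
```

`heapq.nlargest` avoids sorting the whole node list, which helps when `k` is much smaller than the number of nodes.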


## Node-level features

### Betweenness centrality

Next we estimate the betweenness centrality of the nodes. The graph is too big to compute the exact betweenness centrality of every node, so we approximate it using only a small percentage of the nodes (2% here).

In [5]:
b_centralities = ex.betweenness_centrality_percent(G, percentage=0.02)
print(sorted(b_centralities.items(), key=lambda t: t[1], reverse=True)[:k])

[('FM', 0.00011222668143293993), ('IWD', 7.882921513785058e-05), ('VBK', 6.42761292662474e-05), ('JKH', 5.5973407198473786e-05), ('SMIN', 4.8416997226679826e-05), ('VOE', 3.582298060702322e-05), ('HART', 3.0412217911170756e-05), ('IGM', 2.8733015695216544e-05), ('EDEN', 2.51880332393132e-05), ('CEY', 2.2855807939376796e-05), ('THD', 2.1083316711425126e-05), ('BMED', 1.417992982361336e-05), ('BFIT', 1.1474548475687126e-05), ('VNM', 6.810097875814311e-06), ('RXL', 6.436941827824485e-06), ('UMI', 5.783918743842291e-06), ('IG', 5.224184671857553e-06), ('AIA', 3.638271467900796e-06), ('CTEC', 3.265115419910971e-06), ('DIG', 2.985248383918602e-06)]
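
`ex.betweenness_centrality_percent` is part of our helper module and its internals are not shown here. One common way to get this kind of approximation, sketched below under that caveat, is to pass a pivot count `k` to `nx.betweenness_centrality`, which then accumulates shortest paths only from `k` randomly sampled source nodes. The wrapper name `sampled_betweenness` is hypothetical.

```python
import networkx as nx


def sampled_betweenness(G, percentage, seed=None):
    """Approximate betweenness using a fraction of the nodes as pivots.

    Illustrative sketch only; not necessarily what the helper module does.
    """
    # Use at least one pivot, even on very small graphs.
    n_pivots = max(1, int(percentage * G.number_of_nodes()))
    return nx.betweenness_centrality(G, k=n_pivots, seed=seed)


# Star graph: node 0 is the hub, nodes 1..4 are leaves.
star = nx.star_graph(4)
exact = sampled_betweenness(star, percentage=1.0, seed=0)   # all nodes are pivots
approx = sampled_betweenness(star, percentage=0.4, seed=0)  # 2 pivots out of 5
print(exact)
print(approx)
```

With few pivots the estimates are noisy, so rankings based on a 2% sample should be read with some caution.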


### Clustering coefficients

Unlike betweenness, the exact clustering coefficient of every node can be computed on the full graph.

In [6]:
# Local clustering coefficient of every node, taking edge weights into account
nodes_clustering_coeff = nx.clustering(G, weight="weight")
# Print top k nodes with highest clustering coefficient
print(f"Top {k} nodes by clustering coefficient")
print(sorted(nodes_clustering_coeff.items(), key=lambda t: t[1], reverse=True)[:k])
# Note: transitivity ignores edge weights
print(f"Global clustering coefficient: {nx.transitivity(G)}")


Top 20 nodes by clustering coefficient
[('RELIANCEP1', 0.00016589196490068256), ('TATAMTRDVR', 4.4570382724807e-05), ('WHIRLPOOL', 3.127923241268114e-05), ('GLAXO', 3.102642871368198e-05), ('CADILAHC', 2.6203799127153454e-05), ('RWVG', 2.3440949852530024e-05), ('BAJAJHLDNG', 2.327737443598335e-05), ('CHT', 2.276084431203389e-05), ('BOSCHLTD', 2.2435231450985962e-05), ('TVSMOTOR', 2.178549609819416e-05), ('UNA', 2.0883890679729935e-05), ('HONAUT', 1.925677269269553e-05), ('AUOTY', 1.9130916486344096e-05), ('PFIZER', 1.882216126163122e-05), ('KANSAINER', 1.761364406751079e-05), ('BANKBARODA', 1.720363579034824e-05), ('KNIP11', 1.5558204365693597e-05), ('SSI', 1.5216670357133018e-05), ('4161', 1.4634795460600534e-05), ('9USDUSD953', 1.4543323738002963e-05)]
Global clustering coefficient: 0.00011406233747100072
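
The weighted coefficients above are very small. One possible contributing factor is that the weighted version of `nx.clustering` divides every edge weight by the largest weight in the graph, so a single heavy edge shrinks all the values. The toy example below (made-up weights, undirected for simplicity) illustrates the effect.

```python
import networkx as nx

# A triangle a-b-c with light edges, plus one heavy edge c-d.
toy = nx.Graph()
toy.add_weighted_edges_from([
    ("a", "b", 1.0),
    ("b", "c", 1.0),
    ("a", "c", 1.0),
    ("c", "d", 8.0),  # heaviest edge, used to normalize all weights
])

unweighted = nx.clustering(toy)
weighted = nx.clustering(toy, weight="weight")
global_cc = nx.transitivity(toy)  # unweighted: 3 * triangles / connected triples

print(unweighted["a"], weighted["a"], global_cc)
```

Node `a` sits in a full triangle, so its unweighted coefficient is 1, but its weighted coefficient drops because every triangle edge is only one eighth of the maximum weight.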


### Closeness Centrality

To compute closeness centrality we use an algorithm found online {TODO: Riccanza, please explain}. We also devised our own algorithm to draw a connected sample of `n` nodes from the graph `G`: `connected_random_subgraph(G, n)`.

In [18]:
sub_G = ex.connected_random_subgraph(G, 6000)
c_centralities = ex.closeness_centrality_matrix(sub_G)
print(sorted(c_centralities.items(), key=lambda t: t[1], reverse=True)[:k])

There are 1 components with more than 10000 nodes.
[(4145, 0.0029140443181847322), (7042, 0.00033471582511703077), (6722, 5.488401600682521e-06), (5469, 3.237364448963933e-06), (2468, 2.6533099134342216e-06), (4520, 2.633291299853007e-06), (8846, 1.4001924533222734e-06), (1823, 1.3940655980183724e-06), (9529, 1.242505279267518e-06), (9082, 1.0752513895581802e-06), (3095, 1.0203034516968893e-06), (6821, 9.917066783075425e-07), (5833, 9.741440096117975e-07), (1873, 9.477723043743265e-07), (6014, 7.582048790461929e-07), (7555, 7.246926058189602e-07), (2272, 7.128077388465728e-07), (9666, 5.937751302606947e-07), (7366, 5.74867231245154e-07), (6639, 5.0630237702124e-07)]
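
The code of `connected_random_subgraph` lives in our helper module and is not reproduced here. To give an idea of how a connected sample can be drawn, the sketch below grows a node set from a random start node by repeatedly adding a random neighbour of the nodes already chosen. This is one generic approach, not necessarily our algorithm; the function name `connected_sample` is hypothetical, and the built-in `nx.closeness_centrality` stands in for `ex.closeness_centrality_matrix`.

```python
import random
import networkx as nx


def connected_sample(G, n, seed=None):
    """Illustrative only: connected sample of up to n nodes via random frontier expansion."""
    rng = random.Random(seed)
    # Ignore edge direction when deciding what counts as "connected".
    U = G.to_undirected(as_view=True) if G.is_directed() else G
    start = rng.choice(list(U.nodes))
    chosen = {start}
    frontier = [v for v in U[start] if v != start]
    while frontier and len(chosen) < n:
        # Remove a random frontier node in O(1) by swapping it to the end.
        i = rng.randrange(len(frontier))
        frontier[i], frontier[-1] = frontier[-1], frontier[i]
        node = frontier.pop()
        if node in chosen:
            continue
        chosen.add(node)
        frontier.extend(v for v in U[node] if v not in chosen)
    # Fewer than n nodes are returned if the start component is too small.
    return G.subgraph(chosen).copy()


base = nx.connected_watts_strogatz_graph(50, 4, 0.1, seed=1)
sample = connected_sample(base, 10, seed=1)
closeness = nx.closeness_centrality(sample)
print(sorted(closeness.items(), key=lambda t: t[1], reverse=True)[:3])
```

Because the sample is connected, every node can reach every other one, which keeps the closeness values comparable across nodes.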
