# Sources Overlap graphs

#### Since the pie charts are too busy and don't give enough information, try a network graph using pyvis instead.

#### Want to import from the Google Sheet, or at least a version of it that isn't too far off. Could import it as a matrix: the row and column headers are nodes, and a non-empty cell is an edge between them. Would be great to colour attributes and features by essential element, and to have their relative size dictated by the number of edges connected to them.
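
As a rough illustration of the matrix idea above, here is a minimal plain-Python sketch. The row names, column names and cell values are made up for the example, and the sizing rule (a base size plus a fixed step per edge) is only one possible assumption, not a decision taken in this notebook.

```python
# Hypothetical example data: rows are sources, columns are attributes.
row_names = ["Source A", "Source B", "Source C"]
col_names = ["Attribute X", "Attribute Y"]
matrix = [
    [1, 0],
    [1, 1],
    [0, 0],
]

# Every non-zero cell becomes an edge between its row header and column header.
edges = [
    (row_names[i], col_names[j])
    for i, row in enumerate(matrix)
    for j, cell in enumerate(row)
    if cell
]

# Count the edges touching each node; nodes with no edges keep a count of 0.
degree = {name: 0 for name in row_names + col_names}
for source, attribute in edges:
    degree[source] += 1
    degree[attribute] += 1

# Assumed sizing rule: a base size plus a fixed step per connected edge.
BASE_SIZE, STEP = 10, 5
sizes = {name: BASE_SIZE + STEP * count for name, count in degree.items()}

print(edges)
print(sizes)
```

Keeping the edge count separate from the size makes it easy to swap in a different scaling later, for example if a few heavily connected nodes end up dominating the picture.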

In [25]:
import pandas as pd
import numpy as np
from scipy import sparse
import networkx as nx
from pyvis.network import Network

In [31]:
df = pd.read_excel('SourcesOverlapNetwork1.xlsx', dtype=int)
print(df.values)

[[1 0 1 ... 0 0 0]
 [1 0 1 ... 0 0 0]
 [1 0 1 ... 0 0 0]
 ...
 [0 0 0 ... 0 1 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 0 0]]


In [45]:
df.columns[0]

'GORC IG Typology'

#### This is a bipartite graph (https://en.wikipedia.org/wiki/Bipartite_graph), following the suggestion here: https://stackoverflow.com/questions/59862598/importing-non-square-adjacency-matrix-into-networkx-python
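
The edge list printed further down, where row 0 connects to nodes 584, 586 and so on, suggests that row nodes keep their row index while column nodes are offset by the number of rows. The sketch below reproduces that numbering in plain Python on a tiny made-up matrix. It is an illustration of the convention only, not networkx's implementation, and `biadjacency_edges`, `column_header` and the header names are hypothetical.

```python
def biadjacency_edges(matrix):
    """Return (n_rows, edges) with row ids 0..n_rows-1 and column ids from n_rows on."""
    n_rows = len(matrix)
    edges = [
        (i, n_rows + j)
        for i, row in enumerate(matrix)
        for j, cell in enumerate(row)
        if cell
    ]
    return n_rows, edges


def column_header(node_id, n_rows, headers):
    """Map a column node id back to its header; row nodes return None."""
    if node_id < n_rows:
        return None
    return headers[node_id - n_rows]


# Hypothetical 3x2 matrix with made-up column headers.
headers = ["Header 0", "Header 1"]
matrix = [
    [1, 0],
    [1, 1],
    [0, 1],
]
n_rows, edges = biadjacency_edges(matrix)
print(edges)
print(column_header(3, n_rows, headers))
```

With three rows, the first column becomes node 3, which is why subtracting the row count from a node id recovers the column position.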

In [50]:
sm = sparse.csr_matrix(df.values)
graphdata = nx.algorithms.bipartite.matrix.from_biadjacency_matrix(sm)

# Row nodes are numbered 0..n_rows-1; column nodes follow from n_rows onwards
n_rows = df.shape[0]  # 584 for this spreadsheet

# visualize it with pyvis
Netdata = Network(height='800px', width='100%', bgcolor='#222222', font_color='white', notebook=True)
Netdata.barnes_hut()
for n in graphdata.nodes:
    if int(n) >= n_rows:
        # column node: label it with its column header and colour it red
        col_name = df.columns[int(n) - n_rows]
        Netdata.add_node(int(n), title=col_name, color='red', label=col_name)
    else:
        # row node: no label or tooltip yet
        Netdata.add_node(int(n))
for e in graphdata.edges:
    Netdata.add_edge(int(e[0]), int(e[1]))

Netdata.write_html('SourcesOverlap.html')

Local cdn resources have problems on chrome/safari when used in jupyter-notebook. 


#### Works! Very bare bones and takes A WHILE (1-3 min) to load. Should definitely revisit once the attribute list is reduced and finalized for v1.
#### Next steps will be to add labels, preferably tooltips, to all the nodes and decide on colours and shapes. Need to at least differentiate between source, attribute and KPI nodes; would be nice to have essential element differentiation as well.
#### Looks like tooltips come from `title`, `size` sets the size of the node, `label` is the text floating next to the node, and `color` sets the colour. These can be passed in as lists, one entry per node.
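
A minimal sketch of what those per-node lists could look like, assuming a hypothetical three-node table and an arbitrary colour scheme for the source, attribute and KPI kinds. The lists are built in plain Python; the pyvis part is guarded so the sketch still runs where pyvis is not installed, and it relies on `Network.add_nodes` accepting `label`, `title`, `color` and `size` as equal-length lists.

```python
# Hypothetical node table: (id, name, kind).
nodes = [
    (0, "Source 1", "source"),
    (1, "Attribute 1", "attribute"),
    (2, "KPI 1", "kpi"),
]
edges = [(0, 1), (0, 2)]

# Assumed colour scheme, one colour per kind of node.
KIND_COLOURS = {"source": "skyblue", "attribute": "orange", "kpi": "red"}

ids = [node_id for node_id, _, _ in nodes]
labels = [name for _, name, _ in nodes]
titles = [f"{name} ({kind})" for _, name, kind in nodes]  # tooltip text
colours = [KIND_COLOURS[kind] for _, _, kind in nodes]

# Size each node by the number of edges connected to it.
degree = {node_id: 0 for node_id in ids}
for a, b in edges:
    degree[a] += 1
    degree[b] += 1
sizes = [10 + 5 * degree[node_id] for node_id in ids]

try:
    from pyvis.network import Network
except ImportError:  # keep the sketch runnable without pyvis installed
    Network = None

if Network is not None:
    net = Network(height='800px', width='100%', bgcolor='#222222', font_color='white')
    # Each keyword list must have the same length as the list of node ids.
    net.add_nodes(ids, label=labels, title=titles, color=colours, size=sizes)
    for a, b in edges:
        net.add_edge(a, b)
```

Calling `write_html` on `net`, as in the cell above, would then write the page. Essential element colouring could follow the same pattern with a second lookup table once that mapping is settled.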

## Test

#### From https://stackoverflow.com/questions/63492418/plotting-a-graph-with-neworkx-using-a-csv-file-as-co-occurence-matrix

In [11]:
# creating a dummy adjacency matrix of shape 20x20 with random integer values 0 to 2 (randint's upper bound is exclusive)
adj_mat = np.random.randint(0, 3, size=(20, 20))
np.fill_diagonal(adj_mat, 0)  # setting the diagonal values as 0
dftest = pd.DataFrame(adj_mat)

# create a graph from your dataframe
G = nx.from_pandas_adjacency(dftest)

# visualize it with pyvis
N = Network(height='100%', width='100%', bgcolor='#222222', font_color='white',notebook=True)
N.barnes_hut()
for n in G.nodes:
    N.add_node(int(n))
for e in G.edges:
    N.add_edge(int(e[0]), int(e[1]))

#N.write_html('coocc-graph.html')
N.show('coocc-graph.html')

Local cdn resources have problems on chrome/safari when used in jupyter-notebook. 


In [12]:
print(dftest)

    0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  \
0    0   0   1   1   2   2   0   2   0   2   2   1   0   1   1   1   0   2   
1    1   0   1   1   1   1   0   0   1   1   0   1   0   1   0   2   1   2   
2    2   1   0   0   2   2   2   2   2   1   2   0   1   2   2   1   0   0   
3    2   2   1   0   2   1   1   1   2   0   2   0   2   0   1   1   2   0   
4    1   2   1   1   0   0   1   2   2   2   1   1   2   1   1   2   1   0   
5    1   1   1   0   2   0   0   1   2   2   0   1   1   1   0   1   1   0   
6    2   2   2   2   2   0   0   0   0   2   2   0   0   1   2   2   0   2   
7    2   0   0   2   1   0   0   0   1   0   0   1   0   0   0   2   1   1   
8    0   2   0   1   2   0   0   0   0   0   2   2   2   2   0   1   1   1   
9    0   2   1   2   2   2   1   2   2   0   1   1   1   0   2   2   0   1   
10   2   0   0   0   1   0   0   1   0   0   0   2   0   2   2   1   2   1   
11   0   0   0   1   1   2   1   0   0   1   2   0   0   0   1  

In [13]:
print(G)

Graph with 20 nodes and 166 edges


In [19]:
print(adj_mat)

[[0 0 1 1 2 2 0 2 0 2 2 1 0 1 1 1 0 2 2 2]
 [1 0 1 1 1 1 0 0 1 1 0 1 0 1 0 2 1 2 0 0]
 [2 1 0 0 2 2 2 2 2 1 2 0 1 2 2 1 0 0 2 2]
 [2 2 1 0 2 1 1 1 2 0 2 0 2 0 1 1 2 0 0 1]
 [1 2 1 1 0 0 1 2 2 2 1 1 2 1 1 2 1 0 0 2]
 [1 1 1 0 2 0 0 1 2 2 0 1 1 1 0 1 1 0 1 0]
 [2 2 2 2 2 0 0 0 0 2 2 0 0 1 2 2 0 2 1 2]
 [2 0 0 2 1 0 0 0 1 0 0 1 0 0 0 2 1 1 2 2]
 [0 2 0 1 2 0 0 0 0 0 2 2 2 2 0 1 1 1 0 2]
 [0 2 1 2 2 2 1 2 2 0 1 1 1 0 2 2 0 1 2 0]
 [2 0 0 0 1 0 0 1 0 0 0 2 0 2 2 1 2 1 0 2]
 [0 0 0 1 1 2 1 0 0 1 2 0 0 0 1 1 2 1 0 0]
 [1 1 2 1 0 2 2 0 0 1 1 2 0 1 0 1 1 0 0 2]
 [0 2 1 1 1 0 1 0 2 1 2 1 1 0 2 2 1 1 2 0]
 [1 2 2 0 2 1 2 1 0 1 0 2 1 0 0 0 0 1 0 0]
 [2 1 2 0 0 0 1 1 0 1 2 2 0 0 2 0 2 1 1 1]
 [1 2 2 2 1 1 2 0 2 2 0 1 2 2 2 0 0 1 1 0]
 [0 0 2 1 0 0 0 2 1 1 1 2 0 0 0 2 1 0 1 2]
 [0 0 2 0 0 0 1 1 0 2 0 0 2 0 0 1 0 0 0 1]
 [0 1 2 1 0 2 1 0 0 1 0 0 0 0 0 1 2 2 0 0]]


In [14]:
for n in G.nodes:
    print(n)
for e in G.edges:
    print(e)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
(0, 2)
(0, 3)
(0, 4)
(0, 5)
(0, 7)
(0, 9)
(0, 10)
(0, 11)
(0, 13)
(0, 14)
(0, 15)
(0, 17)
(0, 18)
(0, 19)
(0, 1)
(0, 6)
(0, 12)
(0, 16)
(1, 2)
(1, 3)
(1, 4)
(1, 5)
(1, 8)
(1, 9)
(1, 11)
(1, 13)
(1, 15)
(1, 16)
(1, 17)
(1, 6)
(1, 12)
(1, 14)
(1, 19)
(2, 4)
(2, 5)
(2, 6)
(2, 7)
(2, 8)
(2, 9)
(2, 10)
(2, 12)
(2, 13)
(2, 14)
(2, 15)
(2, 18)
(2, 19)
(2, 3)
(2, 16)
(2, 17)
(3, 4)
(3, 5)
(3, 6)
(3, 7)
(3, 8)
(3, 10)
(3, 12)
(3, 14)
(3, 15)
(3, 16)
(3, 19)
(3, 9)
(3, 11)
(3, 13)
(3, 17)
(4, 6)
(4, 7)
(4, 8)
(4, 9)
(4, 10)
(4, 11)
(4, 12)
(4, 13)
(4, 14)
(4, 15)
(4, 16)
(4, 19)
(4, 5)
(5, 7)
(5, 8)
(5, 9)
(5, 11)
(5, 12)
(5, 13)
(5, 15)
(5, 16)
(5, 18)
(5, 14)
(5, 19)
(6, 9)
(6, 10)
(6, 13)
(6, 14)
(6, 15)
(6, 17)
(6, 18)
(6, 19)
(6, 11)
(6, 12)
(6, 16)
(7, 8)
(7, 11)
(7, 15)
(7, 16)
(7, 17)
(7, 18)
(7, 19)
(7, 9)
(7, 10)
(7, 14)
(8, 10)
(8, 11)
(8, 12)
(8, 13)
(8, 15)
(8, 16)
(8, 17)
(8, 19)
(8, 9)
(9, 10)
(9, 11)
(9, 12)
(9, 14)
(9, 15)
(9, 17

In [37]:
print(sm)

  (0, 0)	1
  (0, 2)	1
  (0, 7)	1
  (0, 32)	1
  (1, 0)	1
  (1, 2)	1
  (1, 29)	1
  (2, 0)	1
  (2, 2)	1
  (3, 0)	1
  (3, 2)	1
  (3, 32)	1
  (4, 0)	1
  (4, 2)	1
  (4, 32)	1
  (5, 0)	1
  (5, 2)	1
  (6, 0)	1
  (6, 2)	1
  (7, 2)	1
  (7, 11)	1
  (7, 41)	1
  (8, 1)	1
  (8, 2)	1
  (8, 32)	1
  :	:
  (573, 35)	1
  (574, 2)	1
  (574, 16)	1
  (574, 32)	1
  (574, 35)	1
  (575, 2)	1
  (575, 35)	1
  (576, 2)	1
  (577, 2)	1
  (577, 36)	1
  (577, 37)	1
  (577, 38)	1
  (577, 41)	1
  (578, 1)	1
  (578, 2)	1
  (578, 35)	1
  (579, 37)	1
  (580, 27)	1
  (580, 32)	1
  (581, 16)	1
  (581, 28)	1
  (581, 32)	1
  (581, 35)	1
  (581, 40)	1
  (582, 41)	1


In [39]:
graphdata.edges

EdgeView([(0, 584), (0, 586), (0, 591), (0, 616), (1, 584), (1, 586), (1, 613), (2, 584), (2, 586), (3, 584), (3, 586), (3, 616), (4, 584), (4, 586), (4, 616), (5, 584), (5, 586), (6, 584), (6, 586), (7, 586), (7, 595), (7, 625), (8, 585), (8, 586), (8, 616), (9, 586), (9, 591), (9, 616), (10, 586), (11, 584), (11, 586), (11, 591), (11, 613), (11, 616), (11, 625), (12, 586), (12, 625), (13, 586), (13, 625), (14, 586), (14, 625), (15, 586), (16, 588), (16, 613), (16, 616), (17, 586), (17, 588), (17, 614), (18, 586), (18, 588), (18, 616), (18, 623), (19, 586), (19, 616), (19, 619), (19, 624), (19, 625), (20, 586), (20, 625), (21, 625), (22, 584), (22, 586), (22, 613), (23, 584), (23, 586), (23, 613), (24, 584), (24, 586), (24, 613), (24, 624), (25, 614), (25, 624), (26, 586), (26, 613), (26, 616), (26, 625), (27, 586), (28, 586), (28, 589), (29, 586), (29, 620), (30, 586), (30, 620), (31, 586), (31, 620), (32, 586), (33, 584), (33, 586), (34, 584), (34, 586), (34, 591), (34, 622), (35, 5