## Last Exercise: Adsorption Algorithm

## 1. Implement Adsorption Algorithm

Fill in the appropriate blanks.

Note that Adsorption, like the Label Propagation family in general, is supposed to be fast. You therefore do not need to check for convergence; just run a fixed maximum number of iterations.

The update can be either synchronous (you update l_inferred only after iterating over every node) or asynchronous (you update the current node's l_inferred immediately, then proceed to the next node in the same iteration). Both are fine. Here I implemented the asynchronous version.
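To make the difference between the two update schedules concrete, here is a minimal sketch on a hypothetical four-node path graph. It uses plain neighbor averaging with one clamped seed, not the Adsorption update itself, and the names `toy_neighbors` and `one_sweep` are illustrative assumptions rather than part of the exercise.

```python
import numpy as np

# Hypothetical toy graph: a path 0 - 1 - 2 - 3, given as adjacency lists.
toy_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

def one_sweep(scores, synchronous):
    """Run one pass of neighbor averaging; node 0 is a clamped seed."""
    # Synchronous: read from a snapshot taken before the sweep.
    # Asynchronous: read the live array, so earlier updates are visible.
    source = scores.copy() if synchronous else scores
    for node, neighs in toy_neighbors.items():
        if node == 0:
            continue  # keep the seed value fixed
        scores[node] = np.mean([source[n] for n in neighs])
    return scores

start = np.array([1.0, 0.0, 0.0, 0.0])
sync_result = one_sweep(start.copy(), synchronous=True)
async_result = one_sweep(start.copy(), synchronous=False)
print(sync_result)   # the seed's score reaches only node 1
print(async_result)  # the seed's score reaches every node in one sweep
```

In this sketch the asynchronous sweep spreads information further per iteration, which is one reason a small maximum number of iterations can be enough.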

In [None]:
import networkx as nx
from math import log
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt
import random
%matplotlib inline

In [None]:
# Load the Graph of Zachary's Karate Club
G = nx.karate_club_graph()
labels = [lab[1] for lab in G.nodes(data = 'club')]
colors = ["blue"  if (label == 'Mr. Hi') else 'green' for label in labels]

In [None]:
# Ground Truth For The Karate Graph
pos = nx.spring_layout(G)
nx.draw(G, with_labels=True, node_color = colors, pos = pos, label = 'Ground Truth')
plt.show()

In [None]:
# Initialize vectors and parameters
nodes_num = len(G.nodes())
edge_num = len(G.edges())
label_num = len(set(labels)) + 1
degrees = np.array([tup[1] for tup in G.degree()])

In [None]:
# We need seeds to propagate their label. You can randomly use some of them.  
def randomly_sample_ground_truth(frac):
    l_apriori = np.zeros((nodes_num, label_num))

    # Ground truth
    sample = random.sample(list(G.nodes()), int(nodes_num * frac))
    club_index_mapping = {
        'Mr. Hi': 0,
        'Officer': 1
    }

    for node in sample:
        club = G.nodes[node]['club']
        club_index = club_index_mapping[club]
        l_apriori[int(node)][club_index] = 1

    return l_apriori

In [None]:
seed_l_apriori = randomly_sample_ground_truth(frac = 0.5)

In [None]:
color_map = []
for node_labels in seed_l_apriori:
    if (node_labels[0] == 1):
        color_map.append('blue')
    elif(node_labels[1] == 1):
        color_map.append('green')
    else:
        color_map.append('red')

nx.draw(G, with_labels=True, node_color = color_map, pos = pos, label = 'Seed Nodes')
plt.show()

Note that the probability parameters in the original paper proposing the algorithm (and in the lecture) are set heuristically. If the outcome is not as expected, e.g. every node is labeled as unknown, play with the probability parameters.

Warning: For this graph, set abandon probability to zero.

In [None]:
H = np.zeros(nodes_num)
for i, node in enumerate(G.nodes()):
    H[i] = # fill in

c = np.zeros(nodes_num)
for i in range(0, nodes_num):
    c[i] = # fill in

d = np.zeros(nodes_num)
labeled = np.sum(seed_l_apriori, axis=1)  # non-zero for seed nodes; l_apriori is only defined in a later cell
z = np.zeros(nodes_num)

for i in range(0, nodes_num):
    if (labeled[i] != 0):
        d[i] = # fill in

    z[i] = # fill in

p_inj = # fill in
p_cont = # fill in
p_abandon = # fill in

In [None]:
l_apriori = np.array(seed_l_apriori)
l_inferred = # fill in
l_unknown = np.zeros((nodes_num, label_num))
l_unknown[:, -1] = # fill in

In [None]:
# THE ALGORITHM
max_iter = 3
for _ in range(max_iter):  # iteration counter is unused; avoids shadowing the node index i below
    for i, node in enumerate(G.nodes()):
        l_inferred_temp = np.zeros((label_num))
        for neigh in G.neighbors(n=node):  # note: in the example graph, indexes are also node names
            l_inferred_temp += # fill in
        
        inj = # fill in
        cont = # fill in
        abandon = # fill in

        l_inferred[i] = # fill in


In [None]:
# FINAL LABELS
color_map = []
labeled = 0
for node_labels in l_inferred:
    if (node_labels[0] > 0.5):
        color_map.append('blue')
        labeled += 1
    elif(node_labels[1] > 0.5):
        color_map.append('green')
        labeled += 1
    else:
        color_map.append('red')

nx.draw(G, with_labels=True, node_color = color_map, pos = pos, label = 'Final Labels')
plt.show()

In [None]:
print(l_inferred)

## 2. Compare With Louvain Method

First, install the Louvain package for NetworkX with `pip install python-louvain`.

Then run the following code to see the results. How does the outcome differ from the ground truth and from the outcome of the Adsorption algorithm?

In [None]:
import community  # Louvain implementation provided by the python-louvain package

community_dict = community.best_partition(G)
print(community_dict)

# Color mapping for community names
colors = []
for key, item in community_dict.items():
    print(item)
    if(item == 0):
        colors.append('yellow')
    elif(item == 1):
        colors.append('green')
    elif(item == 2):
        colors.append('blue')
    elif(item == 3):
        colors.append('magenta')
    elif(item == 4):
        colors.append('cyan')
    elif(item == 5):
        colors.append('black')
    else:
        colors.append('white')


print(colors)
plt.clf()
nx.draw(G, with_labels=True, node_color = colors, pos = pos, label = 'Louvain Labels')
plt.show()
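One possible way to approach the comparison question is to count, for each detected community, how many of its members belong to each ground-truth club. The sketch below does this on hypothetical toy inputs; the helper name `club_counts_per_community` and the toy dictionaries are assumptions made for illustration only.

```python
from collections import Counter

def club_counts_per_community(community_of, club_of):
    """Count ground-truth clubs inside each detected community."""
    table = {}
    for node, comm in community_of.items():
        table.setdefault(comm, Counter())[club_of[node]] += 1
    return table

# Hypothetical toy inputs (node -> community id, node -> club name).
toy_communities = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}
toy_clubs = {0: 'Mr. Hi', 1: 'Mr. Hi', 2: 'Mr. Hi', 3: 'Officer', 4: 'Officer'}

table = club_counts_per_community(toy_communities, toy_clubs)
for comm, counts in sorted(table.items()):
    print(comm, dict(counts))
```

In the notebook you could pass `community_dict` and `{n: G.nodes[n]['club'] for n in G.nodes()}` instead of the toy dictionaries. A community whose counter contains both clubs cuts across the ground-truth split.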

## 3. Manually Labeled Seeds (Exercise)

In the first part, we used a random sample of the ground truth as seeds, so your results may have varied depending on which nodes happened to be drawn.

Try labeling a handful of nodes (cheating from the ground truth) and evaluate the results. 

First, try labeling high-degree nodes, then try labeling low-degree nodes.
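As a starting point for choosing such seeds, the following sketch ranks nodes by degree and returns the top or bottom `k`. The helper `pick_seeds_by_degree` and the toy degree dictionary are hypothetical; ties are broken by node id, which is an arbitrary choice.

```python
def pick_seeds_by_degree(degree_of, k, highest=True):
    """Return k node ids with the highest (or lowest) degree."""
    ranked = sorted(degree_of, key=lambda n: (degree_of[n], n), reverse=highest)
    return ranked[:k]

# Hypothetical toy degrees (node -> degree).
toy_degrees = {0: 4, 1: 1, 2: 3, 3: 1, 4: 2}

high_seeds = pick_seeds_by_degree(toy_degrees, k=2, highest=True)
low_seeds = pick_seeds_by_degree(toy_degrees, k=2, highest=False)
print(high_seeds)
print(low_seeds)
```

In the notebook, `dict(G.degree())` could be passed as `degree_of`, and the chosen nodes could then be written into an `l_apriori` array in the same way `randomly_sample_ground_truth` does for its random sample.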

In a real setting, when you have huge amounts of unlabeled data, you can use this method: manually annotate some seed nodes, then predict the rest.