# Creating and Manipulating Graphs

Eight employees at a small company were asked to choose 3 movies that they would most enjoy watching for the upcoming company movie night. These choices are stored in the file `Employee_Movie_Choices.txt`.

A second file, `Employee_Relationships.txt`, has data on the relationships between different coworkers. 

The relationship score ranges from `-100` (Enemies) to `+100` (Best Friends). A value of zero means the two employees haven't interacted or are indifferent.

Both files are tab delimited.
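As a rough illustration of the tab-delimited layout, the sketch below parses three relationship rows into a graph with `nx.parse_edgelist`. The rows and their scores are made up for the example and are not taken from `Employee_Relationships.txt`; the attribute name `score` is also an assumption.

```python
import networkx as nx

# Hypothetical rows in the assumed layout: employee <TAB> employee <TAB> score
lines = [
    "Andy\tClaude\t0",
    "Andy\tFrida\t20",
    "Claude\tFrida\t-10",
]

# data=[("score", int)] names the third column and converts it to int
R = nx.parse_edgelist(lines, delimiter="\t", data=[("score", int)])

for u, v, score in R.edges(data="score"):
    print(u, v, score)
```

`nx.read_edgelist` accepts the same `delimiter` and `data` arguments when reading from a file path.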

---

_You are currently looking at **version 1.0** of this notebook._

---

### Import

In [None]:
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx
from itertools import permutations

from networkx.algorithms import bipartite

### Graph basics

In [None]:
G = nx.Graph()   # or DiGraph, MultiGraph, MultiDiGraph, etc
nx.add_path(G, [0, 1, 2])   # G.add_path() was removed in NetworkX 2.4
G.number_of_nodes()
G.number_of_edges() 
G.nodes()
G.edges()
G.degree()

In [None]:
nx.draw_networkx(G)

In [None]:
G.add_node(1, time='5pm')
G.nodes(data=True) 
G.edges(data=True)

In [None]:
G.add_node(1, type='type descr')
G.nodes(data=True)
G.edges()

In [None]:
G[0]
G[0][1] # edge between nodes 0 and 1
G[1][0]
nx.get_node_attributes(G, 'time')

In [None]:
nx.spring_layout(G) # plot coords

### Data

In [None]:
# This is the set of employees
employees = set(['Pablo',
                 'Lee',
                 'Georgia',
                 'Vincent',
                 'Andy',
                 'Frida',
                 'Joan',
                 'Claude'])

# This is the set of movies
movies = set(['The Shawshank Redemption',
              'Forrest Gump',
              'The Matrix',
              'Anaconda',
              'The Social Network',
              'The Godfather',
              'Monty Python and the Holy Grail',
              'Snakes on a Plane',
              'Kung Fu Panda',
              'The Dark Knight',
              'Mean Girls'])

### Helper to plot graph

In [None]:
def plot_graph(G, weight_name=None):
    '''Plot a NetworkX graph.
    G: a networkx graph
    weight_name: name of the edge attribute used for line widths and edge labels (if G is weighted)
    '''
    plt.figure()
    pos = nx.spring_layout(G)
    edges = list(G.edges())
    
    if weight_name:
        # Use the attribute both as the line width and as the edge label
        weights = [int(G[u][v][weight_name]) for u, v in edges]
        labels = nx.get_edge_attributes(G, weight_name)
        nx.draw_networkx_edge_labels(G, pos, edge_labels=labels)
        # draw_networkx expects `edgelist`, not `edges`
        nx.draw_networkx(G, pos, edgelist=edges, width=weights)
    else:
        nx.draw_networkx(G, pos, edgelist=edges)

### Load bipartite graph

Using NetworkX, load in the bipartite graph from `Employee_Movie_Choices.txt` and return that graph.

In [None]:
# !find ../.. | grep -i Employee_Movie_Choices.txt

In [None]:
G = nx.read_edgelist('../_data/Employee_Movie_Choices.txt', delimiter='\t')
G.edges(data=True)

plot_graph(G)

### Node attributes

Using the graph from the previous question, add a node attribute named `'type'`, where movies have the value `'movie'` and employees have the value `'employee'`, and return that graph.

In [None]:
G_df = pd.read_csv('../_data/Employee_Movie_Choices.txt', delimiter='\t', skiprows=1, names=['employee', 'movie'])
G_df.sample(3)

G = nx.read_edgelist('../_data/Employee_Movie_Choices.txt', delimiter='\t')
# add_node on an existing node only updates its attributes
for n in list(G.nodes):
    if n in employees:
        G.add_node(n, type='employee')
    elif n in movies:
        G.add_node(n, type='movie')
G.nodes(data=True)

plot_graph(G)
# G.edges(data=True)
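The cell above tags nodes one at a time with `add_node`. An alternative sketch, shown here on a toy graph, builds a node-to-type mapping and applies it in one call with `nx.set_node_attributes`. The toy edges are invented for illustration, and the mapping assumes every node that is not an employee is a movie.

```python
import networkx as nx

# Toy data for illustration only; not the real employee choices
toy_employees = {"Andy", "Joan"}
T = nx.Graph([
    ("Andy", "Anaconda"),
    ("Andy", "Mean Girls"),
    ("Joan", "Mean Girls"),
])

# Assumption: anything that is not an employee is a movie
types = {n: "employee" if n in toy_employees else "movie" for n in T}
nx.set_node_attributes(T, values=types, name="type")

print(nx.get_node_attributes(T, "type"))
```

Passing `values` and `name` as keywords avoids depending on the positional order of those two arguments.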

### Weighted projection of the graph

Find a weighted projection of the graph which tells us how many movies different pairs of employees have in common.

In [None]:
# G.node was removed in NetworkX 2.4; use G.nodes instead
L = [x for x in G if G.nodes[x]['type']=='employee']
R = [x for x in G if G.nodes[x]['type']=='movie']
L
R

#### Bipartite graph

In [None]:
B = nx.Graph() 
B.add_nodes_from(L, bipartite=0)
B.add_nodes_from(R, bipartite=1)
B.nodes(data=True)

B.add_edges_from(G.edges())
B.edges(data=True)
assert bipartite.is_bipartite(B) # Check if B is bipartite

plot_graph(B)

#### Bipartite graph with weighted projection

The edge attribute `weight` is the number of movies two employees have in common.

In [None]:
G2 = bipartite.weighted_projected_graph(B, L)
G2.edges(data=True)
plot_graph(G2, 'weight')
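To make the meaning of `weight` concrete, here is a small sketch on a toy bipartite graph (the choices are invented, not the real data). It compares the projected edge weight with a direct count of shared neighbors.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy bipartite graph: three employees, two movies (illustrative only)
people = ["Andy", "Joan", "Lee"]
toy = nx.Graph()
toy.add_nodes_from(people, bipartite=0)
toy.add_edges_from([
    ("Andy", "Anaconda"), ("Andy", "Mean Girls"),
    ("Joan", "Anaconda"), ("Joan", "Mean Girls"),
    ("Lee", "Mean Girls"),
])

# Project onto the employee side; weight counts shared movies
proj = bipartite.weighted_projected_graph(toy, people)

# The same number computed by hand as the size of the neighbor intersection
shared = len(set(toy["Andy"]) & set(toy["Joan"]))

print(proj["Andy"]["Joan"]["weight"], shared)
```

Pairs with no movie in common get no edge at all in the projection, which is why the later correlation step has to treat a missing weight as 0.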

Number of edges per employee, i.e. the number of coworkers with whom each employee shares at least one movie:

In [None]:
G2.degree
G2.degree['Andy']

### Question 4

Suppose you'd like to find out if people that have a high relationship score also like the same types of movies.

Find the Pearson correlation (using `DataFrame.corr()`) between employee relationship scores and the number of movies they have in common. If two employees have no movies in common, it should be treated as a 0, not a missing value, and should be included in the correlation calculation.

In [None]:
G_df = pd.read_csv('../_data/Employee_Relationships.txt', delimiter='\t', skiprows=1, names=['emp1', 'emp2', 'score'])
G_df.sample(3)

#### Convert dataframe to graph

In [None]:
# from_pandas_dataframe was renamed to from_pandas_edgelist in NetworkX 2.1
G1 = nx.from_pandas_edgelist(G_df, 'emp1', 'emp2', edge_attr='score')
G1.edges(data=True)

#### Compose: merge two graphs

In [None]:
G3 = nx.compose(G1, G2)
G3.edges(data=True)
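The following toy sketch shows what `nx.compose` does with edge attributes: an edge present in both graphs ends up with the attributes of both, while an edge present in only one graph keeps just its own. The scores and weights here are invented for illustration.

```python
import networkx as nx

# Relationship scores (toy values)
scores = nx.Graph()
scores.add_edge("Andy", "Joan", score=30)
scores.add_edge("Andy", "Lee", score=-20)

# Shared-movie counts (toy values); Andy and Lee share no movie, so no edge
shared_movies = nx.Graph()
shared_movies.add_edge("Andy", "Joan", weight=2)

merged = nx.compose(scores, shared_movies)

print(merged["Andy"]["Joan"])  # has both 'score' and 'weight'
print(merged["Andy"]["Lee"])   # has only 'score'
```

If both graphs define the same attribute on the same edge, the value from the second argument takes precedence.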

In [None]:
df = pd.DataFrame(list(G3.edges(data=True)), columns=['emp1', 'emp2', 'score_weight'])
df.sample(5)

#### Split score_weight into two columns

In [None]:
def fun(x, ftr):
    # x is the edge-attribute dict; a missing key (e.g. no movies in common) counts as 0
    return x.get(ftr, 0)

df['score'] = df['score_weight'].map(lambda x: fun(x, 'score'))
df['weight'] = df['score_weight'].map(lambda x: fun(x, 'weight'))
df.sample(5)

### Correlation

Correlation between relationship score and number of movies in common.

In [None]:
df['weight'].corr(df['score'])
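As a self-contained sketch of the zero-filling requirement, the example below uses a small made-up table in which two pairs have no shared movies. Filling the missing weights with 0 before calling `corr` keeps those pairs in the calculation; pandas would otherwise drop them.

```python
import pandas as pd

# Toy values for illustration; None marks pairs with no movie in common
pairs = pd.DataFrame({
    "score":  [30, -20, 10, 0],
    "weight": [2, None, 1, None],
})

# Treat "no movies in common" as 0 rather than as a missing value
pairs["weight"] = pairs["weight"].fillna(0)

# Series.corr uses the Pearson method by default
r = pairs["score"].corr(pairs["weight"])
print(round(r, 3))
```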