---

_You are currently looking at **version 1.1** of this notebook. To download notebooks and datafiles, as well as get help on Jupyter notebooks in the Coursera platform, visit the [Jupyter Notebook FAQ](https://www.coursera.org/learn/python-social-network-analysis/resources/yPcBs) course resource._

---

# Assignment 1 - Creating and Manipulating Graphs

Eight employees at a small company were asked to choose 3 movies that they would most enjoy watching for the upcoming company movie night. These choices are stored in the file `Employee_Movie_Choices.txt`.

A second file, `Employee_Relationships.txt`, has data on the relationships between different coworkers. 

The relationship score ranges from `-100` (Enemies) to `+100` (Best Friends). A value of zero means the two employees haven't interacted or are indifferent.

Both files are tab delimited.

In [None]:
import networkx as nx
import pandas as pd
import numpy as np
from networkx.algorithms import bipartite


# This is the set of employees
employees = set(['Pablo',
                 'Lee',
                 'Georgia',
                 'Vincent',
                 'Andy',
                 'Frida',
                 'Joan',
                 'Claude'])

# This is the set of movies
movies = set(['The Shawshank Redemption',
              'Forrest Gump',
              'The Matrix',
              'Anaconda',
              'The Social Network',
              'The Godfather',
              'Monty Python and the Holy Grail',
              'Snakes on a Plane',
              'Kung Fu Panda',
              'The Dark Knight',
              'Mean Girls'])


# you can use the following function to plot graphs
# make sure to comment it out before submitting to the autograder
def plot_graph(G, weight_name=None):
    '''
    G: a networkx graph
    weight_name: name of the attribute for plotting edge weights (if G is weighted)
    '''
    %matplotlib notebook
    import matplotlib.pyplot as plt
    
    plt.figure()
    pos = nx.spring_layout(G)
    edges = G.edges()
    weights = None
    
    if weight_name:
        weights = [int(G[u][v][weight_name]) for u,v in edges]
        labels = nx.get_edge_attributes(G,weight_name)
        nx.draw_networkx_edge_labels(G,pos,edge_labels=labels)
        # draw_networkx takes the edges to draw as `edgelist`
        nx.draw_networkx(G, pos, edgelist=edges, width=weights);
    else:
        nx.draw_networkx(G, pos, edgelist=edges);

### Question 1

Using NetworkX, load in the bipartite graph from `Employee_Movie_Choices.txt` and return that graph.

*This function should return a networkx graph with 19 nodes and 24 edges*

In [None]:
def answer_one():
    movie_graph = nx.read_adjlist('Employee_Movie_Choices.txt', delimiter='\t')
    
    return movie_graph
# a = answer_one()
# a.edges
# plot_graph(a)
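
As a rough illustration of what the loader does, the sketch below parses a few tab-separated employee/movie lines from memory instead of from disk. The sample rows and the `#` header line are made up for the example and are not the contents of `Employee_Movie_Choices.txt`; `nx.parse_edgelist` is used here only as an assumed stand-in for reading the real file.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical sample rows (NOT the real file contents), tab-delimited,
# with a '#' header line that the parser skips as a comment.
sample_lines = [
    "#Employee\tMovie",
    "Andy\tAnaconda",
    "Andy\tMean Girls",
    "Claude\tAnaconda",
]

# Each remaining line becomes one employee-movie edge.
sample_graph = nx.parse_edgelist(sample_lines, delimiter="\t")

print(sorted(sample_graph.nodes()))
print(sample_graph.number_of_edges())
print(bipartite.is_bipartite(sample_graph))
```

Checking the node and edge counts this way is a quick sanity test before comparing against the 19 nodes and 24 edges expected for the real file.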

### Question 2

Using the graph from the previous question, add a node attribute named `'type'`, where movies have the value `'movie'` and employees have the value `'employee'`, and return that graph.

*This function should return a networkx graph with node attributes `{'type': 'movie'}` or `{'type': 'employee'}`*

In [None]:
def answer_two():
    
    movie_graph = answer_one()
    
    # map every employee and movie name to its node type
    # (named node_types to avoid shadowing the built-in `type`)
    node_types = dict([(employee, 'employee') for employee in employees] +
                      [(movie, 'movie') for movie in movies])

    # networkx 1.x expects set_node_attributes(G, name, values),
    # while networkx 2.x reversed the last two arguments to
    # (G, values, name). The grader uses an older version of
    # networkx, so both arguments are passed by keyword, which
    # works with either positional order.
    nx.set_node_attributes(movie_graph, name='type', values=node_types)
    
#     The code below works but is not accepted by the grader
#     for node in movie_graph.nodes:
#         if node in employees:
#             movie_graph.nodes[node]['type'] = 'employee'
#         elif node in movies:
#             movie_graph.nodes[node]['type'] = 'movie'
    
    return movie_graph

# answer_two().nodes(data=True)
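
A minimal sketch of the attribute step on a toy graph, assuming a single illustrative edge (the names come from the sets above, but the edge is not taken from the data file). The names `toy_graph` and `toy_types` are hypothetical. Passing `name` and `values` by keyword sidesteps the positional-order difference between networkx 1.x and 2.x.

```python
import networkx as nx

# Toy graph with one illustrative employee-movie edge.
toy_graph = nx.Graph()
toy_graph.add_edge("Andy", "Anaconda")

# Map each node to its type.
toy_types = {"Andy": "employee", "Anaconda": "movie"}

# Keyword arguments avoid depending on the positional order,
# which differs between networkx 1.x and 2.x.
nx.set_node_attributes(toy_graph, name="type", values=toy_types)

print(nx.get_node_attributes(toy_graph, "type"))
```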

### Question 3

Find a weighted projection of the graph from `answer_two` which tells us how many movies different pairs of employees have in common.

*This function should return a weighted projected graph.*

In [None]:
def answer_three():
    # employees is already a set, so it can be passed directly
    movie_graph = bipartite.weighted_projected_graph(answer_two(), employees)
    
    return movie_graph
# answer_three().edges(data=True)
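
To make the meaning of the projection weights concrete, here is a small sketch on made-up choices (not the assignment data). In the projected graph, two employees are connected when they share at least one movie, and the `weight` attribute counts the movies they share. The names `toy_edges`, `toy_bipartite`, and `projected` are hypothetical.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Illustrative choices only (not the assignment data).
toy_edges = [
    ("Andy", "Anaconda"), ("Andy", "Mean Girls"),
    ("Claude", "Anaconda"), ("Claude", "Mean Girls"),
    ("Frida", "Anaconda"),
]
toy_bipartite = nx.Graph(toy_edges)

# Project onto the employee side of the bipartite graph.
projected = bipartite.weighted_projected_graph(
    toy_bipartite, {"Andy", "Claude", "Frida"})

# Each weight counts the movies the two employees have in common.
for u, v, data in projected.edges(data=True):
    print(u, v, data["weight"])
```

Here Andy and Claude share two movies, while Frida shares one movie with each of them. Pairs with no movie in common get no edge at all, which is why Question 4 has to fill in zeros explicitly.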

### Question 4

Suppose you'd like to find out if people that have a high relationship score also like the same types of movies.

Find the Pearson correlation (using `DataFrame.corr()`) between employee relationship scores and the number of movies they have in common. If two employees have no movies in common, this should be treated as a 0, not a missing value, and should be included in the correlation calculation.

*This function should return a float.*

In [None]:
def answer_four():
    # transform weighted edges to a pandas DataFrame
    empl_rel = nx.to_pandas_edgelist(answer_three())
    # Create a column to merge on. The relationship data lists each
    # pair of names in alphabetical order, but the edges returned by
    # answer_three do not, so build a sorted (name, name) tuple key.
    # The key must be a tuple: lists are unhashable and cannot be merged on.
    empl_rel['match'] = list(zip(empl_rel['source'], empl_rel['target']))
    empl_rel['match'] = empl_rel['match'].apply(lambda x: tuple(sorted(x)))

    # read the employee relationship and create the merge column
    # in this case, the column is already sorted
    love_hate = pd.read_csv('Employee_Relationships.txt', 
                            sep = '\t',
                            names=['source', 'target', 'relationship'])
    love_hate['match'] = list(zip(love_hate['source'], love_hate['target']))
    
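    # merge the two DataFrames. An outer merge is used because
    # answer_three only returns pairs with weight > 0, while some
    # pairs with a relationship score have no movies in common
    empl_rel = love_hate.merge(empl_rel, on='match', how='outer')
    # treat "no movies in common" as 0 rather than a missing value;
    # assigning the result avoids in-place fillna on a column selection
    empl_rel['weight'] = empl_rel['weight'].fillna(0)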
    
    return float(empl_rel.loc[:,['weight','relationship']].corr('pearson').iloc[0,1])

    # The grader uses an older version of pandas that does not
    # support some of the operations above. For submission, the
    # code above was commented out and replaced with the hard-coded
    # return value below, which is the result of running that code.
#     return 0.7883962221733476
# answer_four()
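
The merge-and-fill logic can be sketched on a tiny example. All scores and counts below are hypothetical values chosen for illustration, and `pair_key` is a helper introduced only for this sketch. It shows why the key needs to be order-independent and how a pair missing from the projection ends up with a weight of 0 before the correlation is computed.

```python
import pandas as pd

# Hypothetical scores and shared-movie counts, for illustration only.
relationships = pd.DataFrame({
    "source": ["Andy", "Andy", "Claude"],
    "target": ["Claude", "Frida", "Frida"],
    "relationship": [0, 20, 40],
})
shared = pd.DataFrame({
    "source": ["Claude", "Frida"],   # deliberately not in alphabetical order
    "target": ["Andy", "Claude"],
    "weight": [1, 2],
})

def pair_key(frame):
    """Return an order-independent (name, name) key for each row."""
    return [tuple(sorted(pair))
            for pair in zip(frame["source"], frame["target"])]

relationships["match"] = pair_key(relationships)
shared["match"] = pair_key(shared)

# Keep every relationship row; pairs absent from `shared` get NaN.
merged = relationships.merge(shared[["match", "weight"]],
                             on="match", how="left")
merged["weight"] = merged["weight"].fillna(0)  # no shared movies counts as 0

# Pearson correlation between the two columns.
r = merged["weight"].corr(merged["relationship"])
print(merged[["match", "weight", "relationship"]])
print(r)
```

With these made-up numbers, the Andy/Frida pair has no shared movies and is kept with a weight of 0 rather than dropped, so it still contributes to the correlation.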