---

_You are currently looking at **version 1.1** of this notebook. To download notebooks and datafiles, as well as get help on Jupyter notebooks in the Coursera platform, visit the [Jupyter Notebook FAQ](https://www.coursera.org/learn/python-social-network-analysis/resources/yPcBs) course resource._

---

# Assignment 1 - Creating and Manipulating Graphs

Eight employees at a small company were asked to choose 3 movies that they would most enjoy watching for the upcoming company movie night. These choices are stored in the file `Employee_Movie_Choices.txt`.

A second file, `Employee_Relationships.txt`, has data on the relationships between different coworkers. 

The relationship score ranges from `-100` (Enemies) to `+100` (Best Friends). A value of zero means the two employees haven't interacted or are indifferent.

Both files are tab delimited.

In [2]:
import networkx as nx
import pandas as pd
import numpy as np
from networkx.algorithms import bipartite


# This is the set of employees
employees = set(['Pablo',
                 'Lee',
                 'Georgia',
                 'Vincent',
                 'Andy',
                 'Frida',
                 'Joan',
                 'Claude'])

# This is the set of movies
movies = set(['The Shawshank Redemption',
              'Forrest Gump',
              'The Matrix',
              'Anaconda',
              'The Social Network',
              'The Godfather',
              'Monty Python and the Holy Grail',
              'Snakes on a Plane',
              'Kung Fu Panda',
              'The Dark Knight',
              'Mean Girls'])


# you can use the following function to plot graphs
# make sure to comment it out before submitting to the autograder

#def plot_graph(G, weight_name=None):
#    '''
#    G: a networkx graph
#    weight_name: name of the attribute for plotting edge weights (if G is weighted)
#    '''
#    %matplotlib notebook
#    import matplotlib.pyplot as plt
#    
#    plt.figure()
#    pos = nx.spring_layout(G)
#    edges = G.edges()
#    weights = None
#    
#    if weight_name:
#        weights = [int(G[u][v][weight_name]) for u,v in edges]
#        labels = nx.get_edge_attributes(G, weight_name)
#        nx.draw_networkx_edge_labels(G, pos, edge_labels=labels)
#        # draw_networkx takes the edges to draw via the `edgelist` keyword
#        nx.draw_networkx(G, pos, edgelist=edges, width=weights);
#    else:
#        nx.draw_networkx(G, pos, edgelist=edges);

### Question 1

Using NetworkX, load in the bipartite graph from `Employee_Movie_Choices.txt` and return that graph.

*This function should return a networkx graph with 19 nodes and 24 edges*
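
As a rough illustration of what loading a tab-delimited edge list involves, here is a minimal sketch on made-up data. The names `Ann`, `Bob`, and the movie titles are hypothetical and are not taken from `Employee_Movie_Choices.txt`. It uses `nx.parse_edgelist`, which reads an iterable of lines, in place of a file on disk.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical "employee<TAB>movie" lines (not the real file contents)
lines = [
    "Ann\tAlien",
    "Ann\tHeat",
    "Bob\tHeat",
]

# Splitting on tabs (rather than any whitespace) keeps multi-word titles intact
toy = nx.parse_edgelist(lines, delimiter="\t")

n_nodes = toy.number_of_nodes()
n_edges = toy.number_of_edges()
is_bip = bipartite.is_bipartite(toy)
print(n_nodes, n_edges, is_bip)
```

The same node and edge counts are a quick sanity check against the 19 nodes and 24 edges expected for the real file.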

In [10]:
#!cat -T "Employee_Movie_Choices.txt"
# Tabs are shown as ^I  --> tab separated

def answer_one():
    # Each line of the file is "employee<TAB>movie". Splitting on tabs keeps
    # movie titles that contain spaces as single nodes.
    G = nx.read_edgelist('Employee_Movie_Choices.txt', delimiter="\t")
    return G

#answer_one().edges()
#answer_one().nodes()

### Question 2

Using the graph from the previous question, add nodes attributes named `'type'` where movies have the value `'movie'` and employees have the value `'employee'` and return that graph.

*This function should return a networkx graph with node attributes `{'type': 'movie'}` or `{'type': 'employee'}`*
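
One way to attach a `'type'` attribute to every node is sketched below on a small made-up graph. The node names and the `toy_employees` set are hypothetical. The sketch relies on the fact that calling `add_node` on an existing node updates its attributes without creating a duplicate.

```python
import networkx as nx

# Hypothetical toy bipartite graph: two employees who both picked one movie
toy = nx.Graph()
toy.add_edges_from([("Ann", "Alien"), ("Bob", "Alien")])
toy_employees = {"Ann", "Bob"}

# Iterate over a copy of the node list while updating attributes
for node in list(toy.nodes()):
    kind = "employee" if node in toy_employees else "movie"
    toy.add_node(node, type=kind)

types = nx.get_node_attributes(toy, "type")
print(types)
```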

In [4]:
def answer_two():
    G = answer_one()

    # add_node on an existing node only updates that node's attributes
    for n in list(G.nodes()):
        if n in employees:
            G.add_node(n, type="employee")
        elif n in movies:
            G.add_node(n, type="movie")

    return G

#answer_two().nodes(data=True)
#plot_graph(answer_two())

### Question 3

Find a weighted projection of the graph from `answer_two` which tells us how many movies different pairs of employees have in common.

*This function should return a weighted projected graph.*
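
To make the idea of a weighted projection concrete, here is a small sketch on hypothetical data (the names and titles are invented). In the projection onto the employee side, two employees are connected if they share at least one movie, and the edge's `weight` counts how many movies they share.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical employee-movie graph
B = nx.Graph()
B.add_edges_from([
    ("Ann", "Alien"), ("Ann", "Heat"),
    ("Bob", "Alien"), ("Bob", "Heat"),
    ("Cy", "Heat"),
])

# Project onto the employee nodes only
P = bipartite.weighted_projected_graph(B, ["Ann", "Bob", "Cy"])

w_ann_bob = P["Ann"]["Bob"]["weight"]  # Ann and Bob share Alien and Heat
w_ann_cy = P["Ann"]["Cy"]["weight"]    # Ann and Cy share only Heat
print(w_ann_bob, w_ann_cy)
```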

In [9]:
def answer_three():
    G = answer_two()

    # Project onto the employee nodes; each edge weight is the number of
    # movies the two employees both chose
    P = bipartite.weighted_projected_graph(G, employees)

    return P

#answer_three()
#plot_graph(answer_three(), weight_name='weight')
#answer_three().edges(data=True)
#answer_three().nodes(data=True)

[('Andy', {'type': 'employee'}),
 ('Joan', {'type': 'employee'}),
 ('Pablo', {'type': 'employee'}),
 ('Vincent', {'type': 'employee'}),
 ('Frida', {'type': 'employee'}),
 ('Lee', {'type': 'employee'}),
 ('Claude', {'type': 'employee'}),
 ('Georgia', {'type': 'employee'})]

### Question 4

Suppose you'd like to find out if people who have a high relationship score also like the same types of movies.

Find the Pearson correlation (using `DataFrame.corr()`) between employee relationship scores and the number of movies they have in common. If two employees have no movies in common it should be treated as a 0, not a missing value, and should be included in the correlation calculation.

*This function should return a float.*
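
The zero-filling requirement can be sketched on made-up numbers as follows. The pairs, scores, and the `shared` lookup are all hypothetical. Pairs missing from the shared-movie lookup get a count of 0 and stay in the table, so they still contribute to the correlation.

```python
import pandas as pd

# Hypothetical relationship scores, one row per pair
scores = pd.DataFrame({
    "person_a": ["Ann", "Ann", "Bob"],
    "person_b": ["Bob", "Cy", "Cy"],
    "score": [50, -20, 0],
})

# Hypothetical shared-movie counts; only pairs with at least one shared movie appear.
# frozenset keys make the lookup independent of the order of the two names.
shared = {frozenset(("Ann", "Bob")): 2}

# Missing pairs are treated as 0 rather than NaN
scores["common"] = [
    shared.get(frozenset((a, b)), 0)
    for a, b in zip(scores["person_a"], scores["person_b"])
]

# DataFrame.corr() computes Pearson correlation by default
r = scores[["score", "common"]].corr().loc["score", "common"]
print(r)
```

Keying the lookup on an unordered pair avoids having to sort the two names into a canonical order before joining.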

In [13]:
#!cat Employee_Relationships.txt
def answer_four():
    # Weighted projection: edge weight = number of movies two employees share
    P = answer_three()

    # Relationship scores, one row per pair of employees (tab delimited, no header)
    rel = pd.read_csv('Employee_Relationships.txt', sep='\t', header=None,
                      names=['person_a', 'person_b', 'liking_score'])

    # Look up the shared-movie count for each pair. The graph is undirected,
    # so the order of the two names does not matter. Pairs without an edge
    # have no movies in common and are counted as 0, not as missing.
    rel['common_movies'] = [
        P[a][b]['weight'] if P.has_edge(a, b) else 0
        for a, b in zip(rel['person_a'], rel['person_b'])
    ]

    # Pearson correlation between the two columns
    corr_coef = rel['liking_score'].corr(rel['common_movies'])

    return float(corr_coef)

answer_four()

0.7883962221733477