## List of graphs

In this set of exercises, you'll use a college messaging dataset to learn how to filter graphs for time series analysis. In this dataset, nodes are students, and edges denote messages being sent from one student to another. The graph as it stands right now captures all communications at all time points.

Let's start by analyzing the graphs in which only the edges change over time.

The dataset has been loaded into a DataFrame called data. Feel free to explore it in the IPython Shell. Specifically, check out the output of data['sender'] and data['recipient'].
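
For readers without the course environment, here is a minimal sketch of what such a DataFrame might look like. The column names sender, recipient and month are the ones used in the exercises below; the values are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: each row is one message.
data = pd.DataFrame({
    'sender':    [1, 2, 3, 1, 4],
    'recipient': [2, 3, 1, 4, 2],
    'month':     [4, 4, 5, 5, 6],
})

# Every student who ever sent or received a message
all_students = set(data['sender']) | set(data['recipient'])
print(sorted(all_students))

# Number of messages per month
messages_per_month = data.groupby('month').size()
print(messages_per_month)
```

Exploring the columns this way shows which node labels will end up in the graph and how many edges to expect for each month.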

### Instructions
- Initialize an empty list called Gs.
- Use a for loop to iterate over months. Inside the loop:
    - Instantiate a new undirected graph called G, using nx.Graph().
    - Add in all nodes that have ever shown up to the graph, so that every monthly graph shares the same node set and only the edges differ. To do this, use the .add_nodes_from() method on G two times, first with data['sender'] as argument, and then with data['recipient'].
    - Filter the DataFrame so that it contains only the given month. This has been done for you.
    - Add edges from the filtered DataFrame. To do this, use the .add_edges_from() method with df_filtered['sender'] and df_filtered['recipient'] passed into zip().
    - Append G to the list of graphs Gs.

In [None]:
import networkx as nx 

months = range(4, 11)

# Initialize an empty list: Gs
Gs = [] 
for month in months:
    # Instantiate a new undirected graph: G
    G = nx.Graph()
    
    # Add in all nodes that have ever shown up to the graph
    G.add_nodes_from(data['sender'])
    G.add_nodes_from(data['recipient'])
    
    # Filter the DataFrame so that there's only the given month
    df_filtered = data[data['month'] == month]
    
    # Add edges from filtered DataFrame
    G.add_edges_from(zip(df_filtered['sender'], df_filtered['recipient']))
    
    # Append G to the list of graphs
    Gs.append(G)
    
print(len(Gs))

## Graph differences over time

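Now you'll compute the graph differences over time. In the simplest case, you compare each month with the next one, that is, a window of (month, month + 1), and keep track of the edges gained or lost. Note that nx.difference() requires both graphs to have the same node set, which is why every graph in Gs was built with all nodes that ever appear in the dataset. This exercise prepares you for the next one, in which you will visualize the changes over time.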
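
To make the idea of a graph difference concrete, here is a small sketch on two toy graphs. The graphs and the names gained and lost are hypothetical; only nx.difference() itself comes from the exercise.

```python
import networkx as nx

# Two toy "monthly" graphs on the same node set (hypothetical data)
g1 = nx.Graph()
g1.add_nodes_from([1, 2, 3, 4])
g1.add_edges_from([(1, 2), (2, 3)])

g2 = nx.Graph()
g2.add_nodes_from([1, 2, 3, 4])
g2.add_edges_from([(2, 3), (3, 4)])

gained = nx.difference(g2, g1)  # edges in g2 that are not in g1
lost = nx.difference(g1, g2)    # edges in g1 that are not in g2

print(list(gained.edges()))
print(list(lost.edges()))

# Graphs with different node sets are rejected
g3 = nx.Graph([(1, 2)])
try:
    nx.difference(g1, g3)
except nx.NetworkXError as err:
    print('Error:', err)
```

The argument order matters: swapping the two graphs turns "edges gained" into "edges lost".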

### Instructions
- Inside the for loop:
    - Assign Gs[i] to g1 and Gs[i + window] to g2.
    - Using nx.difference(), compute the difference between g2 and g1, that is, the edges present in g2 but not in g1. Append the result to added.
    - Append the difference between g1 and g2 (the edges present in g1 but not in g2) to removed.
- Print fractional_changes.

In [None]:
import networkx as nx  
# Instantiate a list of graphs that show edges added: added
added = []
# Instantiate a list of graphs that show edges removed: removed
removed = []
# Here's the fractional change over time
fractional_changes = []
window = 1

for i in range(len(Gs) - window):
    g1 = Gs[i]
    g2 = Gs[i + window]
        
    # Compute graph difference here
    added.append(nx.difference(g2, g1))   
    removed.append(nx.difference(g1, g2))
    
    # Compute change in graph size over time
    fractional_changes.append((len(g2.edges()) - len(g1.edges())) / len(g1.edges()))
    
# Print the fractional change
print(fractional_changes)
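
The fractional change is (edges in g2 - edges in g1) / edges in g1, so it is undefined when g1 has no edges. The sketch below works through the arithmetic on hypothetical edge counts; the helper fractional_change() and its NaN fallback are assumptions made for illustration, not part of the exercise.

```python
def fractional_change(n_before, n_after):
    """Relative change in edge count; NaN if the earlier graph has no edges."""
    if n_before == 0:
        return float('nan')
    return (n_after - n_before) / n_before

# Hypothetical edge counts for three consecutive months
edge_counts = [10, 15, 12]

changes = [
    fractional_change(before, after)
    for before, after in zip(edge_counts, edge_counts[1:])
]
print(changes)
```

Going from 10 to 15 edges is a change of +0.5, and going from 15 to 12 edges is a change of -0.2.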

## Plot number of edge changes over time

You're now going to make some plots! All of the lists that you've created before have been loaded for you in this exercise too. Do not worry about some of the fancy matplotlib code that shows up below: there are comments to help you understand what's going on.

### Instructions
- Plot the number of edges added over time. To do this:
    - Use a list comprehension to iterate over added and create a list called edges_added. The output expression of the list comprehension is len(g.edges()), where g is your iterator variable.
    - Pass in the edges_added list to ax1.plot().
- Plot the number of edges removed over time. Once again, use a list comprehension, this time iterating over removed instead of added.
- Plot the fractional changes over time by passing fractional_changes as an argument to ax2.plot().

In [None]:
# Import matplotlib
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = fig.add_subplot(111)

# Plot the number of edges added over time
edges_added = [len(g.edges()) for g in added]
plot1 = ax1.plot(edges_added, label='added', color='orange')

# Plot the number of edges removed over time
edges_removed = [len(g.edges()) for g in removed]
plot2 = ax1.plot(edges_removed, label='removed', color='purple')

# Set yscale to logarithmic scale
# (no separate ax1 legend here: a single combined legend is created below)
ax1.set_yscale('log')

# 2nd axes shares x-axis with 1st axes object
ax2 = ax1.twinx()

# Plot the fractional changes over time
plot3 = ax2.plot(fractional_changes, label='fractional change', color='green')

# Here, we create a single legend for both plots
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax2.legend(lines1 + lines2, labels1 + labels2, loc=0)
# Dashed reference line at zero fractional change, drawn on the second axes
ax2.axhline(0, color='green', linestyle='--')
plt.show()

## Number of edges over time

You'll now practice plotting other evolving graph statistics, starting with a simple one: the number of edges over time.

To do this, you'll create a list of the number of edges per month. The index of this list will correspond to the months elapsed since the first month.
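
As a rough illustration of how the list index relates to the calendar month, the sketch below uses toy path graphs in place of the real monthly graphs. The names Gs_toy, edge_sizes_toy and by_month are hypothetical; months matches the range used earlier.

```python
import networkx as nx

months = range(4, 11)  # same range as in the first exercise

# Hypothetical toy graphs: a path graph with m nodes has m - 1 edges
Gs_toy = [nx.path_graph(m) for m in months]

# number_of_edges() gives the same value as len(g.edges())
edge_sizes_toy = [g.number_of_edges() for g in Gs_toy]

# Index 0 is the first month (month 4), index 1 is month 5, and so on
by_month = dict(zip(months, edge_sizes_toy))
print(by_month)
```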

### Instructions
- Import matplotlib.pyplot as plt.
- Create a list of the number of edges per month called edge_sizes. Use a list comprehension to do this, where you iterate over Gs using an iterator variable called g, and your output expression is len(g.edges()).
- Plot edge_sizes over time using plt.plot().

In [None]:
# Import matplotlib
import matplotlib.pyplot as plt

fig = plt.figure()

# Create a list of the number of edges per month
edge_sizes = [len(g.edges()) for g in Gs]

# Plot edge sizes over time
plt.plot(edge_sizes)
plt.xlabel('Time elapsed from first month (in months)')
plt.ylabel('Number of edges')
plt.show()

## Degree centrality over time

Now you're going to plot how the distribution of degree centrality changes over time, using one empirical cumulative distribution function (ECDF) per month. The ECDF() function is provided for you, so you won't have to implement it.
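
Since the provided helper is not shown here, the following is only a sketch of what an ECDF() function could look like; the course's actual implementation may differ. It returns the sorted values together with the fraction of observations less than or equal to each value.

```python
def ECDF(vals):
    """Sketch of an ECDF helper (an assumption, not the course's own code)."""
    x = sorted(vals)  # accepts any iterable, including dict.values()
    n = len(x)
    # Fraction of observations less than or equal to each sorted value
    y = [(i + 1) / n for i in range(n)]
    return x, y

# Hypothetical centrality scores keyed by node
scores = {'a': 0.5, 'b': 0.25, 'c': 1.0, 'd': 0.25}
x, y = ECDF(scores.values())
print(x)
print(y)
```

Because the sketch sorts its input into a new list, it can be called directly on the values of a centrality dictionary.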

### Instructions
- Create a list of degree centrality scores month-by-month. To do this:
    - In each iteration of the first for loop, compute the degree centrality of G using the nx.degree_centrality() function. Save the result as cent.
    - Append cent to the list cents.
- Plot ECDFs over time. To do this:
    - Iterate over range(len(cents)) using a for loop. Inside the loop, use the ECDF() function with cents[i].values() as the argument. Unpack the output of this into x and y.
    - Pass x and y as arguments to plt.plot().
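
As a reminder of what nx.degree_centrality() returns, here is a small sketch on a toy star graph. The graph and the names G_toy and cent_toy are hypothetical. Each score is the node's degree divided by n - 1, where n is the number of nodes.

```python
import networkx as nx

# Toy star graph (hypothetical data): node 0 is connected to nodes 1, 2 and 3
G_toy = nx.star_graph(3)

# Dictionary mapping each node to degree / (n - 1)
cent_toy = nx.degree_centrality(G_toy)

# The hub has degree 3 out of a possible 3; each leaf has degree 1 out of 3
print(cent_toy)
```

Because every monthly graph in Gs contains all nodes, students who sent or received no messages in a given month get a centrality of 0 for that month.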

In [None]:
# Import necessary modules
import networkx as nx
import matplotlib.pyplot as plt

# Create a list of degree centrality scores month-by-month
cents = []
for G in Gs:
    cent = nx.degree_centrality(G)
    cents.append(cent)


# Plot ECDFs over time
fig = plt.figure()
for i in range(len(cents)):
    x, y = ECDF(cents[i].values()) 
    plt.plot(x, y, label='Month {0}'.format(i+1)) 
plt.legend()   
plt.show()