# Lab 07 - Social Network Analysis

Social network analysis (SNA) interprets, analyses, and visualises *relational* data. Instead of taking the person or tweet as the unit of analysis, SNA starts from the relationship between two of them.

The building blocks of a network are *nodes* and *edges*. Nodes represent the individual entities in the network: people, tweets, firms, Twitter users, and so on. They are the things doing the interacting.

![](img/net-1-node.png)

The connections between nodes are called *edges*. An edge implies some kind of relationship between the two nodes it joins. This relationship could be friendship, mutual attendance of an event, dating, or having done business together.

![](img/net-1-edge.png)

Edges can be *directed* or *undirected*. For instance, on Facebook, friendships are mutual and both parties must agree to the friendship. Such an edge is called *undirected* because it is by definition a two-way relationship. On Twitter, however, user A can follow user B without user B following user A. Such an edge is called *directed* because the relationship can run one way only.
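As a minimal sketch of this difference, the toy graphs below use made-up nodes `"A"` and `"B"` (they are not part of the lab data). `nx.Graph` stores undirected edges and `nx.DiGraph` stores directed ones.

```python
import networkx as nx

# Undirected graph: a Facebook-style friendship is a single mutual tie.
friends = nx.Graph()
friends.add_edge("A", "B")

# Directed graph: a Twitter-style follow points from follower to followed.
follows = nx.DiGraph()
follows.add_edge("A", "B")  # A follows B

print(friends.has_edge("B", "A"))  # True: the tie holds in both directions
print(follows.has_edge("B", "A"))  # False: B does not follow A back
```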


Lastly, edges can be *weighted*. Weights are usually numerical values which indicate the strength of a relationship. The weight of the edge between you and your best friend is probably higher than the weight of the edge between you and a classmate you rarely speak to.
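A small sketch of weighted edges, with invented node names and weights chosen only for illustration: in `networkx`, a weight is simply an edge attribute, conventionally named `weight`.

```python
import networkx as nx

ties = nx.Graph()
# The weights below are hypothetical values, not measurements.
ties.add_edge("you", "best friend", weight=10)
ties.add_edge("you", "classmate", weight=1)

# Edge attributes are looked up through the two endpoints.
print(ties["you"]["best friend"]["weight"])  # 10
print(ties["you"]["classmate"]["weight"])    # 1
```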

![](http://evelinag.com/blog/2015/12-15-star-wars-social-network/star-wars-logo.png)

In this lab we will be using a small network of [interactions in the movie Star Wars Episode IV](http://evelinag.com/blog/2015/12-15-star-wars-social-network/). Here, each node is a character and an edge between two characters indicates that they appeared together in a scene of the movie. Edges here are thus undirected, and they also have weights attached, since two characters can appear in multiple scenes together.

The first step is to read the list of edges in this network. For this exercise, we are going to use the <code>[networkx](https://networkx.github.io/)</code> module to read, analyse, and visualise the networks. 

In [None]:
import networkx as nx

We will also use the <code>matplotlib</code> module for visualisation.

In [None]:
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

We can read in the network as a weighted edgelist. This is a CSV file in which each row has the form <code>node1,node2,weight</code>.

In [None]:
G = nx.read_weighted_edgelist('data/star-wars-network-edges.csv', delimiter = ",")
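To make the file format concrete, the sketch below parses three made-up rows from memory instead of from disk. The rows and the name `G_demo` are illustrative assumptions. `nx.parse_edgelist` with a `weight` column converted to `float` should behave the same way as `read_weighted_edgelist` does on a file.

```python
import networkx as nx

# Hypothetical rows in the node1,node2,weight format.
rows = ["A,B,3", "A,C,1", "B,C,5"]

G_demo = nx.parse_edgelist(
    rows,
    delimiter=",",
    nodetype=str,
    data=[("weight", float)],  # third column becomes a float 'weight' attribute
)

print(G_demo.number_of_nodes(), G_demo.number_of_edges())
print(G_demo["A"]["B"]["weight"])
```

Note that the weights come back as floats, which matters when comparing them against integer thresholds later on.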

We can use a method to see all the edges in the network.

In [None]:
G.edges()

And we can use a similar one to see all the nodes.

In [None]:
G.nodes()

To see a specific attribute of the edges, we can use <code>get_edge_attributes</code>, which returns a dictionary keyed by edge. Which pair of characters seems to have the highest weight in their interactions?

In [None]:
nx.get_edge_attributes(G, 'weight')
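Rather than scanning the dictionary by eye, one possible way to find the heaviest edge is to take the key with the largest value. The sketch uses a toy graph with invented weights; the same two lines should work on `G`.

```python
import networkx as nx

G_demo = nx.Graph()
G_demo.add_weighted_edges_from([("A", "B", 2.0), ("A", "C", 7.0), ("B", "C", 4.0)])

# Dictionary mapping each edge (u, v) to its weight.
weights = nx.get_edge_attributes(G_demo, "weight")

# The edge whose weight is largest.
heaviest = max(weights, key=weights.get)
print(heaviest, weights[heaviest])
```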

Now we're ready to draw. We're going to play with two layouts. The first is a "circular" layout, which is useful because we can see all the nodes and the connections between them. However, with this layout, we have a harder time seeing what groups of nodes seem to cluster together.

The second layout is called "Fruchterman-Reingold". It is a "force-directed" layout, which means that nodes which are more closely tied to each other are pulled closer together in the drawing. Let's play with both.

In [None]:
#pos = nx.circular_layout(G, scale = 2)
pos = nx.fruchterman_reingold_layout(G)
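Both layout functions return a dictionary that maps each node to an (x, y) coordinate, which is what the drawing functions expect as `pos`. The sketch below shows this on a small toy graph. The `seed` argument is an addition here, used so that the force-directed layout is reproducible between runs.

```python
import networkx as nx

G_demo = nx.path_graph(4)  # toy graph with nodes 0, 1, 2, 3

pos_circ = nx.circular_layout(G_demo, scale=2)
pos_fr = nx.fruchterman_reingold_layout(G_demo, seed=42)

# Each entry is node -> array of two coordinates.
for node in G_demo.nodes():
    print(node, pos_circ[node], pos_fr[node])
```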

In addition to changing the layout, we also want to display the weight of each edge. We can do this by drawing heavier edges with thicker lines, using three levels: small, mid, and large.

Lastly, we can set the colours of nodes based on whether the person is on the light side, the dark side, or is other. 

In [None]:
## select edges by weight
esmall = [(u,v) for (u,v,d) in G.edges(data = True) if d['weight']  < 5]
emid   = [(u,v) for (u,v,d) in G.edges(data = True) if d['weight'] >= 5 and d['weight'] < 10 ]
elarge = [(u,v) for (u,v,d) in G.edges(data = True) if d['weight'] >= 10]

## draw edges in varying edge widths
nx.draw_networkx_edges(G, pos, edgelist = elarge, width = 4, alpha = 0.5)
nx.draw_networkx_edges(G, pos, edgelist = emid,   width = 2, alpha = 0.5)
nx.draw_networkx_edges(G, pos, edgelist = esmall, width = 1, alpha = 0.5)

## select nodes by light side / dark side / other
dark_side = ["DARTH VADER", "MOTTI", "TARKIN"]
light_side = ["R2-D2", "CHEWBACCA", "C-3PO", "LUKE", "CAMIE", "BIGGS",
                "LEIA", "BERU", "OWEN", "OBI-WAN", "HAN", "DODONNA",
                "GOLD LEADER", "WEDGE", "RED LEADER", "RED TEN"]
other = ["GREEDO", "JABBA"]

## draw the nodes
nx.draw_networkx_nodes(G, pos, node_color = 'red', nodelist = dark_side)
nx.draw_networkx_nodes(G, pos, node_color = 'blue', nodelist = light_side)
nx.draw_networkx_nodes(G, pos, node_color = 'gray', nodelist = other)
nx.draw_networkx_labels(G, pos)

plt.axis('off')
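As an alternative sketch, the same idea can be expressed by building one list of widths (one per edge) and one list of colours (one per node), which could then be passed to a single `nx.draw_networkx(G, pos, width=..., node_color=...)` call. The helper `width_for`, the toy graph, and the side assignments below are all hypothetical; the thresholds mirror the ones used above.

```python
import networkx as nx

def width_for(weight):
    """Map an edge weight to a line width using the same three bins."""
    if weight < 5:
        return 1
    if weight < 10:
        return 2
    return 4

G_demo = nx.Graph()
G_demo.add_weighted_edges_from([("A", "B", 2.0), ("A", "C", 7.0), ("B", "C", 12.0)])

# Hypothetical group membership; unlisted nodes fall back to gray.
side_colour = {"A": "blue", "B": "red"}

# Lists follow the order of G_demo.edges() and G_demo.nodes().
widths = [width_for(d["weight"]) for _, _, d in G_demo.edges(data=True)]
colours = [side_colour.get(node, "gray") for node in G_demo.nodes()]

print(widths)
print(colours)
```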

In addition to graphing, we can compute some network-level statistics which characterize the network as a whole. One of these is *density*, which measures the proportion of possible connections in the network that have actually been made. A density of 1 would imply that everyone in the movie had a scene with everyone else.

In [None]:
nx.density(G)
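For an undirected graph with n nodes and m edges, there are n(n-1)/2 possible edges, so density is 2m / (n(n-1)). The sketch below checks this by hand against `nx.density` on a toy graph (the node names are made up).

```python
import networkx as nx

# Toy chain A - B - C - D: 4 nodes, 3 edges.
G_demo = nx.Graph([("A", "B"), ("B", "C"), ("C", "D")])

n = G_demo.number_of_nodes()
m = G_demo.number_of_edges()

# Edges present divided by edges possible.
manual_density = 2 * m / (n * (n - 1))

print(manual_density)      # 0.5
print(nx.density(G_demo))  # 0.5
```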

Lastly, there are node-level statistics which characterize individual nodes. One of the more important of these is *degree*, which is the number of edges connected to a particular node. Which nodes seem to have the highest degree?

In [None]:
nx.degree(G)
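To answer that question without reading through the whole output, one option is to sort the (node, degree) pairs. The sketch below does this on a toy graph with invented weights, and also shows the weighted degree, which sums edge weights instead of counting edges.

```python
import networkx as nx

G_demo = nx.Graph()
G_demo.add_weighted_edges_from(
    [("A", "B", 2.0), ("A", "C", 7.0), ("B", "C", 4.0), ("C", "D", 1.0)]
)

# Sort (node, degree) pairs from highest to lowest degree.
by_degree = sorted(G_demo.degree(), key=lambda pair: pair[1], reverse=True)
print(by_degree[0])  # ('C', 3)

# Weighted degree: sum of the weights of the edges touching each node.
strength = dict(G_demo.degree(weight="weight"))
print(strength["C"])  # 12.0
```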