# <font face="times"><font size="6pt"><p style = 'text-align: center;'> The City University of New York, Queens College

<font face="times"><font size="6pt"><p style = 'text-align: center;'><b>Introduction to Computational Social Science</b><br/><br/>

<p style = 'text-align: center;'><font face="times"><b>Lesson 12 |  Social Network Analysis I <br/><br/>

<p style = 'text-align: center;'><font face="times"><b>6 Checkpoints<br/><br/>

***
***

# Begin Lesson 12
## Social Network Analysis

A network is simply a set of relations between objects which could be people, organizations, nations, items found in a Google search, brain cells, or electrical transformers. A network contains a set of objects--called nodes--and relations between them, called edges or links. We live surrounded by social networks, but usually cannot see more than one step beyond the people we are directly connected to, if that. (It is like being stuck in a traffic jam surrounded by cars and trucks.) Social network theory is one of the few theories in social science that can be applied to a variety of levels of analysis from small groups and organizations to nations and international and global systems. 

***
![Neuron](Images/13_Network_Example_1.png) 
***
![Neuron](Images/13_Network_Example_2.png) 
***

In this Notebook, we're going to learn how to use the `NetworkX` module. Run the cell below to install `NetworkX`; add the `--upgrade` flag if you need to update an existing installation to the latest version. 

In [None]:
!pip3.6 install --user networkx

**Note:** The current version of NetworkX has some incompatibilities with Python 3.6 (and greater). Until these are resolved, running NetworkX will emit a lot of annoying warnings. To sidestep these, we'll import a module that "mutes" these warnings so they won't bother us. 

**Note \#2**: For this reason, it's worth pointing out that the suite of tools in R likely outpaces that in Python. Since our course is built around Python, we'll stick it out here. But if you find this interesting and want to follow up more, R might be worth a look.

In [None]:
import warnings
warnings.filterwarnings('ignore')

# Nodes and Edges: How do we represent relationships between individuals using NetworkX?

Networks, also known as graphs, are made up of individual entities and their relationships to one another. The technical terms for these are, respectively:

- Individual entities > nodes (or vertices)
- Relationships > edges (or ties, or connections)

Some of the lingo here is borrowed from the overlap between graph theory and geometry, just in case that helps make it more intuitive for you: the four sides of a square are edges, and the point where two edges connect is called a vertex. 

When we draw a network, we typically use circles to represent nodes and lines to indicate edges.

![Neuron](Images/13_Network_Image_01.png) 

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

In [None]:
import networkx as nx
import matplotlib.pyplot as plt

# Creating a Simple Network: The Dyad

In the `networkx` implementation, graph objects store their data in dictionaries. 

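Nodes are accessed through the attribute `Graph.nodes`, which behaves like a dictionary where the key is the node ID and the value is a dictionary of attributes. 

Edges are accessed through the attribute `Graph.edges`, which is backed by a nested dictionary. Data are accessed as such: `G.edges[node1, node2]['attr_name']` or, equivalently, `G[node1][node2]['attr_name']`. (Older NetworkX releases exposed these as `Graph.node` and `Graph.edge`; both have since been removed.)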

Because of the dictionary implementation of the graph, nodes can be strings and tuples, but not lists and sets.
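
To make the dictionary idea concrete, here is a minimal sketch on a throwaway graph. The name `G_demo` and the attribute names (`Age`, `kind`, `since`) are made up for illustration only; they are not part of the lesson's data.

```python
import networkx as nx

G_demo = nx.Graph()
G_demo.add_node("hugo", Age=34)                   # a string works as a node ID
G_demo.add_node((40.7, -73.8), kind="location")   # so does a tuple
G_demo.add_edge("hugo", (40.7, -73.8), since=2015)

# Node and edge attributes live in ordinary dictionaries
print(G_demo.nodes["hugo"])
print(G_demo.edges["hugo", (40.7, -73.8)]["since"])

# A list is mutable (unhashable), so it cannot be used as a node ID
try:
    G_demo.add_node(["not", "allowed"])
    list_rejected = False
except TypeError:
    list_rejected = True
print("List rejected as a node:", list_rejected)
```

The same restriction applies to dictionary keys in plain Python, which is why the two behave alike here.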

Let's consider the same network shown above, but now let's add more data about both the node and the edge. Let's consider a friendship between Hugo and Eric. Both Hugo and Eric have unique IDs associated to their nodes, as well as their ages. We also have data on the edge that connects them, namely the start of their friendship. 

![Neuron](Images/13_Network_Image_02.png) 

In `NetworkX`, let's start by creating an empty network (or graph) called G. We can grow this network in one of several ways. 

In [None]:
G = nx.Graph()

First, let's add the two nodes for Hugo and Eric using the `.add_nodes_from` method. There is also an `.add_node` method, but that only lets you add one node at a time.

We do so by simply passing a list containing nodes 1 and 2 to this method. 

In [None]:
G.add_nodes_from([1, 2]) 

Let's see if this worked. Let's check out the nodes that are now stored in G. 

In [None]:
G.nodes()

Great, now the nodes are stored in G. However, we don't have anything connecting them. In other words, there isn't an edge between them. So let's add an edge between nodes 1 and 2. 

In [None]:
G.add_edge(1, 2)

Just like before, let's double-check and take a look at the edge that connects the two nodes. 

In [None]:
G.edges()

Notice that the output wraps a list `[]` of edges, and that each edge is a tuple `()` denoting a connection between two nodes, here nodes 1 and 2. In other words, there is only one edge, and it links nodes 1 and 2 together.

Be aware that you can make graphs where two nodes are connected by multiple edges. Sometimes this is interpreted as a "weight," indicating how important an edge is compared to another, but in other situations there are simply multiple ways to connect two nodes. That redundancy can be important to certain phenomena (like the robustness of a network, for instance).

---

Now, let's add some of the extra data we have about the nodes. Let's add their age. This is easy to do. All we have to do is use the `.nodes` attribute. Recall that `NetworkX` graphs work like `dictionaries`. So we first access the node using the node id (e.g., 1 or 2), then we can add attributes to the node by assigning both the type of attribute and its value. 

For instance, let's add the ages for nodes 1 and 2, which are 34 and 29, respectively. 

In [None]:
G.nodes[1]['Age'] = 34
G.nodes[2]['Age'] = 29

Great. Now we've added information on their ages. However, we can also add their names. Let's follow the same procedure as before and add their names as node attributes. 

Note: We're going to add their names as `strings`, but you can add whatever data about these nodes that you like.

In [None]:
G.nodes[1]['Name'] = "Hugo"
G.nodes[2]['Name'] = "Eric"

Let's check to see what G looks like now. 

In [None]:
G.nodes(data=True)

Neat, now let's see what this plot looks like. To do so we're going to use the `draw` function from `NetworkX`. Let's pass our network `G` into `nx.draw()`. 

In [None]:
nx.draw(G)
plt.show()

Okay, not bad. But we want to be able to identify each node, and perhaps to change the size of the nodes. 

In [None]:
nx.draw(G, 
        node_size=600, #The size of the node
        font_color = 'white', #The color of the font of the node's label
        font_size = 25, #The size of the font used for the node's label
        node_color='red', #The color of the node
        width = 5, #The width of the edge
        with_labels=True) #Display the node IDs
plt.show()

***
***

# Checkpoint 1 of 6
## Now you try! 

### Create a new network object. Call it `V`. Just as you did before, create a simple dyad. 
### Here, create new attributes for the node that aren't age or name. Be creative! 
### Finally, plot your graph `V`. 

***
***

***
***

# Triads and Directed Networks

The dyad examples above are neat, but they aren't very useful for us. Let's complicate the previous example in two ways. We're going to first add a third node. This setup is called a triad, or a network that contains three nodes. It is actually the most unstable social unit (look up "Balance Theory"). 

Next, we're going to make the edge directed. In real life relationships are not always reciprocated. Love triangles are only problematic because each person in the network loves a person that *does not* love them back. Friendship, by contrast, is mutual. (To be technical about it, this is not always the case--as I may consider you a friend, but you don't consider me a friend--but for our purposes here, we'll simplify things a bit.)

The direction of an edge captures exactly these kinds of complexities. If I send a text message to you, that edge will point from me to you in a network made up of text communications. If you also send one back to me, then the edge would be directed both ways. 

This is a bit abstract to think about, so consider the images below. Look at the network on the right. Let's say it's a network of three friends--A, B, and C--who just sent text messages to each other. Friend A sent B a text message, and both A and C sent text messages to each other (hence, the double-headed arrow).  

We can represent the network on the right in a data format we can use for our purposes. This is called a sociomatrix or adjacency matrix. By convention, the rows are the nodes that are **sending** a text message, and the columns are the nodes that are **receiving** a text message. 

You'll notice that a cell in the matrix is blacked out when a node (in the rows) sends to another node in the network (in the columns). In an adjacency matrix, black cells would be represented by a "1" (for presence) and white cells would be represented by a "0" (for absence).

![Neuron](Images/13_Network_Image_04_Directed.png) 

***
***

## Representing Networks as Data Structures

We can represent this network as data in one of several ways, both mathematically and as `python` data structures. There are three main ways with which to represent both nodes and their connections. Here, I show them, along with a way we can implement them in `python`. (Although, there are many ways of doing this in `python`.)

First, we can represent a network as an **adjacency matrix**, which is the $n \times n$ matrix that we previously saw. We can save it as a list made up of other lists (or even better, using a numpy matrix):

        0  1  1        G = [[0, 1, 1],
        0  0  0             [0, 0, 0],
        1  0  0             [1, 0, 0]]

We can also use an **adjacency list**, which is a list of each edge "sent" by every node in the network:

        A: B, C        G = {'A': ['B', 'C'],
        C: A                'C': ['A']} 

Finally, we can use an **edge list**, which is a list of edges, of the sender (in the first column) to the receiver in the second column:

        A B            G = [['A', 'B'],
        A C                 ['A', 'C'],
        C A                 ['C', 'A']]
        
You will often see large networks stored in this format, since otherwise the data becomes unwieldy very quickly.
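
As a small illustration of how the three formats relate, the sketch below starts from the edge list for the A, B, C example and derives the other two representations in plain Python. The variable names (`edge_list`, `adj_list`, `adj_matrix`, `index`) are chosen here for illustration.

```python
nodes = ['A', 'B', 'C']

# Edge list: one [sender, receiver] pair per edge
edge_list = [['A', 'B'], ['A', 'C'], ['C', 'A']]

# Edge list -> adjacency list (only nodes that send an edge get a key)
adj_list = {}
for sender, receiver in edge_list:
    adj_list.setdefault(sender, []).append(receiver)

# Edge list -> adjacency matrix (rows send, columns receive)
index = {name: i for i, name in enumerate(nodes)}
adj_matrix = [[0] * len(nodes) for _ in nodes]
for sender, receiver in edge_list:
    adj_matrix[index[sender]][index[receiver]] = 1

print(adj_list)     # {'A': ['B', 'C'], 'C': ['A']}
print(adj_matrix)   # [[0, 1, 1], [0, 0, 0], [1, 0, 0]]
```

Note that the matrix stores a cell for every possible pair of nodes, while the edge list only stores the edges that exist, which is why edge lists scale better for large, sparse networks.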

***
***

## Creating and Plotting Graphs

So far we've been looking at networks with only a few nodes (e.g., dyad and triad). Before we start to get into network data with multiple nodes sharing multiple types of edges, let's first take a quick aside and talk about visualizing networks. 

There are lots of different ways to layout a network. How the network is actually visualized has little bearing on the features that we're interested in, but it is nice to see it displayed in a simple way. 

To explore this, let's create a random network. (Random networks are actually a huge area of research for network scientists, but for our purposes, we're just going to use one example to illustrate different ways to visualize networks.) Let's create a famous random network called an Erdős-Rényi graph. (Again, what this network is isn't really important for our purposes here, so you don't actually need to know what this network generation code is doing.)

In [None]:
g_random = nx.erdos_renyi_graph(50,0.15) # Creates a random graph with 50 nodes. 

Let's try plotting it with some "idealized" layout that tries to spread the network out as much as possible, so that it doesn't look too bunched up. (Although, in this case, it's hard to avoid.) Notice that each time you run this code it produces something different. 

In [None]:
nx.draw(g_random)
plt.show()

Not bad. Now, let's try plotting it in some random layout using the `draw_random()` function from `NetworkX`. 

In [None]:
nx.draw_random(g_random)
plt.show()

Let's try plotting the nodes in a circle using the `draw_circular` function.

In [None]:
nx.draw_circular(g_random)
plt.show()

Finally, let's try the `draw_spectral` function, which positions the nodes using the eigenvectors of the graph Laplacian (a "spectral" layout), so that tightly connected nodes tend to end up close together.

In [None]:
nx.draw_spectral(g_random)
plt.show()

---
Sidenote about random networks:

We mentioned above that random networks are a major interest for researchers in the area, but it might not be immediately obvious to you why that should be. Random networks aren't all that interesting at first blush.

What they are *extremely* useful for, however, is for developing and testing theories about how networks form, and how that shapes their characteristics. You want to study friendship networks at a social network like Facebook, LinkedIn, or even Twitter? Then you need to know how to predict what friendships will form in the future, even if they don't exist yet. Random networks help to pin down the forces that are operating in real world networks, by comparison. As such they are a valuable piece of a data scientist's toolkit.
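
To give a feel for what "comparison" means here, consider a rough sketch. In an Erdős-Rényi graph, each of the $n(n-1)/2$ possible edges exists independently with probability $p$, so we know in advance roughly how many edges to expect. The name `g_baseline` and the `seed=42` value are arbitrary choices made for this illustration.

```python
import networkx as nx

n, p = 50, 0.15
g_baseline = nx.erdos_renyi_graph(n, p, seed=42)  # seed fixed only for repeatability

possible_edges = n * (n - 1) // 2      # number of distinct pairs of nodes
expected_edges = p * possible_edges    # average edge count over many random draws

print("Possible edges:", possible_edges)
print("Expected edges:", expected_edges)
print("Edges in this draw:", g_baseline.number_of_edges())
print("Density of this draw:", nx.density(g_baseline))
```

If a real network with the same number of nodes and edges shows, say, far more high-degree nodes than draws like this one, that gap is a hint that something other than chance is shaping it.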

***
***

# Checkpoint 2 of 6
## Now you try!

### Create your own random graph using the `erdos_renyi_graph()` function and pick one of the above plotting algorithms to plot it. 

***
***

# Creating and Plotting Directed, Weighted, and Multiplex Edges in Networks

So far, we've only discussed one type of edge. However, there are many different kinds of relationships that we can represent with an edge. For instance:

- a friendship edge might represent a relationship shared between two people (although friendship can be surprisingly one-sided, especially in adolescent friendship networks),
- but it can also represent sending/receiving something (e.g., sending and receiving a text).
- The edge can also carry a weight (e.g., how many times you texted someone and how many texts you received), as we mentioned above.

The figure below highlights the different forms that edges in a network can take:

- We've covered undirected and directed, so far.
- The figure also includes an illustration of weighted edges (in other words, the "intensity" of the relationship or transaction can vary between nodes, like how people who send a lot of text messages would have a heavier weight to their edge as compared to people who texted one another infrequently).
- And you can also see the related idea of multiplex edges, where multiple edges can represent the relationship between two nodes, with slight variations. For instance, imagine if we created a network where we not only measured how often people texted, but also how often they call one another, send messages on social media, etc. Each of these relationships is a distinct edge in the network. These edges can either be undirected or directed, and they can also have weight. 

![Neuron](Images/13_Network_Image_05_Edge_Types.png) 

***
***

## Defining and Plotting Directed Networks

We'll put some of this into practice now. First, make a graph that contains directed edges (we call this, incidentally, a "directed graph") using the function `nx.DiGraph()`.

In [None]:
G_Directed = nx.DiGraph() # Directed Graph

Now, let's add the edges from A to B; A to C; and from C to A.

In [None]:
G_Directed.add_edges_from([('A', 'B'), ('A', 'C'), ('C', 'A')])

Finally, let's plot it out, just like we did before. However, let's add `arrows` as a parameter and set it to `True` so that the edges show directionality between nodes. 

In [None]:
nx.draw(G_Directed,
        font_color = 'white',
        font_size = 25,
        node_size=600,
        arrows=True, # Let's make these edges appear as arrows
        width = 5,
        node_color='red', 
        with_labels=True)
plt.show()

***
***

# Checkpoint 3 of 6
## Now you try!

### Create a directed graph. Let's call it `R`.  Your graph will have four nodes: `A`, `B`, `C`, and `D`. 

### Here, `A` is directed to `C`; `D` is directed at `C`; `C` is directed at `B` and `A`; and `D` is directed at `A`. 

### With this information, construct `R` and plot it. 

***
***

***
***

## Defining and Plotting Directed and Weighted Networks

Now, let's create a directed weighted network between four nodes, following the image below: 

![Neuron](Images/13_Network_Image_06_Network_Directed_Weighted_Example.png) 

We can easily replicate this using the `.add_weighted_edges_from` method of the graph. This is done just as we did before, adding tuples that capture the sender (first value in the tuple) and receiver (second value in the tuple). The only difference is including the weight, which is the third value in the tuple. Without this value, `NetworkX` defaults to unweighted ties.  

In [None]:
G_Directed_Weighted = nx.DiGraph() # Directed Graph

In [None]:
G_Directed_Weighted.add_weighted_edges_from([(1, 2, 1.25), (1, 3, 7.5), (2 ,4 , 12), (3 ,4 , 9.5)])

Now, let's pass in some information to `nx.draw`, specifically the weights of the edges. To do so, let's use a list comprehension (a compact `for loop`) to go through each edge and save its weight. Let's call this list `edge_width`. We'll then pass it in to `nx.draw` as the `width` parameter. 

In [None]:
edge_width = [G_Directed_Weighted[u][v]['weight'] for u,v in G_Directed_Weighted.edges()]

In [None]:
nx.draw(G_Directed_Weighted,
        font_color = 'white',
        font_size = 25,
        node_size=600,
        arrows=True,
        width=edge_width, # Let's pass in the list of weights for each of the edges. 
        node_color='red', 
        with_labels=True)
plt.show()

Notice that the edges are directed and that their thickness corresponds to the weights we passed in. (I.e., the edge between nodes 2 and 4 is the thickest, while the edge between nodes 1 and 2 is the thinnest, matching their relative edge weights.)

Now, try running the plot code again. Notice that each time you run the above code, the network changes shape. The actual layout of the network doesn't matter, but we can organize the nodes and edges using various algorithms that help to optimize its visual layout. 

Here, let's try the spring layout, using the `NetworkX` layout algorithm called `spring_layout`. 

In [None]:
pos=nx.spring_layout(G_Directed_Weighted) # This returns the positions for all nodes

In [None]:
nx.draw(G_Directed_Weighted,
        pos,# Pass in the "spring layout" here as the second parameter. 
        font_color = 'white',
        font_size = 25,
        node_size=600,
        arrows=True,
        width=edge_width,
        node_color='red', 
        with_labels=True)
plt.show()
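
If you would rather get the same picture on every run, one option is to fix the random seed used by the layout. The sketch below rebuilds the same four-node network under a new name (`G_layout_demo`, chosen here for illustration) and checks that two layouts computed with the same, arbitrarily chosen seed agree.

```python
import networkx as nx

G_layout_demo = nx.DiGraph()
G_layout_demo.add_weighted_edges_from([(1, 2, 1.25), (1, 3, 7.5), (2, 4, 12), (3, 4, 9.5)])

# Same seed -> same starting positions -> same final layout
pos_a = nx.spring_layout(G_layout_demo, seed=7)
pos_b = nx.spring_layout(G_layout_demo, seed=7)

same_layout = all((pos_a[n] == pos_b[n]).all() for n in G_layout_demo.nodes())
print("Layouts identical:", same_layout)
```

Either `pos_a` or `pos_b` can then be passed to `nx.draw` as the second argument, exactly like `pos` above.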

***
***

# Checkpoint 4 of 6
## Now you try!

### Take your previous graph `R` and now add weights to these directed links. (Pick any values for the weights you'd like.) Let's call it `R_Weighted`. 

### Use any plotting layout (e.g., spring, etc.) and plot `R_Weighted`. 

***
***

***
***

## Reading in Data and Creating a Network

Recall, the edgelist format represents edge pairings in the first two columns. So a text file containing network data will have:

> Node 1, Node 2
> Node 2, Node 3

...and so on until it has specified all the edges. If there are edge weights, we add an element to each row, like this:

> Node 1, Node 2, 4
> Node 2, Node 3, 1.2

Let's look at the file called `G_Edgelist.txt` by reading it in using `NetworkX`.

In [None]:
G_from_TextEdgelist = nx.read_edgelist('Data/G_Edgelist.txt', 
                      data=[('Weight', int)]) # This sets the third column as the edge weights (stored as an integer)

In [None]:
G_from_TextEdgelist.edges(data=True)

From the first row, we can see that the edge between nodes `0` and `1` has a weight of `4`.

Using `read_edgelist` and passing in a list of tuples with the name and type of each edge attribute will create a graph with our desired edge attributes.
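
To see the same mechanism without needing a file on disk, here is a small sketch using `nx.parse_edgelist`, which accepts the lines of an edge list directly. The three lines below are hypothetical and only mimic the format described above; they are not the contents of `G_Edgelist.txt`.

```python
import networkx as nx

# Hypothetical edge list lines: sender, receiver, weight (whitespace separated)
lines = ["0 1 4", "1 2 3", "0 2 1"]

G_parsed = nx.parse_edgelist(lines,
                             nodetype=int,              # convert node labels to integers
                             data=[("Weight", int)])    # name and type of the third column

print(list(G_parsed.edges(data=True)))
```

Note that without `nodetype=int` the node labels would be kept as strings (`'0'`, `'1'`, ...), which matters when you later look nodes up by ID.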

Graphs can also be created from `pandas` `DataFrames` if they are in edge list format.

In [None]:
import pandas as pd

In [None]:
G_df = pd.read_csv('Data/G_Edgelist.txt', 
                   sep=r'\s+', # The columns are separated by whitespace
                   header=None, # The first row in the file is data, so we don't have column names (hence set to None)
                   names=['n1', 'n2', 'weight']) # Add column names: the sender (n1), the receiver (n2), and the weight.

Let's see if it imported in properly. 

In [None]:
G_df

Lucky for us, we can also transform a `pandas` `DataFrame` into a `NetworkX` graph using the function `from_pandas_edgelist`. 

In [None]:
G_from_Pandas_df = nx.from_pandas_edgelist(G_df, # The DataFrame
                              'n1', # The sender column
                              'n2', # The receiver column
                              edge_attr='weight') # Column with the weight
G_from_Pandas_df.edges(data=True)
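
By default, `from_pandas_edgelist` builds an undirected graph. If the sender/receiver distinction matters, one option is the `create_using` parameter. The sketch below uses a made-up three-row `DataFrame` (`toy_df`) to show the difference; it is not the lesson's data.

```python
import networkx as nx
import pandas as pd

# Made-up edge list: note that 0->1 and 1->0 both appear
toy_df = pd.DataFrame({"n1": [0, 1, 0],
                       "n2": [1, 0, 2],
                       "weight": [4, 2, 1]})

G_undirected = nx.from_pandas_edgelist(toy_df, "n1", "n2", edge_attr="weight")
G_directed = nx.from_pandas_edgelist(toy_df, "n1", "n2", edge_attr="weight",
                                     create_using=nx.DiGraph())

print(G_undirected.number_of_edges())  # 0-1 and 1-0 collapse into a single edge
print(G_directed.number_of_edges())    # 0->1 and 1->0 are kept as separate edges
```

In the undirected version, the later row overwrites the attributes of the earlier one, so information can be lost silently when the data are really directed.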

Let's take these data and plot it, just like we did above. 

First, let's get the edge weights for our visualization and let's use the spring layout, just as we did before in the previous example. 

In [None]:
edge_width_G_from_Pandas_df = [G_from_Pandas_df[u][v]['weight'] for u,v in G_from_Pandas_df.edges()] # Edge weights to be passed in for the visualization

In [None]:
pos_G_from_Pandas_df = nx.spring_layout(G_from_Pandas_df) # Spring layout to be passed in for the visualization

In [None]:
nx.draw(G_from_Pandas_df,
        pos_G_from_Pandas_df,
        font_color = 'white',
        font_size = 25,
        node_size=600,
        arrows=True,
        width=edge_width_G_from_Pandas_df, # Let's pass in the list of weights for each of the edges. 
        node_color='red', 
        with_labels=True)
plt.show()

***
***

# Checkpoint 5 of 6
## Now you try!

### We have data on who talks to whom (and how often) in Star Wars movies. You can read this file in as:

`Star_Wars_df = pd.read_csv("Data/star-wars-network-edges.csv")`

### Use `.head()` to see the names of the columns, which consist of a source, a target, and a weight. 

### Repeat the steps above, turn the `DataFrame` into a `NetworkX` edgelist, and finally plot this network. 

### Call your network `SW_Graph`. 

***
***

***
***

## Defining a Multiplex Network

Now, let's create a multiplex directed network. Unfortunately, NetworkX does a poor job of drawing multigraphs (`MultiGraph` and `MultiDiGraph`): it does not display parallel edges as distinct lines, let alone label them separately.

This is actually okay, as what's really important with social network analysis isn't the visualizations, but understanding the underlying structure. 

First, let's define a multiplex network that is directed. 

In [None]:
#MG = nx.MultiGraph() # Note: You would use this if our network didn't have directed edges.

In [None]:
MG = nx.MultiDiGraph()

Now, let's add both weights and different ties. The weights work just like they did before, with a list of tuples, where the first value is the sender, the second is the receiver, and the third is the weight. However, here we can include that same edge multiple times, each representing a different type of relationship (e.g., texting versus calling someone on the phone) with its own unique weight (e.g., the number of times you texted versus the number of times you called). 

In [None]:
MG.add_weighted_edges_from([(1, 2, 5), # Notice that the edge from 1 to 2 now has two weights
                            (1, 2, 1), 
                            (2, 3, 2), # Same here for the edge from 2 to 3
                            (2, 3, 6), 
                            (1, 4, 10), # And from 1 to 4
                            (1, 4, 3.25), 
                            (4, 1, 8)]) # But 4 to 1  only has one type. 
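
When edges are added this way, `NetworkX` tells parallel edges apart with automatically assigned integer keys (0, 1, ...). If you want the keys to say what each tie means, you can supply them yourself. The sketch below uses a separate graph (`MG_demo`) and the key names `"text"` and `"call"`, which are illustrative choices.

```python
import networkx as nx

MG_demo = nx.MultiDiGraph()
MG_demo.add_edge(1, 2, key="text", weight=5)   # 1 texted 2 five times
MG_demo.add_edge(1, 2, key="call", weight=1)   # 1 called 2 once

print(MG_demo.number_of_edges(1, 2))                  # parallel edges from 1 to 2
print(MG_demo[1][2]["text"]["weight"])                # look up one tie by its key
print(list(MG_demo.edges(keys=True, data="weight")))  # (sender, receiver, key, weight)
```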

***
***

## Degree Measures 

Now let's introduce one of the central measures in networks: degree. It is simply the number of edges that are connected to a node. This has a lot of applications ranging from simple descriptive statistics (e.g., how many texts a sender sent can be calculated in a "text-sending" network by just counting the ties for each node) to more structural inferences (e.g., why do some nodes have all of the connections and most others so few?). In general we interpret it as a measure of importance, though it's not the only one.

Let's use the multiplex network to explore this. 

To calculate the degree value for each node, you just need to use the `.degree()` method of a `NetworkX` graph object. Let's see what it is for `MG`.

In [None]:
MG.degree()

Notice here that Node 1 has a degree of five. Why is that the case? Look above at the edgelist for `MG`. Node 1 has two edges to Node 2 (1->2 and 1->2) and three edges shared with Node 4: two outbound to 4 and one inbound from 4 (1->4, 1->4, and 4->1). This sums to five total edges. 

The same logic holds for Node 2, which receives two different edges from 1 and sends two different types of edges to Node 3, which makes for a total degree of 4. 

We can also test this for a single node by passing that node into `.degree()`. Let's see whether this matches what we got above by asking for the degree of Node 2. 

In [None]:
MG.degree(2) 

However, these edges are weighted. Not a problem. We can count the total degree for each node using the edge weights. Let's consider Node 1, which has a total edge weight of 5 (1->2), 1 (1->2), 10 (1->4), 3.25 (1->4), and 8 (4->1), which sums to 27.25. 

This logic is the same for the other nodes. (Test it out for yourself!)
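
One way to test it is to add up the weights by hand and compare the total with what `NetworkX` reports. The sketch below rebuilds the same edges in a separate graph (`MG_check`, a name chosen here) so it can be run on its own.

```python
import networkx as nx

MG_check = nx.MultiDiGraph()
MG_check.add_weighted_edges_from([(1, 2, 5), (1, 2, 1), (2, 3, 2), (2, 3, 6),
                                  (1, 4, 10), (1, 4, 3.25), (4, 1, 8)])

node = 1
# Sum the weight of every edge that touches the node, inbound or outbound
by_hand = sum(w for u, v, w in MG_check.edges(data="weight") if node in (u, v))

print(by_hand)                                   # 27.25
print(MG_check.degree(node, weight="weight"))    # 27.25
```

Changing `node` to 2, 3, or 4 repeats the check for the other nodes.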

In [None]:
MG.degree(weight='weight')

We can also just return the degree of a single node, just like we did before, but also get its degree with edge weight, as well. 

In [None]:
MG.degree(2,weight='weight') 

For edges and networks that have direction, we can count all of a node's "outbound" edges, called its "out degree." This may be helpful for many different applications (e.g., how many times did you send a text and not receive one?). 

Let's first calculate the out degree for MG without weights, using `.out_degree()`.

What do you see? How is it different than just `.degree()`? 

Focus on Node 1. It has a value of 4 and not 5. That's because we're only looking at outbound edges: (1->2, 1->2, 1->4, and 1->4). Hence, we don't count (4 -> 1), because that is an edge "sent" from 4 to 1 and "received" by 1. 

The same logic holds for Node 2, which counts two outbound edges (2->3 and 2->3) but none of its "received" edges (1->2 and 1->2). 

In [None]:
MG.out_degree()

Let's flip the script. Now let's count the inbound ties by using the "in degree." This works in exactly the same way as `.out_degree()`, but now looks at how many edges are "received" by the node. The only change we make is to instead use `.in_degree()`.

In [None]:
MG.in_degree()

Finally, we can add the weights to both the "out degree" edges and "in degree" edges, just as we did before with just degree. See how this matches up with what you'd expect, where "out degree" with weights is simply the addition of all of the weights on "outbound" edges, and vice-versa for "in degree."  
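
A handy sanity check, sketched below on a standalone copy of the network (`MG_io`, a name chosen for this example), is that a node's degree in a directed graph equals its in degree plus its out degree, with or without weights.

```python
import networkx as nx

MG_io = nx.MultiDiGraph()
MG_io.add_weighted_edges_from([(1, 2, 5), (1, 2, 1), (2, 3, 2), (2, 3, 6),
                               (1, 4, 10), (1, 4, 3.25), (4, 1, 8)])

# Collect (node, in degree, out degree, degree) for each node
rows = []
for node in sorted(MG_io.nodes()):
    rows.append((node, MG_io.in_degree(node), MG_io.out_degree(node), MG_io.degree(node)))
    print(rows[-1])

# Unweighted: in degree + out degree == degree
degrees_add_up = all(d_in + d_out == d_all for _, d_in, d_out, d_all in rows)

# Weighted: the same identity holds for the summed edge weights
weighted_add_up = all(
    MG_io.in_degree(n, weight="weight") + MG_io.out_degree(n, weight="weight")
    == MG_io.degree(n, weight="weight")
    for n in MG_io.nodes()
)
print(degrees_add_up, weighted_add_up)
```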

In [None]:
MG.out_degree(weight='weight')

In [None]:
MG.in_degree(weight='weight')

We can also calculate the above "in degree" and "out degree" measures with weights for individual nodes. Let's again test this out with Node 2. 

In [None]:
MG.out_degree(2,weight='weight')

In [None]:
MG.in_degree(2,weight='weight')

***
***

# Checkpoint 6 of 6
## Now you try!

### Recall the previous graph you made with the Star Wars data: `SW_Graph`.

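### Calculate the in-degree and out-degree with and without weights, for a total of four values. Why are they different? (Hint: in-degree and out-degree are only defined for directed graphs, so you may need to rebuild `SW_Graph` with `create_using=nx.DiGraph()`.)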

***
***

***
***

## Connecting Network Measures with Real World Data

Let's apply some of these basic network measures to some real data to see their significance. We're going to read in a dataset on drug usage in Hartford, Connecticut. These data come from this paper: https://link.springer.com/article/10.1023%2FA%3A1015457400897

***This paper describes the process used to document and analyze drug user social networks in Hartford and the implications of network relationships for HIV risk and promoting prevention within high-risk drug-use sites. They present characteristics of study participants regarding their personal networks, ties that link these individuals to each other, and the larger connected network evident through mapping these ties among participants.***

First, let's read it in. 

In [None]:
hartford = nx.read_edgelist('Data/hartford_drug.txt', # Text file saved as an edgelist with a sender column and a receiver column
                            create_using=nx.DiGraph(), # Set it up as a "Directed" graph
                            nodetype=int) # The nodes don't have labels; they are integers used to mask people's actual names. 

Let's first calculate the number of nodes in the network and the total number of edges. From these values we can calculate the average degree of the network. Think of average degree as the average number of edges per node; in a directed network, `K/N` is both the average in degree and the average out degree, because every edge leaves exactly one node and enters exactly one node.

Use `.order()` and `.size()` to get the number of nodes `N` and edges `K`.

In [None]:
N,K = hartford.order(), hartford.size() 

In [None]:
print("Nodes: ", N)
print("Edges: ", K)
print("Average degree: ", K/N) # Calculate the average degree by dividing K by N
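
To see why `K/N` works as an average, here is a quick check on the small A, B, C network from earlier in the lesson, rebuilt under the illustrative name `toy`. Every edge contributes exactly one to some node's in degree and one to some node's out degree, so both averages equal `K/N`.

```python
import networkx as nx

toy = nx.DiGraph([("A", "B"), ("A", "C"), ("C", "A")])
N_toy, K_toy = toy.order(), toy.size()

# Average the per-node in degree and out degree directly
mean_in = sum(d for _, d in toy.in_degree()) / N_toy
mean_out = sum(d for _, d in toy.out_degree()) / N_toy

print(K_toy / N_toy, mean_in, mean_out)   # 1.0 1.0 1.0
```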

Okay, now let's delve a bit deeper into these numbers, as averages can often hide important trends. First, let's calculate the in degree and out degree of each node in the network, and save these values. 

In both cases, we can use the `in_degree()` and `out_degree()` methods of the `NetworkX` object that we just defined (called `hartford`) and save the results. Both return dictionary-like views that yield `(node, degree)` pairs when you loop over them. 

In [None]:
in_degrees = hartford.in_degree() # view of (node, in-degree) pairs

In [None]:
out_degrees = hartford.out_degree() # view of (node, out-degree) pairs


Now, let's just get the degree values and save them as a list (i.e., let's keep the second element of each pair and ignore the node names). 

Let's make this list just have unique values and sort these values in order using `sorted()`. In other words, we don't want repeated values, because we're just interested in looking at what are the in degrees and out degrees that exist in the network, not how often they occur or which nodes they're associated to. 

In [None]:
in_values = [pair[1] for pair in in_degrees]
out_values = [pair[1] for pair in out_degrees]
in_unique = sorted(set([pair[1] for pair in in_degrees]))
out_unique = sorted(set([pair[1] for pair in out_degrees]))

Let's take a look at these values in both of these lists to see what the values are for in degree and out degree.

In [None]:
in_unique

In [None]:
out_unique

It looks like there's more variance for the in degree values than the out degree values. 

Now, for each in-degree and out-degree value in these lists, let's count the number of nodes that have this value. In other words, how many nodes have an out degree of 0 or 1 or 2 or 3, etc., and how many nodes have an in degree of 0 or 1 or 2, etc. 

In [None]:
in_hist = [in_values.count(x) for x in in_unique]
out_hist = [out_values.count(x) for x in out_unique]
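
An equivalent way to get the same counts, sketched here on a made-up five-edge network (`toy_net`), is `collections.Counter` from the standard library. It tallies every value in a single pass instead of calling `.count()` once per unique degree.

```python
from collections import Counter
import networkx as nx

# Made-up directed network: node 4 receives three edges, node 1 receives none
toy_net = nx.DiGraph([(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)])

in_counts = Counter(deg for _, deg in toy_net.in_degree())   # degree value -> number of nodes
in_unique_toy = sorted(in_counts)                            # the distinct in-degree values
in_hist_toy = [in_counts[d] for d in in_unique_toy]          # how many nodes have each value

print(in_unique_toy)   # [0, 1, 3]
print(in_hist_toy)     # [1, 2, 1]
```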

Finally, let's create a histogram, where we count on the x-axis both the in degree and out degree measures, and the y-axis is the frequency count of nodes that have this particular in degree or out degree value. Take a look at this unique distribution. 

In [None]:
in_hist

In [None]:
out_hist

In [None]:
plt.plot(in_unique,in_hist,'ro-') # in-degree
plt.plot(out_unique,out_hist,'bv-') # out-degree
plt.legend(['In-degree','Out-degree'])
plt.xlabel('Degree')
plt.ylabel('Number of nodes')
plt.title('Hartford drug users network')
plt.show()

In [None]:
# Recall that if you just want a histogram of a single measure you can do:
# plt.hist(in_values)
# plt.show()

Look at the plot above. What strikes you about it? How do the in degree and out degree distributions differ?