> **DO NOT EDIT IF INSIDE `tsds` folder** 

# Week 4: Networks 1


*Thursday, March 1st, 2018*

A network is a way to represent a system of things that interact or are interrelated in some fashion. The obvious example is your Facebook network, but a network can also represent the interaction patterns of proteins in the human body, who votes what on whose bill in the US Senate, the mutually attractive forces between galaxies in the universe, or just about anything else you can imagine. This week we are going to study how characters in the Marvel universe relate to each other using networks. In the exercises today you will:

* Get warmed up to thinking about systems as networks
* Learn to use the `NetworkX` library
* Explore the partnership network of the Marvel dataset

**Questions**: Outside of class, use [issues on GitHub](https://github.com/abjer/tsds/issues) to ask questions.

## Exercises

### Part 0: Networks

Questions for chapters 1 and 2 of [Network Science](http://barabasi.com/networksciencebook/).

> **Ex. 4.0.1**: List three different real networks and state the nodes and links for each of them.


> **Ex. 4.0.2**: Which network are you personally most interested in? Address the following questions:
>   * What are its nodes and links?
>   * How large is it?
>   * Can it be mapped out?
>   * Why do you care about it?


> **Ex. 4.0.3**: In your view, in which area could network science have the biggest impact in the next decade? Explain your answer and base it on the text in the book.

> **Ex. 4.0.4**: Section 2.5 states that real networks are sparse. Can you think of a real network where each node has *many* connections? Is that network still sparse? If yes, can you explain why?

### Part 1: The `NetworkX` library

`NetworkX` is the Python library we will be using for working with networks. It should come installed with your Anaconda distribution, but you won't know how to use it yet. The best way to get familiar is to work through a tutorial. That's what the following exercises are about.

> **Ex. 4.1.1**: Go to the `NetworkX` project's [tutorial page](https://networkx.github.io/documentation/stable/tutorial/index.html). Read the first few sections about nodes, edges and attributes. Write an example below where you create a graph, `G`, and then:
1. add some nodes and edges to it,
2. print its nodes and edges
3. delete a node, and
4. plot it.
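
A minimal sketch of these four steps, using made-up node names. The drawing call is left commented out because it needs `matplotlib` and a notebook or interactive session; treat this as one possible starting point rather than the expected answer.

```python
import networkx as nx

# Build a small undirected graph; the node names are made up for illustration.
G = nx.Graph()
G.add_node("a")
G.add_nodes_from(["b", "c", "d"])
G.add_edge("a", "b")
G.add_edges_from([("b", "c"), ("c", "d"), ("d", "a")])

print(sorted(G.nodes()))
print(sorted(G.edges()))

# Removing a node also removes every edge attached to it.
G.remove_node("d")
print(G.number_of_nodes(), G.number_of_edges())

# Plotting (uncomment in a notebook with matplotlib installed):
# import matplotlib.pyplot as plt
# nx.draw(G, with_labels=True)
# plt.show()
```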

> **Ex. 4.1.2**: `NetworkX` has lots of functions to generate synthetic networks. One of the most famous types of synthetic networks is called an Erdős–Rényi (ER) network.
1. Describe what an ER network is and how to create one.
2. On the [`NetworkX` documentation site](https://networkx.github.io/documentation/networkx-1.10/reference/generators.html), find a method you can use for generating an ER network. Create one with $N=100$ nodes and probability of edge creation $p=0.1$.
3. Plot the network
4. Visualize the [degree distribution](https://en.wikipedia.org/wiki/Degree_distribution) of your newly created ER graph.
    * A node's *degree* is the number of other nodes it is connected to.
    * The *degree distribution* is a histogram which shows the distribution of node degrees.
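
One way to approach the degree distribution, assuming the `erdos_renyi_graph` generator from `NetworkX` (the `seed` value is arbitrary and only there for reproducibility). The sketch tabulates the distribution as counts per degree; turning the counts into a histogram plot is left to you.

```python
from collections import Counter

import networkx as nx

N, p = 100, 0.1
G = nx.erdos_renyi_graph(N, p, seed=42)

# dict(G.degree()) maps node -> degree in both old and new NetworkX versions.
degrees = list(dict(G.degree()).values())

# Degree distribution as counts: degree k -> number of nodes with degree k.
degree_counts = Counter(degrees)
for k in sorted(degree_counts):
    print(k, degree_counts[k])

# Each node links to each of the other N - 1 nodes with probability p,
# so the mean degree should land close to p * (N - 1) = 9.9.
mean_degree = sum(degrees) / N
print("mean degree:", mean_degree)
```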

> **Ex. 4.1.3**: Another very famous type of synthetic network is the Barabási–Albert (BA) network.
1. Describe what a BA network is and how to create one.
2. On the [`NetworkX` documentation site](https://networkx.github.io/documentation/networkx-1.10/reference/generators.html), find a method you can use for generating a BA network. Create one with $N=100$ nodes and number of new edges from each new node $m=2$.
3. Plot the network
4. Visualize the [degree distribution](https://en.wikipedia.org/wiki/Degree_distribution) of your newly created BA graph.
5. Explain the main difference(s) you observe, when comparing to the ER network.
6. Are there other measures we could have used to describe the ER and BA networks further?
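
A hedged sketch for comparing the two models numerically, assuming the `barabasi_albert_graph` and `erdos_renyi_graph` generators. The helper `degree_summary` is a hypothetical name introduced here, and the seeds are arbitrary. Summary numbers like these can complement the plots, but they do not replace looking at the full distributions.

```python
import networkx as nx

N = 100
ba = nx.barabasi_albert_graph(N, 2, seed=42)  # m = 2 new edges per new node
er = nx.erdos_renyi_graph(N, 0.1, seed=42)    # p = 0.1


def degree_summary(G):
    """Return the minimum, mean and maximum degree of G (illustrative helper)."""
    degrees = list(dict(G.degree()).values())
    return {
        "min": min(degrees),
        "mean": sum(degrees) / len(degrees),
        "max": max(degrees),
    }


ba_summary = degree_summary(ba)
er_summary = degree_summary(er)
print("BA:", ba_summary)
print("ER:", er_summary)

# How far the largest degree sits above the mean hints at the presence of hubs.
print("BA max/mean:", ba_summary["max"] / ba_summary["mean"])
print("ER max/mean:", er_summary["max"] / er_summary["mean"])
```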

### Part 2: Marvel character partners

Now that you know how to build a network in `NetworkX`, visualize it and plot some statistics, you are ready to analyze today's dataset. We will be using the Wikipedia page markup string for all characters in the Marvel universe. The aim of the following exercises is to analyze the network of *partnerships* between characters.

The dataset and the code for retrieving it is in the [GitHub repo](https://github.com/abjer/tsds/tree/master/data/marvel_dataset).

> **Ex. 4.2.1**: Write a function called `get_partners` that takes a character name, and extracts a list of their partners. Print the partnerships of Iron Man. They should be `['War Machine', 'Pepper Potts', 'Riri Williams']`.

>*Hint: Use regular expressions to extract the partnerships. For a character, a partnership is a link to another character. The links are formatted like `[[Name of Wikipedia page]]` or `[[Name of Wikipedia page|Name that appears in text]]`. Make sure to extract only the name of the Wikipedia page. Try to solve this yourself, but if you get stuck completely, check [my solution](http://ix.io/U4A).*
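
To illustrate the two link formats from the hint, here is a small sketch of a regular expression that keeps only the page name. The markup string and the helper name `extract_link_targets` are made up for this example; it does not locate the partners field in the real data, and it ignores special cases such as file links or section anchors, so it is not a full `get_partners`.

```python
import re

# Hypothetical markup snippet; the real strings come from the dataset.
markup = "partners = [[War Machine]], [[Pepper Potts|Pepper]], [[Riri Williams]]"

# Capture the page name: everything after "[[" up to the first "|" or "]".
# An optional "|display text" part is matched but not captured.
LINK_PATTERN = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def extract_link_targets(text):
    """Return the Wikipedia page names of all [[...]] links in text."""
    return [name.strip() for name in LINK_PATTERN.findall(text)]


print(extract_link_targets(markup))
```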

> **Ex. 4.2.2**: Create a `networkx.Graph` that represents each partnership between characters as an edge. Print the number of nodes, edges and average degree of the resulting network.
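
For an undirected network with $N$ nodes and $L$ edges, the average degree is $\langle k \rangle = 2L/N$, since every edge contributes to the degree of two nodes. The sketch below applies this to a tiny hand-written `partners` dictionary, which is a stand-in for whatever your `get_partners` function returns on the real data.

```python
import networkx as nx

# Hypothetical partner lists; in the exercise these would come from get_partners.
partners = {
    "Iron Man": ["War Machine", "Pepper Potts", "Riri Williams"],
    "War Machine": ["Iron Man"],
    "Pepper Potts": ["Iron Man"],
}

G = nx.Graph()
for character, partner_list in partners.items():
    G.add_node(character)  # keep characters even if they have no partners
    for partner in partner_list:
        # An undirected Graph stores each pair once, so A-B and B-A collapse.
        G.add_edge(character, partner)

n_nodes = G.number_of_nodes()
n_edges = G.number_of_edges()
avg_degree = 2 * n_edges / n_nodes

print("nodes:", n_nodes)
print("edges:", n_edges)
print("average degree:", avg_degree)
```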

> **Ex. 4.2.3**: Plot the degree distribution of your character network. What type of random network does it resemble?

> **Ex. 4.2.4**: Create three bar charts, side-by-side, each showing the 10 highest-degree characters in one of the classes. Make sure that each subfigure has the same y-axis (control with `plt.ylim` or use `plt.subplots` with parameter `sharey=True`).
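
A sketch of the `sharey=True` layout on a toy graph. The random graph, the node attribute name `cls` and the helper `top_k_by_class` are assumptions made for this example; in the exercise the class labels come from the dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import networkx as nx

# Toy stand-in for the character network with a hypothetical "cls" attribute.
G = nx.erdos_renyi_graph(60, 0.1, seed=1)
classes = ["hero", "villain", "ambiguous"]
for node in list(G.nodes()):
    G.add_node(node, cls=classes[node % 3])  # re-adding a node updates its attributes

degree = dict(G.degree())


def top_k_by_class(G, cls, k=10):
    """Return the k highest-degree nodes whose "cls" attribute equals cls."""
    members = [n for n, data in G.nodes(data=True) if data.get("cls") == cls]
    return sorted(members, key=degree.get, reverse=True)[:k]


# sharey=True makes all three panels use the same y-axis limits.
fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)
for ax, cls in zip(axes, classes):
    top = top_k_by_class(G, cls)
    ax.bar(range(len(top)), [degree[n] for n in top])
    ax.set_xticks(range(len(top)))
    ax.set_xticklabels([str(n) for n in top], rotation=90)
    ax.set_title(cls)
axes[0].set_ylabel("degree")
fig.tight_layout()
```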

> **Ex. 4.2.5**: Do the same, but for the highest [betweenness centrality](https://networkx.github.io/documentation/networkx-1.10/reference/generated/networkx.algorithms.centrality.betweenness_centrality.html) (BC). Again, make sure the y-axes are shared. When you comment on the result:
* Explain briefly what the BC measures (and why it takes a short while to compute it). An intuitive explanation is also great!
* Interpret, if you can, what the differences you see between the distributions of degree and BC mean for the different classes.
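
As a small illustration of how degree and BC can disagree, consider the BC of a node $v$, $\sum_{s \neq v \neq t} \sigma_{st}(v) / \sigma_{st}$, where $\sigma_{st}$ is the number of shortest paths between $s$ and $t$ and $\sigma_{st}(v)$ is the number of those that pass through $v$. The sketch below uses a toy "barbell" graph (two 4-node cliques joined through a single bridge node), not the Marvel data.

```python
import networkx as nx

# Two complete graphs on 4 nodes (nodes 0-3 and 5-8) joined via bridge node 4.
G = nx.barbell_graph(4, 1)

degree = dict(G.degree())
bc = nx.betweenness_centrality(G)  # normalized to [0, 1] by default

for node in sorted(G.nodes()):
    print(node, degree[node], round(bc[node], 3))

# The bridge has the lowest degree (2) but the highest BC: every shortest
# path between the two cliques has to pass through it.
most_central = max(bc, key=bc.get)
print("highest BC:", most_central)
```

Computing the BC requires shortest paths between all pairs of nodes, which is why it takes noticeably longer than counting degrees on a large network.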

> **Ex. 4.2.6**: Create yet another similar bar chart where instead you measure the [local clustering coefficient](https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.clustering.html#networkx.algorithms.cluster.clustering) (CC).
* Explain briefly what the CC measures, and what it intuitively means that a node is highly clustered (CC=1) or not clustered at all (CC=0).
* Interpret the difference between classes.
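
For a node $i$ with degree $k_i$ and $e_i$ links among its neighbors, the local clustering coefficient is $C_i = 2 e_i / (k_i (k_i - 1))$. A quick way to build intuition is to evaluate it on a hand-made toy graph, as in this sketch (a triangle with one extra node attached).

```python
import networkx as nx

# Triangle 0-1-2 plus a pendant node 3 attached to node 2.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (0, 2), (2, 3)])

cc = nx.clustering(G)
for node in sorted(cc):
    print(node, cc[node])

# Node 0: both neighbors (1 and 2) are linked to each other, so C = 1.
# Node 2: neighbors 0, 1, 3 share 1 of 3 possible links, so C = 1/3.
# Node 3: a single neighbor, no possible links among neighbors;
#         NetworkX reports 0 in that case.
```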

> **Ex. 4.2.BONUS**: There seems to be some correlation between BC, degree and clustering, but it definitely doesn't look perfect. Try to make a scatter plot of characters' degree vs. their BC. If time permits, pimp your plot by:
* Prettifying it by:
    * Setting appropriate axis limits and labels
    * Fontsizes, figure title, etc.
    * Adding a slight amount of *jitter* to the degree
* Coloring nodes by class (hero: blue, villain: red, ambiguous: yellow)
* Scaling node sizes by the CC (add an offset so no nodes have size 0).

> Highlight your findings:
* Which characters have an unusually high centrality compared to their degree and vice versa? How can you explain this?
* Does this inspire you to think about new things that can be investigated?
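
A sketch of the scatter plot ingredients (jitter, class colors, CC-scaled sizes with an offset) on a toy graph. The graph, the class assignment in `node_class`, the jitter width of 0.2 and the size scaling are all arbitrary choices made for this example.

```python
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import networkx as nx

random.seed(0)

# Toy stand-in for the character network with hypothetical class labels.
G = nx.barabasi_albert_graph(80, 2, seed=0)
classes = ["hero", "villain", "ambiguous"]
node_class = {n: classes[n % 3] for n in G.nodes()}
palette = {"hero": "blue", "villain": "red", "ambiguous": "yellow"}

degree = dict(G.degree())
bc = nx.betweenness_centrality(G)
cc = nx.clustering(G)

nodes = list(G.nodes())
# Jitter: a small random horizontal shift so equal degrees do not overlap.
x = [degree[n] + random.uniform(-0.2, 0.2) for n in nodes]
y = [bc[n] for n in nodes]
colors = [palette[node_class[n]] for n in nodes]
# The offset of 20 keeps nodes with CC = 0 visible.
sizes = [20 + 200 * cc[n] for n in nodes]

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(x, y, c=colors, s=sizes, alpha=0.6, edgecolors="black")
ax.set_xlabel("degree (jittered)")
ax.set_ylabel("betweenness centrality")
ax.set_title("Degree vs. betweenness centrality")
fig.tight_layout()
```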

> **Ex. 4.2.7**: We want to plot the network so we can visually get a feeling for how characters are connected. However, the network is too big to plot without some kind of pre-processing. Therefore, remove all links between nodes that are not on the top 10 list of high-degree characters within their class, and THEN plot the network.
* *Must*: Color the nodes by class. If you can, use blue for heroes, red for villains and yellow for ambiguous.
* *Optional*: Scale the size of nodes by the square root of their degree.
* *Optional*: Adjust the layout of the figure so it doesn't look too messy (e.g. edge line width, node line width, type of force-layout [I can recommend *neato*, but you have to install graphviz with conda to use it])
* *Must*: Comment on anything interesting you see. Do you get any new ideas for things that can be investigated?
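
One possible reading of the pre-processing step is to keep only the top 10 high-degree nodes of each class together with the links among them, which is what `G.subgraph` gives you. The sketch below assumes that reading and uses a toy graph with made-up class labels in `node_class`; the drawing call is commented out because it needs `matplotlib`.

```python
import math

import networkx as nx

# Toy stand-in for the character network with hypothetical class labels.
G = nx.barabasi_albert_graph(90, 2, seed=3)
classes = ["hero", "villain", "ambiguous"]
node_class = {n: classes[n % 3] for n in G.nodes()}
palette = {"hero": "blue", "villain": "red", "ambiguous": "yellow"}

degree = dict(G.degree())

# Collect the 10 highest-degree nodes within each class.
keep = []
for cls in classes:
    members = [n for n in G.nodes() if node_class[n] == cls]
    keep.extend(sorted(members, key=degree.get, reverse=True)[:10])

# Induced subgraph: the kept nodes plus only the links among them.
H = G.subgraph(keep)

node_list = list(H.nodes())
node_colors = [palette[node_class[n]] for n in node_list]
# Size by the square root of the degree in the full network.
node_sizes = [100 * math.sqrt(degree[n]) for n in node_list]

print(H.number_of_nodes(), H.number_of_edges())

# Drawing (uncomment in a notebook with matplotlib installed):
# nx.draw(H, nodelist=node_list, node_color=node_colors,
#         node_size=node_sizes, width=0.5)
```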