# Practice Session 09: Viral Propagation

In this session we will use [NetworkX](https://networkx.github.io/) for simulating propagations through a network.

We will use the [SFHH Conference Dataset](http://www.sociopatterns.org/datasets/sfhh-conference-data-set/), which approximately describes the face-to-face interactions between 403 attendees of an academic conference that took place in Nice, France, in 2009. These 403 attendees agreed to wear a badge containing an RFID tag and a receiver that picked up signals from other RFID tags, which made it possible to log (anonymously) the timestamps at which two attendees were in close proximity to each other.

The dataset you will use is contained in the file `sfhh-conference-2009.csv`; for simplicity, it does not include the time variable that is present in the original dataset.

<font size="-1" color="gray">(Remove this cell when delivering.)</font>

# 1. The SFHH Conference Dataset

Below we provide code to load the graph and draw it with the default layout. You can leave it as-is.

<font size="-1" color="gray">(Remove this cell when delivering.)</font>

In [None]:
import io
import csv
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
import random

In [None]:
INPUT_FILENAME = "sfhh-conference-2009.csv"

In [None]:
# LEAVE AS-IS

# Create a new undirected graph
g = nx.Graph()

with io.open(INPUT_FILENAME) as input_file:
    # Create a CSV reader for a comma-delimited file with a header
    reader = csv.DictReader(input_file, delimiter=',')
    
    # Iterate through records, each record is a dictionary
    for record in reader:
        
        # Add one edge per record
        g.add_edge(record['Source'], record['Target'])

In [None]:
# LEAVE AS-IS

# Create an empty figure; feel free to change the size to fit your screen
plt.figure(figsize=(20,10))

# Draw the graph
nx.draw_networkx(g)

The following code, which you can leave as-is, plots the degree distribution in this graph.

<font size="-1" color="gray">(Remove this cell when delivering.)</font>

In [None]:
# LEAVE AS-IS

def plot_degree_dist(graph):
    
    # Obtain the sequence of degree of nodes
    # Function graph.degree() returns tuples (node, degree)
    degrees = [degree_tuple[1] for degree_tuple in graph.degree()]
    
    # Draw the histogram of the degree
    plt.hist(degrees, density=True, bins=20)
    plt.title("Degree distribution", fontdict={'fontsize': 'xx-large'})
    plt.xlabel("Degree $k$", fontdict={'fontsize': 'x-large'})
    plt.ylabel("Probability $p_k$", fontdict={'fontsize': 'x-large'})
    plt.show()
    
    # Print some degree statistics
    print("Degree: {:.1f} +/- {:.1f}, range [{:d}, {:d}]".format(
        np.mean(degrees), np.std(degrees), np.min(degrees), np.max(degrees)))
    
plot_degree_dist(g)

<font size="+1" color="red">Replace this cell by your answer to the following question: is this a scale-free network? Why or why not?</font>

# 2. Independent cascade model

Next, we will simulate the independent cascade propagation model. We will assume each edge has the same probability of transmission, *0 < beta < 1*.

Your algorithm should do the following:

1. Initialize an `infected` dictionary with every node having value `False`
1. Mark a starting node *u* as infected with value `True`
1. For each neighbor *v* of this node:
  * If the neighbor *v* is not infected:
    * Generate a random number *r* in [0, 1] using [random.uniform](https://docs.python.org/3/library/random.html)
    * If *r* is smaller than the probability of transmission of edge *(u,v)*:
      * Infect node *v* and repeat this step for the neighbors of *v*
1. Return the `infected` dictionary

Your code should look like this:

```python
def infect_recursive(graph, starting_node, beta, infected):
    # YOUR CODE HERE

def simulate_independent_cascade(graph, starting_node, beta):
    infected = dict([(node, False) for node in graph.nodes()])
    # YOUR CODE HERE
    return infected
```

Tip: use an auxiliary function `infect_recursive(graph, node, beta, infected)` that takes as input a *graph*, a *node* to be infected, the transmission probability *beta*, and the dictionary *infected*. The function marks *node* as infected and then tries to infect each neighbor of *node* (`graph.neighbors(node)`).
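As a point of comparison for the recursive design suggested in the tip, the same process can also be written with an explicit queue. The following is only a minimal sketch under assumptions: it uses a plain adjacency dictionary (`toy_graph`, a made-up example) instead of a NetworkX graph, so `adjacency[u]` plays the role of `graph.neighbors(u)`, and the name `cascade_with_queue` is hypothetical rather than part of the assignment.

```python
import random
from collections import deque


def cascade_with_queue(adjacency, starting_node, beta):
    """Independent cascade on an adjacency dict {node: list of neighbors}."""
    infected = {node: False for node in adjacency}
    infected[starting_node] = True
    queue = deque([starting_node])

    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            # Each infected node u gets one attempt per uninfected neighbor v
            if not infected[v] and random.uniform(0, 1) < beta:
                infected[v] = True
                queue.append(v)
    return infected


# Hypothetical toy graph: a path a-b-c-d plus an isolated node e
toy_graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": [],
}

always = cascade_with_queue(toy_graph, "a", beta=1.0)
never = cascade_with_queue(toy_graph, "a", beta=0.0)
print(sum(always.values()), sum(never.values()))  # prints: 4 1
```

With `beta=1.0` the cascade reaches every node connected to the starting node, and with `beta=0.0` only the starting node is infected, which makes these two values convenient sanity checks for any implementation.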

<font size="-1" color="gray">(Remove this cell when delivering.)</font>

<font size="+1" color="red">Replace this cell with your code implementing simulate_independent_cascade.</font>

Next, write a function `simulate_multiple_independent_cascades(graph, beta, repetitions)` that takes as input a graph, a transmissibility parameter *beta*, and a number of trials *repetitions*, and repeats the following *repetitions* times:

1. Pick a random node in the graph, using `random.choice(list(graph.nodes()))`
1. Simulate an independent cascade starting from that node

The function should return the average **fraction** of infected nodes across the *repetitions* done. This is a number between 0.0 and 1.0, where 1.0 means that all *N* nodes were infected.

Tip: to get the number of `True` values in the values of a dictionary `d`, you can just use `sum(d.values())`.
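The tip works because Python treats `True` as 1 and `False` as 0 when summing. A small illustration, using a made-up `infected` dictionary for a five-node graph:

```python
# Hypothetical outcome of one cascade on a five-node graph
infected = {"a": True, "b": False, "c": True, "d": True, "e": False}

# True counts as 1 and False as 0, so sum() counts the infected nodes
number_infected = sum(infected.values())
fraction_infected = number_infected / len(infected)

print(number_infected, fraction_infected)  # prints: 3 0.6
```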

<font size="-1" color="gray">(Remove this cell when delivering.)</font>

<font size="+1" color="red">Replace this cell with your code implementing simulate_multiple_independent_cascades.</font>

Test `simulate_multiple_independent_cascades(graph, beta, repetitions)` using the following code, which should print increasingly large numbers. The last number should be a 1.0.

<font size="-1" color="gray">(Remove this cell when delivering.)</font>

In [None]:
# LEAVE AS-IS

REPETITIONS=100
for beta in [0.01, 0.1, 0.2, 0.9, 1.0]:
    print("Beta={:.2f}; Fraction of infected={:.6f} (average of {:d} runs)".format(
        beta,
        simulate_multiple_independent_cascades(g, beta, REPETITIONS),
        REPETITIONS
    ))


Create a plot with the transmission probability *beta* on the x axis (from 0.0 to 0.5, in increments of 0.01) and, on the y axis, the expected fraction of infected nodes obtained by doing 100 repetitions. Remember to label both axes.
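One possible way to build the requested grid of *beta* values is to divide integer steps by 100, which avoids the rounding drift that can accumulate when 0.01 is added repeatedly. This is just a sketch; the variable name `betas` is an illustrative choice.

```python
# 51 values: 0.00, 0.01, ..., 0.50
betas = [step / 100 for step in range(51)]

print(len(betas), betas[0], betas[1], betas[-1])  # prints: 51 0.0 0.01 0.5
```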

<font size="-1" color="gray">(Remove this cell when delivering.)</font>

<font size="+1" color="red">Replace this cell with your code for plotting the expected fraction of nodes that are infected as a function of the contagion probability. Remember to label both axes.</font>

<font size="+1" color="red">Replace this cell with a brief commentary of what you observe on this plot. At which transmissibility *beta* do you start to notice that almost all nodes end up infected? Why does this happen?</font>

# 3. Reduce maximum degree

Create a function `graph_max_degree(graph, max_degree)` that returns a copy of *graph* in which no node has degree larger than *max_degree*. 

There are many ways of doing this; a relatively easy one is the following:

1. Create an empty graph
1. Iterate through all nodes in the input graph, creating that node in the output graph
1. Obtain the list of all the edges of the input graph with `graph.edges()`
1. Randomly shuffle that list of edges using `np.random.permutation()`
1. Add each edge *(u,v)* to the output graph as long as the degree of *u* and the degree of *v* in the output graph are both smaller than *max_degree*
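The steps above can be sketched on a plain adjacency dictionary. This is an illustration under assumptions, not the expected solution: it works on a dict of neighbor sets rather than a NetworkX graph, it uses `random.shuffle` from the standard library in place of `np.random.permutation()`, and the name `cap_degree` and the toy graph `star` are hypothetical.

```python
import random


def cap_degree(adjacency, max_degree):
    """Return a new adjacency dict in which no node exceeds max_degree."""
    # Steps 1-2: output graph with every input node and no edges
    capped = {node: set() for node in adjacency}

    # Step 3: list each undirected edge once (assumes comparable node labels)
    edges = [(u, v) for u in adjacency for v in adjacency[u] if u < v]

    # Step 4: shuffle so that the surviving edges are a random subset
    random.shuffle(edges)

    # Step 5: keep an edge only if both endpoints still have room
    for u, v in edges:
        if len(capped[u]) < max_degree and len(capped[v]) < max_degree:
            capped[u].add(v)
            capped[v].add(u)
    return capped


# Hypothetical toy graph: hub "h" connected to five leaves
star = {"h": {"a", "b", "c", "d", "e"}}
for leaf in "abcde":
    star[leaf] = {"h"}

capped = cap_degree(star, max_degree=2)
print(len(capped["h"]))  # prints: 2
```

Because the edges are shuffled, which two leaves stay connected to the hub changes from run to run, but the degree bound always holds.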

<font size="-1" color="gray">(Remove this cell when delivering.)</font>

<font size="+1" color="red">Replace this cell with your code implementing graph_max_degree</font>

Test your code using the following function, which you should leave as-is. Note that if you run the test multiple times, the average degree and/or the standard deviation may change slightly.

<font size="-1" color="gray">(Remove this cell when delivering.)</font>

In [None]:
# LEAVE AS-IS

# Reduce the max degree of the graph
gmax = graph_max_degree(g, 20)

# If you notice nodes with degree larger than specified,
# it means your graph_max_degree function is defective.
plot_degree_dist(gmax)

Now, let us assume beta has a constant value *BETA=0.2*, which we will use for the following experiments. That means that only 1 in 5 encounters between an infected and a susceptible person will produce an infection.
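A single encounter transmits with probability 0.2, but under the independent cascade model every infected neighbor gets its own independent attempt, so a susceptible node with *k* infected neighbors is infected with probability 1 - (1 - beta)^k. The short sketch below tabulates this for a few values of *k*; the function name `infection_probability` is an illustrative choice.

```python
def infection_probability(beta, infected_neighbors):
    """Probability that at least one of k independent attempts succeeds."""
    return 1 - (1 - beta) ** infected_neighbors


BETA = 0.2
for k in [1, 5, 10, 20]:
    print("k={:2d}: {:.3f}".format(k, infection_probability(BETA, k)))
```

The probability grows from 0.200 for one infected neighbor to about 0.672 for five and 0.988 for twenty, which is one reason why limiting the maximum degree can slow the propagation.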

Create a plot that has the maximum degree allowed on the x axis and the fraction of infected nodes in the graph on the y axis. Perform experiments setting the maximum degree to 2, 3, 4, 5, ..., 99, 100. Remember to repeat each experiment at least *REPETITIONS* times (100) and to plot the average. Remember to include labels on your plot's axes.

<font size="-1" color="gray">(Remove this cell when delivering.)</font>

<font size="+1" color="red">Replace this cell with your code for creating the requested plot</font>

<font size="+1" color="red">Replace this cell with a brief commentary with your observations on this plot. To what value would you set the maximum degree to ensure less than half of the people are infected? How does this compare to the previous plot when we used this value of beta but did not modify the graph?</font>

# 4. Random vs friendship paradox immunization

Finally, we will immunize some nodes. Immunized nodes cannot catch the infection. We will create an `immunity` dictionary in which keys are nodes and `immunity[node] = True` if and only if the *node* cannot be infected.

Write function `give_immunity_random(graph, fraction)` that returns a dictionary in which keys are the nodes in *graph* and `fraction * graph.number_of_nodes()` nodes are immune. Nodes to be immunized are selected uniformly at random.
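One way to select nodes uniformly at random without picking the same node twice is `random.sample`. The sketch below is only an illustration: it takes an iterable of nodes rather than a NetworkX graph (with NetworkX this would be `graph.nodes()`), it assumes the target count is rounded to the nearest integer, and the name `random_immunity_sketch` is hypothetical.

```python
import random


def random_immunity_sketch(nodes, fraction):
    """Mark round(fraction * N) nodes, chosen uniformly at random, as immune."""
    nodes = list(nodes)
    number_immune = round(fraction * len(nodes))
    # Sampling without replacement never selects the same node twice
    chosen = set(random.sample(nodes, number_immune))
    return {node: node in chosen for node in nodes}


# 403 hypothetical node identifiers, matching the size of the dataset
immunity = random_immunity_sketch(range(403), 0.3)
print(sum(immunity.values()))  # prints: 121
```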

<font size="-1" color="gray">(Remove this cell when delivering.)</font>

<font size="+1" color="red">Replace this cell with your code for give_immunity_random</font>

Test your code. Note that the fraction of immunized nodes will be close to the requested fraction, but might not match it exactly. A small deviation (say, 1%) between what you obtain and what you request is acceptable, but larger deviations are not.

<font size="-1" color="gray">(Remove this cell when delivering.)</font>

In [None]:
# LEAVE AS-IS

def test_immunize(graph, method, values):
    for fraction in values:
        immunity = method(graph, fraction)
        number_immunized = sum(immunity.values())
        number_not_immunized = len(immunity) - number_immunized
        fraction_immunized = number_immunized / (number_immunized + number_not_immunized)
        print("Immunized {:.1f}% of nodes; got a dictionary with {:d} 'True' and {:d} 'False' values ({:.1f}% 'True')".format(
            fraction*100,
            number_immunized,
            number_not_immunized,
            fraction_immunized*100
        ))

test_immunize(g, give_immunity_random, [0.1, 0.3, 0.7])

Write function `give_immunity_random_friend(graph, fraction)` that returns a dictionary in which keys are the nodes in *graph* and `fraction * graph.number_of_nodes()` nodes are immune. Each node to be immunized is obtained by selecting a node uniformly at random and then picking one of its neighbors at random.
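A sketch of this "random friend" selection on a plain adjacency dictionary follows. It is written under assumptions: the dictionary is symmetric (undirected graph), the target count is rounded to the nearest integer and capped at the number of nodes that have at least one neighbor, and the names `random_friend_immunity_sketch` and `star` are hypothetical.

```python
import random


def random_friend_immunity_sketch(adjacency, fraction):
    """Immunize random neighbors of random nodes until the target is met."""
    nodes = list(adjacency)
    immunity = {node: False for node in nodes}

    # Only nodes with at least one neighbor can ever be picked as a friend
    reachable = sum(1 for node in nodes if adjacency[node])
    target = min(round(fraction * len(nodes)), reachable)

    immunized = 0
    while immunized < target:
        node = random.choice(nodes)
        neighbors = list(adjacency[node])
        if not neighbors:
            continue
        friend = random.choice(neighbors)
        # Count a friend only the first time it is selected
        if not immunity[friend]:
            immunity[friend] = True
            immunized += 1
    return immunity


# Hypothetical toy graph: hub "h" connected to nine leaves
star = {"h": ["l{}".format(i) for i in range(9)]}
for leaf in star["h"]:
    star[leaf] = ["h"]

immunity = random_friend_immunity_sketch(star, 0.5)
print(sum(immunity.values()))  # prints: 5
```

Counting only first-time selections is what keeps the number of immunized nodes equal to the target even though high-degree nodes are drawn as "friends" again and again.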

<font size="-1" color="gray">(Remove this cell when delivering.)</font>

<font size="+1" color="red">Replace this cell with your code for give_immunity_random_friend</font>

Test your code using the same function as above. Remember that the fraction of immunized nodes should be within, say, 1% of the requested fraction. If it is smaller, your code is probably immunizing the same nodes over and over.

<font size="-1" color="gray">(Remove this cell when delivering.)</font>

In [None]:
# LEAVE AS-IS

test_immunize(g, give_immunity_random_friend, [0.1, 0.3, 0.7])

Write functions `simulate_independent_cascade_immune` and `infect_recursive_immune` as variations of your previous code that receive the `immunity` dictionary as an extra parameter. Immune nodes must never become infected.
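The only change with respect to the plain cascade is an extra check before each infection attempt. The queue-based sketch below illustrates that check on an adjacency dictionary; it is not the recursive pair of functions requested above, the name `cascade_with_immunity` is hypothetical, and the behavior for an immune starting node (no infection at all) is an assumption.

```python
import random
from collections import deque


def cascade_with_immunity(adjacency, starting_node, beta, immunity):
    """Independent cascade in which immune nodes can never be infected."""
    infected = {node: False for node in adjacency}
    # Assumption: an immune starting node produces no infection at all
    if immunity[starting_node]:
        return infected

    infected[starting_node] = True
    queue = deque([starting_node])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            # Immune nodes are skipped, so they also block onward transmission
            if infected[v] or immunity[v]:
                continue
            if random.uniform(0, 1) < beta:
                infected[v] = True
                queue.append(v)
    return infected


# Hypothetical toy graph: path a-b-c-d in which c is immune
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
immunity = {"a": False, "b": False, "c": True, "d": False}

result = cascade_with_immunity(path, "a", beta=1.0, immunity=immunity)
print(result)  # prints: {'a': True, 'b': True, 'c': False, 'd': False}
```

Even with `beta=1.0`, node *d* stays healthy because its only connection to the infection passes through the immune node *c*.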

<font size="-1" color="gray">(Remove this cell when delivering.)</font>

<font size="+1" color="red">Replace this cell with your code for simulate_independent_cascade_immune, and infect_recursive_immune</font>

Perform simulations in which you immunize a fraction of nodes *0.00, 0.05, 0.10, ..., 1.00*, repeating each simulation at least *REPETITIONS = 500* times, and create a plot with the fraction of immunized nodes on the x axis and the fraction of infected nodes on the y axis. Assume a constant *BETA=0.03*.

<font size="-1" color="gray">(Remove this cell when delivering.)</font>

<font size="+1" color="red">Replace this cell with your code for performing the requested simulations and displaying the requested plot. Remember to add labels to the x-axis and y-axis of your plot, to include both immunization strategies in the same plot, and to include a **legend** for your plot.</font>

<font size="+1" color="red">Replace this cell with a brief commentary on the result of these simulations. Do you see some differences in the performance of the methods? Describe the similarities and differences. Try to provide some explanation of what you observe.</font>

# 5. Limitations

<font size="+1" color="red">Replace this cell with a brief commentary on what are the *limitations* of these simulations. Which are some of the many reasons why **one cannot obtain general conclusions about how to deal with a pandemic from this exercise**?</font>

# Deliver (individually)

A .zip file containing:

* This notebook.


## Extra points available

For extra points and extra learning (+2, so your maximum grade can be a 12 in this assignment), include one more immunization strategy: "targeted immunization", in which the people who receive the vaccine are the people with the largest degree or the people with the largest betweenness. Plot one graph of percentage immunized vs fraction infected comparing the three strategies: random immunization, random friend immunization, and targeted immunization. Add a brief commentary with your conclusions.
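As a rough illustration of the degree-based variant, the sketch below immunizes the nodes with the largest degree in a plain adjacency dictionary. It is written under assumptions: the target count is rounded to the nearest integer, ties are broken by the order in which nodes appear, and the names `targeted_immunity_sketch` and `star` are hypothetical. With NetworkX, degrees are available from `graph.degree()`, and betweenness values could instead be taken from `nx.betweenness_centrality(graph)`.

```python
def targeted_immunity_sketch(adjacency, fraction):
    """Immunize the round(fraction * N) nodes with the largest degree."""
    number_immune = round(fraction * len(adjacency))
    # Sort nodes from highest to lowest degree
    by_degree = sorted(
        adjacency, key=lambda node: len(adjacency[node]), reverse=True
    )
    chosen = set(by_degree[:number_immune])
    return {node: node in chosen for node in adjacency}


# Hypothetical toy graph: hub "h" connected to four leaves
star = {"h": ["a", "b", "c", "d"]}
for leaf in "abcd":
    star[leaf] = ["h"]

immunity = targeted_immunity_sketch(star, 0.2)
print([node for node, immune in immunity.items() if immune])  # prints: ['h']
```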

**Note:** if you go for the extra points, add ``<font size="+2" color="blue">Additional results: targeted immunization</font>`` at the top of your notebook.

<font size="-1" color="gray">(Remove this cell when delivering.)</font>

<font size="+2" color="#003300">I hereby declare that, except for the code provided by the course instructors, all of my code, report, and figures were produced by myself.</font>