# Networks: structure, evolution & processes
**Internet Analytics - Lab 2**

---

**Group:** *H*

**Names:**

* *Antoine Basseto*
* *Andrea Pinto*
* *Jérémy Baffou*

---

#### Instructions

*This is a template for part 3 of the lab. Clearly write your answers, comments and interpretations in Markdown cells. Don't forget that you can add $\LaTeX$ equations in these cells. Feel free to add or remove any cell.*

*Please properly comment your code. Code readability will be considered for grading. To avoid long cells of codes in the notebook, you can also embed long python functions and classes in a separate module. Don’t forget to hand in your module if that is the case. In multiple exercises, you are required to come up with your own method to solve various problems. Be creative and clearly motivate and explain your methods. Creativity and clarity will be considered for grading.*

In [None]:
import numpy as np
import networkx as nx
import json
import epidemics_helper
import matplotlib.pyplot as plt
import pandas as pd
import part3_functions as part3

Before jumping into the exercises, let's load our graph.

In [None]:
with open("../data/nyc_augmented_network.json", "r") as read_file:
    data = json.load(read_file)

In [None]:
# Each node is an (id, attribute dict) pair; each edge is a (source, target) pair.
nodes_list = [(d['id'], {'coordinates': d['coordinates']}) for d in data["nodes"]]
edges_list = [(d['source'], d['target']) for d in data["links"]]

In [None]:
G = nx.Graph()
G.add_nodes_from(nodes_list)
G.add_edges_from(edges_list)

In [None]:
part3.draw_graph(G)

---

## 2.3 Epidemics

#### Exercise 2.9: Simulate an epidemic outbreak

In [None]:
SIR = epidemics_helper.SimulationSIR(G, beta=10.0, gamma=0.1)
SIR.launch_epidemic(source=23654, max_time=100.0)

In [None]:
time_stamps_status, nodes_status = part3.nodes_status_over_time(SIR, 100, [1,3,30])

In [None]:
part3.plot_population_status(nodes_status,percentage=True)

In [None]:
part3.epidemic_markers(nodes_status,0.6)

In [None]:
# One subplot per recorded time stamp, stacked vertically.
fig, axs = plt.subplots(len(time_stamps_status), figsize=(400, 500))
fig.suptitle('Epidemic evolution of the network')
for i, (stamp, status) in enumerate(time_stamps_status.items()):
    axs[i].set_title("At day " + str(stamp), fontsize=300)
    part3.draw_graph(G, nodes_status=status, ax=axs[i])

---

### 2.3.1 Stop the apocalypse!

#### Exercise 2.10: Strategy 1

In [None]:
part3.strategy_1_simulation(nodes_list,edges_list, sim_nb=2, draw=False)

This strategy is not very effective: the curves are very similar to those of the initial, uncontrolled epidemic.

In [None]:
part3.strategy_1_simulation(nodes_list,edges_list,budget=10000)

#### Exercise 2.11: Strategy 2

Now we have to implement a strategy to keep the epidemic under control. We thought of three:

- Reduce the "centrality" of the graph
- Isolate high-degree nodes
- Partition the graph into communities


### High betweenness method

The idea behind this strategy is to cut the edges with the highest edge betweenness, i.e. the edges that appear most often on shortest paths in the graph. The epidemic then takes longer to reach any given node, which gives infected nodes time to recover before they contaminate a large number of neighbours. For this we used the `edge_betweenness_centrality` function of the networkx library. It computes the shortest paths between every pair of nodes (networkx does this with breadth-first search on unweighted graphs and with Dijkstra's algorithm when edge weights are given) and then, for each edge, the fraction of those shortest paths that contain it. This is computationally very costly (we used a good portion of the cluster for around 1 hour), so it may not be the first choice. Furthermore, it produces better results than the random strategy, but not outstanding ones.
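
As a minimal sketch of the ranking step described above, the following runs `edge_betweenness_centrality` on a small toy graph (an assumption made for illustration, not our NYC network) and cuts the top-ranked edges. The names `toy`, `budget` and `edges_to_cut` are hypothetical and do not come from `part3_functions`.

```python
import networkx as nx

# Toy graph (illustrative assumption): two triangles joined by a single edge.
toy = nx.Graph()
toy.add_edges_from([(0, 1), (1, 2), (0, 2),   # first triangle
                    (3, 4), (4, 5), (3, 5),   # second triangle
                    (2, 3)])                  # only link between the triangles

# Fraction of all-pairs shortest paths that pass through each edge.
betweenness = nx.edge_betweenness_centrality(toy)

# Rank edges from most to least central and keep the `budget` best ones.
budget = 1
ranked_edges = sorted(betweenness, key=betweenness.get, reverse=True)
edges_to_cut = ranked_edges[:budget]

# Cut them on a copy so that the original graph stays untouched.
pruned = toy.copy()
pruned.remove_edges_from(edges_to_cut)
n_components = nx.number_connected_components(pruned)
```

On this toy graph the single edge linking the two triangles carries every shortest path between them, so it is ranked first and removing it disconnects the graph.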

In [None]:
# The edge ranking was precomputed on the cluster with networkx's edge_betweenness_centrality:
# https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.edge_betweenness_centrality.html#networkx.algorithms.centrality.edge_betweenness_centrality
max_centered_edges = part3.extract_max_centered_edges()

Here we can visualize the edges with high betweenness. Many of them are bridges, which makes sense because bridges are needed to keep the graph connected. Cutting them is a good thing because it splits the graph into several connected components.
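
To illustrate why cutting bridges splits the graph, here is a small sketch on a toy graph (again an assumption, not our data). It uses `nx.bridges` to list the bridges and counts connected components before and after removing them.

```python
import networkx as nx

# Toy graph (illustrative assumption): a 4-cycle attached to a short path.
toy = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0),  # cycle: contains no bridge
                (3, 4), (4, 5)])                 # path: every edge is a bridge

# A bridge is an edge whose removal disconnects its two endpoints.
bridges = list(nx.bridges(toy))
before = nx.number_connected_components(toy)

# Remove every bridge on a copy and count the components again.
pruned = toy.copy()
pruned.remove_edges_from(bridges)
after = nx.number_connected_components(pruned)
```

Each removed bridge adds exactly one connected component, which is why a budget spent on bridges partitions the network quickly.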

In [None]:
Graph_high_centrality = nx.Graph()
Graph_high_centrality.add_nodes_from(nodes_list)
Graph_high_centrality.add_edges_from(max_centered_edges[:2500])
part3.draw_graph(Graph_high_centrality,title="View of high centrality edges",edge_width=10.0)

Now we can see the results of multiple simulations using our high-betweenness strategy:

In [None]:
part3.strategy_2_simulation(nodes_list, edges_list, max_centered_edges, sim_nb=2)

### Vaccination method (high-degree nodes)

Another possible, and more realistic, strategy is vaccination. If you want to slow down the epidemic, the main targets for vaccination should be the people with many contacts, i.e. high-degree nodes (we assume here that vaccination also prevents a person from contaminating others). This is not a good idea in our setting: isolating a high-degree node costs many edges, so we spend most of our budget on a very small fraction of the graph and end up in a position similar to the random case (maybe worse).
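
The cost argument above can be made concrete with a short sketch. The helper `isolate_high_degree_nodes` is hypothetical (it is not the function used in `part3_functions`), and stopping as soon as the next node is too expensive is an assumption about how the budget is spent.

```python
import networkx as nx

def isolate_high_degree_nodes(graph, budget):
    """Remove all edges of the highest-degree nodes until the budget runs out.

    Hypothetical helper: returns a pruned copy and the list of isolated nodes.
    """
    pruned = graph.copy()
    isolated = []
    remaining = budget
    # Visit nodes from highest to lowest degree in the original graph.
    for node, _ in sorted(graph.degree, key=lambda pair: pair[1], reverse=True):
        cost = pruned.degree[node]  # edges still attached to this node
        if cost == 0:
            continue                # already isolated by earlier removals
        if cost > remaining:
            break                   # assumption: stop at the first unaffordable node
        pruned.remove_edges_from(list(pruned.edges(node)))
        isolated.append(node)
        remaining -= cost
    return pruned, isolated

# Toy graph (illustrative assumption): a hub with 5 leaves plus a 3-node path.
toy = nx.star_graph(5)                # node 0 is the hub, nodes 1..5 are leaves
toy.add_edges_from([(6, 7), (7, 8)])
pruned, isolated = isolate_high_degree_nodes(toy, budget=5)
```

Here the whole budget of 5 edges is spent on isolating a single node, which is the effect that makes this strategy expensive on our network.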

In [None]:
part3.strategy_3_simulation(nodes_list, edges_list, budget=1000)

We can indeed see that the result is very similar to the random case, so it is a bad strategy here (although in real life it is a great one!).

### Community method

The idea behind the last strategy is to separate the graph into densely connected communities and to isolate these communities from the rest of the network. This way, an epidemic that starts in one community cannot spread to the rest of the network, which preserves the vast majority of the population. We implemented it using the Clauset-Newman-Moore greedy modularity maximization, available as the function `greedy_modularity_communities` of the networkx library (https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.community.modularity_max.greedy_modularity_communities.html). Once we have our communities, we use a method from the boundaries section of networkx to find the edges between each pair of communities, and then we cut them. You can see a representation of the different communities and the edges between them. This method is computationally efficient and provides excellent results; it is the one we should use to fight the epidemic.

Note that we cut at most ~750 edges, because this is the number of boundary edges, so increasing the budget beyond 750 does not change anything. To optimize this strategy further, we could implement a recursive function that splits the communities of the initial graph into sub-communities, and so on. We preferred exploring various strategies over optimizing this one further, as it already keeps 97% of the population susceptible at day 30 on average.
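
The community-cutting idea described above can be sketched on a toy graph as follows. We assume here that `nx.edge_boundary` is the kind of helper meant by "the boundaries section of networkx"; the toy graph and the variable names are illustrative and not taken from `part3_functions`.

```python
import itertools

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy graph (illustrative assumption): two 4-cliques joined by a single edge.
toy = nx.barbell_graph(4, 0)

# Clauset-Newman-Moore greedy modularity maximization.
communities = greedy_modularity_communities(toy)

# Collect every edge that goes from one community to another.
boundary_edges = set()
for first, second in itertools.combinations(communities, 2):
    boundary_edges.update(nx.edge_boundary(toy, first, second))

# Cutting the boundary edges isolates each community from the others.
pruned = toy.copy()
pruned.remove_edges_from(boundary_edges)
n_components = nx.number_connected_components(pruned)
```

The number of boundary edges is the maximum useful budget for this strategy, which is why it saturates once all of them are cut.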

In [None]:
communities = nx.algorithms.community.greedy_modularity_communities(G)

In [None]:
part3.draw_communities(G, communities, boundaries=True)

In [None]:
part3.strategy_4_simulation(nodes_list,edges_list)

## Cost/Effectiveness comparison

To get a clearer view of the real "value" of our strategies, we plot their cost-effectiveness.

In [None]:
random = []
vaccination = []
betweenness = []
community = []
strategies = ["random","vaccination","betweenness","community"]
budgets = [100,200,300,400,500,600,700,800,900,1000,1500,2000,2500,5000,7500,10000]
for b in budgets:
    random.append(part3.compute_mean_susceptible(nodes_list, edges_list, b, "random"))
    vaccination.append(part3.compute_mean_susceptible(nodes_list, edges_list, b, "vaccination"))
    betweenness.append(part3.compute_mean_susceptible(nodes_list, edges_list, b, "betweenness"))
    community.append(part3.compute_mean_susceptible(nodes_list, edges_list, b, "community"))

In [None]:
fig, ax = plt.subplots()
ax.set_title("Comparison of cost/effectiveness for different strategies") 
ax.set_xlabel("budget")
ax.set_ylabel("% of Susceptible")
ax.plot(budgets, random, label="Random")
ax.plot(budgets, vaccination, label="Vaccination")
ax.plot(budgets, betweenness, label="Betweenness")
ax.plot(budgets, community, label="Community")
ax.legend()

We can see that the community strategy improves very steeply with the budget, quickly reaching around 97% susceptible. The betweenness strategy is slightly better than linear but, given its computational cost, its cost-effectiveness is really bad. The vaccination and random strategies are sub-linear and thus pretty inefficient.