In [1]:
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

The assessment will guide you to critically investigate the resilience of London's Underground as a network, along with the methodological limitations of the analysis. You will do this in two ways. In the first part, you will only take into consideration the infrastructural network, where stations are connected through only one link, regardless of the number of lines connecting the stations. In the second part, you will consider the commuting flows, and discuss what the analysis implies for the number of people moving from one part of the city to another. Then, you will recompute the flows using spatial interaction models according to the different scenarios described below and discuss the vulnerability of the network under these new scenarios.

# Part 1: London’s underground resilience

In [2]:
# data import
tube_station_network = pd.read_csv('./data/london_flows.csv')

tube_station_network.head(5)

Unnamed: 0,station_origin,station_destination,flows,population,jobs,distance
0,Abbey Road,Bank and Monument,0,599,78549,8131.525097
1,Abbey Road,Beckton,1,599,442,8510.121774
2,Abbey Road,Blackwall,3,599,665,3775.448872
3,Abbey Road,Canary Wharf,1,599,58772,5086.51422
4,Abbey Road,Canning Town,37,599,15428,2228.923167


## I. Topological network

### 1.1 Centrality measures

Select 3 centrality measures to characterise nodes, aiming at identifying the most important nodes in this particular network. Give the definition of each of the measures (including their equation), put the measures into the context of the Underground, and explain why they allow you to find the stations that are most crucial for the functioning of the Underground. Compute the measures for the nodes in the network, and give the results in a table of the 10 highest-ranked nodes for each of the 3 measures.

In [3]:
# Create graph object
g_tube = nx.from_pandas_edgelist(tube_station_network, 
                                 source='station_origin',
                                 target='station_destination',
                                 edge_attr=['flows', 'population', 'jobs', 'distance'],
                                 create_using=nx.DiGraph())

#### Degree centrality

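Degree centrality is a commonly used measure of how important a node is in a network. In the Underground network it captures how well connected a station is, i.e. the number of other stations it is directly linked to. The source data is in the usual origin-destination (OD) format, so the graph is directed. In a directed graph, degree centrality comes in two forms, in-degree centrality and out-degree centrality, defined as follows:   
1. In-degree centrality: the number of edges pointing to a node, i.e. its in-degree. The higher the in-degree, the more nodes point to this node, and the more important its role in the network.
$$C_{d_{in}}(i) = \frac{k_{in}(i)}{N-1}$$
where $C_{d_{in}}(i)$ is the in-degree centrality of node $i$, $k_{in}(i)$ is the in-degree of node $i$ (the number of edges pointing to $i$), and $N$ is the total number of nodes in the network.    
2. Out-degree centrality: the number of edges leaving a node, i.e. its out-degree. The higher the out-degree, the more nodes this node points to, and the more important its role in the network.
$$C_{d_{out}}(i) = \frac{k_{out}(i)}{N-1}$$    
where $C_{d_{out}}(i)$ is the out-degree centrality of node $i$, $k_{out}(i)$ is the out-degree of node $i$ (the number of edges leaving $i$), and $N$ is the total number of nodes in the network.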

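The degree centrality values are normalized by dividing by the maximum possible degree in a simple graph, $N-1$, where $N$ is the number of nodes in the graph. For a directed graph, `nx.degree_centrality` counts in-degree plus out-degree, so its values can exceed 1, as they do in the table below.   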
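To make the normalisation concrete, here is a minimal sketch on a hypothetical three-station directed graph (the station names `A`, `B`, `C` are illustrative and not part of the coursework data). It compares the three NetworkX functions used in this section and shows that the total degree centrality is the sum of the in- and out-degree values.

```python
import networkx as nx

# Hypothetical toy network: every pair of the three stations is linked in both directions.
toy = nx.DiGraph()
toy.add_edges_from([("A", "B"), ("B", "A"), ("A", "C"),
                    ("C", "A"), ("B", "C"), ("C", "B")])

in_dc = nx.in_degree_centrality(toy)     # k_in / (N - 1)
out_dc = nx.out_degree_centrality(toy)   # k_out / (N - 1)
total_dc = nx.degree_centrality(toy)     # (k_in + k_out) / (N - 1)

for node in toy:
    print(node, in_dc[node], out_dc[node], total_dc[node])
```

Each station has in-degree 2 and out-degree 2 with $N-1=2$, so the in- and out-degree centralities are 1 and the total is 2.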

In [13]:
# Degree centrality
# For a DiGraph, nx.degree_centrality uses in-degree + out-degree, so values can exceed 1.
degree_centrality = nx.degree_centrality(g_tube)
degree_centrality_in = nx.in_degree_centrality(g_tube)
degree_centrality_out = nx.out_degree_centrality(g_tube)
# Rank stations by total degree centrality and show the top 10
dc_tube = pd.DataFrame.from_dict(degree_centrality, columns=['degree_centrality'], orient='index').sort_values(by='degree_centrality', ascending=False)
dc_tube.head(10)

station,degree_centrality
Stratford,1.841709
Highbury & Islington,1.570352
Whitechapel,1.537688
Canary Wharf,1.532663
Canada Water,1.527638
Bank and Monument,1.527638
Liverpool Street,1.522613
Canning Town,1.512563
West Brompton,1.502513
Richmond,1.48995


#### Closeness centrality

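For a directed graph, closeness centrality measures how near a node is to all other nodes, i.e. how easily it can be reached within the network. Nodes with high closeness can exchange traffic with the rest of the network quickly. The measure is based on the average shortest-path length between a node and all other nodes: the shorter these distances, the higher the closeness.
$$C_{c}(i) = \frac{N-1}{\sum_{j=1}^{N} d(i,j)}$$
where $C_c(i)$ is the closeness centrality of node $i$, $d(i,j)$ is the shortest-path length from node $i$ to node $j$, and $N$ is the total number of nodes in the network.

In NetworkX, closeness centrality on a directed graph is computed from incoming distances (paths arriving at the node); to use outgoing distances instead, apply the function to `G.reverse()`. The data here is directed, but the distance between two stations appears to be the same in both directions, so the incoming and outgoing values are probably identical. This still needs to be checked.     
NetworkX also applies the Wasserman and Faust refinement of the formula above, which scales closeness by the fraction of nodes that are reachable, so that nodes in small components receive smaller values:
$$C_{WF}(u) = \frac{n-1}{N-1} \frac{n - 1}{\sum_{v=1}^{n-1} d(v, u)}$$
where $n-1$ is the number of nodes that can reach $u$ and $N$ is the total number of nodes. `nx.closeness_centrality()` uses $C_{WF}(u)$ by default (`wf_improved=True`).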
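The incoming versus outgoing distinction can be checked on a small example. The sketch below uses a hypothetical one-way chain `A -> B -> C` with made-up distances in metres; it is only an illustration of the NetworkX behaviour described above, not part of the coursework data.

```python
import networkx as nx

# Hypothetical one-way chain A -> B -> C with distances in metres.
chain = nx.DiGraph()
chain.add_edge("A", "B", distance=1000.0)
chain.add_edge("B", "C", distance=2000.0)

# Default behaviour: distances are measured along paths arriving at each node.
closeness_in = nx.closeness_centrality(chain, distance="distance")

# Reversing the edges gives closeness based on paths leaving each node.
closeness_out = nx.closeness_centrality(chain.reverse(), distance="distance")

print(closeness_in)
print(closeness_out)
```

Station `A` cannot be reached from anywhere, so its incoming closeness is 0, while it has the highest outgoing closeness. Because the distances are in metres, the values are small, which is also why the closeness values for the Underground are of the order of $10^{-5}$.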

In [17]:
# Closeness centrality (the edge attribute 'distance' is used as the path length)
closeness_centrality = nx.closeness_centrality(g_tube, distance='distance')
# Rank stations by closeness; assign only cc_tube so that dc_tube is not overwritten
cc_tube = pd.DataFrame.from_dict(closeness_centrality, columns=['closeness_centrality'], orient='index').sort_values(by='closeness_centrality', ascending=False)
cc_tube.head(10)

station,closeness_centrality
Holborn,7.9e-05
King's Cross St. Pancras,7.9e-05
Tottenham Court Road,7.9e-05
Oxford Circus,7.9e-05
Leicester Square,7.8e-05
Piccadilly Circus,7.8e-05
Charing Cross,7.8e-05
Chancery Lane,7.8e-05
Covent Garden,7.8e-05
Embankment,7.8e-05


#### Betweenness centrality

For a directed graph, betweenness centrality measures how often a node acts as an intermediary, i.e. how frequently it lies on the shortest paths between other pairs of nodes. Nodes with high betweenness carry much of the traffic routed through the network, so they matter for keeping it connected.
$$C_B(v) =\sum_{s,t \in V} \frac{\sigma(s, t|v)}{\sigma(s, t)}$$

where $C_B(v)$ is the betweenness centrality of node $v$, $V$ is the set of nodes, $\sigma(s, t)$ is the number of shortest $(s, t)$-paths, and $\sigma(s, t|v)$ is the number of those paths passing through some node $v$ other than $s, t$.
If $s=t$, $\sigma(s, t)=1$, and if $v \in \{s, t\}$, $\sigma(s, t|v) = 0$.
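The definition can be verified by hand on a hypothetical three-station line `A - B - C` with links in both directions (the names are illustrative only). The only shortest paths with an intermediate node are `A -> B -> C` and `C -> B -> A`, both through `B`.

```python
import networkx as nx

# Hypothetical line A <-> B <-> C, stored as a directed graph like the OD data.
line = nx.DiGraph()
line.add_edges_from([("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")])

# Raw counts: B lies on 2 shortest paths (A->C and C->A); A and C lie on none.
raw = nx.betweenness_centrality(line, normalized=False)

# Normalised values divide by (N - 1)(N - 2) = 2 for a directed graph with N = 3.
norm = nx.betweenness_centrality(line, normalized=True)

print(raw, norm)
```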

In [15]:
# Betweenness centrality
betweenness_centrality = nx.betweenness_centrality(g_tube,normalized=True)
# type(betweenness_centrality)
bc_tube = pd.DataFrame.from_dict(betweenness_centrality,columns=['betweenness_centrality'],orient='index').sort_values(by='betweenness_centrality', ascending=False)
bc_tube.head(10)

station,betweenness_centrality
Stratford,0.101882
Liverpool Street,0.035123
Canary Wharf,0.028367
Canning Town,0.028243
Bank and Monument,0.02798
West Ham,0.023558
Highbury & Islington,0.023021
Whitechapel,0.020857
Canada Water,0.019486
Shadwell,0.016622


### 1.2 Impact measures

Find 2 different measures to evaluate the impact of the node removal on the network. These need to be global measures referring to the whole network and not to specific nodes or links. Explain whether these two measures are specific to the London underground, or whether they could also be used to evaluate the resilience of any other network.   
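As an illustration only, two global measures that are often used for this purpose are the share of nodes left in the largest connected component and the global efficiency. The sketch below assumes an undirected topological network (`nx.global_efficiency` is not implemented for directed graphs) and uses a hypothetical line of five stations; the choice of measures and the helper `largest_component_fraction` are assumptions, not the measures prescribed by the brief.

```python
import networkx as nx

def largest_component_fraction(graph, n_original):
    """Share of the original nodes that remain in the largest connected component."""
    if graph.number_of_nodes() == 0:
        return 0.0
    largest = max(nx.connected_components(graph), key=len)
    return len(largest) / n_original

# Hypothetical line of five stations: 0 - 1 - 2 - 3 - 4.
line = nx.path_graph(5)
n0 = line.number_of_nodes()
before = (largest_component_fraction(line, n0), nx.global_efficiency(line))

# Close the middle station and measure the damage.
damaged = line.copy()
damaged.remove_node(2)
after = (largest_component_fraction(damaged, n0), nx.global_efficiency(damaged))

print("before:", before)
print("after: ", after)
```

Neither function uses anything specific to a transport system, so both could be applied to any network for which connectivity and shortest paths are meaningful.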

### 1.3 Node removal

For each of the centrality measures selected in 1.1, remove at least 10 nodes following two different strategies.   
A) Non-sequential removal: using your table in 1.1, remove 1 node at a time following the rank in the table, i.e. from the most important one to the 10th most important one. After each removal, evaluate the impact of the removal using your two measures from 1.2, and proceed until you have removed at least 10 nodes.   
B) Sequential removal: remove the highest ranked node and evaluate the impact using the 2 measures. After removal, re-compute the centrality measure. Remove the highest ranked node in the new network and evaluate the impact. Continue until you have removed at least 10 nodes.   
Report the results of the 2 strategies in one plot, and critically discuss the following: which centrality measure better reflects the importance of a station for the functioning of the Underground, which strategy is more effective at studying resilience, and which impact measure is better at assessing the damage after node removal.
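The two strategies differ only in whether the ranking is recomputed after each removal. A minimal sketch, assuming an undirected network, betweenness as the centrality and the size of the largest connected component as the impact measure; the helpers `lcc_size` and `remove_by_rank` and the barbell test graph are hypothetical.

```python
import networkx as nx

def lcc_size(graph):
    """Number of nodes in the largest connected component (0 for an empty graph)."""
    return max((len(c) for c in nx.connected_components(graph)), default=0)

def remove_by_rank(graph, centrality, n_remove, recompute):
    """Remove n_remove nodes by centrality and return the LCC size after each removal."""
    g = graph.copy()
    # Ranking of the intact network, used by the non-sequential strategy.
    ranking = sorted(centrality(g).items(), key=lambda kv: kv[1], reverse=True)
    impacts = []
    for step in range(n_remove):
        if recompute:
            # Sequential strategy: rank again on the damaged network.
            target = max(centrality(g).items(), key=lambda kv: kv[1])[0]
        else:
            # Non-sequential strategy: follow the original ranking.
            target = ranking[step][0]
        g.remove_node(target)
        impacts.append(lcc_size(g))
    return impacts

# Hypothetical test network: two 4-node cliques joined by a 2-node path.
toy = nx.barbell_graph(4, 2)
static = remove_by_rank(toy, nx.betweenness_centrality, 3, recompute=False)
sequential = remove_by_rank(toy, nx.betweenness_centrality, 3, recompute=True)
print(static, sequential)
```

The two lists can be plotted against the number of removed nodes on the same axes; any other centrality function that returns a dictionary of scores can be passed in place of `nx.betweenness_centrality`.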


## II. Flows: weighted network

In this section, you will bring passengers into the analysis of the Underground, and assess whether different measures need to be used when flows are considered. The network to use in this section is the weighted network given to you in the coursework, where the flows of passengers were assigned to the links between stations.   
II.1. Consider the centrality measure from Part I that best identifies the stations most relevant for assessing the vulnerability of the Underground. What would you need to do to adjust this measure for a weighted network? Recompute the ranking of the 10 most important nodes according to this adjusted measure. Do you find the same ones as in I.1?   
II.2. Now consider the measure for assessing the impact of node removal. Would you adjust the measure for a weighted network? If yes, how? Propose a different measure that would be better at assessing the impact of closing a station, taking the passengers into consideration.   
II.3. Remove only the 3 highest ranked nodes according to the best performing centrality measure found in I.1. Evaluate the impact according to the 2 measures in II.2. Repeat the experiment for the 3 highest ranked nodes using the adjusted measure. Critically discuss which station closure will have the largest impact on passengers, referring to your measures and results.   
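One possible adjustment for II.1, shown here only as a sketch, is to convert flows into a cost before computing betweenness: shortest-path routines treat the weight as a distance, so a busy link should get a small weight. The inverse-flow attribute `inv_flows` and the three-station network are assumptions for illustration; links with zero flow, which do occur in the data, would need separate handling.

```python
import networkx as nx

# Hypothetical weighted network: A-B and B-C carry many passengers, A-C very few.
g = nx.Graph()
g.add_edge("A", "B", flows=100)
g.add_edge("B", "C", flows=50)
g.add_edge("A", "C", flows=1)

# Assumption: all flows are positive here, so the inverse is well defined.
for u, v, data in g.edges(data=True):
    data["inv_flows"] = 1.0 / data["flows"]

bc_topological = nx.betweenness_centrality(g, normalized=False)
bc_weighted = nx.betweenness_centrality(g, weight="inv_flows", normalized=False)

print(bc_topological)
print(bc_weighted)
```

In the unweighted triangle no station is an intermediary, whereas with inverse flows the cheapest route from `A` to `C` runs through `B`, so `B` gains betweenness.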


# Part 2: Spatial Interaction models

For this section, you will be given a “symbolic” population and the number of jobs for the stations in the underground. You will also be given the number of people that commute from one station to another, through an OD matrix.


## III. Models and calibration

### III.1

Briefly introduce the spatial interaction models covered in the lectures using equations and defining the terms, taking particular care in explaining the role of the parameters.    
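As a point of reference, and assuming the lectures follow the standard Wilson family of gravity-type models (the notation below is an assumption, not taken from the course material), the unconstrained and production-constrained forms are commonly written as:

$$T_{ij} = k\, O_i^{\alpha} D_j^{\gamma} f(c_{ij})$$

$$T_{ij} = A_i O_i D_j^{\gamma} f(c_{ij}), \qquad A_i = \frac{1}{\sum_{j} D_j^{\gamma} f(c_{ij})}$$

where $T_{ij}$ is the flow from origin $i$ to destination $j$, $O_i$ is the origin mass (population or total outflow), $D_j$ is the destination attractiveness (jobs), $c_{ij}$ is the cost of travel (here distance), and $f$ is the cost function, typically $c_{ij}^{-\beta}$ or $\exp(-\beta c_{ij})$. A larger $\beta$ means that flows fall off more quickly with cost. The balancing factor $A_i$ guarantees that $\sum_j T_{ij} = O_i$ in the production-constrained form.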

### III.2

Using the information of population, jobs and flows, select a spatial interaction model and calibrate the parameter for the cost function (usually denoted as $\beta$). It is essential that you justify the model selected.    
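Calibration is often done with a Poisson regression. As a dependency-light alternative, the sketch below calibrates $\beta$ for a production-constrained model with a negative exponential cost function by bisection, matching the modelled mean trip cost to the observed one. The function names, the synthetic data, the search interval and the attractiveness exponent of 1 are all assumptions made for illustration.

```python
import numpy as np

def production_constrained(origins, attract, cost, beta):
    """T_ij = A_i O_i D_j exp(-beta c_ij), with A_i balancing each origin row."""
    weights = attract[None, :] * np.exp(-beta * cost)
    a_i = 1.0 / weights.sum(axis=1)
    return (a_i * origins)[:, None] * weights

def mean_cost(flows, cost):
    """Flow-weighted average travel cost."""
    return (flows * cost).sum() / flows.sum()

def calibrate_beta(observed, attract, cost, lo=0.0, hi=5.0, tol=1e-10):
    """Bisection on beta until the modelled mean trip cost matches the observed one."""
    origins = observed.sum(axis=1)
    target = mean_cost(observed, cost)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        modelled = mean_cost(production_constrained(origins, attract, cost, mid), cost)
        # Mean cost falls as beta rises, so trips that are too long need a larger beta.
        if modelled > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Synthetic check: generate flows with a known beta and recover it.
rng = np.random.default_rng(0)
cost = rng.uniform(1.0, 10.0, size=(4, 5))     # hypothetical costs, e.g. km
jobs = rng.uniform(100.0, 1000.0, size=5)      # hypothetical destination jobs
origins = rng.uniform(50.0, 500.0, size=4)     # hypothetical origin totals
true_beta = 0.3
observed = production_constrained(origins, jobs, cost, true_beta)
beta_hat = calibrate_beta(observed, jobs, cost)
print(beta_hat)
```

With real data the distances are in metres, so the calibrated $\beta$ would be much smaller and the search interval would need to be adjusted accordingly.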

## IV. Scenarios

### IV.1

Scenario A: assume that Canary Wharf has a 50% decrease in jobs after Brexit. Using the calibrated parameter $\beta$, compute the new flows for scenario A. Make sure the number of commuters is conserved, and explain how you ensured this.   
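One way to conserve commuters, sketched below under the assumption of a production-constrained model, is to recompute the balancing factors $A_i$ after changing the jobs, so that every origin still sends out the same total. The three-station data and the value of `beta` are hypothetical; index 1 stands in for Canary Wharf.

```python
import numpy as np

def production_constrained(origins, attract, cost, beta):
    """T_ij = A_i O_i D_j exp(-beta c_ij); A_i is recomputed on every call."""
    weights = attract[None, :] * np.exp(-beta * cost)
    a_i = 1.0 / weights.sum(axis=1)
    return (a_i * origins)[:, None] * weights

# Hypothetical inputs for three stations.
origins = np.array([100.0, 200.0, 300.0])
jobs = np.array([500.0, 1000.0, 250.0])
cost = np.array([[1.0, 2.0, 3.0],
                 [2.0, 1.0, 2.0],
                 [3.0, 2.0, 1.0]])
beta = 0.5  # stands in for the calibrated value

base = production_constrained(origins, jobs, cost, beta)

# Scenario A: halve the jobs at one destination, keep beta fixed.
jobs_a = jobs.copy()
jobs_a[1] *= 0.5
scenario_a = production_constrained(origins, jobs_a, cost, beta)

print(base.sum(), scenario_a.sum())                   # total commuters, unchanged
print(base[:, 1].sum(), scenario_a[:, 1].sum())       # inflow to the affected station
```

Reusing the old $A_i$ with the new jobs would reduce the total number of commuters; recomputing them redistributes the lost trips to the other destinations.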

### IV.2

Scenario B: assume that there is a significant increase in the cost of transport. Select 2 values for the parameter in the cost function reflecting scenario B. Recompute the distribution of flows.   
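With a negative exponential cost function, a higher transport cost can be represented by a larger $\beta$. The sketch below, again assuming a production-constrained model and hypothetical inputs, recomputes the flows for a baseline value and two larger values and tracks the mean trip cost.

```python
import numpy as np

def production_constrained(origins, attract, cost, beta):
    """T_ij = A_i O_i D_j exp(-beta c_ij); each row sums to the origin total."""
    weights = attract[None, :] * np.exp(-beta * cost)
    return origins[:, None] * weights / weights.sum(axis=1, keepdims=True)

# Hypothetical inputs for three stations.
origins = np.array([100.0, 200.0, 300.0])
jobs = np.array([500.0, 1000.0, 250.0])
cost = np.array([[1.0, 5.0, 9.0],
                 [5.0, 1.0, 5.0],
                 [9.0, 5.0, 1.0]])

mean_costs = {}
for beta in (0.2, 0.4, 0.8):   # hypothetical baseline and two higher values
    flows = production_constrained(origins, jobs, cost, beta)
    mean_costs[beta] = (flows * cost).sum() / flows.sum()

print(mean_costs)
```

The total number of commuters stays fixed while the mean trip cost falls as $\beta$ grows, i.e. trips shift towards nearer destinations.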

### IV.3

Discuss how the flows change for the 3 different situations: scenario A, and scenario B with two selections of parameters. Which scenario would have more impact in the redistribution of flows? Explain and justify your answers using the results of the analysis.   