# Rush Hour Dynamics: Using Python to Study the London Underground
#### Camilla Montonen
#### PyData Paris 2015

<img src="londontube.png" height=300 width=300>

# Introduction
<img src="introimage.png" height=300 width=400>



# Background

* Bryn Mawr College 2013 
* University of Edinburgh 2014
* Currently working in QA at Caplin Systems Ltd.
* Member of PyLadies London and Women in Data. If you're ever in London, please drop in to one of our meetups!


# Roadmap

1. Motivation: Why would you want to analyse the London Underground?! Commuting on it is bad enough.
2. Data collection: The challenge of collecting data stored in a map
3. Data analysis: Leveraging graph-tool to analyse the London Underground
4. Simulations: Creating simulations using Bokeh

# There are interesting data problems everywhere...

* Python gives you the tools, but you have to ask the questions!

<img src="fancy_graph_arf.png">

# Back in August 2014...

<img src="train.gif">


# Which Tube line should I take to work?
<img src="tube.png">

## Some days it was all good...

<img src="norush.jpg" height=400 width=600>



# Other days ...not so good

<img src="congestion.jpg" height=400 width=400>

# A pattern starts to emerge

<img src="delays.jpg">
Source: [BBC News](http://news.bbc.co.uk/1/hi/in_pictures/8092917.stm)

### Observation: delays or suspensions at one station can affect remote stations

<img src="tube.png">


## Questions that demand an answer



What are the most "important" stations in the London Underground network?


How does suspending these "important" stations affect the rest of the network?

# Let's bring the Python to the Data

<img src="python-logo.png" height=50 width=50>
<img src="bokeh_logo_small.png">
<img src="graph-tool-small-logo.png">

# In the beginning, there was the 'Data'

How do I translate a physical map of the London Underground into a Graph I can process with Python?





## Start


<img src="tube.png">

## Goal

<img src="fancy_graph_sfdp.png">

# Goal

<img src="betweenness.png">

## Data collection:

It would be cool to program some kind of OCR to automatically read the data from the map and produce a data file!
But alas, I had to resort to manually creating a data file:




```
#Station #Neighbour(line)
Acton Town	        Chiswick Park (District), South Ealing (Piccadilly), Turnham Green (Piccadilly)
Aldgate		        Tower Hill (Circle; District), Liverpool Street (Metropolitan; Circle; District)
Aldgate East	    Tower Hill (District), Liverpool Street (HammersmithCity; Metropolitan)
Alperton	        Sudbury Town (Piccadilly), Park Royal (Piccadilly)
```
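A hand-written file in this format can be read back into Python with a few lines of standard-library code. The sketch below is only an illustration, not the parser used for the talk: it assumes the station name is separated from its neighbours by a tab, that neighbours are comma-separated, and that the lines serving each connection sit in parentheses separated by semicolons. The names `parse_stations`, `undirected_edges` and `SAMPLE` are hypothetical.

```python
import re

# A two-row excerpt in the same layout as the data file (tab after the station name).
SAMPLE = (
    "#Station #Neighbour(line)\n"
    "Acton Town\tChiswick Park (District), South Ealing (Piccadilly), Turnham Green (Piccadilly)\n"
    "Aldgate\tTower Hill (Circle; District), Liverpool Street (Metropolitan; Circle; District)\n"
)

# One neighbour entry: a station name followed by its lines in parentheses.
NEIGHBOUR = re.compile(r"\s*([^,()]+?)\s*\(([^)]*)\)")


def parse_stations(text):
    """Return {station: {neighbour: [line, ...]}} for the assumed file layout."""
    network = {}
    for raw in text.splitlines():
        row = raw.strip()
        if not row or row.startswith("#"):
            continue  # skip blank rows and the header comment
        station, _, rest = row.partition("\t")
        neighbours = {}
        for name, lines in NEIGHBOUR.findall(rest):
            neighbours[name] = [part.strip() for part in lines.split(";")]
        network[station.strip()] = neighbours
    return network


def undirected_edges(network):
    """Collapse the per-station listing into a set of unordered station pairs."""
    return {frozenset((a, b)) for a, nbrs in network.items() for b in nbrs}


network = parse_stations(SAMPLE)
edges = undirected_edges(network)
```

Each `frozenset` pair in `edges` corresponds to one edge of the graph, so a connection listed from both of its endpoints is only counted once.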


# Now it's a piece of cake...

<img src="data-slide.png" >

# ... to make a graph

<img src="arrow-graph.png">

# Let's go back to question 1

1. What is the most "important" station in the London Underground network?


# Defining "importance"

<img src="tube.png">

# Let's talk about betweenness centrality

<img src="betweenness_illustration.png" height=400 width=400>

## Betweenness seems like a good metric to measure the "importance" of a station
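To make the metric concrete: the betweenness of a station counts, over all pairs of other stations, the fraction of shortest paths between them that pass through it. The sketch below computes this in plain Python with Brandes' algorithm on a made-up five-vertex graph. It is an illustration only, not the implementation inside `graph-tool`; the function name `betweenness` and the toy graph are assumptions, and the graph is taken to be undirected and unweighted with every neighbour also present as a key.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalised betweenness of each vertex (Brandes' algorithm)."""
    scores = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search that also counts shortest paths from `source`.
        order = []
        predecessors = {v: [] for v in adjacency}
        n_paths = {v: 0 for v in adjacency}
        n_paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Walk back from the farthest vertices, accumulating path fractions.
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    # Every unordered pair was visited from both of its endpoints.
    return {v: s / 2 for v, s in scores.items()}


# Toy graph (not real Tube topology): a chain A-B-C-D with an extra leaf E on B.
toy = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
}
scores = betweenness(toy)
```

In this toy graph `B` scores highest because five of the ten vertex pairs can only reach each other through it, while the leaves `A`, `D` and `E` score zero.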


# Graphs and Python:  `graph-tool`

<img src="graph-tool-small-logo.png">

* `graph-tool` is a Python library written by Tiago Peixoto that provides a number of tools for analyzing and plotting graphs.




# What can you do with `graph-tool` ?



## Create a graph object 


In [3]:
from graph_tool.all import Graph

# Create a new Graph object
graph_object = Graph()

## Add edges and vertices to the graph

In [4]:
# add two vertices
vertex1 = graph_object.add_vertex()
vertex2 = graph_object.add_vertex()





In [5]:
# add an edge
edge1 = graph_object.add_edge(vertex1, vertex2)

## Create property maps

Property maps are super helpful for storing information about your nodes and edges.



In [None]:
vertex_names = graph_object.new_vertex_property("string")



## Create visualizations

In [None]:
from graph_tool.draw import graph_draw



## Filter vertices and edges 

# A sample visualization of the London Underground


<img src="fancy_graph_arf.png">

# Let's go back to betweenness





# Betweenness 

<img src="betweennessimagenew.png">

# We have our answer for question 1...

## Let's take our analysis of betweenness one step further... and answer question 2


## How do problems on one of these important stations affect the Underground network?

## Bokeh: creating interactive data visualizations

<img src="bokeh_logo_small.png">

# A basic visualization of the London Underground

<img src="londontube.png">

# A basic visualization of betweenness

<img src="betweenness_full.png">

# How does the betweenness of each station change when Baker Street is suspended?



<img src="removed_baker.png">
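The same experiment can be sketched without any graph library: compute betweenness, delete one vertex together with its edges, and compute betweenness again. The example below does this on an invented six-vertex network in which `Hub` stands in for a suspended station. None of it is real Tube data or the code behind the plot above; the helper names `shortest_path_counts`, `betweenness` and `without_station` are hypothetical.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(adjacency, source):
    """BFS from `source`: distance and number of shortest paths per reachable vertex."""
    distance = {source: 0}
    n_paths = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in distance:
                distance[w] = distance[v] + 1
                n_paths[w] = 0
                queue.append(w)
            if distance[w] == distance[v] + 1:
                n_paths[w] += n_paths[v]
    return distance, n_paths


def betweenness(adjacency):
    """Unnormalised betweenness by checking every pair of endpoints directly."""
    bfs = {v: shortest_path_counts(adjacency, v) for v in adjacency}
    scores = {v: 0.0 for v in adjacency}
    for s, t in combinations(adjacency, 2):
        dist_s, paths_s = bfs[s]
        dist_t, paths_t = bfs[t]
        if t not in dist_s:
            continue  # s and t are no longer connected
        for v in adjacency:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            # v lies on a shortest s-t path when the two legs add up exactly.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                scores[v] += paths_s[v] * paths_t[v] / paths_s[t]
    return scores


def without_station(adjacency, station):
    """Copy of the network with `station` and all of its edges removed."""
    return {
        v: [w for w in nbrs if w != station]
        for v, nbrs in adjacency.items()
        if v != station
    }


# Invented network: A and C are linked both through Hub and through B.
toy = {
    "X": ["A"],
    "A": ["X", "Hub", "B"],
    "Hub": ["A", "C"],
    "B": ["A", "C"],
    "C": ["Hub", "B", "Y"],
    "Y": ["C"],
}
before = betweenness(toy)
after = betweenness(without_station(toy, "Hub"))
```

With `Hub` in place, traffic between `A` and `C` splits evenly between `Hub` and `B`; once `Hub` is removed, all of it is forced through `B`, whose betweenness doubles.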

## Bokeh allows us to visualize this interactively in the browser

In [3]:
from IPython.display import YouTubeVideo

YouTubeVideo('QBksBcXm1qg')