# Introduction

This notebook builds an interactive map to explore John Snow's famous cholera outbreak investigation through data. 

We'll visualize deaths over time and location, highlight water pumps, and show how the outbreak peaked and declined. 

This approach brings the historic story to life with layered, interactive mapping, adding nuance to the classic tale.


**John Snow's Historic Cholera Map**

![John Snow's historic cholera map](images/snow_historic_map.png)

## Base Map

Using Folium, we create a base map of Soho, London, where the outbreak took place in 1854.


In [86]:
!pip install folium


In [103]:
import folium
import pandas as pd

# Data
pumps = pd.read_csv('data/pumps.csv')
deaths = pd.read_csv('data/deaths.csv')

# Center the map with lat and long
center_lat = pd.concat([pumps['latitude'], deaths['latitude']]).mean()
center_lon = pd.concat([pumps['longitude'], deaths['longitude']]).mean()

# Base map
map_base = folium.Map(location=[center_lat, center_lon], zoom_start=17, tiles='OpenStreetMap')
map_base
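
The centering step above averages every pump and death coordinate, so the center is pulled toward wherever the points are densest. The sketch below uses made-up coordinates (not the values in `data/pumps.csv` or `data/deaths.csv`) to compare that mean with the midpoint of the bounding box. The helper names `mean_center` and `bbox_center` are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Toy coordinates (hypothetical values, not taken from the Snow data files).
pumps_demo = pd.DataFrame({"latitude": [51.0, 53.0], "longitude": [-0.2, -0.1]})
deaths_demo = pd.DataFrame({"latitude": [51.0] * 4, "longitude": [-0.2] * 4})


def mean_center(*frames):
    """Average latitude/longitude over every row of every frame."""
    lat = pd.concat([f["latitude"] for f in frames]).mean()
    lon = pd.concat([f["longitude"] for f in frames]).mean()
    return lat, lon


def bbox_center(*frames):
    """Midpoint of the bounding box that encloses every row."""
    lat = pd.concat([f["latitude"] for f in frames])
    lon = pd.concat([f["longitude"] for f in frames])
    return (lat.min() + lat.max()) / 2, (lon.min() + lon.max()) / 2


mean_lat, mean_lon = mean_center(pumps_demo, deaths_demo)
box_lat, box_lon = bbox_center(pumps_demo, deaths_demo)
print(mean_lat, box_lat)
```

With four of the six toy points at latitude 51.0, the mean sits much closer to 51.0 than the bounding-box midpoint does. For this notebook the mean is a reasonable choice because the deaths cluster around the area of interest.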


## Pump Locations

This section creates an interactive map that focuses on the water pumps of Soho, the key sites in John Snow’s investigation.

* We reuse the pump data and the base map from the previous cell; the map is centered on the average location of all pumps and deaths to provide the best overview.
* Each water pump is marked with an icon and an informative popup, including a fun or historical fact about the pump.
* Broad Street Pump is specially marked to highlight its unique place in the story.
* Use this map to explore the role and placement of communal water sources in the neighborhood, setting the stage for understanding how water access affected the spread and discovery of cholera.

This simplified map helps visualize the infrastructure that shaped both daily life and the course of the epidemic.
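
The popup text and marker color described in the list above come down to two small decisions per pump: look up a fact with a fallback, and highlight one pump by name. A minimal sketch of that logic, kept free of Folium so it can be checked on its own, follows. The function names `pump_popup_html` and `pump_color` and the trimmed `demo_facts` dictionary are assumptions made for this example.

```python
# Trimmed copy of the facts dictionary, for demonstration only.
demo_facts = {"Warwick": "Warwick pump: a survivor!"}


def pump_popup_html(name, facts, default="A mysterious pump!"):
    """Return popup HTML: bold pump name, line break, then a fact or the fallback."""
    return f"<b>{name}</b><br>{facts.get(name, default)}"


def pump_color(name, highlight="Broad St."):
    """Return 'blue' for the highlighted pump and 'green' for every other pump."""
    return "blue" if name == highlight else "green"


print(pump_popup_html("Warwick", demo_facts))
print(pump_popup_html("Unknown pump", demo_facts))
print(pump_color("Broad St."), pump_color("Warwick"))
```

Because the lookup is by exact name, a pump whose name in the CSV differs from the dictionary key, even by spelling, falls back to the default text.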


In [104]:
map_pump = map_base  # same Map object, not a copy: the pump markers are added to map_base too

# random facts of Broad St. & its neighboring pumps
pump_facts = {
    "Broad St.": "John Snow removed the pump handle here on Sept 8! 💪",
    "Crown Chapel": "A lesser-known pump, but still important.",
    "Gt Marlborough": "Did this pump play a role? 🤔",
    "Dean St.": "Dean Street: bustling and busy.",
    "So Soho": "Soho's secret water source.",
    "Briddle St.": "Briddle St. pump: overlooked by many.",
    "Coventry St.": "Coventry St. pump: on the edge of the outbreak.",
    "Warwick": "Warwick pump: a survivor!"
}

# Loop through each row of the pumps Data
for i, row in pumps.iterrows():
    folium.Marker(
        [row['latitude'], row['longitude']],    # Pump Locations
        popup=f"<b>{row['pump_name']}</b><br>{pump_facts.get(row['pump_name'], 'A mysterious pump!')}",  # Pump Names and facts
        icon=folium.Icon(
            color='blue' if row['pump_name']=='Broad St.' else 'green',                                  # Broad St. pump is blue, others green
            icon='tint', prefix='fa'
        ),
        tooltip="Click for pump info"           # Tooltip shown on hover
    ).add_to(map_pump)

map_pump


## Death Locations

This code chunk takes the daily death counts and assigns a date to each mapped death, creating an approximate temporal view of the outbreak. It then visualizes these deaths on a map as emoji markers, which can be clicked to reveal the date assigned to each death.
* Uses daily death counts to assign approximate dates to each mortality record
* Displays deaths as emoji icons on the map for intuitive visualization
* Popups show the assigned date for every death when clicked
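
The date assignment described above can also be written without an explicit loop. The sketch below uses made-up counts and coordinates (not the contents of `data/snow_dates.csv` or `data/deaths.csv`) and relies on `Series.repeat` to expand each date once per death. The pairing is purely positional, which is why the resulting dates are approximate.

```python
import pandas as pd

# Toy daily counts (hypothetical numbers, not the real snow_dates.csv values).
dates_demo = pd.DataFrame({
    "date": pd.to_datetime(["1854-08-31", "1854-09-01", "1854-09-02"]),
    "deaths": [1, 0, 2],
})
# Toy death locations (hypothetical coordinates).
deaths_demo = pd.DataFrame({
    "latitude": [51.5131, 51.5133],
    "longitude": [-0.1379, -0.1365],
})

# Repeat each date once per death recorded on that day; days with 0 deaths vanish.
expanded = dates_demo["date"].repeat(dates_demo["deaths"].to_numpy()).reset_index(drop=True)

# Fail early with a clear message if there are fewer dated deaths than mapped deaths.
if len(expanded) < len(deaths_demo):
    raise ValueError("Fewer dated deaths than death locations; cannot assign dates.")

# Pair dates with death rows by position, dropping any surplus dates at the end.
deaths_demo["assigned_date"] = expanded.iloc[:len(deaths_demo)].to_numpy()
print(deaths_demo)
```

The explicit length check turns a silent truncation or a confusing length-mismatch error into a readable message.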

In [105]:
# Deaths Map
map_death = map_pump

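# Daily death counts; parse the dates so they can be formatted with strftime below
dates = pd.read_csv('data/snow_dates.csv')
dates['date'] = pd.to_datetime(dates['date'])

# Create a list repeating each date for the number of deaths on that date
date_assignments = []

for _, date_row in dates.iterrows():
    # For each row in snow_dates, repeat that date for as many deaths as in 'deaths'
    date_assignments.extend([date_row['date']] * date_row['deaths'])

# Assign the dates to the death records by position (an approximation);
# this raises a ValueError if snow_dates.csv lists fewer deaths than deaths.csv has rows
deaths['assigned_date'] = date_assignments[:len(deaths)]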

# Add deaths to the map as emoji markers with date popups
deaths_fg = folium.FeatureGroup(name="Cholera Deaths", show=True)

# For each death, add an emoji icon and a popup with the date
for _, row in deaths.iterrows():
    # Format the assigned date for display
    date_str = row['assigned_date'].strftime('%B %d, %Y')
    popup_html = f"<b>Cholera victim</b><br>Date: {date_str}"
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        icon=folium.DivIcon(html=f"""<div style="font-size:16.5px;">🥵</div>"""),
        popup=popup_html
    ).add_to(deaths_fg)
deaths_fg.add_to(map_death)

map_death


## Interactive Cholera Map

The last part combines all of the layers and data points above into one map. The pumps and deaths are clickable: their popups show pump names and facts, and the date assigned to each death. The layers include:
* The pump locations
* Death locations
* Death clusters representing the outbreak period:
    * Before the peak outbreak
    * During the peak outbreak
    * After the pump handle removal
* An epicenter circle (350 m radius) centered on the Broad St. pump, marking the area affected during the peak outbreak
  
The toggleable layers and popup markers make Snow's data easier to explore interactively.
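
The three outbreak periods listed above are defined by two cut-off dates: September 1 (start of the peak) and September 8 (pump handle removal). A minimal sketch of that split on made-up dates follows. The constant names `PEAK_START` and `HANDLE_REMOVED` and the function `label_period` are hypothetical, introduced only for this illustration.

```python
import pandas as pd

# Toy assigned dates (hypothetical rows, not the real deaths data).
deaths_demo = pd.DataFrame({
    "assigned_date": pd.to_datetime(
        ["1854-08-25", "1854-09-01", "1854-09-07", "1854-09-08", "1854-09-20"]
    ),
})

PEAK_START = pd.Timestamp("1854-09-01")      # first day counted as the peak
HANDLE_REMOVED = pd.Timestamp("1854-09-08")  # first day counted as "after"


def label_period(date):
    """Classify a date as 'before', 'during', or 'after' the peak."""
    if date < PEAK_START:
        return "before"
    if date < HANDLE_REMOVED:
        return "during"
    return "after"


deaths_demo["period"] = deaths_demo["assigned_date"].map(label_period)
print(deaths_demo)
```

Using half-open ranges (`>= start` and `< next start`) means every date lands in exactly one period, including any timestamp that carries a time of day.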

In [109]:
import pandas as pd
import folium

# --- LOAD DATA ---
pumps = pd.read_csv('data/pumps.csv')
deaths = pd.read_csv('data/deaths.csv')
dates = pd.read_csv('data/snow_dates.csv')
dates['date'] = pd.to_datetime(dates['date'])

# --- ASSIGN DEATH DATES ---
date_assignments = []
for _, date_row in dates.iterrows():
    date_assignments.extend([date_row['date']] * date_row['deaths'])
deaths['assigned_date'] = date_assignments[:len(deaths)]

# --- MAP CENTER ---
center_lat = pd.concat([pumps['latitude'], deaths['latitude']]).mean()
center_lon = pd.concat([pumps['longitude'], deaths['longitude']]).mean()
my_map = folium.Map(location=[center_lat, center_lon], zoom_start=16.2, tiles='OpenStreetMap')

# --- WATER PUMPS FEATURE GROUP ---
pump_facts = {
    "Broad St.": "John Snow removed the pump handle here on Sept 8! 💪",
    "Crown Chapel": "A lesser-known pump, but still important.",
    "Gt Marlborough": "Did this pump play a role? 🤔",
    "Dean St.": "Dean Street: bustling and busy.",
    "So Soho": "Soho's secret water source.",
    "Briddle St.": "Briddle St. pump: overlooked by many.",
    "Coventry St.": "Coventry St. pump: on the edge of the outbreak.",
    "Warwick": "Warwick pump: a survivor!"
}
pump_fg = folium.FeatureGroup(name="Water Pumps", show=True)
for _, row in pumps.iterrows():
    fact = pump_facts.get(row['pump_name'], "A mysterious pump!")
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=folium.Popup(f"<b>{row['pump_name']}</b><br>{fact}", max_width=250),
        icon=folium.Icon(color='blue' if row['pump_name']=='Broad St.' else 'green', icon='tint', prefix='fa'),
        tooltip='Click for story'
    ).add_to(pump_fg)
pump_fg.add_to(my_map)

# --- CHOLERA DEATHS SPLIT BY PERIOD ---
deaths['assigned_date'] = pd.to_datetime(deaths['assigned_date'])
before_peak = deaths[deaths['assigned_date'] < '1854-09-01']
during_peak = deaths[(deaths['assigned_date'] >= '1854-09-01') & (deaths['assigned_date'] <= '1854-09-07')]
after_intervention = deaths[deaths['assigned_date'] >= '1854-09-08']

# EMOJI MARKERS: GROUPS BEFORE/DURING/AFTER
before_fg = folium.FeatureGroup(name="Before Peak (Aug 19-31)", show=True)
for _, row in before_peak.iterrows():
    date_str = row['assigned_date'].strftime('%B %d, %Y')
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        icon=folium.DivIcon(html='<div style="font-size:16.5px;">🥵</div>'),
        popup=f"<b>Cholera victim</b><br>Date:<br>{date_str}",
    ).add_to(before_fg)
before_fg.add_to(my_map)

during_fg = folium.FeatureGroup(name="During Peak (Sept 1-7)", show=True)
for _, row in during_peak.iterrows():
    date_str = row['assigned_date'].strftime('%B %d, %Y')
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        icon=folium.DivIcon(html='<div style="font-size:16.5px;">🥵</div>'),
        popup=f"<b>Cholera victim</b><br>Date:<br>{date_str}",
    ).add_to(during_fg)
during_fg.add_to(my_map)

after_fg = folium.FeatureGroup(name="After Intervention (Sept 8-30)", show=True)
for _, row in after_intervention.iterrows():
    date_str = row['assigned_date'].strftime('%B %d, %Y')
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        icon=folium.DivIcon(html='<div style="font-size:16.5px;">🥵</div>'),
        popup=f"<b>Cholera victim</b><br>Date:<br>{date_str}",
    ).add_to(after_fg)
after_fg.add_to(my_map)

# --- EPICENTER FEATURE GROUP ---
epic_fg = folium.FeatureGroup(name="Epicenter Zone", show=True)
folium.Circle(
    radius=350,
    location=[51.513341, -0.136668],  # set to Broad St pump
    color='crimson',
    fill=True,
    fill_opacity=0.16,
    popup="<b>Epicenter of outbreak!</b><br>Broad St. Pump area"
).add_to(epic_fg)
epic_fg.add_to(my_map)

# --- LAYER CONTROL ---
folium.LayerControl(collapsed=False).add_to(my_map)
my_map
