# The Numbers Behind the Breakthrough: Semmelweis's Handwashing Data Revisited

![image](https://assets.datacamp.com/production/project_20/datasets/ignaz_semmelweis_1860.jpeg)

## 1. Introduction 

In the early 1840s, the Vienna General Hospital was a place of both hope and despair. While it was a leading medical institution, it was also the site of a devastating phenomenon: childbed fever, a deadly disease affecting women who had just given birth. Shockingly, as many as 10% of women delivering babies at the hospital succumbed to this illness. Among the physicians working there was Dr. Ignaz Semmelweis, a Hungarian physician born in 1818, who would later be hailed as a pioneer in the field of medical hygiene.

Dr. Semmelweis was deeply troubled by the high mortality rates from childbed fever. Through careful observation and analysis, he hypothesised that the disease was caused by the contaminated hands of doctors who performed autopsies and then attended childbirths without washing their hands. Although his evidence was compelling, his contemporaries were sceptical and resisted his pleas for handwashing. This resistance would delay the widespread adoption of this life-saving practice for decades.

In this Jupyter notebook, we will re-analyse the data that led Dr. Semmelweis to his groundbreaking discovery. By comparing the mortality rates before and after the introduction of handwashing, we aim to show the profound impact of this simple yet revolutionary practice. Through this analysis, we hope to honour Dr. Semmelweis's legacy and underscore the critical importance of hygiene in medical settings.

## 2. Disturbing Death Toll

To understand the gravity of the situation at the Vienna General Hospital during the 1840s, we begin by examining the data that made Dr. Ignaz Semmelweis realise something was terribly wrong with the hospital's procedures. Using `pandas.read_csv`, we load a dataset that records the yearly number of births and deaths at each of the hospital's two clinics from 1841 to 1846. The data reveals a shocking reality: childbirth was extremely dangerous, with a significant number of women dying as a result, primarily from childbed fever.

In [1]:
# Importing modules
import pandas as pd

# Read datasets/yearly_deaths_by_clinic.csv into yearly
yearly = pd.read_csv("datasets/yearly_deaths_by_clinic.csv")

# Print out yearly
print(yearly)

    year  births  deaths    clinic
0   1841    3036     237  clinic 1
1   1842    3287     518  clinic 1
2   1843    3060     274  clinic 1
3   1844    3157     260  clinic 1
4   1845    3492     241  clinic 1
5   1846    4010     459  clinic 1
6   1841    2442      86  clinic 2
7   1842    2659     202  clinic 2
8   1843    2739     164  clinic 2
9   1844    2956      68  clinic 2
10  1845    3241      66  clinic 2
11  1846    3754     105  clinic 2


To better grasp the severity of the issue, we focus on the **proportion of deaths** relative to the number of women giving birth. This metric allows us to quantify the risk faced by women during childbirth. Specifically, we zoom in on **Clinic 1**, where the mortality rates were particularly alarming.

In [2]:
# Calculate proportion of deaths per no. births
yearly["proportion_deaths"] = yearly["deaths"]/yearly["births"]

# Extract Clinic 1 data into clinic_1 and Clinic 2 data into clinic_2
clinic_1 = yearly[yearly["clinic"] == "clinic 1"]
clinic_2 = yearly[yearly["clinic"] == "clinic 2"]

# Print out clinic_1
print(clinic_1)

   year  births  deaths    clinic  proportion_deaths
0  1841    3036     237  clinic 1           0.078063
1  1842    3287     518  clinic 1           0.157591
2  1843    3060     274  clinic 1           0.089542
3  1844    3157     260  clinic 1           0.082357
4  1845    3492     241  clinic 1           0.069015
5  1846    4010     459  clinic 1           0.114464


### Key Observations:
- In every year from 1841 to 1846, between roughly 7% and 16% of the women who gave birth in Clinic 1 died, primarily from childbed fever.
- The rate never fell below about 7%, which points to a systemic issue rather than an isolated incident.
- These findings call for an explanation, which Dr. Semmelweis later found in the lack of hand hygiene among medical staff.

By analysing the proportion of deaths, we can clearly see why Dr. Semmelweis was deeply concerned and motivated to investigate the root cause of this tragedy. In the following sections, we will delve deeper into the data to explore how his discovery of handwashing transformed the outcomes for women at the Vienna General Hospital.
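
As a complement to the year-by-year proportions, the sketch below pools all six years into a single rate per clinic. It rebuilds the yearly table from the figures printed above so that it runs without the CSV file; the column name `pooled_proportion` is our own choice and does not appear in the dataset.

```python
import pandas as pd

# Yearly figures copied from the table printed earlier in this section
yearly = pd.DataFrame({
    "year": [1841, 1842, 1843, 1844, 1845, 1846] * 2,
    "births": [3036, 3287, 3060, 3157, 3492, 4010,
               2442, 2659, 2739, 2956, 3241, 3754],
    "deaths": [237, 518, 274, 260, 241, 459,
               86, 202, 164, 68, 66, 105],
    "clinic": ["clinic 1"] * 6 + ["clinic 2"] * 6,
})

# Sum births and deaths per clinic, then divide to get one pooled rate each
totals = yearly.groupby("clinic")[["births", "deaths"]].sum()
totals["pooled_proportion"] = totals["deaths"] / totals["births"]  # hypothetical column name
print(totals)
```

Pooled over 1841 to 1846, this gives roughly 9.9% for Clinic 1 (1989 deaths in 20042 births) against roughly 3.9% for Clinic 2 (691 deaths in 17791 births).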

## 3. Death at the clinics
<p>If we now plot the proportion of deaths at both Clinic 1 and Clinic 2, we'll see a curious pattern…</p>

In [None]:
# Import matplotlib
import matplotlib.pyplot as plt

# This makes plots appear in the notebook
%matplotlib inline

# Plot yearly proportion of deaths at the two clinics
ax = clinic_1.plot(x="year", y="proportion_deaths",
              label="clinic_1")
clinic_2.plot(x="year", y="proportion_deaths",
         label="clinic_2", ax=ax, ylabel="Proportion deaths")
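
For readers who prefer numbers to a plot, the same comparison can be sketched as a table with one row per year and one column per clinic. The yearly figures are re-entered from the table printed in section 2 so the block is self-contained; `ratio_1_to_2` is a column name we made up for this illustration.

```python
import pandas as pd

# Yearly figures copied from the table printed in section 2
yearly = pd.DataFrame({
    "year": [1841, 1842, 1843, 1844, 1845, 1846] * 2,
    "births": [3036, 3287, 3060, 3157, 3492, 4010,
               2442, 2659, 2739, 2956, 3241, 3754],
    "deaths": [237, 518, 274, 260, 241, 459,
               86, 202, 164, 68, 66, 105],
    "clinic": ["clinic 1"] * 6 + ["clinic 2"] * 6,
})
yearly["proportion_deaths"] = yearly["deaths"] / yearly["births"]

# Reshape to one row per year and one column per clinic
by_year = yearly.pivot(index="year", columns="clinic", values="proportion_deaths")

# How many times higher Clinic 1's rate was in each year (hypothetical column name)
by_year["ratio_1_to_2"] = by_year["clinic 1"] / by_year["clinic 2"]
print(by_year.round(3))
```

In this table the ratio is above 1 in every year, so Clinic 1 had the higher proportion of deaths throughout 1841 to 1846.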

## 4. The handwashing begins
<p>Why is the proportion of deaths consistently so much higher in Clinic 1? Semmelweis saw the same pattern and was puzzled and distressed. The only difference between the clinics was that many medical students served at Clinic 1, while mostly midwife students served at Clinic 2. While the midwives only tended to the women giving birth, the medical students also spent time in the autopsy rooms examining corpses. </p>
<p>Semmelweis started to suspect that something on the corpses, spread by the hands of the medical students, caused childbed fever. So in a desperate attempt to stop the high mortality rates, he decreed: <em>Wash your hands!</em> This was an unorthodox and controversial request, since nobody in Vienna knew about bacteria at this point in time. </p>
<p>Let's load in monthly data from Clinic 1 to see if the handwashing had any effect.</p>

In [None]:
# Read datasets/monthly_deaths.csv into monthly
monthly = pd.read_csv("datasets/monthly_deaths.csv", parse_dates=["date"])

# Calculate proportion of deaths per no. births
monthly["proportion_deaths"] =  monthly["deaths"]/monthly["births"]

# Print out the first rows in monthly
print(monthly.head())
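
The `parse_dates` argument matters for the later steps, because the before/after split compares the `date` column against a timestamp. The sketch below shows the difference on a tiny in-memory CSV. The two rows are made-up values in the same column layout (`date`, `births`, `deaths`) and are not taken from `monthly_deaths.csv`.

```python
import io
import pandas as pd

# Made-up rows for illustration only, NOT values from monthly_deaths.csv
csv_text = """date,births,deaths
1841-01-01,250,30
1841-02-01,240,20
"""

# Without parse_dates the date column is read as text
as_text = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the column becomes a datetime column
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(as_text["date"].dtype)
print(as_dates["date"].dtype)

# Datetime columns can be compared directly against a timestamp
early = as_dates[as_dates["date"] < pd.Timestamp("1841-02-01")]
print(early)
```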

## 5. The effect of handwashing
<p>With the data loaded we can now look at the proportion of deaths over time. In the plot below we haven't marked where obligatory handwashing started, but it reduced the proportion of deaths to such a degree that you should be able to spot it!</p>

In [None]:
# Plot monthly proportion of deaths
ax = monthly.plot(x="date", y="proportion_deaths",
         ylabel="Proportion deaths")

## 6. The effect of handwashing highlighted
<p>Starting from the summer of 1847 the proportion of deaths is drastically reduced and, yes, this was when Semmelweis made handwashing obligatory. </p>
<p>The effect of handwashing becomes even clearer if we highlight this in the graph.</p>

In [None]:
# Date when handwashing was made mandatory
handwashing_start = pd.to_datetime('1847-06-01')

# Split monthly into before and after handwashing_start
before_washing = monthly[monthly["date"] < handwashing_start]
after_washing = monthly[monthly["date"] >= handwashing_start]

# Plot monthly proportion of deaths before and after handwashing
ax = before_washing.plot(x="date", y="proportion_deaths",
              label="before_washing")
after_washing.plot(x="date", y="proportion_deaths",
         label="after_washing", ax=ax, ylabel="Proportion deaths")

## 7. More handwashing, fewer deaths?
<p>Again, the graph shows that handwashing had a huge effect. How much did it reduce the monthly proportion of deaths on average?</p>

In [None]:
# Difference in mean monthly proportion of deaths due to handwashing
before_proportion = before_washing["proportion_deaths"]
after_proportion = after_washing["proportion_deaths"]
mean_diff = after_proportion.mean() - before_proportion.mean()
mean_diff
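
The same difference can also be obtained by labelling each month and grouping, which avoids keeping two separate data frames. The sketch below uses hypothetical monthly figures (NOT Semmelweis's data), chosen so that the arithmetic is easy to follow; the `period` column is our own addition.

```python
import pandas as pd

# Hypothetical figures, NOT Semmelweis's data: six months on each side of the cutoff
monthly = pd.DataFrame({
    "date": pd.date_range("1846-12-01", periods=12, freq="MS"),
    "births": [300] * 12,
    "deaths": [30, 33, 27, 36, 30, 24, 6, 9, 3, 6, 9, 3],
})
monthly["proportion_deaths"] = monthly["deaths"] / monthly["births"]

# Label each month relative to the date handwashing became mandatory
handwashing_start = pd.to_datetime("1847-06-01")
monthly["period"] = (monthly["date"] >= handwashing_start).map(
    {True: "after", False: "before"}
)

# Mean monthly proportion per period, then after minus before (same sign as mean_diff)
group_means = monthly.groupby("period")["proportion_deaths"].mean()
mean_diff = group_means["after"] - group_means["before"]
print(group_means)
print(mean_diff)
```

With these made-up numbers the mean falls from 0.10 to 0.02, so `mean_diff` is about -0.08; the negative sign indicates a reduction.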

## 8. A bootstrap analysis of Semmelweis's handwashing data
<p>Handwashing reduced the proportion of deaths by around 8 percentage points! From 10% on average to just 2% (which is still a high number by modern standards). </p>
<p>To get a feeling for the uncertainty around how much handwashing reduces mortality, we could look at a confidence interval (here calculated using the bootstrap method).</p>

In [None]:
# A bootstrap analysis of the reduction of deaths due to handwashing
boot_mean_diff = []
for i in range(3000):
    boot_before = before_proportion.sample(frac=1, replace=True)
    boot_after = after_proportion.sample(frac=1, replace=True)
    # Same sign convention as mean_diff: after minus before (negative = fewer deaths)
    boot_mean_diff.append(boot_after.mean() - boot_before.mean())

# Calculating a 95% confidence interval from boot_mean_diff 
confidence_interval = pd.Series(boot_mean_diff).quantile([0.025, 0.975])
confidence_interval
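
Because the resampling is random, the interval changes slightly on every run. One possible way to make it reproducible is a seeded NumPy generator, sketched below. The function name `bootstrap_mean_diff`, its parameters, and the sample values are all assumptions made for this illustration (the values are NOT Semmelweis's data). The difference is taken as after minus before, the same direction as `mean_diff`.

```python
import numpy as np
import pandas as pd

def bootstrap_mean_diff(before, after, n_boot=3000, seed=0):
    """Percentile bootstrap 95% interval for mean(after) - mean(before)."""
    rng = np.random.default_rng(seed)  # fixed seed makes the result repeatable
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group with replacement, keeping its original size
        boot_before = rng.choice(before, size=before.size, replace=True)
        boot_after = rng.choice(after, size=after.size, replace=True)
        diffs[i] = boot_after.mean() - boot_before.mean()
    return np.quantile(diffs, [0.025, 0.975])

# Hypothetical monthly proportions, NOT Semmelweis's data
before = pd.Series([0.10, 0.11, 0.09, 0.12, 0.10, 0.08])
after = pd.Series([0.02, 0.03, 0.01, 0.02, 0.03, 0.01])

low, high = bootstrap_mean_diff(before, after)
print(low, high)
```

Calling the function twice with the same `seed` returns the same interval, which makes the reported bounds easier to check.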

## 9. The fate of Dr. Semmelweis
<p>So handwashing reduced the proportion of deaths by between 6.7 and 10 percentage points, according to a 95% confidence interval. All in all, it would seem that Semmelweis had solid evidence that handwashing was a simple but highly effective procedure that could save many lives.</p>
<p>The tragedy is that, despite the evidence, Semmelweis's theory that childbed fever was caused by some "substance" (what we today know as <em>bacteria</em>) from autopsy room corpses was ridiculed by contemporary scientists. The medical community largely rejected his discovery and in 1849 he was forced to leave the Vienna General Hospital for good.</p>
<p>One reason for this was that statistics and statistical arguments were uncommon in medical science in the 1800s. Semmelweis published his data only as long tables of raw numbers, without any graphs or confidence intervals. If he had had access to the analysis we've just put together, he might have been more successful in getting the Viennese doctors to wash their hands.</p>

In [None]:
# The data Semmelweis collected points to this conclusion:
doctors_should_wash_their_hands = True