## 1. The Discovery of Handwashing
<p>In the mid-1800s, Dr. Ignaz Semmelweis was an obstetrician at Vienna General Hospital. At the time, maternal death due to puerperal fever was common, but he was particularly concerned that the death rate in his clinic (Clinic 1) was much higher than the death rate in another clinic at Vienna General Hospital (Clinic 2). <em>So what was the difference between these two clinics?</em> Doctors and midwives worked in Clinic 1, while only midwives worked in Clinic 2. This led Dr. Semmelweis to hypothesize that doctors carried deadly "cadaverous particles" from their autopsies to their patients in Clinic 1.</p>
<p>In 1847, Dr. Semmelweis instituted a policy requiring doctors to wash their hands with a chlorine solution between performing autopsies and seeing patients. The maternal mortality rate decreased drastically, as seen in the plot below. Sadly, germ theory (the idea that there are particles that cause disease) was not widely accepted at the time, so his hypothesis was rejected by most doctors.</p>
<p><img src="https://assets.datacamp.com/production/project_1187/img/semmelweis_plot.png" alt="Line plot of maternal mortality rate in Clinic 1 at Vienna General Hospital" width="600px"></p>
<p>The two datasets you will use are from Dr. Semmelweis's original 1861 publication<sup>1</sup>. Here are the details:</p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
<div style="font-size:20px"><b>datasets/clinic_data.csv</b></div>
This contains yearly clinic-level data on births and maternal deaths in each of the two maternity clinics at Vienna General Hospital.
<ul>
<li><b><code>year</code>:</b> each year from 1833 to 1858</li>
<li><b><code>births</code>:</b> total number of births in the clinic</li>
<li><b><code>deaths</code>:</b> number of maternal deaths in the clinic</li>
<li><b><code>clinic</code>:</b> clinic (either <code>clinic_1</code> or <code>clinic_2</code>). Doctors and midwives worked in Clinic 1, while only midwives worked in Clinic 2.</li>
</ul>
</div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
<div style="font-size:20px"><b>datasets/hospital_data.csv</b></div>
This contains yearly hospital-level data on births and maternal deaths. 
<ul>
<li><b><code>year</code>:</b> each year from 1784 to 1848</li>
<li><b><code>births</code>:</b> total number of births at the hospital</li>
<li><b><code>deaths</code>:</b> number of maternal deaths at the hospital</li>
<li><b><code>hospital</code>:</b> hospital (either <code>Vienna</code> or <code>Dublin</code>). At the Vienna General Hospital where Dr. Semmelweis worked, doctors began performing pathological autopsies in 1823. At the Dublin Rotunda Hospital, doctors did not perform pathological autopsies at all.</li>
</ul>
</div>
<p><small><sup>1</sup><a href="http://graphics8.nytimes.com/images/blogs/freakonomics/pdf/the%20etiology,%20concept%20and%20prophylaxis%20of%20childbed%20fever.pdf">Ignaz Semmelweis: The etiology, concept, and prophylaxis of childbed fever.</a></small></p>
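
Both datasets record `births` and `deaths`, so a yearly maternal death rate can be derived as `deaths / births`. The sketch below illustrates this on three Clinic 1 rows (1833 to 1835) copied from `clinic_data`; the names `clinic_sample`, `mean_of_rates`, and `pooled_rate` are illustrative assumptions, not part of the project. It also contrasts two ways of averaging, since the unweighted mean of yearly rates is not the same quantity as total deaths over total births.

```r
# Three rows copied from clinic_data (Clinic 1, 1833-1835); illustrative sample only
clinic_sample <- data.frame(
  year   = c(1833, 1834, 1835),
  births = c(3737, 2657, 2573),
  deaths = c(197, 205, 143),
  clinic = "clinic_1"
)

# Death rate: the share of births that ended in a maternal death
clinic_sample$death_rate <- clinic_sample$deaths / clinic_sample$births

# Unweighted mean of the yearly rates (every year counts equally)
mean_of_rates <- mean(clinic_sample$death_rate)

# Pooled rate: total deaths over total births (years weighted by number of births)
pooled_rate <- sum(clinic_sample$deaths) / sum(clinic_sample$births)

print(clinic_sample)
print(c(mean_of_rates = mean_of_rates, pooled_rate = pooled_rate))
```

The two averages differ slightly because years with more births carry more weight in the pooled rate. Which one is appropriate depends on the question being asked; the sketch only makes the distinction visible.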

In [36]:
# Load necessary library
library(dplyr)

# Load the data
clinic_data <- read.csv("datasets/clinic_data.csv")
hospital_data <- read.csv("datasets/hospital_data.csv")

# Inspect the structure of hospital_data
# (str() prints its summary itself; wrapping it in print() only adds a stray NULL)
str(hospital_data)

# Stop early if any column used below is missing
stopifnot(
  all(c("year", "births", "deaths", "hospital") %in% colnames(hospital_data)),
  all(c("year", "births", "deaths", "clinic") %in% colnames(clinic_data))
)

# Add the yearly maternal death rate (deaths per birth) to both datasets
hospital_data <- hospital_data %>%
  mutate(death_rate = deaths / births)
clinic_data <- clinic_data %>%
  mutate(death_rate = deaths / births)

# Check the column names
print(colnames(hospital_data))

# Keep only the Vienna hospital and flag the years from 1823 onward,
# when doctors there began performing pathological autopsies
hospital_data_vienna <- hospital_data %>%
  filter(hospital == "Vienna") %>%
  mutate(autopsies_introduced = year >= 1823)

# Average yearly death rate in each clinic before handwashing was introduced in 1847
rate_by_clinic_pre_handwashing <- clinic_data %>%
  filter(year < 1847) %>%
  group_by(clinic) %>%
  summarize(avg_rate = mean(death_rate), .groups = "drop")

# Average yearly death rate in Vienna before and after autopsies were introduced in 1823
rate_by_autopsies_introduced <- hospital_data_vienna %>%
  group_by(autopsies_introduced) %>%
  summarize(avg_rate = mean(death_rate), .groups = "drop")

# Display the results
print("Clinic Data with Death Rate:")
print(clinic_data)
print("Hospital Data with Death Rate:")
print(hospital_data_vienna)
print("Average Death Rate Before Handwashing by Clinic:")
print(rate_by_clinic_pre_handwashing)
print("Average Death Rate Before and After Autopsies Introduced:")
print(rate_by_autopsies_introduced)


'data.frame':	130 obs. of  4 variables:
 $ year    : int  1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 ...
 $ births  : int  1261 1292 1351 1347 1469 1435 1546 1602 1631 1747 ...
 $ deaths  : int  11 8 8 10 23 25 12 25 10 19 ...
 $ hospital: chr  "Dublin" "Dublin" "Dublin" "Dublin" ...
[1] "year"       "births"     "deaths"     "hospital"   "death_rate"
[1] "Clinic Data with Death Rate:"
   year births deaths   clinic  death_rate
1  1833   3737    197 clinic_1 0.052716082
2  1834   2657    205 clinic_1 0.077154686
3  1835   2573    143 clinic_1 0.055577147
4  1836   2677    200 clinic_1 0.074710497
5  1837   2765    251 clinic_1 0.090777577
6  1838   2987     91 clinic_1 0.030465350
7  1839   2781    151 clinic_1 0.054297015
8  1840   2889    267 clinic_1 0.092419522
9  1841   3036    237 clinic_1 0.078063241
10 1842   3287    518 clinic_1 0.157590508
11 1843   3060    274 clinic_1 0.089542484
12 1844   3157    260 clinic_1 0.082356668
13 1845   3492    241 clinic_1 0.0690148