<center><img src="ignaz_semmelweis_1860_small.jpeg"></center>

Hungarian physician Dr. Ignaz Semmelweis worked at the Vienna General Hospital with childbed fever patients. Childbed fever is a deadly disease affecting women who have just given birth, and in the early 1840s, as many as 10% of the women giving birth at the Vienna General Hospital died from it. Dr. Semmelweis concluded that the disease was spread by the contaminated hands of the doctors delivering the babies, and on **June 1st, 1847**, he decreed that everyone should wash their hands. This was an unorthodox and controversial request, because nobody in Vienna knew about bacteria at the time.

I will reanalyze the data that led Semmelweis to recognize the importance of handwashing, and measure its impact on the number of deaths at the hospital.

The data is stored as two CSV files within the `data` folder.

`data/yearly_deaths_by_clinic.csv` contains the number of births and deaths at each of the two clinics of the Vienna General Hospital for the years 1841 to 1846.

| Column | Description |
|--------|-------------|
|`year`  |Year (1841-1846)|
|`births`|Number of births|
|`deaths`|Number of deaths|
|`clinic`|Clinic (`clinic 1` or `clinic 2`)|

`data/monthly_deaths.csv` contains monthly data from Clinic 1 of the hospital, where most of the deaths occurred.

| Column | Description |
|--------|-------------|
|`date`|First day of the month (YYYY-MM-DD)|
|`births`|Number of births|
|`deaths`|Number of deaths|

In [52]:
# Imported libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Loading data

deaths_year=pd.read_csv('data/yearly_deaths_by_clinic.csv')
deaths_month=pd.read_csv('data/monthly_deaths.csv')

# info() prints its summary itself and returns None, so call it without print()
deaths_year.info()
deaths_month.info()
print(deaths_year.head(12))
print(deaths_month.head(10))

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   year    12 non-null     int64 
 1   births  12 non-null     int64 
 2   deaths  12 non-null     int64 
 3   clinic  12 non-null     object
dtypes: int64(3), object(1)
memory usage: 512.0+ bytes
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 98 entries, 0 to 97
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   date    98 non-null     object
 1   births  98 non-null     int64 
 2   deaths  98 non-null     int64 
dtypes: int64(2), object(1)
memory usage: 2.4+ KB
    year  births  deaths    clinic
0   1841    3036     237  clinic 1
1   1842    3287     518  clinic 1
2   1843    3060     274  clinic 1
3   1844    3157     260  clinic 1
4   1845    3492     241  clinic 1
5   1846    4010     459  clinic 1
6   1841    2442      86  clinic 2
7   1
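The `info()` output shows that `date` is loaded as `object` (plain strings), so the later comparisons against `'1847-07-01'` are text comparisons that work only because the dates are zero-padded ISO strings. Below is a minimal sketch, assuming the same CSV layout, of parsing the column as datetimes at load time. `load_monthly` and its deaths-versus-births sanity check are hypothetical additions, and to stay self-contained the sketch reads the five monthly rows printed later in this notebook from an in-memory string instead of `data/monthly_deaths.csv`.

```python
import io

import pandas as pd

# The first five rows of data/monthly_deaths.csv (births and deaths only)
SAMPLE_CSV = """date,births,deaths
1841-01-01,254,37
1841-02-01,239,18
1841-03-01,277,12
1841-04-01,255,4
1841-05-01,255,2
"""


def load_monthly(source):
    """Read the monthly data with `date` parsed as datetime64 instead of strings."""
    df = pd.read_csv(source, parse_dates=["date"])
    # Hypothetical sanity check: a month should never record more deaths than births
    if (df["deaths"] > df["births"]).any():
        raise ValueError("deaths exceed births in at least one row")
    df["death_proportion"] = df["deaths"] / df["births"]
    return df


monthly = load_monthly(io.StringIO(SAMPLE_CSV))
print(monthly.dtypes)
# With a datetime column the cutoff is compared as a timestamp, not as text
early = monthly[monthly["date"] < pd.Timestamp("1841-03-01")]
print(early)
```

On the real file the call would be `load_monthly('data/monthly_deaths.csv')`.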

In [53]:
# Which year had the highest proportion of deaths (deaths / births) across both clinics?
deaths_year['death_proportion'] = deaths_year['deaths'] / deaths_year['births']
max_value_row = deaths_year.loc[deaths_year['death_proportion'].idxmax()]
highest_year = int(max_value_row['year'])
print("Year with the highest death proportion:", highest_year)

Year with the highest death proportion: 1842
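The cell above returns a single year, the maximum over both clinics combined. To get one answer per clinic, a groupby with `idxmax` can be used; this is a sketch, and `highest_year_per_clinic` is a hypothetical helper. Because the yearly output printed earlier is cut off after the first clinic 2 row, the sample frame holds only the seven rows that are visible, so the clinic 2 result here is not meaningful for the full data.

```python
import pandas as pd


def highest_year_per_clinic(yearly: pd.DataFrame) -> pd.Series:
    """For each clinic, return the year with the highest deaths/births ratio."""
    proportion = yearly["deaths"] / yearly["births"]
    # idxmax within each clinic group gives the row label of that clinic's maximum
    best_rows = proportion.groupby(yearly["clinic"]).idxmax()
    years = yearly.loc[best_rows.to_numpy(), "year"].to_numpy()
    return pd.Series(years, index=best_rows.index, name="year")


# Only the seven yearly rows printed above (clinic 2 is cut off after 1841)
sample = pd.DataFrame({
    "year": [1841, 1842, 1843, 1844, 1845, 1846, 1841],
    "births": [3036, 3287, 3060, 3157, 3492, 4010, 2442],
    "deaths": [237, 518, 274, 260, 241, 459, 86],
    "clinic": ["clinic 1"] * 6 + ["clinic 2"],
})
print(highest_year_per_clinic(sample))
```

On the notebook's data the call would be `highest_year_per_clinic(deaths_year)`.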


In [54]:
# Handwashing was introduced on June 1st, 1847. What are the mean proportions of deaths before and after handwashing in the monthly data?
deaths_month['death_proportion'] = deaths_month['deaths'] / deaths_month['births']
# Each row is dated to the first of its month, so the cutoff '1847-07-01'
# places June 1847 (the month handwashing began) in the "before" group
before_handwashing = deaths_month[deaths_month['date'] < '1847-07-01']
after_handwashing = deaths_month[deaths_month['date'] >= '1847-07-01']
print(before_handwashing.head())
print(after_handwashing.head())

monthly_summary = pd.DataFrame({
    'handwashing_started': [False, True],
    'mean_death_proportion': [before_handwashing['death_proportion'].mean(),
                              after_handwashing['death_proportion'].mean()],
})
print(monthly_summary)

         date  births  deaths  death_proportion
0  1841-01-01     254      37          0.145669
1  1841-02-01     239      18          0.075314
2  1841-03-01     277      12          0.043321
3  1841-04-01     255       4          0.015686
4  1841-05-01     255       2          0.007843
          date  births  deaths  death_proportion
77  1847-07-01     250       3          0.012000
78  1847-08-01     264       5          0.018939
79  1847-09-01     262      12          0.045802
80  1847-10-01     278      11          0.039568
81  1847-11-01     246      11          0.044715
   handwashing_started  mean_death_proportion
0                False               0.103976
1                 True               0.021032
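The before/after split is a single boolean flag, so a groupby can produce both means in one step and make the cutoff an explicit parameter. The cell above splits at `'1847-07-01'`; passing `'1847-06-01'` instead would count June 1847 as "after". This is a sketch with a hypothetical helper name, and the sample uses only the ten monthly rows printed above, so its means are illustrative and do not reproduce the full-data summary.

```python
import pandas as pd


def mean_proportion_by_period(monthly: pd.DataFrame, cutoff: str) -> pd.Series:
    """Mean monthly death proportion before (False) and from `cutoff` on (True)."""
    dates = pd.to_datetime(monthly["date"])
    started = (dates >= pd.Timestamp(cutoff)).rename("handwashing_started")
    proportion = monthly["deaths"] / monthly["births"]
    # Grouping by the boolean flag yields both period means at once
    return proportion.groupby(started).mean()


# Only the ten monthly rows printed above (indices 0-4 and 77-81), for illustration
sample = pd.DataFrame({
    "date": ["1841-01-01", "1841-02-01", "1841-03-01", "1841-04-01", "1841-05-01",
             "1847-07-01", "1847-08-01", "1847-09-01", "1847-10-01", "1847-11-01"],
    "births": [254, 239, 277, 255, 255, 250, 264, 262, 278, 246],
    "deaths": [37, 18, 12, 4, 2, 3, 5, 12, 11, 11],
})
summary = mean_proportion_by_period(sample, "1847-06-01")
print(summary)
```

On the full data, comparing `mean_proportion_by_period(deaths_month, '1847-06-01')` with the `'1847-07-01'` split shows how sensitive the summary is to where June 1847 is assigned.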


In [55]:
# Analyze the difference in the mean monthly proportion of deaths before and after the introduction of handwashing using all of the data, and calculate a 95% confidence interval.
# Select the monthly death proportions from each period
before_proportion = before_handwashing["death_proportion"]
after_proportion = after_handwashing["death_proportion"]

# Bootstrap the difference in mean death proportion (after minus before),
# resampling each period with replacement
boot_mean_diff = []
for _ in range(3000):
    boot_before = before_proportion.sample(frac=1, replace=True)
    boot_after = after_proportion.sample(frac=1, replace=True)
    boot_mean_diff.append(boot_after.mean() - boot_before.mean())

# 95% percentile confidence interval: the 2.5% and 97.5% quantiles of the bootstrap differences
confidence_interval = pd.Series(boot_mean_diff).quantile([0.025, 0.975])
print(confidence_interval)

0.025   -0.099906
0.975   -0.066683
dtype: float64
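The bootstrap loop above is unseeded, so its interval changes slightly on every run. A minimal sketch, assuming NumPy and SciPy are available, of the same percentile bootstrap with a seeded `numpy.random.Generator` and all resamples drawn at once is shown below. As a cross-check it also computes a Welch t interval, which is a different (parametric) technique that does not assume equal variances in the two periods. Both helper names are hypothetical, and the demo input is only the ten monthly rows printed earlier, so the printed intervals are illustrative.

```python
import numpy as np
from scipy import stats


def bootstrap_mean_diff_ci(before, after, n_boot=3000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(after) - mean(before)."""
    rng = np.random.default_rng(seed)  # fixed seed: the interval is reproducible
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    # Row k of each index matrix is the k-th resample, drawn with replacement
    before_idx = rng.integers(0, before.size, size=(n_boot, before.size))
    after_idx = rng.integers(0, after.size, size=(n_boot, after.size))
    diffs = after[after_idx].mean(axis=1) - before[before_idx].mean(axis=1)
    alpha = 1 - level
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


def welch_mean_diff_ci(before, after, level=0.95):
    """Welch t interval for mean(after) - mean(before), not assuming equal variances."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    var_a = after.var(ddof=1) / after.size
    var_b = before.var(ddof=1) / before.size
    diff = after.mean() - before.mean()
    se = np.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    dof = (var_a + var_b) ** 2 / (var_a**2 / (after.size - 1) + var_b**2 / (before.size - 1))
    t_crit = stats.t.ppf(1 - (1 - level) / 2, dof)
    return float(diff - t_crit * se), float(diff + t_crit * se)


# Illustrative input: only the ten monthly rows printed above, not the full data
before = np.array([37, 18, 12, 4, 2]) / np.array([254, 239, 277, 255, 255])
after = np.array([3, 5, 12, 11, 11]) / np.array([250, 264, 262, 278, 246])
print(bootstrap_mean_diff_ci(before, after))
print(welch_mean_diff_ci(before, after))
```

With the notebook's variables the calls would be `bootstrap_mean_diff_ci(before_proportion, after_proportion)` and `welch_mean_diff_ci(before_proportion, after_proportion)`. The bootstrap result should be close to, but not identical with, the unseeded interval above.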
