<div style="width: 38.5%;">
    <p><strong>City College of San Francisco</strong></p>
    <hr>
    <p>MATH 108 - Foundations of Data Science</p>
</div>

# Lecture 02: Cause and Effect

Associated Textbook Sections: [2.0, 2.1, 2.2, 2.3, 2.4, 2.5](https://inferentialthinking.com/chapters/02/causality-and-experiments.html)

## Overview

* [Associations](#Associations)
* [A Data Science Origin Story](#A-Data-Science-Origin-Story)
* [Causation](#Causation)
* [Confounding Variables](#Confounding-Variables)

---

## Associations

### Regularly Eating Chocolate Is Linked to 8 Percent Lower Heart Attack Risk

Image and Headline Source: [everydayhealth.com](https://www.everydayhealth.com/diet-nutrition/eating-chocolate-regularly-linked-to-lower-heart-attack-risk)

<a href="https://www.everydayhealth.com/diet-nutrition/eating-chocolate-regularly-linked-to-lower-heart-attack-risk"><img src="./img/lec02_chocolate.jpeg" width = 50%></a>

Study Source: [European Journal of Preventive Cardiology](https://heart.bmj.com/content/101/16/1279)

### Study Observations


* Individuals (study subjects, participants, units, etc.)
    * 336,289 US, Swedish, and Australian adults in several studies.
* Treatment
    * Chocolate consumption
* Outcome
    * Coronary heart disease risk

### An Initial Question

Is there an association between chocolate consumption and heart disease risk?

### An Answer

Yes, the reviewed article in the European Journal of Preventive Cardiology concludes that those who consumed chocolate more than 1 time per week or more than 3.5 times per month had fewer cases of heart disease than those who didn't.


### A Follow Up Question

Does chocolate consumption **lead to** a reduction in heart disease? This question is often harder to answer.

<center>
    Causality
</center>

### An Answer

No, several factors could explain why fewer people who regularly consumed chocolate developed heart disease. For example, greater financial freedom could explain both higher consumption of foods like chocolate and, through better health care access, fewer cases of heart disease.

> “Dr. Alice Lichtenstein, an American Heart Association volunteer and professor of nutrition science and policy at Tufts University, was more skeptical of the findings.”

---

## A Data Science Origin Story



### London, Early 1850s

Image Source: [Wikipedia - 1854 Broad Street Cholera Outbreak](https://en.wikipedia.org/wiki/1854_Broad_Street_cholera_outbreak)

<img src="./img/punch_a_court_for_king_cholera.png" width = 50%>

### Miasmas, Miasmatism, Miasmatists

* Bad smells given off by waste and rotting matter
* Believed to be the main source of disease
* Staunch believers:
    * Florence Nightingale (founder of modern nursing)
    * Edwin Chadwick (Commissioner of the General Board of Health)


### Suggested Remedies

#### Cholera, around 1850

* “fly to clene air”
* “a pocket full o’posies”
* “fire off barrels of gunpowder”

This might seem strange ...

#### COVID-19, 2020

* Inject disinfectant
* Sunlight
* Hydroxychloroquine
* Take 6 deep breaths, then cough while covering mouth
* Cannabis, cocaine, mangoes, onion, garlic, drinking water every 15 minutes, tea, eating ice cream, avoiding ice cream

### John Snow, 1813-1858

<img src="./img/john_snow.jpeg" width = 25%>

### Cholera Map

Image and Text Source: [National Geographic - Mapping A London Epidemic](https://www.nationalgeographic.org/activity/mapping-london-epidemic/)

According to the National Geographic Society, 

> "This map of London was created by John Snow in 1854. London was experiencing a deadly cholera epidemic, when Snow tracked the cases on this map. The cholera cases are highlighted in black. Using this map, Snow and other scientists were able to trace the cholera outbreak to a single infected water pump."

<img src="./img/cholera_map.jpeg" width = 50%>

In [None]:
from IPython.display import IFrame

# Build the URL from adjacent string literals so that no indentation
# whitespace from line continuations ends up inside it.
map_url = (
    "https://www.google.com/maps/embed?pb=!1m18!1m12!1m3!1d2482.9971371478814!2d-"
    "0.13879218398430104!3d51.51326851809472!2m3!1f0!2f0!3f0!3m2!1i1024!2i768!4f13"
    ".1!3m3!1m2!1s0x487604d4eb49ec6d%3A0xc4ff84518f83499d!2sJohn%20Snow!5e0!3m2!1"
    "sen!2sus!4v1642117611191!5m2!1sen!2sus"
)
IFrame(src=map_url, width=800, height=600)

---

## Causation

### London Water Supply Service Regions

Image Source: [British Library - John Snow's map showing the water supply in London, 1855](https://www.bl.uk/collection-items/john-snows-map-showing-the-water-supply-in-london-1855)

Image Note:
* Blue - Southwark and Vauxhall Company
* Red - Lambeth Company
* Purple - The area in which the pipes of both Companies are intermingled.

<a href = "./img/johnsnow_water_companies.jpeg"><img src="./img/johnsnow_water_companies.jpeg" width = 50%></a>

### Comparison

* Treatment group
    * Receives the treatment
* Control group
    * Does not receive the treatment


### Snow’s “Grand Experiment” ... Study

“… there is no difference whatever in the houses or the people receiving the supply of the two Water Companies, or in any of the physical conditions with which they are surrounded …”

The two groups were similar except for the treatment.


### Snow's Table

#### Python Imports and Settings

In [None]:
from datascience import *
import numpy as np
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
%matplotlib inline

In [None]:
snows_table = Table(['Supply Area', 'Number of Houses', 'Cholera Deaths']).with_rows([
    ['S&V', 40046, 1263], 
    ['Lambeth', 26107, 98],
    ['Rest of London', 256423, 1422]
])
snows_table

To compare the death totals across supply areas, calculate the rate of deaths per house in each area.

In [None]:
death_per_house = ...
snows_table.with_column('Deaths per House', 
                        death_per_house)
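
One possible way to compute the rate is sketched below with plain NumPy arrays, so that it runs without the `datascience` package. The names `houses` and `deaths` are illustrative; in the notebook they would likely correspond to `snows_table.column('Number of Houses')` and `snows_table.column('Cholera Deaths')`.

```python
import numpy as np

# Values copied from snows_table (S&V, Lambeth, Rest of London).
houses = np.array([40046, 26107, 256423])
deaths = np.array([1263, 98, 1422])

# Elementwise division gives one rate per supply area.
deaths_per_house = deaths / houses
print(deaths_per_house)
```

Because the areas contain very different numbers of houses, the rates are easier to compare than the raw death counts.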

Scale and round the rates to show whole numbers.

In [None]:
deaths_per_10000_houses = ...
snows_table.with_column('Deaths per 10,000 Houses', 
                       np.round(deaths_per_10000_houses))
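
A minimal sketch of the scaling step, again using plain NumPy arrays rather than the table itself. The array names are assumptions made for illustration, and the counts are copied from `snows_table`.

```python
import numpy as np

# Values copied from snows_table (S&V, Lambeth, Rest of London).
houses = np.array([40046, 26107, 256423])
deaths = np.array([1263, 98, 1422])

# Multiply the per-house rate by 10,000, then round to whole numbers.
deaths_per_10000 = 10000 * deaths / houses
rounded = np.round(deaths_per_10000)
print(rounded)
```

The scaled values carry the same information as the per-house rates; only the unit changes.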

Scaling rates is a common presentation technique. This can provide clarity, but it can also be misleading!

Image Source: [CDC - Rates of COVID-19-Associated Hospitalization (Updated Jan 8, 2022)](https://gis.cdc.gov/grasp/covidnet/covid19_3.html)

<a href="https://gis.cdc.gov/grasp/covidnet/covid19_3.html"><img src="./img/CDC-COVID-hospitalization-rates.png" width = 50%></a>

### A Key to Establishing Causality

If the treatment and control groups are similar apart from the treatment, then differences between the outcomes in the two groups can be ascribed to the treatment.
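
As a rough illustration with Snow's counts, the difference in outcomes between the two water companies can be summarized as a ratio of death rates. The variable names are hypothetical; the counts come from Snow's table.

```python
# Deaths and houses for the two water companies, from Snow's table.
sv_rate = 1263 / 40046        # Southwark and Vauxhall
lambeth_rate = 98 / 26107     # Lambeth

# How many times higher the S&V death rate is than the Lambeth rate.
rate_ratio = sv_rate / lambeth_rate
print(round(rate_ratio, 1))
```

The ratio by itself only shows an association. The causal reading rests on Snow's argument that the two groups of houses were alike apart from their water supply.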

---

## Confounding Variables

### Confounding Factors Weaken a Causal Argument

* If the treatment and control groups have systematic differences other than the treatment, then it might be difficult to identify causality. 

* Such differences are often present in observational studies.

* When they lead researchers astray, they are called confounding factors.
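
The following simulation is a sketch of how a confounding factor can create an association without any direct causal link. All data are synthetic, and the variable names (`wealth`, `chocolate`, `heart_risk`) and the coefficients are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical confounder that drives both the "treatment" and the "outcome".
wealth = rng.normal(size=n)
chocolate = wealth + rng.normal(size=n)     # depends on wealth, not on heart_risk
heart_risk = -wealth + rng.normal(size=n)   # depends on wealth, not on chocolate

# The two variables are clearly associated ...
raw_corr = np.corrcoef(chocolate, heart_risk)[0, 1]

# ... but the association vanishes once the confounder's contribution is removed.
# The coefficients (+1 and -1) are known here only because the data were simulated.
adjusted_corr = np.corrcoef(chocolate - wealth, heart_risk + wealth)[0, 1]
print(raw_corr, adjusted_corr)
```

In a real observational study the confounders and their effects are usually not known, which is why this kind of adjustment is much harder in practice.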

### Example of a Confounding Relationship

<img src="./img/cheese_sheets_association.svg" width = 100%>

### Randomize! to Strengthen a Causal Argument

* If you assign individuals to treatment and control at random, then the two groups are *likely* to be similar apart from the treatment.
* You can (mathematically) account for variability in the assignment.
* **Randomized Controlled Experiment**:
    * Randomly assign individuals to treatments
    * Ensure one treatment is a control whose outcome is understood.
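
Random assignment could be sketched as follows. The participant IDs and the group size are hypothetical, and a seeded NumPy generator is used only so that the example is reproducible.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical participant IDs 0..19.
participants = np.arange(20)

# Shuffle the IDs, then split the shuffled order in half.
shuffled = rng.permutation(participants)
treatment = np.sort(shuffled[:10])
control = np.sort(shuffled[10:])
print(treatment, control)
```

Every participant has the same chance of landing in either group, so neither the researcher nor the participants can influence who receives the treatment.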

### Be Careful ...

Regardless of what the dictionary says,
in probability theory

<center>
    Random ≠ Haphazard
</center>

---

<footer>
    <p>Adapted from UC Berkeley DATA 8 course materials.</p>
    <p>This content is offered under a <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">CC Attribution Non-Commercial Share Alike</a> license.</p>
</footer>