<div style="width: 38.5%;">
    <p><strong>City College of San Francisco</strong></p>
    <hr>
    <p>MATH 108 - Foundations of Data Science</p>
</div>

# Lecture 27: The Normal Distribution

Associated Textbook Sections: [14.3, 14.4](https://inferentialthinking.com/chapters/14/3/SD_and_the_Normal_Curve.html)

---

## Outline

* [Standard Units Review](#Standard-Units-Review)
* [Normal Distributions](#Normal-Distributions)
* [Normal Proportions](#Normal-Proportions)

---

## Set Up the Notebook

In [None]:
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import scipy.stats  # explicitly load scipy.stats so that scipy.stats.norm is available
import plotly.express as px
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

---

## Standard Units Review

* How many SDs above average?
* `z = (value - average)/SD`
    * Negative z: value below average
    * Positive z: value above average
    * z = 0: value equal to average
* When values are in standard units: average = 0, SD = 1
* Gives us a way to compare and understand data no matter what the original units are
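
As a minimal sketch of the conversion above, assuming NumPy and a hypothetical helper named `to_standard_units`, here is the formula applied to a small made-up array whose average is 5 and whose SD is 2. It also checks that the result has average 0 and SD 1, and that the conversion can be undone.

```python
import numpy as np

def to_standard_units(values):
    """Convert an array of numbers to standard units (hypothetical helper name)."""
    values = np.asarray(values, dtype=float)
    # np.std defaults to the population SD (ddof=0), the SD used in this course
    return (values - np.mean(values)) / np.std(values)

# Made-up values for illustration: average 5, SD 2
values = np.array([2, 4, 4, 4, 5, 5, 7, 9])
z = to_standard_units(values)
print(z)                        # 2 -> -1.5 (below average), 5 -> 0, 9 -> 2 (above average)
print(np.mean(z), np.std(z))    # average 0, SD 1

# Converting back to the original units: value = average + z * SD
recovered = np.mean(values) + z * np.std(values)
print(np.allclose(recovered, values))   # True
```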


---

### Demo: Standard Units

* Load UC Berkeley DATA 8 exam distribution data from Fall 2018.
* Calculate the mean midterm and final exam scores.
* Visualize the distribution of scores.
* Create a function that calculates the standard units of the numerical values in an array.
* Add the standardized midterm and final exam scores to the `exams` table.
* Visualize the distribution of standardized scores and notice that the general shape of the distribution does not change.

In [None]:
exams = Table.read_table('exams_fa18.csv')
exams.show(5)

In [None]:
midterm_mean = np.mean(exams.column('Midterm'))
final_mean = np.mean(exams.column('Final'))
midterm_mean, final_mean

In [None]:
exams.hist(overlay=False, bins=np.arange(0,101,5))

In [None]:
def standard_units(x):
    """Convert array x to standard units."""
    ...

In [None]:
midterm_su = ...
exams = exams.with_column('Midterm in Standard Units', midterm_su)

final_su = ...
exams = exams.with_column('Final in Standard Units', final_su)

exams.show(10)

In [None]:
exams.select(
    'Midterm in Standard Units', 'Final in Standard Units'
).hist(overlay=False, bins=np.arange(-4,2,0.1))

---

## Normal Distributions

---

### Bell-Shaped Curves

<img src="./hanging_bell.jpeg" width=50%>

---

### There are many, many, many normal curves!

<img src="./normal_curves.png" width=50%>

---

### Probability Density Curve

The height of a normal (probability density) curve is determined by the formula*

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$

where the mean (`mu`, $\mu$) sets the center of the curve and the standard deviation (`sigma`, $\sigma$) sets its spread.

_Note: You will not be working with this formula directly in this class, but we can use a Python implementation of it (`scipy.stats.norm.pdf`) to see how the mean and standard deviation affect the shape of the distribution._

In [None]:
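def graph_norm(mu=0, sigma=1):
    # Evaluate the normal density with mean mu and SD sigma on a grid of x values
    x = np.arange(-10, 10, 0.01)
    # Keep the y-axis range fixed so curves with different parameters are easy to compare
    display(px.line(x=x,
                    y=scipy.stats.norm.pdf(x, loc=mu, scale=sigma),
                    range_y=[0, 1]));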

Adjust the values of `mu` and `sigma` using the sliders to see how the shape of the distribution changes.

In [None]:
interact(graph_norm, mu=(-5, 5), sigma=(0.2**0.5, 5**0.5, 0.05));
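
To see that `scipy.stats.norm.pdf` computes the formula above, the sketch below writes the formula out by hand with NumPy and compares the two. The function name `normal_density` is a hypothetical choice for illustration. Note that the exponent is just $-\frac{1}{2}z^2$, where $z$ is $x$ converted to standard units.

```python
import numpy as np
from scipy import stats

def normal_density(x, mu=0, sigma=1):
    """Height of the normal curve with mean mu and SD sigma, written out from the formula."""
    z = (x - mu) / sigma                      # x converted to standard units
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi))

x = np.arange(-10, 10, 0.01)
for mu, sigma in [(0, 1), (2, 0.5), (-3, 2)]:
    # Compare the hand-written formula with scipy's normal pdf
    same = np.allclose(normal_density(x, mu, sigma), stats.norm.pdf(x, loc=mu, scale=sigma))
    print(f"mu={mu}, sigma={sigma}: matches scipy? {same}")

# The peak is at x = mu with height 1 / (sigma * sqrt(2*pi)),
# so a smaller sigma gives a taller, narrower curve
for sigma in [0.5, 1, 2]:
    print(sigma, normal_density(0, mu=0, sigma=sigma))

# The total area under any normal curve is 1 (approximated here by a Riemann sum)
print(np.sum(normal_density(x, 0, 1)) * 0.01)   # very close to 1
```

The shrinking peak heights as `sigma` grows match what the sliders show: spreading the curve out has to lower it, because the total area stays fixed at 1.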

---

## Normal Proportions

---

### How Big are Most of the Values?

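* No matter what the shape of the distribution, the bulk of the data are in the range "average ± a few SDs" (Chebyshev's Inequality: for any $z > 1$, at least $1 - 1/z^2$ of the data are within $z$ SDs of the average)
* **If a histogram is bell-shaped**, then almost all of the data are in the range "average ± 3 SDs" (about 99.7% for a normal distribution)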
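
The following sketch checks both bullets on simulated data rather than the exam data: an exponential sample stands in for a skewed distribution and a normal sample for a bell-shaped one. The helper `proportion_within`, the sample sizes, and the seed are assumptions made for illustration.

```python
import numpy as np

def proportion_within(data, z):
    """Proportion of data within z SDs of the average, i.e. |standard units| <= z."""
    su = (data - np.mean(data)) / np.std(data)
    return np.mean(np.abs(su) <= z)

rng = np.random.default_rng(0)                     # fixed seed so results are reproducible
skewed = rng.exponential(scale=1, size=100_000)    # simulated, strongly right-skewed data
bell = rng.normal(loc=0, scale=1, size=100_000)    # simulated, bell-shaped data

for z in [2, 3]:
    chebyshev = 1 - 1 / z**2     # Chebyshev: at least this proportion, for ANY distribution
    print(f"average ± {z} SDs: Chebyshev bound {chebyshev:.3f}, "
          f"skewed {proportion_within(skewed, z):.3f}, "
          f"bell-shaped {proportion_within(bell, z):.3f}")
```

Whatever the seed, the skewed sample cannot fall below Chebyshev's bound, because the inequality holds for every list of numbers; the bell-shaped sample lands near 95% and 99.7%.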


---

### Bounds and Normal Approximations

<img src="./normal_bounds.png" width = 50%>
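
A sketch of this kind of comparison, assuming `scipy.stats` is available: Chebyshev's bound $1 - 1/z^2$ is set against the exact proportion within $z$ SDs for a normal distribution, computed as the area under the standard normal curve between $-z$ and $z$ with `stats.norm.cdf`.

```python
from scipy import stats

results = {}
for z in [1, 2, 3]:
    # Chebyshev's lower bound holds for every distribution (it is 0, i.e. uninformative, at z = 1)
    chebyshev = max(0.0, 1 - 1 / z**2)
    # Exact proportion for a normal distribution: area between -z and z under the standard normal curve
    normal = stats.norm.cdf(z) - stats.norm.cdf(-z)
    results[z] = (chebyshev, normal)
    print(f"average ± {z} SDs: Chebyshev at least {chebyshev:.1%}, normal about {normal:.1%}")
```

For a normal distribution the proportions come out to about 68%, 95%, and 99.7%, well above Chebyshev's guarantees of 0%, 75%, and about 88.9%.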

---

### A "Central" Area

<img src="./central_area.png" width = 50%>
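
A central area is the area under the standard normal curve between $-z$ and $z$. As a sketch, assuming `scipy.stats`: `stats.norm.cdf` gives the area to the left of a point, so a central area is a difference of two `cdf` values, and `stats.norm.ppf`, the inverse of `cdf`, goes the other way to find the $z$ for a given central area.

```python
from scipy import stats

# Area between -1 and 1 under the standard normal curve (the central area within 1 SD)
central_1sd = stats.norm.cdf(1) - stats.norm.cdf(-1)
print(central_1sd)        # about 0.68

# Going the other way: how many SDs on either side capture the central 95%?
# The central 95% leaves 2.5% in each tail, so look up the 97.5th percentile
z_95 = stats.norm.ppf(0.975)
print(z_95)               # about 1.96

# Check: the area between -z_95 and z_95 is 0.95
area_95 = stats.norm.cdf(z_95) - stats.norm.cdf(-z_95)
print(area_95)
```

By symmetry, the same central area can also be computed as `2 * stats.norm.cdf(z) - 1`.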

---

<footer>
    <p>Adapted from UC Berkeley DATA 8 course materials.</p>
    <p>This content is offered under a <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">CC Attribution Non-Commercial Share Alike</a> license.</p>
</footer>