<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Statistics Fundamentals Part 1

_Authors: Alexander Egorenkov (DC)_

---

<a id="learning-objectives"></a>
## Learning Objectives
- Review basics of linear algebra in preparation for Sklearn documentation.
- Use NumPy and Pandas libraries to analyze datasets to extract summary statistics: mean, median, mode, max, min, quartile, inter-quartile range, variance, standard deviation, and correlation.
- Create basic data visualizations, including scatter plots, box plots, and histograms.
- Use visualizations to discern characteristics and trends in a dataset.
- Describe the bias and variance of statistical estimators.
- Use summary statistics and data visualizations to identify a normal distribution within a dataset.

### Lesson Guide
- [Where are we in the data science workflow?](#where-are-we-in-the-data-science-workflow)
- [Linear Algebra Review](#linear-algebra-review)
	- [Scalars, vectors and matrices](#scalars-vectors-and-matrices)
	- [Basic matrix algebra](#basic-matrix-algebra)
	- [Dot product](#dot-product)
	- [Matrix multiplication](#matrix-multiplication)
	- [Vector norm](#vector-norm)
- [Linear Algebra Applications to Machine Learning](#linear-algebra-applications-to-machine-learning)
	- [Distance between actual values and predicted values](#distance-between-actual-values-and-predicted-values)
	- [Mean Squared Error](#mean-squared-error)
	- [Least squares](#least-squares)
- [Codealong: Examining the Titanic Dataset](#codealong-examining-the-titanic-dataset)
- [Descriptive Statistics Fundamentals](#descriptive-statistics-fundamentals)
	- [Measures of Central Tendency](#measures-of-central-tendency)
	- [Math Review](#math-review)
	- [Measures of Dispersion: Standard Deviation and Variance](#measures-of-dispersion-standard-deviation-and-variance)
- [Our First Model](#our-first-model)
- [A Short Introduction to Model Bias and Variance](#a-short-introduction-to-model-bias-and-variance)
	- [Bias-Variance decomposition](#bias-variance-decomposition)
	- [Example using Bessel's correction](#example-using-bessels-correction)
- [Correlation and Association](#correlation-and-association)
	- [Codealong: Correlation in Pandas](#codealong-correlation-in-pandas)
- [The Normal Distribution](#the-normal-distribution)
	- [What is the Normal Distribution?](#what-is-the-normal-distribution)
	- [Skewness](#skewness)
	- [Kurtosis](#kurtosis)
- [Determining the Distribution of Your Data](#determining-the-distribution-of-your-data)
	- [Exercise](#exercise)
- [Topic Review](#topic-review)


In [6]:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact
plt.style.use('fivethirtyeight')

# This makes sure that graphs render in your notebook
%matplotlib inline

<a id="where-are-we-in-the-data-science-workflow"></a>
## Where are we in the data science workflow?

![Data Science Workflow](./assets/images/data-science-workflow.png)

<a id="linear-algebra-review"></a>
## Linear Algebra Review
---

<a id="scalars-vectors-and-matrices"></a>
### Scalars, vectors and matrices

A **scalar** is a single number.

$$a$$

A **vector** is several numbers in sequence.

$$\vec{u} = \left[ \begin{array}{ccc}
1&3&7
\end{array} \right]$$

In [1]:
# Create a vector using np.array

An m x n **matrix** is a rectangular array of numbers with m rows and n columns. Each number in the matrix is an entry. The entry in row $i$ and column $j$ is denoted $a_{ij}$.

$$A= \left[ \begin{array}{cccc}
a_{11} & a_{12} & \cdots & a_{1n}  \\
a_{21} & a_{22} & \cdots & a_{2n}  \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{array} \right]$$
$$A \in \mathbb{R}^{m \times n}$$

In [2]:
# create a matrix using np.array
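
One possible way to fill in the two cells above, as a minimal sketch. The entries of `A` are arbitrary example values, not taken from the lesson.

```python
import numpy as np

# A vector is a one-dimensional array
u = np.array([1, 3, 7])

# A matrix is a two-dimensional array, written as a list of rows
# (the entries of A are arbitrary example values)
A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(u.shape)  # (3,)
print(A.shape)  # (2, 3): m = 2 rows, n = 3 columns
print(A[0, 2])  # 3, the entry in row 1, column 3 (NumPy indexes from 0)
```

Note that NumPy counts rows and columns from 0, while the $a_{ij}$ notation counts from 1.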

<a id="basic-matrix-algebra"></a>
### Basic matrix algebra


#### Addition and subtraction
Vector **addition** is straightforward: two vectors of equal dimensions are added entry by entry. Given:

$\vec{v} = \left[ \begin{array}{c}
1 \\
3 \\
7
\end{array} \right],  \vec{w} = \left[ \begin{array}{c}
1 \\
0 \\
1
\end{array} \right]$

In [5]:
v = np.array([1, 3, 7])
w = np.array([1, 0, 1])

$\vec{v} + \vec{w} =
\left[ \begin{array}{c}
1 \\
3 \\
7
\end{array} \right] + \left[ \begin{array}{c}
1 \\
0 \\
1
\end{array} \right] = 
\left[ \begin{array}{c}
1+1 \\
3+0 \\
7+1
\end{array} \right] = 
\left[ \begin{array}{c}
2 \\
3 \\
8
\end{array} \right]
$

(Subtraction is similar.)

In [3]:
# add the vectors together with +

#### Scalar multiplication
We scale a vector with **scalar multiplication**, multiplying a vector by a scalar (single quantity):

$ 2 \left[ \begin{array}{c}
1 \\
3 \\
7
\end{array} \right] = 
 \left[ \begin{array}{c}
2*1 \\
2*3 \\
2*7
\end{array} \right] = 
 \left[ \begin{array}{c}
2 \\
6 \\
14
\end{array} \right]$ 

In [4]:
# multiply v by 2
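
A short sketch of how the addition, subtraction, and scalar multiplication cells might be completed, using the same `v` and `w` as above. NumPy applies `+`, `-`, and `*` entry by entry.

```python
import numpy as np

v = np.array([1, 3, 7])
w = np.array([1, 0, 1])

total = v + w        # entry-by-entry addition
difference = v - w   # entry-by-entry subtraction
scaled = 2 * v       # every entry multiplied by the scalar 2

print(total)       # [2 3 8]
print(difference)  # [0 3 6]
print(scaled)      # [ 2  6 14]
```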

<a id="dot-product"></a>
### Dot product
The **dot product** of two _n_-dimensional vectors is:

$ \vec{v} \cdot \vec{w} =\sum _{i=1}^{n}v_{i}w_{i}=v_{1}w_{1}+v_{2}w_{2}+\cdots +v_{n}w_{n} $

So, if:

$\vec{v} = \left[ \begin{array}{c}
1 \\
3 \\
7
\end{array} \right], \vec{w} = \left[ \begin{array}{c}
1 \\
0 \\
1
\end{array} \right]$

$ \vec{v} \cdot \vec{w} = 1*1 + 3*0 + 7*1 = 8 $

In [7]:
v = np.array([1, 3, 7])
w = np.array([1, 0, 1])

# calculate the dot product of v and w using np.dot
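
One way to complete this cell, shown as a sketch. It computes the dot product three ways: the sum from the formula, `np.dot`, and the `@` operator. All three should agree.

```python
import numpy as np

v = np.array([1, 3, 7])
w = np.array([1, 0, 1])

# Three equivalent ways to compute the dot product
by_hand = sum(v_i * w_i for v_i, w_i in zip(v, w))  # 1*1 + 3*0 + 7*1
with_dot = np.dot(v, w)
with_operator = v @ w

print(by_hand, with_dot, with_operator)  # 8 8 8
```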

<a id="matrix-multiplication"></a>
### Matrix multiplication
**Matrix multiplication**, $A_{mn} * B_{ij}$, is valid when the left matrix has the same number of columns as the right matrix has rows ($n = i$). Each entry is the dot product of corresponding row and column vectors.

![](./assets/images/matrix-multiply-a.gif)
(Image: mathisfun.com!)

In [8]:
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[7, 8], [9, 10], [11, 12]])
A.dot(B)

array([[ 58,  64],
       [139, 154]])
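
To see where the top-left entry of the result comes from, the sketch below recomputes it as the dot product of the first row of `A` and the first column of `B`. It also checks the shape rule: a (2, 3) matrix times a (3, 2) matrix gives a (2, 2) matrix.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])        # shape (2, 3)
B = np.array([[7, 8], [9, 10], [11, 12]])   # shape (3, 2)

product = A @ B   # same result as A.dot(B)

# Top-left entry: first row of A dotted with first column of B
top_left = np.dot(A[0, :], B[:, 0])   # 1*7 + 2*9 + 3*11

print(product.shape)  # (2, 2)
print(top_left)       # 58
```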

<a id="vector-norm"></a>
### Vector norm
The **magnitude** of a vector $\vec{v} \in \mathbb{R}^{n}$ can be interpreted as its length in n-dimensional space and is calculated as a Euclidean distance. If:

$\vec{v} = \left[ \begin{array}{c}
v_{1} \\
v_{2} \\
\vdots \\
v_{n}
\end{array} \right]$

then $\| \vec{v} \| = \sqrt{v_{1}^{2} + v_{2}^{2} + ... + v_{n}^{2}} = \sqrt{v^Tv}$

E.g. if $\vec{v} = 
\left[ \begin{array}{c}
3 \\
4
\end{array} \right]$, then $\| \vec{v} \| = \sqrt{3^{2} + 4^{2}} = 5$

This is also called the vector **norm**. You will see this often in machine learning as least squares.

In [9]:
x = np.array([3,4])

# calculate the norm of the vector x with np.linalg.norm
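
A possible completion of this cell, as a sketch. It computes the norm from the formula, from $\sqrt{v^Tv}$, and with `np.linalg.norm`, which returns the Euclidean norm by default for a vector.

```python
import numpy as np

x = np.array([3, 4])

by_hand = np.sqrt(np.sum(x ** 2))   # sqrt(3^2 + 4^2)
via_dot = np.sqrt(x @ x)            # sqrt(x^T x)
with_numpy = np.linalg.norm(x)      # Euclidean (L2) norm by default

print(by_hand, via_dot, with_numpy)  # 5.0 5.0 5.0
```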

<a id="linear-algebra-applications-to-machine-learning"></a>
## Linear Algebra Applications to Machine Learning
---

<a id="distance-between-actual-values-and-predicted-values"></a>
### Distance between actual values and predicted values
We often need to know the difference between predicted values and actual values.
We calculate this as:
$$\| \vec{actual} - \vec{predicted} \| =\sqrt{(actual_1 - predicted_1)^2 + (actual_2 - predicted_2)^2 + \cdots + (actual_n - predicted_n)^2}$$
<a id="mean-squared-error"></a>
### Mean Squared Error
It is often easier to interpret the average squared distance between predicted values and actual values:
$$\frac{1} {n} \| \vec{y} - f(X) \|^2$$
<a id="least-squares"></a>
### Least squares
Many machine learning models are posed in the following form:
$$\min \| \vec{y} - f(X) \|^2$$
The goal is to minimize the squared distance between model predictions and actual data.

Let's see this in sklearn http://scikit-learn.org/stable/modules/linear_model.html
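
Before turning to sklearn, the three ideas above can be sketched in plain NumPy. The data here is hypothetical, and the model $f(X)$ is assumed to be a straight line with an intercept; `np.linalg.lstsq` finds the coefficients that minimize the distance between `y` and the predictions.

```python
import numpy as np

# Hypothetical data: one feature x and an outcome y
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])   # constructed so that y = 2x + 1

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# Least squares: coefficients that minimize the distance between y and X @ beta
beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
predicted = X @ beta

distance = np.linalg.norm(y - predicted)   # Euclidean distance
mse = np.mean((y - predicted) ** 2)        # mean squared error

print(beta)           # approximately [1. 2.]: intercept and slope
print(distance, mse)  # both approximately 0 for this noise-free data
```

Real data will not sit exactly on a line, so the distance and the MSE will be greater than zero.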

<a id="codealong-examining-the-titanic-dataset"></a>
## Codealong: Examining the Titanic Dataset

---

#### Objective: Read in the Titanic data and look at a few summary statistics

In [10]:
# import Pandas
import pandas as pd

In [11]:
# If you don't have git use this link: http://bit.ly/2ae8zAT
titanic = pd.read_csv('http://bit.ly/2ae8zAT')

#### Print out the column names

In [15]:
# Answer:

Index([u'survived', u'pclass', u'name', u'sex', u'age', u'sibsp', u'parch',
       u'ticket', u'fare', u'cabin', u'embarked'],
      dtype='object')

#### Print out the dimensions of the DataFrame using the `.shape` attribute:

In [14]:
# Preview data dimensions

#### Print out the data types of the columns using the `.dtypes` attribute

In [15]:
# What are the column data types?

#### Print out the first 5 rows of the data using the `.head()` built-in function.

In [16]:
# Look at the first 5 rows

#### Use the `.value_counts()` built-in function to count the values of each type in the `pclass` column.

In [18]:
# Count the values of the pclass variable.

#### Pull up descriptive statistics for each variable using the built-in `.describe()` function.

In [19]:
# Pull up descriptive statistics for each variable
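
The calls named in this codealong can be sketched on a small stand-in DataFrame. The `toy` frame and its values are made up for illustration; in the lesson the same calls would be made on `titanic`.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data (values are made up)
toy = pd.DataFrame({
    'survived': [0, 1, 1, 0, 0],
    'pclass':   [3, 1, 3, 1, 3],
    'age':      [22.0, 38.0, 26.0, 35.0, None],
    'fare':     [7.25, 71.28, 7.93, 53.10, 8.05],
})

print(toy.columns)                    # column names
print(toy.shape)                      # (number of rows, number of columns)
print(toy.dtypes)                     # data type of each column
print(toy.head())                     # first 5 rows
print(toy['pclass'].value_counts())   # number of rows in each class
print(toy.describe())                 # summary statistics for numeric columns
```

In the `describe()` output, the `count` row for `age` is lower than for the other columns because missing values are not counted.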

#### Diagnosing data problems

- Whenever you get a new dataset, the fastest way to find mistakes and inconsistencies is to look at the descriptive statistics.
  - If anything looks too high or too low relative to your experience, there may be issues with the data collection.
- Your data may have a lot of missing values and may need to be cleaned meticulously before being combined with other data.
  - You can take a quick average or moving average to smooth out the data and use that to preview your results before you embark on a much longer data cleaning journey.
  - Sometimes filling in missing values with their means or medians is the best way to deal with missing data; other times you may want to drop the offending rows or do real imputation.

<a id="descriptive-statistics-fundamentals"></a>
## Descriptive Statistics Fundamentals
---

#### Objective: Review the following terms: mean, median, mode, interquartile range, variance, and standard deviation, and then view an example of how this information can be produced using Pandas.

#### A QUICK REVIEW OF NOTATION

The sum of a constant, k,  n times
$$\sum_{i=1}^nk$$

In [20]:
# k + k + k + k + ... + k

The sum of all numbers from 1 up to n:
$$\sum_{i=1}^ni$$

In [21]:
# 1 + 2 + 3 + ... + n

The sum of all x from the first x entry to the nth x entry:
$$\sum_{i=1}^nx_i$$

In [22]:
# x_1 + x_2 + x_3 + ... + x_n

##### Codealong

In [24]:
# Compute the sum of 7 4s using base Python


In [25]:
# Compute the sum of 7 4s using NumPy


In [26]:
# Compute the sum of 1 through 10 using base Python


In [28]:
# Using the titanic.fare column compute the total fare paid by passengers
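
One way these four cells might be completed, as a sketch. The `fares` array is a hypothetical stand-in; with the real data the last step would be `titanic.fare.sum()`.

```python
import numpy as np

# Sum of the constant 4 taken 7 times: equals 7 * 4
sum_const = sum([4] * 7)                # base Python
sum_const_np = np.sum(np.full(7, 4))    # NumPy

# Sum of the integers 1 through 10: equals 10 * 11 / 2
sum_ints = sum(range(1, 11))

# Sum over the entries of a series (hypothetical fares)
fares = np.array([7.25, 71.28, 7.93])
total_fare = fares.sum()

print(sum_const, sum_const_np, sum_ints)  # 28 28 55
print(total_fare)
```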


<a id="measures-of-central-tendency"></a>
### Measures of Central Tendency

- mean
- median
- mode

#### MEAN
The mean, also known as the average or the expected value, is defined as:
$$E[X] = \bar{X} =\frac 1n\sum_{i=1}^nx_i$$

#### MEDIAN
The median refers to the midpoint in a series of numbers.

$$ 0,1,2,[3],5,5,9 $$

$$ 1,3,4,[4,5],5,5,7 $$

To find the median:
- Arrange the numbers in order smallest to 
  largest.

- If there is an odd number of values, the 
  middle value is the median.

- If there is an even number of values, the 
  average of the middle two values is the 
  median.


#### MODE
The mode of a set of values is the value that occurs most often.
A set of values may have more than one mode or no mode.

$$1,0,1,5,7,8,9,3,4,1$$ 1 is the mode since it occurs the most often

##### Codealong

In [29]:
# Find the mean of the titanic.fare series using base Python

In [30]:
# Find the mean of the titanic.fare series using NumPy

In [31]:
# Find the mean of the titanic.fare series using Pandas

In [32]:
# What was the median fare paid (using Pandas)?

In [33]:
# Use Pandas to find the most common fare paid on the Titanic
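
A sketch of the central tendency calculations on a small, made-up series. The `fare` values below are hypothetical; in the lesson the same calls would be made on `titanic.fare`.

```python
import numpy as np
import pandas as pd

# Hypothetical fares standing in for titanic.fare
fare = pd.Series([7.25, 8.05, 8.05, 13.0, 26.0, 71.28])

mean_python = sum(fare) / len(fare)   # base Python
mean_numpy = np.mean(fare)            # NumPy
mean_pandas = fare.mean()             # Pandas

median = fare.median()   # even number of values: average of 8.05 and 13.0
modes = fare.mode()      # a Series, because there can be more than one mode

print(mean_python, mean_numpy, mean_pandas)
print(median)
print(modes.tolist())
```

The mean (about 22.27) is pulled well above the median (10.525) by the single large fare, which is a first hint of skew.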

<a id="math-review"></a>
### Math Review

**How do we measure distance?**

One method is to take the difference between two points
$$X_2 - X_1$$

However, this can be inconvenient due to negative numbers.

We often use this square root trick to deal with negative numbers
$$\sqrt{(X_2-X_1)^2}$$

**What about distance in multiple dimensions?**

We can turn to the Pythagorean theorem
$$a^2 + b^2 = c^2$$

To find the distance along a diagonal, it is sufficient to measure one dimension at a time
$$\sqrt{a^2 + b^2} = c$$

More generally we can write this as (You'll see this in machine learning papers)
$$\|X\|_2 = \sqrt{\sum{x_i^2}} = c$$

If we want to work with points rather than distances, we can write
$$\sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2} = c$$
or
$$\sqrt{\sum{(x_i - y_i)^2}} = c$$
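
The distance formulas above can be checked numerically. This sketch uses two hypothetical points chosen so that the differences are 3 and 4, which gives a distance of 5.

```python
import numpy as np

# Two hypothetical points in two dimensions
x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])

# One dimension at a time, then the Pythagorean theorem
by_hand = np.sqrt((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2)

# The same calculation written for any number of dimensions
general = np.sqrt(np.sum((x - y) ** 2))
with_norm = np.linalg.norm(x - y)

print(by_hand, general, with_norm)  # 5.0 5.0 5.0
```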

<a id="measures-of-dispersion-standard-deviation-and-variance"></a>
### Measures of Dispersion: Standard Deviation and Variance

Standard deviation (SD, σ for population standard deviation, s for sample standard deviation) is a measure that is used to quantify the amount of variation or dispersion from the mean of a set of data values. A low standard deviation means that most of the numbers are very close to the average. A high standard deviation means that the numbers are spread out.

Standard deviation is the square root of variance.
$$variance = \frac {\sum{(x_i - \bar{X})^2}} {n-1}$$

$$s = \sqrt{\frac {\sum{(x_i - \bar{X})^2}} {n-1}}$$

That can be a lot to take in so let's break it down in Python.

#### Assign the first 5 rows of titanic age data to a variable.

In [34]:
# Take the first 5 rows of titanic age data

#### Calculate the mean by hand.

In [35]:
# Calculate mean by hand

#### Calculate the variance by hand.

In [36]:
# Calculate variance by hand

#### Calculate the variance and the standard deviation using pandas.

In [37]:
# Verify with Pandas
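
A sketch of the four steps above on made-up numbers. The `ages` values are hypothetical stand-ins for the first 5 rows of `titanic.age`.

```python
import pandas as pd

# Hypothetical ages standing in for the first 5 rows of titanic.age
ages = pd.Series([22.0, 38.0, 26.0, 35.0, 35.0])

n = len(ages)
mean = sum(ages) / n                                   # mean by hand
squared_deviations = [(a - mean) ** 2 for a in ages]   # (x_i - mean)^2
variance = sum(squared_deviations) / (n - 1)           # divide by n - 1
std = variance ** 0.5                                  # square root of variance

print(mean, variance, std)
print(ages.var(), ages.std())   # Pandas divides by n - 1 by default
```

Pandas uses `ddof=1` (divide by n - 1) by default, so it matches the formula above. NumPy's `np.var` and `np.std` default to `ddof=0` (divide by n), so they give slightly smaller values unless `ddof=1` is passed.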

<a id="our-first-model"></a>
## Our First Model
---

Let's make our first model from scratch. We'll predict the fare column in the titanic data. What data will we use? Actually, none.

The simplest model we can build is an estimation of the mean, median, or most common value. If we have no feature matrix and only an outcome, this is the best approach to make a prediction using only empirical data. 

This seems silly, but we'll actually use it all the time to create a baseline of how well we do with no data, and whether our more sophisticated models make an improvement.

http://scikit-learn.org/stable/modules/model_evaluation.html#dummy-estimators

#### Get the fare column from the titanic data and store it in variable y:

In [38]:
# Get the Fare column from the titanic data and store as y

#### Create predictions y_pred (in this case just the mean of y)

In [39]:
# Stored predictions in y_pred

#### Use the least squares (euclidean distance) formula to see how close our predictions are on average:

This is also known as mean squared error.

In [40]:
# Squared Error is really hard to read, Let's look at Mean Squared Error

#### Calculate the root mean square error, which is the square root of the mean squared distance:

In [41]:
# Calculate the root mean squared error (RMSE)
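
The baseline model can be sketched end to end on made-up numbers. The `y` values below are hypothetical stand-ins for `titanic.fare`.

```python
import numpy as np

# Hypothetical fares standing in for titanic.fare
y = np.array([7.25, 8.05, 13.0, 26.0, 71.28])

# The "model": predict the mean of y for every passenger
y_pred = np.full_like(y, y.mean())

squared_error = np.sum((y - y_pred) ** 2)   # total squared error
mse = np.mean((y - y_pred) ** 2)            # mean squared error
rmse = np.sqrt(mse)                         # back in the units of y

print(squared_error, mse, rmse)
```

When the prediction is the mean, the MSE is the same number as `np.var(y)` and the RMSE is the same as `np.std(y)`. scikit-learn's `DummyRegressor` offers this kind of baseline as a ready-made estimator.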

<a id="a-short-introduction-to-model-bias-and-variance"></a>
## A Short Introduction to Model Bias and Variance 

---

(We will return to this topic in later lessons.)

In simple terms, **Bias** shows how on target a model is in its predictions.

**Variance** shows how reliable a model is in its performance.

These characteristics have important interactions, but we will save that for later.

![Bias and Variance](./assets/images/biasVsVarianceImage.png)

Remember how we just calculated Mean Squared Error to see how good our prediction was? It turns out we can do this for any statistical estimator, including means, variances, and machine learning models.

We can even decompose Mean Squared Error to identify where the source of error comes from.

<a id="bias-variance-decomposition"></a>
### Bias-Variance decomposition

In the following notation $f$ refers to a perfect model while $\hat{f}$ refers to our model.

**Bias**

Error due to bias is calculated as the difference between the expected prediction of our model and the correct value we are trying to predict.
$$Bias = E[\hat{f}(x)] - f(x)$$

**Variance**

Error due to variance is taken as the variability of a model prediction for a given point.

$$Variance = E[(\hat{f}(x) - E[\hat{f}(x)])^2]$$

**Mean Squared Error**
$$MSE(\hat{f}(x)) = Var(\hat{f}(x)) + Bias(\hat{f}(x),f(x))^2$$

This topic will come up again. For now, it's enough to know that we can decompose MSE into the Bias of the estimator and the Variance of the estimator.
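
The decomposition can be verified on a handful of numbers. In this sketch, `true_value` plays the role of $f(x)$ and `predictions` plays the role of $\hat{f}(x)$ from models fit on different samples. Both are hypothetical values chosen for illustration, and expectations are replaced by averages over the five predictions.

```python
import numpy as np

true_value = 10.0   # f(x), hypothetical
# Hypothetical predictions of the same point from models fit on different samples
predictions = np.array([9.0, 11.5, 10.5, 12.0, 11.0])

bias = predictions.mean() - true_value                        # E[f_hat] - f
variance = np.mean((predictions - predictions.mean()) ** 2)   # E[(f_hat - E[f_hat])^2]
mse = np.mean((predictions - true_value) ** 2)                # E[(f_hat - f)^2]

print(bias, variance)
print(mse, variance + bias ** 2)   # the two numbers agree
```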

<a id="example-using-bessels-correction"></a>
### Example using Bessel's correction

It's rarely practical to measure every single item in a population to gather a statistic. We will usually sample a few items and use those to infer a population value.

For example, we can take a class of 200 students and measure their height, but rather than measuring everyone, we select students at random to estimate the average height in the class and the variance of the height in the class.

We know we can take the mean as follows:
$$E[X] = \bar{X} =\frac 1n\sum_{i=1}^nx_i$$

What about the variance?

Intuitively and by definition, population variance looks like this (the average squared distance from the mean):
$$\frac {\sum{(x_i - \bar{X})^2}} {n}$$

It's actually better to use the following for a sample (why?):

$$\frac {\sum{(x_i - \bar{X})^2}} {n-1}$$

In some cases, we may even use:

$$\frac {\sum{(x_i - \bar{X})^2}} {n+1}$$

Detailed explanations can be found here:
- https://en.wikipedia.org/wiki/Bessel%27s_correction
- https://en.wikipedia.org/wiki/Mean_squared_error

In [42]:
heights = np.random.rand(200) + 6.5

In [43]:
def plot_means(sample_size):
    true_mean = np.mean(heights)
    mean_heights = []
    for n in range(5,sample_size):
        for j in range(30):
            mean_height = np.mean(np.random.choice(heights, n, replace=False))
            mean_heights.append((n, mean_height))
    sample_height = pd.DataFrame(mean_heights, columns=['sample_size', 'height'])
    sample_height.plot.scatter(x='sample_size', y='height', figsize=(14, 4), alpha=0.5)
    plt.axhline(y=true_mean, c='r')
    plt.title("The Bias and Variance of the Mean Estimator")
    plt.show()

In [44]:
def plot_variances(sample_size):
    true_variance = np.var(heights)
    var_heights = []
    for n in range(5,sample_size):
        for j in range(30):
            var_height1 = np.var(np.random.choice(heights, n, replace=False), ddof=0)
            var_height2 = np.var(np.random.choice(heights, n, replace=False), ddof=1)
            var_height3 = np.var(np.random.choice(heights, n, replace=False), ddof=-1)
            var_heights.append((n, var_height1, var_height2, var_height3))
    sample_var = pd.DataFrame(var_heights, columns=['sample_size', 'variance1', 'variance2', 'variance3'])
    sample_var.plot.scatter(x='sample_size', y='variance1', figsize=(14, 3), alpha=0.5)
    plt.axhline(y=true_variance, c='r')
    plt.title("The Bias and Variance of the Population Variance Estimator (n)")
    sample_var.plot.scatter(x='sample_size', y='variance3', figsize=(14, 3), alpha=0.5)
    plt.axhline(y=true_variance, c='r')
    plt.title("The Bias and Variance of the Biased Sample Variance Estimator (n+1)")
    sample_var.plot.scatter(x='sample_size', y='variance2', figsize=(14, 3), alpha=0.5)
    plt.axhline(y=true_variance, c='r')
    plt.title("The Bias and Variance of the Sample Variance Estimator (n-1)")
    plt.show()

In [45]:
interact(plot_means, sample_size=(5,200));

- The red line above is the true average height, but because we don't want to ask 200 people about their height, we take a sample.

- The blue dots show the estimate of the average height after taking a sample. To give us an idea of how sampling works, we simulate taking multiple samples.

- The X axis shows the sample size we take, the blue dots show the likely average heights we'll conclude for a given sample size.

- Even though the true average height is around 7 feet, a small sample may lead us to think that it's actually 6.7 or 7.3 feet. 

- Notice that the red line is in the center of our estimates. On average, we are correct and have no bias.

- If we take a larger sample size, we get a better estimate: the variance of our estimate gets smaller with larger sample sizes.

In [46]:
interact(plot_variances, sample_size=(5,200));

- Not all estimators are created equal.

- The red line shows the true variance of height

- The top graph is the Population Variance estimator, while the bottom graph is the Sample Variance estimator.

- This is very subtle, but notice that the Population Variance estimator is not centered on the red line. It's actually biased and consistently underestimates the true variance, especially at low sample sizes.

- You may also notice that the scatter of the Population Variance estimator is smaller. That means the variance of the Population Variance estimator is smaller. That phrase can be really confusing: it refers to the variability of the estimator itself.

- Play around with the sliders to get a good view of the graphs.
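
The bias described in the bullets above can also be seen without plots. This is a separate, simplified simulation rather than the notebook's `plot_variances`: it assumes a normal population with a known variance of 1 and draws many independent samples of size 5, then averages each estimator over all samples.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5            # small sample size, where the bias is easiest to see
trials = 20000   # number of simulated samples

# Assumed population: normal with mean 0 and variance 1
samples = rng.normal(loc=0.0, scale=1.0, size=(trials, n))

var_n = samples.var(axis=1, ddof=0)           # divide by n
var_n_minus_1 = samples.var(axis=1, ddof=1)   # divide by n - 1

print(var_n.mean())          # close to (n - 1) / n = 0.8: biased low
print(var_n_minus_1.mean())  # close to 1.0: unbiased

# The divide-by-n estimator is biased, but it varies less
print(var_n.var() < var_n_minus_1.var())  # True
```

Dividing by n - 1 removes the bias at the cost of a slightly more variable estimate, which is the trade-off the plots above illustrate.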

<a id="correlation-and-association"></a>
## Correlation and Association
---

Correlation measures how variables are related to each other.

Typically, when we talk about correlation we mean the Pearson correlation coefficient, which is a measure of **linear** association.

We refer to perfect correlation as collinearity.

![Example correlation values](./assets/images/correlation_examples.png)

<a id="codealong-correlation-in-pandas"></a>
### Codealong: Correlation in Pandas

**Objective: Explore options for measuring and visualizing correlation in Pandas**

#### Display the correlation matrix for all titanic variables.

In [47]:
# A:

#### Use seaborn to plot a heatmap of the correlation matrix.

The `sns.heatmap()` function will do this.

In [48]:
# Use seaborn to plot a correlation heatmap

In [49]:
# Take a closer look at survived and fare using a scatter plot
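
A sketch of `.corr()` on a small, made-up DataFrame. The `toy` frame and its column names are hypothetical and were chosen to show perfect positive correlation, perfect negative correlation, and a strong but nonlinear relationship that Pearson correlation does not detect.

```python
import numpy as np
import pandas as pd

# Hypothetical data (values are made up for illustration)
toy = pd.DataFrame({
    'x':        [1.0, 2.0, 3.0, 4.0, 5.0],
    'twice_x':  [2.0, 4.0, 6.0, 8.0, 10.0],      # 2 * x
    'minus_x':  [-1.0, -2.0, -3.0, -4.0, -5.0],  # -x
    'parabola': [4.0, 1.0, 0.0, 1.0, 4.0],       # (x - 3)^2
})

corr = toy.corr()   # Pearson correlation by default
print(corr.round(2))
```

With the real data, `titanic.corr()` gives the correlation matrix (recent Pandas versions may need `numeric_only=True` when text columns are present), and `sns.heatmap(corr, annot=True)` draws it as a heatmap. The `parabola` column depends entirely on `x`, yet its correlation with `x` is 0 because the relationship is not linear.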

<a id="the-normal-distribution"></a>
## The Normal Distribution
---

**Objective: Introduce normal (Gaussian) distributions and how to identify them.**

#####  Math Review
- What is an event space?
  - A listing of all possible occurrences
- What is a probability distribution?
  - A function that describes how events occur in an event space
- What are general properties of probability distributions?
  - The probability of any event is between 0 and 1
  - The probability that something in the event space occurs is 1 (it is certain).
  

<a id="what-is-the-normal-distribution"></a>
### What is the Normal Distribution?
- A normal distribution is often a key assumption of many models.
  - In practice, if the normal distribution assumption is not met, it's not the end of the world. Your model is just less efficient in most cases.

- The normal distribution depends upon the mean and the standard deviation.

- The mean determines the center of the distribution.  The standard deviation determines the height and width of the distribution.

- Normal distributions are symmetric, bell-shaped curves.

- When the standard deviation is large, the curve is short and wide.

- When the standard deviation is small, the curve is tall and narrow.

![normal distribution](../assets/images/normal.png)


#### Why do we care about Normal distributions?
- Shows up in nature very often
- Aggregated processes tend to distribute normally regardless of their underlying distribution provided that the processes are uncorrelated or weakly correlated (Central Limit Theorem)
- Good simplification that makes it easy to make approximations.

#### Plot a histogram of 1000 samples from a random normal distribution.

The `np.random.randn()` function draws from the standard normal distribution, which has mean 0 and standard deviation 1.

In [50]:
# Plot a histogram of several random normal samples from numpy
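
A sketch of the sampling step, with the plotting call left out so the numbers can be inspected directly. The seed and the choice of 20 bins are arbitrary assumptions.

```python
import numpy as np

np.random.seed(0)                # arbitrary seed, for repeatable output
samples = np.random.randn(1000)  # 1000 draws from the standard normal

# The sample mean and standard deviation should be near 0 and 1
print(samples.mean(), samples.std())

# np.histogram returns the bin counts that a histogram plot would draw
counts, bin_edges = np.histogram(samples, bins=20)
print(counts)
```

To draw the plot itself, `plt.hist(samples, bins=20)` uses the same binning. The counts should rise toward the middle bins and fall off symmetrically on both sides.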

<a id="skewness"></a>
###  Skewness
- Skewness is a measure of the asymmetry of the distribution of a random variable about its mean.
- Skewness can be positive or negative, or even undefined.
- Notice that the mean, median, and mode are the same when there is no skew
![skewness](./assets/images/skewness---mean-median-mode.jpg)

#### Plot a lognormal distribution generated with numpy.

Take 1000 samples using `np.random.lognormal` and plot them on a histogram.

In [51]:
# Plot a lognormal distribution generated with numpy
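
A sketch of the sampling step with a numeric check of the skew. The seed and the parameters `mean=0.0` and `sigma=1.0` are assumptions; they describe the underlying normal distribution, not the lognormal samples themselves.

```python
import numpy as np
import pandas as pd

np.random.seed(0)   # arbitrary seed, for repeatable output
samples = pd.Series(np.random.lognormal(mean=0.0, sigma=1.0, size=1000))

# For right-skewed data the mean is pulled above the median
print(samples.mean(), samples.median())

# Pandas reports sample skewness: positive means a long right tail
print(samples.skew())
```

To draw the plot, `samples.hist(bins=50)` is one option. The long right tail is what pulls the mean above the median.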

#####  Real World Application - When mindfulness beats complexity
- Skewness is surprisingly important.
- Most algorithms implicitly use the mean by default when making approximations.
- If you know your data is heavily skewed you may have to either transform your data or set your algorithms to work with the median.
- In the DIDI tech challenge, changing a few default options quickly put you in the 80th percentile, ahead of some very brilliant programmers who missed the basics. This amounted to switching the algorithm from estimating a mean to estimating a median.

<a id="kurtosis"></a>
### Kurtosis
- Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution.
- Datasets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails. 

![kurtosis](./assets/images/kurtosis.jpg)
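
Kurtosis can be compared numerically as well. This sketch contrasts normal samples with samples from a Laplace distribution, chosen here only as an example of a sharply peaked, heavy-tailed distribution. Pandas' `.kurt()` reports excess kurtosis, so a normal distribution scores about 0.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # arbitrary seed, for repeatable output

normal = pd.Series(rng.normal(size=100_000))
heavy_tailed = pd.Series(rng.laplace(size=100_000))   # example heavy-tailed data

# Excess kurtosis: about 0 for normal data, positive for heavier tails
print(normal.kurt())
print(heavy_tailed.kurt())
```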

#####  Real World Application - Risk Analysis
- Long-tailed distributions with high kurtosis elude intuition: we naturally think the event is too improbable to pay attention to
- It's often the case that there is a large cost associated with the very low probability event, as is the case with hurricane damage
- It's unlikely you will get hit by a category 5 hurricane, but when you do, the damage is catastrophic
- Pay attention to what happens at the tails and whether it influences the problem at hand
- In these cases understanding the costs may be more important than understanding the risks

<a id="determining-the-distribution-of-your-data"></a>
## Determining the Distribution of Your Data
---

**Objective: Introduce histograms and density plots in Pandas.**

![](./assets/images/distributions.png)

#### Use the `.hist()` function of your titanic DataFrame to plot histograms of all the variables in your data.

In [52]:
# Plot all variables in titanic using histograms

#### Use the `.plot.box()` built-in function of your titanic DataFrame to plot boxplots of your variables.

In [53]:
# Plotting all histograms can be unwieldy; boxplots can be more concise

<a id="exercise"></a>
### Exercise

1. Look at the titanic data variables
- Are any of them normal?
- Are any skewed?
- How might this affect our modeling?

![](./assets/images/visualization_flow_chart.jpg)

<a id="topic-review"></a>
## Topic Review
---

- We covered several different types of summary statistics, what are they?
- We covered three different types of visualizations, which ones?
- Describe bias and variance and why they are important.
- What are some important characteristics of distributions?

**Any further questions?**