# Tips Dataset Clustering Exercises

Create either a Python script or a Jupyter notebook named `explore_tips` that explores the tips dataset built into seaborn. Perform at least one t-test and one chi-square test.

You can load the data set like this:

> ```import seaborn as sns```
> 
> ```sns.load_dataset('tips')```

### Load Data

In [4]:
import pandas as pd
import numpy as np
import seaborn as sns
from math import sqrt
from scipy import stats
import matplotlib.pyplot as plt

In [5]:
df = sns.load_dataset('tips')
df.head()

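|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |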


In [6]:
df.describe().T

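|   | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| total_bill | 244.0 | 19.785943 | 8.902412 | 3.07 | 13.3475 | 17.795 | 24.1275 | 50.81 |
| tip | 244.0 | 2.998279 | 1.383638 | 1.00 | 2.0000 | 2.900 | 3.5625 | 10.00 |
| size | 244.0 | 2.569672 | 0.951100 | 1.00 | 2.0000 | 2.000 | 3.0000 | 6.00 |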


### T-Test

$
\begin{align*}
   H_0 & : \text{There is no difference between smokers' tips and the overall average tip.}
   \\
   H_a & : \text{There is a difference between smokers' tips and the overall average tip.}
   \\
    \alpha & : \text{0.05}
\end{align*}
$

In [9]:
df.tip.mean(), df.tip.median()

(2.9982786885245902, 2.9)

The mean and median are close, which suggests the tip distribution is not heavily skewed, so I treat it as close enough to normal to calculate the t-statistic and the p-value.
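Comparing the mean and median is only a rough check. As a minimal sketch of a slightly fuller one, the helper below also reports sample skewness and a Shapiro-Wilk p-value. The function name `normality_summary` and the sample values are hypothetical, made up for illustration; they are not taken from the tips data.

```python
import numpy as np
from scipy import stats


def normality_summary(values):
    """Return mean, median, sample skewness, and the Shapiro-Wilk p-value."""
    x = np.asarray(values, dtype=float)
    _, shapiro_p = stats.shapiro(x)
    return {
        "mean": float(x.mean()),
        "median": float(np.median(x)),
        "skew": float(stats.skew(x)),
        "shapiro_p": float(shapiro_p),
    }


# Hypothetical tip amounts, made up for illustration only
sample_tips = [1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 5.0]

summary = normality_summary(sample_tips)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

In the notebook, the same helper could be called as `normality_summary(df.tip)`. A skewness near zero and a Shapiro-Wilk p-value above the chosen alpha would both be consistent with the normality assumption; neither proves it.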

In [11]:
smokers = df[df.smoker == 'Yes']

In [13]:
t, p = stats.ttest_1samp(smokers.tip, df.tip.mean())

print(f't = {t:.3f}')
print(f'p = {p:.3f}')

t = 0.072
p = 0.943


The **p-value** was 0.943, well above the alpha of 0.05, so we **fail to reject the null hypothesis** that there is no difference between smokers' tips and the overall average tip.
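To show where the t-statistic comes from, here is a minimal sketch that computes the one-sample statistic by hand as $t = (\bar{x} - \mu) / (s / \sqrt{n})$ and compares it with `stats.ttest_1samp`. The helper name `one_sample_t` and the sample values are hypothetical and not drawn from the tips data.

```python
from math import sqrt

import numpy as np
from scipy import stats


def one_sample_t(sample, popmean):
    """Two-sided one-sample t-test computed from the textbook formula."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    standard_error = x.std(ddof=1) / sqrt(n)  # sample std uses ddof=1
    t = (x.mean() - popmean) / standard_error
    p = 2 * stats.t.sf(abs(t), df=n - 1)  # two-sided p-value
    return t, p


# Hypothetical tips and comparison mean, made up for illustration only
sample = [2.0, 3.0, 3.5, 1.5, 4.0, 2.5, 3.0, 5.0]
mu = 3.0

t_manual, p_manual = one_sample_t(sample, mu)
t_scipy, p_scipy = stats.ttest_1samp(sample, mu)

print(f"manual: t = {t_manual:.3f}, p = {p_manual:.3f}")
print(f"scipy:  t = {t_scipy:.3f}, p = {p_scipy:.3f}")
```

Calling `one_sample_t(smokers.tip, df.tip.mean())` in the notebook should agree with the `ttest_1samp` result reported above, up to floating-point error.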

In [15]:
df.head(3)

|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |


### Chi-Square Test

$
\begin{align*}
   H_0 & : \text{Smokers and day of the week are independent of each other.}
   \\
   H_a & : \text{Smokers and day of the week are not independent of each other.}
   \\
    \alpha & : \text{0.05}
\end{align*}
$
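A chi-square test of independence compares the observed counts in a contingency table with the counts expected if the two variables were independent: each expected cell is (row total × column total) / grand total. The sketch below builds such a table with `pd.crosstab`, computes the expected counts by hand, and checks them against `stats.chi2_contingency`. The `records` frame is hypothetical data made up for illustration, not the tips data.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical records, made up for illustration only
records = pd.DataFrame({
    "smoker": ["Yes", "Yes", "No", "No", "No", "Yes",
               "No", "Yes", "No", "No", "Yes", "No"],
    "day": ["Fri", "Sat", "Sat", "Sun", "Fri", "Sat",
            "Sun", "Fri", "Sat", "Sun", "Sun", "Fri"],
})

# Observed counts: rows are smoker status, columns are days
observed = pd.crosstab(records["smoker"], records["day"])
obs = observed.to_numpy()

# Expected counts under independence: row total * column total / grand total
row_totals = obs.sum(axis=1)
col_totals = obs.sum(axis=0)
expected_manual = np.outer(row_totals, col_totals) / obs.sum()

# Pearson chi-square statistic from the observed and expected counts
chi2_manual = ((obs - expected_manual) ** 2 / expected_manual).sum()

chi2, p, dof, expected = stats.chi2_contingency(observed)

print(observed)
print(f"chi^2 (manual) = {chi2_manual:.4f}")
print(f"chi^2 (scipy)  = {chi2:.4f}, p = {p:.4f}, dof = {dof}")
```

The manual and SciPy statistics agree here because the table is larger than 2×2. For a 2×2 table, `chi2_contingency` applies Yates' continuity correction by default, so the two values would differ unless `correction=False` is passed.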

In [18]:
# Build the contingency table of smoker status by day of the week
observed = pd.crosstab(df.smoker, df.day)

chi2, p, degf, expected = stats.chi2_contingency(observed)

print('Observed\n')
print(observed.values)
print('---\nExpected\n')
print(expected)
print('---\n')
print(f'chi^2 = {chi2:.4f}')
print(f'p     = {p:.4f}')