## ANOVA

### Objectives:
1. Explain the dataset
2. Check the summary statistics and discuss the max, min, avg, median, and percentiles.
3. The manager wants to find out whether the same amount was spent on the three advertising media (`TV`, `Radio`, and `Newspaper`). Comment on your findings.

### Importing relevant libraries

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

### Reading the dataset

In [7]:
df = pd.read_csv('Data/Advertising.csv')
df.head()

|   | Unnamed: 0 | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|---|
| 0 | 1 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 2 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 3 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 4 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 5 | 180.8 | 10.8 | 58.4 | 12.9 |


### 1. Explain the dataset

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  200 non-null    int64  
 1   TV          200 non-null    float64
 2   Radio       200 non-null    float64
 3   Newspaper   200 non-null    float64
 4   Sales       200 non-null    float64
dtypes: float64(4), int64(1)
memory usage: 7.9 KB


#### Insights:
- The data contains 200 rows and 5 columns
- The dataset has no null values: every column reports 200 non-null entries
- `TV`, `Radio`, and `Newspaper` hold the amount spent on advertising through each medium, and `Sales` holds the corresponding sales figure

In [11]:
df.drop('Unnamed: 0', axis=1, inplace=True)

Drop the unnecessary `Unnamed: 0` column, which only repeats the row number.
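
An alternative, sketched here on the assumption that the first CSV column is just a saved row index, is to pass `index_col=0` to `pd.read_csv` so that no `Unnamed: 0` column is created in the first place. The inline CSV text is a stand-in built from the first rows shown above, not the full file, and `df_demo` is an illustrative name.

```python
import io
import pandas as pd

# Stand-in for Data/Advertising.csv: the first three rows shown above.
# The empty leading header field mirrors how a saved index column appears.
csv_text = """,TV,Radio,Newspaper,Sales
1,230.1,37.8,69.2,22.1
2,44.5,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
"""

# index_col=0 uses the first column as the row index instead of loading it
# as a data column named 'Unnamed: 0'.
df_demo = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_demo.columns.tolist())
```

With this approach the later `drop` call is not needed, although the row labels then start at 1 rather than 0.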

### 2. Checking Summary statistics

In [14]:
df[['TV', 'Radio','Newspaper']].describe()

|   | TV | Radio | Newspaper |
|---|---|---|---|
| count | 200.0 | 200.0 | 200.0 |
| mean | 147.0425 | 23.264 | 30.554 |
| std | 85.854236 | 14.846809 | 21.778621 |
| min | 0.7 | 0.0 | 0.3 |
| 25% | 74.375 | 9.975 | 12.75 |
| 50% | 149.75 | 22.9 | 25.75 |
| 75% | 218.825 | 36.525 | 45.1 |
| max | 296.4 | 49.6 | 114.0 |


#### Insights:
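- Mean and standard deviation of the amount spent on advertising through each medium:

> TV - $\mu_{1}$: 147.04, $\sigma_{1}$: 85.85

> Radio - $\mu_{2}$: 23.26, $\sigma_{2}$: 14.85

> Newspaper - $\mu_{3}$: 30.55, $\sigma_{3}$: 21.78
- The means vary greatly across the groups
- TV has the highest spend: both its mean (147.04) and its maximum (296.4) are the largest of the three
- Radio has the lowest spend: both its mean (23.26) and its maximum (49.6) are the smallest
- The minimum spend is close to zero for every medium (TV: 0.7, Radio: 0.0, Newspaper: 0.3)
- Median amount spent: TV 149.75, Radio 22.9, Newspaper 25.75
- The Newspaper maximum (114.0) is far above its 75th percentile (45.1), which suggests a right-skewed distribution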
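
The ranking of the media by average spend can also be read off programmatically. The sketch below rebuilds a small frame from the five rows printed by `df.head()` (an illustrative subset, so its numbers differ from the full 200-row summary) and uses `idxmax` and `idxmin` on the column means. The names `sample`, `highest`, and `lowest` are illustrative.

```python
import pandas as pd

# Illustrative subset: the five rows shown by df.head(), not the full dataset.
sample = pd.DataFrame({
    'TV': [230.1, 44.5, 17.2, 151.5, 180.8],
    'Radio': [37.8, 39.3, 45.9, 41.3, 10.8],
    'Newspaper': [69.2, 45.1, 69.3, 58.5, 58.4],
})

means = sample.mean()      # per-medium average spend
medians = sample.median()  # per-medium median spend

highest = means.idxmax()   # medium with the largest mean
lowest = means.idxmin()    # medium with the smallest mean
print(highest, lowest)
```

Applied to the full `df[['TV', 'Radio', 'Newspaper']]`, the same calls would use the means reported in the summary table.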

### 3. Comparing the amount spent through each medium

To compare the means of more than two groups, we use a one-way ANOVA test.

- $H_{0}$ : The average amount spent through each medium is the same; that is: $\mu_{1}$ = $\mu_{2}$ = $\mu_{3}$
- $H_{a}$ : At least one average differs from the others; that is, $\mu_{1}$, $\mu_{2}$, and $\mu_{3}$ are not all equal
- Significance level $\alpha$: 0.05
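
For reference, the one-way ANOVA statistic compares the variability between the group means with the variability within the groups. In the standard textbook form, with $k$ groups, $n_{i}$ observations in group $i$, $N$ observations in total, group means $\bar{x}_{i}$, and grand mean $\bar{x}$ (symbols introduced here for illustration, not taken from the notebook):

$$
F = \frac{\mathrm{MSB}}{\mathrm{MSW}}
  = \frac{\sum_{i=1}^{k} n_{i}\,(\bar{x}_{i} - \bar{x})^{2} \,/\, (k - 1)}
         {\sum_{i=1}^{k} \sum_{j=1}^{n_{i}} (x_{ij} - \bar{x}_{i})^{2} \,/\, (N - k)}
$$

Here $k = 3$ and $N = 600$, so under $H_{0}$ the statistic is referred to an $F$ distribution with $2$ and $597$ degrees of freedom.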

In [5]:
F_value , p_value = stats.f_oneway(df['TV'], df['Radio'], df['Newspaper'])

In [17]:
print('\nThe f value is: ', round(F_value, 3))
print('\nThe p value for the given f value assuming the null hypothesis is true is: ', p_value)


The f value is:  358.851

The p value for the given f value assuming the null hypothesis is true is:  4.552931539744962e-103
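
As a cross-check, the F value can be approximately reproduced from the summary statistics alone. The sketch below assumes equal group sizes (200 each, as reported by `describe()`), in which case the within-group mean square is the average of the three sample variances. The variable names are illustrative, and because the inputs are rounded the result should only agree closely, not exactly, with the reported 358.851.

```python
import numpy as np
from scipy import stats

# Rounded values copied from the describe() output above.
n = 200
means = np.array([147.0425, 23.264, 30.554])
stds = np.array([85.854236, 14.846809, 21.778621])

k = len(means)
grand_mean = means.mean()  # equals the grand mean only because group sizes are equal

# Between-group mean square: spread of the group means around the grand mean.
msb = n * np.sum((means - grand_mean) ** 2) / (k - 1)
# Within-group mean square: with equal n this is the mean of the sample variances.
msw = np.mean(stds ** 2)

f_check = msb / msw
# Upper-tail probability of the F distribution with (k - 1, N - k) degrees of freedom.
p_check = stats.f.sf(f_check, k - 1, k * n - k)
print(round(f_check, 2), p_check)
```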


#### Insights:
- The p-value is far smaller than the significance level $\alpha$ = 0.05
- We therefore reject the null hypothesis that the means are equal.

### Conclusion:
> The mean amounts spent on advertising through TV, Radio, and Newspaper are not all equal; at least one medium's mean spend differs from the others