# Case Study on ANOVA<br>

A company has offices in four different zones. The company wishes to investigate the following:<br>
-- The mean sales generated by each zone.<br>
-- Total sales generated by all the zones for each month.<br>
-- Check whether all the zones generate the same amount of sales.<br>
**Input data: Sales_data_zone_wise.csv**<br>

In [None]:
# Importing relevant libraries

In [1]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import f_oneway

In [None]:
# Reading the dataset

In [3]:
# Raw string (r'...') so the backslashes in the Windows path are not treated as escape sequences
data = pd.read_csv(r'G:\my trials\Casestudy\Sales_data_zone_wise.csv')
data.head()

Unnamed: 0,Month,Zone - A,Zone - B,Zone - C,Zone - D
0,Month - 1,1483525,1748451,1523308,2267260
1,Month - 2,1238428,1707421,2212113,1994341
2,Month - 3,1860771,2091194,1282374,1241600
3,Month - 4,1871571,1759617,2290580,2252681
4,Month - 5,1244922,1606010,1818334,1326062


In [None]:
# Performing general analysis

In [3]:
data.shape

(29, 5)

In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29 entries, 0 to 28
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Month     29 non-null     object
 1   Zone - A  29 non-null     int64 
 2   Zone - B  29 non-null     int64 
 3   Zone - C  29 non-null     int64 
 4   Zone - D  29 non-null     int64 
dtypes: int64(4), object(1)
memory usage: 1.3+ KB


**Insights**<br>
-  The dataset consists of 29 rows and 5 columns.<br>
-  The first column (`Month`) is of object type; the remaining four are integer (`int64`) columns.<br>
-  The dataset contains no null values.<br>
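
The same three facts can also be checked programmatically rather than read off `data.info()`. The sketch below is illustrative only: it rebuilds the five rows shown in the `data.head()` output above as a small frame named `sample` (a hypothetical name), so it runs without the CSV file. On the full dataset the same calls would be applied to `data`.

```python
import pandas as pd

# First five rows, copied from the data.head() output above (not the full dataset).
sample = pd.DataFrame({
    'Month': ['Month - 1', 'Month - 2', 'Month - 3', 'Month - 4', 'Month - 5'],
    'Zone - A': [1483525, 1238428, 1860771, 1871571, 1244922],
    'Zone - B': [1748451, 1707421, 2091194, 1759617, 1606010],
    'Zone - C': [1523308, 2212113, 1282374, 2290580, 1818334],
    'Zone - D': [2267260, 1994341, 1241600, 2252681, 1326062],
})

# Number of rows and columns
n_rows, n_cols = sample.shape

# Data type of every column
column_types = sample.dtypes

# Count of missing values per column; all zeros means no nulls
null_counts = sample.isnull().sum()

print('Rows:', n_rows, 'Columns:', n_cols)
print(column_types)
print(null_counts)
```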

##### Q. Find the mean sales generated by each zone

In [4]:
data.describe()

Unnamed: 0,Zone - A,Zone - B,Zone - C,Zone - D
count,29.0,29.0,29.0,29.0
mean,1540493.0,1755560.0,1772871.0,1842927.0
std,261940.1,168389.9,333193.7,375016.5
min,1128185.0,1527574.0,1237722.0,1234311.0
25%,1305972.0,1606010.0,1523308.0,1520406.0
50%,1534390.0,1740365.0,1767047.0,1854412.0
75%,1820196.0,1875658.0,2098463.0,2180416.0
max,2004480.0,2091194.0,2290580.0,2364132.0


**Insights:**<br>
Comparing the mean sales generated by each zone:<br>
-  Zone D has the highest mean sales (about 1.84 million).<br>
-  Zone A has the lowest mean sales (about 1.54 million).<br>
-  Zones B and C have almost the same mean sales (about 1.76 million and 1.77 million).<br>
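
If only the means are needed, they can be computed and ranked directly instead of being read from the `describe()` table. This is a minimal sketch on the five rows shown in the `data.head()` output above; `sample`, `zone_cols` and `zone_means` are names introduced here for illustration. Because it uses only five months, its figures and ranking will not match the 29-month means reported above.

```python
import pandas as pd

# First five rows, copied from the data.head() output above (not the full dataset).
sample = pd.DataFrame({
    'Month': ['Month - 1', 'Month - 2', 'Month - 3', 'Month - 4', 'Month - 5'],
    'Zone - A': [1483525, 1238428, 1860771, 1871571, 1244922],
    'Zone - B': [1748451, 1707421, 2091194, 1759617, 1606010],
    'Zone - C': [1523308, 2212113, 1282374, 2290580, 1818334],
    'Zone - D': [2267260, 1994341, 1241600, 2252681, 1326062],
})

zone_cols = ['Zone - A', 'Zone - B', 'Zone - C', 'Zone - D']

# Column-wise mean of each zone, sorted from highest to lowest
zone_means = sample[zone_cols].mean().sort_values(ascending=False)
print(zone_means)

# Labels of the zones with the highest and lowest mean in this sample
print('Highest mean:', zone_means.idxmax())
print('Lowest mean:', zone_means.idxmin())
```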

##### Q. Find the total sales generated by all the zones for each month

In [4]:
# Row-wise total: add the four zone columns for each month
data['Total Sales by all Zones'] = data['Zone - A'] + data['Zone - B'] + data['Zone - C'] + data['Zone - D']
data.head()

Unnamed: 0,Month,Zone - A,Zone - B,Zone - C,Zone - D,Total Sales by all Zones
0,Month - 1,1483525,1748451,1523308,2267260,7022544
1,Month - 2,1238428,1707421,2212113,1994341,7152303
2,Month - 3,1860771,2091194,1282374,1241600,6475939
3,Month - 4,1871571,1759617,2290580,2252681,8174449
4,Month - 5,1244922,1606010,1818334,1326062,5995328


In [7]:
data.nlargest(5,'Total Sales by all Zones')

Unnamed: 0,Month,Zone - A,Zone - B,Zone - C,Zone - D,Total Sales by all Zones
3,Month - 4,1871571,1759617,2290580,2252681,8174449
7,Month - 8,1625696,1665534,2161754,2363315,7816299
23,Month - 24,1880820,1752873,2098463,2052591,7784747
21,Month - 22,1481619,1527574,2255729,2295079,7560001
27,Month - 28,1616640,1547991,2128022,2178267,7470920


In [8]:
data.nsmallest(5,'Total Sales by all Zones')

Unnamed: 0,Month,Zone - A,Zone - B,Zone - C,Zone - D,Total Sales by all Zones
12,Month - 13,1254939,1588473,1348629,1733383,5925424
14,Month - 15,1128185,1804613,1767047,1234311,5934156
4,Month - 5,1244922,1606010,1818334,1326062,5995328
24,Month - 25,1256333,1622671,1521792,1695122,6095918
11,Month - 12,1537539,1875658,1237722,1460165,6111084


In [9]:
data.describe()

Unnamed: 0,Zone - A,Zone - B,Zone - C,Zone - D,Total Sales by all Zones
count,29.0,29.0,29.0,29.0,29.0
mean,1540493.0,1755560.0,1772871.0,1842927.0,6911851.0
std,261940.1,168389.9,333193.7,375016.5,590891.9
min,1128185.0,1527574.0,1237722.0,1234311.0,5925424.0
25%,1305972.0,1606010.0,1523308.0,1520406.0,6506659.0
50%,1534390.0,1740365.0,1767047.0,1854412.0,7032783.0
75%,1820196.0,1875658.0,2098463.0,2180416.0,7155515.0
max,2004480.0,2091194.0,2290580.0,2364132.0,8174449.0


**Insights**<br>
-- Total sales across all zones have a mean of 6,911,851 per month.<br>
-- 'Month - 4' has the highest total sales across all zones (8,174,449).<br>
-- 'Month - 13' has the lowest total sales across all zones (5,925,424).<br>
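
An alternative way to build the monthly total and locate the best and worst months is sketched below, using `sum(axis=1)` with `idxmax()` and `idxmin()`. It is an illustration on the five rows shown in the `data.head()` output above, not the author's method; `sample`, `zone_cols`, `best_month` and `worst_month` are hypothetical names. Within these five rows the lowest month is therefore not 'Month - 13', which only appears in the full dataset.

```python
import pandas as pd

# First five rows, copied from the data.head() output above (not the full dataset).
sample = pd.DataFrame({
    'Month': ['Month - 1', 'Month - 2', 'Month - 3', 'Month - 4', 'Month - 5'],
    'Zone - A': [1483525, 1238428, 1860771, 1871571, 1244922],
    'Zone - B': [1748451, 1707421, 2091194, 1759617, 1606010],
    'Zone - C': [1523308, 2212113, 1282374, 2290580, 1818334],
    'Zone - D': [2267260, 1994341, 1241600, 2252681, 1326062],
})

# Select the zone columns by name prefix so the sum does not depend on column order
zone_cols = [col for col in sample.columns if col.startswith('Zone')]

# axis=1 sums across the columns of each row, giving one total per month
sample['Total Sales by all Zones'] = sample[zone_cols].sum(axis=1)

# idxmax / idxmin return the row label of the largest / smallest total
best_month = sample.loc[sample['Total Sales by all Zones'].idxmax(), 'Month']
worst_month = sample.loc[sample['Total Sales by all Zones'].idxmin(), 'Month']

print(sample[['Month', 'Total Sales by all Zones']])
print('Highest total:', best_month)
print('Lowest total:', worst_month)
```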


##### Q. Check whether all the zones generate the same amount of sales

The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean.<br>
**Null hypothesis:** All group (zone) means are equal<br>
**Alternative hypothesis:** At least one group mean differs from the others<br>
**Significance level:** 0.05
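
To make the test statistic concrete, the sketch below computes the one-way ANOVA F value by hand as the ratio of the between-group mean square to the within-group mean square, and compares it with `scipy.stats.f_oneway`. It is a minimal illustration on the five rows shown in the `data.head()` output above, so its F and p values are not those of the full 29-month dataset. All variable names are introduced here for illustration.

```python
import numpy as np
from scipy.stats import f, f_oneway

# One array per zone: first five months, copied from the data.head() output above.
groups = [
    np.array([1483525, 1238428, 1860771, 1871571, 1244922], dtype=float),  # Zone - A
    np.array([1748451, 1707421, 2091194, 1759617, 1606010], dtype=float),  # Zone - B
    np.array([1523308, 2212113, 1282374, 2290580, 1818334], dtype=float),  # Zone - C
    np.array([2267260, 1994341, 1241600, 2252681, 1326062], dtype=float),  # Zone - D
]

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: spread of the group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: spread of the observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Mean squares: each sum of squares divided by its degrees of freedom
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

F_manual = ms_between / ms_within
# p-value: upper tail of the F distribution with (k - 1, n_total - k) degrees of freedom
p_manual = f.sf(F_manual, k - 1, n_total - k)

F_scipy, p_scipy = f_oneway(*groups)
print('Manual F and p:', F_manual, p_manual)
print('SciPy  F and p:', F_scipy, p_scipy)
```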

In [10]:
a=data['Zone - A']
b=data['Zone - B']
c=data['Zone - C']
d=data['Zone - D']

In [13]:
# scipy.stats.f_oneway takes the groups as input and returns the ANOVA F statistic and p-value
F, p = f_oneway(a, b, c, d)
print('The F statistic is: ', F)
print('The p value is: ', p)

The F statistic is:  5.672056106843581
The p value is:  0.0011827601694503335


**Insights**<br>
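-- The p-value (about 0.0012) is below the significance level of 0.05.<br>
-- We can therefore reject the null hypothesis.<br>
-- This means that the mean sales of at least one zone differ significantly from the others. The test does not indicate which zone differs.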
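
The decision can be cross-checked from the reported numbers alone. The sketch below assumes the degrees of freedom implied by the data described above (4 zones with 29 months each, so 3 and 112) and uses the F distribution from `scipy.stats` to recover the p-value and the critical F value at the 0.05 level. The variable names are introduced here for illustration.

```python
from scipy.stats import f

# Values reported by f_oneway above
F_reported = 5.672056106843581
p_reported = 0.0011827601694503335
alpha = 0.05

# Degrees of freedom, assuming 4 zones with 29 monthly observations each
k = 4
n_per_group = 29
df_between = k - 1                    # 3
df_within = k * n_per_group - k       # 112

# Critical value: reject the null hypothesis when F exceeds this threshold
f_crit = f.ppf(1 - alpha, df_between, df_within)

# p-value recomputed from the reported F statistic (upper tail of the F distribution)
p_check = f.sf(F_reported, df_between, df_within)

reject_null = p_reported < alpha
print('Critical F at alpha = 0.05:', f_crit)
print('Recomputed p-value:', p_check)
print('Reject null hypothesis:', reject_null)
```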