# Case Study on ANOVA

XYZ Company has offices in four different zones. The company wishes to
investigate the following:

● The mean sales generated by each zone.

● Total sales generated by all the zones for each month.

● Check whether all the zones generate the same amount of sales.

Help the company to carry out their study with the help of data provided.

In [1]:
import pandas as pd
import numpy as np
import scipy.stats as stats
from scipy.stats import ttest_ind

In [5]:
data=pd.read_csv('Sales_data_zone_wise.csv')

In [7]:
data.head()

,Month,Zone - A,Zone - B,Zone - C,Zone - D
0,Month - 1,1483525,1748451,1523308,2267260
1,Month - 2,1238428,1707421,2212113,1994341
2,Month - 3,1860771,2091194,1282374,1241600
3,Month - 4,1871571,1759617,2290580,2252681
4,Month - 5,1244922,1606010,1818334,1326062


In [9]:
data.tail()

,Month,Zone - A,Zone - B,Zone - C,Zone - D
24,Month - 25,1256333,1622671,1521792,1695122
25,Month - 26,1422853,1715465,1853636,1520406
26,Month - 27,1384426,1983163,1611169,1289160
27,Month - 28,1616640,1547991,2128022,2178267
28,Month - 29,1310654,1660092,1947119,1854412


# Data
The data contains 29 months of sales figures for each of the company's four zones.

# 1. The mean sales generated by each zone.

In [11]:
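# numeric_only=True skips the text column 'Month'; recent pandas versions raise an error without it
data.mean(numeric_only=True)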

Zone - A    1.540493e+06
Zone - B    1.755560e+06
Zone - C    1.772871e+06
Zone - D    1.842927e+06
dtype: float64

In [24]:
data.describe()

,Zone - A,Zone - B,Zone - C,Zone - D
count,29.0,29.0,29.0,29.0
mean,1540493.0,1755560.0,1772871.0,1842927.0
std,261940.1,168389.9,333193.7,375016.5
min,1128185.0,1527574.0,1237722.0,1234311.0
25%,1305972.0,1606010.0,1523308.0,1520406.0
50%,1534390.0,1740365.0,1767047.0,1854412.0
75%,1820196.0,1875658.0,2098463.0,2180416.0
max,2004480.0,2091194.0,2290580.0,2364132.0


# 2. Total sales generated by all the zones for each month.


In [17]:
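# Sum across the four zone columns row by row; numeric_only=True leaves out the text column 'Month'
tot_sales = data.sum(axis=1, numeric_only=True)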

In [18]:
print('Total sales generated by all the zones for each month')
print(tot_sales)

Total sales generated by all the zones for each month
0     7022544
1     7152303
2     6475939
3     8174449
4     5995328
5     7151387
6     7287108
7     7816299
8     6703395
9     7128210
10    7032783
11    6111084
12    5925424
13    7155515
14    5934156
15    6506659
16    7149383
17    7083490
18    6971953
19    7124599
20    7389597
21    7560001
22    6687919
23    7784747
24    6095918
25    6512360
26    6267918
27    7470920
28    6772277
dtype: int64
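The totals above are keyed by row position (0 to 28) rather than by month name. Below is a minimal sketch of one way to label them, assuming the `Month` column shown in `data.head()`. The names `sample`, `zone_cols` and `monthly_total` are introduced here for illustration only, and `sample` retypes just the first five rows so the snippet runs without the CSV file.

```python
import pandas as pd

# First five rows as shown by data.head(), retyped for a standalone example
sample = pd.DataFrame({
    "Month": ["Month - 1", "Month - 2", "Month - 3", "Month - 4", "Month - 5"],
    "Zone - A": [1483525, 1238428, 1860771, 1871571, 1244922],
    "Zone - B": [1748451, 1707421, 2091194, 1759617, 1606010],
    "Zone - C": [1523308, 2212113, 1282374, 2290580, 1818334],
    "Zone - D": [2267260, 1994341, 1241600, 2252681, 1326062],
})

zone_cols = ["Zone - A", "Zone - B", "Zone - C", "Zone - D"]

# Use the month label as the index, then add the four zones row by row
monthly_total = sample.set_index("Month")[zone_cols].sum(axis=1)
print(monthly_total)
```

With the full data set, the same two steps applied to `data` instead of `sample` should give one labeled total per month.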


# 3. Check whether all the zones generate the same amount of sales.

In [19]:
# To check whether all zones generate the same amount of sales, we perform a one-way ANOVA test.
# Null hypothesis H0: all zone means are equal, μ1 = μ2 = μ3 = μ4.
# Alternative hypothesis H1: at least one zone mean differs from the others.

In [20]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

In [26]:
F,p = stats.f_oneway(data['Zone - A'],data['Zone - B'],data['Zone - C'],data['Zone - D'])
print('F-Statistic=%.3f, p=%.3f' % (F, p))

F-Statistic=5.672, p=0.001
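To show what `stats.f_oneway` computes, here is a rough sketch of the one-way ANOVA F-statistic worked out by hand: the between-group mean square divided by the within-group mean square. The helper `one_way_f` is a hypothetical name, not part of SciPy. The sample lists hold only the first five months from `data.head()`, so the resulting F will not match the 5.672 obtained from all 29 months.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# First five months only (from data.head()); the full data set has 29 months
zone_a = [1483525, 1238428, 1860771, 1871571, 1244922]
zone_b = [1748451, 1707421, 2091194, 1759617, 1606010]
zone_c = [1523308, 2212113, 1282374, 2290580, 1818334]
zone_d = [2267260, 1994341, 1241600, 2252681, 1326062]

f_stat, df_b, df_w = one_way_f([zone_a, zone_b, zone_c, zone_d])
print(f"F = {f_stat:.3f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F means the zone means differ by more than the month-to-month variation within each zone would explain. The p-value is the upper-tail probability of that F under the F distribution with the same degrees of freedom.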


# Result
Since the p-value (about 0.001) is less than 0.05, we reject the null hypothesis at the 5% significance level.
The zones do not all generate the same mean sales; at least one zone's mean differs from the others.