# Case study 04 (Week 08)

## **Description**

***Case Study on Testing of Hypothesis***

A company has started investing in digital marketing as a new way to promote its products. It collected sales data and wants to use it to study the effect.

1. The company wishes to clarify whether there is any increase in sales after stepping into digital marketing.

2. The company needs to check whether there is any dependency between the features “**Region**” and “**Manager**”.

Help the company carry out this study using the data provided.

In [1]:
# NumPy Library as np
import numpy as np
# Pandas Library as 'pd'
import pandas as pd
# SciPy library (statistical tests)
import scipy

In [25]:
sdata = pd.read_csv('Sales_add.csv')
sdata.head(10)

| | Month | Region | Manager | `Sales_before_digital_add(in $)` | `Sales_After_digital_add(in $)` |
|---:|---|---|---|---:|---:|
| 0 | Month-1 | Region - A | Manager - A | 132921 | 270390 |
| 1 | Month-2 | Region - A | Manager - C | 149559 | 223334 |
| 2 | Month-3 | Region - B | Manager - A | 146278 | 244243 |
| 3 | Month-4 | Region - B | Manager - B | 152167 | 231808 |
| 4 | Month-5 | Region - C | Manager - B | 159525 | 258402 |
| 5 | Month-6 | Region - A | Manager - B | 137163 | 256948 |
| 6 | Month-7 | Region - C | Manager - C | 130625 | 222106 |
| 7 | Month-8 | Region - A | Manager - A | 131140 | 230637 |
| 8 | Month-9 | Region - B | Manager - C | 171259 | 226261 |
| 9 | Month-10 | Region - C | Manager - B | 141956 | 193735 |


In [47]:
sdata.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22 entries, 0 to 21
Data columns (total 5 columns):
 #   Column                          Non-Null Count  Dtype 
---  ------                          --------------  ----- 
 0   Month                           22 non-null     object
 1   Region                          22 non-null     object
 2   Manager                         22 non-null     object
 3   Sales_before_digital_add(in $)  22 non-null     int64 
 4   Sales_After_digital_add(in $)   22 non-null     int64 
dtypes: int64(2), object(3)
memory usage: 1008.0+ bytes


In [6]:
sdata.isna().sum().to_frame('Null value count')

| | Null value count |
|---|---:|
| `Month` | 0 |
| `Region` | 0 |
| `Manager` | 0 |
| `Sales_before_digital_add(in $)` | 0 |
| `Sales_After_digital_add(in $)` | 0 |


In [27]:
print("1. Unique values in 'Month' feature :\n",sdata.Month.unique(),"\n")
print("2. Unique values in 'Region' feature :\n",sdata.Region.unique(),"\n")
print("3. Unique values in 'Manager' feature :\n",sdata.Manager.unique())

1. Unique values in 'Month' feature :
 ['Month-1' 'Month-2' 'Month-3' 'Month-4' 'Month-5' 'Month-6' 'Month-7'
 'Month-8' 'Month-9' 'Month-10' 'Month-11' 'Month-12' 'Month-13'
 'Month-14' 'Month-15' 'Month-16' 'Month-17' 'Month-18' 'Month-19'
 'Month-20' 'Month-21' 'Month-22'] 

2. Unique values in 'Region' feature :
 ['Region - A' 'Region - B' 'Region - C'] 

3. Unique values in 'Manager' feature :
 ['Manager - A' 'Manager - C' 'Manager - B']


In [24]:
sdata.describe(exclude='int64')

| | Month | Region | Manager |
|---|---|---|---|
| count | 22 | 22 | 22 |
| unique | 22 | 3 | 3 |
| top | Month-1 | Region - A | Manager - A |
| freq | 1 | 10 | 9 |


In [46]:
sdata[['Region', 'Manager']].value_counts().to_frame('Count')

| Region | Manager | Count |
|---|---|---:|
| Region - A | Manager - A | 4 |
| Region - B | Manager - A | 4 |
| Region - A | Manager - B | 3 |
| Region - A | Manager - C | 3 |
| Region - C | Manager - B | 3 |
| Region - B | Manager - C | 2 |
| Region - B | Manager - B | 1 |
| Region - C | Manager - A | 1 |
| Region - C | Manager - C | 1 |


### Data summary :
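- There are no null values in the data.
- The numbers of unique entries in Month, Region and Manager are 22, 3 and 3 respectively.
- Every manager has reported sales in every region.
- Manager - A has the most records (9 of 22) and Region - A is the most frequent region (10 of 22); the most common Region/Manager pairs are Region - A with Manager - A and Region - B with Manager - A (4 records each).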
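The summary bullets can be checked directly from the Region/Manager pair counts in the `value_counts()` output above. A small standard-library sketch; the hard-coded dictionary is just those nine counts re-typed, not a reload of `Sales_add.csv`.

```python
from collections import Counter

# Region/Manager pair counts re-typed from the value_counts() output above (22 records in total)
pair_counts = {
    ("Region - A", "Manager - A"): 4,
    ("Region - B", "Manager - A"): 4,
    ("Region - A", "Manager - B"): 3,
    ("Region - A", "Manager - C"): 3,
    ("Region - C", "Manager - B"): 3,
    ("Region - B", "Manager - C"): 2,
    ("Region - B", "Manager - B"): 1,
    ("Region - C", "Manager - A"): 1,
    ("Region - C", "Manager - C"): 1,
}

# Marginal totals: how many records each region and each manager has
region_totals = Counter()
manager_totals = Counter()
for (region, manager), count in pair_counts.items():
    region_totals[region] += count
    manager_totals[manager] += count

regions = sorted(region_totals)
managers = sorted(manager_totals)
# Every manager appears in every region if all region x manager pairs have a nonzero count
every_pair_present = all(pair_counts.get((r, m), 0) > 0 for r in regions for m in managers)

print("records:", sum(pair_counts.values()))
print("region totals:", dict(region_totals))
print("manager totals:", dict(manager_totals))
print("every manager worked in every region:", every_pair_present)
```

The region totals (10, 7, 5) and manager totals (9, 7, 6) are also the row and column sums that the chi-square test in Analysis 02 uses to build its expected counts.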

## Analysis 01
<ins> The company wishes to clarify whether there is any increase in sales after stepping into digital marketing. </ins>

### Defining Hypothesis and choosing a testing method :
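*   The Null Hypothesis, ***H(0) : Sales did not increase after adopting digital marketing (mean of after - before ≤ 0).***
*   The Alternative Hypothesis, ***H(a) : Sales increased after adopting digital marketing (mean of after - before > 0).***
*   Significance level = 5%
*   Here, we are doing a **one-tailed paired t-test**, because each month's sales are recorded both before and after the change. `stats.ttest_rel` reports a two-tailed p-value by default; when the test statistic is positive, the one-tailed p-value is half of it.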
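The paired t statistic is the mean of the per-month differences divided by its standard error, t = mean(d) / (s_d / sqrt(n)), with n - 1 degrees of freedom. A minimal standard-library sketch, assuming only the 10 rows printed by `sdata.head(10)` above (the full CSV has 22 months, so these numbers will not match the notebook's result). The critical value 1.833 is the one-tailed 5% point of the t distribution with 9 degrees of freedom, taken from standard t tables and valid only for this sample size.

```python
import math
import statistics

# First 10 months only (rows printed by sdata.head(10)); the full dataset has 22 months
before = [132921, 149559, 146278, 152167, 159525, 137163, 130625, 131140, 171259, 141956]
after = [270390, 223334, 244243, 231808, 258402, 256948, 222106, 230637, 226261, 193735]

# Paired design: the test works on the per-month differences d = after - before
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
t_stat = mean_d / (sd_d / math.sqrt(n))
df = n - 1

# One-tailed 5% critical value for df = 9, from standard t tables (assumption: n == 10)
T_CRIT_DF9 = 1.833

print(f"mean difference = {mean_d:.1f}, t = {t_stat:.3f}, df = {df}")
print("Reject H(0)" if t_stat > T_CRIT_DF9 else "Fail to reject H(0)")
```

Because H(a) is directional, only a large positive t counts as evidence; a large negative t would not lead to rejecting H(0).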

In [54]:
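from scipy import stats
# Paired t-test: each row holds the same month's sales before and after digital ads
t_stat, pval = stats.ttest_rel(sdata['Sales_After_digital_add(in $)'], sdata['Sales_before_digital_add(in $)'])
print('Test statistic is:', t_stat)
print('P-value for Two tailed test is: %0.15f' % pval)
# One-tailed decision for H(a): after > before, so halve the two-tailed p-value and require t > 0
if t_stat > 0 and pval / 2 <= 0.05:
    print("We reject null hypothesis")
else:
    print("We fail to reject null hypothesis")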

Test statistic is: 12.09070525287017
P-value for Two tailed test is: 0.000000000063367
We reject null hypothesis
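`stats.ttest_rel` returns a two-tailed p-value by default. Since SciPy 1.6 it also accepts `alternative="greater"`, which returns the one-tailed p-value for H(a): after > before directly. A minimal sketch, again assuming only the 10 rows printed by `sdata.head(10)` rather than the full 22-month CSV:

```python
from scipy import stats

# First 10 months only (rows printed by sdata.head(10)); not the full 22-month dataset
before = [132921, 149559, 146278, 152167, 159525, 137163, 130625, 131140, 171259, 141956]
after = [270390, 223334, 244243, 231808, 258402, 256948, 222106, 230637, 226261, 193735]

# Default: two-tailed paired t-test
two_tailed = stats.ttest_rel(after, before)
# One-tailed paired t-test for H(a): mean(after - before) > 0 (needs SciPy >= 1.6)
one_tailed = stats.ttest_rel(after, before, alternative="greater")

print("t statistic:", one_tailed.statistic)
print("two-tailed p:", two_tailed.pvalue)
print("one-tailed p:", one_tailed.pvalue)

alpha = 0.05
if one_tailed.pvalue <= alpha:
    print("Reject H(0): sales increased after digital marketing")
else:
    print("Fail to reject H(0)")
```

With a positive statistic, the one-tailed p-value is exactly half the two-tailed one, so passing `alternative="greater"` gives the same decision as halving by hand while making the direction of the test explicit.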


### Conclusion :
Since the **P-Value is less than the level of significance** (the two-tailed p-value is about 6.3 × 10⁻¹¹, and the one-tailed p-value is half of that), we reject the null hypothesis. Thus, we conclude that **sales increased significantly after the company moved to digital marketing**.

## Analysis 02
<ins> The company needs to check whether there is any dependency between the features “**Region**” and “**Manager**”. </ins>

### Defining Hypothesis and choosing a testing method :
*   The Null Hypothesis, ***H(0) : There is no dependency between the features "Region" & "Manager".***
*   The Alternative Hypothesis, ***H(a) : There is a dependency between the features "Region" & "Manager".***
*   Significance level = 5%
*   Here, we are doing a **chi-square test of independence (chosen because both features are categorical)**

In [73]:
crosstab_sdata = pd.crosstab(sdata.Region, sdata.Manager)
crosstab_sdata

| Region / Manager | Manager - A | Manager - B | Manager - C |
|---|---:|---:|---:|
| Region - A | 4 | 3 | 3 |
| Region - B | 4 | 1 | 2 |
| Region - C | 1 | 3 | 1 |


In [72]:
chi2, p_value, dof, expected = stats.chi2_contingency(crosstab_sdata)
print('Statistic Value:', chi2, '\nP Value:', p_value, '\nDegree of Freedom :', dof, '\nExpected Value matrix:\n', expected)

Statistic Value: 3.050566893424036 
P Value: 0.5493991051158094 
Degree of Freedom : 4 
Expected Value matrix:
 [[4.09090909 3.18181818 2.72727273]
 [2.86363636 2.22727273 1.90909091]
 [2.04545455 1.59090909 1.36363636]]
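To see where these numbers come from, the statistic can be rebuilt by hand from the observed counts in the crosstab above: each expected count is E = row total × column total / n, and the statistic is the sum of (O - E)² / E over all cells, with (rows - 1)(columns - 1) degrees of freedom. A standard-library sketch; the helper `chi2_sf_even_dof` is hypothetical and uses the closed-form chi-square tail probability, which holds only for an even number of degrees of freedom (here 4).

```python
import math

# Observed Region x Manager counts re-typed from the crosstab above
# rows: Region - A, B, C; columns: Manager - A, B, C
observed = [
    [4, 3, 3],
    [4, 1, 2],
    [1, 3, 1],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: E_ij = row_i * col_j / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson statistic: sum of (O - E)^2 / E over all cells
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_sf_even_dof(x, k):
    """Hypothetical helper: P(X > x) for a chi-square variable with even k degrees of freedom."""
    if k % 2:
        raise ValueError("closed form below is only valid for even degrees of freedom")
    # For even k: exp(-x/2) * sum_{i=0}^{k/2 - 1} (x/2)^i / i!
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k // 2))


p_value = chi2_sf_even_dof(chi2, dof)
print(f"chi2 = {chi2:.6f}, dof = {dof}, p = {p_value:.6f}")
# The chi-square approximation is usually trusted only when expected counts are at least 5
print("smallest expected count:", min(min(row) for row in expected))
```

This reproduces the statistic, degrees of freedom, p-value and expected matrix printed above. It also makes visible that every expected count is below 5, which is worth keeping in mind when interpreting the result.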


### Conclusion :
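Since the **P-Value (about 0.549) is greater than the level of significance**, we fail to reject the null hypothesis. Thus, we conclude that **the data show no significant dependency between Region & Manager**. Note that every expected count is below 5, so the chi-square approximation is rough for a sample of only 22 records.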