# TESTING OF HYPOTHESIS

A company has started investing in digital marketing as a new way to promote its
products. To evaluate this, it collected data and decided to carry out a study on it.

● The company wishes to clarify whether there is any increase in sales after
stepping into digital marketing.

● The company needs to check whether there is any dependency between the
features “Region” and “Manager”.

Help the company carry out its study using the data provided.

# Importing Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
from scipy.stats import chi2_contingency 

# Read the dataset

In [2]:
data=pd.read_csv('Sales_add.csv')

# Printing the dataset

In [3]:
data

Unnamed: 0,Month,Region,Manager,Sales_before_digital_add(in $),Sales_After_digital_add(in $)
0,Month-1,Region - A,Manager - A,132921,270390
1,Month-2,Region - A,Manager - C,149559,223334
2,Month-3,Region - B,Manager - A,146278,244243
3,Month-4,Region - B,Manager - B,152167,231808
4,Month-5,Region - C,Manager - B,159525,258402
5,Month-6,Region - A,Manager - B,137163,256948
6,Month-7,Region - C,Manager - C,130625,222106
7,Month-8,Region - A,Manager - A,131140,230637
8,Month-9,Region - B,Manager - C,171259,226261
9,Month-10,Region - C,Manager - B,141956,193735


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22 entries, 0 to 21
Data columns (total 5 columns):
 #   Column                          Non-Null Count  Dtype 
---  ------                          --------------  ----- 
 0   Month                           22 non-null     object
 1   Region                          22 non-null     object
 2   Manager                         22 non-null     object
 3   Sales_before_digital_add(in $)  22 non-null     int64 
 4   Sales_After_digital_add(in $)   22 non-null     int64 
dtypes: int64(2), object(3)
memory usage: 1008.0+ bytes


In [5]:
data.isna().sum()

Month                             0
Region                            0
Manager                           0
Sales_before_digital_add(in $)    0
Sales_After_digital_add(in $)     0
dtype: int64

In [6]:
data.describe()

Unnamed: 0,Sales_before_digital_add(in $),Sales_After_digital_add(in $)
count,22.0,22.0
mean,149239.954545,231123.727273
std,14844.042921,25556.777061
min,130263.0,187305.0
25%,138087.75,214960.75
50%,147444.0,229986.5
75%,157627.5,250909.0
max,178939.0,276279.0


Insights:
    
    * The dataset is small, with only 22 rows.
    * No null values are present in the data.
    * The difference between the average sales before and after digital ads is 81883.772728 (in $).
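
The gap quoted in the last insight can be reproduced from the means in the `data.describe()` output. This is a minimal sketch: the variable names are illustrative, and the relative increase is an extra figure that the notebook itself does not report.

```python
# Means copied from the data.describe() output above.
mean_before = 149239.954545
mean_after = 231123.727273

# Absolute difference in average monthly sales (in $).
mean_diff = mean_after - mean_before

# Relative increase over the pre-campaign average, as a percentage.
pct_increase = 100 * mean_diff / mean_before

print(f"Difference in mean sales: {mean_diff:.6f}")
print(f"Relative increase: {pct_increase:.2f}%")
```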

# ● The company wishes to clarify whether there is any increase in sales after stepping into digital marketing

# One Tailed Paired Sample T-Test

Hypothesis:
    
    * Ho: The mean sales before and after stepping into digital marketing are the same.
    
    * Ha: The mean sales after stepping into digital marketing are greater than before.
    
    * alpha = 0.05
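
To make the test concrete, the paired t statistic is the mean of the per-month differences divided by its standard error, with n - 1 degrees of freedom. The sketch below computes it by hand and checks it against `stats.ttest_rel`. It assumes only the ten rows printed earlier in the notebook (the full file has 22), so its numbers will not match the full-data result.

```python
import numpy as np
from scipy import stats

# First ten rows as printed above; the full Sales_add.csv has 22 rows,
# so these values are an illustrative subset only.
before = np.array([132921, 149559, 146278, 152167, 159525,
                   137163, 130625, 131140, 171259, 141956])
after = np.array([270390, 223334, 244243, 231808, 258402,
                  256948, 222106, 230637, 226261, 193735])

# Paired test: work on the per-month differences.
diff = after - before
n = diff.size

# t = mean(d) / (sd(d) / sqrt(n)), using the sample standard deviation.
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))

# One-tailed (greater) p value from the t distribution with n - 1 dof.
p_manual = stats.t.sf(t_manual, df=n - 1)

# Same test through SciPy, as used in the notebook.
t_scipy, p_scipy = stats.ttest_rel(after, before, alternative='greater')

print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```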
   

In [9]:
sales_BDA=data['Sales_before_digital_add(in $)']
sales_ADA=data['Sales_After_digital_add(in $)']

In [10]:
tvalue, pvalue=stats.ttest_rel(sales_ADA,sales_BDA,alternative='greater')

In [11]:
print("The t value of means is: ", round(tvalue, 4))
print("The p value based on the t score is: ", round(pvalue,4))

The t value of means is:  12.0907
The p value based on the t score is:  0.0


In [13]:
stats.ttest_rel(sales_ADA,sales_BDA,alternative='greater')

Ttest_relResult(statistic=12.09070525287017, pvalue=3.168333502287889e-11)

* The t statistic is 12.09.

* The p value is about 3.17e-11 (it prints as 0.0 only because it was rounded to four decimals), which is less than alpha (0.05). So one can reject the null hypothesis in favor of the alternative.

* Therefore, the sales after stepping into digital marketing are greater than before.
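
The printed `0.0` comes from rounding, not from the test itself. A small sketch of the difference, using the p value copied from the `Ttest_relResult` output; scientific notation is one way to keep very small values visible.

```python
# p value copied from the Ttest_relResult output above.
pvalue = 3.168333502287889e-11
alpha = 0.05

# Rounding to four decimals collapses the value to zero.
rounded = round(pvalue, 4)

# Scientific notation keeps the order of magnitude.
scientific = f"{pvalue:.3e}"

print(rounded)         # 0.0
print(scientific)      # 3.168e-11
print(pvalue < alpha)  # True
```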

# ● The company needs to check whether there is any dependency between the features “Region” and “Manager”.


Hypothesis:

* Ho: The two features are independent.

* Ha: The two features are dependent.

* alpha = 0.05

In [14]:
# Total pre-campaign sales for each Region/Manager pair, with 'All' margins
data1 = pd.crosstab(data.Region, data.Manager, margins=True,
                    values=data['Sales_before_digital_add(in $)'], aggfunc='sum')

In [15]:
data1

Manager,Manager - A,Manager - B,Manager - C,All
Region,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Region - A,594514,466851,420684,1482049
Region - B,570900,152167,330598,1053665
Region - C,147463,469477,130625,747565
All,1312877,1088495,881907,3283279


In [16]:
# Keep the 3x3 Region x Manager block and drop the 'All' margins
valueBDA = data1.iloc[0:3, 0:3].to_numpy()
stat, p, dof, expected = chi2_contingency(valueBDA)

In [17]:
print(chi2_contingency(valueBDA)[0:3])
alpha = 0.05
print("p value is " + str(p))

(474356.9740349385, 0.0, 4)
p value is 0.0


In [27]:
if p <= alpha:
    print('Dependent (reject Ho)')
else:
    print('Independent (Ho holds true)')

Dependent (reject Ho)
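
The statistic reported above can be reproduced by hand, which shows what `chi2_contingency` does internally. The expected value of each cell under independence is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells. A minimal sketch, with the table values copied from the `data1` output; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

# 3x3 block copied from the data1 output above (margins excluded).
observed = np.array([[594514, 466851, 420684],
                     [570900, 152167, 330598],
                     [147463, 469477, 130625]])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()

# Expected cell values if rows and columns were independent.
expected_manual = row_totals * col_totals / grand_total

# Pearson chi-square statistic and its degrees of freedom.
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# SciPy result for comparison (no continuity correction when dof > 1).
stat, p, dof, expected = chi2_contingency(observed)

print(chi2_manual, dof_manual)
print(stat, dof)
```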


In [19]:
# Total post-campaign sales for each Region/Manager pair, with 'All' margins
data2 = pd.crosstab(data.Region, data.Manager, margins=True,
                    values=data['Sales_After_digital_add(in $)'], aggfunc='sum')

In [20]:
data2

Manager,Manager - A,Manager - B,Manager - C,All
Region,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Region - A,1030437,656832,701262,2388531
Region - B,939851,231808,429436,1601095
Region - C,229336,643654,222106,1095096
All,2199624,1532294,1352804,5084722


In [21]:
# Keep the 3x3 Region x Manager block and drop the 'All' margins
valueADA = data2.iloc[0:3, 0:3].to_numpy()
stat, p, dof, expected = chi2_contingency(valueADA)

In [23]:
print(chi2_contingency(valueADA)[0:3])
alpha = 0.05
print("p value is " + str(p))

(671476.3637089065, 0.0, 4)
p value is 0.0


In [28]:
if p <= alpha:
    print('Dependent (reject Ho)')
else:
    print('Independent (Ho holds true)')

Dependent (reject Ho)


Before and after stepping into digital marketing:

 * The degrees of freedom are 4 in both cases.
    
 * The chi-square statistic is 474356.9740349385 before and 671476.3637089065 after.

 * The p value prints as 0.0 in both cases (it is too small to be represented as a floating-point number), which is less than alpha (0.05). So one can reject the null hypothesis in favor of the alternative.

 * Therefore, the features "Region" and "Manager" are dependent.
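
One caveat about the tables used above: the chi-square test of independence is defined for frequency counts, whereas `data1` and `data2` hold summed sales in dollars, so the size of the statistic depends on the currency unit. A possible cross-check is to run the same test on a table of counts, meaning how many months each Region/Manager pair occurs. The sketch below assumes only the ten rows printed at the top of the notebook (the full file has 22), so its result is illustrative and is not a finding about the full dataset. With so few rows the expected counts are small and the chi-square approximation is rough.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Region/Manager labels for the ten rows printed earlier in the notebook.
# The full Sales_add.csv has 22 rows, so real results will differ.
rows = pd.DataFrame({
    "Region": ["Region - A", "Region - A", "Region - B", "Region - B",
               "Region - C", "Region - A", "Region - C", "Region - A",
               "Region - B", "Region - C"],
    "Manager": ["Manager - A", "Manager - C", "Manager - A", "Manager - B",
                "Manager - B", "Manager - B", "Manager - C", "Manager - A",
                "Manager - C", "Manager - B"],
})

# Contingency table of counts (no values/aggfunc, so cells are frequencies).
counts = pd.crosstab(rows["Region"], rows["Manager"])

stat, p, dof, expected = chi2_contingency(counts)

alpha = 0.05
print(counts)
print("chi2:", stat, "p:", p, "dof:", dof)
print("Dependent (reject Ho)" if p <= alpha else "Independent (fail to reject Ho)")
```

On the full dataset the same call would be `pd.crosstab(data.Region, data.Manager)`, using the column names already present in the notebook.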

 