# Case Study on Testing of Hypothesis

### QUESTION: A company started to invest in digital marketing as a new way of promoting its products. To evaluate this, they collected data and decided to carry out a study on it.

In [1]:
#Importing libraries
import pandas as pd
import scipy.stats as stat

In [2]:
#Reading the dataset
data=pd.read_csv('Sales_add.csv')
data.head()

Unnamed: 0,Month,Region,Manager,Sales_before_digital_add(in $),Sales_After_digital_add(in $)
0,Month-1,Region - A,Manager - A,132921,270390
1,Month-2,Region - A,Manager - C,149559,223334
2,Month-3,Region - B,Manager - A,146278,244243
3,Month-4,Region - B,Manager - B,152167,231808
4,Month-5,Region - C,Manager - B,159525,258402


In [3]:
#Checking the total number of rows and columns in our dataset
data.shape

(22, 5)

In [4]:
#Checking whether any null values are present
data.isnull().sum()

Month                             0
Region                            0
Manager                           0
Sales_before_digital_add(in $)    0
Sales_After_digital_add(in $)     0
dtype: int64

In [5]:
#Getting the details of the dataset using info()
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22 entries, 0 to 21
Data columns (total 5 columns):
 #   Column                          Non-Null Count  Dtype 
---  ------                          --------------  ----- 
 0   Month                           22 non-null     object
 1   Region                          22 non-null     object
 2   Manager                         22 non-null     object
 3   Sales_before_digital_add(in $)  22 non-null     int64 
 4   Sales_After_digital_add(in $)   22 non-null     int64 
dtypes: int64(2), object(3)
memory usage: 1008.0+ bytes


# 1. The company wishes to clarify whether there is any increase in sales after stepping into digital marketing.


In [6]:
#Getting the statistical values for our dataset
data.describe()

Unnamed: 0,Sales_before_digital_add(in $),Sales_After_digital_add(in $)
count,22.0,22.0
mean,149239.954545,231123.727273
std,14844.042921,25556.777061
min,130263.0,187305.0
25%,138087.75,214960.75
50%,147444.0,229986.5
75%,157627.5,250909.0
max,178939.0,276279.0


In [7]:
#Sum of sales based on regions before and after stepping into digital marketing
data1=data.groupby('Region',as_index=False).sum()
data1

Unnamed: 0,Region,Sales_before_digital_add(in $),Sales_After_digital_add(in $)
0,Region - A,1482049,2388531
1,Region - B,1053665,1601095
2,Region - C,747565,1095096


In [8]:
#The overall means and the regional totals in the two tables above give a first idea of how sales changed.
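To make the comparison between the two tables concrete, the regional totals can be turned into percentage changes. The sketch below hard-codes the totals from the groupby output above; the short column names `before`, `after` and `pct_change` are illustrative and do not appear in the original dataset. This is a descriptive summary only and does not replace the hypothesis test that follows.

```python
import pandas as pd

# Regional totals copied from the groupby output above
# (short column names are illustrative, not from the dataset).
totals = pd.DataFrame(
    {
        "Region": ["Region - A", "Region - B", "Region - C"],
        "before": [1482049, 1053665, 747565],
        "after": [2388531, 1601095, 1095096],
    }
)

# Relative change in total sales per region, in percent
totals["pct_change"] = (totals["after"] - totals["before"]) / totals["before"] * 100
print(totals.round(2))
```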

### HYPOTHESIS TESTING 

In [9]:
#The sample size is less than 30 (n = 22), so we use a t-test here.

Taking "H0 : NULL HYPOTHESIS" AND "H1 : ALTERNATIVE HYPOTHESIS"

### H0: There is no significant increase in sales after stepping into digital marketing


### H1: There is a significant increase in sales after stepping into digital marketing

In [10]:
#Using paired sample t-test to find the P Value.
#Doing t-test on two related samples of data.
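A paired t-test works on the per-row differences d = after - before, with test statistic t = mean(d) / (sd(d) / sqrt(n)) and n - 1 degrees of freedom. The sketch below checks that formula against `scipy.stats.ttest_rel`, using only the five rows shown in `data.head()`. It illustrates the mechanics on an excerpt and is not a re-run of the full 22-row analysis.

```python
import numpy as np
import scipy.stats as stat

# First five rows from data.head() above (an excerpt, not the full dataset)
before = np.array([132921, 149559, 146278, 152167, 159525])
after = np.array([270390, 223334, 244243, 231808, 258402])

# Paired differences and the t statistic computed by hand
d = after - before
n = len(d)
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stat.t.sf(abs(t_manual), df=n - 1)

# Same test via SciPy; the argument order (after, before) matches d above
t_scipy, p_scipy = stat.ttest_rel(after, before)
print(t_manual, p_manual)
print(t_scipy, p_scipy)
```

Swapping the argument order, as in the notebook's own call, flips the sign of t but leaves the two-sided p-value unchanged.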

In [11]:
ttest,p_val=stat.ttest_rel(data['Sales_before_digital_add(in $)'],data['Sales_After_digital_add(in $)'])
print(p_val)

6.336667004575778e-11


In [12]:
#We take the significance level alpha = .05

In [13]:
if p_val<0.05:
    print(" Since P value",p_val,"is less than .05(alpha value), we can reject the null hypothesis")
else:
    print(" Since P value",p_val,"is greater than .05(alpha value), we cannot reject the null hypothesis")

 Since P value 6.336667004575778e-11 is less than .05(alpha value), we can reject the null hypothesis


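### Here we reject the null hypothesis. Since the mean sales after digital marketing (231123.73) are also higher than before (149239.95), we can conclude that there is a significant increase in sales after stepping into digital marketing.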
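Because H1 is directional (an increase), a one-sided paired t-test matches the hypotheses more directly than the default two-sided test used by `ttest_rel`. A minimal sketch, assuming SciPy 1.6 or later (where `ttest_rel` accepts the `alternative` argument) and using only the five rows shown in `data.head()`:

```python
import numpy as np
import scipy.stats as stat

# First five rows from data.head() above (an excerpt, not the full dataset)
before = np.array([132921, 149559, 146278, 152167, 159525])
after = np.array([270390, 223334, 244243, 231808, 258402])

# One-sided test, H1: mean(after - before) > 0
t_one, p_one = stat.ttest_rel(after, before, alternative="greater")

# Default two-sided test for comparison
t_two, p_two = stat.ttest_rel(after, before)
print(p_one, p_two)
```

When the observed t statistic is positive, the one-sided p-value is half the two-sided one, so on the full dataset the decision to reject H0 would not change.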

# 2. The company needs to check whether there is any dependency between the features "Region" and "Manager".

In [14]:
#value_counts() returns the count of each unique value
data['Region'].value_counts()

Region - A    10
Region - B     7
Region - C     5
Name: Region, dtype: int64

In [15]:
data['Manager'].value_counts()

Manager - A    9
Manager - B    7
Manager - C    6
Name: Manager, dtype: int64

Taking "H0 : NULL HYPOTHESIS" AND "H1 : ALTERNATIVE HYPOTHESIS"

### H0: There is no dependency between Region & Manager


### H1: There is a dependency between Region & Manager

In [16]:
#Contingency table between region and manager.
pd.crosstab(data.Region, data.Manager)

Region,Manager - A,Manager - B,Manager - C
Region - A,4,3,3
Region - B,4,1,2
Region - C,1,3,1


In [17]:
#Importing chi2_contingency from scipy library
from scipy.stats import chi2_contingency

In [18]:
#Finding the chi-square statistic (test_stat), p-value (p_value), degrees of freedom (df) and expected cell counts (exp)
test_stat,p_value,df,exp= chi2_contingency(pd.crosstab(data.Region, data.Manager))

In [19]:
#Printing the chi-square statistic (test_stat), p-value (p_value), degrees of freedom (df) and expected cell counts (exp)

In [20]:
print(test_stat)

3.050566893424036


In [21]:
print(p_value)

0.5493991051158094


In [22]:
print(df)

4


In [23]:
print(exp)

[[4.09090909 3.18181818 2.72727273]
 [2.86363636 2.22727273 1.90909091]
 [2.04545455 1.59090909 1.36363636]]
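Under independence, each expected count is (row total × column total) / grand total, and the chi-square statistic sums (observed - expected)² / expected over all cells. The sketch below recomputes both from the contingency table above and compares them with `chi2_contingency`. Note that every expected count here is below 5, a situation in which the chi-square approximation is commonly considered less reliable, so the result is best read with some caution.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from the Region x Manager contingency table above
observed = np.array([[4, 3, 3],
                     [4, 1, 2],
                     [1, 3, 1]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (3, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 3)
n = observed.sum()

# Expected counts under independence and the chi-square statistic by hand
expected = row_totals * col_totals / n
chi_sq = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Cross-check against SciPy
result = chi2_contingency(observed)
print(expected)
print(chi_sq, result[0], dof)
print((expected < 5).all())
```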


In [24]:
#We take the significance level alpha = .05

In [25]:
if p_value<0.05:
    print(" Since P value",p_value,"is less than .05(alpha value), we can reject the null hypothesis")
else:
    print(" Since P value",p_value,"is greater than .05(alpha value), we cannot reject the null hypothesis")

 Since P value 0.5493991051158094 is greater than .05(alpha value), we cannot reject the null hypothesis


#### The calculated p-value (0.5493991051158094) is greater than the alpha value of .05, so we fail to reject the null hypothesis: the data give no evidence of a dependency between Region and Manager. Equivalently, the calculated chi-square value (3.0506) is less than the chi-square table value of 9.488 (alpha = 0.05, degrees of freedom = 4), which leads to the same decision.
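The table value quoted above can be reproduced with `scipy.stats.chi2` rather than looked up. A short sketch using the statistic and degrees of freedom printed earlier:

```python
from scipy.stats import chi2

alpha = 0.05
dof = 4
chi_sq = 3.050566893424036  # statistic printed above

# Critical value: the point with upper-tail probability alpha
critical = chi2.ppf(1 - alpha, dof)

# p-value: upper-tail probability of the observed statistic
p_value = chi2.sf(chi_sq, dof)

print(round(critical, 3), p_value)
print(chi_sq < critical)
```

Comparing the statistic with the critical value and comparing the p-value with alpha are two views of the same decision rule, so they always agree.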