# Case Study on Testing of Hypothesis


## Ques. A company has started investing in digital marketing as a new way of promoting its products. It collected data and decided to carry out a study on it.

## ● The company wishes to determine whether sales increased after it stepped into digital marketing.

## ● The company needs to check whether there is any dependency between the features “Region” and “Manager”.

In [1]:
import numpy as np                      #importing numpy library
import pandas as pd                     #importing pandas library

In [2]:
data=pd.read_csv(r"C:\Users\Amby\Downloads\Sales_add.csv")        # loading the dataset

In [3]:
data.head()                                                       # table shows the first 5 rows 

,Month,Region,Manager,Sales_before_digital_add(in $),Sales_After_digital_add(in $)
0,Month-1,Region - A,Manager - A,132921,270390
1,Month-2,Region - A,Manager - C,149559,223334
2,Month-3,Region - B,Manager - A,146278,244243
3,Month-4,Region - B,Manager - B,152167,231808
4,Month-5,Region - C,Manager - B,159525,258402


In [4]:
data.shape                                        # finding the total number of rows and columns in the dataset

(22, 5)

In [5]:
data.info()                                       # gives brief information about the dataset

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22 entries, 0 to 21
Data columns (total 5 columns):
 #   Column                          Non-Null Count  Dtype 
---  ------                          --------------  ----- 
 0   Month                           22 non-null     object
 1   Region                          22 non-null     object
 2   Manager                         22 non-null     object
 3   Sales_before_digital_add(in $)  22 non-null     int64 
 4   Sales_After_digital_add(in $)   22 non-null     int64 
dtypes: int64(2), object(3)
memory usage: 1008.0+ bytes


In [6]:
data.isna().sum()                               # counts the null values in each column of the dataset

Month                             0
Region                            0
Manager                           0
Sales_before_digital_add(in $)    0
Sales_After_digital_add(in $)     0
dtype: int64

# Case-1

## Q.1. To see whether there is any increase in sales after stepping into digital marketing.

### Null Hypothesis : There is no increase in sales after implementing digital marketing.


### Alternate Hypothesis : There is an increase in sales after implementing digital marketing. 

In [7]:
import statistics as stat                   #importing statistics library

In [20]:
from scipy.stats import ttest_rel           # importing the ttest_rel function from scipy.stats
                                             # ttest_rel performs the paired t-test

In [9]:
t_stat,p_value=ttest_rel(data['Sales_before_digital_add(in $)'],data['Sales_After_digital_add(in $)']) # paired t-test on (before - after); a negative statistic means sales after are higher

In [23]:
t_stat                                      # t-statistic

-12.09070525287017

In [22]:
p_value                                     # probability value

6.336667004575778e-11
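The statistic returned by `ttest_rel` is the mean of the paired differences divided by its standard error. As a minimal sketch, the calculation below reproduces that formula by hand using only the five rows shown by `data.head()` (an assumption made to keep the example self-contained), so its value will differ from the full 22-row result above. The variable names are illustrative.

```python
import math
import statistics as stat

# First five rows from data.head(); the full dataset has 22 rows
before = [132921, 149559, 146278, 152167, 159525]
after = [270390, 223334, 244243, 231808, 258402]

# Paired differences in the same order ttest_rel uses: before - after
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

mean_diff = stat.mean(diffs)      # average change per month
sd_diff = stat.stdev(diffs)       # sample standard deviation (n - 1 denominator)
std_error = sd_diff / math.sqrt(n)

# Paired t-statistic with n - 1 degrees of freedom
t_manual = mean_diff / std_error
print(t_manual)
```

A negative value here has the same meaning as in the notebook: sales after digital marketing are higher than sales before.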

In [21]:
if p_value<0.05:                            # Here, the significance level is taken as 0.05
    print("Null Hypothesis Rejected")
else:
    print("Failed to Reject Null Hypothesis")

Null Hypothesis Rejected


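### Based on the paired t-test, the null hypothesis is rejected at the 0.05 significance level (p ≈ 6.34e-11). The t-statistic is negative (-12.09) because the test subtracts sales after from sales before, so sales after digital marketing are higher on average. `ttest_rel` reports a two-sided p-value by default; since the statistic points in the hypothesized direction, the one-sided p-value is half of it and leads to the same decision. There is a statistically significant increase in sales after implementing digital marketing, which is consistent with the investment having had a positive effect on sales performance.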
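Because the alternate hypothesis is directional (sales increased), a one-sided test matches it more closely than the default two-sided one. The sketch below assumes SciPy 1.6 or later, where `ttest_rel` accepts the `alternative` argument, and uses only the five rows shown by `data.head()`, so the numbers are not the full-dataset results.

```python
from scipy.stats import ttest_rel

# First five rows from data.head(); the full dataset has 22 rows
before = [132921, 149559, 146278, 152167, 159525]
after = [270390, 223334, 244243, 231808, 258402]

# Default two-sided test, as used in the notebook
t_two, p_two = ttest_rel(before, after)

# One-sided test: the alternative is that mean(before) is less than mean(after)
t_one, p_one = ttest_rel(before, after, alternative='less')

print(t_two, p_two)
print(t_one, p_one)
```

The t-statistic is the same in both calls; when it is negative, the one-sided p-value is half of the two-sided one.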

# Case-2

## Q.2. To check whether there is any dependency between the features “Region” and “Manager”.

### Null Hypothesis : There is no dependency between the features "Region" and "Manager."

### Alternate Hypothesis : There is a dependency between the features "Region" and "Manager".

In [13]:
from scipy.stats import chi2_contingency                 # importing the chi2_contingency function from scipy.stats
                                                         # chi2_contingency performs the chi-square test of independence

In [14]:
contingency_table=pd.crosstab(data['Region'],data['Manager'])  #creating contingency_table using crosstab

In [15]:
contingency_table

Region,Manager - A,Manager - B,Manager - C
Region - A,4,3,3
Region - B,4,1,2
Region - C,1,3,1


In [16]:
chi_stat,p_val,dof,exp_val=chi2_contingency(contingency_table)  # contingency_table as parameter to the function

In [17]:
print(chi_stat,p_val,dof,exp_val)

3.050566893424036 0.5493991051158094 4 [[4.09090909 3.18181818 2.72727273]
 [2.86363636 2.22727273 1.90909091]
 [2.04545455 1.59090909 1.36363636]]
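The values printed above can be reproduced from the contingency table directly: each expected count is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells. The following is a minimal sketch of that arithmetic with NumPy; the variable names are illustrative.

```python
import numpy as np

# Observed counts from the contingency table (rows: Region A-C, columns: Manager A-C)
observed = np.array([[4, 3, 3],
                     [4, 1, 2],
                     [1, 3, 1]])

row_totals = observed.sum(axis=1, keepdims=True)   # shape (3, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 3)
n = observed.sum()                                 # grand total (22)

# Expected counts under independence
expected = row_totals * col_totals / n

# Pearson chi-square statistic and degrees of freedom
chi_manual = ((observed - expected) ** 2 / expected).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(chi_manual, dof_manual)
print(expected)
```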


In [24]:
p_val

0.5493991051158094

In [25]:
if p_val<0.05:                # Here,the significance level is taken as 0.05
    print("Null Hypothesis Rejected")
else:
    print("Failed to Reject Null Hypothesis")

Failed to Reject Null Hypothesis


### Based on the chi-square test of independence, the null hypothesis is not rejected at the 0.05 significance level (p ≈ 0.549). The data provide no significant evidence of a dependency between the features "Region" and "Manager". Every expected count in the table is below 5, so the chi-square approximation may be unreliable for this sample of 22 rows and the result should be read with caution.
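A common rule of thumb, added here as an assumption rather than as part of the original analysis, is that the chi-square approximation works best when expected counts are roughly 5 or more. The sketch below counts how many cells of the expected table fall under that threshold, using the observed counts from the contingency table above.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from the contingency table (rows: Region A-C, columns: Manager A-C)
observed = np.array([[4, 3, 3],
                     [4, 1, 2],
                     [1, 3, 1]])

chi_stat, p_val, dof, expected = chi2_contingency(observed)

# Count the cells whose expected frequency is below the rule-of-thumb threshold
threshold = 5
small_cells = int((expected < threshold).sum())
share_small = small_cells / expected.size

print(f"{small_cells} of {expected.size} expected counts are below {threshold}")
```

When most cells fall under the threshold, as they do here, the p-value is better treated as approximate.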