# Case Study on Testing of Hypothesis

## Question

A company has started investing in digital marketing as a new way of promoting
its products. To evaluate this, it collected data and decided to carry out a study on it.

* **The company wishes to clarify whether there is any increase in sales after stepping into digital marketing.**
* **The company needs to check whether there is any dependency between the features “Region” and “Manager”.**

Help the company carry out the study using the data provided.

## Answer

In [2]:
# importing libraries 
import numpy as np
import pandas as pd
import scipy
from scipy import stats

In [3]:
# reading dataset to python environment
data=pd.read_csv('Sales_add.csv')
data

|   | Month | Region | Manager | Sales_before_digital_add(in $) | Sales_After_digital_add(in $) |
|---|---|---|---|---|---|
| 0 | Month-1 | Region - A | Manager - A | 132921 | 270390 |
| 1 | Month-2 | Region - A | Manager - C | 149559 | 223334 |
| 2 | Month-3 | Region - B | Manager - A | 146278 | 244243 |
| 3 | Month-4 | Region - B | Manager - B | 152167 | 231808 |
| 4 | Month-5 | Region - C | Manager - B | 159525 | 258402 |
| 5 | Month-6 | Region - A | Manager - B | 137163 | 256948 |
| 6 | Month-7 | Region - C | Manager - C | 130625 | 222106 |
| 7 | Month-8 | Region - A | Manager - A | 131140 | 230637 |
| 8 | Month-9 | Region - B | Manager - C | 171259 | 226261 |
| 9 | Month-10 | Region - C | Manager - B | 141956 | 193735 |


In [21]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22 entries, 0 to 21
Data columns (total 5 columns):
 #   Column                          Non-Null Count  Dtype 
---  ------                          --------------  ----- 
 0   Month                           22 non-null     object
 1   Region                          22 non-null     object
 2   Manager                         22 non-null     object
 3   Sales_before_digital_add(in $)  22 non-null     int64 
 4   Sales_After_digital_add(in $)   22 non-null     int64 
dtypes: int64(2), object(3)
memory usage: 1008.0+ bytes


In [7]:
# checking for null values
data.isna().sum()

Month                             0
Region                            0
Manager                           0
Sales_before_digital_add(in $)    0
Sales_After_digital_add(in $)     0
dtype: int64

In [None]:
# no null values present

In [9]:
data.shape

(22, 5)

In [10]:
data.nunique()

Month                             22
Region                             3
Manager                            3
Sales_before_digital_add(in $)    22
Sales_After_digital_add(in $)     22
dtype: int64

In [20]:
data.describe()

|   | Sales_before_digital_add(in $) | Sales_After_digital_add(in $) |
|---|---|---|
| count | 22.0 | 22.0 |
| mean | 149239.954545 | 231123.727273 |
| std | 14844.042921 | 25556.777061 |
| min | 130263.0 | 187305.0 |
| 25% | 138087.75 | 214960.75 |
| 50% | 147444.0 | 229986.5 |
| 75% | 157627.5 | 250909.0 |
| max | 178939.0 | 276279.0 |


**This data set shows the sales of a company as reported by different managers in different regions.
It has 5 features: 3 are of object datatype and the other 2 are 64-bit integers.**

**Sales in 3 regions (A, B, C) are reported by 3 managers (A, B, C).**

In [22]:
data['Region'].value_counts()

Region - A    10
Region - B     7
Region - C     5
Name: Region, dtype: int64

In [23]:
data['Manager'].value_counts()

Manager - A    9
Manager - B    7
Manager - C    6
Name: Manager, dtype: int64

## 1. The company wishes to clarify whether there is any increase in sales after stepping into digital marketing.

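The data set has fewer than 30 entries and the population standard deviation is unknown, so a t-test is appropriate.

We take the level of significance (α) as 0.05, which corresponds to a 95% confidence level. The p-value returned by the test is compared against α.

Setting up the two hypotheses: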
  


**H0: There is no increase in sales after stepping into digital marketing**
    
**H1: There is an increase in sales after stepping into digital marketing**

The alternative hypothesis claims that there is an increase in sales, so we use a one-tailed (right-tailed) t-test.

Since both measurements are taken on the same set of observations, the two samples are dependent, and we perform a paired-sample t-test.
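To make the paired test concrete, here is a minimal plain-Python sketch of the statistic it relies on: the mean of the per-row differences divided by its standard error. It uses only the ten rows printed in the table above (not the full 22-row file), so its value will differ from the notebook's result. The function name `paired_t_statistic` and the constant `T_CRIT_9DF` are illustrative names, not part of the notebook.

```python
import math
import statistics

# First ten rows of the table printed above (the full file has 22 rows).
before = [132921, 149559, 146278, 152167, 159525,
          137163, 130625, 131140, 171259, 141956]
after = [270390, 223334, 244243, 231808, 258402,
         256948, 222106, 230637, 226261, 193735]


def paired_t_statistic(x_before, x_after):
    """Return (t, degrees of freedom, mean difference) for after - before."""
    diffs = [a - b for a, b in zip(x_after, x_before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1, mean_diff


t_stat, df, mean_diff = paired_t_statistic(before, after)

# Right-tailed critical value of Student's t for alpha = 0.05 and 9 degrees
# of freedom, taken from a standard t-table.
T_CRIT_9DF = 1.833

# Right-tailed decision rule: reject H0 when t exceeds the critical value.
reject_h0 = t_stat > T_CRIT_9DF
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

With SciPy 1.6 or later, `scipy.stats.ttest_rel(b, a, alternative='greater')` should run the same right-tailed test directly on the full columns.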

In [4]:
# Assigning the two samples to the variables a and b respectively
a=data['Sales_before_digital_add(in $)']
b=data['Sales_After_digital_add(in $)']

In [18]:
# calculating the t-statistic and the (two-sided) p-value using scipy
scipy.stats.ttest_rel(a, b, axis=0, nan_policy='propagate')

Ttest_relResult(statistic=-12.09070525287017, pvalue=6.336667004575778e-11)

### Inference:

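* **`ttest_rel(a, b)` reports a two-sided p-value (6.336667004575778e-11). The statistic is negative, meaning sales before are lower than sales after, so the one-tailed p-value is half of that (about 3.17e-11).**
* **Since this p-value is far below 0.05, we reject the null hypothesis.**
* **That is, there is a statistically significant increase in sales after stepping into digital marketing.**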
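The `ttest_rel(a, b)` call above uses SciPy's default two-sided alternative, while H1 is one-sided. For a symmetric test statistic the conversion is simple, and the sketch below shows it in plain Python. The helper `one_sided_p` is a hypothetical name introduced here for illustration.

```python
def one_sided_p(statistic, two_sided_p, alternative):
    """Convert a two-sided p-value from a symmetric test to a one-sided one.

    alternative is 'less' (H1: mean difference < 0) or 'greater'
    (H1: mean difference > 0).
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    if alternative == "less":
        in_predicted_direction = statistic < 0
    else:
        in_predicted_direction = statistic > 0
    half = two_sided_p / 2
    # Half the two-sided p-value if the statistic points the way H1 predicts,
    # otherwise the complementary tail.
    return half if in_predicted_direction else 1 - half


# Values reported by ttest_rel(a, b) above. Because the difference is
# a - b (before minus after), "sales increased" corresponds to 'less'.
p_one = one_sided_p(-12.09070525287017, 6.336667004575778e-11, "less")
print(p_one < 0.05)
```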

## 2. The company needs to check whether there is any dependency between the features “Region” and “Manager”.

Since the Region and Manager features hold categorical values, we can use a chi-square test.

The company wants to know whether the two features depend on each other, so we use the chi-squared test for independence.

**H0: There is no dependency between the features "Region" & "Manager"**

**H1: There is dependency between the features "Region" & "Manager"**


In [11]:
# creating a contingency table using pandas.crosstab
cont_table=pd.crosstab(data["Region"], data["Manager"])
print('Cross tabulation of observed values:\n')
cont_table

Cross tabulation of observed values:



| Region \ Manager | Manager - A | Manager - B | Manager - C |
|---|---|---|---|
| Region - A | 4 | 3 | 3 |
| Region - B | 4 | 1 | 2 |
| Region - C | 1 | 3 | 1 |


**The values in this table represent observed frequencies.**
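As a cross-check on what the test does with these frequencies, the sketch below computes the expected counts under independence (row total × column total / grand total), the chi-square statistic, and the degrees of freedom in plain Python. It is an illustration of the formula, not the notebook's method. The function name `chi_square_independence` is made up for this example.

```python
# Observed counts copied from the cross tabulation above.
observed = [
    [4, 3, 3],  # Region - A
    [4, 1, 2],  # Region - B
    [1, 3, 1],  # Region - C
]


def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for each cell if rows and columns were independent.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, expected


stat, dof, expected = chi_square_independence(observed)
print(round(stat, 4), dof)
```

For a 3 × 3 table the degrees of freedom are (3 − 1)(3 − 1) = 4, and no continuity correction is involved, since that correction only applies when there is a single degree of freedom.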

In [17]:
# calculating the chi-square value using scipy.stats.chi2_contingency
stat,p_val,dof,exp_val=scipy.stats.chi2_contingency(cont_table, correction=True, lambda_=None)
print('Statistic value is:',stat)
print('P-value is: ',p_val)
print('Degrees of freedom value is: %0.3f\n'%dof)
print('The expected values are:\n',exp_val)

Statistic value is: 3.050566893424036
P-value is:  0.5493991051158094
Degrees of freedom value is: 4.000

The expected values are:
 [[4.09090909 3.18181818 2.72727273]
 [2.86363636 2.22727273 1.90909091]
 [2.04545455 1.59090909 1.36363636]]
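The p-value is the upper-tail probability of a chi-square distribution at the observed statistic. When the degrees of freedom are even, that tail has a closed form, so the reported value can be checked without SciPy. The sketch below is only such a check, and the helper name `chi2_sf_even_dof` is hypothetical.

```python
import math


def chi2_sf_even_dof(x, dof):
    """Upper-tail probability P(X > x) for a chi-square with even dof.

    For dof = 2m:  P(X > x) = exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form only holds for positive even dof")
    half_x = x / 2
    m = dof // 2
    series = sum(half_x ** i / math.factorial(i) for i in range(m))
    return math.exp(-half_x) * series


# Statistic and degrees of freedom as printed above.
p_value = chi2_sf_even_dof(3.050566893424036, 4)
print(round(p_value, 4))
```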


### Inference:

* **Since the p-value (0.549) is greater than 0.05, we fail to reject the null hypothesis.**
* **Thus the data give no evidence of a dependency between the features "Region" and "Manager". Note that every expected frequency is below 5, so the chi-square approximation is rough for a sample this small.**

Submitted by: **Shameema Muneer, DSA, Bach_3**