###### Problem:
A company has started investing in digital marketing as a new way to promote its products. To evaluate this, the company collected data and decided to carry out a study on it.

● The company wishes to clarify whether there is any increase in sales after stepping into digital marketing.

● The company needs to check whether there is any dependency between the features “Region” and “Manager”.

Help the company carry out the study using the data provided.

In [1]:
import pandas as pd
import numpy as np

In [4]:
data = pd.read_csv('Sales_add.csv')
data

| | Month | Region | Manager | Sales_before_digital_add(in $) | Sales_After_digital_add(in $) |
|---|---|---|---|---|---|
| 0 | Month-1 | Region - A | Manager - A | 132921 | 270390 |
| 1 | Month-2 | Region - A | Manager - C | 149559 | 223334 |
| 2 | Month-3 | Region - B | Manager - A | 146278 | 244243 |
| 3 | Month-4 | Region - B | Manager - B | 152167 | 231808 |
| 4 | Month-5 | Region - C | Manager - B | 159525 | 258402 |
| 5 | Month-6 | Region - A | Manager - B | 137163 | 256948 |
| 6 | Month-7 | Region - C | Manager - C | 130625 | 222106 |
| 7 | Month-8 | Region - A | Manager - A | 131140 | 230637 |
| 8 | Month-9 | Region - B | Manager - C | 171259 | 226261 |
| 9 | Month-10 | Region - C | Manager - B | 141956 | 193735 |


In [5]:
#checking dtypes of columns
data.dtypes

Month                             object
Region                            object
Manager                           object
Sales_before_digital_add(in $)     int64
Sales_After_digital_add(in $)      int64
dtype: object

#### Testing for the effect of digital marketing.
Here we need to determine whether digital marketing had any impact on sales. We use a t-test because the sample size is less than 30 and the population variance is unknown.
The hypotheses are:

1. H0: There is no increase in sales after digital marketing.

2. H1: There is a significant increase in sales after digital marketing.

This is a right-tailed test, with a significance level of 0.05.

In [6]:
h0= "H0: There is no increase in sales after digital marketing"
h1 ="H1: There is a significant increase in sales after digital marketing"

In [8]:
# Import the independent two-sample t-test from scipy
from scipy.stats import ttest_ind

# Compute the t score and p value for sales before vs. sales after.
# Note: ttest_ind is two-sided by default. Because "before" is passed first,
# a negative t score means that mean sales after are higher than before.
t_score, p_value = ttest_ind(data.iloc[:,3], data.iloc[:,4])

print(f"The p value is : {p_value}")
print(f"The t score is : {t_score}")
# Check whether to reject H0
if p_value < 0.05:
  print("Reject H0")
  print("That is accept "+h1)
else:
  print("Accept H0")
  print("That is accept "+h0)

The p value is : 2.614368006904645e-16
The t score is : -12.995084451110875
Reject H0
That is accept H1: There is a significant increase in sales after digital marketing
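
Each row pairs a "before" and an "after" figure for the same month, and the stated alternative is one-sided, so a paired, right-tailed t-test is a natural complementary check. The sketch below is a minimal illustration, not the notebook's own method: it uses only the ten rows displayed in the table above, computes the paired t statistic by hand, and compares it with the tabulated one-sided critical value of about 1.833 for α = 0.05 and 9 degrees of freedom. The variable names and the hard-coded critical value are assumptions introduced for this example.

```python
import math
import statistics

# Sales figures copied from the ten rows displayed above (in $).
sales_before = [132921, 149559, 146278, 152167, 159525,
                137163, 130625, 131140, 171259, 141956]
sales_after = [270390, 223334, 244243, 231808, 258402,
               256948, 222106, 230637, 226261, 193735]

# Paired test: work with the per-row differences (after - before).
diffs = [a - b for a, b in zip(sales_after, sales_before)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
t_stat = mean_diff / (sd_diff / math.sqrt(n))

# One-sided critical value for alpha = 0.05 and df = n - 1 = 9,
# taken from a standard t table (assumption: hard-coded, not computed).
T_CRITICAL = 1.833
reject_h0 = t_stat > T_CRITICAL

print(f"Mean difference: {mean_diff:.1f}")
print(f"Paired t statistic: {t_stat:.3f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

On the full data frame, the same kind of test could be run with `scipy.stats.ttest_rel(after, before, alternative="greater")`, which accepts the `alternative` argument in SciPy 1.6 and later.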


#### Testing the relationship between Manager and Region
We need to check whether there is a relationship between the categorical variables Manager and Region, so we use the chi-square test of independence.

H0: There is no dependency between Manager and Region

H1: There is a dependence between Manager and Region

In [9]:
h0="H0: There is no dependency between Manager and Region"
h1="H1: There is a dependence between Manager and Region"

In [10]:
pd.crosstab(data['Manager'], data['Region'])

| Manager \ Region | Region - A | Region - B | Region - C |
|---|---|---|---|
| Manager - A | 4 | 4 | 1 |
| Manager - B | 3 | 1 | 3 |
| Manager - C | 3 | 2 | 1 |


In [11]:
from scipy.stats import chi2_contingency

#creating a pandas crosstab with Manager and Region column of data
manager_x_region = pd.crosstab(data['Manager'], data['Region'])


result = chi2_contingency(manager_x_region)
print(result)
print("")
if result.pvalue < 0.05:
  print("Reject H0 and accept "+h1)
else:
  print("Accept H0, that is "+h0)

Chi2ContingencyResult(statistic=3.050566893424036, pvalue=0.5493991051158094, dof=4, expected_freq=array([[4.09090909, 2.86363636, 2.04545455],
       [3.18181818, 2.22727273, 1.59090909],
       [2.72727273, 1.90909091, 1.36363636]]))

Accept H0, that is H0: There is no dependency between Manager and Region
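
To make the reported numbers easier to follow, the sketch below recomputes the expected frequencies, the chi-square statistic, and the p value by hand from the crosstab counts shown above, using only the standard library. The helper `chi2_sf_even_dof` is a hypothetical name introduced here; it uses the closed-form upper-tail probability that holds only for an even number of degrees of freedom, which happens to apply with dof = 4. The sketch also counts the cells with an expected frequency below 5. A common rule of thumb asks for expected counts of at least 5, so with a table this sparse the chi-square approximation should be read with some caution.

```python
import math

# Observed counts copied from the Manager x Region crosstab above.
observed = [
    [4, 4, 1],  # Manager - A
    [3, 1, 3],  # Manager - B
    [3, 2, 1],  # Manager - C
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum of (O - E)^2 / E over all cells.
chi2_stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(row_totals) - 1) * (len(col_totals) - 1)


def chi2_sf_even_dof(x, k):
    """Upper-tail probability of a chi-square variable; valid only for even k."""
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k // 2))


p_value = chi2_sf_even_dof(chi2_stat, dof)

# Count cells whose expected frequency falls below 5.
small_cells = sum(e < 5 for row in expected for e in row)

print(f"statistic={chi2_stat:.4f}, dof={dof}, pvalue={p_value:.4f}")
print(f"cells with expected count < 5: {small_cells}")
```

The hand computation agrees with the `statistic`, `pvalue`, and `dof` fields reported by `chi2_contingency` above.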
