A company has started investing in digital marketing as a new way to promote its
products. To evaluate this, it collected sales data and decided to carry out a study on it.

1) The company wishes to clarify whether there is any increase in sales after
stepping into digital marketing.

In [2]:
import pandas as pd
df=pd.read_csv('Sales_add.csv')
df.head()

Unnamed: 0,Month,Region,Manager,Sales_before_digital_add(in $),Sales_After_digital_add(in $)
0,Month-1,Region - A,Manager - A,132921,270390
1,Month-2,Region - A,Manager - C,149559,223334
2,Month-3,Region - B,Manager - A,146278,244243
3,Month-4,Region - B,Manager - B,152167,231808
4,Month-5,Region - C,Manager - B,159525,258402


### 1) One-tailed paired t-test

In [8]:
# Null hypothesis: There is no significant increase in sales after stepping into digital marketing
# Alternate hypothesis: There is a significant increase in sales after stepping into digital marketing
alpha=0.05 # represents the default 5% significance level

In [5]:
n1=len(df) # sample-size
n1

22

In [6]:
dof1=n1-1 # degrees of freedom
dof1

21

In [7]:
from scipy.stats import ttest_rel, t

In [10]:
t_critical=t.ppf(q=1-alpha, df=dof1) # finds critical t value for right-tailed test
t_critical

1.7207429028118775

In [12]:
t_statistic, pval1=ttest_rel(df['Sales_After_digital_add(in $)'],
                             df['Sales_before_digital_add(in $)'],
                             alternative='greater') # finds t-statistic and p-value for right-tailed t test
t_statistic, pval1

(12.09070525287017, 3.168333502287889e-11)
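To make clear what `ttest_rel` computes, the paired t-statistic can be reproduced by hand: take the per-row differences, then divide their mean by their standard error. The sketch below is illustrative only. It assumes just the five rows shown by `df.head()` rather than the full 22-row file, so its numbers will not match the result above; the names `before`, `after`, `diff` and `t_manual` are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_rel

# Illustrative sample: only the five rows displayed by df.head(), not the full data set
before = np.array([132921, 149559, 146278, 152167, 159525], dtype=float)
after = np.array([270390, 223334, 244243, 231808, 258402], dtype=float)

# A paired test works on the per-row differences
diff = after - before
n = diff.size

# t = mean(d) / (sample std of d / sqrt(n)), using the sample standard deviation (ddof=1)
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))

# Same statistic from scipy, for comparison
t_scipy, p_scipy = ttest_rel(after, before, alternative='greater')

print("manual t-statistic:", t_manual)
print("scipy t-statistic: ", t_scipy)
```

The two printed statistics should agree up to floating-point rounding, which shows that the paired test is a one-sample t-test on the differences.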

In [20]:
print("\033[1mConclusion based on p-value \033[0m\n") 
if(pval1<=alpha):
    print("The null hypothesis is rejected. \nThereby the alternate hypothesis which states that there is significant increase in sales after stepping into digital marketing can be accepted\n\n")
else:
    print("Failed to reject the null hypothesis that there is no significant increase in sales after stepping into digital marketing\n\n")

print("\033[1mConclusion based on t-statistic\033[0m\n") 
if(t_statistic>t_critical):
    print("The null hypothesis is rejected. \nThereby the alternate hypothesis which states that there is significant increase in sales after stepping into digital marketing can be accepted")
else:
    print("Failed to reject the null hypothesis that there is no significant increase in sales after stepping into digital marketing")

Conclusion based on p-value

The null hypothesis is rejected. 
Thereby the alternate hypothesis which states that there is significant increase in sales after stepping into digital marketing can be accepted


Conclusion based on t-statistic

The null hypothesis is rejected. 
Thereby the alternate hypothesis which states that there is significant increase in sales after stepping into digital marketing can be accepted


Both decision rules agree: the p-value (about 3.17e-11) is far below alpha = 0.05, and the t-statistic (12.09) exceeds the critical value (1.72).
The null hypothesis is therefore rejected, and the data support the alternate hypothesis that sales increased significantly after stepping into digital marketing.
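The p-value rule and the critical-value rule are two views of the same right-tailed test, so they lead to the same decision (apart from an exact tie at the boundary). A minimal sketch, assuming the t-statistic and degrees of freedom reported above are plugged in as constants; `reject_by_p` and `reject_by_t` are hypothetical names:

```python
from scipy.stats import t

alpha = 0.05
dof = 21                          # n - 1 for the 22 paired observations
t_statistic = 12.09070525287017   # value reported by ttest_rel above

# Critical value: the point with probability alpha to its right
t_critical = t.ppf(1 - alpha, df=dof)

# Right-tailed p-value: probability to the right of the observed statistic
p_value = t.sf(t_statistic, df=dof)

reject_by_p = p_value <= alpha
reject_by_t = t_statistic > t_critical

print("critical t:", t_critical)
print("p-value:   ", p_value)
print("reject H0 by p-value:", reject_by_p, "| by t-statistic:", reject_by_t)
```

Here `t.sf` is the survival function (one minus the CDF), which is the right-tail area that a one-sided "greater" test reports as its p-value.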

### 2) Chi-Square test

The company needs to check whether there is any dependency between the features “Region” and “Manager”.

In [23]:
# Null Hypothesis: There is no dependency between features Region and Manager
# Alternate Hypothesis: There is dependency between features Region and Manager

In [24]:
from scipy.stats import chi2_contingency, chi2
cross_table=pd.crosstab(df['Region'],df['Manager']) # creates a cross table based on the provided inputs
cross_table

Region,Manager - A,Manager - B,Manager - C
Region - A,4,3,3
Region - B,4,1,2
Region - C,1,3,1


In [25]:
chi2_val, pval2, dof2, exp_val = chi2_contingency(cross_table) # finds chisquare value, p-value, degrees of freedom & expected values 
chi2_val, pval2, dof2, exp_val

(3.050566893424036,
 0.5493991051158094,
 4,
 array([[4.09090909, 3.18181818, 2.72727273],
        [2.86363636, 2.22727273, 1.90909091],
        [2.04545455, 1.59090909, 1.36363636]]))
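The values returned by `chi2_contingency` can be reproduced directly from the cross table, which may help in reading the output. The sketch below assumes the observed counts are typed in from the table above; the names `observed`, `expected`, `chi2_manual` and `p_manual` are hypothetical. Expected counts come from (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells.

```python
import numpy as np
from scipy.stats import chi2

# Observed counts copied from the Region x Manager cross table
observed = np.array([[4, 3, 3],
                     [4, 1, 2],
                     [1, 3, 1]], dtype=float)

row_totals = observed.sum(axis=1, keepdims=True)   # shape (3, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 3)
grand_total = observed.sum()

# Expected counts under independence
expected = row_totals @ col_totals / grand_total

# Pearson chi-square statistic and its degrees of freedom
chi2_manual = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Right-tail probability of the chi-square distribution
p_manual = chi2.sf(chi2_manual, df=dof)

print("expected counts:\n", expected)
print("chi-square:", chi2_manual, "dof:", dof, "p-value:", p_manual)
```

For a 3 × 3 table no continuity correction is applied, so the manual statistic should match the `chi2_contingency` output above.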

In [26]:
# finding critical chi-square value
chi_critical=chi2.ppf(q=1-alpha, df=dof2)
chi_critical

9.487729036781154

In [27]:
print("\033[1mConclusion based on p-value \033[0m\n") 
if(pval2<=alpha):
    print("The null hypothesis is rejected. \nThereby the alternate hypothesis which states that there is dependence between Region and Manager features can be accepted\n\n")
else:
    print("Failed to reject the null hypothesis which concludes there is no dependency between features Region and Manager\n\n")

print("\033[1mConclusion based on chisquare-statistic\033[0m\n") 
if(chi2_val>chi_critical):
    print("The null hypothesis is rejected. \nThereby the alternate hypothesis which states that there is dependence between Region and Manager features can be accepted")
else:
    print("Failed to reject the null hypothesis which concludes there is no dependency between features Region and Manager")

Conclusion based on p-value

Failed to reject the null hypothesis which concludes there is no dependency between features Region and Manager


Conclusion based on chisquare-statistic

Failed to reject the null hypothesis which concludes there is no dependency between features Region and Manager


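Both decision rules agree: the p-value (0.549) is greater than alpha = 0.05, and the chi-square statistic (3.05) is below the critical value (9.49). We therefore fail to reject the null hypothesis: the data provide no evidence of a dependency between the features "Region" and "Manager".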
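One caveat when reading this result: the chi-square approximation is usually considered more reliable when most expected counts are at least 5. The sketch below checks the expected frequencies printed by `chi2_contingency` above against that rule of thumb. The threshold of 5 and the names `min_expected` and `share_small` are assumptions made for illustration, not part of the original analysis.

```python
import numpy as np

# Expected frequencies as printed by chi2_contingency above
expected = np.array([[4.09090909, 3.18181818, 2.72727273],
                     [2.86363636, 2.22727273, 1.90909091],
                     [2.04545455, 1.59090909, 1.36363636]])

min_expected = 5  # assumed rule-of-thumb threshold

# Flag cells whose expected count falls below the threshold
small = expected < min_expected
share_small = small.mean()

print("cells below threshold:", int(small.sum()), "of", expected.size)
print("share of small cells: ", share_small)
```

With only 22 observations spread over nine cells, every expected count is below 5, so the p-value should be treated as approximate.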