### A company has started investing in digital marketing as a new way to promote its products. To evaluate this, the company collected sales data and decided to carry out a study on it.

### a. The company wishes to clarify whether there is any increase in sales after stepping into digital marketing.

### b. The company needs to check whether there is any dependency between the features “Region” and “Manager”.

## Importing Libraries

In [58]:
import numpy as np
import pandas as pd
import statistics as stat
import math
## visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [2]:
df = pd.read_csv('Sales_add.csv')

In [3]:
df.head()

,Month,Region,Manager,Sales_before_digital_add(in $),Sales_After_digital_add(in $)
0,Month-1,Region - A,Manager - A,132921,270390
1,Month-2,Region - A,Manager - C,149559,223334
2,Month-3,Region - B,Manager - A,146278,244243
3,Month-4,Region - B,Manager - B,152167,231808
4,Month-5,Region - C,Manager - B,159525,258402


In [4]:
df.shape

(22, 5)

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22 entries, 0 to 21
Data columns (total 5 columns):
 #   Column                          Non-Null Count  Dtype 
---  ------                          --------------  ----- 
 0   Month                           22 non-null     object
 1   Region                          22 non-null     object
 2   Manager                         22 non-null     object
 3   Sales_before_digital_add(in $)  22 non-null     int64 
 4   Sales_After_digital_add(in $)   22 non-null     int64 
dtypes: int64(2), object(3)
memory usage: 1012.0+ bytes


In [7]:
df.describe()

,Sales_before_digital_add(in $),Sales_After_digital_add(in $)
count,22.0,22.0
mean,149239.954545,231123.727273
std,14844.042921,25556.777061
min,130263.0,187305.0
25%,138087.75,214960.75
50%,147444.0,229986.5
75%,157627.5,250909.0
max,178939.0,276279.0


In [59]:
stat.mean(df['Sales_before_digital_add(in $)'])

149239.95454545456

In [60]:
stat.mean(df['Sales_After_digital_add(in $)'])

231123.72727272726
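As a quick sanity check before any hypothesis test, the two means printed above can be turned into an absolute and a relative difference. The sketch below only redoes that arithmetic with the values copied from the `stat.mean` outputs; the variable names are illustrative and not part of the original notebook.

```python
# Means copied from the stat.mean outputs above
mean_before = 149239.95454545456
mean_after = 231123.72727272726

# Absolute and relative change in mean monthly sales
abs_increase = mean_after - mean_before
rel_increase = abs_increase / mean_before

print(f"Absolute increase in mean sales: {abs_increase:,.2f}")
print(f"Relative increase in mean sales: {rel_increase:.1%}")
```

A difference in sample means alone does not show that the increase is statistically significant, which is what the test in the next section addresses.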

## Is there any increase in sales after stepping into digital marketing?

In [76]:
# Two-sample t-test (independent samples) on mean sales
# Null hypothesis (H0): mean of Sales_before_digital_add(in $) = mean of Sales_After_digital_add(in $)
# Alternative hypothesis (Ha): mean of Sales_before_digital_add(in $) < mean of Sales_After_digital_add(in $)

In [77]:
from scipy.stats import ttest_ind

In [78]:
# ttest_ind treats the two columns as independent samples and is two-sided by default
t_stat, p_value = ttest_ind(df['Sales_before_digital_add(in $)'], df['Sales_After_digital_add(in $)'])

In [79]:
p_value

2.614368006904645e-16

In [80]:
if p_value < 0.05:
    print('Reject Null Hypothesis')
else:
    # A non-significant result does not prove H0; we only fail to reject it
    print('Fail to Reject Null Hypothesis')

Reject Null Hypothesis


In [62]:
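# Since p_value < 0.05, we reject the null hypothesis. Mean sales after digital marketing (about 231,124) are higher than mean sales before (about 149,240), so the data indicate an increase in sales after stepping into digital marketing.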
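Because every month contributes both a "before" and an "after" figure, the two columns can arguably be treated as paired rather than independent, and the alternative hypothesis is one-sided. The sketch below shows how a paired, one-sided test might look with `scipy.stats.ttest_rel`. It is not the test run above: it uses only the five rows shown by `df.head()` as illustrative input, and the `alternative` argument assumes SciPy 1.6 or newer.

```python
import numpy as np
from scipy.stats import ttest_rel

# Illustration only: the five rows shown by df.head().
# In the notebook, pass the full DataFrame columns instead.
before = np.array([132921, 149559, 146278, 152167, 159525])
after = np.array([270390, 223334, 244243, 231808, 258402])

# Paired t-test on the month-by-month differences.
# alternative='less' tests Ha: mean(before - after) < 0,
# i.e. sales before are lower than sales after.
t_stat, p_value = ttest_rel(before, after, alternative='less')

print(f"t = {t_stat:.3f}, one-sided p = {p_value:.4g}")

if p_value < 0.05:
    print('Reject Null Hypothesis')
else:
    print('Fail to Reject Null Hypothesis')
```

On the full data set the call would be `ttest_rel(df['Sales_before_digital_add(in $)'], df['Sales_After_digital_add(in $)'], alternative='less')`; the resulting p-value would differ from the one reported above.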

## Is there any dependency between the features “Region” and “Manager”?

In [48]:
# Chi-square test of independence
# Null hypothesis (H0): Region and Manager are independent (no relationship)
# Alternative hypothesis (Ha): Region and Manager are dependent (there is a relationship)
# If p < 0.05, we reject the null hypothesis;
# otherwise, we fail to reject the null hypothesis.

In [69]:
data = pd.crosstab(df.Region,df.Manager)

In [70]:
data

Region,Manager - A,Manager - B,Manager - C
Region - A,4,3,3
Region - B,4,1,2
Region - C,1,3,1


In [71]:
from scipy.stats import chi2_contingency

In [72]:
chi2_stat,p_value,dof,exp = chi2_contingency(data)

In [73]:
p_value

0.5493991051158094

In [74]:
if p_value < 0.05:
    print('Reject Null Hypothesis')
else:
    # A non-significant result does not prove H0; we only fail to reject it
    print('Fail to Reject Null Hypothesis')

Fail to Reject Null Hypothesis


In [75]:
# Since p_value (about 0.549) > 0.05, we fail to reject the null hypothesis. The data show no statistically significant evidence of a dependency between 'Region' and 'Manager'.
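The chi-square test relies on a large-sample approximation, so it can help to look at the expected counts that `chi2_contingency` returns. The sketch below rebuilds the crosstab shown above as a NumPy array and counts the cells whose expected frequency falls under 5. That threshold is a common rule of thumb and an assumption here, not something the original analysis checked.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts copied from the Region x Manager crosstab above
observed = np.array([
    [4, 3, 3],  # Region - A
    [4, 1, 2],  # Region - B
    [1, 3, 1],  # Region - C
])

chi2_stat, p_value, dof, expected = chi2_contingency(observed)

# Rule of thumb (assumption): the chi-square approximation is usually
# considered reliable when expected counts are at least 5.
n_small = int((expected < 5).sum())

print(np.round(expected, 2))
print(f"dof = {dof}, p = {p_value:.4f}, cells with expected count < 5: {n_small}")
```

With only 22 observations, every expected count in this table is below 5, so the p-value should be read with caution; collecting more data or using an exact test could be considered.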