In [16]:
import pandas as pd

In [17]:
df = pd.read_csv('Customer_conversion.csv')

In [18]:
df.head()

,id,time,con_treat,page,converted
0,851104,11:48.6,control,old_page,0
1,804228,01:45.2,control,old_page,0
2,661590,55:06.2,treatment,new_page,0
3,853541,28:03.1,treatment,new_page,0
4,864975,52:26.2,control,old_page,1


In [19]:
pd.pivot_table(data=df,values='converted',columns='page',index='con_treat',aggfunc='count')

con_treat,new_page,old_page
control,1928,145274
treatment,145311,1965


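#### The data contains mismatched rows: 1928 control rows received the new page and 1965 treatment rows received the old page. The control group should only have seen the old page, and the treatment group should only have seen the new page.

#### We need to clean the data so that each group contains only correctly assigned rows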

In [20]:
# Keep only rows where the group and the page agree
matched = ((df['con_treat']=='control') & (df['page']=='old_page')) | ((df['con_treat']=='treatment') & (df['page']=='new_page'))
df_new = df[matched]

In [21]:
pd.pivot_table(data=df_new,values='converted',columns='page',index='con_treat',aggfunc='count')

con_treat,new_page,old_page
control,,145274.0
treatment,145311.0,


#### Now the data looks correct: each group appears only with its intended page
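The same consistency check can also be expressed as a small reusable helper. This is only a sketch: `mismatched_rows` is a hypothetical name, the sample frame is made up for illustration, and the group-to-page mapping is assumed from the column values shown above.

```python
import pandas as pd

def mismatched_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the rows whose page does not match their group (hypothetical helper)."""
    # Assumed mapping: control -> old_page, treatment -> new_page
    expected_page = frame['con_treat'].map({'control': 'old_page',
                                            'treatment': 'new_page'})
    return frame[frame['page'] != expected_page]

# Tiny made-up sample, not the real dataset
sample = pd.DataFrame({
    'con_treat': ['control', 'control', 'treatment', 'treatment'],
    'page': ['old_page', 'new_page', 'new_page', 'old_page'],
    'converted': [0, 1, 0, 1],
})

bad = mismatched_rows(sample)      # rows that break the experiment design
clean = sample.drop(bad.index)     # rows that are safe to analyse
```

On the real data, `len(mismatched_rows(df_new))` should be 0 if the cleaning step worked.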

In [22]:
df_new.groupby('con_treat')['converted'].mean()

con_treat
control      0.120386
treatment    0.118807
Name: converted, dtype: float64

* About 12.0% of the control group converted to new customers with the old website page

* About 11.9% of the treatment group converted to new customers with the new website page
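To read these rates next to the group sizes they come from, the counts and means can be put in one table. The sketch below assumes the `con_treat` and `converted` columns from above; `conversion_summary` and the sample data are made up for illustration.

```python
import pandas as pd

def conversion_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Visitors, conversions and conversion rate per group (hypothetical helper)."""
    return frame.groupby('con_treat')['converted'].agg(
        visitors='count',     # number of rows in the group
        conversions='sum',    # converted is 0/1, so the sum counts conversions
        rate='mean',          # share of rows with converted == 1
    )

# Tiny made-up sample, not the real dataset
sample = pd.DataFrame({
    'con_treat': ['control'] * 4 + ['treatment'] * 4,
    'converted': [0, 1, 0, 0, 1, 1, 0, 0],
})

summary = conversion_summary(sample)
# Absolute difference in conversion rate, treatment minus control
rate_difference = summary.loc['treatment', 'rate'] - summary.loc['control', 'rate']
```

Calling `conversion_summary(df_new)` would give the same rates as the `groupby(...).mean()` call, with the sample sizes alongside.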

### Problem Statement: Did the new website page have an impact on customer conversion?

H0 : The new website page had no impact on conversion

H1 : The new website page had an impact on conversion

Significance level (alpha) = 0.05
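
One way to state these hypotheses formally is shown below. The symbols are assumptions introduced here, not part of the original analysis: $p_{\text{control}}$ and $p_{\text{treatment}}$ stand for the true conversion rates of the two groups, and the 0.05 threshold is read as the significance level $\alpha$.

$$
H_0: p_{\text{control}} = p_{\text{treatment}}
\qquad
H_1: p_{\text{control}} \neq p_{\text{treatment}}
\qquad
\alpha = 0.05
$$

H0 is rejected only if the test's p-value falls below $\alpha$.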

In [24]:
import numpy as np
# .copy() makes df_chi_square an independent DataFrame, so the assignment
# in the next cell does not raise a SettingWithCopyWarning
df_chi_square = df_new[['con_treat','converted']].copy()
df_chi_square.head()

,con_treat,converted
0,control,0
1,control,0
2,treatment,0
3,treatment,0
4,control,1


In [25]:
# Encode the group as a binary flag: treatment = 1, control = 0
df_chi_square['con_treat'] = np.where(df_chi_square['con_treat']=='treatment',1,0)


In [26]:
df_chi_square.head(10)

,con_treat,converted
0,0,0
1,0,0
2,1,0
3,1,0
4,0,1
5,0,0
6,1,1
7,0,0
8,1,1
9,1,1


In [30]:
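from sklearn.feature_selection import chi2

# chi2 returns two arrays: (chi-square statistics, p-values), one entry per feature column.
# The target is passed as a 1-D Series rather than a one-column DataFrame.
f_p_values=chi2(df_chi_square[['con_treat']],df_chi_square['converted'])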

In [33]:
f_p_values

(array([0.86004764]), array([0.35372536]))

#### Conclusion

* The p-value is 0.35, which is greater than 0.05, so there is not enough evidence to reject the null hypothesis.

* Therefore, we fail to reject the null hypothesis: the data do not show that the new website page had a significant impact on customer conversion.
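
As a cross-check, the same question can be framed as a chi-square test of independence on the 2x2 table of group versus outcome. This is a different technique from the `sklearn.feature_selection.chi2` call used above, which is a feature-selection score, so the statistic it produces may differ. The sketch assumes SciPy is installed; `contingency_test` is a hypothetical helper and the sample counts are made up, not taken from the real dataset.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def contingency_test(frame, group_col='con_treat', outcome_col='converted'):
    """Chi-square test of independence on a group x outcome table (hypothetical helper)."""
    # Rows: groups, columns: outcome values (0 = not converted, 1 = converted)
    table = pd.crosstab(frame[group_col], frame[outcome_col])
    # correction=False turns off the Yates continuity correction that
    # SciPy applies by default to 2x2 tables
    chi2_stat, p_value, dof, expected = chi2_contingency(table, correction=False)
    return table, chi2_stat, p_value, dof

# Made-up sample: 50 visitors per group, 10 vs 20 conversions
sample = pd.DataFrame({
    'con_treat': ['control'] * 50 + ['treatment'] * 50,
    'converted': [1] * 10 + [0] * 40 + [1] * 20 + [0] * 30,
})

table, chi2_stat, p_value, dof = contingency_test(sample)
alpha = 0.05
reject_h0 = p_value < alpha   # True means the groups differ at the chosen level
```

Running `contingency_test(df_new)` applies the same test to the cleaned data. The decision rule is unchanged: reject H0 only if the p-value is below 0.05.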