# Goal
Many sites make money by selling ads. For these sites, the number of pages visited by users in each session is one of the most important metrics, if not the most important. Data science plays a huge role here, especially by building models that suggest personalized content. To check whether a model is actually improving engagement, companies run A/B tests. It is often the data scientist's responsibility to analyze the test data and determine whether the model has been successful. The goal of this project is to look at A/B test results and draw conclusions.

# Challenge Description
The company in this exercise is a social network. They decided to add a feature called Recommended Friends, i.e. they suggest people you may know. A data scientist has built a model that suggests 5 people to each user. These potential friends will be shown on the user's newsfeed. At first, the model is tested on just a random subset of users to see how it performs compared to the newsfeed without the new feature. The test has been running for some time and your boss asks you to check the results. You are asked to check, for each user, the number of pages visited during their first session since the test started. If this number increased, the test is a success.

# Question 1
1. Is the test winning? That is, should 100% of the users see the Recommended Friends feature?
2. Is the test performing similarly for all user segments or are there differences among different segments?
3. If you identified segments that responded differently to the test, can you guess the reason? Would this change your point 1 conclusions?


In [4]:
import pandas as pd
import numpy as np

In [8]:
test_file='data/Engagement_Test/test_table.csv'
user_file='data/Engagement_Test/user_table.csv'

In [9]:
test=pd.read_csv(test_file)
user=pd.read_csv(user_file)

In [12]:
test.describe()

Unnamed: 0,user_id,test,pages_visited
count,100000.0,100000.0,100000.0
mean,4511960.0,0.50154,4.60403
std,2596973.0,0.5,2.467845
min,34.0,0.0,0.0
25%,2271007.0,0.0,3.0
50%,4519576.0,1.0,5.0
75%,6764484.0,1.0,6.0
max,8999849.0,1.0,17.0


In [19]:
TestRate=test.loc[test["test"]==1].shape[0]/test.shape[0]  # share of users assigned to the test group

In [20]:
TestRate

0.50154

## Sanity check of the test-group rate, assuming the experiment was set up as a 50/50 split


In [26]:
# 95% range for the test-group share under a 50/50 split (normal approximation, z = 1.96)
upper=0.5+1.96*np.sqrt(0.5*0.5/100000)
lower=0.5-1.96*np.sqrt(0.5*0.5/100000)
print('test rate should range between {0:.4f}~{1:.4f}'.format(lower,upper))

test rate should range between 0.4969~0.5031
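
The range check can also be wrapped in a small helper that reports a two-sided p-value alongside the interval. This is a minimal sketch under a normal approximation with z = 1.96 for a 95% range; `sample_ratio_check` and its parameter names are hypothetical, and the count 50154 is simply the observed rate of 0.50154 applied to the 100000 rows.

```python
import math

def sample_ratio_check(n_test, n_total, expected=0.5, z=1.96):
    """Compare the observed test-group share with the expected split.

    Returns (observed rate, lower bound, upper bound, two-sided p-value),
    all based on a normal approximation to the binomial.
    """
    se = math.sqrt(expected * (1 - expected) / n_total)  # standard error of the share
    observed = n_test / n_total
    lower, upper = expected - z * se, expected + z * se
    z_score = (observed - expected) / se
    # Two-sided tail probability of a standard normal: P(|Z| >= |z_score|)
    p_value = math.erfc(abs(z_score) / math.sqrt(2))
    return observed, lower, upper, p_value

observed, lower, upper, p_value = sample_ratio_check(50154, 100000)
print(f"observed={observed:.5f}, range={lower:.4f}~{upper:.4f}, p={p_value:.3f}")
```

If the observed rate falls inside the range and the p-value is not small, there is no evidence that the randomization deviated from the intended split.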


In [13]:
test.head()

Unnamed: 0,user_id,date,browser,test,pages_visited
0,600597,2015-08-13,IE,0,2
1,4410028,2015-08-26,Chrome,1,5
2,6004777,2015-08-17,Chrome,0,8
3,5990330,2015-08-27,Safari,0,8
4,3622310,2015-08-07,Firefox,0,1


In [28]:
len(test.user_id.unique())  # equals the number of rows, so user_id is unique

100000

In [14]:
user.describe()

Unnamed: 0,user_id
count,100000.0
mean,4511960.0
std,2596973.0
min,34.0
25%,2271007.0
50%,4519576.0
75%,6764484.0
max,8999849.0


In [30]:
user.shape[0]  # number of user records

100000

In [31]:
len(user.user_id.unique())  # confirm user_id is unique

100000

In [15]:
user.head()

Unnamed: 0,user_id,signup_date
0,34,2015-01-01
1,59,2015-01-01
2,178,2015-01-01
3,285,2015-01-01
4,383,2015-01-01


In [34]:
data=test.merge(user,how='left',on='user_id')
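
Since both tables are expected to hold one row per `user_id`, the join itself can enforce that assumption and report unmatched rows. The sketch below uses the `validate` and `indicator` parameters of `DataFrame.merge`; the toy frames `test_toy` and `user_toy` are hypothetical stand-ins with illustrative values, not the real data.

```python
import pandas as pd

# Toy stand-ins for the test and user tables (illustrative values only)
test_toy = pd.DataFrame({
    "user_id": [1, 2, 3],
    "test": [0, 1, 0],
    "pages_visited": [2, 5, 8],
})
user_toy = pd.DataFrame({
    "user_id": [1, 2, 3],
    "signup_date": ["2015-01-19", "2015-05-11", "2015-06-26"],
})

# validate raises MergeError if user_id is duplicated on either side;
# indicator adds a _merge column showing where each row was found
merged = test_toy.merge(
    user_toy, how="left", on="user_id",
    validate="one_to_one", indicator=True,
)

# Rows from the test table that found no matching user record
unmatched = int((merged["_merge"] != "both").sum())
print(f"rows without a matching user record: {unmatched}")

# Drop the helper column once the check is done
merged = merged.drop(columns="_merge")
```

A left join keeps every row of the test table, so a nonzero `unmatched` count would mean some users lack a signup date.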

In [36]:
data.head()

Unnamed: 0,user_id,date,browser,test,pages_visited,signup_date
0,600597,2015-08-13,IE,0,2,2015-01-19
1,4410028,2015-08-26,Chrome,1,5,2015-05-11
2,6004777,2015-08-17,Chrome,0,8,2015-06-26
3,5990330,2015-08-27,Safari,0,8,2015-06-25
4,3622310,2015-08-07,Firefox,0,1,2015-04-17
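
With the tables merged, one possible way to approach Question 1 is to compare mean `pages_visited` between the two groups, overall and per segment. The sketch below uses Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`), which is a choice made here and not something the exercise prescribes. The helper `compare_groups` is a hypothetical name, and the data frame is randomly generated for illustration, so its numbers say nothing about the real test.

```python
import numpy as np
import pandas as pd
from scipy import stats

def compare_groups(df, metric="pages_visited", group="test"):
    """Welch's t-test of `metric` between test (1) and control (0) rows."""
    control = df.loc[df[group] == 0, metric]
    treated = df.loc[df[group] == 1, metric]
    t_stat, p_value = stats.ttest_ind(treated, control, equal_var=False)
    return pd.Series({
        "n_control": len(control),
        "n_test": len(treated),
        "mean_control": control.mean(),
        "mean_test": treated.mean(),
        "t_stat": t_stat,
        "p_value": p_value,
    })

# Synthetic data for illustration only; NOT the real experiment results
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "test": rng.integers(0, 2, size=2000),
    "browser": rng.choice(["Chrome", "Firefox", "IE", "Safari"], size=2000),
    "pages_visited": rng.poisson(4.6, size=2000),
})

# Overall comparison, then the same comparison within each browser segment
overall = compare_groups(toy)
by_browser = pd.DataFrame(
    {browser: compare_groups(rows) for browser, rows in toy.groupby("browser")}
).T

print(overall)
print(by_browser)
```

When many segments are tested at once, some small p-values are expected by chance, so segment-level differences are best treated as leads to investigate, not as conclusions.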
