# Integrated Project 2

**Project Description**


You work at a startup that sells food products. You need to investigate user behavior for the company's app.


- First study the sales funnel. Find out how users reach the purchase stage. How many users actually make it to this stage? How many get stuck at previous stages? Which stages in particular?


- Then look at the results of an A/A/B test. (Read on for more information about A/A/B testing.) The designers would like to change the fonts for the entire app, but the managers are afraid the users might find the new design intimidating. They decide to make a decision based on the results of an A/A/B test.


- The users are split into three groups: two control groups get the old fonts and one test group gets the new ones. Find out which set of fonts produces better results.


- Creating two A groups has certain advantages. We can make it a principle that we will only be confident in the accuracy of our testing when the two control groups are similar. If there are significant differences between the A groups, this can help us uncover factors that may be distorting the results. Comparing control groups also tells us how much time and data we'll need when running further tests.


- You'll be using the same dataset for general analytics and for A/A/B analysis. In real projects, experiments are constantly being conducted. Analysts study the quality of an app using general data, without paying attention to whether users are participating in experiments.

**Description of the data**


Each log entry is a user action or an event.


- EventName — event name


- DeviceIDHash — unique user identifier


- EventTimestamp — event time


- ExpId — experiment number: 246 and 247 are the control groups, 248 is the test group


**Instructions for completing the project**


**Step 1. Open the data file and read the general information**


File path: /datasets/logs_exp_us.csv


**Step 2. Prepare the data for analysis**


- Rename the columns in a way that's convenient for you


- Check for missing values and data types. Correct the data if needed


- Add a date and time column and a separate column for dates


**Step 3. Study and check the data**


- How many events are in the logs?


- How many users are in the logs?


- What's the average number of events per user?


- What period of time does the data cover? Find the maximum and the minimum date. Plot a histogram by date and time. Can you be sure that you have equally complete data for the entire period? Older events could end up in some users' logs for technical reasons, and this could skew the overall picture. Find the moment at which the data starts to be complete and ignore the earlier section. What period does the data actually represent?


- Did you lose many events and users when excluding the older data?


- Make sure you have users from all three experimental groups.


**Step 4. Study the event funnel**


- See what events are in the logs and their frequency of occurrence. Sort them by frequency.


- Find the number of users who performed each of these actions. Sort the events by the number of users. Calculate the proportion of users who performed the action at least once.


- In what order do you think the actions took place? Are all of them part of a single sequence? Events that are not part of the sequence don't need to be taken into account when calculating the funnel.


- Use the event funnel to find the share of users that proceed from each stage to the next. (For instance, for the sequence of events A → B → C, calculate the ratio of users at stage B to the number of users at stage A and the ratio of users at stage C to the number at stage B.)


- At what stage do you lose the most users?


- What share of users make the entire journey from their first event to payment?


**Step 5. Study the results of the experiment**


- How many users are there in each group?


- We have two control groups in the A/A test, where we check our mechanisms and calculations. See if there is a statistically significant difference between samples 246 and 247.


- Select the most popular event. In each of the control groups, find the number of users who performed this action. Find their share. Check whether the difference between the groups is statistically significant. Repeat the procedure for all other events (it will save time if you create a special function for this test). Can you confirm that the groups were split properly?


- Do the same thing for the group with altered fonts. Compare the results with those of each of the control groups for each event in isolation. Compare the results with the combined results for the control groups. What conclusions can you draw from the experiment?


- What significance level have you set to test the statistical hypotheses mentioned above? Calculate how many statistical hypothesis tests you carried out. With a statistical significance level of 0.1, one in 10 results could be false. What should the significance level be? If you want to change it, run through the previous steps again and check your conclusions.

# Step 1. Open the data file and read the general information

Let's import the libraries that we need to start our research.


In [None]:
#!pip install -q plotly==5.5.0

In [None]:
#pip install -U sidetable

In [None]:
import math
from scipy import stats
import pandas as pd
from datetime import datetime
import sidetable
import seaborn as sns
from matplotlib import pyplot as plt
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import warnings

Let's read the data and study the general information.

In [None]:
data = pd.read_csv('/datasets/logs_exp_us.csv')

# Step 2. Prepare the data for analysis


In [None]:
data.head()

The head shows all fields packed into a single column, so we need to re-read the file with a tab separator.

**Rename the columns in a way that's convenient for you**

In [None]:
data = pd.read_csv('/datasets/logs_exp_us.csv', sep='\t')
data.columns =['event_name','user_id','timestamp','experiment_id']

In [None]:
data.head()

In [None]:
data.shape

We have 244126 rows with four columns.


In [None]:
data.info()

It looks like we don't have missing values, but we need to change the data type of the timestamp column.


In [None]:
data.stb.missing()

No missing values.


Let's change the data type of the timestamp column.


In [None]:
# Unix seconds -> datetime (UTC, independent of the machine's local time zone)
data['timestamp']=pd.to_datetime(data['timestamp'],unit='s')
data.head()
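
Step 2 also asks for a separate date column. A minimal sketch of one way to do it, assuming the raw timestamps are Unix seconds in UTC. The toy frame and the column name `date` are assumptions for illustration; the two timestamps were chosen to map to the minimum and maximum datetimes reported later in this notebook.

```python
import pandas as pd

# Toy frame standing in for the log; only the timestamp column matters here.
toy = pd.DataFrame({'timestamp': [1564029816, 1565212517]})

# Convert Unix seconds to timezone-naive UTC datetimes.
toy['timestamp'] = pd.to_datetime(toy['timestamp'], unit='s')

# Separate date column (hypothetical name `date`), truncated to midnight
# so it keeps the datetime64 dtype and stays easy to filter and plot.
toy['date'] = toy['timestamp'].dt.floor('D')

print(toy)
```

Using `dt.floor('D')` rather than `dt.date` is a design choice: the column stays a datetime type, so comparisons such as `date >= "2019-08-01"` keep working.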

Let's look at duplicated values.

In [None]:
data.duplicated().sum()

There are 413 duplicated rows in our data.


In [None]:
data.loc[data.duplicated(keep=False), :]

It seems like some of the events for the same users appear twice.


Let's dive deeper into this problem.

In [None]:
for i in data[data.duplicated()].columns:
    print(i,':', data[data.duplicated()][i].nunique())

It looks like the duplicates span all five event types and 237 user IDs, which is a lot, and they occur at many different times.

In [None]:
data[data.duplicated()]['timestamp'].dt.date.unique()

The duplicated rows appear from July 30 to August 6.
This may be due to a technical problem. We will notify the data engineer and our colleagues about it.

For now, we will drop the duplicated values.

In [None]:
data=data.drop_duplicates()

In [None]:
data.shape

The duplicates are gone: 244,126 - 413 = 243,713 rows remain.


Find out how users reach the purchase stage. How many users actually make it to this stage? How many get stuck at previous stages? Which stages in particular?

Let's look at the event names.

In [None]:
data.stb.freq(['event_name'])

There are 34,118 events at the purchase stage.
There are 42,668 events at the add to cart stage.
Note that these are event counts, not unique users.

In [None]:
42668 - 34118

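There are 8,550 fewer purchase events than add to cart events. Because these are event counts, this is only a rough indication of where users get stuck; the per-user funnel is studied in Step 4.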

# Step 3. Study and check the data

**How many events are in the logs?**

In [None]:
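# total number of logged events and number of distinct event types
len(data), data['event_name'].nunique()

The log contains 243,713 events (244,126 rows minus the 413 duplicates) of 5 distinct event types.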

**How many users are in the logs?**

In [None]:
data['user_id'].nunique()

7551 users.

**What's the average number of events per user?**

In [None]:
data.groupby(['user_id'])['event_name'].count().mean()

The average number of events per user is 32.275.

**What period of time does the data cover? Find the maximum and the minimum date. Plot a histogram by date and time. Can you be sure that you have equally complete data for the entire period?** 


Let's look at the daily activity 

In [None]:
fig = px.histogram(data, x="timestamp")
fig.show()

In [None]:
data['timestamp'].min()

In [None]:
data['timestamp'].max()

We can see that most of the actions happened during the day.
The big dips are just the lower activity at night.
The data covers the period from 2019-07-25 04:43:36 to 2019-08-07 21:15:17.
The histogram shows far less activity before Aug 1, so the data appears to be complete only from Aug 1 onward.
We can't be sure about the data before Aug 1, so we will move forward with the data from Aug 1.
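
The cutoff above was read off the histogram. A minimal sketch of how the same decision could be made programmatically, assuming a series of daily event counts; the counts below and the 0.5 threshold factor are made-up assumptions, not values from our data.

```python
import pandas as pd

# Hypothetical daily event counts: a sparse tail followed by full days.
daily = pd.Series(
    [9, 31, 55, 105, 184, 412, 2030, 36141, 35554, 33282],
    index=pd.date_range('2019-07-25', periods=10, freq='D'),
)

# Call a day "complete" when it holds at least half of the median volume
# of the three busiest days (the 0.5 factor is an arbitrary assumption).
threshold = 0.5 * daily.nlargest(3).median()
first_complete_day = daily[daily >= threshold].index.min()

print(first_complete_day.date())
```

On the real log, a series like `daily` could be built by counting events per calendar day of the `timestamp` column.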

**Older events could end up in some users' logs for technical reasons, and this could skew the overall picture. Find the moment at which the data starts to be complete and ignore the earlier section. What period does the data actually represent?**

Let's cut the data and save it in a new data frame.

In [None]:
data_new = data.query('timestamp >= "2019-08-01"')

In [None]:
fig = px.histogram(data_new, x="timestamp")
fig.show()

That looks much better.

**Did you lose many events and users when excluding the older data?
Make sure you have users from all three experimental groups.**

In [None]:
data_new['event_name'].value_counts()

In [None]:
data['event_name'].value_counts()

In [None]:
event_lose= data['event_name'].value_counts() - data_new['event_name'].value_counts()
event_lose

We did lose some events, but relative to the size of the data the loss is small.

Let's see how many users we lost.

In [None]:
data['user_id'].nunique()

In [None]:
user_lose=data['user_id'].nunique()-data_new['user_id'].nunique()
user_lose

It looks like we lost 17 users, which is not a lot compared to the number of unique users in our experiment.
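
The task also asks us to make sure all three experimental groups are still present after the cut. A minimal sketch of that check on a toy log; the frames `before` and `after` and their values are hypothetical stand-ins for `data` and `data_new`.

```python
import pandas as pd

# Toy logs: `before` is the full log, `after` is the log with early rows cut.
before = pd.DataFrame({
    'user_id':       [1, 2, 2, 3, 4, 5],
    'experiment_id': [246, 246, 246, 247, 248, 248],
})
after = before.iloc[1:]  # pretend the first row predates the cutoff

users_before = before.groupby('experiment_id')['user_id'].nunique()
users_after = after.groupby('experiment_id')['user_id'].nunique()

# Users per group before and after the cut, and the share lost in each group.
summary = pd.DataFrame({'before': users_before, 'after': users_after})
summary['share_lost'] = 1 - summary['after'] / summary['before']
print(summary)

# Share of events lost overall.
events_lost_share = 1 - len(after) / len(before)
print(events_lost_share)
```

A group that disappeared completely would show up as `NaN` in the `after` column, which makes the problem easy to spot.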

# Step 4. Study the event funnel

**See what events are in the logs and their frequency of occurrence. Sort them by frequency.**

In [None]:
data_events=data_new.groupby(['event_name'])['user_id'].count().sort_values(ascending=False).reset_index()
data_events

MainScreenAppear is in first place, OffersScreenAppear second, CartScreenAppear third, PaymentScreenSuccessful fourth, and Tutorial fifth.
The event counts decrease in a logical order toward our primary goal, a purchase. The Tutorial does not appear to be part of the primary funnel.

**Find the number of users who performed each of these actions. Sort the events by the number of users. Calculate the proportion of users who performed the action at least once.**

In [None]:
proportion=data_new.groupby(['event_name'])['user_id'].nunique().sort_values(ascending=False).reset_index()
proportion

The number of unique users per event follows the same logical funnel order.


In [None]:
# divide by the users in the filtered data, because the counts come from data_new
proportion['prc']=proportion['user_id']/data_new['user_id'].nunique()
proportion

In [None]:
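# sort the whole table by share; assigning a sorted column back would be realigned by index and change nothing
proportion=proportion.sort_values(by='prc',ascending=False)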

In [None]:
data_new.groupby(['event_name'])['user_id'].nunique().sort_values(ascending=False)/data_new['user_id'].nunique()


In [None]:
fig=px.funnel(proportion,x='prc',y='event_name',color='event_name')
fig.show()

**In what order do you think the actions took place. Are all of them part of a single sequence? You don't need to take them into account when calculating the funnel.**

It seems that users move through the funnel from the main screen to the offers screen, then to the cart, and finally to the payment screen. The Tutorial event is not part of this primary funnel.

In [None]:
sorted_data=data_new[data_new['event_name']!='Tutorial'].sort_values(by=['user_id','timestamp'])

In [None]:
sorted_data.sample()

Users usually start their customer journey on the main screen. This very much makes sense.


Let's look at the customer journey of one of our users.


In [None]:
sorted_data[sorted_data.user_id==8619840625096464102]

In [None]:
def sequence(user):
    sorted_user=sorted_data[sorted_data['user_id']==user].sort_values(by=['user_id','timestamp'])
    return sorted_user['event_name'].drop_duplicates().to_list()

In [None]:
sequence(8619840625096464102)

This user starts on the main screen and goes straight to the cart page without viewing the offers screen, which means that users can skip funnel steps on their way to the purchase stage.

Let's create a path for all our users in our data.

In [None]:
sequence_empty=[]
for i in sorted_data.user_id.unique():
    sequence_empty.append([i,sequence(i)])

In [None]:
path_data=pd.DataFrame(sequence_empty,columns=['user','path'])
path_data

In [None]:
path_data['path'].value_counts()

The most common path for the users is to leave after visiting the main page.
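
Calling `sequence` once per user rescans the whole table each time, which can be slow on large logs. A minimal sketch of a vectorized alternative on a toy log; the frame `toy`, its values and the `' > '` separator are assumptions for illustration.

```python
import pandas as pd

# Toy event log (values are made up for illustration).
toy = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2, 3],
    'event_name': ['MainScreenAppear', 'CartScreenAppear', 'MainScreenAppear',
                   'MainScreenAppear', 'OffersScreenAppear', 'MainScreenAppear'],
    'timestamp': pd.to_datetime([
        '2019-08-01 10:00', '2019-08-01 10:05', '2019-08-01 10:07',
        '2019-08-01 11:00', '2019-08-01 11:02', '2019-08-02 09:00']),
})

# Keep each user's first occurrence of every event, in time order,
# then join the event names into one path string per user.
paths = (
    toy.sort_values(['user_id', 'timestamp'])
       .drop_duplicates(['user_id', 'event_name'])
       .groupby('user_id')['event_name']
       .agg(' > '.join)
)

print(paths.value_counts())
```

Storing the path as a string instead of a list keeps it hashable, so `value_counts` works on it directly.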

**Use the event funnel to find the share of users that proceed from each stage to the next. (For instance, for the sequence of events A → B → C, calculate the ratio of users at stage B to the number of users at stage A and the ratio of users at stage C to the number at stage B.)**

In [None]:
data_new=data_new[data_new['event_name'] != 'Tutorial']

In [None]:
funnel_shift=data_new.groupby(['event_name'])['user_id'].nunique().sort_values(ascending=False).reset_index()
funnel_shift

In [None]:
funnel_shift['perc_ch']=funnel_shift['user_id'].pct_change()
funnel_shift

**At what stage do you lose the most users?
What share of users make the entire journey from their first event to payment?**

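The largest loss happens right after the main screen: about 38% of users never open the offers screen.
The 61.9% figure is the complement of that drop, i.e. the share of users who proceed from the main screen to the offers screen, not the share lost before payment. The share who make the entire journey is the number of users at the payment stage divided by the number of users at the main screen.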
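
To keep the step-by-step conversion, the step-by-step loss and the overall conversion apart, here is a minimal sketch on hypothetical user counts; the numbers are made up and only the stage names come from the logs.

```python
import pandas as pd

# Hypothetical unique-user counts per funnel stage, in funnel order.
funnel = pd.Series(
    [1000, 600, 480, 450],
    index=['MainScreenAppear', 'OffersScreenAppear',
           'CartScreenAppear', 'PaymentScreenSuccessful'],
)

step_conversion = funnel / funnel.shift(1)  # share proceeding from the previous stage
step_loss = 1 - step_conversion             # share lost at each step
overall = funnel.iloc[-1] / funnel.iloc[0]  # first stage -> payment

print(step_conversion.round(3))
print('largest drop happens before:', step_loss.idxmax())
print('overall conversion:', overall)
```

Note that `pct_change`, used in the cell above, returns the same information as `step_conversion - 1`, so its values are negative.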

In [None]:
funnel_by_groups=[]
for i in data_new.experiment_id.unique():
    group=data_new[data_new.experiment_id==i].groupby(['event_name','experiment_id'])['user_id'].nunique().reset_index().sort_values(by='user_id',ascending=False)
    display(group)
    funnel_by_groups.append(group)
    

In [None]:
funnel_by_groups=pd.concat(funnel_by_groups)
funnel_by_groups

In [None]:
fig=px.funnel(funnel_by_groups,x='user_id',y='event_name',color='experiment_id')
fig.show()

# Step 5. Study the results of the experiment

- How many users are there in each group?
- We have two control groups in the A/A test, where we check our mechanisms and calculations. See if there is a statistically significant difference between samples 246 and 247:
    - Select the most popular event. In each of the control groups, find the number of users who performed this action. Find their share. Check whether the difference between the groups is statistically significant. Repeat the procedure for all other events (it will save time if you create a special function for this test). Can you confirm that the groups were split properly? (A/A: 246 vs 247)
- Do the same thing for the group with altered fonts (A/B).
    - Compare the results with those of each of the control groups for each event in isolation (247 vs 248 and 246 vs 248).
    - Compare the results with the combined results for the control groups (247+246 vs 248).
    - What conclusions can you draw from the experiment?
- What significance level have you set to test the statistical hypotheses mentioned above? Calculate how many statistical hypothesis tests you carried out. With a statistical significance level of 0.1, one in 10 results could be false. What should the significance level be? If you want to change it, run through the previous steps again and check your conclusions.

**How many users are there in each group?**

In [None]:
data_new.groupby(['experiment_id'])['user_id'].nunique()

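There are 2489 users in the first control group (246).
There are 2520 users in the second control group (247).
There are 2542 users in the test group (248).
These counts add up to 7551, the number of users before the early data was cut, so they may come from a run on the unfiltered data and are worth re-checking against data_new.
Let's check whether any user ID appears in more than one group.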

In [None]:
data_new.groupby(['user_id'])['experiment_id'].nunique().reset_index().query('experiment_id>1')

**We have two control groups in the A/A test, where we check our mechanisms and calculations. 
See if there is a statistically significant difference between samples 246 and 247:** 

Requirements for the proportion test:

- The sampling method is simple random sampling.
- Each sample point can result in just two possible outcomes. We call one of these outcomes a success and the other a failure.
- The sample includes at least 10 successes and 10 failures.
- The population size is at least 20 times as big as the sample size.

In [None]:
from scipy import stats as st
import numpy as np
import math as mth

# Template z-test for two proportions. The numbers below are hard-coded
# example values, not the sizes of our experiment groups.
alpha = .05 # significance level

successes = np.array([78, 120])
trials = np.array([830, 909])

# success proportion in the first group:
p1 = successes[0]/trials[0]

# success proportion in the second group:
p2 = successes[1]/trials[1]

# success proportion in the combined dataset:
p_combined = (successes[0] + successes[1]) / (trials[0] + trials[1])

# the difference between the datasets' proportions
difference = p1 - p2

# calculating the statistic in standard deviations of the standard normal distribution
z_value = difference / mth.sqrt(p_combined * (1 - p_combined) * (1/trials[0] + 1/trials[1]))

# setting up the standard normal distribution (mean 0, standard deviation 1)
distr = st.norm(0, 1)

# two-sided p-value
p_value = (1 - distr.cdf(abs(z_value))) * 2

print('p-value: ', p_value)

if (p_value < alpha):
    print("Rejecting the null hypothesis: there is a significant difference between the proportions")
else:
    print("Failed to reject the null hypothesis: there is no reason to consider the proportions different")

In [None]:
# 246 and 247 are the control groups, 248 is the test group
control_1=data_new[data_new.experiment_id==246]['user_id'].unique()
control_2=data_new[data_new.experiment_id==247]['user_id'].unique()
test=data_new[data_new.experiment_id==248]['user_id'].unique()


Suppose that in the middle of the project you get a pivot table like the one below and need to compare the conversion of two groups for different events. You then need to check whether the two groups differ significantly for each event.

In [None]:
pivot=data_new.pivot_table(index='event_name', columns='experiment_id',values='user_id',aggfunc='nunique').reset_index()

In [None]:
pivot

In [None]:
pivot[pivot.event_name=='CartScreenAppear'][246].iloc[0]

We now have arrays with the users from each group, which we can use when calculating the proportions.

We can read the number of successes directly from the pivot table.

In [None]:
def check_hypothesis(group1,group2,event,alpha=0.05):
    success1=pivot[pivot.event_name==event][group1].iloc[0]
    success2=pivot[pivot.event_name==event][group2].iloc[0]
    
    trials1=data_new[data_new.experiment_id==group1]['user_id'].nunique()
    trials2=data_new[data_new.experiment_id==group2]['user_id'].nunique()
    
    
    # success proportion in the first group:
    p1 = success1/trials1

    # success proportion in the second group:
    p2 = success2/trials2

    # success proportion in the combined dataset:
    p_combined = (success1 + success2) / (trials1 + trials2)

    # the difference between the datasets' proportions
    difference = p1 - p2
    # z statistic: the difference measured in standard errors
    z_value = difference / mth.sqrt(p_combined * (1 - p_combined) * (1/trials1 + 1/trials2))

    # setting up the standard normal distribution (mean 0, standard deviation 1)
    distr = st.norm(0, 1)

    # two-sided p-value
    p_value = (1 - distr.cdf(abs(z_value))) * 2

    print('p-value: ', p_value)

    if (p_value < alpha):
        print("Rejecting the null hypothesis for", event,"and groups", group1,group2)
    else:
        print("Failed to reject the null hypothesis for",event,"and groups", group1,group2 )
    

In [None]:
check_hypothesis(246,247,'CartScreenAppear',alpha=0.05)

What are we actually checking? Whether conversion (a proportion) differs statistically between groups: "Is the share of users who triggered CartScreenAppear, out of all users in the group, statistically different in one group than in the other?"

In [None]:
for i in pivot.event_name.unique():
    check_hypothesis(246,247,i,alpha=0.05)

Thus, knowing the event name and the group ID, we can read the number of successes for any event from the pivot table. Keep in mind that the number of trials is always the same: the number of users in the group.

Now we can say that the split was done correctly, because we failed to reject the null hypothesis for every event when comparing the two control groups.

**Do the same thing for the group with altered fonts. (A/B)
Compare the results with those of each of the control groups for each event in isolation. (247 vs 248 and 246 vs 248)**

In [None]:
for i in pivot.event_name.unique():
    check_hypothesis(247,248,i,alpha=0.05)

In [None]:
for i in pivot.event_name.unique():
    check_hypothesis(246,248,i,alpha=0.05)

We failed to reject the null hypothesis for every event in both comparisons, so the test group does not differ significantly from either control group.

**Compare the results with the combined results for the control groups.(247+246 vs 248)
What conclusions can you draw from the experiment?**

In [None]:
pivot

In [None]:
pivot.columns

In [None]:
pivot.columns = ['event_name', 'control_1', 'control_2', 'test']

In [None]:
pivot['combined_control'] = pivot['control_1'] + pivot['control_2']
pivot

In [None]:
from scipy.stats import levene

In [None]:
levene(pivot['combined_control'], pivot['test'])

In [None]:
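# take the p-value from the test result instead of hard-coding it (0.4867 in the recorded run)
Levene_pvalue = levene(pivot['combined_control'], pivot['test']).pvalue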

In [None]:
Levene_alpha = 0.05

if (Levene_pvalue < Levene_alpha):
    print("Rejecting the null hypothesis")
else:
    print("Failed to reject the null hypothesis")

We can also see it visually through a histogram.

In [None]:
pivot.hist('combined_control', bins=100)
pivot.hist('test', bins=100)
plt.show()

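As we can see, each count is higher in combined_control simply because it pools two groups, while the shape of the two distributions is almost identical. Note that Levene's test compares the variances of the per-event user counts, not the conversion rates, so it is only a rough check; a proportion test on the pooled control counts would mirror the earlier comparisons.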
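
A proportion-based version of the combined comparison might look like the sketch below. It is not the method used in the cells above: it pools the two control groups by summing their successes and their users, then applies the same two-proportion z-test as `check_hypothesis`. The function name `proportions_z_test` and all counts are hypothetical.

```python
from math import erfc, sqrt

def proportions_z_test(success1, trials1, success2, trials2):
    """Two-sided z-test for equality of two proportions; returns the p-value."""
    p1, p2 = success1 / trials1, success2 / trials2
    p_pooled = (success1 + success2) / (trials1 + trials2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / trials1 + 1 / trials2))
    z = (p1 - p2) / se
    # For a standard normal, 2 * (1 - cdf(|z|)) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))

# Hypothetical counts: (users who reached one event, all users in the group).
control_1 = (1200, 2500)
control_2 = (1210, 2500)
test = (1190, 2500)

# Pool the two control groups by summing successes and trials.
pooled = (control_1[0] + control_2[0], control_1[1] + control_2[1])

p_value = proportions_z_test(*pooled, *test)
print('p-value:', p_value)
```

Running this once per event would add four tests for the combined comparison instead of one, which matters when counting tests for the significance-level correction.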

**What significance level have you set to test the statistical hypotheses mentioned above? Calculate how many statistical hypothesis tests you carried out. With a statistical significance level of 0.1, one in 10 results could be false. What should the significance level be? If you want to change it, run through the previous steps again and check your conclusions.**

Let's count the tests we ran:

- 246 vs. 247 (A/A): four tests, one per event (without the Tutorial)
- 247 vs. 248: four tests
- 246 vs. 248: four tests
- 246+247 vs. 248: one test

That gives 4 + 4 + 4 + 1 = 13 tests in total.

In [None]:
corrected_alpha=0.05/13
corrected_alpha

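Our corrected alpha is 0.05 / 13 = 0.0038461538461538464, or about 0.0038. Since every test already failed to reject the null hypothesis at 0.05, the stricter level does not change any conclusion.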
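
The arithmetic behind the correction can be sketched as follows. Dividing alpha by the number of tests is the Bonferroni correction; the Šidák formula is shown only as a common alternative and is not used elsewhere in this notebook. The family-wise error estimate assumes the tests are independent, which is an assumption.

```python
alpha = 0.05
n_tests = 13  # number of hypothesis tests counted in this notebook

# Bonferroni: split alpha evenly across the tests.
bonferroni_alpha = alpha / n_tests

# Sidak: slightly less conservative, exact for independent tests.
sidak_alpha = 1 - (1 - alpha) ** (1 / n_tests)

# Chance of at least one false positive across independent tests
# if every test were run at the uncorrected alpha.
fwer_uncorrected = 1 - (1 - alpha) ** n_tests

print(bonferroni_alpha, sidak_alpha, fwer_uncorrected)
```

The last value shows why a correction is needed: with 13 uncorrected tests the chance of at least one false positive is far above the nominal 5%.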

# General conclusion

**First, we load the data and import all the libraries.**


**Prepare our data for analysis:**


- We renamed the columns for better readability.
- We explored our data and checked for missing values.
- We found duplicated rows, possibly caused by a technical problem.
- We dropped those duplicated rows.
- We found 34,118 events at the purchase stage and 42,668 events at the add to cart stage.


**Study and check the data:**


- We found five event types and 7551 users in our experiment.
- The average number of events per user is 32.275.
- We noticed that the data before Aug 1 was incomplete, so we removed it and moved forward with the data from Aug 1 onward.


**Study the event funnel:**


- We found that users move through the funnel from the main screen to the offers screen, then to the cart, and finally to the payment screen. The Tutorial event is not part of this primary funnel.

- We found that the most common path is to leave after visiting the main page.

- We found that the largest loss happens after the main screen: about 38% of users do not proceed to the offers screen, while 61.9% do.


**Study the results of the experiment:**


- We found that there are 2489 users in the first control group.
- There are 2520 users in the second control group.
- There are 2542 users in the test group.

- The template z-test with hard-coded example numbers rejected the null hypothesis, but that result does not describe groups 246 and 247.


- We failed to reject the null hypothesis of no difference in the conversion of each event between groups 246 and 247, so the split looks correct.


- We failed to reject the null hypothesis of no difference in the conversion of each event between groups 247 and 248.


- We failed to reject the null hypothesis of no difference in the conversion of each event between groups 246 and 248.


- We combined the results of the two control groups and compared them with the test group using Levene's test (p-value of about 0.487); we failed to reject the null hypothesis.


- We also looked at the distributions of the groups visually and saw that they are almost identical.


- We used the Bonferroni correction for 13 tests and found that the corrected alpha should be 0.0038461538461538464 (about 0.0038).

**The conclusion is that the font change did not produce a statistically significant difference in conversion at any funnel step, so there is no evidence that the new fonts put users off. The company can move on to testing other changes.**


