## Project Description:
We work at a startup that sells food products, and we need to understand how the users of our mobile application behave.
1. Study the sales funnel: how do users reach the purchase? How many users make it to the purchase, how many get stuck at earlier steps, and at which steps exactly?
2. Examine the results of an A/A/B experiment. The designers wanted to change the fonts throughout the application, and the managers were afraid that users would find the new fonts unfamiliar. We agreed to make the decision based on the results of an A/A/B test. Users were divided into 3 groups: 2 control groups with the old fonts and one experimental group with the new ones. Let's find out which font is better.
3. Creating two A groups instead of one has certain advantages. If the two control groups turn out to be equal, we can be more confident in the accuracy of the test. If there are significant differences between the two A groups, this helps to detect the factors that distorted the results. Comparing the control groups also helps to understand how much time and data further tests will require.

The general analytics and the A/A/B experiment use the same data. In real projects experiments are always running, and analysts assess the quality of the application on the overall data, regardless of which experiment a user belongs to.

#### Data Description:
* Each log entry is a user action or event. 
* eventName — event name;
* DeviceIDHash — unique user ID;
* EventTimestamp — time of the event;
* ExpId — experiment number: 246 and 247 are control groups, and 248 is experimental.

## 1. Let's look at the data and import the necessary libraries

In [None]:
import math

import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.subplots as sp
from plotly.subplots import make_subplots
from scipy import stats as st

In [None]:
try:
    logs = pd.read_csv('/datasets/logs_exp.csv', sep='\t')
except FileNotFoundError:
    # fall back to the remote copy when the local file is absent
    logs = pd.read_csv('https://code.s3.yandex.net//datasets/logs_exp.csv', sep='\t')


In [None]:
logs.head()

In [None]:
logs.info()

There are no missing values. The column names do not follow Python naming conventions (snake_case), and the EventTimestamp field, which holds the event time, is stored as int64.

## 2. Prepare the data

In [None]:
logs.duplicated().sum() # check duplicates

In [None]:
logs = logs.drop_duplicates().reset_index(drop=True) # delete exact duplicates

In [None]:
# convert CamelCase column names to snake_case
logs.columns = logs.columns.str.replace(r'(?<=[a-z])(?=[A-Z])', '_', regex=True).str.lower()
logs['date_time'] = pd.to_datetime(logs['event_timestamp'], unit='s')  # event time as datetime
logs['date'] = logs['date_time'].dt.normalize()  # calendar date (midnight) as datetime64
logs.head()

We converted the column names to snake_case, checked the data for missing values, added date and time columns in datetime format, and removed exact duplicates.
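To make the renaming step easier to follow, here is a minimal sketch of what the lookaround pattern does, using only the standard library. The column names are taken from the data description above and are an assumption about the exact spelling in the file; `to_snake_case` is a hypothetical helper name.

```python
import re

def to_snake_case(name):
    # Insert "_" at every position between a lowercase and an uppercase letter,
    # then lowercase the whole name.
    return re.sub(r'(?<=[a-z])(?=[A-Z])', '_', name).lower()

# Column names as listed in the data description (assumed spelling)
columns = ['eventName', 'DeviceIDHash', 'EventTimestamp', 'ExpId']
renamed = [to_snake_case(c) for c in columns]
print(renamed)
```

Note that a run of capitals such as `IDHash` is not split, which is why the user column ends up as `device_idhash`.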

## 3. Check the data

In [None]:
logs.nunique() # unique data

* 5 event names.
* 7551 unique user IDs
* 3 experimental group numbers: 246 and 247 — control groups, and 248 — experimental.
* 14 dates on which observations were carried out

In [None]:
print(f"Total events in the log: {logs['event_name'].count()}")

In [None]:
print(f"Average number of events per user: {round(logs['event_name'].count() / logs['device_idhash'].nunique())}")

The observations span 14 dates.
Let's find the start and end dates of the period.

In [None]:
print(f"Minimum observation date: {logs['date_time'].min()}")
print(f"Maximum observation date: {logs['date_time'].max()}")

In [None]:
fig = go.Figure(data=[go.Histogram(x=logs['date_time'])])

fig.update_layout(
     title_text='Time distribution by dates',
     xaxis_title_text='Date',
     font=dict(size=12),
     width=800,
     height=500
)
fig.show()

The graph shows that the data is not equally complete over the whole period: before 01.08.2019 there are very few events, so the data from 25.07.2019 through 31.07.2019 is incomplete.
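The cutoff read off the histogram can also be checked numerically. The sketch below is only an illustration on toy data: the helper `complete_days` and the 50% threshold relative to the median daily count are assumptions, not part of the original analysis.

```python
from collections import Counter
from datetime import datetime, timezone
from statistics import median

def complete_days(timestamps, min_share=0.5):
    """Return dates whose event count is at least min_share of the median daily count."""
    per_day = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).date() for ts in timestamps
    )
    typical = median(per_day.values())
    return sorted(day for day, n in per_day.items() if n >= min_share * typical)

# Toy data (hypothetical): 2 events on 31 July, 40 events on each of 1-3 August 2019
base = int(datetime(2019, 7, 31, 12, tzinfo=timezone.utc).timestamp())
toy = [base, base + 60]
for day in range(1, 4):
    toy += [base + day * 86400 + i for i in range(40)]

days = complete_days(toy)
print(days[0])  # first date that passes the completeness check
```

On the real log the same idea would be applied to the `event_timestamp` column; the threshold should be chosen by looking at the histogram rather than taken as given.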

In [None]:

new_logs = logs.query("date > '2019-07-31'") # keep only the period with complete data (from 2019-08-01)
new_logs.nunique()


In [None]:
# number of events
print(
    '\nNumber of events after data cleaning:', new_logs.shape[0],
    '\nNumber of lost events:', logs.shape[0] - new_logs.shape[0],
    '\n% of events retained:', round(new_logs.shape[0] / logs.shape[0] * 100, 2)
)

# number of unique users
print(
    '\nNumber of users after data cleaning:', new_logs['device_idhash'].nunique(),
    '\nNumber of lost users:', logs['device_idhash'].nunique() - new_logs['device_idhash'].nunique(),
    '\n% of users retained:', round(new_logs['device_idhash'].nunique() / logs['device_idhash'].nunique() * 100, 2)
)

* 2826 events were lost, which is 1.16% of the total.
* 17 users were lost, which is 0.23% of the total.

The losses of both events and users are negligible.

### Output:
1. Total events in the log: 243713
2. Unique values:
   * 5 event names
   * 7551 unique user IDs
   * 3 experiment numbers: 246 and 247 are control groups, and 248 is experimental
   * 14 dates on which observations were carried out
3. Average number of events per user: 32
4. Observation period:
   * Minimum observation date: 2019-07-25 04:43:36
   * Maximum observation date: 2019-08-07 21:15:17
5. After discarding the incomplete data, 17 users and 2826 events were lost.

## 4. Explore the funnel of events

In [None]:
logs['event_name'].unique() 

1. MainScreenAppear - appearance of the main screen;
2. OffersScreenAppear - appearance of the screen with offers;
3. CartScreenAppear - appearance of the cart screen;
4. PaymentScreenSuccessful - the successful payment screen;
5. Tutorial - the user opens the tutorial (instructions).

In [None]:
new_logs['event_name'].value_counts() # how often each event occurs

In [None]:
# calculation of the number of users for all possible events.
event_user = (new_logs
                 .groupby('event_name')
                 .agg({'device_idhash':'nunique'})
                 .reset_index()
                 .sort_values(by='device_idhash', ascending=False)
                 .reset_index(drop=True)
                 )
event_user.columns = ['event_name', 'user_cnt']
event_user

The events rank the same by number of users as by event frequency.

Let's build a table for analysis.

In [None]:
# Group data by event, its frequency and the number of unique users
event_user = (
         new_logs.groupby('event_name').agg({'event_name':'count', 'device_idhash': 'nunique'})
         .rename(columns={'event_name':'event_cnt', 'device_idhash':'user_cnt'})
         .sort_values(by ='event_cnt', ascending=False).reset_index()
     )

# Share of unique users relative to the total number of users
event_user['rate %'] = round((event_user['user_cnt'] / len(new_logs['device_idhash'].unique())) * 100, 1)

# Shift the user_cnt column down one row to get the user count of the previous step
event_user['step'] = event_user['user_cnt'].shift()

# Conversion from the previous step, %
event_user['convers_step'] = round(event_user['user_cnt'] / event_user['step'] * 100, 1)

# Remove the helper column
event_user = event_user.drop(columns=['step'])

# The first step has no previous step, so its NaN conversion is set to 100%
event_user = event_user.fillna(100)

event_user

It can be seen that not all users open the main screen. Perhaps some of them leave at the tutorial stage, which could point to usability problems with the instructions or to instructions that are hard to understand. Alternatively, some users may reach other screens bypassing the main screen.

In [None]:
# Build a graph of event frequency
fig2 = go.Figure(go.Bar(
         x=event_user['event_cnt'],
         y=event_user['event_name'],
         orientation='h',
         textposition="inside",))

fig2.update_layout(
     title="Event frequency",
     font=dict(size=12),
     width=500,
     height=500,)

# Building a graph of the event funnel
fig1 = go.Figure(go.Funnel(
     y=event_user['event_name'],
     x=event_user['user_cnt'],
     textposition="inside",))

fig1.update_layout(
     title="Event Funnel",
     font=dict(size=12),
     width=500,
     height=500)

# Merging two graphs
fig = sp.make_subplots(rows=1, cols=2, subplot_titles=("Event Funnel", "Event Frequency"))
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig2.data[0], row=1, col=2)

# General name of graphs
fig.update_layout(
     title_x=0.50,
     title_text="App Events",
     font=dict(size=12))

fig.update_layout(width=1750, height=500)

fig.show()

The number of users per event:
1. 7,419 users opened the main screen.
2. 4,593 users opened the offer screen.
3. 3,734 users opened the cart screen.
4. 3,539 users reached the successful payment screen.
5. 840 users opened the tutorial, which indicates that not every user needs it to complete all the steps before payment.

Frequency of events:
1. Main screen, the most frequent event (117,328 times).
2. Offer screen (46,333 times).
3. Cart (42,303 times).
4. Successful payment (33,918 times).
5. Tutorial (1,005 times).

Users probably do not consider it necessary to read the instructions and open them only when needed.
Therefore, the Tutorial event can be left out of the funnel.

In [None]:
event_user = event_user.query('event_name != "Tutorial"').reset_index(drop=True) # drop Tutorial from the funnel

In [None]:
fig = go.Figure(go.Funnel(
    y=event_user['event_name'],
    x=event_user['user_cnt'],
    textposition="inside",
    textinfo="value+percent previous"))

fig.update_layout(
    title="Funnel of events", title_x=0.50,
    font=dict(size=12),
    width=800,
    height=500)

fig.show()

The funnel analysis shows that the transition to the offer screen is the hardest step for users: only 62% of the users who open the main screen pass it. At the next step, about 81% of the users who reached the offer screen go on to the cart. Finally, 95% of those who reach the cart complete the purchase. I would venture to assume that the application may have usability problems on the main screen, which lead to an outflow of users.
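The step-to-step percentages can be reproduced by hand from the user counts listed earlier. This is a small worked sketch in plain Python; the helper name `step_conversions` is hypothetical, and the counts are the ones reported above for the cleaned data.

```python
# Unique users per funnel step, as reported above (Tutorial excluded)
funnel = [
    ('MainScreenAppear', 7419),
    ('OffersScreenAppear', 4593),
    ('CartScreenAppear', 3734),
    ('PaymentScreenSuccessful', 3539),
]

def step_conversions(steps):
    """Conversion of each step relative to the previous one, in percent."""
    result = []
    for (_, prev_users), (name, users) in zip(steps, steps[1:]):
        result.append((name, round(users / prev_users * 100, 1)))
    return result

conversions = step_conversions(funnel)
overall = round(funnel[-1][1] / funnel[0][1] * 100, 1)  # first step to payment

for name, pct in conversions:
    print(f'{name}: {pct}% of the previous step')
print(f'Main screen to payment: {overall}%')
```

The largest relative loss is at the first transition, which is the step the funnel chart highlights.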

In [None]:
result = (
     event_user.loc[event_user['event_name']=='PaymentScreenSuccessful', 'user_cnt'].sum() /
     event_user.loc[event_user['event_name']=='MainScreenAppear', 'user_cnt'].sum() * 100
)
print("Percentage of users who passed from the first event to payment: {:.2f}%".format(result))

### Conclusion:
* Not all users open the main screen. Perhaps some leave at the tutorial stage, in which case the instructions in the application may have problems (complexity, usability, etc.).
* The events rank the same by number of users and by frequency:
   1. Main screen (the most frequent event).
   2. Offer screen.
   3. Cart.
   4. Successful payment.
   5. Tutorial.
* The transition to the offer screen is the hardest step for users.
* The application may have usability problems on the main screen, which lead to an outflow of users.
* The share of users who passed from the first event to payment is 47.70%.

## 5. Explore results of test

In [None]:
new_logs['exp_id'].value_counts() # distribution of events by groups

The experimental group 248 has the largest number of events, followed by group 246 and then group 247.
* The number of events in the experimental group 248 is 6.74% higher than in 246 and 9.79% higher than in 247.

In [None]:
new_logs.groupby('exp_id')['device_idhash'].nunique().sort_values(ascending=False) # number of users by group

The groups are of nearly equal size, with no large differences in the number of participants.
The experimental group has slightly more users than the others.

In [None]:
user_group_counts = new_logs.groupby('device_idhash')['exp_id'].nunique()
users_in_group = user_group_counts[user_group_counts > 1].index.tolist()
print(f"Number of users included in more than one group: {len(users_in_group)}")

In [None]:
def build_group_funnel(group):
    """Unique users per funnel event for one experiment group (Tutorial excluded)."""
    group_logs = new_logs.query('exp_id == @group')
    group_users = group_logs['device_idhash'].nunique()
    result = (
        group_logs
        .query('event_name != "Tutorial"')
        .groupby('event_name')
        .agg(device_count=('device_idhash', 'nunique'))
        .sort_values(by='device_count', ascending=False)
        .reset_index()
    )
    # Share of the group's own users who triggered each event
    # (the earlier version divided by the users of all groups together)
    result['ratio'] = (result['device_count'] / group_users).round(3)
    result.columns = ['event_name', str(group), f'{group}_pr_user']
    return result

# Tables for each group
event_246_pivot = build_group_funnel(246)
event_247_pivot = build_group_funnel(247)
event_248_pivot = build_group_funnel(248)

# Joining tables (the function has its own name so the table does not overwrite it)
event_group_pivot = (
    event_246_pivot
    .merge(event_247_pivot, on='event_name')
    .merge(event_248_pivot, on='event_name')
)

event_group_pivot

In [None]:
fig = go.Figure(go.Funnel(
    name='246',
    y = event_group_pivot['event_name'],
    x = event_group_pivot['246'],
    textinfo = 'value+percent initial'))

fig.add_trace(go.Funnel(
    name='247',
    y = event_group_pivot['event_name'],
    x = event_group_pivot['247'],
    textinfo = 'value+percent initial'))

fig.add_trace(go.Funnel(
    name='248',
    y = event_group_pivot['event_name'],
    x = event_group_pivot['248'],
    textinfo = 'value+percent initial'))

fig.update_layout(
    title="Funnel of events groups", title_x=0.50,
    font=dict(size=12),
    width=900,
    height=600)

fig.show()

At first glance, group 246 seems to be a little more successful than the rest of the groups at almost all stages. Groups 247 and 248 practically do not differ from each other (with the exception of the basket stage).

In [None]:
events = ['MainScreenAppear', 'OffersScreenAppear', 'CartScreenAppear', 'PaymentScreenSuccessful']
fig = make_subplots(rows=2, cols=2, subplot_titles=events)

for idx, event_name in enumerate(events):
    row, col = divmod(idx, 2)  # 0-based position in the 2x2 grid
    for exp_id in new_logs['exp_id'].unique():
        data = new_logs.query('event_name == @event_name and exp_id == @exp_id')['date_time']
        fig.add_trace(go.Histogram(x=data, name=f'Group {exp_id}'), row=row + 1, col=col + 1)

# Without row/col the axis settings apply to every subplot
fig.update_xaxes(title_text='Date', tickformat='%d-%m-%y')
# Each histogram bar counts log rows, i.e. events rather than unique users
fig.update_yaxes(title_text='Number of events')

fig.update_layout(
    title='Distribution of the number of events by groups and event dates',
    title_x=0.50,
    width=1600,
    height=800,
)

fig.show()

The daily distributions look similar across the groups, with small deviations on individual days. This lets us formulate and test hypotheses about the equality of the shares of users who reached each event in different groups. We can use the z-test for the difference of two proportions: with samples of this size, the difference of the sample proportions is approximately normally distributed, which allows us to test the hypothesis that the proportions in two groups are equal.
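For reference, the statistic behind this test can be written out as follows. The notation is introduced here for illustration: $x_i$ is the number of users in group $i$ who reached the event, and $n_i$ is the total number of users in that group.

$$
\hat p_1 = \frac{x_1}{n_1}, \qquad \hat p_2 = \frac{x_2}{n_2}, \qquad \hat p = \frac{x_1 + x_2}{n_1 + n_2}
$$

$$
z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p\,(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad p\text{-value} = 2\,\bigl(1 - \Phi(|z|)\bigr)
$$

Here $\Phi$ is the cumulative distribution function of the standard normal distribution, and the factor 2 makes the test two-sided.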

### Investigating the difference between A/A test samples

Let's define the hypotheses:
* Hypothesis H0: the shares of unique users who reached the funnel step are equal in the two samples.
* Hypothesis H1: there is a significant difference between the shares of unique users who reached the funnel step.
* We choose a significance level of 1%, which means that the probability of mistakenly rejecting H0 when it is true should not exceed 1%.

I lowered the significance level because we compare four pairs of groups on four events each, 16 tests in total. With multiple hypothesis testing, the risk of a false positive result, that is, of finding a difference where none actually exists, increases.

A function that runs a z-test of the hypothesis that the proportions in two samples are equal:

In [None]:
def z_test(exp_group_1, exp_group_2, event, alpha):
    """Two-sided z-test for equal shares of users who reached `event` in two groups."""
    group_1 = new_logs.query('exp_id == @exp_group_1')
    group_2 = new_logs.query('exp_id == @exp_group_2')

    # users who reached the tested event in each group
    event_row = event_group_pivot.query('event_name == @event')
    successes = np.array([event_row[str(exp_group_1)].sum(),
                          event_row[str(exp_group_2)].sum()])

    # total number of unique users in each group
    trials = np.array([group_1['device_idhash'].nunique(), group_2['device_idhash'].nunique()])

    # proportion of successes in each group
    p1 = successes[0] / trials[0]
    p2 = successes[1] / trials[1]

    # proportion of successes in the combined dataset
    p_combined = (successes[0] + successes[1]) / (trials[0] + trials[1])

    # difference in proportions
    difference = p1 - p2

    # test statistic, in standard deviations of the standard normal distribution
    z_value = difference / math.sqrt(p_combined * (1 - p_combined) * (1/trials[0] + 1/trials[1]))

    # standard normal distribution (mean 0, standard deviation 1)
    distr = st.norm(0, 1)

    # two-sided p-value
    p_value = (1 - distr.cdf(abs(z_value))) * 2

    print('p-value: ', round(p_value, 4))
    if p_value < alpha:
        print('Reject the null hypothesis: there is a significant difference between the proportions')
    else:
        print('Failed to reject the null hypothesis: there is no reason to consider the proportions different.')

In [None]:
z_test(246, 247, 'MainScreenAppear', 0.01)

In [None]:
z_test(246, 247, 'OffersScreenAppear', 0.01)

In [None]:
z_test(246, 247, 'CartScreenAppear', 0.01)

In [None]:
z_test(246, 247, 'PaymentScreenSuccessful', 0.01)

#### Conclusion:
None of the A/A tests found a statistically significant difference in the proportions of users who performed each event, so we can conclude that the split into groups works correctly.
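Why an A/A comparison is a useful sanity check can be illustrated with a small simulation. The sketch below uses only the standard library and entirely hypothetical parameters (group size, true share, number of repetitions): when both groups are drawn from the same population, the z-test should reject H0 in roughly `alpha` of the cases, so frequent rejections in a real A/A comparison would point to a problem with the split.

```python
import math
import random

def z_test_p_value(x1, n1, x2, n2):
    """Two-sided p-value of the pooled two-proportion z-test."""
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # identical degenerate samples: no evidence of a difference
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def simulate_aa(n_tests=400, n_users=500, true_share=0.6, alpha=0.01, seed=0):
    """Share of simulated A/A tests that (falsely) reject H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_tests):
        # both groups come from the same population, so H0 is true by construction
        x1 = sum(rng.random() < true_share for _ in range(n_users))
        x2 = sum(rng.random() < true_share for _ in range(n_users))
        if z_test_p_value(x1, n_users, x2, n_users) < alpha:
            rejections += 1
    return rejections / n_tests

false_positive_rate = simulate_aa()
print(false_positive_rate)  # expected to be close to alpha
```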

### Investigation of the difference between A/B test samples

In [None]:
event_group_pivot['Cluster_246_247'] = (event_group_pivot['246'] + event_group_pivot['247'])
event_group_pivot['Cluster_246_247'] = event_group_pivot['Cluster_246_247'].astype('int')
event_group_pivot

Function for comparison with a combined group.

In [None]:
def z_test_united(exp_group_2, event, alpha):
    """Same z-test, comparing the combined control groups (246 + 247) with `exp_group_2`."""
    group_1 = new_logs.query('exp_id == 246 | exp_id == 247')
    group_2 = new_logs.query('exp_id == @exp_group_2')

    # users who reached the event: combined control groups vs the tested group
    event_row = event_group_pivot.query('event_name == @event')
    successes = np.array([event_row['Cluster_246_247'].sum(),
                          event_row[str(exp_group_2)].sum()])
    # total number of unique users in each sample
    trials = np.array([group_1['device_idhash'].nunique(), group_2['device_idhash'].nunique()])

    p1 = successes[0] / trials[0]
    p2 = successes[1] / trials[1]
    p_combined = (successes[0] + successes[1]) / (trials[0] + trials[1])

    difference = p1 - p2

    z_value = difference / math.sqrt(p_combined * (1 - p_combined) * (1/trials[0] + 1/trials[1]))
    distr = st.norm(0, 1)
    p_value = (1 - distr.cdf(abs(z_value))) * 2  # two-sided p-value

    print('p-value: ', round(p_value, 4))
    if p_value < alpha:
        print('Reject the null hypothesis: there is a significant difference between the proportions')
    else:
        print('Failed to reject the null hypothesis: there is no reason to consider the proportions different.')

#### A/B-test for the "MainScreenAppear" event

In [None]:
z_test(246, 248, 'MainScreenAppear', 0.01)
z_test(247, 248, 'MainScreenAppear', 0.01)
z_test_united(248, 'MainScreenAppear', 0.01)

##### There is no reason to consider the shares at the first step different.

#### A/B-test for the "OffersScreenAppear" event

In [None]:
z_test(246, 248, 'OffersScreenAppear', 0.01)
z_test(247, 248, 'OffersScreenAppear', 0.01)
z_test_united(248, 'OffersScreenAppear', 0.01)

##### There is no reason to consider the shares at the second step different.

#### A/B-test for the 'CartScreenAppear' event

In [None]:
z_test(246, 248, 'CartScreenAppear', 0.01)
z_test(247, 248, 'CartScreenAppear', 0.01)
z_test_united(248, 'CartScreenAppear', 0.01)

##### There is no reason to consider the shares in the third step different.

#### A/B-test for the 'PaymentScreenSuccessful' event

In [None]:
z_test(246, 248, 'PaymentScreenSuccessful', 0.01)
z_test(247, 248, 'PaymentScreenSuccessful', 0.01)
z_test_united(248, 'PaymentScreenSuccessful', 0.01)

##### And at the fourth step, no significant differences in the proportions between the groups were found.

##### Conclusion:
- 4 z-tests comparing the control groups 246 and 247 on each funnel event: no statistically significant differences between the groups were found, which means that the split into groups works correctly.

- 12 z-tests comparing the control groups (separately and combined) with the experimental group: no statistically significant differences between the conversions of the groups were found.

In total, 16 tests were conducted. To keep the family-wise probability of a false positive from growing too large, we lowered the significance level of each test to 0.01. Had there been more comparisons, we would have applied the Bonferroni correction to the significance level.
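The effect of running 16 tests can be quantified with a short calculation. This is a sketch under the simplifying assumption that the tests are independent (they are not exactly independent here, since the same users appear in several comparisons); the function names and the 5% family-wise target are illustrative choices.

```python
def family_wise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(target_fwer, n_tests):
    """Per-test significance level that keeps the family-wise rate below the target."""
    return target_fwer / n_tests

n_tests = 16
fwer = family_wise_error_rate(0.01, n_tests)  # per-test level used in this analysis
per_test = bonferroni_alpha(0.05, n_tests)    # assumed 5% family-wise target

print(f'Family-wise error rate at alpha = 0.01: {fwer:.3f}')
print(f'Bonferroni per-test alpha for a 5% target: {per_test}')
```

Under these assumptions, a per-test level of 0.01 still leaves roughly a 15% chance of at least one false positive across 16 tests, which is why a formal correction is worth considering when the number of comparisons grows.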


Based on the conducted A/B tests, we can conclude that the design change in the application (the new fonts) did not lead to statistically significant changes in user behavior. Users with the new fonts moved through the funnel in the same way as users with the old ones, so the design change had no significant impact on users.

## The main conclusion:
The data show that the application has usability problems at the transition from the main screen to the offer screen, which leads to an outflow of users. Changing the fonts did not lead to statistically significant changes in user behavior. The A/A tests also show that the split into groups works correctly. Overall, the experiment allows us to draw conclusions about the current state of the application and about the measures that could improve its usability.

In addition, the conclusions drawn from A/B testing depend on the chosen significance level. For example, at a significance level of 10% we would have found a significant difference between groups 246 and 248 in the share of users who opened the cart. However, at this level one test in ten can give a false positive result, and we conducted 16 tests. Therefore, the significance level should be chosen carefully, taking into account its impact on the results and conclusions.