# Mobile application for selling food products

We have event data from a startup mobile app that sells food products. We need to understand how users of the mobile app behave.

<u>The purpose of the project:</u>
- study the sales funnel;
- learn the user's path before purchasing;
- calculate how many users make a purchase;
- calculate how many users get “stuck” at earlier steps;
- find out at which steps users get “stuck”;
- study the results of the A/A/B experiment on changing fonts.<br>

<u>Work plan:</u>
1. We will conduct a general review of the provided data;
2. We will preprocess the data: handle gaps and duplicates, rename columns and transform values;
3. We will explore the data: distributions by different parameters and a check for anomalies;
4. Event funnel analysis: studying the sequence of steps and the conversion per step;
5. Analysis of conversion to an event based on the results of the A/A/B test.


## General information and preliminary data analysis

The event log data is provided in the file `/datasets/logs_exp.csv`, which stores the following information.

- event name;
- unique user identifier;
- time of the event;
- experimental group number.

### Importing libraries


In [None]:
import pandas as pd
import datetime
import numpy as np
import math as mth
from scipy import stats as st
from matplotlib import pyplot as plt
import matplotlib.colors as mcolors
from plotly import graph_objects as go

### Loading data

Let's load the data from the file `/datasets/logs_exp.csv`, using `\t` as the separator.


In [None]:
logs = pd.read_csv('logs_exp.csv', sep='\t')

Let's display the first rows of the table and examine the data.


In [None]:
logs.head()

Let's display general information about the table.


In [None]:
logs.info()

### Conclusion

From a preliminary review of the event log data, the following conclusions can be drawn:
- ***244,126 records*** were found in the file;
- there are ***no obvious gaps*** (`NaN` values) in any column, but gaps and duplicates still need to be examined carefully;
- it is necessary to rename the columns to shorten their names and make them more readable;
- the event time is stored as a numeric Unix timestamp in seconds, so it needs to be converted to a date and time column;
- the values of the experimental groups can be replaced with a more familiar string format for convenience (`A1`, `A2`, `B`).


## Data preparation


### Replacing column names

Let's rename the columns to a shorter, more familiar format: lower case with an underscore `_` between parts.


In [None]:
logs.columns = ['event_name', 'user_id', 'event_timestamp', 'group']

Let's output a few rows with the new column names.


In [None]:
logs.head(3)

### Checking for missing values


As we have already seen from calling the `info` method on the dataset, there are no obvious gaps in any column (the number of non-null values matches the number of records).

Let's check the unique values of the event names and make sure that each of them corresponds to a transition to a specific screen.


In [None]:
list(logs.event_name.unique())

We will also check the unique values of the experimental groups and make sure that there are exactly 3 of them.


In [None]:
list(logs.group.unique())

### Checking and processing duplicates


Let's check for duplicate rows in the entire dataset.


In [None]:
sum(logs.duplicated())

We found ***413 duplicate records***. Given that the data contains columns with event times, we can assume that the same event could have been recorded multiple times.

Let's filter out duplicate records.


In [None]:
logs.drop_duplicates(inplace=True)

The number of records has been reduced to ***243,713***.


In [None]:
len(logs.event_name)

We also found ***23,156*** records that are duplicated ONLY in the user and event time columns.<br><br>
It is not clear why several events could be recorded for one user at the same moment, so we will keep these records.


In [None]:
print("Count:", sum(logs.duplicated(subset=['user_id', 'event_timestamp'])))
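
To look at what such records contain, one option is to pull out every row of a duplicated (user, timestamp) pair and count the distinct event types in it. The sketch below runs on a small hypothetical log (the values are invented for illustration); only the column names follow the notebook.

```python
import pandas as pd

# Hypothetical toy log: user 1 has two different events sharing one timestamp.
toy = pd.DataFrame({
    "event_name": ["CartScreenAppear", "PaymentScreenSuccessful", "MainScreenAppear"],
    "user_id": [1, 1, 2],
    "event_timestamp": [1564029816, 1564029816, 1564029900],
})

# keep=False marks every row of a duplicated (user, timestamp) pair, not only the repeats.
same_moment = toy[toy.duplicated(subset=["user_id", "event_timestamp"], keep=False)]

# Count how many distinct event types share each (user, timestamp) pair.
types_per_moment = same_moment.groupby(["user_id", "event_timestamp"])["event_name"].nunique()
print(types_per_moment)
```

If most pairs contain different event types, the records are probably distinct events logged within the same second rather than repeated writes of one event.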

### Adding Date and Datetime Columns

Event date and time data is difficult to analyze by numeric value, so let's create 2 additional columns: date (`event_date`) and date and time (`event_datetime`).


In [None]:
logs['event_datetime'] = pd.to_datetime(logs['event_timestamp'], unit='s')
logs['event_date'] = logs['event_datetime'].dt.date
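
The `unit` argument of `pd.to_datetime` has to match how the timestamps were recorded. A quick sanity check is to decode a single value under both interpretations and see which one gives a plausible date. The raw value below is a hypothetical 10-digit example, not taken from the dataset.

```python
import pandas as pd

# Hypothetical raw value, chosen only to illustrate the order of magnitude.
raw_timestamp = 1564029816

as_seconds = pd.to_datetime(raw_timestamp, unit="s")
as_millis = pd.to_datetime(raw_timestamp, unit="ms")

print("Interpreted as seconds:     ", as_seconds)
print("Interpreted as milliseconds:", as_millis)
```

A 10-digit value read as seconds lands in a recent year, while the same value read as milliseconds collapses into January 1970.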

Let's output a fragment of the dataset and evaluate the result.


In [None]:
logs.sample(5)

### Renaming the values of the experimental groups

For readability, let's rename the group values to symbolic labels:
- control group `246` - `A1`;
- control group `247` - `A2`;
- test group `248` - `B`.


In [None]:
group_convert = lambda x: "B" if x == 248 else "A1" if x == 246 else "A2"
logs.group = logs.group.apply(group_convert)

Let's double-check the unique values.


In [None]:
list(logs.group.unique())

We can check and evaluate the result.


In [None]:
logs.sample(5)

### Conclusion

The following steps were taken in preparing the data for the study:
1. Columns were renamed;
2. Records were checked for missing values;
3. Fully duplicated records were found and removed;
4. Date and datetime columns were added;
5. The values of the control and test groups were renamed.


## Data Research


### Analysis of general statistical data of events


Let's calculate the general information:
- number of records in the dataset;
- number of unique events;
- number of unique users;
- average number of events per user.


In [None]:
print("Number of records:", len(logs.event_name))
print("Number of unique events:", len(logs.event_name.unique()))
print("Unique events:", ", ".join(logs.event_name.unique()))
print("Number of users:", len(logs.user_id.unique()))
print("Average number of events per user: {:.3}".format(len(logs.event_name) / len(logs.user_id.unique())))

Let's calculate the period during which the events occurred (14 days).


In [None]:
print("Observation start:", min(logs.event_datetime))
print("Observation end:", max(logs.event_datetime))
print("Observation period:", max(logs.event_datetime) - min(logs.event_datetime))

#### Analysis of the distribution of events by observation days

Let's plot the distribution of the number of events by day.


In [None]:
date_counts = logs.groupby('event_date').agg({'event_name': 'count'})

ax = date_counts.plot.bar(figsize=(11, 4), grid=True, legend=False)
res = ax.set_title("Number of events by observation date", fontsize=20)
res = ax.set_xlabel("Event date", fontsize=15)
res = ax.set_ylabel("Number of events", fontsize=15)
res = ax.set_xticklabels(list(map(lambda x: datetime.datetime.strftime(x, '%b %-d'), date_counts.index)), rotation=0, fontsize=13)
for p in ax.patches:
    ax.annotate(str(p.get_height()), (p.get_x()-0.1, p.get_height() + 500))

The chart shows that in the first week of the observation period (the July days) the number of events is several times lower than on the August days. A possible reason is technical: events from earlier days may have reached the log with a delay. In that case the July data cannot be considered complete, so it is better to filter it out.

Let's build the same bar charts of the distribution by day, but broken down by experimental group and by event type.


In [None]:
date_counts = pd.pivot_table(logs, index='event_date', columns='group', values='event_name', aggfunc='count')
event_counts = pd.pivot_table(logs, index='event_date', columns='event_name', values='user_id', aggfunc='count')

In [None]:
ax1 = plt.subplot(1, 2, 1)
date_counts.plot.bar(figsize=(20, 4), grid=True, ax=ax1)
ax1.legend(loc='upper left')
res = ax1.set_title("Number of events by observation date\nbroken down by experimental groups", fontsize=20)
res = ax1.set_xlabel("Event date", fontsize=15)
res = ax1.set_ylabel("Number of events", fontsize=15)
res = ax1.set_xticklabels(list(map(lambda x: datetime.datetime.strftime(x, '%b %-d'), date_counts.index)), rotation=0, fontsize=10)

ax2 = plt.subplot(1, 2, 2)
event_counts.plot.bar(grid=True, ax=ax2)
ax2.legend(loc='upper left')
res = ax2.set_title("Number of events by observation date\nbroken down by event type", fontsize=20)
res = ax2.set_xlabel("Event date", fontsize=15)
res = ax2.set_ylabel("Number of events", fontsize=15)
res = ax2.set_xticklabels(list(map(lambda x: datetime.datetime.strftime(x, '%b %-d'), event_counts.index)), rotation=0, fontsize=10)

From the graphs we see that the mix of experimental groups and event types does not depend on the day of observation, which suggests that the August data can be considered complete.

Let's calculate the share of events for July.


In [None]:
print("Number of events in July:", len(logs[logs.event_datetime.dt.month == 7].index))
print("Share of events in July: {:.1f}%".format(100 * len(logs[logs.event_datetime.dt.month == 7].index) / len(logs.index)))

The share of events for July is ***1.2%***. These events can be neglected and filtered out of the total sample. The distribution by experimental groups and event types on July days is similar to August, so this will not affect further research.

Finally, let's filter out the July days.


In [None]:
filtered_logs = logs[logs.event_datetime.dt.month == 8]

#### Anomalous peaks in the number of events over time

Let's plot the distribution of events over time. Each bin corresponds to the number of events sent within roughly half an hour.


In [None]:
(max(logs.event_datetime) - min(logs.event_datetime)) / (24 * 14 * 2)

In [None]:
ax = filtered_logs.event_datetime.hist(bins=24*7*2, figsize=(20, 5))
res = ax.set_title("Distribution of events over time", fontsize=20)
res = ax.set_ylabel("Number of events", fontsize=20)
res = ax.set_xlabel("Event time", fontsize=20)

From the graph we see that events are most often sent during the daytime period, which is logical. However, it can be noted that at certain points in time, the frequency of sending events jumps very strongly. This may be due to excessive user activity.

Let's plot a scatter plot of the number of events per user.


In [None]:
users_orders = filtered_logs.groupby('user_id').agg({'event_name': 'count'}).rename(columns={'event_name': 'events_count'})
xlabel = pd.Series(range(len(users_orders)))
ax = plt.scatter(xlabel, users_orders, alpha=0.3)
ax = plt.grid()
res = plt.title("Scatter plot of the number\nof events per user", fontsize=15)
res = plt.ylabel("Number of events", fontsize=15)
res = plt.xlabel("User index", fontsize=15)

From the graph we see that most users have between 0 and 200 events during the week; however, there are users whose number of events reaches 2,500.

Let's calculate the 50th, 75th, 95th, 99th and 99.5th percentiles of the number of events per user.


In [None]:
print(np.percentile(users_orders, [50, 75, 95, 99, 99.5]))

***1 percent of users*** produced between 200 and 2,500 events each. Considering that 75 percent of users have no more than 37 events, this 1 percent can have a significant impact on the overall analysis of user behavior.
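
A cutoff for "abnormally active" users can also be taken directly from the percentile instead of being typed in by hand. The sketch below shows the idea on hypothetical per-user counts (99 ordinary users and one very active one); the variable names are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical per-user event counts: 99 ordinary users and one very active user.
events_per_user = pd.Series(list(range(1, 100)) + [2500], name="events_count")

# Take the cutoff from the data instead of hard-coding it.
threshold = np.percentile(events_per_user, 99)
heavy_users = events_per_user[events_per_user > threshold]

print("99th percentile:", threshold)
print("Users above it:", len(heavy_users))
```

Deriving the threshold this way keeps the filter consistent if the input data or the date range changes.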

Let's calculate the total number of anomalous users, the number of events produced by them, and the share of events from anomalous users.


In [None]:
abnormal_user_ids = users_orders[users_orders.events_count > 201].index
abnormal_logs = filtered_logs[filtered_logs.user_id.isin(abnormal_user_ids)]
print("Number of users with 200+ events:", len(abnormal_user_ids))
print("Number of events from the most active users:", len(abnormal_logs))
print("Share of events from the most active users: {:.1f}%".format(100 * len(abnormal_logs) / len(filtered_logs)))


As we can see, ***1 percent of the most active users*** account for ***14.2% of events***. This means that this 1 percent can greatly distort the overall picture of how a typical user uses the application.

In addition, we will construct a similar distribution of events by time, but only for abnormally active users. For comparison, we will also construct a distribution by time of events from users not included in this 1 percent, having previously made a cut by the sample size equal to the number of events from abnormally active users (34272 records).


In [None]:
normal_logs = filtered_logs[~filtered_logs.user_id.isin(abnormal_user_ids)]
normal_logs.sample(34272).event_datetime.hist(bins=24*7*2, figsize=(20, 5), alpha=0.9, label="Regular users")
ax = abnormal_logs.event_datetime.hist(bins=24*7*2, figsize=(20, 5), alpha=0.7, label="Active users")
ax.legend()
res = ax.set_title("Distribution of events over time", fontsize=20)
res = ax.set_ylabel("Number of events", fontsize=20)
res = ax.set_xlabel("Event time", fontsize=20)

From the graph we can see that there are random half-hour periods in which the most active users send a large number of events, against a more even distribution for "normal" users.

This effect may be caused by delays in sending events to tracking or other technical problems.

Let's filter out events from ***1 percent of users*** and re-plot the event distribution graph over time.


In [None]:
ax = filtered_logs.event_datetime.hist(bins=24*7*2, figsize=(20, 5), alpha=0.9)
ax = normal_logs.event_datetime.hist(bins=24*7*2, figsize=(20, 5), alpha=0.7)
res = ax.set_title("Distribution of events over time", fontsize=20)
res = ax.set_ylabel("Number of events", fontsize=20)
res = ax.set_xlabel("Event time", fontsize=20)

You can see from the graph that we have solved the problem with abnormal peaks in a certain period of time. The distribution graph has become smoother and better reflects the picture of application use by time of day.


In [None]:
data = normal_logs

At the moment we have filtered ***15.2%*** of event records, of which about 14% are from the ***1% most active users*** and a few ***events from the month of July***.


In [None]:
print("Share of filtered events: {:.1f}%".format((1 - len(data.event_name) / len(logs.event_name)) * 100))

#### Check for overlapping experimental groups by users


Let's calculate how many users could be in several experimental groups, as this can lead to distortion of the A/B test results.


In [None]:
user_group_counts = data.groupby('user_id').agg({'group': lambda x: len(x.unique())})
len(user_group_counts[user_group_counts.group > 1])

No group intersections found. No anomalies.


#### Distribution by experimental groups


Let's count the number of events and users by experimental groups.


In [None]:
group_counts = data.groupby('group').agg({'event_name': 'count', 'user_id': 'nunique'})
group_counts = group_counts.rename(columns={'event_name': 'events', 'user_id': 'users'})
group_counts

Let's also build pie charts of the distribution of events and users by group.


In [None]:
ax1 = plt.subplot(1, 2, 1)
ax2 = plt.subplot(1, 2, 2)
group_counts.events.plot.pie(subplots=True, autopct='%1.1f%%', legend=False, ax=ax1)
group_counts.users.plot.pie(subplots=True, autopct='%1.1f%%', legend=False, ax=ax2)
res = ax1.set_title("Distribution of events\nby groups")
res = ax2.set_title("Distribution of users\nby groups")

According to the table and graphs, events and users are distributed evenly across groups. The share of each group ranges from ***32.6% to 34.3%*** of the total.
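
One way to back the "evenly distributed" statement with a test is a chi-square goodness-of-fit check against an equal split. This is a sketch under assumptions, not part of the original analysis: the counts below are hypothetical placeholders and should be replaced with the real `group_counts.users` values.

```python
from scipy import stats as st

# Hypothetical user counts per group (A1, A2, B); substitute the real column values.
users_per_group = [2450, 2480, 2500]

# Null hypothesis: users are split equally between the three groups.
chi2, p_value = st.chisquare(users_per_group)

print("chi2 = {:.3f}, p-value = {:.3f}".format(chi2, p_value))
```

A large p-value means the observed split is compatible with equal group sizes; a very small one would point to a sample-ratio problem worth investigating before comparing the groups.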


### Conclusion from the data study

Based on the results of the data study, the following conclusions were made:
- 15.2% of all entries were filtered out due to the most active users and the low number of events in July compared to August;
- no overlaps were found between users in the control and experimental groups;
- the distribution of events and users across groups can be considered uniform.


## Funnel Analysis


### Analysis by event type. Funnel study


Let's analyze the event funnel. First, we'll output the number of events, users, and the share of users who have performed at least one event by their type. We'll also output the cumulative dynamics for these metrics.


In [None]:
event_counts = data.groupby('event_name').agg({'event_name': 'count', 'user_id': 'nunique'}).rename(columns={'event_name': 'events', 'user_id': 'users'})
event_counts['users_share'] = np.round(100 * event_counts['users'] / len(data.user_id.unique()), 1)

eventByDate_countsCum = pd.pivot_table(data, index='event_date', columns='event_name', values='user_id', aggfunc='count').cumsum()

def usersCum(data):
    return pd.Series(data.event_date.unique()).apply(lambda x: data[data.event_date <= x].user_id.nunique()).to_list()

usersCums = dict((event_name, usersCum(data[data.event_name == event_name])) for event_name in data.event_name.unique())
usersCums['All'] = usersCum(data)
eventByDate_usersCum = pd.DataFrame(data=usersCums, index=list(data.event_date.unique()))
eventByDate_usersShareCum = eventByDate_usersCum.div(eventByDate_usersCum['All'], axis=0)
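
The cumulative number of unique users can also be obtained without re-filtering the log for every date: assign each user to the first date on which they appear and take a running sum. The following is a minimal sketch on a hypothetical toy log. It uses `datetime64` dates for simplicity, whereas the notebook stores `datetime.date` objects.

```python
import pandas as pd

# Hypothetical toy log with the same column names as the notebook.
toy = pd.DataFrame({
    "user_id": [1, 2, 1, 3, 2],
    "event_date": pd.to_datetime(
        ["2019-08-01", "2019-08-01", "2019-08-02", "2019-08-02", "2019-08-03"]
    ),
})

# Each user counts once, on the first date they appear.
first_seen = toy.groupby("user_id")["event_date"].min()

# New users per day, then a running total; days without new users keep the previous total.
all_dates = pd.Index(toy["event_date"].unique()).sort_values()
users_cum = first_seen.value_counts().reindex(all_dates, fill_value=0).cumsum()
print(users_cum)
```

Reindexing on the full list of dates keeps the result aligned even when an event type has no new users on some day.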

As you can see, the cumulative dynamics lines of the number of events and users are approximately similar in shape by event type. Accordingly, we can say that there is a clear funnel of steps from opening the screen to purchasing a product.

Based on the event type name and their number, we can reveal the sequence of steps from opening the application to purchasing:
1. `MainScreenAppear` - appearance of the main application screen;
2. `OffersScreenAppear` - page with product selection;
3. `CartScreenAppear` - cart page with selected products;
4. `PaymentScreenSuccessful` - page of successful payment for the order.

There is also a `Tutorial` page that introduces the user to the application. Judging by the small share of users with this event, it is probably sent either when a user opens the application for the first time and reads the tutorial to the end, or when the user opens the tutorial from a button on the main screen. In any case, this step can be dropped from the funnel, since it is clearly optional.


In [None]:
ax1 = plt.subplot(3, 2, 1)
ax2 = plt.subplot(3, 2, 2)
ax3 = plt.subplot(3, 2, 3)
ax4 = plt.subplot(3, 2, 4)
ax5 = plt.subplot(3, 2, 5)
ax6 = plt.subplot(3, 2, 6)   

event_names = list(data.event_name.unique())

colors = list(mcolors.TABLEAU_COLORS.values())
colors = sorted(colors, key=lambda c: tuple(mcolors.rgb_to_hsv(mcolors.to_rgb(c))))
colors_dict = dict((event_names[i], colors[i]) for i in range(len(event_names)))

event_counts_sorted = event_counts.events.sort_values(ascending=True)
event_counts_sorted.plot.barh(figsize=(20, 14), grid=True, ax=ax1, color=[colors_dict[x] for x in event_counts_sorted.index])
ax1.set_xlim((0, 135000))
ax1.set_title("Number of events by type", fontsize=18)
ax1.set_xlabel("Number of events", fontsize=16)
ax1.set_ylabel("Event type", fontsize=16)
for p in ax1.patches:
    ax1.annotate(str(p.get_width()), (p.get_width() + 1000, p.get_y() + 0.15), fontsize=15)

eventByDate_countsCum.plot(ax=ax2, grid=True, linewidth=4.0, color=[colors_dict[x] for x in eventByDate_countsCum.columns])
ax2.legend(loc='upper left')
ax2.set_title("Cumulative dynamics of event count by type", fontsize=18)
ax2.set_xlabel("Date", fontsize=16)
ax2.set_ylabel("Number of events", fontsize=16)

event_users = event_counts.users.sort_values(ascending=True)
event_users.plot.barh(grid=True, ax=ax3, color=[colors_dict[x] for x in event_users.index])
for p in ax3.patches:
    ax3.annotate(str(p.get_width()), (p.get_width() - 600, p.get_y() + 0.15), fontsize=15, color='white')
ax3.set_title("Number of users who performed a specific event", fontsize=18)
ax3.set_xlabel("Number of users", fontsize=16)
ax3.set_ylabel("Event type", fontsize=16)

event_usersCum = eventByDate_usersCum.drop('All', axis=1)
event_usersCum.plot(ax=ax4, grid=True, linewidth=4.0, color=[colors_dict[x] for x in event_usersCum.columns])
ax4.legend(loc='upper left')
ax4.set_title("Cumulative dynamics of user count\nby type of events performed", fontsize=18)
ax4.set_xlabel("Date", fontsize=16)
ax4.set_ylabel("Number of users", fontsize=16)
    
event_usersShare = event_counts.users_share.sort_values(ascending=True)
event_usersShare.plot.barh(grid=True, ax=ax5, color=[colors_dict[x] for x in event_usersShare.index])
for p in ax5.patches:
    ax5.annotate(str(p.get_width()) + '%', (p.get_width() - 9, p.get_y() + 0.1), fontsize=15, color='white')
ax5.set_xlim((0, 100))
ax5.set_title("Share of users who performed at least one event by type", fontsize=18)
ax5.set_xlabel("Share of users (%)", fontsize=16)
ax5.set_ylabel("Event type", fontsize=16)

event_usersShareCum = eventByDate_usersShareCum.drop('All', axis=1)
event_usersShareCum.plot(ax=ax6, grid=True, linewidth=4.0, color=[colors_dict[x] for x in event_usersShareCum.columns])
ax6.legend(loc='upper left')
ax6.set_title("Cumulative dynamics of the share of users\nwho performed at least one event by type", fontsize=18)
ax6.set_xlabel("Date", fontsize=16)
ax6.set_ylabel("Share of users", fontsize=16)

plt.tight_layout()
plt.show()

Let's add another visualization for funnels.


In [None]:
funnel_users = event_users.drop('Tutorial').sort_values(ascending=False)
funnel_colors = [colors_dict[event_name] for event_name in funnel_users.index.to_numpy()]
fig = go.Figure(go.Funnel(
    y=funnel_users.index.to_numpy(),
    x=funnel_users.to_numpy(),
    textinfo = "value",
    marker = {"color": funnel_colors}
))
fig.update_layout(
    title=dict(text="User funnel by events", font_size=20, x=0.5)
)
fig.show()

***Conclusion:***

A sequence of funnel steps was found that leads from the main screen to the purchase:
1. `MainScreenAppear` - appearance of the main application screen;
2. `OffersScreenAppear` - page with product selection;
3. `CartScreenAppear` - cart page with selected products;
4. `PaymentScreenSuccessful` - page of successful payment for the order.

Based on the number of clicks to the `Tutorial` page, this step is optional.


### Study of anomalous funnels


Let's first display the number of users who sent only 1 type of events.


In [None]:
user_events = data.groupby('user_id').agg({'event_name': lambda x: ",".join(sorted(x.unique()))})
user_events.rename(inplace=True, columns={'event_name': 'events'})

In [None]:
onlyEvents = pd.Series(event_names).apply(lambda x: len(user_events[user_events.events == x]))
onlyEvents.index = list(event_names)
onlyEvents.name = 'Only'
onlyEvents.to_frame()

From the data, we can see that in addition to those who only went to the main screen `MainScreenAppear`, there are users who only got to the pages `OffersScreenAppear` and `Tutorial`. It can be assumed that customers got to these pages through referral links, bypassing the main screen. There were also no customers who only got to the pages `CartScreenAppear` and `PaymentScreenSuccessful`, which is logical.

Let's count the number of users who sent at least 2 types of events.


In [None]:
users_events_double = dict((x, [len(user_events[np.logical_and(user_events.events.str.contains(y), user_events.events.str.contains(x))]) for y in event_names]) for x in event_names)
users_counts = pd.DataFrame(
    data=users_events_double,
    index=event_names,
)
users_counts.style.background_gradient(axis=None)

In [None]:
print(
    "Number of users who made a purchase 'PaymentScreenSuccessful' but skipped the cart 'CartScreenAppear':",
    users_counts.loc['PaymentScreenSuccessful', 'PaymentScreenSuccessful'] - users_counts.loc['CartScreenAppear', 'PaymentScreenSuccessful']
)
print(
    "Number of users who made a purchase 'PaymentScreenSuccessful' but skipped the offers page 'OffersScreenAppear':",
    users_counts.loc['PaymentScreenSuccessful', 'PaymentScreenSuccessful'] - users_counts.loc['OffersScreenAppear', 'PaymentScreenSuccessful']
)
print(
    "Number of users who made a purchase 'PaymentScreenSuccessful' but skipped the main page 'MainScreenAppear':",
    users_counts.loc['PaymentScreenSuccessful', 'PaymentScreenSuccessful'] - users_counts.loc['MainScreenAppear', 'PaymentScreenSuccessful']
)
print(
    "Number of users who reached the cart 'CartScreenAppear' but skipped the offers page 'OffersScreenAppear':",
    users_counts.loc['CartScreenAppear', 'CartScreenAppear'] - users_counts.loc['CartScreenAppear', 'OffersScreenAppear']
)

The data shows some odd numbers: there are users who appear to have skipped stages before making a purchase. Therefore, in the further study only users who did not skip stages of the funnel will be taken into account.


***Conclusion:***

Some users have an anomalous funnel: they skipped screens relative to the step sequence identified in the previous subsection.
When calculating the conversion per step, we will only count users who did not skip screens.



### Study of overall user conversion per step

Let's calculate the share of users who moved on to the next step:
1. `OffersScreenAppear` provided that the user has opened the main screen `MainScreenAppear`;
2. `CartScreenAppear` provided that the user has opened the screen with offers `OffersScreenAppear`;
3. `PaymentScreenSuccessful` provided that the user opened the cart screen `CartScreenAppear`.

First, let's count the number of users who reached the screens without skipping the stages.


In [None]:
main_events = user_events[user_events.events.str.contains('MainScreenAppear')]
offers_events = main_events[main_events.events.str.contains('OffersScreenAppear')]
cart_events = offers_events[offers_events.events.str.contains('CartScreenAppear')]
payment_events = cart_events[cart_events.events.str.contains('PaymentScreenSuccessful')]

In [None]:
print("Users who opened the main screen:\t\t", len(main_events))
print("Users who opened the offers screen:\t", len(offers_events))
print("Users who opened the cart:\t\t", len(cart_events))
print("Users who made a purchase:\t\t", len(payment_events))
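
The counting rule used above (a user stays in the funnel at a step only if all earlier steps are also present) can be illustrated on a small example. The users and their event sets below are hypothetical; the step names are the ones from the funnel.

```python
# Hypothetical users mapped to the set of event types each one produced.
toy_users = {
    "u1": {"MainScreenAppear"},
    "u2": {"MainScreenAppear", "OffersScreenAppear", "CartScreenAppear", "PaymentScreenSuccessful"},
    "u3": {"MainScreenAppear", "CartScreenAppear", "PaymentScreenSuccessful"},  # skipped offers
    "u4": {"MainScreenAppear", "OffersScreenAppear"},
}
steps = ["MainScreenAppear", "OffersScreenAppear", "CartScreenAppear", "PaymentScreenSuccessful"]

# A user stays in the funnel at step k only if all steps up to k are present.
reached = []
survivors = set(toy_users)
for step in steps:
    survivors = {u for u in survivors if step in toy_users[u]}
    reached.append(len(survivors))

# Conversion per step = users at this step / users at the previous step.
conversions = [curr / prev for prev, curr in zip(reached, reached[1:])]
print(reached)
print(conversions)
```

Note that user `u3` paid but is dropped at the offers step, which is exactly how users with skipped screens are excluded. This rule checks only the presence of events, not their order in time.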

Let's calculate the share of users who completed the next step of the funnel from the number of users who moved to the previous step.


In [None]:
funnel = pd.DataFrame(
    data={
    'share': [
        len(offers_events) / len(main_events),
        len(cart_events) / len(offers_events),
        len(payment_events) / len(cart_events),
    ]}, 
    index=['OffersScreenAppear', 'CartScreenAppear', 'PaymentScreenSuccessful']
)
funnel.style.format('{:.1%}'.format)

According to the table, the largest drop-off occurs between the main screen `MainScreenAppear` and the offers screen `OffersScreenAppear` (about ***40 percent*** of users are lost). This may be due to poor screen design, or because users who only want to get familiar with the application and assess its usefulness do not go on to `OffersScreenAppear`. In any case, this is a growth point that can be improved.

The conversion from the offers screen `OffersScreenAppear` to the cart screen `CartScreenAppear` is good (about ***21 percent*** of users drop off). Most likely, users leave at this step because they did not find what they needed or did not like the products in general.

The conversion from the cart screen `CartScreenAppear` to the purchase `PaymentScreenSuccessful` is also good (about ***6 percent*** churn), which suggests that the payment service works well. The small churn may be explained by users changing their mind about buying the product.

Let's also calculate the overall conversion: the share of users who completed the entire path from the main screen to the purchase `PaymentScreenSuccessful`, among those who landed on `MainScreenAppear` at least once.


In [None]:
print("Share of users who made a purchase: {:.1f}%".format(np.round(100 * len(payment_events) / len(main_events), 2)))

From the conversion data, we see that ***about 54.3%*** of users who opened the application at least once never made a purchase, so there are growth points for increasing revenue.

***Conclusion:***

Based on the conversion data, the following conclusions were made:
- the overall conversion rate to purchase among users who opened the application is ***45.7%***;
- the conversion per step from the main screen to the offers screen is <u>low</u> - ***60.0%***;
- the conversion per step from the offers screen to the cart is ***79.5%***;
- the conversion to purchase among users who opened the cart is <u>high</u> - ***95.7%***.
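
As a consistency check, the overall conversion should equal the product of the per-step conversions, because each step is computed only over users who passed the previous one. With the rounded values from the list above:

$$
C_{\text{overall}} = C_{\text{main}\to\text{offers}} \cdot C_{\text{offers}\to\text{cart}} \cdot C_{\text{cart}\to\text{payment}} \approx 0.600 \times 0.795 \times 0.957 \approx 0.456
$$

This is close to the reported 45.7%; the small gap comes from rounding the step values.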


### Study of user conversion per step, broken down by users who viewed / did not view the tutorial


Let's repeat the same calculations separately for users who viewed the `Tutorial` page and for those who did not, to see whether getting familiar with the application affects how it is used.

As we have already calculated, only ***11% of users*** viewed the tutorial, which indicates that this step is optional.

Now let's study the impact of tutorial review on step conversion.


In [None]:
main_events_with_tut = main_events[main_events.events.str.contains('Tutorial')]
offers_events_with_tut = offers_events[offers_events.events.str.contains('Tutorial')]
cart_events_with_tut = cart_events[cart_events.events.str.contains('Tutorial')]
payment_events_with_tut = payment_events[payment_events.events.str.contains('Tutorial')]

main_events_no_tut = main_events[~main_events.events.str.contains('Tutorial')]
offers_events_no_tut = offers_events[~offers_events.events.str.contains('Tutorial')]
cart_events_no_tut = cart_events[~cart_events.events.str.contains('Tutorial')]
payment_events_no_tut = payment_events[~payment_events.events.str.contains('Tutorial')]

In [None]:
print("Users who went through the tutorial and:")
print("- opened the main screen:\t", len(main_events_with_tut))
print("- opened the offers screen:\t", len(offers_events_with_tut))
print("- opened the cart:\t\t", len(cart_events_with_tut))
print("- made a purchase:\t\t", len(payment_events_with_tut))
print()
print("Users who did NOT go through the tutorial and:")
print("- opened the main screen:\t", len(main_events_no_tut))
print("- opened the offers screen:\t", len(offers_events_no_tut))
print("- opened the cart:\t\t", len(cart_events_no_tut))
print("- made a purchase:\t\t", len(payment_events_no_tut))

Let's repeat the per-step conversion calculation from earlier, now broken down by whether the user viewed the tutorial.


In [None]:
funnel_tut = pd.DataFrame(
    data={
    'with_tutorial': [
        len(offers_events_with_tut) / len(main_events_with_tut),
        len(cart_events_with_tut) / len(offers_events_with_tut),
        len(payment_events_with_tut) / len(cart_events_with_tut),
    ],
    'no_tutorial': [
        len(offers_events_no_tut) / len(main_events_no_tut),
        len(cart_events_no_tut) / len(offers_events_no_tut),
        len(payment_events_no_tut) / len(cart_events_no_tut),
    ]}, 
    index=['OffersScreenAppear', 'CartScreenAppear', 'PaymentScreenSuccessful']
)
funnel_tut['diff'] = funnel_tut['with_tutorial'] - funnel_tut['no_tutorial'] 
funnel_tut.style.format('{:.2%}'.format)

From the numbers you can see:
- a ***difference of 18.66 percentage points in conversion per step*** from the main screen `MainScreenAppear` to the offers screen `OffersScreenAppear` between those who viewed the tutorial and those who did not;
- a ***difference of 2.88 percentage points in conversion per step*** from the offers screen `OffersScreenAppear` to the cart screen `CartScreenAppear`.

However, the ***conversion from the cart to the purchase step*** for users who did not view the tutorial ***is higher by 9.75 percentage points***. This figure is difficult to explain; a larger sample of users who viewed the tutorial may be needed.
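
Whether a gap like this could be explained by the small size of the tutorial group can be checked with a two-sided z-test for two proportions. The sketch below is one common way to do it and is not taken from the original analysis: the helper `two_proportion_ztest` is a name introduced here for illustration, and the counts in the example call are hypothetical.

```python
import math
from scipy import stats as st

def two_proportion_ztest(successes_a, total_a, successes_b, total_b):
    """Two-sided z-test for the equality of two proportions (pooled variance)."""
    p_a = successes_a / total_a
    p_b = successes_b / total_b
    p_pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(p_pooled * (1 - p_pooled) * (1 / total_a + 1 / total_b))
    z_value = (p_a - p_b) / std_err
    p_value = 2 * (1 - st.norm.cdf(abs(z_value)))
    return z_value, p_value

# Hypothetical counts: purchases among cart openers, with and without the tutorial.
z_value, p_value = two_proportion_ztest(260, 300, 2900, 3000)
print("z = {:.2f}, p-value = {:.4f}".format(z_value, p_value))
```

With the real counts, `len(payment_events_with_tut)` and `len(cart_events_with_tut)` would go in the first pair of arguments and the `_no_tut` counterparts in the second.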

Let's compare the overall conversion to purchases among users who have opened the application at least once.


In [None]:
conv_with_tut = len(payment_events_with_tut) / len(main_events_with_tut)
conv_no_tut = len(payment_events_no_tut) / len(main_events_no_tut)
print("Share of users who made a purchase and completed the tutorial: \t\t{:.1%}".format(conv_with_tut))
print("Share of users who made a purchase and did NOT complete the tutorial:\t{:.1%}".format(conv_no_tut))
print("Difference:\t{:.1%}".format(conv_with_tut - conv_no_tut))

Note the large difference of ***10.4 percentage points*** in conversion to purchase in favor of those who studied the tutorial.
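The overall conversion and the per-step conversions are linked: when each step is computed as "users at this step divided by users at the previous step", the product of the step conversions equals the overall conversion from the main screen to purchase. A minimal sketch with hypothetical counts (not the notebook's data) illustrates this:

```python
from math import prod

# Hypothetical numbers of users reaching each funnel step (illustration only).
step_users = {
    "MainScreenAppear": 1000,
    "OffersScreenAppear": 600,
    "CartScreenAppear": 480,
    "PaymentScreenSuccessful": 456,
}

counts = list(step_users.values())

# Conversion per step: users at a step divided by users at the previous step.
step_conversions = [nxt / prev for prev, nxt in zip(counts, counts[1:])]

# Overall conversion: buyers divided by users who opened the main screen.
overall_conversion = counts[-1] / counts[0]

print([f"{c:.1%}" for c in step_conversions])
print(f"{overall_conversion:.1%}")
print(f"{prod(step_conversions):.1%}")  # same value as the overall conversion
```

This is why a gain at a single step, such as the main screen to offers step, carries through to the overall conversion.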


***Conclusion:***

Increasing views of the tutorial page may be a growth point for both per-step conversion and overall conversion to purchase.
The difference in conversion to purchase between those who studied the tutorial and those who did not is ***10.4 percentage points***. The effect is concentrated at the step from the main screen to the offers screen.


### Conclusion on funnel analysis

Based on the funnel analysis, the following conclusions can be drawn:

- A sequence of funnel steps was found that leads from the main screen to the purchase:
1. `MainScreenAppear` - appearance of the main application screen;
2. `OffersScreenAppear` - page with product selection;
3. `CartScreenAppear` - cart page with selected products;
4. `PaymentScreenSuccessful` - page of successful payment for the order.
- Judging by the number of views of the `Tutorial` page, this step is optional;
- Some users followed an atypical funnel, skipping screens; only users who did not skip screens were counted in the per-step conversions;
- The overall conversion to purchase `PaymentScreenSuccessful` among users who opened the main screen `MainScreenAppear` is ***45.9%***;
- Conversion per step from the main screen `MainScreenAppear` to the offers screen `OffersScreenAppear` is <u>low</u>: ***60%***;
- Conversion per step from the offers screen `OffersScreenAppear` to the cart `CartScreenAppear` is ***79.5%***;
- Conversion to purchase `PaymentScreenSuccessful` among users who opened the cart `CartScreenAppear` is <u>high</u>: ***95.5%***.

The main growth point is the step from the main screen to the offers screen. One way to raise it may be to get more users to study the tutorial:
- The difference in conversion per step from the main screen `MainScreenAppear` to the offers screen `OffersScreenAppear` between those who studied the `Tutorial` and those who did not is ***18.66 percentage points***;
- The difference in overall conversion to purchase between those who studied the tutorial and those who did not is ***10.4 percentage points***.


## Analysis of A/B test results


### Distribution of users by experimental groups

Let's first remove the `Tutorial` event from the data, since it is not part of the main funnel from the main screen to a successful purchase.


In [None]:
data = data[data.event_name != 'Tutorial']


As we saw earlier, events and users are distributed evenly across the experimental groups, and no user belongs to more than one group.


In [None]:
group_counts = data.groupby('group').agg({'event_name': 'count', 'user_id': 'nunique'})
group_counts = group_counts.rename(columns={'event_name': 'events', 'user_id': 'users'})
group_counts
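The two claims above (no user in more than one group, and an even split) can be checked directly. The sketch below uses a small hypothetical event log with the same column names as `data`; the values are made up for illustration:

```python
import pandas as pd

# Hypothetical event log with the same columns as the notebook's `data`.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 4],
    "event_name": ["MainScreenAppear", "OffersScreenAppear", "MainScreenAppear",
                   "MainScreenAppear", "CartScreenAppear", "MainScreenAppear"],
    "group": ["A1", "A1", "A2", "B", "B", "A1"],
})

# Number of distinct groups each user appears in; more than 1 means an overlap.
groups_per_user = events.groupby("user_id")["group"].nunique()
overlapping_users = groups_per_user[groups_per_user > 1].index.tolist()

# Share of all unique users that falls into each group.
group_share = events.groupby("group")["user_id"].nunique() / events["user_id"].nunique()

print(overlapping_users)
print(group_share)
```

An empty `overlapping_users` list means the groups do not intersect, and `group_share` shows how close the split is to equal parts.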

### A/A test. Shares of users who performed a specific event

Let's count the number of users who performed each type of action in the two control groups.


In [None]:
group_users = pd.pivot_table(data, index='event_name', columns='group', values='user_id', aggfunc='nunique', margins=True)
group_users = group_users.drop('All', axis=1)
group_users_control = group_users.copy().drop('B', axis=1)
group_users_control
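The `All` row is used below as the number of users in each group, so it should hold the number of distinct users per group rather than the sum of the per-event counts: one user usually performs several events and would otherwise be counted more than once. A minimal sketch on hypothetical data shows the gap between the two, and gives a value the `All` row can be compared against:

```python
import pandas as pd

# Hypothetical event log (illustration only).
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "event_name": ["MainScreenAppear", "OffersScreenAppear", "MainScreenAppear",
                   "MainScreenAppear", "OffersScreenAppear"],
    "group": ["A1", "A1", "A1", "A2", "A2"],
})

# Unique users per event and group, as in the cells of the pivot table.
per_event = events.groupby(["event_name", "group"])["user_id"].nunique().unstack("group")

# Appropriate denominator for shares: distinct users in each group.
users_per_group = events.groupby("group")["user_id"].nunique()

# Summing per-event counts double-counts users who performed several events.
summed_counts = per_event.sum()

print(users_per_group)
print(summed_counts)
```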

The most popular event is the opening of the main screen `MainScreenAppear`.

Let's calculate the proportion of users who performed the `MainScreenAppear` event in each of the control groups.


In [None]:
share1 = group_users_control.loc['MainScreenAppear', 'A1'] / group_users_control.loc['All', 'A1']
share2 = group_users_control.loc['MainScreenAppear', 'A2'] / group_users_control.loc['All', 'A2']

print("Share of users from A1 who performed the 'MainScreenAppear' event: {:.2%}".format(share1))
print("Share of users from A2 who performed the 'MainScreenAppear' event: {:.2%}".format(share2))
print("Difference in share of users between groups A1 and A2: {:.2%}".format(share1 - share2))

The difference between the two control groups is small, but we cannot yet say whether it is statistically significant.

Let's check the hypothesis of equality of shares, where:
- H0 - there is ***NO significant difference*** between the shares of users who performed the `MainScreenAppear` event;
- H1 - there is ***a significant difference*** between the shares of users who performed the `MainScreenAppear` event.

Let's write a function that calculates the p-value of a z-test for the equality of two proportions.


In [None]:
def check_hypothesyis(data, column1, column2, event_name):
    """Return the two-sided p-value of a z-test for the equality of two proportions.

    `data` holds unique-user counts per event (rows) and group (columns);
    the 'All' row holds the number of users in each group.
    """
    successes = data.loc[event_name, [column1, column2]].to_numpy()
    trials = data.loc['All', [column1, column2]].to_numpy()

    p1 = successes[0] / trials[0]  # proportion of successes in the first group
    p2 = successes[1] / trials[1]  # proportion of successes in the second group

    # proportion of successes in the combined dataset:
    p_combined = (successes[0] + successes[1]) / (trials[0] + trials[1])
    difference = p1 - p2  # difference in proportions between the datasets

    # z-score: the difference expressed in standard errors of the pooled proportion
    z_value = difference / mth.sqrt(p_combined * (1 - p_combined) * (1/trials[0] + 1/trials[1]))

    # define the standard normal distribution (mean 0, std deviation 1)
    distr = st.norm(0, 1)
    # two-sided test: double the tail probability beyond |z|
    pvalue = (1 - distr.cdf(abs(z_value))) * 2

    return pvalue
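To sanity-check the formula, the same pooled two-proportion z-test can be computed by hand on plain numbers. The sketch below is a standalone version that uses only the standard library; the function name `two_proportion_pvalue` and the counts are hypothetical and are not part of the notebook:

```python
import math

def two_proportion_pvalue(successes1, trials1, successes2, trials2):
    """Two-sided p-value of the pooled two-proportion z-test."""
    p1 = successes1 / trials1
    p2 = successes2 / trials2
    # Pooled proportion under H0 (both groups share one true proportion).
    p_combined = (successes1 + successes2) / (trials1 + trials2)
    std_error = math.sqrt(p_combined * (1 - p_combined) * (1 / trials1 + 1 / trials2))
    z_value = (p1 - p2) / std_error
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(abs(z_value) / math.sqrt(2))

# Hypothetical counts, not taken from the experiment.
print(two_proportion_pvalue(500, 1000, 500, 1000))  # identical shares
print(two_proportion_pvalue(500, 1000, 560, 1000))  # gap of 6 percentage points
```

Identical shares give a p-value of 1, and the p-value shrinks as the gap between the shares grows for a fixed sample size.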


We will set the significance level for the hypotheses to ***0.05***: accepting a 5% probability of wrongly rejecting a true null hypothesis is a common convention.


In [None]:
alpha = 0.05

In [None]:
def hypothesis_conclusion(pvalue, alpha):
    if pvalue > alpha:
        return "Failed to reject the null hypothesis: no evidence that the proportions differ"
    else:
        return "Reject the null hypothesis: there is a significant difference between the proportions"

In [None]:
pvalue = check_hypothesyis(group_users, 'A1', 'A2', 'MainScreenAppear')
print("pvalue: {:.4f}".format(pvalue))
print(hypothesis_conclusion(pvalue, alpha))

The test gives ***NO reason to consider the difference of 0.14% significant***, so we cannot reject the null hypothesis. By this criterion, users are distributed evenly across the control groups.

Let's now calculate the shares of users who performed each of the events.


In [None]:
def perc_format(columns):
    return dict((x, '{:.2%}'.format) for x in columns)

In [None]:
# share of each group's users who performed each event
group_users_control_share = group_users_control.div(group_users_control.loc['All'], axis=1)
group_users_control_share = group_users_control_share.drop('All', axis=0)
group_users_control_share['diff'] = group_users_control_share['A1'] - group_users_control_share['A2']
group_users_control_share.style.format(perc_format(group_users_control_share.columns))

The largest difference in proportions between the two control groups is ***1.59%*** for the `PaymentScreenSuccessful` purchase event. We cannot yet say whether this difference is statistically significant, so the same hypotheses should be tested for every event.

Let's test hypotheses about the equality of the shares of users who performed each of the events `CartScreenAppear`, `MainScreenAppear`, `OffersScreenAppear`, `PaymentScreenSuccessful`, where:
- H0 - there is ***NO significant difference*** between the proportions of users in the control groups;
- H1 - there is ***a significant difference*** between the proportions of users in the control groups.


In [None]:
pd.options.display.max_colwidth = 600
def pvalue_color_builder(alpha):
    def pvalue_color(val):
        color = 'pink' if val < alpha else 'lightgreen'
        return 'background-color: %s' % color
    return pvalue_color

def pvalue_format(columns):
    return dict((x, '{:.4f}'.format) for x in columns)

In [None]:
event_names = group_users_control.drop('All').index.to_series()

group_users_control_share['pvalue'] = event_names.apply(lambda x: check_hypothesyis(group_users_control, 'A1', 'A2', x))
group_users_control_share['conclusion'] = group_users_control_share.apply(lambda x: hypothesis_conclusion(x.pvalue, alpha), axis=1)
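# merge both formatter dicts into one call; in recent pandas versions
# a second .format() call can reset formatters set by the first
(
    group_users_control_share.style
        .applymap(pvalue_color_builder(alpha), subset=pd.IndexSlice[:, ['pvalue']])
        .format({**perc_format(['A1', 'A2', 'diff']), **pvalue_format(['pvalue'])})
)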

***Conclusion:***

The tests show no statistically significant difference between the control groups in the proportion of users who performed each event.


### A/B test. Shares of users who performed a specific event


Let's now run the A/B test, comparing the control groups with the experimental group, in which the fonts were changed.
We will test multiple hypotheses, checking the significance of the difference in the proportion of users who performed a specific event:
- between the control `A1` and experimental `B` groups;
- between the control `A2` and experimental `B` groups;
- between the combined control `A12` and experimental `B` groups.


Let's output the number of users who performed each event in each group. We'll also add a combined control group.


In [None]:
group_users['A12'] = group_users['A1'] + group_users['A2']
group_users = group_users.reindex(['A1', 'A2', 'A12', 'B'], axis=1)
group_users

Let's calculate the shares of users in each group (`A1`, `A2`, `A12`, `B`) who performed each observed event. We will also derive the difference in the shares of each of the control groups `A1`, `A2`, `A12` with the experimental `B`, as well as between the two control groups `A1`, `A2`.


In [None]:
group_users_share = group_users.div(group_users.loc['All'], axis=1)
group_users_share.drop(['All'], axis=0, inplace=True)
group_users_share['diff_A1_A2'] = group_users_share['A1'] - group_users_share['A2']
group_users_share['diff_A1_B'] = group_users_share['A1'] - group_users_share['B']
group_users_share['diff_A2_B'] = group_users_share['A2'] - group_users_share['B']
group_users_share['diff_A12_B'] = group_users_share['A12'] - group_users_share['B']
group_users_share.style.format(perc_format(group_users_share.columns))

In almost all cases the control groups show higher shares than the experimental group, but this does not mean that the difference is statistically significant.

Let's test several hypotheses about the equality of the shares of users who performed the events `CartScreenAppear`, `MainScreenAppear`, `OffersScreenAppear`, `PaymentScreenSuccessful` where:
- H0 - there is ***NO significant difference*** between the proportions of users in the control and experimental groups;
- H1 - there is ***a significant difference*** between the proportions of users in the control and experimental groups.

We will not include the `Tutorial` events in the tests, since this event is not included in the general funnel.

Since we are testing multiple hypotheses, 4 pairs of groups (`A1` and `A2`, `A1` and `B`, `A2` and `B`, `A1`+`A2` and `B`) for 4 key events (16 tests in total), we must adjust the significance level to ***reduce the family-wise error rate (FWER)***, the probability of at least one type I error across all tests.

Let's calculate what the FWER would be if we kept the significance level at 0.05 for each test, assuming the tests are independent.


In [None]:
print("FWER: {:.1%}".format(1 - pow((1 - alpha), 16)))
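The expression in the cell above is the standard formula for independent tests: the probability of no false positive in one test is `1 - alpha`, so the probability of at least one false positive in `n` tests is `1 - (1 - alpha) ** n`. A short sketch (the helper name `fwer` is hypothetical) shows how quickly this grows with the number of tests:

```python
def fwer(alpha, n_tests):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

# FWER at a per-test significance level of 0.05 for 1, 4 and 16 tests.
for n_tests in (1, 4, 16):
    print(n_tests, f"{fwer(0.05, n_tests):.1%}")
```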

***56%*** is a very high `FWER`, so we need a correction. We will use the Bonferroni method, which divides the significance level by the number of tests.


In [None]:
alpha_bonf = alpha / 16
alpha_bonf

Let's calculate the family-wise type I error rate for the adjusted significance level.


In [None]:
print("FWER: {:.2%}".format(1 - pow((1 - alpha_bonf), 16)))

With the Bonferroni correction, the family-wise type I error rate is now ***4.88%***.
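Bonferroni is the most conservative of the common corrections. As a point of comparison only (this notebook uses Bonferroni), the sketch below contrasts it with the Holm step-down procedure, which also controls the FWER but can reject more hypotheses. The function names and p-values are hypothetical:

```python
def bonferroni_reject(pvalues, alpha):
    """Reject H0 where the p-value is below alpha divided by the number of tests."""
    threshold = alpha / len(pvalues)
    return [p < threshold for p in pvalues]

def holm_reject(pvalues, alpha):
    """Holm step-down procedure; returns decisions in the original order."""
    n_tests = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n_tests), key=lambda i: pvalues[i])
    reject = [False] * n_tests
    for rank, idx in enumerate(order):
        # The p-value at position `rank` (0 = smallest) is compared
        # with alpha / (n_tests - rank).
        if pvalues[idx] > alpha / (n_tests - rank):
            break  # stop at the first p-value that fails its threshold
        reject[idx] = True
    return reject

# Hypothetical p-values, not the notebook's results.
pvalues = [0.005, 0.013, 0.02, 0.04]
print(bonferroni_reject(pvalues, 0.05))
print(holm_reject(pvalues, 0.05))
```

With these made-up values Bonferroni rejects only the smallest p-value, while Holm rejects all four, because each later p-value faces a looser threshold.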


Now let's calculate the p-value for each of the tests and compare it with the corrected significance level.
Cells where we cannot reject the `H0` hypothesis of no difference in proportions are highlighted in green.


In [None]:
def hypothesis_list_conclusion(pvalues, alpha):
    # H0 stands only if every test in the list fails to reject it
    if all(pvalue > alpha for pvalue in pvalues):
        return "Failed to reject the null hypothesis: no evidence that the proportions differ"
    else:
        return "Reject the null hypothesis: there is a significant difference between the proportions"

In [None]:
group_hypothesis = group_users_share.copy()
pvalue_columns = ['pvalue_A1_A2','pvalue_A1_B', 'pvalue_A2_B', 'pvalue_A12_B']
color_slice = pd.IndexSlice[:, pvalue_columns]

group_hypothesis['pvalue_A1_A2'] = event_names.apply(lambda x: check_hypothesyis(group_users, 'A1', 'A2', x))
group_hypothesis['pvalue_A1_B'] = event_names.apply(lambda x: check_hypothesyis(group_users, 'A1', 'B', x))
group_hypothesis['pvalue_A2_B'] = event_names.apply(lambda x: check_hypothesyis(group_users, 'A2', 'B', x))
group_hypothesis['pvalue_A12_B'] = event_names.apply(lambda x: check_hypothesyis(group_users, 'A12', 'B', x))
(
    group_hypothesis.style.applymap(pvalue_color_builder(alpha_bonf), subset=color_slice)
        # one format call with a merged dict, so neither set of formatters overrides the other
        .format({
            **perc_format(['A1', 'A2', 'A12', 'B', 'diff_A1_A2', 'diff_A1_B', 'diff_A2_B', 'diff_A12_B']),
            **pvalue_format(pvalue_columns),
        })
)

Based on the p-values, we will draw conclusions from the test results:
- separately for each event;
- general conclusion for all tests.


In [None]:
conclusions = group_hypothesis.apply(lambda x: hypothesis_list_conclusion(x[pvalue_columns], alpha_bonf), axis=1)
conclusions['TOTAL CONCLUSION'] = hypothesis_list_conclusion(group_hypothesis[pvalue_columns].to_numpy().flatten(), alpha_bonf)
conclusions = conclusions.rename('conclusion').to_frame()
conclusions.style.set_properties(**{'background-color': 'black', 'color': 'white'}, subset=pd.IndexSlice[['TOTAL CONCLUSION'], :])

As can be seen, for each of the events there is no statistically significant difference between the control and experimental groups.

The lowest p-value among the 16 tests is ***0.0855***, for the difference in shares of ***2.61%*** between the control `A1` and experimental `B` groups among users who went to the cart page `CartScreenAppear`.

As noted above, ***the family-wise probability of a type I error is 4.88%.***

Accordingly, we found no evidence that changing the font increases conversion in transitions between screens.


### Conclusions from the results of A/A and A/B tests

Based on the test results, the following conclusions and recommendations can be made:
- users and events are distributed evenly across the three groups, roughly a third each;
- the two control groups were first tested against each other, and no statistically significant difference was found in the proportion of users who performed each event;
- multiple tests between the control and experimental groups found no statistically significant difference in the proportion of users reaching each screen;
- the test can be stopped: there is no evidence that ***changing fonts*** increases conversion to screen views or purchases.
