# The A/B test analysis

**Project content**

1. [Familiarization with data](#start)
2. [Data preprocessing](#preprocessing)
3. [Test correctness evaluation](#matchtest)
4. [Exploratory Data Analysis](#eda)
5. [Hypothesis testing](#hypot)
6. [General conclusions](#conclusion)

**Description of the project**

Our goal is to evaluate the results of the A/B test.

Tasks:

* Evaluate the correctness of the test

* Analyze test results

To evaluate the correctness of the test, check:

* compliance of the data with the requirements of the terms of reference (TOR)

* that the groups do not overlap

* that users are distributed evenly between the groups and the group sizes are balanced

* whether the test audience intersects with a competing test

* the timing of the test: whether it coincides with marketing events, and any other timing issues

## Familiarization with data
<a id="start"></a>

**Let's import the libraries we need**

In [1]:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
import plotly.express as px
from plotly import graph_objects as go
import math as mt
import numpy as np

import scipy.stats as stats

from scipy.stats import norm

import datetime as dt

*Let's load the datasets*

In [None]:
try:
     mark_events, ab_new_users, ab_events, ab_participants = (
         pd.read_csv('https://code.s3.yandex.net/datasets/ab_project_marketing_events.csv'),
         pd.read_csv('https://code.s3.yandex.net/datasets/final_ab_new_users.csv'),
         pd.read_csv('https://code.s3.yandex.net/datasets/final_ab_events.csv'),
         pd.read_csv('https://code.s3.yandex.net/datasets/final_ab_participants.csv')
     )
except Exception:  # fall back to the mirror copies if the primary source is unavailable
     mark_events, ab_new_users, ab_events, ab_participants = (
         pd.read_csv('https://drive.google.com/file/d/1LVvfMLjDtlTWfstUgp_zyNzC7QfPPOeG/view?usp=sharing'),
         pd.read_csv('https://drive.google.com/file/d/1TiKpvRcqslJtCZjNGStaGHvudN48s03_/view?usp=sharing'),
         pd.read_csv('https://drive.google.com/file/d/1MJuC6OmoG9MW3J2L38Y1WJV2x8ZeP626/view?usp=sharing'),
         pd.read_csv('https://drive.google.com/file/d/1Ws9FVJii5rlZqDPXM5zqffpaJb4cAnUz/view?usp=sharing')
     )

In [None]:
mark_events.head()

In [None]:
ab_new_users.head()

In [None]:
ab_events.head()

In [None]:
ab_participants.head()

**Data Description**

`mark_events` - calendar of marketing events for 2020;

Table structure:
* `name` — marketing event name;
* `regions` - regions in which the advertising campaign will be carried out;
* `start_dt` — campaign start date;
* `finish_dt` - campaign end date.

`ab_new_users` - all users who registered in the online store from December 7 to December 21, 2020;

Table structure:
* `user_id` - the user ID;
* `first_date` - registration date;
* `region` — user's region;
* `device` — the device from which registration took place.

`ab_events` - all new user events from December 7, 2020, to January 4, 2021;

Table structure:
* `user_id` - the user ID;
* `event_dt` - date and time of the event;
* `event_name` - event type;
* `details` - additional data about the event. For example, for `purchase` events this field stores the cost of the purchase in dollars.

`ab_participants` - table of test participants.

Table structure:
* `user_id` - the user ID;
* `ab_test` - test name;
* `group` - the user's group.

In [None]:
for df in (mark_events, ab_new_users, ab_events, ab_participants):
    df.info()  # info() prints its report itself and returns None
    print('_' * 39)  # separator between the reports

<div style="border:solid green 2px; padding: 20px">
    
**Conclusions:**
The A/B test data contains 61733 rows for new users, 440317 event rows, and 18268 rows for test participants.
    
The `details` column of the event table has missing values that need to be handled.
    
The columns do not need to be renamed: the names are already lowercase.
    
The tables must be checked for duplicates, and dates and times must be converted to datetime format.

## Data preprocessing
<a id="preprocessing"></a>


### Changing Data Types

*Let's convert all dates to DateTime format*

In [None]:
mark_events['start_dt'] = mark_events['start_dt'].map(
    lambda x: dt.datetime.strptime(x, '%Y-%m-%d'))
 
mark_events['finish_dt'] = mark_events['finish_dt'].map(
    lambda x: dt.datetime.strptime(x, '%Y-%m-%d'))
mark_events.info()


In [None]:
mark_events.head(2)

In [None]:
ab_new_users['first_date'] = ab_new_users['first_date'].map(
    lambda x: dt.datetime.strptime(x, '%Y-%m-%d'))
 

ab_new_users.info()

In [None]:
ab_new_users.head()

In [None]:
ab_events['event_dt'] = ab_events['event_dt'].map(
    lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
 

ab_events.info()

In [None]:
ab_events.head()

*All dates are in the correct format*
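
The row-by-row `strptime` conversion above can also be written as a single vectorized call. The snippet below is a minimal sketch on a toy frame (the two timestamps are made up for illustration), assuming the same `'%Y-%m-%d %H:%M:%S'` layout as in `ab_events`:

```python
import pandas as pd

# Toy frame standing in for ab_events; the values are illustrative only.
events = pd.DataFrame({'event_dt': ['2020-12-07 20:22:03', '2020-12-08 09:15:00']})

# Vectorized parsing; with an explicit format, malformed rows raise an error
# instead of being silently guessed.
events['event_dt'] = pd.to_datetime(events['event_dt'], format='%Y-%m-%d %H:%M:%S')

print(events['event_dt'].dtype)
```

On large tables this is usually much faster than mapping a Python lambda over every row.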

### Checking for duplicates

*Check for obvious duplicates*

In [None]:
print('Number of duplicates in mark_events {}'.format(mark_events.duplicated().sum()))
print('Number of duplicates in ab_new_users {}'.format(ab_new_users.duplicated().sum()))
print('Number of duplicates in ab_events {}'.format(ab_events.duplicated().sum()))
print('Number of duplicates in ab_participants {}'.format(ab_participants.duplicated().sum()))

*No obvious duplicates were found in any of the data frames. Let's check for duplicated user IDs*

In [None]:
print("Number of duplicated user_id in ab_participants {}  ".format(ab_participants['user_id'].duplicated().sum()))
print("Number of duplicated user_id in ab_new_users {}  ".format(ab_new_users['user_id'].duplicated().sum()))


*Most likely, the same users were included in different tests. Let's check this for our test, which is called `recommender_system_test`*

In [None]:
ab_participants['ab_test'].value_counts()

In [None]:
ab_participants.query('ab_test == \
"recommender_system_test"')['user_id'].isin(ab_participants.query('ab_test != "recommender_system_test"')['user_id'])

In [None]:
print("Number of users, which were in both tests = ", \
      sum(ab_participants.query('ab_test == \
"recommender_system_test"')['user_id'].isin(ab_participants.query('ab_test != \
"recommender_system_test"')['user_id'])))


*This matches the number of duplicated IDs. We exclude the users who took part in both tests, since we cannot tell which change affected their behavior*

In [None]:
recommender_id = ab_participants.query('ab_test == "recommender_system_test"')['user_id']
not_recommender_id = ab_participants.query('ab_test != "recommender_system_test"')['user_id']
intersection_tests_id = set(not_recommender_id) & set(recommender_id)
ab_participants_with_intersection = ab_participants.copy()

ab_participants = ab_participants[~ab_participants['user_id'].isin(intersection_tests_id)].copy()

In [None]:
ab_participants['user_id'].duplicated().sum()

*We got rid of the overlapping user IDs. We do not need the users of the second test at all, so let's remove them as well*

In [None]:
ab_participants = ab_participants.query('ab_test == "recommender_system_test"').copy()
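
The two filtering steps above (drop users seen in more than one test, then keep only our test) can be sketched in a more compact form. The frame below is a toy stand-in for `ab_participants`; the user IDs and the name `other_test` are hypothetical:

```python
import pandas as pd

# Toy stand-in for ab_participants; IDs and 'other_test' are made up.
participants = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u2', 'u3', 'u4'],
    'ab_test': ['recommender_system_test', 'recommender_system_test',
                'other_test', 'other_test', 'recommender_system_test'],
    'group': ['A', 'B', 'A', 'B', 'A'],
})

# Count how many distinct tests each user appears in.
tests_per_user = participants.groupby('user_id')['ab_test'].nunique()
multi_test_users = set(tests_per_user[tests_per_user > 1].index)

# Keep only our test and drop everyone who was also in another test.
clean = participants[
    (participants['ab_test'] == 'recommender_system_test')
    & ~participants['user_id'].isin(multi_test_users)
]
print(clean)
```

Counting distinct tests per user makes the overlap explicit and also works if more than two tests are present in the table.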

### Handling data gaps

*Let's look at the dataset with gaps again*

In [None]:
ab_events.isna().mean()

*About 86% of the values are missing. The most likely reason is that the `details` column stores the purchase cost and is filled only for purchase events. Let's count the purchase events and compare that number with the number of non-missing values in `details`*

In [None]:
ab_events['event_name'].value_counts() 

In [None]:
ab_events.info()
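
The assumption that `details` is filled only for purchases can also be checked directly, by computing the share of missing values per event type. This is a minimal sketch on a toy frame with made-up values, not the real `ab_events` table:

```python
import numpy as np
import pandas as pd

# Toy stand-in for ab_events; the prices are illustrative only.
events = pd.DataFrame({
    'event_name': ['login', 'purchase', 'product_page', 'purchase', 'product_cart'],
    'details': [np.nan, 4.99, np.nan, 99.99, np.nan],
})

# Share of missing `details` values within each event type.
missing_share = events.groupby('event_name')['details'].apply(lambda s: s.isna().mean())
print(missing_share)
```

If the assumption holds, the share is 0 for `purchase` and 1 for every other event type.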

*As expected, the gaps correspond to non-purchase events. This column is not a target for us, so to avoid losing rows during filtering we fill the gaps with a negative placeholder, -55.5*

In [None]:
ab_events = ab_events.fillna(-55.5)

In [None]:
ab_events.isna().mean()

<div style="border:solid green 2px; padding: 20px">
    
**Conclusions:**
    
The missing values in the `details` column of the event table were filled with a placeholder.

The date columns were converted to datetime.
    
Users who took part in both tests, as well as users of the other test, were removed.

Next, we check the test results for compliance with the terms of reference.

## Assessment of the correctness of the test for compliance with the TOR 
<a id="matchtest"></a>

**Terms of reference**

Test name: `recommender_system_test`;

Groups: `A` (control), `B` (new payment funnel);

Launch date: `2020-12-07`;

New user enrollment stop date: `2020-12-21`;

Stop date: `2021-01-04`;

Audience: `15% of new users from the EU region`;

Purpose of the test: testing the changes associated with the introduction of an improved recommender system;

Expected number of test participants: `6000`.

Expected effect: within 14 days from the moment of registration in the system, users will show an improvement in each metric by at least 10%:

* conversions to view product cards — `product_page` event
* cart views - `product_cart`
* purchases - `purchase`.

*Let's check the groups. First, we count all the test participants*

In [None]:
print ('Number of participants of recommender_system_test is {} people'.format (ab_participants['user_id'].count())) 

*This is the first violation of the terms of reference: the test has 901 fewer participants than expected, which is about 15% below the planned 6000. Ideally, the test should have been continued*

*Let's look at the group sizes*

In [None]:
ab_participants['group'].value_counts()

*The control group is larger than the test group by 707 people (24% of the control group), which is another non-compliance*
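
The percentages quoted for the sample shortfall and the group imbalance follow from simple arithmetic. In the sketch below the total of 5099 participants is 6000 minus the reported shortfall of 901, and the group sizes are an assumption derived from that total and the reported difference of 707; they are not read from the data here:

```python
expected_participants = 6000
actual_participants = 5099      # 6000 - 901, as reported in the text

# Assumed group sizes, derived from total = 5099 and difference = 707.
group_a = 2903
group_b = 2196

shortfall = expected_participants - actual_participants
shortfall_pct = 100 * shortfall / expected_participants

imbalance = group_a - group_b
imbalance_pct = 100 * imbalance / group_a   # relative to the control group

print(f'Shortfall: {shortfall} users ({shortfall_pct:.1f}% of the plan)')
print(f'Imbalance: {imbalance} users ({imbalance_pct:.1f}% of group A)')
```

Stating the base of each percentage (the planned sample, the control group) avoids ambiguity when the numbers are compared later.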

*Checking intersections in groups* 

In [None]:
users_a = set(ab_participants[ab_participants['group'] == 'A']['user_id'])
users_b = set(ab_participants[ab_participants['group'] == 'B']['user_id'])
common_a_b = len(users_a & users_b)  # users present in both groups


In [None]:
print('Intersections between  two groups of the test: {}'.format(common_a_b)) 

*No intersections*

*Let's check the number of unique users in ab_new_users*

In [None]:
ab_new_users['user_id'].nunique()

*Let's select users who are in some group - A or B.*

In [None]:
ab_new_users_row =ab_new_users.copy()
ab_new_users = ab_new_users[(ab_new_users['user_id'].isin(ab_participants['user_id']))].copy()

In [None]:
ab_new_users['user_id'].nunique()

*Now let's see if there are users who registered outside the enrollment window*

In [None]:
ab_new_users[~(('2020-12-07'<= ab_new_users['first_date'] ) & (ab_new_users['first_date']  <=  '2020-12-21'))]

In [None]:
print('First day of testing registration is {}'.format(ab_new_users['first_date'].min())) 
print('Last day of testing registration is {}'.format(ab_new_users['first_date'].max())) 

*There are no such users, the terms of reference are met*

*Let's filter the dataset by user id and check the last date of the test*

In [None]:
ab_events = ab_events[(ab_events['user_id'].isin(ab_participants['user_id']))].copy()

In [None]:
print('First day of testing is {}'.format(ab_events['event_dt'].min()))
print('Last day of testing is {}'.format(ab_events['event_dt'].max()))

*The end-date condition is not met: the last day with events is December 30, not January 4*

In [None]:
ab_events.info()

*There are 18804 events in the dataset*

*Let's check the condition of 15% of new EU users, starting with the raw dataset*

In [None]:
ab_new_users_row.info()

In [None]:
ab_new_users_row['region'].value_counts()

In [None]:
region_user = ab_new_users.groupby('region')\
.agg({'user_id':'count'})\
.reset_index().rename(columns={'user_id':'number'})
region_user ['%_all_users'] = round(region_user['number']*100/sum(region_user['number']))
region_user

*Users from Europe make up 93% of the test audience, but this is only 10.3% of all new European users, which does not meet the 15% required by the TOR.*
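
The two shares mentioned above answer different questions: what part of the test audience comes from the EU, and what part of all new EU users ended up in the test. A minimal sketch of both calculations follows; the helper name `eu_coverage` and the toy data are hypothetical:

```python
import pandas as pd

def eu_coverage(all_new_users, participant_ids, region='EU'):
    """Return both region shares for a set of test participants."""
    in_test = all_new_users[all_new_users['user_id'].isin(participant_ids)]
    region_all = (all_new_users['region'] == region).sum()
    region_in_test = (in_test['region'] == region).sum()
    return {
        # part of the test audience that comes from the region
        'share_of_test_from_region': region_in_test / len(in_test),
        # part of all new users of the region that got into the test
        'share_of_region_in_test': region_in_test / region_all,
    }

# Toy data: 10 new users, 8 of them from the EU; 4 participants, 3 from the EU.
users = pd.DataFrame({
    'user_id': [f'u{i}' for i in range(10)],
    'region': ['EU'] * 8 + ['CIS', 'APAC'],
})
shares = eu_coverage(users, ['u0', 'u1', 'u2', 'u8'])
print(shares)
```

The TOR requirement of 15% refers to the second share, so that is the number to compare against the threshold.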

*Let's check which marketing events overlap with the test period*

In [None]:
mark_events

In [None]:
mark_events[(mark_events['finish_dt']>='2020-12-07')&(mark_events['start_dt']<='2021-01-04')]

*During the test period there should be no promotions that could affect conversion. As the tables show, the test dates overlap with the biggest sales of the year, dedicated to Christmas and New Year, both in Europe and America (the promo) and in Eastern Europe and Asia (the New Year's lottery). With the planned end date the overlap with promotions is 16 days; since the events actually end earlier, the real overlap is 6 days.*
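
The number of overlapping days between the test window and a campaign can be computed explicitly. The helper `overlap_days` below is a hypothetical sketch, and the campaign dates in the example are illustrative placeholders rather than values read from `mark_events`:

```python
from datetime import date

def overlap_days(test_start, test_end, event_start, event_end):
    """Number of calendar days shared by two inclusive date ranges."""
    start = max(test_start, event_start)
    end = min(test_end, event_end)
    return max((end - start).days + 1, 0)

# Illustrative campaign dates (an assumption, not taken from the data).
campaign = (date(2020, 12, 25), date(2021, 1, 3))

planned = overlap_days(date(2020, 12, 7), date(2021, 1, 4), *campaign)
actual = overlap_days(date(2020, 12, 7), date(2020, 12, 30), *campaign)
print(planned, actual)
```

Summing this value over all rows of the marketing calendar gives the total overlap for the planned and for the actual end date of the test.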

<div style="border:solid green 2px; padding: 20px">
    
**Conclusions:**
    
The results of the dataset do not meet the criteria of the TOR for the following parameters:
<br/>1. The test sample is 901 people smaller than expected, that is, about 15% below the planned 6000.
  <br/>2. The last day with events is December 30, not January 4, so data for 5 days is missing.
  <br/>3. Users from Europe make up 4749 people, or 93% of the test audience. This is only 10.3% of all new European users, which does not meet the 15% required by the TOR.
  <br/>
In addition, the following violations were made during the test:
  <br/> - The control group is larger than the test group by 707 people (24% of the control group).
   <br/> - The test dates overlap with the biggest sales of the year, dedicated to Christmas and New Year, both in Europe and America (the promo) and in Eastern Europe and Asia (the New Year's lottery). With the planned end date the overlap with promotions is 16 days; since the events actually end earlier, the real overlap is 6 days.
<br/>As a result, the test can be considered failed. It must be rerun strictly following the TOR, avoiding seasonal promotions and parallel tests.
  <br/>
Despite the above, we will analyze the data.<br/>

## Exploratory data analysis
<a id="eda"></a>

*Let's collect the test data for analysis in a common table*

In [None]:
data = ab_participants.merge(ab_new_users, on = 'user_id', how = 'left')

In [None]:
data = data.merge(ab_events, on = 'user_id', how = 'left')

In [None]:
data.head()

*Add a column with the date of the event*

In [None]:
data['event_date'] = data['event_dt'].dt.date

In [None]:
data.head()

### Number of users and events

*Let's see the number of users and distribution by groups*

In [None]:
print ('Number of users in the test {}'.format(data['user_id'].nunique()))

In [None]:
sns.set_style('dark')

*Let's plot the group sizes once again*

In [None]:
plt.figure(figsize = (19,6))
fig = sns.barplot(x=[data[data['group']=='A']['user_id'].nunique(), \
                     data[data['group']=='B']['user_id'].nunique()],\
            y = ['group A','group B'], palette='husl')
plt.title('Number of users in groups A and B')
fig.set_xlabel('Number of users',fontsize = 10)
plt.show()

*Consider the distribution of the number of events in groups*

In [None]:
data[data['group']=='A'].groupby('user_id').agg({'event_dt':'count'}).describe()

In [None]:
print('Total number of events for group A is {}'.format (data[data['group']=='A'].groupby('user_id').agg({'event_dt':'count'})['event_dt'].sum()))

In [None]:
data[data['group']=='B'].groupby('user_id').agg({'event_dt':'count'}).describe()

In [None]:
print('Total number of events for group B is {}'.format (data[data['group']=='B'].groupby('user_id').agg({'event_dt':'count'})['event_dt'].sum()))

In [None]:
plt.figure(figsize = (19,6))
ax = plt.subplot(1, 1, 1)
sns.histplot(data[data['group']=='A'].groupby('user_id')\
             .agg({'event_dt':'count'}), palette= 'Set2', \
             alpha = 0.5, stat = 'density', common_norm =False)
sns.histplot(data[data['group']=='B'].groupby('user_id')\
             .agg({'event_dt':'count'}), palette= 'rocket',  alpha = 0.2,\
             stat = 'density', common_norm =False)
ax.set_title('Distribution of events in groups', fontsize = 10, color = 'DarkBlue')
plt.legend(['Group A', 'Group B']);

*The number of events is extremely uneven: group A has 14737 events in total, while group B has 4067. In group B more than 50% of users have no events at all, and the 75th percentile is only 3 events, so there are very few events. Group A looks better: the median is 4 events.*
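
The share of users without any events can be measured directly instead of being read off `describe()`. Below is a minimal sketch on a toy table shaped like the merged `data` frame, where a user without events has a missing `event_dt` after the left join; all values are made up:

```python
import pandas as pd

# Toy stand-in for the merged table; user b1 has no events (NaT after the left join).
data = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a2', 'b1', 'b2', 'b2'],
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'event_dt': pd.to_datetime(['2020-12-07', '2020-12-08', '2020-12-09',
                                None, '2020-12-10', '2020-12-11']),
})

# count() ignores missing values, so users without events get 0.
events_per_user = data.groupby(['group', 'user_id'])['event_dt'].count()

# Share of users with zero events in each group.
zero_share = (events_per_user == 0).groupby(level='group').mean()
print(zero_share)
```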

### The number of events by dates

*Let's consider event data by day in groups*

In [None]:
day_event_a = data[data['group']=='A'].groupby('event_date')\
.agg({'event_name':'count', 'region':'first'}).reset_index()


In [None]:
day_event_b = data[data['group']=='B'].groupby('event_date')\
.agg({'event_name':'count', 'region':'first'}).reset_index()


In [None]:
plt.figure(figsize = (19,6))
ax = plt.subplot(1, 1, 1)
sns.barplot(x = day_event_a['event_date'], y = day_event_a['event_name'], color = '#3F0071')
sns.barplot(x = day_event_b['event_date'], y = day_event_b['event_name'], color = '#FB2576')
ax.set_title('Distribution of events in groups', fontsize = 20, color = 'DarkBlue')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 90)
ax.set_ylabel('Number of events',fontsize = 20)
ax.set_xlabel('Date of test',fontsize = 20);

*The distribution shows a sharp increase in events starting on December 14: people are preparing for Christmas (most users are from Europe, where Western Christian churches celebrate Christmas on December 25). This again shows that running the test on the eve of major holidays was a mistake. The marketing events, however, did not affect the test much: activity has been falling since the 25th, although without the promotion it could have been even lower. In group B the number of events depends little on the calendar: before the peak on December 21 it fluctuates between 200 and 400 events per day, and after the 21st there are about 100-150 events per day. Both December 14 and December 21 are Mondays.*

### Regions and devices in groups

*Let's plot the distribution by region in each group*

In [None]:
colors = sns.color_palette('bright')
plt.figure(figsize = (14,6))
plt.subplot(1,2,1)
d = pd.DataFrame(data[data['group']== "A"][['user_id', \
                                             'region', 'device']].drop_duplicates()['region'].value_counts()).reset_index()
plt.pie(d['region'], labels = d['index'],  autopct='%.0f%%', colors=colors)
plt.title('Parts of regions in group A')

plt.subplot(1,2,2)
d = pd.DataFrame(data[data['group']== "B"][['user_id', \
                                             'region', 'device']].drop_duplicates()['region'].value_counts()).reset_index()
plt.pie(d['region'], labels = d['index'],  autopct='%.0f%%', colors=colors)
plt.title('Parts of regions in group B')

plt.show()

*The groups are homogeneous in their distribution by region. Now let's look at the devices users came from*

In [None]:
colors = sns.color_palette('colorblind')
plt.figure(figsize = (14,6))
plt.subplot(1,2,1)
d = pd.DataFrame(data[data['group']== "A"][['user_id', \
                                             'region', 'device']].drop_duplicates()['device'].value_counts()).reset_index()
plt.pie(d['device'], labels = d['index'],  autopct='%.0f%%', colors=colors)
plt.title('Parts of devices in group A')

plt.subplot(1,2,2)
d = pd.DataFrame(data[data['group']== "B"][['user_id', \
                                             'region', 'device']].drop_duplicates()['device'].value_counts()).reset_index()
plt.pie(d['device'], labels = d['index'],  autopct='%.0f%%', colors=colors)
plt.title('Parts of devices in group B')

plt.show()

*Note that group B has a larger share of Android users and a smaller share of PC and Mac users, but the difference is only about 3%.*

### Event Funnel

*Let's build a common funnel of events*

In [None]:
d = pd.DataFrame(data.groupby('event_name')['user_id'].nunique()).reindex(['login',\
                                                        'product_page', 'product_cart', 'purchase'])\
.reset_index()
d.columns = ['event_name', 'event_counts']
fig = go.Figure(go.Funnel(
    y = d ['event_name'],
    x = d ['event_counts'],
    textposition = "inside",
    textinfo = "value+percent initial",
    opacity = 0.65, marker = {"color": ["deepskyblue", "lightsalmon", "tan", "teal"],
    "line": {"width": [4, 2, 2, 3], "color": ["wheat", "wheat", "blue", "wheat"]}},
    connector = {"line": {"color": "royalblue", "dash": "dot", "width": 3}}),
   layout_title_text= 'The event funnel in our test')

fig.show()

*As we can see, the overall conversion is a decent 30%. Notably, some users made a purchase without visiting the cart page: conversion from cart to purchase is 102%. Note also that only 2788 of the 5099 registered users actually took part in the experiment.*

Now let's analyze the result of our experiment:

Expectations of our experiment: `in 14 days from the moment of registration in the system, users will show an improvement of each metric by at least 10%`

Let's keep only each user's events from the first 14 days and build a funnel split by group

In [None]:
data_2_weeks = data.reset_index()

# For every event, flag whether it falls within 14 days of the user's first event
true_false_data = pd.DataFrame(data.groupby('user_id').apply(lambda x :   x['event_dt'] <=\
                                                    x['event_dt'].min() +  dt.timedelta(days=14))).reset_index()

true_false_data.head(10)

In [None]:
data_2_weeks = data_2_weeks.merge(true_false_data, right_on='level_1', \
                       left_on='index').rename(columns = {'event_dt_y' : 'row_dt',
                                                         'user_id_x' : 'user_id',
                                                          'event_dt_x' : 'event_dt'
                                                         }).drop(columns=['index', 'level_1', 'user_id_y'])
data_2_weeks = data_2_weeks[data_2_weeks['row_dt']].copy()
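
The cells above measure the 14-day window from each user's first event. Since the TOR counts 14 days from registration, an alternative is to compare `event_dt` with `first_date` directly, which also avoids the `groupby` and the merge. This is a minimal sketch on toy data, offered as an assumption about how the window could be defined rather than as the method used in this notebook:

```python
import pandas as pd

# Toy stand-in for the merged table; the dates are illustrative only.
data = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u2'],
    'first_date': pd.to_datetime(['2020-12-07', '2020-12-07', '2020-12-10']),
    'event_dt': pd.to_datetime(['2020-12-08 10:00:00',
                                '2020-12-25 12:00:00',
                                '2020-12-20 08:00:00']),
})

horizon = pd.Timedelta(days=14)

# Keep events that happened no later than 14 days after registration.
# Rows with a missing event_dt compare as False and are dropped.
within_14_days = data[data['event_dt'] <= data['first_date'] + horizon]
print(within_14_days)
```

The two definitions can give different results for users whose first event happens some time after registration.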

In [None]:
fig = go.Figure()

data_a = pd.DataFrame(data_2_weeks.query('group == "A"').groupby('event_name')['user_id'].nunique()).reindex(['login',\
                                                        'product_page', 'product_cart', 'purchase']).reset_index()
data_a.columns = ['event_name_a', 'event_counts_a']
fig.add_trace(go.Funnel(
    name = 'A - group',
    y= data_a['event_name_a'],
    x= data_a['event_counts_a'],
    textinfo = "value+percent initial"))

data_b =  pd.DataFrame(data_2_weeks.query('group == "B"').groupby('event_name')['user_id'].nunique()).reindex(['login',\
                                                        'product_page', 'product_cart', 'purchase']).reset_index()
data_b.columns = ['event_name_b', 'event_counts_b']
fig.add_trace(go.Funnel(
    name = 'B - group',
    y= data_b['event_name_b'],
    x= data_b['event_counts_b'],
    textinfo = "value+percent initial"))

fig.update_layout(
    title={
        'text': "Conversion by group with the first step: login",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})

fig.show()

*As we can see, the experiment failed: not only did the test group show no increase in conversion compared to the control group, conversion actually got worse at almost every step. The only step where group B performs better is the transition from the product page to the cart: an increase of 5.5%*
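
The percentages shown in the funnel chart can also be produced as a table, which makes the groups easier to compare. The sketch below uses a hypothetical helper `funnel_by_group` and a toy event log; it computes the share of login users who reached each step:

```python
import pandas as pd

STEPS = ['login', 'product_page', 'product_cart', 'purchase']

def funnel_by_group(events):
    """Unique users per step and group, as a share of login users."""
    users = (events.groupby(['group', 'event_name'])['user_id'].nunique()
             .unstack('group')
             .reindex(STEPS))
    return users / users.loc['login']

# Toy event log; the users and events are made up for illustration.
events = pd.DataFrame({
    'group': ['A'] * 5 + ['B'] * 6,
    'user_id': ['a1', 'a2', 'a1', 'a1', 'a1',
                'b1', 'b2', 'b1', 'b2', 'b1', 'b1'],
    'event_name': ['login', 'login', 'product_page', 'product_cart', 'purchase',
                   'login', 'login', 'product_page', 'product_page',
                   'product_cart', 'purchase'],
})
conversion = funnel_by_group(events)
print(conversion)
```

Dividing one column by the other then gives the relative change of group B against group A for every step.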

<div style="border:solid green 2px; padding: 20px">


**Conclusions:**

<br/>1. The number of events is extremely uneven: group A has 14737 events in total, while group B has 4067. In group B more than 50% of users have no events at all, and the 75th percentile is only 3 events, so there are very few events. Group A looks better: the median is 4 events.
  <br/>2. A sharp increase in events starts on December 14: people are preparing for Christmas (most users are from Europe, where Western Christian churches celebrate Christmas on December 25). This again shows that running the test on the eve of major holidays was a mistake. The marketing events, however, did not affect the test much: from the 25th activity drops, although without the promotion it could have been even lower. In group B the number of events depends little on the calendar: before the peak on December 21 it fluctuates between 200 and 400 events per day, and after the 21st there are about 100-150 events per day.
  <br/> 3. The groups are homogeneous with respect to user regions and devices.
     <br/>4. The overall conversion of the resource is 30%.
     <br/> 5. The experiment with the new recommendation system failed: instead of the expected increase, conversion in the test group decreased at almost every step. The only step where group B performs better is the transition from the product page to the cart, with an increase of 5.5%, which also does not meet expectations.

 
  <br/>
Next, let's check whether the differences in shares between the groups are statistically significant<br/>

## Testing hypotheses about equality of shares.
<a id="hypot"></a>

*The only step where group B shows an increase is the transition from the product page to the cart, so let's first test the hypothesis that these shares are equal in groups A and B:*
```
H_0: The shares of groups A and B when moving from the product page to the cart are equal

H_a: The shares of groups A and B when moving from the product page to the cart are different
alpha = 0.025
```

Since we are comparing the proportions of users who moved to the next step of the funnel, we will use the z-test for two proportions.

In [None]:
def z_test(group_1, group_2, event_1, event_2, test_n):
    """Two-sided z-test for the ratio of event_2 users to event_1 users in two groups.

    Uses the global `data` table; `test_n` is the Bonferroni divisor for alpha.
    """
    # Unique users at the starting step and at the target step in each group
    sample_a1 = data[(data['group']==group_1)&(data['event_name']==event_1)]['user_id'].nunique()
    sample_a1_ev = data[(data['group']==group_1)&(data['event_name']==event_2)]['user_id'].nunique()
    sample_a2 = data[(data['group']==group_2)&(data['event_name']==event_1)]['user_id'].nunique()
    sample_a2_ev = data[(data['group']==group_2)&(data['event_name']==event_2)]['user_id'].nunique()
    alpha = 0.05/test_n  # significance level with the Bonferroni correction
    print('alpha: ', alpha)

    # Observed proportions in each group and in the combined sample
    p1 = sample_a1_ev/sample_a1
    p2 = sample_a2_ev/sample_a2
    p_combined = (sample_a1_ev + sample_a2_ev)/ (sample_a1 + sample_a2)

    # z statistic: difference of proportions in units of its standard error
    difference = p1 - p2
    z_value = difference / mt.sqrt(p_combined * (1 - p_combined) * (1/sample_a1 + 1/sample_a2))

    distr = stats.norm(0, 1)
    p_value = (1 - distr.cdf(abs(z_value))) * 2  # two-sided test

    print('p-value: ', p_value)

    if p_value < alpha:
        print('We reject the null hypothesis: there is a significant difference between the proportions')
    else:
        print('We failed to reject the null hypothesis, there is no reason to consider the proportions different')

In [None]:
z_test ("A", "B", 'product_page', 'product_cart', 2)
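
To make the calculation easier to verify, the same statistic can be written as a function of plain counts, without access to the data frame. The sketch below uses a hypothetical helper `two_proportion_z_test` and obtains the normal CDF from `math.erf`; the counts in the example are made up:

```python
from math import erf, sqrt

def two_proportion_z_test(successes_1, trials_1, successes_2, trials_2):
    """Return (z, two-sided p-value) for the difference of two proportions."""
    p1 = successes_1 / trials_1
    p2 = successes_2 / trials_2
    # Pooled proportion under the null hypothesis of equal shares
    p_pool = (successes_1 + successes_2) / (trials_1 + trials_2)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / trials_1 + 1 / trials_2))
    z = (p1 - p2) / std_err
    # Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)

# Illustrative counts: 60 of 100 users converted in one group, 40 of 100 in the other.
z, p_value = two_proportion_z_test(60, 100, 40, 100)
print(f'z = {z:.3f}, p-value = {p_value:.4f}')
```

Separating the statistics from the data access also makes it possible to run the test on the 14-day table or on any other subset by passing different counts.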

*So there is no statistically significant difference between the shares at this step. Let's test the hypotheses for the other stages of the funnel*

```
H_0: The shares of groups A and B when switching from login to product card are equal

H_a: Shares of groups A and B differ when switching from login to product card
alpha = 0.0167
```

In [None]:
z_test ("A", "B", 'login', 'product_page', 3)

*A significant difference in shares is detected for the transition to the product page. Next, let's consider the conversion from login to the cart page*

```
H_0: The shares of groups A and B when moving from login to cart are equal

H_a: Shares of groups A and B differ when going from login to cart
alpha = 0.0167
```

In [None]:
z_test ("A", "B", 'login','product_cart', 3)

*No significant difference is detected. Now let's compare the conversion from login to purchase*

```
H_0: The shares of groups A and B when switching from login to purchase are equal

H_a: The shares of groups A and B differ when switching from login to purchase
alpha = 0.0167
```

In [None]:
z_test ("A", "B", 'login','purchase', 3)

*No significant difference is detected*
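
The tests above divide alpha by 2 or by 3 depending on the call. If all four comparisons are treated as one family, the Bonferroni threshold would be the same for each of them. The sketch below shows that variant; the helper `bonferroni_decisions` and the p-values are hypothetical and are not the results of this notebook:

```python
def bonferroni_decisions(p_values, alpha=0.05):
    """Return the corrected threshold and a reject/keep decision per test."""
    threshold = alpha / len(p_values)
    decisions = {name: p < threshold for name, p in p_values.items()}
    return threshold, decisions

# Hypothetical p-values, for illustration only.
p_values = {
    'product_page -> product_cart': 0.30,
    'login -> product_page': 0.001,
    'login -> product_cart': 0.20,
    'login -> purchase': 0.04,
}
threshold, decisions = bonferroni_decisions(p_values)
print(threshold)
print(decisions)
```

With a single family of four tests the threshold is 0.05 / 4 = 0.0125, which is stricter than the 0.025 and 0.0167 used above.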

<div style="border:solid green 2px; padding: 20px">


**Conclusions:**

<br/>Of all the tested hypotheses, a statistically significant difference is detected only for the transition from login to the product page. In the other cases the difference is not statistically significant.

## General conclusions
<a id="conclusion"></a>

<div style="border:solid pink 2px; padding: 20px">
<br/> 1. We processed the results of the A/B test.
<br/>2. Checking the results for compliance with the TOR showed:
     <br/> - The test sample is 901 people smaller than expected, that is, about 15% below the planned 6000.
  <br/> - The last day with events is December 30, not January 4, so data for 5 days is missing.
  <br/> - Users from Europe make up 4749 people, or 93% of the test audience. This is only 10.3% of all new European users, which does not meet the 15% required by the TOR.
  <br/>3. The following violations were made during the test:
  <br/> - The control group is larger than the test group by 707 people (24% of the control group).
   <br/> - The test dates overlap with the biggest Christmas and New Year sales of the year, both in Europe and America (the promo) and in Eastern Europe and Asia (the New Year's lottery). With the planned end date the overlap with promotions is 16 days; since the events actually end earlier, the real overlap is 6 days.
<br/>4. The number of events is extremely uneven between the groups: group A has 14737 events in total, while group B has 4067. In group B more than 50% of users have no events at all, and the 75th percentile is only 3 events, so there are very few events. Group A looks better: the median is 4 events.
  <br/>5. A sharp increase in events starts on December 14: people are preparing for Christmas (most users are from Europe, where Western Christian churches celebrate Christmas on December 25). This again shows that running the test on the eve of major holidays was a mistake. The marketing events, however, did not affect the test much: from the 25th activity drops, although without the promotion it could have been even lower. In group B the number of events depends little on the calendar: before the peak on December 21 it fluctuates between 200 and 400 events per day, and after the 21st there are about 100-150 events per day.
  <br/> 6. The groups are homogeneous with respect to user regions and devices.
  <br/>7. The overall conversion of the resource is 30%.
  <br/> 8. The experiment with the new recommendation system failed: instead of the expected increase, conversion in the test group decreased at almost every step. The only step where group B performs better is the transition from the product page to the cart, with an increase of 5.5%, which also does not meet expectations.
  <br/>9. Of all the hypotheses tested for the stages of the funnel, a statistically significant difference is detected only for the transition from login to the product page. In the other cases the difference is not statistically significant.

## Recommendations

<div style="border:solid pink 2px; padding: 20px">
 This test can be considered failed. It must be rerun strictly following the requirements of the TOR, avoiding seasonal promotions and parallel tests.
  For the repeated experiment, it is necessary to form a new control group to ensure that the samples are homogeneous, to estimate and account for seasonality in advance, to follow the TOR strictly, and to evaluate the test results before stopping it.