In [None]:
import pandas as pd
import numpy as np
np.seterr(divide='ignore', invalid='ignore')
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import plotly.graph_objs as go
from scipy import stats
import statsmodels.api as sm
import warnings
warnings.filterwarnings('ignore')

In [None]:
marketing_events = pd.read_csv('ab_project_marketing_events_us.csv')

In [None]:
marketing_events.head()

In [None]:
marketing_events.info()

In [None]:
users = pd.read_csv('final_ab_new_users_upd_us.csv')

In [None]:
users.head()

In [None]:
users.info()

In [None]:
user_events = pd.read_csv('final_ab_events_upd_us.csv')

In [None]:
user_events.head()

In [None]:
user_events.info()

In [None]:
participants = pd.read_csv('final_ab_participants_upd_us.csv')

In [None]:
participants.head()

In [None]:
participants.info()

## Data Preprocessing

##### Marketing_events

In [None]:
#check for missing values in marketing_events
marketing_events.isna().sum()

In [None]:
#check for duplicate values
marketing_events.duplicated().sum()

In [None]:
#converting the datatypes
marketing_events['start_dt'] = pd.to_datetime(marketing_events['start_dt'])
marketing_events['finish_dt'] = pd.to_datetime(marketing_events['finish_dt'])
marketing_events.info()

The test runs from 2020-12-07 to 2021-01-01, so the Christmas & New Year Promo and the CIS New Year Gift Lottery are the marketing events of interest here.

##### Users

In [None]:
# check for missing values in users
users.isna().sum()

In [None]:
#check for duplicate values
users.duplicated().sum()

In [None]:
# Convert first_date to datetime and region/device to category
users['first_date'] = pd.to_datetime(users['first_date'])
users['region'] = users['region'].astype('category')
users['device'] = users['device'].astype('category')
users.info()

In [None]:
users.describe(include='all')

The technical description states that new-user sign-ups stopped on 2020-12-21, but the latest first_date in the data is 2020-12-23. This discrepancy should be clarified with the authors of the technical description. In the meantime, the analysis below filters sign-ups to the documented window of 2020-12-07 to 2020-12-21.
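
One way to quantify this discrepancy is to compare the latest sign-up date with the documented cutoff and count the rows that fall after it. The sketch below uses a small made-up frame: `sample_users` and its rows are illustrative assumptions, not values from the project files.

```python
import pandas as pd

# Hypothetical sample standing in for the users table.
sample_users = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4'],
    'first_date': pd.to_datetime(['2020-12-07', '2020-12-21', '2020-12-22', '2020-12-23']),
})

# Sign-up stop date taken from the technical description.
cutoff = pd.Timestamp('2020-12-21')

last_signup = sample_users['first_date'].max()
late_signups = int((sample_users['first_date'] > cutoff).sum())

print(f"Last sign-up date: {last_signup.date()}, rows after cutoff: {late_signups}")
```

Running the same two lines against the real users frame would show how many sign-ups fall outside the documented window.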

In [None]:
users['region'].value_counts()

In [None]:
# filter new_users by region and sign-up date
eu_users = users[(users['region'] == 'EU') & (users['first_date'] >= '2020-12-07') & (users['first_date'] <= '2020-12-21')]

# count the number of rows in the filtered dataframe
num_eu_users = len(eu_users)
num_users_pntg= round(num_eu_users * 0.15)
num_eu_users, num_users_pntg

According to the technical description, the expected number of test participants is 6000.

##### User_events

In [None]:
#check for missing values in user_events
user_events.isna().sum()

In [None]:
user_events[pd.isnull(user_events).any(axis=1)]

In [None]:
#check for duplicate values
user_events.duplicated().sum()

In [None]:
# Converting the datatypes
user_events['event_dt'] = pd.to_datetime(user_events['event_dt'])
user_events['event_name'] = user_events['event_name'].astype('category')
user_events['details'] = user_events['details'].astype('float')
user_events.info()

In [None]:
user_events.describe(include = 'all')

In [None]:
user_events['event_name'].value_counts()

In [None]:
#check if missing values in details are connected to event type
for event in user_events['event_name'].unique():
    event_missing_details = user_events.query('event_name == @event')['details'].isna().sum()
    print('Event name: {}. Missing details: {} out of {}.'.format(event, event_missing_details, len(user_events.query('event_name == @event'))))

# look at the summary statistics (including minimum and maximum) of 'details':
user_events['details'].describe()

In [None]:
# fill missing values with 0
user_events['details'] = user_events['details'].fillna(0)

There are 363447 missing values in the details column. However, details only carries additional information about the event: the purchase amount when event_name is purchase. Only the 60,314 purchase events have a value in details, so the missing values are expected for every other event type. The event dates fall within the test period.
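
The link between missing details and event type can also be checked in a single grouped expression. This is a minimal sketch on made-up rows; `sample_events` is a hypothetical stand-in for user_events.

```python
import pandas as pd

# Hypothetical events: only 'purchase' rows carry a value in details.
sample_events = pd.DataFrame({
    'event_name': ['login', 'purchase', 'product_page', 'purchase', 'login'],
    'details': [None, 4.99, None, 99.99, None],
})

# Share of missing details per event type (1.0 = always missing, 0.0 = never missing).
missing_share = (
    sample_events['details'].isna()
    .groupby(sample_events['event_name'])
    .mean()
)
print(missing_share)
```

A share of 0.0 for purchase and 1.0 for every other event would confirm that the gaps are structural rather than data-quality problems.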

##### Participants

In [None]:
participants.isnull().sum()

In [None]:
participants.head(2)

In [None]:
participants.describe(include='all')

In [None]:
participants['group'].value_counts()

In [None]:
participants['ab_test'].value_counts()

In [None]:
participants['user_id'].duplicated().sum()

In [None]:
duplicated_user = participants[participants.duplicated(subset=['user_id'], keep=False)].sort_values(by=['user_id', 'ab_test']).reset_index(drop=True)
duplicated_user

In [None]:
# create separate dataframes for each AB test group
interface_eu = participants[participants['ab_test'] == 'interface_eu_test']
recommender_system = participants[participants['ab_test'] == 'recommender_system_test']

# count the number of duplicate users in each group
interface_eu_duplicates = interface_eu['user_id'].duplicated().sum()
recommender_system_duplicates = recommender_system['user_id'].duplicated().sum()

# print the results
print("There are {} duplicate users in interface_eu_test participants".format(interface_eu_duplicates))
print("There are {} duplicate users in recommender_system_test participants".format(recommender_system_duplicates))

* The participants dataset contains 887 duplicate user_ids, which belong to users who participated in both the interface_eu_test and the recommender_system_test. There are no duplicate user_ids within either individual test.
* The recommender_system_test has only 3675 participants, which falls short of the 6000 expected in the technical description. The technical description states that the audience should be 15% of new users from the EU region who signed up between 2020-12-07 and 2020-12-21, the date when new-user intake stopped.
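
Users who appear in more than one test can also be listed directly by counting distinct tests per user. The frame below is invented for illustration; only the column names follow the participants table.

```python
import pandas as pd

# Hypothetical participants: u1 and u3 are enrolled in both tests.
sample_participants = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u2', 'u3', 'u3', 'u4'],
    'group':   ['A',  'B',  'A',  'B',  'A',  'B'],
    'ab_test': ['interface_eu_test', 'recommender_system_test',
                'interface_eu_test', 'interface_eu_test',
                'recommender_system_test', 'recommender_system_test'],
})

# Number of distinct tests each user is enrolled in.
tests_per_user = sample_participants.groupby('user_id')['ab_test'].nunique()

# Users enrolled in more than one test.
overlap_ids = sorted(tests_per_user[tests_per_user > 1].index)
print(overlap_ids)
```

Whether such users should be dropped depends on whether exposure to one test could affect behavior in the other, which the technical description does not settle.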

#### Data Preprocessing

In [None]:
new_eu_users = users[(users['region'] == 'EU') & (users['first_date'] >= '2020-12-07') & (users['first_date'] <= '2020-12-21')]
new_eu_users

In [None]:
# select the participants of interface_eu_test
interface_eu = participants[participants['ab_test'] == 'interface_eu_test']
interface_eu.describe(include = 'all')

Next, keep only the interface_eu records whose user_id appears in new_eu_users.

In [None]:
new_eu_users = users.loc[(users['region'] == 'EU') & (users['first_date'] >= '2020-12-07') & (users['first_date'] <= '2020-12-21')]
print('There are {} new users from the EU region whose signup date is between 2020-12-07 and 2020-12-21'.format(len(new_eu_users)))

interface_eu = interface_eu[interface_eu['user_id'].isin(new_eu_users['user_id'])]
interface_eu.describe(include='all')

The number of new users from the EU region who signed up between 2020-12-07 and 2020-12-21 is 39466. The required audience for the test is 15% of these users, or about 5920, which is close to the expected 6000 test participants. However, interface_eu_test has 9848 participants, approximately 25% of the new EU users and well above the required 15%.
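
The arithmetic behind these shares is short enough to spell out. The two counts are copied from the paragraph above; the variable names are only illustrative.

```python
new_eu_user_count = 39466      # new EU users in the sign-up window (from the text)
interface_participants = 9848  # interface_eu_test participants (from the text)

# Audience required by the technical description: 15% of new EU users.
required_audience = new_eu_user_count * 0.15

# Share of new EU users actually enrolled in interface_eu_test.
actual_share = interface_participants / new_eu_user_count

print(f"15% of new EU users: {required_audience:.1f}")
print(f"Actual share in interface_eu_test: {actual_share:.1%}")
```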

In [None]:
interface_eu['group'].value_counts()

In [None]:
#Finding the proportion of participants in each group(A and B) for 'interface_eu'
group_a = interface_eu[interface_eu['group'] == 'A']
group_b = interface_eu[interface_eu['group'] == 'B']

pro_group_a = len(group_a) / len(interface_eu)
pro_group_b = len(group_b) / len(interface_eu)

print('The proportion of Group A to total participants is {:.2f}'.format(pro_group_a))
print('The proportion of Group B to total participants is {:.2f}'.format(pro_group_b))

The proportions of users from group A and B are almost equal.

In [None]:
# keep only the calendar date of each event (time set to midnight)
user_events['event_date'] = user_events['event_dt'].dt.normalize()
user_events

In [None]:
#Left join user_events df to interface_eu df
interface_eu_events = interface_eu.merge(user_events, on='user_id', how='left')
interface_eu_events

In [None]:
interface_eu_events.describe(include = 'all')

In [None]:
interface_eu_events.to_csv('eu_events.csv')

In [None]:
#create a dataframe for users participated in recommender_system test
recommender_system = participants[participants['ab_test'] == 'recommender_system_test']
recommender_system.describe(include = 'all')

Next, keep only the recommender_system records whose user_id appears in new_eu_users.

In [None]:
recommender_system = recommender_system[recommender_system['user_id'].isin(new_eu_users['user_id'])]
recommender_system.describe(include='all')

In [None]:
recommender_system['group'].value_counts()

* The total number of test participants for recommender_system_test, where users are from the EU region and signed up between 2020-12-07 and 2020-12-21, is only 3481, whereas the expected number was 6000. 
* Additionally, the sample sizes of groups A and B are significantly different, rendering the data inadequate for the test. Therefore, further analysis will only be conducted on interface_eu_test.
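
Whether two group sizes differ by more than chance would allow can be checked with a chi-square goodness-of-fit test. The sketch below assumes the intended split is 50/50, which the technical description does not state explicitly, and the group sizes passed in are hypothetical rather than the actual counts.

```python
import math

def srm_p_value(n_a: int, n_b: int) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) against a 50/50 split."""
    expected = (n_a + n_b) / 2
    chi2 = (n_a - expected) ** 2 / expected + (n_b - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical group sizes, chosen only to illustrate the check.
unbalanced = srm_p_value(2000, 1500)
balanced = srm_p_value(1750, 1750)

print(f"unbalanced split p-value: {unbalanced:.3g}")
print(f"balanced split p-value:   {balanced:.3g}")
```

A very small p-value suggests that the split itself is broken, which would make any downstream comparison of the groups unreliable.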

In [None]:
#Resolving Discrepancy in Stop Date for New User Sign-Ups
# Check the unique values and frequency of the 'first_date' column
users['first_date'].value_counts().sort_index()


The output lists each unique value in the first_date column with its frequency, that is, the number of sign-ups per day.

In [None]:
# Daily sign-up counts entered by hand. To plot from the loaded data instead, use
# users['first_date'].value_counts().sort_index()
data = {
    'first_date': ['2020-12-07', '2020-12-08', '2020-12-09', '2020-12-10', '2020-12-11', '2020-12-12', '2020-12-13', '2020-12-14', '2020-12-15', '2020-12-16', '2020-12-17', '2020-12-18', '2020-12-19', '2020-12-20', '2020-12-21', '2020-12-22', '2020-12-23'],
    'signups': [5291, 3017, 2010, 2784, 2226, 3591, 4181, 5448, 2924, 2093, 2940, 3238, 3480, 4140, 6077, 3083, 2180]
}

# Create a DataFrame from the sample data
df = pd.DataFrame(data)

# Convert 'first_date' column to datetime
df['first_date'] = pd.to_datetime(df['first_date'])

# Sort the DataFrame by 'first_date' for proper visualization
df = df.sort_values(by='first_date')

# Create a line plot
plt.figure(figsize=(12, 6))
plt.plot(df['first_date'], df['signups'], marker='o', linestyle='-', color='b')
plt.title('User Sign-Ups Over Time')
plt.xlabel('Date')
plt.ylabel('Number of Sign-Ups')
plt.xticks(rotation=45)
plt.grid(True)

# Show the plot
plt.tight_layout()
plt.show()
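
The daily counts for such a plot do not need to be typed in by hand; they can be derived from the sign-up dates. A minimal sketch on invented rows, where `sample_users` is a hypothetical stand-in for the users table:

```python
import pandas as pd

# Hypothetical sign-up dates.
sample_users = pd.DataFrame({
    'first_date': pd.to_datetime([
        '2020-12-07', '2020-12-07', '2020-12-08', '2020-12-21', '2020-12-23',
    ]),
})

# Number of sign-ups per day, ordered chronologically.
daily_signups = sample_users['first_date'].value_counts().sort_index()
print(daily_signups)
```

The resulting Series can be passed straight to a line plot, for example with `daily_signups.plot(marker='o')`, so the chart always reflects the loaded data.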

## Exploratory Data Analysis

- Study conversion at different funnel stages
- Is the number of events per user distributed equally in the samples?
- Are there users who enter both samples?
- How is the number of events distributed by days?
- Think of the possible details in the data that you have to take into account before starting the A/B test?

#### Study conversion at different funnel stages

In [None]:
# Calculate conversions at each stage for each group
conversion_funnel = interface_eu_events.pivot_table(index='event_name', 
                                                     values='user_id', 
                                                     columns='group', 
                                                     aggfunc=lambda x: x.nunique()) \
                                         .reset_index()
conversion_funnel

In [None]:
conversion_funnel = conversion_funnel.rename(columns={'event_name': 'funnel_stage', 'A': 'Group A', 'B': 'Group B'})
conversion_funnel

In [None]:
#conversion funnel stages
funnel_stages = ['login', 'product_page', 'product_cart', 'purchase']
pd.options.display.float_format = '{:.2%}'.format


#calculate the total number of users in each group
a = len(group_a['user_id'].unique())
b = len(group_b['user_id'].unique())

conversion_rates = {'stage': funnel_stages}

conversion_funnel['conversion_rate_a'] = conversion_funnel['Group A'] / a 
conversion_funnel['conversion_rate_b'] = conversion_funnel['Group B'] / b

#calculating the difference of conversions at each stage
conversion_funnel['increase/decrease'] = (conversion_funnel['Group B'] - conversion_funnel['Group A']) / conversion_funnel['Group A']


cols = ['funnel_stage', 'Group A', 'conversion_rate_a', 'Group B', 'conversion_rate_b', 'increase/decrease']
conversion_funnel = conversion_funnel[cols]

conversion_funnel

* For group A, more users reach the purchase stage than the product_cart stage. This suggests that some users proceed directly to purchase without adding the product to their cart.
* At the product_page and purchase stages, conversion for group A is higher than for group B, which means group B shows no increase in conversion at these stages.
* At the product_cart stage, group B shows a small increase of 2.24% over group A.
* Based on the funnel analysis, the expected increase of at least 10% in conversion at each funnel stage was not achieved.
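
The comparison against the 10% target can be made explicit by computing the relative lift of group B over group A at each stage. The events below are invented for illustration; only the column names match interface_eu_events.

```python
import pandas as pd

# Hypothetical events for four users, two per group.
sample = pd.DataFrame({
    'user_id':    ['u1', 'u1', 'u2', 'u2', 'u3', 'u3', 'u4', 'u4', 'u4'],
    'group':      ['A',  'A',  'A',  'A',  'B',  'B',  'B',  'B',  'B'],
    'event_name': ['login', 'purchase', 'login', 'product_page',
                   'login', 'product_page', 'login', 'product_page', 'purchase'],
})

# Unique users reaching each stage, per group.
users_per_stage = sample.pivot_table(index='event_name', columns='group',
                                     values='user_id', aggfunc='nunique', fill_value=0)

# Conversion rate = users at the stage / users in the group.
group_sizes = sample.groupby('group')['user_id'].nunique()
rates = users_per_stage.div(group_sizes, axis='columns')

# Relative lift of B over A, compared with the 10% target.
lift = rates['B'] / rates['A'] - 1
meets_target = lift >= 0.10
print(pd.DataFrame({'rate_A': rates['A'], 'rate_B': rates['B'],
                    'lift': lift, 'meets_target': meets_target}))
```

Computing the lift from rates rather than raw counts keeps the comparison fair when the two groups are not exactly the same size.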

##### Is the number of events per user distributed equally in the samples?

In [None]:
mean_events_a = interface_eu_events[interface_eu_events['group'] == 'A'].groupby('user_id')['event_name'].count().mean()
std_events_a = interface_eu_events[interface_eu_events['group'] == 'A'].groupby('user_id')['event_name'].count().std()

mean_events_b = interface_eu_events[interface_eu_events['group'] == 'B'].groupby('user_id')['event_name'].count().mean()
std_events_b = interface_eu_events[interface_eu_events['group'] == 'B'].groupby('user_id')['event_name'].count().std()

print("Group A: mean = {:.2f}, std = {:.2f}".format(mean_events_a, std_events_a))
print("Group B: mean = {:.2f}, std = {:.2f}".format(mean_events_b, std_events_b))

In [None]:
# group A
events_a = interface_eu_events[interface_eu_events['group'] == 'A'].groupby('user_id')['event_name'].count()
plt.hist(events_a, bins=20, alpha=0.5, label='Group A')

# group B
events_b = interface_eu_events[interface_eu_events['group'] == 'B'].groupby('user_id')['event_name'].count()
plt.hist(events_b, bins=20, alpha=0.5, label='Group B')

plt.xlabel('Number of Events per User')
plt.ylabel('Count')
plt.legend(loc='upper right')

plt.show()

The histogram shows that the number of events per user is distributed almost equally in groups A and B.
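
The per-group statistics can also be produced in one pass instead of repeating the filter for each group. A small sketch on hypothetical events:

```python
import pandas as pd

# Hypothetical events: u1 has 3 events, u2 has 1, u3 and u4 have 2 each.
sample_events = pd.DataFrame({
    'user_id':    ['u1', 'u1', 'u1', 'u2', 'u3', 'u3', 'u4', 'u4'],
    'group':      ['A',  'A',  'A',  'A',  'B',  'B',  'B',  'B'],
    'event_name': ['login', 'product_page', 'purchase', 'login',
                   'login', 'product_page', 'login', 'purchase'],
})

# Events per user, keeping the group as the outer index level.
events_per_user = sample_events.groupby(['group', 'user_id'])['event_name'].count()

# Summary statistics of events per user, one row per group.
summary = events_per_user.groupby(level='group').agg(['mean', 'std', 'median'])
print(summary)
```

Looking at the median alongside the mean helps when a few very active users skew the distribution.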

##### Are there users who enter both samples? 

In [None]:
interface_eu_events.groupby(['user_id'])['group'].nunique().value_counts()

In [None]:
#user_id's for both samples
users_a = set(interface_eu_events[interface_eu_events['group'] == 'A']['user_id'])
users_b = set(interface_eu_events[interface_eu_events['group'] == 'B']['user_id'])

#the intersection of the two sets of user_ids
users_both = users_a.intersection(users_b)

print(f"There are {len(users_both)} users who appear in both samples.")

No users participated in both groups.

##### How is the number of events distributed by days?

In [None]:
g =interface_eu_events.groupby(['event_date'])['event_name'].count().plot(kind='bar', x='event_date', figsize=[15,10])

plt.title('Distribution of Number of Events by Date', fontsize=20)
plt.xlabel('Dates', fontsize=15)
plt.ylabel('Number of Events', fontsize=15)

g.bar_label(g.containers[0])
plt.xticks(rotation=75);

New users who signed up between 2020-12-07 and 2020-12-21 responded actively: event activity began to increase as early as 2020-12-12 and continued until 2020-12-23.

#### Think of the possible details in the data that you have to take into account before starting the A/B test?
* No user appears in both groups.
* The two groups are almost the same size.
* The number of events per user is almost equally distributed between the groups.
* All the users are from the EU and their signup dates are between 2020-12-07 and 2020-12-21.
* The dates of events are between 2020-12-07 and 2021-01-01.

##### Evaluate the A/B test results

###### What can you tell about the A/B test results?
Assuming that the participants were chosen randomly, there are still potential biases in this A/B test.
* The test was only conducted for users from the EU region, which may not accurately reflect the population of interest for an international online store, unless the store runs an independent website for each continent and the new recommendation system applies only to the EU website.

* The test was conducted during the holiday season, between 2020-12-07 and 2021-01-01, which is known for a surge in purchases. Therefore, any change in conversions may not be due solely to the new webpage design or the new recommendation system, but also to the holiday season or other factors such as the Christmas and New Year promo running in the EU region from 2020-12-25 until 2021-01-03.

* To obtain a more accurate picture of the impact of the new design and recommendation system, it is recommended to conduct another test during regular days, when users' spending behavior is not affected by the holidays.

###### Use the z-criterion to check the statistical difference between the proportions.
- The z-criterion (z-test) is used to determine whether there is a significant difference between two proportions in A/B testing. It compares the proportion of successes between two groups and calculates the z-score and corresponding p-value.
- We use the proportions_ztest function from the statsmodels library in Python to calculate the z-score and p-value. The syntax is z_score, p_value = proportions_ztest(count, nobs, alternative='larger')
- The next step is to decide whether to reject or fail to reject the null hypothesis:
   * We compare the z-score to the critical value at the 95% confidence level: 1.645 for a one-tailed test and 1.960 for a two-tailed test.
   * If the z-score is greater than the critical value, we reject the null hypothesis.
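
To make the decision rule concrete, the statistic can be computed by hand. The sketch below implements the pooled two-proportion z-test for a right-tailed alternative using only the standard library. The function name and the counts are hypothetical, and the result should agree with proportions_ztest under its default settings, though that is worth confirming against the statsmodels documentation.

```python
import math

def one_sided_proportions_ztest(success_b, n_b, success_a, n_a):
    """Pooled two-proportion z-test for H1: p_B > p_A. Returns (z, p_value)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis that both groups share one rate.
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Upper-tail area of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 300 of 1000 users convert in A, 330 of 1000 in B.
z, p = one_sided_proportions_ztest(success_b=330, n_b=1000, success_a=300, n_a=1000)
print(f"z = {z:.4f}, p = {p:.4f}")
print("reject H0" if z > 1.645 else "fail to reject H0")
```

For a right-tailed test, comparing z with 1.645 and comparing the p-value with 0.05 are equivalent, so checking either one is sufficient.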

##### Perform z_test for each stage
* To perform a two-sample z-test for each event, we can use the proportions_ztest() function from the statsmodels.stats.proportion module.
* $H_0: p_B \leq p_A$ The conversion proportion of the treatment group (B) is equal to or less than that of the control group (A).
* $H_1: p_B > p_A$ The conversion proportion of the treatment group (B) is greater than that of the control group (A).

In [None]:
conversion_funnel

In [None]:
interface_eu_events

In [None]:
#Z-test for product_page 
# filter the data for the product page
product_page_data = conversion_funnel[conversion_funnel['funnel_stage'] == 'product_page']

# number of conversions for group A and group B
conv_a = product_page_data['Group A'].sum()
conv_b = product_page_data['Group B'].sum()

# number of trials in Group A and Group B
num_group_a = interface_eu_events[interface_eu_events['group'] == 'A']['user_id'].nunique()
num_group_b = interface_eu_events[interface_eu_events['group'] == 'B']['user_id'].nunique()

print("The number of conversions for product_page in Group A, Group B:",conv_a, '&', conv_b)
print("The number of trials for product_page in Group A, Group B:",num_group_a, '&', num_group_b)

In [None]:
# calculate the z-score and p-value
z_score, p_value = sm.stats.proportions_ztest([conv_b, conv_a], [num_group_b, num_group_a], alternative='larger')

print("z_score: {:.4f}".format(z_score))
print("p_value: {:.4f}".format(p_value))

#decision
if z_score > 1.645  and p_value < 0.05:
    print('We reject the null hypothesis')
else:
    print('We fail to reject the null hypothesis')


* We reject the null hypothesis if z_score > $Z_{\alpha}$.
* $Z_{\alpha}$, here $Z_{0.05}$, is the critical value at the 95% confidence level, which is 1.645 for a **one-tailed test**. For a right-tailed test, we reject the null hypothesis if z_score > $Z_{\alpha}$.
* Our z_score is -0.9740, which is less than $Z_{\alpha}$ of 1.645. Our p_value of 0.8350 is greater than alpha of 0.05.
* Based on the result we **fail to reject the null hypothesis**. Therefore, we do not have enough evidence to support the claim that the new and improved recommendation system drives more conversions than the old one.

In [None]:
#Z-test for product_cart
# filter the data for the product cart stage
product_cart_data = conversion_funnel[conversion_funnel['funnel_stage'] == 'product_cart']

# number of conversions for group A and group B
conv_a = product_cart_data['Group A'].sum()
conv_b = product_cart_data['Group B'].sum()

# number of trials in Group A and Group B
num_group_a = interface_eu_events[interface_eu_events['group'] == 'A']['user_id'].nunique()
num_group_b = interface_eu_events[interface_eu_events['group'] == 'B']['user_id'].nunique()

print("The number of conversions for product_cart in Group A, Group B:",conv_a, '&', conv_b)
print("The number of trials for product_cart in Group A, Group B:",num_group_a, '&', num_group_b)

In [None]:
# calculate the z-score and p-value
z_score, p_value = sm.stats.proportions_ztest([conv_b, conv_a], [num_group_b, num_group_a], alternative='larger')

print("z_score: {:.4f}".format(z_score))
print("p_value: {:.4f}".format(p_value))

#decision
if z_score > 1.645  and p_value < 0.05:
    print('We reject the null hypothesis')
else:
    print('We fail to reject the null hypothesis')

* We reject the null hypothesis if z_score > $Z_{\alpha}$.
* Our z_score is 1.5352 which is less than $Z_{\alpha}$ of 1.645. Our p_value of 0.0624 is greater than alpha of 0.05
* Based on the result we **fail to reject the null hypothesis**. Therefore, we do not have enough evidence to support the claim that the new and improved recommendation system drives more conversions than the old one.

In [None]:
#Z-test for purchase
# filter the data for the purchase stage
purchase_data = conversion_funnel[conversion_funnel['funnel_stage'] == 'purchase']

# number of conversions for group A and group B
conv_a = purchase_data['Group A'].sum()
conv_b = purchase_data['Group B'].sum()

# number of trials in Group A and Group B
num_group_a = interface_eu_events[interface_eu_events['group'] == 'A']['user_id'].nunique()
num_group_b = interface_eu_events[interface_eu_events['group'] == 'B']['user_id'].nunique()

print("The number of conversions for purchase in Group A, Group B:",conv_a, '&', conv_b)
print("The number of trials for purchase in Group A, Group B:",num_group_a, '&', num_group_b)

In [None]:
# calculate the z-score and p-value
z_score, p_value = sm.stats.proportions_ztest([conv_b, conv_a], [num_group_b, num_group_a], alternative='larger')

print("z_score: {:.4f}".format(z_score))
print("p_value: {:.4f}".format(p_value))

#decision
if z_score > 1.645  and p_value < 0.05:
    print('We reject the null hypothesis')
else:
    print('We fail to reject the null hypothesis')

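* We reject the null hypothesis if z_score > $Z_{\alpha}$.
* Our z_score is -2.4652, which is less than $Z_{\alpha}$ of 1.645. Our p_value of 0.9932 is greater than alpha of 0.05. The negative z_score indicates that group B converted at a lower rate than group A at this stage.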
* Based on the result we **fail to reject the null hypothesis**. Therefore, we do not have enough evidence to support the claim that the new and improved recommendation system drives more conversions than the old one.

## Conclusions

#### Exploratory Data Analysis

- I performed EDA on preprocessed data in which the participants met the following technical requirements:
     * The participants are all from the EU region.
     * The participants are new users whose sign-up dates are between 2020-12-07 and 2020-12-21.
     * The dates of the events are between 2020-12-07 and 2021-01-01.
- The conversion funnel analysis shows that the expected increase of at least 10% in conversion at each of the 3 stages (product_page, product_cart, and purchase) was not met.
- The number of events per user is distributed almost equally for groups A and B.
- There are no users who participated in both groups.
- Among new users who signed up between 2020-12-07 and 2020-12-21, events started to pick up as early as 2020-12-12 and continued until 2020-12-23.

#### A/B Test Results

-  The test only includes users from the EU region, which may not accurately represent the international online store's population of interest unless the store has independent websites for each continent and the new recommendation system is only for the EU webpage.
-  The test was conducted during the Christmas holiday season, where there is typically a surge in purchases, making it difficult to determine the impact of the new webpage design or recommendation system.
-  There was a Christmas and New Year promo running in the EU region from 2020-12-25 until 2021-01-03, which may have influenced user behavior during the test period.
-  It is recommended to conduct another test during regular days to observe users' typical spending behavior without any external factors.

#### z-test

-  A two-sample z-test was performed to determine if there is a statistical difference between the two proportions.
-  $H_0: (p_B \leq p_A)$ The null hypothesis was that the proportion of conversions in treatment group (B) is equal to or less than the control group (A).
-  $H_1 : p_{B} > p_{A}$ The alternative hypothesis was that the proportion of conversions in treatment group (B) is greater than the control group (A).
- Based on the results of the z-tests at the product_page, product_cart, and purchase stages, we fail to reject the null hypothesis in every case.
- This indicates that there is not enough evidence to suggest that the new recommendation system drives more conversions than the old one.

#### Recommendation

- It is recommended to conduct another test during regular days to observe user behavior and eliminate the effects of holiday season spending.
- Random selection of participants from different regions is recommended to better represent the international user base of the online store.
- Clarifications are needed for the technical description, as some parts do not match the given data:

  * The technical description names the test "recommender_system_test," but it is the interface_eu_test dataset that satisfies the technical specifications.
  * The technical description refers to groups A (control) and B (new payment funnel), but also mentions testing changes related to an improved recommendation system. This creates confusion about the specific changes being tested. It is possible that there is a mistake in the technical details or that they have not yet been updated. Clarification is needed on this point.