# Project 9_Integrated_2

I work at a startup that sells food products, and we need to investigate user behavior in the company's app.

## Purpose of the project

The purposes of the analysis are to study the event funnel, determine how users reach the purchase stage, and then run an A/A/B test to find out whether changing the fonts across the entire application influences user behavior.
First we study the sales funnel: how users reach the purchase stage, how many users actually make it to this stage, and how many get stuck at earlier stages.
Then we interpret the results of the A/A/B test. The designers would like to change the fonts for the entire app, but users might find the new design intimidating. The analysis should support a decision based on the following A/A/B test: the users are split into three groups. Two control groups get the old fonts and one test group gets the new ones. Creating two A groups has a clear advantage: we only trust the accuracy of the test when the two control groups are similar. If there are significant differences between the A groups, this helps us uncover factors that may be distorting the results.


## 1. Open the data file and read the general information


In [None]:
!pip install -qq --user statsmodels

In [1]:
# Loading all the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from statsmodels.stats.proportion import proportions_ztest
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from scipy import stats as st
from functools import reduce
from dateutil.relativedelta import relativedelta
import datetime
from datetime import timedelta

In [4]:
# Load the data file
df = pd.read_csv('C:/logs_exp_us.csv') 
#let's print the first 5 rows
df.head()

Unnamed: 0,EventName\tDeviceIDHash\tEventTimestamp\tExpId
0,MainScreenAppear\t4575588528974610257\t1564029...
1,MainScreenAppear\t7416695313311560658\t1564053...
2,PaymentScreenSuccessful\t3518123091307005509\t...
3,CartScreenAppear\t3518123091307005509\t1564054...
4,PaymentScreenSuccessful\t6217807653094995999\t...


In [6]:
#The file uses a tab (\t) as the separator
df = pd.read_csv('C:/logs_exp_us.csv', sep='\t') 
#let's print the first 5 rows
df.head()

Unnamed: 0,EventName,DeviceIDHash,EventTimestamp,ExpId
0,MainScreenAppear,4575588528974610257,1564029816,246
1,MainScreenAppear,7416695313311560658,1564053102,246
2,PaymentScreenSuccessful,3518123091307005509,1564054127,248
3,CartScreenAppear,3518123091307005509,1564054127,248
4,PaymentScreenSuccessful,6217807653094995999,1564055322,248


### Description of the data
Each log entry is a user action or an event.

- `EventName`: event name
- `DeviceIDHash`: unique user identifier
- `EventTimestamp`: event time
- `ExpId`: experiment number: 246 and 247 are the control groups, 248 is the test group

## 2. Prepare the data for analysis
Rename the columns, check for missing values and data types, add a date and time column and a separate column for dates.

In [7]:
# Print the general information about the dataframe
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244126 entries, 0 to 244125
Data columns (total 4 columns):
 #   Column          Non-Null Count   Dtype 
---  ------          --------------   ----- 
 0   EventName       244126 non-null  object
 1   DeviceIDHash    244126 non-null  int64 
 2   EventTimestamp  244126 non-null  int64 
 3   ExpId           244126 non-null  int64 
dtypes: int64(3), object(1)
memory usage: 7.5+ MB


No missing values

In [8]:
#Check for duplicates
df.duplicated().sum()

413

The duplicates are a small part of the dataframe: 413 of 244,126 rows, about 0.17%.

In [9]:
#Remove duplicates
df.drop_duplicates(inplace=True)

#Check
df.duplicated().sum()

0

In [10]:
#Rename columns
df.columns = (['event_name', 'user_id', 'ts', 'group'])

df.head()

Unnamed: 0,event_name,user_id,ts,group
0,MainScreenAppear,4575588528974610257,1564029816,246
1,MainScreenAppear,7416695313311560658,1564053102,246
2,PaymentScreenSuccessful,3518123091307005509,1564054127,248
3,CartScreenAppear,3518123091307005509,1564054127,248
4,PaymentScreenSuccessful,6217807653094995999,1564055322,248


In [11]:
#Check the unique values in the `event_name` column
df.event_name.unique()

array(['MainScreenAppear', 'PaymentScreenSuccessful', 'CartScreenAppear',
       'OffersScreenAppear', 'Tutorial'], dtype=object)

In [12]:
#Change types of `event_name` column and `ts` column
df['event_name'] = df['event_name'].astype('category')
df['ts'] = pd.to_datetime(df['ts'], unit='s')
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 243713 entries, 0 to 244125
Data columns (total 4 columns):
 #   Column      Non-Null Count   Dtype         
---  ------      --------------   -----         
 0   event_name  243713 non-null  category      
 1   user_id     243713 non-null  int64         
 2   ts          243713 non-null  datetime64[ns]
 3   group       243713 non-null  int64         
dtypes: category(1), datetime64[ns](1), int64(2)
memory usage: 7.7 MB


In [13]:
#Adding date column
df['date'] = df['ts'].dt.date
df.head()

Unnamed: 0,event_name,user_id,ts,group,date
0,MainScreenAppear,4575588528974610257,2019-07-25 04:43:36,246,2019-07-25
1,MainScreenAppear,7416695313311560658,2019-07-25 11:11:42,246,2019-07-25
2,PaymentScreenSuccessful,3518123091307005509,2019-07-25 11:28:47,248,2019-07-25
3,CartScreenAppear,3518123091307005509,2019-07-25 11:28:47,248,2019-07-25
4,PaymentScreenSuccessful,6217807653094995999,2019-07-25 11:48:42,248,2019-07-25


## 3. Study and check the data



In [14]:
#Calculate the number of distinct event types
df.event_name.nunique()

5

In [15]:
#How many users are in the logs?
df.user_id.nunique()

7551

In [16]:
#How many distinct event types does a user perform on average?
event_per_user = df.groupby('user_id')['event_name'].nunique().reset_index()
event_per_user['event_name'].describe()

count    7551.000000
mean        2.674480
std         1.454287
min         1.000000
25%         1.000000
50%         3.000000
75%         4.000000
max         5.000000
Name: event_name, dtype: float64

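Each user performs 2.67 distinct event types on average, and the median is 3, so a typical user performs three different events. Note that this counts event types, not individual events: with 243,713 logged events and 7,551 users, the average number of events per user is about 32.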
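The distinction between counting events and counting event types is easy to miss, so here is a minimal sketch on a toy log. The data is made up for illustration; only the column names `user_id` and `event_name` come from the notebook.

```python
import pandas as pd

# Toy log (hypothetical data, same column names as the notebook)
toy = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2, 3],
    "event_name": ["MainScreenAppear", "MainScreenAppear", "OffersScreenAppear",
                   "MainScreenAppear", "CartScreenAppear", "MainScreenAppear"],
})

per_user = toy.groupby("user_id")["event_name"].agg(
    total_events="count",       # every logged row counts
    distinct_events="nunique",  # each event type counts once per user
)
print(per_user)
print(per_user.mean())
```

User 1 has three logged events but only two distinct event types, which is why the two averages differ.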

In [None]:
#What period of time does the data cover? Find the maximum and the minimum date.
print(f' From {df.date.min()} to {df.date.max()}')

The data covers about two weeks, from 25 July to 8 August 2019.

In [None]:
#Plot a histogram by date and time.
fig, ax = plt.subplots(figsize = (12,10))
#Draw on the axes created above instead of replacing them
sns.histplot(data=df, x='ts', ax=ax)

ax.grid()
ax.set_title('Events by date and time')
ax.set_xlabel('Date')
ax.set_ylabel('Events')
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
ax.tick_params(axis='x', labelrotation=45)

plt.show()


The histogram shows that the data is not equally complete for the entire period: very few events were logged before 1 August. Older events could end up in some users' logs for technical reasons, and this could skew the overall picture.
The data starts to be complete at 2019-08-01 00:00. Let's filter the data and ignore the earlier section.
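The cutoff above was read off the histogram. As a rough cross-check, the same decision can be sketched in code. The helper `first_complete_day` and its threshold `min_share=0.5` are assumptions made for this illustration, not part of the original analysis, and the timestamps are toy data.

```python
import pandas as pd

def first_complete_day(ts, min_share=0.5):
    """Return the first day whose event count reaches `min_share` of the busiest day.

    Hypothetical heuristic: the threshold is arbitrary and should be checked
    against the histogram.
    """
    daily = ts.dt.floor("D").value_counts().sort_index()
    complete = daily[daily >= min_share * daily.max()]
    return complete.index[0]

# Toy timestamps: a few stray early events, then full days
toy_ts = pd.Series(pd.to_datetime(
    ["2019-07-30 10:00"] * 2 + ["2019-07-31 23:00"] * 3
    + ["2019-08-01 09:00"] * 40 + ["2019-08-02 12:00"] * 45
))
cutoff = first_complete_day(toy_ts)
print(cutoff)
```

On the real data this would be called as `first_complete_day(df['ts'])`, and the result compared with the date chosen by eye.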


In [None]:
#Filter the data
data = df.query('ts >="2019-08-01"')
data.info()

In [None]:
#How many events and users are lost?

us_l = df.user_id.nunique() - data.user_id.nunique()

#Print
print(f'''we lost {len(df)-len(data)} events 
we lost {us_l} users''')

We filtered out only a small share of the data, so the results should still be representative.

In [None]:
# Let's make sure we have users from all three experimental groups.
data.groupby('group')['user_id'].nunique().reset_index()

Each group has about 2,500 users. The difference in size between the groups is acceptable.
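Group sizes are one sanity check; another common one for A/A/B data is that no user appears in more than one group. The notebook does not run this check, so the sketch below is an optional addition. The helper name `users_in_multiple_groups` and the toy data are hypothetical; the column names match the renamed dataframe.

```python
import pandas as pd

def users_in_multiple_groups(logs):
    """Return the ids of users whose events are logged under more than one group."""
    groups_per_user = logs.groupby("user_id")["group"].nunique()
    return groups_per_user[groups_per_user > 1].index.tolist()

# Toy data: user 3 appears in both 247 and 248
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "group":   [246, 246, 247, 247, 248],
})
print(users_in_multiple_groups(toy))
```

An empty list for `users_in_multiple_groups(data)` would mean the groups do not overlap.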

## 4. The event funnel


Let's answer the following questions:

- In what order do the actions take place? Are all of them part of a single sequence?
- At which stage do we lose the most users?
- What share of users make the entire journey from their first event to payment?

We use the event funnel to find the share of users that proceed from each stage to the next. For the sequence of events A → B → C, we calculate the ratio of users at stage B to users at stage A, and the ratio of users at stage C to users at stage B.
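A minimal sketch of the step-to-step ratios for a sequence A → B → C, using made-up user counts. The numbers are illustrative only and are not results from the logs.

```python
import pandas as pd

# Hypothetical number of users at each stage, in funnel order
users_at_stage = pd.Series({"A": 1000, "B": 600, "C": 300})

# Share of users who proceed from the previous stage (NaN for the first stage)
step_conversion = users_at_stage / users_at_stage.shift(1)

# Share of users relative to the first stage
overall_conversion = users_at_stage / users_at_stage.iloc[0]

print(pd.DataFrame({"users": users_at_stage,
                    "from_previous": step_conversion,
                    "from_first": overall_conversion}))
```

Here 60% of users move from A to B and 50% from B to C, so 30% make the entire journey.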

In [None]:
#frequency of occurrence:
data.event_name.value_counts()


The tutorial is the least frequent event; maybe users just skip it.

In [None]:
#Find the number of users who performed all five events.
all_events_users=data.groupby('user_id')['event_name'].nunique().reset_index().query('event_name==5').shape[0]

print(all_events_users)

Only a few users performed all five events, because most users skip the Tutorial.

In [None]:
#Sort the events by the number of users.
event_user = data.groupby('event_name')['user_id'].nunique().sort_values(ascending=False).reset_index()
event_user.rename(columns={'user_id':'number_of_users'}, inplace=True)
user_n = data.user_id.nunique()

#Adding a column with percent of users of the event
event_user['user_share'] = event_user.number_of_users / user_n
event_user.user_share = event_user.user_share.apply(lambda x: "{:.1%}".format(x))
event_user.head()

About half of the users performed four of the events and reached the payment screen. 89% of users skip the Tutorial, so we do not include it in the funnel. We define the event funnel as:

1. MainScreenAppear (main screen)
2. OffersScreenAppear (offers screen)
3. CartScreenAppear (cart screen)
4. PaymentScreenSuccessful (successful payment)
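Counting unique users per event does not check that a user went through the steps in this order. One way to test whether the events form a single sequence is to compare each user's first timestamp for consecutive steps. This is a sketch under assumptions: `strict_funnel` is a hypothetical helper, the toy log is made up, and `>=` is used because consecutive events can share the same second in the logs (the sample rows show a cart and a payment event with identical timestamps).

```python
import pandas as pd

STEPS = ["MainScreenAppear", "OffersScreenAppear",
         "CartScreenAppear", "PaymentScreenSuccessful"]

def strict_funnel(logs, steps=STEPS):
    """Count users who reached each step in the given order (hypothetical helper)."""
    # Earliest timestamp of every event type for every user; NaT if never seen
    first_ts = (logs.groupby(["user_id", "event_name"])["ts"].min()
                    .unstack()
                    .reindex(columns=steps))
    reached = pd.Series(True, index=first_ts.index)
    counts = {}
    for i, step in enumerate(steps):
        reached &= first_ts[step].notna()
        if i > 0:
            # >= rather than >: consecutive events may be logged in the same second
            reached &= first_ts[step] >= first_ts[steps[i - 1]]
        counts[step] = int(reached.sum())
    return pd.Series(counts, name="users")

toy = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2, 3, 3, 4],
    "event_name": ["MainScreenAppear", "OffersScreenAppear", "CartScreenAppear",
                   "PaymentScreenSuccessful", "MainScreenAppear", "OffersScreenAppear",
                   "OffersScreenAppear", "MainScreenAppear", "PaymentScreenSuccessful"],
    "ts": pd.to_datetime([
        "2019-08-01 10:00:00", "2019-08-01 10:01:00", "2019-08-01 10:02:00",
        "2019-08-01 10:02:00", "2019-08-01 10:00:00", "2019-08-01 10:05:00",
        "2019-08-01 09:00:00", "2019-08-01 09:30:00", "2019-08-01 11:00:00"]),
})
result = strict_funnel(toy)
print(result)
```

In the toy log, user 3 saw the offers screen before the main screen and user 4 paid without any earlier step, so neither is counted beyond the point where the order breaks. A large gap between these counts and the simple unique-user counts would suggest that the events are not a single strict sequence.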

In [None]:
#Create the function for a funnel for each group
def funnel(group):
    group_data = data.query('group == @group')
    #Users in this group: the base for the first funnel step
    group_users = group_data['user_id'].nunique()

    result = (group_data.groupby('event_name')['user_id']
                        .nunique().sort_values(ascending=False).reset_index()
    )
    result.rename(columns={'user_id':'number_of_users'}, inplace=True)
    #Tutorial is not part of the funnel
    result = result[result['event_name']!='Tutorial'].reset_index(drop=True)

    #Share of users: first step relative to the group, later steps relative to the previous step
    previous = result['number_of_users'].shift(1).fillna(group_users)
    result['funnel'] = result['number_of_users'] / previous

    #Format the `funnel` column as percentages
    result['funnel'] = result['funnel'].apply(lambda x: "{:.1%}".format(x))

    return result

In [None]:
#Apply to group 246
funnel_246 = funnel(246)
funnel_246

In [None]:
#Apply function to group 247
funnel_247 = funnel(247)
funnel_247

In [None]:
#Apply to group 248
funnel_248 = funnel(248)
funnel_248

In [None]:
#Plot funnel chart
fig = go.Figure()

fig.add_trace(go.Funnel(
    name = 'A_group_246',
    y = funnel_246['event_name'],
    x = funnel_246['number_of_users'],
    textinfo = "value+percent previous"))

fig.add_trace(go.Funnel(
    name = 'A_group_247',
    y = funnel_247['event_name'],
    x = funnel_247['number_of_users'],
    textinfo = "value+percent previous"))

fig.add_trace(go.Funnel(
    name = 'B_group_248',
    y = funnel_248['event_name'],
    x = funnel_248['number_of_users'],
    textinfo = "value+percent previous"))

fig.show()


- About 40% of users drop off between the main screen and the offers screen. This is the largest loss in the funnel; the logs alone do not tell us why these users leave.

- About 20% of the users who reach the offers screen do not proceed to the cart.

- Almost all users who reach the cart screen (about 95%) complete the payment.

In [None]:
#Calculate the number of users who have made at least 1 payment
payers = data.query('event_name=="PaymentScreenSuccessful"')['user_id'].nunique()
print(payers)

So about half of the users have made at least one payment.

## 5. The results of the experiment. A/A/B test analysis.


How many users are there in each group?

As noted above, each of the tested groups has about 2,500 users, and the group sizes differ by only about 2%. In the A/A/B test we compare, for each event, the share of a group's users who performed it, and we want to know whether the difference between the groups is statistically significant. Before testing the new fonts, we compare the two control groups with each other to make sure there is no statistically significant difference between them.

In [None]:
#Filter data with queries
a1 = data.query('group==246')
a2 = data.query('group==247')
b = data.query('group==248')

#Check rows
sum([len(a1), len(a2), len(b)]) == len(data)

We will repeat the same comparison for every event, so it saves time to write a helper function.

In [None]:
def group_event_share(g1, g2, event):
    #Find number of total users in groups
    us_n_g1 = g1.user_id.nunique()
    us_n_g2 = g2.user_id.nunique()
    
    #Find number of users of an event
    us_n_g1_ac = g1.query('event_name == @event')['user_id'].nunique()
    us_n_g2_ac = g2.query('event_name == @event')['user_id'].nunique()
    
    #Find shares of user of event
    share_g1 = us_n_g1_ac / us_n_g1
    share_g2 = us_n_g2_ac / us_n_g2
    
    #Find relative difference
    diff = share_g2 / share_g1 - 1
    
    #Print the shares and the relative difference
    print(f'''    Share of users who perform {event} from the first group is {share_g1:.1%}
    Share of users who perform {event} from the second group is {share_g2:.1%}
    Relative difference is {diff:.2%}''')

group_event_share(a1, a2, "MainScreenAppear")

The shares are almost identical. We use a z-test for proportions to check whether the remaining difference is statistically significant.

In [None]:
def test(g1, g2, event):
    #number of total users in groups
    us_n_g1 = g1.user_id.nunique()
    us_n_g2 = g2.user_id.nunique()
    
    #number of users performed a certain event
    us_n_g1_ac = g1.query('event_name == @event')['user_id'].nunique()
    us_n_g2_ac = g2.query('event_name == @event')['user_id'].nunique()
           
    #variables for the test
    count = [us_n_g1_ac, us_n_g2_ac]
    nobs = [us_n_g1, us_n_g2]
    stat, pval = proportions_ztest(count, nobs)
    
    #significance level
    alpha=0.05
    
    if pval < alpha:
        print(f'''Groups 1 and 2 are statistically different in performing {event}. p-value is: {pval:.3f}
    We reject the null hypothesis.''')
    else:
        print(f'''We cannot reject the null hypothesis: there is no statistically significant difference
in performing {event} between groups 1 and 2. p-value is: {pval:.3f}''')
        
test(a1, a2, "MainScreenAppear")

We see that the difference between our control groups in number of users who saw 'MainScreenAppear' in the application is not statistically significant. Let's formulate our hypotheses and perform the tests for each event for our control groups.

Hypotheses regarding the share of users who perform a certain event:
 H0: the proportions of users who performed the event are equal in groups 1 and 2.
 H1: the proportions of users who performed the event differ between groups 1 and 2.
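To make the test less of a black box, here is a hand-written sketch of the pooled two-proportion z-test using only the standard library. To my understanding this is the same statistic that `proportions_ztest` computes with its default settings, but treat that as an assumption and compare the numbers yourself. The function name and the user counts are hypothetical.

```python
from math import sqrt, erfc

def two_proportion_ztest(successes1, n1, successes2, n2):
    """Two-sided z-test for the equality of two proportions (pooled variance)."""
    p1, p2 = successes1 / n1, successes2 / n2
    # Under H0 both groups share one proportion, estimated from the pooled data
    p_pooled = (successes1 + successes2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Made-up counts: 1200 of 2500 users vs 1250 of 2500 users performed the event
z, p_value = two_proportion_ztest(1200, 2500, 1250, 2500)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

With these toy counts the p-value is above 0.05, so the null hypothesis of equal proportions would not be rejected.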

In [None]:
#list of event_names
events = data.event_name.unique()

#a loop through `event_name` column
print('Comparison of proportion of users who performed different events between control groups:\n ')
for event in events:
    group_event_share(a1, a2, event)
    test(a1, a2, event)
    print()



So the difference between the control groups is not statistically significant.

In [None]:
#Comparison between Group 246 and 248
#a loop through `event_name` column
print('Comparison of the share of users who performed different events between groups 246 and 248:\n ')
for event in events:
    group_event_share(a1, b, event)
    test(a1, b, event)
    print()

The result is the same: no event shows a statistically significant difference.

The difference for 'CartScreenAppear' comes closest to significance, with a p-value of 0.078, but this is still above the 0.05 level.

In [None]:
#Compare between 247 and 248 #a loop through `event_name` column
print('Comparison of the share of users who performed different events between groups 247 and 248:\n ')
for event in events:
    group_event_share(a2, b, event)
    test(a2, b, event)
    print()

There is no statistically significant difference between those groups either. Now we merge the control groups and compare the combined control group with the group with altered fonts.

In [None]:
#control group
a = data.query('group!=248')

#Comparison between new control group and group with altered fonts
#a loop through `event_name` column
print('Comparison of the share of users who performed different events between combined group A and group 248 (B):\n ')
for event in events:
    group_event_share(a, b, event)
    test(a, b, event)
    print()

According to the tests, we cannot confirm that the altered fonts change user behavior. We ran 20 tests in total: 5 events for each of the three pairwise comparisons (246 vs 247, 246 vs 248, 247 vs 248), which gives 15 tests, plus 5 more for the combined control group against the group with altered fonts. None of the tests shows a statistically significant difference in the share of users who perform an event.



#### Conclusions about the level of statistical significance.  
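With a significance level of 0.05, on average 1 in 20 tests of a true null hypothesis gives a false positive. We ran 20 tests, so at 0.05 we would expect about one false positive by chance, and at 0.1 about two (1 in 10). Raising the level would only make false positives more likely, so it is better to keep it no higher than 0.05. Since none of the 20 tests rejected the null hypothesis even at 0.05, a stricter level (for example a Bonferroni-corrected 0.05 / 20 = 0.0025) would not change the conclusion.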
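The effect of running many tests can be quantified with the family-wise error rate, the probability of at least one false positive across all tests. The sketch below assumes the 20 tests are independent, which is not strictly true here because the same groups are reused, so the numbers are a rough guide only. The function name is hypothetical.

```python
n_tests = 20

def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests

for alpha in (0.05, 0.1):
    fwer = familywise_error_rate(alpha, n_tests)
    print(f"alpha = {alpha}: chance of at least one false positive = {fwer:.0%}")

# Bonferroni correction: divide the target level by the number of tests
bonferroni_alpha = 0.05 / n_tests
print(f"Bonferroni-corrected alpha: {bonferroni_alpha:.4f}")
```

Under the independence assumption, 20 tests at 0.05 give roughly a 64% chance of at least one false positive, and at 0.1 roughly 88%.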

# General conclusions


The purposes of the analysis were to study the event funnel, determine how users reach the purchase stage, and then run an A/A/B test to find out whether changing the fonts across the entire application improves user behavior.

We have records of user actions in the app. From the raw data we dropped 413 duplicates (0.17% of the data), converted the timestamp column to datetime, and added a column with the date of the event. We kept only the last week (from 2019-08-01) for the analysis. After filtering we lost just 1.2% of the events and 0.2% of the users.

On average, each user performs 3 different event types when using the application. We have almost the same number of users in each group. There are 466 users who performed all five events, which is 6.2% of all users. Almost half of the users (47%) performed 4 of the 5 events. Most users skip the tutorial screen. The cart screen sometimes appears later than the payment was executed. The event funnel is:

    >Main screen appear

        >Offer screen appear

            >Cart screen appear

                >Payment screen successful

40% of users see the main screen and do not continue to the offers screen.

Then 20% of the users who reach the offers screen do not proceed to the cart.

95% of the users who reach the cart screen go on to pay.


There are 3539 users who made at least 1 payment. This share is 47.0% of total users.

We confirmed that the maximum difference in the number of users between the groups is 2%, which is acceptable. To compare the share of users who performed a certain action, we used a z-test for proportions. Before comparing the group with altered fonts against a control group, we compared the two control groups with each other to make sure there is no statistically significant difference between them.

We formulated hypotheses and performed the tests for each event for our control groups.

Hypotheses regarding the share of users who perform a certain event:
 H0: the proportions of users who performed the event are equal in groups 1 and 2.
 H1: the proportions of users who performed the event differ between groups 1 and 2. The significance level is 0.05.

The difference between the control groups is not statistically significant for any event. The largest relative difference is in the payment event (about 5%), but its p-value is about twice our alpha (0.05), so it is still not significant. We merged the control groups and compared them with the group with altered fonts.


#### None of the tests confirmed that the altered fonts improve user behavior. We do not recommend making this change to the application.