# **Success Indicators**

You have been asked to identify the key performance indicators (KPIs) that will determine the success of the new design. Use at least completion rate, time spent on each step, and error rate. Add any other KPIs you find relevant.

- **Completion Rate:** The proportion of users who reach the final ‘confirm’ step.
- **Time Spent on Each Step:** The average duration users spend on each step.
- **Error Rates:** A user going back to a previous step may indicate confusion or an error. Count every move from a later step to an earlier one as an error.
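
One way to apply the error definition above is to order each client's events by time and flag every event whose step number is lower than the previous one. The following is a minimal sketch on a toy frame, assuming the steps are already encoded as integers from 0 (`start`) to 4 (`confirm`). The toy data and the names `toy`, `prev_step`, and `is_error` are illustrative and not part of the dataset.

```python
import pandas as pd

# Toy events for two hypothetical clients; steps encoded 0 (start) to 4 (confirm).
toy = pd.DataFrame({
    "client_id": [1, 1, 1, 1, 2, 2, 2],
    "process_step": [0, 1, 0, 1, 0, 1, 2],
    "date_time": pd.to_datetime([
        "2017-04-17 15:00:00", "2017-04-17 15:01:00",
        "2017-04-17 15:02:00", "2017-04-17 15:03:00",
        "2017-04-17 16:00:00", "2017-04-17 16:01:00",
        "2017-04-17 16:02:00",
    ]),
})

# Order events chronologically within each client.
toy = toy.sort_values(["client_id", "date_time"])

# Previous step of the same client (NaN for the client's first event).
toy["prev_step"] = toy.groupby("client_id")["process_step"].shift()

# An error is a move from a later step to an earlier one.
# Comparisons against NaN evaluate to False, so first events are never flagged.
toy["is_error"] = toy["process_step"] < toy["prev_step"]

error_rate = toy["is_error"].mean()
print(f"Backward moves: {toy['is_error'].sum()} of {len(toy)} events ({error_rate:.1%})")
```

Here the rate is expressed per event. Dividing by the number of transitions or by the number of clients would be equally defensible, so the denominator should be stated alongside the figure.
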

# **Redesign Outcome**

Based on the chosen KPIs, how does the new design’s performance compare to the old one?



In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re

In [2]:
# import reusable functions from utils directory
import sys
sys.path.append('../../utils')
import functions

In [3]:
combined_cleaned_df = pd.read_csv('../../data/raw/combined_cleaned_data.csv')
combined_cleaned_df.head()

Unnamed: 0,client_id,visitor_id,visit_id,process_step,date_time
0,9988021,580560515_7732621733,781255054_21935453173_531117,step_3,2017-04-17 15:27:07
1,9988021,580560515_7732621733,781255054_21935453173_531117,step_2,2017-04-17 15:26:51
2,9988021,580560515_7732621733,781255054_21935453173_531117,step_3,2017-04-17 15:19:22
3,9988021,580560515_7732621733,781255054_21935453173_531117,step_2,2017-04-17 15:19:13
4,9988021,580560515_7732621733,781255054_21935453173_531117,step_3,2017-04-17 15:18:04


In [4]:
print(combined_cleaned_df['process_step'].unique())

['step_3' 'step_2' 'step_1' 'start' 'confirm']


In [5]:
combined_cleaned_df['date_time'] = pd.to_datetime(combined_cleaned_df['date_time'], errors='coerce')

# check if any dates couldn't be converted (i.e., they are NaT)
inconsistent_dates = combined_cleaned_df['date_time'].isna().sum()

# If inconsistent_dates > 0, then there are invalid or mismatched date formats
if inconsistent_dates > 0:
    print(f'There are {inconsistent_dates} inconsistent or invalid date formats in the column.')
else:
    print('All dates in the column have the same format.')

print(combined_cleaned_df[['date_time']].head())

All dates in the column have the same format.
            date_time
0 2017-04-17 15:27:07
1 2017-04-17 15:26:51
2 2017-04-17 15:19:22
3 2017-04-17 15:19:13
4 2017-04-17 15:18:04


In [6]:
functions.inspect_dataframe(combined_cleaned_df)

Check the shape (rows, columns):
(744641, 5)

Column names:
Index(['client_id', 'visitor_id', 'visit_id', 'process_step', 'date_time'], dtype='object')

Data types:
client_id                int64
visitor_id              object
visit_id                object
process_step            object
date_time       datetime64[ns]
dtype: object

Missing values:
client_id       0
visitor_id      0
visit_id        0
process_step    0
date_time       0
dtype: int64


In [7]:
# make a copy
kpi_df = combined_cleaned_df.copy()

# drop columns
kpi_df = kpi_df.drop(columns=['visitor_id', 'visit_id'])

In [8]:
# encode the step names as integers so the steps can be ordered and compared
replacement_dict_steps = {
    'start' : 0,
    'step_1' : 1,
    'step_2' : 2,
    'step_3' : 3,
    'confirm' : 4
}

kpi_df['process_step'] = kpi_df['process_step'].map(replacement_dict_steps)
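
`Series.map` with a dictionary returns `NaN` for any value that is not a key, so a misspelled or unexpected step label would be lost silently. A small check along these lines can catch that. This is a sketch on a toy series: the label `step_4` is hypothetical and the names `steps`, `step_order`, and `unmapped` are illustrative.

```python
import pandas as pd

# Toy column containing one label that is not in the mapping (hypothetical 'step_4').
steps = pd.Series(["start", "step_1", "confirm", "step_4"])

step_order = {"start": 0, "step_1": 1, "step_2": 2, "step_3": 3, "confirm": 4}

mapped = steps.map(step_order)

# Labels missing from the dictionary are mapped to NaN; list them explicitly.
unmapped = steps[mapped.isna()].unique()
print(unmapped)
```

An empty result means every label was covered by the mapping.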

## How many clients in total have completed the process?

In [9]:
# filter rows where process_step is 4 (the final 'confirm' step)
clients_finished = kpi_df[kpi_df['process_step'] == 4]

# print(clients_finished)
total_unique_clients = kpi_df['client_id'].nunique()
# find unique client ids that finished
unique_clients_finished = clients_finished['client_id'].nunique()

print(f'Clients who finished the process: {unique_clients_finished} out of {total_unique_clients}.')

Clients who finished the process: 81145 out of 120157.


**To do**:
- [ ] calculate percentage
- [ ] visual representation?
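
For the first to-do item, the percentage follows directly from the two counts printed above. A minimal sketch that hard-codes those counts so it runs on its own (in the notebook, the existing variables `unique_clients_finished` and `total_unique_clients` could be used instead; `completion_rate` is a name introduced here):

```python
# Counts taken from the output of the previous cell.
unique_clients_finished = 81145
total_unique_clients = 120157

# Completion rate as a percentage of all unique clients.
completion_rate = unique_clients_finished / total_unique_clients * 100
print(f"Completion rate: {completion_rate:.2f}%")
```

This is an overall figure of roughly 67.5%. Comparing the old and new designs would need a column identifying the design, which is not among the columns shown for this dataframe.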

## How much time has been spent on each step (on average)?