# Metrics for site

### Introduction

For this task, we are asked to develop a solution for a theoretical website that calculates some values called "metrics", based on certain parameters. The first five metrics are explicitly listed in the task brief, including their formulas. The other three metrics need to be created from scratch.

Metric — in terms of the considered task, is some statistical, percentage, or quantitive measure, based on user activity on the website (e.g., clicks, time spent, etc.) or on other external parameters connected with the website's exploitation (e.g., gained and spent amount [of finance]).

The following metrics are specified in the brief:
- **Click-Through Rate (CTR)** is calculated as $\frac{Total\,Measured\,Clicks}{Total\,Measured\,Ad\,Impressions}*100$, where “total measured clicks” is the total amount of clicks on an ad; “total measured ad impressions” is the number of times an ad was loaded on a page. Click-through rates measure how successful an ad has been in capturing users' attention. The higher the click-through rate, the more successful the ad has been in generating interest.

- **Return on Investment (ROI)** is calculated as $\frac{Amount\,Gained\,-\,Amount\,Spent}{Amount\,Spent}*100$, where “amount gained” is the amount of income that has been generated by an investment; “amount spent” is the total amount spent on an investment. ROI stands for Return on Investment and means the amount of money you get back relative to the amount of money you put into something. It is different to profit, which is simply the amount spent subtracted from the amount earned. ROI goes a step further and works out profit per the amount spent. This answers the question – how much profit can I earn per pound/dollar/euro etc spent.

- **Average Page Time** is calculated as $\frac{\Sigma(Time\,Spent\,on\,a\,Page\,by\,a\,User)}{Number\,of\,Users}$, where “time spent on a page by a user” is time measured for each user who visits a webpage; “number of users” is the number of users who visit a webpage. Keep in mind, that usually users who spend less than 5 seconds on a webpage are not included in the calculations.

- **Customer Lifetime Value (CLV)** is calculated as $\scriptstyle(Average\,Purchase\,Value\,–\,Average\,Purchase\,Frequency)\,*\,Average\,Customer\,Lifespan$ and used to predict how much revenue a customer will drive over time. 

-  **Conversion Rate (CR)**  is calculated as $\frac{Total\,Attributed\,Conversion}{Total\,Measured\,Clicks}*100$, where “total attributed conversion” is the total amount of conversion recorded which have been caused clicks; “total clicks” – number of times an ad was clicked on.

### New metrics 

Here are mine suggestion for the new metrics:

- **Customer Acquisition Cost (CAC)** measures the cost of acquiring a new customer. It is calculated by dividing the total amount spent [of finance] by the number of customers acquired during a given period. It is calculated as $\frac{Total\,Spent\,Amount}{Number\,of\,Customers\,Acquired}$.
- **Customer Churn Rate (CCR)** measures the rate at which customers stop doing business with a company. It is calculated by dividing the number of customers lost during a given time period by the total number of customers at that period. In terms of the website, lost customers are those users who were visiting the site previous month but stopped doing so current month. The formula for the churn rate is $\frac{Number\,of\,Customers\,Lost}{Total\,Number\,of\,Customers}$
- **Bounce Rate (BR)**  measures the percentage of visitors who leave a website after viewing only one page. It is calculated as $\frac{Total\,Number\,of\,Single\,Page\,Visits}{Total\,Number\,of\,Visits}*100$. For example, if a website has 1000 visitors and 300 of them only view one page before leaving, the bounce rate would be 30%.


### The solution

#### Imports

For this task, a few libraries are needed. Considering this task belongs to the data analysis field, I chose NumPy and Pandas. Through the means of numpy, I will be returning the NaN value for the times where it is not possible to calculate a specific metric, but the exception during a function call wasn't raised. Since returning 0 or nothing is not the best approach for processing the dataset, 0 is still a number, and in the case of incorrect input parameters to the function, I'd like to highlight that the problem is in place. And for future research, one must give a second glance to such a dataset. Through the means of Pandas, I will be building a test DataFrame, representing some random dataset, for the metrics calculation functions to be called on that data. Also, the NumPy library will be used for filling the columns of that DataFrame with pseudo-random data.

In [76]:
import numpy as np
import pandas as pd

# to get current fucntion name
import inspect

#### Functions for the basic metrics

In [77]:
def caculate_ctr(total_msrd_clicks, total_msrd_impressions):
    """
        Calculates the Click-Through Rate (CTR).

        Parameters:
        total_msrd_clicks (int) - total number of clicks on the ad.
        Input value expected to be no less than 0.

        total_msrd_impressions (int) - total number of times ad was loaded on the page. 
        Input value expected to be positive (more than 0).
    
        Returns:
        float: Click-Trough Rate (CTR) or NaN if parameters conditions are not met.
    """
    try: 
        if total_msrd_impressions <= 0 or total_msrd_clicks < 0:
            return np.NaN
        return total_msrd_clicks / total_msrd_impressions * 100
    except Exception as e:
        cur_name = inspect.currentframe().f_code.co_name
        return f"""
        Exeption: {e} happend.
        Please, give an input according the expectations.
        Use help({cur_name}) or print({cur_name}.__doc__) for info.
            """

In [78]:
def calculate_roi(amnt_gained, amnt_spent):
    """
        Calculates the Return on Investment (ROI).

        Parameters:
        amnt_gained (int, float) - amount of income generated by the investment.

        amount_spent (int, float) - total amount spent on the investment.
        Input value expected to be positive (more than 0).
    
        Returns:
        float: Return on Investment (ROI) or NaN if parameters conditions are not met.
    """
    try:
        if amnt_spent <= 0:
            return np.NaN
        return (amnt_gained - amnt_spent) / amnt_spent * 100
    except Exception as e:
        cur_name = inspect.currentframe().f_code.co_name
        return f"""
        Exeption: {e} happend.
        Please, give an input according the expectations.
        Use help({cur_name}) or print({cur_name}.__doc__) for info.
            """

In [79]:
def calculate_apt(timespent, user_count):
    """
        Calculates Average Page Time (APT).

        Parameters:
        timespent (int) - time spent on a page by users in seconds.
        Input value expected to be no less than 0.

        user_count (int) - number of users visited the webpage
        Input value expected to be positive (more than 0).
    
        Returns:
        float: Average Page Time (APT) or NaN if parameters conditions are not met.
    """
    try:
        if timespent < 0 or user_count <= 0:
            return np.NaN
        return timespent / user_count
    except Exception as e:
        cur_name = inspect.currentframe().f_code.co_name
        return f"""
        Exeption: {e} happend.
        Please, give an input according the expectations.
        Use help({cur_name}) or print({cur_name}.__doc__) for info.
            """

In [80]:
def calculate_clv(avg_purchase_val, avg_purchase_freq, avg_cust_lifespan):
    """
        Calculates Customer Lifetime Value (CLV).

        Parameters:
        avg_purchase_val (int, float) - Average Purchase Value
        avg_purchase_freq (int, float) - Average Purchase Frequency
        avg_cust_lifespan (int, float) - Average Customer Lifespan

        All parameters expected to be positive (more than 0).

        Returns:
        int, float: Customer Lifetime Value (CLV) or
        NaN if parameters conditions are not met.
    """
    try:
        if avg_purchase_val <=0 \
            or avg_purchase_freq <= 0 \
            or avg_cust_lifespan <= 0:
            return np.NaN
        return (avg_purchase_val - avg_purchase_freq) * avg_cust_lifespan
    except Exception as e:
        cur_name = inspect.currentframe().f_code.co_name
        return f"""
        Exeption: {e} happend.
        Please, give an input according the expectations.
        Use help({cur_name}) or print({cur_name}.__doc__) for info.
            """


In [81]:
def calculate_cr(total_attr_conver, total_msrd_clicks):
    """
        Calculate Conversion Rate (CR).

        Parameters: 
        total_attr_conver (int) - Total Attributed Conversion
        Input value expected to be no less than 0.

        total_msrd_clicks - total number of clicks on the ad.
        Input value expected to be positive (more than 0).

        Returns:
        float: Conversion Rate (CR) or
        NaN if parameters conditions are not met.
    """
    try:
        if total_attr_conver < 0 or total_msrd_clicks <= 0:
            return np.NaN
        return total_attr_conver / total_msrd_clicks * 100
    except Exception as e:
        cur_name = inspect.currentframe().f_code.co_name
        return f"""
        Exeption: {e} happend.
        Please, give an input according the expectations.
        Use help({cur_name}) or print({cur_name}.__doc__) for info.
            """

#### Functions for the suggested metrics

In [82]:
def calculate_cac(total_amnt_spent, total_msrd_users):
    """
        Calculates the Customer Acquisition Cost (CAC).

        Parameters:
        total_amnt_spent (int, float) - total amount spent on the investment.
        Input value expected to be no less than 0.
        total_msrd_users (int) - total number of users for a given period.
    
        Returns:
        float: Customer Acquisition Cost (CAC) or NaN if parameters conditions are not met.
    """
    try:
        if total_amnt_spent < 0 or total_msrd_users <= 0:
            return np.NaN
        return total_amnt_spent / total_msrd_users
    except Exception as e:
        cur_name = inspect.currentframe().f_code.co_name
        return f"""
        Exeption: {e} happend.
        Please, give an input according the expectations.
        Use help({cur_name}) or print({cur_name}.__doc__) for info.
            """

In [83]:
def calculate_ccr(inactive_users, all_users):
    """
        Calculate the Customer Churn Rate (CCR).

        Parameters:
        inactive_users (int) - number of users who are inactive in a given period.
        Input value expected to be no less than 0.

        all_users (int) - number of all users in a given period.
        Input value expected to be positive (more than 0).

        Returns:
        float: Customer Churn Rate (CCR) or NaN if parameters conditions are not met.
    """
    try:
        if inactive_users < 0 or all_users <= 0:
            return np.NaN
        return inactive_users / all_users
    except Exception as e:
        cur_name = inspect.currentframe().f_code.co_name
        return f"""
        Exeption: {e} happend.
        Please, give an input according the expectations.
        Use help({cur_name}) or print({cur_name}.__doc__) for info.
            """

In [84]:
def calculate_br(single_page_visit, total_visits):
    """
        Calculates Bounce Rate (BR).

        Parameters:
        single_page_visit (int) - number of user visits for a single page only.
        Input value expected to be no less than 0.
        total_visits (int) - number of total visits of a site.
        Input value expected to be positive (more than 0).

        Returns:
        float: Bounce Rate (BR) or NaN if parameters conditions are not met.
    """
    try:
        if single_page_visit < 0 or total_visits <= 0:
            return np.NaN
        return single_page_visit / total_visits
    except Exception as e:
        cur_name = inspect.currentframe().f_code.co_name
        return f"""
        Exeption: {e} happend.
        Please, give an input according the expectations.
        Use help({cur_name}) or print({cur_name}.__doc__) for info.
            """

#### Testing dataset

At this point, we have made eight different simple functions to calculate the metrics described above. For the sake of research, I would suggest making a testing DataFrame object, which will be made using pandas and numpy libraries. Such a dataset will for sure have some assumptions, but it could still illustrate how the given calculation functions would receive input and show how I planned to process data before this.

The dataset will have a few columns:
- **user_id** integer number that illustrates a specific user
- **datetime** time period of the record
- **page** representation of page visited
- **timespent** amount of time in seconds spent on the page
- **purchase** in the case of the purchase represents the value, otherwise 0
- **clicks** number of clicks on the page
- **impressions** number of ads shown on the page

In [85]:
# for the consistent results of random, just in case
# np.random.seed(42)

In [86]:
# creating a hollow dataframe
df = pd.DataFrame()
#planned columns: 'user_id', 'datetime', 'page', 'timespent', 'purchase', 'clicks', 'impressions'

Since we are not interested in a specific user name, I decided to leave only the theoretical user identification number. The number is unique for the same user, but repetitions in the dataset are allowed. For example, if the user John Smith has the id = 47 (int), then every row in the dataset table with the value in the user_id column equal to 47 will be assigned to the user named John Smith. Actually, the user name, for the sake of privacy, could be stored in a separate database and is not needed for the current research and calculations.

In [87]:
# fill in 100 rows of user_id with the values from 1 to 49
df['user_id'] = np.random.randint(1, 50, 100)

In [88]:
# the number of unique user_id generated, should be less than 100
df.user_id.unique().size

41

To make the calculations and processing on the test dataset simpler, I decided to represent the **datetime** column value as a string. The value represents the month and year of the recorded row information in the table. In real life, most likely, that value would be stored with the precision of day and time in seconds. But this is not needed here; we just decided that we already processed some raw data to represent certain months of the year.

In [89]:
# filling datetime column with 100 rows of pseudo-random choice
df['datetime'] = np.random.choice(['01.2023', '02.2023', '03.2023', '04.2023'], 100)

In [90]:
# of course should have 4 unique values
df.datetime.unique()

array(['04.2023', '01.2023', '03.2023', '02.2023'], dtype=object)

Since the column actually represents a different thing to what I've stated before, let's rename it

In [91]:
df.rename(columns={'datetime': 'monthyear'}, inplace=True)

The next step is to add a few pages, which will be somewhat randomly distributed between dataset rows. For the case, I decided to make an /index page for the main, /item# pages for some goods on the website, /cart, /info, and /delivery. The standard page compositions for some easy-going online shops

In [92]:
df['page'] = np.random.choice(['/index', '/item1', '/item2', '/item3', '/cart', '/info', '/delivery'], 100)

In [93]:
# dataset starts to look normal
df.head()

Unnamed: 0,user_id,monthyear,page
0,30,4.2023,/item1
1,19,1.2023,/cart
2,30,3.2023,/delivery
3,8,3.2023,/index
4,48,1.2023,/delivery


In [94]:
# 7 unique pages as expected
df.page.unique().size

7

Adding time spent in seconds for each row means the user with the specific user_id on a certain page address spent that number of seconds. For this task, let's assume the user would spend no more than 15 minutes on the page, or 900 seconds.

In [95]:
df['timespent'] = np.random.randint(0, 901, 100)

Purchase amount for each row if the user ordered or bought something. Let's assume the purchase amount is usually no more than 1000 rub/usd/euro. Also, since 0 represents no purchase at all, we will fill the rows once more with the somewhat random choice between 0 and the previously generated amount of purchase. That way, we could drastically increase the frequency of 0s.

In [96]:
# generate purchases and setting a 0s to some of them
rnd_purchase = np.random.randint(0, 1001, 100)
for i in range(df.shape[0]):
    df.loc[i, 'purchase'] = np.random.choice([0.0, rnd_purchase[i]])

In [97]:
# also purchase became float after the procedure
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   user_id    100 non-null    int64  
 1   monthyear  100 non-null    object 
 2   page       100 non-null    object 
 3   timespent  100 non-null    int64  
 4   purchase   100 non-null    float64
dtypes: float64(1), int64(2), object(2)
memory usage: 4.0+ KB


The next step is to add impressions to the table. As we remembered, that number represents the amount of ad shown on a page. Consider this value no more than 9, since that amount of ad is sufficient enough for the one page, I suppose.

In [98]:
df['impressions'] = np.random.randint(0, 10, 100)

Also, some number of clicks is required. The trick is that the clicks probably shouldn't be more than impressions. Consider the situation when clicks outperform impressions never happen.

In [99]:
for i in range(df.shape[0]):
    df.loc[i, 'clicks'] = np.random.randint(0, df.loc[i, 'impressions'] + 1)

In [100]:
df.head(10)

Unnamed: 0,user_id,monthyear,page,timespent,purchase,impressions,clicks
0,30,4.2023,/item1,276,975.0,8,8.0
1,19,1.2023,/cart,232,416.0,3,3.0
2,30,3.2023,/delivery,885,627.0,6,4.0
3,8,3.2023,/index,367,576.0,2,2.0
4,48,1.2023,/delivery,703,0.0,2,0.0
5,6,2.2023,/item3,0,680.0,5,1.0
6,14,3.2023,/index,868,46.0,1,0.0
7,1,4.2023,/item3,720,40.0,0,0.0
8,16,2.2023,/index,102,0.0,8,3.0
9,1,2.2023,/index,333,600.0,4,0.0


Also, the amount of clicks is quite high for real-world data. So let's use the trick, which we've already done with the purchase values.

In [101]:
for i in range(df.shape[0]):
    df.loc[i, 'clicks'] = np.random.choice([0, df.loc[i, 'clicks']])
# cast to int
df.clicks = df.clicks.astype('int32')

In [102]:
df.head(20)
# here is situation a bit better

Unnamed: 0,user_id,monthyear,page,timespent,purchase,impressions,clicks
0,30,4.2023,/item1,276,975.0,8,8
1,19,1.2023,/cart,232,416.0,3,3
2,30,3.2023,/delivery,885,627.0,6,4
3,8,3.2023,/index,367,576.0,2,2
4,48,1.2023,/delivery,703,0.0,2,0
5,6,2.2023,/item3,0,680.0,5,0
6,14,3.2023,/index,868,46.0,1,0
7,1,4.2023,/item3,720,40.0,0,0
8,16,2.2023,/index,102,0.0,8,3
9,1,2.2023,/index,333,600.0,4,0


At this point, we're done building the test dataset. The next step will be testing the functions with the processed input from the dataset data.

In [103]:
# 100 columns and the expected types in place
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   user_id      100 non-null    int64  
 1   monthyear    100 non-null    object 
 2   page         100 non-null    object 
 3   timespent    100 non-null    int64  
 4   purchase     100 non-null    float64
 5   impressions  100 non-null    int64  
 6   clicks       100 non-null    int32  
dtypes: float64(1), int32(1), int64(3), object(2)
memory usage: 5.2+ KB


#### Testing functions

First, let's make a dictionary for storing all metrics values in order to show calculated numbers at the end of the current research.

In [104]:
metrics = {}

##### Click-Through Rate

Let's remember what this function requires as an input and which output it gives.

In [105]:
help(caculate_ctr)

Help on function caculate_ctr in module __main__:

caculate_ctr(total_msrd_clicks, total_msrd_impressions)
    Calculates the Click-Through Rate (CTR).
    
    Parameters:
    total_msrd_clicks (int) - total number of clicks on the ad.
    Input value expected to be no less than 0.
    
    total_msrd_impressions (int) - total number of times ad was loaded on the page. 
    Input value expected to be positive (more than 0).
    
    Returns:
    float: Click-Trough Rate (CTR) or NaN if parameters conditions are not met.



Introducing the global variables, which contain parameters required for the function call. Since we already have a DataFrame with the given clicks and impressions columns, it will be just the sum of all rows in these columns.

In [106]:
clicks = df['clicks'].sum()
clicks

93

In [107]:
impressions = df['impressions'].sum()
impressions

434

In [108]:
ctr = caculate_ctr(clicks, impressions)
print(f'The Click-Trough Rate of the website, based on the dataset, is {ctr:.2f}')

The Click-Trough Rate of the website, based on the dataset, is 21.43


In [109]:
metrics['Click-Trough Rate'] = ctr

Extreme cases:

In [110]:
print(caculate_ctr(-1, -2))
print(caculate_ctr('AAA', 'BBB'))

nan

        Exeption: '<=' not supported between instances of 'str' and 'int' happend.
        Please, give an input according the expectations.
        Use help(caculate_ctr) or print(caculate_ctr.__doc__) for info.
            


##### Return on Investment (ROI)

The following are required for the function to calculate correctly:

In [111]:
print(calculate_roi.__doc__)


        Calculates the Return on Investment (ROI).

        Parameters:
        amnt_gained (int, float) - amount of income generated by the investment.

        amount_spent (int, float) - total amount spent on the investment.
        Input value expected to be positive (more than 0).
    
        Returns:
        float: Return on Investment (ROI) or NaN if parameters conditions are not met.
    


Consider that we don't have data on financial amounts spent on the website. And the sum of all purchases does not represent the full picture of the amount gained. Anyway, I suggest that the spent value be randomly chosen for the sake of representation. Let's assume that spent value is something that was spent proportionally to the sum of purchases in the dataset, but no more.

In [112]:
gained = df['purchase'].sum()
spent = np.random.randint(0, gained)
print(gained, spent)

24007.0 11283


That way the Return on Investment rate is:

In [113]:
roi = calculate_roi(gained, spent)
print(f"Return on Inverstment: {roi:.2f}")

Return on Inverstment: 112.77


In [114]:
metrics['Return on Investment'] = roi

Extreme cases are some with negative or non-decimal values:

In [115]:
print(calculate_roi(-111,0))
print(calculate_roi('aaa', 'bbb'))

nan

        Exeption: '<=' not supported between instances of 'str' and 'int' happend.
        Please, give an input according the expectations.
        Use help(calculate_roi) or print(calculate_roi.__doc__) for info.
            


##### Average Page Time

This metric is a bit tricky. From the task brief, it is known that time less than 5 seconds usually are not included into consideration. Since we have a testing dataset with different pages and possible values in the timespent column, there is a need to filter and sum those values before passing it to the function.

Below, I filtered only the 3 columns needed for the calculation and excluded all rows with a time less than 5 seconds:

In [116]:
# filter only usefull data
time_df = df[df['timespent'] >= 5].filter(['user_id','page', 'timespent'])
time_df

Unnamed: 0,user_id,page,timespent
0,30,/item1,276
1,19,/cart,232
2,30,/delivery,885
3,8,/index,367
4,48,/delivery,703
...,...,...,...
95,44,/item2,234
96,42,/info,794
97,44,/item3,693
98,22,/item2,328


Calculating Average Page Time for each page in the dataset:

In [117]:
for page in df.page.unique():
    page_df = time_df[time_df['page'] == page]
    page_users = page_df['user_id'].unique().size
    page_time = page_df['timespent'].sum()
    apt = calculate_apt(page_time, page_users)
    print(f'For the page "{page}", Average Page Time is {apt:.2f} seconds.')
    # store value in metrics
    metrics[f'Average page time for {page}'] = apt

For the page "/item1", Average Page Time is 548.75 seconds.
For the page "/cart", Average Page Time is 502.44 seconds.
For the page "/delivery", Average Page Time is 563.08 seconds.
For the page "/index", Average Page Time is 658.29 seconds.
For the page "/item3", Average Page Time is 529.85 seconds.
For the page "/item2", Average Page Time is 484.45 seconds.
For the page "/info", Average Page Time is 579.12 seconds.


Extreme cases here are not only connected to the function itself but also to the data in the dataset. But this situation is excluded here in our testing DataFrame.

Some examples on the function fail:

In [118]:
print(calculate_apt(-111, 222))
print(calculate_apt(df, "aaa"))

nan

        Exeption: '<' not supported between instances of 'str' and 'int' happend.
        Please, give an input according the expectations.
        Use help(calculate_apt) or print(calculate_apt.__doc__) for info.
            


##### Customer Lifetime Value (CLV)

The decrtiption of a function

In [119]:
help(calculate_clv)

Help on function calculate_clv in module __main__:

calculate_clv(avg_purchase_val, avg_purchase_freq, avg_cust_lifespan)
    Calculates Customer Lifetime Value (CLV).
    
    Parameters:
    avg_purchase_val (int, float) - Average Purchase Value
    avg_purchase_freq (int, float) - Average Purchase Frequency
    avg_cust_lifespan (int, float) - Average Customer Lifespan
    
    All parameters expected to be positive (more than 0).
    
    Returns:
    int, float: Customer Lifetime Value (CLV) or
    NaN if parameters conditions are not met.



Considering the given dataset, we could calculate some of the parameters needed.

The average purchase value is quite easy to calculate. Just filter a rows with a purchase greater than 0 and afterwards get mean() or average value.

In [120]:
# dataframe for all the purchases > 0
purch_df = df[df['purchase'] > 0]

In [121]:
# average purchase value
avg_purch = purch_df['purchase'].mean()

As for Average Purchase Frequency, there is a formula on how to calculate it, specifically $\frac{Number\,of\,Purchases}{Number\,of\,Customers}$. Using the given dataset let's assume the "number of purchases" is the number of rows for which purchase value is greater than 0. And for the "the number of customer" is all unique users who made purchases.

In [122]:
num_purch = purch_df.shape[0]
num_cust = purch_df['user_id'].unique().size
print(num_purch, num_cust)

50 33


In [123]:
# Average Purchase Frequency
purch_freq = num_purch / num_cust
purch_freq

1.5151515151515151

As for the Average Customer Lifespan (ACL) that is a tricky part. According to the formula $ACL=\frac{1}{Customer\,Churn\,Rate}$. And actually this is the one of the metrics which I've suggested for the task. So let's first calculate Customer Churn Rate and then return to the CLV calculation.

<a href="https://www.zoho.com/subscriptions/guides/what-is-customer-lifetime-value-clv.html#:~:text=Average%20Customer%20Lifespan%20(ACL)%20is,derived%20from%20the%20churn%20rate.">Link to the Average Customer Lifespan formula.</a>

##### Customer Churn Rate (CCR)

Help for CCR calculation function:

In [124]:
help(calculate_ccr)

Help on function calculate_ccr in module __main__:

calculate_ccr(inactive_users, all_users)
    Calculate the Customer Churn Rate (CCR).
    
    Parameters:
    inactive_users (int) - number of users who are inactive in a given period.
    Input value expected to be no less than 0.
    
    all_users (int) - number of all users in a given period.
    Input value expected to be positive (more than 0).
    
    Returns:
    float: Customer Churn Rate (CCR) or NaN if parameters conditions are not met.



The problem here is finding the inactive user count. Consider the two month period of March and April, meaning the **monthyear** field value in our dataset should be '03.2023' and '04.2023'. Let's assume for the specific dataset that if the user_id has a record in March, but doesn't have one in April, then the user is considered inactive.

In [125]:
#filter march user id's and putting them in set
march_users = set(df[df['monthyear'] == '03.2023']['user_id'])
print(march_users)

{1, 2, 6, 7, 8, 14, 17, 20, 21, 22, 27, 28, 29, 30, 34, 37, 38, 41, 42, 44, 48, 49}


In [126]:
#filter april user id's and putting them in set
april_users = set(df[df['monthyear'] == '04.2023']['user_id'])
print(april_users)

{1, 2, 8, 9, 10, 11, 19, 20, 24, 25, 26, 28, 30, 31, 34, 35, 36, 41, 44, 47}


In [127]:
#inactive_users are those who were in march, but not in april
#meaning set difference between id's
inactive_users = march_users.difference(april_users)
print(inactive_users)

{37, 6, 7, 38, 42, 14, 48, 17, 49, 21, 22, 27, 29}


In [128]:
#have inactive users count
inactive_count = len(inactive_users)

And all users in the given period are just the size of the set april_users

In [129]:
#putting values in function
ccr = calculate_ccr(inactive_count, len(april_users))
print(f'For the current data set Customer Churn Rate (CCR) is {ccr:.2f}')

For the current data set Customer Churn Rate (CCR) is 0.65


In [130]:
metrics['Customer Churn Rate'] = ccr

##### Continuing Customer Lifetime Value (CLV)

Having CCR calculated we now may find Average Custome Lifespan by using the formula above, getting:

In [131]:
# Average Customer Lifespan
acl = 1 / ccr
acl

1.5384615384615383

Putting the values in the function

In [132]:
clv = calculate_clv(avg_purch, purch_freq, acl)
print(f'For the current dataset Customer Lifetime Value is {clv:.2f}')

For the current dataset Customer Lifetime Value is 736.35


Extreme cases are all the same, actually. If you put values that were not expected by the function, there are two cases: it returns numpy.NaN or the description of happened exception for you to correct your data or input processing.

In [133]:
metrics['Customer Lifetime Value'] = clv

##### Conversion Rate (CR)

As function help says:

In [134]:
help(calculate_cr)

Help on function calculate_cr in module __main__:

calculate_cr(total_attr_conver, total_msrd_clicks)
    Calculate Conversion Rate (CR).
    
    Parameters: 
    total_attr_conver (int) - Total Attributed Conversion
    Input value expected to be no less than 0.
    
    total_msrd_clicks - total number of clicks on the ad.
    Input value expected to be positive (more than 0).
    
    Returns:
    float: Conversion Rate (CR) or
    NaN if parameters conditions are not met.



In terms of the current testing dataset Total Attributed Conversion could mean if the purchase was made after clicks

In [135]:
# remember the dataframe for counting average purchase
purch_df.head()

Unnamed: 0,user_id,monthyear,page,timespent,purchase,impressions,clicks
0,30,4.2023,/item1,276,975.0,8,8
1,19,1.2023,/cart,232,416.0,3,3
2,30,3.2023,/delivery,885,627.0,6,4
3,8,3.2023,/index,367,576.0,2,2
5,6,2.2023,/item3,0,680.0,5,0


Let's filter the given dataset to find out if the purchase were made after clicks:

In [136]:
# getting Total Attributed Conversions (tac)
tac = purch_df[purch_df['clicks'] > 0].shape[0] 
tac

18

Afterwards we can calculate the Conversion Rate, having total clicks value calculated before.

In [137]:
cr = calculate_cr(tac, clicks)
print(f'For the current dataset the Conversion Rate is {cr:.2f}.')

For the current dataset the Conversion Rate is 19.35.


In [138]:
metrics['Conversion Rate'] = cr

Extreme cases once more:

In [139]:
print(calculate_cr(-1, -2))
print(calculate_cr('aaa','bbb'))

nan

        Exeption: '<' not supported between instances of 'str' and 'int' happend.
        Please, give an input according the expectations.
        Use help(calculate_cr) or print(calculate_cr.__doc__) for info.
            


##### Customer Acquisition Cost (CAC)

The specified input is expected and output will be provided:

In [140]:
print(calculate_cac.__doc__)


        Calculates the Customer Acquisition Cost (CAC).

        Parameters:
        total_amnt_spent (int, float) - total amount spent on the investment.
        Input value expected to be no less than 0.
        total_msrd_users (int) - total number of users for a given period.
    
        Returns:
        float: Customer Acquisition Cost (CAC) or NaN if parameters conditions are not met.
    


For the purspose of reasearch let's assume the total amount spent is somewhat between 0 and purchase sum amount

In [141]:
ttl_spent = np.random.randint(0, df['purchase'].sum())
ttl_spent

11629

The total user amount for a given period (January to April according to our dataset) are all unqiue users:

In [142]:
ttl_users = df['user_id'].unique().size
ttl_users

41

Calculating Customer Acquisition Cost, getting:

In [143]:
cac = calculate_cac(ttl_spent, ttl_users)
print(f'For the current dataset the Customer Acquisiton Cost is {cac:.2f}')

For the current dataset the Customer Acquisiton Cost is 283.63


In [144]:
metrics['Customer Acquisition Cost'] = cac

Testing extreme cases:

In [145]:
print(calculate_cac(-2, -5))
print(calculate_cac('aaa', 'bbb'))

nan

        Exeption: '<' not supported between instances of 'str' and 'int' happend.
        Please, give an input according the expectations.
        Use help(calculate_cac) or print(calculate_cac.__doc__) for info.
            


##### Bounce Rate (BR)

And the last bounce rate function decription:

In [146]:
help(calculate_br)

Help on function calculate_br in module __main__:

calculate_br(single_page_visit, total_visits)
    Calculates Bounce Rate (BR).
    
    Parameters:
    single_page_visit (int) - number of user visits for a single page only.
    Input value expected to be no less than 0.
    total_visits (int) - number of total visits of a site.
    Input value expected to be positive (more than 0).
    
    Returns:
    float: Bounce Rate (BR) or NaN if parameters conditions are not met.



Considering the testing dataset, we may have a single page visit only in the case where our user_id is in the table only once. Meaning we have to find those rows that do not have a repeating user id no matter which page was visited.

In [147]:
# filter those rows where user_id appears only once
unique_rows = df[df.groupby('user_id').user_id.transform('count') == 1]
unique_rows

Unnamed: 0,user_id,monthyear,page,timespent,purchase,impressions,clicks
17,31,4.2023,/item2,402,0.0,1,0
22,37,3.2023,/index,745,0.0,2,0
40,7,3.2023,/index,170,0.0,1,0
42,49,3.2023,/cart,545,0.0,7,0
49,35,4.2023,/item3,580,0.0,1,1
59,36,4.2023,/item1,89,0.0,2,0
64,46,2.2023,/info,200,136.0,2,0
67,32,1.2023,/item3,606,110.0,9,0
77,17,3.2023,/info,588,83.0,9,0


In [148]:
#let's check
df[df['user_id'] == 7]

Unnamed: 0,user_id,monthyear,page,timespent,purchase,impressions,clicks
40,7,3.2023,/index,170,0.0,1,0


In [149]:
df[df['user_id'] == 48]

Unnamed: 0,user_id,monthyear,page,timespent,purchase,impressions,clicks
4,48,1.2023,/delivery,703,0.0,2,0
32,48,2.2023,/cart,368,0.0,2,0
86,48,3.2023,/info,574,0.0,1,1
91,48,3.2023,/item3,839,687.0,0,0


It seems like it was done correctly. So now it's time to put values in the bounce rate function. Single page visits will be the number of rows in the previously filtered dataframe, and all visits will be the total number of rows in the main dataframe. Getting:

In [150]:
br = calculate_br(unique_rows.shape[0], df.shape[0])
print(f'For the current dataset Bounce Rate (BR) is {br:.2f}')

For the current dataset Bounce Rate (BR) is 0.09


In [151]:
metrics['Bounce Rate'] = br

Extreme cases are something like that:

In [152]:
print(calculate_br(-1, -2))
print(calculate_br('fasfas', 'aaaa'))

nan

        Exeption: '<' not supported between instances of 'str' and 'int' happend.
        Please, give an input according the expectations.
        Use help(calculate_br) or print(calculate_br.__doc__) for info.
            


### Conclusion

For the current research, we have made functions that calculate metrics according to the given formulas. Also, the testing dataset was created using the pandas library and the sheer capabilities of the DataFrame object. Thus, we've illustrated that the metrics calculation itself is extremely simple, but retrieving, analyzing, and processing the data for those metrics to be calculated is a non-trivial task for sure. And the largest part of the current research is dedicated to showing that the data needs to be in a special format, making it possible to think about how to process the data accordingly.

In conclusion, there are eight metrics we have calculated during the current notebook:

In [154]:
print('Metrics for the website, based on testing dataset:\n')
for key, value in metrics.items():
  print(f'\t{key} = {value}')
  print()

Metrics for the website, based on testing dataset:

	Click-Trough Rate = 21.428571428571427

	Return on Investment = 112.77142603917399

	Average page time for /item1 = 548.75

	Average page time for /cart = 502.44444444444446

	Average page time for /delivery = 563.0833333333334

	Average page time for /index = 658.2941176470588

	Average page time for /item3 = 529.8461538461538

	Average page time for /item2 = 484.45454545454544

	Average page time for /info = 579.1176470588235

	Customer Churn Rate = 0.65

	Customer Lifetime Value = 736.3459207459207

	Conversion Rate = 19.35483870967742

	Customer Acquisition Cost = 283.6341463414634

	Bounce Rate = 0.09

