# Analyse Customer Value by Frequency, Recency and Monetary Value
ref: 
- [Customer Lifetime Value](https://clevertap.com/blog/customer-lifetime-value/)
- [Frequency, Recency, Monetary Value Analysis](https://clevertap.com/blog/rfm-analysis/) | [whitepaper](https://info.clevertap.com/hubfs/Blog%20Images/A%20Quick%20Start%20Guide%20to%20Automated%20Segmentation%20(1).pdf?_hsmi=64667222&_hsenc=p2ANqtz-96Je_GDBFw8_9_MDcEkdq4SvOYni_MBWoopRVB4h87PHOHxkf039plUfRhUIPxxK7H6bkXki0pfix-uQeDz-0qfER-KQ) 
- [Customer Segmentation Blog](https://towardsdatascience.com/the-most-important-data-science-tool-for-market-and-customer-segmentation-c9709ca0b64a)

Customer lifetime value (CLV) is the profit margin a company expects to earn over the entirety of its business relationship with the average customer.

Some contributing factors:
- customer churn rate
- retention rate
- sales & marketing strategy

A business may use strategies such as:
- **Impress** | by quality/pricing
- **Engage**  | by sales & marketing strategies
- **Retain**  | continue to impress and engage

In this notebook, we explore customer segments by *Frequency*, *Recency* and *Monetary Value*.

## About Data

Source: https://archive.ics.uci.edu/ml/datasets/online+retail#

**Attribute Information**:

`InvoiceNo`: Invoice number. Nominal, a 6-digit integral number uniquely assigned to each transaction. If this code starts with **letter 'c'**, it indicates a **cancellation**.

`StockCode`: Product (item) code. Nominal, a 5-digit integral number **uniquely** assigned to each distinct product.

`Description`: Product (item) name. Nominal.

`Quantity`: The quantities of each product (item) per transaction. Numeric.

`InvoiceDate`: Invoice date and time. Numeric, the day and time when each transaction was generated.

`UnitPrice`: Unit price. Numeric, Product price per unit in **sterling**.

`CustomerID`: Customer number. Nominal, a 5-digit integral number **uniquely** assigned to each customer.

`Country`: Country name. Nominal, the name of the country where each **customer resides**.
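
The cancellation rule described for `InvoiceNo` can be checked with a short sketch. The toy invoice numbers below are made up for illustration, and the name `is_cancelled` is an assumption, not a column in the dataset. The comparison is case-insensitive because the attribute description says 'c' while exported files may use an uppercase prefix.

```python
import pandas as pd

# Toy invoice numbers (made up); one value is an integer to mimic mixed types
toy = pd.DataFrame({"InvoiceNo": ["536365", "C536379", "c536391", 536400]})

# Cast to string first, then test for a leading 'c' in either case
is_cancelled = toy["InvoiceNo"].astype(str).str.upper().str.startswith("C")
print(is_cancelled.tolist())
```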

# Set up

In [None]:
%load_ext autoreload
%autoreload 2

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np 
import pandas as pd

# Data
## Load Data
The cell below assumes that the dataset is registered in the AML Workspace.

In [None]:
# azureml-core of version 1.0.72 or higher is required
# azureml-dataprep[pandas] of version 1.1.34 or higher is required
from azureml.core import Workspace, Dataset

workspace = Workspace.from_config()
print(workspace.name, workspace.resource_group, workspace.location, workspace.subscription_id, sep = '\n')

dataset = Dataset.get_by_name(workspace, name='online-retail-processed')
df_orig = dataset.to_pandas_dataframe()

In [None]:
df = df_orig.copy()
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'], errors ='coerce')
df.dtypes

## Normalise Data

- Check that each `StockCode` maps to a consistent `Description` (TO DO)
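
One possible way to approach the `StockCode` check, sketched on toy rows rather than the real dataset. The rows and the names `descriptions_per_code` and `inconsistent_codes` are assumptions made for illustration.

```python
import pandas as pd

# Toy rows standing in for the retail data; values are made up
toy = pd.DataFrame({
    "StockCode": ["85123A", "85123A", "71053", "71053", "22752"],
    "Description": [
        "WHITE HANGING HEART T-LIGHT HOLDER",
        "CREAM HANGING HEART T-LIGHT HOLDER",
        "WHITE METAL LANTERN",
        "WHITE METAL LANTERN",
        "SET 7 BABUSHKA NESTING BOXES",
    ],
})

# Count the distinct descriptions recorded against each stock code
descriptions_per_code = toy.groupby("StockCode")["Description"].nunique()

# Stock codes with more than one description would need normalising
inconsistent_codes = descriptions_per_code[descriptions_per_code > 1].index.tolist()
print(inconsistent_codes)
```

Note that `nunique` ignores missing descriptions by default, so a code with one description and some blanks would still count as consistent in this sketch.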

## Prepare Features

### Derive new features:
- ~~`OrderCancelled` : Create a new column to indicate the order is cancelled. Boolean.~~
- `TotalSum` : Create a new column to indicate the total sum of an order, i.e. `Quantity` x `UnitPrice`

## Select Data
- Select six months of data: `InvoiceDate` from 2011-06-01 up to, but not including, 2011-12-01

In [None]:
# Filter data between two dates
df_sub = df.loc[(df['InvoiceDate'] >= '2011-06-1')
           & (df['InvoiceDate'] < '2011-12-1')].copy()

df_sub.describe(include='all', datetime_is_numeric=True)

Note:
- extreme values
    - `Quantity` : -80995, 80995
    - `UnitPrice` : 38970.00

## Aggregate Data

### Aggregate transactions by `CustomerID`
- Aggregate using the effective `Quantity`, i.e. the quantity net of returns. For example, if a customer buys 10 units and then returns 3, the effective `Quantity` is 7 units.

Note:
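- Effective `Quantity` is expected to be >= 0; a negative value means more units were returned than bought within the selected date range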
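
To make the netting of returns concrete, here is a minimal sketch on toy transactions (values made up for illustration). It uses the same grouping keys as the cell below: a purchase of 10 units and a return of 3 units collapse into one row with an effective `Quantity` of 7.

```python
import pandas as pd

# Toy transactions: customer 17850 buys 10 units, then returns 3 of them
toy = pd.DataFrame({
    "InvoiceNo": ["536365", "C536379", "536400"],
    "CustomerID": [17850, 17850, 13047],
    "StockCode": ["71053", "71053", "22752"],
    "UnitPrice": [3.39, 3.39, 7.65],
    "Country": ["United Kingdom"] * 3,
    "Quantity": [10, -3, 2],
})

# Summing Quantity per customer and product nets the return against the purchase
effective = toy.groupby(
    ["CustomerID", "StockCode", "UnitPrice", "Country"], as_index=False
)["Quantity"].sum()
print(effective)
```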

In [None]:
# group by 'CustomerID', 'StockCode', 'UnitPrice', 'Country', then, get the sum of `Quantity`
df_effective_quantity = df_sub.groupby(['CustomerID', 'StockCode', 'UnitPrice', 'Country'], as_index=False, observed=True)['Quantity'].sum() 

df_effective_quantity.describe()
df_effective_quantity

In [None]:
df_effective_quantity[df_effective_quantity['Quantity'] < 0]

### Remove every `CustomerID` with a negative effective `Quantity`, if any exist

In [None]:
condition = (df_effective_quantity['Quantity']<0)
CustomerID_remove = df_effective_quantity[condition]['CustomerID'].unique()
CustomerID_remove

In [None]:
df_sub.shape
df_sub = df_sub[~df_sub['CustomerID'].isin(CustomerID_remove)]
df_sub.shape

In [None]:
df_sub.describe(include='all', datetime_is_numeric=True)

#### Check again

In [None]:
# group by 'CustomerID', 'StockCode', 'UnitPrice', 'Country', then, get the sum of `Quantity`
df_effective_quantity = df_sub.groupby(['CustomerID', 'StockCode', 'UnitPrice', 'Country'], as_index=False, observed=True)['Quantity'].sum() 

df_effective_quantity.describe()
df_effective_quantity

# check for effective 'Quantity' < 0
df_effective_quantity[df_effective_quantity['Quantity'] < 0]

Note:
- At this stage, `df_sub`:
    - date range 2011-06-01 to 2011-12-01 (end date excluded)
    - contains transactions of `CustomerID` where effective `Quantity` is >= 0

In [None]:
df_sub.describe(include='all', datetime_is_numeric=True)

## Aggregate transactions by `CustomerID`

In [None]:
df_effective_quantity.describe(include='all', datetime_is_numeric=True)
df_effective_quantity

### Create a column `TotalSum`

In [None]:
df_sub['TotalSum'] = df_sub['Quantity'] * df_sub['UnitPrice']
df_sub

## Analyse Customer Value

## Frequency, Recency, Monetary

In [None]:
# Snapshot at latest date of this dataset
snapshot_date = df_sub['InvoiceDate'].max()
snapshot_date

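# Calculate Recency, Frequency and Monetary value for each customer
# Note: 'count' on `InvoiceNo` counts invoice lines, so Frequency here is the
# number of line items; use 'nunique' instead to count distinct invoices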
df_frm = df_sub.groupby(['CustomerID']).agg({'InvoiceDate' : lambda x : (snapshot_date - x.max()).days,
                                             'InvoiceNo' : 'count',
                                             'TotalSum' : 'sum'}).rename(columns={'InvoiceDate' : 'Recency(Days)',
                                                                                  'InvoiceNo' : 'Frequency',
                                                                                  'TotalSum' : 'Monetary(£)'})

df_frm

In [None]:
df_frm.describe()
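
The aggregation above can be traced by hand on a toy example. The rows below are made up, and the output column names (`recency_days`, `line_items`, `invoices`, `monetary`) are illustrative assumptions. The sketch also contrasts counting invoice lines with counting distinct invoices, since the two give different notions of frequency.

```python
import pandas as pd

# Toy line items: customer 1 has two invoices (three lines), customer 2 has one
toy = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(
        ["2011-11-01", "2011-11-01", "2011-11-20", "2011-10-31"]
    ),
    "TotalSum": [10.0, 5.0, 20.0, 7.5],
})

# Snapshot at the latest date in the toy data (2011-11-20)
snapshot = toy["InvoiceDate"].max()

rfm = toy.groupby("CustomerID").agg(
    # Days between the snapshot and the customer's most recent purchase
    recency_days=("InvoiceDate", lambda s: (snapshot - s.max()).days),
    # Number of invoice lines (what 'count' measures)
    line_items=("InvoiceNo", "count"),
    # Number of distinct invoices
    invoices=("InvoiceNo", "nunique"),
    # Total spend
    monetary=("TotalSum", "sum"),
)
print(rfm)
```

Which frequency measure is more appropriate depends on the question being asked; the notebook uses the line-item count.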

### Pair Plot

In [None]:
_ = sns.pairplot(df_frm, height=3, aspect=1.2)

### Map `df_frm` to normal distribution

In [None]:
from sklearn.preprocessing import PowerTransformer
import pickle

# Yeo-Johnson also accepts zero and negative values, unlike Box-Cox
ptransformer = PowerTransformer(method="yeo-johnson")

# Fit on the FRM table and keep the original column names
df_frm_transformed = pd.DataFrame(ptransformer.fit_transform(df_frm),
                                  columns=df_frm.columns)

# Set to False to skip saving the fitted transformer
if True:
    ptransformer_filepath = '../../.aml/models/powertransformer.pkl'
    with open(ptransformer_filepath, "wb") as f:
        pickle.dump(ptransformer, f)
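
The reason for saving the fitted transformer is so that the same mapping can be applied later, for example to new customers. A minimal sketch of that round trip follows, assuming synthetic skewed data in place of `df_frm` and an in-memory pickle in place of the file on disk. The names `toy`, `pt` and `restored` are illustrative.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer

# Synthetic, right-skewed stand-in for the FRM table (not real data)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Recency(Days)": rng.integers(0, 180, size=200),
    "Frequency": rng.lognormal(mean=3.0, sigma=1.0, size=200),
    "Monetary(£)": rng.lognormal(mean=6.0, sigma=1.2, size=200),
})

# Fit and transform; standardize=True (the default) also centres and scales
pt = PowerTransformer(method="yeo-johnson")
transformed = pt.fit_transform(toy)

# Serialise and restore the fitted transformer, as a saved .pkl file would
restored = pickle.loads(pickle.dumps(pt))

# The restored transformer reproduces the original mapping
same = np.allclose(restored.transform(toy), transformed)
print(same)
```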

### Register Transformer Model

In [None]:
# Set to False to skip registering the model
if True:
    from azure.ai.ml import MLClient
    from azure.ai.ml.entities import Model
    #from azure.ai.ml._constants import ModelType
    from azure.identity import DefaultAzureCredential

    # get a handle to the workspace
    ml_client = MLClient(credential=DefaultAzureCredential(), 
                        subscription_id=workspace.subscription_id, 
                        resource_group_name=workspace.resource_group, 
                        workspace_name=workspace.name)
    ml_client

    model_filepath = '../../.aml/models/powertransformer.pkl'

    file_model = Model(
        path = model_filepath,
        #type=ModelType.CUSTOM,
        name = "powertransformer",
        description = "powertransformer.pkl",
        auto_increment_version = True,)
    
    ml_client.models.create_or_update(file_model)

In [None]:
pplt = sns.pairplot(df_frm_transformed, height=3, aspect=1.2)
_ = pplt.fig.suptitle('Normalised Frequency, Recency, Monetary Distribution', y=1.02) # y is position of title

# Data Management

## Upload to Blob Storage

In [None]:
from azureml.core import Workspace, Dataset

datastore = workspace.get_default_datastore()

# Set to False to skip saving and uploading
if True:
    filename = '../../.aml/data/online-retail-frm.csv'

    # Save to local
    df_frm_transformed.to_csv(filename, index=False)

    # Upload to DataStore
    Dataset.File.upload_directory('../../.aml/data', datastore)

## Register `df_frm_transformed` as Dataset

In [None]:
from azureml.core import Workspace, Dataset

datastore = workspace.get_default_datastore()

# Set to False to skip registering the dataset
if True:

    # Dataset name to register as 
    name = 'online-retail-frm'

    # create a new dataset
    Dataset.Tabular.register_pandas_dataframe(dataframe=df_frm_transformed, 
                                            target=datastore, 
                                            name=name, 
                                            show_progress=True, 
                                            tags={'Purpose':'Tutorial'})