In [None]:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import datetime as dt
from sklearn.preprocessing import MinMaxScaler
# lifetimes (BetaGeoFitter, GammaGammaFitter) is installed with pip and imported later, in the CLTV prediction section

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
from IPython.display import Image
!ls ../input/crmanalyticsfigures/

# CRM Analytics

CRM analytics refers to the processing of data that resides in your CRM database to uncover useful insights about customers that businesses can act upon. Integration with analytics makes CRM systems more intelligent in comprehending customers and empowers you to make data-driven decisions.

![image.png](attachment:213a504d-04ef-4183-b05d-b815ced4050b.png)

CRM analytics also covers several subtopics:

- Customer lifecycle/journey/funnel
- Communication with customers (language, business culture, colors, campaigns, etc.)
- Finding new customers
- Customer retention or desertion
- Cross-sell / up-sell

![image.png](attachment:aa441ec4-8f2d-4c14-834e-63a1bbafe0a3.png)

## 1) KPI (Key Performance Indicator)

A Key Performance Indicator is a measurable value that demonstrates how effectively a company is achieving key business objectives. Organizations use KPIs at multiple levels to evaluate their success at reaching targets. High-level KPIs may focus on the overall performance of the business, while low-level KPIs may focus on processes in departments such as sales, marketing, HR, support and others.

__Oxford's Dictionary definition of KPI:__ A quantifiable measure used to evaluate the success of an organization, employee, etc. in meeting objectives for performance.

Typical customer-related KPIs include:

- Customer Acquisition Rate
- Customer Retention Rate
- Customer Churn Rate
- Growth Rate

Source: [KPI](https://ceaksan.com/tr/kpi-nedir)

In [None]:
Image("../input/crmanalyticsfigures/KPI.jpeg")
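As a rough sketch, three of these KPIs can be computed from period-level customer counts. The formulas below are common textbook definitions rather than ones given in the source above, and the function names and counts are hypothetical.

```python
def retention_rate(start_customers: int, end_customers: int, new_customers: int) -> float:
    """Share of the customers present at the start who are still active at the end.

    Common definition: (E - N) / S, with E = customers at period end,
    N = customers acquired during the period, S = customers at period start.
    """
    return (end_customers - new_customers) / start_customers


def churn_rate(start_customers: int, end_customers: int, new_customers: int) -> float:
    """Share of starting customers lost during the period (1 - retention)."""
    return 1 - retention_rate(start_customers, end_customers, new_customers)


def growth_rate(start_value: float, end_value: float) -> float:
    """Relative change over the period, e.g. in customer count or revenue."""
    return (end_value - start_value) / start_value


# Hypothetical period: 200 customers at the start, 40 acquired, 210 at the end
S, N, E = 200, 40, 210
print(f"retention: {retention_rate(S, E, N):.2f}")  # (210 - 40) / 200 = 0.85
print(f"churn:     {churn_rate(S, E, N):.2f}")      # 1 - 0.85 = 0.15
print(f"growth:    {growth_rate(S, E):.2f}")        # (210 - 200) / 200 = 0.05
```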

## 2) Cohort Analysis

A cohort is a group of people who share a common characteristic over a certain period of time, such as users that have become customers at approximately the same time, a graduating class of students, or contact tracing individuals during a pandemic.

Cohort analysis is a study that concentrates on the activities of a specific cohort type. A cohort analysis table is used to visually display cohort data in order to help analysts compare different groups of users at the same stage in their lifecycle, and to see the long-term relationship between the characteristics of a given user group.

Source: [Cohort](https://www.omnisci.com/technical-glossary/cohort-analysis#:~:text=Cohort%20analysis%20is%20an%20analytical,common%20characteristics%20prior%20to%20analysis)


In [None]:
Image("../input/crmanalyticsfigures/Cohort.png")
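A minimal pandas sketch of a cohort retention table, assuming a hypothetical `orders` table with `customer_id` and `order_date` columns: each customer's cohort is the month of their first order, and each cell shows the share of that cohort that is active N months later.

```python
import pandas as pd

# Hypothetical toy orders: one row per order
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3, 4],
    "order_date": pd.to_datetime([
        "2011-01-05", "2011-02-10",   # customer 1: January cohort, active Jan and Feb
        "2011-01-20", "2011-03-02",   # customer 2: January cohort, active Jan and Mar
        "2011-02-03", "2011-03-15",   # customer 3: February cohort, active Feb and Mar
        "2011-02-25",                 # customer 4: February cohort, active Feb only
    ]),
})

# Cohort = month of each customer's first order
first = orders.groupby("customer_id")["order_date"].transform("min")
orders["cohort"] = first.dt.to_period("M")

# Months elapsed since the cohort month (0 = acquisition month)
orders["period"] = ((orders["order_date"].dt.year - first.dt.year) * 12
                    + (orders["order_date"].dt.month - first.dt.month))

# Distinct active customers per cohort and period
counts = (orders.groupby(["cohort", "period"])["customer_id"]
                .nunique()
                .unstack(fill_value=0))

# Retention = active customers / cohort size (the period-0 column)
retention = counts.divide(counts[0], axis=0)
print(retention)
```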

## 3) What is RFM ?

RFM stands for Recency, Frequency, and Monetary value. It is a method for measuring customer value.

The RFM model analyzes past transactional data and uses it to identify customer segments based on their purchase history.

RFM analysis has traditionally been used in database marketing and direct marketing, and lately it has received a lot of attention and become widely used in e-commerce.

The main benefit of an RFM analysis is that you can address each segment separately, according to what you know about its customers from their Recency, Frequency, and Monetary scores.

In [None]:
Image("../input/crmanalyticsfigures/RFM.png")

### What are Recency, Frequency and Monetary?

- Recency – How recent was a customer’s latest purchase from you?

- Frequency – How often does a customer purchase from you?

- Monetary – How much does a customer usually spend?
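To make the three metrics concrete, here is a sketch that computes them from a small, made-up transaction table. The column names and the analysis date are assumptions; the notebook's own RFM code further down does the same on the real dataset.

```python
import pandas as pd

# Hypothetical transactions: one row per invoice line
tx = pd.DataFrame({
    "customer": ["A", "A", "A", "B", "C", "C"],
    "invoice":  [101, 101, 105, 102, 103, 104],
    "date": pd.to_datetime(["2011-11-01", "2011-11-01", "2011-12-05",
                            "2011-06-10", "2011-10-01", "2011-12-01"]),
    "amount": [20.0, 15.0, 40.0, 300.0, 10.0, 12.0],
})
analysis_date = pd.Timestamp("2011-12-11")  # assumed reference date

rfm = tx.groupby("customer").agg(
    recency=("date", lambda d: (analysis_date - d.max()).days),  # days since last purchase
    frequency=("invoice", "nunique"),                              # number of distinct orders
    monetary=("amount", "sum"),                                    # total spend
)
print(rfm)
```

Customer B bought once, long ago, but spent the most: a typical "one big order, low recency" profile from the list below.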

### What is the RFM analysis?

After identifying the three metrics and some questions they can answer, it's time to learn how to do the RFM analysis.

An RFM analysis can show you who the most valuable customers for your business are: the ones who bought most recently, buy most often, and spend the most.

And it all starts with proper segmentation.

- Frequency: very frequent buyers, medium frequency buyers, one transaction only buyers.

- Recency: most recent customers, medium recency customers, least recent customers

- Monetary: customers who spend the most, above-average monetary value, average, low monetary value

Combining the above segments you’ll get more advanced ones, such as:

- Clients who come back frequently but spend very little (high frequency, low monetary, maybe deal hunters)

- Clients who only ordered once but spent above average (they could help you identify pain points that prevented them from ordering again and their monetary value – higher than average – is worth digging for insights)

- VIP clients (those who have a high RFM score overall, especially the monetary value, and who bring your business the most revenue)

- Clients who used to have high frequency and monetary value but have stopped ordering from you, so they have low recency (a sign that they might have switched to the competition although they had been loyal to you, and it's worth finding out why)

### RFM Segmentation

The recency frequency monetary model can help you take segmentation to a whole new level.

Through an RFM customer segmentation analysis you’ll be able to see:

- Who is included in the 1% of customers that bring most of your revenue

- Which of the loyal customers return the most often

- Who are the customers with big monetary value who placed big orders in the past, but have a low recency score, meaning they haven’t ordered in a long time

Based on these segments you’ll be able to ask yourself questions such as:

- How can I make the high monetary value but low recency customers come back?

- What makes the high recency and frequency customers so loyal? What is it they appreciate in my products or services?

- What other traits (geographical, demographical, behavioral) do my high monetary value customers have in common? How can I use those to create a segment of potential new customers and address them accordingly?


Source: 
- [RFM](https://www.omniconvert.com/blog/what-is-rfm/)

- [RFM Segmentation](https://medium.com/@raviatgrowlytics?source=rss-e4b11989be06------2)

In [None]:
Image("../input/crmanalyticsfigures/RFMSegmentation.png")

### Customer Lifetime Value

Customer lifetime value (CLV) is one of the key stats to track as part of a customer experience program. CLV is a measurement of how valuable a customer is to your company, not just on a purchase-by-purchase basis but across the whole relationship.


We can also formalize Customer Lifetime Value as follows:

- CLTV = (Customer Value / Churn Rate) * Profit Margin

- Customer Value = Average Order Value * Purchase Frequency

- Average Order Value = Total Price / Total Transaction

- Purchase Frequency = Total Transaction / Total Number of Customers

- Churn Rate = 1 - Repeat Rate

- Repeat Rate = Customers that buy products several times / Total Customers

- Profit Margin = Total Price * 0.10 (0.10 can change depending on industry standards)

Source:
- [Customer Lifetime Value](https://www.qualtrics.com/uk/experience-management/customer/customer-lifetime-value/?rid=ip&prevsite=en&newsite=uk&geo=TR&geomatch=uk)

- [CLV](https://clevertap.com/blog/customer-lifetime-value/)
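A worked sketch of the formulas above on three hypothetical customers, following the same steps as the notebook's CLTV code further down (including the assumed 10% margin). Because the result multiplies a value by a profit, it is best read as a relative score rather than a currency amount.

```python
import pandas as pd

# Hypothetical per-customer summary
c = pd.DataFrame(
    {"total_transaction": [1, 3, 6], "total_price": [50.0, 300.0, 900.0]},
    index=["c1", "c2", "c3"],
)

c["avg_order_value"] = c["total_price"] / c["total_transaction"]
c["purchase_frequency"] = c["total_transaction"] / len(c)   # / total number of customers
repeat_rate = (c["total_transaction"] > 1).sum() / len(c)   # customers with more than 1 purchase
churn_rate = 1 - repeat_rate
c["profit_margin"] = c["total_price"] * 0.10                # assumed 10% margin
c["customer_value"] = c["avg_order_value"] * c["purchase_frequency"]
c["cltv"] = (c["customer_value"] / churn_rate) * c["profit_margin"]
print(c[["avg_order_value", "purchase_frequency", "customer_value", "cltv"]])
```

Here two of three customers repeat, so the churn rate is 1/3 and each customer value is multiplied by 3 before applying the margin.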

## Let's Code and Practice 🚀👨🏼‍💻

- We will use the open-source Online Retail II dataset.

Let's check the dataset for a first insight. The column names below match the CSV used in this notebook (Invoice, Price and Customer ID appear as InvoiceNo, UnitPrice and CustomerID in the original dataset description).

- Invoice: Invoice number. Nominal. A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'C', it indicates a cancellation.

- StockCode: Product (item) code. Nominal. A 5-digit integral number uniquely assigned to each distinct product.

- Description: Product (item) name. Nominal.

- Quantity: The quantities of each product (item) per transaction. Numeric.

- InvoiceDate: Invoice date and time. Numeric. The day and time when a transaction was generated.

- Price: Unit price. Numeric. Product price per unit in sterling (£).

- Customer ID: Customer number. Nominal. A 5-digit integral number uniquely assigned to each customer.

- Country: Country name. Nominal. The name of the country where a customer resides.

Source: [Online Retail Dataset](https://archive.ics.uci.edu/ml/datasets/Online+Retail+II#)

### Data Understanding

In [None]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.float_format', lambda x: '%.5f' % x)

In [None]:
df_ = pd.read_csv("../input/online-retail-ii-uci/online_retail_II.csv")
# Work on a copy so the raw data stays untouched for further changes
df = df_.copy()
df.head()

In [None]:
df.shape

In [None]:
# The Invoice column needs some cleaning:
# an invoice number starting with "C" indicates a cancellation,
# so let's drop those rows (.copy() avoids SettingWithCopyWarning on later column assignments)
df = df[~df["Invoice"].str.contains("C", na=False)].copy()

### Calculating RFM Metrics

- Recency

- Frequency

- Monetary

In [None]:
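# Recency is measured relative to an analysis date set just after the last invoice in the data,
# so convert InvoiceDate to datetime first (max() on the raw strings would compare text)
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])
today_date = dt.datetime(2011, 12, 11)
df["InvoiceDate"].max()  # last transaction date, shown for reference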

In [None]:
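# Monetary is total spend (quantity x unit price), not the sum of unit prices
df["TotalPrice"] = df["Quantity"] * df["Price"]

rfm = df.groupby('Customer ID').agg({'InvoiceDate': lambda date: (today_date - date.max()).days,
                                     'Invoice': lambda num: num.nunique(),
                                     'TotalPrice': lambda price: price.sum()})
rfm.columns = ['recency', 'frequency', 'monetary']
rfm.head()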

In [None]:
rfm.describe().T

In [None]:
# Drop customers whose net monetary value is not positive (e.g. due to returns or price adjustments)
rfm = rfm[rfm["monetary"] > 0]

### Calculating RFM Scores

In [None]:
# Recency
rfm["recency_score"] = pd.qcut(rfm['recency'], 5, labels=[5, 4, 3, 2, 1])
# Frequency
rfm["frequency_score"] = pd.qcut(rfm['frequency'].rank(method="first"), 5, labels=[1, 2, 3, 4, 5])
# Monetary
rfm["monetary_score"] = pd.qcut(rfm['monetary'], 5, labels=[1, 2, 3, 4, 5])
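Why does frequency go through `rank(method="first")`? Frequency values are heavily tied (many customers buy only once), so several quantile edges coincide and `pd.qcut` raises a `ValueError`. A small sketch with made-up values:

```python
import pandas as pd

# Hypothetical frequencies: most customers bought only once, so values are heavily tied
freq = pd.Series([1, 1, 1, 1, 1, 1, 1, 2, 3, 4])

try:
    pd.qcut(freq, 5, labels=[1, 2, 3, 4, 5])
except ValueError as err:
    # Several quantile edges are all equal to 1, so qcut cannot form 5 distinct bins
    print("qcut on raw values failed:", err)

# Ranking first breaks ties by order of appearance, giving 10 distinct values
scores = pd.qcut(freq.rank(method="first"), 5, labels=[1, 2, 3, 4, 5])
print(scores.tolist())  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

The trade-off is that customers with the same frequency can receive different scores depending on their row order.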

In [None]:
rfm.head()

In [None]:
# Let's use Recency and Frequency for calculating the RFM Score

rfm["RFM_SCORE"] = (rfm['recency_score'].astype(str) +
                    rfm['frequency_score'].astype(str))

rfm.head()

In [None]:
rfm[rfm["RFM_SCORE"] == "55"].head()  # champions

In [None]:
rfm[rfm["RFM_SCORE"] == "11"].head()  # hibernating

### Creating and Analyzing RFM Segments

In [None]:
# RFM naming
seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_Risk',
    r'[1-2]5': 'cant_lose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions'
}

In [None]:
# Segmentation using RFM scores and RFM naming
rfm['segment'] = rfm['RFM_SCORE'].replace(seg_map, regex=True) 
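Because segment names are assigned by regex patterns, it is worth checking that every recency/frequency pair from 11 to 55 matches one of them. A sketch of that check, using a copy of `seg_map`:

```python
import itertools

import pandas as pd

# Copy of the seg_map above (regex on the "RF" score string -> segment name)
seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_Risk',
    r'[1-2]5': 'cant_lose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions'
}

# Every possible recency/frequency score pair: "11", "12", ..., "55"
all_scores = pd.Series([f"{r}{f}" for r, f in itertools.product(range(1, 6), repeat=2)])
mapped = all_scores.replace(seg_map, regex=True)

# Any digits left over would mean a score pair matched no pattern
unmapped = all_scores[mapped.str.contains(r"\d")]
print("unmapped combinations:", unmapped.tolist())
print(mapped.value_counts())
```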

In [None]:
# Mean and count of recency, frequency and monetary for each segment
rfm[["segment", "recency", "frequency", "monetary"]].groupby("segment").agg(["mean", "count"])

In [None]:
# Show the need_attention segment
rfm[rfm["segment"] == "need_attention"].head()

In [None]:
# Listing the new customers segment indexes
rfm[rfm["segment"] == "new_customers"].index 

In [None]:
# Creating new customers dataframe
new_df = pd.DataFrame()
new_df["new_customer_id"] = rfm[rfm["segment"] == "new_customers"].index
new_df.head()

In [None]:
# Saving the new customers dataframe to csv file
new_df.to_csv("new_customers.csv")

### Customer Lifetime Value

- CLTV = (Customer Value / Churn Rate) * Profit Margin

- Customer Value = Average Order Value * Purchase Frequency

- Average Order Value = Total Price / Total Transaction

- Purchase Frequency = Total Transaction / Total Number of Customers

- Churn Rate = 1 - Repeat Rate

- Repeat Rate = Customers that buy products several times / Total Customers

- Profit Margin = Total Price * 0.10 (0.10 can change depending on industry standards)

In [None]:
df.head()

In [None]:
# Data Preparation
df = df[~df["Invoice"].str.contains("C", na=False)]
df = df[(df['Quantity'] > 0)]
df = df.dropna().copy()  # .copy() so the new column below is set on an independent frame
df["TotalPrice"] = df["Quantity"] * df["Price"]

In [None]:
cltv_c = df.groupby('Customer ID').agg({'Invoice': lambda x: x.nunique(),
                                        'Quantity': lambda x: x.sum(),
                                        'TotalPrice': lambda x: x.sum()})

In [None]:
cltv_c.columns = ['total_transaction', 'total_unit', 'total_price']
cltv_c.head()

In [None]:
# Average Order Value (average_order_value = total_price / total_transaction)
cltv_c['avg_order_value'] = cltv_c['total_price'] / cltv_c['total_transaction']

In [None]:
# Purchase Frequency (total_transaction / total_number_of_customers)
cltv_c["purchase_frequency"] = cltv_c['total_transaction'] / cltv_c.shape[0]

In [None]:
# Repeat Rate & Churn Rate
repeat_rate = cltv_c[cltv_c.total_transaction > 1].shape[0] / cltv_c.shape[0]
churn_rate = 1 - repeat_rate

In [None]:
# Profit Margin (profit_margin =  total_price * 0.10)
cltv_c['profit_margin'] = cltv_c['total_price'] * 0.10

In [None]:
# Customer Value (customer_value = avg_order_value * purchase_frequency)
cltv_c['customer_value'] = cltv_c['avg_order_value'] * cltv_c["purchase_frequency"]

In [None]:
# Customer Lifetime Value (CLTV = (customer_value / churn_rate) x profit_margin)
cltv_c['cltv'] = (cltv_c['customer_value'] / churn_rate) * cltv_c['profit_margin']
cltv_c.head()

In [None]:
# The raw CLTV values have no natural scale, so min-max scale them to [0, 1]
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(cltv_c[["cltv"]])
cltv_c["scaled_cltv"] = scaler.transform(cltv_c[["cltv"]])
cltv_c.sort_values(by="scaled_cltv", ascending=False).head()

In [None]:
# Segmentation based on scaled customer lifetime value
cltv_c["segment"] = pd.qcut(cltv_c["scaled_cltv"], 4, labels=["D", "C", "B", "A"])
cltv_c.head()

In [None]:
cltv_c[["total_transaction", "total_unit", "total_price", "cltv", "scaled_cltv"]].sort_values(by="scaled_cltv",ascending=False).head()

In [None]:
# Group by segments
cltv_c.groupby("segment")[["total_transaction", "total_unit", "total_price", "cltv", "scaled_cltv"]].agg(
    ["count", "mean", "sum"])

### Customer Lifetime Value Prediction

What is a customer worth? How many more times will a customer purchase before churning? How likely are they to churn within the next 3 months? And above all, how long should we expect a customer to be “alive” for?

In non-contractual business settings, where customers can end their relationship with a retailer at any moment and without notice, these questions are even trickier.

Amazon for books (or any of its other product categories without a subscription), Zalando for clothing, and Booking.com for hotels are all examples of non-contractual business settings. For all three of these e-commerce businesses we cannot look at the end date of a customer’s contract to know if they are “alive” (will purchase in the future) or “dead” (will never purchase again). We can only rely on a customer’s past purchases and other, less informative events (website visits, reviews, etc.).
But how do we decide in this scenario whether a customer is going to come back or is gone for good?

- “Buy ‘Til You Die” probabilistic models help us in quantifying the lifetime value of a customer by assessing the expected number of his future transactions and his probability of being “alive”.


Customer Lifetime Value Prediction

- CLTV = Expected Number of Transaction * Expected Average Profit

- CLTV = BG/NBD Model * Gamma-Gamma Submodel
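As a sketch of how the two parts combine, the snippet below implements the Gamma-Gamma conditional expected average profit in the weighted-average form used by lifetimes' `GammaGammaFitter.conditional_expected_average_profit`, then multiplies it by an expected number of transactions. The parameter values `p`, `q`, `v` and the transaction count are hypothetical, and discounting (which lifetimes' `customer_lifetime_value` applies) is ignored.

```python
def expected_average_profit(x: int, m_x: float, p: float, q: float, v: float) -> float:
    """Gamma-Gamma conditional expected average transaction value.

    x   : number of repeat transactions of the customer
    m_x : the customer's observed average transaction value
    p, q, v : Gamma-Gamma parameters (hypothetical here, normally fitted)
    """
    individual_weight = p * x / (p * x + q - 1)
    population_mean = p * v / (q - 1)
    # Shrink the customer's own average toward the population mean
    return (1 - individual_weight) * population_mean + individual_weight * m_x


def cltv_estimate(expected_transactions: float, expected_profit: float) -> float:
    """CLTV = expected number of transactions * expected average profit (no discounting)."""
    return expected_transactions * expected_profit


p, q, v = 6.0, 4.0, 15.0  # hypothetical parameters; population mean = p * v / (q - 1) = 30
print(expected_average_profit(1, 100.0, p, q, v))   # one repeat purchase: pulled toward 30
print(expected_average_profit(20, 100.0, p, q, v))  # many purchases: close to 100
print(cltv_estimate(3.5, expected_average_profit(20, 100.0, p, q, v)))
```

The design point is the shrinkage: a customer with few transactions gets an estimate close to the population average, while a customer with many transactions is trusted closer to their own observed spend.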


In [None]:
Image("../input/crmanalyticsfigures/CustomerLifetime.png")

In [None]:
Image("../input/crmanalyticsfigures/BuyTillYouDie.png")

#### BG/NBD Model

To predict future transactions, the model treats each customer's purchasing behaviour as a coin-tossing game.

Each customer has 2 coins: a buy coin that controls the probability of a customer to purchase, and a die coin that controls the probability of a customer to quit and never purchase again.

##### Assumption 1: 

- while active, the number of transactions made by a customer follows a Poisson Process with transaction rate λ (=expected number of transactions in a time interval).

__Poisson Process__

A Poisson process is a model for a series of discrete events where the average time between events is known, but the exact timing of events is random. The arrival of an event is independent of the event before (the waiting time between events is memoryless). For example, suppose we own a website which our content delivery network (CDN) tells us goes down on average once per 60 days, but one failure doesn’t affect the probability of the next. All we know is the average time between failures. This is a Poisson process.

The important point is that we know the average time between events but they are randomly spaced (stochastic). We might have back-to-back failures, but we could also go years between failures due to the randomness of the process.
A Poisson process meets the following criteria (in reality many phenomena modeled as Poisson processes don’t meet these exactly):

- Events are independent of each other. The occurrence of one event does not affect the probability another event will occur.

- The average rate (events per time period) is constant.

- Two events cannot occur at the same time.

At every sub-period (1 month) of a specific time interval (12 months) each customer tosses his buy coin and, depending on the result, he purchases or not.
The number of transactions (heads) we observe in the period depends on each customer’s probability distribution around λ.
Let’s plot below a customer’s Poisson probability distribution to visualize what we just said.

In [None]:
# Poisson process
Image("../input/crmanalyticsfigures/Poison-Process.png")
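A small sketch of the Poisson assumption using only the standard library: the probability mass function for a hypothetical customer averaging λ = 4 purchases per 12-month window, plus a simulation that generates the same process from exponential waiting times between purchases (the memoryless gaps described above).

```python
import math
import random


def poisson_pmf(k: int, lam: float) -> float:
    """P(N = k) for a Poisson count with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam = 4.0  # hypothetical: 4 purchases per 12-month window on average
for k in range(9):
    print(k, round(poisson_pmf(k, lam), 3))


def simulate_count(lam: float, horizon: float = 1.0, rng=random) -> int:
    """Count events in [0, horizon] when the gaps are Exponential with rate lam."""
    t, n = rng.expovariate(lam), 0
    while t <= horizon:
        n += 1
        t += rng.expovariate(lam)
    return n


rng = random.Random(42)
counts = [simulate_count(lam, rng=rng) for _ in range(20_000)]
print(sum(counts) / len(counts))  # close to lam
```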

##### Assumption 2

Heterogeneity in transaction rates among customers follows a Gamma distribution. This is equivalent to saying that each customer has its own buy coin (with its very own probability of head and tail).

__Gamma Distribution__

In probability theory and statistics, the gamma distribution is a two-parameter family of continuous probability distributions. The exponential, Erlang, and chi-square distributions are special cases of the gamma distribution. There are two parameterizations in common use:

- a shape parameter k and a scale parameter θ;
- a shape parameter α = k and an inverse scale parameter β = 1/θ, called a rate parameter.

In each of these forms, both parameters are positive real numbers.

The gamma distribution is the maximum entropy probability distribution (both with respect to a uniform base measure and with respect to a 1/x base measure) for a random variable X for which E[X] is fixed and positive and E[ln X] is fixed.


In [None]:
Image("../input/crmanalyticsfigures/Gamma-Distrubution.png")
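A sketch of Assumptions 1 and 2 together, with hypothetical parameters `r` (shape) and `alpha` (rate): each customer draws their own λ from a Gamma distribution, then a Poisson count of purchases. numpy's gamma sampler takes a scale parameter, i.e. 1/rate. The mixed counts are overdispersed (variance above the mean), which is the negative binomial behaviour behind the "NBD" in BG/NBD.

```python
import numpy as np

rng = np.random.default_rng(0)
n_customers = 100_000
r, alpha = 2.0, 4.0  # hypothetical Gamma shape and rate for the purchase rate lambda

# Each customer gets their own purchase rate (numpy uses scale = 1 / rate)
lam = rng.gamma(shape=r, scale=1 / alpha, size=n_customers)

# Purchases over a period of length t, given each customer's own rate
t = 12.0
purchases = rng.poisson(lam * t)

# Gamma-mixed Poisson counts are negative binomial:
# mean = r * t / alpha, variance = mean + r * (t / alpha) ** 2
print(purchases.mean(), r * t / alpha)
print(purchases.var(), r * t / alpha + r * (t / alpha) ** 2)
```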

##### Assumption 3

After any transaction, a customer becomes inactive with probability p.
Therefore the point at which the customer “drops out” is distributed across transactions according to a (shifted) Geometric distribution.

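After every transaction, each customer tosses the second coin, the die coin.
If p is the probability of “dying” on that toss, the probability of staying alive after the transaction is 1 - p.
Once again, let’s plot a random customer’s probability distribution to better grasp the meaning of this assumption.

Assuming that our customer becomes inactive with probability p = 0.52, the probability that they become inactive right after the 2nd transaction is 0.48 × 0.52 ≈ 25%, and right after the 3rd transaction 0.48² × 0.52 ≈ 12%.
The probability of dropping out exactly at a later transaction keeps shrinking. Note, though, that for a fixed p the chance of surviving k transactions, (1 - p)^k, also shrinks as k grows; the intuition that customers who purchase more are more likely to be alive comes from the heterogeneity in p across customers (Assumption 4): a customer who has survived many transactions probably has a low p.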

In [None]:
Image("../input/crmanalyticsfigures/Geometric-Distribution.png")

In [None]:
Image("../input/crmanalyticsfigures/Shifted-.png")
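The percentages quoted above can be checked with the shifted geometric probability p(1 - p)^(k - 1) of dropping out right after the k-th transaction. A short sketch:

```python
p = 0.52  # probability of dropping out after any given transaction (value from the text)


def p_drop_after(k: int, p: float) -> float:
    """Shifted geometric: survive the first k - 1 tosses, then drop out after the k-th transaction."""
    return (1 - p) ** (k - 1) * p


for k in range(1, 6):
    print(f"drop out after transaction {k}: {p_drop_after(k, p):.2%}")
```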

##### Assumption 4

Heterogeneity in p follows a Beta distribution.

As for the buy coin, each customer has their own die coin with its own probability of being alive after a specific number of transactions.

__Beta Distribution__

In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval [0, 1] parameterized by two positive shape parameters, denoted by α and β, that appear as exponents of the random variable and control the shape of the distribution.

The beta distribution has been applied to model the behavior of random variables limited to intervals of finite length in a wide variety of disciplines.

In Bayesian inference, the beta distribution is the conjugate prior probability distribution for the Bernoulli, binomial, negative binomial and geometric distributions. The beta distribution is a suitable model for the random behavior of percentages and proportions.


[Wikipedia](https://en.wikipedia.org/wiki/Beta_distribution)

In [None]:
Image("../input/crmanalyticsfigures/Beta-Distribution.png")
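The conjugacy mentioned above gives a quick way to see why heterogeneity in p matters. In this sketch (hypothetical prior values `a` and `b`), a customer's p has a Beta(a, b) prior; each transaction the customer survives multiplies the likelihood by (1 - p), so after n survived transactions the posterior is Beta(a, b + n). Its mean a / (a + b + n) falls as n grows, so a customer with many transactions is believed to have a lower dropout probability.

```python
def posterior_dropout_mean(a: float, b: float, survived: int) -> float:
    """Posterior mean of p under a Beta(a, b) prior after `survived` transactions without dropout.

    Each survived transaction contributes a factor (1 - p) to the likelihood,
    so the posterior is Beta(a, b + survived) with mean a / (a + b + survived).
    """
    return a / (a + b + survived)


a, b = 1.0, 3.0  # hypothetical prior: average dropout probability a / (a + b) = 0.25
for n in (0, 1, 5, 20):
    print(n, round(posterior_dropout_mean(a, b, n), 3))
```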

##### Assumption 5

The transaction rate λ and the dropout probability p vary independently across customers.
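Putting the five assumptions together, here is a sketch that simulates the BG/NBD story directly. The parameters `r`, `alpha`, `a`, `b` are hypothetical and the function name is illustrative; this simulates the generative assumptions and is not how lifetimes fits the model.

```python
import numpy as np


def simulate_bgnbd(n_customers, T, r, alpha, a, b, seed=0):
    """Simulate repeat-purchase counts in (0, T] under the BG/NBD assumptions.

    lambda ~ Gamma(shape=r, rate=alpha) and p ~ Beta(a, b) are drawn
    independently for each customer (Assumption 5).
    """
    rng = np.random.default_rng(seed)
    lam = rng.gamma(shape=r, scale=1 / alpha, size=n_customers)  # buy-coin rates
    p = rng.beta(a, b, size=n_customers)                          # die-coin probabilities
    counts = np.zeros(n_customers, dtype=int)
    for i in range(n_customers):
        t = rng.exponential(1 / lam[i])  # Poisson process: exponential waits between purchases
        while t <= T:
            counts[i] += 1               # a repeat purchase while the customer is alive
            if rng.random() < p[i]:      # die coin tossed right after each purchase
                break
            t += rng.exponential(1 / lam[i])
    return counts


counts = simulate_bgnbd(5_000, T=52, r=0.25, alpha=4.0, a=0.8, b=2.5)
print("mean repeat purchases:", counts.mean())
print("share with no repeat purchase:", (counts == 0).mean())
```

With dropout switched off (p close to 0) the counts reduce to the Gamma-Poisson mixture from Assumption 2; with p close to 1 nobody makes more than one repeat purchase.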

##### Model Output

Eventually, by fitting the previously mentioned distributions on the historical customers data we are able to derive a model that for each customer provides:

- P(X(t) = x | λ, p): the probability of observing x transactions in a time period of length t

- E(X(t) | λ, p): the expected number of transactions in a time period of length t

- P(τ > t): the probability that a customer is still active at time t, i.e. becomes inactive only after t

The fitted distributions' parameters are then used in forward-looking customer-base analysis to find the expected number of transactions in a future period of length t for an individual with past observed behavior defined by x, tₓ, T, where x = number of historical (repeat) transactions, tₓ = time of last purchase and T = age of the customer.

__The Shape of the Data:__

- Recency (derived from tₓ): the age of the customer at the moment of their last purchase, which equals the duration between the customer's first purchase and their last purchase

- Frequency (x): the number of periods in which the customer has made a repeat purchase

- Age of the customer (T): the age of the customer at the end of the period under study, which equals the duration between the customer's first purchase and the last day in the dataset.
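A sketch of how x, tₓ and T can be derived from raw purchase dates, using a hypothetical two-customer table and weeks as the time unit. It follows the definitions above (frequency counts repeat purchases only, so a one-time buyer has x = 0); the notebook's code further down instead uses the total number of invoices as frequency.

```python
import pandas as pd

# Hypothetical purchase dates; the observation period ends on 2011-12-11
tx = pd.DataFrame({
    "customer": ["A", "A", "A", "B"],
    "date": pd.to_datetime(["2011-01-02", "2011-03-14", "2011-06-19", "2011-09-04"]),
})
end = pd.Timestamp("2011-12-11")

g = tx.groupby("customer")["date"]
summary = pd.DataFrame({
    # x: repeat purchases = distinct purchase days minus the first one
    "frequency": g.nunique() - 1,
    # t_x (recency): weeks between first and last purchase
    "recency": (g.max() - g.min()).dt.days / 7,
    # T (age): weeks between first purchase and the end of observation
    "T": (end - g.min()).dt.days / 7,
})
print(summary)
```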

In [None]:
Image("../input/crmanalyticsfigures/BG-NBD-Model-Formula.png")
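For a customer with x > 0, the BG/NBD paper (Fader, Hardie and Lee, 2005) gives a closed form for the probability that the customer is still alive at T; lifetimes exposes it as `BetaGeoFitter.conditional_probability_alive`. A direct sketch with hypothetical parameter values:

```python
def prob_alive(x: int, t_x: float, T: float, r: float, alpha: float, a: float, b: float) -> float:
    """BG/NBD probability that a customer is still alive at T.

    x: repeat purchases, t_x: time of last purchase, T: customer age (same time unit).
    """
    if x == 0:
        return 1.0  # in BG/NBD a customer can only drop out right after a repeat purchase
    odds_dead = (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + odds_dead)


params = dict(r=0.25, alpha=4.0, a=0.8, b=2.5)  # hypothetical fitted parameters
print(prob_alive(5, t_x=48, T=50, **params))    # bought recently: high
print(prob_alive(5, t_x=10, T=50, **params))    # long silence since the last purchase: low
```

Two customers with the same x can get very different P(alive): a long silence since the last purchase (T much larger than tₓ) lowers it.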

In [None]:
df.head()

In [None]:
# Data Preprocessing
def outlier_thresholds(dataframe, variable):
    # Use the 1st and 99th percentiles (not the true quartiles) so only extreme values are capped
    q01 = dataframe[variable].quantile(0.01)
    q99 = dataframe[variable].quantile(0.99)
    interquantile_range = q99 - q01
    up_limit = q99 + 1.5 * interquantile_range
    low_limit = q01 - 1.5 * interquantile_range
    return low_limit, up_limit


def replace_with_thresholds(dataframe, variable):
    # Cap values below low_limit / above up_limit at those limits
    # (assigning the clipped column avoids writing floats into an int column in place)
    low_limit, up_limit = outlier_thresholds(dataframe, variable)
    dataframe[variable] = dataframe[variable].clip(lower=low_limit, upper=up_limit)

In [None]:
df = df.dropna()
df = df[~df["Invoice"].str.contains("C", na=False)]
df = df[df["Quantity"] > 0]
# Keep UK customers only; .copy() avoids SettingWithCopyWarning when columns are modified below
df = df[df["Country"] == "United Kingdom"].copy()

replace_with_thresholds(df, "Quantity")
replace_with_thresholds(df, "Price")

# df.shape  # inspect how many rows remain after cleaning

df["TotalPrice"] = df["Quantity"] * df["Price"]

#### Time to Practice Customer Lifetime Prediction 🚀👨🏼‍💻

- Recency (derived from tₓ): the age of the customer at the moment of their last purchase, which equals the duration between the customer's first purchase and their last purchase

- Frequency (x): the number of periods in which the customer has made a repeat purchase

- Age of the customer (T): the age of the customer at the end of the period under study, which equals the duration between the customer's first purchase and the last day in the dataset.


In [None]:
today_date = dt.datetime(2011, 12, 11)

cltv_df = df.groupby('Customer ID').agg({'InvoiceDate': [lambda date: (date.max() - date.min()).days,
                                                        lambda date: (today_date - date.min()).days],
                                        'Invoice': lambda num: num.nunique(),
                                        'TotalPrice': lambda TotalPrice: TotalPrice.sum()})

cltv_df.columns = cltv_df.columns.droplevel(0)
cltv_df.columns = ['recency', 'T', 'frequency', 'monetary']
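# Note: frequency here is the total number of invoices; the lifetimes docs define
# frequency as the number of repeat purchases (total purchases minus one)
cltv_df["frequency"] = cltv_df["frequency"].astype(int)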

In [None]:
# Convert monetary to the average spend per transaction
cltv_df["monetary"] = cltv_df["monetary"] / cltv_df["frequency"]

# Keep customers with a positive average spend
cltv_df = cltv_df[cltv_df["monetary"] > 0]
cltv_df.head()

In [None]:
# Express recency in weeks for the BG/NBD model
cltv_df["recency"] = cltv_df["recency"] / 7
# Express T (customer age) in weeks for the BG/NBD model
cltv_df["T"] = cltv_df["T"] / 7
# Keep customers with more than one purchase (frequency > 1)
cltv_df = cltv_df[(cltv_df['frequency'] > 1)]

In [None]:
!pip install lifetimes

In [None]:
##############################################################
# BG/NBD Model and Gamma-Gamma Model
##############################################################
from lifetimes import BetaGeoFitter
from lifetimes import GammaGammaFitter

bgf = BetaGeoFitter(penalizer_coef=0.001)

bgf.fit(cltv_df['frequency'],
        cltv_df['recency'],
        cltv_df['T'])

ggf = GammaGammaFitter(penalizer_coef=0.01)
ggf.fit(cltv_df['frequency'], cltv_df['monetary'])

In [None]:
cltv = ggf.customer_lifetime_value(bgf,
                                   cltv_df['frequency'],
                                   cltv_df['recency'],
                                   cltv_df['T'],
                                   cltv_df['monetary'],
                                   time=6,  # 6 months
                                   freq="W",  # T's frequency information.
                                   discount_rate=0.01)
cltv.head()

In [None]:
cltv = cltv.reset_index()
cltv.sort_values(by="clv", ascending=False).head(50)

cltv_final = cltv_df.merge(cltv, on="Customer ID", how="left")
cltv_final.sort_values(by="clv", ascending=False).head(10)

In [None]:
from sklearn.preprocessing import MinMaxScaler
# Min-max scale CLV to the [0, 1] range
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(cltv_final[["clv"]])
cltv_final["scaled_clv"] = scaler.transform(cltv_final[["clv"]])

# Sorting the scaled CLV results
cltv_final.sort_values(by="scaled_clv", ascending=False).head()

In [None]:
# Expected number of purchases over the next 6 months (4 weeks x 6 = 24 weeks, since T is in weeks)
cltv_final["expected_purc_6_month"] = bgf.predict(4*6, cltv_final['frequency'], cltv_final['recency'], cltv_final['T'])

# Expected average profit per transaction from the Gamma-Gamma model
cltv_final["expected_average_profit"] = ggf.conditional_expected_average_profit(cltv_final['frequency'],
                                       cltv_final['monetary'])

cltv_final.sort_values(by="expected_purc_6_month", ascending=False).head(20)

In [None]:
# Split customers into 4 segments by scaled CLV quartile (A = highest)
cltv_final["segment"] = pd.qcut(cltv_final["scaled_clv"], 4, labels=["D", "C", "B", "A"])
cltv_final.head()

cltv_final.sort_values(by="scaled_clv", ascending=False).head(50)

In [None]:
# Describing the segments:
cltv_final.groupby("segment").agg(
    ["count", "mean", "sum"])