# Predicting Customer Lifetime Value

Customer lifetime value (CLV) is a prediction of the net profit attributed to the entire future relationship with a customer. [SOURCE](https://en.wikipedia.org/wiki/Customer_lifetime_value)

In the following use case, CLV is calculated by first obtaining the RFTM parameters (recency R, frequency F, age T and monetary value M) for each unique consumer and merchant pair.
These parameters are computed from the transaction dataset using the following formulas:

__Recency (difference in days between first and last transaction date):__
> R(t) = last_trx_date - first_trx_date

__Frequency (number of repeat transaction days of customer _i_ up to time _t_):__
> Xi(t) = nb_trx_days_i - 1  
>> nb_trx_days_i is the number of days on which customer i made at least one transaction, up to time t. 1 is subtracted so that only repeat transactions are counted, since only repeat customers should be considered when training the probabilistic model.

__Age (age of a customer at a specified time _t_ is the difference in days between t and his first transaction date):__
> T(t) = t - first_trx_date

__Monetary value (value of a customer _i_ at time _t_ is equal to their total spending up to that time divided by the number of days on which the customer made at least one transaction):__
> M = total_spending(t) / (Xi(t)+1)
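
The four formulas above can be sketched on a toy transaction log. The data, the observation time `t` and the helper name `rftm` are made up for illustration, and only the standard library is used; this is not the query that produced `query-select-data.csv`.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy log: (consumer_id, transaction_date, amount_eur).
transactions = [
    (1, date(2019, 12, 5), 120.0),
    (1, date(2019, 12, 5), 30.0),   # same day as above: still one transaction day
    (1, date(2019, 12, 20), 50.0),
    (2, date(2020, 1, 10), 80.0),
]
t = date(2020, 1, 31)  # assumed observation time t


def rftm(transactions, t):
    """Return {consumer_id: {recency, frequency, age_days, monetary_value}}."""
    trx_days = defaultdict(set)
    spending = defaultdict(float)
    for consumer_id, trx_date, amount in transactions:
        if trx_date <= t:  # only transactions up to time t
            trx_days[consumer_id].add(trx_date)
            spending[consumer_id] += amount

    features = {}
    for consumer_id, days in trx_days.items():
        first, last = min(days), max(days)
        frequency = len(days) - 1  # Xi(t) = nb_trx_days_i - 1
        features[consumer_id] = {
            "recency": (last - first).days,    # R(t)
            "frequency": frequency,            # Xi(t)
            "age_days": (t - first).days,      # T(t)
            "monetary_value": spending[consumer_id] / (frequency + 1),  # M
        }
    return features


features = rftm(transactions, t)
for consumer_id, row in features.items():
    print(consumer_id, row)
```

Consumer 2 has a single transaction day, so their frequency is 0 and they would be excluded from the repeat-customer subset used further down.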

In [1]:
import pandas as pd
import numpy as np
import lifetimes 
import warnings
warnings.filterwarnings('ignore')

In [2]:
inputDF=pd.read_csv("query-select-data.csv", sep=';', delimiter=None, header='infer')
inputDF.head()

Unnamed: 0,consumer_id,merchant_id,log_date,first_trx_date,last_trx_date,total_spending_eur,beginning_period,end_period,recency,frequency,age_days,monetary_value
0,1,1,2020-01-15 13:52:04.841000000,05.12.19 23:00,05.12.19 23:00,200.0,31.12.05 23:00,31.01.20 23:00,2,1,57,100.0
1,2,1,2020-01-09 14:47:48.845000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0
2,2,1,2020-01-14 07:59:40.051000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0
3,2,1,2020-01-14 16:47:22.672000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0
4,2,1,2020-01-14 16:53:53.176000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0


In [3]:
inputDF.columns.to_list()

['consumer_id',
 'merchant_id',
 'log_date',
 'first_trx_date',
 'last_trx_date',
 'total_spending_eur',
 'beginning_period',
 'end_period',
 'recency',
 'frequency',
 'age_days',
 'monetary_value']

In [4]:
def get_variables():
    """Return the RFTM columns of inputDF as NumPy arrays, plus a repeat-customer subset."""
    frequency = np.asarray(inputDF['frequency']).astype(float)
    monetary_value = np.asarray(inputDF['monetary_value']).astype(float)
    age_days = np.asarray(inputDF['age_days'])
    recency = np.asarray(inputDF['recency'])
    
    # Repeat customers only (frequency > 0); the Gamma-Gamma model is fitted on these rows.
    repeat_cust = inputDF[inputDF['frequency'] > 0]
    frequency_rep = np.asarray(repeat_cust['frequency']).astype(float)
    monetary_value_rep = np.asarray(repeat_cust['monetary_value']).astype(float)
    return {'frequency': frequency,
            'monetary_value': monetary_value,
            'age_days': age_days,
            'recency': recency,
            'frequency_rep': frequency_rep,
            'monetary_value_rep': monetary_value_rep, 
            'repeat_cust': repeat_cust
           }

In [5]:
###Checking
print(get_variables()['monetary_value'])

[100.    250.    250.    ... 333.333 250.    250.   ]


In [6]:
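def train_regularized_ggf(x=None):
    # Build the inputs at call time; a default of get_variables() would be
    # evaluated only once, when the function is defined.
    if x is None:
        x = get_variables()
    # Try increasingly strong penalizers: 1e-10, 1e-9, ..., 1e-1.
    for k in range(0, 10):
        c = (10**k)*1.e-10
        try:

            ggf = lifetimes.GammaGammaFitter(penalizer_coef=c)
            ggf.fit(x['frequency_rep'], x['monetary_value_rep'])
            p,q,v = ggf._unload_params('p', 'q', 'v')
            if q>0:
                print('Model trained with penalizer coeficient:',c)
                return ggf

        except lifetimes.utils.ConvergenceError:
            continue

    raise Exception('GGF model could not be trained.')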
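
The retry loop above escalates the penalizer by a factor of ten until a fit succeeds. A library-free sketch of that pattern follows; `fake_fit` and `first_successful_fit` are hypothetical stand-ins (not part of `lifetimes`), and the failure threshold is invented purely to exercise the loop.

```python
# Penalizer schedule used by the loop: 1e-10, 1e-9, ..., 1e-1.
penalizers = [(10 ** k) * 1.0e-10 for k in range(0, 10)]


def fake_fit(penalizer):
    """Hypothetical stand-in for a model fit that fails for weak penalizers."""
    if penalizer < 5e-7:
        raise ValueError("did not converge")
    return {"penalizer_coef": penalizer}


def first_successful_fit(fit, penalizers):
    """Return (penalizer, model) for the first penalizer whose fit succeeds."""
    for c in penalizers:
        try:
            return c, fit(c)
        except ValueError:
            continue  # try the next, stronger penalizer
    raise RuntimeError("model could not be trained with any penalizer")


chosen, model = first_successful_fit(fake_fit, penalizers)
print("first penalizer that worked:", chosen)
```

Starting from the weakest penalizer keeps the regularization as small as the optimizer will tolerate.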

In [7]:
def ggf_fit():
    x=get_variables()
    try:
        # First attempt: no regularization.
        ggf = lifetimes.GammaGammaFitter(penalizer_coef=0)
        ggf.fit(x['frequency_rep'], x['monetary_value_rep'])
    except lifetimes.utils.ConvergenceError:
        # Fall back to the penalized fit, reusing the variables already built.
        ggf = train_regularized_ggf(x=x)
    return ggf


In [8]:
print(ggf_fit())

      fun: 5.3499040142123535
 hess_inv: array([[ 1.52231460e+00, -2.56034465e+01, -2.71529338e+01],
       [-2.56034464e+01,  2.96401690e+07,  2.96401959e+07],
       [-2.71529338e+01,  2.96401959e+07,  2.96402245e+07]])
      jac: array([ 3.83068064e-05, -3.93306083e-05,  3.92920285e-05])
  message: 'Desired error not necessarily achieved due to precision loss.'
     nfev: 67
      nit: 41
     njev: 58
   status: 2
  success: False
        x: array([ 3.10820261, 20.03651471, 22.5872538 ])
Model trained with penalizer coeficient: 1e-10
<lifetimes.GammaGammaFitter: fitted with 20912 subjects, p: 24.03, q: 381.37, v: 4523.34>


In [9]:
#ggf.save_model('ggf.pkl')

In [10]:
#ggf_loaded = lifetimes.GammaGammaFitter()
#ggf_loaded.load_model('ggf.pkl')
#ggf_loaded

In [11]:
def bgf_fit():
    x = get_variables()
    bgf = lifetimes.BetaGeoFitter(penalizer_coef=0.1)
    try:
        bgf.fit(x['frequency'], x['recency'],x['age_days'],maxiter=10000,tol=1e-6, verbose=True)
    except lifetimes.utils.ConvergenceError:
        print('ConvergenceError')
    return bgf


In [12]:
print(bgf_fit())

Optimization terminated successfully.
         Current function value: -0.191568
         Iterations: 13
         Function evaluations: 14
         Gradient evaluations: 14
<lifetimes.BetaGeoFitter: fitted with 20912 subjects, a: 0.79, alpha: 6.50, b: 0.29, r: 1.17>
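
To read the fitted parameters: in the BG/NBD model the purchase rate of an active customer is Gamma(r, alpha) distributed across customers, and the dropout probability after each purchase is Beta(a, b) distributed. The sketch below computes the two population means from the rounded values printed above. It assumes recency and age are measured in days, so the rate is per day.

```python
# Rounded parameters copied from the fitted model's printed summary.
r, alpha, a, b = 1.17, 6.50, 0.79, 0.29

# Mean of Gamma(r, alpha): average purchase rate while a customer is active
# (transaction days per day, assuming daily time units).
mean_purchase_rate = r / alpha

# Mean of Beta(a, b): average probability of dropping out after a purchase.
mean_dropout_prob = a / (a + b)

print(f"mean purchase rate: {mean_purchase_rate:.2f}")
print(f"mean dropout probability: {mean_dropout_prob:.2f}")
```

A high average dropout probability is consistent with the small `p_alive` values computed further down.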


In [13]:
bgf=bgf_fit()
r, alpha, a, b = bgf._unload_params('r', 'alpha', 'a', 'b')

Optimization terminated successfully.
         Current function value: -0.191568
         Iterations: 13
         Function evaluations: 14
         Gradient evaluations: 14


In [14]:
bgf.save_model('bgf.pkl')

In [15]:
bgf_loaded = lifetimes.BetaGeoFitter()
bgf_loaded.load_model('bgf.pkl')
bgf_loaded

<lifetimes.BetaGeoFitter: fitted with 20912 subjects, a: 0.79, alpha: 6.50, b: 0.29, r: 1.17>

In [16]:
def get_p_alive():
    x = get_variables()
    bgf = bgf_fit()
    p_alive = bgf.conditional_probability_alive(x['frequency'], x['recency'],x['age_days'])
    return p_alive
print(get_p_alive())

Optimization terminated successfully.
         Current function value: -0.191568
         Iterations: 13
         Function evaluations: 14
         Gradient evaluations: 14
[0.00461195 0.00461195 0.00461195 ... 0.04298458 0.00886281 0.00886281]
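
The first value can be approximated by hand from the BG/NBD closed form for the probability of being alive. The function below is a sketch of that formula, assumed here to match what `conditional_probability_alive` computes. It uses the rounded parameters printed earlier, so the result differs slightly from the notebook output.

```python
def p_alive(frequency, recency, age, r, alpha, a, b):
    """BG/NBD probability that a customer is still active at time `age`."""
    if frequency == 0:
        # Convention assumed here: with no repeat purchase there is no
        # evidence of dropout, so the customer is treated as alive.
        return 1.0
    odds_inactive = (a / (b + frequency - 1)) * (
        (alpha + age) / (alpha + recency)
    ) ** (r + frequency)
    return 1.0 / (1.0 + odds_inactive)


# Rounded parameters from the fitted BetaGeoFitter summary.
r, alpha, a, b = 1.17, 6.50, 0.79, 0.29

# First row of inputDF: frequency=1, recency=2, age_days=57.
first_row = p_alive(1, 2, 57, r, alpha, a, b)
print(first_row)
```

The result is close to the 0.004612 reported for the first row. A customer whose last purchase was 2 days into a 57-day history is very likely to have dropped out.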


In [17]:
help(lifetimes.GammaGammaFitter.conditional_expected_average_profit)

Help on function conditional_expected_average_profit in module lifetimes.fitters.gamma_gamma_fitter:

conditional_expected_average_profit(self, frequency=None, monetary_value=None)
    Conditional expectation of the average profit.
    
    This method computes the conditional expectation of the average profit
    per transaction for a group of one or more customers.
    
    Parameters
    ----------
    frequency: array_like, optional
        a vector containing the customers' frequencies.
        Defaults to the whole set of frequencies used for fitting the model.
    monetary_value: array_like, optional
        a vector containing the customers' monetary values.
        Defaults to the whole set of monetary values used for
        fitting the model.
    
    Returns
    -------
    array_like:
        The conditional expectation of the average profit per transaction



In [18]:
inputDF['p_alive'] = get_p_alive()
inputDF.head()

Optimization terminated successfully.
         Current function value: -0.191568
         Iterations: 13
         Function evaluations: 14
         Gradient evaluations: 14


Unnamed: 0,consumer_id,merchant_id,log_date,first_trx_date,last_trx_date,total_spending_eur,beginning_period,end_period,recency,frequency,age_days,monetary_value,p_alive
0,1,1,2020-01-15 13:52:04.841000000,05.12.19 23:00,05.12.19 23:00,200.0,31.12.05 23:00,31.01.20 23:00,2,1,57,100.0,0.004612
1,2,1,2020-01-09 14:47:48.845000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0,0.004612
2,2,1,2020-01-14 07:59:40.051000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0,0.004612
3,2,1,2020-01-14 16:47:22.672000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0,0.004612
4,2,1,2020-01-14 16:53:53.176000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0,0.004612


In [20]:
def get_p_frequency():
    x = get_variables()
    bgf = bgf_fit()
    p_frequency = bgf.conditional_expected_number_of_purchases_up_to_time(365, x['frequency'], x['recency'],x['age_days'])
    return p_frequency

In [21]:
inputDF['p_frequency'] = get_p_frequency()
inputDF.head()

Optimization terminated successfully.
         Current function value: -0.191568
         Iterations: 13
         Function evaluations: 14
         Gradient evaluations: 14


Unnamed: 0,consumer_id,merchant_id,log_date,first_trx_date,last_trx_date,total_spending_eur,beginning_period,end_period,recency,frequency,age_days,monetary_value,p_alive,p_frequency
0,1,1,2020-01-15 13:52:04.841000000,05.12.19 23:00,05.12.19 23:00,200.0,31.12.05 23:00,31.01.20 23:00,2,1,57,100.0,0.004612,0.017941
1,2,1,2020-01-09 14:47:48.845000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0,0.004612,0.017941
2,2,1,2020-01-14 07:59:40.051000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0,0.004612,0.017941
3,2,1,2020-01-14 16:47:22.672000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0,0.004612,0.017941
4,2,1,2020-01-14 16:53:53.176000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0,0.004612,0.017941


In [22]:
###Q: Do you fit the model behind p_monetary (below) on the repeat-customer frequency and monetary value, or on the complete data?

In [23]:
def get_p_monetary():
    x = get_variables()
    ggf=ggf_fit()
    p_monetary= ggf.conditional_expected_average_profit(x['frequency'], x['monetary_value'])
    return p_monetary

In [24]:
inputDF['p_monetary'] = get_p_monetary()
inputDF.head()

      fun: 5.3499040142123535
 hess_inv: array([[ 1.52231460e+00, -2.56034465e+01, -2.71529338e+01],
       [-2.56034464e+01,  2.96401690e+07,  2.96401959e+07],
       [-2.71529338e+01,  2.96401959e+07,  2.96402245e+07]])
      jac: array([ 3.83068064e-05, -3.93306083e-05,  3.92920285e-05])
  message: 'Desired error not necessarily achieved due to precision loss.'
     nfev: 67
      nit: 41
     njev: 58
   status: 2
  success: False
        x: array([ 3.10820261, 20.03651471, 22.5872538 ])
Model trained with penalizer coeficient: 1e-10


Unnamed: 0,consumer_id,merchant_id,log_date,first_trx_date,last_trx_date,total_spending_eur,beginning_period,end_period,recency,frequency,age_days,monetary_value,p_alive,p_frequency,p_monetary
0,1,1,2020-01-15 13:52:04.841000000,05.12.19 23:00,05.12.19 23:00,200.0,31.12.05 23:00,31.01.20 23:00,2,1,57,100.0,0.004612,0.017941,274.689675
1,2,1,2020-01-09 14:47:48.845000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0,0.004612,0.017941,283.601721
2,2,1,2020-01-14 07:59:40.051000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0,0.004612,0.017941,283.601721
3,2,1,2020-01-14 16:47:22.672000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0,0.004612,0.017941,283.601721
4,2,1,2020-01-14 16:53:53.176000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0,0.004612,0.017941,283.601721
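
The `p_monetary` column can be approximated by hand. In the Gamma-Gamma model the conditional expected average transaction value is a weighted average of the population mean and the customer's own observed average. The sketch below assumes this is what `conditional_expected_average_profit` evaluates. It uses the rounded p, q, v printed earlier, so it will not match the table to the last digit.

```python
def expected_average_profit(frequency, monetary_value, p, q, v):
    """Gamma-Gamma conditional expectation of the average transaction value."""
    population_mean = v * p / (q - 1)
    # Weight on the customer's own average; grows with the number of purchases.
    individual_weight = p * frequency / (p * frequency + q - 1)
    return (1 - individual_weight) * population_mean + individual_weight * monetary_value


# Rounded parameters from the fitted GammaGammaFitter summary.
p, q, v = 24.03, 381.37, 4523.34

low = expected_average_profit(1, 100.0, p, q, v)   # first row: monetary_value = 100
high = expected_average_profit(1, 250.0, p, q, v)  # second row: monetary_value = 250
print(round(low, 2), round(high, 2))
```

Both values land close to the 274.69 and 283.60 in the table. With `frequency = 0` the individual weight is zero, so the estimate falls back to the population mean. This is how the model, fitted on repeat customers only, still produces a value for every row.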


In [25]:
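def get_CLV():
    """Predicted CLV per row: P(alive) x expected purchases in 365 days x expected average value."""
    p_alive=get_p_alive()
    p_frequency=get_p_frequency()
    p_monetary=get_p_monetary()
    # Note: p_frequency is already conditional on each customer's history and so
    # reflects the chance of churn; the extra p_alive factor discounts it again.
    predictedCLV = p_alive * p_frequency* p_monetary
    return predictedCLV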

#inputDF['predicted_monetery'] = get_CLV()
#inputDF.apply(lambda x: ggf.conditional_expected_average_profit(x.frequency,x.monetary_value), axis=1)

In [26]:
inputDF['predictedCLV'] = get_CLV()
inputDF.head()

Optimization terminated successfully.
         Current function value: -0.191568
         Iterations: 13
         Function evaluations: 14
         Gradient evaluations: 14
Optimization terminated successfully.
         Current function value: -0.191568
         Iterations: 13
         Function evaluations: 14
         Gradient evaluations: 14
      fun: 5.3499040142123535
 hess_inv: array([[ 1.52231460e+00, -2.56034465e+01, -2.71529338e+01],
       [-2.56034464e+01,  2.96401690e+07,  2.96401959e+07],
       [-2.71529338e+01,  2.96401959e+07,  2.96402245e+07]])
      jac: array([ 3.83068064e-05, -3.93306083e-05,  3.92920285e-05])
  message: 'Desired error not necessarily achieved due to precision loss.'
     nfev: 67
      nit: 41
     njev: 58
   status: 2
  success: False
        x: array([ 3.10820261, 20.03651471, 22.5872538 ])
Model trained with penalizer coeficient: 1e-10


Unnamed: 0,consumer_id,merchant_id,log_date,first_trx_date,last_trx_date,total_spending_eur,beginning_period,end_period,recency,frequency,age_days,monetary_value,p_alive,p_frequency,p_monetary,predictedCLV
0,1,1,2020-01-15 13:52:04.841000000,05.12.19 23:00,05.12.19 23:00,200.0,31.12.05 23:00,31.01.20 23:00,2,1,57,100.0,0.004612,0.017941,274.689675,0.022729
1,2,1,2020-01-09 14:47:48.845000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0,0.004612,0.017941,283.601721,0.023467
2,2,1,2020-01-14 07:59:40.051000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0,0.004612,0.017941,283.601721,0.023467
3,2,1,2020-01-14 16:47:22.672000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0,0.004612,0.017941,283.601721,0.023467
4,2,1,2020-01-14 16:53:53.176000000,05.12.19 23:00,05.12.19 23:00,500.0,31.12.05 23:00,31.01.20 23:00,2,1,57,250.0,0.004612,0.017941,283.601721,0.023467
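
As a check on the arithmetic, the `predictedCLV` of the first two rows can be recomputed from the three displayed columns. The inputs are the rounded values shown in the table, so agreement is only expected to about six decimal places.

```python
# Displayed values for the first two rows of inputDF.
rows = [
    {"p_alive": 0.004612, "p_frequency": 0.017941, "p_monetary": 274.689675},
    {"p_alive": 0.004612, "p_frequency": 0.017941, "p_monetary": 283.601721},
]

# predictedCLV = p_alive * p_frequency * p_monetary, as in get_CLV().
clv = [row["p_alive"] * row["p_frequency"] * row["p_monetary"] for row in rows]
for value in clv:
    print(round(value, 6))
```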
