The inputs to this forecast are:
1. A CSV file (newStartCustomerForecast.csv) containing a series of tuples, each pairing a start-month with a customer-count-forecast. In other words, it gives the forecast customer count for the first month of each cohort of customers in an upcoming period.

2. The monthly churn rates derived from actual customer retention counts in a past period. The test data supplied for both inputs covers a period of 12 months. The churn rates are read from Pod#UnitCustomerLTV.

The output is the forecast for each subsequent month of a cohort's existence, from the cohort's first month to the last month in the period. Each forecast is calculated as (1 + applicable churn rate for the month of existence) * (forecast customer count for the previous month). Churn rates are expressed as negative fractions, so a rate of -0.30 retains 70% of the previous month's customers. This logic is applied to every cohort.
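As a quick illustration of the formula, here is a minimal sketch for a single cohort. The helper name forecast_cohort and the example values are hypothetical and are not part of the notebook's library; the churn rates are the first few from the sample array used in this notebook.

```python
def forecast_cohort(initial_count, churn_rates, months_remaining):
    """Return the forecast for a cohort's first month plus each later month.

    Hypothetical helper for illustration only.
    """
    counts = [float(initial_count)]
    for month_of_existence in range(months_remaining):
        rate = churn_rates[month_of_existence]  # negative rate = customers lost
        counts.append((1 + rate) * counts[-1])  # formula from the text above
    return counts


example_rates = [-0.30, -0.14, -0.17, 0.0]
example_counts = forecast_cohort(100, example_rates, 4)
print([round(count, 3) for count in example_counts])
```

Note that the churn rate is selected by the cohort's month of existence, not by the calendar month, so every cohort starts again at the first rate.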

In [1]:
newStartCustomerForecastInputFile = "../csv/newStartCustomerForecast.csv" # input to start notebook operations

The second input to this notebook operation is the monthly churn rate array, read from Pod#UnitCustomerLTV, the source of truth for that data. It typically looks like this:
monthlyChurnRates =[-0.30, -0.14, -0.17, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

In [3]:
#read in the monthly churn rates from pod
unitCusLTVPodName= "UnitCustomerLTV";  # name of related pod

import sys; sys.path.insert(0, '../pythonLib')
from LexoOperations import  getPod, savePod

pod = getPod(unitCusLTVPodName)
customerActualsMonthlyChurnRate = pod['vars']['monthly-churn-rate']

#ensure each value is a float, since it is used in calculations and Python raises an error if it is type 'str'
mcr = []
for item in customerActualsMonthlyChurnRate:
    #set number of decimal points to 2 and make it a float from string
    if (isinstance(item,str)) :
        float_item = float('%.2f'%float(item))
        mcr.append(float_item) 
    else:
        mcr.append(item)
customerActualsMonthlyChurnRate = mcr
print(customerActualsMonthlyChurnRate)

[-0.3, -0.14, -0.17, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
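The forecast needs one churn rate per month-to-month transition, so a 12-month period needs 11 rates, which matches the array printed above. The following is a sketch of a combined coercion and length check, assuming the rates arrive as a list of strings or numbers; check_churn_rates is a hypothetical name and not part of LexoOperations.

```python
def check_churn_rates(churn_rates, period_months):
    """Coerce rates to floats and confirm there is one per month-to-month transition.

    Hypothetical helper for illustration only.
    """
    rates = [round(float(rate), 2) for rate in churn_rates]  # accepts str or numeric items
    transitions = period_months - 1  # a 12-month period has 11 transitions
    if len(rates) < transitions:
        raise ValueError(f"need {transitions} churn rates, got {len(rates)}")
    return rates


sample_rates = check_churn_rates(["-0.30", "-0.14", -0.17] + [0.0] * 8, period_months=12)
print(sample_rates)
```

Failing early here may be easier to diagnose than an index error raised later inside the forecast loop.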


Read the new starting customer forecast CSV, normalize the data, and convert it into two companion arrays: startMonthsForForecast and initialCusCountForecast.

In [4]:
import csv
def readForecastData(filePath) :

    with open(filePath, 'r') as csvfile:
        datareader = csv.reader(csvfile);
        head = next(datareader); # skip header line

        startMonthsForForecast = [];
        initialCusCountForecast = [];
        for col0, col1 in datareader:
            startMonthsForForecast.append(col0);
            initialCusCountForecast.append(col1);
    return startMonthsForForecast,initialCusCountForecast


In [5]:
startMonthsForForecast,initialCusCountForecast = readForecastData(newStartCustomerForecastInputFile);
print(startMonthsForForecast,initialCusCountForecast)

['2021-01-01', '2021-02-01', '2021-03-01', '2021-04-01', '2021-05-01', '2021-06-01', '2021-07-01', '2021-08-01', '2021-09-01', '2021-10-01', '2021-11-01', '2021-12-01'] ['100', '110', '120', '130', '140', '140', '140', '140', '140', '140', '140', '140']
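The counts printed above are strings, because csv.reader returns every field as text. A possible variant, sketched below, converts the counts to floats while reading. The header names in sample_csv and the function name read_forecast_rows are assumptions made for illustration; the real header of newStartCustomerForecast.csv is not shown in this notebook.

```python
import csv
import io

# Hypothetical in-memory sample; the header names are assumed.
sample_csv = "start-month,cus-count\n2021-01-01,100\n2021-02-01,110\n"


def read_forecast_rows(handle):
    """Read (start-month, count) rows from an open file-like object."""
    reader = csv.reader(handle)
    next(reader)  # skip header line
    start_months, counts = [], []
    for row in reader:
        if not row:  # tolerate blank lines
            continue
        start_months.append(row[0].strip())
        counts.append(float(row[1]))  # numeric from the start
    return start_months, counts


months, counts = read_forecast_rows(io.StringIO(sample_csv))
print(months, counts)
```

Keeping the counts numeric means later DataFrame operations such as rounding treat the whole cus-count column uniformly.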


Input to this notebook operation: an array of cohort begin dates and a matching companion array that provides the initial forecast customer count for each cohort.

In [111]:
#This is to illustrate how the arrays are set up
#startMonthsForForecast=["2021-01-01","2021-02-01","2021-03-01","2021-04-01","2021-05-01","2021-06-01",
#                        "2021-07-01","2021-08-01","2021-09-01","2021-10-01","2021-11-01","2021-12-01"];
#initialCusCountForecast=[100,110,120,130,140,140,140,140,140,140,140,140]

The output of this notebook operation is forecast data used to populate an existing CustomerForecast table (empty or not).

The basic task is to generate the data to populate the CustomerForecast bean. The steps are:

    1. Normalize the inception data provided as input to the notebook.
    
    2. Calculate the monthly forecast for each month of each cohort's existence. The output is normalized (i.e. a table with columns start-month, active-month, and forecast customer count); the monthly churn rate data is used in this process.
    
    3. Combine the two tables produced by steps 1 and 2.
    
    4. Sort the combined normalized data by start-month, then active-month.
    
    5. Prepare the data for populating the CustomerForecast bean: the normalized data is pivoted so that start-month is the column axis and active-month is the row axis, with a value for each intersection of start-month and active-month.
    
    6. The data generated is an array of items, each consisting of the symbolic reference to a cell intersection and the calculated customer forecast (e.g. {'symRef': '2021-01-01?2021-01-01', 'data': 100.0}); some items include a formula instead of calculated data when aggregation is involved.

    7. The CustomerForecast pod is assumed to already exist in the repository.
    
    8. The pod is fetched, and the data generated in step 6 is used to find the cells of the pod and update them.
    
    9. The CustomerForecast pod is then saved to the repository.
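Steps 5 and 6 can be sketched roughly as follows. This is not the implementation of prepareDataForPopulatingActualsOrForecastPod, whose source is not shown here. The function names are hypothetical, the order of the two months inside symRef is an assumption (the sample item has the same date in both positions), and formula items for aggregated cells are left out.

```python
def pivot_rows(rows):
    """Pivot normalized rows into {active-month: {start-month: cus-count}}."""
    table = {}
    for row in rows:
        by_start = table.setdefault(row["active-month"], {})
        by_start[row["start-month"]] = float(row["cus-count"])
    return table


def to_sym_ref_items(table, separator="?"):
    """Emit one item per populated cell; key order in symRef is assumed."""
    items = []
    for active_month in sorted(table):
        for start_month in sorted(table[active_month]):
            sym_ref = f"{start_month}{separator}{active_month}"
            items.append({"symRef": sym_ref, "data": table[active_month][start_month]})
    return items


sample_rows = [
    {"start-month": "2021-01-01", "active-month": "2021-01-01", "cus-count": "100"},
    {"start-month": "2021-01-01", "active-month": "2021-02-01", "cus-count": 70.0},
    {"start-month": "2021-02-01", "active-month": "2021-02-01", "cus-count": "110"},
]
sample_items = to_sym_ref_items(pivot_rows(sample_rows))
for item in sample_items:
    print(item)
```

Cells where the active-month precedes the start-month have no row in the normalized data, so no item is produced for them.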

In [6]:
cusForecastPodName = "CustomerForecast";
import sys; sys.path.insert(0, '../pythonLib')
from customerCohortOperations import prepareDataForPopulatingActualsOrForecastPod, updatePodObj


In [7]:
#Normalize the input data; the start-month also becomes the active-month, so we get a table with one row per cohort
# and columns start-month, active-month, cus-count, held as a list of JSON-style dicts
normalizedForecastDataAtInception =[]

def normalizeForecastData (startMonthsForForecast, initialCusCountForecast) :
    normalizedCustomerForecastData=[] # ret Value
    index = 0;
    for startMonth in startMonthsForForecast :
        activeMonth = startMonth; #startMonth also becomes the activeMonth
        cusCount = initialCusCountForecast[index]
        index = index+1;
        row = {"start-month":startMonth, "active-month":activeMonth, "cus-count":cusCount}
        normalizedCustomerForecastData.append(row);
    return   normalizedCustomerForecastData;

normalizedForecastDataAtInception = normalizeForecastData(startMonthsForForecast, initialCusCountForecast);

In [8]:
#Uncomment for debugging
#print(normalizedForecastDataAtInception)

In [9]:
from IPython.display import JSON
import requests
import pandas as pd
import  json
import numpy as np
from datetime import datetime
from dateutil.relativedelta import relativedelta

In [10]:
def getCusCountFromDataFrameGivenActiveMonthAndStartMonth(normalizedDF, activeMonth, startMonth,):
    df = normalizedDF;
    cusCount = float("NAN");
    #print("inside getCusCountFromDataFrameGivenActiveMonthAndStartMonth:", activeMonth, startMonth)
    selectedRow = df.loc[(df['active-month'] == activeMonth) & (df['start-month'] == startMonth)]  # returns a dataframe with that one row
    if selectedRow.empty :
        pass
    else:
        cusCount =  selectedRow.iloc[0,2]  #1st index is row selector, 2nd index is col selector (cus-count is at index 2)
    return cusCount

In [11]:
def calculateMonthlyForecast (priorYearchurnRates, normalizedForecastAtInceptionDF,  activeMonths, startMonths ):
    #returns normalized forecast info 
    firstPeriodDate = activeMonths[0];
    lastPeriodDate = activeMonths[-1];
    normalizedForcastInfo = [];
    cohortId = -1;
    #print(normalizedDF)
    
    for startMonth in startMonths:

        #When startMonth is the last period date, its forecast was already supplied as input and
        # there are no subsequent months to forecast; any startMonth after the last period date
        # must be bypassed as well
        if (startMonth >= str(lastPeriodDate)):
            continue;  # bypass
        prevPeriodCusCount = None;
        #assumed that forecast for initial month  is the only data available for each cohort;
        # subsequent month forecast values are calculated based on monthly churn rate provided
        forecastCusCount = getCusCountFromDataFrameGivenActiveMonthAndStartMonth(normalizedForecastAtInceptionDF,startMonth,startMonth);
        #print(startMonth, forecastCusCount)
        if pd.isna(forecastCusCount):
            continue
        else:
            prevPeriodCusCount = forecastCusCount;

        periodIndex=0;
        for activeMonth in activeMonths:
                               
            if (activeMonth <= startMonth):
                continue;
            
            periodIndex = periodIndex + 1;
           
            applicableChurnRate = priorYearchurnRates[periodIndex-1];
            #print('startMonth:', startMonth, 'activeMonth:', activeMonth, 'prevPeriodCusCount: ',prevPeriodCusCount, ' applicableChurnRate:',applicableChurnRate)
            #
            forecastCusCount = (1+applicableChurnRate)* float(prevPeriodCusCount);
            forecastInfoForPeriod = {"start-month": startMonth, "active-month":activeMonth,"cus-count":forecastCusCount}
           
            #print(forecastInfoForPeriod)
            prevPeriodCusCount = forecastCusCount;
            normalizedForcastInfo.append(forecastInfoForPeriod);
            

    return normalizedForcastInfo;
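To make the row count of the final table easier to check, here is a pandas-free sketch of the same calculation that also keeps the inception rows. It assumes the active months are the same list as the start months, which holds for the inception data used in this notebook, and that dates are ISO strings so that string comparison follows date order. The name forecast_all_cohorts is hypothetical.

```python
def forecast_all_cohorts(start_months, initial_counts, churn_rates):
    """Return normalized rows (start-month, active-month, cus-count) for all cohorts."""
    rows = []
    for start_month, initial in zip(start_months, initial_counts):
        count = float(initial)
        # inception row: the start-month is also the active-month
        rows.append({"start-month": start_month, "active-month": start_month, "cus-count": count})
        later_months = [month for month in start_months if month > start_month]
        for months_elapsed, active_month in enumerate(later_months):
            # the rate index restarts at 0 for every cohort
            count = (1 + churn_rates[months_elapsed]) * count
            rows.append({"start-month": start_month, "active-month": active_month, "cus-count": count})
    return rows


all_months = [f"2021-{month:02d}-01" for month in range(1, 13)]
all_counts = [100, 110, 120, 130] + [140] * 8
all_rates = [-0.30, -0.14, -0.17] + [0.0] * 8
all_rows = forecast_all_cohorts(all_months, all_counts, all_rates)
print(len(all_rows))  # 12 + 11 + ... + 1 = 78 rows
```

With 12 cohorts over 12 months, the first cohort contributes 12 rows and the last contributes 1, for 78 rows in total.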


In [119]:
def extractStartMonthsAndActiveMonths(normalizedInitialForecastCustomerCountJsonData):
    startMonths=[];
    activeMonths=[];
    for customerForecastRecord in normalizedInitialForecastCustomerCountJsonData:
        startMonth = customerForecastRecord["start-month"]
        activeMonth = customerForecastRecord["active-month"]
        cusCount=customerForecastRecord["cus-count"]
        if startMonth not in startMonths:
            startMonths.append(startMonth)
        if activeMonth not in activeMonths:
            activeMonths.append(activeMonth)
                
    retVal = { "startMonths":startMonths, "activeMonths":activeMonths}     
    return retVal;

In [120]:
retVal = extractStartMonthsAndActiveMonths(normalizedForecastDataAtInception)
startMonths =retVal["startMonths"]
activeMonths = retVal["activeMonths"]

In [121]:
normalizedForecastDataAtInceptionDF = pd.DataFrame(normalizedForecastDataAtInception)
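#pass activeMonths before startMonths to match the signature of calculateMonthlyForecast (both lists hold the same months at inception)
normalizedForecastData = calculateMonthlyForecast(customerActualsMonthlyChurnRate, normalizedForecastDataAtInceptionDF, activeMonths, startMonths)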

In [122]:
for entry in normalizedForecastDataAtInception:
    normalizedForecastData.append(entry)
normalizedForecastDataDF = pd.DataFrame(normalizedForecastData) ;

normalizedForecastDataDF = normalizedForecastDataDF.sort_values(by=['start-month','active-month'])
normalizedForecastDataDF = normalizedForecastDataDF.reset_index(drop=True)
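#note: the inception rows carry cus-count as strings read from the csv, so the column has mixed types and round(2) leaves it unrounded (see 49.966 in the output)
normalizedForecastDataDF = normalizedForecastDataDF.round(2)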
print(normalizedForecastDataDF)

   start-month active-month cus-count
0   2021-01-01   2021-01-01       100
1   2021-01-01   2021-02-01      70.0
2   2021-01-01   2021-03-01      60.2
3   2021-01-01   2021-04-01    49.966
4   2021-01-01   2021-05-01    49.966
..         ...          ...       ...
73  2021-10-01   2021-11-01      98.0
74  2021-10-01   2021-12-01     84.28
75  2021-11-01   2021-11-01       140
76  2021-11-01   2021-12-01      98.0
77  2021-12-01   2021-12-01       140

[78 rows x 3 columns]


In [123]:
podDataForForecast = prepareDataForPopulatingActualsOrForecastPod(normalizedForecastDataDF);
#print ("printing podData")
#for item in podDataForForecast :
    #print(item)
    
##sample item: {'symRef': '2021-01-01?2021-01-01', 'data': 100.0}    

In [124]:
forecastPod = getPod(cusForecastPodName)


In [125]:
updatePodObj(forecastPod,podDataForForecast)
#JSON(forecastPod)

In [126]:
savedPod = savePod(forecastPod)