# Markov Chains with sales data.

### Load data.

In [1]:
data = read.delim(file = 'purchases.txt', header = FALSE, sep = '\t', dec = '.')

### Add headers, parse the last column as a date, and derive the year of purchase and the days since purchase.

In [2]:
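# Name the three columns of the tab-separated file.
colnames(data) = c('customer_id', 'purchase_amount', 'date_of_purchase')
# Parse the purchase date and extract the calendar year.
data$date_of_purchase = as.Date(data$date_of_purchase, "%Y-%m-%d")
data$year_of_purchase = as.numeric(format(data$date_of_purchase, "%Y"))
# Days elapsed between each purchase and the reference date 2016-01-01.
data$days_since       = as.numeric(difftime(time1 = "2016-01-01",
                                            time2 = data$date_of_purchase,
                                            units = "days"))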

In [8]:
head(data)

customer_id,purchase_amount,date_of_purchase,year_of_purchase,days_since
760,25,2009-11-06,2009,2247.3333
860,50,2012-09-28,2012,1190.3333
1200,100,2005-10-25,2005,3720.3333
1420,50,2009-07-09,2009,2367.3333
1940,70,2013-01-25,2013,1071.3333
1960,40,2013-10-29,2013,794.3333
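
The fractional `.3333` in `days_since` most likely comes from `difftime` converting the character string `"2016-01-01"` to a date-time in the local time zone, while the `Date` column is treated as midnight UTC; an 8-hour offset is one third of a day. A minimal sketch, assuming whole-day differences are wanted, passes a `Date` on both sides. The names `purchase_dates`, `reference_date`, and `days_since_whole` are illustrative and not part of the notebook.

```r
# Three purchase dates copied from the head(data) output.
purchase_dates <- as.Date(c("2009-11-06", "2012-09-28", "2005-10-25"))

# Assumption: the reference date is given as a Date, not a character string,
# so both sides of the difference are anchored to the same time zone.
reference_date <- as.Date("2016-01-01")

days_since_whole <- as.numeric(difftime(time1 = reference_date,
                                        time2 = purchase_dates,
                                        units = "days"))
print(days_since_whole)  # 2247 1190 3720
```

Thresholds such as `365*3` are far from these values, so the fractional offset should rarely change a segment, but whole days are easier to reason about.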


### KPIs for customers in 2015.

In [4]:
library(dplyr)

customers_2015 <- data %>%
  group_by(customer_id) %>%
  summarize(
    recency        = min(days_since),       # days since the most recent purchase
    first_purchase = max(days_since),       # days since the first purchase
    frequency      = n(),                   # number of purchases
    amount         = mean(purchase_amount)  # average purchase amount
  )

head(customers_2015)



customer_id,recency,first_purchase,frequency,amount
10,3829.3333,3829.333,1,30.0
80,343.3333,3751.333,7,71.42857
90,758.3333,3783.333,10,115.8
120,1401.3333,1401.333,1,20.0
130,2970.3333,3710.333,2,50.0
160,2963.3333,3577.333,2,30.0
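
To make the four KPIs concrete, here is a small sketch on hand-made data. It uses base R `tapply` rather than the notebook's `dplyr` pipeline so that it runs without extra packages; the data frame `toy` and the result `kpis` are invented for illustration.

```r
# Hypothetical purchases: customer 1 bought three times, customer 2 once.
toy <- data.frame(
  customer_id     = c(1, 1, 1, 2),
  purchase_amount = c(20, 40, 60, 100),
  days_since      = c(900, 400, 30, 1200)
)

# tapply returns one value per customer, ordered by customer_id.
kpis <- data.frame(
  customer_id    = sort(unique(toy$customer_id)),
  recency        = as.numeric(tapply(toy$days_since, toy$customer_id, min)),
  first_purchase = as.numeric(tapply(toy$days_since, toy$customer_id, max)),
  frequency      = as.numeric(tapply(toy$days_since, toy$customer_id, length)),
  amount         = as.numeric(tapply(toy$purchase_amount, toy$customer_id, mean))
)
print(kpis)
# Customer 1: recency 30, first_purchase 900, frequency 3, amount 40.
# Customer 2: recency 1200, first_purchase 1200, frequency 1, amount 100.
```

A customer with a single purchase has `recency` equal to `first_purchase`, as for customers 10 and 120 in the output above.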


### Segmenting customers in 2015.

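#### We define the following segments by recency (days since the last purchase):

* inactive: last purchase more than 3 years ago.
* cold: last purchase between 2 and 3 years ago.
* warm: last purchase between 1 and 2 years ago, split into new warm (first purchase within the last 2 years), warm high value (average amount of at least 100), and warm low value (average amount below 100).
* active: last purchase within the last year, split into new active (first purchase within the last year), active high value, and active low value, with the same threshold of 100.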

In [9]:
# Placeholder label; every customer is overwritten by one of the rules below.
customers_2015$segment = "NA"
# Step 1: four recency bands (a year is taken as 365 days).
customers_2015$segment[which(customers_2015$recency > 365*3)] = "inactive"
customers_2015$segment[which(customers_2015$recency <= 365*3 & customers_2015$recency > 365*2)] = "cold"
customers_2015$segment[which(customers_2015$recency <= 365*2 & customers_2015$recency > 365*1)] = "warm"
customers_2015$segment[which(customers_2015$recency <= 365)] = "active"
# Step 2: split "warm". New customers are relabelled first, so the value rules only see the rest.
customers_2015$segment[which(customers_2015$segment == "warm" & customers_2015$first_purchase <= 365*2)] = "new warm"
customers_2015$segment[which(customers_2015$segment == "warm" & customers_2015$amount < 100)] = "warm low value"
customers_2015$segment[which(customers_2015$segment == "warm" & customers_2015$amount >= 100)] = "warm high value"
# Step 3: split "active" in the same way.
customers_2015$segment[which(customers_2015$segment == "active" & customers_2015$first_purchase <= 365)] = "new active"
customers_2015$segment[which(customers_2015$segment == "active" & customers_2015$amount < 100)] = "active low value"
customers_2015$segment[which(customers_2015$segment == "active" & customers_2015$amount >= 100)] = "active high value"
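
The same rules can be collected in one function, which makes them easy to check on hand-made cases. This is only a sketch: `assign_segment` and `toy_segments` are hypothetical names, not part of the notebook, and the function is meant to reproduce the thresholds of the cell above rather than replace it.

```r
# Hypothetical helper: one segment label per customer, using the same
# thresholds (365-day years, amount cut-off of 100) as the notebook cell.
assign_segment <- function(recency, first_purchase, amount) {
  segment <- rep(NA_character_, length(recency))
  warm    <- recency <= 365 * 2 & recency > 365
  active  <- recency <= 365

  segment[recency > 365 * 3]                      <- "inactive"
  segment[recency <= 365 * 3 & recency > 365 * 2] <- "cold"

  # "New" customers are identified first; the rest are split by average amount.
  segment[warm & first_purchase <= 365 * 2]                 <- "new warm"
  segment[warm & first_purchase >  365 * 2 & amount <  100] <- "warm low value"
  segment[warm & first_purchase >  365 * 2 & amount >= 100] <- "warm high value"

  segment[active & first_purchase <= 365]                <- "new active"
  segment[active & first_purchase >  365 & amount <  100] <- "active low value"
  segment[active & first_purchase >  365 & amount >= 100] <- "active high value"
  segment
}

# Invented customers, one per rule of interest.
toy_segments <- assign_segment(
  recency        = c(3829, 343, 758, 500, 500, 100),
  first_purchase = c(3829, 3751, 3783, 500, 2000, 100),
  amount         = c(30, 71.4, 115.8, 40, 150, 20)
)
print(toy_segments)
# "inactive" "active low value" "cold" "new warm" "warm high value" "new active"
```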

#### Reorder the segments as factor levels:

In [10]:
customers_2015$segment <- factor(x = customers_2015$segment,
                                 levels = c("inactive", "cold",
                                            "warm high value", "warm low value", "new warm",
                                            "active high value", "active low value", "new active"))

table(customers_2015$segment)


         inactive              cold   warm high value    warm low value 
             9158              1903               119               901 
         new warm active high value  active low value        new active 
              938               573              3313              1512 
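
Converting the labels to a factor with explicit `levels` is what fixes the column order of `table()`. A small sketch with invented labels shows the difference; `raw_segment` and `ordered_segment` are illustrative names.

```r
# Invented labels for four customers.
raw_segment <- c("new active", "inactive", "cold", "inactive")

# Without explicit levels, table() sorts the labels alphabetically.
print(names(table(raw_segment)))      # "cold" "inactive" "new active"

# With explicit levels, table() follows the order we choose,
# here from least to most recently active.
ordered_segment <- factor(raw_segment,
                          levels = c("inactive", "cold", "new active"))
print(names(table(ordered_segment)))  # "inactive" "cold" "new active"
```

Any label that is missing from `levels` would become `NA` in the factor, so the level list has to cover every segment name that the rules can produce.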

### KPIs for customers in 2014.