https://analyzecore.com/2016/08/03/attribution-model-r-part-1/

As we know, a customer usually goes through a path/sequence of different channels/touchpoints before a purchase in e-commerce or a conversion in other areas. In Google Analytics we can see that some touchpoints are more likely to assist a conversion, while others are more likely to be the last-click touchpoint.

As most of the channels are paid for (in terms of money or time spent), it is vital to have an algorithm for distributing conversions and their value between those channels and comparing the result with their costs, instead of crediting, e.g., only the last non-direct channel. This is a Multi-Channel Attribution Model problem.

A definition by Google Analytics helps: an Attribution Model is a rule, or set of rules, that determines how credit for sales and conversions is assigned to touchpoints in conversion paths.

Nowadays, Google Analytics provides seven (!) predefined attribution models and even a custom model that you can adapt to your case. However, there are some aspects of the Google Analytics approach that I don’t like, which is why I started researching this area. I’m sure this is a very interesting field for analysts and marketers. I’m going to publish a sequence of posts about alternative (relative to Google Analytics) Attribution Model concepts, some ideas for solving issues that you would face in practice when implementing them, and R code for computing them (as always).

What I don’t like about the GA approach:
* You have to make a choice or managerial decision regarding which model to use and why. You can see different results with different models but which one is more correct? In other words, GA provides heuristic models with their pros and cons,
* The data are aggregated and anonymized and you can’t mine deeper if you want,
* You can’t take paths without conversions into account, although they would be interesting to analyze.

Pros of GA:

* You don’t need to organize a storage and infrastructure for collecting data,
* You are provided with a range of heuristic models,
* It is pretty easy and free to use.

Therefore, if you are a relatively small company, it would be logical to use GA’s approach. But if you see that the results of attribution would have a significant impact on marketing budgets, product prices, understanding of customer journeys, etc., or you already have the necessary data collected, you can explore the ideas that I’m going to share.

In this article I focus mainly on the Markov chain concept for attribution. In the second post of the series, we will study practical aspects of its implementation.

# ATTRIBUTION MODEL BASED ON MARKOV CHAINS CONCEPT

Using Markov chains allows us to switch from heuristic models to probabilistic ones. We can represent every customer journey (sequence of channels/touchpoints) as a chain in a directed Markov graph where each vertex is a possible state (channel/touchpoint) and the edges represent the probability of transition between the states (including conversion). By computing the model and estimating the transition probabilities we can attribute conversions to every channel/touchpoint.

Let’s start with a simple example of the first-order or “memory-free” Markov graph to better understand the concept. It is called “memory-free” because the probability of reaching a state depends only on the previous state visited, not on the rest of the path that led there.
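Written out in symbols, the first-order assumption looks like this. The notation is introduced here for illustration only: $X_t$ is the state visited at step $t$, and $s_i$, $s_j$ are any two states of the graph.

$$
P(X_{t+1} = s_j \mid X_t = s_i, X_{t-1}, \dots, X_1) = P(X_{t+1} = s_j \mid X_t = s_i)
$$

The right-hand side is the transition probability attached to the edge from $s_i$ to $s_j$.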

For instance, suppose the customer journeys contain three unique channels: C1, C2, and C3. In addition, we should manually add three special states to each graph: (start), (conversion) and (null). These additional states represent the starting point, a purchase or conversion, and an unsuccessful conversion, respectively. Transitions from a channel to itself are possible (e.g. C1 -> C1) but can be omitted for different reasons.

Let’s assume we have three customer journeys:

* C1 -> C2 -> C3 -> purchase
* C1 -> unsuccessful conversion
* C2 -> C3 -> unsuccessful conversion
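To make the example concrete, here is a minimal base R sketch that encodes these three journeys, wraps each one with the special states, and lists the consecutive pairs of states. The names `journeys`, `add_special_states` and `split_into_pairs` are illustrative assumptions, not part of any package.

```r
# Each journey: the ordered channels plus a flag for whether it ended in a purchase.
# (The list layout is an assumption made for this sketch.)
journeys <- list(
  list(path = c("C1", "C2", "C3"), converted = TRUE),
  list(path = c("C1"),             converted = FALSE),
  list(path = c("C2", "C3"),       converted = FALSE)
)

# Prepend (start) and append (conversion) or (null), depending on the outcome.
add_special_states <- function(path, converted) {
  c("(start)", path, if (converted) "(conversion)" else "(null)")
}

# Turn a sequence of states into consecutive (from, to) transitions.
split_into_pairs <- function(states) {
  data.frame(
    from = head(states, -1),  # every state except the last
    to   = tail(states, -1),  # every state except the first
    stringsAsFactors = FALSE
  )
}

full_paths <- lapply(journeys, function(j) add_special_states(j$path, j$converted))
pairs <- do.call(rbind, lapply(full_paths, split_into_pairs))

# Show the extended journeys and the resulting transitions.
print(vapply(full_paths, paste, character(1), collapse = " -> "))
print(pairs)
```

Every row of `pairs` is one observed transition; counting how often each `from`/`to` combination occurs is the raw material for estimating transition probabilities.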

Following this approach, we add the extra states to each journey (see column 2 of the following table) and split it into pairs of consecutive states (see column 3):


