The data is loaded below, then sorted by unique identifier (Cookie ID / Softrip ID) and chronologically within each identifier.

In [1]:
import pandas as pd
import numpy as np
from collections import defaultdict


df = pd.read_csv('attribution data.csv')

df = df.sort_values(['cookie', 'time'], ascending=[False, True])

df['visit_order'] = df.groupby('cookie').cumcount() + 1



The last line of the cell above adds a sequential counter column, visit_order, which numbers the actions of each unique cookie.

The data is still arranged as one row per action, so we need to construct a pathway for each unique cookie (converting the data frame from long form to wide form, like a pivot).
After this, a 'null' or 'conversion' flag needs to be added to record whether a conversion (booking, lead, etc.) has been completed.
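
As a minimal sketch of that long-to-wide step, the cell below builds pathways from a small, made-up data frame. The names toy and toy_paths and the sample rows are assumptions for illustration only; the column names mirror the ones used in this notebook. Unlike the aggregation cell further down, this sketch keeps repeated channels in the path.

```python
import pandas as pd

# Hypothetical toy data in the same long form as the input:
# one row per touchpoint, already sorted by cookie and time.
toy = pd.DataFrame({
    'cookie': ['a', 'a', 'a', 'b', 'b', 'c'],
    'channel': ['Facebook', 'Paid Search', 'Facebook',
                'Instagram', 'Online Video', 'Paid Search'],
    'conversion': [0, 0, 1, 0, 0, 0],
})

# One row per cookie: the ordered list of channels it touched.
toy_paths = toy.groupby('cookie')['channel'].agg(list).reset_index(name='path')

# A cookie counts as converted if any of its rows is flagged as a conversion.
toy_paths['conversion'] = toy.groupby('cookie')['conversion'].max().to_numpy()

# 'null' marks journeys that ended without a conversion.
toy_paths['null'] = (toy_paths['conversion'] == 0).astype(int)

print(toy_paths)
```

Each cookie ends up with exactly one of the two flags set, which is the shape the modelling step expects.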

Separate out multi-bookers (cookies with more than one conversion)

In [2]:
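# Total conversions per cookie
cookietodrop = df.groupby('cookie')['conversion'].sum().reset_index()

# Cookies with more than one conversion (multi-bookers)
ctd = cookietodrop[cookietodrop['conversion'] > 1]

# Flat list of cookie IDs; ctd.values.tolist() would nest each ID in its own list
ctdl = ctd['cookie'].tolist()

# Keep the multi-bookers separately and remove them from the main data
dfmulti = df[df['cookie'].isin(ctdl)]

df = df[~df['cookie'].isin(ctdl)]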

Aggregate rows to pathways by unique cookie ID

In [3]:
df_paths = df.groupby('cookie')['channel'].aggregate(
    lambda x: x.unique().tolist()).reset_index()

df_last_interaction = df.drop_duplicates('cookie', keep='last')[['cookie', 'conversion', 'conversion_value']]

df_paths = pd.merge(df_paths, df_last_interaction, how='left', on='cookie')

Formatting step to drop columns that are not needed and to check that the remaining columns and data frame are correct.
Also add a null column (1 for pathways without a conversion) to help produce accurate weighting.

In [4]:
patheroo = df_paths[['channel', 'conversion', 'conversion_value']].copy()
patheroo['nulls'] = (patheroo['conversion'] <1)
patheroo['nulls'] = patheroo['nulls'].astype(int)
df = list(patheroo.itertuples(index=False, name=None))

Additional formatting step to make the data frames suitable for modelling.
Also create two data frames, one for multi-channel pathways and one for single-channel pathways.
Single-channel pathways are not suitable for the modelling process and need to be added back into the calculations after the fact.
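
One possible way to add the single-channel pathways back afterwards is sketched below. It assumes that a single-channel pathway gives all of its credit to its one channel. The names model_credit, single and total_credit, and all of the numbers, are hypothetical; model_credit stands in for whatever per-channel totals the model produces for the multi-channel pathways.

```python
import pandas as pd

# Hypothetical per-channel conversions credited by the model on multi-channel paths.
model_credit = pd.Series({'Facebook': 10.0, 'Paid Search': 5.0}, name='conversions')

# Hypothetical single-channel journeys, shaped like df_single in this notebook.
single = pd.DataFrame({
    'pathstring': ['Facebook', 'Instagram', 'Facebook', 'Paid Search'],
    'Conversion': [1, 1, 0, 1],
})

# With only one channel on the path, that channel takes all of the credit.
single_credit = single.groupby('pathstring')['Conversion'].sum()

# Sum both sources; fill_value=0 keeps channels that appear on one side only.
total_credit = model_credit.add(single_credit, fill_value=0)

print(total_credit)
```

Using fill_value=0 matters here because a channel such as Instagram may appear only in single-channel journeys and would otherwise come out as NaN.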

In [5]:
df = pd.DataFrame(df, columns=['Path', 'Conversion', 'Conversion_Value', 'Null'])
# Join each path with the same separator the model is given below (',')
df['pathstring'] = [','.join(map(str, l)) for l in df['Path']]
# Paths without a separator contain a single channel
df_single = df[~df['pathstring'].str.contains(',', na=False)]
df_multi = df[df['pathstring'].str.contains(',', na=False)]
df_single.to_csv('removesinglechann.csv')

Import and define the Markov model, 1st order

In [6]:
from pychattr.channel_attribution import MarkovModel
mm = MarkovModel(path_feature='pathstring',
                 conversion_feature='Conversion',
                 null_feature='Null',
                 cost_feature='Conversion_Value',
                 separator=',',
                 k_order=1,
                 n_simulations=10000,
                 max_steps=None,
                 return_transition_probs=True)

Fit 1st Order Model

In [7]:
mm.fit(df_multi)
mm.attribution_model_.to_csv('Conversions.csv')
mm.removal_effects_.to_csv('RemovalEffect.csv')
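
To illustrate what a removal effect means in a first-order Markov model, the sketch below computes it by hand on four made-up pathways. It is not how pychattr works internally (the n_simulations argument suggests the library simulates journeys, whereas this sketch iterates the transition probabilities directly). The functions transition_probs and conversion_prob, the channels A and B, and the pathways are all assumptions for illustration.

```python
from collections import defaultdict

def transition_probs(paths):
    """Estimate first-order transition probabilities from (channels, converted) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for channels, converted in paths:
        end = '(conversion)' if converted else '(null)'
        states = ['(start)'] + channels + [end]
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    return {src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
            for src, dsts in counts.items()}

def conversion_prob(probs, removed=None, n_iter=200):
    """Probability of reaching '(conversion)' from '(start)'.

    If `removed` is given, every transition into that channel is treated
    as a lost journey (it contributes zero conversion probability).
    """
    p = defaultdict(float)
    p['(conversion)'] = 1.0
    # Repeatedly apply p[s] = sum_t P(s -> t) * p[t] until the values settle.
    for _ in range(n_iter):
        for src, dsts in probs.items():
            if src == removed:
                continue
            p[src] = sum(w * p[dst] for dst, w in dsts.items() if dst != removed)
    return p['(start)']

# Hypothetical pathways: (ordered channels, converted?)
toy_journeys = [(['A', 'B'], True), (['A'], False), (['B'], True), (['A', 'B'], False)]

probs = transition_probs(toy_journeys)
base = conversion_prob(probs)

# Removal effect: share of conversion probability lost when a channel is removed.
removal = {ch: 1 - conversion_prob(probs, removed=ch) / base for ch in ['A', 'B']}

# Normalise the removal effects so that they sum to 1 across channels.
total = sum(removal.values())
shares = {ch: r / total for ch, r in removal.items()}

print(base, removal, shares)
```

In this toy data every conversion passes through B, so removing B loses all conversions, while removing A loses only the journeys that reached B via A.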


In [8]:
import seaborn as sns
import matplotlib.pyplot as plt

heatmapdata = mm.transition_matrix_


In [9]:
heatmapdata

Unnamed: 0,channel_from,channel_to,transition_probability
0,(start),"Instagram,Online Display",0.009985
1,(start),"Instagram,Facebook",0.167005
2,(start),"Online Display,Paid Search",0.062728
3,(start),"Online Video,Facebook",0.035121
4,(start),"Paid Search,Online Video",0.039080
...,...,...,...
115,"Online Video,Online Display",(null),0.753548
116,"Online Video,Online Display","Online Display,Paid Search",0.068387
117,"Online Video,Online Display",(conversion),0.072258
118,"Online Video,Online Display","Online Display,Instagram",0.025806


In [None]:
# Pivot to a from/to matrix of transition probabilities
hmpd = heatmapdata.pivot(index='channel_from', columns='channel_to', values='transition_probability')
fig, ax = plt.subplots(figsize=(12, 8))
# Draw on the axes created above
sns.heatmap(hmpd, cmap='Greens', annot=True, vmin=0, vmax=1, ax=ax)
ax.set_title('Customer Journey Transition Probabilities')
fig.savefig('Heatmap.pdf')