## Problem

Analyze multi-touch attribution using the `campaign_touchpoints.csv` dataset.

The goal is to distribute credit for each conversion across the marketing touchpoints involved.

1) Implement a linear attribution model where each touchpoint shares equal credit for a conversion.

2) Calculate the attributed conversions for each channel.

3) Calculate the attributed conversions for each campaign.


---

# Solution

In [1]:
# import libraries
import pandas as pd
import numpy as np

## Data Overview

In [2]:
# Read the data
df = pd.read_csv('data/campaign_touchpoints.csv')

In [4]:
# Preview the data
df

Unnamed: 0,user_id,campaign_id,touchpoint_id,conversion
0,31,Campaign_3,Search Ad,1
1,79,Campaign_1,Display Ad,0
2,51,Campaign_1,Email,0
3,14,Campaign_2,Social Media,1
4,67,Campaign_2,Search Ad,1
...,...,...,...,...
995,76,Campaign_2,Search Ad,1
996,79,Campaign_2,Display Ad,0
997,49,Campaign_2,Social Media,1
998,35,Campaign_3,Search Ad,0


## Multi-touch Linear Attribution Model

In [5]:
# Count the touchpoint rows recorded for each user and broadcast that count
# back onto every row of that user with a grouped transform
df['touchpoint_count'] = df.groupby('user_id')['touchpoint_id'].transform('count')

# Apply linear attribution: each row flagged as a conversion contributes
# 1 / touchpoint_count credit, so rows with conversion == 0 receive no credit
df['touchpoint_credit'] = df['conversion'] / df['touchpoint_count']
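
To make the two steps above concrete, here is a minimal sketch on a small, hypothetical DataFrame (the `toy` data is invented for illustration and is not taken from `campaign_touchpoints.csv`). It applies the same `transform('count')` and division as the cell above.

```python
import pandas as pd

# Hypothetical toy data: user 1 has two touchpoints, user 2 has four
toy = pd.DataFrame({
    "user_id":       [1, 1, 2, 2, 2, 2],
    "touchpoint_id": ["Email", "Search Ad", "Email",
                      "Display Ad", "Social Media", "Search Ad"],
    "conversion":    [1, 1, 0, 1, 0, 0],
})

# Number of touchpoint rows per user, broadcast back to every row
toy["touchpoint_count"] = toy.groupby("user_id")["touchpoint_id"].transform("count")

# Row-level credit: only rows with conversion == 1 get a nonzero share
toy["touchpoint_credit"] = toy["conversion"] / toy["touchpoint_count"]

print(toy)
```

Because `conversion` is recorded per row, a user's total credit under this formula equals the fraction of their rows flagged as converted. In the toy data, user 1 totals 1.0 while user 2 totals 0.25.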

In [6]:
print("\nOutput of linear attribution model\n")
df


Output of linear attribution model



Unnamed: 0,user_id,campaign_id,touchpoint_id,conversion,touchpoint_count,touchpoint_credit
0,31,Campaign_3,Search Ad,1,12,0.083333
1,79,Campaign_1,Display Ad,0,16,0.000000
2,51,Campaign_1,Email,0,5,0.000000
3,14,Campaign_2,Social Media,1,9,0.111111
4,67,Campaign_2,Search Ad,1,16,0.062500
...,...,...,...,...,...,...
995,76,Campaign_2,Search Ad,1,11,0.090909
996,79,Campaign_2,Display Ad,0,16,0.000000
997,49,Campaign_2,Social Media,1,9,0.111111
998,35,Campaign_3,Search Ad,0,13,0.000000
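
In the output above, rows with `conversion == 0` receive no credit even when the same user converted on another row. If the intended reading is instead that a converting user contributes exactly one conversion split equally across all of their touchpoints, one possible variant is sketched below. This is an assumption about the data semantics, not something the dataset confirms. The function name `linear_attribution_user_level` and the `toy` data are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; user 3 never converts
toy = pd.DataFrame({
    "user_id":       [1, 1, 2, 2, 2, 2, 3],
    "touchpoint_id": ["Email", "Search Ad", "Email", "Display Ad",
                      "Social Media", "Search Ad", "Email"],
    "conversion":    [1, 1, 0, 1, 0, 0, 0],
})

def linear_attribution_user_level(df):
    """Split one conversion per converting user equally over all their rows.

    Assumption: a user counts as converted if any of their rows
    has conversion == 1.
    """
    out = df.copy()
    grouped = out.groupby("user_id")
    n_touchpoints = grouped["touchpoint_id"].transform("count")
    user_converted = grouped["conversion"].transform("max")
    out["touchpoint_credit"] = user_converted / n_touchpoints
    return out

result = linear_attribution_user_level(toy)

# Each converting user now sums to exactly one conversion
per_user = result.groupby("user_id")["touchpoint_credit"].sum()
print(per_user)
```

Under this variant the total attributed credit equals the number of converting users, which can serve as a sanity check when comparing the two readings.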


## Channels

In [7]:
# Calculate the total conversions attributed to each channel
# Group by touchpoint_id and sum the credit assigned to its rows
touchpoint_conversions = df.groupby('touchpoint_id')['touchpoint_credit'].sum().reset_index()

# Sort to identify the most effective touchpoints
touchpoint_conversions = touchpoint_conversions.sort_values(by='touchpoint_credit', ascending=False)

print("\nAttributed conversions by touchpoint:")
touchpoint_conversions.reset_index(drop=True)


Attributed conversions by touchpoint:


Unnamed: 0,touchpoint_id,touchpoint_credit
0,Search Ad,6.105444
1,Email,5.050359
2,Display Ad,4.72905
3,Social Media,4.037521
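
Raw credit totals are easier to compare as shares of the overall attributed conversions. The sketch below rebuilds the table above by hand (values copied from the output) and adds a `credit_share` column; that column name is an illustrative choice, not part of the original notebook.

```python
import pandas as pd

# Attributed conversions copied from the channel table above
touchpoint_conversions = pd.DataFrame({
    "touchpoint_id": ["Search Ad", "Email", "Display Ad", "Social Media"],
    "touchpoint_credit": [6.105444, 5.050359, 4.72905, 4.037521],
})

# Each channel's fraction of all attributed conversions
total_credit = touchpoint_conversions["touchpoint_credit"].sum()
touchpoint_conversions["credit_share"] = (
    touchpoint_conversions["touchpoint_credit"] / total_credit
)

print(touchpoint_conversions.round(3))
```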


## Campaigns

In [8]:
# Similarly, group by campaign_id to calculate attributed conversions per campaign
campaign_conversions = df.groupby('campaign_id')['touchpoint_credit'].sum().reset_index()

# Sort to identify the most effective campaigns
campaign_conversions = campaign_conversions.sort_values(by='touchpoint_credit', ascending=False)

print("\nAttributed conversions by campaign:")
campaign_conversions.reset_index(drop=True)


Attributed conversions by campaign:


Unnamed: 0,campaign_id,touchpoint_credit
0,Campaign_3,7.059783
1,Campaign_1,6.43501
2,Campaign_2,6.427581
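
The channel and campaign views aggregate the same `touchpoint_credit` column, so they can also be combined into one channel-by-campaign matrix whose row and column sums reproduce the two tables. A minimal sketch using `DataFrame.pivot_table` on hypothetical toy credits (not the real dataset):

```python
import pandas as pd

# Hypothetical row-level credits, invented for illustration
toy = pd.DataFrame({
    "campaign_id":       ["Campaign_1", "Campaign_1", "Campaign_2",
                          "Campaign_2", "Campaign_3"],
    "touchpoint_id":     ["Email", "Search Ad", "Email", "Search Ad", "Email"],
    "touchpoint_credit": [0.5, 0.25, 0.0, 0.25, 0.5],
})

# Rows are channels, columns are campaigns, cells are summed credit
credit_matrix = toy.pivot_table(
    index="touchpoint_id",
    columns="campaign_id",
    values="touchpoint_credit",
    aggfunc="sum",
    fill_value=0.0,
)

channel_totals = credit_matrix.sum(axis=1)   # per-channel credit
campaign_totals = credit_matrix.sum(axis=0)  # per-campaign credit

print(credit_matrix)
```

Both margins add up to the same grand total, which is a quick consistency check. The channel and campaign tables reported in this notebook both sum to about 19.92.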
