### Attribution in Marketing

**Attribution** in marketing refers to the process of identifying and assigning value to the different touchpoints a customer interacts with on their journey towards a conversion or sale. These touchpoints could include various channels like email campaigns, social media ads, website visits, and more.

**Why is Attribution Important?**

Attribution helps marketers understand the effectiveness of their marketing strategies and campaigns by determining which channels and interactions are driving conversions. This insight allows businesses to:

1. **Optimize Marketing Spend**: By identifying the most effective channels, marketers can allocate their budget more efficiently.
2. **Improve Campaign Performance**: Understanding which touchpoints contribute to conversions helps in refining strategies to enhance performance.
3. **Measure ROI**: Attribution provides a clear view of the return on investment (ROI) for different marketing activities.
4. **Enhance Customer Experience**: By analyzing customer journeys, marketers can improve the overall customer experience and engagement.

### Types of Attribution Models

Attribution models can be broadly classified into **single-touch** and **multi-touch** models. Each model offers a different way of distributing credit to touchpoints.

#### **Single-Touch Attribution**

1. **First-Touch Attribution**
   - **Definition**: Assigns 100% of the conversion credit to the first interaction a customer has with your brand.
   - **Pros**: Simple and straightforward. Useful for understanding initial engagement.
   - **Cons**: Ignores subsequent interactions that might also play significant roles.
   - **Use Case**: Ideal for campaigns focused on generating awareness and attracting new customers.

2. **Last-Touch Attribution**
   - **Definition**: Assigns 100% of the conversion credit to the last interaction before the conversion.
   - **Pros**: Highlights the final interaction that triggered the conversion.
   - **Cons**: Disregards earlier touchpoints that contributed to the journey.
   - **Use Case**: Useful for campaigns aimed at closing sales or conversions.
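
The two single-touch rules above can be sketched on a toy journey. This is a minimal illustration only: the `journey` list, the `conversion_value`, and the helper `single_touch_credit` are hypothetical names chosen for this example, not part of any library.

```python
# Hypothetical toy journey: touchpoints in chronological order,
# ending in a conversion worth 100.
journey = ["Click Ad", "Visit Website", "Signup Newsletter"]
conversion_value = 100.0


def single_touch_credit(touchpoints, value, model="first"):
    """Give all of `value` to one touchpoint, chosen by `model`."""
    if not touchpoints:
        return {}
    if model == "first":
        chosen = touchpoints[0]
    elif model == "last":
        chosen = touchpoints[-1]
    else:
        raise ValueError(f"unknown model: {model!r}")
    return {chosen: value}


first_credit = single_touch_credit(journey, conversion_value, "first")
last_credit = single_touch_credit(journey, conversion_value, "last")
print("First-touch:", first_credit)
print("Last-touch:", last_credit)
```

Both models assign the full value to a single touchpoint, so the middle interaction ("Visit Website") receives nothing under either rule.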

#### **Multi-Touch Attribution**

Multi-touch attribution models distribute the credit across multiple interactions, providing a more comprehensive view of the customer journey.

1. **Linear Attribution**
   - **Definition**: Distributes the credit equally across all touchpoints leading up to the conversion.
   - **Pros**: Simple to implement and provides a balanced view.
   - **Cons**: Does not differentiate the impact of each touchpoint.
   - **Use Case**: Useful when all interactions are believed to contribute equally to the conversion.

2. **Time-Decay Attribution**
   - **Definition**: Gives more credit to interactions that occurred closer to the conversion time.
   - **Pros**: Acknowledges that touchpoints nearer to the conversion may have a higher impact.
   - **Cons**: Can undervalue earlier interactions.
   - **Use Case**: Ideal for longer sales cycles where the influence of touchpoints increases as the customer gets closer to converting.

3. **Position-Based (U-Shaped) Attribution**
   - **Definition**: Assigns 40% of the credit to the first interaction and 40% to the last interaction, and distributes the remaining 20% evenly among the middle touchpoints.
   - **Pros**: Balances the importance of first and last interactions while still giving some credit to the middle.
   - **Cons**: Can be complex to calculate and may still undervalue some touchpoints.
   - **Use Case**: Suitable for campaigns where both the first engagement and the closing touchpoint are crucial.
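
The three multi-touch rules above can be expressed as weight vectors that sum to 1. The sketch below makes several assumptions that the text does not specify: time decay is modelled as an exponential with a 7-day half-life, and position-based journeys with only one or two touchpoints fall back to giving all credit to the single touchpoint or splitting it between the two. All function names are hypothetical.

```python
def linear_weights(n):
    """Equal credit for each of n touchpoints."""
    if n < 1:
        raise ValueError("need at least one touchpoint")
    return [1.0 / n] * n


def time_decay_weights(days_before_conversion, half_life_days=7.0):
    """Credit halves for every `half_life_days` between touchpoint and conversion.

    The exponential form and the 7-day default are assumptions for illustration.
    """
    if not days_before_conversion:
        raise ValueError("need at least one touchpoint")
    raw = [0.5 ** (d / half_life_days) for d in days_before_conversion]
    total = sum(raw)
    return [r / total for r in raw]


def position_based_weights(n, first=0.4, last=0.4):
    """40% to the first touchpoint, 40% to the last, the rest spread over the middle."""
    if n < 1:
        raise ValueError("need at least one touchpoint")
    if n == 1:
        return [1.0]  # assumption: a lone touchpoint takes all the credit
    if n == 2:
        # assumption: with no middle touchpoints, renormalise first and last
        return [first / (first + last), last / (first + last)]
    middle_each = (1.0 - first - last) / (n - 2)
    return [first] + [middle_each] * (n - 2) + [last]


linear = linear_weights(4)
decay = time_decay_weights([14, 7, 0])  # days between each touchpoint and the conversion
u_shaped = position_based_weights(4)
print("Linear:        ", linear)
print("Time-decay:    ", decay)
print("Position-based:", u_shaped)
```

Multiplying a weight vector by the conversion value gives the credit per touchpoint, so the total attributed amount always equals the conversion value.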

### Summary

- **Attribution** is the practice of understanding which marketing interactions contribute to conversions.
- **Importance**: It helps in optimizing marketing spend, improving campaign performance, measuring ROI, and enhancing customer experience.
- **Models**: There are single-touch models like First-Touch and Last-Touch, and multi-touch models like Linear, Time-Decay, and Position-Based. Each model has its advantages and use cases depending on the marketing objectives and customer journey complexity.

Creating Randomised Datasets

In [12]:
import pandas as pd
import numpy as np

# Constants
N_CUSTOMERS = 1000
N_EVENTS = 5000
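# Number of target rows between 200 and 300 (inclusive).
# Note: drawn before the seed is set in the functions below, so it may vary between runs.
N_TARGETS = np.random.randint(200, 301)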
EVENTS = ['Visit Website', 'Click Ad', 'Signup Newsletter', 'Download App', 'Watch Video']
START_DATE = '2023-01-01'
END_DATE = '2023-12-31'

# Generate random activity data
def generate_activity_data(n_customers, n_events, events, start_date, end_date):
    np.random.seed(42)
    # Randomly assign customers to events
    customer_ids = np.random.randint(1, n_customers + 1, n_events)
    # Randomly choose events
    event_choices = np.random.choice(events, n_events)
    # Randomly generate dates
    dates = pd.to_datetime(np.random.randint(pd.Timestamp(start_date).value,
                                             pd.Timestamp(end_date).value,
                                             n_events))
    # Create DataFrame
    activity_data = pd.DataFrame({
        'CustomerID': customer_ids,
        'Event': event_choices,
        'Date': dates
    })
    return activity_data

# Generate random target data
def generate_target_data(n_customers, n_targets, start_date, end_date):
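    # Same seed as generate_activity_data, so both functions start from the same random stream
    np.random.seed(42)
    # Randomly sample customers with replacement, so one customer can have several targets
    customer_ids = np.random.choice(np.arange(1, n_customers + 1), n_targets)
    # Generate random target amounts from $10 to $99 (randint's upper bound is exclusive)
    target_amounts = np.random.randint(10, 100, size=n_targets)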
    # Randomly generate target dates
    target_dates = pd.to_datetime(np.random.randint(pd.Timestamp(start_date).value,
                                                    pd.Timestamp(end_date).value,
                                                    n_targets))
    # Create DataFrame
    target_data = pd.DataFrame({
        'CustomerID': customer_ids,
        'TargetAmount': target_amounts,
        'TargetDate': target_dates
    })
    return target_data

# Generate the datasets
activity_data = generate_activity_data(N_CUSTOMERS, N_EVENTS, EVENTS, START_DATE, END_DATE)
target_data = generate_target_data(N_CUSTOMERS, N_TARGETS, START_DATE, END_DATE)

# Display datasets
print("Activity Data")
print(activity_data)
print("\nTarget Data")
print(target_data)


Activity Data
      CustomerID              Event                          Date
0            103  Signup Newsletter 2023-10-16 09:41:17.081846279
1            436        Watch Video 2023-04-30 23:54:30.961011074
2            861        Watch Video 2023-07-01 09:24:29.787375890
3            271      Visit Website 2023-04-17 14:37:05.142324289
4            107      Visit Website 2023-10-01 17:03:54.850574933
...          ...                ...                           ...
4995         290       Download App 2023-10-08 03:48:22.871559789
4996         295      Visit Website 2023-12-22 22:06:05.925943536
4997         451        Watch Video 2023-12-23 09:33:49.894217929
4998         904       Download App 2023-02-08 17:13:56.745944572
4999         215        Watch Video 2023-01-24 04:36:52.221262943

[5000 rows x 3 columns]

Target Data
     CustomerID  TargetAmount                    TargetDate
0           103            80 2023-08-30 12:16:23.561984827
1           436            68 2023-1

In [13]:
# Check for multiple events per customer
print(activity_data['CustomerID'].value_counts())

# Check for multiple targets per customer
print(target_data['CustomerID'].value_counts())


CustomerID
664    14
929    12
39     12
936    12
113    12
       ..
110     1
979     1
189     1
368     1
528     1
Name: count, Length: 989, dtype: int64
CustomerID
958    3
872    3
41     2
131    2
15     2
      ..
92     1
367    1
455    1
428    1
160    1
Name: count, Length: 192, dtype: int64


Customer Journeys (Merging Event and Target Datasets)

In [15]:
# Merge activity and target data
# Add a column to distinguish events from targets
activity_data['Type'] = 'Event'
target_data['Type'] = 'Target'
target_data.rename(columns={'TargetAmount': 'Amount', 'TargetDate': 'Date'}, inplace=True)
activity_data['Amount'] = np.nan

# Combine both datasets
combined_data = pd.concat([activity_data, target_data], ignore_index=True)

# Sort by CustomerID and Date
combined_data_sorted = combined_data.sort_values(by=['CustomerID', 'Date'])

print(combined_data_sorted)

# Create customer journey dictionary
customer_journeys = {}
for customer_id, group in combined_data_sorted.groupby('CustomerID'):
    journey = []
    for _, row in group.iterrows():
        if row['Type'] == 'Event':
            journey.append((row['Date'], row['Event']))
        elif row['Type'] == 'Target':
            journey.append((row['Date'], 'Target', row['Amount']))
    customer_journeys[customer_id] = journey

      CustomerID              Event                          Date   Type  \
1357           1       Download App 2023-03-05 23:45:23.459254458  Event   
4234           1  Signup Newsletter 2023-03-16 19:16:09.105778464  Event   
4511           1      Visit Website 2023-03-18 11:16:27.404361226  Event   
1049           1       Download App 2023-03-25 07:59:29.359660428  Event   
897            1  Signup Newsletter 2023-04-21 08:04:00.525932378  Event   
...          ...                ...                           ...    ...   
759          999       Download App 2023-03-14 18:46:04.627417538  Event   
3123         999           Click Ad 2023-06-10 18:54:29.486619924  Event   
4798         999           Click Ad 2023-08-30 03:44:13.632851925  Event   
631          999        Watch Video 2023-12-12 11:24:34.621862321  Event   
3741        1000  Signup Newsletter 2023-07-26 22:47:50.662755516  Event   

      Amount  
1357     NaN  
4234     NaN  
4511     NaN  
1049     NaN  
897      NaN

In [17]:
# Example: Print the journey for customer 131
customer_id_example = 131
print(f"Customer {customer_id_example} Journey:")
for entry in customer_journeys.get(customer_id_example, []):
    print(entry)

Customer 131 Journey:
(Timestamp('2023-01-20 06:47:08.540925130'), 'Signup Newsletter')
(Timestamp('2023-03-11 05:49:41.469664800'), 'Watch Video')
(Timestamp('2023-03-13 12:14:13.491732238'), 'Target', 10.0)
(Timestamp('2023-03-29 19:16:36.754803610'), 'Visit Website')
(Timestamp('2023-06-01 10:13:02.285062530'), 'Signup Newsletter')
(Timestamp('2023-06-28 04:26:51.066969002'), 'Download App')
(Timestamp('2023-08-13 14:44:10.782534723'), 'Visit Website')
(Timestamp('2023-08-22 10:44:08.631152442'), 'Download App')
(Timestamp('2023-09-21 17:33:43.437923570'), 'Click Ad')
(Timestamp('2023-12-22 18:17:59.684139101'), 'Target', 99.0)


Single-Touch Attribution:

- First-Touch: Attributes the entire conversion value to the first interaction.
- Last-Touch: Attributes the entire conversion value to the last interaction.

Multi-Touch Attribution: Distributes the conversion value across multiple interactions.

In [21]:
# Calculate First-Touch Attribution
attribution = []

# Iterate through each customer's journey
for customer_id, journey in customer_journeys.items():
    # Iterate through each step in the journey
    for i, entry in enumerate(journey):
        if isinstance(entry, tuple) and entry[1] == 'Target':
            # Find the first event before this target
            for j in range(i):
                if isinstance(journey[j], tuple) and journey[j][1] != 'Target':
                    attribution.append((journey[j][1], entry[2]))
                    break

# Convert attribution results to DataFrame
attribution_df = pd.DataFrame(attribution, columns=['Event', 'TargetAmount'])

# Sum the target amounts for each event
attribution_result = attribution_df.groupby('Event')['TargetAmount'].sum().reset_index()

# Display the attribution result
print("Attribution Result")
print(attribution_result)

Attribution Result
               Event  TargetAmount
0           Click Ad        2026.0
1       Download App        1504.0
2  Signup Newsletter        1536.0
3      Visit Website        2661.0
4        Watch Video        2013.0


Introducing a 'Miscellaneous' category, because a conversion can occur without any preceding events

In [22]:
# Calculate First-Touch Attribution with Miscellaneous handling
attribution = []

# Iterate through each customer's journey
for customer_id, journey in customer_journeys.items():
    # Initialize flag to check if any event was found before target
    found_event = False
    for i, entry in enumerate(journey):
        if isinstance(entry, tuple) and entry[1] == 'Target':
            # Reset the flag for each target
            found_event = False
            # Find the first event before this target
            for j in range(i):
                if isinstance(journey[j], tuple) and journey[j][1] != 'Target':
                    attribution.append((journey[j][1], entry[2]))
                    found_event = True
                    break
            # If no event was found before this target, assign to Miscellaneous
            if not found_event:
                attribution.append(('Miscellaneous', entry[2]))

# Convert attribution results to DataFrame
attribution_df = pd.DataFrame(attribution, columns=['Event', 'TargetAmount'])

# Sum the target amounts for each event
attribution_result = attribution_df.groupby('Event')['TargetAmount'].sum().reset_index()

# Display the attribution result
print("Attribution Result")
print(attribution_result)

Attribution Result
               Event  TargetAmount
0           Click Ad        2026.0
1       Download App        1504.0
2      Miscellaneous        1434.0
3  Signup Newsletter        1536.0
4      Visit Website        2661.0
5        Watch Video        2013.0
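
The same loop can be adapted to Last-Touch attribution by scanning backwards from each target instead of forwards. The sketch below is an illustration under stated assumptions: `toy_journeys` is a small hand-made dictionary in the same tuple format as `customer_journeys` (with date strings standing in for timestamps), `last_touch_attribution` is a hypothetical helper, and "last touch" is taken to mean the nearest event before the target. It uses a plain dictionary for the totals so that it runs without pandas.

```python
from collections import defaultdict

# Hypothetical journeys in the same (date, event) / (date, 'Target', amount) format
toy_journeys = {
    1: [("2023-01-05", "Click Ad"),
        ("2023-01-09", "Visit Website"),
        ("2023-01-10", "Target", 50.0)],
    2: [("2023-02-01", "Target", 20.0),   # conversion with no preceding event
        ("2023-02-03", "Watch Video"),
        ("2023-02-04", "Target", 30.0)],
}


def last_touch_attribution(journeys):
    """Credit each target amount to the nearest event that precedes it."""
    totals = defaultdict(float)
    for journey in journeys.values():
        for i, entry in enumerate(journey):
            if entry[1] != "Target":
                continue
            # Default bucket when no event precedes this target
            label = "Miscellaneous"
            # Walk backwards from the target to the nearest non-target entry
            for previous in reversed(journey[:i]):
                if previous[1] != "Target":
                    label = previous[1]
                    break
            totals[label] += entry[2]
    return dict(totals)


last_touch_result = last_touch_attribution(toy_journeys)
print(last_touch_result)
```

Passing the real `customer_journeys` dictionary instead of `toy_journeys` should work unchanged, since only positions 1 and 2 of each tuple are read.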


Evaluate Result

In [23]:
# Evaluate the result
# Sum the target amounts from the target_data
total_target_amount = target_data['Amount'].sum()

# Sum the target amounts from the attribution_result
total_attributed_amount = attribution_result['TargetAmount'].sum()

# Compare the sums
print(f"Total Target Amount from target_data: {total_target_amount}")
print(f"Total Attributed Amount from attribution_result: {total_attributed_amount}")

# Check if the sums are equal
if np.isclose(total_target_amount, total_attributed_amount):
    print("The total target amount matches the total attributed amount.")
else:
    print("There is a discrepancy between the total target amount and the total attributed amount.")

Total Target Amount from target_data: 11174
Total Attributed Amount from attribution_result: 11174.0
The total target amount matches the total attributed amount.
