# Attribution Models

## What Are Attribution Models?

Attribution models are methods used in marketing to **assign credit** to
the various channels or touchpoints that contribute to a customer’s
conversion (e.g., making a purchase or signing up for a service).
Consider these simple examples:

-   **First-Touch Attribution:**  
    *Example:* A customer discovers your brand through an Instagram post
    and later makes a purchase. In this model, **all credit** is given
    to that first interaction, even if the customer later visits your
    website via email or search.

-   **Last-Touch Attribution:**  
    *Example:* A customer clicks a promotional email and then
    immediately buys a dress. Here, **all the credit** goes to the email
    interaction, ignoring any earlier exposures through social media or
    other channels.

-   **Multi-Touch Attribution:**  
    *Example:* A customer sees an influencer post on Instagram, clicks
    on a retargeting ad, and finally converts after receiving an email
    about a flash sale. This model distributes the credit **across all
    channels**, reflecting the reality that each interaction contributed
    to the final purchase decision. How the credit is split depends on
    the weighting scheme, such as equal (linear), time-decay, or
    position-based weights.
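
The three rules above can be sketched in a few lines of Python. This is a minimal illustration, not a production attribution engine: the `attribute` function and its `model` argument are hypothetical names, and the multi-touch case assumes the simplest possible weighting (equal, or "linear", credit for every touchpoint).

```python
from collections import defaultdict


def attribute(journey, model="linear"):
    """Split the credit for one conversion across the touchpoints in `journey`.

    `journey` is an ordered list of channel names, earliest touch first.
    """
    if not journey:
        return {}
    credit = defaultdict(float)
    if model == "first_touch":
        credit[journey[0]] = 1.0          # all credit to the first interaction
    elif model == "last_touch":
        credit[journey[-1]] = 1.0         # all credit to the final interaction
    elif model == "linear":
        share = 1.0 / len(journey)        # equal share per touchpoint (assumed weighting)
        for channel in journey:
            credit[channel] += share
    else:
        raise ValueError(f"unknown model: {model}")
    return dict(credit)


# The multi-touch journey from the example above
journey = ["Instagram", "Retargeting Ad", "Email"]
for name in ("first_touch", "last_touch", "linear"):
    print(name, attribute(journey, name))
```

The same journey yields very different pictures: first-touch credits only Instagram, last-touch credits only Email, and the linear rule gives each of the three channels one third.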

## Case Study: LuxeStyle Boutique

Let’s examine data from LuxeStyle Boutique, a direct-to-consumer fashion
brand that specializes in contemporary women’s clothing and accessories.
The company has both an online presence and several physical stores,
using multiple marketing channels to reach their fashion-forward target
audience. They’re particularly known for their sustainable practices and
size-inclusive collections.

LuxeStyle’s marketing mix includes:

-   Email campaigns featuring new collections, styling tips, and
    exclusive pre-sales
-   Strong social media presence on Instagram, Pinterest, and TikTok
-   Paid search advertising focusing on fashion-related keywords
-   Organic search optimization for style guides and trend content
-   Direct traffic from brand recognition and offline advertising
-   Influencer partnerships (tracked through direct traffic and special
    URLs)

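The dataset we’ll analyze contains 10,000 simulated customer journeys
modeled on the Spring/Summer 2024 collection launch. Each journey
records whether the customer was exposed to each marketing channel and
whether they ultimately made a purchase. Because the data holds
exposure flags rather than an ordered sequence of touchpoints, we
estimate each channel’s contribution with a regression model instead of
a first- or last-touch rule.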

## Creating the Dataset

First, let’s generate a synthetic dataset that mimics realistic fashion
e-commerce conversion rates:

In [None]:
import numpy as np
import pandas as pd

# Set random seed for reproducibility
np.random.seed(42)

# Generate a synthetic dataset of 10,000 customer journeys; each column is a 0/1 exposure flag
n_samples = 10000
data = {
    "Email": np.random.choice([0, 1], size=n_samples, p=[0.7, 0.3]),  # 30% email exposure
    "Social Media": np.random.choice([0, 1], size=n_samples, p=[0.4, 0.6]),  # 60% social media exposure
    "Paid Search": np.random.choice([0, 1], size=n_samples, p=[0.8, 0.2]),  # 20% paid search exposure
    "Organic Search": np.random.choice([0, 1], size=n_samples, p=[0.7, 0.3]),  # 30% organic search exposure
    "Direct": np.random.choice([0, 1], size=n_samples, p=[0.85, 0.15]),  # 15% direct traffic
}

# True effect of each channel on the log-odds of conversion, in column order:
# Email, Social Media, Paid Search, Organic Search, Direct (Social Media is largest)
true_coeffs = np.array([0.5, 0.8, 0.3, 0.2, 0.4])
intercept = -4.5  # Low baseline log-odds; gives an overall conversion rate of roughly 2-3%

# Calculate conversion probabilities
X = np.column_stack([data[key] for key in data])
logits = intercept + np.dot(X, true_coeffs)
probabilities = 1 / (1 + np.exp(-logits))
conversions = np.random.binomial(1, probabilities)

# Create final DataFrame
df = pd.DataFrame(data)
df["Converted"] = conversions

# Display conversion rate and channel exposure statistics
print("Conversion Rate Analysis:")
conversion_rate = df["Converted"].mean() * 100
print(f"\nOverall conversion rate: {conversion_rate:.2f}%")
print("\nChannel Exposure Rates:")
for channel in data:
    exposed_conv_rate = df.loc[df[channel] == 1, "Converted"].mean() * 100
    exposed_count = df[channel].sum()
    print(f"{channel}:")
    print(f"  - Exposure rate: {exposed_count/n_samples*100:.1f}%")
    print(f"  - Conversion rate when exposed: {exposed_conv_rate:.2f}%")
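
As a sanity check on the calibration, the conversion rate implied by the intercept and coefficients can be computed exactly rather than read off one random draw. The sketch below copies the exposure probabilities and coefficients from the cell above and enumerates all 32 exposure combinations, assuming (as the generator does) that channel exposures are independent. The variable names `exposure_probs` and `expected_rate` are introduced here for illustration.

```python
from itertools import product

import numpy as np

# Values copied from the data-generation cell
exposure_probs = np.array([0.3, 0.6, 0.2, 0.3, 0.15])  # Email, Social, Paid, Organic, Direct
true_coeffs = np.array([0.5, 0.8, 0.3, 0.2, 0.4])
intercept = -4.5

expected_rate = 0.0
total_weight = 0.0
for combo in product([0, 1], repeat=len(exposure_probs)):
    x = np.array(combo)
    # Probability of this exact exposure pattern under independent channels
    weight = np.prod(np.where(x == 1, exposure_probs, 1 - exposure_probs))
    # Conversion probability for this pattern from the logistic model
    p_convert = 1 / (1 + np.exp(-(intercept + x @ true_coeffs)))
    expected_rate += weight * p_convert
    total_weight += weight

print(f"Expected conversion rate: {expected_rate * 100:.2f}%")
```

The result falls between 2% and 3%, so the observed rate in any single simulated dataset should land in that range, give or take sampling noise.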

## Logistic Regression Analysis

Now, let’s analyze this dataset using logistic regression to estimate
the impact of each marketing channel on conversion:

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
import matplotlib.pyplot as plt

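# Split data into features (X) and target (y); stratify so the rare
# positive class is represented equally in the train and test sets
X = df.drop(columns=["Converted"])
y = df["Converted"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train logistic regression model (scikit-learn applies L2 regularization by default)
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict and evaluate accuracy. With so few conversions, the model predicts
# "no purchase" for nearly every journey, so a high accuracy says little here.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")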

# Get feature coefficients
coefficients = model.coef_[0]
feature_importance = pd.DataFrame({
    "Channel": X.columns,
    "Coefficient": coefficients,
    "Odds Ratio": np.exp(coefficients)
})
# Relative importance: each channel's share of the total absolute coefficient size
abs_coef = feature_importance["Coefficient"].abs()
feature_importance["Relative Importance (%)"] = abs_coef / abs_coef.sum() * 100
feature_importance = feature_importance.sort_values(by="Relative Importance (%)", ascending=False)

# Display detailed results
print("\nChannel Importance Analysis:")
print(feature_importance)
print("\nClassification Report:")
# zero_division=0 avoids warnings when no journey is predicted as a conversion
print(classification_report(y_test, y_pred, zero_division=0))

# Plot feature importance with fashion-oriented colors
plt.figure(figsize=(10,6))
plt.barh(feature_importance["Channel"], feature_importance["Relative Importance (%)"], 
         color='plum', edgecolor='purple')
plt.xlabel("Relative Importance (%)")
plt.ylabel("Marketing Channel")
plt.title(f"Marketing Channel Impact on Conversion\nOverall Conversion Rate: {conversion_rate:.2f}%")
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()
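
Because only a small share of journeys convert, accuracy is dominated by the majority class and can look excellent even when the model never predicts a purchase. The sketch below illustrates this on a freshly simulated dataset that reuses the same exposure rates and coefficients. It compares the model against scikit-learn's `DummyClassifier` (which always predicts the most frequent class) and reports ROC AUC on the predicted probabilities as a ranking-based alternative. The seed and variable names are arbitrary choices for this illustration, so the numbers will not match the cells above exactly.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Re-simulate journeys with the same exposure rates and true coefficients
rng = np.random.default_rng(0)
n = 10_000
exposure_probs = np.array([0.3, 0.6, 0.2, 0.3, 0.15])
X = (rng.random((n, 5)) < exposure_probs).astype(int)
logits = -4.5 + X @ np.array([0.5, 0.8, 0.3, 0.2, 0.4])
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Baseline that always predicts the majority class ("no purchase")
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
model = LogisticRegression().fit(X_tr, y_tr)

baseline_acc = accuracy_score(y_te, baseline.predict(X_te))
model_acc = accuracy_score(y_te, model.predict(X_te))
# AUC uses predicted probabilities, so it still measures how well journeys are ranked
model_auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

print(f"Baseline accuracy: {baseline_acc:.3f}")
print(f"Model accuracy:    {model_acc:.3f}")
print(f"Model ROC AUC:     {model_auc:.3f}")
```

If the two accuracy figures are nearly identical, the accuracy score is not telling you anything about the channels. For attribution purposes the coefficients and the probability ranking matter more than the hard 0/1 predictions.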

## Interpretation of Results for LuxeStyle Boutique

Our analysis of LuxeStyle’s simulated marketing data reflects realistic
e-commerce conversion rates and reveals several key insights:

1.  **Overall Conversion Rate**: The simulated data shows a conversion
    rate of roughly 2-3%, which aligns with typical fashion
    e-commerce benchmarks. This means that out of every 100 customer
    journeys, only 2 or 3 result in a purchase.

2.  **Channel Impact Analysis**:

    -   **Social Media Dominance**: Despite the low overall conversion
        rate, social media has the largest effect on conversion odds
        (true log-odds coefficient of 0.8 in the simulation)
    -   **Email Performance**: Email marketing has the second-largest
        effect (0.5), consistent with its use for retargeting and
        promotional campaigns
    -   **Direct Traffic**: Brand recognition and influencer
        partnerships show a moderate effect (0.4)
    -   **Search Channels**: Both paid (0.3) and organic (0.2) search
        contribute to conversions but with lower relative importance
    -   **Estimation Noise**: The fitted coefficients are estimates of
        these true values. With only a few hundred conversions in the
        data, the estimated ranking of channels with similar effects
        may differ from the true order

3.  **Channel Exposure Analysis**:

    -   Social media has the highest exposure rate (60% of customer
        journeys)
    -   Direct traffic has the lowest exposure rate (15%) but a moderate
        effect on conversion when it occurs
    -   Email and organic search each reach 30% of journeys, and paid
        search reaches 20%
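
To make the channel effects concrete, a log-odds coefficient can be converted into an odds ratio and into a change in conversion probability. The sketch below does this for the true coefficients used to generate the data, comparing a journey with no channel exposure against a journey exposed to exactly one channel. The `sigmoid` helper and the `effects` dictionary are names introduced for this illustration. Fitted coefficients from `model.coef_` could be substituted in the same way.

```python
import numpy as np


def sigmoid(z):
    """Map log-odds to a probability."""
    return 1 / (1 + np.exp(-z))


# True simulation parameters from the data-generation cell
intercept = -4.5
true_coeffs = {
    "Email": 0.5,
    "Social Media": 0.8,
    "Paid Search": 0.3,
    "Organic Search": 0.2,
    "Direct": 0.4,
}

baseline = sigmoid(intercept)  # conversion probability with no channel exposure
print(f"Baseline (no exposure): {baseline * 100:.2f}%")

effects = {}
for channel, beta in sorted(true_coeffs.items(), key=lambda kv: -kv[1]):
    odds_ratio = np.exp(beta)               # multiplicative change in conversion odds
    p_exposed = sigmoid(intercept + beta)   # probability when only this channel is seen
    effects[channel] = (odds_ratio, p_exposed)
    print(f"{channel:15s} odds ratio {odds_ratio:.2f}, "
          f"conversion probability {p_exposed * 100:.2f}%")
```

Under these parameters the baseline is about 1.1%, and exposure to social media alone raises it to about 2.4%. That is more than double in relative terms but still a small absolute probability, which is why relative importance and absolute conversion rates should be read together.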

### Recommendations for LuxeStyle

Based on these findings, we recommend:

1.  **Social Media Strategy**:
    -   Continue strong investment in social media, focusing on
        conversion optimization
    -   Develop more shoppable posts to reduce friction in the purchase
        journey
    -   Test different content types to identify what drives highest
        conversion rates
2.  **Email Marketing Optimization**:
    -   Focus on growing the email list given its strong conversion
        impact
    -   Implement abandoned cart and browse recovery emails
    -   Test timing and frequency of emails to maximize engagement
3.  **Multi-Channel Approach**:
    -   While maintaining focus on high-performing channels, keep a
        balanced presence across all channels
    -   Develop channel-specific content strategies
    -   Track and optimize cross-channel customer journeys
4.  **Conversion Rate Optimization**:
    -   Given the low overall conversion rate (roughly 2-3%), focus on both:
        1.  Increasing traffic from high-converting channels
        2.  Optimizing the conversion funnel to improve rates across all
            channels

This analysis provides LuxeStyle with actionable insights based on
simulated data calibrated to realistic e-commerce metrics, helping them
optimize their marketing strategy in the competitive fashion space.