# Marketing Campaign Optimization using Reinforcement Learning

This notebook demonstrates how to use simple reinforcement learning algorithms for the multi-armed bandit problem (epsilon-greedy and UCB) to optimize the selection of marketing campaigns. We'll simulate a scenario where we must choose between marketing campaigns with unknown conversion rates and learn which performs best through trial and error.

## Business Context

Marketing teams often face the challenge of allocating limited budget across different campaign strategies. Traditionally, this might involve:
- Running A/B tests for a fixed period
- Analyzing results after the test period
- Allocating the remaining budget to the winner

With a multi-armed bandit approach, we can:
- Continuously learn which campaigns perform best
- Gradually shift budget toward high-performing campaigns
- Continue exploring new options to adapt to changing conditions

This approach aims to increase the total number of conversions earned while the test is still running, and it keeps gathering information about every option along the way.

## 1. The Multi-Armed Bandit Problem

The multi-armed bandit problem takes its name from the casino slot machine (the "one-armed bandit"), imagined with several arms instead of one. Each arm, when pulled, pays a reward drawn from its own probability distribution, which is unknown to the player. The objective is to maximize the total reward collected over a given number of pulls.

In our marketing context:
- Each arm represents a different marketing campaign
- Pulling an arm means showing a campaign to a customer
- The reward is whether the customer converts (1) or not (0)
- Each campaign has an unknown true conversion rate

Let's set up our simulation with three different marketing campaigns with the following (hidden) conversion rates:
- Campaign 1: 20% conversion rate
- Campaign 2: 50% conversion rate
- Campaign 3: 70% conversion rate

In a real business scenario, we wouldn't know these rates in advance - we'd learn them through experimentation.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import random

# Set random seed for reproducibility
np.random.seed(42)
random.seed(42)

# True conversion rates for each campaign (unknown to the algorithm)
true_conversion_rates = [0.2, 0.5, 0.7]
num_campaigns = len(true_conversion_rates)

# Function to simulate a campaign result
def get_campaign_result(campaign_index):
    """Return 1 (conversion) or 0 (no conversion) based on the true conversion rate"""
    if random.random() < true_conversion_rates[campaign_index]:
        return 1  # Conversion
    else:
        return 0  # No conversion

# Let's test our function
test_results = [get_campaign_result(0) for _ in range(1000)]
print(f"Campaign 1 test conversion rate: {sum(test_results)/len(test_results):.3f} (should be close to {true_conversion_rates[0]})")
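
How close is "close"? As a rough sketch, assuming customers convert independently and using the normal approximation to the binomial, the standard error of an observed conversion rate is `sqrt(p * (1 - p) / n)`. The helper name `conversion_rate_interval` is our own and not part of the notebook.

```python
import math

def conversion_rate_interval(p_hat, n, z=1.96):
    """Approximate 95% interval for a conversion rate (normal approximation)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the observed rate
    return p_hat - z * se, p_hat + z * se, se

# Campaign 1: true rate 0.2, 1000 simulated customers
low, high, se = conversion_rate_interval(0.2, 1000)
print(f"standard error: {se:.4f}, interval: ({low:.3f}, {high:.3f})")
# prints: standard error: 0.0126, interval: (0.175, 0.225)
```

With 1,000 customers, an observed rate between roughly 0.175 and 0.225 is therefore consistent with a true rate of 0.2.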

## 2. Exploration vs. Exploitation

The key challenge in multi-armed bandit problems is balancing:

- **Exploration**: Trying different campaigns to learn their conversion rates
- **Exploitation**: Selecting the best-performing campaign based on current knowledge

If we exploit too early, we might miss better campaigns. If we explore too much, we waste opportunities on lower-performing campaigns.

### Random Strategy (Pure Exploration)

Let's first implement a purely random strategy that selects campaigns with equal probability, regardless of their performance. This represents pure exploration with no exploitation.

In [None]:
def random_strategy(num_simulations=1000):
    conversions = 0
    campaign_counts = np.zeros(num_campaigns)
    
    for _ in range(num_simulations):
        # Randomly select a campaign
        campaign = random.randint(0, num_campaigns-1)
        
        # Get result and update counts
        result = get_campaign_result(campaign)
        conversions += result
        campaign_counts[campaign] += 1
    
    print("Random Strategy Results:")
    print(f"Total Conversions: {conversions} out of {num_simulations} ({conversions/num_simulations:.3f})")
    print(f"Campaign Selection Counts: {campaign_counts}")
    return conversions, campaign_counts

random_conversions, random_counts = random_strategy(10000)

### Greedy Strategy (Pure Exploitation)

Now let's implement a purely greedy strategy that always chooses the campaign with the highest observed conversion rate. After trying each campaign once, it exploits whichever campaign currently looks best, so a single unlucky early result can lock it onto a sub-optimal campaign.

In [None]:
def greedy_strategy(num_simulations=1000):
    conversions = 0
    campaign_counts = np.zeros(num_campaigns)
    campaign_conversions = np.zeros(num_campaigns)
    
    # Initial exploration: try each campaign once
    for campaign in range(num_campaigns):
        result = get_campaign_result(campaign)
        conversions += result
        campaign_counts[campaign] += 1
        campaign_conversions[campaign] += result
    
    # Main simulation loop
    for _ in range(num_simulations - num_campaigns):
        # Calculate current conversion rates
        conversion_rates = campaign_conversions / campaign_counts
        
        # Select campaign with highest rate (np.argmax breaks ties in favor of the lowest index)
        campaign = np.argmax(conversion_rates)
        
        # Get result and update counts
        result = get_campaign_result(campaign)
        conversions += result
        campaign_counts[campaign] += 1
        campaign_conversions[campaign] += result
    
    print("Greedy Strategy Results:")
    print(f"Total Conversions: {conversions} out of {num_simulations} ({conversions/num_simulations:.3f})")
    print(f"Campaign Selection Counts: {campaign_counts}")
    print(f"Estimated Conversion Rates: {campaign_conversions/campaign_counts}")
    return conversions, campaign_counts, campaign_conversions

greedy_conversions, greedy_counts, greedy_campaign_conversions = greedy_strategy(10000)
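
A single greedy run can look fine or terrible depending on the first few customers. The sketch below repeats the same greedy rule over many independent runs and counts how often the most-shown campaign is not the true best one. It is an illustration under our own assumptions: the helper `greedy_most_pulled`, the run count, and the horizon are hypothetical, and it uses a local `random.Random` generator so the notebook's global seed is left untouched.

```python
import random

def greedy_most_pulled(rates, num_customers, rng):
    """Run one greedy simulation; return the index of the most-shown campaign."""
    k = len(rates)
    # Initial exploration: one customer per campaign
    counts = [1] * k
    wins = [1 if rng.random() < p else 0 for p in rates]
    for _ in range(num_customers - k):
        estimates = [w / c for w, c in zip(wins, counts)]
        arm = estimates.index(max(estimates))  # ties go to the lowest index, like np.argmax
        wins[arm] += 1 if rng.random() < rates[arm] else 0
        counts[arm] += 1
    return counts.index(max(counts))

rates = [0.2, 0.5, 0.7]        # same hidden rates as in the notebook
best = rates.index(max(rates))
rng = random.Random(0)         # local generator, independent of the global seed
runs = 300
stuck = sum(greedy_most_pulled(rates, 2000, rng) != best for _ in range(runs))
print(f"Greedy settled on a sub-optimal campaign in {stuck} of {runs} runs")
```

Whenever the best campaign fails to convert on its single trial customer (a 30% chance here), its estimate stays at zero and the greedy rule never shows it again, so a substantial share of runs end up stuck.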

## 3. Epsilon-Greedy Algorithm

The epsilon-greedy algorithm provides a simple way to balance exploration and exploitation:

- With probability ε (epsilon), select a random campaign (exploration)
- With probability 1-ε, select the campaign with the highest observed conversion rate (exploitation)

Let's implement this algorithm and test it with different values of epsilon.

In [None]:
def epsilon_greedy_strategy(epsilon, num_simulations=1000):
    conversions = 0
    campaign_counts = np.zeros(num_campaigns)
    campaign_conversions = np.zeros(num_campaigns)
    
    # For tracking learning progress
    history = []
    
    # Initial exploration: try each campaign once
    for campaign in range(num_campaigns):
        result = get_campaign_result(campaign)
        conversions += result
        campaign_counts[campaign] += 1
        campaign_conversions[campaign] += result
        
        # Track cumulative conversion rate
        history.append(conversions / (campaign + 1))
    
    # Main simulation loop
    for i in range(num_simulations - num_campaigns):
        # Epsilon-greedy action selection
        if random.random() < epsilon:
            # Exploration: choose random campaign
            campaign = random.randint(0, num_campaigns-1)
        else:
            # Exploitation: choose campaign with highest estimated conversion rate
            conversion_rates = campaign_conversions / campaign_counts
            campaign = np.argmax(conversion_rates)
        
        # Get result and update counts
        result = get_campaign_result(campaign)
        conversions += result
        campaign_counts[campaign] += 1
        campaign_conversions[campaign] += result
        
        # Track cumulative conversion rate
        history.append(conversions / (i + num_campaigns + 1))
    
    print(f"Epsilon-Greedy (ε={epsilon}) Results:")
    print(f"Total Conversions: {conversions} out of {num_simulations} ({conversions/num_simulations:.3f})")
    print(f"Campaign Selection Counts: {campaign_counts}")
    print(f"Estimated Conversion Rates: {campaign_conversions/campaign_counts}")
    return conversions, campaign_counts, campaign_conversions, history

# Test with different epsilon values
epsilons = [0.0, 0.1, 0.3, 0.5]
results = {}

for eps in epsilons:
    results[eps] = epsilon_greedy_strategy(eps, 10000)

## 4. Analyzing the Results

Let's visualize and compare the performance of different strategies:

In [None]:
# Prepare data for plotting
strategies = ['Random'] + [f'ε-Greedy (ε={eps})' for eps in epsilons]
conversion_rates = [random_conversions/10000] + [results[eps][0]/10000 for eps in epsilons]

# Create bar chart
plt.figure(figsize=(12, 6))
bars = plt.bar(strategies, conversion_rates, color=sns.color_palette("viridis", len(strategies)))
plt.axhline(y=max(true_conversion_rates), color='r', linestyle='--', alpha=0.7, label=f'Best Campaign ({max(true_conversion_rates):.2f})')
plt.axhline(y=sum(true_conversion_rates)/len(true_conversion_rates), color='gray', linestyle='--', alpha=0.7, 
            label=f'Average Campaign ({sum(true_conversion_rates)/len(true_conversion_rates):.2f})')

# Add value labels on top of bars
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height + 0.01, f'{height:.3f}', 
             ha='center', va='bottom', fontweight='bold')

plt.ylim(0, max(true_conversion_rates) + 0.1)
plt.ylabel('Overall Conversion Rate')
plt.xlabel('Strategy')
plt.title('Performance Comparison of Different Campaign Selection Strategies')
plt.legend()
plt.grid(axis='y', alpha=0.3)
plt.show()

In [None]:
# Plot campaign selection counts
selection_data = {
    'Random': random_counts,
}
for eps in epsilons:
    selection_data[f'ε-Greedy (ε={eps})'] = results[eps][1]

# Create DataFrame for plotting
selection_df = pd.DataFrame(selection_data, index=[f'Campaign {i+1} ({rate:.1f})' for i, rate in enumerate(true_conversion_rates)])

# Create grouped bar chart (stacked=False places the strategies side by side)
ax = selection_df.plot(kind='bar', stacked=False, figsize=(14, 7), 
                     color=sns.color_palette("viridis", len(selection_data)))

plt.title('Campaign Selection Distribution by Strategy')
plt.xlabel('Campaign (True Conversion Rate)')
plt.ylabel('Number of Times Selected')
plt.legend(title='Strategy')
plt.grid(axis='y', alpha=0.3)

# Add value labels to each bar
for container in ax.containers:
    ax.bar_label(container, fmt='%d', fontweight='bold')

plt.tight_layout()
plt.show()

In [None]:
# Plot learning curves for epsilon-greedy strategies
plt.figure(figsize=(14, 7))

for eps in epsilons:
    history = results[eps][3]
    plt.plot(history, label=f'ε-Greedy (ε={eps})')

plt.axhline(y=max(true_conversion_rates), color='r', linestyle='--', alpha=0.7, label=f'Best Campaign ({max(true_conversion_rates):.2f})')
plt.axhline(y=sum(true_conversion_rates)/len(true_conversion_rates), color='gray', linestyle='--', alpha=0.7, 
            label=f'Average Campaign ({sum(true_conversion_rates)/len(true_conversion_rates):.2f})')

plt.title('Learning Curve: Cumulative Conversion Rate Over Time')
plt.xlabel('Number of Customers')
plt.ylabel('Cumulative Conversion Rate')
plt.legend()
plt.grid(alpha=0.3)
plt.show()

## 5. Business Interpretation

Let's analyze these results from a business perspective:

1. **Random Strategy (Pure Exploration):**
   - Evenly distributes traffic across all campaigns
   - Achieves average performance (around 0.467, the mean of the three true conversion rates)
   - Provides good data about all campaigns, but sub-optimal overall performance
   - Business Use Case: Initial testing phase when you have no prior data

2. **Greedy Strategy (ε=0.0):**
   - Quickly commits to the campaign that appears best in early testing
   - Can get "stuck" with a sub-optimal campaign if early data is misleading
   - When it finds the best campaign, achieves high performance
   - Business Use Case: Short-term campaigns where quick optimization is needed

3. **Balanced Strategy (ε=0.1):**
   - Sends roughly 90% of traffic to the campaign currently estimated to be best and spreads the remaining 10% uniformly across all campaigns (including the current leader)
   - Balances learning with performance optimization
   - Usually finds the best campaign while maintaining strong overall performance
   - Business Use Case: Most marketing campaigns, especially ongoing ones

4. **High-Exploration Strategies (ε=0.3, ε=0.5):**
   - Allocate significant traffic to exploration
   - More likely to find the best campaign, but at the cost of overall performance
   - Business Use Case: When campaign performance is highly uncertain or environment changes rapidly
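
The cost of exploration can be worked out directly. As a sketch, assume the exploit step has already identified the true best campaign; the long-run conversion rate is then a weighted mix of the best rate and the average rate. The function names below are our own.

```python
def epsilon_greedy_ceiling(rates, epsilon):
    """Long-run conversion rate if the exploit step always picks the true best campaign."""
    best = max(rates)
    average = sum(rates) / len(rates)
    return (1 - epsilon) * best + epsilon * average

def best_campaign_share(num_campaigns, epsilon):
    """Long-run traffic share of the best campaign (exploit plus its slice of explore)."""
    return (1 - epsilon) + epsilon / num_campaigns

rates = [0.2, 0.5, 0.7]
for eps in [0.0, 0.1, 0.3, 0.5]:
    print(f"eps={eps}: ceiling={epsilon_greedy_ceiling(rates, eps):.3f}, "
          f"best share={best_campaign_share(len(rates), eps):.3f}")
```

For ε=0.1 this gives a ceiling of about 0.677 with roughly 93% of traffic on the best campaign; for ε=0.5 the ceiling drops to about 0.583. These are upper bounds: a real run also pays for the period before the best campaign is identified.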

### Key Business Insights:

- The optimal level of exploration (ε) depends on:
  - The lifecycle stage of your campaigns
  - How much performance variance exists between campaigns
  - How quickly customer preferences change
  - Your tolerance for short-term performance drops

- In our simulation, ε=0.1 provided a good balance, but real-world applications might require adjustment

- Consider decaying ε over time: start with higher exploration, then gradually reduce as you gain confidence in your estimates
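
One possible shape for such a schedule is a straight line from a starting value to a final value. This is only a sketch: the linear form, the function name `decayed_epsilon`, and the default endpoints are assumptions chosen for illustration, and exponential or inverse-time schedules are equally reasonable.

```python
def decayed_epsilon(step, total_steps, start_epsilon=0.5, end_epsilon=0.05):
    """Linearly interpolate epsilon from start_epsilon (step 0) to end_epsilon (last step)."""
    if total_steps <= 1:
        return end_epsilon
    # Fraction of the run completed, clamped to [0, 1]
    fraction = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start_epsilon + fraction * (end_epsilon - start_epsilon)

for step in (0, 2500, 5000, 9999):
    print(step, round(decayed_epsilon(step, 10000), 3))
```

Inside a simulation loop, the value returned for the current step would replace the fixed `epsilon` in the exploration check.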

## 6. Advanced Topics

While epsilon-greedy is a simple and effective approach, more sophisticated reinforcement learning techniques can provide additional benefits in marketing campaign optimization:

### 1. Upper Confidence Bound (UCB) Algorithm

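UCB balances exploration and exploitation by considering both the estimated value and the uncertainty of each option. It adds an exploration bonus to each campaign's estimated conversion rate; the bonus is larger for campaigns that have been shown less often, so under-sampled campaigns keep getting revisited until their estimates are reliable.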
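
Written out, the index that the UCB implementation in this notebook maximizes has the form below. The symbols are our own notation: $\hat{p}_j$ is campaign $j$'s observed conversion rate, $n_j$ the number of times it has been shown, $t$ the total number of customers seen so far, and $c$ an exploration weight. The classic UCB1 rule corresponds to $c = \sqrt{2}$.

```latex
\mathrm{UCB}_j(t) = \hat{p}_j + c \sqrt{\frac{\ln t}{n_j}}
```

The bonus term shrinks as $n_j$ grows and rises slowly with $t$, so a campaign that has been ignored for a long time eventually gets another look.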

### 2. Thompson Sampling

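Thompson sampling uses Bayesian methods to represent uncertainty about campaign performance. It maintains a posterior distribution over each campaign's conversion rate (a Beta distribution is the usual choice for binary conversions), draws one sample from each posterior, and shows the campaign with the highest sample. Campaigns with uncertain estimates occasionally produce high samples, so exploration tapers off naturally as evidence accumulates.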
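
A minimal sketch of Thompson sampling for binary conversions follows, assuming a uniform Beta(1, 1) prior on every campaign. The function name `thompson_sampling` and its signature are our own, and it uses a local `random.Random` generator rather than the notebook's global seed.

```python
import random

def thompson_sampling(rates, num_customers, seed=0):
    """Beta-Bernoulli Thompson sampling; returns (total conversions, pulls per campaign)."""
    rng = random.Random(seed)            # local generator, independent of the global seed
    k = len(rates)
    successes = [0] * k
    failures = [0] * k
    conversions = 0
    for _ in range(num_customers):
        # Draw one plausible conversion rate per campaign from its Beta(1 + s, 1 + f) posterior
        samples = [rng.betavariate(1 + successes[j], 1 + failures[j]) for j in range(k)]
        arm = samples.index(max(samples))
        if rng.random() < rates[arm]:    # simulated customer response
            successes[arm] += 1
            conversions += 1
        else:
            failures[arm] += 1
    pulls = [s + f for s, f in zip(successes, failures)]
    return conversions, pulls

conversions, pulls = thompson_sampling([0.2, 0.5, 0.7], 10000)
print(f"Conversions: {conversions}, pulls per campaign: {pulls}")
```

There is no exploration parameter to tune here: the width of each posterior decides how often a campaign is tried.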

### 3. Contextual Bandits

Contextual bandits extend multi-armed bandits by considering customer features (context) when selecting campaigns. This allows for personalization based on customer segments or attributes.

### 4. Non-Stationary Bandits

Non-stationary bandits handle environments where conversion rates change over time, such as seasonal variations or changing customer preferences.
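
A common building block for this setting is to replace the plain sample average with an exponential recency-weighted average, so that old observations fade. The sketch below shows only the estimate update; the function name and the step size of 0.05 are illustrative assumptions, and the reward sequence is made up.

```python
def update_estimate(old_estimate, reward, step_size=0.05):
    """Exponential recency-weighted average: recent rewards count more than old ones."""
    return old_estimate + step_size * (reward - old_estimate)

# Hypothetical campaign whose customers convert for 200 visits, then stop converting
estimate = 0.0
for reward in [1] * 200 + [0] * 200:
    estimate = update_estimate(estimate, reward)
print(f"Estimate after the shift: {estimate:.3f}")
```

The recency-weighted estimate ends close to zero, matching the campaign's current behavior, whereas a plain average over the same 400 rewards would still report 0.5.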

Let's implement a simple version of UCB to see how it compares to epsilon-greedy:

In [None]:
def ucb_strategy(num_simulations=1000, c=2.0):
    conversions = 0
    campaign_counts = np.zeros(num_campaigns)
    campaign_conversions = np.zeros(num_campaigns)
    history = []
    
    # Initial exploration: try each campaign once
    for campaign in range(num_campaigns):
        result = get_campaign_result(campaign)
        conversions += result
        campaign_counts[campaign] += 1
        campaign_conversions[campaign] += result
        history.append(conversions / (campaign + 1))
    
    # Main simulation loop
    for i in range(num_simulations - num_campaigns):
        # Calculate UCB values for each campaign. Every campaign was shown once
        # in the initial phase, so campaign_counts is never zero here.
        t = i + num_campaigns  # customers seen so far
        # Estimated conversion rate plus exploration bonus
        exploitation_term = campaign_conversions / campaign_counts
        exploration_term = c * np.sqrt(np.log(t) / campaign_counts)
        ucb_values = exploitation_term + exploration_term
        
        # Select campaign with highest UCB value
        campaign = np.argmax(ucb_values)
        
        # Get result and update counts
        result = get_campaign_result(campaign)
        conversions += result
        campaign_counts[campaign] += 1
        campaign_conversions[campaign] += result
        history.append(conversions / (t + 1))
    
    print(f"UCB Strategy (c={c}) Results:")
    print(f"Total Conversions: {conversions} out of {num_simulations} ({conversions/num_simulations:.3f})")
    print(f"Campaign Selection Counts: {campaign_counts}")
    print(f"Estimated Conversion Rates: {campaign_conversions/campaign_counts}")
    return conversions, campaign_counts, campaign_conversions, history

# Run UCB simulation
ucb_results = ucb_strategy(10000, c=1.0)

In [None]:
# Compare learning curves: UCB vs epsilon-greedy with ε=0.1
plt.figure(figsize=(14, 7))

plt.plot(ucb_results[3], label='UCB (c=1.0)', linewidth=2)
plt.plot(results[0.1][3], label='ε-Greedy (ε=0.1)', linewidth=2)

plt.axhline(y=max(true_conversion_rates), color='r', linestyle='--', alpha=0.7, label=f'Best Campaign ({max(true_conversion_rates):.2f})')
plt.axhline(y=sum(true_conversion_rates)/len(true_conversion_rates), color='gray', linestyle='--', alpha=0.7, 
            label=f'Average Campaign ({sum(true_conversion_rates)/len(true_conversion_rates):.2f})')

plt.title('Learning Curve: UCB vs Epsilon-Greedy')
plt.xlabel('Number of Customers')
plt.ylabel('Cumulative Conversion Rate')
plt.legend()
plt.grid(alpha=0.3)
plt.show()

## 7. Practical Implementation Considerations

When implementing reinforcement learning for marketing campaign optimization in a real business environment, consider these practical aspects:

### 1. Integration with Marketing Platforms

- **Web/Email Marketing:** Implement via A/B testing frameworks
- **Ad Platforms:** Use platform-specific targeting and optimization features
- **CRM Systems:** Update customer segments based on learning results

### 2. Evaluation Metrics

- **Primary Metrics:** Conversion rate, ROAS, CTR
- **Secondary Metrics:** Customer acquisition cost, customer lifetime value
- **Learning Metrics:** Exploration rate, confidence intervals, regret
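
Of these, regret is the least familiar: it is the expected number of conversions lost compared with always showing the best campaign. It can only be computed exactly in a simulation, where the true rates are known. A minimal sketch, with a function name and selection counts that are hypothetical rather than results from this notebook:

```python
def expected_regret(selection_counts, true_rates):
    """Expected conversions lost versus always showing the best campaign."""
    best = max(true_rates)
    return sum(n * (best - p) for n, p in zip(selection_counts, true_rates))

# Hypothetical selection counts for 10,000 customers (not results from this notebook)
counts = [400, 600, 9000]
print(f"Expected regret: {expected_regret(counts, [0.2, 0.5, 0.7]):.1f} conversions")
# prints: Expected regret: 320.0 conversions
```

Here the 400 customers sent to the 20% campaign cost 200 expected conversions and the 600 sent to the 50% campaign cost 120.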

### 3. Constraints and Considerations

- **Budget Allocation:** Minimum/maximum spend per campaign
- **Time Constraints:** Campaign deadlines, seasonal effects
- **Audience Targeting:** Different algorithms for different segments
- **Brand Guidelines:** Certain campaigns may have non-performance considerations

### 4. Implementation Strategy

1. **Start Small:** Begin with a limited campaign set
2. **Test Against Baseline:** Compare with traditional A/B testing
3. **Gradually Increase Scope:** Add more campaigns and features
4. **Monitor and Adjust:** Update parameters based on performance

### 5. Technical Implementation Options

- **Custom Solution:** Implement reinforcement learning algorithms directly
- **Marketing Platforms:** Some platforms offer built-in optimization
- **Third-Party Tools:** Specialized reinforcement learning platforms

## 8. Learning Challenge

Now it's your turn to experiment with multi-armed bandit algorithms for marketing campaign optimization. Try to complete the following tasks:

### Exercise 1: Implement a Decay Strategy

Create a modified epsilon-greedy algorithm where epsilon decreases over time. Start with epsilon=0.5 and gradually reduce it to 0.05.

### Exercise 2: Non-Stationary Environment

Modify the simulation to handle a non-stationary environment where the true conversion rates change halfway through the simulation. How does this affect the performance of different algorithms?

### Exercise 3: Contextual Bandits

Implement a simple contextual bandit where customers belong to one of two segments, and each segment has different conversion rates for the campaigns. How can you modify the algorithms to account for this additional information?

In [None]:
# Exercise 1: Epsilon-greedy with decay
def epsilon_greedy_with_decay(start_epsilon=0.5, end_epsilon=0.05, num_simulations=1000):
    # YOUR CODE HERE
    pass

In [None]:
# Exercise 2: Non-stationary environment
def non_stationary_simulation():
    # YOUR CODE HERE
    pass

In [None]:
# Exercise 3: Contextual bandits
def contextual_bandit_simulation():
    # YOUR CODE HERE
    pass

## Conclusion

Multi-armed bandit algorithms provide a powerful framework for optimizing marketing campaign selection. By balancing exploration and exploitation, they can often earn more conversions during the learning period than a fixed-split A/B test, especially for ongoing campaigns.

Key takeaways from this notebook:

1. **Balancing exploration and exploitation is critical:**
   - Too little exploration: May miss better campaigns
   - Too much exploration: Wastes resources on inferior campaigns

2. **Algorithm selection depends on business context:**
   - Epsilon-greedy: Simple and effective for most applications
   - UCB: Better theoretical guarantees, more sophisticated exploration
   - Contextual bandits: When customer segments respond differently

3. **Business implementation considerations:**
   - Integration with existing marketing platforms
   - Adaptation to changing conditions
   - Balancing multiple business objectives

By applying these reinforcement learning techniques, marketing teams can continuously optimize their campaigns, leading to higher conversion rates, better resource allocation, and improved ROI.