# Incentive-Based Retention Strategy Analysis

## Section 1: Problem Statement & Objective

The aim of this section is to clearly define the assignment's objective and modeling challenge.

<br>

**Assignment Review**

The organization runs an 8-month training program with learners dropping off significantly around **Month 3**. To address this, the team is exploring a rewards-based incentive — specifically, giving learners a **1-month free training** (worth **\$5**) to keep them from leaving.

I'm tasked with simulating and comparing **three scenarios**:

1. **Baseline**: No incentive is offered. I expect to see the natural drop-off trend continue.
2. **Scenario 1 – Intervention, No Effect**: An incentive is offered, but it doesn’t improve retention (i.e., people redeem but still drop out).
3. **Scenario 2 – Intervention, Improved Retention**: The incentive is effective — learners stay longer, and drop-off reduces.

The goal is to help the team understand the **financial trade-offs** (revenue gained, cost incurred, net outcome) and **make a confident decision** based on data.

<br>

**Noteworthy**

This is not just about predicting drop-offs. It's about simulating realistic learner behavior under different assumptions, tracking how incentives might influence outcomes, and quantifying the **net business impact**.

To do this well, I’ll simulate learner retention monthly and calculate:

* **Total revenue** per scenario
* **Incentive cost** (based on redemption behavior and per-learner value)
* **Net revenue** after accounting for costs
* **Retention improvement**:

  * **Absolute**: how many more learners stayed till the end
  * **Percentage**: what percent improvement this represents over baseline
* **Break-even learners needed**: how many learners must be retained to offset incentive costs

This will allow me to clearly show whether the incentive is **financially worthwhile** and under what conditions.

<br>

**Next Steps**

I’ll now start by setting up the core simulation logic using Python. This includes:

* A list of parameters (e.g., initial learners, drop rates)
* A function that models how learners drop off — and how incentives might improve retention
* Side-by-side comparison of all three scenarios



## 📘 Section 2: Parameter Setup & Scenario Simulation


**Action**

In this section, I'll define the inputs (variables) needed for the simulations and create a reusable Python function to model learner retention under different scenarios.


To model learner behavior, I need to track how many learners remain active each month. The simulation will depend on key variables, including:

* How many learners we start with
* What percent drop off each month
* When the incentive is introduced
* How effective the incentive is at reducing drop-off
* What percent of eligible learners redeem the incentive
* Whether those who redeem are assumed to stay till the end

I’ll write a function that allows me to tweak these assumptions easily and simulate all three scenarios.
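The assumptions listed above can also be bundled into one object, so that each scenario is a single value instead of a set of globals. This is a minimal sketch that assumes a hypothetical frozen dataclass `ScenarioParams`. Its defaults mirror the Section 2 parameters, and its `drop_rate()` method reproduces the incentive-month adjustment that `simulate_scenario()` applies.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScenarioParams:
    """Hypothetical bundle of model assumptions; defaults mirror Section 2."""
    initial_learners: int = 1000
    duration_months: int = 8
    revenue_per_month: float = 5.0
    incentive_cost: float = 5.0
    incentive_month: int = 3           # 1-based
    effectiveness_pct: float = 0.0     # % reduction in drop-off for redeemers
    redeem_rate_pct: float = 0.0       # % of at-risk learners who redeem
    monthly_drop_rates: tuple = (0.05, 0.05, 0.25, 0.10, 0.10, 0.10, 0.10, 0.10)

    def drop_rate(self, month_index):
        """Drop-off rate for a 0-based month index, reduced only in the incentive month."""
        base = self.monthly_drop_rates[month_index]
        if month_index == self.incentive_month - 1:
            reduction = (self.effectiveness_pct / 100) * (self.redeem_rate_pct / 100)
            return base * (1 - reduction)
        return base


baseline = ScenarioParams()
scenario2 = replace(baseline, effectiveness_pct=100, redeem_rate_pct=70)
print(baseline.drop_rate(2), scenario2.drop_rate(2))
```

`dataclasses.replace` returns a modified copy. Scenario 2 is therefore the baseline with two fields changed, and the original parameters cannot be mutated by accident.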

<br>

**Noteworthy**

>This function is the **engine** of the analysis. It takes in assumptions and gives back a list showing how many learners are active each month. By comparing these lists across different scenarios, I can analyze how incentives affect learner retention and revenue.

<br>

**Next Steps**

Now that I’ve set up the logic, I’ll use this function to simulate:

* Baseline scenario (no incentive)
* Scenario 1 (incentive offered, no impact)
* Scenario 2 (incentive offered, effective)

I’ll walk through each of them next.

```python
# Dependencies
import pandas as pd
import plotly.graph_objects as go
from IPython.display import display  # built in inside Jupyter; imported here for clarity
```

```python
# MODEL PARAMETERS
# These are the global assumptions that apply to all scenarios

# Number of learners at the start of the program
initial_learners = 1000

# Total number of months in the program
duration_months = 8

# Monthly revenue earned per learner
revenue_per_month = 5.0  # dollars

# Incentive cost for each learner who redeems the reward
incentive_cost = 5.0  # dollars

# If True, learners who redeem are assumed to stay till end of program.
# Note: simulate_scenario() below does not currently read this flag.
redeemers_stay_full = True

# Month the incentive is introduced (e.g., Month 3)
incentive_month = 3  # 1-based index

# Organic monthly drop-off rates (as decimals, e.g., 0.10 = 10% dropout).
# Entry i is applied between the start of month i+1 and month i+2, so the
# Month 8 rate is never used: counts are recorded at the start of each month.
monthly_drop_rates = [0.05, 0.05, 0.25, 0.10, 0.10, 0.10, 0.10, 0.10]
```


```python
def simulate_scenario(effectiveness_pct, redeem_rate_pct):
    """
    Simulates learner retention over time. The incentive can only reduce
    drop-off (never add learners), and only in the incentive month.

    Parameters:
    -----------
    effectiveness_pct : float
        % reduction in drop-off for learners who redeem the incentive.

    redeem_rate_pct : float
        % of at-risk learners who redeem the incentive.

    The incentive-month drop rate is multiplied by
    (1 - effectiveness_pct/100 * redeem_rate_pct/100).

    Returns:
    --------
    learners : list of float
        Active learner count at the start of each month (duration_months values).
    """
    learners = [initial_learners]

    for month in range(1, duration_months):
        prev = learners[-1]

        # 0-based index of the month that just ended
        drop_index = month - 1
        base_drop_rate = monthly_drop_rates[drop_index]

        # Reduce the drop rate only in the incentive month
        if drop_index == incentive_month - 1:
            drop_reduction = (effectiveness_pct / 100) * (redeem_rate_pct / 100)
            adjusted_drop_rate = base_drop_rate * (1 - drop_reduction)
        else:
            adjusted_drop_rate = base_drop_rate

        # Apply drop-off to get next month's starting count
        retained = prev * (1 - adjusted_drop_rate)
        learners.append(retained)

    return learners
```
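`simulate_scenario()` does not read `redeemers_stay_full`. After the incentive month, redeemers are subject to the same 10% monthly drop-off as everyone else. If the "redeemers stay till the end" assumption is wanted, one option is to track saved redeemers as a separate cohort that never drops. This is a minimal sketch of that variant; the function name `simulate_protected_redeemers` is hypothetical, and its defaults copy the Section 2 parameters.

```python
def simulate_protected_redeemers(effectiveness_pct, redeem_rate_pct,
                                 initial_learners=1000, incentive_month=3,
                                 monthly_drop_rates=(0.05, 0.05, 0.25, 0.10,
                                                     0.10, 0.10, 0.10, 0.10)):
    """Hypothetical variant: learners saved by the incentive never drop again."""
    regular = float(initial_learners)  # learners exposed to normal drop-off
    protected = 0.0                    # redeemers the incentive saved
    learners = [regular]               # start-of-month totals

    for month_index in range(len(monthly_drop_rates) - 1):
        leaving = regular * monthly_drop_rates[month_index]
        if month_index == incentive_month - 1:
            # Share of would-be leavers who redeem AND are kept by the incentive
            saved = leaving * (redeem_rate_pct / 100) * (effectiveness_pct / 100)
            protected += saved
        regular -= leaving
        learners.append(regular + protected)

    return learners


path = simulate_protected_redeemers(effectiveness_pct=100, redeem_rate_pct=70)
print([round(x, 2) for x in path])
```

Month 4 matches `simulate_scenario()` (about 834.81 learners). Because the saved cohort no longer shrinks, this sketch ends near 602 learners instead of about 548, so the choice of assumption visibly changes the Scenario 2 result.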


## Section 3: Scenario Simulations

<br>

**Action**

Now that I’ve created the simulation function, I’ll use it to model three distinct scenarios — each with its own assumptions about how the incentive affects learner retention.


The assignment requires us to analyze:

1. **Baseline**: No incentive is offered. We expect to see regular drop-off rates apply.
2. **Scenario 1 – Intervention, No Effect**: An incentive is offered, but it doesn’t improve retention. People redeem it, but it doesn't prevent them from dropping out.
3. **Scenario 2 – Intervention, Improved Retention**: The incentive works — those who redeem are less likely to drop out, and a meaningful retention uplift is observed.

Each scenario will return a list of monthly learner counts, which we’ll later use to compare visually and financially.

<br>

**Noteworthy**

>These simulations allow me to quantify how many learners are retained over time under different assumptions. The results will directly feed into both the visualizations and financial impact calculations in the next sections.

<br>

**Next Steps**

Proceed with simulating each scenario using the reusable `simulate_scenario()` function, generate insights and store the results for visualisation and analysis.



```python
# Simulate All Three Scenarios

# 1. Baseline: No incentive, no redemption, no retention improvement
baseline_learners = simulate_scenario(effectiveness_pct=0, redeem_rate_pct=0)

# 2. Scenario 1: Incentive offered, 50% redeem, but no retention impact
scenario1_learners = simulate_scenario(effectiveness_pct=0, redeem_rate_pct=50)

# 3. Scenario 2: Incentive offered, 70% redeem, 100% effective for redeemers
scenario2_learners = simulate_scenario(effectiveness_pct=100, redeem_rate_pct=70)
```
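The three calls differ only in two percentages, and those percentages matter only in Month 3, where they scale the 25% drop-off rate. This sketch isolates that one step. The helper name `adjusted_drop_rate` is hypothetical, but it uses the same formula as the body of `simulate_scenario()`.

```python
def adjusted_drop_rate(base_rate, effectiveness_pct, redeem_rate_pct):
    """Incentive-month drop-off rate, same formula as inside simulate_scenario()."""
    reduction = (effectiveness_pct / 100) * (redeem_rate_pct / 100)
    return base_rate * (1 - reduction)


MONTH3_BASE_RATE = 0.25  # Month 3 entry of monthly_drop_rates

scenarios = {
    "Baseline": (0, 0),
    "Scenario 1": (0, 50),
    "Scenario 2": (100, 70),
}
rates = {name: adjusted_drop_rate(MONTH3_BASE_RATE, eff, red)
         for name, (eff, red) in scenarios.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%} Month 3 drop-off")
```

With 0% effectiveness, the redemption rate drops out of the product. This is why Scenario 1 reproduces the baseline path exactly.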


```python
print("Baseline Retention:", baseline_learners)
print("Scenario 1 Retention:", scenario1_learners)
print("Scenario 2 Retention:", scenario2_learners)
```

```
Baseline Retention: [1000, 950.0, 902.5, 676.875, 609.1875, 548.2687500000001, 493.4418750000001, 444.0976875000001]
Scenario 1 Retention: [1000, 950.0, 902.5, 676.875, 609.1875, 548.2687500000001, 493.4418750000001, 444.0976875000001]
Scenario 2 Retention: [1000, 950.0, 902.5, 834.8125, 751.3312500000001, 676.1981250000001, 608.5783125000002, 547.7204812500001]
```


### Section 3.1: Retention Insights from Simulations



I printed out the number of active learners at the start of each month for each of the three scenarios.

```python
print("Baseline Retention:", baseline_learners)
print("Scenario 1 Retention:", scenario1_learners)
print("Scenario 2 Retention:", scenario2_learners)
```

<br>

**Insights**

Below are the monthly retention numbers across the 8-month program:

```
Baseline Retention  : [1000, 950.0, 902.5, 676.875, 609.19, 548.27, 493.44, 444.10]
Scenario 1 Retention: [1000, 950.0, 902.5, 676.875, 609.19, 548.27, 493.44, 444.10]
Scenario 2 Retention: [1000, 950.0, 902.5, 834.81, 751.33, 676.20, 608.58, 547.72]
```

<br>

**Review**

* All three scenarios start with the same number of learners: **1,000**.
* For the **first three months** (1,000 → 950 → 902.5), all three paths are identical; they diverge only after the Month 3 drop.
* A **steep drop** happens in **Month 3** (25% drop-off rate) — this is our focus point.

<br>

**Observation**

* **Baseline** and **Scenario 1** follow the exact same retention path. This holds by construction: with 0% effectiveness, the reward does not change the drop-off rate, so **offering a reward alone (Scenario 1) has no effect** on retention if it doesn't change learner behavior.
* In **Scenario 2**, where the incentive is actually effective, we observe:

  * A **much smaller drop** after Month 3: the drop-off rate falls from 25% to 7.5%, so Month 4 starts with **\~835** learners instead of **\~677**.
  * **Higher retention every month thereafter**, ending at **\~548 learners** vs. **\~444** in the other two scenarios.
  * That's a **23.3% improvement** (about 104 learners) in final retention from offering a working incentive.

  Scenario 2 is based on the following assumptions:
    * **Effectiveness is set to 100% and redemption to 70%**, so the Month 3 drop-off rate is reduced by 100% × 70% = 70%, from 25% to 7.5%. In effect, the redeemers among the at-risk learners are retained.
    * After Month 3, all learners, including redeemers, face the same 10% monthly drop-off. The `redeemers_stay_full` flag defined in Section 2 is not read by `simulate_scenario()`, so redeemers are **not** modeled as staying for the rest of the program.
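The headline uplift can be checked directly from the printed output. This sketch copies the final-month counts from the simulation output and computes the absolute and percentage gain. The variable names are illustrative only.

```python
# Final start-of-month counts copied from the printed simulation output
baseline_final = 444.0976875000001
scenario2_final = 547.7204812500001

gain = scenario2_final - baseline_final
gain_pct = 100 * gain / baseline_final
print(f"Gain: {gain:.1f} learners ({gain_pct:.1f}%)")
```

Both paths face the same 10% monthly drop-off after Month 3. As a result, the 23.3% gap opened in Month 4 (834.81 vs. 676.875) carries through unchanged to Month 8.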

<br>

**Recommendation**

Now that we’ve confirmed our simulation behaves as expected, we should move on to **visualizing these trends** to make them even clearer — especially for decision-makers who prefer graphs over numbers.


## Section 4: Visualizing Retention Over Time


In this section, we'll turn the retention data from all three scenarios into a clear line graph. This makes it easier to compare how learners are retained month by month.


While the previous section showed retention in numbers, visualizing it reveals:

* **Timing of drop-offs**
* **Differences in slope** (i.e., how fast learners leave)
* **Impact of the incentive visually**, particularly for Scenario 2, whose Month 3 drop is much shallower and whose curve stays above the others from Month 4 onward

I’ll use Plotly to plot the three retention paths on the same chart.

<br>

**Noteworthy**

>A line chart is ideal here because:
>
>* It shows **continuous trends over time**
>* It helps communicate **relative differences at a glance**
>* It's accessible to both technical and non-technical stakeholders

<br>

**Next**

Let’s create a clean plot with distinct colors and labels, so it’s boardroom-ready and intuitive for any viewer.

```python
import plotly.graph_objects as go

# Month numbers for the x-axis (1..duration_months)
months = list(range(1, duration_months + 1))

# Plot the three retention curves on one chart
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=months,
    y=baseline_learners,
    mode='lines+markers',
    name='Baseline',
    line=dict(color='#6baed6', width=3)
))

fig.add_trace(go.Scatter(
    x=months,
    y=scenario1_learners,
    mode='lines+markers',
    name='Scenario 1 (No Effect)',
    line=dict(color='#fc9272', dash='dash', width=3)
))

fig.add_trace(go.Scatter(
    x=months,
    y=scenario2_learners,
    mode='lines+markers',
    name='Scenario 2 (Improved Retention)',
    line=dict(color='#74c476', width=3)
))

fig.update_layout(
    title='📉 Learner Retention Over Time (Post Month-3 Drop)',
    xaxis_title='Month',
    yaxis_title='Active Learners',
    xaxis=dict(tickmode='linear'),
    height=500,
    legend=dict(orientation='h', yanchor='bottom', y=-0.3, xanchor='center', x=0.5),
    template='plotly_white'
)

fig.show()
```
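The "differences in slope" the chart conveys can also be read off as numbers. This sketch computes the learners lost between consecutive month starts for the baseline and Scenario 2, using the start-of-month counts copied from the Section 3 output. `monthly_losses` is a hypothetical helper.

```python
# Start-of-month counts copied from the printed simulation output in Section 3
baseline = [1000, 950.0, 902.5, 676.875, 609.1875, 548.2687500000001,
            493.4418750000001, 444.0976875000001]
scenario2 = [1000, 950.0, 902.5, 834.8125, 751.3312500000001,
             676.1981250000001, 608.5783125000002, 547.7204812500001]


def monthly_losses(path):
    """Learners lost between consecutive month starts (the slope of each curve)."""
    return [prev - cur for prev, cur in zip(path, path[1:])]


for name, path in [("Baseline", baseline), ("Scenario 2", scenario2)]:
    print(name, [round(x, 1) for x in monthly_losses(path)])
```

The Month 3 loss falls from about 226 to about 68 learners. After that, Scenario 2 loses slightly more learners per month in absolute terms, simply because more learners are exposed to the same 10% rate.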


## Section 5: Financial Summary & Impact Analysis

In this section, we will compute the financial impact of each scenario by translating retention into:
- Monthly **revenue**
- Total **incentive cost**
- Overall **net revenue**
- Retention **gain (absolute & %)**
- **Break-even learners needed** to justify the incentive

<br>

**Noteworthy**

Each retained learner contributes `$5/month`. The incentive costs $5 per learner who redeems it. Redeemers are the learners at risk of dropping out in Month 3 multiplied by the redemption rate.

Only Scenarios 1 and 2 incur a cost; only Scenario 2 yields a retention gain.
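The arithmetic in `compute_financials()` can be written out as formulas. The notation is introduced here for convenience only: $L_m$ is the number of learners at the start of month $m$, $L^{\text{base}}_m$ the baseline count, $r$ = `revenue_per_month`, $c$ = `incentive_cost`, $p$ the redemption rate, $t$ = `incentive_month`, $d_t$ the drop-off rate in that month, and $M$ = `duration_months`.

$$
\begin{aligned}
\text{Total revenue} &= r \sum_{m=1}^{M} L_m \\
\text{Incentive cost} &= c \cdot p \cdot d_t L_t \\
\text{Net revenue} &= \text{Total revenue} - \text{Incentive cost} \\
\text{Retention gain} &= L_M - L^{\text{base}}_M \\
\text{Retention gain (\%)} &= 100 \cdot \frac{L_M - L^{\text{base}}_M}{L^{\text{base}}_M} \\
\text{Break-even learners} &= \frac{\text{Incentive cost}}{r\,(M - t)}
\end{aligned}
$$

For Scenario 2, the cost is 5 × 0.70 × 0.25 × 902.5 ≈ 789.69 dollars, and break-even is 789.69 / (5 × 5) ≈ 31.6, which rounds to 32. The break-even formula assumes that each retained learner pays for all $M - t$ remaining months. The model still applies a 10% monthly drop-off after Month 3, so this is an optimistic (low) estimate.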


**Next**

Let's create the financial summary code

```python
def compute_financials(name, learners, effectiveness_pct, redeem_rate_pct):
    """
    Calculates financial outcomes for a given retention path.
    Incentive cost is applied if redemption > 0, regardless of effectiveness
    (effectiveness_pct is accepted for symmetry but not used here).
    """
    total_revenue = sum(count * revenue_per_month for count in learners)

    # Redeemers = at-risk learners in the incentive month x redemption rate
    drop_idx = incentive_month - 1
    eligible = learners[drop_idx] * monthly_drop_rates[drop_idx]
    redeemers = eligible * (redeem_rate_pct / 100)

    # Apply cost based on redemption, not effectiveness
    cost = redeemers * incentive_cost if redeem_rate_pct > 0 else 0

    net_revenue = total_revenue - cost
    final_gain = learners[-1] - baseline_learners[-1]
    gain_pct = (final_gain / baseline_learners[-1]) * 100 if baseline_learners[-1] > 0 else 0

    # Break-even assumes each retained learner pays for every remaining month
    remaining_months = duration_months - incentive_month
    break_even = cost / (revenue_per_month * remaining_months) if remaining_months > 0 else 0

    return {
        "Scenario": name,
        "Total Revenue": f"${total_revenue:,.2f}",
        "Incentive Cost": f"${cost:,.2f}",
        "Net Revenue": f"${net_revenue:,.2f}",
        "Retention Gain": f"{final_gain:.0f} learners",
        "Retention Gain (%)": f"{gain_pct:.1f}%",
        "Break-Even Learners Needed": f"{break_even:.0f}"
    }
```
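`compute_financials()` returns formatted strings such as `"$30,566.02"`, which is why Section 6 has to strip `$` and `,` with a regex before sorting. One alternative is to keep the results numeric and format them only for display. This is a sketch with a hypothetical `compute_financials_numeric`. It uses the same arithmetic, takes its parameters as arguments instead of globals, and uses the start-of-month counts copied from the Section 3 output.

```python
def compute_financials_numeric(learners, baseline, redeem_rate_pct,
                               revenue_per_month=5.0, incentive_cost=5.0,
                               incentive_month=3,
                               monthly_drop_rates=(0.05, 0.05, 0.25, 0.10,
                                                   0.10, 0.10, 0.10, 0.10)):
    """Hypothetical numeric twin of compute_financials(): same arithmetic, float outputs."""
    total_revenue = sum(learners) * revenue_per_month

    # Redeemers = learners at risk in the incentive month x redemption rate
    idx = incentive_month - 1
    redeemers = learners[idx] * monthly_drop_rates[idx] * (redeem_rate_pct / 100)
    cost = redeemers * incentive_cost

    gain = learners[-1] - baseline[-1]
    remaining = len(learners) - incentive_month  # months left after the incentive month
    return {
        "total_revenue": total_revenue,
        "incentive_cost": cost,
        "net_revenue": total_revenue - cost,
        "retention_gain": gain,
        "retention_gain_pct": 100 * gain / baseline[-1] if baseline[-1] > 0 else 0.0,
        "break_even_learners": (cost / (revenue_per_month * remaining)
                                if remaining > 0 else 0.0),
    }


# Start-of-month counts copied from the printed simulation output in Section 3
baseline = [1000, 950.0, 902.5, 676.875, 609.1875, 548.2687500000001,
            493.4418750000001, 444.0976875000001]
scenario2 = [1000, 950.0, 902.5, 834.8125, 751.3312500000001,
             676.1981250000001, 608.5783125000002, 547.7204812500001]

results = {
    "Baseline": compute_financials_numeric(baseline, baseline, redeem_rate_pct=0),
    "Scenario 2": compute_financials_numeric(scenario2, baseline, redeem_rate_pct=70),
}
best = max(results, key=lambda name: results[name]["net_revenue"])
print(best, f"${results[best]['net_revenue']:,.2f}")
```

Ranking then needs only `max` (or `DataFrame.sort_values` on a numeric column). Currency formatting can be applied at the very end, for example through the pandas Styler that Section 6 already uses.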


Calculating all scenarios

```python
summary = pd.DataFrame([
    compute_financials("Baseline", baseline_learners, 0, 0),
    compute_financials("Scenario 1", scenario1_learners, 0, 50),
    compute_financials("Scenario 2", scenario2_learners, 100, 70)
])
summary
```

| Scenario | Total Revenue | Incentive Cost | Net Revenue | Retention Gain | Retention Gain (%) | Break-Even Learners Needed |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline | \$28,121.85 | \$0.00 | \$28,121.85 | 0 learners | 0.0% | 0 |
| Scenario 1 | \$28,121.85 | \$564.06 | \$27,557.79 | 0 learners | 0.0% | 23 |
| Scenario 2 | \$31,355.70 | \$789.69 | \$30,566.02 | 104 learners | 23.3% | 32 |


**Insight**

>This table shows each scenario's bottom line: revenue, cost, uplift, and whether the incentive pays off. Scenario 2 yields the highest net revenue, and its \~104-learner retention gain clears the 32-learner break-even, which gives us a **quantifiable business case** to support implementation.


## Section 6: Executive Summary Table & Recommendation

In this section, we will display the financial summary and generate a recommendation based on the scenario with the highest net revenue.

This table will highlight each scenario's performance:
- **Total Revenue** generated over the program.
- **Incentive Cost** (if any)
- **Net Revenue** after cost.
- **Retention Gain** in learner count and percent.
- **Break-even learners** needed to cover the cost.

Finally, we will also automatically recommend the scenario with the **strongest business case**.
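An automatic recommendation comes down to ranking scenarios by net revenue and turning the winner's figures into a sentence. This is a minimal sketch with a hypothetical `build_recommendation` helper. It takes plain dicts keyed by scenario name and fills them with the values from the Section 5 summary table.

```python
def build_recommendation(net_revenue, gain_pct, break_even):
    """Pick the highest-net-revenue scenario and phrase it as a recommendation.

    All three arguments are dicts keyed by scenario name (hypothetical structure).
    """
    ranked = sorted(net_revenue, key=net_revenue.get, reverse=True)
    best, worst = ranked[0], ranked[-1]
    return (
        f"Adopt {best}: highest net revenue of ${net_revenue[best]:,.2f}, "
        f"retention uplift of {gain_pct[best]:.1f}%, "
        f"break-even at {break_even[best]:.0f} retained learners. "
        f"Lowest: {worst} at ${net_revenue[worst]:,.2f}."
    )


# Values taken from the Section 5 summary table
net_revenue = {"Baseline": 28121.85, "Scenario 1": 27557.79, "Scenario 2": 30566.02}
gain_pct = {"Baseline": 0.0, "Scenario 1": 0.0, "Scenario 2": 23.3}
break_even = {"Baseline": 0, "Scenario 1": 23, "Scenario 2": 32}

print(build_recommendation(net_revenue, gain_pct, break_even))
```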

**Next**

Let's create a table that displays the executive summary.

```python
# Reuse financial results from Section 5
summary_df = summary.copy()

# Display summary
display(summary_df.style.set_caption("💼 Executive Summary: Scenario Comparison").format(na_rep="-"))
```

*💼 Executive Summary: Scenario Comparison*

| Scenario | Total Revenue | Incentive Cost | Net Revenue | Retention Gain | Retention Gain (%) | Break-Even Learners Needed |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline | \$28,121.85 | \$0.00 | \$28,121.85 | 0 learners | 0.0% | 0 |
| Scenario 1 | \$28,121.85 | \$564.06 | \$27,557.79 | 0 learners | 0.0% | 23 |
| Scenario 2 | \$31,355.70 | \$789.69 | \$30,566.02 | 104 learners | 23.3% | 32 |


```python
# Convert the formatted "Net Revenue" strings back to numbers for sorting
summary_numeric = summary_df.copy()
summary_numeric["Net Revenue ($)"] = summary_numeric["Net Revenue"].replace(r'[\$,]', '', regex=True).astype(float)
summary_numeric = summary_numeric.sort_values("Net Revenue ($)", ascending=False)

# Identify best and worst scenarios
best = summary_numeric.iloc[0]
worst = summary_numeric.iloc[-1]

print(f"Best Scenario: {best['Scenario']} with Net Revenue of {best['Net Revenue']}")
print(f"Worst Scenario: {worst['Scenario']} with Net Revenue of {worst['Net Revenue']}")
```

```
Best Scenario: Scenario 2 with Net Revenue of $30,566.02
Worst Scenario: Scenario 1 with Net Revenue of $27,557.79
```


**Generated Recommendation**

Adopt **Scenario 2**: it delivers the highest net revenue, **\$30,566.02**, with a retention uplift of **23.3%**.

Breaking even requires retaining at least **32** additional learners; the model retains about **104** more than the baseline by Month 8.

Compared with **Scenario 1**, which yields only **\$27,557.79**, and the **Baseline** at **\$28,121.85**, the recommended option is the only one that improves on doing nothing, adding **\$2,444.17** in net revenue over the baseline.
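The relevant counterfactual for Scenario 2 is the baseline, since that is what happens if no incentive is offered. This sketch computes the incremental figures from the Section 5 summary table. It treats the baseline's net revenue as equal to its revenue, because the baseline has no incentive cost.

```python
# Figures from the Section 5 summary table (USD)
baseline_net = 28121.85
scenario2_revenue = 31355.70
scenario2_cost = 789.69
scenario2_net = 30566.02

incremental_revenue = scenario2_revenue - baseline_net  # baseline has no cost
incremental_net = scenario2_net - baseline_net
revenue_per_incentive_dollar = incremental_revenue / scenario2_cost

print(f"Incremental revenue: ${incremental_revenue:,.2f}")
print(f"Incremental net revenue: ${incremental_net:,.2f}")
print(f"Revenue per incentive dollar: {revenue_per_incentive_dollar:.2f}")
```

In this scenario, each incentive dollar brings back about 4.10 in revenue over the rest of the program.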

**Interpretation**

>This table and summary give an immediate understanding of:
>- Which strategy makes the most money.
>- Whether the incentive investment pays off.
>- How much retention gain is actually achieved.
>
>It also makes the recommendation data-driven, which suits decision-making presentations.

## Section 7: Monthly Revenue and Incentive Liability Analysis

In this section, we will calculate and visualise how monthly revenue and incentive cost evolve over time for each scenario. This helps us answer:
- When is revenue highest?
- When does cost hit the business?
- How do costs and return shift over time?

<br>

This is particularly important because:
- Incentive costs occur **only in the month after the incentive is offered** (Month 4).
- Revenue accumulates gradually across months.
- Comparing costs vs. returns per month gives us a **cash flow view**.
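A cash flow view becomes most useful when it is cumulative and measured against the baseline. This is a sketch using `itertools.accumulate`. It assumes, as `calculate_monthly_revenue_and_cost()` does, that the whole Scenario 2 incentive cost is booked in Month 4, and it uses the start-of-month counts copied from the Section 3 output. The variable names are illustrative.

```python
from itertools import accumulate

revenue_per_month = 5.0
incentive_cost = 5.0

# Start-of-month counts copied from the printed simulation output in Section 3
baseline = [1000, 950.0, 902.5, 676.875, 609.1875, 548.2687500000001,
            493.4418750000001, 444.0976875000001]
scenario2 = [1000, 950.0, 902.5, 834.8125, 751.3312500000001,
             676.1981250000001, 608.5783125000002, 547.7204812500001]

# Scenario 2 pays for its redeemers in Month 4 (index 3)
redeemers = 902.5 * 0.25 * 0.70  # at-risk learners in Month 3 x redemption rate
monthly_cost = [0.0] * len(scenario2)
monthly_cost[3] = redeemers * incentive_cost

# Extra revenue vs. baseline minus incentive cost, month by month
incremental = [
    (s - b) * revenue_per_month - c
    for s, b, c in zip(scenario2, baseline, monthly_cost)
]
cumulative = list(accumulate(incremental))

for month, value in enumerate(cumulative, start=1):
    print(f"Month {month}: {value:,.2f}")
```

In this sketch, the extra Month 4 revenue exactly offsets the incentive cost. With 100% effectiveness, every redeemer is retained and pays 5 dollars that month against a 5-dollar incentive, so every later month is pure gain, ending at about 2,444 dollars.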

<br>

**Noteworthy**

Stakeholders can now see not just totals but the **timing and structure** of the financial impact, which is useful for planning budgets and incentive timing.

<br>

**Next**

Let's plot monthly revenue and incentive liability for all scenarios on one grouped bar chart.

```python
# Utility to calculate monthly revenue and incentive cost
def calculate_monthly_revenue_and_cost(learners, redeem_rate, effectiveness):
    """
    Calculates monthly revenue and incentive cost.

    - Revenue is learner count × revenue per learner per month
    - Cost is incurred in the month *after* the incentive if any learners redeemed the reward
    - effectiveness is not used: cost depends only on redemption
    """
    monthly_revenue = [x * revenue_per_month for x in learners]
    monthly_cost = [0] * duration_months

    if redeem_rate > 0:
        drop_idx = incentive_month - 1
        eligible = learners[drop_idx] * monthly_drop_rates[drop_idx]
        redeemers = eligible * (redeem_rate / 100)
        cost = redeemers * incentive_cost
        monthly_cost[incentive_month] = cost  # index 3 = Month 4, one month after the incentive

    return monthly_revenue, monthly_cost


# Compute monthly revenue & cost for all scenarios
rev_base, cost_base = calculate_monthly_revenue_and_cost(baseline_learners, 0, 0)
rev_s1, cost_s1 = calculate_monthly_revenue_and_cost(scenario1_learners, 50, 0)    # incentive redeemed, no effect
rev_s2, cost_s2 = calculate_monthly_revenue_and_cost(scenario2_learners, 70, 100)  # incentive redeemed, effective

# Plotting
months = list(range(1, duration_months + 1))

fig = go.Figure()

# Revenue Bars
fig.add_bar(x=months, y=rev_base, name='Baseline Revenue', marker_color="#6baed6")
fig.add_bar(x=months, y=rev_s1, name='Scenario 1 Revenue', marker_color="#fc9272")
fig.add_bar(x=months, y=rev_s2, name='Scenario 2 Revenue', marker_color="#74c476")

# Incentive Costs (Liabilities)
fig.add_bar(x=months, y=cost_s1, name='Scenario 1 Incentive Cost', marker_color="#fcbba1")
fig.add_bar(x=months, y=cost_s2, name='Scenario 2 Incentive Cost', marker_color="#a1d99b")

fig.update_layout(
    title="💵 Monthly Revenue and Incentive Liability",
    xaxis_title="Month",
    yaxis_title="USD ($)",
    barmode='group',
    height=500,
    template='plotly_white',
    legend=dict(orientation='h', yanchor='bottom', y=-0.3, xanchor='center', x=0.5)
)

fig.show()
```


**Insight**

>This grouped bar chart reveals:
>- When the incentive cost hits the business (Month 4, for both Scenario 1 and Scenario 2).
>- How monthly revenue declines over time in every scenario as learners drop off.
>- That Scenario 2 earns higher monthly revenue from Month 4 onward due to better retention.

## Section 8: Conclusion & Final Recommendation

In this section, we will synthesize all findings, from retention trends to financial performance, into a clear narrative that decision makers can act on.

**Review**

We modeled and analyzed three retention scenarios:

| **Scenario** | **Description** |
| --- | --- |
| **Baseline** | No incentive offered. Regular drop-off occurs. |
| **Scenario 1** | Incentive offered, but no effect on retention. |
| **Scenario 2** | Incentive offered; redeemers are retained through the Month 3 drop-off spike. |

Key metrics like total revenue, cost, and retention uplift were quantified and visualised across each.

**Observation**

- **Scenario 2** consistently outperformed others:
    - Retained the highest number of learners after Month 3
    - Delivered the **highest net revenue**, even after accounting for incentive cost.
    - Showed a clear **ROI path**, supported by our break-even analysis.

- **Scenario 1**, despite incurring incentive cost, produced **no improvement in retention** and lowered net revenue.

- **Baseline** had no cost - but also no gains.

### Final Recommendation

**Recommendation: Proceed with Scenario 2 - Effective Incentive Deployment**

- It retains significantly more learners by cutting the Month 3 drop-off rate from 25% to 7.5%.
- It yields the highest net revenue across the program duration.
- It needs about **32** retained learners to cover its incentive cost, against a modeled gain of about **104** learners by Month 8.

**Note:** Avoid rolling out incentives **without evidence of retention impact**. As Scenario 1 shows, an incentive that does not change learner behavior is pure cost with no return.
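To make "retention impact" concrete, one option is to ask how effective the incentive must be before it beats the no-incentive baseline on net revenue. This is a sketch under this notebook's assumptions (5 dollars revenue per month, 5 dollars cost per redeemer, the Section 2 drop-off rates). It searches effectiveness on a 1% grid. The names `simulate`, `net_revenue` and `min_effectiveness_to_beat_baseline` are hypothetical; `simulate` repeats the month-by-month logic of `simulate_scenario()`.

```python
MONTHLY_DROP_RATES = (0.05, 0.05, 0.25, 0.10, 0.10, 0.10, 0.10, 0.10)
REVENUE_PER_MONTH = 5.0
INCENTIVE_COST = 5.0
INCENTIVE_MONTH = 3  # 1-based


def simulate(effectiveness_pct, redeem_rate_pct, initial=1000):
    """Same month-by-month logic as simulate_scenario() in Section 2."""
    learners = [float(initial)]
    for idx in range(len(MONTHLY_DROP_RATES) - 1):
        rate = MONTHLY_DROP_RATES[idx]
        if idx == INCENTIVE_MONTH - 1:
            rate *= 1 - (effectiveness_pct / 100) * (redeem_rate_pct / 100)
        learners.append(learners[-1] * (1 - rate))
    return learners


def net_revenue(effectiveness_pct, redeem_rate_pct):
    """Total revenue minus incentive cost, as in compute_financials()."""
    learners = simulate(effectiveness_pct, redeem_rate_pct)
    idx = INCENTIVE_MONTH - 1
    redeemers = learners[idx] * MONTHLY_DROP_RATES[idx] * (redeem_rate_pct / 100)
    return sum(learners) * REVENUE_PER_MONTH - redeemers * INCENTIVE_COST


def min_effectiveness_to_beat_baseline(redeem_rate_pct, step=1):
    """Smallest effectiveness (%) on the grid whose net revenue exceeds the baseline."""
    baseline = net_revenue(0, 0)
    for eff in range(0, 101, step):
        if net_revenue(eff, redeem_rate_pct) > baseline:
            return eff
    return None  # never beats the baseline (e.g. nobody redeems)


print(min_effectiveness_to_beat_baseline(70))
```

With 70% redemption, this returns 25: roughly a quarter of redeemers must actually be kept for the incentive to pay off. In this model, the threshold does not depend on the redemption rate, because the incentive cost and the retained revenue both scale with the number of redeemers.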