# Weeks 10-11: Renewal-Reward Processes

**Objective:** Understand how to model systems with costs or rewards associated with renewal cycles and apply the Renewal-Reward Theorem to find long-run average performance.

## Step 1: Build Intuition

In Week 10, we modeled the *timing* of recurring events (renewals). Now, let's add another layer: what if each cycle of this process has a **reward** or **cost** associated with it?

Imagine a machine on a factory floor. 
1. It runs for a certain amount of time, generating profit (a "reward").
2. It eventually breaks down.
3. It takes some time to repair it (a "costly" period).
4. Once repaired, the cycle begins anew (a "renewal").

We are often interested in the **long-run average reward per unit of time**. For the factory, this would be the average profit per hour, accounting for both productive uptime and costly downtime. A **renewal-reward process** is the standard model for answering this kind of question.

## Step 2: Understand the Core Idea

A renewal-reward process is built on top of a standard renewal process. 

1.  **Renewal Cycles:** The system goes through cycles, where the end of one cycle is the start of the next (a renewal). The length of the \(n\)-th cycle is \(T_n\), and these lengths are IID random variables.
2.  **Rewards per Cycle:** In each cycle \(n\), a certain reward \(R_n\) is earned. The pairs \((T_n, R_n)\) are IID for all cycles.

The core idea is that the long-run average reward is simply the **average reward you get in a cycle** divided by the **average length of a cycle**. This simple and powerful idea is formalized by the Renewal-Reward Theorem.

## Step 3: Learn the Definitions and Formulas

**Definition: Renewal-Reward Process**
Let \(\{N(t), t \ge 0\}\) be a renewal process with inter-arrival times \(T_1, T_2, \ldots\), and let \(R_n\) be the reward earned in the \(n\)-th cycle, where the pairs \((T_n, R_n)\) are IID (so \(R_n\) may depend on \(T_n\), but not on other cycles). The total reward accumulated by time \(t\), counting only the cycles completed by time \(t\), is:
$$ C(t) = \sum_{n=1}^{N(t)} R_n $$
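
To make the definition concrete, here is a minimal sketch that evaluates \(C(t)\) for a given list of cycle lengths and rewards. The function name `total_reward` and the numbers in the example are illustrative assumptions, not part of the course material.

```python
import numpy as np

def total_reward(t, cycle_lengths, rewards):
    """Return C(t): the sum of rewards from cycles completed by time t."""
    # Renewal epochs S_n = T_1 + ... + T_n
    renewal_times = np.cumsum(cycle_lengths)
    # N(t) = number of renewal epochs S_n with S_n <= t
    n_completed = np.searchsorted(renewal_times, t, side="right")
    return float(np.sum(rewards[:n_completed]))

# Made-up values for illustration: renewals occur at times 3, 5 and 9
cycle_lengths = np.array([3.0, 2.0, 4.0])  # T_1, T_2, T_3
rewards = np.array([5.0, 1.0, 2.0])        # R_1, R_2, R_3

print(total_reward(4.0, cycle_lengths, rewards))  # one completed cycle: R_1
print(total_reward(5.0, cycle_lengths, rewards))  # two completed cycles: R_1 + R_2
```

Note that a cycle still in progress at time \(t\) contributes nothing under this definition.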

--- 

**Key Result: The Renewal-Reward Theorem**
If \(E[|R_n|] < \infty\) and \(E[T_n] < \infty\), the long-run average reward per unit time equals the expected reward per cycle divided by the expected length of a cycle. With probability 1,
$$ \lim_{t \to \infty} \frac{C(t)}{t} = \frac{E[R_n]}{E[T_n]} $$
and the same limit holds in expectation:
$$ \lim_{t \to \infty} \frac{E[C(t)]}{t} = \frac{E[R_n]}{E[T_n]} $$

This theorem is useful because calculating \(E[R_n]\) and \(E[T_n]\) is often much easier than analyzing the entire process \(C(t)\) directly.
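
A brief sketch of why this ratio appears, written for the sample-path average \(C(t)/t\) and for \(t\) large enough that \(N(t) > 0\). It is an outline rather than a full proof, and it relies on the strong law of large numbers together with the renewal-process limit \(N(t)/t \to 1/E[T_n]\). Split the average into two factors:

$$ \frac{C(t)}{t} = \left( \frac{1}{N(t)} \sum_{n=1}^{N(t)} R_n \right) \cdot \frac{N(t)}{t} $$

As \(t \to \infty\), \(N(t) \to \infty\) with probability 1, so the first factor is an average of IID rewards and tends to \(E[R_n]\), while the second factor tends to \(1/E[T_n]\):

$$ \lim_{t \to \infty} \frac{C(t)}{t} = E[R_n] \cdot \frac{1}{E[T_n]} = \frac{E[R_n]}{E[T_n]} $$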

## Step 4: Apply and Practice

Let's use our machine example to see the theorem in action.

**Scenario:** A machine's operating time (uptime) is an **Exponential** random variable with a mean of 40 hours (rate \(\lambda_{\text{fail}} = 1/40\) per hour). When it fails, the time to repair it is also **Exponential**, but with a mean of 10 hours (rate \(\lambda_{\text{repair}} = 1/10\) per hour).

**Problem:** What is the long-run availability of the machine (i.e., the long-run fraction of time the machine is operational)?

We can frame this as a renewal-reward problem:
- A **cycle** is one full period of uptime followed by downtime.
- The **reward** for a cycle is the amount of time the machine was operational during that cycle.

```python
import numpy as np
import matplotlib.pyplot as plt

plt.style.use('seaborn-v0_8-whitegrid')
```

### Part A: Theoretical Calculation

Let's use the Renewal-Reward Theorem. We need \(E[R_n]\) and \(E[T_n]\).

- **Cycle Length (\(T_n\)):** A cycle consists of one uptime and one downtime. \(T_n = \text{Uptime}_n + \text{Downtime}_n\).
  - \(E[T_n] = E[\text{Uptime}_n] + E[\text{Downtime}_n] = 40 \text{ hours} + 10 \text{ hours} = 50 \text{ hours}\).
- **Reward (\(R_n\)):** The reward in a cycle is the uptime. \(R_n = \text{Uptime}_n\).
  - \(E[R_n] = E[\text{Uptime}_n] = 40 \text{ hours}\).

By the theorem, the long-run average reward per hour (which is the availability) is:
$$ \text{Availability} = \frac{E[R_n]}{E[T_n]} = \frac{40}{50} = 0.8 $$ 

So, we expect the machine to be operational 80% of the time in the long run.
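
The same calculation extends to the profit-per-hour question from Step 1. The sketch below assumes a hypothetical profit of 100 per hour of uptime and a hypothetical repair cost of 50 per hour of downtime. Neither figure comes from the scenario above, and the helper `long_run_average_reward` is an illustrative name.

```python
def long_run_average_reward(expected_reward_per_cycle, expected_cycle_length):
    """Renewal-Reward Theorem: E[R_n] / E[T_n]."""
    return expected_reward_per_cycle / expected_cycle_length

MEAN_UPTIME_HOURS = 40.0
MEAN_DOWNTIME_HOURS = 10.0
PROFIT_PER_UP_HOUR = 100.0         # hypothetical value, not from the scenario
REPAIR_COST_PER_DOWN_HOUR = 50.0   # hypothetical value, not from the scenario

expected_cycle_length = MEAN_UPTIME_HOURS + MEAN_DOWNTIME_HOURS

# Reward = uptime in the cycle, which gives the availability
availability_from_theorem = long_run_average_reward(
    MEAN_UPTIME_HOURS, expected_cycle_length
)

# Reward = profit earned while up minus repair cost paid while down
expected_profit_per_cycle = (
    PROFIT_PER_UP_HOUR * MEAN_UPTIME_HOURS
    - REPAIR_COST_PER_DOWN_HOUR * MEAN_DOWNTIME_HOURS
)
profit_per_hour = long_run_average_reward(
    expected_profit_per_cycle, expected_cycle_length
)

print(f"Availability: {availability_from_theorem:.2f}")
print(f"Long-run profit per hour: {profit_per_hour:.2f}")
```

Only the definition of the per-cycle reward changes between the two questions; the expected cycle length stays at 50 hours.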

### Part B: Verifying with Simulation

Now, let's simulate the machine's life over a very long period and see if the fraction of time it's up converges to 0.8.

```python
def simulate_machine_availability(mean_uptime, mean_downtime, num_cycles):
    """
    Simulates the uptime and downtime of a machine over many cycles.

    Args:
        mean_uptime (float): Mean of the exponential uptime distribution.
        mean_downtime (float): Mean of the exponential downtime distribution.
        num_cycles (int): The number of cycles to simulate.

    Returns:
        tuple: (cycle_numbers, cumulative_availability)
    """
    # Generate all uptimes and downtimes at once for efficiency
    uptimes = np.random.exponential(scale=mean_uptime, size=num_cycles)
    downtimes = np.random.exponential(scale=mean_downtime, size=num_cycles)

    # Calculate cumulative sums
    total_uptime = np.cumsum(uptimes)
    total_time = np.cumsum(uptimes + downtimes)

    # Calculate availability at the end of each cycle
    cumulative_availability = total_uptime / total_time

    cycle_numbers = np.arange(1, num_cycles + 1)

    return cycle_numbers, cumulative_availability

# --- Simulation Parameters ---
MEAN_UPTIME = 40
MEAN_DOWNTIME = 10
N_CYCLES = 5000
THEORETICAL_AVAILABILITY = MEAN_UPTIME / (MEAN_UPTIME + MEAN_DOWNTIME)

# Run the simulation
cycles, availability = simulate_machine_availability(MEAN_UPTIME, MEAN_DOWNTIME, N_CYCLES)

# Plot the result
plt.figure(figsize=(12, 6))
plt.plot(cycles, availability, label='Simulated Availability')
plt.axhline(THEORETICAL_AVAILABILITY, color='red', linestyle='--', label=f'Theoretical Limit ({THEORETICAL_AVAILABILITY:.2f})')
plt.title('Convergence of Machine Availability')
plt.xlabel('Number of Cycles')
plt.ylabel('Cumulative Availability (Total Uptime / Total Time)')
plt.xscale('log') # Use a log scale for x-axis to see early convergence better
plt.legend()
plt.grid(True, which='both')
plt.show()

print(f"Availability after {N_CYCLES} cycles: {availability[-1]:.4f}")
print(f"Theoretical long-run availability: {THEORETICAL_AVAILABILITY:.4f}")
```

**Interpretation:**

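The plot shows that the availability fluctuates noticeably in the early cycles, when a single long uptime or repair can move the ratio a lot, and then settles close to the theoretical long-run value of 0.8 as the number of cycles increases. This illustrates the Renewal-Reward Theorem along a single simulated sample path; a different random run gives a different curve that approaches the same limit.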
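
One way to see how much a single run can vary is to repeat the simulation many times and compare the final availabilities. The sketch below is an illustrative add-on: the function name `replicate_final_availability`, the 200 replications, and the fixed seed are assumptions chosen for the example.

```python
import numpy as np

def replicate_final_availability(mean_uptime, mean_downtime, num_cycles,
                                 num_replications, seed=0):
    """Return the availability after num_cycles for several independent runs."""
    rng = np.random.default_rng(seed)  # fixed seed so the run is repeatable
    shape = (num_replications, num_cycles)
    uptimes = rng.exponential(scale=mean_uptime, size=shape)
    downtimes = rng.exponential(scale=mean_downtime, size=shape)
    # One row per replication: total uptime divided by total elapsed time
    total_uptime = uptimes.sum(axis=1)
    total_time = total_uptime + downtimes.sum(axis=1)
    return total_uptime / total_time

for n_cycles in (50, 500, 5000):
    finals = replicate_final_availability(40, 10, n_cycles, num_replications=200)
    print(f"{n_cycles:>5} cycles: mean = {finals.mean():.4f}, "
          f"std across runs = {finals.std(ddof=1):.4f}")
```

The spread across runs should shrink as the number of cycles grows, while the mean stays near 0.8.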

## Summary & Next Steps

In this notebook, we've explored the powerful Renewal-Reward framework:
1.  A **renewal-reward process** associates a cost or reward with each cycle of a renewal process.
2.  The **Renewal-Reward Theorem** provides a simple and elegant way to calculate the long-run average reward per unit time: \(E[R_n] / E[T_n]\).
3.  This allows us to analyze complex systems by focusing only on the expected values within a single, representative cycle.

This theorem is a cornerstone of applied probability, used extensively in operations research, reliability engineering, and performance analysis.

In **Weeks 12-13**, we will move on to **Queuing Models**, which are essential for analyzing waiting lines. Many queuing systems can be analyzed using renewal-reward concepts.