To find the optimal policy and the average cost $\gamma$, we use Howard's policy iteration algorithm.

We define a system of linear equations for the policy evaluation step. For a chosen policy $u$ (where we decide to either keep or sell at each age), the potential values $f_j$ and average cost $\gamma$ satisfy
\begin{equation}
    \gamma + f_j = g_j(u) + \sum_{k} p_{jk}(u) f_k.
\end{equation}

In [65]:
# @title Data
import pandas as pd
import numpy as np
import io

csv_data_1 = """
j   Age     Purchase    Trade   Operating   Survival
1   0       5000        3500    860         0.963
2   2       3150        2170    1025        0.794
3   4       2285        1500    1225        0.568
4   6       1545        900     1430        0.255
5   8       1050        590     1815        0.001
6   10      600         330     2240        0.000
"""

csv_data_2 = """
j   Age     Purchase    Trade   Operating   Survival
1   0       5000        3500    200         0.999
2   0.5     4285        3000    210         0.995
3   1       3750        2650    220         0.990
4   1.5     3430        2375    230         0.979
5   2       3150        2170    240         0.968
6   2.5     2900        1950    250         0.956
7   3       2645        1850    260         0.936
8   3.5     2475        1625    275         0.917
9   4       2285        1500    290         0.898
10  4.5     2130        1350    300         0.879
11  5       1970        1225    315         0.860
12  5.5     1760        1060    320         0.836
13  6       1545        900     335         0.801
14  6.5     1400        780     350         0.761
15  7       1260        700     365         0.697
16  7.5     1140        625     380         0.600
17  8       1050        590     400         0.482
18  8.5     940         520     430         0.300
19  9       830         470     465         0.129
20  9.5     720         400     520         0.020
21  10      600         330     560         0.000
"""

In [66]:
class CarReplacementModel:
    '''
    Encapsulates the state space, parameters, and transition logic
    for the Car Replacement Problem.
    '''
    def __init__(self, csv_string):
        df = pd.read_csv(io.StringIO(csv_string), sep="\\s+")

        self.j = df['j'].values
        self.ages = df['Age'].values
        self.op_costs = df['Operating'].values
        self.trade_ins = df['Trade'].values
        self.survival_probs = df['Survival'].values

        # Constants extracted from age 0 (New Car)
        self.new_car_price = 5000
        self.new_car_op_cost = 860
        self.new_car_survival = 0.963

        # State definitions
        self.n_ages = len(self.ages)
        self.omega_state = self.n_ages
        self.total_states = self.n_ages + 1

        # Actions
        self.ACTION_KEEP = 0
        self.ACTION_REPLACE = 1

    def get_transition_probs(self, state, action):
        '''
        Returns the probability distribution over next states.
        '''
        probs = np.zeros(self.total_states)

        if state == self.omega_state:
            probs[0] = self.new_car_survival
            probs[self.omega_state] = 1.0 - self.new_car_survival
            return probs

        if action == self.ACTION_REPLACE:
            probs[0] = self.new_car_survival
            probs[self.omega_state] = 1.0 - self.new_car_survival
            return probs

        if state == self.n_ages - 1:
            probs[self.omega_state] = 1.0
        else:
            p = self.survival_probs[state]
            probs[state + 1] = p
            probs[self.omega_state] = 1.0 - p

        return probs

    def get_instant_cost(self, state, action):
        '''
        Returns the immediate expected cost.
        '''
        if state == self.omega_state:
            return self.new_car_price + self.new_car_op_cost

        if action == self.ACTION_KEEP:
            return self.op_costs[state]

        return self.new_car_price - self.trade_ins[state] + self.new_car_op_cost

In [67]:
def build_linear_system(model, policy):
    '''
    Constructs Matrix A and Vector b for the equation:
    gamma + f_i - sum(p_ij * f_j) = g_i
    Normalise f_0 = 0.
    '''
    A_rows = []
    b_vec = []

    for i in range(model.total_states):
        u = policy[i]
        probs = model.get_transition_probs(i, u)
        cost = model.get_instant_cost(i, u)

        # Solve for x = [gamma, f_1, f_2, ..., f_omega]
        # Size of row is total_states since f_0 is removed but gamma is added
        row = np.zeros(model.total_states)

        # Coefficient for gamma is always 1
        row[0] = 1.0

        # Coefficient for f_i
        if i != 0:
            row[i] += 1.0

        # Coefficient for -sum(p_ij * f_j)
        for k in range(1, model.total_states):
            row[k] -= probs[k]

        A_rows.append(row)
        b_vec.append(cost)

    return np.array(A_rows), np.array(b_vec)

def evaluate_policy(model, policy):
    '''
    Solves the linear system to find average cost (gamma)
    and relative values (f) for the given policy.
    '''
    A, b = build_linear_system(model, policy)

    try:
        solution = np.linalg.solve(A, b)
    except np.linalg.LinAlgError:
        raise ValueError("Singular matrix. Policy might be invalid.")

    gamma = solution[0]

    # Reconstruct full f vector including f_0 = 0
    f_values = np.zeros(model.total_states)
    f_values[1:] = solution[1:]

    return gamma, f_values

def improve_policy(model, f_values, current_policy):
    '''
    Greedy step: Finds the action that minimises Q-value for each state.
    Returns the new policy and a boolean indicating if it changed.
    '''
    new_policy = np.zeros(model.total_states, dtype=int)
    is_stable = True

    for state in range(model.total_states):
        # Force Replace for Omega state
        if state == model.omega_state:
            new_policy[state] = model.ACTION_REPLACE
            continue

        # Calculate Q-values
        # Q(s, a) = Cost(s,a) + Sum( P(s'|s,a) * f(s') )

        # Try KEEP
        probs_k = model.get_transition_probs(state, model.ACTION_KEEP)
        cost_k = model.get_instant_cost(state, model.ACTION_KEEP)
        q_keep = cost_k + np.dot(probs_k, f_values)

        # Try REPLACE
        probs_r = model.get_transition_probs(state, model.ACTION_REPLACE)
        cost_r = model.get_instant_cost(state, model.ACTION_REPLACE)
        q_replace = cost_r + np.dot(probs_r, f_values)

        # Choose minimum and use epsilon for float comparison stability
        if q_keep < q_replace - 1e-9:
            best_action = model.ACTION_KEEP
        else:
            best_action = model.ACTION_REPLACE

        new_policy[state] = best_action

        if new_policy[state] != current_policy[state]:
            is_stable = False

    return new_policy, is_stable

def run_policy_iteration(model):
    '''
    Main loop for Howard's Policy Iteration.
    '''
    # Initialise policy, start with always replace
    policy = np.ones(model.total_states, dtype=int) * model.ACTION_REPLACE

    iteration = 0
    while True:
        iteration += 1

        # Policy Evaluation
        gamma, f_values = evaluate_policy(model, policy)

        # Policy Improvement
        new_policy, is_stable = improve_policy(model, f_values, policy)

        if is_stable:
            print("=" * 40)
            print(f"Converged after {iteration} iterations.")
            return policy, gamma

        policy = new_policy

In [69]:
def print_results(model, policy, gamma):
    print(f"Optimal Average Cost (gamma): {gamma:.2f}")
    print("-" * 40)
    print(f"{'State':<15} | {'Optimal Action':<15}")
    print("-" * 40)

    actions = {0: "Keep", 1: "Replace"}

    for i in range(model.n_ages):
        age_label = f"{model.j[i]}"
        print(f"{age_label:<15} | {actions[policy[i]]:<15}")

    print(f"{'Written-off':<15} | {actions[policy[model.omega_state]]:<15}")


car_model_1 = CarReplacementModel(csv_data_1)
optimal_policy_1, optimal_gamma_1 = run_policy_iteration(car_model_1)
print_results(car_model_1, optimal_policy_1, optimal_gamma_1)
print("=" * 40)
car_model_2 = CarReplacementModel(csv_data_2)
optimal_policy_2, optimal_gamma_2 = run_policy_iteration(car_model_2)
print_results(car_model_2, optimal_policy_2, optimal_gamma_2)

Converged after 3 iterations.
Optimal Average Cost (gamma): 2243.77
----------------------------------------
State           | Optimal Action 
----------------------------------------
1               | Keep           
2               | Keep           
3               | Keep           
4               | Keep           
5               | Replace        
6               | Replace        
Written-off     | Replace        
Converged after 3 iterations.
Optimal Average Cost (gamma): 724.71
----------------------------------------
State           | Optimal Action 
----------------------------------------
1               | Keep           
2               | Keep           
3               | Keep           
4               | Keep           
5               | Keep           
6               | Keep           
7               | Keep           
8               | Keep           
9               | Keep           
10              | Keep           
11              | Keep           
12              | Kee

It is possible that the optimal policy dictates replacing a car at age $n$, but if that opportunity is missed, the optimal policy at age $n+1$ dictates keeping the car. This counter-intuitive scenario arises when the trade-in value $T_j$ is not linear, but rather undergoes a sharp drop between age $n$ and $n+1$, and then stabilises.

The decision to replace a car is essentially a comparison between the marginal cost of keeping, i.e., the operating costs for the next year plus the depreciation over the next year, versus, the average cost of a new car, i.e. the long-run average cost of starting over with a new vehicle.

If the trade-in value drops largely between year $n$ and $n+1$, then the marginal cost of keeping at age $n$ is extremely high and dominated by the loss in resale value. This forces a replace decision. However, once we reach age $n+1$, that massive depreciation has already occurred. It is now a sunk cost. If the depreciation between $n+1$ and $n+2$ is small, then the marginal cost of keeping for the next year might be lower than the average cost of a new car. This forces a keep decision.

There is even an example where the purchase price, trade-in price, operating cost, and survival probability are all monotone. Consider the deterministic example (survival probability $=1$) with the new car price equal to $6000$ and for a standard replacement cycle, the average cost is $1200$ per year. For $j \in \{0, 1, 2, 3\}$, let $(O_j)_j = (0, 200, 400, 600)$ and $(T_j)_{j} = (500, 4000, 100, 100)$. Then the depreciation in the next year is $(D_j)_j = (T_j - T_{j+1})_j$ and the marginal cost to keep is $O_j + D_j$.

1.  At age $1$:
    *   If we keep until age $2$, then we pay $200$ in operating costs, but we lose $4000$ in resale value, giving a total cost for this year of $4200$.
    *   If we replace, then we switch to a policy with an average cost of $1200$.
    *   Since $4200 > 1200$, the optimal policy is replace to avoid the depreciation cliff.

2.  At age $2$:
    *   If we keep until age $3$, then we pay $400$ in operating costs. The car only loses $100$ in value. Thus, the total cost for this year is $500$.
    *   If we replace, then we pay the high cost to enter a policy with an average cost of $1200$.
    *   Since $500 < 1200$, the optimal policy is now keep.