Car owners are haunted by the following problem. Every day, the operating cost for their car increases, as does the probability that the car breaks down. Even worse, when trading in the car for a different one, dealers will pay less for older cars and charge more for newer ones. The problem, then, is to find an optimal policy for trading in the car.

We model the problem as a Markov decision process. Let $g_j(u)$ be the instantaneous cost incurred if one takes action $u$ in state $j$ and let $p_{jk}(u)$ be the probability of then moving to state $k$. Define sequences $\gamma^{(n)}$, $f_j^{(n)}$, $u_j^{(n)}$ by the recursions
\begin{equation}
    \gamma^{(n)} + f_j^{(n)} = g_j(u_j^{(n)}) + \sum_k p_{jk}(u_j^{(n)})f_k^{(n)}
\end{equation}
and $u_j^{(n+1)}$ is the $u$-value minimising
\begin{equation}
    g_j(u) + \sum_k p_{jk}(u)f_k^{(n)}.
\end{equation}
Consider the following stationary policy: for fixed $n$, whenever state $j$ occurs, take action $u_j^{(n)}$. The first recursion is the standard Poisson equation for a fixed policy $\pi$ that prescribes action $\pi(j)$ in state $j$. The long-run average cost $g$ (a scalar, not to be confused with the cost function $g_j(u)$) and the relative value function $h_j$ satisfy the relationship
\begin{equation}
    \text{Average Cost} + \text{Relative Value}(j) = \text{Immediate Cost} + \text{Expected Future Relative Value},
\end{equation}
or in other words,
\begin{equation}
    g + h_j = c(j, \pi(j)) + \sum_{k} p_{jk}(\pi(j)) h_k.
\end{equation}
Comparing the first recursion with this standard form term by term:
*   The fixed policy is defined by the actions $u_j^{(n)}$.
*   The immediate cost term is $g_j(u_j^{(n)})$.
*   The relative value function (bias) corresponds to $f_j^{(n)}$.
*   The constant term $\gamma^{(n)}$ corresponds to the long-term average cost $g$.

Since $\gamma^{(n)}$ is a constant scalar independent of the state $j$ that satisfies this recursive balance equation, it represents the average cost per stage for the Markov chain induced by the policy $u^{(n)}$.
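The evaluation equation is linear in $\gamma^{(n)}$ and $f^{(n)}$, so for a fixed policy it can be solved directly. The following is a minimal sketch, assuming NumPy, states indexed from $0$, the normalisation $f_0 = 0$, and a chain with a single recurrent class under the policy (so that the system has a unique solution). The function name `evaluate_policy` and the two-state example data are illustrative, not taken from the text.

```python
import numpy as np

def evaluate_policy(cost, P):
    """Solve gamma + f_j = cost_j + sum_k P[j, k] * f_k with f_0 = 0.

    cost : shape (n,), immediate cost under the fixed policy
    P    : shape (n, n), transition matrix under the same policy
    Returns (gamma, f).
    """
    cost = np.asarray(cost, dtype=float)
    P = np.asarray(P, dtype=float)
    n = len(cost)
    # Unknown vector x = (gamma, f_1, ..., f_{n-1}); f_0 is pinned to 0.
    A = np.zeros((n, n))
    A[:, 0] = 1.0                      # coefficient of gamma in every row
    A[:, 1:] = (np.eye(n) - P)[:, 1:]  # coefficients of f_1, ..., f_{n-1}
    x = np.linalg.solve(A, cost)
    gamma = x[0]
    f = np.concatenate(([0.0], x[1:]))
    return gamma, f

# Hypothetical two-state chain, used only to exercise the function.
P_example = np.array([[0.9, 0.1],
                      [0.5, 0.5]])
cost_example = np.array([1.0, 4.0])
gamma, f = evaluate_policy(cost_example, P_example)
```

For this example the stationary distribution is $(5/6, 1/6)$, so the average cost is $5/6 \cdot 1 + 1/6 \cdot 4 = 1.5$, which agrees with the value of `gamma` returned by the solver.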

The algorithm described by the recursion equations is known as Howard's policy iteration algorithm. The first equation is the policy evaluation step where the average cost $\gamma^{(n)}$ of the current policy is calculated, and the second equation is the policy improvement step.

Note that the values $f_j^{(n)}$ are determined only up to an additive constant and can be normalised, for example by setting $f_1^{(n)} = 0$. If the transition matrix induced by the policy $u^{(n)}$ is irreducible at every stage $n$, then the evaluation equation always has a solution for $f^{(n)}$. The sequence $\gamma^{(n)}$ is non-increasing, and it converges to a minimum value $\gamma$ in a finite number of steps if $u$ can take only a finite number of values. The policy $u_j^{(n)}$ will then have converged to an average optimal policy.
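The two steps can be combined into a short loop. The sketch below is one possible implementation under stated assumptions, not a definitive one: it uses NumPy, indexes states and actions from $0$, normalises with $f_0 = 0$, marks infeasible actions with an infinite cost, and assumes every policy visited induces a chain with a single recurrent class. The array layout (`g[j, u]`, `p[u, j, k]`), the function name `policy_iteration`, and the example data are all hypothetical.

```python
import numpy as np

def policy_iteration(g, p, policy0=None, max_iter=100):
    """Average-cost policy iteration (evaluation + improvement).

    g : shape (n_states, n_actions), g[j, u] = instantaneous cost
        (np.inf marks an infeasible action)
    p : shape (n_actions, n_states, n_states), p[u, j, k] = transition prob.
    Returns (gamma, f, policy).
    """
    n_states, _ = g.shape
    states = np.arange(n_states)
    # Default start: the cheapest (hence feasible) action in each state.
    policy = np.argmin(g, axis=1) if policy0 is None else np.asarray(policy0)
    for _ in range(max_iter):
        # Policy evaluation: solve gamma + f_j = cost_j + sum_k P[j,k] f_k.
        cost = g[states, policy]
        P = p[policy, states, :]
        A = np.zeros((n_states, n_states))
        A[:, 0] = 1.0
        A[:, 1:] = (np.eye(n_states) - P)[:, 1:]
        x = np.linalg.solve(A, cost)
        gamma, f = x[0], np.concatenate(([0.0], x[1:]))
        # Policy improvement: minimise g_j(u) + sum_k p_jk(u) f_k over u.
        q = g + np.einsum("ujk,k->ju", p, f)
        new_policy = np.argmin(q, axis=1)
        # On ties, keep the current action so the loop cannot cycle.
        tie = np.isclose(q[states, policy], q[states, new_policy])
        new_policy = np.where(tie, policy, new_policy)
        if np.array_equal(new_policy, policy):
            return gamma, f, policy
        policy = new_policy
    raise RuntimeError("policy iteration did not converge")

# Hypothetical two-state, two-action example.
g_example = np.array([[1.0, 2.0],
                      [4.0, 3.0]])
p_example = np.array([[[0.9, 0.1], [0.5, 0.5]],   # action 0
                      [[0.5, 0.5], [0.9, 0.1]]])  # action 1
gamma_opt, f_opt, policy_opt = policy_iteration(
    g_example, p_example, policy0=[0, 0])
```

Starting from the policy that always takes action 0 (average cost $1.5$), one improvement step switches state 1 to action 1, which lowers the average cost to $1.2$; the next improvement step changes nothing, so the loop stops.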

To fully define the Markov decision process for the car replacement problem, we need to define the state space, action space, instantaneous costs, and transition probabilities.

1.  State space $S$:

    Let the state $j$ represent the condition of the car at the beginning of a decision period. Let states $\\{0, 1, \dots, M\\}$ represent the age of a working car in time units, where $0$ is a brand new car and $M$ is the maximum possible age. Let state $\omega$ represent the write-off state where the car has broken down. The state space is then $S = \\{0, 1, \dots, M, \omega \\}$.

2.  Action space $U$:

    At each state, the owner must choose an action $u$: either $u = K$ (keep the current car for the next period) or $u = R$ (replace the current car immediately). In state $\omega$, the only feasible action is $R$ (equivalently, we can assign an infinite cost to keeping a broken car).

3.  Parameters:
    
    Before defining the functions, we define the economic and physical parameters:
    *   $O_j$: The operating cost for a car of age $j$ for one period.
    *   $P$: The purchase price of a new car.
    *   $T_j$: The trade-in value of a car of age $j$. Note that $T_\omega = 0$.
    *   $\pi_j$: The probability that a car of age $j$ breaks down during the period (not to be confused with the policy $\pi$ used earlier).

4. Instantaneous costs $g_j(u)$:

    The cost $g_j(u)$ depends on the current state $j$ and the action chosen.
    If the action is keep ($u = K$), then we pay the operating cost for the current car:
    \begin{equation}
        g_j(K) =
        \begin{cases}
             O_j & j \in \{0, 1, \dots, M\}, \\
             \infty & j = \omega.
        \end{cases}
    \end{equation}
    If the action is replace ($u = R$), then we pay the price of the new car minus the trade-in value of the old car. We also incur the operating cost of the new car for that period (assuming replacement happens at the start of the period):
    \begin{equation}
        g_j(R) =
        \begin{cases}
            (P - T_j) + O_0 & j \in \{0, 1, \dots, M\}, \\
            P + O_0 & j = \omega.
        \end{cases}
    \end{equation}

5. Transition probabilities $p_{jk}(u)$:

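    These define the probability of moving from state $j$ to state $k$ given action $u$. If the action is keep ($u = K$), then the car ages by one unit unless it breaks down. The formula below applies to a working car of age $j < M$; at the maximum age $j = M$ there is no state $M + 1$, so the model needs a boundary convention there, for example forcing replacement at age $M$.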

    \begin{equation}
        p_{j, k}(K) =
        \begin{cases}
            1 - \pi_j & k = j + 1, \\
            \pi_j & k = \omega, \\
            0 & \text{otherwise}.
        \end{cases}
    \end{equation}

    If the action is replace ($u = R$), then we instantly swap to a new car. The state for the next period depends on whether this new car survives its first period,

    \begin{equation}
        p_{j, k}(R) =
        \begin{cases}
            1 - \pi_0 & k = 1, \\
            \pi_0 & k = \omega, \\
            0 & \text{otherwise}.
        \end{cases}
    \end{equation}

    Alternatively, if we assume that replacement moves deterministically to the new-car state $0$ and that aging only begins in the following step, then we could simplify to $p_{j,0}(R) = 1$.

    For the write-off state $\omega$, where we must replace, the transition probabilities take the same form:

    \begin{equation}
        p_{\omega, k}(R) =
        \begin{cases}
            1 - \pi_0 & k = 1, \\
            \pi_0 & k = \omega, \\
            0 & \text{otherwise}.
        \end{cases}
    \end{equation}
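The model above can be assembled into arrays for numerical work. The following is a minimal sketch, assuming NumPy and $M \ge 1$, with states $0, \dots, M$ stored at indices $0, \dots, M$ and $\omega$ at index $M + 1$, and actions $K$ and $R$ at indices $0$ and $1$. Two choices here are assumptions that the text does not fix: keeping is treated as infeasible at age $M$ (infinite cost), and rows for infeasible actions are filled with placeholder transitions into $\omega$ so that every row still sums to one. The function name `build_car_mdp` and all parameter values are hypothetical.

```python
import numpy as np

K, R = 0, 1  # action indices: keep, replace

def build_car_mdp(M, O, T, price, breakdown):
    """Build cost array g[j, u] and transition array p[u, j, k].

    O, T, breakdown : sequences of length M + 1, indexed by age
    price           : purchase price of a new car (P in the text)
    """
    n = M + 2          # ages 0..M plus the write-off state
    W = M + 1          # index of the write-off state omega
    g = np.full((n, 2), np.inf)   # inf marks infeasible actions
    p = np.zeros((2, n, n))
    for j in range(M + 1):
        if j < M:
            # Keep: pay O_j, then age by one unit or break down.
            g[j, K] = O[j]
            p[K, j, j + 1] = 1.0 - breakdown[j]
            p[K, j, W] = breakdown[j]
        else:
            # ASSUMPTION: keeping is infeasible at the maximum age M.
            p[K, j, W] = 1.0      # placeholder row, never selected
        # Replace: pay price minus trade-in, plus first-period operation.
        g[j, R] = price - T[j] + O[0]
    g[W, R] = price + O[0]        # T_omega = 0
    p[K, W, W] = 1.0              # placeholder row, keep is infeasible
    # After replacing, the new car either survives to age 1 or breaks down.
    p[R, :, 1] = 1.0 - breakdown[0]
    p[R, :, W] = breakdown[0]
    return g, p

# Hypothetical parameter values, chosen only for illustration.
g, p = build_car_mdp(
    M=3,
    O=[1.0, 1.5, 2.0, 2.5],
    T=[8.0, 6.0, 4.0, 2.0],
    price=10.0,
    breakdown=[0.05, 0.1, 0.2, 0.4],
)
```

A quick sanity check on the result is that `p.sum(axis=2)` equals one everywhere, and that `g[:, K]` is infinite exactly in the states where keeping is not allowed.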