# Estimating the Harold Zurcher model

“Dynamic programming and structural estimation” mini course

Fedor Iskhakov

Reading: **Rust (1987) "Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher"**

## Bellman equation for the Harold Zurcher problem

\begin{equation}
V(x) = \max_{d\in C} \big\{ u(x,d) + \beta E\big[ V(x')\big|x,d\big] \big\}
\end{equation}

$C = \{0,1\} = \{\text{keep},\text{replace}\}$

\begin{equation}
    u(x_{t},d_t,\theta_1)=\left \{ 
    \begin{array}{ll}
        -RC-c(0,\theta_1) & \text{if }d_{t}=1 \\ 
        -c(x_{t},\theta_1) & \text{if }d_{t}=0
    \end{array} \right.
\end{equation}

$x_{t+1} \sim F(x_t,d_t)$
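
A minimal sketch of the payoff function above. The source does not fix a functional form for $c(x,\theta_1)$ here, so the linear cost $c(x,\theta_1) = 0.001\,\theta_1 x$, the function names and the parameter values are illustrative assumptions.

```python
import numpy as np

def cost(x, theta1):
    """Hypothetical linear operating cost c(x, theta1) = 0.001 * theta1 * x."""
    return 0.001 * theta1 * x

def utility(x, d, RC, theta1):
    """Per-period payoff u(x, d, theta1): d = 0 is keep, d = 1 is replace."""
    x = np.asarray(x, dtype=float)
    keep = -cost(x, theta1)                          # -c(x, theta1)
    replace = -RC - cost(np.zeros_like(x), theta1)   # -RC - c(0, theta1)
    return np.where(d == 1, replace, keep)

x_grid = np.arange(5)                                # illustrative mileage bins
u_keep = utility(x_grid, 0, RC=10.0, theta1=2.0)
u_replace = utility(x_grid, 1, RC=10.0, theta1=2.0)
```

Under this specification the payoff from replacement does not depend on current mileage, while the payoff from keeping falls as mileage grows.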

## Rust assumptions

1. Additive separability in preferences (**AS**)

$$
u(x,\varepsilon,d) = u(x,d) + \varepsilon[d],
$$

2. Conditional independence (**CI**)

$$
p(x',\varepsilon'|x,\varepsilon,d) = q(\varepsilon'|x')\cdot \pi(x'|x,d)
$$

3. Extreme value Type I (EV1) distribution of $\varepsilon$ (**EV**)

\begin{equation}
V(x,\varepsilon) = \max_{d\in C} \big\{ u(x,d) + \beta 
E\big[ V(x',\varepsilon')\big|x,d\big]
+ \varepsilon[d] \big\}
\end{equation}


\begin{equation}
v(x,d) = u(x,d) + \beta E\big[ V(x',\varepsilon')\big|x,d\big]
\end{equation}

\begin{equation}
V(x',\varepsilon') = \max_{d\in C} \big\{ v(x',d) + \varepsilon'[d] \big\}
\end{equation}

\begin{equation}
E\big[ V(x',\varepsilon')\big|x,d\big] = 
\int_{X} \log \big( \exp[v(x',0)] + \exp[v(x',1)] \big) \pi(x'|x,d) dx'
\end{equation}
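
The log-sum expression inside the integral can be checked by simulation. The sketch below assumes the EV1 shocks are shifted to have mean zero (location equal to minus Euler's constant), which is the normalization under which the formula holds without an additive constant; the values in `v` are arbitrary illustrative numbers.

```python
import numpy as np

def logsum(v):
    """log(sum_d exp(v[..., d])) along the last axis."""
    return np.log(np.sum(np.exp(v), axis=-1))

rng = np.random.default_rng(0)
v = np.array([-1.0, 0.5])    # illustrative v(x', 0) and v(x', 1)

# EV1 (Gumbel) shocks with mean zero: location = -Euler's constant, scale = 1
eps = rng.gumbel(loc=-np.euler_gamma, scale=1.0, size=(200_000, 2))

simulated = np.max(v + eps, axis=1).mean()   # Monte Carlo estimate of E[max_d (v_d + eps_d)]
closed_form = logsum(v)                      # closed form under assumption EV
```

The two numbers should agree up to simulation error.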



## Bellman equation in expected value function space

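Let $EV(x,d) = E\big[ V(x',\varepsilon')\big|x,d\big]$ denote the expected value function. Substituting $v(x',d') = u(x',d') + \beta EV(x',d')$ into the expectation above gives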

\begin{equation}
EV(x,d) = \int_{X} \log \big( \exp[u(x',0) + \beta EV(x',0)] + \exp[u(x',1) + \beta EV(x',1)] \big) \pi(x'|x,d) dx'
\end{equation}

Written as an operator:

$$
T^*(EV)(x,d) \equiv \int_{X} \log \big( \exp[u(x',0) + \beta EV(x',0)] + \exp[u(x',1) + \beta EV(x',1)] \big) \pi(x'|x,d) dx'
$$

The solution $EV(x,d)$ of the Bellman functional equation is a fixed point of the operator $T^*$, that is, $T^*(EV)(x,d)=EV(x,d)$.
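
A minimal sketch of $T^*$ on a discretized mileage grid, solved by successive approximations. It assumes that replacement resets mileage to the first bin, so that $EV(x,1) = EV(0,0)$ and only $EV(x,0)$ needs to be stored. The grid size, the discount factor, the cost specification and the transition probabilities are illustrative, not values from the paper.

```python
import numpy as np

def transition_matrix(n, p):
    """pi(x'|x, keep) on n bins: mileage moves up j bins with probability p[j].
    Probability mass that would leave the grid is put on the last bin."""
    P = np.zeros((n, n))
    for i in range(n):
        for j, pj in enumerate(p):
            P[i, min(i + j, n - 1)] += pj
    return P

def bellman_ev(ev, u_keep, u_replace, P, beta):
    """One application of T*; ev holds EV(x, keep), and EV(x, replace) = ev[0]."""
    v_keep = u_keep + beta * ev
    v_replace = u_replace + beta * ev[0]
    return P @ np.logaddexp(v_keep, v_replace)   # integrate the logsum over x'

def solve_ev(u_keep, u_replace, P, beta, tol=1e-10, max_iter=10_000):
    """Iterate ev <- T*(ev) until the sup-norm change is below tol."""
    ev = np.zeros_like(u_keep, dtype=float)
    for it in range(max_iter):
        ev_new = bellman_ev(ev, u_keep, u_replace, P, beta)
        if np.max(np.abs(ev_new - ev)) < tol:
            return ev_new, it + 1
        ev = ev_new
    raise RuntimeError("successive approximations did not converge")

n, beta, RC = 20, 0.95, 10.0             # illustrative values
x = np.arange(n)
u_keep = -0.5 * x                        # hypothetical cost of 0.5 per bin
u_replace = -RC                          # c(0) = 0 under this cost
P = transition_matrix(n, [0.3, 0.6, 0.1])

ev, n_iter = solve_ev(u_keep, u_replace, P, beta)
residual = np.max(np.abs(bellman_ev(ev, u_keep, u_replace, P, beta) - ev))
```

Because $T^*$ is a contraction with modulus $\beta$, the number of iterations grows quickly as $\beta$ approaches one.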




## Choice probabilities in Zurcher model

Once the fixed point is found, the *optimal* choice probabilities $P(d|x)$ take the logit form (by assumption EV):

\begin{equation}
P(d|x) = 
\frac{\exp[v(x,d)]}{\sum_{d'\in C} \exp[v(x,d')]} =
\frac{\exp[u(x,d) + \beta EV(x,d)]}{\sum_{d'\in C} \exp[u(x,d') + \beta EV(x,d')]}
\end{equation}

The choice probabilities serve as the basis for forming the likelihood function.
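
The logit formula translates directly into code. In this sketch `v` is a hypothetical array of choice-specific values $v(x,d)$ with one row per state and one column per choice; the numbers are made up for illustration.

```python
import numpy as np

def choice_probs(v):
    """Logit choice probabilities P(d|x) from choice-specific values v[x, d]."""
    v = np.asarray(v, dtype=float)
    expv = np.exp(v)
    return expv / expv.sum(axis=1, keepdims=True)

# Three illustrative states; columns are (keep, replace)
v = np.array([[-1.0, -10.0],
              [-6.0,  -6.0],
              [-12.0, -10.0]])
P_choice = choice_probs(v)
```

Each row of `P_choice` sums to one, and a state where both choices have equal value gives probabilities of one half.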

## Data

- Harold Zurcher’s maintenance records of 162 buses in 8 groups
- Monthly observations of mileage on each bus (odometer reading)
- Data on maintenance operations:
    1. Routine periodic maintenance (e.g. brake adjustment)
    2. Replacement or repair at time of failure
    3. Major engine overhaul and/or replacement (*the focus of the paper*)
    
Data on $(x_{i,t},d_{i,t})$, where $x_{i,t}$ is discretized mileage (a bin index) and $d_{i,t}$ is the observed choice at this mileage, for each bus $i$ in each month $t$.
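
A minimal sketch of the discretization step. The bin width of 5,000 miles, the 90 bins and the odometer readings are assumptions made for illustration, and `discretize` is a hypothetical helper.

```python
import numpy as np

def discretize(odometer, bin_width=5000, n_bins=90):
    """Map odometer readings (miles) to bin indexes 0, ..., n_bins - 1."""
    idx = np.floor_divide(np.asarray(odometer), bin_width).astype(int)
    return np.minimum(idx, n_bins - 1)   # readings beyond the grid go to the last bin

odometer = np.array([1200, 4999, 5000, 61000, 460000])   # made-up readings
x_bins = discretize(odometer)
```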
    

## Likelihood function

\begin{equation}
L(\theta, EV_\theta) = \prod_{i=1}^{162}\prod_{t=2}^{T_i} P(d_{i,t}|x_{i,t}) \, \pi(x_{i,t}|x_{i,t-1},d_{i,t-1})
\end{equation}

\begin{equation}
\log L(\theta,EV_\theta) = \sum_{i=1}^{162}\sum_{t=2}^{T_i} \big ( \log P(d_{i,t}|x_{i,t}) + \log \pi(x_{i,t}|x_{i,t-1},d_{i,t-1}) \big)
\end{equation}
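
A minimal sketch of the log-likelihood for stacked observations with $t \ge 2$. It assumes that replacement resets mileage to bin 0, so the transition after a replacement is taken from the first row of the keep transition matrix. The arrays `P_choice` and `P_keep` and the tiny data set are made up for illustration.

```python
import numpy as np

def log_likelihood(x, d, x_prev, d_prev, P_choice, P_keep):
    """Sum of log P(d_t | x_t) + log pi(x_t | x_{t-1}, d_{t-1}) over observations.

    x, d, x_prev, d_prev : 1-D integer arrays, one entry per (bus, month), t >= 2
    P_choice[x, d]       : choice probabilities
    P_keep[x, x_next]    : transition probabilities when the engine is kept
    """
    ll_choice = np.log(P_choice[x, d])
    x_from = np.where(d_prev == 1, 0, x_prev)   # assumption: replacement resets mileage
    ll_trans = np.log(P_keep[x_from, x])
    return np.sum(ll_choice + ll_trans)

P_choice = np.array([[0.9, 0.1], [0.7, 0.3], [0.4, 0.6]])
P_keep = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]])

x_prev = np.array([0, 1, 2]); d_prev = np.array([0, 0, 1])
x      = np.array([1, 2, 0]); d      = np.array([0, 1, 0])

ll = log_likelihood(x, d, x_prev, d_prev, P_choice, P_keep)
```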



## MLE estimator

\begin{equation}
\theta^* = \arg\max_\theta \log L(\theta, EV_{\theta})
\end{equation}

Unconstrained optimization, but it requires computing $EV_{\theta}$ for each value of the parameter $\theta$.



## Nested loop

**Outer loop** = Hill-climbing algorithm
- Likelihood function $L(\theta,EV_{\theta})$ is maximized with respect to $\theta$
- Quasi-Newton algorithm with approximation of Hessian (BHHH, BFGS)
- Each evaluation of $L(\theta,EV_{\theta})$ requires dynamic programming solution for $EV_{\theta}$

**Inner loop** = Fixed point algorithm
- Solver for the fixed point of the Bellman operator, $EV_{\theta} = T^*(EV_{\theta})$
- Successive approximations + Newton-Kantorovich iterations
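
The two loops can be sketched as follows. This is a simplified illustration, not the NFXP implementation: the inner loop uses successive approximations only, the outer loop uses SciPy's BFGS with finite-difference gradients, only the choice part of the likelihood is maximized with the transition probabilities held fixed, the cost is assumed linear, and the "data" are noise-free expected counts generated from hypothetical true parameters.

```python
import numpy as np
from scipy.optimize import minimize

BETA = 0.9                                   # illustrative discount factor

def transition_matrix(n, p):
    """pi(x'|x, keep): move up j bins with probability p[j], absorbing last bin."""
    P = np.zeros((n, n))
    for i in range(n):
        for j, pj in enumerate(p):
            P[i, min(i + j, n - 1)] += pj
    return P

def solve_v(theta, P, tol=1e-12, max_iter=2000):
    """Inner loop: fixed point of the Bellman operator for a given theta.
    Returns choice-specific values v[x, d]; assumes EV(x, replace) = EV(0, keep)."""
    RC, theta1 = theta
    n = P.shape[0]
    u_keep = -theta1 * np.arange(n)          # hypothetical linear cost, c(0) = 0
    u_replace = -RC
    ev = np.zeros(n)
    for _ in range(max_iter):
        ev_new = P @ np.logaddexp(u_keep + BETA * ev, u_replace + BETA * ev[0])
        converged = np.max(np.abs(ev_new - ev)) < tol
        ev = ev_new
        if converged:
            break
    return np.column_stack([u_keep + BETA * ev,
                            np.full(n, u_replace + BETA * ev[0])])

def log_choice_probs(theta, P):
    """log P(d|x) computed as v - logsum(v) to avoid overflow."""
    v = solve_v(theta, P)
    return v - np.logaddexp(v[:, 0], v[:, 1])[:, None]

def neg_loglik(theta, counts, P):
    """Outer-loop objective: minus the choice log-likelihood; counts[x, d] are weights."""
    return -np.sum(counts * log_choice_probs(theta, P))

P = transition_matrix(15, [0.3, 0.6, 0.1])
theta_true = np.array([5.0, 0.5])            # hypothetical (RC, theta1)
counts = 1000 * np.exp(log_choice_probs(theta_true, P))   # noise-free pseudo-data

# Outer loop: every objective evaluation re-solves the inner fixed point
result = minimize(neg_loglik, x0=np.array([3.0, 0.3]), args=(counts, P), method="BFGS")
theta_hat = result.x
```

With noise-free pseudo-data the maximizer coincides with `theta_true`, which makes the sketch easy to check; with real data the estimates carry sampling error.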

## Important details

1. **Performance:** Newton method to maximize likelihood
2. **Numerical stability:** recenter smooth maximum (logsum) and choice probabilities
3. **Analytical gradients:** not too hard to compute for inner loop (Fréchet derivative) and outer loop (using implicit function theorem and chain rule)
4. **Use BHHH:** the outer-product-of-gradients approximation of the Hessian is always positive semidefinite
5. **Further info:** NFXP manual (see `papers\` directory)
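
Two of these points can be illustrated briefly. The sketch below recenters the logsum by subtracting the row maximum before exponentiating, and forms the outer-product-of-gradients matrix from per-observation scores. The function names are hypothetical, and the scores are random numbers standing in for actual likelihood gradients.

```python
import numpy as np

def logsum_recentered(v):
    """log(sum_d exp(v[..., d])) computed as m + log(sum_d exp(v_d - m)), m = max_d v_d."""
    m = np.max(v, axis=-1, keepdims=True)
    return (m + np.log(np.sum(np.exp(v - m), axis=-1, keepdims=True))).squeeze(-1)

def bhhh_matrix(scores):
    """Sum over observations of s_i s_i' for scores of shape (N, K)."""
    return scores.T @ scores

# Values large in magnitude break the naive formula
v = np.array([[-1000.0, -1001.0],
              [  800.0,   799.0]])
with np.errstate(over="ignore", divide="ignore"):
    naive = np.log(np.sum(np.exp(v), axis=-1))   # underflows to -inf, overflows to inf
stable = logsum_recentered(v)                    # finite in both rows

rng = np.random.default_rng(0)
scores = rng.normal(size=(500, 2))               # placeholder for per-observation gradients
H = bhhh_matrix(scores)
min_eig = np.linalg.eigvalsh(H).min()            # nonnegative up to rounding
```

Recentering leaves the result mathematically unchanged, and the same shift can be applied inside the choice probabilities since only differences in $v$ matter there.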

## Implementation

We will code up the estimator in the next lab.