# Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher

This notebook processes the sample and computes some basic descriptive statistics reported by John Rust (1987):
> John Rust (1987). [Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher](https://doi.org/10.2307/1911259). *Econometrica*, Vol. 55, No. 5, 999-1033.

The data are taken from the NFXP software provided by [Rust](https://editorialexpress.com/jrust/nfxp.html) and are available for download [here](https://github.com/OpenSourceEconomics/harold_zurcher_data).

## Preparations

First, the provided `.asc` files are converted into dataframes and labeled according to the eight groups Rust
defines in his paper. Each dataframe is saved as a pickle file and stored in a dictionary for further use.

In [8]:
import pandas as pd

In [9]:
dict_df = dict()
"""
The following dictionaries contain details on the raw data described by Rust in his documentation.
dict_data maps each file name to [entries per bus, number of buses, group label].
This information is used to create 8 dataframes, each containing one of the bus groups used in the paper.
The first 11 columns contain bus-specific information on purchase, engine replacement, etc. (see re_col).
For further information, see the documentation.
"""
dict_data = {'g870': [36, 15, 'Group 1'], 'rt50': [60, 4, 'Group 2'],
             't8h203': [81, 48, 'Group 3'], 'a452372': [137, 18, 'Group 8'],
             'a452374': [137, 10, 'Group 6'], 'a530872': [137, 18, 'Group 7'],
             'a530874': [137, 12, 'Group 5'], 'a530875': [128, 37, 'Group 4']}
re_col = {1: 'Bus_ID', 2: "Month_pur", 3: "Year_pur", 4: "Month_1st", 5: "Year_1st", 6: "Odo_1st",
          7: "Month_2nd", 8: "Year_2nd", 9: "Odo_2nd", 10: "Month_begin", 11: "Year_begin"}
for keys in dict_data:
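    r = dict_data[keys][0]  # number of entries per bus
    c = dict_data[keys][1]  # number of buses in the file
    # Each raw file is a single column of r * c values; close the file after reading.
    with open('data/' + keys + '.asc') as f:
        f_col = f.read().split('\n')
    df = pd.DataFrame()
    # Fill one column per bus; the transpose below turns each bus into a row.
    for j in range(0, c):
        for k in range(j * r, (j + 1) * r):
            df.loc[(k - j * r) + 1, j + 1] = float(f_col[k])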
    df = df.transpose()
    df = df.rename(columns=re_col)
    df['Bus_ID'] = df['Bus_ID'].astype(str)
    df = df.set_index('Bus_ID')
    if len(dict_data[keys]) == 3:
        dict_df[dict_data[keys][2]] = df
        df.to_pickle('./pkl/' + dict_data[keys][2] + '.pkl')
    else:
        df.to_pickle('./pkl/' + keys + '.pkl')
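
The element-wise loop above can also be written as a single reshape. The following is a minimal sketch on synthetic values, not the code used to produce the pickle files; the helper name `reshape_flat_column` and the toy numbers are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def reshape_flat_column(values, n_rows, n_buses):
    """Turn a flat sequence of n_rows * n_buses numbers into one row per bus.

    Hypothetical helper: consecutive blocks of n_rows values are assumed
    to belong to the same bus, as in the loop above.
    """
    arr = np.asarray(values, dtype=float)
    if arr.size != n_rows * n_buses:
        raise ValueError("unexpected number of values")
    wide = arr.reshape(n_buses, n_rows)
    # Label rows 1..n_buses and columns 1..n_rows to mirror the loop's labels.
    return pd.DataFrame(
        wide, index=range(1, n_buses + 1), columns=range(1, n_rows + 1)
    )


# Synthetic stand-in for a raw file: 2 buses with 3 entries each.
toy = reshape_flat_column([5297, 8, 74, 5298, 9, 74], n_rows=3, n_buses=2)
print(toy)
```

The result could then be passed through the same `rename` and `set_index` steps as in the cell above.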

## Descriptives

The first two tables of the paper are partially replicated below.

### Odometer at Engine Replacement
The first table in Rust's paper describes the mileage at which engine replacements occurred. Some buses had a second replacement during the observation period. For these, the odometer reading at the first replacement is subtracted from the reading at the second, which gives the actual lifetime mileage of the second engine.
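
As a small illustration of this subtraction, the sketch below uses made-up odometer readings (the bus labels and values are hypothetical, and a reading of 0 is assumed to mean that no replacement was recorded):

```python
import pandas as pd

# Hypothetical odometer readings; 0 is assumed to mean "no replacement".
toy = pd.DataFrame(
    {"Odo_1st": [120000, 0, 150000], "Odo_2nd": [200000, 0, 0]},
    index=["bus_a", "bus_b", "bus_c"],
)

# Lifetime of the first engine: odometer reading at the first replacement.
first = toy.loc[toy["Odo_1st"] > 0, "Odo_1st"]

# Lifetime of the second engine: mileage driven between the two replacements.
has_second = toy["Odo_2nd"] > 0
second = (toy["Odo_2nd"] - toy["Odo_1st"])[has_second]
second.index = second.index + "_2"  # keep second engines distinguishable

lifetimes = pd.concat([first, second])
print(lifetimes)
```

Here `bus_a` contributes two engine lifetimes, `bus_c` one, and `bus_b` none.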

In [10]:
df = pd.DataFrame()
for j, i in enumerate(sorted(dict_df.keys())):
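    # Mileage at the first replacement (buses without one are filtered out).
    df2 = dict_df[i][['Odo_1st']][dict_df[i]['Odo_1st'] > 0]
    df2 = df2.rename(columns={'Odo_1st': i})
    # Mileage of the second engine: reading at the second replacement minus the first.
    df3 = dict_df[i][['Odo_2nd']].sub(dict_df[i]['Odo_1st'], axis=0)[dict_df[i]['Odo_2nd'] > 0]
    df3 = df3.rename(columns={'Odo_2nd': i})
    # Suffix the index so second replacements do not collide with first ones.
    df3 = df3.set_index(df3.index + '_2')
    df4 = pd.concat([df2, df3])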
    if j == 0:
        df = df4.describe()
    else:
        df = pd.concat([df, df4.describe()], axis=1)
df = df.transpose()
df = df.drop(df.columns[[4, 5, 6]], axis=1)
df[['max', 'min', 'mean', 'std', 'count']].fillna(0).astype(int)

Group,max,min,mean,std,count
Group 1,0,0,0,0,0
Group 2,0,0,0,0,0
Group 3,273400,124800,199733,37459,27
Group 4,387300,121300,257336,65477,33
Group 5,322500,118000,245290,60257,11
Group 6,237200,82400,150785,61006,7
Group 7,331800,121000,208962,48980,27
Group 8,297500,132000,186700,43956,19


### Never failing buses

The following table uses the buses that never had an engine replacement. These observations are right-censored, as the econometrician never observes the time of replacement. The table shows the variation in the odometer reading at the end of the observation period.
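
A minimal sketch of this selection on made-up data is shown below. The column name `last_odometer` and all values are hypothetical; the only assumptions carried over are that `Odo_1st == 0` marks a bus without a replacement and that the last column holds the final odometer reading.

```python
import pandas as pd

# Hypothetical data: the last column holds the final odometer reading.
toy = pd.DataFrame(
    {"Odo_1st": [0, 130000, 0], "last_odometer": [100000, 210000, 120000]},
    index=["bus_a", "bus_b", "bus_c"],
)

# Keep buses with no recorded replacement and take their final reading.
censored = toy.loc[toy["Odo_1st"] == 0, toy.columns[-1]]

# Summary statistics in the same order as the tables in this notebook.
summary = censored.describe()[["max", "min", "mean", "std", "count"]]
print(summary)
```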

In [11]:
df = pd.DataFrame()
for j, i in enumerate(sorted(dict_df.keys())):
    # Last column = final odometer reading; keep buses without a replacement.
    df2 = dict_df[i][[dict_df[i].columns.values[-1]]][dict_df[i]['Odo_1st'] == 0]
    df2 = df2.rename(columns={df2.columns.values[0]: i})
    if j == 0:
        df = df2.describe()
    else:
        df = pd.concat([df, df2.describe()], axis=1)
df = df.transpose()
df = df.drop(df.columns[[4, 5, 6]], axis=1)
df[['max', 'min', 'mean', 'std', 'count']]

Group,max,min,mean,std,count
Group 1,120151.0,65643.0,100116.666667,12929.488359,15.0
Group 2,161748.0,142009.0,151182.5,8529.85121,4.0
Group 3,280802.0,199626.0,250766.428571,21324.86938,21.0
Group 4,352450.0,310910.0,337221.6,17802.375327,5.0
Group 5,326843.0,326843.0,326843.0,,1.0
Group 6,299040.0,232395.0,265263.666667,33331.770135,3.0
Group 7,,,,,0.0
Group 8,,,,,0.0
