# Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher

This notebook processes the sample and computes some of the basic descriptive statistics reported in John Rust (1987):
> John Rust (1987). [Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher](https://doi.org/10.2307/1911259). *Econometrica*, Vol. 55, No.5, 999-1033.

The data are taken from the NFXP software provided by [Rust](https://editorialexpress.com/jrust/nfxp.html) and can be downloaded [here](https://github.com/OpenSourceEconomics/harold_zurcher_data).

## Preparations

First, the provided `.asc` files are converted into DataFrames, labeled according to the 8 groups Rust
defines in his paper, saved as pickle files, and stored in a dictionary for further use.

In [47]:
import pandas as pd
import numpy as np
# rows per bus (r), number of buses (c) and label for each raw file;
# 'd309' has no group label and keeps its file name
specs={'d309':(110,4,'d309'),
       'g870':(36,15,'Group 1'),
       'rt50':(60,4,'Group 2'),
       't8h203':(81,48,'Group 3'),
       'a452372':(137,18,'Group 8'),
       'a452374':(137,10,'Group 6'),
       'a530872':(137,18,'Group 7'),
       'a530874':(137,12,'Group 5'),
       'a530875':(128,37,'Group 4')}
dict_df=dict()
for l,(name,(r,c,i)) in enumerate(specs.items()):
    with open('nfxp/dat/'+name+'.asc') as f:
        f_raw=f.read()
    f_col=f_raw.split('\n')
    df=pd.DataFrame()
    for j in range (0,c):
        for k in range(j*r,(j+1)*r):
            df.loc[(k-j*r)+1,j+1]=float(f_col[k]) 
    df=df.transpose()
    df=df.rename(columns={1: 'Bus_ID', 2: "Month_pur", 3: "Year_pur",
                                  4: "Month_1st",5: "Year_1st",
                                  6: "Odo_1st",7: "Month_2nd",
                                  8: "Year_2nd",9: "Odo_2nd",
                                  10: "Month_begin" ,11: "Year_begin"})
    df['Bus_ID'] = df['Bus_ID'].astype(int)
    df['Bus_ID'] = df['Bus_ID'].astype(str)
    df=df.set_index('Bus_ID')
    df.to_pickle('./pkl/'+i+'.pkl')
    if l>0:
        dict_df[i]=df
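
The cell-by-cell fill above can also be written as a single reshape. The following is a minimal sketch, not part of the original notebook: the helper name `asc_to_frame` and the toy values are hypothetical, and it assumes, as the loop above does, that each file lists all `r` values of the first bus, then all `r` values of the second bus, and so on.

```python
import numpy as np
import pandas as pd

def asc_to_frame(values, n_rows, n_cols):
    """Reshape a flat sequence of numbers into one row per bus.

    Hypothetical helper: the first n_rows values are taken to belong to the
    first bus, the next n_rows values to the second bus, and so on.
    """
    arr = np.asarray(values, dtype=float)[: n_rows * n_cols]
    # One row per bus, columns numbered 1..n_rows as in the loop above.
    return pd.DataFrame(
        arr.reshape(n_cols, n_rows),
        index=range(1, n_cols + 1),
        columns=range(1, n_rows + 1),
    )

# Toy input with made-up numbers: 2 buses with 3 values each.
toy = asc_to_frame([101, 5, 1975, 102, 6, 1975], n_rows=3, n_cols=2)
print(toy)
```

For a real file, `values` could be obtained with `open(path).read().split()`. The result has the same layout as `df` after the `transpose()` call, so the renaming and indexing steps would apply unchanged.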
  
        


## Descriptives

The first two tables of the paper will be partially replicated.

### Odometer at Engine Replacement
The first table in Rust's paper describes the mileage at which an engine replacement occurred. Some buses had a second replacement during the observation period. For those buses, the odometer reading at the first replacement is subtracted from the reading at the second, which gives the actual lifetime mileage of the second engine.

In [48]:
df=pd.DataFrame()
for j,i in  enumerate(sorted(dict_df.keys())):
    df2=dict_df[i][['Odo_1st']][dict_df[i]['Odo_1st']>0]
    df2=df2.rename(columns={'Odo_1st':i})
    df3=dict_df[i][['Odo_2nd']].sub(dict_df[i]['Odo_1st'],axis=0)[dict_df[i]['Odo_2nd']>0]
    df3=df3.rename(columns={'Odo_2nd':i})
    df3=df3.set_index(df3.index+'_2')
    df4=pd.concat([df2,df3])
    if j==0:
        df=df4.describe()
        
    else:
        df=pd.concat([df,df4.describe()],axis=1)
df=df.transpose()
df=df.drop(df.columns[[4, 5, 6]], axis=1)
df[['max','min','mean','std','count']]

| Group | max | min | mean | std | count |
|---|---|---|---|---|---|
| Group 1 | NaN | NaN | NaN | NaN | 0.0 |
| Group 2 | NaN | NaN | NaN | NaN | 0.0 |
| Group 3 | 273400.0 | 124800.0 | 199733.333333 | 37459.413934 | 27.0 |
| Group 4 | 387300.0 | 121300.0 | 257336.363636 | 65477.003683 | 33.0 |
| Group 5 | 322500.0 | 118000.0 | 245290.909091 | 60257.870101 | 11.0 |
| Group 6 | 237200.0 | 82400.0 | 150785.714286 | 61006.814608 | 7.0 |
| Group 7 | 331800.0 | 121000.0 | 208962.962963 | 48980.923666 | 27.0 |
| Group 8 | 297500.0 | 132000.0 | 186700.0 | 43956.127117 | 19.0 |


### Never failing buses

The following descriptive statistics use the buses that never had an engine replacement. These data are right-censored, as the econometrician never observes the time of replacement, only that it lies beyond the end of the observation period. The table shows the variation in the odometer reading at the end of the observation period.

In [49]:
df=pd.DataFrame()
for j,i in enumerate(sorted(dict_df.keys())):
    df2=dict_df[i][[dict_df[i].columns.values[-1]]][dict_df[i]['Odo_1st']==0]
    df2=df2.rename(columns={df2.columns.values[0]:i})
    if j==0:
        df=df2.describe()
    else:
        df=pd.concat([df,df2.describe()],axis=1)
df=df.transpose()
df=df.drop(df.columns[[4, 5, 6]], axis=1)
df[['max','min','mean','std','count']]

| Group | max | min | mean | std | count |
|---|---|---|---|---|---|
| Group 1 | 120151.0 | 65643.0 | 100116.666667 | 12929.488359 | 15.0 |
| Group 2 | 161748.0 | 142009.0 | 151182.5 | 8529.85121 | 4.0 |
| Group 3 | 280802.0 | 199626.0 | 250766.428571 | 21324.86938 | 21.0 |
| Group 4 | 352450.0 | 310910.0 | 337221.6 | 17802.375327 | 5.0 |
| Group 5 | 326843.0 | 326843.0 | 326843.0 | NaN | 1.0 |
| Group 6 | 299040.0 | 232395.0 | 265263.666667 | 33331.770135 | 3.0 |
| Group 7 | NaN | NaN | NaN | NaN | 0.0 |
| Group 8 | NaN | NaN | NaN | NaN | 0.0 |
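
The selection of never-replaced buses can be sketched on a toy frame. The bus labels, the readings, and the column name `last_reading` are made up for illustration; the sketch assumes, as the cell above does, that `Odo_1st == 0` marks a bus without any replacement and that the last column holds the final odometer reading.

```python
import pandas as pd

# Made-up data: bus_b had a replacement, bus_a and bus_c did not.
toy = pd.DataFrame(
    {"Odo_1st": [0.0, 150000.0, 0.0],
     "last_reading": [100000.0, 210000.0, 120000.0]},
    index=["bus_a", "bus_b", "bus_c"],
)

# Keep only buses whose first-replacement odometer is 0 (never replaced).
never_replaced = toy["Odo_1st"] == 0

# Final odometer reading of these buses, taken from the last column.
censored = toy.loc[never_replaced, toy.columns[-1]]

# Same statistics as reported in the table above.
summary = censored.agg(["max", "min", "mean", "std", "count"])
print(summary)
```

For these buses the final reading is only a lower bound on the engine lifetime, which is why they are summarized separately from the replacement sample.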
