# Choice Model Data

- Travel mode data: each traveller chooses one of four modes (air, train, bus, car)
- Long format: there is one row for each alternative, so each choice situation spans as many rows as there are alternatives
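A minimal base-R sketch of what long format looks like. It uses a hypothetical two-traveller toy data frame: the column names mirror the TravelMode columns used below, but every value is made up.

```r
# Hypothetical toy data: 2 travellers choosing among 4 modes (values are made up).
# Long format: one row per (individual, alternative) pair.
modes <- c("air", "train", "bus", "car")
toy_long <- data.frame(
  individual = rep(1:2, each = length(modes)),
  mode       = rep(modes, times = 2),
  choice     = c("no", "yes", "no", "no",    # individual 1 chose train
                 "no", "no", "no", "yes"),   # individual 2 chose car
  wait       = c(60, 30, 40, 0, 45, 25, 35, 0),
  stringsAsFactors = FALSE
)
print(toy_long)

# Each choice situation contributes one row per alternative,
# and exactly one of those rows is marked as chosen.
rows_per_person   <- table(toy_long$individual)
chosen_per_person <- tapply(toy_long$choice == "yes", toy_long$individual, sum)
```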

```r
library(mlogit)
```


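```r
# Long-format travel mode data: one row per traveller and mode.
# The file appears to be an export of data('TravelMode', package = 'AER').
TravelMode = read.csv('TravelMode_data.csv', header = TRUE)
```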

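```r
# Rows come in blocks of four per traveller; alt.levels names the
# alternatives in the order they appear within each block
travel.logit = mlogit.data(TravelMode, shape = 'long', choice = 'choice',
                           alt.levels = c('air', 'train', 'bus', 'car'))
```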

### A simple choice model for travel mode

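```r
# wait, vcost and travel get generic coefficients; bus is the reference
# alternative, so its intercept is normalized to 0
travel.m1 = mlogit(choice ~ wait + vcost + travel, travel.logit,
                   reflevel = 'bus')
```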

```r
summary(travel.m1)
```

```
Call:
mlogit(formula = choice ~ wait + vcost + travel, data = travel.logit, 
    reflevel = "bus", method = "nr", print.level = 0)

Frequencies of alternatives:
    bus     air   train     car 
0.14286 0.27619 0.30000 0.28095 

nr method
5 iterations, 0h:0m:0s 
g'(-H)^-1g = 0.000192 
successive function values within tolerance limits 

Coefficients :
                     Estimate  Std. Error t-value  Pr(>|t|)    
air:(intercept)    1.43363372  0.68071345  2.1061   0.03520 *  
train:(intercept)  0.64696705  0.29739688  2.1754   0.02960 *  
car:(intercept)   -3.30622276  0.45832999 -7.2136 5.449e-13 ***
wait              -0.09688675  0.01034202 -9.3683 < 2.2e-16 ***
vcost             -0.01391160  0.00665133 -2.0916   0.03648 *  
travel            -0.00399468  0.00084915 -4.7043 2.547e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log-Likelihood: -192.89
McFadden R^2:  0.32024 
Likelihood ratio test : chisq = 181.74 (p.value = < 2.22e-16)
```
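These coefficients define a conditional logit model. Each mode's utility is its intercept (0 for the reference mode, bus) plus the generic coefficients times its wait, vcost and travel values. The choice probabilities are then exp(utility) divided by the sum over all four modes. The sketch below is a base-R hand computation, not a call to mlogit. It plugs the estimates into that formula for one traveller with illustrative, made-up attribute values. It also forms the coefficient ratios that give the implied cost value of time.

```r
# Coefficients copied from summary(travel.m1); bus is the reference (intercept 0).
asc  <- c(bus = 0, air = 1.43363372, train = 0.64696705, car = -3.30622276)
beta <- c(wait = -0.09688675, vcost = -0.01391160, travel = -0.00399468)

# Illustrative attribute values for one hypothetical traveller (not from the data).
x <- rbind(
  bus   = c(wait = 35, vcost = 25, travel = 417),
  air   = c(wait = 69, vcost = 59, travel = 100),
  train = c(wait = 34, vcost = 31, travel = 372),
  car   = c(wait = 0,  vcost = 10, travel = 180)
)

# Conditional logit: P_j = exp(V_j) / sum_k exp(V_k)
v <- asc[rownames(x)] + drop(x %*% beta)
p <- exp(v) / sum(exp(v))
round(p, 3)

# Ratios of coefficients: cost units a traveller would give up
# to save one unit of in-vehicle time or of waiting time
value_of_travel_time <- beta[["travel"]] / beta[["vcost"]]
value_of_wait_time   <- beta[["wait"]]   / beta[["vcost"]]
```

With these estimates, one unit of in-vehicle time is worth about 0.29 units of vcost. One unit of waiting time is worth about 7, so waiting weighs far more heavily than time spent travelling.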

### Another choice model for travel mode

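```r
# Variables after "|" are individual-specific: income gets one coefficient
# per non-reference alternative (the bus coefficient is fixed at 0)
travel.m2 = mlogit(choice ~ wait + vcost + travel | income, travel.logit,
                   reflevel = 'bus')
```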

```r
summary(travel.m2)
```

```
Call:
mlogit(formula = choice ~ wait + vcost + travel | income, data = travel.logit, 
    reflevel = "bus", method = "nr", print.level = 0)

Frequencies of alternatives:
    bus     air   train     car 
0.14286 0.27619 0.30000 0.28095 

nr method
5 iterations, 0h:0m:0s 
g'(-H)^-1g = 0.000546 
successive function values within tolerance limits 

Coefficients :
                     Estimate  Std. Error t-value  Pr(>|t|)    
air:(intercept)    0.18436561  0.89664384  0.2056   0.83709    
train:(intercept)  1.42648960  0.55740326  2.5592   0.01049 *  
car:(intercept)   -4.06305942  0.68715749 -5.9129 3.362e-09 ***
wait              -0.09528341  0.01035524 -9.2015 < 2.2e-16 ***
vcost             -0.00449878  0.00721124 -0.6239   0.53272    
travel            -0.00366471  0.00086797 -4.2222 2.420e-05 ***
air:income         0.02311070  0.01645639  1.4044   0.16021    
train:income      -0.03278436  0.01708649 -1.9187   0.05502 .  
car:income         0.02521351  0.01567725  1.6083   0.10777  
```
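Income does not vary across modes, so it enters through alternative-specific coefficients (`air:income`, `train:income`, `car:income`), with the bus coefficient fixed at 0. The base-R sketch below computes the model's probabilities by hand, using the estimates above and illustrative attribute values that are not taken from the data. It compares a hypothetical low-income traveller (income 20) with a high-income one (income 60).

```r
# Coefficients copied from summary(travel.m2); bus is the reference,
# so its intercept and income coefficient are fixed at 0.
asc   <- c(bus = 0, air = 0.18436561, train = 1.42648960, car = -4.06305942)
gamma <- c(bus = 0, air = 0.02311070, train = -0.03278436, car = 0.02521351)
beta  <- c(wait = -0.09528341, vcost = -0.00449878, travel = -0.00366471)

# Illustrative attribute values for one hypothetical traveller (not from the data)
x <- rbind(
  bus   = c(wait = 35, vcost = 25, travel = 417),
  air   = c(wait = 69, vcost = 59, travel = 100),
  train = c(wait = 34, vcost = 31, travel = 372),
  car   = c(wait = 0,  vcost = 10, travel = 180)
)

# Utility = intercept + income effect + generic attribute effects
mnl_probs <- function(income) {
  v <- asc + gamma * income + drop(x[names(asc), ] %*% beta)
  exp(v) / sum(exp(v))
}

p_low  <- mnl_probs(income = 20)
p_high <- mnl_probs(income = 60)
round(rbind(low = p_low, high = p_high), 3)
```

`car:income` is the largest income coefficient and `train:income` the smallest. So raising income always raises the predicted car probability and lowers the train probability, whatever the mode attributes are.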

## Another Data Example

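- The data set `Heating` (shipped with mlogit) records the choice of heating system in California houses
- 900 observations in wide format (one row per house)
- Variables:
    - `idcase`: id; `depvar`: chosen system, one of gc (gas central), gr (gas room), ec (electric central), er (electric room), hp (heat pump)
    - `ic.alt`: installation cost of system alt
    - `oc.alt`: annual operating cost of system alt
    - `income`, `agehed` (age of the household head), `rooms` (number of rooms), `region`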

```r
data('Heating', package = 'mlogit')
```

```r
head(Heating)
```

| idcase | depvar | ic.gc | ic.gr | ic.ec | ic.er | ic.hp | oc.gc | oc.gr | oc.ec | oc.er | oc.hp | income | agehed | rooms | region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | gc | 866.0 | 962.64 | 859.9 | 995.76 | 1135.5 | 199.69 | 151.72 | 553.34 | 505.6 | 237.88 | 7 | 25 | 6 | ncostl |
| 2 | gc | 727.93 | 758.89 | 796.82 | 894.69 | 968.9 | 168.66 | 168.66 | 520.24 | 486.49 | 199.19 | 5 | 60 | 5 | scostl |
| 3 | gc | 599.48 | 783.05 | 719.86 | 900.11 | 1048.3 | 165.58 | 137.8 | 439.06 | 404.74 | 171.47 | 4 | 65 | 2 | ncostl |
| 4 | er | 835.17 | 793.06 | 761.25 | 831.04 | 1048.7 | 180.88 | 147.14 | 483.0 | 425.22 | 222.95 | 2 | 50 | 4 | scostl |
| 5 | er | 755.59 | 846.29 | 858.86 | 985.64 | 883.05 | 174.91 | 138.9 | 404.41 | 389.52 | 178.49 | 2 | 25 | 6 | valley |
| 6 | gc | 666.11 | 841.71 | 693.74 | 862.56 | 859.18 | 135.67 | 140.97 | 398.22 | 371.04 | 209.27 | 6 | 65 | 7 | scostl |

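```r
# Columns 3:12 (ic.gc, ..., oc.hp) vary by alternative; their names are split
# at "." into a variable (ic, oc) and an alternative (gc, gr, ec, er, hp)
H.data = mlogit.data(Heating, shape = 'wide', choice = 'depvar',
                     varying = 3:12)
```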
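`mlogit.data(..., shape = 'wide')` turns each household row into one row per heating system. As a rough base-R analogue (not what mlogit does internally), `reshape()` performs the same wide-to-long conversion on the first two households shown by `head(Heating)`, keeping only a few of their columns. The `choice` column is an assumption of this sketch: it marks the row whose alternative matches `depvar`.

```r
# First two households from head(Heating), restricted to a few columns
heat <- data.frame(
  idcase = 1:2,
  depvar = c("gc", "gc"),
  ic.gc = c(866.00, 727.93), ic.gr = c(962.64, 758.89), ic.ec = c(859.90, 796.82),
  ic.er = c(995.76, 894.69), ic.hp = c(1135.50, 968.90),
  oc.gc = c(199.69, 168.66), oc.gr = c(151.72, 168.66), oc.ec = c(553.34, 520.24),
  oc.er = c(505.60, 486.49), oc.hp = c(237.88, 199.19),
  income = c(7, 5),
  stringsAsFactors = FALSE
)

# Split "ic.gc" etc. at "." into a variable (ic, oc) and an alternative (gc, ...)
long <- reshape(heat, direction = "long",
                varying = names(heat)[3:12], sep = ".",
                idvar = "idcase", timevar = "alt")
long <- long[order(long$idcase, long$alt), ]

# TRUE on the row of the system each household actually chose
long$choice <- long$depvar == long$alt
long
```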

```r
# "| 0" removes the alternative-specific intercepts: only ic and oc enter
heating.m1 = mlogit(depvar ~ ic + oc | 0, H.data, reflevel = 'hp')
summary(heating.m1)
```

```
Call:
mlogit(formula = depvar ~ ic + oc | 0, data = H.data, reflevel = "hp", 
    method = "nr", print.level = 0)

Frequencies of alternatives:
      hp       ec       er       gc       gr 
0.055556 0.071111 0.093333 0.636667 0.143333 

nr method
4 iterations, 0h:0m:0s 
g'(-H)^-1g = 1.56E-07 
gradient close to zero 

Coefficients :
      Estimate  Std. Error t-value  Pr(>|t|)    
ic -0.00623187  0.00035277 -17.665 < 2.2e-16 ***
oc -0.00458008  0.00032216 -14.217 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log-Likelihood: -1095.2
```

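To see how well the model fits the data, compare its predicted market shares (the average predicted probability of each system across households) with the observed frequencies above: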

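```r
# outcome = FALSE returns the full matrix of predicted probabilities
# (one row per household, one column per system)
heating.fit1 = fitted(heating.m1, outcome = FALSE)
colMeans(heating.fit1)
```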
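For the model without intercepts, household i chooses system j with probability exp(V_ij) / sum_k exp(V_ik), where V_ij = β_ic·ic_ij + β_oc·oc_ij. A predicted market share is the average of these probabilities over households. The base-R sketch below does that calculation with the heating.m1 estimates and the first two households of `head(Heating)`. With only two households the resulting shares are illustrative, not the model's full-sample shares.

```r
# Coefficients copied from summary(heating.m1) (no intercepts)
b_ic <- -0.00623187
b_oc <- -0.00458008
alts <- c("gc", "gr", "ec", "er", "hp")

# Installation and operating costs of the first two households in head(Heating)
ic <- rbind(c(866.00, 962.64, 859.90, 995.76, 1135.50),
            c(727.93, 758.89, 796.82, 894.69,  968.90))
oc <- rbind(c(199.69, 151.72, 553.34, 505.60, 237.88),
            c(168.66, 168.66, 520.24, 486.49, 199.19))
colnames(ic) <- colnames(oc) <- alts

# Choice probabilities, one row per household
v <- b_ic * ic + b_oc * oc
p <- exp(v) / rowSums(exp(v))

# Predicted market share = average probability over households
shares <- colMeans(p)
round(shares, 3)
```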

## Fit the Fixed Effect Model

A model with alternative-specific intercepts (one constant per system, with hp as the reference):

```r
heating.m2 = mlogit(depvar ~ ic + oc, H.data, reflevel = 'hp')
summary(heating.m2)
```

```
Call:
mlogit(formula = depvar ~ ic + oc, data = H.data, reflevel = "hp", 
    method = "nr", print.level = 0)

Frequencies of alternatives:
      hp       ec       er       gc       gr 
0.055556 0.071111 0.093333 0.636667 0.143333 

nr method
6 iterations, 0h:0m:0s 
g'(-H)^-1g = 9.58E-06 
successive function values within tolerance limits 

Coefficients :
                  Estimate  Std. Error t-value  Pr(>|t|)    
ec:(intercept)  1.65884594  0.44841936  3.6993 0.0002162 ***
er:(intercept)  1.85343697  0.36195509  5.1206 3.045e-07 ***
gc:(intercept)  1.71097930  0.22674214  7.5459 4.485e-14 ***
gr:(intercept)  0.30826328  0.20659222  1.4921 0.1356640    
ic             -0.00153315  0.00062086 -2.4694 0.0135333 *  
oc             -0.00699637  0.00155408 -4.5019 6.734e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log-Likelihood: -1008.2
McFadden R^2:  0.013691 
Likelihood ratio test : chisq = 27.99 (p.value = 8.3572e-07)
```
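The ratio of the oc and ic coefficients is a common summary of these models. It gives the extra installation cost a household would accept for a one-unit reduction in annual operating cost. Suppose, as a simplifying assumption, that the saving lasts forever. Its present value at discount rate r is then 1/r, so the same coefficients imply r = β_ic / β_oc. The base-R sketch below uses the estimates from heating.m1 and heating.m2.

```r
# ic and oc coefficients copied from summary(heating.m1) and summary(heating.m2)
coefs <- rbind(
  m1 = c(ic = -0.00623187, oc = -0.00458008),
  m2 = c(ic = -0.00153315, oc = -0.00699637)
)

# Extra installation cost accepted for a one-unit cut in annual operating cost
wtp <- coefs[, "oc"] / coefs[, "ic"]

# Assuming a perpetual saving: present value 1/r = wtp, so r = ic / oc
r <- coefs[, "ic"] / coefs[, "oc"]
round(cbind(wtp, r), 3)
```

Without intercepts (heating.m1), the implied discount rate is about 136%, which is hard to believe. With the intercepts (heating.m2), it falls to about 22%.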

```r
heating.fit2 = fitted(heating.m2, outcome = FALSE)
colMeans(heating.fit2)
```

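With a full set of alternative-specific intercepts, the fitted market shares are identical to the observed market shares. This holds for any logit model with such intercepts estimated by maximum likelihood. Setting the derivative of the log-likelihood with respect to the intercept of system $j$ to zero gives $\sum_i (y_{ij} - P_{ij}) = 0$, where $y_{ij}$ is 1 if household $i$ chose $j$ and $P_{ij}$ is its predicted probability. The average predicted probability of $j$ therefore equals its observed share. The reference system hp matches too, because both sets of shares sum to one.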
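A small simulation can confirm this property without mlogit. The base-R sketch below uses simulated data and a three-alternative conditional logit written by hand and maximized with `optim()`. All names and values are hypothetical. It fits intercepts plus one generic coefficient, then compares the average fitted probabilities with the observed shares.

```r
set.seed(42)

# Hypothetical simulated data: n choice situations, J alternatives,
# one generic attribute x; alternative 1 is the reference (intercept 0)
n <- 500
J <- 3
x <- matrix(rnorm(n * J), n, J)
asc_true  <- c(0, 0.5, -0.5)
beta_true <- 1
gumbel  <- matrix(-log(-log(runif(n * J))), n, J)   # type I extreme value errors
utility <- sweep(beta_true * x, 2, asc_true, "+") + gumbel
y <- max.col(utility)                               # index of the chosen alternative

# Choice probabilities for theta = (asc_2, asc_3, beta)
probs <- function(theta) {
  v  <- sweep(theta[3] * x, 2, c(0, theta[1:2]), "+")
  ev <- exp(v - apply(v, 1, max))                   # subtract row max for stability
  ev / rowSums(ev)
}

negloglik <- function(theta) -sum(log(probs(theta)[cbind(seq_len(n), y)]))

# Analytic gradient: d logL / d asc_j = sum_i (y_ij - P_ij)
neggrad <- function(theta) {
  resid <- diag(J)[y, ] - probs(theta)
  -c(colSums(resid)[2:3], sum(resid * x))
}

fit <- optim(c(0, 0, 0), negloglik, neggrad, method = "BFGS",
             control = list(reltol = 1e-14, maxit = 500))

observed_shares <- tabulate(y, nbins = J) / n
fitted_shares   <- colMeans(probs(fit$par))
rbind(observed = observed_shares, fitted = fitted_shares)
```

The two rows agree up to the optimizer's tolerance, because the intercept gradient in `neggrad` is exactly n times the gap between observed and fitted shares.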

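A richer model adds household characteristics (income, age of the household head, number of rooms and region) as individual-specific variables, each with one coefficient per non-reference system: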

```r
heating.m3 = mlogit(depvar ~ ic + oc | income + agehed + rooms + region,
                    H.data, reflevel = 'hp')
summary(heating.m3)
```

```
Call:
mlogit(formula = depvar ~ ic + oc | income + agehed + rooms + 
    region, data = H.data, reflevel = "hp", method = "nr", print.level = 0)

Frequencies of alternatives:
      hp       ec       er       gc       gr 
0.055556 0.071111 0.093333 0.636667 0.143333 

nr method
6 iterations, 0h:0m:0s 
g'(-H)^-1g = 0.000205 
successive function values within tolerance limits 

Coefficients :
                   Estimate  Std. Error t-value  Pr(>|t|)    
ec:(intercept)   0.74157651  1.11159204  0.6671   0.50469    
er:(intercept)   2.39869671  1.00459037  2.3877   0.01695 *  
gc:(intercept)   1.18644866  0.81402383  1.4575   0.14498    
gr:(intercept)   0.34596215  0.90359240  0.3829   0.70181    
ic              -0.00151382  0.00062611 -2.4178   0.01561 *  
oc              -0.00695406  0.00156315 -4.4488 8.637e-06 ***
ec:income       -0.05946817  0.11392698 -0.5220   0.60168    
er:income       -0.09752868  0.10803426 -0.9028   0.36665    
gc:income       -0.06630568  0.08943008 -0.7414 
```

```r
# Information criteria for the three heating models (lower is better)
AIC(heating.m1)
AIC(heating.m2)
AIC(heating.m3)

BIC(heating.m1)
BIC(heating.m2)
BIC(heating.m3)
```
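AIC and BIC can also be reproduced by hand from the log-likelihoods reported in the summaries. The formulas are AIC = -2 logL + 2k and BIC = -2 logL + k log(n), where k is the number of estimated coefficients and n = 900 households. Because heating.m1 is heating.m2 with its four intercepts fixed at zero, the two models can also be compared with a likelihood ratio test. The base-R sketch below leaves heating.m3 out, because its log-likelihood is not shown in the output above. The printed log-likelihoods are rounded, so these values may differ slightly from the `AIC()` and `BIC()` output.

```r
# Log-likelihoods and coefficient counts taken from the printed summaries
ll <- c(m1 = -1095.2, m2 = -1008.2)
k  <- c(m1 = 2,       m2 = 6)   # m1: ic, oc; m2: 4 intercepts + ic + oc
n  <- 900                       # households in Heating

aic <- -2 * ll + 2 * k
bic <- -2 * ll + log(n) * k

# Likelihood ratio test of the 4 intercept restrictions (m1 nested in m2)
lr      <- 2 * (ll[["m2"]] - ll[["m1"]])
p_value <- pchisq(lr, df = k[["m2"]] - k[["m1"]], lower.tail = FALSE)
round(rbind(aic, bic), 1)
```

Both criteria and the likelihood ratio test (statistic 174 on 4 degrees of freedom) favour keeping the intercepts.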