----
# Feb 7, 2020 Nested Logit with mlogit in R
* Name: Jikhan Jeong
* Ref: https://cran.r-project.org/web/packages/mlogit/vignettes/e2nlogit.html (Author: Kenneth Train and Yves Croissant)
* This is a reproduction of the above example without modification, intended to help understand the structure of nested logit.
----

In [2]:
library(mlogit)
library(lmtest)

---
# Problem 
* The choice of heating and central cooling system for 250 newly built houses (n = 250)
### Dependent variable: seven alternatives
* 1. **gcc**    : Gas central heat with cooling
* 2. **ecc**    : Electric central resistance heat with cooling
* 3. **erc**    : Electric room resistance heat with cooling
* 4. **hpc**    : Electric heat pump, which also provides cooling
* 5. **gc**     : Gas central heat without cooling
* 6. **ec**     : Electric central resistance heat without cooling
* 7. **er**     : Electric room resistance heat without cooling
---
### Variables
* 1. **depvar** gives the name of the chosen alternative (the dependent variable)
* 2. **ich.alt** are the installation costs for the heating portion of the system
* 3. **icca** is the installation cost for cooling
* 4. **och.alt** are the operating costs for the heating portion of the system
* 5. **occa** is the operating cost for cooling
* 6. **income** is the annual income of the household
---

- <font color = blue> **Q1**. Run a nested logit model on the data for two nests and one log-sum coefficient that applies to both nests. </font>
- Nest 1: (gcc, ecc, erc, hpc)
- Nest 2: (gc, ec, er)
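
To make the structure of the two-nest model concrete, here is a minimal base-R sketch of the nested logit choice probabilities for a single decision maker. It is not how `mlogit` computes them internally. The function name `nested_logit_probs` and the utility values in `V` are hypothetical and chosen only for illustration. The nest definitions match Q1.

```r
# Nested logit choice probabilities for one decision maker (illustrative sketch).
# V:      named numeric vector of representative utilities (hypothetical values)
# nests:  named list of character vectors that partition names(V)
# lambda: log-sum coefficient(s); a single value is shared by all nests
nested_logit_probs <- function(V, nests, lambda) {
  lambda <- rep_len(lambda, length(nests))
  names(lambda) <- names(nests)
  # For each nest k: sum over its alternatives of exp(V_j / lambda_k)
  nest_sums <- sapply(names(nests), function(k) {
    sum(exp(V[nests[[k]]] / lambda[[k]]))
  })
  # Denominator: sum over nests of (nest sum)^lambda_k
  denom <- sum(nest_sums^lambda)
  probs <- numeric(0)
  for (k in names(nests)) {
    p_k <- exp(V[nests[[k]]] / lambda[[k]]) *
      nest_sums[[k]]^(lambda[[k]] - 1) / denom
    probs <- c(probs, p_k)
  }
  probs[names(V)]
}

# Hypothetical utilities for the seven alternatives
V <- c(gcc = 1.0, ecc = 0.2, erc = -0.5, hpc = 0.4,
       gc = 0.6, ec = -0.3, er = -0.8)
nests <- list(cooling = c("gcc", "ecc", "erc", "hpc"),
              other   = c("gc", "ec", "er"))

p_nested <- nested_logit_probs(V, nests, lambda = 0.59)  # one shared log-sum coefficient
p_logit  <- nested_logit_probs(V, nests, lambda = 1)     # collapses to standard logit
```

With `lambda = 1` the nest terms drop out and the probabilities equal `exp(V) / sum(exp(V))`, which is why the test in part B compares the log-sum coefficient with 1.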

In [6]:
data("HC", package = "mlogit")

In [8]:
HC <- mlogit.data(HC, varying = c(2:8, 10:16), choice = "depvar", shape = "wide")

In [10]:
cooling.modes <- index(HC)$alt %in% c('gcc', 'ecc', 'erc', 'hpc')
room.modes <- index(HC)$alt %in% c('erc', 'er')

In [None]:
# Cooling installation and operating costs (icca, occa) are constant across alternatives
# and only relevant for systems with cooling, so set them to 0 for the non-cooling alternatives

In [11]:
HC$icca[!cooling.modes] <- 0
HC$occa[!cooling.modes] <- 0

In [12]:
# create income variables for two sets of alternatives: cooling systems and room systems
HC$inc.cooling <- HC$inc.room <- 0
HC$inc.cooling[cooling.modes] <- HC$income[cooling.modes]
HC$inc.room[room.modes] <- HC$income[room.modes]

In [13]:
# create an intercept for cooling modes
HC$int.cooling <- as.numeric(cooling.modes)

In [14]:
# estimate the model with a single log-sum coefficient shared by both nests
nl <- mlogit(depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling | 0, HC,
             nests = list(cooling = c('gcc', 'ecc', 'erc', 'hpc'),
                          other = c('gc', 'ec', 'er')),
             un.nest.el = TRUE) # un.nest.el = TRUE: the same log-sum coefficient for every nest
summary(nl)


Call:
mlogit(formula = depvar ~ ich + och + icca + occa + inc.room + 
    inc.cooling + int.cooling | 0, data = HC, nests = list(cooling = c("gcc", 
    "ecc", "erc", "hpc"), other = c("gc", "ec", "er")), un.nest.el = TRUE)

Frequencies of alternatives:
   ec   ecc    er   erc    gc   gcc   hpc 
0.004 0.016 0.032 0.004 0.096 0.744 0.104 

bfgs method
11 iterations, 0h:0m:0s 
g'(-H)^-1g = 7.26E-06 
successive function values within tolerance limits 

Coefficients :
             Estimate Std. Error z-value  Pr(>|z|)    
ich         -0.554878   0.144205 -3.8478 0.0001192 ***
och         -0.857886   0.255313 -3.3601 0.0007791 ***
icca        -0.225079   0.144423 -1.5585 0.1191212    
occa        -1.089458   1.219821 -0.8931 0.3717882    
inc.room    -0.378971   0.099631 -3.8038 0.0001425 ***
inc.cooling  0.249575   0.059213  4.2149 2.499e-05 ***
int.cooling -6.000415   5.562423 -1.0787 0.2807030    
iv           0.585922   0.179708  3.2604 0.0011125 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘

---
#### A. The estimated log-sum coefficient is 0.59. What does this estimate tell you about the degree of correlation in unobserved factors over alternatives within each nest?
* (Ans) The correlation is approximately 1 − 0.59 = 0.41. It’s a moderate correlation.
---
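
The arithmetic in answer A can be checked directly. This is a minimal sketch that copies the `iv` estimate from the `summary(nl)` output above rather than reading it from the fitted object. The variable names `iv_hat` and `approx_corr` are hypothetical, and 1 minus the log-sum coefficient is used only as the rough indicator of correlation that the answer relies on.

```r
# 'iv' estimate copied by hand from summary(nl)
iv_hat <- 0.585922

# Rough indicator of within-nest correlation in unobserved factors
approx_corr <- 1 - iv_hat
round(approx_corr, 2)
```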

#### B. Test the hypothesis that the log-sum coefficient is 1.0 (the value that it takes for a standard logit model.) Can the hypothesis that the true model is standard logit be rejected?
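* t-test: t = (0.586 − 1) / 0.180 ≈ −2.30
* The critical value of t at the 95% confidence level is 1.96. Since |t| ≈ 2.30 > 1.96, we can reject the hypothesis that the true model is standard logit at 95% confidence.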
---

In [16]:
(coef(nl)['iv'] - 1) / sqrt(vcov(nl)['iv', 'iv'])
# vcov :  variance-covariance matrix of the main parameters of a fitted model object.
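
The same statistic can be reproduced without the fitted object. This is a sketch that assumes the estimate and standard error are copied by hand from the `iv` row of `summary(nl)`. The names `iv_hat`, `iv_se`, `t_stat` and `p_value` are hypothetical, and the p-value uses the normal approximation that the z-values in the summary table also rely on.

```r
# Values copied from the 'iv' row of summary(nl)
iv_hat <- 0.585922
iv_se  <- 0.179708

# Test statistic for H0: iv = 1 (standard logit)
t_stat <- (iv_hat - 1) / iv_se

# Two-sided p-value under the normal approximation
p_value <- 2 * pnorm(-abs(t_stat))

# Reject H0 at the 5% level when |t| exceeds 1.96
reject_h0 <- abs(t_stat) > qnorm(0.975)
c(t = t_stat, p = p_value)
```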

---
#### <font color = 'blue' > Q2. Estimate the model that has the cooling alternatives in one nest and the non-cooling alternatives in the other nest (as in exercise 1), with a separate log-sum coefficient for each nest. </font>
* Two log-sum coefficients: one for each nest
---

In [17]:
nl3 <- update(nl, un.nest.el = FALSE)

In [18]:
lrtest(nl, nl3)

#Df     LogLik  Df      Chisq  Pr(>Chisq)
  8  -178.1247
  9  -178.0368   1  0.1758243   0.6749866
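
The likelihood ratio test reported by `lrtest(nl, nl3)` can be recomputed by hand as a check. This sketch assumes the two log-likelihoods are copied from the table above, so the result matches only up to their rounding. The variable names are hypothetical.

```r
# Log-likelihoods copied from the lrtest output
ll_restricted   <- -178.1247  # nl: one shared log-sum coefficient (8 parameters)
ll_unrestricted <- -178.0368  # nl3: one log-sum coefficient per nest (9 parameters)

# LR statistic: twice the gain in log-likelihood
lr_stat <- 2 * (ll_unrestricted - ll_restricted)

# One restriction is tested: the two log-sum coefficients are equal
df <- 9 - 8
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)
c(Chisq = lr_stat, p = p_value)
```

The large p-value means the hypothesis that both nests share a single log-sum coefficient cannot be rejected.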


In [19]:
summary(nl3)


Call:
mlogit(formula = depvar ~ ich + och + icca + occa + inc.room + 
    inc.cooling + int.cooling | 0, data = HC, nests = list(cooling = c("gcc", 
    "ecc", "erc", "hpc"), other = c("gc", "ec", "er")), un.nest.el = FALSE)

Frequencies of alternatives:
   ec   ecc    er   erc    gc   gcc   hpc 
0.004 0.016 0.032 0.004 0.096 0.744 0.104 

bfgs method
4 iterations, 0h:0m:0s 
g'(-H)^-1g =  1.18 
last step couldn't find higher value 

Coefficients :
             Estimate Std. Error z-value  Pr(>|z|)    
ich         -0.562283   0.146145 -3.8474 0.0001194 ***
och         -0.895493   0.271861 -3.2939 0.0009880 ***
icca        -0.267062   0.150310 -1.7767 0.0756103 .  
occa        -1.338514   1.264215 -1.0588 0.2897042    
inc.room    -0.381441   0.096658 -3.9463 7.937e-05 ***
inc.cooling  0.259932   0.062085  4.1867 2.830e-05 ***
int.cooling -4.821927   5.528796 -0.8721 0.3831277    
iv:cooling   0.611529   0.188736  3.2401 0.0011947 ** 
iv:other     0.378394   0.133617  2.8319 0.0046270 *