# Task 1

In [5]:
# Install mlogit if not already installed
if (!require(mlogit, quietly = TRUE)) {
    cat("Installing mlogit package...\n")
    install.packages("mlogit", dependencies = TRUE)
    library(mlogit)
}

# Check if mlogit loaded successfully
if (!"mlogit" %in% loadedNamespaces()) {
    stop("mlogit package failed to load. Please install manually.")
} else {
    cat("mlogit package loaded successfully.\n")
}

# Read in the CSV (adjust path as needed)
commute_raw <- read.csv("Commute_Mode.csv", stringsAsFactors = TRUE)

# Ensure 'mode' is a factor and 'id' is an identifier
commute_raw$mode <- as.factor(commute_raw$mode)
commute_raw$id <- as.character(commute_raw$id)


mlogit package loaded successfully.


In [6]:
commute_mlogit <- mlogit.data(
    data     = commute_raw,
    choice   = "choice", # column with 1 = chosen alternative
    shape    = "long", # data is in long format
    chid.var = "id", # individual ID
    alt.var  = "mode" # alternative identifier
)

# Inspect the first few rows
head(commute_mlogit)


~~~~~~~
 first 10 observations out of 1812 
~~~~~~~
    id    mode choice     cost     time     idx
1    1     bus  FALSE 1.800512 20.86779   1:bus
2    1     car   TRUE 1.507010 18.50320   1:car
3    1 carpool  FALSE 2.335612 26.33823  1:pool
4    1    rail  FALSE 2.358920 30.03347  1:rail
5   10     bus  FALSE 2.003332 44.43243  10:bus
6   10     car   TRUE 4.242578 16.58784  10:car
7   10 carpool  FALSE 1.799301 18.30912 10:pool
8   10    rail  FALSE 2.168862 29.48583 10:rail
9  100     bus   TRUE 1.567057 15.58405 100:bus
10 100     car  FALSE 6.167042 20.07002 100:car

~~~ indexes ~~~~
   chid     alt
1     1     bus
2     1     car
3     1 carpool
4     1    rail
5    10     bus
6    10     car
7    10 carpool
8    10    rail
9   100     bus
10  100     car
indexes:  1, 2 


## Data Import and Formatting

We read the survey data (453 respondents, 4 commuting alternatives, already in “long” format) and indexed it for estimation with `mlogit.data()`.  

- **Individuals (chid)**: 453  
- **Alternatives per individual**: 4 (bus, car, carpool, rail)  
- **Total observations**: 453 × 4 = 1812  
- **Choice column**: `choice` (TRUE if that mode was chosen)  
- **Alternative identifier**: `mode`  
- **Attributes**:  
  - `cost`: total travel cost  
  - `time`: travel time in minutes  
- **Index**: `idx` combines `id` and `mode` (e.g. `1:car`)  

**First 10 rows of the prepared `commute_mlogit` data frame:**

| id  | mode    | choice |  cost  |   time   |  idx   |
|-----|---------|--------|-------:|---------:|--------|
| 1   | bus     | FALSE  | 1.8005 | 20.8678  | 1:bus  |
| 1   | car     | TRUE   | 1.5070 | 18.5032  | 1:car  |
| 1   | carpool | FALSE  | 2.3356 | 26.3382  | 1:pool |
| 1   | rail    | FALSE  | 2.3589 | 30.0335  | 1:rail |
| 10  | bus     | FALSE  | 2.0033 | 44.4324  | 10:bus |
| 10  | car     | TRUE   | 4.2426 | 16.5878  | 10:car |
| 10  | carpool | FALSE  | 1.7993 | 18.3091  | 10:pool|
| 10  | rail    | FALSE  | 2.1689 | 29.4858  | 10:rail|
| 100 | bus     | TRUE   | 1.5671 | 15.5841  |100:bus |
| 100 | car     | FALSE  | 6.1670 | 20.0700  |100:car |

This confirms that our data are correctly specified for multinomial logit estimation.  
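
As a quick check on the long-format layout, the sketch below rebuilds the first two respondents from the table above as a toy data frame and verifies the two properties the model relies on: one row per alternative for each respondent, and exactly one chosen alternative. The name `toy` and the helper vectors are illustrative and not part of the original analysis; the same two checks could be applied to `commute_raw`.

```r
# Toy long-format data: the first two respondents shown above (ids 1 and 10).
toy <- data.frame(
    id     = rep(c("1", "10"), each = 4),
    mode   = rep(c("bus", "car", "carpool", "rail"), times = 2),
    choice = c(FALSE, TRUE, FALSE, FALSE,
               FALSE, TRUE, FALSE, FALSE)
)

# Each respondent should contribute one row per alternative ...
rows_per_id <- as.vector(table(toy$id))
# ... and exactly one chosen alternative.
chosen_per_id <- as.vector(tapply(toy$choice, toy$id, sum))

print(rows_per_id)
print(chosen_per_id)
```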

---

# Task 2

In [7]:
# Predict choice using cost and time (with alternative‐specific constants by default)
model_mnl <- mlogit(choice ~ cost + time, data = commute_mlogit)

# Display estimation results
summary(model_mnl)



Call:
mlogit(formula = choice ~ cost + time, data = commute_mlogit, 
    method = "nr")

Frequencies of alternatives:choice
    bus     car carpool    rail 
0.17881 0.48124 0.07064 0.26932 

nr method
5 iterations, 0h:0m:0s 
g'(-H)^-1g = 6.07E-07 
gradient close to zero 

Coefficients :
                      Estimate Std. Error  z-value  Pr(>|z|)    
(Intercept):car      3.2924661  0.3172767  10.3773 < 2.2e-16 ***
(Intercept):carpool -0.9051585  0.2459427  -3.6804 0.0002329 ***
(Intercept):rail     0.6277690  0.1633612   3.8428 0.0001216 ***
cost                -0.7723478  0.0919795  -8.3970 < 2.2e-16 ***
time                -0.0853574  0.0077484 -11.0161 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-Likelihood: -354.45
McFadden R^2:  0.34811 
Likelihood ratio test : chisq = 378.56 (p.value = < 2.22e-16)

### Multinomial Logit Estimation Results

We estimated a multinomial logit model of mode choice as a function of travel cost and travel time, with "bus" as the reference alternative (base category). Below is a summary of the key output:

**Choice Frequencies**  
- Bus: 17.9%  
- Car: 48.1%  
- Carpool: 7.1%  
- Rail: 26.9%  

**Model Fit**  
- Log-Likelihood: −354.45  
- McFadden's $R^2$: 0.3481  
- Likelihood-ratio test: $\chi^2 = 378.56$, $p < 2.2 \times 10^{-16}$
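
The fit statistics are linked by simple arithmetic, which the sketch below reproduces from the printed output alone. It assumes that the null model behind McFadden's $R^2$ and the likelihood-ratio test is the intercepts-only model, whose log-likelihood follows directly from the observed mode shares. The names `n`, `shares`, and `ll_null` are illustrative.

```r
n        <- 453       # number of respondents
shares   <- c(bus = 0.17881, car = 0.48124, carpool = 0.07064, rail = 0.26932)
ll_model <- -354.45   # log-likelihood reported by summary()

# Log-likelihood of a model with alternative-specific constants only:
# every respondent is assigned the observed share of the mode they chose.
ll_null <- n * sum(shares * log(shares))

mcfadden_r2 <- 1 - ll_model / ll_null
lr_stat     <- 2 * (ll_model - ll_null)

print(round(c(ll_null = ll_null, mcfadden_r2 = mcfadden_r2, lr_stat = lr_stat), 4))
```

Up to rounding of the printed inputs, this should come out close to the reported values of 0.3481 and 378.56.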

**Parameter Estimates**

| Parameter             | Estimate  | Std. Error | z-value | p-value    | Interpretation (relative to bus)                    |
|:----------------------|----------:|-----------:|--------:|-----------:|:-----------------------------------------------------|
| **(Intercept):car**   |  3.2925   |   0.3173   |  10.38  | <2.2e-16  | Car has much higher baseline utility than bus       |
| **(Intercept):carpool** | −0.9052 |   0.2459   |  −3.68  | 0.00023   | Carpool has lower baseline utility than bus         |
| **(Intercept):rail**  |  0.6278   |   0.1634   |   3.84  | 0.00012   | Rail has modestly higher baseline utility vs. bus   |
| **cost**              | −0.7723   |   0.0920   |  −8.40  | <2.2e-16  | Higher cost reduces a mode's utility (one coefficient shared by all modes) |
| **time**              | −0.0854   |   0.0077   | −11.02  | <2.2e-16  | Longer travel time reduces utility of all modes     |

- **Alternative-specific constants**:  
  - Car is strongly preferred to bus when cost and time are equal (large positive constant).  
  - Carpool is less preferred than bus (negative constant).  
  - Rail is moderately preferred to bus (positive constant).  

- **Cost coefficient (−0.7723)**:  
  - A one-unit increase in a mode's cost reduces that mode's utility by 0.7723, holding time constant.

- **Time coefficient (−0.0854)**:  
  - Each additional minute of a mode's travel time reduces that mode's utility by 0.0854, holding cost constant.

All coefficients are highly significant ($p<0.001$), indicating that both cost and time play important roles in commuters' mode choices, and that baseline preferences differ markedly across alternatives.
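
Because utilities enter the logit formula through `exp()`, exponentiating a coefficient gives its multiplicative effect on the odds of one mode relative to another. The sketch below applies this to the reported estimates; `est` and `odds_mult` are illustrative names.

```r
# Estimates copied from the summary() output above.
est <- c(
    "(Intercept):car"     =  3.2924661,
    "(Intercept):carpool" = -0.9051585,
    "(Intercept):rail"    =  0.6277690,
    cost                  = -0.7723478,
    time                  = -0.0853574
)

# exp(coefficient) = factor by which the odds change.
# Intercepts: odds of that mode versus bus when both have the same cost and time.
# cost/time:  odds of a mode versus any other mode after a one-unit increase
#             in that mode's own cost or time.
odds_mult <- exp(est)
print(round(odds_mult, 3))
```

For instance, `exp(3.29)` is roughly 27, so at identical cost and time the odds of car over bus are about 27 to 1, while each extra minute multiplies a mode's odds by roughly 0.92.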

# Task 3

In [8]:
# 1. Extract all coefficients
coefs <- coef(model_mnl)

# 2. Identify and count the alternative‐specific intercepts
intercepts <- coefs[grep("^\\(Intercept\\)", names(coefs))]
n_intercepts <- length(intercepts)

# 3. Extract the cost and time slopes
slope_cost <- coefs["cost"]
slope_time <- coefs["time"]

# 4. Print results
cat("Alternative‐specific intercepts:\n")
print(intercepts)
cat("\nNumber of intercepts: ", n_intercepts, "\n\n")

cat("Cost coefficient: ", slope_cost, "\n")
cat("Time coefficient: ", slope_time, "\n\n")

cat("Sign of slopes:\n")
print(sign(c(cost = slope_cost, time = slope_time)))


Alternative‐specific intercepts:
    (Intercept):car (Intercept):carpool    (Intercept):rail 
          3.2924661          -0.9051585           0.6277690 

Number of intercepts:  3 

Cost coefficient:  -0.7723478 
Time coefficient:  -0.08535743 

Sign of slopes:
cost.cost time.time 
       -1        -1 


### Interpretation of Estimated Coefficients

- **Cost and Time Slopes**  
  - **Cost (−0.7723)**: Negative, as expected—higher travel cost reduces the utility of each mode.  
  - **Time (−0.0854)**: Also negative—longer travel time makes a mode less attractive.  
  Both estimates are highly statistically significant, and their signs are consistent with random utility theory.

- **Alternative-Specific Intercepts**  
  - We have **three** intercept coefficients:  
    ```
    (Intercept):car     =  3.2925  
    (Intercept):carpool = −0.9052  
    (Intercept):rail    =  0.6278  
    ```  
  - Why three?  With $J = 4$ alternatives (bus, car, carpool, rail), only $J-1$ intercepts are identified, because choice probabilities depend only on utility differences.  “Bus” serves as the **base** category (its intercept is normalized to zero).  
  - These intercepts capture **baseline** preferences relative to bus when cost and time are equal across modes:  
    - **Car** has a large positive intercept ⇒ at equal cost/time, car is much more preferred than bus.  
    - **Carpool** has a negative intercept ⇒ at equal cost/time, carpool is less preferred than bus.  
    - **Rail** has a modest positive intercept ⇒ rail is somewhat preferred to bus, all else equal.

- **Summary**  
  1. **Signs**: All coefficients conform to expectations (higher cost/time ⇒ lower utility).  
  2. **Intercept count**: $J-1 = 4 - 1 = 3$ intercepts are estimated because one alternative (bus) is the reference.  
  3. **Magnitude**: Car’s large positive intercept and the strong negative cost/time effects highlight that commuters trade off cost and travel time in choosing their primary mode, and have inherent baseline preferences across modes.
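
The need to fix one intercept can be seen numerically: adding the same constant to every alternative's utility leaves the logit probabilities unchanged, so only differences relative to a base are identified. A minimal sketch, using the estimated intercepts as utilities (that is, holding cost and time equal across modes) and a hypothetical helper `softmax()`:

```r
# Logit choice probabilities from a vector of systematic utilities.
softmax <- function(v) exp(v) / sum(exp(v))

# Intercepts with bus normalized to zero.
alpha <- c(bus = 0, car = 3.2924661, carpool = -0.9051585, rail = 0.6277690)

p_base    <- softmax(alpha)
p_shifted <- softmax(alpha + 5)   # shift every utility by the same constant

print(round(p_base, 4))
print(all.equal(p_base, p_shifted))
```

Any fourth intercept would be absorbed by such a shift, which is why the model estimates three rather than four.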


# Task 4

In [9]:
# 1. Extract predicted probabilities (one row per respondent, one column per mode)
probabilities <- fitted(model_mnl, type = "probabilities")
# Note: predict() for mlogit objects has no `type` argument; calling
# predict(model_mnl, newdata = commute_mlogit) should return the same matrix

# 2. View the first few respondents’ probabilities
head(probabilities, 6)


           bus        car     carpool       rail
1   0.02323985 0.95992632 0.003898082 0.01293575
10  0.01594586 0.81969423 0.070206581 0.09415332
100 0.60213017 0.31647392 0.042211299 0.03918461
101 0.29674708 0.06135665 0.054066439 0.58782984
102 0.30518250 0.39452095 0.013029398 0.28726715
103 0.03184252 0.88666615 0.038620002 0.04287132


### Predicted Choice Probabilities

Using `fitted(model_mnl, type = "probabilities")`, we obtain each respondent’s predicted probability of choosing each mode. Below are the first six individuals:

| id  |    bus   |    car   | carpool  |   rail   |
|----:|---------:|---------:|---------:|---------:|
|   1 | 0.023239 | 0.959926 | 0.003898 | 0.012936 |
|  10 | 0.015946 | 0.819694 | 0.070207 | 0.094153 |
| 100 | 0.602130 | 0.316474 | 0.042211 | 0.039185 |
| 101 | 0.296747 | 0.061357 | 0.054066 | 0.587830 |
| 102 | 0.305183 | 0.394521 | 0.013029 | 0.287267 |
| 103 | 0.031843 | 0.886666 | 0.038620 | 0.042871 |

- The entry in the **first row, last column** is **0.012936**.  
  - **Interpretation:** Individual 1 has about a **1.3%** predicted probability of choosing **rail** as their primary mode, given their cost and time attributes.

- By contrast, that same individual has a **95.99%** probability of choosing **car**, reflecting both the large alternative-specific intercept for car and the fact that car is this individual's cheapest and fastest option (cost 1.51, time 18.5 minutes).

- Looking at **respondent 100**, the model assigns a **60.2%** probability to **bus**. For this person bus is both cheaper and faster than car (cost 1.57 vs. 6.17, time 15.6 vs. 20.1 minutes), and bus is indeed the observed choice.

These probabilities let us see, at the individual level, how cost and time trade‐offs translate into mode‐choice likelihoods—and can be aggregated to predict changes in overall mode shares under different cost/time scenarios.  
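
To make the link between coefficients and probabilities concrete, the sketch below recomputes respondent 1's row by hand from the reported estimates and that respondent's cost and time values shown in the data preview. Variable names are illustrative.

```r
# Estimated coefficients (bus intercept normalized to 0).
alpha     <- c(bus = 0, car = 3.2924661, carpool = -0.9051585, rail = 0.6277690)
beta_cost <- -0.7723478
beta_time <- -0.08535743

# Respondent 1's attributes, in the order bus, car, carpool, rail.
cost <- c(1.800512, 1.507010, 2.335612, 2.358920)
time <- c(20.86779, 18.50320, 26.33823, 30.03347)

# Systematic utilities, then logit probabilities.
V <- alpha + beta_cost * cost + beta_time * time
P <- exp(V) / sum(exp(V))
print(round(P, 6))
```

The result should agree with the first row of the table to within rounding.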

---

# Task 5

In [10]:
# 1. Compute alternative‐specific mean cost/time
cost_means <- tapply(commute_raw$cost, commute_raw$mode, mean)
time_means <- tapply(commute_raw$time, commute_raw$mode, mean)

# 2. Extract model coefficients
coefs <- coef(model_mnl)
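# Use [[ ]] so the coefficient names are dropped and the vector is named by mode
intercepts <- c(
    bus     = 0, # reference alternative
    car     = coefs[["(Intercept):car"]],
    carpool = coefs[["(Intercept):carpool"]],
    rail    = coefs[["(Intercept):rail"]]
)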
beta_cost <- coefs["cost"]
beta_time <- coefs["time"]

# 3. Systematic utilities at mean covariates: V_j = α_j + β_cost·mean_cost_j + β_time·mean_time_j
V <- intercepts + beta_cost * cost_means + beta_time * time_means

# 4. Choice probabilities at mean: P_j = exp(V_j) / Σ_k exp(V_k)
P <- exp(V) / sum(exp(V))

# 5. Marginal effects of rail’s time on each P_j:
#    ∂P_j/∂time_rail = β_time * P_j * (I[j=="rail"] − P["rail"])
ME_time_rail <- beta_time * P * ((names(P) == "rail") - P["rail"])

# 6. Tabulate results
me_df <- data.frame(
    mode           = names(P),
    probability    = round(P, 6),
    ME_time_rail   = round(ME_time_rail, 6)
)
print(me_df)


           mode probability ME_time_rail
bus         bus    0.141337     0.002890
car         car    0.543267     0.011108
carpool carpool    0.075853     0.001551
rail       rail    0.239542    -0.015549


### Marginal Effects of Rail Travel Time

| Mode    | Predicted Probability | ∂P/∂(Rail Time) |
|:--------|----------------------:|----------------:|
| bus     | 0.141337              |  0.002890       |
| car     | 0.543267              |  0.011108       |
| carpool | 0.075853              |  0.001551       |
| rail    | 0.239542              | −0.015549       |

- **Last row, first column** (rail probability = **0.239542**):  
  A commuter facing the mean cost and time of each mode has a predicted probability of about **23.95%** of choosing rail. This is a probability at the mean profile, not the observed rail share (26.9%).

- **Last row, last column** (∂P/∂(Rail Time) for rail = **−0.015549**):  
  Evaluated at the mean profile, a one-minute increase in rail travel time **decreases** the probability of choosing rail by about **1.55 percentage points**.

> **Note on substitution:**  
> Because the marginal effect on rail is negative, commuters displaced from rail by slower times reallocate to the other modes. For example, each extra minute of rail time increases car probability by about 1.11 percentage points.  
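
The analytic marginal-effect formula can be checked numerically. The sketch below starts from the probabilities at the mean profile reported above, nudges rail's travel time by a small step `h` (an arbitrary value chosen here), and compares the finite-difference change in each probability with the formula. It also confirms that the effects sum to zero, since the probabilities must always add up to one.

```r
beta_time <- -0.08535743
P <- c(bus = 0.141337, car = 0.543267, carpool = 0.075853, rail = 0.239542)
P <- P / sum(P)   # remove rounding error so that P sums to exactly 1

# Analytic marginal effects of rail time on every mode's probability.
is_rail     <- as.numeric(names(P) == "rail")
me_analytic <- beta_time * P * (is_rail - P[["rail"]])

# Finite-difference check: log(P) reproduces the utilities up to a constant,
# which is enough because logit probabilities depend only on utility differences.
h     <- 1e-4
V_new <- log(P) + beta_time * h * is_rail
P_new <- exp(V_new) / sum(exp(V_new))
me_numeric <- (P_new - P) / h

print(round(cbind(me_analytic, me_numeric), 6))
print(sum(me_analytic))
```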
