In [23]:
import pylogit
print("🎉 pylogit is working!")

🎉 pylogit is working!


In [24]:
import pandas as pd

# Replace with your exact file path
file_path = '/Users/danielseymour/Downloads/transportation_4.dta'

# Load the .dta file
df = pd.read_stata(file_path)

# Display the first few rows to verify the data loaded correctly
df.head()

Unnamed: 0,mode,ttme,invc,invt,gc,hinc,psize,id,type
0,0,69,59,100,70,35,1,1,air
1,0,34,31,372,71,35,1,1,train
2,0,35,25,417,70,35,1,1,bus
3,1,0,10,180,30,35,1,1,car
4,0,64,58,68,68,30,2,2,air


In [25]:
df.describe()

Unnamed: 0,mode,ttme,invc,invt,gc,hinc,psize,id
count,840.0,840.0,840.0,840.0,840.0,840.0,840.0,840.0
mean,0.25,34.589286,47.760714,486.165476,110.879762,34.547619,1.742857,105.5
std,0.433271,24.948608,32.371004,301.439107,47.978353,19.676044,1.01035,60.657207
min,0.0,0.0,2.0,63.0,30.0,2.0,1.0,1.0
25%,0.0,0.75,23.0,234.0,71.0,20.0,1.0,53.0
50%,0.0,35.0,39.0,397.0,101.5,34.5,1.0,105.5
75%,0.25,53.0,66.25,795.5,144.0,50.0,2.0,158.0
max,1.0,99.0,180.0,1440.0,269.0,72.0,6.0,210.0


The utility of mode j for individual i is modeled as:

$$U_{ij} = \alpha_{air} d_{i,air} + \alpha_{train} d_{i,train} + \alpha_{bus} d_{i,bus} + \beta_{G} GC_{ij} + \beta_{T} TTME_{ij} + \gamma_{H} d_{i,air} HINC_{i} + \varepsilon_{ij}$$

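where the error term $\varepsilon_{ij}$ follows a type I extreme value (EV1) distribution.

We're running the Python equivalent of Stata's clogit. It's called conditional logit because the likelihood conditions on each individual's choice set (the group), so any individual-specific intercept drops out. With one choice per person and alternative-specific regressors, this is the standard logit choice model.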

The utility that individual i derives from alternative j (denoted U_{ij}) is:

$$U_{ij} =
\begin{cases}
\alpha_{\text{air}} + \beta_{\text{GC}} \cdot GC_{ij} + \beta_T \cdot TTME_{ij} + \gamma_H \cdot HINC_i + \varepsilon_{ij}, & \text{if } j = \text{air} \\
\alpha_{\text{train}} + \beta_{\text{GC}} \cdot GC_{ij} + \beta_T \cdot TTME_{ij} + \varepsilon_{ij}, & \text{if } j = \text{train} \\
\alpha_{\text{bus}} + \beta_{\text{GC}} \cdot GC_{ij} + \beta_T \cdot TTME_{ij} + \varepsilon_{ij}, & \text{if } j = \text{bus} \\
\beta_{\text{GC}} \cdot GC_{ij} + \beta_T \cdot TTME_{ij} + \varepsilon_{ij}, & \text{if } j = \text{car (base)} \\
\end{cases}$$
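
As a minimal sketch of how the piecewise utility above maps onto one row of regressors per alternative, the snippet below builds the row `[air, train, bus, gc, ttme, air_hinc]` and takes its dot product with a coefficient vector. The helper names `design_row` and `systematic_utility` are made up for illustration, and `beta_demo` holds hypothetical coefficients, not the estimates.

```python
# Order of regressors used in this notebook's design matrix.
COLUMNS = ["air", "train", "bus", "gc", "ttme", "air_hinc"]


def design_row(mode, gc, ttme, hinc):
    """Build one row of X for a given alternative (car is the base mode)."""
    air = 1 if mode == "air" else 0
    train = 1 if mode == "train" else 0
    bus = 1 if mode == "bus" else 0
    # Income enters only through the air interaction term.
    return [air, train, bus, gc, ttme, air * hinc]


def systematic_utility(row, beta):
    """V_ij = x_ij . beta, i.e. U_ij without the error term."""
    return sum(x * b for x, b in zip(row, beta))


# Individual 1 from the data preview: air has gc=70, ttme=69; car has gc=30, ttme=0.
air_row = design_row("air", gc=70, ttme=69, hinc=35)
car_row = design_row("car", gc=30, ttme=0, hinc=35)

# Hypothetical coefficients, for illustration only (NOT the fitted values).
beta_demo = [1.0, 0.5, 0.25, -0.01, -0.02, 0.01]
v_air = systematic_utility(air_row, beta_demo)
v_car = systematic_utility(car_row, beta_demo)
print(dict(zip(COLUMNS, air_row)), v_air)
print(dict(zip(COLUMNS, car_row)), v_car)
```

The car row has zeros in every dummy column, which is what makes car the base alternative.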

And the corresponding choice probability for consumer i choosing alternative j is:

$$P_{ij} = \frac{e^{V_{ij}}}{\sum_{k \in C_i} e^{V_{ik}}}$$

where $V_{ij} = U_{ij} - \varepsilon_{ij}$ is the deterministic part of utility and C_i is the set of alternatives available to individual i.
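
The choice probability formula can be sketched in a few lines of plain Python. The utilities below are the V values that the notebook's later output displays for individual 1; the function name `logit_probabilities` is an illustrative choice, and subtracting the maximum is a standard numerical-stability trick that leaves the probabilities unchanged.

```python
import math


def logit_probabilities(utilities):
    """Map the systematic utilities of one choice set to choice probabilities."""
    v_max = max(utilities.values())  # subtract the max for numerical stability
    exp_v = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    denom = sum(exp_v.values())
    return {alt: e / denom for alt, e in exp_v.items()}


# V values for individual 1, as displayed in the notebook's later output.
v_person_1 = {"air": -2.045213, "train": -0.499803, "bus": -1.286287, "car": -0.465040}
probs = logit_probabilities(v_person_1)
print({alt: round(p, 4) for alt, p in probs.items()})
```

The four probabilities sum to one within the choice set and should agree with the `prob` column for the same individual.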

Important note: 

Conditional logit is not linear regression — it’s maximum likelihood estimation of a logit choice model, using the following likelihood function:

$$\mathcal{L}(\theta) = \prod_{i=1}^{N} \prod_{j \in C_i} P_{ij}^{y_{ij}}\quad \text{where} \quad
P_{ij} = \frac{e^{X_{ij} \cdot \beta}}{\sum_{k \in C_i} e^{X_{ik} \cdot \beta}}$$

So estimation picks the coefficient vector $\beta$ that best explains the observed choices across individuals, using:

$$\boxed{
\text{Pr}(y_{ij} = 1) = \frac{e^{X_{ij} \cdot \beta}}{\sum_{k} e^{X_{ik} \cdot \beta}}
}$$

This is the multinomial logit model, where you’re modeling choice probabilities based on the utility that each alternative provides.

This is not a regression of mode ~ X in the OLS sense.  
	•	OLS: predicts a continuous variable (e.g. cost, time)  
	•	Logit: predicts a probability of choosing a discrete alternative
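
To make the likelihood concrete, here is a rough sketch of the log-likelihood that maximum likelihood estimation would maximize. It is not how statsmodels implements `ConditionalLogit`; the function name and the toy data are made up for illustration.

```python
import math
from collections import defaultdict


def conditional_logit_loglik(rows, beta):
    """Log-likelihood of a conditional logit.

    rows: iterable of (group_id, chosen, x), with chosen in {0, 1}
          and x a list of regressors for that alternative.
    """
    groups = defaultdict(list)
    for group_id, chosen, x in rows:
        v = sum(xi * bi for xi, bi in zip(x, beta))
        groups[group_id].append((chosen, v))

    loglik = 0.0
    for alternatives in groups.values():
        v_max = max(v for _, v in alternatives)
        log_denom = v_max + math.log(sum(math.exp(v - v_max) for _, v in alternatives))
        # Only the chosen alternative (y_ij = 1) contributes log P_ij.
        loglik += sum(v - log_denom for chosen, v in alternatives if chosen == 1)
    return loglik


# Toy data (made up): two people, three alternatives, one cost-like regressor.
toy_rows = [
    (1, 1, [1.0]), (1, 0, [2.0]), (1, 0, [3.0]),
    (2, 0, [2.0]), (2, 1, [1.0]), (2, 0, [2.5]),
]
ll_zero = conditional_logit_loglik(toy_rows, beta=[0.0])
ll_neg = conditional_logit_loglik(toy_rows, beta=[-1.0])
print(ll_zero, ll_neg)
```

With `beta = 0` every alternative is equally likely, so the log-likelihood is N · ln(1/J). In the toy data both people pick the cheapest option, so a negative coefficient fits better.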

In [26]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.discrete.conditional_models import ConditionalLogit

# Create alternative-specific dummies
# (base category will be 'car', so we include air/train/bus)
df['air'] = (df['type'] == 'air').astype(int)
df['train'] = (df['type'] == 'train').astype(int)
df['bus'] = (df['type'] == 'bus').astype(int)

# Interaction term: air × hinc
df['air_hinc'] = df['air'] * df['hinc']

df

Unnamed: 0,mode,ttme,invc,invt,gc,hinc,psize,id,type,air,train,bus,air_hinc
0,0,69,59,100,70,35,1,1,air,1,0,0,35
1,0,34,31,372,71,35,1,1,train,0,1,0,0
2,0,35,25,417,70,35,1,1,bus,0,0,1,0
3,1,0,10,180,30,35,1,1,car,0,0,0,0
4,0,64,58,68,68,30,2,2,air,1,0,0,30
...,...,...,...,...,...,...,...,...,...,...,...,...,...
835,1,0,27,510,82,20,1,209,car,0,0,0,0
836,0,64,66,140,87,70,4,210,air,1,0,0,70
837,0,44,54,670,156,70,4,210,train,0,1,0,0
838,0,53,33,664,134,70,4,210,bus,0,0,1,0


In [27]:
# Define dependent variable (1 if this row is the chosen mode)
y = df['mode']

# Define independent variables
X = df[['air', 'train', 'bus', 'gc', 'ttme', 'air_hinc']]

# Define groups (individuals)
groups = df['id']

# Fit conditional logit model
model = ConditionalLogit(y, X, groups=groups)
result = model.fit()

# Print results
print(result.summary())

                  Conditional Logit Model Regression Results                  
Dep. Variable:                   mode   No. Observations:                  840
Model:               ConditionalLogit   No. groups:                        210
Log-Likelihood:               -199.13   Min group size:                      4
Method:                          BFGS   Max group size:                      4
Date:                Sun, 30 Mar 2025   Mean group size:                   4.0
Time:                        15:58:30                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
air            5.2074      0.779      6.684      0.000       3.681       6.734
train          3.8690      0.443      8.731      0.000       3.001       4.738
bus            3.1632      0.450      7.025      0.000       2.281       4.046
gc            -0.0155      0.004     -3.517      0.0

## Compute the elasticities with respect to the generalized cost of travel (GC)

We want to compute:

$$\text{Elasticity of } s_k \text{ w.r.t. } GC_k = \frac{\partial s_k}{\partial GC_k} \cdot \frac{GC_k}{s_k}$$

Under the logit model, this is:

$$\varepsilon_k^{GC} = \beta_{GC} \cdot GC_k \cdot (1 - s_k)$$

Therefore, we need the predicted choice probabilities s_k before computing the elasticities. For each individual these are:

$$s_k = \frac{e^{V_k}}{\sum_j e^{V_j}}$$

Since the model is already estimated, we can recover these individual-level probabilities from the fitted coefficients.

## Do We Need to Define an Outside Good?  
No, you do not need to define an outside good unless you’re modeling market shares at the aggregate level (e.g., 30% of the market chooses air, 25% train, etc.) and the total market is not 100% captured by your options.  

But in your model:
1. You are modeling individual-level choice among four mutually exclusive and exhaustive travel modes (air, train, bus, car).
2. Every individual must choose one of those four options.
3. So there is no need for an outside good — the choice set is closed and complete.

When Would You Include an Outside Good?  

You’d define an outside good if:  
1. People could also opt out (e.g., not travel at all),
2. Or if your model only includes a subset of options (e.g., you model Coke vs Pepsi but not water or juice),
3. Or if you’re using aggregate-level market shares and want to normalize them.


In [28]:
import numpy as np

# Step 1: Compute utility (X dot beta)
X = df[['air', 'train', 'bus', 'gc', 'ttme', 'air_hinc']]
beta = result.params.values  # same order as X columns
V = X @ beta  # linear utility for each row

# Step 2: Add utility to DataFrame
df['V'] = V

# Step 3: Compute exp(V) within each individual
df['expV'] = np.exp(df['V'])

# Step 4: Sum of exp(V) within each choice set (grouped by person id)
df['denom'] = df.groupby('id')['expV'].transform('sum')

# Step 5: Predicted probability = exp(V) / sum(exp(V)) within each person
df['prob'] = df['expV'] / df['denom']

df

Unnamed: 0,mode,ttme,invc,invt,gc,hinc,psize,id,type,air,train,bus,air_hinc,V,expV,denom,prob
0,0,69,59,100,70,35,1,1,air,1,0,0,35,-2.045213,0.129353,1.640407,0.078854
1,0,34,31,372,71,35,1,1,train,0,1,0,0,-0.499803,0.606650,1.640407,0.369817
2,0,35,25,417,70,35,1,1,bus,0,0,1,0,-1.286287,0.276295,1.640407,0.168431
3,1,0,10,180,30,35,1,1,car,0,0,0,0,-0.465040,0.628110,1.640407,0.382899
4,0,64,58,68,68,30,2,2,air,1,0,0,30,-1.600023,0.201892,0.891027,0.226583
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
835,1,0,27,510,82,20,1,209,car,0,0,0,0,-1.271110,0.280520,0.925215,0.303195
836,0,64,66,140,87,70,4,210,air,1,0,0,70,-1.363067,0.255875,0.569059,0.449645
837,0,44,54,670,156,70,4,210,train,0,1,0,0,-2.778663,0.062121,0.569059,0.109165
838,0,53,33,664,134,70,4,210,bus,0,0,1,0,-4.008616,0.018159,0.569059,0.031910


In [29]:
# Step 1: Get coefficient on GC
beta_gc = result.params['gc']

# Step 2: Compute elasticity. beta_gc is negative, so the leading minus sign
# makes gc_elasticity positive by construction: each value is the percentage
# decrease in the choice probability for a 1% increase in GC.
df['gc_elasticity'] = -beta_gc * df['gc'] * (1 - df['prob'])

In [30]:
df

Unnamed: 0,mode,ttme,invc,invt,gc,hinc,psize,id,type,air,train,bus,air_hinc,V,expV,denom,prob,gc_elasticity
0,0,69,59,100,70,35,1,1,air,1,0,0,35,-2.045213,0.129353,1.640407,0.078854,0.999530
1,0,34,31,372,71,35,1,1,train,0,1,0,0,-0.499803,0.606650,1.640407,0.369817,0.693577
2,0,35,25,417,70,35,1,1,bus,0,0,1,0,-1.286287,0.276295,1.640407,0.168431,0.902331
3,1,0,10,180,30,35,1,1,car,0,0,0,0,-0.465040,0.628110,1.640407,0.382899,0.286977
4,0,64,58,68,68,30,2,2,air,1,0,0,30,-1.600023,0.201892,0.891027,0.226583,0.815252
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
835,1,0,27,510,82,20,1,209,car,0,0,0,0,-1.271110,0.280520,0.925215,0.303195,0.885717
836,0,64,66,140,87,70,4,210,air,1,0,0,70,-1.363067,0.255875,0.569059,0.449645,0.742218
837,0,44,54,670,156,70,4,210,train,0,1,0,0,-2.778663,0.062121,0.569059,0.109165,2.154225
838,0,53,33,664,134,70,4,210,bus,0,0,1,0,-4.008616,0.018159,0.569059,0.031910,2.010898


A 1% increase in GC leads to approximately a $\varepsilon_k^{GC}$ \% change in the choice probability $s_k$.

The column gc_elasticity gives us the elasticity for each individual-alternative combination (i.e., “person 1, air”, “person 3, bus”, etc.)

When it’s useful:
	•	To see how sensitive different types of people or alternatives are.
	•	For segmentation: e.g., “High-income travelers are less price sensitive for air.”
	•	For policy simulations: e.g., “What happens to each traveler if GC increases?”
	•	For heterogeneous effects: different elasticities across individuals or modes.

Row-level values are hard to summarize, so we also calculate average elasticities. We do this for each mode and only for the chosen alternatives, so each average describes the travelers who actually use that mode. That is the group whose observed behavior a change in that mode's cost would affect most directly.

Note:

This is not the absolute change in probability from a one-unit increase in GC. That would be the marginal effect:

$$\frac{\partial s_k}{\partial GC_k} = \beta_{GC} \cdot s_k (1 - s_k)$$

which is negative because the estimated GC coefficient is negative. The elasticity is the marginal effect multiplied by GC_k / s_k:

$$\varepsilon_k^{GC} = \frac{\partial s_k}{\partial GC_k} \cdot \frac{GC_k}{s_k}$$

Marginal Effect
	•	Measures the absolute change in choice probability s_k from a unit change in some variable (e.g., GC).  
	•	Units: percentage points, not percent.  
	•	Formula for MNL:  
$$\frac{\partial s_k}{\partial GC_k} = \beta_{GC} \cdot s_k (1 - s_k)$$  
	•	Example: “A one-unit increase in generalized cost reduces the probability of choosing mode k by 0.02 (or 2 percentage points).”

⸻

Elasticity
	•	Measures the proportional (percent) change in the choice probability from a 1% change in the variable.  
	•	Units: percent change in probability per 1% change in GC.  
	•	Formula:
$$\varepsilon_k^{GC} = \frac{\partial s_k}{\partial GC_k} \cdot \frac{GC_k}{s_k}$$  
or, simply:  
$$\varepsilon_k^{GC} = \beta_{GC} \cdot GC_k \cdot (1 - s_k)$$  
	•	The gc_elasticity column stores the negative of this quantity, so its values are positive.  
	•	Example: “A 1% increase in GC leads to a 0.8% decrease in the probability of choosing mode k.”
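
A small worked example may help separate the two quantities. It uses the signed derivative of the logit probability, which is negative here because the GC coefficient is negative; the notebook's `gc_elasticity` column is the sign-flipped elasticity. The inputs are individual 1's air row (`gc = 70`, `prob = 0.078854`) and the GC coefficient as rounded in the model summary (-0.0155), so the result matches the table only up to rounding. The function names are illustrative.

```python
def marginal_effect(beta_gc, s_k):
    """Change in s_k (in probability units) per one-unit change in GC_k."""
    return beta_gc * s_k * (1.0 - s_k)


def own_elasticity(beta_gc, gc_k, s_k):
    """Percent change in s_k per 1% change in GC_k."""
    return marginal_effect(beta_gc, s_k) * gc_k / s_k


# Individual 1, air. beta_gc is the rounded coefficient from the summary table.
beta_gc, gc_k, s_k = -0.0155, 70, 0.078854
me = marginal_effect(beta_gc, s_k)
el = own_elasticity(beta_gc, gc_k, s_k)
print(f"marginal effect: {me:.5f} (probability units per unit of GC)")
print(f"elasticity:      {el:.4f} (percent per percent)")
```

The marginal effect is tiny in absolute terms because the air probability is small, while the elasticity is close to -1 because it is scaled by GC_k / s_k.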

In [31]:
elasticities_by_mode = df[df['mode'] == 1].groupby('type')['gc_elasticity'].mean()
print(elasticities_by_mode)

type
air      0.839616
train    0.790682
bus      0.873572
car      0.816909
Name: gc_elasticity, dtype: float64


Interpretation:
Recall that because the code multiplies by the negative of the GC coefficient, these values give the percentage decrease in the choice probability for a 1% increase in generalized cost. Demand for every mode is inelastic on average, since all four averages are below 1. The averages are also close to one another, with bus the most cost-sensitive (0.87) and train the least (0.79).

In [32]:
# Get index of predicted choice (highest prob per individual)
df['predicted'] = df.groupby('id')['prob'].transform(lambda x: x.idxmax())
df['predicted_choice'] = df.loc[df['predicted'], 'type'].values
df

Unnamed: 0,mode,ttme,invc,invt,gc,hinc,psize,id,type,air,train,bus,air_hinc,V,expV,denom,prob,gc_elasticity,predicted,predicted_choice
0,0,69,59,100,70,35,1,1,air,1,0,0,35,-2.045213,0.129353,1.640407,0.078854,0.999530,3,car
1,0,34,31,372,71,35,1,1,train,0,1,0,0,-0.499803,0.606650,1.640407,0.369817,0.693577,3,car
2,0,35,25,417,70,35,1,1,bus,0,0,1,0,-1.286287,0.276295,1.640407,0.168431,0.902331,3,car
3,1,0,10,180,30,35,1,1,car,0,0,0,0,-0.465040,0.628110,1.640407,0.382899,0.286977,3,car
4,0,64,58,68,68,30,2,2,air,1,0,0,30,-1.600023,0.201892,0.891027,0.226583,0.815252,7,car
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
835,1,0,27,510,82,20,1,209,car,0,0,0,0,-1.271110,0.280520,0.925215,0.303195,0.885717,833,train
836,0,64,66,140,87,70,4,210,air,1,0,0,70,-1.363067,0.255875,0.569059,0.449645,0.742218,836,air
837,0,44,54,670,156,70,4,210,train,0,1,0,0,-2.778663,0.062121,0.569059,0.109165,2.154225,836,air
838,0,53,33,664,134,70,4,210,bus,0,0,1,0,-4.008616,0.018159,0.569059,0.031910,2.010898,836,air


In [33]:
actual_choices = df[df['mode'] == 1][['id', 'type']]
actual_choices.columns = ['id', 'actual_choice']
actual_choices

Unnamed: 0,id,actual_choice
3,1,car
7,2,car
11,3,car
15,4,car
19,5,car
...,...,...
823,206,car
824,207,air
830,208,bus
835,209,car


In [34]:
predicted_vs_actual = actual_choices.merge(df[['id', 'predicted_choice']].drop_duplicates(), on='id')

In [35]:
confusion_table = pd.crosstab(predicted_vs_actual['actual_choice'], predicted_vs_actual['predicted_choice'])
confusion_table

predicted_choice,air,train,bus,car
actual_choice,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
air,41,3,0,14
train,4,45,0,14
bus,1,3,23,3
car,10,13,0,36
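
One way to summarize the confusion table is the overall hit rate, plus the share of each actual mode that is predicted correctly. The sketch below copies the counts from the table above and uses plain Python lists instead of the `confusion_table` DataFrame.

```python
# Rows = actual choice, columns = predicted choice, both ordered
# air, train, bus, car (counts copied from the confusion table above).
modes = ["air", "train", "bus", "car"]
confusion = [
    [41, 3, 0, 14],
    [4, 45, 0, 14],
    [1, 3, 23, 3],
    [10, 13, 0, 36],
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(modes)))
hit_rate = correct / total

# Share of each actual mode that the model predicts correctly (recall).
recall = {m: confusion[i][i] / sum(confusion[i]) for i, m in enumerate(modes)}
print(f"hit rate: {hit_rate:.3f}")
print({m: round(r, 3) for m, r in recall.items()})
```

The predicted choice here is the highest-probability alternative, so the hit rate is a coarse summary: it ignores how close the runner-up probabilities were.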


# Test the IIA assumption for logit models

## What is the odds ratio?

For any two alternatives j and k, the odds of choosing j vs k are:

$$\frac{P_j}{P_k} = \frac{e^{V_j}}{e^{V_k}} = e^{V_j - V_k}$$

It answers: “How many times more likely is someone to choose alternative A over alternative B?”

And since:

$$V_j = \alpha_j + \text{(other stuff)}$$

if you compare two modes, say train and bus, and hold other variables equal (e.g., same GC, TTME), only the mode-specific constants differ:

$$\text{Odds Ratio (train vs bus)} = \frac{P_{\text{train}}}{P_{\text{bus}}} = e^{\alpha_{\text{train}} - \alpha_{\text{bus}}}$$

So the difference in coefficients gives you the log-odds, and exponentiating gives you the odds ratio.
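
The property that IIA imposes can be illustrated directly: in a logit, the odds of train versus bus do not change when air is removed from the choice set. The utilities in `v_full` are hypothetical; the last line repeats the calculation with the train and bus constants as rounded in the full-model summary.

```python
import math


def softmax(utilities):
    """Logit choice probabilities for one choice set."""
    denom = sum(math.exp(v) for v in utilities.values())
    return {alt: math.exp(v) / denom for alt, v in utilities.items()}


# Hypothetical utilities, chosen only for illustration.
v_full = {"air": 0.4, "train": 0.9, "bus": 0.2, "car": 1.1}
v_no_air = {alt: v for alt, v in v_full.items() if alt != "air"}

p_full = softmax(v_full)
p_no_air = softmax(v_no_air)

odds_full = p_full["train"] / p_full["bus"]
odds_no_air = p_no_air["train"] / p_no_air["bus"]
print(odds_full, odds_no_air, math.exp(0.9 - 0.2))

# Same formula with the rounded constants from the full-model summary.
odds_from_constants = math.exp(3.8690 - 3.1632)
print(round(odds_from_constants, 3))
```

With fixed utilities the two odds are identical by construction. The test that follows asks whether the re-estimated coefficients behave this way when air is actually dropped from the data.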

In [36]:
import numpy as np
import pandas as pd
from statsmodels.discrete.conditional_models import ConditionalLogit

# Step 1: Remove the air alternative
df_no_air = df[df['type'] != 'air'].copy()

# Step 2: Drop individuals who originally chose air (mode == 1 and type == air)
# These people have no remaining valid choice
chosen_air_ids = df[(df['type'] == 'air') & (df['mode'] == 1)]['id'].unique()
df_no_air = df_no_air[~df_no_air['id'].isin(chosen_air_ids)]

# Step 3: Redefine y, X, and groups for new model
y_no_air = df_no_air['mode']

# IMPORTANT: remove the air dummy and air_hinc (they're now irrelevant)
X_no_air = df_no_air[['train', 'bus', 'gc', 'ttme']]
groups_no_air = df_no_air['id']

# Step 4: Re-estimate the model without air
model_no_air = ConditionalLogit(y_no_air, X_no_air, groups=groups_no_air)
result_no_air = model_no_air.fit()

# Print results
print(result_no_air.summary())

                  Conditional Logit Model Regression Results                  
Dep. Variable:                   mode   No. Observations:                  456
Model:               ConditionalLogit   No. groups:                        152
Log-Likelihood:               -87.938   Min group size:                      3
Method:                          BFGS   Max group size:                      3
Date:                Sun, 30 Mar 2025   Mean group size:                   3.0
Time:                        15:58:30                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
train          4.4637      0.641      6.969      0.000       3.208       5.719
bus            3.1048      0.609      5.098      0.000       1.911       4.298
gc            -0.0637      0.010     -6.341      0.000      -0.083      -0.044
ttme          -0.0699      0.015     -4.696      0.0

In [37]:
odds_ratio_full = np.exp(result.params['train'] - result.params['bus'])
print(f"Full model odds ratio (train vs bus): {odds_ratio_full:.3f}")

odds_ratio_no_air = np.exp(result_no_air.params['train'] - result_no_air.params['bus'])
print(f"Restricted model odds ratio (train vs bus): {odds_ratio_no_air:.3f}")


Full model odds ratio (train vs bus): 2.026
Restricted model odds ratio (train vs bus): 3.892


We use a Hausman test of the IIA assumption: it checks whether the estimates from the restricted model (no air) are significantly different from those of the full model.

Null Hypothesis H_0: 
No difference in coefficients → IIA holds

Alternative H_1:
Coefficients change → IIA is violated

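We compute:

$$H = (\hat{\beta}_R - \hat{\beta}_F)^{\prime} \left[ \text{Var}(\hat{\beta}_R - \hat{\beta}_F) \right]^{-1} (\hat{\beta}_R - \hat{\beta}_F)$$

Under H_0 the full-model estimator is efficient, so $\text{Var}(\hat{\beta}_R - \hat{\beta}_F)$ is estimated by $\text{Var}(\hat{\beta}_R) - \text{Var}(\hat{\beta}_F)$, and H is asymptotically chi-square with as many degrees of freedom as coefficients compared.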

In [39]:
from numpy.linalg import inv
from scipy.stats import chi2

# 1. Extract coefficients and variances (only for train and bus)
b_full = result.params[['train', 'bus']].values
b_restricted = result_no_air.params[['train', 'bus']].values

# 2. Extract covariance matrices
V_full = result.cov_params().loc[['train', 'bus'], ['train', 'bus']].values
V_restricted = result_no_air.cov_params().loc[['train', 'bus'], ['train', 'bus']].values

# 3. Compute difference in coefficients and variance
b_diff = b_restricted - b_full
V_diff = V_restricted - V_full

# 4. Compute Hausman statistic
hausman_stat = b_diff.T @ inv(V_diff) @ b_diff
df_len = len(b_diff)  # degrees of freedom = number of coefficients tested
p_value = 1 - chi2.cdf(hausman_stat, df_len)

print(f"Hausman test statistic: {hausman_stat:.3f}")
print(f"Degrees of freedom: {df_len}")
print(f"P-value: {p_value:.4f}")

Hausman test statistic: 12.937
Degrees of freedom: 2
P-value: 0.0016


## Interpretation of the Hausman test:
Since the p-value (0.0016) is below 0.05, we reject the null hypothesis: there is statistical evidence that removing air changes the odds between train and bus. This means the model violates IIA.
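
As a quick arithmetic check on the reported p-value: with exactly 2 degrees of freedom, the chi-square upper-tail probability has the closed form exp(-H/2), so it can be reproduced without scipy. The statistic below is copied from the printed output.

```python
import math

hausman_stat = 12.937  # statistic printed above
dof = 2                # two coefficients compared: train and bus

# With exactly 2 degrees of freedom, P(chi2 > h) = exp(-h / 2).
p_value = math.exp(-hausman_stat / 2)
print(f"p-value: {p_value:.4f}")

# 5% critical value for 2 degrees of freedom: -2 * ln(0.05).
critical_5pct = -2 * math.log(0.05)
print(f"5% critical value: {critical_5pct:.3f}")
print(f"reject H0 at 5%: {hausman_stat > critical_5pct}")
```

This closed form holds only for 2 degrees of freedom; comparing more coefficients (for example gc and ttme as well) would need `scipy.stats.chi2`, as in the cell above.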

## Solution: Use nested logit model

In [40]:
df

Unnamed: 0,mode,ttme,invc,invt,gc,hinc,psize,id,type,air,train,bus,air_hinc,V,expV,denom,prob,gc_elasticity,predicted,predicted_choice
0,0,69,59,100,70,35,1,1,air,1,0,0,35,-2.045213,0.129353,1.640407,0.078854,0.999530,3,car
1,0,34,31,372,71,35,1,1,train,0,1,0,0,-0.499803,0.606650,1.640407,0.369817,0.693577,3,car
2,0,35,25,417,70,35,1,1,bus,0,0,1,0,-1.286287,0.276295,1.640407,0.168431,0.902331,3,car
3,1,0,10,180,30,35,1,1,car,0,0,0,0,-0.465040,0.628110,1.640407,0.382899,0.286977,3,car
4,0,64,58,68,68,30,2,2,air,1,0,0,30,-1.600023,0.201892,0.891027,0.226583,0.815252,7,car
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
835,1,0,27,510,82,20,1,209,car,0,0,0,0,-1.271110,0.280520,0.925215,0.303195,0.885717,833,train
836,0,64,66,140,87,70,4,210,air,1,0,0,70,-1.363067,0.255875,0.569059,0.449645,0.742218,836,air
837,0,44,54,670,156,70,4,210,train,0,1,0,0,-2.778663,0.062121,0.569059,0.109165,2.154225,836,air
838,0,53,33,664,134,70,4,210,bus,0,0,1,0,-4.008616,0.018159,0.569059,0.031910,2.010898,836,air


In [47]:
# Create numeric alternative IDs
alt_mapping = {'air': 1, 'train': 2, 'bus': 3, 'car': 4}
df['alt_id'] = df['type'].map(alt_mapping)

In [71]:
# Cast the id, alternative, and choice columns to plain integers
# (this cell needs alt_id, so it must run after the mapping cell above)
df['choice'] = df['mode']

df['alt_id'] = df['alt_id'].astype(int)
df['id'] = df['id'].astype(int)
df['choice'] = df['choice'].astype(int)

In [57]:
from collections import OrderedDict

specification = OrderedDict({
    'gc': 'all_same',
    'ttme': 'all_same'
})

In [60]:
from collections import OrderedDict

nests = OrderedDict({
    'air': [1],
    'ground': [2, 3, 4]
})

In [75]:
# Each obs_id represents one person (or one decision occasion).
# alt_id: Alternative ID = the choice option
from collections import OrderedDict

nests = OrderedDict({
    'air': [1],
    'ground': [2, 3, 4]
})

specification = OrderedDict({
    'gc': 'all_same',
    'ttme': 'all_same'
})


names = OrderedDict({
    'gc': 'Generalized Cost',
    'ttme': 'Terminal Time'
})

model = pylogit.create_choice_model(
    data=df,
    alt_id_col='alt_id',
    obs_id_col='id',
    choice_col='choice',
    specification=specification,
    model_type="Nested Logit",
    names=names,
    nest_spec=nests
)

import numpy as np

init_vals = np.zeros(4)  # 2 nest parameters (air, ground) + 2 coefficients (gc, ttme)
model.fit_mle(init_vals)
model.summary

Log-likelihood at zero: -294.5556
Initial Log-likelihood: -294.5556
Estimation Time for Point Estimation: 0.08 seconds.
Final log-likelihood: -262.8153


  warn('Method %s does not use Hessian information (hess).' % method,
  self._store_inferential_results(np.sqrt(np.diag(self.robust_cov)),


Unnamed: 0,parameters,std_err,t_stats,p_values,robust_std_err,robust_t_stats,robust_p_values
air,0.0,,,,,,
ground,0.292086,0.408627,0.714797,0.474734,,,
Generalized Cost,-0.007398,0.002791,-2.651092,0.008023,,,
Terminal Time,-0.010966,0.002421,-4.528968,6e-06,,,


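1. Inclusive Value Parameter (ground nest): reported value 0.292  

This is the key feature of the nested logit.  
	•	A λ between 0 and 1 means that the ground alternatives (train, bus, car) have correlated unobserved utility, so travelers treat them as closer substitutes for one another than for air.  
	•	λ closer to 1 ⇒ behaves like a standard logit (IIA holds).  
	•	λ closer to 0 ⇒ extreme substitution within the ground nest.  
	•	Caution on scale: pylogit's nested logit documentation describes the reported nest parameters as the logit of λ, not λ itself. On that reading, 0.292 corresponds to λ = 1 / (1 + e^{-0.292}) ≈ 0.57, which indicates moderate correlation among train, bus, and car. The reported log-likelihood at zero (-294.56, not 210 · ln(1/4) ≈ -291.12) is what a starting value of λ = 0.5 would give, which supports this reading.  
	•	p-value = 0.47 → this tests whether the reported parameter differs from zero (λ = 0.5 on the logit reading), not whether λ differs from 1, so it says little about whether the nest is needed. The standard error is also large, so be cautious in claiming strong substitution.  
	•	The air row (0.0 with no standard error) belongs to a single-alternative nest, whose parameter is not identified.  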
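
The scale of the reported nest parameter matters for this interpretation. The sketch below assumes, following pylogit's nested logit documentation, that the reported value is the logit of λ, and converts it back. It then checks that assumption against the "Log-likelihood at zero" printed above, using the observed counts (58 travelers chose air and 152 chose a ground mode). The function names are illustrative.

```python
import math


def expit(z):
    """Inverse logit: maps a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Reported 'ground' parameter, read as logit(lambda) (assumption).
lambda_ground = expit(0.292086)
print(f"implied lambda for ground: {lambda_ground:.3f}")


def loglik_at_zero(lam_ground, n_air=58, n_ground=152, n_ground_alts=3):
    """Nested logit log-likelihood when every systematic utility is zero.

    The air nest holds one alternative, so its inclusive value is ln(1) = 0
    and its nest parameter does not affect the result.
    """
    nest_weight_ground = n_ground_alts ** lam_ground  # exp(lambda * ln 3)
    p_air = 1.0 / (1.0 + nest_weight_ground)
    p_each_ground = (1.0 - p_air) / n_ground_alts
    return n_air * math.log(p_air) + n_ground * math.log(p_each_ground)


ll_half = loglik_at_zero(expit(0.0))  # initial value 0 read as lambda = 0.5
ll_one = loglik_at_zero(1.0)          # lambda = 1, i.e. plain logit
print(f"lambda = 0.5: {ll_half:.4f}")
print(f"lambda = 1.0: {ll_one:.4f}")
```

Only the λ = 0.5 case reproduces the printed -294.5556, which is consistent with the logit-scale reading of the nest parameters.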


The Elasticity Formula for Own-Nested:

$$\varepsilon_{ij}^{\text{GC}} = -\beta_{\text{GC}} \cdot \frac{\text{GC}_{ij}}{\lambda_m} \cdot \left[(1 - P_{ij|m}) + P_{ij|m} (1 - P_{im})\right]$$

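💡 Interpretation  
	•	Nested logit elasticity adjusts for substitution within the nest via $P_{ij|m}$ and across nests via $P_{im}$  
	•	As $\lambda_m \to 1$, the expression simplifies to the standard logit elasticity  
	•	When $\lambda_m < 1$, the factor $1/\lambda_m$ rescales the elasticity for a given $\beta_{\text{GC}}$, so the comparison with the conditional logit also depends on the re-estimated $\beta_{\text{GC}}$  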


$$\left[(1 - P_{ij|m}) + P_{ij|m} (1 - P_{im})\right]= (1 - P_{ij|m}) + P_{ij|m} - P_{ij|m} P_{im}= 1 - P_{ij|m} P_{im}$$

Now recall:

$$P_{ij} = P_{im} \cdot P_{ij|m}\Rightarrow P_{ij|m} \cdot P_{im} = P_{ij}$$
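
The algebra in the last two steps can be spot-checked numerically. The probabilities below are hypothetical; the loop only verifies the identity between the bracketed term and 1 - P_ij, not the elasticity formula itself.

```python
# Hypothetical pairs: (P(nest m), P(alternative j | nest m)).
cases = [(0.7, 0.5), (0.3, 0.9), (1.0, 0.25), (0.45, 1.0)]

for p_nest, p_within in cases:
    p_joint = p_nest * p_within  # P_ij = P_im * P_ij|m
    bracket = (1 - p_within) + p_within * (1 - p_nest)
    # The bracketed term should equal 1 - P_ij for every pair.
    assert abs(bracket - (1 - p_joint)) < 1e-12
    print(p_nest, p_within, round(bracket, 4))
```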

So this becomes:

$$\varepsilon_{ij}^{\text{GC}} = -\beta_{\text{GC}} \cdot \frac{\text{GC}_{ij}}{\lambda_m} \cdot (1 - P_{ij})$$

In the conditional logit model, the elasticity with respect to generalized cost measures how sensitive the probability of choosing a travel mode is to a change in its own cost. However, this model assumes IIA (Independence of Irrelevant Alternatives), meaning that when one mode becomes more expensive, its lost demand is spread over all other modes in proportion to their choice probabilities, regardless of how similar the modes are.  

The nested logit model, by contrast, allows for correlated alternatives within a nest. In this case, train, bus, and car are grouped into a “ground” nest, reflecting that they are more similar to each other than to air travel.  

As a result, I expect the elasticity with respect to generalized cost to be smaller in magnitude (i.e., less negative) in the nested logit model. This is because when the cost of, say, bus travel increases, some demand shifts to other ground alternatives (like train or car), rather than shifting unrealistically to air as assumed in the conditional logit model.  

More formally, the nested logit dampens the impact of price changes across nests, making substitution more realistic and elasticities more conservative. The own-price elasticity in the nested logit accounts for within-nest correlation, which reduces the responsiveness compared to the conditional logit under IIA.  