In [None]:
# === Environment Setup ===
import os, sys, math, time, random, json, textwrap, warnings
import numpy as np, pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.optimize import minimize
from scipy.stats import gumbel_r, norm
from IPython.display import display, Markdown, Image
from collections import OrderedDict
try:
    import pylogit as pl
    PYLOGIT_AVAILABLE = True
except ImportError:
    PYLOGIT_AVAILABLE = False
try:
    import pymc as pm
    import pytensor.tensor as pt
    import arviz as az
    PYMC_AVAILABLE = True
except ImportError:
    PYMC_AVAILABLE = False

# --- Configuration ---
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({'font.size': 12, 'figure.figsize': (11, 7), 'figure.dpi': 130})
np.set_printoptions(suppress=True, linewidth=120, precision=4)

# --- Utility Functions ---
def note(msg): display(Markdown(f"<div class='alert alert-info'>📝 {textwrap.fill(msg, width=100)}</div>"))
def sec(title): print(f"\n{100*'='}\n| {title.upper()} |\n{100*'='}")

note(f"Environment initialized. PyLogit: {PYLOGIT_AVAILABLE}, PyMC: {PYMC_AVAILABLE}")

# Part 5: Microeconomic Theory
## Chapter 5.04: Discrete Choice Models: Theory and Application

### Table of Contents
1.  [The Random Utility Model (RUM) Framework](#1.-The-Random-Utility-Model-(RUM)-Framework)
2.  [The Multinomial Logit (MNL)](#2.-The-Multinomial-Logit-(MNL))
    *   [2.1 Derivation from Gumbel Distributed Errors](#2.1-Derivation-from-Gumbel-Distributed-Errors)
    *   [2.2 The IIA Property and its Limitations](#2.2-The-IIA-Property-and-its-Limitations)
3.  [Relaxing the IIA Assumption: Nested and Mixed Logit](#3.-Relaxing-the-IIA-Assumption:-Nested-and-Mixed-Logit)
    *   [3.1 The Nested Logit Model](#3.1-The-Nested-Logit-Model)
    *   [3.2 The Mixed Logit Model](#3.2-The-Mixed-Logit-Model)
4.  [Estimation Methods](#4.-Estimation-Methods)
    *   [4.1 Maximum Likelihood (for MNL and Nested Logit)](#4.1-Maximum-Likelihood-(for-MNL-and-Nested-Logit))
    *   [4.2 Maximum Simulated Likelihood (for Mixed Logit)](#4.2-Maximum-Simulated-Likelihood-(for-Mixed-Logit))
    *   [4.3 Bayesian Estimation via MCMC](#4.3-Bayesian-Estimation-via-MCMC)
5.  [Application: The BLP Model of Demand](#5.-Application:-The-BLP-Model-of-Demand)
6.  [Chapter Summary](#6.-Chapter-Summary)
7.  [Exercises](#7.-Exercises)

### 1. The Random Utility Model (RUM) Framework

Developed by Daniel McFadden, the RUM frames the choice problem from the perspective of an econometrician with imperfect information. The utility individual $n$ derives from choosing alternative $j$ is:
$$ U_{nj} = V_{nj} + \epsilon_{nj} $$ 
where $V_{nj} = \mathbf{X}_{nj}'\beta$ is the systematic (observable) utility and $\epsilon_{nj}$ is the stochastic component, which the individual knows but the econometrician does not observe. The probability that agent $n$ chooses alternative $i$ is the probability that its utility is the highest:
$$ P_{ni} = \text{Prob}(V_{ni} + \epsilon_{ni} > V_{nj} + \epsilon_{nj} \quad \forall j \neq i) $$ 
The specific functional form of this probability depends entirely on the assumption made about the joint distribution of the error terms.

### 2. The Multinomial Logit (MNL)

#### 2.1 Derivation from Gumbel Distributed Errors
The **Multinomial Logit (MNL)** model arises from the assumption that the error terms $\epsilon_{nj}$ are **independently and identically distributed (IID)** according to a **Type I Extreme Value (Gumbel)** distribution. The CDF of the Gumbel distribution is $F(\epsilon) = \exp(-e^{-\epsilon})$.

This assumption is powerful because the difference of two independent Gumbel variables with a common scale follows a logistic distribution, and the maximum of several such variables is again Gumbel distributed. This allows the multidimensional integral for the choice probability to be solved analytically, yielding the familiar logit formula:
$$ P_{ni} = \frac{e^{V_{ni}}}{\sum_{j \in C_n} e^{V_{nj}}} $$
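
The closed form can be checked numerically. The following is a minimal sketch, assuming three alternatives with made-up systematic utilities `V`: it draws IID standard Gumbel errors, records which alternative has the highest total utility, and compares the simulated choice frequencies with the logit formula.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) systematic utilities for three alternatives
V = np.array([1.0, 0.5, -0.2])
n_draws = 200_000

# Draw IID standard Gumbel errors and pick the utility-maximising alternative
eps = rng.gumbel(loc=0.0, scale=1.0, size=(n_draws, V.size))
choices = np.argmax(V + eps, axis=1)
simulated = np.bincount(choices, minlength=V.size) / n_draws

# Closed-form logit probabilities (subtracting the max for numerical stability)
expV = np.exp(V - V.max())
analytic = expV / expV.sum()

print("simulated:", simulated.round(4))
print("analytic: ", analytic.round(4))
```

With this many draws the two vectors should agree to roughly two decimal places; the remaining gap is Monte Carlo noise.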

#### 2.2 The IIA Property and its Limitations
The IID assumption leads to the **Independence of Irrelevant Alternatives (IIA)** property. The ratio of probabilities for any two alternatives, $P_{ni}/P_{nk} = e^{V_{ni}}/e^{V_{nk}}$, depends only on the attributes of those two alternatives. This implies that introducing a new alternative reduces the probabilities of all existing alternatives proportionally. This is unrealistic in many settings (e.g., the "red bus/blue bus" problem), motivating more advanced models.
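
A small numerical illustration of the red bus/blue bus problem may help. The sketch below assumes all alternatives have the same systematic utility (set to zero for convenience) and applies the MNL formula before and after a red bus, identical to the blue bus, is added to the choice set.

```python
import numpy as np

def mnl_probs(V):
    """Multinomial logit probabilities for a vector of systematic utilities."""
    V = np.asarray(V, dtype=float)
    expV = np.exp(V - V.max())
    return expV / expV.sum()

# Assumed utilities: every alternative is equally attractive
p_before = mnl_probs([0.0, 0.0])         # [car, blue bus]
p_after = mnl_probs([0.0, 0.0, 0.0])     # [car, blue bus, red bus]

# IIA: the car / blue-bus odds are unchanged by the new alternative
ratio_before = p_before[0] / p_before[1]
ratio_after = p_after[0] / p_after[1]

print("before:", p_before, "odds car/blue bus:", ratio_before)
print("after: ", p_after, "odds car/blue bus:", ratio_after)
```

The MNL predicts that the car share falls from 1/2 to 1/3, whereas intuition says the red bus should only split the existing bus riders, leaving the car at 1/2 and each bus at 1/4.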

### 3. Relaxing the IIA Assumption: Nested and Mixed Logit

#### 3.1 The Nested Logit Model
The **Nested Logit** model partially relaxes the IIA assumption by grouping similar alternatives into "nests." The error term $\epsilon_{nj}$ is decomposed into a component common to the nest and a component specific to the alternative. This induces correlation among alternatives within the same nest.

The choice probability becomes a product of the probability of choosing a nest and the probability of choosing an alternative within that nest:
$$ P_{ni} = P(i | \text{nest } k) \times P(\text{nest } k) $$
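The model estimates a **nesting parameter**, $\lambda_k$, for each nest. It measures the degree of *independence* among the unobserved utilities within nest $k$: lower values of $\lambda_k$ mean stronger within-nest correlation, and $1-\lambda_k$ serves as a rough indicator of that correlation. If $\lambda_k=1$ for every nest, there is no within-nest correlation and the model collapses to the standard MNL.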
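
To make the decomposition concrete, here is a minimal sketch of nested logit probabilities under one common parameterization, in which utilities inside nest $k$ are scaled by $1/\lambda_k$ and the nest probability depends on the inclusive value $I_k = \ln \sum_{j \in \text{nest } k} e^{V_{nj}/\lambda_k}$. The function name `nested_logit_probs` and the example utilities are assumptions made for illustration, and no safeguards against numerical overflow are included.

```python
import numpy as np

def nested_logit_probs(V, nests, lambdas):
    """Nested logit probabilities.

    V       : systematic utilities, one per alternative
    nests   : list of index lists, one list of alternatives per nest
    lambdas : nesting parameter for each nest
    """
    V = np.asarray(V, dtype=float)
    lambdas = np.asarray(lambdas, dtype=float)
    # Inclusive value of each nest: log-sum of scaled utilities
    incl = np.array([np.log(np.sum(np.exp(V[idx] / lam)))
                     for idx, lam in zip(nests, lambdas)])
    # Probability of choosing each nest
    nest_weights = np.exp(lambdas * incl)
    p_nest = nest_weights / nest_weights.sum()
    # Probability of each alternative = P(alt | nest) * P(nest)
    probs = np.zeros_like(V)
    for k, (idx, lam) in enumerate(zip(nests, lambdas)):
        scaled = np.exp(V[idx] / lam)
        probs[idx] = scaled / scaled.sum() * p_nest[k]
    return probs

# Car in its own nest, blue and red bus sharing a nest; equal utilities assumed
nests = [[0], [1, 2]]
p_mnl = nested_logit_probs([0.0, 0.0, 0.0], nests, lambdas=[1.0, 1.0])
p_nested = nested_logit_probs([0.0, 0.0, 0.0], nests, lambdas=[1.0, 0.05])

print("lambda = 1    :", p_mnl.round(3))
print("lambda = 0.05 :", p_nested.round(3))
```

With both parameters equal to 1 the result matches the MNL (one third each). As the bus nest's parameter approaches zero, the two buses behave like a single alternative and the car share moves toward one half.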

In [None]:
sec("Nested Logit Example")
if not PYLOGIT_AVAILABLE:
    note("Pylogit not installed. Skipping Nested Logit example.")
else:
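    # Create a reproducible long-format dataframe (one row per individual-alternative pair).
    # Choices 1,2 are 'bus' (nest 1), choice 3 is 'train' (nest 2).
    # The 'nest_id' column is informational only; pylogit reads nests from nest_spec.
    rng = np.random.default_rng(0)
    long_df = pd.DataFrame({
        'ind_id': np.repeat(np.arange(100), 3),
        'alt_id': np.tile([1,2,3], 100),
        'nest_id': np.tile([1,1,2], 100),
        'price': rng.random(300)*5,
        'time': rng.random(300)*2
    })
    # Dummy choice data: each individual picks one alternative uniformly at random
    chosen_alt = rng.integers(1, 4, 100)
    long_df['choice'] = (long_df['alt_id'] == np.repeat(chosen_alt, 3)).astype(int)
    
    spec = OrderedDict([('price', 'all_same'), ('time', 'all_same')])
    # nest_spec maps each nest name to the list of alternative ids it contains;
    # every alternative must belong to exactly one nest.
    nest_spec = OrderedDict([('bus', [1, 2]), ('train', [3])])
    
    model = pl.create_choice_model(data=long_df, alt_id_col='alt_id', obs_id_col='ind_id',
                                   choice_col='choice', specification=spec, 
                                   model_type="Nested Logit", nest_spec=nest_spec)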
    # model.fit_mle(...) # Estimation would follow

#### 3.2 The Mixed Logit Model
The **Mixed Logit** (or Random Coefficients Logit) model is among the most flexible and widely used discrete choice models. It does not exhibit the IIA property, because the coefficients ($\beta$) themselves are random, varying across individuals according to a specified distribution (e.g., price sensitivity $\beta_{price}$ is normally distributed in the population). Individuals who value an attribute highly substitute disproportionately toward alternatives that share it.

The utility function becomes $U_{nj} = \mathbf{X}_{nj}'\beta_n + \epsilon_{nj}$, where $\beta_n \sim f(\beta | b, W)$ is a vector of coefficients for individual $n$. The overall choice probability is the integral of the logit probabilities over the distribution of $\beta_n$:
$$ P_{ni} = \int \frac{e^{\mathbf{X}_{ni}'\beta}}{\sum_{j} e^{\mathbf{X}_{nj}'\beta}} f(\beta | b, W) d\beta $$
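
The effect of random coefficients on substitution patterns can be seen by revisiting the red bus/blue bus setting. The sketch below is illustrative only: it assumes a single attribute (a bus dummy) whose coefficient is normally distributed with mean 0 and a deliberately large standard deviation `sigma`, and it approximates the integral above by averaging logit probabilities over draws.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixed_logit_probs(X, beta_draws):
    """Average logit probabilities over draws of the random coefficients.

    X          : (J, K) attributes of the J alternatives
    beta_draws : (R, K) draws from the assumed coefficient distribution
    """
    V = beta_draws @ X.T                       # (R, J) utilities per draw
    V = V - V.max(axis=1, keepdims=True)       # numerical stability
    P = np.exp(V)
    P = P / P.sum(axis=1, keepdims=True)
    return P.mean(axis=0)

sigma = 5.0                                    # assumed taste heterogeneity for buses
beta_draws = rng.normal(0.0, sigma, size=(100_000, 1))

X_before = np.array([[0.0], [1.0]])            # car, blue bus
X_after = np.array([[0.0], [1.0], [1.0]])      # car, blue bus, red bus

p_before = mixed_logit_probs(X_before, beta_draws)
p_after = mixed_logit_probs(X_after, beta_draws)

print("before:", p_before.round(3))
print("after: ", p_after.round(3))
print("odds car/blue bus before:", round(p_before[0] / p_before[1], 3))
print("odds car/blue bus after: ", round(p_after[0] / p_after[1], 3))
```

Unlike the MNL, the odds of car versus blue bus change when the red bus is introduced: the new bus draws mostly from consumers who already favoured buses, so the car share falls by much less than a proportional rule would imply.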

### 4. Estimation Methods

#### 4.1 Maximum Likelihood (for MNL and Nested Logit)
For models with a closed-form choice probability, we can use standard Maximum Likelihood Estimation.
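
As an illustration, the following sketch estimates an MNL by maximum likelihood on synthetic data. The sample size, the two attributes, and the "true" coefficients `beta_true` are all assumptions chosen for the example; the log-likelihood is the sum over individuals of the log probability of the chosen alternative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(42)

# Synthetic data: N individuals, J alternatives, K attributes (all assumed)
N, J, K = 5000, 3, 2
beta_true = np.array([-1.0, 0.5])
X = rng.normal(size=(N, J, K))
U = X @ beta_true + rng.gumbel(size=(N, J))    # utilities with Gumbel errors
y = U.argmax(axis=1)                           # index of the chosen alternative

def neg_loglik(beta):
    """Negative MNL log-likelihood."""
    V = X @ beta                                          # (N, J)
    log_p = V - logsumexp(V, axis=1, keepdims=True)       # log choice probabilities
    return -log_p[np.arange(N), y].sum()

res = minimize(neg_loglik, x0=np.zeros(K), method="BFGS")
beta_hat = res.x

print("true beta:     ", beta_true)
print("estimated beta:", beta_hat.round(3))
```

Because the MNL log-likelihood is globally concave in $\beta$, a standard quasi-Newton routine started at zero is usually sufficient.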

#### 4.2 Maximum Simulated Likelihood (for Mixed Logit)
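For models without a closed-form probability, such as the Mixed Logit, we use **Maximum Simulated Likelihood (MSL)**. We approximate the integral in the choice probability by taking $R$ draws $\beta^r$ from $f(\beta | b, W)$ and averaging the resulting logit probabilities, $\check{P}_{ni} = \frac{1}{R}\sum_{r=1}^{R} L_{ni}(\beta^r)$, where $L_{ni}(\beta)$ is the logit formula evaluated at $\beta$. These simulated probabilities replace the exact ones in the log-likelihood, which is then maximized over $(b, W)$.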
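
A minimal sketch of MSL for a mixed logit with one normally distributed coefficient follows. Everything specific here is an assumption for illustration: the synthetic data, the values `b_true` and `s_true`, the number of draws `R`, the choice to optimise over the log of the standard deviation (to keep it positive), and the use of Nelder-Mead. The standard-normal draws are generated once and held fixed across evaluations so that the simulated likelihood is a smooth function of the parameters.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Synthetic cross-section: N individuals, J alternatives, one attribute (assumed)
N, J, R = 1000, 3, 100
b_true, s_true = 1.0, 0.8
x = rng.normal(size=(N, J))
beta_n = b_true + s_true * rng.normal(size=N)
y = (beta_n[:, None] * x + rng.gumbel(size=(N, J))).argmax(axis=1)

# Fixed standard-normal draws, reused at every parameter value
draws = rng.normal(size=(N, R))

def neg_sim_loglik(theta):
    """Negative simulated log-likelihood; theta = (mean, log std. dev.)."""
    b, log_s = theta
    beta = b + np.exp(log_s) * draws                 # (N, R) coefficient draws
    V = beta[:, :, None] * x[:, None, :]             # (N, R, J) utilities
    V = V - V.max(axis=2, keepdims=True)
    P = np.exp(V)
    P = P / P.sum(axis=2, keepdims=True)
    p_chosen = P[np.arange(N), :, y]                 # (N, R) prob. of observed choice
    sim_prob = p_chosen.mean(axis=1)                 # average over draws
    return -np.log(sim_prob).sum()

theta0 = np.array([0.5, 0.0])
res = minimize(neg_sim_loglik, theta0, method="Nelder-Mead")
b_hat, s_hat = res.x[0], np.exp(res.x[1])

print("estimated mean:", round(b_hat, 3), " estimated std. dev.:", round(s_hat, 3))
```

With a single choice per individual the standard deviation is only weakly identified, so its estimate can be noisy; panel data with repeated choices per person typically pin it down much better.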

#### 4.3 Bayesian Estimation via MCMC
An alternative to MSL for Mixed Logit is to use Bayesian MCMC methods. This approach treats the individual-level coefficients $\beta_n$ as latent variables to be sampled along with the population-level parameters $(b, W)$. It avoids maximizing a simulated likelihood, which can be numerically difficult, and it delivers posterior distributions for both the population parameters and each individual's coefficients, giving a richer characterization of uncertainty.
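
The idea can be sketched with a hand-written sampler for a mixed logit with a single normally distributed coefficient. This is a simplified illustration, not a production implementation: the synthetic panel, the flat prior on the mean, the inverse-gamma prior settings `nu0` and `s0`, the step size, and the chain length are all assumptions. Each iteration draws the population mean and variance from their conditional distributions and then updates every individual's $\beta_n$ with a Metropolis step.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical panel: N people, T choice situations each, J alternatives
N, T, J = 200, 10, 3
b_true, sd_true = 1.0, 0.5
x = rng.normal(size=(N, T, J))
beta_true = b_true + sd_true * rng.normal(size=N)
y = (beta_true[:, None, None] * x + rng.gumbel(size=(N, T, J))).argmax(axis=2)

def person_loglik(beta):
    """Log-likelihood of each person's T observed choices given their beta."""
    V = beta[:, None, None] * x
    logp = V - np.log(np.exp(V).sum(axis=2, keepdims=True))
    chosen = np.take_along_axis(logp, y[:, :, None], axis=2)[:, :, 0]
    return chosen.sum(axis=1)

n_iter, burn, step = 3000, 1000, 0.5
nu0, s0 = 1.0, 1.0                       # assumed inverse-gamma prior settings
b, var = 0.0, 1.0
beta = rng.normal(size=N)                # dispersed starting values
ll = person_loglik(beta)
b_chain, var_chain = np.empty(n_iter), np.empty(n_iter)

for it in range(n_iter):
    # 1. Population mean given the individual coefficients (flat prior on b)
    b = rng.normal(beta.mean(), np.sqrt(var / N))
    # 2. Population variance given b and the individual coefficients
    shape = 0.5 * (nu0 + N)
    rate = 0.5 * (nu0 * s0 + np.sum((beta - b) ** 2))
    var = 1.0 / rng.gamma(shape, 1.0 / rate)
    # 3. Metropolis step for every individual's coefficient
    prop = beta + step * np.sqrt(var) * rng.normal(size=N)
    ll_prop = person_loglik(prop)
    log_ratio = (ll_prop - ll) - 0.5 * ((prop - b) ** 2 - (beta - b) ** 2) / var
    accept = np.log(rng.random(N)) < log_ratio
    beta = np.where(accept, prop, beta)
    ll = np.where(accept, ll_prop, ll)
    b_chain[it], var_chain[it] = b, var

b_post_mean = b_chain[burn:].mean()
sd_post_mean = np.sqrt(var_chain[burn:]).mean()
print("posterior mean of b: ", round(b_post_mean, 3))
print("posterior mean of sd:", round(sd_post_mean, 3))
```

In practice a probabilistic programming library such as PyMC would handle the sampling and convergence diagnostics; the explicit loop is shown only to make the latent-variable structure visible.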

### 5. Application: The BLP Model of Demand
The **Berry, Levinsohn, and Pakes (BLP, 1995)** model is the workhorse of modern empirical Industrial Organization. It combines a Mixed Logit demand model with instrumental variables to tackle the critical problem of **price endogeneity** (prices are correlated with unobserved product quality).

**The BLP Algorithm:**
1.  **Demand Side:** A Mixed Logit model relates consumer choices to product characteristics (including price) and unobserved quality, $\xi_j$. The model allows for heterogeneous consumer tastes.
2.  **Supply Side:** Firms are assumed to set prices in a Bertrand-Nash equilibrium, so each price equals marginal cost plus a markup that depends on the own- and cross-price derivatives of demand implied by the demand side.
3.  **Inversion:** The key insight is to use the model to *invert* the observed market shares to back out the unobserved product quality shocks, $\xi_j$, that are consistent with the data.
4.  **GMM Estimation:** The final step is a GMM estimation. The moment condition is that the unobserved quality shocks, $\xi_j$, must be uncorrelated with the instrumental variables. Common instruments include the characteristics of *other* firms' products, which are correlated with price but plausibly uncorrelated with the unobserved quality of a specific product.
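
The inversion in step 3 is typically carried out with a fixed-point iteration on the mean utilities $\delta_j$, from which $\xi_j$ is recovered as the residual once the linear parameters are given. The sketch below illustrates only that step, for a single hypothetical market with one random coefficient; the characteristic values `x`, the taste dispersion `sigma`, the number of simulated consumers, and the "true" `delta_true` are assumptions, and the supply side and GMM objective are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single market: J products, one characteristic with a random coefficient
x = np.array([1.0, 0.5, 1.5, 0.8])
sigma = 0.5                                # assumed std. dev. of the random taste
nu = rng.normal(size=500)                  # simulated consumers
mu = sigma * np.outer(nu, x)               # (consumers, J) individual taste deviations

def predicted_shares(delta):
    """Mixed logit market shares; the outside good's utility is normalised to 0."""
    expU = np.exp(delta + mu)
    probs = expU / (1.0 + expU.sum(axis=1, keepdims=True))
    return probs.mean(axis=0)

def invert_shares(s_obs, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration: delta <- delta + ln(s_obs) - ln(s_pred(delta))."""
    s0 = 1.0 - s_obs.sum()
    delta = np.log(s_obs) - np.log(s0)     # plain-logit inversion as starting value
    for _ in range(max_iter):
        delta_new = delta + np.log(s_obs) - np.log(predicted_shares(delta))
        if np.max(np.abs(delta_new - delta)) < tol:
            return delta_new
        delta = delta_new
    return delta

# Generate "observed" shares from known mean utilities, then recover them
delta_true = np.array([-1.0, -1.5, -0.5, -2.0])
s_obs = predicted_shares(delta_true)
delta_hat = invert_shares(s_obs)

print("true delta:     ", delta_true)
print("recovered delta:", delta_hat.round(6))
```

In a full estimation routine this inversion sits inside the GMM objective: for each trial value of the nonlinear parameters (here `sigma`), the shares are inverted, $\xi_j$ is computed, and the moment conditions with the instruments are evaluated.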

### 6. Chapter Summary
- **RUM:** The Random Utility Model is the theoretical foundation, where choice probabilities are determined by the distribution of unobserved utility.
- **Logit Models:** Assuming IID Gumbel errors yields the simple **Multinomial Logit (MNL)**, but this implies the restrictive **IIA property**. The **Nested Logit** relaxes this by grouping alternatives, while the **Mixed Logit** is fully flexible by allowing for random taste coefficients.
- **Estimation:** Simple logit models are estimated with MLE. More complex models like Mixed Logit require simulation methods, either **Maximum Simulated Likelihood (MSL)** or **Bayesian MCMC**.
- **Endogeneity:** In real-world applications, price is often endogenous. The **BLP model** is the standard framework for tackling this, combining a Mixed Logit demand system with an instrumental variables (GMM) approach.

### 7. Exercises

1.  **The IIA Property:** Explain the "red bus/blue bus" problem. Why does the Multinomial Logit model fail in this scenario, and how does the Nested Logit model solve the problem?

2.  **Derivation of Logit:** Show that if $U_1 = V_1 + \epsilon_1$ and $U_2 = V_2 + \epsilon_2$, where $\epsilon_1, \epsilon_2$ are IID Gumbel, then the difference $\epsilon_1 - \epsilon_2$ follows a logistic distribution, and from this, derive the binary logit choice probability.

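3.  **Simulation Bias:** The MSL estimator is consistent only if the number of simulation draws, `R`, grows with the sample size. For fixed `R` it is biased, because the log of an unbiased simulated probability is not an unbiased estimate of the log probability. How does the bias of the MSL estimator depend on `R`?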

4.  **Welfare Calculation:** The **log-sum-exp term**, $\ln(\sum_j e^{V_{nj}})$, represents, up to an additive constant, the expected maximum utility an individual gets from their choice set. Use estimated MNL parameters to calculate the change in consumer welfare (in dollar terms) if the price of one alternative increases by 10%. (Hint: The marginal utility of income is the negative of the price coefficient, so dividing the change in the log-sum by it converts utility into dollars.)

5.  **BLP Instruments:** In the BLP model, why are the characteristics of other firms' products considered a valid instrument for price? Explain both the relevance and exogeneity conditions in this context.