# Specification of the IPDL Model

This notebook introduces the general Perturbed Utility Model setup and then presents the IPDL Model as an important special case. We also discuss the specification of the IPDL Model and a few computational shortcuts used in applications and implementations of the model.

In [1]:
import numpy as np
import pandas as pd
pd.options.mode.chained_assignment = "warn"
pd.set_option('display.max_rows', 500)
import os
import sys
from numpy import linalg as la
from scipy import optimize
import scipy.stats as scstat
from matplotlib import pyplot as plt
import itertools as iter
%load_ext line_profiler

# Files
module_path = os.path.abspath(os.path.join('../..'))
if module_path not in sys.path:
    sys.path.append(module_path)

data_path = os.path.join(module_path, 'data')

from utilities.Logit_file import estimate_logit, logit_se, logit_t_p, q_logit, logit_score, logit_score_unweighted, logit_ccp, LogitBLP_estimator
from data.Eurocarsdata_file import Eurocars_cleandata

In [2]:
# Load dataset and variable names

descr = (pd.read_stata(os.path.join(data_path,'eurocars.dta'), iterator = True)).variable_labels() # Obtain variable descriptions
dat_file = pd.read_csv(os.path.join(data_path,'eurocars.csv')) # reads in the data set as a pandas dataframe.

In [3]:
# Outside option is included if OO == True, otherwise analysis is done on the inside options only.
OO = True

# Choose which variables to include in the analysis, and assign them either as discrete variables or continuous.

x_discretevars = [ 'brand', 'home', 'cla']
x_contvars = ['cy', 'hp', 'we', 'le', 'wi', 'he', 'li', 'sp', 'ac', 'pr']
z_IV_contvars = ['xexr']
z_IV_discretevars = []
x_allvars =  [*x_contvars, *x_discretevars]
z_allvars = [*z_IV_contvars, *z_IV_discretevars]

if OO:
    nest_vars = [var for var in ['in_out', *x_allvars] if (var != 'pr')] # We nest over all variables other than price, but an alternative list can be specified here if desired.
else:
    nest_vars = [var for var in x_allvars if (var != 'pr')] # See above

nest_cont_vars = ['cy', 'hp', 'we', 'le', 'wi', 'he', 'li', 'sp', 'ac'] # The list of continuous variables, from which nests will be created according to the deciles of the distribution.

G = len(nest_vars)

# Print list of chosen variables as a dataframe
pd.DataFrame(descr, index=['description'])[x_allvars].transpose().reset_index().rename(columns={'index' : 'variable names'})

|   | variable names | description |
|---|---|---|
| 0 | cy | cylinder volume or displacement (in cc) |
| 1 | hp | horsepower (in kW) |
| 2 | we | weight (in kg) |
| 3 | le | length (in cm) |
| 4 | wi | width (in cm) |
| 5 | he | height (in cm) |
| 6 | li | average of li1, li2, li3 (used in papers) |
| 7 | sp | maximum speed (km/hour) |
| 8 | ac | time to acceleration (in seconds from 0 to 100... |
| 9 | pr | price (in destination currency including V.A.T.) |


In [4]:
dat, dat_org, x_vars, z_vars, N, pop_share, T, J, K = Eurocars_cleandata(dat_file, x_contvars, x_discretevars, z_IV_contvars, z_IV_discretevars, outside_option=OO)

In [5]:
# Create dictionaries of numpy arrays for each market. This allows the size of the data set to vary over markets.

dat = dat.reset_index(drop = True).sort_values(by = ['market', 'co']) # Sort data so that the reshape below is successful

x = {t: dat[dat['market'] == t][x_vars].values.reshape((J[t],K)) for t in np.arange(T)} # Dict of explanatory variables
y = {t: dat[dat['market'] == t]['ms'].to_numpy().reshape((J[t])) for t in np.arange(T)} # Dict of market shares

### Perturbed utility, logit and nested logit

In the following, a vector $z\in \mathbb R^d$ is always a column vector. A perturbed utility model (PUM) is a discrete choice model in which the vector of choice probabilities is the solution to a utility maximization problem of the form
$$
P(u)=\arg\max_{q\in \Delta} q'u-\Omega(q)
$$
where $\Delta$ is the probability simplex over the set of discrete choices, $u$ is a vector of payoffs for each option, $\Omega$ is a convex function and $q'$ denotes the transpose of $q$. All additive random utility models can be represented in this way (Fosgerau and Sørensen (2021)). For example, the logit choice probabilities result from the perturbation function $\Omega(q)=q'\ln q$ where $\ln q$ is the elementwise logarithm.
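As a quick numerical illustration of the logit case, the sketch below checks that the familiar logit formula maximizes $q'u-q'\ln q$ over the simplex: at the logit probabilities the objective equals $\ln\sum_j e^{u_j}$, and no randomly drawn interior point of the simplex does better. The names `logit_probs` and `pum_objective` are hypothetical helpers for this example only (the notebook's own `logit_ccp` is imported from `utilities.Logit_file` and is not reproduced here).

```python
import numpy as np

def logit_probs(u):
    """Logit choice probabilities, computed with max-rescaled exponentials."""
    e = np.exp(u - u.max())
    return e / e.sum()

def pum_objective(q, u):
    """PUM objective q'u - Omega(q) with the logit perturbation Omega(q) = q'ln q."""
    return q @ u - q @ np.log(q)

u = np.array([1.0, 0.5, -0.2, 2.0])
p = logit_probs(u)

# At the optimum, the objective equals the log-sum-exp of the utilities
print(np.isclose(pum_objective(p, u), np.log(np.exp(u).sum())))  # True

# No random interior point of the simplex attains a higher objective value
rng = np.random.default_rng(0)
best_random = max(pum_objective(rng.dirichlet(np.ones(4)), u) for _ in range(1000))
print(pum_objective(p, u) >= best_random)  # True
```

Because the objective is strictly concave on the interior of the simplex, the random search is only a sanity check; the log-sum-exp identity is the exact statement.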

In the nested logit model, the choice set is divided into a partition $\mathcal C=\left\{C_1,\ldots,C_L\right\}$, and the perturbation function is given by
$$
\Omega(q|\lambda)=(1-\lambda)q'\ln q+\lambda \sum_{\ell =1}^L \left( \sum_{j\in C_\ell}q_j\right)\ln \left( \sum_{j\in C_\ell}q_j\right),
$$
where $\lambda\in [0,1)$ is a parameter. This function can be written equivalently as
$$
\Omega(q|\lambda)=(1-\lambda)q'\ln q+\lambda \left(\psi q\right)'\ln \left( \psi q\right),
$$
where $\psi$ is an $L \times J$ matrix with $\psi_{\ell j}=1$ if option $j$ belongs to nest $C_\ell$ and zero otherwise, so that $\psi q$ is the vector of nest probabilities.
This specification generates nested logit choice probabilities.
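To see the claim at work, the sketch below computes nested logit probabilities in closed form, with within-nest kernels $e^{u_j/(1-\lambda)}$ and nest probabilities proportional to $S_\ell^{1-\lambda}$ where $S_\ell=\sum_{k\in C_\ell}e^{u_k/(1-\lambda)}$, and then checks the first-order condition of $\max_{q\in\Delta} q'u-\Omega(q|\lambda)$: at an interior optimum, $u_j-(1-\lambda)(\ln q_j+1)-\lambda(\ln Q_{\ell(j)}+1)$ must be the same for every option $j$. The function `nested_logit_probs` and the integer-label representation of nests are assumptions made for this example.

```python
import numpy as np

def nested_logit_probs(u, nest_of, lam):
    """Nested logit CCPs for Omega(q) = (1-lam) q'ln q + lam * sum_l Q_l ln Q_l.

    u: (J,) utilities; nest_of: (J,) integer nest labels 0..L-1; lam in [0, 1).
    """
    mu = 1.0 - lam
    v = np.exp((u - u.max()) / mu)                    # within-nest kernels (max-rescaled)
    L = nest_of.max() + 1
    S = np.bincount(nest_of, weights=v, minlength=L)  # S_l = sum over nest l of the kernels
    nest_w = S ** mu                                  # nest probabilities proportional to S_l^mu
    Q = nest_w / nest_w.sum()                         # nest probabilities Q_l
    return Q[nest_of] * v / S[nest_of]                # P(j) = Q_l * P(j | l)

u = np.array([1.0, 0.3, -0.5, 0.8, 0.0])
nest_of = np.array([0, 0, 1, 1, 1])
lam = 0.6
q = nested_logit_probs(u, nest_of, lam)

# First-order condition of q'u - Omega(q|lam): constant across options at the optimum
Q = np.bincount(nest_of, weights=q)
foc = u - (1 - lam) * (np.log(q) + 1) - lam * (np.log(Q[nest_of]) + 1)
print(np.allclose(foc, foc[0]))  # True
```

Setting `lam = 0.0` collapses the formula to the plain logit probabilities, matching the statement that $\lambda=0$ gives the logit model.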

### The IPDL model

In the IPDL model, we allow for multiple nesting structures. For each $g=1,\ldots, G$, let $\mathcal C_g$ and $\psi^g$ be constructed as described for the nested logit, and let $L_g$ be the number of nests in nesting structure $g$, so that $\psi^g$ is an $L_g\times J$ matrix. The IPDL perturbation function is then
$$
\Omega(q|\lambda)=(1-\sum_g \lambda_g) q'\ln q +\sum_g \lambda_g \left(\psi^g q \right)'\ln \left( \psi^g q\right),
$$
where $\lambda=(\lambda_1,\ldots,\lambda_G)$ is a parameter vector satisfying $\lambda_g \geq 0$ and $\sum_g \lambda_g<1$. In this model, each option belongs to exactly one nest in each of the $G\geq 1$ nesting structures. When $G=1$, the model reduces to the nested logit model, and when $\sum_g \lambda_g=0$, it reduces to the logit model. The IPDL model can therefore represent a richer set of substitution patterns than a single nested logit model, without requiring a hierarchical structure over the nests.
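The two special cases can be checked directly on a toy example. The sketch below evaluates the IPDL perturbation for a list of nesting matrices and confirms that one nesting structure reproduces the nested logit perturbation and that $\lambda=0$ reproduces $q'\ln q$. The helpers `psi_from_labels` and `ipdl_omega` are hypothetical, and the sketch assumes every nest contains at least one option (an empty nest would give $0\ln 0$).

```python
import numpy as np

def psi_from_labels(labels):
    """Hypothetical helper: (L x J) 0/1 nest-membership matrix from integer labels 0..L-1."""
    L = labels.max() + 1
    return (np.arange(L)[:, None] == labels[None, :]).astype(float)

def ipdl_omega(q, psis, lam):
    """IPDL perturbation: (1 - sum(lam)) q'ln q + sum_g lam_g (psi^g q)'ln(psi^g q)."""
    val = (1.0 - lam.sum()) * q @ np.log(q)
    for psi_g, lam_g in zip(psis, lam):
        Qg = psi_g @ q                      # nest probabilities in structure g
        val += lam_g * Qg @ np.log(Qg)
    return val

q = np.array([0.1, 0.2, 0.3, 0.4])
psi_a = psi_from_labels(np.array([0, 0, 1, 1]))   # nests {1,2}, {3,4}
psi_b = psi_from_labels(np.array([0, 1, 0, 1]))   # nests {1,3}, {2,4}

omega_two = ipdl_omega(q, [psi_a, psi_b], np.array([0.3, 0.2]))   # G = 2

# G = 1: nested logit perturbation with lambda = 0.3 and nest shares (0.3, 0.7)
nl = ipdl_omega(q, [psi_a], np.array([0.3]))
Q_a = np.array([0.3, 0.7])
direct_nl = 0.7 * q @ np.log(q) + 0.3 * Q_a @ np.log(Q_a)

# lambda = 0: logit perturbation
logit = ipdl_omega(q, [psi_a, psi_b], np.zeros(2))
print(np.isclose(nl, direct_nl), np.isclose(logit, q @ np.log(q)))  # True True
```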

In this notebook, the nesting structures are defined by a subset of the explanatory variables. For categorical variables, each category is a nest. For continuous variables, the options are partitioned according to the deciles of the variable, resulting in *at most* 10 nests of roughly equal size, as well as a nest for the outside option (when it is included). This construction implies that $\Omega$ depends on the data.
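A decile-based nesting structure for one continuous variable can be sketched with `pd.qcut`; passing `duplicates="drop"` merges repeated bin edges, which is one way to end up with fewer than 10 nests. The helper `decile_nests` is hypothetical: the notebook's own nest construction (`Create_nests`) is not shown in this section and may differ, for instance in how it handles the outside option, which this sketch ignores.

```python
import numpy as np
import pandas as pd

def decile_nests(values, n_bins=10):
    """Hypothetical helper: assign options to at most n_bins quantile-based nests.

    Returns the (L x J) 0/1 matrix psi and the integer nest label of each option.
    """
    codes = pd.qcut(values, q=n_bins, labels=False, duplicates="drop")
    codes = np.asarray(codes, dtype=int)            # assumes no missing values
    L = codes.max() + 1
    psi = (np.arange(L)[:, None] == codes[None, :]).astype(float)
    return psi, codes

# Toy continuous characteristic for 57 inside options (simulated, not the eurocars data)
rng = np.random.default_rng(1)
hp = rng.uniform(30, 200, size=57)
psi, codes = decile_nests(hp)
print(psi.shape)               # (number of nests, number of options)
print(psi.sum(axis=1))         # nest sizes, roughly equal
```

Each column of `psi` contains exactly one 1, which is the property the IPDL formulas rely on.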


### The utility function

Let $x_{tj}$ be the vector of product characteristics for option $j$ in market $t$, and let $X_t$ denote the $J_t\times K $ matrix with elements $x_{tjk}$. For simplicity of exposition, we assume throughout these notebooks that the payoff of option $j$ is a linear function of the characteristics $x_{tj}$ of product $j$, which means that the vector of utilities may be written
$$
u(X_t,\beta)=X_t\beta.
$$

Letting $\theta=(\beta',\lambda')'$ denote the full parameter vector of length $D=K+G$ (i.e. the number of characteristics plus the number of nesting structures), the choice probabilities in market $t$ may be written as
$$
P_t(\theta)=\arg \max_{q\in \Delta_{J_t}} \left\{q'X_t \beta-(1-\sum_g \lambda_g)q'\ln q -\sum_{g=1}^G\lambda_g \left(\psi^{gt} q \right)'\ln \left(\psi^{gt} q\right)\right\}
$$



### Max-rescaling for numerical stability

Let $\alpha$ be a scalar, and let $\iota$ be the all-ones vector in $\mathbb R^J$. Since $q$ sums to one when it is a probability vector, $q'(u+\alpha\iota)=q'u+(q'\iota)\alpha=q'u+\alpha$. Adding $\alpha\iota$ to the utilities therefore shifts the objective by a constant that does not depend on $q$, so the maximizer is unchanged: $P(u+\alpha\iota|\theta)=P(u|\theta)$.

In particular, $P(u-(\max_{j}u_j)\iota|\theta)=P(u|\theta)$, so the utilities can be re-scaled exactly as in the logit model, where subtracting the largest utility keeps exponentials of the utilities from overflowing. The same numerical benefits carry over to the IPDL model.
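The overflow problem and its fix are easiest to see in the logit case. The sketch below uses the logit formula only as an illustration (for the IPDL model the same shift is applied to $u=X_t\beta$ before solving); it shows that exponentiating large utilities directly produces `nan`, whereas shifting by $\alpha=-\max_j u_j$ gives the same probabilities as for the already-shifted utilities $(0,-1,-2)$.

```python
import numpy as np

u = np.array([1000.0, 999.0, 998.0])    # large utilities: np.exp(u) overflows to inf

with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(u) / np.exp(u).sum()  # inf / inf gives nan

e = np.exp(u - u.max())                  # shift by alpha = -max_j u_j
stable = e / e.sum()

print(naive)    # [nan nan nan]
print(stable)   # well-defined probabilities summing to one
```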

## Computing gradients
In implementations of the IPDL Model, it is useful to define a few auxiliary matrices. First, we define $\Psi \in \mathbb{R}^{(J_t + \sum_{g=1}^G L_g) \times J_t}$ as the matrix that stacks the identity matrix $I_{J_t}$ on top of the $\psi^g$ matrices, where $L_g$ is the number of nests of structure $g$ in market $t$:

$$
\Psi = \left(
    \begin{array}{c}
        I_{J_t} \\
        \psi^1 \\
        \vdots \\
        \psi^G
    \end{array}
    \right)
$$

Another useful matrix is $\Gamma \in \mathbb{R}^{(J_t + \sum_{g=1}^G L_g) \times J_t}$, which scales each block of $\Psi$ by its nesting parameter:

$$
\Gamma = \left(
    \begin{array}{c}
        \left(1 - \sum_{g = 1}^G \lambda_g\right) I_{J_t} \\
        \lambda_1 \psi^1 \\
        \vdots \\
        \lambda_G \psi^G
    \end{array}
    \right)
$$
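Both matrices can be built for a single market with a few NumPy calls. The sketch below stacks the blocks with `np.vstack` and then forms $\Gamma$ by repeating each weight $(1-\sum_g\lambda_g,\lambda_1,\ldots,\lambda_G)$ once per row of its block, which is the same row scaling that the notebook's `Create_Gamma` performs market by market. The function name `build_Psi_Gamma` is hypothetical.

```python
import numpy as np

def build_Psi_Gamma(psis, lam):
    """Stack I_J and the psi^g matrices into Psi, and scale each block by its weight for Gamma.

    psis: list of G arrays with shapes (L_g, J); lam: (G,) nesting parameters.
    """
    J = psis[0].shape[1]
    Psi = np.vstack([np.eye(J), *psis])                     # (J + sum_g L_g) x J
    weights = np.concatenate(([1.0 - lam.sum()], lam))      # (1 - sum(lam), lam_1, ..., lam_G)
    block_sizes = [J] + [p.shape[0] for p in psis]          # number of rows in each block
    Gamma = np.repeat(weights, block_sizes)[:, None] * Psi  # row-wise scaling of Psi
    return Psi, Gamma

psi_1 = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)                 # 2 nests
psi_2 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]], dtype=float)   # 3 nests
lam = np.array([0.3, 0.2])
Psi, Gamma = build_Psi_Gamma([psi_1, psi_2], lam)
print(Psi.shape, Gamma.shape)  # (9, 4) (9, 4)
```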

Using these matrices, the IPDL perturbation function can be computed as $\Omega(q|\lambda) = (\Gamma q)' \ln (\Psi q)$.
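The stacked formula can be checked against the definition of $\Omega$ on a toy market with $J=4$ options and two nesting structures. The nesting matrices and parameter values below are illustrative assumptions.

```python
import numpy as np

J = 4
q = np.array([0.1, 0.2, 0.3, 0.4])
psi_1 = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)   # nesting structure 1
psi_2 = np.array([[1, 0, 1, 0], [0, 1, 0, 1]], dtype=float)   # nesting structure 2
lam = np.array([0.3, 0.2])

Psi = np.vstack([np.eye(J), psi_1, psi_2])
Gamma = np.vstack([(1 - lam.sum()) * np.eye(J), lam[0] * psi_1, lam[1] * psi_2])

# Stacked form (Gamma q)' ln(Psi q) versus the term-by-term definition
omega_stacked = (Gamma @ q) @ np.log(Psi @ q)
omega_direct = ((1 - lam.sum()) * q @ np.log(q)
                + sum(l * (p @ q) @ np.log(p @ q) for l, p in zip(lam, (psi_1, psi_2))))
print(np.isclose(omega_stacked, omega_direct))  # True
```

The stacked version replaces the loop over nesting structures with two matrix products, which is convenient when $G$ is large.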

## Gradient and Hessian

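The gradient of $\Omega$ with respect to the choice probabilities is

$$
\nabla_q \Omega_t(q|\lambda)=(1-\sum_g \lambda_g)\ln q+ \sum_g \lambda_g(\psi^{gt})'\ln \left( \psi^{gt}q\right)+\iota_{J_t}=\Gamma' \ln (\Psi q) + \iota_{J_t}
$$

The constant $\iota_{J_t}$ collects the derivatives of the terms of the form $x\ln x$: each option belongs to exactly one nest in every nesting structure, so $(\psi^{gt})'\iota_{L_g}=\iota_{J_t}$, and the weights $1-\sum_g\lambda_g,\lambda_1,\ldots,\lambda_G$ sum to one.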
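A central finite-difference check is a cheap way to confirm the analytic gradient $\Gamma'\ln(\Psi q)+\iota$. The sketch below reuses the toy market from the perturbation example (an illustrative assumption) and perturbs one coordinate of $q$ at a time; since $\Omega$ is defined for any positive vector, leaving the simplex in the perturbation is harmless.

```python
import numpy as np

J = 4
q = np.array([0.1, 0.2, 0.3, 0.4])
psi_1 = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
psi_2 = np.array([[1, 0, 1, 0], [0, 1, 0, 1]], dtype=float)
lam = np.array([0.3, 0.2])
Psi = np.vstack([np.eye(J), psi_1, psi_2])
Gamma = np.vstack([(1 - lam.sum()) * np.eye(J), lam[0] * psi_1, lam[1] * psi_2])

def omega(q):
    """IPDL perturbation in stacked form (Gamma q)' ln(Psi q)."""
    return (Gamma @ q) @ np.log(Psi @ q)

grad_analytic = Gamma.T @ np.log(Psi @ q) + np.ones(J)

# Central differences along each coordinate direction
h = 1e-6
E = np.eye(J)
grad_fd = np.array([(omega(q + h * E[j]) - omega(q - h * E[j])) / (2 * h) for j in range(J)])
print(np.allclose(grad_analytic, grad_fd, atol=1e-6))  # True
```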

The Hessian of $\Omega$ is
$$
\nabla_{qq}^2 \Omega_t(q|\lambda)=(1-\sum_g \lambda_g) \mathrm{diag}(q)^{-1}+\sum_g\lambda_g (\psi^{gt})'\mathrm{diag}(\psi^{gt}q)^{-1}\psi^{gt} = \Gamma' \mathrm{diag}(\Psi q)^{-1} \Psi
$$
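The Hessian formula can be checked the same way, by differencing the analytic gradient. The sketch below, again on an illustrative toy market, also confirms that $\Gamma'\mathrm{diag}(\Psi q)^{-1}\Psi$ is symmetric and positive definite when $\sum_g\lambda_g<1$. Dividing the columns of `Gamma.T` by `Psi @ q` via broadcasting applies $\mathrm{diag}(\Psi q)^{-1}$ without forming the diagonal matrix; the `np.diag` version is included only for comparison.

```python
import numpy as np

J = 4
q = np.array([0.1, 0.2, 0.3, 0.4])
psi_1 = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
psi_2 = np.array([[1, 0, 1, 0], [0, 1, 0, 1]], dtype=float)
lam = np.array([0.3, 0.2])
Psi = np.vstack([np.eye(J), psi_1, psi_2])
Gamma = np.vstack([(1 - lam.sum()) * np.eye(J), lam[0] * psi_1, lam[1] * psi_2])

def grad(q):
    """Analytic gradient Gamma' ln(Psi q) + iota."""
    return Gamma.T @ np.log(Psi @ q) + 1.0

# Gamma' diag(Psi q)^{-1} Psi, with the diagonal applied by broadcasting
hess_analytic = (Gamma.T / (Psi @ q)) @ Psi
hess_diag = Gamma.T @ np.diag(1.0 / (Psi @ q)) @ Psi

# Column j: central difference of the gradient along coordinate j
h = 1e-6
E = np.eye(J)
hess_fd = np.column_stack([(grad(q + h * E[j]) - grad(q - h * E[j])) / (2 * h) for j in range(J)])

print(np.allclose(hess_analytic, hess_fd, atol=1e-5))       # True
print(np.all(np.linalg.eigvalsh(hess_analytic) > 0))        # True
```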

In [None]:
def Create_Gamma(Lambda, Psi, nest_count):
    r'''
    This function creates the \Gamma matrix for each market t

    Args:
        Lambda: a (G,) numpy array of nesting parameters \lambda
        Psi: a dictionary of length T with the stacked \Psi matrix of each market t, as output by 'Create_nests'
        nest_count: a dictionary of length T with the number of nests in each nesting structure for each market t = 1,...,T

    Returns:
        Gamma: a dictionary of length T containing the \Gamma matrix of each market t
    '''

    T = len(Psi)
    
    Gamma = {}
    lambda0 = np.array([1 - np.sum(Lambda)])
    Lambda_full = np.concatenate((lambda0, Lambda)) # Vector (1 - sum(lambda), lambda_1, ..., lambda_G)

    for t in np.arange(T):
        C, J = Psi[t].shape # C = J + sum_g L_g rows; J = number of alternatives in market t
        Lambda_long = np.empty((C,)) # One weight per row of Psi[t]
        indices = np.concatenate((np.array([J]), nest_count[t])).cumsum().astype('int64') # End row (exclusive) of the identity block and of each psi^g block in Psi[t]

        for i in np.arange(len(indices)):
            if i == 0:
                Lambda_long[0:indices[i]] = Lambda_full[i] # Rows of the identity block get weight 1 - sum(lambda)
            else:
                Lambda_long[indices[i-1]:indices[i]] = Lambda_full[i] # Rows of the block psi^g get weight lambda_g
    
        Gamma[t] = np.einsum('c,cj->cj', Lambda_long, Psi[t]) # Scale row c of Psi[t] by Lambda_long[c]

    return Gamma

In [None]:
# Starting values: lambda_g = 1/(2(G+1)) for every g, so that sum(lambda) = G/(2(G+1)) < 1/2
lambda0 = np.ones((G,))/(2*(G+1))
Gamma0 = Create_Gamma(lambda0, Psi, Nest_count)