# Modeling Demand for Cars with the IPDL model

In this notebook, we will explore the dataset used in
Goldberg & Verboven (2005). We will estimate the IPDL
model given the available data using the functions defined below.

In [161]:
import numpy as np
import pandas as pd 
import os
from numpy import linalg as la
from scipy import optimize
from IPython import display
from matplotlib import pyplot as plt
import itertools as iter
from numba import jit
import sparse as sp

# Files
import Logit_file as logit

Data
====

The dataset consists of approximately 110 vehicle makes per year in the period 1970-1999 in five European markets (Belgium, France, Germany, Italy, and the United Kingdom). Furthermore, the data contains information on various characteristics of the makes such as sales, prices, horsepower, weight and other physical car characteristics. These characteristics may also vary across markets.

An observation in our analysis is a market in a given year, so that e.g. the French car market in 1995 counts as a single observation. If $Y = 30$ is the number of years, and $M = 5$ is the number of country-level markets, we thus have $T=Y\cdot M = 150$ markets and observations. In addition, since the available vehicle makes vary across time and place, let $\mathcal{J}_t$ denote the set of available makes in each market $t=1,\ldots,T$, and let $\mathcal{J} := \bigcup_{t=1}^T \mathcal{J}_t$ be the set of all makes which were available in some market. Then $J:=\#\mathcal{J}$ is the number of makes which were available at some point in the period in at least one country-level market. In our dataset there are $J = 356$ unique vehicle makes. Note, however, that the characteristics of a given make may vary across markets.

Our dataset includes 47 variables in total. The first three columns are market and product codes for the year, country, and make. Another variable is quantity sold (number of new registrations), which will be used in computing observed market shares. The remaining 43 variables are potential explanatory variables. We will only consider the subset of these which describe car characteristics such as brand, after-tax price, horsepower, etc., which adds up to $K=20$ characteristics. The remaining 23 variables are mainly macroeconomic variables such as GDP per capita, which have been used to construct estimates of e.g. the average wage income and purchasing power. Since we are only interested in utility-shifting variables, we will not consider the latter columns.

Reading in the dataset `eurocars.csv` we thus have a dataframe of $\sum_{t=1}^T \#\mathcal{J}_t = 11459$ rows and $47$ columns. The `ye` column runs through $y=70,\ldots,99$, the `ma` column runs through $m=1,\ldots,M$, and the ``co`` column takes values $j\in \mathcal{J}$. 

Because we consider a country-year pair as the level of observation, we construct a `market` column enumerating the $T$ markets. We also construct a market share variable `ms` giving us the market share of any product $j$ in any market $t$; this takes values in $[0,1]$. To deal with the fact that choice sets $\mathcal{J}_t$ vary across markets, we expand the dataframe so that every car $j\in \mathcal{J}$ which was observed in some market $t$ is in the choice set of all other markets as well, i.e. we impose $\mathcal{J}_t = \mathcal{J}$ for all markets $t$. We then impute a market share of $q_{jt}=0$ for any car $j$ which in reality was not available in market $t$. Finally, we construct an outside option $j=0$ of not buying a car in each market $t$, with the 'sales' of $j=0$ given by

$$\mathrm{sales}_{0t} = \mathrm{pop}_t - \sum_{j=1}^J \mathrm{sales}_{jt}$$

where $\mathrm{pop}_t$ is the total population in market $t$.
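As a minimal sketch of this construction, the toy example below builds the outside option and the resulting market shares for two hypothetical markets. The dataframe `toy` and its values are invented for illustration; only the column names `market`, `co`, `qu`, `pop` and `ms` mirror the ones used in this notebook.

```python
import pandas as pd

# Toy data: two markets, two cars each; `pop` is the market population.
toy = pd.DataFrame({
    'market': [0, 0, 1, 1],
    'co':     [1, 2, 1, 2],
    'qu':     [30.0, 20.0, 10.0, 0.0],   # car 2 is not sold in market 1
    'pop':    [100.0, 100.0, 50.0, 50.0],
})

# Outside option: population minus total inside sales, coded as product 0.
outside = toy.groupby('market', as_index=False).agg(qu=('qu', 'sum'), pop=('pop', 'first'))
outside['qu'] = outside['pop'] - outside['qu']
outside['co'] = 0

toy = pd.concat([toy, outside], ignore_index=True).sort_values(['market', 'co'])

# Market shares sum to one within each market once the outside option is included.
toy['ms'] = toy['qu'] / toy.groupby('market')['qu'].transform('sum')
```

With the outside option included, the shares in each market sum to one, and the unavailable car simply receives a share of zero.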

We also read in the variable description of the dataset contained in `eurocars.dta`. We will use the list `x_vars` throughout to work with our explanatory variables.

Lastly, we access the underlying 3-dimensional numpy array of the explanatory variables `x` by sorting on `market` and then `co`, and subsequently reshaping the explanatory variables as

> `x = dat[x_vars].values.reshape((T,J,K))`
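The sort order matters here: only if rows are ordered by market and then by product does entry `[t, j, k]` of the array hold characteristic $k$ of product $j$ in market $t$. The sketch below checks this on a small hypothetical dataframe (`toy`, `x1`, `x2` and the `*_toy` dimensions are illustrative names). Note also that `ndarray.reshape` returns the reshaped array, whereas `ndarray.resize` works in place and returns `None`.

```python
import numpy as np
import pandas as pd

T_toy, J_toy, K_toy = 2, 3, 2
toy = pd.DataFrame({
    'market': [1, 0, 1, 0, 0, 1],   # deliberately unsorted
    'co':     [2, 0, 0, 2, 1, 1],
})
# Encode (market, product) into the characteristics so the layout can be checked.
toy['x1'] = 10 * toy['market'] + toy['co']
toy['x2'] = -toy['x1']

# Sort by market, then product, before reshaping to (T, J, K).
toy = toy.sort_values(['market', 'co'])
x_toy = toy[['x1', 'x2']].to_numpy().reshape((T_toy, J_toy, K_toy))
```

Here `x_toy[t, j, 0]` equals `10 * t + j` for every market and product, which confirms that the reshape respects the intended layout.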

In [162]:
# Load dataset and variable names
os.chdir('../GREENCAR_notebooks/')
input_path = os.getcwd() # Assigns input path as current working directory (cwd)
descr = (pd.read_stata('eurocars.dta', iterator = True)).variable_labels()
dat = pd.read_csv(os.path.join(input_path, 'eurocars.csv'))

In [163]:
pd.DataFrame(descr, index=['description']).transpose().reset_index().rename(columns={'index' : 'variable names'})

| | variable names | description |
|---|---|---|
| 0 | ye | year (=first dimension of panel) |
| 1 | ma | market (=second dimension of panel) |
| 2 | co | model code (=third dimension of panel) |
| 3 | zcode | alternative model code (predecessors and succe... |
| 4 | brd | brand code |
| 5 | type | name of brand and model |
| 6 | brand | name of brand |
| 7 | model | name of model |
| 8 | org | origin code (demand side, country with which c... |
| 9 | loc | location code (production side, country where ... |


We now clean the data to fit our setup

In [164]:
### First we create the 'market' column 

dat = dat.sort_values(by = ['ye', 'ma'], ascending = True)
market_vals = [*iter.product(dat['ye'].unique(), dat['ma'].unique())]
market_vals = pd.DataFrame({'year' : [val[0] for val in market_vals], 'country' : [val[1] for val in market_vals]})
market_vals = market_vals.reset_index().rename(columns={'index' : 'market'})
dat = dat.merge(market_vals, left_on=['ye', 'ma'], right_on=['year', 'country'], how='left')

In [165]:
### Second we expand the dataset such that all cars are at least vacuously available in all markets

dat = dat.sort_values(['market', 'co'])
product_vals = [*iter.product(dat['market'].unique(), dat['co'].unique())]
product_vals = pd.DataFrame({'market' : [val[0] for val in product_vals], 'co' : [val[1] for val in product_vals]})
dat = product_vals.merge(dat, on=['market','co'], how='outer')
dat['active'] = np.where(dat['qu'].notna(), 1, 0) # Indicator for whether the car was actually available in the market.
dat['qu'] = np.where(dat['qu'].isna(), 0, dat['qu'])

In [166]:
### Third we construct an outside option for each market t

outside_shares = dat.groupby('market', as_index=False)['qu'].sum()
outside_shares = outside_shares.merge(dat[['market', 'pop']], on = 'market', how='left').dropna().drop_duplicates(subset = 'market', keep = 'first')
outside_shares['qu'] = outside_shares['pop'] - outside_shares['qu']
outside_shares['co'] = 0
outside_shares['active'] = 1
dat = pd.concat([dat, outside_shares])

# Potentially set characteristics equal to 0 for outside option. However consider different data types!

In [167]:
### We also create an indicator on whether a car was active in given market as a numpy array
A = pd.pivot(dat[['market', 'co', 'active']], index='market', columns='co', values='active').values

In [168]:
### Fourth we compute market shares for each product j in each market t 

dat['ms'] = dat.groupby('market')['qu'].transform(lambda x: x/x.sum())

In [169]:
# Determine explanatory variables and find variable description as 'x_lab'
x_vars =  [dat.keys()[k] for k in [*range(6,14), *range(15,23), *range(26,30)]]
nest_vars = [var for var in x_vars if (var != 'type')&(var != 'model')&(var != 'pr')] # we nest on all variables except type, model and price.
nest_cont_vars = ['cy', 'hp', 'we', 'le', 'wi', 'he', 'li', 'sp', 'ac']
x_lab = (pd.DataFrame(descr, index=['description'])[x_vars].transpose().reset_index().rename(columns={'index' : 'variable names'}))

In [170]:
x_lab

| | variable names | description |
|---|---|---|
| 0 | type | name of brand and model |
| 1 | brand | name of brand |
| 2 | model | name of model |
| 3 | org | origin code (demand side, country with which c... |
| 4 | loc | location code (production side, country where ... |
| 5 | cla | class or segment code |
| 6 | home | domestic car dummy (appropriate interaction of... |
| 7 | frm | firm code |
| 8 | cy | cylinder volume or displacement (in cc) |
| 9 | hp | horsepower (in kW) |


In [171]:
# Find the dimensions of Data
T = dat['market'].nunique()
J = dat['co'].nunique()
K = len(x_vars)

We also convert our discrete explanatory variables to numerical variables

In [172]:
obj_columns = dat.select_dtypes(['object'])
for col in obj_columns:
    dat[col] = dat[col].astype('category').cat.codes.astype('float64') # Possibly a problem with Nan's being mapped to -1 ?

Additionally, we fill NaN values with '$-1$' in the remaining columns

In [173]:
dat = dat.fillna(-1)

We also scale values such that they lie in the interval $[-1,1]$. This has various numerical benefits. Also, this will not affect elasticities or diversion ratios, but semielasticities will be affected by the scaling.  

In [174]:
dat[x_vars] = dat[x_vars] / dat[x_vars].abs().max()

Finally, we will primarily use numpy data types and numpy functions in this notebook. Hence we store our response variable 'y' and our explanatory variables 'x' as numpy arrays.

In [223]:
# Create numpy arrays of response and explanatory variables
dat = dat.sort_values(by = ['market', 'co']) # Sort data so that the reshape places product j of market t at position [t,j]
x = dat[x_vars].values.reshape((T,J,K))
y = dat['ms'].to_numpy().reshape((T,J))

#### Multinomial Logit - for comparison
Estimating a Logit model via maximum likelihood with an initial guess of parameters $\hat \beta^0 = 0$ yields estimated parameters $\hat \beta^{\text{logit}}$ given as...

beta_0 = np.zeros((K,))

# Estimate the model
res_logit = logit.estimate_logit(logit.q_logit, beta_0, y, x) # 'a' was undefined; assuming the response y (observed market shares) is intended

logit_beta = res_logit['beta']
pd.DataFrame(logit_beta.reshape(1,len(logit_beta))) # Our estimates

We then compute the corresponding Logit choice probabilities

logit_q = logit.logit_ccp(logit_beta, x)

We also find the elasticities and diversion ratios implied by the logit model as follows...

epsilon_logit = logit.logit_elasticity(logit_q, logit_beta, 0) # Elasticities wrt. the characteristic with index 0 in x_vars
DR_logit_hat = logit.logit_diversion_ratio(logit_q, logit_beta)
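
For reference, the textbook logit formulas behind such quantities can be sketched as follows. This is a self-contained illustration for a single market, not the implementation in `Logit_file`; the helper names `logit_probs`, `logit_elasticity_sketch` and `logit_diversion_sketch`, the coefficient value, and the convention of leaving own diversion entries at zero are all assumptions made here.

```python
import numpy as np

def logit_probs(u):
    """Logit choice probabilities with max-rescaling."""
    e = np.exp(u - u.max())
    return e / e.sum()

def logit_elasticity_sketch(q, beta_l):
    """E[j, k] = beta_l * (1{j == k} - q[k]): effect of a log-characteristic of k on ln q_j."""
    return beta_l * (np.eye(q.size) - q[None, :])

def logit_diversion_sketch(q):
    """D[j, k] = q[k] / (1 - q[j]) for k != j: share of j's lost demand captured by k."""
    D = q[None, :] / (1.0 - q[:, None])
    np.fill_diagonal(D, 0.0)   # convention chosen here: own entries left at zero
    return D

q = logit_probs(np.array([0.0, 1.0, 2.0]))
E = logit_elasticity_sketch(q, beta_l=-2.0)   # hypothetical coefficient on a log-price
D = logit_diversion_sketch(q)
```

Two familiar logit properties are visible in the output: every off-diagonal entry in a given column of `E` is identical (the IIA property), and each row of `D` sums to one because all demand lost by product $j$ is diverted to the remaining alternatives in proportion to their shares.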

# The IPDL model - Nesting structure

The IPDL model is a generalization of the nested logit model where each alternative may belong to more than one nest. Before fully introducing the model, we construct the nesting structure.


## Constructing nests

Let $\Delta=\left\{q\in \mathbb{R}^J_+: \sum_{j=1}^J q_j=1\right\}$ denote the probability simplex. For each group of nests $g=1,\ldots, G$, nest membership is denoted by the matrix $\Psi^g\in \mathbb R^{C_g\times J}$: $\Psi^g_{cj}=1$ if product $j$ belongs to nest $c$ and zero otherwise, and each product can only belong to one nest within each group, meaning that $\sum_{c=1}^{C_g}\Psi^g_{cj}=1$ for all $j$ and all $g$. The matrix-vector product $\Psi^gq$ is then
$$
\Psi^g q=\left(\begin{array}{c}
\sum_{j=1}^J \Psi^g_{1j}q_j \\
\vdots \\
\sum_{j=1}^J \Psi^g_{C_gj}q_j
\end{array}\right)=\left(\begin{array}{c}
\sum_{j:\Psi^g_{1j}=1} q_j \\
\vdots \\
\sum_{j: \Psi^g_{C_gj}=1}q_j
\end{array}\right),
$$
and the vector $\Psi^gq$ is a vector of nest-specific choice probabilities, i.e. the sum of the probabilities within each nest.
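
A small numerical illustration, using a hypothetical grouping with two nests over four products:

```python
import numpy as np

# Hypothetical grouping with C_g = 2 nests over J = 4 products.
Psi = np.array([[1., 1., 0., 0.],
                [0., 0., 1., 1.]])
q = np.array([0.1, 0.2, 0.3, 0.4])

# Each product sits in exactly one nest, so every column of Psi sums to one.
columns_ok = np.all(Psi.sum(axis=0) == 1)

# Nest-level choice probabilities: approximately (0.3, 0.7).
nest_prob = Psi @ q
```

Because every product belongs to exactly one nest in the grouping, `nest_prob` is itself a probability vector.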

### The perturbation function $\Omega$

In the following, a vector $z\in \mathbb R^d$ is always a column vector. We now construct the IPDL perturbation function which has the form (where for a vector $z$, the logarithm is applied elementwise and $z'$ denotes the transpose)
$$
\Omega(q|\lambda)= (1-\sum_{g=1}^G \lambda_g) q'\ln q +\sum_{g=1}^{G} \lambda_g \left(\Psi^g q \right)'\ln \left(\Psi^g q\right).
$$
Note that since $\Psi^g q$ denotes a probability distribution over the nests, the term $(\Psi^gq)'\ln (\Psi^gq)$ is the (negative) entropy of the probability distribution $\Psi^g q$. Similarly, $q'\ln q$ is the negative entropy of q. Note also that as each nest has at least one member, and $q$ is strictly positive, $\Psi^gq$ is also strictly positive. When the parameters $\lambda_g$ satisfy $\lambda_g>0$ and
$$
\sum_g \lambda_g<1,
$$
the function $\Omega(\cdot|\lambda)$ is a strictly convex function of $q$, and the utility maximization problem has a unique interior (meaning strictly positive choice probabilities) solution. If $\lambda_g = 0$ for all groupings $g$, we immediately see that the  IPDL becomes the standard multinomial Logit model for the choice probabilities $q$. When there is only one group of nests, $G=1$, then $\Omega$ induces the nested logit choice probabilities (note though that the nested logit model is often parameterized in terms of the nesting parameter $\mu=1-\lambda$ instead!). 
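
The two special cases can be checked numerically. The helper `omega` below is an illustrative implementation of the formula for $\Omega(q|\lambda)$ above for a single strictly positive probability vector; it is a sketch and not part of the estimation code in this notebook.

```python
import numpy as np

def omega(q, lam, Psis):
    """IPDL perturbation function for strictly positive q (illustrative helper)."""
    lam = np.asarray(lam, dtype=float)
    value = (1.0 - lam.sum()) * q @ np.log(q)
    for lam_g, Psi_g in zip(lam, Psis):
        nest_q = Psi_g @ q
        value += lam_g * nest_q @ np.log(nest_q)
    return value

q = np.array([0.1, 0.2, 0.3, 0.4])
Psis = [np.array([[1., 1., 0., 0.], [0., 0., 1., 1.]])]

neg_entropy = q @ np.log(q)
omega_logit = omega(q, [0.0], Psis)    # lambda = 0: plain negative entropy (logit)
omega_nested = omega(q, [0.5], Psis)   # G = 1: nested-logit perturbation
```

With $\lambda=0$ the function collapses to $q'\ln q$. With $\lambda>0$ part of the weight moves to the nest-level negative entropy, which is never smaller than $q'\ln q$ because aggregating probabilities into nests cannot increase entropy.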

It will be convenient to define a choice probability function for a given vector of payoffs $u$ as
$$
P(u|\lambda)=\arg \max_{q\in \Delta}\left\{q'u-\Omega(q|\lambda)\right\}
$$
Letting $\theta$ denote the full vector of parameters, $\theta=(\beta',\lambda')'$, the individual choice probabilities are a function of the matrix $\mathbf{X}_i$ and the parameters $\theta$, given by
$$
p(\mathbf{X}_i,\theta)=\arg\max_{q\in \Delta}\left\{q'\mathbf{X}_i \beta-(1-\sum_{g=1}^G\lambda_g)q'\ln q-\sum_{g=1}^G\lambda_g \left(\Psi^g q \right)'\ln \left(\Psi^g q\right)\right\}
$$

## Max-rescaling for numerical stability

Let $\alpha$ be a scalar, and let $\iota$ be the all-ones vector in $\mathbb R^J$. Note that $q'(u+\alpha\iota)=q'u+(q'\iota)\alpha=q'u+\alpha$, since $q$ sums to one. For this reason, $\alpha$ does not enter into the utility maximization when calculating $P(u+\alpha\iota|\lambda)$, and we have $P(u+\alpha\iota|\lambda)=P(u|\lambda)$.

This allows us to re-scale the utilities just as in the logit model, since $P(u-(\max_{j}u_j)\iota|\lambda)=P(u|\lambda)$. The numerical benefits of this approach carry over to the IPDL model.
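
The following sketch illustrates the point for the logit case. The function name `softmax_rescaled` and the utility values are chosen for illustration only.

```python
import numpy as np

def softmax_rescaled(u):
    """Logit probabilities computed after subtracting the row maximum."""
    v = u - u.max(axis=-1, keepdims=True)
    e = np.exp(v)
    return e / e.sum(axis=-1, keepdims=True)

u = np.array([1000.0, 1001.0, 1002.0])

# Without rescaling, exp(1000) overflows to inf and the ratio becomes nan.
with np.errstate(over='ignore', invalid='ignore'):
    naive = np.exp(u) / np.exp(u).sum()

stable = softmax_rescaled(u)
shifted = softmax_rescaled(u + 5.0)   # adding a constant leaves the result unchanged
```

The rescaled version returns valid probabilities where the naive computation fails, and shifting all utilities by a constant gives the same answer.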

## Gradient and Hessian

For purposes of computing the gradient and Hessian of $\Omega$, it is convenient to define
$$
\Gamma=\left(\begin{array}{c}
(1-\sum_g \lambda_g)I_J\\
\lambda_1 \Psi^1\\
\vdots\\
\lambda_G \Psi^G
\end{array}\right)
$$
where $I_J$ is the identity matrix in $\mathbb R^J$. The matrix $\Gamma$ is a block matrix with $J+\sum_g C_g$ rows and $J$ columns. Note that 

$$
\Gamma q=\left(\begin{array}{c}
(1-\sum_g\lambda_g)q \\
\lambda_1\Psi^1 q\\
\vdots \\
\lambda_G \Psi^Gq
\end{array}\right)>0
$$
if $q>0$.

Using $\Gamma$, we can show that
$$
\begin{aligned}
\Omega(q|\lambda)&=(\Gamma q)'\ln (\Gamma q)+c\\
\nabla_q \Omega(q|\lambda)&=\Gamma'\ln (\Gamma q)+\iota\\
\nabla^2_{qq}\Omega(q|\lambda)&=\Gamma'\mathrm{diag}(\Gamma q)^{-1}\Gamma,
\end{aligned}
$$
where $c$ is a scalar that depends on $\lambda$ but not on $q$ and therefore does not affect the utility maximization problem, $\iota=(1,\ldots,1)'\in \mathbb R^J$ is the all-ones vector and $\mathrm{diag}(z)$ is a diagonal matrix with the elements of the vector $z$ on the diagonal.
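
These expressions can be verified with finite differences. The sketch below uses a hypothetical example with $J=4$ products and $G=2$ groupings; the nest matrices and parameter values are made up for illustration.

```python
import numpy as np

J = 4
lam = np.array([0.3, 0.2])
Psis = [np.array([[1., 1., 0., 0.], [0., 0., 1., 1.]]),
        np.array([[1., 0., 1., 0.], [0., 1., 0., 1.]])]

# Stack (1 - sum(lam)) * I_J on top of lam_g * Psi^g.
Gamma = np.vstack([(1.0 - lam.sum()) * np.eye(J)] + [l * P for l, P in zip(lam, Psis)])

def omega_gamma(q):
    """(Gamma q)' ln(Gamma q), which equals Omega up to the constant c."""
    return (Gamma @ q) @ np.log(Gamma @ q)

def grad_omega(q):
    return Gamma.T @ np.log(Gamma @ q) + np.ones(J)

def hess_omega(q):
    return Gamma.T @ np.diag(1.0 / (Gamma @ q)) @ Gamma

q = np.array([0.1, 0.2, 0.3, 0.4])
h = 1e-6
# Central finite differences, one coordinate direction at a time.
fd_grad = np.array([(omega_gamma(q + h * e) - omega_gamma(q - h * e)) / (2 * h) for e in np.eye(J)])
fd_hess = np.array([(grad_omega(q + h * e) - grad_omega(q - h * e)) / (2 * h) for e in np.eye(J)])
```

Both `fd_grad` and `fd_hess` agree with the analytic expressions up to discretization error. A useful by-product is the identity $\nabla^2_{qq}\Omega(q|\lambda)\,q=\Gamma'\iota=\iota$, which follows because the columns of $\Gamma$ sum to one.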

In the following we impose all nests on all markets. We deal with this by setting $\psi_{tcj} = 0$ for all products $j$ if the nest $c$ was not in fact observed in market $t$.
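
As a minimal sketch of this convention, the hypothetical helper `incidence_matrix` below builds $\Psi^g$ for one market from integer nest codes, where the number of nests is fixed globally across markets. It is a simplified stand-in for illustration, not the function used in this notebook.

```python
import numpy as np

def incidence_matrix(nest_codes, n_nests):
    """Return an (n_nests, J) 0/1 matrix with Psi[c, j] = 1 if product j has nest code c."""
    nest_codes = np.asarray(nest_codes)
    Psi = np.zeros((n_nests, nest_codes.size))
    Psi[nest_codes, np.arange(nest_codes.size)] = 1.0
    return Psi

# Three nests exist across all markets, but only nests 0 and 1 occur in this market.
Psi_t = incidence_matrix([0, 1, 1, 0], n_nests=3)
```

The row of the unobserved nest is identically zero, while every product still belongs to exactly one nest. Such zero rows produce zeros in $\Gamma q$, which is why the code further down guards logarithms and divisions against zero.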

In [176]:
def Create_nests(data, markets_id, products_id, columns, cont_var = None, cont_var_bins = None):
    r'''
    This function creates the nest matrices \Psi^g from any specified columns in data

    Args.
        data: a pandas DataFrame
        markets_id: a string denoting the column of 'data' containing an enumeration t=0,1,...,T-1 of markets
        products_id: a string denoting the column of 'data' containing product code which uniquely identifies products
        columns: a list containing the column names of columns in 'data' from which nest groupings g=0,1,...,G-1 for each market t are to be generated
        cont_var: a list of the continuous variables in 'columns'
        cont_var_bins: a list containing the number of bins to make for each continuous variable in 'columns'

    Returns
        Gamma_tilde: a (T,C,J) numpy array stacking the identity I_J on top of the Psi_g matrices for each market t, where C = J + the total number of nests
        Psi_dict: a dictionary of dictionaries of the Psi_g matrices for each grouping g and each market t
        nest_dict: a dictionary of pandas dataframes describing the nests of each grouping g
        nest_counts: a (G,) numpy array with the number of nests in each grouping g
    '''

    J = data[products_id].nunique()
    T = data[markets_id].nunique()
    G = len(columns)

    dat = data.sort_values(by = [markets_id, products_id]) # This is good :)
    
    Psi_dict = {}
    nest_dict = {}

    ### Bin continuous variables

    if cont_var is not None:
        for var,n_bins in zip(cont_var,cont_var_bins):
            dat[var] = pd.cut(dat[var], bins=n_bins, labels=[str(i) for i in range(1,n_bins +1)], include_lowest=True)
        

    nest_counts = dat[columns].nunique().values

    ### New - find unique nests over all markets t and impose all nests into all markets t 
    for g in range(G):
        
        col = columns[g]
        vals = pd.DataFrame({'nests' : dat[col].sort_values().unique()}).reset_index().rename(columns={'index' :'nest_index'})
        descr = vals.rename_axis(col, axis='columns')
        nest_dict[g] = descr

        product_enumeration = pd.DataFrame({products_id : dat[products_id].sort_values().unique(), 'product_enumeration' : np.arange(dat[products_id].nunique())})
        C_g = dat[col].nunique()
        Psi_dict_t = {}

        for t in range(T):
            frame = dat[dat[markets_id] == t][[products_id, col]].merge(vals, left_on = col, right_on = 'nests')
            allocation = frame[[products_id, 'nest_index']].merge(product_enumeration, on=products_id, how='left')

            mat = np.zeros((int(C_g), J))

            for c,j in zip(allocation['nest_index'], allocation['product_enumeration']):
                mat[c, j] = 1

            Psi_dict_t[t] = mat
        
        Psi_dict[g] = Psi_dict_t

    C = np.concatenate([np.eye(J) if g==0 else Psi_dict[g-1][0] for g in range(G+1)]).shape[0]
    Gamma_tilde = np.empty((T,C,J))

    for t in range(T):
        Gamma_tilde[t,:,:] = np.concatenate([np.eye(J) if g==0 else Psi_dict[g-1][t] for g in range(G+1)])

    return Gamma_tilde, Psi_dict, nest_dict, nest_counts

We bin each of the continuous nesting variables in `nest_cont_vars` into 10 bins. The price `pr` is excluded from `nest_vars` and is therefore not used to form nests.

In [177]:
Psi_stack, Psi_dict, Nest_descr, Nest_count = Create_nests(dat, 'market', 'co', nest_vars, nest_cont_vars, [*[np.int64(10) for i in range(len(nest_cont_vars))]])

In [178]:
def Create_Gamma( Lambda, Psi_stack, nest_count):
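    '''
    This function builds the Gamma matrices by scaling the rows of the stacked matrix [I_J; Psi^1; ...; Psi^G] for each market

    Args.
        Lambda: a sequence of nesting parameters, one for each grouping g
        Psi_stack: a (T,C,J) numpy array as returned by 'Create_nests'
        nest_count: a (G,) array with the number of nests in each grouping g

    Returns
        Gamma: a (T,C,J) numpy array whose first J rows are scaled by 1 - sum(Lambda) and whose rows for grouping g are scaled by Lambda[g]
    '''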

    T,C,J = Psi_stack.shape

    lambda0 = 1 - sum(Lambda)
    Lambda_long = np.empty((C))
    Lambda_full = [lambda0, *Lambda]
    indices = np.array([J,*nest_count]).cumsum()

    for i in range(len(indices)):
        if i == 0:
            Lambda_long[0:indices[i]] = Lambda_full[i]
        else:
            Lambda_long[indices[i-1]:indices[i]] = Lambda_full[i]
    
    Gamma =  Lambda_long[None,:,None] * Psi_stack # np.einsum('c,tcj->tcj', Lambda_long, Psi_stack, optimize=True)

    return Gamma

def Create_Gamma2(x,Lambda, Psi_dict):
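    ''' 
    This function builds the same (T,C,J) Gamma array as 'Create_Gamma', but from the dictionary 'Psi_dict' returned by 'Create_nests'
    '''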

    G = len(Psi_dict.keys())
    T,J,K = x.shape

    lambda0 = 1 - sum(Lambda)
    Lambda_full = [lambda0, *Lambda]

    C = np.concatenate([np.eye(J) if g==0 else Psi_dict[g-1][0] for g in range(G+1)]).shape[0]
    Gamma = np.empty((T,C,J))

    for t in range(T):
        Gamma[t,:,:] = np.concatenate([Lambda_full[g]*np.eye(J) if g==0 else Lambda_full[g]*Psi_dict[g-1][t] for g in range(G+1)])

    return Gamma
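
The row scaling performed by `Create_Gamma` can be illustrated on a toy stack. The sketch below is self-contained and uses made-up nest matrices; it replaces the explicit index loop with `np.repeat`, which is one possible way to assign a weight to each row and is not the code used above.

```python
import numpy as np

T, J = 2, 3
nest_count = np.array([2, 2])            # C_1 = C_2 = 2 (toy values)
Psi1 = np.array([[1., 1., 0.], [0., 0., 1.]])
Psi2 = np.array([[1., 0., 1.], [0., 1., 0.]])

# Toy stack [I_J; Psi^1; Psi^2], repeated for each market: shape (T, J + sum(nest_count), J).
Psi_stack = np.stack([np.vstack([np.eye(J), Psi1, Psi2])] * T)

lam = np.array([0.3, 0.2])
weights = np.concatenate([[1.0 - lam.sum()], lam])                 # (lambda_0, lambda_1, lambda_2)
lam_long = np.repeat(weights, np.concatenate([[J], nest_count]))   # one weight per row of the stack

# Broadcast the row weights over markets (axis 0) and products (axis 2).
Gamma = lam_long[None, :, None] * Psi_stack
```

Since every product belongs to exactly one nest in each grouping, each column of `Gamma[t]` sums to $(1-\sum_g\lambda_g)+\sum_g\lambda_g=1$.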

In [179]:
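G = K # Note: Create_nests builds len(nest_vars) groupings, so Create_Gamma only uses the first len(Nest_count) entries of lambda0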
lambda0 = np.ones((G,))/(G+1)
Gamma0 = Create_Gamma(lambda0, Psi_stack, Nest_count)

## Model solution

While it is possible to solve for the choice probabilities explicitly by maximizing utility, Fosgerau and Nielsen (2021) suggest a contraction mapping approach which is conceptually simpler. Suppose we are evaluating the likelihood at some guess of the parameters $\theta=(\beta',\lambda')'$. Let $u_i=\mathbf{X}_i\beta$, and let $q_i^0$ denote some initial vector of choice probabilities, e.g. the logit probabilities $q^0_{ij}=\frac{e^{u_{ij}}}{\sum_{j'=1}^Je^{u_{ij'}}}$. We update the choice probabilities according to the formula
$$
\begin{aligned}
v_i^{k} &=u_i+\ln q_i^{k-1}-\Gamma'\ln (\Gamma q_i^{k-1})\\
q_i^{k} &= \frac{e^{v_i^{k}}}{\sum_{j=1}^J e^{v_{ij}^{k}}}.
\end{aligned}
$$
They show that $\lim_{k\rightarrow \infty}q_i^k=p(\mathbf{X}_i,\theta)$ for any starting value $q^0_i$ in the interior of $\Delta$. For numerical stability, it can be a good idea to also do max-rescaling of $v^k_i$ at every iteration.

Let $p$ denote the solution to the utility maximization problem. Formally, the Kullback-Leibler divergence $D_{KL}(p||q)=p'\ln \frac{p}{q}$ contracts at a linear rate with each iteration,
$$
D_{KL}(p||q^{k+1})\leq \left(\sum_g \lambda_g \right)D_{KL}(p||q^k),
$$
noting that $\sum_g \lambda_g\in [0,1)$ by assumption. In particular, when $\lambda=0$ we have $\Gamma=I_J$, so that $v_i^k=u_i$ and the iteration returns the logit probabilities after a single step.
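
A minimal single-market sketch of the iteration is given below, assuming a small made-up nesting structure. The function `ipdl_fixed_point` is an illustrative stand-in and not the vectorized routine used in this notebook. At the fixed point the first-order condition requires $u-\Gamma'\ln(\Gamma q)$ to be constant across products, which gives a simple check on the solution.

```python
import numpy as np

def ipdl_fixed_point(u, Gamma, tol=1e-12, max_iter=10_000):
    """Iterate q <- softmax(u + ln q - Gamma' ln(Gamma q)) for one market (illustrative)."""
    q = np.exp(u - u.max())
    q /= q.sum()                                  # logit starting values
    for _ in range(max_iter):
        v = u + np.log(q) - Gamma.T @ np.log(Gamma @ q)
        v -= v.max()                              # max-rescaling
        q_new = np.exp(v) / np.exp(v).sum()
        if np.abs(q_new - q).max() < tol:
            return q_new
        q = q_new
    return q

J = 4
lam = np.array([0.3, 0.2])
Psis = [np.array([[1., 1., 0., 0.], [0., 0., 1., 1.]]),
        np.array([[1., 0., 1., 0.], [0., 1., 0., 1.]])]
Gamma = np.vstack([(1.0 - lam.sum()) * np.eye(J)] + [l * P for l, P in zip(lam, Psis)])

u = np.array([0.5, -0.2, 0.1, 0.0])
q_star = ipdl_fixed_point(u, Gamma)

# First-order condition: u - Gamma' ln(Gamma q) is constant across products at the optimum.
foc = u - Gamma.T @ np.log(Gamma @ q_star)
```

Passing `np.eye(J)` as `Gamma` corresponds to $\lambda=0$ and returns the logit probabilities, since the starting values are then already a fixed point.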

In [180]:
def IPDL_ccp(Beta, x, Gamma, active_mat, tol = 1.0e-15, maximum_iterations = 1000, MAXRESCALE:bool = True):
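    ''' 
    This function computes the IPDL choice probabilities by fixed-point iteration, starting from the logit choice probabilities

    Args.
        Beta: a (K,) numpy array of parameters
        x: a (T,J,K) numpy array of covariates
        Gamma: a (T,C,J) numpy array as returned by 'Create_Gamma'
        active_mat: a (T,J) numpy array indicating whether product j was active in market t
        tol: convergence tolerance for the distance between successive iterates
        maximum_iterations: the maximum number of iterations
        MAXRESCALE: currently unused

    Returns
        q1: a (T,J) numpy array of choice probabilities
    '''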

    u = logit.util(Beta, x)
    q = np.exp(u) / np.exp(u).sum(axis = 1, keepdims=True) # Find logit choice probabilities
    q0 = active_mat*q

    assert u.ndim == 2
    assert q.ndim == 2

    T,J,K = x.shape
    
    Epsilon = 1.0e-8

    for k in range(maximum_iterations):
        # Calculate v
        gamma_q = np.einsum('tcj,tj->tc', Gamma, q0, optimize=True)
        gamma_log_prod = np.einsum('tcj,tc->tj', Gamma, np.log(gamma_q + Epsilon), optimize=True)
        v = u - gamma_log_prod

        # Calculate iterated ccp q^k
        denom = np.sum(q0 * np.exp(v), axis=1, keepdims=True)
        numerator = q0*np.exp(v)
        q1 = active_mat * numerator / denom

        # Check convergence in an appropriate distance function
        dist = np.max(np.sum((q1-q0)**2/q , axis=1)) # Uses logit weights. This avoids precision issues when q1~q0~0.

        if dist<tol:
            break # Converged; otherwise the loop ends by itself after maximum_iterations passes
        
        # Iteration step
        q0 = q1

    return q1 

In [181]:
beta0 = 0.1*np.ones((K,))
theta0 = np.append(beta0, lambda0)

q0_hat = IPDL_ccp(beta0, x, Gamma0, A)
pd.DataFrame(q0_hat).rename_axis(index='Markets', columns='Products')

Markets \ Products,0,1,2,3,4,5,6,7,8,9,...,347,348,349,350,351,352,353,354,355,356
0,0.021786,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1,0.019555,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2,0.020951,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3,0.023076,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
4,0.022384,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
145,0.021566,0.0,0.0,0.0,0.0,0.026945,0.0,0.0,0.0,0.0,...,0.000000,0.000989,0.012376,0.013339,0.018787,0.001146,0.035550,0.028714,0.004241,0.000349
146,0.023629,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.037783,0.003200,0.010011,0.009253,0.027931,0.000595,0.037549,0.028656,0.008825,0.000437
147,0.022068,0.0,0.0,0.0,0.0,0.035059,0.0,0.0,0.0,0.0,...,0.045526,0.002995,0.007867,0.009455,0.027532,0.001046,0.025389,0.015806,0.013101,0.001698
148,0.022802,0.0,0.0,0.0,0.0,0.036639,0.0,0.0,0.0,0.0,...,0.033189,0.001251,0.007858,0.011026,0.031485,0.000999,0.041191,0.028609,0.003931,0.000802


In [182]:
assert np.allclose(q0_hat.sum(axis=1), 1) # Choice probabilities sum to one in every market

In [183]:
Gammaq0 = np.einsum('tcj,tj->tc', Gamma0, q0_hat)

In [184]:
Gammaq0

array([[0.00103744, 0.        , 0.        , ..., 0.00373226, 0.00165312,
        0.        ],
       [0.0009312 , 0.        , 0.        , ..., 0.00611345, 0.00146074,
        0.        ],
       [0.00099767, 0.        , 0.        , ..., 0.00463209, 0.00126549,
        0.        ],
       ...,
       [0.00105085, 0.        , 0.        , ..., 0.        , 0.        ,
        0.00946753],
       [0.00108583, 0.        , 0.        , ..., 0.        , 0.        ,
        0.01027736],
       [0.0010952 , 0.        , 0.        , ..., 0.        , 0.        ,
        0.00860977]])

In [185]:
np.divide(1,Gammaq0, out=np.zeros(Gammaq0.shape), where= Gammaq0!=0)

array([[ 963.90816481,    0.        ,    0.        , ...,  267.93450181,
         604.91577253,    0.        ],
       [1073.88142216,    0.        ,    0.        , ...,  163.57388385,
         684.58404991,    0.        ],
       [1002.33229683,    0.        ,    0.        , ...,  215.88544514,
         790.2092947 ,    0.        ],
       ...,
       [ 951.606512  ,    0.        ,    0.        , ...,    0.        ,
           0.        ,  105.62420253],
       [ 920.95825552,    0.        ,    0.        , ...,    0.        ,
           0.        ,   97.30127433],
       [ 913.07657408,    0.        ,    0.        , ...,    0.        ,
           0.        ,  116.14710251]])

## Demand derivatives and price elasticity

While the demand derivatives in the IPDL model are not quite as simple as in the logit model, they are still easy to compute. 
Let $q=P(u|\lambda)$, then
$$
\nabla_u P(u|\lambda)=\left(\nabla^2_{qq}\Omega(q|\lambda)\right)^{-1}-qq'
$$
where $(\cdot)^{-1}$ denotes the matrix inverse. The derivatives with respect to any $x_{ik\ell}$ can now easily be computed by the chain rule,
$$
    \frac{\partial P_j(u_i|\lambda)}{\partial x_{ik\ell}}=\frac{\partial P_j(u_i|\lambda)}{\partial u_{ik}}\frac{\partial u_{ik}}{\partial x_{ik\ell}}=\frac{\partial P_j(u_i|\lambda)}{\partial u_{ik}}\beta_\ell,
$$

Finally, moving to price elasticities works as in the logit model: if $x_{ik\ell}$ is the log price of product $k$ for individual $i$, then
$$
    \mathcal{E}_{jk}= \frac{\partial P_j(u_i|\lambda)}{\partial x_{ik\ell}}\frac{1}{P_j(u_i|\lambda)}=\frac{\partial P_j(u_i|\lambda)}{\partial u_{ik}}\frac{1}{P_j(u_i|\lambda)}\beta_\ell=\frac{\partial \ln P_j(u_i|\lambda)}{\partial u_{ik}}\beta_\ell$$
The matrix of derivatives of the log choice probabilities can be written compactly as
$$
\nabla_u \ln P(u|\lambda)=\mathrm{diag}(P(u|\lambda))^{-1}\nabla_u P(u|\lambda).
$$
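
The derivative formula can be checked against finite differences on a small example. The sketch below is self-contained: the solver `solve_ccp`, the nesting structure, and the coefficient `beta_price` are illustrative assumptions rather than objects defined elsewhere in this notebook.

```python
import numpy as np

def solve_ccp(u, Gamma, n_iter=500):
    """Fixed-point iteration for the IPDL choice probabilities of one market (illustrative)."""
    q = np.exp(u - u.max())
    q /= q.sum()
    for _ in range(n_iter):
        v = u + np.log(q) - Gamma.T @ np.log(Gamma @ q)
        v -= v.max()
        q = np.exp(v) / np.exp(v).sum()
    return q

J = 4
lam = np.array([0.3, 0.2])
Psis = [np.array([[1., 1., 0., 0.], [0., 0., 1., 1.]]),
        np.array([[1., 0., 1., 0.], [0., 1., 0., 1.]])]
Gamma = np.vstack([(1.0 - lam.sum()) * np.eye(J)] + [l * P for l, P in zip(lam, Psis)])

u = np.array([0.5, -0.2, 0.1, 0.0])
q = solve_ccp(u, Gamma)

# Analytic derivative: inverse Hessian of Omega minus the outer product qq'.
hess = Gamma.T @ np.diag(1.0 / (Gamma @ q)) @ Gamma
grad_analytic = np.linalg.inv(hess) - np.outer(q, q)

# Central finite differences; column k holds dP/du_k.
h = 1e-6
grad_fd = np.column_stack([(solve_ccp(u + h * e, Gamma) - solve_ccp(u - h * e, Gamma)) / (2 * h)
                           for e in np.eye(J)])

# Elasticities for a log-price coefficient beta_price (hypothetical value).
beta_price = -2.0
elasticity = beta_price * grad_analytic / q[:, None]
```

Because the probabilities always sum to one, each column of the derivative matrix sums to zero. With a negative price coefficient, the own-price elasticities on the diagonal of `elasticity` are negative.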

In [186]:
def compute_pertubation_hessian(q, x, Theta, Psi, active_mat, nest_count):
    r'''
    This function calculates the Hessian of the perturbation function \Omega

    Args.
        q: a (T,J) numpy array of choice probabilities
        x: a (T,J,K) numpy array of covariates
        Theta: a (K+G,) numpy array of IPDL parameters
        Psi: a (T,C,J) numpy array of stacked nest matrices as returned by 'Create_nests'
        active_mat: a (T,J) numpy array indicating active products
        nest_count: a (G,) numpy array with the number of nests in each grouping
    
    Returns
        Hess: a (T,J,J) numpy array of second partial derivatives of the perturbation function \Omega
    '''
    assert q.ndim == 2
    assert Theta.ndim == 1
    
    T,J,K = x.shape

    Gamma = Create_Gamma(Theta[K:], Psi, nest_count)
    Active_indicator = active_mat[:,:,None]*active_mat[:,None,:]

    gamma_q = np.einsum('tcj,tj->tc', Gamma, q)
    inv_gamma_q = np.divide(1, gamma_q, out=np.zeros(gamma_q.shape), where= gamma_q!=0) # Might have numerical implications... We handle division by zeros coming from imputation of non-active products by imputing a zero whenever the divisor is zero. 
    Hess = Active_indicator * np.einsum('tcj,tc,tck->tjk', Gamma, inv_gamma_q, Gamma) # Works since einsum merely divides through by c'th element in gamma_q (E.g. diag(\Gamma q)^-1) 

    return Hess

In [187]:
def ccp_gradient(q, x, Theta, Psi, active_mat, nest_count):
    r'''
    This function calculates the gradient of the choice probabilities wrt. the utilities u

    Args.
        q: a (T,J) numpy array of choice probabilities
        x: a (T,J,K) numpy array of covariates
        Theta: a (K+G,) numpy array of IPDL parameters
        Psi: a (T,C,J) numpy array of stacked nest matrices as returned by 'Create_nests'
        active_mat: a (T,J) numpy array indicating active products
        nest_count: a (G,) numpy array with the number of nests in each grouping
    
    Returns
        Grad: a (T,J,J) numpy array of partial derivatives of the choice probabilities wrt. the utilities
    '''

    assert q.ndim == 2

    T,J,K = x.shape

    inv_omega_hess = la.pinv(compute_pertubation_hessian(q, x, Theta, Psi, active_mat, nest_count)) # (N,J,J) # For each i=1,...,N , computes the inverse of the J*J Hessian
    qqT = np.einsum('tj,tk->tjk', q, q) # (N,J,J) outerproduct
    Grad = inv_omega_hess - qqT

    return Grad

In [188]:
gradccp0 = ccp_gradient(q0_hat, x, theta0, Psi_stack, A, Nest_count)

In [211]:
pd.DataFrame(gradccp0[0,:,:])

,0,1,2,3,4,5,6,7,8,9,...,347,348,349,350,351,352,353,354,355,356
0,3.781943e-02,3.007997e-16,-1.812873e-16,5.220196e-17,-1.362745e-17,1.549284e-18,-3.236327e-17,1.082968e-17,4.965247e-17,-2.551926e-17,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2.014843e-16,-7.818688e-29,4.835970e-28,1.370093e-28,2.401517e-29,3.813243e-29,-5.631777e-29,-1.115197e-29,3.793552e-29,-9.408704e-30,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,-3.947801e-18,1.813704e-30,-1.217105e-29,-3.440989e-30,-6.140219e-31,-9.289169e-31,1.446439e-30,2.690396e-31,-9.738666e-31,2.723197e-31,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,3.546293e-17,-2.020212e-29,1.199973e-28,3.328397e-29,5.883409e-30,9.368972e-30,-1.374349e-29,-2.731887e-30,9.092608e-30,-2.127954e-30,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,-3.203531e-17,8.322390e-30,-5.602999e-29,-1.581445e-29,-2.616778e-30,-4.459988e-30,6.698170e-30,1.210625e-30,-4.544609e-30,1.087358e-30,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
352,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
353,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
354,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
355,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [190]:
def IPDL_u_grad_Log_ccp(q, x, Theta, Psi, active_mat, nest_count):
    r'''
    This function calculates the gradient of the log choice probabilities wrt. the utilities u

    Args.
        q: a (T,J) numpy array of choice probabilities
        x: a (T,J,K) numpy array of covariates
        Theta: a (K+G,) numpy array of IPDL parameters
        Psi: a (T,C,J) numpy array of stacked nest matrices as returned by 'Create_nests'
        active_mat: a (T,J) numpy array indicating active products
        nest_count: a (G,) numpy array with the number of nests in each grouping
    
    Returns
        Epsilon: a (T,J,J) numpy array of partial derivatives of the log choice probabilities wrt. the utilities
    '''

    assert q.ndim == 2
    assert x.ndim == 3
    assert Theta.ndim == 1

    N,J,K = x.shape
    Active_indicator = active_mat[:,:,None]*active_mat[:,None,:]
    ccp_grad = ccp_gradient(q, x, Theta, Psi, active_mat, nest_count)
    inv_q = np.divide(1, q, out=np.zeros(q.shape), where= q!=0)
    #inv_Q = np.einsum('tj,jk->tjk', inv_q, np.eye(J))
    Epsilon = Active_indicator*inv_q[:,:,None]*ccp_grad # Elementwise product: divides the j'th row of ccp_grad by q_j, i.e. diag(q)^{-1} ccp_grad for each market.

    return Epsilon

In [191]:
def IPDL_elasticity(q, x, Theta, Psi, active_mat, nest_count, char_number = 0):
    r''' 
    This function calculates the elasticity of choice probabilities wrt. any characteristic of the products

    Args.
        q: a (T,J) numpy array of choice probabilities
        x: a (T,J,K) numpy array of covariates
        Theta: a (K+G,) numpy array of IPDL parameters
        Psi: a (T,C,J) numpy array of stacked nest matrices as returned by 'Create_nests'
        active_mat: a (T,J) numpy array indicating active products
        nest_count: a (G,) numpy array with the number of nests in each grouping
        char_number: an integer index of the parameter in Theta wrt. which we wish to calculate the elasticity 

    Returns
        a (T,J,J) array of choice probability elasticities
    '''
    return IPDL_u_grad_Log_ccp(q, x, Theta, Psi, active_mat, nest_count)*Theta[char_number]
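
Scaling $\nabla_u \ln p$ by a single coefficient gives the derivative of $\ln p_j$ with respect to characteristic `char_number` of product $k$. The sketch below checks this by finite differences in the plain logit special case, where $\partial \ln p_j/\partial x_{kc} = \beta_c(1\{j=k\}-p_k)$. The helper `logit_ccp` and all numbers are hypothetical stand-ins, not the notebook's IPDL routines.

```python
import numpy as np

def logit_ccp(x, beta):
    """Plain logit choice probabilities for one market; x is (J, K), beta is (K,)."""
    u = x @ beta
    e = np.exp(u - u.max())          # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
J, K = 4, 2
x = rng.normal(size=(J, K))          # hypothetical characteristics
beta = np.array([-1.0, 0.5])         # hypothetical coefficients
c = 0                                # characteristic index, plays the role of char_number

p = logit_ccp(x, beta)
# Analytic derivative: entry (j, k) is beta_c * (1{j=k} - p_k).
analytic = beta[c] * (np.eye(J) - p[None, :])

# Central finite differences of ln p_j wrt. characteristic c of product k.
h = 1e-6
numeric = np.empty((J, J))
for k in range(J):
    x_up, x_dn = x.copy(), x.copy()
    x_up[k, c] += h
    x_dn[k, c] -= h
    numeric[:, k] = (np.log(logit_ccp(x_up, beta)) - np.log(logit_ccp(x_dn, beta))) / (2 * h)

print(np.abs(analytic - numeric).max())
```

The same kind of check could be run against `IPDL_elasticity` by perturbing one column of `x` and recomputing the choice probabilities.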

In [192]:
theta0

array([0.1       , 0.1       , 0.1       , 0.1       , 0.1       ,
       0.1       , 0.1       , 0.1       , 0.1       , 0.1       ,
       0.1       , 0.1       , 0.1       , 0.1       , 0.1       ,
       0.1       , 0.1       , 0.1       , 0.1       , 0.1       ,
       0.04761905, 0.04761905, 0.04761905, 0.04761905, 0.04761905,
       0.04761905, 0.04761905, 0.04761905, 0.04761905, 0.04761905,
       0.04761905, 0.04761905, 0.04761905, 0.04761905, 0.04761905,
       0.04761905, 0.04761905, 0.04761905, 0.04761905, 0.04761905])

Using the guess parameters $\hat \theta^0$, we calculate the price-to-log-income elasticities for individual $i=0$. 

In [216]:
epsilon0 = IPDL_elasticity(q0_hat, x, theta0, Psi_stack, A, Nest_count)
pd.DataFrame(epsilon0[0,:,:])

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,347,348,349,350,351,352,353,354,355,356
0,0.173593,1.380682e-15,-8.321156e-16,2.396090e-16,-6.255052e-17,7.111274e-18,-1.485487e-16,4.970867e-17,2.279068e-16,-1.171344e-16,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.000000,-0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-0.000000e+00,-0.000000e+00,0.000000e+00,-0.000000e+00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,-0.000000,0.000000e+00,-0.000000e+00,-0.000000e+00,-0.000000e+00,-0.000000e+00,0.000000e+00,0.000000e+00,-0.000000e+00,0.000000e+00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.000000,-0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-0.000000e+00,-0.000000e+00,0.000000e+00,-0.000000e+00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,-0.000000,0.000000e+00,-0.000000e+00,-0.000000e+00,-0.000000e+00,-0.000000e+00,0.000000e+00,0.000000e+00,-0.000000e+00,0.000000e+00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
352,0.000000,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
353,0.000000,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
354,0.000000,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
355,0.000000,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Maximum likelihood estimation of IPDL

The log-likelihood contribution is
$$
\ell_i(\theta)=y_i'\ln p(\mathbf{X}_i,\theta),
$$
and an estimation routine must therefore have a function that, given $\mathbf{X}_i$ and $\theta$, calculates $u_i=\mathbf{X}_i\beta$, constructs $\Gamma$, and then calls the fixed point routine described above. That routine returns $p(\mathbf{X}_i,\theta)$, from which we can evaluate $\ell_i(\theta)$. Using the functions defined above, we now construct precisely such an estimation procedure.
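
The structure of such a routine can be sketched on a toy problem. The example below is an illustration under stated assumptions: it replaces the IPDL fixed point with plain logit probabilities (no $\Gamma$, no nesting parameters), uses simulated data, and all function names are hypothetical.

```python
import numpy as np
from scipy import optimize

def logit_ccp(beta, x):
    """Plain logit probabilities; x is (T, J, K), beta is (K,). Stand-in for the IPDL fixed point."""
    u = x @ beta                                   # (T, J) utilities u = X beta
    u = u - u.max(axis=1, keepdims=True)           # numerical stability
    e = np.exp(u)
    return e / e.sum(axis=1, keepdims=True)

def loglik(beta, y, x):
    """Per-market contributions l_t = y_t' ln p_t, an array of shape (T,)."""
    return np.einsum('tj,tj->t', y, np.log(logit_ccp(beta, x)))

def neg_mean_loglik(beta, y, x):
    """Objective handed to the minimizer."""
    return -np.mean(loglik(beta, y, x))

# Simulated data: shares generated exactly from the model at beta_true.
rng = np.random.default_rng(1)
T, J, K = 50, 5, 2
x = rng.normal(size=(T, J, K))
beta_true = np.array([-1.0, 0.5])
y = logit_ccp(beta_true, x)

res = optimize.minimize(neg_mean_loglik, x0=np.zeros(K), args=(y, x), method='BFGS')
print(res.x)
```

Because the simulated shares are generated without noise, the maximizer should coincide with `beta_true` up to optimizer tolerance, which makes this a convenient sanity check before moving to the full model.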

To maximize the likelihood, we want the derivatives at some $\theta=(\beta',\lambda')'$. Let $q_i=p(\mathbf{X}_i,\theta)$; then we have
$$
\nabla_\theta \ln p(\mathbf{X}_i,\theta)=\mathrm{diag}(q_i)^{-1}\left(\nabla_{qq}^2\Omega(q_i|\lambda)^{-1}-q_iq_i' \right)\left[\mathbf{X}_i,-\nabla_{q,\lambda}^2 \Omega(q_i|\lambda)\right]
$$
Note that the product of the first two factors is the elasticity $\nabla_u \ln P(u|\lambda)$, and the last factor is a block matrix of size $J\times \dim(\theta)$. The derivative of the log-likelihood function can be obtained from this as
$$
\nabla_\theta \ell_i(\theta)=\nabla_\theta \ln p(\mathbf{X}_i,\theta)' y_i
$$
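
As a sanity check on these formulas, consider the case where all $\lambda_g = 0$. Assuming $\Omega$ then reduces to $q'\ln q$, its Hessian is $\mathrm{diag}(q)^{-1}$, the $\lambda$ block drops out, and the score collapses to the familiar logit expression $\mathbf{X}_i'(y_i-q_i)$ whenever $y_i$ sums to one. The sketch below verifies this numerically on hypothetical data; it is not the notebook's IPDL gradient.

```python
import numpy as np

def logit_ccp(beta, X):
    """Plain logit probabilities for one market; X is (J, K)."""
    u = X @ beta
    e = np.exp(u - u.max())
    return e / e.sum()

rng = np.random.default_rng(2)
J, K = 5, 3
X = rng.normal(size=(J, K))           # hypothetical covariates
beta = rng.normal(size=K)
y = rng.dirichlet(np.ones(J))         # hypothetical observed shares, summing to one

q = logit_ccp(beta, X)

# General formula with the logit inverse Hessian diag(q) and no lambda block.
hess_inv = np.diag(q)
grad_log_p = np.diag(1.0 / q) @ (hess_inv - np.outer(q, q)) @ X    # (J, K)
score = grad_log_p.T @ y                                           # (K,)

# Closed-form logit score.
score_closed = X.T @ (y - q)

# Central finite differences of l(beta) = y' ln p(X, beta).
h = 1e-6
score_fd = np.array([
    (y @ np.log(logit_ccp(beta + h * e, X)) - y @ np.log(logit_ccp(beta - h * e, X))) / (2 * h)
    for e in np.eye(K)
])

print(score, score_closed, score_fd)
```

The three vectors should agree up to finite-difference error.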

In [None]:
def IPDL_loglikelihood(Theta, y, x, psi_stack, active_mat, nest_count):
    r''' 
    This function computes the loglikelihood contribution for each individual i.
    
    Args.
        Theta: a numpy array (K+G,) of parameters of (\beta', \lambda')',
        y: a numpy array (N,J) of observed choices in onehot encoding,
        x: a numpy matrix (N,J,K) of covariates,
        psi_stack: a dictionary of the matrices \psi^g as columns as outputted by 'Create_incidence_matrix'
        active_mat: a numpy array (N,J) indicating which alternatives are available
        nest_count: nest counts, passed on to 'Create_Gamma'

    Output
        ll: a numpy array (N,) of IPDL loglikelihood contributions
    '''

    N,J,K = x.shape

    gamma = Create_Gamma(Theta[K:], psi_stack, nest_count) # The last G parameters of theta are the nesting parameters \lambda_g
    ccp_hat = IPDL_ccp(Theta[:K], x, gamma, active_mat) # The first K parameters of theta are those of \beta

    # \ell_i = y_i' ln p_i. Take logs only where the probability is positive, so that unavailable
    # alternatives contribute 0 rather than nan (this assumes y is zero wherever ccp_hat is zero).
    log_ccp = np.log(ccp_hat, out=np.zeros_like(ccp_hat), where=ccp_hat > 0)
    ll = np.einsum('tj,tj->t', y, log_ccp) # Sum over alternatives j for each individual. Is an (N,) array

    return ll
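
Unavailable alternatives have zero choice probability, so a naive `np.log` produces `-inf` and the product `0 * -inf` turns the whole contribution into `nan`. The sketch below, on hypothetical shares, contrasts the naive evaluation with a guarded logarithm that leaves zero-probability entries at zero; this is one possible alternative to adding a small constant inside the log.

```python
import numpy as np

# Hypothetical toy data: two markets, three products, one unavailable product per market.
y = np.array([[0.7, 0.3, 0.0],
              [0.5, 0.0, 0.5]])       # observed shares
q = np.array([[0.6, 0.4, 0.0],
              [0.5, 0.0, 0.5]])       # model choice probabilities

# Naive evaluation: log(0) = -inf and 0 * -inf = nan.
with np.errstate(divide='ignore', invalid='ignore'):
    ll_naive = np.einsum('tj,tj->t', y, np.log(q))

# Guarded logarithm: only evaluate the log where q > 0, leave the rest at 0.
log_q = np.log(q, out=np.zeros_like(q), where=q > 0)
ll = np.einsum('tj,tj->t', y, log_q)  # one contribution per market, shape (T,)

print(ll_naive, ll)
```

Note that the output subscript `t` keeps one value per market, whereas `'tj,tj->j'` would instead sum over markets and return one value per product.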

In [229]:
epsilon0 = 1.0e-10
np.einsum('tj,tj->j', y, np.log(q0_hat + epsilon0)) - epsilon0*np.einsum('tj,t->j', y, np.ones((T,)))

array([-5.60409150e+02, -8.41949808e-02, -3.75812167e-02, -1.78383974e-02,
       -5.05103270e-03, -3.98470478e-02, -3.14959823e-02, -8.59679824e-02,
       -5.49939710e-03, -5.23021929e-04, -2.12700410e-02, -4.48311829e-01,
       -2.16972435e-01, -4.29845087e-01, -2.22580082e-01, -1.26231318e-01,
       -2.98305685e-01, -2.90984197e-01, -3.34671172e-02, -1.05236275e-01,
       -7.31674613e-02, -3.24824508e-02, -2.02865985e-01, -1.00429418e-01,
       -7.72329669e-02, -1.39688023e-01, -7.82234211e-02, -8.20218088e-03,
       -1.76633288e-02, -1.42700492e-03, -1.97417888e-03, -1.65183164e-01,
       -3.36900845e-01, -5.56133108e-01, -1.27764231e-01, -4.47449320e-02,
       -7.59956295e-02, -3.38279936e-02, -7.95835521e-03, -1.80427857e-01,
       -3.63448663e-01, -4.88764008e-02, -1.27080558e-01, -3.86846778e-01,
       -4.30332034e-04, -4.21568045e-02, -3.50370423e-01, -7.11447280e-01,
       -9.70295142e-01, -9.83250723e-02, -2.13760336e-01, -7.22458475e-02,
       -9.45383752e-02, -