# Modeling Demand for Cars with the IPDL model

In this notebook, we will introduce and estimate the Inverse Product Differentiation Logit (IPDL) model of Fosgerau et al. (2023) using publically available data on the European car market from Frank Verboven's website at https://sites.google.com/site/frankverbo/data-and-software/data-set-on-the-european-car-market. We begin by introducing the data set. 


Data
====

The dataset consists of approximately 110 vehicle makes per year in the period 1970-1999 in five European markets (Belgium, France, Germany, Italy, and the United Kingdom). The data set includes 47 variables in total. The first four columns are market and product codes for the year, country, and make as well as quantity sold (No. of new registrations) which will be used in computing observed market shares. The remaining variables consist of car characteristics such as prices, horse power, weight and other physical car characteristics as well as macroeconomic variables such as GDP per capita which have been used to construct estimates of the average wage income and purchasing power.

We have in total 30 years and 5 countries, totalling $T=150$ year-country combinations, indexed by $t$, and we refer to each simply as market $t$. In market $t$, the choice set is $\mathcal{J}_t$ which includes the set of available makes as well as an outside option. Let $\mathcal{J} := \bigcup_{t=1}^T \mathcal{J}_t$ be the full choice set and 
 $J:=\#\mathcal{J}$ the number of choices which were available in at least one market, for this data set there are $J=357$ choices.
 


Reading in the dataset `eurocars.csv` we thus have a dataframe of $\sum_{t=1}^T \#\mathcal{J}_t = 11459$ rows and $47$ columns. The `ye` column runs through $y=70,\ldots,99$, the `ma` column runs through $m=1,\ldots,M$, and the ``co`` column takes values $j\in \mathcal{J}$. 

Because we consider a country-year pair as the level of observation, we construct a `market` column taking values $t=1,\ldots,T$. In Python, this variable will take values $t=0,\ldots,T-1$. We construct an outside option $j=0$ in each market $t$ by letting the 'sales' of $j=0$ be determined as 

$$\mathrm{sales}_{0t} = \mathrm{pop}_t - \sum_{j=1}^J \mathrm{sales}_{jt}$$

where $\mathrm{pop}_t$ is the total population in market $t$, and the car characteristics of the outside option is set to zero. The market shares of each product in market $t$ can then be found as
$$
\textrm{market share}_{jt}=\frac{\mathrm{sales_{jt}}}{\mathrm{pop}_t}.
$$
We also read in the variable description of the dataset contained in `eurocars.dta`. We will use the list `x_vars` throughout to work with our explanatory variables. 

In [834]:
import numpy as np
import pandas as pd 
import os
from numpy import linalg as la
from scipy import optimize
import scipy.stats as scstat
from matplotlib import pyplot as plt
import itertools as iter

# Files
import Logit_file as logit

In [835]:
# Load dataset and variable names
# os.chdir('../GREENCAR_notebooks/') # Assigns work directory

input_path = os.getcwd() # Assigns input path as current working directory (cwd)
descr = (pd.read_stata('eurocars.dta', iterator = True)).variable_labels() # Obtain variable descriptions
dat = pd.read_csv(os.path.join(input_path, 'eurocars.csv')) # reads in the data set as a pandas dataframe.

In [836]:
pd.DataFrame(descr, index=['description']).transpose().reset_index().rename(columns={'index' : 'variable names'}) # Prints data sets

Unnamed: 0,variable names,description
0,ye,year (=first dimension of panel)
1,ma,market (=second dimension of panel)
2,co,model code (=third dimension of panel)
3,zcode,alternative model code (predecessors and succe...
4,brd,brand code
5,type,name of brand and model
6,brand,name of brand
7,model,name of model
8,org,"origin code (demand side, country with which c..."
9,loc,"location code (production side, country where ..."


In [837]:
# Choose which variables to include in the analysis, and assign them either as discrete variables or continuous.

x_discretevars = [ 'brand', 'home']
x_contvars = ['cy', 'hp', 'we', 'le', 'wi', 'he', 'li', 'sp', 'ac', 'pr']
# x_ivvars = ...
x_allvars =  [*x_contvars, *x_discretevars]

# Outside option is included if OO == True, otherwise analysis is done on the inside options only.
OO = False

# Print list of chosen variables as a dataframe
print(pd.DataFrame(descr, index=['description'])[x_allvars].transpose().reset_index().rename(columns={'index' : 'variable names'}))

   variable names                                        description
0              cy            cylinder volume or displacement (in cc)
1              hp                                 horsepower (in kW)
2              we                                     weight (in kg)
3              le                                     length (in cm)
4              wi                                      width (in cm)
5              he                                     height (in cm)
6              li          average of li1, li2, li3 (used in papers)
7              sp                            maximum speed (km/hour)
8              ac  time to acceleration (in seconds from 0 to 100...
9              pr   price (in destination currency including V.A.T.)
10          brand                                      name of brand
11           home  domestic car dummy (appropriate interaction of...


We now clean the data to fit our setup

In [838]:
# Create the 'market' column of market index t

dat = dat.sort_values(by = ['ye', 'ma'], ascending = True) # Sorts data set by year and market
Used_cols = [*dat.keys()[:28], 'pr', 'princ', 'pop', 'xexr']  
dat = dat[Used_cols] # Leaves out unused macro variables
market_vals = [*iter.product(dat['ye'].unique(), dat['ma'].unique())] # creates a list of ma-ye combinations
market_vals = pd.DataFrame({'ye' : [val[0] for val in market_vals], 'ma' : [val[1] for val in market_vals]}) 
market_vals = market_vals.reset_index().rename(columns={'index' : 'market'}) # Creates market index
dat = dat.merge(market_vals, left_on=['ye', 'ma'], right_on=['ye', 'ma'], how='left') # Merges market index variable onto dat
dat_org = dat # Save the original data with the 'market'-column added as 'dat_org'.

# Create an inside/outside-option column if the outside option is included

if OO:
    dat['in_out'] = 1
else:
    None

# Drop rows which contain NaN values in any explanatory variable or in the response variable.

dat = dat.dropna()

# Convert discrete explanatory variables to integer valued variables and make sure continuous variables are floats.

obj_columns = dat.select_dtypes(['object'])
for col in obj_columns:
    if col in [*x_contvars, 'xexr']:
        dat[col] = dat[col].str.replace(',', '.').astype('float64')
    else:
        dat[col] = dat[col].astype('category').cat.rename_categories(np.arange(1, dat[col].nunique() + 1)).astype('int64')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dat[col] = dat[col].astype('category').cat.rename_categories(np.arange(1, dat[col].nunique() + 1)).astype('int64')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dat[col] = dat[col].astype('category').cat.rename_categories(np.arange(1, dat[col].nunique() + 1)).astype('int64')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returnin

In [839]:
# Re-encode discrete variables such that only the outside option takes the value 0

x_0vars = [var for var in x_discretevars if len(dat[(dat['co'] != 0)&(dat[var].isin([0]))]) > 0] # Picks out discrete variables where at least one car has category 0

for col in x_0vars:
    dat[col] = dat[col].astype('category').cat.rename_categories(np.arange(1, dat[col].nunique() + 1)).astype('int64') # re-assigns category zero as category 1, and moves other categories up by one

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dat[col] = dat[col].astype('category').cat.rename_categories(np.arange(1, dat[col].nunique() + 1)).astype('int64') # re-assigns category zero as category 1, and moves other categories up by one


In [840]:
# Construct outside option for each market t
if OO:
    outside_shares = dat.groupby('market', as_index=False)['qu'].sum() # sum of sales in each market
    outside_shares = outside_shares.merge(dat[['market', 'pop']], on = 'market', how='left').dropna().drop_duplicates(subset = 'market', keep = 'first')  # Adds population to dataframe
    outside_shares['qu'] = outside_shares['pop'] - outside_shares['qu'] # Assigns quantity for outside option as pop minus sum of sales
    keys_add = [key for key in dat.keys() if (key!='market')&(key!='qu')&(key!='pop')] 
    for key in keys_add:
        outside_shares[key] = 0 # Sets all variables other than market, qu and pop to zero for the outside option

    dat = pd.concat([dat, outside_shares]) # Add outside option to data set

In [841]:
# Compute market shares for each product j in each market t 

dat['ms'] = dat.groupby('market')['qu'].transform(lambda x: x/x.sum())

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dat['ms'] = dat.groupby('market')['qu'].transform(lambda x: x/x.sum())


In [842]:
T = dat['market'].nunique() # Assigns the total number of markets T
J = np.array([dat[dat['market'] == t]['co'].nunique() for t in np.arange(T)]) # Array of number of choices in market t


# Number of observations 
if OO:
    N = np.array([dat[dat['market'] == t]['pop'].unique().sum() for t in np.arange(T)]).sum() # If outside option is included, number of observations in market t is the total population
else:
    N = np.array([dat[dat['market'] == t]['qu'].sum() for t in np.arange(T)]).sum() # If outside option is not included, number of observations in market t is the total number of sales


# Get each market's share of total population N
pop_share = np.empty((T,))
for t in np.arange(T):
    pop_share[t] = dat[dat['market'] == t]['qu'].sum() / N

In [843]:
pop_share.sum()

1.0000000000000002

In [844]:
dat[x_contvars] = dat[x_contvars] / dat[x_contvars].abs().max() # Rescale continuous variables so that they lie in the interval [-1,1]. This is done for numerical stability.

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dat[x_contvars] = dat[x_contvars] / dat[x_contvars].abs().max() # Rescale continuous variables so that they lie in the interval [-1,1]. This is done for numerical stability.


In [845]:
# Construct dummies of discrete variables. For each variable, one of the columns is left out due to colinearity

dat_disc = pd.get_dummies(dat[x_discretevars], prefix = x_discretevars, columns=x_discretevars, drop_first=True)  

# If outside option is included, then each variable results in a column which is 1 for the outside option, and zero for all other options. These columns are identical to the 'in_out' variable column,
# so a second column must be dropped for each variable.
if OO:
    dat_disc = dat_disc[[var for var in dat_disc.keys() if not var.endswith('1')]] # Drops a second column from discrete columns if outside option is included

dat = pd.concat([dat, dat_disc], axis = 1)

if OO:
    x_vars = ['in_out', *x_contvars, *dat_disc.keys() ]
else:
    x_vars = [*x_contvars, *dat_disc.keys() ]

K = len(x_vars)

In [846]:
# Create dictionaries of numpy arrays for each market. This allows the size of the data set to vary over markets.

dat = dat.reset_index(drop = True).sort_values(by = ['market', 'co']) # Sort data so that reshape is successfull

x = {t: dat[dat['market'] == t][x_vars].values.reshape((J[t],K)) for t in np.arange(T)} # Dict of explanatory variables
y = {t: dat[dat['market'] == t]['ms'].to_numpy().reshape((J[t])) for t in np.arange(T)} # Dict of market shares

In [847]:
# This function tests whether the utility parameters are identified, by looking at the rank of the stacked matrix of explanatory variables.

def rank_test(x):
    x_stacked = np.concatenate([x[t] for t in np.arange(T)], axis = 0)
    eigs=la.eig(x_stacked.T@x_stacked)[0]

    if np.min(eigs)<1.0e-8:
        print('x does not have full rank')
    else:
        print('x has full rank')

rank_test(x)

x has full rank


## Perturbed utility, logit and nested logit

In the following, a vector $z\in \mathbb R^d$ is always a column vector. The IPDL model is a discrete choice model, where the probability vector over the alternatives is given by the solution to a utility maximization problem of the form
$$
p=\arg\max_{q\in \Delta} q'u-\Omega(q)
$$
where $\Delta$ is the probability simplex over the set of discrete choices, $u$ is a vector of payoffs for each option, $\Omega$ is a convex function and $q'$ denotes the transpose of $q$. All additive random utility models can be represented in this way (Fosgerau and Sørensen (2021)). For example, the logit choice probabilities result from the perturbation function $\Omega(q)=q'\ln q$ where $\ln q$ is the elementwise logarithm.

In the nested logit model, the choice set is divided into a partition $\mathcal C=\left\{C_1,\ldots,C_L\right\}$, and the perturbation function is given by
$$
\Omega(q|\lambda)=(1-\lambda)q'\ln q+\lambda \sum_{\ell =1}^L \left( \sum_{j\in C_\ell}q_j\right)\ln \left( \sum_{j\in C}q_j\right),
$$
where $\lambda\in [0,1)$ is a parameter. This function can be written equivalently as
$$
\Omega(q|\lambda)=(1-\lambda)q'\ln q+\lambda \left(\psi q\right)'\ln \left( \psi q\right),
$$
where $\psi$ is a $J \times L$ matrix, where $\psi_{j\ell}=1$ if option $j$ belongs to nest $C_\ell$ and zero otherwise.
 This specification generates nested logit choice probabilities.


## The IPDL model

In the IPDL model, we allow for multiple nesting structures. For each $g=1,\ldots, G$, let $\mathcal C_g$ and $\psi^g$ be constructed as described for the nested logit, and let $L_g$ be the number of nests in group $g$. The IPDL perturbation function is then
$$
\Omega(q|\lambda)=(1-\sum_g \lambda_g) q'\ln q +\sum_g \lambda_g \left(\psi^g q \right)'\ln \left( \psi^g q\right),
$$
where $\lambda=(\lambda_1,\ldots,\lambda_G)$ is a parameter vector satisfying $\lambda_g \geq 0$ and $\sum_g \lambda_g<1$. In this model, each option belongs to $G\geq 1$ nests. When $G=1$, it simplifies to the nested logit model, and when $\sum_g \lambda_g=0$, it simplifies to the logit model. The IPDL model therefore allows more flexibility than a single nested logit model in the types of substitution patterns it can represent, without having to specify a hierarchical structure over the nests.

In this note, the nesting is done according to a subset of the explanatory variables. For categorical variables, each category is a nest. For continuous variables, the data set is partitioned according to the deciles of the variable, resulting in `at most` 10 nests of roughly equal size, as well as a nest for the outside option. This construction implies that $\Omega$ is a function of the data.

## Similarity and negative values of $\lambda$

Bla bla bla


In [848]:
if OO:
    nest_vars = [var for var in ['in_out', *x_allvars] if (var != 'pr')] # We nest over all variables other than price, but an alternative list can be specified here if desired.
else:
    nest_vars = [var for var in x_allvars if (var != 'pr')] # See above

nest_cont_vars = ['cy', 'hp', 'we', 'le', 'wi', 'he', 'li', 'sp', 'ac'] # The list of continuous variables, from which nests will be created according to the deciles of the distribution.

G = len(nest_vars)

## The utility function

Let $x_{tj}$ be the vector of product characteristics for option $j$ in market $t$, and let $X_t$ denote the $J_t\times K $ matrix with elements $x_{tjk}$. The payoff of option $j$ is a linear function of the characteristics $x_{tj}$ of product $j$, which means that the vector of utilities may be written
$$
u(X_t,\beta)=X_t\beta.
$$

Letting $\theta=(\beta',\lambda')'$ denote the full parameter vector of length $D=K+G$, the choice probabilities in market $t$ may be written as
$$
p_t(\theta)=\arg \max_{q\in \Delta_{J_t}} \left\{q'X_t \beta-(1-\sum_g \lambda_g)q'\ln q +\sum_{g=1}^G\lambda_g \left(\psi^{gt} q \right)'\ln \left(\psi^{gt} q\right)\right\}
$$



# Max-rescaling for numerical stability

Let $\alpha$ be a scalar, and let $\iota$ be the all-ones vector in $\mathbb R^J$. Note that $q'(u+\alpha\iota)=q'u+(q'\iota)\alpha=q'u+\alpha$, since $q$ sums to one. For this reason, $\alpha$ does not enter into the utility maximization when calculating $P(u+\alpha\iota|\lambda)$, and we have $P(u+\alpha\iota|\lambda)=P(u|\lambda)$.

This allows us to re-scale the utilities just as in the logit model, since $P(u-(\max_{j}u_j)\iota|\lambda)=P(u|\lambda)$. The numerical benefits of this approach carry over to the IPDL model.

## Gradient and Hessian

The gradient of $\Omega$ with respect to the choice probabilities is

$$
\nabla_q \Omega_t(q|\lambda)=(1-\sum_g \lambda_g)\ln q+ \sum_g \lambda_g(\psi^{gt})'\ln \left( \psi^{gt}q\right)+\iota=\ln q-Z_t(q)\lambda+\iota
$$
where $\iota$ is the all-ones vector and
$$Z_{tg}(q)=\ln q - (\psi^{tg})' \ln (\psi^{tg}q)$$

The Hessian of $\Omega$ is
$$
\nabla_{qq}^2 \Omega_t(q|\lambda)=(1-\sum_g \lambda_g) \mathrm{diag}(q)^{-1}+\sum_g\lambda_g (\psi^{gt})'\mathrm{diag}(\psi^{gt}q)^{-1}\psi^{gt}
$$

Using $\Gamma$, we can show that
$$
\Omega(q|\lambda)=(\Gamma q)'\ln (\Gamma q)+c\\
\nabla_q \Omega(q|\lambda)=\Gamma'\ln (\Gamma q)+\iota\\
\nabla^2_{qq}\Omega(q|\lambda)=\Gamma'\mathrm{diag}(\Gamma q)^{-1}\Gamma,
$$
where $c$ is a scalar that depends on $\lambda$ but not on $q$ and therefore does not affect the utility maximization problem, $\iota=(1,\ldots,1)'\in \mathbb R^J$ is the all-ones vector and $\mathrm{diag}(z)$ is a diagonal matrix with the elements of the vector $z$ on the diagonal.














For purposes of computing the gradient and Hessian of $\Omega$, it is convenient to define
$$
\Gamma=\left(\begin{array}{c}
(1-\sum_g \lambda_g)I_J\\
\lambda_1 \Psi^1\\
\vdots\\
\lambda_G \Psi^G
\end{array}\right)
$$
where $I_J$ is the identity matrix in $\mathbb R^J$. The matrix $\Gamma$ is a block matrix with $J+\sum_g C_g$ rows and $J$ columns. Note that 

$$
\Gamma q=\left(\begin{array}{c}
(1-\sum_g\lambda_g)q \\
\lambda_1\Psi^g q\\
\vdots \\
\lambda_G \Psi^Gq
\end{array}\right)>0
$$
if $q>0$.

In [849]:
def Create_nests(data, markets_id, products_id, columns, cont_var = None, cont_var_bins = None, outside_option = True):
    '''
    This function creates the nest matrices \Psi^{gt}, and stack them over g for each t.

    Args.
        data: a pandas DataFrame
        markets_id: a string denoting the column of 'data' containing an enumeration t=0,1,...,T-1 of markets
        products_id: a string denoting the column of 'data' containing product codes which uniquely identifies products
        columns: a list containing the column names of columns in 'data' from which nest groupings g=0,1,...,G-1 for each market t are to be generated
        cont_var: a list of the continuous variables in 'columns'
        cont_var_bins: a list containing the number of bins to make for each continuous variable in 'columns'
        outside_option: a boolean indicating whether the model is estimated with or without an outside option. Default is set to 'True' i.e. with an outside option.

    Returns
        Psi: a dictionary of length T of the J[t] by J[t] identity stacked on top of the Psi_g matrices for each market t and each gropuing g
        nest_dict: a dictionary of length T of pandas series describing the structure of each nest for each market t and each grouping g
        nest_count: a dictionary of length T of (G,) numpy arrays containing the amount of nests in each category g
    '''

    T = data[markets_id].nunique()
    J = np.array([data[data[markets_id] == t][products_id].nunique() for t in np.arange(T)])
    
    # We include nest on outside vs. inside options. The amount of categories varies if the outside option is included in the analysis.
    dat = data.sort_values(by = [markets_id, products_id]) # We sort the data in ascending, first according to market and then according to the product id
    
    Psi = {}
    nest_dict = {}
    nest_counts = {}

    # Assign nests for products in each market t
    for t in np.arange(T):
        data_t = dat[dat[markets_id] == t] # Subset data on market t


        ### Bin continuous variables according to quantiles of the variable

        if cont_var == None:
            None
        else:
            for var,n_bins in zip(cont_var,cont_var_bins):
                if outside_option:
                    q_dat = np.unique(np.quantile(data_t[var].rank(method = 'min'), q = np.arange(1,n_bins + 1) / n_bins)) # Get the unique 'n_bins' equally spaced quantiles of each continuous variable given in the cont_var list
                    data_t[var] = pd.cut(data_t[var].rank(method = 'min'), bins = [0.99,1, *q_dat], labels=False) # Quantiles are equally spaced with 'n_bins' quantiles for the variable. The outside option gets its own bin (0.99,1].
                else:
                    q_dat = np.unique(np.quantile(data_t[var].rank(method = 'min'), q = np.arange(1,n_bins + 1) / n_bins)) # Get the unique 'n_bins' equally spaced quantiles of each continuous variable given in the cont_var list
                    data_t[var] = pd.cut(data_t[var].rank(method = 'min'), bins = q_dat, labels=False) # Bin the variable according to 'n_bins' equally spaced quantiles.

        nest_dict[t] = data_t[columns].apply(lambda col: list(np.unique(col))) # Get the unique values of each 'col' in columns
        nest_counts[t] = data_t[columns].nunique().values # Find the number of unique values in each column in columns and output as a numpy array

        nest_count_total = data_t[columns].nunique().sum() # Find the sum of nest counts L_g
        nests = pd.get_dummies(data_t[columns], columns = columns).values.reshape((J[t], nest_count_total)).transpose() # Finds dummies for each category in columns, and converts these to numpy arrays of the appropiate size. Note that the data has been sorted according to market and then product.
        Psi_t = np.concatenate([np.eye(J[t]), nests], axis = 0) # Stack a J[t] by J[t] identity on top of the stacked \Psi^g matrices for each market t

        Psi[t] = Psi_t

    return Psi, nest_dict, nest_counts

In [850]:
cont_bins=[np.int64(10) for i in range(len(nest_cont_vars))] # Sets the number of bins to 10 for each continuous variable.
Psi, Nest_descr, Nest_count = Create_nests(dat, 'market', 'co', nest_vars, nest_cont_vars,cont_bins , outside_option=OO)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data_t[var] = pd.cut(data_t[var].rank(method = 'min'), bins = q_dat, labels=False) # Bin the variable according to 'n_bins' equally spaced quantiles.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data_t[var] = pd.cut(data_t[var].rank(method = 'min'), bins = q_dat, labels=False) # Bin the variable according to 'n_bins' equally spaced quantiles.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data_t[var] = pd.cut(data_t[var].rank(method = 'min'), bins = q_dat, labels=False) # Bin the variable according to 'n_bins' equally spaced quantiles.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data_t[var] = pd.cut(data_t[var].rank(method = 'min'), bins = q_dat, labels=False) # Bin the variable according to 'n_bins' equally spaced quantiles.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://

In [851]:
def Create_Gamma(Lambda, Psi, nest_count):
    '''
    This function 
    '''

    T = len(Psi)
    
    Gamma = {}
    lambda0 = np.array([1 - sum(Lambda)])
    Lambda_full = np.concatenate((lambda0, Lambda)) # create vector (1- sum(lambda), lambda_1, ..., lambda_G)

    for t in np.arange(T):
        C,J = Psi[t].shape # The amount of alternatives in market t
        Lambda_long = np.empty((C,)) # Initialize a row vector with as many rows as psi_stack
        indices = np.concatenate((np.array([J]) , nest_count[t])).cumsum().astype('int64') # Get the indices of where the identity and the nests in psi_stack are located along the rows of psi_stack.

        for i in np.arange(len(indices)):
            if i == 0:
                Lambda_long[0:(indices[i])] = Lambda_full[i] # Assign 1-sum(lambda) to the first J coordinates of Lambda_long
            else:
                Lambda_long[indices[i-1]:indices[i]] = Lambda_full[i] # Assign lambda_g to the coordinates of Lambda_long corresponding to the rows of psi_stack equal to the block matrix \psi^g 
    
        Gamma[t] =  np.einsum('c,cj->cj', Lambda_long, Psi[t]) # Compute hadamard product of lambda parameters and psi_stack

    return Gamma

In [852]:
lambda0 = np.ones((G,))/(2*(G+1))
Gamma0 = Create_Gamma(lambda0, Psi, Nest_count)

theta0=np.ones((K+G,))/(K+G)

## Model solution

Suppose we are evaluating the choice probability function $p_t(\theta)$ at some parameter vector $\theta$. While it is possible to solve for the choice probabilities explicitly by numerical maximization, Fosgerau and Nielsen (2021) suggest a contraction mapping approach which is conceptually simpler. Let $u_t=X_t\beta$ and let $q_t^0$ be an initial guess of the choice probabilities, e.g. $q_t^0\propto \exp(X_t\beta)$. Define further
$$
a=\sum_{g:\lambda_g\geq 0} \lambda_g   \qquad b=\sum_{g:\lambda_g<0} |\lambda_g|.
$$

The choice probabilities are then updated iteratively as
$$
q_t^{r} = \frac{e^{v_t^{r}}}{\sum_{j\in \mathcal J_t} e^{v_{tj}^{r}}},
$$
where
$$
v_t^{r} =\ln q_t^{r-1}+\left(u_t-\nabla_q \Omega_t(q^{r-1}_t|\lambda)\right)/(1+b).
$$
Using the definition of $Z_{gt}$ above, this becomes
$$
v^r_t=\ln q_t^{r-1}+\left(u_t+Z_{t}(q^{r-1})\lambda-\ln q_t^{r-1}  \right)/(1+b) =  \left( u_t+ b\ln q^{r-1}_t+Z_{t}(q^{r-1})\lambda\right)/(1+b)
$$


For numerical stability, it can be a good idea to also do max-rescaling of $v^r_t$ at every iteration. The Kullback-Leibler divergence $D_{KL}(p||q)=p'\ln \frac{p}{q}$ decays linearly with each iteration,
$$
D_{KL}(p_t(\theta)||q_t^{r})\leq \frac{a+b}{1+b}D_{KL}(p_t(\theta)||q^{r-1}_t).
$$
This is implemeneted in the function "IPDL_ccp" below. 

In [894]:
def IPDL_ccp(Theta, x, psi, nest_count, tol = 1.0e-15, maximum_iterations = 1000):
    '''
    This function finds approximations to the true conditional choice probabilities given parameters.

    Args.
        Theta: a numpy array (K+G,) of parameters
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        psi: a dictionary of T numpy arrays (J[t] + sum(C_g),J[t]) of the J[t] by J[t] identity stacked on top of the \psi^g matrices for each market t as outputted by 'Create_nests'
        nest_count: a dictionary of T numpy arrays (G,) containing the amount of nests in each category g in each market t
        tol: tolerated approximation error
        maximum_iterations: a no. of maximum iterations which if reached will stop the algorithm

    Output
        q_1: a dictionary of T numpy arrays (J[t],) of IPDL choice probabilities for each market t
    '''

    T = len(x) # Number of markets
    K = x[0].shape[1] # Number of car characteristics

    # Parameters
    Beta = Theta[:K]
    Lambda = Theta[K:]
    G = len(Lambda)  # Number of groups

    print(Lambda)
    # Calculate small beta
    C_minus = np.array([True if Lambda[g] < 0 else False for g in np.arange(G)])
    print(C_minus) # Find the categories g with negative a negative parameter lambda_g
    if C_minus.all() == False:
        b = 0
    else:    
        b = np.abs(Lambda[C_minus]).sum() # sum of absolute value of negative lambda parameters.

    Gamma = Create_Gamma(Lambda, psi, nest_count) # Find the Gamma matrix

    u = {t: np.einsum('jk,k->j', x[t], Beta) for t in np.arange(T)} # Calculate linear utilities
    q = {t: np.exp(u[t] - u[t].max()) / np.exp(u[t] - u[t].max()).sum() for t in np.arange(T)} # Find logit choice probabilities
    q0 = q
    
    Epsilon = 1.0e-14

    for k in range(maximum_iterations):
        q1 = {}
        for t in np.arange(T):
            # Calculate v
            psi_q = np.einsum('cj,j->c', psi[t], q0[t]) # Compute matrix product
            log_psiq =  np.log(psi_q) # Add Epsilon? to avoid zeros in log np.log(np.abs(gamma_q), out = np.NINF*np.ones_like(gamma_q), where = (np.abs(gamma_q) > 0))
            gamma_log_prod = np.einsum('cj,c->j', Gamma[t], log_psiq) # Compute matrix product
            v = np.log(q0[t], out = -np.inf*np.ones_like(q0[t]), where = (q0[t] > 0)) + (u[t] - gamma_log_prod)/(1 + b) # Calculate v = log(q) + (u - Gamma^T %o% log(Gamma %o% q) %o% Gamma)/(1 + b)
            v -= v.max(keepdims = True) # Do max rescaling wrt. alternatives

            # Calculate iterated ccp q^k
            numerator = np.exp(v)
            denom = numerator.sum()
            q1[t] = numerator/denom

        # Check convergence in an appropriate distance function
        dist = np.max(np.array([np.sum((q1[t]-q0[t])**2/q[t]) for t in np.arange(T)])) # Uses logit weights. This avoids precision issues when q1~q0~0.

        if dist<tol:
            break
        elif k==maximum_iterations:
            break
        else:
            None
            
        # Iteration step
        q0 = q1

    return q1 

assert np.array([np.sum(q1[t]) for t in np.arange(T)]).all() == 1

## Demand derivatives and price Elasticity

While the demand derivatives in the IPDL model are not quite as simple as in the logit model, they are still easy to compute. 
Let $q=P(u|\lambda)$, then
$$
\nabla_u P(u|\lambda)=\left(\nabla^2_{qq}\Omega(q|\lambda)\right)^{-1}-qq'
$$
where the $()^{-1}$ denotes the matrix inverse. The derivatives with respect to any $x_{ij\ell}$ can now easily be computed by the chain rule,
$$
    \frac{\partial P_j(u_i|\lambda)}{\partial x_{ik\ell}}=\frac{\partial P_j(u_i|\lambda)}{\partial u_{ik}}\frac{\partial u_{ik}}{\partial x_{ik\ell}}=\frac{\partial P_j(u_i|\lambda)}{\partial u_{ik}}\beta_\ell,
$$

Finally, moving to price elasticity is the same as in the logit model, if $x_{ik\ell}$ is the log price of product $k$ for individual $i$, then
$$
    \mathcal{E}_{jk}= \frac{\partial P_j(u_i|\lambda)}{\partial x_{ik\ell}}\frac{1}{P_j(u_i|\lambda)}=\frac{\partial P_j(u_i|\lambda)}{\partial u_{ik}}\frac{1}{P_j(u_i|\lambda)}\beta_\ell=\frac{\partial \ln P_j(u_i|\lambda)}{\partial u_{ik}}\beta_\ell$$
we can also write this compactly as
$$
\nabla_u \ln P(u|\lambda)=\mathrm{diag}(P(u|\lambda))^{-1}\nabla_u P(u|\lambda).
$$

In [854]:
def compute_pertubation_hessian(q, x, Theta, psi, nest_count):
    '''
    This function calucates the hessian of the pertubation function \Omega

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        Theta: a numpy array (K+G,) of parameters
        psi_stack: a dictionary of T numpy arrays (J[t] + sum(C_g),J[t]) of the J[t] by J[t] identity stacked on top of the \psi^g matrices for each market t as outputted by 'Create_nests'
        nest_count: a dictionary of T numpy arrays (G,) containing the amount of nests in each category g in each market t
    
    Returns
        Hess: a dictionary of T numpy arrays (J[t],J[t]) of second partial derivatives of the pertubation function \Omega for each market t
    '''
    
    T = len(q.keys())
    K = x[0].shape[1]

    Gamma = Create_Gamma(Theta[K:], psi, nest_count) # Find the \Gamma matrices 
    #Hess = {}
    Hess={}
    for t in np.arange(T):
        psi_q = np.einsum('cj,j->c', psi[t], q[t]) # Compute a matrix product
        Hess[t] = np.einsum('cj,c,cl->jl', Gamma[t], 1/psi_q, psi[t]) # Computes the product \Gamma' diag(\psi q)^{-1} \psi (but faster)

    return Hess

In [855]:
def ccp_gradient(q, x, Theta, psi_stack, nest_count):
    
    '''
    This function calucates the gradient of the choice proabilities wrt. characteristics

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        Theta: a numpy array (K+G,) of parameters
        psi_stack: a dictionary of T numpy arrays (J[t] + sum(C_g),J[t]) of the J[t] by J[t] identity stacked on top of the \psi^g matrices for each market t as outputted by 'Create_nests'
        nest_count: a dictionary of T numpy arrays (G,) containing the amount of nests in each category g in each market t
    
    Returns
        Grad: a dictionary of T numpy arrays (J[t],K) of partial derivatives of the choice proabilities wrt. utilities for each market t
    '''

    T = len(q.keys())
    Grad = {}
    Hess = compute_pertubation_hessian(q, x, Theta, psi_stack, nest_count) # Compute the hessian of the pertubation function

    for t in np.arange(T):
        inv_omega_hess = la.inv(Hess[t]) # (J,J) for each t=1,...,T , computes the inverse of the Hessian
        qqT = q[t][:,None]*q[t][None,:] # (J,J) outerproduct of ccp's for each market t
        Grad[t] = inv_omega_hess - qqT  # Compute IPDL gradient of ccp's wrt. utilities

    return Grad

In [856]:
def IPDL_u_grad_Log_ccp(q, x, Theta, psi_stack, nest_count):
    '''
    This function calucates the gradient of the log choice proabilities wrt. characteristics

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        Theta: a numpy array (K+G,) of parameters
        psi_stack: a dictionary of T numpy arrays (J[t] + sum(C_g),J[t]) of the J[t] by J[t] identity stacked on top of the \psi^g matrices for each market t as outputted by 'Create_nests'
        nest_count: a dictionary of T numpy arrays (G,) containing the amount of nests in each category g in each market t
    
    Returns
        Epsilon: a dictionary of T numpy arrays (J[t],J[t]) of partial derivatives of the log choice proabilities of products j wrt. utilites of products k for each market t
    '''

    T = len(q.keys())
    Epsilon = {}
    Grad = ccp_gradient(q, x, Theta, psi_stack, nest_count) # Find the gradient of ccp's wrt. utilities
    
    for t in np.arange(T):
        #ccp_grad = Grad[t]
        #inv_diagq = np.divide(1, q[t], out = np.inf*np.ones_like(q[t]), where = (q[t] > 0)) # Find the inverse of the ccp's and assign infinity to any entry if that entry has q = 0
        Epsilon[t] = Grad[t]/q[t][:,None] # Computes diag(q)^{-1}Grad[t]
        #np.einsum('j,jk->jk', inv_diagq, ccp_grad) # Computes a Hadamard product. Is equivalent to:   diag(q)^-1 %o% ccp_grad

    return Epsilon

In [857]:
def IPDL_elasticity(q, x, Theta, psi_stack, nest_count, char_number = K-1):
    ''' 
    This function calculates the elasticity of choice probabilities wrt. any characteristic or nest grouping of products

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        Theta: a numpy array (K+G,) of parameters
        psi_stack: a dictionary of T numpy arrays (J[t] + sum(C_g),J[t]) of the J[t] by J[t] identity stacked on top of the \psi^g matrices for each market t as outputted by 'Create_nests'
        nest_count: a dictionary of T numpy arrays (G,) containing the amount of nests in each category g in each market t
        char_number: an integer which is an index of the parameter in theta wrt. which we wish calculate the elasticity. Default is the index for the parameter of 'pr'.

    Returns
        a dictionary of T numpy arrays (J[t],J[t]) of choice probability semi-elasticities for each market t
    '''
    T = len(q.keys())
    Epsilon = {}
    Grad = IPDL_u_grad_Log_ccp(q, x, Theta, psi_stack, nest_count) # Find the gradient of log ccp's wrt. utilities

    for t in np.arange(T):
        Epsilon[t] = Grad[t]*Theta[char_number] # Calculate semi-elasticities

    return Epsilon

Using guess parameters $\hat \theta^0$ we calculate price-to-log-income elasticities for individual $i=0$. 

pd.DataFrame(IPDL_elasticity(q0_hat, x, theta0, Psi, Nest_count)[0])

## Maximum likelihood estimation of IPDL

The log-likelihood contribution is
$$
\ell_t(\theta)=y_t'\ln p(\mathbf{X}_t,\theta),
$$
and an estimation routine must therefore have a function that - given $\mathbf{X}_t$ and $\theta$ - calculates $u_t=\mathbf{X}_t\beta$ and constructs $\Gamma$, and then calls the fixed point routine described above. That routine will return $p(\mathbf{X}_t,\theta)$, and we can then evaluate $\ell_t(\theta)$. Using our above defined functions we now construct precisely such an estimation procedure.

For maximizing the likelihood, we want the derivates at some $\theta=(\beta',\lambda')$. Let $q_t=p(\mathbf{X}_t,\theta)$, then we have
$$
\nabla_\theta \ln p(\mathbf{X}_t,\theta)=\mathrm{diag}(q_t)^{-1}\left(\nabla_{qq}^2\Omega(q_t|\lambda)^{-1}-q_tq_t' \right)\left[\mathbf{X}_t,-\nabla_{q,\lambda}^2 \Omega(q_t|\lambda)\right]
$$
Note that the first two components is the elasticity $\nabla_u \ln P(u|\lambda)$ and the last term is a block matrix of size $J\times dim(\theta)$. Note that the latter cross derivative $\nabla_{q,\lambda}^2 \Omega(q_t|\lambda)$ is given by $\nabla_{q,\lambda} \Omega(q_t|\lambda)_g = \ln(q) - (\Psi^g)' \ln(\Psi^g q)$ for each row $g=1,\ldots,G$. The derivative of the log-likelihood function can be obtained from this as
$$
\nabla_\theta \ell_t(\theta)=\nabla_\theta \ln p(\mathbf{X}_t,\theta)' y_t \\
$$

In [858]:
def IPDL_loglikelihood(Theta, y, x, sample_share, psi_stack, nest_count):
    ''' 
    This function computes the loglikehood contribution for each individual i.
    
    Args.
        Theta: a numpy array (K+G,) of parameters of (\beta', \lambda')',
        y: a dictionary of T numpy arrays (J[t],) of observed market shares in onehot encoding for each market t,
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t,
        psi_stack: a dictionary of T numpy arrays (J[t] + sum(C_g),J[t]) of the J[t] by J[t] identity stacked on top of the \psi^g matrices for each market t as outputted by 'Create_nests'
        nest_count: a dictionary of T numpy arrays (G,) containing the amount of nests in each category g in each market t

    Output
        ll: a numpy array (T,) of IPDL loglikelihood contributions
    '''

    T = len(x.keys())
    K = x[0].shape[1]
    ccp_hat = IPDL_ccp(Theta, x, psi_stack, nest_count)
    sum_lambdaplus = np.array([theta for theta in Theta[K:] if theta >0]).sum()

    if sum_lambdaplus >= 1:
        ll = np.NINF*np.ones((T,))

    else:
        ll=np.empty((T,))
        for t in np.arange(T):
            ll[t] = sample_share[t]*(y[t].T@np.log(ccp_hat[t]))#np.einsum('j,j', y[t], np.log(ccp_hat[t], out = -np.inf*np.ones_like(ccp_hat[t]), where = (ccp_hat[t] > 0)))

    print([sum_lambdaplus, -ll.mean()])

    return ll

In [859]:
def q_IPDL(Theta, y, x, sample_share, psi_stack, nest_count):
    ''' The negative loglikelihood criterion to minimize
    '''
    Q = -IPDL_loglikelihood(Theta, y, x, sample_share, psi_stack, nest_count)
    
    return Q

We also implement the derivative of the loglikehood wrt. parameters $\nabla_\theta \ell_t(\theta)$.

In [860]:
def cross_grad_pertubation(q, psi_stack, nest_count):
    ''' 
    This function calculates the cross diffential of the pertubation function \Omega wrt. first ccp's and then the lambda parameters

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        psi_stack: a dictionary of T numpy arrays (J[t] + sum(C_g),J[t]) of the J[t] by J[t] identity stacked on top of the \psi^g matrices for each market t as outputted by 'Create_nests'
        nest_count: a dictionary of T numpy arrays (G,) containing the amount of nests in each category g in each market t
    
    Returns
        Z: a dictionary of T numpy arrays (J[t],G) of cross diffentials of the pertubation function \Omega wrt. first ccp's and then the lambda parameters
    '''

    T = len(q.keys())
    log_q = {t: np.log(q[t], out = -np.inf*np.ones_like(q[t]), where = (q[t] > 0)) for t in np.arange(T)} # Determine log(q), and set entries equal minus inifinity if entry <= 0
    Z = {}
    
    for t in np.arange(T):
        G = len(nest_count[t])
        indices = np.int64(np.cumsum(nest_count[t])) # Find the indices of the categories g used in the psi_stack matrices
        J = np.int64(psi_stack[t].shape[0] - np.sum(nest_count[t])) # Find the number of alternatives
        Z_t = np.empty((J,G)) # Initialize a J[t] by G numpy matrix for market t

        for g in np.arange(G):

            # Find the \psi^g matrix for category g
            if g == 0:
                Psi = psi_stack[t][J:J+indices[g],:] 
            else:
                Psi = psi_stack[t][J+indices[g-1]:J+indices[g],:]

            Psi_q = np.einsum('cj,j->c', Psi, q[t]) # Compute a matrix product
            log_Psiq = np.log(Psi_q, out = -np.inf*np.ones_like(Psi_q), where = (Psi_q > 0)) # Determine log of Psi_q, and set entries equal to minus infinity if entry <= 0.
            Psi_logPsiq = np.einsum('cj,c->j', Psi, log_Psiq) # Compute matrix product

            Z_t[:,g] = log_q[t] - Psi_logPsiq # Compute cross differential
        
        Z[t] = Z_t
    
    return Z

In [861]:
def IPDL_theta_grad_log_ccp(Theta, x, psi_stack, nest_count):
    '''
    This function calculates the derivative of the IPDL log ccp's wrt. parameters theta

    Args.
        Theta: a numpy array (K+G,) of parameters of (\beta', \lambda')',
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t,
        psi_stack: a dictionary of T numpy arrays (J[t] + sum(C_g),J[t]) of the J[t] by J[t] identity stacked on top of the \psi^g matrices for each market t as outputted by 'Create_nests'
        nest_count: a dictionary of T numpy arrays (G,) containing the amount of nests in each category g in each market t
    Returns
        Grad: a dictionary of T numpy arrays (J[t],K+G) of derivatives of the IPDL log ccp's wrt. parameters theta for each market t
    '''

    T = len(x.keys())

    q = IPDL_ccp(Theta, x, psi_stack, nest_count) # Find choice probabilities

    Z = cross_grad_pertubation(q, psi_stack, nest_count) # Find cross differentials of the pertubation function
    u_grad = IPDL_u_grad_Log_ccp(q, x, Theta, psi_stack, nest_count)  # Find the gradient of log ccp's wrt. utilities
    Grad={}

    for t in range(T):
        G=np.concatenate((x[t], Z[t]), axis=1)
        Grad[t]=u_grad[t]@G
   
   # G = [np.concatenate((x[t], Z[t]), axis=1) for t in np.arange(T)] # Construct the block matrix of the covariates and the cross differentials as block matrices
    #Grad = {t: np.einsum('jk,kd->jd', u_grad[t], G[t]) for t in np.arange(T)} # Compute the derivative by matrix multiplication.

    return Grad

In [862]:
def IPDL_score(Theta, y, x, sample_share, psi_stack, nest_count):
    '''
    This function calculates the score of the IPDL loglikelihood.

    Args.
        Theta: a numpy array (K+G,) of parameters of (\beta', \lambda')',
        y: a dictionary of T numpy arrays (J[t],) of observed market shares in onehot encoding for each market t,
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t,
        psi_stack: a dictionary of T numpy arrays (J[t] + sum(C_g),J[t]) of the J[t] by J[t] identity stacked on top of the \psi^g matrices for each market t as outputted by 'Create_nests'
        nest_count: a dictionary of T numpy arrays (G,) containing the amount of nests in each category g in each market t

    Returns
        Score: a numpy array (T,K+G) of IPDL scores
    '''
    T = len(x.keys())

    log_ccp_grad = IPDL_theta_grad_log_ccp(Theta, x, psi_stack, nest_count) # Find derivatives of the IPDL log ccp's wrt. parameters theta
    D = log_ccp_grad[0].shape[1] # equal to K+G
    Score = np.empty((T,D))
    
    for t in np.arange(T):
        Score[t,:] =sample_share[t]*(log_ccp_grad[t].T@y[t]) #np.einsum('j,jd->d', y[t], log_ccp_grad[t]) # Computes a matrix product

    return Score

In [863]:
def q_IPDL_score(Theta, y, x, sample_share, psi_stack, nest_count):
    ''' The derivative of the negative loglikelihood criterion
    '''
    return -IPDL_score(Theta, y, x, sample_share, psi_stack, nest_count)

In [864]:
def test_angrad(y,x,delta,theta,pop_share,Psi,Nest_count):

    numgrad = np.empty((T, K+G))

    for i in np.arange(K+G):
        vec = np.zeros((K+G,))
        vec[i] = 1
        numgrad[:,i] = (IPDL_loglikelihood(theta + delta*vec, y, x, pop_share, Psi, Nest_count) - IPDL_loglikelihood(theta0, y, x, pop_share, Psi, Nest_count)) / delta
    
    return numgrad

angrad = IPDL_score(theta0, y, x, pop_share, Psi, Nest_count)

numgrad.shape

pd.DataFrame(numgrad[0,:]).transpose()

pd.DataFrame(angrad[0,:]).transpose()

## Standard errors in Maximum Likelihood estimation

As usual we may consistently estimate the Covariance Matrix  of the IPDL maximum likelihood estimator for some estimate $\hat \theta = (\hat \beta', \hat \lambda')'\in \mathbb{R}^{K+G}$ as:

$$
\hat \Sigma = \left( \sum_{i=1}^N \nabla_\theta \ell_i (\hat \theta) \nabla_\theta \ell_i (\hat \theta)' \right)^{-1}
$$

Thereby we may find the estimated standard error of parameter $d$ as the squareroot of the d'th diagonal entry of $\hat \Sigma$:

$$
\hat \sigma_d = \sqrt{\hat \Sigma_{dd}}
$$

In [865]:
def IPDL_se(score, N):
    '''
    This function computes the asymptotic standard errors of the MLE.

    Args.
        score: a numpy array (T,K+G) of IPDL scores
        N: an integer giving the number of observations

    Returns
        SE: a numpy array (K+G,) of asymptotic IPDL MLE standard errors
    '''

    SE = np.sqrt(np.diag(la.inv(np.einsum('td,tm->dm', score, score))) / N)

    return SE

In [866]:
def IPDL_t_p(SE, Theta, N, Theta_hypothesis = 0):
    ''' 
    This function calculates t statistics and p values for characteristic and nest grouping parameters

    Args.
        SE: a numpy array (K+G,) of asymptotic IPDL MLE standard errors
        Theta: a numpy array (K+G,) of parameters of (\beta', \lambda')',
        N: an integer giving the number of observations
        Theta_hypothesis: a (K+G,) array or integer of parameter values to test in t-test. Default value is 0.
    
    Returns
        T: a (K+G,) array of estimated t tests
        p: a (K+G,) array of estimated asymptotic p values computed using the above t-tests
    '''

    T = np.abs(Theta - Theta_hypothesis) / SE
    p = 2*scstat.t.sf(T, df = N-1)

    return T,p

### We now estimate the model

In [867]:
def estimate_IPDL(f, Theta0, y, x, sample_share, psi_stack, nest_count, N, Analytic_jac:bool = True, options = {'disp': True}, **kwargs):
    ''' 
    Takes a function and returns the minimum, given starting values and variables necessary in the IPDL model specification.

    Args:
        f: a function to minimize,
        Theta0 : a numpy array (K+G,) of initial guess parameters (\beta', \lambda')',
        y: a dictionary of T numpy arrays (J[t],) of observed market shares in onehot encoding for each market t,
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t,
        psi_stack: a dictionary of T numpy arrays (J[t] + sum(C_g),J[t]) of the J[t] by J[t] identity stacked on top of the \psi^g matrices for each market t as outputted by 'Create_nests', 
        nest_count: a dictionary of T numpy arrays (G,) containing the amount of nests in each category g in each market t,
        N: an integer giving the number of observations,
        Analytic_jac: a boolean. Default value is 'True'. If 'True' the analytic jacobian of the IPDL loglikelihood function is used in estimation. Else the numerical jacobian is used.
        options: dictionary with options for the optimizer (e.g. disp=True which tells it to display information at termination.)
    
    Returns:
        res: a dictionary with results from the estimation.
    '''

    # The objective function is the average of q(), 
    # but Q is only a function of one variable, theta, 
    # which is what minimize() will expect
    Q = lambda Theta: np.mean(f(Theta, y, x, sample_share, psi_stack, nest_count))

    if Analytic_jac == True:
        Grad = lambda Theta: np.mean(q_IPDL_score(Theta, y, x, sample_share, psi_stack, nest_count), axis=0) # Finds the Jacobian of Q. Takes mean of criterion q derivatives along axis=0, i.e. the mean across individuals.
    else:
        Grad = None

    # call optimizer
    result = optimize.minimize(Q, Theta0.tolist(), options=options, jac=Grad, **kwargs) # optimize.minimize takes a list of parameters Theta0 (not a numpy array) as initial guess.
    se = IPDL_se(IPDL_score(result.x, y, x, sample_share, psi_stack, nest_count), N)
    T,p = IPDL_t_p(se, result.x, N)

    # collect output in a dict 
    res = {
        'theta': result.x,
        'se': se,
        't': T,
        'p': p,
        'success':  result.success, # bool, whether convergence was succesful 1
        'nit':      result.nit, # no. algorithm iterations 
        'nfev':     result.nfev, # no. function evaluations 
        'fun':      result.fun # function value at termination 
    }

    return res

p_theta=IPDL_ccp(theta0,x,Psi,Nest_count)

H=compute_pertubation_hessian(p_theta,x,theta0,Psi,Nest_count)


In [868]:
beta_0 = np.ones((K,))

# Estimate the model
res_logit = logit.estimate_logit(logit.q_logit, beta_0, y, x, sample_share=pop_share, Analytic_jac=True)

theta0=np.append(res_logit['beta'],lambda0*0)

Optimization terminated successfully.
         Current function value: 0.023989
         Iterations: 259
         Function evaluations: 265
         Gradient evaluations: 265


In [869]:
resbla2 = estimate_IPDL(q_IPDL, theta0, y, x, pop_share, Psi, Nest_count, N, Analytic_jac=True,options={'gtol':1e-15})

[0.0, 0.023988953476331697]
[0.0014520970299060063, 0.023988679233698205]
[0.007260485149530031, 0.02398764175216794]
[0.023498462360929836, 0.023985187757716842]
[0.04552659533041656, 0.023982888092721543]
[0.04762646011621456, 0.023982511271870476]
[0.05602591925940656, 0.023981076992827588]
[0.07645212576664609, 0.023976800299988846]
[0.12439698682291579, 0.023967877238319665]
[0.198616903516793, 0.023958678900521477]
[0.2030907066992202, 0.023958232501100723]
[0.19601512816309413, 0.02395899415411581]
[0.20228751799043476, 0.023958293126525258]
[0.20299766340704511, 0.023958239180478926]
[0.20307990291267886, 0.023958233272053396]
[0.2030894518666148, 0.02395823259058251]
[0.20309056094893824, 0.023958232511493295]
[0.2030906897700909, 0.023958232502307827]
[0.20309070473287472, 0.023958232501240934]
[0.20309070647082877, 0.02395823250111701]
[0.20309070667269255, 0.02395823250110262]
[0.20309070669614349, 0.023958232501100945]
[0.20309070669886475, 0.02395823250110075]
[0.20309070

In [870]:
-IPDL_loglikelihood(theta0, y, x, pop_share, Psi, Nest_count).mean()

[0.0, 0.023988953476331697]


0.023988953476331697

In [871]:
resbla2

{'theta': array([-3.80927815e+00, -5.75097444e+00,  1.33818097e+00, -9.04156933e-01,
         7.64624073e+00, -1.16108741e+00, -1.72686230e+00,  3.43906981e+00,
        -5.16298383e-01, -3.49536422e+00, -7.74885730e-01, -2.83158082e-01,
        -1.09683011e+00, -7.16899838e-01, -9.54208768e-01, -1.20226421e+00,
        -1.45351252e+00, -2.95857841e+00, -5.62602995e-01, -5.23159613e-01,
        -6.89613015e-01, -1.44855097e+00, -2.37939708e+00, -2.24907708e+00,
        -1.42211322e+00, -1.19076816e+00,  5.83399544e-01, -1.42625142e+00,
        -5.98569335e-01, -5.15086922e-01, -5.05526624e-01, -3.35002467e-01,
        -1.03112469e+00, -1.05752280e+00, -1.40302647e+00,  7.18015543e-01,
        -1.40377790e+00, -7.58230378e-01, -1.92717295e+00, -8.58192359e-01,
        -1.14590240e+00, -2.91430780e-01, -1.32982873e+00, -7.74522990e-01,
        -4.16498095e-01,  1.59350044e+00,  5.85285409e-03,  8.85794878e-03,
         3.68236186e-02,  2.25929955e-02, -7.58062701e-03,  4.05622306e-02,
   

In [872]:
def reg_table(theta,se,N,x_vars,nest_vars):
    IPDL_t, IPDL_p = IPDL_t_p(se, theta, N)

    if OO:
        regdex = [*x_vars, *['group_' + var for var in nest_vars]]
    else:
        regdex = [*x_vars, *['group_' + var for var in nest_vars]]

    table  = pd.DataFrame({'theta': [ str(np.round(theta[i], decimals = 4)) + '***' if IPDL_p[i] <0.01 else str(np.round(theta[i], decimals = 3)) + '**' if IPDL_p[i] <0.05 else str(np.round(theta[i], decimals = 3)) + '*' if IPDL_p[i] <0.1 else np.round(theta[i], decimals = 3) for i in range(len(theta))], 
                'se' : np.round(se, decimals = 5),
                't (theta == 0)': np.round(IPDL_t, decimals = 3),
                'p': np.round(IPDL_p, decimals = 3)}, index = regdex).rename_axis(columns = 'variables')
    
    return table

In [873]:
IPDL_theta = resbla2['theta']
reg_table(resbla2['theta'],resbla2['se'],N,x_vars,nest_vars)

variables,theta,se,t (theta == 0),p
cy,-3.8093***,0.39451,9.656,0.0
hp,-5.751***,0.56515,10.176,0.0
we,1.3382***,0.49492,2.704,0.007
le,-0.904**,0.44454,2.034,0.042
wi,7.6462***,0.85902,8.901,0.0
he,-1.161**,0.54992,2.111,0.035
li,-1.7269***,0.41199,4.192,0.0
sp,3.4391***,0.69662,4.937,0.0
ac,-0.5163***,0.18305,2.821,0.005
pr,-3.4954***,0.79109,4.418,0.0


In [874]:
np.array([p for p in IPDL_theta[K:] if p>0]).sum()

0.2030907066992202

### An alternative approach

The log-likelihood function is not globally concave, and finding the global optimum can be difficult. Using the estimation procedure of Fosgerau et. al. (2023 working paper), we can instead fit the parameters using the first-order conditions for optimality. The estimator takes the form

$$
\hat \theta^0=\arg \min_{\theta} \sum_t s_t \hat \varepsilon^0_t(\theta)'\hat W^0_t\hat \varepsilon^0 _t(\theta),
$$
where $\hat W^0_t$ is a positive semidefinite weight matrix, $s_t$ is market $t$'s share of the total population and 
$$
\hat \varepsilon^0_t(\theta)=\hat D^0_t(u(X_t,\beta)- \nabla_q \Omega_t(\hat q_t^0|\lambda)) ,
$$
where 
$$
\hat D^0_t=\textrm{diag}(\hat q^0_t)-\hat q^0_t (\hat q^0_t)'.
$$
Using equation (...) above, we have that $\hat \epsilon_t$ is a linear function of $\theta$,
$$
\hat \varepsilon^0_t(\theta)=\hat D^0_t \left(\hat G^0_t\theta- \ln \hat q^0_t\right)\equiv \hat A^0_t\theta-\hat r^0_t.
$$
Using linearity, the weighted least squares criterion has a unique closed form solution,
$$
\hat \theta^0 =\left(\sum_t s_t (\hat A^0_t)'\hat W^0_t \hat A^0_t \right)^{-1}\left(\sum_t s_t (\hat A^0_t)'\hat W^0_t \hat r_t^0 \right)
$$




In [875]:
def G_array(q, x, psi_stack, nest_count):
    ''' 
    This function calculates the G block matrix

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        psi_stack: a dictionary of T numpy arrays (J[t] + sum(C_g),J[t]) of the J[t] by J[t] identity stacked on top of the \psi^g matrices for each market t as outputted by 'Create_nests'
        nest_count: a dictionary of T numpy arrays (G,) containing the amount of nests in each category g in each market t

    Returns
        G: a dictionary  of T numpy arrays (J[t],K+G): a G matrix for each market t
    '''
    T = len(x)

    Z = cross_grad_pertubation(q, psi_stack, nest_count) # Find the cross derivative of the pertubation function \Omega wrt. lambda and ccp's q
    G = {t: np.concatenate((x[t],Z[t]), axis=1) for t in np.arange(T)} # Join block matrices along 2nd dimensions  s.t. last dimension is K+G (same dimension as theta)

    return G

In [876]:
def D_array(q):
    '''
    This function calculates the D matrix - the logit derivative of ccp's wrt. utilities

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t

    Returns
        D: a dictionary of T numpy arrays (J[t],J[t]) of logit derivatives of ccp's wrt. utilities for each market t
    '''
    T = len(q)

    D = {t: np.diag(q[t]) - np.einsum('j,k->jk', q[t], q[t]) for t in np.arange(T)}
    
    return D

In [877]:
def A_array(q, x, psi_stack, nest_count):
    '''
    This function calculates the A matrix

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        psi_stack: a dictionary of T numpy arrays (J[t] + sum(C_g),J[t]) of the J[t] by J[t] identity stacked on top of the \psi^g matrices for each market t as outputted by 'Create_nests'
        nest_count: a dictionary of T numpy arrays (G,) containing the amount of nests in each category g in each market t

    Returns
        A: a dictionary  of T numpy arrays (J[t],K+G): an A matrix for each market t
    '''
    T = len(x)

    D = D_array(q)
    G = G_array(q, x, psi_stack, nest_count)
    A = {t: np.einsum('jk,kd->jd', D[t], G[t]) for t in np.arange(T)}

    return A

In [878]:
def r_array(q):
    '''
    This function calculates 'r'; the logarithm of observed or nonparametrically estimated market shares

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
    
    Returns
        r: a dictionary of T numpy arrays (J[t],) of the log of ccp's for each market t
    '''
    T = len(q)

    D = D_array(q) 
    log_q = {t: np.log(q[t], out = -np.inf*np.ones_like(q[t]), where = (q[t] > 0)) for t in np.arange(T)}
    r = {t: np.einsum('jk,k->j', D[t], log_q[t]) for t in np.arange(T)}

    return r

In [879]:
def WLS_init(q, x, sample_share, psi_stack, nest_count, N):
    ''' 
    This function calculates the weighted least squares estimator \hat \theta^k and its relevant estimated standard error for the initial FKN parameter estimates.

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        sample_share: A (T,) numpy array of the fraction of observations in each market t 
        psi_stack: a dictionary of T numpy arrays (J[t] + sum(C_g),J[t]) of the J[t] by J[t] identity stacked on top of the \psi^g matrices for each market t as outputted by 'Create_nests'
        nest_count: a dictionary of T numpy arrays (G,) containing the amount of nests in each category g in each market t
        N: An integer giving the total amount of observations

    Returns
        theta_hat: a (K+G,) numpy array of initial FKN parameter estimates
        se_hat: a (K+G,) numpy array of standard errors for initial FKN parameter estimates
    '''

    T = len(x)

    #W = {t: la.inv(np.diag(q[t])) for t in np.arange(T)}
    A = A_array(q, x, psi_stack, nest_count)
    r = r_array(q)

    d = A[0].shape[1]
    
    AWA = np.empty((T,d,d))
    AWr = np.empty((T,d))

    for t in np.arange(T):
        AWA[t,:,:] = sample_share[t]*np.einsum('jd,j,jp->dp', A[t], 1/q[t], A[t], optimize = True) # Fast product using that the weights are diagonal.
        AWr[t,:] = sample_share[t]*np.einsum('jd,j,j->d', A[t], 1/q[t], r[t], optimize = True)
    
    theta_hat = la.solve(AWA.sum(axis = 0), AWr.sum(axis = 0))
    se_hat = np.sqrt(np.diag(la.inv(AWA.sum(axis = 0))) / N)
    
    return theta_hat,se_hat
    

Using the observed market shares we may thus find initial parameter estimates $\hat \theta^0$ as described above.

In [880]:
thetaFKN0,seFKN0 = WLS_init(y, x, pop_share, Psi, Nest_count, N)

In [881]:
np.array([p for p in thetaFKN0[K:] if p>0]).sum()

1.0525992325879783

In [882]:
seFKN0

array([0.00311397, 0.00337322, 0.00290639, 0.00323798, 0.00533055,
       0.004687  , 0.00249208, 0.00468713, 0.00142938, 0.00328441,
       0.00434429, 0.00058246, 0.00080285, 0.0005752 , 0.00060109,
       0.0018064 , 0.00216946, 0.00269626, 0.000616  , 0.00056872,
       0.00098117, 0.00169144, 0.00174439, 0.0032486 , 0.00069674,
       0.00089782, 0.0005501 , 0.00136375, 0.00080514, 0.00054775,
       0.00056954, 0.00059981, 0.00061002, 0.00149998, 0.00089356,
       0.03999154, 0.00158447, 0.01662605, 0.0030851 , 0.00314238,
       0.00111957, 0.01508594, 0.00083499, 0.00086465, 0.00081095,
       0.00027577, 0.00013191, 0.00012982, 0.00013633, 0.00013971,
       0.00012638, 0.00010282, 0.00011564, 0.00012739, 0.00011258,
       0.0002028 , 0.00023091])

## Regularization for parameter bounds

As we see above, the least squares estimator is not guaranteed to respect the parameter bounds $\sum_g \hat \lambda_g<1$. We can use that if we replace $\hat q^0_t$ with the choice probabilities from the maximum likelihood estimator of the logit model, $\hat q^{logit}_t\propto \exp\{X_t\hat \beta^{logit}\}$, and plug these choice probabilities into the WLS estimator described above, it will return $\hat \theta=(\hat \beta^{logit},0,\ldots,0)$ as the parameter estimate. Let $\hat q_t(\alpha)$ denote the weighted average of the logit probabilites and the market shares,
$$
\hat q_t(\alpha) =(1-\alpha) \hat q^{logit}_t+\alpha \hat q^0_t.
$$
 Let $\hat \theta^0(\alpha)$ denote the resulting parameter vector. We perform a line search for values of $\alpha$, $(\frac{1}{2},\frac{1}{4},\frac{1}{8},\ldots)$ until $\hat \theta^0(\alpha)$ yields a feasible parameter vector.


In [883]:
def LogL(Theta, y, x, sample_share, psi_stack, nest_count):
    ''' A function giving the mean IPDL loglikehood evaluated at data and an array of parameters 'Theta'
    '''
    return np.mean(IPDL_loglikelihood(Theta, y, x, sample_share, psi_stack, nest_count))

In [884]:
def LineSearch(Theta0, Logit_Beta, y, x, sample_share, psi_stack, nest_count, N, num_alpha = 5):
    ''' 
    '''
    T = len(x)
    d = Theta0.shape[0]
    K = x[0].shape[1]
    G = d-K

    # Find probabilities
    q_logit = logit.logit_ccp(Logit_Beta, x)
    q_obs = y

    # Search
    #alpha_line = np.linspace(0, 1, num_alpha)
    alpha0=0.5
    #LogL_alpha = np.empty((num_alpha,))
    #theta_alpha = np.empty((num_alpha, d))

    for k in range(1,100):

        alpha = alpha0**k

      
        q_alpha = {t: (1 - alpha)*q_logit[t] + alpha*q_obs[t] for t in np.arange(T)}
        theta_alpha = WLS_init(q_alpha, x, sample_share, psi_stack, nest_count, N)[0]

        lambda_alpha = theta_alpha[K:]
        
        pos_pars = np.array([theta for theta in lambda_alpha if theta > 0])

        if pos_pars.sum() <1:
            break
    
    # Pick the best set of parameters

    return theta_alpha

In [885]:
def GridSearch(Theta0, Logit_Beta, y, x, sample_share, psi_stack, nest_count, N, num_alpha = 5):
    ''' 
    '''
    T = len(x)
    d = Theta0.shape[0]
    K = x[0].shape[1]
    G = d-K

    # Find probabilities
    q_logit = logit.logit_ccp(Logit_Beta, x)
    q_obs = y

    # Search
    alpha_line = np.linspace(0, 1, num_alpha)
    LogL_alpha = np.empty((num_alpha,))
    theta_alpha = np.empty((num_alpha, d))

    for k in np.arange(len(alpha_line)):

        alpha = alpha_line[k]

        q_alpha = {t: (1 - alpha)*q_logit[t] + alpha*q_obs[t] for t in np.arange(T)}
        theta_alpha[k,:] = WLS_init(q_alpha, x, sample_share, psi_stack, nest_count, N)[0]

        lambda_alpha = theta_alpha[K:]
        pos_pars = np.array([theta for theta in lambda_alpha if theta > 0])

        if pos_pars.sum() >= 1:
            LogL_alpha[k] = np.NINF
        else:
            LogL_alpha[k] = LogL(theta_alpha, y, x, sample_share, psi_stack, nest_count)
    
    alpha_star = np.argmax(LogL_alpha)
    theta_hat_star = theta_alpha[alpha_star,:]
    
    # Pick the best set of parameters

    return theta_alpha

Implementing the grid search method we find corressponding parameters $\hat \theta^*$.

In [886]:
theta_alpha = LineSearch(thetaFKN0, beta_0, y, x, pop_share, Psi, Nest_count, N)

In [887]:
np.array([p for p in theta_alpha[K:] if p>0]).sum()

0.9321185227610683

In [888]:
q_IPDL(theta_alpha, y, x, pop_share, Psi, Nest_count).mean()

[0.9321185227610683, 0.026656102742521504]


0.026656102742521504

## Iterated FKN estimator

The iterated estimator is as the initial one, except there is an additional term on $\hat \varepsilon$. First, we update the choice probabilities,
$$
\hat q^k_i=p(\mathbf X_i,\hat \theta^{k-1})\\
$$
Then we assign
$$
\hat D^k_i=\nabla^2_{qq}\Omega(\hat q_i^k|\hat \lambda^{k-1})^{-1}-(\hat q^k_i \hat q^k_i)'
$$
and then construct the residual
$$
\hat \varepsilon^k_i(\theta)=\hat D^k_i\left( u(x_i,\beta)-\nabla_q \Omega(\hat q_i^k|\lambda)\right) -y_i+\hat q_i^k,
$$
Which can once again be simplified as
$$
\hat \varepsilon^k_i(\theta)= \hat A_i^k \theta-\hat r^k_i,
$$
where
$$
\hat A^k_i=\hat D_i^k\hat G^k_i, \hat r_i^k =\hat D^k_i\ln \hat q_i^k-y_i
$$
and where $\hat G^k_i$ is constructed as in the initial estimator. Using the weighted least squares estimator with weights $\hat W_i^k=\textrm{diag}(\hat q^k_i)^{-1}$, we get the estimator
$$
\hat \theta^k = \arg \min_{\theta}\frac{1}{n}\sum_i \hat \varepsilon^k_i(\theta)'\hat W_i^k \hat \varepsilon^k_i(\theta).
$$
We can once again solve it in closed form as
$$
\hat \theta^k =\left( \frac{1}{n}\sum_i \hat (A^k_i)'\hat W_i^k \hat A^k_i)\right)^{-1}\left( \frac{1}{n}\sum_i (\hat A_i^k)'\hat W_i^k \hat r_i^k\right)
$$
Now we implement this procedure and iterate starting from our initial guess $\hat \theta^{*}$


In [889]:
def WLS(Theta, y, x, sample_share, psi_stack, nest_count, N):
    '''
    This function calculates the weighted least squares estimator \hat \theta^k and its relevant estimated standard error for the iterated parameter estimates.

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        sample_share: A (T,) numpy array of the fraction of observations in each market t 
        psi_stack: a dictionary of T numpy arrays (J[t] + sum(C_g),J[t]) of the J[t] by J[t] identity stacked on top of the \psi^g matrices for each market t as outputted by 'Create_nests'
        nest_count: a dictionary of T numpy arrays (G,) containing the amount of nests in each category g in each market t
        N: An integer giving the total amount of observations

    Returns
        theta_hat: a (K+G,) numpy array of initial FKN parameter estimates
        se_hat: a (K+G,) numpy array of standard errors for initial FKN parameter estimates
    '''
    T = len(x)
    d = Theta.shape[0]
    
    # Get ccp's
    q = IPDL_ccp(Theta, x, psi_stack, nest_count)

    # Construct A
    D = ccp_gradient(q, x, Theta, psi_stack, nest_count) # A is here constructed using the IPDL derivative of ccp's wrt. utilities instead of teh Logit derivative
    G = G_array(q, x, psi_stack, nest_count)
    A = {t: np.einsum('jk,kd->jd', D[t], G[t]) for t in np.arange(T)}
    W = {t: la.inv(np.diag(q[t])) for t in np.arange(T)}

    # Construct r
    log_q = {t: np.log(q[t], out = -np.inf*np.ones_like(q[t]), where=(q[t] > 0)) for t in np.arange(T)}
    r = {t: np.einsum('jk,k->j', D[t], log_q[t]) + y[t] for t in np.arange(T)}

    # Estimate parameters
    AWA = np.empty((T,d,d))
    AWr = np.empty((T,d))

    for t in np.arange(T):
        AWA[t,:,:] = sample_share[t]*np.einsum('jd,jk,kp->dp', A[t], W[t], A[t], optimize = True)
        AWr[t,:] = sample_share[t]*np.einsum('jd,jk,k->d', A[t], W[t], r[t], optimize = True)

    theta_hat = la.solve(AWA.sum(axis = 0), AWr.sum(axis = 0))
    se_hat = np.sqrt(np.diag(la.inv(AWA.sum(axis = 0))) / N)

    return theta_hat,se_hat

In [890]:
def FKN_estimator(logit_beta, q_obs, x, sample_share, psi_stack, nest_count, N, tol = 1.0e-15, max_iters = 1000):
    ''' 
    '''

    K = x[0].shape[1]

    theta_init = WLS_init(q_obs, x, sample_share, psi_stack, nest_count,  N)[0]
    
    if np.array([p for p in theta_init[K:] if p>0]).sum() >1:
        theta_hat_star = GridSearch(theta_init, logit_beta, q_obs, x, sample_share, psi_stack, nest_count, N)
        theta0 = theta_hat_star
    else:
        theta0 = theta_init

    #logl0 = LogL(theta0, q_obs, x, sample_share, psi_stack, nest_count)
    
    for k in np.arange(max_iters):
        theta1, se1 = WLS(theta0, q_obs, x, sample_share, psi_stack, nest_count, N)

        '''logl1=LogL(theta1, q_obs, x, sample_share, psi_stack, nest_count)
        
        for m in range(10):
            if logl1<logl0:
                theta1=0.5*theta0+0.5*theta1
                logl1=LogL(theta1, q_obs, x, sample_share, psi_stack, nest_count)
            else:
                break'''

        # Check convergence in an appropriate distance function
        dist = np.max(np.abs(theta1 - theta0))

        if dist<tol:
            succes = True
            iter = k
            break
        elif k==max_iters:
            succes = False
            iter = max_iters
            break
        else:
            None
            
        # Iteration step
        theta0 = theta1
        logl0 = logl1

    res = {'theta': theta1,
           'se': se1,
           'fun': -LogL(theta1, y, x, sample_share, psi_stack, nest_count),
           'iter': iter,
           'succes': succes}
    
    return res 
        

In [895]:
res = FKN_estimator(beta_0, y, x, pop_share, Psi, Nest_count, N, tol=1.0e-8, max_iters=1000)

[]
[]


IndexError: arrays used as indices must be of integer (or boolean) type

In [None]:
resbla2

{'theta': array([-3.80927815e+00, -5.75097444e+00,  1.33818097e+00, -9.04156933e-01,
         7.64624073e+00, -1.16108741e+00, -1.72686230e+00,  3.43906981e+00,
        -5.16298383e-01, -3.49536422e+00, -7.74885730e-01, -2.83158082e-01,
        -1.09683011e+00, -7.16899838e-01, -9.54208768e-01, -1.20226421e+00,
        -1.45351252e+00, -2.95857841e+00, -5.62602995e-01, -5.23159613e-01,
        -6.89613015e-01, -1.44855097e+00, -2.37939708e+00, -2.24907708e+00,
        -1.42211322e+00, -1.19076816e+00,  5.83399544e-01, -1.42625142e+00,
        -5.98569335e-01, -5.15086922e-01, -5.05526624e-01, -3.35002467e-01,
        -1.03112469e+00, -1.05752280e+00, -1.40302647e+00,  7.18015543e-01,
        -1.40377790e+00, -7.58230378e-01, -1.92717295e+00, -8.58192359e-01,
        -1.14590240e+00, -2.91430780e-01, -1.32982873e+00, -7.74522990e-01,
        -4.16498095e-01,  1.59350044e+00,  5.85285413e-03,  8.85794884e-03,
         3.68236188e-02,  2.25929957e-02, -7.58062702e-03,  4.05622309e-02,
   

In [None]:
FKN_theta = res['theta']

In [None]:
q_IPDL(FKN_theta, y, x, pop_share, Psi, Nest_count).mean()

[0.8993978866426582, 0.001493980851035042]


0.001493980851035042

In [None]:
reg_table(res['theta'],res['se'],N,x_vars,nest_vars)

variables,theta,se,t (theta == 0),p
in_out,-10.4067***,0.00309,3371.931,0.0
cy,-1.0728***,0.00151,708.887,0.0
hp,-3.2092***,0.00195,1644.597,0.0
we,-0.2245***,0.00144,156.123,0.0
le,-1.2365***,0.00126,981.018,0.0
wi,5.4818***,0.00289,1896.496,0.0
he,-0.1078***,0.00178,60.555,0.0
li,-0.7848***,0.00104,751.443,0.0
sp,3.2451***,0.00204,1587.552,0.0
ac,0.3005***,0.00065,463.385,0.0


# BLP Estimation and instruments

The setting is now a bit different. Instead of the noise coming from random sampling of individuals, we now have an additional source of uncertainty, stemming frm the random sampling of the fixed effects $\xi_{tj}$ for each market and each product. The number of ”observations” is therefore

$$
S = T \cdot \sum_t J_t
$$

Note that while random sampling of individuals choices (number of observations
in the hundreds of millions) still has an effect on the estimated parameters in
principle, this effect is completely drowned out by the sampling variance of the
fixed effects (number of observations T ≈ 15000?), so we choose to ignore it
here. When estimating random coefficients models, there is also a third source
of uncertainty stemming from approximation of numerical integrals. This is not
an issue in IPDL, as we have the inverse demand in closed form.

The principles are pretty similar to what we have been doing already. When
applicable, I will use the same notation as in the FKN section. Define the
residual,

$$\xi_m(\theta) = u(X_m, \beta) − \nabla_q \Omega(q^0|\lambda)$$

In the IPDL model, this residual is a linear function of $\theta$ which has the form

$$\xi_m(\theta) =  G^0_m \theta − r_m^0$$

where $ G^0_m=[X_m, Z_m^0]$, where $Z_m^0 = \nabla_{q,\lambda}\Omega(q_m^0|\lambda)$ and $r^0_m = \ln q^0_m$ as in the FKN section with $q^0_m$ being e.g. the observed market shares in market $m$. For the BLP estimator, we set this residual orthogonal to a matrix of instruments $\hat Z_m$ of size $J_m \times d$, and find the estimator $ \hat \theta^{IV}$ which solves the moment conditions

$$\frac{1}{T} \sum_m \hat Z_m' \xi(\hat \theta^{IV}) = 0$$

Since $\hat \xi$ is linear, the moment equations have a unique solution,

$$\hat \theta^{IV} = \left(\frac{1}{T}\sum_m \hat Z_m' G^0_m \right)^{-1}\left(\frac{1}{T}\sum_m \hat Z_m' r^0_m \right)$$

We require an instrument for the price of the goods. This is something which is correlated with the price, but uncorrelated with the error term $\xi_m$ (in the BLP model, $\xi_{mj}$ represents unobserved components of car quality). A standard instrument in this case would be a measure of marginal cost (or something which is correlated with marginal cost, like a production price index). For everything other than price, we can simply use the regressor itself as the instrument i.e. $ \hat Z^{mjd} = G^0_{mjd}$, for all other dimensions than price.

First we construct our instruments $\hat Z$. We'll use the average exchange rate of the destination country relative to average exchange rate of the origin country. 

In [783]:
xexr = {t: dat[dat['market'] == t]['xexr'].values for t in np.arange(T)}
G0 = G_array(y, x, Psi, Nest_count)
pr_index = len(x_contvars)
for t in np.arange(T):
    G0[t][:,pr_index] = xexr[t] / xexr[t].max()

z = G0

We then calculate the moment estimator $\hat \theta^{IV}$.

In [784]:
def BLP_estimator(y, z, x, sample_share, psi_stack, nest_count):
    '''
    Args.
        y: a dictionary of T numpy arrasy (J[t],) of observed or nonparametrically estimated market shares for each market t
        z: a dictionary of T numpy arrays (J[t],K+G) of instruments for each market t
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        sample_share: A (T,) numpy array of the fraction of observations in each market t 
        psi_stack: a dictionary of T numpy arrays (J[t] + sum(C_g),J[t]) of the J[t] by J[t] identity stacked on top of the \psi^g matrices for each market t as outputted by 'Create_nests'
        nest_count: a dictionary of T numpy arrays (G,) containing the amount of nests in each category g in each market t

    Returns
        theta_hat: a numpy array (K+G,) of BLP parameter estimates
    '''
    T = len(z)

    G = G_array(y, x, psi_stack, nest_count)
    d = G[0].shape[1]
    r = {t: np.log(y[t], out = np.NINF*np.ones_like((y[t])), where = (y[t] > 0)) for t in np.arange(T)}
    
    sZG = np.empty((T,d,d))
    sZr = np.empty((T,d))

    for t in np.arange(T):
        sZG[t,:,:] = sample_share[t]*np.einsum('jd,jp->dp', z[t], G[t])
        sZr[t,:] = sample_share[t]*np.einsum('jd,j->d', z[t], r[t])

    theta_hat = la.solve(sZG.sum(axis=0), sZr.sum(axis=0))
    
    return theta_hat

In [785]:
BLP_theta = BLP_estimator(y, z, x, np.ones((T,)), Psi, Nest_count)

In [786]:
np.array([p for p in BLP_theta[K:] if p>0]).sum()

1.0177604791351837

In the Logit model we get the parameter estimates:

In [787]:
G_logit = x
for t in np.arange(T):
    G_logit[t][:,pr_index] = xexr[t] / xexr[t].max()

z_logit = G_logit

In [788]:
LogitBLP_beta = logit.LogitBLP_estimator(y, z_logit, x, np.ones((T,)))

In [789]:
LogitBLP_beta

array([-3.62007445, -5.2231269 ,  1.63482891, -0.29272541,  4.80219233,
       -7.07244195,  0.19039548,  0.35357351, -1.19227403, -0.74513781,
       -0.19538289, -0.53030469, -1.50554842, -0.80083257, -1.04325918,
       -1.6444787 , -1.07661701, -2.89393384, -0.94133483, -0.53954202,
       -1.08147988, -1.98547801, -2.9546429 , -2.76501592, -1.95331535,
       -1.68678532,  0.49380633, -1.87788536, -0.70656193, -0.67725803,
       -0.76844748, -0.54523664, -1.43066425, -1.11054745, -1.78965802,
       -2.80699212, -1.42887733, -3.82869218, -2.3875114 , -1.55711233,
       -0.61245025, -1.70484285, -1.35546094, -1.08931072, -0.65932002,
        1.54918688])

### BLP approximation to optimal instruments

BLP propose an algorithm for constructing an approximation to the optimal instruments. It is described in simple terms in Reynaert & Verboven (2014), and it has the following steps.
It requires an initial parameter estimator $\hat \theta = (\hat \beta', \hat \lambda')'$, here we can just usethe MLE we have already computed. Let $W_m$ denote the matrix of instruments (this is the matrix $X_m$ with the price replaced by the exchange rate). The steps are then as follows:

First we form the regression equation of the covariates on the instruments:
$$
X_m = W_m \Pi + E_m
$$

The OLS estimate is then given as:
$$
\hat \Pi = \left( \frac{1}{T}\sum_m W_m' W_m \right)^{-1}\left( \frac{1}{T}\sum_m W_m' X_m\right)
$$

Thus the predicted covariates given the instruments $W$ are:
$$
\hat X_m = W_m \hat \Pi
$$

Having constructed $\hat X_m$ (which consists of the exogenous regressors, and the predicted price given $W_m$), we compute the predicted mean utility:

$$
\hat u_m = \hat X_m \hat \beta
$$

and then the predicted market shares at the mean utility:

$$
\hat q_m^{*} = P(\hat u_m | \hat \lambda)
$$

Computationally, here we just use $\hat X_m$ in place of $X_m$ in the CCP function.
Given the predicted market shares, we compute

$$
\hat G_m^{*} = \left[\hat X_m, \nabla_{q,\lambda} \Omega (\hat q_m^{*} | \hat \lambda)\right]
$$

which is the same as the function $\hat G_m^0$ we already have constructed, except we evaluate it at the
predictions $\hat X_m$ and $\hat q_m^{*}$ instead of at $X_m$ and $\hat q_m^0$.

The procedure above gives an approximation to the optimal instruments. We also require a weight matrix. The optimal weight matrix is the (generalized) inverse of the conditional (on the instruments) covariance of the fixed effects. Assuming $\xi_{jm}$ is independetly and identically distributed over j and m, the conditional covariance simplifies to a scalar $\sigma^2$ times an identity matrix (of size $J_m$).
This means that all fixed effects are weighted equally, and the weights therefore drop out of the IV regression. The optimal IV estimator is therefore

$$
\hat \theta^{\text{IV}} = \left(\frac{1}{T}\sum_m (\hat G_m^*)'\hat G_m^0\right)^{-1}\left( \frac{1}{T}\sum_m (\hat G_m^*)'\hat r_m^0 \right)
$$

Let $\hat \xi^*$ denote the estimated residual evaluated at the new parameter estimates,

$$
\hat \xi_{mj}^* = \hat \xi_{mj}(\hat \theta^{\text{IV}})
$$

We may estimate the constant $\sigma^2$ by

$$
\hat \sigma^2 = \frac{1}{T}\sum_{m}\sum_{j = 1}^{J_m} \left(\hat \xi_{mj}^*\right)^2 
$$

The distribution of the estimator $\hat \theta^{\text{IV}}$ is then

$$
\hat \theta^{\text{IV}} \sim \mathcal{N}(\theta_0, \Sigma^{\text{IV}})
$$

which can be consistently estimated by

$$
\hat \Sigma^{\text{IV}} = \hat \sigma^2 \left( \sum_m (\hat G_m^*)'\hat G_m^0 \right)^{-1}
$$

and the standard errors are then the square root of the diagonal elements.

In [790]:
def predict_x(x, w, sample_share):
    ''' 
    '''
    T = len(w)
    K = w[0].shape[1]

    sWW = np.empty((T,K,K))
    sWX = np.empty((T,K,K))

    for t in np.arange(T):
        sWW[t,:,:] = sample_share[t]*np.einsum('jk,jl->kl', w[t], w[t])
        sWX[t,:,:] = sample_share[t]*np.einsum('jk,jl->kl', w[t], x[t])

    Pi_hat = la.solve(sWW.sum(axis=0), sWX.sum(axis=0))
    X_hat = {t: np.einsum('jl,lk->jk', w[t], Pi_hat) for t in np.arange(T)}

    return X_hat

In [792]:
def BLP_se(Theta, y, x, psi_stack, nest_count):
    ''' 
    '''
    T = len(x)
    S = T * np.array([x[t].shape[0] for t in np.arange(T)]).sum()

    G = G_array(y, x, psi_stack, nest_count)
    d = G[0].shape[1]
    r = {t: np.log(y[t]) for t in np.arange(T)}
    
    # We calculate \sigma^2
    xi = {t: np.einsum('jd,d->j', G[t], Theta) - r[t] for t in np.arange(T)}
    sum_xij2 = np.empty((T,))

    for t in np.arange(T):
        sum_xij2[t] = (xi[t]**2).sum()
    
    sigma2 = np.sum(sum_xij2) / S

    # We calculate GG for each market t
    GG = np.empty((T,d,d))

    for t in np.arange(T):
        GG[t,:,:] = np.einsum('jd,jp->dp', G[t], G[t])

    # Finally we compute \Sigma and the standard errors
    Sigma = sigma2*la.inv(GG.sum(axis=0))
    SE = np.sqrt(np.diag(Sigma))

    return SE

In [793]:
def OptimalBLP_estimator(Theta0, y, w, x, sample_share, psi_stack, nest_count):
    ''' 
    '''
    T = len(x)
    K = x[0].shape[1]

    beta0 = Theta0[:K]
    lambda0 = Theta0[K:]
    
    X_hat = predict_x(x, w, sample_share)
    q0 = IPDL_ccp(Theta0, X_hat, psi_stack, nest_count)
    G_star =G_array(q0, X_hat, psi_stack, nest_count)
    #G_star =G_array(y, w, psi_stack, nest_count)


    G0 = G_array(y, x, psi_stack, nest_count)
    
    #G_star=G0
    
    r = {t: np.log(y[t]) for t in np.arange(T)}

    d = G0[0].shape[1]

    sGG = np.empty((T,d,d))
    sGr = np.empty((T,d))

    for t in np.arange(T):
        sGG[t,:,:] = sample_share[t]*np.einsum('jd,jp->dp', G_star[t], G0[t])
        sGr[t,:] = sample_share[t]*np.einsum('jd,j->d', G_star[t], r[t])

    Theta_IV = la.solve(sGG.sum(axis=0), sGr.sum(axis=0))
    SE_IV = BLP_se(Theta_IV, y, x, psi_stack, nest_count)

    return Theta_IV, SE_IV

In [795]:
ThetaOptBLP, SEOptBLP = OptimalBLP_estimator(IPDL_theta, y, z_logit, x, np.ones((T,)), Psi, Nest_count)

In [796]:
G0 = G_array(y, x, Psi, Nest_count)
d = G0[0].shape[1]
r = {t: np.log(y[t]) for t in np.arange(T)}

# We calculate \sigma^2
xi = {t: np.einsum('jd,d->j', G0[t], ThetaOptBLP) - r[t] for t in np.arange(T)}
xi_np = np.empty((np.int64(J.sum()),))
index = J.cumsum()
for t in np.arange(T):
    if t == 0:
        xi_np[:index[t]] = xi[t]
    else:
        xi_np[index[t-1]:index[t]] = xi[t]

xi_np -= xi_np.mean() 

In [797]:
xi_np.mean()

-3.0896520741824117e-17

In [798]:
J.sum()

9199

In [799]:
np.array([p for p in ThetaOptBLP[K:]  if p > 0]).sum()

0.7979664676419304

In [800]:
-LogL(IPDL_theta, y, x, pop_share, Psi, Nest_count)

[0.20309070785106975, 0.024145447739504586]


0.024145447739504586

In [801]:
ThetaOptBLP[K:]

array([ 0.02404461,  0.04591335,  0.08334267,  0.15053443,  0.05053455,
        0.09686762,  0.04067728,  0.06649408,  0.01909169,  0.22046619,
       -0.23280802])

In [802]:
nest_vars

['cy', 'hp', 'we', 'le', 'wi', 'he', 'li', 'sp', 'ac', 'brand', 'home']

In [803]:
SEOptBLP[K:]

array([0.00079199, 0.00077638, 0.00082446, 0.00082639, 0.00074314,
       0.0005593 , 0.00068753, 0.0007575 , 0.00053441, 0.00091092,
       0.00108079])

In [804]:
S = T*np.array([x[t].shape[0] for t in np.arange(T)]).sum()
S

1379850

In [805]:
LogitBLP_beta[pr_index]

-0.19538288638001783

In [806]:
ThetaOptBLP[pr_index]

-0.05041510631794155

#### Multinomial Logit - for comparison

Estimating a Logit model via maximum likelihood with an initial guess of parameters $\hat \beta^0 = 0$ yields estimated parameters $\hat \beta^{\text{logit}}$ given as...

In [543]:
beta_0 = np.ones((K,))

# Estimate the model
res_logit = logit.estimate_logit(logit.q_logit, beta_0, y, x, sample_share=pop_share, Analytic_jac=True)

Optimization terminated successfully.
         Current function value: 0.001529
         Iterations: 46
         Function evaluations: 57
         Gradient evaluations: 57


In [544]:
logit_beta = res_logit['beta']
logit_score = logit.logit_score(logit_beta, y, x, pop_share) # maybe use 'logit.' functions from Logit_file instead of including e.g. standard errors in logit.estimate_logit function
logit_se = logit.logit_se(logit_score, N)
logit_t, logit_p = logit.logit_t_p(logit_beta, logit_score, N)
pd.DataFrame({'parameters': logit_beta, 'se' : logit_se, 't': logit_t, 'p': logit_p}, index = x_vars) # Our estimates

Unnamed: 0,parameters,se,t,p
in_out,-2.824034,2.9e-05,96210.1,0.0
cy,-0.002441,2.1e-05,115.2751,0.0
hp,-0.136166,2.2e-05,6325.787,0.0
we,-0.448615,2.2e-05,20682.07,0.0
le,-1.554939,2.1e-05,75064.63,0.0
wi,-1.880407,3e-05,62155.15,0.0
he,-2.264675,2.5e-05,90395.18,0.0
li,-0.663435,1.1e-05,58864.85,0.0
sp,-1.116575,2.6e-05,43021.37,0.0
ac,-0.652739,1e-05,65122.55,0.0


We then compute the corresponding Logit choice probabilities. STILL FIX THIS PART IN LOGIT BOOK!!!

In [545]:
logit_q = logit.logit_ccp(logit_beta, x)

We also find the elasticities and diversion ratios implied by the logit model as follows...

In [546]:
epsilon_logit = logit.logit_elasticity(logit_q, logit_beta, K-1) # Elasticities wrt. the price characteristic
DR_logit_hat = logit.logit_diversion_ratio(logit_q, logit_beta)

In [547]:
E_hat = IPDL_elasticity(q_hat, x, IPDL_theta, Psi, Nest_count)

For market $t=1$ the price elasticities are:

pd.DataFrame(E_hat[0]).rename_axis(columns = 'wrt. product', index = 'elasticity of product')

### Diversion ratios for the IPDL model

The diversion ratio to product j from product k is the fraction of consumers leaving product k and switching to product j following a one percent increase in the price of product k. Hence we have:

$$
\mathcal{D}_{jk}^i = -100 \cdot \frac{\partial P_j(u_i|\lambda) / \partial x_{ik\ell}}{\partial P_k(u_i|\lambda) / \partial x_{ik\ell}} = -100 \cdot \frac{\partial P_j(u_i|\lambda) / \partial u_{ik}}{\partial P_k(u_i|\lambda) / \partial u_{ik}}
$$

Where $\mathcal{D}^i = \left( \mathcal{D}_{jk}^i \right)_{j,k \in \{0,1,\ldots ,5\}}$ is the matrix of diversion ratios for individual i. This can be written more compactly as:

$$
\mathcal{D}^i = -100 \cdot  (\nabla_u P(u|\lambda) \circ I_J)^{-1}\nabla_u P(u|\lambda)
$$

In [548]:
def IPDL_diversion_ratio(q, x, Theta, psi_stack, nest_count):
    '''
    This function calculates diversion ratios from the IPDL model

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        Theta: a numpy array (K+G,) of parameters
        psi_stack: a dictionary of T numpy arrays (J[t] + sum(C_g),J[t]) of the J[t] by J[t] identity stacked on top of the \psi^g matrices for each market t as outputted by 'Create_nests'
        nest_count: a dictionary of T numpy arrays (G,) containing the amount of nests in each category g in each market t

    Returns
        Diversion_ratio: a dictionary of T numpy arrays (J,J) of diversion ratios from product j to product k for each individual i
    '''

    T = len(q.keys())

    Grad = ccp_gradient(q, x, Theta, psi_stack, nest_count) # Find the derivatives of ccp's wrt. utilities
    inv_diaggrad = {t: np.divide(1, np.diag(Grad[t]), out = np.zeros_like(np.diag(Grad[t])), where = (np.diag(Grad[t]) != 0)) for t in np.arange(T)}  # Compute the inverse of the 'own'-derivatives of ccp's
    DR = {t: np.multiply(-100, np.einsum('j,jk->jk', inv_diaggrad[t], Grad[t])) for t in np.arange(T)} # Compute diversion ratios as a hadamard product.
    
    return DR 

Calculating the implied diversion ratios $\mathcal{ D}^i$ from our estimates $\hat \theta^{\text{IPDL}}$, we find for market $t=1$:

In [549]:
DR_hat = IPDL_diversion_ratio(q_hat, x, IPDL_theta, Psi, Nest_count)

pd.DataFrame(DR_hat[0]).rename_axis(index = 'DR of products', columns = 'DR wrt. products')

In [550]:
DR_hat[0].sum(axis = 1).round(decimals = 8)

array([-102.21420624,   11.66516355,   85.92937549,  -13.21267058,
        -15.01285777,  -15.18617088,  -16.3434682 ,  -12.34547186,
        -36.1653491 ,  -20.31962142,    6.98589365, -162.86761639,
        -40.49628454,   15.60402234,  -30.33618596, -264.72963404,
        118.11701982,  -10.23483481,   28.2294045 ,  -39.19937787,
        -11.7471248 ,   14.45551088,  -47.95254862,  -13.84307058,
        -24.77048885, -907.77820789,  -37.59854995,   20.93792549,
         38.44868774,  -37.7905102 ,  -18.0676801 ,  -32.19858363,
        -12.90814536,  -32.11284049,  -45.70039748,   49.37834998,
          4.20532629, -133.27006306, -103.88631936,    1.66629638,
        -21.98431801,   -8.8069312 ,  -14.50843962,    9.20688017,
         11.05250475])

# Visualisation of elasticities and diversion ratios

We now compare the elasticities and the diversion ratios of the Logit and IPDL model. To clarify the interpretation of our results we will aggregate these according to the categorical variable `cla` describing the class or segment code of each vehicle. This variable takes values 'subcompact', 'compact', 'intermediate', 'standard', and 'luxury' encoded as the integers $1,\ldots, 5$. 

For all classes/segments $c,\ell \in \{1,\ldots, 5\}$ we calculate the change in the probability of class $c$, given as $q_c = \sum_j 1_{\{j\in c\}} q_j$, for a one unit increase in each of the utilities $u_j$ for products $j\in\ell$ i.e. we calculate the directional derivatives $\frac{\partial q_c}{\partial u_{\ell}}$. Then the price-to-income semi-elasticity of class $c$ wrt. class $\ell$ is given as $\bar E_{c\ell} = \frac{\partial q_c}{\partial u_\ell} \frac{1}{q_c} \beta_{\text{princ}}$. We use the fact that the directional derivative is calculated as $\frac{\partial q_c}{\partial u_{\ell}} = \sum_{j\in c} \sum_{k\in \ell} \frac{\partial q_j}{\partial u_k}$. In matrix notation this may be calulated as $\bar E = \psi^{\text{class}} \mathcal{E} {\psi^{\text{class}}}'$, where $\bar E = (\bar E_{c\ell})_{c,\ell = 1,\ldots,5}$ is the matrix of semi-elasticities between vehicle classes.

Psi_clafull, cla_descr, cla_count = Create_nests(dat[['cla', 'market', 'co']], 'market', 'co', ['cla'], outside_option=OO)

if OO:
    Psi_cla = {t: Psi_clafull[t][J[t]:, :] for t in np.arange(T)}
else:
    Psi_cla = {t: Psi_clafull[t][J[t]:, :] for t in np.arange(T)}
    
T_agg = Psi_cla[0].shape[0]

q_Logit_agg = {t: np.einsum('cj,j->c', Psi_cla[t], logit_q[t]) for t in np.arange(T)}
q_IPDL_agg = {t: np.einsum('cj,j->c', Psi_cla[t], q_hat[t]) for t in np.arange(T)}

Grad_Logit = {t: (np.diag(logit_q[t]) - np.einsum('j,k->jk', logit_q[t], logit_q[t])) for t in np.arange(T)}
Grad_IPDL = ccp_gradient(q_hat, x, IPDL_theta, Psi, Nest_count)

dq_dp_Logit_agg = {t: np.einsum('cj,jk,lk->cl', Psi_cla[t], Grad_Logit[t], Psi_cla[t])*logit_beta[K-1] for t in np.arange(T)}
dq_dp_IPDL_agg = {t: np.einsum('cj,jk,lk->cl', Psi_cla[t], Grad_IPDL[t], Psi_cla[t])*IPDL_theta[K-1] for t in np.arange(T)}

Logit_E_agg = {t:  np.einsum('c,cl->cl', 1./ q_Logit_agg[t], dq_dp_Logit_agg[t]) for t in np.arange(T)}
IPDL_E_agg = {t: np.einsum('c,cl->cl', 1./q_IPDL_agg[t], dq_dp_IPDL_agg[t]) for t in np.arange(T)}

E0, E1 = np.empty((T, T_agg, T_agg)), np.empty((T, T_agg, T_agg))
for t in np.arange(T):
    E0[t,:,:] = Logit_E_agg[t]
    E1[t,:,:] = IPDL_E_agg[t]

And we plot histograms of our results...

E0p = {j : (E0.reshape((T, T_agg**2))[:,j]).flatten() for j in np.arange(T_agg**2)} # Finds j'th entry in each of the elasticity matrices of individuals i.

j_pairs = iter.product(np.arange(T_agg), np.arange(T_agg))
num_bins = 25

fig, axes = plt.subplots(T_agg, T_agg)

for p, j in zip(j_pairs, np.arange(T_agg**2)):
    axes[p].hist(E0p[j], num_bins, range = (np.quantile(E0p[j], 0.10), np.quantile(E0p[j], 0.90)), color = 'r', alpha = 1) # Logit is blue
    axes[p].vlines(0, 0, 25, 'g', 'dotted')
    axes[p].get_xaxis().set_visible(False)
    axes[p].get_yaxis().set_visible(False)

fig.suptitle('Histograms of weigthed sums of Logit (red) and IPDL (blue) price elasticities by class')
fig.supxlabel('Weigthed sum of elasticities wrt. classes')
fig.supylabel('Weigthed sum of elasticities of classes')
fig.text(0.11, 0.8, '1', ha = 'center', va = 'center')
fig.text(0.11, 0.64, '2', ha = 'center', va = 'center')
fig.text(0.11, 0.48, '3', ha = 'center', va = 'center')
fig.text(0.11, 0.32, '4', ha = 'center', va = 'center')
fig.text(0.11, 0.16, '5', ha = 'center', va = 'center')
fig.text(0.2, 0.9, '1', ha = 'center', va = 'center')
fig.text(0.36, 0.9, '2', ha = 'center', va = 'center')
fig.text(0.52, 0.9, '3', ha = 'center', va = 'center')
fig.text(0.68, 0.9, '4', ha = 'center', va = 'center')
fig.text(0.84, 0.9, '5', ha = 'center', va = 'center')

plt.show()

E1p = {j : (E1.reshape((T, T_agg**2))[:,j]).flatten() for j in np.arange(T_agg**2)}

j_pairs = iter.product(np.arange(T_agg), np.arange(T_agg))
num_bins = 25

fig1, axes1 = plt.subplots(T_agg, T_agg)

for p, j in zip(j_pairs, np.arange(T_agg**2)):
    axes1[p].hist(E1p[j], num_bins, range = (np.quantile(E1p[j], 0.10), np.quantile(E1p[j], 0.90)), color = 'b', alpha = 1) # IPDL is blue
    axes1[p].vlines(0, 0, 25, 'red', 'dotted')
    axes1[p].get_xaxis().set_visible(False)
    axes1[p].get_yaxis().set_visible(False)

fig1.suptitle('Histograms of weigthed sums of Logit (red) and IPDL (blue) price elasticities by class')
fig1.supxlabel('Weigthed sum of elasticities wrt. classes')
fig1.supylabel('Weigthed sum of elasticities of classes')
fig1.text(0.11, 0.8, '1', ha = 'center', va = 'center')
fig1.text(0.11, 0.64, '2', ha = 'center', va = 'center')
fig1.text(0.11, 0.48, '3', ha = 'center', va = 'center')
fig1.text(0.11, 0.32, '4', ha = 'center', va = 'center')
fig1.text(0.11, 0.16, '5', ha = 'center', va = 'center')
fig1.text(0.2, 0.9, '1', ha = 'center', va = 'center')
fig1.text(0.36, 0.9, '2', ha = 'center', va = 'center')
fig1.text(0.52, 0.9, '3', ha = 'center', va = 'center')
fig1.text(0.68, 0.9, '4', ha = 'center', va = 'center')
fig1.text(0.84, 0.9, '5', ha = 'center', va = 'center')

plt.show()

#### The mean elasticities for the logit model are given as...

pd.DataFrame(E0.mean(axis = 0)).rename_axis(columns = 'Mean elasticity wrt. product', index = 'Mean elasticity of product')

#### For IPDL the mean elasticities are...

pd.DataFrame(E1.mean(axis = 0)).rename_axis(columns = 'Mean elasticity wrt. product', index = 'Mean elasticity of product')

### Diversion ratios

We now visualize the implied diversion ratios $\mathcal{D}$. If $\bar D_{c\ell}$ denotes the sum of choice probability weigthed diversion ratios, then we have as above that $\bar D_{c\ell} = \sum_{j}\sum_{k} \mathrm{1}_{\{j\in c\}} \mathrm{1}_{\{k\in \ell\}} q_j q_k \mathcal{D}_{jk}$ i.e. more generally $\bar D = (\psi^{\text{class}} \circ q) \mathcal{D} (\psi^{\text{class}} \circ q).'$

Logit_D_agg = {t: -100*np.einsum('c,cl->cl', 1./np.diag(dq_dp_Logit_agg[t]), dq_dp_Logit_agg[t]) for t in np.arange(T)}
IPDL_D_agg = {t: -100*np.einsum('c,cl->cl', 1./np.diag(dq_dp_IPDL_agg[t]), dq_dp_IPDL_agg[t]) for t in np.arange(T)}

D0, D1 = np.empty((T, T_agg, T_agg)), np.empty((T, T_agg, T_agg))
for t in np.arange(T):
    D0[t,:,:] = Logit_D_agg[t]
    D1[t,:,:] = IPDL_D_agg[t]

D0p = {j : (D0.reshape((T, T_agg**2))[:,j]).flatten() for j in np.arange(T_agg**2)} # Finds j'th entry in each of the elasticity matrices of individuals i.

j_pairs = iter.product(np.arange(T_agg), np.arange(T_agg))
num_bins = 25

fig, axes = plt.subplots(T_agg, T_agg)

for p, j in zip(j_pairs, np.arange(T_agg**2)):
    axes[p].hist(D0p[j], num_bins, range = (np.quantile(D0p[j], 0.10), np.quantile(D0p[j], 0.90)), color = 'r', alpha = 1) # Logit is red
    axes[p].vlines(0, 0, 25, 'g', 'dotted')
    axes[p].get_xaxis().set_visible(False)
    axes[p].get_yaxis().set_visible(False)

fig.suptitle('Histograms of weigthed sums of Logit (red) and IPDL (blue) price diversion ratios by class')
fig.supxlabel('Weigthed sum of diversion ratios wrt. classes')
fig.supylabel('Weigthed sum of diversion ratios of classes')
fig.text(0.11, 0.8, '1', ha = 'center', va = 'center')
fig.text(0.11, 0.64, '2', ha = 'center', va = 'center')
fig.text(0.11, 0.48, '3', ha = 'center', va = 'center')
fig.text(0.11, 0.32, '4', ha = 'center', va = 'center')
fig.text(0.11, 0.16, '5', ha = 'center', va = 'center')
fig.text(0.2, 0.9, '1', ha = 'center', va = 'center')
fig.text(0.36, 0.9, '2', ha = 'center', va = 'center')
fig.text(0.52, 0.9, '3', ha = 'center', va = 'center')
fig.text(0.68, 0.9, '4', ha = 'center', va = 'center')
fig.text(0.84, 0.9, '5', ha = 'center', va = 'center')

plt.show()

D1p = {j : (D1.reshape((T, T_agg**2))[:,j]).flatten() for j in np.arange(T_agg**2)}

j_pairs = iter.product(np.arange(T_agg), np.arange(T_agg))
num_bins = 25

fig, axes = plt.subplots(T_agg, T_agg, sharex=False, sharey=False)

for p, j in zip(j_pairs, np.arange(T_agg**2)):
    axes[p].hist(D1p[j], num_bins, range = (np.quantile(D1p[j], 0.10), np.quantile(D1p[j], 0.90)), color = 'b', alpha = 1) # IPDL is blue
    axes[p].vlines(0, 0, 25, 'red', 'dotted')
    axes[p].get_xaxis().set_visible(False)
    axes[p].get_yaxis().set_visible(False)

fig.suptitle('Histograms of weigthed sums of Logit (red) and IPDL (blue) price diversion ratios by class')
fig.supxlabel('Weigthed sum of diversion ratios wrt. classes')
fig.supylabel('Weigthed sum of diversion ratios of classes')
fig.text(0.11, 0.8, '1', ha = 'center', va = 'center')
fig.text(0.11, 0.64, '2', ha = 'center', va = 'center')
fig.text(0.11, 0.48, '3', ha = 'center', va = 'center')
fig.text(0.11, 0.32, '4', ha = 'center', va = 'center')
fig.text(0.11, 0.16, '5', ha = 'center', va = 'center')
fig.text(0.2, 0.9, '1', ha = 'center', va = 'center')
fig.text(0.36, 0.9, '2', ha = 'center', va = 'center')
fig.text(0.52, 0.9, '3', ha = 'center', va = 'center')
fig.text(0.68, 0.9, '4', ha = 'center', va = 'center')
fig.text(0.84, 0.9, '5', ha = 'center', va = 'center')

plt.show()

#### We also calculate the mean diversion ratios within each class. For the Logit model these are given as...

pd.DataFrame(D0.mean(axis = 0)).rename_axis(columns = 'Mean diversion ratio wrt. product', index = 'Mean diversion ratio of product')

#### For the IPDL model the mean diversion ratios are...

pd.DataFrame(D1.mean(axis = 0)).rename_axis(columns = 'Mean diversion ratio wrt. product', index = 'Mean diversion ratio of product')

LR = 2*(IPDL_loglikelihood(IPDL_theta, y, x, pop_share, Psi, Nest_count).sum() - logit.logit_loglikehood(logit_beta, y, x, pop_share).sum())

LR

scstat.chi2.sf(LR, df = G)

We find the corresponding choice probabilities implied by the MLE $\hat \theta$.

In [551]:
q_hat = IPDL_ccp(IPDL_theta, x, Psi, Nest_count)

  log_psiq =  np.log(psi_q) # Add Epsilon? to avoid zeros in log np.log(np.abs(gamma_q), out = np.NINF*np.ones_like(gamma_q), where = (np.abs(gamma_q) > 0))


For market $t=1$ the choice probabilites $\hat q_t$ are: 

We also find the IPDL price elasticities $\mathcal{E}$: