# Estimation of the Similarity Model

This notebook discusses two methods for estimating the parameters of the Similarity model: the Maximum Likelihood Estimator and the 'FKN' Estimator of Fosgerau et al. (2023). We implement both on the publicly available data on the European car market from Frank Verboven's website at https://sites.google.com/site/frankverbo/data-and-software/data-set-on-the-european-car-market.

### Maximum Likelihood Estimation of Similarity

Suppose we want to estimate the parameters $\theta = (\beta', \lambda')'$, where $\beta$ is a vector of characteristics parameters and $\lambda$ is a vector of nesting parameters. The log-likelihood contribution of market $t$ in the Similarity Model is then:
$$
\ell_t(\theta)=y_t'\ln P_t(u_t|\theta),
$$
and an estimation routine must therefore have a function that - given $\mathbf{X}_t$ and $\theta$ - calculates $u_t=\mathbf{X}_t\beta$ and then calls the fixed point routine described in .... That routine will return the choice probabilities $P_t(u_t|\theta)$, and we can then evaluate $\ell_t(\theta)$.
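
As a minimal sketch of this evaluation step, the snippet below computes $\ell_t$ for a single market. It assumes plain Logit (softmax) probabilities as a stand-in for the Similarity fixed point routine; the function name `loglik_contribution` and the toy data are hypothetical and not part of the notebook's code.

```python
import numpy as np

def loglik_contribution(y_t, X_t, beta):
    """Evaluate y_t' ln P_t(u_t) for one market.

    Assumption: P_t is the Logit (softmax) probability vector, used here only
    as a stand-in for the Similarity fixed point routine.
    """
    u_t = X_t @ beta                          # utilities u_t = X_t beta
    u_t = u_t - u_t.max()                     # stabilise the exponentials
    log_p = u_t - np.log(np.exp(u_t).sum())   # log of the softmax probabilities
    return y_t @ log_p

X_t = np.array([[1.0, 0.5], [1.0, -0.2], [0.0, 0.0]])  # toy covariates (J=3, K=2)
beta = np.array([0.3, -1.0])                            # toy parameters
y_t = np.array([0.5, 0.3, 0.2])                         # toy market shares
ll_t = loglik_contribution(y_t, X_t, beta)
```

Replacing the softmax line with a call to the actual fixed point routine would give the Similarity contribution; the rest of the computation is unchanged.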

In addition, when maximizing the likelihood we want the derivatives at some $\theta=(\beta',\lambda')'$. Let $q_t= P_t(u_t|\theta)$; then the derivative of the log choice probabilities wrt. parameters is given by,
$$
\nabla_\theta \ln P_t(u_t|\theta)=\mathrm{diag}(q_t)^{-1}\left(\nabla_{qq}^2\Omega_t(q_t|\lambda)^{-1}-q_tq_t' \right)\left[\mathbf{X}_t,-\nabla_{q,\lambda}^2 \Omega_t(q_t|\lambda)\right]
$$
Note that the product of the first two factors is the elasticity $\nabla_u \ln P_t(u_t|\theta)$, and the last factor is a block matrix of size $J_t\times (K+G)$. The cross derivative of the Similarity Perturbation Function (see e.g. ...) $\nabla_{q,\lambda}^2 \Omega_t(q_t|\lambda)$ in the latter may be computed using the identity $\nabla_{q,\lambda}^2 \Omega_t(q_t|\lambda)_g = -\ln(q_t) + (\Psi^g)' \ln(\Psi^g q_t) - \varphi^g$ for each column $g=1,\ldots,G$. The derivative of the log-likelihood function can then be obtained from this as
$$
\nabla_\theta \ell_t(\theta)= y_t' \left(\nabla_\theta \ln P_t(u_t|\theta)\right)
$$
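
The elasticity factor can be sanity-checked in the Logit special case. The sketch below assumes $\Omega(q)=q'\ln q$ (no nesting), so that $\nabla^2_{qq}\Omega = \mathrm{diag}(q)^{-1}$, and compares $\mathrm{diag}(q)^{-1}\left(\nabla^2_{qq}\Omega^{-1}-qq'\right)$ with central finite differences of $\ln P(u)$. The helper `log_softmax` and the toy utilities are hypothetical.

```python
import numpy as np

def log_softmax(u):
    """Log of the Logit choice probabilities for a utility vector u."""
    u = u - u.max()
    return u - np.log(np.exp(u).sum())

u = np.array([0.2, -0.5, 1.0, 0.0])   # toy utilities
q = np.exp(log_softmax(u))            # Logit choice probabilities

# Assumption (Logit case): Omega(q) = q' ln q, so its Hessian is diag(q)^{-1}
hess = np.diag(1.0 / q)
analytic = (np.linalg.inv(hess) - np.outer(q, q)) / q[:, None]  # diag(q)^{-1}(Hess^{-1} - qq')

# Central finite differences of ln P wrt. each utility u_k
h = 1e-6
numeric = np.empty((u.size, u.size))
for k in range(u.size):
    e = np.zeros_like(u)
    e[k] = h
    numeric[:, k] = (log_softmax(u + e) - log_softmax(u - e)) / (2 * h)

max_abs_diff = np.abs(analytic - numeric).max()
```

In this special case the analytic expression reduces to $I - \mathbf{1}q'$, which is a quick way to see what the general formula should collapse to when all nesting parameters are zero.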

In [None]:
def compute_pertubation_hessian(q, x, Theta, model):
    '''
    This function calculates the Hessian of the perturbation function \Omega

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        Theta: a numpy array (K+G,) of parameters
        model: a dictionary of the Similarity Model specification as outputted by 'Similarity_specification'
    
    Returns
        Hess: a dictionary of T numpy arrays (J[t],J[t]) of second partial derivatives of the pertubation function \Omega for each market t
    '''
    psi = model['psi']
    T = len(q.keys())
    K = x[0].shape[1]

    Gamma = Create_Gamma(Theta[K:], model) # Find the \Gamma matrices 
    
    Hess={}
    for t in np.arange(T):
        psi_q = np.einsum('cj,j->c', psi[t], q[t]) # Compute a matrix product
        Hess[t] = np.einsum('cj,c,cl->jl', Gamma[t], 1/psi_q, psi[t], optimize=True) # Computes the product \Gamma' diag(\psi q)^{-1} \psi (but faster)
        
    return Hess

In [None]:
def ccp_gradient(q, x, Theta, model):
    
    '''
    This function calculates the gradient of the choice probabilities wrt. utilities

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        Theta: a numpy array (K+G,) of parameters
        model: a dictionary of the Similarity Model specification as outputted by 'Similarity_specification'
    
    Returns
        Grad: a dictionary of T numpy arrays (J[t],J[t]) of partial derivatives of the choice probabilities wrt. utilities for each market t
    '''

    T = len(q.keys())
    Grad = {}
    Hess = compute_pertubation_hessian(q, x, Theta, model) # Compute the hessian of the pertubation function

    for t in np.arange(T):
        inv_omega_hess = la.inv(Hess[t]) # (J,J) for each t=1,...,T , computes the inverse of the Hessian
        qqT = q[t][:,None]*q[t][None,:] # (J,J) outerproduct of ccp's for each market t
        Grad[t] = inv_omega_hess - qqT  # Compute Similarity gradient of ccp's wrt. utilities

    return Grad

In [None]:
def Similarity_u_grad_Log_ccp(q, x, Theta, model):
    '''
    This function calculates the gradient of the log choice probabilities wrt. utilities

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        Theta: a numpy array (K+G,) of parameters
        model: a dictionary of the Similarity Model specification as outputted by 'Similarity_specification'
    
    Returns
        Epsilon: a dictionary of T numpy arrays (J[t],J[t]) of partial derivatives of the log choice probabilities of products j wrt. utilities of products k for each market t
    '''

    T = len(q.keys())
    Epsilon = {}
    Grad = ccp_gradient(q, x, Theta, model) # Find the gradient of ccp's wrt. utilities
    
    for t in np.arange(T):
        Epsilon[t] = Grad[t]/q[t][:,None] # Computes diag(q)^{-1}Grad[t]

    return Epsilon

In [None]:
def Similarity_elasticity(q, x, Theta, model, char_number = K-1):
    ''' 
    This function calculates the semi-elasticity of choice probabilities wrt. any characteristic or nest grouping of products

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        Theta: a numpy array (K+G,) of parameters
        model: a dictionary of the Similarity Model specification as outputted by 'Similarity_specification'
        char_number: an integer which is an index of the parameter in theta wrt. which we wish to calculate the semi-elasticity. Default is the index for the parameter of 'pr'.

    Returns
        a dictionary of T numpy arrays (J[t],J[t]) of choice probability semi-elasticities for each market t
    '''
    T = len(q.keys())
    Epsilon = {}
    Grad = Similarity_u_grad_Log_ccp(q, x, Theta, model) # Find the gradient of log ccp's wrt. utilities

    for t in np.arange(T):
        Epsilon[t] = Grad[t]*Theta[char_number] # Calculate semi-elasticities

    return Epsilon

In [None]:
def Similarity_loglikelihood(Theta, y, x, sample_share, model):
    ''' 
    This function computes the loglikelihood contribution for each market t.
    
    Args.
        Theta: a numpy array (K+G,) of parameters of (\beta', \lambda')',
        y: a dictionary of T numpy arrays (J[t],) of observed market shares in onehot encoding for each market t,
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t,
        sample_share: a numpy array (T,) of the share of observations of each market t = 1,...,T
        model: a dictionary of the Similarity Model specification as outputted by 'Similarity_specification'

    Output
        ll: a numpy array (T,) of Similarity loglikelihood contributions
    '''

    T = len(x.keys())
    K = x[0].shape[1]
    ccp_hat = Similarity_ccp(Theta, x, model) # Find Similarity choice probabilities
    sum_lambdaplus = np.array([theta for theta in Theta[K:] if theta >0]).sum() # Get sum of positive Lambda's
    
    ll=np.empty((T,))
    for t in np.arange(T):
        ll[t] = sample_share[t]*(y[t].T@np.log(ccp_hat[t])) # np.einsum('j,j', y[t], np.log(ccp_hat[t], out = -np.inf*np.ones_like(ccp_hat[t]), where = (ccp_hat[t] > 0)))

    print([sum_lambdaplus, -ll.mean()])

    return ll

In [None]:
def q_Similarity(Theta, y, x, sample_share, model):
    ''' The negative loglikelihood criterion to minimize
    '''
    Q = -Similarity_loglikelihood(Theta, y, x, sample_share, model)
    
    return Q

We also implement the derivative of the loglikelihood wrt. parameters $\nabla_\theta \ell_t(\theta)$.

In [None]:
def cross_grad_pertubation(q, model):
    ''' 
    This function calculates the cross differential of the perturbation function \Omega wrt. first ccp's and then the lambda parameters

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        model: a dictionary of the Similarity Model specification as outputted by 'Similarity_specification'
    
    Returns
        Z: a dictionary of T numpy arrays (J[t],G) of cross differentials of the perturbation function \Omega wrt. first ccp's and then the lambda parameters
    '''

    # Get psidim object from the model specification
    psidim = model['psi_3d']
    phi = model['phi']

    # Get the amount of markets
    T = len(q.keys())

    # Initialize Z; the cross differential of the pertubation function
    Z = {}
    
    for t in np.arange(T):
        log_q = np.log(q[t], out = -np.inf*np.ones_like(q[t]), where = (q[t] > 0))
        psidim_t = psidim[t][1:,:,:] # Get each of the Psi^g nesting matrices
        psiq = psidim_t @ q[t] # Computes a matrix product
        log_psiq = np.log(psiq, out = -np.inf*np.ones_like(psiq), where = (psiq > 0)) # -np.inf replaces np.NINF, which was removed in NumPy 2.0
        Z[t] = - log_q[:,None] + np.einsum('dkj,dk->jd', psidim_t, log_psiq) - phi[t] # Compute cross differential
    
    return Z

In [None]:
def cross_grad_pertubation_old(q, psi, phi):
    ''' 
    This function calculates the cross differential of the perturbation function \Omega wrt. first ccp's and then the lambda parameters

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        psi: a dictionary of T numpy arrays (J[t] + sum(C_g),J[t]) of the J[t] by J[t] identity stacked on top of the \psi^g matrices for each market t as outputted by 'Create_nests'
        phi: a dictionary of T numpy arrays (J[t],G) of the phi^g terms that are subtracted in the cross differential for each market t
    
    Returns
        Z: a dictionary of T numpy arrays (J[t],G) of cross differentials of the perturbation function \Omega wrt. first ccp's and then the lambda parameters
    '''

    T = len(q.keys())
    J = np.array([q[t].shape[0] for t in np.arange(T)])
    G = np.int32((psi[0].shape[0] / J[0]) - 1)
    
    Z = {}

    for t in np.arange(T):
        log_q = np.log(q[t], out = -np.inf*np.ones_like(q[t]), where = (q[t] > 0))
        Psi_t = psi[t]
        Z_t = np.empty((J[t], G))

        for g in np.arange(1,G+1):
            Psi_d = Psi_t[g*J[t]:(g+1)*J[t],:]
            Psiq = np.einsum('cj,j->c', Psi_d, q[t])
            log_psiq = np.log(Psiq, out = -np.inf*np.ones_like(Psiq), where = (Psiq > 0))
            Z_t[:,g-1] = -log_q + np.einsum('cj,c->j', Psi_d, log_psiq) - phi[t][:,g-1]

        Z[t] = Z_t
    
    return Z

In [None]:
def Similarity_theta_grad_log_ccp(Theta, x, model):
    '''
    This function calculates the derivative of the Similarity log ccp's wrt. parameters theta

    Args.
        Theta: a numpy array (K+G,) of parameters of (\beta', \lambda')',
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t,
        model: a dictionary of the Similarity Model specification as outputted by 'Similarity_specification'
    Returns
        Grad: a dictionary of T numpy arrays (J[t],K+G) of derivatives of the Similarity log ccp's wrt. parameters theta for each market t
    '''

    T = len(x.keys())

    q = Similarity_ccp(Theta, x, model) # Find choice probabilities

    Z = cross_grad_pertubation(q, model) # Find cross differentials of the pertubation function
    u_grad = Similarity_u_grad_Log_ccp(q, x, Theta, model)  # Find the gradient of log ccp's wrt. utilities
    Grad={}

    for t in range(T):
        G = np.concatenate((x[t], -Z[t]), axis = 1) # Compute the block matrix
        Grad[t] = u_grad[t] @ G # Compute the derivative wrt. parameters

    return Grad

In [None]:
def Similarity_score(Theta, y, x, sample_share, model):
    '''
    This function calculates the score of the Similarity loglikelihood.

    Args.
        Theta: a numpy array (K+G,) of parameters of (\beta', \lambda')',
        y: a dictionary of T numpy arrays (J[t],) of observed market shares in onehot encoding for each market t,
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t,
        sample_share: a numpy array (T,) of the share of observations of each market t = 1,...,T
        model: a dictionary of the Similarity Model specification as outputted by 'Similarity_specification'

    Returns
        Score: a numpy array (T,K+G) of Similarity scores
    '''
    T = len(x.keys())

    log_ccp_grad = Similarity_theta_grad_log_ccp(Theta, x, model) # Find derivatives of Similarity log ccp's wrt. parameters theta
    D = len(Theta) # equal to K+G
    Score = np.empty((T,D))
    
    for t in np.arange(T):
        Score[t,:] =sample_share[t]*(log_ccp_grad[t].T@y[t]) #np.einsum('j,jd->d', y[t], log_ccp_grad[t]) # Computes a matrix product

    return Score

In [None]:
def Similarity_score_unweighted(Theta, y, x, model):
    '''
    This function calculates the score of the Similarity loglikelihood.

    Args.
        Theta: a numpy array (K+G,) of parameters of (\beta', \lambda')',
        y: a dictionary of T numpy arrays (J[t],) of observed market shares in onehot encoding for each market t,
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t,
        model: a dictionary of the Similarity Model specification as outputted by 'Similarity_specification'

    Returns
        Score: a numpy array (T,K+G) of Similarity scores
    '''
    T = len(x.keys())

    log_ccp_grad = Similarity_theta_grad_log_ccp(Theta, x, model) # Find derivatives of Similarity log ccp's wrt. parameters theta
    D = log_ccp_grad[0].shape[1] # equal to K+G
    Score = np.empty((T,D))
    
    for t in np.arange(T):
        Score[t,:] = log_ccp_grad[t].T@y[t] #np.einsum('j,jd->d', y[t], log_ccp_grad[t]) # Computes a matrix product

    return Score

In [None]:
def q_Similarity_score(Theta, y, x, sample_share, model):
    ''' The derivative of the negative loglikelihood criterion
    '''
    return -Similarity_score(Theta, y, x, sample_share, model)

### Testing the score function

To check that the analytical score is implemented correctly, and hence that the optimization procedure works as intended, we may calculate the Euclidean norm of the difference between the numerical (finite-difference) and our analytical gradients of the log-likelihood contributions $\ell_t(\theta)$ at some parameter $\hat \theta^0$.
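
The same kind of check can be illustrated on a small self-contained problem. The sketch below uses a Logit log-likelihood for one market as a stand-in for the Similarity one and central rather than forward differences; the names `toy_loglik`, `toy_score` and `central_diff_grad` are hypothetical and the data are simulated.

```python
import numpy as np

def toy_loglik(beta, X, y):
    """Logit log-likelihood for one market (a stand-in for the Similarity one)."""
    u = X @ beta
    u = u - u.max()
    return y @ (u - np.log(np.exp(u).sum()))

def toy_score(beta, X, y):
    """Analytic gradient X'(y - q), valid when the shares y sum to one."""
    u = X @ beta
    q = np.exp(u - u.max())
    q /= q.sum()
    return X.T @ (y - q)

def central_diff_grad(f, theta, delta=1e-6):
    """Numerical gradient of a scalar function f by central differences."""
    grad = np.empty_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = delta
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * delta)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))        # simulated covariates (J=5, K=3)
y = rng.dirichlet(np.ones(5))      # simulated market shares summing to one
beta = rng.normal(size=3)

num = central_diff_grad(lambda b: toy_loglik(b, X, y), beta)
normdiff = np.linalg.norm(toy_score(beta, X, y) - num)
```

Central differences have an error of order `delta**2` instead of `delta`, so a somewhat larger step can be used than with forward differences, which reduces the impact of rounding error.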

In [None]:
def test_analyticgrad(y, x, theta, sample_share, model, delta = 1.0e-8):
    ''' 
    This function calculates the numerical and the analytical score functions at a given parameter vector theta as well as the norm of their difference

    Args:
        theta: a numpy array (K+G,) of parameters of (\beta', \lambda')',
        y: a dictionary of T numpy arrays (J[t],) of observed market shares in onehot encoding for each market t,
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t,
        sample_share: a numpy array (T,) of the share of observations of each market t = 1,...,T
        model: a dictionary of the Similarity Model specification as outputted by 'Similarity_specification'
        delta: the incremental change in the argument, a float, used in calculating numerical gradients

    Returns.
        normdiff: a float of the euclidean norm of the difference between the numerical and analytical score functions at \theta
        angrad: a numpy array (T,K+G) of analytical Similarity scores
        numgrad: a numpy array (T,K+G) of numerical Similarity scores
    '''

    T = len(x)
    K = x[0].shape[1]
    G = len(theta[K:])

    numgrad = np.empty((T, K+G))

    for i in np.arange(K+G):
        vec = np.zeros((K+G,))
        vec[i] = 1
        numgrad[:,i] = (Similarity_loglikelihood(theta + delta*vec, y, x, sample_share, model) - Similarity_loglikelihood(theta, y, x, sample_share, model)) / delta

    angrad = Similarity_score(theta, y, x, sample_share, model)

    normdiff = la.norm(angrad - numgrad)
    
    return normdiff, angrad, numgrad

diff, an, num = test_analyticgrad(y, x, theta0, pop_share, Model)
diff

## Standard errors in Maximum Likelihood estimation

As usual, we may consistently estimate the covariance matrix of the Similarity maximum likelihood estimator $\hat \theta^{\text{MLE}}$ by the inverse of the outer product of scores estimate of the information matrix:

$$
\hat \Sigma = \left( \sum_{t=1}^T s_t\nabla_\theta \ell_t \left(\hat \theta^{\text{MLE}}\right) \nabla_\theta \ell_t \left(\hat \theta^{\text{MLE}}\right)' \right)^{-1}
$$

where $s = (s_1, \ldots, s_T)'$ is a vector of the fractions of total observations observed in each market $t$. Accordingly, we may find the estimated standard error of parameter $d = 1,\ldots,K+G$ as the square root of the $d$'th diagonal entry of $\hat \Sigma$ divided by the number of observations $N$:

$$
\hat \sigma_d = \sqrt{\hat \Sigma_{dd} / N}
$$
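
To make the computation concrete, the sketch below builds $\hat \Sigma$ from a simulated score matrix and takes the square roots of its diagonal. The scores, shares and sizes are made up for illustration; the division by $N$ mirrors what the `Similarity_se` function in this notebook does.

```python
import numpy as np

rng = np.random.default_rng(1)
T, D, N = 30, 4, 1000                 # toy sizes (hypothetical)
score = rng.normal(size=(T, D))       # stand-in for the per-market scores
s = rng.dirichlet(np.ones(T))         # shares of observations per market, summing to one

# Weighted outer product of scores: sum_t s_t * score_t score_t'
info = np.einsum('td,tm->dm', s[:, None] * score, score)
Sigma = np.linalg.inv(info)           # estimated covariance matrix
se = np.sqrt(np.diag(Sigma) / N)      # standard errors, one per parameter
```

The `einsum` call is equivalent to summing `s[t] * np.outer(score[t], score[t])` over markets, but avoids the explicit loop.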

In [None]:
def Similarity_se(score, sample_share, N):
    '''
    This function computes the asymptotic standard errors of the MLE.

    Args.
        score: a numpy array (T,K+G) of Similarity scores as outputted by the function 'Similarity_score_unweighted'.
        sample_share: a numpy array (T,) of the share of observations of each market t = 1,...,T
        N: an integer giving the number of observations

    Returns
        SE: a numpy array (K+G,) of asymptotic Similarity MLE standard errors
    '''

    SE = np.sqrt(np.diag(la.inv(np.einsum('td,tm->dm', sample_share[:,None]*score, score))) / N) # Compute standard errors by taking the squareroot of the diagonal elements of the variance estimate

    return SE

In [None]:
def Similarity_t_p(SE, Theta, N, Theta_hypothesis = 0):
    ''' 
    This function calculates t statistics and p values for characteristic and nest grouping parameters

    Args.
        SE: a numpy array (K+G,) of asymptotic Similarity MLE standard errors
        Theta: a numpy array (K+G,) of parameters of (\beta', \lambda')',
        N: an integer giving the number of observations
        Theta_hypothesis: a (K+G,) array or integer of parameter values to test in t-test. Default value is 0.
    
    Returns
        T: a (K+G,) array of estimated t tests
        p: a (K+G,) array of estimated asymptotic p values computed using the above t-tests
    '''

    T = np.abs(Theta - Theta_hypothesis) / SE # Compute two-sided t-tests
    p = 2*scstat.t.sf(T, df = N-1) # Compute p-values

    return T,p

In [None]:
def estimate_Similarity(f, Theta0, y, x, sample_share, model, N, Analytic_jac:bool = True, options = {'disp': True}, **kwargs):
    ''' 
    Takes a function and returns the minimum, given starting values and variables necessary in the Similarity Model specification.

    Args:
        f: a function to minimize,
        Theta0 : a numpy array (K+G,) of initial guess parameters (\beta', \lambda')',
        y: a dictionary of T numpy arrays (J[t],) of observed market shares in onehot encoding for each market t,
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t,
        sample_share: a numpy array (T,) of the share of observations of each market t = 1,...,T,
        model: a dictionary of the Similarity Model specification as outputted by 'Similarity_specification',
        N: an integer giving the number of observations,
        Analytic_jac: a boolean. Default value is 'True'. If 'True' the analytic jacobian of the Similarity loglikelihood function is used in estimation. Else the numerical jacobian is used.
        options: dictionary with options for the optimizer (e.g. disp=True which tells it to display information at termination.)
    
    Returns:
        res: a dictionary with results from the estimation.
    '''

    # The objective function is the average of q(), 
    # but Q is only a function of one variable, theta, 
    # which is what minimize() will expect
    Q = lambda Theta: np.mean(f(Theta, y, x, sample_share, model))

    if Analytic_jac:
        Grad = lambda Theta: np.mean(q_Similarity_score(Theta, y, x, sample_share, model), axis=0) # Finds the Jacobian of Q. Takes mean of criterion q derivatives along axis=0, i.e. the mean across markets.
    else:
        Grad = None

    # call optimizer
    result = optimize.minimize(Q, Theta0.tolist(), options=options, jac=Grad, **kwargs) # optimize.minimize takes a list of parameters Theta0 (not a numpy array) as initial guess.
    
    # Find estimated standard errors, t-tests, and p-values
    se = Similarity_se(Similarity_score_unweighted(result.x, y, x, model), sample_share, N) # Calculate standard errors using the unweighted score contributions from each market
    T,p = Similarity_t_p(se, result.x, N)

    # collect output in a dict 
    res = {
        'theta': result.x,
        'se': se,
        't': T,
        'p': p,
        'success':  result.success, # bool, whether convergence was successful
        'nit':      result.nit, # no. algorithm iterations 
        'nfev':     result.nfev, # no. function evaluations 
        'fun':      result.fun # function value at termination 
    }

    return res

We then estimate the model using the corresponding MLE Logit parameters $\hat \beta^{\text{Logit}}$ and nesting parameters $\hat \lambda^0 = (0,\ldots,0)'$ as initial parameters.

In [None]:
beta_0 = np.ones((K,))

# Estimate the model
Logit_beta = estimate_logit(q_logit, beta_0, y, x, sample_share=pop_share, Analytic_jac=True)['beta']
Logit_SE = logit_se(logit_score_unweighted(Logit_beta, y, x), pop_share, N)
Logit_t, Logit_p = logit_t_p(Logit_beta, logit_score_unweighted(Logit_beta, y, x), pop_share, N)

# Initialize \theta^0
theta0 = np.append(Logit_beta,lambda0)

Optimization terminated successfully.
         Current function value: 0.001529
         Iterations: 26
         Function evaluations: 34
         Gradient evaluations: 34


In [None]:
resMLE = estimate_Similarity(q_Similarity, theta0, y, x, pop_share, Model, N)

[0.0, 0.001528535134161618]
[3.7176505867959405e-06, 0.001528530844629761]
[1.8588252933979702e-05, 0.001528514281678381]
[7.807066232271476e-05, 0.0015284575876433064]
[0.0002888968393993265, 0.0015283779712649036]
[0.0003156820978933802, 0.0015283756589398688]
[0.0004228231318695951, 0.001528366650756558]
[0.0008514977339305401, 0.0015283344814752554]


[0.002552430433766674, 0.0015282508819820435]
[0.011407411931886541, 0.001528002344870776]
[0.0336266372510129, 0.001527432169794866]
[0.07652390229927372, 0.0015264248802469536]
[0.1598061804339339, 0.0015247252520192492]
[0.3261793453534204, 0.0015224054228309036]
[0.41714512626147443, 0.0015215190856618535]
[0.43667551144630096, 0.0015212756051922887]
[0.4294172691651903, 0.0015212519744114967]
Optimization terminated successfully.
         Current function value: 0.001521
         Iterations: 11
         Function evaluations: 17
         Gradient evaluations: 17


In [None]:
def reg_table(theta, se, N, x_vars, nest_vars):
    '''
    This function constructs a regression table based on Similarity parameter standard error estimates

    Args:
        theta: a (K+G,) numpy array of estimated parameters
        se: a (K+G,) numpy array of estimated standard errors
        N: an integer; the number of observations
        x_vars: a list containing the names of the covariates
        nest_vars: a list containing the names of the nesting groups

    Returns.
        table: a pandas dataframe structured as a regression table w. parameter estimates, standard errors, t-tests, and p-values  
    '''
    Similarity_t, Similarity_p = Similarity_t_p(se, theta, N) # Get t-test values and p values

    regdex = [*x_vars, *['group_' + var for var in nest_vars]] # Set the names of the covariates and the nesting groups as the index

    table  = pd.DataFrame({'theta': [ str(np.round(theta[i], decimals = 4)) + '***' if Similarity_p[i] <0.01 else str(np.round(theta[i], decimals = 3)) + '**' if Similarity_p[i] <0.05 else str(np.round(theta[i], decimals = 3)) + '*' if Similarity_p[i] <0.1 else np.round(theta[i], decimals = 3) for i in range(len(theta))], # Give stars to parameter estimates according to t-tests at levels of significance 0.1, 0.05, and 0.01
                'se' : np.round(se, decimals = 5),
                't (theta == 0)': np.round(Similarity_t, decimals = 3),
                'p': np.round(Similarity_p, decimals = 3)}, index = regdex).rename_axis(columns = 'variables')
    
    return table

In [None]:
Similarity_theta = resMLE['theta']
Similarity_SE = resMLE['se']
Similarity_t, Similarity_p = Similarity_t_p(Similarity_SE, Similarity_theta, N)
reg_table(Similarity_theta, Similarity_SE, N, x_vars, nest_vars)

| variables | theta | se | t (theta == 0) | p |
|---|---|---|---|---|
| in_out | -2.5498*** | 0.43873 | 5.812 | 0.0 |
| cy | -0.318 | 0.22618 | 1.408 | 0.159 |
| hp | -0.457* | 0.27293 | 1.676 | 0.094 |
| we | -0.9464*** | 0.22234 | 4.256 | 0.0 |
| le | -1.9569*** | 0.30143 | 6.492 | 0.0 |
| wi | -2.0975*** | 0.42346 | 4.953 | 0.0 |
| he | -2.0849*** | 0.31568 | 6.604 | 0.0 |
| li | -0.7439*** | 0.11864 | 6.27 | 0.0 |
| sp | -1.3924*** | 0.27032 | 5.151 | 0.0 |
| ac | -0.055 | 0.10572 | 0.522 | 0.601 |


In [None]:
np.array([p for p in Similarity_theta[K:] if p>0]).sum()

0.4294172691651903

### The FKN Estimator: An alternative approach

The log-likelihood function is not globally concave, and finding the global optimum can be difficult. Using the estimation procedure of Fosgerau et al. (2023, working paper), we can instead fit the parameters using the first-order conditions for optimality $0=u(X_t,\beta) - \nabla_q \Omega_t(\hat q_t^0|\lambda)$, evaluated at the observed market shares $\hat q^0_t$ or some non-parametric estimate of the CCP's. The estimator takes the form

$$
\hat \theta^0=\arg \min_{\theta} \sum_t s_t \hat \varepsilon^0_t(\theta)'\hat W^0_t\hat \varepsilon^0 _t(\theta),
$$
where $\hat W^0_t$ is a positive semidefinite weight matrix, $s_t$ is market $t$'s share of the total population, and
$$
\hat \varepsilon^0_t(\theta) = \hat D^0_t(u(X_t,\beta)- \nabla_q \Omega_t(\hat q_t^0|\lambda)) ,
$$
is the residual of the first-order conditions utilizing the Logit derivatives of CCP's wrt. utilities
$$
\hat D^0_t=\textrm{diag}(\hat q^0_t)-\hat q^0_t (\hat q^0_t)'.
$$
Using equation (...) above, we have that $\hat \varepsilon^0_t(\theta)$ is a linear function of $\theta$,
$$
\hat \varepsilon^0_t(\theta)=\hat D^0_t \left(\hat G^0_t\theta- \ln \hat q^0_t\right)\equiv \hat A^0_t\theta-\hat r^0_t.
$$
with $\hat G^0_t = [X_t, -\nabla^2_{q,\lambda} \Omega_t(\hat q^0_t|\lambda)]$, $\hat A^0_t = \hat D^0_t \hat G^0_t$ and $\hat r^0_t = \hat D^0_t \ln \hat q^0_t $. Because the residual is linear in $\theta$, the weighted least squares criterion has a closed form solution, which is unique whenever the matrix being inverted is nonsingular,
$$
\hat \theta^0 =\left(\sum_t s_t (\hat A^0_t)'\hat W^0_t \hat A^0_t \right)^{-1}\left(\sum_t s_t (\hat A^0_t)'\hat W^0_t \hat r_t^0 \right)
$$

In our estimation procedure we use as weight matrix the inverse of the diagonal matrix with the observed market shares $\hat q^0_t$ along its main diagonal, $\hat W^0_t = \mathrm{diag}(\hat q^0_t)^{-1}$.
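
The closed form can be illustrated on simulated inputs. In the sketch below, random matrices stand in for $\hat A^0_t$ and $\hat r^0_t$ (they are not built from the model), and the closed form is compared with an ordinary least squares solve on rows rescaled by $\sqrt{s_t/\hat q^0_{tj}}$, which is the same minimization problem written differently. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
T, J, d = 4, 6, 3                                   # toy sizes (hypothetical)
s = np.full(T, 1.0 / T)                             # equal market shares
q = [rng.dirichlet(np.ones(J)) for _ in range(T)]   # toy choice probabilities
A = [rng.normal(size=(J, d)) for _ in range(T)]     # stand-ins for A_t = D_t G_t
r = [rng.normal(size=J) for _ in range(T)]          # stand-ins for r_t = D_t ln q_t

# Closed form with W_t = diag(q_t)^{-1}; A_t.T / q_t equals A_t' diag(q_t)^{-1}
AWA = sum(s[t] * (A[t].T / q[t]) @ A[t] for t in range(T))
AWr = sum(s[t] * (A[t].T / q[t]) @ r[t] for t in range(T))
theta_closed = np.linalg.solve(AWA, AWr)

# Same problem as least squares on rows scaled by sqrt(s_t / q_tj)
w = [np.sqrt(s[t] / q[t]) for t in range(T)]
A_stack = np.vstack([w[t][:, None] * A[t] for t in range(T)])
r_stack = np.concatenate([w[t] * r[t] for t in range(T)])
theta_lstsq, *_ = np.linalg.lstsq(A_stack, r_stack, rcond=None)
```

Solving the normal equations with `solve` rather than forming an explicit inverse is the usual numerically safer choice, and it is also what the `WLS_init` function in this notebook does.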

In [None]:
def G_array(q, x, model):
    ''' 
    This function calculates the G block matrix

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        model: a dictionary of the Similarity Model specification as outputted by 'Similarity_specification'

    Returns
        G: a dictionary  of T numpy arrays (J[t],K+G): a G matrix for each market t
    '''
    T = len(x)

    Z = cross_grad_pertubation(q, model) # Find the cross derivative of the pertubation function \Omega wrt. lambda and ccp's q
    G = {t: np.concatenate((x[t],-Z[t]), axis=1) for t in np.arange(T)} # Join block matrices along 2nd dimensions  s.t. last dimension is K+G (same dimension as theta)

    return G

In [None]:
def D_array(q):
    '''
    This function calculates the D matrix - the logit derivative of ccp's wrt. utilities

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t

    Returns
        D: a dictionary of T numpy arrays (J[t],J[t]) of logit derivatives of ccp's wrt. utilities for each market t
    '''
    T = len(q)

    D = {t: np.diag(q[t]) - np.einsum('j,k->jk', q[t], q[t]) for t in np.arange(T)} # Compute logit derivatives of ccp's wrt. utilities
    
    return D

In [None]:
def A_array(q, x, model):
    '''
    This function calculates the A matrix

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        model: a dictionary of the Similarity Model specification as outputted by 'Similarity_specification'

    Returns
        A: a dictionary  of T numpy arrays (J[t],K+G): an A matrix for each market t
    '''
    T = len(x)

    D = D_array(q) # Compute the derivatives of logit ccp's
    G = G_array(q, x, model) # Get the G block matrix
    A = {t: np.einsum('jk,kd->jd', D[t], G[t]) for t in np.arange(T)} # Compute a matrix product for each market t

    return A

In [None]:
def r_array(q):
    '''
    This function calculates 'r'; the logit derivative matrix D times the logarithm of observed or nonparametrically estimated market shares

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
    
    Returns
        r: a dictionary of T numpy arrays (J[t],) of the product of D and the log of ccp's for each market t
    '''
    T = len(q)

    D = D_array(q) 
    log_q = {t: np.log(q[t], out = -np.inf*np.ones_like(q[t]), where = (q[t] > 0)) for t in np.arange(T)} # Take logs of ccp-s
    r = {t: np.einsum('jk,k->j', D[t], log_q[t]) for t in np.arange(T)} # Compute matrix product

    return r

In [None]:
def WLS_init(q, x, sample_share, model, N):
    ''' 
    This function calculates the weighted least squares estimator, i.e. the initial FKN parameter estimates.

    Args.
        q: a dictionary of T numpy arrays (J[t],) of choice probabilities for each market t
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        sample_share: A (T,) numpy array of the fraction of observations in each market t 
        model: a dictionary of the Similarity Model specification as outputted by 'Similarity_specification'
        N: An integer giving the total amount of observations

    Returns
        theta_hat: a (K+G,) numpy array of initial FKN parameter estimates
    '''

    T = len(x)

    A = A_array(q, x, model)
    r = r_array(q)

    d = A[0].shape[1] # Get the total number of parameters; this is equal to K+G
    
    # Initialize AWA and AWr matrices
    AWA = np.empty((T,d,d))
    AWr = np.empty((T,d))

    for t in np.arange(T):
        AWA[t,:,:] = sample_share[t]*np.einsum('jd,j,jp->dp', A[t], 1/q[t], A[t], optimize = True) # Fast product using that the weights 'W' are diagonal.
        AWr[t,:] = sample_share[t]*np.einsum('jd,j,j->d', A[t], 1/q[t], r[t], optimize = True)
    
    theta_hat = la.solve(AWA.sum(axis = 0), AWr.sum(axis = 0)) # Solve system of equations AWA.sum()*theta = AWr.sum() for parameter estimates theta
    #se_hat = np.sqrt(np.diag(la.inv(AWA.sum(axis = 0))) / N)
    
    return theta_hat
    

Using the observed market shares we may thus find initial parameter estimates $\hat \theta^0$ as described above.

In [None]:
thetaFKN0 = WLS_init(y, x, pop_share, Model, N)

In [None]:
np.array([p for p in thetaFKN0[K:] if p > 0]).sum()

1.218988344960131

## Regularization for parameter bounds

As we see above, the least squares estimator is not guaranteed to respect the parameter bounds $\sum_g \hat \lambda_g<1$. To regularize it, we use the following property: if we replace $\hat q^0_t$ with the choice probabilities from the maximum likelihood estimator of the Logit Model, $\hat q^{logit}_t\propto \exp\{X_t\hat \beta^{logit}\}$, and plug these choice probabilities into the WLS estimator described above, it will return $\hat \theta=(\hat \beta^{logit},0,\ldots,0)$ as the parameter estimate, which is feasible. Let $\hat q_t(\alpha)$ denote the weighted average of the Logit probabilities and the market shares,
$$
\hat q_t(\alpha) =(1-\alpha) \hat q^{logit}_t+\alpha \hat q^0_t.
$$
 Let $\hat \theta^0(\alpha)$ denote the resulting parameter vector. We perform a line search for values of $\alpha$, $(\frac{1}{2},\frac{1}{4},\frac{1}{8},\ldots)$ until $\hat \theta^0(\alpha)$ yields a feasible parameter vector.
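The halving schedule can be sketched in isolation. The snippet below is a minimal illustration rather than the notebook's implementation: `halving_search` and `toy_fit` are hypothetical names, and `toy_fit` merely stands in for the WLS fit by interpolating linearly between a Logit-type vector (all nesting parameters zero) and an infeasible vector.

```python
import numpy as np

def halving_search(theta_of_alpha, K, max_halvings=50):
    """Return the first alpha in (1/2, 1/4, ...) whose positive lambdas sum to less than 1."""
    for k in range(1, max_halvings + 1):
        alpha = 0.5 ** k
        theta = theta_of_alpha(alpha)
        lam = theta[K:]                      # nesting parameters follow the K betas
        if lam[lam > 0].sum() < 1:
            return alpha, theta
    raise RuntimeError("no feasible alpha found")

# Hypothetical stand-in for the WLS fit (assumption: linear in alpha, for illustration only)
theta_logit = np.array([1.0, -2.0, 0.0, 0.0])   # alpha = 0: lambdas are zero
theta_obs = np.array([1.5, -1.0, 2.4, 1.2])     # alpha = 1: lambdas sum to 3.6

def toy_fit(alpha):
    return (1 - alpha) * theta_logit + alpha * theta_obs

alpha, theta = halving_search(toy_fit, K=2)
print(alpha, theta)
```

In this toy case the positive nesting parameters sum to $3.6\alpha$, so $\alpha = \frac{1}{2}$ is rejected and $\alpha = \frac{1}{4}$ is accepted. Unlike the `LineSearch` function defined next, this sketch raises an error when no feasible value is found; that is a design choice of the illustration only.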


In [None]:
def LogL(Theta, y, x, sample_share, model):
    ''' 
    A function giving the mean Similarity log-likelihood evaluated at the data and an array of parameters 'Theta'
    '''
    return np.mean(Similarity_loglikelihood(Theta, y, x, sample_share, model))

In [None]:
def LineSearch(Logit_Beta, q_obs, x, sample_share, model, N):
    '''
    This function performs a line search to find feasible lambda parameters

    Args:
        Logit_Beta: a (K,) numpy array of estimated beta parameters from a corresponding Logit Model
        q_obs: a dictionary of T numpy arrays (J[t],) of observed market shares in onehot encoding for each market t,
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t,
        sample_share: a numpy array (T,) of the share of observations of each market t = 1,...,T,
        model: a dictionary of the Similarity Model specification as outputted by 'Similarity_specification',
        N: an integer giving the number of observations

    Returns:
        theta_alpha: a (K+G,) numpy array of parameters found by line search; if no alpha is feasible within the iteration limit, the last candidate is returned
    '''

    # Get dimensions of data
    T = len(x)
    K = x[0].shape[1]

    # Find probabilities
    q_logit = logit_ccp(Logit_Beta, x)

    # Search over alphas s.t. alpha = (1/2)^{k} for some positive integer k
    alpha0 = 0.5

    for k in np.arange(1,100):

        # Set alpha
        alpha = alpha0**k 
        
        # Compute convex combination of ccp's
        q_alpha = {t: (1 - alpha)*q_logit[t] + alpha*q_obs[t] for t in np.arange(T)}
        theta_alpha = WLS_init(q_alpha, x, sample_share, model, N) # Compute initial FKN parameters but using q_alpha ccp's 

        lambda_alpha = theta_alpha[K:] # Find lambda parameters
        pos_pars = np.array([theta for theta in lambda_alpha if theta > 0]) # Find positive lambda parameters

        if pos_pars.sum() <1:
            break # Break if positive parameters sum to less than 1

    return theta_alpha

In [None]:
def GridSearch(Logit_Beta, y, x, sample_share, model, N, num_alpha = 5):
    r'''
    This function performs a grid search on the unit interval to find feasible parameters \theta

    Args:
        Logit_Beta: a (K,) numpy array of estimated beta parameters from a corresponding Logit Model
        y: a dictionary of T numpy arrays (J[t],) of observed market shares in onehot encoding for each market t,
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t,
        sample_share: a numpy array (T,) of the share of observations of each market t = 1,...,T,
        model: a dictionary of the Similarity Model specification as outputted by 'Similarity_specification',
        N: an integer giving the number of observations,
        num_alpha: an integer giving the number of alphas for which the search is to be performed

    Returns:
        theta_star: a (K+G,) numpy array of the feasible grid point with the highest mean log-likelihood
    '''

    T = len(x)
    J0 = x[0].shape[0]
    psi_3d0 = model['psi_3d'][0]
    G = np.int64(psi_3d0.shape[0] - 1)
    K = x[0].shape[1]

    # Find probabilities
    q_logit = logit_ccp(Logit_Beta, x)
    q_obs = y

    # Search
    alpha_line = np.linspace(0, 1, num_alpha)
    LogL_alpha = np.empty((num_alpha,))
    theta_alpha = np.empty((num_alpha,K+G))

    for k in np.arange(len(alpha_line)):

        alpha = alpha_line[k]

        q_alpha = {t: (1 - alpha)*q_logit[t] + alpha*q_obs[t] for t in np.arange(T)}
        theta_alpha[k,:] = WLS_init(q_alpha, x, sample_share, model, N)

        #lambda_inout = theta_alpha[k,K]
        lambda_alpha = theta_alpha[k,K:] # theta_alpha[k,K+1:]
        pos_pars = np.array([theta for theta in lambda_alpha if theta > 0])

        if (pos_pars.sum() >= 1): #|(lambda_inout >= 1)
            LogL_alpha[k] = -np.inf # Infeasible candidates are never selected (np.NINF was removed in NumPy 2.0)
        else:
            LogL_alpha[k] = LogL(theta_alpha[k,:], y, x, sample_share, model)
    
    # Pick the best set of parameters
    alpha_star = np.argmax(LogL_alpha)
    theta_star = theta_alpha[alpha_star,:]

    return theta_star

Implementing the line search method, we find the corresponding parameters $\hat \theta^*$.

In [None]:
theta_alpha = LineSearch(Logit_beta, y, x, pop_share, Model, N)

In [None]:
q_Similarity(theta_alpha, y, x, pop_share, Model).mean()

[0.7549534173977517, 0.0015132588931061905]


0.0015132588931061905

## Iterated FKN estimator

The initial estimator $\hat \theta^0$ is usually biased. This bias can be corrected by iterating a similar estimator, starting from $\hat \theta^0$. The iterated estimator has the same form as the initial one, except that the residual $\hat \varepsilon$ contains additional terms. First, we update the choice probabilities,
$$
\hat q^k_t=p(\mathbf X_t,\hat \theta^{k-1}).
$$
Then we assign
$$
\hat D^k_t=\nabla^2_{qq}\Omega(\hat q_t^k|\hat \lambda^{k-1})^{-1}-\hat q^k_t (\hat q^k_t)'
$$
and then construct the residual
$$
\hat \varepsilon^k_t(\theta)=\hat D^k_t\left( u(x_t,\beta)-\nabla_q \Omega(\hat q_t^k|\lambda)\right) -y_t+\hat q_t^k,
$$
which can once again be written as a linear function of $\theta$,
$$
\hat \varepsilon^k_t(\theta)= \hat A_t^k \theta-\hat r^k_t,
$$
where
$$
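\hat A^k_t=\hat D_t^k\hat G^k_t, \qquad \hat r_t^k =\hat D^k_t\ln \hat q_t^k+y_t-\hat q_t^k,
$$
and where $\hat G^k_t$ is constructed as in the initial estimator. We use the weighted least squares estimator with weights $\hat W_t^k=\textrm{diag}(\hat q^k_t)^{-1}$. With these weights the term $-\hat q_t^k$ in $\hat r_t^k$ drops out of the normal equations, since $(\hat A^k_t)'\hat W_t^k\hat q_t^k=(\hat G^k_t)'\hat D^k_t\mathbf{1}=0$ (choice probabilities are unchanged when a constant is added to all utilities), which is why the code below builds $\hat r_t^k$ as $\hat D^k_t\ln \hat q_t^k+y_t$. We get the estimator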
$$
\hat \theta^k = \arg \min_{\theta}\frac{1}{T}\sum_t \hat \varepsilon^k_t(\theta)'\hat W_t^k \hat \varepsilon^k_t(\theta).
$$
We can once again solve it in closed form as
$$
\hat \theta^k =\left( \frac{1}{T}\sum_t  ( \hat A^k_t)'\hat W_t^k \hat A^k_t\right)^{-1}\left( \frac{1}{T}\sum_t (\hat A_t^k)'\hat W_t^k \hat r_t^k\right)
$$
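As a sanity check on the closed form, the sketch below uses synthetic arrays (not the notebook's data; all names are illustrative) to compare the solution of the pooled normal equations with an ordinary least squares fit on the stacked, square-root-weighted system, which minimizes the same objective:

```python
import numpy as np

rng = np.random.default_rng(1)
T, J, d = 4, 6, 3                     # toy sizes: markets, products, parameters

# Synthetic per-market inputs, stored in dictionaries keyed by market
A = {t: rng.normal(size=(J, d)) for t in range(T)}
r = {t: rng.normal(size=J) for t in range(T)}
q = {}
for t in range(T):
    q_t = rng.uniform(0.5, 1.5, size=J)
    q[t] = q_t / q_t.sum()            # probabilities bounded away from zero

# Closed form: average A'WA and A'Wr over markets, then solve the linear system
AWA = sum(np.einsum('jd,j,jp->dp', A[t], 1.0 / q[t], A[t]) for t in range(T)) / T
AWr = sum(np.einsum('jd,j,j->d', A[t], 1.0 / q[t], r[t]) for t in range(T)) / T
theta_closed = np.linalg.solve(AWA, AWr)

# Cross-check: scale rows by sqrt of the weights, stack markets, run plain least squares
A_stack = np.vstack([A[t] / np.sqrt(q[t])[:, None] for t in range(T)])
r_stack = np.concatenate([r[t] / np.sqrt(q[t]) for t in range(T)])
theta_lstsq, *_ = np.linalg.lstsq(A_stack, r_stack, rcond=None)

print(np.allclose(theta_closed, theta_lstsq))
```

The factor $\frac{1}{T}$ cancels between the two averages, so weighting markets equally or by a common constant gives the same solution.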

FKN (2021) show that the estimator $\hat \theta^k$ converges to the (true) MLE $\hat \theta^{\text{MLE}}$ as the number of iterations approaches infinity.
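To see the iteration at work in a case that can be checked by hand, the sketch below specializes it to the Logit Model. It assumes no nesting parameters, so that $\nabla^2_{qq}\Omega(q)=\mathrm{diag}(q)^{-1}$, $\hat D_t=\mathrm{diag}(\hat q_t)-\hat q_t\hat q_t'$ and $\hat G_t=\mathbf X_t$, and it builds $\hat r_t$ as in the `WLS` function below. Under these assumptions one update works out algebraically to a Newton step on the Logit log-likelihood. The data are synthetic and the function names are hypothetical.

```python
import numpy as np

def softmax(u):
    e = np.exp(u - u.max())           # subtract the max for numerical stability
    return e / e.sum()

def logit_fkn_step(theta, X, y):
    """One iterated WLS update in the Logit special case (single market)."""
    q = softmax(X @ theta)                      # updated choice probabilities
    D = np.diag(q) - np.outer(q, q)             # derivative of probabilities wrt utilities
    A = D @ X
    r = D @ np.log(q) + y
    AWA = np.einsum('jd,j,jp->dp', A, 1.0 / q, A)
    AWr = np.einsum('jd,j,j->d', A, 1.0 / q, r)
    return np.linalg.solve(AWA, AWr)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))                     # synthetic covariates, no constant column
beta_true = np.array([0.5, -0.3])
y = softmax(X @ beta_true)                      # noise-free shares, so the MLE equals beta_true

theta = np.zeros(2)
for _ in range(50):
    theta_new = logit_fkn_step(theta, X, y)
    converged = np.linalg.norm(theta_new - theta) < 1e-12
    theta = theta_new
    if converged:
        break

score = X.T @ (y - softmax(X @ theta))          # Logit score, zero at the MLE
print(theta, score)
```

At the fixed point the Logit score $\mathbf X'(y-q)$ vanishes, which is the first-order condition of the Logit MLE.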

We now implement this procedure and iterate, starting from our initial guess $\hat \theta^{*}$.


In [None]:
def WLS(Theta, y, x, sample_share, model, N):
    r'''
    This function calculates the weighted least squares estimator \hat \theta^k and its estimated standard errors for the iterated parameter estimates.

    Args:
        Theta: a (K+G,) numpy array of previously estimated \theta parameters
        y: a dictionary of T numpy arrays (J[t],) of observed market shares in onehot encoding for each market t,
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t,
        sample_share: a numpy array (T,) of the share of observations of each market t = 1,...,T,
        model: a dictionary of the Similarity Model specification as outputted by 'Similarity_specification',
        N: an integer giving the number of observations,

    Returns:
        theta_hat: a (K+G,) numpy array of iterated FKN parameter estimates
        se_hat: a (K+G,) numpy array of standard errors for iterated FKN parameter estimates
    '''
    T = len(x)
    d = Theta.shape[0]
    
    # Get ccp's
    q = Similarity_ccp(Theta, x, model)

    # Construct A
    D = ccp_gradient(q, x, Theta, model) # A is here constructed using the Similarity derivative of ccp's wrt. utilities instead of the Logit derivative
    G = G_array(q, x, model)
    A = {t: np.einsum('jk,kd->jd', D[t], G[t]) for t in np.arange(T)}

    # Construct r
    log_q = {t: np.log(q[t], out = -np.inf*np.ones_like(q[t]), where=(q[t] > 0)) for t in np.arange(T)}
    r = {t: np.einsum('jk,k->j', D[t], log_q[t]) + y[t] for t in np.arange(T)} # r = D @ log(q) + y (matrix-vector product plus observed shares)

    # Estimate parameters
    AWA = np.empty((T,d,d))
    AWr = np.empty((T,d))

    for t in np.arange(T):
        AWA[t,:,:] = sample_share[t]*np.einsum('jd,j,jp->dp', A[t], 1./q[t], A[t], optimize = True)
        AWr[t,:] = sample_share[t]*np.einsum('jd,j,j->d', A[t], 1./q[t], r[t], optimize = True)

    theta_hat = la.solve(AWA.sum(axis = 0), AWr.sum(axis = 0))
    se_hat = np.sqrt(np.diag(la.inv(AWA.sum(axis = 0))) / N)

    return theta_hat,se_hat

In [None]:
def FKN_estimator(logit_beta, q_obs, x, sample_share, model, N, tol = 1.0e-15, max_iters = 1000):
    '''
    This function estimates the Similarity Model via the FKN estimator

    Args:
        logit_beta: a (K,) numpy array of estimated beta parameters from a corresponding Logit Model
        q_obs: a dictionary of T numpy arrays (J[t],) of observed choice probabilities for each market t
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        sample_share: A (T,) numpy array of the fraction of observations in each market t 
        model: a dictionary of the Similarity Model specification as outputted by 'Similarity_specification'
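        N: An integer giving the total number of observations
        tol: a float giving the convergence tolerance on the Euclidean distance between successive parameter iterates
        max_iters: an integer giving the maximum number of debiasing iterations

    Returns:
        res: a dictionary containing FKN parameter estimates, standard errors, iterations to convergence, etc.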
    '''

    K = x[0].shape[1]

    # Get initial FKN parameters using observed market shares
    theta_init = WLS_init(q_obs, x, sample_share, model, N) 
    
    # If the positive nesting parameters sum to 1 or more, then perform a line search.
    if (np.array([p for p in theta_init[K:] if p>0]).sum() >= 1):
        theta_star = LineSearch(logit_beta, q_obs, x, sample_share, model, N)
        theta0 = theta_star
    else:
        theta0 = theta_init

    '''logl0 = LogL(theta0, q_obs, x, sample_share, psi, nest_count)'''
    
    # Debiasing iterations
    for k in np.arange(max_iters):
        # Compute iterated FKN parameters and standard errors
        theta1, se1 = WLS(theta0, q_obs, x, sample_share, model, N)

        '''logl1=LogL(theta1, q_obs, x, sample_share, psi, nest_count)
        
        for m in range(10):
            if logl1<logl0:
                theta1=0.5*theta0+0.5*theta1
                logl1=LogL(theta1, q_obs, x, sample_share, psi, nest_count)
            else:
                break'''

        # Check convergence in the Euclidean norm
        dist = la.norm(theta1 - theta0)

        if dist < tol:
            succes = True
            iter = k
            break

        # Iteration step
        theta0 = theta1
    else:
        # The loop ran through all iterations without meeting the tolerance
        succes = False
        iter = max_iters

    res = {'theta': theta1,
           'se': se1,
           'fun': -LogL(theta1, q_obs, x, sample_share, model), # Use the function argument rather than the global 'y'
           'iter': iter,
           'succes': succes}
    
    return res 

In [None]:
res = FKN_estimator(Logit_beta, y, x, pop_share, Model, N, tol=1.0e-8, max_iters=1000)

[1.1675046701290483, 0.0014929418072264493]


In [None]:
FKN_theta = res['theta']
FKN_SE = res['se']
FKN_t, FKN_p = Similarity_t_p(FKN_SE, FKN_theta, N)
reg_table(FKN_theta, FKN_SE, N, x_vars, nest_vars)

variables,theta,se,t (theta == 0),p
in_out,-10.4593***,0.00342,3057.771,0.0
cy,-0.5007***,0.0017,294.137,0.0
hp,-3.5089***,0.00255,1378.193,0.0
we,0.0686***,0.00162,42.379,0.0
le,-2.2163***,0.00171,1299.239,0.0
wi,5.4956***,0.00318,1727.421,0.0
he,0.342***,0.00197,173.217,0.0
li,-1.0238***,0.00111,918.509,0.0
sp,3.2825***,0.00229,1433.185,0.0
ac,0.7804***,0.00081,958.274,0.0
