# Similarity Model choice probabilities

This notebook introduces computational methods for calculating choice probabilities in the Similarity Model of Fosgerau & Nielsen (2023). The methods are illustrated using publicly available data on the European car market from Frank Verboven's website at https://sites.google.com/site/frankverbo/data-and-software/data-set-on-the-european-car-market.

## Model solution

Suppose we are evaluating the choice probability function $P_t(u|\theta) = \arg\max_{q\in \Delta} q'u(\theta)-\Omega(q|\theta)$, where $\Omega$ is a Similarity Perturbation Function as in ..., at some parameter vector $\theta$. While it is possible to solve for the choice probabilities explicitly by numerical maximization, Fosgerau and Nielsen (2021) suggest a contraction mapping approach which is conceptually simpler. Let $u_t = X_t\beta$ for some parameter vector $\beta \in \mathbb{R}^{K}$, such that $\theta = (\beta', \lambda')'$, and let $q_t^0$ be an initial guess of the choice probabilities, e.g. $q_t^0\propto \exp(X_t\beta)$. Define further

$$
a=\sum_{g:\lambda_g\geq 0} \lambda_g   \qquad b=\sum_{g:\lambda_g<0} |\lambda_g|.
$$
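As a small illustration of these two definitions, the sketch below splits a parameter vector into its non-negative and negative parts with boolean masks. The helper name `split_lambda` and the example values are hypothetical and are not part of the notebook's code.

```python
import numpy as np

def split_lambda(lam):
    """Return (a, b): the sum of the non-negative entries of lam and
    the sum of absolute values of its negative entries."""
    lam = np.asarray(lam, dtype=float)
    a = lam[lam >= 0].sum()          # an empty selection sums to 0.0
    b = np.abs(lam[lam < 0]).sum()   # likewise 0.0 when no entry is negative
    return a, b

# Hypothetical example values
a, b = split_lambda([0.3, -0.2, 0.0, 0.1])
```

Because an empty selection sums to zero, no special case is needed when all parameters share the same sign.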

The choice probabilities are then updated iteratively as
$$
q_t^{r} = \frac{e^{v_t^{r}}}{\sum_{j\in \mathcal J_t} e^{v_{tj}^{r}}},
$$
where
$$
v_t^{r} =\ln q_t^{r-1}+\left(u_t-\nabla_q \Omega_t(q^{r-1}_t|\lambda)\right)/(1+b).
$$
The gradient $\nabla_q \Omega_t(q_t|\lambda)$ is easily computed using the formula
$$
\nabla_q \Omega_t(q_t|\lambda) = \Gamma' \ln (\Psi q_t) - \delta + \iota_{J_t}
$$
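A minimal sketch of a single update step for one market, written directly from the formulas above. The function name `similarity_update` is hypothetical, and the toy inputs are an assumption made purely for illustration: with $\Gamma$ and $\Psi$ set to identity matrices and $\delta = 0$, the gradient formula gives $\ln q + \iota$, which is the gradient of the negative entropy $q'\ln q$, so one step should reproduce the logit probabilities. The matrices of an actual Similarity Model specification will generally differ.

```python
import numpy as np

def similarity_update(q_prev, u, Gamma, Psi, delta, b=0.0):
    """One step q^{r-1} -> q^r of the iteration for a single market."""
    # Gradient of the perturbation function: Gamma' ln(Psi q) - delta + iota
    grad = Gamma.T @ np.log(Psi @ q_prev) - delta + 1.0
    v = np.log(q_prev) + (u - grad) / (1.0 + b)
    v = v - v.max()                  # subtracting the max leaves the softmax unchanged
    expv = np.exp(v)
    return expv / expv.sum()

# Toy inputs (assumption): Gamma = Psi = I and delta = 0
J = 4
u = np.array([0.5, -1.0, 2.0, 0.0])
q_start = np.full(J, 1.0 / J)        # uniform starting guess
q_next = similarity_update(q_start, u, np.eye(J), np.eye(J), np.zeros(J))

# Logit probabilities for comparison
logit = np.exp(u - u.max()) / np.exp(u - u.max()).sum()
```

In this toy case `q_next` coincides with `logit` after a single step, whatever the starting guess, because the $\ln q^{r-1}$ terms cancel.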
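For numerical stability, it can be a good idea to also max-rescale $v^r_t$ at every iteration, i.e. to subtract $\max_j v^r_{tj}$ before exponentiating, which leaves $q_t^r$ unchanged. The Kullback-Leibler divergence $D_{KL}(p||q)=p'\ln \frac{p}{q}$ to the true choice probabilities $p_t(\theta)$ converges linearly, shrinking by at least a constant factor with each iteration,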
$$
D_{KL}(p_t(\theta)||q_t^{r})\leq \frac{a+b}{1+b}D_{KL}(p_t(\theta)||q^{r-1}_t).
$$
This iteration is implemented in the function `Similarity_ccp` below.
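Applying the contraction bound repeatedly gives $D_{KL}(p_t(\theta)||q_t^{r})\leq \left(\frac{a+b}{1+b}\right)^r D_{KL}(p_t(\theta)||q^{0}_t)$, which translates into a worst-case iteration count whenever $a<1$. The sketch below does this arithmetic. The helper name `iterations_for_tolerance` and the initial divergence `d0` are hypothetical; in practice `d0` is unknown because it depends on the true choice probabilities.

```python
import math

def iterations_for_tolerance(a, b, d0, tol):
    """Smallest r with ((a + b) / (1 + b))**r * d0 <= tol."""
    if d0 <= tol:
        return 0                     # already within tolerance
    rate = (a + b) / (1.0 + b)       # contraction factor from the bound
    if rate >= 1.0:
        raise ValueError("the bound is only informative when a < 1")
    if rate == 0.0:
        return 1                     # a = b = 0: the bound is zero after one step
    return math.ceil(math.log(tol / d0) / math.log(rate))

# Hypothetical example values
n_iter = iterations_for_tolerance(a=0.3, b=0.2, d0=1.0, tol=1e-6)
```

The count is an upper bound implied by the inequality, so the iteration may well stop earlier.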

In [None]:
def Similarity_ccp(Theta, x, model, tol = 1.0e-15, maximum_iterations = 1000):
    '''
    This function approximates the conditional choice probabilities at given parameters
    by iterating the update step until successive iterates are close.

    Args:
        Theta: a numpy array (K+G,) of parameters, ordered as (Beta, Lambda)
        x: a dictionary of T numpy arrays (J[t],K) of covariates for each market t
        model: a dictionary of the Similarity Model specification as output by 'Similarity_specification'
        tol: convergence threshold on the distance between successive iterates
        maximum_iterations: the maximum number of iterations, after which the algorithm stops

    Returns:
        q1: a dictionary of T numpy arrays (J[t],) of Similarity choice probabilities for each market t
    '''

    # Objects in model specification
    psi = model['psi']
    phi = model['phi']

    T = len(x) # Number of markets
    K = x[0].shape[1] # Number of car characteristics

    # Parameters
    Beta = Theta[:K]
    Lambda = Theta[K:]
    G = len(Lambda)  # Number of groups

    # Calculate b, the sum of absolute values of the negative lambda parameters
    # (an empty selection sums to zero, so no special case is needed)
    b = np.abs(Lambda[Lambda < 0]).sum()

    # Find the Gamma matrix
    Gamma = Create_Gamma(Lambda, model)

    u = {t: np.einsum('jk,k->j', x[t], Beta) for t in np.arange(T)} # Calculate linear utilities
    q = {t: np.exp(u[t] - u[t].max()) / np.exp(u[t] - u[t].max()).sum() for t in np.arange(T)}
    q0 = q
    Epsilon = 1.0e-10

    for k in range(maximum_iterations):
        q1 = {}
        for t in np.arange(T):
            # Calculate v
            psi_q = psi[t] @ q0[t] # Psi q
            log_psiq = np.log(np.abs(psi_q) + Epsilon) # Epsilon guards against log(0)
            delta = phi[t] @ Lambda
            Grad = (Gamma[t].T @ log_psiq) - delta # Gradient of Omega; the constant iota term is omitted because it cancels in the normalization below
            v = np.log(q0[t] + Epsilon) + (u[t] - Grad)/(1 + b) # v = log(q) + (u - Gamma' log(Psi q) + delta)/(1 + b)
            v -= v.max(keepdims = True) # Do max rescaling wrt. alternatives

            # Calculate iterated ccp q^k
            numerator = np.exp(v)
            denom = numerator.sum()
            q1[t] = numerator/denom

        # Check convergence in an appropriate distance function
        dist = np.max(np.array([np.sum((q1[t]-q0[t])**2/q[t].std()) for t in np.arange(T)])) # Largest squared change across markets, scaled by the standard deviation of the logit probabilities q[t]

        # Stop once the iterates have converged; otherwise the loop ends after maximum_iterations
        if dist<tol:
            break
            
        # Iteration step
        q0 = q1

    return q1

In [None]:
beta0 = estimate_logit(q_logit, np.zeros((K,)), y, x, pop_share)['beta']
lambda0 = np.zeros((G,))
theta0 = np.append(beta0, lambda0)
q0 = Similarity_ccp(theta0, x, Model)

### Choice probability distribution 

We may plot the distribution of choice probabilities ordered according to price: