# Modeling Demand for Cars with Conditional Logit 

In this problem set, you will replicate part of the results in
Brownstone and Train (1999). You will estimate the conditional logit
model given the available data.
The core of the exercise will be to fill out the file `clogit.py`. 


In [1]:
import numpy as np
from numpy import random
from scipy.stats import norm
from scipy.stats import genextreme
import pandas as pd 

%load_ext autoreload
%autoreload 2

import clogit 
import estimation as est

Data
====

The data consists of a survey of households regarding their preferences
for car purchase. Each household was given 6 options, but the
characteristics that the respondents were asked about were varied. The
surveys were originally conducted in order to elicit consumer
preferences for alternative-fuel vehicles. The data is *stated
preferences*, in the sense that consumers did not actually buy but just
stated what they would hypothetically choose, which is of course a
drawback. This is very common in marketing when historic data is either
not available or does not provide satisfactory variation. The advantage
of the stated preference data is therefore that the choice set can be
varied greatly (for example, the characteristics include the
availability of recharging stations, which is important for purchase of
electric cars).

The data you will use has $N=4654$ respondents with $J=6$ cars to choose
from. For each household, we let $\mathcal{J}_{i}$ denote the set of
available cars.

If you load the csv-file, `car_data.csv`, you will get a dataframe with 
$NJ = 27,924$ rows. The column `person_id` runs through $0,1,...,N-1$, and
the column `j` is the index for the car, $\{0,1,...,5\}$. The variable 
`binary_choice` is a dummy, =1 for the car chosen by the respondent. 
Another convenient variable, `y`, is the index for that car, repeated 
and identical for all $J$ rows for each person. The x-variables describe 
the characteristics of the 6 cars that the respondent was asked to choose 
from. 

We will also read in the dataset `car_labels.csv`, which contains the 
variable labels and descriptions for all the variables. 
The lists `x_vars` and `x_lab` will be used throughout as the list of 
explanatory variables we want to work with. 

In order to get the data into a 3-dimensional array, we will access 
the underlying numpy arrays and reshape them. For example 

> `x = dat[x_vars].values.reshape((N,J,K))`

Note that this will only work because the data is sorted according to 
first `person_id` and then `j`. You can test this by verifying that 
`x[0,:,k]` prints the same as `dat.loc[dat.person_id == 0, x_vars[k]]`. 
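
The reshape and the sorting check can be sketched on a toy long-format dataframe. The column names `x1` and `x2`, the toy sizes, and the explicit `sort_values` call are assumptions made for illustration; they are not part of `car_data.csv`.

```python
import numpy as np
import pandas as pd

# Toy long-format data: 3 people, 2 alternatives, 2 attributes.
# The column names x1 and x2 are hypothetical, for illustration only.
N_toy, J_toy = 3, 2
toy = pd.DataFrame({
    'person_id': np.repeat(np.arange(N_toy), J_toy),
    'j': np.tile(np.arange(J_toy), N_toy),
})
toy['x1'] = 10 * toy['person_id'] + toy['j']
toy['x2'] = -toy['x1']
toy_vars = ['x1', 'x2']

# Sort explicitly so that the reshape does not depend on the file's row order.
toy = toy.sort_values(['person_id', 'j'])
x_toy = toy[toy_vars].values.reshape((N_toy, J_toy, len(toy_vars)))

# The slice for person 0 and attribute k should match the dataframe rows.
k = 0
expected = toy.loc[toy.person_id == 0, toy_vars[k]].values
print(np.array_equal(x_toy[0, :, k], expected))
```

If the rows were not sorted by `person_id` and then `j`, the reshape would still run without error but would silently mix up people and alternatives, which is why the check is worth doing.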

In [2]:
lab = pd.read_csv('car_labels.csv', index_col='variable')
lab

| variable | label | description |
|:---|:---|:---|
| person_id | Person ID | Person identifier |
| rownum | Row num | Row number in the dataset |
| binary_choice | Binary choice | Dummy, =1 if this row is the car that was chosen |
| price_to_inc | Price/ln(income) | Purchase price in thousands of dollars, divide... |
| range | Range | Hundreds of miles that the vehicle can travel ... |
| acceleration | Acceleration | Seconds required to reach 30 mph from stop, in... |
| top_speed | Top speed | Highest speed that the vehicle can attain, in ... |
| pollution | Pollution | Tailpipe emissions as fraction of comparable n... |
| size | Size | 0""mini, 0.1""subcompact, 0.2""compact, 0.3""mid-s... |
| big_enough | Big enough | 1 if household size is over 2 and vehicle size... |


In [3]:
# variables to use as explanatory variables
x_lab  = list(lab.iloc[3:-4].label.values) # labels 
x_vars = list(lab.iloc[3:-4].index.values) # variable names 

In [4]:
dat = pd.read_csv('car_data.csv')
N = dat.person_id.nunique()
J = dat.j.nunique()
K = len(x_vars)

## Scaling variables

Logit estimation is most stable numerically when the variables are scaled so that their values are roughly of order $\pm 1$. 

In [5]:
dat['range'] = dat['range'] / 100
dat['top_speed'] = dat['top_speed'] / 100

# scaling by 10 might be overkill 
dat['size'] = dat['size'] / 10
dat['acceleration'] = dat['acceleration'] / 10
dat['operating_cost'] = dat['operating_cost'] / 10

In [6]:
y = dat['y'].values.reshape((N,J))
y = y[:, 0] # all J elements are identical along axis=1
x = dat[x_vars].values.reshape((N,J,K))

# Question 1: 3-dimensional arrays in Numpy

The explanatory variables are $x_{ijk}$, where $i$ denotes individuals,
$j$ cars, and $k$ car attributes. We will be working with these as 3d arrays, so let's do some light warmup. 

Which of the following commands shows all the price-to-log-income
values for the cars in the choice set of individual 1 (and what do
the other commands give you)?

    a:   x[:,1,1]

    b:   x[1,:,1]

    c:   x[1,1,:]

In [7]:
# Try to index, and make yourself familiar with a three dimensional array.

## Question 1b

Create a 3-dimensional array `A` that has dimensions $4 \times 3 \times 2$, and fill it with random draws from a normal distribution. What happens if you matrix multiply this by the $2 \times 1$ vector `[1, 2]`? From "common" linear algebra knowledge, what would you expect the dimensions of the result to be?

In [8]:
rng = random.default_rng(seed=42)
A = rng.normal(size=(4, 3, 2))
A

array([[[ 0.30471708, -1.03998411],
        [ 0.7504512 ,  0.94056472],
        [-1.95103519, -1.30217951]],

       [[ 0.1278404 , -0.31624259],
        [-0.01680116, -0.85304393],
        [ 0.87939797,  0.77779194]],

       [[ 0.0660307 ,  1.12724121],
        [ 0.46750934, -0.85929246],
        [ 0.36875078, -0.9588826 ]],

       [[ 0.8784503 , -0.04992591],
        [-0.18486236, -0.68092954],
        [ 1.22254134, -0.15452948]]])

In [9]:
B = A @ [1, 2]
print(B)

[[-1.77525113  2.63158063 -4.5553942 ]
 [-0.50464478 -1.72288901  2.43498185]
 [ 2.32051311 -1.25107558 -1.54901442]
 [ 0.77859848 -1.54672145  0.91348237]]


In [10]:
a = A.reshape(12, 2)
b = (a @ [1, 2]).reshape(4, 3)
print(b)
print()
print((B==b).all())

[[-1.77525113  2.63158063 -4.5553942 ]
 [-0.50464478 -1.72288901  2.43498185]
 [ 2.32051311 -1.25107558 -1.54901442]
 [ 0.77859848 -1.54672145  0.91348237]]

True


## 1c: Testing it on the data 
Next, we want to compute $\sum_{k=1}^{K}x_{ijk}\theta_{k}$ using linear algebra. Our  `x` matrix has three dimensions, $N\times J\times K$, and $\theta$ is $K\times1$. Does this behave as expected when using numpy?

In [11]:
theta = 0.0 * np.ones((K,))
(x @ theta).shape

(4654, 6)
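
To see that `x @ theta` really computes the sum over $k$ for every $(i,j)$ pair, it can be compared against an explicit `np.einsum` contraction. This is a small sketch on a made-up toy array; `x_demo` and `theta_demo` are illustrative names, not objects from the problem set.

```python
import numpy as np

rng = np.random.default_rng(0)
x_demo = rng.normal(size=(5, 3, 2))   # toy (N, J, K) array
theta_demo = np.array([1.0, -2.0])    # (K,) parameter vector

v_matmul = x_demo @ theta_demo                          # matmul over the last axis
v_einsum = np.einsum('ijk,k->ij', x_demo, theta_demo)   # explicit sum over k

print(v_matmul.shape)
print(np.allclose(v_matmul, v_einsum))
```

The two results agree, so the `@` operator can be read as $\sum_k x_{ijk}\theta_k$ applied to each of the $N$ stacked $J\times K$ matrices.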

# Question 2: Simulating data

Finish `sim_data(N, theta, J)` in `clogit.py` and simulate a dataset. It must return `(y,x)`, where `x` has dimensions $N\times J\times K$ and `y` has shape `(N,)`.

The data generating process should be: 
$$u_{ij} = \mathbf{x}_{ij}\theta + \varepsilon_{ij},\quad\varepsilon_{ij}\sim\text{IID Extreme Value Type I},$$

where we will draw the regressors as $x_{ijk} \sim \mathcal{N}(0,1)$ IID for all $(i,j,k)$. The outcome is then $y_i = \arg\max_{j} u_{ij}$. 

***Hints:*** <br>
* Draw `x` as `(N,J,K)` standard normals, 
* Draw `e` ($\varepsilon$) using `genextreme.ppf(uni, c=0)` (the inverse CDF), where `uni` should be an `(N,J)` matrix of uniform draws (`c=0` gives us *Type I*, as needed), 
* To find the argmax over rows of an `(N,J)` matrix `u`, use `u.argmax(axis=1)`. Verify that `y` is `(N,)`. 
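
The hints above can be combined into one possible sketch of the simulator. The name `sim_data_sketch` and the optional `rng` argument are assumptions made here for a self-contained example; the actual `clogit.sim_data` may be organised differently (for instance, it may rely on the global seed set with `np.random.seed`).

```python
import numpy as np
from scipy.stats import genextreme

def sim_data_sketch(N, theta, J, rng=None):
    """Simulate (y, x) from a conditional logit model (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    K = theta.size
    x = rng.normal(size=(N, J, K))     # regressors, IID standard normal
    uni = rng.uniform(size=(N, J))     # uniform draws on [0, 1)
    e = genextreme.ppf(uni, c=0)       # inverse CDF: Extreme Value Type I errors
    u = x @ theta + e                  # (N, J) matrix of utilities
    y = u.argmax(axis=1)               # index of the chosen car, shape (N,)
    return y, x

y_sim, x_sim = sim_data_sketch(1000, np.array([1.0, 2.0, 3.0]), 4,
                               rng=np.random.default_rng(1337))
print(y_sim.shape, x_sim.shape)
```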


In [12]:
N = 1000
K = 3
J = 4
thetaTrue = np.array([1, 2, 3])

In [13]:
# Finish the clogit.sim_data function.
np.random.seed(1337)
y,x = clogit.sim_data(N, thetaTrue, J)

In [14]:
theta0 = clogit.starting_values(y, x)

# Question 3: choice probabilities 

Finish the functions 
* `starting_values(y,x)`: Just return a $K$-vector of zeros *(what probabilities will arise from this, do you think?)*
* `util(theta, x)`: The deterministic part of utility, max-rescaled: 
$$ v_{ij} = \mathbf{x}_{ij} \theta - K_i, \quad K_i = \max_{\ell \in \{1,...,J\}}  \mathbf{x}_{i\ell} \theta. $$ 
Subtracting $K_i$ is called "max-rescaling". 
* `choice_prob(theta, x)`: the $N\times J$ matrix of choice probabilities, given by 
$$ \Pr(y_i = j | \mathbf{x}_i; \theta) = \frac{\exp(v_{ij})}{\sum_{\ell=1}^J \exp(v_{i\ell})}. $$ 
Note that if we subtract the same scalar, $K_i$, from *all* of individual $i$'s utility indices, the choice probabilities are analytically identical, because $\exp(-K_i)$ cancels in the numerator and the denominator (check the algebra!). 

***Hints:*** 
* If `x` has shape `(N,J,K)`, then the matrix product `x @ theta` will return an `(N,J)` matrix. 
* Use `u.max(axis=1)` on an `(N,J)` matrix `u` to compute the max over the rows. 
    * Further, use `u.max(axis=1, keepdims=True)` to get an `(N,1)` vector out, rather than an `(N,)` vector. 
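
A minimal sketch of max-rescaled utilities and choice probabilities, assuming only the formulas above. The function names carry a `_sketch` suffix to make clear that they are illustrations and not the contents of `clogit.py`.

```python
import numpy as np

def util_sketch(theta, x):
    """Deterministic utility, max-rescaled so that the row maximum is 0."""
    v = x @ theta                              # (N, J)
    return v - v.max(axis=1, keepdims=True)    # subtract K_i row by row

def choice_prob_sketch(theta, x):
    """(N, J) matrix of logit choice probabilities."""
    ev = np.exp(util_sketch(theta, x))         # all entries lie in (0, 1]
    return ev / ev.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x_demo = rng.normal(size=(4, 3, 2))            # toy (N, J, K) data

# With theta = 0 every alternative is equally likely.
p0 = choice_prob_sketch(np.zeros(2), x_demo)
print(np.allclose(p0, 1 / 3))

# With very large utilities, exp() without rescaling would overflow;
# the rescaled version stays finite and rows still sum to one.
p_big = choice_prob_sketch(np.array([800.0, 800.0]), x_demo)
print(np.isfinite(p_big).all(), np.allclose(p_big.sum(axis=1), 1.0))
```

The second check illustrates why max-rescaling is used: the largest exponent in each row is exactly zero, so the denominator is always at least one.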

In [15]:
u = clogit.util(theta0, x)
(u == 0.0).all()

True

In [16]:
ccp = clogit.choice_prob(theta0, x)
(ccp == 0.25).all()

True

# Question 4: Criterion 

Code up the functions 
* `loglike(theta, y, x)`: loglikelihood contribution: $$ \ell_i (\theta) = \log \Pr( \color{red}{y_i} | \mathbf{x}_i;\theta) = v_{i\color{red}{y_i}} - \log \left[ \sum_{j=1}^J \exp(v_{ij}) \right], $$ 
where $\color{red}{y_i} \in \{0,...,J-1\}$ is the *outcome* for individual $i$ (whereas we earlier used $y_i$ to represent the stochastic variable that is the outcome chosen by $i$, in a mildly abusive notation.)
* `q(theta, y, x)`: the negative loglikelihood criterion, and the function we will give to `est.estimate()`. 

***Hint:*** if you have an `(N,J)` matrix, `v`, and an `(N,)` array, `y`, of integers in `0,...,J-1`, you can pick out $v_{iy_i}$ in two ways: 
```Python
# in a (slow) loop
ll = np.empty((N,))
for i in range(N): 
    ll[i] = v[i, y[i]]

# or faster
ll = v[range(N), y]
```
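
Putting the formula and the indexing hint together, one way to sketch the two functions is shown below. The `_sketch` names are illustrative, and returning one value per observation from `q_sketch` is an assumption: whether `est.estimate()` expects a vector of contributions or their mean is not stated here.

```python
import numpy as np

def loglike_sketch(theta, y, x):
    """Log-likelihood contribution of each individual, shape (N,)."""
    N = x.shape[0]
    v = x @ theta
    v = v - v.max(axis=1, keepdims=True)       # max-rescaling
    denom = np.log(np.exp(v).sum(axis=1))      # log of the logit denominator
    return v[np.arange(N), y] - denom          # v_{i, y_i} minus log-sum

def q_sketch(theta, y, x):
    """Negative log-likelihood contributions (assumed criterion form)."""
    return -loglike_sketch(theta, y, x)

rng = np.random.default_rng(0)
x_demo = rng.normal(size=(5, 4, 3))            # toy (N, J, K) data
y_demo = rng.integers(0, 4, size=5)            # toy choices in 0,...,J-1

# At theta = 0 all probabilities are 1/J, so each contribution is -log(J).
ll0 = loglike_sketch(np.zeros(3), y_demo, x_demo)
print(np.allclose(ll0, -np.log(4)))
```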

In [17]:
res = est.estimate(clogit.q, theta0, y, x)

Optimization terminated successfully.
         Current function value: 0.454249
         Iterations: 14
         Function evaluations: 60
         Gradient evaluations: 15


In [18]:
tab = pd.DataFrame({v:res[v] for v in ['theta', 'se', 't']}, index=thetaTrue)
tab.index.name = 'theta_true'
tab

|   theta_true |    theta |       se |         t |
|-------------:|---------:|---------:|----------:|
|            1 | 1.132555 | 0.081938 | 13.822098 |
|            2 | 2.257784 | 0.125319 | 18.016352 |
|            3 | 2.95431  | 0.160174 | 18.444364 |


Expected output:

Optimization terminated successfully.
         Current function value: 0.454249
         Iterations: 14
         Function evaluations: 60
         Gradient evaluations: 15

|   theta_true |   theta |       se |       t |
|-------------:|--------:|---------:|--------:|
|            1 | 1.13255 | 0.081938 | 13.8221 |
|            2 | 2.25778 | 0.125319 | 18.0164 |
|            3 | 2.95431 | 0.160174 | 18.4444 |

# Question 5: Estimation on real data

Estimate the 21 parameters of the model. Check that you are able to
find the same estimates and standard errors as in Brownstone and
Train (1998; p. 121).



In [19]:
N = dat.person_id.nunique()
J = dat.j.nunique()
K = len(x_vars)

y = dat['y'].values.reshape((N,J))
y = y[:, 0] # all J elements are identical along axis=1
x = dat[x_vars].values.reshape((N,J,K))

In [20]:
bt_res = [-.185 , .350 , -.716 , .261 , -.444 , .935 , .143 , .501 , -.768 , .413 , .820 , .637 , -1.437 , -1.017 , -.799 , -.179 , .198 , .443 , .345 , .313 , .228]

In [21]:
tab_bt98 = pd.DataFrame(bt_res, index=x_lab, columns=['Estimate (B&T 1998)'])
tab_bt98

|                      | Estimate (B&T 1998) |
|:---------------------|--------------------:|
| Price/ln(income)     |              -0.185 |
| Range                |               0.35  |
| Acceleration         |              -0.716 |
| Top speed            |               0.261 |
| Pollution            |              -0.444 |
| Size                 |               0.935 |
| Big enough           |               0.143 |
| Luggage space        |               0.501 |
| Operating cost       |              -0.768 |
| Station availability |               0.413 |


In [22]:
# theta0 = np.array(bt_res)  # alternative: start from the published estimates
theta0 = np.zeros((K,))      # start from zeros

In [23]:
res = est.estimate(clogit.q, theta0, y, x)

Optimization terminated successfully.
         Current function value: 1.588275
         Iterations: 86
         Function evaluations: 1914
         Gradient evaluations: 87


In [24]:
tab = pd.DataFrame({v:res[v] for v in ['theta', 'se', 't']}, index=x_lab)
tab['B&T 1998'] = tab_bt98
# use the absolute relative difference so that deviations in both directions count
tab['diff_small'] = (tab['theta']/tab['B&T 1998'] - 1.0).abs() < 0.01 
tab

|                      |     theta |       se |          t | B&T 1998 | diff_small |
|:---------------------|----------:|---------:|-----------:|---------:|:-----------|
| Price/ln(income)     | -0.185425 | 0.0272   |  -6.817203 |   -0.185 | True       |
| Range                |  0.350097 | 0.026933 |  12.998902 |    0.35  | True       |
| Acceleration         | -0.716186 | 0.110534 |  -6.47934  |   -0.716 | True       |
| Top speed            |  0.261102 | 0.079773 |   3.273062 |    0.261 | True       |
| Pollution            | -0.444102 | 0.100115 |  -4.435929 |   -0.444 | True       |
| Size                 |  0.934357 | 0.311053 |   3.003855 |    0.935 | True       |
| Big enough           |  0.143347 | 0.075909 |   1.8884   |    0.143 | True       |
| Luggage space        |  0.502492 | 0.188359 |   2.667735 |    0.501 | True       |
| Operating cost       | -0.767961 | 0.073376 | -10.466119 |   -0.768 | True       |
| Station availability |  0.413064 | 0.096546 |   4.278434 |    0.413 | True       |


In [25]:
print(f'Did we get close for all parameters? {tab.diff_small.all()}')

Did we get close for all parameters? True


# Question 6: Price elasticities

> Compute the own-price and cross-price elasticities for all observations $i$ and cars $j$.

As with standard binary probit or logit, the parameter estimates are not
interesting in and of themselves. Instead, we want to compute
*elasticities* that we can interpret and answer interesting questions
with. To do this, recall that an elasticity takes the form

$$\text{elasticity}=\frac{\mathrm{d}y}{y}\frac{x}{\mathrm{d}x}.$$ 
This formula is useful for computing elasticities numerically: we can
increase $x$ by some small amount, say $10^{-5}$, and measure the change
in $y$. The elasticity is then the relative change in $y$ divided by the
relative change in $x$: $\frac{\Delta y/y}{\Delta x/x}$, or equivalently

$$\text{numerical elasticity}=\frac{\text{pct. change in }y}{\text{pct. change in }x}. \tag{1}$$
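
As a quick illustration of (1), consider a function whose elasticity is known in closed form: for $y = 3x^{-1.5}$ the elasticity is $-1.5$ at every $x$. The helper name `numerical_elasticity` and the test function are made up for this sketch.

```python
def numerical_elasticity(f, x0, rel_step=1e-5):
    """Pct. change in f divided by pct. change in x, evaluated at x0."""
    y0 = f(x0)
    y1 = f(x0 * (1.0 + rel_step))      # increase x by rel_step (relative)
    return (y1 / y0 - 1.0) / rel_step

# Constant-elasticity test function: y = 3 * x^(-1.5).
elas = numerical_elasticity(lambda x: 3.0 * x ** -1.5, 2.0)
print(round(elas, 3))
```

The numerical value should be very close to the analytical elasticity of $-1.5$, with a small error that shrinks as `rel_step` gets smaller (until floating-point rounding starts to dominate).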


***Hint:*** One suggested approach would be: 
1. Evaluate conditional choice probabilities (CCPs) in the baseline setting, `ccp1`, 
2. for each car `j = 0, ..., 5` do: 
    1. make a copy of the dataset, `x2`, 
    2. increase the price  (`k=0`) of car `j` by `h` percent in `x2`, 
    3. evaluate CCPs for `x2`, call it `ccp2`
    4. compute the percent change in CCPs, 
    5. compute the own-price elasticity directly using (1), and save the average elasticity across all cars other than `j` as the cross-price elasticity. 
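
One way to sanity-check a numerical implementation is to compare it with the closed-form logit elasticities. For the conditional logit, the own-price elasticity of car $j$ is $\theta_p x_{ijp}(1 - P_{ij})$, and the elasticity of any other car's probability with respect to car $j$'s price is $-\theta_p x_{ijp} P_{ij}$, where $p$ indexes price. The sketch below checks this on simulated toy data; all names and parameter values are made up for illustration.

```python
import numpy as np

def ccp_sketch(theta, x):
    """Logit choice probabilities with max-rescaling, shape (N, J)."""
    v = x @ theta
    ev = np.exp(v - v.max(axis=1, keepdims=True))
    return ev / ev.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
theta_demo = np.array([-0.5, 1.0, 0.3])            # toy parameters, price first
x_demo = rng.uniform(1.0, 2.0, size=(50, 4, 3))    # toy (N, J, K) data, prices > 0
k_price, j, h = 0, 0, 1e-6                         # perturb the price of car 0

p0 = ccp_sketch(theta_demo, x_demo)                # baseline CCPs
x2 = x_demo.copy()
x2[:, j, k_price] *= (1.0 + h)                     # relative price increase for car j
p2 = ccp_sketch(theta_demo, x2)
numeric = (p2 / p0 - 1.0) / h                      # column j: own, other columns: cross

own_analytic = theta_demo[k_price] * x_demo[:, j, k_price] * (1.0 - p0[:, j])
cross_analytic = -theta_demo[k_price] * x_demo[:, j, k_price] * p0[:, j]

print(np.allclose(numeric[:, j], own_analytic, atol=1e-4))
print(np.allclose(numeric[:, 1], cross_analytic, atol=1e-4))
```

Note that the cross-price elasticity is the same for every other car in the choice set, which is a consequence of the IIA property of the logit model; averaging over the other cars therefore does not lose information in this model.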

In [26]:
thetahat = res['theta']

In [27]:
# Original (baseline) choice probabilities
ccp1 = clogit.choice_prob(thetahat, x)

In [28]:
E_own   = np.zeros((N, J))
E_cross = np.zeros((N, J))
k_price = 0 

for j in range(J):
    # A. copy 
    x2 = x.copy()
    
    # B. increase price just for car j 
    rel_change_x = 1e-3
    x2[:, j, k_price] *= (1.0+rel_change_x)
    
    # C. evaluate CCPs
    ccp2 = clogit.choice_prob(thetahat, x2)
    
    # D. percentage change in CCPs 
    rel_change_y = ccp2 / ccp1 - 1.0 
    
    # E. elasticities 
    elasticity = rel_change_y / rel_change_x 
    
    E_own[:, j] = elasticity[:, j]
    
    k_not_j = [k for k in range(J) if k != j]
    E_cross[:, j] = elasticity[:, k_not_j].mean(axis=1)

In [29]:
print(f'Own-price elasticity:  {np.mean(E_own).round(4)}')
print(f'Cross-price elasticity: {np.mean(E_cross).round(4)}')

Own-price elasticity:  -0.652
Cross-price elasticity: 0.1278


### Are Electric Vehicles (EVs) different? 

A lot of policy debate these days is about EVs. Check whether demand for EVs is more or less price-sensitive than demand for other cars. 

In [30]:
i_ev = 15
x_vars[i_ev] # check that we found the right one 

'ev'

In [31]:
# Create two boolean masks: idx1 selects electric cars
# and idx0 selects non-electric cars.
idx1 = x[:, :, i_ev]==1
idx0 = x[:, :, i_ev]==0 
print(f'Elasticity, EVs:   {np.mean(E_own[idx1]).round(4)}')
print(f'Elasticity, other: {np.mean(E_own[idx0]).round(4)}')

Elasticity, EVs:   -0.7149
Elasticity, other: -0.6311
