The maximum likelihood estimator is the value that maximises the likelihood function, that is:

$$
\begin{aligned}
\hat{\theta} &= \arg\max_{\theta} L(\theta; Y_1, \dots, Y_n) \\
&= \arg\max_{\theta} \left\{ \prod_{i=1}^{n} f(Y_i; \theta) \right\} \quad \text{given random sampling}
\end{aligned}
$$

The likelihood function reflects the information in the data $\{Y_1, \dots, Y_n\}$ about the unknown parameter $\theta$.
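As a minimal illustration of the argmax definition above, the sketch below uses a hypothetical i.i.d. Bernoulli sample (made-up data, not the BLP dataset). It evaluates the log-likelihood $\sum_i \ln f(Y_i; \theta)$ on a grid of $\theta$ values and picks the maximiser. Taking logs turns the product into a sum without moving the argmax, and for a Bernoulli sample the maximiser should coincide with the sample mean.

```python
import numpy as np

# Hypothetical i.i.d. Bernoulli sample (illustrative only)
Y = np.array([1, 0, 1, 1, 0, 1, 1, 0])

def log_likelihood(theta, y):
    # ln L(theta) = sum_i ln f(Y_i; theta), with f(y; theta) = theta^y * (1 - theta)^(1 - y)
    return np.sum(y * np.log(theta) + (1 - y) * np.log(1 - theta))

# Evaluate ln L on a grid of theta values strictly inside (0, 1)
grid = np.linspace(0.001, 0.999, 999)
loglik = np.array([log_likelihood(t, Y) for t in grid])

# The grid point with the highest log-likelihood approximates the MLE
theta_hat = grid[np.argmax(loglik)]
print(theta_hat, Y.mean())
```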

In our specific case, instead of a single parameter $\theta$, we maximise the likelihood jointly over $\alpha$ and $\beta$ given the observed data.

McFadden showed that, when the errors $\epsilon_{t}^{k}$ are i.i.d. type I extreme value, the probability that consumer $i$ chooses product $k$ equals

$$
P_{ik} = \frac{\exp(\delta_{t}^{k})}{\sum_{l=0}^{m_i} \exp(\delta_{t}^{l})},
$$

where we rewrite the utility of product $k$ from

$$
U_{t}^{k} = \beta x_{t}^{k} - \alpha p_{t}^{k} + \epsilon_{t}^{k} \quad \text{for } k = 1,2,
$$

with the outside option ($k = 0$) normalised so that its mean utility is zero, i.e. $U_{t}^{0} = \epsilon_{t}^{0}$, to

$$
U_{t}^{k} = \delta_{t}^{k} + \epsilon_{t}^{k} \quad \text{where } \delta_{t}^{k} = \beta x_{t}^{k} - \alpha p_{t}^{k} \text{ and } \delta_{t}^{0} = 0.
$$
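A minimal sketch of how the logit probabilities feed the likelihood in $(\alpha, \beta)$. It assumes hypothetical individual-level data: one market with two inside products ($k = 1, 2$) plus the outside option, made-up characteristics `x`, prices `p` and observed choices `choices`, none of which come from the EC313 dataset. The outside option has $\delta_t^0 = 0$, so it contributes $\exp(0) = 1$ to the denominator.

```python
import numpy as np

def logit_probs(alpha, beta, x, p):
    """Choice probabilities for the outside option (index 0) and inside products 1..m.

    delta_k = beta * x_k - alpha * p_k for inside goods; delta_0 = 0 for the outside option.
    """
    delta = np.concatenate(([0.0], beta * x - alpha * p))
    # Subtracting the max before exponentiating avoids overflow; probabilities are unchanged
    expd = np.exp(delta - delta.max())
    return expd / expd.sum()

def log_likelihood(params, x, p, choices):
    """ln L(alpha, beta) = sum_i ln P_{i, k_i}, where k_i is consumer i's observed choice."""
    alpha, beta = params
    probs = logit_probs(alpha, beta, x, p)
    return np.sum(np.log(probs[choices]))

# Hypothetical market: two inside products with characteristic x and price p
x = np.array([1.2, 1.6])
p = np.array([5.0, 7.0])
# Hypothetical observed choices of 10 consumers (0 = outside option)
choices = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 0])

probs = logit_probs(alpha=0.3, beta=1.0, x=x, p=p)
print(probs, probs.sum())
print(log_likelihood((0.3, 1.0), x, p, choices))
```

In practice, $(\hat\alpha, \hat\beta)$ would be found by passing the negative of `log_likelihood` to a numerical optimiser such as `scipy.optimize.minimize`.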


Therefore, 



## Empirical Estimation

In [2]:
import pandas as pd
import statsmodels.api as sm

# Define the file path
file_path = "/Users/danielseymour/Downloads/EC313_BLP.dta"

# Read the Stata file
df = pd.read_stata(file_path)

# Display the first few rows of the dataset
df.head()

|   | year | firm | share | outshare | size | weighthp | air | mileage | price | delta |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 71.0 | 15.0 | 0.001051 | 0.880106 | 1.1502 | 0.528997 | 0.0 | 1.888146 | 4.935802 | -6.730022 |
| 1 | 71.0 | 15.0 | 0.00067 | 0.880106 | 1.278 | 0.494324 | 0.0 | 1.935989 | 5.516049 | -7.180407 |
| 2 | 71.0 | 15.0 | 0.000341 | 0.880106 | 1.4592 | 0.467613 | 0.0 | 1.716799 | 7.108642 | -7.857304 |
| 3 | 71.0 | 15.0 | 0.000522 | 0.880106 | 1.6068 | 0.42654 | 0.0 | 1.687871 | 6.839506 | -7.429667 |
| 4 | 71.0 | 15.0 | 0.000442 | 0.880106 | 1.6458 | 0.452489 | 0.0 | 1.504286 | 8.928395 | -7.595532 |


In [4]:
df.shape

(2218, 10)

In [9]:
df.describe()

|   | year | firm | share | outshare | size | weighthp | air | mileage | price | delta |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 2217.0 | 2217.0 | 2217.0 | 2217.0 | 2217.0 | 2217.0 | 2217.0 | 2217.0 | 2218.0 | 2217.0 |
| mean | 81.54037 | 13.743798 | 0.0009732483 | 0.892779 | 1.310159 | 0.394375 | 0.241768 | 2.08487 | 11.756117 | -7.550388 |
| std | 5.740816 | 6.25909 | 0.001144919 | 0.013642 | 0.237637 | 0.096643 | 0.428251 | 0.698056 | 8.645435 | 1.381634 |
| min | 71.0 | 1.0 | 7.01413e-07 | 0.871395 | 0.756 | 0.170455 | 0.0 | 0.84607 | 0.0 | -14.048878 |
| 25% | 77.0 | 8.0 | 0.000186254 | 0.879859 | 1.131279 | 0.336585 | 0.0 | 1.557015 | 6.711497 | -8.483488 |
| 50% | 82.0 | 16.0 | 0.000569844 | 0.891802 | 1.269827 | 0.375049 | 0.0 | 2.010417 | 8.728322 | -7.356074 |
| 75% | 87.0 | 19.0 | 0.001285526 | 0.907801 | 1.4527 | 0.427509 | 0.0 | 2.482848 | 13.056321 | -6.542033 |
| max | 90.0 | 26.0 | 0.00947277 | 0.918871 | 1.888 | 0.947581 | 1.0 | 6.436827 | 68.596774 | -4.521674 |


In [6]:
df.isna().sum()

year        1
firm        1
share       1
outshare    1
size        1
weighthp    1
air         1
mileage     1
price       0
delta       1
dtype: int64

What is delta?

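Delta ($\delta_t^k$) is the mean utility of product $k$, computed as the log market share of the good minus the log market share of the outside option: $\delta_t^k = \ln s_t^k - \ln s_t^0$. This inverts the logit share formula, since with $\delta_t^0 = 0$ the share ratio is $s_t^k / s_t^0 = \exp(\delta_t^k)$.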
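As a quick check of this definition, the sketch below recomputes `delta` from the `share` and `outshare` values printed by `df.head()` above. Agreement is only up to the six-decimal rounding of the displayed shares; on the full dataset the same computation would be `np.log(df['share']) - np.log(df['outshare'])`.

```python
import numpy as np
import pandas as pd

# share, outshare and delta for the five rows printed by df.head() above (rounded as displayed)
rows = pd.DataFrame({
    "share":    [0.001051, 0.00067, 0.000341, 0.000522, 0.000442],
    "outshare": [0.880106, 0.880106, 0.880106, 0.880106, 0.880106],
    "delta":    [-6.730022, -7.180407, -7.857304, -7.429667, -7.595532],
})

# Mean utility from the logit inversion: delta = ln(s_k) - ln(s_0)
rows["delta_check"] = np.log(rows["share"]) - np.log(rows["outshare"])
rows["abs_diff"] = (rows["delta_check"] - rows["delta"]).abs()
print(rows)
```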

In [7]:
# First specification: include all variables

# Stata equivalent: reg delta price size weighthp air mileage, vce(robust)
# vce(robust) gives heteroskedasticity-consistent standard errors with the
# same n/(n-k) scaling as statsmodels' HC1

import numpy as np

# Dependent variable and regressors (all product characteristics, including price)
y = df['delta']
X = df[['price', 'size', 'weighthp', 'air', 'mileage']]

# Drop rows with missing or infinite values in y or X so both stay aligned
# (np.nan keeps the columns float64; pd.NA may coerce them to object dtype)
data = pd.concat([y, X], axis=1).replace([np.inf, -np.inf], np.nan).dropna()
y = data['delta']

# Add a constant term to the regressors
X = sm.add_constant(data.drop(columns='delta'))

# Fit OLS with HC1 robust standard errors and display the results
model = sm.OLS(y, X).fit(cov_type='HC1')
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                  delta   R-squared:                       0.387
Model:                            OLS   Adj. R-squared:                  0.386
Method:                 Least Squares   F-statistic:                     295.4
Date:                Tue, 25 Feb 2025   Prob (F-statistic):          1.57e-242
Time:                        14:07:07   Log-Likelihood:                -3319.4
No. Observations:                2217   AIC:                             6651.
Df Residuals:                    2211   BIC:                             6685.
Df Model:                           5                                         
Covariance Type:                  HC1                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const        -10.0716      0.258    -39.102      0.0
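The cells use `cov_type='HC1'` because it applies the same $n/(n-k)$ small-sample scaling as Stata's `vce(robust)`. As a minimal NumPy-only sketch of what that covariance is, the hypothetical helper `hc1_se` below builds the HC1 sandwich $\frac{n}{n-k}(X'X)^{-1} X' \mathrm{diag}(\hat e_i^2) X (X'X)^{-1}$ by hand on a tiny made-up dataset (not the car data).

```python
import numpy as np

def hc1_se(X, y):
    """OLS coefficients and HC1 (heteroskedasticity-robust) standard errors.

    X must already contain a constant column; n/(n-k) is the HC1 small-sample scaling.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    # "Meat" of the sandwich: X' diag(e_i^2) X
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = n / (n - k) * XtX_inv @ meat @ XtX_inv
    return b, np.sqrt(np.diag(cov))

# Tiny made-up example: constant plus one regressor
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 1.0, 3.0])
b, se = hc1_se(X, y)
print(b, se)
```

Applied to the regression above, `hc1_se(X.to_numpy(dtype=float), y.to_numpy(dtype=float))` should reproduce the `std err` column of `model.summary()` up to floating-point error.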

In [8]:
# Second specification: all variables except air conditioning

# Stata equivalent: reg delta price size weighthp mileage, vce(robust)
# vce(robust) gives heteroskedasticity-consistent standard errors with the
# same n/(n-k) scaling as statsmodels' HC1

import numpy as np

# Dependent variable and regressors (product characteristics and price, excluding air)
y = df['delta']
X = df[['price', 'size', 'weighthp', 'mileage']]

# Drop rows with missing or infinite values in y or X so both stay aligned
# (np.nan keeps the columns float64; pd.NA may coerce them to object dtype)
data = pd.concat([y, X], axis=1).replace([np.inf, -np.inf], np.nan).dropna()
y = data['delta']

# Add a constant term to the regressors
X = sm.add_constant(data.drop(columns='delta'))

# Fit OLS with HC1 robust standard errors and display the results
model = sm.OLS(y, X).fit(cov_type='HC1')
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                  delta   R-squared:                       0.387
Model:                            OLS   Adj. R-squared:                  0.386
Method:                 Least Squares   F-statistic:                     362.6
Date:                Tue, 25 Feb 2025   Prob (F-statistic):          2.92e-240
Time:                        14:07:22   Log-Likelihood:                -3319.5
No. Observations:                2217   AIC:                             6649.
Df Residuals:                    2212   BIC:                             6677.
Df Model:                           4                                         
Covariance Type:                  HC1                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const        -10.0545      0.257    -39.127      0.0

## Results

First specification with all variables:
Price, size and mileage are statistically significant. Air conditioning and weight/hp aren't.


Second specification with all variables except air conditioning:
Weight/hp remains statistically insignificant.


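All coefficients have the expected signs. $R^2$ is essentially unchanged between the two regressions (0.387 in both), so dropping air conditioning costs almost no explanatory power. At 0.387, however, the fit is moderate rather than high: the product characteristics explain under 40% of the variation in $\delta$.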

Next, we want to calculate the own- and cross-price elasticities for firms 15 (A) and 16 (B) in 1971 for the two specifications considered.

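We have already verified the formula for price elasticities in the multinomial logit model,

$$
\eta_{jk} = -\alpha p_k (1_{\{j=k\}} - s_k),
$$

where $\eta_{jk}$ is the elasticity of the market share of product $j$ with respect to the price of product $k$, $s_k$ is the market share of product $k$, and $1_{\{j=k\}}$ equals one for own-price elasticities and zero otherwise.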
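A minimal sketch checking the elasticity formula numerically, assuming a hypothetical single market with three inside products, made-up prices `p`, non-price utilities `xb` (standing in for $\beta x_k$) and a made-up `alpha`; the estimated price coefficient is not used here. It builds $\eta_{jk}$ from the formula and compares it with a central finite-difference estimate of $(\partial s_j / \partial p_k)(p_k / s_j)$. `logit_shares` and `elasticity_matrix` are illustrative helper names.

```python
import numpy as np

def logit_shares(p, xb, alpha):
    """Inside-good shares with delta_k = xb_k - alpha * p_k and the outside good's delta = 0."""
    expd = np.exp(xb - alpha * p)
    return expd / (1.0 + expd.sum())

def elasticity_matrix(p, s, alpha):
    """eta[j, k] = -alpha * p_k * (1{j == k} - s_k): elasticity of s_j with respect to p_k."""
    J = len(p)
    return -alpha * p[None, :] * (np.eye(J) - s[None, :])

# Hypothetical market: prices, non-price utility beta * x_k, and price coefficient alpha
p = np.array([5.0, 7.0, 9.0])
xb = np.array([1.0, 1.5, 2.0])
alpha = 0.3

s = logit_shares(p, xb, alpha)
eta = elasticity_matrix(p, s, alpha)

# Numerical check: central finite difference of s_j with respect to p_k, scaled by p_k / s_j
h = 1e-6
eta_fd = np.empty_like(eta)
for k in range(len(p)):
    dp = np.zeros_like(p)
    dp[k] = h
    ds = (logit_shares(p + dp, xb, alpha) - logit_shares(p - dp, xb, alpha)) / (2 * h)
    eta_fd[:, k] = ds * p[k] / s

print(np.round(eta, 4))
print(np.allclose(eta, eta_fd, atol=1e-6))
```

Own-price elasticities on the diagonal are negative and cross-price elasticities are positive; within each column the off-diagonal entries are identical, the familiar IIA property of logit substitution patterns.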
