# ECON 8601 - IO
## Homework 3 - BLP Exercises
### Xiang Liu
#### Department of Economics, University of Minnesota

This notebook contains the Berry-Levinsohn-Pakes (BLP) exercises based on BLP (1995): the logit model (a naive model of demand without interactions between individual and product characteristics), and the model with interactions, estimated without and with BLP instruments.

Additionally, wherever I use BLP instruments, I also add the Gandhi-Houde (GH) instruments introduced in Gandhi & Houde (2020) and compare the results.

Please note that the code in this notebook is written to be easy to follow rather than fast. I'm working on a JIT (Just-In-Time) compiled Python version and a Julia version, both of which I expect to perform much better.

In [1]:
# pip install numpy scipy pandas statsmodels scikit-learn matplotlib seaborn jax autograd


In [2]:
# Imports for numerical operations and statistics
import numpy as np
import scipy.stats as stats
import pandas as pd

# Advanced data structures
import collections

# Threading and multiprocessing
from concurrent.futures import ThreadPoolExecutor

# Statistical models, GLM, and formula API
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Optimization
from scipy import optimize

# Automatic differentiation with autograd (choose one based on preference)
# For basic use:
import autograd.numpy as anp
from autograd import grad

# For more advanced use, including GPU support:
import jax.numpy as jnp
from jax import grad as jax_grad  # aliased so it does not shadow autograd's grad

import numba as nb
from numba import jit



Importing data:

In [3]:

df = pd.read_csv("cardata.txt", delimiter='\t')
print(df.head())


  vehicle_name  year  horsepower_weight  length_width  ac_standard  \
0       AD100L    71           0.487908        1.2627            0   
1       ADSUPE    71           0.465766        1.1136            0   
2       AMAMBS    71           0.452489        1.6458            0   
3       AMGREM    71           0.528997        1.1502            0   
4       AMHORN    71           0.494324        1.2780            0   

   miles_per_dollar     price  market_share  firmid  
0          1.982720  8.876543      0.000281       7  
1          2.201909  7.641975      0.000038       7  
2          1.504286  8.928395      0.000442      15  
3          1.888146  4.935803      0.001051      15  
4          1.935989  5.516049      0.000670      15  


## Exercise 1 - the Logit Model

Assume utility is given by 
$$u_{ij} = \alpha (y_i - p_j) + x_j\beta + \epsilon_{ij}, \forall j = 0, 1, \cdots, J_t$$
with $\epsilon_{ij}$ type 1 extreme value, $y_i$ income, $x_j, p_j$ observed characteristics, and parameters $\beta, \alpha$.

Derive the appropriate likelihood function for the aggregated data by starting with the micro-level specification and regrouping. 


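The mean utility $\delta_j$ of product $j$, based on its characteristics $X$ and price $P$, is given below. The term $\alpha y_i$ is common to all options, so it drops out of the choice probabilities: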

$$\delta_j(X, P, \beta, \alpha) = \beta \cdot X - \alpha P$$

The market share $s_j$ for a product $j$, derived from the utility, is calculated as:

$$s_j = \frac{e^{\delta_j}}{1 + \sum_{k} e^{\delta_k}}$$

Given observed market shares $\hat{s}_j$, the objective function for likelihood maximization is:

$$\mathcal{L}(\beta, \alpha) = \sum_{j} \log(s_j) \cdot \hat{s}_j$$

where $\mathcal{L}(\beta, \alpha)$ is to be maximized with respect to $\beta$ and $\alpha$.
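
One way to get from the micro-level specification to this objective is sketched here. It assumes a market with $N$ consumers and an indicator $d_{ij} = 1$ if consumer $i$ chooses option $j$; neither symbol appears in the text above, so both are notation introduced for illustration. Because $\alpha y_i$ cancels from the choice probabilities, every consumer chooses $j$ with the same probability $s_j$, and the log likelihood regroups by product:

$$\log L = \sum_{i=1}^{N} \sum_{j} d_{ij} \log s_j = \sum_{j} n_j \log s_j = N \sum_{j} \hat{s}_j \log s_j, \qquad n_j = \sum_{i=1}^{N} d_{ij}, \quad \hat{s}_j = \frac{n_j}{N}.$$

Dropping the constant factor $N$ does not change the maximizer, which leaves the share-weighted objective $\sum_j \hat{s}_j \log s_j$.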

Now, in order to do the following exercises, we convert these calculations into functions: 

In [4]:

def δ_j(X, P, β, α):
    """
    Calculate the value based on characteristics, price, and their respective coefficients.
    
    Parameters:
    X (np.array): Vector of characteristics.
    P (float): Price.
    β (np.array): Coefficients for characteristics.
    α (float): Coefficient for price.
    
    Returns:
    float: Calculated value.
    """
    return np.dot(β, X) - α * P


In [5]:
from collections import OrderedDict

def s_vec(df):
    """
    Calculate the market share for all products based on δ values.
    
    Parameters:
    df (pd.DataFrame): DataFrame containing product data and δ values.
    
    Returns:
    np.array: Array of market shares for all products.
    """
    markets = df['year'].unique()
    results = OrderedDict()
    
    for j in markets:
        # Filter df for the current market
        market_df = df[df['year'] == j]
        δ = market_df['δ_vec'].values
        market_share = np.exp(δ) / (1 + np.sum(np.exp(δ)))
        results[j] = market_share
    
    # Concatenate the market shares for all markets into a single array
    all_market_shares = np.concatenate(list(results.values()))
    return all_market_shares


In [6]:
def obj_func(s_j, s_hat_j):
    """
    Calculate the product of the logarithm of estimated market share and observed market share.
    
    Parameters:
    s_j (float): Estimated market share from s_vec.
    s_hat_j (float): Observed market share for product j.
    
    Returns:
    float: Result of the calculation, objective function.
    """
    return np.log(s_j) * s_hat_j


In [7]:
def compute_δ_vec(data, β, α, characteristics, constant=False):
    """
    Compute utility for each vehicle.
    
    Parameters:
    data (pd.DataFrame): DataFrame containing vehicle data.
    β (np.array): Coefficients for characteristics.
    α (float): Coefficient for price.
    characteristics (list of str): Column names in 'data' corresponding to characteristics.
    constant (bool): If True, adds a constant term to the characteristics vector.
    
    Returns:
    list: A list of utility values (δ_vec) for each vehicle.
    """
    δ_vec = []
    for i in range(len(data)):
        # Extract characteristics and price for each vehicle
        X = [data.iloc[i][c] for c in characteristics]
        if constant:
            X = [1.0] + X
        P = data.iloc[i]['price']

        # Calculate utility and append to δ_vec
        δ_vec.append(δ_j(X, P, β, α))
    return δ_vec


In [8]:
def total_log_likelihood(β, α, data, characteristics):
    """
    Compute the total log likelihood for the given parameters and data.
    
    Parameters:
    β (np.array): Coefficients for characteristics.
    α (float): Coefficient for price.
    data (pd.DataFrame): DataFrame containing the data, must include 'market_share'.
    characteristics (list of str): Column names in 'data' corresponding to characteristics.
    
    Returns:
    float: The total log likelihood.
    """
    total_ll = 0.0
    
    δ_vec = compute_δ_vec(data, β, α, characteristics)
    data['δ_vec'] = δ_vec  # Add δ_vec as a new column to the DataFrame
    
    s_vec_estimated = s_vec(data)  # Predicted shares, computed market by market
    
    # Sum over j, products
    for i in range(len(data)):
        total_ll += obj_func(s_vec_estimated[i], data.iloc[i]['market_share'])
        
    return total_ll
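
The row-by-row loops above are easy to read but slow. A vectorized variant might look like the sketch below. It is not the code used for the results in this notebook; the function names `logit_shares` and `total_log_likelihood_vec` and the array-based signatures are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def logit_shares(delta, market_ids):
    """Logit shares with the outside good's mean utility normalized to zero."""
    exp_delta = np.exp(np.asarray(delta, dtype=float))
    # Sum exp(delta) within each market and broadcast the sum back to every row
    market_sum = (
        pd.Series(exp_delta).groupby(np.asarray(market_ids)).transform("sum").to_numpy()
    )
    return exp_delta / (1.0 + market_sum)


def total_log_likelihood_vec(beta, alpha, X, price, market_ids, observed_shares):
    """Share-weighted log likelihood: sum_j s_hat_j * log(s_j)."""
    delta = X @ beta - alpha * price
    s = logit_shares(delta, market_ids)
    return float(np.sum(observed_shares * np.log(s)))
```

With the notebook's data, `X` would be `df[characteristics].to_numpy()`, `price` would be `df['price'].to_numpy()`, and `market_ids` would be `df['year'].to_numpy()`.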


## Exercise 2 - MLE to the Logit Model

Using the automobile data, estimate the logit demand specification using maximum likelihood and assuming prices are exogenous. What is the implied own-price elasticity of the 1990 Honda Accord (HDACCO)? What is the implied cross-elasticity of the Honda Accord with respect to the 1990 Ford Escort (FDESCO)? Pick two additional cars and report the same numbers. 

### Slow code

In [9]:
from scipy.optimize import minimize

# # Initial parameter guesses
# initial_β = np.array([0.1, 0.5, 0.3, 0.2])
# initial_α = 0.4
# characteristics = ['horsepower_weight', 'length_width', 'ac_standard', 'miles_per_dollar']

# # Combine initial β and α into a single array for optimization
# initial_params = np.concatenate([initial_β, [initial_α]])

# # Define a wrapper function for the total_log_likelihood to fit minimize function requirements
# def optimization_wrapper(params):
#     β = params[:-1]
#     α = params[-1]
#     # Assuming df is defined and contains the necessary data
#     return -total_log_likelihood(β, α, df, characteristics)

# # Optimization
# result = minimize(optimization_wrapper, initial_params)

# # Extract optimized parameters
# optimized_parameters = result.x
# optimized_β = optimized_parameters[:-1]
# optimized_α = optimized_parameters[-1]

# print("Optimized β: ", ", ".join([f"{b:.4f}" for b in optimized_β]))
# print("Optimized α: {:.4f}".format(optimized_α))


In [10]:

import numdifftools as nd
from numpy.linalg import inv

# # Assuming total_log_likelihood is defined and optimized_parameters are obtained from the optimization process

# def neg_total_log_likelihood(params):
#     β = params[:-1]
#     α = params[-1]
#     # Assuming df and characteristics are defined and accessible here
#     return -total_log_likelihood(β, α, df, characteristics)

# # Calculate the Hessian matrix of the negative log likelihood at the optimized parameters
# H = nd.Hessian(neg_total_log_likelihood)(optimized_parameters)

# # Invert the Hessian to get the covariance matrix, then take the diagonal (variances) and square root (standard errors)
# se = np.sqrt(np.diag(inv(H)))

# # Format and print the standard errors for β and α
# se_β = ", ".join([f"{b:.4f}" for b in se[:-1]])
# se_α = f"{se[-1]:.4f}"

# print("SE β: ", se_β)
# print("SE α: ", se_α)


### Slightly faster when combined into one cell:

In [11]:

# Initial parameter setup
initial_β = np.array([0.1, 0.5, 0.3, 0.2])
initial_α = 0.4
characteristics = ['horsepower_weight', 'length_width', 'ac_standard', 'miles_per_dollar']
initial_params = np.concatenate([initial_β, [initial_α]])

# Optimization wrapper function
def optimization_wrapper(params):
    β = params[:-1]
    α = params[-1]
    return -total_log_likelihood(β, α, df, characteristics)

# Perform optimization
result = minimize(optimization_wrapper, initial_params)

# Extract optimized parameters
optimized_parameters = result.x
optimized_β = optimized_parameters[:-1]
optimized_α = optimized_parameters[-1]

# Hessian and SE calculation
H = nd.Hessian(optimization_wrapper)(optimized_parameters)
se = np.sqrt(np.diag(inv(H)))

# Display results
print("Optimized β: ", ", ".join([f"{b:.4f}" for b in optimized_β]))
print("Optimized α: {:.4f}".format(optimized_α))
print("SE β: ", ", ".join([f"{b:.4f}" for b in se[:-1]]))
print("SE α: {:.4f}".format(se[-1]))


Optimized β:  -0.0437, 1.8839, 0.1797, 0.1834
Optimized α: 0.1357
SE β:  11.0999, 3.9349, 3.1170, 1.9030
SE α: 0.3270


### Elasticities:

In [12]:
def elasticity(df, car1, car2, year, α_hat):
    """
    Calculate own and cross price elasticities for any two given vehicles.
    
    Parameters:
    df (pd.DataFrame): DataFrame containing vehicle data.
    car1 (str): Vehicle name for which to calculate own and cross elasticity.
    car2 (str): Vehicle name for cross elasticity calculation.
    year (int): The year to filter the vehicles by.
    α_hat (float): Estimated coefficient for price.
    
    Returns:
    tuple: Own and cross price elasticities.
    """
    # Filter the DataFrame for the specific vehicles and year, 
    # and extract market share and price, as exogenously given
    
    s_j = df[(df['vehicle_name'] == car1) & (df['year'] == year)]['s'].iloc[0]
    s_k = df[(df['vehicle_name'] == car2) & (df['year'] == year)]['s'].iloc[0]
    p_j = df[(df['vehicle_name'] == car1) & (df['year'] == year)]['price'].iloc[0]
    p_k = df[(df['vehicle_name'] == car2) & (df['year'] == year)]['price'].iloc[0]
    
    # Calculate own and cross price elasticities
    elas_own = p_j / s_j * (-α_hat) * s_j * (1 - s_j)
    elas_cros = p_k / s_j * α_hat * s_k * s_j
    
    # Print the results
    print(f"{car1} Own elasticity: {elas_own:.9f}")
    print(f"{car1} {car2} Cross elasticity: {elas_cros:.9f}")
    
    return elas_own, elas_cros



In [13]:
# Extract optimized β and α from optimized_parameters
β_hat = optimized_parameters[:-1]  
α_hat = optimized_parameters[-1]

# Compute δ_vec using the compute_δ_vec function with the optimized parameters
δ_vec = compute_δ_vec(df, β_hat, α_hat, characteristics)

# Assign δ_vec to a new column in the DataFrame
df['δ_vec'] = δ_vec  

# Compute estimated s_vec using the s_vec function
s_vec_estimated = s_vec(df)

# Assign s_vec_estimated to a new column in the DataFrame
df['s'] = s_vec_estimated


Own-price elasticity 
$$ \frac{\partial s_j}{\partial p_j} \times \frac{p_j}{s_j} = \frac{p_j}{s_j} \times [-\alpha s_j (1-s_j)] $$

Cross-price elasticity 
$$ \frac{\partial s_j}{\partial p_k} \times \frac{p_k}{s_j}=\frac{p_k}{s_j} \times \alpha s_j s_k $$
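
The two formulas simplify to $-\alpha p_j (1 - s_j)$ for the own-price elasticity and $\alpha p_k s_k$ for the cross-price elasticity, so the cross elasticity with respect to $p_k$ is the same for every product $j$. A minimal sketch of the full elasticity matrix for one market follows; the function name `logit_elasticity_matrix` is hypothetical and is not used elsewhere in this notebook.

```python
import numpy as np


def logit_elasticity_matrix(prices, shares, alpha):
    """Entry (j, k) is the elasticity of s_j with respect to p_k in a plain logit."""
    prices = np.asarray(prices, dtype=float)
    shares = np.asarray(shares, dtype=float)
    # Off-diagonal (j, k): alpha * p_k * s_k, identical across rows j
    E = np.tile(alpha * prices * shares, (len(prices), 1))
    # Diagonal (j, j): -alpha * p_j * (1 - s_j)
    np.fill_diagonal(E, -alpha * prices * (1.0 - shares))
    return E
```

Each column of the matrix is constant off the diagonal, which is the usual restrictive substitution pattern of the plain logit.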

In [14]:
# Compute elasticity for HDACCO and FDESCO
elas_own, elas_cros = elasticity(df, "HDACCO", "FDESCO", 90, α_hat)

# Compute elasticity for CRDYNA and CRLANC
elas_own, elas_cros = elasticity(df, "CRDYNA", "CRLANC", 89, α_hat)

# Compute elasticity for CHSUMM and CHTC
elas_own, elas_cros = elasticity(df, "CHSUMM", "CHTC", 89, α_hat)


HDACCO Own elasticity: -1.248946891
HDACCO FDESCO Cross elasticity: 0.011376004
CRDYNA Own elasticity: -1.333202079
CRDYNA CRLANC Cross elasticity: 0.011559278
CHSUMM Own elasticity: -1.012474398
CHSUMM CHTC Cross elasticity: 0.003007558


## Exercise 3 - Linear Regression of the Logit Model

Estimate the logit demand specification using the linearized version of this model from BLP (i.e. regress $\log s_j - \log s_0$ on characteristics). What is the implied own-price elasticity of the 1990 Honda Accord (HDACCO)? What is the implied cross-elasticity of the Honda Accord with respect to the 1990 Ford Escort (FDESCO)? Pick two additional cars and report the same numbers. 
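
The linearization follows from the share formula of Exercise 1. As a short sketch, write $s_0$ for the share of the outside good, whose mean utility is normalized to zero:

$$s_j = \frac{e^{\delta_j}}{1 + \sum_k e^{\delta_k}}, \qquad s_0 = \frac{1}{1 + \sum_k e^{\delta_k}} \quad \Longrightarrow \quad \log s_j - \log s_0 = \delta_j = x_j \beta - \alpha p_j.$$

An OLS regression of $\log s_j - \log s_0$ on the characteristics and price therefore recovers $\beta$ and $-\alpha$.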

In [15]:
# Group the DataFrame by 'year' and calculate s_0 for each group/year
df['s_0'] = df.groupby('year')['market_share'].transform(lambda x: 1 - x.sum())



In [16]:

# Prepare X: Add a column of ones to df for the intercept, then select the required columns
X = np.hstack([np.ones((df.shape[0], 1)), df[['horsepower_weight', 'length_width', 'ac_standard', 'miles_per_dollar', 'price']].values])

# Create the vector Y
Y = np.log(df['market_share'].values) - np.log(df['s_0'].values)

# Manual OLS regression using the Normal Equation to compute β
β_manual = np.linalg.inv(X.T @ X) @ X.T @ Y

# Extract β_hat and α_hat from the manual calculation
β_hat_manual = β_manual[:-1]
α_hat_manual = -β_manual[-1]

# Use statsmodels to fit the model and extract standard errors
model = sm.OLS(Y, X)
OLS_results = model.fit()

# Standard errors
se_β = OLS_results.bse[:-1]  # Standard errors for β
se_α = OLS_results.bse[-1]  # Standard error for α

# Printing results
print("β_hat: ", ", ".join([f"{b:.4f}" for b in β_hat_manual]))
print("α_hat: {:.4f}".format(α_hat_manual))
print("SE β: ", ", ".join([f"{se:.4f}" for se in se_β]))
print("SE α: {:.4f}".format(se_α))


β_hat:  -10.0716, -0.1242, 2.3421, -0.0343, 0.2650
α_hat: 0.0886
SE β:  0.2529, 0.2773, 0.1252, 0.0728, 0.0431
SE α: 0.0040


In [17]:
# Display the summary from statsmodels
print("\nStatsmodels Summary:")
print(OLS_results.summary())

# Extract β_hat and α_hat from statsmodels results for comparison.
# Keep the constant term in β_hat, since it is needed to calculate the elasticities.
β_hat = OLS_results.params[:-1]
α_hat = -OLS_results.params[-1]



Statsmodels Summary:
                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.387
Model:                            OLS   Adj. R-squared:                  0.386
Method:                 Least Squares   F-statistic:                     279.2
Date:                Sat, 09 Mar 2024   Prob (F-statistic):          6.52e-232
Time:                        20:30:09   Log-Likelihood:                -3319.3
No. Observations:                2217   AIC:                             6651.
Df Residuals:                    2211   BIC:                             6685.
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        -10.0716      0.2

### Elasticities

In [18]:
# Step 1: Compute δ vector and update df
δ_vec = compute_δ_vec(df, β_hat, α_hat, characteristics, constant=True)
df['δ_vec'] = δ_vec  # Directly assign the computed δ vector to a new column

# Step 2: Calculate estimated market shares and update df
s_vec_estimated = s_vec(df)
df['s'] = s_vec_estimated  # Directly assign the estimated market shares to a new column

# Step 3: Calculate elasticities for specified product pairs
elas_own_hdacco_fdesco, elas_cros_hdacco_fdesco = elasticity(df, "HDACCO", "FDESCO", 90, α_hat)
elas_own_crdyna_crlanc, elas_cros_crdyna_crlanc = elasticity(df, "CRDYNA", "CRLANC", 89, α_hat)
elas_own_chsumm_chtc, elas_cros_chsumm_chtc = elasticity(df, "CHSUMM", "CHTC", 89, α_hat)

HDACCO Own elasticity: -0.823079102
HDACCO FDESCO Cross elasticity: 0.000451967
CRDYNA Own elasticity: -0.878261183
CRDYNA CRLANC Cross elasticity: 0.000526359
CHSUMM Own elasticity: -0.667692045
CHSUMM CHTC Cross elasticity: 0.000228044


## Exercise 4.1 - BLP Instruments in Logit Model

Write

$$u_{ij} = \alpha \ln(y_i - p_j) + x_j \beta + \xi_j + \epsilon_{ij}, \forall j = 0, 1, \cdots, J.$$

Use the instruments used in BLP. You will need the firmid and year variables to calculate these instruments (they are product-firm-year specific). Estimate the logit model using 2SLS and instrumenting for price. 

What is the implied own-price elasticity of the 1990 Honda Accord (HDACCO)? What is the implied cross-elasticity of the Honda Accord with respect to the 1990 Ford Escort (FDESCO)? Pick two additional cars and report the same numbers. 



### Define BLP Instruments:

In [19]:
def construct_instrument(data, characteristic):
    # Define new column names based on the characteristic
    own_firm_sum = f"own_firm_sum_{characteristic}"
    rival_firm_sum = f"rival_firm_sum_{characteristic}"
    
    # Initialize the new columns with zeros
    data[own_firm_sum] = 0.0
    data[rival_firm_sum] = 0.0
    
    # Iterate over the DataFrame by index
    for i in data.index:
        current_firm = data.at[i, 'firmid']
        current_time = data.at[i, 'year']
        
        # Filter data for the current firm and time period
        own_firm_data = data[(data['firmid'] == current_firm) & (data['year'] == current_time)]
        rival_firm_data = data[(data['firmid'] != current_firm) & (data['year'] == current_time)]
        
        # Sum of characteristic for own-firm products (excluding current product)
        data.at[i, own_firm_sum] = own_firm_data[characteristic].sum() - data.at[i, characteristic]
        
        # Sum of characteristic for rival-firm products
        data.at[i, rival_firm_sum] = rival_firm_data[characteristic].sum()
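
The same own-firm and rival-firm sums can be obtained without the row loop by using grouped totals. The sketch below is an alternative to `construct_instrument`, not the code that produced the results in this notebook. The function name `construct_instrument_grouped` and the `firm_col` / `market_col` arguments are assumptions, and unlike the loop version it returns a copy instead of modifying `data` in place.

```python
import pandas as pd


def construct_instrument_grouped(data, characteristic, firm_col="firmid", market_col="year"):
    """Add own-firm and rival-firm sums of one characteristic using groupby totals."""
    # Total of the characteristic over each firm within each market
    firm_total = data.groupby([market_col, firm_col])[characteristic].transform("sum")
    # Total of the characteristic over all products in each market
    market_total = data.groupby(market_col)[characteristic].transform("sum")

    out = data.copy()
    # Own-firm sum excludes the product itself
    out[f"own_firm_sum_{characteristic}"] = firm_total - data[characteristic]
    # Rival sum is the market total minus everything sold by the own firm
    out[f"rival_firm_sum_{characteristic}"] = market_total - firm_total
    return out
```

Each product then costs two grouped sums and two subtractions, instead of two filtered scans of the full DataFrame.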



In [20]:
# Define the characteristics
characteristics = ['horsepower_weight', 'length_width', 'ac_standard', 'miles_per_dollar']

# Loop through characteristics to construct instruments
for char in characteristics:
    construct_instrument(df, char)

# Define the instrumental variables
instruments = characteristics + \
              [f"own_firm_sum_{char}" for char in characteristics] + \
              [f"rival_firm_sum_{char}" for char in characteristics]


In [21]:
# Convert DataFrame columns to arrays/matrices
Y = np.log(df['market_share'].values) - np.log(df['s_0'].values)
constant = np.ones(df.shape[0])

# Endogenous variable
X_endogenous = df[['price']].values

# Exogenous variables
X_exogenous = df[characteristics].values
X_exogenous = np.hstack([constant.reshape(-1, 1), X_exogenous])

# Instrumental variables
Z = df[instruments].values
Z = np.hstack([constant.reshape(-1, 1), Z])


In [22]:
# pip install linearmodels


In [23]:
# from linearmodels.iv import IV2SLS

# # Prepare the regressor and instrument matrices
# X = np.hstack([X_exogenous, X_endogenous])  # Combine exogenous and endogenous variables

# # Perform the IV 2SLS regression
# iv_model = IV2SLS(dependent=Y, 
#                   exog=X_exogenous,  # Only exogenous variables
#                   endog=X_endogenous,  # Endogenous variable(s)
#                   instruments=Z).fit()

# # Extract results
# β_hat = iv_model.params[:-1]  # Exclude the last one for α_hat
# α_hat = iv_model.params[-1]   # The last parameter is α_hat

# # Extract standard errors
# se_β = iv_model.std_errors[:-1]  # Standard errors for β_hat
# se_α = iv_model.std_errors[-1]   # Standard error for α_hat

# # Printing results
# print("β_hat: ", ", ".join([f"{b:.3f}" for b in β_hat]))
# print("α_hat: {:.3f}".format(α_hat))
# print("SE β: ", ", ".join([f"{se:.3f}" for se in se_β]))
# print("SE α: {:.3f}".format(se_α))


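This does not work: `linearmodels` raises `ValueError: instruments [exog instruments] do not have full column rank`. A likely cause is that `Z` already contains the constant and the exogenous characteristics, which are also passed as `exog`, so the stacked matrix has duplicated columns. I need to deal with this later; for now I use the `statsmodels` implementation below.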
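
A rank problem of this kind can be checked directly with NumPy. The sketch below uses simulated stand-in data, not the car data, and the helper `has_full_column_rank` is a hypothetical name. It illustrates the assumption that the error comes from passing the exogenous columns twice, once as `exog` and once inside the instrument matrix.

```python
import numpy as np


def has_full_column_rank(M):
    """True if the columns of M are linearly independent."""
    M = np.asarray(M, dtype=float)
    return np.linalg.matrix_rank(M) == M.shape[1]


rng = np.random.default_rng(0)
n = 50
const = np.ones((n, 1))
x = rng.normal(size=(n, 2))        # stand-in for the exogenous characteristics
z_excl = rng.normal(size=(n, 3))   # stand-in for the excluded instruments

exog = np.hstack([const, x])
Z_full = np.hstack([const, x, z_excl])   # repeats the exog columns, like Z above

stacked_bad = np.hstack([exog, Z_full])  # duplicated columns: rank deficient
stacked_ok = np.hstack([exog, z_excl])   # excluded instruments only: full rank

print(has_full_column_rank(stacked_bad), has_full_column_rank(stacked_ok))
```

If this is indeed the cause, passing only the own-firm and rival-firm sums as `instruments` should avoid the error, although I have not verified that on this dataset.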

In [24]:

from statsmodels.sandbox.regression.gmm import IV2SLS


# Prepare the full regressor matrix including a constant, exogenous, and endogenous variables
X = np.hstack([X_exogenous, X_endogenous])  # This includes the constant and exogenous variables

# Perform the IV 2SLS regression
# Note: statsmodels expects endog (Y), exog (X including constant and exogenous variables),
# and instrument (Z including constant and instruments) as parameters
iv_model = IV2SLS(endog=Y, exog=X, instrument=Z).fit()

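# Extract results and standard errors
# NOTE: unlike the OLS and GH cells, the price coefficient is not negated here,
# so α_hat holds the raw (negative) coefficient on price. Because compute_δ_vec
# and elasticity assume δ = Xβ - αP, the elasticities below come out with flipped signs.
β_hat = iv_model.params[:-1]  # Excluding the last one (the price coefficient)
α_hat = iv_model.params[-1]   # Raw price coefficient, not negated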
se = iv_model.bse  # Standard errors of the estimated parameters

# Printing results
print("β_hat: ", ", ".join([f"{b:.3f}" for b in β_hat]))
print("α_hat: {:.3f}".format(α_hat))
print("SE β: ", ", ".join([f"{se:.3f}" for se in se[:-1]]))
print("SE α: {:.3f}".format(se[-1]))


β_hat:  -9.902, 1.345, 2.287, 0.532, 0.163
α_hat: -0.140
SE β:  0.264, 0.407, 0.130, 0.134, 0.049
SE α: 0.011


### Elasticities:

In [25]:
# Step 1: Compute δ vector and update DataFrame
δ_vec = compute_δ_vec(df, β_hat, α_hat, characteristics, constant=True)
df['δ_vec'] = δ_vec

# Step 2: Calculate estimated market shares and update DataFrame
s_vec_estimated = s_vec(df)
df['s'] = s_vec_estimated

# Step 3: Calculate elasticities for specified product pairs
elas_own_hdacco_fdesco, elas_cros_hdacco_fdesco = elasticity(df, "HDACCO", "FDESCO", 90, α_hat)
elas_own_crdyna_crlanc, elas_cros_crdyna_crlanc = elasticity(df, "CRDYNA", "CRLANC", 89, α_hat)
elas_own_chsumm_chtc, elas_cros_chsumm_chtc = elasticity(df, "CHSUMM", "CHTC", 89, α_hat)

HDACCO Own elasticity: 1.299306150
HDACCO FDESCO Cross elasticity: -0.000093166
CRDYNA Own elasticity: 1.386705671
CRDYNA CRLANC Cross elasticity: -0.000041518
CHSUMM Own elasticity: 1.054232826
CHSUMM CHTC Cross elasticity: -0.002871674


## Exercise 4.2 - Gandhi-Houde (GH) Instruments in Logit Model

The functions below calculate the GH instruments. I construct only the quadratic instruments, without interaction terms.

In [28]:
def convert_IV_df(firm_distances, group, prefix):
    for char in firm_distances.keys():
        # Calculate row sums of the squared distances matrices
        row_sums = np.sum(firm_distances[char], axis=1)
        
        # Creating new column names based on the prefix and characteristic name
        column_name = prefix + char
        
        # Assign the row sums to the group DataFrame as new columns
        group[column_name] = row_sums

    return group

In [29]:
# Function to calculate GH IV, I only construct quadratic instruments without interaction terms

def GH_IV(df, characteristics):
    # Group the DataFrame by 'year'
    grouped_data = df.groupby('year')
    collected_groups = []

    for year, group in grouped_data:
        firm_intra_distances = {}
        firm_inter_distances = {}

        for char in characteristics:
            n = len(group)
            intra_distance_matrix = np.zeros((n, n))
            inter_distance_matrix = np.zeros((n, n))

            # Iterate through each pair of rows in the group to calculate distances
            for i in range(n):
                for j in range(n):
                    # Intrafirm distance
                    if group.iloc[i]['firmid'] == group.iloc[j]['firmid']:
                        intra_distance_matrix[i, j] = group.iloc[i][char] - group.iloc[j][char]
                    # Interfirm distance
                    elif group.iloc[i]['firmid'] != group.iloc[j]['firmid']:
                        inter_distance_matrix[i, j] = group.iloc[i][char] - group.iloc[j][char]

            firm_intra_distances[char] = intra_distance_matrix ** 2
            firm_inter_distances[char] = inter_distance_matrix ** 2

        group = convert_IV_df(firm_intra_distances, group, "intra_IV_")
        group = convert_IV_df(firm_inter_distances, group, "inter_IV_")
        collected_groups.append(group)

    final_df = pd.concat(collected_groups)

    return final_df
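
The double loop over product pairs can be replaced with NumPy broadcasting. The sketch below computes, for one market and one characteristic, the same row sums of squared differences that `GH_IV` assigns to the `intra_IV_` and `inter_IV_` columns. The function name `gh_quadratic_sums` is an assumption, and this is an illustration, not the code used for the results in this notebook.

```python
import numpy as np


def gh_quadratic_sums(x, firm_ids):
    """Row sums of squared characteristic differences, split by same firm vs rival firm."""
    x = np.asarray(x, dtype=float)
    firm_ids = np.asarray(firm_ids)
    # (n x n) matrix of squared differences (x_i - x_j)^2
    sq_diff = (x[:, None] - x[None, :]) ** 2
    # (n x n) boolean mask: True where products i and j belong to the same firm
    same_firm = firm_ids[:, None] == firm_ids[None, :]
    intra = np.where(same_firm, sq_diff, 0.0).sum(axis=1)
    inter = np.where(~same_firm, sq_diff, 0.0).sum(axis=1)
    return intra, inter
```

The diagonal entries are zero, so including a product's difference with itself in the same-firm sum has no effect, just as in the loop version.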




In [30]:
df = GH_IV(df, characteristics)

# Constructing GH_instruments list with original characteristics and the generated intra- and inter- IV names
GH_instruments = characteristics + \
                 ["intra_IV_" + char for char in characteristics] + \
                 ["inter_IV_" + char for char in characteristics]

# GH_instruments now contains all the required columns names


In [31]:
# Prepare the data for IV2SLS

X = np.hstack([X_exogenous, X_endogenous])  # This includes the constant and exogenous variables

# Instrumental variables
Z = df[GH_instruments].values
Z = np.hstack([constant.reshape(-1, 1), Z])

# Fit the model # Perform the IV 2SLS regression

iv_model_GH = IV2SLS(endog=Y, exog=X, instrument=Z).fit()


# Extract the estimated coefficients and standard errors
β_hat = iv_model_GH.params[:-1]
α_hat = -iv_model_GH.params[-1]
se_beta = iv_model_GH.bse[:-1]
se_alpha = iv_model_GH.bse[-1]


In [32]:

print(f"β_hat: {', '.join([f'{x:.3f}' for x in β_hat])}")
print(f"α_hat: {α_hat:.3f}")
print(f"SE β: {', '.join([f'{x:.3f}' for x in se_beta])}")
print(f"SE α: {se_alpha:.3f}")

β_hat: -9.908, 1.292, 2.289, 0.512, 0.167
α_hat: 0.138
SE β: 0.263, 0.361, 0.130, 0.113, 0.047
SE α: 0.009


In [33]:
# Assuming compute_δ_vec, s_vec, and elasticity functions are defined elsewhere
δ_vec = compute_δ_vec(df, β_hat, α_hat, characteristics, constant=True)
df['δ_vec'] = δ_vec

s_vec_estimated = s_vec(df)
df['s'] = s_vec_estimated

# For elasticity calculations
elas_own, elas_cros = elasticity(df, "HDACCO", "FDESCO", 90, α_hat)
elas_own, elas_cros = elasticity(df, "CRDYNA", "CRLANC", 89, α_hat)
elas_own, elas_cros = elasticity(df, "CHSUMM", "CHTC", 89, α_hat)


HDACCO Own elasticity: -1.281617801
HDACCO FDESCO Cross elasticity: 0.000699002
CRDYNA Own elasticity: -1.367689576
CRDYNA CRLANC Cross elasticity: 0.000703968
CHSUMM Own elasticity: -1.039727024
CHSUMM CHTC Cross elasticity: 0.000353852


## Exercise 5.1 - BLP Instruments in Logit Model with Random Coefficients

Write

$$u_{ij} = \alpha \ln(y_i - p_j) + x_j \beta_i + \xi_j + \epsilon_{ij}, \forall j = 0, 1, \cdots, J.$$

Now add random coefficients for each characteristic and estimate the means and variances of these normally distributed random coefficients. Estimate the demand side of the model only (unless you are ambitious and want smaller standard errors - then add the supply side too).

What is the implied own-price elasticity of the 1990 Honda Accord (HDACCO)? What is the implied cross-elasticity of the Honda Accord with respect to the 1990 Ford Escort (FDESCO)? Pick two additional cars and report the same numbers. 

This part is unfinished; see the Julia version for random coefficients. 

In [34]:
def gen_random(characteristics=['horsepower_weight', 'length_width', 'ac_standard', 'miles_per_dollar']):
    # Number of people
    N = 100

    # Create an empty DataFrame
    people_df = pd.DataFrame()

    # Set the random seed for reproducibility (analogous to seeding Julia's MersenneTwister)
    np.random.seed(0)

    # Generate and add data for each characteristic
    for char in characteristics:
        people_df[char] = np.random.normal(0, 1, N)

    return people_df


In [35]:
#Function for estimated share for one market

def eshare(char_matrix, δ, rand_sample, σ):
    """
    Estimate the market share for one market.

    char_matrix: Matrix of characteristics for products (products x characteristics).
    δ: Vector of δ_i (intrinsic utility) for each product in market t.
    rand_sample: Matrix of random coefficients for each individual (individuals x characteristics).
    σ: Vector of scaling parameters for the random coefficients.
    
    return: Vector of estimated shares for each product.
    """
    R = rand_sample.shape[0]  # Number of simulated individuals
    k = char_matrix.shape[0]  # Number of products
    share = np.zeros(k)
    
    # s_jt = 1/R * sum_i{ exp(δ_jt + μ_ijt)/sum_j'{exp(δ_j't + μ_ij't)} }
    for i in range(R):
        # Calculate exp(δ_jt + μ_ijt) where μ_ijt = σ_1*v_i1*char_1 + ... + σ_k*v_ik*char_k
        numerator = np.exp(δ + char_matrix @ (rand_sample[i, :] * σ))
        denominator = 1 + np.sum(numerator)
        prob = numerator / denominator
        share += 1/R * prob

    return share
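
The loop over simulated individuals can also be written as a single matrix operation. The sketch below is meant to return the same shares as `eshare`, assuming `draws` is a NumPy array of shape (individuals x characteristics). The name `eshare_vectorized` is hypothetical.

```python
import numpy as np


def eshare_vectorized(char_matrix, delta, draws, sigma):
    """Simulated market shares for one market, averaged over all draws at once."""
    # mu[j, i] = sum_k char_matrix[j, k] * sigma[k] * draws[i, k]
    mu = char_matrix @ (draws * sigma).T          # (products x individuals)
    numerator = np.exp(delta[:, None] + mu)
    # The outside good contributes the 1 in each individual's denominator
    probs = numerator / (1.0 + numerator.sum(axis=0, keepdims=True))
    return probs.mean(axis=1)
```

Setting `sigma` to zeros removes the individual heterogeneity, so the result should coincide with the plain logit shares, which is a convenient sanity check.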


In [36]:
#Function for estimating delta for one market

def cdelta(char_matrix, rand_sample, share, σ, tol=1e-12):
    """
    Estimate delta for one market.
    
    char_matrix: Matrix of characteristics for products (products x characteristics).
    rand_sample: Matrix of random coefficients for each individual (individuals x characteristics).
    share: Vector of actual market shares for each product.
    σ: Vector of scaling parameters for the random coefficients.
    tol: Tolerance level for convergence.
    
    return: Vector of estimated delta values for each product.
    """
    k = char_matrix.shape[0]
    δ_0 = np.ones(k)
    δ_1 = np.ones(k)
    error = 1

    while error > tol:
        δ_0 = δ_1
        estimate_share = eshare(char_matrix, δ_0, rand_sample, σ)
        δ_1 = δ_0 + np.log(share) - np.log(estimate_share)
        error = np.dot((δ_1 - δ_0), (δ_1 - δ_0))

    return δ_1
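
To illustrate the fixed-point iteration $\delta^{new} = \delta + \log \hat{s} - \log s(\delta)$ used in `cdelta`, the sketch below applies it to the plain logit, where the answer is known in closed form ($\delta_j = \log s_j - \log s_0$). The functions `logit_share` and `contraction` and the `max_iter` guard are assumptions added for illustration; `cdelta` itself has no iteration cap.

```python
import numpy as np


def logit_share(delta):
    """Plain logit shares for one market, outside good normalized to zero."""
    e = np.exp(delta)
    return e / (1.0 + e.sum())


def contraction(share_fn, observed_share, tol=1e-12, max_iter=1000):
    """Iterate delta <- delta + log(s_obs) - log(s(delta)) until the update is tiny."""
    observed_share = np.asarray(observed_share, dtype=float)
    delta = np.zeros_like(observed_share)
    for _ in range(max_iter):
        delta_new = delta + np.log(observed_share) - np.log(share_fn(delta))
        if np.max(np.abs(delta_new - delta)) < tol:
            return delta_new
        delta = delta_new
    raise RuntimeError("contraction did not converge within max_iter iterations")


observed = np.array([0.2, 0.3])
delta_hat = contraction(logit_share, observed)
```

The iteration cap turns a non-converging market into an explicit error instead of an endless loop, which may help when some values of σ produce NaN shares.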


In [45]:
def delta_full(df, rand_sample, characteristics, σ, tol=1e-12):
    markets = df['year'].unique()  # Get unique market identifiers
    results = OrderedDict()
    err = OrderedDict()

    for j in markets:
        # Filter DataFrame for the current market more explicitly
        market_df = df.loc[df['year'] == j]

        # Ensure characteristics is a list to prevent potential issues
        if not isinstance(characteristics, list):
            characteristics = list(characteristics)

        # Extract characteristics and market share for the current market
        chars = market_df.loc[:, characteristics].to_numpy()
        share = market_df['market_share'].to_numpy()

        # Estimate delta for the current market
        δ_new = cdelta(chars, rand_sample, share, σ, tol=tol)
        results[j] = δ_new

        # Check for NaN values in the results
        if np.isnan(δ_new).any():
            err[j] = 'NaN values detected'

    # Combine the results from each market into a single array
    combined_results = np.concatenate(list(results.values()))

    return combined_results, err



In [46]:
def obj(df, rand_sample, characteristics, IV_list, σ, delta_full_func=delta_full):
    # Calculate the delta (δ) value using the provided or default delta_full function
    δ, _ = delta_full_func(df, rand_sample, characteristics, σ)
    
    # Extracting the instrumental variables (IV) matrix from the dataframe.
    z1 = df.loc[:, IV_list].values
    
    # Ensuring the 'price' column is included in the characteristics if not already
    chars = characteristics
    if 'price' not in characteristics:
        chars = characteristics + ['price']
    
    # Creating the design matrix x with a constant term and characteristics
    x = np.hstack([np.ones((df.shape[0], 1)), df.loc[:, chars].values])
    
    # IV matrix with a constant term added
    z = np.hstack([np.ones((df.shape[0], 1)), z1])
    
    # Weight matrix for the instrumental variables
    w = np.linalg.inv(z.T @ z)
    
    # Estimating beta (β) using the 2SLS method
    β = np.linalg.inv(x.T @ z @ w @ z.T @ x) @ (x.T @ z @ w @ z.T @ δ)
    
    # Calculating the residuals
    u = δ - x @ β
    
    # Calculating the result as u'ZWZ'u
    res = u.T @ z @ w @ z.T @ u
    
    return res, β, u
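
The linear-algebra core of `obj` can be isolated from the data handling. The sketch below takes $\delta$, the regressor matrix `x`, and the instrument matrix `z` directly and returns the same three quantities. The name `gmm_objective` is hypothetical, and using `np.linalg.solve` in place of an explicit inverse for β is a design choice of this sketch.

```python
import numpy as np


def gmm_objective(delta, x, z):
    """2SLS estimate of beta and the objective u'Z W Z'u with W = (Z'Z)^{-1}."""
    w = np.linalg.inv(z.T @ z)
    xz = x.T @ z
    # beta = (X'Z W Z'X)^{-1} X'Z W Z'delta
    beta = np.linalg.solve(xz @ w @ xz.T, xz @ w @ (z.T @ delta))
    u = delta - x @ beta
    g = z.T @ u                      # sample moment vector Z'u
    return float(g @ w @ g), beta, u
```

When `z` equals `x` the model is exactly identified, β reduces to the OLS estimate, and the objective is zero up to rounding error. With more instruments than regressors the objective is generally positive, and it is the quantity to minimize over σ.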


In [47]:

# Define characteristics without the price since there's no random coefficient for price
characteristics = ["horsepower_weight", "length_width", "ac_standard", "miles_per_dollar"]

# Generate random sample using the characteristics
# gen_random returns a DataFrame; convert it to a NumPy array because eshare indexes it as rand_sample[i, :]
rand_sample = gen_random(characteristics).to_numpy()

# Construct instruments for each characteristic in the dataframe
for char in characteristics:
    construct_instrument(df, char)

# Create the IV list by concatenating the characteristics with their own and rival firm sums
IV_list = characteristics + \
          ["own_firm_sum_" + char for char in characteristics] + \
          ["rival_firm_sum_" + char for char in characteristics]


In [48]:
IV_list

['horsepower_weight',
 'length_width',
 'ac_standard',
 'miles_per_dollar',
 'own_firm_sum_horsepower_weight',
 'own_firm_sum_length_width',
 'own_firm_sum_ac_standard',
 'own_firm_sum_miles_per_dollar',
 'rival_firm_sum_horsepower_weight',
 'rival_firm_sum_length_width',
 'rival_firm_sum_ac_standard',
 'rival_firm_sum_miles_per_dollar']