## Nevo (2000b) Replication

Objective: replicate Table 7 from Conlon and Gortmaker (2020), page 1148.

In [1]:
import pyblp
import numpy as np
import pandas as pd

pyblp.options.digits = 2
pyblp.options.verbose = False

### Here we create the problem class

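Important:
- The first entry of product_formulations defines the demand-side linear characteristics (X1).
    - C(product_ids) marks the product ID as a categorical variable.
    - absorb means that the product fixed effects are absorbed (partialled out of the data) instead of being estimated as explicit dummy columns, which is why they are counted under ED and not under K1 in the dimensions below.
- The second entry of product_formulations defines the demand-side nonlinear characteristics (X2).
- The agent_formulation adds the demographic information.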
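
To make the absorb idea concrete, here is a minimal sketch on made-up data, using plain OLS in place of the IV/GMM setup of the notebook. It only illustrates the general principle that demeaning by product gives the same price coefficient as including one dummy per product; the data, the helper `demean_by_group`, and all variable names are hypothetical, and this is not a description of how pyblp absorbs fixed effects internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy panel (hypothetical): 5 products observed in 40 markets each.
n_products, n_markets = 5, 40
product_ids = np.tile(np.arange(n_products), n_markets)
prices = rng.normal(size=product_ids.size)
fixed_effects = rng.normal(size=n_products)
noise = 0.1 * rng.normal(size=product_ids.size)
y = -2.0 * prices + fixed_effects[product_ids] + noise


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (within transformation)."""
    sums = np.bincount(groups, weights=values)
    counts = np.bincount(groups)
    return values - (sums / counts)[groups]


# Route 1: absorb the product effects, then regress on prices alone.
y_within = demean_by_group(y, product_ids)
p_within = demean_by_group(prices, product_ids)
beta_absorbed = (p_within @ y_within) / (p_within @ p_within)

# Route 2: estimate one dummy per product alongside prices.
dummies = np.eye(n_products)[product_ids]
X = np.column_stack([prices, dummies])
beta_dummies = np.linalg.lstsq(X, y, rcond=None)[0][0]

print(beta_absorbed, beta_dummies)
```

Both routes should give the same price coefficient up to floating-point error, but the absorbed version never builds the dummy columns, which matters when there are many categories.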

In [2]:
problem = pyblp.Problem(
    product_formulations = (
        pyblp.Formulation('0 + prices', absorb='C(product_ids)'),
        pyblp.Formulation('1 + prices + sugar + mushy'),
    ),
    agent_formulation = pyblp.Formulation('0 + income + income_squared + age + child'),
    product_data = pd.read_csv(pyblp.data.NEVO_PRODUCTS_LOCATION),
    agent_data = pd.read_csv(pyblp.data.NEVO_AGENTS_LOCATION)
)
problem

Dimensions:
 T    N     F    I     K1    K2    D    MD    ED 
---  ----  ---  ----  ----  ----  ---  ----  ----
94   2256   5   1880   1     4     4    20    1  

Formulations:
       Column Indices:           0           1           2      3  
-----------------------------  ------  --------------  -----  -----
 X1: Linear Characteristics    prices                              
X2: Nonlinear Characteristics    1         prices      sugar  mushy
       d: Demographics         income  income_squared   age   child

From the dimension table:
|Symbol|Meaning|
|-|-|
|T|Number of markets|
|N|Number of products across all markets|
|F|Number of firms across all markets|
|I|Number of agents across all markets|
|K1|Number of demand-side linear product characteristics|
|K2|Number of demand-side nonlinear product characteristics|
|D|Number of demographic variables|
|MD|Number of demand-side instruments, which is typically the number of excluded demand-side instruments plus the number of exogenous demand-side linear product characteristics|
|ED|Number of absorbed dimensions of demand-side fixed effects|

### First, we replicate Nevo's work

Explanation
- sigma holds the lower-triangular Cholesky root of the covariance matrix of the random tastes for the nonlinear characteristics (unobserved heterogeneity). Entries set to zero stay fixed at zero; nonzero entries are starting values.
    - Rows and columns correspond to the columns of X2.
- pi holds the parameters that describe how agents' preferences change with demographics (observed heterogeneity), with the same convention: zeros stay fixed at zero and nonzero entries are starting values.
    - Rows correspond to the columns of X2 (in the same order as sigma) and columns correspond to the demographics.
- We set a loose tolerance because MATLAB and SciPy do not report their tolerance settings in the same way.
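
As a rough illustration of how the two matrices enter the model, the sketch below builds the agent-specific part of utility, mu = X2 (sigma nu + pi d), in the standard random-coefficients form. The sigma and pi values are the starting values used in this notebook; the product characteristics, the taste draws `nu`, and the demographics `d` are random placeholders, not the Nevo data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Starting values from the notebook (rows follow X2: 1, prices, sugar, mushy).
sigma = np.diag([0.3302, 2.4526, 0.0163, 0.2441])
pi = np.array([
    [ 5.4819,  0.0,     0.2037,  0.0   ],
    [15.8935, -1.2000,  0.0,     2.6342],
    [-0.2506,  0.0,     0.0511,  0.0   ],
    [ 1.2650,  0.0,    -0.8091,  0.0   ],
])  # columns follow the demographics: income, income_squared, age, child

# Placeholder data for one market: 3 products, 5 agents (all made up).
n_products, n_agents = 3, 5
X2 = np.column_stack([
    np.ones(n_products),                  # constant
    rng.uniform(0.05, 0.2, n_products),   # prices
    rng.integers(1, 20, n_products),      # sugar
    rng.integers(0, 2, n_products),       # mushy
])
nu = rng.normal(size=(4, n_agents))  # unobserved taste draws, one column per agent
d = rng.normal(size=(4, n_agents))   # demographics, one column per agent

# Each agent's deviation from the mean taste for every X2 characteristic.
taste_deviations = sigma @ nu + pi @ d   # shape (K2, I)

# Agent-specific utility component for every product/agent pair.
mu = X2 @ taste_deviations               # shape (J, I)
print(mu.shape)
```

Because the income_squared column of pi is zero everywhere except the prices row, shifting income_squared only moves the taste for price, which is what fixing an entry at zero means in practice.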

In [3]:
initial_sigma = np.diag([0.3302, 2.4526, 0.0163, 0.2441])
initial_pi = [
          [ 5.4819,  0,      0.2037,  0     ],
          [15.8935, -1.2000, 0,       2.6342],
          [-0.2506,  0,      0.0511,  0     ],
          [ 1.2650,  0,     -0.8091,  0     ]
    ]

In [4]:
results_replication = problem.solve(
    sigma = initial_sigma,
    pi = initial_pi,
    optimization=pyblp.Optimization('bfgs', {'gtol': 0.5}), #Loose tol
    method='1s'
)
results_replication

Problem Results Summary:
GMM   Objective  Gradient      Hessian         Hessian     Clipped  Weighting Matrix  Covariance Matrix
Step    Value      Norm    Min Eigenvalue  Max Eigenvalue  Shares   Condition Number  Condition Number 
----  ---------  --------  --------------  --------------  -------  ----------------  -----------------
 1    +1.5E+01   +3.7E-01     +6.9E-05        +1.7E+04        0         +6.9E+07          +3.7E+08     

Cumulative Statistics:
Computation  Optimizer  Optimization   Objective   Fixed Point  Contraction
   Time      Converged   Iterations   Evaluations  Iterations   Evaluations
-----------  ---------  ------------  -----------  -----------  -----------
 00:00:20       Yes          20           26          20419        63372   

Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
Sigma:      1         prices      sugar       mushy     |   Pi:      income    income_squared     age        child   
------  ----------  ----------  ----------  -------

In [5]:
# Calculate own elasticities
e_replication = results_replication.compute_elasticities()
e_replication_means = results_replication.extract_diagonal_means(e_replication)

# Calculate Markup
costs = results_replication.compute_costs()
markups_replication = results_replication.compute_markups(costs=costs)
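
For intuition about the two summary statistics computed above, here is a small sketch for a single hypothetical market. It assumes the usual definitions: the own-price elasticities sit on the diagonal of the elasticity matrix, and the markup is (price - cost) / price. All numbers are invented for illustration.

```python
import numpy as np

# Hypothetical elasticity matrix for one market with three products:
# entry (j, k) is the elasticity of product j's share with respect to product k's price.
elasticities = np.array([
    [-3.0,  0.2,  0.1],
    [ 0.3, -2.5,  0.2],
    [ 0.1,  0.4, -3.5],
])

# Own-price elasticities are the diagonal; their mean summarizes the market.
mean_own_elasticity = np.diag(elasticities).mean()

# Markups as the share of price that is not marginal cost.
prices = np.array([0.10, 0.12, 0.08])
costs = np.array([0.06, 0.09, 0.05])
markups = (prices - costs) / prices

print(mean_own_elasticity, markups)
```

The results table further down then averages these per-market and per-product quantities with a plain, unweighted np.mean.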

------------------------
### Second, we replicate the experiment with a tighter tolerance
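
Why the tolerance can matter so much is easiest to see on a toy problem. The sketch below uses plain gradient descent (not BFGS) on an ill-conditioned quadratic and stops once the largest absolute gradient entry drops below `gtol`. The objective, step size, and starting point are all made up; the point is only that a loose gradient tolerance can stop far from the minimizer along a flat direction.

```python
import numpy as np

# Ill-conditioned quadratic: f(theta) = 0.5 * (theta - target)' A (theta - target).
A = np.diag([1.0, 100.0])
target = np.array([1.0, 1.0])


def gradient(theta):
    return A @ (theta - target)


def minimize(gtol, step=0.009, max_iter=100_000):
    """Gradient descent that stops when the max-norm of the gradient is below gtol."""
    theta = np.array([5.0, 1.5])
    for iteration in range(max_iter):
        g = gradient(theta)
        if np.max(np.abs(g)) < gtol:
            break
        theta = theta - step * g
    return theta, iteration


loose, n_loose = minimize(gtol=0.5)    # same loose value as the replication run
tight, n_tight = minimize(gtol=1e-5)   # same tight value as the run below

print("loose:", loose, "iterations:", n_loose)
print("tight:", tight, "iterations:", n_tight)
```

In this toy case the loose run stops while the first parameter is still far from its optimum, because the objective is nearly flat in that direction and the gradient is already small there.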

In [6]:
results_tightertol = problem.solve(
    sigma = initial_sigma,
    pi = initial_pi,
    optimization=pyblp.Optimization('bfgs', {'gtol': 1e-5}), # Here!
    method='1s'
)
results_tightertol

Problem Results Summary:
GMM   Objective  Gradient      Hessian         Hessian     Clipped  Weighting Matrix  Covariance Matrix
Step    Value      Norm    Min Eigenvalue  Max Eigenvalue  Shares   Condition Number  Condition Number 
----  ---------  --------  --------------  --------------  -------  ----------------  -----------------
 1    +4.6E+00   +6.9E-06     +2.6E-05        +1.6E+04        0         +6.9E+07          +8.4E+08     

Cumulative Statistics:
Computation  Optimizer  Optimization   Objective   Fixed Point  Contraction
   Time      Converged   Iterations   Evaluations  Iterations   Evaluations
-----------  ---------  ------------  -----------  -----------  -----------
 00:00:36       Yes          51           57          46395       143976   

Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
Sigma:      1         prices      sugar       mushy     |   Pi:      income    income_squared     age        child   
------  ----------  ----------  ----------  -------

In [7]:
# Calculate own elasticities
e_tightertol = results_tightertol.compute_elasticities()
e_tightertol_means = results_tightertol.extract_diagonal_means(e_tightertol)

# Calculate Markup
costs = results_tightertol.compute_costs()
markups_tightertol = results_tightertol.compute_markups(costs=costs)

--------------------------
### Third, we estimate with best estimation practices
In other words: a tighter tolerance plus the approximate version of the feasible optimal instruments.
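
The objective values reported for this step are numerically zero. One plausible reading, stated here as an assumption, is that the optimal-instrument problem has exactly as many instruments as parameters, so the moment conditions can be satisfied exactly. The sketch below shows that effect on a made-up linear IV model; the function `gmm_objective` and the data are hypothetical and are not part of pyblp.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Made-up linear model with two regressors and three candidate instruments.
Z = rng.normal(size=(n, 3))
X = Z[:, :2] + 0.5 * rng.normal(size=(n, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)


def gmm_objective(X, Z, y):
    """Minimized linear GMM objective N * g' W g with W = (Z'Z / N)^-1."""
    n_obs = y.size
    W = np.linalg.inv(Z.T @ Z / n_obs)
    ZX, Zy = Z.T @ X / n_obs, Z.T @ y / n_obs
    beta = np.linalg.solve(ZX.T @ W @ ZX, ZX.T @ W @ Zy)
    g = Z.T @ (y - X @ beta) / n_obs   # sample moments at the estimate
    return n_obs * g @ W @ g


over_identified = gmm_objective(X, Z, y)          # 3 instruments, 2 parameters
just_identified = gmm_objective(X, Z[:, :2], y)   # 2 instruments, 2 parameters

print(over_identified, just_identified)
```

With as many instruments as parameters the minimized objective is zero up to rounding, so its value no longer says anything about fit and cannot be compared with the over-identified runs.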

In [8]:
instrument_results = results_replication.compute_optimal_instruments(method='approximate')
optimal_instrument_problem = instrument_results.to_problem()

results_bestpractice = optimal_instrument_problem.solve(
    sigma = results_replication.sigma,
    pi = results_replication.pi,
    optimization=pyblp.Optimization('bfgs', {'gtol': 1e-5}), # Here!
    method='1s'
)
results_bestpractice

Problem Results Summary:
GMM   Objective  Gradient      Hessian         Hessian     Clipped  Weighting Matrix  Covariance Matrix
Step    Value      Norm    Min Eigenvalue  Max Eigenvalue  Shares   Condition Number  Condition Number 
----  ---------  --------  --------------  --------------  -------  ----------------  -----------------
 1    +3.7E-15   +6.5E-06     +3.2E-04        +3.2E+04        0         +2.5E+07          +1.1E+08     

Cumulative Statistics:
Computation  Optimizer  Optimization   Objective   Fixed Point  Contraction
   Time      Converged   Iterations   Evaluations  Iterations   Evaluations
-----------  ---------  ------------  -----------  -----------  -----------
 00:00:28       Yes          41           45          35183       109433   

Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
Sigma:      1         prices      sugar       mushy     |   Pi:      income    income_squared     age        child   
------  ----------  ----------  ----------  -------

In [9]:
instrument_results = results_tightertol.compute_optimal_instruments(method='approximate')
optimal_instrument_problem = instrument_results.to_problem()

results_bestpractice = optimal_instrument_problem.solve(
    sigma = results_tightertol.sigma,
    pi = results_tightertol.pi,
    optimization=pyblp.Optimization('bfgs', {'gtol': 1e-5}), # Here!
    method='1s'
)
results_bestpractice

Problem Results Summary:
GMM   Objective  Gradient      Hessian         Hessian     Clipped  Weighting Matrix  Covariance Matrix
Step    Value      Norm    Min Eigenvalue  Max Eigenvalue  Shares   Condition Number  Condition Number 
----  ---------  --------  --------------  --------------  -------  ----------------  -----------------
 1    +8.0E-14   +3.0E-06     +1.6E-04        +2.9E+04        0         +7.8E+07          +1.8E+08     

Cumulative Statistics:
Computation  Optimizer  Optimization   Objective   Fixed Point  Contraction
   Time      Converged   Iterations   Evaluations  Iterations   Evaluations
-----------  ---------  ------------  -----------  -----------  -----------
 00:00:33       Yes          42           50          45899       142143   

Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
Sigma:      1         prices      sugar       mushy     |   Pi:      income    income_squared     age        child   
------  ----------  ----------  ----------  -------

In [10]:
# Calculate own elasticities
e_bestpractice = results_bestpractice.compute_elasticities()
e_bestpractice_means = results_bestpractice.extract_diagonal_means(e_bestpractice)

# Calculate Markup
costs = results_bestpractice.compute_costs()
markups_bestpractice = results_bestpractice.compute_markups(costs=costs)

--------------------------
### Table of Results

In [11]:
r = results_replication
t = results_tightertol
b = results_bestpractice

pd.options.display.float_format = '{:,.3f}'.format

In [12]:
data = np.array([  # np.matrix is discouraged by NumPy; a plain 2-D array works the same here
    [-32.433,r.beta[0,0],     t.beta[0,0],     b.beta[0,0]],
    [7.743,  r.beta_se[0,0],  t.beta_se[0,0],  b.beta_se[0,0]],
    [1.848,  r.sigma[1,1],    t.sigma[1,1],    b.sigma[1,1]],
    [1.075,  r.sigma_se[1,1], t.sigma_se[1,1], b.sigma_se[1,1]],
    [0.377,  r.sigma[0,0],    t.sigma[0,0],    b.sigma[0,0]],
    [0.129,  r.sigma_se[0,0], t.sigma_se[0,0], b.sigma_se[0,0]],
    [0.004,  r.sigma[2,2],    t.sigma[2,2],    b.sigma[2,2]],
    [0.012,  r.sigma_se[2,2], t.sigma_se[2,2], b.sigma_se[2,2]],
    [0.081,  r.sigma[3,3],    t.sigma[3,3],    b.sigma[3,3]],
    [0.205,  r.sigma_se[3,3], t.sigma_se[3,3], b.sigma_se[3,3]],
    [16.598, r.pi[1,0],       t.pi[1,0],       b.pi[1,0]],
    [172.334,r.pi_se[1,0],    t.pi_se[1,0],    b.pi_se[1,0]],
    [-0.659, r.pi[1,1],       t.pi[1,1],       b.pi[1,1]],
    [8.955,  r.pi_se[1,1],    t.pi_se[1,1],    b.pi_se[1,1]],
    [11.625, r.pi[1,3],       t.pi[1,3],       b.pi[1,3]],
    [5.207,  r.pi_se[1,3],    t.pi_se[1,3],    b.pi_se[1,3]],
    [3.089,  r.pi[0,0],       t.pi[0,0],       b.pi[0,0]],
    [1.213,  r.pi_se[0,0],    t.pi_se[0,0],    b.pi_se[0,0]],
    [1.186,  r.pi[0,2],       t.pi[0,2],       b.pi[0,2]],
    [1.016,  r.pi_se[0,2],    t.pi_se[0,2],    b.pi_se[0,2]],
    [-0.193, r.pi[2,0],       t.pi[2,0],       b.pi[2,0]],
    [0.005,  r.pi_se[2,0],    t.pi_se[2,0],    b.pi_se[2,0]],
    [0.029,  r.pi[2,2],       t.pi[2,2],       b.pi[2,2]],
    [0.036,  r.pi_se[2,2],    t.pi_se[2,2],    b.pi_se[2,2]],
    [1.468,  r.pi[3,0],       t.pi[3,0],       b.pi[3,0]],
    [0.697,  r.pi_se[3,0],    t.pi_se[3,0],    b.pi_se[3,0]],
    [-1.514, r.pi[3,2],       t.pi[3,2],       b.pi[3,2]],
    [1.103,  r.pi_se[3,2],    t.pi_se[3,2],    b.pi_se[3,2]],
    [np.nan, np.mean(e_replication_means), np.mean(e_tightertol_means), np.mean(e_bestpractice_means)],
    [np.nan, np.mean(markups_replication), np.mean(markups_tightertol), np.mean(markups_bestpractice)],
    [0.0066, r.objective[0,0]/problem.N,   t.objective[0,0]/problem.N,  b.objective[0,0]/problem.N],
    [14.9,   r.objective[0,0],             t.objective[0,0],            b.objective[0,0]]
])
indexes = [np.array(['Means','Means',
                    'Standard Deviations','Standard Deviations','Standard Deviations','Standard Deviations',
                    'Standard Deviations','Standard Deviations','Standard Deviations','Standard Deviations',
                    'Interactions','Interactions','Interactions','Interactions','Interactions','Interactions',
                    'Interactions','Interactions','Interactions','Interactions','Interactions','Interactions',
                    'Interactions','Interactions','Interactions','Interactions','Interactions','Interactions',
                    'Mean own-price elasticity','Mean markup','GMM objective','GMM objective scaled by N']),
           np.array(['Price','price.se',
                    'Price','price.se','Constant','constant.se',
                    'Sugar','sugar.se','Mushy','mushy.se',
                    'Price x Income','price x income(se)','Price x sq(Income)','price x sq(income)(se)',
                    'Price x Child','price x child(se)','Constant x Income','constant x income(se)',
                    'Constant x Age','constant x age(se)','Sugar x Income','sugar x income(se)',
                    'Sugar x Age','sugar x age(se)','Mushy x Income','mushy x income(se)',
                    'Mushy x Age','mushy x age(se)','','','',''])]

nevo_table = pd.DataFrame(data, columns=['Nevo_Original','Replication','Tighter Tol','Best Practice'],
                         index=indexes)
nevo_table

|||Nevo_Original|Replication|Tighter Tol|Best Practice|
|-|-|-|-|-|-|
|Means|Price|-32.433|-32.416|-62.73|-31.403|
|Means|price.se|7.743|7.74|14.803|4.527|
|Standard Deviations|Price|1.848|1.853|3.312|3.002|
|Standard Deviations|price.se|1.075|1.067|1.34|0.648|
|Standard Deviations|Constant|0.377|0.376|0.558|0.214|
|Standard Deviations|constant.se|0.129|0.129|0.163|0.078|
|Standard Deviations|Sugar|0.004|-0.003|-0.006|0.027|
|Standard Deviations|sugar.se|0.012|0.012|0.014|0.007|
|Standard Deviations|Mushy|0.081|0.079|0.093|0.299|
|Standard Deviations|mushy.se|0.205|0.203|0.185|0.101|
