### Replication of the best practice results from Conlon and Gortmaker (2020)

## Fake Cereal

In [1]:
import pyblp
import numpy as np
import pandas as pd

pyblp.options.digits = 2
pyblp.options.verbose = False
pyblp.options.weights_tol = np.inf

In [2]:
# Set up the problem

nevo_problem = pyblp.Problem(
    product_formulations = (
        pyblp.Formulation('0 + prices', absorb='C(product_ids)'),
        pyblp.Formulation('1 + prices + sugar + mushy'),
    ),
    agent_formulation = pyblp.Formulation('0 + income + income_squared + age + child'),
    product_data = pd.read_csv(pyblp.data.NEVO_PRODUCTS_LOCATION),
    agent_data = pd.read_csv(pyblp.data.NEVO_AGENTS_LOCATION)
)
nevo_problem

Dimensions:
 T    N     F    I     K1    K2    D    MD    ED 
---  ----  ---  ----  ----  ----  ---  ----  ----
94   2256   5   1880   1     4     4    20    1  

Formulations:
       Column Indices:           0           1           2      3  
-----------------------------  ------  --------------  -----  -----
 X1: Linear Characteristics    prices                              
X2: Nonlinear Characteristics    1         prices      sugar  mushy
       d: Demographics         income  income_squared   age   child

In [3]:
initial_sigma = np.diag([0.3302, 2.4526, 0.0163, 0.2441])
initial_pi = [
          [ 5.4819,  0,      0.2037,  0     ],
          [15.8935, -1.2000, 0,       2.6342],
          [-0.2506,  0,      0.0511,  0     ],
          [ 1.2650,  0,     -0.8091,  0     ]
    ]
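
As a quick sanity check on these starting values, the sketch below counts how many nonlinear parameters the optimizer is asked to search over. It assumes pyblp's convention, as I understand it, that zeros in `sigma` and `pi` stay fixed at zero while nonzero entries are treated as starting values. The helper `count_nonzero` and the plain-list copies of the arrays are illustrative names, not part of pyblp.

```python
# Starting values copied from the cell above, as plain lists so the sketch has no dependencies.
sigma_diagonal = [0.3302, 2.4526, 0.0163, 0.2441]
pi = [
    [5.4819, 0, 0.2037, 0],
    [15.8935, -1.2000, 0, 2.6342],
    [-0.2506, 0, 0.0511, 0],
    [1.2650, 0, -0.8091, 0],
]


def count_nonzero(values):
    """Count entries that are not exactly zero (hypothetical helper)."""
    return sum(1 for value in values if value != 0)


# Assumption: nonzero entries are estimated, zero entries are held fixed.
free_sigma = count_nonzero(sigma_diagonal)
free_pi = sum(count_nonzero(row) for row in pi)

print(f"free sigma parameters: {free_sigma}")
print(f"free pi parameters:    {free_pi}")
print(f"total nonlinear:       {free_sigma + free_pi}")
```

Under that assumption the Fake Cereal problem has 4 free elements in `sigma` and 9 in `pi`, so 13 nonlinear parameters in total.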

In [4]:
# Solve with a tighter gradient tolerance (gtol = 1e-5)

results_tightertol = nevo_problem.solve(
    sigma = initial_sigma,
    pi = initial_pi,
    optimization=pyblp.Optimization('bfgs', {'gtol': 1e-5}),
    method='1s'
)
results_tightertol

Problem Results Summary:
GMM   Objective  Gradient      Hessian         Hessian     Clipped  Weighting Matrix  Covariance Matrix
Step    Value      Norm    Min Eigenvalue  Max Eigenvalue  Shares   Condition Number  Condition Number 
----  ---------  --------  --------------  --------------  -------  ----------------  -----------------
 1    +4.6E+00   +6.9E-06     +2.6E-05        +1.6E+04        0         +6.9E+07          +8.4E+08     

Cumulative Statistics:
Computation  Optimizer  Optimization   Objective   Fixed Point  Contraction
   Time      Converged   Iterations   Evaluations  Iterations   Evaluations
-----------  ---------  ------------  -----------  -----------  -----------
 00:00:35       Yes          51           57          46395       143976   

Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
Sigma:      1         prices      sugar       mushy     |   Pi:      income    income_squared     age        child   
------  ----------  ----------  ----------  -------

In [5]:
# Compute approximate optimal instruments and re-solve, starting from the previous estimates

instrument_results = results_tightertol.compute_optimal_instruments(method='approximate')
optimal_instrument_problem = instrument_results.to_problem()

nevo_bestpractice = optimal_instrument_problem.solve(
    sigma = results_tightertol.sigma,
    pi = results_tightertol.pi,
    optimization=pyblp.Optimization('bfgs', {'gtol': 1e-5}), 
    method='1s'
)
nevo_bestpractice

Problem Results Summary:
GMM   Objective  Gradient      Hessian         Hessian     Clipped  Weighting Matrix  Covariance Matrix
Step    Value      Norm    Min Eigenvalue  Max Eigenvalue  Shares   Condition Number  Condition Number 
----  ---------  --------  --------------  --------------  -------  ----------------  -----------------
 1    +8.0E-14   +3.0E-06     +1.6E-04        +2.9E+04        0         +7.8E+07          +1.8E+08     

Cumulative Statistics:
Computation  Optimizer  Optimization   Objective   Fixed Point  Contraction
   Time      Converged   Iterations   Evaluations  Iterations   Evaluations
-----------  ---------  ------------  -----------  -----------  -----------
 00:00:34       Yes          42           50          45899       142143   

Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
Sigma:      1         prices      sugar       mushy     |   Pi:      income    income_squared     age        child   
------  ----------  ----------  ----------  -------

The estimated price coefficient (beta) is about -31.403, noticeably different from the -27.486 reported in Table 7.
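
To put the size of that gap in perspective, here is a minimal sketch that computes the absolute and relative difference between the two numbers quoted above. The variable names are illustrative, and the values are typed in by hand rather than read from `nevo_bestpractice`.

```python
# Values quoted in the text above (entered by hand, not read from the results object).
notebook_beta = -31.403   # price coefficient obtained in this notebook
reported_beta = -27.486   # price coefficient quoted from Table 7

absolute_gap = notebook_beta - reported_beta
relative_gap = absolute_gap / abs(reported_beta)

print(f"absolute gap: {absolute_gap:.3f}")
print(f"relative gap: {relative_gap:.1%}")
```

The notebook estimate is roughly 14% larger in magnitude than the reported one.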

## Automobile Data

In [6]:
am_problem = pyblp.Problem(
    product_formulations = (
        pyblp.Formulation('1 + hpwt + air + mpd + space'),
        pyblp.Formulation('1 + prices + hpwt + air + mpd + space'),
        pyblp.Formulation('1 + log(hpwt) + air + log(mpg) + log(space) + trend')
    ),
    agent_formulation = pyblp.Formulation('0 + I(1 / income)'),
    costs_type = 'log',
    product_data = pd.read_csv(pyblp.data.BLP_PRODUCTS_LOCATION), 
    agent_data = pd.read_csv(pyblp.data.BLP_AGENTS_LOCATION)
)
am_problem

Dimensions:
 T    N     F    I     K1    K2    K3    D    MD    MS 
---  ----  ---  ----  ----  ----  ----  ---  ----  ----
20   2217  26   4000   5     6     6     1    13    18 

Formulations:
       Column Indices:            0          1       2       3          4         5  
-----------------------------  --------  ---------  ----  --------  ----------  -----
 X1: Linear Characteristics       1        hpwt     air     mpd       space          
X2: Nonlinear Characteristics     1       prices    hpwt    air        mpd      space
X3: Log Cost Characteristics      1      log(hpwt)  air   log(mpg)  log(space)  trend
       d: Demographics         1/income                                              

In [7]:
initial_sigma = np.diag([3.612, 0, 4.628, 1.818, 1.050, 2.056])
initial_pi = np.array([[0],[-43.501],[0],[0],[0],[0]])

In [8]:
results_replication = am_problem.solve(
    initial_sigma,
    initial_pi,
    initial_update=True,           # update the weighting matrix at the starting values before the first GMM step
    costs_bounds=(0.001, None),    # lower bound on marginal costs: costs_type is 'log', so costs must stay positive
    W_type='clustered',            # cluster the weighting matrix update (by automobile model)
    se_type='clustered',
)
results_replication

Problem Results Summary:
GMM   Objective    Projected    Reduced Hessian  Reduced Hessian  Clipped  Clipped  Weighting Matrix  Covariance Matrix
Step    Value    Gradient Norm  Min Eigenvalue   Max Eigenvalue   Shares    Costs   Condition Number  Condition Number 
----  ---------  -------------  ---------------  ---------------  -------  -------  ----------------  -----------------
 2    +5.0E+02     +2.3E-06        +4.9E-01         +5.1E+02         0        0         +4.2E+09          +3.8E+08     

Cumulative Statistics:
Computation  Optimizer  Optimization   Objective   Fixed Point  Contraction
   Time      Converged   Iterations   Evaluations  Iterations   Evaluations
-----------  ---------  ------------  -----------  -----------  -----------
 00:02:34       No           58           167         48339       148305   

Nonlinear Coefficient Estimates (Robust SEs Adjusted for 999 Clusters in Parentheses):
Sigma:      1        prices      hpwt        air         mpd        space     |

In [9]:
instrument_results = results_replication.compute_optimal_instruments(method='approximate')
optimal_instrument_problem = instrument_results.to_problem()

am_bestpractice = optimal_instrument_problem.solve(
    sigma = results_replication.sigma,
    pi = results_replication.pi,
    initial_update=True,           # update the weighting matrix at the starting values before the first GMM step
    costs_bounds=(0.001, None),    # lower bound on marginal costs: costs_type is 'log', so costs must stay positive
    W_type='clustered',            # cluster the weighting matrix update (by automobile model)
    se_type='clustered',
)
am_bestpractice

Problem Results Summary:
GMM   Objective    Projected    Reduced Hessian  Reduced Hessian  Clipped  Clipped  Weighting Matrix  Covariance Matrix
Step    Value    Gradient Norm  Min Eigenvalue   Max Eigenvalue   Shares    Costs   Condition Number  Condition Number 
----  ---------  -------------  ---------------  ---------------  -------  -------  ----------------  -----------------
 2    +1.4E+02     +3.0E-07        +3.5E-01         +1.1E+02         0        0         +8.1E+07          +3.1E+07     

Cumulative Statistics:
Computation  Optimizer  Optimization   Objective   Fixed Point  Contraction
   Time      Converged   Iterations   Evaluations  Iterations   Evaluations
-----------  ---------  ------------  -----------  -----------  -----------
 00:01:47       No           50           116         26727        82094   

Nonlinear Coefficient Estimates (Robust SEs Adjusted for 999 Clusters in Parentheses):
Sigma:      1        prices      hpwt        air         mpd        space     |

Some estimates differ from the reported values. The most apparent discrepancies are:

|Estimate|Parameter|Notebook|Reported|
|--------|---------|--------|--------|
|Means|Air|0.176 (0.224)|0.572 (0.349)|
|Std. dev.|C|1.591 (1.926)|2.962 (1.637)|
| |space|0.966 (1.105)|0.966 (1.105)|
|Supply|ln(space)|-0.352 (0.219)|-0.472 (0.125)|
|GMM objective scaled by N| |136.240|236|
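
One informal way to read the table is to express each gap in units of the reported standard error. The sketch below does that for the four coefficient rows. It is a rough heuristic under the assumption that the reported standard errors are a sensible yardstick, not a formal test, and the row labels and variable names are illustrative.

```python
# (label, notebook estimate, reported estimate, reported standard error), copied from the table above.
rows = [
    ("Means: air", 0.176, 0.572, 0.349),
    ("Std. dev.: C", 1.591, 2.962, 1.637),
    ("Std. dev.: space", 0.966, 0.966, 1.105),
    ("Supply: ln(space)", -0.352, -0.472, 0.125),
]

# Gap between notebook and reported estimates, measured in reported standard errors.
gaps_in_se = {
    label: (notebook - reported) / reported_se
    for label, notebook, reported, reported_se in rows
}

for label, gap in gaps_in_se.items():
    print(f"{label:<20} {gap:+.2f} reported SEs")
```

By this measure the gaps for air and ln(space) are each about one reported standard error in absolute value, while the space row shows no difference at all.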
