In [64]:
import numpy as np
import pandas as pd
import pyblp

In [65]:
df = pd.read_csv('/Users/terrylu/Desktop/UF/Courses/2025-2026/IO/IO_Code/data/laptop_data.csv')

print(df.head(5))

   market_ids  firm_ids                 model   prices  screen_size_in  \
0           1         1  MacBook Pro 14" (M4)  1606.39            14.0   
1           1         1  MacBook Air 13" (M3)  1091.23            13.3   
2           1         2          Inspiron 14"   700.78            14.0   
3           1         5          VivoBook 15"   636.10            15.6   
4           1         5      ROG Zephyrus 15"  1849.48            15.6   

   ram_gb  storage_gb  brand_Apple  brand_Dell  brand_HP  brand_Lenovo  \
0      16         512         True       False     False         False   
1       8         256         True       False     False         False   
2       8         256        False        True     False         False   
3       8         256        False       False     False         False   
4      16        1000        False       False     False         False   

   os_macOS    shares  nesting_ids  demand_instruments0  demand_instruments1  \
0      True  0.007528         

In [66]:
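# prepping data for pyblp
# drop nesting_ids: if this column is present, pyblp treats the model as a nested logit
product_data = df.drop(columns=['nesting_ids']).copy()
# pyblp expects columns named market_ids, shares, and prices (plus firm_ids for supply-side
# calculations), the product characteristics used in the formulation, and excluded
# instruments named demand_instruments0, demand_instruments1, ...

# simple logit estimated on aggregate market shares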
logit_formulation = pyblp.Formulation(
    "1+ prices + screen_size_in + ram_gb + storage_gb"
)

problem_logit = pyblp.Problem(
    logit_formulation,
    product_data
)

Initializing the problem ...
Initialized the problem after 00:00:00.

Dimensions:
 T    N     F    K1    MD 
---  ----  ---  ----  ----
200  2196   5    5     7  

Formulations:
     Column Indices:         0     1           2           3         4     
--------------------------  ---  ------  --------------  ------  ----------
X1: Linear Characteristics   1   prices  screen_size_in  ram_gb  storage_gb


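T = 200

Number of markets.

That is, there are 200 distinct market_ids (for example, different region-quarter combinations).

N = 2196

Total number of product-market observations.

Each row of the data is one product in one market.

On average, 2196 ÷ 200 ≈ 11 products per market.

F = 5

Number of firms (distinct firm_ids), not the number of product characteristics.

K1 = 5

Number of linear characteristics in X1, including the intercept: 1 (constant) + prices + screen_size_in + ram_gb + storage_gb.

It equals F here only by coincidence. This model has no nonlinear characteristics (X2) or random coefficients, so X1 is the whole specification.

MD = 7

Number of demand-side instruments.

pyblp does not invent excluded instruments: they must be supplied as demand_instruments0, demand_instruments1, ... columns in product_data (here 3). By default pyblp adds the exogenous columns of X1 (everything except prices, here 4) on top of them, so MD = 4 + 3 = 7.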
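The counting rules above can be reproduced directly from a DataFrame. This is a minimal sketch, assuming the column names used in this notebook (market_ids, firm_ids, demand_instruments0, ...) and that prices is the only endogenous X1 column. pyblp computes these numbers itself, so the helper `count_dimensions` is hypothetical and only meant for checking the logic.

```python
import pandas as pd


def count_dimensions(product_data: pd.DataFrame, x1_columns: list) -> dict:
    """Hypothetical helper that mimics how the T, N, F, K1 and MD counts arise."""
    # excluded instruments supplied in the data: demand_instruments0, demand_instruments1, ...
    excluded = [c for c in product_data.columns if c.startswith("demand_instruments")]
    # exogenous X1 columns: every linear characteristic except prices
    exogenous = [c for c in x1_columns if c != "prices"]
    return {
        "T": int(product_data["market_ids"].nunique()),  # markets
        "N": int(len(product_data)),                     # product-market rows
        "F": int(product_data["firm_ids"].nunique()),    # firms
        "K1": len(x1_columns),                           # linear characteristics, incl. intercept
        "MD": len(exogenous) + len(excluded),            # demand-side instruments
    }


# tiny made-up example: 2 markets, 3 rows, 2 firms, 3 excluded instruments
toy = pd.DataFrame({
    "market_ids": [1, 1, 2],
    "firm_ids": [1, 2, 1],
    "demand_instruments0": [0.1, 0.2, 0.3],
    "demand_instruments1": [1.0, 2.0, 3.0],
    "demand_instruments2": [5.0, 6.0, 7.0],
})
dims = count_dimensions(toy, ["1", "prices", "screen_size_in", "ram_gb", "storage_gb"])
print(dims)  # {'T': 2, 'N': 3, 'F': 2, 'K1': 5, 'MD': 7}
```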

In [67]:
# estimate the logit model

results_logit = problem_logit.solve()

Solving the problem ...
Updating the weighting matrix ...
Computed results after 00:00:00.

Problem Results Summary:
GMM     Objective    Clipped  Weighting Matrix  Covariance Matrix
Step      Value      Shares   Condition Number  Condition Number 
----  -------------  -------  ----------------  -----------------
 1    +8.013112E+01     0      +1.857417E+09      +7.712179E+08  

Estimating standard errors ...
Computed results after 00:00:00.

Problem Results Summary:
GMM     Objective    Clipped  Weighting Matrix  Covariance Matrix
Step      Value      Shares   Condition Number  Condition Number 
----  -------------  -------  ----------------  -----------------
 2    +1.189330E+03     0      +2.230494E+09      +1.758367E+09  

Cumulative Statistics:
Computation   Objective 
   Time      Evaluations
-----------  -----------
 00:00:00         2     

Beta Estimates (Robust SEs in Parentheses):
       1             prices       screen_size_in       ram_gb         storage_gb   
-----------

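Why does ram_gb have a negative coefficient?

Probably omitted-variable bias from brand. For example, Apple MacBooks have low RAM but high prices, so without brand controls the ram_gb coefficient partly picks up the Apple brand effect with the opposite sign. Adding brand dummies (next cell) tests this.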
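The omitted-brand explanation can be checked with a small simulation. This is a hypothetical illustration on simulated data using plain OLS (not the laptop data and not pyblp's GMM estimator): a brand dummy that raises utility and comes with less RAM pushes the RAM coefficient below zero when the dummy is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
apple = rng.integers(0, 2, size=n)                  # simulated brand dummy (0/1)
ram = 16 - 8 * apple + rng.normal(0, 1, size=n)     # the brand tends to ship less RAM
# simulated mean utility: RAM is valued positively, the brand carries a large premium
utility = 0.02 * ram + 1.0 * apple + rng.normal(0, 0.1, size=n)


def ols(y, *regressors):
    """Least-squares coefficients with an intercept prepended."""
    X = np.column_stack([np.ones_like(y), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]


short = ols(utility, ram)          # brand omitted
long = ols(utility, ram, apple)    # brand included
print("RAM coefficient without brand dummy:", short[1])
print("RAM coefficient with brand dummy:   ", long[1])
```

In this simulation the short regression gives a negative RAM coefficient (around -0.1), while including the dummy recovers a value close to the true 0.02.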

In [68]:
# logit with brand dummies (ASUS, and any other brand without a dummy, is the baseline)
logit_formulation_2 = pyblp.Formulation(
    "1 + prices + screen_size_in + ram_gb + storage_gb"
    " + brand_HP + brand_Apple + brand_Dell + brand_Lenovo"
)

problem_logit_2 = pyblp.Problem(
    logit_formulation_2,
    product_data
)

# ❌ ASUS/Microsoft are NOT the outside option.

# ✅ They are only the baseline brands; each brand dummy measures that brand's utility relative to them.

# The outside option is not a row in product_data: pyblp implies its share in each market as 1 − Σ inside shares.

# difference between baseline and outside option: the baseline is just the reference category for a categorical variable, while the outside option is the choice of buying none of the listed products.
# brand dummy coefficients compare brands relative to the baseline brand; mean utilities (delta) are still measured relative to the outside option, whose utility is normalized to zero.

Initializing the problem ...


Initialized the problem after 00:00:00.

Dimensions:
 T    N     F    K1    MD 
---  ----  ---  ----  ----
200  2196   5    9     11 

Formulations:
     Column Indices:         0     1           2           3         4             5                 6                 7                  8         
--------------------------  ---  ------  --------------  ------  ----------  --------------  -----------------  ----------------  ------------------
X1: Linear Characteristics   1   prices  screen_size_in  ram_gb  storage_gb  brand_HP[True]  brand_Apple[True]  brand_Dell[True]  brand_Lenovo[True]


In [69]:
results_logit_2 = problem_logit_2.solve()

Solving the problem ...
Updating the weighting matrix ...
Computed results after 00:00:00.

Problem Results Summary:
GMM     Objective    Clipped  Weighting Matrix  Covariance Matrix
Step      Value      Shares   Condition Number  Condition Number 
----  -------------  -------  ----------------  -----------------
 1    +7.950784E+01     0      +2.653301E+09      +1.315120E+09  

Estimating standard errors ...
Computed results after 00:00:00.

Problem Results Summary:
GMM     Objective    Clipped  Weighting Matrix  Covariance Matrix
Step      Value      Shares   Condition Number  Condition Number 
----  -------------  -------  ----------------  -----------------
 2    +1.307044E+03     0      +2.875925E+09      +3.872637E+09  

Cumulative Statistics:
Computation   Objective 
   Time      Evaluations
-----------  -----------
 00:00:00         2     

Beta Estimates (Robust SEs in Parentheses):
       1             prices       screen_size_in       ram_gb         storage_gb     brand_HP[T

In [70]:
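# ratio of the ram_gb coefficient to the price coefficient (values copied from the Beta Estimates table above)
diff = +1.725538E-02 / -1.994877E-03
print(diff)

-8.649846582019844


this ratio = β_ram_gb / α_price = +1.725538E-02 / -1.994877E-03 = -8.649846582019844

means:

Because the price coefficient is negative, the willingness to pay for one extra GB of RAM is −β_ram_gb / α_price ≈ 8.65 price units (dollars, if prices are in dollars). Holding other characteristics fixed, consumers are indifferent between a laptop and an otherwise identical one with 1 GB more RAM that costs about 8.65 more. The negative sign of the raw ratio only reflects the sign of α; it does not mean that prices fall as RAM increases.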
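Rather than copying numbers by hand, the willingness-to-pay ratio can be computed from a coefficient vector and its labels. A minimal sketch: the helper `willingness_to_pay` is hypothetical, and to use it with pyblp you would pass results_logit_2.beta together with the matching labels, assuming the results object exposes them as beta_labels (otherwise list them in the column order shown under Formulations, e.g. "brand_HP[True]").

```python
import numpy as np


def willingness_to_pay(beta, labels, characteristic, price_label="prices"):
    """Price-unit value of a one-unit increase in `characteristic`: -beta_x / alpha."""
    beta = np.asarray(beta, dtype=float).ravel()  # pyblp stores beta as a column vector
    coef = dict(zip(labels, beta))
    return -coef[characteristic] / coef[price_label]


# coefficients copied from the cell above; the other entries are set to 0 for illustration
labels = ["1", "prices", "screen_size_in", "ram_gb"]
beta = [0.0, -1.994877e-03, 0.0, 1.725538e-02]
wtp_ram = willingness_to_pay(beta, labels, "ram_gb")
print(wtp_ram)  # about 8.65
```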

In [71]:
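results_logit_2.beta   # linear parameters, one per X1 column
results_logit_2.delta  # estimated mean utilities, one per product-market row
results_logit_2.xi     # unobserved product characteristics (structural errors)
results_logit_2.W      # GMM weighting matrix (MD x MD = 11 x 11)
# Jupyter only displays the last expression, so the output below is W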

array([[ 6.14108461e+00,  1.79277597e-03, -2.25501237e-04,
        -4.92434011e+01, -9.38559234e-01, -6.50440115e-02,
         2.53277006e-04,  8.64426066e-02, -1.10853395e+00,
        -4.70334178e-01, -4.53079698e-01],
       [ 1.79277597e-03,  2.60302414e-03, -2.29170064e-03,
         5.70229280e-01, -3.50338655e-02, -4.12146149e-02,
         1.46535725e-04,  3.08764391e-02, -1.16783258e-01,
        -1.47300373e-02,  6.22732727e-02],
       [-2.25501237e-04, -2.29170064e-03,  2.43414364e-03,
         7.54985668e-01, -4.78707193e-02, -5.63288478e-03,
        -1.94424173e-04,  1.36365892e-02, -1.51554808e-01,
        -5.34776247e-02, -9.81648217e-02],
       [-4.92434011e+01,  5.70229280e-01,  7.54985668e-01,
         1.19930793e+04, -7.03265303e+02, -2.23030481e+02,
         8.77845992e-01, -1.87641860e+02, -1.00335597e+03,
        -5.77848131e+02, -3.04431333e+02],
       [-9.38559234e-01, -3.50338655e-02, -4.78707193e-02,
        -7.03265303e+02,  4.52926389e+01,  1.25579663e+01,
  

In [72]:
# compute own- and cross-price elasticities
elasticities_logit = results_logit_2.compute_elasticities()

Computing elasticities with respect to prices ...


Finished after 00:00:01.



In [73]:
# dimensions of the matrix
print(elasticities_logit.shape)

# first 10 rows and first 10 columns
print(elasticities_logit[:10, :10])


(2196, 14)
[[-3.18042631  0.02924826  0.03906406  0.03540878  0.02245479  0.04272027
   0.04294751  0.0314521   0.00792754  0.07517989]
 [ 0.02412344 -2.14762088  0.03906406  0.03540878  0.02245479  0.04272027
   0.04294751  0.0314521   0.00792754  0.07517989]
 [ 0.02412344  0.02924826 -1.35890554  0.03540878  0.02245479  0.04272027
   0.04294751  0.0314521   0.00792754  0.07517989]
 [ 0.02412344  0.02924826  0.03906406 -1.2335322   0.02245479  0.04272027
   0.04294751  0.0314521   0.00792754  0.07517989]
 [ 0.02412344  0.02924826  0.03906406  0.03540878 -3.66702951  0.04272027
   0.04294751  0.0314521   0.00792754  0.07517989]
 [ 0.02412344  0.02924826  0.03906406  0.03540878  0.02245479 -0.94885308
   0.04294751  0.0314521   0.00792754  0.07517989]
 [ 0.02412344  0.02924826  0.03906406  0.03540878  0.02245479  0.04272027
  -1.26812526  0.0314521   0.00792754  0.07517989]
 [ 0.02412344  0.02924826  0.03906406  0.03540878  0.02245479  0.04272027
   0.04294751 -2.55821685  0.00792754  0

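How to read these values?

Layout: compute_elasticities stacks one J_t × J_t block per market, so the (2196, 14) matrix has one row per product-market observation and as many columns as the largest market has products (14); shorter rows are padded with NaN.

1. Own-price elasticity (diagonal)

For example, row 1, column 1: -3.18042631
→ This is product 1's elasticity with respect to its own price.
→ It is negative (as expected): raising the price lowers demand.
→ The magnitude (about -3.18) means a 1% price increase lowers product 1's demand by about 3.18%.

2. Cross-price elasticity (off-diagonal)

For example, row 1, column 2: 0.02924826
→ This is the sensitivity of product 1's demand to product 2's price.
→ Positive → products 1 and 2 are substitutes (when product 2's price rises, product 1 sells more).
→ A negative cross elasticity would suggest complements, but the plain logit cannot produce one: every cross elasticity equals −α·p_k·s_k > 0.
→ In the plain logit the cross elasticity depends only on the product whose price changes (column k), which is why each column repeats the same value down the rows (the IIA property).

3. Interpreting magnitudes

In general, own-price elasticities should be below -1 (demand is price-sensitive): a single-product firm with market power never prices on the inelastic part of its demand curve.

Cross-price elasticities are usually small (on the order of 0.01 to 0.1; here about 0.008 to 0.075), but they still reflect competition between products.

🔹 Examples from this matrix

Row 2, column 2: -2.14762088
→ Product 2's own-price elasticity, about -2.15.

Row 2, column 3: 0.03906406
→ Sensitivity of product 2's demand to product 3's price (positive, substitutes).

Row 6, column 6: -0.94885308
→ Product 6's own-price elasticity, about -0.95 (magnitude below 1, so demand is relatively inelastic).

Row 9, column 9: -4.81388858
→ Product 9's own-price elasticity; demand is very elastic (a small price increase causes a large drop in sales).

✅ In one sentence:

Diagonal (negative): each product's own-price elasticity.

Off-diagonal (mostly small positive numbers): substitution between products (cross-price elasticities).

Magnitude → strength of competition; sign → substitutes or complements.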
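Both patterns in the matrix follow from the plain-logit formulas: own ε_jj = α·p_j·(1 − s_j) and cross ε_jk = −α·p_k·s_k. A minimal single-market sketch: α = -1.994877E-03 is the price coefficient from the brand-dummy model, product 1's price and share come from df.head(), and the second product's share is made up. With these inputs the sketch reproduces the -3.18 in row 1 and the 0.0241 in column 1 of the printed matrix.

```python
import numpy as np


def logit_elasticities(alpha, prices, shares):
    """J x J price elasticities for one market under the plain logit.

    Entry (j, k) is the elasticity of product j's share with respect to product k's price:
    own:   alpha * p_j * (1 - s_j)
    cross: -alpha * p_k * s_k  (depends only on k, hence constant down each column)
    """
    p = np.asarray(prices, dtype=float)
    s = np.asarray(shares, dtype=float)
    E = np.tile(-alpha * p * s, (len(p), 1))   # cross elasticities in every row
    np.fill_diagonal(E, alpha * p * (1 - s))   # overwrite the diagonal with own elasticities
    return E


# first share from df.head(); the second share (0.01) is made up for illustration
E = logit_elasticities(-1.994877e-03, [1606.39, 1091.23], [0.007528, 0.01])
print(E.round(4))
```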

In [74]:
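# first 5 own-price elasticities; np.diag is only valid for the first market's rows,
# results_logit_2.extract_diagonals(elasticities_logit) returns them for every product
np.diag(elasticities_logit)[:5]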

array([-3.18042631, -2.14762088, -1.35890554, -1.2335322 , -3.66702951])

In [75]:
# compute marginal costs implied by Bertrand-Nash pricing (c = p - markup), using firm_ids for ownership
marginal_costs_logit = results_logit_2.compute_costs()
print(marginal_costs_logit[:5])

Computing marginal costs ...


Finished after 00:00:00.

[[1093.51072591]
 [ 578.35072591]
 [ 180.61448052]
 [ 110.36884162]
 [1323.74884162]]
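The implied costs show a pattern: price minus cost is the same for products of the same firm (1606.39 − 1093.51 = 1091.23 − 578.35 = 512.88 for the two firm-1 Apple rows, and 636.10 − 110.37 = 1849.48 − 1323.75 = 525.73 for the two firm-5 rows). This follows from logit demand with multiproduct Bertrand pricing, where each firm's markup is 1 / (−α(1 − S_f)) and S_f is the firm's total share in the market. The sketch below solves the first-order conditions for one market and checks them against that closed form. It is a hypothetical illustration with made-up shares, not pyblp's compute_costs implementation.

```python
import numpy as np


def logit_markups(alpha, shares, firm_ids):
    """Bertrand-Nash markups (p - c) for one market with plain logit demand."""
    s = np.asarray(shares, dtype=float)
    firms = np.asarray(firm_ids)
    # Jacobian of shares: D[j, k] = ds_j / dp_k = alpha * s_j * (1{j == k} - s_k)
    D = alpha * (np.diag(s) - np.outer(s, s))
    # ownership matrix: 1 if products j and k are sold by the same firm
    O = (firms[:, None] == firms[None, :]).astype(float)
    # FOC for product j: s_j + sum_k O[j, k] * (ds_k / dp_j) * (p_k - c_k) = 0
    return np.linalg.solve(O * D.T, -s)


alpha = -1.994877e-03
markups = logit_markups(alpha, [0.01, 0.02, 0.05], [1, 1, 2])
print(markups)
# closed form 1 / (-alpha * (1 - S_f)): S_f = 0.03 for firm 1, 0.05 for firm 2
print(1 / (-alpha * (1 - 0.03)), 1 / (-alpha * (1 - 0.05)))
```

Marginal costs then follow as prices minus these markups.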


### Cereal Data

In [76]:
product_data = pd.read_csv(pyblp.data.NEVO_PRODUCTS_LOCATION)

In [None]:
# absorb product fixed effects, one per cereal: product_ids is the column that identifies each cereal product.
# The fixed effects control for unobserved product-specific factors that might influence demand,
# such as brand loyalty or unique product features, so only variation within each product
# (across markets) identifies the effect of price on market shares.
logit_formulation_cereal = pyblp.Formulation('prices', absorb='C(product_ids)')

logit_formulation_cereal

prices + Absorb[C(product_ids)]
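Absorbing a single set of fixed effects is equivalent to demeaning every variable within each product before estimating (the Frisch-Waugh-Lovell logic), which is why only within-product price variation identifies the price coefficient. A minimal sketch of that one-way within transformation with pandas on made-up data; pyblp performs its own absorption internally, so the helper `demean_within` only illustrates the idea.

```python
import pandas as pd


def demean_within(data: pd.DataFrame, columns: list, group: str) -> pd.DataFrame:
    """Subtract the group mean from each column (one-way fixed-effect absorption)."""
    means = data.groupby(group)[columns].transform("mean")
    return data[columns] - means


toy = pd.DataFrame({
    "product_ids": ["A", "A", "B", "B"],
    "prices": [1.0, 3.0, 10.0, 14.0],
})
within = demean_within(toy, ["prices"], "product_ids")
print(within["prices"].tolist())  # [-1.0, 1.0, -2.0, 2.0]
```

After demeaning, the level difference between products A and B is gone; only movements around each product's own average remain.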

In [84]:
problem = pyblp.Problem(logit_formulation_cereal, product_data)

Initializing the problem ...
Absorbing demand-side fixed effects ...
Initialized the problem after 00:00:00.

Dimensions:
 T    N     F    K1    MD    ED 
---  ----  ---  ----  ----  ----
94   2256   5    1     20    1  

Formulations:
     Column Indices:          0   
--------------------------  ------
X1: Linear Characteristics  prices


In [90]:
logit_results_cereal = problem.solve()

Solving the problem ...
Updating the weighting matrix ...
Computed results after 00:00:00.

Problem Results Summary:
GMM     Objective    Clipped  Weighting Matrix
Step      Value      Shares   Condition Number
----  -------------  -------  ----------------
 1    +1.899432E+02     0      +6.927228E+07  

Estimating standard errors ...
Computed results after 00:00:00.

Problem Results Summary:
GMM     Objective    Clipped  Weighting Matrix
Step      Value      Shares   Condition Number
----  -------------  -------  ----------------
 2    +1.874555E+02     0      +5.682065E+07  

Cumulative Statistics:
Computation   Objective 
   Time      Evaluations
-----------  -----------
 00:00:00         2     

Beta Estimates (Robust SEs in Parentheses):
    prices     
---------------
 -3.004710E+01 
(+1.008589E+00)


In [93]:
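# counterfactual shares: observed prices vs. all prices raised by 10%
shares_base = logit_results_cereal.compute_shares(prices=product_data['prices'])
shares_up = logit_results_cereal.compute_shares(prices=product_data['prices'] * 1.1)
shares_up[0:5]  # Jupyter displays only this last expression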

Computing shares ...
Finished after 00:00:00.

Computing shares ...
Finished after 00:00:00.



array([[0.01163712],
       [0.00644931],
       [0.01015992],
       [0.00453914],
       [0.01310805]])
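For the plain logit, a counterfactual like the 10% price increase above has a closed form: recover δ from the observed shares, shift it by α·Δp, and recompute the shares. A minimal single-market sketch: α = -30.0471 is the estimate from the cereal model, while the shares and prices are made up. The helper `logit_counterfactual_shares` is hypothetical; compute_shares handles every market (and richer models) for you.

```python
import numpy as np


def logit_counterfactual_shares(shares, prices, new_prices, alpha):
    """Plain-logit shares in one market after prices change from `prices` to `new_prices`."""
    s = np.asarray(shares, dtype=float)
    p = np.asarray(prices, dtype=float)
    p_new = np.asarray(new_prices, dtype=float)
    delta = np.log(s) - np.log(1 - s.sum())  # mean utilities relative to the outside option
    delta_new = delta + alpha * (p_new - p)  # only the price term of utility changes
    expd = np.exp(delta_new)
    return expd / (1 + expd.sum())


alpha = -30.0471
shares = np.array([0.012, 0.007, 0.010])  # made-up observed shares
prices = np.array([0.07, 0.12, 0.10])     # made-up prices
new_shares = logit_counterfactual_shares(shares, prices, prices * 1.1, alpha)
print(new_shares, "outside share:", 1 - new_shares.sum())
```

Passing unchanged prices returns the observed shares, which is a quick sanity check on the inversion.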

In [99]:
# Nested Logit
product_data1 = product_data.copy()
product_data1['nesting_ids'] = 1  # all cereals in the same nest, the outside option is nest 0

groups = product_data1.groupby(['market_ids','nesting_ids'])
product_data1['demand_instruments20'] = groups['shares'].transform(np.size) # number of products in each nest within each market
nl_formaulation = pyblp.Formulation('0 + prices')
problem = pyblp.Problem(nl_formaulation, product_data1)






Initializing the problem ...
Initialized the problem after 00:00:00.

Dimensions:
 T    N     F    K1    MD    H 
---  ----  ---  ----  ----  ---
94   2256   5    1     21    1 

Formulations:
     Column Indices:          0   
--------------------------  ------
X1: Linear Characteristics  prices
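The roles of ρ and of the nest-size instrument come from the nested logit inversion ln s_j − ln s_0 = δ_j + ρ·ln s_{j|g}, where s_{j|g} is product j's share within its nest g. Because s_{j|g} depends on ξ, it is endogenous, which is why demand_instruments20 is added. A minimal sketch, assuming a one-level nested logit with a common ρ and made-up mean utilities: it computes shares from δ and ρ, then checks that the inversion recovers δ. Both helpers are hypothetical.

```python
import numpy as np


def nested_logit_shares(delta, nest_ids, rho):
    """Shares in one market for a one-level nested logit with a common nesting parameter rho."""
    delta = np.asarray(delta, dtype=float)
    nests = np.asarray(nest_ids)
    scaled = np.exp(delta / (1 - rho))
    # inclusive sums D_g = sum over products k in nest g of exp(delta_k / (1 - rho))
    D = {g: scaled[nests == g].sum() for g in np.unique(nests)}
    denom = 1 + sum(d ** (1 - rho) for d in D.values())
    D_row = np.array([D[g] for g in nests])
    within = scaled / D_row                  # s_{j|g}
    nest_share = D_row ** (1 - rho) / denom  # s_g
    return within * nest_share


def nested_logit_delta(shares, nest_ids, rho):
    """Invert observed shares: delta_j = ln s_j - ln s_0 - rho * ln s_{j|g}."""
    s = np.asarray(shares, dtype=float)
    nests = np.asarray(nest_ids)
    nest_totals = np.array([s[nests == g].sum() for g in nests])
    return np.log(s) - np.log(1 - s.sum()) - rho * np.log(s / nest_totals)


delta = np.array([1.0, 0.5, -0.2, 0.3])  # made-up mean utilities
nests = np.array([0, 0, 1, 1])           # e.g. non-mushy (0) vs. mushy (1)
s = nested_logit_shares(delta, nests, rho=0.7)
print(s, nested_logit_delta(s, nests, rho=0.7))
```

With ρ = 0 the nests stop mattering and the shares reduce to the plain logit.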


In [102]:
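nl_results = problem.solve(rho=.7)  # rho: nesting parameter (within-nest correlation), starting value 0.7; called lambda in class
# (if lambda in class is the log-sum/dissimilarity coefficient, it corresponds to 1 - rho)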

Solving the problem ...

Rho Initial Values:
 All Groups  
-------------
+7.000000E-01

Rho Lower Bounds:
 All Groups  
-------------
+0.000000E+00

Rho Upper Bounds:
 All Groups  
-------------
+9.900000E-01

Starting optimization ...



GMM   Computation  Optimization   Objective   Fixed Point  Contraction  Clipped    Objective      Objective      Projected                 
Step     Time       Iterations   Evaluations  Iterations   Evaluations  Shares       Value       Improvement   Gradient Norm      Theta    
----  -----------  ------------  -----------  -----------  -----------  -------  -------------  -------------  -------------  -------------
 1     00:00:00         0             1            0            0          0     +1.331657E+02                 +6.086235E+02  +7.000000E-01
 1     00:00:00         0             2            0            0          0     +4.727024E+01  +8.589549E+01  +1.624078E+01  +9.900000E-01
 1     00:00:00         1             3            0            0          0     +4.720903E+01  +6.120631E-02  +1.143692E-10  +9.824626E-01

Optimization completed after 00:00:01.
Computing the Hessian and updating the weighting matrix ...
Computed results after 00:00:00.

Problem Results Summary:
G

In [None]:
# Nested Logit with two nests: mushy vs. non-mushy cereals
product_data1 = product_data.copy()
product_data1['nesting_ids'] = product_data1['mushy']  # nest by the mushy indicator (0/1); the outside option is in its own nest
groups = product_data1.groupby(['market_ids', 'nesting_ids'])
product_data1['demand_instruments20'] = groups['shares'].transform(np.size)  # number of products in each nest within each market
nl_formulation = pyblp.Formulation('0 + prices')
problem = pyblp.Problem(nl_formulation, product_data1)

nl_results = problem.solve(rho=.7)  # rho: nesting parameter (within-nest correlation), starting value 0.7; called lambda in class

Initializing the problem ...
Initialized the problem after 00:00:00.

Dimensions:
 T    N     F    K1    MD    H 
---  ----  ---  ----  ----  ---
94   2256   5    1     21    2 

Formulations:
     Column Indices:          0   
--------------------------  ------
X1: Linear Characteristics  prices
Solving the problem ...

Rho Initial Values:
 All Groups  
-------------
+7.000000E-01

Rho Lower Bounds:
 All Groups  
-------------
+0.000000E+00

Rho Upper Bounds:
 All Groups  
-------------
+9.900000E-01

Starting optimization ...

GMM   Computation  Optimization   Objective   Fixed Point  Contraction  Clipped    Objective      Objective      Projected                 
Step     Time       Iterations   Evaluations  Iterations   Evaluations  Shares       Value       Improvement   Gradient Norm      Theta    
----  -----------  ------------  -----------  -----------  -----------  -------  -------------  -------------  -------------  -------------
 1     00:00:00         0             1      