## <center>Empirical IO I: Problem Set 3</center>
### <center>Yinan Wang</center>
#### <center>Oct 2, 2020</center>

## Part 2: Generate Fake Data
1.Draw the exogenous product characteristic $x_{jt}$ for $T=600$
geographically defined markets (e.g., cities). Assume each $x_{jt}$ is equal
to the absolute value of an iid standard normal draw, as is each w$_{jt}$.
Simulate demand and cost unobservables as well, specifying
$$\left(
\begin{array}{c}
\xi _{jt} \\
\omega _{jt}%
\end{array}%
\right) \sim N\left( \left(
\begin{array}{c}
0 \\
0%
\end{array}%
\right) ,\left(
\begin{array}{cc}
1 & 0.25 \\
0.25 & 1%
\end{array}%
\right) \right) \text{ iid across }j,t.$$

In [1]:
import numpy as np
import pandas as pd
import pyblp
from scipy import optimize
import statsmodels.formula.api as sm
from linearmodels.iv import IV2SLS
import matplotlib.pyplot as plt

pyblp.options.digits = 4
pyblp.options.verbose = False
pyblp.__version__

'0.11.0'

In [2]:
# np.random.seed(123)
#M: market number
#J: product number
M = 600
J = 4
gamma = np.array([0.5,0.25])
#generate exogenous product characteristics
x = np.absolute(np.random.normal(0, 1, (J, M)))
w = np.absolute(np.random.normal(0, 1, (J, M)))
#generate unobservables
temp = np.random.multivariate_normal( (0, 0), [[1, 0.25], [0.25, 1]], (J, M))
xi = temp[:,:,0]
omega = temp[:,:,1]
#generate mc
mc = np.exp(gamma[0]+gamma[1]*w+omega/8)


2.Solve for the equilibrium prices for each good in each market.

(a)Start by writing a procedure to approximate the derivatives of market
shares with respect to prices (taking prices, shares, x, and demand
parameters as inputs)

$s_{jt} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \frac{\exp(\beta ^{\left( 1\right) }x_{jt}+\beta
_{i}^{(2)}satellite_{jt}+\beta _{i}^{(3)}wired_{jt}+\alpha p_{jt}+\xi
_{jt})}{1+\sum_j \exp(\beta ^{\left( 1\right) }x_{jt}+\beta
_{i}^{(2)}satellite_{jt}+\beta _{i}^{(3)}wired_{jt}+\alpha p_{jt}+\xi
_{jt})}  \frac{1}{\sqrt {2\pi }} e^{-\frac{(\beta
_{i}^{(2)}-4)^2}{2}} \frac{1}{\sqrt {2\pi }} e^{-\frac{(\beta
_{i}^{(3)}-4)^2}{2}} d\beta
_{i}^{(2)} d\beta
_{i}^{(3)}$

$\partial s_{jt}/\partial p_{kt} =\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} -\alpha s_{ikt} s_{ijt} \frac{1}{\sqrt {2\pi }} e^{-\frac{(\beta
_{i}^{(2)}-4)^2}{2}} \frac{1}{\sqrt {2\pi }} e^{-\frac{(\beta
_{i}^{(3)}-4)^2}{2}} d\beta
_{i}^{(2)} d\beta
_{i}^{(3)}$

$\partial s_{jt}/\partial p_{jt} =\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \alpha s_{ijt} (1-s_{ijt}) \frac{1}{\sqrt {2\pi }} e^{-\frac{(\beta
_{i}^{(2)}-4)^2}{2}} \frac{1}{\sqrt {2\pi }} e^{-\frac{(\beta
_{i}^{(3)}-4)^2}{2}} d\beta
_{i}^{(2)} d\beta
_{i}^{(3)}$

In [3]:
def s_ijt(p,delta,alpha=-2):
    #ep_delta:N*J
    ep_delta = np.exp(delta+alpha*p)
    denom = 1+ep_delta.sum(axis=1).reshape(N,1)
    return ep_delta/denom #N*J

def cal_derivative(p,delta,N,alpha):
    #function to calculate derivative expression
    sijt = s_ijt(p,delta,alpha) #N*J
    #share of each product
    sjt=sijt.mean(axis=0) #J*1
    D = np.empty((N,J,J))
    for i in range(N):
        D[i,:,:] = (np.diag(sijt[i,:]) - np.outer(sijt[i,:],sijt[i,:])) * alpha ## N*J*J
    D = np.mean(D, axis=0) ## J*J         
    return D,sjt

(c)Substituting in your approximation of each $\left( \frac{\partial
s_{jt}(p_{t})}{\partial p_{jt}}\right) $, solve the system of equations ($J\,$equations per market) for the equilibrium prices in each market.

i.Solve J x J nonlinear equations with Monte Carlo Simulation

ii.Morrow and Skerlos (2011) with Monte Carlo Simulation

$p_t →􏰀 c_t + ζ_t(p_t)$ where $ζ_t(p_t) = Λ_t(p_t)^{-1} [H_t^* ⊙ Γ_t(p_t)](p_t − c_t) − Λ_t(p_t)^{-1}s_t(p_t)$

In [417]:
N = 1000
def getDelta(N,x,xi,market=0):
    #generate beta for consumers in each single market
    beta2 = np.random.normal(4, 1, N)
    beta3 = np.random.normal(4, 1, N)
    #beta:N*3
    beta = np.vstack((np.ones(N),beta2,beta3)).T
    #X contains exogenous x, statellite and wired indicator
    X = np.vstack((x[:,market],[1,1,0,0],[0,0,1,1]))
    #mean utility delta - p: N* J
    delta = beta @ X + np.tile(xi[:,market],(N,1))
    return delta # N*J

def equilibrium(p,N,delta,mc,alpha=-2):
    D,sjt= cal_derivative(p,delta,N,alpha)
    return p - mc+sjt/np.diag(D)

# function to calculate lambda and gamma
def lam_gam(p,delta,N,alpha):
    sijt = s_ijt(p,delta,alpha) #N*J
    #share of each product
    sjt=sijt.mean(axis=0) #J*1
    gamma = np.zeros((N,J,J))
    lamda = np.zeros((N,J,J))
    for i in range(N):
        lamda[i,:,:] = np.diag(sijt[i,:])* alpha
        gamma[i,:,:] = np.outer(sijt[i,:], sijt[i,:]) * alpha

    #lamda = np.mean(lamda, axis=0) # J*J
    gamma = np.mean(gamma, axis=0) # J*J
    lam_inv = np.diag(1/sjt)/alpha
    #return lamda, gamma
    return lam_inv,gamma
             
def fp(p,N,delta,mc,alpha=-2):
    D,sjt= cal_derivative(p,delta,N,alpha)
    #lamda,gamma = lam_gam(p,delta,N,alpha)
    lam_inv,gamma = lam_gam(p,delta,N,alpha)
    #return mc + np.linalg.inv(lamda)@(np.identity(J)*gamma)@((p-mc)) - np.linalg.inv(lamda)@(sjt)
    return mc + lam_inv@(np.identity(J)*gamma)@((p-mc)) - lam_inv@(sjt)

def constructData(N,mc,x,w,xi,omega,alpha=-2):
    df1 = pd.DataFrame()
    df2 = pd.DataFrame()
    for t in range(M):
        #draw new set of agent for each market
        delta = getDelta(N,x,xi,t)
        #i.Solve J x J nonlinear equations with Monte Carlo Simulation 
        p1=optimize.root(equilibrium, 3*np.ones(J),args=(N,delta,mc[:,t])).x
        D1,sjt1= cal_derivative(p1,delta,N,alpha)
        temp1 = pd.DataFrame({'x':x[:,t],'w':w[:,t],'sate':[1,1,0,0],'wire':[0,0,1,1],'xi':xi[:,t],'omega':omega[:,t],'market_id':t,'price':p1,'share':sjt1})
        df1 = pd.concat([df1,temp1], ignore_index=True)
#         #ii.Morrow and Skerlos (2011)
#         p2=optimize.fixed_point(fp,mc[:,t],args=(N,delta,mc[:,t]),xtol=1e-14)
#         D2,sjt2= cal_derivative(p2,delta,N,alpha)
#         temp2 = pd.DataFrame({'x':x[:,t],'w':w[:,t],'sate':[1,1,0,0],'wire':[0,0,1,1],'xi':xi[:,t],'omega':omega[:,t],'market_id':t,'price':p2,'share':sjt2})
#         df2 = pd.concat([df2,temp2], ignore_index=True)
    return df1

df1= constructData(N,mc,x,w,xi,omega)

i.Solve J x J nonlinear equations with Gaussian Quadrature

ii.Morrow and Skerlos (2011) with Gaussian Quadrature

In [4]:
def f1(p,x,beta2,beta3,xi,alpha=-2):
    ep_delta = np.exp(x+beta2*np.array([1,1,0,0])+beta3*np.array([0,0,1,1])+alpha*p+xi)
    denom = 1+np.sum(ep_delta)
    return ep_delta/denom

def f2(p,x,beta2,beta3,xi,alpha=-2):
    sj = f1(p,x,beta2,beta3,xi,alpha=-2)
    D = (np.diag(sj) - np.outer(sj,sj) ) * alpha
    return D

def f3(p,x,beta2,beta3,xi,alpha=-2):
    sj = f1(p,x,beta2,beta3,xi,alpha=-2)
    gamma = np.outer(sj,sj)
    return gamma

def int_2D(f,p,k,x,xi,alpha=-2):
    beta2,w2 = np.polynomial.hermite.hermgauss(k)
    beta3,w3 = np.polynomial.hermite.hermgauss(k)
    to_sum = np.zeros(J)
    for i in range(k):
        for j in range(k):
            x1 = np.sqrt(2)*1*beta2[i]+4
            x2 = np.sqrt(2)*1*beta3[j]+4
            to_sum = to_sum+w2[i]*w3[j]*f(p,x,x1,x2,xi,alpha)
    return (1/np.pi)*to_sum

def cal_derivative(f1,f2,p,x,xi,alpha=-2):
    #function to calculate derivative in integral form
    sjt = int_2D(f1,p,9,x,xi,alpha=-2) #J*1
    D = int_2D(f2,p,9,x,xi,alpha=-2) #J*J        
    return D,sjt

def equilibrium(p,f1,f2,x,xi,mc,alpha=-2):
    D,sjt= cal_derivative(f1,f2,p,x,xi,alpha=-2)
    return p - mc+sjt/np.diag(D)

def fp(p,f1,f3,x,xi,mc,alpha=-2):
    sjt = int_2D(f1,p,9,x,xi,alpha=-2)
    lamda = np.diag(sjt)*alpha
    gamma = int_2D(f3,p,9,x,xi,alpha=-2)*alpha

    return mc + np.linalg.inv(lamda)@(np.identity(J)*gamma)@((p-mc)) - np.linalg.inv(lamda)@(sjt)
    
def constructData(mc,x,w,xi,omega,f1,f2,alpha=-2):
    df1 = pd.DataFrame()
    df2 = pd.DataFrame()
    for t in range(M):
        #i.Solve J x J nonlinear equations with Gaussian Quadrature 
        p1=optimize.root(equilibrium, 3*np.ones(J),args=(f1,f2,x[:,t],xi[:,t],mc[:,t])).x
        D1,sjt1= cal_derivative(f1,f2,p1,x[:,t],xi[:,t],alpha=-2)
        temp1 = pd.DataFrame({'x':x[:,t],'w':w[:,t],'sate':[1,1,0,0],'wire':[0,0,1,1],'xi':xi[:,t],'omega':omega[:,t],'market_id':t,'price':p1,'share':sjt1})
        df1 = pd.concat([df1,temp1], ignore_index=True)
#         #ii.Morrow and Skerlos (2011)
#         p2=optimize.fixed_point(fp,mc[:,t],args=(f1,f3,x[:,t],xi[:,t],mc[:,t]),xtol=1e-14)
#         sjt2= int_2D(f1,p2,9,x[:,t],xi[:,t],alpha=-2)
#         temp2 = pd.DataFrame({'x':x[:,t],'w':w[:,t],'sate':[1,1,0,0],'wire':[0,0,1,1],'xi':xi[:,t],'omega':omega[:,t],'market_id':t,'price':p2,'share':sjt2})
#         df2 = pd.concat([df2,temp2], ignore_index=True)
    return df1

t1 = constructData(mc,x,w,xi,omega,f1,f2,alpha=-2)

## Part 3: Estimate Some Mis-specified Models

4.Estimate the plain multinomial logit model of demand by OLS\
(ignoring the endogeneity of prices).

In [5]:
df = t1.copy()
outside_share = df.groupby(['market_id'],as_index=False)['share'].sum()
outside_share['out_share']=1-outside_share['share']
df_prod = df.merge(outside_share[['market_id','out_share']],on=['market_id'])
df_prod['log_share_diff'] = np.log(df_prod['share'])-np.log(df_prod['out_share'])
df_prod.head()

Unnamed: 0,x,w,sate,wire,xi,omega,market_id,price,share,out_share,log_share_diff
0,2.924251,0.851108,1,0,-0.520681,-0.348167,0,2.894833,0.360672,0.18812,0.650888
1,0.111376,0.6935,1,0,-0.347674,-0.01549,0,2.497121,0.057027,0.18812,-1.193555
2,3.267158,0.284827,0,1,-0.714397,0.884603,0,2.964726,0.370865,0.18812,0.678756
3,0.531646,1.912153,0,1,0.523824,1.18398,0,3.599421,0.023316,0.18812,-2.087924
4,0.543815,0.038096,1,0,-1.093581,-0.869709,1,2.110673,0.158911,0.342183,-0.767001


In [507]:
ols_result = sm.ols(
    formula="log_share_diff ~ -1 + x + sate + wire + price",
    data=df_prod).fit()
ols_result.summary()

0,1,2,3
Dep. Variable:,log_share_diff,R-squared:,0.325
Model:,OLS,Adj. R-squared:,0.325
Method:,Least Squares,F-statistic:,385.4
Date:,"Thu, 08 Oct 2020",Prob (F-statistic):,2.9599999999999997e-204
Time:,23:31:39,Log-Likelihood:,-3120.3
No. Observations:,2400,AIC:,6249.0
Df Residuals:,2396,BIC:,6272.0
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
x,0.8900,0.031,29.020,0.000,0.830,0.950
sate,1.4198,0.128,11.125,0.000,1.170,1.670
wire,1.4431,0.128,11.284,0.000,1.192,1.694
price,-1.0049,0.046,-21.846,0.000,-1.095,-0.915

0,1,2,3
Omnibus:,26.212,Durbin-Watson:,2.033
Prob(Omnibus):,0.0,Jarque-Bera (JB):,26.94
Skew:,-0.259,Prob(JB):,1.41e-06
Kurtosis:,2.964,Cond. No.,30.2


5.Re-estimate the multinomial logit model of demand by two-stage
least squares, instrumenting for prices with the exogenous demand shifters $%
x $ and excluded cost shifters w. Discuss how the results differ from those
obtained by OLS.

In [508]:
iv_result = IV2SLS.from_formula( 
    'log_share_diff ~ -1 + x + sate + wire + [price ~  w]',
    data=df_prod).fit()
iv_result.summary

0,1,2,3
Dep. Variable:,log_share_diff,R-squared:,0.2008
Estimator:,IV-2SLS,Adj. R-squared:,0.1998
No. Observations:,2400,F-statistic:,2398.1
Date:,"Thu, Oct 08 2020",P-value (F-stat),0.0000
Time:,23:31:42,Distribution:,chi2(4)
Cov. Estimator:,robust,,
,,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
x,0.9861,0.0339,29.130,0.0000,0.9197,1.0524
sate,4.0039,0.1879,21.308,0.0000,3.6356,4.3722
wire,4.0345,0.1888,21.373,0.0000,3.6645,4.4045
price,-1.9730,0.0689,-28.651,0.0000,-2.1080,-1.8380


The 2SLS estimates are more accurate than OLS result, which means price is correlated with the unobserved error term in the demand equation. By instrumenting for prices in 2SLS, we get less biased estimates.

6.Now estimate a nested logit model by two-stage least squares,treating satellite and wired as the two nests for the inside goods. 

$\ln(s_j) - \ln(s_0) = x_j \beta - \alpha p_j + \sigma_1 \ln(s_{j/g1}) + \sigma_2 \ln(s_{j/g2}) + \xi_j$. The within group share is endogenous, suggesting the need for additional exogenous variables that are correlated with the within group share. These variables might include the characteristics of other firms in the group.

This model is misspecified because it assumes beta is constant for each consumer, while we have random coefficents model in truth.

In [6]:
#Add column of within nest share and instruments
df_nest = df_prod.copy()
groups=df_nest.groupby(['market_id', 'sate'],as_index=False,sort=False)
temp = groups.agg({'share':sum}).rename(columns={'share':'share_sate'})
df_temp = df_nest.merge(temp,on=['market_id','sate'])
temp = groups['x'].sum().rename(columns={'x':'sum'})
df_temp = df_temp.merge(temp,on=['market_id','sate'])
df_nest['share_within_sate'] = df_temp['share']/df_temp['share_sate']
df_nest['log_share_within_sate'] = np.log(df_nest['share_within_sate'])
df_nest['iv_sate_nest'] = df_temp['sum'] - df_temp['x']

groups= df_nest.groupby(['market_id', 'wire'],as_index=False,sort=False)
temp = groups.agg({'share':sum}).rename(columns={'share':'share_wire'})
df_temp = df_nest.merge(temp,on=['market_id','wire'])
temp = groups['x'].sum().rename(columns={'x':'sum'})
df_temp = df_temp.merge(temp,on=['market_id','wire'])
df_nest['share_within_wire'] = df_temp['share']/df_temp['share_wire']
df_nest['log_share_within_wire'] =np.log(df_nest['share_within_wire'])
df_nest['iv_wire_nest'] = df_temp['sum'] - df_temp['x']

df_nest.loc[df_nest.sate==1,'log_share_within_wire']=0
df_nest.loc[df_nest.sate==1,'share_within_wire']=0
df_nest.loc[df_nest.sate==1,'iv_wire_nest']=0
df_nest.loc[df_nest.wire==1,'log_share_within_sate']=0
df_nest.loc[df_nest.wire==1,'share_within_sate']=0
df_nest.loc[df_nest.wire==1,'iv_sate_nest']=0

In [7]:
iv_result = IV2SLS.from_formula( 
    'log_share_diff ~ -1 + x + sate + wire + [log_share_within_sate + log_share_within_wire + price ~ iv_sate_nest + iv_wire_nest + w ]',
    data=df_nest).fit()
iv_result.summary

0,1,2,3
Dep. Variable:,log_share_diff,R-squared:,0.2779
Estimator:,IV-2SLS,Adj. R-squared:,0.2764
No. Observations:,2400,F-statistic:,2384.1
Date:,"Mon, Oct 12 2020",P-value (F-stat),0.0000
Time:,10:31:40,Distribution:,chi2(6)
Cov. Estimator:,robust,,
,,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
x,0.8102,0.0442,18.321,0.0000,0.7236,0.8969
sate,3.4636,0.2449,14.146,0.0000,2.9837,3.9435
wire,3.2982,0.2096,15.735,0.0000,2.8874,3.7090
log_share_within_sate,0.2644,0.0956,2.7649,0.0057,0.0770,0.4519
log_share_within_wire,0.1025,0.1205,0.8510,0.3947,-0.1336,0.3387
price,-1.6196,0.1021,-15.856,0.0000,-1.8198,-1.4194


7. Using the nested logit results, provide a table comparing the estimated own-price elasticities to the true own-price elasticities.Provide two additional tables showing the true matrix of diversion ratios and the diversion ratios implied by your estimates.

$$s_{jt} = s_{j|h(j)t} \cdot s_{h(j)t} =
\frac{\exp[\delta_{jt}/(1-\sigma_{h(j)})]}{D_{h(j)t}}\cdot \frac{D_{h(j)t^{(1-\sigma_{h(j)})}}}{\sum_{h(j)} D_{h(j)t}^{1-\sigma_{h(j)}}} = \frac{\exp[\delta_{jt}/(1-\sigma_{h(j)t}]}{D_{h(j)t}^{\sigma_{h(j)}}[\sum_{h(j)}D_{h(j)t}^{1-\sigma_{h(j)}}]},$$
where 
$$
D_{h(j)t} = \sum_{j \in h(j)} \exp[\delta_{jt}/(1-\sigma_{h(j)})].
$$

Taking first order derivatives,
$$
\begin{aligned}
\frac{\partial s_{jt}}{\partial p_{jt}} &= \frac{\partial s_{j|h(j)t}}{\partial p_{jt}} \cdot s_{h(j)t} + s_{j|h(j)t} \cdot \frac{\partial s_{h(j)t}}{\partial p_{jt}} \\
&= \left[ \frac{\alpha}{1-\sigma_{h(j)}} s_{j|h(j)t} (1-s_{j|h(j)t}) \right] \cdot s_{h(j)t} + s_{j|h(j)t} \cdot \left[ \alpha \cdot s_{h(j)t} (1-s_{h(j)t}) \cdot s_{j|h(j)t} \right] \\
&= \alpha \cdot s_{h(j)t} \cdot s_{j|h(j)t} \cdot \left[ \frac{1-s_{j|h(j)t}}{1-\sigma_{h(j)}} + (1-s_{h(j)t}) \cdot s_{j|h(j)t} \right] \\
&= \alpha \frac{1}{1-\sigma_{h(j)}} \cdot s_{jt} \cdot \left[ 1 -\sigma_{h(j)} s_{j|h(j)t} -(1 - \sigma_{h(j)})s_{jt}  \right].
\end{aligned}
$$

The own price elasticies are 
$$
\frac{p_{jt}}{s_{jt}} \cdot \frac{\partial s_{jt}}{\partial p_{jt}} = \alpha \frac{1}{1-\sigma_{h(j)}} \cdot p_{jt} \cdot \left[ 1 -\sigma_{h(j)} s_{j|h(j)t} -(1 - \sigma_{h(j)})s_{jt}  \right].
$$

In [8]:
# calculate elasticities implied by nested logit
alpha_hat = iv_result.params['price']
df_nest['sigma'] = np.tile(np.repeat([iv_result.params['log_share_within_sate'], iv_result.params['log_share_within_wire']],2),M)
df_nest['elas_nl'] = 0
df_nest.loc[df_nest.sate==1,'elas_nl']= alpha_hat * df_nest['price'] /(1-df_nest['sigma']) *(1 - df_nest['sigma']* df_nest['share_within_sate'] - (1 - df_nest['sigma'])*df_nest['share'])
df_nest.loc[df_nest.wire==1,'elas_nl']= alpha_hat * df_nest['price'] /(1-df_nest['sigma']) *(1 - df_nest['sigma']* df_nest['share_within_wire'] - (1 - df_nest['sigma'])*df_nest['share'])
df_nest.head()

Unnamed: 0,x,w,sate,wire,xi,omega,market_id,price,share,out_share,log_share_diff,share_within_sate,log_share_within_sate,iv_sate_nest,share_within_wire,log_share_within_wire,iv_wire_nest,sigma,elas_nl
0,2.924251,0.851108,1,0,-0.520681,-0.348167,0,2.894833,0.360672,0.18812,0.650888,0.863473,-0.146792,0.111376,0.0,0.0,0.0,0.264417,-3.227628
1,0.111376,0.6935,1,0,-0.347674,-0.01549,0,2.497121,0.057027,0.18812,-1.193555,0.136527,-1.991236,2.924251,0.0,0.0,0.0,0.264417,-5.069117
2,3.267158,0.284827,0,1,-0.714397,0.884603,0,2.964726,0.370865,0.18812,0.678756,0.0,0.0,0.0,0.940849,-0.060973,0.531646,0.10253,-3.053407
3,0.531646,1.912153,0,1,0.523824,1.18398,0,3.599421,0.023316,0.18812,-2.087924,0.0,0.0,0.0,0.059151,-2.827654,3.267158,0.10253,-6.320419
4,0.543815,0.038096,1,0,-1.093581,-0.869709,1,2.110673,0.158911,0.342183,-0.767001,0.327935,-1.11494,0.63149,0.0,0.0,0.0,0.264417,-3.701132


In [566]:
#calculate true elasticities
for t in range(M):
    p = np.array(df_nest.loc[df_nest.market_id == t, 'price'])
    #calculate derivative matrix
    D = int_2D(f2,p,9,x[:,t],xi[:,t],alpha=-2)
    s = np.array(df_nest.loc[df_nest.market_id == t, 'share'])
    df_nest.loc[df_nest.market_id == t, 'elas_true'] = p/s * np.diag(D)

In [568]:
#Compare two results
df_nest[['elas_nl','elas_true']]

Unnamed: 0,elas_nl,elas_true
0,-2.783156,-2.568119
1,-3.854087,-3.435353
2,-5.596801,-4.927884
3,-4.224588,-3.834164
4,-3.790240,-3.403372
...,...,...
2395,-6.830893,-6.330328
2396,-5.068176,-4.684241
2397,-5.242283,-4.976993
2398,-3.898376,-3.772594


If j and k in both group
$$
\begin{aligned}
\frac{\partial s_{jt}}{\partial p_{kt}} &= \frac{\partial s_{j|h(j)t}}{\partial p_{kt}} \cdot s_{h(j)t} + s_{j|h(j)t} \cdot \frac{\partial s_{h(j)t}}{\partial p_{kt}} \\
&= -\alpha s_{kt} (s_{jt} + \frac{\sigma_{h(j)t}}{1 - \sigma_{h(j)t}} s_{j|h(j)t}) 
\end{aligned}
$$
$$
D_{jk} = \frac{\frac{\partial s_{kt}}{\partial p_{jt}}}{\frac{\partial s_{jt}}{\partial p_{jt}}} = -\frac{(1-\sigma_{h(j)t}) s_{kt} + \sigma_{h(j)t} s_{k|h(j)t}}{1-\sigma_{h(j)t} s_{j|h(j)t} - (1 - \sigma_{h(j)t})s_j}
$$

If j and k in different group
$$
\begin{aligned}
\frac{\partial s_{jt}}{\partial p_{kt}}
&= -\alpha s_{kt} s_{jt}
\end{aligned}
$$
$$
D_{jk} = \frac{\frac{\partial s_{kt}}{\partial p_{jt}}}{\frac{\partial s_{jt}}{\partial p_{jt}}} = -\frac{(1-\sigma_{h(j)t})s_{kt}}{1-\sigma_{h(j)t} s_{j|h(j)t} - (1 - \sigma_{h(j)t})s_j}
$$

In [11]:
# calculate diversion ratio implied by nested logit
D = np.zeros((M,J,J))
for t in range(M):
    s = np.array(df_nest.loc[df_nest.market_id == t, 'share'])
    s1 = np.array(df_nest.loc[df_nest.market_id == t, 'share_within_sate'])
    s2 = np.array(df_nest.loc[df_nest.market_id == t, 'share_within_wire'])
    sigma = np.array(df_nest.loc[df_nest.market_id == t, 'sigma'])
    D[t,:,:] = -np.diag(np.ones(J))
    for j in range(J):
        for k in range(J):
            if j==k:
                continue
            elif (j==0 and k==1) or (j==1 and k==0):
                D[t,j,k] = ((1-sigma[j])*s[k]+sigma[j]*s1[k])/(1-sigma[j]*s1[j]-(1-sigma[j])*s[j])
                continue
            elif (j==2 and k==3) or (j==3 and k==2):
                D[t,j,k] = ((1-sigma[j])*s[k]+sigma[j]*s2[k])/(1-sigma[j]*s2[j]-(1-sigma[j])*s[j])
                continue
            elif (j==0) or (j==1):
                D[t,j,k] = ((1-sigma[j])*s[k])/(1-sigma[j]*s1[j]-(1-sigma[j])*s[j])
                continue
            elif (j==2) or (j==3):
                D[t,j,k] = ((1-sigma[j])*s[k])/(1-sigma[j]*s2[j]-(1-sigma[j])*s[j])
                continue
D.mean(axis=0)

array([[-1.        ,  0.34658619,  0.18737   ,  0.19009193],
       [ 0.3514555 , -1.        ,  0.19043165,  0.1863962 ],
       [ 0.21483204,  0.21391553, -1.        ,  0.26516022],
       [ 0.21719479,  0.21037773,  0.26613271, -1.        ]])

In [12]:
# calculate true diversion ratio
D_true = np.zeros((M,J,J))
for t in range(M):
    p = np.array(df_nest.loc[df_nest.market_id == t, 'price'])
    #calculate derivative matrix
    D = int_2D(f2,p,9,x[:,t],xi[:,t],alpha=-2)
    for j in range(J):
        for k in range(J):
            D_true[t,j,k] = D[k,j]/np.abs(D[j,j])
D_true.mean(axis=0)

array([[-1.        ,  0.30913176,  0.1799187 ,  0.18211681],
       [ 0.31388677, -1.        ,  0.18215599,  0.17898959],
       [ 0.18375991,  0.18326345, -1.        ,  0.30619021],
       [ 0.18597317,  0.17968792,  0.30799805, -1.        ]])

## Part 4: Estimate the Correctly Specified Model

8.Report a table with the estimates of the demand parameters and standard errors. Do this three times: once when you estimate demand
alone, then again when you estimate jointly with supply; and again with the `optimal IV'.

In [56]:
product_data = t1.copy().rename(columns={'market_id':'market_ids','share':'shares','price':'prices'})
#product_data['nesting_ids'] = product_data['sate']
product_data['product_ids'] = product_data.groupby('market_ids').cumcount()
product_data['firm_ids'] = product_data['product_ids']
product_data['demand_instruments0'] = product_data.groupby(['market_ids'])['x'].transform('sum') - product_data['x']
product_data['demand_instruments1'] = product_data.groupby(['market_ids'])['w'].transform('sum') - product_data['w']
product_data['demand_instruments2'] = product_data['w']

bfgs = pyblp.Optimization('bfgs', {'gtol': 1e-5})
pr_integration = pyblp.Integration('product', size=9)


In [46]:
#Demand side alone
def solve_nl(df):
#     groups = df.groupby(['market_ids', 'nesting_ids'])
#     df['group_share'] = groups['shares'].transform(np.sum)
#     df['within_share'] = df['shares'] / df['group_share']
#     df['demand_instruments3'] = groups['shares'].transform(np.size)
    X1_formulation = pyblp.Formulation('0 + prices + x + sate + wire')
    X2_formulation = pyblp.Formulation('0 + sate + wire')
    product_formulations1 = (X1_formulation, X2_formulation)
#     problem = pyblp.Problem(product_formulations1, df.drop(columns=['nesting_ids']),integration=mc_integration)
    problem = pyblp.Problem(product_formulations1, df, integration=pr_integration)
    return problem.solve(sigma = np.eye(2),beta=[-2, 1,3,3], optimization=bfgs)

nl_results1 = solve_nl(product_data)
nl_results1

Problem Results Summary:
GMM   Objective    Gradient       Hessian         Hessian     Clipped  Weighting Matrix  Covariance Matrix
Step    Value        Norm     Min Eigenvalue  Max Eigenvalue  Shares   Condition Number  Condition Number 
----  ----------  ----------  --------------  --------------  -------  ----------------  -----------------
 2    +9.832E-16  +3.252E-06    -2.379E-07      +4.187E+04       0        +2.216E+01        +5.189E+17    

Cumulative Statistics:
Computation  Optimizer  Optimization   Objective   Fixed Point  Contraction
   Time      Converged   Iterations   Evaluations  Iterations   Evaluations
-----------  ---------  ------------  -----------  -----------  -----------
 00:02:11       Yes          18           28         113387       348954   

Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
Sigma:      sate          wire    
------  ------------  ------------
 sate    +6.978E-01               
        (+6.859E-02)              
                 

In [59]:
#Demand + Supply
product_data = t1.copy().rename(columns={'market_id':'market_ids','share':'shares','price':'prices'})
product_data['product_ids'] = product_data.groupby('market_ids').cumcount()
product_data['firm_ids'] = product_data['product_ids']
product_data['demand_instruments0'] = product_data['w']
product_data['demand_instruments1'] = product_data.groupby(['market_ids'])['x'].transform('sum') - product_data['x']
product_data['supply_instruments0'] = product_data['x']
product_data['supply_instruments1'] = product_data.groupby(['market_ids'])['w'].transform('sum') - product_data['w']


bfgs = pyblp.Optimization('bfgs', {'gtol': 1e-5})
pr_integration = pyblp.Integration('product', size=9)

X1_formulation = pyblp.Formulation('0 + prices+ x + sate + wire')
X2_formulation = pyblp.Formulation('0 + sate + wire')
X3_formulation = pyblp.Formulation('1 + w')

product_formulations2 = (X1_formulation, X2_formulation,X3_formulation)
problem2 = pyblp.Problem(product_formulations2, product_data, integration=pr_integration)

results2 = problem2.solve(
    sigma=np.eye(2),
    beta=[-2, 1, 3, 3]
)

results2

Problem Results Summary:
GMM   Objective     Projected    Reduced Hessian  Reduced Hessian  Clipped  Weighting Matrix  Covariance Matrix
Step    Value     Gradient Norm  Min Eigenvalue   Max Eigenvalue   Shares   Condition Number  Condition Number 
----  ----------  -------------  ---------------  ---------------  -------  ----------------  -----------------
 2    +9.057E-02   +1.529E-09      +2.795E-01       +4.450E+04        0        +1.545E+03        +2.022E+18    

Cumulative Statistics:
Computation  Optimizer  Optimization   Objective   Fixed Point  Contraction
   Time      Converged   Iterations   Evaluations  Iterations   Evaluations
-----------  ---------  ------------  -----------  -----------  -----------
 00:13:02       Yes          98           169        633971       1967847  

Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
Sigma:      sate          wire    
------  ------------  ------------
 sate    +7.467E-01               
        (+4.368E-01)            

In [58]:
#optimal IV
instrument_results = results2.compute_optimal_instruments(method='empirical')
problem3 = instrument_results.to_problem()

results3 = problem3.solve(
    sigma=np.diag([1,1]),
    beta=[-1, 1, 2, 2])

results3

Problem Results Summary:
GMM   Objective     Projected    Reduced Hessian  Reduced Hessian  Clipped  Weighting Matrix  Covariance Matrix
Step    Value     Gradient Norm  Min Eigenvalue   Max Eigenvalue   Shares   Condition Number  Condition Number 
----  ----------  -------------  ---------------  ---------------  -------  ----------------  -----------------
 2    +2.258E+00   +1.311E-07      +1.460E+00       +5.580E+04        0        +1.051E+16        +1.532E+04    

Cumulative Statistics:
Computation  Optimizer  Optimization   Objective   Fixed Point  Contraction
   Time      Converged   Iterations   Evaluations  Iterations   Evaluations
-----------  ---------  ------------  -----------  -----------  -----------
 00:12:32       No           65           162        654808       2026102  

Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
Sigma:      sate          wire    
------  ------------  ------------
 sate    +5.824E-01               
        (+2.792E-01)            

9.Using your preferred estimates from the prior step (explain your preference), provide a table comparing the estimated own-price elasticities to the true own-price elasticities. Provide two additional tables showing the true matrix of diversion ratios and the diversion ratios implied by your estimates.

In [626]:
elas = results3.compute_elasticities()
D = results3.compute_diversion_ratios()
elasticities = elas.reshape((M,J,J).mean(axis=0)
diversion_ratios = D.reshape((M,J,J).mean(axis=0)

array([[0.16716706, 0.27429558, 0.1967602 , 0.36177716],
       [0.58972558, 0.09502401, 0.10980639, 0.20544402],
       [0.56729408, 0.14725413, 0.09026751, 0.19518428],
       ...,
       [0.12831389, 0.52233938, 0.27828808, 0.07105865],
       [0.14052913, 0.20888911, 0.57278591, 0.07779586],
       [0.11478452, 0.17062672, 0.2488659 , 0.46572285]])

## Part 5: Merger Simulation

10.Suppose two of the four firms were to merge. Give a brief intuition for what theory tells us is likely to happen to the equilibrium prices of each good j.

if two firms were merged, then they will gain more market power and reduce competition, which could lead to higher prices for consumers. The non-partipant firms who produce close substitutes will also increase their price slightly. Therefore, prices of all goods would increase.

11.Suppose firms 1 and 2 are proposing to merge. Use the pyBLP merger simulation procedure to provide a prediction of the post-merger equilibrium prices.

In [24]:
#Original prices
np.array(product_data['prices']).reshape(M,J).mean(axis=0)

array([2.7400316 , 2.74062395, 2.7586635 , 2.7330697 ])

In [20]:
costs = results3.compute_costs()

df_M = product_data.copy()
df_M['merger_ids'] = df_M['firm_ids'].replace(1,0)

changed_prices = results3.compute_prices(
    firm_ids=df_M['merger_ids'],
    costs=costs
)
changed_prices.reshape(M,J).mean(axis=0)

array([2.89452682, 2.89751346, 2.76727867, 2.74171211])

12.Now suppose instead that firms 1 and 3 are the ones to merge. Re-run the merger simulation. Provide a table comparing the (average across markets) predicted merger-induced price changes for this merger and that in part 11. Interpret the differences between the predictions for the two mergers.

In [21]:
df_M = product_data.copy()
df_M['merger_ids'] = df_M['firm_ids'].replace(3,0)

changed_prices = results3.compute_prices(
    firm_ids=df_M['merger_ids'],
    costs=costs
)
changed_prices.reshape(M,J).mean(axis=0)

array([2.89605225, 2.74931134, 2.76695543, 2.8916927 ])

13.Thus far you have assumed that there are no efficiencies (reduction in costs) resulting from the merger. Explain briefly why a merger-specific reduction in marginal cost could mean that a merger is welfare-enhancing.

If marginal cost decrease, the merger could set lower prices and will produce more goods. Thus, aggregate output will increase and we will have larger social welfare.

14.Using the pyBlp software, re-run the merger simulation with the 15% cost saving. Show the predicted post-merger price changes.What is the predicted impact of the merger on consumer welfare

In [25]:
df_M = product_data.copy()
df_M['merger_ids'] = df_M['firm_ids'].replace(1,0)

merger_costs = costs.copy()
merger_costs[df_M.merger_ids==0] = 0.85*merger_costs[df_M.merger_ids==0]

changed_prices = results3.compute_prices(
    firm_ids=df_M['merger_ids'],
    costs=merger_costs
)
changed_prices.reshape(M,J).mean(axis=0)

array([2.69075241, 2.69329106, 2.75017037, 2.72446788])

In [27]:
cs = results3.compute_consumer_surpluses()
csnew = results3.compute_consumer_surpluses(changed_prices)
print(cs.mean(axis=0))
print(csnew.mean(axis=0))

[0.77404333]
[0.80308831]


In [34]:
np.concatenate((cs, csnew),axis=1)

array([[0.91673983, 0.9770986 ],
       [0.58845696, 0.57450305],
       [0.819028  , 0.88305061],
       ...,
       [0.9266044 , 0.89536147],
       [0.68589371, 0.74211465],
       [0.74915626, 0.81388573]])

15.Explain why this additional assumption (or data on the correct values of Mt) is needed here, whereas up to this point it was without loss to assume Mt = 1. What is the predicted impact of the merger on total welfare?\
Becasue change of consumer surplus will depend on market share of the merger in each market. For markets with higher share on firm 1 and firm 2, consumer surplus increases while decreases for the other case. Therefore, we will need total measure of consumers to calcuate impact of welfare. 