##### Q4: General Weighted Linear Regressions

OLS, FGLS, and IV regression with 2SLS are all forms of general weighted linear regression. 

In the case of OLS, the weights are all 1 and do not vary by observation (T is the identity matrix), so T'Y=T'XB+T'u is identical to Y=XB+u for a given Y, X, B, and u.

In the case of FGLS, the weights T are determined by the variance-covariance matrix of the errors, which is first estimated from the OLS residuals and then used to implement GLS. T is not random; it is determined by the estimated covariance matrix of the errors.
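
The FGLS steps described above can be sketched as follows. This is only an illustration under assumptions that are not in the original answer: the errors are heteroskedastic with a log-linear variance function, so the estimated covariance matrix is diagonal and T reduces to a vector of per-observation weights. The data-generating values (`beta`, `sigma`, the seed) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000

# Hypothetical data: intercept plus one regressor
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
beta = np.array([0.5, 1.0])

# Assumed heteroskedasticity: the error standard deviation depends on x
sigma = np.exp(0.5 * x)
u = rng.normal(size=N) * sigma
y = X @ beta + u

# Step 1: OLS, to obtain residuals
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b_ols

# Step 2: estimate the variance function by regressing log(e^2) on X
g = np.linalg.lstsq(X, np.log(e**2), rcond=None)[0]
sigma2_hat = np.exp(X @ g)

# Step 3: GLS with T = Omega_hat^(-1/2); Omega_hat is diagonal here,
# so applying T just rescales each row by 1/sigma_hat
w = 1 / np.sqrt(sigma2_hat)
b_fgls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]

print("OLS: ", b_ols)
print("FGLS:", b_fgls)
```

Both estimators should land near the assumed `beta`. The difference is that FGLS down-weights the noisier observations.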

In weighted 2SLS, the weights are determined by the inverse of the variance-covariance matrix of the instruments. We then run the second-stage regression, with the weights based on the predicted values from the first-stage regression. Again, T here is not random and is determined by the data.
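
A minimal sketch of the two stages, assuming one endogenous regressor and two instruments. The names `Z`, `v`, `Pi`, and all numeric values are hypothetical. It also checks that using the first-stage fitted values as T in T'Y=T'XB+T'u gives the same estimate as running the second stage directly.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5000

# Hypothetical instruments and an endogenous regressor
Z = rng.normal(size=(N, 2))
v = rng.normal(size=N)                    # first-stage error
u = 0.8 * v + 0.5 * rng.normal(size=N)    # correlated with v, so x is endogenous
x = Z @ np.array([1.0, -0.5]) + v
X = x[:, None]
beta = np.array([2.0])
y = X @ beta + u

# First stage: project X on the instruments
Pi = np.linalg.lstsq(Z, X, rcond=None)[0]
X_hat = Z @ Pi

# Second stage: regress y on the fitted values
b_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Same estimator written as a weighted regression with T = X_hat:
# solve T'y = T'X b
b_T = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

# OLS for comparison; it is biased here because x is correlated with u
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("2SLS:", b_2sls, "weighted form:", b_T, "OLS:", b_ols)
```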

One estimator that is not a form of general weighted linear regression is the logit model, which cannot be written in the form T'Y=T'XB+T'u. While weights can be incorporated into the log-likelihood function, the nonlinear transformation used to accommodate the binary outcome (keeping predicted probabilities between 0 and 1) means that it does not belong to the class of regressions described by this equation.
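
To make the contrast concrete, here is the standard logit model in notation introduced only for this illustration: $\Lambda$ is the logistic function and $X_i$ is the $i$-th row of $X$.

$$
P(Y_i = 1 \mid X_i) = \Lambda(X_i B) = \frac{1}{1 + e^{-X_i B}}
$$

The maximum-likelihood first-order condition is

$$
\sum_{i=1}^{N} X_i' \left( Y_i - \Lambda(X_i B) \right) = 0 .
$$

This looks like an orthogonality condition, but $B$ enters through $\Lambda(\cdot)$ rather than linearly. No choice of $T$ turns it into T'Y=T'XB+T'u.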

##### Q5: Simultaneous Equations
#1: The assumption of conformability implies that X, B, T, and u are matrices that have compatible dimensions with one another in each step of our equations. We can use properties of matrix algebra to derive our expected dimensions for each element. 

Given the rules of matrix multiplication, X must have the same number of columns as B has rows. If X is N x q and B is q x K, then XB is N x K. Here q is 3x the number of covariates.

Given the rules of matrix addition, u must have the same dimensions as XB. If XB is N x K, u must also be N x K.

Going back to T, given the rules of matrix multiplication, T' must have the same number of columns as XB has rows. That means T must have the same number of rows as XB. If XB is N x K, T must be N x a, with T' being a x N.
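
The dimension bookkeeping above can be checked mechanically. The sketch below uses placeholder zero matrices and hypothetical values for N, q, K, and a. Only the shapes matter.

```python
import numpy as np

# Hypothetical dimensions, chosen only for illustration
N, q, K, a = 1000, 3, 2, 3

X = np.zeros((N, q))   # regressors
B = np.zeros((q, K))   # coefficients, one column per equation
u = np.zeros((N, K))   # disturbances, same shape as XB
T = np.zeros((N, a))   # weights

Y = X @ B + u
shapes = {"Y": Y.shape, "T'Y": (T.T @ Y).shape, "T'X": (T.T @ X).shape}
print(shapes)

# A non-conformable product (T without the transpose) raises ValueError
try:
    T @ X
    conformable = True
except ValueError:
    conformable = False
print("T @ X conformable:", conformable)
```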
    
#2: We can extend the code from weighted_regression. Instead of y taking on an n x 1 matrix (with XB and u also taking on the form of n x 1 matrices), we need to adapt the code to take on conformable forms as indicated above.

#3: See code below for extension of the code from the weighted regression python notebook.

#4: We require the following assumptions to estimate the distribution of B for the general weighted linear regression model:

-An assumption about the form of the variance of the estimator (for example, homoskedastic errors)

-Population orthogonality: E[T'u]=0

-Full rank q of the expectation E[x'x]
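
A sketch of how these assumptions combine, for a single equation (K = 1) in the just-identified case where T and X have the same number of columns. The symbols $Q$ and $\Omega$ are introduced here for illustration only, and the rank condition is applied to $Q$ (which coincides with the condition on E[x'x] when T = X).

$$
b = (T'X)^{-1} T'Y = B + (T'X)^{-1} T'u
$$

$$
\sqrt{N}\,(b - B) = \left( \tfrac{1}{N} T'X \right)^{-1} \tfrac{1}{\sqrt{N}} T'u \;\xrightarrow{d}\; N\!\left( 0,\; Q^{-1} \Omega \,(Q^{-1})' \right)
$$

where $Q = \operatorname{plim} \tfrac{1}{N} T'X$ and $\Omega$ is the asymptotic variance of $\tfrac{1}{\sqrt{N}} T'u$. Orthogonality centers the limit at zero, the rank condition makes $Q$ invertible, and the variance assumption pins down $\Omega$.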


In [18]:
#REVIEW: Code from weighted_regression originally
#Define random variables
%matplotlib inline
import numpy as np
from scipy.stats import multivariate_normal
from scipy.linalg import inv, sqrtm

k = 3 # Number of observables in T

mu = [0]*k
Sigma=[[1,0.5,0],
       [0.5,2,0],
       [0,0,3]]

T = multivariate_normal(mu,Sigma)

u = multivariate_normal(cov=0.2)

#Construct the Sample
beta = [1/2,1]

D = np.random.random(size=(3,2)) # Generate random 3x2 matrix

N=1000 # Sample size

# Now: Transform rvs into a sample
T = T.rvs(N)

u = u.rvs(N) # Replace u with a sample

X = (T**3)@D  # Note use of ** operator for exponentiation

y = X@beta+u # Note use of @ operator for matrix multiplication

#Estimation
# rcond=None selects NumPy's current default cutoff and avoids the FutureWarning
b = np.linalg.lstsq(T.T@X,T.T@y,rcond=None)[0] # lstsq returns several results; [0] is the coefficient estimate

e = y - X@b

print("Original code beta", b)

Original code beta [0.49694153 1.00161419]




In [19]:
#NOW: We have to extend the code to accommodate the simultaneous regression. Currently, D is a random 3 x 2 matrix (so X is N x 2); we would like D to be 3 x k, where k = 3, so that X is N x k

D_2 = np.random.random(size=(3,k)) #takes on K columns vs 2
print("Original D Shape", D.shape, "New D Shape with col K", D_2.shape)

X_2 = (T**3)@D_2 #Regenerate X with the new D matrix 
print("Original X Shape", X.shape, "New X Shape with col K", X_2.shape)

#Now recalculate beta with X_2 and the original y values - issue is that the original y is still only a length-N vector (a single dependent variable)
b_2 = np.linalg.lstsq(T.T@X_2,T.T@y,rcond=None)[0]
print("Q5 Part 3 beta - b is 1 x 3", b_2)

#We can generate a new random Y with dimensions N x k to facilitate simultaneous regression
#Note: y_2 is drawn independently of X_2, so the estimated coefficients should all be close to zero
y_2 = np.random.randn(N, k)  #Y now is N x k matrix of dependent variables
print("Original Y Shape", y.shape, "New Y Shape with col K", y_2.shape)
b_3 = np.linalg.lstsq(T.T@X_2,T.T@y_2,rcond=None)[0]
print("Q5 Part 3 beta - b is 3 x 3", b_3)

Original D Shape (3, 2) New D Shape with col K (3, 3)
Original X Shape (1000, 2) New X Shape with col K (1000, 3)
Q5 Part 3 beta - b is 1 x 3 [ 1.20712446  1.06513683 -0.31202634]
Original Y Shape (1000,) New Y Shape with col K (1000, 3)
Q5 Part 3 beta - b is 3 x 3 [[ 0.00477332 -0.00394622  0.00170813]
 [-0.00167804 -0.00063917 -0.00201621]
 [-0.00192061  0.00282819  0.00048257]]
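
Because y_2 is pure noise, the 3 x 3 estimate above is close to zero and does not show that the estimator recovers a known coefficient matrix. A hedged variation is sketched below: Y is generated from the model itself, Y = XB + U, with a hypothetical k x k matrix `B`. To keep the sketch reproducible it also assumes a fixed seed and a fixed, well-conditioned `D` instead of a random one. Neither choice is part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 1000, 3

Sigma = [[1, 0.5, 0],
         [0.5, 2, 0],
         [0, 0, 3]]
T = rng.multivariate_normal(np.zeros(k), Sigma, size=N)   # N x k

# Fixed, invertible D (an assumption; the notebook draws D at random)
D = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.5, 0.0, 1.0]])
X = (T**3) @ D                                            # N x k

# Hypothetical true coefficients: one column per equation
B = np.array([[0.5,  1.0, -1.0],
              [1.0,  0.0,  2.0],
              [0.0, -0.5,  1.0]])

U = rng.normal(scale=np.sqrt(0.2), size=(N, k))           # N x k disturbances
Y = X @ B + U                                             # N x k outcomes

# Same estimator as before, now solving all k equations at once
B_hat = np.linalg.lstsq(T.T @ X, T.T @ Y, rcond=None)[0]

print("Largest absolute error:", np.abs(B_hat - B).max())
```

Each column of `B_hat` is the estimate for one equation. `lstsq` handles the k right-hand sides together, so no loop over equations is needed.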


