# 8-2 Matrix Application - Multiple Regression

The *NumPy* package handles matrices and matrix operations. To create a matrix, use `np.array()` to convert a list of lists into an n-dimensional array.

In [32]:
import numpy as np
A = [[1,2,3],
     [4,5,6],
     [7,8,9]]
A = np.array(A)
A[:,1]

array([2, 5, 8])

In [35]:
a = np.array(range(20))
a[1:9:2]

array([1, 3, 5, 7])

In [5]:
import numpy as np
A = [[1,2,3],
     [4,5,6],
     [7,8,9]]
A = np.array(A)
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

NumPy also provides matrix operations. For example, for matrices `A` and `B`:
 - Transpose of a matrix - `A.T`
 - Inverse of a matrix - `np.linalg.inv(A)`
 - Matrix multiplication - `A@B`
 - Diagonal of a matrix - `np.diagonal(A)`
 - Determinant of a matrix - `np.linalg.det(A)`
 - and [more](https://numpy.org/doc/stable/reference/routines.linalg.html)

In [36]:
A.T

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

In [37]:
np.linalg.inv(A)

array([[-4.50359963e+15,  9.00719925e+15, -4.50359963e+15],
       [ 9.00719925e+15, -1.80143985e+16,  9.00719925e+15],
       [-4.50359963e+15,  9.00719925e+15, -4.50359963e+15]])
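
A caveat on the output above: this particular `A` is singular (its third row equals twice the second row minus the first), so it has no true inverse and the entries of order 1e15 are floating-point artifacts. The sketch below checks the rank before inverting; the matrix `M` is a hypothetical example introduced only for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# A is rank-deficient: only 2 of its 3 rows are linearly independent
rank_A = np.linalg.matrix_rank(A)

# M is a hypothetical invertible matrix used for comparison
M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
det_M = np.linalg.det(M)          # 2*3 - 1*1 = 5
M_inv = np.linalg.inv(M)

# A genuine inverse satisfies M @ M_inv = I (up to rounding)
is_identity = np.allclose(M @ M_inv, np.eye(2))
print(rank_A, round(det_M, 6), is_identity)
```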

In [38]:
B = np.array([1,3,5])
A@B

array([22, 49, 76])

In [39]:
np.diagonal(A)

array([1, 5, 9])

Next, let's look at one application of matrices: multiple regression.

In [40]:
import wooldridge as woo

df = woo.data("wage1")
df.head()

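| | wage | educ | exper | tenure | nonwhite | female | married | numdep | smsa | northcen | ... | trcommpu | trade | services | profserv | profocc | clerocc | servocc | lwage | expersq | tenursq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3.1 | 11 | 2 | 0 | 0 | 1 | 0 | 2 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.131402 | 4 | 0 |
| 1 | 3.24 | 12 | 22 | 2 | 0 | 1 | 1 | 3 | 1 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1.175573 | 484 | 4 |
| 2 | 3.0 | 11 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1.098612 | 4 | 0 |
| 3 | 6.0 | 8 | 44 | 28 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1.791759 | 1936 | 784 |
| 4 | 5.3 | 12 | 7 | 2 | 0 | 0 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.667707 | 49 | 4 |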


## 8-2.1 statsmodels Formula API (smf)

In [41]:
import statsmodels.formula.api as smf
import numpy as np

res = smf.ols("np.log(wage) ~ educ + exper + tenure", data=df).fit()
print(res.summary())

                            OLS Regression Results                            
Dep. Variable:           np.log(wage)   R-squared:                       0.316
Model:                            OLS   Adj. R-squared:                  0.312
Method:                 Least Squares   F-statistic:                     80.39
Date:                Tue, 19 Oct 2021   Prob (F-statistic):           9.13e-43
Time:                        12:27:42   Log-Likelihood:                -313.55
No. Observations:                 526   AIC:                             635.1
Df Residuals:                     522   BIC:                             652.2
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.2844      0.104      2.729      0.0

### Estimators

In [42]:
res.params

Intercept    0.284360
educ         0.092029
exper        0.004121
tenure       0.022067
dtype: float64

### Make Predictions (In sample)

In [4]:
y_h = res.predict()
y_h[:5]

array([1.30492063, 1.52350623, 1.30492063, 1.81980234, 1.46168959])

### Residuals

In [48]:
# Same values as res.resid, computed by hand:
# observed log wage minus the in-sample prediction
np.log(df["wage"])-y_h

0     -0.173519
1     -0.347933
2     -0.206308
3     -0.028043
4      0.206017
         ...   
521    0.849397
522   -0.393112
523   -0.574420
524    0.648055
525   -0.428877
Name: wage, Length: 526, dtype: float64

In [5]:
res.resid

0     -0.173519
1     -0.347933
2     -0.206308
3     -0.028043
4      0.206017
         ...   
521    0.849397
522   -0.393112
523   -0.574420
524    0.648055
525   -0.428877
Length: 526, dtype: float64

### Standard Error of $\hat{\beta}$

In [49]:
res.bse

Intercept    0.104190
educ         0.007330
exper        0.001723
tenure       0.003094
dtype: float64
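
The `t` column in the regression summary is each coefficient divided by its standard error. As a quick check, the sketch below recomputes that ratio from the rounded values printed above; with the fitted model in hand, `res.params / res.bse` gives the same ratios, which statsmodels also exposes as `res.tvalues`.

```python
import numpy as np

# Rounded values copied from the res.params and res.bse output above
names = ["Intercept", "educ", "exper", "tenure"]
coef = np.array([0.284360, 0.092029, 0.004121, 0.022067])
se = np.array([0.104190, 0.007330, 0.001723, 0.003094])

# t statistic for the null hypothesis that each coefficient is zero
t_stat = coef / se
for name, t in zip(names, t_stat):
    print(f"{name:>9}: {t:.3f}")
```

The intercept ratio comes out at about 2.729, which agrees with the `t` value shown in the summary table.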

## 8-2.2 Matrix Solution

### Build Design Matrix

In [51]:
import patsy as pt
import numpy as np

y, X = pt.dmatrices("np.log(wage) ~ educ + exper + tenure", data=df)
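
For intuition, `X` here is a column of ones (the intercept) followed by one column per regressor, and `y` is a column vector. A minimal sketch of building the same layout by hand with NumPy, using only the five rows shown by `df.head()` above (the names `X_manual` and `y_manual` are hypothetical):

```python
import numpy as np

# First five rows of wage1, copied from the df.head() output above
wage = np.array([3.1, 3.24, 3.0, 6.0, 5.3])
educ = np.array([11, 12, 11, 8, 12])
exper = np.array([2, 22, 2, 44, 7])
tenure = np.array([0, 2, 0, 28, 2])

# Design matrix: intercept column of ones, then the regressors
X_manual = np.column_stack([np.ones(len(wage)), educ, exper, tenure])

# Response as an (n, 1) column vector of log wages
y_manual = np.log(wage).reshape(-1, 1)

print(X_manual.shape, y_manual.shape)
```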

### Get Multiple Regression Estimators

The OLS estimator is $\hat{\beta} = (X'X)^{-1}X'y$. The result matches `res.params` from the statsmodels fit above.

In [55]:
b_h = np.linalg.inv(X.T@X)@X.T@y
b_h = np.array(b_h)  # ensure a plain ndarray
b_h

array([[0.28435956],
       [0.09202899],
       [0.00412111],
       [0.02206722]])
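
Forming the explicit inverse mirrors the textbook formula, but solving the normal equations $X'X\hat{\beta} = X'y$ directly is generally considered more numerically stable. A sketch on simulated data follows; the seed, sample size, and true coefficients are assumptions chosen for illustration, not the `wage1` data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated design matrix: intercept plus two regressors
X_sim = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([[1.0], [2.0], [-0.5]])
y_sim = X_sim @ beta_true + 0.1 * rng.normal(size=(n, 1))

# Textbook formula with an explicit inverse
b_inv = np.linalg.inv(X_sim.T @ X_sim) @ X_sim.T @ y_sim

# Same estimator via a linear solve of the normal equations
b_solve = np.linalg.solve(X_sim.T @ X_sim, X_sim.T @ y_sim)

# Least-squares routine that works on X directly
b_lstsq, *_ = np.linalg.lstsq(X_sim, y_sim, rcond=None)

print(np.allclose(b_inv, b_solve), np.allclose(b_inv, b_lstsq))
```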

### Make Predictions

In [58]:
y_h = X@b_h
y_h[:5,]

array([[1.30492063],
       [1.52350623],
       [1.30492063],
       [1.81980234],
       [1.46168959]])

### Estimate Residuals

In [60]:
u_h = y-y_h
u_h[:5,]

array([[-0.17351855],
       [-0.3479329 ],
       [-0.20630834],
       [-0.02804287],
       [ 0.20601726]])
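
A useful sanity check is that OLS residuals are orthogonal to every column of the design matrix, $X'\hat{u} = 0$, which also means they sum to zero when an intercept is included. A sketch on simulated data (all values below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

# Simulated design matrix (intercept plus two regressors) and response
X_sim = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y_sim = X_sim @ np.array([[0.5], [1.0], [-2.0]]) + rng.normal(size=(n, 1))

b_sim = np.linalg.solve(X_sim.T @ X_sim, X_sim.T @ y_sim)
u_sim = y_sim - X_sim @ b_sim          # residuals

# Each entry of X'u should be zero up to rounding error
print(np.allclose(X_sim.T @ u_sim, 0, atol=1e-8))
```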

### Standard Error of $\hat{\beta}$

In [62]:
# Error-variance estimate: sum of squared residuals / (n - k)
s2 = np.sum(u_h**2)/(len(u_h)-X.shape[1])

In [64]:
import pandas as pd
vcov = s2*np.linalg.inv(X.T@X)
vcov

array([[ 1.08556349e-02, -7.29352619e-04, -8.65548142e-05,
         2.92773346e-05],
       [-7.29352619e-04,  5.37277752e-05,  3.96431088e-06,
        -2.56171608e-06],
       [-8.65548142e-05,  3.96431088e-06,  2.96968431e-06,
        -2.70017814e-06],
       [ 2.92773346e-05, -2.56171608e-06, -2.70017814e-06,
         9.57066532e-06]])

In [66]:
var = np.diagonal(vcov)
var

array([1.08556349e-02, 5.37277752e-05, 2.96968431e-06, 9.57066532e-06])

In [67]:
np.sqrt(var)

array([0.10419038, 0.00732992, 0.00172328, 0.00309365])
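
Putting the steps together, the whole calculation fits in one small function. This is a sketch, not the statsmodels implementation; the name `ols_matrix` and the simulated data are hypothetical. It uses $s^2 = \hat{u}'\hat{u}/(n-k)$ and $\widehat{\mathrm{Var}}(\hat{\beta}) = s^2 (X'X)^{-1}$, as in the cells above.

```python
import numpy as np

def ols_matrix(X, y):
    """Return OLS coefficients and standard errors (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                 # coefficient estimates
    u = y - X @ b                         # residuals
    s2 = (u.T @ u).item() / (n - k)       # error-variance estimate
    se = np.sqrt(np.diagonal(s2 * XtX_inv))
    return b.ravel(), se

# Simulated example data (assumed values, for illustration only)
rng = np.random.default_rng(2)
X_sim = np.column_stack([np.ones(50), rng.normal(size=50)])
y_sim = X_sim @ np.array([1.0, 3.0]) + rng.normal(size=50)

b_sim, se_sim = ols_matrix(X_sim, y_sim)
print(b_sim, se_sim)
```

With the patsy matrices from this section, `ols_matrix(X, y)` should reproduce the values in `res.params` and `res.bse`.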
