# Topics in Econometrics and Data Science: Tutorial 3

#### General Note

You will very likely find the solution to these exercises online. We, however, strongly encourage you to work on these exercises without doing so. Understanding someone else's solution is very different from coming up with your own. Use the lecture notes and try to solve the exercises independently.

## Exercise 1: Linear Algebra with `NumPy` 

Make sure that you have installed the module *numpy*, then import it with

In [2]:
import numpy as np

Let us consider the matrix $X$

In [3]:
X=np.matrix([[2,3,4],[9,33,5],[3,12,35],[23,16,2]])
print(X)

[[ 2  3  4]
 [ 9 33  5]
 [ 3 12 35]
 [23 16  2]]


Many matrix methods can be applied column-wise or row-wise. For example, one can calculate the column sums by

In [4]:
print(X.sum(0))

[[37 64 46]]
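
The `axis` argument controls the direction of the reduction: `axis=0` collapses the rows and returns one value per column, while `axis=1` collapses the columns and returns one value per row. A minimal sketch with a plain `np.array` holding the same numbers as $X$ (the names `X_arr`, `col_sums`, `row_sums` and `col_max` are introduced here only for illustration):

```python
import numpy as np

# Same entries as X above, stored as a plain ndarray (illustrative name)
X_arr = np.array([[2, 3, 4], [9, 33, 5], [3, 12, 35], [23, 16, 2]])

col_sums = X_arr.sum(axis=0)  # one sum per column
row_sums = X_arr.sum(axis=1)  # one sum per row
col_max = X_arr.max(axis=0)   # largest entry in each column

print(col_sums)
print(row_sums)
print(col_max)
```

Other reductions such as `mean`, `min` and `std` accept the same `axis` argument.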


Enter the following arrays in Python (matrices and vectors):

$A = \begin{bmatrix}
1 & 2 & 3\\ 
4 & 5 & 6\\ 
7 & 8 & 9
\end{bmatrix}$

$x = \begin{bmatrix}
1\\ 
3\\ 
4
\end{bmatrix}$

$B = \begin{bmatrix}
-0.1 & -0.2 & -0.3\\ 
3 & 10 & 2\\ 
4 & 2 & 0.5
\end{bmatrix} $

a) Calculate the row-sums of matrix $A$

b) Calculate the mean of each column in matrix $B$

Perform the following operations:

c) $A \bullet B$ (matrix multiplication)

d) $A \bullet x$

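e) $A \circ B$ (element-wise multiplication) Hint: convert the matrices with `np.array()`, because `*` applied to `np.matrix` objects performs matrix multiplication.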

In [5]:
A=np.matrix([[1,2,3],[4,5,6],[7,8,9]])
B=np.matrix([[-0.1,-0.2,-0.3],[3,10,2],[4,2,0.5]])
x=np.matrix([[1],[3],[4]])

In [6]:
# a)
A.sum(axis = 1)

matrix([[ 6],
        [15],
        [24]])

In [7]:
# b)
B.mean(axis = 0)

matrix([[2.3       , 3.93333333, 0.73333333]])

In [7]:
# c)
A*B

matrix([[17.9, 25.8,  5.2],
        [38.6, 61.2, 11.8],
        [59.3, 96.6, 18.4]])

In [8]:
# d)
A*x

matrix([[19],
        [43],
        [67]])

In [None]:
# e)
#A=np.array(np.matrix([[1,2,3],[4,5,6],[7,8,9]]))
#B=np.array(np.matrix([[-0.1,-0.2,-0.3],[3,10,2],[4,2,0.5]]))

A = np.array(A)
B = np.array(B)
A*B

array([[-0.1, -0.4, -0.9],
       [12. , 50. , 12. ],
       [28. , 16. ,  4.5]])

Alternative solution using `np.array` instead of `np.matrix`:

In [None]:
# np.matrix is no longer recommended; with plain arrays, use dot() for matrix multiplication
X=np.array([[2,3,4],[9,33,5],[3,12,35],[23,16,2]])
A=np.array([[1,2,3],[4,5,6],[7,8,9]])
B=np.array([[-0.1,-0.2,-0.3],[3,10,2],[4,2,0.5]])
x=np.array([[1],[3],[4]])

# a)
print(A.sum(1))

# b)
print(B.mean(0))

# c)
print(A.dot(B))

# d)
print(A.dot(x))

[ 6 15 24]
[2.3        3.93333333 0.73333333]
[[17.9 25.8  5.2]
 [38.6 61.2 11.8]
 [59.3 96.6 18.4]]
[[19]
 [43]
 [67]]
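
For plain arrays, the `@` operator is another way to write the matrix product, and `*` stays element-wise. The sketch below repeats parts c) to e) under that convention; the variable names `AB`, `Ax` and `A_had_B` are chosen here for illustration only.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[-0.1, -0.2, -0.3], [3, 10, 2], [4, 2, 0.5]])
x = np.array([[1], [3], [4]])

AB = A @ B        # matrix product, same result as A.dot(B)
Ax = A @ x        # matrix-vector product, shape (3, 1)
A_had_B = A * B   # element-wise (Hadamard) product for arrays

print(AB)
print(Ax)
print(A_had_B)
```

Working with arrays throughout avoids having to convert between `np.matrix` and `np.array` depending on which kind of product is needed.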


## Exercise 2: Working with `NumPy` Arrays 

Over the last year, your monthly cell-phone bill varied as follows:

$$ 22\ 31\ 18\ 35\ 29 \ 32\ 19\ 23\ 23\ 25\ 20\ 33$$

* What is the largest amount you spent in a month? What is the smallest? 

* Find the total amount and the average amount per month that you spent in the last year.

* In how many months was the amount greater than \$30?

* Is it worth switching to a flat rate (\$25 per month) for the next year, assuming that your bills stay the same on average?

In [None]:
import numpy as np
bill = np.array([22,31,18,35,29,32,19,23,23,25,20,33])

print(np.max(bill))

print(np.min(bill))

print(np.sum(bill))

print(np.mean(bill))

print(bill > 30)
print(np.sum(bill > 30))

# Last year's total minus twelve months of the flat rate (positive: the flat rate is cheaper)
np.sum(bill)-25*12

35
18
310
25.833333333333332
[False  True False  True False  True False False False False False  True]
4


10
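
The comparison `bill > 30` returns a boolean array, which can also be used as a mask to select the matching entries. A small sketch of that idea, assuming the months are numbered 1 to 12 in the order given (the names `mask`, `months`, `amounts`, `flat_rate` and `savings` are illustrative):

```python
import numpy as np

bill = np.array([22, 31, 18, 35, 29, 32, 19, 23, 23, 25, 20, 33])

mask = bill > 30                    # True where the bill exceeded 30
months = np.flatnonzero(mask) + 1   # 1-based positions of those months
amounts = bill[mask]                # the corresponding bills

flat_rate = 25
savings = bill.sum() - flat_rate * len(bill)  # total paid minus flat-rate total

print(months)
print(amounts)
print(savings)
```

A positive value of `savings` means the flat rate would have been cheaper over the year, which matches the average bill lying above the flat-rate price.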

## Exercise 3: Linear Regression with `NumPy`
Linear regression is a basic concept in statistics. Consider the simple linear regression model
$$
y_i = \beta_0 + \beta_1X_i + \varepsilon_i 
$$
One can estimate the coefficient vector $\boldsymbol{\hat{\beta}} = (\hat{\beta}_0, \hat{\beta}_1)^T$ by evaluating the following matrix expression:
$$
\boldsymbol{\hat{\beta}}=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}
$$
The predicted values $\mathbf{\hat{y}}$ are calculated as:
$$
\mathbf{\hat{y}} = \mathbf{X}\boldsymbol{\hat{\beta}}
$$
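
As a brief reminder of where the estimator comes from (standard least-squares reasoning, stated here under the assumption that $\mathbf{X}^T\mathbf{X}$ is invertible): $\boldsymbol{\hat{\beta}}$ minimizes the sum of squared residuals,
$$
\boldsymbol{\hat{\beta}} = \arg\min_{\boldsymbol{\beta}} \; (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^T(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})
$$
Setting the gradient with respect to $\boldsymbol{\beta}$ to zero gives the normal equations, from which the formula above follows:
$$
-2\,\mathbf{X}^T(\mathbf{y}-\mathbf{X}\boldsymbol{\hat{\beta}}) = \mathbf{0}
\quad\Longrightarrow\quad
\mathbf{X}^T\mathbf{X}\,\boldsymbol{\hat{\beta}} = \mathbf{X}^T\mathbf{y}
$$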

Calculate $\boldsymbol{\hat{\beta}}$ and the residuals $\boldsymbol{\hat{\varepsilon}} = \mathbf{y} - \mathbf{\hat{y}}$ using `NumPy` for the following small dataset.


| Shoe Size ($X$) | Height ($y$) |
|----------------|-------------|
|  36             |  170         |
|  37             |  175         |
|  38             |  180         |
|  38             |  169         |
|  40             |  190         |


* **Hint**: To include an intercept $\hat{\beta}_0$ in the model, add a column of ones (created with `np.ones()`) to your independent variable $X$ to obtain the design matrix $\mathbf{X}$. To compute the inverse of a matrix, you can use `np.linalg.inv()`.

In [22]:
x_1 = np.array([36, 37, 38, 38, 40])
y = np.array([170, 175, 180, 169, 190])

X = np.vstack((np.ones(shape=len(x_1)), x_1)).T

beta_hat = np.matmul(np.matmul(np.linalg.inv(np.matmul(X.T, X)), X.T), y)
y_hat = np.matmul(X, beta_hat)
residuals = np.subtract(y, y_hat)

print(beta_hat[0]) # Intercept (beta_0 hat)
print(beta_hat[1]) # Slope (beta_1 hat)
print(residuals) # Residuals (y - y_hat)

-2.7500000000131024
4.749999999999746
[ 1.75  2.    2.25 -8.75  2.75]
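
The small deviations from $-2.75$ and $4.75$ in the output above are floating-point rounding from forming the inverse explicitly. As a cross-check, the sketch below computes the same estimate with `np.linalg.lstsq` and with `np.linalg.solve` applied to the normal equations; it is an alternative, not the method the exercise asks for, and the names `beta_lstsq` and `beta_solve` are illustrative.

```python
import numpy as np

x_1 = np.array([36, 37, 38, 38, 40])
y = np.array([170, 175, 180, 169, 190])

# Design matrix: a column of ones for the intercept, then the regressor
X = np.column_stack((np.ones(len(x_1)), x_1))

# Least-squares fit without forming the inverse explicitly
beta_lstsq, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# Same estimate from the normal equations, solved as a linear system
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

residuals = y - X @ beta_lstsq

print(beta_lstsq)
print(beta_solve)
print(residuals)
```

Because the model contains an intercept, the residuals sum to zero up to rounding, which is a quick sanity check on any of the three approaches.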
