# SWMAL Exercise


## Training a Linear Regressor I 

#### Qa Write a Python function that uses the closed-form to find $\bw^*$
In this assignment we write a function called `closed_form` that uses the normal equation to compute the closed-form solution $\bw^*$ for the given dataset.
![](figs/linear_reg.png) 
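
For reference, the normal equation used by `closed_form` can be written as below. This assumes $\mathbf{X}$ already contains a leading column of ones for the bias term. The symbols $\mathbf{X}$ and $\mathbf{y}$ are notation chosen here, not taken from the exercise text.

$$
\bw^* = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}
$$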

In [5]:
import numpy as np
from libitmal import utils as itmalutils

def GenerateData():
    X = np.array([[8.34044009e-01],[1.44064899e+00],[2.28749635e-04],[6.04665145e-01]])
    y = np.array([5.97396028, 7.24897834, 4.86609388, 3.51245674])
    return X, y

X, y = GenerateData()

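def closed_form(X, y):
    # Prepend a column of ones so that w[0] acts as the bias term
    Xw = np.c_[np.ones((X.shape[0], 1)), X]
    # Normal equation: w = (Xw^T Xw)^-1 Xw^T y
    w = np.linalg.inv(Xw.T.dot(Xw)).dot(Xw.T).dot(y)
    return w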
    
w = closed_form(X,y)


w_expected = np.array([4.046879011698, 1.880121487278])
itmalutils.PrintMatrix(w, label="w=", precision=12)
itmalutils.AssertInRange(w, w_expected, eps=1E-9)

print("OK")

w=[4.046879011698 1.880121487278]
OK
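
As a cross-check, the same weights can be obtained without forming an explicit inverse. The following is a minimal sketch, assuming NumPy's `np.linalg.lstsq` is an acceptable stand-in for the normal equation. The name `closed_form_lstsq` is hypothetical and not part of the exercise.

```python
import numpy as np

def closed_form_lstsq(X, y):
    # Hypothetical helper (not from the exercise): same bias-augmented design matrix
    Xw = np.c_[np.ones((X.shape[0], 1)), X]
    # Solve min ||Xw w - y||^2 directly instead of inverting Xw^T Xw
    w, *_ = np.linalg.lstsq(Xw, y, rcond=None)
    return w

# Same four data points as in GenerateData() above
X_demo = np.array([[8.34044009e-01], [1.44064899e+00], [2.28749635e-04], [6.04665145e-01]])
y_demo = np.array([5.97396028, 7.24897834, 4.86609388, 3.51245674])

w_lstsq = closed_form_lstsq(X_demo, y_demo)
print(w_lstsq)
```

On this small, well-conditioned dataset the result should agree with `w_expected` to within the tolerance used in the assertion above.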


#### Qb Find the limits of the least-square method
Computing a matrix inverse becomes computationally expensive as the dataset grows. For $M$ samples and $N$ features, forming $X^\top X$ takes on the order of $M N^2$ operations, and inverting the resulting $N \times N$ matrix takes on the order of $N^3$ with standard algorithms, so doubling the number of features makes the inversion roughly eight times more expensive.
The method also breaks down when $X^\top X$ is singular, since a singular matrix has no true inverse. When the matrix is only nearly singular, the inverse can still be computed, but it is numerically unstable and the resulting weights can be far off.

The code below shows what happens when the same `closed_form` function is applied to a larger generated dataset with 10000 samples and 20 features.

In [6]:
def GenerateData(M, N):
    print(f'GenerateData(M={M}, N={N})...')
    
    assert M>0
    assert N>0
    assert isinstance(M, int)
    assert isinstance(N, int)

    # NOTE: not always possible to invert a random matrix; 
    #       it becomes singular, hence a more elaborate choice 
    #       of values below (but still a hack): 
    X=2 * np.ones([M, N])
    for i in range(X.shape[0]):
        X[i,0]=i*4
    for j in range(X.shape[1]):
        X[0,j]=-j*4

    y=4 + 3*X + np.random.randn(M,1)
    y=y[:,0] # keep only the first column, i.e. y = 4 + 3*X[:,0] + noise
    
    return X, y

X, y = GenerateData(M=10000, N=20)

w = closed_form(X,y) 
itmalutils.PrintMatrix(w, label="w=", precision=12)

GenerateData(M=10000, N=20)...
w=[ 1666387.463221365           2.440139825613   364127.6658302946
   -1027466.0514942813    -355721.38808381493     54929.842366258126
    -379931.10987626715     82890.95449692146   -260862.73190535876
      10940.299227085743  -169174.511310305      433595.05637842236
    -130711.18592202297    275061.5565574387     -50454.64883847751
    -149800.38651725327    107994.27999219672     91549.56736019408
       4787.780045981091    37335.089646668166   -43486.088661212234]
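
The very large weights suggest that `Xw.T.dot(Xw)` is close to singular for this dataset. The following is a minimal sketch of how one might check this. It assumes a print-free copy of the generator (`make_data` is a hypothetical name), a smaller `M`, and a fixed seed so the run is repeatable.

```python
import numpy as np

def make_data(M, N, seed=0):
    # Hypothetical, print-free copy of GenerateData(M, N) with a fixed seed
    rng = np.random.default_rng(seed)
    X = 2 * np.ones([M, N])
    X[:, 0] = 4 * np.arange(M)    # first column: 0, 4, 8, ...
    X[0, :] = -4 * np.arange(N)   # first row: 0, -4, -8, ...
    y = 4 + 3 * X[:, 0] + rng.standard_normal(M)
    return X, y

X, y = make_data(M=200, N=20)
Xw = np.c_[np.ones((X.shape[0], 1)), X]

# Every feature column except the first equals the constant 2 outside row 0,
# so most columns are linear combinations of each other
rank = np.linalg.matrix_rank(Xw)
print("columns:", Xw.shape[1], "rank:", rank)

# lstsq returns the minimum-norm least-squares solution even when Xw is rank deficient
w, *_ = np.linalg.lstsq(Xw, y, rcond=None)
rms = np.sqrt(np.mean((Xw @ w - y) ** 2))
print("weight on first feature:", w[1], "RMS residual:", rms)
```

Under these assumptions the reported rank should be well below the number of columns. That would explain why `np.linalg.inv` returns unreliable values here. A least-squares solver that tolerates rank deficiency should still recover a weight near 3 for the first feature, which matches how `y` is generated.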
