# Part a): Ordinary Least Squares on the Franke function
In this notebook, we generate datasets by sampling the Franke function on the unit square $[0,1]\times[0,1]$, both with and without added noise. We then fit a polynomial to each dataset and evaluate how well it approximates the data.
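
For reference, the Franke function is a sum of four Gaussian-like bumps that is commonly used as a test surface for interpolation and regression. The following is a minimal sketch of the standard definition. The name `franke` and the additive Gaussian noise model are assumptions made for illustration; the project's own `FrankeFunction` in `src.data.generate_data` may add noise differently.

```python
import numpy as np

def franke(x, y, noise=0.0, seed=None):
    """Standard Franke function, optionally with additive Gaussian noise.

    Assumption: `noise` is the standard deviation of zero-mean Gaussian noise.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    term1 = 0.75 * np.exp(-0.25 * (9 * x - 2) ** 2 - 0.25 * (9 * y - 2) ** 2)
    term2 = 0.75 * np.exp(-((9 * x + 1) ** 2) / 49.0 - 0.1 * (9 * y + 1))
    term3 = 0.5 * np.exp(-0.25 * (9 * x - 7) ** 2 - 0.25 * (9 * y - 3) ** 2)
    term4 = -0.2 * np.exp(-((9 * x - 4) ** 2) - (9 * y - 7) ** 2)
    z = term1 + term2 + term3 + term4
    if noise > 0:
        # Hypothetical noise model: seeded generator for reproducibility
        rng = np.random.default_rng(seed)
        z = z + noise * rng.standard_normal(z.shape)
    return z
```

Passing the same `seed` twice reproduces the same noisy sample, which makes runs at different noise levels comparable.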

In [2]:
import os
import sys

import numpy as np

# Make the project root (the parent of the working directory) importable
sys.path.append(os.path.dirname(os.path.abspath('.')))

# Import local modules
from src.data.generate_data import FrankeFunction
from src.models.models import OLS
from src.features.polynomial import PolynomialFeatures
from src.evaluation.evaluation import mse, r_squared, cross_val_mse

We begin by creating the features. Here, we use the class $\texttt{PolynomialFeatures}$ in $\texttt{src.features.polynomial}$.

In [3]:
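# Sample a regular grid on [0, 1) x [0, 1) with spacing 0.05
x = np.arange(0, 1, 0.05)
y = np.arange(0, 1, 0.05)
x, y = np.meshgrid(x, y)
x, y = x.ravel(), y.ravel()  # flatten the grid to 1-D coordinate arrays

# Expand the (x, y) pairs into polynomial features
pf = PolynomialFeatures(5)
X = pf.fit_transform(np.c_[x, y])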
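
To make the feature construction concrete, here is a minimal sketch of a two-variable polynomial design matrix. It assumes the features are all monomials $x^i y^j$ with $i + j \le d$, including the constant term; the function name `polynomial_design_matrix` and the column ordering are hypothetical, and the local `PolynomialFeatures` class may order or scale its columns differently.

```python
import numpy as np

def polynomial_design_matrix(x, y, degree):
    """Columns are the monomials x**i * y**j for all i + j <= degree."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    columns = []
    for total in range(degree + 1):      # total degree of the monomial
        for j in range(total + 1):       # power of y; x gets the remainder
            columns.append(x ** (total - j) * y ** j)
    return np.column_stack(columns)

# A degree-5 polynomial in two variables has (5 + 1) * (5 + 2) / 2 = 21 terms
x_demo = np.array([0.0, 0.5, 1.0])
y_demo = np.array([0.2, 0.4, 0.6])
X_demo = polynomial_design_matrix(x_demo, y_demo, degree=5)
print(X_demo.shape)  # (3, 21)
```

Under this assumption, the model fitted to the grid has 21 coefficients, one per monomial.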

Now, we create the target values by sampling the Franke function. By varying the noise term, we obtain different datasets.

In [4]:
# Compute mse for varying noise:
ols = OLS()
noises = np.arange(0, 3.0, 0.2)
print('%-10s%-10s%-10s' %('Noise', 'MSE', 'R squared'))
for noise in noises:
    z = FrankeFunction(x, y, noise=noise, seed=43)
    ols.fit(X, z)
    predictions = ols.predict(X)
    mean_squared_error = mse(z, predictions)
    r_s = r_squared(z, predictions)
    print('%-10.2f%-10.3f%-10.8f' %(noise, mean_squared_error, r_s))

Noise     MSE       R squared 
0.00      0.002     0.99993004
0.20      0.038     0.99919371
0.40      0.152     0.99842182
0.60      0.343     0.99805820
0.80      0.610     0.99788163
1.00      0.955     0.99778555
1.20      1.377     0.99772790
1.40      1.877     0.99769059
1.60      2.453     0.99766500
1.80      3.107     0.99764662
2.00      3.838     0.99763293
2.20      4.646     0.99762243
2.40      5.531     0.99761417
2.60      6.493     0.99760753
2.80      7.533     0.99760211


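We see that the mean squared error grows with the noise level, roughly in proportion to its square (0.038 at noise 0.2, 0.955 at noise 1.0), while the R squared statistic stays close to 1, dropping only from 0.9999 to 0.9976. Both statistics are computed on the same data the model was fitted to, so they measure training error.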
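
The two statistics reported above are usually defined as $\mathrm{MSE} = \frac{1}{n}\sum_i (z_i - \hat{z}_i)^2$ and $R^2 = 1 - \sum_i (z_i - \hat{z}_i)^2 / \sum_i (z_i - \bar{z})^2$. The sketch below shows one way to compute an ordinary least squares fit and both statistics with NumPy. The function names are hypothetical, and the local `OLS`, `mse` and `r_squared` implementations are assumed, not confirmed, to follow these textbook definitions.

```python
import numpy as np

def ols_fit(X, z):
    """Least-squares coefficients minimising ||X @ beta - z||^2."""
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta

def mean_squared_error(z, z_pred):
    """Average squared residual."""
    return np.mean((z - z_pred) ** 2)

def r2_score(z, z_pred):
    """One minus the ratio of residual to total sum of squares."""
    ss_res = np.sum((z - z_pred) ** 2)
    ss_tot = np.sum((z - np.mean(z)) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy check on noise-free linear data: z = 1 + 2 * u
rng = np.random.default_rng(0)
X_toy = np.c_[np.ones(50), rng.uniform(size=50)]
z_toy = X_toy @ np.array([1.0, 2.0])
beta = ols_fit(X_toy, z_toy)
z_fit = X_toy @ beta
```

`np.linalg.lstsq` is used here instead of explicitly inverting $X^T X$ because it remains stable when the polynomial columns are nearly collinear.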

In [6]:
# Cross-validate against the sampled targets z (not the grid coordinate y)
z = FrankeFunction(x, y, noise=0.3, seed=43)
cross_val_mse(X, z, 10)

[{'train_indices': array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  44,  45,  46,  47,  48,  51,  53,  54,  55,
        56,  57,  58,  59,  60,  61,  63,  64,  65,  67,  68,  69,  70,
        71,  72,  73,  74,  76,  77,  78,  79,  80,  81,  82,  83,  84,
        85,  86,  87,  89,  90,  91,  92,  93,  94,  95,  96,  97,  98,
        99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
       112, 113, 114, 116, 118, 119, 120, 121, 122, 124, 125, 126, 127,
       128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 140, 141,
       142, 143, 144, 145, 146, 147, 148, 149, 151, 152, 153, 154, 155,
       156, 157, 158, 159, 160, 162, 163, 164, 165, 166, 167, 168, 169,
       170, 171, 172, 173, 174, 175, 177, 178, 179, 180, 181, 182, 183,
       184, 186, 187, 188, 189, 190, 192, 193