# Part a): Ordinary Least Squares on the Franke function
In this notebook, we generate a dataset by sampling the Franke function on the rectangle $[0,1]\times[0,1]$, both with and without added noise. We then fit a polynomial to each of these datasets and evaluate how well it approximates the data.
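
The generation script itself is not part of this notebook, so the following is only a minimal sketch of how such a dataset could be produced. It uses the standard definition of the Franke function; the helper name `sample_franke`, the 20 x 20 grid (inferred from the 0.05 spacing and 400 rows in the table further down), and the seed are assumptions, not the project's actual code.

```python
import numpy as np

def franke(x, y):
    """Franke's bivariate test function (standard definition)."""
    term1 = 0.75 * np.exp(-((9 * x - 2) ** 2) / 4 - ((9 * y - 2) ** 2) / 4)
    term2 = 0.75 * np.exp(-((9 * x + 1) ** 2) / 49 - (9 * y + 1) / 10)
    term3 = 0.5 * np.exp(-((9 * x - 7) ** 2) / 4 - ((9 * y - 3) ** 2) / 4)
    term4 = -0.2 * np.exp(-((9 * x - 4) ** 2) - (9 * y - 7) ** 2)
    return term1 + term2 + term3 + term4

def sample_franke(n=20, sigma=0.0, seed=0):
    """Sample the Franke function on an n x n grid (hypothetical helper).

    sigma is the standard deviation of the added Gaussian noise;
    sigma=0.0 gives the noise-free surface.
    """
    rng = np.random.default_rng(seed)
    grid = np.arange(n) / n            # 0, 1/n, ..., (n-1)/n
    x, y = np.meshgrid(grid, grid)     # x varies fastest after ravel()
    x, y = x.ravel(), y.ravel()
    z = franke(x, y) + sigma * rng.normal(size=x.shape)
    return x, y, z
```

Calling `sample_franke(sigma=0.1)` or `sample_franke(sigma=0.9)` would correspond to the noise levels named in the targets below, assuming the noise there is Gaussian.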

In [22]:
import numpy as np
import os
import sys

# Make the project root importable so that the local `src` package can be found
sys.path.append(os.path.dirname(os.path.abspath('.')))
import pandas as pd

# Import local modules
from src.models.models import OLS
from src.evaluation.evaluation import mse, r_squared

In [57]:
df_X = pd.read_csv('../data/generated/X.csv', index_col=0)
df_z_no_noise = pd.read_csv('../data/generated/no_noise.csv', usecols=[1])
df_z_some_noise = pd.read_csv('../data/generated/some_noise.csv', usecols=[1])
df_z_noisy = pd.read_csv('../data/generated/noisy.csv', usecols=[1])

X = np.array(df_X)
z_no_noise = np.array(df_z_no_noise).ravel()
z_some_noise = np.array(df_z_some_noise).ravel()
z_noisy = np.array(df_z_noisy).ravel()

df_X.tail()

Unnamed: 0,1,x,y,x*x,x*y,y*y,x*x*x,x*x*y,x*y*y,y*y*y,...,x*x*x*y,x*x*y*y,x*y*y*y,y*y*y*y,x*x*x*x*x,x*x*x*x*y,x*x*x*y*y,x*x*y*y*y,x*y*y*y*y,y*y*y*y*y
395,1.0,0.75,0.95,0.5625,0.7125,0.9025,0.421875,0.534375,0.676875,0.857375,...,0.400781,0.507656,0.643031,0.814506,0.237305,0.300586,0.380742,0.482273,0.61088,0.773781
396,1.0,0.8,0.95,0.64,0.76,0.9025,0.512,0.608,0.722,0.857375,...,0.4864,0.5776,0.6859,0.814506,0.32768,0.38912,0.46208,0.54872,0.651605,0.773781
397,1.0,0.85,0.95,0.7225,0.8075,0.9025,0.614125,0.686375,0.767125,0.857375,...,0.583419,0.652056,0.728769,0.814506,0.443705,0.495906,0.554248,0.619453,0.69233,0.773781
398,1.0,0.9,0.95,0.81,0.855,0.9025,0.729,0.7695,0.81225,0.857375,...,0.69255,0.731025,0.771638,0.814506,0.59049,0.623295,0.657923,0.694474,0.733056,0.773781
399,1.0,0.95,0.95,0.9025,0.9025,0.9025,0.857375,0.857375,0.857375,0.857375,...,0.814506,0.814506,0.814506,0.814506,0.773781,0.773781,0.773781,0.773781,0.773781,0.773781
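
The column names above suggest that `X` holds every monomial $x^i y^j$ with $i + j \le 5$, ordered by total degree. As a rough sketch of how such a design matrix could be built (the function name `design_matrix` is hypothetical, and the project's own generation code may differ):

```python
import numpy as np

def design_matrix(x, y, degree=5):
    """Stack the monomials x^i * y^j with i + j <= degree as columns.

    Column order follows the table above: 1, x, y, x*x, x*y, y*y, ...
    """
    columns = []
    for total in range(degree + 1):
        for j in range(total + 1):
            columns.append(x ** (total - j) * y ** j)
    return np.column_stack(columns)
```

For degree 5 this gives 21 columns, one per monomial.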


We will now compute the MSE, the $R^2$ score, and the variance of the estimated coefficients $\hat{\beta}$ for each of the three targets.
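
The local `mse` and `r_squared` functions are imported from `src.evaluation.evaluation` and are not shown here. The sketch below gives the usual definitions of these quantities, together with the textbook variance estimate $\widehat{\mathrm{Var}}(\hat{\beta}_j) = \hat{\sigma}^2 \left[(X^T X)^{-1}\right]_{jj}$. The `_sketch` names are hypothetical, and the use of the unbiased estimate $\hat{\sigma}^2 = \mathrm{RSS}/(n - p)$ is an assumption that may differ from what the project does.

```python
import numpy as np

def mse_sketch(y_true, y_pred):
    """Mean squared error: the average squared residual."""
    return np.mean((y_true - y_pred) ** 2)

def r_squared_sketch(y_true, y_pred):
    """R^2 = 1 - RSS / TSS."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

def beta_variance_sketch(X, y_true, y_pred):
    """Estimated variance of each OLS coefficient.

    Uses sigma^2 ~ RSS / (n - p), an assumed choice, and the
    pseudo-inverse for robustness when X^T X is ill-conditioned.
    """
    n, p = X.shape
    sigma2 = np.sum((y_true - y_pred) ** 2) / (n - p)
    return sigma2 * np.diag(np.linalg.pinv(X.T @ X))
```

Taking the square root of the returned variances gives the standard errors of the coefficients.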

In [56]:
targets = [{
    'name': 'No noise',
    'values': z_no_noise
},
{
    'name': 'Some noise (sigma 0.1)',
    'values': z_some_noise
},
{
    'name': 'Noisy (sigma 0.9)',
    'values': z_noisy
}]

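for target in targets:
    # Fit an OLS model to this target and predict on the training data
    ol = OLS()
    ol.fit(X, target['values'])
    predictions = ol.predict(X)

    mse_value = mse(target['values'], predictions)
    r_2_value = r_squared(target['values'], predictions)

    # Var(beta_j) = sigma^2 * [(X^T X)^{-1}]_jj, using the MSE as the estimate of sigma^2.
    # No square root here: that would give neither the variance nor the standard error.
    var_beta = mse_value * np.diag(np.linalg.inv(X.T @ X))

    print(f"{target['name']}: MSE = {mse_value:.4f}, R^2 = {r_2_value:.4f}")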

We see how an increase in noise increases the mean squared error, while the $R^2$ statistic remains fairly close to 1.