# Generating data from the Franke function
In this notebook, we generate datasets by sampling the Franke function on the unit square $[0,1]\times[0,1]$, with and without added noise. Varying the noise level gives different target values for the same inputs, which lets us study how our regression methods cope with noise.

In [9]:
import os
import sys

import numpy as np
import pandas as pd

# Make the project root (the parent of the current directory) importable
sys.path.append(os.path.dirname(os.path.abspath('.')))

# Import local modules
from src.data.generate_data import FrankeFunction
from src.features.polynomial import PolynomialFeatures

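We begin by creating the features and collecting them in a matrix $X$. Here, we use the class $\texttt{PolynomialFeatures}$ in $\texttt{src.features.polynomial}$ with degree 5. Note that $\texttt{np.arange(0, 1, 0.05)}$ excludes the endpoint, so the grid consists of $20 \times 20 = 400$ points with coordinates running from 0 to 0.95.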

In [10]:
x = np.arange(0, 1, 0.05)
y = np.arange(0, 1, 0.05)
x, y = np.meshgrid(x, y)
x, y = x.ravel(), y.ravel()
pf = PolynomialFeatures(5)
X = pf.fit_transform(np.c_[x, y], ['x', 'y'])  # np.c_ stacks the 1-D arrays as columns
names = pf.names
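
The implementation of $\texttt{PolynomialFeatures}$ is not shown in this notebook. As a rough illustration of what a degree-5 feature matrix in two variables might contain, here is a minimal sketch that assumes the class produces every monomial $x^i y^j$ with $i + j \le 5$, including the constant term. The helper `polynomial_design_matrix` and the column names it returns are hypothetical and only serve this example; the local class may order or name its columns differently.

```python
import numpy as np


def polynomial_design_matrix(x, y, degree):
    """Hypothetical helper: columns x**i * y**j for all i + j <= degree."""
    columns, names = [], []
    for total in range(degree + 1):
        for j in range(total + 1):
            i = total - j
            columns.append(x**i * y**j)
            names.append(f"x^{i} y^{j}")
    return np.column_stack(columns), names


# Same grid as in the notebook: the endpoint 1.0 is excluded by np.arange.
grid = np.arange(0, 1, 0.05)
xx, yy = np.meshgrid(grid, grid)
X_sketch, names_sketch = polynomial_design_matrix(xx.ravel(), yy.ravel(), degree=5)
```

Under this assumption, the matrix has one row per grid point and $(5+1)(5+2)/2 = 21$ columns.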

Now, we create the target values by sampling the Franke function. By varying the noise term (0, 0.1 and 0.9) while keeping the seed fixed, we obtain three different datasets.

In [11]:
z_no_noise = FrankeFunction(x, y, noise=0, seed=43)
z_some_noise = FrankeFunction(x, y, noise=0.1, seed=43)
z_noisy = FrankeFunction(x, y, noise=0.9, seed=43)
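
The local $\texttt{FrankeFunction}$ is imported from $\texttt{src.data.generate\_data}$ and its source is not reproduced here. The sketch below uses the standard definition of the Franke function and assumes that `noise` is the standard deviation of additive zero-mean Gaussian noise and that `seed` initialises the random number generator. Both are assumptions about the local module, and the name `franke_sketch` is hypothetical.

```python
import numpy as np


def franke_sketch(x, y, noise=0.0, seed=None):
    """Standard Franke function plus optional Gaussian noise (hypothetical helper)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    term1 = 0.75 * np.exp(-((9 * x - 2) ** 2) / 4 - ((9 * y - 2) ** 2) / 4)
    term2 = 0.75 * np.exp(-((9 * x + 1) ** 2) / 49 - (9 * y + 1) / 10)
    term3 = 0.5 * np.exp(-((9 * x - 7) ** 2) / 4 - ((9 * y - 3) ** 2) / 4)
    term4 = -0.2 * np.exp(-((9 * x - 4) ** 2) - (9 * y - 7) ** 2)
    z = term1 + term2 + term3 + term4
    if noise > 0:
        # Assumption: `noise` scales standard normal draws from a seeded generator.
        rng = np.random.default_rng(seed)
        z = z + noise * rng.standard_normal(z.shape)
    return z
```

With a fixed seed, repeated calls at the same noise level return identical targets, which is what makes the saved datasets reproducible.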

Finally, we create $\texttt{.csv}$ files containing the three datasets. The data could easily be generated on the fly as needed, but we save them to file at this point to ensure that all subsequent work uses exactly these three datasets.

In [12]:
df_X = pd.DataFrame(X, columns=names)
df_z_no_noise = pd.DataFrame(z_no_noise.reshape((-1, 1)))
df_z_some_noise = pd.DataFrame(z_some_noise.reshape((-1, 1)))
df_z_noisy = pd.DataFrame(z_noisy.reshape((-1, 1)))
df_X.to_csv('../data/generated/X.csv')
df_z_no_noise.to_csv('../data/generated/no_noise.csv')
df_z_some_noise.to_csv('../data/generated/some_noise.csv')
df_z_noisy.to_csv('../data/generated/noisy.csv')
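
Because $\texttt{to\_csv}$ writes the row index as the first column by default, later notebooks need to account for it when loading the files. The following is a minimal sketch of the round trip using stand-in data and a temporary directory instead of the real files; the variable names and column labels are illustrative only.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in data; in the notebook these would be X, names and z_no_noise.
rng = np.random.default_rng(0)
X_demo = rng.random((5, 3))
z_demo = rng.random(5)

with tempfile.TemporaryDirectory() as tmp:
    x_path = os.path.join(tmp, "X.csv")
    z_path = os.path.join(tmp, "no_noise.csv")

    pd.DataFrame(X_demo, columns=["1", "x", "y"]).to_csv(x_path)
    pd.DataFrame(z_demo.reshape((-1, 1))).to_csv(z_path)

    # index_col=0 consumes the row index that to_csv wrote as the first column.
    X_back = pd.read_csv(x_path, index_col=0).to_numpy()
    z_back = pd.read_csv(z_path, index_col=0).to_numpy().ravel()
```

Without `index_col=0`, the index would be read back as an extra feature column.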