#### General guidance

This notebook is a template that guides you through the implementation of this task. Read the whole
template first to get a sense of the overall code structure before filling in any of the TODO gaps.
This is the Jupyter notebook version of the template. For the Python file version, see `template_solution.py`.

First, we import necessary libraries:

In [11]:
import numpy as np
import pandas as pd

# Add any additional imports here (however, the task is solvable without using 
# any additional imports)
# import ...
import os

import sklearn
import sklearn.model_selection
import sklearn.linear_model
import sklearn.metrics  # needed for the optional hold-out RMSE in the fit cell

#### Loading data

In [12]:
# Pull the data if it is not already present
DATA_PATH = 'data'
if not os.path.exists(DATA_PATH):
    !bash pull_data.sh
else:
    print("Data already fetched!")

Data already fetched!


In [13]:
data = pd.read_csv("data/train.csv")
y = data["y"].to_numpy()
data = data.drop(columns=["Id", "y"])
# print a few data samples
print(data.head())
X = data.to_numpy()

     x1    x2    x3    x4    x5
0  0.02  0.05 -0.09 -0.43 -0.08
1 -0.13  0.11 -0.08 -0.29 -0.03
2  0.08  0.06 -0.07 -0.41 -0.03
3  0.02 -0.12  0.01 -0.43 -0.02
4 -0.14 -0.12 -0.08 -0.02 -0.08


#### Transform data

In [14]:
"""
Transform the 5 input features of matrix X (x_i denoting the i-th component of X) 
into 21 new features phi(X) in the following manner:
5 linear features: phi_1(X) = x_1, phi_2(X) = x_2, phi_3(X) = x_3, phi_4(X) = x_4, phi_5(X) = x_5
5 quadratic features: phi_6(X) = x_1^2, phi_7(X) = x_2^2, phi_8(X) = x_3^2, phi_9(X) = x_4^2, phi_10(X) = x_5^2
5 exponential features: phi_11(X) = exp(x_1), phi_12(X) = exp(x_2), phi_13(X) = exp(x_3), phi_14(X) = exp(x_4), phi_15(X) = exp(x_5)
5 cosine features: phi_16(X) = cos(x_1), phi_17(X) = cos(x_2), phi_18(X) = cos(x_3), phi_19(X) = cos(x_4), phi_20(X) = cos(x_5)
1 constant feature: phi_21(X)=1

Parameters
----------
X: matrix of floats, dim = (700,5), inputs with 5 features

Compute
----------
X_transformed: array of floats: dim = (700,21), transformed input with 21 features
"""

X_transformed = np.zeros((700, 21))

X_transformed[:, 0] = X[:, 0] # x1
X_transformed[:, 1] = X[:, 1] # x2
X_transformed[:, 2] = X[:, 2] # x3
X_transformed[:, 3] = X[:, 3] # x4
X_transformed[:, 4] = X[:, 4] # x5

X_transformed[:, 5] = X[:, 0]**2 # x1**2
X_transformed[:, 6] = X[:, 1]**2 # x2**2
X_transformed[:, 7] = X[:, 2]**2 # x3**2
X_transformed[:, 8] = X[:, 3]**2 # x4**2
X_transformed[:, 9] = X[:, 4]**2 # x5**2

X_transformed[:, 10] = np.exp(X[:, 0]) # exp(x1)
X_transformed[:, 11] = np.exp(X[:, 1]) # exp(x2)
X_transformed[:, 12] = np.exp(X[:, 2]) # exp(x3)
X_transformed[:, 13] = np.exp(X[:, 3]) # exp(x4)
X_transformed[:, 14] = np.exp(X[:, 4]) # exp(x5)

X_transformed[:, 15] = np.cos(X[:, 0]) # cos(x1)
X_transformed[:, 16] = np.cos(X[:, 1]) # cos(x2)
X_transformed[:, 17] = np.cos(X[:, 2]) # cos(x3)
X_transformed[:, 18] = np.cos(X[:, 3]) # cos(x4)
X_transformed[:, 19] = np.cos(X[:, 4]) # cos(x5)

X_transformed[:, 20] = 1 # bias

display(X[0:5, :])
display(X_transformed[0:5, :])

assert X_transformed.shape == (700, 21)

array([[ 0.02,  0.05, -0.09, -0.43, -0.08],
       [-0.13,  0.11, -0.08, -0.29, -0.03],
       [ 0.08,  0.06, -0.07, -0.41, -0.03],
       [ 0.02, -0.12,  0.01, -0.43, -0.02],
       [-0.14, -0.12, -0.08, -0.02, -0.08]])

array([[ 2.00000000e-02,  5.00000000e-02, -9.00000000e-02,
        -4.30000000e-01, -8.00000000e-02,  4.00000000e-04,
         2.50000000e-03,  8.10000000e-03,  1.84900000e-01,
         6.40000000e-03,  1.02020134e+00,  1.05127110e+00,
         9.13931185e-01,  6.50509095e-01,  9.23116346e-01,
         9.99800007e-01,  9.98750260e-01,  9.95952733e-01,
         9.08965750e-01,  9.96801706e-01,  1.00000000e+00],
       [-1.30000000e-01,  1.10000000e-01, -8.00000000e-02,
        -2.90000000e-01, -3.00000000e-02,  1.69000000e-02,
         1.21000000e-02,  6.40000000e-03,  8.41000000e-02,
         9.00000000e-04,  8.78095431e-01,  1.11627807e+00,
         9.23116346e-01,  7.48263568e-01,  9.70445534e-01,
         9.91561894e-01,  9.93956098e-01,  9.96801706e-01,
         9.58243876e-01,  9.99550034e-01,  1.00000000e+00],
       [ 8.00000000e-02,  6.00000000e-02, -7.00000000e-02,
        -4.10000000e-01, -3.00000000e-02,  6.40000000e-03,
         3.60000000e-03,  4.90000000e-03,  1.68100000e
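
The column-by-column assignment above can also be written as one stacked expression. The following is a minimal sketch, not part of the original template: the helper name `transform_features` and the two-row `X_demo` array (copied from the `data.head()` output) are illustrative assumptions. It avoids hard-coding the row count of 700.

```python
import numpy as np

def transform_features(X):
    """Map an (n, 5) input matrix to the (n, 21) feature matrix phi(X)."""
    X = np.asarray(X, dtype=float)
    return np.hstack([
        X,                         # phi_1..phi_5: linear features
        X ** 2,                    # phi_6..phi_10: quadratic features
        np.exp(X),                 # phi_11..phi_15: exponential features
        np.cos(X),                 # phi_16..phi_20: cosine features
        np.ones((X.shape[0], 1)),  # phi_21: constant feature
    ])

# First two rows of the training inputs, as printed by data.head()
X_demo = np.array([[0.02, 0.05, -0.09, -0.43, -0.08],
                   [-0.13, 0.11, -0.08, -0.29, -0.03]])
Phi_demo = transform_features(X_demo)
print(Phi_demo.shape)  # (2, 21)
```

The column order matches the docstring, so `Phi_demo[0, 5]` is `x_1^2 = 0.0004` and the last column is all ones.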

#### Fit data

In [15]:
"""
Use the transformed data points X_transformed and fit the linear regression on this 
transformed data. Finally, compute the weights of the fitted linear regression. 

Parameters
----------
X_transformed: array of floats: dim = (700,21), transformed input with 21 features
y: array of floats, dim = (700,), input labels

Compute
----------
w: array of floats: dim = (21,), optimal parameters of linear regression
"""

train_size = 1

if train_size < 1:
    # Do test split for good measure
    X_train, X_test, Y_train, Y_test = sklearn.model_selection.train_test_split(X_transformed, y, train_size=train_size, random_state=69)
else:
    X_train = X_transformed
    Y_train = y
    X_test = None


# Train model (no separate intercept: the constant feature phi_21 already plays that role)
model = sklearn.linear_model.RidgeCV(fit_intercept=False, alphas=np.logspace(0, 10, 400))
model.fit(X_train, Y_train)

display(model.alpha_)

w = model.coef_
display(w)

assert w.shape == (21,)


if X_test is not None:
    # Validate model
    test_preds = model.predict(X_test)
    RMSE = sklearn.metrics.mean_squared_error(Y_test, test_preds)**0.5

    print(f"RMSE: {RMSE}")



10.655379505623053

array([ 0.12990164, -0.30460059, -0.44676229,  0.21947168,  0.08323926,
       -0.15266684,  0.08257318,  0.08440959, -0.1149516 ,  0.03158846,
       -0.51283915, -0.8291457 , -0.97264835, -0.39995951, -0.46878198,
       -0.48806552, -0.60496836, -0.60584511, -0.50758863, -0.57900864,
       -0.56391809])
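
For intuition about what the ridge fit computes: with `fit_intercept=False`, ridge regression minimizes `||y - Phi w||^2 + alpha * ||w||^2`, whose minimizer is `w = (Phi^T Phi + alpha I)^{-1} Phi^T y`. The sketch below is an illustration on synthetic data, not the notebook's actual fit; `ridge_closed_form`, `Phi_toy`, and `y_toy` are hypothetical names, and `alpha` is fixed by hand rather than selected by cross-validation as `RidgeCV` does.

```python
import numpy as np

def ridge_closed_form(Phi, y, alpha):
    """Solve (Phi^T Phi + alpha * I) w = Phi^T y for w (no separate intercept)."""
    n_features = Phi.shape[1]
    A = Phi.T @ Phi + alpha * np.eye(n_features)
    return np.linalg.solve(A, Phi.T @ y)

# Synthetic stand-in data (assumption: not the task's train.csv)
rng = np.random.default_rng(0)
Phi_toy = rng.normal(size=(50, 4))
y_toy = Phi_toy @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=50)

alpha = 10.0
w_toy = ridge_closed_form(Phi_toy, y_toy, alpha)

# At the minimizer, the gradient of the ridge objective (up to a factor 2) vanishes
grad = Phi_toy.T @ (Phi_toy @ w_toy - y_toy) + alpha * w_toy
print(np.allclose(grad, 0.0, atol=1e-8))
```

Because the intercept is folded into the features, the penalty also shrinks the weight of the constant feature, which is one reason the fitted weights above need not resemble an unregularized least-squares solution.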

#### Generate output files

In [16]:
# Save results in the required format
np.savetxt("./output.csv", w, fmt="%.12f")
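
A quick way to sanity-check the saved file is to read it back and confirm that it holds one value per line. This is a minimal sketch under assumptions: `w_demo` is a hypothetical stand-in for the fitted weights, and the file is written to a temporary directory so the real `output.csv` is left untouched.

```python
import os
import tempfile

import numpy as np

w_demo = np.linspace(-1.0, 1.0, 21)  # hypothetical stand-in for the 21 weights

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output.csv")
    np.savetxt(path, w_demo, fmt="%.12f")  # same format string as the cell above
    w_loaded = np.loadtxt(path)
    with open(path) as fh:
        n_lines = sum(1 for _ in fh)

print(w_loaded.shape, n_lines)  # (21,) 21
```

With `fmt="%.12f"` each weight is rounded to 12 decimal places, so the reloaded values agree with the originals only up to that precision.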

In [34]:
## end of task ##

!jupyter nbconvert --to python task.ipynb

import re  # Python regular-expression module
with open('task.py', 'r') as f_orig:
    script = re.sub(r'# In\[.*\]:\n','', f_orig.read())
    script = script.replace('## end of task ##',
"""
## Exit here, the rest is only used for creating this file
exit(0)
"""
    , 1)
    script = script.replace("get_ipython().system('bash pull_data.sh')",
"""# get_ipython().system('bash pull_data.sh')
    print("We are missing the data/ folder, please download the data manually and extract everything to data/.")
    exit(1)""", 1
)
with open('task.py','w') as fh:
    fh.write(script[:script.index("\n")])
    fh.write("""
   
## Note: This file was automatically generated from a Jupyter Notebook.

def display(X):
    print(X)

""")
    fh.write(script[script.index("\n"):])


[NbConvertApp] Converting notebook task.ipynb to python
[NbConvertApp] Writing 5251 bytes to task.py
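
To see what the post-processing does to the converted script, the sketch below applies the same `re.sub` pattern and the same first-line split to a small hand-written string. The `sample` text is an assumption that imitates the general shape of an nbconvert-generated script; it is not the actual content of `task.py`.

```python
import re

# Hand-written imitation of an nbconvert-generated script (assumption)
sample = (
    "#!/usr/bin/env python\n"
    "# coding: utf-8\n"
    "\n"
    "# In[11]:\n"
    "\n"
    "\n"
    "import numpy as np\n"
    "\n"
    "\n"
    "# In[ ]:\n"
    "\n"
    "\n"
    "print('done')\n"
)

# Same pattern as in the cell above: drop every "# In[...]:" cell marker line
cleaned = re.sub(r'# In\[.*\]:\n', '', sample)

# Same split as in the cell above: the first line is written before the injected
# header and display() helper, and the rest of the script follows after them
first_line = cleaned[:cleaned.index("\n")]
rest = cleaned[cleaned.index("\n"):]

print(first_line)  # #!/usr/bin/env python
```

Since `.` does not match newlines by default, each substitution stays within a single marker line.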
