# Model evaluation and improvement: Choice of polynomial degrees
M5U5 - Exercise 1

## What are we going to do?
- Transform features to apply polynomials of varying degrees
- Identify the suitable polynomial degree for each feature
- Identify when we suffer from bias (underfitting) or overfitting due to using a polynomial of the wrong degree

Remember to follow the instructions for the practice deliverables given at [Submission instructions](https://github.com/Tokio-School/Machine-Learning-EN/blob/main/Submission_instructions.md).

## Instructions
In many cases, predictor variables found in nature influence the target variable in a way that is not linear, but this non-linearity can be modelled by transforming the original data. Some of these effects, and therefore transformations, are polynomial, square root, logarithmic, etc.

E.g. sunlight, temperature, temporal effects with daily cycles, etc., have a polynomial effect on animals, plants, etc.

In this exercise we are going to see how to transform our data so that a linear model can be trained on originally non-linear data. Once the data is transformed, the problem can be solved with linear models such as linear regression or logistic regression.
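As a minimal sketch of this idea, assuming a hypothetical square-root relationship and made-up coefficients (2.0 and 3.0), the cell below fits ordinary least squares on the transformed predictor. The model remains linear in its coefficients even though *y* is not linear in *x*.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical non-linear relationship: y depends on the square root of x
x = rng.uniform(1.0, 100.0, size=200)
y = 2.0 + 3.0 * np.sqrt(x) + rng.normal(0.0, 0.1, size=200)

# Transform the predictor; the model stays linear in its coefficients
X_design = np.column_stack([np.ones_like(x), np.sqrt(x)])

# Ordinary least squares on the transformed feature
theta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(theta)  # should land near the coefficients used above, [2.0, 3.0]
```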

In [None]:
# TODO: Import all the necessary libraries in this cell

## Polynomial characteristics

Polynomial effects are among the most common. Find out more about polynomials and their degrees: [Polynomial](https://en.wikipedia.org/wiki/Polynomial).

E.g., a linear model with a single predictor *X* modelled by a polynomial of degree 3: $Y = \theta_0 + \theta_1 \times X + \theta_2 \times X^2 + \theta_3 \times X^3$

In this case, with a single predictor *X*, instead of using just that predictor we derive additional features from it by squaring and cubing it. We start with one feature and obtain 2 more from it.
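The degree-3 expansion above can be sketched with NumPy as follows. The four values of `x` and the coefficients in `theta` are hypothetical, chosen only for illustration.

```python
import numpy as np

# A single predictor with four illustrative values
x = np.array([1.0, 2.0, 3.0, 4.0])

# Derive the squared and cubed features and stack them next to x
X_poly = np.column_stack([x, x**2, x**3])

# Hypothetical coefficients theta_0..theta_3, chosen only for illustration
theta = np.array([1.0, 0.5, -2.0, 0.25])

# Y = theta_0 + theta_1 * X + theta_2 * X^2 + theta_3 * X^3
y = theta[0] + X_poly @ theta[1:]

print(X_poly.shape)  # (4, 3)
print(y)             # [ -0.25  -4.    -8.75 -13.  ]
```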

To identify these effects in our datasets, it is important to become familiar with the characteristic graphical form of the most common ones.

Plot multi-degree polynomials, play with their parameters and study their resulting characteristic forms:

In [None]:
# TODO: Plot multiple polynomial graphs

# Create an ndarray with a linear space of 100 points in the range [0, 100], to use as X, the predictor variable, and as the horizontal axis of the graph
x = [...]

# Create ndarrays with the transformations by raising said X to degrees 2 to 6
for degree in [...]:
    term = [...]    # Calculate the corresponding term by raising x to that degree
    # Concatenate that term to x as a new column, horizontally, using np.concatenate()
    [...]

# Plot these polynomials as dot-and-line plots, one series per degree in a different colour
# Add a grid, a title and a legend for the series

[...]

## Creating the dataset

Once the polynomial effects have been graphically explored, we are going to build a synthetic dataset with high degree polynomial effects, which we will have to solve by transforming our data and testing various polynomial degrees.

The process we are going to follow to generate the dataset, therefore, is the following:
1. Generate a dataset with 7 features, composed of a bias term, a pseudo-random *X*, and $X^2, X^3, ..., X^6$
1. Generate some pseudo-random $\Theta$ coefficients/weights
1. Complete the dataset by generating a Y from some of the features of *X* and $\Theta$, to a given degree
1. Add an error parameter or white/Gaussian noise to *Y*

To obtain *Y* we will not use all the $n + 1$ features of *X*, but only those up to a given degree, so that we can train several models using more or fewer features of *X* until we find the optimal polynomial degree, neither too high nor too low.

Once it is generated, our goal, as usual, will be to explore how we can transform our data to model an originally non-linear dataset with linear models, in order to obtain $\Theta$ and be able to generate new predictions with our model.

We generate a dataset with more features than those used to calculate *Y* so that we have the flexibility to use more or fewer of them later.
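The four steps above can be sketched at a smaller scale as follows. This is only an illustration under stated assumptions, not the expected solution: the names `X_demo`, `theta_demo`, `y_demo` and `k` are hypothetical, the powers stop at degree 4, *Y* uses terms up to degree 2, and the noise is interpreted as a Gaussian fraction `e` of each *Y* value.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 50

# 1. Pseudo-random X in [0, 1) plus its powers up to degree 4, as columns
x = rng.random((m, 1))
X_demo = np.concatenate([x**d for d in range(1, 5)], axis=1)

# Bias column of ones on the left
X_demo = np.concatenate([np.ones((m, 1)), X_demo], axis=1)

# 2. Pseudo-random coefficients, one per column (bias included)
theta_demo = rng.random(X_demo.shape[1])

# 3. Build Y from the bias and the first two powers only (degree 2)
k = 3
y_clean = X_demo[:, :k] @ theta_demo[:k]

# 4. Add Gaussian noise scaled as a fraction e of each Y value (assumed interpretation)
e = 0.1
y_demo = y_clean + y_clean * e * rng.normal(size=m)

print(X_demo.shape, y_demo.shape)  # (50, 5) (50,)
```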

Build on your manual dataset generation code (not Scikit-learn methods) from previous exercises:

In [None]:
# TODO: Create a dataset with polynomial effects up to degree 6
m = 100

# Generate an X of m pseudo-random values in the range [0, 1)
X_true = [...]

# Concatenate 5 new columns/features to X with the terms of the corresponding degrees ([2, 6])
for grade in [...]:
    term = [...]    # Calculate the corresponding term by raising X to that degree
    # Concatenate that term to X as a new column, horizontally, using np.concatenate()
    [...]

# Insert a column of ones to the left of X as the bias term
X_true = [...]

# What would be the n or number of features/dimensions of this dataset?
n = [...]

# Generate a 1D ndarray of pseudo-random true Theta values in [0, 1), of size (n + 1,)
Theta_true = [...]

# Calculate the Y corresponding to X and true Theta with the first 4 features of X, i.e. with a polynomial up to degree 3 (b + X + X^2 + X^3)
# Use the first 4 columns of X and true Theta
Y = [...]

# Add a white/Gaussian error term as a +/-e percentage added to Y
# Make sure to generate pseudo-random numbers from a normal or Gaussian distribution
e = 0.15

Y = Y + [...]

# Check the values and dimensions of X and Y
[...]

## Feature extraction

Once the base dataset is generated, we are going to generate a different training dataset. The reason is to simulate the steps we would follow in reality, starting from the same point: having only one predictor or feature for *X* and one *Y*. We would have to generate new transformed features ourselves, since we would not know which polynomial degree is correct for each feature (here we only consider one base predictor), nor even whether there is a polynomial effect at all.

We will generate an *X* iteratively: test one polynomial degree, check the results and try a different degree, until we find a transformation with which the trained model gives satisfactory results.

To do so, start from $X_{true1}$, i.e. column 1 of `X_true` (column 0 is the bias term, $X_{true0} = 1$), and generate a dataset *X* with a number of features given by the degree of the polynomial to be checked.

Throughout the exercise, you will return to the next code cell and re-run it to test a different polynomial degree.

To do this, use Scikit-learn's preprocessing methods:
- [preprocessing.PolynomialFeatures](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html).
- [Polynomial features](https://scikit-learn.org/stable/modules/preprocessing.html#polynomial-features).

Generate an *X* from $X_{true1}$ by applying a polynomial transformation of degree 2 and, in subsequent iterations, of degree up to 5 or even 6.

*NOTE:* Polynomial effects can be of arbitrarily high degree. However, it is most common in nature to find effects up to degree 4, beyond which they are exceptionally rare and therefore also usually considered too extreme in statistical or scientific models.
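As a small sketch of how `PolynomialFeatures` behaves with a single base predictor, the cell below compares the default `include_bias=True` with `include_bias=False`. The two input values are hypothetical. Keep in mind that when it receives more than one input column, the transformer also generates cross terms between columns.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One base predictor, reshaped to the 2D (m, 1) layout Scikit-learn expects
x = np.array([2.0, 3.0]).reshape(-1, 1)

# include_bias=True (the default) prepends a column of ones
with_bias = PolynomialFeatures(degree=3).fit_transform(x)

# include_bias=False returns only x, x^2, x^3
without_bias = PolynomialFeatures(degree=3, include_bias=False).fit_transform(x)

print(with_bias)     # rows [1, 2, 4, 8] and [1, 3, 9, 27], as floats
print(without_bias)  # rows [2, 4, 8] and [3, 9, 27], as floats
```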

In [None]:
# TODO: Generate a dataset X from X_true[:, 1] by polynomial transformation with Scikit-learn
# NOTE: Beware of the behaviour of PolynomialFeatures(), which adds bias term and polynomial terms for multiple features
grade = 2    # In subsequent iterations, modify the degree number of the polynomial

X = [...]

# Check the values and dimensions of X and Y
[...]

### Data preprocessing

As usual, preprocess your dataset before proceeding by randomly reordering the data, normalising it if necessary and splitting it into training and test subsets:
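One possible sketch of these steps with Scikit-learn helpers, on hypothetical stand-in data (`X_demo`, `y_demo`) without a bias column, is shown below. The 25 % test size and the use of `StandardScaler` are assumptions; previous exercises may have used a manual approach instead.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Stand-in data: polynomial features x, x^2 without a bias column
x = rng.random((100, 1))
X_demo = np.concatenate([x, x**2], axis=1)
y_demo = 1.0 + 2.0 * x[:, 0] + rng.normal(0.0, 0.05, size=100)

# Shuffle and split in one step (shuffle=True is the default)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# Fit the scaler on the training subset only, then apply it to both subsets
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)  # (75, 2) (25, 2)
```

Fitting the scaler on the training subset alone keeps information from the test subset out of the training process.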

In [None]:
# TODO: Preprocess data by reordering it, normalising it and splitting it into training and test subsets

## Training the model

We have started with the hypothesis that we can transform our data with a polynomial of degree 2 for a linear model to obtain satisfactory results.

Let's train such a model and evaluate its results.

Train a linear regression model by cross-validation with [linear_model.RidgeCV](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html) and evaluate it on the test subset with the coefficient of determination $R^2$ returned by the `model.score()` method:
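A minimal sketch of this step on hypothetical stand-in data with a true quadratic effect could look like the following. The grid of candidate `alphas` is an arbitrary assumption, not a recommended setting.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Stand-in data with a true quadratic effect
x = rng.random((200, 1))
y_demo = 0.5 + 1.5 * x[:, 0] - 2.0 * x[:, 0] ** 2 + rng.normal(0.0, 0.02, size=200)
X_demo = np.concatenate([x, x**2], axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=1
)

# RidgeCV picks the regularisation strength among the candidate alphas
# by cross-validation on the training subset
alphas = np.logspace(-4, 1, 20)
model = RidgeCV(alphas=alphas)
model.fit(X_train, y_train)

r2 = model.score(X_test, y_test)  # R^2 on the held-out test subset
print(model.alpha_)               # selected regularisation strength
print(r2)
```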

In [None]:
# TODO: Train the model hypothesis by CV and evaluate it on the test subset

### Evaluation of the residuals

Usually, the best way to assess whether we are hypothesising the correct data pattern, in this case a polynomial effect of degree 2, is to explore the residuals of the model.

Calculate the residuals of your model on the test subset and plot them:

*NOTE:* Remember the definition of residuals, $\text{residuals} = (Y_{pred} - Y)$
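To illustrate what a pattern in the residuals can look like, here is a sketch on hypothetical data where the true effect is quadratic but a straight line is fitted on purpose. It uses the sign convention from the note above and a numeric check in place of a plot.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in data: the true effect is quadratic, but a straight line is fitted
x = np.linspace(0.0, 1.0, 100)
y = 1.0 + 2.0 * x**2 + rng.normal(0.0, 0.01, size=100)

# Degree-1 least-squares fit (deliberately too simple)
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = intercept + slope * x

# Residuals with the sign convention used in this exercise
residuals = y_pred - y

# With an intercept, least-squares residuals average out to about zero,
# so the mean alone says little; the shape against x is what matters
print(residuals.mean())

# A correlation far from zero with a centred x^2 term hints at a missing quadratic effect
curvature = (x - x.mean()) ** 2
corr = np.corrcoef(residuals, curvature)[0, 1]
print(corr)
```

When the hypothesised degree is adequate, we would expect the residuals to look like unstructured noise around zero, with no such correlation.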

In [None]:
# TODO: Calculate and plot the residuals of the model against the original dataset.

*Do they look acceptable and do they follow a pattern?*

## Iterate until the solution is found

We have hypothesised that the optimal degree of polynomial transformation would be 2, but we have not obtained satisfactory results. Therefore, we must iterate, go back, make a new hypothesis of a higher degree, re-run the cells and check the results.

In science in general, and in data science and ML in particular, we must always pose multiple hypotheses, test them and iteratively accept or discard them. To do this, it is essential to document the experiments we have been running and their results.
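One way to keep such a record, sketched here on hypothetical stand-in data with a true cubic effect, is to loop over the candidate degrees and store each $R^2$ in a dictionary. The data, the `alphas` grid and the `results` name are assumptions made only for this illustration.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)

# Stand-in data with a true cubic effect and Gaussian noise
x = rng.random((300, 1))
y_demo = 1.0 - 3.0 * x[:, 0] + 8.0 * x[:, 0] ** 3 + rng.normal(0.0, 0.05, size=300)

x_train, x_test, y_train, y_test = train_test_split(
    x, y_demo, test_size=0.25, random_state=3
)

results = {}  # degree -> R^2 on the test subset
for degree in range(1, 7):
    # Fit the transformer on the training subset and reuse it on the test subset
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    model = RidgeCV(alphas=np.logspace(-6, 0, 13))
    model.fit(poly.fit_transform(x_train), y_train)
    results[degree] = model.score(poly.transform(x_test), y_test)

for degree, r2 in results.items():
    print(f"Polynomial of degree {degree}: R^2 = {r2:.4f}")
```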

Record the results of your experiments in the following cell:

**Results:**
1. Polynomial of degree 2: $R^2$ = ...
1. Polynomial of degree 3: $R^2$ = ...
1. Polynomial of degree 4: $R^2$ = ...
1. Polynomial of degree 5: $R^2$ = ...
1. Polynomial of degree 6: $R^2$ = ...