# Lab instruction: linearRegression

### First exercise

##### Objective
To understand the effects of regularization in linear regression and the bias-variance tradeoff.


##### Instructions

1. **Dataset**
   This notebook provides a function that generates points from a polynomial. The polynomial is $$x^4 - 10x^3 + 35x^2 - 50x + 24$$
   The generated samples are then contaminated with zero-mean white Gaussian noise with a customizable standard deviation. The function takes the range of the variable x as input. Hint: Use the polyval function of numpy.
   Generate 30 points in the range of x = [1.0, 4.25].

2. **Make the design matrix**
   We are implementing a polynomial regression, which is non-linear in x. However, as we saw in the lecture, we can treat each non-linear component of the polynomial (each power of x) as a separate feature, so the model remains linear in its parameters. Hint: to build your design matrix, use the PolynomialFeatures routine from sklearn.preprocessing. You need to add an extra dimension to your x array with x.reshape(-1, 1).


3. **Learning algorithms**
   Use scikit-learn's LinearRegression, Lasso, and Ridge from sklearn.linear_model to estimate the parameters of the model that generates the data. Fit a regression of degree 15 with each model.
   * For LASSO use tol=0, max_iter=10000.
   * For Ridge use "cholesky" as the solver. What happens if you set the alpha value to zero?

4. **Evaluate results**
   How do the regressions compare? Can you say something about their complexity with respect to the data? Make a plot that displays the coefficients of each model. Which one has more non-zero elements? Try different values of alpha for the LASSO and Ridge regressions.
   
   Reflect upon the following questions
   * Why does Ridge regression not perform parameter selection while LASSO does?
   * What is the difference between Ridge regression and LASSO?
   * What is the difference between Ridge regression with alpha set to zero and regular linear regression?
   * What happens to the models if we have more samples (e.g. 300 samples)?
   * What happens if you reduce the complexity of the model? For example, we know that the data comes from a 4th order polynomial, but our model is order 15.
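
As an illustration of step 2, the following is a minimal sketch of how a polynomial design matrix can be built with PolynomialFeatures. The array `x_demo` and the low degree of 3 are assumptions chosen to keep the output small; they are not the lab data or the degree requested in step 3.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical toy input: five evenly spaced x values (not the lab data).
x_demo = np.linspace(1.0, 4.25, 5)

# PolynomialFeatures expects a 2-D array of shape (n_samples, n_features).
x_col = x_demo.reshape(-1, 1)

# degree=3 keeps the example small; the lab asks for degree 15.
# include_bias=True (the default) adds the constant column x^0.
poly = PolynomialFeatures(degree=3, include_bias=True)
design_matrix = poly.fit_transform(x_col)

print(design_matrix.shape)  # (5, 4): columns are 1, x, x^2, x^3
```

Each column of the design matrix is one power of x, so a linear model fitted on these columns is a polynomial in x.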

In [None]:
import numpy as np
import matplotlib.pyplot as plt

def generate_noisy_polynomial_data(coeffs, num_points=100, noise_std=1.0, x_range=(-10, 10)):
    """
    Generates data points from a polynomial of arbitrary degree with Gaussian noise.

    Parameters:
    - coeffs: list or array of coefficients [a_n, ..., a_1, a_0] for a_n*x^n + ... + a_1*x + a_0
    - num_points: number of data points to generate
    - noise_std: standard deviation of Gaussian noise
    - x_range: tuple (min_x, max_x) defining the range of x values

    Returns:
    - x: numpy array of x values
    - y_noisy: numpy array of noisy y values

    # During the preparation of this work, Luis A. Zavala Mondragon used MS Copilot in order to
    # generate this function. After using this tool/service, Luis A. Zavala Mondragon evaluated
    # the validity of the tool’s outputs, including the sources that generative AI tools have used,
    # and edited the content as needed. As a consequence, Luis A. Zavala Mondragon takes full
    # responsibility for the content of their work.
    """
    x = np.linspace(x_range[0], x_range[1], num_points)
    y = np.polyval(coeffs, x)
    noise = np.random.normal(0, noise_std, size=num_points)
    y_noisy = y + noise
    return x, y_noisy


def plot_regression(x, y_inp, y_true, y_est=None, showEstimate=False, estimateName=""):
    """
    Generates plots to visualize the generated points, true line and estimated curve

    Parameters:
    - x: numpy array of x values
    - y_inp: The noisy y values
    - y_true: The noiseless y values
    - y_est: The estimated values of y with one of the regression methods
    - showEstimate: True when you want to display y_est
    - estimateName: Name for display of the line y_est

    Returns:
    - None

    # During the preparation of this work, Luis A. Zavala Mondragon used MS Copilot in order to
    # generate part of this function. After using this tool/service, Luis A. Zavala Mondragon evaluated
    # the validity of the tool’s outputs, including the sources that generative AI tools have used,
    # and edited the content as needed. As a consequence, Luis A. Zavala Mondragon takes full
    # responsibility for the content of their work.
    """
    # Plotting
    plt.scatter(x, y_inp, label='Noisy Data', alpha=0.6)
    plt.plot(x, y_true, color='red', label='True Polynomial')
    if showEstimate:
        plt.plot(x, y_est, color="green", label=estimateName)
    plt.legend()
    plt.xlabel('x')
    plt.ylabel('y')
    plt.title(r'Noisy Polynomial Data (4$^\mathrm{th}$ Degree)')
    plt.grid(True)
    plt.show()



# These are the coefficients of the polynomial
coeffs = [1, -10, 35, -50, 24]
x, y = generate_noisy_polynomial_data(coeffs, num_points=30, noise_std=1.5, x_range=(1.0, 4.25))
y_true = np.polyval(coeffs, x)

plot_regression(x, y_inp=y, y_true=y_true)
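
The coefficient ordering that `np.polyval` expects (highest power first) can be checked with a short sketch like the one below. The names `x_check`, `y_check`, and `y_manual` are introduced only for this illustration.

```python
import numpy as np

# Coefficients from the lab, highest power first: x^4 - 10x^3 + 35x^2 - 50x + 24
coeffs = [1, -10, 35, -50, 24]

# np.polyval evaluates a_n*x^n + ... + a_1*x + a_0 at each x.
x_check = np.array([1.0, 2.0, 3.0, 4.0])
y_check = np.polyval(coeffs, x_check)
print(y_check)  # all four values are 0: these x values are roots of the polynomial

# Equivalent explicit evaluation, to confirm the ordering convention.
y_manual = x_check**4 - 10 * x_check**3 + 35 * x_check**2 - 50 * x_check + 24
print(np.allclose(y_check, y_manual))  # True
```

The four roots at x = 1, 2, 3, 4 all lie inside the sampled range [1.0, 4.25].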

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

# Add your code here to implement the regressions
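
One possible starting point for steps 3 and 4 is sketched below; it is not a reference solution. It regenerates the data so that it runs on its own. The fixed random seed, the names `x_s`, `y_s`, `X`, `models`, and `coefs`, and the value `alpha=0.1` are all assumptions made for illustration; you are expected to try other alpha values. Because the degree-15 features are not rescaled, scikit-learn may emit convergence or ill-conditioning warnings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.preprocessing import PolynomialFeatures

# Assumption: a fixed seed so that repeated runs give the same noisy samples.
rng = np.random.default_rng(0)
coeffs = [1, -10, 35, -50, 24]
x_s = np.linspace(1.0, 4.25, 30)
y_s = np.polyval(coeffs, x_s) + rng.normal(0.0, 1.5, size=x_s.size)

# include_bias=False because each estimator fits its own intercept by default.
X = PolynomialFeatures(degree=15, include_bias=False).fit_transform(x_s.reshape(-1, 1))

# alpha=0.1 is a hypothetical starting value, not one prescribed by the lab.
models = {
    "Linear": LinearRegression(),
    "Lasso": Lasso(alpha=0.1, tol=0, max_iter=10000),
    "Ridge": Ridge(alpha=0.1, solver="cholesky"),
}

coefs = {}
for name, model in models.items():
    model.fit(X, y_s)
    coefs[name] = model.coef_
    # Count how many of the 15 polynomial coefficients are non-zero.
    print(f"{name}: {np.count_nonzero(model.coef_)} non-zero coefficients")
```

The arrays stored in `coefs` can be passed to a bar or stem plot to compare the coefficient magnitudes of the three models.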

