# <FONT COLOR="red">**POLYNOMIAL REGRESSION**</FONT>
---
---

The ***Polynomial Regression*** notebook was created to understand how the polynomial regression algorithm works. To do this, a synthetic dataset is generated and the degree of the polynomial is varied, to show how the model behaves and to illustrate concepts such as underfitting and overfitting.
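Polynomial regression is ordinary linear least squares applied to powers of the feature. For degree d the model is ŷ = β₀ + β₁x + … + β_d x^d, which is linear in the coefficients β. The sketch below illustrates this with NumPy only. It is not the `PolynomialFeatures` + `LinearRegression` pipeline used in this notebook, although it solves the same least-squares problem. It builds the matrix of powers with `np.vander` and solves for β with `np.linalg.lstsq`. The helper name `fit_polynomial` is hypothetical.

```python
import numpy as np

def fit_polynomial(x: np.ndarray, y: np.ndarray, degree: int) -> np.ndarray:
    """Hypothetical helper: least-squares coefficients [b0, b1, ..., b_degree]."""
    # Design matrix with columns 1, x, x**2, ..., x**degree
    X = np.vander(x, N=degree + 1, increasing=True)
    # Solve min ||X @ beta - y||^2
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Noise-free version of the cubic used later in this notebook
x = np.linspace(0, 8, 500)
y = 0.125 * x**3 - x**2 + 0.125 * x
beta = fit_polynomial(x, y, degree=3)
print(np.round(beta, 6))
```

On noise-free data of degree 3, the fitted coefficients should be very close to the generating ones (0, 0.125, -1, 0.125).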

In [8]:
# IMPORT COMMON LIBRARIES
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import gc

# IMPORT MODEL LIBRARIES
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# IMPORT METRICS LIBRARIES
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

In [11]:
# CREATION OF SYNTHETIC DATA
x = np.linspace(0,8,500) # Features
y_data = 0.125*x**3 - x**2 + 0.125*x # Underlying (noise-free) cubic
y = np.random.normal(0,0.50,500) + y_data # Target: cubic + Gaussian noise (std 0.5)
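The data follow y = 0.125x³ − x² + 0.125x + ε with ε ~ N(0, 0.5²). Because `np.random.normal` is called without a seed, each run draws different noise and produces slightly different metrics. Below is a minimal sketch of a reproducible version, assuming NumPy's `Generator` API (`np.random.default_rng`). The seed value 42 is arbitrary. The noise variance 0.5² = 0.25 is the irreducible error, so no model should be expected to reach an MSE much below about 0.25 on this data. An in-sample fit can dip slightly below it only by fitting the noise.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, chosen only for reproducibility

x = np.linspace(0, 8, 500)                              # Features
y_data = 0.125 * x**3 - x**2 + 0.125 * x                # Noise-free cubic
noise = rng.normal(loc=0.0, scale=0.50, size=x.size)    # Gaussian noise, std 0.5
y = y_data + noise                                      # Target

# Mean squared noise: an estimate of the irreducible error (about 0.5**2 = 0.25)
print(np.mean(noise**2))
```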

In [30]:
# POLYNOMIAL REGRESSION MODEL
def polynomial_regression(degree: int, features: np.ndarray, target: np.ndarray) -> np.ndarray:

  # POLYNOMIAL CONFIGURATION
  poly_model = PolynomialFeatures(degree=degree)

  # FIT MODEL --> OBTAIN POLY FEATURES
  poly_features = poly_model.fit_transform(features.reshape(-1,1))

  # MODEL CREATION, TRAIN, AND PREDICT
  model = LinearRegression()
  model.fit(poly_features, target)
  y_pred = model.predict(poly_features)

  # CREATION FIGURE
  poly_figure(features, target, degree, y_pred)

  # METRICS RESULTS
  mse = mean_squared_error(target, y_pred)
  rmse = np.sqrt(mse)
  mae = mean_absolute_error(target, y_pred)
  r2 = r2_score(target, y_pred)

  results = np.array([degree, mse, rmse, mae, r2])

  # PRINT RESULTS
  print(f'MSE: {mse}')
  print(f'RMSE: {rmse}')
  print(f'MAE: {mae}')
  print(f'R2: {r2}')

  return results
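`polynomial_regression` reports four metrics computed with scikit-learn. Their definitions are short enough to write directly in NumPy: MSE = mean((y − ŷ)²), RMSE = √MSE, MAE = mean|y − ŷ|, and R² = 1 − SS_res / SS_tot. The sketch below uses a hypothetical helper, `regression_metrics`. For 1-D targets and default arguments it should agree with `mean_squared_error`, `mean_absolute_error` and `r2_score`. It does not handle a constant target, where SS_tot = 0.

```python
import numpy as np

def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Hypothetical helper mirroring the metrics printed by polynomial_regression."""
    residuals = y_true - y_pred
    mse = np.mean(residuals**2)                      # mean squared error
    rmse = np.sqrt(mse)                              # same units as the target
    mae = np.mean(np.abs(residuals))                 # mean absolute error
    ss_res = np.sum(residuals**2)                    # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true))**2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}

print(regression_metrics(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 4.0])))
```

In this small example, one prediction is off by 1. MSE and MAE are therefore both 1/3, and R² = 1 − 1/2 = 0.5.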

In [28]:
def poly_figure(feature: np.ndarray, target: np.ndarray, degree: int, prediction: np.ndarray) -> None:

  # CREATE FIGURE
  plt.figure(figsize=(5,5))

  # REAL DATA
  plt.scatter(x=feature, y=target, color='blue', label='Real Data', linewidths=0.3)

  # PREDICTED DATA
  plt.plot(feature, prediction, color='red', label='Polynomial Regression', linewidth=3)

  # TAGS
  plt.title(f'Polynomial Regression - Degree = {degree}')
  plt.xlabel('Feature')
  plt.ylabel('Target')
  plt.legend()

  # GRID
  plt.minorticks_on()
  plt.grid(which='major', linestyle='-', linewidth=0.5, color='black')
  plt.grid(which='minor', linestyle=':', linewidth=0.5, color='black')

  # REMOVE TOP AND RIGHT BORDERS
  sns.despine()

  # SHOW FIGURE
  plt.show()

  # CLOSE AND RELEASE MEMORY
  plt.close()
  gc.collect()

In [31]:
metric_values = np.array([])

# LOOP TO CREATE ALL THE MODELS
for i in range(1,31):
  results = polynomial_regression(i, x, y)
  metric_values = np.append(metric_values, results)
  print('\n')

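The loop goes up to degree 30 on x ∈ [0, 8], so the largest generated feature is of order 8³⁰ ≈ 1.2 × 10²⁷. Columns of such different magnitudes make the least-squares problem badly conditioned. The highest-degree fits may therefore be affected by floating-point error as well as by overfitting. The sketch below measures this with `np.linalg.cond` on the matrix of powers. These are the same columns that `PolynomialFeatures` produces for a single feature with its default `include_bias=True`. Rescaling x to [-1, 1] is a common mitigation. It is shown here as an assumption, not as something this notebook does.

```python
import numpy as np

x = np.linspace(0, 8, 500)
x_scaled = (x - 4.0) / 4.0   # assumption: linear map of [0, 8] onto [-1, 1]

for degree in (3, 10, 20, 30):
    # Condition number of the matrix with columns 1, x, ..., x**degree
    raw = np.linalg.cond(np.vander(x, N=degree + 1, increasing=True))
    scaled = np.linalg.cond(np.vander(x_scaled, N=degree + 1, increasing=True))
    print(f"degree {degree:2d}: cond raw = {raw:.2e}, cond scaled = {scaled:.2e}")
```

Larger condition numbers mean the fitted coefficients are more sensitive to rounding errors. Rescaling reduces the problem but does not remove it at very high degrees.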

In [32]:
# CREATE A RESULTS DATAFRAME WITH PANDAS
result_df = pd.DataFrame(metric_values.reshape(-1,5), columns=['Degree', 'MSE', 'RMSE', 'MAE', 'R2'])

# DISPLAY DATAFRAME
display(result_df)

| Degree | MSE | RMSE | MAE | R² |
|---:|---:|---:|---:|---:|
| 1 | 7.387014 | 2.717906 | 2.245937 | 0.244208 |
| 2 | 1.750248 | 1.322969 | 1.103996 | 0.820926 |
| 3 | 0.256051 | 0.506015 | 0.396978 | 0.973802 |
| 4 | 0.255956 | 0.505921 | 0.397316 | 0.973812 |
| 5 | 0.255956 | 0.505921 | 0.397320 | 0.973812 |
| 6 | 0.255919 | 0.505884 | 0.397196 | 0.973816 |
| 7 | 0.255914 | 0.505879 | 0.397339 | 0.973817 |
| 8 | 0.255914 | 0.505879 | 0.397334 | 0.973817 |
| 9 | 0.255661 | 0.505629 | 0.397462 | 0.973842 |
| 10 | 0.255613 | 0.505582 | 0.397363 | 0.973847 |


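According to the graphs and the metrics, the fits are good from degree 3 up to degree 17 of the polynomial. Beyond that point, distortions appear in the fitted curve that indicate overfitting. The polynomials of degree 1 and 2, on the other hand, underfit: they are not flexible enough to follow the shape of the data. This is consistent with how the data were generated. The underlying function is a cubic, and from degree 3 onward the MSE (about 0.256) is already close to the noise variance of 0.5² = 0.25, so higher degrees improve the metrics only marginally. The metrics are computed on the same data used for training, so they cannot reveal overfitting by themselves. In exact arithmetic, a higher degree never increases the training error, because each model contains the lower-degree ones. That conclusion therefore rests on the plots.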
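Held-out data gives a more direct check for overfitting than training metrics: fit each degree on one part of the data and measure the error on the rest. Below is a minimal NumPy-only sketch of that idea, not the notebook's scikit-learn pipeline. It assumes a random 70/30 split, an arbitrary seed, degrees 1 to 15 only, and the feature rescaled to [-1, 1] for numerical stability. The helper `design` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed
x = np.linspace(0, 8, 500)
y = 0.125 * x**3 - x**2 + 0.125 * x + rng.normal(0.0, 0.50, x.size)

x_s = (x - 4.0) / 4.0                 # rescale to [-1, 1] to keep powers well conditioned
idx = rng.permutation(x.size)         # random 70/30 train/test split
train, test = idx[:350], idx[350:]

def design(v: np.ndarray, degree: int) -> np.ndarray:
    # Hypothetical helper: columns 1, v, v**2, ..., v**degree
    return np.vander(v, N=degree + 1, increasing=True)

train_mse, test_mse = {}, {}
for degree in range(1, 16):
    beta, *_ = np.linalg.lstsq(design(x_s[train], degree), y[train], rcond=None)
    train_mse[degree] = np.mean((design(x_s[train], degree) @ beta - y[train])**2)
    test_mse[degree] = np.mean((design(x_s[test], degree) @ beta - y[test])**2)
    print(f"degree {degree:2d}: train MSE = {train_mse[degree]:.4f}, "
          f"test MSE = {test_mse[degree]:.4f}")
```

Training MSE can only stay flat or fall as the degree grows. A growing gap between training and test MSE is the usual sign of overfitting. With 500 densely spaced points, that gap may stay small over this range of degrees.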