# Practical Lab 4 - Multivariate Linear and Polynomial Regression, and Evaluation using R-Squared, MAPE and MAE.

## In this lab, we will run a multivariate linear regression model and two polynomial regression models, and evaluate them using the R-squared, Mean Absolute Percentage Error (MAPE) and Mean Absolute Error (MAE) metrics. The dataset is Scikit-Learn's Diabetes dataset, the same one used in Practical Lab 3.

### 1. Get the data, and run a train-validation-test split. A description of each column can be found in the sklearn documentation. See the documentation for the load_diabetes function to learn what the as_frame and scaled arguments do.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score, mean_absolute_error
```

Load Data

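```python
# as_frame=True returns pandas objects; scaled=False keeps the original (unstandardized) feature values
data = load_diabetes(as_frame=True, scaled=False)
```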

Data Splitting

```python
X, y = data.data, data.target
# 70% train; the remaining 30% is split evenly into validation (15%) and test (15%)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)
```

### 2. Run a multivariate linear regression on all variables

```python
# Fit an ordinary least-squares model on all 10 features and predict on the validation set
linear_regression = LinearRegression()
linear_regression.fit(X_train, y_train)
y_val_pred = linear_regression.predict(X_val)
```

### 3. Run a polynomial regression of the 2nd degree on the BMI feature alone

```python
# Expand the single BMI column into [bmi, bmi^2]; the transformer is fitted on training data only.
# A separate name (poly_bmi) keeps this transformer from being overwritten in the next step.
poly_bmi = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly_bmi = poly_bmi.fit_transform(X_train[['bmi']])
X_val_poly_bmi = poly_bmi.transform(X_val[['bmi']])
poly_regression_bmi = LinearRegression()
poly_regression_bmi.fit(X_train_poly_bmi, y_train)
y_val_poly_bmi_pred = poly_regression_bmi.predict(X_val_poly_bmi)
```

### 4. Run a multivariate polynomial regression of the 2nd degree on all variables

```python
# Expand all 10 features into linear, squared and pairwise interaction terms
# (PolynomialFeatures was already imported in the first cell)
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(X_train)
X_val_poly = poly.transform(X_val)
poly_regression = LinearRegression()
poly_regression.fit(X_train_poly, y_train)
y_val_poly_pred = poly_regression.predict(X_val_poly)
```



### 5. Compare the three models by looking at R-squared, MAPE and MAE. Explain what the values mean for a non-expert and add your insight about the values of each model. Note: You can add any further comparisons and code (this is not necessary for a perfect score, but will be reviewed and evaluated) (2 points)

```python
# Validation-set metrics for each model.
# MAPE is computed by hand as a percentage; the diabetes target is always positive,
# so the division by y_val is safe.
r2_linear = r2_score(y_val, y_val_pred)
mae_linear = mean_absolute_error(y_val, y_val_pred)
mape_linear = np.mean(np.abs((y_val - y_val_pred) / y_val)) * 100

r2_poly_bmi = r2_score(y_val, y_val_poly_bmi_pred)
mae_poly_bmi = mean_absolute_error(y_val, y_val_poly_bmi_pred)
mape_poly_bmi = np.mean(np.abs((y_val - y_val_poly_bmi_pred) / y_val)) * 100

r2_poly = r2_score(y_val, y_val_poly_pred)
mae_poly = mean_absolute_error(y_val, y_val_poly_pred)
mape_poly = np.mean(np.abs((y_val - y_val_poly_pred) / y_val)) * 100
```
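For a non-expert, the three metrics answer different questions. MAE is the average size of a prediction error, in the target's own units (here, points on the disease-progression score). MAPE is the same average error expressed as a percentage of the true value. R-squared is the share of the variation in the target that the model explains: 1 is a perfect fit, and 0 is no better than always predicting the average. The toy example below (two made-up patients) works each metric out by hand and compares it with scikit-learn. Note that sklearn's `mean_absolute_percentage_error` returns a fraction, so it must be multiplied by 100 to match the manual formula used above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, r2_score

# Toy values, made up for illustration: true progression scores vs predictions
y_true = np.array([100.0, 200.0])
y_pred = np.array([110.0, 180.0])

# MAE: average absolute error in target units -> (10 + 20) / 2 = 15
mae = mean_absolute_error(y_true, y_pred)

# MAPE: average absolute error relative to the true value -> (10% + 10%) / 2 = 10%
mape_manual = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
mape_sklearn = mean_absolute_percentage_error(y_true, y_pred) * 100  # sklearn returns a fraction

# R^2 = 1 - SS_res / SS_tot: SS_res = 10^2 + 20^2 = 500, SS_tot = 50^2 + 50^2 = 5000 -> 0.9
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mae, mape_manual, mape_sklearn, r2_manual, r2_score(y_true, y_pred))
```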

```python
# coef_ holds one coefficient per input column; the intercept is stored separately
# in intercept_, so each model actually fits len(coef_) + 1 parameters
num_params_linear = len(linear_regression.coef_)
num_params_poly_bmi = len(poly_regression_bmi.coef_)
num_params_poly = len(poly_regression.coef_)
print("Multivariate Linear Regression:")
print(f"R-squared: {r2_linear:.3f}")
print(f"MAE: {mae_linear:.3f}")
print(f"MAPE: {mape_linear:.3f}%")
print(f"Number of Parameters: {num_params_linear}")

print("\nPolynomial Regression on BMI:")
print(f"R-squared: {r2_poly_bmi:.3f}")
print(f"MAE: {mae_poly_bmi:.3f}")
print(f"MAPE: {mape_poly_bmi:.3f}%")
print(f"Number of Parameters: {num_params_poly_bmi}")

print("\nMultivariate Polynomial Regression:")
print(f"R-squared: {r2_poly:.3f}")
print(f"MAE: {mae_poly:.3f}")
print(f"MAPE: {mape_poly:.3f}%")
print(f"Number of Parameters: {num_params_poly}")
```

```text
Multivariate Linear Regression:
R-squared: 0.511
MAE: 38.217
MAPE: 34.616%
Number of Parameters: 10

Polynomial Regression on BMI:
R-squared: 0.296
MAE: 48.273
MAPE: 41.902%
Number of Parameters: 2

Multivariate Polynomial Regression:
R-squared: 0.367
MAE: 42.471
MAPE: 38.090%
Number of Parameters: 65
```


## 6. Please answer the following questions:

### i. How many parameters are we fitting for each of the three models? Explain these values. Hint: for explaining the parameters of the polynomial regression, you can use poly.get_feature_names_out()

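Multivariate Linear Regression: 10 coefficients, one per feature (age, sex, bmi, bp, s1 to s6), plus one intercept, i.e. 11 fitted parameters. The printed value of 10 counts only `coef_`, which does not include `intercept_`.
Polynomial Regression on BMI: 2 coefficients, for the features bmi and bmi^2 (the names returned by get_feature_names_out()), plus the intercept, i.e. 3 fitted parameters in the equation y = b0 + b1*bmi + b2*bmi^2.
Multivariate Polynomial Regression: 65 coefficients plus the intercept, i.e. 66 fitted parameters. With 10 input features and degree 2, PolynomialFeatures produces the 10 original features, 10 squared terms (e.g. bmi^2) and 45 pairwise interaction terms (e.g. age sex), and poly.get_feature_names_out() lists all 65 names.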

### ii. Which model would you choose for deployment, and why?

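The Multivariate Linear Regression model is the best candidate for deployment. On the validation set it has the highest R-squared (0.511 vs 0.367 and 0.296), the lowest MAE (38.2 vs 42.5 and 48.3) and the lowest MAPE (34.6% vs 38.1% and 41.9%). It also has far fewer parameters than the multivariate polynomial model (10 coefficients vs 65), so it is simpler to interpret and less prone to overfitting; the polynomial model's weaker validation scores suggest that its extra terms mostly fit noise in the training data. The BMI-only model is the simplest, but a single feature leaves too much of the variation unexplained. Before deployment, the chosen model's performance should be confirmed once on the untouched test set.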