# Velon Murugathas

## Practical Lab 4 - Multivariate Linear and Polynomial Regression, and Evaluation using R-Squared, MAPE and MAE.

### 1. Get the data and run a train-validation-test splitting

In [16]:
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error, mean_absolute_percentage_error

diabetes_data = load_diabetes(as_frame=True)                                                            # Loading the diabetes dataset

X = diabetes_data.data                                                                                  # Feature matrix (10 columns)
y = diabetes_data.target                                                                                # Target vector

# Splitting the data into training (80%), validation (about 10%), and test (about 10%) sets
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.2, random_state=42)
X_valid, X_test, y_valid, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)
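
As a quick sanity check on the split above, the sketch below applies the same two `train_test_split` calls to a stand-in array of 442 indices (the number of rows in the diabetes dataset) and reports the resulting sizes. The `indices` array and the `idx_*` names are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the 442 rows of the diabetes dataset (illustrative only).
indices = np.arange(442)

# Same two-stage split as in the notebook: hold out 20%, then halve the held-out part.
idx_train, idx_temp = train_test_split(indices, test_size=0.2, random_state=42)
idx_valid, idx_test = train_test_split(idx_temp, test_size=0.5, random_state=42)

split_sizes = {"train": len(idx_train), "valid": len(idx_valid), "test": len(idx_test)}
print(split_sizes)
```

The validation and test sets are small, so metrics computed on them can vary noticeably with the choice of `random_state`.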

### 2. Run a multivariate linear regression on all variables

In [17]:
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)                                                                      # Fit a multivariate linear regression model
y_pred_linear = linear_model.predict(X_valid)
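
To make explicit what the fitted model computes, the sketch below fits `LinearRegression` on a small synthetic dataset and checks that `predict` equals the intercept plus the weighted sum of the features. The data, seed, and coefficients are made up for illustration and are unrelated to the diabetes dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical): 50 samples, 3 features, a known linear relationship plus noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y_demo = 4.0 + X_demo @ true_coef + rng.normal(scale=0.1, size=50)

demo_model = LinearRegression().fit(X_demo, y_demo)

# A multivariate linear model predicts: y_hat = intercept + sum_i coef_i * x_i
manual_pred = demo_model.intercept_ + X_demo @ demo_model.coef_
matches = np.allclose(manual_pred, demo_model.predict(X_demo))
print(matches)
```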

### 3. Run a polynomial regression of the 2nd degree on the BMI feature alone

In [18]:
from sklearn.preprocessing import PolynomialFeatures

X_train_bmi = X_train[['bmi']]                                                                          # Extracting the BMI feature
X_valid_bmi = X_valid[['bmi']]

poly2_bmi = PolynomialFeatures(degree=2)                                                                # Creating polynomial features for BMI
X_train_poly2_bmi = poly2_bmi.fit_transform(X_train_bmi)
X_valid_poly2_bmi = poly2_bmi.transform(X_valid_bmi)

poly2_bmi_model = LinearRegression()                                                                    # Fitting the polynomial regression model for BMI
poly2_bmi_model.fit(X_train_poly2_bmi, y_train)
y_pred_poly2_bmi = poly2_bmi_model.predict(X_valid_poly2_bmi)

### 4. Run a multivariate polynomial regression of the 2nd degree on all variables (set include_bias=False in PolynomialFeatures)

In [19]:
poly2_all = PolynomialFeatures(degree=2, include_bias=False)                                            # Creating polynomial features for all variables
X_train_poly2_all = poly2_all.fit_transform(X_train)
X_valid_poly2_all = poly2_all.transform(X_valid)

poly2_all_model = LinearRegression()                                                                    # Fitting a multivariate polynomial regression model
poly2_all_model.fit(X_train_poly2_all, y_train)
y_pred_poly2_all = poly2_all_model.predict(X_valid_poly2_all)
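
The number of columns produced by a degree-2 expansion can be worked out by hand. The sketch below (the function name `degree2_terms` and the placeholder feature names are hypothetical) enumerates the linear terms and all products of two features, including squares, which is what a degree-2 expansion without a bias column contains.

```python
from itertools import combinations_with_replacement
from math import comb

def degree2_terms(feature_names):
    """List the degree-1 and degree-2 terms for the given features (no bias term)."""
    linear = [(name,) for name in feature_names]
    # Pairs with replacement cover both squares (a, a) and interactions (a, b).
    quadratic = list(combinations_with_replacement(feature_names, 2))
    return linear + quadratic

# Small example with three placeholder features.
small = degree2_terms(["a", "b", "c"])
print(len(small))  # 3 linear + 3 squares + 3 interactions

# For 10 features: 10 linear + 10 squares + C(10, 2) = 45 interactions.
n_features = 10
n_terms = len(degree2_terms([f"x{i}" for i in range(n_features)]))
print(n_terms, n_features + n_features + comb(n_features, 2))
```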

### 5. Compare the three models by looking at R-squared, MAPE, and MAE

In [20]:
r2_linear = r2_score(y_valid, y_pred_linear)                                                            # Evaluating the models
mape_linear = mean_absolute_percentage_error(y_valid, y_pred_linear) * 100
mae_linear = mean_absolute_error(y_valid, y_pred_linear)

r2_poly2_bmi = r2_score(y_valid, y_pred_poly2_bmi)
mape_poly2_bmi = mean_absolute_percentage_error(y_valid, y_pred_poly2_bmi) * 100
mae_poly2_bmi = mean_absolute_error(y_valid, y_pred_poly2_bmi)

r2_poly2_all = r2_score(y_valid, y_pred_poly2_all)
mape_poly2_all = mean_absolute_percentage_error(y_valid, y_pred_poly2_all) * 100
mae_poly2_all = mean_absolute_error(y_valid, y_pred_poly2_all)

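# Calculating the number of parameters
# Note: coef_ does not include the intercept_ that LinearRegression fits separately.
num_params_linear = len(linear_model.coef_)                                                             # 10 feature coefficients
num_params_poly2_bmi = poly2_bmi_model.coef_.shape[0]                                                   # 3 columns: constant (include_bias defaults to True), bmi, bmi^2
num_params_poly2_all = X_train_poly2_all.shape[1]                                                       # 65 polynomial terms (include_bias=False)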

print("Multivariate Linear Regression:")
print(f"R-squared (R²): {r2_linear:.2f}")
print(f"Mean Absolute Percentage Error (MAPE): {mape_linear:.2f}%")
print(f"Mean Absolute Error (MAE): {mae_linear:.2f}\n")

print("Polynomial Regression of the 2nd Degree on BMI:")
print(f"R-squared (R²): {r2_poly2_bmi:.2f}")
print(f"Mean Absolute Percentage Error (MAPE): {mape_poly2_bmi:.2f}%")
print(f"Mean Absolute Error (MAE): {mae_poly2_bmi:.2f}\n")

print("Multivariate Polynomial Regression of the 2nd Degree on all variables:")
print(f"R-squared (R²): {r2_poly2_all:.2f}")
print(f"Mean Absolute Percentage Error (MAPE): {mape_poly2_all:.2f}%")
print(f"Mean Absolute Error (MAE): {mae_poly2_all:.2f}")

print("\nMultivariate Linear Regression:")
print(f"Number of Parameters: {num_params_linear}")

print("\nPolynomial Regression of the 2nd Degree on BMI:")
print(f"Number of Parameters: {num_params_poly2_bmi}")

print("\nMultivariate Polynomial Regression of the 2nd Degree on all variables:")
print(f"Number of Parameters: {num_params_poly2_all}")

Multivariate Linear Regression:
R-squared (R²): 0.43
Mean Absolute Percentage Error (MAPE): 38.29%
Mean Absolute Error (MAE): 41.13

Polynomial Regression of the 2nd Degree on BMI:
R-squared (R²): 0.21
Mean Absolute Percentage Error (MAPE): 46.32%
Mean Absolute Error (MAE): 51.78

Multivariate Polynomial Regression of the 2nd Degree on all variables:
R-squared (R²): 0.38
Mean Absolute Percentage Error (MAPE): 36.47%
Mean Absolute Error (MAE): 41.17

Multivariate Linear Regression:
Number of Parameters: 10

Polynomial Regression of the 2nd Degree on BMI:
Number of Parameters: 3

Multivariate Polynomial Regression of the 2nd Degree on all variables:
Number of Parameters: 65
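
For readers who want to see what the three metrics measure, here is a minimal pure-Python sketch of their standard definitions applied to a made-up pair of lists. The helper names and the numbers are illustrative and are not taken from the notebook's data.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction (multiply by 100 for %)."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up values for illustration only.
y_true_demo = [100.0, 200.0, 300.0]
y_pred_demo = [110.0, 190.0, 330.0]

print(mae(y_true_demo, y_pred_demo))
print(mape(y_true_demo, y_pred_demo) * 100)
print(r_squared(y_true_demo, y_pred_demo))
```

MAE weighs every error equally in absolute terms, while MAPE weighs errors relative to the true value, so the two metrics can rank models differently.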


6. i. How many parameters are we fitting for each of the three models?

* **Multivariate Linear Regression** - 10 coefficients, one for each of the 10 features in the dataset, plus an intercept (11 fitted values in total; the count printed above excludes the intercept).
* **Polynomial Regression of the 2nd Degree on BMI** - 3 parameters: the intercept, the linear term, and the quadratic term for the BMI feature.
* **Multivariate Polynomial Regression of the 2nd Degree on all the variables** - 65 coefficients: the 10 original features, their 10 squared terms, and 45 pairwise interaction terms, plus an intercept (66 fitted values in total).
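
One way to count fitted parameters directly from a scikit-learn model is to add the number of coefficients to the intercept, which gives totals one higher than the coefficient counts alone. The helper `count_parameters` below is a hypothetical name, and the random data only serves to give the models something to fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

def count_parameters(model):
    """Number of coefficients plus one for the intercept, if the model fits one."""
    return int(np.size(model.coef_)) + (1 if model.fit_intercept else 0)

# Random stand-in data with 10 features (hypothetical, not the diabetes data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
y_demo = rng.normal(size=200)

linear_demo = LinearRegression().fit(X_demo, y_demo)
X_demo_poly2 = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_demo)
poly2_demo = LinearRegression().fit(X_demo_poly2, y_demo)

print(count_parameters(linear_demo))  # 10 coefficients + intercept
print(count_parameters(poly2_demo))   # 65 coefficients + intercept
```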

ii. Which model would you choose for deployment, and why?

The multivariate linear regression model is the simplest of the models that use all the features. On the validation set it reaches an R-squared value of 0.43, a Mean Absolute Percentage Error (MAPE) of 38.29%, and a Mean Absolute Error (MAE) of 41.13, but because it is linear it cannot capture curvature or interactions between features. The Polynomial Regression of the 2nd Degree on BMI, with an R-squared of 0.21, a MAPE of 46.32%, and a MAE of 51.78, is the weakest option: it relies on a single feature (BMI) and ignores the other factors that influence the target. I would therefore choose the Multivariate Polynomial Regression, which can model non-linear relationships and interactions among the features and achieves the lowest MAPE (36.47%) with a MAE (41.17) essentially equal to the linear model's. This choice comes with caveats: its R-squared (0.38) is lower than the linear model's (0.43), and it fits 65 coefficients on roughly 350 training samples, so it is more prone to overfitting. None of the three models is highly accurate, since even the best R-squared explains less than half of the variance in the target.
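
Whichever model is selected, the test split created in step 1 has not been used yet and could provide a final estimate on unseen data. The following is a minimal sketch under the assumption that the chosen model is the degree-2 polynomial regression and that it is evaluated once on that split. The `evaluate` helper and the `chosen_model` name are hypothetical, and no output is shown because the sketch was not run as part of the lab.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error, mean_absolute_percentage_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def evaluate(model, X, y):
    """Return the three metrics used in this lab for a fitted model."""
    y_pred = model.predict(X)
    return {
        "r2": r2_score(y, y_pred),
        "mape_percent": mean_absolute_percentage_error(y, y_pred) * 100,
        "mae": mean_absolute_error(y, y_pred),
    }

# Same two-stage split and random_state as in step 1.
X, y = load_diabetes(return_X_y=True)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.2, random_state=42)
X_valid, X_test, y_valid, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

# The pipeline bundles the degree-2 expansion with the regression so that
# the same transformation is applied at fit time and at predict time.
chosen_model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())
chosen_model.fit(X_train, y_train)

test_metrics = evaluate(chosen_model, X_test, y_test)
print(test_metrics)
```

Wrapping the transformation and the regression in one pipeline is a design choice that keeps deployment simpler, since a single object handles raw feature rows.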