1. Utilize the diabetes dataset from lab 4. Perform cross-validation on nine polynomial models, ranging from degree 0 to 8. (2 points)

# Task 1: Load the diabetes dataset from our previous lab and prepare for cross-validation.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score

# Load the diabetes data
X, y = load_diabetes(as_frame=True, return_X_y=True, scaled=True)

# Create a list to store results
results = []

# Task 2: Evaluate nine polynomial models ranging from degree 0 to 8.
for degree in range(9):
    # Create polynomial features
    poly = PolynomialFeatures(degree=degree)
    X_poly = poly.fit_transform(X)

    # Initialize a linear regression model
    model = LinearRegression()

    # Task 3: Use 5-fold cross-validation
    scores = cross_val_score(model, X_poly, y, cv=5, scoring='neg_mean_absolute_error')
    
    # sklearn returns negated MAE scores; flip the sign to get the MAE of each fold
    maes = -scores

    # Store the results
    results.append([degree, np.mean(maes)])

# Task 4: Display the results in a table format
print("Degree | Mean MAE")
for result in results:
    print(f"{result[0]:6} | {result[1]:9.6f}")


Degree | Mean MAE
     0 | 66.045624
     1 | 44.276499
     2 | 46.612882
     3 | 342.740488
     4 | 303.158461
     5 | 295.686026
     6 | 295.631865
     7 | 295.630403
     8 | 295.580633
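The lowest-error degree can also be picked programmatically. This is a minimal sketch that assumes the `[degree, mean_mae]` rows produced by the cell above, hard-coded here so the snippet runs on its own:

```python
# Mean cross-validated MAE per degree, copied from the output above
results = [
    [0, 66.045624],
    [1, 44.276499],
    [2, 46.612882],
    [3, 342.740488],
    [4, 303.158461],
    [5, 295.686026],
    [6, 295.631865],
    [7, 295.630403],
    [8, 295.580633],
]

# min() compares rows by their second element (mean MAE); lower is better
best_degree, best_mae = min(results, key=lambda row: row[1])
print(f"Best degree: {best_degree} (mean MAE {best_mae:.2f})")
```

With these numbers it prints `Best degree: 1 (mean MAE 44.28)`.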


2. Construct a table summarizing the cross-validation results. Each model should have a separate row in the table. Include the R-Squared, Mean Absolute Error (MAE) and MAPE metrics for each model. Calculate the mean value and standard deviation of these metrics from the cross-validation. Include both values. (2 points)

# Task 2: Summarize the cross-validation results.
# For every polynomial degree, compute R-squared, MAE and MAPE on each of the
# 5 folds, then report the mean and standard deviation across those folds.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate
from sklearn.preprocessing import PolynomialFeatures

X, y = load_diabetes(as_frame=True, return_X_y=True, scaled=True)

# sklearn reports error metrics as negated scores (higher is better),
# so MAE and MAPE are flipped back to positive values below.
scoring = {
    "r2": "r2",
    "mae": "neg_mean_absolute_error",
    "mape": "neg_mean_absolute_percentage_error",
}

rows = []
for degree in range(9):
    X_poly = PolynomialFeatures(degree=degree).fit_transform(X)
    # Same 5 folds as in part 1, so the MAE means should match that table
    cv = cross_validate(LinearRegression(), X_poly, y, cv=5, scoring=scoring)
    r2 = cv["test_r2"]
    mae = -cv["test_mae"]
    mape = -cv["test_mape"]  # a fraction: 0.40 means 40 %
    rows.append([
        degree,
        f"{r2.mean():.4f} ± {r2.std():.4f}",
        f"{mae.mean():.4f} ± {mae.std():.4f}",
        f"{mape.mean():.4f} ± {mape.std():.4f}",
    ])

# Print one row per model, sizing each column to its longest entry
headers = ["Degree", "R-Squared", "MAE", "MAPE"]
table = [headers] + rows
col_widths = [max(len(str(row[i])) for row in table) for i in range(len(headers))]
for row in table:
    print("".join(str(val).ljust(width + 2) for val, width in zip(row, col_widths)))


3. Identification of the Best Model: Identify the model that exhibits the highest performance based on the R-Squared, MAE and MAPE metrics. Provide an explanation for choosing this specific model. (1 point)

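To figure out the best model:

1. R-Squared (R²): This tells us how much of the variation in the target the model explains. Higher is better; 1.0 is a perfect fit, and a negative value means the model does worse than simply predicting the mean.

2. Mean Absolute Error (MAE): MAE is the average distance between our predictions and the true values, in the units of the target. Lower is better. In the cross-validation table from part 1, degree 1 has the lowest mean MAE (44.28), with degree 2 close behind (46.61).

3. Mean Absolute Percentage Error (MAPE): MAPE measures the same distance relative to each true value, so it reads as a percentage. Again, lower is better.


So the degree 1 (plain linear) model is the winner: it has the lowest cross-validated MAE, and it is the simplest of the models that perform well. Degree 2 is a close runner-up. Degrees 3 to 8 are far worse, with mean MAEs of roughly 295 or more, which is even higher than the degree 0 model (66.05) that predicts the same value for every patient. The R-Squared and MAPE columns of the summary table should be checked to confirm this ranking before settling on the model.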
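To make the three definitions concrete, here is a minimal NumPy sketch of each metric. The toy arrays are made up purely for illustration (they are not diabetes data), and MAPE is returned as a fraction, the same convention sklearn's `mean_absolute_percentage_error` uses:

```python
import numpy as np

def r_squared(y_true, y_pred):
    # 1 - (residual sum of squares) / (total sum of squares around the mean)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    # Average absolute error, in the units of the target
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    # Average absolute error relative to each true value (assumes no zeros in y_true)
    return np.mean(np.abs((y_true - y_pred) / y_true))

# Toy values for illustration only
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 210.0, 240.0])

print(f"R2   = {r_squared(y_true, y_pred):.4f}")  # 1 - 400/12500 = 0.9680
print(f"MAE  = {mae(y_true, y_pred):.4f}")        # every error is 10, so 10.0000
print(f"MAPE = {mape(y_true, y_pred):.4f}")       # mean of 10/100, 10/150, 10/200, 10/250
```

Every prediction is off by exactly 10, so MAE is the same for all four points, while MAPE weighs the error on the smallest true value (100) most heavily.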

4. Additional analysis and interpretation of the models' performances. You may explore further insights beyond the required metrics. The analysis should provide at least one relevant insight about the choice of the best model, or about characteristics of the chosen one (for example, an analysis of the instances in which it fails). (1 point)

Just look at the table. The degree 8 model is the most flexible one, so it might look like our champion. But cross-validation says otherwise: its mean MAE is 295.58, compared with 44.28 for degree 1.

The reason is that the high-degree models try too hard. It's like a chef using too many ingredients, making the dish complicated.

So, here's the twist:
The degree 8 model is like a risky game. It can fit the training folds very closely, but cross-validation scores it on the fold it has not seen, and there it stumbles badly. That is overfitting, and the breakdown already starts at degree 3, which has the worst mean MAE of all (342.74). Every model from degree 3 upward does worse than the degree 0 model (66.05), which just predicts a constant.

So, the safer choice is the degree 1 model, or degree 2 if a little curvature is wanted. They're the cool, balanced folks: they do well and don't try to be too fancy.

In simple words, the degree 8 model is like a high-speed car: amazing on a smooth road (the training data) but struggling on a bumpy one (new data). Degree 1 and 2 models are reliable all-terrain vehicles that do well on all roads. So the degree 8 model is not the best choice if you care about handling new data. Everything still depends on the situation and how much risk you're willing to take.
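One concrete reason the high-degree models break down is the number of columns `PolynomialFeatures` creates. This minimal sketch assumes the default `include_bias=True` and the 5-fold split used above. It counts the polynomial terms per degree with the closed form C(n + d, d) for n input features, then compares that count with the roughly 353 rows each training fold gets from the 442-row diabetes dataset:

```python
from math import comb

N_FEATURES = 10                  # input columns in the diabetes dataset
N_SAMPLES = 442                  # rows in the diabetes dataset
TRAIN_ROWS = N_SAMPLES * 4 // 5  # approx. rows used for fitting in each of 5 folds

# PolynomialFeatures (with the bias column) outputs every monomial of total
# degree <= d; for n input features there are C(n + d, d) of them.
n_terms_by_degree = {d: comb(N_FEATURES + d, d) for d in range(9)}

print("Degree |  Terms")
for degree, n_terms in n_terms_by_degree.items():
    note = "  <- more terms than training rows" if n_terms > TRAIN_ROWS else ""
    print(f"{degree:6} | {n_terms:6}{note}")
```

From degree 4 on, the least-squares problem has more unknowns than equations, so the fit can match a training fold almost exactly while generalizing poorly. Degree 3, with 286 terms for about 353 rows, is already close to that limit, which lines up with the jump in MAE in the part 1 table.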