# Assignment 2: Regression, Multi-class, and Hyper-parameter Tuning

### Task 1: Regression Metrics (30 points total)

The code below executes the following steps:
* Load the California Housing dataset from sklearn.
* Split the dataset into training and testing sets.
* Train a linear regression model on the training data.

It is your task to:
* Evaluate the model's performance using Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared metrics.
* Print the evaluation results.
* Interpret the results and discuss how each metric reflects the performance of a regression model.
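
To make the three metrics concrete before applying them to the housing model, here is a minimal sketch that computes MAE, MSE, and R-squared by hand and checks them against scikit-learn. The arrays `y_true` and `y_hat` are hypothetical toy values chosen only to keep the arithmetic easy to follow; they are not drawn from the California Housing data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical toy targets and predictions (not from the housing dataset)
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.5, 5.0, 3.0, 8.0])

errors = y_true - y_hat  # [0.5, 0.0, -1.0, -1.0]

# MAE: mean of absolute errors -> (0.5 + 0 + 1 + 1) / 4 = 0.625
mae_manual = np.mean(np.abs(errors))

# MSE: mean of squared errors -> (0.25 + 0 + 1 + 1) / 4 = 0.5625
mse_manual = np.mean(errors ** 2)

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)                     # 2.25
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # 14.75
r2_manual = 1.0 - ss_res / ss_tot

print(f"MAE manual={mae_manual:.4f} sklearn={mean_absolute_error(y_true, y_hat):.4f}")
print(f"MSE manual={mse_manual:.4f} sklearn={mean_squared_error(y_true, y_hat):.4f}")
print(f"R2  manual={r2_manual:.4f} sklearn={r2_score(y_true, y_hat):.4f}")
```

MAE is in the same units as the target, MSE is in squared units and weights the two errors of size 1 more heavily than the error of size 0.5, and R-squared compares the model's squared error with that of always predicting the mean.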

In [2]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Load the California Housing dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

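# Make predictions on the testing set
y_pred = model.predict(X_test)

# Evaluate model performance
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Print the evaluation results
print(f"Mean Absolute Error (MAE): {mae:.2f}")
print(f"Mean Squared Error (MSE):  {mse:.2f}")
print(f"R-squared (R²) Score:      {r2:.2f}")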


**Question: Interpret the results. How might we interpret the model performance and communicate it to stakeholders? (20 points)**



*Your Answer:*

### Task 2: Multiclass Classification Metrics (30 points total)

The code below executes the following steps:
* Load the Iris dataset from scikit-learn.
* Split the dataset into training and testing sets.
* Train a multiclass classification model, logistic regression, on the training data.

It is your task to:
* Evaluate the model's performance using precision, recall, and F1 score.
* Visualize a confusion matrix.
* Print the evaluation results.
* Interpret the results and discuss how each metric reflects the performance of a classification model.

In [5]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = model.predict(X_test)

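# Evaluate model performance
# Note: these are regression metrics applied to integer class labels;
# they rely on the sklearn.metrics imports from the Task 1 cell.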
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Print results
print("📊 Regression Model Performance:")
print(f"Mean Absolute Error (MAE):      {mae:.2f}")
print(f"Mean Squared Error (MSE):       {mse:.2f}")
print(f"R-squared (R²) Score:           {r2:.2f}")

📊 Regression Model Performance:
Mean Absolute Error (MAE):      0.00
Mean Squared Error (MSE):       0.00
R-squared (R²) Score:           1.00


**Question: Interpret the results. How might we interpret the model performance and communicate it to stakeholders? (20 points)**


MAE measures the average size of the errors in absolute terms, so lower is better. An MAE of 0.00 means the model made no absolute error on average. MSE is similar to MAE but squares the errors, which penalizes larger errors more heavily; we use it when we want to punish big mistakes more. An MSE of 0.00 means the model had zero squared error. R-squared measures how much of the variance in the target variable is explained by the model. It is at most 1 (and can be negative for a model that fits worse than always predicting the mean), and values closer to 1 are better. An R-squared of 1.00 means the model explains 100% of the variance in the data. These perfect results are suspicious and made me wonder whether I had evaluated the model on the training data rather than the test data; however, when I checked the code above, that does not appear to be the case. Because the targets here are Iris class labels, an error of exactly zero means every test label was predicted correctly. Either way, the perfect scores are suspicious and I would recommend additional analysis.
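
The task list for this section asks for precision, recall, F1 score, and a confusion matrix, whereas the cell above reports regression metrics. A minimal sketch of the classification metrics on the same split might look like the following. The variable name `clf`, the choice of `average="macro"`, and `max_iter=1000` (used only to avoid convergence warnings) are assumptions, not part of the original cell.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Same dataset and split as the cell above
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# max_iter=1000 is an assumption to avoid convergence warnings
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Macro averaging weights the three classes equally (an assumed choice)
precision = precision_score(y_test, y_pred, average="macro")
recall = recall_score(y_test, y_pred, average="macro")
f1 = f1_score(y_test, y_pred, average="macro")

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)

print(f"Precision (macro): {precision:.2f}")
print(f"Recall (macro):    {recall:.2f}")
print(f"F1 score (macro):  {f1:.2f}")
print("Confusion matrix:")
print(cm)
```

Off-diagonal entries in the confusion matrix show which species are mistaken for which, information that a single MAE or R-squared value on class labels cannot convey. If matplotlib is available, `sklearn.metrics.ConfusionMatrixDisplay` can plot the same matrix.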

### Task 3: Model Selection, Hyperparameter Tuning, and Cross-Validation (40 points total)
The code below executes the following steps:
* Load in the Iris dataset.
* Split the dataset into training and testing sets.

It is your task to:
* Implement a grid search with cross-validation to tune hyperparameters for a classification model (e.g. random forest).
* Explore different hyperparameters (e.g. number of estimators for random forest).
* Evaluate the model's performance using accuracy, precision, recall, and F1 score on the testing set.
* Print the **best hyperparameters** and evaluation results.

In [7]:
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score, f1_score, classification_report

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 1: Define the parameter grid for Random Forest
param_grid = {
    'n_estimators': [10, 50, 100],
    'max_depth': [None, 3, 5],
    'min_samples_split': [2, 4]
}
# Step 2: Initialize RandomForestClassifier
rf = RandomForestClassifier(random_state=42)

# Step 3: Grid search with cross-validation
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Step 4: Get the best model
best_model = grid_search.best_estimator_

# Step 5: Make predictions on the test set
y_pred = best_model.predict(X_test)

# Step 6: Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='macro')  # macro = equally weighted across classes
recall = recall_score(y_test, y_pred, average='macro')
f1 = f1_score(y_test, y_pred, average='macro')

# Step 7: Print results
print("🌟 Best Hyperparameters:", grid_search.best_params_)
print("\n📊 Evaluation Metrics:")
print(f"Accuracy:  {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
print(f"F1 Score:  {f1:.2f}")

print("\n🧾 Classification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))


🌟 Best Hyperparameters: {'max_depth': None, 'min_samples_split': 2, 'n_estimators': 10}

📊 Evaluation Metrics:
Accuracy:  1.00
Precision: 1.00
Recall:    1.00
F1 Score:  1.00

🧾 Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
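
With only 30 test samples, perfect test scores say little about how the hyperparameter combinations compare with each other. A minimal sketch of inspecting the cross-validated scores stored in `cv_results_` follows. The smaller grid and the name `search` are assumptions made to keep the example quick; they are not the grid used above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Reduced grid (assumption) so the sketch runs quickly
param_grid = {"n_estimators": [10, 50], "max_depth": [None, 3]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# cv_results_ holds one entry per parameter combination
results = search.cv_results_
order = np.argsort(results["rank_test_score"])

# Print mean and standard deviation of the CV accuracy, best rank first
for i in order:
    mean = results["mean_test_score"][i]
    std = results["std_test_score"][i]
    print(f"{mean:.3f} ± {std:.3f}  {results['params'][i]}")
```

Looking at the full table shows whether the reported best combination is clearly ahead or effectively tied with others, which matters when deciding how much weight to give to `best_params_`.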



### OPTIONAL Task 4: Custom Scoring Metric (20 bonus points)

In sklearn, you are not limited to using their scoring functions. You can create your own!

You can create a custom scoring metric in scikit-learn by defining a scoring function and then using the `make_scorer` function to wrap it as a scorer.

**For bonus points:**

* Define a custom scoring function `custom_scoring` that calculates the weighted sum of precision and recall for a binary classification problem.
* Then wrap this function using `make_scorer` to create a custom scorer `custom_scorer`.
* Use this custom scorer in cross-validation to evaluate the performance of a logistic regression model.

In [8]:
from sklearn.metrics import precision_score, recall_score, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Define your custom scoring function
def custom_scoring(y_true, y_pred, precision_weight=0.6, recall_weight=0.4):
    precision = precision_score(y_true, y_pred)
    recall = recall_score(y_true, y_pred)
    score = (precision_weight * precision) + (recall_weight * recall)
    return score

# Wrap the custom scoring function as a scorer (pass additional weights as args if needed)
custom_scorer = make_scorer(custom_scoring, greater_is_better=True)

# Generate a binary classification dataset
X, y = make_classification(n_samples=100, n_features=10, random_state=42)

# Initialize the logistic regression model
model = LogisticRegression()

# Use custom scorer in cross-validation
scores = cross_val_score(model, X, y, cv=5, scoring=custom_scorer)

# Print results
print("Custom Scores from Cross-Validation:", scores)
print("Mean Custom Score:", scores.mean())

Custom Scores from Cross-Validation: [0.9        1.         0.85333333 0.94545455 0.96      ]
Mean Custom Score: 0.9317575757575757


In [9]:
# THIS CODE TESTS YOUR FUNCTION
# Generate sample data
X, y = make_classification(n_samples=100, n_features=10, random_state=42)

# Create and train a model using cross-validation
model = LogisticRegression()
scores = cross_val_score(model, X, y, cv=5, scoring=custom_scorer)

# Print the custom scores obtained from cross-validation
print("Custom Scores:", scores)
print("Mean Custom Score:", scores.mean())

Custom Scores: [0.9        1.         0.85333333 0.94545455 0.96      ]
Mean Custom Score: 0.9317575757575757
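
The comment in the scorer cell mentions passing additional weights. As a minimal sketch, extra keyword arguments given to `make_scorer` are forwarded to the scoring function, so the weights can be changed without editing `custom_scoring`. The name `recall_heavy_scorer` and the 0.3/0.7 weights are hypothetical, chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import cross_val_score


def custom_scoring(y_true, y_pred, precision_weight=0.6, recall_weight=0.4):
    """Weighted sum of precision and recall for binary labels."""
    precision = precision_score(y_true, y_pred)
    recall = recall_score(y_true, y_pred)
    return precision_weight * precision + recall_weight * recall


# Keyword arguments after greater_is_better are passed on to custom_scoring;
# the 0.3 / 0.7 weights are illustrative assumptions
recall_heavy_scorer = make_scorer(
    custom_scoring,
    greater_is_better=True,
    precision_weight=0.3,
    recall_weight=0.7,
)

# Same synthetic binary dataset as in the cells above
X, y = make_classification(n_samples=100, n_features=10, random_state=42)

scores = cross_val_score(
    LogisticRegression(), X, y, cv=5, scoring=recall_heavy_scorer
)

print("Recall-heavy scores:", scores)
print("Mean recall-heavy score:", scores.mean())
```

Keeping the weights summing to 1 keeps the score in the range 0 to 1, which makes it easier to compare against plain precision or recall.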
