# Assignment 2: Regression, Multi-class, and Hyper-parameter Tuning

### Task 1: Regression Metrics (30 points total)

The code below executes the following steps:
* Load the California Housing dataset from sklearn.
* Split the dataset into training and testing sets.
* Train a linear regression model on the training data.

It is your task to:
* Evaluate the model's performance using Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared metrics.
* Print the evaluation results.
* Interpret the results and discuss how each metric reflects the performance of a regression model.
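
For reference, the three metrics have the following standard definitions. The notation is an assumption and does not come from the assignment: $y_i$ is the true value, $\hat{y}_i$ the prediction, $\bar{y}$ the mean of the true values, and $n$ the number of test samples.

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```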

In [2]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Load the California Housing dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = model.predict(X_test)

# Evaluate model performance
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Print results so we can interpret them
print(f"Mean Absolute Error (MAE): {mae}")
print(f"Mean Squared Error (MSE): {mse}")
print(f"R-squared (R2): {r2}")

Mean Absolute Error (MAE): 0.533200130495698
Mean Squared Error (MSE): 0.5558915986952422
R-squared (R2): 0.5757877060324524


**Question: Interpret the results. How might we interpret the model performance and communicate it to stakeholders? (20 points)**



*Your Answer:*
Interpretation of Results

Mean Absolute Error (MAE): 0.53

MAE is the average absolute difference between the predicted and actual values. The target in this dataset is the median house value of a district in units of $100,000, so on average the model's predictions are off by about 0.53 units, or roughly $53,000. This gives a sense of the typical size of an error.

Mean Squared Error (MSE): 0.56

MSE squares the errors before averaging them, so larger errors carry more weight. Because it is in squared units it is hard to read directly; its square root (RMSE, about 0.75, or roughly $75,000) is back on the scale of the target. RMSE being noticeably larger than MAE suggests that some predictions miss by much more than the typical error.

R-squared (R2): 0.58

R-squared is the proportion of variance in the target that the model explains. A value of 0.58 means the model captures about 58% of the variability in house values on the test set; the remaining 42% is unexplained.

For stakeholders: the model captures a bit more than half of the variation in district house values and is typically off by around $53,000 per prediction, so it is suitable for rough estimates rather than precise valuations.
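
To make the error metrics easier to communicate, the sketch below converts the reported values into dollar terms. It assumes the California Housing target is expressed in units of $100,000 and reuses the MAE and MSE printed above rather than refitting the model; the constant name `UNIT_IN_DOLLARS` is introduced here for illustration.

```python
import math

# Test-set metrics as reported in the output above
mae = 0.533200130495698
mse = 0.5558915986952422

# RMSE is the square root of MSE, so it is back in the target's own units
rmse = math.sqrt(mse)

# Assumption: the target is expressed in units of $100,000
UNIT_IN_DOLLARS = 100_000

print(f"MAE:  {mae:.3f} (about ${mae * UNIT_IN_DOLLARS:,.0f})")
print(f"RMSE: {rmse:.3f} (about ${rmse * UNIT_IN_DOLLARS:,.0f})")
```

RMSE is never smaller than MAE, and a wide gap between the two is a hint that a few large errors are inflating the squared metric.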

### Task 2: Multiclass Classification Metrics (30 points total)

The code below executes the following steps:
* Load the Iris dataset from scikit-learn.
* Split the dataset into training and testing sets.
* Train a multiclass classification model, logistic regression, on the training data.

It is your task to:
* Evaluate the model's performance using precision, recall, and F1 score.
* Visualize a confusion matrix.
* Print the evaluation results.
* Interpret the results and discuss how each metric reflects the performance of a classification model.

In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = model.predict(X_test)

# Evaluate model performance
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

# Print results so we can interpret them
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1 Score: {f1}")

# Visualize a confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)


Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]


**Question: Interpret the results. How might we interpret the model performance and communicate it to stakeholders? (20 points)**


*Your Answer:*

Interpretation of Results

Precision: 1.0

Precision is the share of predictions for a class that were actually that class. It is computed per class and then averaged, weighted by class size. A precision of 1.0 means every prediction the model made on the test set was correct (no false positives for any class).

Recall: 1.0

Recall is the share of actual members of a class that the model identified. A recall of 1.0 means the model found every test sample of every class (no false negatives).

F1 Score: 1.0

The F1 score is the harmonic mean of precision and recall, balancing the two. An F1 score of 1.0 indicates perfect precision and recall.

Confusion Matrix:

In the confusion matrix, rows are the true classes and columns are the predicted classes. All 30 test samples (10, 9, and 11 per class) lie on the diagonal, so no sample was misclassified.

For stakeholders: the model classified every flower in the test set correctly. Because the test set contains only 30 samples, this shows the model works very well on this data, but it is not a guarantee of perfect accuracy on new data.
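
To show how the summary metrics relate to the confusion matrix, here is a minimal sketch in plain Python that derives per-class precision and recall from the matrix printed above, then forms the support-weighted averages. It assumes scikit-learn's convention that rows are true classes and columns are predicted classes; the variable names are illustrative.

```python
# Confusion matrix from the output above: rows are true classes, columns are predicted classes
conf_matrix = [
    [10, 0, 0],
    [0, 9, 0],
    [0, 0, 11],
]

n_classes = len(conf_matrix)
per_class = []
for k in range(n_classes):
    true_positives = conf_matrix[k][k]
    predicted_k = sum(conf_matrix[i][k] for i in range(n_classes))  # column sum
    actual_k = sum(conf_matrix[k])  # row sum (the class's support)
    precision_k = true_positives / predicted_k if predicted_k else 0.0
    recall_k = true_positives / actual_k if actual_k else 0.0
    per_class.append((precision_k, recall_k, actual_k))
    print(f"class {k}: precision={precision_k:.2f}, recall={recall_k:.2f}, support={actual_k}")

# Weighted average: each class contributes in proportion to its support
total = sum(support for _, _, support in per_class)
weighted_precision = sum(p * s for p, _, s in per_class) / total
weighted_recall = sum(r * s for _, r, s in per_class) / total
print(f"weighted precision={weighted_precision}, weighted recall={weighted_recall}")
```

Any off-diagonal entry would lower the precision of its column's class and the recall of its row's class, which is why the matrix is a useful companion to the single-number scores.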

### Task 3: Model Selection, Hyperparameter Tuning, and Cross-Validation (40 points total)
The code below executes the following steps:
* Load in the Iris dataset.
* Split into training and testing sets.

It is your task to:
* Implement a grid search with cross-validation to tune hyperparameters for a classification model (e.g. random forest).
* Explore different hyperparameters (e.g. number of estimators for random forest).
* Evaluate the model's performance using accuracy, precision, recall, and F1 score on the testing set.
* Print the **best hyperparameters** and evaluation results.

In [3]:
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split, cross_val_score
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the parameter grid
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}

# Create a GridSearchCV object
grid_search = GridSearchCV(SVC(), param_grid, cv=5)

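# Fit the grid search (5-fold cross-validation over the parameter grid) on the training data
grid_search.fit(X_train, y_train)

# Nested cross-validation: re-run the whole grid search inside each outer fold
# to estimate how well the tuning procedure generalizes
nested_scores = cross_val_score(grid_search, X_train, y_train, cv=5)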

# Get the best hyperparameters
hyperparameters = grid_search.best_params_

# Evaluate model performance
y_pred = grid_search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# Print the best hyperparameters and evaluation results
print("Best Hyperparameters:", hyperparameters)
print("Accuracy:", accuracy)

Best Hyperparameters: {'C': 1, 'kernel': 'linear'}
Accuracy: 1.0
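
The task also asks for precision, recall, and F1 score on the testing set, which the cell above does not report. The following is a sketch of one way to add them, reusing the same data split, parameter grid, and `SVC` model. The choice of `average="weighted"` is an assumption carried over from Task 2, not something the assignment specifies.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Same data, split, and parameter grid as the cell above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Tune hyperparameters with 5-fold cross-validation on the training set only
grid_search = GridSearchCV(SVC(), param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test set
y_pred = grid_search.predict(X_test)

# average="weighted" is an assumption carried over from Task 2
metrics = {
    "Accuracy": accuracy_score(y_test, y_pred),
    "Precision": precision_score(y_test, y_pred, average="weighted"),
    "Recall": recall_score(y_test, y_pred, average="weighted"),
    "F1 Score": f1_score(y_test, y_pred, average="weighted"),
}

print("Best Hyperparameters:", grid_search.best_params_)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

With weighted averaging, recall always equals accuracy, so the two numbers agree by construction; macro averaging would treat every class equally instead.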


### OPTIONAL Task 4: Custom Scoring Metric (20 bonus points)

In sklearn, you are not limited to the built-in scoring functions. You can create your own!

You can create a custom scoring metric in scikit-learn by defining a scoring function and then using the `make_scorer` function to wrap it as a scorer.

**For bonus points:**

* Define a custom scoring function `custom_scoring` that calculates the weighted sum of precision and recall for a binary classification problem.
* Then wrap this function using `make_scorer` to create a custom scorer `custom_scorer`.
* Use this custom scorer in cross-validation to evaluate the performance of a logistic regression model.

In [5]:
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Define your custom scoring function
def custom_scoring(y_true, y_pred, precision_weight=0.6, recall_weight=0.4):
    # Calculate the weighted sum of precision and recall for binary classification
    precision = precision_score(y_true, y_pred)
    recall = recall_score(y_true, y_pred)
    score = precision_weight * precision + recall_weight * recall

    return score

# Wrap the custom scoring function as a scorer
custom_scorer = make_scorer(custom_scoring, greater_is_better=True)



In [6]:
# THIS CODE TESTS YOUR FUNCTION
# Generate sample data
X, y = make_classification(n_samples=100, n_features=10, random_state=42)

# Create and train a model using cross-validation
model = LogisticRegression()
scores = cross_val_score(model, X, y, cv=5, scoring=custom_scorer)

# Print the custom scores obtained from cross-validation
print("Custom Scores:", scores)
print("Mean Custom Score:", scores.mean())

Custom Scores: [0.9        1.         0.85333333 0.94545455 0.96      ]
Mean Custom Score: 0.9317575757575757
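
The weights do not have to stay at their defaults. As a sketch, extra keyword arguments passed to `make_scorer` are forwarded to the scoring function, so a recall-heavy variant can be built without editing `custom_scoring`. The 0.3 / 0.7 weights and the name `recall_heavy_scorer` are illustrative assumptions, not part of the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import cross_val_score


def custom_scoring(y_true, y_pred, precision_weight=0.6, recall_weight=0.4):
    """Weighted sum of precision and recall for binary classification."""
    precision = precision_score(y_true, y_pred)
    recall = recall_score(y_true, y_pred)
    return precision_weight * precision + recall_weight * recall


# Extra keyword arguments given to make_scorer are forwarded to the scoring function.
# The 0.3 / 0.7 weights are illustrative only.
recall_heavy_scorer = make_scorer(custom_scoring, precision_weight=0.3, recall_weight=0.7)

# Same synthetic data as the test cell above
X, y = make_classification(n_samples=100, n_features=10, random_state=42)
scores = cross_val_score(LogisticRegression(), X, y, cv=5, scoring=recall_heavy_scorer)

print("Recall-heavy scores:", scores)
print("Mean:", scores.mean())
```

Keeping the weights summing to 1 keeps the score in the range 0 to 1, which makes it directly comparable with precision and recall themselves.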
