## Module 9 Lab 1: Hyperparameter Tuning Using Grid Search

**Objective:**
The objective of this assignment is to practice using the `GridSearchCV` class in scikit-learn for hyperparameter tuning. You will implement a grid search to find the optimal hyperparameters for a K-nearest neighbors (KNN) regression model using the California Housing Prices dataset.

**Instructions:**

**Hyperparameter Tuning and Grid Search:** Hyperparameter tuning is the process of finding the optimal values for the hyperparameters of a machine learning model to improve its performance. Hyperparameters are settings that are not learned from the data but are set before training the model. They control the behavior of the model and can significantly impact its performance. In contrast, parameters are learned from the data during the model training process. They are the coefficients or weights that the model adjusts to make predictions. Hyperparameter tuning helps find the best configuration for a model to achieve optimal performance on a given task.
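The distinction between hyperparameters and parameters can be seen directly in scikit-learn. The following is a minimal sketch on synthetic data (the `make_regression` data and the `knn_demo` and `lin_demo` names are illustrative assumptions, not part of the lab): hyperparameters are readable before any training through `get_params()`, while learned parameters such as `coef_` exist only after `fit()`.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (an assumption; this is not the lab's dataset)
X, y = make_regression(n_samples=100, n_features=3, noise=0.1, random_state=0)

# Hyperparameters: chosen before training and visible via get_params()
knn_demo = KNeighborsRegressor(n_neighbors=5, weights="uniform", p=2)
knn_hyperparams = knn_demo.get_params()
print(knn_hyperparams["n_neighbors"], knn_hyperparams["weights"], knn_hyperparams["p"])

# Parameters: learned from the data, so they only exist after fit()
unfitted = LinearRegression()
print(hasattr(unfitted, "coef_"))   # no coefficients before training
lin_demo = LinearRegression().fit(X, y)
print(lin_demo.coef_.shape)         # one learned coefficient per feature
```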

**Implementation of GridSearchCV for KNN Regression:** In this assignment, you will complete the incomplete code for a grid search using the `GridSearchCV` class in scikit-learn to find the optimal hyperparameters for a KNN regression model. The KNN regression model predicts continuous target variables based on the values of the nearest neighbors in the feature space.

**Analyzing the Results:** Analyze the results obtained from the grid search. Discuss the impact of the best hyperparameters found on the model's performance. Compare the results with the default hyperparameter settings of the KNN regression model. Interpret the best score obtained and assess the performance improvement achieved through hyperparameter tuning.

**Compare and Analyze the Results:** Once you have performed RandomizedSearchCV, compare and analyze the results with those obtained from GridSearchCV. Evaluate the impact of the alternative search method on the model's performance and the time taken for hyperparameter tuning. 

**Conclusion and Discussion:** Conclude your assignment by summarizing the key findings from both the GridSearchCV and RandomizedSearchCV. 

In [48]:
# Use the %pip magic so packages install into the running kernel's environment
%pip install --upgrade pip
%pip install scikit-optimize





In [49]:
import sklearn 
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsRegressor

**Loading the California Housing Dataset:** Start by loading the California Housing Prices dataset. You can use the `fetch_california_housing()` function from scikit-learn's datasets module to load the dataset. The dataset contains information about housing prices in different regions of California. It includes various features such as the average number of rooms, population, and median income. Separate the features and target variables appropriately.

In [51]:
# Load the California Housing dataset
data = fetch_california_housing() 
features = data.data
targets = data.target

**Setting up the GridSearchCV Object:** Initialize the KNN regressor model (`KNeighborsRegressor()`) and define a dictionary of hyperparameters. The hyperparameters dictionary should include the values you want to explore during the grid search. Then, set up the `GridSearchCV` object using the defined KNN regressor model, the hyperparameters dictionary, the scoring metric (use `'neg_mean_absolute_error'` for mean absolute error), and the number of CPU cores to utilize (`n_jobs=-1` to use all available CPU cores).

In [53]:
# Define the KNN regressor model
knn = KNeighborsRegressor() 

# Define the hyperparameters dictionary
hyperparameters = {'n_neighbors': [3, 5, 7],
                   'weights': ['uniform', 'distance'],
                   'p': [1, 2]}
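
To see how much work this dictionary implies, the combinations can be enumerated with `ParameterGrid`. This is a small sketch; it assumes the default 5-fold cross-validation that `GridSearchCV` uses when `cv` is not specified, and the `n_candidates` and `total_fits` names are illustrative.

```python
from sklearn.model_selection import ParameterGrid

hyperparameters = {'n_neighbors': [3, 5, 7],
                   'weights': ['uniform', 'distance'],
                   'p': [1, 2]}

# Every combination the grid search will evaluate
grid = ParameterGrid(hyperparameters)
n_candidates = len(grid)                      # 3 * 2 * 2 combinations

# Assumption: cv is left at its default of 5 folds
default_cv_folds = 5
total_fits = n_candidates * default_cv_folds  # excludes the final refit on all data

print(n_candidates, total_fits)
for params in list(grid)[:3]:
    print(params)
```

Knowing the number of fits in advance helps when deciding whether a grid is small enough to search exhaustively.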

**Fitting the GridSearchCV Object:** Fit the `GridSearchCV` object to the features and target variables using the `fit()` method. This will perform cross-validation with the specified hyperparameter combinations and evaluate the models using the negative mean absolute error (MAE) as the scoring metric.


In [55]:
# Set up the GridSearchCV object
gs = GridSearchCV(knn, hyperparameters, scoring='neg_mean_absolute_error', n_jobs=-1)

# Fit the GridSearchCV object to the data
gs.fit(features, targets)
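
After fitting, the per-combination scores are available in `cv_results_`, not just the single best result. The sketch below uses synthetic data so it runs quickly without downloading anything (the data and the `gs_demo` name are assumptions for illustration); the same attribute access would apply to the `gs` object above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (an assumption; not the California Housing data)
X_demo, y_demo = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

param_grid = {'n_neighbors': [3, 5, 7],
              'weights': ['uniform', 'distance'],
              'p': [1, 2]}

gs_demo = GridSearchCV(KNeighborsRegressor(), param_grid,
                       scoring='neg_mean_absolute_error', cv=5)
gs_demo.fit(X_demo, y_demo)

# cv_results_ holds one entry per hyperparameter combination
results = gs_demo.cv_results_
order = np.argsort(results['rank_test_score'])  # best-ranked first

# Show the three best combinations with their mean cross-validated scores
for i in order[:3]:
    print(results['params'][i], round(results['mean_test_score'][i], 3))
```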

**Printing the Best Results:** Once the grid search is complete, print the best estimator, best parameters, and best score obtained from the grid search. The best estimator represents the KNN model with the optimal hyperparameters found during the grid search. The best parameters indicate the specific values of the hyperparameters that yielded the best performance. The best score is the mean cross-validated score of the best model. Because the scoring metric is `'neg_mean_absolute_error'`, this value is the negative of the mean absolute error, so scores closer to zero are better.

In [57]:
# Print the best estimator, best parameters, and best score
print("Best Estimator:", gs.best_estimator_)
print("Best Parameters:", gs.best_params_)
print("Best Score:", gs.best_score_)

Best Estimator: KNeighborsRegressor(n_neighbors=7, p=1, weights='distance')
Best Parameters: {'n_neighbors': 7, 'p': 1, 'weights': 'distance'}
Best Score: -0.8171788685661803
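
The negative score can be converted back into an error in the target's units. The California Housing target is the median house value expressed in hundreds of thousands of dollars, so a short sketch of the arithmetic (variable names are illustrative) looks like this:

```python
# Best score printed by the grid search above (negative MAE)
best_score = -0.8171788685661803

# Flip the sign to recover the mean absolute error
mae = -best_score

# The target is in units of 100,000 dollars
mae_dollars = mae * 100_000

print(f"MAE: {mae:.4f} (about ${mae_dollars:,.0f})")
```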


**Exploring Alternative Hyperparameter Search Methods:** In addition to GridSearchCV, there are other hyperparameter search methods available in scikit-learn that you can explore. RandomizedSearchCV is a popular alternative. RandomizedSearchCV performs a randomized search over the specified hyperparameter distributions.

**Try RandomizedSearchCV:** As an extension to this assignment, try implementing RandomizedSearchCV with the same KNN regression model and California Housing dataset. Follow similar steps as in the GridSearchCV implementation, but substitute the respective search class and modify the necessary parameters. Experiment with different hyperparameter distributions or search strategies.

In [59]:
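# Try RandomizedSearchCV on the same model and search space.
# The grid has 3 * 2 * 2 = 12 combinations; because every entry is a list,
# n_iter=10 samples 10 of those combinations without replacement.
random_search = RandomizedSearchCV(knn, hyperparameters, scoring='neg_mean_absolute_error', n_jobs=-1, n_iter=10)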
random_search.fit(features, targets)

print("\nRandomized Search - Best Estimator:", random_search.best_estimator_)
print("Randomized Search - Best Parameters:", random_search.best_params_)
print("Randomized Search - Best Score (Negative MAE):", random_search.best_score_)


Randomized Search - Best Estimator: KNeighborsRegressor(n_neighbors=7, p=1, weights='distance')
Randomized Search - Best Parameters: {'weights': 'distance', 'p': 1, 'n_neighbors': 7}
Randomized Search - Best Score (Negative MAE): -0.8171788685661803
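
The instructions also suggest experimenting with hyperparameter distributions. One possible variation, sketched here on synthetic data, draws `n_neighbors` from a range of integers instead of a fixed list. The range of 1 to 30, the `random_state` value, and the `rs_demo` name are assumptions chosen for illustration, not settings used in this lab.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (an assumption; not the California Housing data)
X_demo, y_demo = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

# Lists are sampled uniformly; randint(1, 31) draws integers from 1 to 30
param_distributions = {
    'n_neighbors': randint(1, 31),
    'weights': ['uniform', 'distance'],
    'p': [1, 2],
}

rs_demo = RandomizedSearchCV(
    KNeighborsRegressor(),
    param_distributions,
    n_iter=10,                          # number of sampled configurations
    scoring='neg_mean_absolute_error',
    cv=5,
    random_state=42,                    # fixed seed so the sampled configurations repeat
)
rs_demo.fit(X_demo, y_demo)

print(rs_demo.best_params_, rs_demo.best_score_)
```

Setting `random_state` makes the sampled configurations reproducible between runs, which simplifies comparisons with a grid search.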


# Analyzing the Results

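GridSearchCV selected 7 neighbors, the Manhattan distance (`p=1`), and distance-based weighting (`weights='distance'`), with a best score of -0.8172. Because the scoring metric is negative MAE, this corresponds to a mean cross-validated MAE of about 0.817. The target is expressed in hundreds of thousands of dollars, so the typical error is roughly 82,000 dollars. The default KNN settings (`n_neighbors=5`, `weights='uniform'`, `p=2`) are one of the 12 combinations in the grid, so the tuned configuration scores at least as well as the default under the same cross-validation. The size of that improvement was not measured in this notebook, because the default model's score was not printed separately. The best `n_neighbors` value, 7, is also the largest value in the grid, which suggests that larger values may be worth testing.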
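
A comparison against the default settings can be made explicit by scoring an untuned `KNeighborsRegressor()` with the same metric and folds. The sketch below uses synthetic data as a stand-in (an assumption, as are the `default_mae` and `tuned_mae` names); on the lab data, `features` and `targets` would take the place of `X_demo` and `y_demo`.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (an assumption; not the California Housing data)
X_demo, y_demo = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

# Baseline: default hyperparameters (n_neighbors=5, weights='uniform', p=2)
default_scores = cross_val_score(KNeighborsRegressor(), X_demo, y_demo,
                                 scoring='neg_mean_absolute_error', cv=5)
default_mae = -default_scores.mean()

# Tuned: same grid as the lab, same metric, same 5 folds
param_grid = {'n_neighbors': [3, 5, 7],
              'weights': ['uniform', 'distance'],
              'p': [1, 2]}
gs_demo = GridSearchCV(KNeighborsRegressor(), param_grid,
                       scoring='neg_mean_absolute_error', cv=5)
gs_demo.fit(X_demo, y_demo)
tuned_mae = -gs_demo.best_score_

print(f"Default MAE: {default_mae:.3f}")
print(f"Tuned MAE:   {tuned_mae:.3f}")
print(f"Improvement: {default_mae - tuned_mae:.3f}")
```

Because the default configuration is itself a point in this grid, the tuned MAE cannot be worse than the default MAE when both use the same folds.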

# Compare and Analyze the Results

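RandomizedSearchCV returned the same best estimator and score as GridSearchCV. This is expected here: the grid contains only 12 combinations, and with `n_iter=10` the randomized search evaluated 10 of them, so it had a 10/12 (about 83%) chance of including the best one. Run time was not measured in this notebook, and with 10 of 12 candidates evaluated the savings would be small in any case. The efficiency advantage of randomized search appears with large search spaces or continuous distributions, where evaluating every combination is impractical.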
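
The time taken by each search method can be measured with a simple wall-clock timer. This is a rough sketch on synthetic data (the data, the `random_state` value, and all variable names are assumptions); timings vary by machine, so no particular values are implied.

```python
import time

from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (an assumption; not the California Housing data)
X_demo, y_demo = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

param_grid = {'n_neighbors': [3, 5, 7],
              'weights': ['uniform', 'distance'],
              'p': [1, 2]}

# Time the exhaustive grid search
start = time.perf_counter()
gs_demo = GridSearchCV(KNeighborsRegressor(), param_grid,
                       scoring='neg_mean_absolute_error', cv=5)
gs_demo.fit(X_demo, y_demo)
grid_seconds = time.perf_counter() - start

# Time the randomized search over the same space
start = time.perf_counter()
rs_demo = RandomizedSearchCV(KNeighborsRegressor(), param_grid, n_iter=10,
                             scoring='neg_mean_absolute_error', cv=5,
                             random_state=0)
rs_demo.fit(X_demo, y_demo)
random_seconds = time.perf_counter() - start

n_grid = len(gs_demo.cv_results_['params'])
n_random = len(rs_demo.cv_results_['params'])
print(f"Grid:   {n_grid} candidates, {grid_seconds:.2f} s, best {gs_demo.best_score_:.4f}")
print(f"Random: {n_random} candidates, {random_seconds:.2f} s, best {rs_demo.best_score_:.4f}")
```

Since the randomized search samples from the same 12 combinations and uses the same folds, its best score can match but not exceed the grid search's best score.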

# Conclusion and Discussion

Both search methods selected the same KNN configuration (`n_neighbors=7`, `p=1`, `weights='distance'`), with a mean cross-validated MAE of about 0.817. This tuned model provides a baseline for further analyses of California housing prices.