
Parameters are the values that a Machine Learning model learns from the training data, such as the coefficients of a linear regression or the split thresholds of a decision tree. Hyperparameters are the settings that the user specifies before training, such as the number of trees in a random forest. Hyperparameters are therefore fixed before the parameters are learned, and they control how the algorithm arrives at those parameters.
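
To make the distinction concrete, here is a minimal sketch using a one-feature ridge regression without an intercept. The function `fit_slope`, the toy data, and the penalty name `alpha` are illustrative assumptions and not part of this notebook's datasets. The slope is a parameter learned from the data, while `alpha` is a hyperparameter chosen before fitting.

```python
def fit_slope(x, y, alpha):
    """Closed-form ridge slope for y ~ w * x (no intercept).

    alpha is the hyperparameter (set by the user before fitting);
    the returned slope w is the parameter (learned from the data).
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + alpha)


# Toy data that follows y = 2 * x exactly
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

for alpha in (0.0, 30.0):
    print(f"alpha={alpha}: learned slope = {fit_slope(x, y, alpha)}")
```

With no penalty the learned slope is 2.0; with `alpha=30.0` the same data gives a slope of 1.0, so changing the hyperparameter changes the parameter that gets learned.
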

**Grid Search** evaluates every combination of the specified hyperparameter values, measures the model's performance for each combination, and selects the combination that performs best. Because the number of combinations is the product of the number of candidate values for each hyperparameter, the search becomes time-consuming and computationally expensive as more hyperparameters are added.
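
The cost of an exhaustive search can be seen by enumerating a grid by hand. The sketch below uses only the standard library and a small made-up grid (the names and values in `param_grid` are assumptions for illustration); it lists every combination in the order a grid search could visit them.

```python
from itertools import product

# Hypothetical grid: 3 values * 2 values = 6 combinations
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [10, 20],
}

names = list(param_grid)
# Cartesian product of all candidate values, one dict per combination
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

for combo in combinations:
    print(combo)

print("Number of combinations:", len(combinations))
```

Adding a third hyperparameter with three candidate values would triple the count to 18, which is why large grids get expensive quickly.
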

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from google.colab import drive
drive.mount('/content/drive')
df1 = pd.read_csv('/content/drive/MyDrive/test.csv')
df2 = pd.read_csv('/content/drive/MyDrive/Used Car Dataset.csv')

Mounted at /content/drive


In [None]:
df1.columns

Index(['id', 'battery_power', 'blue', 'clock_speed', 'dual_sim', 'fc',
       'four_g', 'int_memory', 'm_dep', 'mobile_wt', 'n_cores', 'pc',
       'px_height', 'px_width', 'ram', 'sc_h', 'sc_w', 'talk_time', 'three_g',
       'touch_screen', 'wifi'],
      dtype='object')

In [None]:
df2.columns

Index(['Unnamed: 0', 'car_name', 'registration_year', 'insurance_validity',
       'fuel_type', 'seats', 'kms_driven', 'ownsership', 'transmission',
       'manufacturing_year', 'mileage(kmpl)', 'engine(cc)', 'max_power(bhp)',
       'torque(Nm)', 'price(in lakhs)'],
      dtype='object')

In [None]:
# Show the data type of every column
df1.dtypes

id                 int64
battery_power      int64
blue               int64
clock_speed      float64
dual_sim           int64
fc                 int64
four_g             int64
int_memory         int64
m_dep            float64
mobile_wt          int64
n_cores            int64
pc                 int64
px_height          int64
px_width           int64
ram                int64
sc_h               int64
sc_w               int64
talk_time          int64
three_g            int64
touch_screen       int64
wifi               int64
dtype: object

In [None]:
df2.dtypes

Unnamed: 0              int64
car_name               object
registration_year      object
insurance_validity     object
fuel_type              object
seats                   int64
kms_driven              int64
ownsership             object
transmission           object
manufacturing_year     object
mileage(kmpl)         float64
engine(cc)            float64
max_power(bhp)        float64
torque(Nm)            float64
price(in lakhs)       float64
dtype: object

**REGRESSION using grid search cv**

Well-chosen hyperparameters can improve model performance, so finding the optimal values helps us achieve the best-performing model. This notebook covers Hyperparameters, Grid Search, Cross-Validation, GridSearchCV, and the tuning of Hyperparameters in Python.

Hyperparameters for a model can be chosen using several techniques, such as Manual Search, Random Search, Grid Search, and Bayesian Optimization. Here we use GridSearchCV, which combines Grid Search with cross-validation: every hyperparameter combination is scored on several train/validation splits of the training data, and the combination with the best average score is kept.
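
The CV in GridSearchCV stands for cross-validation. As a rough sketch of the idea (the helper `k_fold_indices` is a hypothetical name, and real implementations such as scikit-learn's `KFold` offer shuffling and other options), the training rows are cut into k folds and each fold takes one turn as the validation set:

```python
def k_fold_indices(n_samples, k):
    """Split row indices 0..n_samples-1 into k (train, validation) pairs."""
    # The first n_samples % k folds get one extra row so that sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, validation))
        start = stop
    return folds


folds = k_fold_indices(10, 5)
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Every row appears in exactly one validation fold, so each hyperparameter combination is judged on all of the training data without ever being scored on rows it was fitted on.
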

In [None]:
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, mean_squared_error

# One-hot encode categorical columns (df1 is entirely numeric, so it is left unchanged)
data_encoded = pd.get_dummies(df1)

# Separate features and target variable
X_regression = data_encoded.drop('talk_time', axis=1)
y_regression = data_encoded['talk_time']

# Hold out 20% of the rows as a test set
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_regression, y_regression, test_size=0.2, random_state=42)

rf_reg = RandomForestRegressor()

# 3 * 4 * 3 * 3 = 108 combinations, each fitted on 5 folds (540 fits plus one final refit)
param_grid_reg = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# With no `scoring` argument, candidates are ranked by the regressor's default score (R^2)
grid_search_reg = GridSearchCV(estimator=rf_reg, param_grid=param_grid_reg, cv=5)
grid_search_reg.fit(X_train_reg, y_train_reg)

In [None]:
best_reg = grid_search_reg.best_estimator_
print("Best parameters for regression:", grid_search_reg.best_params_)
y_pred_reg = best_reg.predict(X_test_reg)
mse = mean_squared_error(y_test_reg, y_pred_reg)
print("Mean Squared Error on test set (regression):", mse)

Best parameters for regression: {'max_depth': 10, 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}
Mean Squared Error on test set (regression): 27.189829377772067
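
Note that the search above selects the best combination by the estimator's default score, which for a regressor is R², while the final report uses mean squared error. If the two should match, `GridSearchCV` accepts a `scoring` argument. The sketch below shows this on synthetic data from `make_regression` (an assumption for illustration, not the notebook's dataset), with a deliberately small grid so that it runs quickly.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (not the notebook's dataset)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [10, 20], "max_depth": [None, 5]},
    scoring="neg_mean_squared_error",  # scorers are maximized, so MSE is negated
    cv=3,
)
search.fit(X_train, y_train)

# search.predict uses the best estimator refitted on the whole training set
test_mse = mean_squared_error(y_test, search.predict(X_test))

print("Best parameters:", search.best_params_)
print("Best cross-validated MSE:", -search.best_score_)
print("Test MSE:", test_mse)
```

Fixing `random_state` on the estimator, as done here, also makes the selected parameters repeatable from run to run.
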


**REGRESSION on the used car dataset using grid search cv**

In [None]:
# Encode the used car dataset. Assumed fix: the data_encoded built earlier came
# from df1, which has no 'mileage(kmpl)' column, so it is rebuilt from df2 here.
data_encoded = pd.get_dummies(df2)

# Remove rows with NaN values
data_encoded = data_encoded.dropna()

# Separate features and target variable
X_regression = data_encoded.drop('mileage(kmpl)', axis=1)
y_regression = data_encoded['mileage(kmpl)']

# Hold out 20% of the rows as a test set
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_regression, y_regression, test_size=0.2, random_state=42)

# 3 * 4 * 2 * 3 = 72 combinations, each fitted on 3 folds
rf_reg = RandomForestRegressor()
param_grid_reg = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 2, 4]
}

grid_search_reg = GridSearchCV(estimator=rf_reg, param_grid=param_grid_reg, cv=3)
grid_search_reg.fit(X_train_reg, y_train_reg)

# Evaluate the best model (refitted on the full training set) on the held-out test set
best_reg = grid_search_reg.best_estimator_
print("Best parameters for regression:", grid_search_reg.best_params_)

y_pred_reg = best_reg.predict(X_test_reg)
mse = mean_squared_error(y_test_reg, y_pred_reg)
print("Mean Squared Error on test set (regression):", mse)


Best parameters for regression: {'max_depth': 30, 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 300}
Mean Squared Error on test set (regression): 22508.54122607509
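
The imports earlier in this notebook include `RandomForestClassifier` and `accuracy_score`, which the cells shown never use. For a classification task the same grid search pattern applies, with a classifier and a categorical target. The following is a sketch on synthetic data from `make_classification`; the data and the small grid are assumptions for illustration, since the notebook does not show which column would serve as the class label.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data with two classes (not the notebook's dataset)
X, y = make_classification(n_samples=200, n_features=8, n_informative=4, random_state=0)
X_train_clf, X_test_clf, y_train_clf, y_test_clf = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid_clf = {
    "n_estimators": [10, 20],
    "max_depth": [None, 5],
}

# For classifiers, the default score used to rank candidates is accuracy
grid_search_clf = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=param_grid_clf,
    cv=3,
)
grid_search_clf.fit(X_train_clf, y_train_clf)

best_clf = grid_search_clf.best_estimator_
accuracy = accuracy_score(y_test_clf, best_clf.predict(X_test_clf))

print("Best parameters for classification:", grid_search_clf.best_params_)
print("Accuracy on test set (classification):", accuracy)
```
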
