Here, we use KNeighborsRegressor as the base model for RandomizedSearchCV, because the target column, ph, is continuous. The hyperparameter search space covers the number of neighbors (n_neighbors), the weight function (weights), and the power parameter of the Minkowski distance (p), which selects the distance metric.

n_neighbors takes every integer from 1 to 50 (np.arange(1, 51) excludes the upper bound). The usual advice to pick an odd number of neighbors exists to avoid tied votes in KNN classification; a regressor averages the neighbors' target values, so ties are not a concern here. The weight function is either uniform or distance, and p is either 1 (Manhattan distance) or 2 (Euclidean distance).
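In the code, the distance metric is not passed by name: it is selected through p, the power of the Minkowski distance, which is the sum of |u_i - v_i|^p raised to 1/p. The short sketch below (the helper `minkowski` is hypothetical, written only for illustration) checks that p=1 matches scikit-learn's Manhattan distance and p=2 matches its Euclidean distance for two points.

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def minkowski(u, v, p):
    """Hypothetical helper: Minkowski distance (sum |u_i - v_i|^p)^(1/p)."""
    return np.sum(np.abs(np.asarray(u) - np.asarray(v)) ** p) ** (1.0 / p)


a = np.array([[0.0, 0.0]])
b = np.array([[3.0, 4.0]])

# p = 1 sums absolute coordinate differences (Manhattan); p = 2 is the straight-line (Euclidean) distance
d_p1 = minkowski(a[0], b[0], p=1)
d_p2 = minkowski(a[0], b[0], p=2)

# The same distances computed with scikit-learn's named metrics
manhattan = pairwise_distances(a, b, metric="manhattan")[0, 0]
euclidean = pairwise_distances(a, b, metric="euclidean")[0, 0]

print(d_p1, manhattan)  # 7.0 7.0
print(d_p2, euclidean)  # 5.0 5.0
```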

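We set the number of sampled parameter settings (n_iter) to 50 and the number of cross-validation folds (cv) to 5: each sampled setting is trained on four folds of the training data and scored on the fifth, rotating through all five folds. For a regressor, RandomizedSearchCV scores the candidates with R^2 by default.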
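With 50 values of n_neighbors, two weight options, and two values of p, the search space defined in the code holds 50 × 2 × 2 = 200 combinations, and n_iter=50 means only a quarter of them are tried. The sketch below uses scikit-learn's ParameterGrid and ParameterSampler to make that count concrete; because every entry in the search space is a list or array, the sampler draws distinct settings without replacement. The fit count assumes cv=5 and leaves out the one extra refit on the full training set.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Same search space as in the notebook cell
param_dist = {
    "n_neighbors": np.arange(1, 51),
    "weights": ["uniform", "distance"],
    "p": [1, 2],
}

# Size of the full grid: 50 * 2 * 2 combinations
grid_size = len(ParameterGrid(param_dist))

# The candidates a randomized search with n_iter=50 would evaluate
candidates = list(ParameterSampler(param_dist, n_iter=50, random_state=42))

# Each candidate is fitted once per cross-validation fold
n_fits = len(candidates) * 5

print(grid_size, len(candidates), n_fits)  # 200 50 250
```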

Finally, we fit the RandomizedSearchCV object to the training data (with the default refit=True, the best setting is retrained on the whole training set), predict on the training and validation sets, and report the R^2 score, the RMSE, and the best hyperparameters found during the search.
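The regression cell in this notebook reports R^2 and RMSE. As a small worked example on made-up values (not data from this notebook), the sketch below computes both with scikit-learn and by hand. RMSE is the square root of the mean squared residual, in the units of the target. R^2 is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean. Here the residuals are 0.5, 0 and -0.5, so MSE = 0.5 / 3, and the total sum of squares is 4 + 0 + 4 = 8, giving R^2 = 1 - 0.5 / 8 = 0.9375.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up targets and predictions, for illustration only
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 7.5])

# RMSE: square root of the mean squared residual
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))

# R^2: 1 - (residual sum of squares) / (total sum of squares around the mean)
r2 = r2_score(y_true, y_pred)
r2_manual = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(round(rmse, 4), round(r2, 4))  # 0.4082 0.9375
```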

Note that these hyperparameters and ranges were chosen from common practice with KNN models rather than tuned specifically for this dataset; suitable choices may differ for other problems and data.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
df=pd.read_excel('RE_Data.xlsx')

In [3]:
df = df.drop(df.columns[df.columns.str.contains('unnamed', case=False)], axis=1)

In [4]:
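# Shuffle the rows of the DataFrame (train_test_split below also shuffles by default)
df_shuff = df.sample(frac=1, random_state=42).reset_index(drop=True)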

In [5]:
from sklearn.model_selection import train_test_split

In [6]:
var_columns = [c for c in df if c not in ['ph','ph_labels']]

X = df.loc[:,var_columns].values
y = df.loc[:,'ph'].values

In [7]:
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

# Split the data into training and testing sets
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, random_state=42)

# Define the KNN regression model
model = KNeighborsRegressor()

# Define the hyperparameter grid to search over
param_dist = {
    'n_neighbors': np.arange(1, 51),
    'weights': ['uniform', 'distance'],
    'p': [1, 2]
}

# Define the randomized search object
search = RandomizedSearchCV(model, param_distributions=param_dist, n_iter=50, cv=5, random_state=42)

# Fit the randomized search object to the training data
search.fit(X_train, y_train)

# Evaluate the model on training and validation sets
y_pred_train = search.predict(X_train)
y_pred_valid = search.predict(X_valid)

mse_train = mean_squared_error(y_train, y_pred_train)
mse_valid = mean_squared_error(y_valid, y_pred_valid)
rmse_train = np.sqrt(mse_train)
rmse_valid = np.sqrt(mse_valid)
r2_train = r2_score(y_train, y_pred_train)
r2_valid = r2_score(y_valid, y_pred_valid)

print("K-Nearest Neighbors Regressor train R^2 score: {:.3f}".format(r2_train))
print("K-Nearest Neighbors Regressor valid R^2 score: {:.3f}".format(r2_valid))
print("K-Nearest Neighbors Regressor train RMSE score: {:.3f}".format(rmse_train))
print("K-Nearest Neighbors Regressor valid RMSE score: {:.3f}".format(rmse_valid))
print("K-Nearest Neighbors Regressor best params:\n{}\n".format(search.best_params_))


K-Nearest Neighbors Regressor train R^2 score: 1.000
K-Nearest Neighbors Regressor valid R^2 score: 0.999
K-Nearest Neighbors Regressor train RMSE score: 0.000
K-Nearest Neighbors Regressor valid RMSE score: 0.009
K-Nearest Neighbors Regressor best params:
{'weights': 'distance', 'p': 2, 'n_neighbors': 4}
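The training R^2 of 1.000 and RMSE of 0.000 above come from weights='distance', not from a flawless model. When a training point is predicted, that point is its own nearest neighbor at distance zero. scikit-learn then uses only the zero-distance neighbors, so the prediction equals the point's own target, unless duplicate feature rows carry different targets. The sketch below compares both weight options on the training set, using hypothetical synthetic data in place of RE_Data.xlsx. It suggests that the validation R^2, or the mean cross-validated score in search.best_score_, is the more informative number.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical synthetic data: 100 samples, 3 features, noisy linear target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo.sum(axis=1) + rng.normal(scale=0.5, size=100)

scores = {}
for weights in ["uniform", "distance"]:
    knn = KNeighborsRegressor(n_neighbors=4, weights=weights, p=2)
    knn.fit(X_demo, y_demo)
    # Score on the same points the model was fitted on
    scores[weights] = r2_score(y_demo, knn.predict(X_demo))

# With weights="distance", each training point is its own neighbor at distance 0,
# so its prediction is its own target and the training R^2 is 1.0;
# uniform weights average the point with 3 other neighbors, so the score drops below 1
print(scores)
```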



In [None]:
# from sklearn.ensemble import AdaBoostClassifier
# from sklearn.model_selection import RandomizedSearchCV, train_test_split
# from sklearn.metrics import classification_report


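# # Split the data into training and testing sets
# # Note: y currently holds the continuous 'ph' values; a classifier needs discrete class labels (presumably the 'ph_labels' column)
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)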

# # Define the AdaBoost classifier model
# model = AdaBoostClassifier()

# # Define the hyperparameter grid to search over
# param_dist = {
#     'n_estimators': np.arange(50, 501, 50),
#     'learning_rate': np.logspace(-4, 0, 50),
#     # 'SAMME.R' was deprecated in scikit-learn 1.4 and removed in 1.6, so only 'SAMME' is kept
#     'algorithm': ['SAMME']
# }

# # Define the randomized search object
# search = RandomizedSearchCV(model, param_distributions=param_dist, n_iter=50, cv=5, random_state=42)

# # Fit the randomized search object to the training data
# search.fit(X_train, y_train)


# y_pred = search.predict(X_test)
# accuracy_train = search.score(X_train, y_train)
# accuracy_test = search.score(X_test, y_test)

# report_test = classification_report(y_test, y_pred)

# print("AdaBoostClassifier_train ",accuracy_train)
# print("AdaBoostClassifier report:\n{}\nAccuracy_test: {:.3f}\n".format(report_test, accuracy_test))
# print("AdaBoostClassifier best params:\n{}\n".format(search.best_params_))


