Here we use Logistic Regression as the base estimator for RandomizedSearchCV. The hyperparameter search space covers the inverse regularization strength (C), the penalty type (penalty), and the solver (solver).

The values for C are logarithmically spaced from 0.0001 to 10000. The penalty is either L1 or L2, and the solver is either liblinear or saga; both solvers support both penalties.

We set the number of iterations (n_iter) to 50, so 50 hyperparameter combinations are sampled from the search space. We set the number of cross-validation folds (cv) to 5, so each sampled combination is evaluated with 5-fold cross-validation on the training data.

Finally, we fit the RandomizedSearchCV object to the training data, predict on the test set, and report the train and test accuracy, classification report, and best hyperparameters found during the search.

Note that the hyperparameters and their ranges were chosen based on empirical evidence and prior knowledge about the Logistic Regression classifier. The choice of hyperparameters and their ranges may vary depending on the specific problem and dataset.
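
The search described above can be sketched as follows. This is a minimal illustration rather than the original search code: the synthetic data from `make_classification`, the 20 sample points for C, `max_iter=5000`, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in data (assumption: the real classification data is not shown here)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# Search space: C on a log scale from 1e-4 to 1e4, two penalties, two solvers.
# The number of C values (20) is an assumption.
param_distributions = {
    "C": np.logspace(-4, 4, 20),
    "penalty": ["l1", "l2"],
    "solver": ["liblinear", "saga"],
}

# Sample 50 combinations and score each with 5-fold cross-validation
search = RandomizedSearchCV(
    LogisticRegression(max_iter=5000),
    param_distributions=param_distributions,
    n_iter=50,
    cv=5,
    scoring="accuracy",
    random_state=42,
)
search.fit(Xd_train, yd_train)

# Report train/test accuracy, the classification report, and the best settings
train_acc = accuracy_score(yd_train, search.predict(Xd_train))
test_acc = accuracy_score(yd_test, search.predict(Xd_test))
print("Train accuracy: {:.3f}".format(train_acc))
print("Test accuracy: {:.3f}".format(test_acc))
print(classification_report(yd_test, search.predict(Xd_test)))
print("Best hyperparameters:", search.best_params_)
```

After fitting, `search.predict` uses the best estimator refit on the full training split, which is the default behaviour of RandomizedSearchCV (`refit=True`).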

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
# Load the dataset from the Excel workbook
df = pd.read_excel('RE_Data.xlsx')

In [3]:
# Drop columns whose names contain 'unnamed' (case-insensitive), e.g. a leftover index column
df = df.drop(df.columns[df.columns.str.contains('unnamed', case=False)], axis=1)

In [4]:
# Keep the first 50,000 rows in file order (this slice does not shuffle the data)
df_shuff = df[0:50_000]
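
The slice above keeps rows in their original file order. If a random subset is wanted instead, one possible sketch is to shuffle with `DataFrame.sample` before slicing. The toy frame and the names `df_demo`, `df_demo_shuff`, and `df_demo_subset` are hypothetical stand-ins, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the real DataFrame (assumption)
df_demo = pd.DataFrame({"x": range(10), "ph": [i / 10 for i in range(10)]})

# frac=1 returns every row in random order; random_state makes the order reproducible
df_demo_shuff = df_demo.sample(frac=1, random_state=42).reset_index(drop=True)

# Take the first rows of the shuffled frame, mirroring the df[0:50_000] slice
df_demo_subset = df_demo_shuff[0:5]
print(df_demo_subset)
```

Applying this to the real data would change which rows are used, so the reported metrics would differ from those shown in this notebook.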

In [5]:
from sklearn.model_selection import train_test_split

In [6]:
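# Feature columns: everything except the target 'ph' and the excluded columns
var_columns = [c for c in df_shuff if c not in ['ph','ph_labels','c4','c3']]

# Feature matrix X and target vector y as NumPy arrays
X = df_shuff.loc[:,var_columns].values
y = df_shuff.loc[:,'ph'].values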

In [7]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
import numpy as np

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define the linear regression model
model = LinearRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# Make predictions on the training and testing data
y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)

# Calculate the mean squared error, its square root (RMSE), and the R2 score for the training and testing data
mse_train = mean_squared_error(y_train, y_pred_train)
mse_test = mean_squared_error(y_test, y_pred_test)
rmse_train = np.sqrt(mse_train)
rmse_test = np.sqrt(mse_test)
r2_train = r2_score(y_train, y_pred_train)
r2_test = r2_score(y_test, y_pred_test)

# Print the results
print("Linear Regression train r2: {:.3f}".format(r2_train))
print("Linear Regression test r2: {:.3f}".format(r2_test))
print("Linear Regression train RMSE: {:.3f}".format(rmse_train))
print("Linear Regression test RMSE: {:.3f}".format(rmse_test))


Linear Regression train r2: 0.742
Linear Regression test r2: 0.744
Linear Regression train RMSE: 0.146
Linear Regression test RMSE: 0.147


In [8]:
import pickle

# Serialize the fitted linear regression model ('model' from the previous cell)
# to a file named 'Linear_regression_2_1'
with open('Linear_regression_2_1', 'wb') as file:
    pickle.dump(model, file)
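
To reuse a saved model later, it can be read back with `pickle.load`. The sketch below shows the round trip on a toy model written to a temporary file; the toy data, the file name `toy_linear_regression.pkl`, and the variable names are assumptions for illustration only.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy model standing in for the fitted 'model' (assumption)
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([1.0, 3.0, 5.0, 7.0])
toy_model = LinearRegression().fit(X_toy, y_toy)

# Save to a temporary location so nothing in the working directory is overwritten
path = os.path.join(tempfile.mkdtemp(), "toy_linear_regression.pkl")
with open(path, "wb") as file:
    pickle.dump(toy_model, file)

# Load the model back and check that it predicts the same values
with open(path, "rb") as file:
    restored = pickle.load(file)

preds_match = np.allclose(toy_model.predict(X_toy), restored.predict(X_toy))
print("Predictions match after reload:", preds_match)
```

Pickle files should only be loaded from trusted sources, and a pickled scikit-learn model is best loaded with the same scikit-learn version that saved it.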