Classification Algorithms using scikit-learn

Class 1: Logistic Regression

Logistic regression is implemented in scikit-learn by the **LogisticRegression** class. Despite its name, it is a linear model for classification rather than regression. Logistic regression is also known in the literature as logit regression, maximum-entropy classification (MaxEnt), or the log-linear classifier. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function.
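
To make the role of the logistic function concrete, here is a minimal sketch. It assumes synthetic data from make_classification in place of the housing data, and the names X_demo, y_demo and clf are illustrative only. It checks that, for a binary problem, the class-1 probability reported by the model is the logistic (sigmoid) function applied to the linear score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (an assumption; stands in for the housing features)
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)

clf = LogisticRegression().fit(X_demo, y_demo)

# Linear score w.x + b for each sample
scores = clf.decision_function(X_demo)

# Logistic (sigmoid) function applied to the linear score
manual_proba = 1.0 / (1.0 + np.exp(-scores))

# Column 1 of predict_proba is the modeled probability of class 1
sklearn_proba = clf.predict_proba(X_demo)[:, 1]

print("Probabilities match:", np.allclose(manual_proba, sklearn_proba))
```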

In [None]:
# importing dependencies
import pandas as pd

from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

In [None]:
# load the dataset
housing = fetch_california_housing()
# setting up X and Y Variables
X = housing.data
Y = housing.target

In [None]:
import numpy as np
# convert the target variable into binary categories (High Price and Low Price)
median_price=np.median(Y)
Y_binary = (Y > median_price).astype(int)

In [None]:
# Splitting data into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, Y_binary, test_size=0.2, random_state=42)


In [None]:
# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
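
As a rough illustration of what the standardization step does, the sketch below uses small random arrays (an assumption, not the housing data; all names are illustrative). It shows that fit_transform centers and scales using the training statistics, and that transform reuses those same training statistics on the test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
test = rng.normal(loc=50.0, scale=10.0, size=(20, 3))

scaler_demo = StandardScaler()
train_scaled = scaler_demo.fit_transform(train)  # learns mean and std from train
test_scaled = scaler_demo.transform(test)        # reuses the training mean and std

# Manual equivalent: subtract the training mean, divide by the training std
manual_test = (test - train.mean(axis=0)) / train.std(axis=0)

print("Train column means:", train_scaled.mean(axis=0).round(6))
print("Train column stds: ", train_scaled.std(axis=0).round(6))
print("Test matches manual:", np.allclose(test_scaled, manual_test))
```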

In [None]:
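# Create and train the Logistic Regression model
# Note: the model is fit on the unscaled X_train, not on X_train_scaled from the
# previous cell, which is the likely cause of the convergence warning below
model = LogisticRegression()
model.fit(X_train, y_train)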

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
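
The warning above suggests scaling the data or raising max_iter. One possible way to do both is to chain the scaler and the model in a pipeline, sketched here on synthetic data (an assumption; the names X_demo, pipe and acc are illustrative, and one feature is deliberately inflated to mimic features on very different scales).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary data; not the housing data
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=42)
X_demo[:, 0] *= 1000.0  # exaggerate the scale of one feature

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The pipeline scales inside fit/predict, so the model never sees unscaled data
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_tr, y_tr)

acc = pipe.score(X_te, y_te)  # mean accuracy on the held-out split
print("Accuracy:", acc)
```

With a pipeline, the same object is used for fitting and prediction, which avoids accidentally training on scaled features and predicting on unscaled ones (or the reverse).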


In [None]:
# make predictions
y_predict = model.predict(X_test)

In [None]:
# Evaluate the model: compute the accuracy score
accuracy = accuracy_score(y_test, y_predict)
print("Accuracy:", accuracy)

Accuracy: 0.7999031007751938


In [None]:
# Output the results as a DataFrame
results_df = pd.DataFrame({
    "Actual": y_test,
    "Predicted": y_predict
})
print("\nResults DataFrame:")
print(results_df.head())


Results DataFrame:
   Actual  Predicted
0       0          0
1       0          0
2       1          1
3       1          1
4       1          1


Class 2: k-Nearest Neighbors (k-NN)

Scikit-learn implements two different nearest neighbors classifiers, each with a regressor counterpart:

1. KNN (K-Nearest Neighbors): KNeighborsClassifier (regressor counterpart: KNeighborsRegressor) implements learning based on the k nearest neighbors of each query point, where k is an integer value specified by the user.

2. RNN (Radius-Nearest Neighbors): RadiusNeighborsClassifier (regressor counterpart: RadiusNeighborsRegressor) implements learning based on the number of neighbors within a fixed radius r of each training point, where r is a floating-point value specified by the user.
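
The two nearest-neighbor approaches described above can be sketched with the classifier variants as follows. This is an illustration on synthetic data (an assumption), with hypothetical names, n_neighbors=3 and radius=1.5 chosen arbitrarily. The outlier_label setting is there so that query points with no neighbor inside the radius still receive a label.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, RadiusNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_classification(n_samples=400, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# k-NN: vote among the k nearest training points (k is an integer)
knn_clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
knn_clf.fit(X_tr, y_tr)
knn_pred = knn_clf.predict(X_te)

# Radius neighbors: vote among all training points within radius r (r is a float)
rnn_clf = make_pipeline(
    StandardScaler(),
    RadiusNeighborsClassifier(radius=1.5, outlier_label="most_frequent"),
)
rnn_clf.fit(X_tr, y_tr)
rnn_pred = rnn_clf.predict(X_te)

print("k-NN accuracy:  ", knn_clf.score(X_te, y_te))
print("Radius accuracy:", rnn_clf.score(X_te, y_te))
```

Unlike the regressor variants, the classifiers return class labels directly rather than an average of the neighbors' targets.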



In [None]:
#importing the dependencies
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

In [None]:
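# Create and train the KNN model
# Note: y_train holds binary labels, so with the default uniform weights the
# regressor predicts the fraction of the 3 nearest neighbors that are in class 1
knn = KNeighborsRegressor(n_neighbors=3)
knn.fit(X_train, y_train)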

In [None]:
# make predictions
y_predict1 = knn.predict(X_test)

In [None]:
# evaluate the model
mse = mean_squared_error(y_test, y_predict1)
print(f"Mean Squared Error: {mse:.2f}")

Mean Squared Error: 0.27


In [None]:
# Compare the actual labels with the model's predictions
comparison_df = pd.DataFrame({"Actual": y_test, "Predicted": y_predict1})
print("\nComparison of Actual and Predicted Prices:")
print(comparison_df.head())


Comparison of Actual and Predicted Prices:
   Actual  Predicted
0       0   0.333333
1       0   0.000000
2       1   1.000000
3       1   1.000000
4       1   0.000000


Class 3: Support Vector Machines (SVM)

Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression, and outlier detection.
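
As a small illustration of SVM classification, the sketch below fits an SVC inside a scaling pipeline on synthetic data (an assumption; all names are illustrative). It inspects the support vectors the model retained and checks that, for a binary problem, the predicted class follows the sign of the decision function.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_demo, y_demo = make_classification(n_samples=300, n_features=4, random_state=0)

svm_pipe = make_pipeline(StandardScaler(), SVC())  # default RBF kernel
svm_pipe.fit(X_demo, y_demo)

# The fitted SVC keeps only the training points that define the margin
svc_step = svm_pipe.named_steps["svc"]
print("Support vectors:", svc_step.support_vectors_.shape[0], "of", len(X_demo))

# Signed score per sample; positive values map to class 1
decision = svm_pipe.decision_function(X_demo)
pred = svm_pipe.predict(X_demo)
sign_matches = np.array_equal(pred, (decision > 0).astype(int))
print("Prediction follows sign of decision function:", sign_matches)
```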

In [None]:
# import dependencies
from sklearn.svm import SVC


In [None]:
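# Convert the continuous target to three discrete classes (split at the 33rd and 66th percentiles)
# Note: y_class is not used below; the model is fit on the binary labels in y_train
y_class = np.digitize(Y, bins=np.percentile(Y, [33, 66]), right=True)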
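
To see what this kind of percentile binning produces, here is a small worked sketch on the integers 1 to 100 (an illustrative array, not the housing target). Each value is assigned class 0, 1, or 2 depending on which side of the 33rd and 66th percentiles it falls.

```python
import numpy as np

values = np.arange(1, 101)                 # 1, 2, ..., 100
edges = np.percentile(values, [33, 66])    # two cut points give three bins

# right=True means a value equal to an edge goes into the lower bin
classes = np.digitize(values, bins=edges, right=True)

counts = np.bincount(classes)              # number of samples per class
print("Cut points:", edges)
print("Class labels present:", np.unique(classes))
print("Samples per class:", counts)
```

Each class receives roughly a third of the samples, which is the point of cutting at percentiles rather than at equally spaced values.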

In [None]:
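# Create and train the SVM model
# Note: the model is fit on the unscaled X_train; SVC's default RBF kernel is
# sensitive to feature scale
svm = SVC()
svm.fit(X_train, y_train)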

In [None]:
# make predictions
y_predict2 = svm.predict(X_test)


In [None]:
# Evaluate the model
accuracy = accuracy_score(y_test, y_predict2)
print("Accuracy:", accuracy)

Accuracy: 0.5179263565891473


In [None]:
# Output the results as a DataFrame
results_df = pd.DataFrame({"True Labels": y_test, "Predicted Labels": y_predict2})
print("\nResults DataFrame:")
print(results_df.head())


Results DataFrame:
   True Labels  Predicted Labels
0            0                 1
1            0                 0
2            1                 1
3            1                 0
4            1                 1


Class 4: Decision Tree

Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. A tree can be seen as a piecewise constant approximation.
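
The "simple decision rules" can be made visible with a shallow tree. The sketch below is an illustration on synthetic data (an assumption), with hypothetical feature names f0 to f3. It fits a DecisionTreeClassifier limited to depth 2 and prints the learned rules as text.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X_demo, y_demo = make_classification(n_samples=300, n_features=4, random_state=0)

# A shallow tree keeps the rule set small enough to read
tree_demo = DecisionTreeClassifier(max_depth=2, random_state=0)
tree_demo.fit(X_demo, y_demo)

# Each branch is a threshold test on one feature; each leaf predicts one class
rules = export_text(tree_demo, feature_names=["f0", "f1", "f2", "f3"])
print(rules)
```

Every sample that reaches the same leaf gets the same prediction, which is what makes the tree a piecewise constant approximation.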

In [None]:
#importing the libraries
from sklearn.tree import DecisionTreeRegressor

In [None]:
# Create and train the Decision Tree Regressor model
decision_tree = DecisionTreeRegressor()
decision_tree.fit(X_train, y_train)

In [None]:
# Now make predictions
y_predict3 = decision_tree.predict(X_test)

In [None]:
# Evaluate the model
mse = mean_squared_error(y_test, y_predict3)
print(f"Mean Squared Error: {mse:.2f}")

Mean Squared Error: 0.16


In [None]:
# output the dataframes
comparison_df = pd.DataFrame({"Actual": y_test, "Predicted": y_predict3})
print("\nComparison of Actual and Predicted Prices:")
print(comparison_df.head())


Comparison of Actual and Predicted Prices:
   Actual  Predicted
0       0        0.0
1       0        0.0
2       1        1.0
3       1        1.0
4       1        1.0


Class 5: Random Forest

The sklearn.ensemble module includes two averaging algorithms based on randomized decision trees: the RandomForest algorithm and the Extra-Trees method. Both algorithms are perturb-and-combine techniques [B1998] specifically designed for trees. This means a diverse set of classifiers is created by introducing randomness in the classifier construction. The prediction of the ensemble is given as the averaged prediction of the individual classifiers.
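
The averaging step can be checked directly. The following sketch uses the classifier variants on synthetic data (an assumption; names and n_estimators=25 are illustrative). It fits both ensembles and verifies that the random forest's class probabilities equal the mean of its individual trees' probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X_demo, y_demo)
extra = ExtraTreesClassifier(n_estimators=25, random_state=0).fit(X_demo, y_demo)

# Average the per-tree class probabilities by hand
tree_probas = np.stack([t.predict_proba(X_demo) for t in forest.estimators_])
averaged = tree_probas.mean(axis=0)

matches = np.allclose(averaged, forest.predict_proba(X_demo))
print("Forest probability equals mean over trees:", matches)
print("Random forest training accuracy:", forest.score(X_demo, y_demo))
print("Extra-trees training accuracy:  ", extra.score(X_demo, y_demo))
```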

In [None]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

In [None]:
# training random forest model
random_forest = RandomForestRegressor()
random_forest.fit(X_train, y_train)


In [None]:
# making predictions
y_predict4 = random_forest.predict(X_test)

In [None]:
# Evaluate the model
mse = mean_squared_error(y_test, y_predict4)
print(f"Mean Squared Error: {mse:.2f}")
rmse = mse ** 0.5
print(f"Root Mean Squared Error: {rmse:.2f}")

Mean Squared Error: 0.08
Root Mean Squared Error: 0.29


Class 6: Gradient Boosting

Gradient Tree Boosting or Gradient Boosted Decision Trees (GBDT) is a generalization of boosting to arbitrary differentiable loss functions, see the seminal work of [Friedman2001]. GBDT is an excellent model for both regression and classification, in particular for tabular data.
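
For the classification case, a minimal sketch with GradientBoostingClassifier follows. It runs on synthetic data (an assumption), with illustrative names and n_estimators=50 chosen arbitrarily. It uses staged_predict to record test accuracy after each boosting stage, which shows how the ensemble is built up one tree at a time.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

gbc = GradientBoostingClassifier(n_estimators=50, random_state=0)
gbc.fit(X_tr, y_tr)

# staged_predict yields the ensemble's predictions after 1, 2, ..., 50 trees
staged_acc = [accuracy_score(y_te, y_stage) for y_stage in gbc.staged_predict(X_te)]

final_acc = accuracy_score(y_te, gbc.predict(X_te))
print("Accuracy after first stage:", staged_acc[0])
print("Accuracy after last stage: ", staged_acc[-1])
```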

In [None]:
#import libraries
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score

In [None]:
#Training the model
gradient_boosting = GradientBoostingRegressor()
gradient_boosting.fit(X_train, y_train)

In [None]:
# making predictions
y_predict5 = gradient_boosting.predict(X_test)


In [None]:
# Evaluate the model
mse = mean_squared_error(y_test, y_predict5)
r2 = r2_score(y_test, y_predict5)

print(f"Mean Squared Error: {mse:.2f}")
print(f"R^2 Score: {r2:.2f}")

Mean Squared Error: 0.09
R^2 Score: 0.63


In [None]:

# Output the results as a DataFrame
results_df = pd.DataFrame({
    "Actual": y_test,
    "Predicted": y_predict5
})
print("\nResults DataFrame:")
print(results_df.head())


Results DataFrame:
   Actual  Predicted
0       0  -0.043511
1       0   0.091257
2       1   0.987960
3       1   1.018354
4       1   0.786351


Class 7: Naive Bayes

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable.
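
To illustrate what a Gaussian Naive Bayes model learns under this independence assumption, the sketch below fits GaussianNB on synthetic data (an assumption; names are illustrative). It checks that the fitted per-class feature means and class priors are simply the per-class sample means and class frequencies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X_demo, y_demo = make_classification(n_samples=300, n_features=4, random_state=0)

gnb = GaussianNB().fit(X_demo, y_demo)

# Each feature is modeled independently per class with its own mean and variance
class0_mean = X_demo[y_demo == 0].mean(axis=0)
class1_mean = X_demo[y_demo == 1].mean(axis=0)
means_match = (
    np.allclose(gnb.theta_[0], class0_mean)
    and np.allclose(gnb.theta_[1], class1_mean)
)

# Class priors default to the observed class frequencies
priors_match = np.allclose(gnb.class_prior_, np.bincount(y_demo) / len(y_demo))

proba = gnb.predict_proba(X_demo[:5])  # posterior probability per class
print("Means match:", means_match, "| Priors match:", priors_match)
print(proba.round(3))
```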

In [None]:
# importing necessary libraries
from sklearn.naive_bayes import GaussianNB


In [None]:
# Convert the target to a classification problem by binning the continuous target
# Note: y_binned is not used below; the model is fit on the binary labels in y_train
bins = np.linspace(Y.min(), Y.max(), 6)
y_binned = np.digitize(Y, bins) - 1
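
One detail of this binning pattern is worth a small worked sketch on a toy array (illustrative values, not the housing target). Six equally spaced edges define five intervals, but a value equal to the maximum lands in a sixth class of its own. Passing only the interior edges is one possible way to get exactly five classes.

```python
import numpy as np

values = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
edges = np.linspace(values.min(), values.max(), 6)  # [0, 1, 2, 3, 4, 5]

# Same pattern as above: the maximum value falls past the last edge
with_all_edges = np.digitize(values, edges) - 1

# Alternative: use only the interior edges, so the maximum joins the top bin
interior_only = np.digitize(values, edges[1:-1])

print("All edges:     ", with_all_edges, "->", len(np.unique(with_all_edges)), "classes")
print("Interior edges:", interior_only, "->", len(np.unique(interior_only)), "classes")
```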

In [None]:
# naive bayes model
naive_bayes = GaussianNB()
naive_bayes.fit(X_train, y_train)

In [None]:
# Make predictions
y_pred = naive_bayes.predict(X_test)

In [None]:
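# Compute the accuracy of the Naive Bayes predictions. Without this line,
# `accuracy` still holds the SVM score from the earlier cell, which is the
# value (0.517926) that appears in the recorded output below.
accuracy = accuracy_score(y_test, y_pred)

# Print the results
print(f"Naive Bayes Accuracy: {accuracy:.2f}")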


results_df = pd.DataFrame([["Naive Bayes", accuracy]], columns=["Model", "Accuracy"])
print("\nResults DataFrame:")
print(results_df)

Naive Bayes Accuracy: 0.52

Results DataFrame:
         Model  Accuracy
0  Naive Bayes  0.517926
