Classification is a supervised learning technique where the goal is to predict the categorical label of a given input. It involves training a model on a labeled dataset to classify new data into predefined categories.

## Types of Classification Algorithms

1. **Logistic Regression**
2. **Decision Trees**
3. **Random Forest**
4. **Support Vector Machines (SVM)**
5. **Naive Bayes**

## Dataset

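We will use the Iris dataset bundled with `scikit-learn`. It contains 150 iris flowers from three species (setosa, versicolor, and virginica, 50 of each), each described by four measurements in centimeters: sepal length, sepal width, petal length, and petal width.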

In [1]:
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names
target_names = iris.target_names

In [3]:
df = pd.DataFrame(data=X, columns=feature_names)
df['species'] = y
df['species'] = df['species'].apply(lambda x: target_names[x])

In [4]:
df.head(4)

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |


## Logistic Regression
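Logistic regression is a linear model for binary and multi-class classification. It models the probability that an input belongs to each class: a weighted sum of the features is passed through the logistic (sigmoid) function in the binary case, or through the softmax function when there are more than two classes.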
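To make "models the probability" concrete, the sketch below recomputes the class probabilities of a fitted model by hand: one linear score per class from `coef_` and `intercept_`, passed through a softmax. It assumes a recent scikit-learn, where the default `lbfgs` solver fits a multinomial (softmax) model when there are three classes; under an older one-vs-rest default the manual probabilities would not match `predict_proba`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Linear score for each class k: z_k = w_k . x + b_k  -> shape (n_samples, n_classes)
scores = X_test @ model.coef_.T + model.intercept_

# Softmax turns the scores into probabilities that sum to 1 per row
# (subtracting the row max first avoids overflow in exp)
exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
manual_proba = exp_scores / exp_scores.sum(axis=1, keepdims=True)

print(np.allclose(manual_proba, model.predict_proba(X_test)))
print(manual_proba[:3].round(3))
```

The predicted class is simply the one with the highest probability, which is why `predict` and `predict_proba` always agree.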

In [5]:
from sklearn.linear_model import LogisticRegression

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# initializing and training the model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# making predictions
y_pred = model.predict(X_test)

# evaluating the model
print("Logistic Regression")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=target_names))

Logistic Regression
Accuracy: 0.9777777777777777
Classification Report:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        16
  versicolor       1.00      0.94      0.97        18
   virginica       0.92      1.00      0.96        11

    accuracy                           0.98        45
   macro avg       0.97      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45



## Decision Trees
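Decision trees are a non-parametric method used for both classification and regression. They recursively split the data into subsets, at each node choosing the feature and threshold that best separate the classes (by default, scikit-learn's `DecisionTreeClassifier` picks the split that most reduces Gini impurity). A new sample is classified by following the splits down to a leaf.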
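A minimal sketch of how a single split could be chosen, assuming the Gini impurity criterion (scikit-learn's default for `DecisionTreeClassifier`). The helpers `gini` and `best_threshold` are hypothetical, written only for illustration; scikit-learn's own tree builder searches every feature at every node and grows the tree recursively, which this sketch does not do.

```python
import numpy as np
from sklearn.datasets import load_iris

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(feature, labels):
    """Try midpoints between sorted unique values; keep the lowest weighted impurity."""
    values = np.unique(feature)
    thresholds = (values[:-1] + values[1:]) / 2
    n = len(labels)
    best_t, best_score = None, np.inf
    for t in thresholds:
        left, right = labels[feature <= t], labels[feature > t]
        # impurity of each child, weighted by the fraction of samples it receives
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

X, y = load_iris(return_X_y=True)
petal_length = X[:, 2]
t, score = best_threshold(petal_length, y)
print("root impurity:", round(gini(y), 4))
print(f"best split: petal length <= {t:.2f}, weighted impurity {score:.4f}")
```

On the full dataset the best petal-length split isolates setosa in one branch, halving the impurity from 2/3 at the root to 1/3.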

In [6]:
from sklearn.tree import DecisionTreeClassifier

# initializing and training the model (random_state fixed so the tree is reproducible)
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# making predictions
y_pred = model.predict(X_test)

# evaluating the model
print("Decision Tree")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=target_names))


Decision Tree
Accuracy: 0.9777777777777777
Classification Report:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        16
  versicolor       1.00      0.94      0.97        18
   virginica       0.92      1.00      0.96        11

    accuracy                           0.98        45
   macro avg       0.97      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45



## Random Forest
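Random Forest is an ensemble method that trains many decision trees, each on a bootstrap sample of the training data and with a random subset of features considered at each split, and then combines their predictions. Averaging over many decorrelated trees usually generalizes better than a single tree. scikit-learn's `RandomForestClassifier` combines the trees by averaging their predicted class probabilities rather than by a strict majority vote.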
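The averaging step can be checked directly. The sketch below, assuming the same train/test split as the notebook cells, fits a forest and compares its `predict_proba` with the mean of the individual trees' probabilities, taken from the fitted trees in `estimators_`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Each of the 100 trees sees a bootstrap sample and a random feature subset per split
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Stack the per-tree probability estimates: shape (n_trees, n_samples, n_classes)
tree_probas = np.stack([tree.predict_proba(X_test) for tree in forest.estimators_])
mean_proba = tree_probas.mean(axis=0)

# The forest predicts the class with the highest averaged probability
manual_pred = forest.classes_[mean_proba.argmax(axis=1)]

print(np.allclose(mean_proba, forest.predict_proba(X_test)))
print(np.array_equal(manual_pred, forest.predict(X_test)))
```

Averaging probabilities ("soft voting") lets a confident tree count for more than an uncertain one, which a plain majority vote would ignore.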

In [7]:
from sklearn.ensemble import RandomForestClassifier

# initializing and training the model (random_state fixed so the forest is reproducible)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# making predictions
y_pred = model.predict(X_test)

# evaluating the model
print("Random Forest")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=target_names))


Random Forest
Accuracy: 0.9777777777777777
Classification Report:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        16
  versicolor       1.00      0.94      0.97        18
   virginica       0.92      1.00      0.96        11

    accuracy                           0.98        45
   macro avg       0.97      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45



## Support Vector Machines (SVM)
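Support Vector Machines (SVMs) look for the hyperplane that separates the classes with the largest margin, that is, the greatest distance to the nearest training points (the support vectors). With a non-linear kernel, such as the RBF kernel that `SVC` uses by default, the hyperplane is found in an implicit higher-dimensional feature space, which yields a non-linear decision boundary in the original features.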
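For a linear kernel the separating hyperplane can be read straight off the fitted model. This sketch departs from the notebook cell in two assumed ways: it keeps only setosa and versicolor with the two petal features, so the classes are linearly separable, and it uses `kernel="linear"` with a large `C` to approximate a hard margin. The hyperplane is `w . x + b = 0`, and the margin between the two classes has width `2 / ||w||`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Keep two classes (setosa=0, versicolor=1) and the two petal features
mask = y < 2
X2 = X[mask][:, 2:4]
y2 = y[mask]

# A linear kernel with a large C approximates a hard-margin SVM
clf = SVC(kernel="linear", C=1000)
clf.fit(X2, y2)

w = clf.coef_[0]        # normal vector of the hyperplane
b = clf.intercept_[0]   # offset

# Signed score f(x) = w . x + b; a positive score means class 1 (versicolor)
scores = X2 @ w + b
margin_width = 2 / np.linalg.norm(w)

print("w =", w.round(3), "b =", round(b, 3))
print("margin width:", round(margin_width, 3))
print("support vectors:\n", clf.support_vectors_)
```

Only the few support vectors determine `w` and `b`; moving any other training point (without crossing the margin) leaves the hyperplane unchanged.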

In [8]:
from sklearn.svm import SVC

# initializing and training the model
model = SVC()
model.fit(X_train, y_train)

# making predictions
y_pred = model.predict(X_test)

# evaluating the model
print("Support Vector Machine")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=target_names))


Support Vector Machine
Accuracy: 0.9777777777777777
Classification Report:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        16
  versicolor       1.00      0.94      0.97        18
   virginica       0.92      1.00      0.96        11

    accuracy                           0.98        45
   macro avg       0.97      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45



## Naive Bayes
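Naive Bayes classifiers apply Bayes' theorem under the "naive" assumption that features are conditionally independent given the class. Training only requires estimating per-class statistics for each feature, so these models are fast to train and scale well to large datasets. `GaussianNB` additionally assumes that each feature follows a normal distribution within each class.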

In [9]:
from sklearn.naive_bayes import GaussianNB

# initializing and training the model
model = GaussianNB()
model.fit(X_train, y_train)

# making predictions
y_pred = model.predict(X_test)

# evaluating the model
print("Naive Bayes")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=target_names))


Naive Bayes
Accuracy: 1.0
Classification Report:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        16
  versicolor       1.00      1.00      1.00        18
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

