<a href="https://colab.research.google.com/github/Krishna737Sharma/Adaboost-Classification-/blob/main/main.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip -q install ucimlrepo

In [None]:
# Suppress all warnings
import warnings
warnings.filterwarnings("ignore")

TASK - 1: Load the UCI zoo dataset from https://archive.ics.uci.edu/dataset/111/zoo  <font color='red'>[MARK - 1]</font>

In [None]:
from ucimlrepo import fetch_ucirepo

# fetch dataset
zoo = fetch_ucirepo(id=111)

# data (as pandas dataframes)
X = zoo.data.features
y = zoo.data.targets

# metadata
# print(zoo.metadata)

# variable information
# print(zoo.variables)

In [None]:
import pandas as pd
url = "https://archive.ics.uci.edu/static/public/111/data.csv"
data = pd.read_csv(url)

TASK - 2: Check the dataset for missing or duplicate values and handle them appropriately.    <font color='red'>[MARK - 1]</font>

In [None]:
# Check for missing values
print(data.isnull().sum())

# Check for duplicate values
print(data.duplicated().sum())

animal_name    0
hair           0
feathers       0
eggs           0
milk           0
airborne       0
aquatic        0
predator       0
toothed        0
backbone       0
breathes       0
venomous       0
fins           0
legs           0
tail           0
domestic       0
catsize        0
type           0
dtype: int64
0


In [None]:
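# No missing or duplicate values were found above, so the handling steps stay disabled.
# # Remove duplicates
# data = data.drop_duplicates()

# # Handle missing values (if any); numeric_only skips the string column animal_name
# data = data.fillna(data.mean(numeric_only=True))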

TASK - 3: Show the class distribution of the dataset.     <font color='red'>[MARK - 1]</font>

In [None]:
# Class distribution
print(data['type'].value_counts())
print("-------------------------------\n")
print(data['type'].value_counts(normalize=True))

type
1    41
2    20
4    13
7    10
6     8
3     5
5     4
Name: count, dtype: int64
-------------------------------

type
1    0.405941
2    0.198020
4    0.128713
7    0.099010
6    0.079208
3    0.049505
5    0.039604
Name: proportion, dtype: float64


TASK - 4: Split the data into training and test sets (80%-20%).      <font color='red'>[MARK - 1]</font>

In [None]:
from sklearn.model_selection import train_test_split

# Separate features and target
X = data.drop(['animal_name', 'type'], axis=1)
y = data['type']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
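
The class distribution above is uneven (class 5 has only 4 animals), so a plain random split can leave a class thinly represented in one of the sets. One option, not used in this notebook, is the `stratify` argument of `train_test_split`. The sketch below uses synthetic labels (`y_demo`) and placeholder features (`X_demo`), both hypothetical names, built from the printed class counts; it is only an illustration and would change the results reported later if applied to the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels matching the class counts printed above; features are placeholders
counts = {1: 41, 2: 20, 3: 5, 4: 13, 5: 4, 6: 8, 7: 10}
y_demo = np.concatenate([np.full(n, c) for c, n in counts.items()])
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# stratify=y_demo keeps class proportions roughly equal in both sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("train:", dict(zip(*np.unique(y_tr, return_counts=True))))
print("test: ", dict(zip(*np.unique(y_te, return_counts=True))))
```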

TASK - 5: Implement an AdaBoost classifier from scratch for binary classification using decision tree stumps as the base estimator. Choose an appropriate number of base estimators.   <font color='red'>[MARK - 5]</font>

In [None]:
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class AdaBoostBinary:
    def __init__(self, n_estimators=50, epsilon=1e-10):
        self.n_estimators = n_estimators
        self.epsilon = epsilon  # Small value to prevent division by zero

    def fit(self, X, y):
        self.alphas = []
        self.models = []
        weights = np.ones(len(y)) / len(y)  # Initialize uniform weights for all samples

        for _ in range(self.n_estimators):
            model = DecisionTreeClassifier(max_depth=1)  # Decision stump
            model.fit(X, y, sample_weight=weights)
            preds = model.predict(X)

            # Weighted error rate of this stump (the division is a safeguard; weights already sum to 1)
            err = np.sum(weights * (preds != y)) / np.sum(weights)

            # A perfect stump (err == 0) would give an infinite alpha, so use a large finite value
            if err == 0:
                alpha = 1e10
            else:
                alpha = 0.5 * np.log((1 - err) / (err + self.epsilon))  # epsilon guards the division

            self.alphas.append(alpha)
            self.models.append(model)

            # Update weights: y * preds is +1 for correct and -1 for misclassified samples
            # (labels must be in {-1, +1}), so misclassified samples gain weight
            weights = weights * np.exp(-alpha * y * preds)
            weights = np.maximum(weights, self.epsilon)  # Floor weights so none underflow to zero
            weights = weights / np.sum(weights)  # Normalize weights

    def predict(self, X):
        # Returns the raw alpha-weighted vote (a real-valued score), not a class label.
        # Callers apply np.sign for a binary decision, or argmax across one-vs-all scores.
        weak_preds = np.zeros((X.shape[0], len(self.models)))
        for i, model in enumerate(self.models):
            weak_preds[:, i] = model.predict(X)

        weighted_preds = np.dot(weak_preds, self.alphas)  # Weighted sum of stump predictions
        return weighted_preds
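
To make the update inside `fit` concrete, here is a minimal sketch of a single boosting round on five hand-made samples. The arrays `y_toy` and `preds_toy` are hypothetical, and the sketch leaves out the `epsilon` safeguards used in the class above. With one mistake out of five uniformly weighted samples, the error is 0.2, alpha is ln 2, and after normalization the misclassified sample carries half of the total weight.

```python
import numpy as np

# Hypothetical labels and stump predictions in {-1, +1}; the stump is wrong on index 1
y_toy = np.array([1, 1, -1, -1, 1])
preds_toy = np.array([1, -1, -1, -1, 1])

weights = np.ones(len(y_toy)) / len(y_toy)      # uniform start: 0.2 each
err = np.sum(weights * (preds_toy != y_toy))    # weighted error of the stump
alpha = 0.5 * np.log((1 - err) / err)           # vote strength of the stump

# Correct samples are scaled by exp(-alpha), misclassified ones by exp(+alpha)
new_weights = weights * np.exp(-alpha * y_toy * preds_toy)
new_weights = new_weights / new_weights.sum()

print("err:", err)
print("alpha:", alpha)
print("new weights:", new_weights)
```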

TASK - 6: Train 7 AdaBoost classifiers, one per class, for one-vs-all classification using the binary AdaBoost classifier you implemented from scratch. For each classifier, handle the labels appropriately. For example, for the first classifier, all samples belonging to class 1 will have label y = +1, and all other samples will have label y = -1. Similarly, for the second classifier, all samples belonging to “class 2” will have label +1, and samples belonging to all other classes will be assigned label -1. Perform prediction on the test set using the 7 classifiers.   <font color='red'>[MARK - 4]</font>

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

n_classes = len(data['type'].unique())
classifiers = []
for i in range(1, n_classes + 1):
    # Create binary labels for class i (class i is +1, all other classes are -1)
    y_train_binary = np.where(y_train == i, 1, -1)
    y_test_binary = np.where(y_test == i, 1, -1)

    # Train Adaboost binary classifier
    clf = AdaBoostBinary(n_estimators=50)
    clf.fit(X_train, y_train_binary)

    classifiers.append((clf, i))

    # Predict and calculate accuracy
    y_pred = np.sign(clf.predict(X_test))
    print(f"Accuracy for class {i}: {accuracy_score(y_test_binary, y_pred)}")

Accuracy for class 1: 1.0
Accuracy for class 2: 1.0
Accuracy for class 3: 0.9523809523809523
Accuracy for class 4: 1.0
Accuracy for class 5: 1.0
Accuracy for class 6: 1.0
Accuracy for class 7: 1.0
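
Per-class binary accuracy can look high simply because each one-vs-all problem is dominated by the negative class. As a rough reference point, the sketch below computes the accuracy of a trivial classifier that always answers -1, using the class counts printed earlier for the full dataset (not the test split, whose per-class counts are not shown here). The names `class_counts` and `baseline` are hypothetical.

```python
# Class counts copied from the class-distribution output above (full dataset)
class_counts = {1: 41, 2: 20, 3: 5, 4: 13, 5: 4, 6: 8, 7: 10}
total = sum(class_counts.values())

# Accuracy of a one-vs-all classifier that always predicts -1 ("not this class")
baseline = {c: (total - n) / total for c, n in class_counts.items()}

for c in sorted(baseline):
    print(f"class {c}: always-negative baseline accuracy = {baseline[c]:.3f}")
```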


TASK - 7: Compute the classification metrics: accuracy, precision, and recall.   <font color='red'>[MARK - 3]</font>

In [None]:
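def predict_one_vs_all(classifiers, X):
    # One column of raw scores per binary classifier
    preds = np.zeros((X.shape[0], len(classifiers)))
    class_ids = np.array([class_id for _, class_id in classifiers])
    for i, (clf, _) in enumerate(classifiers):
        preds[:, i] = clf.predict(X)

    # Predicted class is the one whose classifier gives the highest weighted sum;
    # mapping through class_ids avoids assuming labels are exactly 1..n in order
    return class_ids[np.argmax(preds, axis=1)]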

y_pred_all = predict_one_vs_all(classifiers, X_test)
print(f"Overall Accuracy: {accuracy_score(y_test, y_pred_all)}")
print(f"Overall Precision: {precision_score(y_test, y_pred_all, average='weighted')}")
print(f"Overall Recall: {recall_score(y_test, y_pred_all, average='weighted')}")
print(f"Overall F1-score: {f1_score(y_test, y_pred_all, average='weighted')}")

Overall Accuracy: 0.9523809523809523
Overall Precision: 0.9206349206349207
Overall Recall: 0.9523809523809523
Overall F1-score: 0.9333333333333333
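
Weighted averages hide which class is responsible for a drop in precision. A per-class breakdown and a confusion matrix make that visible. The sketch below uses small hypothetical label arrays (`y_true_toy`, `y_pred_toy`) rather than the notebook's predictions; `zero_division=0` makes the precision of a never-predicted class an explicit 0 instead of a warning.

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Hypothetical labels: class 2 is never predicted, so its precision is undefined
y_true_toy = [1, 1, 2, 3, 3]
y_pred_toy = [1, 1, 3, 3, 3]
labels = [1, 2, 3]

precision, recall, f1, support = precision_recall_fscore_support(
    y_true_toy, y_pred_toy, labels=labels, zero_division=0
)
cm = confusion_matrix(y_true_toy, y_pred_toy, labels=labels)

for lab, p, r, s in zip(labels, precision, recall, support):
    print(f"class {lab}: precision={p:.2f} recall={r:.2f} support={s}")
print(cm)  # rows are true classes, columns are predicted classes
```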


TASK - 8: Perform classification using scikit-learn’s AdaBoost classifier wrapped in OneVsRestClassifier: https://scikit-learn.org/1.5/modules/generated/sklearn.multiclass.OneVsRestClassifier.html  <font color='red'>[MARK - 2]</font>

In [None]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.multiclass import OneVsRestClassifier

# Train Adaboost classifier using OneVsRest
clf_sklearn = OneVsRestClassifier(AdaBoostClassifier(n_estimators=50))
clf_sklearn.fit(X_train, y_train)

# Predict and evaluate
y_pred_sklearn = clf_sklearn.predict(X_test)
print(f"Scikit-learn Accuracy: {accuracy_score(y_test, y_pred_sklearn)}")

Scikit-learn Accuracy: 0.9523809523809523


TASK - 9: Compute the performance obtained with scikit-learn's implementation and compare with your from-scratch implementation.   <font color='red'>[MARK - 2]</font>

In [None]:
from sklearn.metrics import precision_score, recall_score, f1_score

print("Performance metrics with scratch AdaBoost\n")
print(f"Custom Adaboost Accuracy: {accuracy_score(y_test, y_pred_all)}")
precision_custom = precision_score(y_test, y_pred_all, average='weighted')
recall_custom = recall_score(y_test, y_pred_all, average='weighted')
f1_custom = f1_score(y_test, y_pred_all, average='weighted')
print(f"Custom Precision: {precision_custom}, Recall: {recall_custom}, F1: {f1_custom}")
print("------------------------------------------------------------------------------------------------\n")

print("Performance metrics with Scikit-learn AdaBoost\n")
print(f"Scikit-learn Adaboost Accuracy: {accuracy_score(y_test, y_pred_sklearn)}")
precision_sklearn = precision_score(y_test, y_pred_sklearn, average='weighted')
recall_sklearn = recall_score(y_test, y_pred_sklearn, average='weighted')
f1_sklearn = f1_score(y_test, y_pred_sklearn, average='weighted')
print(f"Scikit-learn Precision: {precision_sklearn}, Recall: {recall_sklearn}, F1: {f1_sklearn}")

Performance metrics with scratch AdaBoost

Custom Adaboost Accuracy: 0.9523809523809523
Custom Precision: 0.9206349206349207, Recall: 0.9523809523809523, F1: 0.9333333333333333
------------------------------------------------------------------------------------------------

Performance metrics with Scikit-learn AdaBoost

Scikit-learn Adaboost Accuracy: 0.9523809523809523
Scikit-learn Precision: 0.9285714285714286, Recall: 0.9523809523809523, F1: 0.9365079365079365
