
Write a Python function `adaboost_fit` that implements the fit method for an AdaBoost classifier. The function should take a 2D numpy array `X` of shape `(n_samples, n_features)` representing the dataset, a 1D numpy array `y` of shape `(n_samples,)` representing the labels (-1 or +1, as in the example), and an integer `n_clf` giving the number of weak classifiers to train. The function should initialize the sample weights and then, for each classifier, find the feature and threshold with the lowest weighted error, compute the classifier weight from that error, and update the sample weights. It should return a list of classifiers with their parameters. The solution in this notebook implements this as the `fit` method of an `Adaboost` class.
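
As a reference for the steps listed above, the usual discrete AdaBoost updates for labels in {-1, +1} can be written as follows. The symbols $w_i$, $h_t$, $\varepsilon_t$, $\alpha_t$ and $Z_t$ are notation introduced here, not part of the task statement, and the notebook's code additionally adds a small constant to the denominator of $\alpha_t$ to avoid division by zero.

```latex
\begin{aligned}
w_i^{(1)} &= \frac{1}{n}, \qquad i = 1, \dots, n \\
\varepsilon_t &= \sum_{i=1}^{n} w_i^{(t)} \, \mathbb{1}\!\left[ h_t(x_i) \neq y_i \right] \\
\alpha_t &= \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t} \\
w_i^{(t+1)} &= \frac{w_i^{(t)} \exp\!\left( -\alpha_t \, y_i \, h_t(x_i) \right)}{Z_t},
\qquad Z_t = \sum_{j=1}^{n} w_j^{(t)} \exp\!\left( -\alpha_t \, y_j \, h_t(x_j) \right)
\end{aligned}
```

Here $h_t$ is the weak classifier chosen in round $t$, $\varepsilon_t$ is its weighted error, and $Z_t$ rescales the weights so they sum to 1.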

Example:
    X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
    y = np.array([1, 1, -1, -1])
    n_clf = 3

    clfs = adaboost_fit(X, y, n_clf)
    print(clfs)
    # Output (example format, actual values may vary):
    # [{'polarity': 1, 'threshold': 2, 'feature_index': 0, 'alpha': 0.5},
    #  {'polarity': -1, 'threshold': 3, 'feature_index': 1, 'alpha': 0.3},
    #  {'polarity': 1, 'threshold': 4, 'feature_index': 0, 'alpha': 0.2}]
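
To make the weight update concrete, here is a small sketch of a single boosting round done by hand. The `predictions` array is a made-up stump output that gets one sample wrong; it is an assumption for illustration and not the result of any stump search.

```python
import numpy as np

y = np.array([1, 1, -1, -1])
# Hypothetical stump output: correct on the first three samples, wrong on the last.
predictions = np.array([1, 1, -1, 1])
weights = np.full(len(y), 1 / len(y))  # uniform start, 0.25 each

# Weighted error: total weight of the misclassified samples (0.25 here).
error = np.sum(weights[predictions != y])
# Classifier weight: 0.5 * ln((1 - error) / error), i.e. 0.5 * ln(3) here.
alpha = 0.5 * np.log((1 - error) / error)
# y * predictions is +1 for correct samples and -1 for mistakes,
# so correct samples shrink and misclassified samples grow.
new_weights = weights * np.exp(-alpha * y * predictions)
new_weights /= new_weights.sum()  # renormalize to sum to 1

print(error, alpha)
print(new_weights)
```

After renormalization the three correct samples hold a weight of 1/6 each and the misclassified sample holds 1/2, so the next weak classifier is pushed toward getting that sample right.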
    

In [4]:
import numpy as np

In [30]:
class Adaboost:

  def DecisionStump(self, X, y, sample_weights):
    """A decision stump is a depth-1 decision tree that classifies using a single
    (feature, threshold) pair, which makes it a weak classifier.
    Returns: the best stump found (as a dict) and its weighted error."""

    n_samples, n_features = X.shape
    min_error = float("inf")
    clf = {}
    for feature_index in range(n_features):
      unique_values = np.unique(X[:,feature_index])
      for threshold in unique_values:
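        # base rule: predict -1 below the threshold, +1 otherwise
        polarity = 1
        predictions = np.ones(y.shape)
        negative_class = X[:, feature_index] < threshold
        predictions[negative_class] = -1
        # weighted error: total weight of the misclassified samples
        error_t = np.sum(sample_weights[predictions != y])
        # a stump that is wrong more than half the time is useful when inverted
        if error_t > 0.5:
          error_t = 1 - error_t
          polarity = -1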
        if error_t < min_error:
          clf["polarity"] = polarity
          clf["feature_index"] = feature_index
          clf["threshold"] = threshold
          min_error = error_t
    return clf, min_error


  def fit(self, X, y, n_estimators):
    n_samples, n_features = X.shape
    # start from uniform sample weights
    sample_weights = np.full(n_samples, 1 / n_samples)
    clfs = []
    for j in range(n_estimators):
      clf, min_error = self.DecisionStump(X, y, sample_weights)
      # prediction using the jth weak classifier (decision stump):
      # the base rule predicts -1 below the threshold, and polarity -1 flips it,
      # which is the same rule DecisionStump used when it measured min_error
      base = np.where(X[:, clf["feature_index"]] < clf["threshold"], -1.0, 1.0)
      predictions = clf["polarity"] * base
      # classifier weight; the 1e-10 term avoids division by zero for a perfect stump
      alpha_j = 0.5 * np.log((1 - min_error) / (min_error + 1e-10))
      # raise the weights of misclassified samples and lower the rest
      sample_weights = sample_weights * np.exp(-alpha_j * y * predictions)
      # rescale so the weights sum to 1
      sample_weights /= np.sum(sample_weights)
      clf["alpha"] = alpha_j
      clfs.append(clf)
    return clfs


In [31]:
adaboost_clf = Adaboost()
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y = np.array([1, 1, -1, -1])
n_clf = 3
clfs = adaboost_clf.fit(X, y, n_clf)
# show plain Python numbers, with alpha rounded to 2 decimals
for clf in clfs:
  print({**clf, "threshold": int(clf["threshold"]), "alpha": round(float(clf["alpha"]), 2)})

{'polarity': -1, 'feature_index': 0, 'threshold': 3, 'alpha': 11.51}
{'polarity': -1, 'feature_index': 0, 'threshold': 3, 'alpha': 11.51}
{'polarity': -1, 'feature_index': 0, 'threshold': 3, 'alpha': 11.51}
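
The notebook stops at fitting. A minimal sketch of how the returned list could be used for prediction is the sign of the alpha-weighted sum of stump votes. The helper names `stump_predict` and `adaboost_predict` are hypothetical and not part of the notebook, and the sketch assumes that polarity -1 inverts the base rule "predict -1 where the feature is below the threshold". The classifier values below are hand-written for illustration.

```python
import numpy as np

def stump_predict(X, clf):
    """Predict -1/+1 with one stump dict (keys: polarity, feature_index, threshold)."""
    # Assumed convention: the base rule predicts -1 where the feature is below
    # the threshold, and polarity -1 inverts every prediction.
    base = np.where(X[:, clf["feature_index"]] < clf["threshold"], -1.0, 1.0)
    return clf["polarity"] * base

def adaboost_predict(X, clfs):
    """Hypothetical helper: sign of the alpha-weighted sum of stump votes."""
    votes = sum(clf["alpha"] * stump_predict(X, clf) for clf in clfs)
    return np.sign(votes)

X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
# Hand-written classifiers in the notebook's dict format (illustrative values).
clfs = [
    {"polarity": -1, "feature_index": 0, "threshold": 3, "alpha": 1.0},
    {"polarity": 1, "feature_index": 1, "threshold": 3, "alpha": 0.4},
]
y_pred = adaboost_predict(X, clfs)
print(y_pred)  # signs: +1, +1, -1, -1
```

Note that `np.sign` returns 0 when the weighted votes cancel exactly, so a real implementation would need a tie-breaking rule.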