<a href="https://colab.research.google.com/github/Leon-web-net/Learning_ML/blob/main/Ensemble_Methods/Ada_boosting.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import numpy as np
import pandas as pd

In [None]:
from sklearn.datasets import load_breast_cancer

cancer_data = load_breast_cancer()

df_X = pd.DataFrame(cancer_data.data, columns=cancer_data.feature_names)
df_y = pd.Series(cancer_data.target)

print(cancer_data["DESCR"])

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

:Number of Instances: 569

:Number of Attributes: 30 numeric, predictive attributes and the class

:Attribute Information:
    - radius (mean of distances from center to points on the perimeter)
    - texture (standard deviation of gray-scale values)
    - perimeter
    - area
    - smoothness (local variation in radius lengths)
    - compactness (perimeter^2 / area - 1.0)
    - concavity (severity of concave portions of the contour)
    - concave points (number of concave portions of the contour)
    - symmetry
    - fractal dimension ("coastline approximation" - 1)

    The mean, standard error, and "worst" or largest (mean of the three
    worst/largest values) of these features were computed for each image,
    resulting in 30 features.  For instance, field 0 is Mean Radius, field
    10 is Radius SE, field 20 is Worst Radius.


In [None]:
df_y.unique(), cancer_data.target_names

(array([0, 1]), array(['malignant', 'benign'], dtype='<U9'))

In [None]:
df_X.head()

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 25.38 | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 24.99 | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | ... | 23.57 | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | ... | 14.91 | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | 0.1809 | 0.05883 | ... | 22.54 | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 |


In [None]:
df_X.columns[19]

'fractal dimension error'

# Adaptive Boosting

<a href="https://youtu.be/AtYN8QP-U6w">https://youtu.be/AtYN8QP-U6w</a>

## 📌 How AdaBoost Works (Simplified)

AdaBoost builds a **committee of weak learners** (usually decision stumps) where each learner focuses on the mistakes of the previous ones.

---

### 🔹 Step-by-step process

1. **Initialize weights**  
   - Start with equal weights for all training samples:  
     $$ w_i = \frac{1}{N}, \quad i=1,\dots,N $$

2. **Train a weak learner**  
   - Fit a decision stump on the weighted dataset.  
   - Compute the weighted error:  
     $$
     \text{error}_t = \frac{\sum_{i=1}^N w_i \cdot \mathbf{1}\{y_i \neq h_t(x_i)\}}{\sum_{i=1}^N w_i}
     $$

3. **Assign a vote strength (α)**  
   - Better stumps get a bigger voice:  
     $$
     \alpha_t = \tfrac{1}{2} \ln \left(\frac{1 - \text{error}_t}{\text{error}_t}\right)
     $$

4. **Update the sample weights**  
   - For each sample $i$, with labels and predictions encoded as $y_i, h_t(x_i) \in \{-1, +1\}$:  
     $$
     w_i^{\text{new}} = w_i^{\text{old}} \cdot \exp\big(-\alpha_t \cdot y_i \cdot h_t(x_i)\big)
     $$

   - Intuition:  
     - If the stump is **correct** on $x_i$: weight decreases.  
     - If the stump is **wrong** on $x_i$: weight increases.  

5. **Normalise weights**  
   - Renormalise so weights sum to 1:  
     $$
     w_i^{\text{new}} \leftarrow \frac{w_i^{\text{new}}}{\sum_{j=1}^N w_j^{\text{new}}}
     $$

6. **Repeat**  
   - Train the next stump on the reweighted data.  

7. **Final prediction**  
   - Combine all learners with a weighted vote:  
     $$
     H(x) = \text{sign}\left( \sum_{t=1}^T \alpha_t \cdot h_t(x) \right)
     $$

---

✅ In plain words:  
- Misclassified points get **heavier weights**, so future learners focus on them.  
- Good learners get **louder votes** (larger $\alpha$).
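
To make the update rules concrete, here is one boosting round worked by hand in plain Python. The five samples and the stump's predictions are made up for illustration; only the formulas come from the steps above.

```python
import math

# Toy setup (made-up values): five samples, labels in {-1, +1}.
y = [1, 1, -1, -1, 1]
h = [1, 1, -1, -1, -1]   # hypothetical stump output; only the last sample is wrong

# Step 1: uniform initial weights.
n = len(y)
w = [1.0 / n] * n

# Step 2: weighted error = weight of misclassified samples / total weight.
error = sum(wi for wi, yi, hi in zip(w, y, h) if yi != hi) / sum(w)

# Step 3: vote strength.
alpha = 0.5 * math.log((1 - error) / error)

# Step 4: shrink weights of correct samples, grow weights of wrong ones.
w_new = [wi * math.exp(-alpha * yi * hi) for wi, yi, hi in zip(w, y, h)]

# Step 5: renormalise so the weights sum to 1.
total = sum(w_new)
w_new = [wi / total for wi in w_new]

print(f"error = {error:.3f}, alpha = {alpha:.3f}")
print("new weights:", [round(wi, 3) for wi in w_new])
```

Here the error is 0.2 and α is ln 2 ≈ 0.693. After renormalising, the single misclassified sample carries half of the total weight, so the same stump would score a weighted error of 0.5 on the new weights. That is what pushes the next learner towards a different split.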

In [None]:
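class DecisionStump:
  """One-feature threshold classifier that outputs labels in {-1, +1}."""

  def __init__(self):
    self.feature_idx = None   # column of X used for the split
    self.threshold = None     # split value on that column
    self.polarity = 1         # 1: values below threshold -> -1; -1: values above -> -1
    self.alpha = None         # vote strength, set during boosting

  def predict(self, X):
    m_samples = X.shape[0]
    preds = np.ones(m_samples)
    if self.polarity == 1:
      preds[X[:,self.feature_idx] < self.threshold] = -1
    else:
      preds[X[:,self.feature_idx] > self.threshold] = -1

    return preds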


In [None]:
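def build_stump(X,y,w):
  """Exhaustive search for the stump with the lowest weighted error.

  Assumes the weights w sum to 1, so the summed weight of the
  misclassified samples is already the weighted error.
  """
  m_samples,n_features = X.shape
  stump = DecisionStump()
  min_error = float("inf")

  for feature_i in range(n_features):
    feature_values = X[:,feature_i]
    # Candidate thresholds: every distinct value observed for this feature.
    thresholds = np.unique(feature_values)

    for thresh in thresholds:
      for polarity in [1, -1]:
        preds = np.ones(m_samples)
        if polarity == 1:
          preds[feature_values < thresh] = -1
        else:
          preds[feature_values > thresh] = -1

        err = np.sum(w[y!=preds])
        # Strict "<" keeps the first candidate found when errors tie.
        if err<min_error:
          min_error = err
          stump.polarity = polarity
          stump.threshold = thresh
          stump.feature_idx = feature_i

  return stump, min_error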



In [None]:
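def adaboost_train(X,y,n_estimators=10,EPS=1e-10):
  """Train n_estimators stumps; y must be encoded as -1/+1."""
  n_samples = X.shape[0]
  w = np.ones(n_samples)/n_samples   # step 1: uniform weights
  stumps = []

  for _ in range(n_estimators):
    stump,error = build_stump(X,y,w)

    # EPS keeps the log finite when a stump has zero weighted error.
    stump.alpha = 0.5*np.log((1-error+EPS)/(error+EPS))

    # Up-weight misclassified samples, down-weight correct ones, renormalise.
    preds = stump.predict(X)
    w *= np.exp(-stump.alpha * y * preds)
    w /= np.sum(w)

    stumps.append(stump)

  return stumps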

In [None]:
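def adaboost_predict(X,stumps):
  # Alpha-weighted vote of all stumps.
  stumps_preds = [stump.alpha * stump.predict(X) for stump in stumps]
  scores = np.sum(stumps_preds,axis=0)
  # np.sign would return 0 on an exact tie; break ties towards +1 instead.
  y_pred = np.where(scores >= 0, 1, -1)
  return y_pred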
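
A weighted vote can tie exactly: the α-weighted sum is 0, and `np.sign(0)` is 0, which is not a valid class label. The sketch below uses made-up vote values to show the effect and one possible tie-break. Mapping ties to +1 is an arbitrary choice for illustration, not part of the algorithm as described above.

```python
import numpy as np

# Made-up alpha-weighted votes from two stumps on three samples.
# Column 0: both vote +1; column 1: the votes cancel exactly; column 2: both vote -1.
weighted_votes = np.array([
    [0.5,  0.5, -0.5],
    [0.5, -0.5, -0.5],
])
scores = weighted_votes.sum(axis=0)

raw = np.sign(scores)                   # the tied sample gets 0.0
fixed = np.where(scores >= 0, 1, -1)    # ties mapped to +1 (arbitrary choice)

print(raw)
print(fixed)
```

With real-valued α from training, exact ties are rare, but a stray 0 would appear as a third class in `classification_report`.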

In [None]:
from sklearn.model_selection import train_test_split

X,y = df_X.to_numpy(),df_y.to_numpy()
# Map labels {0, 1} -> {-1, +1} (malignant = -1, benign = +1), as the weight update expects.
y = 2*y-1

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42,stratify=y)

type(X_train), X_train.shape, X_test.shape

(numpy.ndarray, (455, 30), (114, 30))

In [None]:
from sklearn.metrics import classification_report
stumps = adaboost_train(X_train,y_train,n_estimators=10)
y_pred = adaboost_predict(X_test,stumps)

print(f"Accuracy: {np.mean(y_pred==y_test)}")
print(classification_report(y_test,y_pred))

Accuracy: 0.9473684210526315
              precision    recall  f1-score   support

          -1       0.93      0.93      0.93        42
           1       0.96      0.96      0.96        72

    accuracy                           0.95       114
   macro avg       0.94      0.94      0.94       114
weighted avg       0.95      0.95      0.95       114



In [None]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

stump = DecisionTreeClassifier(max_depth=1,random_state=42)

clf = AdaBoostClassifier(
    estimator=stump,
    n_estimators=10,
    learning_rate=1.0,
    algorithm="SAMME",  # discrete boosting; this parameter is deprecated in newer scikit-learn releases
    random_state=42
)

clf.fit(X_train,y_train)
y_pred_skl = clf.predict(X_test)

print(f"Accuracy: {np.mean(y_test==y_pred_skl)}")
print(classification_report(y_test,y_pred_skl))

Accuracy: 0.9649122807017544
              precision    recall  f1-score   support

          -1       0.97      0.93      0.95        42
           1       0.96      0.99      0.97        72

    accuracy                           0.96       114
   macro avg       0.97      0.96      0.96       114
weighted avg       0.97      0.96      0.96       114





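# Why sklearn AdaBoost is Slightly More Accurate Than the Custom Implementation

On this train/test split, sklearn's AdaBoost reaches 0.965 accuracy against 0.947 for the custom version, with **equal or better precision, recall, and F1-score** for both classes. With 114 test samples that gap is two predictions (110 vs 108 correct), so treat it as an observation about this split rather than a general result. Here are the key differences that affect accuracy (not speed):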

---

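### 1️⃣ Threshold selection
- **Custom implementation:** Uses **exact unique feature values** as candidate thresholds.  
- **Sklearn:** Uses **midpoints between consecutive sorted feature values**:
  $$
  \text{threshold} = \frac{x_i + x_{i+1}}{2}
  $$
  - On the training data both conventions can produce the same partitions, so midpoints do not add candidate splits.  
  - The difference shows up on unseen samples that fall between two adjacent training values: a midpoint puts the boundary halfway across the gap instead of right at an observed value, which may generalize slightly better.  
- **Split criterion:** sklearn's `DecisionTreeClassifier` chooses its split by weighted Gini impurity by default, not by weighted misclassification error, so it can pick a different feature or threshold than the custom search.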
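
A small sketch of where the two threshold conventions can differ: on unseen points that fall between adjacent training values. The one-feature values below are made up, and the rule `x < threshold -> -1` mirrors the `polarity == 1` branch of the custom stump.

```python
# Made-up, sorted training values for one feature; the class boundary lies between 2.0 and 4.0.
train_x = [1.0, 2.0, 4.0, 5.0]

def predict(x, threshold):
    """Stump rule with polarity +1: values below the threshold are labelled -1."""
    return -1 if x < threshold else 1

exact_threshold = 4.0                  # an observed value, as in the custom search
midpoint_threshold = (2.0 + 4.0) / 2   # 3.0, halfway between the two neighbours

# Both thresholds split the training data identically ...
train_exact = [predict(x, exact_threshold) for x in train_x]
train_mid = [predict(x, midpoint_threshold) for x in train_x]
print(train_exact == train_mid)

# ... but they disagree on an unseen point that falls inside the gap.
x_new = 3.5
print(predict(x_new, exact_threshold), predict(x_new, midpoint_threshold))
```

Both stumps have the same training error, yet the point at 3.5 is labelled -1 by the exact-value threshold and +1 by the midpoint threshold. Disagreements of this kind on test samples are one way two otherwise equivalent stump searches can end up with different test metrics.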

---

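### 2️⃣ Handling ties in weighted error
- **Custom:** Picks the first threshold with minimum weighted error, because the strict `<` comparison in `build_stump` never replaces an equally good earlier candidate.  
- **Sklearn:** Is also deterministic once `random_state` is fixed, but it permutes the features before searching each split, so when several splits are equally good it may settle on a different stump than the custom search.  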

---

### 3️⃣ Alpha / weight updates
- **Custom:** Uses **discrete predictions only (-1/+1)** and updates weights exactly as
  $$
  w_i \gets w_i \cdot \exp(-\alpha \, y_i \, h(x_i))
  $$
- **Sklearn:** The run above sets `algorithm="SAMME"`, which is also discrete boosting on hard predictions, so this step is not expected to explain the gap seen here. The real-valued `SAMME.R` variant uses **probability estimates from the base learner** instead of hard labels, which changes the weight updates, but it was not used in this comparison.  
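
For two classes, the discrete SAMME update can look different from the formula above and still give the same normalised weights. The sketch below assumes the SAMME-style rule multiplies only the misclassified samples by $\exp(\alpha')$ with $\alpha' = \ln\frac{1-\text{error}}{\text{error}}$. That is an assumption about the library's internals, not something verified in this notebook, and the labels and predictions are made up.

```python
import math

y = [1, 1, -1, -1, 1, -1]
h = [1, -1, -1, -1, 1, 1]      # hypothetical stump: samples 1 and 5 are wrong
w = [1.0 / len(y)] * len(y)

# Weights sum to 1, so the misclassified weight is the weighted error.
error = sum(wi for wi, yi, hi in zip(w, y, h) if yi != hi)

def normalise(weights):
    total = sum(weights)
    return [wi / total for wi in weights]

# Custom rule: alpha = 0.5 * ln((1 - e) / e), every weight scaled by exp(-alpha * y * h).
alpha = 0.5 * math.log((1 - error) / error)
w_custom = normalise([wi * math.exp(-alpha * yi * hi) for wi, yi, hi in zip(w, y, h)])

# Assumed SAMME-style rule for 2 classes: alpha' = ln((1 - e) / e) = 2 * alpha,
# and only the misclassified weights are scaled, by exp(alpha').
alpha_samme = math.log((1 - error) / error)
w_samme = normalise([wi * math.exp(alpha_samme * (yi != hi)) for wi, yi, hi in zip(w, y, h)])

print([round(wi, 4) for wi in w_custom])
print([round(wi, 4) for wi in w_samme])
```

Both lists come out the same after normalisation. Doubling every α only rescales the final weighted sum, which does not change its sign, so under this assumption the two formulations would make identical predictions given the same stumps.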

---

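### 4️⃣ Numerical stability
- The custom loop guards the α formula with `EPS = 1e-10`, so a stump with zero weighted error does not cause a division by zero. Sklearn has its own handling of edge cases, for example stopping early when a learner is perfect or no better than chance. With only 10 stumps on this dataset, floating-point effects are unlikely to account for the observed gap.  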
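
A quick look at what the `EPS` guard in the custom training loop does to α, using plain Python and the same formula as `adaboost_train`. The helper name `alpha_with_eps` is made up for this sketch.

```python
import math

EPS = 1e-10  # same default as adaboost_train

def alpha_with_eps(error, eps=EPS):
    """Vote strength with the epsilon guard used in the custom training loop."""
    return 0.5 * math.log((1 - error + eps) / (error + eps))

for error in (0.0, 0.1, 0.5):
    print(f"error = {error:.1f} -> alpha = {alpha_with_eps(error):.3f}")
```

A perfect stump (error 0) gets a large but finite α of about 11.5 instead of raising a division-by-zero error, and a stump at chance level (error 0.5) gets α = 0, so it contributes nothing to the vote. For errors away from 0 the guard has no visible effect.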

---

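### ✅ Bottom line
- Custom AdaBoost is **correct and accurate**: on this split it is within two test predictions of sklearn (108 vs 110 correct out of 114).
- Minor differences in:
  - threshold choice,
  - tie-breaking,
  - weight updates,
  - numerical stability  
  can lead to **slightly different per-class metrics**, even if overall accuracy is almost the same. A single 114-sample split is too small to say which implementation is better in general.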
