# AdaBoost Implementation in Python

## AdaBoost Definition

### What is AdaBoost?

**AdaBoost**, short for Adaptive Boosting, is a machine learning algorithm that combines multiple weak classifiers into a single strong classifier. A weak classifier performs only slightly better than random guessing, while a strong classifier achieves high accuracy. AdaBoost builds the strong classifier by training weak classifiers in sequence, with each one concentrating on the examples its predecessors got wrong.

### How does AdaBoost work?

1. **Initialize Weights**: Each training example is assigned a weight. Initially, all weights are set equally.
2. **Train Weak Classifier**: A weak classifier is trained using the weighted training data.
3. **Compute Weak Classifier Error**: The error of the weak classifier is the total weight of the examples it misclassifies, with the weights normalized to sum to 1.
4. **Update Weights**: The weights of the misclassified examples are increased, while the weights of correctly classified examples are decreased. This way, the algorithm focuses more on the difficult examples in subsequent iterations. Steps 2 through 4 then repeat for a fixed number of rounds.
5. **Combine Weak Classifiers**: The weak classifiers are combined into a strong classifier by a weighted vote. Each weak classifier's vote is weighted by its accuracy: the lower its weighted error, the larger its say in the final prediction.
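
The five steps above can be sketched from scratch in a few lines. This is a minimal illustration, not the scikit-learn implementation: it assumes binary labels encoded as -1 and +1, uses the classic discrete AdaBoost classifier weight `alpha = 0.5 * ln((1 - error) / error)`, and the helper names `fit_adaboost` and `predict_adaboost` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_adaboost(X, y, n_rounds=20):
    """Discrete AdaBoost sketch for labels in {-1, +1}; returns (stumps, alphas)."""
    n_samples = X.shape[0]
    weights = np.full(n_samples, 1.0 / n_samples)  # Step 1: equal weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        # Step 2: train a depth-1 tree (a "stump") on the weighted data
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=weights)
        pred = stump.predict(X)
        # Step 3: weighted error = total weight of misclassified examples
        error = np.clip(weights[pred != y].sum(), 1e-10, 1 - 1e-10)
        # Classifier weight: grows as the weighted error shrinks
        alpha = 0.5 * np.log((1 - error) / error)
        # Step 4: y * pred is -1 for mistakes, so their weights go up
        weights *= np.exp(-alpha * y * pred)
        weights /= weights.sum()  # renormalize to sum to 1
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)


def predict_adaboost(stumps, alphas, X):
    # Step 5: sign of the alpha-weighted vote over all stumps
    votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(votes >= 0, 1, -1)


X_demo, y01 = make_classification(n_samples=300, n_features=10, random_state=0)
y_demo = 2 * y01 - 1  # map labels {0, 1} to {-1, +1}
stumps, alphas = fit_adaboost(X_demo, y_demo)
train_acc = np.mean(predict_adaboost(stumps, alphas, X_demo) == y_demo)
print(f"Training accuracy: {train_acc:.3f}")
```

The error is clipped away from 0 and 1 only to keep the logarithm finite in this sketch; a production implementation would handle a perfect or useless weak classifier explicitly.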

### Step-by-Step Python Implementation

We'll use the `scikit-learn` library in Python to implement AdaBoost. `scikit-learn` provides a convenient way to use AdaBoost with its `AdaBoostClassifier` class.

1. **Install scikit-learn**: If you haven't installed it yet, you can do so using pip:
   ```bash
   pip install scikit-learn
   ```

2. **Import Libraries**: Start by importing the necessary libraries.
   ```python
   from sklearn.ensemble import AdaBoostClassifier
   from sklearn.tree import DecisionTreeClassifier
   from sklearn.datasets import make_classification
   from sklearn.model_selection import train_test_split
   from sklearn.metrics import accuracy_score
   ```

3. **Create Dataset**: For simplicity, we'll create a synthetic dataset.
   ```python
   # Create synthetic dataset
   X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, n_redundant=10, random_state=42)
   
   # Split the dataset into training and testing sets
   X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
   ```

4. **Train AdaBoost Classifier**: We'll use a decision tree as the weak classifier.
   ```python
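   # Initialize the base classifier: a depth-1 decision tree (a "decision stump")
   base_clf = DecisionTreeClassifier(max_depth=1)
   
   # Initialize the AdaBoost classifier
   # Note: the parameter is `estimator` in scikit-learn >= 1.2; older releases call it `base_estimator`
   ada_clf = AdaBoostClassifier(estimator=base_clf, n_estimators=50, learning_rate=1.0, random_state=42)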
   
   # Train the AdaBoost classifier
   ada_clf.fit(X_train, y_train)
   ```

5. **Make Predictions**: Use the trained model to make predictions on the test set.
   ```python
   # Make predictions on the test set
   y_pred = ada_clf.predict(X_test)
   
   # Calculate accuracy
   accuracy = accuracy_score(y_test, y_pred)
   print(f'Accuracy: {accuracy * 100:.2f}%')
   ```
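
To see how accuracy evolves as weak classifiers are added, one option is `staged_score`, which yields the score after each boosting round. The sketch below is self-contained and rebuilds the same synthetic dataset; it relies on the default weak learner (a depth-1 decision tree) rather than passing one explicitly, and the variable names `clf` and `staged_acc` are chosen here for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Same synthetic dataset and split as in the steps above
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           n_redundant=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# The default weak learner is a depth-1 decision tree, so none is passed here
clf = AdaBoostClassifier(n_estimators=50, random_state=42)
clf.fit(X_train, y_train)

# staged_score yields the test accuracy after each boosting round
staged_acc = list(clf.staged_score(X_test, y_test))
for n, acc in enumerate(staged_acc, start=1):
    if n == 1 or n % 10 == 0:
        print(f"{n:>2} estimators: {acc:.3f}")
```

If the curve flattens early, fewer estimators may be enough; if it is still rising at the last round, a larger `n_estimators` is worth trying.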

### Key Parameters of AdaBoost

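- `estimator` (named `base_estimator` before scikit-learn 1.2): The weak classifier used in AdaBoost. Here, we use a `DecisionTreeClassifier` with `max_depth=1`.
- `n_estimators`: The maximum number of weak classifiers to train. Boosting stops early if the training data is fit perfectly.
- `learning_rate`: The weight applied to each weak classifier's contribution at each boosting round. Lower values shrink each contribution, so more estimators are usually needed to reach the same fit; this is the trade-off between `learning_rate` and `n_estimators`.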
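
The trade-off between `learning_rate` and `n_estimators` can be explored with a small comparison. This is a rough sketch on the same synthetic dataset; the three settings are arbitrary choices for illustration, and the resulting accuracies depend on the data and the scikit-learn version, so no particular outcome is claimed here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           n_redundant=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# (learning_rate, n_estimators) pairs chosen only for illustration
settings = [(1.0, 50), (0.1, 50), (0.1, 200)]

results = {}
for learning_rate, n_estimators in settings:
    clf = AdaBoostClassifier(n_estimators=n_estimators,
                             learning_rate=learning_rate,
                             random_state=42)
    clf.fit(X_train, y_train)
    # Store test accuracy for this combination
    results[(learning_rate, n_estimators)] = clf.score(X_test, y_test)

for (learning_rate, n_estimators), acc in results.items():
    print(f"learning_rate={learning_rate}, n_estimators={n_estimators}: {acc:.3f}")
```

For a more reliable comparison, cross-validation over a grid of both parameters is a common next step, since a single train/test split can be noisy.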
