## Example 1 (AdaBoost)

Given the following data:

| x    | y    | Weight |
|------|------|--------|
| 0.1  | 1    | 0.1    |
| 0.2  | 1    | 0.1    |
| 0.3  | 1    | 0.1    |
| 0.4  | -1   | 0.1    |
| 0.5  | -1   | 0.1    |
| 0.6  | -1   | 0.1    |
| 0.7  | -1   | 0.1    |
| 0.8  | 1    | 0.1    |
| 0.9  | 1    | 0.1    |
| 1    | 1    | 0.1    |

## **Iteration 1**
Step 1: Initialize weights.
Initially, each sample has equal weight: 1/N = 1/10 = 0.1, where N = 10 is the number of samples.

Step 2: For each iteration:

### a. Train a weak learner: We'll use a decision stump as our weak learner, which is essentially a decision tree with a single split.

Let's say for the first iteration, the decision stump splits the data into two parts based on a threshold:

```
Threshold = 0.45
Predictions:
For x < 0.45: Predict 1
For x >= 0.45: Predict -1
```
**Before Prediction:**
| x    | y    | Weight |
|------|------|--------|
| 0.1  | 1    | 0.1    |
| 0.2  | 1    | 0.1    |
| 0.3  | 1    | 0.1    |
| 0.4  | -1   | 0.1    |
| 0.5  | -1   | 0.1    |
| 0.6  | -1   | 0.1    |
| 0.7  | -1   | 0.1    |
| 0.8  | 1    | 0.1    |
| 0.9  | 1    | 0.1    |
| 1    | 1    | 0.1    |

**After Prediction:**

| x    | y    | Weight | Prediction |
|------|------|--------|------------|
| 0.1  | 1    | 0.1    | 1          |
| 0.2  | 1    | 0.1    | 1          |
| 0.3  | 1    | 0.1    | 1          |
| 0.4  | -1   | 0.1    | 1          |
| 0.5  | -1   | 0.1    | -1         |
| 0.6  | -1   | 0.1    | -1         |
| 0.7  | -1   | 0.1    | -1         |
| 0.8  | 1    | 0.1    | -1         |
| 0.9  | 1    | 0.1    | -1         |
| 1    | 1    | 0.1    | -1         |

In the "After Prediction" table, there are four misclassified samples:

1. For x = 0.4, the true label is -1, but the prediction is 1.
2. For x = 0.8, the true label is 1, but the prediction is -1.
3. For x = 0.9, the true label is 1, but the prediction is -1.
4. For x = 1, the true label is 1, but the prediction is -1.

These samples are misclassified because the stump's prediction for their side of the threshold does not match their true label.
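
The stump and the misclassification check above can be reproduced in a few lines of Python. This is a minimal sketch: `stump_predict` is a hypothetical helper, and the stump follows the rule stated above (predict 1 below the threshold, -1 otherwise).

```python
# Training data from the table above
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
ys = [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]


def stump_predict(x, threshold, left=1, right=-1):
    """Decision stump: predict `left` when x < threshold, otherwise `right`."""
    return left if x < threshold else right


preds = [stump_predict(x, 0.45) for x in xs]
misclassified = [x for x, y, p in zip(xs, ys, preds) if p != y]
print(misclassified)  # [0.4, 0.8, 0.9, 1.0]
```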

### b. Compute error: The error is the weighted fraction of misclassified samples.

To calculate the error, we sum the weights of the misclassified samples and then divide by the sum of all weights. 

```
Error = (Sum of weights of misclassified samples) / (Sum of all weights)
```

From the "Before Prediction" table:
- Weights of misclassified samples: 0.1 (for x = 0.4), 0.1 (for x = 0.8), 0.1 (for x = 0.9), 0.1 (for x = 1)
- Sum of weights of misclassified samples: 0.1 + 0.1 + 0.1 + 0.1 = 0.4

Sum of all weights: 1.0

```
Error = 0.4 / 1.0 = 0.4
```

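So the weighted error is 0.4, or 40%. Because every sample currently has the same weight, this equals the fraction of misclassified samples (4 of 10) for the threshold of 0.45; once the weights become unequal, the error measures the share of the total weight that is misclassified rather than the share of samples.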
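
A minimal sketch of the weighted error in Python, assuming the labels, weights, and stump from this iteration (`weighted_error` is a hypothetical helper name). Dividing by the total weight keeps the formula valid even if the weights do not sum exactly to 1.

```python
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
ys = [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]
weights = [0.1] * len(xs)


def weighted_error(preds, labels, weights):
    """Sum of the weights of misclassified samples divided by the sum of all weights."""
    wrong = sum(w for p, y, w in zip(preds, labels, weights) if p != y)
    return wrong / sum(weights)


# Stump from iteration 1: predict 1 for x < 0.45, otherwise -1
preds = [1 if x < 0.45 else -1 for x in xs]
err = weighted_error(preds, ys, weights)
print(round(err, 4))  # 0.4
```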

### c. Compute weight of the weak learner: We use the error to compute the weight of the weak learner in the final model.

The weight of the weak learner, often written as alpha (α), determines how much its vote counts in the final model. It is computed from the weak learner's error:

$ \text{Weight of weak learner} = \frac{1}{2} \ln\left(\frac{1 - \text{error}}{\text{error}}\right) $

Given that the error is 0.4 (as calculated previously):

$ \text{Weight of weak learner} = \frac{1}{2} \ln\left(\frac{1 - 0.4}{0.4}\right) $

$ \text{Weight of weak learner} = \frac{1}{2} \ln\left(\frac{0.6}{0.4}\right) $

$ \text{Weight of weak learner} = \frac{1}{2} \ln(1.5) $

$ \text{Weight of weak learner} = \frac{1}{2} \times 0.4055 $

$ \text{Weight of weak learner} = 0.2028 $

So, the weight of the weak learner is approximately 0.2028.
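
The formula can be checked with a short Python sketch (`learner_weight` is a hypothetical helper, not a library function). It prints the unrounded value, about 0.2027; the 0.2028 above comes from halving the rounded ln(1.5) ≈ 0.4055.

```python
import math


def learner_weight(error):
    """AdaBoost weight (alpha) of a weak learner: 0.5 * ln((1 - error) / error).

    The result is positive when error < 0.5 and zero when error == 0.5.
    """
    if not 0 < error < 1:
        raise ValueError("error must lie strictly between 0 and 1")
    return 0.5 * math.log((1 - error) / error)


alpha = learner_weight(0.4)
print(round(alpha, 4))  # 0.2027
```

The guard rejects an error of exactly 0 or 1, where the logarithm is undefined.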

### d. Update weights: We update the weights of the samples, giving higher weight to misclassified samples.

To update the weights of the samples, we use the following formulas:

For correctly classified samples:
$ \text{New weight}_i = \text{Old weight}_i \times e^{-\text{weight of weak learner}} $

For misclassified samples:
$ \text{New weight}_i = \text{Old weight}_i \times e^{\text{weight of weak learner}} $

Then we normalize: each new weight is divided by the sum of all new weights, so that the weights again sum to 1.

In this iteration, x = 0.1, 0.2, 0.3, 0.5, 0.6, and 0.7 are classified correctly, and x = 0.4, 0.8, 0.9, and 1 are misclassified:

| x    | y    | Weight | Weight (Updated)            | Weight (Normalized) |
|------|------|--------|-----------------------------|---------------------|
| 0.1  | 1    | 0.1    | 0.1 * exp(-0.2028) ≈ 0.0816 | 0.0833              |
| 0.2  | 1    | 0.1    | 0.1 * exp(-0.2028) ≈ 0.0816 | 0.0833              |
| 0.3  | 1    | 0.1    | 0.1 * exp(-0.2028) ≈ 0.0816 | 0.0833              |
| 0.4  | -1   | 0.1    | 0.1 * exp(0.2028) ≈ 0.1225  | 0.1250              |
| 0.5  | -1   | 0.1    | 0.1 * exp(-0.2028) ≈ 0.0816 | 0.0833              |
| 0.6  | -1   | 0.1    | 0.1 * exp(-0.2028) ≈ 0.0816 | 0.0833              |
| 0.7  | -1   | 0.1    | 0.1 * exp(-0.2028) ≈ 0.0816 | 0.0833              |
| 0.8  | 1    | 0.1    | 0.1 * exp(0.2028) ≈ 0.1225  | 0.1250              |
| 0.9  | 1    | 0.1    | 0.1 * exp(0.2028) ≈ 0.1225  | 0.1250              |
| 1    | 1    | 0.1    | 0.1 * exp(0.2028) ≈ 0.1225  | 0.1250              |

The updated weights sum to about 0.9798, so each of them is divided by 0.9798. After normalization, the four misclassified samples carry half of the total weight (4 × 0.125 = 0.5), which pushes the next weak learner to focus on them.

## **Iteration 2**
For the second iteration of AdaBoost, we follow the same steps as before:

1. Train a weak learner.
2. Compute the error.
3. Compute the weight of the weak learner.
4. Update and normalize the weights of the samples.

We continue with the normalized weights from the first iteration. From here on, all values are computed from unrounded intermediate results and then rounded to four decimals, so recomputing from the rounded numbers can differ in the last digit.

**Updated Weights from First Iteration:**

| x    | y    | Weight (Updated) |
|------|------|------------------|
| 0.1  | 1    | 0.0833           |
| 0.2  | 1    | 0.0833           |
| 0.3  | 1    | 0.0833           |
| 0.4  | -1   | 0.1250           |
| 0.5  | -1   | 0.0833           |
| 0.6  | -1   | 0.0833           |
| 0.7  | -1   | 0.0833           |
| 0.8  | 1    | 0.1250           |
| 0.9  | 1    | 0.1250           |
| 1    | 1    | 0.1250           |

**Weak Learner for Second Iteration:**
Let's say we choose the split at x = 0.25, predicting 1 for x < 0.25 and -1 for x ≥ 0.25. This stump misclassifies x = 0.3, 0.8, 0.9, and 1: all four have true label 1 but are predicted -1.

**Error Calculation:**
```
Error = (Sum of weights of misclassified samples) / (Sum of all weights)
```
From the updated weights:
- Weights of misclassified samples: 0.0833 (for x = 0.3), 0.1250 (for x = 0.8), 0.1250 (for x = 0.9), 0.1250 (for x = 1)
- Sum of weights of misclassified samples: 0.0833 + 0.1250 + 0.1250 + 0.1250 = 0.4583

Sum of all weights: 1.0

```
Error = 0.4583 / 1.0 = 0.4583
```

**Weight of the Weak Learner:**
```
Weight of weak learner = 0.5 * ln((1 - error) / error)
Weight of weak learner = 0.5 * ln((1 - 0.4583) / 0.4583)
Weight of weak learner ≈ 0.0835
```

This stump is only slightly better than random guessing (its error is close to 0.5), so it receives a small weight.

**Updating Weights:**

For correctly classified samples:
$ \text{New weight}_i = \text{Old weight}_i \times e^{-\text{weight of weak learner}} $

For misclassified samples:
$ \text{New weight}_i = \text{Old weight}_i \times e^{\text{weight of weak learner}} $

After the update, the weights are normalized again:

| x    | y    | Weight (First Iteration) | Weight (Updated)               | Weight (Normalized) |
|------|------|--------------------------|--------------------------------|---------------------|
| 0.1  | 1    | 0.0833                   | 0.0833 * exp(-0.0835) ≈ 0.0767 | 0.0769              |
| 0.2  | 1    | 0.0833                   | 0.0833 * exp(-0.0835) ≈ 0.0767 | 0.0769              |
| 0.3  | 1    | 0.0833                   | 0.0833 * exp(0.0835) ≈ 0.0906  | 0.0909              |
| 0.4  | -1   | 0.1250                   | 0.1250 * exp(-0.0835) ≈ 0.1150 | 0.1154              |
| 0.5  | -1   | 0.0833                   | 0.0833 * exp(-0.0835) ≈ 0.0767 | 0.0769              |
| 0.6  | -1   | 0.0833                   | 0.0833 * exp(-0.0835) ≈ 0.0767 | 0.0769              |
| 0.7  | -1   | 0.0833                   | 0.0833 * exp(-0.0835) ≈ 0.0767 | 0.0769              |
| 0.8  | 1    | 0.1250                   | 0.1250 * exp(0.0835) ≈ 0.1359  | 0.1364              |
| 0.9  | 1    | 0.1250                   | 0.1250 * exp(0.0835) ≈ 0.1359  | 0.1364              |
| 1    | 1    | 0.1250                   | 0.1250 * exp(0.0835) ≈ 0.1359  | 0.1364              |

The updated weights sum to about 0.9965, which is the divisor used for normalization.

## **Iteration 3**

**Weak Learner for Third Iteration:**
Let's say we choose the split at x = 0.6. If this stump predicted 1 for x < 0.6 and -1 for x ≥ 0.6, as in the previous iterations, it would misclassify x = 0.4, 0.5, 0.8, 0.9, and 1, whose weights sum to about 0.60. An error above 0.5 is worse than random guessing and would give the weak learner a negative weight, so we flip the stump instead: predict -1 for x < 0.6 and 1 for x ≥ 0.6. The flipped stump misclassifies x = 0.1, 0.2, and 0.3 (true label 1, predicted -1) and x = 0.6 and 0.7 (true label -1, predicted 1).

**Error Calculation:**
```
Error = (Sum of weights of misclassified samples) / (Sum of all weights)
```
From the updated weights:
- Weights of misclassified samples: 0.0769 (for x = 0.1), 0.0769 (for x = 0.2), 0.0909 (for x = 0.3), 0.0769 (for x = 0.6), 0.0769 (for x = 0.7)
- Sum of weights of misclassified samples: 0.0769 + 0.0769 + 0.0909 + 0.0769 + 0.0769 ≈ 0.3986

Sum of all weights: 1.0

```
Error = 0.3986 / 1.0 = 0.3986
```

**Weight of the Weak Learner:**
```
Weight of weak learner = 0.5 * ln((1 - error) / error)
Weight of weak learner = 0.5 * ln((1 - 0.3986) / 0.3986)
Weight of weak learner ≈ 0.2056
```

**Updating Weights:**

For correctly classified samples:
$ \text{New weight}_i = \text{Old weight}_i \times e^{-\text{weight of weak learner}} $

For misclassified samples:
$ \text{New weight}_i = \text{Old weight}_i \times e^{\text{weight of weak learner}} $

The table shows the weights before and after the third iteration:

| x    | y    | Weight (Second Iteration) | Weight (Updated)               | Weight (Normalized) |
|------|------|---------------------------|--------------------------------|---------------------|
| 0.1  | 1    | 0.0769                    | 0.0769 * exp(0.2056) ≈ 0.0945  | 0.0965              |
| 0.2  | 1    | 0.0769                    | 0.0769 * exp(0.2056) ≈ 0.0945  | 0.0965              |
| 0.3  | 1    | 0.0909                    | 0.0909 * exp(0.2056) ≈ 0.1117  | 0.1140              |
| 0.4  | -1   | 0.1154                    | 0.1154 * exp(-0.2056) ≈ 0.0939 | 0.0959              |
| 0.5  | -1   | 0.0769                    | 0.0769 * exp(-0.2056) ≈ 0.0626 | 0.0640              |
| 0.6  | -1   | 0.0769                    | 0.0769 * exp(0.2056) ≈ 0.0945  | 0.0965              |
| 0.7  | -1   | 0.0769                    | 0.0769 * exp(0.2056) ≈ 0.0945  | 0.0965              |
| 0.8  | 1    | 0.1364                    | 0.1364 * exp(-0.2056) ≈ 0.1110 | 0.1134              |
| 0.9  | 1    | 0.1364                    | 0.1364 * exp(-0.2056) ≈ 0.1110 | 0.1134              |
| 1    | 1    | 0.1364                    | 0.1364 * exp(-0.2056) ≈ 0.1110 | 0.1134              |

The updated weights sum to about 0.9792, which is the divisor used for normalization.

## **Summary**

Here's the summary of the AdaBoost iterations:

| Round | Split Point | Left Class | Right Class | Alpha   |
|-------|-------------|------------|-------------|---------|
| 1     | 0.45        | 1          | -1          | 0.2028  |
| 2     | 0.25        | 1          | -1          | 0.0835  |
| 3     | 0.6         | -1         | 1           | 0.2056  |

Left Class is the prediction for x below the split point and Right Class the prediction for x at or above it. The boosted classifier combines the stumps by a weighted vote: it predicts the sign of $\sum_t \alpha_t h_t(x)$, where $h_t(x)$ is the prediction of the stump from round $t$ and $\alpha_t$ is its Alpha.
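
The three rounds can be replayed in code. The sketch below is a minimal, simplified version of AdaBoost with decision stumps, under stated assumptions: it takes the split points 0.45, 0.25, and 0.6 as given (a full implementation would also search for the best split), picks for each split whichever class assignment gives the lower weighted error, and normalizes the weights after every update. The helper names `stump`, `adaboost`, and `predict` are hypothetical.

```python
import math

# Training data from Example 1
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
ys = [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]


def stump(threshold, left):
    """Decision stump: predict `left` for x < threshold and -left otherwise."""
    return lambda x: left if x < threshold else -left


def weighted_error(h, weights):
    """Total weight of the samples that h misclassifies (weights are assumed to sum to 1)."""
    return sum(w for x, y, w in zip(xs, ys, weights) if h(x) != y)


def adaboost(thresholds):
    """Run one AdaBoost round per given threshold; return [(threshold, left, alpha)] and final weights."""
    weights = [1 / len(xs)] * len(xs)
    rounds = []
    for t in thresholds:
        # Pick the class assignment (polarity) with the lower weighted error at this split
        left = min((1, -1), key=lambda s: weighted_error(stump(t, s), weights))
        h = stump(t, left)
        err = weighted_error(h, weights)
        alpha = 0.5 * math.log((1 - err) / err)
        # y * h(x) is +1 when correct (weight shrinks) and -1 when wrong (weight grows)
        weights = [w * math.exp(-alpha * y * h(x)) for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]  # normalize so the weights sum to 1
        rounds.append((t, left, alpha))
    return rounds, weights


def predict(rounds, x):
    """Boosted prediction: the sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump(t, left)(x) for t, left, alpha in rounds)
    return 1 if score >= 0 else -1


rounds, final_weights = adaboost([0.45, 0.25, 0.6])
for t, left, alpha in rounds:
    print(f"split={t} left={left} right={-left} alpha={alpha:.4f}")
# split=0.45 left=1 right=-1 alpha=0.2027
# split=0.25 left=1 right=-1 alpha=0.0835
# split=0.6 left=-1 right=1 alpha=0.2056
```

Calling `[predict(rounds, x) for x in xs]` shows that these three stumps still misclassify four of the ten training points (x = 0.3, 0.8, 0.9, and 1), so more rounds would be needed to fit this data.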

## Example 2 (AdaBoost)

<img src="images/adaboost_ex1-1.png" width="100%">

<img src="images/adaboost_ex1-2.png" width="100%">

<img src="images/adaboost_ex1-3.png" width="100%">

<img src="images/adaboost_ex1-4.png" width="100%">

<img src="images/adaboost_ex1-5.png" width="100%">

<img src="images/adaboost_ex1-6.png" width="100%">

<img src="images/adaboost_ex1-7.png" width="100%">

<img src="images/adaboost_ex1-8.png" width="100%">

<img src="images/adaboost_ex1-9.png" width="100%">

<img src="images/adaboost_ex1-10.png" width="100%">

<img src="images/adaboost_ex1-11.png" width="100%">

<img src="images/adaboost_ex1-12.png" width="100%">

<img src="images/adaboost_ex1-13.png" width="100%">

<img src="images/adaboost_ex1-14.png" width="100%">

<img src="images/adaboost_ex1-15.png" width="100%">

<img src="images/adaboost_ex1-16.png" width="100%">

<img src="images/adaboost_ex1-17.png" width="100%">

<img src="images/adaboost_ex1-18.png" width="100%">

<img src="images/adaboost_ex1-19.png" width="100%">

<img src="images/adaboost_ex1-20.png" width="100%">

<img src="images/adaboost_ex1-21.png" width="100%">

<img src="images/adaboost_ex1-22.png" width="100%">

<img src="images/adaboost_ex1-23.png" width="100%">

## Example (Gradient Boosting)

The table below contains sample data with Height, Age, and Gender as input variables and Weight as the output (target) variable.

| Height | Age | Gender | Weight |
|--------|-----|--------|--------|
| 5.4    | 28  | Male   | 88     |
| 5.2    | 26  | Female | 76     |
| 5      | 28  | Female | 56     |
| 5.6    | 25  | Male   | 73     |
| 6      | 25  | Male   | 77     |
| 4      | 22  | Female | 57     |

## Solution

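If we use the average weight of all samples as our initial guess, the initial prediction (the root node) is (88 + 76 + 56 + 73 + 77 + 57) / 6 = 427 / 6 ≈ 71.2.

Step 1: Build the initial model.
Create a root node that predicts the initial guess for every sample, then compute the pseudo residuals: Weight minus Predicted Weight. For the squared-error loss $\frac{1}{2}(y - \hat{y})^2$, the pseudo residual, defined as the negative gradient of the loss with respect to the prediction, is exactly this difference.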

| Height | Age | Gender | Weight | Predicted Weight 1 | Pseudo Residuals 1 |
|--------|-----|--------|--------|--------------------|--------------------|
| 5.4    | 28  | Male   | 88     | 71.2               | 88 - 71.2 = 16.8  |
| 5.2    | 26  | Female | 76     | 71.2               | 76 - 71.2 = 4.8   |
| 5      | 28  | Female | 56     | 71.2               | 56 - 71.2 = -15.2 |
| 5.6    | 25  | Male   | 73     | 71.2               | 73 - 71.2 = 1.8   |
| 6      | 25  | Male   | 77     | 71.2               | 77 - 71.2 = 5.8   |
| 4      | 22  | Female | 57     | 71.2               | 57 - 71.2 = -14.2 |
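
The initial prediction and the first pseudo residuals can be checked with plain Python. This minimal sketch rounds the mean to one decimal, as the example does; the unrounded mean is about 71.17.

```python
# Target values (Weight) from the table above
actual = [88, 76, 56, 73, 77, 57]

# Initial prediction: the mean of the targets, rounded to one decimal as in the example
initial = round(sum(actual) / len(actual), 1)

# Pseudo residuals for squared-error loss: actual minus predicted
residuals = [round(a - initial, 1) for a in actual]

print(initial)    # 71.2
print(residuals)  # [16.8, 4.8, -15.2, 1.8, 5.8, -14.2]
```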

Step 2: Build a new tree that predicts these pseudo residuals from the input variables (Height, Age, and Gender).

<img src="Images/gradient-boost-1.png" width="100%">

Step 3: Scale the output of each tree by a learning rate (here 0.1) and add it to the previous prediction. We start with the initial prediction of 71.2, run each sample down the new tree, and add 0.1 times the value of the leaf it lands in (the average residual of the samples in that leaf).
  
| Height | Age | Gender | Weight | Predicted weight 2       |
|--------|-----|--------|--------|-----------------------|
| 5.4    | 28  | Male   | 88     | 71.2+0.1*16.8=72.9    |       
| 5.2    | 26  | Female | 76     | 71.2+0.1*(-5.2)=70.7  |            
| 5      | 28  | Female | 56     | 71.2+0.1*(-5.2)=70.7  |            
| 5.6    | 25  | Male   | 73     | 71.2+0.1*3.8=71.6     |         
| 6      | 25  | Male   | 77     | 71.2+0.1*3.8=71.6     |         
| 4      | 22  | Female | 57     | 71.2+0.1*(-14.2)=69.8 |
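
Because the first tree itself appears only in the image, this sketch assumes its leaf assignment from the table: rows 2 and 3 share a leaf (their residuals 4.8 and -15.2 average to -5.2), rows 4 and 5 share a leaf (1.8 and 5.8 average to 3.8), and rows 1 and 6 each sit in their own leaf. `leaf_of` and `leaf_values` are hypothetical names used only for illustration.

```python
# Targets and the initial prediction from the example
actual = [88, 76, 56, 73, 77, 57]
initial = 71.2
learning_rate = 0.1

# Assumption: leaf index of each row in the first tree, inferred from the table
leaf_of = [0, 1, 1, 2, 2, 3]


def leaf_values(residuals, leaf_of):
    """Each leaf predicts the mean residual of the rows that fall into it."""
    groups = {}
    for leaf, r in zip(leaf_of, residuals):
        groups.setdefault(leaf, []).append(r)
    return {leaf: sum(rs) / len(rs) for leaf, rs in groups.items()}


residuals = [a - initial for a in actual]
values = leaf_values(residuals, leaf_of)
# New prediction = previous prediction + learning rate * leaf value
pred2 = [initial + learning_rate * values[leaf] for leaf in leaf_of]

print({leaf: round(v, 1) for leaf, v in values.items()})  # {0: 16.8, 1: -5.2, 2: 3.8, 3: -14.2}
print([round(p, 1) for p in pred2])  # [72.9, 70.7, 70.7, 71.6, 71.6, 69.8]
```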

Compared with the initial average, the new predicted weights are slightly closer to the actual weights overall. To improve them further, we compute new pseudo residuals from the latest predictions and build another tree to predict them.

| Height | Age | Gender | Weight | Predicted weight 2 | Pseudo Residuals 2 |
|--------|-----|--------|--------|--------------------|--------------------|
| 5.4    | 28  | Male   | 88     | 72.9               | 88 - 72.9 = 15.1   |
| 5.2    | 26  | Female | 76     | 70.7               | 76 - 70.7 = 5.3    |
| 5      | 28  | Female | 56     | 70.7               | 56 - 70.7 = -14.7  |
| 5.6    | 25  | Male   | 73     | 71.6               | 73 - 71.6 = 1.4    |
| 6      | 25  | Male   | 77     | 71.6               | 77 - 71.6 = 5.4    |
| 4      | 22  | Female | 57     | 69.8               | 57 - 69.8 = -12.8  |

Again, we build a new tree to predict the new pseudo residuals.

<img src="Images/gradient-boost-1.png" width="100%">

Next, we combine the new tree with all the previous trees to predict the new weights: we start from the initial prediction, add the scaled output of the first tree, and then add the scaled output of the new tree.

| Height | Age | Gender | Weight | Predicted weight 3                   |
|--------|-----|--------|--------|--------------------------------------|
| 5.4    | 28  | Male   | 88     | 71.2+0.1*16.8+0.1*15.1 = 74.4        |       
| 5.2    | 26  | Female | 76     | 71.2+0.1*(-5.2)+0.1*(-4.7) = 70.2    |            
| 5      | 28  | Female | 56     | 71.2+0.1*(-5.2)+0.1*(-4.7) = 70.2    |            
| 5.6    | 25  | Male   | 73     | 71.2+0.1*3.8+0.1*3.4=71.9            |         
| 6      | 25  | Male   | 77     | 71.2+0.1*3.8+0.1*3.4=71.9            |         
| 4      | 22  | Female | 57     | 71.2+0.1*(-14.2)+0.1*(-12.8) = 68.5  |

The new predicted weights improve further overall. We again calculate the pseudo residuals and build another tree in the same way, repeating these steps until a new tree no longer reduces the residuals meaningfully or until a maximum number of trees has been built.
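
The repeat-until-stop loop can be sketched as follows. To keep it self-contained, the sketch assumes every tree puts the rows into the same four leaves as in the tables above (`fit_fixed_tree` is a hypothetical stand-in); a real implementation would fit a fresh regression tree on Height, Age, and Gender in each round, for example with scikit-learn's `DecisionTreeRegressor`. It starts from the unrounded mean (about 71.17) and stops when a new tree lowers the mean squared error by less than `tol` or when `max_trees` trees have been built; both limits are assumed values.

```python
# Targets (Weight) from the example
actual = [88, 76, 56, 73, 77, 57]

# Assumption: every tree splits the rows into the same four leaves, as the
# leaf values in the two tables above imply; a real tree could split differently.
leaf_of = [0, 1, 1, 2, 2, 3]


def fit_fixed_tree(residuals):
    """Stand-in for a regression tree: each leaf predicts the mean residual of its rows."""
    sums, counts = {}, {}
    for leaf, r in zip(leaf_of, residuals):
        sums[leaf] = sums.get(leaf, 0.0) + r
        counts[leaf] = counts.get(leaf, 0) + 1
    return [sums[leaf] / counts[leaf] for leaf in leaf_of]


def mse(pred):
    """Mean squared error of the predictions against the targets."""
    return sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual)


def gradient_boost(learning_rate=0.1, max_trees=100, tol=1e-3):
    """Add scaled trees until max_trees is reached or a tree lowers the MSE by less than tol."""
    pred = [sum(actual) / len(actual)] * len(actual)  # initial guess: the mean
    history = [mse(pred)]
    for _ in range(max_trees):
        residuals = [a - p for a, p in zip(actual, pred)]  # pseudo residuals
        tree_out = fit_fixed_tree(residuals)
        pred = [p + learning_rate * t for p, t in zip(pred, tree_out)]
        history.append(mse(pred))
        if history[-2] - history[-1] < tol:  # the new tree barely helped: stop
            break
    return pred, history


pred, history = gradient_boost()
print(f"trees built: {len(history) - 1}")
print([round(p, 1) for p in pred])
```

Because the leaves never change in this sketch, the predictions can only approach each leaf's average target (for example 66 for the two rows with weights 76 and 56), so the improvement per tree shrinks until the tolerance check ends the loop.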

So the final model is the initial prediction plus the learning-rate-scaled outputs of all the trees:

<img src="Images/gradient-boost-3.png" width="100%">
