<h1 align=middle style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Random Forest
</font>
</h1>

Ensemble learning is the practice of combining the predictions of several models (classifiers or regressors) and returning the aggregated result as the answer. A random forest is an ensemble of decision trees.

<h1 align=left style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Classification by Voting
</font>
</h1>

In [35]:
from sklearn.datasets import make_moons


X, y = make_moons(n_samples=200, noise=0.15)

In [36]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [37]:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier


log_clf = LogisticRegression()
ran_clf = RandomForestClassifier()
svm_clf = SVC()

voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', ran_clf), ('svc', svm_clf)], 
    voting='hard'  # majority vote on the predicted class labels
)

voting_clf.fit(X_train, y_train)

In [38]:
from sklearn.metrics import accuracy_score

for clf in (log_clf, ran_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(clf.__class__.__name__, round(accuracy, 4))

LogisticRegression 0.925
RandomForestClassifier 0.95
SVC 0.95
VotingClassifier 0.95


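A voting classifier often reaches a higher accuracy than the best of its members, especially when the members are diverse and make different kinds of errors. On this small test set (40 instances), it matches the best individual classifiers at 0.95.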
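Why a vote can help: under the idealized assumption that the classifiers make fully independent errors (never quite true for models trained on the same data), the number of correct votes follows a binomial distribution. The hypothetical helper `majority_vote_accuracy` below is a minimal sketch computing the probability that a strict majority of `n` such classifiers, each with accuracy `p`, is correct on a binary task.

```python
from math import comb


def majority_vote_accuracy(n_classifiers: int, p: float) -> float:
    """Probability that a strict majority of n independent classifiers,
    each correct with probability p, is correct (binary task, n odd)."""
    k_min = n_classifiers // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_classifiers, k) * p**k * (1 - p) ** (n_classifiers - k)
        for k in range(k_min, n_classifiers + 1)
    )


# Accuracy of the majority vote as more 60%-accurate classifiers are added
for n in (1, 3, 11, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

With three classifiers at 90% accuracy each, the vote is right with probability 0.972; the gain shrinks quickly as the classifiers' errors become correlated.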

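Setting the hyperparameter `probability=True` on the SVC enables its `predict_proba()` method, so every model in the ensemble can output class probabilities. The voting classifier can then use soft voting (`voting='soft'`): it averages the predicted class probabilities of all models and picks the class with the highest average.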
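A minimal sketch of soft voting on the same moons data. It assumes `random_state=42` everywhere for reproducibility (the cells above do not fix a seed), so its accuracy need not match the numbers shown above.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

soft_voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression()),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),  # enables predict_proba
    ],
    voting="soft",  # average class probabilities instead of counting votes
)
soft_voting_clf.fit(X_train, y_train)

# Averaged probabilities, one row per test instance and one column per class
y_proba = soft_voting_clf.predict_proba(X_test)
accuracy = accuracy_score(y_test, soft_voting_clf.predict(X_test))
print("Soft voting accuracy:", round(accuracy, 4))
```

Soft voting gives more weight to confident predictions, which is why it often performs slightly better than hard voting.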

<h1 align=left style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Bagging and Pasting
</font>
</h1>

In [40]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier



bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
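    n_estimators=500,   # number of trees in the ensemble
    max_samples=100,    # each tree is trained on 100 sampled instances
    bootstrap=True,     # sample with replacement (bagging); False gives pasting
    n_jobs=-1           # train the trees on all available CPU cores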
)

bag_clf.fit(X_train, y_train)

In [42]:
y_pred = bag_clf.predict(X_test)
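The only difference between bagging and pasting is whether the training subsets are sampled with replacement (`bootstrap=True`) or without (`bootstrap=False`). A minimal sketch comparing both on the moons data, assuming `random_state=42` for reproducibility and omitting `n_jobs` for simplicity:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}
for bootstrap in (True, False):  # True: bagging (with replacement), False: pasting
    clf = BaggingClassifier(
        DecisionTreeClassifier(),
        n_estimators=500,
        max_samples=100,  # each tree sees 100 of the 160 training instances
        bootstrap=bootstrap,
        random_state=42,
    )
    clf.fit(X_train, y_train)
    name = "bagging" if bootstrap else "pasting"
    results[name] = accuracy_score(y_test, clf.predict(X_test))

print(results)
```

Bootstrapping adds more diversity between the trees, which usually gives bagging slightly higher bias but lower variance than pasting.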

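With bootstrap sampling (`bootstrap=True`) and the default `max_samples=1.0`, each predictor draws as many instances as there are in the training set, with replacement. For a large training set, the probability that a given instance is never drawn approaches $1/e \approx 0.37$, so each predictor sees only about 63% of the training instances. The remaining ~37% are called its out-of-bag (OOB) instances.

Because a predictor never sees its OOB instances during training, they can serve as a validation set without setting aside a separate validation split. With `oob_score=True`, `BaggingClassifier` evaluates the ensemble on these instances after training and stores the result in `oob_score_`.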
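The 37% figure comes from $(1 - 1/m)^m \to 1/e$ as the training set size $m$ grows. A short sketch checking it both analytically and with one simulated bootstrap sample (the seed 42 is an arbitrary choice):

```python
import math

import numpy as np

# Probability that one particular instance is never drawn in m draws with replacement
for m in (10, 100, 1_000, 100_000):
    p_never_drawn = (1 - 1 / m) ** m
    print(m, round(p_never_drawn, 4))

print("limit 1/e:", round(math.exp(-1), 4))  # 0.3679

# Empirical check: draw one bootstrap sample of size m and count the unseen indices
rng = np.random.default_rng(42)
m = 10_000
sample = rng.integers(0, m, size=m)  # m draws with replacement
oob_fraction = 1 - len(np.unique(sample)) / m
print("empirical OOB fraction:", round(oob_fraction, 4))
```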

In [43]:
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), 
    n_estimators=500, 
    bootstrap=True, 
    n_jobs=-1, 
    oob_score=True  # score the ensemble on out-of-bag instances after training
)

bag_clf.fit(X_train, y_train)

In [44]:
bag_clf.oob_score_

0.95625

Let's compare this estimate with the accuracy on the test set:

In [46]:
from sklearn.metrics import accuracy_score


y_pred = bag_clf.predict(X_test)
accuracy_score(y_test, y_pred)

0.95

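The OOB score (0.956) is close to the test accuracy (0.95), so here it gives a good estimate of the ensemble's performance on unseen data.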

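The same sampling can be applied to features: `BaggingClassifier` supports this through the `max_features` and `bootstrap_features` hyperparameters, so that each predictor is trained on a random subset of the input features. Sampling both training instances and features is called the Random Patches method; keeping all training instances (`bootstrap=False`, `max_samples=1.0`) while sampling only features is called the Random Subspaces method.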
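A minimal Random Patches sketch on the iris data (the moons data has only two features, so iris with its four features is used instead). The specific values `max_samples=0.8` and `max_features=2` are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_iris, y_iris = iris["data"], iris["target"]  # 4 features

patches_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=0.8,           # sample 80% of the instances for each tree...
    bootstrap=True,
    max_features=2,            # ...and 2 of the 4 features
    bootstrap_features=False,  # features are drawn without replacement
    random_state=42,
)
patches_clf.fit(X_iris, y_iris)

# Indices of the features used by the first three trees
for features in patches_clf.estimators_features_[:3]:
    print(features)
```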

<h1 align=left style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Random Forest
</font>
</h1>

In [47]:
from sklearn.ensemble import RandomForestClassifier


rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1)
rnd_clf.fit(X_train, y_train)


y_pred_rnd = rnd_clf.predict(X_test)

In [48]:
accuracy_score(y_test, y_pred_rnd)

0.95

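The `BaggingClassifier` below is roughly similar to the `RandomForestClassifier` above, with one difference: `splitter='random'` makes each tree draw random split thresholds, whereas a random forest searches for the best split among a random subset of the features at each node (`max_features='sqrt'` by default for classification).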

In [49]:
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(splitter='random', max_leaf_nodes=16),  # random thresholds at each node
    n_estimators=500, 
    n_jobs=-1, 
    max_samples=1.0, 
    bootstrap=True
)

In [50]:
bag_clf.fit(X_train, y_train)

In [51]:
y_pred_bag = bag_clf.predict(X_test)
accuracy_score(y_test, y_pred_bag)

0.95
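A bagging ensemble that matches the random forest more closely keeps the default best-split search but restricts each node to a random subset of features with `max_features="sqrt"`. A minimal sketch, assuming `random_state=42` for reproducibility:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Bagged trees that search for the best split among sqrt(n_features) randomly
# chosen features at each node, as a random forest does
bag_rf_like = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt", max_leaf_nodes=16),
    n_estimators=500,  # bootstrap=True and max_samples=1.0 are the defaults
    random_state=42,
)
rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, random_state=42)

scores = {}
for clf in (bag_rf_like, rnd_clf):
    clf.fit(X_train, y_train)
    scores[clf.__class__.__name__] = accuracy_score(y_test, clf.predict(X_test))

print(scores)
```

`RandomForestClassifier` is still the more convenient and optimized choice; the bagging version only makes the relationship explicit.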

# Even more random

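We can make the trees even more random. When a regular decision tree (or a tree in a random forest) splits a node, it searches for the best possible threshold for each candidate feature. Extremely randomized trees instead draw a random threshold for each candidate feature and keep the best of these random splits, so the chosen split is not necessarily the optimal one.

A forest of such trees is called an Extremely Randomized Trees ensemble (Extra-Trees). It trades a little more bias for lower variance, and it is also faster to train, since searching for the best threshold is one of the most time-consuming parts of growing a tree.

In Scikit-Learn, the class is `ExtraTreesClassifier` (and `ExtraTreesRegressor` for regression).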
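`ExtraTreesClassifier` takes the same main hyperparameters as `RandomForestClassifier`. A minimal sketch comparing the two on the moons data, assuming `random_state=42` for reproducibility:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

forest_scores = {}
for clf in (
    ExtraTreesClassifier(n_estimators=500, max_leaf_nodes=16, random_state=42),
    RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, random_state=42),
):
    clf.fit(X_train, y_train)
    forest_scores[clf.__class__.__name__] = accuracy_score(y_test, clf.predict(X_test))

print(forest_scores)
```

Which of the two performs better depends on the data; comparing them with cross-validation is more reliable than a single 40-instance test set.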

# Feature importance

In [53]:
from sklearn.datasets import load_iris


iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, n_jobs=-1)
rnd_clf.fit(iris["data"], iris["target"])

In [54]:
for name, score in zip(iris["feature_names"], rnd_clf.feature_importances_):
    print(name, score)

sepal length (cm) 0.09211054351098631
sepal width (cm) 0.021515279332164565
petal length (cm) 0.40747698243967273
petal width (cm) 0.47889719471717634
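Scikit-Learn computes `feature_importances_` as the mean decrease in impurity brought by each feature, averaged over all trees and normalized so the importances sum to 1. A small sketch ranking the iris features, assuming `random_state=42` (the cell above does not fix a seed, so the exact scores will differ slightly):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, random_state=42)
rnd_clf.fit(iris["data"], iris["target"])

importances = rnd_clf.feature_importances_
order = np.argsort(importances)[::-1]  # feature indices, most important first
for idx in order:
    print(f"{iris['feature_names'][idx]:<20} {importances[idx]:.3f}")

print("sum:", round(importances.sum(), 6))  # normalized to 1
```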


<h1 align=left style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Boosting
</font>
</h1>

**Boosting** is a method in machine learning that aims to improve the accuracy of predictive models. It belongs to the ensemble techniques, which combine the predictions from multiple algorithms for enhanced performance.

### Key Concepts

- **Ensemble Method**: Boosting is a type of ensemble learning where multiple models are strategically used to solve the same problem.
- **Weak to Strong Learners**: The goal is to sequentially combine weak learners (models that are slightly better than random guessing) to form a strong, accurate predictor.
- **Sequential Learning**: Models are built in a sequence, with each new model focusing on the errors of its predecessor.
- **Weight Adjustment**: In AdaBoost, the weights of the training instances are adjusted so that each new model focuses more on the instances misclassified in earlier rounds. Gradient Boosting instead fits each new model to the residual errors of the previous ones.
- **Popular Algorithms**: AdaBoost (Adaptive Boosting) and Gradient Boosting are widely used boosting algorithms.
- **Applications**: Boosting can be used for both classification and regression tasks.

### Advantages and Disadvantages

- **Advantages**: Boosting can lead to very high accuracy and is effective in complex predictive problems.
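- **Disadvantages**: Because each model depends on the results of the previous ones, training is sequential and cannot be parallelized across models the way bagging can. Boosting can also overfit if too many models are used or the learning rate is too high.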

### Conclusion

Boosting is a powerful approach in machine learning, transforming a series of weak learners into a highly accurate collective model through an adaptive, sequential process.


## AdaBoostClassifier

The `AdaBoostClassifier` is a popular boosting algorithm in machine learning. It combines multiple weak learners, typically decision stumps (decision trees with `max_depth=1`, which is also Scikit-Learn's default base estimator), into a strong classifier. The key aspects of AdaBoost are:

### 1. Weighted Error Rate of the i'th Predictor

Every training instance starts with the same weight $1/m$, where $m$ is the number of training instances. Each weak learner is then assigned a weighted error rate, calculated from its performance on the weighted training instances. The error rate is given by:

$$
\text{error}_i = \frac{\sum \text{weights of misclassified instances}}{\sum \text{weights of all instances}}
$$

### 2. Predictor Weight

Based on the error rate, a weight is assigned to the predictor. This weight determines the influence of the predictor in the final decision. It is computed as:

$$
\text{weight}_i = \alpha_i = \eta \cdot \log{\frac{1 - \text{error}_i}{\text{error}_i}}
$$

where $\eta$ is the learning rate and $\log$ is the natural logarithm.
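The predictor-weight formula can be evaluated for a few error rates to see its behavior: a predictor better than random guessing (error below 0.5) gets a positive weight, a random one gets 0, and one worse than random gets a negative weight. `predictor_weight` is a hypothetical helper written for this illustration.

```python
import math


def predictor_weight(error_rate: float, learning_rate: float = 1.0) -> float:
    """alpha_i = eta * log((1 - error_i) / error_i), using the natural log."""
    return learning_rate * math.log((1 - error_rate) / error_rate)


for error_rate in (0.1, 0.3, 0.5, 0.7):
    print(error_rate, round(predictor_weight(error_rate), 4))
```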

### 3. Updating the Weights

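After calculating the predictor's weight, the algorithm updates the weights of the training instances so that the next predictor pays more attention to the instances this one got wrong. The update rule is:

$$
\text{new weight} = \text{weight} \cdot \exp(\alpha_i)
$$

for misclassified instances, while the weights of correctly classified instances stay the same. All instance weights are then normalized (divided by the sum of all weights), so correctly classified instances end up with a smaller share of the total weight. The next predictor is trained using these updated weights.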

### 4. Final Prediction Choice

The final prediction is made by combining the predictions of all the learners, weighted by their respective weights. The predicted class is the one that receives the highest weighted sum of votes:

$$
\hat{y}(\mathbf{x}) = \underset{k}{\operatorname{argmax}} \sum_{i \,:\, \hat{y}_i(\mathbf{x}) = k} \alpha_i
$$

where $\hat{y}_i(\mathbf{x})$ is the class predicted by the $i$-th predictor and $k$ ranges over the classes.
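Steps 1 to 4 can be put together in a minimal from-scratch sketch for binary classification with labels 0/1, using Scikit-Learn decision stumps trained with `sample_weight`. The helper names `adaboost_fit` and `adaboost_predict` are hypothetical, and this is not Scikit-Learn's implementation (which uses the SAMME algorithm and adds a $\log(K-1)$ term for $K$ classes).

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, n_estimators=50, learning_rate=1.0):
    """Binary AdaBoost following steps 1-4 above (labels 0/1, decision stumps)."""
    m = len(X)
    weights = np.full(m, 1 / m)  # every instance starts with weight 1/m
    predictors, alphas = [], []
    for _ in range(n_estimators):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=weights)
        misclassified = stump.predict(X) != y
        # Step 1: weighted error rate
        error = weights[misclassified].sum() / weights.sum()
        error = np.clip(error, 1e-10, 1 - 1e-10)  # avoid log(0) and division by zero
        # Step 2: predictor weight
        alpha = learning_rate * np.log((1 - error) / error)
        # Step 3: boost the misclassified instances, then normalize
        weights[misclassified] *= np.exp(alpha)
        weights /= weights.sum()
        predictors.append(stump)
        alphas.append(alpha)
    return predictors, np.array(alphas)


def adaboost_predict(predictors, alphas, X, n_classes=2):
    """Step 4: each predictor adds its alpha to the class it predicts; take the argmax."""
    votes = np.zeros((len(X), n_classes))
    for stump, alpha in zip(predictors, alphas):
        votes[np.arange(len(X)), stump.predict(X)] += alpha
    return votes.argmax(axis=1)


X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
predictors, alphas = adaboost_fit(X, y, n_estimators=50)
train_accuracy = (adaboost_predict(predictors, alphas, X) == y).mean()
print("training accuracy:", round(train_accuracy, 3))
```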

### Conclusion

The `AdaBoostClassifier` is an effective algorithm that iteratively focuses on the more challenging aspects of the training data, adjusting the weights of the learners and the training instances to improve its predictive accuracy over time.
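In practice, Scikit-Learn's `AdaBoostClassifier` can be used directly. A minimal sketch on the moons data, assuming `random_state=42` and the arbitrary choices `n_estimators=200` and `learning_rate=0.5` (the `learning_rate` hyperparameter scales each predictor's weight, playing the role of $\eta$):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # decision stumps as weak learners
    n_estimators=200,
    learning_rate=0.5,  # shrinks each predictor's contribution
    random_state=42,
)
ada_clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, ada_clf.predict(X_test))
print("AdaBoost accuracy:", round(accuracy, 4))
```

If the ensemble overfits, reducing `n_estimators` or the learning rate (or regularizing the base estimator) are the usual first steps.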
