#### About

> Random Forests

Random Forest is an ensemble learning method that combines many decision trees into a single, more robust and accurate predictive model. It became a popular machine learning technique in part because it handles both classification and regression tasks.

> Mathematics

A Random Forest is a collection of decision trees, where each tree is trained on a random sample of the training data. The key idea is to introduce randomness into the training process so that the individual trees make less correlated errors. When their predictions are combined, this reduces overfitting and improves the model's generalization performance.

This is achieved through three mechanisms:

1. Bootstrapped Sampling: For each tree, samples are drawn at random from the original training data with replacement, creating a new dataset of the same size as the original. This process is known as bootstrapped sampling. Each tree in the Random Forest is trained on its own bootstrapped dataset.

2. Feature Randomness: For each node split in a decision tree, only a random subset of features (for classification, typically about the square root of the total number of features) is considered as potential split candidates. This introduces further randomness and diversity among the trees in the forest.

3. Voting/Aggregation: Once all the trees are trained, the final prediction of a Random Forest is obtained by aggregating the predictions of individual trees. For classification, this is typically done by majority voting, while for regression, it can be done by taking the average or weighted average of the tree predictions.
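
The three steps above can be sketched by hand with plain decision trees. This is only an illustrative sketch, not how any particular library implements Random Forests: the names `n_trees`, `rng`, `trees`, and `y_vote` are made up for the example, 25 trees is an arbitrary choice, and `DecisionTreeClassifier(max_features="sqrt")` is used as a stand-in for per-split feature randomness.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

rng = np.random.default_rng(0)   # assumption: fixed seed for repeatability
n_trees = 25                     # assumption: arbitrary small forest
n_samples = X_train.shape[0]
n_classes = len(np.unique(y_train))

trees = []
for i in range(n_trees):
    # 1. Bootstrapped sampling: draw n_samples row indices with replacement.
    idx = rng.integers(0, n_samples, size=n_samples)
    # 2. Feature randomness: each split considers sqrt(n_features) features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# 3. Voting: collect every tree's prediction, then take the most common
#    class for each test sample.
all_preds = np.stack([t.predict(X_test) for t in trees])  # (n_trees, n_test)
y_vote = np.array(
    [np.bincount(col, minlength=n_classes).argmax() for col in all_preds.T]
)

print(f"Hand-rolled forest accuracy: {np.mean(y_vote == y_test):.3f}")
```

Because each tree sees a different bootstrap sample and a different subset of features at each split, the trees disagree on some inputs, and the vote smooths out those individual errors.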





In [1]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

In [2]:
# Load the Iris dataset: 150 samples, 4 features, 3 classes
iris = load_iris()
X = iris.data    # feature matrix
y = iris.target  # class labels


In [3]:
# Hold out 20% of the samples for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [4]:
# Train a forest of 100 trees on the training split
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)


In [5]:
y_pred = clf.predict(X_test)


In [6]:
# Fraction of test samples whose predicted class matches the true class
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')

Accuracy: 1.0
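
To connect the fitted model back to the aggregation step, the sketch below rebuilds the forest's prediction from its individual trees. As far as the scikit-learn documentation describes it, `RandomForestClassifier` averages the trees' class probabilities rather than counting one hard vote per tree, so the check here averages `predict_proba` over `clf.estimators_`. The names `tree_probas`, `mean_proba`, and `manual_pred` are illustrative, and the block refits the same model so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Class probabilities from every fitted tree: (n_trees, n_test, n_classes)
tree_probas = np.stack([tree.predict_proba(X_test) for tree in clf.estimators_])

# Average over trees, then pick the class with the highest mean probability.
mean_proba = tree_probas.mean(axis=0)
manual_pred = clf.classes_[mean_proba.argmax(axis=1)]

print("Probabilities match:", np.allclose(mean_proba, clf.predict_proba(X_test)))
print("Predictions match:  ", np.array_equal(manual_pred, clf.predict(X_test)))
```

With fully grown trees most per-tree probabilities are 0 or 1, so this averaging usually behaves much like majority voting, but the two can differ when trees are depth-limited.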


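Random Forests have several advantages: they cope well with high-dimensional data, are less prone to overfitting than a single decision tree, and are fairly robust to noisy data. Their limitations include reduced interpretability compared with a single tree, higher computational and memory cost as the number of trees grows, and potential biases, for example toward the majority class on imbalanced data. The perfect accuracy above was measured on a test set of only 30 samples from an easy dataset, so it should not be read as typical. Proper hyperparameter tuning and model evaluation techniques, such as cross-validation, should be applied to ensure reliable performance in real-world scenarios.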
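
As one possible way to act on the tuning and evaluation advice, the sketch below uses scikit-learn's `cross_val_score` for a 5-fold accuracy estimate and `GridSearchCV` for a small hyperparameter search. The grid values in `param_grid` are arbitrary choices for illustration, not recommended settings, and a real project would search a grid suited to its own data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation gives a less noisy estimate than one small test split.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
scores = cross_val_score(clf, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Small illustrative grid (assumption: these values are arbitrary examples).
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 3],
    "max_features": ["sqrt", None],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid, cv=5
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```

Since the grid search selects parameters using the same folds it reports on, `best_score_` tends to be slightly optimistic; keeping a separate held-out test set for the final check avoids that.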