# 1. Theory Introduction: Tree Ensembles

## Ensemble Learning
Ensemble methods combine multiple learning models to obtain better predictive performance than any of the constituent models could achieve alone. Combining predictions often produces better results, especially when the models have different strengths and weaknesses and therefore make different errors.
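
To make this idea concrete, here is a minimal simulation sketch. It assumes 25 hypothetical binary classifiers that are each correct 60% of the time and, importantly, make their errors independently of one another. Real models are rarely fully independent, so the numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

n_models = 25        # hypothetical number of independent classifiers
p_correct = 0.6      # assumed accuracy of each classifier on its own
n_trials = 100_000   # number of simulated binary predictions

# correct[i, j] is True when model j gets trial i right
correct = rng.random((n_trials, n_models)) < p_correct

# The majority vote is right when more than half of the models are right
majority_correct = correct.sum(axis=1) > n_models / 2

single_accuracy = correct[:, 0].mean()
ensemble_accuracy = majority_correct.mean()
print(f"Single model accuracy:  {single_accuracy:.3f}")
print(f"Majority vote accuracy: {ensemble_accuracy:.3f}")
```

Under the independence assumption, the majority vote is right noticeably more often than any single model. The more correlated the models' errors are, the smaller this gain becomes.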

## Random Forests
Random Forest is a popular ensemble learning method that can be used for both classification and regression tasks. It trains multiple decision trees and combines their outputs at prediction time, which reduces overfitting and makes the model more robust. Each tree is trained on a bootstrapped dataset, and only a random subset of features is considered at each node split, which adds randomness to the tree-building process.
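
As a small illustration of what a bootstrapped dataset looks like, the sketch below draws row indices with replacement using NumPy. The dataset size of 10,000 is a hypothetical value chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 10_000  # hypothetical training-set size

# Draw n_samples row indices with replacement: one bootstrap dataset
bootstrap_idx = rng.integers(0, n_samples, size=n_samples)

# Some rows appear several times, others not at all
unique_fraction = np.unique(bootstrap_idx).size / n_samples
print(f"Unique rows in the bootstrap sample: {unique_fraction:.3f}")
print(f"Theoretical limit 1 - 1/e:           {1 - np.exp(-1):.3f}")
```

Because sampling is done with replacement, each bootstrap dataset contains only about 63% of the distinct original rows, so every tree sees a somewhat different view of the data.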


## Library

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
```

## 2. Dataset

```python
# Loading the iris dataset
data = load_iris()
X = data.data
y = data.target
```

## 3. Model coded in Python

```python
# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Training the Random Forest classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Making predictions
y_pred = clf.predict(X_test)

# Calculating the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
```

## 4. Explanation

### Why Random Forests?
The power of Random Forests (and ensemble methods in general) lies in combining the predictions of several base estimators (in this case, decision trees), which individually might not be very accurate, to produce a more robust and accurate prediction.
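
One way to check this claim is to compare a single decision tree against a forest using cross-validation. The sketch below is illustrative only: it uses a synthetic dataset with 10% label noise generated by `make_classification`, which is an assumption introduced here and not part of the notebook's iris example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% label noise (an assumption made for illustration)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=20, n_informative=5,
    flip_y=0.1, random_state=0,
)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validated accuracy for each model
tree_scores = cross_val_score(tree, X_demo, y_demo, cv=5)
forest_scores = cross_val_score(forest, X_demo, y_demo, cv=5)

print(f"Single tree:   {tree_scores.mean():.3f} +/- {tree_scores.std():.3f}")
print(f"Random forest: {forest_scores.mean():.3f} +/- {forest_scores.std():.3f}")
```

On noisy data like this, the forest typically scores higher than the single tree, although the exact gap depends on the dataset and the random seed. On a small, clean dataset such as iris the difference can be negligible.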

### How does it work?

- **Bootstrap sampling**: Random forests randomly sample the data with replacement, creating different datasets called bootstrap datasets. Each of these datasets is used to train a decision tree.

- **Feature Randomness**: At each node split, only a random subset of features is considered, adding another layer of randomness. This de-correlates the individual trees, so their errors are less likely to coincide and averaging them reduces overfitting.

- **Voting/Averaging**: Once all trees are trained, for a classification task, the mode of the classes predicted by individual trees is returned. For regression, the average prediction of individual trees is used.

Together, these two sources of randomness make the model far less prone to fitting the noise in the training data than a single decision tree, which is why Random Forests are a strong off-the-shelf choice.
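
The three steps listed above can be sketched by hand with plain decision trees. This is a simplified illustration under stated assumptions, not how `RandomForestClassifier` is implemented internally: the ensemble size `n_trees = 25` is a hypothetical value, and the final prediction uses a plain majority vote.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

rng = np.random.default_rng(42)
n_trees = 25  # hypothetical ensemble size, kept small for speed
trees = []

for _ in range(n_trees):
    # Bootstrap sampling: draw row indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Feature randomness: max_features="sqrt" makes the tree consider
    # a random subset of features at each split
    tree = DecisionTreeClassifier(
        max_features="sqrt",
        random_state=int(rng.integers(0, 2**31 - 1)),
    )
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Voting: collect every tree's prediction, shape (n_trees, n_test)
all_preds = np.stack([t.predict(X_test) for t in trees])

# Count the votes per class and pick the most common class per sample
classes = np.unique(y_train)
votes = np.array([(all_preds == c).sum(axis=0) for c in classes])
y_pred_manual = classes[votes.argmax(axis=0)]

manual_accuracy = (y_pred_manual == y_test).mean()
print(f"Manual forest accuracy: {manual_accuracy:.4f}")
```

Note that scikit-learn's `RandomForestClassifier` combines trees by averaging their predicted class probabilities instead of counting hard votes, so its predictions can differ slightly from this sketch.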