<h1 id="random-forest-regressor-algorithm" align="center">🌲 Random Forest Regressor Algorithm 🌲</h1>

<center><i>A Forest of Decision Trees</i></center>

----

<h1 id='brief-description'>📝 Brief Description</h1>

The `Random Forest Regressor` Algorithm consists of a group of `Decision Trees` whose individual predictions are averaged to predict a numeric value.

<center>
<img src='https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fmiro.medium.com%2Fmax%2F567%2F1*Mb8awDiY9T6rsOjtNTRcIg.png&f=1&nofb=1&ipt=c968cd83d2ee1132fa978557078fea03427bc90732cf03b8a1ace773961e1849&ipo=images' />
</center>
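
To make the "group of trees" idea concrete, here is a minimal sketch using `scikit-learn`. The toy data and the names `X_demo`, `y_demo` and `forest` are assumptions made only for this illustration. It checks that the forest's prediction matches the average of the predictions of its individual trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data: one feature and a noisy linear target (illustration only)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 1))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(0, 1, size=200)

forest = RandomForestRegressor(n_estimators=25, random_state=0)
forest.fit(X_demo, y_demo)

# Collect the prediction of every fitted tree, then average them
tree_predictions = np.array([tree.predict(X_demo) for tree in forest.estimators_])
manual_average = tree_predictions.mean(axis=0)

# Compare the manual average with the forest's own prediction
matches = np.allclose(manual_average, forest.predict(X_demo))
print('Forest prediction equals the mean of its trees:', matches)
```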

<br />

**✔️ Pros:**

```
- Random Forests can be used for both classification and regression tasks;

- Random Forests work well with both categorical and numerical data. Scaling or transformation of variables is usually not necessary (in scikit-learn, categorical features still need to be encoded as numbers);

- Random Forests implicitly perform feature selection and generate decorrelated decision trees. They do this by considering a random subset of features while building each decision tree. This also makes them a great model when you have to work with a high number of features in the data;

- Random Forests are fairly robust to outliers, because tree splits effectively bin the variables;

- Random Forests can handle linear and non-linear relationships well;

- Random Forests generally provide high accuracy and balance the bias-variance trade-off well. Since the model’s principle is to average the results across the multiple decision trees it builds, it reduces the variance as well.
```
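
As a small illustration of the non-linear point above, the sketch below fits a `RandomForestRegressor` and a plain `LinearRegression` to a noisy sine curve and compares their R² scores on held-out points. The data and the names `X_curve`, `y_curve`, `linear_r2` and `forest_r2` are hypothetical, and the exact scores depend on the seed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Hypothetical non-linear data: y follows a sine curve plus a little noise
rng = np.random.default_rng(1)
X_curve = rng.uniform(0, 10, size=(500, 1))
y_curve = np.sin(X_curve[:, 0]) + rng.normal(0, 0.1, size=500)

# Simple hold-out split: first 400 rows to fit, last 100 rows to test
X_fit, X_test = X_curve[:400], X_curve[400:]
y_fit, y_test = y_curve[:400], y_curve[400:]

linear_model = LinearRegression().fit(X_fit, y_fit)
forest_model = RandomForestRegressor(n_estimators=100, random_state=1).fit(X_fit, y_fit)

# score() returns the R² coefficient of determination
linear_r2 = linear_model.score(X_test, y_test)
forest_r2 = forest_model.score(X_test, y_test)

print('Linear Regression R²:', round(linear_r2, 3))
print('Random Forest R²:', round(forest_r2, 3))
```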

<br />

**❌ Cons:**

```
- Random Forests are not easily interpretable. They provide feature importances, but not the complete visibility into coefficients that linear regression offers;

- Random Forests can be computationally intensive for large datasets;

- A Random Forest is like a black-box algorithm: you have very little control over what the model does.
```
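
The feature importance mentioned above can be read from the fitted model's `feature_importances_` attribute. Here is a minimal sketch on made-up data where only the first column drives the target. The data and the column labels `informative`, `noise_1` and `noise_2` are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data: only the first of three features influences the target
rng = np.random.default_rng(2)
X_feat = rng.normal(size=(500, 3))
y_feat = 5.0 * X_feat[:, 0] + rng.normal(0, 0.5, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=2)
model.fit(X_feat, y_feat)

# Impurity-based importances, one value per feature, summing to 1
importances = model.feature_importances_
for name, score in zip(['informative', 'noise_1', 'noise_2'], importances):
    print(f'{name}: {score:.3f}')
```

Note that these scores rank the features but say nothing about the direction or size of each effect, which is why the model is still harder to interpret than a linear regression.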

<br />

**📛 Some Random Forest Regressor Properties:**

```
- n_estimators: number of Decision Trees in the forest
- max_depth: maximum depth of each Decision Tree
- criterion: function used to measure the quality of a split
- random_state: integer seed for reproducible results
```

----

<h1 id='reach-me'>ℹ️ Further Information</h1>
<br/>

For further information, check out these two videos from the *[StatQuest with Josh Starmer](https://www.youtube.com/@statquest)* YouTube channel:

- *[StatQuest: Random Forests Part 1 - Building, Using and Evaluating](https://www.youtube.com/watch?v=J4Wdy0Wc_xQ)*
- *[StatQuest: Random Forests Part 2: Missing data and clustering](https://www.youtube.com/watch?v=sQ870aTKqiM)*

----

<h1 id='example-code'>💻 Example Code</h1>
<br/>

Let's use the `scikit-learn` package to demonstrate how to create, fit, make predictions with and evaluate a simple `Random Forest Regressor Model`.

For evaluation, we will be using the `Mean Squared Error (MSE)` metric. It takes the difference between each predicted value and the corresponding real value, squares it, and then averages those squared differences. The lower the MSE, the closer the predictions are to the real values. The metric can be represented by the following equation:

$MSE = \frac{1}{n}\sum_{i=1}^{n}(predicted_i - real_i)^2$
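
As a quick sanity check, the MSE can be computed by hand with `NumPy` and compared against `mean_squared_error` from `scikit-learn`. The four values below are made up purely for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up real and predicted values (illustration only)
real_values = np.array([3.0, -0.5, 2.0, 7.0])
predicted_values = np.array([2.5, 0.0, 2.0, 8.0])

# Square each difference, then take the mean
manual_mse = np.mean((predicted_values - real_values) ** 2)
sklearn_mse = mean_squared_error(real_values, predicted_values)

print('Manual MSE:', manual_mse)        # 0.375
print('scikit-learn MSE:', sklearn_mse)  # 0.375
```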


Now, let's hop into the code!!

In [None]:
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

SEED = 2000
X_MIN = 0
X_MAX = 100
Y_MIN = 0
Y_MAX = 20
TRAIN_SAMPLES = 800
VALID_SAMPLES = 200

np.random.seed(SEED)
pd.set_option('display.max_rows', 15)
pd.set_option('display.max_columns', 15)

In [None]:
# Generating a fake dataset
# (X and y are drawn independently, so there is no real pattern to learn)
X_train = np.random.randint(X_MIN, X_MAX, TRAIN_SAMPLES)
X_valid = np.random.randint(X_MIN, X_MAX, VALID_SAMPLES)
y_train = np.random.randint(Y_MIN, Y_MAX, TRAIN_SAMPLES)
y_valid = np.random.randint(Y_MIN, Y_MAX, VALID_SAMPLES)

# Features go in a DataFrame; targets go in a 1-D Series,
# which avoids scikit-learn's column-vector warning when fitting
X_train = pd.DataFrame(X_train, columns=['X'])
X_valid = pd.DataFrame(X_valid, columns=['X'])
y_train = pd.Series(y_train, name='y')
y_valid = pd.Series(y_valid, name='y')

In [None]:
# Creating the Model
rfg_model = RandomForestRegressor(
    n_estimators=250
    , max_depth=10
    , criterion='squared_error'
    , random_state=SEED
)

In [None]:
# Training and Making Predictions
rfg_model.fit(X_train, y_train)
print('Training Done!')

predictions = rfg_model.predict(X_valid)
print('Predictions Done!')

In [None]:
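# Evaluation
mse = mean_squared_error(y_valid, predictions)
# score() returns the R² coefficient, shown here as a percentage
train_score = round(rfg_model.score(X_train, y_train) * 100, 2)
valid_score = round(rfg_model.score(X_valid, y_valid) * 100, 2)

print('Mean Squared Error (MSE):', mse)
print('Train R² Score (%):', train_score)
print('Validation R² Score (%):', valid_score)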

**OBS.:** *since the goal of this Kernel is to explain what the `Random Forest Regressor Algorithm` is and how to apply it, we used randomly generated data and have not done any Data Preprocessing or Transformation, so our model's evaluation is quite poor! Do not worry about this 😂*
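
To put that poor evaluation in context, one option is to compare the forest against a baseline that always predicts the mean of the training targets. The sketch below is only an illustration: it regenerates similar random data with its own generator, so the numbers will not match the cells above, and the names `X_tr`, `y_tr`, `X_va`, `y_va` and `baseline` are assumptions.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Random features and targets with no relationship between them (illustration only)
rng = np.random.default_rng(2000)
X_tr = rng.integers(0, 100, size=(800, 1))
X_va = rng.integers(0, 100, size=(200, 1))
y_tr = rng.integers(0, 20, size=800)
y_va = rng.integers(0, 20, size=200)

# Baseline: always predicts the mean of the training targets
baseline = DummyRegressor(strategy='mean').fit(X_tr, y_tr)
forest = RandomForestRegressor(
    n_estimators=250, max_depth=10, random_state=2000
).fit(X_tr, y_tr)

baseline_mse = mean_squared_error(y_va, baseline.predict(X_va))
forest_mse = mean_squared_error(y_va, forest.predict(X_va))
forest_r2 = forest.score(X_va, y_va)

print('Baseline MSE:', round(baseline_mse, 2))
print('Random Forest MSE:', round(forest_mse, 2))
print('Random Forest R² on validation:', round(forest_r2, 3))
```

When the target is pure noise, we should not expect the forest to do meaningfully better than this baseline, which is why the scores look bad here.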

----

Thanks so much for today, see ya!! 👋👋

<br/>
<h1 id='reach-me'>📫 Reach Me</h1>
<br/>

> **Email:** **[csfelix08@gmail.com](mailto:csfelix08@gmail.com?)**

> **Linkedin:** **[linkedin.com/in/csfelix/](https://www.linkedin.com/in/csfelix/)**

> **Instagram:** **[instagram.com/c0deplus/](https://www.instagram.com/c0deplus/)**

> **Portfolio:** **[CSFelix.io](https://csfelix.github.io/)**

> **Kaggle:** **[DSFelix](https://www.kaggle.com/dsfelix)**