<h2>Random Forest Regression</h2>
<h4>Key Concepts</h4>

- **Decision Trees**: The basic building block of a Random Forest is the decision tree. A decision tree splits the data into progressively smaller subsets based on feature values, applying a decision rule at each node until it reaches a leaf that holds the prediction.

- **Bagging (Bootstrap Aggregating)**: Random Forest uses a technique called bagging. Multiple subsets of the training data are created by sampling with replacement, and each subset is used to train a separate decision tree. For regression, the forest's prediction is the average of the individual trees' predictions.

- **Random Feature Selection**: To ensure diversity among the trees, Random Forest also randomizes the choice of features. At each split in a tree, a random subset of features is drawn, and the best split is searched for only within that subset.
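
The three ideas above can be sketched by hand with plain decision trees. This is only an illustrative sketch, not how `RandomForestRegressor` is implemented internally: the synthetic data, the names `X_demo`, `y_demo`, `trees`, and `ensemble_pred`, and the choices of 25 trees and `max_features="sqrt"` are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, assumed for illustration only (not the dataset used in this notebook)
X_demo, y_demo = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

rng = np.random.default_rng(0)
n_trees = 25
trees = []
for i in range(n_trees):
    # Bagging: draw a bootstrap sample of row indices, with replacement
    idx = rng.integers(0, len(X_demo), size=len(X_demo))
    # Random feature selection: each split considers only a random subset of features
    tree = DecisionTreeRegressor(max_features="sqrt", random_state=i)
    tree.fit(X_demo[idx], y_demo[idx])
    trees.append(tree)

# Aggregation: average the per-tree predictions to get the ensemble prediction
ensemble_pred = np.mean([t.predict(X_demo) for t in trees], axis=0)
print(ensemble_pred.shape)  # (200,)
```

Each tree sees a different bootstrap sample and a different feature subset at each split, so the trees make different errors, and averaging them tends to reduce the variance of the final prediction.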


<h4>Importing libraries</h4>

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

<h4>Loading dataset</h4>

In [2]:
from sklearn.datasets import fetch_california_housing

# Fetch the dataset once and reuse it for both the features and the target
housing = fetch_california_housing()
X = housing.data
y = housing.target

In [3]:
print(X)

[[   8.3252       41.            6.98412698 ...    2.55555556
    37.88       -122.23      ]
 [   8.3014       21.            6.23813708 ...    2.10984183
    37.86       -122.22      ]
 [   7.2574       52.            8.28813559 ...    2.80225989
    37.85       -122.24      ]
 ...
 [   1.7          17.            5.20554273 ...    2.3256351
    39.43       -121.22      ]
 [   1.8672       18.            5.32951289 ...    2.12320917
    39.43       -121.32      ]
 [   2.3886       16.            5.25471698 ...    2.61698113
    39.37       -121.24      ]]


In [4]:
print(y)

[4.526 3.585 3.521 ... 0.923 0.847 0.894]


<h4>Splitting data into training and testing datasets</h4>

In [5]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=32)
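
To make the effect of `test_size` and `random_state` concrete, here is a small sketch on a toy array. The arrays `X_toy` and `y_toy` are made up for the example; only the `train_test_split` arguments mirror the cell above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each (hypothetical, for illustration)
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

# test_size=0.2 holds out 20% of the rows (2 of 10) for testing
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=32)
print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)

# Reusing the same random_state reproduces exactly the same partition
_, X_te_again, _, _ = train_test_split(X_toy, y_toy, test_size=0.2, random_state=32)
same_split = np.array_equal(X_te, X_te_again)
print(same_split)  # True
```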

<h4>Training the Random Forest Regressor</h4>

In [6]:
from sklearn.ensemble import RandomForestRegressor

# Default settings (100 trees). No random_state is set, so results vary slightly between runs.
random_forest = RandomForestRegressor()
random_forest.fit(X_train, y_train)
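
Because bootstrap sampling and feature selection are random, two forests fitted on the same data generally differ unless a seed is fixed. The sketch below illustrates this on synthetic data. The names `X_demo`, `y_demo`, `forest_a`, and `forest_b`, and the values `n_estimators=20` and `random_state=0`, are assumptions chosen for the example, not settings used in this notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, assumed for illustration only
X_demo, y_demo = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Two forests with the same seed and the same hyperparameters
forest_a = RandomForestRegressor(n_estimators=20, random_state=0).fit(X_demo, y_demo)
forest_b = RandomForestRegressor(n_estimators=20, random_state=0).fit(X_demo, y_demo)

# With a fixed random_state, both forests produce identical predictions
same_predictions = np.array_equal(forest_a.predict(X_demo), forest_b.predict(X_demo))
print(same_predictions)

# The fitted trees are available in estimators_, one per n_estimators
print(len(forest_a.estimators_))  # 20
```

Passing a `random_state` in the training cell would make the reported metrics repeatable from run to run.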

<h4>Evaluating the model performance</h4>

In [8]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_pred = random_forest.predict(X_test)

# MAE: average absolute difference between the predicted and actual values
print("Mean Absolute Error:", mean_absolute_error(y_test, y_pred))

# MSE: average squared difference between the predicted and actual values
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))

# R^2: proportion of the variance in the target explained by the model (1.0 is a perfect fit)
print("R^2 Score:", r2_score(y_test, y_pred))


Mean Absolute Error: 0.32387947105135684
Mean Squared Error: 0.25719394058728895
R^2 Score: 0.8118969630257121
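
To show what the three metrics compute, the sketch below reproduces them by hand with NumPy on a tiny example and checks the results against scikit-learn. The arrays `y_true` and `y_hat` are made-up values for illustration and are unrelated to the results above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values
y_true = np.array([3.0, 2.5, 4.0, 1.0])
y_hat = np.array([2.5, 3.0, 4.0, 2.0])

errors = y_true - y_hat

# MAE: mean of the absolute errors
mae = np.mean(np.abs(errors))
# MSE: mean of the squared errors
mse = np.mean(errors ** 2)
# R^2: 1 minus (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mae, mse)  # 0.5 0.375

# The hand-computed values agree with scikit-learn's implementations
print(np.isclose(mae, mean_absolute_error(y_true, y_hat)))
print(np.isclose(mse, mean_squared_error(y_true, y_hat)))
print(np.isclose(r2, r2_score(y_true, y_hat)))
```

Here R^2 works out to about 0.68, since the residual sum of squares is 1.5 and the total sum of squares is 4.6875.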
