# Random Forest

Random Forest is a supervised learning algorithm. As the name suggests, it builds a forest of trees and adds randomness to the way they are grown.
The "forest" it builds is an ensemble of decision trees, usually trained with the "bagging" (bootstrap aggregating) method: each tree is fit on a random sample of the training data drawn with replacement. The general idea of bagging is that combining many learning models gives a better overall result than any single model.

In simple terms: Random Forest builds multiple decision trees and merges their predictions (a vote for classification, an average for regression) to get a more accurate and stable prediction.
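To make the bagging idea concrete, here is a minimal hand-rolled sketch. It is not how scikit-learn implements `RandomForestClassifier` internally (which, among other things, averages the trees' class probabilities instead of counting hard votes). The helper names `fit_bagged_trees` and `predict_majority` and the synthetic data are assumptions made up for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_trees=25, seed=0):
    """Train n_trees decision trees, each on its own bootstrap sample (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        # bootstrap: draw len(X) row indices with replacement
        idx = rng.integers(0, len(X), size=len(X))
        # max_features='sqrt' makes each split consider only a random subset of features
        tree = DecisionTreeClassifier(max_features='sqrt',
                                      random_state=int(rng.integers(0, 1_000_000)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_majority(trees, X):
    """Combine the trees by majority vote (assumes binary labels 0/1)."""
    votes = np.stack([tree.predict(X) for tree in trees])  # shape: (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)


# synthetic data, used only for this sketch
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

trees = fit_bagged_trees(X_tr, y_tr)
demo_pred = predict_majority(trees, X_te)
print('Ensemble accuracy:', (demo_pred == y_te).mean())
```

An odd number of trees avoids ties in the vote. In practice you would use `RandomForestClassifier` directly, as the cells that follow do.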

In [9]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [10]:
# load tips dataset
df = sns.load_dataset('tips')
print(df.head())


   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4


In [11]:
# encode categorical columns as integer codes
# (a fresh encoder per column; LabelEncoder numbers the classes in sorted order)
for col in df.columns:
    if df[col].dtype == 'object' or df[col].dtype == 'category':
        df[col] = LabelEncoder().fit_transform(df[col])
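Once the columns are encoded, the original labels are no longer visible, which makes outputs such as class `0` and class `1` hard to read. One possible way around this, sketched here on a tiny made-up frame (`toy` and the `encoders` dictionary are assumptions for illustration, not part of the notebook), is to keep one fitted encoder per column so the mapping can be looked up later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# tiny made-up frame, used only to show the mapping
toy = pd.DataFrame({
    'sex': ['Female', 'Male', 'Male'],
    'day': ['Sun', 'Sat', 'Sun'],
})

encoders = {}
for col in toy.columns:
    enc = LabelEncoder()
    toy[col] = enc.fit_transform(toy[col])
    encoders[col] = enc  # keep the fitted encoder so the codes can be decoded later

# classes_ is sorted, and the position of each label is its integer code
mapping = {
    col: {str(label): int(code) for code, label in enumerate(enc.classes_)}
    for col, enc in encoders.items()
}
print(mapping)

# decode integer codes back to the original labels
print(encoders['sex'].inverse_transform([0, 1]))
```

Because the codes follow sorted order, `Female` becomes `0` and `Male` becomes `1` in this toy frame.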

In [20]:
# split data into features and target
X = df.drop('sex', axis=1)
y = df['sex']

# train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# create and train the model, then predict on the test set
model = RandomForestClassifier(n_estimators=50, criterion='entropy', max_depth=80, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# evaluate model
print('Accuracy:', accuracy_score(y_test, y_pred))
print('Confusion Matrix:\n', confusion_matrix(y_test, y_pred))
print('Classification Report:\n' + classification_report(y_test, y_pred))

Accuracy: 0.6326530612244898
Confusion Matrix:
 [[ 7 12]
 [ 6 24]]
Classification Report:
              precision    recall  f1-score   support

           0       0.54      0.37      0.44        19
           1       0.67      0.80      0.73        30

    accuracy                           0.63        49
   macro avg       0.60      0.58      0.58        49
weighted avg       0.62      0.63      0.61        49
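The numbers in the report can be recomputed by hand from the confusion matrix printed above, which is a useful check on what each metric means. This is a small sketch that hard-codes that matrix; it relies on scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

# confusion matrix copied from the output above (rows: true class, columns: predicted class)
cm = np.array([[7, 12],
               [6, 24]])

accuracy = np.trace(cm) / cm.sum()        # correct predictions / all predictions
precision = np.diag(cm) / cm.sum(axis=0)  # per class: correct / everything predicted as that class
recall = np.diag(cm) / cm.sum(axis=1)     # per class: correct / everything truly in that class
f1 = 2 * precision * recall / (precision + recall)

print('accuracy :', accuracy)
print('precision:', precision)
print('recall   :', recall)
print('f1       :', f1)
```

Class `0` is recovered in only 7 of 19 cases, so most of the accuracy comes from the larger class `1`.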



---

In [32]:
# use random forest for regression
# split data into features and target
X = df.drop('tip', axis=1)
y = df['tip']

# train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# create and train the model, then predict on the test set
model = RandomForestRegressor(n_estimators=560, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# evaluate model
print('Mean Squared Error:', mean_squared_error(y_test, y_pred))
print('Mean Absolute Error:', mean_absolute_error(y_test, y_pred))
print('R2 Score:', r2_score(y_test, y_pred))


Mean Squared Error: 0.9338699808868658
Mean Absolute Error: 0.7663538629737593
R2 Score: 0.25288688863259423
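To put these regression numbers in context, a short arithmetic sketch using only the values printed above: the square root of the MSE gives an error in the same unit as the tip, and the identity R2 = 1 - MSE / Var(y_test) gives the variance of the test targets that the score implies. The variable names are chosen for this illustration.

```python
import math

# values copied from the output above
mse = 0.9338699808868658
r2 = 0.25288688863259423

# root mean squared error, in the same unit as the tip column
rmse = math.sqrt(mse)

# R2 = 1 - MSE / Var(y_test), so the implied variance of the test targets is:
implied_var = mse / (1 - r2)

print('RMSE:', rmse)
print('Implied variance of y_test:', implied_var)
```

An RMSE of roughly 0.97 together with an R2 of about 0.25 means the model explains only around a quarter of the variance in the tips of the test set.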
