<a href="https://colab.research.google.com/github/austinkirwin/public-projects/blob/main/Python_projects/RandomForest/RandomForestModel.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Learning Random Forest Models

In this notebook I learn the concepts behind random forest models and apply them.

## Learning

In [1]:
# Data Processing
import pandas as pd
import numpy as np

# Modelling
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, ConfusionMatrixDisplay
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from scipy.stats import randint

# Tree Visualisation
from sklearn.tree import export_graphviz
from IPython.display import Image
import graphviz

### What the imports are for

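- **RandomForestClassifier** fits many decision trees, each on a bootstrap sample of the training data, and combines their predictions
- **accuracy_score** reports the fraction of all predictions that are correct
- **precision_score** reports the fraction of predicted positives that are actually positive
- **confusion_matrix** and **ConfusionMatrixDisplay** tabulate and plot correct and incorrect predictions per class, which makes false positives and false negatives easy to compare
- **recall_score** reports the fraction of actual positives that the model identifies
- **RandomizedSearchCV** tunes hyperparameters by sampling candidate settings and cross-validating each one
- **train_test_split** splits the data into training and test sets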
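
To make the metric definitions in this list concrete, here is a minimal sketch on hand-made labels. The `y_true` and `y_pred` lists are invented for illustration and are not taken from the energy data.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Invented binary labels: 4 actual positives and 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Invented predictions: 3 true positives, 1 false negative,
# 2 false positives, 4 true negatives.
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

acc = accuracy_score(y_true, y_pred)    # (3 + 4) / 10 = 0.7
prec = precision_score(y_true, y_pred)  # 3 / (3 + 2) = 0.6
rec = recall_score(y_true, y_pred)      # 3 / (3 + 1) = 0.75

# Rows are actual classes, columns are predicted classes, both ordered [0, 1]:
# [[TN, FP],
#  [FN, TP]]  ->  [[4, 2], [1, 3]]
cm = confusion_matrix(y_true, y_pred)

print(acc, prec, rec)
print(cm)
```

Accuracy counts every correct prediction, while precision and recall look only at the positive class from two different directions, which is why the three numbers differ on the same predictions.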

The other imports are for visualization or are helper functions.
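
As a rough illustration of how `RandomizedSearchCV` and `randint` from the import cell fit together, the sketch below tunes a small forest on synthetic data. Everything specific here is an assumption made for the example: the `make_classification` data, the search ranges, and the values of `n_iter`, `cv`, and `random_state`. None of it is the notebook's actual tuning setup.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic binary classification data (assumption; not the energy data set).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
X_fit, X_hold, y_fit, y_hold = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# Illustrative search ranges; randint(low, high) samples integers in [low, high).
param_dist = {
    "n_estimators": randint(10, 50),
    "max_depth": randint(1, 10),
}

# Try 5 random parameter combinations, scoring each with 3-fold cross-validation.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=5,
    cv=3,
    random_state=0,
)
search.fit(X_fit, y_fit)

print(search.best_params_)            # best sampled combination
print(search.score(X_hold, y_hold))   # accuracy of the refit best model on held-out rows
```

Random search samples a fixed number of combinations instead of trying every one, so the cost is controlled by `n_iter` rather than by the size of the parameter grid.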

In [2]:
# Data set
energy = pd.read_csv("https://raw.githubusercontent.com/austinkirwin/public-projects/refs/heads/main/Python_projects/RandomForest/ENB2012_data.csv")
energy.head()

,X1,X2,X3,X4,X5,X6,X7,X8,Y1,Y2
0,0.98,514.5,294.0,110.25,7.0,2,0.0,0,15.55,21.33
1,0.98,514.5,294.0,110.25,7.0,3,0.0,0,15.55,21.33
2,0.98,514.5,294.0,110.25,7.0,4,0.0,0,15.55,21.33
3,0.98,514.5,294.0,110.25,7.0,5,0.0,0,15.55,21.33
4,0.9,563.5,318.5,122.5,7.0,2,0.0,0,20.84,28.28


In [9]:
### Renaming columns to be more specific

# X1 Relative Compactness
# X2 Surface Area
# X3 Wall Area
# X4 Roof Area
# X5 Overall Height
# X6 Orientation
# X7 Glazing Area
# X8 Glazing Area Distribution
# Y1 Heating Load
# Y2 Cooling Load

energy = energy.rename(columns={
    'X1': 'Relative Compactness',
    'X2': 'Surface Area',
    'X3': 'Wall Area',
    'X4': 'Roof Area',
    'X5': 'Overall Height',
    'X6': 'Orientation',
    'X7': 'Glazing Area',
    'X8': 'Glazing Area Distribution',
    'Y1': 'Heating Load',
    'Y2': 'Cooling Load',
})
energy.head()

,Relative Compactness,Surface Area,Wall Area,Roof Area,Overall Height,Orientation,Glazing Area,Glazing Area Distribution,Heating Load,Cooling Load
0,0.98,514.5,294.0,110.25,7.0,2,0.0,0,15.55,21.33
1,0.98,514.5,294.0,110.25,7.0,3,0.0,0,15.55,21.33
2,0.98,514.5,294.0,110.25,7.0,4,0.0,0,15.55,21.33
3,0.98,514.5,294.0,110.25,7.0,5,0.0,0,15.55,21.33
4,0.9,563.5,318.5,122.5,7.0,2,0.0,0,20.84,28.28
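
With descriptive column names in place, the load columns can be separated from the predictors and the rows split into training and test sets. The sketch below shows that mechanic on a tiny made-up frame. The values in `toy` and the `random_state=42` seed are assumptions for illustration only. Fixing a seed is one way to make a split repeatable between runs.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in for the energy data: 10 rows, 2 predictors, 2 responses.
toy = pd.DataFrame({
    "Surface Area": [514.5, 563.5, 588.0, 612.5, 637.0,
                     661.5, 686.0, 710.5, 735.0, 759.5],
    "Overall Height": [7.0, 7.0, 7.0, 7.0, 7.0, 3.5, 3.5, 3.5, 3.5, 3.5],
    "Heating Load": [15.5, 20.8, 21.5, 23.0, 24.1, 10.2, 11.0, 12.3, 13.1, 14.0],
    "Cooling Load": [21.3, 28.3, 25.4, 26.0, 27.2, 13.5, 14.1, 15.0, 15.9, 16.8],
})

targets = ["Heating Load", "Cooling Load"]
toy_X = toy.drop(columns=targets)  # predictors: everything except the loads
toy_Y = toy[targets]               # responses: the two load columns

# Same seed twice: both calls select the same rows.
X_tr, X_te, Y_tr, Y_te = train_test_split(toy_X, toy_Y, test_size=0.2, random_state=42)
X_tr2, X_te2, Y_tr2, Y_te2 = train_test_split(toy_X, toy_Y, test_size=0.2, random_state=42)

print(X_tr.shape, X_te.shape)          # (8, 2) (2, 2)
print(X_tr.index.equals(X_tr2.index))  # True: the split is repeatable
print(X_tr.index.equals(Y_tr.index))   # True: predictors and responses stay aligned
```

Without `random_state`, each run draws a different shuffle, so the rows shown by `head()` and any downstream scores can change from run to run.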


In [14]:
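# Splitting the data
# Predictors: every column except the two load columns
predictors = energy.drop(['Heating Load', 'Cooling Load'], axis=1)
# Responses: heating load and cooling load
responses = energy[['Heating Load', 'Cooling Load']]

# Hold out 20% of the rows for testing (no random_state is set, so the split differs between runs)
X_train, X_test, Y_train, Y_test = train_test_split(predictors, responses, test_size=0.2)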
X_train.head(), Y_train.head()

(     Relative Compactness  Surface Area  Wall Area  Roof Area  Overall Height  \
 136                  0.64         784.0      343.0     220.50             3.5   
 530                  0.98         514.5      294.0     110.25             7.0   
 719                  0.62         808.5      367.5     220.50             3.5   
 477                  0.62         808.5      367.5     220.50             3.5   
 191                  0.62         808.5      367.5     220.50             3.5   
 
      Orientation  Glazing Area  Glazing Area Distribution  
 136            2          0.10                          2  
 530            4          0.40                          1  
 719            5          0.40                          4  
 477            3          0.25                          4  
 191            5          0.10                          3  ,
      Heating Load  Cooling Load
 136         15.41         19.23
 530         32.49         32.83
 719         16.77         16.79
 477   