# StatQuest: Random Forests Part 1 - Building, Using and Evaluating

Random Forest is a flexible, easy-to-use machine learning algorithm that often produces good results even with its default hyperparameters. Its simplicity and versatility have made it one of the most widely used algorithms. In this notebook we build a Random Forest, use it to make predictions, and evaluate how well it performs.

## Import the necessary libraries

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix

## Load the dataset

In [None]:
# Use your own path to the dataset
path_to_dataset = 'heart_disease.csv'
dataset = pd.read_csv(path_to_dataset)

## Preprocess the data
We first separate the features (every column except the last) from the target label (the last column), then split the data into training and test sets. The training data is used to build the decision trees, and the test data is held back to evaluate the model.

In [None]:
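# Features: every column except the last; target: the last column
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)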
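
If the two classes are not equally common, a purely random split can leave the test set with a different class balance than the training set. The sketch below is a minimal illustration, assuming synthetic data from `make_classification` in place of the heart disease CSV (the 70/30 class balance is an assumption). It shows how the `stratify` parameter of `train_test_split` keeps the class proportions the same in both splits:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart disease data (assumption: binary target,
# roughly 70/30 class balance)
X_demo, y_demo = make_classification(n_samples=300, n_features=8,
                                     weights=[0.7, 0.3], random_state=0)

# stratify=y_demo preserves the class proportions in both splits
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          random_state=0, stratify=y_demo)

print("train size:", len(y_tr), "test size:", len(y_te))
print("positive rate (all/train/test):",
      y_demo.mean().round(3), y_tr.mean().round(3), y_te.mean().round(3))
```

With the real data, the equivalent change would be passing `stratify=y` to the split above.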

## Implementing the Random Forest model

In [None]:
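# An ensemble of 100 decision trees; random_state makes the results reproducible
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)     # build the forest from the training data
y_pred = clf.predict(X_test)  # predict a label for each test row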
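
`clf.fit` hides the steps that make this a random forest rather than a single decision tree. The sketch below is a minimal from-scratch illustration, not scikit-learn's actual implementation, and it assumes synthetic data and the hypothetical helpers `build_forest` and `predict_forest`. Each tree is trained on a bootstrapped sample of the training rows (drawn with replacement), each split considers only a random subset of features (`max_features="sqrt"`), and the forest predicts by majority vote. One difference worth knowing: scikit-learn's `RandomForestClassifier` averages the trees' predicted probabilities instead of counting hard votes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def build_forest(X, y, n_trees=25, seed=0):
    """Hypothetical helper: fit n_trees decision trees on bootstrapped samples."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    trees = []
    for _ in range(n_trees):
        # Bootstrap: draw n_samples row indices with replacement
        idx = rng.integers(0, n_samples, size=n_samples)
        # max_features="sqrt": each split considers a random subset of features
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1_000_000)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_forest(trees, X):
    """Hypothetical helper: majority vote across trees (non-negative int labels assumed)."""
    votes = np.stack([tree.predict(X) for tree in trees]).astype(int)  # (n_trees, n_samples)
    # For each sample (one column of votes), pick the most frequent label
    return np.array([np.bincount(col).argmax() for col in votes.T])


# Synthetic stand-in data (assumption); first 240 rows train, last 60 test
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
forest = build_forest(X_demo[:240], y_demo[:240])
y_hat = predict_forest(forest, X_demo[240:])
print("sketch accuracy:", (y_hat == y_demo[240:]).mean())
```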

## Evaluating the model
To evaluate the model, we compute its accuracy on the test set and its confusion matrix, which counts how many test samples of each true class were assigned to each predicted class.

In [None]:
print("Accuracy:", accuracy_score(y_test, y_pred))

cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)
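
The accuracy and the confusion matrix are directly related. For a binary target, scikit-learn's `confusion_matrix` puts true labels on the rows and predicted labels on the columns, so `ravel()` yields `tn, fp, fn, tp`, and accuracy is the diagonal divided by the total. A random forest can also be evaluated without a separate test set by using its out-of-bag (OOB) samples: the training rows that a given tree did not see in its bootstrap sample. The sketch below assumes synthetic data in place of the heart disease CSV and uses the `oob_score` option of `RandomForestClassifier`:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart disease data (assumption)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          random_state=0)

# oob_score=True scores each training row using only the trees that did not
# see it in their bootstrap sample
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X_tr, y_tr)
y_hat = rf.predict(X_te)

# Rows are true labels, columns are predicted labels: [[tn, fp], [fn, tp]]
cm = confusion_matrix(y_te, y_hat)
tn, fp, fn, tp = cm.ravel()
acc_from_cm = (tn + tp) / cm.sum()

print("test accuracy:", accuracy_score(y_te, y_hat))
print("accuracy from confusion matrix:", acc_from_cm)
print("OOB accuracy (training rows only):", rf.oob_score_)
```

The OOB accuracy is computed from training rows alone, so it is a useful sanity check when data is too scarce to hold out a large test set.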

## Conclusion
Random Forest is a powerful and popular algorithm because of its robustness and versatility: it handles classification and regression tasks equally well. It also reports how much each feature contributed to its predictions (feature importance), which helps when explaining results to non-technical stakeholders.
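
A fitted scikit-learn forest exposes its impurity-based feature importances through the `feature_importances_` attribute. The sketch below is a minimal illustration assuming synthetic data and the hypothetical feature names `f0` to `f7`; with the real dataset you would use `dataset.columns[:-1]` instead. Impurity-based importances can overstate features with many distinct values, so `sklearn.inspection.permutation_importance` is a common cross-check.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 8 features, only 3 of them informative (assumption)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, n_informative=3,
                                     n_redundant=0, random_state=0)
feature_names = [f"f{i}" for i in range(X_demo.shape[1])]  # hypothetical names

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_demo, y_demo)

# Impurity-based importances: non-negative and normalized to sum to 1
importances = pd.Series(rf.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False))
```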

Despite its power and flexibility, it has limitations; for example, it can still overfit some datasets with noisy labels or targets. Overall, though, it is an algorithm worth knowing and having in your toolbox.
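
One simple way to spot this kind of overfitting is to compare training accuracy with test accuracy: a large gap suggests the trees are memorizing noise. The sketch below assumes synthetic data in which `flip_y=0.2` assigns 20% of the labels at random to simulate a noisy task, and compares fully grown trees (`min_samples_leaf=1`, the scikit-learn default) with trees constrained by `min_samples_leaf=10`, an illustrative value rather than a tuned one:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Noisy synthetic task: flip_y=0.2 assigns 20% of the labels at random (assumption)
X_demo, y_demo = make_classification(n_samples=500, n_features=8, flip_y=0.2,
                                     random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          random_state=0)

results = {}
for leaf in (1, 10):  # 1 = fully grown trees (default), 10 = illustrative limit
    rf = RandomForestClassifier(n_estimators=100, min_samples_leaf=leaf,
                                random_state=0).fit(X_tr, y_tr)
    train_acc = rf.score(X_tr, y_tr)  # accuracy on data the forest was fit to
    test_acc = rf.score(X_te, y_te)   # accuracy on held-out data
    results[leaf] = (train_acc, test_acc)
    print(f"min_samples_leaf={leaf}: train={train_acc:.3f} "
          f"test={test_acc:.3f} gap={train_acc - test_acc:.3f}")
```

Exact values depend on the data and the seed; the size of the train/test gap, not the absolute scores, is the signal to look for.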