Random Forest is an ensemble learning technique commonly used for both classification and regression. It builds many decision trees and combines their predictions to produce a result that is typically more accurate and more stable than that of any single tree.

Every tree in a Random Forest makes its own prediction. For classification, the forest predicts the class that receives the most votes across the trees; for regression, it returns the average of the trees’ outputs.
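
The aggregation step can be sketched in a few lines of plain Python. The tree outputs below are made-up values for a single sample, and the helper names `majority_vote` and `average_output` are hypothetical, chosen only for this illustration. Note that scikit-learn’s documentation describes its classifier as averaging the trees’ predicted class probabilities rather than counting hard votes, so this shows the idea, not that library’s internals.

```python
from collections import Counter
from statistics import mean

# Hypothetical outputs of five trees for one sample (invented values).
tree_class_votes = ["benign", "malignant", "benign", "benign", "malignant"]
tree_regression_outputs = [2.0, 3.0, 2.5, 3.5, 4.0]


def majority_vote(votes):
    """Return the label predicted by the largest number of trees."""
    return Counter(votes).most_common(1)[0][0]


def average_output(outputs):
    """Return the mean of the trees' numeric predictions."""
    return mean(outputs)


forest_class = majority_vote(tree_class_votes)
forest_value = average_output(tree_regression_outputs)

print(forest_class)  # benign (3 votes to 2)
print(forest_value)  # 3.0
```
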

Evaluation Metrics:

    Classification: Accuracy, Precision, Recall, F1 Score.
    Regression: Mean Squared Error (MSE), R-squared.
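
As a rough illustration of what these metrics measure, the sketch below computes them by hand from their standard definitions. All labels and values are toy data invented for this example, and the variable names are assumptions rather than anything from a library.

```python
# Toy classification labels (1 = positive class, 0 = negative class).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_hat = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

pairs = list(zip(y_true, y_hat))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(y_true)  # share of all predictions that are correct
precision = tp / (tp + fp)          # share of predicted positives that are correct
recall = tp / (tp + fn)             # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))

# Toy regression targets and predictions.
targets = [3.0, 5.0, 7.0, 9.0]
preds = [2.5, 5.5, 7.0, 8.0]

n = len(targets)
mse = sum((t - p) ** 2 for t, p in zip(targets, preds)) / n
mean_target = sum(targets) / n
ss_res = sum((t - p) ** 2 for t, p in zip(targets, preds))  # residual sum of squares
ss_tot = sum((t - mean_target) ** 2 for t in targets)       # total sum of squares
r2 = 1 - ss_res / ss_tot

print(round(mse, 3), round(r2, 3))
```

With these toy values there are 3 true positives, 2 false positives, 1 false negative, and 4 true negatives, so precision (0.6) and recall (0.75) differ even though accuracy is 0.7.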

Applying Random Forest with scikit-learn

We’ll apply Random Forest to the Breast Cancer dataset bundled with scikit-learn to classify tumors as benign or malignant. We’ll train the model and evaluate its performance using classification metrics.

1. Create and Train the Random Forest Model:

    Initialize a Random Forest Classifier.
    Fit (train) the model on the training data.

2. Predict:

    Use the trained model to predict the labels of the test data.

3. Evaluate:

    Assess the model’s performance on the test data using Accuracy, Precision, Recall, and F1 Score.

In [1]:
# Import necessary libraries
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [2]:
# Load the Breast Cancer dataset
breast_cancer = load_breast_cancer()
X, y = breast_cancer.data, breast_cancer.target

In [3]:
# Splitting data into training & testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [4]:
# Creating & training the Random Forest Model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

In [5]:
# Predicting the test set results
y_pred = model.predict(X_test)

In [6]:
# Evaluating the model
# average='macro' computes each metric per class, then takes the unweighted mean
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')
f1 = f1_score(y_test, y_pred, average='macro')

In [7]:
# Printing the results
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)

Accuracy: 0.9707602339181286
Precision: 0.9736486486486486
Recall: 0.9636243386243386
F1 Score: 0.9682592716338123


These results indicate that the Random Forest model performs well on the Breast Cancer dataset: on this single 70/30 train/test split, accuracy is about 97%, and the macro-averaged precision, recall, and F1 score are all above 0.96.

Because the scores are macro-averaged over both classes, the high precision and recall suggest that the model identifies both benign and malignant tumors well, with few false positives and few false negatives for either class.
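
Macro-averaged scores do not show which kind of error the model makes. One way to check is a confusion matrix, sketched below with the same split and seed as the walkthrough. The names `data`, `clf`, and `cm` are assumptions introduced for this example; in this dataset, label 0 is malignant and label 1 is benign.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Rebuild the same split and model as in the walkthrough
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)
clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes,
# both ordered as [0 (malignant), 1 (benign)]
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
for name, row in zip(data.target_names, cm):
    print(name, row)

# Per-class recall: correct predictions divided by the true count of each class
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(per_class_recall)
```

The off-diagonal entries count the misclassified tumors, so they show directly whether errors are mostly malignant cases predicted as benign or the reverse.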