# Random Forest

This notebook demonstrates how to use the `random_forest` module from the `rice2025.supervised_learning` library.

## Setup
Import the necessary modules and load the data. This example uses the Wine dataset from scikit-learn.

The Wine dataset is a small classification dataset that has:

- **Samples:** 178  
- **Features:** 13 numeric chemical properties of wines  
- **Classes:** 3 types of wine  

**Goal:** Predict the type of wine based on its chemical features.  

In [2]:
# import library
from rice2025.supervised_learning import random_forest
import rice2025.utilities as util

# load dataset
from sklearn.datasets import load_wine
data = load_wine()
X, y = data.data, data.target

## Data Pre-Processing
Before training, we split the dataset into **training** and **test** sets using `train_test_split`. We can verify the split by printing the shape of each resulting feature array. Then, we use the `fit_transform_split` function to scale the features.

In [3]:
# split dataset
X_train, X_test, y_train, y_test = util.train_test_split(X, y, test_size=.2)
print(f"Train size: {X_train.shape}, Test size: {X_test.shape}")

# Scale features
X_train, X_test = util.fit_transform_split(X_train, X_test)

Train size: (142, 13), Test size: (36, 13)
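
To make the two pre-processing steps concrete, here is a minimal NumPy sketch of a shuffled split followed by standardization. It is not the `rice2025.utilities` implementation, whose internals are not shown in this notebook. The helper name `split_and_scale`, the `seed` argument, and the synthetic data are all assumptions made for illustration. The sketch computes the mean and standard deviation from the training rows only and reuses them on the test rows, which is a common way to avoid leaking test information into training.

```python
import numpy as np

def split_and_scale(X, y, test_size=0.2, seed=0):
    """Shuffle, split, then standardize using training-set statistics only.

    Hypothetical helper for illustration; not part of rice2025.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    # Compute scaling statistics on the training rows only.
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns

    return (X_train - mean) / std, (X_test - mean) / std, y_train, y_test

# Synthetic stand-in with the same shape as the Wine data (178 x 13).
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(178, 13))
y_demo = rng.integers(0, 3, size=178)

X_tr, X_te, y_tr, y_te = split_and_scale(X_demo, y_demo)
print(X_tr.shape, X_te.shape)
```

With 178 rows and `test_size=0.2`, this sketch yields the same 142/36 split sizes as the cell above.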


## Initializing and Training the Random Forest Model

`RandomForest` supports typical forest and tree hyperparameters such as:

- `n_estimators`
- `min_samples_split`
- `max_depth`

We'll use the default parameters: `n_estimators` = 100, `min_samples_split` = 2, and `max_depth` = 100.

In [4]:
model = random_forest.RandomForest()
model.fit(X_train, y_train)
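
For intuition about what `n_estimators` controls, a random forest typically trains each tree on a bootstrap sample, that is, rows drawn with replacement from the training set. The sketch below illustrates only that sampling step with the standard library. Whether `rice2025`'s `RandomForest` samples rows this way is an assumption, and `bootstrap_indices` is a hypothetical helper.

```python
import random

def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

rng = random.Random(0)
n_samples, n_estimators = 142, 100  # training rows and default tree count from above

# One list of row indices per tree.
samples = [bootstrap_indices(n_samples, rng) for _ in range(n_estimators)]

# Because rows are drawn with replacement, each tree sees only part of the
# distinct training rows (roughly 63% on average for large n).
unique_fraction = sum(len(set(s)) for s in samples) / (n_estimators * n_samples)
print(f"Average fraction of distinct rows per tree: {unique_fraction:.2f}")
```

Because every tree sees a slightly different subset of rows, the trees make different errors, which is what makes combining them useful.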

## Making Predictions
Once the model is trained, the `predict()` method can be used to classify new data points.

In [5]:
y_pred = model.predict(X_test)
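
For classification, a forest usually combines its trees by majority vote: each tree predicts a label and the most common label wins. The following is a small sketch of that idea, not the library's actual `predict()` code. The `majority_vote` function and the toy votes are hypothetical.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Combine per-tree label predictions into one label per sample.

    tree_predictions[t][i] is the label tree t assigns to sample i.
    """
    n_samples = len(tree_predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in tree_predictions)
        # most_common(1) returns the label with the highest vote count.
        combined.append(votes.most_common(1)[0][0])
    return combined

# Three hypothetical trees voting on four samples.
votes = [
    [0, 1, 2, 1],
    [0, 1, 1, 1],
    [0, 2, 2, 1],
]
print(majority_vote(votes))  # [0, 1, 2, 1]
```

In this sketch, ties go to the label that appears first among the votes. A real implementation may break ties differently.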

## Evaluating the Model

The model's performance can be measured using **accuracy** or a more detailed **classification report**.  
scikit-learn provides both through its `accuracy_score` and `classification_report` functions.

In [6]:
from sklearn.metrics import accuracy_score, classification_report

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on test set: {accuracy:.2f}")

# Detailed report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))

Accuracy on test set: 0.97

Classification Report:
              precision    recall  f1-score   support

     class_0       1.00      1.00      1.00         8
     class_1       1.00      0.95      0.97        19
     class_2       0.90      1.00      0.95         9

    accuracy                           0.97        36
   macro avg       0.97      0.98      0.97        36
weighted avg       0.98      0.97      0.97        36
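
To see how the numbers in the report are derived, the sketch below recomputes accuracy, precision, recall, and F1 by hand. The labels are not the notebook's real predictions. They are reconstructed to be consistent with the report: 8, 19, and 9 test samples per class, with a single `class_1` sample predicted as `class_2`. The names `y_hat` and `report` are illustrative.

```python
# Reconstructed labels consistent with the report above (assumption):
# one class_1 sample is misclassified as class_2, everything else is correct.
y_true = [0] * 8 + [1] * 19 + [2] * 9
y_hat = [0] * 8 + [1] * 18 + [2] * 1 + [2] * 9

# Accuracy: fraction of samples whose predicted label matches the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_hat)) / len(y_true)
print(f"accuracy: {accuracy:.2f}")

report = {}
for label in sorted(set(y_true)):
    tp = sum(t == label and p == label for t, p in zip(y_true, y_hat))
    predicted = sum(p == label for p in y_hat)  # samples predicted as this class
    actual = sum(t == label for t in y_true)    # samples truly in this class (support)
    precision = tp / predicted
    recall = tp / actual
    f1 = 2 * precision * recall / (precision + recall)
    report[label] = (precision, recall, f1, actual)
    print(f"class_{label}: precision={precision:.2f} recall={recall:.2f} "
          f"f1={f1:.2f} support={actual}")
```

The single error explains both imperfect entries in the report: it lowers recall for `class_1` (18 of 19 found) and precision for `class_2` (9 of 10 predictions correct).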

