# Ensemble Voting Classifier

This notebook demonstrates how to use the `EnsembleVotingClassifier` class from the `ensemble_methods` module of the `rice2025.supervised_learning` library.  

## Setup
Import the necessary modules and load the data. This example uses the Wine dataset from scikit-learn.

The Wine dataset is a small classification dataset with:

- **Samples:** 178  
- **Features:** 13 numeric chemical properties of wines  
- **Classes:** 3 types of wine  

**Goal:** Predict the type of wine based on its chemical features using an ensemble of classifiers.

In [1]:
# import library
from rice2025.supervised_learning import ensemble_methods

from rice2025.supervised_learning import knn
from rice2025.supervised_learning import logistic_regression
from rice2025.supervised_learning import random_forest

import rice2025.utilities as util

# load dataset
from sklearn.datasets import load_wine
data = load_wine()
X, y = data.data, data.target

## Data Pre-Processing
Before training, we split the dataset into **training** and **test** sets using `train_test_split`, holding out 20% of the samples for testing. We verify the split by printing the shape of each feature array. Then we use the `scale` function to scale the features; note that the training and test sets are scaled in separate calls.

In [2]:
# split dataset
X_train, X_test, y_train, y_test = util.train_test_split(X, y, test_size=.2)
print(f"Train size: {X_train.shape}, Test size: {X_test.shape}")

# scale dataset
X_train = util.scale(X_train)
X_test = util.scale(X_test)


Train size: (142, 13), Test size: (36, 13)
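
The cell above scales the training and test sets in separate calls. If `util.scale` standardizes each array using that array's own mean and standard deviation (an assumption; its behaviour is not shown in this notebook), the two sets end up on slightly different scales. A common alternative is to compute the statistics on the training set only and reuse them for the test set. The following is a minimal NumPy sketch of that idea on toy data; `fit_standardizer` and `apply_standardizer` are hypothetical helpers, not part of `rice2025`.

```python
import numpy as np

def fit_standardizer(X):
    """Return the per-feature mean and standard deviation of X."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return mean, std

def apply_standardizer(X, mean, std):
    """Standardize X with statistics computed elsewhere (e.g. on the training set)."""
    return (np.asarray(X, dtype=float) - mean) / std

# Toy data standing in for X_train / X_test (illustrative values only)
X_train_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test_demo = np.array([[2.0, 40.0]])

mean, std = fit_standardizer(X_train_demo)
X_train_scaled = apply_standardizer(X_train_demo, mean, std)
X_test_scaled = apply_standardizer(X_test_demo, mean, std)  # reuses training statistics
```

With this approach the test set is transformed exactly as the training set was, so a given raw feature value maps to the same scaled value in both sets.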


## Initializing and Training the Ensemble Model

The `EnsembleVotingClassifier` combines the predictions of three different models using majority voting.

Each base model (logistic regression, k-nearest neighbors, and a random forest) is initialized independently with its default parameters and then passed into the ensemble.  
The `fit()` method trains all three models on the same training data.

In [3]:
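# initialize the three base models with default parameters
logistic = logistic_regression.LogisticRegression()
kn = knn.KNN()
rf = random_forest.RandomForest()

# build the ensemble and train every base model on the same data
ensemble = ensemble_methods.EnsembleVotingClassifier(logistic, kn, rf)
_ = ensemble.fit(X_train, y_train)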

## Making Predictions
Once the ensemble is trained, the `predict()` method aggregates the predictions from all three models and assigns the majority label to each data point.

In [4]:
y_pred = ensemble.predict(X_test)
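
To make the voting step concrete, here is a minimal sketch of hard majority voting over the label arrays produced by several models. It is an illustration of the technique, not the library's implementation: `majority_vote` is a hypothetical helper, and breaking ties in favour of the smallest label is an assumption made for this example.

```python
import numpy as np

def majority_vote(*predictions):
    """Return the most common label per sample; ties go to the smallest label."""
    # Stack into shape (n_samples, n_models): one row of votes per sample
    votes = np.stack([np.asarray(p) for p in predictions], axis=1)
    result = np.empty(votes.shape[0], dtype=votes.dtype)
    for i, row in enumerate(votes):
        # np.unique returns labels in sorted order, so argmax picks the
        # smallest label among those with the highest count
        labels, counts = np.unique(row, return_counts=True)
        result[i] = labels[np.argmax(counts)]
    return result

# Toy predictions from three models for four samples (illustrative values only)
preds_a = [0, 1, 2, 1]
preds_b = [0, 2, 2, 0]
preds_c = [1, 1, 2, 2]

combined = majority_vote(preds_a, preds_b, preds_c)
print(combined)  # [0 1 2 0]
```

The last sample shows the tie case: all three models disagree, so the result depends entirely on the tie-breaking rule. With three models and three classes, such three-way ties are possible, which is worth keeping in mind when interpreting ensemble predictions.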

## Evaluating the Model

We measure the model's performance with overall **accuracy** and a more detailed per-class **classification report**, using the `accuracy_score` and `classification_report` functions from scikit-learn.

In [5]:
from sklearn.metrics import accuracy_score, classification_report

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on test set: {accuracy:.2f}")

# Detailed report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))

Accuracy on test set: 0.94

Classification Report:
              precision    recall  f1-score   support

     class_0       0.91      1.00      0.95        10
     class_1       0.94      0.94      0.94        17
     class_2       1.00      0.89      0.94         9

    accuracy                           0.94        36
   macro avg       0.95      0.94      0.94        36
weighted avg       0.95      0.94      0.94        36
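
The accuracy, precision, and recall columns in a report like the one above can all be derived from a confusion matrix. The sketch below shows the arithmetic on a small set of toy labels (not the notebook's data); `confusion_matrix_counts` is a hypothetical helper written for this illustration.

```python
import numpy as np

def confusion_matrix_counts(y_true, y_pred, n_classes):
    """cm[i, j] = number of samples with true class i that were predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy labels (illustrative values only)
y_true_demo = [0, 0, 1, 1, 1, 2]
y_pred_demo = [0, 1, 1, 1, 2, 2]

cm = confusion_matrix_counts(y_true_demo, y_pred_demo, n_classes=3)

# Accuracy: correct predictions (the diagonal) over all predictions
accuracy_demo = np.trace(cm) / cm.sum()

# Precision: correct predictions for a class over everything predicted as that class.
# Recall: correct predictions for a class over everything truly in that class.
# Both assume every class is predicted and present at least once; otherwise
# the corresponding column or row sum is zero.
precision = np.diag(cm) / cm.sum(axis=0)
recall = np.diag(cm) / cm.sum(axis=1)
```

Reading the matrix row by row shows which true classes are confused with which predicted classes, detail that a single accuracy figure hides.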

