# Perceptron

This notebook demonstrates how to use the `Perceptron` class from the `rice2025.supervised_learning.perceptron` module.

## Setup
Import the necessary modules and load the data. This example uses the breast cancer dataset from scikit-learn.

The breast cancer dataset is a small binary classification dataset that has:

- **Samples:** 569  
- **Features:** 30 numeric features
- **Classes:** Malignant (label `0`) vs. Benign (label `1`)

**Goal:** Predict Malignant vs. Benign based on features.  

In [11]:
# import library
from rice2025.supervised_learning import perceptron
import rice2025.utilities as util
import numpy as np

# load dataset
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
X, y = data.data, data.target

## Data Pre-Processing
The rice2025 perceptron model expects labels in {-1, 1}. Since the breast cancer dataset uses {0, 1}, we first re-map label 0 to -1.

Next, we split the dataset into **training** and **test** sets using `train_test_split`, and verify the split by printing the shape of each feature matrix. Finally, we scale the features with `fit_transform_split`.

In [12]:
# re-map labels from {0, 1} to {-1, 1}
y = np.where(y == 0, -1, 1)

# split dataset
X_train, X_test, y_train, y_test = util.train_test_split(X, y, test_size=.2)
print(f"Train size: {X_train.shape}, Test size: {X_test.shape}")

# scale dataset
X_train, X_test = util.fit_transform_split(X_train, X_test)

Train size: (455, 30), Test size: (114, 30)
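
The scaling step can be illustrated with a minimal sketch. It assumes that `fit_transform_split` standardizes each feature using statistics computed on the training set only, which this notebook does not confirm. The helper `standardize_split` below is hypothetical and is not part of rice2025.

```python
import numpy as np

def standardize_split(X_train, X_test):
    """Hypothetical helper: z-score both splits using training statistics only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X_train - mean) / std, (X_test - mean) / std

# Synthetic data standing in for the real feature matrices
rng = np.random.default_rng(0)
X_tr = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
X_te = rng.normal(loc=50.0, scale=10.0, size=(20, 3))

X_tr_s, X_te_s = standardize_split(X_tr, X_te)
print(X_tr_s.mean(axis=0).round(6), X_tr_s.std(axis=0).round(6))
```

Computing the mean and standard deviation on the training set alone keeps information about the test set out of the preprocessing step.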


## Initializing and Training the Perceptron Model

We will use the default parameters for `Perceptron`:
- `lr` = 0.01 (learning rate)
- `n_iter` = 1000 (number of training iterations)

Use the `fit()` method to train the model on the training data.

In [13]:
model = perceptron.Perceptron()
returns = model.fit(X_train, y_train)
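
For intuition about what training involves, here is a minimal sketch of the classic perceptron learning rule in NumPy. It is not the rice2025 implementation, whose internals are not shown in this notebook. The function names `perceptron_fit` and `perceptron_predict` are hypothetical, and the defaults simply mirror the `lr` and `n_iter` values listed above.

```python
import numpy as np

def perceptron_fit(X, y, lr=0.01, n_iter=1000):
    """Classic perceptron rule; y must be in {-1, 1}. Illustrative only."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        errors = 0
        for xi, yi in zip(X, y):
            # A sample is misclassified when its label and score disagree in sign
            if yi * (xi @ w + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:  # every sample classified correctly, so stop early
            break
    return w, b

def perceptron_predict(X, w, b):
    """Return 1 where the linear score is non-negative, else -1."""
    return np.where(X @ w + b >= 0.0, 1, -1)

# Tiny linearly separable toy set
X_toy = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-3.0, -1.0]])
y_toy = np.array([1, 1, -1, -1])

w, b = perceptron_fit(X_toy, y_toy)
print(perceptron_predict(X_toy, w, b))
```

The update `w += lr * yi * xi` only makes sense when labels are -1 or 1, which is why the labels were re-mapped before training.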

## Making Predictions
Once the model is trained, the `predict()` method can be used to classify new data points.

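Since `y_test` still uses the re-mapped {-1, 1} labels, we threshold the predictions at 0 so that they are also in {-1, 1} before comparing the two.

In [14]:
y_pred = model.predict(X_test)

# threshold predictions at 0 so they are in {-1, 1}, matching y_test
y_pred = np.where(y_pred >= 0.0, 1, -1)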

## Evaluating the Model

The model's performance can be measured with overall **accuracy** or with a more detailed **classification report**, which lists per-class precision, recall, and F1-score.  
scikit-learn provides both through the `accuracy_score` and `classification_report` functions.

In [15]:
from sklearn.metrics import accuracy_score, classification_report

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on test set: {accuracy:.2f}")

# Detailed report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

Accuracy on test set: 0.98

Classification Report:
              precision    recall  f1-score   support

        -1.0       0.95      1.00      0.98        40
         1.0       1.00      0.97      0.99        74

    accuracy                           0.98       114
   macro avg       0.98      0.99      0.98       114
weighted avg       0.98      0.98      0.98       114
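
To make the columns of the report concrete, the sketch below computes accuracy, precision, recall, and F1-score by hand for a single class. The labels are a small made-up example, not the notebook's predictions, and `binary_metrics` is a hypothetical helper.

```python
import numpy as np

# Made-up labels for illustration only
y_true = np.array([-1, -1, -1, 1, 1, 1, 1, 1])
y_hat = np.array([-1, -1, 1, 1, 1, 1, 1, -1])

def binary_metrics(y_true, y_hat, positive=1):
    """Accuracy plus precision, recall, and F1 for the chosen positive class."""
    tp = np.sum((y_hat == positive) & (y_true == positive))  # true positives
    fp = np.sum((y_hat == positive) & (y_true != positive))  # false positives
    fn = np.sum((y_hat != positive) & (y_true == positive))  # false negatives
    accuracy = np.mean(y_hat == y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = binary_metrics(y_true, y_hat, positive=1)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

In the report above, each row treats one label (-1 or 1) as the positive class, and `support` is the number of true samples with that label.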

