# CYBR 486 - Lab #5: Perceptron

## Overview
This lab uses a perceptron model to classify samples from the scikit-learn breast cancer dataset as malignant or benign. It walks through loading the dataset, splitting it into training and testing sets, training the perceptron, and evaluating its performance with a confusion matrix, accuracy, precision, and recall.

---

## Objectives
1. Load and preprocess the breast cancer dataset.
2. Split the dataset into training (80%) and testing (20%) subsets.
3. Build and train a perceptron binary classifier using the training set.
4. Make predictions and evaluate the model using:
    - Confusion Matrix
    - Accuracy Score
    - Precision Score
    - Recall Score
5. Visualize the results using a confusion matrix heatmap.

---

## Prerequisites
1. Python 3.x installed on your machine.
2. Required Python libraries:
   - `scikit-learn`
   - `pandas`
   - `numpy`
   - `seaborn`
   - `matplotlib`
   
To install the required libraries, run the following command:

```bash
pip install scikit-learn pandas numpy seaborn matplotlib
```


## Step 1: Import Necessary Libraries

```python
# Import necessary libraries
from sklearn.linear_model import Perceptron
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix
from sklearn.model_selection import train_test_split
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
```


## Step 2: Load and Inspect the Dataset

```python
# Load the breast cancer dataset as a feature DataFrame and a target Series
data_X, data_y = load_breast_cancer(return_X_y=True, as_frame=True)

# Display basic information about the dataset
print("Dataset Information:")
data_X.info()  # Prints data types and non-null counts itself; it returns None

# Check for any null values in the dataset
print("\nNull Values Check:")
print(data_X.isnull().sum())  # Count nulls in each feature column
```


```text
Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 30 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   mean radius              569 non-null    float64
 1   mean texture             569 non-null    float64
 2   mean perimeter           569 non-null    float64
 3   mean area                569 non-null    float64
 4   mean smoothness          569 non-null    float64
 5   mean compactness         569 non-null    float64
 6   mean concavity           569 non-null    float64
 7   mean concave points      569 non-null    float64
 8   mean symmetry            569 non-null    float64
 9   mean fractal dimension   569 non-null    float64
 10  radius error             569 non-null    float64
 11  texture error            569 non-null    float64
 12  perimeter error          569 non-null    float64
 13  area error               569 non-null    float64
 14  smoot
```

## Explanation
1. `data_X`: A DataFrame holding the 30 numeric feature columns (e.g., mean radius, mean texture, mean area) for 569 samples.
2. `data_y`: A Series holding the target variable, encoded as 0 for malignant and 1 for benign.
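
To confirm how the target is encoded, the dataset's own metadata can be inspected. The sketch below assumes the copy of the dataset bundled with scikit-learn; the names `dataset`, `label_names`, and `class_counts` are illustrative and not part of the lab.

```python
from sklearn.datasets import load_breast_cancer

# Load the full Bunch object to get metadata alongside the data
dataset = load_breast_cancer(as_frame=True)

# target_names[i] is the label for integer class i
label_names = [str(name) for name in dataset.target_names]
print("Class labels:", label_names)

# Count how many samples fall into each integer class
class_counts = dataset.target.value_counts().sort_index()
for class_id, count in class_counts.items():
    print(f"{class_id} ({label_names[class_id]}): {count} samples")
```

Knowing which integer maps to which label matters later, because precision and recall are reported for one specific "positive" class.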


## Step 3: Split the Dataset into Training and Testing Subsets


```python
# Split the data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(data_X, data_y, test_size=0.2, random_state=42)

# Display the shapes of the resulting splits
print("Training Set Shape:", X_train.shape)
print("Testing Set Shape:", X_test.shape)
```

```text
Training Set Shape: (455, 30)
Testing Set Shape: (114, 30)
```
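
Because the two classes are not equally represented, one optional variation is to pass `stratify` so that both subsets keep roughly the same class ratio. This is a sketch of that option, not the split used for the results reported in this lab; the `_s` suffixed names are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data_X, data_y = load_breast_cancer(return_X_y=True, as_frame=True)

# stratify=data_y keeps the benign/malignant ratio similar in both subsets
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    data_X, data_y, test_size=0.2, random_state=42, stratify=data_y
)

print("Training Set Shape:", X_train_s.shape)
print("Testing Set Shape:", X_test_s.shape)

# Compare the share of class 1 samples in each subset
print("Class 1 share (train):", round(y_train_s.mean(), 3))
print("Class 1 share (test):", round(y_test_s.mean(), 3))
```

The subset sizes are unchanged; only the way samples are assigned to each subset differs.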


## Step 4: Build and Train the Perceptron Model

```python
# Create a Perceptron model
perceptron_model = Perceptron(random_state=42)

# Train the model using the training set
perceptron_model.fit(X_train, y_train)

# Display a message confirming the model has been trained
print("\nPerceptron Model Trained.")
```

```text
Perceptron Model Trained.
```
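
For intuition about what training a perceptron involves, the classic perceptron learning rule can be sketched in a few lines of NumPy. This is a simplified illustration on a hypothetical toy dataset, not scikit-learn's actual implementation, which differs in details such as shuffling and stopping criteria. The names `train_perceptron`, `toy_X`, and `toy_y` are made up for this example.

```python
import numpy as np

def train_perceptron(X, y, epochs=10, learning_rate=1.0):
    """Classic perceptron rule for labels in {0, 1} (illustrative only)."""
    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            # Predict 1 when the weighted sum is positive, otherwise 0
            prediction = int(np.dot(weights, features) + bias > 0)
            # error is 0 when correct, +1 or -1 when wrong
            error = label - prediction
            # Weights and bias only change on a misclassified sample
            weights += learning_rate * error * features
            bias += learning_rate * error
    return weights, bias

# Hypothetical, linearly separable toy data (logical AND)
toy_X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
toy_y = np.array([0, 0, 0, 1])

weights, bias = train_perceptron(toy_X, toy_y)
toy_pred = (toy_X @ weights + bias > 0).astype(int)
print("Predictions:", toy_pred)
```

The rule only finds a separating line when one exists, which is why a perceptron can struggle on data that is not linearly separable.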


## Step 5: Make Predictions and Evaluate the Model

```python
# Make predictions on the test set
y_pred = perceptron_model.predict(X_test)

# Evaluate the model's performance
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))

print("\nAccuracy Score:", accuracy_score(y_test, y_pred))
print("Precision Score:", precision_score(y_test, y_pred))
print("Recall Score:", recall_score(y_test, y_pred))
```

```text
Confusion Matrix:
[[43  0]
 [15 56]]

Accuracy Score: 0.868421052631579
Precision Score: 1.0
Recall Score: 0.7887323943661971
```

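## Evaluation Metrics
1. Confusion Matrix: A table of actual classes (rows) against predicted classes (columns). In the Step 5 output, 15 benign samples (class 1) were predicted as malignant (class 0), and no malignant sample was predicted as benign.
2. Accuracy Score: The proportion of all predictions that are correct, here 99 of 114.
3. Precision Score: The proportion of predicted positives that are actually positive. scikit-learn treats class 1 (benign) as the positive class by default, so this is 56 of 56.
4. Recall Score: The proportion of actual positives that are predicted as positive, here 56 of 71.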
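
As a check on these definitions, the three scores can be recomputed by hand from the four cells of the confusion matrix reported in Step 5. The sketch treats class 1 as the positive class, matching scikit-learn's default `pos_label=1`; the variable names are illustrative.

```python
# Cells of the Step 5 confusion matrix, with class 1 as "positive"
tn, fp = 43, 0    # actual class 0: predicted 0, predicted 1
fn, tp = 15, 56   # actual class 1: predicted 0, predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)   # correct / all predictions
precision = tp / (tp + fp)                   # correct positives / predicted positives
recall = tp / (tp + fn)                      # correct positives / actual positives

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
```

These values agree with the scores printed by `accuracy_score`, `precision_score`, and `recall_score` in Step 5.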