# __Perceptron-Based Classification Model__

The perceptron-based classification model is a linear binary classifier that uses a single-layer neural network to make predictions based on weighted inputs and a threshold activation function.
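As a minimal sketch of the prediction rule described above, the perceptron computes a weighted sum of the inputs plus a bias and thresholds it at zero. The weights and bias here are hypothetical, hand-picked values, not learned ones. For a binary problem, scikit-learn's **Perceptron** applies the same rule to its learned **coef_** and **intercept_**.

```python
import numpy as np

def perceptron_predict(X, w, b):
    """Return 1 where w·x + b > 0, else 0 (threshold activation)."""
    scores = X @ w + b              # weighted sum of the inputs plus bias
    return (scores > 0).astype(int)

# Hypothetical weights and bias, chosen by hand for illustration only
w = np.array([0.5, -1.0])
b = 0.25
X = np.array([[2.0, 0.5],    # 1.0 - 0.5 + 0.25 =  0.75 -> 1
              [0.0, 1.0],    # 0.0 - 1.0 + 0.25 = -0.75 -> 0
              [1.0, 1.0]])   # 0.5 - 1.0 + 0.25 = -0.25 -> 0

print(perceptron_predict(X, w, b))  # [1 0 0]
```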

Let's understand how to build a perceptron-based classification model.

## __Steps to be followed:__
1. Import the required libraries
2. Read a CSV file
3. Display the data
4. Perform data preprocessing and splitting
5. Initialize and fit the model
6. Make predictions and check the accuracy

### Step 1: Import the required libraries

- Import NumPy and pandas for data handling, Matplotlib for plotting, and scikit-learn's **Perceptron** classifier and **train_test_split** helper.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
%matplotlib inline
```

### Step 2: Read a CSV file

- Load the dataset from **spambase.csv** into a DataFrame named **data** using the **pd.read_csv()** function.


**Dataset Explanation**

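The Spambase dataset is used to classify email messages as spam or not spam (also known as ham) and is a common benchmark for binary classification in machine learning. Each row describes one email through word-frequency, character-frequency, and capital-letter run-length features; the last column, **spam**, is **1** for spam and **0** for not spam.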

```python
data = pd.read_csv("spambase.csv")
```

### Step 3: Display the data

- The __head()__ method returns the first five rows of the DataFrame __data__.

```python
data.head()
```

| | word_freq_make | word_freq_address | word_freq_all | word_freq_3d | word_freq_our | word_freq_over | word_freq_remove | word_freq_internet | word_freq_order | word_freq_mail | ... | char_freq_; | char_freq_( | char_freq_[ | char_freq_! | char_freq_$ | char_freq_# | capital_run_length_average | capital_run_length_longest | capital_run_length_total | spam |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.64 | 0.64 | 0.0 | 0.32 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.778 | 0.0 | 0.0 | 3.756 | 61 | 278 | 1 |
| 1 | 0.21 | 0.28 | 0.5 | 0.0 | 0.14 | 0.28 | 0.21 | 0.07 | 0.0 | 0.94 | ... | 0.0 | 0.132 | 0.0 | 0.372 | 0.18 | 0.048 | 5.114 | 101 | 1028 | 1 |
| 2 | 0.06 | 0.0 | 0.71 | 0.0 | 1.23 | 0.19 | 0.19 | 0.12 | 0.64 | 0.25 | ... | 0.01 | 0.143 | 0.0 | 0.276 | 0.184 | 0.01 | 9.821 | 485 | 2259 | 1 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.137 | 0.0 | 0.137 | 0.0 | 0.0 | 3.537 | 40 | 191 | 1 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.135 | 0.0 | 0.135 | 0.0 | 0.0 | 3.537 | 40 | 191 | 1 |


**Observation:**

- The output shows the first **5** rows. The dataset has **58** columns (57 features plus the **spam** label), of which only some are displayed.

### Step 4: Perform data preprocessing and splitting

- Check for missing values with the **isnull()** function combined with **any()**, and replace any missing entries with **0** using **fillna()**.
- Use **data.iloc[:,:-1]** to select every column except the last one as the features **df_x**, and **data.iloc[:,-1]** to select only the last column (the **spam** label) as the target **df_y**.
- Split **df_x** and **df_y** into training and test sets, allocating **60%** of the rows to training (**x_train**, **y_train**) and **40%** to testing (**x_test**, **y_test**), with a random state of **4** for reproducibility.
- Standardize the features with **StandardScaler**: fit it on **x_train** only, then apply the same transformation to **x_test** so that no information from the test set leaks into training.
- Create a **Perceptron** classifier instance, **per**, with default parameters; this happens in Step 5.


```python
# Replace any missing values with 0
if data.isnull().values.any():
    data = data.fillna(0)
```
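The same check-and-fill logic can be sketched with NumPy alone, assuming a small hypothetical float array in place of the DataFrame. NaN is the missing-value marker in both libraries.

```python
import numpy as np

# Hypothetical feature matrix with one missing entry (NaN)
arr = np.array([[0.0, 0.64, np.nan],
                [0.21, 0.28, 0.5]])

has_missing = np.isnan(arr).any()             # similar to data.isnull().values.any()
filled = np.where(np.isnan(arr), 0.0, arr)    # similar to data.fillna(0)

print(has_missing)            # True
print(filled[0].tolist())     # [0.0, 0.64, 0.0]
```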

```python
# Features: every column except the last; target: the last column (spam)
df_x = data.iloc[:, :-1]
df_y = data.iloc[:, -1]
```
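Position-based slicing works the same way on a plain NumPy array. A sketch with a hypothetical 3×4 array whose last column plays the role of the **spam** label:

```python
import numpy as np

# Hypothetical rows: three feature columns followed by a 0/1 label column
table = np.array([[0.0, 0.64, 3.756, 1],
                  [0.21, 0.28, 5.114, 1],
                  [0.0, 0.0, 1.5, 0]])

features = table[:, :-1]   # every column except the last, like data.iloc[:, :-1]
labels = table[:, -1]      # only the last column, like data.iloc[:, -1]

print(features.shape)  # (3, 3)
print(labels)          # [1. 1. 0.]
```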

```python
# Split the data into training (60%) and test (40%) sets
x_train, x_test, y_train, y_test = train_test_split(df_x, df_y, test_size=0.4, random_state=4)
```
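**train_test_split** shuffles the rows before cutting them into two sets. The NumPy sketch below shows a seeded 60/40 split on hypothetical data; it illustrates the idea but will not select the same rows as scikit-learn, which uses a different random generator. The function name **split_60_40** is hypothetical.

```python
import math
import numpy as np

def split_60_40(X, y, seed=4):
    """Shuffle row indices with a seeded RNG, then cut 40% off for testing."""
    n = len(X)
    n_test = math.ceil(0.4 * n)          # round the test share up, as scikit-learn does
    perm = np.random.default_rng(seed).permutation(n)
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X = np.arange(20).reshape(10, 2)   # hypothetical: 10 samples, 2 features
y = np.arange(10) % 2              # hypothetical 0/1 labels

x_tr, x_te, y_tr, y_te = split_60_40(X, y)
print(len(x_tr), len(x_te))  # 6 4
```

Every row lands in exactly one of the two sets, and the same seed always produces the same split.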

```python
# Standardize the features: fit on the training set only, then transform both sets
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)
```
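**StandardScaler** rescales each feature to z = (x − μ) / σ, where μ and σ are the mean and standard deviation computed on the training set. The NumPy sketch below reproduces that arithmetic on hypothetical arrays; like **StandardScaler**, it uses the population standard deviation and leaves constant columns unscaled. The helper names are hypothetical.

```python
import numpy as np

def fit_standardizer(X_train):
    """Per-feature mean and (population) standard deviation from training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)     # ddof=0, the population standard deviation
    std[std == 0] = 1.0           # avoid dividing by zero on constant features
    return mean, std

def apply_standardizer(X, mean, std):
    return (X - mean) / std

# Hypothetical data: 3 training rows and 1 test row, 2 features
X_train = np.array([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
X_test = np.array([[4.0, 12.0]])

mean, std = fit_standardizer(X_train)
print(apply_standardizer(X_train, mean, std))  # each column now has mean 0
print(apply_standardizer(X_test, mean, std))   # scaled with the training statistics
```

Reusing the training statistics for the test row is what keeps test-set information out of the preprocessing step.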

### Step 5: Initialize and fit the perceptron

- Create a **Perceptron** classifier, **per**, with default parameters and fit it on **x_train_scaled** and **y_train**. During fitting, the perceptron learns one weight per feature plus a bias term, adjusting them whenever it misclassifies a training email.

```python
# Initialize and train the Perceptron
per = Perceptron()
per.fit(x_train_scaled, y_train)
```
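scikit-learn documents **Perceptron** as sharing its implementation with **SGDClassifier** (perceptron loss, constant learning rate, no penalty by default). The classic update rule behind it can be sketched in NumPy: whenever a sample is misclassified, move the weights toward its correct side. This is a simplified illustration on hypothetical, linearly separable toy data, not scikit-learn's exact algorithm, which also shuffles samples and stops early based on a tolerance. The function name **train_perceptron** is hypothetical.

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=1.0):
    """Classic perceptron rule on 0/1 labels; returns weights and bias."""
    w = np.zeros(X.shape[1])
    b = 0.0
    y_signed = np.where(y == 1, 1.0, -1.0)   # map labels {0, 1} to {-1, +1}
    for _ in range(epochs):
        errors = 0
        for xi, ti in zip(X, y_signed):
            if ti * (xi @ w + b) <= 0:       # misclassified or on the boundary
                w += lr * ti * xi            # shift the boundary toward xi's correct side
                b += lr * ti
                errors += 1
        if errors == 0:                      # a full pass with no mistakes: converged
            break
    return w, b

# Hypothetical, linearly separable toy data
X = np.array([[2.0, 1.0], [3.0, 2.0], [4.0, 3.0],
              [-1.0, -2.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, 1, 0, 0, 0])

w, b = train_perceptron(X, y)
pred = (X @ w + b > 0).astype(int)
print(w, b)
print((pred == y).mean())  # 1.0
```

On separable data like this, the loop is guaranteed to stop making mistakes after finitely many updates; on real data such as Spambase, which is not perfectly separable, the epoch limit ends training instead.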

### Step 6: Make predictions

- Use the trained model to predict labels for both the training set (**x_train_scaled**) and the test set (**x_test_scaled**).

```python
# Predict labels for the training and test sets
pred_train = per.predict(x_train_scaled)
pred_test = per.predict(x_test_scaled)
```

```python
pred_test
```

```
array([1, 0, 0, ..., 0, 0, 0])
```

**Observations:**

- The array **pred_test** holds one predicted label per email in the test set: **1** means the model classifies the email as spam, and **0** means not spam.
- The **...** marks the middle of the array, which NumPy omits from the display because the array is long; the full array has one entry per row of **x_test_scaled**.
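To read such an array, map each label back to its class name. A short sketch on a hypothetical slice of predictions (not the notebook's actual output):

```python
import numpy as np

# Hypothetical predicted labels for five test emails
pred_sample = np.array([1, 0, 0, 1, 0])

names = np.where(pred_sample == 1, "spam", "not spam")
spam_share = pred_sample.mean()   # fraction of emails predicted as spam

print(names.tolist())   # ['spam', 'not spam', 'not spam', 'spam', 'not spam']
print(spam_share)       # 0.4
```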

**Check the Accuracy Score**
- Import the **accuracy_score** function from the **sklearn.metrics** module. It computes the accuracy of a classifier's predictions.
- **y_train** and **y_test** are the true labels for the training and test sets: the values the model is trying to predict.
- **pred_train** and **pred_test** are the labels the model predicted for those same sets.
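For plain label arrays, accuracy is the fraction of positions where the prediction equals the true label. A minimal NumPy equivalent on hypothetical labels (the function name **accuracy** is hypothetical):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

y_true = [1, 0, 1, 1, 0]   # hypothetical true labels
y_pred = [1, 0, 0, 1, 0]   # hypothetical predictions: one mistake out of five
print(accuracy(y_true, y_pred))  # 0.8
```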

```python
# Accuracy
from sklearn.metrics import accuracy_score

accuracy_train = accuracy_score(y_train, pred_train)
accuracy_test = accuracy_score(y_test, pred_test)
```

```python
print("Accuracy of the training data: ", accuracy_train)
print("Accuracy of the testing data: ", accuracy_test)
```

```
Accuracy of the training data:  0.893840579710145
Accuracy of the testing data:  0.8881042911461162
```


**Observation:**
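- **accuracy_score** compares the true and predicted labels and returns the proportion of correct predictions. The model classifies about **89.4%** of training emails and **88.8%** of test emails correctly; the small gap between the two suggests that it generalizes well rather than overfitting the training data.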