# Breast Cancer Classification using Perceptron, AdalineGD, Logistic Regression, and SVM

This notebook classifies breast tumors as malignant or benign using four models: Perceptron, AdalineGD, Logistic Regression, and SVM (with linear and RBF kernels). Each model is trained on the 30 numeric features of scikit-learn's copy of the Breast Cancer Wisconsin (Diagnostic) dataset and evaluated by its accuracy on a held-out test set. In scikit-learn's encoding, target `0` is malignant and `1` is benign.

## Steps Covered
1. Data loading and train/test split
2. Exploratory Data Analysis (EDA)
3. Feature standardization (zero mean, unit variance per feature)
4. Implementation and training of the Perceptron model
5. Implementation and training of the AdalineGD model
6. Training a Logistic Regression model
7. Training SVM models (linear and RBF kernels)

Dataset: [Breast Cancer Data Set](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html)



### 1. Load the Breast Cancer dataset from scikit-learn and split it

In [13]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()

X = data.data
y = data.target

# Split the data into training and testing sets (80% for training and 20% for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=45)
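The split above is purely random. A possible variant, not used in the rest of this notebook, passes `stratify=y` so that both splits keep the dataset's malignant/benign ratio. The sketch below assumes the same `test_size` and seed and compares the benign (class `1`) fraction in each split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X_all, y_all = load_breast_cancer(return_X_y=True)

# Same 80/20 split and seed as above, but stratified on the labels
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=45, stratify=y_all)

print(X_tr.shape, X_te.shape)  # (455, 30) (114, 30)
print('benign fraction, full data: %.3f' % y_all.mean())
print('benign fraction, train:     %.3f' % y_tr.mean())
print('benign fraction, test:      %.3f' % y_te.mean())
```

With only 114 test examples, stratification keeps a handful of misallocated malignant cases from noticeably shifting the test-set class mix.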

### 2. Exploratory Data Analysis (EDA)

In [14]:
import pandas as pd

df = pd.DataFrame(X_train, columns=data.feature_names)
df['target'] = y_train

df.head()

|   | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11.76 | 21.6 | 74.72 | 427.9 | 0.08637 | 0.04966 | 0.01657 | 0.01115 | 0.1495 | 0.05888 | ... | 25.72 | 82.98 | 516.5 | 0.1085 | 0.08615 | 0.05523 | 0.03715 | 0.2433 | 0.06563 | 1 |
| 1 | 11.54 | 10.72 | 73.73 | 409.1 | 0.08597 | 0.05969 | 0.01367 | 0.008907 | 0.1833 | 0.061 | ... | 12.87 | 81.23 | 467.8 | 0.1092 | 0.1626 | 0.08324 | 0.04715 | 0.339 | 0.07434 | 1 |
| 2 | 11.6 | 24.49 | 74.23 | 417.2 | 0.07474 | 0.05688 | 0.01974 | 0.01313 | 0.1935 | 0.05878 | ... | 31.62 | 81.39 | 476.5 | 0.09545 | 0.1361 | 0.07239 | 0.04815 | 0.3244 | 0.06745 | 1 |
| 3 | 19.81 | 22.15 | 130.0 | 1260.0 | 0.09831 | 0.1027 | 0.1479 | 0.09498 | 0.1582 | 0.05395 | ... | 30.88 | 186.8 | 2398.0 | 0.1512 | 0.315 | 0.5372 | 0.2388 | 0.2768 | 0.07615 | 0 |
| 4 | 13.0 | 21.82 | 87.5 | 519.8 | 0.1273 | 0.1932 | 0.1859 | 0.09353 | 0.235 | 0.07389 | ... | 30.73 | 106.2 | 739.3 | 0.1703 | 0.5401 | 0.539 | 0.206 | 0.4378 | 0.1072 | 0 |
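`df.head()` only shows five rows. A quick way to check the class balance and confirm which label means what is sketched below; it reloads the full dataset (all 569 examples, not just the training split) so that it runs on its own, and `df_all` is a name chosen here to avoid overwriting `df`.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
df_all = pd.DataFrame(data.data, columns=data.feature_names)
df_all['target'] = data.target

# target_names[i] is the class name for label i
print(data.target_names.tolist())  # ['malignant', 'benign']

# Class counts over the full dataset
print(df_all['target'].value_counts().sort_index())

# Feature scales differ by orders of magnitude (mean area vs. mean smoothness),
# which is one reason to standardize before training gradient-based models
print(df_all[['mean area', 'mean smoothness']].describe().loc[['mean', 'std']])
```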


### 3. Standardize the features

In [15]:
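from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training set only, then apply the same mean/std
# to both splits so that no test-set statistics leak into training.
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)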
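`StandardScaler` applies the z-score $z = (x - \mu)/\sigma$ to each feature, where $\mu$ and $\sigma$ are the training-set mean and population standard deviation. The sketch below recomputes this by hand with NumPy and checks it against the scaler; it repeats the notebook's split under local names (`X_tr`, `X_te`, `scaler`) so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, X_te, _, _ = train_test_split(X_all, y_all, test_size=0.2, random_state=45)

scaler = StandardScaler().fit(X_tr)

# Manual z-score with training-set statistics only (population std, ddof=0)
mu = X_tr.mean(axis=0)
sigma = X_tr.std(axis=0)
X_tr_manual = (X_tr - mu) / sigma
X_te_manual = (X_te - mu) / sigma  # the test set reuses the *training* mu and sigma

print(np.allclose(X_tr_manual, scaler.transform(X_tr)))  # True
print(np.allclose(X_te_manual, scaler.transform(X_te)))  # True
# Standardized training features have mean ~0 and std ~1 per column
print(np.allclose(X_tr_manual.mean(axis=0), 0.0), np.allclose(X_tr_manual.std(axis=0), 1.0))  # True True
```

The standardized test features will generally not have exactly zero mean and unit variance, since they are transformed with the training statistics.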

### 4-1. Implementation of the Perceptron Model

In [16]:
import numpy as np


class Perceptron:
    """Perceptron classifier.

    Parameters
    ------------
    eta : float
      Learning rate (between 0.0 and 1.0)
    n_iter : int
      Passes over the training dataset.
    random_state : int
      Random number generator seed for random weight
      initialization.

    Attributes
    -----------
    w_ : 1d-array
      Weights after fitting.
    b_ : Scalar
      Bias unit after fitting.
    errors_ : list
      Number of misclassifications (updates) in each epoch.

    """
    def __init__(self, eta=0.01, n_iter=50, random_state=1):
        self.eta = eta
        self.n_iter = n_iter
        self.random_state = random_state

    def fit(self, X, y):
        """Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_examples, n_features]
          Training vectors, where n_examples is the number of examples and
          n_features is the number of features.
        y : array-like, shape = [n_examples]
          Target values.

        Returns
        -------
        self : object

        """
        rgen = np.random.RandomState(self.random_state)
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=X.shape[1])
        self.b_ = np.float64(0.)

        self.errors_ = []

        for _ in range(self.n_iter):
            errors = 0
            for xi, target in zip(X, y):
                update = self.eta * (target - self.predict(xi))
                self.w_ += update * xi
                self.b_ += update
                errors += int(update != 0.0)
            self.errors_.append(errors)
        return self

    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.w_) + self.b_

    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.net_input(X) >= 0.0, 1, 0)
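To see the update rule $\Delta \mathbf{w} = \eta\,(y^{(i)} - \hat{y}^{(i)})\,\mathbf{x}^{(i)}$, $\Delta b = \eta\,(y^{(i)} - \hat{y}^{(i)})$ in isolation, the sketch below applies the same per-example update as `Perceptron.fit` to the logical AND function, a tiny linearly separable toy problem chosen only for illustration. The helper `train_perceptron` is hypothetical, and it starts from zero weights with `eta=1.0` (unlike the class's small random weights) so that the arithmetic stays exact. Because AND is linearly separable, the number of updates per epoch should reach zero.

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, n_iter=10):
    """Hypothetical helper: same update rule as Perceptron.fit, zero-initialized."""
    w = np.zeros(X.shape[1])
    b = 0.0
    errors_per_epoch = []
    for _ in range(n_iter):
        errors = 0
        for xi, target in zip(X, y):
            y_hat = 1 if xi.dot(w) + b >= 0.0 else 0  # unit step
            update = eta * (target - y_hat)
            w += update * xi
            b += update
            errors += int(update != 0.0)
        errors_per_epoch.append(errors)
    return w, b, errors_per_epoch

# Logical AND: only (1, 1) belongs to class 1
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])

w, b, errors = train_perceptron(X_and, y_and)
print('updates per epoch:', errors)
print('predictions:', np.where(X_and.dot(w) + b >= 0.0, 1, 0))  # [0 0 0 1]
```

Once an epoch passes with no updates, the weights stop changing, which is why `errors_` in the class is a convenient convergence indicator.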

### 4-2. Train the Perceptron model and calculate its accuracy

In [17]:
from sklearn.metrics import accuracy_score

eta = 0.01   # learning rate
epochs = 16  # number of epochs
ppn = Perceptron(eta=eta, n_iter=epochs)
ppn.fit(X_train_std, y_train)

# calculate accuracy
ppn_y_pred = ppn.predict(X_test_std)
ppn_accu = accuracy_score(y_test, ppn_y_pred)
print('Perceptron accuracy: %.3f' % ppn_accu)

Perceptron accuracy: 0.982
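As an optional cross-check, scikit-learn ships its own `sklearn.linear_model.Perceptron`, imported here as `SkPerceptron` so it does not shadow the class defined above. The sketch trains it on the same split and standardized features; its accuracy need not match the from-scratch model, since its training loop (per-epoch shuffling, a `tol`-based stopping rule) differs. The parameter values are illustrative, not tuned.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import Perceptron as SkPerceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.2, random_state=45)
sc_check = StandardScaler().fit(X_tr)

# eta0 multiplies each update (learning rate); max_iter caps the number of epochs
sk_ppn = SkPerceptron(eta0=0.01, max_iter=1000, random_state=1)
sk_ppn.fit(sc_check.transform(X_tr), y_tr)

sk_acc = accuracy_score(y_te, sk_ppn.predict(sc_check.transform(X_te)))
print('scikit-learn Perceptron accuracy: %.3f' % sk_acc)
```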


### 5-1. Implementation of the AdalineGD model

In [18]:
class AdalineGD:
    """ADAptive LInear NEuron classifier.

    Parameters
    ------------
    eta : float
      Learning rate (between 0.0 and 1.0)
    n_iter : int
      Passes over the training dataset.
    random_state : int
      Random number generator seed for random weight
      initialization.


    Attributes
    -----------
    w_ : 1d-array
      Weights after fitting.
    b_ : Scalar
      Bias unit after fitting.
    losses_ : list
      Mean squared error loss function values in each epoch.

    """
    def __init__(self, eta=0.01, n_iter=50, random_state=1):
        self.eta = eta
        self.n_iter = n_iter
        self.random_state = random_state

    def fit(self, X, y):
        """ Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_examples, n_features]
          Training vectors, where n_examples is the number of examples and
          n_features is the number of features.
        y : array-like, shape = [n_examples]
          Target values.

        Returns
        -------
        self : object

        """
        rgen = np.random.RandomState(self.random_state)
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=X.shape[1])
        self.b_ = np.float64(0.)
        self.losses_ = []

        for _ in range(self.n_iter):
            net_input = self.net_input(X)
            # Note that the "activation" method has no effect here, since it
            # is simply the identity function; we could write
            # `output = self.net_input(X)` directly instead. Its purpose is
            # conceptual: in logistic regression, it would be replaced by a
            # sigmoid function.
            output = self.activation(net_input)
            errors = (y - output)

            # Equivalent (slower) per-weight update, kept for reference:
            # for w_j in range(self.w_.shape[0]):
            #     self.w_[w_j] += self.eta * (2.0 * (X[:, w_j]*errors)).mean()

            # Full-batch gradient descent step on the mean squared error
            self.w_ += self.eta * 2.0 * X.T.dot(errors) / X.shape[0]
            self.b_ += self.eta * 2.0 * errors.mean()
            loss = (errors**2).mean()
            self.losses_.append(loss)
        return self

    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.w_) + self.b_

    def activation(self, X):
        """Compute linear activation"""
        return X

    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.activation(self.net_input(X)) >= 0.5, 1, 0)
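The update in `AdalineGD.fit` is gradient descent on the mean squared error $L(\mathbf{w}, b) = \frac{1}{n}\sum_i \big(y^{(i)} - z^{(i)}\big)^2$ with $z^{(i)} = \mathbf{w}^\top\mathbf{x}^{(i)} + b$. Its gradients are $\partial L/\partial \mathbf{w} = -\frac{2}{n}\mathbf{X}^\top(\mathbf{y} - \mathbf{z})$ and $\partial L/\partial b = -\frac{2}{n}\sum_i \big(y^{(i)} - z^{(i)}\big)$, so stepping by $-\eta$ times the gradient gives exactly the `self.w_ += ...` and `self.b_ += ...` lines. The sketch below checks the vectorized gradient against central finite differences on small random data; the data and the helper names `mse_loss` and `mse_grad` are made up for this check.

```python
import numpy as np

def mse_loss(w, b, X, y):
    """Mean squared error with the identity activation."""
    errors = y - (X.dot(w) + b)
    return (errors ** 2).mean()

def mse_grad(w, b, X, y):
    """Analytic gradient; AdalineGD moves by -eta times this."""
    errors = y - (X.dot(w) + b)
    grad_w = -2.0 * X.T.dot(errors) / X.shape[0]
    grad_b = -2.0 * errors.mean()
    return grad_w, grad_b

rng = np.random.RandomState(0)
X_demo = rng.normal(size=(20, 3))
y_demo = rng.randint(0, 2, size=20).astype(float)
w = rng.normal(scale=0.01, size=3)
b = 0.0

grad_w, grad_b = mse_grad(w, b, X_demo, y_demo)

# Central finite differences, one weight coordinate at a time
eps = 1e-6
num_grad_w = np.array([
    (mse_loss(w + eps * e, b, X_demo, y_demo) - mse_loss(w - eps * e, b, X_demo, y_demo)) / (2 * eps)
    for e in np.eye(w.size)
])
num_grad_b = (mse_loss(w, b + eps, X_demo, y_demo) - mse_loss(w, b - eps, X_demo, y_demo)) / (2 * eps)

print(np.allclose(grad_w, num_grad_w), np.isclose(grad_b, num_grad_b))  # True True
```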

### 5-2. Train the AdalineGD model and calculate its accuracy

In [19]:
adaline = AdalineGD(eta=0.01, n_iter=100)
adaline.fit(X_train_std, y_train)

# calculate accuracy
adaline_y_pred = adaline.predict(X_test_std)
adaline_accu = accuracy_score(y_test, adaline_y_pred)
print('AdalineGD accuracy: %.3f' % adaline_accu)

AdalineGD accuracy: 0.965
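Because Adaline minimizes a convex loss with full-batch gradient descent, a useful sanity check is that `losses_` decreases from epoch to epoch; if it grows instead, the learning rate is too large. The sketch below repeats the same full-batch update as a standalone function (the name `adaline_losses` is hypothetical) on the notebook's standardized training split, and checks that the loss never increases for `eta=0.01`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def adaline_losses(X, y, eta, n_iter, seed=1):
    """Hypothetical helper: full-batch Adaline updates, returns the MSE per epoch."""
    rgen = np.random.RandomState(seed)
    w = rgen.normal(loc=0.0, scale=0.01, size=X.shape[1])
    b = 0.0
    losses = []
    for _ in range(n_iter):
        errors = y - (X.dot(w) + b)
        w = w + eta * 2.0 * X.T.dot(errors) / X.shape[0]
        b = b + eta * 2.0 * errors.mean()
        losses.append((errors ** 2).mean())  # loss before this epoch's update, as in AdalineGD
    return losses

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, _, y_tr, _ = train_test_split(X_all, y_all, test_size=0.2, random_state=45)
X_tr_std = StandardScaler().fit_transform(X_tr)

losses = adaline_losses(X_tr_std, y_tr, eta=0.01, n_iter=100)
print('first loss: %.4f, last loss: %.4f' % (losses[0], losses[-1]))
print('never increases:', all(curr <= prev for prev, curr in zip(losses, losses[1:])))
```

Rerunning with a much larger `eta` is a quick way to see where the updates start to overshoot.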


### 6-1. Import Logistic Regression

In [20]:
from sklearn.linear_model import LogisticRegression

### 6-2. Train the Logistic Regression model and calculate its accuracy

In [21]:
# C is the inverse of the regularization strength (smaller C = stronger regularization)
lr = LogisticRegression(C=1.0, solver='lbfgs', max_iter=100)
lr.fit(X_train_std, y_train)

lr_y_pred = lr.predict(X_test_std)
lr_accu = accuracy_score(y_test, lr_y_pred)
print('Logistic Regression accuracy: %.3f' % lr_accu)

Logistic Regression accuracy: 0.982
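Logistic regression is the model the `AdalineGD` comments allude to: the identity activation is replaced by the sigmoid $\sigma(z) = 1/(1 + e^{-z})$. For a binary `LogisticRegression`, `decision_function` returns the net input $z$, `predict_proba[:, 1]` should equal $\sigma(z)$, and `predict` amounts to thresholding that probability at 0.5. The sketch below checks this on the notebook's split, using `scipy.special.expit` as the sigmoid and the name `clf` to avoid overwriting `lr`.

```python
import numpy as np
from scipy.special import expit  # numerically stable sigmoid
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.2, random_state=45)
sc_lr = StandardScaler().fit(X_tr)
X_tr_std, X_te_std = sc_lr.transform(X_tr), sc_lr.transform(X_te)

clf = LogisticRegression(C=1.0, solver='lbfgs', max_iter=100).fit(X_tr_std, y_tr)

z = clf.decision_function(X_te_std)           # net input w^T x + b
p_benign = clf.predict_proba(X_te_std)[:, 1]  # column 1 is class 1 (benign)

print(np.allclose(p_benign, expit(z)))                                       # True
print(np.array_equal(clf.predict(X_te_std), (p_benign > 0.5).astype(int)))  # True
```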


### 7-1. Import SVM

In [22]:
from sklearn.svm import SVC

### 7-2. Train an SVM with a **linear kernel** and calculate its accuracy

In [23]:
# random_state is only used by SVC when probability=True, so it has no effect here
svm = SVC(kernel='linear', C=1.0, random_state=1)
svm.fit(X_train_std, y_train)

svm_y_pred = svm.predict(X_test_std)
svm_accu = accuracy_score(y_test, svm_y_pred)
print('SVM with linear kernel accuracy: %.3f' % svm_accu)

SVM with linear kernel accuracy: 0.982
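A fitted `SVC` records which training examples became support vectors, the points on or inside the margin that alone determine the decision boundary. The sketch below refits the linear-kernel model on the same split (as `svm_lin`, a name chosen here) and inspects `n_support_` (support vectors per class, in the order of `classes_`) and `support_vectors_`; for a linear kernel, `coef_` also exposes the explicit weight vector.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, _, y_tr, _ = train_test_split(X_all, y_all, test_size=0.2, random_state=45)
X_tr_std = StandardScaler().fit(X_tr).transform(X_tr)

svm_lin = SVC(kernel='linear', C=1.0).fit(X_tr_std, y_tr)

print('support vectors per class (malignant, benign):', svm_lin.n_support_)
print('total: %d of %d training examples' % (svm_lin.support_vectors_.shape[0], X_tr_std.shape[0]))
print('weight vector shape:', svm_lin.coef_.shape)  # (1, 30)
```

A smaller `C` widens the margin, which typically increases the number of support vectors.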


### 7-3. Train an SVM with an **RBF kernel** and calculate its accuracy

In [24]:
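# C controls margin softness; gamma is the inverse width of the Gaussian kernel
# (larger gamma = narrower, more local kernel)
svm_rbf = SVC(kernel='rbf', C=10.0, gamma=0.1, random_state=1)
svm_rbf.fit(X_train_std, y_train)

svm_rbf_y_pred = svm_rbf.predict(X_test_std)
svm_rbf_accu = accuracy_score(y_test, svm_rbf_y_pred)
print('SVM with rbf kernel accuracy: %.3f' % svm_rbf_accu)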

SVM with rbf kernel accuracy: 0.982
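On a 114-example test set, 0.982 means 112 correct predictions and 0.965 means 110, so four of the models tie exactly and AdalineGD trails by two samples; meanwhile `C=10.0, gamma=0.1` are fixed in the cell above. A more robust way to pick RBF hyperparameters is cross-validated grid search on the training set only. The sketch below uses `GridSearchCV` with a `Pipeline` so the scaler is refit inside each fold; the grid values are illustrative assumptions, not tuned recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.2, random_state=45)

# Scaling inside the pipeline: each CV fold is standardized with its own training part
pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
param_grid = {
    'svc__C': [0.1, 1.0, 10.0, 100.0],
    'svc__gamma': [0.001, 0.01, 0.1],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
search.fit(X_tr, y_tr)

print('best params:', search.best_params_)
print('mean CV accuracy: %.3f' % search.best_score_)
print('test accuracy:    %.3f' % search.score(X_te, y_te))
```

The test set is touched only once, after the search, so its score stays an unbiased estimate for the selected configuration.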
