<a href="https://colab.research.google.com/github/HarshMartinTopno/From_Scratch/blob/main/AdaLiNe.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [4]:
import numpy as np

# Implementation

# Adaline (ADAptive LInear NEuron) Algorithm

Adaline is a single-layer neural network used for binary classification. It improves on the perceptron by minimizing a continuous cost function with gradient descent instead of updating only on misclassified samples. The explanation below follows the code implementation in this notebook.

---

## Algorithm Steps

### 1. **Initialization**
- The weights (`self.w_`) are initialized randomly from a normal distribution with a small scale (`scale=0.01`); the bias (`self.b_`) is initialized to `0.0`.
- The learning rate (`self.lr`) and number of iterations (`self.n_iters`) are set during initialization.
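
A minimal sketch of this initialization, assuming a hypothetical feature count `n_features = 3` and the class's default seed. As in the `fit` method of the implementation, the weights are drawn from a normal distribution and the bias starts at zero.

```python
import numpy as np

n_features = 3  # hypothetical number of input features
rgen = np.random.RandomState(69)  # seeded generator, so runs are reproducible

# Small random weights centred on zero, one per feature.
w = rgen.normal(loc=0.0, scale=0.01, size=n_features)
b = 0.0  # the bias starts at exactly zero

print(w.shape)  # (3,)
```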

---

### 2. **Training (Fit Method)**
Adaline uses **gradient descent** to minimize the **mean squared error (MSE)** loss function. For each epoch (iteration):

1. **Net Input**: Compute the net input as a linear combination of the input features and weights, plus the bias:

   $net\_input = X \cdot w + b$


2. **Activation Function**: The activation function is the identity function (i.e., no transformation is applied):
   
   $output = net\_input$

3. **Error Calculation**: Compute the error as the difference between the true labels (`y`) and the predicted output:
   
   $errors = y - output$

4. **Weight and Bias Update**: Update the weights and bias using the gradient of the MSE loss function:

   $w = w + \eta \cdot \frac{2}{N} \cdot X^{T} \cdot errors$

   $b = b + \eta \cdot 2 \cdot mean(errors)$

   Here, $\eta$ is the learning rate and $N$ is the number of samples. The mean already divides by $N$, so the bias update needs no extra $\frac{1}{N}$ factor.


5. **Loss Calculation**: Compute the MSE loss for the current epoch and store it in `self.losses_`.
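
The five steps above can be traced by hand for a single epoch. The sketch below uses hypothetical toy data (a logical AND with 0/1 labels) and zero initial weights so the arithmetic is easy to check; it is an illustration of the update rule, not a replacement for the class further down.

```python
import numpy as np

# Hypothetical toy data: 4 samples, 2 features, labels in {0, 1}.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])

w = np.zeros(2)  # zero start keeps the numbers easy to follow
b = 0.0
eta = 0.1

net_input = X.dot(w) + b             # step 1: linear combination
output = net_input                   # step 2: identity activation
errors = y - output                  # step 3: residuals
loss_before = (errors ** 2).mean()   # step 5: MSE from the pre-update errors

w += eta * 2.0 * X.T.dot(errors) / X.shape[0]  # step 4: weight update
b += eta * 2.0 * errors.mean()                 #         bias update

# Recompute the MSE with the updated parameters to see the improvement.
loss_after = ((y - (X.dot(w) + b)) ** 2).mean()

print(w, b)                     # [0.05 0.05] 0.05
print(loss_before, loss_after)  # 0.25 and roughly 0.18625
```

Because `errors` is defined as `y - output`, adding the scaled term moves the parameters downhill on the MSE surface.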

---

### 3. **Prediction**
- After training, the model predicts the class label for new input data:
  - Compute the net input and apply the activation function.
  - If the output is greater than or equal to 0.5, predict class 1; otherwise, predict class 0.
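
A short sketch of the thresholding step, assuming hypothetical trained parameters `w` and `b` picked only for illustration:

```python
import numpy as np

# Hypothetical trained parameters, chosen only for illustration.
w = np.array([0.6, 0.6])
b = -0.2

X_new = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
net_input = X_new.dot(w) + b               # approximately [-0.2, 0.4, 1.0]
labels = np.where(net_input >= 0.5, 1, 0)  # threshold the continuous output
print(labels)  # [0 0 1]
```

The 0.5 threshold is the midpoint between the 0 and 1 targets; with -1/+1 labels the corresponding threshold would be 0.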

---

## Benefits of Adaline

1. **Smooth Convergence**:
   - Adaline uses gradient descent to minimize the MSE loss, which provides smoother convergence compared to the perceptron algorithm.

2. **Continuous Optimization**:
   - The use of a continuous loss function (MSE) allows for better optimization and fine-tuning of weights.

3. **Interpretability**:
   - The learned weights indicate how strongly each feature influences the decision, provided the features are on comparable scales.

4. **Foundation for Advanced Models**:
   - Adaline serves as a foundation for more advanced models like logistic regression and multi-layer neural networks.

---

## Drawbacks of Adaline

1. **Sensitive to Learning Rate**:
   - The performance of Adaline heavily depends on the choice of the learning rate. A too-small learning rate leads to slow convergence, while a too-large learning rate can cause instability.

2. **Linear Separability**:
   - Like the perceptron, Adaline learns only a linear decision boundary, so it works well only when the classes are (at least approximately) linearly separable. Non-linear decision boundaries require feature transformations.

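3. **No Guarantee of Reaching the Optimum**:
   - For a linear model the MSE loss surface is convex, so there are no spurious local minima. However, with a fixed number of epochs and a poorly chosen learning rate, gradient descent may stop short of the minimum or diverge.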

4. **Scalability**:
   - For large datasets, Adaline can be computationally expensive due to the need to compute the gradient over the entire dataset in each iteration.

5. **Binary Classification Only**:
   - The basic Adaline implementation is limited to binary classification tasks. Extending it to multi-class classification requires modifications.
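
The learning-rate sensitivity from the first drawback can be seen in a small experiment. The sketch below uses synthetic, deliberately unscaled data and a hypothetical helper `final_loss` that repeats the full-batch update from the training section; the specific learning rates are chosen only to show the three regimes.

```python
import numpy as np

def final_loss(X, y, lr, n_iters=50):
    """Run full-batch Adaline updates and return the final MSE (hypothetical helper)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iters):
        errors = y - (X.dot(w) + b)
        w += lr * 2.0 * X.T.dot(errors) / X.shape[0]
        b += lr * 2.0 * errors.mean()
    return ((y - (X.dot(w) + b)) ** 2).mean()

rgen = np.random.RandomState(0)
X = rgen.normal(size=(100, 2)) * 10.0      # deliberately unscaled features
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # synthetic 0/1 labels

# Too small, reasonable, and too large learning rates for this data.
losses = {lr: final_loss(X, y, lr) for lr in (1e-5, 1e-3, 1e-1)}
for lr, loss in losses.items():
    print(lr, loss)
```

On this data the smallest rate barely reduces the loss in 50 epochs, the middle rate does better, and the largest rate makes the loss blow up. Standardizing the features typically widens the range of learning rates that work.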

---

## Key Differences from Perceptron

- **Loss Function**:
  - Perceptron uses a step function for classification and updates weights based on misclassifications.
  - Adaline uses a continuous loss function (MSE) and updates weights using gradient descent.

- **Convergence**:
  - Perceptron converges only if the data is linearly separable.
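  - Adaline, given a suitable learning rate, converges toward the minimum-MSE solution even if the data is not perfectly separable. That solution does not necessarily minimize the number of misclassifications.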
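
The difference in update rules shows up on a sample that is already classified correctly. The sketch below assumes hypothetical weights, a single sample with label 1, and a perceptron variant that thresholds at 0.5 to match the 0/1 labels used here.

```python
import numpy as np

w = np.array([0.2, 0.1])  # hypothetical current weights
b = 0.0
eta = 0.1

x = np.array([2.0, 2.0])  # one sample whose true label is 1
target = 1.0

net_input = x.dot(w) + b  # about 0.6, already on the correct side of 0.5

# Perceptron: the error uses the thresholded prediction (0 or 1).
perceptron_error = target - (1.0 if net_input >= 0.5 else 0.0)
# Adaline: the error uses the continuous output.
adaline_error = target - net_input

perceptron_step = eta * perceptron_error * x  # zero: no update for a correct sample
adaline_step = eta * 2.0 * adaline_error * x  # non-zero: keeps reducing the MSE

print(perceptron_step)  # [0. 0.]
print(adaline_step)     # approximately [0.16 0.16]
```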

---

In [5]:
class Adaline:

  """ADAptive LInear NEuron classifier. (ADALINE)


  Parameters
  ------------

  lr : float
      Learning rate (between 0.0 and 1.0)

  n_iters : int
      Passes over the training dataset.

  random_state : int
      Random number generator seed for random weight initialization.


  Attributes
  -----------

  w_ : 1d-array
      Weights after fitting.

  b_ : Scalar
      Bias unit after fitting.

  losses_ : list
      Mean squared error loss function values in each epoch.


  """

  def __init__(self, lr = 0.01, n_iters = 50, random_state = 69):

    self.lr = lr
    self.n_iters = n_iters
    self.random_state = random_state  # stored so fit() can seed the generator

  def fit(self, X, y):

    # Seeded generator so the initial weights are reproducible
    rgen = np.random.RandomState(self.random_state)
    self.w_ = rgen.normal(loc = 0.0, scale = 0.01, size = X.shape[1])
    self.b_ = 0.0
    self.losses_ = []

    for i in range(self.n_iters):

      net_input = self.net_input(X)
      output = self.activation(net_input)
      errors = (y - output)
      # Full-batch gradient step on the MSE loss
      self.w_ += self.lr * 2.0 * X.T.dot(errors) / X.shape[0]
      self.b_ += self.lr * 2.0 * errors.mean()
      # MSE for this epoch, computed from the pre-update errors
      loss = (errors**2).mean()
      self.losses_.append(loss)
    return self

  def net_input(self, X):
    return np.dot(X, self.w_) + self.b_


  def activation(self, X):
    return X

  def predict(self, X):
    return np.where(self.activation(self.net_input(X)) >= 0.5, 1, 0)

