### The Perceptron Learning Algorithm

The Perceptron Learning Algorithm trains a simple linear binary classifier. It iterates over the training data and adjusts the perceptron's weights and bias whenever a sample is misclassified. Below is a detailed breakdown of how it works.

#### 1. Setup and Notation

We have:
- **Input features**: $\mathbf{x}$, where each sample $\mathbf{x}_i$ has $n$ features.
- **Binary labels**: $y$ (we use -1 and 1 instead of 0 and 1 for mathematical convenience).
- **Weights**: $\mathbf{w}$, initialized to zeros or small random values.
- **Bias**: $b$, initialized to zero.

A perceptron makes predictions using a linear decision rule:

$$\hat{y} = \text{sign}(\mathbf{w} \cdot \mathbf{x} + b)$$

where:
- $\mathbf{w} \cdot \mathbf{x}$ is the dot product of the weight vector and input features.
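- $b$ is the bias term (optional but useful, since it lets the decision boundary move away from the origin).
- The sign function returns +1 if the result is positive, and -1 otherwise (so $\text{sign}(0) = -1$ under this convention).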
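
As a minimal sketch of the decision rule, the snippet below implements it in plain Python. The helper names `sign` and `predict` are chosen here for illustration only, and the weights in the usage lines are arbitrary example values.

```python
def sign(z):
    # Convention from the text: +1 for strictly positive input, -1 otherwise (including 0).
    return 1 if z > 0 else -1


def predict(w, b, x):
    # Linear decision rule: sign(w . x + b)
    activation = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return sign(activation)


print(predict([2.0, 3.0], 1.0, [1.0, 1.0]))    # activation = 6  -> 1
print(predict([2.0, 3.0], 1.0, [-1.0, -2.0]))  # activation = -7 -> -1
print(predict([0.0, 0.0], 0.0, [2.0, 3.0]))    # activation = 0  -> -1
```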

#### 2. Learning Algorithm

**Step 1: Initialize Weights and Bias**
- Set all weights $\mathbf{w}$ to zero (or small random values).
- Set bias $b$ to zero.

**Step 2: Iterate Over the Dataset**

For each training sample $(\mathbf{x}_i, y_i)$:
1. **Compute the Predicted Output**

$$\hat{y}_i = \text{sign}(\mathbf{w} \cdot \mathbf{x}_i + b)$$

- If $\hat{y}_i = y_i$, the prediction is correct, and we do nothing.
- If $\hat{y}_i \neq y_i$, the prediction is incorrect, so we update the weights.

2. **Update the Weights and Bias**

When the perceptron makes a mistake, update the weights and bias using:

$$\mathbf{w} \leftarrow \mathbf{w} + \eta y_i \mathbf{x}_i$$
$$b \leftarrow b + \eta y_i$$

- $\eta > 0$ is the learning rate, which scales the size of each update (the worked example below uses $\eta = 1$).

**Step 3: Repeat Until Convergence**

- Repeat passes over the dataset until a full pass makes no updates (all points are correctly classified) or a maximum number of iterations is reached.
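
The three steps can be sketched as a single training function. This is one possible arrangement, not a reference implementation: the name `train_perceptron`, the `max_epochs` parameter, the returned `converged` flag, and the toy data are all assumptions made for this example.

```python
def train_perceptron(samples, labels, eta=1.0, max_epochs=100):
    """Return (w, b, converged) after running the perceptron updates."""
    n_features = len(samples[0])
    w = [0.0] * n_features  # Step 1: zero initialization
    b = 0.0

    for _ in range(max_epochs):            # Step 3: repeat over passes
        mistakes = 0
        for x, y in zip(samples, labels):  # Step 2: one pass over the data
            activation = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
            y_hat = 1 if activation > 0 else -1
            if y_hat != y:
                # Mistake: w <- w + eta * y * x, b <- b + eta * y
                w = [w_j + eta * y * x_j for w_j, x_j in zip(w, x)]
                b += eta * y
                mistakes += 1
        if mistakes == 0:
            return w, b, True   # a full pass with no updates
    return w, b, False          # pass limit reached without a clean pass


# Hypothetical toy data, chosen only for illustration.
X = [[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w, b, converged = train_perceptron(X, y)
print(w, b, converged)
```

Returning a `converged` flag is a design choice that lets the caller tell a clean stop apart from hitting the pass limit.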

#### 3. Intuition Behind Weight Updates

Why does the update rule work?
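- If the perceptron misclassifies a point $(\mathbf{x}_i, y_i)$, then $\mathbf{w} \cdot \mathbf{x}_i + b$ does not have the sign of $y_i$, that is, $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \leq 0$.
- The update adds $\eta y_i \mathbf{x}_i$ to $\mathbf{w}$ and $\eta y_i$ to $b$, which increases $y_i(\mathbf{w} \cdot \mathbf{x}_i + b)$ and so pushes the score for $\mathbf{x}_i$ toward the correct sign.
- Over time, the updates shift the decision boundary until it correctly separates the data, provided the data is linearly separable.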
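
To make the effect of one update explicit, compare the quantity $y_i(\mathbf{w} \cdot \mathbf{x}_i + b)$ before and after it. Writing $\mathbf{w}'$ and $b'$ for the updated values (notation introduced here for convenience) and using $y_i^2 = 1$:

$$y_i(\mathbf{w}' \cdot \mathbf{x}_i + b') = y_i\big((\mathbf{w} + \eta y_i \mathbf{x}_i) \cdot \mathbf{x}_i + b + \eta y_i\big)$$
$$= y_i(\mathbf{w} \cdot \mathbf{x}_i + b) + \eta y_i^2 (\|\mathbf{x}_i\|^2 + 1)$$
$$= y_i(\mathbf{w} \cdot \mathbf{x}_i + b) + \eta (\|\mathbf{x}_i\|^2 + 1)$$

The added term is strictly positive, so each update moves the score for $\mathbf{x}_i$ toward the correct side. A single update is not guaranteed to fix the point, and it may change the predictions for other points.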

#### 4. Example Walkthrough

Imagine we have two features ($x_1$ and $x_2$) and a dataset with two classes (-1 and 1):

| $x_1$ | $x_2$ | $y$ |
|-------|-------|-----|
| 2     | 3     | 1   |
| -1    | -2    | -1  |
| 1     | 1     | 1   |
| -2    | -1    | -1  |

**Initial Weights**

Let's assume $\mathbf{w} = [0, 0]$, $b = 0$, and a learning rate of $\eta = 1$.

**First Sample (2, 3), $y = 1$**

Prediction:

$$\hat{y} = \text{sign}(0 \cdot 2 + 0 \cdot 3 + 0) = \text{sign}(0) = -1$$

Since $\hat{y} \neq y$, update:

$$\mathbf{w} \leftarrow \mathbf{w} + \eta y \mathbf{x} = [0, 0] + 1 \cdot 1 \cdot [2, 3] = [2, 3]$$
$$b \leftarrow b + \eta y = 0 + 1 \cdot 1 = 1$$

**Second Sample (-1, -2), $y = -1$**

$$\hat{y} = \text{sign}(2 \cdot (-1) + 3 \cdot (-2) + 1) = \text{sign}(-7) = -1$$

Since $\hat{y} = y$, no update.

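**Continue Until Convergence**

This process repeats for the remaining samples, and for further passes if needed, until all points are correctly classified. Here, with $\mathbf{w} = [2, 3]$ and $b = 1$, the remaining samples $(1, 1)$ and $(-2, -1)$ give $\mathbf{w} \cdot \mathbf{x} + b = 6$ and $-6$, so both are already classified correctly and no further updates occur.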
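
The walkthrough can be replayed with a short script. It is only a sketch: it assumes $\eta = 1$ (the value used in the updates above) and treats $\text{sign}(0)$ as -1, and the variable names `data` and `trace` are introduced here for illustration.

```python
# Dataset from the table above; eta = 1 is the value used in the worked updates.
data = [((2, 3), 1), ((-1, -2), -1), ((1, 1), 1), ((-2, -1), -1)]
w, b, eta = [0, 0], 0, 1

trace = []  # (sample, activation, prediction, updated?) for one pass over the data
for x, y in data:
    activation = w[0] * x[0] + w[1] * x[1] + b
    y_hat = 1 if activation > 0 else -1  # sign(0) is treated as -1
    updated = y_hat != y
    if updated:
        w = [w[0] + eta * y * x[0], w[1] + eta * y * x[1]]
        b = b + eta * y
    trace.append((x, activation, y_hat, updated))
    print(f"x={x}, y={y}, activation={activation}, y_hat={y_hat}, "
          f"updated={updated}, w={w}, b={b}")
```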

#### 5. Convergence and Limitations

- **Guarantees**: If the data is linearly separable, the perceptron is guaranteed to find a separating hyperplane after a finite number of updates.
- **Limitations**:
    - The perceptron cannot solve problems that are not linearly separable (e.g., the XOR problem).
    - If the data is not separable, the algorithm never converges and keeps updating indefinitely, which is why a maximum number of iterations is needed in practice.
    - The perceptron does not model probabilities; it only outputs hard decisions (-1 or 1).
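
The non-convergence on XOR can be observed with a small experiment. This is a sketch under stated assumptions: the XOR outputs are mapped to labels -1 and +1, $\eta = 1$, and the cap of 50 passes (`max_epochs`) is an arbitrary value chosen for the illustration.

```python
# XOR with labels mapped to -1/+1: no straight line separates the two classes.
xor_data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]
w, b, eta = [0.0, 0.0], 0.0, 1.0
max_epochs = 50  # arbitrary cap chosen for this illustration

mistakes_per_epoch = []
for _ in range(max_epochs):
    mistakes = 0
    for x, y in xor_data:
        activation = w[0] * x[0] + w[1] * x[1] + b
        y_hat = 1 if activation > 0 else -1
        if y_hat != y:
            w = [w[0] + eta * y * x[0], w[1] + eta * y * x[1]]
            b += eta * y
            mistakes += 1
    mistakes_per_epoch.append(mistakes)
    if mistakes == 0:
        break  # would signal convergence; never reached for XOR

converged = mistakes_per_epoch[-1] == 0
print("converged:", converged)
print("fewest mistakes in any pass:", min(mistakes_per_epoch))
```

Every pass contains at least one mistake, because a mistake-free pass would mean the current weights separate the XOR points, which no linear rule can do. The loop therefore stops only because of the cap.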

#### 6. Summary
1. Initialize weights and bias to zero.
2. For each training sample:
     - Compute prediction using the sign function.
     - If misclassified, update weights and bias.
3. Repeat until convergence or a maximum number of epochs.
