## Steps to implement a Perceptron

1. Define a Perceptron model (a single-layer neural network).
2. Use forward propagation to compute predictions.
3. Compute loss (Binary Cross-Entropy).
4. Use gradient descent to update weights.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# 1. Define the model
class Perceptron(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.fc = nn.Linear(input_size, 1)  # Single neuron with weights and bias (computes x @ W.T + b)

    def forward(self, x):
        # Sigmoid squashes the weighted sum into (0, 1), read as P(y = 1 | x)
        return torch.sigmoid(self.fc(x))

# 2. Create dataset (truth table of logical AND)
X = torch.tensor(
    [
        [0, 0],
        [0, 1],
        [1, 0],
        [1, 1]
    ],
    dtype=torch.float32
)
y = torch.tensor(
    [[0], [0], [0], [1]],
    dtype=torch.float32
)

# 3. Initialize model, loss function, and optimizer
model = Perceptron(input_size=2)
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# 4. Training loop (full-batch gradient descent: all 4 samples in every step)
for epoch in range(100):
    optimizer.zero_grad()  # Reset gradients of the model's parameters to 0
    outputs = model(X)  # Forward pass
    loss = criterion(outputs, y)  # Compute loss
    loss.backward()  # Backward pass: compute gradients
    optimizer.step()  # Update weights using gradient descent

    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {loss.item():.4f}')

# 5. Test the perceptron
with torch.no_grad():
    preds = model(X).round()  # Threshold the probabilities at 0.5
    print("\nPredictions:\n", preds)
```


```
Epoch 0, Loss: 0.7898
Epoch 10, Loss: 0.6798
Epoch 20, Loss: 0.6188
Epoch 30, Loss: 0.5808
Epoch 40, Loss: 0.5538
Epoch 50, Loss: 0.5323
Epoch 60, Loss: 0.5139
Epoch 70, Loss: 0.4975
Epoch 80, Loss: 0.4824
Epoch 90, Loss: 0.4685

Predictions:
 tensor([[0.],
        [0.],
        [0.],
        [0.]])
```


## **Interview Questions:**  

---

## **📌 Basic Questions (Conceptual)**

1. **What is a Perceptron, and how does it work?**  
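   - **Answer:** A perceptron is the simplest type of artificial neural network: a single neuron (or a single layer of neurons) with no hidden layers. It computes a weighted sum of the inputs plus a bias, `z = w · x + b`, passes `z` through an activation function, and outputs a binary class. The classic (Rosenblatt) perceptron uses a step function; the implementation above uses a sigmoid instead, which makes it equivalent to logistic regression.  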

2. **Why do we use the sigmoid activation function in the given Perceptron implementation?**  
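   - **Answer:** Sigmoid maps any real number into the range **(0, 1)**, so the output can be read as the probability of class 1, which is exactly what `BCELoss` expects. It is also differentiable everywhere with a nonzero gradient, so gradient-based optimizers (e.g., SGD) can update the weights. A step function's gradient is zero almost everywhere, so backpropagation would provide no learning signal.  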

3. **Can a single-layer Perceptron solve non-linearly separable problems?**  
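   - **Answer:** No. A **single-layer Perceptron** can only learn **linearly separable** functions (e.g., AND, OR), because its decision boundary `w · x + b = 0` is a straight line (a hyperplane in higher dimensions). It **cannot** learn XOR, since no single line separates XOR's positive points (0, 1) and (1, 0) from its negative points (0, 0) and (1, 1).  
   Note: Logistic regression has the same limitation: like the perceptron, it can only separate classes that are linearly separable.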

4. **Why do we use Binary Cross-Entropy (BCELoss) instead of Mean Squared Error (MSE) for classification?**  
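   - **Answer:** With a sigmoid output, minimizing BCE is equivalent to maximizing the log-likelihood of a Bernoulli model, and for a single-layer model the loss is convex in the weights. Its gradient with respect to the pre-activation `z` is simply `p - y`, so a confidently wrong prediction still produces a large update. MSE combined with a sigmoid multiplies the error by the sigmoid's derivative `p(1 - p)`, which vanishes when the sigmoid saturates, so learning stalls exactly when the model is most wrong. MSE is better suited to regression.  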

5. **What is the role of `optimizer.zero_grad()` in the training loop?**  
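   - **Answer:** It resets the **gradients** before backpropagation. PyTorch *accumulates* gradients in each parameter's `.grad` on every `backward()` call, so without this reset each update would use the sum of the current gradient and all previous ones.  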
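
Question 1 mentions the step-function variant. As a minimal sketch in plain Python (no PyTorch), the classic perceptron learning rule below trains on the same AND truth table as the notebook. The helper names `step`, `predict`, and `train_perceptron`, the zero initialization, and `lr=1.0` are illustrative choices, not part of the code above. Because the rule only updates on misclassified samples, it stops as soon as a full epoch passes without errors.

```python
def step(z):
    # Heaviside step activation: output 1 only when the weighted sum is positive
    return 1 if z > 0 else 0

def predict(w, b, x):
    # Weighted sum of the inputs plus bias, passed through the step function
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_perceptron(X, y, lr=1.0, max_epochs=100):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(X, y):
            error = target - predict(w, b, x)  # -1, 0, or +1
            if error != 0:
                # Perceptron rule: nudge the weights toward the correct side
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
                errors += 1
        if errors == 0:  # every sample is classified correctly
            break
    return w, b

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]  # logical AND
w, b = train_perceptron(X, y)
print(w, b)                          # [2.0, 1.0] -2.0
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1]
```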
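
To make question 2's gradient argument concrete, here is a small plain-Python sketch comparing the derivative of the sigmoid with that of a step function. The function names are illustrative helpers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)); positive for every finite z
    s = sigmoid(z)
    return s * (1.0 - s)

def step_grad(z):
    # The step function is flat on both sides of 0, so its derivative is 0
    # wherever it is defined (it is undefined at z = 0 itself)
    return 0.0

for z in [-4.0, -0.5, 0.5, 4.0]:
    print(f"z={z:+.1f}  sigmoid'={sigmoid_grad(z):.4f}  step'={step_grad(z)}")

print(sigmoid(0.0), sigmoid_grad(0.0))  # 0.5 0.25 (the sigmoid's steepest point)
```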
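
The XOR claim in question 3 can be checked with a short derivation. Assume, as in the step-function perceptron, that the output is 1 exactly when `w1*x1 + w2*x2 + b > 0`. The four XOR rows then require:

$$
\begin{aligned}
(0,0) \mapsto 0 &: \quad b \le 0 \\
(0,1) \mapsto 1 &: \quad w_2 + b > 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b > 0 \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b \le 0
\end{aligned}
$$

Adding the second and third inequalities gives $w_1 + w_2 + 2b > 0$, and combining that with the first:

$$
w_1 + w_2 + b \;>\; -b \;\ge\; 0
$$

This contradicts the fourth inequality, so no weights and bias satisfy all four rows and no line separates XOR's classes. The same argument goes through if the output is 1 when the sum is $\ge 0$ instead.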
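
For question 4, the sketch below compares the gradient each loss sends back through the sigmoid for one confidently wrong prediction. It uses plain Python, defines MSE as `(p - y)**2` for a single sample (no batch averaging), and the helper names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_grad_wrt_logit(z, y):
    # d/dz of -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(z) simplifies to p - y
    return sigmoid(z) - y

def mse_grad_wrt_logit(z, y):
    # d/dz of (p - y)**2 with p = sigmoid(z) is 2*(p - y) * p*(1 - p)
    p = sigmoid(z)
    return 2.0 * (p - y) * p * (1.0 - p)

# A confidently wrong prediction: the true label is 1, but the logit is very negative
z, y = -6.0, 1.0
print(f"BCE gradient: {bce_grad_wrt_logit(z, y):.4f}")  # -0.9975
print(f"MSE gradient: {mse_grad_wrt_logit(z, y):.4f}")  # -0.0049
```

The BCE gradient stays close to its maximum magnitude of 1, while the MSE gradient is roughly 200 times smaller because the saturated sigmoid's derivative `p(1 - p)` is nearly zero.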
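
A minimal PyTorch sketch of the accumulation behind question 5, using a one-parameter toy loss `w * x` whose gradient with respect to `w` is always `x = 3`. The optimizer exists only so that `zero_grad()` can be called; `optimizer.step()` is never called, so `w` stays at 1.0.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
x = torch.tensor([3.0])
optimizer = torch.optim.SGD([w], lr=0.1)

grads = []
for _ in range(2):
    loss = (w * x).sum()  # d(loss)/dw = x = 3
    loss.backward()       # ADDS 3 to w.grad instead of overwriting it
    grads.append(w.grad.item())
print(grads)  # [3.0, 6.0]: the second call accumulated on top of the first

optimizer.zero_grad()  # clears w.grad (recent PyTorch versions set it to None)
loss = (w * x).sum()
loss.backward()
print(w.grad.item())  # 3.0: only the current gradient
```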

---

## **📌 Advanced Questions (Optimization, Training, and Modifications)**
6. **Why do we use `torch.sigmoid()` in the forward function instead of applying it inside the loss function?**  
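   - **Answer:** `BCELoss()` expects probabilities in [0, 1], so the sigmoid has to be applied in the model before the loss. The alternative is to return raw logits from `forward()` and use `BCEWithLogitsLoss()`, which applies the sigmoid internally and combines it with the log in a single, more numerically stable operation; the PyTorch documentation recommends it over a separate sigmoid followed by `BCELoss`. With that setup, apply `torch.sigmoid()` only at inference time to get probabilities.  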

7. **What happens if we increase or decrease the learning rate (`lr=0.1`)?**  
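   - **Answer:** The exact thresholds depend on the problem and the scale of the loss; the values below are rough guides.  
     - **Too high (e.g., `lr > 1`)** → Updates overshoot the minimum, so the loss oscillates or diverges instead of converging.  
     - **Too low (e.g., `lr < 0.01`)** → Each step is tiny, so training needs many more epochs to converge. Even at `lr=0.1`, the run above is still improving after 100 epochs: the loss is still falling and `[1, 1]` is still predicted as `0`.  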

8. **How would you modify this Perceptron to handle multi-class classification?**  
   - **Answer:**  
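     - Replace the **output layer** (`nn.Linear(input_size, 1)`) with `nn.Linear(input_size, num_classes)`.  
     - Return raw logits from `forward()` and use **`CrossEntropyLoss`** instead of `BCELoss`. `CrossEntropyLoss` applies log-softmax internally and expects integer class indices as targets, so do not add a softmax layer before it.  
     - Apply **softmax** (or just `argmax`) only at inference time to get class probabilities or predicted labels.  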

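9. **Why do we call `.round()` on the model's outputs during testing?**  
   - **Answer:** The model outputs probabilities (values between 0 and 1). `round()` converts them into hard labels (`0` or `1`), which is equivalent to thresholding at 0.5; `torch.round` rounds halves to even, so an output of exactly 0.5 becomes `0`. For a different decision threshold, compare against it explicitly, e.g., `(probs > 0.7).float()`.  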

10. **How can we make this model work with GPUs?**  
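    - **Answer:** Move the model and all input and target tensors to the same CUDA device. `model.to("cuda")` moves the parameters in place, but `Tensor.to()` returns a new tensor, so it must be reassigned: `X = X.to("cuda")` and `y = y.to("cuda")`. Checking `torch.cuda.is_available()` first lets the same code fall back to the CPU.  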
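
For question 6, the sketch below checks that `BCEWithLogitsLoss` on raw logits gives the same value as a sigmoid followed by `BCELoss`, and shows a hypothetical `PerceptronLogits` variant of the model that returns logits. The logits and targets are made-up values.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[-2.0], [0.5], [3.0]])  # made-up raw scores (no sigmoid applied)
targets = torch.tensor([[0.0], [1.0], [1.0]])

loss_sigmoid_then_bce = nn.BCELoss()(torch.sigmoid(logits), targets)
loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)  # sigmoid is applied inside the loss
print(torch.allclose(loss_sigmoid_then_bce, loss_with_logits))  # True

class PerceptronLogits(nn.Module):
    # Hypothetical variant of the Perceptron above that returns raw logits
    def __init__(self, input_size):
        super().__init__()
        self.fc = nn.Linear(input_size, 1)

    def forward(self, x):
        return self.fc(x)  # no sigmoid here; pair with nn.BCEWithLogitsLoss

model = PerceptronLogits(input_size=2)
X = torch.tensor([[0.0, 0.0], [1.0, 1.0]])
with torch.no_grad():
    probs = torch.sigmoid(model(X))  # apply the sigmoid only when probabilities are needed
print(probs.shape)  # torch.Size([2, 1])
```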
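
The effect described in question 7 can be seen on a toy problem. This plain-Python sketch runs gradient descent on `f(w) = w**2` (not the perceptron's loss), where each update multiplies `w` by `(1 - 2*lr)`; the three learning rates are illustrative and show slow convergence, fast convergence, and divergence.

```python
def gradient_descent(lr, w0=1.0, steps=50):
    # Minimize f(w) = w**2, whose gradient is 2*w; the minimum is at w = 0
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w
    return w

for lr in [0.001, 0.1, 1.1]:
    print(f"lr={lr}: w after 50 steps = {gradient_descent(lr):.6g}")
# lr=0.001 barely moves (about 0.905), lr=0.1 is near 0, lr=1.1 blows up (about 9100)
```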
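
A sketch of the multi-class changes from question 8, assuming 3 classes and made-up integer labels for the four inputs; the class name `MultiClassPerceptron` is hypothetical. It only checks shapes and the loss computation, not training.

```python
import torch
import torch.nn as nn

class MultiClassPerceptron(nn.Module):
    # Hypothetical multi-class version: one output (logit) per class, no activation
    def __init__(self, input_size, num_classes):
        super().__init__()
        self.fc = nn.Linear(input_size, num_classes)

    def forward(self, x):
        return self.fc(x)  # raw logits; CrossEntropyLoss applies log-softmax itself

torch.manual_seed(0)
model = MultiClassPerceptron(input_size=2, num_classes=3)
criterion = nn.CrossEntropyLoss()

X = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = torch.tensor([0, 1, 2, 1])  # made-up class indices (int64), not one-hot vectors

logits = model(X)            # shape (4, 3)
loss = criterion(logits, y)  # scalar loss, ready for loss.backward()

with torch.no_grad():
    probs = torch.softmax(logits, dim=1)  # softmax only for reading off probabilities
    preds = probs.argmax(dim=1)           # predicted class per sample
print(loss.item(), preds.tolist())
```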
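
A small sketch for question 9 with made-up probabilities, comparing `round()` with an explicit threshold; the 0.7 threshold is a hypothetical example of a stricter cut-off.

```python
import torch

probs = torch.tensor([[0.1], [0.5], [0.51], [0.9]])

rounded = probs.round()              # round half to even: 0.5 -> 0.0
thresholded = (probs > 0.5).float()  # explicit 0.5 threshold gives the same labels
strict = (probs > 0.7).float()       # hypothetical stricter threshold for class 1

print(rounded.squeeze(1).tolist())      # [0.0, 0.0, 1.0, 1.0]
print(thresholded.squeeze(1).tolist())  # [0.0, 0.0, 1.0, 1.0]
print(strict.squeeze(1).tolist())       # [0.0, 0.0, 0.0, 1.0]
```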
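
A device-agnostic sketch for question 10. It rebuilds an equivalent sigmoid perceptron with `nn.Sequential` so the block is self-contained, runs one training step, and falls back to the CPU on a machine without a CUDA GPU.

```python
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Equivalent to the Perceptron above, written with nn.Sequential for brevity
model = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid()).to(device)  # Module.to() moves parameters in place
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # created after the model is moved
criterion = nn.BCELoss()

X = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
y = torch.tensor([[0], [0], [0], [1]], dtype=torch.float32)
X, y = X.to(device), y.to(device)  # Tensor.to() returns a new tensor, so reassign

optimizer.zero_grad()
loss = criterion(model(X), y)
loss.backward()
optimizer.step()
print(f"loss on {device}: {loss.item():.4f}")
```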

---

## **📌 Code Debugging and Practical Questions**
11. **If the loss does not decrease, what are possible reasons?**  
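    - **Answer:**  
      - **Learning rate is too high or too low**.  
      - **Data is not linearly separable** (e.g., the XOR problem).  
      - **Training data is too small or imbalanced**.  
      - **Weight initialization is poor**. This matters mostly for deeper networks, where Xavier/He initialization is the usual remedy; a single-layer model trained with BCE has a convex loss and is far less sensitive to its starting weights.  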

12. **How can we visualize decision boundaries for this Perceptron?**  
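    - **Answer:** Create a mesh grid over the input space, compute the model's predictions for every grid point, and plot them with `matplotlib` (e.g., `plt.contourf`). Because the model is linear before the sigmoid, the boundary is the straight line `w1*x1 + w2*x2 + b = 0`, where the predicted probability is exactly 0.5, so it can also be drawn directly from the learned weights.  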

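13. **What is the role of `requires_grad=True` in PyTorch tensors?**  
    - **Answer:** It tells **autograd (automatic differentiation)** to record operations on the tensor so that `backward()` can compute gradients with respect to it. Parameters created by layers such as `nn.Linear` are `nn.Parameter` objects with `requires_grad=True` by default, which is why the model's weights receive gradients without setting it explicitly; input data such as `X` does not need it.  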
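
One cause listed in question 11, non-separable data, can be reproduced by training the single-layer model on XOR. In this sketch (logits with `BCEWithLogitsLoss`, `lr=0.5`, 2000 epochs, all illustrative choices), the loss stalls near ln(2) ≈ 0.6931: for XOR, the best a linear model can do is predict 0.5 for every input.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
y_xor = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32)

model = nn.Linear(2, 1)  # single layer that outputs logits
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

for epoch in range(2000):
    optimizer.zero_grad()
    loss = criterion(model(X), y_xor)
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.4f}, ln(2) = {math.log(2):.4f}")
```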
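
A sketch of the approach in question 12, assuming hand-set hypothetical weights `w = (2, 2)` and `b = -3` (which separate AND) instead of trained ones. It computes the mesh-grid probabilities with PyTorch and defines a `plot_boundary()` helper that draws them with `matplotlib`, imported only inside the helper.

```python
import torch
import torch.nn as nn

# Hypothetical hand-set weights; a trained model's weights would be used instead
model = nn.Linear(2, 1)
with torch.no_grad():
    model.weight.copy_(torch.tensor([[2.0, 2.0]]))
    model.bias.fill_(-3.0)

def boundary_x2(x1):
    # On the boundary sigmoid(w1*x1 + w2*x2 + b) = 0.5, i.e. w1*x1 + w2*x2 + b = 0
    w1, w2 = model.weight[0].tolist()
    b = model.bias.item()
    return -(w1 * x1 + b) / w2

# Mesh grid over the input space and the model's probability at every grid point
xs = torch.linspace(-0.5, 1.5, 101)
xx, yy = torch.meshgrid(xs, xs, indexing="xy")
grid = torch.stack([xx.reshape(-1), yy.reshape(-1)], dim=1)
with torch.no_grad():
    probs = torch.sigmoid(model(grid)).reshape(xx.shape)

def plot_boundary():
    import matplotlib.pyplot as plt  # imported here so the rest runs without matplotlib
    plt.contourf(xx.numpy(), yy.numpy(), probs.numpy(), levels=[0.0, 0.5, 1.0], alpha=0.3)
    plt.plot([-0.5, 1.5], [boundary_x2(-0.5), boundary_x2(1.5)], "k--")
    plt.scatter([0, 0, 1, 1], [0, 1, 0, 1], c=[0, 0, 0, 1])
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.show()
```

Calling `plot_boundary()` shades the two predicted regions and draws the dashed line `x2 = 1.5 - x1`; with a trained model, the same code applies once training has finished.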
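
A short sketch for question 13 showing which tensors track gradients by default.

```python
import torch
import torch.nn as nn

x = torch.tensor(2.0, requires_grad=True)  # autograd records operations on x
f = x ** 3
f.backward()                               # computes df/dx = 3 * x**2
print(x.grad)                              # tensor(12.)

layer = nn.Linear(2, 1)
print(all(p.requires_grad for p in layer.parameters()))  # True: parameters track gradients by default

data = torch.tensor([[0.0, 1.0]])
print(data.requires_grad)                  # False: plain tensors default to False
out = layer(data)
print(out.requires_grad)                   # True: the output depends on the parameters
```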

---

## **📌 System Design & Performance**
14. **How would you optimize this perceptron for large datasets?**  
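    - **Answer:**  
      - Use **mini-batch training** (e.g., with `torch.utils.data.DataLoader`) instead of full-batch, so each update processes only a subset of the data.  
      - Use an adaptive optimizer such as **Adam**, which often converges faster than plain SGD and needs less learning-rate tuning.  
      - Implement **early stopping**: monitor a validation loss and stop when it no longer improves, which saves compute and helps prevent overfitting.  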

15. **What are the limitations of a Perceptron, and how can we overcome them?**  
    - **Answer:**  
      - **Cannot learn non-linearly separable problems** → Use a **multi-layer perceptron (MLP)** with hidden layers and non-linear activations.  
      - **Slow convergence** → Standardize the input features, use **adaptive optimizers (Adam, RMSprop)**, and, in deeper networks, add **batch normalization**.  
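
A sketch combining the three suggestions in question 14. The data is hypothetical (1000 random 2-D points labeled by the sign of `x1 + x2`), and the batch size, learning rate, patience, and improvement tolerance are illustrative values, not recommendations.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical synthetic data: label is 1 when x1 + x2 > 0 (linearly separable)
X = torch.randn(1000, 2)
y = (X.sum(dim=1, keepdim=True) > 0).float()
X_train, y_train, X_val, y_val = X[:800], y[:800], X[800:], y[800:]

train_loader = DataLoader(TensorDataset(X_train, y_train), batch_size=64, shuffle=True)

model = nn.Linear(2, 1)  # outputs logits; the sigmoid lives inside the loss
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

best_val, patience, bad_epochs = float("inf"), 5, 0
best_state = None
for epoch in range(100):
    model.train()
    for xb, yb in train_loader:  # one update per mini-batch
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(X_val), y_val).item()
    if val_loss < best_val - 1e-4:  # meaningful improvement: remember these weights
        best_val, bad_epochs = val_loss, 0
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # early stopping
            break

model.load_state_dict(best_state)  # restore the best weights seen on validation
with torch.no_grad():
    val_acc = ((model(X_val) > 0).float() == y_val).float().mean().item()
print(f"stopped after epoch {epoch}, best val loss {best_val:.4f}, val accuracy {val_acc:.3f}")
```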
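
For question 15, a minimal sketch of an MLP that learns XOR, which the single-layer model cannot. The hidden size (8), `Tanh` activation, Adam learning rate, and epoch count are illustrative choices; with a different seed, training may need more epochs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
y_xor = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32)

# One hidden layer with a non-linear activation is enough to represent XOR
mlp = nn.Sequential(
    nn.Linear(2, 8),
    nn.Tanh(),
    nn.Linear(8, 1),  # logits
)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=0.05)

for epoch in range(1000):
    optimizer.zero_grad()
    loss = criterion(mlp(X), y_xor)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    preds = (torch.sigmoid(mlp(X)) > 0.5).float()
print(preds.squeeze(1).tolist())
```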

---
