# Neural Network - A Simple Perceptron

## Question 1: What is Deep Learning? Briefly describe how it evolved and how it differs from traditional machine learning.
- Deep Learning is a subset of Machine Learning that uses artificial neural networks with multiple layers to automatically learn complex patterns from data.
- It evolved from Artificial Neural Networks (ANNs) inspired by the human brain.
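- Early models like the Perceptron (1958) laid the foundation, but limited compute and the inability of single-layer models to solve non-linear problems slowed progress. Backpropagation (popularized in 1986) later made training multi-layer networks practical.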
- With big data, GPUs, and improved algorithms, deep learning became widely successful in tasks like computer vision, speech recognition, and natural language processing.

### Difference from Traditional ML:
- Feature Engineering:
    - Traditional ML → requires manual feature extraction.
    - Deep Learning → automatically extracts features.

- Scalability:
    - Traditional ML → performance often plateaus as the dataset grows.
    - Deep Learning → usually keeps improving with more data, but needs large datasets and more compute to work well.

- Performance:
    - Traditional ML → effective on structured/tabular data.
    - Deep Learning → excels in unstructured data (images, audio, text).

## Question 2: Explain the basic architecture and functioning of a Perceptron. What are its limitations?
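- Architecture:
   - Inputs → Weighted Sum + Bias → Activation Function → Output
- Functioning:
   - Takes multiple input values $x_1, \dots, x_n$.
   - Multiplies each input by its weight $w_i$, sums the products, and adds a bias $b$: $z = \sum_i w_i x_i + b$.
   - Passes $z$ through an activation function (a step function in the classic perceptron).
   - Produces the output: 1 if $z \ge 0$, otherwise 0 (binary classification).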

- Limitations:
   - Can only solve linearly separable problems (fails on XOR).
   - Limited learning capacity: a single layer can only learn one linear decision boundary.
   - Sensitive to feature scaling.
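
The XOR limitation can be checked directly. The sketch below is an illustration rather than part of the original solutions: it applies the same perceptron update rule to AND and to XOR. The helper names `train_perceptron` and `accuracy` are hypothetical, and the learning rate of 1.0 is an assumption chosen so that all weights stay integer-valued.

```python
import numpy as np

def train_perceptron(X, y, epochs=50, lr=1.0):
    """Train a single perceptron with a step activation (threshold at 0)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if np.dot(xi, w) + b >= 0 else 0
            # Weights change only when the prediction is wrong
            w += lr * (target - pred) * xi
            b += lr * (target - pred)
    return w, b

def accuracy(X, y, w, b):
    """Fraction of samples classified correctly."""
    preds = (X @ w + b >= 0).astype(int)
    return float(np.mean(preds == y))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])  # linearly separable
y_xor = np.array([0, 1, 1, 0])  # not linearly separable

acc_and = accuracy(X, y_and, *train_perceptron(X, y_and))
acc_xor = accuracy(X, y_xor, *train_perceptron(X, y_xor))
print("AND accuracy:", acc_and)
print("XOR accuracy:", acc_xor)
```

No single line separates the XOR classes, so the XOR accuracy cannot reach 1.0 however long training runs, while AND is learned perfectly.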

## Question 3: Describe the purpose of activation function in neural networks. Compare Sigmoid, ReLU, and Tanh functions.
- Purpose: Activation functions introduce non-linearity into neural networks, enabling them to learn complex patterns. Without them, any stack of layers collapses into a single linear transformation.
- Comparison:

| Function    | Formula                                    | Range   | Pros                              | Cons                     |
| ----------- | ------------------------------------------ | ------- | --------------------------------- | ------------------------ |
| **Sigmoid** | $f(x) = \frac{1}{1+e^{-x}}$                | (0, 1)  | Smooth, probabilistic output      | Vanishing gradient       |
| **ReLU**    | $f(x) = \max(0, x)$                        | [0, ∞)  | Fast, reduces vanishing gradient  | Dead neurons             |
| **Tanh**    | $f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ | (-1, 1) | Zero-centered, stronger gradients | Still vanishing gradient |
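
The "vanishing gradient" and "dead neurons" entries in the table become concrete when the derivatives are evaluated. The sketch below is a minimal illustration with hypothetical helper names. It treats the ReLU derivative at exactly 0 as 0, which is a common convention and an assumption here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = s * (1 - s), which peaks at 0.25 when x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2, which peaks at 1.0 when x = 0
    return 1.0 - np.tanh(x) ** 2

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise (assumed to be 0 at x == 0)
    return (np.asarray(x) > 0).astype(float)

xs = np.array([-10.0, 0.0, 10.0])
for name, fn in [("sigmoid", sigmoid_grad), ("tanh", tanh_grad), ("relu", relu_grad)]:
    print(f"{name:8s} gradient at {xs.tolist()}: {fn(xs)}")
```

For large |x| the Sigmoid and Tanh gradients shrink toward zero, whereas ReLU keeps a gradient of 1 for any positive input. A ReLU unit whose inputs stay negative receives zero gradient and stops learning.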


## Question 4: What is the difference between Loss function and Cost function in neural networks? Provide examples.
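- Loss Function:
  - Measures the error for a single training example.
  - Example: squared error $(y - \hat{y})^2$ for regression, binary cross-entropy for classification.

- Cost Function:
  - Average of the per-example losses over the entire dataset (or a mini-batch), optionally plus a regularization term.
  - Example: Mean Squared Error (MSE), the average of the squared errors over all examples.

- Key Difference: Loss → single instance, Cost → whole dataset.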
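
The distinction can be made concrete with binary cross-entropy. In the sketch below, `bce_loss` is a hypothetical helper, the labels and predicted probabilities are made-up values for illustration, and the `eps` clipping is an assumption added to avoid `log(0)`.

```python
import numpy as np

def bce_loss(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy for each example separately (the loss)."""
    p = np.clip(y_prob, eps, 1 - eps)
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1.0, 0.0, 1.0, 0.0])   # hypothetical labels
y_prob = np.array([0.9, 0.2, 0.6, 0.5])   # hypothetical predicted probabilities

losses = bce_loss(y_true, y_prob)  # one value per example
cost = losses.mean()               # single value for the whole dataset

print("Per-example losses:", losses)
print("Cost (mean loss):  ", cost)
```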

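## Question 5: What is the role of optimizers in neural networks? Compare Gradient Descent, Adam, and RMSprop.
- Role: Optimizers update the network's weights and biases to minimize the cost function, using the gradients computed by backpropagation.
- Comparison: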

| Optimizer            | Key Idea                                 | Pros                    | Cons                             |
| -------------------- | ---------------------------------------- | ----------------------- | -------------------------------- |
| **Gradient Descent** | Updates weights using gradient           | Simple, stable          | Slow, sensitive to learning rate |
| **RMSprop**          | Uses moving average of squared gradients | Good for RNNs, adaptive | Learning rate still needs tuning |
| **Adam**             | Combines Momentum + RMSprop              | Fast, widely used       | Slightly more memory usage       |
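
The three update rules in the table can be sketched on a one-dimensional toy problem. This is an illustration under stated assumptions: the objective $(w - 3)^2$, the starting point, and all hyperparameter values are chosen for the example, and the function names are hypothetical.

```python
import numpy as np

def grad(w):
    # Gradient of the toy objective f(w) = (w - 3)^2, minimized at w = 3
    return 2.0 * (w - 3.0)

def gradient_descent(w, steps=300, lr=0.1):
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def rmsprop(w, steps=300, lr=0.05, rho=0.9, eps=1e-8):
    v = 0.0  # moving average of squared gradients
    for _ in range(steps):
        g = grad(w)
        v = rho * v + (1 - rho) * g ** 2
        w = w - lr * g / (np.sqrt(v) + eps)
    return w

def adam(w, steps=300, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m, v = 0.0, 0.0  # first and second moment estimates
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # momentum term
        v = beta2 * v + (1 - beta2) * g ** 2   # RMSprop-style term
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

w_gd, w_rms, w_adam = gradient_descent(0.0), rmsprop(0.0), adam(0.0)
print("Gradient Descent:", w_gd)
print("RMSprop:         ", w_rms)
print("Adam:            ", w_adam)
```

All three end near the minimum at 3. With a fixed learning rate, RMSprop takes roughly constant-size steps and therefore hovers around the minimum instead of settling exactly, which is one reason its learning rate still needs tuning.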


## Question 6: Implement a single-layer perceptron from scratch (NumPy) for AND gate.

In [None]:
# Solution 6

import numpy as np

# AND gate dataset
X = np.array([[0,0], [0,1], [1,0], [1,1]])
y = np.array([0, 0, 0, 1])

# Initialize weights & bias
w = np.zeros(X.shape[1])
b = 0.0
lr = 0.1

# Training perceptron
for epoch in range(20):
    for xi, target in zip(X, y):
        # Step activation on the weighted sum
        y_pred = 1 if np.dot(xi, w) + b >= 0 else 0
        # Perceptron rule: weights change only when the prediction is wrong
        error = target - y_pred
        w += lr * error * xi
        b += lr * error

print("Weights:", w, "Bias:", b)

# Testing
for xi in X:
    print(xi, "=>", 1 if np.dot(xi, w) + b >= 0 else 0)


## Question 7: Implement and visualize Sigmoid, ReLU, and Tanh activation functions.

In [None]:
# Solution 7

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 100)

sigmoid = 1 / (1 + np.exp(-x))
relu = np.maximum(0, x)
tanh = np.tanh(x)

plt.figure(figsize=(10,6))
plt.plot(x, sigmoid, label="Sigmoid")
plt.plot(x, relu, label="ReLU")
plt.plot(x, tanh, label="Tanh")
plt.legend()
plt.title("Activation Functions")
plt.grid(True)
plt.show()


## Question 8: Use Keras to build a multilayer NN on MNIST dataset.

In [None]:
# Solution 8

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical

# Load data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train/255.0, x_test/255.0
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

# Build model
model = Sequential([
    Flatten(input_shape=(28,28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=5, batch_size=32, validation_data=(x_test, y_test))

print("Training Accuracy:", history.history['accuracy'][-1])

## Question 9: Visualize loss & accuracy curves for Fashion MNIST.

In [None]:
# Solution 9

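import matplotlib.pyplot as plt
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical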

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train, x_test = x_train/255.0, x_test/255.0
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

model = Sequential([
    Flatten(input_shape=(28,28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=10, batch_size=64, validation_data=(x_test, y_test))

# Plot curves
plt.plot(history.history['loss'], label="Training Loss")
plt.plot(history.history['val_loss'], label="Validation Loss")
plt.plot(history.history['accuracy'], label="Training Accuracy")
plt.plot(history.history['val_accuracy'], label="Validation Accuracy")
plt.legend()
plt.title("Loss & Accuracy Curves")
plt.show()


## Question 10: Fraud Detection Workflow (Banking Project).

- Model Design:
  - Use a Multilayer Neural Network (MLP) with input layer → hidden layers (ReLU) → output (Sigmoid for fraud/legit).

- Activation & Loss:
  - Hidden Layers → ReLU
  - Output Layer → Sigmoid
  - Loss Function → Binary Cross-Entropy (standard choice for binary classification).

- Training & Evaluation:
  - Handle class imbalance with SMOTE or class weights.
  - Evaluate with Precision, Recall, F1, and AUC rather than accuracy alone, which is misleading when fraud cases are rare.

- Optimizer & Overfitting Prevention:
  - Use Adam optimizer.
  - Apply Dropout and EarlyStopping.
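
To show what the class weights mentioned under Training & Evaluation do, the sketch below computes them by hand with NumPy. It follows the "balanced" heuristic, `n_samples / (n_classes * class_count)`, which is the formula scikit-learn's `compute_class_weight` uses for `class_weight='balanced'`. The 95/5 label split is a hypothetical example, not data from the project.

```python
import numpy as np

# Hypothetical labels: 95 legitimate transactions (0) and 5 fraudulent ones (1)
y = np.array([0] * 95 + [1] * 5)

counts = np.bincount(y)                    # samples per class: [95, 5]
weights = len(y) / (len(counts) * counts)  # n_samples / (n_classes * count)
class_weight = {cls: float(w) for cls, w in enumerate(weights)}

print("Class counts: ", counts)
print("Class weights:", class_weight)
```

The rare fraud class gets a much larger weight, so each misclassified fraud example contributes more to the cost than a misclassified legitimate one.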

In [None]:
# Solution 10

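import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential

# X_train, y_train, X_val, y_val are assumed to be prepared beforehand
# (scaled features, binary labels: 0 = legit, 1 = fraud).

# Weight each class inversely to its frequency in the training labels
class_weights = compute_class_weight(class_weight='balanced',
                                     classes=np.unique(y_train),
                                     y=y_train)
print("Class Weights:", class_weights)

model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    Dropout(0.3),
    Dense(32, activation='relu'),
    Dropout(0.3),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy','AUC'])

# Stop when validation loss stops improving and keep the best weights
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

model.fit(X_train, y_train, epochs=20, batch_size=128,
          validation_data=(X_val, y_val),
          class_weight={0: class_weights[0], 1: class_weights[1]},
          callbacks=[early_stop])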
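
After training, Precision, Recall, and F1 from the workflow can be computed from predicted probabilities. The sketch below uses only NumPy. `binary_metrics` is a hypothetical helper, the labels and scores are made-up values, and the 0.5 threshold is an assumption that would normally be tuned for the fraud use case.

```python
import numpy as np

def binary_metrics(y_true, y_prob, threshold=0.5):
    """Precision, recall and F1 for the positive (fraud) class."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # fraud caught
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false alarms
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # fraud missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical validation labels and model scores; with a trained model,
# the scores could come from something like model.predict(X_val).ravel()
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_prob = [0.1, 0.2, 0.05, 0.6, 0.3, 0.15, 0.9, 0.4, 0.8, 0.7]

metrics = binary_metrics(y_true, y_prob)
print(metrics)
```

In this toy case 7 of 10 predictions are correct, yet only half of the flagged transactions are actually fraud and one of the three fraud cases is missed, which accuracy alone would not reveal.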