# 📘 Neural Networks: Complete Documentation for NN with PyTorch Project

## 1. What is a Neural Network?

A **Neural Network (NN)** is a computational system inspired by the human brain. Its goal is to learn patterns from data by adjusting parameters called **weights**.

It performs a mapping:

\[
\text{Input} \rightarrow f(\text{weights}) \rightarrow \text{Output}
\]

NNs are powerful because they learn **non-linear, high-dimensional** relationships without manual feature engineering.

## 2. Why Do We Use Neural Networks?

Neural networks can automatically learn:

- Complex relationships  
- Non-linear decision boundaries  
- High-dimensional data (images, audio, text)  
- Features that would be impossible to design manually  

This makes them the backbone of modern AI.

## 3. Building Blocks of a Neural Network

### 3.1 Neuron (Perceptron)

A single neuron computes:

\[
z = w_1x_1 + w_2x_2 + \dots + w_nx_n + b
\]

Then applies an activation:

\[
a = \sigma(z)
\]

Where:
- \(x_i\) = inputs, \(w_i\) = weights  
- \(z\) = weighted sum  
- \(b\) = bias  
- \(\sigma\) = activation function
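A minimal sketch of one neuron in plain Python, assuming three inputs and a sigmoid activation. The weights, bias, and input values are arbitrary illustrative numbers, not values from the project.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid: squashes any real z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x: list[float], w: list[float], b: float) -> float:
    """Weighted sum z = w1*x1 + ... + wn*xn + b, then activation a = sigmoid(z)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


# Illustrative values only
x = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 0.1]
b = 0.2
a = neuron(x, w, b)  # z = 0.5 - 0.5 + 0.3 + 0.2 = 0.5, so a = sigmoid(0.5)
```

In PyTorch, `nn.Linear` performs the weighted sum for many neurons at once, and the activation is applied as a separate layer or function.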

### 3.2 Layers

A neural network stacks neurons into **layers**:

- **Input Layer**: receives data  
- **Hidden Layers**: extract features  
- **Output Layer**: final prediction  

Common layer types:
- `nn.Linear` (Fully Connected)  
- `nn.Conv2d` (Convolution)  
- `nn.LSTM` / `nn.GRU`  
- Normalization layers
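To make the layer types concrete, here is a small sketch that passes random tensors through some of them and records the output shapes. The batch size and feature sizes are illustrative assumptions, chosen to match MNIST-sized inputs.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Fully connected: maps 784 features to 256 features for each sample
fc = nn.Linear(in_features=784, out_features=256)
fc_out = fc(torch.randn(32, 784))            # shape (32, 256)

# 2D convolution: 1 input channel -> 16 feature maps, 3x3 kernels;
# padding=1 keeps height and width unchanged
conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
conv_out = conv(torch.randn(32, 1, 28, 28))  # shape (32, 16, 28, 28)

# Normalization: batch norm over the 256 features produced by fc
bn = nn.BatchNorm1d(num_features=256)
bn_out = bn(fc_out)                          # shape (32, 256)
```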

### 3.3 Activation Functions

Activation functions introduce **non-linearity**.

| Activation | Description | Usage |
|-----------|-------------|--------|
| **ReLU** | `max(0, x)` | Most common for hidden layers |
| **Sigmoid** | Output in (0, 1) | Binary classification |
| **Tanh** | Output in (-1, 1) | RNN/LSTM hidden states |
| **Softmax** | Turns scores into probabilities that sum to 1 | Final layer for multi-class |
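A quick sketch applying each activation from the table to the same small tensor, so the output ranges can be compared directly. The input values are arbitrary.

```python
import torch

x = torch.tensor([-2.0, 0.0, 3.0])

relu_out = torch.relu(x)               # negatives clipped to 0 -> [0., 0., 3.]
sigmoid_out = torch.sigmoid(x)         # every value in (0, 1); sigmoid(0) = 0.5
tanh_out = torch.tanh(x)               # every value in (-1, 1); tanh(0) = 0
softmax_out = torch.softmax(x, dim=0)  # non-negative values that sum to 1
```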

## 4. How Neural Networks Learn

Learning happens in **4 steps**:

### 4.1 Forward Pass
Data flows through the model to produce a prediction \(\hat{y}\).

### 4.2 Compute Loss
The loss measures how far the prediction \(\hat{y}\) is from the true target \(y\).

### 4.3 Backpropagation
Compute gradients of the loss w.r.t. weights:
\[
\frac{\partial L}{\partial w}
\]
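A minimal sketch of backpropagation with PyTorch autograd on a toy problem with one weight, assuming the squared-error loss \(L = (y - wx)^2\) with \(x = 2\), \(y = 10\), and \(w = 3\). The values are chosen so the gradient can be checked by hand.

```python
import torch

x = torch.tensor(2.0)
y = torch.tensor(10.0)
w = torch.tensor(3.0, requires_grad=True)  # the parameter we differentiate with respect to

y_hat = w * x              # forward pass: 6.0
loss = (y - y_hat) ** 2    # squared error: 16.0
loss.backward()            # backpropagation stores dL/dw in w.grad

# By hand: dL/dw = -2 * (y - w*x) * x = -2 * 4 * 2 = -16
grad = w.grad.item()
```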

### 4.4 Optimizer Updates Weights
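The optimizer uses these gradients to update the weights. The simplest rule is plain gradient descent, where \(\eta\) is the learning rate:
\[
w \leftarrow w - \eta \cdot \frac{\partial L}{\partial w}
\]
Adaptive optimizers such as **Adam** refine this step using running averages of past gradients and their squares.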

## 5. Loss Functions

### Cross Entropy Loss (Classification)
\[
CE = -\sum y\log(\hat{y})
\]
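A short sketch, with arbitrary example logits, showing that `nn.CrossEntropyLoss` (which takes raw logits and a class index) gives the same value as the formula above when \(y\) is a one-hot vector and \(\hat{y}\) is the softmax of the logits.

```python
import torch
from torch import nn

logits = torch.tensor([[2.0, 1.0, 0.1]])  # raw scores for 3 classes, batch of 1
target = torch.tensor([0])                # index of the true class

# Built-in loss: applies log-softmax internally, then negative log-likelihood
ce_builtin = nn.CrossEntropyLoss()(logits, target)

# Manual: CE = -sum(y * log(y_hat)) with one-hot y and softmax probabilities y_hat
y_hat = torch.softmax(logits, dim=1)
y_onehot = nn.functional.one_hot(target, num_classes=3).float()
ce_manual = -(y_onehot * torch.log(y_hat)).sum(dim=1).mean()
```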

### MSE (Regression)
\[
MSE = \frac{1}{n}\sum_{i=1}^{n} (y_i-\hat{y}_i)^2
\]
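A minimal check, on made-up numbers, that `nn.MSELoss` with its default `reduction="mean"` matches the formula above.

```python
import torch
from torch import nn

y = torch.tensor([3.0, -0.5, 2.0, 7.0])     # targets
y_hat = torch.tensor([2.5, 0.0, 2.0, 8.0])  # predictions

mse_builtin = nn.MSELoss()(y_hat, y)                # mean over all n elements
mse_manual = ((y - y_hat) ** 2).sum() / y.numel()   # (0.25 + 0.25 + 0 + 1) / 4
```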

## 6. Optimizers

| Optimizer | Notes |
|----------|-------|
| **SGD** | Simple; often converges more slowly without momentum and learning-rate tuning |
| **Adam** | Most common; adaptive |
| **RMSprop** | Good for RNNs |
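A sketch of how the three optimizers from the table are constructed in `torch.optim`, each used to fit the same small linear-regression problem. The `make_optimizer` helper, the synthetic data, the learning rate of 0.01, and SGD's momentum of 0.9 are all illustrative assumptions, not tuned settings.

```python
import torch
from torch import nn


def make_optimizer(name: str, params, lr: float) -> torch.optim.Optimizer:
    """Hypothetical helper: build one of the optimizers from the table by name."""
    if name == "SGD":
        # momentum=0.9 is an assumption here; plain SGD defaults to momentum=0
        return torch.optim.SGD(params, lr=lr, momentum=0.9)
    if name == "Adam":
        return torch.optim.Adam(params, lr=lr)
    if name == "RMSprop":
        return torch.optim.RMSprop(params, lr=lr)
    raise ValueError(f"unknown optimizer: {name}")


# Synthetic data: y is a noisy linear function of 4 features
torch.manual_seed(0)
x = torch.randn(64, 4)
y = x @ torch.tensor([[1.0], [-2.0], [0.5], [3.0]]) + 0.1 * torch.randn(64, 1)

results = {}
for name in ("SGD", "Adam", "RMSprop"):
    torch.manual_seed(1)  # identical initial weights for a fair comparison
    model = nn.Linear(4, 1)
    opt = make_optimizer(name, model.parameters(), lr=0.01)
    initial = nn.functional.mse_loss(model(x), y).item()
    for _ in range(100):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    results[name] = (initial, loss.item())  # (loss before training, loss at last step)
```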

## 7. Training Workflow in PyTorch

The typical workflow:

1. **Prepare dataset**  
   - Custom dataset or torchvision datasets  
   - DataLoader for batching  
2. **Build model (nn.Module)**  
3. **Define loss function**  
4. **Define optimizer**  
5. **Training loop**  
   - Forward → Loss → Backward → Update  
6. **Evaluation loop**
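The six steps above can be sketched end to end as follows. Synthetic two-class data stands in for a real dataset, and the model size, learning rate, batch sizes, and epoch count are illustrative assumptions.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# 1. Prepare dataset: synthetic points labeled 1 if x0 + x1 > 0, else 0
X = torch.randn(512, 2)
y = (X[:, 0] + X[:, 1] > 0).long()
train_loader = DataLoader(TensorDataset(X[:400], y[:400]), batch_size=32, shuffle=True)
test_loader = DataLoader(TensorDataset(X[400:], y[400:]), batch_size=64)

# 2. Build model
model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))

# 3. Loss function and 4. optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# 5. Training loop: forward -> loss -> backward -> update
for epoch in range(20):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()          # clear gradients from the previous batch
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

# 6. Evaluation loop: no gradient tracking, count correct predictions
model.eval()
correct = 0
with torch.no_grad():
    for xb, yb in test_loader:
        correct += (model(xb).argmax(dim=1) == yb).sum().item()
test_accuracy = correct / len(test_loader.dataset)
```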

## 8. Neural Network Architectures

### 8.1 MLP (Fully Connected NN)
- Used for small, simple datasets  
- Structure: Input → Dense → Activation → Dense
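A minimal sketch of that structure with `nn.Sequential`, assuming 20 input features and 3 output classes (illustrative sizes).

```python
import torch
from torch import nn

# Input -> Dense -> Activation -> Dense
mlp = nn.Sequential(
    nn.Linear(20, 64),  # Dense
    nn.ReLU(),          # Activation
    nn.Linear(64, 3),   # Dense: one score per class
)

logits = mlp(torch.randn(8, 20))                    # batch of 8 samples -> shape (8, 3)
n_params = sum(p.numel() for p in mlp.parameters())  # (20*64 + 64) + (64*3 + 3)
```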

### 8.2 CNN (Convolutional Neural Network)
Used for images:  
- Uses kernels/filters  
- Learns spatial patterns
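A small sketch of a CNN for MNIST-sized grayscale images (1×28×28) with 10 classes. The class name `TinyCNN`, the channel counts, and the pooling layout are illustrative assumptions.

```python
import torch
from torch import nn


class TinyCNN(nn.Module):
    """Hypothetical small CNN: two conv/pool stages followed by a linear classifier."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 16 learned 3x3 filters
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 32 filters over 16 channels
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, 10)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))  # flatten all dims except batch


model = TinyCNN()
out = model(torch.randn(4, 1, 28, 28))  # -> shape (4, 10)
```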

### 8.3 RNN / LSTM
Used for sequences (text, time series).  
Largely superseded by Transformers in many sequence tasks, especially text.
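A minimal sketch of an LSTM sequence classifier that predicts a class from the last hidden state. The class name `SequenceClassifier` and all sizes are illustrative assumptions. `nn.GRU` has a similar interface but returns only `h_n`, with no cell state.

```python
import torch
from torch import nn


class SequenceClassifier(nn.Module):
    """Hypothetical LSTM classifier over (batch, seq_len, n_features) inputs."""

    def __init__(self, n_features=8, hidden=32, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        outputs, (h_n, c_n) = self.lstm(x)  # h_n: (num_layers, batch, hidden)
        return self.head(h_n[-1])           # last layer's final hidden state


model = SequenceClassifier()
out = model(torch.randn(5, 12, 8))  # 5 sequences of length 12 -> shape (5, 2)
```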

### 8.4 Transformers
Current state of the art:  
- Use attention  
- Highly parallel  
- Dominant in text, vision, speech
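A sketch using PyTorch's built-in Transformer encoder and a standalone self-attention layer on already-embedded tokens. The model width, head count, layer count, and sequence length are illustrative assumptions. Token embedding and positional encoding are omitted.

```python
import torch
from torch import nn

d_model, n_heads = 32, 4  # d_model must be divisible by n_heads

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

tokens = torch.randn(3, 10, d_model)  # (batch, seq_len, d_model)
encoded = encoder(tokens)             # same shape; all positions are processed in parallel

# The attention operation on its own: self-attention uses query = key = value
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads, batch_first=True)
attn_out, attn_weights = attn(tokens, tokens, tokens)  # weights: (batch, seq_len, seq_len)
```

Each row of `attn_weights` is a probability distribution over the positions that a token attends to.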

## 9. Bias-Variance

### Bias  
Error due to an oversimplified model (**underfitting**)

### Variance  
Error due to an overly complex model that is too sensitive to the training data (**overfitting**)

### Goal  
Balance both.

## 10. Evaluation Metrics

For classification:

| Metric | Meaning |
|--------|---------|
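| **Accuracy** | Fraction of all predictions that are correct |
| **Precision** | Of the predicted positives, the fraction that are truly positive |
| **Recall** | Of the actual positives, the fraction the model found |
| **F1-score** | Harmonic mean of precision and recall |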
| **Confusion Matrix** | Per-class performance |
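A minimal sketch computing these metrics for binary 0/1 labels directly from tensors. The helper name `binary_metrics` and the example labels are hypothetical. For multi-class problems, libraries such as scikit-learn (`sklearn.metrics`) provide ready-made versions.

```python
import torch


def binary_metrics(y_true: torch.Tensor, y_pred: torch.Tensor) -> dict:
    """Accuracy, precision, recall, F1 and a 2x2 confusion matrix for 0/1 labels."""
    tp = ((y_pred == 1) & (y_true == 1)).sum().item()
    tn = ((y_pred == 0) & (y_true == 0)).sum().item()
    fp = ((y_pred == 1) & (y_true == 0)).sum().item()
    fn = ((y_pred == 0) & (y_true == 1)).sum().item()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / y_true.numel(),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # rows = true class (0, 1), columns = predicted class (0, 1)
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }


y_true = torch.tensor([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = torch.tensor([1, 0, 0, 1, 0, 1, 1, 0])
metrics = binary_metrics(y_true, y_pred)  # tp=3, tn=3, fp=1, fn=1
```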

## 11. Common Training Problems

### Overfitting  
Model memorizes training data.  
**Fix:** dropout, regularization, augmentation.

### Underfitting  
Model too simple.  
**Fix:** more layers, train longer.

### Vanishing Gradients  
Gradients become too small.  
**Fix:** ReLU, batchnorm, better initialization.
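A sketch showing where several of these fixes live in PyTorch code: dropout and weight decay against overfitting, and ReLU, batch normalization, and He (Kaiming) initialization against vanishing gradients. The layer sizes, dropout rate, and weight-decay value are illustrative assumptions. Data augmentation depends on the dataset and is not shown.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.BatchNorm1d(256),  # normalizes activations, which helps gradient flow
    nn.ReLU(),            # does not saturate for positive inputs
    nn.Dropout(p=0.5),    # randomly zeroes activations during training
    nn.Linear(256, 10),
)

# He (Kaiming) initialization, suited to ReLU networks;
# applied to every Linear layer here for simplicity
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        nn.init.zeros_(m.bias)

# weight_decay in Adam adds an L2 penalty term to the gradients
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

model.train()  # dropout active, batch norm uses batch statistics
train_out = model(torch.randn(16, 784))
model.eval()   # dropout off, batch norm uses running statistics
eval_out = model(torch.randn(16, 784))
```

Switching between `model.train()` and `model.eval()` matters here because dropout and batch normalization behave differently in the two modes.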

## 12. Project Example: SimpleNN (Your MNIST Model)

Architecture:

- Flatten 28×28 → 784  
- Linear(784 → 256)  
- ReLU  
- Linear(256 → 10)  
- Softmax (applied implicitly as log-softmax inside `nn.CrossEntropyLoss`, so the model outputs raw logits)  

Training config:
- Loss: `nn.CrossEntropyLoss`  
- Optimizer: `Adam`  
- Dataset: MNIST
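A minimal sketch of a model matching the architecture listed above, with one training step. The project's actual `SimpleNN` code may differ, and the learning rate of 1e-3 is an assumption. Random MNIST-shaped tensors stand in for real data. In practice the images would typically come from `torchvision.datasets.MNIST` with a `ToTensor()` transform, which is not run here because it downloads the dataset.

```python
import torch
from torch import nn


class SimpleNN(nn.Module):
    """Sketch of the listed architecture: Flatten -> Linear(784, 256) -> ReLU -> Linear(256, 10)."""

    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()    # (N, 1, 28, 28) -> (N, 784)
        self.fc1 = nn.Linear(784, 256)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(256, 10)  # raw logits; no softmax in the model

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(self.flatten(x))))


model = SimpleNN()
criterion = nn.CrossEntropyLoss()      # applies log-softmax to the logits internally
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on a random MNIST-shaped batch (stand-in for real data)
images = torch.randn(64, 1, 28, 28)
labels = torch.randint(0, 10, (64,))
optimizer.zero_grad()
logits = model(images)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()

# At inference time, softmax converts logits into class probabilities
probs = torch.softmax(logits.detach(), dim=1)
```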

## 13. Real-World Applications

Neural networks power:

- Face recognition  
- ChatGPT  
- Image classification  
- Recommendation systems  
- Speech-to-text  
- Medical imaging  

NNs are foundational to modern AI.

## 14. Summary

A neural network is a layered function approximator that learns from data using backpropagation and gradient descent.  
It is capable of learning complex relationships and forms the core of modern deep learning.