## Hyperparameters of a typical neural network


### What is a hyperparameter? (one-line)

Hyperparameters are configuration values that are set before training, rather than learned from the data, and that govern the training process and the network architecture.

### 1. Training hyperparameters (most important)

These directly affect learning dynamics.

#### 1. Learning rate (η)

- Step size for weight updates
- Too high → divergence
- Too low → slow learning
- Typical values: 0.1, 0.01, 0.001 (the best value depends on the optimizer; 0.001 is a common starting point for Adam)
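The effect of the step size can be seen on a toy problem. The sketch below is illustrative only: it minimises the hypothetical one-dimensional loss `f(w) = w**2`, and the function name `gradient_descent` and the deliberately oversized rate 1.1 are assumptions chosen for the demonstration.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimise f(w) = w**2 (gradient 2*w), starting from w0."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w       # derivative of w**2
        w = w - lr * grad    # update rule: w <- w - eta * gradient
    return w

# 1.1 is intentionally too high for this loss; 0.001 is intentionally too low.
for lr in (1.1, 0.1, 0.001):
    print(f"lr={lr}: w after 20 steps = {gradient_descent(lr):.4f}")
```

On this loss the largest rate moves `w` away from the minimum at 0, the middle rate gets close to it, and the smallest rate has barely moved after 20 steps.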

#### 2. Batch size

- Number of samples per gradient update
- Common values: 16, 32, 64, 128
- Affects training speed, memory use, and generalization
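Batch size also fixes how many weight updates happen in one pass over the data. A minimal sketch, assuming a hypothetical dataset of 1000 samples and a helper named `iterate_minibatches` (neither comes from these notes):

```python
import math

def iterate_minibatches(samples, batch_size):
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

dataset = list(range(1000))  # stand-in for 1000 training samples

for batch_size in (16, 32, 64, 128):
    n_updates = sum(1 for _ in iterate_minibatches(dataset, batch_size))
    expected = math.ceil(len(dataset) / batch_size)  # last batch may be smaller
    print(f"batch_size={batch_size}: {n_updates} updates per epoch (ceil = {expected})")
```

Smaller batches give more, noisier updates per epoch; larger batches give fewer updates and need more memory per step.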

#### 3. Number of epochs

- Number of full passes over the dataset
- Too few → underfitting
- Too many → overfitting

#### 4. Optimizer

- Algorithm for updating weights
- Examples:
  - SGD
  - SGD + Momentum
  - Adam (a common default choice)
  - RMSprop

#### 5. Momentum (if used)

- Controls how much past gradients influence the current update
- Typical value: 0.9
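The momentum update can be sketched in a few lines. This example uses the classical form (a velocity that accumulates past gradients) on the toy loss `f(w) = w**2`; the function name `sgd_momentum` and the chosen learning rate are assumptions, and libraries may use a slightly different but related formulation.

```python
def sgd_momentum(lr=0.01, momentum=0.9, steps=100, w0=1.0):
    """Minimise f(w) = w**2 with SGD plus classical momentum."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        grad = 2.0 * w
        # Keep a fraction of the previous velocity, then add the new step.
        velocity = momentum * velocity - lr * grad
        w = w + velocity
    return w

plain = sgd_momentum(momentum=0.0)   # reduces to plain SGD
accelerated = sgd_momentum(momentum=0.9)
print(f"plain SGD: |w| = {abs(plain):.5f}")
print(f"momentum 0.9: |w| = {abs(accelerated):.5f}")
```

With the same small learning rate, the run with momentum ends closer to the minimum at 0 on this particular problem.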

### 2. Architecture hyperparameters

These define the structure of the network.

#### 6. Number of layers (depth)

- Number of hidden layers
- More layers → more expressive power
- Too many layers → risk of overfitting

#### 7. Number of neurons per layer (width)

- Controls model capacity
- Example: 128 → 64 → 32 neurons across three successive hidden layers
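Depth and width together determine how many learnable parameters a fully connected network has. A rough sketch, assuming a hypothetical 784-feature input and a 10-unit output around the 128 → 64 → 32 hidden widths (the input and output sizes are not from these notes):

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

# Assumed sizes: 784 inputs, hidden layers 128 -> 64 -> 32, 10 outputs.
layer_sizes = [784, 128, 64, 32, 10]
print(count_parameters(layer_sizes))  # 111146
```

Most of the parameters sit in the first layer here, because it connects the widest pair of layers.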

#### 8. Activation function

- Adds non-linearity
- Common choices:
  - ReLU (hidden layers)
  - Sigmoid (binary output) / Softmax (multi-class output)
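For reference, the three activations named above can be written directly from their definitions. This is a plain-Python sketch for scalars and short lists, not how a framework implements them:

```python
import math

def relu(x):
    """max(0, x): passes positive values, zeroes out negatives."""
    return max(0.0, x)

def sigmoid(x):
    """Squashes a real number into (0, 1); used for binary outputs."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    """Turns a list of scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0), relu(3.0))
print(sigmoid(0.0))
print(softmax([1.0, 2.0, 3.0]))
```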

#### 9. Weight initialization

- How initial weights are set
- Examples:
  - Xavier (Glorot) initialization (for tanh)
  - He initialization (for ReLU)
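Both schemes scale the spread of the random weights by the layer size. The sketch below uses the normal-distribution variants (Xavier: variance `2 / (fan_in + fan_out)`, He: variance `2 / fan_in`); the helper names and the 128 → 64 layer are assumptions for illustration.

```python
import math
import random

def xavier_std(fan_in, fan_out):
    """Standard deviation for Xavier (Glorot) normal initialization."""
    return math.sqrt(2.0 / (fan_in + fan_out))

def he_std(fan_in):
    """Standard deviation for He normal initialization."""
    return math.sqrt(2.0 / fan_in)

def init_weights(fan_in, fan_out, std, seed=0):
    """Draw a fan_in x fan_out weight matrix from N(0, std**2)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]

# Hypothetical layer with 128 inputs and 64 outputs.
print(f"Xavier std: {xavier_std(128, 64):.4f}")
print(f"He std:     {he_std(128):.4f}")
weights = init_weights(128, 64, he_std(128))
```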

### 3. Regularization hyperparameters

Used to prevent overfitting.

#### 10. L1 / L2 regularization (L2 is often called weight decay)

- Penalizes large weights
- Common L2 coefficients: 0.0001, 0.001
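In both cases the penalty is an extra term added to the data loss. A minimal sketch, assuming a made-up weight vector and data loss; note that some texts write the L2 term with an extra factor of 1/2, which only rescales the coefficient.

```python
def l2_penalty(weights, lam):
    """lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)

def l1_penalty(weights, lam):
    """lam * sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

weights = [0.5, -1.0, 2.0]  # hypothetical weights
data_loss = 0.30            # hypothetical loss on the training batch

total_loss = data_loss + l2_penalty(weights, lam=0.001)
print(f"L2 penalty: {l2_penalty(weights, 0.001):.5f}")
print(f"L1 penalty: {l1_penalty(weights, 0.001):.5f}")
print(f"total loss with L2: {total_loss:.5f}")
```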

#### 11. Dropout rate

- Fraction of neurons randomly dropped during training
- Typical: 0.2 – 0.5
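A sketch of how the dropout rate is applied, using the "inverted dropout" convention in which surviving activations are scaled up during training so that nothing needs to change at inference time. The function signature is an assumption for illustration.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not training or rate == 0.0:
        return list(activations)  # dropout is disabled at inference time
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

layer_output = [1.0] * 10
print(dropout(layer_output, rate=0.3, rng=random.Random(0)))
print(dropout(layer_output, rate=0.3, training=False))
```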

#### 12. Early stopping patience

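- Early stopping halts training when the validation loss stops improving
- Patience is the number of consecutive epochs without improvement to wait before stopping
- Typical patience: 5–10 epochs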
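The patience rule can be sketched as a counter that resets whenever the validation loss reaches a new best value. The loss values and the function name `early_stopping_epoch` below are made up for illustration; a small patience of 3 is used to keep the example short.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0  # new best: reset the counter
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)  # never triggered: ran all epochs

# Hypothetical validation losses, one per epoch; the best is at epoch 4.
val_losses = [0.90, 0.70, 0.60, 0.58, 0.59, 0.61, 0.60, 0.62, 0.63]
print(early_stopping_epoch(val_losses, patience=3))  # 7
```

In practice the weights from the best epoch (epoch 4 here) are usually the ones kept.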

### 4. Data-related hyperparameters

#### 13. Train–validation split

- Examples: 80/20, 70/30
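An 80/20 split amounts to shuffling the samples once and slicing. A minimal sketch; the helper name `train_val_split` and the fixed seed are assumptions, and real projects often need a stratified or time-based split instead.

```python
import random

def train_val_split(samples, val_fraction=0.2, seed=42):
    """Shuffle a copy of the samples and split off a validation set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

train, val = train_val_split(range(100), val_fraction=0.2)
print(len(train), len(val))  # 80 20
```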

#### 14. Data augmentation settings (if used)

- Examples: image flips, rotations, added noise
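Augmentation settings are typically probabilities and magnitudes. The sketch below treats an image as a list of pixel rows and exposes two such settings, `flip_prob` and `noise_std`; both names and their default values are assumptions for illustration.

```python
import random

def horizontal_flip(image):
    """Mirror each row of the image left to right."""
    return [row[::-1] for row in image]

def add_noise(image, std, rng):
    """Add zero-mean Gaussian noise to every pixel."""
    return [[pixel + rng.gauss(0.0, std) for pixel in row] for row in image]

def augment(image, flip_prob=0.5, noise_std=0.01, seed=None):
    """Randomly flip the image, then add a small amount of noise."""
    rng = random.Random(seed)
    if rng.random() < flip_prob:
        image = horizontal_flip(image)
    return add_noise(image, noise_std, rng)

image = [[0.0, 0.5, 1.0],
         [0.2, 0.4, 0.6]]
print(horizontal_flip(image))
print(augment(image, seed=0))
```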

### Summary table (quick revision)

| Category | Hyperparameters |
|---|---|
| Training | Learning rate, Batch size, Epochs, Optimizer |
| Architecture | Layers, Neurons, Activation, Initialization |
| Regularization | Dropout, L1/L2, Early stopping |
| Data | Split ratio, Augmentation |

### What is NOT a hyperparameter (common confusion)

- Weights
- Biases
- Gradients (computed during training, not configured)
- Embeddings (learned parameters, like weights and biases)

### Typical default setup (real-world)

For a beginner or a standard deep learning project:

- Optimizer: Adam
- Learning rate: 0.001
- Batch size: 32 or 64
- Activation (hidden): ReLU
- Epochs: 20–50
- Dropout: 0.3
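These defaults are convenient to keep in one place so that individual runs only state what they change. A sketch using a plain dictionary; the key names and the `make_config` helper are assumptions, and the epoch count of 30 is simply one value from the 20–50 range.

```python
DEFAULT_CONFIG = {
    "optimizer": "adam",
    "learning_rate": 1e-3,
    "batch_size": 32,            # 64 is an equally common choice
    "hidden_activation": "relu",
    "epochs": 30,                # any value in the 20-50 range
    "dropout_rate": 0.3,
}

def make_config(**overrides):
    """Return the defaults with selected hyperparameters replaced."""
    unknown = set(overrides) - set(DEFAULT_CONFIG)
    if unknown:
        raise KeyError(f"unknown hyperparameters: {sorted(unknown)}")
    return {**DEFAULT_CONFIG, **overrides}

config = make_config(batch_size=64, epochs=50)
print(config)
```

Rejecting unknown keys catches typos such as `learning_rat` before a long training run starts.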