# Deep Learning: A Subset of Machine Learning
Deep learning is a method in artificial intelligence (AI) that teaches computers to process data in a way that is inspired by the human brain.

# Why We Use Deep Learning
Deep learning models can recognize complex patterns in pictures, text, sounds, and other data to produce accurate insights and predictions.

# How We Use Deep Learning
Deep learning has advanced image classification, language translation, and speech recognition. It can be applied to many pattern recognition problems with little manual feature engineering. Artificial neural networks comprising many layers drive deep learning.

![image.png](attachment:image.png)

### Why is Deep Learning Important?
Some of the reasons deep learning has become an industry standard:

- Handling unstructured data: Deep learning models can learn directly from unstructured data such as images, text, and audio, which reduces the time and resources spent on standardizing data sets.
- Handling large data: With graphics processing units (GPUs), deep learning models can process large amounts of data quickly.
- High accuracy: Deep learning models often provide the most accurate results in computer vision, natural language processing (NLP), and audio processing.
- Pattern recognition: Many traditional models require a machine learning engineer to design features by hand, whereas deep learning models can learn many kinds of patterns automatically.

# Core Concepts of Deep Learning
---

### **1. Neural Networks Basics**

* **Artificial Neuron / Perceptron**
* **Input, Weights, Bias**
* **Activation Functions**

  * Sigmoid, Tanh, ReLU, Leaky ReLU, Softmax
* **Forward Propagation**
* **Loss Functions**

  * MSE, Cross-Entropy

---

### **2. Backpropagation & Gradient Descent**

* **Backpropagation Algorithm**

  * Chain Rule for derivatives
* **Gradient Descent Optimization**

  * Variants: SGD, Mini-batch, Adam, RMSProp, Momentum
* **Learning Rate & Tuning**

---

### **3. Network Architectures**

* **Feedforward Neural Networks (FNN / MLP)**
* **Convolutional Neural Networks (CNN)**

  * Convolution, Padding, Stride
  * Pooling (Max, Avg)
* **Recurrent Neural Networks (RNN)**

  * Vanishing/Exploding Gradient Problem
  * LSTM / GRU
* **Transformers**

  * Self-Attention, Multi-Head Attention
  * Positional Encoding

---

### **4. Regularization & Generalization**

* **Overfitting vs Underfitting**
* **Dropout**
* **L1/L2 Regularization**
* **Early Stopping**
* **Data Augmentation**

# Before diving into deep learning, understand what each of the concepts above is and what it does
---

### **5. Training Techniques**

* **Batch Size, Epochs, Iterations**
* **Loss Curve Analysis**
* **Learning Rate Scheduling**
* **Gradient Clipping**

---

### **6. Evaluation Metrics**

* **Accuracy, Precision, Recall, F1 Score**
* **Confusion Matrix**
* **ROC / AUC**
* **Perplexity** (for language models)

---

### **7. Transfer Learning & Fine-tuning**

* **Pretrained Models**
* **Feature Extraction**
* **Fine-tuning vs Freezing Layers**

---

### **8. Deep Learning Frameworks**

* **TensorFlow / Keras**
* **PyTorch**
* **Model Definition, Training Loop, Autograd**

---

### **9. Deployment & Inference**

* **Model Saving/Loading**
* **ONNX, TensorRT**
* **Quantization & Pruning (for edge devices)**

---

### **10. Specialized Architectures (optional but core for advanced DL)**

* **GANs (Generative Adversarial Networks)**

  * Generator vs Discriminator
* **Autoencoders**
  * Denoising, Variational Autoencoders (VAE)
* **Attention Mechanisms**
* **Transformers in Vision (ViT)**

---

### Applications 

![image.png](attachment:image.png)

# Neural Network
A neural network is a method in artificial intelligence that teaches computers to process data in a way that is inspired by the human brain.

Neural networks are loosely modeled on how the human brain works. When recognizing handwriting or faces, the brain makes a series of decisions very quickly. For example, in the case of facial recognition, the brain might start with "Is it female or male? Is it black or white?"

![image.png](attachment:image.png)

## 1. Neurons (Nodes)
A neuron is a small computational unit. It receives inputs, performs a simple calculation, and then produces an output.

| **Biological Neuron** | **Artificial Neuron**      |
| --------------------- | -------------------------- |
| Cell nucleus          | Node                       |
| Dendrites             | Input                      |
| Synapse               | Weights or Interconnection |
| Axon                  | Output                     |


- Weights: Numerical values assigned to each input connection of a neuron, determining the strength or importance of that input.
- Biases: Additional constants added to the neuron's weighted sum. A bias acts as an offset that lets the neuron activate more or less easily, independent of the inputs.

Both weights and biases are the learnable parameters of a neural network. They are adjusted repeatedly during training so that the network can recognize patterns and make accurate predictions.
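
As a minimal sketch of how weights and a bias combine inside one neuron, the snippet below computes the weighted sum before any activation is applied. The function name `neuron_output` and all numeric values are hypothetical, chosen only for illustration.

```python
def neuron_output(inputs, weights, bias):
    """Return the weighted sum of the inputs plus the bias (before any activation)."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# Hypothetical values chosen only for illustration.
inputs = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 2.0]
bias = 0.5

z = neuron_output(inputs, weights, bias)
print(z)  # 0.5 - 2.0 + 6.0 + 0.5 = 5.0
```

A larger weight makes its input count for more in the sum, while the bias shifts the result regardless of the inputs.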

## 2. Layers
Neurons are not scattered randomly; they are stacked in distinct groups called layers. The way these layers are arranged and connected defines the architecture of the network.

The three main layers are:
1. Input Layer
2. Hidden Layer
3. Output Layer

- Input Layer: Receives the data and presents it to the rest of the network. It does not perform any computation such as applying weights, a bias, or an activation function. Each neuron in the input layer typically corresponds to one feature of the data set.
- Hidden Layer: Sits between the input layer and the output layer. A network can have one or more hidden layers. This is where feature extraction and pattern recognition happen: each neuron computes a weighted sum of its inputs plus a bias and then applies an activation function. In a fully connected network, each hidden neuron receives input from all the neurons in the previous layer and sends its output to all the neurons in the next layer, which is either another hidden layer or the output layer.
- Output Layer: The final layer of the neural network. It produces the network's final prediction or decision, and its form is tailored to the task (for example, a continuous value for regression or class probabilities for classification).

## Why Are Layers Important?
Layers are important because they enable neural networks to learn hierarchical representations and increasingly complex patterns from data, which is crucial for solving complex problems.
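
The flow of data through input, hidden, and output layers can be sketched as follows. This is an illustrative forward pass, not a full framework: the helper `dense_layer`, the layer sizes, and every weight and bias value are assumptions made up for the example.

```python
import math

def dense_layer(inputs, weights, biases, activation):
    """Compute one fully connected layer: activation(weighted sum + bias) per neuron."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(x * w for x, w in zip(inputs, neuron_weights)) + bias
        outputs.append(activation(z))
    return outputs

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters: 2 input features -> 3 hidden neurons -> 1 output neuron.
hidden_weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[0.7, -0.5, 0.2]]
output_biases = [0.05]

features = [1.0, 2.0]  # input layer: only passes the data on
hidden = dense_layer(features, hidden_weights, hidden_biases, relu)
prediction = dense_layer(hidden, output_weights, output_biases, sigmoid)
print(hidden, prediction)
```

The sigmoid on the output neuron keeps the prediction between 0 and 1, which suits a binary classification task; a regression task would typically use a linear output instead.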

## 3. Connections
In a neural network, connections are the links between neurons in different layers.
- Information flow: Connections are the pathways along which the output of one neuron travels to another neuron.
- Weights: Each connection has a numerical value, called a weight, associated with it. The weight determines the strength and influence of the connection: the larger the weight's magnitude, the stronger the impact of the signal on the receiving neuron.
- Learning: Training is the process of automatically adjusting the weights and biases to minimize the loss (the error between predicted and actual outputs), enabling the network to map inputs accurately to the desired outputs.
 
## 4. Activation Functions
An activation function is a mathematical function applied to each neuron's weighted sum of inputs (plus bias). It introduces non-linearity into the network, enabling it to learn complex patterns and make complex decisions.
- Without activation functions, no matter how many layers we stack, the network behaves like a single linear function.

## Types of Activation Functions:
1. Linear Activation
Typically used only in the output layer for regression tasks; rarely used elsewhere.

$f(x) = x$

2. Non-Linear Activations
Used in hidden and/or output layers. These introduce the non-linearity that gives the network its learning capacity.

![image.png](attachment:9c4d3d8d-6328-48b5-816c-3a4c08796efc.png)
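
The non-linear activations named earlier in these notes (Sigmoid, Tanh, ReLU, Leaky ReLU, Softmax) can be sketched in plain Python as below. The `alpha=0.01` slope for Leaky ReLU is a commonly used default assumed here, not a value taken from the notes.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # alpha is the small slope applied to negative inputs (assumed default).
    return x if x > 0 else alpha * x

def softmax(values):
    shifted = [v - max(values) for v in values]  # subtract the max for numerical stability
    exps = [math.exp(v) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0))      # 0.5
print(relu(-2.0))        # 0.0
print(leaky_relu(-2.0))  # -0.02
print(softmax([1.0, 2.0, 3.0]))  # three probabilities that sum to 1
```

Softmax differs from the others in that it acts on a whole vector of scores at once, which is why it is usually reserved for the output layer of a multi-class classifier.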

# What are perceptrons?
A perceptron works on the same basic principle as the artificial neuron described above, except that its inputs and outputs are binary. Perceptrons are the units that perform the calculations: during training, their weights are adjusted to reduce the loss until the model is sufficiently accurate. For example, handwriting recognition models can reach about 99% accuracy.

In other words, a perceptron takes binary inputs and produces a binary output. It uses a weighted sum and a threshold to decide whether the outcome should be yes (1) or no (0).

For example, suppose you want to go to France, but only if:

- x1: The airline ticket costs less than \$1,000.
- x2: Your girlfriend or boyfriend can go with you.

![image.png](attachment:a881d435-bb75-4e6b-bad7-effd27b83ada.png)
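
The France decision can be sketched as a perceptron with two binary inputs. The weights and the threshold below are hypothetical values picked so that the trip happens only when both conditions hold; they are not taken from the figure.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the binary inputs exceeds the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

# Hypothetical weights and threshold: both conditions must hold to go.
weights = [1, 1]
threshold = 1.5

for x1 in (0, 1):      # x1: the ticket costs less than $1,000
    for x2 in (0, 1):  # x2: your partner can come along
        print(x1, x2, "->", perceptron([x1, x2], weights, threshold))
```

Lowering the threshold to 0.5 would turn the same perceptron into an "either condition is enough" rule, which shows how the threshold controls how easily the unit fires.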

# How it works

![image.png](attachment:37be1270-cdd6-4efb-8e08-a8484cd19344.png)

# Deep Learning Components:
1.	Neurons (Nodes)
2.	Layers
3.	Connections
4.	Activation Functions


## Perceptron models are divided into two types:
- Single-layer Perceptron Model
- Multi-layer Perceptron Model

### 1. Single-layer Perceptron Model

A single-layer perceptron model consists of a feed-forward network with a threshold transfer function. Its main objective is to classify linearly separable objects with binary outcomes.

### 2. Multi-layer Perceptron Model
Like the single-layer perceptron model, a multi-layer perceptron model is a feed-forward network, but it adds one or more hidden layers between the input and output layers.


The multi-layer perceptron model is trained with the backpropagation algorithm, which executes in two stages:

- Forward Stage: Activations are computed starting at the input layer and passed forward, layer by layer, until they reach the output layer.
- Backward Stage: The error between the actual output and the desired output is propagated backward, starting at the output layer and ending at the input layer, and the weight and bias values are adjusted to reduce that error.
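
The two stages can be illustrated on the smallest possible case: a single sigmoid neuron trained on one example with gradient descent. This is a sketch under stated assumptions; the squared-error loss, the learning rate, and the starting values are all hypothetical choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical single training example and starting parameters.
x, target = 2.0, 1.0
w, b = 0.1, 0.0
learning_rate = 0.5

losses = []
for step in range(20):
    # Forward stage: input -> weighted sum -> activation -> loss.
    z = w * x + b
    prediction = sigmoid(z)
    loss = 0.5 * (prediction - target) ** 2
    losses.append(loss)

    # Backward stage: chain rule from the loss back to w and b.
    dloss_dpred = prediction - target
    dpred_dz = prediction * (1.0 - prediction)
    dloss_dz = dloss_dpred * dpred_dz
    grad_w = dloss_dz * x
    grad_b = dloss_dz

    # Gradient descent update of the learnable parameters.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(losses[0], losses[-1])  # the loss shrinks as training proceeds
```

In a multi-layer network the same chain rule is applied repeatedly, passing `dloss_dz` from each layer back to the one before it.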

## The perceptron model has the following limitations:

1. The output of a perceptron can only be a binary number (0 or 1) because of the hard-limit transfer function.

2. A single perceptron can only classify linearly separable sets of input vectors. If the input vectors are not linearly separable, it cannot classify them all correctly.
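
The linear-separability limitation can be seen by training a single perceptron on AND (linearly separable) and on XOR (not linearly separable). This sketch uses the classic perceptron learning rule with a learning rate of 1 and zero initial weights, which are illustrative assumptions; the helper names are hypothetical.

```python
def predict(weights, bias, inputs):
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=20):
    """Perceptron learning rule with a learning rate of 1 (an illustrative choice)."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + error * x for w, x in zip(weights, inputs)]
            bias += error
    return weights, bias

def count_correct(samples, weights, bias):
    return sum(predict(weights, bias, inputs) == target for inputs, target in samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_correct = count_correct(AND, *train_perceptron(AND))
xor_correct = count_correct(XOR, *train_perceptron(XOR))
print("AND correct:", and_correct, "of 4")  # linearly separable: all 4 are learned
print("XOR correct:", xor_correct, "of 4")  # not linearly separable: never all 4
```

No choice of weights and bias draws a single straight line that separates the XOR classes, which is why hidden layers are needed for such problems.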

# Deep Learning Architecture:

| **Architecture**                   | **Abbreviation** | **Description**                                                             | **Category/Type**                      |
| ---------------------------------- | ---------------- | --------------------------------------------------------------------------- | -------------------------------------- |
| **Single Layer Perceptron**        | SLP              | Basic model with one layer; only solves linearly separable problems.        | Feedforward Network                    |
| **Multilayer Perceptron**          | MLP              | Fully connected network with hidden layers; non-linear function learning.   | Feedforward Network                    |
| **Radial Basis Function Network**  | RBFN             | Uses radial basis functions; strong for function approximation.             | Feedforward Network                    |
| **Convolutional Neural Network**   | CNN              | Captures spatial hierarchies in image data using convolution and pooling.   | Convolution-Based                      |
| **Capsule Network**                | CapsNet          | Enhances CNNs by modeling spatial relationships explicitly.                 | Convolution-Based                      |
| **Recurrent Neural Network**       | RNN              | Processes sequences with temporal dependencies using recurrent connections. | Recurrent / Temporal Network           |
| **Long Short-Term Memory**         | LSTM             | Solves RNN issues by maintaining long-term dependencies with memory gates.  | Recurrent / Temporal Network           |
| **Gated Recurrent Unit**           | GRU              | Simpler, faster version of LSTM with similar capabilities.                  | Recurrent / Temporal Network           |
| **Echo State Network**             | ESN              | Reservoir computing model with fixed recurrent layer.                       | Recurrent / Temporal Network           |
| **Self-Organizing Map**            | SOM              | Unsupervised learning that projects data onto lower dimensions.             | Unsupervised / Representation Learning |
| **Autoencoder**                    | -                | Learns compressed input representations; used for denoising, compression.   | Unsupervised / Representation Learning |
| **Variational Autoencoder**        | VAE              | Probabilistic autoencoder for generative tasks.                             | Generative / Representation Learning   |
| **Deep Belief Network**            | DBN              | Stacked RBMs used for unsupervised pre-training.                            | Unsupervised / Representation Learning |
| **Generative Adversarial Network** | GAN              | Composed of a Generator and Discriminator in a minimax game.                | Generative Model                       |
| **Transformer**                    | -                | Attention-based architecture for sequence modeling (e.g., NLP).             | Attention-Based                        |
| **Vision Transformer**             | ViT              | Applies Transformer to image patches for vision tasks.                      | Attention-Based                        |
| **Modular Neural Network**         | MNN              | Multiple independent networks trained and combined for a shared task.       | Modular / Ensemble-Based               |
| **Extreme Learning Machine**       | ELM              | Feedforward model with random hidden weights; very fast training.           | Feedforward / Specialized              |
| **Spiking Neural Network**         | SNN              | Neuromorphic model mimicking biological neuron firing.                      | Neuromorphic / Bio-Inspired            |


![ChatGPT Image Jul 17, 2025, 09_35_01 AM.png](attachment:6f2f13fd-1955-4bb4-8c81-33768c7fcdea.png)

## Data types processed by each neural network
- Artificial Neural Networks (ANN) - Tabular data, Image data, Text data
- Convolutional Neural Networks (CNN) - Image data (e.g., image classification)
- Recurrent Neural Networks (RNN) - Time series data, Text data, Audio data
- Radial Basis Function Networks (RBFN) - Function approximation


# Batch Normalization
Batch normalization is a technique for training very deep neural networks that standardizes the inputs to a layer for each mini-batch. This stabilizes the learning process and can substantially reduce the number of training epochs required to train deep networks.

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

![image.png](attachment:image.png)

#### STEP 1: The BN layer first computes the mean $\mu$ and the variance $\sigma^2$ of the activation values across the batch, using formulas (1) and (2) above.
#### STEP 2: It normalizes the activation vector $Z^{(i)}$ with formula (3), so that each neuron's output has zero mean and unit variance across the batch.
#### STEP 3: It calculates the layer's output $\hat{Z}^{(i)}$ by applying a linear transformation with $\gamma$ and $\beta$, two trainable parameters, using formula (4).
#### STEP 4: $\gamma$ adjusts the standard deviation, while $\beta$ adjusts the bias, shifting the curve to the right or to the left.
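
The steps above can be sketched for a single neuron's activations across one mini-batch. This is a simplified illustration rather than a library implementation: the small constant `eps` added for numerical stability and the sample activations are assumptions.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one neuron's activations across a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n                               # step 1: batch mean
    variance = sum((z - mean) ** 2 for z in batch) / n  # step 1: batch variance
    normalized = [(z - mean) / math.sqrt(variance + eps) for z in batch]  # step 2
    return [gamma * z_hat + beta for z_hat in normalized]  # step 3: scale and shift

activations = [2.0, 4.0, 6.0, 8.0]  # hypothetical activations of one neuron
output = batch_norm(activations)
print(output)
print(sum(output) / len(output))    # close to 0 when gamma=1 and beta=0
```

Calling `batch_norm(activations, gamma=2.0, beta=3.0)` would instead centre the outputs near 3 with roughly twice the spread, which is the adjustment described in step 4.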


## Note:
- During the evaluation phase we may not have a full batch to feed into the model, so BN layers typically use running estimates of the mean and variance collected during training instead of batch statistics.
- For convolutional networks (CNN), Batch Normalization (BN) is usually the better choice.
- For recurrent networks (RNN), Layer Normalization (LN) is usually the better choice.
- BN is widely used because it often improves network performance, in terms of both the loss and the accuracy.

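Library view: Keras provides this layer as `tf.keras.layers.BatchNormalization`.

    import tensorflow as tf

    # Synchronize batch statistics across replicas.
    # The `synchronized` argument is only available in recent TensorFlow releases.
    strategy = tf.distribute.MirroredStrategy()
    with strategy.scope():
        model = tf.keras.Sequential()
        model.add(tf.keras.layers.Dense(16))
        model.add(tf.keras.layers.BatchNormalization(synchronized=True))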


## Types of Dataset Shift
Dataset shift can be divided into three types:

- Shift in the independent variables (Covariate Shift)
- Shift in the target variable (Prior Probability Shift)
- Shift in the relationship between the independent variables and the target variable (Concept Shift)

# Covariate Shift
Covariate shift refers to a change in the distribution of the input variables between the training data and the test data. It is the most common type of shift, and it is gaining attention because many real-world datasets suffer from it.

![image-2.png](attachment:image-2.png)

# Regularization

Regularization is a set of techniques that help prevent overfitting in neural networks and thus improve the accuracy of a deep learning model on new, unseen data from the problem domain.

Common regularization techniques include L1, L2, and dropout.

---

### 🔁 **Overfitting vs Underfitting**

| Concept          | Description                                                                                                                                    |
| ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| **Overfitting**  | The model learns **too much** from the training data, including noise and outliers. It performs well on training data but poorly on test data. |
| **Underfitting** | The model is **too simple** to capture patterns in the training data. It performs poorly on both training and test data.                       |

> ✅ Goal: A well-generalized model that avoids both.

---

### 🧪 **Dropout**

**What it is:**
Randomly **disables a percentage of neurons** during training in each layer.

**Why it's useful:**

* Prevents co-dependency between neurons
* Forces the network to learn more **robust features**

**Example:** If dropout = 0.5, each neuron in the layer is "dropped" with probability 0.5 in each iteration, so on average half of the neurons are disabled.
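
A minimal sketch of dropout on a list of activations is shown below. It uses the "inverted dropout" convention, where kept activations are scaled up during training so that nothing needs to change at inference time; that scaling and the fixed random seed are assumptions added for the illustration.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout: zero each activation with probability `rate` during training."""
    if not training or rate == 0.0:
        return list(activations)
    keep_prob = 1.0 - rate
    # Scale the kept activations so their expected value is unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
layer_output = [1.0] * 10
dropped = dropout(layer_output, rate=0.5, rng=rng)
print(dropped)                                          # each entry is 0.0 or 2.0
print(dropout(layer_output, rate=0.5, training=False))  # unchanged at inference time
```

Because a different random subset is dropped on every call, no neuron can rely on a specific neighbour always being present.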

---

### 🧮 **L1 and L2 Regularization**

| Type           | Meaning                                                                | Effect                                             |
| -------------- | ---------------------------------------------------------------------- | -------------------------------------------------- |
| **L1 (Lasso)** | Adds the **sum of absolute values** of the weights to the loss function | Promotes **sparsity** (drives some weights to 0)   |
| **L2 (Ridge)** | Adds the **sum of squared weights** to the loss function                | Keeps weights **small**, discourages over-reliance |

**Why it works:**
Penalizing large weights helps the model **simplify** and avoid overfitting.
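
The two penalties can be sketched as extra terms added to the data loss. The regularization strength `lam`, the weights, and the data loss below are hypothetical values used only to make the arithmetic easy to follow.

```python
def l1_penalty(weights, lam):
    """L1 term: lambda times the sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2 term: lambda times the sum of squared weight values."""
    return lam * sum(w * w for w in weights)

# Hypothetical weights, data loss, and regularization strength.
weights = [0.5, -2.0, 0.0, 1.5]
data_loss = 0.25
lam = 0.5

print(data_loss + l1_penalty(weights, lam))  # 0.25 + 0.5 * 4.0 = 2.25
print(data_loss + l2_penalty(weights, lam))  # 0.25 + 0.5 * 6.5 = 3.5
```

Note how the weight of -2.0 contributes 2.0 to the L1 sum but 4.0 to the L2 sum: squaring punishes large weights more heavily than small ones.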

---

### ⏳ **Early Stopping**

**What it is:**
Stop training the model when the **validation loss starts increasing**, even if training loss is still decreasing.

**Why it's useful:**

* Prevents overfitting
* Saves training time
* Uses validation data as a warning system
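
The stopping rule can be sketched as a check over the validation loss recorded after each epoch. The `patience` parameter (how many non-improving epochs to tolerate) is an assumed extra knob that is common in practice, and the loss values are made up for the example.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: trained for every epoch

# Hypothetical validation losses: improving at first, then rising.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
stop_epoch = early_stopping_epoch(val_losses)
print(stop_epoch)  # 5
```

In practice the weights from the best epoch (index 3 here) are usually the ones kept, since later epochs have already started to overfit.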

---

### 🖼 **Data Augmentation**

**What it is:**
Artificially expanding the dataset by applying **transformations** like:

* Image rotation, flipping, zooming, cropping
* Adding noise, brightness changes, etc.

**Why it's useful:**

* Increases dataset diversity
* Reduces overfitting
* Makes the model more robust to variations in input
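
Two of the transformations listed above, flipping and brightness changes, can be sketched on a tiny grayscale image stored as a list of rows. The helper names and the pixel values are hypothetical; real pipelines would normally rely on an image library.

```python
def flip_horizontal(image):
    """Mirror each row of a 2D image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]

def adjust_brightness(image, delta, max_value=255):
    """Add `delta` to every pixel, clamped to the valid range."""
    return [[min(max_value, max(0, p + delta)) for p in row] for row in image]

# Hypothetical 2x3 grayscale image.
image = [[0, 50, 100],
         [150, 200, 250]]

# One original image yields three training samples.
augmented = [image, flip_horizontal(image), adjust_brightness(image, 20)]
print(flip_horizontal(image))        # [[100, 50, 0], [250, 200, 150]]
print(adjust_brightness(image, 20))  # [[20, 70, 120], [170, 220, 255]]
```

The label of each augmented sample stays the same as the original's, which is what lets the dataset grow without extra annotation work.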

---

### ✅ Summary Table:

| **Technique**            | **Purpose**                         | **How it Helps**                                |
| ------------------------ | ----------------------------------- | ----------------------------------------------- |
| Overfitting/Underfitting | Understand model behavior           | Guides regularization choices                   |
| Dropout                  | Randomly deactivate neurons         | Prevents co-adaptation, improves generalization |
| L1 Regularization        | Penalize absolute weights           | Creates sparse models                           |
| L2 Regularization        | Penalize squared weights            | Keeps weights small, avoids overfitting         |
| Early Stopping           | Stop training based on validation   | Prevents late-stage overfitting                 |
| Data Augmentation        | Modify inputs to simulate diversity | Trains robust, generalized models               |

---