## Introduction to Neural Networks

Neural Networks (NNs) are a family of machine learning models inspired by the structure and functioning of the human brain.  
They are capable of **recognizing patterns**, **learning complex relationships**, and making **predictions or classifications** from data.

### 1. What is a Neural Network?

- A neural network is a set of **layers of interconnected nodes (neurons)**, loosely modeled on the brain, that learns from data.
- Each neuron receives inputs, processes them with mathematical operations, and passes the result to the next layer.  
- By adjusting the strength of the connections (called **weights**) during training, the network learns how to map inputs to outputs.

![image-2.png](attachment:image-2.png)

![image.png](attachment:image.png)

### How Neural Networks Work

1. **Layers**: A neural network is made up of an input layer, one or more hidden layers, and an output layer. 
2. **Neurons (Nodes)**: Each layer contains artificial neurons. Each neuron:
   - Receives inputs from the previous layer.
   - Multiplies each input by its weight and sums the results.
   - Adds a bias (a constant offset).
   - Applies an activation function to produce an output.
3. **Connections and Weights**: The connections between neurons have associated weights, which determine the strength of the signal passed between them. 
4. **Activation Functions**: Neurons also use activation functions (like sigmoid or ReLU) to introduce non-linearity, allowing the network to learn complex patterns. 
5. **Learning Process**: During training, the network receives input data and repeatedly adjusts its weights and biases to minimize errors and improve performance on a given task. 
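
The per-neuron steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the input values, weights, and bias are made up, and sigmoid is just one possible choice of activation function.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Multiply each input by its weight, sum, add the bias, then activate."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical values chosen only for illustration.
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.5

output = neuron_output(inputs, weights, bias)
print(output)  # the weighted sum plus bias is about 0, so the output is close to 0.5
```

Training would change `weights` and `bias`; the computation itself stays the same.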

### Training Step by Step

#### 1. Forward Propagation
- Data flows from the input layer, through the hidden layers, to the output layer.
- Each neuron computes a weighted sum of its inputs and applies an activation function:

![image-3.png](attachment:image-3.png)

| Symbol     | Meaning                                                                  |
| ---------- | ------------------------------------------------------------------------ |
| $x_i$      | The **input feature** or the output from a neuron in the previous layer. |
| $w_i$      | The **weight** associated with input $x_i$.                              |
| $b$        | The **bias term**, a constant that shifts the activation function.       |
| $f(\cdot)$ | The **activation function** (e.g., ReLU, Sigmoid, Tanh).                 |
| $a$        | The **output** of the neuron after activation.                           |
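
To make the symbols concrete, here is a small sketch of forward propagation through one hidden layer and one output neuron. It assumes the formula in the figure is the usual $a = f(\sum_i w_i x_i + b)$; the layer sizes and all parameter values are hypothetical, and plain Python lists are used so every step is visible (real code would normally use a numerical library).

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases, activation):
    """Compute a = f(sum_i w_i * x_i + b) for every neuron in a layer.

    `weights` holds one list of input weights per neuron.
    """
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Hypothetical parameters: 2 inputs -> 2 hidden neurons -> 1 output neuron.
x = [1.0, 2.0]
hidden_weights = [[0.5, 0.25], [-1.0, 0.5]]
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, -1.0]]
output_biases = [-1.0]

hidden = dense_layer(x, hidden_weights, hidden_biases, relu)
prediction = dense_layer(hidden, output_weights, output_biases, sigmoid)

print(hidden)      # [1.0, 0.0]  (the second neuron's sum is negative, so ReLU outputs 0)
print(prediction)  # [0.5]
```

The output of each layer becomes the $x_i$ of the next one, which is all that "forward" means here.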


#### 2. Loss Calculation
- The output is compared to the true value using a loss function (e.g., Mean Squared Error, Cross-Entropy).
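
As a rough sketch, the two loss functions named above can be written as follows. The binary form of cross-entropy is shown, and the `eps` clipping is an added safeguard against `log(0)`; the sample labels and predictions are invented for illustration.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference, typically used for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average negative log-likelihood for 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
bce_good = binary_cross_entropy([1, 0], [0.9, 0.1])  # confident and correct
bce_bad = binary_cross_entropy([1, 0], [0.1, 0.9])   # confident and wrong

print(mse, bce_good, bce_bad)
```

Both functions return a single number that is lower when predictions are closer to the true values, which is what training tries to minimize.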

#### 3. Backpropagation & Training
- Backpropagation computes the gradient of the loss with respect to every weight and bias; gradient descent then adjusts those parameters to reduce the loss.
- This process repeats over many epochs until the loss stops improving.
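
The loop below sketches this cycle for the simplest possible case: a single linear neuron fitted with mean squared error. The toy data, learning rate, and epoch count are assumptions for illustration, and the gradients are written out by hand; backpropagation applies the same idea across many layers using the chain rule.

```python
# Toy data generated from y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0          # parameters to learn
learning_rate = 0.05
n = len(xs)

def mse(w, b):
    """Mean squared error of the predictions w * x + b on the toy data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n

initial_loss = mse(w, b)

for epoch in range(2000):
    # Forward pass: predictions with the current parameters.
    preds = [w * x + b for x in xs]
    # Backward pass: gradients of the MSE loss with respect to w and b.
    grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
    grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
    # Gradient descent update: step against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse(w, b)
print(w, b, final_loss)  # w approaches 2 and b approaches 1
```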

### 2. Building Blocks of a Neural Network

| Component | Role | Analogy |
|-----------|------|--------|
| **Neuron (Node)** | Computes a weighted sum of inputs, adds a bias, and applies an activation function. | A tiny calculator making a decision. |
| **Input Layer** | Accepts the raw features of the data (e.g., pixel values, stock prices, sensor readings). | Data entry gate. |
| **Hidden Layers** | Intermediate layers that transform the input into more abstract representations. | Workers extracting hidden patterns. |
| **Output Layer** | Produces the final prediction (e.g., a class label or numerical value). | Final decision maker. |
| **Weights** | Determine how strongly each input influences the neuron. | Importance of each feature. |
| **Bias** | Allows the model to shift the activation threshold. | Adjustable offset. |
| **Activation Function** | Introduces non-linearity, enabling the network to learn complex, non-linear relationships. | Decision rule inside each neuron. |

### 3. Common Activation Functions
Activation functions introduce non-linearity, allowing the network to model complex relationships. Without them, the network would just be equivalent to a linear model regardless of depth.

- **Sigmoid**: Outputs a value between 0 and 1, useful for probabilities.
   - Gradient shrinks toward zero for inputs of large magnitude, whether strongly positive or strongly negative (vanishing gradient problem)
   - Often used in the output layer for binary classification (interpreted as probability)
   - Rarely used in hidden layers of deep networks due to vanishing gradients

- **Tanh**: Outputs between –1 and 1, centered around zero.
   - Zero-centered (helps optimization compared to sigmoid)
   - Steeper gradient than sigmoid
   - Still suffers from vanishing gradient for large |x|
   - Often used in hidden layers of smaller networks
   - Common in recurrent neural networks (RNNs)

- **ReLU (Rectified Linear Unit)**: Outputs `max(0, x)`, simple and efficient for deep networks.
   - Computationally efficient
   - Reduces likelihood of vanishing gradient for positive inputs
   - Very common in hidden layers of deep networks
   - Often used in convolutional neural networks (CNNs) and feedforward networks
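
The three functions and their derivatives are short enough to write out directly. The sketch below uses plain Python for scalar inputs and picks a few sample points to show how the gradients behave; treating the ReLU derivative at exactly 0 as 0 is a common convention assumed here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2

def relu_grad(x):
    # Derivative is 1 for positive inputs and 0 otherwise (0 at x == 0 by convention).
    return 1.0 if x > 0 else 0.0

for x in (-10.0, 0.0, 10.0):
    print(f"x={x:+.1f}  sigmoid'={sigmoid_grad(x):.6f}  "
          f"tanh'={tanh_grad(x):.6f}  relu'={relu_grad(x):.1f}")
```

At `x = 0` the sigmoid gradient is 0.25 while the tanh gradient is 1, which is the "steeper gradient" mentioned above. At `x = ±10` both are close to zero, whereas the ReLU gradient stays at 1 for any positive input.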

### 4. Important Training Concepts

| Term | Meaning |
|------|--------|
| **Epoch** | One complete pass through the entire training dataset. |
| **Batch** | A subset of the dataset processed before updating the weights. |
| **Learning Rate** | Determines how large each parameter update is. Too high → unstable, too low → slow. |
| **Overfitting** | The network memorizes training data but performs poorly on new data. |
| **Regularization** | Techniques (Dropout, L1/L2 penalties, Early Stopping) to prevent overfitting. |
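
The relationship between epochs and batches is easiest to see in a loop skeleton. This sketch only counts updates and does no actual learning; the dataset of ten integers, the batch size, and the helper name `iterate_minibatches` are all hypothetical.

```python
import random

def iterate_minibatches(samples, batch_size, rng):
    """Shuffle the samples, then yield them in consecutive batches."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]

samples = list(range(10))      # stand-in for 10 training examples
rng = random.Random(0)         # fixed seed so the run is repeatable
num_epochs = 3
batch_size = 4

updates = 0
for epoch in range(num_epochs):
    seen = []
    for batch in iterate_minibatches(samples, batch_size, rng):
        # A real training loop would compute the loss on `batch`
        # and update the weights here (one update per batch).
        updates += 1
        seen.extend(batch)
    # Each epoch visits every training example exactly once.
    assert sorted(seen) == samples

print(updates)  # 9: three batches (4 + 4 + 2 examples) in each of 3 epochs
```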

### 5. Types of Neural Networks

Different architectures are designed for specific data types:

| Type | Best For | Key Idea |
|------|---------|---------|
| **Feedforward Neural Network (FNN)** | Tabular or simple data | Data flows only forward from input to output. |
| **Convolutional Neural Network (CNN)** | Images, spatial data | Uses convolution filters to detect local patterns (edges, textures). |
| **Recurrent Neural Network (RNN)** | Time series, text, speech | Carries a hidden state from one step to the next, giving it a memory of past inputs. |
| **Long Short-Term Memory (LSTM)** | Long sequences | A special RNN that handles long-term dependencies. |
| **Transformer** | Language tasks, large-scale sequence modeling | Uses attention mechanisms instead of recurrence. |
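
As one concrete illustration of the "Key Idea" column, the sketch below slides a small filter over a 1D signal to detect an edge, which is the operation at the heart of a CNN layer. The filter values here are picked by hand for clarity; in a CNN they would be learned during training. The function name `correlate_1d` is hypothetical.

```python
def correlate_1d(signal, kernel):
    """Slide the kernel across the signal and take a dot product at each position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

signal = [0, 0, 0, 1, 1, 1]      # a step: "dark" then "bright"
edge_kernel = [-1, 1]            # responds where neighbouring values differ

response = correlate_1d(signal, edge_kernel)
print(response)  # [0, 0, 1, 0, 0]: the only non-zero response is at the edge
```

The same small filter is reused at every position, so the layer detects a local pattern wherever it occurs in the input.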

### 6. Typical Neural Network Workflow

1. **Data Preparation**  
   - Clean data, normalize or standardize features, split into training/validation/test sets.

2. **Model Design**  
   - Choose the number of layers, neurons per layer, and activation functions.

3. **Training**  
   - Feed batches of data through the network.
   - Compute loss, backpropagate gradients, and update parameters.

4. **Evaluation**  
   - Test the model on unseen data to check accuracy or error.

5. **Deployment**  
   - Save the trained model and use it to make predictions on new data.
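
The data preparation step can be sketched as follows. This is one possible approach under stated assumptions: a single made-up feature, a 60/20/20 split, and hypothetical helper names. Fitting the scaling statistics on the training set only is a common practice that keeps information from the validation and test sets out of training.

```python
import random
import statistics

def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the rows and cut them into train, validation, and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def fit_standardizer(values):
    """Return the mean and standard deviation of the given values."""
    return statistics.mean(values), statistics.pstdev(values)

def standardize(values, mean, std):
    """Rescale values using a previously fitted mean and standard deviation."""
    return [(v - mean) / std for v in values]

feature = [float(v) for v in range(10)]   # hypothetical single feature
train, val, test = train_val_test_split(feature)

# Fit the scaling on the training set only, then reuse it for the other splits.
mean, std = fit_standardizer(train)
train_scaled = standardize(train, mean, std)
val_scaled = standardize(val, mean, std)
test_scaled = standardize(test, mean, std)

print(len(train), len(val), len(test))  # 6 2 2
```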

### 7. Quick Intuition

Think of a neural network as **layers of feature builders**:
- The first layers capture **simple patterns** (e.g., edges in an image, short-term trends in data).
- Deeper layers capture **complex concepts** (e.g., faces, long-term market behavior).
- The output layer combines these patterns to make a final prediction.

### Key Applications
- **Image Recognition**: Identifying objects, people, or scenes in images. 
- **Natural Language Processing (NLP)**: Understanding and processing human text, powering chatbots and translation tools. 
- **Speech Recognition**: Converting human speech into text or commands, used in virtual assistants. 
- **Recommendation Engines**: Suggesting products or services to users based on their past activity.
- **Time Series Prediction**: Forecasting future trends in data like stock prices or weather. 