# Introduction to Deep Learning

Deep learning is a subfield of machine learning that has gained significant attention in recent years due to its ability to tackle complex problems with large amounts of data. The motivation behind deep learning stems from the desire to replicate the human mind's ability to learn and make decisions.

![image.png](attachment:image.png)

### Motivation - Human Mind vs. Computer
Let's begin with the motivation behind deep learning and how it relates to the capabilities of the human mind compared to computers.


When it comes to processing information, the human mind has extraordinary capabilities. We can recognize objects, understand language, and make complex decisions effortlessly. On the other hand, computers have traditionally struggled with tasks that come naturally to us. This discrepancy has driven researchers to develop algorithms that mimic the brain's neural networks, leading to the emergence of deep learning.

![image-2.png](attachment:image-2.png)

**Deep learning** is a subset of machine learning that focuses on building and training neural networks with multiple layers. These neural networks are inspired by the structure and function of the human brain, with each layer processing and transforming data in a hierarchical manner. Deep learning algorithms learn from vast amounts of labeled data to make predictions or decisions.

![image.png](attachment:image.png)


*Making Precise Rules is Difficult*

One of the key challenges in traditional programming is defining precise rules for complex tasks. For example, designing an algorithm to identify handwritten digits would require explicitly specifying rules for every possible variation in handwriting. This process becomes increasingly difficult and time-consuming as the complexity of the task increases. Deep learning, however, offers an alternative approach.

![image.png](attachment:image.png)

## Data

A crucial component of deep learning is the availability of large, labeled datasets. A dataset is a collection of examples (instances) used to train, validate, and test deep learning models, allowing them to learn patterns and make accurate predictions. Depending on the problem domain, a dataset can contain images, text, audio, video, or a combination of data types.

### 1. Training Dataset
The training dataset is used to teach the deep learning model to recognize patterns and make accurate predictions. It consists of a large set of labeled examples, where each example is paired with its corresponding correct output or target value. During training, the model adjusts its parameters based on the differences between predicted outputs and the true outputs.

### 2. Validation Dataset
The validation dataset is used to tune the hyperparameters of the deep learning model. Hyperparameters are settings that are not learned from the data but affect the learning process. By evaluating the model's performance on the validation dataset, one can make adjustments to hyperparameters such as learning rate, network architecture, or regularization techniques.

### 3. Test Dataset
The test dataset is used to evaluate the final performance of the trained deep learning model. It contains examples that were not seen during training or validation and is used to assess how well the model generalizes to new, unseen data. The test dataset provides an unbiased measure of the model's performance.

Deep learning relies on high-quality datasets for effective training and evaluation. The availability of large, labeled datasets has contributed to the success of deep learning in various domains.
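
As a rough illustration of the three datasets described above, the sketch below splits a synthetic dataset into training, validation, and test sets with scikit-learn's `train_test_split`. The data, the 60/20/20 proportions, and the variable names are assumptions chosen for the example, not values prescribed by this lecture.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 1000 examples, 20 features, binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 2, size=1000)

# First hold out 20% of the data as the test set ...
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
# ... then split the remainder into training (60% of the total)
# and validation (20% of the total).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```

The test set is split off first so that no decision made during training or hyperparameter tuning can depend on it.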




## Neural Networks

![image.png](attachment:image.png)

In this lecture, we will explore the fundamental concepts of neural networks, their architecture, and how they can be used for solving complex problems. Let's begin by understanding the difference between artificial neurons and biological neurons.

### Artificial Neuron vs. Biological Neuron

Artificial neurons, also known as **perceptrons**, are the building blocks of neural networks. They are inspired by the structure and function of biological neurons.
Biological neurons are the fundamental units of the human brain and are responsible for processing and transmitting information. Artificial neurons, on the other hand, are mathematical models that simulate the behavior of biological neurons to process and transform data in neural networks.

![image.png](attachment:image.png)

### Perceptron

A **perceptron**, or artificial neuron, is a computational unit that takes multiple inputs, computes a weighted sum of these inputs, applies an activation function, and produces an output. The weight associated with each input determines the importance or contribution of that input to the output. The activation function introduces non-linearity, allowing the neural network to learn complex relationships between inputs and outputs.

Let's consider an example to understand the functioning of an artificial neuron. Suppose we want to predict whether a person will buy a shirt based on three input features: color, sleeves, and fabric. The color can be blue or not, the sleeves can be full or half, and the fabric can be cotton or not. The output will indicate whether the person will buy the shirt or not.

![image.png](attachment:image.png)

We assign weights to each input feature to indicate their significance. For example, let's assume the weights for color, sleeves, and fabric are 7, 4, and 2, respectively. Additionally, we set a threshold of 8 for the perceptron.

![image.png](attachment:image.png)

The table below presents three possible combinations of color, sleeves, and fabric, along with the corresponding output:


| Color    | Sleeves | Fabric      | Output  |
|----------|---------|-------------|---------|
| Blue     | Full    | Cotton      | Buy     |
| Not Blue | Half    | Cotton      | Not Buy |
| Not Blue | Full    | Not Cotton  | Not Buy |

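Encoding each feature as 1 when it matches (blue, full sleeves, cotton) and 0 otherwise, the first row gives a weighted sum of 7 + 4 + 2 = 13, which exceeds the threshold of 8, so the output is Buy; the other two rows give 2 and 4, which fall below the threshold. Note that the inputs do not have to be binary, and the weight assigned to each input determines its significance in the decision-making process.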
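
The shirt example can be reproduced in a few lines of NumPy. This is a minimal sketch: the weights (7, 4, 2) and the threshold (8) come from the text, while the 0/1 encoding of the features, the choice of `>=` for the threshold comparison, and the function name `perceptron_decision` are assumptions made for illustration.

```python
import numpy as np

# Weights and threshold taken from the example above
weights = np.array([7, 4, 2])   # color, sleeves, fabric
threshold = 8

def perceptron_decision(inputs):
    """Return the decision and the weighted sum for one shirt."""
    weighted_sum = int(np.dot(weights, inputs))
    decision = "Buy" if weighted_sum >= threshold else "Not Buy"
    return decision, weighted_sum

# Assumed encoding: 1 = blue / full sleeves / cotton, 0 = otherwise
shirts = {
    "Blue, Full, Cotton": [1, 1, 1],
    "Not Blue, Half, Cotton": [0, 0, 1],
    "Not Blue, Full, Not Cotton": [0, 1, 0],
}
for name, x in shirts.items():
    decision, s = perceptron_decision(np.array(x))
    print(f"{name}: sum={s} -> {decision}")
```

The three weighted sums are 13, 2, and 4, so only the first shirt reaches the threshold, matching the table.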

![image.png](attachment:image.png)

## Training Perceptron with Python

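In [None]:
from sklearn.datasets import load_iris

# Load the iris dataset. X and y were not defined in the original cell;
# using all four features and the species labels is an assumption.
iris = load_iris()
X, y = iris.data, iris.target

In [None]:
from sklearn.linear_model import Perceptron

# Fit a perceptron classifier on the iris data
per_clf = Perceptron(random_state=42)
per_clf.fit(X, y)

# Learned weights (one row per class) and bias terms
print(per_clf.coef_)
print(per_clf.intercept_)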

Multiple perceptrons can be stacked together to form a **neural network.** Each perceptron in the network processes a subset of inputs and contributes to the final output. This stacking allows the neural network to model complex relationships and make more accurate predictions.

In neural networks, there are two types of stacking: **parallel** and **sequential**.

- In parallel stacking, multiple perceptrons receive the same inputs and each produces its own output, so a single input yields multiple outputs.
- In sequential stacking, the output of one perceptron serves as input for the next perceptron, creating a sequential flow of information through the network.
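
The two kinds of stacking can be sketched with NumPy. All weights, biases, and the step activation below are illustrative values chosen for the example (assumptions), not parameters from this lecture.

```python
import numpy as np

def step(z):
    """Threshold activation: 1 where z >= 0, else 0."""
    return (np.asarray(z) >= 0).astype(int)

x = np.array([1.0, 0.0, 1.0])   # one input with three features

# Parallel stacking: three perceptrons see the same input.
# Each row of W_parallel holds one perceptron's weights.
W_parallel = np.array([[ 0.5, -1.0,  0.2],
                       [-0.3,  0.8, -0.6],
                       [ 1.0,  1.0, -2.5]])
b_parallel = np.array([-0.1, 0.2, 0.5])
parallel_out = step(W_parallel @ x + b_parallel)   # three outputs from one input

# Sequential stacking: the outputs above become the input of the next perceptron.
w_next = np.array([1.0, -1.0, 0.5])
b_next = -0.5
sequential_out = step(w_next @ parallel_out + b_next)

print(parallel_out, sequential_out)
```

Writing the parallel perceptrons as rows of one matrix is what later allows a whole layer to be computed with a single matrix product.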

![image.png](attachment:image.png)

## Parallel

![image.png](attachment:image.png)

## Sequential

## Neural Network Nomenclature

![image.png](attachment:image.png)

**Input Layer**: The first layer of the neural network that receives the initial inputs.

**Hidden Layers**: Intermediate layers between the input and output layers that perform computations.

**Output Layer**: The final layer of the neural network that produces the desired output.



A **Feed-Forward Network** is a type of neural network where information flows in one direction, from the input layer to the output layer. It does not have loops or cycles in its connections, ensuring that the output is solely based on the current input.

A **Fully Connected Network**, also known as a dense network, is a type of neural network where each neuron in one layer is connected to every neuron in the subsequent layer. This connectivity allows information to flow freely between layers, enabling the network to capture complex relationships.

The depth of a neural network refers to the number of layers it contains. A deeper network, with more layers, has the ability to capture and model increasingly complex relationships in the data. Each additional layer introduces new levels of abstraction, enabling the network to learn and extract higher-level features from the input data.

![image.png](attachment:image.png)

Now, we will explore the inner workings of a neural network, including the propagation of information through its layers and the process of updating the network's parameters using stochastic gradient descent. Let's begin by understanding the fundamental operations of a neural network.

## Neural Network Parameters

Neural networks are composed of interconnected layers of artificial neurons, and they have several important parameters that influence their behavior and performance. Let's explore these parameters:

- Weights: Each connection between perceptrons in a neural network has a weight associated with it. These weights determine the strength or importance of the connection. During training, the neural network adjusts these weights to optimize its performance.

- Biases: Each perceptron in a neural network has an associated bias term. The bias acts as an offset, allowing the network to introduce flexibility and adaptability in its decision-making process.
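
To make the role of weights and biases concrete, the sketch below counts them for a fully connected network: one weight per connection between consecutive layers and one bias per non-input neuron. The function name `count_parameters` and the layer sizes are hypothetical, chosen only for illustration.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the number of units per layer, input layer first.
    """
    # One weight for every connection between consecutive layers
    n_weights = sum(n_in * n_out
                    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    # One bias for every neuron outside the input layer
    n_biases = sum(layer_sizes[1:])
    return n_weights, n_biases

# Hypothetical network: 4 inputs, two hidden layers of 8 neurons, 3 outputs
n_weights, n_biases = count_parameters([4, 8, 8, 3])
print(n_weights, n_biases)  # 120 19
```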

![image.png](attachment:image.png)

### Neural Network Operations

Training a neural network involves three primary operations:

- **Forward Propagation**: During forward propagation, input data is fed into the network and flows through the layers, undergoing computations and transformations. Each neuron in a layer receives inputs from the previous layer, computes their weighted sum plus a bias, applies an activation function, and produces an output that serves as input for the next layer.

- **Loss Computation**: Once the data passes through the network and reaches the output layer, a loss function is used to measure the discrepancy between the predicted output and the true output. The loss function quantifies how well the network is performing and provides a measure of the error.

- **Backward Propagation**: Backward propagation, also known as backpropagation, is the process of computing gradients of the loss function with respect to the parameters of the network. These gradients are then used to update the parameters iteratively through an optimization algorithm called **stochastic gradient descent.**
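
The three operations above can be sketched for a tiny network in NumPy. This is an illustrative example under stated assumptions: the 3 -> 4 -> 1 architecture, the tanh hidden activation, the mean squared error loss, and the random data are all choices made here, not taken from the lecture. A finite-difference check is included to show that the hand-derived gradient agrees with a numerical estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # 5 examples, 3 features
y = rng.normal(size=(5, 1))   # 5 target values

# Parameters of a 3 -> 4 -> 1 network (assumed architecture)
W1, b1 = rng.normal(size=(3, 4)) * 0.5, np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)) * 0.5, np.zeros(1)

def forward(X, W1, b1, W2, b2):
    h = np.tanh(X @ W1 + b1)   # hidden layer: weighted sum, then activation
    y_hat = h @ W2 + b2        # output layer: no activation
    return h, y_hat

def mse(y_hat, y):
    return np.mean((y_hat - y) ** 2)

# Forward propagation and loss computation
h, y_hat = forward(X, W1, b1, W2, b2)
loss = mse(y_hat, y)

# Backward propagation: apply the chain rule layer by layer
d_y_hat = 2 * (y_hat - y) / y.size   # dL/dy_hat
grad_W2 = h.T @ d_y_hat
grad_b2 = d_y_hat.sum(axis=0)
d_h = d_y_hat @ W2.T
d_z1 = d_h * (1 - h ** 2)            # derivative of tanh
grad_W1 = X.T @ d_z1
grad_b1 = d_z1.sum(axis=0)

# Sanity check: compare one analytic gradient entry with a finite difference
eps = 1e-6
W1_shift = W1.copy()
W1_shift[0, 0] += eps
numeric = (mse(forward(X, W1_shift, b1, W2, b2)[1], y) - loss) / eps
print(f"loss={loss:.4f}  analytic={grad_W1[0, 0]:.6f}  numeric={numeric:.6f}")
```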

### Stochastic Gradient Descent

Stochastic gradient descent (SGD) is an optimization algorithm commonly used to train neural networks. It involves updating the network's parameters by iteratively adjusting them in the direction that minimizes the loss function. The "stochastic" aspect of SGD refers to the fact that the updates are performed on randomly selected subsets of the training data, known as mini-batches, rather than the entire dataset.

![image.png](attachment:image.png)

SGD works by iteratively updating the weights of the neural network based on the gradients of the loss function with respect to the weights. It performs the following steps:

1. Forward Pass: The network takes an input and performs a forward pass, calculating the predicted output.

2. Loss Calculation: The difference between the predicted output and the true output is calculated using a loss function, such as mean squared error or cross-entropy loss.

3. Backward Pass (Backpropagation): The gradients of the loss function with respect to the weights are calculated using the chain rule of derivatives. These gradients indicate the direction and magnitude of the weight updates required to reduce the loss.

4. Weight Update: Each weight is updated by subtracting its gradient multiplied by the learning rate. The learning rate determines the step size of each update.

5. Repeat: Steps 1 to 4 are repeated for multiple iterations or epochs until the network converges or a stopping criterion is met.
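
The five steps above can be sketched as a mini-batch SGD loop. To keep the gradients short enough to write by hand, this example trains a single linear neuron with a mean squared error loss on synthetic data; the data, learning rate, batch size, and number of epochs are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic regression data (assumption): y = 3*x1 - 2*x2 + 1 plus a little noise
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 1.0 + 0.01 * rng.normal(size=200)

w, b = np.zeros(2), 0.0   # parameters to learn
learning_rate = 0.1
batch_size = 20

for epoch in range(50):                              # 5. repeat for several epochs
    order = rng.permutation(len(X))                  # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]        # random mini-batch
        X_batch, y_batch = X[idx], y[idx]

        y_pred = X_batch @ w + b                     # 1. forward pass
        error = y_pred - y_batch                     # 2. loss is mean(error**2)
        grad_w = 2 * X_batch.T @ error / len(idx)    # 3. gradients of the loss
        grad_b = 2 * error.mean()
        w -= learning_rate * grad_w                  # 4. weight update
        b -= learning_rate * grad_b

print(w, b)
```

After training, `w` and `b` should land close to the values used to generate the data (3, -2, and 1).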

![image.png](attachment:image.png)

In this section, we explored how a neural network works, covering the operations of forward propagation, loss computation, and backward propagation. We then discussed stochastic gradient descent as an optimization algorithm used to update the network's parameters. SGD offers benefits such as efficiency, generalization, and scalability, making it a widely used technique in training neural networks.

![image.png](attachment:image.png)

**Next**: We will discuss the importance of activation functions in neural networks and explore the different types of activation functions commonly used.

### Why do we use Activation Functions?

Activation functions play a crucial role in neural networks for introducing non-linearity and enabling the network to learn complex patterns and relationships in the data. Without activation functions, a neural network would be reduced to a series of linear transformations, limiting its ability to model and solve non-linear problems effectively.
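
The claim that a network without activation functions reduces to a linear transformation can be checked numerically. In this sketch (random weights and layer sizes are assumptions for illustration), two stacked linear layers give exactly the same output as one merged linear layer.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

# Two layers without an activation function ...
two_linear_layers = W2 @ (W1 @ x + b1) + b2

# ... equal a single linear layer with merged weights and bias
W_merged = W2 @ W1
b_merged = W2 @ b1 + b2
one_linear_layer = W_merged @ x + b_merged
print(np.allclose(two_linear_layers, one_linear_layer))  # True

# With a ReLU between the layers, the merged layer no longer matches in general
with_relu = W2 @ np.maximum(0, W1 @ x + b1) + b2
print(np.allclose(with_relu, one_linear_layer))
```

The second comparison depends on the random values drawn: it only matches when no hidden value is negative, which is why no fixed output is shown for it.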

### Different Types of Activation Functions

There are several types of activation functions used in neural networks. Some common ones include:

1. Sigmoid Activation Function: The sigmoid function maps the input to a value between 0 and 1, providing a smooth and bounded non-linearity. It is often used in the output layer for binary classification tasks.

2. Rectified Linear Unit (ReLU): ReLU activation sets all negative values to zero and leaves positive values unchanged. ReLU is widely used in hidden layers due to its simplicity and because it helps mitigate the vanishing gradient problem.

3. Hyperbolic Tangent (tanh): Similar in shape to the sigmoid function, tanh maps the input to a range between -1 and 1 rather than between 0 and 1. It offers a symmetric, zero-centered non-linearity and is suitable for outputs or hidden layers.
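
The three activation functions listed above can be written in a few lines of NumPy. This is a minimal sketch; the function names and the sample inputs are chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Squash inputs into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Zero out negative inputs and pass positive inputs through unchanged."""
    return np.maximum(0.0, z)

def tanh(z):
    """Squash inputs into the open interval (-1, 1)."""
    return np.tanh(z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))   # values between 0 and 1, with sigmoid(0) = 0.5
print(relu(z))      # [0. 0. 2.]
print(tanh(z))      # values between -1 and 1, with tanh(0) = 0
```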

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)
![image-3.png](attachment:image-3.png)

### Activation Functions for Hidden and Output Layers

In neural networks, the choice of activation functions can vary between the hidden and output layers. Different activation functions may be selected based on the specific requirements of the problem at hand. However, it is common to use the same activation function in all hidden layers for consistency.

### Epoch

In the context of neural networks, an epoch refers to a complete pass through the entire training dataset during the training process. In each epoch, the network goes through forward propagation, loss computation, backward propagation, and parameter updates. Multiple epochs are typically required to train a neural network adequately.
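
When training uses mini-batches, one epoch consists of several parameter updates. The short calculation below shows how the counts relate; the dataset size, batch size, and number of epochs are hypothetical values chosen for illustration.

```python
import math

n_examples = 1000   # hypothetical training set size
batch_size = 32     # hypothetical mini-batch size
n_epochs = 10       # hypothetical number of epochs

# The last mini-batch of an epoch may be smaller than batch_size
updates_per_epoch = math.ceil(n_examples / batch_size)
total_updates = updates_per_epoch * n_epochs
print(updates_per_epoch, total_updates)  # 32 320
```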

## Hyperparameters

### Classification Hyperparameters

Classification hyperparameters are settings that control the behavior and performance of the classification algorithm. Examples include:

- Learning Rate: The step size at which the network's parameters are updated during training.

- Number of Hidden Layers: The quantity and configuration of hidden layers in the network.

- Number of Neurons per Layer: The number of artificial neurons in each layer of the network.

### Regression Hyperparameters

Regression hyperparameters are settings that influence the behavior and performance of the regression algorithm. Examples include:

- Learning Rate: The step size at which the network's parameters are updated during training.

- Number of Hidden Layers: The quantity and configuration of hidden layers in the network.

- Number of Neurons per Layer: The number of artificial neurons in each layer of the network.

- Regularization Parameters: Parameters that control the extent of regularization applied to prevent overfitting.

| Hyperparameter              | Typical value                  |
|----------------------------|--------------------------------|
| # input neurons             | One per input feature          |
| # hidden layers             | Depends on the problem, but typically 1 to 5 |
| # neurons per hidden layer  | Depends on the problem, but typically 10 to 100 |
| # output neurons            | 1 per prediction dimension     |
| Hidden activation           | ReLU                           |
| Output activation           | None                           |
| Loss function               | MSE                            |
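
One way to try out the regression settings in the table is scikit-learn's `MLPRegressor`, which uses ReLU hidden units by default, applies no activation to the output, and minimizes squared error. The sketch below is an illustration under assumptions: the synthetic data, the two hidden layers of 50 neurons, the learning rate, and the iteration limit are example values, not recommendations from this lecture.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic regression data (assumption): 5 input features, 1 target value
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=300)

# Two hidden layers of 50 ReLU neurons; the input and output layer sizes
# are inferred from the shapes of X and y.
mlp = MLPRegressor(hidden_layer_sizes=(50, 50), activation="relu",
                   learning_rate_init=0.01, max_iter=500, random_state=0)
mlp.fit(X, y)

print(mlp.out_activation_)         # identity, i.e. no output activation
print(round(mlp.score(X, y), 3))   # R^2 on the training data
```

Scoring on the training data only shows that the model fits; a held-out validation or test set would be needed to judge generalization.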
