# Deep Learning

- Deep Learning is a subset of machine learning
- Makes use of neural network models with many layers: typically more than three (input, hidden, and output), i.e. more than one hidden layer

## Applications

- Natural Language Processing (NLP)
- Speech recognition & synthesis
- Image recognition
- Automated AI machinery

## Linear Regression

Linear Regression is one of the key foundations of Deep Learning. It is a linear model that explains the relationship between two or more variables. In its simplest form, it is a statistical method for studying the relationship between two continuous (quantitative) variables:

1. One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
2. The other variable, denoted y, is regarded as the response, outcome, or dependent variable.

The equation `y = ax + b` is the mathematical way of representing this relationship:

- `y` is the dependent variable we're trying to predict or estimate.
- `x` is the independent variable we're using to make the prediction.
- `a` is the slope of the regression line: how much `y` changes for each one-unit increase in `x`. If `a` is positive, `y` increases as `x` increases. If `a` is negative, `y` decreases as `x` increases.
- `b` is the y-intercept, representing the baseline value of `y` when `x` is 0.

Let's consider a fun, real-world example: predicting the amount of ice cream sold (y) based on the temperature outside (x). 

- `y` (dependent variable) is the amount of ice cream sold.
- `x` (independent variable) is the temperature outside.
- `a` (slope) represents how much the ice cream sales increase for each degree increase in temperature. If `a` is positive, it means that as the temperature increases, ice cream sales also increase.
- `b` (intercept) represents the amount of ice cream sold when the temperature is 0 degrees.

So, if we have the equation `y = 50x + 1000`, it means that for each degree increase in temperature, we sell 50 more ice creams, and even if the temperature is 0 degrees, we still sell 1000 ice creams (maybe because of die-hard ice cream fans!).
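
As a small sketch, the snippet below evaluates the example equation `y = 50x + 1000` and shows one way the slope and intercept could be estimated from data, using ordinary least squares. The function names `predict_sales` and `fit_line` and the sample temperatures are hypothetical, and the sales figures are generated from the equation itself rather than measured.

```python
def predict_sales(temperature: float, a: float = 50.0, b: float = 1000.0) -> float:
    """Evaluate y = a*x + b for the ice cream example."""
    return a * temperature + b


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Estimate slope a and intercept b with ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    # The fitted line always passes through the point (mean_x, mean_y)
    b = mean_y - a * mean_x
    return a, b


# Hypothetical temperatures; sales generated from y = 50x + 1000
temperatures = [10.0, 15.0, 20.0, 25.0, 30.0]
sales = [predict_sales(t) for t in temperatures]

print(predict_sales(0))                # 1000.0
print(fit_line(temperatures, sales))   # (50.0, 1000.0)
```

Because these points lie exactly on the line, the fit recovers `a = 50` and `b = 1000`; with real, noisy data the estimates would only approximate the underlying relationship.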


## Logistic Regression

Logistic Regression is a type of regression analysis used to predict the outcome of a categorical dependent variable based on one or more predictor variables. In its basic form, the outcome is binary: it has only two possible values.

The equation `y = f(ax + b)` is a generalized way of representing the logistic regression model. Here:

- `y` is the dependent variable we're trying to predict or estimate. In logistic regression the underlying outcome is binary, taking one of two values often denoted as 0 and 1, and the model's output `y` is the estimated probability that the outcome is 1.
- `x` is the independent variable we're using to make the prediction.
- `a` is the coefficient of the independent variable. It represents the effect `x` has on the log-odds of `y`. If `a` is positive, as `x` increases, the log-odds of `y` being 1 increases. If `a` is negative, as `x` increases, the log-odds of `y` being 1 decreases.
- `b` is the intercept, representing the baseline log-odds of `y` when `x` is 0.
- `f` is the logistic function, also known as the sigmoid function. It maps any real number to a value between 0 and 1. It is defined as `f(z) = 1 / (1 + e^-z)`, where `z` is the input to the function (in this case, `ax + b`), and `e` is the base of the natural logarithm.

So, the logistic regression model can be written as `y = 1 / (1 + e^-(ax + b))`. This equation gives us the probability of `y` being 1 given `x`.
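
A minimal sketch of the logistic function and the resulting probability `1 / (1 + e^-(ax + b))`. The names `sigmoid` and `predict_probability` and the coefficient values used below are illustrative assumptions. The two-branch form avoids the `OverflowError` that `math.exp(-z)` would raise for very negative `z`.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function f(z) = 1 / (1 + e^-z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent form e^z / (1 + e^z)
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_probability(x: float, a: float, b: float) -> float:
    """Probability that y = 1 given x, i.e. 1 / (1 + e^-(ax + b))."""
    return sigmoid(a * x + b)


print(sigmoid(0.0))                              # 0.5
print(sigmoid(-1000.0))                          # 0.0 (no OverflowError)
print(predict_probability(2.0, a=1.5, b=-3.0))   # 0.5, since ax + b = 0
```

Whenever `ax + b = 0` the model is exactly undecided (probability 0.5); larger positive values push the probability toward 1 and larger negative values toward 0.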

For example, in a spam detection model, `y` could be the probability of an email being spam, and `x` could be the number of times the word "free" appears in the email. The coefficients `a` and `b` are learned from the data during the training process.
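
One common way to learn `a` and `b` is gradient descent on the average log loss (cross-entropy). The sketch below assumes plain batch gradient descent and uses a small made-up dataset: the word counts and spam labels are hypothetical, and `train_logistic_regression` is an illustrative name, not a library function.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically safe logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def train_logistic_regression(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit y ≈ sigmoid(a*x + b) by gradient descent on the average log loss."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # For log loss, the gradient w.r.t. (ax + b) is simply (prediction - label)
        errors = [sigmoid(a * x + b) - y for x, y in zip(xs, ys)]
        grad_a = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        # Step against the gradient
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


# Hypothetical data: how often "free" appears, and whether the email was spam (1) or not (0)
free_counts = [0, 0, 1, 1, 2, 3, 4, 5]
is_spam = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = train_logistic_regression(free_counts, is_spam)
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"P(spam | 5 x 'free') = {sigmoid(a * 5 + b):.3f}")
```

On this toy data the learned `a` comes out positive, so more occurrences of "free" raise the predicted spam probability.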

## Deep Learning Process

Deep Learning is a subset of machine learning that uses neural networks with multiple layers (also known as deep neural networks) to model and understand complex patterns and relationships in data. Here's a simplified overview of the deep learning process:

1. **Initialization**: The process begins with the initialization of the neural network. The weights (and sometimes the biases) are usually initialized with small random numbers. This randomness breaks symmetry: if every weight in a layer started with the same value, all neurons in that layer would compute the same output and receive the same update, so they could never learn different features.

2. **Forward Propagation**: The input data is fed into the network and passes through it layer by layer. Each layer computes a weighted sum of its inputs using the current weights and biases, applies an activation function, and passes the result to the next layer. This continues until the output layer is reached, which produces the network's final prediction.

3. **Loss Calculation**: A loss function is used to measure the difference between the network's prediction and the actual target value. The goal of the network is to minimize this loss.

4. **Backward Propagation (Backpropagation)**: This is where the network learns from its errors. The gradient of the loss function is calculated with respect to the weights and biases of the network. This gradient indicates how much a small change in the weights and biases would affect the loss.

5. **Gradient Descent**: The weights and biases of the network are then updated in the opposite direction of the gradient. This is done iteratively in small steps, with the size of the steps determined by the learning rate. This process helps the network find the set of weights and biases that minimize the loss.

6. **Iteration**: Steps 2-5 are repeated for a number of iterations or epochs, using all or a portion of the training data each time. With each iteration, the network should get better at minimizing the loss and making accurate predictions.

7. **Evaluation**: Once the network has been trained, it's important to evaluate its performance on unseen data. This is done using a separate test dataset. The network's performance on the test data gives a good indication of how well it has learned to generalize from the training data.

8. **Tuning**: If the network's performance is not satisfactory, the process can be repeated with different hyperparameters (like the number of layers, the number of neurons in each layer, the learning rate, etc.), or with different architectures altogether.

Here's a simple pseudocode representation of the process:

```python
# Initialize the network with small random weights and biases
network = initialize_network()

# Set the number of epochs (full passes over the training data) and the learning rate
epochs = 100
learning_rate = 0.01

# Start the learning process
for epoch in range(epochs):
    # Forward propagation
    outputs = forward_propagation(network, inputs)

    # Calculate the loss
    loss = calculate_loss(outputs, targets)

    # Backward propagation: gradients of the loss w.r.t. every weight and bias
    gradients = backward_propagation(network, inputs, outputs, targets)

    # Gradient descent: move each weight and bias against its gradient
    network = update_weights(network, gradients, learning_rate)

# Evaluate the network on unseen test data
accuracy = evaluate(network, test_inputs, test_targets)

# Print the accuracy (assuming evaluate returns a fraction between 0 and 1)
print(f'Accuracy: {accuracy:.2%}')
```

This is a very simplified version of the process, and actual implementations can be much more complex, but it gives a good overview of the basic steps involved in deep learning.
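
To make steps 4 and 5 concrete, here is gradient descent on a one-parameter toy loss, `L(w) = (w - 3)^2`, chosen only because its gradient `2(w - 3)` is easy to write by hand and its minimum is known to be at `w = 3`. In a real network the gradient would come from backpropagation over every weight and bias; the function names here are illustrative.

```python
def loss(w: float) -> float:
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w: float) -> float:
    """Derivative dL/dw of the toy loss."""
    return 2.0 * (w - 3.0)


def gradient_descent(w: float, learning_rate: float, epochs: int) -> float:
    for _ in range(epochs):
        # Move against the gradient, scaled by the learning rate
        w -= learning_rate * gradient(w)
    return w


w_small_steps = gradient_descent(w=0.0, learning_rate=0.1, epochs=100)
w_too_large = gradient_descent(w=0.0, learning_rate=1.1, epochs=20)

print(round(w_small_steps, 4))            # 3.0
print(loss(w_small_steps) < loss(0.0))    # True
print(abs(w_too_large - 3.0) > 3.0)       # True: the steps overshoot and diverge
```

For this particular loss, each step multiplies the distance to the minimum by `(1 - 2 * learning_rate)`, so a learning rate of 0.1 shrinks it steadily while any learning rate above 1.0 makes it grow, which is why the learning rate is treated as a key hyperparameter.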

## Perceptron

The Perceptron is a simple algorithm for binary classification tasks; it is a type of linear classifier used in machine learning and artificial intelligence. It's a single-layer neural network that serves as the foundation for the more complex neural network models used in deep learning. It is loosely inspired by a single neuron (nerve cell) in the human brain.

The Perceptron model takes an input, aggregates it (weighted sum), and returns 1 if the aggregated sum is more than some threshold and -1 otherwise. The weights in the aggregation are learned and updated during the training process.

Here's a simplified representation of a Perceptron:

```
y = f(w1*x1 + w2*x2 + ... + wn*xn + b)
```

Where:

- `y` is the output (prediction).
- `f` is the activation function, which in the simplest case is a step function that returns 1 if the input is greater than 0, and -1 otherwise.
- `w1, w2, ..., wn` are the weights.
- `x1, x2, ..., xn` are the inputs.
- `b` is the bias, a constant term that doesn't depend on any input value.
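
The weights and bias are learned during training. A minimal sketch using the classic perceptron learning rule (update only when a sample is misclassified: `w ← w + η·y·x` and `b ← b + η·y`), assuming labels of 1 and -1 and the step activation described above, trained on the logical AND function as a toy example. The function names are illustrative.

```python
def step(z: float) -> int:
    """Step activation: 1 if z > 0, else -1."""
    return 1 if z > 0 else -1


def predict(weights, bias, x):
    """y = f(w1*x1 + ... + wn*xn + b)."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, labels, learning_rate=1.0, max_epochs=100):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(samples, labels):
            if predict(weights, bias, x) != target:
                # Perceptron rule: shift the decision boundary toward the correct side
                weights = [w + learning_rate * target * xi for w, xi in zip(weights, x)]
                bias += learning_rate * target
                errors += 1
        if errors == 0:  # every sample classified correctly
            break
    return weights, bias


samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]  # logical AND
weights, bias = train_perceptron(samples, labels)
print([predict(weights, bias, x) for x in samples])  # [-1, -1, -1, 1]
```

AND is linearly separable, so the rule is guaranteed to converge; for a problem like XOR, which no single line can separate, a single perceptron never would, which motivates multi-layer networks.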



## Artificial Neural Network

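An Artificial Neural Network (ANN) is essentially a collection of perceptrons (or neurons) arranged in layers, where the outputs of one layer feed the inputs of the next. This layered structure allows the network to model complex, non-linear patterns in data. In practice, the perceptron's step function is replaced with a differentiable activation function (such as the sigmoid) so that the network can be trained with gradients. Here's a step-by-step process of how we can use perceptrons to build an ANN: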

1. **Define the Architecture**: Decide on the number of input nodes (equal to the number of features), the number of hidden layers, the number of nodes in each hidden layer, and the number of output nodes (equal to the number of classes for classification tasks).

2. **Initialize Weights and Biases**: Initialize a weight for each connection between nodes (perceptrons) and a bias for each node, using small random values.

3. **Forward Propagation**: For each layer, calculate the weighted sum of inputs and the bias (just like in a single perceptron), and pass it through an activation function. The output of each node in the current layer serves as the input to the nodes in the next layer.

4. **Calculate Loss**: Use a loss function to measure the difference between the network's final output and the actual target value.

5. **Backward Propagation**: Calculate the gradient of the loss function with respect to each weight and bias in the network. This involves applying the chain rule to propagate the error backward through the network (hence the name backpropagation).

6. **Update Weights and Biases**: Adjust the weights and biases in the direction that reduces the loss. This is typically done using a method called gradient descent.

7. **Iterate**: Repeat the process of forward propagation, loss calculation, backpropagation, and updating weights for a set number of iterations or until the network's performance is satisfactory.

Here's a simple NumPy implementation of an ANN with one hidden layer, assuming sigmoid activations and a half mean-squared-error loss:

```python
import numpy as np


def sigmoid(z):
    """Sigmoid activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


class NeuralNetwork:
    def __init__(self, input_nodes, hidden_nodes, output_nodes, seed=0):
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights with small random values and biases with zeros
        rng = np.random.default_rng(seed)
        self.weights_input_to_hidden = rng.normal(0.0, 0.5, size=(input_nodes, hidden_nodes))
        self.weights_hidden_to_output = rng.normal(0.0, 0.5, size=(hidden_nodes, output_nodes))
        self.bias_hidden = np.zeros(hidden_nodes)
        self.bias_output = np.zeros(output_nodes)

    def forward_propagation(self, inputs):
        # Hidden layer: weighted sum of inputs plus bias, then activation
        hidden_layer_input = inputs @ self.weights_input_to_hidden + self.bias_hidden
        self.hidden_layer_output = sigmoid(hidden_layer_input)  # kept for backpropagation

        # Output layer: same computation, fed by the hidden layer's output
        output_layer_input = self.hidden_layer_output @ self.weights_hidden_to_output + self.bias_output
        return sigmoid(output_layer_input)

    def backward_propagation(self, inputs, targets, output):
        n_samples = inputs.shape[0]

        # Error at the output layer: dLoss/dOutput times the sigmoid derivative
        output_delta = (output - targets) * output * (1.0 - output)

        # Propagate the error back to the hidden layer (chain rule)
        hidden_output = self.hidden_layer_output
        hidden_delta = (output_delta @ self.weights_hidden_to_output.T) * hidden_output * (1.0 - hidden_output)

        # Gradients (weights, biases) of the loss, averaged over the batch
        output_gradient = (hidden_output.T @ output_delta / n_samples, output_delta.mean(axis=0))
        hidden_gradient = (inputs.T @ hidden_delta / n_samples, hidden_delta.mean(axis=0))
        return output_gradient, hidden_gradient

    def update_weights_and_biases(self, output_gradient, hidden_gradient, learning_rate):
        # Gradient descent: step each parameter against its gradient
        self.weights_hidden_to_output -= learning_rate * output_gradient[0]
        self.bias_output -= learning_rate * output_gradient[1]
        self.weights_input_to_hidden -= learning_rate * hidden_gradient[0]
        self.bias_hidden -= learning_rate * hidden_gradient[1]

    def train(self, inputs, targets, epochs, learning_rate):
        for _ in range(epochs):
            # Forward propagation
            output = self.forward_propagation(inputs)

            # Backward propagation
            output_gradient, hidden_gradient = self.backward_propagation(inputs, targets, output)

            # Update weights and biases
            self.update_weights_and_biases(output_gradient, hidden_gradient, learning_rate)


# Example usage on XOR (results depend on the seed, learning rate, and number of epochs)
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([[0], [1], [1], [0]], dtype=float)
network = NeuralNetwork(input_nodes=2, hidden_nodes=4, output_nodes=1)
network.train(inputs, targets, epochs=10000, learning_rate=1.0)
print(network.forward_propagation(inputs).round(2))
```

This is a very simplified version of an ANN, and actual implementations can be much more complex. Also, note that the activation function, the loss function, and the method for initializing weights and biases can vary depending on the specific requirements of the task.
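
Backpropagation (step 5 above) is the chain rule applied layer by layer. The self-contained sketch below checks this on a hypothetical network with one input, one sigmoid hidden unit, and one sigmoid output, using a half squared-error loss: the hand-derived gradients are compared with central finite-difference estimates. The weights, input, and target values are arbitrary, and the function names are illustrative.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def loss(w1, w2, x, target):
    """Half squared error of a tiny network: x -> sigmoid(w1*x) -> sigmoid(w2*h)."""
    h = sigmoid(w1 * x)
    out = sigmoid(w2 * h)
    return 0.5 * (out - target) ** 2


def backprop(w1, w2, x, target):
    # Forward pass, keeping the intermediate values
    h = sigmoid(w1 * x)
    out = sigmoid(w2 * h)
    # Backward pass: chain rule from the output toward the input
    d_out = (out - target) * out * (1 - out)   # dL/d(w2*h)
    grad_w2 = d_out * h                        # dL/dw2
    d_hidden = d_out * w2 * h * (1 - h)        # dL/d(w1*x)
    grad_w1 = d_hidden * x                     # dL/dw1
    return grad_w1, grad_w2


def numerical_gradient(w1, w2, x, target, eps=1e-6):
    # Central differences: (L(w + eps) - L(w - eps)) / (2 * eps)
    g1 = (loss(w1 + eps, w2, x, target) - loss(w1 - eps, w2, x, target)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, target) - loss(w1, w2 - eps, x, target)) / (2 * eps)
    return g1, g2


analytic = backprop(0.5, -1.2, x=2.0, target=1.0)
numeric = numerical_gradient(0.5, -1.2, x=2.0, target=1.0)
print(analytic)
print(numeric)
```

The two pairs should agree to many decimal places. This kind of gradient check is a common way to catch mistakes in a hand-written backward pass.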