# Deep Learning Introduction

## Requirements

To follow these lessons, you should be familiar with the following concepts and tools, either beforehand or while working through the material:
* **Machine Learning**
* **Python ML Libraries (NumPy, pandas, scikit-learn)**
* **Keras**
* **TensorFlow**

## Introduction

**Deep learning** is a *subset of machine learning* that deals with building *neural networks*. **Neural networks** attempt to mimic human intelligence by imitating how the human brain works and how ideas and information are organized and processed inside it. *Neural networks* that have *three or more layers* are usually referred to as **deep learning networks**. This subset of machine learning has grown rapidly over the past decade and has powered advances in *large-scale data processing* and *inference*. Deep learning can be applied to *natural language processing (NLP)*, *speech recognition*, *image recognition*, *maneuvering self-driving cars*, and the creation of complex algorithms for *customer experience, healthcare, and robotics*.

## Linear Regression in Deep Learning

**Linear regression** is a basic statistical concept that is common in machine learning applications. It also serves as one of the foundations of deep learning.

In *linear regression*, we have two categories of variables: (1) **x**, the *independent variable*, and (2) **y**, the *dependent variable*. Our model provides an equation that computes the value of *y* from the value of *x*. To compute this, we need *two constants*: (1) **a**, the *slope*, and (2) **b**, the *intercept*. The formula is shown below, first for a single independent variable and then for *n* independent variables.

**Linear Regression Formula**:
* *y = ax + b*
* *y = a1x1 + a2x2 + ... + anxn + b*
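
As a minimal sketch, the two forms of the formula can be written in plain Python. The function names `predict_linear` and `predict_linear_multi`, and all numeric values, are illustrative assumptions and do not come from any dataset or library.

```python
def predict_linear(x, a, b):
    """Single-variable form: y = a*x + b."""
    return a * x + b


def predict_linear_multi(xs, coeffs, b):
    """Multi-variable form: y = a1*x1 + a2*x2 + ... + an*xn + b."""
    if len(xs) != len(coeffs):
        raise ValueError("xs and coeffs must have the same length")
    return sum(a_i * x_i for a_i, x_i in zip(coeffs, xs)) + b


# Illustrative values only (assumptions, not real data).
y_single = predict_linear(3.0, a=2.0, b=1.0)                    # 2*3 + 1
y_multi = predict_linear_multi([1.0, 2.0], [0.5, 4.0], b=1.0)   # 0.5*1 + 4*2 + 1
print(y_single, y_multi)
```

In practice the same computation is usually expressed as a dot product with a library such as NumPy; plain Python is used here only to keep the sketch dependency-free.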

Note that the formula produces an output based solely on a linear relationship between *y* and *x*. In realistic situations, the relationship between the independent and dependent variables is rarely perfectly linear, and because of this *errors do occur in the predicted values*.

Linear regression is used in **solving problems that require the *prediction of continuous variables***, which are numerical in value, and can also be used when there are *multiple independent variables*.

In machine learning, and in this case deep learning, **building a linear regression model** means determining the values of the formula's constants, the *slope (a)* and the *intercept (b)*, that define the linear relationship between our *independent variable/s (x)* and *dependent variable (y)*. This is accomplished with the help of known values of *x* and *y*. The resulting model is then used to predict the value of the dependent variable for new values of *x*. The process gets more complicated when there are multiple independent variables, because one slope must be determined for each of them.
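
The text above does not say how the constants are determined. One common approach, assumed here purely for illustration, is ordinary least squares, which has a closed-form solution for a single independent variable. The function name `fit_line` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Estimate slope a and intercept b by ordinary least squares.

    a = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    b = mean_y - a * mean_x
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical known values generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)

# Use the fitted constants to predict y for a new x.
y_new = a * 4.0 + b
print(a, b, y_new)
```

Because the sample points lie exactly on a line, the fitted constants recover the slope and intercept used to generate them. With noisy real-world data, the fit minimizes the squared prediction error instead of matching every point.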

## Logistic Regression in Deep Learning

**Logistic regression** is a binary model that defines the relationship between two variables. The **output *(y)*** in this case is either **zero or one**. While it is similar to linear regression, we use an **activation function *(f)*** to convert the continuous value coming out of the *linear regression formula* into a **boolean value (0/1)**. A typical choice for *f* is the sigmoid function, which maps the continuous value to a number between 0 and 1 that is then thresholded to 0 or 1. Logistic regression can also be used when there are multiple independent variables but, just as with linear regression, it gets *more complicated*.

**Logistic Regression Formula**:
* *y = f(ax + b)*
* *y = f(a1x1 + a2x2 + ... + anxn + b)*
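
The sketch below illustrates the formula, assuming that the activation function *f* is the sigmoid followed by a threshold of 0.5. The source does not name a specific *f*, so that choice, the function names, and the numeric values are all assumptions.

```python
import math


def sigmoid(z):
    """Map any real number to the range (0, 1), avoiding overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_logistic(xs, coeffs, b, threshold=0.5):
    """Compute y = f(a1*x1 + ... + an*xn + b) and return (label, probability)."""
    z = sum(a_i * x_i for a_i, x_i in zip(coeffs, xs)) + b  # linear part
    probability = sigmoid(z)                                # squashed to (0, 1)
    label = 1 if probability >= threshold else 0            # boolean output
    return label, probability


# Illustrative values only: z = 1.5 * 2.0 - 1.0 = 2.0
label, prob = predict_logistic([2.0], [1.5], b=-1.0)
print(label, round(prob, 4))
```

The threshold is a design choice: 0.5 treats both classes symmetrically, while other values trade false positives against false negatives.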

## Perceptron in Deep Learning

A **perceptron**, or **node**, is the *single unit of learning / building block* of a *neural network*, based on the concept of *logistic regression*. Multiple inputs are fed into the perceptron for calculation, and it outputs a *boolean result (1/0)*. The formula for the perceptron/node is essential in deep learning.

The formula for the perceptron/node is based on the logistic regression formula but uses machine learning terminology. The *slope (a)* is replaced by the **weight (w)**, and the *intercept (b)* is replaced by the **bias (b)**. The formula is shown below:

**Perceptron Formula *(based on Logistic Regression)***:
* *y = f(wx + b)*
* *y = f(w1x1 + w2x2 + ... + wnxn + b)*

The illustration below shows the inner structure of a perceptron/node. It works as follows:
* *First, the perceptron receives multiple inputs: our independent variables (x) and a constant 1.*
* *Second, each input is multiplied by its corresponding weight, and the constant 1 is multiplied by the bias.*
* *Third, these products are summed.*
* *Fourth, the sum is passed through an activation function.*
* *Lastly, the result is our dependent variable (y), which is either 1 or 0.*

![]((Media)/1.PNG)
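
The steps listed above can be sketched in plain Python. This sketch assumes a simple step function as the activation, and it uses hand-picked weights and a bias that make the perceptron behave like a logical AND gate. Those values and the function names are illustrative assumptions, not part of the original material.

```python
def step(z):
    """Step activation: 1 when the weighted sum is non-negative, else 0."""
    return 1 if z >= 0 else 0


def perceptron(xs, weights, bias, activation=step):
    """Compute y = f(w1*x1 + ... + wn*xn + b) following the listed steps."""
    if len(xs) != len(weights):
        raise ValueError("xs and weights must have the same length")
    inputs = list(xs) + [1.0]             # 1. inputs plus the constant 1
    params = list(weights) + [bias]       #    the constant 1 pairs with the bias
    products = [p * v for p, v in zip(params, inputs)]  # 2. multiply
    total = sum(products)                 # 3. sum the products
    return activation(total)              # 4-5. activation gives y (1 or 0)


# Hand-picked (hypothetical) parameters that reproduce a logical AND.
and_weights, and_bias = [1.0, 1.0], -1.5
and_outputs = [
    perceptron([x1, x2], and_weights, and_bias)
    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]
]
print(and_outputs)  # [0, 0, 0, 1]
```

Here the parameters are set by hand to make the mechanics visible; in a trained network they would be learned from data.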

## Artificial Neural Network in Deep Learning

An **artificial neural network (*ANN*)** is a *network of perceptrons or nodes*. In a neural network, nodes are organized into multiple layers known as (1) the **input layer**, (2) the **hidden layers**, and (3) the **output layer**. Note that a neural network usually has **a minimum of 3 layers**.

![]((Media)/2.PNG)

Nodes within a layer *do not connect to each other*, but instead *connect to all nodes in the preceding and succeeding layers*. Each node also has its *own, often unique, values for its weights and bias*. The number of layers and nodes depends on the use case, and may also be influenced by the developer's experience.
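
To make the layered structure concrete, here is a minimal sketch of passing inputs through a fully connected network in plain Python. It assumes a sigmoid activation in every node and a 2-3-1 layout (two inputs, three hidden nodes, one output node). The weights and biases are arbitrary illustrative numbers, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to the range (0, 1), avoiding overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def dense_layer(inputs, weights, biases):
    """Every node sees all inputs; weights[j] and biases[j] belong to node j."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + node_bias)
        for node_weights, node_bias in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the outputs of each layer into the next layer."""
    activations = list(inputs)
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hidden layer: 3 nodes, each with 2 weights (one per input) and 1 bias.
hidden = ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1])
# Output layer: 1 node with 3 weights (one per hidden node) and 1 bias.
output = ([[0.7, -0.5, 0.2]], [0.05])

result = forward([1.0, 2.0], [hidden, output])
print(result)  # a single value between 0 and 1
```

The input layer holds no weights of its own in this sketch; it is simply the list of values handed to the first hidden layer.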

## Training an Artificial Neural Network

An artificial neural network is created through a *model training process*. A **neural network model** is represented by a set of *parameters*, such as the *values of weights and biases*, and *hyperparameters*, such as the *number of nodes and layers*.

**Training the neural network** means *determining the values of the model's parameters and hyperparameters* that *maximize the accuracy and/or performance of the model's predictions* for the relevant use case.

The **training process** consists of the following steps:
* **Use training data** which consists of known values of inputs and outputs.
* **Create network architecture** based on your *intuition*.
* Start with **random values** for our *weights* and *biases*.
* **Minimize prediction error** by *iteratively adjusting parameter and hyperparameter values*.
* **Save model** for integration in relevant architectures.
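
The steps above can be sketched end to end on a deliberately tiny case: a single sigmoid node trained on the logical OR truth table. This is an illustration under stated assumptions, not a recommended training setup. The update rule (stochastic gradient descent on the log loss), the learning rate, the epoch count, the random seed, and the use of JSON for saving are all choices made for this sketch, and only the weights and bias are adjusted (no hyperparameter search).

```python
import json
import math
import random


def sigmoid(z):
    """Map any real number to the range (0, 1), avoiding overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(samples, epochs=2000, learning_rate=0.5, seed=0):
    """Train one sigmoid node with stochastic gradient descent."""
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    # Start with random values for the weights and the bias.
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
    bias = rng.uniform(-1.0, 1.0)
    # Minimize prediction error through repeated small adjustments.
    for _ in range(epochs):
        for xs, target in samples:
            z = sum(w * x for w, x in zip(weights, xs)) + bias
            error = sigmoid(z) - target  # gradient of the log loss w.r.t. z
            weights = [w - learning_rate * error * x for w, x in zip(weights, xs)]
            bias -= learning_rate * error
    return weights, bias


def predict(xs, weights, bias):
    """Threshold the node's output at 0.5 to get a 0/1 label."""
    z = sum(w * x for w, x in zip(weights, xs)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


# Training data: known inputs and outputs (logical OR).
or_data = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]

weights, bias = train(or_data)
predictions = [predict(xs, weights, bias) for xs, _ in or_data]

# "Save" the model by serializing its parameters; writing this string
# to a file would persist it for later use.
saved_model = json.dumps({"weights": weights, "bias": bias})
print(predictions)
```

Libraries such as Keras and TensorFlow, listed in the requirements, automate these same steps for networks with many layers and nodes.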