# Chapter 1: Introducing Neural Networks

Artificial Neural Networks (ANNs), now usually just called Neural Networks (NNs) since 'artificial' has recently been dropped, belong to both machine learning (ML) and deep learning (DL).

A deep NN has two or more hidden layers. 

Most NNs in use are a form of deep learning.

<center><img src='./image/1-1-ai-complete-graph.jpeg' style='width: 80%'/></center>

## 1.1. A Brief History

In the 1940s, NNs were conceived.

In the 1960s, the concept of backpropagation emerged, giving people a way to train them.

In 2010, NNs started winning competitions and drew far more attention than before.

Since 2010, NNs have been on a meteoric rise thanks to their seemingly magical ability to solve problems previously deemed unsolvable (e.g., image captioning, language translation, audio and video synthesis, and more).

Currently, NNs are the primary approach in most competitions and for technological challenges such as self-driving cars, calculating risk, detecting fraud, early cancer detection, and more.

## 1.2. What is a Neural Network?

ANNs are inspired by the organic brain, translated to the computer.

ANNs have neurons, activations, and interconnectivities.

NNs are considered “black boxes” between inputs and outputs.

<center><img src='./image/1-2-example-nn.png' style='width: 60%'/></center>

Each connection between neurons has a weight associated with it. Each weight is multiplied by its corresponding input value. The resulting products flow into the neuron, where they are summed and a bias is added. Weights and biases are the trainable (tunable) parameters.

$$
\begin{aligned}
\text{output} & = \text{weight} \cdot \text{input} + \text{bias} \\
y & = a \cdot x + b
\end{aligned}
$$

Adjusting the weight will impact the slope of the function.

<center><img src='./image/1-3-adjust-weight.png' style='width: 60%'/></center>

<center><img src='./image/1-4-adjust-weight.png' style='width: 60%'/></center>

<center><img src='./image/1-5-adjust-weight.png' style='width: 60%'/></center>

The bias offsets the overall function.

<center><img src='./image/1-6-adjust-bias.png' style='width: 60%'/></center>

<center><img src='./image/1-7-adjust-bias.png' style='width: 60%'/></center>

Weights and biases impact the outputs of neurons in slightly different ways.
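
The difference between the two parameters can be checked numerically. The following is a minimal sketch for a single-input neuron without activation; the function name `neuron_output` and the sample values are illustrative assumptions, not taken from the figures.

```python
def neuron_output(x, weight, bias):
    """Single-input neuron without activation: y = weight * x + bias."""
    return weight * x + bias

xs = [-1.0, 0.0, 1.0]

# Changing the weight changes the slope: outputs spread out around x = 0.
base = [neuron_output(x, weight=1.0, bias=0.0) for x in xs]
steeper = [neuron_output(x, weight=2.0, bias=0.0) for x in xs]

# Changing the bias shifts every output by the same amount.
shifted = [neuron_output(x, weight=1.0, bias=2.0) for x in xs]

print(base)     # [-1.0, 0.0, 1.0]
print(steeper)  # [-2.0, 0.0, 2.0]
print(shifted)  # [1.0, 2.0, 3.0]
```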

Then, an activation function is applied to the output.

$$
\begin{aligned}
\text{output} & = \sum (\text{weight} \cdot \text{input}) + \text{bias} \\
\text{output} & = \text{activation}(\text{output})
\end{aligned}
$$

When a step function, which mimics a neuron in the brain (i.e., “firing” or not, like an on-off switch), is used as the activation function:
- If the neuron's weighted sum plus bias is greater than 0, the neuron fires (it outputs 1).
- If it is less than 0, the neuron does not fire and passes along a 0.
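
The weighted sum followed by a step activation could be sketched as below. The names `step` and `neuron` and the sample numbers are hypothetical, and treating an input of exactly 0 as "not firing" is an assumption, since the rule above only covers values above and below 0.

```python
def step(value):
    """Step activation: 1 if value > 0, otherwise 0 (0 itself is assumed not to fire)."""
    return 1 if value > 0 else 0

def neuron(inputs, weights, bias):
    """Sum each weight * input pair, add the bias, then apply the step activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(total)

inputs = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 0.25]

# Weighted sum is 0.5 - 2.0 + 0.75 = -0.75
print(neuron(inputs, weights, bias=1.0))  # -0.75 + 1.0 = 0.25 -> 1
print(neuron(inputs, weights, bias=0.0))  # -0.75 -> 0
```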

Today's NNs tend to use more informative activation functions than a step function, such as the Rectified Linear Unit (ReLU) activation function.
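
For comparison, ReLU can be written in one line. This is a minimal sketch with an illustrative function name `relu`; unlike a step function, it keeps the magnitude of positive values instead of collapsing them to 1.

```python
def relu(value):
    """Rectified Linear Unit: positive values pass through, everything else becomes 0."""
    return max(0.0, value)

print([relu(v) for v in [-2.0, -0.5, 0.0, 1.5]])  # [0.0, 0.0, 0.0, 1.5]
```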

Example basic neural networks:

<center><img src='./image/1-8-basic-nn.png' style='width: 60%'/></center>

The input layer represents the actual input data (e.g., pixel values from an image, temperature, …)

- The data can be “raw”, but it should usually be preprocessed, for example with normalization and scaling.
- The input needs to be in numeric form.

The output layer is whatever the NN returns.
- In classification, where the class of the input is predicted, the output layer usually has as many neurons as the training dataset has classes. For binary (two-class) classification, it can instead have a single output neuron.

For example, suppose our goal is to classify a collection of pictures as “dog” or “cat”:

- The output layer can have two neurons, one associated with “dog” and one associated with “cat”.
- Alternatively, it can have just a single output neuron that means “dog” or “not dog”.

<center><img src='./image/1-9-nn-dog.png' style='width: 60%'/></center>

<center><img src='./image/1-10-nn-dog.png' style='width: 60%'/></center>
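
Both output layouts can be turned into a prediction with a few lines of code. This is a rough sketch: the function names, the sample output values, and the 0.5 threshold for the single-neuron case are assumptions made for illustration.

```python
labels = ["dog", "cat"]

def predict_two_neurons(outputs):
    """Pick the label whose output neuron has the largest value."""
    best_index = max(range(len(outputs)), key=lambda i: outputs[i])
    return labels[best_index]

def predict_single_neuron(output, threshold=0.5):
    """Treat one output as a "dog" score; the threshold is an assumed cut-off."""
    return "dog" if output > threshold else "not dog"

print(predict_two_neurons([0.8, 0.2]))  # dog
print(predict_two_neurons([0.1, 0.9]))  # cat
print(predict_single_neuron(0.3))       # not dog
```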

The math involved can make NNs appear challenging, and the formulas can sometimes look scary.

The full formula for the forward pass of an example NN model:

<center><img src='./image/1-11-loss-function.png' style='width: 90%'/></center>

This function can also be represented as nested Python functions like:

<center><img src='./image/1-12-loss-function-code.png' style='width: 80%'/></center>

High school algebra is enough to understand the operations involved:
- A log function
- A sum operation
- An exponentiating operation
- A dot product
- Transpose
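
Each of these operations is small on its own. The snippet below is a plain-Python sketch using only the standard library; the helper names `dot` and `transpose` are illustrative, and real NN code would more likely rely on a numerical library.

```python
import math

def dot(a, b):
    """Dot product: multiply matching elements and add the results."""
    return sum(x * y for x, y in zip(a, b))

def transpose(matrix):
    """Transpose: rows become columns."""
    return [list(column) for column in zip(*matrix)]

print(math.log(1))                        # 0.0
print(sum([1, 2, 3]))                     # 6
print(math.exp(0))                        # 1.0
print(dot([1, 2, 3], [4, 5, 6]))          # 32
print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```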

A typical NN has thousands or even millions of adjustable parameters (weights and biases).
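
To see how quickly the count grows, the sketch below tallies the parameters of a fully connected network. The function name `count_parameters` and the layer sizes are hypothetical, and it assumes every neuron connects to every neuron in the previous layer.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of neurons per layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection
        total += n_out         # one bias per neuron in the receiving layer
    return total

print(count_parameters([4, 3, 2]))            # 23
print(count_parameters([784, 128, 64, 10]))   # 109386
```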

NNs act as enormous functions with vast numbers of parameters.

Finding the right combination of parameter (weight and bias) values is the challenging part.

The end goal for NNs is to adjust their weights and biases (the parameters), so they produce the desired output for unseen data.

A major issue in supervised learning is overfitting, where the algorithm does not learn the underlying input-output dependencies and instead just “memorizes” the training data.

The goal of an NN is generalization, which can be assessed by separating the data into training data and validation data.
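
One simple way to separate the data is a shuffled split. This is a minimal sketch using only the standard library; the function name, the 20% validation fraction, and the fixed seed are assumptions chosen for illustration.

```python
import random

def train_validation_split(samples, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and hold out a fraction for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

data = list(range(10))
train, validation = train_validation_split(data)
print(len(train), len(validation))  # 8 2
```

The model is fitted on `train` only; its error on `validation`, which it has never seen, indicates how well it generalizes.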

Weights and biases are adjusted based on the error (loss), which measures how “wrong” the NN is when predicting the output.
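
As an illustration of a loss, the sketch below uses mean squared error. This is a stand-in picked for simplicity, not necessarily the loss shown in the formula earlier in this chapter, and the sample values are made up.

```python
def mean_squared_error(predictions, targets):
    """Average of squared differences: 0 means perfect, larger means more wrong."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared) / len(targets)

targets = [1.0, 0.0, 1.0]
print(mean_squared_error([1.0, 0.0, 1.0], targets))  # 0.0
print(mean_squared_error([0.0, 1.0, 0.0], targets))  # 1.0
```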

NNs can be used for regression (predicting a scalar, singular value), clustering (assigning unstructured data into groups), and many other tasks.




