# Neural Network Definition
(https://skymind.ai/wiki/neural-network)

- Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. 
- They interpret sensory data through a kind of machine perception, labeling or clustering raw input. 
- The patterns they recognize are numerical, contained in vectors, into which all real-world data, be it images, sound, text or time series, must be translated.
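
As a rough illustration of what "translating data into vectors" can look like, here is a minimal sketch in plain Python. The helper names `bag_of_words` and `flatten_image`, the vocabulary, and the sample values are all hypothetical, chosen only for this example; real pipelines typically use richer encodings.

```python
def bag_of_words(text, vocabulary):
    """Encode text as a vector of word counts, one entry per vocabulary term."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]


def flatten_image(pixels):
    """Encode a 2D grid of grayscale pixel values as one flat vector."""
    return [value for row in pixels for value in row]


# Hypothetical vocabulary and inputs, used only for illustration.
vocabulary = ["free", "money", "meeting", "tomorrow"]
text_vector = bag_of_words("Free money free prizes", vocabulary)
image_vector = flatten_image([[0, 255], [128, 64]])

print(text_vector)
print(image_vector)
```

Either vector could then be fed to a network as a list of numbers, which is the only form of input it understands.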

- Neural networks help us **cluster** and **classify**.
    - You can think of them as a clustering and classification layer on top of the data you store and manage. 
    - They help to group unlabeled data according to similarities among the example inputs, and they classify data when they have a labeled dataset to train on. 
    - (Neural networks can also extract features that are fed to other algorithms for clustering and classification; so you can think of deep neural networks as components of larger machine-learning applications involving algorithms for reinforcement learning, classification and regression.)
    
![image.png](attachment:image.png)

- What kind of problems does deep learning solve, and more importantly, can it solve yours? To know the answer, you need to ask questions:

- What outcomes do I care about? Those outcomes are labels that could be applied to data: 
    - for example, **spam** or **not_spam** in an email filter, **good_guy** or **bad_guy** in fraud detection, **angry_customer** or **happy_customer** in customer relationship management.

- Do I have the data to accompany those labels? 
    - That is, can I find labeled data, or can I create a labeled dataset (with a service like **AWS Mechanical Turk** or Figure Eight or Mighty.ai) where spam has been labeled as spam, in order to teach an algorithm the correlation between labels and inputs?
    
![image.png](attachment:image.png)

# A Few Concrete Examples

Deep learning maps **inputs** to **outputs**. The model finds **correlations** between them. 

- It is known as a “universal approximator”, because it can learn to approximate an unknown function **$f(x) = y$** between any **input $x$** and any **output $y$**, assuming they are related at all (by correlation or causation, for example). 

- In the process of learning, a neural network finds the right $f$, or the correct manner of **transforming $x$ into $y$**, whether that be $f(x) = 3x + 12$ or $f(x) = 9x - 0.1$. Here are a few examples of what deep learning can do.
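
To make the idea of "finding the right $f$" concrete, the sketch below recovers $f(x) = 3x + 12$ from example pairs using gradient descent on a single weight and bias. This is a deliberately tiny stand-in for a neural network, not a full one; the function name `fit_line`, the learning rate, and the number of epochs are assumptions picked for this example.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=1000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = 0.0, 0.0
        for x, y in zip(xs, ys):
            error = (w * x + b) - y          # how far the current guess is off
            grad_w += 2.0 * error * x / n    # gradient of the loss w.r.t. w
            grad_b += 2.0 * error / n        # gradient of the loss w.r.t. b
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Training data generated from the "unknown" function f(x) = 3x + 12.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [3 * x + 12 for x in xs]

w, b = fit_line(xs, ys)
print(w, b)  # should end up very close to 3 and 12
```

The model never sees the formula, only the pairs of inputs and outputs, yet its parameters settle near the values that generated the data.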

# Classification

All classification tasks depend upon labeled datasets; that is, humans must transfer their knowledge to the dataset in order for a neural network to learn the correlation between labels and data. This is known as [supervised learning](https://skymind.ai/wiki/supervised-learning).
- Detect faces, identify people in images, recognize facial expressions (angry, joyful)
- Identify objects in images (stop signs, pedestrians, lane markers…)
- Recognize gestures in video
- Detect voices, identify speakers, transcribe speech to text, recognize sentiment in voices
- Classify text as spam (in emails), or fraudulent (in insurance claims); recognize sentiment in text (customer feedback)
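
As a minimal sketch of supervised classification, the code below trains a single sigmoid unit (logistic regression) on a tiny labeled dataset. The two features (counts of the words "free" and "meeting"), the labels, and the helper names `train_classifier` and `classify` are hypothetical, invented purely for illustration; a real spam filter would use far more data and features.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_classifier(examples, learning_rate=0.5, epochs=200):
    """Learn weights and a bias from (features, label) pairs, label in {0, 1}."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            score = sum(w * x for w, x in zip(weights, features)) + bias
            error = label - sigmoid(score)  # positive if the guess was too low
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


def classify(features, weights, bias):
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return "spam" if sigmoid(score) >= 0.5 else "not_spam"


# Hypothetical features: [count of "free", count of "meeting"]; 1 = spam.
examples = [([3, 0], 1), ([2, 0], 1), ([1, 0], 1),
            ([0, 2], 0), ([0, 1], 0), ([0, 3], 0)]

weights, bias = train_classifier(examples)
print(classify([4, 0], weights, bias))
print(classify([0, 4], weights, bias))
```

The human-provided labels are what let the model tie a pattern in the features to a category name.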

![image.png](attachment:image.png)

# Clustering

Clustering or grouping is the detection of similarities. Deep learning does not require labels to detect similarities. Learning without labels is called [unsupervised learning](https://skymind.ai/wiki/unsupervised-learning). Unlabeled data is the majority of data in the world. A common rule of thumb in machine learning is that the more data an algorithm can train on, the more accurate it tends to be. Therefore, unsupervised learning has the potential to produce highly accurate models.

- **Search**: Comparing documents, images or sounds to surface similar items.
- **Anomaly detection**: The flipside of detecting similarities is detecting anomalies, or unusual behavior. In many cases, unusual behavior correlates highly with things you want to detect and prevent, such as fraud.
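
The two uses above can be sketched without any labels at all, working only on vectors. In the example below the vectors are hand-written stand-ins for what a network might produce; the function names, the sample data, and the distance threshold are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Similarity of direction between two vectors, from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, items):
    """Search: return the index of the item most similar to the query."""
    return max(range(len(items)),
               key=lambda i: cosine_similarity(query, items[i]))


def find_anomalies(items, threshold):
    """Anomaly detection: flag items far from the average item."""
    dims = len(items[0])
    centroid = [sum(v[d] for v in items) / len(items) for d in range(dims)]
    return [i for i, v in enumerate(items)
            if math.dist(v, centroid) > threshold]


# Hypothetical item vectors; the last one is deliberately unusual.
items = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0], [0.1, 5.0]]

best_match = most_similar([1.0, 0.15], items)
anomalies = find_anomalies(items, threshold=2.0)
print(best_match, anomalies)
```

Search and anomaly detection are two readings of the same distance measurements: small distances surface similar items, large ones surface outliers.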

![image.png](attachment:image.png)

# Predictive Analytics: Regressions

With classification, deep learning is able to establish correlations between, say, pixels in an image and the name of a person. You might call this a static prediction. 

By the same token, exposed to enough of the right data, deep learning is able to establish **correlations between present events and future events**. It can run regression between the past and the future. 

The **future event** is like the label in a sense. Deep learning doesn’t necessarily care about time, or the fact that something hasn’t happened yet. Given a time series, deep learning may read a string of numbers and **predict the number most likely to occur next**.

- Hardware breakdowns (data centers, manufacturing, transport)
- Health breakdowns (strokes, heart attacks based on vital stats and data from wearables)
- Customer churn (predicting the likelihood that a customer will leave, based on web activity and metadata)
- Employee turnover (ditto, but for employees)
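
The idea of treating the next value as the label can be sketched with a very small model. Note that the code below uses ordinary least-squares linear regression rather than a neural network, simply to keep the example short; the function names and the sample series are assumptions for illustration.

```python
def fit_next_value_model(series):
    """Fit next = slope * previous + intercept by least squares."""
    previous = series[:-1]   # inputs: each value in the series...
    following = series[1:]   # labels: ...paired with the value after it
    n = len(previous)
    mean_prev = sum(previous) / n
    mean_next = sum(following) / n
    covariance = sum((p - mean_prev) * (q - mean_next)
                     for p, q in zip(previous, following))
    variance = sum((p - mean_prev) ** 2 for p in previous)
    slope = covariance / variance
    intercept = mean_next - slope * mean_prev
    return slope, intercept


def predict_next(series, slope, intercept):
    return slope * series[-1] + intercept


# Hypothetical time series with a steady upward trend.
series = [2.0, 4.0, 6.0, 8.0, 10.0]
slope, intercept = fit_next_value_model(series)
next_value = predict_next(series, slope, intercept)
print(next_value)
```

Nothing in the fitting step knows about time; the past simply plays the role of the input and the future plays the role of the label.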

# Neural Network Elements

Deep learning is the name we use for **“stacked neural networks”**; that is, networks composed of several layers.

- The layers are made of **nodes**. A **node** is just a place where computation happens, loosely patterned on a neuron in the human brain, which fires when it encounters sufficient stimuli. 
- A node combines input from the data with **a set of coefficients**, or **weights**, that either amplify or dampen that input, thereby assigning significance to inputs with regard to the task the algorithm is trying to learn; e.g. which input is most helpful in classifying data without error? 
- These input-weight products are summed and then the sum is passed through a node’s so-called **activation function**, to determine whether and to what extent that signal should progress further through the network to affect the ultimate outcome, say, an act of classification. If the signal passes through, the neuron has been “activated.”

Here’s a diagram of what one node might look like.

![image.png](attachment:image.png)

1. First, it adds up the value of every neuron from the previous column it is connected to. In the figure, there are 3 inputs (x1, x2, x3) coming to the neuron, so 3 neurons of the previous column are connected to our neuron.

2. Before being added, each value is multiplied by another variable called a “weight” (w1, w2, w3), which determines the strength of the connection between the two neurons. Each connection between neurons has its own weight, and in the simple network described here those are the only values that will be modified during the learning process.
- Moreover, a bias value may be added to the total. It does not come from a specific neuron. Here it is chosen before the learning phase, although in most networks the bias is learned along with the weights.

3. After all those summations, the neuron finally applies a function called “activation function” to the obtained value.
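
The three steps above can be sketched for a single neuron as follows. The sigmoid is assumed as the activation function, and the input values, weights, and bias are arbitrary numbers chosen for illustration.

```python
import math


def sigmoid(z):
    """Example activation function (an assumption for this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias=0.0):
    # Steps 1 and 2: multiply each input by its weight, then add everything up.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Optional bias added to the total.
    weighted_sum += bias
    # Step 3: apply the activation function to the result.
    return sigmoid(weighted_sum)


# Three inputs (x1, x2, x3) and three weights (w1, w2, w3), as in the figure.
output = neuron_output([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], bias=0.2)
print(output)
```

Here the weighted sum works out to about 0.5, and the activation turns it into a value between 0 and 1.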


![image.png](attachment:image.png)

The so-called activation function usually serves to turn the total value calculated before into a number between 0 and 1 (done, for example, by the sigmoid function shown in the figure). Other functions exist and may change the limits of the output, but they generally keep the same aim of limiting the value.
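
A small sketch can show how two common activation functions limit their output to different ranges. The sample inputs are arbitrary, and the choice of sigmoid and tanh as the two examples is an assumption made for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Squash any real number into the open interval (-1, 1)."""
    return math.tanh(z)


sample_inputs = [-5.0, -1.0, 0.0, 1.0, 5.0]
sigmoid_outputs = [sigmoid(z) for z in sample_inputs]
tanh_outputs = [tanh(z) for z in sample_inputs]

for z, s, t in zip(sample_inputs, sigmoid_outputs, tanh_outputs):
    print(f"z = {z:5.1f}   sigmoid = {s:.4f}   tanh = {t:.4f}")
```

Large negative inputs are pushed toward the lower limit and large positive inputs toward the upper limit, while inputs near zero land in the middle of the range.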

A node layer is a row of those neuron-like switches that turn on or off as the input is fed through the net. Each layer’s output is simultaneously the subsequent layer’s input, starting from an initial input layer receiving your data.

![image.png](attachment:image.png)

Pairing the model’s adjustable weights with input features is how we assign significance to those features with regard to how the neural network classifies and clusters input.
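
The way one layer's output becomes the next layer's input can be sketched as a loop over layers. The layer sizes, weights, and biases below are arbitrary values invented for illustration, and the sigmoid is assumed as the activation for every node.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Compute one layer; weights[j] holds the incoming weights of node j."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, node_weights)) + b)
            for node_weights, b in zip(weights, biases)]


def network_forward(inputs, layers):
    """Feed the input through each layer in turn."""
    activations = inputs
    for weights, biases in layers:
        # The output of this layer is the input of the next one.
        activations = layer_forward(activations, weights, biases)
    return activations


# Hypothetical network: 3 inputs -> 2 hidden nodes -> 1 output node.
layers = [
    ([[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]], [0.0, 0.1]),
    ([[1.5, -2.0]], [0.05]),
]

prediction = network_forward([1.0, 0.5, -1.0], layers)
print(prediction)
```

Each inner list of weights pairs one node with every feature arriving from the previous layer, which is where the significance of each feature is encoded.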

# Key Concepts of Deep Neural Networks

**Deep-learning networks** are distinguished from the more commonplace single-hidden-layer neural networks by their depth; that is, the number of node layers through which data must pass in a multistep process of pattern recognition.

Earlier versions of neural networks such as the first [perceptrons](https://skymind.ai/wiki/multilayer-perceptron) were shallow, composed of one input and one output layer, and at most one hidden layer in between. 

More than three layers (including input and output) qualifies as “deep” learning. So deep is not just a buzzword to make algorithms seem like they read Sartre and listen to bands you haven’t heard of yet. It is a strictly defined term that means more than one hidden layer.

In deep-learning networks, **each layer of nodes trains on a distinct set of features** based on the previous layer’s output. 

The further you advance into the neural net, the more complex the features your nodes can recognize, since they aggregate and recombine features from the previous layer.

![image.png](attachment:image.png)

This is known as **feature hierarchy**, and it is a hierarchy of increasing complexity and abstraction. It makes deep-learning networks capable of handling very large, high-dimensional data sets with billions of parameters that pass through nonlinear functions.

Above all, these neural nets are capable of discovering latent structures within **unlabeled**, **unstructured data**, which is the vast majority of data in the world. 
-  Another word for unstructured data is raw media; i.e. pictures, texts, video and audio recordings. 

Therefore, one of the problems deep learning solves best is in processing and clustering the world’s raw, unlabeled media, discerning similarities and anomalies in data that no human has organized in a relational database or ever put a name to.

For example, deep learning can take a million images, and cluster them according to their similarities: 
- cats in one corner, ice breakers in another, and in a third all the photos of your grandmother.

This is the basis of so-called smart photo albums.

Deep-learning networks perform **automatic feature extraction** without **human intervention**, unlike most traditional machine-learning algorithms. 

Given that feature extraction is a task that can take teams of data scientists years to accomplish, deep learning is a way to circumvent the chokepoint of limited experts. It augments the powers of small data science teams, which by their nature do not scale.

![image.png](attachment:image.png)

# Example: Feedforward Networks

Our goal in using a neural net is to arrive at the point of **least error as fast as possible**. 
- We are running a race, and the race is around a track, so we pass the same points repeatedly in a loop. 
- The starting line for the race is the state in which our weights are initialized, and the finish line is the state of those parameters when they are capable of producing sufficiently accurate classifications and predictions.

The race itself involves **many steps**, and each of **those steps resembles the steps before and after**. 
- Just like a runner, we will engage in a repetitive act over and over to arrive at the finish. 
- **Each step** for a neural network involves **a guess**, an error measurement and a slight update in its weights, an incremental adjustment to the coefficients, as it slowly learns to pay attention to the most important features.

A collection of weights, whether they are in their start or end state, is also called **a model**, because it is an attempt to model data’s relationship to ground-truth labels, to grasp the data’s structure. 

Models normally start out bad and end up less bad, **changing over time as the neural network updates its parameters**.

This is because a neural network is born in ignorance. It does not know which weights and biases will translate the input best to make the correct guesses. 

It has to start out with a guess, and then try to make better guesses sequentially as it learns from its mistakes.

(You can think of a neural network as a miniature enactment of the scientific method, testing hypotheses and trying again – only it is the scientific method with a blindfold on. Or like a child: they are born not knowing much, and through exposure to life experience, they slowly learn to solve problems in the world. For neural networks, data is the only experience.)

Here is a simple explanation of what happens during learning with a feedforward neural network, the simplest architecture to explain.

(1) Input enters the network. The **coefficients, or weights**, map that input to a set of guesses the network makes at the end.

$ input * weight = guess $

(2) Weighted input results in a guess about what that input is. The network then takes its guess and compares it to a ground-truth about the data, effectively asking an expert “Did I get this right?”

$ground truth - guess = error$

(3) The difference between the network’s guess and the ground truth is its error. The network measures that error, and walks the error back over its model, adjusting weights to the extent that they contributed to the error.

$error * \mbox{weight's contribution to error} = adjustment$

The three pseudo-mathematical formulas above account for the three key functions of neural networks: 
1. scoring input, 
2. calculating loss and
3. applying an update to the model 

Loop the three-step process over again. A neural network is a **corrective feedback loop**, rewarding weights that support its correct guesses, and punishing weights that lead it to err.
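
The three pseudo-formulas can be turned into a runnable loop for the simplest possible model: one input, one weight, no activation function. This is a minimal sketch under those assumptions; the learning rate, the number of epochs, and the training pairs are invented for illustration.

```python
def train_single_weight(examples, learning_rate=0.1, epochs=50):
    """Repeat guess -> error -> adjustment over (input, ground truth) pairs."""
    weight = 0.0  # the model starts out knowing nothing
    for _ in range(epochs):
        for x, ground_truth in examples:
            guess = x * weight               # (1) input * weight = guess
            error = ground_truth - guess     # (2) ground truth - guess = error
            # (3) error * weight's contribution to error = adjustment;
            # for this model the contribution is the input x itself.
            adjustment = learning_rate * error * x
            weight += adjustment
    return weight


# Hypothetical data following ground_truth = 2 * input.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
weight = train_single_weight(examples)
print(weight)  # should end up very close to 2
```

Each pass through the loop shrinks the error a little, which is the corrective feedback described above.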

# How does a neural network learn?

Yep, creating variables and making them interact with each other is great, but that is not enough to make the whole neural network learn by itself. We need to prepare a lot of data to give to our network. Those data include the inputs and the output expected from the neural network.

To determine which weights to modify, a particular process called **“backpropagation”** is used. We won’t linger too much on that, since the neural network we will build doesn’t use this exact process, but it consists of going back through the neural network and inspecting every connection to check how the output would change in response to a change in the weight.

![image.png](attachment:image.png)
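
The idea of checking how the output reacts to a change in each weight can be sketched by nudging one weight at a time and measuring the effect. Note that this is a numerical approximation used here for illustration, not backpropagation itself, which obtains the same quantities analytically with the chain rule. The function names, step size, and values are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights):
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)))


def weight_sensitivities(inputs, weights, step=1e-6):
    """Estimate how much the output changes per unit change in each weight."""
    baseline = neuron_output(inputs, weights)
    sensitivities = []
    for i in range(len(weights)):
        nudged = list(weights)
        nudged[i] += step  # change one connection, leave the others alone
        change = neuron_output(inputs, nudged) - baseline
        sensitivities.append(change / step)
    return sensitivities


inputs = [1.0, 2.0, 3.0]
weights = [0.5, -0.25, 0.1]
sensitivities = weight_sensitivities(inputs, weights)
print(sensitivities)
```

Weights attached to larger inputs move the output more, so they receive a larger share of any correction.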