# Introduction to Deep Learning (Neural Networks)

By `Atwine Mugume Twinamatsiko`

<img src='deep1.jpeg'/>

## What is deep learning?
- Deep learning is a branch of machine learning that uses multi-layer artificial neural networks, loosely inspired by the workings of the human brain, to process data for use in decision making.
- Deep learning models are able to learn from data that is both unstructured and unlabeled.


## Crash Course Overview
We are going to cover a lot of ground in this lesson. Here is an idea of what is ahead:
1. Multilayer Perceptrons.
2. Neurons, Weights and Activations.
3. Networks of Neurons.
4. Training Networks.

# Biological Neuron
A human brain has billions of neurons. Neurons are interconnected nerve cells in the human brain that are involved in processing and transmitting chemical and electrical signals. Dendrites are branches that receive information from other neurons.

<img src='deep4.jpg'/>

The cell body (soma), which contains the nucleus, processes the information received from the dendrites. The axon is a cable that the neuron uses to send information. A synapse is the connection between an axon and the dendrites of another neuron.

### Artificial Neuron
An artificial neuron is a mathematical function based on a model of biological neurons, where each neuron takes inputs, weighs them separately, sums them up and passes this sum through a nonlinear function to produce output.

<img src='deep5.jpeg'/>

# Neurons

The building blocks of neural networks are artificial neurons. These are simple computational units that have weighted input signals and produce an output signal using an activation function.

<img src='deep2.png'/>

### Neuron Weights

You may be familiar with linear regression, in which case the weights on the inputs are very much like the coefficients used in a regression equation. Like linear regression, each neuron also has a bias, which can be thought of as an input that always has the value 1.0 and must also be weighted. For example, a neuron with two inputs requires three weights: one for each input and one for the bias.
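The two-input example above can be sketched in a few lines of Python. This is only an illustration: the input values, the weights and the helper name `weighted_sum` are made up for the example and do not come from any library.

```python
# Hypothetical values, chosen only for illustration.
inputs = [0.5, -1.0]      # two input signals
weights = [0.8, 0.2]      # one weight per input
bias_weight = 0.1         # weight on the constant bias input of 1.0


def weighted_sum(inputs, weights, bias_weight):
    """Return sum(w_i * x_i) plus the weighted bias input (bias_weight * 1.0)."""
    total = bias_weight * 1.0
    for x, w in zip(inputs, weights):
        total += w * x
    return total


z = weighted_sum(inputs, weights, bias_weight)
print(z)  # the value that is later passed to the activation function
```

Two inputs plus the bias give three weights in total, exactly as in a two-variable linear regression with an intercept.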

### Activation

The weighted inputs are summed and passed through an activation function, sometimes called a
transfer function. An activation function is a simple mapping of summed weighted input to the
output of the neuron. It is called an activation function because it governs the threshold at
which the neuron is activated and the strength of the output signal.
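As a rough sketch of that idea, the snippet below applies two possible activation functions to a few summed inputs. The function names `step` and `tanh_activation` and the sample values are assumptions made for this example. The step function only decides whether the neuron fires, while the tanh function also controls the strength of the output.

```python
import math


def step(z, threshold=0.0):
    """Fire (1.0) only when the summed input is above the threshold."""
    return 1.0 if z > threshold else 0.0


def tanh_activation(z):
    """Smooth activation whose output always lies between -1 and 1."""
    return math.tanh(z)


# Hypothetical summed inputs, used only to compare the two functions.
for z in (-2.0, -0.1, 0.0, 0.1, 2.0):
    print(z, step(z), round(tanh_activation(z), 3))
```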

### Networks of Neurons
Neurons are arranged into networks of neurons. A row of neurons is called a layer and one
network can have multiple layers. The architecture of the neurons in the network is often called the network topology.

<img src='deep3.png'/>


### Input or Visible Layers
The bottom layer that takes input from your dataset is called the visible layer, because it is
the exposed part of the network.


### Hidden Layers
Layers after the input layer are called hidden layers because they are not directly exposed to
the input.


### Output Layer

The final layer is called the output layer and it is responsible for outputting a value or vector of values that correspond to the format required for the problem. The choice of activation function in the output layer is strongly constrained by the type of problem that you are modeling.


### Stochastic Gradient Descent

The classical and still widely used training algorithm for neural networks is called stochastic gradient descent. This is where one row of data is exposed to the network at a time as input. The network processes the input upward, activating neurons as it goes, to finally produce an output value. This is called a forward pass on the network. It is the type of pass that is also used after the network is trained in order to make predictions on new data.

The output of the network is compared to the expected output and an error is calculated. This error is then propagated back through the network, one layer at a time, and the weights are updated according to the amount that they contributed to the error. This clever bit of math is called the backpropagation algorithm.

The process is repeated for all of the examples in your training data. One round of updating the network for the entire training dataset is called an epoch. A network may be trained for tens, hundreds or many thousands of epochs.
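The loop described above can be sketched for the simplest possible "network": a single linear neuron with one weight and one bias. Everything here is an assumption made for illustration, including the function name `train_sgd`, the squared-error loss, the learning rate and the toy data generated from `y = 2x + 1`. A real network would have many weights and would use backpropagation to get the gradient of each one.

```python
import random


def train_sgd(data, learning_rate=0.05, epochs=500, seed=0):
    """Fit pred = w * x + b with stochastic gradient descent, one row at a time."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    rows = list(data)
    for _ in range(epochs):            # one epoch = one pass over all rows
        rng.shuffle(rows)              # visit the rows in a random order
        for x, target in rows:
            pred = w * x + b           # forward pass
            error = pred - target      # compare with the expected output
            # Gradient of 0.5 * error**2 is error * x for w and error for b.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


# Toy training set generated from y = 2x + 1 (hypothetical data).
data = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
w, b = train_sgd(data)
print(w, b)  # should end up close to 2 and 1
```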

# What the heck is a Perceptron?

A Perceptron is an algorithm for supervised learning of binary classifiers. This algorithm enables neurons to learn and processes elements in the training set one at a time.

<img src='deep6.jpg'/>

There are two types of Perceptrons: Single layer and Multilayer.

Single layer Perceptrons can learn only linearly separable patterns.

Multilayer Perceptrons, or feedforward neural networks with two or more layers, have greater processing power because they can also learn patterns that are not linearly separable.

The Perceptron algorithm learns the weights for the input signals in order to draw a linear decision boundary.


### Perceptron Learning Rule
The Perceptron Learning Rule states that the algorithm automatically learns the weight coefficients from the training examples. The input features are then multiplied by these weights to determine whether a neuron fires or not.

<img src='deep7.jpg'/>
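A minimal sketch of the learning rule, assuming the common update `weight += learning_rate * (target - prediction) * input`, is shown below. The function names, the learning rate, the number of epochs and the choice of the logical AND gate as training data are all assumptions made for this example. AND is linearly separable, so a single perceptron can learn it.

```python
def predict(weights, bias, x):
    """Fire (1) when the weighted sum of the inputs plus the bias is positive."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


def train_perceptron(samples, learning_rate=1.0, epochs=10):
    """Update the weights one training example at a time."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)   # -1, 0 or +1
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Truth table of the logical AND gate (hypothetical training set).
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
predictions = [predict(weights, bias, x) for x, _ in and_gate]
print(predictions)  # → [0, 0, 0, 1]
```

The weights only change when a prediction is wrong, which is why training stops moving once every example is classified correctly.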



### What is an Activation Function?

It is simply the function that you use to get the output of a node. It is also known as a transfer function.

It is used to determine the output of a neural network, such as yes or no. It maps the resulting values into a range such as 0 to 1 or -1 to 1, depending on the function.

### Sigmoid or Logistic Activation Function

<img src='deep8.png'/>

The main reason we use the sigmoid function is that its output lies between 0 and 1. Therefore, it is especially used for models where we have to predict a probability as the output. Since a probability only takes values between 0 and 1, sigmoid is a natural choice.

The function is differentiable, which means we can find the slope of the sigmoid curve at any point.

The function is monotonic, but its derivative is not.

The logistic sigmoid function can cause a neural network to get stuck during training: for inputs of large magnitude the curve flattens, so the gradient becomes very small and the weights barely change.

The softmax function is a more generalized logistic activation function which is used for multiclass classification.
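The properties listed above can be checked with a short sketch. The function names are chosen for this example, and the code uses the standard definitions `sigmoid(z) = 1 / (1 + e^(-z))` and `softmax(z)_i = e^(z_i) / sum_j e^(z_j)`. It is meant for moderate input values, since `math.exp` overflows for very large arguments.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Slope of the sigmoid at z: largest at z = 0, close to zero far from it."""
    s = sigmoid(z)
    return s * (1.0 - s)


def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


print(sigmoid(0.0), sigmoid_derivative(0.0))   # → 0.5 0.25
print(sigmoid_derivative(10.0))                # tiny slope: the "stuck" region
print(softmax([1.0, 2.0, 3.0]))                # three class probabilities
```

With only two classes and scores `[z, 0]`, the first softmax output equals `sigmoid(z)`, which is the sense in which softmax generalizes the logistic function.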

### ReLU (Rectified Linear Unit) Activation Function
The ReLU is one of the most widely used activation functions right now, since it is used in almost all convolutional neural networks and deep learning models. It outputs the input unchanged when the input is positive and outputs zero otherwise.

<img src='deep9.png'/>
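A minimal sketch of ReLU and its slope, using the standard definition `relu(z) = max(0, z)`, is given below. The function names are chosen for this example, and the slope at exactly zero is set to 0 here by convention.

```python
def relu(z):
    """Rectified Linear Unit: pass positive values through, clip the rest to 0."""
    return max(0.0, z)


def relu_derivative(z):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise (0 at z = 0 by convention)."""
    return 1.0 if z > 0 else 0.0


for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(z, relu(z), relu_derivative(z))
```

Unlike the sigmoid, the slope does not shrink for large positive inputs, which is one reason ReLU is popular in deep networks.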

### Differences between deep learning and machine learning:

- The main difference between deep learning and classical machine learning lies in the way data is presented to the system. Classical machine learning algorithms almost always require structured data, while deep learning networks rely on layers of artificial neural networks (ANNs).
- Machine learning algorithms are designed to “learn” from labeled data and then use what they have learned to produce results on new data. However, when the result is incorrect, a person needs to step in and “teach” them.
- Deep learning networks need less human intervention, as the stacked layers in a neural network arrange data into a hierarchy of different concepts and adjust themselves based on their own mistakes. However, even they can be wrong if the data quality is not good enough.
- Data decides everything. It is the quality of the data that ultimately determines the quality of the result.

# Multi-layer Perceptron

In the multilayer perceptron, there can be more than one linear layer (combination of neurons). If we take the simple example of a three-layer network, the first layer is the input layer, the last is the output layer and the middle layer is called the hidden layer. We feed our input data into the input layer and take the output from the output layer. We can increase the number of hidden layers as much as we want, to make the model more complex according to our task.

<img src='deep10.jpeg'/>

The feedforward network is the most typical neural network model. Its goal is to approximate some function $f^*$. Given, for example, a classifier $y = f^*(x)$ that maps an input $x$ to an output class $y$, the MLP finds the best approximation to that classifier by defining a mapping $y = f(x; \theta)$ and learning the best parameters $\theta$ for it.

MLP networks are composed of many functions that are chained together. A network with three functions or layers would form $f(x) = f^{(3)}(f^{(2)}(f^{(1)}(x)))$. Each of these layers is composed of units that perform an affine transformation of their inputs. Each layer is represented as $y = f(Wx^{T} + b)$, where $f$ is the activation function (covered above), $W$ is the set of parameters, or weights, in the layer, $x$ is the input vector, which can also be the output of the previous layer, and $b$ is the bias vector.

An MLP consists of several fully connected layers, so called because each unit in a layer is connected to all the units in the previous layer. In a fully connected layer, the parameters of each unit are independent of the rest of the units in the layer, which means each unit possesses a unique set of weights.

## Forward pass
In this step of training the model, we pass the input to the model, multiply it by the weights and add the bias at every layer, apply the activation function, and obtain the calculated output of the model.

<img src='deep11.png'/>
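The forward pass can be sketched for a tiny network with two inputs, three hidden units and one output unit. All parameter values below are made up for illustration, and the helper names `dense` and `forward` are assumptions of this example rather than part of any library. Each layer computes the activation of "weights times inputs plus bias", and the output of one layer becomes the input of the next.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: every unit sees every input."""
    outputs = []
    for unit_weights, unit_bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(unit_weights, inputs)) + unit_bias
        outputs.append(activation(z))
    return outputs


# Hypothetical parameters: 2 inputs -> 3 hidden units -> 1 output unit.
hidden_weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[0.7, -0.5, 0.2]]
output_biases = [0.05]


def forward(x):
    """Chain the layers: output = f2(f1(x))."""
    hidden = dense(x, hidden_weights, hidden_biases, relu)
    return dense(hidden, output_weights, output_biases, sigmoid)


y = forward([1.0, 2.0])
print(y)  # a single value between 0 and 1
```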

## Loss Calculation
When we pass a data instance (one example) through the model, we get an output called the predicted output (`pred_out`). The label that comes with the data is the real or expected output (`Expect_out`). Based on these two values we calculate the loss that we then have to backpropagate (using the backpropagation algorithm). There are various loss functions, and we choose one based on our output and requirements.
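Two commonly used loss functions are sketched below as an illustration: mean squared error, which is typical for regression, and binary cross-entropy, which is typical for yes/no classification. The function names, the lowercase argument names `pred_out` and `expect_out`, and the clipping constant `eps` are assumptions made for this example.

```python
import math


def mean_squared_error(pred_out, expect_out):
    """Average of the squared differences between predictions and labels."""
    return sum((p - e) ** 2 for p, e in zip(pred_out, expect_out)) / len(pred_out)


def binary_cross_entropy(pred_out, expect_out, eps=1e-12):
    """Loss for predicted probabilities in (0, 1) against labels 0 or 1."""
    total = 0.0
    for p, e in zip(pred_out, expect_out):
        p = min(max(p, eps), 1.0 - eps)   # avoid log(0)
        total += -(e * math.log(p) + (1 - e) * math.log(1.0 - p))
    return total / len(pred_out)


print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))   # → 2.0
print(binary_cross_entropy([0.9], [1]))             # small loss: confident and right
print(binary_cross_entropy([0.1], [1]))             # large loss: confident and wrong
```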

## Backward Pass
After calculating the loss, we backpropagate it and update the weights of the model using the gradient. This is the main step in the training of the model. In this step, each weight is adjusted in the direction that reduces the loss, by an amount that depends on its gradient.
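The sketch below shows one backward pass for the smallest possible case, a single sigmoid neuron with a squared-error loss. The starting values, the learning rate and the function names are assumptions made for this example. The gradient is obtained with the chain rule, which is the core idea behind backpropagation, and one update step is then applied.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, target):
    """Squared-error loss of one sigmoid neuron on one example."""
    pred = sigmoid(w * x + b)
    return 0.5 * (pred - target) ** 2


def gradients(w, b, x, target):
    """Chain rule: dL/dw = dL/dpred * dpred/dz * dz/dw (and similarly for b)."""
    pred = sigmoid(w * x + b)
    dloss_dpred = pred - target
    dpred_dz = pred * (1.0 - pred)
    return dloss_dpred * dpred_dz * x, dloss_dpred * dpred_dz * 1.0


# Hypothetical starting point and training example.
w, b, x, target = 0.5, -0.3, 2.0, 1.0
learning_rate = 0.5

loss_before = loss(w, b, x, target)
grad_w, grad_b = gradients(w, b, x, target)
w_new = w - learning_rate * grad_w     # step against the gradient
b_new = b - learning_rate * grad_b
loss_after = loss(w_new, b_new, x, target)

print(loss_before, loss_after)  # the loss goes down after the update
```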

# NLP

### What is NLP (Natural Language Processing)?
NLP is a subfield of computer science and artificial intelligence concerned with interactions between computers and human (natural) languages. It is used to apply machine learning algorithms to text and speech.

For example, we can use NLP to create systems like speech recognition, document summarization, machine translation, spam detection, named entity recognition, question answering, autocomplete, predictive typing and so on.


### Introduction to the NLTK library for Python
NLTK (Natural Language Toolkit) is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to many corpora and lexical resources. Also, it contains a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning. Best of all, NLTK is a free, open source, community-driven project.

[Link](https://towardsdatascience.com/introduction-to-natural-language-processing-for-text-df845750fb63)

## The Basics of NLP for Text
In this article, we’ll cover the following topics:
- Sentence Tokenization
- Word Tokenization
- Text Lemmatization and Stemming
- Stop Words
- Regex
- Bag-of-Words
- TF-IDF

### Sentence Tokenization
Sentence tokenization (also called sentence segmentation) is the problem of dividing a string of written language into its component sentences. The idea here looks very simple. In English and some other languages, we can split apart the sentences whenever we see a sentence-ending punctuation mark such as a period, question mark or exclamation mark, although abbreviations such as "Dr." make this harder in practice.
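To see both the appeal and the limits of the "split on punctuation" idea, here is a naive sketch that uses only Python's built-in `re` module. The function name `naive_sentence_split` and the sample strings are made up for this example, and this is not how NLTK does it.

```python
import re


def naive_sentence_split(text):
    """Split after '.', '!' or '?' whenever it is followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


print(naive_sentence_split("It rained. We stayed in! Did you?"))
# → ['It rained.', 'We stayed in!', 'Did you?']

print(naive_sentence_split("Dr. Smith arrived. He was late."))
# → ['Dr.', 'Smith arrived.', 'He was late.']
```

The second call wrongly treats the abbreviation "Dr." as the end of a sentence, which is the kind of case a trained sentence tokenizer is designed to handle.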

#### Example:
Let’s look at a piece of text about a famous board game called backgammon.

>>Backgammon is one of the oldest known board games. Its history can be traced back nearly 5,000 years to archeological discoveries in the Middle East. It is a two player game where each player has fifteen checkers which move between twenty-four points according to the roll of two dice.

To apply a sentence tokenization with NLTK we can use the `nltk.sent_tokenize` function.

```python
import nltk

# The Punkt sentence tokenizer models must be downloaded once, for example with
# nltk.download("punkt"); newer NLTK releases may ask for "punkt_tab" instead.
text = "Backgammon is one of the oldest known board games. Its history can be traced back nearly 5,000 years to archeological discoveries in the Middle East. It is a two player game where each player has fifteen checkers which move between twenty-four points according to the roll of two dice."
sentences = nltk.sent_tokenize(text)
for sentence in sentences:
    print(sentence)
    print()
```

#### As an output, we get the 3 component sentences separately.

```python
Backgammon is one of the oldest known board games.

Its history can be traced back nearly 5,000 years to archeological discoveries in the Middle East.

It is a two player game where each player has fifteen checkers which move between twenty-four points according to the roll of two dice.
```

## Word Tokenization
Word tokenization (also called word segmentation) is the problem of dividing a string of written language into its component words. In English and many other languages using some form of Latin alphabet, space is a good approximation of a word divider.
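The "split on spaces" approximation, and one small improvement to it, can be sketched with the standard library alone. The function name `naive_word_tokenize` and its regular expression are assumptions made for this example, so its output can differ from NLTK's on other inputs.

```python
import re

sentence = "Backgammon is one of the oldest known board games."

# Splitting on whitespace is a good first approximation...
print(sentence.split())
# ...but the last token is 'games.' with the period still attached.


def naive_word_tokenize(sentence):
    """Keep words (including forms like '5,000' or 'twenty-four') and
    return every other punctuation mark as its own token."""
    return re.findall(r"\w+(?:[-',]\w+)*|[^\w\s]", sentence)


print(naive_word_tokenize(sentence))
print(naive_word_tokenize("twenty-four points, nearly 5,000 years."))
# → ['twenty-four', 'points', ',', 'nearly', '5,000', 'years', '.']
```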

### Example:
Let’s use the sentences from the previous step and see how we can apply word tokenization on them. We can use the `nltk.word_tokenize` function.

```python
for sentence in sentences:
    words = nltk.word_tokenize(sentence)
    print(words)
    print()
    
    ```
    
 ### Output:   
 
 ```python
['Backgammon', 'is', 'one', 'of', 'the', 'oldest', 'known', 'board', 'games', '.']

['Its', 'history', 'can', 'be', 'traced', 'back', 'nearly', '5,000', 'years', 'to', 'archeological', 'discoveries', 'in', 'the', 'Middle', 'East', '.']

['It', 'is', 'a', 'two', 'player', 'game', 'where', 'each', 'player', 'has', 'fifteen', 'checkers', 'which', 'move', 'between', 'twenty-four', 'points', 'according', 'to', 'the', 'roll', 'of', 'two', 'dice', '.']

```


[Link 1](https://becominghuman.ai/a-simple-introduction-to-natural-language-processing-ea66a1747b32)

[Link 2](https://towardsdatascience.com/introduction-to-natural-language-processing-for-text-df845750fb63)

[Link 3](https://medium.com/analytics-vidhya/introduction-to-natural-language-processing-part-1-777f972cc7b3)