
### Deep Learning
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

Yann LeCun, Yoshua Bengio & Geoffrey Hinton. Deep learning. Nature 521, 436–444 (28 May 2015)

### 2018 Turing Award

<img src="https://awards.acm.org/binaries/content/gallery/acm/ctas/awards/turing-2018-bengio-hinton-lecun.jpg">

### Artificial Neural Networks
Deep learning is based on artificial neural networks (ANNs), which are inspired by the biological neural networks in animal brains.

<img src="https://miro.medium.com/max/610/1*SJPacPhP4KDEB1AdhOFy_Q.png">

### Traditional Machine Learning vs Deep Learning


<img src="https://www.merkleinc.com/sites/default/files/inline-images/DL%20and%20ML1%20resized.jpg">

### Gradient Descent

The slope is the derivative of a single-variable function f(x). Its sign tells us which direction along x gives the steepest ascent. The negative of the slope tells us the direction along x with the steepest descent.

<img src="https://srdas.github.io/DLBook/DL_images/TNN1.png">

The gradient generalizes the slope to a multivariable function: it is the vector of partial derivatives with respect to each variable. It points in the direction of steepest ascent. The negative of the gradient points in the direction of steepest descent.

Here is an analogy: You are hiking on a mountain when it suddenly gets dark or foggy. You want to reach the bottom of the mountain (the valley). The negative of the gradient tells you which direction to step in to get to the bottom fastest (steepest descent). Hence the name gradient descent.

<img src="https://hackernoon.com/hn-images/1*f9a162GhpMbiTVTAua_lLQ.png">

Algorithm:

<img src="https://miro.medium.com/max/484/1*lIthvknHt9Tok5aIj4e__g.png">
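
The update rule can be sketched in a few lines of Python. This is only an illustration: the function f(x) = (x - 3)^2, the starting point, the step size `alpha`, and the helper name `gradient_descent` are assumptions chosen for the example, not part of the original notebook.

```python
def gradient_descent(grad, x0, alpha=0.1, num_steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(num_steps):
        # Move in the direction of steepest descent (negative gradient)
        x = x - alpha * grad(x)
    return x


def f_prime(x):
    # Derivative of the illustrative function f(x) = (x - 3)^2
    return 2 * (x - 3)


x_min = gradient_descent(f_prime, x0=0.0)
print(x_min)  # very close to 3, the minimizer of f
```

Each step shrinks the distance to the minimum by a constant factor here, because f is a simple quadratic; real cost functions are rarely this well behaved.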

### Learning Rate (Alpha)

<img src="https://www.jeremyjordan.me/content/images/2018/02/Screen-Shot-2018-02-24-at-11.47.09-AM.png">
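
To make the effect of the learning rate concrete, here is a small sketch that runs the same descent with three step sizes. The function f(x) = (x - 3)^2, the helper `run`, and the specific alpha values are assumptions picked for illustration.

```python
def run(alpha, num_steps=20, x0=0.0):
    """Gradient descent on f(x) = (x - 3)^2, whose derivative is 2 * (x - 3)."""
    x = x0
    for _ in range(num_steps):
        x = x - alpha * 2 * (x - 3)
    return x


for alpha in (0.01, 0.5, 1.1):
    print(alpha, run(alpha))
```

With alpha = 0.01 the iterate is still far from 3 after 20 steps (too small), with alpha = 0.5 it lands on the minimum, and with alpha = 1.1 each step overshoots by more than the last, so the iterate diverges (too large).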

### Logistic Regression as a Neural Network

<img src="https://slideplayer.com/slide/12620833/76/images/8/Neuron+Model%3A+Logistic+Unit.jpg">

### Backpropagation

#### Naming Conventions: What we used to call the coefficients (slopes with respect to each variable) in Linear Regression are now (in Deep Learning) called the weights. The intercept is now called the bias term.

#### Chain Rule: You may remember composition of functions, e.g., h(x) = g(f(x)). To take the derivative of h(x), we use the chain rule: h'(x) = g'(f(x)) * f'(x)
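
As a quick sanity check of the chain rule, the sketch below compares the analytic derivative g'(f(x)) * f'(x) with a numerical estimate. The particular functions f(x) = 3x + 1 and g(u) = u^2 are assumptions chosen only for illustration.

```python
def f(x):
    return 3 * x + 1


def g(u):
    return u ** 2


def h(x):
    # Composition h(x) = g(f(x))
    return g(f(x))


def h_prime(x):
    # Chain rule: g'(f(x)) * f'(x), with g'(u) = 2u and f'(x) = 3
    return 2 * f(x) * 3


x = 2.0
eps = 1e-6
# Central finite difference as an independent estimate of h'(x)
numeric = (h(x + eps) - h(x - eps)) / (2 * eps)
print(h_prime(x), numeric)
```

Both values agree (about 42 at x = 2), which is what the chain rule predicts.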

For a single-neuron network (e.g., Logistic Regression), the composition is Sigmoid(Linear(x) + Bias), i.e., sigmoid(θ0 + θ1x1 + θ2x2 + θ3x3 + θ4x4)

The composition of functions gets longer with more neurons and layers.
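
The single-neuron expression can be written directly in NumPy. This is a minimal sketch: the parameter values, the input vector, and the helper names `sigmoid` and `neuron` are made up for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def neuron(x, theta, theta0):
    # sigmoid(theta0 + theta1*x1 + ... + theta4*x4)
    return sigmoid(theta0 + np.dot(theta, x))


theta0 = -1.0                             # bias (intercept), illustrative value
theta = np.array([2.0, -1.0, 0.5, 0.0])   # weights, illustrative values
x = np.array([1.0, 0.0, 2.0, 3.0])        # one example with four features

print(neuron(x, theta, theta0))  # sigmoid(2.0), roughly 0.88
```

Stacking several such neurons side by side gives a layer, and feeding one layer into the next gives the longer compositions mentioned above.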

Backpropagation is an algorithm that computes the gradient of the cost (error) function with respect to each weight (coefficient or slope). It uses the chain rule from calculus and propagates backwards from the output.

<img src="https://images.deepai.org/glossary-terms/73eec54be08746f6b546a874580b8673/backpropagation.png">
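
For a single sigmoid neuron, backpropagation reduces to multiplying three local derivatives. The sketch below assumes a squared-error cost C = 0.5 * (a - y)^2 and made-up values for the weights, bias, and input; it then checks the result against a numerical gradient.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cost(w, b, x, y):
    # Assumed cost for this sketch: squared error on one example
    a = sigmoid(np.dot(w, x) + b)
    return 0.5 * (a - y) ** 2


def backprop(w, b, x, y):
    # Forward pass
    z = np.dot(w, x) + b
    a = sigmoid(z)
    # Backward pass: apply the chain rule from the output back to the weights
    dC_da = a - y          # derivative of the cost w.r.t. the activation
    da_dz = a * (1 - a)    # derivative of the sigmoid
    dC_dz = dC_da * da_dz
    return dC_dz * x, dC_dz  # dz/dw = x and dz/db = 1


w = np.array([0.5, -0.3])
b = 0.1
x = np.array([1.0, 2.0])
y = 1.0

grad_w, grad_b = backprop(w, b, x, y)

# Numerical gradient (central differences) as an independent check
eps = 1e-6
numeric_w = np.array([
    (cost(w + eps * e, b, x, y) - cost(w - eps * e, b, x, y)) / (2 * eps)
    for e in np.eye(2)
])
print(grad_w, numeric_w)
```

In a deeper network the same pattern repeats layer by layer, reusing `dC_dz` from the layer above instead of recomputing it.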

### Implementation of Single-layer Neural Network using NumPy

The human brain is remarkable at learning new tasks and this is made possible by the neurons.
Neurons learn through the process of trial and error, which we will be mimicking in this notebook.
#### Task
We will build a neural network that learns to predict 1 when a certain input is 1 (in the training data below, the output always equals Input_2).

**Train data**

Input_1 | Input_2 | Input_3 | Output |
:-------------: |:-------------: | :-------------: | :-------------: |
0 | 0 | 0 | 0
0 | 0 | 1 | 0
0 | 1 | 0 | 1
1 | 0 | 0 | 0
1 | 1 | 0 | 1
1 | 1 | 1 | 1

#### Network Structure
Our network has three inputs, three weights and one output. There is no separate bias term in this example.

In [None]:
from numpy import exp, array, random, dot


In [None]:
class SingleNeuronNetwork():
    def __init__(self):
        # Seed the random number generator so that the same
        # random numbers are reproduced every time the program is run
        random.seed(42)

        # --- Model a single neuron: 3 input connections and 1 output connection ---
        # Assign random weights to a 3 x 1 matrix: floating-point values in [-1, 1)
        self.weights = 2 * random.random((3, 1)) - 1

    # --- Define the Sigmoid function ---
    # Pass the weighted sum of inputs through this function to squash it into (0, 1)
    def __sigmoid(self, x):
        return 1 / (1 + exp(-x))

    # --- Define derivative of the Sigmoid function ---
    # Here x is the sigmoid *output*, so x * (1 - x) is the slope of the
    # sigmoid at that point; it is largest when the output is near 0.5
    def __sigmoid_derivative(self, x):
        return x * (1 - x)

    # --- Define feed-forward procedure ---
    def feed_forward(self, inputs):
        # Feed-forward inputs through the single-neuron neural network
        return self.__sigmoid(dot(inputs, self.weights))

    # --- Define the training procedure ---
    # Modify weights by calculating error after every iteration
    def train(self, train_inputs, train_outputs, num_iterations):
        # Run the training loop num_iterations times
        for iteration in range(num_iterations):
            # Feed-forward the training set through the single neuron neural network
            output = self.feed_forward(train_inputs)

            # Calculate the error in predicted output
            # Difference between the desired output and the feed-forward output
            error = train_outputs - output

            # Multiply the error by the input and again by the gradient of the Sigmoid curve
            # 1. Less confident outputs (near 0.5) lead to larger adjustments
            # 2. Inputs that are zero do not cause changes to the weights
            adjustment = dot(train_inputs.T, error *
                             self.__sigmoid_derivative(output))

            # Make adjustments to the weights
            self.weights += adjustment



In [None]:
# Initialise a single-neuron neural network.
neural_network = SingleNeuronNetwork()

In [None]:
print("Neural network weights before training (random initialization): ")
print(neural_network.weights)

# The train data consists of 6 examples, each consisting of 3 inputs and 1 output
train_inputs = array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]])
train_outputs = array([[0, 0, 1, 0, 1, 1]]).T

Neural network weights before training (random initialization): 
[[-0.25091976]
 [ 0.90142861]
 [ 0.46398788]]


In [None]:
# Train the neural network using the train inputs and outputs.
# Train the network for 10,000 steps while modifying weights to reduce error.
neural_network.train(train_inputs, train_outputs, 10000)

print("Neural network weights after training: ")
print(neural_network.weights)

Neural network weights after training: 
[[-4.21652261]
 [12.79677774]
 [-4.21664048]]


**Test data**

Now that we have trained the network, let us use its learned weights to make predictions. The first input, [1, 0, 0], also appears in the training data; the second, [0, 1, 1], was not used to train the network:

Input_1 | Input_2 | Input_3 | Expected Output |
:-------------: |:-------------: | :-------------: | :-------------: |
1 | 0 | 0 | 0
0 | 1 | 1 | 1


In [None]:
# Test the neural network on the input [1, 0, 0]
print("Inferring prediction from the network for [1, 0, 0] -> ?: ")
print(neural_network.feed_forward(array([1, 0, 0])))

Inferring prediction from the network for [1, 0, 0] -> ?: 
[0.01453545]


In [None]:
print("Inferring prediction from the network for [0, 1, 1] -> ?: ")
print(neural_network.feed_forward(array([0, 1, 1])))

Inferring prediction from the network for [0, 1, 1] -> ?: 
[0.99981224]


### Linear vs Non-Linear Classifiers

<img src="https://assets.website-files.com/5cc74fa87e7513b0c52acc95/5cc7604a671396f4dab030a1_v7jVB_tJO9Vs8yc8dZRW4ZGrs2ujWhbjrJV_AMM_A1GT_GJMmhVVtT2nGM9fmZae_7e4kUzIzI-diTVVR2BxSdnEfO5LE_qNoMMJJj0Vc_BwwXbo4Ug8Qt5bm9nQqMNLm_NP8W2d.png">

Remember y = mx + b. The equation mx + b - y = 0 is a line that separates the two classes. In higher dimensions (more features), a (hyper-)plane separates the classes. Machine Learning notation: θ0 + θ1x1 + θ2x2 + θ3x3 + θ4x4 = 0 is the line or hyperplane that separates the classes.

<img src="https://jtsulliv.github.io/images/perceptron/linsep_new.png?raw=True">
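
A linear classifier simply checks which side of the line (or hyperplane) a point falls on. The sketch below uses two features and made-up parameters θ0 = -1, θ1 = θ2 = 1, so the assumed boundary is x1 + x2 - 1 = 0; the function name `predict` is also just illustrative.

```python
import numpy as np


def predict(X, theta0, theta):
    # Class 1 if theta0 + theta . x >= 0, otherwise class 0
    return (theta0 + X @ theta >= 0).astype(int)


theta0 = -1.0
theta = np.array([1.0, 1.0])

X = np.array([
    [0.0, 0.0],   # below the line
    [2.0, 2.0],   # above the line
    [0.2, 0.3],   # below the line
    [1.0, 0.5],   # above the line
])

print(predict(X, theta0, theta))  # [0 1 0 1]
```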

### Perceptron: XOR problem

<img src="https://saedsayad.com/images/Perceptron_XOR.png">
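
No single line separates the XOR classes, but two layers of simple units can. The sketch below uses hand-picked (not learned) weights and a step activation, both assumptions made for illustration: one hidden unit computes OR, the other computes AND, and the output fires when OR is true but AND is not.

```python
import numpy as np


def step(z):
    # Threshold activation: 1 where z > 0, else 0
    return (z > 0).astype(int)


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden layer: column 0 acts as an OR unit, column 1 as an AND unit
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])
hidden = step(X @ W1 + b1)

# Output layer: fires for "OR and not AND", which is XOR
W2 = np.array([1.0, -1.0])
b2 = -0.5
output = step(hidden @ W2 + b2)

print(output)  # [0 1 1 0]
```

In practice these weights would be learned with backpropagation using a differentiable activation such as the sigmoid, rather than set by hand.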

### Deep Learning Frameworks

TensorFlow is an open-source package for neural networks developed by Google. Keras is a high-level API that runs on top of TensorFlow and is easy for beginners to use.

PyTorch is an open-source package for neural networks developed by Facebook (now Meta).
PyTorch has tended to be more popular in academia, whereas TensorFlow has tended to be more popular in industry.

### Keras/TensorFlow for the XOR Problem

Here is a blog on how to implement XOR:

https://blog.thoughtram.io/machine-learning/2016/11/02/understanding-XOR-with-keras-and-tensorlow.html


### PyTorch for XOR

https://courses.cs.washington.edu/courses/cse446/18wi/sections/section8/XOR-Pytorch.html
