### Introduction

“Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction” (LeCun, Bengio and Hinton, 2015).

A deep learning algorithm does not need the user to engineer features by hand. Instead, it takes a set of input images and learns the relevant features (whiskers, eyes, nose, hair pattern, etc.) from them on its own. In this sense, a deep learning algorithm learns much as a child does.

#### Application

Facial Recognition and Photo Tagging: Are you on Facebook? If so, have you ever uploaded a picture of yourself with your friends? Did you notice that when you hover over your friends’ faces, Facebook identifies them and suggests that you tag them? This is an example of deep learning image identification, backed by DeepFace (a deep learning architecture developed by Facebook, Inc.).

Natural Language Processing/Generation/Translation: Have you ever met someone whose language you could not understand? What do you do if you have to converse with that person? A translation app could be the answer. Google Translate is one service aimed at this common problem. It uses Google’s Neural Machine Translation system (a deep learning architecture) to translate between languages with near-human quality.

Self-Driving Cars: Have you heard of the autonomous cars being developed by Tesla, Google, Uber and other major automakers? Have you wondered how their navigation systems work? They use sensors and on-board analytics, powered by deep learning, to recognize obstacles in their path and react to them appropriately.

#### ML vs DL

Of course, like a child, a deep learning algorithm makes mistakes at the start of its learning journey. However, thanks to the enormous amounts of data and computational resources available today, deep learning has proven far better than classical machine learning approaches at solving certain complicated problems (such as those described above) and, in some cases, competitive with human experts.
![image.png](attachment:image.png)

### Artificial Neural Network

The study of artificial neural networks (ANNs) is inspired by attempts to simulate the biological neural system. An ANN consists of interconnected nodes, analogous to a biological network of neurons. Just as a neuron receives an input signal, processes it and transmits it to other neurons, a node in an ANN receives an input, processes it using a function known as an activation function, and transmits the output to other nodes.

An ANN has three kinds of nodes:

Input Nodes: These nodes receive the input. <br> 
Hidden Nodes: These nodes receive the input from the input nodes, process it and pass it on to the output nodes.<br>
Output Nodes: These nodes receive the processed inputs from the hidden nodes and produce the network's output.

<b>Input Layer</b>

It is the first layer in any neural network.
Input is fed to the network through this layer.
It brings the initial data into the system for further processing by subsequent layers of artificial neurons.
The input layer is the very beginning of the workflow for the artificial neural network.
It is the leftmost layer in the architecture of a neural network.

<b>Output Layer</b>

It is the last layer in any neural network.
We get the output of the network through this layer.
It returns the output computed by the preceding layers of artificial neurons.
The output layer is the final step in the workflow of the artificial neural network.
It is the rightmost layer in the architecture of a neural network.

<b>Hidden Layer</b>

It is the layer between the input layer and the output layer in any neural network.
It takes a set of weighted inputs and produces output through an activation function.
It is a typical part of nearly any neural network through which engineers try to simulate the activities that occur in a human brain.
There can be multiple hidden layers and each of these hidden layers can contain the same or a different number of neurons.
The input is transformed by the weights and biases of each neuron in each layer, and the result is the network's output.

<b>Weights and Biases:</b> Weights are numerical parameters that determine how strongly each neuron affects the neurons it connects to. A bias unit is an "extra" neuron, added to each layer before the output layer, that always stores the value 1; the weight on its connection acts as a constant offset.

Weights are learned (or adjusted) to produce the desired outputs from the given inputs. Mathematical techniques like Stochastic Gradient Descent, Adam, etc. (which we will discuss later) are used to search for optimal weights that will make the model accurate in its prediction.
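To make the role of weights and biases concrete, here is a minimal sketch of how one layer turns its inputs into weighted sums. The function name `dense_forward` and all numeric values are hypothetical, chosen only for illustration; the code uses plain Python lists rather than Keras so that it runs without extra dependencies.

```python
def dense_forward(inputs, weights, biases):
    """Weighted sum for each neuron j: z_j = sum_i(x_i * w[i][j]) + b_j."""
    return [
        sum(x * w_row[j] for x, w_row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]

# Hypothetical values: 3 inputs feeding a layer of 2 neurons
inputs = [1.0, 2.0, 3.0]
weights = [[0.5, -1.0],    # weights from input 0 to neurons 0 and 1
           [0.25, 0.0],    # weights from input 1
           [-0.5, 1.0]]    # weights from input 2
biases = [0.5, -0.5]

z = dense_forward(inputs, weights, biases)
print(z)  # → [0.0, 1.5]
```

An activation function would then be applied to each value in `z` before it is passed to the next layer.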

<b>Activation function:</b> An activation function is a mathematical function that converts a neuron's input into its output. It gives neural networks their nonlinearity, which is what makes them universal function approximators. Without activation functions, a neural network of any depth would be equivalent to a single linear function of its inputs.
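The claim that stacked layers without a nonlinearity behave like one linear function can be checked numerically. The sketch below uses hypothetical weights (`W1`, `b1`, `W2`, `b2`) and small hand-written helpers instead of a library; it is an illustration of the idea, not the way Keras computes it internally.

```python
def matvec(x, W):
    """Multiply row vector x (length n) by matrix W (n rows, m columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def relu(v):
    return [max(0.0, a) for a in v]

# Hypothetical parameters: 3 inputs -> 2 hidden neurons -> 1 output
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
b1 = [1.0, -2.0]
W2 = [[2.0], [-1.0]]
b2 = [0.5]
x = [-1.0, 2.0, 0.0]

# Two stacked layers with linear (identity) activation
hidden = add(matvec(x, W1), b1)
two_linear_layers = add(matvec(hidden, W2), b2)

# The same mapping written as ONE linear layer: W = W1*W2, b = b1*W2 + b2
W = [matvec(row, W2) for row in W1]
b = add(matvec(b1, W2), b2)
one_linear_layer = add(matvec(x, W), b)

# With ReLU on the hidden layer, the network no longer collapses
with_relu = add(matvec(relu(hidden), W2), b2)

print(two_linear_layers, one_linear_layer, with_relu)  # → [6.5] [6.5] [0.5]
```

The first two results agree because the composition of linear maps is itself linear; only the version with ReLU computes something a single linear layer cannot.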

In [1]:
pip install tensorflow



In [2]:
pip install keras



In [5]:
# Importing libraries
import pandas as pd
import tensorflow as tf
import keras
# Reading the training data (adjust the path to your local copy of the CSV)
train_data = pd.read_csv('C:/Users/91758/Downloads/fashionmnisttrain.csv')
# Class names, indexed by label value (0-9)
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
# Splitting the file: first 5000 rows as validation data, remaining rows as test_data.
# Note: train_images below still uses all 60000 rows, so both subsets overlap
# with the training set and scores measured on them will be optimistic.
val_data = train_data.iloc[:5000, :]
test_data = train_data.iloc[5000:, :]
# Fetching the labels
train_labels = train_data['label']
val_labels = val_data['label']
test_labels = test_data['label']
# Reshaping each 784-pixel row of the training data into a 28x28 image
train_images = train_data.iloc[:, 1:].values.reshape(-1, 28, 28)
# Reshaping validation data
val_images = val_data.iloc[:, 1:].values.reshape(-1, 28, 28)
# Scaling pixel values from 0-255 to the range 0-1
train_images = train_images / 255.0
val_images = val_images / 255.0

In [6]:
# Defining multi-layer perceptron model with 1 hidden layer having 1 neuron
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)), # Perform conversion of higher dimensional data (here, 2-D) to 1-D data.
    keras.layers.Dense(1, activation=tf.keras.activations.linear), # Hidden layer with 1 neuron and linear activation function
    keras.layers.Dense(10, activation=tf.keras.activations.linear) # Output layer with linear activation function 
])                                                   
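# Configuring the optimizer, loss function and evaluation metric.
# Note: this loss expects class probabilities by default (from_logits=False),
# which a linear output layer does not produce; treat this as a purely linear baseline.
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=keras.optimizers.Adam(), 
              metrics=['accuracy'])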
model1 = model.fit(train_images, train_labels, epochs=5, validation_data=(val_images, val_labels))

Train on 60000 samples, validate on 5000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [7]:
# Defining multi-layer perceptron model with 1 hidden layer having 10 neurons 
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)), # Perform conversion of higher dimensional data (here, 2-D) to 1-D data.
    keras.layers.Dense(10, activation=tf.keras.activations.linear), # Hidden layer with 10 neurons and linear activation function
    keras.layers.Dense(10, activation=tf.keras.activations.linear) # Output layer with linear activation function 
])
# Configuring the optimizer, loss function and evaluation metric
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=keras.optimizers.Adam(), 
              metrics=['accuracy'])
model2 = model.fit(train_images, train_labels, epochs=5, validation_data=(val_images, val_labels))


Train on 60000 samples, validate on 5000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [8]:
# Defining multi-layer perceptron model with 1 hidden layer having 10 neurons with non-linearity
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)), # Perform conversion of higher dimensional data (here, 2-D) to 1-D data.
    keras.layers.Dense(10, activation=tf.nn.relu), # Hidden layer with 10 neurons and ReLU activation function
    keras.layers.Dense(10, activation=tf.nn.softmax) # Output layer with softmax activation function 
])
# Configuring the optimizer, loss function and evaluation metric
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=keras.optimizers.Adam(), 
              metrics=['accuracy'])
model3 = model.fit(train_images, train_labels, epochs=5, validation_data=(val_images, val_labels))


Train on 60000 samples, validate on 5000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


The model learns its parameters (weights and biases) through forward propagation of data and back propagation of errors. To start with, the weights and biases are initialized randomly. The input is then fed forward through the network, and each layer transforms its input until the output is reached. This output is compared with the expected output, and the difference is recorded as the error. The error is then propagated backward through the layers to adjust the parameters. Forward propagation and back propagation are repeated over a number of iterations until the error reaches an acceptably low level. The neural network is then said to have learned its parameters.
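The loop described above can be sketched on a toy network with one hidden neuron. Everything here is an assumption made for illustration: the architecture (tanh hidden unit, linear output), the squared-error loss, the single training example, the learning rate, and the names `forward`, `backward` and `loss_fn`. The starting parameters are fixed rather than random so that the run is reproducible. Keras performs the equivalent steps automatically inside `model.fit`.

```python
import math

# Toy network: x -> h = tanh(w1*x + b1) -> y_hat = w2*h + b2
# Starting values are fixed (instead of random) so the run is reproducible.
params = {"w1": 0.5, "b1": 0.0, "w2": -0.3, "b2": 0.1}
x, y = 1.0, 1.0          # one hypothetical training example
learning_rate = 0.1

def forward(p, x):
    """Forward propagation: feed the input through both layers."""
    h = math.tanh(p["w1"] * x + p["b1"])   # hidden activation
    y_hat = p["w2"] * h + p["b2"]          # network output
    return h, y_hat

def loss_fn(y_hat, y):
    """Squared error between the prediction and the expected output."""
    return 0.5 * (y_hat - y) ** 2

def backward(p, x, y):
    """Back propagation: apply the chain rule from the output to the first layer."""
    h, y_hat = forward(p, x)
    d_out = y_hat - y                              # dLoss/dy_hat
    d_hidden = d_out * p["w2"] * (1.0 - h ** 2)    # error passed back through tanh
    return {"w2": d_out * h, "b2": d_out, "w1": d_hidden * x, "b1": d_hidden}

losses = []
for step in range(200):
    _, y_hat = forward(params, x)
    losses.append(loss_fn(y_hat, y))
    grads = backward(params, x, y)
    for name in params:                            # gradient descent update
        params[name] -= learning_rate * grads[name]

print(f"loss before training: {losses[0]:.4f}")
print(f"loss after training:  {losses[-1]:.6f}")
```

Optimizers such as Adam replace the plain update `params[name] -= learning_rate * grads[name]` with a more elaborate rule, but the forward and backward passes follow the same pattern.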

### Activation Function

<b>Sigmoid Function</b>
![image.png](attachment:image.png)

The logistic (sigmoid) activation function suffers from the following drawbacks:

A saturated sigmoid neuron can cause the gradient to vanish.<br>
The output is always positive, so it is not zero-centered (the output cannot move in both the positive and negative directions), which makes optimization harder.<br>
Computing the exponential term e^(-x) is relatively expensive, which slows down training.
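The first two drawbacks can be seen by evaluating the function and its derivative at a few points. This is a small sketch using the standard definition sigmoid(x) = 1 / (1 + e^(-x)); the helper names and the sample inputs are chosen here for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.5f}  gradient={sigmoid_grad(x):.5f}")
```

The gradient peaks at 0.25 for x = 0 and shrinks toward zero as |x| grows, which is the saturation that makes gradients vanish; every output lies strictly between 0 and 1, so none is negative.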

<b>ReLU</b>

ReLU is considerably faster and more efficient to train on high-dimensional data. It does not require intensive computation, since it computes f(x) = max(0, x), which simply thresholds the input at zero. It does not saturate for positive inputs, which mitigates the vanishing gradient problem and makes it suitable for deep networks (multiple cascaded layers).
![image.png](attachment:image.png)

On the other hand, ReLU can create dying neurons (a dying neuron is one that never fires). A neuron dies when its input is negative for every example, so no gradient flows backward through the ReLU and its weights stop updating. There are two ways to mitigate this problem:

Use a small learning rate<br>
Use leaky ReLU, which allows a small non-zero slope when the unit is not active
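A dying neuron can be illustrated with a few lines of Python. In this sketch the weight, bias and input values are hypothetical; the bias is set far below zero to mimic a neuron that has been pushed into the inactive region, for example by one overly large update.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

inputs = [0.2, 0.5, 0.9]     # hypothetical inputs scaled to the range 0-1
weight, bias = 1.0, -5.0     # bias pushed far into the negative region

pre_activations = [weight * x + bias for x in inputs]
outputs = [relu(z) for z in pre_activations]
grads = [relu_grad(z) for z in pre_activations]

print(outputs)  # → [0.0, 0.0, 0.0]  the neuron never fires
print(grads)    # → [0.0, 0.0, 0.0]  no gradient, so weight and bias never change
```

Because every gradient is zero, back propagation cannot move the weight or bias back into the active region, which is why the neuron stays dead.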

<b>Leaky ReLU</b>

As mentioned in the ReLU section, leaky ReLU was created to fix the dying neuron problem of ReLU. When the weighted sum of inputs is negative, it applies a small slope instead of outputting zero, which keeps the neuron's updates alive.
![image.png](attachment:image.png)

![image.png](attachment:image.png)
From the result, we can observe that a positive input is returned unchanged, whereas a negative input is scaled by 0.01 (or another small value).

With this activation function, the gradient is non-zero for both negative and positive inputs, so we avoid saturation and vanishing gradients while keeping the computation cheap.
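The behaviour described above can be written as a short sketch. The parameter name `alpha` for the negative-side slope is an assumption made here; 0.01 is the value mentioned in the text.

```python
def leaky_relu(x, alpha=0.01):
    """Return x for positive inputs and alpha * x otherwise."""
    return x if x > 0 else alpha * x

def leaky_relu_grad(x, alpha=0.01):
    """Gradient is 1 for positive inputs and alpha otherwise, so it is never zero."""
    return 1.0 if x > 0 else alpha

for x in (-5.0, -1.0, 0.5, 3.0):
    print(f"x={x:5.1f}  output={leaky_relu(x):7.3f}  gradient={leaky_relu_grad(x):.2f}")
```

Unlike plain ReLU, the gradient for negative inputs is `alpha` rather than zero, so a neuron in the inactive region still receives updates.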

<b>Softmax</b>

The softmax activation function produces probability values for a multiclass classification problem, in the same way that sigmoid does for a binary classification problem.

![image.png](attachment:image.png)
It is generally used in the output layer, since that is where we need the probability of each class.
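As a rough illustration, softmax can be computed by exponentiating each score and dividing by the sum of the exponentials. The sketch below subtracts the maximum score first, a common trick for numerical stability that does not change the result; the input scores are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]   # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]          # hypothetical raw outputs for 3 classes
probs = softmax(scores)
print([round(p, 3) for p in probs])
print(sum(probs))                 # approximately 1.0

# With two classes and the second score fixed at 0, softmax matches the sigmoid
two_class = softmax([2.0, 0.0])[0]
sigmoid_value = 1.0 / (1.0 + math.exp(-2.0))
print(two_class, sigmoid_value)
```

The largest score receives the largest probability, and the two-class comparison shows the sense in which softmax generalizes the sigmoid.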