# Basics of a simple fully connected neural network

1. Introduction

   * An artificial neural network (ANN) is a model inspired by biological neural networks. It is used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown
   * A fully connected neural network (NN) consists of layers of nodes, connected from the input layer -> hidden layers -> output layer
   * A node in a neural network is built from weights and an activation function
   * In the early days, an ANN containing a single node was called the perceptron

<img src="data/images/Perceptron.png" width="45%">

   * A Perceptron Network can be designed to have multiple layers, leading to the Multi-Layer Perceptron (aka `MLP`)

<img src="data/images/MLP.png" width="45%">
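To make the "weights plus activation function" description concrete, here is a minimal NumPy sketch of a single perceptron node. The step activation, the weights `w` and the bias `b` are illustrative assumptions chosen so that the node behaves like a logical AND gate; they are not taken from the notebook.

```python
import numpy as np

def step(z):
    """Heaviside step activation, as used by the classic perceptron."""
    return np.where(z >= 0.0, 1, 0)

def perceptron(x, w, b):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    return step(np.dot(x, w) + b)

# Hypothetical weights and bias that make the node act like an AND gate.
w = np.array([1.0, 1.0])
b = -1.5

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
outputs = perceptron(inputs, w, b)
print(outputs)  # [0 0 0 1]
```

An MLP stacks many such nodes into layers and replaces the hard step with a differentiable activation so that the weights can be learned by gradient descent.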

In [1]:
# import relevant libraries
import numpy as np
import pandas as pd
from keras.datasets import imdb # see more information from https://keras.io/datasets/
# Dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive/negative).
# Reviews have been preprocessed, and each review is encoded as a sequence of word indexes
# (integers). For convenience, words are indexed by overall frequency in the dataset,
# so that, for instance, the integer "3" encodes the 3rd most frequent word in the data.
# This allows for quick filtering operations such as "only consider the top 10,000 most
# common words, but eliminate the top 20 most common words".
from keras.preprocessing.text import Tokenizer 
from sklearn.model_selection import train_test_split
from keras import models
from keras import layers

Using TensorFlow backend.


In [2]:
# view dataset
no_features = 1000
(x_train,y_train),(x_test,y_test) = imdb.load_data(num_words = no_features) 
# num_words refers to the maximum number of words to keep, based on word frequency. 
# Only the most common num_words will be kept
new_table = pd.DataFrame()
new_table['x-train data'] = x_train
new_table['y-train data'] = y_train
new_table['x-test data'] = x_test
new_table['y-test data'] = y_test
new_table.head()

| | x-train data | y-train data | x-test data | y-test data |
|---|---|---|---|---|
| 0 | [1, 14, 22, 16, 43, 530, 973, 2, 2, 65, 458, 2... | 1 | [1, 591, 202, 14, 31, 6, 717, 10, 10, 2, 2, 5,... | 0 |
| 1 | [1, 194, 2, 194, 2, 78, 228, 5, 6, 2, 2, 2, 13... | 0 | [1, 14, 22, 2, 6, 176, 7, 2, 88, 12, 2, 23, 2,... | 1 |
| 2 | [1, 14, 47, 8, 30, 31, 7, 4, 249, 108, 7, 4, 2... | 0 | [1, 111, 748, 2, 2, 2, 2, 4, 87, 2, 2, 7, 31, ... | 1 |
| 3 | [1, 4, 2, 2, 33, 2, 4, 2, 432, 111, 153, 103, ... | 1 | [1, 13, 2, 119, 14, 552, 7, 20, 190, 14, 58, 1... | 0 |
| 4 | [1, 249, 2, 7, 61, 113, 10, 10, 13, 2, 14, 20,... | 0 | [1, 40, 49, 85, 84, 2, 146, 6, 783, 254, 2, 33... | 1 |
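The many `2` entries in the sequences above come from the `num_words` cap: any word index at or above the cap is replaced by an out-of-vocabulary marker. The sketch below imitates that behaviour in plain Python. The helper `cap_vocabulary` and the example review are hypothetical, and `oov_char=2` is assumed to match the default of `imdb.load_data`.

```python
def cap_vocabulary(sequence, num_words, oov_char=2):
    """Replace every word index >= num_words with the out-of-vocabulary marker."""
    return [idx if idx < num_words else oov_char for idx in sequence]

# Hypothetical review encoded against the full vocabulary.
full_review = [1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65]

capped = cap_vocabulary(full_review, num_words=1000)
print(capped)  # [1, 14, 22, 16, 43, 530, 973, 2, 2, 65]
```

A smaller `num_words` keeps the input vector short but collapses more rare words into the same marker, so some information is lost.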


2. Building the neural network model

    * The model below is a three-layer neural network (NN): two hidden layers and one output layer. Note that when counting layers we don't include the input layer, as it does not have any parameters to learn.
    
    * Each layer is "dense", also called fully connected, i.e. all the units in the previous layer are connected to all the neurons in the next layer
    
    Important parameters:
    
    * units -> number of nodes in the layer
    
    * activation -> activation function applied to the layer's output
    
    * input_shape -> shape of the input feature data (only needed for the first layer)

In [3]:
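# instantiate the neural network as a linear stack of layers
nn_model = models.Sequential()
# first hidden layer: 16 units, expects one input value per word in the vocabulary
nn_model.add(layers.Dense(units=16, activation='relu', input_shape=(no_features,)))
# second hidden layer: 16 units
nn_model.add(layers.Dense(units=16, activation='relu'))
# output layer: a single sigmoid unit giving the probability of a positive review
nn_model.add(layers.Dense(units=1, activation='sigmoid'))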
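To show what the stacked `Dense` layers compute, here is a rough NumPy sketch of a forward pass with the same layer shapes (1000 -> 16 -> 16 -> 1). The random weights and the input batch are assumptions for illustration only; Keras uses its own initialisers and learns the weights during training.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: keeps positive values, zeroes out negatives."""
    return np.maximum(z, 0.0)

def sigmoid(z):
    """Squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Illustrative random parameters whose shapes mirror Dense(16), Dense(16), Dense(1).
W1, b1 = rng.normal(scale=0.1, size=(1000, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.1, size=(16, 16)), np.zeros(16)
W3, b3 = rng.normal(scale=0.1, size=(16, 1)), np.zeros(1)

def forward(x):
    """Each dense layer is a matrix product plus bias, followed by an activation."""
    h1 = relu(x @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return sigmoid(h2 @ W3 + b3)

# Hypothetical batch of 4 binary feature vectors.
batch = rng.integers(0, 2, size=(4, 1000)).astype(float)
probs = forward(batch)
print(probs.shape)  # (4, 1)
```

The sigmoid on the last layer is what lets the single output be read as the probability of the positive class.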

3. Compiling the neural network model

    * Compilation configures how the model learns before training starts.
    
    Important parameters:
    
    * loss -> function that training minimises; `binary_crossentropy` suits a two-class (positive/negative) target
    
    * optimizer -> algorithm that updates the weights, here `rmsprop`
    
    * metrics -> quantities reported during training and evaluation, here `accuracy`
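The compile step in this section uses the `binary_crossentropy` loss. As a minimal sketch of what that loss measures, the NumPy function below averages the negative log-likelihood of the true labels. The clipping constant `eps` and the example predictions are assumptions for illustration, not the exact Keras implementation.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(losses))

# Confident and correct predictions give a small loss...
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
# ...while confident and wrong predictions are penalised heavily.
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(round(confident_right, 4), round(confident_wrong, 4))  # 0.1054 2.3026
```

Because wrong but confident predictions are punished so strongly, this loss can keep growing on held-out data even while accuracy changes only slowly.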

In [4]:
# model compilation
nn_model.compile(loss='binary_crossentropy',optimizer='rmsprop',metrics=['accuracy'])
nn_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 16)                16016     
_________________________________________________________________
dense_2 (Dense)              (None, 16)                272       
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 17        
Total params: 16,305
Trainable params: 16,305
Non-trainable params: 0
_________________________________________________________________
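The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The helper name `dense_params` below is hypothetical; the layer sizes are those of the model above.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Input size followed by the number of units in each Dense layer.
layer_sizes = [1000, 16, 16, 1]

counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
print(counts)       # [16016, 272, 17]
print(sum(counts))  # 16305
```

Almost all of the parameters sit in the first layer, because it connects every one of the 1000 input features to each of its 16 units.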


4. Training a binary classifier neural network

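    Step 1: Decide on the number of input features (here `no_features = 1000`, the size of the vocabulary kept)
    
    Step 2: Set up the training and validation datasets
    
    The Keras `Tokenizer` is used as the text preprocessing function: `sequences_to_matrix` with `mode='binary'` turns each review into a fixed-length vector of 0s and 1s. For more information, see https://keras.io/preprocessing/text/
    
    Step 3: Train the model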
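The preprocessing in Step 2 turns each variable-length review into a fixed-length binary vector. The NumPy sketch below shows the idea behind `sequences_to_matrix(..., mode='binary')`. The function name `sequences_to_binary_matrix` and the toy sequences are assumptions for illustration, and the real Keras implementation may differ in detail.

```python
import numpy as np

def sequences_to_binary_matrix(sequences, num_words):
    """Multi-hot encoding: entry (i, j) is 1 if word index j occurs in sequence i."""
    matrix = np.zeros((len(sequences), num_words))
    for row, seq in enumerate(sequences):
        for idx in seq:
            if idx < num_words:  # ignore indexes outside the kept vocabulary
                matrix[row, idx] = 1.0
    return matrix

# Two toy "reviews" over a vocabulary of 5 word indexes.
toy = [[1, 3, 3], [2, 4]]
features = sequences_to_binary_matrix(toy, num_words=5)
print(features)
# [[0. 1. 0. 1. 0.]
#  [0. 0. 1. 0. 1.]]
```

This encoding discards word order and word counts; it only records which words appear, which is enough for a simple fully connected baseline.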

In [5]:
# fitting ANN model
tokenizer = Tokenizer(num_words = no_features)
train_features = tokenizer.sequences_to_matrix(x_train,mode='binary')
test_features = tokenizer.sequences_to_matrix(x_test,mode='binary')
# print(train_features.shape,test_features.shape)

model_history = nn_model.fit(train_features,
                            y_train,
                            epochs = 100,
                            verbose = 2,
                            batch_size = 100,
                            validation_data = (test_features,y_test))

Train on 25000 samples, validate on 25000 samples
Epoch 1/100
 - 2s - loss: 0.4159 - acc: 0.8135 - val_loss: 0.3346 - val_acc: 0.8588
Epoch 2/100
 - 1s - loss: 0.3247 - acc: 0.8640 - val_loss: 0.3289 - val_acc: 0.8605
Epoch 3/100
 - 1s - loss: 0.3151 - acc: 0.8667 - val_loss: 0.3304 - val_acc: 0.8597
Epoch 4/100
 - 1s - loss: 0.3067 - acc: 0.8724 - val_loss: 0.3345 - val_acc: 0.8575
Epoch 5/100
 - 1s - loss: 0.2998 - acc: 0.8739 - val_loss: 0.3267 - val_acc: 0.8590
Epoch 6/100
 - 1s - loss: 0.2908 - acc: 0.8790 - val_loss: 0.3321 - val_acc: 0.8562
Epoch 7/100
 - 1s - loss: 0.2812 - acc: 0.8822 - val_loss: 0.3304 - val_acc: 0.8575
Epoch 8/100
 - 1s - loss: 0.2707 - acc: 0.8882 - val_loss: 0.3303 - val_acc: 0.8584
Epoch 9/100
 - 1s - loss: 0.2587 - acc: 0.8907 - val_loss: 0.3399 - val_acc: 0.8554
Epoch 10/100
 - 1s - loss: 0.2466 - acc: 0.8970 - val_loss: 0.3413 - val_acc: 0.8564
Epoch 11/100
 - 1s - loss: 0.2343 - acc: 0.9034 - val_loss: 0.3576 - val_acc: 0.8539
Epoch 12/100
 - 1s - los

Epoch 97/100
 - 1s - loss: 0.0019 - acc: 0.9994 - val_loss: 1.9904 - val_acc: 0.8085
Epoch 98/100
 - 1s - loss: 0.0021 - acc: 0.9995 - val_loss: 2.0079 - val_acc: 0.8074
Epoch 99/100
 - 1s - loss: 0.0016 - acc: 0.9994 - val_loss: 1.9984 - val_acc: 0.8051
Epoch 100/100
 - 1s - loss: 0.0012 - acc: 0.9995 - val_loss: 2.0099 - val_acc: 0.8060
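In the log above, the training loss keeps falling towards zero while `val_loss` rises from about 0.33 to about 2.0, which is the usual sign of overfitting. One common remedy is to stop training once the validation loss stops improving. The sketch below applies a simplified patience rule to the first 11 `val_loss` values from the log. The function `stopping_epoch` and the choice `patience=3` are hypothetical, and this is not the implementation of the Keras `EarlyStopping` callback, only the same idea.

```python
# val_loss for epochs 1 to 11, copied from the training log above.
val_loss = [0.3346, 0.3289, 0.3304, 0.3345, 0.3267, 0.3321,
            0.3304, 0.3303, 0.3399, 0.3413, 0.3576]

def stopping_epoch(losses, patience):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops at the first epoch that is `patience` epochs past the
    best validation loss seen so far; stop_epoch is None if that never happens.
    """
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, None

best_epoch, stop_epoch = stopping_epoch(val_loss, patience=3)
print(best_epoch, stop_epoch)  # 5 8
```

Under these assumptions training would have stopped after 8 epochs instead of 100, keeping the weights from around epoch 5, where the validation loss was lowest.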
