## Deep Neural Networks

A deep neural network (DNN) learns through multiple stacked layers: in the hidden part of the network, each layer receives connections only from the previous layer and provides connections only to the next one. The input layer takes the text feature vectors (TF-IDF features in this notebook), as shown in the figure below. The output layer has as many neurons as there are classes for multi-class classification, and a single neuron for binary classification. Here we build a multi-class DNN; the number of layers and the number of nodes per layer are design choices, fixed in this notebook to five hidden layers of 512 nodes each. The DNN is a discriminatively trained model that uses the standard back-propagation algorithm with sigmoid or ReLU activation functions (ReLU in the code below). The output layer for multi-class classification should use softmax; in the code below the softmax is applied inside `nn.CrossEntropyLoss`, so the network itself returns raw logits.


<img src="./images/DeepNeuralNetwork.png">
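
To make the role of the softmax output layer concrete, here is a minimal, library-free sketch. The logit values are made up for illustration, and `softmax` is a helper defined here rather than part of the notebook's code.

```python
import math

def softmax(logits):
    """Turn a list of raw scores (logits) into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable
    # without changing the result.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 3-class problem.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)

# The predicted class is the index with the highest probability,
# which is also the index of the largest logit.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
```

Because softmax preserves the ordering of the logits, taking the argmax of the raw network output (as the evaluation cell does) gives the same predicted class as taking the argmax of the probabilities.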

### Import Packages

In [1]:
# import libraries
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
from sklearn import metrics
import pandas as pd
import torch
import torch.nn.functional as F

import torch.nn as nn
import torch.optim as optim
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
from torch.utils.data import DataLoader
import pytorch_lightning as pl
import torch.utils.data as data_utils
from pytorch_lightning import Trainer

### Convert text to TF-IDF

In [2]:
# Convert text to TF-IDF:
def TFIDF(X_train, X_test, MAX_NB_WORDS=75000):
    '''
    In information retrieval, tf–idf or TFIDF, short for term frequency–
    inverse document frequency, is a numerical statistic that is intended
    to reflect how important a word is to a document in a collection or
    corpus. It is often used as a weighting factor in information
    retrieval, text mining, and user modeling. The tf–idf value
    increases proportionally to the number of times a word appears in the 
    document and is offset by the number of documents in the corpus that 
    contain the word, which helps to adjust for the fact that some words 
    appear more frequently in general. tf–idf is one of the most popular 
    term-weighting schemes today; 83% of text-based recommender systems in
    digital libraries use tf–idf.

    Parameters
    ----------
    X_train : list of raw text documents used for training the model
    X_test : list of raw text documents used for testing the model
    MAX_NB_WORDS : maximum number of features (vocabulary size) in each vector
    '''
    # initialise the vectorizer to obtain fixed-dimensional vectors
    vectorizer_x = TfidfVectorizer(max_features=MAX_NB_WORDS)

    # fit the vocabulary on the train set and vectorize it
    # (.toarray() makes the matrix dense, which is memory-hungry for large vocabularies)
    X_train = vectorizer_x.fit_transform(X_train).toarray()

    # vectorize the test set with the vocabulary learned from the train set
    X_test = vectorizer_x.transform(X_test).toarray()

    # print number of features in the vector
    print("tf-idf with", X_train.shape[1], "features")

    # return train and test set
    return X_train, X_test
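
To show what the weighting described in the docstring amounts to, the sketch below computes tf-idf by hand on three toy sentences. It assumes the smoothed idf, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalisation, which is how scikit-learn's `TfidfVectorizer` weights terms with its default settings. The whitespace tokenisation, the toy documents, and the helper name `tfidf_vectors` are simplifications introduced here for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, rows); each row is an L2-normalised tf-idf vector."""
    # Simplified tokenisation: lowercase and split on whitespace.
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for doc in tokenised for tok in doc})
    n_docs = len(tokenised)

    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for doc in tokenised for tok in set(doc))

    # Smoothed inverse document frequency.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}

    rows = []
    for doc in tokenised:
        tf = Counter(doc)  # raw term counts in this document
        raw = [tf[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        rows.append([v / norm for v in raw])
    return vocab, rows

toy_docs = ["the cat sat", "the dog sat", "the dog barked loudly"]
vocab, rows = tfidf_vectors(toy_docs)
for doc, row in zip(toy_docs, rows):
    print(doc, [round(v, 3) for v in row])
```

In the first document, "cat" occurs in only one of the three sentences while "the" occurs in all of them, so "cat" receives the larger weight even though both appear once.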

### Define model class using PyTorch Lightning

PyTorch Lightning provides a standard wrapper for loading data and for defining and training deep learning models.

In this code block we define:
1. Model
2. Training/Validation/Test Steps
3. Optimizer settings
4. Train/Validation/Test Data Loaders

In [13]:

criterion = nn.CrossEntropyLoss()
class CoolSystem(pl.LightningModule):

    def __init__(self, shape, nclasses, dropout = 0.5):
        '''
        Deep neural network with five hidden layers of 512 fully connected
        units each (ReLU activation followed by dropout), plus a linear
        output layer that returns one logit per class.

        Parameters
        ----------
        shape: number of input features
        nclasses: number of output classes
        dropout: dropout probability applied after each hidden layer
        '''
        super().__init__()
        self.shape = shape
        self.nclasses = nclasses
        self.dropout = dropout
        self.feature = nn.Sequential(
            nn.Linear(shape, 512),
            nn.ReLU(),
            nn.Dropout(p = dropout),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Dropout(p = dropout),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Dropout(p = dropout),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Dropout(p = dropout),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Dropout(p = dropout),
            nn.Linear(512, nclasses))

    def forward(self, x):
        '''
        Passes the input through the network defined in __init__ and
        returns the class logits.

        Parameters
        ----------
        x: batch of input feature vectors
        '''
        return self.feature(x)

    def training_step(self, batch, batch_nb):
        '''
        Training step: passes one training batch through the network and
        returns the cross-entropy loss.

        Parameters
        ----------
        batch: tuple of (inputs, labels)
        batch_nb: index of the batch
        '''
        x, y = batch
        y_hat = self(x)
        return {'loss': criterion(y_hat, y)}

    def validation_step(self, batch, batch_nb):
        '''
        Validation step: passes one validation batch through the network
        and returns the cross-entropy loss.

        Parameters
        ----------
        batch: tuple of (inputs, labels)
        batch_nb: index of the batch
        '''
        x, y = batch
        y_hat = self(x)
        return {'val_loss': criterion(y_hat, y)}

    def validation_end(self, outputs):
        '''
        Averages the per-batch validation losses collected over one
        validation pass. Early stopping could also be handled here.

        Parameters
        ----------
        outputs: list of dicts returned by validation_step
        '''
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        return {'avg_val_loss': avg_loss}

    def configure_optimizers(self):
        '''
        Adam optimizer with PyTorch's default hyperparameters.
        '''
        return torch.optim.Adam(self.parameters())

    @pl.data_loader
    def tng_dataloader(self):
        '''
        Training data loader. Reads X_train_tfidf and y_train from the
        global scope; preprocessing could also be done here.
        '''
        return DataLoader(
            data_utils.TensorDataset(torch.Tensor(X_train_tfidf), torch.LongTensor(y_train)),
            batch_size=128)

    @pl.data_loader
    def val_dataloader(self):
        '''
        Validation data loader. Reads X_test_tfidf and y_test from the
        global scope, i.e. it reuses the test split because no separate
        validation split is created; preprocessing could also be done here.
        '''
        return DataLoader(
            data_utils.TensorDataset(torch.Tensor(X_test_tfidf), torch.LongTensor(y_test)),
            batch_size=128)

    @pl.data_loader
    def test_dataloader(self):
        '''
        Test data loader. Reads X_test_tfidf and y_test from the global
        scope; preprocessing could also be done here.
        '''
        return DataLoader(
            data_utils.TensorDataset(torch.Tensor(X_test_tfidf), torch.LongTensor(y_test)),
            batch_size=128)
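
For readers who want to see roughly what the training and validation hooks above boil down to, here is a minimal plain-PyTorch sketch of the same pattern (forward pass, cross-entropy loss, Adam step). It is not how Lightning is implemented internally. The synthetic random data, the small layer sizes, and the three epochs are assumptions chosen so that the example runs in a few seconds.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data (assumption): 256 samples, 100 features, 4 classes.
X = torch.rand(256, 100)
y = torch.randint(0, 4, (256,))
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

# Same building blocks as the model above, with much smaller (hypothetical) sizes.
net = nn.Sequential(
    nn.Linear(100, 32), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(32, 4),  # outputs logits; CrossEntropyLoss applies log-softmax
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters())

epoch_losses = []
for epoch in range(3):
    net.train()  # enable dropout
    running = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(net(xb), yb)  # corresponds to training_step
        loss.backward()
        optimizer.step()
        running += loss.item() * xb.size(0)
    epoch_losses.append(running / len(loader.dataset))

net.eval()  # disable dropout for inference
with torch.no_grad():
    logits = net(X)
print(epoch_losses, logits.shape)
```

Since the labels here are random, the loss is not expected to fall far; the point is only the order of operations that the Lightning `Trainer` otherwise takes care of.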

### Load text dataset (20newsgroups)

In [4]:
# load train data
newsgroups_train = fetch_20newsgroups(subset='train')

# load test data
newsgroups_test = fetch_20newsgroups(subset='test')

# make x and y
X_train = newsgroups_train.data
X_test = newsgroups_test.data
y_train = newsgroups_train.target
y_test = newsgroups_test.target

# convert the text to TF-IDF vectors
X_train_tfidf, X_test_tfidf = TFIDF(X_train, X_test)

tf-idf with 75000 features


### Train Model using PyTorch Lightning

In [15]:
# model
model = CoolSystem(75000, 20)

# most basic trainer, uses good defaults
trainer = Trainer(max_nb_epochs=10)  
trainer.fit(model)

gpu available: False, used: False


  0%|          | 0/5 [00:00<?, ?it/s]

          Name        Type    Params
0      feature  Sequential  39461396
1    feature.0      Linear  38400512
2    feature.1        ReLU         0
3    feature.2     Dropout         0
4    feature.3      Linear    262656
5    feature.4        ReLU         0
6    feature.5     Dropout         0
7    feature.6      Linear    262656
8    feature.7        ReLU         0
9    feature.8     Dropout         0
10   feature.9      Linear    262656
11  feature.10        ReLU         0
12  feature.11     Dropout         0
13  feature.12      Linear    262656
14  feature.13        ReLU         0
15  feature.14     Dropout         0
16  feature.15      Linear     10260


100%|██████████| 148/148 [02:09<00:00,  5.55it/s, avg_val_loss=1.33, batch_nb=88, epoch=9, loss=0.083]

1
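
The parameter counts in the summary table above can be checked by hand: a fully connected layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases, while ReLU and dropout layers have no parameters. A short sketch, where `linear_params` is a helper introduced here for illustration:

```python
def linear_params(n_in, n_out):
    """Number of trainable parameters in a fully connected layer."""
    return n_in * n_out + n_out  # weight matrix plus bias vector

# Layer widths of the network: 75000 inputs, five hidden layers of 512, 20 classes.
layer_sizes = [75000, 512, 512, 512, 512, 512, 20]

per_layer = [linear_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [38400512, 262656, 262656, 262656, 262656, 10260]
print(total)      # 39461396
```

Almost all of the parameters sit in the first layer, because it maps the 75,000-dimensional TF-IDF vector down to 512 units.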

### Evaluate Results on Test Set

In [17]:
# get predictions (logits) with dropout disabled and gradient tracking off
model.eval()
with torch.no_grad():
    predicted = model(torch.Tensor(X_test_tfidf))

# get classification report
predicted = predicted.numpy()
print(metrics.classification_report(y_test, np.argmax(predicted, axis=1)))

              precision    recall  f1-score   support

           0       0.75      0.72      0.74       319
           1       0.67      0.62      0.64       389
           2       0.74      0.55      0.63       394
           3       0.55      0.70      0.62       392
           4       0.57      0.80      0.67       385
           5       0.84      0.70      0.76       395
           6       0.88      0.82      0.85       390
           7       0.83      0.70      0.76       396
           8       0.98      0.91      0.94       398
           9       0.93      0.87      0.90       397
          10       0.99      0.90      0.94       399
          11       0.98      0.81      0.88       396
          12       0.51      0.79      0.62       393
          13       0.93      0.65      0.77       396
          14       0.79      0.91      0.84       394
          15       0.91      0.86      0.89       398
          16       0.77      0.87      0.81       364
          17       0.98    