<a href="https://colab.research.google.com/github/abhionair1/Autoencoder/blob/main/Redone_U3W14_CS_NLP_with_CNNs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The objective of this experiment is to see the application of Convolutional Neural Networks in NLP.

#### Note that this case study is based on this [paper](http://www.aclweb.org/anthology/D14-1181).

In [17]:

!pip install kaggle
from google.colab import drive
drive.mount('/content/drive')
!mkdir -p ~/.kaggle
!cp /content/drive/MyDrive/Kaggle_api/kaggle.json ~/.kaggle/kaggle.json
!chmod 600 ~/.kaggle/kaggle.json

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [18]:
!kaggle datasets download -d umbertogriffo/googles-trained-word2vec-model-in-python

googles-trained-word2vec-model-in-python.zip: Skipping, found more recently modified local copy (use --force to force download)


In [19]:
!unzip googles-trained-word2vec-model-in-python.zip

Archive:  googles-trained-word2vec-model-in-python.zip
replace GoogleNews-vectors-negative300.bin? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace GoogleNews-vectors-negative300.bin.gz? [y]es, [n]o, [A]ll, [N]one, [r]ename: n


## Importing required packages

In [None]:
import torch  # Import the PyTorch library for deep learning
import torch.nn as nn  # Import the neural network module from PyTorch
from torch.autograd import Variable  # Import the Variable module for automatic differentiation
import torch.optim as optim  # Import PyTorch's optimization library
import torch.nn.functional as F  # Import the functional interface of PyTorch's neural network operations

import random  # Import the random module for generating random numbers
import numpy as np  # Import NumPy for numerical operations
from collections import Counter, OrderedDict  # Import Counter and OrderedDict for data collections
import nltk  # Import NLTK, the Natural Language Toolkit
import re  # Import the regular expression library for text processing
from copy import deepcopy  # Import the deepcopy function for creating deep copies


In [21]:
torch.__version__

'2.0.1+cu118'

## Code for accessing CUDA

In [22]:
USE_CUDA = torch.cuda.is_available()
gpus = [0]
if USE_CUDA:
    torch.cuda.set_device(gpus[0])  # Select the GPU only when one exists; this call raises an error on a CPU-only machine
FloatTensor = torch.cuda.FloatTensor if USE_CUDA else torch.FloatTensor
LongTensor = torch.cuda.LongTensor if USE_CUDA else torch.LongTensor
ByteTensor = torch.cuda.ByteTensor if USE_CUDA else torch.ByteTensor

## Function to split the data into batches

In [None]:
'''
This function is useful for creating batches of data for training deep learning models in mini-batch fashion,
which is a common practice to efficiently train large datasets.


'''
def getBatch(batch_size, train_data):
    random.shuffle(train_data)  # Shuffle the training data randomly
    sindex = 0  # Initialize the start index of the batch
    eindex = batch_size  # Initialize the end index of the batch

    while eindex < len(train_data):
        batch = train_data[sindex: eindex]  # Extract a batch of data from the training data
        temp = eindex  # Store the current end index in a temporary variable
        eindex = eindex + batch_size  # Update the end index for the next batch
        sindex = temp  # Update the start index for the next batch
        yield batch  # Yield the batch for further processing

    if eindex >= len(train_data):
        batch = train_data[sindex:]  # Extract the last batch if it's smaller than batch_size
        yield batch  # Yield the last batch
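
The batching idea can be tried on a plain list without any tensors. The sketch below is an illustrative variant, not the notebook's `getBatch`: the name `iter_batches` and the `seed` parameter are assumptions, and unlike `getBatch` it shuffles a copy instead of the caller's list.

```python
import random

def iter_batches(items, batch_size, seed=None):
    """Yield shuffled mini-batches; the final batch may be smaller."""
    items = list(items)                    # copy so the caller's list keeps its order
    random.Random(seed).shuffle(items)     # seeded shuffle for reproducibility
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

batches = list(iter_batches(range(10), batch_size=4, seed=0))
batch_lengths = [len(b) for b in batches]
print(batch_lengths)  # [4, 4, 2]
```

Stepping `range` by `batch_size` removes the manual start and end index bookkeeping, and the last slice is simply shorter when the data does not divide evenly.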




## Function to add the padding to batches if required

In [None]:
'''
This function is commonly used in natural language processing tasks when working with variable-length sequences
to create batches of data with uniform dimensions, which is a requirement for many deep learning models.
'''
def pad_to_batch(batch):
    x, y = zip(*batch)  # Unzip the batch into separate lists of sequences (x) and labels (y)
    max_x = max([s.size(1) for s in x])  # Find the maximum sequence length in the batch
    x_p = []  # Initialize a list to store padded sequences

    for i in range(len(batch)):
        if x[i].size(1) < max_x:
            # If the sequence length is less than the maximum length, pad it with '<PAD>' tokens
            x_p.append(torch.cat([x[i], Variable(LongTensor([word2index['<PAD>']] * (max_x - x[i].size(1)))).view(1, -1)], 1))
        else:
            # If the sequence is already as long as the maximum, keep it as is
            x_p.append(x[i])

    # Concatenate the padded sequences and reshape the labels
    return torch.cat(x_p), torch.cat(y).view(-1)
'''
Here's how this function works:

It first unzips the batch into separate lists x (sequences) and y (labels).
It calculates the maximum sequence length (max_x) within the batch by finding the length of the longest sequence in x.
It initializes an empty list x_p to store the padded sequences.
It iterates over each sequence in the batch:
If the sequence length is less than the max_x, it pads the sequence with '<PAD>' tokens using torch.cat. This ensures that all sequences have the same length.
If the sequence is already as long as max_x, it keeps it as is.
Finally, it concatenates the padded sequences using torch.cat and reshapes the labels to be a 1D tensor.
'''
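
The padding step can be illustrated with plain Python lists. This is a simplified analogue of `pad_to_batch`, assuming the pad index is 0 (the value `word2index['<PAD>']` has in this notebook); `pad_sequences` is a hypothetical helper name.

```python
PAD_INDEX = 0  # assumed index of '<PAD>', matching word2index in this notebook

def pad_sequences(sequences, pad_index=PAD_INDEX):
    """Right-pad every sequence with pad_index up to the longest length."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_index] * (max_len - len(seq)) for seq in sequences]

padded = pad_sequences([[5, 8, 2], [7], [4, 9]])
print(padded)  # [[5, 8, 2], [7, 0, 0], [4, 9, 0]]
```

Because padding is computed per batch, each batch is only as wide as its own longest question.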

## Function to prepare the sequence

In [None]:
'''
This function is commonly used in natural language processing tasks to convert text data into a
format suitable for training deep learning models, where words are represented as numerical indices.
The use of <UNK> for unknown words is a typical practice to handle out-of-vocabulary words.
'''
def prepare_sequence(seq, to_index):
    # Map each word in the input sequence 'seq' to its corresponding index in 'to_index'
    idxs = list(map(lambda w: to_index[w] if to_index.get(w) is not None else to_index["<UNK>"], seq))
    # Create a tensor from the list of indices and wrap it in a PyTorch Variable
    return Variable(LongTensor(idxs))
'''
This Python function prepare_sequence is designed to prepare a sequence for input to a deep learning model
by converting it into a tensor of indices. It takes two arguments:

seq: The input sequence, which is a list of words.
to_index: A dictionary that maps words to their corresponding indices.
It uses the map function to apply a lambda function to each word in the input sequence (seq).

The lambda function looks up each word in the to_index dictionary and retrieves its index.
If the word is not found (i.e., to_index.get(w) is None), it uses the index corresponding to the <UNK> (unknown) token.

The resulting list of indices (idxs) represents the input sequence converted into a sequence of indices.

Finally, it creates a PyTorch LongTensor from the list of indices and wraps it in a Variable.
This is done to prepare the sequence for input to a neural network model, where each word is represented as an index.
'''
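
The lookup with an `<UNK>` fallback can also be written with `dict.get` and a default value. The sketch below uses a toy dictionary and returns a plain list rather than a tensor; `to_indices` and `toy_index` are illustrative names, not part of the notebook.

```python
def to_indices(tokens, to_index, unk_token="<UNK>"):
    """Map tokens to indices, sending unknown tokens to the <UNK> index."""
    unk = to_index[unk_token]
    return [to_index.get(tok, unk) for tok in tokens]

toy_index = {"<PAD>": 0, "<UNK>": 1, "What": 2, "is": 3}
ids = to_indices(["What", "is", "serfdom"], toy_index)
print(ids)  # [2, 3, 1]
```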

## Data load & Preprocessing

### TREC question dataset (http://cogcomp.org/Data/QA/QC/)

In [26]:
!wget http://cogcomp.org/Data/QA/QC/train_5500.label

--2023-09-12 11:39:31--  http://cogcomp.org/Data/QA/QC/train_5500.label
Resolving cogcomp.org (cogcomp.org)... 173.236.182.118
Connecting to cogcomp.org (cogcomp.org)|173.236.182.118|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.cogcomp.org/Data/QA/QC/train_5500.label [following]
--2023-09-12 11:39:32--  http://www.cogcomp.org/Data/QA/QC/train_5500.label
Resolving www.cogcomp.org (www.cogcomp.org)... 173.236.182.118
Reusing existing connection to cogcomp.org:80.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://cogcomp.seas.upenn.edu/Data/QA/QC/train_5500.label [following]
--2023-09-12 11:39:33--  https://cogcomp.seas.upenn.edu/Data/QA/QC/train_5500.label
Resolving cogcomp.seas.upenn.edu (cogcomp.seas.upenn.edu)... 158.130.57.77
Connecting to cogcomp.seas.upenn.edu (cogcomp.seas.upenn.edu)|158.130.57.77|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 335858 (328K)
Saving to: ‘

The task is to classify each question into one of 6 question types (whether the question is about a person, a location, numeric information, etc.).

## Load the data

In [27]:
data = open('train_5500.label', 'r', encoding='latin-1').readlines()

'''
open('train_5500.label', 'r', encoding='latin-1'): This part of the code opens the file named 'train_5500.label' in read mode ('r') and
specifies the encoding as 'latin-1'. The encoding='latin-1' parameter is used to specify the character encoding of the file, which is important
 for correctly reading text files with non-standard character encodings.

.readlines(): This method is called on the file object returned by open. It reads the entire content of the
file and splits it into a list of strings. Each string in the list represents a line from the file.

So, after executing this line of code, the data variable will contain a list of strings, where each string
corresponds to a line from the 'train_5500.label' file. You can then process and analyze the contents of this list as needed for your specific task.
'''

In [28]:
data[:5]

['DESC:manner How did serfdom develop in and then leave Russia ?\n',
 'ENTY:cremat What films featured the character Popeye Doyle ?\n',
 "DESC:manner How can I find a list of celebrities ' real names ?\n",
 'ENTY:animal What fowl grabs the spotlight after the Chinese Year of the Monkey ?\n',
 'ABBR:exp What is the full form of .com ?\n']

## Split the data by separating the labels

In [29]:
data = [[d.split(':')[1][:-1], d.split(':')[0]] for d in data]

'''
This code splits each line on the colon character, drops the trailing newline from the text part, and pairs
the text with its label.

for d in data: iterates over each string d in the data list.

d.split(':'): splits the string d at every colon ':'. For a line such as 'DESC:manner How did ...\n' this gives
two elements. A question that itself contains a colon would produce more, and only the first two are used.

[1]: selects the second element (index 1), the text after the first colon. As the output below shows, this text still
begins with the fine-grained label (for example 'manner'), so that word remains part of the question.

[:-1]: slices off the last character, which is the newline ('\n') at the end of each line.

[0]: selects the first element (index 0), the text before the colon. This is the coarse class label (for example 'DESC').
'''
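
For comparison, a line can be parsed so that the coarse label, the fine-grained label, and the question end up in separate fields. This is a hedged alternative, not what the notebook does: the outputs shown in this notebook were produced with the fine-grained label left inside the question text, so using this parser would change them. `parse_trec_line` is a hypothetical helper name.

```python
def parse_trec_line(line):
    """Split 'COARSE:fine question text' into its three parts."""
    label, question = line.rstrip("\n").split(" ", 1)  # 'DESC:manner', 'How did ...'
    coarse, fine = label.split(":", 1)                  # only the first colon separates the labels
    return coarse, fine, question

coarse, fine, question = parse_trec_line(
    "DESC:manner How did serfdom develop in and then leave Russia ?\n"
)
print(coarse, fine)  # DESC manner
print(question)      # How did serfdom develop in and then leave Russia ?
```

Splitting on the first space before splitting on the colon keeps any colon inside the question itself intact.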

In [30]:
data[:5]

[['manner How did serfdom develop in and then leave Russia ?', 'DESC'],
 ['cremat What films featured the character Popeye Doyle ?', 'ENTY'],
 ["manner How can I find a list of celebrities ' real names ?", 'DESC'],
 ['animal What fowl grabs the spotlight after the Chinese Year of the Monkey ?',
  'ENTY'],
 ['exp What is the full form of .com ?', 'ABBR']]

In [31]:
X, y = list(zip(*data))
X = list(X)

In [32]:
print(X[:5])
print(y[:5])

['manner How did serfdom develop in and then leave Russia ?', 'cremat What films featured the character Popeye Doyle ?', "manner How can I find a list of celebrities ' real names ?", 'animal What fowl grabs the spotlight after the Chinese Year of the Monkey ?', 'exp What is the full form of .com ?']
('DESC', 'ENTY', 'DESC', 'ENTY', 'ABBR')


## Print the labels in the data

In [33]:
set(y)

{'ABBR', 'DESC', 'ENTY', 'HUM', 'LOC', 'NUM'}

## Number masking

In [37]:
for i, x in enumerate(X):
    X[i] = re.sub(r'\d', '#', x).split()  # raw string, so \d reaches the regex engine without an invalid-escape warning

'''
This code snippet uses a loop to process each element in the X list, which contains text data, and performs the following operations for each text:

re.sub('\d', '#', x): It uses the re.sub function from the re module to substitute all digits (\d) in the text x with the # character.
This effectively replaces all digits with #.

.split(): After replacing the digits, it splits the modified text into a list of words.
The split() method without any arguments splits the text using whitespace as the delimiter, effectively breaking it into individual words.

The code uses enumerate to loop through the X list, so i represents the index of the current element, and x represents the text data at that index.
It then updates the X list with the processed version of the text.

The purpose of this code is to preprocess the text data by replacing digits with # and splitting the text into words.
This type of preprocessing is common in natural language processing tasks to prepare text data for further analysis or modeling.
'''






Every digit is replaced with # (hash).

This shrinks the vocabulary, because all numbers that share a digit pattern collapse into the same token.

For example,

my birthday is 12.22 ==> my birthday is ##.##
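
The birthday example can be reproduced directly. The snippet below is a small standalone sketch; `mask_digits` is a name chosen here for illustration.

```python
import re

def mask_digits(text):
    """Replace every digit with '#', leaving all other characters untouched."""
    return re.sub(r"\d", "#", text)

masked = mask_digits("my birthday is 12.22")
tokens = masked.split()
print(masked)  # my birthday is ##.##
print(tokens)  # ['my', 'birthday', 'is', '##.##']
```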

In [38]:
X[:2]

[['manner',
  'How',
  'did',
  'serfdom',
  'develop',
  'in',
  'and',
  'then',
  'leave',
  'Russia',
  '?'],
 ['cremat',
  'What',
  'films',
  'featured',
  'the',
  'character',
  'Popeye',
  'Doyle',
  '?']]

## Building the Vocabulary

In [39]:
def flatten(list_of_lists):
    # Helper: concatenate a list of lists into one flat list
    return [item for sublist in list_of_lists for item in sublist]

vocab = list(set(flatten(X)))
print(len(vocab))
print(vocab)

'''
These lines build a vocabulary from the tokenized questions in the X list.
flatten(X): X is a list of word lists. The flatten helper defined at the top of this cell concatenates them
into a single list containing every word from every question.

set(flatten(X)): This part of the code creates a set from the flattened list of words. A set is an unordered collection of unique elements,
so every word appears exactly once and duplicates are removed.

list(set(flatten(X))): Finally, the set is converted back into a list. The vocab variable then holds all unique words
(the vocabulary) used in the text corpus.

In natural language processing, creating a vocabulary is a common preprocessing step. It allows words to be represented
as numerical indices, which is what the embedding layer of a neural network expects as input.'''

9117
['Soviet', 'haven', 'filthiest', 'bands', 'close', 'makes', 'Garrett', 'ballad', 'Triangle', 'woo', 'hear', 'physician', 'constitute', 'Water', 'declare', 'logo', 'cans', 'HUGO', 'buy', 'Tel', 'new', 'Harriet', 'candle', 'conference', 'Hamlet', 'MGM', 'Edmund', 'petroleum', 'sleep', 'beat', 'incubate', 'short', 'KDGE', 'Johnson', 'fiddlers', 'entertainer', 'Allsburg', 'Duane', 'didn', 'find', 'referring', 'GE', 'congressional', 'contemporary', 'Major', 'Ivy', 'jazz', 'year', 'trip', 'holiday', 'Gillette', 'Hyatt', 'Yeat', 'conjugations', 'Vichy', 'Sisters', 'rolling', 'dissolved', 'preacher', 'domesticated', 'Piazza', 'signed', 'Congress', 'America', 'comparisons', 'proficient', 'officially', 'Pike', '###', 'Jesse', 'groups', 'naked', 'connection', 'disaster', 'Throat', 'battlefield', 'transport', 'Fiesta', 'hearings', 'once-removed', 'Ancient', 'abacus', 'Sicilian', 'Dwarfs', 'blush', 'frozen', 'Turner', 'nonchlorine', 'Claus', 'Linux', 'battle', 'Oswald', 'Writer', 'equator', 'i

## Check for number of classes

In [40]:
len(set(y)) # num of class

6

## Create the index to words in the vocabulary

In [41]:
word2index={'<PAD>': 0, '<UNK>': 1}
print(len(word2index))

'''
In this code snippet, you are creating a dictionary named word2index that maps words to their corresponding numerical indices.
It starts with two special tokens that are commonly used in natural language processing tasks:

'<PAD>': Typically represents padding tokens used to make sequences in a batch of equal length.
'<UNK>': Represents unknown or out-of-vocabulary tokens.
The dictionary allows you to convert words into their corresponding numerical indices, and it provides default values for padding and handling unknown words.

This kind of dictionary is essential when preparing text data for deep learning models because it enables you to represent words as numerical values,
which can be used as input to neural networks or other machine learning algorithms
'''

2


In [42]:
for vo in vocab:
    if word2index.get(vo) is None:
        word2index[vo] = len(word2index)
#print(word2index)
index2word = {v:k for k, v in word2index.items()}
#print(index2word)

'''
In this code snippet, you are expanding the `word2index` dictionary to include the words from your vocabulary `vocab` that are not already present in the dictionary. Additionally, you are creating a corresponding `index2word` dictionary for reverse lookups. Here's a breakdown of the code with comments:

- This loop iterates over each word `vo` in your vocabulary list `vocab`.
- Inside the loop, it checks whether the word `vo` exists as a key in the `word2index` dictionary using `word2index.get(vo)`.
If it doesn't exist (i.e., it returns `None`), it means the word is not in the dictionary.
- In that case, it assigns a new index to the word by using `len(word2index)`. This effectively adds the word to the dictionary with a unique numerical index.

This code is useful for creating a consistent mapping between words and their numerical indices, especially when working with text data and deep learning models.

- This line of code creates the `index2word` dictionary, which is the reverse mapping of `word2index`.
It swaps the keys and values from `word2index` so that you can easily look up words by their indices.

Both `word2index` and `index2word` dictionaries are important for mapping between words and their corresponding indices,
allowing you to convert text data into a numerical format that can be used as input to machine learning and deep learning models.'''
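
The two mappings can be built on a toy corpus as follows. This sketch sorts the vocabulary before assigning indices, which is an assumption added here and not something the notebook does: iterating over a `set` of strings can give a different order on each run, while sorting makes the indices reproducible.

```python
from itertools import chain

sentences = [["What", "is", "a", "cat", "?"], ["What", "is", "###", "?"]]
vocab = sorted(set(chain.from_iterable(sentences)))  # sorted for reproducible indices

word2index = {"<PAD>": 0, "<UNK>": 1}                # reserved special tokens
for word in vocab:
    word2index.setdefault(word, len(word2index))     # next free index for unseen words
index2word = {i: w for w, i in word2index.items()}   # reverse lookup

print(word2index)
```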

## Create the index to target

In [43]:
target2index = {}

for cl in set(y):
    if target2index.get(cl) is None:
        target2index[cl] = len(target2index)

index2target = {v:k for k, v in target2index.items()}
'''
In this code snippet, you are creating two dictionaries, `target2index` and `index2target`, to map class labels to numerical indices and vice versa.

- This loop iterates over the set of unique class labels in the `y` list (presumably, these are the labels associated with your text data).
- Inside the loop, it checks whether the class label `cl` exists as a key in the `target2index` dictionary using `target2index.get(cl)`.
If it doesn't exist (i.e., it returns `None`), it means the class label is not in the dictionary.
- In that case, it assigns a new index to the class label by using `len(target2index)`. This effectively adds the class label to the dictionary with a unique numerical index.

This code is useful for creating a consistent mapping between class labels and their numerical indices, which is essential for tasks like classification,
where you need to represent class labels as numerical values for training machine learning models.

- This line of code creates the `index2target` dictionary, which is the reverse mapping of `target2index`.
It swaps the keys and values from `target2index` so that you can easily look up class labels by their numerical indices.

Both `target2index` and `index2target` dictionaries are important for mapping between class labels and their corresponding numerical indices,
which is essential for supervised machine learning tasks like text classification.
'''

## Preparing the data in tensor format

In [44]:
X_p, y_p = [], []
for pair in zip(X,y):
    ## Create the indexes for the list of split words of questions present in X and changing to tensor format
    X_p.append(prepare_sequence(pair[0], word2index).view(1, -1))
    ## Changes the format of labels to tensor format
    y_p.append(Variable(LongTensor([target2index[pair[1]]])).view(1, -1))
'''
In this code snippet, you are creating two lists, `X_p` and `y_p`, which will hold the prepared data for your machine learning or deep learning model.
- This loop iterates over pairs of text data and labels using `zip(X, y)`. It combines elements from the `X` list (which contains text data) and
the `y` list (which contains labels) into pairs represented by `pair`.

- For each pair, it does the following:

  - `prepare_sequence(pair[0], word2index)`: It prepares the input text data (`pair[0]`) by converting it into a tensor format using the `prepare_sequence` function,
  which you defined earlier. This function maps words to their corresponding indices based on the `word2index` dictionary.
  The resulting tensor is then reshaped using `.view(1, -1)` to create a 2D tensor with one row.

  - `Variable(LongTensor([target2index[pair[1]]]))`: It converts the label (`pair[1]`) into a tensor format by first looking up the label
  in the `target2index` dictionary to get its numerical index and then creating a tensor from that index. The result is also reshaped into a 2D tensor.

Both `X_p` and `y_p` will now contain the text data and labels in tensor format, which is typically used as input for training machine learning or
deep learning models. This code prepares the data for supervised learning, where you have pairs of input data (X) and corresponding target labels (y) ready for training.
'''

## Zipping both the data and labels and shuffle randomly

In [45]:
data_p = list(zip(X_p, y_p))
random.shuffle(data_p)

'''
This code snippet prepares your data by combining the prepared input data (`X_p`) and corresponding labels (`y_p`) into pairs and then shuffling these pairs.

- `zip(X_p, y_p)`: This part of the code combines the prepared input data (`X_p`) and the corresponding labels (`y_p`) into pairs.
Each pair consists of a prepared input sequence and its corresponding label. The `zip` function effectively pairs elements from `X_p` and `y_p` element-wise.

- `list(...)`: The result of `zip(X_p, y_p)` is converted into a list. This creates a list of pairs where each pair contains a prepared input sequence and its label.

- `random.shuffle(data_p)`: After creating the list of pairs (`data_p`), this line shuffles the pairs randomly.
Shuffling the data is a common practice in machine learning to ensure that the order of the data does not introduce any biases during training.
It randomizes the order of your training examples.

After executing these lines of code, the `data_p` list will contain pairs of prepared input data and labels, and the order of these pairs will be random.
This shuffled data is typically used for training and evaluating machine learning or deep learning models.
'''

## Split the data into train and test

In [46]:
train_data = data_p[: int(len(data_p) * 0.9)]
test_data = data_p[int(len(data_p) * 0.9):]
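
The shuffle followed by a 90/10 slice can be made repeatable by seeding the random number generator. The sketch below works on toy pairs; `shuffle_and_split` and its `seed` parameter are illustrative assumptions, since the notebook itself shuffles without a fixed seed.

```python
import random

def shuffle_and_split(pairs, train_fraction=0.9, seed=0):
    """Shuffle a copy of the pairs and cut it into train and test parts."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)       # same seed, same order on every run
    cut = int(len(pairs) * train_fraction)
    return pairs[:cut], pairs[cut:]

toy_pairs = list(zip(range(20), "abcdefghijklmnopqrst"))
train_part, test_part = shuffle_and_split(toy_pairs)
print(len(train_part), len(test_part))  # 18 2
```

Without a fixed seed the test set changes between runs, so the reported accuracy can vary slightly from one execution to the next.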

## Load Pretrained word vector

In [47]:
import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

In [None]:
word2index.keys()

In [49]:
print(model['pail'].shape)
print(np.random.randn(300).shape)

(300,)
(300,)


## Get the vector corresponding to the word using the pretrained model

In [50]:
pretrained = []

for index, key in enumerate(word2index.keys()):
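    try:
        pretrained.append(model[key]) # Try to retrieve the pretrained word vector for the word 'key'
    except KeyError:
        # The word is missing from the pretrained vocabulary: fall back to a random 300-dimensional vector
        #print(index, key)
        pretrained.append(np.random.randn(300))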

pretrained_vectors = np.vstack(pretrained)
#print(pretrained)

'''
In this code snippet, you are creating a list of pretrained word vectors by looking up each word in your `word2index` dictionary and attempting to retrieve
its corresponding word vector from a pre-trained Word2Vec model. Here's what the code does with comments:

This loop iterates over the keys (words) in your `word2index` dictionary using `enumerate`. For each word (`key`), it does the following:

  - `model[key]`: It attempts to retrieve the pretrained word vector for the word `key` from the `model`, which is your pre-trained Word2Vec model.

  - If the word is found in the pre-trained model, it appends the word vector to the `pretrained` list.

  - If the word is not found in the pre-trained model (raises an exception), it appends a randomly generated word vector of dimensionality 300 to the `pretrained` list.
  This is done to handle words that are not in the pre-trained vocabulary.

- This line of code stacks the word vectors in the `pretrained` list vertically using `np.vstack`, creating a 2D NumPy array called `pretrained_vectors`.
Each row in this array represents a word vector.

The resulting `pretrained_vectors` array contains word vectors for the words in your `word2index` dictionary.
Words found in the pre-trained model have their corresponding word vectors, while words not found in the model have randomly generated word vectors.

These pretrained word vectors can be used as embeddings in various natural language processing tasks, such as text classification, sentiment analysis,
or any task where word embeddings are beneficial.
'''
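
The lookup-with-fallback pattern can be sketched without gensim or NumPy. Everything below is a toy stand-in: `toy_pretrained` is a plain dict that plays the role of the Word2Vec model, the vectors have 4 dimensions instead of 300, and `build_embedding_rows` is a hypothetical helper that also records which words were missing.

```python
import random

EMBED_DIM = 4  # toy dimensionality; the notebook's vectors have 300 dimensions

# Hypothetical stand-in for the pretrained model: a plain dict of word -> vector.
toy_pretrained = {"cat": [0.1, 0.2, 0.3, 0.4], "dog": [0.5, 0.6, 0.7, 0.8]}

def build_embedding_rows(word2index, pretrained, dim, seed=0):
    """Return one vector per index plus the list of words that needed a random vector."""
    rng = random.Random(seed)
    rows = [None] * len(word2index)
    missing = []
    for word, idx in word2index.items():
        if word in pretrained:
            rows[idx] = list(pretrained[word])
        else:
            rows[idx] = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random fallback
            missing.append(word)
    return rows, missing

toy_word2index = {"<PAD>": 0, "<UNK>": 1, "cat": 2, "axolotl": 3}
rows, missing = build_embedding_rows(toy_word2index, toy_pretrained, EMBED_DIM)
print(missing)  # ['<PAD>', '<UNK>', 'axolotl']
```

Tracking the missing words gives a quick view of how much of the vocabulary is actually covered by the pretrained vectors. Note that the special tokens also receive random vectors under this scheme.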

## Modeling

![alttxt](https://cdn.talentsprint.com/aiml/Casestudies_slides/NLP_with_CNN/NLP_with_CNN.png)


The above image is borrowed from this [paper.](http://www.aclweb.org/anthology/D14-1181)

## Define CNN classifier architecture for classification as per the paper

In [None]:
class CNNClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, output_size, kernel_dim=100, kernel_sizes=(3, 4, 5), dropout=0.5):
        super(CNNClassifier, self).__init__()

        # Embedding layer to convert word indices into dense word vectors
        self.embedding = nn.Embedding(vocab_size, embedding_dim)

        # Convolutional layers with different kernel sizes
        self.convs = nn.ModuleList([nn.Conv2d(1, kernel_dim, (K, embedding_dim)) for K in kernel_sizes])

        # Dropout layer for regularization
        self.dropout = nn.Dropout(dropout)

        # Fully connected (linear) layer for classification
        self.fc = nn.Linear(len(kernel_sizes) * kernel_dim, output_size)

    def init_weights(self, pretrained_word_vectors, is_static=False):
        # Initialize the embedding weights with pretrained word vectors
        self.embedding.weight = nn.Parameter(torch.from_numpy(pretrained_word_vectors).float())
        if is_static:
            self.embedding.weight.requires_grad = False

    def forward(self, inputs, is_training=False):
        # Embedding layer: convert input word indices to word vectors
        inputs = self.embedding(inputs).unsqueeze(1)  # Add a channel dimension

        # Convolutional and pooling layers
        inputs = [F.relu(conv(inputs)).squeeze(3) for conv in self.convs]  # Apply convolution and ReLU
        inputs = [F.max_pool1d(i, i.size(2)).squeeze(2) for i in inputs]  # Max-pooling over time

        # Concatenate the outputs of different kernel sizes
        concated = torch.cat(inputs, 1)

        if is_training:
            concated = self.dropout(concated)

        # Fully connected layer for classification
        out = self.fc(concated)

        # Apply log softmax for output probabilities
        return F.log_softmax(out, 1)
'''
The code above defines a CNN (Convolutional Neural Network) classifier for text classification using PyTorch.
This network is designed for text data and can be used for various natural language processing tasks.
Below is an explanation of the key components and methods of this `CNNClassifier` class:

- `__init__`: The constructor of the class takes several parameters to configure the network architecture, including the vocabulary size,
embedding dimension, output size (number of classes), kernel dimensions, kernel sizes, and dropout rate.
It initializes the embedding layer, convolutional layers with multiple kernel sizes, a dropout layer, and a fully connected layer.

- `init_weights`: This method initializes the embedding weights with pretrained word vectors. If `is_static` is set to `True`, it freezes the embedding weights
during training, making them non-trainable.

- `forward`: This method defines the forward pass of the network. It takes input word indices, converts them into word vectors using the embedding layer,
applies convolution and ReLU activation functions, performs max-pooling over time, concatenates the outputs of different kernel sizes, and passes the result
through a fully connected layer. The final output is obtained by applying the log softmax function for classification.

This `CNNClassifier` is designed for text classification tasks, such as sentiment analysis or document classification, and leverages convolutional
neural networks to capture important local features in the text data. It can be trained using text data and used for making predictions on new text samples.
'''
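
The tensor shapes in `forward` follow from simple arithmetic, which the sketch below works through in plain Python. It assumes stride 1 and no padding, which are the `nn.Conv2d` defaults used by the class; the helper names are illustrative.

```python
def conv_output_length(seq_len, kernel_size):
    """Number of positions a filter of height kernel_size visits over seq_len tokens."""
    return seq_len - kernel_size + 1

def feature_vector_size(kernel_sizes, kernel_dim):
    """Size of the concatenated vector after max-pooling each filter bank over time."""
    return len(kernel_sizes) * kernel_dim

seq_len = 11  # e.g. an 11-token question
lengths = [conv_output_length(seq_len, k) for k in (3, 4, 5)]
print(lengths)                               # [9, 8, 7]
print(feature_vector_size((3, 4, 5), 100))   # 300
```

Max-pooling over time reduces each of those variable lengths to a single value per filter, which is why the linear layer's input size depends only on the number of kernel sizes and `kernel_dim`, not on the question length. A question shorter than the largest kernel would give a non-positive length, and the convolution would fail on it.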

## Training the model

## Set the parameters

In [52]:
EPOCH = 5
BATCH_SIZE = 50
KERNEL_SIZES = [2,2,2]  # three banks of width-2 filters; the paper (and the class default) uses widths 3, 4, 5
KERNEL_DIM = 100
LR = 0.001

## Set up the defined CNN model and initialize the embedding matrix using pretrained vectors

In [53]:
model = CNNClassifier(len(word2index), 300, len(target2index), KERNEL_DIM, KERNEL_SIZES)
model.init_weights(pretrained_vectors) # initialize embedding matrix using pretrained vectors

In [54]:
if USE_CUDA:
    model = model.cuda()

## Define loss function and optimizer

In [55]:
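loss_function = nn.NLLLoss()  # forward() already returns log_softmax outputs, so NLLLoss is the matching loss (CrossEntropyLoss would apply log_softmax a second time)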
optimizer = optim.Adam(model.parameters(), lr=LR)

## Train the model batch-wise

In [56]:
for epoch in range(EPOCH):
    losses = []
    for i,batch in enumerate(getBatch(BATCH_SIZE, train_data)):
        inputs,targets = pad_to_batch(batch)

        model.zero_grad()
        preds = model(inputs, True)

        loss = loss_function(preds, targets)
        losses.append(loss.data.item())
        loss.backward()

        #for param in model.parameters():
        #    param.grad.data.clamp_(-3, 3)

        optimizer.step()

        if i % 100 == 0:
            print("[%d/%d] mean_loss : %0.2f" %(epoch, EPOCH, np.mean(losses)))
            losses = []

'''
In this code snippet, you are training your CNN-based text classification model for a specified number of epochs (`EPOCH`). Inside the training loop,
you are performing the following steps for each batch of training data:

1. Zeroing the gradients using `model.zero_grad()`: This step clears the gradients of the model's parameters, preparing it for a new gradient computation.

2. Forward pass: You pass the input data (`inputs`) through the model by calling `model(inputs, True)`. The `True` argument sets `is_training`,
which makes `forward` apply dropout to the concatenated features.

3. Computing the loss: The model's predictions (`preds`) and the target labels (`targets`) are used to compute the loss using the specified loss
function (`loss_function`). The loss is then appended to the `losses` list for tracking.

4. Backpropagation: The gradients are computed by calling `loss.backward()`. This step computes the gradients of the loss with respect to the
model's parameters, which are needed for gradient-based optimization.

5. Optimizer step: You update the model's parameters using the optimizer (`optimizer.step()`). The optimizer adjusts the model's weights based on
the computed gradients and the learning rate.

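6. Printing progress: Whenever `i` (the current batch index) is a multiple of 100, the mean of the losses collected since the last print is shown
and the list is reset. In the run shown here there are fewer than 100 batches per epoch, so the condition holds only at `i = 0` and each printed value
is the loss of the first batch of that epoch.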

This loop runs for each batch of training data, and the process repeats for the specified number of epochs (`EPOCH`). The model's parameters are
updated iteratively, and the loss is monitored to gauge how well the model is learning. The learning rate, loss function, and other hyperparameters
play crucial roles in determining the model's convergence and performance during training.

Note: The code includes commented-out lines for gradient clipping (`param.grad.data.clamp_(-3, 3)`), which can be useful for stabilizing training
in some cases. You can uncomment and use this code if gradient explosion is observed during training.
'''

[0/5] mean_loss : 1.93
[1/5] mean_loss : 0.46
[2/5] mean_loss : 0.15
[3/5] mean_loss : 0.07
[4/5] mean_loss : 0.05
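
The log above has exactly one line per epoch, which follows from the batch count and the `i % 100 == 0` check. The sketch below makes that arithmetic explicit; the figure of 4,900 training examples is a round hypothetical number chosen for illustration, and `logging_steps` is not part of the notebook.

```python
import math

def logging_steps(n_examples, batch_size, every=100):
    """Return the number of batches per epoch and the batch indices that trigger a print."""
    n_batches = math.ceil(n_examples / batch_size)
    return n_batches, [i for i in range(n_batches) if i % every == 0]

n_batches, steps = logging_steps(n_examples=4900, batch_size=50)
print(n_batches, steps)  # 98 [0]

_, denser_steps = logging_steps(n_examples=4900, batch_size=50, every=25)
print(denser_steps)      # [0, 25, 50, 75]
```

A smaller interval, or printing once at the end of each epoch, would give a mean over many batches instead of a single-batch value.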


## Predict the test data with the trained model and calculate the test accuracy

In [57]:
accuracy = 0
for test in test_data:
    pred = model(test[0]).max(1)[1]  # test[0] was built with LongTensor, so it is already on the model's device
    pred = pred.data.tolist()[0] #pred is converted from a PyTorch tensor to a Python list to extract the predicted class index.
    target = test[1].data.tolist()[0][0] #target is similarly extracted from the test example to get the true class index.
    if pred == target:
        accuracy += 1

print(accuracy/len(test_data) * 100)

'''
For each test example, its input (test[0]) is passed through the model in a forward pass. No explicit .cuda() call is needed, because the
tensors were built with LongTensor, which already places them on the GPU when one is available. model(test[0]) returns the log-probability
of each class, .max(1) finds the largest value along the class dimension, and [1] extracts the index of the predicted class.
Dropout is not applied here because is_training defaults to False.
'''

98.35164835164835
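
The accuracy figure is the share of test questions whose highest-scoring class matches the true label. The sketch below reproduces that computation on made-up log-probabilities in plain Python; the numbers are illustrative only and unrelated to the result above.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)

def accuracy_percent(log_probs_per_example, targets):
    """Percentage of examples whose arg-max class equals the target class."""
    correct = sum(argmax(lp) == t for lp, t in zip(log_probs_per_example, targets))
    return 100.0 * correct / len(targets)

toy_log_probs = [
    [-0.1, -2.5, -3.0],   # predicts class 0
    [-1.9, -0.3, -2.2],   # predicts class 1
    [-0.7, -1.2, -1.5],   # predicts class 0
    [-2.0, -1.8, -0.4],   # predicts class 2
]
toy_targets = [0, 1, 2, 2]
acc = accuracy_percent(toy_log_probs, toy_targets)
print(acc)  # 75.0
```

Because the logarithm is monotonic, taking the arg-max of log-probabilities selects the same class as taking the arg-max of probabilities.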
