# CE7455 Assignment 2

Name: PENG HONGYI <br>
Matric No: G2105029E

## Quetion One (i)

__Named Entity Recognition__ (NER) as an important task in NLP, attemps to classify predefined entities in a sentence. In our assignment, we use _eng.train_ for training, _eng.testa_ for validation and _eng.testb_ for testing.  

A sentence in the dataset is presented below, where the first columns is the input word and the last column is the output tag. The dataset contains four different types of predefined entities: PERSON, LOCATION, ORGANIZATION, and MISC. As shown in the fourth column, the fourth column contains the groudtruth entity name and the BIO tag, seperated by '-'.

| Word    |     |      | Tag    |
|---------|-----|------|--------|
| EU      | NNP | I-NP | I-ORG  |
| rejects | VBZ | I-VP | O      |
| German  | JJ  | I-NP | I-MISC |
| call    | NN  | I-NP | O      |
| to      | TO  | I-VP | O      |
| boycott | VB  | I-VP | O      |
| British | JJ  | I-NP | I-MISC |
| lamb    | NN  | I-NP | O      |
| .       | .   | O    | O      |

## Question one (ii)


There are 5 preprocessing steps in the code provided:
* Replacing all the digit with 0.
* Convert BIO tagging to BIOES tagging.
* Generate words mapping.
* Generate tag mapping.
* Generate chracter mapping.
Mappings here are dictionaries that assign an integer id to every unique word, character and tag. After the proprocessing step, we found: 

> Found 17493 unique words (203621 in total) 
Found 75 unique characters 
Found 19 unique named entity tags

The preprocessed dataset is stored in _data/mapping.pkl_. To save time, we will directly processed data throughout this assignment.



In [1]:
# Load data
import pickle
with open('data/mapping.pkl', 'rb') as f:
    mapping = pickle.load(f)
list(mapping.keys())

['word_to_id', 'tag_to_id', 'char_to_id', 'parameters', 'word_embeds']

In [2]:
word_to_id = mapping['word_to_id']
tag_to_id = mapping['tag_to_id']
char_to_id = mapping['char_to_id']
# We use our own parameters
# parameters = mapping['parameters']
word_embeds = mapping['word_embeds']

In [3]:
from collections import OrderedDict
import torch
parameters = OrderedDict()
parameters['train'] = "./data/eng.train" #Path to train file
parameters['dev'] = "./data/eng.testa" #Path to test file
parameters['test'] = "./data/eng.testb" #Path to dev file
parameters['tag_scheme'] = "BIOES" #BIO or BIOES
parameters['lower'] = True # Boolean variable to control lowercasing of words
parameters['zeros'] =  True # Boolean variable to control replacement of  all digits by 0 
parameters['char_dim'] = 30 #Char embedding dimension
parameters['word_dim'] = 100 #Token embedding dimension
parameters['word_lstm_dim'] = 200 #Token LSTM hidden layer size
parameters['word_bidirect'] = True #Use a bidirectional LSTM for words
parameters['embedding_path'] = "./data/glove.6B.100d.txt" #Location of pretrained embeddings
parameters['all_emb'] = 1 #Load all embeddings
parameters['crf'] =1 #Use CRF (0 to disable)
parameters['dropout'] = 0.5 #Droupout on the input (0 = no dropout)
parameters['epoch'] =  50 #Number of epochs to run"
parameters['weights'] = "" #path to Pretrained for from a previous run
parameters['name'] = "self-trained-model" # Model name
parameters['gradient_clip']=5.0
parameters['char_mode']="CNN"
models_path = "./models/" #path to saved models
parameters['use_gpu'] = torch.cuda.is_available() #GPU Check
use_gpu = parameters['use_gpu']
parameters['reload'] = "./models/pre-trained-model" 

In [4]:
from Utils import load_sentences
from TagConversion import update_tag_scheme
train_sentences = load_sentences(parameters['train'], parameters['zeros'])
test_sentences = load_sentences(parameters['test'], parameters['zeros'])
val_sentences = load_sentences(parameters['dev'], parameters['zeros'])
update_tag_scheme(train_sentences, parameters['tag_scheme'])
update_tag_scheme(val_sentences, parameters['tag_scheme'])
update_tag_scheme(test_sentences, parameters['tag_scheme'])


In [5]:
from Utils import prepare_dataset

train_data = prepare_dataset(
    train_sentences, word_to_id, char_to_id, tag_to_id, parameters['lower']
)
val_data = prepare_dataset(
    val_sentences, word_to_id, char_to_id, tag_to_id, parameters['lower']
)
test_data = prepare_dataset(
    test_sentences, word_to_id, char_to_id, tag_to_id, parameters['lower']
)
print("{} / {} / {} sentences in train / val / test.".format(len(train_data), len(val_data), len(test_data)))

14041 / 3250 / 3453 sentences in train / val / test.


In [6]:
from BaseModel import BiLSTM_CRF
model = BiLSTM_CRF(vocab_size=len(word_to_id),
                   tag_to_ix=tag_to_id,
                   embedding_dim=parameters['word_dim'],
                   hidden_dim=parameters['word_lstm_dim'],
                   use_gpu=use_gpu,
                   char_to_ix=char_to_id,
                   pre_word_embeds=word_embeds,
                   use_crf=parameters['crf'],
                   char_mode=parameters['char_mode'])

  nn.init.uniform(input_embedding, -bias, bias)
  nn.init.uniform(weight, -sampling_range, sampling_range)
  nn.init.uniform(weight, -sampling_range, sampling_range)
  nn.init.uniform(weight, -sampling_range, sampling_range)
  nn.init.uniform(weight, -sampling_range, sampling_range)
  nn.init.uniform(input_linear.weight, -bias, bias)


In [7]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

In [8]:
num_parameters = count_parameters(model)
print(f'The base model contain {num_parameters} parameters')

The base model contain 2284255 parameters


In [9]:
model.load_state_dict(torch.load(parameters['reload']))
print("model reloaded :", parameters['reload'])
if use_gpu:
    model.cuda()

model reloaded : ./models/pre-trained-model


In [10]:
import numpy as np
from torch.autograd import Variable
from Helper import get_chunks
from tqdm import tqdm


def evaluate(model, datas):
    prediction = []
    correct_preds, total_correct, total_preds = 0., 0., 0.
    for data in tqdm(datas, total=len(datas)):
        ground_truth_id = data['tags']
        words = data['str_words']
        chars2 = data['chars']
        if parameters['char_mode'] == 'LSTM':
            chars2_sorted = sorted(chars2, key=lambda p: len(p), reverse=True)
            d = {}
            for i, ci in enumerate(chars2):
                for j, cj in enumerate(chars2_sorted):
                    if ci == cj and not j in d and not i in d.values():
                        d[j] = i
                        continue
            chars2_length = [len(c) for c in chars2_sorted]
            char_maxl = max(chars2_length)
            chars2_mask = np.zeros(
                (len(chars2_sorted), char_maxl), dtype='int')
            for i, c in enumerate(chars2_sorted):
                chars2_mask[i, :chars2_length[i]] = c
            chars2_mask = Variable(torch.LongTensor(chars2_mask))

        if parameters['char_mode'] == 'CNN':
            d = {}
            chars2_length = [len(c) for c in chars2]
            char_maxl = max(chars2_length)
            chars2_mask = np.zeros(
                (len(chars2_length), char_maxl), dtype='int')
            for i, c in enumerate(chars2):
                chars2_mask[i, :chars2_length[i]] = c
            chars2_mask = Variable(torch.LongTensor(chars2_mask))

        dwords = Variable(torch.LongTensor(data['words']))
        if use_gpu:
            val, out = model(
                dwords.cuda(), chars2_mask.cuda(), chars2_length, d)
        else:
            val, out = model(dwords, chars2_mask, chars2_length, d)
        predicted_id = out
        lab_chunks = set(get_chunks(ground_truth_id, tag_to_id))
        lab_pred_chunks = set(get_chunks(predicted_id,
                                         tag_to_id))

        correct_preds += len(lab_chunks & lab_pred_chunks)
        total_preds += len(lab_pred_chunks)
        total_correct += len(lab_chunks)

    p = correct_preds / total_preds if correct_preds > 0 else 0
    r = correct_preds / total_correct if correct_preds > 0 else 0
    F = 2 * p * r / (p + r) if correct_preds > 0 else 0
    return F

In [16]:
f_score = evaluate(model, test_data)
print(f'Original Model Test F1-Score: {f_score}')

100%|██████████| 3453/3453 [03:46<00:00, 15.23it/s]

Original Model Test F1-Score: 0.8401554170055119





We load the trained model provided at https://github.com/TheAnig/NER-LSTM-CNN-Pytorch/raw/master/trained-model-cpu and evaluate it performance on the test set. The test f1-score is:
> trained model: __0.84__

## Question one (iii)

Either an CNN and an LSTM can be used to perform character-level encoding
In the provided code, the CNN is declared as 
```
char_cnn = nn.Conv2d(in_channels=1, out_channels=self.out_channels, kernel_size=(3, char_embedding_dim), padding=(2,0))
```
Whereas, the LSTM is defined as
```
char_lstm = nn.LSTM(char_embedding_dim, char_lstm_dim, num_layers=1, bidirectional=True)
                init_lstm(self.char_lstm)
```
No matter what characte-level encoder is used, the extracted character-level representation will be contactenated with higher-level word embeddings and be fed into a Bidirectional lstm. However, for different encoder, the input dimension for the higher-level LSTM are different. If CNN is used, the input dimension is word_embedding_dim + out_channeles. If LSTM is used, the input dimension is word_embedding_dim + char_lstm_dim * 2

## Question one (iV)
As mentioned in the previous section, word embeddings, contactenated with the characte level embedding, are fed into an LSTM.
In this section, we will replace the LSTM with CNN.

First of all, let's check the output dimension of the word-level lstm

In [11]:
with torch.no_grad():
    x = torch.randn(8, 1, 125).cuda()
    lstm = model.lstm
    print(lstm)
    lstm_y, _ = lstm(x)
    print(lstm_y.shape)

LSTM(125, 200, bidirectional=True)
torch.Size([8, 1, 400])


Clearly, the input of the word-level lstm has shape $(L, N, H_{emb})$ where $L$ is the length of sequences, in our case the length of the input sentences, $N$ is the number of samples in the mini-batch. The provided code adopts single-batch training. Thus, $N=1$. $H_{emb}$ is the dimension of input embedding. In our case, $H_{emb} = H_{word} + H_{char} = 100 + 25 = 125$. The output of word-level lstm has shape $(L, N, 2*H_{ltsm})$ since our lstm is bi-directional.(we set $H_{lstm}=200$.)

__nn.Conv1d()__ takes a $(N, C_{in}, L_{in})$ tensor as input. To make sure that the output dimension is the same after replacing lstm with cnn, we need to transpose the input tensor, and set the output channel $C_{out}$ = 400. Moreover, we set the kernal size equals to 3, if no padding added, the output of cnn will be $(N, C_{out}, L_{in}-3+1)$. In order to not to change $L$, the padding is set to 1.

In [12]:
from torch import nn
KERNAL_SIZE = 3
with torch.no_grad():
    cnn = nn.Conv1d(in_channels=125, out_channels=400, kernel_size=3, padding=1).cuda()
    x = x.squeeze()
    x_t = x.transpose(0, 1)
    x_t = x_t.unsqueeze(0)
    print('Input: ', x_t.shape)
    y = cnn(x_t)
    print('Output:', y.shape)

Input:  torch.Size([1, 125, 8])
Output: torch.Size([1, 400, 8])


In [13]:
lstm_y= lstm_y.view(8, -1)
y = y.squeeze_().t()
print(lstm_y.shape == y.shape)

True


To replace the lstm layer, we define a new class that inherit the provided model and modify the __get_lstm_features()__ function. Although, it should be called __get_cnn_features()__ now

In [14]:
class OneCNN_WordEncoderModel(BiLSTM_CRF):
    def __init__(self, char_lstm_dim=25, *args, **kwargs):
        self.char_lstm_dim = char_lstm_dim
        super().__init__(*args, **kwargs)
        # Init word-level cnn
        self.word_cnn = nn.Conv1d(
            in_channels=self.embedding_dim + self.out_channels,
            out_channels=2*self.hidden_dim,
            kernel_size=KERNAL_SIZE,
            padding=1    
        )
        
        
        
    def get_lstm_features(self, sentence, chars2, chars2_length, d):
        if self.char_mode == 'LSTM':
            chars_embeds = self.char_embeds(chars2).transpose(0, 1)
            packed = torch.nn.utils.rnn.pack_padded_sequence(
                chars_embeds, chars2_length)
            lstm_out, _ = self.char_lstm(packed)
            outputs, output_lengths = torch.nn.utils.rnn.pad_packed_sequence(
                lstm_out)
            outputs = outputs.transpose(0, 1)
            chars_embeds_temp = Variable(torch.FloatTensor(
                torch.zeros((outputs.size(0), outputs.size(2)))))
            if self.use_gpu:
                chars_embeds_temp = chars_embeds_temp.cuda()
            for i, index in enumerate(output_lengths):
                chars_embeds_temp[i] = torch.cat(
                    (outputs[i, index-1, :self.char_lstm_dim], outputs[i, 0, self.char_lstm_dim:]))
            chars_embeds = chars_embeds_temp.clone()
            for i in range(chars_embeds.size(0)):
                chars_embeds[d[i]] = chars_embeds_temp[i]
        if self.char_mode == 'CNN':
            chars_embeds = self.char_embeds(chars2).unsqueeze(1)
            chars_cnn_out3 = self.char_cnn3(chars_embeds)
            chars_embeds = nn.functional.max_pool2d(chars_cnn_out3,
                                                    kernel_size=(chars_cnn_out3.size(2), 1)).view(chars_cnn_out3.size(0), self.out_channels)
        embeds = self.word_embeds(sentence)
        embeds = torch.cat((embeds, chars_embeds), 1)
        # embeds = embeds.unsqueeze(1)
        embeds = self.dropout(embeds)
        
        # Orignal Code for LSTM Features 
        #################################################################
        # Word lstm
            # lstm_out, _ = self.lstm(embeds)
            # print('lstm-out', lstm_out.shape)

        # Reshaping the outputs from the lstm layer
            # lstm_out = lstm_out.view(len(sentence), self.hidden_dim*2)
            # print('lstm_out view change', embeds.shape)

        # Dropout on the lstm output
            # lstm_out = self.dropout(lstm_out)

        # Linear layer converts the ouput vectors to tag space
        #################################################################
        # Our Code for CNN Features 
        # The embed size now is (L, 125)
        embeds = embeds.transpose(0, 1)
        # embede size: (125, L)
        embeds = embeds.unsqueeze(0)
        # embdes size: (1, 125, L)
        embeds = self.word_cnn(embeds)
        # embedes size: (1, 400, L)
        cnn_out = embeds.squeeze_().t()
        # embedes size: (400, L) => (L, 400)
        lstm_feats = self.hidden2tag(cnn_out)
        return lstm_feats


In [18]:
WordCNNmodel = OneCNN_WordEncoderModel(vocab_size=len(word_to_id),
                   tag_to_ix=tag_to_id,
                   embedding_dim=parameters['word_dim'],
                   hidden_dim=parameters['word_lstm_dim'],
                   use_gpu=use_gpu,
                   char_to_ix=char_to_id,
                   pre_word_embeds=word_embeds,
                   use_crf=parameters['crf'],
                   char_mode=parameters['char_mode'])

Let's see the total number of parameters in our model

In [16]:
count_parameters(WordCNNmodel)

2434655

Now we declare the training function and the optimizer

In [19]:
def adjust_learning_rate(optimizer, lr):
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

In [25]:
def train(num_epochs, model, data, save_dir = 'models/cnn-word.pt'):
    model.train(True)
    model.cuda()
    losses = []
    best_valid = 0.
    stop_count = 5
    valid_scores = []
    learning_rate = 0.015
    momentum = 0.9
    decay_rate = 0.05
    gradient_clip = parameters['gradient_clip']
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)
    for epoch in range(1, num_epochs+1):
        loss = 0
        print(f'Epoch {epoch} training starts')
        p_bar = tqdm(enumerate(np.random.permutation(len(train_data))), total=len(train_data))
        for i, index in p_bar:
            data = train_data[index]
            model.zero_grad()
            sentence_in = data['words']
            sentence_in = Variable(torch.LongTensor(sentence_in))
            tags = data['tags']
            chars2 = data['chars']
            if parameters['char_mode'] == 'LSTM':
                chars2_sorted = sorted(chars2, key=lambda p: len(p), reverse=True)
                d = {}
                for i, ci in enumerate(chars2):
                    for j, cj in enumerate(chars2_sorted):
                        if ci == cj and not j in d and not i in d.values():
                            d[j] = i
                            continue
                chars2_length = [len(c) for c in chars2_sorted]
                char_maxl = max(chars2_length)
                chars2_mask = np.zeros((len(chars2_sorted), char_maxl), dtype='int')
                for i, c in enumerate(chars2_sorted):
                    chars2_mask[i, :chars2_length[i]] = c
                chars2_mask = Variable(torch.LongTensor(chars2_mask))
            
            if parameters['char_mode'] == 'CNN':
                d = {}
                chars2_length = [len(c) for c in chars2]
                char_maxl = max(chars2_length)
                chars2_mask = np.zeros((len(chars2_length), char_maxl), dtype='int')
                for i, c in enumerate(chars2):
                    chars2_mask[i, :chars2_length[i]] = c
                chars2_mask = Variable(torch.LongTensor(chars2_mask))
            targets = torch.LongTensor(tags)
            neg_log_likelihood = model.neg_log_likelihood(sentence_in.cuda(), targets.cuda(), chars2_mask.cuda(), chars2_length, d)
            loss = neg_log_likelihood.data / len(data['words'])
            losses.append(loss)
            p_bar.set_postfix_str(f'Loss {loss}')
            neg_log_likelihood.backward()
            torch.nn.utils.clip_grad_norm(model.parameters(), gradient_clip)
            optimizer.step()
        
        # One epoch adjust lr
        adjust_learning_rate(optimizer, lr=learning_rate/(1+decay_rate))
        
       
        print(f'Epoch {epoch} Validation starts')
        valid_score = evaluate(model, val_data)
        valid_scores.append(valid_score)
        print(f'Current valid score {valid_score:.3f} | Best valid  score {best_valid:.3f} ')
        if valid_score > best_valid:
            print(f'Saving to {save_dir}')
            torch.save(model.state_dict(), save_dir)
            best_valid = valid_score
            stop_count = 0
        else:
            print(f'No improvement for {stop_count} epoch')
            stop_count += 1
        if stop_count == 5:
            break

Due to time and computation constraints, we only train our model for 20 epochs with early stop

In [28]:
train(20, WordCNNmodel, train_data)

Epoch 1 training starts


  torch.nn.utils.clip_grad_norm(model.parameters(), gradient_clip)
100%|██████████| 14041/14041 [10:06<00:00, 23.14it/s, Loss 9.15527380129788e-06]   


Epoch 1 Validation starts


100%|██████████| 3250/3250 [04:04<00:00, 13.28it/s]


Current valid score 0.828 | Best valid  score 0.000 
Saving to models/cnn-word.pt
Epoch 2 training starts


100%|██████████| 14041/14041 [10:11<00:00, 22.96it/s, Loss 9.482247696723789e-05]  


Epoch 2 Validation starts


100%|██████████| 3250/3250 [04:05<00:00, 13.24it/s]


Current valid score 0.853 | Best valid  score 0.000 
Saving to models/cnn-word.pt
Epoch 3 training starts


100%|██████████| 14041/14041 [10:10<00:00, 22.99it/s, Loss 8.803147647995502e-05]  


Epoch 3 Validation starts


100%|██████████| 3250/3250 [04:04<00:00, 13.30it/s]


Current valid score 0.860 | Best valid  score 0.000 
Saving to models/cnn-word.pt
Epoch 4 training starts


100%|██████████| 14041/14041 [10:12<00:00, 22.94it/s, Loss 0.003604888916015625]   


Epoch 4 Validation starts


100%|██████████| 3250/3250 [03:43<00:00, 14.52it/s] 


Current valid score 0.873 | Best valid  score 0.000 
Saving to models/cnn-word.pt
Epoch 5 training starts


100%|██████████| 14041/14041 [10:10<00:00, 23.01it/s, Loss 1.811981201171875e-05]  


Epoch 5 Validation starts


100%|██████████| 3250/3250 [04:04<00:00, 13.27it/s]


Current valid score 0.870 | Best valid  score 0.000 
Saving to models/cnn-word.pt
Epoch 6 training starts


  1%|          | 145/14041 [00:06<10:18, 22.47it/s, Loss 0.001172614865936339]   


KeyboardInterrupt: 

The training is very slow (10 mins per epoch, 20 epochs will take more than 3 hours, so we interrupt the training). Since this is just an assignment which doesn't require comparing our model with the state-of-the-art. Later on, we only train 5 epochs for each model. As shown in the cell above, even we only train for 5 epochs, our model acheives decent performance (0.873 f1-score) on validation sets.

Now we evaluate its performance on the test set

In [55]:
WordCNNmodel.load_state_dict(torch.load('models/cnn-word.pt'))
WordCNNmodel = WordCNNmodel.cuda()
test_score = evaluate(WordCNNmodel, test_data)
print('CNN Word-level & CNN Char-level: ', test_score)

100%|██████████| 3453/3453 [03:37<00:00, 15.85it/s]

CNN Word-level & CNN Char-level:  0.8163854552016602





## Question one (v)
In the section, we compare the performance of our previous "CNN-Char-CNN-Word" model with "LSTM-CHAR-CNN-Word" model. To do so, we change the parameters['char_mode'] to 'lstm'

In prvevious sections, the out channels for CNN-Char-Encoder is 25. For a fair comparison, we also set the __char_lstm_dim__=25.

In [21]:
parameters['char_mode'] = 'LSTM'

In [22]:
CharLSTMWordCNNmodel = OneCNN_WordEncoderModel(
                   char_lstm_dim=25,
                   vocab_size=len(word_to_id),
                   tag_to_ix=tag_to_id,
                   embedding_dim=parameters['word_dim'],
                   hidden_dim=parameters['word_lstm_dim'],
                   use_gpu=use_gpu,
                   char_to_ix=char_to_id,
                   pre_word_embeds=word_embeds,
                   use_crf=parameters['crf'],
                   char_mode='LSTM')

In [23]:
count_parameters(CharLSTMWordCNNmodel)

2483155

In [26]:
train(5, CharLSTMWordCNNmodel, train_data, save_dir='models/lstm-char-cnn-word.pt')

Epoch 1 training starts


  torch.nn.utils.clip_grad_norm(model.parameters(), gradient_clip)
100%|██████████| 14041/14041 [15:25<00:00, 15.17it/s, Loss 0.0014972686767578125]  


Epoch 1 Validation starts


100%|██████████| 3250/3250 [03:13<00:00, 16.77it/s]


Current valid score 0.824 | Best valid  score 0.000 
Saving to models/cnn-char-cnn-word.pt
Epoch 2 training starts


100%|██████████| 14041/14041 [15:24<00:00, 15.19it/s, Loss 0.008248466067016125]   


Epoch 2 Validation starts


100%|██████████| 3250/3250 [03:35<00:00, 15.12it/s]


Current valid score 0.859 | Best valid  score 0.824 
Saving to models/cnn-char-cnn-word.pt
Epoch 3 training starts


100%|██████████| 14041/14041 [15:23<00:00, 15.20it/s, Loss 0.002460055984556675]   


Epoch 3 Validation starts


100%|██████████| 3250/3250 [04:32<00:00, 11.94it/s]


Current valid score 0.864 | Best valid  score 0.859 
Saving to models/cnn-char-cnn-word.pt
Epoch 4 training starts


100%|██████████| 14041/14041 [15:32<00:00, 15.05it/s, Loss 0.008756637573242188]   


Epoch 4 Validation starts


100%|██████████| 3250/3250 [04:22<00:00, 12.40it/s]


Current valid score 0.878 | Best valid  score 0.864 
Saving to models/cnn-char-cnn-word.pt
Epoch 5 training starts


100%|██████████| 14041/14041 [15:22<00:00, 15.22it/s, Loss 3.24249267578125e-05]   


Epoch 5 Validation starts


100%|██████████| 3250/3250 [04:24<00:00, 12.30it/s]


Current valid score 0.879 | Best valid  score 0.878 
Saving to models/cnn-char-cnn-word.pt


In [27]:
CharLSTMWordCNNmodel.load_state_dict(torch.load('models/lstm-char-cnn-word.pt'))
CharLSTMWordCNNmodel = CharLSTMWordCNNmodel.cuda()
test_score = evaluate(CharLSTMWordCNNmodel, test_data)
print('CNN Word-level & LSTM Char-level: ', test_score)

100%|██████████| 3453/3453 [03:45<00:00, 15.33it/s]

CNN Word-level & LSTM Char-level:  0.8307967770814682





We present the resuts at the table below. Char-LSTM-Word-CNN achieves better f1-score with more parameters.

| Model              | Parameters | F1-Score |
|--------------------|------------|----------|
| Char-CNN-Word-CNN  | 2434655    | 0.8163   |
| Char-LSTM-Word-CNN | 2483155    | __0.8307__   |

## Question One (vi)
In this section, we increase the number of CNN layer for word-level encoder.
Let's try a 3-layer-CNN for word encoding. First of all let's switch back to the CNN-based character-level encoder.

In [None]:
parameters['char_mode'] = 'CNN'

In [None]:
class MultiLayerCNN_WordEncoderModel(OneCNN_WordEncoderModel):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Init word-level cnn
        
        cnns = [
            nn.Conv1d(
            in_channels=self.embedding_dim + self.out_channels,
            out_channels=2*self.hidden_dim,
            kernel_size=KERNAL_SIZE,
            padding=1), 
            nn.Conv1d(
            in_channels=self.embedding_dim + self.out_channels,
            out_channels=2*self.hidden_dim,
            kernel_size=KERNAL_SIZE,
            padding=1), 
            nn.Conv1d(
            in_channels=self.embedding_dim + self.out_channels,
            out_channels=2*self.hidden_dim,
            kernel_size=KERNAL_SIZE,
            padding=1), 
            
        ]
        self.word_cnn = nn.Sequential(*cnns)