# NLP Assignment #4
### by Prodromos Kampouridis MTN2203

#### IMPORTANT NOTE
##### *Because the code is long, the answers to tasks 1-5 are also provided as markdown cells below.*


##### *For more detailed information, please refer to the report entitled PRODROMOS KAMPOURIDIS REPORT*.

## A. TRANSITION-BASED DEPENDENCY PARSER

### Introduction

In this part of the assignment we explore a transition-based dependency parsing model built on the feed-forward neural network architecture of Chen & Manning (2014). The model detects unlabeled dependencies, meaning that the type of each dependency is not taken into account. As part of the assignment, we explore several modifications to the model's architecture and their effect on its performance, measured with the UAS (unlabeled attachment score) evaluation metric.
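For reference, UAS is the percentage of tokens whose predicted head matches the gold head. The following is a minimal sketch of that computation, assuming each sentence is represented as a list of head indices; the function name and the data layout are illustrative and are not taken from the assignment code.

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Return UAS as a percentage over all tokens in all sentences.

    Each argument is a list of sentences, and each sentence is a list of
    head indices (0 stands for ROOT). This layout is an assumption made
    for illustration only.
    """
    correct = 0
    total = 0
    for gold, pred in zip(gold_heads, predicted_heads):
        if len(gold) != len(pred):
            raise ValueError("gold and predicted sentences differ in length")
        # Count tokens whose predicted head equals the gold head.
        correct += sum(g == p for g, p in zip(gold, pred))
        total += len(gold)
    return 100.0 * correct / total if total else 0.0


# Toy example: 4 of the 5 tokens receive the correct head.
gold = [[2, 0, 2], [0, 1]]
pred = [[2, 0, 1], [0, 1]]
print(f"UAS: {unlabeled_attachment_score(gold, pred):.2f}")  # UAS: 80.00
```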


We have modified the `parser_model.py` and `utils/parser_utils.py` files to implement the changes described in the questions. Moreover, we have enriched the `run.py` script with additional arguments, which control the behavior we want to achieve in each question.
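The flags used in the answers below could be wired up roughly as follows. This is a sketch rather than the actual contents of `run.py`: the flag names and the keyword names are the ones quoted in this report, while the helper functions `build_arg_parser` and `settings_from_args`, and the `relu` default, are assumptions made for illustration.

```python
import argparse


def build_arg_parser():
    """Build a parser exposing the four experiment switches (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Train the dependency parser")
    parser.add_argument("--disable_pretrained_embeddings", action="store_true",
                        help="initialize all embeddings randomly")
    parser.add_argument("--disable_pos_features", action="store_true",
                        help="do not extract POS features")
    parser.add_argument("--enable_extra_hidden", action="store_true",
                        help="add one extra 100-D hidden layer")
    parser.add_argument("--activation_function", choices=["relu", "cube"],
                        default="relu",  # assumed default
                        help="activation applied after each hidden layer")
    return parser


def settings_from_args(args):
    """Translate the flags into the keyword values described in the answers."""
    return {
        "use_pretrained_word_embeddings": not args.disable_pretrained_embeddings,
        "use_pos_features": not args.disable_pos_features,
        "extra_hidden_size": 100 if args.enable_extra_hidden else None,
        "activation_func": args.activation_function,
    }


args = build_arg_parser().parse_args(
    ["--disable_pos_features", "--activation_function=cube"]
)
print(settings_from_args(args))
```

Using `store_true` switches keeps the plain `python run.py` invocation equivalent to the original experiment, since every flag defaults to the unmodified behavior.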

## Answers

### 1.

Firstly, we run the experiment in its original form, by executing `run.py` without any flags.

Command:
```
python run.py
```

The model's performance on the test set is:

test UAS: 89.19

In [None]:
!python run.py

INITIALIZING
Loading data...
took 4.20 seconds
Building parser...
took 3.23 seconds
Loading pretrained embeddings...
took 7.02 seconds
Vectorizing data...
took 1.25 seconds
Preprocessing training data...
took 43.36 seconds
took 0.10 seconds

TRAINING
Epoch 1 out of 10
100% 1848/1848 [02:09<00:00, 14.27it/s]
Average Train Loss: 0.18183151351134758
Evaluating on dev set
1445850it [00:00, 25382599.14it/s]
- dev UAS: 84.35
New best dev UAS! Saving model.

Epoch 2 out of 10
100% 1848/1848 [02:06<00:00, 14.63it/s]
Average Train Loss: 0.11481908756384859
Evaluating on dev set
1445850it [00:00, 23382549.80it/s]
- dev UAS: 86.08
New best dev UAS! Saving model.

Epoch 3 out of 10
100% 1848/1848 [02:05<00:00, 14.76it/s]
Average Train Loss: 0.10064809365818898
Evaluating on dev set
1445850it [00:00, 48713817.59it/s]
- dev UAS: 87.15
New best dev UAS! Saving model.

Epoch 4 out of 10
100% 1848/1848 [02:08<00:00, 14.42it/s]
Average Train Loss: 0.0917026315194865
Evaluating on dev set
1445850it [00:0

### 2.
Next, we run the experiment with random word embeddings instead of pretrained ones. To achieve this, we execute the `run.py` script with the flag `--disable_pretrained_embeddings`. Consequently, `__main__` calls the `load_and_preprocess_data` function with `use_pretrained_word_embeddings=False`. Inside `load_and_preprocess_data`, we have modified the following snippet:

```
embeddings_matrix = np.asarray(np.random.normal(0, 0.9, (parser.n_tokens, 50)), dtype='float32')

if use_pretrained_word_embeddings:
    for token in parser.tok2id:
        i = parser.tok2id[token]
        if token in word_vectors:
            embeddings_matrix[i] = word_vectors[token]
        elif token.lower() in word_vectors:
            embeddings_matrix[i] = word_vectors[token.lower()]
```

In this snippet, `embeddings_matrix` is first initialized with random vectors. Then, for each token in the vocabulary, if a pretrained embedding exists for that token (or for its lowercase form), the corresponding random vector is replaced by the pretrained vector. Since POS tags do not appear in the `word_vectors` vocabulary, they remain randomly initialized. We modified the code by adding the `if use_pretrained_word_embeddings` condition. When this condition is `False`, the pretrained embeddings are skipped and all embeddings (word and POS) stay randomly initialized.
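The lookup order described above (exact match first, lowercase form second, otherwise keep the random vector) can be isolated in a small helper. The sketch below uses toy data: the helper name `lookup_pretrained`, the 2-D vectors, and the POS token spelling are all made up for illustration.

```python
def lookup_pretrained(token, word_vectors):
    """Return the pretrained vector for token, trying the lowercase form second.

    Returns None when neither form is known, in which case the caller
    keeps the randomly initialized vector.
    """
    if token in word_vectors:
        return word_vectors[token]
    lowered = token.lower()
    if lowered in word_vectors:
        return word_vectors[lowered]
    return None


# Toy data: made-up 2-D vectors and a made-up POS token format.
word_vectors = {"the": [0.1, 0.2], "parser": [0.3, 0.4]}
vocabulary = ["The", "parser", "<p>:NN", "zyzzyva"]

for token in vocabulary:
    found = lookup_pretrained(token, word_vectors) is not None
    print(f"{token}: {'pretrained' if found else 'random'}")
```

In this toy run, `The` is matched through its lowercase form, while the POS token and the unknown word fall through and would keep their random initialization.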


Command:
```
python run.py --disable_pretrained_embeddings
```

After running the experiment, the performance of the model on the test set is:

test UAS: 87.98

Using random word embeddings instead of pretrained ones leads to a decline in performance compared to the previous question (test UAS 87.98 versus 89.19).

In [None]:
!python run.py --disable_pretrained_embeddings

INITIALIZING
Loading data...
took 1.69 seconds
Building parser...
took 1.08 seconds
Loading pretrained embeddings...
Vectorizing data...
took 1.26 seconds
Preprocessing training data...
took 44.39 seconds
took 0.02 seconds

TRAINING
Epoch 1 out of 10
100% 1848/1848 [02:05<00:00, 14.71it/s]
Average Train Loss: 0.2085644712800652
Evaluating on dev set
1445850it [00:00, 48363780.51it/s]
- dev UAS: 81.68
New best dev UAS! Saving model.

Epoch 2 out of 10
100% 1848/1848 [02:05<00:00, 14.78it/s]
Average Train Loss: 0.12934378006402805
Evaluating on dev set
1445850it [00:00, 48341419.86it/s]
- dev UAS: 84.70
New best dev UAS! Saving model.

Epoch 3 out of 10
100% 1848/1848 [02:05<00:00, 14.70it/s]
Average Train Loss: 0.11290142575109546
Evaluating on dev set
1445850it [00:00, 28938415.91it/s]
- dev UAS: 85.57
New best dev UAS! Saving model.

Epoch 4 out of 10
100% 1848/1848 [02:06<00:00, 14.62it/s]
Average Train Loss: 0.1035408309189143
Evaluating on dev set
1445850it [00:00, 47714972.57it/s]

### 3.
To disable POS features, we execute the `run.py` script with the `--disable_pos_features` flag. We have modified the `load_and_preprocess_data` function to accept a `use_pos_features` parameter. Internally, `load_and_preprocess_data` overwrites the `config.use_pos` variable with the value of `use_pos_features`, which is now `False`, and this disables the extraction of POS features. The same `Config` change is applied when the `Parser` object is created inside `load_and_preprocess_data`, again by passing `use_pos_features`. Additionally, inside `main` we create the `ParserModel` with `n_features=parser.n_features` set explicitly; this value is 18 when POS features are disabled and 36 when they are enabled.
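To make the effect of `n_features` concrete, the sketch below computes the width of the vector fed to the first hidden layer. It assumes that the per-feature embeddings are concatenated and that each embedding has 50 dimensions (the size used for `embeddings_matrix` in question 2); the helper name `input_width` is illustrative.

```python
def input_width(n_features, embed_size=50):
    """Width of the concatenated embedding vector fed to the first layer."""
    return n_features * embed_size


print(input_width(36))  # word + POS features -> 1800
print(input_width(18))  # word features only  -> 900
```

Under these assumptions, halving the feature count also halves the size of the first weight matrix, which is consistent with the shorter epoch times in the log below.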

Command:
```
python run.py --disable_pos_features
```

After running the experiment, the performance of the model is:

test UAS: 87.40

Compared to the result of the first question (UAS: 89.19), the performance declines when the model does not use the POS features.

In [None]:
!python run.py --disable_pos_features

INITIALIZING
Loading data...
took 1.57 seconds
Building parser...
took 1.08 seconds
Loading pretrained embeddings...
took 2.95 seconds
Vectorizing data...
took 1.53 seconds
Preprocessing training data...
took 35.05 seconds
took 0.03 seconds

TRAINING
Epoch 1 out of 10
100% 1848/1848 [01:22<00:00, 22.35it/s]
Average Train Loss: 0.22439072232338644
Evaluating on dev set
1445850it [00:00, 47611952.88it/s]
- dev UAS: 80.10
New best dev UAS! Saving model.

Epoch 2 out of 10
100% 1848/1848 [01:27<00:00, 21.17it/s]
Average Train Loss: 0.1373219888625078
Evaluating on dev set
1445850it [00:00, 46826306.21it/s]
- dev UAS: 83.41
New best dev UAS! Saving model.

Epoch 3 out of 10
100% 1848/1848 [01:25<00:00, 21.67it/s]
Average Train Loss: 0.11797573992345499
Evaluating on dev set
1445850it [00:00, 46517761.06it/s]
- dev UAS: 85.05
New best dev UAS! Saving model.

Epoch 4 out of 10
100% 1848/1848 [01:24<00:00, 21.79it/s]
Average Train Loss: 0.10690105495998611
Evaluating on dev set
1445850it [00:0

### 4. Adding 1 extra 100-D hidden layer
We modified the `ParserModel` class in `parser_model.py` so that its constructor receives one extra parameter, `extra_hidden_size`. If its value is `None`, the model keeps its original form. Otherwise, an extra hidden layer named `hidden_to_hidden` is created, with input size equal to the output size of the previous hidden layer (`embed_to_hidden`) and output size equal to `extra_hidden_size`, as shown in the following snippet:
```
self.hidden_to_hidden = None
if self.extra_hidden_size is not None:
    self.hidden_to_hidden = nn.Linear(self.hidden_size, self.extra_hidden_size, bias=True)
    nn.init.xavier_uniform_(self.hidden_to_hidden.weight)  # in-place initialization
```
Additionally, we change the input size of the output layer to `extra_hidden_size`, i.e. 100, as shown in the else clause below:
```
if self.extra_hidden_size is None:
    self.hidden_to_logits = nn.Linear(self.hidden_size, self.n_classes, bias=True)
else:
    self.hidden_to_logits = nn.Linear(self.extra_hidden_size, self.n_classes, bias=True)
```
Finally, in the `forward` function we apply the extra hidden layer (`hidden_to_hidden`) followed by the activation function (ReLU for this question):
```
h = self.activation_func(self.embed_to_hidden(embeddings))
if self.hidden_to_hidden is not None:
    h = self.activation_func(self.hidden_to_hidden(h))
logits = self.hidden_to_logits(self.dropout(h))
```
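The three snippets above define a chain of linear layers whose sizes must line up. The sketch below lists the resulting layer shapes and counts the weights and biases with and without the extra layer. The concrete sizes (an input width of 1800, a first hidden layer of 200 units, and 3 output classes) are assumptions chosen for illustration and are not stated in this report; embedding parameters are not counted.

```python
def layer_shapes(input_size, hidden_size, n_classes, extra_hidden_size=None):
    """List (name, in_features, out_features) for each linear layer."""
    shapes = [("embed_to_hidden", input_size, hidden_size)]
    last_size = hidden_size
    if extra_hidden_size is not None:
        shapes.append(("hidden_to_hidden", hidden_size, extra_hidden_size))
        last_size = extra_hidden_size
    # The output layer always reads from the last hidden layer.
    shapes.append(("hidden_to_logits", last_size, n_classes))
    return shapes


def parameter_count(shapes):
    """Weights plus one bias per output unit, summed over all layers."""
    return sum(n_in * n_out + n_out for _, n_in, n_out in shapes)


# Assumed sizes, for illustration only.
baseline = layer_shapes(input_size=1800, hidden_size=200, n_classes=3)
extended = layer_shapes(input_size=1800, hidden_size=200, n_classes=3,
                        extra_hidden_size=100)

for name, n_in, n_out in extended:
    print(f"{name}: {n_in} -> {n_out}")
print("baseline parameters:", parameter_count(baseline))
print("extended parameters:", parameter_count(extended))
```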

To activate this extra hidden layer in the experiment, we execute the `run.py` script with the flag `--enable_extra_hidden`, which causes the `__main__` block to create a `ParserModel` with `extra_hidden_size=100`.

Command:
```
python run.py --enable_extra_hidden
```

After running the experiment, the performance of the model is:

test UAS: 89.19

Compared to the first question, the model's performance remains unchanged with the addition of an extra hidden layer.

In [None]:
!python run.py --enable_extra_hidden

INITIALIZING
Loading data...
took 1.56 seconds
Building parser...
took 1.12 seconds
Loading pretrained embeddings...
took 2.44 seconds
Vectorizing data...
took 1.33 seconds
Preprocessing training data...
took 44.24 seconds
took 0.02 seconds

TRAINING
Epoch 1 out of 10
100% 1848/1848 [02:23<00:00, 12.84it/s]
Average Train Loss: 0.16588238836770064
Evaluating on dev set
1445850it [00:00, 48033951.72it/s]
- dev UAS: 84.19
New best dev UAS! Saving model.

Epoch 2 out of 10
100% 1848/1848 [02:23<00:00, 12.86it/s]
Average Train Loss: 0.09664869482641096
Evaluating on dev set
1445850it [00:00, 46585298.77it/s]
- dev UAS: 86.80
New best dev UAS! Saving model.

Epoch 3 out of 10
100% 1848/1848 [02:42<00:00, 11.41it/s]
Average Train Loss: 0.08101522086744571
Evaluating on dev set
1445850it [00:00, 48480932.78it/s]
- dev UAS: 87.77
New best dev UAS! Saving model.

Epoch 4 out of 10
100% 1848/1848 [02:42<00:00, 11.39it/s]
Average Train Loss: 0.0715264335468218
Evaluating on dev set
1445850it [00:0

### 5.
First, we implement the cube activation function inside the ParserModel class, with the following definition:
```
@staticmethod
def cube_activation(x):
    # Element-wise cube: g(x) = x^3
    return torch.pow(x, 3)
```
We have modified the constructor of the `ParserModel` class to accept an extra parameter, `activation_func`. Inside the constructor, the `self.activation_func` property of the class is set to either `F.relu` or `cube_activation`, depending on the value of `activation_func`, as shown below:
```
if activation_func == "relu":
    self.activation_func = F.relu
elif activation_func == "cube":
    self.activation_func = ParserModel.cube_activation
```
Then, in `forward()`, instead of calling `relu` or `cube` directly, we call the `self.activation_func` property of the class, which holds the selected activation function.
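To illustrate how the two activations differ, the sketch below applies scalar versions of each to a few inputs. These plain-Python functions are stand-ins written for this comparison; the model itself uses `F.relu` and `torch.pow` on tensors.

```python
def relu(x):
    """Scalar ReLU: negative inputs are clipped to zero."""
    return max(0.0, x)


def cube(x):
    """Scalar cube activation: keeps the sign, rescales the magnitude."""
    return x ** 3


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.3f}  cube={cube(x):+.3f}")
```

Unlike ReLU, the cube function keeps negative values, shrinks inputs with magnitude below 1, and amplifies inputs with magnitude above 1.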

To use the `cube` function in this experiment, we execute the `run.py` script with the argument `--activation_function=cube`.

Command:
```
python run.py --activation_function=cube
```

After running the experiment, the performance of the model is:

test UAS: 87.38

Lastly, the model's performance with the cube activation is lower than in question 1, where the test UAS was 89.19.
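The test scores reported in the five answers can be compared against the baseline with a short script. The numbers are copied from the answers above; the dictionary labels are shorthand introduced here for readability.

```python
# Test UAS values as reported in answers 1-5.
test_uas = {
    "1. baseline": 89.19,
    "2. random embeddings": 87.98,
    "3. no POS features": 87.40,
    "4. extra hidden layer": 89.19,
    "5. cube activation": 87.38,
}

baseline = test_uas["1. baseline"]
# Difference from the baseline, rounded to two decimals.
deltas = {name: round(score - baseline, 2) for name, score in test_uas.items()}

for name, score in test_uas.items():
    print(f"{name:<24} {score:6.2f} {deltas[name]:+6.2f}")
```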

In [None]:
!python run.py --activation_function=cube

INITIALIZING
Loading data...
took 1.53 seconds
Building parser...
took 1.09 seconds
Loading pretrained embeddings...
took 2.24 seconds
Vectorizing data...
took 1.25 seconds
Preprocessing training data...
took 43.28 seconds
took 0.02 seconds

TRAINING
Epoch 1 out of 10
100% 1848/1848 [02:12<00:00, 13.96it/s]
Average Train Loss: 0.327132300147182
Evaluating on dev set
1445850it [00:00, 46586372.38it/s]
- dev UAS: 80.86
New best dev UAS! Saving model.

Epoch 2 out of 10
100% 1848/1848 [02:39<00:00, 11.60it/s]
Average Train Loss: 0.1465652146582286
Evaluating on dev set
1445850it [00:00, 44148880.97it/s]
- dev UAS: 83.38
New best dev UAS! Saving model.

Epoch 3 out of 10
100% 1848/1848 [02:36<00:00, 11.80it/s]
Average Train Loss: 0.12767950702636016
Evaluating on dev set
1445850it [00:00, 48049936.52it/s]
- dev UAS: 85.47
New best dev UAS! Saving model.

Epoch 4 out of 10
100% 1848/1848 [02:36<00:00, 11.79it/s]
Average Train Loss: 0.11764733023231938
Evaluating on dev set
1445850it [00:00,