# One-Shot Learning for Language Modelling

This notebook allows for quick and easy experimentation with the work done by Group 17 of the Statistical Natural Language Processing module at UCL, whose members are:

- Talip Ucar (talip.ucar.16@ucl.ac.uk)
- Adrian Daniel Szwarc (adrian.szwarc.18@ucl.ac.uk)
- Matthew Lee (matthew.lee.16@ucl.ac.uk)
- Adrian Gonzalez-Martin (adrian.martin.18@ucl.ac.uk)

Our work implements the Matching Networks architecture ([Vinyals et al., 2016](http://arxiv.org/abs/1606.04080)) in `pytorch` and applies it to a language modelling task. We then experiment with different distance metrics and episode sizes.
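As an illustration of the architecture, here is a minimal sketch of the Matching Networks classification rule from Vinyals et al. (2016). The target is compared with every support example, the similarities become attention weights through a softmax, and each label's probability is the sum of the attention weights of its support examples. The sketch uses cosine similarity as in the paper and toy two-dimensional vectors in place of learned embeddings; the function names are our own and do not come from the repository.

```python
import math
from collections import defaultdict


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matching_predict(target, support, similarity=cosine_similarity):
    """Return a label -> probability map for one target embedding.

    `support` is a list of (embedding, label) pairs. Attention weights are a
    softmax over the similarities, and each label accumulates the weights of
    its own support examples (i.e. one-hot labels weighted by attention).
    """
    weights = softmax([similarity(target, emb) for emb, _ in support])
    probs = defaultdict(float)
    for weight, (_, label) in zip(weights, support):
        probs[label] += weight
    return dict(probs)


support = [([1.0, 0.0], "score"), ([0.9, 0.1], "score"),
           ([0.0, 1.0], "music"), ([0.1, 0.9], "music")]
probs = matching_predict([1.0, 0.05], support)
print(max(probs, key=probs.get))  # prints "score"
```

To use a distance metric instead of cosine similarity, one option is to pass `similarity=lambda u, v: -distance(u, v)`; how the repository itself turns distances into attention scores is not shown in this notebook.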

More details can be found in the associated paper or in the repository https://github.com/adriangonz/statistical-nlp-17.


## Setup

We will first set up Colab's environment by:

* Installing `pipenv`, the dependency management tool we use.
* Cloning the repository (found at [adriangonz/statistical-nlp-17](https://github.com/adriangonz/statistical-nlp-17)).
* Installing its dependencies.

In [5]:
!pip install pipenv

Collecting pipenv
  Downloading https://files.pythonhosted.org/packages/13/b4/3ffa55f77161cff9a5220f162670f7c5eb00df52e00939e203f601b0f579/pipenv-2018.11.26-py3-none-any.whl (5.2MB)
    100% |████████████████████████████████| 5.2MB 5.2MB/s
Collecting virtualenv (from pipenv)
  Downloading https://files.pythonhosted.org/packages/33/5d/314c760d4204f64e4a968275182b7751bd5c3249094757b39ba987dcfb5a/virtualenv-16.4.3-py2.py3-none-any.whl (2.0MB)
    100% |████████████████████████████████| 2.0MB 17.3MB/s
Collecting virtualenv-clone>=0.2.5 (from pipenv)
  Downloading https://files.pythonhosted.org/packages/e3/d9/d9c56deb483c4d3289a00b12046e41428be64e8236fa210111a1f57cc42d/virtualenv_clone-0.5.1-py2.py3-none-any.whl
Installing collected packages: virtualenv, virtualenv-clone, pipenv
Successfully installed pipenv-2018.11.26 virtualenv-16.4.3 virtualenv-clone-0.5.1


In [1]:
!git clone https://github.com/adriangonz/statistical-nlp-17

Cloning into 'statistical-nlp-17'...
remote: Enumerating objects: 114, done.
remote: Counting objects: 100% (114/114), done.
remote: Compressing objects: 100% (71/71), done.
remote: Total 805 (delta 68), reused 80 (delta 43), pack-reused 691
Receiving objects: 100% (805/805), 349.89 MiB | 36.08 MiB/s, done.
Resolving deltas: 100% (446/446), done.
Checking out files: 100% (78/78), done.


In [2]:
%cd statistical-nlp-17

/content/statistical-nlp-17


In [6]:
!pipenv install

Creating a virtualenv for this project…
Pipfile: /content/statistical-nlp-17/Pipfile
Using /usr/local/bin/python (3.6.7) to create virtualenv…
Creating virtual environment...Using base prefix '/usr'
New python executable in /root/.local/share/virtualenvs/statistical-nlp-17-bg53uuH_/bin/python3
Also creating executable in /root/.local/share/virtualenvs/statistical-nlp-17-bg53uuH_/bin/python
Installing setuptools, pip, wheel...
done.
Running virtualenv with interpreter /usr/local/bin/python
✔ Successfully created virtual environment!
Virtualenv location: /root/.local/share/virtualenvs/statistical-nlp-17-bg53uuH_
Installing dependencies from Pipfile.lock (84d074)…
Ignoring appnope: markers 'sys_platform == "darwin"' don't match your environment
  🐍

## Loading model

We can now load one of the pre-trained models. In particular, we will choose a model trained with the following parameters:

* Poincaré distance as the distance metric.
* $N = 5$ labels per episode.
* $k = 3$ examples per label.

The `state_dict` for this model can be found in [models/poincare_vanilla_N=5_k=3_model_34.pth](https://github.com/adriangonz/statistical-nlp-17/blob/master/models/poincare_vanilla_N%3D5_k%3D3_model_34.pth).
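The Poincaré distance measures distances in hyperbolic space, modelled as the open unit ball. Here is a minimal plain-Python sketch of the standard Poincaré-ball distance,
$d(u, v) = \operatorname{arcosh}\left(1 + \frac{2\lVert u - v\rVert^2}{(1 - \lVert u\rVert^2)(1 - \lVert v\rVert^2)}\right)$.
We assume the repository uses this standard formula but have not checked its implementation; the `eps` clamp near the boundary of the ball is our own addition.

```python
import math


def poincare_distance(u, v, eps=1e-5):
    """Distance between two points inside the unit Poincare ball (assumed standard formula)."""
    sq_norm_u = sum(a * a for a in u)
    sq_norm_v = sum(b * b for b in v)
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    # Clamp the factors so points on (or numerically past) the boundary
    # do not cause a division by zero; this clamp is our own choice.
    denom = max(1.0 - sq_norm_u, eps) * max(1.0 - sq_norm_v, eps)
    return math.acosh(1.0 + 2.0 * sq_dist / denom)


print(poincare_distance([0.0, 0.0], [0.0, 0.0]))  # prints 0.0
```

As a sanity check, the distance from the origin to a point $(x, 0)$ reduces to $2\operatorname{artanh}(x)$, so it grows without bound as $x$ approaches the boundary of the ball.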

In [0]:
import os
import csv

import torch

from src.matching_network import MatchingNetwork
from src.utils import extract_model_parameters, get_model_name
from src.data import read_vocab, read_data_set, reverse_tensor
from src.datasets import EpisodesDataset
from src.evaluation import _episode_to_text

In [0]:
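def load_model(model_path):
    # Recover the hyperparameters encoded in the checkpoint's file name
    model_file_name = os.path.basename(model_path)
    distance, embeddings, N, k = extract_model_parameters(model_file_name)
    model_name = get_model_name(distance, embeddings, N, k)

    # Build the network and load the pre-trained weights, mapping them to the
    # CPU so that the checkpoint also loads on runtimes without a GPU
    model = MatchingNetwork(model_name, distance_metric=distance)
    model_state_dict = torch.load(model_path, map_location="cpu")
    model.load_state_dict(model_state_dict)

    return model, N, k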
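`load_model` recovers the distance metric, embedding type, $N$ and $k$ from the checkpoint file name through `extract_model_parameters`. That function is not reproduced in this notebook, so the following is a hypothetical regex-based equivalent that assumes the naming pattern of the file loaded below (`<distance>_<embeddings>_N=<N>_k=<k>_model_<n>.pth`). Reading `vanilla` as the embedding type is an assumption based on the order of the values `load_model` unpacks.

```python
import re

# Hypothetical pattern: <distance>_<embeddings>_N=<N>_k=<k>_model_<n>.pth
MODEL_FILE_PATTERN = re.compile(
    r"^(?P<distance>[a-z]+)_(?P<embeddings>[a-z]+)"
    r"_N=(?P<N>\d+)_k=(?P<k>\d+)_model_\d+\.pth$"
)


def parse_model_file_name(file_name):
    """Split a checkpoint file name into (distance, embeddings, N, k)."""
    match = MODEL_FILE_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"Unrecognised model file name: {file_name}")
    return (match.group("distance"), match.group("embeddings"),
            int(match.group("N")), int(match.group("k")))


print(parse_model_file_name("poincare_vanilla_N=5_k=3_model_34.pth"))
# prints ('poincare', 'vanilla', 5, 3)
```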

In [24]:
model, N, k = load_model("./models/poincare_vanilla_N=5_k=3_model_34.pth")
model.eval()

MatchingNetwork(
  (encode): EncodingLayer(
    (encoding_layer): Embedding(27443, 64, padding_idx=1)
  )
  (g): GLayer(
    (fce_layer): LSTM(64, 64, batch_first=True, bidirectional=True)
  )
  (f): FLayer(
    (lstm_cell): LSTMCell(64, 64)
  )
)
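The `g` and `f` modules above (a bidirectional LSTM and an `LSTMCell`) appear to correspond to the full context embeddings of Vinyals et al. (2016). For reference, the paper defines them as follows, where $g'$ and $f'$ are the base encoders, $S$ is the support set, $\hat{x}$ is the target and $f(\hat{x}, S) = h_K$ after $K$ processing steps. This is the paper's formulation; we have not verified that the repository's layers follow it term by term (for example, whether the two LSTM directions are summed or concatenated).

$$
\begin{aligned}
g(x_i, S) &= \overrightarrow{h}_i + \overleftarrow{h}_i + g'(x_i) \\
\hat{h}_k, c_k &= \mathrm{LSTM}\big(f'(\hat{x}), [h_{k-1}, r_{k-1}], c_{k-1}\big) \\
h_k &= \hat{h}_k + f'(\hat{x}) \\
r_{k-1} &= \sum_{i=1}^{|S|} a\big(h_{k-1}, g(x_i)\big)\, g(x_i) \\
a\big(h_{k-1}, g(x_i)\big) &= \operatorname{softmax}_i\big(h_{k-1}^{\top} g(x_i)\big)
\end{aligned}
$$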

## Defining test set

We will now define a small test set of label and sentence pairs. An episode will be sampled from this set, and the label of its target query will be predicted following the meta-testing framework described in the paper.
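In this test set, each sentence has its label word replaced by `<blank_token>`, so the task is to recover the missing word (for instance, `particularly` fills the blank in "... central australia <blank_token> after the introduction ..."). A minimal sketch of how such a pair could be produced; `make_blank_example` is a hypothetical helper, not part of the repository, and it assumes sentences are already lower-cased and tokenised on whitespace.

```python
def make_blank_example(sentence, word, blank="<blank_token>"):
    """Replace the first occurrence of `word` with `blank`.

    Returns a (label, blanked_sentence) pair, or None if `word` is absent.
    """
    tokens = sentence.split()
    if word not in tokens:
        return None
    tokens[tokens.index(word)] = blank
    return word, " ".join(tokens)


print(make_blank_example("he managed to win a seat later", "managed"))
# prints ('managed', 'he <blank_token> to win a seat later')
```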

In [0]:
test_text = {
    "particularly": [
        "their work which used <unk> paints to create designs representing body painting and ground sculptures rapidly spread across indigenous communities of central australia <blank_token> after the introduction of a government sanctioned art program in central australia in N",
        "the center of education since the colonial period manila <blank_token> <unk> is home to several philippine universities and colleges as well as its oldest ones",
        "over next two months however fluctuations in sea surface temperatures <blank_token> those in the central pacific caused the group to revise their predictions downward and indicated the probability for a slightly below average typhoon season in their june forecast",
        "a breech <unk> could be <unk> without moving the gun a lengthy process <blank_token> if the gun then needed to be re aimed"
    ],
    "score": [
        "in N the magazine chose the <blank_token> as one of N essential soundtracks it believed spoke to the complex and innovative relationships between music and screen storytelling",
        "the music is used like a visual cue so that lester and the <blank_token> are staring at angela",
        "currently the film holds an N <blank_token> on rotten tomatoes based on N reviews with an average rating of N N N the critical consensus reads <unk> cast and <unk> with dark acid wit american beauty is a smart provocative high point of late 90s mainstream hollywood film",
        "instead <unk> was drawn to the emotion and darkness he began to use the <blank_token> and shots he had intended to <unk> to craft the film along these lines"
    ],
    "managed": [
        "following another <unk> by <unk> <unk> <blank_token> to secure a <unk> and force a submission at N N of the first round and <unk> lost in his <unk> debut",
        "at summerslam <unk> defeated <unk> to become the wwe world heavyweight champion and during the match he delivered sixteen <unk> most of which were german <unk> and two f <unk> to <unk> who barely <blank_token> any offense",
        "all the schools under the national education system are <blank_token> by the <unk> district education office",
        "he <blank_token> to win a seat later when a special election was held after <unk> opened several seats"
    ],
    "director": [
        "a music video for the single was shot with <blank_token> tony <unk> and was released on june N N online through yahoo",
        "emmanuel <unk> jewish holocaust survivor and <blank_token> of the search party to find hitler after <unk> out of a death pit in <unk> he never took the time to <unk> and embarked on a life consuming obsession to bring those responsible for the <unk> to justice",
        "by the end of the year reports of a critical backlash suggested american beauty was the underdog in the race for best picture however at the golden globe awards in january N american beauty won best film best <blank_token> and best screenplay",
        "<unk> was named best <blank_token> by the new york film critics circle awards and <unk> and carroll shared the writers guild of america award for best written drama"
    ],
    "music": [
        "the <blank_token> video received N N million views in a N hour period and positive commentary from reviewers who appreciated its <unk> <unk> nature",
        "a <blank_token> video for the single was shot with director <unk> brown in los angeles the video received a premier on mtv 's <unk> ball on june N N",
        "in addition to <unk> the <unk> also introduces a redesigned <unk> revised title sequence and theme <blank_token> and sees changes to the doctor 's costume",
        "the album the first independent release by <unk> after he was signed by sony <blank_token> in N and warner music in N was issued by his own label <unk>"  
    ]
}

To reuse the existing implementation, we write these pairs to a CSV file and then read it back as numericalised tensors using a pre-computed vocabulary.

In [0]:
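# newline="" is what the csv module docs recommend, to avoid blank rows on some platforms
with open("data/test-colab.csv", "w", newline="") as output_file: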
    writer = csv.writer(output_file)
    writer.writerow(["label", "sentence"])

    for label, sentences in test_text.items():
        for sentence in sentences:
            writer.writerow([label, sentence])

In [0]:
vocab = read_vocab("data/vocab.json")
X_test, y_test = read_data_set("data/test-colab.csv", vocab)
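`read_data_set` turns each sentence into a tensor of vocabulary indices, and `reverse_tensor` later maps indices back to words. Here is the same round trip illustrated with plain lists and a toy vocabulary; the function names and the list-based vocabulary format are assumptions and need not match the actual `data/vocab.json` format. Note that the round trip is lossy: words outside the vocabulary come back as `<unk>`.

```python
def build_lookup(itos):
    """Map each token to its index in the index-to-string list."""
    return {token: index for index, token in enumerate(itos)}


def numericalise(sentence, stoi, unk="<unk>"):
    """Turn a whitespace-tokenised sentence into a list of token indices."""
    unk_index = stoi[unk]
    return [stoi.get(token, unk_index) for token in sentence.split()]


def reverse(indices, itos):
    """Turn a list of token indices back into a sentence."""
    return " ".join(itos[index] for index in indices)


# Toy vocabulary (hypothetical); the real one is loaded by read_vocab.
itos = ["<unk>", "<pad>", "<blank_token>", "the", "music", "video"]
stoi = build_lookup(itos)
ids = numericalise("the music video was great", stoi)
print(ids)                 # prints [3, 4, 5, 0, 0]
print(reverse(ids, itos))  # prints "the music video <unk> <unk>"
```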

## Generating episode

We can now load our data into an `EpisodesDataset` and generate a meta-testing episode composed of $N = 5$ labels with $k = 3$ examples each.

By default, the target query and the support examples are chosen randomly from the test set.
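Conceptually, a meta-testing episode draws $k$ support examples for each of $N$ labels, plus one target query whose true label is among those $N$ and which is held out of the support set. The sketch below samples such an episode from a label-to-sentences dictionary shaped like `test_text`; `sample_episode` is a hypothetical helper and may differ from how `EpisodesDataset` actually samples.

```python
import random


def sample_episode(data, labels, k, rng):
    """Sample k support sentences per label plus one held-out target.

    `data` maps each label to a list of sentences. The target is drawn from
    one of `labels` and is never part of the support set.
    """
    target_label = rng.choice(labels)
    support = {}
    target = None
    for label in labels:
        # Draw k + 1 sentences for the target label so one can be held out.
        n = k + 1 if label == target_label else k
        chosen = rng.sample(data[label], n)
        if label == target_label:
            target = chosen.pop()
        support[label] = chosen
    return support, target, target_label


toy = {label: [f"{label} sentence {i}" for i in range(4)]
       for label in ["particularly", "score", "managed", "director", "music"]}
support, target, target_label = sample_episode(toy, list(toy), k=3, rng=random.Random(0))
print(target_label, "|", target)
```

With four sentences per label, as in `test_text`, every label can supply $k = 3$ support examples and the target's label still has one sentence left to hold out.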

In [0]:
test_set = EpisodesDataset(X_test, y_test, k=k)

In [0]:
# Build an episode from the labels at indices 0-4 (N = 5 labels)
episode = test_set[(0, 1, 2, 3, 4)]

We can visualise the content of the chosen examples and target query.

In [0]:
support_set_text, targets_text, support_labels, _ = _episode_to_text(*episode, vocab)

In [110]:
support_set_text

array([['over next two months however fluctuations in sea surface temperatures <blank_token> those in the central pacific caused the group to revise their predictions downward and indicated the probability for a slightly below average typhoon season in their june forecast',
        'the center of education since the colonial period manila <blank_token> <unk> is home to several philippine universities and colleges as well as its oldest ones',
        'a breech <unk> could be <unk> without moving the gun a lengthy process <blank_token> if the gun then needed to be re aimed'],
       ['in N the magazine chose the <blank_token> as one of N essential soundtracks it believed spoke to the complex and innovative relationships between music and screen storytelling',
        'instead <unk> was drawn to the emotion and darkness he began to use the <blank_token> and shots he had intended to <unk> to craft the film along these lines',
        'currently the film holds an N <blank_token> on rotten t

In [111]:
support_labels

array(['particularly', 'score', 'managed', 'director', 'music'],
      dtype='<U12')

In [112]:
targets_text

array(['their work which used <unk> paints to create designs representing body painting and ground sculptures rapidly spread across indigenous communities of central australia <blank_token> after the introduction of a government sanctioned art program in central australia in N'],
      dtype='<U269')

## Making prediction

Finally, we can predict the label of the target query.

In [0]:
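# Add a leading batch dimension of size 1 to each tensor in the episode
support_set, targets, labels, _ = episode
batch = (
    support_set.unsqueeze(0),
    targets.unsqueeze(0),
    labels.unsqueeze(0))

# Gradients are not needed for inference
with torch.no_grad():
    predictions = model(batch)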

In [0]:
# Drop the batch dimension and pick the highest-scoring entry
predicted_label = predictions.squeeze().argmax()

In [116]:
reverse_tensor(predicted_label.unsqueeze(0), vocab)[0]

'particularly'