# Homework and bake-off: pragmatic color descriptions

In [1]:
__author__ = "Christopher Potts"
__version__ = "CS224u, Stanford, Spring 2020"

## Contents

1. [Overview](#Overview)
1. [Set-up](#Set-up)
1. [All two-word examples as a dev corpus](#All-two-word-examples-as-a-dev-corpus)
1. [Dev dataset](#Dev-dataset)
1. [Random train–test split for development](#Random-train–test-split-for-development)
1. [Question 1: Improve the tokenizer [1 point]](#Question-1:-Improve-the-tokenizer-[1-point])
1. [Use the tokenizer](#Use-the-tokenizer)
1. [Question 2: Improve the color representations [1 point]](#Question-2:-Improve-the-color-representations-[1-point])
1. [Use the color representer](#Use-the-color-representer)
1. [Initial model](#Initial-model)
1. [Question 3: GloVe embeddings [1 points]](#Question-3:-GloVe-embeddings-[1-points])
1. [Try the GloVe representations](#Try-the-GloVe-representations)
1. [Question 4: Color context [3 points]](#Question-4:-Color-context-[3-points])
1. [Your original system [3 points]](#Your-original-system-[3-points])
1. [Bakeoff [1 point]](#Bakeoff-[1-point])

## Overview

This homework and associated bake-off are oriented toward building an effective system for generating color descriptions that are pragmatic in the sense that they would help a reader/listener figure out which color was being referred to in a shared context consisting of a target color (whose identity is known only to the describer/speaker) and a set of distractors.

The notebook [colors_overview.ipynb](colors_overview.ipynb) should be studied before work on this homework begins. That notebook provides background on the task, the dataset, and the modeling code that you will be using and adapting.

The homework questions are more open-ended than previous ones have been. Rather than asking you to implement pre-defined functionality, they ask you to try to improve baseline components of the full system in ways that you find to be effective. As usual, this culminates in a prompt asking you to develop a novel system for entry into the bake-off. In this case, though, the work you do for the homework will likely be directly incorporated into that system.

## Set-up

See [colors_overview.ipynb](colors_overview.ipynb) for set-up instructions and other background details.

In [2]:
from colors import ColorsCorpusReader
import os
from sklearn.model_selection import train_test_split
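from torch_color_selector import ColorizedNeuralListener
# create_example_dataset is taken from torch_color_describer only; importing
# it from both modules lets the second import silently shadow the first.
from torch_color_describer import (
    ColorizedInputDescriber, create_example_dataset)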
import utils
from utils import START_SYMBOL, END_SYMBOL, UNK_SYMBOL

In [3]:
utils.fix_random_seeds()

In [4]:
COLORS_SRC_FILENAME = os.path.join(
    "data", "colors", "filteredCorpus.csv")

## All two-word examples as a dev corpus

So that you don't have to sit through excessively long training runs during development, I suggest working with the two-word-only subset of the corpus until you enter into the late stages of system testing.

In [5]:
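dev_corpus = ColorsCorpusReader(
    COLORS_SRC_FILENAME,
    # word_count=2 would restrict the reader to two-word descriptions;
    # None is assumed to apply no length filter.
    word_count=None,
    normalize_colors=True)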
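
For orientation, `normalize_colors=True` is assumed here to rescale the corpus's HLS color values into the unit interval. The sketch below illustrates that rescaling under the assumption that hue lies in [0, 360] and lightness and saturation lie in [0, 100]. `normalize_hls` is a hypothetical helper written for this illustration, not a function from `colors.py`.

```python
def normalize_hls(color):
    """Scale one (hue, lightness, saturation) triple into [0, 1].

    Assumes hue in [0, 360] and lightness/saturation in [0, 100].
    """
    hue, lightness, saturation = color
    return [hue / 360, lightness / 100, saturation / 100]


# A made-up three-color context, for illustration only.
raw_context = [(180, 50, 100), (90, 25, 50), (360, 100, 0)]
normalized_context = [normalize_hls(c) for c in raw_context]
print(normalized_context)
# [[0.5, 0.5, 1.0], [0.25, 0.25, 0.5], [1.0, 1.0, 0.0]]
```

Keeping all three channels on the same scale means no single channel dominates the input to the color encoder simply because of its units.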

In [6]:
dev_examples = list(dev_corpus.read())

The two-word subset has about one-third the examples of the full corpus. The count below comes from the reader as configured above, with `word_count=None`:

In [7]:
len(dev_examples)

46994

We __should__ worry that it's not a fully representative sample. Most of the descriptions in the full corpus are shorter, and a large proportion are longer. So this dataset is mainly for debugging, development, and general hill-climbing. All findings should be validated on the full dataset at some point.

## Dev dataset

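The first step is to extract the raw colors and raw texts from the corpus. Here the vocabularies, tokenized sequences, color contexts, and embeddings are loaded from pickle files rather than rebuilt: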

In [8]:
import pickle


def load_from_pickle():
    """Load the pickled dev vocabularies, sequences, colors, and embeddings."""
    # The order of the filenames fixes the order of the returned tuple.
    filenames = [
        'dev_vocab_speaker.pickle',
        'dev_vocab_listener.pickle',
        'dev_seqs_test.pickle',
        'dev_seqs_train_speaker.pickle',
        'dev_cols_test.pickle',
        'dev_cols_train_speaker.pickle',
        'dev_glove_vocab.pickle',
        'dev_glove_embedding.pickle',
        'embedding.pickle']
    loaded = []
    for filename in filenames:
        with open(filename, 'rb') as handle:
            loaded.append(pickle.load(handle))
    return tuple(loaded)


(dev_vocab, dev_vocab_listener, dev_seqs_test, dev_seqs_train,
 dev_cols_test, dev_cols_train, dev_glove_vocab, dev_glove_embedding,
 embedding) = load_from_pickle()
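
The pickle files read above are assumed to have been written in an earlier session. The code that produced them is not shown, so the following is only a minimal round-trip sketch of how one such file could be written and read back. The helper names, the file name `toy_vocab.pickle`, and the toy vocabulary are all hypothetical.

```python
import os
import pickle
import tempfile


def save_to_pickle(obj, filename):
    """Write one Python object to `filename` in binary pickle format."""
    with open(filename, 'wb') as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_one_pickle(filename):
    """Read back a single pickled object."""
    with open(filename, 'rb') as handle:
        return pickle.load(handle)


# Round trip in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'toy_vocab.pickle')
    toy_vocab_example = ['<s>', '</s>', '$UNK', 'blue', 'green']
    save_to_pickle(toy_vocab_example, path)
    restored = load_one_pickle(path)

print(restored == toy_vocab_example)  # True
```

Caching the processed data this way avoids re-tokenizing and re-featurizing the corpus on every run, at the cost of having to regenerate the files whenever the tokenizer or color representation changes.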

## Literal speaker

In [9]:
toy_color_seqs, toy_word_seqs, toy_vocab = create_example_dataset(
    group_size=50, vec_dim=2)

In [10]:
toy_color_seqs_train, toy_color_seqs_test, toy_word_seqs_train, toy_word_seqs_test = \
    train_test_split(toy_color_seqs, toy_word_seqs)

In [11]:
toy_mod = ColorizedInputDescriber(
    toy_vocab, 
    embed_dim=10, 
    hidden_dim=100, 
    max_iter=10, 
    batch_size=128)

Using cuda


In [12]:
_ = toy_mod.fit(toy_color_seqs_train, toy_word_seqs_train)

Epoch 0; train err = 1.6312565803527832; time = 0.3290736675262451
Epoch 1; train err = 1.5512980222702026; time = 0.023005247116088867
Epoch 2; train err = 1.4668164253234863; time = 0.023005008697509766
Epoch 3; train err = 1.355759859085083; time = 0.02200460433959961
Epoch 4; train err = 1.2426749467849731; time = 0.023005247116088867
Epoch 5; train err = 1.1513525247573853; time = 0.02200484275817871
Epoch 6; train err = 1.1144208908081055; time = 0.022995471954345703
Epoch 7; train err = 1.0158658027648926; time = 0.02200460433959961
Epoch 8; train err = 0.9626907110214233; time = 0.023005008697509766
Epoch 9; train err = 0.8614288568496704; time = 0.021996021270751953


In [13]:
toy_mod.listener_accuracy(toy_color_seqs_test, toy_word_seqs_test)

0.7368421052631579

If that worked, then you can now try this model on SCC problems!

In [14]:
literal_listener_listener = ColorizedNeuralListener(
    dev_vocab_listener, 
    #embedding=dev_glove_embedding, 
    embed_dim=100,
    embedding=embedding,
    hidden_dim=100, 
    max_iter=100,
    batch_size=256,
    dropout_prob=0.,
    eta=0.001,
    lr_rate=0.96,
    warm_start=True,
    device='cuda')
literal_listener_listener.load_model("literal_listener_with_attention_listener_split.pt")

Using cuda


In [15]:
dev_color_mod = ColorizedInputDescriber(
    dev_glove_vocab, 
    embedding=dev_glove_embedding, 
    hidden_dim=100, 
    max_iter=5, 
    eta=0.0005,
    batch_size=32,
    warm_start=True)
#dev_color_mod.load_model("literal_speaker.pt")
#dev_color_mod.warm_start=True
#dev_color_mod.opt = dev_color_mod.optimizer(
#                dev_color_mod.model.parameters(),
#                lr=dev_color_mod.eta,
#                weight_decay=dev_color_mod.l2_strength)

Using cuda


In [16]:
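def calc_performance(speaker, listener, cols):
    """Print how often `listener` picks the target given `speaker`'s descriptions."""
    speaker_preds_test = speaker.predict(cols)
    listened_preds = listener.predict(cols, speaker_preds_test)
    # A prediction of 2 counts as correct: the target is assumed to be the
    # last of the three colors in each context.
    correct = sum(1 for x in listened_preds if x == 2)
    print("test", correct, "/", len(listened_preds), correct/len(listened_preds))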
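
The scoring step in `calc_performance` can be isolated from the models. The sketch below assumes, as the `x == 2` check suggests, that the listener returns one predicted color index per example and that the target sits at index 2. `target_accuracy` is a hypothetical helper and the predictions are made up.

```python
def target_accuracy(predicted_indices, target_index=2):
    """Fraction of predictions equal to the target's position in the context."""
    if not predicted_indices:
        raise ValueError("predicted_indices must be non-empty")
    correct = sum(1 for idx in predicted_indices if idx == target_index)
    return correct / len(predicted_indices)


# Made-up listener outputs: three of the four pick the target at index 2.
toy_predictions = [2, 0, 2, 2]
print(target_accuracy(toy_predictions))  # 0.75
```

Returning the accuracy instead of printing it makes it easier to track the score across training rounds or to keep the best checkpoint.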

In [20]:
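# With warm_start=True, each call to fit continues from the current weights
# for max_iter=5 more epochs; listener accuracy is checked after each round.
for i in range(9):
    dev_color_mod.fit(dev_cols_train, dev_seqs_train)
    calc_performance(dev_color_mod, literal_listener_listener, dev_cols_test)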

Epoch 45; train err = 332.3955352306366; time = 12.813606023788452
Epoch 46; train err = 334.60132697224617; time = 13.113934516906738
Epoch 47; train err = 330.8800345361233; time = 13.065903902053833
Epoch 48; train err = 326.39075142145157; time = 14.95833444595337
Epoch 49; train err = 322.96645595133305; time = 15.567483186721802
test 9864 / 11749 0.8395608136862712
Epoch 50; train err = 325.7755722999573; time = 15.631486177444458
Epoch 51; train err = 326.77504739165306; time = 15.681498050689697
Epoch 52; train err = 323.1483790129423; time = 15.379430294036865
Epoch 53; train err = 319.0581514984369; time = 15.963561534881592
Epoch 54; train err = 320.0287114083767; time = 15.819517850875854
test 9959 / 11749 0.8476466082219763
Epoch 55; train err = 319.82313945889473; time = 15.365428447723389
Epoch 56; train err = 315.4929445683956; time = 16.141589641571045
Epoch 57; train err = 314.95708388090134; time = 15.940555572509766
Epoch 58; train err = 315.3841543495655; time = 15

In [18]:
dev_color_mod.listener_accuracy(dev_cols_test, dev_seqs_test)

0.8085794535705166

In [19]:
dev_perp = dev_color_mod.perplexities(dev_cols_test, dev_seqs_test)
dev_perp[0]

1.1630750132721086
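
The perplexity of a description is assumed here to be the inverse geometric mean of its per-token probabilities, that is, `prod(s) ** (-1 / len(s))` for a list `s` of token probabilities. Under that assumption, a value of about 1.16 corresponds to an average per-token probability of roughly 0.86. The sketch below uses made-up probabilities and hypothetical helper names, and shows why a log-space version is safer for long sequences.

```python
import math


def perplexity(token_probs):
    """Inverse geometric mean of per-token probabilities, computed in log space."""
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))


def perplexity_direct(token_probs):
    """The same quantity via the product form prod(s) ** (-1 / len(s))."""
    return math.prod(token_probs) ** (-1 / len(token_probs))


# On a short sequence the two forms agree.
probs = [0.5, 0.25, 0.5]
print(perplexity(probs), perplexity_direct(probs))

# For a long, low-probability sequence the raw product underflows to 0.0,
# so the direct form cannot be evaluated; the log-space form stays finite.
long_probs = [1e-5] * 100
print(math.prod(long_probs))   # 0.0
print(perplexity(long_probs))
```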

In [None]:
dev_color_mod.save_model('literal_speaker.pt')