# Answer Sentence Selection - Visualization

This notebook shows how to load and experiment with models in an IPython/Jupyter session (the advantage is immediate feedback on syntax errors and the like, rather than a long loading time before each experiment) and how to do simple visualizations such as per-token attention intensity.

In [1]:
%load_ext autoreload
%autoreload 2

from __future__ import print_function
from __future__ import division

import numpy as np
import pysts.embedding as emb
import pysts.loader as loader
import pysts.eval as ev

from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.layers.recurrent import SimpleRNN, GRU, LSTM

import pysts.kerasts.blocks as B
from pysts.kerasts.callbacks import AnsSelCB
from pysts.kerasts.objectives import ranknet

import anssel_train

Using Theano backend.
Using gpu device 0: Tesla K20m (CNMeM is disabled)


Initialization. Autoreload makes sure that if anything in the common code changes, it is reloaded the next time a cell is executed. Note that manually re-importing pysts.kerasts.blocks sometimes fails for whatever reason.

In [3]:
glove = emb.GloVe(300)

In [4]:
s0, s1, y, vocab, gr = anssel_train.load_set('../anssel-wang/train-all.csv')
s0t, s1t, yt, _, grt = anssel_train.load_set('../anssel-wang/dev.csv', vocab)

Load datasets.

## Train a Simple Predefined Model

In [5]:
import models.cnn as M

In [7]:
conf, ps, h = anssel_train.config(M.config, [])
### insert your default parameter changes here
# conf['l2reg'] = 0
model = anssel_train.build_model(glove, vocab, M.prep_model, conf)
model.fit(gr, validation_data=grt,
          callbacks=[AnsSelCB(s0t, grt),
                     ModelCheckpoint('weights-ntb.h5', save_best_only=True, monitor='mrr', mode='max'),
                     EarlyStopping(monitor='mrr', mode='max', patience=4)],
          batch_size=160, nb_epoch=16, samples_per_epoch=5000)
model.load_weights('weights-ntb.h5')
ev.eval_anssel(model.predict(gr)['score'][:,0], s0, y, 'Train')
ev.eval_anssel(model.predict(grt)['score'][:,0], s0t, yt, 'Val')

Train on 42783 samples, validate on 1117 samples
Epoch 1/16
Epoch 2/16
Epoch 3/16
Epoch 4/16
Epoch 5/16
Epoch 6/16
Epoch 7/16
Epoch 8/16
Epoch 9/16
Epoch 10/16
Epoch 11/16
Epoch 12/16
Epoch 13/16
Epoch 14/16
Epoch 15/16
Epoch 16/16
Train Accuracy: raw 0.938317 (y=0 0.973980, y=1 0.651816), bal 0.812898
Train MRR: 0.901080  (on training set, y=0 is subsampled!)
Val Accuracy: raw 0.828111 (y=0 0.982456, y=1 0.141463), bal 0.561960
Val MRR: 0.874872  


0.87487179487179478

This is the basic "run unit": code that builds, compiles, trains and benchmarks a model. Piece of cake!
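To make the benchmark numbers above easier to read, here is a minimal sketch of the two summary metrics. It assumes that MRR is the standard mean reciprocal rank computed per question, and that the "bal" figure is the mean of the two per-class accuracies (which is consistent with the printed values). The function names and the toy data are hypothetical; the actual implementation in `pysts.eval` may differ, for example in how it treats questions without any correct answer.

```python
from collections import defaultdict

def mean_reciprocal_rank(questions, labels, scores):
    """Average over questions of 1/rank of the best-ranked correct answer."""
    by_question = defaultdict(list)
    for q, y, s in zip(questions, labels, scores):
        by_question[q].append((s, y))

    reciprocal_ranks = []
    for candidates in by_question.values():
        # Rank the candidate answers of one question by descending score.
        ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
        for rank, (_, y) in enumerate(ranked, start=1):
            if y == 1:
                reciprocal_ranks.append(1.0 / rank)
                break
        # Assumption: questions with no correct candidate are skipped.
    return sum(reciprocal_ranks) / len(reciprocal_ranks)

def balanced_accuracy(acc_y0, acc_y1):
    """Mean of the per-class accuracies (assumed meaning of 'bal')."""
    return (acc_y0 + acc_y1) / 2.0

# Toy data: questions are plain strings here so they can be used as dict keys.
questions = ['q1', 'q1', 'q1', 'q2', 'q2']
labels = [0, 1, 0, 1, 0]
scores = [-0.5, -1.0, -7.0, -0.1, -3.0]
toy_mrr = mean_reciprocal_rank(questions, labels, scores)
print(toy_mrr)  # q1 contributes 1/2, q2 contributes 1/1

# Cross-check against the training output: y=0 0.973980, y=1 0.651816.
print(balanced_accuracy(0.973980, 0.651816))
```

In the notebook, questions are token lists rather than strings, so grouping them this way would first require a hashable key such as `' '.join(tokens)`.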

## Load a Pre-trained Model

In [8]:
import models.attn1511 as M

In [10]:
conf, ps, h = anssel_train.config(M.config, [])
### insert your default parameter changes here
# conf['l2reg'] = 0
model = anssel_train.build_model(glove, vocab, M.prep_model, conf)
model.load_weights('../weights-attn1511-4b1c525a282fd583-bestval.h5')
ev.eval_anssel(model.predict(gr)['score'][:,0], s0, y, 'Train')
ev.eval_anssel(model.predict(grt)['score'][:,0], s0t, yt, 'Val')

Train Accuracy: raw 0.953346 (y=0 0.995400, y=1 0.615498), bal 0.805449
Train MRR: 0.948843  (on training set, y=0 is subsampled!)
Val Accuracy: raw 0.825425 (y=0 0.992325, y=1 0.082927), bal 0.537626
Val MRR: 0.883333  


0.88333333333333341

## Visualize Per-token Attention

Let's take a look at how the model sees the data internally, at the token level.

In [54]:
from pysts.kerasts import graph_input_slice
sl = slice(500, 1000)
grs = graph_input_slice(grt, sl)
s0s = s0t[sl]
s1s = s1t[sl]

We will operate on a subset of validation data here.
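For intuition, the slicing step can be pictured as applying the same `slice` to every input array. The sketch below assumes the graph input is a dict mapping input names to arrays whose first axis indexes samples; `slice_inputs` and the key `'si0'` are hypothetical stand-ins, not the actual `graph_input_slice` implementation.

```python
import numpy as np

def slice_inputs(inputs, sl):
    """Apply the same sample slice to every array in a dict of inputs."""
    return {name: values[sl] for name, values in inputs.items()}

# Toy graph input: 5 samples, each with 2 token indices and a score.
toy = {'si0': np.arange(10).reshape(5, 2), 'score': np.arange(5)}
subset = slice_inputs(toy, slice(1, 3))
print(subset['si0'].shape, subset['score'])
```

Slicing the sentence lists with the same `slice` object, as done above for `s0t` and `s1t`, keeps tokens and model inputs aligned by index.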

In [55]:
import theano
def layer_fun(model, gr, layer_name):
    thf = theano.function([model.inputs[name].input for name in model.input_order],
                          model.nodes[layer_name].get_output(train=False),
                          on_unused_input='ignore', allow_input_downcast=True)
    return thf(*[gr[name] for name in model.input_order])

def predict_internal(model, gr):
    pred = model.predict(gr)
    ypred = pred['score'][:, 0]
    if 'tokens' in pred:
        tpred = pred['tokens']
    else:
        tpred = None

    # e1a is e0-driven attention
    # (use e1a[3] inst. of e1a[2] to get the softmax focus)
    e1a = layer_fun(model, gr, 'e1a[2]')
    # e0c, e1c are convolutions that are max-pooled for summary embedding
    # (so more important areas should get higher convolution norm?)
    e0c = layer_fun(model, gr, 'e0c')
    e1c = layer_fun(model, gr, 'e1c')
    return (ypred, tpred, e1a, e0c, e1c)

ypred, tpred, e1a, e0c, e1c = predict_internal(model, grs)

Let's get the values of several hidden layers of attn1511 that should correlate with the importance of each token (or of the filtlen-gram starting at that index).
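One way to turn such a layer output into a per-token importance value is to take the L2 norm of each token's activation vector and scale by the largest norm in the sentence. This is a minimal sketch of that idea on a toy array; the helper name `token_intensity` is hypothetical, and the zero-peak guard is an addition for safety rather than something taken from the notebook code.

```python
import numpy as np

def token_intensity(activations):
    """Map a (tokens, dim) activation array to per-token L2 norms.

    Returns the raw norms and the norms scaled to [0, 1] by the largest one.
    """
    raw = np.linalg.norm(activations, axis=-1)
    peak = raw.max()
    # Guard against an all-zero sentence (e.g. pure padding).
    scaled = raw / peak if peak > 0 else np.zeros_like(raw)
    return raw, scaled

# Toy activations for a three-token sentence with 2-dimensional features.
acts = np.array([[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]])
raw, scaled = token_intensity(acts)
print(raw)     # norms of the three token vectors
print(scaled)  # the same norms relative to the strongest token
```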

In [56]:
def predict_table(s0, s1, gr, e0rgb, e1rgb):
    from IPython.display import HTML
    from numpy.linalg import norm
    h = []
    for i in range(len(s0)):
        def rgbnorm(rgb, i):
            rgbi = [0, 0, 0]
            rgbin = [0, 0, 0]
            for j in range(3):
                try:
                    rgbi[j] = [norm(e) for e in rgb[j][i]]
                    rgbin[j] = rgbi[j] / np.max(rgbi[j])
                except TypeError:  # 0 inst. of list
                    rgbi[j] = [0 for e in range(anssel_train.s0pad)]
                    rgbin[j] = rgbi[j]
            return (rgbi, rgbin)
        e0rgbi, e0rgbin = rgbnorm(e0rgb, i)
        e1rgbi, e1rgbin = rgbnorm(e1rgb, i)

        def tokcolor(rgb, rgbn, j, t):
            # Close the span so that token backgrounds do not nest into each other.
            return ('<span style="background: rgb(%d,%d,%d)" title="%.3f | %.3f | %.3f">%s</span>' %
                    (128+rgbn[0][j]*128, 128+rgbn[1][j]*128, 128+rgbn[2][j]*128,
                     norm(rgb[0][j]), norm(rgb[1][j]), norm(rgb[2][j]), t))
        toks0 = ' '.join([tokcolor(e0rgbi, e0rgbin, j, t) for j, t in enumerate(s0[i][:38])])
        toks1 = ' '.join([tokcolor(e1rgbi, e1rgbin, j, t) for j, t in enumerate(s1[i][:38])])

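        # Row layout: predicted score | gold label | question tokens | answer tokens.
        # Note: ypred is read from the notebook's global scope, not passed in.
        h.append('<tr style="%s"><td style="color: rgb(%d,0,0)">%.3f<td>%d<td>%s<td>%s' %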
                 (' font-weight: bold' if gr['score'][i] == 1. else '',
                  0, ypred[i], gr['score'][i],
                  toks0, toks1))
    return HTML('<table>' + ''.join(h) + '</table>')

We use Jupyter HTML output capabilities here to produce nice heatmaps.

The red channel is "a priori attention" determined by static n-gram convolution, while the blue channel is "a posteriori" attention determined by a question-answer focus mechanism.
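The colour mapping can be sketched in isolation as follows. Each channel starts from a neutral 128 and is pushed towards full intensity by a value in [0, 1], so a token with no signal renders as mid-grey. The helper `token_span` is hypothetical; the clamp to 255 and the HTML escaping are additions for this sketch and are not part of the notebook code.

```python
from html import escape

def token_span(token, red, blue):
    """Wrap a token in a <span> whose background encodes two intensities in [0, 1]."""
    def channel(x):
        # 128 is the neutral baseline; clamp because 128 + 1.0 * 128 = 256.
        return min(255, int(128 + x * 128))
    # The green channel is unused here and stays at the neutral 128.
    return ('<span style="background: rgb(%d,%d,%d)">%s</span>'
            % (channel(red), 128, channel(blue), escape(token)))

# Toy sentence with made-up red ("a priori") and blue ("a posteriori") intensities.
tokens = ['annual', 'revenue', '?']
reds = [0.0, 1.0, 0.25]
blues = [0.0, 0.5, 0.0]
row = ' '.join(token_span(t, r, b) for t, r, b in zip(tokens, reds, blues))
print(row)
```

In a notebook, the resulting string can be wrapped in `IPython.display.HTML` to render it as shown in the table.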

In [57]:
predict_table(s0s, s1s, grs, [e0c, 0, 0], [e1c, 0, e1a])

| Score | Label | Question (s0) | Answer (s1) |
|---|---|---|---|
| -8.306 | 0 | Where is the company Rohm and Haas located ? | The combined company will have annual revenues of $ billion , executives said Monday . |
| -7.894 | 0 | Where is the company Rohm and Haas located ? | Morton 's salt business , which will represent percent of the combined company 's revenues , will retain its headquarters in Chicago , its home base since . |
| -7.592 | 0 | Where is the company Rohm and Haas located ? | The company makes a wide range of specialty chemicals , which include additives that enhance the performance or quality of end products . |
| -0.101 | 1 | What is Rohm and Haas 's annual revenue ? | Rohm and Haas , with $ billion in annual sales , makes chemicals found in such products as decorative and industrial paints , semiconductors and shampoos . |
| -0.677 | 1 | What is Rohm and Haas 's annual revenue ? | The deal is the latest in a series of recent acquisitions by Rohm and Haas , a Philadelphia -based manufacturer of chemicals found in products including paints , semiconductors and shampoos , with $ billion in annual |
| -7.507 | 0 | What is Rohm and Haas 's annual revenue ? | The transaction announced today creates a global specialty chemicals company with combined annual revenues of $ billion . |
| -7.841 | 0 | What is Rohm and Haas 's annual revenue ? | The combined company will have annual revenues of $ billion , executives said Monday . |
| -6.815 | 0 | What is Rohm and Haas 's annual revenue ? | Chicago -based Morton , whose products also include adhesives , dyes and electronic materials , had total annual sales of $ billion for the fiscal year that ended June , . |
| -1.101 | 0 | What is Rohm and Haas 's annual revenue ? | About jobs will be cut , contributing to annual cost savings of about $ million , said |
