[View in Colaboratory](https://colab.research.google.com/github/josd/eye/blob/master/transduction/transduction_dices/observation_prediction_dices.ipynb)

# Transduction from observation to prediction for dice

## Introduction

What is [transduction in machine learning](https://en.wikipedia.org/wiki/Transduction_%28machine_learning%29)? From Wikipedia:

> In logic, statistical inference, and supervised learning, transduction or
transductive inference is reasoning from observed, specific (training) cases
to specific (test) cases. In contrast, induction is reasoning from observed
training cases to general rules, which are then applied to the test cases.
The distinction is most interesting in cases where the predictions of the
transductive model are not achievable by any inductive model. Note that this
is caused by transductive inference on different test sets producing mutually
inconsistent predictions.

What is the Tensor2Tensor [Transformer model](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py)? From its module docstring:

> The Transformer model consists of an encoder and a decoder. Both are stacks
of self-attention layers followed by feed-forward layers. This model yields
good results on a number of problems, especially in NLP and machine translation.
See "Attention Is All You Need" (https://arxiv.org/abs/1706.03762) for the full
description of the model and the results obtained with its early version.

![Transformer model](https://pbs.twimg.com/media/DCKhefrUMAE9stK.jpg)

> The encoder is composed of a stack of N identical layers. Each layer has
two sub-layers. The first is a multi-head self-attention mechanism, and the
second is a simple, positionwise fully connected feed-forward network.
There is a residual connection around each of the two sub-layers, followed by
layer normalization.
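The encoder layer described above can be sketched in NumPy as follows. This is a minimal illustration written for this notebook, not Tensor2Tensor's implementation. It follows the paper's arrangement (residual connection, then layer normalization), omits the learned gain and bias of layer normalization, uses random weights, and all function names are hypothetical. Tensor2Tensor's hyperparameter sets may place the normalization differently.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize each position's feature vector to zero mean and unit variance.
    (Learned gain and bias are omitted in this sketch.)"""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, wq, wk, wv, wo, num_heads):
    """x: [seq_len, d_model]; every weight matrix: [d_model, d_model]."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):  # [seq_len, d_model] -> [num_heads, seq_len, d_head]
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))  # [heads, seq, seq]
    heads = weights @ v                                            # [heads, seq, d_head]
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ wo

def feed_forward(x, w1, b1, w2, b2):
    """Position-wise FFN: the same two-layer ReLU network applied at every position."""
    return np.maximum(0.0, x @ w1 + b1) @ w2 + b2

def encoder_layer(x, p, num_heads):
    # Sub-layer 1: self-attention, wrapped in a residual connection + layer norm.
    x = layer_norm(x + multi_head_self_attention(x, p["wq"], p["wk"], p["wv"], p["wo"], num_heads))
    # Sub-layer 2: position-wise feed-forward, wrapped the same way.
    return layer_norm(x + feed_forward(x, p["w1"], p["b1"], p["w2"], p["b2"]))

# Usage with small random weights.
rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 8, 16, 3
p = {name: rng.normal(scale=0.1, size=(d_model, d_model)) for name in ("wq", "wk", "wv", "wo")}
p.update(w1=rng.normal(scale=0.1, size=(d_model, d_ff)), b1=np.zeros(d_ff),
         w2=rng.normal(scale=0.1, size=(d_ff, d_model)), b2=np.zeros(d_model))
out = encoder_layer(rng.normal(size=(seq_len, d_model)), p, num_heads=2)
print(out.shape)  # (3, 8)
```

Because every sub-layer maps `[seq_len, d_model]` back to the same shape, layers of this kind can be stacked N times.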

> The decoder is also composed of a stack of N identical layers. In addition
to the two sub-layers in each encoder layer, the decoder inserts a third
sub-layer, which performs multi-head attention over the output of the encoder
stack. The self-attention sub-layer in the decoder stack is modified to prevent
positions from attending to subsequent positions.  This masking, combined with
the fact that the output embeddings are offset by one position, ensures that the
predictions for position i can depend only on the known outputs at positions
less than i.
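The decoder's masking and one-position offset can be illustrated with a single attention head in NumPy. In this sketch, the helper names are hypothetical, the inputs are random, and there are no learned projections. A lower-triangular mask sets the scores for later positions to a large negative number before the softmax, and `shift_right` prepends a start id to the target sequence. That start id is assumed here to be 0. The check at the end changes only the last position and confirms that the outputs of earlier positions do not move.

```python
import numpy as np

def shift_right(target_ids, go_id=0):
    """Decoder input: the targets offset by one position, with a start id in front.
    (go_id=0 is an assumption made for this sketch.)"""
    return [go_id] + list(target_ids[:-1])

def causal_mask(n):
    """mask[i, j] is True where position i may attend to position j (j <= i)."""
    return np.tril(np.ones((n, n), dtype=bool))

def masked_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.
    q, k, v: [seq_len, d]. Returns (outputs, attention weights)."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(causal_mask(n), scores, -1e9)  # block subsequent positions
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out, w = masked_self_attention(x, x, x)

x2 = x.copy()
x2[3] += 10.0  # change only the last position
out2, _ = masked_self_attention(x2, x2, x2)

print(np.allclose(out[:3], out2[:3]))  # True: positions 0-2 never see position 3
print(shift_right([5, 7, 9]))          # [0, 5, 7]
```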

In [1]:
# Preparation

# install tensor2tensor
! pip install -U tensor2tensor

# get the needed resources (-L follows redirects, e.g. if GitHub Pages redirects http to https)
! curl -L -o observation_prediction_dices.sh http://josd.github.io/eye/transduction/transduction_dices/observation_prediction_dices.sh
! curl -L -o observation_prediction_dices.py http://josd.github.io/eye/transduction/transduction_dices/observation_prediction_dices.py
! curl -L -o __init__.py http://josd.github.io/eye/transduction/transduction_dices/__init__.py
! curl -L -o sample_dices.observation http://josd.github.io/eye/transduction/transduction_dices/sample_dices.observation
! chmod +x observation_prediction_dices.sh

# clear data and model
! rm -fr /tmp/t2t_data/observation_prediction_dices
! rm -fr /tmp/t2t_train/observation_prediction_dices/transformer-transformer_small/

# start tensorboard and expose it through an ngrok tunnel
! curl -O https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
! unzip -o ngrok-stable-linux-amd64.zip
get_ipython().system_raw('tensorboard --logdir /tmp/t2t_train/observation_prediction_dices/transformer-transformer_small --host 0.0.0.0 --port 6006 &')
get_ipython().system_raw('./ngrok http 6006 &')
# wait briefly so the tunnel exists before asking ngrok's local API for its public URL
! sleep 5
! curl -s http://localhost:4040/api/tunnels | python3 -c "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

Collecting tensor2tensor
  Downloading https://files.pythonhosted.org/packages/15/fc/f1b2519081b2049ae4278e43e85b7529579e0c6511ecf167b348cbc9c445/tensor2tensor-1.6.2-py2.py3-none-any.whl (660kB)
    100% |████████████████████████████████| 665kB 5.9MB/s 
Collecting gym (from tensor2tensor)
  Downloading https://files.pythonhosted.org/packages/9b/50/ed4a03d2be47ffd043be2ee514f329ce45d98a30fe2d1b9c61dea5a9d861/gym-0.10.5.tar.gz (1.5MB)
    100% |████████████████████████████████| 1.5MB 11.2MB/s 
Collecting bz2file (from tensor2tensor)
  Downloading https://files.pythonhosted.org/packages/61/39/122222b5e85cd41c391b68a99ee296584b2a2d1d233e7ee32b4532384f2d/bz2file-0.98.tar.gz
Collecting gevent (from tensor2tensor)
  Downloading https://files.pythonhosted.org/packages/92/a3/827edd16c2d3557eb3168789d6d4e484b334a21da45c55fedf06e85df8ec/gevent-1.3.1-cp36-cp36m-manylinux1_x86_64.whl (4.5MB)
    100% |████████████████████████████████| 4.5MB 4.6MB/s 
Coll

In [1]:
# See the observation_prediction_dices problem

! pygmentize -g observation_prediction_dices.py

import random
from tensor2tensor.data_generators import problem
from tensor2tensor.data_generators import text_problems
from tensor2tensor.utils import registry

@registry.register_problem
class ObservationPredictionDices(text_problems.Text2TextProblem):
  """Transduction from observation to prediction for dices."""

  @property
  def approx_vocab_size(self):
    return 2**14  # ~16k

  @property
  def is_generate_per_split(self):
    # generate_data will shard the data into TRAIN and EVAL for us.
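The listing stops before the method that actually produces the data. In Tensor2Tensor, a `Text2TextProblem` subclass provides its examples through a `generate_samples(data_dir, tmp_dir, dataset_split)` method that yields dictionaries with `"inputs"` and `"targets"` strings. The stand-alone sketch below imitates that contract without importing Tensor2Tensor. Its simplified signature and its data format are assumptions. The format is inferred from the sample files shown later in this notebook (observation `A_THROW`, predictions `A 1` to `A 6`) and from the `import random` above. The real file may generate its data differently.

```python
import collections
import random

def generate_samples(num_samples, seed=None):
    """Hypothetical stand-in for the problem's sample generator.

    Each sample pairs the observation "A_THROW" with a prediction "A <face>",
    where <face> is a fair die roll in 1..6 (an assumption, see above).
    """
    rng = random.Random(seed)
    for _ in range(num_samples):
        yield {"inputs": "A_THROW", "targets": "A %d" % rng.randint(1, 6)}

samples = list(generate_samples(6000, seed=42))
print(samples[0]["inputs"])  # A_THROW
counts = collections.Counter(s["targets"] for s in samples)
print(sorted(counts))        # ['A 1', 'A 2', 'A 3', 'A 4', 'A 5', 'A 6']
```

Since every input is identical, the best a model can learn from such data is the frequency of each outcome. This is why the predictions further down are spread almost evenly over the six faces.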

In [2]:
# See the observation_prediction_dices script

! pygmentize -g observation_prediction_dices.sh

#!/bin/bash
PROBLEM=observation_prediction_dices
MODEL=transformer
HPARAMS=transformer_small

USER_DIR=$PWD
DATA_DIR=/tmp/t2t_data/$PROBLEM
TMP_DIR=/tmp/t2t_datagen/$PROBLEM
TRAIN_DIR=/tmp/t2t_train/$PROBLEM/$MODEL-$HPARAMS

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

# Generate data
t2t-datagen \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --t2t_usr_dir=$USER_DIR \
  --tmp_dir=$TMP_DIR

# Train
t2t-trainer \
  --data_dir=$DATA_DIR \
  --eval_steps=10 \
  --h

In [3]:
# Run the observation_prediction_dices script

! ./observation_prediction_dices.sh

  from ._conv import register_converters as _register_converters
INFO:tensorflow:Importing user module transduction_dices from path /home/jdroo/github.com/josd/eye/transduction
[2018-05-20 12:35:47,560] Importing user module transduction_dices from path /home/jdroo/github.com/josd/eye/transduction
INFO:tensorflow:Generating problems:
    observation:
      * observation_prediction_dices
[2018-05-20 12:35:47,562] Generating problems:
    observation:
      * observation_prediction_dices
INFO:tensorflow:Generating data for observation_prediction_dices.
[2018-05-20 12:35:47,562] Generating data for observation_prediction_dices.
INFO:tensorflow:Generating vocab file: /tmp/t2t_data/observation_prediction_dices/vocab.observation_prediction_dices.16384.subwords
[2018-05-20 12:35:47,563] Generating vocab file: /tmp/t2t_data/observation_prediction_dices/vocab.observation_prediction_dices.16384.subwords
INFO:tensorflow:Trying min_count 500
[2018-05-20 12:35:48,273] Trying min_count 500
INFO:tens

INFO:tensorflow:Reading data files from /tmp/t2t_data/observation_prediction_dices/observation_prediction_dices-train*
[2018-05-20 12:36:00,633] Reading data files from /tmp/t2t_data/observation_prediction_dices/observation_prediction_dices-train*
INFO:tensorflow:partition: 0 num_data_files: 5
[2018-05-20 12:36:00,634] partition: 0 num_data_files: 5
INFO:tensorflow:Calling model_fn.
[2018-05-20 12:36:00,809] Calling model_fn.
INFO:tensorflow:Setting T2TModel mode to 'train'
[2018-05-20 12:36:03,595] Setting T2TModel mode to 'train'
INFO:tensorflow:Using variable initializer: uniform_unit_scaling
[2018-05-20 12:36:03,596] Using variable initializer: uniform_unit_scaling
INFO:tensorflow:Transforming feature 'inputs' with symbol_modality_38_256.bottom
[2018-05-20 12:36:03,627] Transforming feature 'inputs' with symbol_modality_38_256.bottom
INFO:tensorflow:Transforming 'targets' with symbol_modality_38_256.targets_bottom
[2018-05-20 12:36:03,833] Transforming 'targets' with symbol_modalit

INFO:tensorflow:Running local_init_op.
[2018-05-20 12:37:02,865] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
[2018-05-20 12:37:02,912] Done running local_init_op.
INFO:tensorflow:Evaluation [1/10]
[2018-05-20 12:37:04,117] Evaluation [1/10]
INFO:tensorflow:Evaluation [2/10]
[2018-05-20 12:37:04,240] Evaluation [2/10]
INFO:tensorflow:Evaluation [3/10]
[2018-05-20 12:37:04,358] Evaluation [3/10]
INFO:tensorflow:Evaluation [4/10]
[2018-05-20 12:37:04,476] Evaluation [4/10]
INFO:tensorflow:Evaluation [5/10]
[2018-05-20 12:37:04,593] Evaluation [5/10]
INFO:tensorflow:Evaluation [6/10]
[2018-05-20 12:37:04,711] Evaluation [6/10]
INFO:tensorflow:Evaluation [7/10]
[2018-05-20 12:37:04,828] Evaluation [7/10]
INFO:tensorflow:Evaluation [8/10]
[2018-05-20 12:37:04,946] Evaluation [8/10]
INFO:tensorflow:Evaluation [9/10]
[2018-05-20 12:37:05,063] Evaluation [9/10]
INFO:tensorflow:Evaluation [10/10]
[2018-05-20 12:37:05,187] Evaluation [10/10]
INFO:tensorflow:Finished evaluat

INFO:tensorflow:Graph was finalized.
[2018-05-20 12:38:06,493] Graph was finalized.
2018-05-20 12:38:06.493326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-20 12:38:06.493367: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-20 12:38:06.493378: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-05-20 12:38:06.493386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-05-20 12:38:06.493520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1502 MB memory) -> physical GPU (device: 0, name: GeForce 840M, pci bus id: 0000:03:00.0, compute capability: 5.0)
INFO:tensorflow:Restoring parameters from /tmp/t2t_train/observation_prediction_dices/transformer-transformer_small/model.ckpt-200
[2018-05-20 12:38:06,493] Restoring parameters from /tmp/t2t_

INFO:tensorflow:Transforming feature 'inputs' with symbol_modality_38_256.bottom
[2018-05-20 12:39:05,567] Transforming feature 'inputs' with symbol_modality_38_256.bottom
INFO:tensorflow:Transforming 'targets' with symbol_modality_38_256.targets_bottom
[2018-05-20 12:39:05,691] Transforming 'targets' with symbol_modality_38_256.targets_bottom
INFO:tensorflow:Building model body
[2018-05-20 12:39:05,700] Building model body
INFO:tensorflow:Transforming body output with symbol_modality_38_256.top
[2018-05-20 12:39:08,376] Transforming body output with symbol_modality_38_256.top
INFO:tensorflow:Done calling model_fn.
[2018-05-20 12:39:09,185] Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-05-20-10:39:09
[2018-05-20 12:39:09,202] Starting evaluation at 2018-05-20-10:39:09
INFO:tensorflow:Graph was finalized.
[2018-05-20 12:39:09,423] Graph was finalized.
2018-05-20 12:39:09.423469: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2

INFO:tensorflow:Calling model_fn.
[2018-05-20 12:40:05,465] Calling model_fn.
INFO:tensorflow:Setting T2TModel mode to 'eval'
[2018-05-20 12:40:08,305] Setting T2TModel mode to 'eval'
INFO:tensorflow:Setting hparams.layer_prepostprocess_dropout to 0.0
[2018-05-20 12:40:08,306] Setting hparams.layer_prepostprocess_dropout to 0.0
INFO:tensorflow:Setting hparams.relu_dropout to 0.0
[2018-05-20 12:40:08,306] Setting hparams.relu_dropout to 0.0
INFO:tensorflow:Setting hparams.symbol_dropout to 0.0
[2018-05-20 12:40:08,306] Setting hparams.symbol_dropout to 0.0
INFO:tensorflow:Setting hparams.dropout to 0.0
[2018-05-20 12:40:08,306] Setting hparams.dropout to 0.0
INFO:tensorflow:Setting hparams.attention_dropout to 0.0
[2018-05-20 12:40:08,306] Setting hparams.attention_dropout to 0.0
INFO:tensorflow:Using variable initializer: uniform_unit_scaling
[2018-05-20 12:40:08,307] Using variable initializer: uniform_unit_scaling
INFO:tensorflow:Transforming feature 'inputs' with symbol_modality_38_

INFO:tensorflow:loss = 0.53496945, step = 400
[2018-05-20 12:40:34,735] loss = 0.53496945, step = 400
INFO:tensorflow:Saving checkpoints for 500 into /tmp/t2t_train/observation_prediction_dices/transformer-transformer_small/model.ckpt.
[2018-05-20 12:41:07,473] Saving checkpoints for 500 into /tmp/t2t_train/observation_prediction_dices/transformer-transformer_small/model.ckpt.
INFO:tensorflow:Loss for final step: 0.5332501.
[2018-05-20 12:41:08,216] Loss for final step: 0.5332501.
INFO:tensorflow:Evaluating model now.
[2018-05-20 12:41:08,216] Evaluating model now.
INFO:tensorflow:Reading data files from /tmp/t2t_data/observation_prediction_dices/observation_prediction_dices-dev*
[2018-05-20 12:41:08,222] Reading data files from /tmp/t2t_data/observation_prediction_dices/observation_prediction_dices-dev*
INFO:tensorflow:partition: 0 num_data_files: 5
[2018-05-20 12:41:08,223] partition: 0 num_data_files: 5
INFO:tensorflow:Calling model_fn.
[2018-05-20 12:41:08,374] Calling model_fn.
IN

INFO:tensorflow:Running local_init_op.
[2018-05-20 12:41:28,506] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
[2018-05-20 12:41:28,558] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 501 into /tmp/t2t_train/observation_prediction_dices/transformer-transformer_small/model.ckpt.
[2018-05-20 12:41:36,665] Saving checkpoints for 501 into /tmp/t2t_train/observation_prediction_dices/transformer-transformer_small/model.ckpt.
INFO:tensorflow:loss = 0.5278151, step = 500
[2018-05-20 12:41:37,384] loss = 0.5278151, step = 500
INFO:tensorflow:Saving checkpoints for 600 into /tmp/t2t_train/observation_prediction_dices/transformer-transformer_small/model.ckpt.
[2018-05-20 12:42:10,198] Saving checkpoints for 600 into /tmp/t2t_train/observation_prediction_dices/transformer-transformer_small/model.ckpt.
INFO:tensorflow:Loss for final step: 0.52420855.
[2018-05-20 12:42:10,942] Loss for final step: 0.52420855.
INFO:tensorflow:Evaluating model now.
[2018-05-20 

INFO:tensorflow:Graph was finalized.
[2018-05-20 12:42:30,789] Graph was finalized.
2018-05-20 12:42:30.789558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-20 12:42:30.789598: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-20 12:42:30.789619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-05-20 12:42:30.789629: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-05-20 12:42:30.789769: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1502 MB memory) -> physical GPU (device: 0, name: GeForce 840M, pci bus id: 0000:03:00.0, compute capability: 5.0)
INFO:tensorflow:Restoring parameters from /tmp/t2t_train/observation_prediction_dices/transformer-transformer_small/model.ckpt-600
[2018-05-20 12:42:30,790] Restoring parameters from /tmp/t2t_

INFO:tensorflow:Transforming feature 'inputs' with symbol_modality_38_256.bottom
[2018-05-20 12:43:26,073] Transforming feature 'inputs' with symbol_modality_38_256.bottom
INFO:tensorflow:Transforming 'targets' with symbol_modality_38_256.targets_bottom
[2018-05-20 12:43:26,197] Transforming 'targets' with symbol_modality_38_256.targets_bottom
INFO:tensorflow:Building model body
[2018-05-20 12:43:26,207] Building model body
INFO:tensorflow:Transforming body output with symbol_modality_38_256.top
[2018-05-20 12:43:28,857] Transforming body output with symbol_modality_38_256.top
INFO:tensorflow:Base learning rate: 0.200000
[2018-05-20 12:43:28,967] Base learning rate: 0.200000
INFO:tensorflow:Trainable Variables Total size: 3699200
[2018-05-20 12:43:28,976] Trainable Variables Total size: 3699200
INFO:tensorflow:Using optimizer Adam
[2018-05-20 12:43:28,976] Using optimizer Adam
INFO:tensorflow:Done calling model_fn.
[2018-05-20 12:43:33,032] Done calling model_fn.
INFO:tensorflow:Create

INFO:tensorflow:Calling model_fn.
[2018-05-20 12:44:25,856] Calling model_fn.
INFO:tensorflow:Setting T2TModel mode to 'train'
[2018-05-20 12:44:28,706] Setting T2TModel mode to 'train'
INFO:tensorflow:Using variable initializer: uniform_unit_scaling
[2018-05-20 12:44:28,707] Using variable initializer: uniform_unit_scaling
INFO:tensorflow:Transforming feature 'inputs' with symbol_modality_38_256.bottom
[2018-05-20 12:44:28,738] Transforming feature 'inputs' with symbol_modality_38_256.bottom
INFO:tensorflow:Transforming 'targets' with symbol_modality_38_256.targets_bottom
[2018-05-20 12:44:28,861] Transforming 'targets' with symbol_modality_38_256.targets_bottom
INFO:tensorflow:Building model body
[2018-05-20 12:44:28,871] Building model body
INFO:tensorflow:Transforming body output with symbol_modality_38_256.top
[2018-05-20 12:44:31,500] Transforming body output with symbol_modality_38_256.top
INFO:tensorflow:Base learning rate: 0.200000
[2018-05-20 12:44:31,611] Base learning rate:

INFO:tensorflow:Calling model_fn.
[2018-05-20 12:45:28,427] Calling model_fn.
INFO:tensorflow:Setting T2TModel mode to 'train'
[2018-05-20 12:45:31,246] Setting T2TModel mode to 'train'
INFO:tensorflow:Using variable initializer: uniform_unit_scaling
[2018-05-20 12:45:31,247] Using variable initializer: uniform_unit_scaling
INFO:tensorflow:Transforming feature 'inputs' with symbol_modality_38_256.bottom
[2018-05-20 12:45:31,278] Transforming feature 'inputs' with symbol_modality_38_256.bottom
INFO:tensorflow:Transforming 'targets' with symbol_modality_38_256.targets_bottom
[2018-05-20 12:45:31,401] Transforming 'targets' with symbol_modality_38_256.targets_bottom
INFO:tensorflow:Building model body
[2018-05-20 12:45:31,410] Building model body
INFO:tensorflow:Transforming body output with symbol_modality_38_256.top
[2018-05-20 12:45:34,049] Transforming body output with symbol_modality_38_256.top
INFO:tensorflow:Base learning rate: 0.200000
[2018-05-20 12:45:34,160] Base learning rate:

  from ._conv import register_converters as _register_converters
INFO:tensorflow:Importing user module transduction_dices from path /home/jdroo/github.com/josd/eye/transduction
[2018-05-20 12:46:38,566] Importing user module transduction_dices from path /home/jdroo/github.com/josd/eye/transduction
Instructions for updating:
When switching to tf.estimator.Estimator, use tf.estimator.RunConfig instead.
[2018-05-20 12:46:38,640] From /usr/local/lib/python3.5/dist-packages/tensor2tensor/utils/trainer_lib.py:151: RunConfig.__init__ (from tensorflow.contrib.learn.python.learn.estimators.run_config) is deprecated and will be removed in a future version.
Instructions for updating:
When switching to tf.estimator.Estimator, use tf.estimator.RunConfig instead.
INFO:tensorflow:schedule=continuous_train_and_eval
[2018-05-20 12:46:38,640] schedule=continuous_train_and_eval
INFO:tensorflow:worker_gpu=1
[2018-05-20 12:46:38,640] worker_gpu=1
INFO:tensorflow:sync=False
[2018-05-20 12:46:38,640] sync=Fa

2018-05-20 12:46:46.195355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-20 12:46:46.195382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-05-20 12:46:46.195401: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-05-20 12:46:46.195581: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1502 MB memory) -> physical GPU (device: 0, name: GeForce 840M, pci bus id: 0000:03:00.0, compute capability: 5.0)
INFO:tensorflow:Restoring parameters from /tmp/t2t_train/observation_prediction_dices/transformer-transformer_small/model.ckpt-1000
[2018-05-20 12:46:46,213] Restoring parameters from /tmp/t2t_train/observation_prediction_dices/transformer-transformer_small/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
[2018-05-20 12:46:46,423] Running local_init_op.
INFO:tensorflow:Done running 
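Every message in this log is printed twice, once by the TensorFlow logger and once with a timestamp, and the training loss is scattered among many other lines. A small helper like the hypothetical `parse_losses` below pulls out the (step, loss) pairs so the loss can be followed without TensorBoard. It only counts the `INFO:tensorflow:` copy of each line to avoid duplicates. The regular expression assumes the `loss = ..., step = ...` format seen above.

```python
import re

# Matches training log lines such as "INFO:tensorflow:loss = 0.53496945, step = 400".
LOSS_RE = re.compile(r"INFO:tensorflow:loss = ([0-9.eE+-]+), step = (\d+)")

def parse_losses(log_text):
    """Return the (step, loss) pairs found in a training log, in order of appearance."""
    return [(int(step), float(loss)) for loss, step in LOSS_RE.findall(log_text)]

# Two excerpts copied from the log above; the timestamped duplicates are ignored.
log = """INFO:tensorflow:loss = 0.53496945, step = 400
[2018-05-20 12:40:34,735] loss = 0.53496945, step = 400
INFO:tensorflow:loss = 0.5278151, step = 500
[2018-05-20 12:41:37,384] loss = 0.5278151, step = 500
"""
print(parse_losses(log))  # [(400, 0.53496945), (500, 0.5278151)]
```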

In [4]:
# See the transductions
# For each observation the top 6 predictions are shown with their respective log probability

! pygmentize -g sample_dices.observation
print("->-")
! pygmentize -g sample_dices.prediction

A_THROW
A_THROW
A_THROW
A_THROW
A_THROW
A_THROW
->-
A 2	-1.63	A 5	-1.66	A 1	-1.70	A 6	-1.71	A 3	-1.77	A 4	-1.83
A 2	-1.63	A 5	-1.66	A 1	-1.70	A 6	-1.71	A 3	-1.77	A 4	-1.83
A 2	-1.63	A 5	-1.66	A 1	-1.70	A 6	-1.71	A 3	-1.77	A 4	-1.83
A 2	-1.63	A 5	-1.66	A 1	-1.70	A 6	-1.71	A 3	-1.77	A 4	-1.83
A 2	-1.63	A 5	-1.66	A 1	-1.70	A 6	-1.71	A 3	-1.77	A 4	-1.83
A 2	-1.63	A 5	-1.66	A 1	-1.70	A 6	-1.71	A 3	-1.77	A 4	-1.83
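For a fair die every face has probability 1/6, so the ideal log probability is ln(1/6) ≈ -1.79, and all six scores above are close to it. The snippet below converts the printed scores back to probabilities, roughly 0.16 to 0.20 each. Their sum comes out at about 1.08 rather than 1, which is more than two-decimal rounding can explain. The notebook does not say so, but this suggests that the decoder reports length-normalized beam-search scores rather than exact log probabilities. Treat the comparison as a rough check.

```python
import math

# Top-6 predictions for "A_THROW", copied from sample_dices.prediction above.
predictions = {"A 2": -1.63, "A 5": -1.66, "A 1": -1.70,
               "A 6": -1.71, "A 3": -1.77, "A 4": -1.83}

uniform_log_prob = math.log(1 / 6)  # fair die
print("ln(1/6) = %.2f" % uniform_log_prob)  # -1.79

for target, score in predictions.items():
    print("%s  p=%.3f  (uniform %.3f)" % (target, math.exp(score), 1 / 6))

total = sum(math.exp(s) for s in predictions.values())
print("sum of exp(scores) = %.2f" % total)  # 1.08
```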


In [5]:
import os

import tensorflow as tf

from tensor2tensor import problems
from tensor2tensor.bin import t2t_decoder  # To register the hparams set
from tensor2tensor.utils import registry
from tensor2tensor.utils import trainer_lib
from tensor2tensor.visualization import attention
from tensor2tensor.visualization import visualization

  from ._conv import register_converters as _register_converters


In [6]:
%%javascript
require.config({
  paths: {
      d3: '//cdnjs.cloudflare.com/ajax/libs/d3/3.4.8/d3.min'
  }
});

<IPython.core.display.Javascript object>

## HParams

In [7]:
# PUT THE MODEL YOU WANT TO LOAD HERE!
CHECKPOINT = os.path.expanduser('/tmp/t2t_train/observation_prediction_dices/transformer-transformer_small')

In [8]:
# HParams
problem_name = 'observation_prediction_dices'
data_dir = os.path.expanduser('/tmp/t2t_data/observation_prediction_dices')
model_name = "transformer"
hparams_set = "transformer_small"

## Visualization

In [9]:
import observation_prediction_dices  # registers the problem via @registry.register_problem

visualizer = visualization.AttentionVisualizer(hparams_set, model_name, data_dir, problem_name, beam_size=1)

INFO:tensorflow:Setting T2TModel mode to 'eval'
[2018-05-20 12:59:48,229] Setting T2TModel mode to 'eval'
INFO:tensorflow:Setting hparams.layer_prepostprocess_dropout to 0.0
[2018-05-20 12:59:48,230] Setting hparams.layer_prepostprocess_dropout to 0.0
INFO:tensorflow:Setting hparams.relu_dropout to 0.0
[2018-05-20 12:59:48,231] Setting hparams.relu_dropout to 0.0
INFO:tensorflow:Setting hparams.dropout to 0.0
[2018-05-20 12:59:48,232] Setting hparams.dropout to 0.0
INFO:tensorflow:Setting hparams.attention_dropout to 0.0
[2018-05-20 12:59:48,234] Setting hparams.attention_dropout to 0.0
INFO:tensorflow:Setting hparams.symbol_dropout to 0.0
[2018-05-20 12:59:48,235] Setting hparams.symbol_dropout to 0.0
INFO:tensorflow:Using variable initializer: uniform_unit_scaling
[2018-05-20 12:59:48,247] Using variable initializer: uniform_unit_scaling
INFO:tensorflow:Transforming feature 'inputs' with symbol_modality_38_256.bottom
[2018-05-20 12:59:48,267] Transforming feature 'inputs' with symbol_modality_38_256.bottom
INFO:tensorflow:Transforming 'targets' with symbol_modality_38_256.targets_bottom
[2018-05-20 12:59:48,390] Transforming 'targets' with symbol_modality_38_256.targets_bottom
INFO:tensorflow:Building model body
[2018-05-20 12:59:48,401] Building model body
Instructions for updating:
keep_dims is deprecated, use keepdims instead
[2018-05-20 12:59:48,640] From /usr/local/lib/python3.5/dist-packages/tensor2tensor/layers/common_layers.py:607: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
INFO:tensorflow:Transforming body output with symbol_modality_38_256.top
[2018-05-20 12:59:50,788] Transforming body output with symbol_modality_38_256.top
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

[2018-05-20 12:59:50,864] From /usr/local/lib/python3.5/dist-packages/tensor2tensor/layers/common_layers.py:1880: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

INFO:tensorflow:Greedy Decoding
[2018-05-20 12:59:50,905] Greedy Decoding
Instructions for updating:
keep_dims is deprecated, use keepdims instead
[2018-05-20 12:59:52,796] From /usr/local/lib/python3.5/dist-packages/tensor2tensor/layers/common_layers.py:2967: calling reduce_logsumexp (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead

In [10]:
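# Create the global step variable the checkpoint expects, then open a session
# that restores the latest checkpoint found in CHECKPOINT.
# Note: this session also attaches a checkpoint saver (see the log below).
tf.Variable(0, dtype=tf.int64, trainable=False, name='global_step')

sess = tf.train.MonitoredTrainingSession(
    checkpoint_dir=CHECKPOINT,
    save_summaries_secs=0,  # don't write summaries while visualizing
)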

INFO:tensorflow:Create CheckpointSaverHook.
[2018-05-20 12:59:53,733] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
[2018-05-20 12:59:54,660] Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/t2t_train/observation_prediction_dices/transformer-transformer_small/model.ckpt-1000
[2018-05-20 12:59:55,356] Restoring parameters from /tmp/t2t_train/observation_prediction_dices/transformer-transformer_small/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
[2018-05-20 12:59:55,610] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
[2018-05-20 12:59:55,635] Done running local_init_op.

In [11]:
input_sentence = "A_THROW"
output_string, inp_text, out_text, att_mats = visualizer.get_vis_data_from_string(sess, input_sentence)
print(output_string)

INFO:tensorflow:Saving checkpoints for 1000 into /tmp/t2t_train/observation_prediction_dices/transformer-transformer_small/model.ckpt.
[2018-05-20 12:59:59,851] Saving checkpoints for 1000 into /tmp/t2t_train/observation_prediction_dices/transformer-transformer_small/model.ckpt.
A 2<EOS>


## Interpreting the Visualizations
- The layers dropdown lets you view the different Transformer layers (0-indexed).
  - Tip: the first, last and second-to-last layers are usually the most interpretable.
- The attention dropdown lets you select which kind of attention to show:
  - All: shows all types of attention together. NOTE: heads of the same color in the decoder self-attention and in the decoder-encoder attention are unrelated, since they do not share parameters.
  - Input - Input: shows only the encoder self-attention.
  - Input - Output: shows the decoder's attention on the encoder. NOTE: every decoder layer attends to the final encoder layer, so this view shows attention on the final encoder layer regardless of which layer is selected in the dropdown.
  - Output - Output: shows only the decoder self-attention. NOTE: the visualization can be slightly misleading in the first layer. The text shown is the decoder's target, whereas the actual input to the decoder at layer 0 is this text with a GO symbol prepended.
- The colored squares represent the different attention heads.
  - Click a head's color to hide or show that head.
  - Double-clicking a color hides all other heads; double-clicking a color when it is the only head showing shows all heads again.
- Hover over a word to see the attention weights for just that position.
  - Hovering over a word on the left shows what that position attended to.
  - Hovering over a word on the right shows which positions attended to it.
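Behind the widget, each attention type is an array of weights per layer. Assume that a single layer's weights are laid out as `[num_heads, query_len, key_len]`. This layout is an assumption; the arrays in `att_mats` are organised per layer and may carry an extra batch dimension. Under that layout, hovering over a word on the left corresponds to reading one row, hovering over a word on the right corresponds to reading one column, and hiding a head corresponds to dropping one slice. The array below is made up for illustration and is not output from this model. The helper names are hypothetical.

```python
import numpy as np

# Made-up decoder self-attention for ONE layer and TWO heads over 3 tokens,
# shape [num_heads, query_len, key_len]. Each row sums to 1, and the upper
# triangle is zero because of the causal mask.
att = np.array([
    [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5]],
    [[1.0, 0.0, 0.0],
     [0.9, 0.1, 0.0],
     [0.6, 0.2, 0.2]],
])
tokens = ["A", "2", "<EOS>"]

def attended_by(att, query_pos):
    """What `query_pos` attends to (hovering a word on the left): one row per head."""
    return att[:, query_pos, :]

def attending_to(att, key_pos):
    """How much each position attends to `key_pos` (hovering a word on the right)."""
    return att[:, :, key_pos]

def drop_heads(att, hidden):
    """Hide the given heads, like clicking their color squares."""
    keep = [h for h in range(att.shape[0]) if h not in hidden]
    return att[keep]

print(attended_by(att, tokens.index("<EOS>"))[0].tolist())  # [0.2, 0.3, 0.5]
print(attending_to(att, tokens.index("A"))[1].tolist())     # [1.0, 0.9, 0.6]
print(drop_heads(att, {1}).shape)                           # (1, 3, 3)
```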

In [12]:
attention.show(inp_text, out_text, *att_mats)

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>