In [1]:
import sys
sys.path.insert(1, '../')

import torch
import torch.optim as optim

from models import generator
from models import discriminator
from trainers import train_generator_MLE, train_generator_PG, train_discriminator

%load_ext autoreload
%autoreload 2

# Tutorial Part 1 Overall Story

Here we describe each step of the pipeline at a high level, to give context for the later parts of the tutorial.


## Synthetic Data Experiment

The most accurate way to evaluate a generative model is to draw samples from it and have human observers review them based on their prior knowledge. This assumes that the human observer has learned an accurate model of the natural distribution p_human(x).
 
Instead, the authors used a randomly initialized language model as the true model, also known as the ***oracle***, to generate the "real" data distribution p(x_t | x_1, ..., x_{t-1}). Such an oracle has two benefits: it provides the training dataset, and it allows an exact evaluation of the generative model's performance, which is not possible with real data.
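To make the oracle idea concrete, here is a minimal sketch of a randomly initialized GRU language model that both generates "real" sequences and scores sequences by negative log-likelihood (NLL). `ToyOracle`, its method names, and the small sizes are hypothetical stand-ins chosen for illustration; they are not the `generator.Generator` API used in this notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class ToyOracle(nn.Module):
    """Hypothetical stand-in for the oracle: a randomly initialized GRU language model."""

    def __init__(self, vocab_size=50, embed_dim=8, hidden_dim=8):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def step(self, tokens, hidden):
        # tokens: (batch,) -> log-probabilities of the next token: (batch, vocab)
        emb = self.embeddings(tokens).unsqueeze(1)
        output, hidden = self.gru(emb, hidden)
        return F.log_softmax(self.out(output.squeeze(1)), dim=-1), hidden

    @torch.no_grad()
    def sample(self, num_samples, seq_len, start_token=0):
        # Draw each token from p(x_t | x_1, ..., x_{t-1}) and feed it back in.
        tokens = torch.full((num_samples,), start_token, dtype=torch.long)
        hidden, samples = None, []
        for _ in range(seq_len):
            log_probs, hidden = self.step(tokens, hidden)
            tokens = torch.multinomial(log_probs.exp(), 1).squeeze(1)
            samples.append(tokens)
        return torch.stack(samples, dim=1)

    @torch.no_grad()
    def nll(self, sequences, start_token=0):
        # Average per-token NLL of each sequence under the oracle (lower is better).
        batch, seq_len = sequences.shape
        tokens = torch.full((batch,), start_token, dtype=torch.long)
        hidden, total = None, torch.zeros(batch)
        for t in range(seq_len):
            log_probs, hidden = self.step(tokens, hidden)
            total -= log_probs.gather(1, sequences[:, t:t + 1]).squeeze(1)
            tokens = sequences[:, t]
        return total / seq_len

toy_oracle = ToyOracle()
toy_real = toy_oracle.sample(num_samples=4, seq_len=6)   # plays the role of the training set
toy_scores = toy_oracle.nll(toy_real)                    # exact evaluation under the true model
print(toy_real.shape, toy_scores)
```

Because the oracle's distribution is known exactly, sequences produced by any generator can be passed to `nll` to measure how plausible they are under the true model.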

In [2]:
# experimental constants

CUDA = False
VOCAB_SIZE = 5000
MAX_SEQ_LEN = 20

BATCH_SIZE = 32
START_LETTER = 0

GEN_EMBEDDING_DIM = 32 # length of input vectors for generator and oracle
GEN_HIDDEN_DIM = 32 # length of hidden state for generator and oracle

oracle_state_dict_path = '../oracle_EMBDIM32_HIDDENDIM32_VOCAB5000_MAXSEQLEN20.trc'
oracle_samples_path = '../oracle_samples.trc'


MLE_TRAIN_EPOCHS = 100
ADV_TRAIN_EPOCHS = 50
POS_NEG_SAMPLES = 10000

DIS_EMBEDDING_DIM = 64 # length of input vectors for discriminator
DIS_HIDDEN_DIM = 64 # length of hidden state for discriminator

pretrained_gen_path = '../gen_MLEtrain_EMBDIM32_HIDDENDIM32_VOCAB5000_MAXSEQLEN20.trc'
pretrained_dis_path = '../dis_pretrain_EMBDIM_64_HIDDENDIM64_VOCAB5000_MAXSEQLEN20.trc'

In [3]:
oracle = generator.Generator(GEN_EMBEDDING_DIM, GEN_HIDDEN_DIM, VOCAB_SIZE, MAX_SEQ_LEN, gpu=CUDA)

# for reproducibility we provide saved parameters for the oracle
oracle.load_state_dict(torch.load(oracle_state_dict_path))

oracle

Generator(
  (embeddings): Embedding(5000, 32)
  (gru): GRU(32, 32)
  (gru2out): Linear(in_features=32, out_features=5000, bias=True)
)

The output above should look like this

```
Generator(
  (embeddings): Embedding(5000, 32)
  (gru): GRU(32, 32)
  (gru2out): Linear(in_features=32, out_features=5000, bias=True)
)
```

To explain the output above: the embedding layer holds one vector of length 32 for each of the 5000 tokens in the vocabulary. The GRU takes input vectors of length 32 and outputs a hidden state of the same length. The final linear layer maps that hidden state to 5000 activations, one per vocabulary token.
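As a rough check on these shapes, the sketch below counts the trainable parameters each layer implies. It assumes PyTorch's standard parameter layout for `nn.Embedding`, a single-layer unidirectional `nn.GRU`, and `nn.Linear` with bias; the helper function names are hypothetical.

```python
def embedding_params(vocab_size, embed_dim):
    # One vector of length embed_dim per vocabulary entry.
    return vocab_size * embed_dim

def gru_params(input_dim, hidden_dim):
    # Three gates (reset, update, new), each with input weights,
    # hidden weights, and two bias vectors.
    return 3 * (input_dim * hidden_dim + hidden_dim * hidden_dim + 2 * hidden_dim)

def linear_params(in_features, out_features):
    # Weight matrix plus bias vector.
    return in_features * out_features + out_features

counts = {
    'embeddings': embedding_params(5000, 32),
    'gru': gru_params(32, 32),
    'gru2out': linear_params(32, 5000),
}
total = sum(counts.values())
print(counts, total)
```

Almost all of the parameters sit in the embedding and output layers, which scale with the vocabulary size; the GRU itself is comparatively small.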

The authors use the oracle to generate 10,000 sequences of length 20 as the training set S for the generative models.
We have already used `helpers.batchwise_sample()` to generate and save S, so you can load it below.
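For intuition, sampling a large set in batches can be sketched as follows. This is an illustration only: `sample_in_batches` and `fake_sampler` are hypothetical names, and the sketch does not reproduce the actual signature or behavior of `helpers.batchwise_sample()`.

```python
import torch

def sample_in_batches(sample_fn, num_samples, batch_size):
    """Call sample_fn(n) repeatedly and concatenate until num_samples rows exist."""
    batches = []
    remaining = num_samples
    while remaining > 0:
        n = min(batch_size, remaining)   # the last batch may be smaller
        batches.append(sample_fn(n))
        remaining -= n
    return torch.cat(batches, dim=0)

def fake_sampler(n, seq_len=20, vocab_size=5000):
    # Stand-in for an oracle sampler: uniform random token ids, shape (n, seq_len).
    return torch.randint(0, vocab_size, (n, seq_len))

toy_S = sample_in_batches(fake_sampler, num_samples=100, batch_size=32)
print(toy_S.shape)
```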

In [4]:
oracle_samples = torch.load(oracle_samples_path).type(torch.LongTensor)
print(type(oracle_samples), oracle_samples.shape)

<class 'torch.Tensor'> torch.Size([10000, 20])


### Instantiate a generator and discriminator

In [5]:
gen = generator.Generator(GEN_EMBEDDING_DIM, GEN_HIDDEN_DIM, VOCAB_SIZE, MAX_SEQ_LEN, gpu=CUDA)
dis = discriminator.Discriminator(DIS_EMBEDDING_DIM, DIS_HIDDEN_DIM, VOCAB_SIZE, MAX_SEQ_LEN, gpu=CUDA)

In [16]:
gen

Generator(
  (embeddings): Embedding(5000, 32)
  (gru): GRU(32, 32)
  (gru2out): Linear(in_features=32, out_features=5000, bias=True)
)

In [17]:
dis

Discriminator(
  (embeddings): Embedding(5000, 64)
  (gru): GRU(64, 64, num_layers=2, dropout=0.2, bidirectional=True)
  (gru2hidden): Linear(in_features=256, out_features=64, bias=True)
  (dropout_linear): Dropout(p=0.2, inplace=False)
  (hidden2out): Linear(in_features=64, out_features=1, bias=True)
)

### If you have a GPU and want to use it, all models and model inputs need to be on the GPU

In [18]:
if CUDA:
    oracle = oracle.cuda()
    gen = gen.cuda()
    dis = dis.cuda()
    oracle_samples = oracle_samples.cuda()
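An alternative to the `CUDA` flag is to pick a `torch.device` once and move everything to it with `.to(device)`. The sketch below uses a small standalone GRU as a stand-in for the notebook's models; it is one common pattern, not how this repository's `gpu=` argument works internally.

```python
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

toy_model = nn.GRU(32, 32).to(device)          # model parameters live on `device`
toy_batch = torch.zeros(20, 4, 32).to(device)  # (seq_len, batch, features), same device

toy_output, toy_hidden = toy_model(toy_batch)
print(toy_output.device, toy_output.shape)
```

If a model and its input end up on different devices, PyTorch raises a runtime error, which is why both must be moved together.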

### GENERATOR MLE TRAINING

At the beginning of training, the authors used maximum likelihood estimation (MLE) to pretrain the generator Gθ on the training set S.

The discriminator is pretrained as well. They found that the supervised signal from the pretrained discriminator is informative and helps adjust the generator efficiently.
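MLE pretraining amounts to teacher forcing: the model receives the true previous tokens and is trained to assign high probability to the true next token. The sketch below shows this on a toy model. `TinyGenerator` and `mle_loss` are hypothetical names, the sizes are deliberately small, and this is not the body of `train_generator_MLE`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, EMB, HID, START = 20, 8, 8, 0

class TinyGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.embeddings = nn.Embedding(VOCAB, EMB)
        self.gru = nn.GRU(EMB, HID, batch_first=True)
        self.gru2out = nn.Linear(HID, VOCAB)

    def forward(self, inputs):
        hidden_states, _ = self.gru(self.embeddings(inputs))
        return self.gru2out(hidden_states)  # logits: (batch, seq_len, vocab)

def mle_loss(model, targets):
    # Shift right by one: the input at step t is the true token at step t-1.
    start = torch.full((targets.size(0), 1), START, dtype=torch.long)
    inputs = torch.cat([start, targets[:, :-1]], dim=1)
    logits = model(inputs)
    # Cross-entropy equals the average negative log-likelihood of the true tokens.
    return F.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1))

toy_gen = TinyGenerator()
toy_optimizer = torch.optim.Adam(toy_gen.parameters(), lr=1e-2)
toy_data = torch.randint(0, VOCAB, (16, 5))  # stand-in for the training set S

losses = []
for _ in range(30):
    toy_optimizer.zero_grad()
    loss = mle_loss(toy_gen, toy_data)
    loss.backward()
    toy_optimizer.step()
    losses.append(loss.item())
print(losses[0], losses[-1])
```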

```
# GENERATOR MLE TRAINING
print('Starting Generator MLE Training...')
gen_optimizer = optim.Adam(gen.parameters(), lr=1e-2)
train_generator_MLE(gen, gen_optimizer, oracle, oracle_samples, MLE_TRAIN_EPOCHS)
torch.save(gen.state_dict(), pretrained_gen_path)

# PRETRAIN DISCRIMINATOR
print('Starting Discriminator Training...')
dis_optimizer = optim.Adagrad(dis.parameters())
train_discriminator(dis, dis_optimizer, oracle_samples, gen, oracle, d_steps = 50,  epochs = 3)
torch.save(dis.state_dict(), pretrained_dis_path)
```

The pretraining below only needs to be done once. Once the weights are saved, they can be loaded with `model.load_state_dict(torch.load(pretrained_gen_path))` in the loading cell that follows, so the next two cells can be skipped.
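One way to automate the "train once, then load" pattern is to check whether the checkpoint file exists. The sketch below is a hedged illustration: `load_or_train` is a hypothetical helper, and a tiny `nn.Linear` with a no-op training function stands in for the real models and trainers.

```python
import os
import tempfile

import torch
import torch.nn as nn

def load_or_train(model, path, train_fn):
    """Load weights from `path` if it exists; otherwise train and save.

    Returns True when the weights were loaded from disk.
    """
    if os.path.exists(path):
        model.load_state_dict(torch.load(path))
        return True
    train_fn(model)
    torch.save(model.state_dict(), path)
    return False

with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, 'toy_model.trc')
    # First call: no checkpoint yet, so the (no-op) training runs and the weights are saved.
    first = load_or_train(nn.Linear(4, 2), toy_path, train_fn=lambda m: None)
    # Second call: the checkpoint exists, so training is skipped.
    second = load_or_train(nn.Linear(4, 2), toy_path, train_fn=lambda m: None)

print(first, second)
```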



In [24]:
# GENERATOR MLE TRAINING
print('Starting Generator MLE Training...')
gen_optimizer = optim.Adam(gen.parameters(), lr=1e-2)
train_generator_MLE(gen, gen_optimizer, oracle, oracle_samples, MLE_TRAIN_EPOCHS)
torch.save(gen.state_dict(), pretrained_gen_path)

Starting Generator MLE Training...


In [7]:
# PRETRAIN DISCRIMINATOR
print('Starting Discriminator Training...')
dis_optimizer = optim.Adagrad(dis.parameters())
train_discriminator(dis, dis_optimizer, oracle_samples, gen, oracle, d_steps = 50,  epochs = 3)
torch.save(dis.state_dict(), pretrained_dis_path)

Starting Discriminator Training...
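The cell above does not show how `train_discriminator` builds its batches. A common approach, assumed here purely for illustration, is to label oracle samples as 1 and generator samples as 0, shuffle them together, and train with binary cross-entropy. `make_discriminator_batch` is a hypothetical helper, and random token ids stand in for real and generated sequences.

```python
import torch
import torch.nn.functional as F

def make_discriminator_batch(real, fake):
    """Stack real and fake sequences, label them 1 and 0, and shuffle."""
    inputs = torch.cat([real, fake], dim=0)
    targets = torch.cat([torch.ones(real.size(0)), torch.zeros(fake.size(0))])
    perm = torch.randperm(inputs.size(0))
    return inputs[perm], targets[perm]

toy_real = torch.randint(0, 5000, (8, 20))  # stand-in for oracle samples
toy_fake = torch.randint(0, 5000, (8, 20))  # stand-in for generator samples
dis_inputs, dis_targets = make_discriminator_batch(toy_real, toy_fake)

# A discriminator that always outputs 0.5 has a loss of ln(2) on any labels.
dummy_preds = torch.full((16,), 0.5)
dummy_loss = F.binary_cross_entropy(dummy_preds, dis_targets)
print(dis_inputs.shape, dummy_loss.item())
```

A loss near ln(2), roughly 0.693, therefore means the discriminator is doing no better than chance, which is a useful baseline when reading the training logs.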


In [8]:
# load pretrained generator and discriminator
gen.load_state_dict(torch.load(pretrained_gen_path))
dis.load_state_dict(torch.load(pretrained_dis_path))

<All keys matched successfully>