# Thinking in tensors in PyTorch

Hands-on training by [Piotr Migdał](https://p.migdal.pl) (2019).


## Transformer models


* [GPT-2 - better language models and their implications](https://openai.com/blog/better-language-models/) by OpenAI


PROMPT: 

> **Cities & Lights**
> 
> When you enter the city of Singapore during the night, you see lights: colorful and ubiquitous. Lights on every building, on every fountain, and in every park.

GENERATED:

>  Lights shining in a city in which the majority of people are now using mobile phones. Singapore has a bright future as a technology hub, and it 's not too late to make it happen.

Inspired by [Invisible Cities by Italo Calvino](https://en.wikipedia.org/wiki/Invisible_Cities).

### Interactive demos

* [Write With Transformer by Hugging Face](https://transformer.huggingface.co/)
    * [GPT-2 large](https://transformer.huggingface.co/doc/gpt2-large)
* [Gwern's AI-generated poetry](https://slatestarcodex.com/2019/03/14/gwerns-ai-generated-poetry/) and [GPT-2 Neural Network Poetry](https://www.gwern.net/GPT-2)
* [AI Dungeon](https://www.aidungeon.io/) - a text-based adventure game powered by GPT-2

### This example

https://github.com/huggingface/pytorch-transformers

https://huggingface.co/pytorch-transformers/index.html

Heavily based on https://github.com/huggingface/pytorch-transformers/blob/master/examples/run_generation.py

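Note: the pretrained models are BIG. Expect a download of hundreds of megabytes or more, depending on the variant you pick.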

In [None]:
!pip install pytorch_transformers

In [None]:
import argparse
import logging
from tqdm import trange

import torch
import torch.nn.functional as F
import numpy as np

from pytorch_transformers import GPT2Config
from pytorch_transformers import GPT2LMHeadModel, GPT2Tokenizer

In [None]:
logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s -   %(message)s',
                    datefmt = '%m/%d/%Y %H:%M:%S',
                    level = logging.INFO)
logger = logging.getLogger(__name__)

In [None]:
MAX_LENGTH = int(10000)  # Hardcoded max length to avoid infinite loop

ALL_MODELS = sum((tuple(conf.pretrained_config_archive_map.keys()) for conf in (GPT2Config,)), ())

MODEL_CLASSES = {
    'gpt2': (GPT2LMHeadModel, GPT2Tokenizer)
}

print(ALL_MODELS)

### Sampling functions

In [None]:
def top_k_top_p_filtering(logits, top_k=0, top_p=0.0, filter_value=-float('Inf')):
    """ Filter a distribution of logits using top-k and/or nucleus (top-p) filtering
        Args:
            logits: logits distribution shape (vocabulary size)
            top_k > 0: keep only top k tokens with highest probability (top-k filtering).
            top_p > 0.0: keep the top tokens with cumulative probability >= top_p (nucleus filtering).
                Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751)
        From: https://gist.github.com/thomwolf/1a5a29f6962089e871b94cbd09daf317
    """
    assert logits.dim() == 1  # batch size 1 for now - could be updated for more but the code would be less clear
    top_k = min(top_k, logits.size(-1))  # Safety check
    if top_k > 0:
        # Remove all tokens with a probability less than the last token of the top-k
        indices_to_remove = logits < torch.topk(logits, top_k)[0][..., -1, None]
        logits[indices_to_remove] = filter_value

    if top_p > 0.0:
        sorted_logits, sorted_indices = torch.sort(logits, descending=True)
        cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)

        # Remove tokens with cumulative probability above the threshold
        sorted_indices_to_remove = cumulative_probs > top_p
        # Shift the indices to the right to keep also the first token above the threshold
        sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()
        sorted_indices_to_remove[..., 0] = 0

        indices_to_remove = sorted_indices[sorted_indices_to_remove]
        logits[indices_to_remove] = filter_value
    return logits
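
To see what the two filters keep, here is a minimal dependency-free sketch on a five-token toy distribution, small enough to check by hand. The helper names (`softmax`, `top_k_keep`, `top_p_keep`) and the toy probabilities are made up for this illustration and are not part of `pytorch_transformers`. The functions return the indices that survive, rather than masking logits in place as the function above does.

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_keep(logits, k):
    """Return the (sorted) indices of the k highest logits."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return sorted(order[:k])

def top_p_keep(logits, p):
    """Return the (sorted) indices of the nucleus: tokens taken in order of
    decreasing probability, up to and including the first one that pushes
    the cumulative probability above p."""
    probs = softmax(logits)
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative > p:
            break
    return sorted(kept)

# Hypothetical next-token probabilities for a vocabulary of 5 tokens
toy_probs = [0.15, 0.5, 0.03, 0.25, 0.07]
toy_logits = [math.log(q) for q in toy_probs]

print(top_k_keep(toy_logits, k=2))    # the two most likely tokens
print(top_p_keep(toy_logits, p=0.8))  # smallest top set with mass above 0.8
```

With `k=2` only tokens 1 and 3 survive. With `p=0.8` token 0 is kept as well, because tokens 1 and 3 together reach only 0.75 of the probability mass.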

In [None]:
def sample_sequence(model, length, context, num_samples=1, temperature=1, top_k=0, top_p=0.0, device='cpu'):
    context = torch.tensor(context, dtype=torch.long, device=device)
    context = context.unsqueeze(0).repeat(num_samples, 1)
    generated = context
    with torch.no_grad():
        for _ in trange(length):

            inputs = {'input_ids': generated}

            outputs = model(**inputs)  # Note: we could also use 'past' with GPT-2/Transfo-XL/XLNet (cached hidden-states)
            # Only batch element 0 is used here, so this loop assumes num_samples == 1
            next_token_logits = outputs[0][0, -1, :] / temperature
            filtered_logits = top_k_top_p_filtering(next_token_logits, top_k=top_k, top_p=top_p)
            next_token = torch.multinomial(F.softmax(filtered_logits, dim=-1), num_samples=1)
            generated = torch.cat((generated, next_token.unsqueeze(0)), dim=1)
    return generated
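
The loop above divides the logits by `temperature` before sampling. As a rough illustration of what that does, the sketch below applies the same scaling to three made-up logits in plain Python. The helper `softmax_with_temperature` is a name chosen for this example only.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature, mirroring the scaling in sample_sequence."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 3-token vocabulary
example_logits = [2.0, 1.0, 0.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, t)
    print(t, [round(p, 3) for p in probs])
```

A temperature below 1 sharpens the distribution, so the most likely token dominates. A temperature above 1 flattens it, so sampling becomes more varied.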

### Loading model

In [None]:
model_type = 'gpt2' #@param ["gpt2"]
model_name_or_path = 'gpt2-medium'  #@param ["gpt2", "gpt2-medium", "gpt2-large"]
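device = 'cuda'  #@param ["cuda", "cpu"]
# Fall back to the CPU if CUDA was requested but no GPU is available
if device == 'cuda' and not torch.cuda.is_available():
    device = 'cpu'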

In [None]:
# Look up the model and tokenizer classes directly instead of parsing command-line args
model_class, tokenizer_class = MODEL_CLASSES[model_type]

In [None]:
# This line downloads the tokenizer files on first use
tokenizer = tokenizer_class.from_pretrained(model_name_or_path)

In [None]:
# This line downloads the model weights, a much larger download,
# and loading itself takes ~20 sec
model = model_class.from_pretrained(model_name_or_path)

In [None]:
model.to(device)
model.eval()

In [None]:
generate_length = 64  #@param {type:"integer"}
temperature = 1.  #@param {type:"slider", min:0.1, max:5.0, step:0.1}
top_k = 50  #@param {type:"integer"}
top_p = 0.  #@param {type:"slider", min:0.0, max:1.0, step:0.05}
text_prompt = 'Before going into the wilderness, make sure that' #@param {type:"string"}

context_tokens = tokenizer.encode(text_prompt)
out = sample_sequence(
    model=model,
    context=context_tokens,
    length=generate_length,
    temperature=temperature,
    top_k=top_k,
    top_p=top_p,
    device=device
)
out = out[0, len(context_tokens):].tolist()
text = tokenizer.decode(out, clean_up_tokenization_spaces=True)

print(text_prompt)
print(text)