Shows how to generate text from a prompt and a few hyperparameters, using either minGPT or a huggingface/transformers model (GPT-2 or BLOOM)

In [1]:
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel, BloomForCausalLM, BloomTokenizerFast
from mingpt.model import GPT
from mingpt.utils import set_seed
from mingpt.bpe import BPETokenizer
set_seed(3407)

In [2]:
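use_mingpt = False # use the minGPT model? if False, a huggingface/transformers model is used
use_bloom = True # use BLOOM instead of GPT-2? only consulted when use_mingpt is False
model_type = 'gpt2-xl' # GPT-2 checkpoint name; not used on the BLOOM path
device = 'cpu'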



In [3]:
if use_mingpt:
    model = GPT.from_pretrained(model_type)
elif use_bloom:
    model = BloomForCausalLM.from_pretrained("bigscience/bloom-1b1")
else:
    model = GPT2LMHeadModel.from_pretrained(model_type)
    model.config.pad_token_id = model.config.eos_token_id # suppress a warning


In [4]:
#torch.cuda.empty_cache()
#import gc
#gc.collect()
model.to(device) # keep the model on the same device as the inputs created in generate()
model.eval();

In [5]:

def generate(prompt='', num_samples=10, steps=20, do_sample=True):
        
    # tokenize the input prompt into integer input sequence
    if use_mingpt:
        tokenizer = BPETokenizer()
        if prompt == '':
            # to create unconditional samples...
            # manually create a tensor with only the special <|endoftext|> token
            # similar to what openai's code does here https://github.com/openai/gpt-2/blob/master/src/generate_unconditional_samples.py
            x = torch.tensor([[tokenizer.encoder.encoder['<|endoftext|>']]], dtype=torch.long).to(device)
        else:
            x = tokenizer(prompt).to(device)
    elif use_bloom:
        tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-1b1")
        if prompt == '':
            # to create unconditional samples...
            # start from BLOOM's own beginning-of-sequence token;
            # '<|endoftext|>' is GPT-2's special token, not BLOOM's
            prompt = tokenizer.bos_token
        encoded_input = tokenizer(prompt, return_tensors='pt').to(device)
        x = encoded_input['input_ids']
    else:
        tokenizer = GPT2Tokenizer.from_pretrained(model_type)
        if prompt == '': 
            # to create unconditional samples...
            # huggingface/transformers tokenizer special cases these strings
            prompt = '<|endoftext|>'
        encoded_input = tokenizer(prompt, return_tensors='pt').to(device)
        x = encoded_input['input_ids']
    
    # we'll process all desired num_samples in a batch, so expand out the batch dim
    x = x.expand(num_samples, -1)

    # forward the model `steps` times to get samples, in a batch
    y = model.generate(x, max_new_tokens=steps, do_sample=do_sample, top_k=40)
    
    for i in range(num_samples):
        out = tokenizer.decode(y[i].cpu().squeeze())
        print('-'*80)
        print(out)
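
The `do_sample=True, top_k=40` arguments above ask for top-k sampling: at each step only the 40 highest-scoring tokens are kept, their scores are renormalized, and one token is drawn from them. The following is a minimal pure-Python sketch of that idea for a single step. It is an illustration only, not the code that minGPT or huggingface/transformers actually run; the helper name `top_k_sample` and the toy `logits` list are hypothetical.

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    """Sample one index from `logits`, restricted to the k largest entries.

    Hypothetical helper for illustration; `rng` is any object exposing
    `choices` (the `random` module or a `random.Random` instance).
    """
    # indices of the k largest logits
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # softmax numerators over the kept logits only
    # (subtract the max for numerical stability)
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    # draw one index in proportion to its weight
    return rng.choices(top, weights=weights, k=1)[0]

# toy scores for a 5-token vocabulary (made-up values)
logits = [2.0, 0.5, -1.0, 3.0, 0.0]
rng = random.Random(0)
samples = [top_k_sample(logits, k=2, rng=rng) for _ in range(1000)]
```

With `k=2` only indices 3 and 0 (the two largest logits) can ever be drawn, and with `k=1` the procedure reduces to greedy decoding, which is roughly what `do_sample=False` gives.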
        

In [12]:
generate(prompt='Write a text about war', num_samples=5, steps=30)

--------------------------------------------------------------------------------
Write a text about war.”</s><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>
--------------------------------------------------------------------------------
Write a text about war, peace, or some major event in the history of the world. Write about events happening around the world. Why do you think the world has changed
--------------------------------------------------------------------------------
Write a text about war in your own language. It may not be very detailed or comprehensive, but if it’s clear and concise, it will get your points across on your own
--------------------------------------------------------------------------------
Write a text about war (by your choice, like a letter to your son, a picture of your husband, or a poem), or ask your students to write a description
-------------------------------