<a href="https://colab.research.google.com/github/JayThibs/gpt-experiments/blob/main/notebooks/gpt_neo_simple_text_generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Simple Text Generation with GPT-Neo using Huggingface

Credit: https://github.com/mallorbc/GPTNeo_notebook

Video: https://www.youtube.com/watch?v=d_ypajqmwcU

## Installations

In [1]:
pip install transformers pynvml --quiet


## Imports

In [2]:
from transformers import GPTNeoForCausalLM, GPT2Tokenizer
import torch

## Loading the Model

In [3]:
model_name = 'EleutherAI/gpt-neo-2.7B' # or 1.3B
model = GPTNeoForCausalLM.from_pretrained(model_name)


The model can run inference on a GPU, but it does not have to. The 2.7B model takes slightly less than 13 GB of VRAM, and the 1.3B model takes slightly less than 7.5 GB. The model will be placed on the GPU if one is available and it has enough free VRAM.
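
As a rough sanity check on these figures, the memory taken by the weights alone can be estimated from the parameter count. The sketch below assumes 32-bit floats (4 bytes per parameter) and the nominal parameter counts in the model names (1.3 and 2.7 billion). `estimate_weight_memory_gb` is a hypothetical helper, not part of transformers, and the real footprint is higher once activations and the CUDA context are counted.

```python
def estimate_weight_memory_gb(num_params: float, bytes_per_param: int = 4) -> float:
    """Approximate memory needed for the model weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9

# Nominal parameter counts taken from the model names (an assumption).
for name, num_params in [('gpt-neo-1.3B', 1.3e9), ('gpt-neo-2.7B', 2.7e9)]:
    print(f'{name}: ~{estimate_weight_memory_gb(num_params):.1f} GB of weights in float32')
```

Passing `bytes_per_param=2` gives the corresponding estimate for half precision, which is roughly half of these numbers.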

Let's check how much VRAM we have.

In [4]:
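free_vram = 0.0
if torch.cuda.is_available():
    from pynvml import nvmlInit, nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo
    nvmlInit()
    h = nvmlDeviceGetHandleByIndex(0)  # handle for the first GPU
    info = nvmlDeviceGetMemoryInfo(h)
    # info.free is in bytes; the divisor is 1024 * 1024 * 1000, so the result is approximate GB
    free_vram = info.free / 1048576000
    print(f'There is a GPU with {free_vram}GB of free VRAM')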

There is a GPU with 16.27875GB of free VRAM
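
The divisor in the cell above, 1048576000, is 1024 × 1024 × 1000, so the printed figure sits between decimal gigabytes and binary gibibytes. Here is a small sketch of the two standard conversions. The helper names are hypothetical, and the byte count is back-computed from the printed 16.27875 figure rather than read from the GPU.

```python
def bytes_to_gb(num_bytes: int) -> float:
    """Decimal gigabytes: 1 GB = 10**9 bytes."""
    return num_bytes / 10**9

def bytes_to_gib(num_bytes: int) -> float:
    """Binary gibibytes: 1 GiB = 2**30 bytes."""
    return num_bytes / 2**30

# 16.27875 * 1048576000, back-computed from the output above (an assumption).
free_bytes = 17_069_506_560
print(f'{bytes_to_gb(free_bytes):.2f} GB')
print(f'{bytes_to_gib(free_bytes):.2f} GiB')
```

Under either convention the free memory here is comfortably above the sizes quoted for both models.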


In [6]:
# Approximate free VRAM (in GB) required before moving each model to the GPU.
required_vram = {'EleutherAI/gpt-neo-2.7B': 13.5, 'EleutherAI/gpt-neo-1.3B': 7.5}
# Unknown model names fall back to the CPU.
use_cuda = free_vram > required_vram.get(model_name, float('inf'))
if use_cuda:
    model.to('cuda:0')

Now we need to load the tokenizer to prepare the input for GPT-Neo.

In [7]:
tokenizer = GPT2Tokenizer.from_pretrained(model_name)


Next, we choose the prompt that the model will continue, as well as how long we want the generated output to be (measured in tokens).

In [22]:
prompt = input('Please enter a prompt: ')  # input() already returns a str

Please enter a prompt: What are some causes for chronic illness?


In [23]:
output_length = int(input("How long should the generated output be? "))

How long should the generated output be? 200
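
One detail worth keeping in mind: in the `generate` call used later in this notebook, this number is passed as `max_length`, which counts the prompt tokens as well as the newly generated ones. The sketch below shows the arithmetic. `new_token_budget` is a hypothetical helper, and the prompt length of 9 tokens is an illustrative assumption, not a measured value.

```python
def new_token_budget(prompt_tokens: int, max_length: int) -> int:
    """Number of tokens that can be generated when max_length includes the prompt."""
    return max(max_length - prompt_tokens, 0)

# Hypothetical prompt of 9 tokens with the max_length of 200 entered above.
print(new_token_budget(prompt_tokens=9, max_length=200))  # 191
```

To specify only the number of generated tokens, recent versions of transformers also accept `max_new_tokens` in `generate`.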


We now tokenize the input prompt so the model can consume it. If we are using a GPU, we also move the resulting tensor of token IDs onto it.


In [24]:
input_ids = tokenizer(prompt, return_tensors='pt').input_ids
if use_cuda:
    input_ids = input_ids.cuda()

In [25]:
# max_length is measured in tokens and includes the prompt tokens
gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=output_length)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
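
To give some intuition for `temperature=0.9`: when sampling, the logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it. Here is a minimal pure-Python sketch with made-up logits (the three values are illustrative, not model output, and `softmax_with_temperature` is a hypothetical helper):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.5, 0.9, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, which makes the output more predictable. Higher temperatures spread it out, which makes the output more varied.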


In [26]:
generated_text = tokenizer.batch_decode(gen_tokens)[0]
print(generated_text)

What are some causes for chronic illness?

In healthy people, the cause of many chronic illnesses is unknown. However, there are many things that can cause long-term problems and diseases. It is common for people to have a lot of stress in their lives. Their bodies feel uncomfortable, they have aches and pain, and they often feel tired all the time. The most common cause for chronic illness is a lack of sleep. It is not unusual for people without a disease to feel tired. In many ways, the most important causes of chronic illness are:

stress

poor diet

excessive alcohol consumption

poor eating and physical activity habits

physical inactivity

poor exercise habits

lack of sleep

This all means that all these possible causes are in fact also the primary causes for chronic illness. The best way to prevent or deal with many illnesses and diseases is to stay fit and healthy and to get regular exercise and sufficient sleep. This


Examples of outputs I got:

1. What are some causes for chronic illness?

In healthy people, the cause of many chronic illnesses is unknown. However, there are many things that can cause long-term problems and diseases. It is common for people to have a lot of stress in their lives. Their bodies feel uncomfortable, they have aches and pain, and they often feel tired all the time. The most common cause for chronic illness is a lack of sleep. It is not unusual for people without a disease to feel tired. In many ways, the most important causes of chronic illness are:

stress

poor diet

excessive alcohol consumption

poor eating and physical activity habits

physical inactivity

poor exercise habits

lack of sleep

This all means that all these possible causes are in fact also the primary causes for chronic illness. The best way to prevent or deal with many illnesses and diseases is to stay fit and healthy and to get regular exercise and sufficient sleep.

2. How do I start a career in filmmaking? 

How should I go about creating my own material? These are the kinds of questions that every aspiring filmmaker should be asked over and over again, at every step of the filmmaking process. Here is what I’ve learned over the past year in the process of making my film, The Wreckers, and I’ll share it with you.

Who should you go after your dream job in the industry?

Most people who would be most qualified to interview for a job as a director will be the person you’re most interested in hiring. But it’s not always that simple.

One of my mentors, who was once a junior producer, told me that “hiring a new hire is like trying to get a car loan. Get your references and make a decision based on them.” This is because you want to hire someone who is your ideal person, not necessarily one who is good
