First, let's install the packages needed to run GPT-Neo, an open-source GPT-3-style model from EleutherAI. We are already in the conda environment that Jupyter is running in.

We'll start with PyTorch.

In [1]:
!conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia -y

Collecting package metadata (current_repodata.json): done
Solving environment: done


  current version: 4.9.2
  latest version: 4.10.1

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.



Now let's install Hugging Face Transformers. It makes working with popular Transformer models much easier.

In [2]:
!conda install -c huggingface transformers -y

Collecting package metadata (current_repodata.json): done
Solving environment: done


  current version: 4.9.2
  latest version: 4.10.1

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.



Let's import the needed packages now.

In [1]:
from transformers import GPTNeoForCausalLM, GPT2Tokenizer
import torch

  from .autonotebook import tqdm as notebook_tqdm


Now let's get the model. We can run either the 1.3 billion parameter model or the 2.7 billion parameter model. Let's use the 2.7B model, which is "EleutherAI/gpt-neo-2.7B". The 1.3B model is "EleutherAI/gpt-neo-1.3B".

In [2]:
model_name = "EleutherAI/gpt-neo-2.7B"
model = GPTNeoForCausalLM.from_pretrained(model_name)

This model can be run on a GPU, but it does not have to be. The 2.7B model takes slightly less than 13 GB of VRAM, and the 1.3B model takes slightly less than 7.5 GB. The cells below move the model to the GPU if one is available and it has enough free VRAM.
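As a rough sanity check on those figures, the memory for the weights alone can be estimated as parameter count times bytes per parameter. The sketch below assumes 32-bit floats (4 bytes each) and uses the nominal parameter counts from the model names; `estimate_weight_gib` is a hypothetical helper, not part of any library. Real usage is higher than this estimate because of the CUDA context, buffers, and activations during generation.

```python
def estimate_weight_gib(num_params: float, bytes_per_param: int = 4) -> float:
    """Approximate memory for the weights alone, in GiB (float32 by default)."""
    return num_params * bytes_per_param / 1024**3

# Nominal parameter counts taken from the model names (an assumption).
for name, params in [("gpt-neo-1.3B", 1.3e9), ("gpt-neo-2.7B", 2.7e9)]:
    print(f"{name}: ~{estimate_weight_gib(params):.1f} GiB of float32 weights")
```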

Let's install pynvml to check how much VRAM we have.

In [3]:
!pip install pynvml



In [5]:
free_vram = 0.0
if torch.cuda.is_available():
    import pynvml
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    free_vram = info.free / 1024**3  # bytes -> GiB
    print(f"There is a GPU with {free_vram:.2f} GB of free VRAM")

In [6]:
if model_name == "EleutherAI/gpt-neo-2.7B" and free_vram>13.5:
    use_cuda = True
    model.to("cuda:0")
elif model_name == "EleutherAI/gpt-neo-1.3B" and free_vram>7.5:
    use_cuda = True
    model.to("cuda:0")
else:
    use_cuda = False
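The same decision can also be written as a lookup table, so that supporting another model size only means adding one entry. This is a sketch rather than the notebook's own code: `VRAM_THRESHOLDS_GB` and `should_use_gpu` are hypothetical names, and the thresholds are simply the ones used in the cell above.

```python
# Minimum free VRAM (GB) required per model, copied from the cell above.
VRAM_THRESHOLDS_GB = {
    "EleutherAI/gpt-neo-2.7B": 13.5,
    "EleutherAI/gpt-neo-1.3B": 7.5,
}

def should_use_gpu(model_name: str, free_vram_gb: float) -> bool:
    """Return True only if the model is known and there is enough free VRAM."""
    threshold = VRAM_THRESHOLDS_GB.get(model_name)
    return threshold is not None and free_vram_gb > threshold

print(should_use_gpu("EleutherAI/gpt-neo-2.7B", 16.0))
print(should_use_gpu("EleutherAI/gpt-neo-2.7B", 8.0))
```

In the notebook this would be used as `use_cuda = should_use_gpu(model_name, free_vram)`, followed by `model.to("cuda:0")` when the result is `True`.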

Now we need to load the tokenizer to prepare the input for GPT-Neo.

In [7]:
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

We are almost done. At this point we need to decide what prompt we want the model to continue, as well as how long we want the generated output to be.

In [8]:
prompt = input("Please enter a prompt: ")  # input() already returns a str

In [9]:
output_length = int(input("How long should the generated output be? "))
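One thing to keep in mind when choosing this number: in Hugging Face `generate`, `max_length` counts the prompt tokens as well as the newly generated ones. The arithmetic is sketched below; `new_token_budget` is a hypothetical helper and the token counts are made-up examples.

```python
def new_token_budget(max_length: int, prompt_tokens: int) -> int:
    """Number of tokens that can still be generated under a total-length cap."""
    return max(0, max_length - prompt_tokens)

# A 12-token prompt with max_length=200 leaves room for 188 new tokens.
print(new_token_budget(max_length=200, prompt_tokens=12))
# A prompt longer than max_length leaves no room at all.
print(new_token_budget(max_length=10, prompt_tokens=12))
```

Newer Transformers releases also accept `max_new_tokens`, which limits only the generated part.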

In [10]:
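def get_Chat_response(text):
    # Tokenize the prompt into a PyTorch tensor of token IDs
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    if use_cuda:
        # Move the input to the GPU, where the model lives
        input_ids = input_ids.cuda()
    # Sample up to 200 tokens in total (prompt included); EOS doubles as the pad token
    gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=200, pad_token_id=tokenizer.eos_token_id)

    # Decode the first (and only) generated sequence back into text
    return tokenizer.batch_decode(gen_tokens)[0]

get_Chat_response("fried egg")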

'fried egg. He had no way of knowing that there was no point in his asking his father what had happened. Even if they had discussed the matter, he had no idea that his father had simply told him again and again that things were going to be different from now on, that they would return to the old order of things.\n\n## 16\n\n## A LITTLE BLUE\n\nIn the last week, the family was becoming increasingly aware of the changes being forced upon them. That was the result of the first crisis, of course, and that was the reason they had moved to the flat. The second crisis was the death of a friend, a man who had been to work on the _Maastricht_ project and who never spoke to them again. Their reaction was the same as it ever was. They had to accept that they were victims of fate, and it was clear to them that their old life was over. They had been trapped by a situation'
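The `do_sample=True, temperature=0.9` arguments are why the output differs on every run: instead of always taking the most likely next token, the model samples from its predicted distribution after dividing the logits by the temperature. The sketch below illustrates that idea in plain Python on made-up logits. It is not the Transformers implementation, and `softmax_with_temperature` and `sample_token` are hypothetical names.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after scaling by 1/temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
print(softmax_with_temperature(logits, temperature=0.9))
print(sample_token(logits, temperature=0.9))
```

Temperatures below 1 sharpen the distribution toward the most likely token, while values above 1 flatten it.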

We now need to tokenize the input prompt to prepare it for use with the model. If we are using a GPU, we will move the token IDs to the GPU as well.

In [11]:
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
if use_cuda:
    input_ids = input_ids.cuda()

In [12]:
gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=output_length)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
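The warning appears because only `input_ids` was passed to `generate`. The tokenizer call also returns an `attention_mask` (1 for real tokens, 0 for padding), and passing it together with `pad_token_id=tokenizer.eos_token_id` typically silences the message. To show what the mask contains, here is a plain-Python sketch that pads two made-up token sequences. `pad_batch` is a hypothetical helper, the token IDs are invented, 50256 is the `eos_token_id` reported in the warning, and padding on the left is an assumption (the side is configurable on the tokenizer).

```python
PAD_ID = 50256  # eos_token_id reported in the warning, reused as the pad token

def pad_batch(sequences, pad_id=PAD_ID):
    """Left-pad token ID lists to equal length and build the matching masks."""
    width = max(len(seq) for seq in sequences)
    input_ids = [[pad_id] * (width - len(seq)) + seq for seq in sequences]
    attention_mask = [[0] * (width - len(seq)) + [1] * len(seq) for seq in sequences]
    return input_ids, attention_mask

ids, mask = pad_batch([[11, 22, 33], [44]])  # invented token IDs
print(ids)
print(mask)
```

With a single prompt, as in this notebook, there is no padding and the mask is all ones.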


In [13]:
gen_text = tokenizer.batch_decode(gen_tokens)[0]
print(gen_text)

WRITE POEM ABOUT SUNDAYS IN LA and all the fun on the way!



“The greatest gift is life’s love, and the most precious is love’s gift to us.” ― Friedrich Nietzsche





“What was the worst thing you ever did?”





“You took your life, you made the wrong choices.” ― Bill Hicks





“I do what I love because it is the only thing I do. The rest of my life serves as the proof for the pudding.” ― Henry Ford





“Everything in life is a journey and the person who laughs at the sky is the one who has arrived.” ― Dalai Lama





“Someday in my life, I will meet someone who is just a little bit like you or a little bit like me or just a little bit something in between.
