# This notebook is a simple example of using Hugging Face's transformers library to get started with large language models

In [1]:
# Import requirements
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
# Create device-agnostic code: use the GPU when CUDA is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"



False

### For our purposes, we want to use smaller models, so that multiple people can test the demo at the same time and we don't use all of Jet's resources

In [None]:
# OPT is a family of open-source language models that comes in a variety of sizes
# The parameter count determines how much VRAM the model uses and influences the complexity/accuracy of the answers it can generate.
model_name = "facebook/opt-350m"
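
To make the link between parameter count and memory concrete, here is a rough back-of-the-envelope sketch. The helper `estimate_weight_memory_gb` and the figure of 350 million parameters (read off the model name) are assumptions made for illustration. The estimate covers the weights only, so actual VRAM use during generation will be higher.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def estimate_weight_memory_gb(num_params, dtype="float32"):
    """Estimate the memory taken by the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: roughly 350 million parameters, taken from the model name.
approx_params = 350_000_000
for dtype in BYTES_PER_PARAM:
    size_gb = estimate_weight_memory_gb(approx_params, dtype)
    print(f"{dtype}: ~{size_gb:.2f} GB")
```

Under these assumptions the weights need about 1.4 GB in float32 and half that in float16, which is why smaller checkpoints are the friendlier choice on shared hardware.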

There are a few different ways to utilize language models with transformers.

The first (and simplest) is to create a pipeline for generation.

In [None]:
generator = pipeline('text-generation', model=model_name)
response = generator("Hello, I am conscious and ")

response
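
The pipeline returns a list of dictionaries, one per generated sequence, with the text stored under the `generated_text` key. The sketch below shows one way to unpack that result and to pass a few common generation arguments. The helper names `extract_texts` and `run_pipeline_demo` are made up for this example, and the argument values are arbitrary choices, not recommendations.

```python
def extract_texts(response):
    """Pull the generated strings out of a text-generation pipeline result."""
    return [item["generated_text"] for item in response]


def run_pipeline_demo(model_name="facebook/opt-350m"):
    """Generate two sampled continuations; downloads the model on first use."""
    # Imported here so extract_texts can be used without loading transformers.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_name)
    response = generator(
        "Hello, I am conscious and ",
        max_new_tokens=20,        # cap the length of each continuation
        do_sample=True,           # sample instead of always taking the top token
        num_return_sequences=2,   # ask for two alternative continuations
    )
    return extract_texts(response)
```

Calling `run_pipeline_demo()` returns a list of two strings. Because sampling is enabled, the text differs from run to run.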

The next way, which allows for more complex use of the model, is to load it with `AutoTokenizer` and `AutoModelForCausalLM`.

In [None]:
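# Both classes are instantiated with from_pretrained, not by calling the class directly
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Move the model to the same device the input tensors will be on
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

# Named `prompt` to avoid shadowing the built-in input()
prompt = "Hello, I am conscious and "

# Convert the prompt to token IDs (a PyTorch tensor) on the target device
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate a continuation, then decode the token IDs back into text
outputs = model.generate(input_ids)
output = tokenizer.decode(outputs[0], skip_special_tokens=True)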

output
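
Loading the tokenizer and model directly also gives control over how text is generated. The sketch below wraps the same steps in a hypothetical helper, `generate_text`, that passes the attention mask along with the token IDs and switches from the default decoding to sampling. The sampling values (`top_p=0.9`, `temperature=0.8`) are illustrative assumptions, not tuned settings.

```python
def generate_text(prompt, model_name="facebook/opt-350m", max_new_tokens=30):
    """Generate a sampled continuation of `prompt` (hypothetical helper)."""
    # Imports live inside the function so defining it does not load torch.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

    # The tokenizer returns input_ids and attention_mask; move both to the device.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    # No gradients are needed for inference.
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,  # limit on newly generated tokens
            do_sample=True,                 # enable sampling
            top_p=0.9,                      # nucleus sampling threshold (assumed value)
            temperature=0.8,                # below 1.0 makes sampling less random (assumed value)
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

For example, `generate_text("Hello, I am conscious and ")` returns the prompt followed by up to 30 new tokens. The helper reloads the model on every call, which keeps the sketch short but would be wasteful in a loop.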