<a href="https://colab.research.google.com/github/GeraudBourdin/llm-scripts/blob/main/vllm-gpt2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# With vLLM

In [None]:
!pip install vllm

In [None]:
from vllm import LLM

prompts = ["Hello, my name is", "The capital of France is"]
llm = LLM(model="gpt2")
outputs = llm.generate(prompts)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

# With the CPU only (or a GPU if one is available)

In [None]:
!pip install torch transformers

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
if torch.cuda.is_available():
    torch.cuda.set_device(0)

tokenizer   = AutoTokenizer.from_pretrained("gpt2")
model       = AutoModelForCausalLM.from_pretrained("gpt2")
model.to(device)

input_text = "The best thing about AI is its ability to"
input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)

# The prompt has no padding, so every position is attended to
attention_mask = torch.ones(
    input_ids.shape
    , dtype=torch.long
    , device=model.device
)

# Sample a continuation and also return the per-step scores of the generated tokens
output = model.generate(
    input_ids
    , attention_mask          = attention_mask
    , eos_token_id            = tokenizer.eos_token_id
    , pad_token_id            = tokenizer.eos_token_id  # GPT-2 has no dedicated pad token
    , do_sample               = True
    , output_scores           = True
    , return_dict_in_generate = True  # required for output.sequences and output.scores
    , max_new_tokens          = 512
)

# Keep only the token ids that were generated (drop the prompt tokens)
gen_sequences = output.sequences[:, input_ids.shape[-1]:]
generated_text = tokenizer.decode(gen_sequences[0], skip_special_tokens=True)

print("################################################")
print(generated_text)
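
The cell above requests `output_scores` but never turns the scores into probabilities. The sketch below shows the missing step on toy data, in pure Python so it runs without downloading a model. The helper names `softmax` and `chosen_token_probs` and the toy values are hypothetical. With the real output, the inputs would be something like `[s[0].tolist() for s in output.scores]` and `gen_sequences[0].tolist()`, assuming `output.scores` holds one `(batch, vocab)` tensor per generated token.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

def chosen_token_probs(step_scores, token_ids):
    """Probability of the token that was actually picked at each step.

    step_scores: one list of vocabulary scores per generated token
    token_ids:   id of the token generated at each step
    """
    return [softmax(scores)[tok] for scores, tok in zip(step_scores, token_ids)]

# Hypothetical toy data: 2 generation steps over a 3-token vocabulary
toy_scores = [[2.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
toy_tokens = [0, 2]

probs = chosen_token_probs(toy_scores, toy_tokens)
print([round(p, 3) for p in probs])  # [0.665, 0.333]
```

When sampling, the scores may already have been filtered (for example by top-k), so these are probabilities under the filtered distribution, not the model's raw one.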

# With a Hugging Face pipeline

In [2]:
from transformers import pipeline

input_text = "The best thing about AI is its ability to"

# Build a text-generation pipeline around GPT-2
generator = pipeline('text-generation', model='gpt2')

# Generate one continuation of at most 512 new tokens
outputs = generator(
                   input_text
                  ,num_return_sequences = 1
                  ,pad_token_id         = generator.tokenizer.eos_token_id  # GPT-2 has no pad token
                  ,max_new_tokens       = 512
                  )


for output in outputs:
  generated_text = output['generated_text']
  print(f"{generated_text!r}")


"The best thing about AI is its ability to do something simple and quick, but it's great to learn, use, test, optimize, and evolve to new patterns. I would love to see more AI learn from others, because they are more interesting. We are now in the era of deep learning, and many are working with it extensively. I think a lot of these technologies will be more useful in the future, so when we are just starting to learn how to drive, drive, drive, drive, you can start making some pretty cool things if we go along that way."
