<h1>Text generation from prompt and image inputs</h1>

<h3>1. Basic use of a Hugging Face model for text generation</h3>

In [1]:
from transformers import pipeline

In [2]:
prompt = "Yogurt tastes better when topped with"

In [3]:
text_generator = pipeline(
    model="gpt2",
    task="text-generation", 
    clean_up_tokenization_spaces=True
)

# GPT-2 defines no pad token, so reuse the EOS token id for padding during generation
generated_text = text_generator(
    prompt,
    pad_token_id=text_generator.tokenizer.eos_token_id
)
print(generated_text)


[{'generated_text': 'Yogurt tastes better when topped with lemon-flavored creme fraiche, and it\'s best in the form of an orange-based creme fraiche. My favorite, however, was the "dang. dang."\n\n'}]
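
The result printed above is a list with one dictionary per generated sequence, and the text sits under the `generated_text` key with the prompt included. As a minimal sketch, the continuation alone could be pulled out as follows. The `continuation` helper is hypothetical (not part of `transformers`), and `results` is a shortened copy of the output above rather than a fresh model call.

```python
# Shortened copy of the pipeline output shown above (assumption: same structure).
results = [
    {"generated_text": "Yogurt tastes better when topped with lemon-flavored creme fraiche"}
]
prompt = "Yogurt tastes better when topped with"


def continuation(results, prompt):
    """Return the first generated sequence with the leading prompt removed."""
    full_text = results[0]["generated_text"]
    if full_text.startswith(prompt):
        return full_text[len(prompt):]
    return full_text


print(continuation(results, prompt).strip())
```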


<h3>2. Using the tokenizer and model directly for more control over text generation</h3>

In [4]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

In [5]:
# Calling the tokenizer returns input_ids together with the attention_mask
inputs = tokenizer(prompt, return_tensors="pt")
print(inputs["input_ids"])

tensor([[   56,   519,  3325, 18221,  1365,   618, 20633,   351]])


In [6]:
# Pass the attention mask and a pad token id explicitly so generate() does not have to guess them
output = model.generate(**inputs, max_length=15, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
print(output)


tensor([[   56,   519,  3325, 18221,  1365,   618, 20633,   351,   257,  1310,
          1643,   286, 18873, 13135,    13]])


In [7]:
generated_text = tokenizer.decode(output[0])
print(generated_text)

Yogurt tastes better when topped with a little bit of lemon juice.
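
Note that `max_length` counts the prompt tokens as well as the newly generated ones, which is why the 15-token output above holds only 7 new tokens after the 8-token prompt. The sketch below checks that arithmetic on the token ids printed above. `new_token_budget` is a hypothetical helper for illustration. To cap only the generated part, `generate()` also accepts `max_new_tokens`.

```python
# Token ids copied from the outputs above.
prompt_ids = [56, 519, 3325, 18221, 1365, 618, 20633, 351]
output_ids = [56, 519, 3325, 18221, 1365, 618, 20633, 351,
              257, 1310, 1643, 286, 18873, 13135, 13]


def new_token_budget(prompt_length, max_length):
    """Tokens left for generation when max_length covers prompt plus output."""
    return max(max_length - prompt_length, 0)


# The generated sequence starts with the prompt itself.
assert output_ids[:len(prompt_ids)] == prompt_ids

generated_count = len(output_ids) - len(prompt_ids)
print(generated_count, new_token_budget(len(prompt_ids), max_length=15))
```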


<h3>3. Generating text from an image</h3>

In [8]:
from transformers import (
    AutoProcessor,
    AutoModelForCausalLM
)

# Load the processor (image preprocessing and caption decoding)
processor = AutoProcessor.from_pretrained("microsoft/git-base-coco")

# Load the model
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base-coco")

In [9]:
from PIL import Image

# Load the image and get its pixel values
img = Image.open("data/images/person_1.jpg")
pixel_values = processor(images=img, return_tensors="pt").pixel_values

# Generate the ids
generated_ids = model.generate(pixel_values=pixel_values, max_length=15)

# Decode the output
generated_caption = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True
)

# View the generated text based on the image input
print(generated_caption[0])

a man in a yellow beanie, wearing a yellow beanie.


In [10]:
# Load the image and get its pixel values
img = Image.open("data/images/person_2.jpg")
pixel_values = processor(images=img, return_tensors="pt").pixel_values

# Generate the ids
generated_ids = model.generate(pixel_values=pixel_values, max_length=15)

# Decode the output
generated_caption = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True
)

# View the generated text based on the image input
print(generated_caption[0])

woman in a black hat and red nail polish.
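
Cells 9 and 10 repeat the same three steps (preprocess, generate, decode) for each image. One way to avoid the duplication is to wrap those steps in a small function. This is only a sketch: `caption_image` is a hypothetical helper name, and it assumes the `processor` and `model` objects loaded in cell 8 are passed in, using the same calls as the cells above.

```python
def caption_image(image, processor, model, max_length=15):
    """Return one caption for a PIL image using the processor and model given."""
    # Preprocess the image into the pixel tensor the model expects
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Generate the caption token ids
    generated_ids = model.generate(pixel_values=pixel_values, max_length=max_length)

    # Decode the ids to text and return the first (only) caption
    captions = processor.batch_decode(generated_ids, skip_special_tokens=True)
    return captions[0]
```

With the objects from the earlier cells, a call might look like `caption_image(Image.open("data/images/person_1.jpg"), processor, model)`.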
