#### **Pre-trained Models for Text Generation**

Why use pretrained models?
- They are trained on extensive datasets
- They perform well across a range of language tasks, such as
  - Sentiment Analysis
  - Text Completion
  - Language Translation


What are the limitations of pretrained models?
- They are computationally expensive to train.
- They have large storage requirements.
- They offer limited customization options.
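
To make the storage point concrete, here is a rough back-of-the-envelope sketch. It assumes weight size is simply parameter count times bytes per parameter (4 bytes for float32, 2 for float16) and ignores everything else stored in a checkpoint. The function name `estimate_weight_size_gb` is made up for this illustration, and the parameter counts are approximate public figures.

```python
def estimate_weight_size_gb(num_parameters, bytes_per_parameter=4):
    """Approximate size of a model's weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9

# Approximate parameter counts (assumptions for illustration).
gpt2_small = 124_000_000       # base GPT-2, roughly 124M parameters
llama2_7b = 7_000_000_000      # Llama 2 7B, roughly 7B parameters

print(f"GPT-2 (float32):      {estimate_weight_size_gb(gpt2_small):.2f} GB")
print(f"Llama 2 7B (float16): {estimate_weight_size_gb(llama2_7b, 2):.2f} GB")
```

Even under these simplified assumptions, the gap between a small and a large model is more than an order of magnitude.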


#### **Pre-trained models in PyTorch**

We can use Hugging Face Transformers, a library that gives access to a large collection of pre-trained models (550,000+ models), including:
- GPT-2
- T5
- Llama 2
- Mistral

and so on...

#### **Let us try GPT-2 Tokenizer and Model**

`GPT2LMHeadModel` is Hugging Face's implementation of GPT-2 with a language-modeling head on top, which makes it suitable for text generation.

`GPT2Tokenizer` converts text into tokens using subword tokenization, so a word such as 'larger' might be split into pieces like ['large', 'r'].
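
The idea of splitting a word into known pieces can be sketched without any library. The snippet below is a toy greedy longest-match splitter over a hypothetical vocabulary (`toy_vocab`). It is not how GPT-2 tokenizes: GPT-2 uses byte-level BPE with learned merge rules. It only shows how a word ends up as several subword tokens.

```python
def greedy_subword_tokenize(word, vocab):
    """Split `word` into the longest vocabulary pieces, left to right."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        end = len(word)
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No vocabulary piece matches: fall back to a single character.
            end = start + 1
        tokens.append(word[start:end])
        start = end
    return tokens

# Hypothetical toy vocabulary, not GPT-2's real one.
toy_vocab = {"large", "r", "small", "er", "est"}

print(greedy_subword_tokenize("larger", toy_vocab))    # ['large', 'r']
print(greedy_subword_tokenize("smallest", toy_vocab))  # ['small', 'est']
```

Because unknown pieces fall back to single characters, every input can be tokenized, which is the same property that lets subword tokenizers handle words they never saw during training.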

In [3]:
# Load the GPT-2 tokenizer and model, then encode the seed text as a tensor of token IDs
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
seed_text = 'Once upon a time'
input_ids = tokenizer.encode(seed_text, return_tensors='pt')


In [5]:
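# Greedy decoding: `temperature` only takes effect when do_sample=True is also passed,
# so it does not change the result here. no_repeat_ngram_size=2 blocks repeated bigrams.
output = model.generate(input_ids, max_length=40, temperature=0.7, no_repeat_ngram_size=2, pad_token_id=tokenizer.eos_token_id)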

generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)



Once upon a time, the world was a place of great beauty and great danger. The world of the gods was the place where the great gods were born, and where they were to live.
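
Two of the arguments passed to `generate` are easier to understand with a small standalone sketch. The code below is a simplified pure-Python illustration, not the library's implementation, and the helper names are made up for this example. `softmax_with_temperature` shows how dividing the scores by a temperature below 1 concentrates probability on the top token (this matters only when sampling, i.e. with `do_sample=True`), and `banned_next_tokens` shows the idea behind `no_repeat_ngram_size`: any token that would recreate an n-gram already in the output is ruled out.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def banned_next_tokens(generated, ngram_size):
    """Tokens that would complete an n-gram already present in `generated`."""
    if ngram_size < 1 or len(generated) < ngram_size:
        return set()
    # The last (n - 1) tokens are the prefix of the n-gram about to be completed.
    prefix = tuple(generated[len(generated) - (ngram_size - 1):])
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        ngram = tuple(generated[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned

logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
print(softmax_with_temperature(logits, 1.0))
print(softmax_with_temperature(logits, 0.7))  # more weight on the first token

history = ["the", "world", "of", "the"]
print(banned_next_tokens(history, 2))  # {'world'}: "the world" already occurred
```

In the second example, "world" is banned because emitting it after the final "the" would repeat the bigram "the world".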



#### **Let us try T5 language translation**

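T5 stands for Text-to-Text Transfer Transformer, and `t5-small` is its smallest pre-trained checkpoint. T5 treats every task as text in, text out, and the task is selected by a prefix in the input prompt, such as `translate English to French:` for translation.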
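
Since the task is chosen by plain text at the start of the prompt, building a T5 input is just string concatenation. Here is a minimal sketch: the prefix strings are the ones the original T5 checkpoints were trained with, while the dictionary keys and the helper `build_t5_prompt` are hypothetical names used only for this illustration.

```python
# Task prefixes used by the original T5 checkpoints.
# The short keys ("en-fr", ...) are made-up labels for this example.
T5_PREFIXES = {
    "en-fr": "translate English to French: ",
    "en-de": "translate English to German: ",
    "en-ro": "translate English to Romanian: ",
    "summarize": "summarize: ",
}

def build_t5_prompt(task, text):
    """Prepend the task prefix so the model knows which task to perform."""
    if task not in T5_PREFIXES:
        raise ValueError(f"Unknown task: {task!r}")
    return T5_PREFIXES[task] + text.strip()

print(build_t5_prompt("en-fr", "Hello, how are you?"))
# translate English to French: Hello, how are you?
```

The resulting string can be passed to the tokenizer in the same way as a hand-written prompt.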

In [3]:
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small')

input_prompt = "translate English to French: 'Hello, how are you?'"
input_ids = tokenizer.encode(input_prompt, return_tensors='pt')

output = model.generate(input_ids, max_length=100)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print('Generated Text: ', generated_text)


Generated Text:  "Jo, comment êtes-vous?"
