Install requirements

In [1]:
! pip install -r requirements.txt



Use the Hugging Face transformers pipeline to generate text

In [2]:
from transformers import pipeline
import torch

torch.manual_seed(42)  # fix the seed so sampling is repeatable
model_type = "openai-community/gpt2"
device = "cuda" if torch.cuda.is_available() else "cpu"
generator = pipeline("text-generation", model=model_type, device=device)
generator("Hello, I'm a language model,", max_length=30, num_return_sequences=1, truncation=True)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Hello, I\'m a language model, so you can\'t define something in any other language. Let me introduce another topic:\n\nThe name "'}]

Split the pipeline into 3 parts:
1. Encode the text into tokens
2. Run the forward pass of the model
3. Decode the tokens into text
![image](images/tokenizer.png)

In [4]:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_type, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_type).to(device)

# Encode the text into tokens
text = 'Hello, I am a language model,'
tokens = tokenizer(text, return_tensors='pt').to(device)
tokens

{'input_ids': tensor([[15496,    11,   314,   716,   257,  3303,  2746,    11]],
       device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}

In [8]:
# Forward pass: generate new tokens from the encoded prompt
max_length = 30  # total length, prompt tokens included
input_ids, attention_mask = tokens['input_ids'], tokens['attention_mask']
generation = model.generate(
    input_ids=input_ids, attention_mask=attention_mask,
    max_length=max_length, num_return_sequences=1,
    pad_token_id=tokenizer.eos_token_id, do_sample=True,
)
generation

tensor([[15496,    11,   314,   716,   257,  3303,  2746,    11,   290,   314,
           716,  4609,   287,  4673,   517,   546,   262,  2842,   326,  8950,
           670,    11,   290,   703,   484,   389,   973,   287,   584,  8950]],
       device='cuda:0')

In [6]:
# decode the tokens into text
text = tokenizer.decode(generation[0], skip_special_tokens=True)
text

'Hello, I am a language model, a language design framework, and a language for language engineering, and also a language for analysis. Today, I'

Some important parameters of a language model:
1. model_max_length: Also called context length or block size. It is the maximum number of tokens the model can process in one sequence.
2. vocab_size: The number of tokens in the vocabulary. A larger vocabulary produces shorter token sequences for the same text, but the model needs more training data and has more parameters in its embedding layers.
3. hidden_size: The dimension of the token embeddings and of the hidden states passed between layers.
4. number of layers: The number of blocks (attention + feed-forward network) in the model. More layers give the model more capacity, at the cost of more parameters and more compute.

We can inspect these parameters by looking at the config.json file in the model directory.
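As a minimal sketch of what that lookup involves, the snippet below parses a short excerpt of the gpt2 config.json and maps its field names onto the four parameters listed above. Only a few fields are reproduced, and they are inlined as a string so the example runs offline. The helper name `summarize_config` and the `KEY_ALIASES` table are hypothetical, introduced here for illustration; the Llama-style key names are included on the assumption that both naming schemes may be encountered.

```python
import json

# Excerpt of the gpt2 config.json (a few fields only), inlined so this runs offline.
GPT2_CONFIG_EXCERPT = """
{
  "model_type": "gpt2",
  "n_positions": 1024,
  "n_embd": 768,
  "n_layer": 12,
  "n_head": 12,
  "vocab_size": 50257
}
"""

# Each parameter may appear under a GPT-2 style key or a Llama style key.
KEY_ALIASES = {
    "model_max_length": ("n_positions", "max_position_embeddings"),
    "vocab_size": ("vocab_size",),
    "hidden_size": ("n_embd", "hidden_size"),
    "num_layers": ("n_layer", "num_hidden_layers"),
}


def summarize_config(config: dict) -> dict:
    """Hypothetical helper: pick the four parameters out of a config dict."""
    summary = {}
    for name, aliases in KEY_ALIASES.items():
        # Use the first alias present in the config; None if no alias is found.
        summary[name] = next((config[k] for k in aliases if k in config), None)
    return summary


gpt2_summary = summarize_config(json.loads(GPT2_CONFIG_EXCERPT))
print(gpt2_summary)
```

With transformers installed, `AutoConfig.from_pretrained(model_type).to_dict()` should yield a dict that can be passed to the same helper instead of the inlined excerpt.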


Below is a simple comparison of gpt2 and llama3 (Meta-Llama-3-8B).

| | gpt2 | llama3 (8B) |
|----------|----------|----------|
| model_max_length | 1024 | 8192 |
| vocab_size | 50257 | 128256 |
| hidden_size | 768 | 4096 |
| number of layers | 12 | 32 |
| number of parameters | 124M | 8B |
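
The parameter count in the last row follows from the other rows. The sketch below reproduces the gpt2 figure under the assumptions of the GPT-2 architecture: learned positional embeddings, biases on every linear layer, a feed-forward width of 4 × hidden_size, and output weights tied to the token embeddings. The function name `gpt2_param_count` is hypothetical. The formula does not carry over to llama3, which uses a different block design.

```python
def gpt2_param_count(vocab_size: int, max_length: int, hidden: int, layers: int) -> int:
    """Count parameters of a GPT-2 style decoder with tied input/output embeddings."""
    token_emb = vocab_size * hidden
    pos_emb = max_length * hidden
    layer_norm = 2 * hidden  # scale and bias

    # Attention: fused Q/K/V projection plus output projection, each with bias.
    attention = (hidden * 3 * hidden + 3 * hidden) + (hidden * hidden + hidden)
    # Feed-forward network: expand to 4 * hidden, then project back, each with bias.
    ffn = (hidden * 4 * hidden + 4 * hidden) + (4 * hidden * hidden + hidden)

    # Each block has two layer norms, one attention module, and one feed-forward network.
    block = 2 * layer_norm + attention + ffn
    # The trailing layer_norm term is the final layer norm after the last block.
    return token_emb + pos_emb + layers * block + layer_norm


total = gpt2_param_count(vocab_size=50257, max_length=1024, hidden=768, layers=12)
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

The token embedding alone accounts for roughly 38.6M of the total, which is why vocab_size has a visible effect on the size of a small model.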

gpt2 config file:
https://huggingface.co/openai-community/gpt2/blob/main/config.json   
llama3 config file:
https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/config.json
