<a href="https://colab.research.google.com/github/angelaaaateng/AIR_AI_Engineering_Course_2024/blob/main/Day1/2_LMMs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Large Language Models

## "Next Word Prediction" in Python with LLMs like GPT-Neo

GPT-Neo is a family of open-source language models released by EleutherAI as a freely available alternative to models like OpenAI's GPT-3. Like GPT-3, it is a decoder-only Transformer trained to predict the next token, which lets it perform tasks such as text generation, summarization, and general language understanding. GPT-Neo comes in several sizes, named by parameter count (125M, 1.3B, and 2.7B), and can be used out of the box or fine-tuned for specific natural language processing tasks.

**Import Libraries:**

The code imports `torch` for working with tensors, plus `AutoTokenizer` and `GPTNeoForCausalLM` from Hugging Face's `transformers` library to load the pre-trained GPT-Neo tokenizer and model.

In [None]:
import torch
from transformers import AutoTokenizer, GPTNeoForCausalLM

**Loading the Tokenizer and Model:**

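The tokenizer and GPT-Neo model are loaded with the `from_pretrained` method, which downloads the `EleutherAI/gpt-neo-1.3B` checkpoint from the Hugging Face Hub on first use and caches it locally. This is the 1.3-billion-parameter version of GPT-Neo, and the weights alone are a download of roughly 5.3 GB. Evaluating `model` on the last line of the cell prints its architecture: a 50,257-token embedding table, 24 Transformer blocks (`GPTNeoBlock`), and a hidden size of 2048.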

In [None]:
# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")

model

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


GPTNeoForCausalLM(
  (transformer): GPTNeoModel(
    (wte): Embedding(50257, 2048)
    (wpe): Embedding(2048, 2048)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-23): 24 x GPTNeoBlock(
        (ln_1): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
        (attn): GPTNeoAttention(
          (attention): GPTNeoSelfAttention(
            (attn_dropout): Dropout(p=0.0, inplace=False)
            (resid_dropout): Dropout(p=0.0, inplace=False)
            (k_proj): Linear(in_features=2048, out_features=2048, bias=False)
            (v_proj): Linear(in_features=2048, out_features=2048, bias=False)
            (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
            (out_proj): Linear(in_features=2048, out_features=2048, bias=True)
          )
        )
        (ln_2): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
        (mlp): GPTNeoMLP(
          (c_fc): Linear(in_features=2048, out_features=8192, bias=True)
          (c_proj):
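The parameter count explains the large download. A back-of-the-envelope sketch, assuming each weight is stored as a 4-byte `float32` (2 bytes for `float16`/`bfloat16`) and ignoring activations, buffers, and framework overhead; the embedding shape is taken from the printout above, and the helper name is hypothetical:

```python
# Rough weight-memory estimate (a sketch): assumes every parameter is a dense float of
# the given width and ignores activations, buffers and framework overhead.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: int, dtype: str = "float32") -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Nominal (rounded) size of GPT-Neo 1.3B
nominal_params = 1_300_000_000

# Token-embedding table from the printout: Embedding(50257, 2048)
embedding_params = 50257 * 2048

print(f"float32 weights: ~{weight_memory_gb(nominal_params):.1f} GB")            # ~5.2 GB
print(f"float16 weights: ~{weight_memory_gb(nominal_params, 'float16'):.1f} GB")  # ~2.6 GB
print(f"token embeddings alone: {embedding_params:,} parameters")                 # 102,926,336
```

Loading in half precision, for example by passing `torch_dtype=torch.float16` to `from_pretrained`, roughly halves the weight memory, although half precision is mainly practical on a GPU.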

**Tokenizing Input Text:**

The input text "Language models are" is converted into token IDs, and `return_tensors="pt"` returns them as PyTorch tensors.

In [None]:
# Tokenize the input text (the result is a dict-like BatchEncoding holding `input_ids` and `attention_mask`)
input_ids = tokenizer("Language models are", return_tensors="pt")
input_ids

{'input_ids': tensor([[32065,  4981,   389]]), 'attention_mask': tensor([[1, 1, 1]])}

`input_ids`: the text "Language models are" is split into tokens, and each token is mapped to its integer ID, giving `tensor([[32065, 4981, 389]])`. Each ID is an index into the model's 50,257-entry vocabulary. A token can be a whole word or a subword piece, and GPT-Neo's byte-level BPE tokenizer folds a preceding space into the token, so " models" and " are" are stored with their leading space.

`attention_mask`: the tensor `[1, 1, 1]` marks all three tokens as real input that the model should attend to; a 0 would mark a padding position to be ignored. Here there is no padding.
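The mask only matters when a batch mixes sequences of different lengths. A minimal pure-Python sketch of what padding does, assuming right padding and reusing the end-of-text ID 50256 as the pad ID (GPT-Neo's tokenizer does not appear to define a dedicated pad token, which is why the generation log falls back to `eos_token_id`). The helper `pad_batch` is hypothetical:

```python
# Hypothetical helper: right-pad token-ID lists to a common length and build the
# matching attention mask (1 = real token, 0 = padding).
# The pad ID 50256 is an assumption, mirroring the eos-as-pad fallback in the generation log.
def pad_batch(sequences, pad_id=50256):
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# "Language models are" (3 tokens) batched with a shorter 2-token sequence
batch = pad_batch([[32065, 4981, 389], [32065, 4981]])
print(batch["input_ids"])       # [[32065, 4981, 389], [32065, 4981, 50256]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```

With the real tokenizer, the equivalent is `tokenizer(texts, padding=True, return_tensors="pt")` after setting `tokenizer.pad_token = tokenizer.eos_token`.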

**Generate Next Token:**

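`model.generate` extends the input sequence by exactly one token (`max_new_tokens=1`). With `output_scores=True` and `return_dict_in_generate=True`, it returns an output object whose `sequences` field holds the input IDs plus the new token and whose `scores` field holds the model's score for every vocabulary token at that generation step.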

In [None]:
# Generate the next token with scores
gen_tokens = model.generate(**input_ids, max_new_tokens=1,
                            output_scores=True, return_dict_in_generate=True)

gen_tokens

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


GenerateDecoderOnlyOutput(sequences=tensor([[32065,  4981,   389,   257]]), scores=(tensor([[-10.9917, -10.1963, -13.0822,  ..., -14.0905, -14.2715, -13.7649]]),), logits=None, attentions=None, hidden_states=None, past_key_values=((tensor([[[[-0.5316,  0.4638,  0.1836,  ...,  0.1358,  0.4674,  0.5552],
          [ 0.0431,  0.3145,  0.4239,  ..., -0.0848,  0.4906,  0.3449],
          [-0.2633,  0.0505, -0.1287,  ...,  0.1375,  0.6232,  0.6701]],

         [[-0.1071, -0.2553, -0.4357,  ...,  0.0933, -0.2020, -0.6138],
          [-0.0771,  0.3338, -0.7453,  ...,  0.0141, -0.4864, -0.2899],
          [-0.3422,  0.2439, -0.5358,  ...,  0.3610, -0.2869, -0.4788]],

         [[ 0.5849, -0.3117,  0.4096,  ..., -0.1905,  0.0388,  0.1751],
          [ 0.9367, -0.6178,  0.1473,  ...,  0.2834, -0.1385,  0.1037],
          [ 0.6936, -0.3523, -0.1253,  ..., -0.0921, -0.1441,  0.2856]],

         ...,

         [[-0.2950,  0.0706, -0.3925,  ..., -1.5442, -0.9863, -0.4138],
          [-0.3942,  0.1521
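No sampling arguments are passed, so `generate` runs greedy search here: it appends whichever token has the highest score. That is consistent with the appended ID 257 in `sequences` also being the top-ranked ID in the sorted scores further down. A toy sketch of one greedy step, using a made-up five-token vocabulary and a hypothetical function name:

```python
# Toy sketch of a single greedy decoding step (hypothetical name, made-up scores):
# pick the vocabulary index with the highest score and append it to the sequence.
def greedy_step(sequence, scores):
    next_id = max(range(len(scores)), key=lambda i: scores[i])
    return sequence + [next_id]


toy_scores = [-3.2, 0.7, -1.5, 2.1, -0.4]  # one score per token in a 5-token vocabulary
print(greedy_step([4, 1], toy_scores))  # [4, 1, 3]
```

Calling `generate` with a larger `max_new_tokens` repeats this step, feeding each newly chosen token back in as input.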

**Extract and Sort Scores:**

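`scores` is a tuple with one tensor per generated token. Since only one token was generated, `output_scores[0]` is a tensor of shape `(1, 50257)`: one score for every token in the vocabulary. The token IDs are then sorted by score in descending order and the top 20 are kept.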

In [None]:
# Extract the output scores
output_scores = gen_tokens["scores"]
scores_tensor = output_scores[0]

print(output_scores)
scores_tensor

(tensor([[-10.9917, -10.1963, -13.0822,  ..., -14.0905, -14.2715, -13.7649]]),)


tensor([[-10.9917, -10.1963, -13.0822,  ..., -14.0905, -14.2715, -13.7649]])

In [None]:
# Sort the tokens by their scores in descending order
sorted_indices = torch.argsort(scores_tensor[0], descending=True)[:20]
sorted_indices

tensor([  257,   973,   262,  1690,   281,  6768,   407,  6481, 27888,  1593,
          530,  5447,   852,  8811,   845,  6032,  3716,  3665,  3303, 29280])
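`torch.argsort` sorts all 50,257 scores just to keep 20 of them. `torch.topk(scores_tensor[0], k=20)` is a more direct alternative that returns both the top 20 values and their indices. The underlying idea as a plain-Python sketch, with made-up scores and a hypothetical helper name:

```python
import heapq


# Return the indices of the k highest scores, best first
# (same result as argsort(descending=True)[:k] when there are no ties).
def top_k_indices(scores, k):
    return heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])


toy_scores = [-3.2, 0.7, -1.5, 2.1, -0.4]
print(top_k_indices(toy_scores, 3))  # [3, 1, 4]
```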

**Displaying Results:**

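A loop goes through the top 20 token IDs, decodes each one back into text, and prints it with its score. A higher score means the model considers that token a more likely continuation, so " a", " used", and " the" are its top three guesses for the word after "Language models are". The scores are unnormalized logits: they rank the candidates but are not probabilities. The leading space in each decoded token is part of the token itself.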

In [None]:
# Loop through the top 20 token IDs and display each decoded token with its score
for token_id in sorted_indices:
    token_name = tokenizer.decode([token_id.item()])
    token_score = scores_tensor[0][token_id].item()
    print(f"Token: {token_name}, Score: {token_score}")

Token:  a, Score: -1.1022074222564697
Token:  used, Score: -1.4315834045410156
Token:  the, Score: -1.7675762176513672
Token:  often, Score: -1.841552972793579
Token:  an, Score: -2.465235471725464
Token:  widely, Score: -2.657839775085449
Token:  not, Score: -2.6726551055908203
Token:  increasingly, Score: -2.756849765777588
Token:  ubiquitous, Score: -2.868807792663574
Token:  important, Score: -2.902836322784424
Token:  one, Score: -2.908351182937622
Token:  defined, Score: -3.08156681060791
Token:  being, Score: -3.2117607593536377
Token:  commonly, Score: -3.311004161834717
Token:  very, Score: -3.3173515796661377
Token:  typically, Score: -3.4478561878204346
Token:  complex, Score: -3.521364212036133
Token:  powerful, Score: -3.533857822418213
Token:  language, Score: -3.550966739654541
Token:  pervasive, Score: -3.563505172729492
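Because the scores are unnormalized logits, they only rank the candidates. Applying softmax over the full vocabulary turns them into probabilities that sum to 1 without changing the ranking; in this notebook that would be `torch.softmax(scores_tensor[0], dim=-1)`. Applying it to just the 20 printed scores would instead renormalize over those 20 tokens only. A minimal pure-Python version of softmax, with made-up scores:

```python
import math


def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the result.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


toy_scores = [-3.2, 0.7, -1.5, 2.1, -0.4]
probs = softmax(toy_scores)
print([round(p, 3) for p in probs])
print(round(sum(probs), 6))  # 1.0
```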


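## Open-Source LLMs

T5 (Text-to-Text Transfer Transformer) is an open-source encoder-decoder model from Google that treats every task as text-to-text: the task is named in a plain-text prefix of the input, and the answer comes back as generated text. The cell below uses the small `t5-small` checkpoint to translate an English sentence into French.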

In [None]:
# Install the necessary libraries
!pip install transformers

# Import the T5 model and tokenizer
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the T5 model and tokenizer (T5-small for faster execution)
model_name = "t5-small"  # You can also use "t5-base" for a bigger model
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)

# Set the translation prompt (T5 expects prompts in the format 'translate English to French: ...')
prompt = "translate English to French: Hello, how are you?"

# Tokenize the input prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Generate the translation
output = model.generate(input_ids, max_length=50)

# Decode the output to text
response = tokenizer.decode(output[0], skip_special_tokens=True)

# Print the translated text
print(response)




You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


Bonjour, comment êtes-vous?


In [None]:
response

'Bonjour, comment êtes-vous?'
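Besides English-to-French translation, the original T5 checkpoints were also trained with prefixes such as "translate English to German:", "translate English to Romanian:", and "summarize:". A sketch with a hypothetical convenience helper (not part of `transformers`) that builds such prompts:

```python
# Hypothetical helper (not part of transformers): prepend a T5 task prefix to the input text.
# The prefixes are ones the original T5 checkpoints were trained with.
T5_PREFIXES = {
    "en-fr": "translate English to French: ",
    "en-de": "translate English to German: ",
    "en-ro": "translate English to Romanian: ",
    "summarize": "summarize: ",
}


def make_t5_prompt(task: str, text: str) -> str:
    """Return `text` prefixed with the T5 task prefix registered under `task`."""
    return T5_PREFIXES[task] + text


prompt_de = make_t5_prompt("en-de", "Hello, how are you?")
print(prompt_de)  # translate English to German: Hello, how are you?
```

Each prompt can then go through the same `tokenizer.encode`, `model.generate`, and `tokenizer.decode` steps as the translation cell above.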

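**Comparison with a Hosted Model:**

The same translation request can be sent to a proprietary model served through an API. The cell below calls OpenAI's `gpt-4o-mini` through the `openai` Python client, which requires an OpenAI API key.

In [None]: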
# Install the latest OpenAI package
!pip install openai

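# Import the OpenAI client library
from openai import OpenAI

# Set your API key here (never share a notebook containing a real key;
# alternatively, set the OPENAI_API_KEY environment variable and call OpenAI() with no arguments)
api_key = "XXXX API KEY"

client = OpenAI(api_key=api_key)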

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Translate the following from english to french: 'hello, how are you?' "
        }
    ]
)

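# Print the assistant's reply message object
print(completion.choices[0].message)

# `response` still holds the T5 translation from the earlier cell, shown here for comparison
response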

ChatCompletionMessage(content='The translation of "hello, how are you?" in French is "Bonjour, comment ça va ?"', refusal=None, role='assistant', function_call=None, tool_calls=None)


'Bonjour, comment êtes-vous?'
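The `messages` argument uses the Chat Completions format: a list of dicts, each with a `role` (`"system"`, `"user"`, or `"assistant"`) and `content`. The reply above wraps the translation in an explanatory sentence; asking explicitly for the translation alone may give cleaner output, and `completion.choices[0].message.content` extracts just the reply text. A sketch with a hypothetical helper that builds such a request:

```python
# Hypothetical helper (not part of the openai package): build a Chat Completions
# `messages` list for a translation request, mirroring the structure used in the cell above.
def build_translation_messages(text, source="English", target="French"):
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": (
                f"Translate the following from {source} to {target}. "
                f"Reply with the translation only: '{text}'"
            ),
        },
    ]


messages = build_translation_messages("hello, how are you?")
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed as `messages=messages` to `client.chat.completions.create(model="gpt-4o-mini", ...)`.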