## Ensemble learning on transformer based models

In [2]:
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
import torch.nn.functional as F
import torch

### Authenticate HuggingFace

In [3]:
from huggingface_hub import login

# Replace 'YOUR_TOKEN' with your actual Hugging Face token
login(token="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")

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /root/.cache/huggingface/token
Login successful


### model selection

Hugging Face Transformers provides access to a wide variety of pre-trained models for Natural Language Processing (NLP) tasks like text generation, classification, translation, and more. These models are built using different architectures, including:

- **BERT (Bidirectional Encoder Representations from Transformers)**: A pre-trained model designed for understanding the context in text by looking at both directions (left and right). Used for tasks like classification and question answering.

- **GPT (Generative Pre-trained Transformer)**: Focused on text generation, GPT models predict the next word in a sequence, making them ideal for tasks like conversation and text completion.

- **T5 (Text-to-Text Transfer Transformer)**: A versatile model that converts all NLP tasks into a text-to-text format, applicable for translation, summarization, and more.

- **RoBERTa (A Robustly Optimized BERT Pretraining Approach)**: An optimized version of BERT with better training techniques for improved performance on various NLP tasks.

Each model can be fine-tuned for specific use cases or used directly in applications, and they come with easy integration through the Hugging Face transformers library.

In [4]:
model_name_one = "microsoft/Phi-3-mini-4k-instruct"
model_name_two = "meta-llama/Llama-3.2-3B-Instruct"

### Model Selection and Setup

model_name

**Purpose**: The model_name variable is set to the string "microsoft/Phi-3-mini-4k-instruct". This is the identifier for the pretrained model you're loading from Hugging Face's model hub.

**Model**: The microsoft/Phi-3-mini-4k-instruct is a specific language model developed by Microsoft, optimized for instruction following tasks.

tokenizer = AutoTokenizer.from_pretrained(...)

**Purpose**: This line loads the tokenizer associated with the microsoft/Phi-3-mini-4k-instruct model.

**How it works**:
- The tokenizer is responsible for converting input text (e.g., natural language) into tokens, which are numerical representations that the model understands.
- The from_pretrained() method fetches the pretrained tokenizer (if not already cached locally) using the specified model name.
- trust_remote_code=True allows the model to load custom code that might be required for special tokenization logic.

model = AutoModelForCausalLM.from_pretrained(...)

**Purpose**: This line loads the pretrained model itself.

**How it works**:
- AutoModelForCausalLM loads a causal language model, meaning it is designed to generate text in an autoregressive fashion, where each token depends on the previously generated tokens.
- from_pretrained() fetches the model weights and configuration from Hugging Face’s model hub (or from the local cache if it's already downloaded).
- trust_remote_code=True allows loading any custom implementation required by this specific model.

In [5]:
weight = 1.0

# tokenizer = AutoTokenizer.from_pretrained(model_name_two, trust_remote_code=True)
# model = AutoModelForCausalLM.from_pretrained(model_name_two, trust_remote_code=True)

tokenizer_one = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
model_one = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

### Setup the test prompt

In [6]:
prompt = "What is the capital of France?"
prompt_tokenized = tokenizer(prompt, return_tensors="pt")
prompt_tokenized_phi = tokenizer_one(prompt, return_tensors="pt")

In [23]:
prompt_tokenized

{'input_ids': tensor([[128000,   3923,    374,    279,   6864,    315,   9822,     30]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]])}

In [7]:
prompt_tokenized_phi

{'input_ids': tensor([[ 1724,   338,   278,  7483,   310,  3444, 29973]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1]])}

In [25]:
# https://huggingface.co/docs/transformers/main/en/model_doc/llama#transformers.LlamaForCausalLM
generate_ids = model.generate(prompt_tokenized.input_ids, max_length=30)
tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

#

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


'What is the capital of France? Paris\nWhat is the capital of China? Beijing\nWhat is the capital of Japan? Tokyo\nWhat is'

In [8]:
test_param = "alhabetically"
w1 = tokenizer(test_param, return_tensors="pt")
w2 = tokenizer_one(test_param, return_tensors="pt")

In [9]:
w1, w2

({'input_ids': tensor([[128000,    278,     71,  10448,   2740]]), 'attention_mask': tensor([[1, 1, 1, 1, 1]])},
 {'input_ids': tensor([[ 394, 7308,  300, 1711]]), 'attention_mask': tensor([[1, 1, 1, 1]])})

### Get the top-k tokens

In [18]:
# Function to get top-k tokens and probabilities
def get_top_k(model, prompt: str, k: int = 10):
    # Tokenize the prompt
    inputs = tokenizer(prompt, return_tensors="pt")

    # Get model outputs
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits

    # Convert logits to probabilities
    probabilities = F.softmax(logits, dim=-1)

    # Get the probabilities for the last token in the sequence
    last_token_probabilities = probabilities[0, -1, :]

    # Get the top-k token indices and their corresponding probabilities
    top_k_indices = last_token_probabilities.argsort()[-k:][::-1]
    top_k_probs = last_token_probabilities[top_k_indices].cpu().numpy()
    top_k_tokens = tokenizer.convert_ids_to_tokens(top_k_indices)

    return list(zip(top_k_tokens, top_k_probs))

In [14]:
"""
Get the top-k token probabilities from the model output.

:param k: The number of top tokens to retrieve.
:return: A list of tuples containing the top-k tokens and their probabilities.
"""
def get_top_k_two(model, prompt, k: int = 10):
    # Get the model outputs
    with torch.no_grad():
        outputs = model(prompt)
        logits = outputs.logits

    probabilities = F.softmax(logits, dim=-1)           # Convert logits to probabilities
    last_token_probabilities = probabilities[0, -1, :]  # Get the probabilities for the last token

    # Convert probabilities to a more readable format
    probs = last_token_probabilities.cpu().numpy()

    # Get the top 10 probabilities
    top_k = 10
    top_k_indices = probs.argsort()[-top_k:][::-1]
    top_k_probs = probs[top_k_indices]
    top_k_tokens = tokenizer.convert_ids_to_tokens(top_k_indices)

    top_k_indices = last_token_probabilities.argsort()[-k:][::-1]
    top_k_probs = last_token_probabilities[top_k_indices]
    top_k_tokens = tokenizer.convert_ids_to_tokens(top_k_indices)

    return list(zip(top_k_tokens, top_k_probs))

In [17]:
# Get the model outputs
with torch.no_grad():
  outputs = model(prompt)
  logits = outputs.logits

probabilities = F.softmax(logits, dim=-1)           # Convert logits to probabilities
last_token_probabilities = probabilities[0, -1, :]  # Get the probabilities for the last token

print(probabilities)

# Convert probabilities to a more readable format
probs = last_token_probabilities.cpu().numpy()

# Get the top 10 probabilities
top_k = 10
top_k_indices = probs.argsort()[-top_k:][::-1]
top_k_probs = probs[top_k_indices]
top_k_tokens = tokenizer.convert_ids_to_tokens(top_k_indices)
top_k_indices = last_token_probabilities.argsort()[-k:][::-1]
top_k_probs = last_token_probabilities[top_k_indices]
top_k_tokens = tokenizer.convert_ids_to_tokens(top_k_indices)

list_k_props = list(zip(top_k_tokens, top_k_probs))

TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not str

### Get Computed Probabilities

In [16]:
top_k = get_top_k_two(model, prompt_tokenized)
print(f"Token: {top_k.token}, Probability: {top_k.prob:.4f}")

TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not BatchEncoding

### Ensemble Result for Simple token averaging