## Ensemble learning on transformer-based models

In [1]:
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
import torch.nn.functional as F
import torch

# Type hints
from typing import List, Tuple

### Authenticate with Hugging Face

In [None]:
from huggingface_hub import login

# Replace 'YOUR_TOKEN' with your actual Hugging Face token
login(token="YOUR_TOKEN")

### Model selection

Hugging Face Transformers provides access to a wide variety of pre-trained models for natural language processing (NLP) tasks such as text generation, classification, and translation. These models are built on several transformer architectures, including:

- **BERT (Bidirectional Encoder Representations from Transformers)**: An encoder-only model pre-trained to represent each token using context from both directions (left and right). Used for understanding tasks such as classification and question answering.

- **GPT (Generative Pre-trained Transformer)**: A decoder-only model focused on text generation. GPT models predict the next token in a sequence from the tokens before it, which makes them well suited to conversation and text completion.

- **T5 (Text-to-Text Transfer Transformer)**: An encoder-decoder model that casts every NLP task as text-to-text (text in, text out), so the same model can handle translation, summarization, and more.

- **RoBERTa (A Robustly Optimized BERT Pretraining Approach)**: Keeps BERT's architecture but improves its pre-training recipe (more data, longer training with larger batches, dynamic masking, and no next-sentence-prediction objective), which improves performance on many NLP tasks.

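Each model can be fine-tuned for a specific use case or used directly, and all of them are loaded through the same `from_pretrained()` interface in the Hugging Face `transformers` library. The model used in this notebook, Phi-3-mini, is a decoder-only (GPT-style) model, which is why it is loaded with `AutoModelForCausalLM`.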
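The practical difference between the BERT-style and GPT-style bullets above shows up in the attention mask. The plain-Python sketch below (the function names are illustrative, not part of `transformers`) builds both masks for a sequence of length `n`, where `1` at row `i`, column `j` means position `i` may attend to position `j`:

```python
from typing import List


def bidirectional_mask(n: int) -> List[List[int]]:
    """Encoder-style (e.g. BERT) mask: every position can see every other position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n: int) -> List[List[int]]:
    """Decoder-style (e.g. GPT, Phi-3) mask: position i sees only positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for row in causal_mask(4):
        print(row)
    # [1, 0, 0, 0]
    # [1, 1, 0, 0]
    # [1, 1, 1, 0]
    # [1, 1, 1, 1]
```

The lower-triangular causal mask is what lets a decoder-only model be trained to predict each next token without seeing the future.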

In [2]:
model_name = "microsoft/Phi-3-mini-4k-instruct"

### Model Selection and Setup

#### `model_name`

**Purpose**: The `model_name` variable is set to the string `"microsoft/Phi-3-mini-4k-instruct"`, the repository identifier of the pretrained model on the Hugging Face Hub.

**Model**: `microsoft/Phi-3-mini-4k-instruct` is a small (about 3.8 billion parameters) language model from Microsoft, instruction-tuned for following prompts, with a 4K-token context window.

#### `tokenizer = AutoTokenizer.from_pretrained(...)`

**Purpose**: This line loads the tokenizer associated with the `microsoft/Phi-3-mini-4k-instruct` model.

**How it works**:
- The tokenizer splits input text into tokens (typically subword pieces) and maps each token to an integer ID, which is the numerical form the model consumes.
- `from_pretrained()` downloads the tokenizer files for the given model name, or loads them from the local cache if they are already downloaded.
- `trust_remote_code=True` allows `transformers` to execute custom Python code stored in the model repository, in case the model needs tokenization logic that is not built into the library. Enable it only for repositories you trust.
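To make "text to integer IDs" concrete, here is a toy greedy longest-match tokenizer over a tiny hypothetical vocabulary. It only illustrates the idea; Phi-3's real tokenizer uses a learned subword vocabulary and different matching rules, and `VOCAB`, `toy_tokenize`, and `encode` are made up for this sketch.

```python
from typing import Dict, List

# Hypothetical toy vocabulary (NOT Phi-3's real vocabulary)
VOCAB: Dict[str, int] = {
    "<unk>": 0, "What": 1, " is": 2, " the": 3, " capital": 4,
    " of": 5, " Fr": 6, "ance": 7, "?": 8,
}


def toy_tokenize(text: str, vocab: Dict[str, int]) -> List[str]:
    """Greedy longest-match: repeatedly take the longest vocab entry that prefixes the remaining text."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        match = None
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in vocab:
                match = text[i:j]
                break
        if match is None:  # nothing matches: emit <unk> for one character
            tokens.append("<unk>")
            i += 1
        else:
            tokens.append(match)
            i += len(match)
    return tokens


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Map each token to its integer ID, which is what the model actually consumes."""
    return [vocab[t] for t in toy_tokenize(text, vocab)]


if __name__ == "__main__":
    print(toy_tokenize("What is the capital of France?", VOCAB))
    print(encode("What is the capital of France?", VOCAB))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Note that " France" is not in the toy vocabulary, so it is split into the subwords " Fr" and "ance", which is the same kind of splitting a real subword tokenizer performs on rarer words.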

#### `model = AutoModelForCausalLM.from_pretrained(...)`

**Purpose**: This line loads the pretrained model itself.

**How it works**:
- `AutoModelForCausalLM` loads a causal language model, meaning it generates text autoregressively: each new token is predicted from all the tokens that precede it.
- `from_pretrained()` fetches the model weights and configuration from the Hugging Face Hub (or from the local cache if they are already downloaded).
- `trust_remote_code=True` allows loading any custom model implementation shipped with this specific model's repository.
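The autoregressive loop described above can be sketched independently of any particular model. In this minimal sketch, `next_token_probs` is a hypothetical stand-in for the model's forward pass (in this notebook that role would be played by `model(**inputs).logits[0, -1]` followed by a softmax), and `toy_model` is an invented lookup table. The loop performs greedy decoding: it appends the most likely token and feeds the extended sequence back in.

```python
from typing import Callable, Dict, List

# Maps the tokens so far to a probability for each candidate next token.
NextTokenFn = Callable[[List[str]], Dict[str, float]]


def greedy_generate(prefix: List[str], next_token_probs: NextTokenFn,
                    max_new_tokens: int, eos: str = "</s>") -> List[str]:
    """Greedy autoregressive decoding: each new token is conditioned on all previous ones."""
    tokens = list(prefix)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)   # one "forward pass" over the whole sequence so far
        best = max(probs, key=probs.get)   # pick the most likely next token
        tokens.append(best)                # it becomes part of the input for the next step
        if best == eos:
            break
    return tokens


def toy_model(tokens: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a causal LM; this toy only looks at the last token."""
    table = {
        "capital": {"is": 0.9, "city": 0.1},
        "is": {"Paris": 0.8, "Lyon": 0.2},
        "Paris": {"</s>": 0.95, ".": 0.05},
    }
    return table.get(tokens[-1], {"</s>": 1.0})


if __name__ == "__main__":
    print(greedy_generate(["The", "capital"], toy_model, max_new_tokens=10))
    # ['The', 'capital', 'is', 'Paris', '</s>']
```

Greedy decoding is only one choice; sampling or beam search replace the `max` step but keep the same feed-back loop.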

In [3]:
weight = 1.0
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
Current `flash-attention` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.


### Set up the test prompt

In [4]:
prompt = "What is the capital of France?"
prompt_tokenized = tokenizer(prompt, return_tensors="pt")

### Get the top-k tokens

In [None]:
# Function to get the top-k next tokens and their probabilities
def get_top_k(prompt: str, k: int = 10) -> List[Tuple[str, float]]:
    # Tokenize the prompt and move it to the model's device
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Get model outputs (no gradients needed for inference)
    with torch.no_grad():
        outputs = model(**inputs)

    # Logits at the last position predict the next token; shape (vocab_size,)
    last_token_logits = outputs.logits[0, -1, :]

    # Convert logits to probabilities
    last_token_probabilities = F.softmax(last_token_logits, dim=-1)

    # torch.topk returns values and indices sorted in descending order
    # (torch tensors do not support negative-step slicing such as [::-1])
    top_k_probs, top_k_indices = torch.topk(last_token_probabilities, k)
    top_k_tokens = tokenizer.convert_ids_to_tokens(top_k_indices.tolist())

    return list(zip(top_k_tokens, top_k_probs.tolist()))
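The softmax-then-top-k step can be checked by hand on a tiny logits vector. This plain-Python sketch (function names and the 4-token vocabulary are illustrative) mirrors what `F.softmax` and `torch.topk` do for a single position. Subtracting the largest logit before exponentiating is the standard trick for numerical stability and does not change the resulting probabilities.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: exp(x - max) / sum(exp(x - max))."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs: List[float], vocab: List[str], k: int) -> List[Tuple[str, float]]:
    """Return the k (token, probability) pairs with the highest probability, in descending order."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(vocab[i], probs[i]) for i in order]


# Toy example with a hypothetical 4-token vocabulary
vocab = ["Paris", "Lyon", "London", "the"]
logits = [4.0, 2.0, 1.0, 0.0]
for token, p in top_k(softmax(logits), vocab, k=2):
    print(f"Token: {token}, Probability: {p:.4f}")
```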

In [6]:
"""
Get the top-k token probabilities from the model output.

:param k: The number of top tokens to retrieve.
:return: A list of tuples containing the top-k tokens and their probabilities.
"""
def get_top_k(prompt, k: int = 10) -> List[(List[str], Tensor)]:
    # Get the model outputs
    with torch.no_grad():
        outputs = model(prompt)
        logits = outputs.logits

    probabilities = F.softmax(logits, dim=-1)           # Convert logits to probabilities
    last_token_probabilities = probabilities[0, -1, :]  # Get the probabilities for the last token

    # Convert probabilities to a more readable format
    probs = last_token_probabilities.cpu().numpy()

    # Get the top 10 probabilities
    top_k = 10
    top_k_indices = probs.argsort()[-top_k:][::-1]
    top_k_probs = probs[top_k_indices]
    top_k_tokens = tokenizer.convert_ids_to_tokens(top_k_indices)

    top_k_indices = last_token_probabilities.argsort()[-k:][::-1]
    top_k_probs = last_token_probabilities[top_k_indices]
    top_k_tokens = tokenizer.convert_ids_to_tokens(top_k_indices)

    return list(zip(top_k_tokens, top_k_probs))

TypeError: Too many arguments for typing.List; actual 2, expected 1

### Get Computed Probabilities

In [None]:
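# get_top_k is a plain function (not a model method) and returns (token, probability) tuples
top_k = get_top_k(prompt_tokenized)
for token, prob in top_k:
    print(f"Token: {token}, Probability: {prob:.4f}")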

### Ensemble Result for Simple Token Averaging