# Testing the Custom LLM Model

This notebook demonstrates how to use the trained custom LLM for text generation. We'll explore:

1. Loading a pretrained model checkpoint
2. Basic text generation with default parameters
3. The effect of temperature on the output
4. Top-k and nucleus (top-p) sampling
5. Batch generation for multiple prompts

In [3]:
import sys
sys.path.append('..')  # add the parent directory so that `src` can be imported

import torch
from transformers import AutoTokenizer
from src.inference.inference_pipeline import InferencePipeline

# Set random seed for reproducibility
torch.manual_seed(42)

<torch._C.Generator at 0x1ead3ce1c30>

## Load Model and Tokenizer

First, we'll load a pretrained tokenizer and our trained model checkpoint.

In [9]:
# Initialize tokenizer (update with your preferred base model)
tokenizer = AutoTokenizer.from_pretrained('gpt2')  # Example using GPT-2 tokenizer

# Load the model checkpoint
model_path = '../checkpoints/model.pt'  # Update with your model path
pipeline = InferencePipeline.from_pretrained(
    model_path=model_path,
    tokenizer=tokenizer,
    model_config={  # a small configuration
        'd_model': 512,   # model (embedding) dimension
        'n_heads': 8,     # number of attention heads
        'n_layers': 4,    # number of layers
        'd_ff': 2048,     # feed-forward dimension
        'dropout': 0.1    # dropout probability
    }
)

KeyError: 'model_state_dict'
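
The `KeyError` above means that a lookup for `'model_state_dict'` failed while loading the checkpoint. One plausible cause (an assumption, since the saving code is not shown here) is that the file holds a bare state dict saved with `torch.save(model.state_dict(), path)` rather than a dictionary that wraps the weights under that key. The sketch below shows a tolerant lookup on an already loaded checkpoint; `extract_state_dict` and the alternative key names are hypothetical helpers for illustration, not part of `InferencePipeline`.

```python
from typing import Any, Mapping

# Hypothetical key names under which training scripts often store weights.
CANDIDATE_KEYS = ("model_state_dict", "state_dict", "model")


def extract_state_dict(checkpoint: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return the weight mapping from a loaded checkpoint.

    Handles two layouts: a wrapper dict holding the weights under one of
    CANDIDATE_KEYS, or a bare state dict whose keys are parameter names.
    """
    for key in CANDIDATE_KEYS:
        value = checkpoint.get(key)
        if isinstance(value, Mapping):
            return value
    # No wrapper key found: assume the checkpoint itself is the state dict.
    return checkpoint


# Toy stand-ins for what torch.load(model_path, map_location="cpu") might return.
wrapped = {"model_state_dict": {"embed.weight": [0.1, 0.2]}, "epoch": 3}
bare = {"embed.weight": [0.1, 0.2]}

print(list(extract_state_dict(wrapped)))
print(list(extract_state_dict(bare)))
```

Printing the top-level keys of the loaded checkpoint (for example `list(checkpoint.keys())[:10]`) is usually the quickest way to see which layout a given file uses before deciding how to load it.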

## Basic Text Generation

Let's try generating text with default parameters.

In [None]:
prompt = "The artificial intelligence revolution will"

# Generate with default parameters
generated_text = pipeline.generate(
    prompt=prompt,
    max_length=100,
    num_return_sequences=1
)

print(f"Prompt: {prompt}")
print(f"Generated: {generated_text[0]}")

## Exploring Temperature Effects

Now let's see how different temperature values affect the output. A temperature above 1.0 flattens the next-token distribution, which makes the output more random; a temperature below 1.0 sharpens it, which makes the output more focused and closer to deterministic.

In [None]:
prompt = "In the year 2050, robots will"

# Try different temperatures
temperatures = [0.5, 1.0, 1.5]

for temp in temperatures:
    print(f"Temperature: {temp}")
    generated = pipeline.generate(
        prompt=prompt,
        max_length=100,
        temperature=temp,
        num_return_sequences=1
    )
    print(f"Generated: {generated[0]}")
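
To see why temperature has this effect, it helps to look at the usual formulation: the logits are divided by the temperature before the softmax. The sketch below uses only the standard library and made-up logits; `softmax_with_temperature` is an illustrative helper, and whether `pipeline.generate` applies temperature in exactly this way is an assumption.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for temp in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, temp)
    print(temp, [round(p, 3) for p in probs])
```

As the temperature rises, the probability of the top-scoring token shrinks and the others grow, so sampling picks less likely tokens more often.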

## Top-k and Top-p Sampling

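Let's experiment with two sampling strategies. Top-k sampling restricts each step to the `k` most probable tokens, while nucleus (top-p) sampling restricts it to the smallest set of tokens whose cumulative probability reaches `p`.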

In [None]:
prompt = "The future of space exploration"

# Different sampling configurations
configs = [
    {'top_k': 50, 'top_p': 0.9},
    {'top_k': 10, 'top_p': 0.9},
    {'top_k': 50, 'top_p': 0.5}
]

for config in configs:
    print(f"Config: {config}")
    generated = pipeline.generate(
        prompt=prompt,
        max_length=100,
        temperature=1.0,
        **config
    )
    print(f"Generated: {generated[0]}")
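
The two filters can be sketched on a toy distribution. This is a minimal illustration under stated assumptions: the probabilities are made up, `top_k_top_p_filter` is a hypothetical helper, and real implementations (including whatever `pipeline.generate` does internally) may differ in details such as how ties or renormalization are handled.

```python
def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    """Keep only the tokens allowed by top-k and nucleus (top-p) filtering.

    probs maps token -> probability. Returns a renormalized dict.
    A top_k of 0 disables the top-k step.
    """
    # Sort tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # top-k: keep the k most probable tokens

    # Renormalize over the survivors before applying the top-p cutoff.
    mass = sum(p for _, p in ranked)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p / mass
        if cumulative >= top_p:  # top-p: stop once the nucleus covers top_p
            break

    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Made-up next-token distribution for illustration.
probs = {"the": 0.5, "a": 0.25, "rockets": 0.125, "moon": 0.0625, "zebra": 0.0625}
print(top_k_top_p_filter(probs, top_k=3, top_p=1.0))
print(top_k_top_p_filter(probs, top_k=0, top_p=0.7))
```

A smaller `top_k` or `top_p` leaves fewer candidate tokens, which tends to make the output safer but more repetitive.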

## Batch Generation

Finally, let's generate completions for several prompts in batch mode.

In [None]:
prompts = [
    "The key to successful AI development is",
    "In the next decade, quantum computing will",
    "The relationship between humans and AI will"
]

# Generate multiple sequences in batch
batch_generated = pipeline.batch_generate(
    prompts=prompts,
    batch_size=2,
    max_length=100,
    temperature=0.8,
    top_k=40,
    top_p=0.9
)

for prompt, generated in zip(prompts, batch_generated):
    print(f"Prompt: {prompt}")
    print(f"Generated: {generated}")
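
With three prompts and `batch_size=2`, a batched generator would typically process the prompts in two chunks. The sketch below shows that chunking step only; `chunk_prompts` is a hypothetical helper, and it is an assumption that `batch_generate` splits its input this way.

```python
def chunk_prompts(prompts, batch_size):
    """Yield consecutive slices of at most batch_size prompts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(prompts), batch_size):
        yield prompts[start:start + batch_size]


prompts = [
    "The key to successful AI development is",
    "In the next decade, quantum computing will",
    "The relationship between humans and AI will",
]

for i, batch in enumerate(chunk_prompts(prompts, batch_size=2)):
    print(f"Batch {i}: {len(batch)} prompt(s)")
# Prints:
# Batch 0: 2 prompt(s)
# Batch 1: 1 prompt(s)
```

The last chunk can be smaller than `batch_size`, so any code that consumes the batches should not assume a fixed size.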