# Understanding Basic Concepts of LLMs using Hugging Face

## Course Overview
This notebook introduces Large Language Models (LLMs) and how to work with them using the Hugging Face library. LLMs are the "brain" of agentic AI systems, providing reasoning and language understanding capabilities.

## Learning Objectives
- Understand what Large Language Models (LLMs) are
- Learn about the Hugging Face ecosystem
- Explore different types of LLM architectures
- Learn to load and use pre-trained models
- Understand tokenization and text generation
- Explore model parameters and inference


## What are Large Language Models (LLMs)?

**Large Language Models** are AI systems trained on vast amounts of text data to:
- Understand and generate human-like text
- Answer questions
- Perform language tasks (translation, summarization, etc.)
- Reason about complex problems

Key characteristics:
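- **Transformer architecture**: The underlying neural network design, built around self-attention
- **Pre-training**: Models learn general language patterns by predicting tokens in massive text corpora
- **Fine-tuning**: Pre-trained models can be adapted for specific tasks with additional, smaller datasets
- **Context window**: The maximum number of tokens (prompt plus generated text) a model can process at once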
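
The context window acts as a fixed budget shared by the prompt and the generated tokens. As a rough sketch (the helper `fit_to_context` is hypothetical and not part of any library; 1024 is GPT-2's limit), the following keeps the most recent tokens of an over-long prompt while reserving room for the reply:

```python
def fit_to_context(token_ids, context_window, reserve_for_output):
    """Trim a prompt so that prompt + generated tokens fit in the context window.

    Hypothetical helper for illustration: it keeps the most recent tokens,
    which is one common (but not the only) truncation strategy.
    """
    if reserve_for_output >= context_window:
        raise ValueError("reserve_for_output must be smaller than context_window")
    budget = context_window - reserve_for_output
    # Keep the last `budget` tokens and drop the oldest ones.
    return token_ids[-budget:]


prompt_ids = list(range(1500))  # stand-in for 1,500 token IDs
trimmed = fit_to_context(prompt_ids, context_window=1024, reserve_for_output=100)
print(len(trimmed))             # 924
print(trimmed[0], trimmed[-1])  # 576 1499
```

Dropping the oldest tokens suits chat-style prompts; for documents, summarizing or chunking the input may be a better choice.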


## Introduction to Hugging Face

**Hugging Face** is a platform and a set of open-source libraries that provide:
- Pre-trained models (Transformers library)
- Tokenizers for text processing
- Model Hub for sharing models
- Easy-to-use APIs for working with LLMs

Let's start by installing and importing the necessary libraries:


In [None]:
# Install required packages (uncomment if needed)
# !pip install transformers torch accelerate

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch
import warnings
warnings.filterwarnings('ignore')

print("Hugging Face Transformers library loaded successfully!")
print(f"PyTorch version: {torch.__version__}")


## Understanding Tokenization

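**Tokenization** is the process of converting text into tokens (words, subwords, or characters) and mapping each token to an integer ID that the model can process. Different models use different tokenizers, so the same text can produce different tokens depending on the model.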
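
To make the text → tokens → IDs → text round trip concrete, here is a toy subword tokenizer. It is only a sketch: the vocabulary is made up, and greedy longest-match is a simplification of the byte-level BPE that GPT-2 actually uses.

```python
import string

# Hypothetical toy vocabulary: a few subwords plus single letters as a fallback.
toy_vocab = ["token", "ization", "un", "believ", "able", " "] + list(string.ascii_lowercase)
toy_token_to_id = {piece: idx for idx, piece in enumerate(toy_vocab)}


def toy_tokenize(text):
    """Split text by repeatedly taking the longest vocabulary entry that matches."""
    tokens, i = [], 0
    while i < len(text):
        candidates = [piece for piece in toy_vocab if text.startswith(piece, i)]
        if not candidates:
            raise ValueError(f"No vocabulary entry covers {text[i]!r}")
        longest = max(candidates, key=len)
        tokens.append(longest)
        i += len(longest)
    return tokens


toy_text = "unbelievable tokenization"
toy_tokens = toy_tokenize(toy_text)
toy_ids = [toy_token_to_id[t] for t in toy_tokens]
toy_decoded = "".join(toy_vocab[i] for i in toy_ids)

print(toy_tokens)               # ['un', 'believ', 'able', ' ', 'token', 'ization']
print(toy_ids)                  # [2, 3, 4, 5, 0, 1]
print(toy_decoded == toy_text)  # True
```

Words outside the subword list still tokenize, just into more pieces (single letters), which is why rare words cost more tokens than common ones.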

Let's explore tokenization:


In [None]:
# Load a tokenizer (using GPT-2 as an example - lightweight model)
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example text
text = "Hello! How are you today? I'm learning about LLMs."

# Tokenize the text
tokens = tokenizer.tokenize(text)
token_ids = tokenizer.encode(text)

print("Original text:")
print(text)
print("\n" + "="*60)
print("Tokenized (words/subwords):")
print(tokens)
print("\n" + "="*60)
print("Token IDs (numerical representation):")
print(token_ids)
print("\n" + "="*60)
print(f"Number of tokens: {len(tokens)}")
print(f"Number of characters: {len(text)}")

# Decode back to text
decoded_text = tokenizer.decode(token_ids)
print("\nDecoded back to text:")
print(decoded_text)


## Loading Pre-trained Models

Hugging Face makes it easy to load pre-trained models. Let's load a small model for demonstration:


In [None]:
# Load a small, efficient model for demonstration
# Note: For production, you might use larger models like Llama, Mistral, etc.
model_name = "gpt2"

print(f"Loading model: {model_name}")
print("This may take a moment on first run (downloading model)...")

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Set pad token if not present
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print("\nModel loaded successfully!")
print(f"Model type: {type(model).__name__}")
print(f"Vocabulary size: {len(tokenizer)}")


## Text Generation with LLMs

Now let's generate text using the model:


In [None]:
def generate_text(prompt, max_length=50, temperature=0.7, top_k=50):
    """
    Generate text using the loaded model
    
    Parameters:
    - prompt: Input text to continue
    - max_length: Maximum total length in tokens (prompt + generated text)
    - temperature: Controls randomness (lower = more deterministic; must be > 0)
    - top_k: Limits sampling to the k most likely tokens
    """
    # Tokenize input; this returns both input_ids and attention_mask
    inputs = tokenizer(prompt, return_tensors="pt")
    
    # Generate (no gradients are needed for inference)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=max_length,
            temperature=temperature,
            top_k=top_k,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    
    # Decode output (the result includes the original prompt)
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text

# Example 1: Simple generation
prompt1 = "The future of artificial intelligence"
result1 = generate_text(prompt1, max_length=80)
print("Prompt:", prompt1)
print("Generated:", result1)
print("\n" + "="*60 + "\n")

# Example 2: Different temperature
prompt2 = "Once upon a time"
result2_low_temp = generate_text(prompt2, max_length=60, temperature=0.3)
result2_high_temp = generate_text(prompt2, max_length=60, temperature=1.2)

print("Prompt:", prompt2)
print("\nLow temperature (0.3) - more deterministic:")
print(result2_low_temp)
print("\nHigh temperature (1.2) - more creative:")
print(result2_high_temp)

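Under the hood, generation is a loop: predict a distribution over the next token, sample one, append it, and repeat until an end marker or a length limit. The sketch below mimics that loop with a made-up lookup table in place of a neural network. The table, the `<bos>`/`<eos>` markers, and the function name are all illustrative, and a real LLM conditions on the whole sequence rather than only the last token.

```python
import random

# Hypothetical "model": maps the last token to next-token probabilities.
NEXT_TOKEN_PROBS = {
    "<bos>": {"the": 1.0},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "<eos>": 0.3},
    "dog": {"sat": 0.5, "<eos>": 0.5},
    "sat": {"<eos>": 1.0},
}


def toy_generate(prompt_tokens, max_new_tokens, rng):
    """Autoregressive sampling: each new token is appended and fed back in."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        next_token = rng.choices(list(probs), weights=list(probs.values()))[0]
        if next_token == "<eos>":  # end-of-sequence marker stops generation
            break
        tokens.append(next_token)
    return tokens


rng = random.Random(0)  # seeded so that repeated runs give the same samples
for _ in range(3):
    print(toy_generate(["<bos>"], max_new_tokens=10, rng=rng))
```

Both stopping conditions appear in `model.generate` as well: the end-of-sequence token and the length limit.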

## Using Hugging Face Pipelines

Hugging Face provides convenient **pipelines** that simplify common tasks:


In [None]:
# Text generation pipeline
generator = pipeline("text-generation", model=model_name, tokenizer=model_name)

# Generate text
prompt = "Artificial intelligence will"
# do_sample=True enables sampling so that temperature takes effect
result = generator(prompt, max_length=50, num_return_sequences=1, do_sample=True, temperature=0.7)

print("Prompt:", prompt)
print("\nGenerated text:")
print(result[0]['generated_text'])


## Understanding Model Parameters

Key parameters that control LLM behavior:

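1. **Temperature**: Rescales the token probabilities before sampling (values near 0 are close to deterministic, 1.0 leaves the distribution unchanged, higher values are more random; with sampling enabled it must be greater than 0)
2. **Top-k**: Samples only from the k most likely tokens
3. **Top-p (nucleus sampling)**: Samples from the smallest set of most likely tokens whose cumulative probability reaches p
4. **Max length**: Maximum number of tokens in the output; in Transformers, `max_length` counts the prompt as well, while `max_new_tokens` counts only generated tokens
5. **Repetition penalty**: Lowers the scores of tokens that have already appeared, reducing repetition (1.0 means no penalty)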
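
The effect of temperature, top-k, and top-p is easiest to see on a tiny hand-made distribution. The following is a minimal pure-Python sketch with hypothetical scores for four candidate tokens. The Transformers implementation operates on logit tensors and differs in detail, so treat this as an illustration of the idea only.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize the rest to zero."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for four tokens
for t in (0.3, 1.0, 1.5):
    print(f"temperature={t}:", [round(x, 3) for x in softmax(toy_logits, t)])
print("top-k=2:  ", [round(x, 3) for x in top_k_filter(softmax(toy_logits), 2)])
print("top-p=0.9:", [round(x, 3) for x in top_p_filter(softmax(toy_logits), 0.9)])
```

Low temperatures concentrate probability on the top token, while top-k and top-p remove the unlikely tail before sampling.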

Let's see how these affect generation:


In [None]:
def compare_generation_parameters(prompt):
    """Compare different generation parameters"""
    print(f"Prompt: '{prompt}'\n")
    print("="*70)
    
    # Low temperature
    result1 = generate_text(prompt, max_length=40, temperature=0.1)
    print("Low Temperature (0.1) - Very deterministic:")
    print(result1)
    print()
    
    # Medium temperature
    result2 = generate_text(prompt, max_length=40, temperature=0.7)
    print("Medium Temperature (0.7) - Balanced:")
    print(result2)
    print()
    
    # High temperature
    result3 = generate_text(prompt, max_length=40, temperature=1.5)
    print("High Temperature (1.5) - Very creative:")
    print(result3)
    print()

# Test with a prompt
compare_generation_parameters("The secret to success is")

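The repetition penalty from the parameter list can be sketched in the same spirit. The rule below (divide positive scores, multiply negative ones) follows the scheme commonly associated with the `repetition_penalty` option. The function name and numbers are illustrative, and the library's exact behavior should be checked in its documentation.

```python
def apply_repetition_penalty(logits, seen_ids, penalty=1.2):
    """Lower the scores of tokens that already appear in the sequence."""
    adjusted = list(logits)
    for token_id in set(seen_ids):
        score = adjusted[token_id]
        # Dividing a positive score or multiplying a negative one both push it down.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


example_logits = [3.0, 1.0, -2.0, 0.5]
print(apply_repetition_penalty(example_logits, seen_ids=[0, 2], penalty=1.5))
# [2.0, 1.0, -3.0, 0.5]
```

A penalty of 1.0 leaves all scores unchanged; larger values discourage repeats more strongly.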

## Different LLM Architectures

Common LLM architectures available on Hugging Face:

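1. **GPT (Generative Pre-trained Transformer)**: Decoder-only autoregressive models that generate text one token at a time
2. **BERT**: Bidirectional encoder models (good for understanding tasks such as classification, not for free-form generation)
3. **T5**: Text-to-text transfer transformer, an encoder-decoder model that frames every task as text in, text out
4. **Llama**: Meta's family of open-weight, decoder-only LLMs
5. **Mistral**: Efficient open-weight, decoder-only models from Mistral AI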
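
One practical difference between these families is which positions each token may attend to. Decoder-only models (GPT, Llama, Mistral) use a causal mask, encoder models such as BERT attend in both directions, and T5 combines a bidirectional encoder with a causal decoder. A minimal sketch of the two mask shapes, with illustrative function names:

```python
def causal_mask(n):
    """mask[i][j] is 1 when position i may attend to position j (only j <= i)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every other position."""
    return [[1] * n for _ in range(n)]


print("Causal (GPT-style):")
for row in causal_mask(4):
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]

print("Bidirectional (BERT-style):")
for row in bidirectional_mask(4):
    print(row)
```

The causal mask is what allows a model to be trained on next-token prediction without seeing the answer in advance.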

Let's explore model information:


In [None]:
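# Get model configuration
# Note: n_positions, n_layer, n_embd and n_head are GPT-2's attribute names;
# other architectures (e.g. Llama) use names such as hidden_size and num_hidden_layers
config = model.config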

print("Model Configuration:")
print("="*60)
print(f"Model Type: {config.model_type}")
print(f"Vocab Size: {config.vocab_size}")
print(f"Max Position Embeddings: {config.n_positions}")
print(f"Number of Layers: {config.n_layer}")
print(f"Hidden Size: {config.n_embd}")
print(f"Number of Attention Heads: {config.n_head}")
print("="*60)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"\nTotal Parameters: {total_params:,}")
print(f"Trainable Parameters: {trainable_params:,}")
print(f"Model Size (approx): {total_params * 4 / 1024 / 1024:.2f} MB (FP32)")

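The total reported by `model.parameters()` can be reproduced by hand from the configuration. The sketch below assumes the standard GPT-2 small values (vocabulary 50,257, 1,024 positions, hidden size 768, 12 layers) and that the output layer shares its weights with the token embedding, so it is not counted twice.

```python
# GPT-2 (small) configuration values, assumed to match the loaded "gpt2" checkpoint.
vocab_size, n_positions, n_embd, n_layer = 50257, 1024, 768, 12


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


layer_norm = 2 * n_embd                                    # scale + shift
embeddings = vocab_size * n_embd + n_positions * n_embd    # token + position tables
attention = linear_params(n_embd, 3 * n_embd) + linear_params(n_embd, n_embd)  # QKV + output
mlp = linear_params(n_embd, 4 * n_embd) + linear_params(4 * n_embd, n_embd)
block = 2 * layer_norm + attention + mlp                   # one transformer layer

gpt2_total = embeddings + n_layer * block + layer_norm     # plus the final layer norm
print(f"Total parameters: {gpt2_total:,}")                 # 124,439,808

# Approximate memory for the weights alone at different precisions
for name, bytes_per_param in {"FP32": 4, "FP16": 2, "INT8": 1}.items():
    print(f"{name}: {gpt2_total * bytes_per_param / 1024**2:.2f} MB")
```

Most of the parameters sit in the 12 transformer blocks, and the token embedding table accounts for roughly a third of the rest of the total.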

## Using LLMs in Agentic AI

LLMs serve as the "reasoning engine" in agentic AI systems:

1. **Understanding**: Process and understand user queries
2. **Planning**: Break down complex tasks into steps
3. **Decision Making**: Choose which tools/actions to use
4. **Response Generation**: Create natural language responses
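
The decision-making step ultimately has to turn free-form model output into a concrete action. Below is a minimal sketch of that dispatch logic, with the model output replaced by hard-coded strings. All names (`choose_tool`, `run_agent_step`, the tools themselves) are hypothetical.

```python
def choose_tool(model_output, tools, default="search"):
    """Return the first known tool name mentioned in the model's raw output."""
    for word in model_output.lower().split():
        candidate = word.strip(".,:;!?\"'")
        if candidate in tools:
            return candidate
    return default  # the model named no valid tool


def run_agent_step(user_query, model_output, tools):
    """Pick a tool based on the model output and call it with the user query."""
    tool_name = choose_tool(model_output, tools)
    return tool_name, tools[tool_name](user_query)


# Hypothetical tools; real ones would call a search API, a calculator, etc.
demo_tools = {
    "search": lambda q: f"[search results for: {q}]",
    "calculate": lambda q: f"[calculation for: {q}]",
}

print(run_agent_step("What is 2 + 2?", "Calculate.", demo_tools))
print(run_agent_step("Capital of France?", "I am not sure", demo_tools))
```

Validating the suggestion against the known tool names, with a fallback, keeps a malformed model response from crashing the agent.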

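Here's a simple example of how an LLM can be used for agent reasoning. GPT-2 is a small base model without instruction tuning, so its suggestion will often not be a valid tool name; the example demonstrates the mechanics rather than reliable tool selection: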


In [None]:
def agent_reasoning(user_query, available_tools):
    """
    Simulate how an LLM might reason about which tool to use
    """
    # Create a prompt for the LLM
    tools_list = ", ".join(available_tools.keys())
    prompt = f"""User query: {user_query}
Available tools: {tools_list}

Which tool should be used? Respond with just the tool name.
Tool:"""
    
    # Generate response (at most 10 new tokens after the prompt)
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=10,
            temperature=0.3,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Extract tool name (simplified): first word after the last "Tool:".
    # The model may generate nothing usable, so fall back to "unknown".
    words = response.split("Tool:")[-1].split() if "Tool:" in response else []
    tool_suggestion = words[0] if words else "unknown"
    
    return tool_suggestion, response

# Example
available_tools = {
    "search": "Search the web",
    "calculate": "Perform calculations",
    "translate": "Translate text"
}

query = "What is the capital of France?"
tool, full_response = agent_reasoning(query, available_tools)

print(f"User Query: {query}")
print(f"\nSuggested Tool: {tool}")
print(f"\nFull LLM Response:\n{full_response}")


## Best Practices

1. **Model Selection**: Choose models appropriate for your task and resources
2. **Token Management**: Keep the prompt plus generated output within the model's context window, and remember that hosted APIs typically bill per token
3. **Parameter Tuning**: Experiment with temperature, top-k, top-p
4. **Prompt Engineering**: Well-crafted prompts improve results significantly
5. **Error Handling**: Always handle potential errors in model inference
6. **Resource Management**: Consider using smaller models for development
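
As one possible way to combine the token-management and error-handling points, the sketch below wraps a generation call with a length check and a fallback. The wrapper, its parameters, and the stand-in functions are all hypothetical; PyTorch reports many inference failures, including CUDA out-of-memory, as `RuntimeError`.

```python
def safe_generate(generate_fn, prompt, count_tokens, max_prompt_tokens,
                  fallback="[generation failed]"):
    """Call a generation function with a token-limit check and error handling."""
    n_tokens = count_tokens(prompt)
    if n_tokens > max_prompt_tokens:
        raise ValueError(f"Prompt has {n_tokens} tokens; limit is {max_prompt_tokens}")
    try:
        return generate_fn(prompt)
    except RuntimeError as exc:  # e.g. out-of-memory during inference
        print(f"Generation failed: {exc}")
        return fallback


def count_words(text):
    """Crude stand-in for a real tokenizer's token count."""
    return len(text.split())


def flaky_model(prompt):
    """Stand-in for a model call that fails."""
    raise RuntimeError("simulated out-of-memory")


print(safe_generate(str.upper, "hello world", count_words, max_prompt_tokens=10))
print(safe_generate(flaky_model, "hello world", count_words, max_prompt_tokens=10))
```

In the notebook, `generate_fn` could be `generate_text` and `count_tokens` could be `lambda text: len(tokenizer.encode(text))`.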

## Next Steps

In the next notebook, we'll use LangChain to build more sophisticated AI agents that combine LLMs with tools and memory systems.


## Exercises

1. **Experiment with different models**: Try loading different models from Hugging Face Hub
2. **Parameter tuning**: Experiment with different temperature and top-k values
3. **Custom prompt engineering**: Create prompts for specific tasks (summarization, Q&A)
4. **Token analysis**: Analyze how different texts are tokenized
5. **Build a simple Q&A system**: Use an LLM to answer questions

Try these exercises to deepen your understanding of LLMs!
