# 🤖 LLM Basics - Interactive Tutorial

This notebook provides hands-on exploration of Large Language Models.

## What You'll Learn
1. How tokenization turns text into model input
2. Prompt engineering basics
3. Model parameters that control generation
4. Calling the OpenAI API (optional)
5. Comparing context windows across models

**Prerequisites**: Python 3 with the packages installed in the next cell. An OpenAI API key is optional and only needed for the live API example.

In [None]:
# Install required packages (uncomment if needed)
# !pip install openai transformers torch pandas matplotlib

In [None]:
import os
from transformers import AutoTokenizer
import warnings
warnings.filterwarnings('ignore')

print("✅ Libraries imported successfully")

## 1. Understanding Tokenization

Tokenization splits text into tokens (words, subwords, or characters) and maps each token to an integer ID. The model takes these IDs, not the raw text, as input.
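To see the idea without downloading a model, here is a minimal sketch of subword tokenization using greedy longest-match lookup. The vocabulary `TOY_VOCAB` and the helpers `toy_tokenize` and `toy_encode` are hypothetical and exist only for illustration. GPT-2's real tokenizer uses byte-pair encoding with a much larger learned vocabulary, so its splits will differ.

```python
# Hypothetical toy vocabulary: token string -> integer ID (not GPT-2's real vocabulary).
TOY_VOCAB = {
    "token": 0, "ization": 1, "izer": 2, "s": 3, " ": 4,
    "un": 5, "break": 6, "able": 7, "<unk>": 8,
}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text by repeatedly taking the longest vocabulary entry that matches."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab:
                tokens.append(piece)
                i = end
                break
        else:
            # No entry matched: emit the unknown token and skip one character.
            tokens.append("<unk>")
            i += 1
    return tokens


def toy_encode(text, vocab=TOY_VOCAB):
    """Map each token to its integer ID."""
    return [vocab[t] for t in toy_tokenize(text, vocab)]


print(toy_tokenize("tokenization"))          # ['token', 'ization']
print(toy_encode("unbreakable tokenizers"))  # [5, 6, 7, 4, 0, 2, 3]
```

Note that the space is its own token here. Real tokenizers usually fold whitespace into neighbouring tokens, which is one reason token counts rarely match word counts.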

In [None]:
# Load a tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Example text
text = "Large Language Models are transforming AI!"

# Tokenize
tokens = tokenizer.tokenize(text)
token_ids = tokenizer.encode(text)

print(f"Original text: {text}")
print(f"\nTokens: {tokens}")
print(f"Number of tokens: {len(tokens)}")
print(f"\nToken IDs: {token_ids}")

### Try it yourself!
Modify the text below and see how it tokenizes:

In [None]:
your_text = "Write your text here..."
your_tokens = tokenizer.tokenize(your_text)
print(f"Your text: {your_text}")
print(f"Tokens: {your_tokens}")
print(f"Token count: {len(your_tokens)}")

## 2. Prompt Engineering Basics

The way you phrase prompts significantly affects LLM outputs.

In [None]:
# Example prompts - from vague to specific
prompts = [
    "Explain AI",
    "Explain AI in simple terms",
    "Explain AI to a 10-year-old using an analogy",
    "You are a teacher. Explain AI to a 10-year-old student using a cooking analogy. Keep it under 3 sentences."
]

print("Prompt Engineering Examples:\n")
for i, prompt in enumerate(prompts, 1):
    print(f"{i}. {prompt}")
    print(f"   Specificity: {'⭐' * i}\n")

### Key Prompt Engineering Principles

1. **Be Specific**: Clear instructions yield better results
2. **Provide Context**: Set the role, tone, and format
3. **Use Examples**: Few-shot prompting (showing a few worked input/output pairs) often improves accuracy
4. **Iterate**: Refine prompts based on outputs
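The first three principles can be combined in a small prompt builder. This is a sketch only: the function name `build_few_shot_prompt`, the `Input:`/`Output:` layout, and the sentiment examples are assumptions made for illustration, not a format required by any API.

```python
def build_few_shot_prompt(role, instruction, examples, query):
    """Assemble a prompt from a role, an instruction, worked examples, and a new input."""
    lines = [f"You are {role}.", instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # End with the new input and an open "Output:" for the model to complete.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical examples for a sentiment-labelling task.
examples = [
    ("The movie was fantastic!", "positive"),
    ("I want my money back.", "negative"),
]
prompt = build_few_shot_prompt(
    role="a sentiment classifier",
    instruction="Label each input as positive or negative. Answer with one word.",
    examples=examples,
    query="The battery died after a day.",
)
print(prompt)
```

Keeping the examples in a list makes the fourth principle easier to apply: you can swap or add examples and compare outputs without rewriting the prompt by hand.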

## 3. Model Parameters

A handful of parameters control how an LLM generates its output, mainly how random the sampling is and how long the response can be.
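To make `temperature` and `top_p` concrete, the sketch below applies both to a made-up set of next-token scores. The four-word vocabulary, the logits, and the helper names are hypothetical. Real APIs typically treat a temperature of 0 as a special case that picks the most likely token, so this sketch requires a positive value.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their total probability reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


# Hypothetical next-token scores for a four-word vocabulary.
vocab = ["cat", "dog", "car", "sky"]
logits = [2.0, 1.5, 0.5, -1.0]

# Lower temperature concentrates probability on the top token.
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

# Nucleus sampling: drop the unlikely tail, then sample from what remains.
nucleus = top_p_filter(softmax_with_temperature(logits, 1.0), top_p=0.9)
rng = random.Random(0)
choice = rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]
print("Nucleus:", [vocab[i] for i in nucleus], "sampled:", vocab[choice])
```

With these scores, `top_p=0.9` removes only the least likely word, "sky", from the candidates.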

In [None]:
import pandas as pd

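# Common sampling parameters (ranges as used by the OpenAI Chat Completions API)
params = {
    'Parameter': ['temperature', 'max_tokens', 'top_p', 'frequency_penalty'],
    'Range': ['0-2', '1 to model limit', '0-1', '-2 to 2'],
    'Effect': [
        'Randomness (0=near-deterministic, 2=very random)',
        'Maximum number of tokens in the response',
        'Nucleus sampling (sample only from the most likely tokens whose probabilities sum to top_p)',
        'Penalizes tokens by how often they already appeared (positive=less repetition)'
    ]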
}

df = pd.DataFrame(params)
print(df.to_string(index=False))

## 4. Using the OpenAI API (Optional)

If you have an OpenAI API key, you can test live completions.

In [None]:
# Set your API key
# os.environ['OPENAI_API_KEY'] = 'your-key-here'

try:
    from openai import OpenAI
    
    if os.getenv('OPENAI_API_KEY'):
        client = OpenAI()
        
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "user", "content": "Explain transformers in one sentence."}
            ],
            max_tokens=50
        )
        
        print("Response:", response.choices[0].message.content)
    else:
        print("⚠️ OpenAI API key not set - skipping this example")
        
except Exception as e:
    print(f"ℹ️ OpenAI example skipped: {e}")

## 5. Comparing Context Windows

Different models have different context window sizes.
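In practice the prompt and the generated response generally share the same window, so a prompt has to leave room for the output. The sketch below checks that budget and trims a token list to fit. The helper names and the choice to drop the oldest tokens are assumptions for illustration, and the 5,000-token list stands in for the output of `tokenizer.encode` on a long text.

```python
def fits_in_context(prompt_tokens, max_new_tokens, context_window):
    """Return True if the prompt plus the reserved output tokens fit in the window."""
    return prompt_tokens + max_new_tokens <= context_window


def truncate_to_budget(token_ids, max_new_tokens, context_window):
    """Drop the oldest tokens so the most recent ones plus the reserved output fit."""
    budget = context_window - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return token_ids[-budget:]


token_ids = list(range(5000))  # stand-in for tokenizer.encode(long_text)

print(fits_in_context(len(token_ids), 500, 4096))  # False
print(fits_in_context(len(token_ids), 500, 8192))  # True

trimmed = truncate_to_budget(token_ids, 500, 4096)
print(len(trimmed))  # 3596
```

Dropping the oldest tokens suits chat-style history. For documents, summarizing or chunking the text may preserve more of the meaning.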

In [None]:
import matplotlib.pyplot as plt

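# Approximate context window sizes at each model's initial release;
# later versions differ, so check the provider docs for current values.
models = ['GPT-3.5', 'GPT-4', 'GPT-4-Turbo', 'Claude-2', 'Claude-3']
context_sizes = [4096, 8192, 128000, 100000, 200000]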

plt.figure(figsize=(10, 6))
plt.barh(models, context_sizes, color='skyblue')
plt.xlabel('Context Window Size (tokens)')
plt.title('LLM Context Window Comparison')
plt.xscale('log')
plt.grid(axis='x', alpha=0.3)

for i, v in enumerate(context_sizes):
    plt.text(v, i, f' {v:,}', va='center')

plt.tight_layout()
plt.show()

## 6. Key Takeaways

✅ **Tokenization** breaks text into processable pieces  
✅ **Prompts** should be clear, specific, and well-structured  
✅ **Parameters** control output randomness and length  
✅ **Context windows** limit how much text models can process  
✅ **Practice** improves prompt engineering skills  

## Next Steps

1. Experiment with different prompts
2. Try temperature variations
3. Explore RAG for knowledge-grounded responses
4. Build AI agents with tools

## 🔗 Resources

- [OpenAI Prompt Engineering Guide](https://platform.openai.com/docs/guides/prompt-engineering)
- [Hugging Face Transformers](https://huggingface.co/docs/transformers)
- [Anthropic Prompt Library](https://docs.anthropic.com/claude/prompt-library)