# LLM Fundamentals: Understanding the Basics

This notebook covers the fundamental concepts of working with Large Language Models (LLMs).

## Topics Covered:
1. Setup and installation
2. Basic API calls
3. Understanding tokens and tokenization
4. Temperature and sampling parameters
5. System messages and conversation context
6. Streaming responses
7. Cost estimation
8. Testing our custom utils
9. Best practices summary
10. Practice exercises

---

## 1. Setup and Installation

First, let's import the necessary libraries and set up our environment.

In [2]:
import os
import sys
from dotenv import load_dotenv
from openai import OpenAI
import tiktoken

# Add parent directory to path to import our utils
sys.path.append('..')
from utils.config import Config
from utils.text_processing import count_tokens
from utils.performance import CostEstimator, timer

# Load environment variables
load_dotenv()

# Initialize OpenAI client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

print("✅ Setup complete!")

✅ Setup complete!
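If the key in `.env` is missing or still a placeholder, the first API call fails with an authentication error. A minimal sketch of a fail-fast check that could run before the client is created follows; the helper name `require_api_key` and the placeholder markers are assumptions for illustration, not part of the `openai` package or of this project's utils.

```python
import os


def require_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper: the placeholder markers are illustrative guesses.
    """
    key = os.getenv(env_var, "").strip()
    placeholder_markers = ("your_key", "your-key", "changeme")
    if not key:
        raise RuntimeError(f"{env_var} is not set; add it to your .env file.")
    if any(marker in key.lower() for marker in placeholder_markers):
        raise RuntimeError(f"{env_var} looks like a placeholder value.")
    return key
```

The returned value could then be passed as `api_key=` when constructing the client.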


## 2. Your First LLM API Call

Let's start with a simple API call to understand the basic structure.

In [4]:
def simple_completion(prompt: str, model: str = "gpt-3.5-turbo"):
    """Make a simple completion request."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": prompt}
        ]
    )
    return response.choices[0].message.content

# Test it out
prompt = "Explain what a large language model is in one sentence."
response = simple_completion(prompt)

print(f"Prompt: {prompt}")
print(f"\nResponse: {response}")

AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: your_key*here. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

### Understanding the Response Object

Let's examine what information the API returns.

In [None]:
# Get full response object
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)

print("=" * 60)
print("RESPONSE OBJECT STRUCTURE")
print("=" * 60)
print(f"\nModel used: {response.model}")
print(f"ID: {response.id}")
print(f"Created: {response.created}")
print(f"\nToken usage:")
print(f"  - Prompt tokens: {response.usage.prompt_tokens}")
print(f"  - Completion tokens: {response.usage.completion_tokens}")
print(f"  - Total tokens: {response.usage.total_tokens}")
print(f"\nFinish reason: {response.choices[0].finish_reason}")
print(f"\nMessage content: {response.choices[0].message.content}")
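The `finish_reason` printed above says why generation stopped, which matters when a reply looks cut off. The sketch below maps the commonly documented values to short explanations; the helper names are hypothetical, and the wording of each explanation is a paraphrase rather than official text.

```python
def describe_finish_reason(finish_reason: str) -> str:
    """Map a Chat Completions finish_reason to a short explanation."""
    explanations = {
        "stop": "The model finished naturally or hit a stop sequence.",
        "length": "Output was cut off by max_tokens or the context limit.",
        "content_filter": "Content was omitted by a content filter.",
        "tool_calls": "The model requested a tool call instead of replying.",
    }
    return explanations.get(
        finish_reason, f"Unrecognized finish reason: {finish_reason!r}"
    )


def is_truncated(finish_reason: str) -> bool:
    """True when the reply was cut short and may need a larger max_tokens."""
    return finish_reason == "length"


print(describe_finish_reason("length"))
```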

## 3. Understanding Tokens and Tokenization

Tokens are the fundamental units that LLMs process. Understanding tokenization is crucial for:
- Cost estimation (pricing is per token)
- Context window management
- Prompt engineering

### What is a Token?
- A token is a piece of text (not exactly a word)
- Common words = 1 token (e.g., "cat")
- Uncommon words = multiple tokens (e.g., "unconventional" ≈ 3 tokens)
- 1 token ≈ 4 characters in English
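The 4-characters-per-token rule of thumb can be turned into a quick estimator for when no tokenizer is at hand. This is only a rough sketch: the function name is hypothetical, the ratio is the heuristic from the list above, and real counts from a tokenizer such as `tiktoken` will differ, especially for code or non-English text.

```python
import math


def rough_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate token count from character length (heuristic, English only)."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


print(rough_token_estimate("The quick brown fox jumps over the lazy dog."))
```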

In [None]:
def count_and_display_tokens(text: str, model: str = "gpt-4"):
    """Count tokens and show token breakdown."""
    encoding = tiktoken.encoding_for_model(model)
    tokens = encoding.encode(text)

    print(f"Text: '{text}'")
    print(f"Length: {len(text)} characters")
    print(f"Tokens: {len(tokens)}")
    print(f"Token IDs: {tokens[:20]}...")  # Show first 20
    print(f"Chars per token: {len(text) / len(tokens):.2f}")
    print("-" * 60)

# Test with different texts
texts = [
    "Hello, world!",
    "The quick brown fox jumps over the lazy dog.",
    "Supercalifragilisticexpialidocious",
    "AI and ML are transforming technology.",
    "人工智能正在改变世界"  # Chinese text: "AI is changing the world"
]

for text in texts:
    count_and_display_tokens(text)

### Token Counting with Our Custom Utility

In [None]:
# Using our custom utility
text = """Large Language Models (LLMs) are advanced AI systems trained on vast amounts
of text data. They can understand and generate human-like text, making them useful
for tasks like translation, summarization, and question-answering."""

token_count = count_tokens(text, model="gpt-4")
print(f"Text length: {len(text)} characters")
print(f"Estimated tokens: {token_count}")
print(f"\nText preview:")
print(text)

## 4. Temperature and Sampling Parameters

Temperature controls the randomness of the model's output:
- **Temperature = 0**: Near-deterministic; the model almost always picks the most likely token, though identical outputs are not strictly guaranteed (good for factual tasks)
- **Temperature = 0.3-0.7**: Balanced creativity and consistency
- **Temperature = 1.0+**: More random and creative (good for creative writing)
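To make the effect concrete, here is a small sketch of temperature applied to a toy distribution. It assumes the usual formulation in which raw scores (logits) are divided by the temperature before a softmax; the scores are made up, and treating temperature 0 as a pure argmax is a simplification of what a hosted API does.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities, scaled by temperature."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all mass on the top score.
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.0, 0.7, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top candidate, while higher ones flatten the distribution so less likely tokens are sampled more often.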

Let's see this in action!

In [None]:
def compare_temperatures(prompt: str, temperatures: list):
    """Compare outputs at different temperatures."""
    print(f"Prompt: {prompt}")
    print("=" * 80)

    for temp in temperatures:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=temp,
            max_tokens=100
        )

        print(f"\n🌡️  Temperature: {temp}")
        print(f"Response: {response.choices[0].message.content}")
        print("-" * 80)

# Test with a factual question
compare_temperatures(
    "What is the capital of France?",
    temperatures=[0.0, 0.7, 1.5]
)

In [None]:
# Test with a creative task
compare_temperatures(
    "Write a creative tagline for an AI startup.",
    temperatures=[0.0, 0.7, 1.5]
)

### Other Important Parameters

In [None]:
# max_tokens: Limit response length
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Explain quantum computing."}],
    max_tokens=50  # Short response
)
print("Short response (max_tokens=50):")
print(response.choices[0].message.content)
print(f"\nTokens used: {response.usage.completion_tokens}")

print("\n" + "="*60 + "\n")

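# top_p: Nucleus sampling (an alternative to temperature).
# OpenAI's API reference suggests adjusting top_p or temperature, not both;
# both are set here only to show the parameters.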
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a haiku about AI."}],
    top_p=0.9,
    temperature=0.8
)
print("Creative output (top_p=0.9, temperature=0.8):")
print(response.choices[0].message.content)
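Nucleus sampling, which `top_p` controls, can be sketched on a toy distribution: keep the most likely tokens until their cumulative probability reaches `top_p`, then renormalize and sample from that set. The function name and the probabilities below are made up for illustration; the model applies the same idea to its full vocabulary internally.

```python
def nucleus_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of top tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}  # renormalize survivors


toy = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "yak": 0.05}  # made-up distribution
print(nucleus_filter(toy, top_p=0.9))
```

With `top_p=0.9`, the least likely token ("yak") is dropped and can never be sampled.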

## 5. System Messages and Conversation Context

System messages set the behavior and context for the model.

In [None]:
def chat_with_system_message(system_msg: str, user_msg: str):
    """Make a call with a system message."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg}
        ]
    )
    return response.choices[0].message.content

# Example 1: Technical expert
response1 = chat_with_system_message(
    system_msg="You are a senior software engineer with expertise in Python.",
    user_msg="How do I optimize a loop that processes 1 million items?"
)
print("Technical Expert Response:")
print(response1)

print("\n" + "="*80 + "\n")

# Example 2: Simple explainer
response2 = chat_with_system_message(
    system_msg="You explain complex topics to 10-year-olds using simple language and analogies.",
    user_msg="How do I optimize a loop that processes 1 million items?"
)
print("Simple Explainer Response:")
print(response2)

### Multi-turn Conversations

In [None]:
# Maintaining conversation context
conversation = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What's the capital of France?"},
]

# First exchange
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=conversation
)
assistant_reply = response.choices[0].message.content
conversation.append({"role": "assistant", "content": assistant_reply})

print("User: What's the capital of France?")
print(f"Assistant: {assistant_reply}\n")

# Follow-up question (references previous context)
conversation.append({"role": "user", "content": "What's its population?"})
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=conversation
)
assistant_reply = response.choices[0].message.content

print("User: What's its population?")
print(f"Assistant: {assistant_reply}")
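Because the full `conversation` list is resent on every call, long chats eventually exceed the context window. One possible approach is to drop the oldest turns while keeping the system message, as in the sketch below. It uses a character budget as a crude stand-in for a token budget and assumes system messages sit at the start of the list; `trim_history` is a hypothetical helper, not part of this project's utils.

```python
def trim_history(messages: list[dict], max_chars: int) -> list[dict]:
    """Drop the oldest non-system messages until the history fits a character budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def size(msgs: list[dict]) -> int:
        return sum(len(m["content"]) for m in msgs)

    # Always keep the most recent message, even if it alone exceeds the budget.
    while len(rest) > 1 and size(system + rest) > max_chars:
        rest.pop(0)
    return system + rest


history = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "x" * 100},
    {"role": "assistant", "content": "y" * 100},
    {"role": "user", "content": "What's its population?"},
]
print([m["role"] for m in trim_history(history, max_chars=160)])
```

A production version would count tokens rather than characters and might summarize dropped turns instead of discarding them.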

## 6. Streaming Responses

Streaming delivers the response in small chunks as it is generated, so users see the first words sooner. Total generation time is roughly the same, but perceived latency drops.

In [None]:
def stream_completion(prompt: str):
    """Stream a completion response."""
    print("Streaming response: ", end="", flush=True)

    stream = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )

    full_response = ""
    for chunk in stream:
        # Skip chunks that carry no choices (possible in some configurations)
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content is not None:
            print(content, end="", flush=True)
            full_response += content

    print()  # New line
    return full_response

# Test streaming
response = stream_completion(
    "Explain the concept of neural networks in 3 sentences."
)
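The accumulation pattern used in `stream_completion` can be tried offline with a stand-in stream. In this sketch, `fake_stream` builds objects that only mimic the `choices[0].delta.content` shape of real chunks; it is a test double for illustration, not the API's actual chunk type.

```python
from types import SimpleNamespace


def fake_stream(pieces):
    """Yield objects shaped like streaming chunks (stand-in, not the real API)."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect_stream(stream) -> str:
    """Join the text deltas from a stream, skipping empty or content-less chunks."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content is not None:
            parts.append(content)
    return "".join(parts)


print(collect_stream(fake_stream(["Neural ", None, "networks"])))
```

Collecting pieces in a list and joining once avoids repeated string concatenation for long responses.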

## 7. Cost Estimation

Understanding and tracking costs is crucial when working with LLM APIs.

In [None]:
def estimate_and_call(prompt: str, model: str = "gpt-3.5-turbo"):
    """Estimate cost before making the call."""
    # Estimate input tokens
    input_tokens = count_tokens(prompt, model)

    # Estimate cost (assuming ~150 output tokens)
    estimated_cost = CostEstimator.estimate_cost(
        model=model,
        input_tokens=input_tokens,
        output_tokens=150
    )

    print(f"Estimated input tokens: {input_tokens}")
    print(f"Estimated cost: ${estimated_cost:.6f}")
    print("-" * 60)

    # Make the actual call
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}]
    )

    # Calculate actual cost
    actual_cost = CostEstimator.estimate_cost(
        model=model,
        input_tokens=response.usage.prompt_tokens,
        output_tokens=response.usage.completion_tokens
    )

    print(f"\nActual tokens:")
    print(f"  Input: {response.usage.prompt_tokens}")
    print(f"  Output: {response.usage.completion_tokens}")
    print(f"  Total: {response.usage.total_tokens}")
    print(f"Actual cost: ${actual_cost:.6f}")

    return response.choices[0].message.content

# Test it
response = estimate_and_call(
    "Explain the difference between supervised and unsupervised learning."
)
print(f"\nResponse:\n{response}")

### Comparing Costs Across Models

In [None]:
# Compare costs for different models
prompt = "Write a 100-word summary of machine learning."
models = ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo"]

input_tokens = 100  # Assumed for the comparison, not measured from the prompt
output_tokens = 150  # Assumed for the comparison

print("Cost Comparison for Same Task:")
print("=" * 60)
for model in models:
    cost = CostEstimator.estimate_cost(model, input_tokens, output_tokens)
    print(f"{model:20s}: ${cost:.6f}")
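The internals of `CostEstimator` are not shown in this notebook, but per-token pricing generally reduces to simple arithmetic. The sketch below is an assumption about how such an estimator might work; the model names and prices are hypothetical placeholders, so substitute current published rates before relying on the numbers.

```python
# Hypothetical prices in USD per 1,000 tokens; replace with current published rates.
EXAMPLE_PRICES = {
    "model-small": {"input": 0.0005, "output": 0.0015},
    "model-large": {"input": 0.03, "output": 0.06},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    price = EXAMPLE_PRICES[model]
    input_cost = (input_tokens / 1000) * price["input"]
    output_cost = (output_tokens / 1000) * price["output"]
    return input_cost + output_cost


for name in EXAMPLE_PRICES:
    print(f"{name:20s}: ${estimate_cost(name, 100, 150):.6f}")
```

Output tokens are often priced higher than input tokens, which is why limiting response length tends to matter more than trimming the prompt.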

## 8. Testing Our Custom Utils

Let's test the utility functions we created.

In [None]:
# Test timer decorator
@timer
def slow_completion(prompt: str):
    """A completion that we'll time."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

result = slow_completion("Tell me a fun fact about Python.")
print(f"\nResult: {result}")
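The `timer` decorator imported from `utils.performance` is not shown here. For readers without that module, a decorator along the following lines would behave similarly; this is a guess at its behavior, named `simple_timer` to avoid shadowing the real one.

```python
import functools
import time


def simple_timer(func):
    """Print how long the wrapped function took, then return its result."""
    @functools.wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.3f}s")
    return wrapper


@simple_timer
def add(a: int, b: int) -> int:
    return a + b


print(add(1, 2))
```

Using `try`/`finally` means the timing is printed even when the wrapped call raises.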

In [None]:
# Test Config utility
config = Config.from_env()

print("Configuration loaded from environment:")
print(f"Model: {config.get('model')}")
print(f"Temperature: {config.get('temperature')}")
print(f"Max tokens: {config.get('max_tokens')}")

# Use config in a call
response = client.chat.completions.create(
    model=config.get('model'),
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=config.get('temperature'),
    max_tokens=config.get('max_tokens')
)
print(f"\nResponse: {response.choices[0].message.content}")

## 9. Best Practices Summary

### When to Use Different Settings:

| Task Type | Temperature | Model | Why |
|-----------|-------------|-------|-----|
| Factual Q&A | 0.0 - 0.3 | GPT-3.5-Turbo | Consistent, cheap |
| Data extraction | 0.0 | GPT-3.5-Turbo | Deterministic output |
| Creative writing | 0.7 - 1.0 | GPT-4 | More creative, high quality |
| Code generation | 0.0 - 0.3 | GPT-4 | Reliable, accurate |
| Summarization | 0.3 - 0.5 | GPT-3.5-Turbo | Balanced |
| Brainstorming | 0.8 - 1.2 | GPT-4 | Diverse ideas |

### Cost Optimization Tips:
1. Use GPT-3.5-Turbo for simple tasks
2. Keep prompts concise
3. Use `max_tokens` to limit output
4. Cache common responses
5. Batch similar requests
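Tip 4 can be sketched as a small in-memory cache keyed on the model and the exact messages. The class below is a hypothetical illustration: `call` stands for any function that performs the real request, and caching like this is only sensible when repeated identical prompts should return the same answer (for example at temperature 0).

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed on the model name and the exact message list."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    @staticmethod
    def _key(model: str, messages: list[dict]) -> str:
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, messages: list[dict], call):
        """Return a cached reply, or run call(model, messages) and store the result."""
        key = self._key(model, messages)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = call(model, messages)
        self._store[key] = result
        return result
```

In the notebook, `call` could be a thin wrapper around `client.chat.completions.create` that returns the message text.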

### Common Pitfalls:
1. Not counting tokens properly
2. Using GPT-4 when GPT-3.5 suffices
3. Forgetting to handle rate limits
4. Not streaming for long responses
5. Ignoring token context limits
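For pitfall 3, a common remedy is to retry with exponential backoff and jitter. The sketch below is a generic, hypothetical helper rather than anything from this project's utils; with the `openai` package, one would likely pass `retryable=(openai.RateLimitError,)` so that only rate-limit errors are retried.

```python
import random
import time


def with_backoff(call, retryable=(Exception,), max_attempts=5,
                 base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on retryable errors with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt; jitter spreads out concurrent retries.
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, base_delay)
            sleep(delay)
```

The `sleep` parameter exists so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.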

## 10. Practice Exercises

Try these exercises to reinforce your learning:

In [None]:
# Exercise 1: Create a function that estimates cost BEFORE making a call
# and only proceeds if cost is below a threshold

def safe_completion(prompt: str, max_cost: float = 0.01):
    """Only make API call if estimated cost is below threshold."""
    # TODO: Implement this
    pass

# Exercise 2: Create a function that tries GPT-3.5-Turbo first,
# and falls back to GPT-4 if the response quality is poor

def smart_completion(prompt: str):
    """Try cheap model first, upgrade if needed."""
    # TODO: Implement this
    pass

# Exercise 3: Create a conversation manager that maintains context
# and tracks total cost

class ConversationManager:
    def __init__(self, system_message: str):
        # TODO: Implement this
        pass

    def add_user_message(self, message: str):
        # TODO: Implement this
        pass

    def get_response(self):
        # TODO: Implement this
        pass

    def get_total_cost(self):
        # TODO: Implement this
        pass

## Summary

In this notebook, you learned:
- ✅ How to make basic API calls to OpenAI
- ✅ Understanding tokens and their importance
- ✅ How temperature affects output
- ✅ Using system messages for context
- ✅ Streaming responses for better UX
- ✅ Estimating and tracking costs
- ✅ Using our custom utility functions

### Next Steps:
- Move to `02_langchain_basics.ipynb` to learn about LangChain
- Experiment with different prompts and parameters
- Complete the practice exercises
- Read the OpenAI API documentation

---

**Happy Learning! 🚀**