# 🧠 Lab 2: Exploring Different LLM Models (OpenAI + Ollama + Variants)
Welcome to **Lab 2** of the *Agentic AI Practical Series*! 👋

In this lab, we’ll explore **multiple Large Language Models (LLMs)** using the **OpenAI SDK** — without any external frameworks.

By the end of this lab, you’ll understand how to:
- Use and compare models like `gpt-4o`, `gpt-4o-mini`, and local Ollama models
- Tune core parameters (`temperature`, `top_p`, `max_output_tokens`)
- Stream outputs and handle structured data using Pydantic models and JSON objects
- Simulate memory and multi-turn conversations
- Apply Agentic AI concepts practically

Let’s begin 🚀

## 🔐 Step 1: Setup Environment and API Keys

In [46]:
from dotenv import load_dotenv
import os

load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')

if openai_api_key:
    print(f'✅ OpenAI API Key found, begins with: {openai_api_key[:8]}')
else:
    print('❌ API key not found. Please check your .env file.')

✅ OpenAI API Key found, begins with: sk-svcac
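
Printing the first characters of a key works as a quick sanity check, but in a shared notebook it can be safer to reveal even less. The sketch below uses a hypothetical helper, `mask_key` (not part of this lab or of any SDK), that keeps a short prefix and masks the rest.

```python
def mask_key(key: str, visible: int = 4) -> str:
    """Return the key with all but the first `visible` characters masked.

    `mask_key` is a hypothetical helper for illustration only.
    """
    if not key:
        return "<missing>"
    # Never reveal more characters than the key actually contains
    visible = max(0, min(visible, len(key)))
    return key[:visible] + "*" * (len(key) - visible)


# The value below is a placeholder, not a real key
print(mask_key("sk-example-not-a-real-key"))
```

The masked string keeps its original length, so a truncated or empty key is still easy to spot.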


## ⚙️ Step 2: Initialize OpenAI and Ollama Clients

In [47]:
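from openai import OpenAI

# Equivalent to OpenAI(): the SDK reads OPENAI_API_KEY from the environment by default
client = OpenAI(api_key=openai_api_key)

try:
    # Ollama exposes an OpenAI-compatible API, so the same client class works.
    # The SDK requires an api_key value, but Ollama ignores it.
    # Note: constructing the client does not contact the server; connection
    # errors only surface on the first request.
    ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
    print('✅ Ollama client initialized successfully')
except Exception as e:
    print('⚠️ Ollama client could not be initialized:', e)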

# gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
# deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
# groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
# claude = Anthropic()

✅ Ollama client initialized successfully
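
Creating a client object does not open a connection, so the success message above does not prove that Ollama is running. A minimal sketch of a reachability check follows, using only the standard library. The helper name `is_server_reachable` is hypothetical, and the URL assumes Ollama's default local address.

```python
import urllib.error
import urllib.request


def is_server_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to `url` receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, or malformed URL
        return False


# Assumption: Ollama is listening on its default local address
print("Ollama reachable:", is_server_reachable("http://localhost:11434/"))
```

Running a check like this before the comparison cells makes a stopped server show up as a clear `False` rather than as an exception in the middle of a loop.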


## 🧩 Step 3: Compare Model Outputs

In [None]:
!ollama pull llama3.2

In [48]:
prompt = 'Explain the concept of reinforcement learning in simple terms.'
models = ['gpt-4o-mini'] # , 'gpt-4o'
for m in models:
    res = client.responses.create(model=m, input=prompt)
    # client.chat.completions.create works here as well
    print(f'\n--- {m} ---\n{res.output_text}')

try:
    # At the time of writing, the Ollama server did not implement /v1/responses,
    # only /v1/chat/completions and /v1/completions, so this call fails:
    # res_local = ollama.responses.create(model='llama3.2', input=prompt)
    # print(f'\n--- Ollama (llama3.2) ---\n{res_local.output_text}')
    model_name = "llama3.2"
    messages = [{"role": "user", "content": prompt}]
    res_local = ollama.chat.completions.create(model=model_name, messages=messages)
    print(f'\n--- Ollama (llama3.2) ---\n{res_local.choices[0].message.content}')
except Exception as e:
    print('Ollama model skipped:', e)


--- gpt-4o-mini ---
Reinforcement learning is a type of machine learning where an agent learns to make decisions by trying things out in an environment and receiving feedback. 

Here’s how it works, step by step:

1. **Agent and Environment:** Imagine a robot (the agent) navigating a maze (the environment).

2. **Actions:** The robot can take different actions, like moving forward, turning left, or turning right.

3. **Rewards:** After each action, the robot gets a reward or a penalty. For example, reaching a goal might give it points (a reward), while hitting a wall might take away points (a penalty).

4. **Learning:** The robot learns over time which actions yield the best rewards by trying different paths and remembering what worked well.

5. **Goal:** The ultimate goal is for the robot to figure out the best strategy to maximize its total rewards as quickly as possible.

In short, reinforcement learning is about learning from experience to make better decisions based on rewards an
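
The two calls in this step return differently shaped objects: the Responses API exposes `output_text`, while Chat Completions nests the text under `choices[0].message.content`. One way to hide that difference is a small accessor. The sketch below is an assumption for illustration: `extract_text` is a hypothetical helper, and the objects passed to it are stand-ins, not real API responses.

```python
from types import SimpleNamespace


def extract_text(response) -> str:
    """Return the generated text from either response shape used in this lab."""
    # Responses API objects expose the text directly as `output_text`
    text = getattr(response, "output_text", None)
    if text is not None:
        return text
    # Chat Completions objects nest it under choices[0].message.content
    return response.choices[0].message.content or ""


# Stand-in objects that mimic the two shapes (not real API responses)
fake_responses = SimpleNamespace(output_text="from responses")
fake_chat = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="from chat"))]
)
print(extract_text(fake_responses))
print(extract_text(fake_chat))
```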

## ⚡ Step 4: Streaming Responses

In [49]:
import sys

with client.responses.stream(
    model='gpt-4o-mini',
    input='Write a 3-line poem about open-source AI.',
    temperature=0.6
) as stream:
    for event in stream:
        # Each delta event carries the next chunk of generated text
        if event.type == 'response.output_text.delta':
            sys.stdout.write(event.delta)
            sys.stdout.flush()

    # Retrieve the fully assembled response while the stream is still open;
    # the context manager closes the stream on exit, so no explicit close() is needed
    final = stream.get_final_response()

In code we trust, where minds unite,  
A world of knowledge, shining bright,  
Open-source AI, our shared insight.
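
The streaming loop filters events by type and writes each text delta as it arrives. The same filtering can be sketched offline to show how the deltas add up to the full text. The events below are simulated stand-ins (an assumption for illustration), and `collect_text` is a hypothetical helper.

```python
from types import SimpleNamespace


def collect_text(events) -> str:
    """Concatenate the text deltas from an iterable of stream events."""
    parts = []
    for event in events:
        # Only delta events carry text; other event types are skipped
        if event.type == "response.output_text.delta":
            parts.append(event.delta)
    return "".join(parts)


# Simulated events standing in for a real stream
fake_events = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="In code "),
    SimpleNamespace(type="response.output_text.delta", delta="we trust"),
    SimpleNamespace(type="response.completed"),
]
print(collect_text(fake_events))
```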

## 🧠 Step 5: Structured Output

In [52]:
from pydantic import BaseModel
from typing import List

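# Previous lab: JSON mode. With the Responses API the format is set via `text`
# (responses.create has no `response_format` argument), and the prompt itself
# must mention JSON.
# response = client.responses.create(
#     model='gpt-4o-mini',
#     input='List three AI frameworks and their main use cases in JSON format.',
#     text={'format': {'type': 'json_object'}}
# )
# print(response.output_text)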


# 1️⃣ Define your schema
class Framework(BaseModel):
    name: str
    use_case: str

class FrameworkList(BaseModel):
    frameworks: List[Framework]

# 2️⃣ Ask the model to return output matching the schema
# Responses API variant: use parse() with text_format
# response = client.responses.parse(
#     model="gpt-4o-mini",
#     input="List three AI frameworks and their main use cases.",
#     text_format=FrameworkList  # 👈 Use the Pydantic class directly
# )
# structured_output = response.output_parsed

# Chat Completions variant: call parse() rather than create() so the SDK
# validates the reply against FrameworkList
response = client.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that always returns valid JSON."},
        {"role": "user", "content": "List three AI frameworks and their main use cases."}
    ],
    temperature=0.2,
    response_format=FrameworkList
)

print(response)


ParsedChatCompletion[FrameworkList](id='chatcmpl-CUdo2vrGliwqbvXSXToknnQ7bZuRL', choices=[ParsedChoice[FrameworkList](finish_reason='stop', index=0, logprobs=None, message=ParsedChatCompletionMessage[FrameworkList](content='{"frameworks":[{"name":"TensorFlow","use_case":"Deep learning and machine learning applications."},{"name":"PyTorch","use_case":"Research and production in deep learning."},{"name":"Keras","use_case":"High-level neural networks API for fast experimentation."}]}', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None, parsed=FrameworkList(frameworks=[Framework(name='TensorFlow', use_case='Deep learning and machine learning applications.'), Framework(name='PyTorch', use_case='Research and production in deep learning.'), Framework(name='Keras', use_case='High-level neural networks API for fast experimentation.')])))], created=1761418890, model='gpt-4o-mini-2024-07-18', object='chat.completion', service_tier='default', system_fi

In [53]:
# 3️⃣ Access structured data
structured_output = response.choices[0].message.parsed.frameworks


for fw in structured_output:
    print(f"- {fw.name}: {fw.use_case}")

- TensorFlow: Deep learning and machine learning applications.
- PyTorch: Research and production in deep learning.
- Keras: High-level neural networks API for fast experimentation.
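
The raw output above shows two fields on the message that are worth checking before use: `refusal` and `parsed`. A minimal sketch of a guarded accessor follows. `get_parsed` is a hypothetical helper, and the messages passed to it are stand-ins that only mimic those two fields.

```python
from types import SimpleNamespace


def get_parsed(message):
    """Return message.parsed, raising a clear error when it is unavailable."""
    # A refusal means the model declined, so there is nothing to parse
    if getattr(message, "refusal", None):
        raise ValueError(f"Model refused: {message.refusal}")
    if getattr(message, "parsed", None) is None:
        raise ValueError("No parsed output on this message")
    return message.parsed


# Stand-in messages mimicking the `refusal` and `parsed` fields
ok = SimpleNamespace(refusal=None, parsed={"frameworks": []})
refused = SimpleNamespace(refusal="I can't help with that.", parsed=None)

print(get_parsed(ok))
try:
    get_parsed(refused)
except ValueError as err:
    print("Handled:", err)
```

In the lab's code this would wrap `response.choices[0].message`, so a refusal surfaces as an explicit error instead of an `AttributeError` on `None`.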


## 🧩 Step 6: Simulate Memory and Multi-turn Conversations

The Chat Completions API is stateless: the model only “remembers” what is inside the current request.

By passing the full conversation array each time, you supply that memory manually.

In [54]:
conversation = [
    {'role': 'system', 'content': 'You are a memory-capable AI assistant.'},
    {'role': 'user', 'content': 'I studied transformers yesterday. What are they?'}
]

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=conversation
)
print(response.choices[0].message.content)

conversation.append({'role': 'assistant', 'content': response.choices[0].message.content})
conversation.append({'role': 'user', 'content': 'Now explain attention mechanism briefly.'})


# The conversation list now looks like this:
# [
#   {'role': 'system', 'content': 'You are a memory-capable AI assistant.'},
#   {'role': 'user', 'content': 'I studied transformers yesterday. What are they?'},
#   {'role': 'assistant', 'content': 'Transformers are neural network architectures...'},
#   {'role': 'user', 'content': 'Now explain attention mechanism briefly.'}
# ]

response2 = client.chat.completions.create(model='gpt-4o-mini', messages=conversation)
print('\n--- Follow-up ---\n', response2.choices[0].message.content)

Transformers are a type of deep learning model primarily used in natural language processing (NLP) tasks, such as translation, text generation, and sentiment analysis. Introduced in the paper "Attention is All You Need" by Vaswani et al. in 2017, transformers rely on a mechanism called self-attention that allows them to weigh the importance of different words in a sentence, regardless of their position. This is in contrast to earlier models like recurrent neural networks (RNNs) which processed data sequentially.

Key components of transformers include:

1. **Self-Attention**: This mechanism computes attention scores for each word in relation to all other words in the input sequence, allowing the model to focus on relevant parts of the text.

2. **Positional Encoding**: Since transformers process input in parallel rather than sequentially, they incorporate positional encodings to maintain information about the order of words.

3. **Multi-Head Attention**: The model uses multiple attenti
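
Because the whole conversation is resent on every request, the list grows with each turn. One common mitigation is to keep the system message and only the most recent turns. The sketch below is a simplification under stated assumptions: `trim_conversation` is a hypothetical helper, and it trims by message count, whereas real context limits are measured in tokens.

```python
def trim_conversation(messages, max_turns=3):
    """Keep system messages plus the last `max_turns` user/assistant pairs.

    Hypothetical helper: trimming by message count is a simplification,
    since real limits are measured in tokens.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Each turn is one user message and one assistant reply
    keep = rest[-2 * max_turns:] if max_turns > 0 else []
    return system + keep


# Build a sample history: one system message and four complete turns
history = [{"role": "system", "content": "You are a memory-capable AI assistant."}]
for i in range(4):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_conversation(history, max_turns=2)
print(len(history), "->", len(trimmed))
```

The trimmed list can be passed as `messages` in place of the full history. Anything dropped is forgotten by the model, which is the trade-off of this approach.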

## 🧮 Step 7: Model Comparison Helper

In [55]:
!uv pip install tabulate

Using Python 3.12.11 environment at: E:\agentic-dpoint\.venv
Audited 1 package in 48ms


In [56]:
import textwrap
import time
from tabulate import tabulate  # pip install tabulate

def compare_models(prompt, models):
    """
    Compare responses from multiple models (OpenAI, Ollama, etc.)
    Args:
        prompt (str): Input prompt to test
        models (list[str]): Model names to compare
    """
    results = []
    print(f"\n🧠 Prompt: {prompt}\n{'='*80}")

    for model_name in models:
        start = time.time()
        output = ""
        source = "Unknown"

        try:
            # --- Case 1️⃣: OpenAI Models ---
            if model_name.startswith(("gpt-", "o1-", "o3-")):
                source = "OpenAI"
                res = client.responses.create(
                    model=model_name,
                    input=prompt,
                    temperature=0.6,
                    max_output_tokens=250
                )
                output = res.output_text.strip()

            # --- Case 2️⃣: Ollama Local Models ---
            elif model_name.lower() in ["llama3", "llama3.2", "mistral", "phi3"]:
                source = "Ollama"
                res_local = ollama.chat.completions.create(
                    model=model_name,
                    messages=[{"role": "user", "content": prompt}],
                    temperature=0.6
                )
                output = res_local.choices[0].message.content.strip()

            else:
                output = f"⚠️ No handler for '{model_name}'"

        except Exception as e:
            output = f"❌ Failed: {e}"

        latency = time.time() - start
        results.append({
            "Model": model_name,
            "Source": source,
            "Tokens/Words": len(output.split()),
            "Latency (s)": round(latency, 2),
            "Response": output
        })

    # --- 🧾 Tabular Summary ---
    summary = [
        [r["Model"], r["Source"], r["Tokens/Words"], r["Latency (s)"]]
        for r in results
    ]
    print("\n📊 Summary:\n")
    print(tabulate(summary, headers=["Model", "Source", "Tokens/Words", "Latency (s)"], tablefmt="fancy_grid"))

    # --- 🪶 Side-by-side Output Comparison ---
    print("\n🧩 Detailed Comparison:\n")
    for r in results:
        print(f"\n=== {r['Model'].upper()} ({r['Source']}) ===")
        print(textwrap.fill(r["Response"], width=100))

    print("\n✅ Comparison complete.\n")
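
The branching inside `compare_models` decides which client serves each model name. Pulled out on its own, that decision might look like the sketch below. `route_model` and the two constants are hypothetical names, and the prefixes and model list simply mirror the ones used in the function above.

```python
OPENAI_PREFIXES = ("gpt-", "o1-", "o3-")
OLLAMA_MODELS = {"llama3", "llama3.2", "mistral", "phi3"}


def route_model(model_name: str) -> str:
    """Return which backend should serve `model_name`."""
    if model_name.startswith(OPENAI_PREFIXES):
        return "OpenAI"
    if model_name.lower() in OLLAMA_MODELS:
        return "Ollama"
    return "Unknown"


for name in ["gpt-4o-mini", "llama3.2", "claude-3"]:
    print(name, "->", route_model(name))
```

Keeping the routing rule in one place means supporting a new provider only requires another branch here, plus a matching client call.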


In [57]:
compare_models(
    prompt="Describe the concept of Agentic AI.",
    models=["gpt-4o-mini", "gpt-4o", "llama3.2"]
)



🧠 Prompt: Describe the concept of Agentic AI.

📊 Summary:

╒═════════════╤══════════╤════════════════╤═══════════════╕
│ Model       │ Source   │   Tokens/Words │   Latency (s) │
╞═════════════╪══════════╪════════════════╪═══════════════╡
│ gpt-4o-mini │ OpenAI   │            187 │          6.53 │
├─────────────┼──────────┼────────────────┼───────────────┤
│ gpt-4o      │ OpenAI   │            179 │          5.26 │
├─────────────┼──────────┼────────────────┼───────────────┤
│ llama3.2    │ Ollama   │            342 │         19.35 │
╘═════════════╧══════════╧════════════════╧═══════════════╛

🧩 Detailed Comparison:


=== GPT-4O-MINI (OpenAI) ===
Agentic AI refers to artificial intelligence systems that possess a degree of autonomy and decision-
making capability, allowing them to act independently in pursuit of specific goals. Unlike
traditional AI, which typically follows predefined rules and algorithms, agentic AI can evaluate
situations, learn from experiences, and adapt its behavi
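
Two details of the summary table are worth keeping in mind. The "Tokens/Words" column is a whitespace word count, not a tokenizer count, and the latency is measured with `time.time()`. For measuring intervals, `time.perf_counter()` is generally the better fit because it is monotonic. A minimal timing sketch follows, with a stand-in function in place of a real API call. `timed_call` and `fake_model` are hypothetical names.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_model(prompt: str) -> str:
    # Stand-in for an API call (assumption: no network involved)
    time.sleep(0.05)
    return f"Echo: {prompt}"


output, seconds = timed_call(fake_model, "Describe Agentic AI.")
print(f"{len(output.split())} words in {seconds:.2f}s")
```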

## 🏁 Step 8: Key Takeaways
- You learned to use and compare **multiple LLMs** side by side, including OpenAI cloud models and local Ollama models.
- You tuned model behavior using parameters like `temperature` and `max_output_tokens`.
- You generated structured data, streamed outputs, and simulated conversational memory.
- You explored differences in API usage (`client.responses.create` vs `ollama.chat.completions.create`) and learned how to handle each safely in a unified workflow.
- You enhanced evaluation by adding quantitative metrics such as response latency, word count, and model source for performance benchmarking.
- You reviewed qualitative outputs to judge clarity, coherence, and style, skills that are useful when selecting models for specific Agentic AI tasks.
- You added basic error handling and modular code that can be extended to new providers (Anthropic, Gemini, Groq, etc.).
- You’re now ready to build **Agentic AI systems** and automated evaluation pipelines that analyze and rank model behavior across diverse LLMs.