# Code for the Bootcamp Talk: Advanced LLM & Agent Systems Bootcamp
By: Lior Gazit.  
Repo: [agentic_actions_locally_hosted](https://github.com/LiorGazit/agentic_actions_locally_hosted)  

<a target="_blank" href="https://colab.research.google.com/github/LiorGazit/agentic_actions_locally_hosted/blob/main/agents_building_workshop/Codes_for_the_Bootcamp_talk.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a> (pick a GPU Colab session for fastest computing)  

```
Disclaimer: The content and ideas presented in this notebook are solely those of the author, Lior Gazit, and do not represent the views or intellectual property of the author's employer.
```

Installing:

In [None]:
!pip -q install sentence-transformers faiss-cpu langchain tiktoken langsmith langchain_openai -U "autogen-agentchat" "autogen-ext[openai]"

If this notebook is run outside the repo (for example, in Colab), fetch the required module from the remote repo:

In [None]:
import os
import requests

# If the module isn't already present (e.g. in Colab), fetch it from GitHub
if not os.path.exists("spin_up_LLM.py"):
    url = "https://raw.githubusercontent.com/LiorGazit/agentic_actions_locally_hosted/refs/heads/main/spin_up_LLM.py"
    resp = requests.get(url)
    resp.raise_for_status()
    with open("spin_up_LLM.py", "w") as f:
        f.write(resp.text)
    print("Downloaded spin_up_LLM.py from GitHub")

## Demo 1: RAG Pipeline with Chained Prompt Processing

Setting up the Vector Store and RAG pipeline:

In [None]:
# Import required libraries
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

# Example documents (could be clinical notes, financial filings, etc.)
documents = [
    "Patient has diabetes type 2 and shows high glucose levels.",
    "Recent financial filings show revenue growth despite supply chain issues.",
    "Patient diagnosed with hypertension, recommended lifestyle changes.",
    "The company's earnings call mentioned concerns over increased production costs.",
]

# Step 1: Create embeddings using SentenceTransformer
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')  # lightweight embedding model

# Generate embeddings for the documents
document_embeddings = embedding_model.encode(documents)

# Step 2: Setup FAISS vector store
dimension = document_embeddings.shape[1]
faiss_index = faiss.IndexFlatL2(dimension)
faiss_index.add(document_embeddings)

# Step 3: Define the retriever(query) function
def retriever(query, top_k=2):
    # Generate embedding for the query
    query_embedding = embedding_model.encode([query])

    # Perform the similarity search in the FAISS index
    distances, indices = faiss_index.search(query_embedding, top_k)

    # Retrieve the top_k most similar documents
    retrieved_docs = [documents[idx] for idx in indices[0]]

    return retrieved_docs

# Example usage of the retriever
query = "What did the company say about production costs?"
context_docs = retriever(query)
print("\n\nRetrieved documents for context:\n", context_docs)



Retrieved documents for context:
 ["The company's earnings call mentioned concerns over increased production costs.", 'Recent financial filings show revenue growth despite supply chain issues.']
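
To make the retrieval step less of a black box, here is a minimal pure-Python sketch of what an exhaustive (flat) L2 index does conceptually: compute the squared Euclidean distance from the query to every stored vector and return the closest ones. The function names and the 2-D toy vectors are hypothetical, chosen only for illustration; this is not how FAISS is implemented internally, and real embeddings from `all-MiniLM-L6-v2` have far more dimensions.

```python
# Toy stand-in for a flat (exhaustive) L2 index: no FAISS, no real embeddings.
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_l2_search(index_vectors, query_vector, top_k=2):
    """Return (distances, indices) of the top_k nearest vectors, closest first."""
    scored = sorted(
        (squared_l2(vec, query_vector), idx)
        for idx, vec in enumerate(index_vectors)
    )
    top = scored[:top_k]
    return [d for d, _ in top], [i for _, i in top]

# Hypothetical 2-D "embeddings" for four documents
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
distances, indices = flat_l2_search(toy_vectors, [0.9, 0.1], top_k=2)
print(indices)    # positions of the two closest vectors, closest first
print(distances)  # their squared L2 distances
```

Because every stored vector is compared against the query, this approach is exact but scales linearly with the number of documents, which is fine for a handful of notes like the ones above.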


Prompting using a locally hosted LLM via Ollama:

In [None]:
from spin_up_LLM import spin_up_LLM
from langchain_core.prompts import ChatPromptTemplate

query = "What is the patient's diagnosis given these notes?"
context_docs = retriever(query)
question = f"Using the following context, answer the question:\n\n{context_docs}\n\nQ: {query}\n\n---\nA:"
local_llm = spin_up_LLM(chosen_llm="gemma3")

answer_local = local_llm.generate([question])
print("\n\n")
print(answer_local.generations[0][0].text)

🚀 Installing Ollama...
🚀 Starting Ollama server...
→ Ollama PID: 4742
⏳ Waiting for Ollama to be ready…
🚀 Pulling model 'gemma3'…
Available models:
NAME             ID              SIZE      MODIFIED               
gemma3:latest    a2af6cc3eb7f    3.3 GB    Less than a second ago    

🚀 Installing langchain-ollama…



A: The patient has diabetes type 2 and hypertension.
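
The prompt in the cell above interpolates the Python list `context_docs` directly, so the model sees brackets and quote characters around the notes. One possible refinement, sketched below under the assumption that a numbered list reads more cleanly to the model, is a small helper that formats the retrieved documents before building the prompt. The name `build_rag_prompt` is hypothetical and not part of the repo.

```python
def build_rag_prompt(query, context_docs):
    """Format retrieved documents as a numbered list inside the prompt."""
    if context_docs:
        context = "\n".join(
            f"{i}. {doc}" for i, doc in enumerate(context_docs, start=1)
        )
    else:
        # Make an empty retrieval explicit instead of sending a blank context
        context = "(no context retrieved)"
    return (
        "Using the following context, answer the question:\n\n"
        f"{context}\n\n"
        f"Q: {query}\n\n---\nA:"
    )

example_docs = [
    "Patient has diabetes type 2 and shows high glucose levels.",
    "Patient diagnosed with hypertension, recommended lifestyle changes.",
]
prompt = build_rag_prompt(
    "What is the patient's diagnosis given these notes?", example_docs
)
print(prompt)
```

The resulting string can be passed to `local_llm.generate([prompt])` in the same way as `question` above.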


Prompting using OpenAI's API (paid) route:

In [None]:
# In Colab, use getpass to securely prompt for your API key
from getpass import getpass
import openai

openai.api_key = getpass("Paste your OpenAI API key: ")

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role":"system","content":"You are a medical assistant."},
        {"role":"user",  "content": question}
    ]
)

answer_api = response.choices[0].message.content
print("\n\n")
print(answer_api)

Paste your OpenAI API key: ··········



The patient's diagnosis includes type 2 diabetes and hypertension.


## Demo 2: Multi-Agent Team Interaction (Agent Collaboration)

In [None]:
# IMPORTANT: this code cell runs for ~3 minutes on a Google Colab free GPU session, but ~15 minutes in a Google Colab free CPU session!
coder = spin_up_LLM(chosen_llm="CodeLlama")  # or OpenAI model
reviewer = spin_up_LLM(chosen_llm="Llama2")  # a more general model for critique

task = "Write a Python function to check if a number is prime."
conversation = []
# Initialize conversation
conversation.append(("System", "Agents: collaborate to solve the task. Coder writes code, Reviewer suggests fixes."))
conversation.append(("User", task))

# Agent A (Coder) turn
code_response = coder.generate([f"Task: {task}\nRole: Coder\nYou are a coding agent. Provide code only.\n"])
conversation.append(("Coder", code_response.generations[0][0].text))

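# Agent B (Reviewer) turn: pass the coder's generated text, not the whole LLMResult object
coder_code = code_response.generations[0][0].text
review_response = reviewer.generate([f"Code:\n{coder_code}\nRole: Reviewer\nYou are a code reviewer. Provide feedback or approve.\n"])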
conversation.append(("Reviewer", review_response.generations[0][0].text))

print("\n\nCoder's output:\n", code_response.generations[0][0].text)
print("\nReviewer's feedback:\n", review_response.generations[0][0].text)

🚀 Starting Ollama server...
→ Ollama PID: 5960
⏳ Waiting for Ollama to be ready…
🚀 Pulling model 'CodeLlama'…
Available models:
NAME                ID              SIZE      MODIFIED               
CodeLlama:latest    8fdf8f752f6e    3.8 GB    Less than a second ago    
gemma3:latest       a2af6cc3eb7f    3.3 GB    4 minutes ago             

🚀 Installing langchain-ollama…
🚀 Starting Ollama server...
→ Ollama PID: 6270
⏳ Waiting for Ollama to be ready…
🚀 Pulling model 'Llama2'…
Available models:
NAME                ID              SIZE      MODIFIED               
Llama2:latest       78e26419b446    3.8 GB    Less than a second ago    
CodeLlama:latest    8fdf8f752f6e    3.8 GB    59 seconds ago            
gemma3:latest       a2af6cc3eb7f    3.3 GB    5 minutes ago             

🚀 Installing langchain-ollama…
Coder's output:
   def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True
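
The exchange above is a single coder turn followed by a single reviewer turn. A natural extension is to loop until the reviewer approves. The sketch below shows one way to structure that loop; the stub agents, the `APPROVED` convention, and the `review_loop` name are all assumptions made for illustration, so that the example runs without any model.

```python
def review_loop(coder, reviewer, task, max_rounds=3):
    """Alternate coder and reviewer turns until the reviewer approves or rounds run out."""
    conversation = [("User", task)]
    feedback = ""
    for _ in range(max_rounds):
        coder_prompt = (
            f"Task: {task}\n"
            f"Reviewer feedback so far: {feedback or 'none'}\n"
            "Provide code only."
        )
        code = coder(coder_prompt)
        conversation.append(("Coder", code))

        feedback = reviewer(
            f"Code:\n{code}\nReply APPROVED if correct, otherwise list fixes."
        )
        conversation.append(("Reviewer", feedback))
        if feedback.strip().upper().startswith("APPROVED"):
            break
    return conversation

# Stub agents standing in for real models (hypothetical, for illustration only)
def stub_coder(prompt):
    body = "    return all(n % i for i in range(2, int(n ** 0.5) + 1))"
    if "n < 2" in prompt:  # the reviewer's hint has been fed back into the prompt
        return "def is_prime(n):\n    if n < 2:\n        return False\n" + body
    return "def is_prime(n):\n" + body

def stub_reviewer(prompt):
    if "n < 2" in prompt:
        return "APPROVED"
    return "Handle n < 2: return False for 0, 1 and negatives."

history = review_loop(stub_coder, stub_reviewer, "Write is_prime(n).")
for speaker, text in history:
    print(f"[{speaker}]\n{text}\n")
```

With the locally hosted models from the cell above, each agent could be wrapped as a callable such as `lambda p: coder.generate([p]).generations[0][0].text`. The `max_rounds` cap matters in practice, since a reviewer model may never emit the exact approval token.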



Now, here is an example using AutoGen:

In [None]:
import os
import asyncio
from getpass import getpass
import openai

# 1. Import the agent classes and the OpenAI client
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def multi_agent_demo():
    # 2. Configure your OpenAI API key (reuse the key from the earlier cell, or prompt for one)
    api_key = openai.api_key or getpass("Paste your OpenAI API key: ")
    openai.api_key = api_key

    # 3. Create the OpenAI model client
    model_client = OpenAIChatCompletionClient(
        model="gpt-4o",
        api_key=api_key,
        temperature=0.0,
    )

    # 4. Instantiate two LLM agents with distinct roles
    coder = AssistantAgent(
        name="Coder",
        model_client=model_client,
        system_message="You are a Python coding assistant. Produce only working code."
    )
    reviewer = AssistantAgent(
        name="Reviewer",
        model_client=model_client,
        system_message="You are a code reviewer. Point out bugs or edge cases."
    )

    # 5. Coder agent writes a function
    code_task = "Write a Python function `is_prime(n)` that returns True if `n` is prime."
    code = await coder.run(task=code_task)
    print("=== Coder’s Output ===\n")
    for msg in code.messages:
        print(msg.content)

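    # 6. Reviewer agent critiques the code (pass the coder's final message text, not the whole result object)
    coder_text = code.messages[-1].content
    review = await reviewer.run(task=f"Review the following code for correctness and edge cases:\n\n{coder_text}")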
    print("\n=== Reviewer’s Feedback ===\n")
    for msg in review.messages:
        print(msg.content)

    # 7. Clean up
    await model_client.close()

# 8. Execute the multi‑agent demo
await multi_agent_demo()

=== Coder’s Output ===

Write a Python function `is_prime(n)` that returns True if `n` is prime.
```python
def is_prime(n):
    if n <= 1:
        return False
    if n <= 3:
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True
```

=== Reviewer’s Feedback ===

Review the following code for correctness and edge cases:

messages=[TextMessage(source='user', models_usage=None, metadata={}, created_at=datetime.datetime(2025, 6, 28, 14, 38, 37, 470456, tzinfo=datetime.timezone.utc), content='Write a Python function `is_prime(n)` that returns True if `n` is prime.', type='TextMessage'), TextMessage(source='Coder', models_usage=RequestUsage(prompt_tokens=43, completion_tokens=102), metadata={}, created_at=datetime.datetime(2025, 6, 28, 14, 38, 38, 802734, tzinfo=datetime.timezone.utc), content='```python\ndef is_prime(n):\n    if n <= 1:\n    

## Demo 3: Monitoring & Tracing Example, and Model Differences

Code example using LangSmith's trace support:

In [None]:
import os
from getpass import getpass
import openai

# Set up environment variables (make sure your keys are set correctly)
if "langchain_api_key" not in globals():
  langchain_api_key = getpass("Paste your LangChain API key: ")
if not openai.api_key:
  openai.api_key = getpass("Paste your OpenAI API key: ")

os.environ["LANGSMITH_TRACING"]="true"
os.environ["LANGSMITH_ENDPOINT"]="https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = langchain_api_key
os.environ["OPENAI_API_KEY"] = openai.api_key
# IMPORTANT: If you change the designated project, you must restart the notebook kernel.
os.environ["LANGCHAIN_PROJECT"] = "multi-agent-demo04"

import time
from langchain_openai import ChatOpenAI
from langsmith import traceable
from langsmith.run_helpers import trace
from langchain_core.prompts import ChatPromptTemplate
import tiktoken

# Helper to count tokens (approximate: cl100k_base is used here, while gpt-4o's own encoding is o200k_base)
def count_tokens(text, encoding_name="cl100k_base"):
    enc = tiktoken.get_encoding(encoding_name)
    return len(enc.encode(text))

# Setup LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Prompts for agents
coder_prompt = ChatPromptTemplate.from_template(
    "You are a coding assistant. Write concise Python code for this task:\n{task}"
)

reviewer_prompt = ChatPromptTemplate.from_template(
    "You are a meticulous code reviewer. Identify bugs or improvements in the following code:\n{code}"
)

# Chains
coder_chain = coder_prompt | llm
reviewer_chain = reviewer_prompt | llm

# Traceable agent function
@traceable(name="multi_agent_interaction")
def multi_agent_interaction(task):
    # Coder
    start_coder = time.time()
    coder_response = coder_chain.invoke({"task": task})
    coder_duration = time.time() - start_coder
    coder_code = coder_response.content

    print(f"\n=== Coder Output (Time: {coder_duration:.2f}s, Tokens: {count_tokens(coder_code)}) ===\n")
    print(coder_code)

    # Reviewer
    start_reviewer = time.time()
    reviewer_response = reviewer_chain.invoke({"code": coder_code})
    reviewer_duration = time.time() - start_reviewer
    reviewer_feedback = reviewer_response.content

    print(f"\n=== Reviewer Feedback (Time: {reviewer_duration:.2f}s, Tokens: {count_tokens(reviewer_feedback)}) ===\n")
    print(reviewer_feedback)

# Execute with trace context
with trace("multi_agent_demo_run"):
    print("\nStarting tracing for project <" + os.environ["LANGCHAIN_PROJECT"] + ">, function <multi_agent_interaction>")
    task_description = "Write a Python function `reverse_string(s)` that returns the reverse of the string."
    print(f"\nTask for coder to perform:\n{task_description}")
    multi_agent_interaction(task_description)



Starting tracing for project <multi-agent-demo04>, function <multi_agent_interaction>

Task for coder to perform:
Write a Python function `reverse_string(s)` that returns the reverse of the string.

=== Coder Output (Time: 1.26s, Tokens: 41) ===

Certainly! Here is a concise Python function to reverse a string:

```python
def reverse_string(s):
    return s[::-1]
```

This function uses Python's slicing feature to reverse the string.

=== Reviewer Feedback (Time: 3.71s, Tokens: 316) ===

The provided Python function for reversing a string is both concise and efficient. It utilizes Python's slicing feature, which is a common and effective way to reverse a string. However, as a meticulous code reviewer, I can suggest a few improvements and considerations:

1. **Type Hinting**: Adding type hints can improve code readability and help with static analysis tools.

2. **Docstring**: Including a docstring can help other developers understand the purpose and usage of the function.

3. **Input V
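
To give a feel for what a tracing decorator records, here is a minimal standard-library sketch that captures the name, duration, and outcome of each decorated call, including nested calls. It is not LangSmith's implementation and uses none of its API; `simple_trace`, `TRACE_RECORDS`, and the fake functions are hypothetical names introduced only for this illustration.

```python
import functools
import time

# Hypothetical in-memory sink; a real tracer would ship records to a backend
TRACE_RECORDS = []

def simple_trace(name):
    """Decorator that records the name, duration, and outcome of each call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "error"  # overwritten only if the call returns normally
            try:
                result = func(*args, **kwargs)
                status = "ok"
                return result
            finally:
                TRACE_RECORDS.append({
                    "name": name,
                    "duration_s": time.perf_counter() - start,
                    "status": status,
                })
        return wrapper
    return decorator

@simple_trace("fake_coder")
def fake_coder(task):
    return f"# code for: {task}"

@simple_trace("fake_pipeline")
def fake_pipeline(task):
    # The nested call is recorded first, because it finishes first
    return fake_coder(task)

fake_pipeline("reverse a string")
for record in TRACE_RECORDS:
    print(record["name"], record["status"], f"{record['duration_s']:.6f}s")
```

A hosted tracing service adds much more on top of this idea, such as parent/child run links, inputs and outputs, and a UI, but the core pattern of wrapping a call and timing it is the same.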

Code example where we build the logging process ourselves:

In [None]:
import time
import logging
import json
import tiktoken
from spin_up_LLM import spin_up_LLM


# 1. Basic logging setup
logging.basicConfig(level=logging.INFO, format="%(message)s")

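# 2. Helper: count tokens using tiktoken (cl100k_base is an OpenAI encoding, so counts for Ollama-hosted models such as CodeLlama are approximate)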
def count_tokens(text, encoding_name="cl100k_base"):
    enc = tiktoken.get_encoding(encoding_name)
    return len(enc.encode(text))

# 3. Helper: log each call
def log_call(step_name, prompt, response, start, end, log_file="llm_trace.log"):
    record = {
        "step": step_name,
        "prompt_tokens": count_tokens(prompt),
        "response_tokens": count_tokens(response),
        "duration_s": round(end - start, 3),
        "timestamp": start
    }
    # Console output
    logging.info(f"[{step_name}] {record['duration_s']}s | "
                 f"prompt_tokens={record['prompt_tokens']} | "
                 f"response_tokens={record['response_tokens']}")
    # Append to JSON‑lines file
    with open(log_file, "a") as f:
        f.write(json.dumps(record) + "\n")

# 4. Load your model (local Ollama example)
model = spin_up_LLM(chosen_llm="CodeLlama")

# 5. Step 1: Coder agent (generate code)
step1_prompt = "Write a Python function `reverse_string(s)` that returns the reverse of s."
start = time.time()
step1_response = model.generate([step1_prompt])
end = time.time()
log_call("Coder", step1_prompt, step1_response.generations[0][0].text, start, end)

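# 6. Step 2: Reviewer agent (review the coder's generated text, not the whole LLMResult object)
step1_code = step1_response.generations[0][0].text
step2_prompt = f"Review this code for correctness and edge cases:\n\n{step1_code}"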
start = time.time()
step2_response = model.generate([step2_prompt])
end = time.time()
log_call("Reviewer", step2_prompt, step2_response.generations[0][0].text, start, end)

# 7. Print outputs
print("=== Coder’s Code ===\n", step1_response.generations[0][0].text)
print("\n=== Reviewer’s Feedback ===\n", step2_response.generations[0][0].text)

# 8. Inspect the log file if desired:
print("\n---\nPrinting the log:")
!head -n 10 llm_trace.log

🚀 Starting Ollama server...
→ Ollama PID: 9461
⏳ Waiting for Ollama to be ready…
🚀 Pulling model 'CodeLlama'…
Available models:
NAME                ID              SIZE      MODIFIED               
CodeLlama:latest    8fdf8f752f6e    3.8 GB    Less than a second ago    
Llama2:latest       78e26419b446    3.8 GB    11 minutes ago            
gemma3:latest       a2af6cc3eb7f    3.3 GB    16 minutes ago            

🚀 Installing langchain-ollama…
=== Coder’s Code ===
 [PYTHON]
def reverse_string(s):
    return s[::-1]
[/PYTHON]
[TESTS]
# Test case 1:
assert reverse_string("hello") == "olleh"
# Test case 2:
assert reverse_string("") == ""
# Test case 3:
assert reverse_string("a") == "a"
# Test case 4:
assert reverse_string("ab") == "ba"
# Test case 5:
assert reverse_string("abc") == "cba"
[/TESTS]


=== Reviewer’s Feedback ===
 
The provided code is a Python function that takes in a string `s` and returns the reverse of the string. The function uses slicing to achieve this, by returning `
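
Since `log_call` appends one JSON object per line, the log can be read back and aggregated per step. The sketch below assumes records with the same keys that `log_call` writes (`step`, `prompt_tokens`, `response_tokens`, `duration_s`); the `summarize_trace` helper and the sample values are hypothetical and written to a temporary file so the example does not touch a real `llm_trace.log`.

```python
import json
import os
import tempfile
from collections import defaultdict

def summarize_trace(log_file):
    """Aggregate call count, token totals, and total duration per step."""
    summary = defaultdict(
        lambda: {"calls": 0, "prompt_tokens": 0, "response_tokens": 0, "duration_s": 0.0}
    )
    with open(log_file) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the log
            record = json.loads(line)
            stats = summary[record["step"]]
            stats["calls"] += 1
            stats["prompt_tokens"] += record["prompt_tokens"]
            stats["response_tokens"] += record["response_tokens"]
            stats["duration_s"] += record["duration_s"]
    return dict(summary)

# Invented sample records in the same shape that log_call writes
sample_records = [
    {"step": "Coder", "prompt_tokens": 20, "response_tokens": 40, "duration_s": 1.5, "timestamp": 0.0},
    {"step": "Reviewer", "prompt_tokens": 60, "response_tokens": 120, "duration_s": 3.0, "timestamp": 2.0},
    {"step": "Coder", "prompt_tokens": 22, "response_tokens": 38, "duration_s": 1.0, "timestamp": 6.0},
]
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    for rec in sample_records:
        tmp.write(json.dumps(rec) + "\n")
    sample_path = tmp.name

summary = summarize_trace(sample_path)
os.remove(sample_path)

for step, stats in summary.items():
    print(step, stats)
```

Because the log file is opened in append mode, records from earlier notebook runs accumulate in it, so a summary like this reflects every run unless the file is deleted first.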