# Code for the Bootcamp Talk: Advanced LLM & Agent Systems Bootcamp
By: Lior Gazit.  
Repo: [agentic_actions_locally_hosted](https://github.com/LiorGazit/agentic_actions_locally_hosted)  

<a target="_blank" href="https://colab.research.google.com/github/LiorGazit/agentic_actions_locally_hosted/blob/main/agents_building_workshop/Codes_for_the_Bootcamp_talk.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a> (choose a GPU Colab runtime for the fastest execution)  

```
Disclaimer: The content and ideas presented in this notebook are solely those of the author, Lior Gazit, and do not represent the views or intellectual property of the author's employer.
```

Install the dependencies:

In [None]:
!pip -q install sentence-transformers faiss-cpu langchain tiktoken langsmith langchain_openai -U "autogen-agentchat" "autogen-ext[openai]"

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m31.3/31.3 MB[0m [31m70.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m363.0/363.0 kB[0m [31m31.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m65.4/65.4 kB[0m [31m5.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m108.2/108.2 kB[0m [31m10.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m97.3/97.3 kB[0m [31m9.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m438.1/438.1 kB[0m [31m33.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m363.4/363.4 MB[0m [31m4.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.8/13.8 MB[0m [31m66.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

If this notebook is run outside a clone of the repo, fetch the required helper module from the remote repo:

In [None]:
import os
import requests

# If the module isn't already present (e.g. in Colab), fetch it from GitHub
if not os.path.exists("spin_up_LLM.py"):
    url = "https://raw.githubusercontent.com/LiorGazit/agentic_actions_locally_hosted/refs/heads/main/spin_up_LLM.py"
    resp = requests.get(url, timeout=30)  # fail fast instead of hanging on a stalled connection
    resp.raise_for_status()
    with open("spin_up_LLM.py", "w") as f:
        f.write(resp.text)
    print("Downloaded spin_up_LLM.py from GitHub")

Downloaded spin_up_LLM.py from GitHub


## Demo 1: RAG Pipeline with Chained Prompt Processing

Setting up the Vector Store and RAG pipeline:

In [None]:
# Import required libraries
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

# Example documents (could be clinical notes, financial filings, etc.)
documents = [
    "Patient has diabetes type 2 and shows high glucose levels.",
    "Recent financial filings show revenue growth despite supply chain issues.",
    "Patient diagnosed with hypertension, recommended lifestyle changes.",
    "The company's earnings call mentioned concerns over increased production costs.",
]

# Step 1: Create embeddings using SentenceTransformer
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')  # lightweight embedding model

# Generate embeddings for the documents
document_embeddings = embedding_model.encode(documents)

# Step 2: Setup FAISS vector store
dimension = document_embeddings.shape[1]
faiss_index = faiss.IndexFlatL2(dimension)
faiss_index.add(document_embeddings)

# Step 3: Define the retriever(query) function
def retriever(query, top_k=2):
    # Generate embedding for the query
    query_embedding = embedding_model.encode([query])

    # Perform the similarity search in the FAISS index
    distances, indices = faiss_index.search(query_embedding, top_k)

    # Retrieve the top_k most similar documents
    retrieved_docs = [documents[idx] for idx in indices[0]]

    return retrieved_docs

# Example usage of the retriever
query = "What did the company say about production costs?"
context_docs = retriever(query)
print("\n\nRetrieved documents for context:\n", context_docs)






Retrieved documents for context:
 ["The company's earnings call mentioned concerns over increased production costs.", 'Recent financial filings show revenue growth despite supply chain issues.']


Prompting using a locally hosted LLM via Ollama:

In [None]:
from spin_up_LLM import spin_up_LLM
from langchain_core.prompts import ChatPromptTemplate

query = "What is the patient's diagnosis given these notes?"
context_docs = retriever(query)
question = f"Using the following context, answer the question:\n\n{context_docs}\n\nQ: {query}\n\n---\nA:"
local_llm = spin_up_LLM(chosen_llm="gemma3")

answer_local = local_llm.generate([question])
print(answer_local.generations[0][0].text)

🚀 Installing Ollama...
🚀 Starting Ollama server...
→ Ollama PID: 1335
⏳ Waiting for Ollama to be ready…
🚀 Pulling model 'gemma3'…
Available models:
NAME             ID              SIZE      MODIFIED               
gemma3:latest    a2af6cc3eb7f    3.3 GB    Less than a second ago    

🚀 Installing langchain-ollama…
A: The patient has diabetes type 2 and hypertension.
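
The prompt in the cell above interpolates `context_docs` directly, so the model sees the Python list representation, brackets and quotes included. One possible alternative, sketched here under the assumption that a numbered plain-text context is easier for the model to read, is a small formatting helper. The name `build_rag_prompt` is hypothetical and is not part of `spin_up_LLM` or LangChain.

```python
def build_rag_prompt(query, docs):
    """Format retrieved snippets as a numbered list ahead of the question.

    Hypothetical helper: it keeps the same wording as the notebook's prompt
    but renders each document on its own numbered line.
    """
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs, start=1))
    return (
        "Using the following context, answer the question:\n\n"
        f"{context}\n\n"
        f"Q: {query}\n\n---\nA:"
    )


# Example with two of the notebook's documents
example_docs = [
    "Patient has diabetes type 2 and shows high glucose levels.",
    "Patient diagnosed with hypertension, recommended lifestyle changes.",
]
prompt = build_rag_prompt(
    "What is the patient's diagnosis given these notes?", example_docs
)
print(prompt)
```

The resulting string can be passed to `local_llm.generate([prompt])` in place of `question`.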


Prompting using OpenAI's API (paid) route:

In [None]:
# In Colab, use getpass to securely prompt for your API key
from getpass import getpass
import openai

openai.api_key = getpass("Paste your OpenAI API key: ")

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role":"system","content":"You are a medical assistant."},
        {"role":"user",  "content": question}
    ]
)

answer_api = response.choices[0].message.content
print(answer_api)

Paste your OpenAI API key: ··········
The patient's diagnosis is type 2 diabetes and hypertension.


## Demo 2: Multi-Agent Team Interaction (Agent Collaboration)

In [None]:
# IMPORTANT: this code cell runs for ~3 minutes on a Google Colab free GPU session, but ~15 minutes in a Google Colab free CPU session!
coder = spin_up_LLM(chosen_llm="CodeLlama")  # or OpenAI model
reviewer = spin_up_LLM(chosen_llm="Llama2")  # a more general model for critique

task = "Write a Python function to check if a number is prime."
conversation = []
# Initialize conversation
conversation.append(("System", "Agents: collaborate to solve the task. Coder writes code, Reviewer suggests fixes."))
conversation.append(("User", task))

# Agent A (Coder) turn
code_response = coder.generate([f"Task: {task}\nRole: Coder\nYou are a coding agent. Provide code only.\n"])
conversation.append(("Coder", code_response.generations[0][0].text))

# Agent B (Reviewer) turn: pass the generated code text, not the whole LLMResult object
review_response = reviewer.generate([f"Code:\n{code_response.generations[0][0].text}\nRole: Reviewer\nYou are a code reviewer. Provide feedback or approve.\n"])
conversation.append(("Reviewer", review_response.generations[0][0].text))

print("Coder's output:\n", code_response.generations[0][0].text)
print("\nReviewer's feedback:\n", review_response.generations[0][0].text)

🚀 Starting Ollama server...
→ Ollama PID: 1653
⏳ Waiting for Ollama to be ready…
🚀 Pulling model 'CodeLlama'…
Available models:
NAME                ID              SIZE      MODIFIED               
CodeLlama:latest    8fdf8f752f6e    3.8 GB    Less than a second ago    
gemma3:latest       a2af6cc3eb7f    3.3 GB    About a minute ago        

🚀 Installing langchain-ollama…
🚀 Starting Ollama server...
→ Ollama PID: 1896
⏳ Waiting for Ollama to be ready…
🚀 Pulling model 'Llama2'…
Available models:
NAME                ID              SIZE      MODIFIED               
Llama2:latest       78e26419b446    3.8 GB    Less than a second ago    
CodeLlama:latest    8fdf8f752f6e    3.8 GB    59 seconds ago            
gemma3:latest       a2af6cc3eb7f    3.3 GB    2 minutes ago             

🚀 Installing langchain-ollama…
Coder's output:
 
def is_prime(n):
    """ Checks if the input number 'n' is a prime number"""
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):
      

Now, here is an example using AutoGen:

In [None]:
!pip -q install -U "autogen-agentchat" "autogen-ext[openai]"



In [None]:
import os
import asyncio
from getpass import getpass

import openai

# 1. Import the agent classes and the OpenAI client
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def multi_agent_demo():
    # 2. Configure your OpenAI API key: reuse the key entered earlier, or prompt for one
    api_key = openai.api_key or getpass("Paste your OpenAI API key: ")

    # 3. Create the OpenAI model client
    model_client = OpenAIChatCompletionClient(
        model="gpt-4o",
        api_key=api_key,
        temperature=0.0,
    )

    # 4. Instantiate two LLM agents with distinct roles
    coder = AssistantAgent(
        name="Coder",
        model_client=model_client,
        system_message="You are a Python coding assistant. Produce only working code."
    )
    reviewer = AssistantAgent(
        name="Reviewer",
        model_client=model_client,
        system_message="You are a code reviewer. Point out bugs or edge cases."
    )

    # 5. Coder agent writes a function
    code_task = "Write a Python function `is_prime(n)` that returns True if `n` is prime."
    code = await coder.run(task=code_task)
    print("=== Coder’s Output ===\n")
    for msg in code.messages:
        print(msg.content)

    # 6. Reviewer agent critiques the code (send only the Coder's final message text, not the whole TaskResult)
    review = await reviewer.run(task=f"Review the following code for correctness and edge cases:\n\n{code.messages[-1].content}")
    print("\n=== Reviewer’s Feedback ===\n")
    for msg in review.messages:
        print(msg.content)

    # 7. Clean up
    await model_client.close()

# 8. Execute the multi‑agent demo
await multi_agent_demo()

=== Coder’s Output ===

Write a Python function `is_prime(n)` that returns True if `n` is prime.
```python
def is_prime(n):
    if n <= 1:
        return False
    if n <= 3:
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True
```

=== Reviewer’s Feedback ===

Review the following code for correctness and edge cases:

messages=[TextMessage(source='user', models_usage=None, metadata={}, created_at=datetime.datetime(2025, 6, 10, 3, 23, 53, 111247, tzinfo=datetime.timezone.utc), content='Write a Python function `is_prime(n)` that returns True if `n` is prime.', type='TextMessage'), TextMessage(source='Coder', models_usage=RequestUsage(prompt_tokens=43, completion_tokens=102), metadata={}, created_at=datetime.datetime(2025, 6, 10, 3, 23, 54, 508391, tzinfo=datetime.timezone.utc), content='```python\ndef is_prime(n):\n    if n <= 1:\n      
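
Both versions of this demo stop after one Coder turn and one Reviewer turn. A natural extension, sketched below, is to feed the reviewer's feedback back to the coder until the reviewer approves or a round limit is reached. Everything here is an assumption made for illustration: `review_loop` is a hypothetical helper, the agents are plain Python callables (stubs standing in for LLM calls), and the convention that the reviewer replies with text starting with `APPROVE` is invented for this sketch.

```python
def review_loop(coder, reviewer, task, max_rounds=3):
    """Alternate coder and reviewer turns until approval or max_rounds.

    `coder` and `reviewer` are callables that take a string and return a
    string; in practice they would wrap LLM calls.
    """
    transcript = []
    feedback = ""
    code = ""
    for _ in range(max_rounds):
        # After the first round, include the reviewer's feedback in the prompt
        prompt = task if not feedback else (
            f"{task}\n\nReviewer feedback to address:\n{feedback}"
        )
        code = coder(prompt)
        transcript.append(("Coder", code))

        feedback = reviewer(code)
        transcript.append(("Reviewer", feedback))

        # Assumed convention: the reviewer starts its reply with APPROVE
        if feedback.strip().upper().startswith("APPROVE"):
            break
    return code, transcript


# Stub agents standing in for LLM calls (assumed behavior)
def stub_coder(prompt):
    if "feedback" in prompt:
        return ("def is_prime(n):\n"
                "    return n > 1 and all(n % i for i in range(2, int(n ** 0.5) + 1))")
    return "def is_prime(n):\n    return all(n % i for i in range(2, n))"


def stub_reviewer(code):
    return "APPROVE" if "n > 1" in code else "Handle n < 2: 0 and 1 are not prime."


final_code, transcript = review_loop(stub_coder, stub_reviewer, "Write is_prime(n).")
for role, text in transcript:
    print(f"{role}:\n{text}\n")
```

Capping the number of rounds keeps cost and latency bounded when the two agents never converge.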

## Demo 3: Monitoring & Tracing Example, and Model Differences