# 🚀 Hero Project: On-Device AI Agent

**This notebook loads and uses the real Qwen3-4B model!**

This notebook demonstrates a complete local AI agent system using:
- **Atomic Agents Framework** for agent orchestration
- **Qwen3-4B-Instruct** for local text generation, in GGUF format
- **ChromaDB** for vector storage and RAG
- **Real model loading**: downloads the model on first run, then loads it from disk

## 🎯 What You'll Learn

1. **Real Model Loading**: Download and load the actual Qwen3-4B model
2. **RAG Q&A**: Answer questions using a knowledge base
3. **Task Automation**: Execute file operations and system tasks
4. **Agent Orchestration**: How to build composable AI agents
5. **Local Deployment**: Complete on-device AI system

## 🚀 Quick Setup

This notebook will automatically download the model if needed!


In [None]:
# KERNEL CHECK - Make sure you're using the correct Python environment
import sys
import os

print("🐍 Python Environment Check")
print("=" * 50)
print(f"Python version: {sys.version}")
print(f"Python executable: {sys.executable}")
print(f"Current working directory: {os.getcwd()}")

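# Change to the project root (the parent of notebooks/).
# Note: __file__ is not defined inside a notebook, so we start from the
# current working directory. Adjust this path based on your setup.
notebook_dir = os.getcwd()
if os.path.exists('src/model_loader.py'):
    hero_project_dir = notebook_dir  # already at the project root (e.g. cell re-run)
else:
    hero_project_dir = os.path.abspath(os.path.join(notebook_dir, '..'))
os.chdir(hero_project_dir)
print(f"✅ Changed to: {os.getcwd()}")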

# Check if we're in the right directory
if not os.path.exists('src/model_loader.py'):
    print("⚠️ Warning: Not in the hero-project directory. Please run this notebook from the hero-project directory")
else:
    print("✅ In correct directory")

# Add src to path
sys.path.append('src')
sys.path.append('.')
print("✅ Added src and current directory to Python path")


🐍 Python Environment Check
Python version: 3.13.2 (main, Feb  4 2025, 14:51:09) [Clang 16.0.0 (clang-1600.0.26.6)]
Python executable: /Users/freddyayala/Documents/GitHub/slm-ebook/companion-code/hero-project/venv/bin/python
Current working directory: /Users/freddyayala/Documents/GitHub/slm-ebook/companion-code/hero-project/notebooks
✅ Changed to: /Users/freddyayala/Documents/GitHub/slm-ebook/companion-code/hero-project
✅ In correct directory
✅ Added src and current directory to Python path


In [2]:
# Import all required libraries
import warnings
warnings.filterwarnings("ignore")

import torch
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import time
import os

# Import our custom modules
from src.model_loader import Qwen3VLLoader
from src.vector_store import VectorStore
from src.agents.rag_agent import RAGAgent
from src.agents.task_agent import TaskAgent

print("✅ All imports successful!")
print(f"   PyTorch: {torch.__version__}")
print(f"   Device: {'CUDA' if torch.cuda.is_available() else 'CPU'}")
if torch.cuda.is_available():
    print(f"   GPU: {torch.cuda.get_device_name(0)}")


✅ All imports successful!
   PyTorch: 2.9.0
   Device: CPU


## 🧠 STEP 1: LOAD THE REAL QWEN3-4B MODEL

This step downloads the model on first run (about 2.5 GB, which may take several minutes) and then loads it. Later runs reuse the file already saved in `./models`.
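
The "download only if missing" behavior can be sketched as follows. This is a minimal illustration, not the code in `src/model_loader.py`: the helper names `ensure_model_file`, `hf_download`, and `load_gguf` are hypothetical, and the sketch assumes the `huggingface_hub` and `llama-cpp-python` packages are what sit underneath the loader. Both are imported lazily, so the file check itself runs without them.

```python
from pathlib import Path
from typing import Callable, Tuple


def ensure_model_file(
    model_dir: str,
    filename: str,
    download: Callable[[Path], None],
) -> Tuple[Path, bool]:
    """Return (path, downloaded). `download` is called only when the file is missing."""
    path = Path(model_dir) / filename
    if path.is_file():
        return path, False
    path.parent.mkdir(parents=True, exist_ok=True)
    download(path)
    if not path.is_file():
        raise FileNotFoundError(f"download did not produce {path}")
    return path, True


def hf_download(repo_id: str) -> Callable[[Path], None]:
    """Build a downloader backed by huggingface_hub (assumed dependency)."""
    def _download(target: Path) -> None:
        from huggingface_hub import hf_hub_download  # lazy import
        hf_hub_download(
            repo_id=repo_id,
            filename=target.name,
            local_dir=str(target.parent),
        )
    return _download


def load_gguf(path: Path, n_ctx: int = 4096):
    """Load a GGUF file with llama-cpp-python (assumed dependency)."""
    from llama_cpp import Llama  # lazy import
    return Llama(model_path=str(path), n_ctx=n_ctx)


# Example usage (downloads about 2.5 GB on first run):
# path, downloaded = ensure_model_file(
#     "./models",
#     "Qwen3-4B-Instruct-2507-Q4_K_M.gguf",
#     hf_download("unsloth/Qwen3-4B-Instruct-2507-GGUF"),
# )
# llm = load_gguf(path)
```

Passing the downloader in as a function keeps the file check easy to test with a stub and makes it simple to swap in a different source for the weights.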


In [3]:
# Initialize and load the REAL model
print("🔄 Loading REAL Qwen3-4B-Instruct model...")
print("=" * 60)
print("⚠️ This may take several minutes on first run (downloading ~2.5GB model)")

model_loader = Qwen3VLLoader()

# Actually load the model (this will download if needed)
print("\n📥 Loading model (this may take a while)...")
model = model_loader.load()

# Get model information
model_info = model_loader.get_model_info()
print("\n📊 Model Information:")
for key, value in model_info.items():
    print(f"   {key}: {value}")

print("\n✅ REAL Model loaded successfully!")


🔄 Loading REAL Qwen3-4B-Instruct model...
⚠️ This may take several minutes on first run (downloading ~2.5GB model)

📥 Loading model (this may take a while)...
🔄 Loading unsloth/Qwen3-4B-Instruct-2507-GGUF in GGUF format...
✅ Using existing model: ./models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf


llama_context: n_ctx_per_seq (4096) < n_ctx_train (262144) -- the full capacity of the model will not be utilized
ggml_metal_init: skipping kernel_get_rows_bf16                     (not supported)
ggml_metal_init: skipping kernel_set_rows_bf16                     (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_c4                (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_1row              (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_l4                (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_bf16                  (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_bf16_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_bf16_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_bf16_f16                (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h64

✅ Model loaded successfully on cpu
   Model format: GGUF
   Context window: 4096 tokens

📊 Model Information:
   model_path: unsloth/Qwen3-4B-Instruct-2507-GGUF
   model_file: ./models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf
   device: cpu
   format: GGUF
   context_window: 4096
   status: loaded

✅ REAL Model loaded successfully!


In [4]:
# Test REAL text generation
print("🧪 Testing REAL text generation...")
print("=" * 50)

test_messages = [
    {"role": "user", "content": "Hello! Can you tell me about small language models?"}
]

start_time = time.time()
response = model_loader.generate_response(test_messages)
end_time = time.time()

print("🤖 REAL Model Response:")
print(response)
print(f"\n⏱️ Generation time: {end_time - start_time:.2f} seconds")
print(f"📝 Response length: {len(response)} characters")


🧪 Testing REAL text generation...
🤖 REAL Model Response:
Hello! 😊 Absolutely — let's dive into small language models!

### What Are Small Language Models?

Small language models (LLMs) are compact versions of large language models (LLMs) — like GPT-3, LLaMA, or BERT — that are designed to be more efficient in terms of **size, speed, and computational requirements**. Instead of having billions of parameters (like GPT-3 with 175 billion parameters), small language models typically have **tens to hundreds of millions of parameters**, making them much lighter and easier to run on devices like smartphones, laptops, or even edge hardware.

---

### Key Characteristics of Small Language Models:

1. **Smaller Size & Lower Resource Use**  
   - They require significantly less RAM, GPU, or cloud computing power.
   - Can be deployed locally on devices (e.g., your phone or laptop), enabling privacy and offline use.

2. **Faster Inference**  
   - Generate responses more quickly than large models.

## 🗄️ STEP 2: SETUP VECTOR STORE WITH CHROMADB

Create a knowledge base using ChromaDB for RAG (Retrieval-Augmented Generation).
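
To make the idea of vector search concrete, here is a toy sketch of nearest-neighbor retrieval by cosine similarity. It is an illustration under simplifying assumptions, not how `VectorStore` or ChromaDB is implemented: word counts stand in for a real sentence-embedding model, and the function names `embed`, `cosine`, and `search` are hypothetical. The returned dictionary mirrors the `documents`, `metadatas`, and `count` keys used later in this notebook.

```python
import math
import re
from collections import Counter
from typing import Dict


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, docs: Dict[str, str], n_results: int = 2) -> dict:
    """Rank documents by similarity to the query and keep the top n_results."""
    q = embed(query)
    ranked = sorted(
        docs.items(),
        key=lambda item: cosine(q, embed(item[1])),
        reverse=True,
    )
    top = ranked[:n_results]
    return {
        "documents": [text for _, text in top],
        "metadatas": [{"filename": name} for name, _ in top],
        "count": len(top),
    }
```

A real embedding model places texts with similar meaning close together even when they share no words, which a word-count vector cannot do. The ranking step is the same idea in both cases.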


In [5]:
# Initialize vector store
print("🔄 Setting up ChromaDB vector store...")
print("=" * 50)

vector_store = VectorStore()

# Create sample documents
print("\n📚 Creating sample knowledge base...")
vector_store.save_sample_documents()

# Get collection info
collection_info = vector_store.get_collection_info()
print("\n📊 Vector Store Information:")
for key, value in collection_info.items():
    print(f"   {key}: {value}")

print("\n✅ Vector store ready!")


🔄 Setting up ChromaDB vector store...
✅ Vector store initialized: knowledge_base
   Persist directory: ./chroma_db
   Embedding model: all-MiniLM-L6-v2

📚 Creating sample knowledge base...
✅ Created 4 sample documents in ./data/knowledge_base
✅ Added 4 documents to vector store
✅ Loaded 4 documents from ./data/knowledge_base

📊 Vector Store Information:
   collection_name: knowledge_base
   document_count: 4
   persist_directory: ./chroma_db
   embedding_model: all-MiniLM-L6-v2

✅ Vector store ready!


In [6]:
# Test vector search
print("🔍 Testing vector search...")
print("=" * 40)

test_query = "What are small language models?"
search_results = vector_store.search(test_query, n_results=2)

print(f"Query: {test_query}")
print(f"Found {search_results['count']} relevant documents:")
print()

for i, (doc, metadata) in enumerate(zip(search_results['documents'], search_results['metadatas']), 1):
    print(f"📄 Document {i}:")
    print(f"   Source: {metadata.get('filename', 'Unknown')}")
    print(f"   Content: {doc[:200]}...")
    print()

print("✅ Vector search working!")


🔍 Testing vector search...
Query: What are small language models?
Found 2 relevant documents:

📄 Document 1:
   Source: small_language_models.txt
   Content: Small Language Models (SLMs) are compact versions of large language models designed 
            to run efficiently on local devices. They typically have fewer than 10 billion parameters 
            ...

📄 Document 2:
   Source: vision_language_models.txt
   Content: Vision-Language Models (VLMs) are AI models that can process and understand both 
            visual and textual information. They can analyze images, answer questions about visual 
            conten...

✅ Vector search working!


## 🤖 STEP 3: TEST RAG AGENT WITH REAL MODEL

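Test the RAG agent with the actual loaded model. The agent searches the knowledge base for passages relevant to the question, then asks the model to generate an answer from them.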
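
The retrieve-then-generate flow can be sketched in a few lines. This is a hedged illustration, not the code in `src/agents/rag_agent.py`: `rag_answer` is a hypothetical name, the system prompt wording is an assumption, and the search and generation steps are passed in as plain functions so the sketch does not depend on the real model.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def rag_answer(
    question: str,
    search: Callable[[str, int], dict],
    generate: Callable[[List[Message]], str],
    n_results: int = 2,
) -> dict:
    """Retrieve context for the question, then ask the model to answer from it."""
    hits = search(question, n_results)
    context = "\n\n".join(hits["documents"])
    messages = [
        {
            "role": "system",
            "content": (
                "Answer using only the context below. "
                "If the context is insufficient, say so.\n\n"
                "Context:\n" + context
            ),
        },
        {"role": "user", "content": question},
    ]
    return {"answer": generate(messages), "context": context}


# Example usage with the objects from this notebook:
# result = rag_answer(
#     "What is artificial intelligence?",
#     lambda q, n: vector_store.search(q, n_results=n),
#     model_loader.generate_response,
# )
```

Returning the context alongside the answer makes it possible to inspect what the model was actually given, which helps when an answer looks wrong.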


In [7]:
# Initialize RAG agent with REAL model
print("🔄 Initializing RAG Agent with REAL model...")
print("=" * 50)

rag_agent = RAGAgent(model_loader, vector_store)
print("✅ RAG Agent initialized with REAL model!")

# Test RAG agent with a question
print("\n🧪 Testing RAG Agent with REAL responses...")
test_question = "What is artificial intelligence?"
result = rag_agent.run(test_question)

print(f"\n🤖 Question: {test_question}")
print(f"🤖 REAL Answer: {result['answer']}")
print(f"📚 Context used: {len(result['context'])} characters")

print("\n✅ RAG Agent working with REAL model!")


🔄 Initializing RAG Agent with REAL model...
✅ RAG Agent initialized
✅ RAG Agent initialized with REAL model!

🧪 Testing RAG Agent with REAL responses...
🔍 Searching knowledge base for: 'What is artificial intelligence?'
🤖 Generating response...

🤖 Question: What is artificial intelligence?
🤖 REAL Answer: Artificial Intelligence (AI) is a branch of computer science that aims to create intelligent machines capable of performing tasks that typically require human intelligence. These tasks include learning, reasoning, problem-solving, perception, and language understanding.
📚 Context used: 1201 characters

✅ RAG Agent working with REAL model!


## 🛠️ STEP 4: TEST TASK AGENT WITH REAL MODEL

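Test the Task agent with the actual loaded model. The agent lets the model decide which tool to use (here, `file_write`) and then runs that tool to carry out the instruction.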
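
One common way to build this kind of agent is to have the model reply with a JSON tool call that the agent parses and dispatches. The sketch below assumes that design; it is not taken from `src/agents/task_agent.py`. The names `run_task` and `TOOLS`, the JSON shape, and the prompt wording are all hypothetical. The return value mirrors the `result` and `tools_used` keys used in this notebook.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def file_write(path: str, content: str) -> str:
    """Write text to a file and report what was written."""
    Path(path).write_text(content, encoding="utf-8")
    return f"Wrote {len(content)} characters to {path}"


# Registry of tools the model is allowed to call (hypothetical).
TOOLS: Dict[str, Callable[..., str]] = {"file_write": file_write}


def run_task(task: str, generate: Callable[[List[dict]], str]) -> dict:
    """Ask the model for a JSON tool call, then execute the named tool."""
    messages = [
        {
            "role": "system",
            "content": (
                'Reply with JSON only: {"tool": "<name>", "args": {...}}. '
                "Available tools: " + ", ".join(TOOLS)
            ),
        },
        {"role": "user", "content": task},
    ]
    raw = generate(messages)
    try:
        call = json.loads(raw)
        tool = TOOLS[call["tool"]]
        result = tool(**call["args"])
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # Malformed JSON, unknown tool, or wrong arguments: run nothing.
        return {"result": f"Could not run a tool: {exc}", "tools_used": []}
    return {"result": result, "tools_used": [call["tool"]]}
```

Restricting calls to an explicit registry means the model can only trigger functions you have chosen to expose. A production agent would also want to restrict which paths `file_write` may touch.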


In [8]:
# Initialize Task Agent with REAL model
print("🔄 Initializing Task Agent with REAL model...")
print("=" * 50)

task_agent = TaskAgent(model_loader)
print("✅ Task Agent initialized with REAL model!")

# Test Task agent with file operations
print("\n🧪 Testing Task Agent with REAL responses...")
test_task = "Create a file called test_agent.txt with the content: Hello from the REAL AI agent!"
result = task_agent.run(test_task)

print(f"\n🤖 Task: {test_task}")
print(f"🤖 REAL Result: {result['result']}")
print(f"🔧 Tools used: {result['tools_used']}")

print("\n✅ Task Agent working with REAL model!")


🔄 Initializing Task Agent with REAL model...
✅ Task Agent initialized
✅ Task Agent initialized with REAL model!

🧪 Testing Task Agent with REAL responses...
🤖 Processing task: 'Create a file called test_agent.txt with the content: Hello from the REAL AI agent!'
🔧 Agent decided to use tool: file_write

🤖 Task: Create a file called test_agent.txt with the content: Hello from the REAL AI agent!
🤖 REAL Result: The file `test_agent.txt` has been successfully created with the content: "Hello from the REAL AI agent!".
🔧 Tools used: ['file_write']

✅ Task Agent working with REAL model!


## 🎉 FINAL DEMONSTRATION WITH REAL MODEL

Complete demonstration using the actual Qwen3-4B model.


In [9]:
# 🎉 FINAL DEMONSTRATION WITH REAL MODEL
print("🎯 HERO PROJECT: COMPLETE DEMONSTRATION WITH REAL MODEL")
print("=" * 70)

# Test 1: RAG Agent with REAL model
print("\n🔍 DEMO 1: RAG Agent with REAL Qwen3-4B")
print("-" * 50)
rag_question = "What are the benefits of small language models?"
rag_result = rag_agent.run(rag_question)
print(f"Question: {rag_question}")
print(f"REAL Answer: {rag_result['answer'][:300]}...")

# Test 2: Task Agent with REAL model
print("\n🔧 DEMO 2: Task Agent with REAL Qwen3-4B")
print("-" * 50)
task_instruction = "Create a file called hero_demo_real.txt with a summary of what we learned about AI"
task_result = task_agent.run(task_instruction)
print(f"Task: {task_instruction}")
print(f"REAL Result: {task_result['result']}")

# Test 3: Verify file creation
print("\n📁 DEMO 3: Verify File Creation")
print("-" * 50)
try:
    with open('hero_demo_real.txt', 'r') as f:
        content = f.read()
    print("✅ File created successfully!")
    print(f"Content: {content[:200]}...")
except FileNotFoundError:
    print("❌ File not found")

print("\n🎉 HERO PROJECT DEMONSTRATION COMPLETE WITH REAL MODEL!")
print("✅ RAG Agent: Working with REAL Qwen3-4B")
print("✅ Task Agent: Working with REAL Qwen3-4B") 
print("✅ File Operations: Working")
print("✅ Real Agentic AI: ACHIEVED WITH REAL MODEL!")
print("\n🚀 This is a fully functional, local AI agent system!")


🎯 HERO PROJECT: COMPLETE DEMONSTRATION WITH REAL MODEL

🔍 DEMO 1: RAG Agent with REAL Qwen3-4B
--------------------------------------------------
🔍 Searching knowledge base for: 'What are the benefits of small language models?'
🤖 Generating response...
Question: What are the benefits of small language models?
REAL Answer: The benefits of small language models (SLMs) include:

- **Local processing and privacy**: Models can run on local devices, reducing the need to send data to remote servers and enhancing user privacy.  
- **Lower computational requirements**: SLMs require less power and hardware resources, making th...

🔧 DEMO 2: Task Agent with REAL Qwen3-4B
--------------------------------------------------
🤖 Processing task: 'Create a file called hero_demo_real.txt with a summary of what we learned about AI'
🔧 Agent decided to use tool: file_write
Task: Create a file called hero_demo_real.txt with a summary of what we learned about AI
REAL Result: The file `hero_demo_real.txt` has 