# EcoHome Energy Advisor - RAG Setup

In this notebook, you'll set up the Retrieval-Augmented Generation (RAG) pipeline for the EcoHome Energy Advisor. This will allow the agent to access and cite relevant energy-saving tips and best practices.

## Learning Objectives
- Set up ChromaDB vector store
- Load and process energy-saving documents
- Create embeddings for document chunks
- Implement semantic search functionality
- Test the RAG pipeline

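## Documents Available
- `tip_device_best_practices.txt` - Device-specific optimization tips
- `tip_energy_savings.txt` - General energy-saving strategies

The loader in Section 2 reads every `.txt` file in `data/documents`, so any additional tip files in that folder are indexed as well.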


## 1. Import Required Libraries


In [16]:
# Import the necessary libraries for RAG setup
import os
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from dotenv import load_dotenv
from utils import get_voc_creds



In [17]:
load_dotenv()


True
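
A `True` from `load_dotenv()` only indicates that at least one variable was loaded from a `.env` file; it does not confirm that the specific variables used later are present. A minimal sketch of an explicit check follows. It assumes the credentials live in `VOCAREUM_API_KEY` and `VOCAREUM_API_BASE` (the names set later in this notebook); the helper `missing_env_vars` is hypothetical and not part of `utils`.

```python
import os

# Assumed variable names, taken from the cell that sets them later in this notebook
REQUIRED_VARS = ("VOCAREUM_API_KEY", "VOCAREUM_API_BASE")


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Example with an explicit mapping so the check does not depend on the real environment
example_env = {"VOCAREUM_API_KEY": "abc123", "VOCAREUM_API_BASE": ""}
missing = missing_env_vars(env=example_env)
print(missing)
```

Calling `missing_env_vars()` with no arguments checks the real process environment, which is the form you would likely use right after `load_dotenv()`.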

## 2. Load and Process Documents


In [19]:
# Load all .txt tip files in `data/documents` via glob so new tips are picked up automatically
# (TextLoader is already imported in Section 1)
import glob

documents = []
document_paths = sorted(glob.glob("data/documents/*.txt"))

if not document_paths:
    print("Warning: no document files found in data/documents/")

# Try several encodings to avoid UnicodeDecodeError on files saved on Windows.
# latin-1 can decode any byte sequence, so it is the last-resort fallback and must come last.
encodings_to_try = ["utf-8", "utf-8-sig", "cp1252", "latin-1"]

for doc_path in document_paths:
    if os.path.exists(doc_path):
        loaded = False
        for enc in encodings_to_try:
            try:
                loader = TextLoader(doc_path, encoding=enc)
                docs = loader.load()
                documents.extend(docs)
                print(f"Loaded {len(docs)} documents from {doc_path} with encoding {enc}")
                loaded = True
                break
            except UnicodeDecodeError:
                print(f"Encoding {enc} failed for {doc_path}; trying next encoding...")
                continue
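            except Exception as e:
                # The loader may wrap a decoding failure in another exception type,
                # so report the error and still try the next encoding
                print(f"Error loading {doc_path} with encoding {enc}: {e}")
                continue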
        if not loaded:
            print(f"Failed to load {doc_path} with tried encodings; skipping.")
    else:
        print(f"Warning: {doc_path} not found")

print(f"Total documents loaded: {len(documents)}")


Loaded 1 documents from data/documents\tip_device_best_practices.txt with encoding utf-8
Loaded 1 documents from data/documents\tip_energy_savings.txt with encoding utf-8
Loaded 1 documents from data/documents\tip_energy_storage.txt with encoding utf-8
Loaded 1 documents from data/documents\tip_hvac_optimization.txt with encoding utf-8
Loaded 1 documents from data/documents\tip_renewable_integration.txt with encoding utf-8
Loaded 1 documents from data/documents\tip_seasonal_energy_management.txt with encoding utf-8
Loaded 1 documents from data/documents\tip_smart_home_automation.txt with encoding utf-8
Total documents loaded: 7
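
The vector store cell further down expects a list named `splits`, but the cell that builds it does not appear in this export. To illustrate what that step does, here is a plain-Python sketch of overlapping character chunking. It is a simplified stand-in, not the notebook's actual code: `RecursiveCharacterTextSplitter` additionally prefers to break on paragraph and line boundaries, and the `chunk_size` and `chunk_overlap` values below are assumptions chosen for the demo.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into character windows of at most chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters of dummy text
demo_chunks = chunk_text(sample, chunk_size=12, chunk_overlap=4)
for chunk in demo_chunks:
    print(repr(chunk))
```

With the splitter imported in Section 1, the corresponding call would likely be `RecursiveCharacterTextSplitter(chunk_size=..., chunk_overlap=...).split_documents(documents)`, with sizes tuned to the length of the tip files.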


## 4. Create Vector Store


In [25]:
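# Placeholder credentials: replace "your-key" with a real key, or define both values in .env.
# Avoid committing a real key to version control.
os.environ['VOCAREUM_API_KEY'] = "your-key"
os.environ['VOCAREUM_API_BASE'] = "https://openai.vocareum.com/v1"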

In [21]:
# Create a ChromaDB vector store
# Initialize OpenAIEmbeddings
# Create the vector store with the document chunks
# Persist the vector store to disk for future use

# Set up the persist directory
persist_directory = "data/vectorstore"
os.makedirs(persist_directory, exist_ok=True)

# Use the helper to get a synchronous API key/base URL
try:
    api_key, base_url = get_voc_creds()
except Exception as e:
    # Chain the original exception so the full traceback stays visible
    raise RuntimeError(f"Failed to obtain Vocareum credentials: {e}") from e

# Defensive check: embeddings libraries may accept callables,
# but passing async callables causes the 'Sync client is not available' error.
import inspect, asyncio
if inspect.iscoroutinefunction(api_key) or asyncio.iscoroutine(api_key):
    raise ValueError("The provided API key appears to be async. Provide a plain string API key instead.")

# Initialize embeddings with the sync API key and base URL
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    base_url=base_url,
    api_key=api_key
)

# Create the vector store and persist to disk
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory=persist_directory
)

print(f"Vector store created and persisted to {persist_directory}")
print(f"Total vectors stored: {len(splits)}")

Vector store created and persisted to data/vectorstore
Total vectors stored: 4
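
Because the store is persisted to disk, re-running the cell above can add the same chunks to the existing collection again, which is one plausible explanation for identical results appearing more than once in the search output later in this notebook. A common remedy is to derive a deterministic ID from each chunk so repeated runs overwrite rather than append. The sketch below shows only the ID logic; `stable_chunk_id` and `unique_ids_and_chunks` are hypothetical helpers, and the sample paths and texts are invented for illustration.

```python
import hashlib


def stable_chunk_id(source: str, content: str) -> str:
    """Build a deterministic ID from a chunk's source path and text."""
    # Normalize Windows separators so the same file hashes identically on any OS
    normalized_source = source.replace("\\", "/")
    payload = f"{normalized_source}\n{content}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:32]


def unique_ids_and_chunks(chunks):
    """Take (source, content) pairs and return (ids, kept) with repeats dropped."""
    ids, kept, seen = [], [], set()
    for source, content in chunks:
        chunk_id = stable_chunk_id(source, content)
        if chunk_id in seen:
            continue  # same file and same text: skip the repeat
        seen.add(chunk_id)
        ids.append(chunk_id)
        kept.append((source, content))
    return ids, kept


# Invented sample data: the first two entries differ only in path separator
example_chunks = [
    ("data\\documents\\tip_a.txt", "Run the dishwasher off-peak."),
    ("data/documents/tip_a.txt", "Run the dishwasher off-peak."),
    ("data/documents/tip_b.txt", "Clean the filter."),
]
example_ids, example_kept = unique_ids_and_chunks(example_chunks)
print(len(example_ids), "unique chunks")
```

In the notebook, the IDs could be computed from `doc.metadata.get("source", "")` and `doc.page_content` for each entry in `splits` and passed as `ids=` to `Chroma.from_documents`, assuming the installed `langchain_chroma` version accepts that argument. Deleting `data/vectorstore` before a rebuild is a simpler alternative.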


## 5. Test the RAG Pipeline


In [22]:
# Test the search functionality
# Try different queries related to energy optimization
# Test queries like:
# - "electric vehicle charging tips"
# - "thermostat optimization"
# - "dishwasher energy saving"
# - "solar power maximization"

test_queries = [
    "electric vehicle charging tips",
    "thermostat optimization",
    "dishwasher energy saving",
    "solar power maximization",
    "HVAC system efficiency",
    "pool pump scheduling"
]

print("=== Testing Vector Search ===")
for query in test_queries:
    print(f"\nQuery: '{query}'")
    docs = vectorstore.similarity_search(query, k=2)
    for i, doc in enumerate(docs):
        print(f"  Result {i+1}: {doc.page_content[:100]}...")


=== Testing Vector Search ===

Query: 'electric vehicle charging tips'
  Result 1: Large devices like electric vehicles, washing machines and dishwashers often support delayed start o...
  Result 2: Large devices like electric vehicles, washing machines and dishwashers often support delayed start o...

Query: 'thermostat optimization'
  Result 1: Large devices like electric vehicles, washing machines and dishwashers often support delayed start o...
  Result 2: Large devices like electric vehicles, washing machines and dishwashers often support delayed start o...

Query: 'thermostat optimization'
  Result 1: Title: HVAC Optimization Strategies for Night-Shift Households

- Pre-cool or pre-heat using cheaper...
  Result 2: HVAC System Optimization:
- Change air filters monthly for better airflow
- Clean outdoor condenser ...

Query: 'dishwasher energy saving'
  Result 1: Title: HVAC Optimization Strategies for Night-Shift Households

- Pre-cool or pre-heat using cheaper...
  Result 2: HV
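
For intuition about what `similarity_search(query, k=2)` does: the query is embedded into a vector, compared against every stored chunk vector, and the `k` closest chunks are returned. The sketch below ranks by cosine similarity over toy 3-dimensional vectors. The vectors and labels are invented for illustration, real embeddings have far more dimensions, and the distance metric your Chroma collection actually uses may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k (score, text) pairs with the highest similarity to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "embeddings": each axis loosely stands for one topic
toy_store = [
    ("EV charging tip", [0.9, 0.1, 0.0]),
    ("Thermostat tip", [0.1, 0.9, 0.1]),
    ("Solar tip", [0.0, 0.2, 0.9]),
]
results = top_k([0.8, 0.2, 0.0], toy_store, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

If two stored chunks have identical text, they receive identical scores and can both appear in the top `k`, which is what duplicate results for one query look like.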

## 6. Test the Search Tool


In [23]:
# Test the search_energy_tips tool from tools.py
# Import and test the tool with various queries
# Verify that it returns relevant results

from tools import search_energy_tips

# Test the search_energy_tips function
print("=== Testing search_energy_tips Tool ===")

test_queries = [
    "electric vehicle charging",
    "thermostat settings",
    "dishwasher optimization",
    "solar power tips"
]

for query in test_queries:
    print(f"\nQuery: '{query}'")
    result = search_energy_tips.invoke(
        input={
            "query": query, 
            "max_results": 3,
        }
    )
    
    if "error" in result:
        print(f"  Error: {result['error']}")
    else:
        print(f"  Found {result['total_results']} results")
        for i, tip in enumerate(result['tips']):
            print(f"    {i+1}. {tip['content'][:100]}...")
            print(f"       Source: {tip['source']}")
            print(f"       Relevance: {tip['relevance_score']}")


=== Testing search_energy_tips Tool ===

Query: 'electric vehicle charging'
  Found 3 results
    1. Large devices like electric vehicles, washing machines and dishwashers often support delayed start o...
       Source: data\documents\tip_device_best_practices.txt
       Relevance: high
    2. Large devices like electric vehicles, washing machines and dishwashers often support delayed start o...
       Source: data/documents/tip_device_best_practices.txt
       Relevance: high
    3. Large devices like electric vehicles, washing machines and dishwashers often support delayed start o...
       Source: data/documents/tip_device_best_practices.txt
       Relevance: medium

Query: 'thermostat settings'
  Found 3 results
    1. Large devices like electric vehicles, washing machines and dishwashers often support delayed start o...
       Source: data\documents\tip_device_best_practices.txt
       Relevance: high
    2. Large devices like electric vehicles, washing machines and dishwashers of
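
The output above repeats the same tip text several times within a single query's results. Until the underlying store is rebuilt without duplicates, results can be collapsed by content before they are shown. This is a minimal sketch: `unique_tips` is a hypothetical helper, the `content` and `source` keys mirror those printed by the loop above, and the sample tips are invented.

```python
def unique_tips(tips):
    """Keep the first tip for each distinct content, ignoring case and whitespace differences."""
    seen = set()
    unique = []
    for tip in tips:
        key = " ".join(tip["content"].split()).lower()
        if key in seen:
            continue  # same text already kept
        seen.add(key)
        unique.append(tip)
    return unique


# Invented sample shaped like the tool's `tips` list
sample_tips = [
    {"content": "Charge the EV overnight.", "source": "data\\documents\\tip_a.txt"},
    {"content": "Charge the EV  overnight.", "source": "data/documents/tip_a.txt"},
    {"content": "Lower the thermostat at night.", "source": "data/documents/tip_b.txt"},
]
deduped = unique_tips(sample_tips)
print(len(deduped), "distinct tips")
```

Filtering at display time hides the symptom only; the duplicated vectors still occupy slots in the top `max_results`, so fewer distinct tips come back than requested.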

In [24]:
# Quick RAG check: similarity search on the built vector store
from utils import get_voc_creds
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

api_key, base_url = get_voc_creds()
persist_directory = 'data/vectorstore'
embeddings = OpenAIEmbeddings(model='text-embedding-3-small', base_url=base_url, api_key=api_key)
vectorstore = Chroma(persist_directory=persist_directory, embedding_function=embeddings)

test_queries = [
    'energy saving tips',
    'solar power optimization',
    'thermostat efficiency',
]

print('=== Testing Vector Search ===')
for q in test_queries:
    docs = vectorstore.similarity_search(q, k=2)
    print(f"\nQuery: {q}")
    if not docs:
        print('  No results returned.')
    for i, doc in enumerate(docs):
        preview = doc.page_content.replace('\n',' ')[:120]
        print(f"  {i+1}. {preview}...")
        print(f"     Source: {doc.metadata.get('source','unknown')}")


=== Testing Vector Search ===

Query: energy saving tips
  1. Saving energy at home can be simple and effective. Turn off lights when not in use and unplug devices that draw standby ...
     Source: data/documents/tip_energy_savings.txt
  2. Saving energy at home can be simple and effective. Turn off lights when not in use and unplug devices that draw standby ...
     Source: data/documents/tip_energy_savings.txt

Query: solar power optimization
  1. Title: Energy Storage Optimization  - If you have a home battery, reserve 20–30% state-of-charge for peak-rate hours and...
     Source: data\documents\tip_en