# Simple RAG Example with Weaviate and LangChain

## 1. Introduction

This notebook demonstrates a simple implementation of the Retrieval-Augmented Generation (RAG) pattern. The goal is to build a question-answering system that leverages a vector database to provide context-aware answers from a Large Language Model (LLM).

The process involves:
1.  **Data Preparation**: Creating a small, factual dataset of scientific notes.
2.  **Environment Setup**: Preparing the Docker environment and installing dependencies.
3.  **Database Deployment**: Launching a Weaviate vector database instance using Docker.
4.  **Embeddings & Ingestion**: Generating vector embeddings for our data using Azure OpenAI and loading it into Weaviate.
5.  **RAG Experiment**: Executing a RAG pipeline:
    - Expanding a user's question.
    - Searching for relevant documents in Weaviate.
    - Generating a final answer using an LLM augmented with the retrieved documents.
6.  **Cleanup**: Removing the Docker container to free up system resources.

## 2. System Environment Preparation

This section contains helper functions to interact with the underlying operating system (Linux or Windows with WSL) to manage Docker containers and file paths.

### 2.1. WSL and Shell Command Helpers

In [17]:
import platform
import subprocess
import os

# --- Platform Detection ---
system = platform.system()
print(f"Operating System: {system}")

# --- Shell Command Helpers ---
def run_windows_command(command):
    """Executes a command in PowerShell on Windows."""
    result = subprocess.run(
        ["powershell", "-Command", command],
        capture_output=True,
        text=True,
        encoding="utf-8",
        errors="replace"
    )
    return {
        "returncode": result.returncode,
        "stdout": result.stdout.strip(),
        "stderr": result.stderr.strip(),
        "success": result.returncode == 0
    }

def run_linux_command(command):
    """Executes a command in a standard Linux/macOS shell."""
    result = subprocess.run(
        command,
        shell=True,
        capture_output=True,
        text=True,
        encoding="utf-8",
        errors="replace"
    )
    return {
        "returncode": result.returncode,
        "stdout": result.stdout.strip(),
        "stderr": result.stderr.strip(),
        "success": result.returncode == 0
    }

def run_shell_command(command):
    """Universal function to run a shell command, detecting the platform."""
    if system == "Windows":
        return run_windows_command(command)
    else:
        return run_linux_command(command)

print("‚úÖ Shell command helpers are defined.")

Operating System: Windows
‚úÖ Shell command helpers are defined.


### 2.2. Install Dependencies

In [18]:
import sys

!"{sys.executable}" -m pip install -q weaviate-client==4.18.1 langchain~=0.3.0 langchain-openai~=0.2.0 python-dotenv~=1.0.0 pandas~=2.2.0

print("‚úÖ Required libraries have been installed.")

‚úÖ Required libraries have been installed.



[notice] A new release of pip is available: 24.2 -> 25.3
[notice] To update, run: python.exe -m pip install --upgrade pip


In [None]:
!"{sys.executable}" -m pip install -U -q sentence-transformers accelerate

print("‚úÖ Extra libraries for local models run have been installed.")

^C
‚úÖ Extra libraries for local models run have been installed.
‚úÖ Extra libraries for local models run have been installed.



[notice] A new release of pip is available: 24.2 -> 25.3
[notice] To update, run: python.exe -m pip install --upgrade pip


## 3. Configuration

Set up the necessary configurations for Weaviate and Azure OpenAI. 

**Action Required**: You must create a `.env` file in the same directory as this notebook and add your Azure OpenAI credentials.

In [None]:
import os
from dotenv import load_dotenv


# --- DIAL ML MODELS CONFIGURATION ---
# Create a dummy .env file if it doesn't exist for demonstration purposes
if not os.path.exists('.env'):
    with open('.env', 'w') as f:
        f.write("AZURE_OPENAI_API_KEY='YOUR_AZURE_OPENAI_KEY'\n")
        f.write("AZURE_OPENAI_ENDPOINT='YOUR_AZURE_OPENAI_ENDPOINT'\n")
        f.write("AZURE_OPENAI_API_VERSION='2024-02-01'\n")
        f.write("AZURE_OPENAI_EMBEDDING_DEPLOYMENT='YOUR_EMBEDDING_DEPLOYMENT_NAME'\n")
        f.write("AZURE_OPENAI_CHAT_DEPLOYMENT='YOUR_CHAT_DEPLOYMENT_NAME'\n")
    print("‚ö†Ô∏è Created a template .env file. Please fill it with your Azure credentials.")


# Load environment variables from .env file
load_dotenv()


# --- ALTERNATIVE EMBEDDINGS CONFIGURATION ---
# Set to True to use a local model (free, runs on CPU/GPU)
# Set to False to use Azure OpenAI (requires an API key and funds in the account)
USE_LOCAL_EMBEDDINGS = True

# Embeddingd model for local run.
# If you have access to Gemma (you logged in via huggingface-cli), use: "google/embeddinggemma-300m" (768 dimensions)
# If you don't have access or encounter errors, use the standard one: "all-MiniLM-L6-v2" (384 dimensions)
LOCAL_EMBEDDING_MODEL_NAME = "google/embeddinggemma-300m"
# LOCAL_EMBEDDING_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2" # Uncomment if Gemma doesn't work


# --- ALTERNATIVE LLM (CHAT) CONFIGURATION ---
# Set to True to use a local LLM via Hugging Face Pipeline (CPU optimized)
USE_LOCAL_LLM = True

# Text generation model for local run.
LOCAL_LLM_MODEL_NAME = "google/gemma-3-1b-it"


# --- VECTOR DATABASE CONFIGURATION ---
WEAVIATE_CONTAINER_NAME = "simple-rag-weaviate"
WEAVIATE_IMAGE = "semitechnologies/weaviate:1.33.7"
WEAVIATE_HTTP_PORT = 8080
WEAVIATE_GRPC_PORT = 50051

print("‚úÖ Configuration loaded.")

‚ö†Ô∏è Created a template .env file. Please fill it with your Azure credentials.
‚úÖ Configuration loaded.


## 4. Data Generation

Here we generate 25 simple, factual notes across 5 different topics. Each note contains a specific detail (like a number, a name, or a technical term) to make it uniquely identifiable during the retrieval phase of our RAG experiment. This ensures we are testing the retrieval mechanism, not just the general knowledge of the LLM.

In [None]:
documents_data = [
    # Topic 1: Origins & Structure of the Premier League
    {
        "title": "Birth of the Premier League",
        "content": "The English Premier League (EPL) was founded in 1992 when clubs from the old First Division broke away from the Football League to take advantage of more lucrative TV rights. The league started with 22 teams but later reduced to 20. It quickly became one of the most watched sports competitions in the world."
    },
    {
        "title": "Promotion and Relegation",
        "content": "The Premier League uses a promotion and relegation system with the English Football League Championship. Each season, the bottom three clubs in the Premier League are relegated to the Championship, while the top two Championship clubs and the playoff winner are promoted. This system keeps competition intense at both ends of the table."
    },
    {
        "title": "Points and League Table",
        "content": "Teams in the Premier League earn three points for a win, one point for a draw, and none for a loss. The league table is ranked by total points, then goal difference, and then goals scored. If clubs are still level, head-to-head records and, ultimately, a playoff can be used to break ties in extreme cases."
    },
    {
        "title": "European Qualification",
        "content": "Top Premier League clubs qualify for European competitions like the UEFA Champions League and Europa League. Typically, the top four teams enter the Champions League, while the next positions and domestic cup winners can earn Europa League or Conference League spots. This adds extra stakes beyond the domestic title race."
    },
    {
        "title": "Financial Powerhouse",
        "content": "The Premier League generates billions in revenue through broadcasting, sponsorship, and matchday income. TV rights are sold worldwide, and income is shared among clubs using a formula that rewards league position and appearances. This financial strength attracts elite players and coaches from around the globe."
    },

    # Topic 2: Iconic Moments in Premier League History
    {
        "title": "‚ÄúAguerooooo‚Äù Title Winner",
        "content": "In the 2011‚Äì12 season, Manchester City won their first Premier League title with a dramatic last-minute goal by Sergio Ag√ºero against Queens Park Rangers. The goal, scored deep into stoppage time, snatched the title away from Manchester United on goal difference. It is widely regarded as one of the most dramatic moments in league history."
    },
    {
        "title": "The Invincibles",
        "content": "Arsenal‚Äôs 2003‚Äì04 team, nicknamed ‚ÄúThe Invincibles,‚Äù completed the entire Premier League season unbeaten. They recorded 26 wins and 12 draws, finishing with 90 points and a +47 goal difference. This achievement is considered one of the greatest team performances in modern football."
    },
    {
        "title": "Leicester‚Äôs Miracle Season",
        "content": "In 2015‚Äì16, Leicester City shocked the world by winning the Premier League after starting the season as relegation candidates and 5000‚Äì1 outsiders with bookmakers. Led by manager Claudio Ranieri and stars like Jamie Vardy and Riyad Mahrez, Leicester‚Äôs disciplined defending and rapid counter-attacks stunned more established clubs."
    },
    {
        "title": "The Battle of Old Trafford",
        "content": "In 2003, a tense match between Manchester United and Arsenal ended in a mass confrontation known as the ‚ÄúBattle of Old Trafford.‚Äù A missed penalty, late challenges, and heated arguments led to multiple bans and fines. The incident highlighted the fierce rivalry between the two clubs during the early 2000s."
    },
    {
        "title": "Record-Breaking Centurions",
        "content": "Manchester City‚Äôs 2017‚Äì18 side became the first Premier League team to reach 100 points in a season. They scored 106 goals, won 32 of 38 matches, and set records for away wins and goal difference. Their possession-based, attacking style under Pep Guardiola defined a new standard of dominance."
    },

    # Topic 3: Football Analytics in the Premier League
    {
        "title": "Expected Goals (xG)",
        "content": "Expected goals, or xG, is a statistical metric that estimates the probability of a shot becoming a goal based on factors like shot location, body part, and type of assist. Premier League clubs use xG to evaluate performance beyond raw scorelines. A team consistently outperforming its xG may be finishing exceptionally well‚Äîor riding its luck."
    },
    {
        "title": "Pressing and PPDA",
        "content": "Pressing intensity is often measured with metrics like PPDA (Passes Allowed Per Defensive Action). A low PPDA indicates that a team allows few passes before applying defensive pressure, reflecting an aggressive pressing style. High-press teams such as those managed by J√ºrgen Klopp have helped make pressing statistics mainstream in Premier League analysis."
    },
    {
        "title": "Heatmaps and Player Positioning",
        "content": "Data providers create heatmaps to visualize where players spend most of their time on the pitch. These graphics show zones of high activity and can reveal roles like inverted full-backs or roaming playmakers. Coaches use heatmaps to adjust tactics and identify positional weaknesses in opponents."
    },
    {
        "title": "Set-Piece Analysis",
        "content": "Premier League clubs devote significant analytics resources to set pieces, such as corners and free-kicks. Analysts study delivery patterns, blocking runs, and opponent marking schemes to design routines that create high-quality chances. Some teams hire specialist set-piece coaches to gain an edge in these high-leverage moments."
    },
    {
        "title": "Wearables and Tracking Data",
        "content": "Players often wear GPS vests and tracking devices during training and matches. These systems collect data on total distance covered, sprint counts, and high-intensity efforts. Sports scientists combine this information with match analytics to manage fatigue, reduce injury risk, and individualize training loads."
    },

    # Topic 4: Tactics and Playing Styles
    {
        "title": "The 4-3-3 and Wide Wingers",
        "content": "The 4-3-3 formation has become a staple in the Premier League, emphasizing width and fluid front lines. Wide forwards cut inside to shoot, while full-backs overlap to provide crossing options. This system allows teams to press high and quickly surround the ball when possession is lost."
    },
    {
        "title": "Low Blocks and Counter-Attacks",
        "content": "Many underdog Premier League teams use a low defensive block, sitting deep near their own penalty area to deny space. When they win the ball, they launch fast counter-attacks using quick forwards and long passes into space. This style can frustrate possession-heavy giants and lead to shock upsets."
    },
    {
        "title": "False Nines and Fluid Forwards",
        "content": "A false nine is a forward who frequently drops into midfield instead of staying near the opposition center-backs. This movement pulls defenders out of position and opens space for wingers or midfielders to run into. Several Premier League managers have used false nines to create unpredictable attacking patterns."
    },
    {
        "title": "Build-Up from the Back",
        "content": "Modern Premier League teams often build attacks from the goalkeeper and center-backs instead of kicking long. Players form passing triangles, and defensive midfielders drop deep to receive the ball under pressure. This approach requires technically skilled defenders and a clear positional structure to avoid dangerous turnovers."
    },
    {
        "title": "Press-Resistant Midfielders",
        "content": "Press-resistant midfielders are players who can receive the ball under pressure, turn away from opponents, and keep possession. In the Premier League‚Äôs high-tempo environment, such players are vital for progressing the ball through the middle third. Their ability to evade presses can completely change how a team advances up the pitch."
    },

    # Topic 5: Fan Culture and Stadium Atmosphere
    {
        "title": "Home Advantage and Atmosphere",
        "content": "Premier League stadiums are known for their loud, intense atmospheres, which can boost the home team‚Äôs performance. Chants, flags, and coordinated displays create a sense of intimidation for visiting players. The psychological effect of tens of thousands of supporters is one reason home advantage remains significant in football."
    },
    {
        "title": "Club Anthems and Chants",
        "content": "Many Premier League clubs have iconic songs or anthems associated with them. Fans of Liverpool sing ‚ÄúYou‚Äôll Never Walk Alone‚Äù before kick-off, while other clubs have their own traditional chants and melodies. These songs help create a shared identity and emotional connection between supporters and the team."
    },
    {
        "title": "Matchday Rituals",
        "content": "Supporters often follow specific rituals on matchdays, such as visiting the same pub, walking a particular route to the stadium, or wearing lucky scarves. These habits become part of the club‚Äôs culture and are passed down between generations. For many fans, the entire day‚Äînot just the 90 minutes‚Äîis a meaningful experience."
    },
    {
        "title": "Rivalries and Local Identity",
        "content": "Derby matches, like the North London derby or the Manchester derby, reflect local pride and historical tension between clubs. Fans see these fixtures as more than just games; they are battles for bragging rights within cities and communities. The emotional stakes make these matches some of the most intense in the league."
    },
    {
        "title": "Global Fanbase",
        "content": "The Premier League has a massive international following, with supporters‚Äô clubs on every continent. Fans who may never visit the stadium still wake up early or stay up late to watch live broadcasts. Social media and streaming platforms help create online communities that share reactions, memes, and analysis in real time."
    }
]

print(f"‚úÖ Generated {len(documents_data)} documents across 5 topics.")

‚úÖ Generated 25 documents across 5 topics.


## 5. Docker Environment Setup

We will now start the Weaviate database using a Docker container. The following cells will check for Docker, pull the required image, and run the container with the correct port mappings.

In [22]:
# First, ensure no old container with the same name is running
print(f"--- Stopping and removing any existing container named '{WEAVIATE_CONTAINER_NAME}' ---")

# Windows PowerShell and Linux use different error redirection
if platform.system() == "Windows":
    # PowerShell: suppress errors for stop/rm commands (they fail if container doesn't exist)
    stop_command = f"docker stop {WEAVIATE_CONTAINER_NAME} 2>&1 | Out-Null; docker rm {WEAVIATE_CONTAINER_NAME} 2>&1 | Out-Null"
else:
    stop_command = f"docker stop {WEAVIATE_CONTAINER_NAME} 2>/dev/null; docker rm {WEAVIATE_CONTAINER_NAME} 2>/dev/null"

run_shell_command(stop_command)
print("Cleanup complete.")

# Now, run the new Weaviate container
print(f"\n--- Starting Weaviate container '{WEAVIATE_CONTAINER_NAME}' ---")
run_command = (
    f"docker run -d "
    f"--name {WEAVIATE_CONTAINER_NAME} "
    f"-p {WEAVIATE_HTTP_PORT}:{WEAVIATE_HTTP_PORT} "
    f"-p {WEAVIATE_GRPC_PORT}:{WEAVIATE_GRPC_PORT} "
    f"-e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true "
    f"-e PERSISTENCE_DATA_PATH=/var/lib/weaviate "
    f"-e DEFAULT_VECTORIZER_MODULE=none "
    f"-e ENABLE_MODULES='' "
    f"-e CLUSTER_HOSTNAME=node1 "
    f"{WEAVIATE_IMAGE}"
)

result = run_shell_command(run_command)

if result["success"]:
    print("‚úÖ Weaviate container started successfully.")
    container_id = result['stdout'].strip()
    if container_id and len(container_id) >= 12:
        print(f"Container ID: {container_id[:12]}")
    print("Waiting a few seconds for the service to initialize...")
    import time
    time.sleep(10) # Give Weaviate time to start up
else:
    print("‚ùå Failed to start Weaviate container.")
    print(f"Error: {result['stderr']}")
    print("\nTroubleshooting:")
    print("1. Make sure Docker Desktop is running")
    print("2. Check if port 8080 is already in use")
    print("3. Try running 'docker ps' to see active containers")

# Display container statistics
print("\n--- Weaviate Container Stats ---")
stats_result = run_shell_command(f"docker stats {WEAVIATE_CONTAINER_NAME} --no-stream")
if stats_result["success"]:
    print(stats_result["stdout"])
else:
    # If stats fail, just check if container is running
    ps_result = run_shell_command(f"docker ps --filter name={WEAVIATE_CONTAINER_NAME}")
    if ps_result["success"]:
        print(ps_result["stdout"])
    else:
        print("‚ö†Ô∏è Could not retrieve container status")

--- Stopping and removing any existing container named 'simple-rag-weaviate' ---
Cleanup complete.

--- Starting Weaviate container 'simple-rag-weaviate' ---
Cleanup complete.

--- Starting Weaviate container 'simple-rag-weaviate' ---
‚ùå Failed to start Weaviate container.
Error: docker: Error response from daemon: failed to set up container networking: driver failed programming external connectivity on endpoint simple-rag-weaviate (49ee0c53358a79053bf9b7cd67664b8dea3cce0b5690942fc438c3fba4dba0c2): Bind for 0.0.0.0:8080 failed: port is already allocated

Run 'docker run --help' for more information

Troubleshooting:
1. Make sure Docker Desktop is running
2. Check if port 8080 is already in use
3. Try running 'docker ps' to see active containers

--- Weaviate Container Stats ---
CONTAINER ID   NAME                  CPU %     MEM USAGE / LIMIT   MEM %     NET I/O   BLOCK I/O   PIDS
71c432acd5ab   simple-rag-weaviate   0.00%     0B / 0B             0.00%     0B / 0B   0B / 0B     0
‚ùå F

## 6. Embeddings and Data Ingestion

In this section, we will:
1.  Set up the LangChain clients for Azure OpenAI (for both embeddings and chat).
2.  Generate vector embeddings for each of our 25 documents.
3.  Connect to our Weaviate instance.
4.  Define a data schema (a "collection") in Weaviate.
5.  Batch-insert all documents and their vectors into the collection.

**NOTE**: The loacal models examples have beed added as alternative, 
make sense to split this cell on two: clietns and data ingestion

In [None]:
from langchain_openai import AzureOpenAIEmbeddings, AzureChatOpenAI
from langchain_core.messages import AIMessage
from langchain_core.runnables import Runnable, RunnableConfig
import weaviate
import weaviate.classes as wvc
from weaviate.util import generate_uuid5
from sentence_transformers import SentenceTransformer
from transformers import pipeline
import torch
import numpy as np

# --- Wrapper class for the local Embeddings model ---
class LocalHuggingFaceEmbeddings:
    """
    This class adapts a local SentenceTransformer model
    to the LangChain interface, which expects the methods embed_documents and embed_query.
    """
    def __init__(self, model_name):
        print(f"üì• Loading local embedding model: {model_name}...")
        try:
            self.model = SentenceTransformer(model_name)
            print("‚úÖ Local embedding model loaded successfully.")
        except Exception as e:
            print(f"‚ùå Error loading {model_name}. Falling back to 'all-MiniLM-L6-v2'.")
            print(f"Error details: {e}")
            self.model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    def embed_documents(self, texts):
        # Returns a list of lists
        embeddings = self.model.encode(texts, convert_to_numpy=True)
        return embeddings.tolist()

    def embed_query(self, text):
        # Returns a single list
        embedding = self.model.encode(text, convert_to_numpy=True)
        return embedding.tolist()


# --- Wrapper class for the local LLM ---
class LocalHuggingFaceChatModel(Runnable):
    """
    A simple wrapper around the Transformers Pipeline to make it compatible
    with LangChain's 'invoke' method and the pipe '|' operator.
    """
    def __init__(self, model_name):
        print(f"üì• Loading local LLM: {model_name}...")
        # This is the 'Automatic Transmission' setup we discussed:
        # 1. device=-1 forces CPU usage.
        # 2. torch_dtype=torch.float32 is the fastest format for CPU.
        self.pipe = pipeline(
            "text-generation",
            model=model_name,
            device=-1,
            torch_dtype=torch.float32
        )
        print("‚úÖ Local LLM loaded successfully.")

    def invoke(self, input_data, config: RunnableConfig = None, **kwargs):
        """
        Adapts LangChain inputs (PromptValue or Messages) to the pipeline format.
        """
        # 1. Convert LangChain input to the list-of-dicts format expected by the pipeline
        messages = []

        # Handle LangChain PromptValue (which has .to_messages())
        if hasattr(input_data, 'to_messages'):
            lc_messages = input_data.to_messages()
            for msg in lc_messages:
                # Map LangChain message types to role strings
                role = "user"
                if msg.type == "system": role = "system"
                elif msg.type == "ai": role = "assistant"

                # Gemma pipeline expects content as a list of dicts or string.
                messages.append({"role": role, "content": [{"type": "text", "text": msg.content}]})

        # Handle raw string input (fallback)
        elif isinstance(input_data, str):
            messages = [{"role": "user", "content": [{"type": "text", "text": input_data}]}]

        # 2. Run the pipeline ("Automatic Transmission")
        # We set max_new_tokens to limit the answer length
        outputs = self.pipe(messages, max_new_tokens=512)

        # 3. Extract the generated text
        # The pipeline returns a list of dicts. The last message is the assistant's reply.
        generated_text = outputs[0]['generated_text'][-1]['content']

        # 4. Return as an AIMessage to satisfy LangChain's StrOutputParser
        return AIMessage(content=generated_text)


# --- 1. Setup LangChain Clients ---
print("--- 1. Setting up AI clients ---")
try:
    # Embedding Model Setup
    if USE_LOCAL_EMBEDDINGS:
        embeddings_model = LocalHuggingFaceEmbeddings(LOCAL_EMBEDDING_MODEL_NAME)
    else:
        embeddings_model = AzureOpenAIEmbeddings(
            azure_deployment=os.environ["AZURE_OPENAI_EMBEDDING_DEPLOYMENT"],
            openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
            dimensions=256  # size of embedding vectors, default is 1536
        )

    # Chat Model Setup
    if USE_LOCAL_LLM:
        chat_model = LocalHuggingFaceChatModel(LOCAL_LLM_MODEL_NAME)
    else:
        chat_model = AzureChatOpenAI(
            azure_deployment=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT"],
            openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
            temperature=0
        )
    print("‚úÖ AI clients initialized.")

except Exception as e:
    print(f"‚ùå Failed to initialize AI clients. Please check your .env file or model names. Error: {e}")
    # Stop execution if clients fail to initialize
    raise

# --- 2. Generate Embeddings ---
print("\n--- 2. Generating embeddings for all documents ---")
contents_to_embed = [doc['content'] for doc in documents_data]
vector_embeddings = embeddings_model.embed_documents(contents_to_embed)
print(f"‚úÖ Generated {len(vector_embeddings)} embeddings. Vector dimension: {len(vector_embeddings[0])}")

# Add embeddings to our data
for i, doc in enumerate(documents_data):
    doc['content_vector'] = vector_embeddings[i]

# --- 3. Connect to Weaviate ---
print("\n--- 3. Connecting to Weaviate ---")
weaviate_client = weaviate.connect_to_local(
    host="localhost",
    port=WEAVIATE_HTTP_PORT,
    grpc_port=WEAVIATE_GRPC_PORT
)
if weaviate_client.is_ready():
    print("‚úÖ Successfully connected to Weaviate.")
else:
    print("‚ùå Failed to connect to Weaviate.")
    weaviate_client.close()
    raise ConnectionError("Could not connect to Weaviate instance.")

# --- 4. Define and Create Weaviate Collection ---
COLLECTION_NAME = "SimpleRAG"
print(f"\n--- 4. Creating Weaviate collection: '{COLLECTION_NAME}' ---")

# Delete collection if it already exists for a clean run
if weaviate_client.collections.exists(COLLECTION_NAME):
    weaviate_client.collections.delete(COLLECTION_NAME)
    print(f"Deleted existing collection '{COLLECTION_NAME}'.")

# Create new DB schema for our documents
rag_collection = weaviate_client.collections.create(
    name=COLLECTION_NAME,
    properties=[
        wvc.config.Property(name="title", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="content", data_type=wvc.config.DataType.TEXT),
    ],
    # Deprecated config:
    # vectorizer_config=wvc.config.Configure.Vectorizer.none(),
    # vector_index_config=wvc.config.Configure.VectorIndex.hnsw(
    #     distance_metric=wvc.config.VectorDistances.COSINE
    # )
    # Changes:
    # - Renamed: vectorizer_config -> vector_config
    # - Replaced: Configure.Vectorizer.none() -> Configure.Vectors.self_provided()
    # - vector_index_config -> subargument of vector_config
    # New config for client v4.18.1:
    vector_config=wvc.config.Configure.Vectors.self_provided(
        vector_index_config=wvc.config.Configure.VectorIndex.hnsw(
            distance_metric=wvc.config.VectorDistances.COSINE
        )
    )
)
print(f"‚úÖ Collection '{COLLECTION_NAME}' created successfully.")

# --- 5. Batch-Insert Data ---
print(f"\n--- 5. Ingesting {len(documents_data)} documents into Weaviate ---")

# Use a context manager to automatically handle batching
with rag_collection.batch.dynamic() as batch:
    for doc in documents_data:
        properties = {
            "title": doc["title"],
            "content": doc["content"]
        }
        batch.add_object(
            properties=properties,
            vector=doc["content_vector"],  # Use default vector
            uuid=generate_uuid5(doc["title"])  # Generate a consistent UUID based on the title
        )

print(f"‚úÖ Data ingestion complete. Total objects in collection: {len(rag_collection)}")

# Close the client connection
weaviate_client.close()

  from .autonotebook import tqdm as notebook_tqdm


--- 1. Setting up AI clients ---
üì• Loading local embedding model: google/embeddinggemma-300m...
‚úÖ Local embedding model loaded successfully.
üì• Loading local LLM: google/gemma-3-1b-it...


`torch_dtype` is deprecated! Use `dtype` instead!
Device set to use cpu


‚úÖ Local LLM loaded successfully.
‚úÖ AI clients initialized.

--- 2. Generating embeddings for all documents ---
‚úÖ Generated 25 embeddings. Vector dimension: 768

--- 3. Connecting to Weaviate ---
‚úÖ Successfully connected to Weaviate.

--- 4. Creating Weaviate collection: 'SimpleRAG' ---
‚úÖ Collection 'SimpleRAG' created successfully.

--- 5. Ingesting 25 documents into Weaviate ---
‚úÖ Data ingestion complete. Total objects in collection: 25


## 7. RAG Experiment

Now we perform the core RAG experiment. For each of our five topics, we will ask a question and follow the RAG pipeline to generate an answer.

**The Pipeline:**
1.  **Expand Query**: Use an LLM to rephrase the user's simple question into a richer, more descriptive query.
2.  **Embed Query**: Generate a vector embedding for the expanded query.
3.  **Retrieve Documents**: Search Weaviate for the top 5 documents most similar to the query vector.
4.  **Generate Answer**: Pass the retrieved documents as context to another LLM call and ask it to synthesize a final, bulleted answer based *only* on the provided information.

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
import pandas as pd
from IPython.display import display, Markdown

# Re-connect to Weaviate for the experiment
weaviate_client = weaviate.connect_to_local(
    host="localhost",
    port=WEAVIATE_HTTP_PORT,
    grpc_port=WEAVIATE_GRPC_PORT
)
rag_collection = weaviate_client.collections.get(COLLECTION_NAME)

# --- Define Test Questions ---
test_questions = [
    {"topic": "Atacama Desert", "question": "What do you know about the ALMA observatory?"},
    {"topic": "Tardigrades", "question": "Tell me about the Dsup protein in water bears."},
    {"topic": "Magnus Effect", "question": "What was the Buckau ship and its 1926 journey?"},
    {"topic": "Great Emu War", "question": "What kind of weapons did Major Meredith's forces use?"},
    {"topic": "Baader-Meinhof Phenomenon", "question": "What is the origin of the name for the frequency illusion?"}
]

# --- Create Mappings for Topic Validation ---
# This mapping links each document title to a topic index (0-4)
# It assumes documents_data is defined in a previous cell and grouped by 5.
title_to_topic_index = {doc['title']: i // 5 for i, doc in enumerate(documents_data)}
# This mapping links the topic name from our test questions to the same index
topic_name_to_index = {q['topic']: i for i, q in enumerate(test_questions)}


# --- Define LangChain Chains ---

# 1. Chain for Query Expansion
expansion_prompt = ChatPromptTemplate.from_template(
    "You are an expert in information retrieval. "
    "Please rephrase the following user query to be more descriptive and detailed, "
    "making it suitable for a vector database search. "
    "Return only the rephrased query, without any additional text, headers, or explanations. "
    "\n\nOriginal Query: '{query}'\n\nRephrased Query:"
)
query_expansion_chain = expansion_prompt | chat_model | StrOutputParser()

# 2. Chain for Final Answer Generation (with RAG context)
generation_prompt = ChatPromptTemplate.from_template(
    "You are a factual assistant. "
    "Your task is to answer the user's question based only on the provided context, "
    "do not use common knowledge, do not correct mistakes in provided context. "
    "Synthesize the information from the context into a concise, bullet-point summary. "
    "Focus on specific details like names, numbers, and technical terms mentioned in the context. "
    "If the context does not contain the information needed to answer the question, "
    "you must state: 'The provided context does not contain the answer to this question.' "
    "\n\nContext:\n{context}\n\nQuestion: {question}"
)
answer_generation_chain = generation_prompt | chat_model | StrOutputParser()


# --- Run the Experiment ---

experiment_results = []

for item in test_questions:
    user_query = item["question"]
    current_topic_name = item["topic"]
    current_topic_index = topic_name_to_index[current_topic_name]

    print(f"\n{'=' * 80}\nProcessing query for topic: {current_topic_name}\n{'=' * 80}")
    display(Markdown(f"### Original Question: {user_query}"))

    # 1. Expand the query
    expanded_query = query_expansion_chain.invoke({"query": user_query})
    display(Markdown(f"**Rephrased Query for Search:** {expanded_query}"))

    # --- RAG Pipeline ---

    # Embed Expanded Query
    query_embedding = embeddings_model.embed_query(expanded_query)

    # Retrieve Documents from Weaviate
    retrieved_objects = rag_collection.query.near_vector(
        near_vector=query_embedding,
        limit=5,
        return_metadata=wvc.query.MetadataQuery(distance=True)
    )

    retrieved_docs_content = [obj.properties['content'] for obj in retrieved_objects.objects]
    context_for_llm = "\n\n---\n\n".join(retrieved_docs_content)

    # Display retrieved documents with Topic and Correctness Check
    retrieved_titles = [obj.properties['title'] for obj in retrieved_objects.objects]
    retrieved_distances = [round(obj.metadata.distance, 4) for obj in retrieved_objects.objects]
    retrieved_topics = [title_to_topic_index.get(title, -1) for title in retrieved_titles]
    retrieved_checks = ['‚úÖ' if topic_idx == current_topic_index else '‚ùå' for topic_idx in retrieved_topics]

    df_retrieved = pd.DataFrame({
        'Retrieved Title': retrieved_titles,
        'Cosine Distance': retrieved_distances,
        'Topic': retrieved_topics,
        'Correct': retrieved_checks
    })
    display(Markdown("**Top 5 Retrieved Documents:**"))
    display(df_retrieved)

    # 2. Get answer from the model on the ORIGINAL question (without RAG) just for comparizon
    answer_no_rag_original = answer_generation_chain.invoke({
        "context": "No context found.",
        "question": user_query
    })
    display(Markdown(f"**Answer to the original query, no RAG:**\n{answer_no_rag_original}"))
    
    # 3. Generate Final Answer using RAG
    final_answer = answer_generation_chain.invoke({
        "context": context_for_llm,
        "question": user_query
    })
    display(Markdown(f"**Answer to the original query, with RAG:**\n{final_answer}"))

    experiment_results.append({
        "topic": item["topic"],
        "original_query": user_query,
        "expanded_query": expanded_query,
        "retrieved_docs": retrieved_titles,
        "final_answer": final_answer
    })

# Close the client connection
weaviate_client.close()


Processing query for topic: Atacama Desert


### Original Question: What do you know about the ALMA observatory?

**Rephrased Query for Search:** What are the key features, operational details, recent research findings, and publicly available data pertaining to the ALMA (Atacama Large Millimeter/submillimeter Array) observatory, including its location, instrumentation, data processing techniques, and scientific goals?


**Top 5 Retrieved Documents:**

Unnamed: 0,Retrieved Title,Cosine Distance,Topic,Correct
0,Stargazing Paradise,0.3978,0,‚úÖ
1,Mars-like Soil,0.6486,0,‚úÖ
2,Rain in the Driest Place,0.7114,0,‚úÖ
3,Nitrate Mining History,0.7676,0,‚úÖ
4,Ancient Mummies,0.7745,0,‚úÖ


**Answer to the original query, no RAG:**
Here's a summary of the ALMA observatory based solely on the provided context:

*   ALMA stands for Atacama Large Millimeter/submillimeter Array.
*   It‚Äôs a collaborative project involving several European and Chilean institutions.
*   It‚Äôs located in the Atacama Desert, Chile.
*   ALMA is designed to observe the universe at millimeter and submillimeter wavelengths.
*   It‚Äôs known for its high-resolution imaging capabilities.
*   It‚Äôs used to study star formation, molecular clouds, and the early universe.

**Answer to the original query, with RAG:**
Here‚Äôs a summary of the provided context regarding the ALMA observatory:

*   The Atacama Large Millimeter Array (ALMA) is a major observatory located in the Atacama Desert.
*   It consists of 66 high-precision antennas.
*   NASA used the Atacama Desert as a testing ground for instruments intended for Mars missions, including the Sample Analysis at Mars (SAM) instrument suite.
*   The Atacama Desert is the driest nonpolar desert in the world.
*   In 2015, a rare weather event caused the desert to bloom with flowers.


Processing query for topic: Tardigrades


### Original Question: Tell me about the Dsup protein in water bears.

**Rephrased Query for Search:** ‚ÄúDescribe the known function, expression patterns, and recent research related to the Dsup protein, specifically focusing on its role in water bear (specifically, *Phasmatodea*) specimens, including any observed effects on water bear behavior or physiology.‚Äù


**Top 5 Retrieved Documents:**

Unnamed: 0,Retrieved Title,Cosine Distance,Topic,Correct
0,Radiation Resistance,0.5935,1,‚úÖ
1,Extremophile Survivors,0.6505,1,‚úÖ
2,Anhydrobiosis,0.7534,1,‚úÖ
3,Microscopic Size,0.7555,1,‚úÖ
4,Space Travelers,0.8185,1,‚úÖ


**Answer to the original query, no RAG:**
I‚Äôm sorry, but the provided context does not contain the answer to this question.

**Answer to the original query, with RAG:**
Here‚Äôs a bullet-point summary of the provided context regarding the Dsup protein:

*   Dsup is a unique protein found in tardigrades.
*   It protects tardigrade DNA from radiation damage.
*   The protein is known as ‚ÄúDamage suppressor.‚Äù
*   Tardigrades can withstand radiation doses hundreds of times higher than lethal to humans.


Processing query for topic: Magnus Effect


### Original Question: What was the Buckau ship and its 1926 journey?

**Rephrased Query for Search:** ‚ÄúIdentify all records related to the Buckau ship, specifically focusing on its voyage in 1926. Include details about the ship‚Äôs name, route, destination, and any associated events or notable circumstances during that journey.‚Äù


**Top 5 Retrieved Documents:**

Unnamed: 0,Retrieved Title,Cosine Distance,Topic,Correct
0,Flettner Rotors,0.6239,2,‚úÖ
1,Australia's Unusual Conflict,0.8382,3,‚ùå
2,A Second Attempt,0.8469,3,‚ùå
3,Origin of the Name,0.8719,4,‚ùå
4,Examples in Daily Life,0.8773,4,‚ùå


**Answer to the original query, no RAG:**
The provided context does not contain the answer to this question.

**Answer to the original query, with RAG:**
Here‚Äôs a bullet-point summary of the provided context regarding the Buckau ship:

*   The Buckau was a ship that successfully crossed the Atlantic in 1926.
*   It utilized two large rotating cylinders powered by the Magnus effect for propulsion.


Processing query for topic: Great Emu War


### Original Question: What kind of weapons did Major Meredith's forces use?

**Rephrased Query for Search:** What types of firearms, explosives, and other weaponry were employed by Major Meredith‚Äôs military forces during the Battle of Blackwood Creek, specifically focusing on the tactical deployment and characteristics of their arsenal?


**Top 5 Retrieved Documents:**

Unnamed: 0,Retrieved Title,Cosine Distance,Topic,Correct
0,Military Hardware,0.4927,3,‚úÖ
1,Australia's Unusual Conflict,0.6518,3,‚úÖ
2,The Elusive Emus,0.6932,3,‚úÖ
3,Operation Outcome,0.7932,3,‚úÖ
4,Microscopic Size,0.8278,1,‚ùå


**Answer to the original query, no RAG:**
The provided context does not contain the answer to this question.

**Answer to the original query, with RAG:**
Here‚Äôs a bullet-point summary of the information provided:

*   Major G.P.W. Meredith‚Äôs military force used two Lewis automatic machine guns.
*   The force was equipped with 10,000 rounds of ammunition.


Processing query for topic: Baader-Meinhof Phenomenon


### Original Question: What is the origin of the name for the frequency illusion?

**Rephrased Query for Search:** ‚ÄúExplore the historical and psychological origins of the ‚Äúfrequency illusion‚Äù name, tracing its development from initial observation to its current usage in research and popular discussion. Focus specifically on the debates surrounding the name‚Äôs connection to Benjamin Lee Wesp, the pioneering researcher who coined the term, and the evolution of the concept‚Äôs meaning.‚Äù


**Top 5 Retrieved Documents:**

Unnamed: 0,Retrieved Title,Cosine Distance,Topic,Correct
0,Frequency Illusion,0.4924,4,‚úÖ
1,Not a Real Increase,0.595,4,‚úÖ
2,Two Cognitive Processes,0.6275,4,‚úÖ
3,Origin of the Name,0.6711,4,‚úÖ
4,Examples in Daily Life,0.7405,4,‚úÖ


**Answer to the original query, no RAG:**
The provided context does not contain the answer to this question.

**Answer to the original query, with RAG:**
*   The name originated in 1994 when a commenter on the St. Paul Pioneer Press online board mentioned hearing about the German Baader-Meinhof Gang twice in 24 hours.


## 8. Cleanup

The experiment is complete. The final step is to stop and remove the Weaviate Docker container to free up system resources. The Docker image will be kept for future use.

In [None]:
print(f"--- Stopping and removing container '{WEAVIATE_CONTAINER_NAME}' ---")

# Windows PowerShell uses semicolon instead of &&
if platform.system() == "Windows":
    cleanup_command = f"docker stop {WEAVIATE_CONTAINER_NAME}; docker rm {WEAVIATE_CONTAINER_NAME}"
else:
    cleanup_command = f"docker stop {WEAVIATE_CONTAINER_NAME} && docker rm {WEAVIATE_CONTAINER_NAME}"

result = run_shell_command(cleanup_command)

if result["success"]:
    print(f"‚úÖ Container stopped and removed successfully.")
else:
    print(f"‚ö†Ô∏è Container may have already been stopped or removed.")
    print(f"Details: {result['stderr']}")

print("\n--- Docker container status after cleanup ---")
ps_result = run_shell_command(f"docker ps -a --filter name={WEAVIATE_CONTAINER_NAME}")
if ps_result["success"] and WEAVIATE_CONTAINER_NAME not in ps_result["stdout"]:
    print("‚úÖ No container found. Cleanup confirmed.")
else:
    print("Container status:")
    print(ps_result["stdout"])

--- Stopping and removing container 'simple-rag-weaviate' ---
‚ö†Ô∏è Container may have already been stopped or removed.
Details: <3>WSL (555 - Relay) ERROR: CreateProcessCommon:800: execvpe(bash) failed: No such file or directory

--- Docker container status after cleanup ---
Container status:

Container status:



# Additional: running models locally

## 9. Try local run for embeddings model

**NOTE**: you had to register at hugging face and accept embeddiggemma 3 license on site to use this approach.

References:
- https://developers.googleblog.com/en/introducing-embeddinggemma/
- https://huggingface.co/google/embeddinggemma-300m

## 10. Try local run for text generation model

**NOTE**: you had to register at hugging face site to use this approach.

**CODE**: below there are two examples of the same. The first is like a car with "automatic transmission", and the second is "manual transmission", that may be more suitable for study.

References:
- https://huggingface.co/google/gemma-3-1b-it

In [None]:
from transformers import pipeline
import torch

pipe = pipeline(
    "text-generation", 
    model="google/gemma-3-1b-it", # Using a 1B or 2B parameter model is optimal for a laptop. It consumes less RAM (approx 2-4GB) and generates text reasonably fast on a CPU.
    device=-1,                    # Set device="cuda" to -1 to tell Transformers to use the CPU           
    torch_dtype=torch.float32     # Float32 is the native and fastest format for CPUs, bfloat16 work slower or unsupported.
)

messages = [
    [
        {
            "role": "system",
            "content": [{"type": "text", "text": "You are a helpful assistant."},]
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": "Write a poem on Hugging Face, the company"},]
        },
    ],
]

output = pipe(messages, max_new_tokens=50)
print(output)

Device set to use cpu


[[{'generated_text': [{'role': 'system', 'content': [{'type': 'text', 'text': 'You are a helpful assistant.'}]}, {'role': 'user', 'content': [{'type': 'text', 'text': 'Write a poem on Hugging Face, the company'}]}, {'role': 'assistant', 'content': 'Okay, here‚Äôs a poem about Hugging Face, aiming to capture its essence and feel:\n\n**The Neural Forge**\n\nWithin the cloud, a vibrant hue,\nHugging Face, a digital view.\nA community, a steady'}]}]]


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "google/gemma-3-1b-it"

# REMOVED: BitsAndBytesConfig
# Quantization (load_in_8bit) relies on CUDA (GPU) and does not work on CPU.
# Since the 1B model is small, we can load it normally into RAM without quantization.

# Load the model explicitly for CPU usage
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,    # Use float32 for best CPU performance and compatibility
    device_map="cpu"              # Explicitly load model to CPU
).eval()

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Note: apply_chat_template expects a single list of messages for a single prompt, 
# or a list of lists for batch processing. Your structure implies a batch of 1.
messages = [
    [
        {
            "role": "system",
            "content": [{"type": "text", "text": "You are a helpful assistant."},]
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": "Write a poem on Hugging Face, the company"},]
        },
    ],
]

# Prepare inputs
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to("cpu") # Move inputs to CPU. 
# REMOVED: .to(torch.bfloat16). Inputs for the model (input_ids) are integers, 
# and attention_mask is usually handled automatically. Converting input_ids to float causes errors.

# Generate output
with torch.inference_mode(): # inference_mode is slightly faster than no_grad
    outputs = model.generate(**inputs, max_new_tokens=64)

# Decode and print
decoded_output = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(decoded_output[0])

`torch_dtype` is deprecated! Use `dtype` instead!


user
You are a helpful assistant.

Write a poem on Hugging Face, the company
model
Okay, here‚Äôs a poem about Hugging Face, aiming to capture its essence and feel:

**The Neural Bloom**

In a world of code, a digital space,
Where models grow, with elegant grace,
Lies Hugging Face, a vibrant hue,
A community built, both old
