# Section 3: Implementing Reinforcement Learning Using Graph-based RAG

In this section, we will introduce the concept of **reinforcement learning** and explore how it enhances the capabilities of Retrieval-Augmented Generation (RAG) systems. Reinforcement learning involves training models to improve their performance based on feedback from interactions, enabling them to adapt and refine their behavior over time. Here, we will leverage fixed Cypher paths to streamline interactions with a Neo4j knowledge graph, ensuring consistency and efficiency in retrieving data.

By combining reinforcement learning with Graph-based RAG, you'll learn how to build adaptive and intelligent agents capable of delivering precise and timely insights.

> **IMPORTANT:** You will need to rerun the environment setup since this is a different Jupyter Notebook. This is needed because of the memory limits of Google Colab.

## Step 1: Environment Setup

To get started, we need to set up two main components of our environment: a Neo4j graph database and a local LLM for question answering. We'll use **Docker** to run Neo4j. You will also need to download an LLM (we'll provide a recommendation) and set up a Python environment for our code.

### 1.1 Launch Neo4j with Docker

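First, spin up a Neo4j instance. We use the official Neo4j image and expose the default ports (7474 for the HTTP interface, 7687 for the Bolt protocol). The cell below installs `udocker`, which runs Docker images without a Docker daemon, mounts an in-memory `tmpfs` volume for Neo4j's files, and starts the container in the background: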

In [None]:
!pip install udocker
!udocker --allow-root install

!mkdir -p /root/neo4j-inmem
!mount -t tmpfs -o size=1g tmpfs /root/neo4j-inmem
!mkdir -p /root/neo4j-inmem/data
!mkdir -p /root/neo4j-inmem/logs
!mkdir -p /root/neo4j-inmem/import
!mkdir -p /root/neo4j-inmem/plugins

# Run neo4j in docker container
!nohup udocker --allow-root run \
  --publish=7474:7474 --publish=7687:7687 \
  --env NEO4J_AUTH=neo4j/neo4jneo4j \
  -v /root/neo4j-inmem/data:/data \
  -v /root/neo4j-inmem/logs:/logs \
  -v /root/neo4j-inmem/import:/var/lib/neo4j/import \
  -v /root/neo4j-inmem/plugins:/plugins \
  -e NEO4JLABS_PLUGINS='["apoc"]' \
  -e NEO4J_apoc_export_file_enabled=true \
  -e NEO4J_apoc_import_file_enabled=true \
  -e NEO4J_dbms_directories_data=/data \
  neo4j:5.26 &

print("\n\n")
print("Setup Docker Complete!")
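
Because the container is started in the background, Neo4j may still be booting when the cell above returns. The following is a minimal sketch of a readiness check, assuming the Bolt port 7687 published above. `wait_for_port` is a hypothetical helper written for this lab, not part of udocker or the Neo4j driver, and an open port is only a rough signal that the server is accepting connections.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Calling `wait_for_port("localhost", 7687)` before the ingestion step gives the server up to two minutes to come up.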

### 1.2 Python Environment and Dependencies

With Neo4j running, set up a Python environment for the ingestion and querying code. You should have Python 3.10+ available. A virtual environment or Conda environment is recommended for the lab.

> **IMPORTANT** This will take ~5-6 minutes to complete on Google Colab.

In [None]:
!pip install llama-cpp-python neo4j==5.28.1 requests==2.32.3 sentence-transformers==4.1.0 ctransformers==0.2.27 spacy==3.8.5

!pip install Flask==3.1.0
!pip install gdown==5.2.0

# Download the small English spaCy model used for NER
!python -m spacy download en_core_web_sm
import spacy

spacy.prefer_gpu()
nlp = spacy.load("en_core_web_sm")

# Download files from my Public Google Drive
print("\n\n")
print("Download Dataset from Google Drive")

import os
from pathlib import Path

import gdown
import zipfile
import shutil

# backup id: 13GRUxdsUUlUK9uC832Su9Qy2mst9Jznq
url = 'https://drive.google.com/uc?id=1f3dqqf9VSnGoVFCP4IozY2AWI39C1UMe'
output = 'bbc-mini.zip'

# Check if the file already exists
if not os.path.exists(output):
    print("Downloading the zip file...")
    gdown.download(url, output, quiet=False)
else:
    print("Zip file already exists, skipping download.")

with zipfile.ZipFile("bbc-mini.zip","r") as zip_ref:
    zip_ref.extractall("./")

print("\n\n")
print("Setup Complete!")

> **IMPORTANT:** Don't move on to the next section until you see "Complete!" in the output for this section.

### 1.3 Set Up the Local LLM

In this lab, we'll use a 7B parameter model called [neural-chat-7B-v3-3-GGUF](https://huggingface.co/TheBloke/neural-chat-7B-v3-3-GGUF) (a quantized GGUF file). This is the model that will be used in the lab, so for maximum "it just works", stick with this model.

In [None]:
!wget https://huggingface.co/TheBloke/neural-chat-7B-v3-3-GGUF/resolve/main/neural-chat-7b-v3-3.Q4_K_M.gguf

print("\n\n")
print("Download Complete!")

> **IMPORTANT:** Don't move on to the next section until you see "Complete!" in the output for this section.
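
A failed download sometimes leaves an HTML error page or an empty file under the model's name. As a quick sanity check, GGUF files begin with the four ASCII bytes `GGUF`. The sketch below tests for that; `looks_like_gguf` is a hypothetical helper, and it does not detect a download that was cut off partway through.

```python
from pathlib import Path


def looks_like_gguf(path: str) -> bool:
    """Return True if the file exists and starts with the 4-byte GGUF magic."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(4) == b"GGUF"
```

For this lab the call would be `looks_like_gguf("./neural-chat-7b-v3-3.Q4_K_M.gguf")`.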

## Step 2: Data Ingestion: From Raw Text to a Queryable Graph

With the environment ready, we'll proceed to prepare our data (BBC articles) and build the knowledge graph in Neo4j as graph nodes and relationships.

Our knowledge source is a collection of BBC news articles in text format, which can be found in the zip file [bbc-lite.zip](./workshop/1_llm_cypher/bbc-lite.zip). This zip file contains a subset of 300 BBC news articles from the 2,225 articles in the [BBC Full Text Document Classification](https://bit.ly/4hBKNjp) dataset. After unzipping the archive, the directory structure will look like:

```
bbc/
├── tech/
    ├── 001.txt
    ├── 002.txt
    ├── 003.txt
    ├── 004.txt
    ├── 005.txt
    └── ...
```

Each file contains one technology news article.
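
To confirm the archive was extracted as expected, you can count the `.txt` files in each category folder. This is a small sketch that assumes the `./bbc` layout shown above; `count_articles` is a hypothetical helper name.

```python
from pathlib import Path


def count_articles(root: str) -> dict[str, int]:
    """Map each category folder under root to its number of .txt files."""
    counts = {}
    for category in sorted(Path(root).iterdir()):
        if category.is_dir():
            counts[category.name] = sum(1 for _ in category.glob("*.txt"))
    return counts
```

Running `count_articles("./bbc")` returns a dictionary keyed by folder name, for example a single `tech` entry if only that category is present.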

### 2.1 What We’re Really Building

Forget vector stores for a moment. We're creating **two node labels** and **one relationship type**, which is all you need for entity-centric retrieval:

| Node Label  | Key Properties                             | Purpose                                    |
| ----------- | ------------------------------------------ | ------------------------------------------ |
| `:Document` | `doc_uuid`, `title`, `content`, `category` | Holds the full article text                |
| `:Entity`   | `name`, `label`, `ent_uuid`                | Unique named entities (people, orgs, etc.) |

| Relationship  | Direction               | Meaning                                 |
| ------------- | ----------------------- | --------------------------------------- |
| `[:MENTIONS]` | `(Document) → (Entity)` | "This article talks about that entity." |

No vectors... just raw NER-driven connections that keep the graph clean and demo-ready.
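
Since the ingestion script in Section 2.2 looks nodes up by `doc_uuid` and by the `name` and `label` of each entity, a uniqueness constraint and an index on those properties can keep `MERGE` fast as the graph grows. The sketch below is an optional addition, not part of the original lab: the statement names `document_uuid` and `entity_name_label` and the helper `apply_schema` are assumptions, and the Cypher uses Neo4j 5 syntax.

```python
SCHEMA_STATEMENTS = [
    # Guarantee one :Document node per doc_uuid.
    "CREATE CONSTRAINT document_uuid IF NOT EXISTS "
    "FOR (d:Document) REQUIRE d.doc_uuid IS UNIQUE",
    # Composite index backing MERGE (e:Entity {name: ..., label: ...}).
    "CREATE INDEX entity_name_label IF NOT EXISTS "
    "FOR (e:Entity) ON (e.name, e.label)",
]


def apply_schema(session) -> int:
    """Run each schema statement on a Neo4j session; return how many were sent."""
    for statement in SCHEMA_STATEMENTS:
        session.run(statement)
    return len(SCHEMA_STATEMENTS)
```

You would call `apply_schema(session)` once inside a `with driver.session() as session:` block before ingesting documents.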

### 2.2 Ingestion Script

Now we will construct the knowledge graph in Neo4j by creating nodes for **documents** and **entities**, and defining relationships among them. Our graph schema will be:

* **Document** nodes: each article is a document node with properties like `title` (we'll use filename as title) and `content` (the full text).
* **Entity** nodes: significant entities mentioned in the articles (we'll extract these via NER).

Relationships:

* `(:Document)-[:MENTIONS]->(:Entity)` - links a document to an entity it mentions.

We'll use **spaCy** to identify named entities in each article as our "areas of interest." spaCy's small English model can recognize entities such as PERSON, ORG (organization), and GPE (geopolitical entity, such as a country or city). We'll treat each unique combination of entity text and label as an `Entity` node.
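
The ingestion script only skips entities shorter than three characters. If you want a stricter filter, one option is to also drop quantity-like labels and strings without letters. The sketch below is an assumption for illustration: `keep_entity` and the `NUMERIC_LABELS` exclusion list are hypothetical and not used by the script that follows.

```python
# spaCy labels that denote quantities or dates rather than named things.
# This exclusion list is an assumption; tune it for your data.
NUMERIC_LABELS = {"CARDINAL", "ORDINAL", "QUANTITY", "PERCENT", "MONEY", "DATE", "TIME"}


def keep_entity(text: str, label: str, min_len: int = 3) -> bool:
    """Return True if an entity is worth storing as an :Entity node."""
    cleaned = text.strip()
    if len(cleaned) < min_len:
        return False
    if label in NUMERIC_LABELS:
        return False
    # Reject strings with no letters at all, e.g. "2025".
    return any(ch.isalpha() for ch in cleaned)
```

Inside the entity loop, `keep_entity(ent.text, ent.label_)` could replace the plain length check.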

In [None]:
#!/usr/bin/env python3

import os
import uuid
import spacy

from neo4j import GraphDatabase

# Neo4j connection settings
NEO4J_URI = "bolt://localhost:7687"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "neo4jneo4j"

# Path to the unzipped BBC dataset folder (with subfolders like 'tech')
DATASET_PATH = "./bbc"

def ingest_bbc_documents_with_ner():
    """
    Ingest BBC documents from the 'technology' subset (or other categories if desired)
    and store them in Neo4j with Document and Entity nodes. The code uses spaCy for NER
    and links documents to extracted entities using MENTIONS relationships.
    """
    # Load spaCy's small English model for Named Entity Recognition
    nlp = spacy.load("en_core_web_sm")

    # Connect to Neo4j
    driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))

    # Perform ingestion in a session
    with driver.session() as session:
        # Optional: clear old data
        print("Clearing old data from Neo4j...")
        session.run("MATCH (n) DETACH DELETE n")
        print("Old data removed.\n")

        # Walk through each category folder
        for category in os.listdir(DATASET_PATH):
            category_path = os.path.join(DATASET_PATH, category)
            if not os.path.isdir(category_path):
                continue  # Skip non-directories

            print(f"Ingesting documents in category '{category}'...")
            for filename in os.listdir(category_path):
                if filename.endswith(".txt"):
                    filepath = os.path.join(category_path, filename)

                    with open(filepath, "r", encoding="utf-8", errors="replace") as f:
                        text_content = f.read()

                    # Generate a UUID for each document
                    doc_uuid = str(uuid.uuid4())

                    # Create or MERGE the Document node
                    create_doc_query = """
                    MERGE (d:Document {doc_uuid: $doc_uuid})
                    ON CREATE SET
                        d.title = $title,
                        d.content = $content,
                        d.category = $category
                    RETURN d
                    """
                    session.run(
                        create_doc_query,
                        doc_uuid=doc_uuid,
                        title=filename,
                        content=text_content,
                        category=category
                    )

                    # Named Entity Recognition
                    doc_spacy = nlp(text_content)

                    # For each recognized entity, MERGE on (name + label)
                    # Then create a relationship from the Document to the Entity.
                    for ent in doc_spacy.ents:
                        # Skip very short entities (fewer than 3 characters)
                        if len(ent.text.strip()) < 3:
                            continue

                        # Generate a unique ID for new entities
                        entity_uuid = str(uuid.uuid4())

                        merge_entity_query = """
                        MERGE (e:Entity {name: $name, label: $label})
                        ON CREATE SET e.ent_uuid = $ent_uuid
                        RETURN e.ent_uuid as eUUID
                        """
                        record = session.run(
                            merge_entity_query,
                            name=ent.text.strip(),
                            label=ent.label_,
                            ent_uuid=entity_uuid
                        ).single()

                        ent_id = record["eUUID"]

                        # Now create relationship by matching on doc_uuid & ent_uuid
                        rel_query = """
                        MATCH (d:Document { doc_uuid: $docId })
                        MATCH (e:Entity { ent_uuid: $entId })
                        MERGE (d)-[:MENTIONS]->(e)
                        """
                        session.run(
                            rel_query,
                            docId=doc_uuid,
                            entId=ent_id
                        )

            print(f"Finished ingesting category '{category}'.\n")

    driver.close()

# Ingest the data into our RAG pipeline/neo4j
ingest_bbc_documents_with_ner()

print("\n\n")
print("Ingest Complete!")

At this point, we have a knowledge graph in which every document carries its category and is connected to the key entities it mentions. This graph can answer more complex questions than a pure vector search. For example, we can traverse from a document to its entities and on to other documents that mention them, which surfaces multi-hop relationships. We'll leverage this graph for querying in the next step.
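
One such multi-hop traversal, from a document to its entities and on to other documents that mention the same entities, can be sketched as follows. The helper `related_documents` and the constant `RELATED_DOCS_QUERY` are hypothetical names; the labels and properties match the ingestion script above.

```python
RELATED_DOCS_QUERY = """
MATCH (d:Document {doc_uuid: $doc_uuid})-[:MENTIONS]->(e:Entity)<-[:MENTIONS]-(other:Document)
WHERE other <> d
RETURN other.title AS title, count(DISTINCT e) AS shared_entities
ORDER BY shared_entities DESC
LIMIT $limit
"""


def related_documents(session, doc_uuid: str, limit: int = 5) -> list[dict]:
    """Return documents that share entities with the given document."""
    result = session.run(RELATED_DOCS_QUERY, doc_uuid=doc_uuid, limit=limit)
    return [
        {"title": record["title"], "shared_entities": record["shared_entities"]}
        for record in result
    ]
```

The result is ordered so that the documents with the most entities in common come first.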

> **IMPORTANT:** Don't move on to the next section until you see "Complete!" in the output for this section.

## Step 3: Implementation of Reinforcement Using Graph-based RAG

This reinforcement learning demonstration presents five technology "facts" to the user and treats each `yes` response as a positive reward signal. An approved fact is stored as a Neo4j `:Document` node and linked to its named entities through `:MENTIONS` relationships that carry a 24-hour expiration timestamp (short-term memory). Facts the user declines are skipped.

Immediately afterward, the script runs a RAG-style retrieval for each check question. It extracts entities from the question, fetches the `:Document` nodes whose `:MENTIONS` relationships are either unexpired (short-term) or have no expiration at all (long-term), concatenates the first 200 characters of each document into a context, and passes that context plus the question to a locally loaded model (via llama-cpp-python) to generate an answer. Through this loop of interactive feedback and retrieval-augmented inference, the system "learns" which facts to retain and shows how those preferences influence downstream Q&A.

### 3.1 Ingesting New Facts in Short-Term Memory

New facts are ingested as Document nodes linked to recognized Entity nodes via MENTIONS relationships stamped with a 24-hour expiration to represent short-term memory.
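
The expiration is stored as epoch seconds taken from Python's `time.time()`. (The document's own `timestamp` property comes from Cypher's `timestamp()`, which is in milliseconds, so the two values should not be compared directly.) Below is a minimal sketch of the timestamp arithmetic and of the filter that retrieval later applies; `expiration_for` and `is_retrievable` are hypothetical helpers that mirror the logic rather than replace it.

```python
import time

DAY_SECONDS = 24 * 60 * 60


def expiration_for(now: float, window_seconds: int = DAY_SECONDS) -> float:
    """Epoch-seconds timestamp at which a short-term mention expires."""
    return now + window_seconds


def is_retrievable(expiration, now: float) -> bool:
    """Mirror the Cypher filter: no expiration (long-term) or still in the future."""
    return expiration is None or expiration > now


now = time.time()
print(is_retrievable(expiration_for(now), now))  # True: expires 24 hours from now
```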

> **IMPORTANT:** You will be asked a series of questions that require a response in the output console below. Type `yes` for two of the facts and `no` for the rest.

In [None]:
#!/usr/bin/env python3

import time
import uuid
import spacy
from neo4j import GraphDatabase
from llama_cpp import Llama

# Neo4j connection settings
NEO4J_URI = "bolt://localhost:7687"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "neo4jneo4j"

# Path to the local GGUF model file downloaded in Step 1.3
LLAMA_MODEL_PATH = "./neural-chat-7b-v3-3.Q4_K_M.gguf"

# Five BBC-style technology facts
TECH_FACTS = [
    """
    OpenAI has agreed to buy artificial intelligence-assisted coding tool Windsurf for about $3 billion, Bloomberg News reported on Monday, citing people familiar with the matter.
    The deal has not yet closed, the report added.

    OpenAI declined to comment, while Windsurf did not immediately respond to Reuters' requests for comment.

    Windsurf, formerly known as Codeium, had recently been in talks with investors, including General Catalyst and Kleiner Perkins, to raise funding at a $3 billion valuation, according to Bloomberg News.

    It was valued at $1.25 billion last August following a $150 million funding round led by venture capital firm General Catalyst. Other investors in the company include Kleiner Perkins and Greenoaks.

    The deal, which would be OpenAI's largest acquisition to date, would complement ChatGPT's coding capabilities. The company has been rolling out improvements in coding with the release of each of its newer models, but the competition is heating up.

    OpenAI has made several purchases in recent years to boost different segments of its AI products. It bought search and database analytics startup Rockset in a nine-figure stock deal last year, to provide better infrastructure for its enterprise products.

    OpenAI's weekly active users surged past 400 million in February, jumping sharply from the 300 million weekly active users in December.
    """,
    """
    Will the Apple Vision Pro be discontinued? It's certainly starting to look that way. In the last couple of months, numerous reports have emerged suggesting that Apple is either slowing down or completely halting production of its flagship headset.

    So, what does that mean for Apple's future in the extended reality market?

    Apple has had a rough time with its Vision Pro headset. Despite incredible hype leading up to the initial release, and the fact that preorders for the device sold out almost instantly, demand for the headset has consistently dropped over the last year.

    In fact, sales have diminished to the point that rumors have been coming thick and fast. For a while now, industry analysts and tech enthusiasts believe Apple might give up on its XR journey entirely and return its focus to other types of tech (like smartphones).

    However, while Apple has failed to achieve its sales targets with the Vision Pro, I don't think they will abandon the XR market entirely. It seems more likely that Apple will view the initial Vision Pro as an experiment, using it to pave the way to new, more popular devices.

    Here's what we know about Apple's XR journey right now.
    """,
    """
    OpenAI sees itself paying a lower share of revenue to its investor and close partner Microsoft by 2030 than it currently does, The Information reported, citing financial documents.

    The news comes after OpenAI this week changed tack on a major restructuring plan to pursue a new plan that would see its for-profit arm becoming a public benefit corporation (PBC) but continue to be controlled by its nonprofit division.

    OpenAI currently has an agreement to share 20% of its top line with Microsoft, but the AI company has told investors it expects to share 10% of revenue with its business partners, including Microsoft, by the end of this decade, The Information reported.

    Microsoft has invested tens of billions in OpenAI, and the two companies currently have a contract until 2030 that includes revenue sharing from both sides. The deal also gives Microsoft rights to OpenAI IP within its AI products, as well as exclusivity on OpenAI's APIs on Azure.

    Microsoft has not yet approved OpenAI's proposed corporate structure, Bloomberg reported on Monday, as the bigger tech company reportedly wants to ensure the new structure protects its multi-billion-dollar investment.

    OpenAI and Microsoft did not immediately return requests for comment.
    """,
    """
    Perplexity, the developer of an AI-powered search engine, is raising a $50 million seed and pre-seed investment fund, CNBC reported. Although the majority of the capital is coming from limited partners, Perplexity is using some of the capital it raised for the company's growth to anchor the fund. Perplexity reportedly raised $500 million at a $9 billion valuation in December.

    Perplexity's fund is managed by general partners Kelly Graziadei and Joanna Lee Shevelenko, who in 2018 co-founded an early-stage venture firm, F7 Ventures, according to PitchBook data. F7 has invested in startups like women's health company Midi. It's not clear if Graziadei and Shevelenko will continue to run F7 or if they will focus all their energies on Perplexity's venture fund.

    OpenAI also manages an investment fund known as the OpenAI Startup Fund. However, unlike Perplexity, OpenAI claims it does not use its own capital for these investments.
    """,
    """
    DeepSeek-R2 is the upcoming AI model from Chinese startup DeepSeek, promising major advancements in multilingual reasoning, code generation, and multimodal capabilities. Scheduled for early 2025, DeepSeek-R2 combines innovative training techniques with efficient resource usage, positioning itself as a serious global competitor to Silicon Valley's top AI technologies.

    In the rapidly evolving landscape of artificial intelligence, a new contender is emerging from China that promises to reshape global AI dynamics. DeepSeek, a relatively young AI startup, is making waves with its forthcoming DeepSeek-R2 model—a bold step in China's ambition to lead the global AI race.

    As Western tech giants like OpenAI, Anthropic, and Google dominate headlines, DeepSeek's R2 model represents a significant milestone in AI development from the East. With its unique approach to training, multilingual capabilities, and resource efficiency, DeepSeek-R2 isn't just another language model—it's potentially a game-changer for how we think about AI development globally.

    What is DeepSeek-R2?
    DeepSeek-R2 is a next-generation large language model that builds upon the foundation laid by DeepSeek-R1. According to reports from Reuters, DeepSeek may be accelerating its launch timeline, potentially bringing this advanced AI system to market earlier than the original May 2025 target.

    What sets DeepSeek-R2 apart is not just its improved performance metrics but its underlying architecture and training methodology. While R1 established DeepSeek as a serious competitor with strong multilingual and coding capabilities, R2 aims to push these boundaries significantly further while introducing new capabilities that could challenge the dominance of models like GPT-4 and Claude.

    DeepSeek-R2 represents China's growing confidence and technical capability in developing frontier AI technologies. The model has been designed from the ground up to be more efficient with computational resources—a critical advantage in the resource-intensive field of large language model development.
    """
]
TECH_CHECK = [
    "How much did OpenAI pay for Windsurf?",
    "What is the status of the Apple Vision Pro?",
    "What is the revenue share agreement between OpenAI and Microsoft?",
    "What is Perplexity's new fund?",
    "What is the significance of DeepSeek-R2?"
]

def connect_neo4j():
    driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
    return driver

def setup_neo4j_schema(session):
    """
    Optional: clear old data for a fresh run. This deletes all :Document and
    :Entity nodes, including the BBC articles ingested in Step 2.
    Comment out the call in main() if you want to preserve prior data.
    """
    query = """
    MATCH (d:Document)
    DETACH DELETE d
    """
    session.run(query)

    query = """
    MATCH (e:Entity)
    DETACH DELETE e
    """
    session.run(query)

def insert_fact_with_expiration(session, fact_text, nlp, expiration_window_seconds=24*60*60):
    """
    Insert the fact as a :Document node. For each recognized entity, create
    a :MENTIONS relationship with an expiration time (now + expiration_window_seconds).
    """
    doc_uuid = str(uuid.uuid4())
    create_doc_query = """
    MERGE (d:Document {doc_uuid: $doc_uuid})
    ON CREATE SET
        d.content = $content,
        d.timestamp = timestamp()
    RETURN d
    """
    session.run(create_doc_query, doc_uuid=doc_uuid, content=fact_text)

    # Named Entity Recognition
    doc_spacy = nlp(fact_text)
    expiration_time = time.time() + expiration_window_seconds

    for ent in doc_spacy.ents:
        if len(ent.text.strip()) < 3:
            continue

        entity_uuid = str(uuid.uuid4())

        merge_entity_query = """
        MERGE (e:Entity {name: $name, label: $label})
        ON CREATE SET e.ent_uuid = $ent_uuid
        RETURN e.ent_uuid AS ent_uuid
        """
        record = session.run(
            merge_entity_query,
            name=ent.text.strip(),
            label=ent.label_,
            ent_uuid=entity_uuid
        ).single()

        # Use the UUID stored on the node: if the entity already existed,
        # MERGE keeps its original ent_uuid, not the one generated above.
        ent_id = record["ent_uuid"]

        # Create a short-term mention relationship with an expiration
        mentions_query = """
        MATCH (d:Document {doc_uuid: $docId})
        MATCH (e:Entity {ent_uuid: $entId})
        MERGE (d)-[m:MENTIONS]->(e)
        ON CREATE SET m.expiration = $expiration
        """
        session.run(
            mentions_query,
            docId=doc_uuid,
            entId=ent_id,
            expiration=expiration_time
        )

    return doc_uuid

def extract_entities_spacy(text, nlp):
    doc = nlp(text)
    return [(ent.text.strip(), ent.label_) for ent in doc.ents if len(ent.text.strip()) >= 3]

def fetch_documents_by_entities(session, entity_texts, top_k=5):
    """
    Fetch documents for which there is a :MENTIONS relationship *not expired*
    or having no expiration property. That is:
      - m.expiration IS NULL (long-term) OR m.expiration > now (unexpired short-term)
    Return up to top_k docs sorted by the count of matched entities.
    """
    if not entity_texts:
        return []

    entity_list_lower = [txt.lower() for txt in entity_texts]
    current_time = time.time()

    query = """
    MATCH (d:Document)-[m:MENTIONS]->(e:Entity)
    WHERE toLower(e.name) IN $entity_list
      AND (m.expiration IS NULL OR m.expiration > $current_time)
    WITH d, count(e) AS matchingEntities
    ORDER BY matchingEntities DESC
    LIMIT $topK
    RETURN
        d.doc_uuid AS doc_uuid,
        d.content AS content,
        matchingEntities
    """
    results = session.run(query, entity_list=entity_list_lower, current_time=current_time, topK=top_k)

    docs = []
    for record in results:
        docs.append({
            "doc_uuid": record["doc_uuid"],
            "content": record["content"],
            "match_count": record["matchingEntities"]
        })
    return docs

def generate_answer(llm, question, context):
    """
    Generates an answer using llama-cpp-python.
    """
    prompt = f"""You are given the following context from multiple documents:
{context}

Question: {question}

Please provide a concise answer.
Answer:
"""
    output = llm(
        prompt,
        max_tokens=1024,
        temperature=0.2,
        stop=["Answer:"]
    )
    return output["choices"][0]["text"].strip()

def main():
    print("=== Reinforcement Learning Demo (Single Mechanism for Memory) ===")

    # Load spaCy
    nlp = spacy.load("en_core_web_sm")

    # Load LLaMA model
    print("Loading local LLaMA model; please wait...")
    llm = Llama(
        model_path=LLAMA_MODEL_PATH,
        n_ctx=32768,
        verbose=False,
    )
    # Sampling settings (temperature, top_p, repeat_penalty) are per-call
    # arguments in llama-cpp-python; see generate_answer().

    # use_gpu=True,
    # n_gpu_layers=-1,      # offload *all* transformer layers to the GPU
    # n_threads=2,          # spawn enough CPU threads to feed the GPU
    # n_batch=256,          # process 256 tokens at once for throughput
    # f16_kv=True,          # store KV cache in half-precision on GPU

    driver = connect_neo4j()
    with driver.session() as session:
        # Optional: Clear existing data
        setup_neo4j_schema(session)

        # Store or skip each fact
        stored_fact_uuids = []
        for fact in TECH_FACTS:
            print("\nNew Fact Detected:")
            print(f" -> {fact}")
            decision = input("Store this fact for 24 hours? (yes/no): ").strip().lower()

            if decision == "yes":
                doc_uuid = insert_fact_with_expiration(session, fact, nlp)
                stored_fact_uuids.append(doc_uuid)
                print(f"Stored with doc_uuid {doc_uuid}\n")
            else:
                print("Skipped storing fact.\n")

        # Now let's do a RAG query test for each fact
        for idx, fact in enumerate(TECH_CHECK, start=1):
            print(f"\n=== RAG Query Test for Fact #{idx} ===")
            question = f"What do we know related to: \"{fact}\"?"
            recognized_entities = extract_entities_spacy(question, nlp)
            entity_texts = [ent[0] for ent in recognized_entities]

            docs = fetch_documents_by_entities(session, entity_texts, top_k=5)
            if not docs:
                print("No documents found for this query.")
                continue

            # Build context
            combined_context = ""
            for doc in docs:
                snippet = doc["content"][:200].replace("\n", " ")
                combined_context += f"\n---\nDocUUID: {doc['doc_uuid']}\nSnippet: {snippet}...\n"

            final_answer = generate_answer(llm, question, combined_context)
            print(f"Question: {question}")
            print(f"Answer: {final_answer}")

    driver.close()

# run main
main()

print("\n\n")
print("Reinforcement Learning Complete!")


Below are the key considerations you'll want to keep in mind when running, extending, or hardening this RAG-style reinforcement demo:

1. **Memory & Expiration Logic**

   * **24-hour TTL**: By default `expiration = now + 24h`. You can parameterize `expiration_window_seconds` for shorter or longer short-term memory.
   * **Automatic pruning**: Relationships past their `expiration` won't be returned by your RAG query - but they still live in the graph for auditing. If you prefer automatic cleanup, consider Neo4j's `apoc.periodic` jobs or [TTL Procedures](https://neo4j.com/labs/apoc/4.4/temporal/ttl/) to delete or archive expired edges.

2. **NER Quality & Granularity**

   * **Model choice**: `en_core_web_sm` is lightweight but misses many fine-grained entities. For technical facts you may want `en_core_web_trf` or a custom fine-tuned model.
   * **Filtering**: You skip entities shorter than three characters - but you may also want to filter out numeric entities or overly generic terms.

3. **Auditability & Data Retention**

   * Every `:Document` and `:Entity` persists forever - only `m.expiration` changes. If privacy or storage is a concern, plan a downstream archival process.
   * You might also add a `createdBy` or `source` property to track provenance of each fact.

Keep these points in mind as you test and evolve the code. They'll help you maintain performance, accuracy, and a clear audit trail as your RAG agent navigates between ephemeral and enduring knowledge.
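
If you would rather delete expired relationships than leave them in the graph for auditing, the pruning step can be sketched with plain Cypher run from Python. This is an assumption-laden sketch, not part of the lab code: `prune_expired_mentions` and `PRUNE_EXPIRED_QUERY` are hypothetical names, and the query permanently removes expired `:MENTIONS` edges while leaving the `:Document` and `:Entity` nodes in place.

```python
import time

PRUNE_EXPIRED_QUERY = """
MATCH (:Document)-[m:MENTIONS]->(:Entity)
WHERE m.expiration IS NOT NULL AND m.expiration < $now
DELETE m
RETURN count(*) AS removed
"""


def prune_expired_mentions(session, now=None) -> int:
    """Delete MENTIONS relationships whose expiration is in the past."""
    if now is None:
        now = time.time()
    record = session.run(PRUNE_EXPIRED_QUERY, now=now).single()
    return record["removed"]
```

Long-term relationships are untouched because they have no `expiration` property and therefore fail the `IS NOT NULL` check.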

## Step 4: Short-Term and Long-Term Memory

Short-term memory temporarily records new facts with an expiration timestamp, whereas long-term memory persists: removing the `expiration` property from a fact's `:MENTIONS` relationships preserves that knowledge permanently.
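
The three possible states and the two transitions between them can be modeled in plain Python before touching the database. This is an in-memory sketch only: the dictionaries stand in for `:MENTIONS` relationships, and `memory_state`, `promote`, and `force_expire` here are illustrative stand-ins for the Cypher-backed functions in the cell below.

```python
import time
from typing import Optional

TWO_DAYS = 2 * 24 * 60 * 60


def memory_state(expiration: Optional[float], now: float) -> str:
    """Classify a mention by its expiration value."""
    if expiration is None:
        return "long-term"
    return "short-term" if expiration > now else "expired"


def promote(mention: dict) -> dict:
    """Return a copy of the mention without its expiration (long-term)."""
    return {k: v for k, v in mention.items() if k != "expiration"}


def force_expire(mention: dict, now: float, seconds_ago: int = TWO_DAYS) -> dict:
    """Return a copy whose expiration lies in the past."""
    return {**mention, "expiration": now - seconds_ago}


now = time.time()
mention = {"entity": "OpenAI", "expiration": now + 24 * 60 * 60}
print(memory_state(mention.get("expiration"), now))                      # short-term
print(memory_state(promote(mention).get("expiration"), now))             # long-term
print(memory_state(force_expire(mention, now).get("expiration"), now))   # expired
```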

### 4.1 Promoting Data from Short-Term to Long-Term Memory Through Reinforcement Learning

> **IMPORTANT:** For this section, type `yes` to promote one of these facts into long-term memory, then type `expire` on the second fact.

In [None]:
#!/usr/bin/env python3
"""
transfer_and_query_demo.py
Unified short-/long-term memory with a single :Document label.
Adds a third option (“expire”) to force-expire short-term MENTIONS.
"""

import time
import spacy
from neo4j import GraphDatabase
from llama_cpp import Llama

# ─── Configuration ─────────────────────────────────────────────────────────────
NEO4J_URI      = "bolt://localhost:7687"
NEO4J_USER     = "neo4j"
NEO4J_PASSWORD = "neo4jneo4j"

# Path to your GGUF model
LLAMA_MODEL_PATH = "./neural-chat-7b-v3-3.Q4_K_M.gguf"

TECH_CHECK = [
    "How much did OpenAI pay for Windsurf?",
    "What is the status of the Apple Vision Pro?",
    "What is the revenue share agreement between OpenAI and Microsoft?",
    "What is Perplexity's new fund?",
    "What is the significance of DeepSeek-R2?"
]
# ───────────────────────────────────────────────────────────────────────────────


# ─── Neo4j Helpers ─────────────────────────────────────────────────────────────
def connect_neo4j():
    return GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))


def find_documents_with_unexpired_mentions(session):
    """
    Return documents whose :MENTIONS relationships still have a future expiration.
    """
    now = time.time()
    query = """
    MATCH (d:Document)-[m:MENTIONS]->(e:Entity)
    WHERE m.expiration IS NOT NULL AND m.expiration > $now
    WITH d, collect(DISTINCT e.name) AS entities
    RETURN d.doc_uuid AS uuid, d.content AS content, entities
    ORDER BY d.timestamp ASC
    """
    return [
        {"uuid": r["uuid"], "content": r["content"], "entities": r["entities"]}
        for r in session.run(query, now=now)
    ]


def promote_to_long_term(session, doc_uuid):
    """Remove expiration ⇒ promote to long-term."""
    session.run(
        """
        MATCH (d:Document {doc_uuid:$uuid})-[m:MENTIONS]->()
        REMOVE m.expiration
        """,
        uuid=doc_uuid,
    )
    print(f"Promoted {doc_uuid} to long-term (expiration removed).")


def force_expire(session, doc_uuid, seconds_ago=2 * 24 * 60 * 60):
    """Force-expire by setting expiration to NOW - 2 days (default)."""
    past = time.time() - seconds_ago
    session.run(
        """
        MATCH (d:Document {doc_uuid:$uuid})-[m:MENTIONS]->()
        SET m.expiration = $past
        """,
        uuid=doc_uuid,
        past=past,
    )
    print(f"Forced expiration on {doc_uuid} (set to {seconds_ago / 86400:g} days ago).")


def fetch_documents_by_entities(session, entity_texts, top_k=5):
    """
    Retrieve docs where MENTIONS are unexpired or permanent.
    """
    if not entity_texts:
        return []

    now = time.time()
    entity_list = [t.lower() for t in entity_texts]

    query = """
    MATCH (d:Document)-[m:MENTIONS]->(e:Entity)
    WHERE toLower(e.name) IN $entity_list
      AND (m.expiration IS NULL OR m.expiration > $now)
    WITH d, count(e) AS matches
    ORDER BY matches DESC
    LIMIT $topK
    RETURN d.doc_uuid AS uuid, d.content AS content, matches
    """

    return [
        {"uuid": r["uuid"], "content": r["content"], "matches": r["matches"]}
        for r in session.run(query, entity_list=entity_list, now=now, topK=top_k)
    ]


# ─── LLM / NLP Helpers ─────────────────────────────────────────────────────────
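def extract_entities(text, nlp):
    """Return the spaCy named-entity strings that are at least 3 characters long."""
    doc = nlp(text)
    names = (ent.text.strip() for ent in doc.ents)
    return [name for name in names if len(name) >= 3]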


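def generate_answer(llm, question, context):
    """Ask the local LLM to answer `question` using the retrieved `context`."""
    prompt = f"""You are given the following context from multiple documents:
{context}

Question: {question}

Answer:"""
    # Sampling settings are passed per call so they apply to this generation.
    res = llm(
        prompt,
        max_tokens=2048,
        temperature=0.2,
        top_p=0.95,
        repeat_penalty=1.2,
        stop=["Answer:"],
    )
    return res["choices"][0]["text"].strip()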


# ─── Main Workflow ─────────────────────────────────────────────────────────────
def main():
    print("=== Transfer & Query Demo (single memory with 'expire' option) ===")

    # Load NLP & LLM
    nlp = spacy.load("en_core_web_sm")
    print("Loading local LLaMA model...")
    llm = Llama(
        model_path=LLAMA_MODEL_PATH,
        n_ctx=32768,
        temperature=0.2,
        top_p=0.95,
        repeat_penalty=1.2,
        verbose=False,
    )

    # use_gpu=True,
    # n_gpu_layers=-1,      # offload *all* transformer layers to the GPU
    # n_threads=2,          # spawn enough CPU threads to feed the GPU
    # n_batch=256,          # process 256 tokens at once for throughput
    # f16_kv=True,          # store KV cache in half-precision on GPU

    driver = connect_neo4j()
    with driver.session() as session:
        # 1. Review unexpired short-term docs
        docs = find_documents_with_unexpired_mentions(session)
        if not docs:
            print("No unexpired short-term documents found.")
        else:
            for d in docs:
                print(f"\nDocUUID: {d['uuid']}")
                print(f"Content: {d['content']}")
                print(f"Entities: {d['entities']}")
                choice = input(
                    "Remove expiration (promote to long-term)? "
                    "(yes/no/expire): "
                ).strip().lower()

                if choice == "yes":
                    promote_to_long_term(session, d["uuid"])
                elif choice == "expire":
                    force_expire(session, d["uuid"])
                else:
                    print("Leaving document unchanged.")

        # 2. Run RAG queries for the TECH_CHECK questions
        for idx, fact in enumerate(TECH_CHECK, start=1):
            print(f"\n=== RAG Query Test for Fact #{idx} ===")
            question = f"What do we know related to: \"{fact}\"?"

            entity_texts = extract_entities(question, nlp)
            docs = fetch_documents_by_entities(session, entity_texts, top_k=5)
            if not docs:
                print("No documents found for this query.")
                continue

            # Build context
            combined_context = ""
            for doc in docs:
                snippet = doc["content"][:200].replace("\n", " ")
                combined_context += (
                    f"\n---\nDocUUID: {doc['uuid']}\nSnippet: {snippet}...\n"
                )

            answer = generate_answer(llm, question, combined_context)
            print(f"Question: {question}")
            print(f"Answer: {answer}")

    driver.close()

# run main
main()

print("\n\n")
print("Positive and Negative Reinforcement Complete!")

Based on the selections you made, the RAG agent should now find documents for only one of these new facts. The other will have been "forgotten".

Here's a curated checklist of critical considerations to keep this reinforcement-learning + RAG pipeline robust, performant, and maintainable:

1. **Expiration & Time Synchronization**

   * **Clock Skew**: All expiration logic relies on system time. If the clocks of the machines that write and check expirations drift apart (for example, your Neo4j server and application server), short-term facts could expire early or linger past their window. Consider NTP synchronization.
   * **Expiration Granularity**: Mixing `time.time()` (float seconds) with Neo4j's `timestamp()` (integer milliseconds) puts values off by a factor of 1,000, so docs can be filtered out or kept incorrectly. Pick one unit and convert at the boundary.

2. **NER Coverage & Noise**

   * **Short Texts**: Very brief facts may yield no entities, so your RAG queries return empty context. You might want a fallback (e.g. keyword search) for "no-entity" cases.
   * **Entity Normalization**: spaCy may extract overlapping or partial entities ("OpenAI" vs. "OpenAI Inc."). Merging on `name` alone can fragment your graph. Consider lowercasing, stripping punctuation, or using a dedicated alias table.

3. **Session & Transaction Management**

   * **Batching vs. Per-Relationship Commits**: Each MERGE/CREATE currently runs in its own transaction. For bulk ingest or high throughput, wrap multiple operations in a single transaction to reduce overhead.
   * **Error Handling**: Uncaught exceptions (e.g., network blips, write conflicts) will crash the script. Consider try/except around sessions, with retries or circuit-breakers.

4. **Auditability & Data Retention**

   * **Historical Facts**: Once the 24h expiration lapses, mentions vanish from RAG queries. If you need to debug or audit, consider logging or archiving expired relationships rather than letting them slip away.
   * **Promotions**: If you ever remove expiration (promote to long-term), record that event (e.g., with a timestamp or provenance field) so you know why a document persisted.

5. **Edge Cases & Emergency Overrides**

   * **"No Entities Found"**: If a user question yields no entities, handle gracefully - perhaps by returning a canned message or falling back to "search all unexpired docs."
   * **Forced Expiration**: Be mindful that setting expiration two days in the past effectively hides docs forever. If that happens by accident, you'll need a manual override to restore them.
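
To make the unit-mismatch point from item 1 concrete, here is a minimal sketch of converting between Unix seconds and milliseconds at the boundary. The helper names `seconds_to_millis` and `millis_to_seconds` are hypothetical and not part of the workshop code; the example only assumes that one side stores seconds and the other milliseconds.

```python
import time

MS_PER_SECOND = 1000


def seconds_to_millis(seconds: float) -> int:
    """Convert Unix seconds (as returned by time.time()) to whole milliseconds."""
    return int(round(seconds * MS_PER_SECOND))


def millis_to_seconds(millis: int) -> float:
    """Convert Unix milliseconds back to float seconds."""
    return millis / MS_PER_SECOND


if __name__ == "__main__":
    now_s = time.time()
    # An expiration that lapsed one minute ago, stored in milliseconds.
    lapsed_ms = seconds_to_millis(now_s - 60)

    # Comparing raw values mixes units: the lapsed value looks far in the future.
    print("mixed units, looks unexpired:", lapsed_ms > now_s)
    # Converting first gives the intended result.
    print("same units, looks unexpired:", millis_to_seconds(lapsed_ms) > now_s)
```

The design choice is to keep one unit everywhere (seconds, in this workshop's code) and convert only where a millisecond value enters or leaves.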

By keeping these points in view, you'll avoid the classic "it worked on my laptop" pitfalls and keep your unified memory + RAG system robust, performant, and maintainable in production.
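
As one illustration of the entity-normalization item in the checklist, the sketch below lowercases a name, strips punctuation, collapses whitespace, and then consults an alias table. Both `normalize_entity` and the contents of `ALIASES` are assumptions made up for this example. A real alias table would be built from your own data.

```python
import string

# Hypothetical alias table: maps cleaned variants to one canonical name.
ALIASES = {
    "openai inc": "openai",
    "open ai": "openai",
}

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize_entity(name: str, aliases=ALIASES) -> str:
    """Return a canonical key for an entity name extracted by NER."""
    cleaned = name.lower().translate(_STRIP_PUNCTUATION)
    cleaned = " ".join(cleaned.split())  # collapse runs of whitespace
    return aliases.get(cleaned, cleaned)


if __name__ == "__main__":
    for raw in ["OpenAI", "OpenAI Inc.", "  Open   AI "]:
        print(repr(raw), "->", normalize_entity(raw))
```

Note the trade-off: stripping punctuation also turns "U.S." into "us", so you may prefer a narrower cleanup plus a larger alias table.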

### 4.2: Understanding How Reinforcement Learning Works: Vector vs Graph

Imagine teaching an AI system what's "important" and what's "forgettable," not unlike deciding which jokes survive at your next stand-up set. In RAG (Retrieval-Augmented Generation) pipelines, "reinforcement learning" isn't just about reward functions - it can also describe how our system **reinforces** or **expires** knowledge. Two dominant paradigms emerge:

1. **Vector-based memory** (think embedding indexes in FAISS or Pinecone).
2. **Graph-based memory** (our Neo4j + spaCy solution with expiring `MENTIONS` edges).

Let's dive into how each handles the "yes/no" decision process, and why the graph approach offers fine-grained control over your AI's short-term and long-term facts.

#### Vector-Based Memory: The Quick and Dirty Cache

* **Mechanism**: Every new fact is converted into a fixed-length embedding, then appended to a vector index.
* **Reinforcement Analogy**: Accept a fact? Push its vector. Reject it? Don't.
* **Expiry**: Often you must delete vectors manually or rebuild your index to "forget." A raw index typically stores no timestamp per embedding, so you're juggling IDs and rebuilds, and hoping you tracked them properly.
* **Pros**: Blazing-fast similarity searches; turnkey solutions in FAISS, Annoy, Pinecone.
* **Cons**: Coarse control - deleting or promoting facts is a blunt operation. No built-in audit trail.

> **Real-world quip**: It's like tossing important receipts into a shredder when they expire - you can do it, but you have to remember which bin you used.
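
To see what that bookkeeping looks like, here is a toy, plain-Python stand-in for a vector index with a side table of expirations. It is not the FAISS, Annoy, or Pinecone API. The class `ExpiringVectorStore` and its methods are hypothetical, and a real index would need its own delete or rebuild step where this sketch simply removes a dictionary entry.

```python
import math
import time


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ExpiringVectorStore:
    """Toy in-memory store: vectors by ID, plus a side table of expirations."""

    def __init__(self):
        self._vectors = {}     # id -> list of floats
        self._expiration = {}  # id -> Unix seconds, or None for permanent

    def add(self, vec_id, vector, ttl_seconds=None, now=None):
        now = time.time() if now is None else now
        self._vectors[vec_id] = list(vector)
        self._expiration[vec_id] = None if ttl_seconds is None else now + ttl_seconds

    def purge_expired(self, now=None):
        """Delete every vector whose expiration has passed; return the removed IDs."""
        now = time.time() if now is None else now
        expired = [
            vec_id
            for vec_id, exp in self._expiration.items()
            if exp is not None and exp <= now
        ]
        for vec_id in expired:
            del self._vectors[vec_id]
            del self._expiration[vec_id]
        return expired

    def search(self, query, top_k=3, now=None):
        """Return the IDs of the top_k most similar unexpired vectors."""
        self.purge_expired(now)
        scored = sorted(
            ((cosine(query, vec), vec_id) for vec_id, vec in self._vectors.items()),
            reverse=True,
        )
        return [vec_id for _, vec_id in scored[:top_k]]
```

Notice that "forgetting" here destroys the vector itself, which is the blunt operation the cons above describe.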

#### Graph-Based Memory: Precision and Auditability

Our Neo4j + spaCy + llama-cpp setup brings structure to the party:

1. **Document Nodes** (`:Document`): Every stored fact - be it short-term or long-term - lives here.

2. **Entity Nodes** (`:Entity`): Extracted by spaCy NER, these anchor the "who," "what," and "where."

3. **MENTIONS Relationships** (`:Document`→`:Entity`):

   * **`expiration` property** (Unix timestamp):

     * **Short-term**: `expiration = now + 24h`.
     * **Long-term**: no `expiration` property (it reads as `NULL` in Cypher).

   * **Promote to Long-Term**: simply **remove** the `expiration` field.
   * **Force-Expire**: set `expiration` to two days ago, effectively hiding it from any future queries.

4. **RAG Queries**:

   * Match only `MENTIONS` where `expiration IS NULL` **or** `expiration > now`.
   * Build context snippets and feed them to LLaMA via llama-cpp.
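
The lifecycle above can be sketched without a database. This is a minimal plain-Python model, not the workshop's Neo4j code: each `MENTIONS` relationship is represented by a dict, and the helper names (`new_short_term`, `promote`, `force_expire`, `is_retrievable`) are chosen here for illustration only.

```python
DAY = 24 * 60 * 60  # seconds


def new_short_term(now):
    """A short-term mention expires 24 hours after it is created."""
    return {"expiration": now + DAY}


def promote(mention):
    """Promote to long-term by removing the expiration property."""
    mention.pop("expiration", None)


def force_expire(mention, now, seconds_ago=2 * DAY):
    """Hide the mention by moving its expiration into the past."""
    mention["expiration"] = now - seconds_ago


def is_retrievable(mention, now):
    """Same rule as the RAG query: no expiration, or expiration still ahead."""
    expiration = mention.get("expiration")
    return expiration is None or expiration > now


if __name__ == "__main__":
    now = 1_000_000.0
    mention = new_short_term(now)
    print("fresh:", is_retrievable(mention, now))
    print("after 25h:", is_retrievable(mention, now + 25 * 3600))
    promote(mention)
    print("promoted, 10 days later:", is_retrievable(mention, now + 10 * DAY))
    force_expire(mention, now)
    print("force-expired:", is_retrievable(mention, now))
```

In every transition only the relationship's property changes. The document and entity on either end are never modified.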

#### Why This Matters

* **Granular Control**: Pinpoint which facts you want to expire or promote without touching the nodes themselves.
* **Audit Trail**: Documents and entities remain intact - perfect for compliance or later analysis.
* **Unified Retrieval**: A single Cypher query handles both short- and long-term facts seamlessly.

# Workshop End!

Now that you understand how Reinforcement Learning works in relation to Graph-based RAG implementations, you have completed this workshop.