# RAG Failure #14: The Temporal Sequence Scramble

## The Problem
LLMs reason poorly about time. They tend to show **Recency Bias** (favoring whatever they read last) and **Presentation Bias** (treating the order of text in the prompt as the chronological order).

In RAG, the Retriever sends documents to the LLM sorted by **Relevance Score**, not by **Timestamp**.
If you ask "What caused the server crash?", the Retriever will typically rank the "Server Crash" log first (because it matches the query best) and the "Root Cause" log much lower. The LLM reads the effect *before* the cause and often fails to reconstruct the timeline correctly.
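The difference between the two orderings can be shown in a few lines. This is a minimal sketch, not LangChain code: the `Chunk` class, its fields, and the scores and dates are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Chunk:
    text: str
    score: float         # hypothetical similarity score (higher = more relevant)
    timestamp: datetime  # when the logged event happened


def by_relevance(chunks):
    """Order chunks the way a typical retriever returns them."""
    return sorted(chunks, key=lambda c: c.score, reverse=True)


def by_time(chunks):
    """Order chunks chronologically before building the prompt."""
    return sorted(chunks, key=lambda c: c.timestamp)


# Made-up example values: the effect matches the query better than the cause.
chunks = [
    Chunk("Root cause: disk filled up", 0.41, datetime(2024, 1, 1, 8, 0)),
    Chunk("Server crash detected", 0.93, datetime(2024, 1, 1, 8, 30)),
]

relevance_order = [c.text for c in by_relevance(chunks)]
time_order = [c.text for c in by_time(chunks)]
print("By relevance:", relevance_order)
print("By time:     ", time_order)
```

Re-sorting by time only helps when every chunk carries a reliable timestamp, and it does nothing about events the retriever failed to return at all.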

## The Scenario: Cyber Security Incident Response
**Query:** "What was the immediate trigger event that occurred right before the **Main Database Encryption**?"

**The Log Data (The Attack Chain):**
1.  **08:00 AM:** Phishing Email clicked by Admin.
2.  **08:15 AM:** Malware 'Silent-Bot' installed in background.
3.  **08:45 AM:** Privileged Access Escalation detected.
4.  **09:00 AM (Target):** **Main Database Encryption** initiated by attacker.
5.  **09:30 AM:** Ransom Note displayed on screens.

**Naive RAG Failure (Full Context):**
The Retriever ranks "Database Encryption" and "Ransom Note" at the top and pushes the early-stage events (the phishing click and the malware install) to the bottom.
The LLM sees the Ransom Note right next to the Encryption log and often says: *"The Ransom Note triggered the Encryption."* (Reversing cause and effect).

**KG Solution:** We build a **Time-Series Graph**. Nodes are events, and each `NEXT` edge points from an event to the one that follows it in time. To answer the query, we find the "Encryption" node and look backwards via its `predecessors`.

In [None]:
# --- Step 1: Environment Setup ---
!pip install -q langchain langchain-community langchain-huggingface faiss-cpu networkx transformers sentence-transformers accelerate bitsandbytes dateparser

In [None]:
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from langchain_huggingface import HuggingFacePipeline, HuggingFaceEmbeddings
import networkx as nx
import dateparser

# --- Step 2: Load Model ---
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

print(f"Loading {model_id}...")
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

pipe = pipeline(
    "text-generation", 
    model=model, 
    tokenizer=tokenizer, 
    max_new_tokens=256, 
    temperature=0.1, 
    do_sample=True
)

llm = HuggingFacePipeline(pipeline=pipe)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
print("Model loaded. Pipeline ready.")

Loading TinyLlama/TinyLlama-1.1B-Chat-v1.0...
Model loaded. Pipeline ready.


In [None]:
from langchain_core.documents import Document  # current import path; langchain.docstore is deprecated

# --- Step 3: Simulate The Attack Chain ---
# Each log has a strict timestamp.
raw_texts = [
    "[08:00:00] SECURITY ALERT: User 'Admin' clicked a suspicious link in Subject: 'Invoice Update'. Source IP: 192.168.1.5.",
    "[08:15:00] SYSTEM PROCESS: Unknown process 'Silent-Bot.exe' started in background. CPU usage nominal.",
    "[08:45:00] AUTH LOG: User 'Admin' privileges escalated to ROOT via CVE-2024-99 exploit.",
    "[09:00:00] DATABASE LOG: Main Database Encryption sequence initiated. AES-256 keys generated. Writing to disk.",
    "[09:30:00] UI ALERT: Desktop wallpaper changed to 'Ransom_Note.png'. Message: 'Pay 5 BTC'."
]

docs = [Document(page_content=t) for t in raw_texts]
print(f"Created {len(docs)} Timestamped Logs.")
for i, d in enumerate(docs):
    print(f"Doc {i+1}: {d.page_content}")

Created 5 Timestamped Logs.
Doc 1: [08:00:00] SECURITY ALERT: User 'Admin' clicked a suspicious link in Subject: 'Invoice Update'. Source IP: 192.168.1.5.
Doc 2: [08:15:00] SYSTEM PROCESS: Unknown process 'Silent-Bot.exe' started in background. CPU usage nominal.
Doc 3: [08:45:00] AUTH LOG: User 'Admin' privileges escalated to ROOT via CVE-2024-99 exploit.
Doc 4: [09:00:00] DATABASE LOG: Main Database Encryption sequence initiated. AES-256 keys generated. Writing to disk.
Doc 5: [09:30:00] UI ALERT: Desktop wallpaper changed to 'Ransom_Note.png'. Message: 'Pay 5 BTC'.


In [None]:
# --- Step 4: Naive RAG (Full Retrieval) ---
from langchain_community.vectorstores import FAISS

print("\n--- NAIVE RAG (Full Context / Jumbled Order) ---")
query = "What was the immediate trigger event that occurred right before the Main Database Encryption?"
print(f"Query: {query}")

# 1. Indexing
vectorstore = FAISS.from_documents(docs, embeddings)

# 2. Retrieval
# CRITICAL: Vector stores return results ranked by SIMILARITY score, not by time.
# Logs that share vocabulary with the query (e.g. "Main Database Encryption") rank highest.
# The "Phishing Email" log shares almost no vocabulary with the query, so it ranks low.
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
retrieved_docs = retriever.invoke(query)

print("\nRetrieved Context (k=5 - Sorted by Semantic Relevance, NOT Time):")
context_str = ""
for i, d in enumerate(retrieved_docs):
    print(f"{i+1}. {d.page_content}")
    context_str += d.page_content + "\n"

# 3. Generation
prompt = f"<|system|>\nAnswer the question based on the context.\n<|user|>\nContext:\n{context_str}\nQuestion:\n{query}\n<|assistant|>"
response = llm.invoke(prompt)
cleaned_response = response.split("<|assistant|>")[-1].strip()

print("\nLLM Answer:")
print(cleaned_response)


--- NAIVE RAG (Full Context / Jumbled Order) ---
Query: What was the immediate trigger event that occurred right before the Main Database Encryption?

Retrieved Context (k=5 - Sorted by Semantic Relevance, NOT Time):
1. [09:00:00] DATABASE LOG: Main Database Encryption sequence initiated. AES-256 keys generated. Writing to disk.
2. [09:30:00] UI ALERT: Desktop wallpaper changed to 'Ransom_Note.png'. Message: 'Pay 5 BTC'.
3. [08:45:00] AUTH LOG: User 'Admin' privileges escalated to ROOT via CVE-2024-99 exploit.
4. [08:00:00] SECURITY ALERT: User 'Admin' clicked a suspicious link in Subject: 'Invoice Update'. Source IP: 192.168.1.5.
5. [08:15:00] SYSTEM PROCESS: Unknown process 'Silent-Bot.exe' started in background. CPU usage nominal.

LLM Answer:
Based on the context, the immediate trigger event right before the Main Database Encryption (09:00:00) was the UI ALERT at 09:30:00 where the wallpaper changed to 'Ransom_Note.png'.

ANALYSIS:
FAILURE. The Retriever ordered the documents by '
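How scrambled the retrieved context is can be quantified by counting inversions, that is, pairs of documents that appear in the wrong chronological order. The sketch below is an illustrative check, not part of the original pipeline; the timestamps are copied from the retrieved context printed above, and `count_inversions` is a hypothetical helper name.

```python
from datetime import time
from itertools import combinations


def count_inversions(times):
    """Count pairs where an earlier list position holds a later timestamp."""
    return sum(1 for a, b in combinations(times, 2) if a > b)


# Timestamps in the order the retriever returned them in the run above.
retrieved_times = [time(9, 0), time(9, 30), time(8, 45), time(8, 0), time(8, 15)]

inversions = count_inversions(retrieved_times)
total_pairs = len(retrieved_times) * (len(retrieved_times) - 1) // 2
print(f"{inversions} of {total_pairs} pairs are out of chronological order")
```

A count of zero means the context already reads as a timeline; anything higher means the LLM has to re-order events on its own.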

In [None]:
# --- Step 5: Event Sourcing (Graph Construction) ---
# We extract events and timestamps. 
# Then we use PYTHON to sort them (Reliable). We do not let the LLM guess the order.

kg = nx.DiGraph()
events_buffer = []

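def extract_event(text):
    """
    Ask the LLM for 'Event_Name | Time' and return the parts as a list.
    Returns an empty list if no line of the output contains a '|' separator.
    """
    prompt = f"""<|system|>
    Extract the Event description and the Time.
    Format: Event_Name | Time
    <|user|>
    Text: {text}
    <|assistant|>"""
    
    raw = llm.invoke(prompt)
    out = raw.split("<|assistant|>")[-1].strip()
    # Small models often add extra lines; parse only the first line with a separator.
    for line in out.splitlines():
        if "|" in line:
            return [p.strip() for p in line.split("|")]
    return []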

print("\n--- EVENT SOURCING PIPELINE ---")

for doc in docs:
    print(f"\nProcessing: {doc.page_content}")
    parts = extract_event(doc.page_content)
    
    if len(parts) >= 2:
        name, time_str = parts[0], parts[1]
        print(f"   [Extracted]: {name} | {time_str}")
        
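        # Use dateparser to get a real, sortable datetime object
        dt = dateparser.parse(time_str)
        if dt:
            events_buffer.append({"name": name, "time": dt})
        else:
            # Surface parse failures: a silently dropped event would break the chain
            print(f"   [Skipped]: could not parse time '{time_str}'")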

# CRITICAL STEP: Programmatic Sorting
sorted_events = sorted(events_buffer, key=lambda x: x['time'])

print("\nBuilding Causal Chain (Sorted by Time)...")
for i in range(len(sorted_events) - 1):
    curr = sorted_events[i]['name']
    next_e = sorted_events[i+1]['name']
    
    kg.add_edge(curr, next_e, relation="NEXT")
    print(f"   ({curr}) --[NEXT]-> ({next_e})")


--- EVENT SOURCING PIPELINE ---

Processing: [08:00:00] SECURITY ALERT: User 'Admin' clicked a suspicious link in Subject: 'Invoice Update'. Source IP: 192.168.1.5.
   [Extracted]: Admin Clicked Phishing Link | 08:00:00

Processing: [08:15:00] SYSTEM PROCESS: Unknown process 'Silent-Bot.exe' started in background. CPU usage nominal.
   [Extracted]: Silent-Bot Process Started | 08:15:00

Processing: [08:45:00] AUTH LOG: User 'Admin' privileges escalated to ROOT via CVE-2024-99 exploit.
   [Extracted]: Admin Escalated to ROOT | 08:45:00

Processing: [09:00:00] DATABASE LOG: Main Database Encryption sequence initiated. AES-256 keys generated. Writing to disk.
   [Extracted]: Main Database Encryption | 09:00:00

Processing: [09:30:00] UI ALERT: Desktop wallpaper changed to 'Ransom_Note.png'. Message: 'Pay 5 BTC'.
   [Extracted]: Ransom Note Displayed | 09:30:00

Building Causal Chain (Sorted by Time)...
   (Admin Clicked Phishing Link) --[NEXT]-> (Silent-Bot Process Started)
   (Silent-Bo
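Because every log line in this scenario starts with a strict `[HH:MM:SS]` prefix, the timestamp itself can also be read without the LLM. The following is a hedged alternative sketch, not the notebook's method: `LOG_PREFIX` and `parse_log_line` are hypothetical names, and the approach assumes all logs share this exact prefix format.

```python
import re
from datetime import datetime

# Matches lines such as "[08:45:00] AUTH LOG: ..." (assumed log format).
LOG_PREFIX = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s*(.*)$")


def parse_log_line(line):
    """Return (time, message) for '[HH:MM:SS] message' lines, else None."""
    match = LOG_PREFIX.match(line)
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match.group(1), "%H:%M:%S").time()
    except ValueError:
        # Well-formed brackets but an impossible time such as 25:00:00.
        return None
    return stamp, match.group(2)


sample_lines = [
    "[09:00:00] DATABASE LOG: Main Database Encryption sequence initiated. AES-256 keys generated. Writing to disk.",
    "[08:45:00] AUTH LOG: User 'Admin' privileges escalated to ROOT via CVE-2024-99 exploit.",
    "malformed line without a timestamp",
]

parsed = [p for p in map(parse_log_line, sample_lines) if p is not None]
parsed.sort(key=lambda pair: pair[0])  # chronological order
for stamp, message in parsed:
    print(stamp, message)
```

The LLM would then only be needed to produce a short event name, while the ordering rests entirely on deterministic parsing.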

In [None]:
# --- Step 6: The Solution (Predecessor Traversal) ---
# We find the target event and look BACKWARDS in the graph.

print("\n--- GRAPH PREDECESSOR TRAVERSAL ---")
print(f"Query: \"{query}\"")

def find_trigger_event(target_keyword):
    # 1. Match Target Node (case-insensitive substring match; first hit wins)
    print("\n1. Locating Target Event in Graph...")
    target_node = None
    for node in kg.nodes():
        if target_keyword.lower() in node.lower():
            target_node = node
            break
            
    if not target_node:
        return "Target event not found in logs."
    print(f"   Target identified: '{target_node}'")
    
    # 2. Get Predecessors (Reverse Traversal)
    print("\n2. Performing Reverse Look-up (Finding Predecessors)...")
    predecessors = list(kg.predecessors(target_node))
    
    if not predecessors:
        return "This appears to be the first event (Root Cause)."
    
    # The graph is a linear chain built from time-sorted events,
    # so each node has at most one predecessor.
    trigger = predecessors[0]
    print(f"   Found Predecessor: '{trigger}'")
    
    return f"The event that immediately triggered the {target_node} was: {trigger}."

# We search for "Encryption"
final_answer = find_trigger_event("Encryption")

print(f"\nFinal Answer (Generated from Graph Logic):\n{final_answer}")


--- GRAPH PREDECESSOR TRAVERSAL ---
Query: "What was the immediate trigger event that occurred right before the Main Database Encryption?"

1. Locating Target Event in Graph...
   Target identified: 'Main Database Encryption'

2. Performing Reverse Look-up (Finding Predecessors)...
   Found Predecessor: 'Admin Escalated to ROOT'

Final Answer (Generated from Graph Logic):
The event that immediately triggered the Main Database Encryption was: Admin Escalated to ROOT.
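The same backward walk can be repeated until no predecessor remains, which recovers the whole attack path rather than only the immediate trigger. The sketch below is an assumption-laden illustration: it uses a plain predecessor dictionary instead of the notebook's `kg` so that it runs standalone, `chain_back_to_root` is a hypothetical helper, and the event names are the ones extracted in Step 5. With the graph, the lookup inside the loop would be `list(kg.predecessors(node))`.

```python
def chain_back_to_root(prev_event, target):
    """Walk from `target` back to the earliest event, newest first."""
    chain = [target]
    seen = {target}
    while chain[-1] in prev_event:
        earlier = prev_event[chain[-1]]
        if earlier in seen:
            raise ValueError("cycle detected; this is not a timeline")
        seen.add(earlier)
        chain.append(earlier)
    return chain


# Event names as extracted in Step 5, already sorted by time.
ordered_events = [
    "Admin Clicked Phishing Link",
    "Silent-Bot Process Started",
    "Admin Escalated to ROOT",
    "Main Database Encryption",
    "Ransom Note Displayed",
]

# Map each event to the one directly before it (the first event has no entry).
prev_event = dict(zip(ordered_events[1:], ordered_events[:-1]))

attack_path = chain_back_to_root(prev_event, "Main Database Encryption")
print(" <- ".join(attack_path))
```

Note that the chain encodes temporal order only. Whether each step actually caused the next is an inference the analyst still has to confirm.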
