# 🕯️ Candlekeep — Dungeon Master (DM) Kit Notebook for Forgotten Realms using LangChain

This notebook runs the end‑to‑end demo for the Candlekeep DM Kit, a prep helper for Forgotten Realms campaigns. It includes:
- A local gazetteer searched with BM25
- RAG ingestion (web → chunks → embeddings → Chroma) with citations
- A SQLite rules DB for items, spells, and monsters
- DM-prep templates for quests, NPCs, encounters, and treasure/rewards

## Install dependencies (as needed)

In [2]:
pip install keras


In [22]:
pip -q install langchain langchain-community chromadb sentence-transformers beautifulsoup4 lxml requests rank-bm25 pandas tabulate pydantic rich sqlite-utils

Note: you may need to restart the kernel to use updated packages.


## RAG URL Allow List

In [80]:
USE_VECTOR_RAG = True
SAFE_URLS = [
    'https://en.wikipedia.org/wiki/Waterdeep',
    'https://en.wikipedia.org/wiki/Baldur%27s_Gate',
    'https://en.wikipedia.org/wiki/Forgotten_Realms',
    'https://forgottenrealms.fandom.com/wiki/Harpers',
    'https://forgottenrealms.fandom.com/wiki/History_of_Waterdeep',
    'https://www.dndbeyond.com/classes?srsltid=AfmBOor2dDwPFSfeOnvG8YlbwNsadKZ7CSOT9gGg8Z6JwZgMPzXSCseZ',
]
print('Vector RAG URLs:', len(SAFE_URLS))

Vector RAG URLs: 6
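
The list above is only a set of URLs; nothing in it checks where a new entry points. A minimal sketch of such a check, assuming the intent is to restrict ingestion to known HTTPS hosts. The names `ALLOWED_HOSTS` and `is_allowed` are hypothetical and not part of this notebook.

```python
from urllib.parse import urlparse

# Hypothetical host allow list, derived from the SAFE_URLS entries above.
ALLOWED_HOSTS = {
    "en.wikipedia.org",
    "forgottenrealms.fandom.com",
    "www.dndbeyond.com",
}

def is_allowed(url: str) -> bool:
    """Return True only for https URLs whose host is on the allow list."""
    parts = urlparse(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS

candidates = [
    "https://en.wikipedia.org/wiki/Waterdeep",
    "http://en.wikipedia.org/wiki/Waterdeep",   # rejected: not https
    "https://example.com/wiki/Waterdeep",       # rejected: host not listed
]
print([u for u in candidates if is_allowed(u)])
```

Filtering `SAFE_URLS` through a check like this before loading would catch an accidental paste from an unexpected site.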


## Imports and Path Configuration

In [81]:
import os, sqlite3, textwrap, random, json, shutil
from pathlib import Path
from typing import List, Dict, Any

import pandas as pd
from tabulate import tabulate
from rich import print as rprint
from rich.panel import Panel

from rank_bm25 import BM25Okapi

try:
    from langchain_community.document_loaders import WebBaseLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from langchain_community.vectorstores import Chroma
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.retrievers import BM25Retriever as LCBM25
    LC_OK = True
except Exception:
    WebBaseLoader = None
    RecursiveCharacterTextSplitter = None
    Chroma = None
    HuggingFaceEmbeddings = None
    LCBM25 = None
    LC_OK = False

from rich.console import Console
from rich.markdown import Markdown
from rich.text import Text

console = Console()

BASE = Path.cwd()
DATA_DIR = BASE / 'data'
LORE_DIR = DATA_DIR / 'lore'
RULES_DIR = DATA_DIR / 'rules'
DB_DIR = BASE / 'db'
INDEX_DIR = BASE / 'index' / 'chroma'
for p in (DB_DIR, DATA_DIR, LORE_DIR, RULES_DIR, INDEX_DIR.parent):
    p.mkdir(parents=True, exist_ok=True)

SQLITE_PATH = DB_DIR / 'fr_rules.sqlite'
random.seed(42)
rprint(Panel.fit(f'Data dir: {DATA_DIR}\nDB: {SQLITE_PATH}\nIndex dir: {INDEX_DIR}', title='Paths'))

## Initial Lore Corpus

In [82]:
GAZETTEER = {
  'factions_harpers.md': (
    '# The Harpers (14th Century DR)\n'
    'The Harpers are a semi-secret network devoted to promoting good, preserving lore, and maintaining balance.\n'
    'They prefer subtle influence over open rule and often work through bards, sages, and sympathetic officials.\n'
    'Their aims frequently put them at odds with the Zhentarim. In cities such as Waterdeep and Baldur\'s Gate,\n'
    'Harper agents cultivate informants and steer events away from tyranny.'
  ),
  'factions_zhentarim.md': (
    '# The Zhentarim (Black Network)\n'
    'A ruthless mercantile syndicate that seeks profit and power by controlling trade routes and security.\n'
    'They hire mercenaries, corrupt officials, and smugglers to expand influence. The Harpers often oppose their methods.'
  ),
  'city_waterdeep.md': (
    '# Waterdeep, the City of Splendors\n'
    'A metropolis of trade, politics, and masked lords. Competing factions—Harpers, Lords\' Alliance, and the Zhentarim—\n'
    'vie for advantage beneath the watchful City Guard. Heists and covert negotiations are common.'
  ),
  'city_baldurs_gate.md': (
    '# Baldur\'s Gate\n'
    'A bustling port rife with merchant intrigue and smuggler dens. Law exists but influence and coin shape outcomes.\n'
    'Factions operate from warehouses and taverns along the docks; secrets travel as quickly as ships.'
  ),
  'regions_sword_coast.md': (
    '# Sword Coast Overview\n'
    'City-states like Neverwinter, Waterdeep, and Baldur\'s Gate anchor trade along treacherous roads and sea lanes.\n'
    'Caravans, guilds, and clandestine groups compete for contracts, while adventurers solve problems others cannot.'
  )
}
for fn, content in GAZETTEER.items():
    p = LORE_DIR / fn
    if not p.exists():
        p.write_text(content, encoding='utf-8')

docs_texts, metas = [], []
for path in LORE_DIR.glob('*.md'):
    docs_texts.append(path.read_text(encoding='utf-8'))
    metas.append({'source': path.name})
rprint(Panel.fit(f'Local lore files: {len(docs_texts)}', title='Gazetteer'))
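
The pure BM25 fallback later in this notebook tokenizes these texts with `str.split()`, so a word followed by punctuation (`Zhentarim.`) and its lowercase form (`zhentarim`) count as different terms. A small sketch of a normalizing tokenizer that could be used instead; `tokenize` is a hypothetical helper, not something the notebook defines.

```python
import re
from typing import List

def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

line = "Their aims frequently put them at odds with the Zhentarim."
print(line.split()[-1])    # whitespace split keeps the trailing period
print(tokenize(line)[-1])  # normalized token without punctuation
```

The same function would need to be applied to both the corpus and the query so that the term sets line up.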

## Build RAG Lore Retriever

In [83]:
retriever_mode = 'BM25'
bm25_fallback = None
vector_vs = None
vector_retriever = None

def build_bm25():
    global bm25_fallback, retriever_mode
    try:
        if LC_OK and LCBM25 is not None:
            bm25 = LCBM25.from_texts(docs_texts, metadatas=metas)
            bm25.k = 5
            retriever_mode = 'LC_BM25'
            return bm25
    except Exception:
        pass
    # Pure rank_bm25 fallback (BM25Okapi is already imported at the top of the notebook)
    bm25_fallback = BM25Okapi([t.split() for t in docs_texts])
    retriever_mode = 'BM25'
    return None

bm25_retriever = None

def _build_vectorstore_persistent(web_chunks, local_chunks):
    """Try to build a persistent Chroma store under INDEX_DIR."""
    from chromadb.config import Settings
    emb = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
    if INDEX_DIR.exists():
        shutil.rmtree(INDEX_DIR, ignore_errors=True)
    INDEX_DIR.mkdir(parents=True, exist_ok=True)
    client_settings = Settings(
        anonymized_telemetry=False,
        is_persistent=True,
        persist_directory=str(INDEX_DIR)
    )
    vs = Chroma.from_documents(
        documents=(web_chunks + local_chunks),
        embedding=emb,
        persist_directory=str(INDEX_DIR),
        client_settings=client_settings,
    )
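    # Chroma 0.4+ writes to disk automatically; call persist() only where the wrapper still provides it.
    if hasattr(vs, 'persist'):
        vs.persist()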
    return vs

def _build_vectorstore_memory(web_chunks, local_chunks):
    """Fallback: in-memory Chroma (no disk writes)."""
    emb = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
    return Chroma.from_documents(documents=(web_chunks + local_chunks), embedding=emb)

if USE_VECTOR_RAG and LC_OK and WebBaseLoader and RecursiveCharacterTextSplitter and Chroma and HuggingFaceEmbeddings:
    try:
        # 1) Load web docs
        loader = WebBaseLoader(web_paths=SAFE_URLS)
        raw_docs = loader.load()
        splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=120)
        web_chunks = splitter.split_documents(raw_docs)
        for d in web_chunks:
            d.metadata["source"] = (
                d.metadata.get("source_url")
                or d.metadata.get("source")
                or d.metadata.get("title")
                or "web"
            )

        # 2) Split local gazetteer
        from langchain_core.documents import Document
        local_chunks = []
        for i, text in enumerate(docs_texts):
            parts = splitter.split_text(text)
            src = metas[i]['source']
            local_chunks.extend([Document(page_content=ch, metadata={'source': src}) for ch in parts])

        # 3) Try persistent first
        try:
            vector_vs = _build_vectorstore_persistent(web_chunks, local_chunks)
            retriever_mode = 'VECTOR_RAG'
            rprint(Panel.fit('Built Chroma index (persistent) with web + local lore.', title='RAG Ready'))
        except Exception as e_persist:
            rprint(Panel.fit(f'Persistent index failed ({e_persist}); retrying in-memory.', title='RAG Warning'))
            # 4) Fallback: in-memory
            vector_vs = _build_vectorstore_memory(web_chunks, local_chunks)
            retriever_mode = 'VECTOR_RAG_MEM'
            rprint(Panel.fit('Built Chroma index (in-memory) with web + local lore.', title='RAG Ready (Memory)'))

        # Prepare a retriever with MMR if available
        try:
            vector_retriever = vector_vs.as_retriever(
                search_type="mmr",
                search_kwargs={"k": 5, "fetch_k": 25}
            )
        except Exception:
            vector_retriever = None

    except Exception as e:
        rprint(Panel.fit(f'RAG ingest failed ({e}); falling back to BM25.', title='RAG Error'))
        bm25_retriever = build_bm25()
else:
    bm25_retriever = build_bm25()

def lore_search(query: str) -> List[Dict[str, str]]:
    """Unified lore search across modes; tries vectorstore APIs that are stable across LC versions."""
    # VECTOR RAG (persistent or in-memory)
    if retriever_mode in ('VECTOR_RAG', 'VECTOR_RAG_MEM') and vector_vs is not None:
        try:
            if hasattr(vector_vs, "max_marginal_relevance_search"):
                docs = vector_vs.max_marginal_relevance_search(query, k=5, fetch_k=25)
            elif hasattr(vector_vs, "similarity_search"):
                docs = vector_vs.similarity_search(query, k=5)
            elif vector_retriever is not None and hasattr(vector_retriever, "invoke"):
                docs = vector_retriever.invoke(query)
            elif vector_retriever is not None and hasattr(vector_retriever, "get_relevant_documents"):
                docs = vector_retriever.get_relevant_documents(query)
            else:
                docs = []

            out = [
                {"rank": i+1, "source": d.metadata.get("source", "unknown"), "text": d.page_content.strip()}
                for i, d in enumerate(docs)
            ]

            # ensure at least one URL appears if available
            if not any(str(x["source"]).startswith("http") for x in out):
                if hasattr(vector_vs, "similarity_search_with_score"):
                    scored = vector_vs.similarity_search_with_score(query, k=20)
                    web_hits = [(d, s) for (d, s) in scored if str(d.metadata.get("source","")).startswith("http")]
                    if web_hits:
                        # Chroma scores are distances, so the lowest score is the closest match.
                        best_web, _ = min(web_hits, key=lambda t: t[1])
                        if out:
                            out[-1] = {"rank": out[-1]["rank"],
                                       "source": best_web.metadata.get("source","unknown"),
                                       "text": best_web.page_content.strip()}
                        else:
                            out = [{"rank": 1,
                                    "source": best_web.metadata.get("source","unknown"),
                                    "text": best_web.page_content.strip()}]
            return out
        except Exception:
            pass  # fall through to BM25

    # LC BM25 retriever
    if retriever_mode == 'LC_BM25' and bm25_retriever is not None:
        # Prefer invoke(); get_relevant_documents() is deprecated in newer LangChain releases.
        if hasattr(bm25_retriever, 'invoke'):
            docs = bm25_retriever.invoke(query)
        elif hasattr(bm25_retriever, 'get_relevant_documents'):
            docs = bm25_retriever.get_relevant_documents(query)
        else:
            docs = []
        return [
            {'rank': i+1,
             'source': getattr(d, 'metadata', {}).get('source', 'unknown') if hasattr(d, 'metadata') else 'unknown',
             'text': (getattr(d, 'page_content', str(d)) or '').strip()}
            for i, d in enumerate(docs)
        ]

    # Pure BM25 fallback (BM25Okapi is already imported at the top of the notebook)
    global bm25_fallback
    if bm25_fallback is None:
        bm25_fallback = BM25Okapi([t.split() for t in docs_texts])
    scores = bm25_fallback.get_scores(query.split())
    ranked = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)[:5]
    return [
        {'rank': i+1, 'source': metas[idx]['source'], 'text': docs_texts[idx].strip()}
        for i, (idx, _) in enumerate(ranked)
    ]

rprint(Panel.fit(f'Retriever mode: {retriever_mode}', title='Lore Search Mode'))
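
The vector path above asks for MMR (maximal marginal relevance) results with `k=5` and `fetch_k=25`. As a rough illustration of what MMR does, the sketch below greedily picks candidates that are relevant to the query but not redundant with what has already been picked. It is a toy, pure-Python version on 2-D vectors, not LangChain's or Chroma's implementation; `cosine`, `mmr`, and the weight `lambda_mult=0.5` are assumptions made for the example.

```python
import math
from typing import List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query: Sequence[float], candidates: List[Sequence[float]],
        k: int = 2, lambda_mult: float = 0.5) -> List[int]:
    """Return indices of k candidates chosen greedily by marginal relevance."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Redundancy is the similarity to the closest already-selected item.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy "embeddings": candidates 0 and 1 are near-duplicates, 2 points elsewhere.
query = [1.0, 0.3]
candidates = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]
print(mmr(query, candidates, k=2))                   # diversity-aware pick
print(mmr(query, candidates, k=2, lambda_mult=1.0))  # pure relevance ranking
```

With `lambda_mult=1.0` the redundancy term disappears and the result is an ordinary similarity ranking, which returns both near-duplicates; at `0.5` the second pick moves to the dissimilar candidate.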

## Bundled Sample of Rules SQLite Data

In [84]:
items = pd.DataFrame([
    {'name':'Potion of Healing','type':'potion','rarity':'common','weight':0.5,'cost':50,'text':'Regain 2d4+2 HP when drunk.'},
    {'name':'Cloak of Elvenkind','type':'wondrous','rarity':'uncommon','weight':1.0,'cost':0,'text':'Advantage on Stealth checks to hide.'},
    {'name':'Shortsword','type':'weapon','rarity':'mundane','weight':2.0,'cost':10,'text':'1d6 piercing, finesse, light.'},
    {'name':'Light Crossbow','type':'weapon','rarity':'mundane','weight':5.0,'cost':25,'text':'1d8 piercing, loading, two-handed.'},
])
spells = pd.DataFrame([
    {'name':'Detect Magic','level':1,'school':'divination','classes':'Bard, Cleric, Wizard','casting_time':'1 action','range':'Self','duration':'10 minutes','components':'V,S','text':'Sense presence of magic within 30 feet.'},
    {'name':'Silence','level':2,'school':'illusion','classes':'Bard, Cleric','casting_time':'1 action','range':'120 feet','duration':'10 minutes','components':'V,S','text':'No sound can be created within or pass through a 20-foot-radius sphere.'},
    {'name':'Pass without Trace','level':2,'school':'abjuration','classes':'Druid, Ranger','casting_time':'1 action','range':'Self','duration':'1 hour','components':'V,S,M','text':'+10 bonus to Stealth checks to you and companions.'},
])
monsters = pd.DataFrame([
    {'name':'Bandit','type':'humanoid','size':'Medium','alignment':'any non-lawful','ac':12,'hp':11,'speed':'30 ft.','cr':0.125,'text':'Thug working for a gang or syndicate.'},
    {'name':'Guard','type':'humanoid','size':'Medium','alignment':'any','ac':16,'hp':11,'speed':'30 ft.','cr':0.125,'text':'City or caravan guard.'},
    {'name':'Spy','type':'humanoid','size':'Medium','alignment':'any','ac':12,'hp':27,'speed':'30 ft.','cr':1.0,'text':'Agent skilled in deception and stealth.'},
    {'name':'Veteran','type':'humanoid','size':'Medium','alignment':'any','ac':17,'hp':58,'speed':'30 ft.','cr':3.0,'text':'Seasoned warrior, often an officer.'},
])

if SQLITE_PATH.exists():
    SQLITE_PATH.unlink()
conn = sqlite3.connect(SQLITE_PATH)
items.to_sql('items', conn, index=False)
spells.to_sql('spells', conn, index=False)
monsters.to_sql('monsters', conn, index=False)
conn.execute('CREATE INDEX IF NOT EXISTS idx_items_name ON items(name);')
conn.execute('CREATE INDEX IF NOT EXISTS idx_spells_level ON spells(level);')
conn.execute('CREATE INDEX IF NOT EXISTS idx_monsters_cr ON monsters(cr);')
conn.commit()
conn.close()
rprint(Panel.fit('SQLite rules DB built (items, spells, monsters).', title='Rules DB'))

## Utilities

In [85]:
def print_lore_results(passages, max_citations=2):
    """Print the top passage as the answer panel, then up to max_citations cited snippets."""

    if not passages:
        console.print("[bold red]No passages retrieved.[/bold red]")
        return

    # Limit citations
    top_passages = passages[:max_citations]

    # Format the main answer and citations
    console.print("\n[bold underline white]Answer[/bold underline white]\n", justify="center")

    answer_md = Markdown(top_passages[0]['text'])
    console.print(Panel.fit(answer_md, title="[bold yellow]Synthesized Summary[/bold yellow]", border_style="bright_yellow"))

    console.print("\n[dim underline]Top Citations[/dim underline]", justify="center")
    for i, p in enumerate(top_passages, 1):
        src = p["source"]
        snippet = Text(p["text"].strip(), style="dim")
        citation_panel = Panel(
            snippet,
            title=f"[dim]Citation {i}[/dim] — [link={src}]{src}[/link]",
            border_style="grey37",
            expand=False,
            padding=(0, 1),
        )
        console.print(citation_panel)

def summarize_passages(passages: List[Dict[str,str]], max_sents: int = 3) -> str:
    import re
    sents = []
    for p in passages:
        parts = re.split(r"(?<=[.!?])\s+", p['text'].strip())
        if parts:
            sents.append(parts[0])
    return ' '.join(sents[:max_sents])

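def run_sql(query: str) -> pd.DataFrame:
    from contextlib import closing
    # sqlite3's own context manager only commits or rolls back; closing() also closes the connection.
    with closing(sqlite3.connect(SQLITE_PATH)) as c:
        return pd.read_sql_query(query, c)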

def table(df: pd.DataFrame) -> str:
    return tabulate(df, headers='keys', tablefmt='github', showindex=False)
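
`run_sql` takes a finished SQL string, so any caller-supplied value has to be interpolated into the text first. A hedged sketch of the alternative, binding values as `?` parameters with the standard `sqlite3` module. It runs against an in-memory stand-in table rather than `fr_rules.sqlite`, and `query_rows` is a hypothetical helper, not part of this notebook.

```python
import sqlite3
from contextlib import closing
from typing import Any, Dict, List, Tuple

def query_rows(conn: sqlite3.Connection, sql: str,
               params: Tuple[Any, ...] = ()) -> List[Dict[str, Any]]:
    """Run a parameterized SELECT and return the rows as plain dicts."""
    conn.row_factory = sqlite3.Row
    with closing(conn.cursor()) as cur:
        cur.execute(sql, params)
        return [dict(r) for r in cur.fetchall()]

# In-memory stand-in for the monsters table (a subset of the sample rows).
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE monsters (name TEXT, cr REAL)")
    conn.executemany(
        "INSERT INTO monsters VALUES (?, ?)",
        [("Bandit", 0.125), ("Spy", 1.0), ("Veteran", 3.0)],
    )
    rows = query_rows(
        conn,
        "SELECT name, cr FROM monsters WHERE cr >= ? AND cr <= ? ORDER BY cr",
        (1.0, 3.0),
    )
print(rows)
```

Bound parameters are passed to SQLite separately from the statement text, so a value containing quotes cannot change the query's structure.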


## Preset DM Templates

In [86]:
def make_quest(city: str, faction: str, tier: str, citations: List[int]) -> Dict[str, Any]:
    return {
        'title': f'Shadows at the Docks of {city}',
        'tier': tier,
        'premise': f'The {faction} seek discreet help to disrupt a smuggling ring undermining fair trade.',
        'beats': [
            'Meet the contact in a busy tavern and learn recent incidents.',
            'Shadow suspected warehouses along the waterfront.',
            'Expose the ring-leader and secure incriminating manifests.'
        ],
        'complications': [
            'The City Watch is on edge; too much noise draws attention.',
            'A rival fixer tries to hire the party for a double-cross.'
        ],
        'reward': '100 gp each, a minor magical trinket, and faction goodwill.',
        'citations': citations
    }

def make_npc(name: str, role: str, faction: str, citations: List[int]) -> Dict[str, Any]:
    return {
        'name': name,
        'role': role,
        'faction': faction,
        'motivation': 'Keep trade fair and people safe without starting a war.',
        'secret': 'Previously smuggled information for leverage; fears exposure.',
        'quirks': 'Hums old harbor shanties when thinking.',
        'stat_suggestion': 'Spy (MM) or Guard, depending on tone.',
        'citations': citations
    }

def make_encounter(env: str, cr_min: float, cr_max: float, limit: int = 5) -> Dict[str, Any]:
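    # Coerce inputs to numbers before interpolating them, so stray strings cannot alter the SQL.
    cr_min, cr_max, limit = float(cr_min), float(cr_max), int(limit)
    df = run_sql(
        f"""
        SELECT name, type, ac, hp, cr
        FROM monsters
        WHERE cr >= {cr_min} AND cr <= {cr_max} AND type='humanoid'
        ORDER BY cr ASC
        LIMIT {limit};
        """
    )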
    enemies = df.to_dict(orient='records')
    return {
        'environment': env,
        'enemies': enemies,
        'tactics': 'Strike from cover, withdraw to alleys; one scout runs to warn a veteran.',
        'scaling_notes': 'Add a Veteran at high end; remove a Spy at low end.'
    }

def make_treasure(rarity: str = 'uncommon', limit: int = 4) -> Dict[str, Any]:
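    # Escape single quotes and coerce the limit so interpolated values cannot alter the SQL.
    safe_rarity = str(rarity).replace("'", "''")
    df = run_sql(
        f"""
        SELECT name, type, rarity, text
        FROM items
        WHERE rarity='{safe_rarity}' OR rarity='common'
        LIMIT {int(limit)};
        """
    )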
    return {'hoard': df.to_dict(orient='records')}

## Demo of Lore Q&A

In [77]:
question = "What is the Harper stance toward the Zhentarim in Waterdeep, and how might it influence a heist?"
rprint(Panel.fit(question, title="[bold cyan]User Question[/bold cyan]"))

passages = lore_search(question)
summary = summarize_passages(passages, max_sents=3)

rprint(Panel.fit(summary, title="[bold green]Answer[/bold green]"))
print_lore_results(passages, max_citations=2)

In [91]:
question = "Tell me about player classes in Dungeons and Dragons."
rprint(Panel.fit(question, title="[bold cyan]User Question[/bold cyan]"))

passages = lore_search(question)
summary = summarize_passages(passages, max_sents=3)

rprint(Panel.fit(summary, title="[bold green]Answer[/bold green]"))
print_lore_results(passages, max_citations=2)

In [92]:
question = "What are some key locations in Faerun?"
rprint(Panel.fit(question, title="[bold cyan]User Question[/bold cyan]"))

passages = lore_search(question)
summary = summarize_passages(passages, max_sents=3)

rprint(Panel.fit(summary, title="[bold green]Answer[/bold green]"))
print_lore_results(passages, max_citations=2)

## Demo of Rules Lookup

In [62]:
df_rules = run_sql(
    """
    SELECT name, type, ac, hp, cr
    FROM monsters
    WHERE type='humanoid' AND cr >= 1.0 AND cr <= 3.0
    ORDER BY cr ASC
    LIMIT 8;
    """
)
rprint(Panel.fit(table(df_rules), title='Rules Table'))
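
The `cr` column stores challenge ratings as floats (for example `0.125`), while stat blocks usually print them as fractions. A small sketch of a display helper using the standard `fractions` module; `format_cr` is a hypothetical name, and capping the denominator at 8 assumes 1/8 is the smallest fractional rating in the data.

```python
from fractions import Fraction

def format_cr(cr: float) -> str:
    """Render a float challenge rating as a fraction string such as '1/8'."""
    return str(Fraction(cr).limit_denominator(8))

print([format_cr(x) for x in (0.125, 0.25, 0.5, 1.0, 3.0)])
```

Applied to the query result, this could look like `df_rules['cr'].map(format_cr)` before the table is printed.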


## Demo of DM Prep

In [63]:
city = "Baldur's Gate"
faction = 'Harpers'
tier = 'Tier 1 (Levels 1–4)'
cites = [p['rank'] for p in passages[:2]] if passages else [1]
quest = make_quest(city, faction, tier, citations=cites)
npc = make_npc('Mirael Thornquill', 'Harbor informant', faction, citations=cites)
encounter = make_encounter(env='Foggy docks at midnight', cr_min=1.0, cr_max=3.0)
treasure = make_treasure(rarity='uncommon', limit=4)
rprint(Panel.fit(json.dumps(quest, indent=2), title='Quest'))
rprint(Panel.fit(json.dumps(npc, indent=2), title='NPC'))
rprint(Panel.fit(json.dumps(encounter, indent=2), title='Encounter'))
rprint(Panel.fit(json.dumps(treasure, indent=2), title='Treasure'))
