## Datasets
- **Best Books Ever**: Book metadata with descriptions (Zenodo).
- **Fantasy-Name-Generator**: Names from WoW, LOTR, etc. (GitHub).
- **Game of Thrones**: Locations and regions (GitHub).
- **CharacterCodex**: Character descriptions (Hugging Face).
- **Dungeons & Dragons**: D&D character data (Kaggle).
- **Reddit D&D Characters**: 2,775+ D&D character profiles (GitHub).
- **Fantasy Locations**: Fictional location names (GitHub).

In [48]:
import pandas as pd
import requests
import json

# Best Books Ever Dataset
# Download from Zenodo, filter for fantasy, and extract descriptions.


url_bbe = (
    "https://zenodo.org/records/4265096/files/books_1.Best_Books_Ever.csv?download=1"
)
try:
    response_bbe = requests.get(url_bbe)
    with open("data/best_books_ever.csv", "wb") as file:
        file.write(response_bbe.content)
    df_bbe = pd.read_csv("data/best_books_ever.csv")
    df_bbe.drop_duplicates(inplace=True)
    df_bbe = df_bbe[df_bbe["genres"].str.contains("fantasy", case=False, na=False)]
    df_bbe = df_bbe.dropna(subset=["description"])
    bbe_descriptions = df_bbe["description"].tolist()
    print(f"Best Books Ever: {len(bbe_descriptions)} descriptions")
    print(df_bbe[["title", "description"]].head())
except Exception as e:
    bbe_descriptions = []
    print(f"Error with Best Books Ever: {e}")

Best Books Ever: 15200 descriptions
                                       title  \
0                           The Hunger Games   
1  Harry Potter and the Order of the Phoenix   
4                                   Twilight   
6                                Animal Farm   
7                   The Chronicles of Narnia   

                                         description  
0  WINNING MEANS FAME AND FORTUNE.LOSING MEANS CE...  
1  There is a door at the end of a silent corrido...  
4  About three things I was absolutely positive.\...  
6  Librarian's note: There is an Alternate Cover ...  
7  Journeys to the end of the world, fantastic cr...  


In [49]:
# Fantasy-Name-Generator Dataset
# Download names from GitHub for characters and entities.

# %%
url_names = [
    "https://raw.githubusercontent.com/borhanMorphy/fantasy-name-generator/refs/heads/main/data/lotr_names.txt"
]

fantasy_names = []

for url in url_names:
    try:
        response_names = requests.get(url)
        with open(f"{url[url.rfind('/') :]}", "wb") as file:
            file.write(response_names.content)
        with open(f"{url[url.rfind('/') :]}", "r", encoding="utf-8") as file:
            [fantasy_names.append(line.strip()) for line in file if line.strip()]
        print(f"Fantasy-Name-Generator: {len(fantasy_names)} names")
        print(fantasy_names[:5])
    except Exception as e:
        fantasy_names = []
        print(f"Error with Fantasy Names: {e}")

Fantasy-Name-Generator: 840 names
['Adalbert Bolger', 'Adaldrida Bolger', 'Adalgar Bolger', 'Adalgrim Took', 'Adamanta Chubb']


In [50]:
# Game of Thrones Locations
# Download locations.json from GitHub for fantasy locations.


url_got = "https://raw.githubusercontent.com/jeffreylancaster/game-of-thrones/master/data/locations.json"
try:
    response_got = requests.get(url_got)
    with open("data/got_locations.json", "wb") as file:
        file.write(response_got.content)
    with open("data/got_locations.json", "r") as file:
        got_data = json.load(file)
    df_got = pd.DataFrame(got_data.get("regions", []))
    got_locations = df_got["location"].dropna().tolist()
    print(f"Game of Thrones: {len(got_locations)} locations")
    print(df_got.head())
except Exception as e:
    got_locations = []
    print(f"Error with Game of Thrones: {e}")

Game of Thrones: 26 locations
            location                                        subLocation
0  North of the Wall  [The Lands of Always Winter, Cave Outside Wild...
1           The Wall  [Nightfort, Eastwatch, Top of the Wall, Castle...
2          The North  [The Gift, Last Hearth, Bear Island, North to ...
3  The Shivering Sea                                                 []
4           The Vale  [Runestone, The Eyrie, To The Eyrie, Eastern R...


In [51]:
# CharacterCodex Dataset
# Download from Hugging Face for character descriptions.

url_cc = "https://huggingface.co/datasets/NousResearch/CharacterCodex/resolve/main/character_codex.json"
try:
    response_cc = requests.get(url_cc)
    with open("data/character_codex.json", "wb") as file:
        file.write(response_cc.content)

    with open("data/character_codex.json", "r", encoding="utf-8") as file:
        entry = json.loads(file.read())
    print(f"CharacterCodex: {len(entry)} descriptions")
    [print(x) for x in entry[:5]]
except Exception as e:
    print(f"Error with CharacterCodex: {e}")

CharacterCodex: 15939 descriptions
{'media_type': 'Webcomics', 'genre': 'Fantasy Webcomics', 'character_name': 'Alana', 'media_source': 'Saga', 'description': 'Alana is one of the main characters from the webcomic "Saga." She is a strong-willed and fiercely protective mother who is on the run with her family in a war-torn galaxy. The story blends elements of fantasy and science fiction, creating a rich and complex narrative.', 'scenario': "You are a fellow traveler in the galaxy needing help, and Alana offers her assistance while sharing stories of her family's struggles and triumphs."}
{'media_type': 'Novels', 'genre': 'Martial Arts Fiction', 'character_name': 'Yilin', 'media_source': 'The Smiling, Proud Wanderer', 'description': 'Yilin is a young nun from the Hengshan Sect in Jin Yong\'s novel "The Smiling, Proud Wanderer." Known for her innocence and kindness, she becomes friends with the protagonist Linghu Chong. Her gentle nature often puts her at odds with the violent world of ma

In [53]:
# CRD3 Dataset (World Descriptions)
# Download a sample CRD3 file, extract dialogue for world-building narratives.

# %%
# Download CRD3 sample file (e.g., episode 1 transcript) total 115

crd3_descriptions = []

for i in range(1, 110):
    try:
        file_name = f"C1E{str(i).zfill(3)}.json"
        url_crd3 = (
            f"https://raw.githubusercontent.com/RevanthRameshkumar/CRD3/refs/heads/master/data/cleaned%20data/"
            + file_name
        )

        response_crd3 = requests.get(url_crd3)
        with open(f"data/" + file_name, "wb") as file:
            file.write(response_crd3.content)

        with open(f"data/" + file_name, "r", encoding="utf-8") as file:
            crd3_data = json.load(file)

        for turn in crd3_data.get("TURNS", []):
            if "MATT" in turn.get("NAMES", []):
                utterances = turn.get("UTTERANCES", [])
                for utterance in utterances:
                    crd3_descriptions.append(utterance)
    except Exception as e:
        print(e)


print(f"CRD3: {len(crd3_descriptions)} descriptions extracted.")
[print(x) for x in crd3_descriptions[:5]]

Extra data: line 1 column 4 (char 3)
Extra data: line 1 column 4 (char 3)
CRD3: 243013 descriptions extracted.
Hello everyone. My name is Matthew Mercer,
voice actor and Dungeon Master for Critical Role
on Geek & Sundry, where I take a bunch of other
voice actors and run them through a fantastical
fantasy adventure through the world of Dungeons &


[None, None, None, None, None]

In [None]:
# Combine Data for RAG
# Combine descriptions, names, and locations into a single file.

# %%
all_data = {
    "book_descriptions": bbe_descriptions,
    "character_names": fantasy_names + [x.get("name") for x in entry if x.get("name")],
    "character_descriptions": [x["description"] for x in entry],
    "location_names": got_locations,
    "narrations": crd3_descriptions,
}
print(
    f"Total: {len(all_data['book_descriptions'])} book descriptions, {len(all_data['character_names'])} character names, {len(all_data['character_descriptions'])} character descriptions, {len(all_data['location_names'])} location names, {len(all_data['narrations'])} narrations"
)

with open("worldbuilding_data.txt", "w", encoding="utf-8") as file:
    file.write("=== Book Descriptions ===\n")
    for desc in all_data["book_descriptions"]:
        file.write(f"{desc}\n---\n")
    file.write("=== Character Names ===\n")
    for name in all_data["character_names"]:
        file.write(f"{name}\n")
    file.write("=== Character Descriptions ===\n")
    for desc in all_data["character_descriptions"]:
        file.write(f"{desc}\n---\n")
    file.write("=== Location Names ===\n")
    for loc in all_data["location_names"]:
        file.write(f"{loc}\n")
    file.write("=== Narrations ===\n")
    for narration in all_data["narrations"]:
        file.write(f"{narration}\n")

print("Data saved to worldbuilding_data.txt")

Total: 15200 book descriptions, 840 character names, 15939 character descriptions, 26 location names, 6778 narrations
Data saved to worldbuilding_data.txt


## Donjon.bin.sh 
A good site with open API for generating Fantasy and DND-specific things 

We may be able to use it as agent tools 

In [59]:
from pprint import pprint

print("World API")
r = requests.get(
    "https://donjon.bin.sh/fantasy/random/rpc-fantasy.fcgi?type=World&n=10"
)
pprint(json.loads(r.text)[:2])

World API
['Eglalath: A medieval world of desolate deserts and azure seas, illuminated '
 'by a white sun. The world is mostly populated by dwarves, though with large '
 'numbers of other races.',
 'Thumla: A savage realm of shadowed woodlands, vast lowlands, and azure '
 'oceans. The realm is primarily populated by gnolls and goblins.']


In [60]:
print("Locations Tool API")
r = requests.get(
    "https://donjon.bin.sh/fantasy/random/rpc-fantasy.fcgi?type=Location&loc_type=&n=10"
)
pprint(json.loads(r.text)[:2])

Locations Tool API
['Mara, a malevolent realm of ashen deserts and poisoned seas, illuminated by '
 'three crimson suns. The realm is the domain of Maeli, a demigod of '
 'destruction, and is populated by trolls. It is also the location of the '
 'Tower of Eternal Night, a bastion of insane cultists and warlocks.',
 'Gothrithiml, an elemental world of sapphire water and forests of kelp. The '
 'world is the domain of Elebrir, a deity of rivers, and is populated by '
 'nereids. The denizens of Gothrithiml are at war with invading demons.']


In [61]:
print("Character Generation API")
r = requests.get(
    "https://donjon.bin.sh/fantasy/random/rpc-fantasy.fcgi?type=NPC&race=&gender=&order=&culture=&n=10"
)
pprint(json.loads(r.text)[:2])

Character Generation API
['Efrix: Female Halfling Scofflaw, Neutral. Efrix has a narrow face, with grey '
 'hair and brown eyes. She wears sturdy clothing and carries a hazel staff. '
 'Efrix is courageous and suspicious.',
 'Grodre Iostidson: Male Dwarf Priest, Evil. Grodre has golden hair and bright '
 'blue eyes, and a maimed right hand. He wears well-made clothing and a sling '
 'of vials and potions. Grodre is violent but cultured.']


In [69]:
print("Towns and Cities API")
r = requests.get(
    "https://donjon.bin.sh/fantasy/random/rpc-fantasy.fcgi?type=Town&size=&race=&culture=&n=10"
)
pprint(json.loads(r.text)[:2])

Towns and Cities API
['Naramunz: Population 11800, mixed dwarf and gnome. Most of the city is '
 'delved into the sides of a mountain. It is governed by a consortium of '
 'guildmasters, whose weekly meetings often turn into drunken brawls. Naramunz '
 'is renowned for its fine armories and axesmiths.',
 'Lindorthell: Population 650, primarily elf, some halfling. A major road runs '
 'through the center of the village. It is governed by a council of elders, '
 'whose bizarre laws and decrees are commonly ignored. Lindorthell has seen '
 'better days, and its streets are filled with filth.']


And a good example on how an Adventure should be generated

Can be used both as a tool or just as an adventure schema

In [67]:
from bs4 import BeautifulSoup

r = requests.get("https://donjon.bin.sh/fantasy/adventure/")


soup = BeautifulSoup(r.text, "html.parser")


adventure_data = {}

table = soup.find("table", id="adventure")

rows = table.find_all("tr", class_="section")

for row in rows:
    section_title = row.find("td", class_="section_title").get_text(strip=True)

    value_cell = row.find("td").find_next_sibling("td")
    value = (
        value_cell.find("b").get_text(strip=True)
        if value_cell.find("b")
        else value_cell.get_text(strip=True)
    )

    next_row = row.find_next_sibling("tr")
    description = next_row.find("td").get_text(strip=True) if next_row else ""

    adventure_data[section_title] = {"value": value, "description": description}


adventure_data

{'Theme': {'value': 'Horror',
  'description': 'This type of adventure is designed to scare both the characters\n      and the players. Just having a monster attack is not enough for a\n      horror theme; the monster must first frighten the characters.'},
 'Goal': {'value': 'Rescue NPC(s)',
  'description': 'The characters must rescue one or more kidnapped NPCs, probably\n      from the stronghold of the kidnapper. Obviously, they must get to\n      said stronghold, break in, rescue the kidnappee, break out, and\n      escape back to safety; this usually requires careful and clever\n      planning and a large dose of luck.'},
 'Story Hook': {'value': 'Old Friend',
  'description': 'Another classic story hook is the Old Friend, the childhood\n      friend of our hero, who shows up in one of several ways to drag\n      the hero into the story.'},
 'Plot': {'value': 'A-B-C Quest',
  'description': 'This is an epic sort of plot. In it, the heros are given (or\n      undertake) a task to p