## The QUEST Dataset

This notebook covers the analysis of the "quest" dataset prepared for the implementation of this project. This project gathers the quest information from the following games to implement and evaluate the performance of popular LLMs (large language models) for the procedural generation of quests:

- Baldur's Gate[^1]
- Baldur's Gate II: Shadows of Amn[^2]
- The Elder Scrolls IV: Oblivion[^3]
- The Elder Scrolls V: Skyrim[^4]
- Minecraft[^5]
- Torchlight II[^6]
- Fallout[^7] †
- Fallout 2[^8] †

> _† Quest data for these games is collected manually with assistance of an LLM (to simplify the representations)._

**Sources:**

- S. Värtinen, P. Hämäläinen and C. Guckelsberger, "Generating Role-Playing Game Quests With GPT Language Models," in _IEEE Transactions on Games_, vol. 16, no. 1, pp. 127-139, March 2024, doi: 10.1109/TG.2022.3228480, Paper: https://ieeexplore.ieee.org/document/9980408, Dataset: https://osf.io/jtqdb/, Accessed On: 4 April 2025.
- Fallout Quests, _Nukapedia: The Fallout Wiki_, Available: https://fallout.fandom.com/wiki/Fallout_quests, Accessed On: 4 April, 2025.
- Fallout 2 Quests, _Nukapedia: The Fallout Wiki_, Available: https://fallout.fandom.com/wiki/Fallout_2_quests, Accessed On: 4 April 2025.

**Findings:**

- The dataset has some "rare" columns that will likely be empty, retained for further expressions of the quest.
- The relative distribution of quests for [^4] is the highest (34.3034%) and [^7] is the lowest (5.1146%).
- The quests added from the _Fallout_ games increase the total number of entries (Värtinen et al.) by 156.

<!-- References -->

[^1]: https://en.wikipedia.org/wiki/Baldur%27s_Gate_(video_game)
[^2]: https://en.wikipedia.org/wiki/Baldur%27s_Gate_II:_Shadows_of_Amn
[^3]: https://elderscrolls.fandom.com/wiki/The_Elder_Scrolls_IV:_Oblivion
[^4]: https://elderscrolls.fandom.com/wiki/The_Elder_Scrolls_V:_Skyrim
[^5]: https://www.minecraft.net/
[^6]: https://www.torchlight2.com/
[^7]: https://fallout.fandom.com/wiki/Fallout
[^8]: https://fallout.fandom.com/wiki/Fallout_2


In [1]:
import json
import os
import sys
from os import PathLike
from pathlib import Path
from typing import Any, Callable, Optional

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split

In [3]:
root: str = str(Path.cwd().parent.resolve())
if root not in sys.path:
    sys.path.insert(0, root)

In [4]:
from utils.dirpath import get_target_dirpath

In [5]:
# Map identifiers to game titles
id_to_title_map: dict[str, str] = {
    "BG": "Baldur's Gate",
    "BGII": "Baldur's Gate II: Shadows of Amn",
    "oblivion": "The Elder Scrolls IV: Oblivion",
    "skyrim": "The Elder Scrolls V: Skyrim",
    "MC": "Minecraft",
    "TL2": "Torchlight II",
    "fallout": "Fallout",
    "fallout2": "Fallout 2",
}

# Look through the dataset directory
# and collect paths to the JSON files
data_dir: Path = get_target_dirpath("data/raw")
paths: list[Path] = []
for filepath in data_dir.rglob("*.json"):
    print(filepath)
    paths.append(filepath)

/app/data/raw/BG/quests_BG.json
/app/data/raw/BG/quests_BGII.json
/app/data/raw/FO/quests_fallout.json
/app/data/raw/FO/quests_fallout2.json
/app/data/raw/MC/quests_MC.json
/app/data/raw/TES/quests_oblivion.json
/app/data/raw/TES/quests_skyrim.json
/app/data/raw/TL2/quests_TL2.json


In [6]:
# Map game titles to their file paths
title_to_path_map: dict[str, Path] = {}

for path in paths:
    filename: str = path.stem  # Extract filename without extension
    identifier: str = filename.split("_")[-1]  # Extract game identifier
    if identifier in id_to_title_map:
        title_to_path_map[id_to_title_map[identifier]] = path

title_to_path_map

{"Baldur's Gate": PosixPath('/app/data/raw/BG/quests_BG.json'),
 "Baldur's Gate II: Shadows of Amn": PosixPath('/app/data/raw/BG/quests_BGII.json'),
 'Fallout': PosixPath('/app/data/raw/FO/quests_fallout.json'),
 'Fallout 2': PosixPath('/app/data/raw/FO/quests_fallout2.json'),
 'Minecraft': PosixPath('/app/data/raw/MC/quests_MC.json'),
 'The Elder Scrolls IV: Oblivion': PosixPath('/app/data/raw/TES/quests_oblivion.json'),
 'The Elder Scrolls V: Skyrim': PosixPath('/app/data/raw/TES/quests_skyrim.json'),
 'Torchlight II': PosixPath('/app/data/raw/TL2/quests_TL2.json')}

In [7]:
# Load and consolidate the datasets
quest_data_map: dict[str, pd.DataFrame] = {
    title: pd.read_json(path) for title, path in title_to_path_map.items()
}

for title, data in quest_data_map.items():
    data["title"] = title  # Associate the data with their respective titles

quest_data_map["Fallout"].head(10)

Unnamed: 0,name,objective,first_tasks,first_task_locations,quest_giver,reward,tools,description,title
0,Find the Water Chip,Locate and retrieve a replacement water chip f...,"[Exit Vault 13 and explore the wasteland., Tra...","[{'name': 'Vault 13', 'description': 'The play...","{'name': 'Vault 13 Overseer', 'description': '...","[{'name': 'Experience Points', 'description': ...",[Rope],"""Our vault's water chip has failed. Without it...",Fallout
1,Calm Rebel Faction,Convince the rebel faction in Vault 13 to rema...,[Speak to upset Vault 13 citizens to learn abo...,"[{'name': 'Vault 13 Living Quarters', 'descrip...","{'name': 'Upset Vault Dweller', 'description':...","[{'name': 'Experience Points', 'description': ...",[None],Upon returning to Vault 13 during the search f...,Fallout
2,Find the Water Thief,Identify and stop the person stealing water fr...,[Investigate reports of missing water rations ...,"[{'name': 'Vault 13 Storage Area', 'descriptio...","{'name': 'Vault 13 Overseer', 'description': '...","[{'name': 'Experience Points', 'description': ...",[],"Water rations in Vault 13 are disappearing, an...",Fallout
3,Destroy the Mutant Leader,"Eliminate the Master, leader of the mutant army.",[Infiltrate the Cathedral and locate the Master.],"[{'name': 'The Cathedral', 'description': 'A h...","{'name': 'Brotherhood of Steel Elders', 'descr...","[{'name': 'Experience Points', 'description': ...","[Power Armor, Plasma Rifle, Stealth Boy]","The Master, a mutated being seeking to unite a...",Fallout
4,Destroy the Source of the Mutants,Destroy the Military Base producing Super Muta...,[Locate the Mariposa Military Base and find a ...,"[{'name': 'Mariposa Military Base', 'descripti...","{'name': 'Brotherhood of Steel Elders', 'descr...","[{'name': 'Experience Points', 'description': ...","[C4 Explosives, Stealth Boy, Plasma Rifle]",The Vault Dweller must infiltrate the Mariposa...,Fallout
5,Obtain More Equipment,Acquire better equipment to increase survivabi...,[Find sources of better armor and weapons.],"[{'name': 'Gun Runners', 'description': 'A mer...","{'name': 'Self-initiated', 'description': 'The...","[{'name': 'Improved Gear', 'description': 'Acc...","[Caps (for purchasing equipment), Lockpick (fo...",As the Vault Dweller progresses through the wa...,Fallout
6,Research Important Information,Gather crucial intelligence to aid in survival...,[Locate sources of valuable information.],"[{'name': 'The Brotherhood of Steel', 'descrip...","{'name': 'Self-initiated', 'description': 'The...","[{'name': 'Crucial Intel', 'description': 'New...",[High Intelligence (for analyzing information)...,The Vault Dweller must seek out and analyze im...,Fallout
7,Unlock the Armory,Gain access to a restricted armory to acquire ...,[Find a way to bypass or unlock the armory doo...,"[{'name': 'Military Base', 'description': 'A h...","{'name': 'Self-initiated or Faction Leader', '...","[{'name': 'Advanced Weaponry', 'description': ...",[Lockpicking Skill (to manually bypass securit...,The Vault Dweller must find a way to enter a s...,Fallout
8,Rescue Tandi from the Raiders,"Rescue Tandi, the daughter of Aradesh, from th...",[Find Tandi's location within the Khans' camp.],"[{'name': 'Khan Raider Camp', 'description': '...","{'name': 'Aradesh', 'description': 'The leader...","[{'name': 'Experience Points', 'description': ...","[Speech (to negotiate with the Khans), Stealth...","Tandi, the daughter of Aradesh, has been kidna...",Fallout
9,Stop the Radscorpions,Eliminate the radscorpions threatening Shady S...,[Locate the radscorpion cave and eliminate the...,"[{'name': 'Radscorpion Cave', 'description': '...","{'name': 'Aradesh', 'description': 'The leader...","[{'name': 'Experience Points', 'description': ...",[Melee or ranged weapons to fight radscorpions...,The village of Shady Sands is plagued by deadl...,Fallout


In [8]:
# Check the quest count for each game
print("Title of the Game".ljust(32), "|", "Quest Count")
print("-" * 32, "+", "-" * 11)
for title, data in quest_data_map.items():
    print(title.ljust(32), "|", str(len(data)).rjust(11))

Title of the Game                | Quest Count
-------------------------------- + -----------
Baldur's Gate                    |         100
Baldur's Gate II: Shadows of Amn |          94
Fallout                          |          58
Fallout 2                        |          98
Minecraft                        |         100
The Elder Scrolls IV: Oblivion   |         215
The Elder Scrolls V: Skyrim      |         389
Torchlight II                    |          80


In [9]:
# Combine into the quest dataset
quest_df: pd.DataFrame = pd.concat(quest_data_map, ignore_index=True)
quest_df.head(10)

Unnamed: 0,name,objective,first_tasks,first_task_locations,quest_giver,reward,tools,description,items,characters,groups,locations,enemies,title,motivation
0,A Bard's Request,bring an authoritative history of the Unicorn ...,[find an authoritative history of the Unicorn ...,"[{'name': 'NONE', 'description': ''}]","{'name': 'Rinnie', 'description': 'a bard and ...","[{'name': 'Scroll of Protection From Poison', ...",[NONE],"""Sweet waters of the forest\nflow through each...",,,,,,Baldur's Gate,
1,A Book for Firebead,"bring ""The History of the Fateful Coin"" to Fir...","[find ""The History of the Fateful Coin""]","[{'name': '', 'description': 'one of the local...","{'name': 'Firebead Elvenhair', 'description': ...","[{'name': '""History of the Dead Three""', 'desc...",[NONE],"Hello, I am Firebead Elvenhair: an elderly mag...","[{'name': '""The History of the Fateful Coin""',...",,,,,Baldur's Gate,
2,A Child in the Lighthouse,save Ardrouine's little son from worgs,[go to the abandoned lighthouse],"[{'name': '', 'description': 'abandoned lighth...","{'name': 'Ardrouine', 'description': 'a distre...","[{'name': '', 'description': 'coins', 'amount'...",[NONE],"Please help me, I am just poor Ardrouine!\nI d...",,,,,,Baldur's Gate,
3,A Contract Killing,assassinate Cyrdemac,[find Cyrdemac],"[{'name': 'Elfsong Tavern', 'description': 'a ...","{'name': 'Areana', 'description': 'a mysteriou...","[{'name': '', 'description': 'gold', 'amount':...",[NONE],Perhaps... perhaps you are the type of people ...,,"[{'name': 'Cyrdemac', 'description': 'a man wh...",,,,Baldur's Gate,
4,Albert and His Dog,bring Rufie to Albert,[find Rufie],"[{'name': 'NONE', 'description': ''}]","{'name': 'Albert', 'description': 'a young boy...","[{'name': '', 'description': 'Rufie's another ...",[Albert gives the player Rufie's chew toy to d...,Hi! I'm Albert.\n'Scuse me. I ever so sowwy to...,,"[{'name': 'Rufie', 'description': 'Albert's pe...",,,,Baldur's Gate,
5,A Corpse for Arkion,bring a corpse to Arkion,[find a corpse],"[{'name': 'sewers', 'description': ''}]","{'name': 'Arkion', 'description': 'a necromanc...","[{'name': '', 'description': 'gold pieces', 'a...",[NONE],The name's Arkion.\nI've got a dirty deed fer ...,,,,,,Baldur's Gate,
6,Alatos and the Thieves' Guild,"steal spellbook, statuette, and artifact fragm...",[go to Oberan's Estate],"[{'name': 'Oberan's Estate', 'description': 'a...","{'name': 'Alatos Thuibuld', 'description': 'th...","[{'name': '', 'description': 'a lot of gold', ...","[Shandalar lives inside a Halruaan skyship, Sh...","I am Alatos Thuibuld, though I thank you not t...","[{'name': '', 'description': 'spellbook, statu...","[{'name': 'Shandalar', 'description': 'a local...","[{'name': 'Thieves' Guild', 'description': 'a ...","[{'name': 'Baldur's Gate', 'description': 'a c...",,Baldur's Gate,
7,Arkion's Amulet,steal Arkion's amulet and bring it to Nemphre,[find Arkion's amulet],"[{'name': 'Baldur's Gate', 'description': 'a l...","{'name': 'Nemphre', 'description': 'a flirty w...","[{'name': 'Scroll of Vampiric Touch', 'descrip...",[Arkion was Nemphre's lover],"Mmm, be a sweet one. Could you do a favor for ...","[{'name': '', 'description': 'Arkion's amulet'...","[{'name': 'Arkion', 'description': 'a male nec...",,,,Baldur's Gate,
8,Arkion and Nemphre,steal Arkion's Bloodstone Amulet and Nemphre's...,[find Arkion's Bloodstone Amulet and Nemphre's...,"[{'name': 'Baldur's Gate', 'description': 'a l...","{'name': 'Ordulinian', 'description': 'a pries...","[{'name': 'Cloak of Non-Detection', 'descripti...","[Arkion and Nemphre used to be lovers, Arkion ...","Ah, fair adventurers, I'm Ordulinian.\nThe roa...","[{'name': 'Bloodstone Amulet', 'description': ...","[{'name': 'Arkion', 'description': 'a male nec...",,,,Baldur's Gate,
9,A Ring for a Lady,bring an angel skin ring to Fergus,[find an angel skin ring],"[{'name': 'NONE', 'description': ''}]","{'name': 'Fergus', 'description': 'a happy man...","[{'name': '', 'description': 'recommendation t...",[NONE],"I'm Fergus, a very happy man.\nI've got to adm...",,"[{'name': 'Scar', 'description': 'second-in-co...",,,,Baldur's Gate,


In [10]:
quest_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1134 entries, 0 to 1133
Data columns (total 15 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   name                  1134 non-null   object
 1   objective             1134 non-null   object
 2   first_tasks           1134 non-null   object
 3   first_task_locations  1134 non-null   object
 4   quest_giver           1134 non-null   object
 5   reward                1134 non-null   object
 6   tools                 1132 non-null   object
 7   description           1134 non-null   object
 8   items                 270 non-null    object
 9   characters            583 non-null    object
 10  groups                372 non-null    object
 11  locations             375 non-null    object
 12  enemies               25 non-null     object
 13  title                 1134 non-null   object
 14  motivation            100 non-null    object
dtypes: object(15)
memory usage: 133.0+ KB


In [11]:
quest_df["title"].value_counts(normalize=True)

title
The Elder Scrolls V: Skyrim         0.343034
The Elder Scrolls IV: Oblivion      0.189594
Baldur's Gate                       0.088183
Minecraft                           0.088183
Fallout 2                           0.086420
Baldur's Gate II: Shadows of Amn    0.082892
Torchlight II                       0.070547
Fallout                             0.051146
Name: proportion, dtype: float64

In [12]:
quest_df.isnull().sum()  # Count of the missing values

name                       0
objective                  0
first_tasks                0
first_task_locations       0
quest_giver                0
reward                     0
tools                      2
description                0
items                    864
characters               551
groups                   762
locations                759
enemies                 1109
title                      0
motivation              1034
dtype: int64

In [13]:
quest_df[
    (quest_df["tools"].astype(str) == "[]")
    | (quest_df["tools"].astype(str) == "['NONE']")
]  # Rows with empty tools list

Unnamed: 0,name,objective,first_tasks,first_task_locations,quest_giver,reward,tools,description,items,characters,groups,locations,enemies,title,motivation
0,A Bard's Request,bring an authoritative history of the Unicorn ...,[find an authoritative history of the Unicorn ...,"[{'name': 'NONE', 'description': ''}]","{'name': 'Rinnie', 'description': 'a bard and ...","[{'name': 'Scroll of Protection From Poison', ...",[NONE],"""Sweet waters of the forest\nflow through each...",,,,,,Baldur's Gate,
1,A Book for Firebead,"bring ""The History of the Fateful Coin"" to Fir...","[find ""The History of the Fateful Coin""]","[{'name': '', 'description': 'one of the local...","{'name': 'Firebead Elvenhair', 'description': ...","[{'name': '""History of the Dead Three""', 'desc...",[NONE],"Hello, I am Firebead Elvenhair: an elderly mag...","[{'name': '""The History of the Fateful Coin""',...",,,,,Baldur's Gate,
2,A Child in the Lighthouse,save Ardrouine's little son from worgs,[go to the abandoned lighthouse],"[{'name': '', 'description': 'abandoned lighth...","{'name': 'Ardrouine', 'description': 'a distre...","[{'name': '', 'description': 'coins', 'amount'...",[NONE],"Please help me, I am just poor Ardrouine!\nI d...",,,,,,Baldur's Gate,
3,A Contract Killing,assassinate Cyrdemac,[find Cyrdemac],"[{'name': 'Elfsong Tavern', 'description': 'a ...","{'name': 'Areana', 'description': 'a mysteriou...","[{'name': '', 'description': 'gold', 'amount':...",[NONE],Perhaps... perhaps you are the type of people ...,,"[{'name': 'Cyrdemac', 'description': 'a man wh...",,,,Baldur's Gate,
5,A Corpse for Arkion,bring a corpse to Arkion,[find a corpse],"[{'name': 'sewers', 'description': ''}]","{'name': 'Arkion', 'description': 'a necromanc...","[{'name': '', 'description': 'gold pieces', 'a...",[NONE],The name's Arkion.\nI've got a dirty deed fer ...,,,,,,Baldur's Gate,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1122,Lair of the Manticore,bring the skin of the Manticore to the Sphinx,[kill the Manticore],"[{'name': 'Stygian Aerie', 'description': 'a M...","{'name': 'Sphinx', 'description': 'the riddle-...","[{'name': '', 'description': 'the Sphinx allow...",[NONE],"My ancient charge, to guard the gate\nThe dese...",,,,,"[{'name': 'Manticore', 'description': 'a half ...",Torchlight II,
1123,Perilous Passage,save the Mana Queen,[go through the gate to the Forsaken Vaults],"[{'name': 'Forsaken Vaults', 'description': 'a...","{'name': 'Sphinx', 'description': 'the guardia...","[{'name': 'NONE', 'description': '', 'amount':...",[NONE],The Mana Queen has come and gone\nThrough this...,,"[{'name': 'Mana Queen', 'description': 'a good...",,,,Torchlight II,
1125,A Shattered Visage Lies,help the Faceless King reunite the Zeraphi and...,"[find the Royal Masks, and bring them to the F...","[{'name': 'NONE', 'description': ''}]","{'name': 'Faceless King', 'description': 'the ...","[{'name': 'NONE', 'description': '', 'amount':...",[NONE],"I am called the Faceless King, and I have resi...","[{'name': 'Royal Masks', 'description': 'the m...",,"[{'name': 'Zeraphi, Ezrohir', 'description': '...","[{'name': 'Vault of Souls', 'description': 'a ...",,Torchlight II,
1129,Body Work,restore Umahd's physical form,[find the Ezrohir marauder who wears Umahd's w...,"[{'name': 'NONE', 'description': ''}]","{'name': 'Umahd', 'description': 'a disembodie...","[{'name': '', 'description': 'Umahd shall rewa...",[NONE],"I am no ghost, only disembodied spirit: a Zera...",,,"[{'name': 'Ezrohir', 'description': 'a race of...",,,Torchlight II,


In [14]:
# Fill rows for columns with empty rows: ([], null) -> ["NONE"]
for col in ("tools", "characters", "locations", "items", "enemies", "groups"):
    quest_df.loc[
        (quest_df[col].astype(str) == "[]") | (quest_df[col].isnull()), col
    ] = [["NONE"]]

In [15]:
# Fill the missing rows of the motivation column with "NONE"
quest_df.loc[quest_df["motivation"].isnull(), "motivation"] = "NONE"

In [16]:
train_df: pd.DataFrame
val_df: pd.DataFrame
test_df: pd.DataFrame

# Stratify split the quest dataset
train_df, test_df = train_test_split(
    quest_df, test_size=0.2, random_state=42, stratify=quest_df["title"]
)
val_df, test_df = train_test_split(
    test_df, test_size=0.5, random_state=42, stratify=test_df["title"]
)

len(train_df), len(val_df), len(test_df)

(907, 113, 114)

In [17]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 907 entries, 1123 to 883
Data columns (total 15 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   name                  907 non-null    object
 1   objective             907 non-null    object
 2   first_tasks           907 non-null    object
 3   first_task_locations  907 non-null    object
 4   quest_giver           907 non-null    object
 5   reward                907 non-null    object
 6   tools                 907 non-null    object
 7   description           907 non-null    object
 8   items                 907 non-null    object
 9   characters            907 non-null    object
 10  groups                907 non-null    object
 11  locations             907 non-null    object
 12  enemies               907 non-null    object
 13  title                 907 non-null    object
 14  motivation            907 non-null    object
dtypes: object(15)
memory usage: 113.4+ KB


In [18]:
train_df["title"].value_counts(normalize=True)

title
The Elder Scrolls V: Skyrim         0.342889
The Elder Scrolls IV: Oblivion      0.189636
Minecraft                           0.088203
Baldur's Gate                       0.088203
Fallout 2                           0.085998
Baldur's Gate II: Shadows of Amn    0.082690
Torchlight II                       0.070562
Fallout                             0.051819
Name: proportion, dtype: float64

In [19]:
val_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 113 entries, 254 to 382
Data columns (total 15 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   name                  113 non-null    object
 1   objective             113 non-null    object
 2   first_tasks           113 non-null    object
 3   first_task_locations  113 non-null    object
 4   quest_giver           113 non-null    object
 5   reward                113 non-null    object
 6   tools                 113 non-null    object
 7   description           113 non-null    object
 8   items                 113 non-null    object
 9   characters            113 non-null    object
 10  groups                113 non-null    object
 11  locations             113 non-null    object
 12  enemies               113 non-null    object
 13  title                 113 non-null    object
 14  motivation            113 non-null    object
dtypes: object(15)
memory usage: 14.1+ KB


In [20]:
val_df["title"].value_counts(normalize=True)

title
The Elder Scrolls V: Skyrim         0.345133
The Elder Scrolls IV: Oblivion      0.185841
Baldur's Gate                       0.088496
Fallout 2                           0.088496
Minecraft                           0.088496
Baldur's Gate II: Shadows of Amn    0.079646
Torchlight II                       0.070796
Fallout                             0.053097
Name: proportion, dtype: float64

In [21]:
test_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 114 entries, 293 to 518
Data columns (total 15 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   name                  114 non-null    object
 1   objective             114 non-null    object
 2   first_tasks           114 non-null    object
 3   first_task_locations  114 non-null    object
 4   quest_giver           114 non-null    object
 5   reward                114 non-null    object
 6   tools                 114 non-null    object
 7   description           114 non-null    object
 8   items                 114 non-null    object
 9   characters            114 non-null    object
 10  groups                114 non-null    object
 11  locations             114 non-null    object
 12  enemies               114 non-null    object
 13  title                 114 non-null    object
 14  motivation            114 non-null    object
dtypes: object(15)
memory usage: 14.2+ KB


In [22]:
test_df["title"].value_counts(normalize=True)

title
The Elder Scrolls V: Skyrim         0.342105
The Elder Scrolls IV: Oblivion      0.192982
Minecraft                           0.087719
Fallout 2                           0.087719
Baldur's Gate                       0.087719
Baldur's Gate II: Shadows of Amn    0.087719
Torchlight II                       0.070175
Fallout                             0.043860
Name: proportion, dtype: float64

In [23]:
def save_to_json(
    df: pd.DataFrame, split_type: str, save_dir: PathLike = get_target_dirpath("data")
) -> None:
    save_path: Path = Path(save_dir) / f"quests_{split_type}.json"
    df.to_json(save_path, orient="records", indent=2)
    print(f"Saved to {save_path}")


# Save the split datasets for further use
save_to_json(train_df, "train")
save_to_json(val_df, "val")
save_to_json(test_df, "test")

Saved to /app/data/quests_train.json
Saved to /app/data/quests_val.json
Saved to /app/data/quests_test.json


In [24]:
def format_quest(entry: dict[str, Any]) -> str:
    def make_fmt(
        third_key: Optional[str] = None, default: str = "NONE"
    ) -> Callable[[dict[str, Any]], str]:
        if not third_key:
            return lambda item: "{} - {}".format(
                item.get("name", default), item.get("description", default)
            )
        else:
            return lambda item: "{} - {} ({}: {})".format(
                item.get("name", default),
                item.get("description", default),
                third_key,
                item.get(third_key, default),
            )

    def djoin(
        items: list[Any],
        formatter: Callable[[dict[str, Any]], str],
        default: str = "NONE",
    ) -> str:
        if not isinstance(items, list) or not items:
            return default
        result = [formatter(item) for item in items if isinstance(item, dict)]
        return ", ".join(result) if result else default

    # Main fields
    name: str = entry.get("name", "NONE")
    objective: str = entry.get("objective", "NONE")
    first_tasks: str = ", ".join(entry.get("first_tasks", [])) or "NONE"
    tools: str = ", ".join(entry.get("tools", [])) or "NONE"
    title: str = entry.get("title", "NONE")
    motivation: str = entry.get("motivation", "NONE")

    # Nested fields
    task_locations: str = djoin(entry.get("first_task_locations", []), make_fmt())
    reward: str = djoin(entry.get("reward", []), make_fmt("amount"))
    characters: str = djoin(entry.get("characters", []), make_fmt("location"))
    locations: str = djoin(entry.get("locations", []), make_fmt("location"))
    items: str = djoin(entry.get("items", []), make_fmt("amount"))
    enemies: str = djoin(entry.get("enemies", []), make_fmt("amount"))
    groups: str = djoin(entry.get("groups", []), make_fmt())

    # Quest giver, handled separately
    quest_giver: str = "{} - {} (location: {})".format(
        getattr(entry.get("quest_giver", {}), "name", "NONE"),
        getattr(entry.get("quest_giver", {}), "description", "NONE"),
        getattr(entry.get("quest_giver", {}), "location", "NONE"),
    )

    quest_text: str = (
        f"Quest Name: {name}\n"
        f"Objective: {objective}\n"
        f"First Tasks: {first_tasks}\n"
        f"First Task Locations: {task_locations}\n"
        f"Quest Giver: {quest_giver}\n"
        f"Reward: {reward}\n"
        f"Characters: {characters}\n"
        f"Tools: {tools}\n"
        f"Locations: {locations}\n"
        f"Items: {items}\n"
        f"Enemies: {enemies}\n"
        f"Groups: {groups}\n"
        f"Title: {title}\n"
        f"Motivation: {motivation}"
    )

    return quest_text

In [25]:
print(format_quest(train_df.to_dict(orient="records")[3]))

Quest Name: Deliver Hubologist's Field Report to AHS-9 in San Francisco
Objective: Take the Hubologist's field report and deliver it to AHS-9 in San Francisco.
First Tasks: Obtain the Hubologist's field report., Travel to San Francisco., Locate AHS-9 within the Hubologist compound., Hand over the field report to complete the delivery.
First Task Locations: Hubologist Camp - A settlement where the Hubologists operate and collect data for their cause., San Francisco - A city controlled by various factions, including the Hubologists.
Quest Giver: NONE - NONE (location: NONE)
Reward: Caps - A payment for successfully delivering the report. (amount: 500), Hubologist Reputation - Increased standing with the Hubologists. (amount: NONE)
Characters: NONE
Tools: Travel gear to safely navigate the wasteland., Speech to ensure smooth interactions with AHS-9.
Locations: NONE
Items: NONE
Enemies: NONE
Groups: NONE
Title: Fallout 2
Motivation: NONE


In [26]:
def save_to_txt(
    df: pd.DataFrame,
    split_type: str,
    save_dir: PathLike = get_target_dirpath("data"),
    instruction: str = (
        "Generate a video game quest description based on the following structured information."
    ),
    default: str = "NONE",
) -> None:
    def write_split_dir(entry: str, index: int) -> None:
        nonlocal split_dir
        os.makedirs(split_dir, exist_ok=True)
        with open(split_dir / f"{index:03}.txt", "w", encoding="utf-8") as fileptr:
            fileptr.write(entry + "\n")

    split_dir: Path = Path(save_dir) / split_type
    save_path: Path = split_dir / "quests.json"
    count: int = 0

    with open(save_path, "w", encoding="utf-8") as f_all:
        records: list[dict[str, str]] = []

        for idx, entry in enumerate(df.to_dict(orient="records")):
            prompt: str = (
                f"### Instruction:\n{instruction}\n\n"
                f"### Input:\n{format_quest(entry)}\n\n"
                f"### Response:"
            )
            response: str = entry.get("description", default).replace("\n", " ")

            records.append({"prompt": prompt, "response": response})
            write_split_dir(f"{prompt}\n{response}", idx)
            count += 1

        json.dump(records, f_all, indent=2)

    print(f"Saved {count} entries to:\n- {save_path}\n- {split_dir / '*'}")

In [27]:
# Save the formatted training dataset into text files
save_to_txt(train_df, "train")

Saved 907 entries to:
- /app/data/train/quests.json
- /app/data/train/*


In [28]:
# Save the formatted validation dataset into text files
save_to_txt(val_df, "val")

Saved 113 entries to:
- /app/data/val/quests.json
- /app/data/val/*


In [29]:
# Save the formatted testing dataset into text files
save_to_txt(test_df, "test")

Saved 114 entries to:
- /app/data/test/quests.json
- /app/data/test/*
