# Building an AI Recommendation System for D&D

Because of the datasets available online, the data so far covers the basic 5e core rules plus some extra content. That means no Artificer, a few extra races, and so on.
Once I have analyzed the data, I will describe it in more detail below.

## Import Libraries

In [48]:
import requests
import pandas as pd
import numpy as np
import os
import ast
import json

## Fetch data with API

Open5e has an API that lets us fetch many types of data related to Dungeons and Dragons 5th edition.

In [2]:
# Define the API base URL and endpoints
api_base_url = "https://api.open5e.com/v1/"
endpoints = [
    "spells", "spelllist", "monsters", "documents", "backgrounds", 
    "planes", "sections", "feats", "conditions", "races", "classes", 
    "magicitems", "weapons", "armor"
]

# Function to fetch data from the API
def fetch_data(endpoint, params=None):
    url = f"{api_base_url}{endpoint}/"
    response = requests.get(url, params=params, timeout=30)  # avoid hanging forever on a stalled request
    if response.status_code == 200:
        data = response.json()
        return data['results']  # only the first page of a paginated response
    else:
        print(f"Failed to fetch data from {url}")
        return None

# Create a directory to save CSV files
output_dir = "data"
os.makedirs(output_dir, exist_ok=True)

# Loop through each endpoint, fetch data, and save it to CSV
for endpoint in endpoints:
    data = fetch_data(endpoint)
    if data:
        df = pd.DataFrame(data)
        # Save DataFrame to CSV file
        csv_file = os.path.join(output_dir, f"{endpoint}.csv")
        df.to_csv(csv_file, index=False)
        print(f"Data for {endpoint} saved to {csv_file}")
    else:
        print(f"No data fetched for {endpoint}")

print("Data fetching and saving complete.")


Data for spells saved to data\spells.csv
Data for spelllist saved to data\spelllist.csv
Data for monsters saved to data\monsters.csv
Data for documents saved to data\documents.csv
Data for backgrounds saved to data\backgrounds.csv
Data for planes saved to data\planes.csv
Data for sections saved to data\sections.csv
Data for feats saved to data\feats.csv
Data for conditions saved to data\conditions.csv
Data for races saved to data\races.csv
Data for classes saved to data\classes.csv
Data for magicitems saved to data\magicitems.csv
Data for weapons saved to data\weapons.csv
Data for armor saved to data\armor.csv
Data fetching and saving complete.
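Note that `fetch_data` keeps only `data['results']`, which is just the first page of a paginated response, so larger endpoints such as spells or monsters may be incomplete. Below is a minimal sketch of following every page, assuming the v1 responses use the common `{"results": [...], "next": <url or null>}` layout (worth confirming against the raw JSON). The helper names `fetch_all_pages` and `requests_get_json` are hypothetical, and the HTTP call is passed in as a function so the paging loop can be tested without network access.

```python
from typing import Callable, Optional
from urllib.parse import urlencode

API_BASE_URL = "https://api.open5e.com/v1/"


def fetch_all_pages(endpoint: str,
                    get_json: Callable[[str], dict],
                    params: Optional[dict] = None) -> list:
    """Collect 'results' from every page by following the 'next' links (hypothetical helper)."""
    url = f"{API_BASE_URL}{endpoint}/"
    if params:
        url = f"{url}?{urlencode(params)}"
    results = []
    while url:
        page = get_json(url)
        results.extend(page.get("results", []))
        # Assumed pagination format: 'next' is the next page URL, or None on the last page
        url = page.get("next")
    return results


def requests_get_json(url: str) -> dict:
    """Fetch one page with requests, raising on HTTP errors instead of returning None."""
    import requests  # local import so fetch_all_pages can be used and tested offline
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()
```

For example, `pd.DataFrame(fetch_all_pages("spells", requests_get_json))` could take the place of `pd.DataFrame(fetch_data("spells"))` in the saving loop.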


## Let's Understand the Data

In [3]:
#Data from classes
DFclasses = pd.read_csv('data/classes.csv')
DFclasses.head(2)

Unnamed: 0,name,slug,desc,hit_dice,hp_at_1st_level,hp_at_higher_levels,prof_armor,prof_weapons,prof_tools,prof_saving_throws,prof_skills,equipment,table,spellcasting_ability,subtypes_name,archetypes,document__slug,document__title,document__license_url,document__url
0,Barbarian,barbarian,"### Rage \n \nIn battle, you fight with primal...",1d12,12 + your Constitution modifier,1d12 (or 7) + your Constitution modifier per b...,"Light armor, medium armor, shields","Simple weapons, martial weapons",,"Strength, Constitution","Choose two from Animal Handling, Athletics, In...","You start with the following equipment, in add...",| Level | Proficiency Bonus | Features ...,,Primal Paths,"[{'name': 'Path of the Berserker', 'slug': 'pa...",wotc-srd,5e Core Rules,http://open5e.com/legal,http://dnd.wizards.com/articles/features/syste...
1,Bard,bard,### Spellcasting \n \nYou have learned to unta...,1d8,8 + your Constitution modifier,1d8 (or 5) + your Constitution modifier per ba...,Light armor,"Simple weapons, hand crossbows, longswords, ra...",Three musical instruments of your choice,"Dexterity, Charisma",Choose any three,"You start with the following equipment, in add...",| Level | Proficiency Bonus | Features ...,Charisma,Bard Colleges,"[{'name': 'College of Lore', 'slug': 'college-...",wotc-srd,5e Core Rules,http://open5e.com/legal,http://dnd.wizards.com/articles/features/syste...


In [4]:
DFclasses.columns

Index(['name', 'slug', 'desc', 'hit_dice', 'hp_at_1st_level',
       'hp_at_higher_levels', 'prof_armor', 'prof_weapons', 'prof_tools',
       'prof_saving_throws', 'prof_skills', 'equipment', 'table',
       'spellcasting_ability', 'subtypes_name', 'archetypes', 'document__slug',
       'document__title', 'document__license_url', 'document__url'],
      dtype='object')

From this we can see that some columns are probably not useful, such as the URLs, the slugs and the document title, so we will drop them.

In [5]:
DFclasses.drop(['slug','document__slug','document__title', 'document__license_url', 'document__url'],axis=1, inplace=True)

In [6]:
DFclasses.head(1)

Unnamed: 0,name,desc,hit_dice,hp_at_1st_level,hp_at_higher_levels,prof_armor,prof_weapons,prof_tools,prof_saving_throws,prof_skills,equipment,table,spellcasting_ability,subtypes_name,archetypes
0,Barbarian,"### Rage \n \nIn battle, you fight with primal...",1d12,12 + your Constitution modifier,1d12 (or 7) + your Constitution modifier per b...,"Light armor, medium armor, shields","Simple weapons, martial weapons",,"Strength, Constitution","Choose two from Animal Handling, Athletics, In...","You start with the following equipment, in add...",| Level | Proficiency Bonus | Features ...,,Primal Paths,"[{'name': 'Path of the Berserker', 'slug': 'pa..."


### Rinse and Repeat
We will load the remaining dataframes and drop their unneeded columns in memory, leaving the original CSV files untouched. If we find problematic data along the way, we will deal with it. Once all of this is done, we will save the modified data under different file names.
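Saving under different names can be as simple as adding a suffix to each file name. A minimal sketch, where the `save_cleaned` helper and the `_clean` suffix are hypothetical choices rather than part of the original notebook:

```python
import os
from typing import Dict, List

import pandas as pd


def save_cleaned(frames: Dict[str, pd.DataFrame],
                 output_dir: str = "data",
                 suffix: str = "_clean") -> List[str]:
    """Write each dataframe to '<output_dir>/<name><suffix>.csv' and return the paths."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for name, df in frames.items():
        path = os.path.join(output_dir, f"{name}{suffix}.csv")
        df.to_csv(path, index=False)  # index=False matches how the raw CSVs were written
        paths.append(path)
    return paths
```

For instance, `save_cleaned({"classes": DFclasses, "spells": DFspells})` would write `classes_clean.csv` and `spells_clean.csv` next to the downloaded files without overwriting them.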

In [7]:
#Data from spells
DFspells = pd.read_csv('data/spells.csv')
DFspells.head(2)

Unnamed: 0,slug,name,desc,higher_level,page,range,target_range_sort,components,requires_verbal_components,requires_somatic_components,...,spell_level,school,dnd_class,spell_lists,archetype,circles,document__slug,document__title,document__license_url,document__url
0,abhorrent-apparition,Abhorrent Apparition,You imbue a terrifying visage onto a gourd and...,If you cast this spell using a spell slot of 5...,,60 feet,60,M,False,False,...,4,illusion,"Bard, Sorcerer, Wizard","['bard', 'sorcerer', 'wizard']",,,dmag,Deep Magic 5e,http://open5e.com/legal,https://koboldpress.com/kpstore/product/deep-m...
1,abrupt-hug,Abrupt Hug,You or the creature taking the Attack action c...,,,30 Feet,30,V,True,False,...,1,transmutation,Ranger,['ranger'],,,warlock,Warlock Archives,http://open5e.com/legal,https://koboldpress.com/kpstore/product-catego...


In [8]:
DFspells.drop(['slug','document__slug','document__title', 'document__license_url', 'document__url'],axis=1,inplace=True)

In [9]:
DFspells.head(1)

Unnamed: 0,name,desc,higher_level,page,range,target_range_sort,components,requires_verbal_components,requires_somatic_components,requires_material_components,...,requires_concentration,casting_time,level,level_int,spell_level,school,dnd_class,spell_lists,archetype,circles
0,Abhorrent Apparition,You imbue a terrifying visage onto a gourd and...,If you cast this spell using a spell slot of 5...,,60 feet,60,M,False,False,True,...,False,1 action,4th-level,4,4,illusion,"Bard, Sorcerer, Wizard","['bard', 'sorcerer', 'wizard']",,


A pattern seems to be emerging: the same metadata columns appear in both dataframes. We will check whether these columns appear in all of the dataframes and, if so, drop them all in a loop.
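Instead of reading every `columns` printout by eye, the check can be done with a set intersection. A minimal sketch with a hypothetical `shared_columns` helper; it is meant for the freshly loaded dataframes, and its result is only a list of candidates, since genuinely useful fields such as `name` or `desc` may be shared by most tables too.

```python
from typing import List

import pandas as pd


def shared_columns(frames: List[pd.DataFrame]) -> List[str]:
    """Return the column names present in every dataframe, in the first one's order."""
    if not frames:
        return []
    shared = set(frames[0].columns)
    for df in frames[1:]:
        shared &= set(df.columns)  # keep only names also present in this dataframe
    return [col for col in frames[0].columns if col in shared]
```

For example, `shared_columns([DFspelllist, DFmonsters, DFbackgrounds])` lists the columns those three tables have in common, from which the `document__*` and `slug` entries can be picked out.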

In [10]:
# Load all remaining dataframes
DFspelllist = pd.read_csv('data/spelllist.csv')
DFmonsters = pd.read_csv('data/monsters.csv')
DFdocuments = pd.read_csv('data/documents.csv')
DFbackgrounds = pd.read_csv('data/backgrounds.csv')
DFplanes = pd.read_csv('data/planes.csv')
DFsections = pd.read_csv('data/sections.csv')
DFfeats = pd.read_csv('data/feats.csv')
DFconditions = pd.read_csv('data/conditions.csv')
DFraces = pd.read_csv('data/races.csv')
DFmagicitems = pd.read_csv('data/magicitems.csv')
DFweapons = pd.read_csv('data/weapons.csv')
DFarmor = pd.read_csv('data/armor.csv')

In [11]:
print(DFspelllist.columns)
print(DFmonsters.columns)
print(DFdocuments.columns)
print(DFbackgrounds.columns)
print(DFplanes.columns)
print(DFsections.columns)
print(DFfeats.columns)
print(DFconditions.columns)
print(DFraces.columns)
print(DFmagicitems.columns)
print(DFweapons.columns)
print(DFarmor.columns)

Index(['slug', 'name', 'desc', 'spells', 'document__slug', 'document__title',
       'document__license_url', 'document__url'],
      dtype='object')
Index(['slug', 'desc', 'name', 'size', 'type', 'subtype', 'group', 'alignment',
       'armor_class', 'armor_desc', 'hit_points', 'hit_dice', 'speed',
       'strength', 'dexterity', 'constitution', 'intelligence', 'wisdom',
       'charisma', 'strength_save', 'dexterity_save', 'constitution_save',
       'intelligence_save', 'wisdom_save', 'charisma_save', 'perception',
       'skills', 'damage_vulnerabilities', 'damage_resistances',
       'damage_immunities', 'condition_immunities', 'senses', 'languages',
       'challenge_rating', 'cr', 'actions', 'bonus_actions', 'reactions',
       'legendary_desc', 'legendary_actions', 'special_abilities',
       'spell_list', 'page_no', 'environments', 'img_main', 'document__slug',
       'document__title', 'document__license_url', 'document__url'],
      dtype='object')
Index(['title', 'slug', 'u

From the output above we can confirm our hypothesis and go ahead and delete those columns. We can also assume that DFdocuments is going to be fairly useless for DMs, so we will not consider it from here on.

In [12]:
# List of dataframes
dfs = [DFarmor,DFbackgrounds,DFconditions,DFfeats,DFmagicitems,
       DFmonsters,DFplanes,DFraces,DFsections,DFspelllist,DFweapons]

# Common columns to delete
common_columns = ['slug','document__slug','document__title', 'document__license_url', 'document__url']

# Loop through each dataframe
for df in dfs:
    # Delete common columns from the dataframe
    # (axis is not needed when columns= is given; errors='ignore' skips missing names)
    df.drop(columns=common_columns, inplace=True, errors='ignore')


In [27]:
# Iterate through the dfs list to check that everything is all right.
# But first, expand the list to include every dataframe, in alphabetical order (index 2 is DFclasses)
dfs = [DFarmor,DFbackgrounds,DFclasses,DFconditions,DFfeats,DFmagicitems,
       DFmonsters,DFplanes,DFraces,DFsections,DFspelllist,DFspells,DFweapons]
dfs[2].columns

Index(['name', 'desc', 'hit_dice', 'hp_at_1st_level', 'hp_at_higher_levels',
       'prof_armor', 'prof_weapons', 'prof_tools', 'prof_saving_throws',
       'prof_skills', 'equipment', 'table', 'spellcasting_ability',
       'subtypes_name', 'archetypes'],
      dtype='object')
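To check every dataframe rather than one entry at a time, a small helper can report any dropped column that is still present. The `leftover_columns` helper and the dataframe labels are hypothetical; an empty dictionary means the cleanup worked.

```python
from typing import Dict, List

import pandas as pd


def leftover_columns(frames: Dict[str, pd.DataFrame],
                     dropped: List[str]) -> Dict[str, List[str]]:
    """Map each dataframe label to the `dropped` columns it still contains."""
    leftovers = {}
    for label, df in frames.items():
        still_there = [col for col in dropped if col in df.columns]
        if still_there:
            leftovers[label] = still_there
    return leftovers
```

Called as `leftover_columns({"classes": DFclasses, "spells": DFspells, "monsters": DFmonsters}, common_columns)`, it is expected to return `{}` after the drop loop has run.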

## What the hell is actually in this data?

Let's see which classes are available, along with their subclasses.

In [30]:
DFclasses.name

0     Barbarian
1          Bard
2        Cleric
3         Druid
4       Fighter
5          Monk
6       Paladin
7        Ranger
8         Rogue
9      Sorcerer
10      Warlock
11       Wizard
Name: name, dtype: object

So we have all the official classes except for the Artificer. What about subclasses?

In [42]:
DFclasses.archetypes[0]

'[{\'name\': \'Path of the Berserker\', \'slug\': \'path-of-the-berserker\', \'desc\': "For some barbarians, rage is a means to an end- that end being violence. The Path of the Berserker is a path of untrammeled fury, slick with blood. As you enter the berserker\'s rage, you thrill in the chaos of battle, heedless of your own health or well-being. \\n \\n##### Frenzy \\n \\nStarting when you choose this path at 3rd level, you can go into a frenzy when you rage. If you do so, for the duration of your rage you can make a single melee weapon attack as a bonus action on each of your turns after this one. When your rage ends, you suffer one level of exhaustion (as described in appendix A). \\n \\n##### Mindless Rage \\n \\nBeginning at 6th level, you can\'t be charmed or frightened while raging. If you are charmed or frightened when you enter your rage, the effect is suspended for the duration of the rage. \\n \\n##### Intimidating Presence \\n \\nBeginning at 10th level, you can use your a

So the subclasses are buried in that pile of text, but the information follows a clear pattern. We will exploit that pattern to build a dictionary with all the information correctly organized.
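Because pandas wrote the original list of dicts to CSV using its Python string representation, each `archetypes` cell is a Python literal rather than JSON, which is why it mixes single and double quotes. A minimal sketch of building the class-to-subclass dictionary with `ast.literal_eval` (the `ast` module is already imported above); the `subclasses_by_class` helper is hypothetical and keeps only the subclass names:

```python
import ast
from typing import Dict, List

import pandas as pd


def subclasses_by_class(classes: pd.DataFrame) -> Dict[str, List[str]]:
    """Parse the stringified 'archetypes' column into {class name: [subclass names]}."""
    result = {}
    for name, raw in zip(classes["name"], classes["archetypes"]):
        # Each cell is str(list_of_dicts); literal_eval safely turns it back into a list.
        # Empty cells come back from read_csv as NaN (a float), so treat them as no subclasses.
        archetypes = ast.literal_eval(raw) if isinstance(raw, str) else []
        result[name] = [archetype["name"] for archetype in archetypes]
    return result
```

With the data above, `subclasses_by_class(DFclasses)["Barbarian"]` would start with `'Path of the Berserker'`; the full `desc` text stays available in the parsed dicts if it is needed later.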

In [87]:
# Take the Barbarian's raw archetypes string and strip the outer list brackets
text = DFclasses.archetypes[0][1:-1]

# Naive attempt at turning it into JSON (replace single quotes with double quotes)
# text = text.replace("'", '"')

# Inspect the string before trying to load it
text

'{\'name\': \'Path of the Berserker\', \'slug\': \'path-of-the-berserker\', \'desc\': "For some barbarians, rage is a means to an end- that end being violence. The Path of the Berserker is a path of untrammeled fury, slick with blood. As you enter the berserker\'s rage, you thrill in the chaos of battle, heedless of your own health or well-being. \\n \\n##### Frenzy \\n \\nStarting when you choose this path at 3rd level, you can go into a frenzy when you rage. If you do so, for the duration of your rage you can make a single melee weapon attack as a bonus action on each of your turns after this one. When your rage ends, you suffer one level of exhaustion (as described in appendix A). \\n \\n##### Mindless Rage \\n \\nBeginning at 6th level, you can\'t be charmed or frightened while raging. If you are charmed or frightened when you enter your rage, the effect is suspended for the duration of the rage. \\n \\n##### Intimidating Presence \\n \\nBeginning at 10th level, you can use your ac

We run into a problem, so let's examine the string more closely.

In [82]:
DFclasses.archetypes[0][-2:-1]

'}'

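We see that the problem is that some of the single quotes are part of the text itself (apostrophes, as in "berserker's rage"), so replacing every single quote with a double quote does not produce valid JSON.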
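The failure is easy to reproduce on a shortened sample of the Barbarian text (the `raw` string below is an abbreviated excerpt, not the full cell). Swapping the quotes turns the apostrophe in "berserker's" into a stray double quote, so `json.loads` rejects the result, while `ast.literal_eval` reads the string as the Python literal it actually is:

```python
import ast
import json

# Abbreviated excerpt in the same format as the archetypes cell (Python repr, not JSON)
raw = "[{'name': 'Path of the Berserker', 'desc': \"As you enter the berserker's rage\"}]"


def parse_naively(s: str):
    """Swap every single quote for a double quote, then try json.loads."""
    return json.loads(s.replace("'", '"'))


try:
    parse_naively(raw)
    naive_ok = True
except json.JSONDecodeError:
    # "...the berserker"s rage" ends the JSON string early, leaving a stray 's'
    naive_ok = False

parsed = ast.literal_eval(raw)  # handles both quote styles and keeps the apostrophe
print(naive_ok)           # False
print(parsed[0]["desc"])  # As you enter the berserker's rage
```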