# DnD Monster Data Wrangling


## Importation

In [198]:
import math
import numpy as np
import pandas as pd
import ast
import re
from src.features.build_features import clean_list
from word2number import w2n


monster_df = pd.read_csv('../data/raw/Monster_Data_RAW.csv')

monster_df.head()

Unnamed: 0.1,Unnamed: 0,Monster Name,Size,Type,Alignment,Traits,Damage Resistances,Monster Tags:,Mythic Actions,Reactions,...,Proficiency Bonus,STR,DEX,CON,INT,WIS,CHA,Actions,Legendary Actions,Environment:
0,0,Adult Green Dragon,Huge,['dragon'],lawful evil,['Amphibious. The dragon can breathe air and w...,,,,,...,5,23,12,21,18,15,17,['Multiattack. The dragon can use its Frightfu...,"[""The dragon can take 3 legendary actions, cho...",['Forest']
1,1,Adult Silver Dragon,Huge,['dragon'],lawful good,['Legendary Resistance (3/Day). If the dragon ...,,,,,...,5,27,10,25,16,13,21,['Multiattack. The dragon can use its Frightfu...,"[""The dragon can take 3 legendary actions, cho...","['Mountain', 'Urban']"
2,2,Adult White Dragon,Huge,['dragon'],chaotic evil,"[""Ice Walk. The dragon can move across and cli...",,,,,...,5,22,10,22,8,12,12,['Multiattack. The dragon can use its Frightfu...,"[""The dragon can take 3 legendary actions, cho...",['Arctic']
3,3,Air Elemental,Large,['elemental'],neutral,"[""Air Form. The elemental can enter a hostile ...","Lightning, Thunder; Bludgeoning, Piercing, and...",,,,...,3,14,20,14,6,10,6,['Multiattack. The elemental makes two slam at...,,"['Desert', 'Mountain']"
4,4,Ape,Medium,['beast'],unaligned,[nan],,['Misc Creature'],,,...,2,16,14,14,6,12,7,['Multiattack. The ape makes two fist attacks....,,['Forest']


## Performing the Basic Cleanup
Removing columns we won't use, cleaning up feature titles, and checking the dataset for datatypes and issues.

We know that Mythic Actions weren't introduced until a later version, so no monsters contain this category.

Unnamed: 0 is useless, and Monster Tags: is the same as the monster type (with a bit more specificity, which we don't need).

Skills, Source, Languages, and Senses are all unnecessary for our MVP.

In [199]:
monster_df.drop(columns = ['Unnamed: 0', 'Mythic Actions', 'Monster Tags:', "Skills", 'Source', 'Languages', "Senses"], inplace = True)
monster_df.rename(columns = {"Environment:":"Environment"}, inplace=True)

In [200]:
print(monster_df.columns)
print(monster_df.describe())

Index(['Monster Name', 'Size', 'Type', 'Alignment', 'Traits',
       'Damage Resistances', 'Reactions', 'Armor Class', 'Hit Points', 'Speed',
       'Saving Throws', 'Damage Vulnerabilities', 'Damage Immunities',
       'Condition Immunities', 'Challenge', 'Proficiency Bonus', 'STR', 'DEX',
       'CON', 'INT', 'WIS', 'CHA', 'Actions', 'Legendary Actions',
       'Environment'],
      dtype='object')
       Armor Class  Hit Points  Proficiency Bonus         STR         DEX  \
count   348.000000  348.000000         348.000000  348.000000  348.000000   
mean     13.985632   78.399425           2.712644   14.951149   12.706897   
std       3.155403   96.670352           1.296486    6.705018    3.078279   
min       5.000000    1.000000           2.000000    1.000000    1.000000   
25%      12.000000   17.000000           2.000000   11.000000   10.000000   
50%      13.000000   45.000000           2.000000   16.000000   13.000000   
75%      16.000000  110.000000           3.000000   19.00

## Challenge Rating (CR)
Challenge Rating is currently a string. We want it to be numeric so we can use it; it has to be a float rather than an integer because some ratings are fractions. We need to remove the attached experience string and convert.

In [201]:
#split the string and only take the first part (Challenge Rating)
monster_df["Challenge"] = monster_df["Challenge"].str.split().str[0]

#turn fraction strings into floats
for indx, challenge in enumerate(monster_df["Challenge"]):
    if "/" in challenge:
        monster_df.loc[indx,'Challenge'] = pd.eval(challenge)
    else:
        monster_df.loc[indx,'Challenge'] = pd.to_numeric(challenge)

monster_df["Challenge"] = pd.to_numeric(monster_df["Challenge"])
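
As a cross-check on the conversion above, here is a minimal standalone sketch of the same idea using only the standard library. The helper name `parse_challenge` and the exact shape of the raw strings (a rating followed by an XP note) are assumptions for illustration, not something taken from the raw file.

```python
from fractions import Fraction

def parse_challenge(raw: str) -> float:
    """Convert a raw Challenge string (assumed shape: '1/4 (50 XP)') to a float."""
    rating = raw.split()[0]          # keep only the rating, drop the XP part
    return float(Fraction(rating))   # Fraction parses '1/4' as well as '15'

print(parse_challenge("1/4 (50 XP)"))
print(parse_challenge("15 (13,000 XP)"))
```

If this shape holds for the real data, the helper could be applied to the whole column with `monster_df["Challenge"].apply(parse_challenge)` instead of looping row by row.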

## Monster Type
There are a large number of monster sub-types, which are unnecessary for our analysis, so we want to consolidate each monster down to its main type.

In [202]:
for indx,Type in enumerate(monster_df['Type']):
    monster_df.loc[indx,"Type"] = Type.split(",")[0].strip("[']")

## Missing Values

All features with missing values make sense except Actions: I would have assumed every creature has an action, but that may not be the case. The others are all optional features that lower-level monsters won't have.

The NAs may still cause issues down the road, so I will replace them all with the placeholder string "['NA']".

In [203]:
monster_df.isna().any()

monster_df.fillna("['NA']",inplace=True)

## List Values
We currently have features whose values are lists of varying length. For example, a monster may be found in more than one environment, such as [mountain, coastal, underdark]. Unfortunately, they are read in as strings right now, so we will need to convert them to lists for ease of use.
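
To make the string-versus-list problem concrete, here is a small sketch using `ast.literal_eval` on one value in the shape shown in the Environment column of the preview. It only illustrates the conversion on a single cell.

```python
import ast

raw = "['Mountain', 'Urban']"        # what pandas reads from the CSV: a plain string
parsed = ast.literal_eval(raw)       # safely evaluates the literal into a real list

print(type(raw).__name__, type(parsed).__name__)
print(parsed[0])
```

`ast.literal_eval` only accepts Python literals, so it is a safer choice than `eval` for text that comes from a scraped file.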

In [204]:
#All lists columns are actually strings!
for i,j in enumerate(monster_df["Environment"]):
   print("list",i,"is",type(j))

list 0 is <class 'str'>
list 1 is <class 'str'>
list 2 is <class 'str'>
list 3 is <class 'str'>
list 4 is <class 'str'>
list 5 is <class 'str'>
list 6 is <class 'str'>
list 7 is <class 'str'>
list 8 is <class 'str'>
list 9 is <class 'str'>
list 10 is <class 'str'>
list 11 is <class 'str'>
list 12 is <class 'str'>
list 13 is <class 'str'>
list 14 is <class 'str'>
list 15 is <class 'str'>
list 16 is <class 'str'>
list 17 is <class 'str'>
list 18 is <class 'str'>
list 19 is <class 'str'>
list 20 is <class 'str'>
list 21 is <class 'str'>
list 22 is <class 'str'>
list 23 is <class 'str'>
list 24 is <class 'str'>
list 25 is <class 'str'>
list 26 is <class 'str'>
list 27 is <class 'str'>
list 28 is <class 'str'>
list 29 is <class 'str'>
list 30 is <class 'str'>
list 31 is <class 'str'>
list 32 is <class 'str'>
list 33 is <class 'str'>
list 34 is <class 'str'>
list 35 is <class 'str'>
list 36 is <class 'str'>
list 37 is <class 'str'>
list 38 is <class 'str'>
list 39 is <class 'str'>
list 40 is

### Attack, Spell Attack, Save DC
Actually, there is some information within these strings that we can pull out easily with regex (search, match, findall). Let's do that before converting them into lists.

In [205]:
# Create new columns for features
monster_df = monster_df.assign(Attack_Bonus= '', Spell_Bonus = '', Spell_Save_DC = '')

#Attack Bonus: first "+N to hit" found in Actions
for indx, action in enumerate(monster_df['Actions']):
    try:
        found = re.search(r"\+(.+?) to hit", action).group(0)
        monster_df.loc[indx,'Attack_Bonus'] = int(found.split()[0].lstrip('+'))
    except (AttributeError, TypeError, ValueError):
        #no match, a non-string cell, or a match that is not a plain number
        monster_df.loc[indx,'Attack_Bonus'] = 0

#Spell Attack Bonus: first "+N to hit" found in Traits

for indx, trait in enumerate(monster_df['Traits']):
    try:
        found = re.search(r"\+(.+?) to hit", trait).group(0)
        monster_df.loc[indx,'Spell_Bonus'] = int(found.split()[0].lstrip('+'))
    except (AttributeError, TypeError, ValueError):
        monster_df.loc[indx,'Spell_Bonus'] = 0

#Spell Save DC: first "spell save DC N" found in Traits

for indx, trait in enumerate(monster_df['Traits']):
    try:
        found = re.search(r"spell save DC [0-9]+", trait).group(0)
        monster_df.loc[indx,'Spell_Save_DC'] = int(found.split()[-1])
    except (AttributeError, TypeError, ValueError):
        monster_df.loc[indx,'Spell_Save_DC'] = 0
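
The three loops above share one pattern: search for a number next to a fixed phrase and fall back to 0. A possible way to factor that out is sketched below. The helper `first_int`, the stricter `\d+` patterns, and both sample strings are assumptions made up for illustration; they are not rows from the dataset.

```python
import re

# Stricter variants of the patterns used above: capture digits only
TO_HIT = re.compile(r"\+(\d+) to hit")
SAVE_DC = re.compile(r"spell save DC (\d+)")

def first_int(pattern, text, default=0):
    """Return the first captured number as an int, or `default` when nothing matches."""
    match = pattern.search(text)
    return int(match.group(1)) if match else default

# Hypothetical sample strings written in the style of the Actions and Traits columns
action = "Bite. Melee Weapon Attack: +11 to hit, reach 10 ft., one target."
trait = "Spellcasting. The caster uses Intelligence (spell save DC 17, +9 to hit with spell attacks)."

print(first_int(TO_HIT, action), first_int(SAVE_DC, trait), first_int(SAVE_DC, action))
```

Capturing digits directly avoids the split-and-strip step and removes the need for a try/except around the conversion.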

## Saving Throw Expansion
We want to be able to evaluate saving throw numbers just like we do stats. Some monsters have bonuses to certain saving throws, which we will input first. Then we will use the stats to fill in the rest of the saving throws. Each stat maps to a base saving throw modifier: every 2 points above 10 add +1, and every 2 points below 10 subtract 1. For example, a 10 in Strength is a +0 Strength saving throw, a 12 is a +1, and a 20 is a +5.
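
The stat-to-modifier rule described above can be written as a one-line formula with floor division. This is a minimal sketch; the function name `ability_modifier` is made up for illustration, and the checks use the examples from the paragraph.

```python
def ability_modifier(score: int) -> int:
    """Base saving throw modifier for a stat: +1 per 2 points above 10, rounded down."""
    return (score - 10) // 2

for score in (1, 10, 12, 20, 30):
    print(score, ability_modifier(score))
```

Floor division rounds toward negative infinity, so odd scores below 10 come out right as well (a 9 gives -1, a 3 gives -4).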

In [206]:
monster_df["Saving Throws"] = monster_df['Saving Throws'].apply(clean_list)

#turn saving throw feature values into lists using literal_eval
monster_df["Saving Throws"] = monster_df["Saving Throws"].apply(ast.literal_eval)

#Saving Throw Expanded features (a list keeps the column order fixed; a set is unordered)
saving_throw_df = pd.DataFrame(columns=["STR_SV","DEX_SV","CON_SV","INT_SV","WIS_SV","CHA_SV"])

for indx, saving_throw in enumerate(monster_df['Saving Throws']): 
    for string in saving_throw:
        if "DEX" in string:
            saving_throw_df.loc[indx,"DEX_SV"] = int(string.split()[1].lstrip('+'))
        elif "CON" in string:
            saving_throw_df.loc[indx,"CON_SV"] = int(string.split()[1].lstrip('+'))
        elif "STR" in string:
            saving_throw_df.loc[indx,"STR_SV"] = int(string.split()[1].lstrip('+'))
        elif "WIS" in string:
            saving_throw_df.loc[indx,"WIS_SV"] = int(string.split()[1].lstrip('+'))
        elif "INT" in string:
            saving_throw_df.loc[indx,"INT_SV"] = int(string.split()[1].lstrip('+'))
        elif "CHA" in string:
            saving_throw_df.loc[indx,"CHA_SV"] = int(string.split()[1].lstrip('+'))
            
monster_df = pd.concat([monster_df,saving_throw_df], axis=1)

In [207]:
#Using Stats to fill in missing saving throws
#Single-value keys need a trailing comma to be tuples: a bare ('30') is the string '30', and '3' in '30' is True
stat_modifiers ={('1',) : -5, ('2','3') : -4, ('4','5') : -3, ('6','7') : -2, ('8','9') : -1, ('10','11') : 0, ('12','13') : 1, ('14','15') : 2, ('16','17') : 3, ('18','19') : 4, 
('20','21') : 5, ('22','23') : 6, ('24','25') : 7, ('26','27') : 8, ('28','29') : 9, ('30',) : 10}

for clms in monster_df.iloc[:,28:34]:
    monster_stat = clms.split('_')[0]
    for indx, value in enumerate(monster_df[clms]):
        if math.isnan(value):
            for stat_num, modifier in stat_modifiers.items():
                if str(monster_df.loc[indx,monster_stat]) in stat_num:
                    monster_df.loc[indx,clms] = modifier
          

In [208]:

#evaluate string and turn into lists
column_lists = ["Environment", "Reactions", "Actions"]
for columns in column_lists:
    monster_df[columns] = monster_df[columns].apply(ast.literal_eval)
            
#"Damage Resistances","Damage Vulnerabilities", "Damage Immunities", have wonky typing due to semicolon
# Traits, Condition immunities, saving throws create type error

#check that they are lists
for i,j in enumerate(monster_df["Environment"]):
   print("list",i,"is",type(j))

#create dummy variables for environment: one 0/1 column per environment found in the lists
dummies = pd.get_dummies(monster_df['Environment'].explode()).reset_index().groupby(['index']).sum()
monster_df = pd.concat([monster_df,dummies], axis=1)

list 0 is <class 'list'>
list 1 is <class 'list'>
list 2 is <class 'list'>
list 3 is <class 'list'>
list 4 is <class 'list'>
list 5 is <class 'list'>
list 6 is <class 'list'>
list 7 is <class 'list'>
list 8 is <class 'list'>
list 9 is <class 'list'>
list 10 is <class 'list'>
list 11 is <class 'list'>
list 12 is <class 'list'>
list 13 is <class 'list'>
list 14 is <class 'list'>
list 15 is <class 'list'>
list 16 is <class 'list'>
list 17 is <class 'list'>
list 18 is <class 'list'>
list 19 is <class 'list'>
list 20 is <class 'list'>
list 21 is <class 'list'>
list 22 is <class 'list'>
list 23 is <class 'list'>
list 24 is <class 'list'>
list 25 is <class 'list'>
list 26 is <class 'list'>
list 27 is <class 'list'>
list 28 is <class 'list'>
list 29 is <class 'list'>
list 30 is <class 'list'>
list 31 is <class 'list'>
list 32 is <class 'list'>
list 33 is <class 'list'>
list 34 is <class 'list'>
list 35 is <class 'list'>
list 36 is <class 'list'>
list 37 is <class 'list'>
list 38 is <class 'lis
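
For readers unfamiliar with the explode / get_dummies / groupby chain used for the environment dummies, here is a plain-Python sketch of the result it aims for. The function `multi_hot` is a made-up name, and this is only an illustration of the encoding, not a replacement for the pandas code. The three input rows are the Environment values of the first three monsters in the preview.

```python
def multi_hot(rows):
    """Return (labels, matrix) with one 0/1 column per distinct label."""
    labels = sorted({label for row in rows for label in row})
    matrix = [[int(label in row) for label in labels] for row in rows]
    return labels, matrix

environments = [["Forest"], ["Mountain", "Urban"], ["Arctic"]]
labels, matrix = multi_hot(environments)

print(labels)
for row in matrix:
    print(row)
```

One difference to keep in mind: the pandas version sums the dummies per monster, so a label repeated within one list would count twice, while this sketch caps each cell at 1.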

## Actions: Damage
While there is a ton of information in Actions that we may use for word clouds later, the most critical thing for our MVP is pulling out the potential damage of the monsters. This is difficult because monsters are so diverse: some have multiattack, which could mean many different things; some have spells, which we don't immediately have the damage for; and some do secondary damage upon a failed saving throw. Distilling this down into a simple X damage per round will prove difficult.

First, we can see that regular attacks follow a pattern of 'Hit: XX (XdX + X) [damage type] damage.' This is important because we can pull out the average damage and use it for the monster. I'm thinking we may need to make a separate dataframe to work with this information to start.
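
To show what the 'Hit: XX (XdX + X)' pattern gives us, here is a rough sketch that reads the listed average and recomputes the expected value from the dice. The function `parse_hit` and the sample strings are assumptions for illustration; the notebook itself only keeps the listed average.

```python
import re

# Listed average, then dice count, die size, and an optional +/- bonus
HIT = re.compile(r"Hit: (\d+) \((\d+)d(\d+)(?: ([+-]) (\d+))?\)")

def parse_hit(text):
    """Return (listed_average, expected_from_dice), or None when the pattern is absent."""
    m = HIT.search(text)
    if m is None:
        return None
    listed, dice, sides = int(m[1]), int(m[2]), int(m[3])
    bonus = int(m[5]) if m[5] else 0
    if m[4] == "-":
        bonus = -bonus
    # The mean of one die with `sides` faces is (sides + 1) / 2
    expected = dice * (sides + 1) / 2 + bonus
    return listed, expected

print(parse_hit("Hit: 17 (2d10 + 6) piercing damage."))
print(parse_hit("Hit: 4 (1d8) slashing damage."))
```

The listed number can sit half a point below the dice average (4 versus 4.5 in the second sample), so using the listed value slightly understates damage for odd totals.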

In [209]:
#work on a copy so later assignments do not write into a slice of monster_df
monster_actions = monster_df[['Monster Name', 'Actions']].copy()

#Find the max number of attacks (it is 10)
max_attacks = monster_actions['Actions'].explode()
max_attacks.groupby(max_attacks.index).count().max()

#Create 10 columns to work through attacks individually
monster_actions = monster_actions.assign(Attack_1 = "", Attack_2 = "",  Attack_3 = "", Attack_4 = "", Attack_5 = "", Attack_6 = "", Attack_7 = "", Attack_8 = "", Attack_9 = "", Attack_10 = "")

for indx,actions in enumerate(monster_actions['Actions']):
    n = 0
    for action in actions:
        monster_actions.iloc[indx, n+2] = action
        n+=1
        
#columns Attack_9 and Attack_10 can be removed since they relate to dragon polymorph
monster_actions.drop(columns=["Actions", "Attack_9","Attack_10"], inplace = True)

In [210]:
#update column to show dictionary of types and number of attacks for multiattacks

def MultiAttackSearch(attack_value, search, replaced, replace, split):
    multiattack = re.search(search, attack_value).group(1)
    multiattack = multiattack.replace(replaced,replace)
    multiattack = re.split(split, multiattack)
    for indx1, item in enumerate(multiattack):
        if item !=" ":
            MA_number = {}
            multiattack[indx1] = item.split()
            value, key = multiattack[indx1][0], multiattack[indx1][1]
            MA_number[key] = w2n.word_to_num(value)
            multiattack[indx1] = MA_number
    return multiattack

for indx, attack in enumerate(monster_actions["Attack_1"]):
        if "Multiattack" in attack:
            if ": " in attack:
                monster_actions.loc[indx,"Attack_1"] = MultiAttackSearch(attack,r": (.*?)\.", " with its "," ",'and |,')
            elif ("makes " in attack) and ("1d4" not in attack) and ("either" not in attack) and ("as" not in attack):   
                multiattack = re.search("makes (.*?) ", attack).group(1)
                multiattack = multiattack.replace("makes ","")
                multiattack = w2n.word_to_num(multiattack)
                MA_number = {}
                value, key = multiattack, "Attack"
                MA_number[key] = value
                Monster_list = []
                Monster_list.append(MA_number)
                monster_actions.loc[indx,"Attack_1"] = Monster_list
            elif ("medusa" in attack):
                monster_actions.loc[indx,"Attack_1"] = [{'snake hair':1},{'shortsword':2}]
            elif ("drider" in attack):
                monster_actions.loc[indx,"Attack_1"] = [{'longsword':3}]
            elif ("flameskull" in attack):
                monster_actions.loc[indx,"Attack_1"] = [{'Fire Ray':2}]
            elif ("oni" in attack):
                monster_actions.loc[indx,"Attack_1"] = [{'Glaive':2}]
            elif ("fungus" in attack):
                monster_actions.loc[indx,"Attack_1"] = [{"Rotting Touch": 4}]
            elif ("hydra" in attack):
                monster_actions.loc[indx,"Attack_1"] = [{"Bite": 5}]
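            # NOTE: the assassin and rakshasa entries reuse the hydra's value. Their multiattacks appear to be two attacks each, so these multipliers may overstate damage and should be reviewed.
            elif ("assassin" in attack):
                monster_actions.loc[indx,"Attack_1"] = [{"Bite": 5}]
            elif ("rakshasa" in attack):
                monster_actions.loc[indx,"Attack_1"] = [{"Bite": 5}]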
            elif ("veteran" in attack):
                monster_actions.loc[indx,"Attack_1"] = [{"longsword": 2},{"shortsword":1}]





In [211]:
#The way split works if there is ', and' it causes a blank value in the list, so we will delete it
for indx, values in enumerate(monster_actions["Attack_1"]):
    if isinstance(values,list):
        if " " in values:
            values.remove(" ")

monster_actions["Multiattack"] = monster_actions["Attack_1"]

In [212]:
#remove the multiattack from Attack_1 so that it will only contain attacks for calculations
for indx, value in enumerate(monster_actions["Attack_1"]):
    if isinstance(value, list):
        monster_actions.loc[indx,"Attack_1"] = ""

#remove plain attack strings from Multiattack so that it will only contain multiattack lists
for indx, value in enumerate(monster_actions["Multiattack"]):
    if not isinstance(value, list):
        monster_actions.loc[indx,"Multiattack"] = ""

In [213]:
# Several monsters have "if this monster takes damage then X happens", which makes finding and replacing attacks with "target takes X damage" difficult. We will replace these 5 spots manually

#replace "Hit:" attacks with just damage and bonus damage such as extra lightning damage
for col in monster_actions.iloc[:,1:9]:
    for indx, attack in enumerate(monster_actions[col]):
        try:
            if ("worm" in attack) or ("kraken takes" in attack) or ("remorhaz takes" in attack) or ("behir takes" in attack) or ("tarrasque takes" in attack): 
                monster_actions.loc[indx,col] = 0
            elif "plus" in attack:
                ext_damage = int(re.search("plus (.+?) ", attack).group(1))
                #default to 0 so a value left over from an earlier row is never reused
                prim_damage = 0
                if "Hit:" in attack:
                    prim_damage = int(re.search("Hit: (.+?) ", attack).group(1))
                monster_actions.loc[indx,col] = prim_damage + ext_damage
            else:
                if "Hit:" in attack:
                    monster_actions.loc[indx,col] = int(re.search("Hit: (.+?) ", attack).group(1))
                elif "taking " in attack:
                    monster_actions.loc[indx,col] = int(re.search("taking (.+?) ", attack).group(1))
                elif "take " in attack:
                    monster_actions.loc[indx,col] = int(re.search("take [0-9]+ ", attack).group(0).split()[1])
                elif "takes " in attack:
                    monster_actions.loc[indx,col] = int(re.search("takes [0-9]+ ", attack).group(0).split()[1])
        except (AttributeError, TypeError, ValueError):
            #no regex match, a non-string cell, or a non-numeric capture: leave the cell unchanged
            continue

In [214]:
#replace remaining strings with 0 (the pattern '^' matches every string)
for col in monster_actions.iloc[:,1:11]:
    monster_actions[col] = monster_actions[col].replace(to_replace='^', value=0, regex=True)

In [215]:
# Create Round 1 damage for multi attack and single attack monsters
for indx, lst in enumerate(monster_actions["Multiattack"]):
    if lst != 0:
        attack_total = 0
        for i in range(len(lst)):
            col = str("Attack_"+str(i+2))
            attack_num = monster_actions.loc[indx,col]
            multiplier = int(list(lst[i].values())[0])
            attack_total += attack_num * multiplier 
        monster_actions.loc[indx,"Round_1"] = attack_total
    else:
        monster_actions.loc[indx,"Round_1"] = monster_actions.loc[indx,"Attack_1"] 

monster_actions["Round_2"] = monster_actions["Round_1"] 

In [216]:
monster_actions["Round_3"] = monster_actions["Round_1"]
for col in monster_actions.iloc[:,1:9]:
    for indx, attack in enumerate(monster_actions[col]):
        if attack > monster_actions.loc[indx,"Round_1"]:
            monster_actions.loc[indx,"Round_3"] = attack


In [217]:
# Final Round Total
monster_actions["Total_Action Damage_3Rounds"] = (monster_actions["Round_1"] + monster_actions["Round_2"] +  monster_actions["Round_3"])
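
The round logic above can be summarised for a single multiattack monster in a few lines. The sketch below is a simplified restatement, not the notebook's exact code: the function name is made up, Round 3 uses the largest single attack rather than the last one that exceeds Round 1, and the Wyvern's multiattack entries are written by hand for illustration. The damage values (11, 13, 11 and 6, 6) are the ones shown for the Wyvern and the Ape in the tables further down.

```python
def three_round_total(multiattack, attack_damage):
    """Sum three rounds of action damage for one multiattack monster.

    multiattack: list of one-entry dicts, e.g. [{"bite": 1}, {"stinger": 1}]
    attack_damage: average damage per attack, in the order the attacks are listed
    """
    # Round 1: pair the i-th multiattack entry with the i-th attack
    round_1 = sum(
        count * damage
        for entry, damage in zip(multiattack, attack_damage)
        for count in entry.values()
    )
    round_2 = round_1
    # Round 3 (simplified): use the biggest single attack if it beats a normal round
    round_3 = max([round_1, *attack_damage])
    return round_1 + round_2 + round_3

print(three_round_total([{"bite": 1}, {"stinger": 1}], [11, 13, 11]))  # Wyvern
print(three_round_total([{"Attack": 2}], [6, 6]))                      # Ape
```

Both results agree with the Total_Action Damage_3Rounds values shown later for these two monsters (72 and 36). Monsters without a multiattack are not covered by this sketch; the notebook uses Attack_1 directly for them.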

In [218]:
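# Creatures with swallow all have additional Round damage
# NOTE: Total_Action Damage_3Rounds was already computed in the previous cell, so these per-round additions only reach it if that total is recomputed afterwards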
# Giant Toad 10
# Giant Frog 5
# Remorhaz 21
# Tarrasque 56
# Behir 21
# Old Croaker 10 atk2
monster_actions[monster_actions["Monster Name"] == "Giant Toad"]

swallow_monsters = ['Giant Toad', 'Giant Frog', 'Remorhaz', 'Tarrasque', 'Behir', 'Old Croaker']

for monster in swallow_monsters:
    indx = monster_actions[monster_actions["Monster Name"] == monster].index
    for col in monster_actions[['Attack_8','Attack_7','Attack_6','Attack_5','Attack_4','Attack_3','Attack_2']]:
        attack = int(monster_actions.loc[indx, col])
        if attack != 0:
            print(monster_actions.loc[indx, "Round_1"] )
            monster_actions.loc[indx, "Round_1"] += attack
            monster_actions.loc[indx, "Round_2"] += attack
            monster_actions.loc[indx, "Round_3"] += attack
            break                
                   


251    12.0
Name: Round_1, dtype: float64
302    4.0
Name: Round_1, dtype: float64
340    50.0
Name: Round_1, dtype: float64
341    148.0
Name: Round_1, dtype: float64
295    56.0
Name: Round_1, dtype: float64
314    12.0
Name: Round_1, dtype: float64


In [219]:
#Monsters with Angelic Weapon need to increase damage: Deva 54, Planetar 67.5, Solar 81

deva_index = monster_actions[monster_actions["Monster Name"]=="Deva"].index
solar_index = monster_actions[monster_actions["Monster Name"]=="Solar"].index
planatar_index = monster_actions[monster_actions["Monster Name"]=="Planetar"].index


monster_actions.loc[deva_index,"Total_Action Damage_3Rounds"]+=108
monster_actions.loc[solar_index,"Total_Action Damage_3Rounds"]+=162
monster_actions.loc[planatar_index,"Total_Action Damage_3Rounds"]+=135

In [220]:
#While not a perfect set, it's a great start for the MVP. Let's clean up before adding new columns
monster_actions.drop(columns=["Multiattack", "Round_1", "Round_2","Round_3"], inplace = True)

In [221]:
monster_actions[monster_actions["Monster Name"]=="Planetar"]

Unnamed: 0,Monster Name,Attack_1,Attack_2,Attack_3,Attack_4,Attack_5,Attack_6,Attack_7,Attack_8,Total_Action Damage_3Rounds
134,Planetar,0,43,0,0,0,0,0,0,393.0


## Reactions and Legendary Actions
Now we need to add Reaction and Legendary Action damage to the total action damage for many of these monsters.

In [222]:
#Need to Add Legendary Actions and Reactions to Total Damage before calculating average
monster_reactions = monster_df[['Monster Name', 'Reactions']]

In [223]:
# None of the Reactions deal with damage. There are some AC based ones we may consider
for value in enumerate(monster_reactions["Reactions"]):
    print(value)

(0, ['NA'])
(1, ['NA'])
(2, ['NA'])
(3, ['NA'])
(4, ['NA'])
(5, ['NA'])
(6, ['NA'])
(7, ["Split. When a pudding that is Medium or larger is subjected to lightning or slashing damage, it splits into two new puddings if it has at least 10 hit points. Each new pudding has hit points equal to half the original pudding's, rounded down. New puddings are one size smaller than the original pudding."])
(8, ['NA'])
(9, ['NA'])
(10, ['NA'])
(11, ['NA'])
(12, ['NA'])
(13, ["Unnerving Mask. When a creature the devil can see starts its turn within 30 feet of the devil, the devil can create the illusion that it looks like one of the creature's departed loved ones or bitter enemies. If the creature can see the devil, it must succeed on a DC 14 Wisdom saving throw or be frightened until the end of its turn."])
(14, ['NA'])
(15, ['NA'])
(16, ['NA'])
(17, ['NA'])
(18, ['NA'])
(19, ['NA'])
(20, ['NA'])
(21, ['NA'])
(22, ['NA'])
(23, ['NA'])
(24, ['NA'])
(25, ['NA'])
(26, ['NA'])
(27, ['NA'])
(28, ['NA'])


In [224]:
#Need to Add Legendary Actions and Reactions to Total Damage before calculating average
monster_leg_actions = monster_df[['Monster Name', 'Legendary Actions']]

In [225]:
monster_leg_actions['Legendary Actions'][1]

'["The dragon can take 3 legendary actions, choosing from the options below. Only one legendary action option can be used at a time and only at the end of another creature\'s turn. The dragon regains spent legendary actions at the start of its turn.", \'Detect. The dragon makes a Wisdom (Perception) check.\', \'Tail Attack. The dragon makes a tail attack.\', \'Wing Attack (Costs 2 Actions). The dragon beats its wings. Each creature within 10 feet of the dragon must succeed on a DC 21 Dexterity saving throw or take 15 (2d6 + 8) bludgeoning damage and be knocked prone. The dragon can then fly up to half its flying speed.\']'

In [226]:
#Start with Dragons, all dragons will choose to make 3 tail attacks each round since predicting AOE damage is more involved. They would need to hit more than two people to be worth it.
#3 rounds and 3 L.A. per round

for indx, attack in enumerate(monster_actions["Attack_4"]):
    if "Ancient" in monster_actions.loc[indx,"Monster Name"] or "Adult" in monster_actions.loc[indx,"Monster Name"]:
        monster_actions.loc[indx,"Legendary Damage_3Rounds"] = attack*3*3

In [227]:
monster_actions[monster_actions['Legendary Damage_3Rounds'].isna()]

Unnamed: 0,Monster Name,Attack_1,Attack_2,Attack_3,Attack_4,Attack_5,Attack_6,Attack_7,Attack_8,Total_Action Damage_3Rounds,Legendary Damage_3Rounds
3,Air Elemental,0,14,0,0,0,0,0,0,84.0,
4,Ape,0,6,6,0,0,0,0,0,36.0,
5,Assassin,0,6,7,0,0,0,0,0,90.0,
6,Azer,10,0,0,0,0,0,0,0,30.0,
7,Black Pudding,24,0,0,0,0,0,0,0,72.0,
...,...,...,...,...,...,...,...,...,...,...,...
343,Wyvern,0,11,13,11,0,0,0,0,72.0,
344,Zombie,4,0,0,0,0,0,0,0,12.0,
345,Commoner,2,0,0,0,0,0,0,0,6.0,
346,Giant Owl,8,0,0,0,0,0,0,0,24.0,


In [228]:
#filter down to which monsters have LA
#remove dragons since we completed them already based on industry knowledge
monster_leg_actions = monster_leg_actions[monster_leg_actions["Legendary Actions"].str.contains('legendary actions')]
monster_leg_actions = monster_leg_actions[~monster_leg_actions["Legendary Actions"].str.contains('dragon')]

In [229]:
Leg_action_damage = [35,28,39,54,23,33,45,51,86,84]

monster_leg_actions["Legendary Damage"] = Leg_action_damage

# Mummy Lord 35
# Solar 28
# Gynosphinx 39
# Lich 54
# Vampire 23
# Unicorn 33
# Aboleth 45
# Androsphinx 51
# Kraken 86
# Tarrasque  84

In [230]:
monster_leg_actions["Legendary Damage"] = monster_leg_actions["Legendary Damage"] * 3
#because there are so few left, it will be easier to manually deal with them than trying to search for unifying words. If this ends up being a useful MVP, we will have a lot more work to do. 

In [231]:
monster_leg_actions

Unnamed: 0,Monster Name,Legendary Actions,Legendary Damage
54,Mummy Lord,"[""The mummy lord can take 3 legendary actions,...",105
70,Solar,"[""The solar can take 3 legendary actions, choo...",84
126,Gynosphinx,"[""The sphinx can take 3 legendary actions, cho...",117
127,Lich,"[""The lich can take 3 legendary actions, choos...",162
202,Vampire,"['The vampire can take 3 legendary actions, ch...",69
212,Unicorn,"['The unicorn can take 3 legendary actions, ch...",99
214,Aboleth,"[""The aboleth can take 3 legendary actions, ch...",135
226,Androsphinx,"[""The sphinx can take 3 legendary actions, cho...",153
268,Kraken,"[""The kraken can take 3 legendary actions, cho...",258
341,Tarrasque,"['The tarrasque can take 3 legendary actions, ...",252


In [232]:
for indx,damage in zip(monster_leg_actions["Legendary Damage"].index,monster_leg_actions["Legendary Damage"]):
   monster_actions.loc[indx,"Legendary Damage_3Rounds"] = damage

In [233]:
monster_actions[monster_actions["Monster Name"] == "Kraken"]

Unnamed: 0,Monster Name,Attack_1,Attack_2,Attack_3,Attack_4,Attack_5,Attack_6,Attack_7,Attack_8,Total_Action Damage_3Rounds,Legendary Damage_3Rounds
268,Kraken,0,23,0,20,0,22,0,0,207.0,258.0


In [234]:
monster_actions["Legendary Damage_3Rounds"] = monster_actions["Legendary Damage_3Rounds"].fillna(0)

In [235]:
monster_actions["Average_Damage_per_Round"] = (monster_actions["Total_Action Damage_3Rounds"]+monster_actions["Legendary Damage_3Rounds"])/3
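
As a quick sanity check of the averaging step, here is the same arithmetic for one row. The function name is made up for illustration; the inputs are the Kraken's two totals from the table above (207 action damage and 258 legendary damage over 3 rounds).

```python
def average_damage_per_round(action_total, legendary_total=0.0, rounds=3):
    """Average damage per round from the action and legendary totals."""
    return (action_total + legendary_total) / rounds

print(average_damage_per_round(207.0, 258.0))   # Kraken
print(average_damage_per_round(36.0))           # Ape, no legendary actions
```

The Ape's result of 12.0 agrees with the Average_Damage_per_Round value shown for index 4 further down.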

In [236]:
#keep only the final feature; the first 11 columns were intermediate working columns
monster_actions = monster_actions[["Average_Damage_per_Round"]]

In [237]:
monster_actions

Unnamed: 0,Average_Damage_per_Round
0,97.000000
1,103.000000
2,94.333333
3,28.000000
4,12.000000
...,...
343,24.000000
344,4.000000
345,2.000000
346,8.000000


In [238]:
monster_df = pd.concat([monster_df,monster_actions], axis=1)

In [239]:
monster_df['Average_Damage_per_Round'].describe()

count    348.000000
mean      24.298851
std       32.164982
min        0.000000
25%        5.000000
50%       12.000000
75%       28.000000
max      232.000000
Name: Average_Damage_per_Round, dtype: float64

## Damage Immunities, Resistances, Condition Immunities, Vulnerabilities
We are only interested in how many of each the monsters possess for now. There may come a time when the specific type is important (if we ever do analysis on commonly inflicted damage).
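
The counting rule used in the cells below can be sketched on a single string. The helper `count_entries`, the "Physical" placeholder, and the sample resistance string are assumptions for illustration (the sample is written in the shape seen in the Damage Resistances column, not copied from a row). The idea is to treat "Bludgeoning, Piercing, and Slashing" as one entry, turn semicolons into commas, and count the pieces.

```python
def count_entries(raw: str) -> int:
    """Count comma-separated entries, treating the physical-damage trio as one entry."""
    if raw == "['NA']":               # placeholder written by the fillna step
        return 0
    cleaned = raw.replace("Bludgeoning, Piercing, and Slashing", "Physical")
    cleaned = cleaned.replace(";", ",")
    return len([part for part in cleaned.split(",") if part.strip()])

sample = "Lightning, Thunder; Bludgeoning, Piercing, and Slashing from Nonmagical Attacks"
print(count_entries(sample), count_entries("['NA']"), count_entries("Poison"))
```

A helper like this could be applied to all four columns with `Series.apply`, which would avoid repeating the same loop four times.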




In [240]:
#work on a copy so the assignments below do not trigger SettingWithCopyWarning
monster_resist = monster_df[["Monster Name","Damage Resistances", "Damage Immunities", "Condition Immunities", "Damage Vulnerabilities"]].copy()

In [241]:
#Set NA to zero since they have no resistances
for indx, resist in enumerate(monster_resist["Damage Resistances"]):
    if "Bludgeoning" in resist:
        resist = resist.replace("Bludgeoning, Piercing, and Slashing", "Bludgeoning")
        resist = resist.replace(";", ",")
        monster_resist.loc[indx,'Damage Resistances'] = len(resist.split(','))
    elif monster_resist.loc[indx,"Damage Resistances"] == "['NA']":
        monster_resist.loc[indx,'Damage Resistances'] = 0
    else:
        monster_resist.loc[indx,"Damage Resistances"] = len(resist.split(','))


In [242]:
#Set NA to zero since they have no Immunities
for indx, resist in enumerate(monster_resist["Damage Immunities"]):
    if "Bludgeoning" in resist:
        resist = resist.replace("Bludgeoning, Piercing, and Slashing", "Bludgeoning")
        resist = resist.replace(";", ",")
        monster_resist.loc[indx,'Damage Immunities'] = len(resist.split(','))
    elif monster_resist.loc[indx,"Damage Immunities"] == "['NA']":
        monster_resist.loc[indx,'Damage Immunities'] = 0
    else:
        monster_resist.loc[indx,"Damage Immunities"] = len(resist.split(','))

In [243]:
#Set NA to zero since they have no condition immunities
for indx, resist in enumerate(monster_resist["Condition Immunities"]):
    if monster_resist.loc[indx,"Condition Immunities"] == "['NA']":
        monster_resist.loc[indx,'Condition Immunities'] = 0
    else:
        monster_resist.loc[indx,'Condition Immunities'] = len(resist.split(','))

In [244]:
#Set NA to zero since they have no vulnerabilities
for indx, resist in enumerate(monster_resist["Damage Vulnerabilities"]):
    if "Bludgeoning" in resist:
        resist = resist.replace("Bludgeoning, Piercing, and Slashing", "Bludgeoning")
        resist = resist.replace(";", ",")
        monster_resist.loc[indx,'Damage Vulnerabilities'] = len(resist.split(','))
    elif monster_resist.loc[indx,"Damage Vulnerabilities"] == "['NA']":
        monster_resist.loc[indx,'Damage Vulnerabilities'] = 0
    else:
        monster_resist.loc[indx,"Damage Vulnerabilities"] = len(resist.split(','))

In [245]:
#reassign instead of dropping in place, which avoids SettingWithCopyWarning
monster_resist = monster_resist.drop(columns="Monster Name")


In [246]:
monster_df.drop(columns=["Damage Resistances", "Damage Immunities", "Condition Immunities", "Damage Vulnerabilities"], inplace=True)
monster_df = pd.concat([monster_df,monster_resist],axis=1)


In [247]:
monster_df.drop('Saving Throws', axis=1)

Unnamed: 0,Monster Name,Size,Type,Alignment,Traits,Reactions,Armor Class,Hit Points,Speed,Challenge,...,NA,Swamp,Underdark,Underwater,Urban,Average_Damage_per_Round,Damage Resistances,Damage Immunities,Condition Immunities,Damage Vulnerabilities
0,Adult Green Dragon,Huge,dragon,lawful evil,['Amphibious. The dragon can breathe air and w...,[NA],19,207,"40 ft., fly 80 ft., swim 40 ft.",15.00,...,0,0,0,0,0,97.000000,0,1,1,0
1,Adult Silver Dragon,Huge,dragon,lawful good,['Legendary Resistance (3/Day). If the dragon ...,[NA],19,243,"40 ft., fly 80 ft.",16.00,...,0,0,0,0,1,103.000000,0,1,0,0
2,Adult White Dragon,Huge,dragon,chaotic evil,"[""Ice Walk. The dragon can move across and cli...",[NA],18,200,"40 ft., burrow 30 ft., fly 80 ft., swim 40 ft.",13.00,...,0,0,0,0,0,94.333333,0,1,0,0
3,Air Elemental,Large,elemental,neutral,"[""Air Form. The elemental can enter a hostile ...",[NA],15,90,"0 ft., fly 90 ft. (hover)",5.00,...,0,0,0,0,0,28.000000,3,1,8,0
4,Ape,Medium,beast,unaligned,[nan],[NA],12,19,"30 ft., climb 30 ft.",0.50,...,0,0,0,0,0,12.000000,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
343,Wyvern,Large,dragon,unaligned,[nan],[NA],13,110,"20 ft., fly 80 ft.",6.00,...,0,0,0,0,0,24.000000,0,0,0,0
344,Zombie,Medium,undead,neutral evil,['Undead Fortitude. If damage reduces the zomb...,[NA],8,22,20 ft.,0.25,...,0,0,0,0,1,4.000000,0,1,1,0
345,Commoner,Medium,humanoid,any alignment,[nan],[NA],10,4,30 ft.,0.00,...,0,0,0,0,1,2.000000,0,0,0,0
346,Giant Owl,Large,beast,neutral,"[""Flyby. The owl doesn't provoke opportunity a...",[NA],12,19,"5 ft., fly 60 ft.",0.25,...,0,0,0,0,0,8.000000,0,0,0,0


In [248]:
monster_df["Traits"].value_counts()

[nan]    56
['Legendary Resistance (3/Day). If the dragon fails a saving throw, it can choose to succeed instead.']    9
['Amphibious. The dragon can breathe air and wat

## Traits and Spellcasters
Let's simplify the traits so we can use them as features. We also need to review our spellcasters manually to make sure their damage isn't skewed, since they aren't physical fighters.

In [249]:

# Copy the selection so the new columns are written to an independent frame
# (this avoids SettingWithCopyWarning on the assignments below)
monster_traits = monster_df[['Monster Name', 'Traits']].copy()

# Binary flag: 1 if the monster has a Spellcasting trait, else 0

for indx, spell in enumerate(monster_traits["Traits"]):
    if "Spellcasting" in monster_traits.loc[indx,"Traits"]:
        monster_traits.loc[indx, "Spellcaster"] = 1
    else:
        monster_traits.loc[indx, "Spellcaster"] = 0


In [250]:
monster_traits[monster_traits["Traits"].str.contains("Angelic Weapons")]

Unnamed: 0,Monster Name,Traits,Spellcaster
19,Deva,"[""Angelic Weapons. The deva's weapon attacks a...",1.0
70,Solar,"[""Angelic Weapons. The solar's weapon attacks ...",1.0
134,Planetar,"[""Angelic Weapons. The planetar's weapon attac...",1.0


In [251]:
for indx, spell in enumerate(monster_traits["Traits"]):
    if "Magic Resistance" in monster_traits.loc[indx,"Traits"]:
        monster_traits.loc[indx, "Magic Resistance"] = 1
    else:
        monster_traits.loc[indx, "Magic Resistance"] = 0


In [252]:
for indx, spell in enumerate(monster_traits["Traits"]):
    if "Legendary Resistance" in monster_traits.loc[indx,"Traits"]:
        monster_traits.loc[indx, "Legendary Resistance"] = 1
    else:
        monster_traits.loc[indx, "Legendary Resistance"] = 0

In [253]:
for indx, spell in enumerate(monster_traits["Traits"]):
    if "Regeneration" in monster_traits.loc[indx,"Traits"]:
        monster_traits.loc[indx, "Regeneration"] = 1
    else:
        monster_traits.loc[indx, "Regeneration"] = 0

In [254]:
for indx, spell in enumerate(monster_traits["Traits"]):
    if "Undead Fortitude" in monster_traits.loc[indx,"Traits"]:
        monster_traits.loc[indx, "Undead Fortitude"] = 1
    else:
        monster_traits.loc[indx, "Undead Fortitude"] = 0

In [255]:
for indx, spell in enumerate(monster_traits["Traits"]):
    if "Pack Tactics" in monster_traits.loc[indx,"Traits"]:
        monster_traits.loc[indx, "Pack Tactics"] = 1
    else:
        monster_traits.loc[indx, "Pack Tactics"] = 0

In [256]:
for indx, spell in enumerate(monster_traits["Traits"]):
    if "Damage Transfer" in monster_traits.loc[indx,"Traits"]:
        monster_traits.loc[indx, "Damage Transfer"] = 1
    else:
        monster_traits.loc[indx, "Damage Transfer"] = 0

In [257]:
for indx, spell in enumerate(monster_traits["Traits"]):
    if "Angelic Weapons" in monster_traits.loc[indx,"Traits"]:
        monster_traits.loc[indx, "Angelic Weapons"] = 1
    else:
        monster_traits.loc[indx, "Angelic Weapons"] = 0

In [258]:
for indx, spell in enumerate(monster_traits["Traits"]):
    if "Charge" in monster_traits.loc[indx,"Traits"]:
        monster_traits.loc[indx, "Charge"] = 1
    else:
        monster_traits.loc[indx, "Charge"] = 0
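
The per-trait loops above all follow the same pattern, so they can be sketched as a single loop over a mapping from column name to search string. This is a sketch under assumptions: the toy frame and its trait texts are illustrative, and `trait_flags` is a hypothetical name. It also produces integer flags, whereas the row-by-row loops produce floats (1.0 and 0.0).

```python
import pandas as pd

# Toy stand-in for monster_traits; the trait texts are illustrative.
traits = pd.DataFrame({
    "Monster Name": ["Wolf", "Troll", "Ape"],
    "Traits": [
        "['Pack Tactics. Illustrative text.']",
        "['Regeneration. Illustrative text.']",
        "[nan]",
    ],
})

# Column name -> substring to look for in the Traits text (hypothetical mapping).
trait_flags = {
    "Spellcaster": "Spellcasting",
    "Magic Resistance": "Magic Resistance",
    "Legendary Resistance": "Legendary Resistance",
    "Regeneration": "Regeneration",
    "Undead Fortitude": "Undead Fortitude",
    "Pack Tactics": "Pack Tactics",
    "Damage Transfer": "Damage Transfer",
    "Angelic Weapons": "Angelic Weapons",
    "Charge": "Charge",
}

for column, needle in trait_flags.items():
    # regex=False performs a plain substring test, like the `in` checks above.
    traits[column] = traits["Traits"].str.contains(needle, regex=False).astype(int)

print(traits[["Monster Name", "Pack Tactics", "Regeneration"]])
```

Because this is substring matching, "Spellcasting" also matches trait names such as "Innate Spellcasting".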

In [259]:
# Reassign instead of dropping in place, which avoids SettingWithCopyWarning
monster_traits = monster_traits.drop("Traits", axis=1)
# Note: monster_traits still carries 'Monster Name', so monster_df gains a second 'Monster Name' column here
monster_df = pd.concat([monster_df,monster_traits],axis=1)


In [260]:
monster_df.columns

# Drop 'Saving Throws' in place; the earlier drop() call only returned a preview
monster_df.drop(columns=["Saving Throws"], inplace=True)

In [261]:
monster_df.to_csv('../data/processed/1_Monster_DataSet.csv')
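
The raw file loaded at the top of this notebook carries `Unnamed: 0` columns, which is what pandas produces when a frame is saved with its index and read back. A minimal sketch of the difference `index=False` makes, using an in-memory buffer and an illustrative toy frame rather than the real dataset:

```python
import io

import pandas as pd

# Toy frame; values are illustrative.
df = pd.DataFrame({"Monster Name": ["Ape", "Zombie"], "Hit Points": [19, 22]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_default = list(pd.read_csv(with_index).columns)

# index=False: only the real columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_clean = list(pd.read_csv(without_index).columns)

print(cols_default)  # ['Unnamed: 0', 'Monster Name', 'Hit Points']
print(cols_clean)    # ['Monster Name', 'Hit Points']
```

Whether to pass `index=False` here depends on whether later notebooks expect the extra column.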