# DnD Monsters: Dice and Data
As a Dungeon Master, it is very important to understand the strength of the monsters you pit against your players. Too weak and they are bored; too strong and they die... or worse, they don't have fun. The current method, known as Challenge Rating (CR), is a numerical system used to determine how difficult an enemy is for a party of 4 players. Challenge Ratings range from 0 to 30.

CR uses AC, HP, attack bonus, damage per round, and Save DC as a general guideline. Unfortunately, it doesn't fully take into account legendary actions, at-will spells, special abilities that cause status ailments, or any other boosting abilities. It is tuned mostly to offensive and defensive stats. That makes sense, because they are the easiest to translate into an equation, which WotC has done for us (I showed this in the previous notebook).

The goal of this investigation is to learn more about monsters' abilities in relation to the CR system: to understand whether there are correlations among any of the stats, abilities, environments, sizes, etc., and to see whether we can classify monsters based on any of these traits. Finally, the aim is to create a basic app that can predict a monster stat and trait block based on inputs such as party level, difficulty, monster type, environment, and alignment.

In order to stay within WotC terms of service, we will only work on monsters that are part of the SRD. 

## DnDWiki: html instead of DnDBeyond's javascript
Although DnDBeyond has a really well-structured website, it requires JavaScript parsing, which is beyond the scope of my original project. I was hoping to stick with requests.

Instead, I will try to work with DnDWiki, since it is all plain HTML.

In [18]:
# Import Libraries for scraping
import requests as rq
import pandas as pd
import numpy as np
from time import sleep
from collections import defaultdict

#Scraping libraries
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.common.exceptions import TimeoutException, WebDriverException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service

from random_user_agent.user_agent import UserAgent
from random_user_agent.params import SoftwareName, OperatingSystem


#Helper functions
from src.data.build_data import monster_stat_gathering
from src.data.selenium_scrape import Request

### Get Request for Monster Names
This will give us a list of monster names to use for stat block parsing

In [11]:
# Fetching HTML
url = "https://www.dandwiki.com/wiki/5e_SRD:Monsters"
# Named page_html rather than Request so the imported Request helper class is not overwritten
page_html = rq.get(url).text

soup = bs(page_html, 'html.parser')

### Collect Names of All Monsters in a List 
Unfortunately, DnDWiki is not well crafted, which meant I needed to get creative. There weren't distinguishing classes, names, or ids, which is usually what I would use to parse. However, the styles between tables were a bit different, so I used that to gather the information needed.

In [12]:
# Find the tables that hold the monster lists and extract their list items
# The monster names are list items housed only within the parent tables that have a width of 100%.
tables = soup.findAll('table', {'style':"width: 100%;"})
monster_names_dndwiki = []

for table in tables:
    li_table = table.findAll('li')
    for name in li_table:
        monster_names_dndwiki.append(name.text)

### Clean up data
Because DnDWiki doesn't have a great structure, even our creative method had some issues.
We need to remove duplicates and non-monsters from the list.

In [13]:
# Remove duplicate monsters
monster_names = list(set(monster_names_dndwiki))

# Filter through and replace spaces with dashes to format names for url integration
monster_list = []
for name in monster_names:
    if not(name.strip().isdigit()):
        new_name = name.replace(' ','-')
        monster_list.append(new_name)
    else:
        monster_list.append(name)
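
A minimal sketch of the cleanup described above, assuming the only non-monster entries to drop are purely numeric strings (the same check the cell uses). The helper name `clean_monster_names` is hypothetical. Unlike `set()`, `dict.fromkeys` keeps the first-seen order, which makes reruns easier to compare. Note that, unlike the cell above, this version strips whitespace and actually drops the numeric entries.

```python
def clean_monster_names(raw_names):
    """Drop duplicates (keeping first-seen order) and purely numeric entries."""
    stripped = (name.strip() for name in raw_names)
    unique = dict.fromkeys(stripped)  # dict keys preserve insertion order
    return [name for name in unique if name and not name.isdigit()]


print(clean_monster_names(["Mummy Lord", "12", "Zombie", "Mummy Lord "]))
```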



### Dictionary of URLs to parse
I will iterate through the monster names, knowing that dandwiki uses a uniform URL pattern for all monster pages: www.dandwiki.com/wiki/5e_SRD:'MonsterName'.

In [None]:
monster_url=[]
for name in monster_list:
    monster_url.append('https://www.dndbeyond.com/monsters/'+name)


### Website Structure is disgusting
There are still some things on here that are not monsters (they summon monsters). For example the Deck of Many Things. This will break any analysis or modeling I try to do, so I need to remove them. 

We can look at all things monsters have in common that these other objects do not. Unfortunately, the 'DoMT' and the 'figures of power' also contain niche "monster" stats for their monsters. 

We will include these in our table. Zombies and Dinosaurs, however, will not be included, since each is just a category of many monsters, all of which are already in the list.



## DNDBeyond: Testing selenium webdriver on DnDBeyond with a single Monster
DnDWiki is frankly just very unhelpful in terms of web structure. 

There are no defining classes, ids, names, or elements on any of the information, which makes parsing a nightmare. I will move to DnDBeyond using Selenium.

Since I've never used Selenium before, and it's quite a bit more involved, I will take a lot of my information from this documentation: https://selenium-python.readthedocs.io/

Additionally, I will develop the process incrementally, first by pulling information from a single monster.

First, I will grab all the information from the Mummy Lord in the 'mon-stat-block' class and the footer information which contains all our tags like source book, environment, and monster tags.

### Libraries for Parsing

First I need to gain access to our monster data sheet. As stated above, dndbeyond.com has a great repository of monster data, which will need to be scraped from their site. Unfortunately, each of the monster pages is hidden behind an accordion dropdown and will need to be extracted. This is something I have not yet done, so I am excited to try.

In [10]:
url = 'https://www.dndbeyond.com/monsters/mummy-lord'

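# selenium webdriver: chromedriver has to be updated occasionally to match the installed Chrome
# Selenium 4 takes the driver path through a Service object (executable_path is deprecated)
driver = webdriver.Chrome(service=Service('../env/chromedriver.exe'))

driver.get(url)

driver.implicitly_wait(5)

# Parse the JavaScript-rendered page source
soup = bs(driver.page_source, 'lxml')

# Close the browser once the page source has been captured
driver.quit()

# Search for the div with stats and the footer that contains tags such as environment
stat_block = soup.find('div',{'class':'mon-stat-block'})
Environment = soup.find('footer')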



### Column Names: Parsing for headings, labels, and tags
Unfortunately, I don't know any one monster that contains every single type of column we are looking for. The Mummy Lord is a strong enemy that includes a lot of traits and features.
I added any column names to the start of the list if they weren't included in the Mummy Lord's stat blocks.

Then we create for loops looking for classes that end with 'label' or 'heading'. This should give us all the column names we need. 

I will start with just environment-tags (later I will decide to expand this to all tags).

In [11]:
# Column names that aren't in the Mummy Lord's stat block, but exist in DnDBeyond
column_names = ['Monster Name','Size','Type', 'Alignment','Traits', 'Damage Resistances', 'Monster Tags:', 'Mythic Actions', 'Reactions','Source']

# Search for new headers and append to the column_name list
for headers in stat_block.findAll('span',{'class': lambda e: e.endswith('label') if e else False}):    
    column_names.append(headers.text)
    
for headers in stat_block.findAll('div',{'class': lambda e: e.endswith('heading') if e else False}):    
    column_names.append(headers.text)

# Formatting for the environment-tags is a bit different and requires stripping contents
for headers in Environment.findAll('p',{'class': lambda e: e.startswith('environment-tags') if e else False}):    
    column_names.append(headers.contents[0].strip())



### Create Dictionary to Store Monster Data
While we only have one monster now, this will become more important later on

In [12]:
# Create a dictionary with column names as keys, each starting with its own empty list
monster_dict = {column: [] for column in column_names}

monster_dict

{'Monster Name': [],
 'Size': [],
 'Type': [],
 'Alignment': [],
 'Traits': [],
 'Damage Resistances': [],
 'Monster Tags:': [],
 'Mythic Actions': [],
 'Reactions': [],
 'Source': [],
 'Armor Class': [],
 'Hit Points': [],
 'Speed': [],
 'Saving Throws': [],
 'Skills': [],
 'Damage Vulnerabilities': [],
 'Damage Immunities': [],
 'Condition Immunities': [],
 'Senses': [],
 'Languages': [],
 'Challenge': [],
 'Proficiency Bonus': [],
 'STR': [],
 'DEX': [],
 'CON': [],
 'INT': [],
 'WIS': [],
 'CHA': [],
 'Actions': [],
 'Legendary Actions': [],
 'Environment:': []}

### Add Values of Mummy Data into our Dictionary
Here is our big show stopper. This will be turned into a function to be used in the main scrape

In [None]:
# Monster Name
monster_name = stat_block.find('div', {'class':'mon-stat-block__name'}).text
monster_dict['Monster Name'].append(' '.join(str(monster_name).split())) 

# The subtitle contains: (Size, Alignment, and Type), will split the single meta text using split() and replace() functions
monster_subtitle_info = stat_block.find('div', {'class':'mon-stat-block__meta'})
monster_subtitle_info=monster_subtitle_info.text

# Size (first word)
monster_size = monster_subtitle_info.split()[0]
monster_dict['Size'].append(monster_size) 

# Alignment (after comma)
monster_alignment = monster_subtitle_info.split(', ')[-1]
monster_dict['Alignment'].append(monster_alignment)

# Type (remaining words).
# Remove the size and alignment found above, as well as the loose comma, from the text.
# The result is kept as a list, since some monsters have sub-types (e.g. Titan).
sub_list = (monster_size, monster_alignment, ', ')
monster_type = monster_subtitle_info
for substring in sub_list:
    monster_type = monster_type.replace(substring,'')
monster_type = monster_type.split()
monster_dict['Type'].append(monster_type) 

# Find all attribute metrics
attribute_data = stat_block.findAll('span',{'class':'mon-stat-block__attribute-data-value'})

# Armor Class
monster_ac = ' '.join(str(attribute_data[0].text).split())
monster_dict['Armor Class'].append(monster_ac)

# Hit Points
monster_hp = ' '.join(str(attribute_data[1].text).split())
monster_dict['Hit Points'].append(monster_hp)

# Speed
monster_speed = ' '.join(str(attribute_data[2].text).split())
monster_dict['Speed'].append(monster_speed)

# Find all tidbit metrics
tidbit_label = stat_block.findAll('span', {'class':'mon-stat-block__tidbit-label'})

for label in tidbit_label:    
    '''
    The tidbit rows shift from monster to monster, so we can't index them by position. Instead, we loop through the monster's tidbit labels (e.g. Skills, Saving Throws) and, if a label exists, take its sibling span (the actual data for that label) and deposit it into the dictionary.
    Any columns not in the monster data will be left blank for now. Each branch below handles the tidbit named in its condition.
    '''
    if label.text == "Saving Throws":
        monster_saving_throw = ' '.join(str(label.find_next_sibling('span').text).split())
        monster_dict['Saving Throws'].append(monster_saving_throw)
    elif label.text == "Skills":
        monster_skills = ' '.join(str(label.find_next_sibling('span').text).split())
        monster_dict['Skills'].append(monster_skills)
    elif label.text == "Damage Vulnerabilities":    
        monster_damage_vulnerability = ' '.join(str(label.find_next_sibling('span').text).split())
        monster_dict['Damage Vulnerabilities'].append(monster_damage_vulnerability)
    elif label.text == "Damage Immunities":
        monster_damage_immunity = ' '.join(str(label.find_next_sibling('span').text).split())
        monster_dict['Damage Immunities'].append(monster_damage_immunity)
    elif label.text == 'Condition Immunities':
        monster_condition_immunity = ' '.join(str(label.find_next_sibling('span').text).split())
        monster_dict['Condition Immunities'].append(monster_condition_immunity)
    elif label.text == 'Senses':
        monster_senses = ' '.join(str(label.find_next_sibling('span').text).split())
        monster_dict['Senses'].append(monster_senses)
    elif label.text == 'Languages':
        monster_languages = ' '.join(str(label.find_next_sibling('span').text).split())
        monster_dict['Languages'].append(monster_languages)
    elif label.text == 'Challenge':
        monster_challenge= ' '.join(str(label.find_next_sibling('span').text).split())
        monster_dict['Challenge'].append(monster_challenge)
    elif label.text == 'Proficiency Bonus':
        monster_proficiency = ' '.join(str(label.find_next_sibling('span').text).split())
        monster_dict['Proficiency Bonus'].append(monster_proficiency)
    elif label.text == 'Damage Resistances':
        monster_damage_resistence = ' '.join(str(label.find_next_sibling('span').text).split())
        monster_dict['Damage Resistances'].append(monster_damage_resistence)


# Find all ability score metrics
ability_scores = stat_block.findAll('span',{'class':'ability-block__score'})
    # STR Score
monster_str = ability_scores[0].text
monster_dict['STR'].append(monster_str)
    # DEX Score
monster_dex = ability_scores[1].text
monster_dict['DEX'].append(monster_dex)
    # CON Score
monster_con = ability_scores[2].text
monster_dict['CON'].append(monster_con)
    # INT Score
monster_int = ability_scores[3].text
monster_dict['INT'].append(monster_int)
    # WIS Score
monster_wis = ability_scores[4].text
monster_dict['WIS'].append(monster_wis)
    # CHA Score
monster_cha = ability_scores[5].text
monster_dict['CHA'].append(monster_cha)    
    
# Traits: the trait blocks carry no identifying HTML and no heading such as Actions or Legendary Actions, so I search every description block and keep the ones without a 'heading' div. This collects only the traits, in a list for later wrangling and analysis.

trait_list = []
description_block = stat_block.findAll('div', {'class':'mon-stat-block__description-block'})
for block in description_block:
    if not block.findAll('div',{'class':'mon-stat-block__description-block-heading'}):
        for p in block.findAll('p'):
            trait_list.append(p.text)
# Store the traits; without this the 'Traits' column stays empty for this monster
monster_dict['Traits'].append(trait_list)

# Remaining descriptions that had headings
description_heading = stat_block.findAll('div', {'class':'mon-stat-block__description-block-heading'})
for heading in description_heading:    
    '''
    The description sections shift from monster to monster, so we can't index them by position. Instead, we loop through the monster's description headings (e.g. Actions, Legendary Actions) and, if a heading exists, take its sibling div (the actual data for that heading) and deposit it into the dictionary.
    Any columns not in the monster data will be left blank for now. Each branch below handles the heading named in its condition.
    '''
    # Start a fresh list for each heading so sections don't share entries
    action_list = []
    if heading.text == "Actions":
        monster_actions = heading.find_next_sibling('div')
        for p in monster_actions.findAll('p'):
           action_list.append(p.text.strip())
        monster_dict['Actions'].append(action_list)
    elif heading.text == "Legendary Actions":
        monster_legendary_actions = heading.find_next_sibling('div')
        for p in monster_legendary_actions.findAll('p'):
           action_list.append(p.text.strip())
        monster_dict['Legendary Actions'].append(action_list)
    elif heading.text == "Mythic Actions":
        monster_mythic_actions = heading.find_next_sibling('div')
        for p in monster_mythic_actions.findAll('p'):
           action_list.append(p.text.strip())
        monster_dict['Mythic Actions'].append(action_list)
    elif heading.text == "Reactions":
        monster_reactions = heading.find_next_sibling('div')
        for p in monster_reactions.findAll('p'):
           action_list.append(p.text.strip())
        monster_dict['Reactions'].append(action_list)
         
# These final tags refer to the environment the monster lives in (can be multiple), the sub-type it is classified as, or the source book it came from. All of these or none of these may be present on the monster sheet.
for tag in Environment.find_all("p"):
    tag_label = tag.contents[0].strip()
    # Assumption: each tag's values are <span> elements nested inside its own <p>
    tag_values = [span.text.strip() for span in tag.find_all('span')]

    if tag_label == "Environment:":
        monster_dict['Environment:'].append(tag_values)
    elif tag_label == "Monster Tags:":
        monster_dict['Monster Tags:'].append(tag_values)
    else:
        monster_dict['Source'].append(tag_label)
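
As a side note on the subtitle parsing in the cell above: a minimal sketch of an alternative split, assuming the meta line always has the shape "Size type (subtype), alignment" (for example "Medium undead, lawful evil"). The helper name `split_meta` is hypothetical. It cuts the string at fixed positions instead of calling `replace()`, so a size or alignment word that happens to reappear inside the type text is left alone.

```python
def split_meta(meta_text):
    """Split 'Size type (subtype), alignment' into (size, type_words, alignment)."""
    text = " ".join(meta_text.split())            # collapse stray whitespace
    before, _, alignment = text.rpartition(", ")  # alignment follows the last ', '
    size, _, type_text = before.partition(" ")    # size is the first word
    return size, type_text.split(), alignment


print(split_meta("Medium undead, lawful evil"))
```

The returned type is a list, matching how the cell stores `monster_type`.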

In [None]:
# Create a series for values that will add NaN for any missing values
monster_dict = dict([ (k,pd.Series(v)) for k,v in monster_dict.items()])

monster_df = pd.DataFrame(monster_dict)
monster_df.to_csv('../data/raw/MummyTest.csv')

## Scraping DnDBeyond for all SRD monsters
1. We will change our source of monster names, since DnDBeyond now works for us. We first need to iterate through each of the monster listing pages.
2. Then we will read each monster on each page and place it into our monster list.
3. Next we will replace any spaces in the monster names with '-'; this is necessary for the URLs.
4. We will append each name to the base monster URL and add it to the monster_url list, which we will then iterate over with the test code from above.

### Parsing Request Class and Selenium Function
We want our final request clean and clear, so we will create a reusable request class with a get_selenium function. This function will randomize our user profile to help protect against throttling/halting the scrape. We will also run it headless so as not to tax our computer. The function looks for a certain class and waits a certain amount of time. If it sees the class, the function will return the page_source information; otherwise it will close the browser.

This class and function were adapted from a Medium post by Oren Spiegal on scraping Facebook and TikTok.
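
The helper itself lives in src and is not reproduced in this notebook. As a rough illustration of the behaviour described above, here is a minimal sketch, not the actual helper. The class name `RequestSketch`, the `timeout` parameter, and the hard-coded `USER_AGENTS` pool are all assumptions (the notebook imports `random_user_agent` for the user-agent part), and it assumes Selenium 4 can locate a Chrome driver on its own.

```python
import random

# Hypothetical pool of user-agent strings, used only for this sketch
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
]


class RequestSketch:
    def __init__(self, url, timeout=15):
        self.url = url
        self.timeout = timeout  # seconds to wait for the target class

    def get_selenium(self, class_name):
        """Return page_source once an element with class_name appears, else None."""
        # Imported here so the class can be defined without Selenium installed
        from selenium import webdriver
        from selenium.common.exceptions import TimeoutException, WebDriverException
        from selenium.webdriver.chrome.options import Options
        from selenium.webdriver.common.by import By
        from selenium.webdriver.support import expected_conditions as EC
        from selenium.webdriver.support.ui import WebDriverWait

        options = Options()
        options.add_argument("--headless")
        options.add_argument(f"user-agent={random.choice(USER_AGENTS)}")
        driver = webdriver.Chrome(options=options)
        try:
            driver.get(self.url)
            WebDriverWait(driver, self.timeout).until(
                EC.presence_of_element_located((By.CLASS_NAME, class_name))
            )
            return driver.page_source
        except (TimeoutException, WebDriverException):
            return None  # class never appeared, or the page failed to load
        finally:
            driver.quit()  # always close the browser
```

Returning `None` on failure is what lets the scraping loops later in the notebook skip blocked pages with a simple `if page_html is not None` check.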

In [None]:
# Testing function and class on adult green dragon

page_html = Request('https://www.dndbeyond.com/monsters/adult-green-dragon').get_selenium("mon-stat-block__name")
bs(page_html,'lxml')

### DnD Monster Page Iteration
The listing pages share the same URL pattern, 'https://www.dndbeyond.com/monsters?page=', so we just need to iterate from 1 to 106 (the last page). The cell below only requests the first two pages as a short test.

In [None]:
url = 'https://www.dndbeyond.com/monsters?page='

monster__name= []

# Iterate through monster pages and grab each monster's 'name' span to place in the monster__name list
for i in range(1,3):
    
    page_html = Request(url+str(i)).get_selenium('name')
    soup = bs(page_html, 'lxml')
    page_find_names = soup.find_all('span',{'class':'name'})

    for span in page_find_names:
        monster__name.append(span.text.strip())
        
    sleep(60)

In [None]:
monster_nospaces = []

# Filter through and replace spaces with dashes to format for urls

for name in monster__name:
    if not(name.strip().isdigit()):
        new_name = name.replace(' ','-')
        monster_nospaces.append(new_name)
    else:
        monster_nospaces.append(name)
        
monster_name_url_ending = []

# Some monsters contain '()'. Filter and replace '()' with nothing
for name in monster_nospaces:
    if not(name.strip().isdigit()):
        new_name = name.replace('(','')
        final_name = new_name.replace(')','')
        monster_name_url_ending.append(final_name)
    else:
        monster_name_url_ending.append(name)
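
The same two cleanup passes are repeated several times in this notebook, so they could be folded into one small helper. This is a sketch only; the function name `name_to_url_ending` is hypothetical and the sample name is purely illustrative. It makes the same replacements as the loops above (spaces to dashes, parentheses removed), and the digit check becomes unnecessary because those replacements leave a purely numeric string unchanged.

```python
def name_to_url_ending(name):
    """Turn a display name into the ending used in the monster URL."""
    return name.strip().replace(" ", "-").replace("(", "").replace(")", "")


print(name_to_url_ending("Giant Rat (Diseased)"))
```

A whole list can then be converted with `[name_to_url_ending(n) for n in monster__name]`.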


In [2]:
# Save our names and urls in case IDE crashes (add -r to recover)
%store -r monster_dict
%store -r monster__name
%store -r monster_name_url_ending

## Use Monster Names & Custom Function to Scrape SRD Monster Data
Using our test code from the Mummy Lord, we will iterate over all the monsters in monster_name_url_ending, parse each monster page for its data, and slam it into the dictionary!

We saved our previous variables using %store magic, so we don't need to rerun the monster names or column names each time.

The function is stored as a helper function in the src folder
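
The helper's source is not shown here. As a rough illustration of the kind of condensing that turning the Mummy Lord cell into a function allows, the long chain of label checks could become a single dictionary lookup. This is a sketch under assumptions: `store_tidbits` is a hypothetical name, and the (label, value) pairs stand in for `label.text` and `label.find_next_sibling('span').text`. Unlike the original chain, it stores any label that matches an existing column, not only the ten listed there.

```python
def store_tidbits(label_value_pairs, monster_dict):
    """Append each value to the column named by its label, if that column exists."""
    for label, value in label_value_pairs:
        key = " ".join(label.split())  # normalize whitespace in the label
        if key in monster_dict:
            monster_dict[key].append(" ".join(value.split()))


demo = {"Skills": [], "Senses": []}
store_tidbits([("Skills", " History  +5 "), ("Gear", "none")], demo)
print(demo)
```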

### Iterate over monster pages
Don't grab any info that we don't have access to

In [None]:
url = 'https://www.dndbeyond.com/monsters/'
j=0

# Iterate through monster names and add to url variable
for i in monster_name_url_ending[2:107]:
  page_html = None
# Request the html using selenium function
  page_html = Request(url+i).get_selenium('mon-stat-block__name')
  j+=1

#If the request is successful (doesn't hit a blocked monster page), parse and run the function.      
  if page_html is not None:
      soup = bs(page_html, 'lxml')
      monster_stat_gathering(soup,monster_dict)
  sleep(60)
  print(j)
  print(monster_dict)
  
  # Store dictionary in case crash
  %store monster_dict

In [4]:
# Ensure list lengths are the same in order to print to DF
list_length = []

for col in monster_dict:
    list_length.append(len(monster_dict[col]))
print(list_length)

monster_df = pd.DataFrame(monster_dict)
monster_df.to_csv('../data/raw/Partial_Monster_Data8.csv')


[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]


Unnamed: 0,Monster Name,Size,Type,Alignment,Traits,Damage Resistances,Monster Tags:,Mythic Actions,Reactions,Source,...,Proficiency Bonus,STR,DEX,CON,INT,WIS,CHA,Actions,Legendary Actions,Environment:
0,Adult Gold Dragon,Huge,[Dragon],Lawful Good,[Amphibious. The dragon can breathe air and wa...,,,,,Basic Rules,...,6,27,14,25,16,15,24,[Multiattack. The dragon can use its Frightful...,"[The dragon can take 3 legendary actions, choo...","[Forest, Grassland]"
1,Adult Green Dragon,Huge,[Dragon],Lawful Evil,[Amphibious. The dragon can breathe air and wa...,,,,,Basic Rules,...,5,23,12,21,18,15,17,[Multiattack. The dragon can use its Frightful...,"[The dragon can take 3 legendary actions, choo...",[Forest]
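
The length check matters because a monster that lacks a field (say, Reactions) appends nothing to that column, and the lists drift out of step. One way to keep them aligned, sketched here with a hypothetical `pad_columns` helper, is to pad the shorter columns after each monster is parsed. The padding only lands on the right row if the helper is called once per monster, immediately after that monster's values are appended.

```python
def pad_columns(monster_dict, fill=None):
    """Pad shorter columns so every list is as long as the longest one."""
    target = max((len(values) for values in monster_dict.values()), default=0)
    for values in monster_dict.values():
        values.extend([fill] * (target - len(values)))
    return monster_dict


demo = {"Monster Name": ["Mummy Lord", "Zombie"], "Skills": ["History +5"], "Reactions": []}
pad_columns(demo)
print({column: len(values) for column, values in demo.items()})
```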


### Round 2: Reparse
Apparently, we only got about 1/2 the monsters possible from that run. It's possible that our time to wait is too short, or we have some internet blips. 

I will consolidate all the tests we have done and create a list of names from that list. Then we can remove the names we already have from our master list so we don't rerun them on the next parse. 

In [None]:
%store -r monster_dict

In [5]:
# Create a list of monster names that we have already parsed for comparison.
monster_name_parsed = list(monster_dict["Monster Name"])
monster_name_parsed

['Adult Gold Dragon', 'Adult Green Dragon']

In [15]:
# Using our original list from DnDWiki, let's see what monsters we are missing from the basic rules

# DnDWiki Difference
monster_names_not_in_dndwiki = list(name for name in monster_name_parsed  if name not in monster_names_dndwiki)

print(len(monster_names_not_in_dndwiki), "monsters not found on DnDWiki from DnDBeyond")


# DnDBeyond difference
monster_names_not_parsed_from_DnDBeyond = list(name for name in monster_names_dndwiki if name not in monster_name_parsed)

print(len(monster_names_not_parsed_from_DnDBeyond), "on the DnDWiki list we haven't parsed in DnDBeyond")

0 not in DnDWiki
229 on the DnDWiki list we haven't parsed


### Interesting Results
The dndwiki list has 173 monsters that were not parsed, out of 231 monsters total. We have parsed 101 monsters, which means the DnDBeyond site has available monsters that are not on the dndwiki page: 43 monsters, according to the difference between the lists.

This means we shouldn't use the dndwiki list, since it will clearly miss cool monsters.

### Can we make a DnDBeyond friendly list?
I don't want to parse 1300 or even 1200 websites again. Is there a way to parse only the basic rules info?

Or, what if I iterated through the dropdown on the monster page with Selenium clicks? That is how I got the monster names, and I would still only parse what I have access to.

In [16]:
# Turn the monster name parsed list into a "url-ending" list

monster_nospace=[]

# Filter through and replace spaces with dashes to format for urls

for name in monster_name_parsed:
    if not(name.strip().isdigit()):
        new_name = name.replace(' ','-')
        monster_nospace.append(new_name)
    else:
        monster_nospace.append(name)

monster_name_url_ending_parsed = []
# Filter and replace '()' with nothing
for name in monster_nospace:
    if not(name.strip().isdigit()):
        new_name = name.replace('(','')
        final_name = new_name.replace(')','')
        monster_name_url_ending_parsed.append(final_name)
    else:
        monster_name_url_ending_parsed.append(name)



# Remove the already parsed monsters from the full url-ending list
monster_name_url_ending_second_round = list(name for name in monster_name_url_ending if name not in monster_name_url_ending_parsed)
len(monster_name_url_ending_second_round) 

1378
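
With well over a thousand names, the `name not in list` test rescans the parsed list for every name. A small sketch of the same filter using a set for the lookups, with a hypothetical helper name `remaining_names`; the order of the full list is preserved.

```python
def remaining_names(all_names, parsed_names):
    """Return names from all_names that are not in parsed_names, in original order."""
    parsed = set(parsed_names)  # set membership checks are constant time on average
    return [name for name in all_names if name not in parsed]


print(remaining_names(["Aboleth", "Adult-Gold-Dragon", "Zombie"], ["Adult-Gold-Dragon"]))
```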

### Rerun
We have removed the monsters we have already parsed (101) and will rerun with 1279 monsters. I will increase the wait time to 20 seconds; hopefully this will improve capture.

### Rerun 2
We have removed monsters after the second run (additional 58) and will rerun with 1221 monsters. I will increase the wait time to 30 seconds.

### Rerun 3
We have removed monsters after the third run (additional 28) and will rerun with 1193 monsters. I brought the wait time down to 15 seconds again to see if there is any impact. 

### Rerun 4
We have removed monsters after the fourth run (additional 17) and will rerun with 1176 monsters.

### Rerun 5
We have removed monsters after the 5th run (additional 10) and will rerun. Will spend time thinking about how to remove the known unavailable monsters before proceeding to reduce run time. 
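
One possible way to stop retrying pages that never load is to count failures per name and skip a name once it has failed a set number of times. This is only a sketch: `scrape_round`, `fetch`, and the `max_failures` threshold of 3 are hypothetical, and it assumes the fetch function returns `None` on failure, as the `page_html is not None` checks in this notebook imply.

```python
from collections import Counter


def scrape_round(names, fetch, parsed, failures, max_failures=3):
    """Try each pending name once; skip names already parsed or failed too often."""
    for name in names:
        if name in parsed or failures[name] >= max_failures:
            continue  # already captured, or treated as unavailable
        page_html = fetch(name)
        if page_html is None:
            failures[name] += 1
        else:
            parsed[name] = page_html
    return parsed, failures


def fake_fetch(name):
    """Stand-in for the real request: only 'Zombie' ever loads."""
    return "<html>" if name == "Zombie" else None


parsed, failures = {}, Counter()
for _ in range(4):
    scrape_round(["Zombie", "Locked-Monster"], fake_fetch, parsed, failures)
print(list(parsed), dict(failures))
```

In the demo the fourth round makes no requests at all: one name is already parsed and the other has reached the failure limit.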

In [None]:
url = 'https://www.dndbeyond.com/monsters/'
j=0

# Iterate through monster names as urls and perform scrape and stat function
for i in monster_name_url_ending_second_round[0:1381]:
  page_html = None
  page_html = Request(url+i).get_selenium('mon-stat-block__name')
  j+=1
       
  if page_html is not None:
      soup = bs(page_html, 'lxml')
      monster_stat_gathering(soup, monster_dict)
  sleep(60)
  print(j)
%store monster_dict

### Recreate List
Before rerunning, we will recreate the list of names from the monster site, while filtering for monsters only in the free source lists. This should limit the number of monsters we need to parse.

In [None]:
# Find the unique source books we can parse through

source_list=np.array(monster_df['Source'])
print(np.unique(source_list))

#Iterating through all the monster pages in dndbeyond
ser = Service('../env/chromedriver.exe')

url = 'https://www.dndbeyond.com/monsters?page='

monster_name_source_material= []

for i in range(1,107):
    browser = webdriver.Chrome(service=ser)
    browser.get(url+str(i))
    page_html = browser.page_source

    soup = bs(page_html, 'lxml')
    monster_names = soup.find_all('div',{'class':"row monster-name"})

#At each page, we will iterate through all the monsters, if the monsters contain one of the 3 sources from our list, we will grab that monster name

    for monster in monster_names:
        page_name = monster.find('span',{'class':'name'})
        source_name = monster.find('span',{'class':'source'})
        if source_name.text.strip() in source_list:
            monster_name_source_material.append(page_name.text.strip())
    browser.quit()        
    sleep(6)
%store monster_name_source_material

In [None]:
%store -r monster_name_source_material
len(monster_name_source_material)

### Running Source Only
After creating a list of public-source-only material, I have come up with 549 monsters. We will remove any monsters we have already parsed, which leaves 335 monsters: a MUCH shorter list to parse.

In [None]:
monster_name_parsed = list(monster_dict["Monster Name"])

monster_name_url_ending_source_material = list(name for name in monster_name_source_material if name not in monster_name_parsed)

len(monster_name_url_ending_source_material)

In [None]:
monster_nospace = []

# Filter through and replace spaces with dashes to format for urls

for name in monster_name_url_ending_source_material:
    if not(name.strip().isdigit()):
        new_name = name.replace(' ','-')
        monster_nospace.append(new_name)
    else:
        monster_nospace.append(name)
        
monster_name_url_ending = []
#filter and replace '()' with nothing
for name in monster_nospace:
    if not(name.strip().isdigit()):
        new_name = name.replace('(','')
        final_name = new_name.replace(')','')
        monster_name_url_ending.append(final_name)
    else:
        monster_name_url_ending.append(name)

In [None]:
url = 'https://www.dndbeyond.com/monsters/'
j=0

# Iterate through monster names and add to url 
for i in monster_name_url_ending[0:215]:
  page_html = None
# Request the html using selenium function
  page_html = Request(url+i).get_selenium('mon-stat-block__name')
  j+=1
      
  if page_html is not None:
      soup = bs(page_html, 'lxml')
      monster_stat_gathering(soup, monster_dict)
  sleep(60)
  print(j)
%store monster_dict

In [None]:
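# Count and save monster dictionary
print(len(monster_dict['Monster Name']))
# Rebuild the DataFrame so the saved file reflects the latest monster_dict
monster_df = pd.DataFrame(monster_dict)
monster_df.to_csv('../data/raw/Monster_Data_RAW.csv')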