# Goal:

Create a scraper that navigates the MTG website and scrapes the rules for card abilities, for offline reference when a card has a less common ability and only reminder text.

Expansion idea: scrape the rules table for each card in MTG and build a dataset of card, cost, type/subtype, and rules.

This may be placed within an application that allows for searching of rules to provide quick info with potential examples/example interactions.

----------

## Steps:

- acquire list of all cards
- acquire list of all card abilities
- consider which site(s) to scrape
- build an iterative way to crawl the various wikis for information

---------

Spells can be looked up by modifying the following URL, with '%20' standing in for each space:
https://gatherer.wizards.com/pages/card/details.aspx?name=Lightning%20Bolt

Sets might be gleaned from this link:
https://gatherer.wizards.com/Pages/Default.aspx
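
The space substitution described above can be left to the standard library instead of being done by hand. A minimal sketch, assuming the `name` parameter accepts any percent-encoded card name; `card_details_url` is a hypothetical helper, not something the notebook defines:

```python
from urllib.parse import quote

DETAILS_URL = 'https://gatherer.wizards.com/pages/card/details.aspx?name='

def card_details_url(card_name):
    # quote() percent-encodes the name, so each space becomes %20
    return DETAILS_URL + quote(card_name)

print(card_details_url('Lightning Bolt'))
```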

---------

In [1]:
from splinter import Browser
from bs4 import BeautifulSoup
from webdriver_manager.chrome import ChromeDriverManager

In [2]:
# set up splinter
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)



Current google-chrome version is 102.0.5005
Get LATEST chromedriver version for 102.0.5005 google-chrome
Trying to download new driver from https://chromedriver.storage.googleapis.com/102.0.5005.61/chromedriver_win32.zip
Driver has been saved in cache [C:\Users\Alec\.wdm\drivers\chromedriver\win32\102.0.5005.61]


In [3]:
url = 'https://gatherer.wizards.com/Pages/Default.aspx'
browser.visit(url)

In [4]:
# create the soup object
html = browser.html
mtg_soup = BeautifulSoup(html, 'html.parser')

# Goal - Gain a list of all sets to use in simple search on MTG gatherer.wizards 
# site to then use to build card name dataset for further scraping


# isolate the div containing the card set options
search_section_div = mtg_soup.find_all('div', class_ = 'style-select')

In [5]:
# separate the card set select box from the other select boxes
card_set = search_section_div[1]

# turn all <options> into a list
card_set_options = card_set.find_all('option')

In [6]:
# test text grab
card_set_options[1].get_text()

'Adventures in the Forgotten Realms'

In [7]:
# create a list of all sets by extracting the text from the options,
# skipping any option with empty text
set_list = []
for option in card_set_options:
    text = option.get_text()
    if len(text) > 0:
        set_list.append(text)

In [11]:
set_list[:10]

['Adventures in the Forgotten Realms',
 'Adventures in the Forgotten Realms Commander',
 'Aether Revolt',
 'Alara Reborn',
 'Alliances',
 'Amonkhet',
 'Amonkhet Remastered',
 'Antiquities',
 'Apocalypse',
 'Arabian Nights']

In [5]:
set_list = ['Adventures in the Forgotten Realms','Adventures in the Forgotten Realms Commander','Aether Revolt','Alara Reborn','Alliances','Amonkhet','Amonkhet Remastered','Antiquities','Apocalypse','Arabian Nights','Archenemy','Archenemy: Nicol Bolas','Arena Base Set','Avacyn Restored','Battle for Zendikar','Battle Royale Box Set','Battlebond','Beatdown Box Set','Betrayers of Kamigawa','Born of the Gods','Champions of Kamigawa','Chronicles','Classic Sixth Edition','Coldsnap','Commander 2013 Edition','Commander 2014','Commander 2015','Commander 2016','Commander 2017','Commander 2018','Commander 2019','Commander 2021','Commander Anthology','Commander Anthology 2018','Commander Collection: Green','Commander Legends',"Commander's Arsenal",'Conflux','Conspiracy: Take the Crown','Core Set 2019','Core Set 2020','Core Set 2021','Dark Ascension','Darksteel','Dissension','Dominaria','Double Masters',"Dragon's Maze",'Dragons of Tarkir','Duel Decks Anthology, Divine vs. Demonic','Duel Decks Anthology, Elves vs. Goblins','Duel Decks Anthology, Garruk vs. Liliana','Duel Decks Anthology, Jace vs. Chandra','Duel Decks: Ajani vs. Nicol Bolas','Duel Decks: Blessed vs. Cursed','Duel Decks: Divine vs. Demonic','Duel Decks: Elspeth vs. Kiora','Duel Decks: Elspeth vs. Tezzeret','Duel Decks: Elves vs. Goblins','Duel Decks: Elves vs. Inventors','Duel Decks: Garruk vs. Liliana','Duel Decks: Heroes vs. Monsters','Duel Decks: Izzet vs. Golgari','Duel Decks: Jace vs. Chandra','Duel Decks: Jace vs. Vraska','Duel Decks: Knights vs. Dragons','Duel Decks: Merfolk vs. Goblins','Duel Decks: Mind vs. Might','Duel Decks: Nissa vs. Ob Nixilis','Duel Decks: Phyrexia vs. the Coalition','Duel Decks: Sorin vs. Tibalt','Duel Decks: Speed vs. Cunning','Duel Decks: Venser vs. Koth',
    'Duel Decks: Zendikar vs. Eldrazi','Eighth Edition','Eldritch Moon','Eternal Masters','Eventide','Exodus','Explorers of Ixalan','Fallen Empires','Fate Reforged','Fifth Dawn','Fifth Edition','Fourth Edition','From the Vault: Angels','From the Vault: Annihilation (2014)','From the Vault: Dragons','From the Vault: Exiled','From the Vault: Legends','From the Vault: Lore','From the Vault: Realms','From the Vault: Relics','From the Vault: Transform','From the Vault: Twenty','Future Sight','Game Night','Game Night 2019','Gatecrash','Gift Pack','Global Series: Jiang Yanggu and Mu Yanling','Guild Kit: Azorius','Guild Kit: Boros','Guild Kit: Dimir','Guild Kit: Golgari','Guild Kit: Gruul','Guild Kit: Izzet','Guild Kit: Orzhov','Guild Kit: Rakdos','Guild Kit: Selesnya','Guild Kit: Simic','Guildpact','Guilds of Ravnica','Guilds of Ravnica Mythic Edition','Historic Anthology 1','Historic Anthology 2','Historic Anthology 3','Historic Anthology 4','Historic Anthology 5','Homelands','Hour of Devastation','Ice Age','Iconic Masters','Ikoria Commander','Ikoria: Lair of Behemoths','Innistrad','Innistrad: Crimson Vow','Innistrad: Crimson Vow Commander','Innistrad: Midnight Hunt','Innistrad: Midnight Hunt Alchemy','Innistrad: Midnight Hunt Commander','Invasion','Ixalan','Journey into Nyx','Judgment','Jumpstart','Jumpstart: Historic Horizons','Kaladesh','Kaladesh Remastered','Kaldheim','Kaldheim Commander','Kamigawa: Neon Dynasty','Kamigawa: Neon Dynasty Alchemy','Kamigawa: Neon Dynasty Commander','Khans of Tarkir','Legends','Legions','Limited Edition Alpha','Limited Edition Beta','Lorwyn','Magic 2010','Magic 2011','Magic 2012','Magic 2013','Magic 2014 Core Set','Magic 2015 Core Set','Magic Origins','Magic: The Gathering-Commander','Magic: The Gathering—Conspiracy','Masterpiece Series: Amonkhet Invocations','Masterpiece Series: Kaladesh Inventions','Masters 25','Masters Edition','Masters Edition II','Masters Edition III','Masters Edition IV','Mercadian Masques','Mirage','Mirrodin','Mirrodin Besieged',
    'Modern Event Deck 2014','Modern Horizons','Modern Horizons 2','Modern Masters','Modern Masters 2015 Edition','Modern Masters 2017 Edition','Morningtide','Mystery Booster','Nemesis','New Phyrexia','Ninth Edition','Oath of the Gatewatch','Odyssey','Onslaught','Planar Chaos','Planechase','Planechase 2012 Edition','Planechase Anthology','Planeshift','Portal','Portal Second Age','Portal Three Kingdoms','Premium Deck Series: Fire and Lightning','Premium Deck Series: Graveborn','Premium Deck Series: Slivers','Promo set for Gatherer','Prophecy','Ravnica Allegiance','Ravnica Allegiance Mythic Edition','Ravnica: City of Guilds','Return to Ravnica','Revised Edition','Rise of the Eldrazi','Rivals of Ixalan','Saviors of Kamigawa','Scars of Mirrodin','Scourge','Secret Lair Drop','Seventh Edition','Shadowmoor','Shadows over Innistrad','Shards of Alara','Signature Spellbook: Gideon','Signature Spellbook: Jace','Starter 1999','Starter 2000','Streets of New Capenna','Streets of New Capenna Commander','Strixhaven Mystical Archive','Strixhaven: School of Mages','Stronghold','Tempest','Tempest Remastered','Tenth Edition','The Dark','Theros','Theros Beyond Death','Throne of Eldraine','Time Spiral','Time Spiral "Timeshifted"','Time Spiral Remastered','Torment',"Ugin's Fate promos",'Ultimate Box Toppers','Ultimate Masters','Unglued','Unhinged','Unlimited Edition','Unsanctioned','Unstable',"Urza's Destiny","Urza's Legacy","Urza's Saga",'Vanguard','Vintage Masters','Visions','War of the Spark','War of the Spark Mythic Edition','Weatherlight','Welcome Deck 2016','Welcome Deck 2017','Worldwake','Zendikar','Zendikar Expeditions','Zendikar Rising','Zendikar Rising Commander','Zendikar Rising Expeditions']

gatherer.wizards returns more or fewer cards than the number listed for the set on the MTG wiki.

I want to find out why.

I will pull up all of the cards on the Gatherer site from the Aether Revolt set:
- Gatherer lists 194 cards in the set.
- MTG wiki lists 184.

One of the two is incorrect, or there are reprints listed on the Gatherer site.

when parsing the cards, will need to incorporate logic to switch to the next page
    parse the number of available links for extraction
        for loop to extract, then nav to page two, three, nth somehow
        IDEA: find where the page numbers are located in the HTML and set loop-end to the max INT from the extracted values.

set up logic for iterating through a dictionary of card sets, then have:
card_dict['set']['card_name']['attribute1'] = "code to extract the attributes"
card_dict['set']['card_name']['attribute2'] = "code to extract the attributes"

stuff to record:
    - multiverseid:
        test = card_list[0].attrs['href']
        multiverseid = test.split('=')[-1]
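
Splitting on '=' works while the href carries a single parameter. A slightly more defensive sketch reads the query string by parameter name and also resolves the relative href against the search page. The helper names are hypothetical; the sample href is the one Gatherer returns for Aegis Automaton:

```python
from urllib.parse import parse_qs, urljoin, urlparse

SEARCH_PAGE = 'https://gatherer.wizards.com/Pages/Search/Default.aspx'

def multiverse_id_from_href(href):
    # read the value by parameter name instead of by position
    query = parse_qs(urlparse(href).query)
    return query['multiverseid'][0]

def absolute_card_url(href, base=SEARCH_PAGE):
    # resolve '../Card/Details.aspx?...' relative to the search page
    return urljoin(base, href)

href = '../Card/Details.aspx?multiverseid=423808'
print(multiverse_id_from_href(href))
print(absolute_card_url(href))
```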
        


In [6]:
url_01 = 'https://gatherer.wizards.com/Pages/Search/Default.aspx?set=[%22Aether+Revolt%22]'
# search URLs can be built by replacing each " with %22 and each space with +

url_02 = 'https://gatherer.wizards.com/Pages/Search/Default.aspx?set=[%22Alara+Reborn%22]'


browser.visit(url_01)
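
The replacement rule in the comment above matches what `urllib.parse.quote_plus` does, so the search URL for any entry in `set_list` could be generated rather than typed. A sketch under that assumption; `set_search_url` is a hypothetical name, and set names containing punctuation such as ':' or an apostrophe will come out percent-encoded, which has not been checked against Gatherer:

```python
from urllib.parse import quote_plus

SEARCH_URL = 'https://gatherer.wizards.com/Pages/Search/Default.aspx?set='

def set_search_url(set_name):
    # quote_plus turns " into %22 and spaces into +
    return SEARCH_URL + '[' + quote_plus(f'"{set_name}"') + ']'

print(set_search_url('Aether Revolt'))
```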



In [7]:
# create the soup object
html = browser.html
# soup object
search_from_set_soup = BeautifulSoup(html, 'html.parser')

# Goal - collect every card returned by the set search so that 
# each card's details can be scraped from the results


# isolate the div containing the cards from the search
cardList = search_from_set_soup.find('div', class_ = 'cardList') 

# the return from a search generates a table that contains all cards
# cardTable = cardList.find('table', class_='cardItemTable')

# isolate the <td> cells that hold the card names and the hrefs to the actual card information
card_info_list = cardList.find_all('td', class_='middleCol')


# <td> class='middlecol'
# contains all of the information about the cards minus the href
# uses imgs as the mana symbols but also uses alt= number or color to indicate detailed mana costs
# will have to extract the alts from the <td> for each of the elements in the resulting lists then preserve order of 
# mana costs while incorporating logic to know when a card is done being parsed/when to move on to the next card in list.

# different abilities in the card, rules text is separated by <p> tags.
# parse those and add to the card_dict as list of elements?
# add card_info_dict['set']['card_name']['rules_text'] = [<p>,<p>,<p>,<p>,]
# OR
# add card_info_dict['set']['card_name']['rules_text']['line_{i}'] = for each <p>



In [23]:
# <td class="middleCol">
# <div class="clear"></div>
# <div class="cardInfo">
# <span class="cardTitle"><a id="ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ctl00_listRepeater_ctl00_cardTitle" onclick="return CardLinkAction(event, this, 'SameWindow');" href="../Card/Details.aspx?multiverseid=423808">Aegis Automaton</a></span> 
# <span class="manaCost"><img src="/Handlers/Image.ashx?size=small&amp;name=2&amp;type=symbol" alt="2" align="absbottom"></span> (<span class="convertedManaCost">2</span>)
# <br>
# <span class="typeLine">Artifact Creature  — Construct(0/3)</span>
# <br>
# <div class="rulesText"><p><img src="/Handlers/Image.ashx?size=small&amp;name=4&amp;type=symbol" alt="4" align="absbottom"><img src="/Handlers/Image.ashx?size=small&amp;name=W&amp;type=symbol" alt="White" align="absbottom">: Return another target creature you control to its owner's hand.</p></div>
# </div>
# </td>

# iterate over the cells themselves; each `card` is one <td class="middleCol">
for card in card_info_list:
    name = card.find('span', class_='cardTitle').text.split('\n')[-1]

    href = card.find('a').attrs['href']

    multiverse_id = href.split('=')[-1]

In [10]:
card_info_list[0:2]

[<td class="middleCol">
 <div class="clear"></div>
 <div class="cardInfo">
 <span class="cardTitle">
 <a href="../Card/Details.aspx?multiverseid=423808" id="ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ctl00_listRepeater_ctl00_cardTitle" onclick="return CardLinkAction(event, this, 'SameWindow');">Aegis Automaton</a></span> <span class="manaCost">
 <img align="absbottom" alt="2" src="/Handlers/Image.ashx?size=small&amp;name=2&amp;type=symbol"/></span> (<span class="convertedManaCost">2</span>)
                             <br/>
 <span class="typeLine">
                                 Artifact Creature  — Construct
                                 (0/3)</span>
 <br/>
 <div class="rulesText">
 <p><img align="absbottom" alt="4" src="/Handlers/Image.ashx?size=small&amp;name=4&amp;type=symbol"/><img align="absbottom" alt="White" src="/Handlers/Image.ashx?size=small&amp;name=W&amp;type=symbol"/>: Return another target creature you control to its owner's hand.</p></div>
 </div>
 </td>,

In [9]:
card_info_list[0]

<td class="middleCol">
<div class="clear"></div>
<div class="cardInfo">
<span class="cardTitle">
<a href="../Card/Details.aspx?multiverseid=423808" id="ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ctl00_listRepeater_ctl00_cardTitle" onclick="return CardLinkAction(event, this, 'SameWindow');">Aegis Automaton</a></span> <span class="manaCost">
<img align="absbottom" alt="2" src="/Handlers/Image.ashx?size=small&amp;name=2&amp;type=symbol"/></span> (<span class="convertedManaCost">2</span>)
                            <br/>
<span class="typeLine">
                                Artifact Creature  — Construct
                                (0/3)</span>
<br/>
<div class="rulesText">
<p><img align="absbottom" alt="4" src="/Handlers/Image.ashx?size=small&amp;name=4&amp;type=symbol"/><img align="absbottom" alt="White" src="/Handlers/Image.ashx?size=small&amp;name=W&amp;type=symbol"/>: Return another target creature you control to its owner's hand.</p></div>
</div>
</td>

In [12]:
### Notes ###
'''
Consider changing full-word symbols into first-letter capital symbols to save space in rules text

EX: change -> {Energy} to {E}

This may be reverted with the implementation of keyword-ability counters in order
to preserve clarity of information at the cost of space; time will tell.

'''

symbol_rename_dict = {
    'White':'{W}',
    'Red':'{R}',
    'Green':'{G}',
    'Blue':'{U}',
    'Black':'{B}',
    'Colorless':'{CL}',
    'Variable Colorless':'{X}',
    'Energy':'{E}',
    
    'Zero': '{0}',
    'One': '{1}',
    'Two': '{2}',
    'Three': '{3}',
    'Four': '{4}',
    'Five': '{5}',
    'Six': '{6}',
    'Seven': '{7}',
    'Eight': '{8}',
    'Nine': '{9}',
    'Ten': '{10}',
    'Eleven': '{11}',
    'Twelve': '{12}',
    'Thirteen': '{13}',
    'Fourteen': '{14}',
    'Fifteen': '{15}',
    'Sixteen': '{16}',
    'Seventeen': '{17}',
    'Eighteen': '{18}',
    'Nineteen': '{19}',
    'Twenty': '{20}',
    
    '0': '{0}', 
    '1': '{1}', 
    '2': '{2}', 
    '3': '{3}', 
    '4': '{4}', 
    '5': '{5}', 
    '6': '{6}', 
    '7': '{7}', 
    '8': '{8}', 
    '9': '{9}', 
    '10': '{10}', 
    '11': '{11}', 
    '12': '{12}', 
    '13': '{13}', 
    '14': '{14}', 
    '15': '{15}', 
    '16': '{16}', 
    '17': '{17}', 
    '18': '{18}', 
    '19': '{19}', 
    '20': '{20}', 
    
    'White or Red': '{W/R}',
    'White or Green': '{W/G}',
    'White or Blue': '{W/U}',
    'White or Black': '{W/B}',
    
    'Red or White': '{R/W}',
    'Red or Green': '{R/G}',
    'Red or Blue': '{R/U}',
    'Red or Black': '{R/B}',
    
    'Green or White': '{G/W}',
    'Green or Red': '{G/R}',
    'Green or Blue': '{G/U}',
    'Green or Black': '{G/B}',
    
    'Blue or White': '{U/W}',
    'Blue or Red': '{U/R}',
    'Blue or Green': '{U/G}',
    'Blue or Black': '{U/B}',
    
    'Black or White': '{B/W}',
    'Black or Red': '{B/R}',
    'Black or Green': '{B/G}',
    'Black or Blue': '{B/U}',
    
    'Two or White': '{2/W}',
    'Two or Red': '{2/R}',
    'Two or Green': '{2/G}',
    'Two or Blue': '{2/U}',
    'Two or Black': '{2/B}',
    
    # phyrexian mana symbols
    
    'Phyrexian or White': '{phW}',
    'Phyrexian or Red': '{phR}',
    'Phyrexian or Green': '{phG}',
    'Phyrexian or Blue': '{phU}',
    'Phyrexian or Black': '{phB}',
    
    # issues in naming conventions with gatherer website
    # https://gatherer.wizards.com/Pages/Card/Details.aspx?multiverseid=548551
    # phG/phU is called "variable" and "variable mana"
    # may require Unknown label until alternative means of labelling is added
    
    # Ideas: parse the src of the <img> tag in <div class=value> on the card rulings page
    # split using '&' as delimiter, isolate "name=***", use that as key when assigning the mana symbol?
    'Variable Mana ': '{UNK}',
    'Variable ': '{UNK}',

}
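
The idea in the comments of reading the symbol from the image `src` could look like the sketch below. It assumes the `name` query parameter always carries the symbol code, which the notebook has only observed for `2`, `4`, `W`, and `e`; how hybrid and Phyrexian symbols are named there is unknown. `symbol_from_src` is a hypothetical helper. BeautifulSoup's `attrs['src']` should already hold `&` rather than `&amp;`, so the value can be passed in directly.

```python
from urllib.parse import parse_qs, urlparse

def symbol_from_src(src):
    # pull the 'name' parameter out of the image URL's query string
    names = parse_qs(urlparse(src).query).get('name', [])
    if not names:
        # no usable name: fall back to the notebook's unknown label
        return '{UNK}'
    return '{' + names[0].upper() + '}'

print(symbol_from_src('/Handlers/Image.ashx?size=small&name=e&type=symbol'))
```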


In [None]:
card_info_dict = {}
card_info_dict['set'] = {}
card_info_dict['set']['url'] = 'logic for parsing and setting up URLs by splitting and joining words or using .replace()'
# perhaps pre_load the list of urls before running for all of the card attributes

card_info_dict['set']['card_name'] = {} # dictionary to hold all info about the card
card_info_dict['set']['card_name']['name'] = None # found by navigating <span>s
card_info_dict['set']['card_name']['multiverse_id'] = None # created through slicing <a> link to card
card_info_dict['set']['card_name']['navigable_url'] = None # find <a> link to card
card_info_dict['set']['card_name']['mana_cost'] = None # filled in per card
card_info_dict['set']['card_name']['type'] = [] # list to hold all possible types
card_info_dict['set']['card_name']['sub_type'] = [] # list to hold all possible subtypes
card_info_dict['set']['card_name']['power'] = None # filled in per card
card_info_dict['set']['card_name']['toughness'] = None # filled in per card
card_info_dict['set']['card_name']['power/toughness'] = None # filled in per card

card_info_dict['set']['card_name']['rules_text'] = []
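
When the literal 'set' and 'card_name' keys are swapped for real names, `dict.setdefault` saves having to create each level by hand. A small sketch; `add_card` is a hypothetical helper, and the sample values come from the Aegis Automaton markup shown earlier:

```python
def add_card(card_info_dict, set_name, card_name, **attributes):
    # create the set and card levels on first use, then merge the attributes
    set_entry = card_info_dict.setdefault(set_name, {})
    card_entry = set_entry.setdefault(card_name, {})
    card_entry.update(attributes)
    return card_entry

cards = {}
add_card(cards, 'Aether Revolt', 'Aegis Automaton', multiverse_id='423808')
add_card(cards, 'Aether Revolt', 'Aegis Automaton', power='0', toughness='3')
print(cards)
```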



In [None]:
# for i in range(0, len(card_info_list)):

i=0

# ------------------------------------------------ name -------------------------------------------------------------
# card name, with spaces replaced by underscores
name = card_info_list[i]\
        .find('span', class_='cardTitle')\
        .text.split('\n')[-1]\
        .replace(' ','_')

# ------------------------------------------------ href -------------------------------------------------------------
# link to the card and rulings ( I hope )
href = card_info_list[i].find('a').attrs['href']

# ------------------------------------------------ multiverse_id ----------------------------------------------------
multiverse_id = card_info_list[i].find('a').attrs['href'].split('=')[-1]

# ------------------------------------------------ detailed_mana_cost -----------------------------------------------
# find the span containing the mana_cost
mana_cost_info_span = card_info_list[i].find('span', class_='manaCost')

img_tag_info = mana_cost_info_span.find_all('img')

# set up list to hold all cleaned mana symbol names, 
## some cards have multiple mana colors, each with respective color_symbol, including hybrid mana
mana_color_list = []

# for each img, extract the name of the symbol and clean it via the dictionary
# (no loop index here: reusing `i` would overwrite the card index used below)
for img_tag in img_tag_info:
    mana_color = img_tag.attrs['alt']
    # fall back to the raw alt text when the symbol is not in the dictionary
    clean_alt = symbol_rename_dict.get(mana_color, mana_color)
    mana_color_list.append(clean_alt)
# format the mana_cost to show
# number_symbol_symbol | symbol_symbol_symbol_etc
detailed_mana_cost = '_'.join(mana_color_list)

# ------------------------------------------------ converted_mana_cost ----------------------------------------------
converted_mana_cost = card_info_list[i].find('span', class_='convertedManaCost').text

# ------------------------------------------------ Type_breakdown ------------------------------------------------
# parse the list for the card
# find the HTML <span> that contains the types
# extract the text
# replace \n with ''
# turn into list via .split() with delimiter as ' '

text_list = card_info_list[i]\
            .find('span',class_='typeLine')\
            .text\
            .replace('\n','')\
            .split(' ')

# set up list to hold types
clean_types = []
power_toughness = []
# for each element, keep everything that's not empty space (the whitespace is not whitespace in the 
# HTML from the site, it's '')

for element in text_list:
    # keep anything longer than one character, plus the '—' separator; this drops the empty strings
    if (len(element)>1) or (element == '—'):
        # keep power/toughness out of the type list
        if (element[0] != '('):
            clean_types.append(element)
        else:
            power_toughness.append(element)
# lists to hold the types; with no '—' separator every element is a main type
main_types = list(clean_types)
sub_types = []

# for each item in the list, check to see if the element matches the separator '—'
# if true: set up list slices and create type lists from the cleaned list
for j, card_type in enumerate(clean_types):

    if card_type == '—':

        main_types = list(clean_types[:j])
        sub_types = list(clean_types[j+1:])
# ------------------------------------------------ power/toughness ------------------------------------------------

# clean the power and toughness; cards without a '(power/toughness)' element get None
if power_toughness and '/' in power_toughness[0]:
    power, toughness = power_toughness[0]\
        .replace('(','')\
        .replace(')','')\
        .split('/')
else:
    power, toughness = None, None

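
The type and power/toughness steps above can also be folded into one function that takes the raw `typeLine` text, which makes it easier to run over every card in `card_info_list`. This is a sketch, not the notebook's method: `parse_type_line` is a hypothetical name, it uses a regular expression instead of the element loop, and any trailing parenthesised value without a '/' is simply dropped from the type list.

```python
import re

def parse_type_line(raw_text):
    # collapse the newlines and runs of spaces left in the span text
    text = ' '.join(raw_text.split())
    power = toughness = None
    # a trailing "(...)" holds the stats, e.g. "(0/3)"
    stats = re.search(r'\(([^()]*)\)$', text)
    if stats:
        text = text[:stats.start()].strip()
        if '/' in stats.group(1):
            power, toughness = stats.group(1).split('/', 1)
    # everything before the dash is a main type, everything after a subtype
    main_part, _, sub_part = text.partition('—')
    return {
        'type': main_part.split(),
        'sub_type': sub_part.split(),
        'power': power,
        'toughness': toughness,
    }

print(parse_type_line('\n    Artifact Creature  — Construct\n    (0/3)'))
```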


--------------

# Parsing Non-Standard Counters:
## Energy

The goal of this section is to extract the rules text from a card, parse it for non-standard counters, and then insert the words for those counters/symbols into the extracted rules text.

The Gatherer website uses IMG tags to represent the non-standard counters and symbols, but when scraped, an IMG tag contributes no text at all; the symbol survives only as a title-cased word in the tag's alt attribute.

Because of this, we will use a dictionary to return the proper value for each alt word, then count the IMG tags and insert the matching word for each one, without mucking up the rules text, so that it can be properly stored in a database later.
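
One way to do the insertion is to swap each IMG tag for its symbol text before calling `get_text()`, so the symbols land exactly where the images were. A minimal sketch using BeautifulSoup's `replace_with`; `rules_text_with_symbols` is a hypothetical helper, the `rename` argument stands in for `symbol_rename_dict`, and the sample HTML is a shortened copy of the Aether Chaser line:

```python
from bs4 import BeautifulSoup

def rules_text_with_symbols(p_tag, rename=None):
    rename = rename or {}
    # re-parse a copy so the original soup is left untouched
    p_copy = BeautifulSoup(str(p_tag), 'html.parser')
    for img in p_copy.find_all('img'):
        alt = img.get('alt', '')
        # unknown alt words are wrapped in braces as-is
        img.replace_with(rename.get(alt, '{' + alt + '}'))
    return p_copy.get_text()

sample = ('<p>When Aether Chaser enters the battlefield, you get '
          '<img alt="Energy" src="x"/><img alt="Energy" src="x"/> '
          '<i>(two energy counters)</i>.</p>')
p_tag = BeautifulSoup(sample, 'html.parser').p
print(rules_text_with_symbols(p_tag, {'Energy': '{E}'}))
```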

In [13]:
rules_text_list = card_info_list[3].find('div', class_='rulesText').find_all('p')
lines_of_rules_text = len(rules_text_list)
rules_text_list[2].text

'Whenever Aether Chaser attacks, you may pay . If you do, create a 1/1 colorless Servo artifact creature token.'

In [29]:
lines_of_rules_text

3

In [30]:
rules_text_list

[<p>First strike</p>,
 <p>When Aether Chaser enters the battlefield, you get <img align="absbottom" alt="Energy" src="/Handlers/Image.ashx?size=small&amp;name=e&amp;type=symbol"/><img align="absbottom" alt="Energy" src="/Handlers/Image.ashx?size=small&amp;name=e&amp;type=symbol"/> <i>(two energy counters)</i>.</p>,
 <p>Whenever Aether Chaser attacks, you may pay <img align="absbottom" alt="Energy" src="/Handlers/Image.ashx?size=small&amp;name=e&amp;type=symbol"/><img align="absbottom" alt="Energy" src="/Handlers/Image.ashx?size=small&amp;name=e&amp;type=symbol"/>. If you do, create a 1/1 colorless Servo artifact creature token.</p>]

In [19]:
for i, P_tag in enumerate(rules_text_list):
    print(P_tag.text)

First strike
When Aether Chaser enters the battlefield, you get  (two energy counters).
Whenever Aether Chaser attacks, you may pay . If you do, create a 1/1 colorless Servo artifact creature token.


In [18]:
for i, P_tag in enumerate(rules_text_list):
    print(P_tag)

<p>First strike</p>
<p>When Aether Chaser enters the battlefield, you get <img align="absbottom" alt="Energy" src="/Handlers/Image.ashx?size=small&amp;name=e&amp;type=symbol"/><img align="absbottom" alt="Energy" src="/Handlers/Image.ashx?size=small&amp;name=e&amp;type=symbol"/> <i>(two energy counters)</i>.</p>
<p>Whenever Aether Chaser attacks, you may pay <img align="absbottom" alt="Energy" src="/Handlers/Image.ashx?size=small&amp;name=e&amp;type=symbol"/><img align="absbottom" alt="Energy" src="/Handlers/Image.ashx?size=small&amp;name=e&amp;type=symbol"/>. If you do, create a 1/1 colorless Servo artifact creature token.</p>


In [23]:
symbol_type = rules_text_list[1].find('img').attrs['alt']

num_symbols = len(rules_text_list[1].find_all('img'))
symbol_tag_text = rules_text_list[1].find_all('img')

clean_symbol = symbol_rename_dict[f'{symbol_type}']

for i in range(num_symbols):
    print(clean_symbol)

{Energy}
{Energy}


In [54]:
symbol_tag_text

[<img align="absbottom" alt="Energy" src="/Handlers/Image.ashx?size=small&amp;name=e&amp;type=symbol"/>,
 <img align="absbottom" alt="Energy" src="/Handlers/Image.ashx?size=small&amp;name=e&amp;type=symbol"/>]

In [55]:
rules_text_list

[<p>First strike</p>,
 <p>When Aether Chaser enters the battlefield, you get <img align="absbottom" alt="Energy" src="/Handlers/Image.ashx?size=small&amp;name=e&amp;type=symbol"/><img align="absbottom" alt="Energy" src="/Handlers/Image.ashx?size=small&amp;name=e&amp;type=symbol"/> <i>(two energy counters)</i>.</p>,
 <p>Whenever Aether Chaser attacks, you may pay <img align="absbottom" alt="Energy" src="/Handlers/Image.ashx?size=small&amp;name=e&amp;type=symbol"/><img align="absbottom" alt="Energy" src="/Handlers/Image.ashx?size=small&amp;name=e&amp;type=symbol"/>. If you do, create a 1/1 colorless Servo artifact creature token.</p>]

## Rules Text Cleaning - Symbols

thoughts on cleaning card rules for symbols

search all HTML for <img> in rules text
develop a means of transplanting the words for the symbols.

.split each <p> via '.' and then insert words where the symbols would be.

## Page Navigation:

find the last number in the list of page nav numbers, make iterable logic for parsing pages

for card_set in set_list:
    for card in card_set:
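
A sketch of the "last page number" idea. The pager markup has not been inspected in this notebook, so the functions take the link texts as a plain list (for example `[a.get_text() for a in pager.find_all('a')]`, where `pager` is whatever element turns out to hold the page links). The zero-based `page` query parameter is also an assumption that needs checking against the site, and both function names are hypothetical.

```python
def last_page_number(link_texts):
    # keep only the purely numeric link texts, ignoring arrows such as '>'
    numbers = [int(t) for t in (s.strip() for s in link_texts) if t.isdecimal()]
    # a search with a single page has no numbered links at all
    return max(numbers, default=1)

def page_urls(search_url, last_page):
    # ASSUMPTION: pages are selected with a zero-based 'page' parameter
    return [f'{search_url}&page={n}' for n in range(last_page)]

url = 'https://gatherer.wizards.com/Pages/Search/Default.aspx?set=[%22Aether+Revolt%22]'
for page_url in page_urls(url, last_page_number(['1', '2', ' >'])):
    print(page_url)
```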

In [None]:
https://gatherer.wizards.com/Pages/Search/Default.aspx?set=[%22Aether+Revolt%22]

'4_W_G'

wizards.gatherer doesn't return all cards in a set.

- find out why
- find another means of obtaining the information i.e. the names of all cards in a set

take set_list and build a dictionary off of it


set_name_list = [
    {
        'card_name' : None,
        'set_name' : None,
        'cost' : None,
        'CMC' : None,
        'type' : None,
        'subtype' : None,
        'power' : None,
        'toughness' : None,
        'rules_text' : None,
        'flavor_text' : None,
        'link_to_rules' : None,
        'link_to_photo' : None,
    },
    {
        'card_name' : None,
        'set_name' : None,
        'cost' : None,
        'CMC' : None,
        'type' : None,
        'subtype' : None,
        'power' : None,
        'toughness' : None,
        'rules_text' : None,
        'flavor_text' : None,
        'link_to_rules' : None,
        'link_to_photo' : None,
    },
]

Thoughts on change of application:

have app deployed to Heroku to display cards and rules on search request from the user
also display some neat card combos or, potentially, any combinations that do janky stuff

first find a list of all cards

scrape content, including photo

save to Mongo? (text only, incl. link to wizards.gatherer.com)

What I want is a faster means of loading the rules descriptions found below cards, to understand what goes on in niche moments when two or more rules combine.

consider establishing a means of inserting reminder text for all entries after decoding/parsing the descriptions of the cards.

ability_dict = {
    'ability_name' : 'Revolt',  # the name of the ability
    'ability_description' : '(this is where the reminder text will go)',
}

for ability in ability_list:
    if ability in ability_descriptions:
        ability_reminder = ability + " (reminder text from rules descriptions)"
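
The loop above could be written as a function that returns the annotated abilities instead of overwriting one variable. A sketch only: `add_reminder_text` is a hypothetical name, and the reminder string in the example is a placeholder, not real rules text.

```python
def add_reminder_text(abilities, ability_descriptions):
    annotated = []
    for ability in abilities:
        reminder = ability_descriptions.get(ability)
        # abilities without a known description are kept unchanged
        annotated.append(f'{ability} ({reminder})' if reminder else ability)
    return annotated

descriptions = {'Revolt': 'placeholder reminder text'}
print(add_reminder_text(['First strike', 'Revolt'], descriptions))
```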


DF layout
set_name, card_name, cost, CMC, type, subtype, power, toughness, card_rules, flavor_text, link_to_wizards.gatherer

JSON layout

{
    "cardname" : null,
    "set_name" : null,
    "cost" : null,
    "CMC" : null,
    "type" : null,
    "subtype" : null,
    "power" : null,
    "toughness" : null,
    "rules_text" : null,
    "flavor_text" : null,
    "link_to_rules" : null,
    "link_to_photo" : null
}
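
To see the layout filled in, the sketch below builds one record from the Aegis Automaton markup scraped earlier and serialises it with the standard `json` module. `make_card_record` and `FIELDS` are hypothetical names; fields the notebook has not scraped yet (flavor text, photo link) stay `None` and come out as `null`.

```python
import json

FIELDS = ['cardname', 'set_name', 'cost', 'CMC', 'type', 'subtype', 'power',
          'toughness', 'rules_text', 'flavor_text', 'link_to_rules', 'link_to_photo']

def make_card_record(**values):
    # reject misspelled field names early
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise KeyError(f'unexpected fields: {sorted(unknown)}')
    # every field is present; anything not supplied is None
    return {field: values.get(field) for field in FIELDS}

record = make_card_record(
    cardname='Aegis Automaton',
    set_name='Aether Revolt',
    cost='{2}',
    CMC=2,
    type=['Artifact', 'Creature'],
    subtype=['Construct'],
    power='0',
    toughness='3',
    rules_text=["{4}{W}: Return another target creature you control to its owner's hand."],
    link_to_rules='https://gatherer.wizards.com/Pages/Card/Details.aspx?multiverseid=423808',
)
print(json.dumps(record, indent=4))
```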