# Web Scraping and Observing Tournament Deck Lists

MTG Goldfish is one of the premier websites for deck information and tournament results. It updates quickly and reports how many copies of each card appear in decks, which speeds up collecting information for each deck.

## Import Libraries

In [110]:
import pandas as pd
import numpy as np
import os

import time

# For Webscraping
import requests
from bs4 import BeautifulSoup

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

from contextlib import closing

# replaces accented characters with standard alphanumeric characters
from unidecode import unidecode

## Retrieve Page Destination

Before getting decklist information, I first need to navigate the MTG Goldfish webpages to reach the pages that hold the decklists. Once I can navigate to and collect the decklists, I will be able to retrieve the information I want for analysis.

### Grab URL for Homepage

In [78]:
URL_MTG_GOLDFISH = "https://www.mtggoldfish.com"
URL_MTG_GOLDFISH_MODERN = "https://www.mtggoldfish.com/metagame/modern/full#paper"

In [30]:
# Open the page with Selenium to switch the metagame window from the default 30 days to 14 days
with closing(webdriver.Chrome()) as driver:
    driver.get(URL_MTG_GOLDFISH_MODERN)
    
    # Change meta analysis from the default previous 30 days to previous 14 days
    select = Select(driver.find_element(By.XPATH, '//*[@id="metagame-re-sort-select"]'))
    select.select_by_value("14")
    time.sleep(10)

    # Grab web page information
    soup_page = BeautifulSoup(driver.page_source, "html.parser")

## Retrieve General Deck Information and URL

On the main page for Modern archetypes, each tile displays the archetype title, the meta percentage, and the number of decks recorded during the selected time frame. We will gather this information for the past 14 days, along with each archetype's link, which leads to a page showing the share of cards contained in decks of that archetype.

In [35]:
# returns a list of each archetype tile so that deck name, url, and meta share can be recorded
archetype_tiles = soup_page.find_all("div", class_="archetype-tile")

In [80]:
archetypes = []

for tile in archetype_tiles:
    title = tile.find("div", class_="archetype-tile-title").find("a").text

    archetype_statistics = tile.find("div", class_="archetype-tile-statistic-value").text.split("\n")
    meta_percentage = archetype_statistics[1].replace("%", "")
    number_of_decks = archetype_statistics[3].replace("(", "").replace(")", "")
    
    archetype_href = tile.find("span", class_="deck-price-online").find("a")['href']
    archetype_url = URL_MTG_GOLDFISH + archetype_href

    archetype_dict = {"title": title, "meta_percentage": meta_percentage, "number_of_decks": number_of_decks, "URL": archetype_url}

    archetypes.append(archetype_dict)
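
The extraction above depends on the statistic text splitting into fixed line positions. As a minimal sketch of an alternative, assuming the text looks like the hypothetical `sample_text` below (the real markup may differ), regular expressions can pull out the two numbers without relying on position, and `urllib.parse.urljoin` can build the absolute URL. The helper name `parse_tile_statistics` and the sample link are illustrative and do not come from the site.

```python
import re
from urllib.parse import urljoin

URL_MTG_GOLDFISH = "https://www.mtggoldfish.com"


def parse_tile_statistics(text):
    """Return (meta_percentage, number_of_decks) from a tile's statistic text."""
    percentage_match = re.search(r"(\d+(?:\.\d+)?)\s*%", text)
    decks_match = re.search(r"\((\d+)\)", text)
    if percentage_match is None or decks_match is None:
        raise ValueError(f"Unexpected statistic text: {text!r}")
    return float(percentage_match.group(1)), int(decks_match.group(1))


# Hypothetical text, shaped so that split("\n") indexing would also work on it
sample_text = "\n10.6%\n\n(90)\n"
meta_percentage, number_of_decks = parse_tile_statistics(sample_text)

# Hypothetical relative link, standing in for a tile's href
sample_href = "/archetype/example-deck#paper"
archetype_url = urljoin(URL_MTG_GOLDFISH, sample_href)

print(meta_percentage, number_of_decks, archetype_url)
```

Returning numbers rather than strings also means the values can be sorted or summed later without another conversion step.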

## Retrieve Decklist Information

The decks within an archetype differ from one another, but they share a large number of similarities. The average number of copies of each card across the whole archetype will be collected.

For each archetype, MTG Goldfish provides the average number of copies of a card in the decks that play it, along with the share of decks within the archetype that play that card. Multiplying these two numbers gives the average number of copies of the card per deck across the entire archetype.

The calculation is applied to both mainboard and sideboard cards. Companion cards are tracked separately, because they function differently from both sideboard and mainboard cards.
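
As a worked example of this arithmetic, the sketch below uses a made-up breakdown label. The label format is an assumption inferred from how the notebook splits the text on spaces, and `average_copies_per_deck` is an illustrative helper name.

```python
def average_copies_per_deck(copies_when_played, percent_of_decks):
    """Average copies across every deck in the archetype, counting decks that play zero."""
    return copies_when_played * percent_of_decks / 100


# Hypothetical label: 3.5 copies on average, played in 80% of the archetype's decks
label = "3.5 in 80% of decks"
parts = label.split(" ")

copies_when_played = float(parts[0])
percent_of_decks = float(parts[2].replace("%", ""))

card_share = average_copies_per_deck(copies_when_played, percent_of_decks)
print(card_share)
```

With these numbers the card averages 2.8 copies per deck, because the 20% of decks that skip it pull the average down from 3.5.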

In [108]:
# Takes archetype breakdown container and returns list of each card in the container and its prevalence in all decks of the archetype
def filter_cards(container):
    out = []
    
    cards = container.find_all("div", class_="spoiler-card")
    for card in cards:
        card_name = unidecode(card.find("span", class_="price-card-invisible-label").text)
        card_prevalence_text = card.find("p", class_="archetype-breakdown-featured-card-text").text.split(" ")

        card_number = float(card_prevalence_text[0])
        card_deck_prevalence = float(card_prevalence_text[2].replace("%", ""))
        card_share = card_number * card_deck_prevalence / 100

        out.append((card_name, card_share))

    return out

In [115]:
for archetype in archetypes:
    mainboard = []
    sideboard = []
    companion = []

    page = requests.get(archetype["URL"])
    soup_page = BeautifulSoup(page.content, "html.parser")

    spoiler_containers = soup_page.find("div", class_="deck-archetype-breakdown").find_all("div", class_="spoiler-card-container")

    # Sort each container's cards into the matching board based on its heading
    for container in spoiler_containers:
        container_cards = filter_cards(container)
        container_type = container.find("h3").text
        if container_type == "Companion":
            companion.extend(container_cards)
        elif container_type == "Sideboard":
            sideboard.extend(container_cards)
        else:
            mainboard.extend(container_cards)

    archetype["mainboard"] = mainboard
    archetype["sideboard"] = sideboard
    archetype["companion"] = companion

    title = archetype["title"]
    number_of_decks = archetype["number_of_decks"]
    meta_percentage = archetype["meta_percentage"]
    print(f"Finished Deck List for {title}. There are {number_of_decks} decks for {meta_percentage}% of the meta.")

    # Pause between requests
    time.sleep(5)

Finished Deck List for Rakdos Scam. There are 90 decks for 10.6% of the meta.
Finished Deck List for Living End. There are 90 decks for 10.6% of the meta.
Finished Deck List for Yawgmoth. There are 54 decks for 6.4% of the meta.
Finished Deck List for Amulet Titan. There are 53 decks for 6.2% of the meta.
Finished Deck List for Domain Zoo. There are 45 decks for 5.3% of the meta.
Finished Deck List for Ruby Storm. There are 33 decks for 3.9% of the meta.
Finished Deck List for Azorius Control. There are 26 decks for 3.1% of the meta.
Finished Deck List for Bant Nadu. There are 26 decks for 3.1% of the meta.
Finished Deck List for Boros Burn. There are 25 decks for 3.0% of the meta.
Finished Deck List for Mill. There are 22 decks for 2.6% of the meta.
Finished Deck List for Temur Prowess. There are 22 decks for 2.6% of the meta.
Finished Deck List for Mono-Black Scam. There are 17 decks for 2.0% of the meta.
Finished Deck List for Indomitable Creativity. There are 16 decks for 1.9% of t

## Organize Data

Before the data can be used, it should be organized so that only the information needed downstream is passed along. This keeps load times short and memory use low.
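
One possible way to organize the scraped results, sketched here as an assumption rather than the approach this notebook goes on to take, is to flatten the nested `archetypes` list into one row per archetype, board, and card. The helper name `flatten_archetypes` and the sample data are made up for illustration. The sample mirrors the dictionary keys built earlier, and the URL is dropped because it is not needed for analysis.

```python
def flatten_archetypes(archetypes):
    """Return one row per (archetype, board, card), with numeric fields converted."""
    rows = []
    for archetype in archetypes:
        for board in ("mainboard", "sideboard", "companion"):
            for card_name, card_share in archetype.get(board, []):
                rows.append({
                    "archetype": archetype["title"],
                    "meta_percentage": float(archetype["meta_percentage"]),
                    "number_of_decks": int(archetype["number_of_decks"]),
                    "board": board,
                    "card": card_name,
                    "card_share": card_share,
                })
    return rows


# Made-up sample in the same shape as the scraped archetype dictionaries
sample_archetypes = [
    {
        "title": "Example Deck",
        "meta_percentage": "10.6",
        "number_of_decks": "90",
        "URL": "https://www.mtggoldfish.com/archetype/example-deck#paper",
        "mainboard": [("Example Card A", 3.8), ("Example Card B", 2.0)],
        "sideboard": [("Example Card C", 1.5)],
        "companion": [],
    }
]

rows = flatten_archetypes(sample_archetypes)
for row in rows:
    print(row)
```

A list of flat dictionaries like `rows` can be passed directly to `pd.DataFrame(rows)` if a table is preferred for analysis.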