<a href="https://colab.research.google.com/github/alexdriedger/MTGDataExperiments/blob/main/build_around_evaluator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to Use This Notebook

## Show Me The Cool Graphs

If you want to see the graphs for build arounds and explore the data for various build-around strategies, hop over to the [Examples](#Examples) section. There is more info there on how to view some example build arounds (e.g. Caves from LCI, Roots from MKM) and how to customize the evaluation (even if you don't have any coding experience).

## Show Me The Code

If you are interested in how the graphs are created or want to tweak the code to test out different things, all the code is in the [Code](#Code) section.

## How to Run This Notebook

To run this notebook, in the menu, click "Runtime" => "Run all". It usually takes a few minutes to process all the data. You can see the graphs in the [Examples](#Examples) section.

# Code

This section has all the code for creating the datasets and graphing the build arounds. If you want to just see the graphs, go to [Examples](#Examples).

### Import Pandas and Set Useful Options

Pandas is what we'll use to analyze the data. For more info on Pandas, see the [docs](https://pandas.pydata.org/docs/user_guide/10min.html).

In [None]:
import gzip
import pandas as pd

pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.max_columns', 1600)

### Download the Dataset

Download the datasets from the [17 Lands Public Data Sets](https://www.17lands.com/public_datasets). Each file is only downloaded if it is not already present in the working directory.

In [None]:
from pathlib import Path

FORMAT_PREMIER_DRAFT = "PremierDraft"
FORMAT_TRADITIONAL_DRAFT = "TradDraft"

def download_if_not_exists(file_name, remote_url, common_name):
  if Path(file_name).is_file():
    print(f"{common_name} data file already downloaded. Skipping download")
  else:
    !wget {remote_url}

def get_dataset_metadata(expansion, format, dataset_type):
  file_name = f"{dataset_type}_data_public.{expansion}.{format}.csv.gz"
  remote_url = f"https://17lands-public.s3.amazonaws.com/analysis_data/{dataset_type}_data/{file_name}"
  return file_name, remote_url


def download_datasets(expansion, format):
  game_data_file_name, game_data_remote_url = get_dataset_metadata(expansion, format, "game")
  draft_data_file_name, draft_data_remote_url = get_dataset_metadata(expansion, format, "draft")
  cards_data_file_name = f"cards.csv"
  cards_data_remote_url = f"https://17lands-public.s3.amazonaws.com/analysis_data/cards/{cards_data_file_name}"

  download_if_not_exists(game_data_file_name, game_data_remote_url, "game")
  download_if_not_exists(draft_data_file_name, draft_data_remote_url, "draft")
  download_if_not_exists(cards_data_file_name, cards_data_remote_url, "cards")
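The `!wget` line above is IPython shell syntax, so it only works inside Jupyter or Colab. If you want to run the code as a plain Python script, a pure-Python variant could look like the sketch below. `download_if_not_exists_py` is a hypothetical name, not part of the notebook; it uses `urllib.request.urlretrieve` from the standard library and returns whether a download happened so the skip path is easy to check.

```python
import urllib.request
from pathlib import Path


def download_if_not_exists_py(file_name: str, remote_url: str, common_name: str) -> bool:
  """Hypothetical pure-Python variant of download_if_not_exists.

  Returns True if the file was downloaded, False if it already existed.
  """
  path = Path(file_name)
  if path.is_file():
    print(f"{common_name} data file already downloaded. Skipping download")
    return False
  # Save the remote file under file_name, which matches the URL's basename
  # (the same name wget would use when run in the working directory)
  urllib.request.urlretrieve(remote_url, path)
  return True
```

Because `get_dataset_metadata` builds `file_name` as the last path segment of `remote_url`, saving to `file_name` ends up in the same place that `wget` would write to.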

### Enrich the dataset

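The game data is read in chunks to limit memory use. For each chunk, the per-card columns are summed into `build_around_payoffs` and `build_around_enablers` counts, and any optional filters are applied. The per-card columns are kept rather than dropped (the line that would drop them is commented out), since `common_cards` in the Further Stats section needs them.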

In [None]:
def get_all_drafts(expansion, format):
  """
  Returns a table with one row per draft
  """
  cols = ["expansion", "event_type", "draft_id", "draft_time", "rank", "event_match_wins", "event_match_losses"]
  chunks = list()
  for draft_data in pd.read_csv(
      get_dataset_metadata(expansion, format, "draft")[0],
      chunksize=100000,
      usecols=cols
      ):
    draft_data_no_dups = draft_data.drop_duplicates(subset=["draft_id"])
    chunks.append(draft_data_no_dups)

  all_drafts = pd.concat(chunks)

  # Remove duplicates in case of drafts that show up in multiple chunks
  all_drafts = all_drafts.drop_duplicates(subset=["draft_id"], keep="last")
  return all_drafts
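`get_all_drafts` deduplicates twice: once per chunk to keep memory down, and once more after concatenation, because the rows of a single draft can straddle a chunk boundary. The toy illustration below uses an in-memory CSV; its columns and values are made up for the example and are not the real 17Lands schema.

```python
import io

import pandas as pd

# Toy stand-in for a draft CSV: several rows per draft, so each draft_id repeats
csv_text = """draft_id,rank,event_match_wins
a,gold,3
a,gold,3
b,mythic,7
b,mythic,7
c,silver,1
"""

chunks = []
# chunksize=3 splits draft "b" across two chunks, as can happen in a large file
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=3):
  chunks.append(chunk.drop_duplicates(subset=["draft_id"]))

all_drafts = pd.concat(chunks)
before = len(all_drafts)  # "b" still appears twice here
# The second pass removes drafts that were split across chunk boundaries
all_drafts = all_drafts.drop_duplicates(subset=["draft_id"], keep="last")
print(before, all_drafts["draft_id"].tolist())
```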

In [None]:
from datetime import datetime

def enrich_build_around_stats(decks, build_around_payoffs, build_around_enablers):
  decks["build_around_payoffs"] = decks[build_around_payoffs].sum(axis=1)
  decks["build_around_enablers"] = decks[build_around_enablers].sum(axis=1)

  return decks

gd_base_cols = ['draft_id', 'main_colors', 'splash_colors', 'rank', 'on_play', 'opp_colors', 'num_turns', 'draft_time', 'won']
def get_game_data_cols(expansion, format):
  """
  Returns the game data columns to load: the metadata columns plus one "deck_" count column per card. Other columns are skipped to reduce the size of the dataset.
  """
  # Read only a small chunk to get the header
  df = next(pd.read_csv(get_dataset_metadata(expansion, format, "game")[0], chunksize=100))
  col_names = list(df)
  gd_card_cols = [x for x in col_names if x.startswith("deck_")]

  gd_all_cols = gd_base_cols + gd_card_cols
  return gd_all_cols

def get_all_enriched_decks(
    expansion,
    format,
    build_around_payoffs,
    build_around_enablers,

    # Additional filters
    start_date: datetime = None,
    end_date: datetime = None,
    ranks: list[str] = None,
    on_play: bool = None,
    min_num_turns: int = None,
    max_num_turns: int = None,
    main_colors: list[str] = None
    ):
  gd_all_cols = get_game_data_cols(expansion, format)
  final_cols = gd_base_cols + ["build_around_payoffs", "build_around_enablers"]
  chunks = list()
  for game_data in pd.read_csv(
      get_dataset_metadata(expansion, format, "game")[0],
      chunksize=100000,
      usecols=gd_all_cols
      ):
    game_data = game_data.rename(columns=lambda x: x[5:] if x.startswith("deck_") else x)
    game_data['draft_time'] = pd.to_datetime(game_data['draft_time'])
    game_data = enrich_build_around_stats(game_data, build_around_payoffs, build_around_enablers)
    # game_data = game_data.filter(items=final_cols)

    # Additional optional filters
    if start_date:
      game_data = game_data[game_data["draft_time"] >= start_date]
    if end_date:
      game_data = game_data[game_data["draft_time"] <= end_date]
    if ranks:
      game_data = game_data[game_data["rank"].isin(ranks)]
    if on_play is not None:  # a plain truthiness check would silently ignore on_play=False
      game_data = game_data[game_data["on_play"] == on_play]
    if min_num_turns:
      game_data = game_data[game_data["num_turns"] >= min_num_turns]
    if max_num_turns:
      game_data = game_data[game_data["num_turns"] <= max_num_turns]
    if main_colors:
      game_data = game_data[game_data["main_colors"].isin(main_colors)]

    chunks.append(game_data)

  all_games = pd.concat(chunks)
  return all_games
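To see what the enrichment step does to a single chunk, here is a toy run of the same rename and row-wise sums. The card names come from the MKM example, but the rows and counts are made up; in the real game data every card has its own `deck_<card name>` count column.

```python
import pandas as pd

# Toy game rows with made-up card counts
games = pd.DataFrame({
  "draft_id": ["a", "b", "c"],
  "won": [True, False, True],
  "deck_Insidious Roots": [1, 0, 2],
  "deck_Chalk Outline": [1, 0, 0],
  "deck_Gravestone Strider": [2, 1, 0],
  "deck_Leering Onlooker": [1, 0, 3],
})

payoffs = ["Insidious Roots", "Chalk Outline"]
enablers = ["Gravestone Strider", "Leering Onlooker"]

# Same rename as get_all_enriched_decks: strip the 5-character "deck_" prefix
games = games.rename(columns=lambda x: x[5:] if x.startswith("deck_") else x)

# Row-wise sums give the number of payoff and enabler copies in each deck
games["build_around_payoffs"] = games[payoffs].sum(axis=1)
games["build_around_enablers"] = games[enablers].sum(axis=1)

print(games[["draft_id", "build_around_payoffs", "build_around_enablers"]])
```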

### Aggregate the Dataset

The aggregation computes the win rate and number of games for every combination of payoff count and enabler count (e.g. the win rate of decks with 3 payoffs and 7 enablers).

In [None]:
def aggregate_build_around_stats(decks):
  win_col_name = "won_int"
  df = decks.copy()

  df[win_col_name] = df["won"].astype(int)

  df = df.filter(items=["build_around_payoffs", "build_around_enablers", win_col_name])
  df = df.groupby(["build_around_payoffs", "build_around_enablers"]).agg(["mean", "size"])

  df = df[win_col_name].copy()
  df = df.reset_index()
  return df
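A toy version of the aggregation, followed by the game-count threshold that `evaluate_build_around` applies before graphing. The deck rows are made up. This sketch selects the win column before calling `agg`, which gives flat `mean`/`size` columns directly instead of indexing into a column level afterwards.

```python
import pandas as pd

# Made-up decks: payoff count, enabler count, and match result
decks = pd.DataFrame({
  "build_around_payoffs": [0, 0, 0, 1, 1, 2],
  "build_around_enablers": [0, 0, 0, 5, 5, 7],
  "won": [True, False, False, True, True, False],
})

decks["won_int"] = decks["won"].astype(int)
stats = (
  decks.groupby(["build_around_payoffs", "build_around_enablers"])["won_int"]
  .agg(["mean", "size"])  # win rate and number of games per cell
  .reset_index()
)
print(stats)

# Same idea as num_games_threshold (default 50); lowered to 2 for the toy data
num_games_threshold = 2
kept = stats[stats["size"] >= num_games_threshold]
print(kept)
```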

### Graph the Dataset

Now that the data has been enriched and aggregated, we can graph the win rates as a heatmap. Only cells backed by at least `num_games_threshold` games (50 by default) are shown; the rest are left blank.

In [None]:
from pandas import DataFrame
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

In [None]:
def graph_build_around_data(build_around_decks, title="Build Around Payoffs"):
  max_payoffs = build_around_decks["build_around_payoffs"].max() + 1
  max_enablers = build_around_decks["build_around_enablers"].max() + 1

  # Start with NaN everywhere so combinations without enough games render as blank cells
  hm = np.full((max_payoffs, max_enablers), np.nan)

  for _, row in build_around_decks.iterrows():
    enablers = int(row["build_around_enablers"])
    payoffs = int(row["build_around_payoffs"])
    hm[payoffs, enablers] = row["mean"]

  hm_df = DataFrame(hm, index=range(max_payoffs), columns=range(max_enablers))
  hm_df = hm_df[::-1]

  plt.figure(figsize=(max_enablers, max_payoffs))

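  # Anchor the color scale on the 0-payoff / 0-enabler cell, i.e. the baseline win rate
  # of decks playing none of the build-around cards (assumes that cell has enough games)
  vmin = hm[0, 0] - 0.1
  vmax = hm[0, 0] + 0.04
  center = hm[0, 0] - 0.03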

  sns.set(style="ticks")
  plt.style.use("dark_background")

  ax = sns.heatmap(
      hm_df,
      annot=True,
      cmap="RdYlGn",
      robust=True,
      fmt=".2f",
      linewidth=.25,
      linecolor="black",
      center=center,
      vmin=vmin,
      vmax=vmax,
      cbar=False
      )
  ax.xaxis.labelpad = 20
  ax.yaxis.labelpad = 20

  ax.set_xlabel("Enablers in Deck", fontsize=18)
  ax.set_ylabel("Payoffs in Deck", fontsize=18)

  ax.set_title(title, fontdict={'fontsize': 28, 'fontweight': 3}, pad=20)
  return ax
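`graph_build_around_data` fills the grid cell by cell with `iterrows`. A vectorized alternative, assuming the same aggregated columns (`build_around_payoffs`, `build_around_enablers`, `mean`), uses `DataFrame.pivot` plus `reindex`. This is a sketch rather than the notebook's method; the resulting frame could stand in for `hm_df`, with the baseline read from `grid.loc[0, 0]` instead of `hm[0, 0]`. The win rates below are made up.

```python
import numpy as np
import pandas as pd

# Made-up aggregated stats, already filtered by the game-count threshold
stats = pd.DataFrame({
  "build_around_payoffs": [0, 0, 1, 2],
  "build_around_enablers": [0, 3, 3, 4],
  "mean": [0.55, 0.57, 0.60, 0.52],
})

max_payoffs = stats["build_around_payoffs"].max() + 1
max_enablers = stats["build_around_enablers"].max() + 1

# pivot puts payoffs on rows and enablers on columns; reindex adds the missing
# counts as NaN so they render as blank cells in the heatmap
grid = (
  stats.pivot(index="build_around_payoffs", columns="build_around_enablers", values="mean")
  .reindex(index=range(max_payoffs), columns=range(max_enablers))
)
# Reverse the rows so 0 payoffs ends up at the bottom of the heatmap
grid = grid[::-1]
print(grid)
```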

## Bring It All Together

In [None]:
def evaluate_build_around(
    title: str,
    expansion: str,
    format: str,
    build_around_payoffs: list[str],
    build_around_enablers: list[str],
    num_games_threshold: int = 50,

    # Additional filters
    start_date: datetime = None,
    end_date: datetime = None,
    ranks: list[str] = None,
    on_play: bool = None,
    min_num_turns: int = None,
    max_num_turns: int = None,
    main_colors: list[str] = None
    ):

  download_datasets(expansion, format)
  all_decks_with_build_around_data = get_all_enriched_decks(
      expansion,
      format,
      build_around_payoffs,
      build_around_enablers,
      start_date,
      end_date,
      ranks,
      on_play,
      min_num_turns,
      max_num_turns,
      main_colors
      )

  aggregated_stats = aggregate_build_around_stats(all_decks_with_build_around_data)
  # Keep only (payoffs, enablers) cells backed by at least num_games_threshold games
  aggregated_stats_with_thresh = aggregated_stats[aggregated_stats["size"] >= num_games_threshold]
  ax = graph_build_around_data(aggregated_stats_with_thresh, title)
  return all_decks_with_build_around_data, aggregated_stats, ax

## Further Stats

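These functions dig into one region of the heatmap: `extra_stats` selects the decks within a range of payoff and enabler counts, then prints how many decks there are, the cards they play most on average, and their breakdown by main colors.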

In [None]:
def common_cards(input_decks, payoff_min, payoff_max, build_around_min, build_around_max):
  cols_to_exclude = gd_base_cols + ["build_around_enablers", "build_around_payoffs"]
  decks_without_metadata_cols = input_decks[input_decks.columns[~input_decks.columns.isin(cols_to_exclude)]]
  cc_stats = decks_without_metadata_cols.mean().sort_values(ascending=False)
  print("Most common cards in decks (avg)")
  print(cc_stats.head(35))

def color_breakdown(input_decks):
  print("\nBreakdown of decks by color groups")
  print(input_decks["main_colors"].value_counts(normalize=True).sort_values(ascending=False).mul(100).astype(str)+'%')

def extra_stats(input_decks, payoff_min, payoff_max, build_around_min, build_around_max):
  # between() is inclusive on both ends, matching the original >= / <= comparisons
  optimal_decks = input_decks[
      input_decks["build_around_payoffs"].between(payoff_min, payoff_max)
      & input_decks["build_around_enablers"].between(build_around_min, build_around_max)
  ]
  print(f"\nNumber of data points: {len(optimal_decks)}")
  common_cards(optimal_decks, payoff_min, payoff_max, build_around_min, build_around_max)
  color_breakdown(optimal_decks)
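Here is what `extra_stats` computes, run on a tiny made-up deck table (the region bounds match the MKM example below). The card columns are named directly here; `common_cards` instead treats every non-metadata column as a card column.

```python
import pandas as pd

# Made-up decks with two card count columns
decks = pd.DataFrame({
  "main_colors": ["BG", "BG", "WG", "BG"],
  "build_around_payoffs": [2, 0, 3, 1],
  "build_around_enablers": [8, 2, 9, 7],
  "Insidious Roots": [1, 0, 2, 1],
  "Gravestone Strider": [2, 0, 1, 3],
})

# Decks with 1-4 payoffs and 7-15 enablers (between is inclusive on both ends)
region = decks[
  decks["build_around_payoffs"].between(1, 4)
  & decks["build_around_enablers"].between(7, 15)
]

# Average copies of each card in those decks, most played first
card_cols = ["Insidious Roots", "Gravestone Strider"]
card_means = region[card_cols].mean().sort_values(ascending=False)
print(card_means)

# Share of decks per main color pair, in percent
color_share = region["main_colors"].value_counts(normalize=True).mul(100).round(1)
print(color_share)
```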

<a name="Examples"></a>
# Examples

To create the graphs, in the menu, click "Runtime" => "Run all". It usually takes a few minutes to process all the data.



### MKM - Insidious Roots / Chalk Outline

In [None]:
roots_outline_payoffs = ["Insidious Roots", "Chalk Outline"]
top_enablers_by_stats = ["Rubblebelt Maverick", "Gravestone Strider", "Vitu-Ghazi Inspector", "Topiary Panther", "Extract a Confession", "Bite Down on Crime", "Evidence Examiner", "Aftermath Analyst", "Leering Onlooker"]

# Other potential groups of enablers
enablers_creatures_that_leave_the_graveyard = ["Rubblebelt Maverick", "Gravestone Strider", "Leering Onlooker"]
collect_evidence_cards = ["Behind the Mask", "Bite Down on Crime", "Crimestopper Sprite", "Evidence Examiner", "Extract a Confession", "Forensic Researcher", "Hedge Whisperer", "Polygraph Orb", "Sample Collector", "Surveillance Monitor", "Vitu-Ghazi Inspector"]
enablers_leave_the_graveyard = ["Macabre Reconstruction", "Rot Farm Mortipede", "Soul Enervation"]

roots_outline_decks_premier, aggs, ax = evaluate_build_around(
    title="Insidious Roots & Chalk Outline \nPremier Draft",
    expansion="MKM",
    format=FORMAT_PREMIER_DRAFT,
    build_around_payoffs=roots_outline_payoffs,
    build_around_enablers=top_enablers_by_stats,
    start_date=datetime(2024, 3, 1),
    ranks=["platinum", "diamond", "mythic"]
)

In [None]:
extra_stats(roots_outline_decks_premier, payoff_min=1, payoff_max=4, build_around_min=7, build_around_max=15)

### LCI - Caves Deck

In [None]:
cave_build_around_payoffs = ["Bat Colony", "Calamitous Cave-In", "Gargantuan Leech", "Sinuous Benthisaur"]
cave_build_around_enablers = ["Captivating Cave", "Cavernous Maw", "Echoing Deeps", "Forgotten Monument", "Hidden Cataract", "Hidden Courtyard", "Hidden Necropolis", "Hidden Nursery", "Hidden Volcano", "Pit of Offerings", "Promising Vein", "Sunken Citadel", "Volatile Fault"]

caves_deck, aggs, ax = evaluate_build_around(
    title="Caves Decks",
    expansion="LCI",
    format=FORMAT_PREMIER_DRAFT,
    build_around_payoffs=cave_build_around_payoffs,
    build_around_enablers=cave_build_around_enablers,
    ranks=["platinum", "diamond", "mythic"]
)

# Other Stuff

### Explore The Data Set

17Lands publishes three datasets that are useful here: the draft and game datasets for each MTG set, plus a list of all cards on Arena (`cards.csv`).

The cells below are helpful for looking at the raw datasets.

In [None]:
# Game Data
df = next(pd.read_csv(get_dataset_metadata("LCI", FORMAT_PREMIER_DRAFT, "game")[0], chunksize=100))
df.head(25)
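A typo in a payoff or enabler name only surfaces as a `KeyError` inside `enrich_build_around_stats`, after the chunked read has already started. Checking the names against the game data header first can catch it earlier. `missing_cards` below is a hypothetical helper, not part of the notebook, and the header is a toy list; in the notebook you could pass `list(df)` from the cell above.

```python
def missing_cards(card_names, game_columns):
  """Hypothetical helper: return card names that have no matching "deck_" column."""
  available = {c[len("deck_"):] for c in game_columns if c.startswith("deck_")}
  return [name for name in card_names if name not in available]


# Toy header for illustration only
header = ["draft_id", "won", "deck_Bat Colony", "deck_Gargantuan Leech"]
print(missing_cards(["Bat Colony", "Gargantuan Leech", "Sinuous Benthisaur"], header))
```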

In [None]:
# Draft Data
df = next(pd.read_csv(get_dataset_metadata("LCI", FORMAT_PREMIER_DRAFT, "draft")[0], chunksize=100))
df.head(55)

In [None]:
cards_table = pd.read_csv("cards.csv")
set_cards = cards_table[cards_table["expansion"] == "MKM"]
set_cards.head(25)