# Analyse Deck Names

In this notebook we load the Dark Tidings deck data and work out which decks are Evil Twins and which are regular decks. We then use a trick from ecology to estimate how many decks were printed.

In [1]:
%load_ext nb_black
import pandas as pd
import numpy as np


In [2]:
deck_data = (
    pd.read_table("./data/deck_names.tab", index_col=0).drop_duplicates().reset_index()
)
deck_data.tail()

Unnamed: 0,index,deck,set_id
89518,89996,Mastermind X. Snowphant’s Evil Twin,496
89519,89997,"“Pie”, guetteuse de Prowell",496
89520,89998,Babu Mandeville,496
89521,89999,"Leif “Indie”, el Emplumado",496
89522,90000,"Zoliak, the Expert of Embalming",496



Below we work out which decks are "Evil Twins" and what their non-evil counterparts would be called. Next, we check whether any twin pairs (an Evil Twin together with its non-evil counterpart) have already been registered.
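Before running this on the real data, the idea can be sketched on a toy list. This is a minimal illustration, not the notebook's pipeline: it uses only a subset of the affixes, the deck names are made up, and `split_evil` is a hypothetical helper introduced here.

```python
# Toy illustration: a subset of the affixes and invented deck names.
EVIL_PREFIXES = ("Evil Twin of ", "Gemello di ")
EVIL_SUFFIXES = ("’s Evil Twin",)


def split_evil(name):
    """Return (is_evil, base_name) for a deck name (hypothetical helper)."""
    for prefix in EVIL_PREFIXES:
        if name.startswith(prefix):
            return True, name[len(prefix):]
    for suffix in EVIL_SUFFIXES:
        if name.endswith(suffix):
            return True, name[: -len(suffix)]
    return False, name


names = [
    "Evil Twin of Ada Example",
    "Ada Example",
    "Bo Sample’s Evil Twin",
    "Cy Placeholder",
]
parsed = [split_evil(n) for n in names]

# Base names seen as Evil Twins, and names seen as regular decks.
evil_bases = {base for is_evil, base in parsed if is_evil}
regular = {base for is_evil, base in parsed if not is_evil}

# A pair is found when a regular deck matches an Evil Twin's base name.
pairs = evil_bases & regular
print(pairs)  # {'Ada Example'}
```

Only "Ada Example" appears both as a regular deck and as the base name of an Evil Twin, so it is the only pair found in this toy list.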

In [3]:
evil_prefix = [
    "Evil Twin of ",  # English
    "Gemello di ",  # Italian
    "Le Double Maléfique de ",  # French
    "Reflejo Oscuro de ",  # Spanish
    "Böser Zwilling von ",  # German
    "恶之",  # Chinese
    "Gêmeo do Mal de ",  # Portuguese
]
evil_suffix = [
    "’s Evil Twin",  # English
    " - Zła Bliźniaczka",  # Polish
    "的邪恶分身",  # Chinese
    "的邪惡雙生",  # Chinese
    "의 사악한 분신",  # Korean
]


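def is_evil_twin(name: str) -> bool:
    """Return True if the name starts or ends with a known Evil Twin marker."""
    return any(name.startswith(s) for s in evil_prefix) or any(
        name.endswith(s) for s in evil_suffix
    )


def non_evil_name(name: str) -> str:
    """Strip the Evil Twin marker from the start or end of the name only."""
    for s in evil_prefix:
        if name.startswith(s):
            return name[len(s) :]
    for s in evil_suffix:
        if name.endswith(s):
            return name[: -len(s)]

    return name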


def start_pipeline(df):
    return df.copy()


def filter_set(df, set_id):
    # Copy the filtered frame so later column assignments do not act on a view.
    return df[df.set_id == set_id].copy()


def detect_evil_twins(df):
    df["evil_twin"] = df["deck"].apply(is_evil_twin)

    return df


def rename_evil_twins(df):
    df["non_evil_name"] = df["deck"].apply(non_evil_name)

    return df


def has_known_evil_twin(df):
    evil_twins = list(df[df.evil_twin].non_evil_name)
    df["known_evil_twin"] = df["deck"].isin(evil_twins)

    return df


twin_data = (
    deck_data.pipe(start_pipeline)
    .pipe(filter_set, 496)
    .pipe(detect_evil_twins)
    .pipe(rename_evil_twins)
    .pipe(has_known_evil_twin)
)
twin_data.head()

Unnamed: 0,index,deck,set_id,evil_twin,non_evil_name,known_evil_twin
0,0,Zim who is Deliberately Hot,496,False,Zim who is Deliberately Hot,False
1,1,Herr U. Quill‑ance,496,False,Herr U. Quill‑ance,False
2,2,Capitol the Bellicose,496,False,Capitol the Bellicose,False
5,5,Eternally Crafty Auto,496,False,Eternally Crafty Auto,False
6,6,Fullleaf the “Writer”,496,False,Fullleaf the “Writer”,False



In [4]:
twin_data[twin_data.evil_twin]

Unnamed: 0,index,deck,set_id,evil_twin,non_evil_name,known_evil_twin
13,13,Xixoera's Underhanded Villain’s Evil Twin,496,True,Xixoera's Underhanded Villain,False
89,89,"Gemello di Hoang, Difensore Furbo",496,True,"Hoang, Difensore Furbo",False
124,124,"""No Bark"" Nat Heilig’s Evil Twin",496,True,"""No Bark"" Nat Heilig",False
210,210,Evil Twin of Mrs. Zora E. Vaultinghause,496,True,Mrs. Zora E. Vaultinghause,False
651,651,Practical Bureaucrat Octalan’s Evil Twin,496,True,Practical Bureaucrat Octalan,False
...,...,...,...,...,...,...
89449,89927,Le Double Maléfique de Omen la chaste,496,True,Omen la chaste,False
89484,89962,"Le Double Maléfique de “Grise”, pyrrhonienne d...",496,True,"“Grise”, pyrrhonienne de Digiata",False
89489,89967,Gemello di Sif “Macchia” l’Alta,496,True,Sif “Macchia” l’Alta,False
89495,89973,"Deirdre, Fortress Spelunker’s Evil Twin",496,True,"Deirdre, Fortress Spelunker",False



In [5]:
twin_data[~twin_data.evil_twin]

Unnamed: 0,index,deck,set_id,evil_twin,non_evil_name,known_evil_twin
0,0,Zim who is Deliberately Hot,496,False,Zim who is Deliberately Hot,False
1,1,Herr U. Quill‑ance,496,False,Herr U. Quill‑ance,False
2,2,Capitol the Bellicose,496,False,Capitol the Bellicose,False
5,5,Eternally Crafty Auto,496,False,Eternally Crafty Auto,False
6,6,Fullleaf the “Writer”,496,False,Fullleaf the “Writer”,False
...,...,...,...,...,...,...
89517,89995,"Isadora la Deforme, Bandito della Cometa",496,False,"Isadora la Deforme, Bandito della Cometa",False
89519,89997,"“Pie”, guetteuse de Prowell",496,False,"“Pie”, guetteuse de Prowell",False
89520,89998,Babu Mandeville,496,False,Babu Mandeville,False
89521,89999,"Leif “Indie”, el Emplumado",496,False,"Leif “Indie”, el Emplumado",False



In [6]:
twin_data[twin_data.known_evil_twin]

Unnamed: 0,index,deck,set_id,evil_twin,non_evil_name,known_evil_twin
1016,1016,Santa R. Fink,496,False,Santa R. Fink,True
1247,1247,Odin “Occhio per Occhio” l’Immortale,496,False,Odin “Occhio per Occhio” l’Immortale,True
1387,1387,Discoverer Briseis Hedlei,496,False,Discoverer Briseis Hedlei,True
1752,1752,Prosper le jaune,496,False,Prosper le jaune,True
2176,2176,Calm Mx. Roberts,496,False,Calm Mx. Roberts,True
...,...,...,...,...,...,...
89033,89511,"Q. Roussel, greffière de Vinrampart",496,False,"Q. Roussel, greffière de Vinrampart",True
89244,89722,Pasha “Niana” Shand,496,False,Pasha “Niana” Shand,True
89269,89747,火山口浪人內爾,496,False,火山口浪人內爾,True
89305,89783,Daphnée “Zoom” l'accablante,496,False,Daphnée “Zoom” l'accablante,True



There is a trick in ecology for estimating the size of a population: catch a number of individuals on one day, mark them, and release them. The next day, catch individuals again and check how many are marked. If you caught and marked 50 animals on the first day and catch 100 the next, of which 10 are marked, then about 10 % of the population carries a mark, so the 50 animals from day one are roughly 10 % of the entire population. Hence the estimated total population size is 50 * 100 / 10 = 500. This is called the [Mark and Recapture](https://en.wikipedia.org/wiki/Mark_and_recapture) technique. We can do the same with Dark Tidings decks because of the presence of Evil Twin decks: each registered Evil Twin "marks" its non-evil counterpart, and every registered non-evil deck whose Evil Twin is also registered counts as a recapture.
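The arithmetic in the animal example can be written as a small function. This is a minimal sketch of the simple mark-recapture estimate (commonly called the Lincoln-Petersen estimator); the function name and argument names are illustrative and do not appear elsewhere in this notebook.

```python
def lincoln_petersen(marked, caught, recaptured):
    """Estimate population size from a mark-and-recapture experiment.

    marked:     individuals marked on the first day
    caught:     individuals caught on the second day
    recaptured: how many of the second-day catch carried a mark
    """
    if recaptured <= 0:
        raise ValueError("at least one recapture is needed for an estimate")
    return marked * caught / recaptured


# The worked example from the text: 50 marked, 100 caught, 10 recaptured.
print(lincoln_petersen(50, 100, 10))  # 500.0
```

The estimate is undefined without any recaptures, which is why the sketch raises an error in that case instead of dividing by zero.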

In [7]:
evil_decks = twin_data[twin_data.evil_twin].shape[0]
non_evil_decks = twin_data[~twin_data.evil_twin].shape[0]
pairs_found = twin_data[twin_data.known_evil_twin].shape[0]

print(
    f"There are {pairs_found} twin pairs found between {evil_decks} evil and {non_evil_decks} non-evil decks."
)

There are 691 twin pairs found between 3975 evil and 47457 non-evil decks.



In [8]:
total_non_evil_estimate = int(evil_decks * (non_evil_decks / pairs_found))
total_evil_estimate = int(total_non_evil_estimate * evil_decks / non_evil_decks)

print(
    f"There were approx. {total_non_evil_estimate} non-evil decks printed and {total_evil_estimate} evil ones, so there is a {evil_decks * 100/(evil_decks + non_evil_decks):.2f} % chance to get an Evil Twin."
)
print(
    f"So the estimated total number of Dark Tidings decks printed is {total_non_evil_estimate + total_evil_estimate}."
)
print(
    f"Currently about {(100 * (evil_decks+non_evil_decks)/(total_evil_estimate + total_non_evil_estimate)):.2f} % of them are included in this analysis."
)

There were approx. 272997 non-evil decks printed and 22866 evil ones, so there is a 7.73 % chance to get an Evil Twin.
So the estimated total number of Dark Tidings decks printed is 295863.
Currently about 17.38 % of them are included in this analysis.
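The figures above are point estimates and say nothing about their uncertainty. As a rough sketch only, the cell below applies the Chapman variant of the mark-recapture estimator together with its usual variance approximation to the non-evil deck count. Neither is used in the original analysis, the function name is hypothetical, and the interval assumes that registered decks behave like a random sample of all printed decks, which may not hold.

```python
import math


def chapman_estimate(marked, caught, recaptured):
    """Chapman estimator and its approximate standard error (illustrative)."""
    n_hat = (marked + 1) * (caught + 1) / (recaptured + 1) - 1
    variance = (
        (marked + 1)
        * (caught + 1)
        * (marked - recaptured)
        * (caught - recaptured)
        / ((recaptured + 1) ** 2 * (recaptured + 2))
    )
    return n_hat, math.sqrt(variance)


# Counts printed earlier in this notebook:
# 3975 evil decks ("marks"), 47457 non-evil decks, 691 pairs ("recaptures").
n_hat, se = chapman_estimate(3975, 47457, 691)

# Approximate 95 % interval under a normal approximation (an assumption).
low, high = n_hat - 1.96 * se, n_hat + 1.96 * se
print(f"Non-evil decks: about {n_hat:.0f} (roughly {low:.0f} to {high:.0f})")
```

The Chapman estimate stays close to the simple estimate here because the number of pairs is large; the interval mainly reflects that only 691 pairs were observed.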

