<a href="https://colab.research.google.com/github/ZarakiKanzaki/project-lunar-ML/blob/main/ExploratoryDataAnalysisScryfall.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Exploratory Data Analysis

### What's Scryfall?
Scryfall is a search engine for *Magic: The Gathering* cards. The website, Scryfall.com, launched in October 2016, claiming to be faster than similar sites. It also advertises itself as mobile-friendly, comprehensive, timely, and powerful. Subsequent development focused on expanding the database to include high-resolution scans and previously uncatalogued game features such as artwork, card backs, tokens, and extras.

In [1]:
!pip install requests
import requests
import json

url = "https://api.scryfall.com/bulk-data/oracle_cards"

response = requests.get(url)



In [2]:
if response.status_code == 200:
    data = response.json()
    print(json.dumps(data.get("download_uri"), indent=2))

"https://data.scryfall.io/oracle-cards/oracle-cards-20231103090141.json"


In [3]:
# `data` is the bulk-data metadata loaded in the previous cell.
download_uri = data.get("download_uri")
if download_uri:
    download_response = requests.get(download_uri)

    if download_response.status_code == 200:
        with open("bulk-files.json", "wb") as file:
            file.write(download_response.content)
        print("File downloaded successfully.")
    else:
        print(f"Failed to download file. Status code: {download_response.status_code}")
else:
    print("No 'download_uri' found in the JSON data.")


File downloaded successfully.


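The first thing to do with the bulk file is to clean it with pandas: drop silver-bordered cards, keep only the layouts of interest, remove the columns that are irrelevant to the analysis, and keep only Vintage-legal cards.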

In [4]:
import pandas as pd
df = pd.read_json("bulk-files.json")

# Drop silver-bordered cards (the original comparison had a trailing space and never matched).
df = df[df['border_color'] != 'silver']
desired_categories = ['normal', 'saga', 'meld', "prototype", "transform", "split", "adventure", "flip", "modal_dfc", "leveler", "class"]
df = df[df['layout'].isin(desired_categories)]

df = df.drop(["oracle_id","multiverse_ids","mtgo_id","mtgo_foil_id","tcgplayer_id","cardmarket_id","released_at","uri","scryfall_uri","highres_image","image_status","image_uris","lang","cmc","colors","color_identity","reserved","foil","nonfoil","finishes","oversized","promo","games",
"reprint", "variation", "set_id", "set", "set_name", "set_type", "set_uri", "set_search_uri", "scryfall_set_uri", "rulings_uri", "prints_search_uri","collector_number","digital","rarity","card_back_id","artist","artist_ids","illustration_id","flavor_text",
"frame","full_art","textless","booster","story_spotlight","edhrec_rank","prices","related_uris","purchase_uris","security_stamp","preview","penny_rank","arena_id","all_parts","frame_effects","watermark","produced_mana","tcgplayer_etched_id","promo_types",
"life_modifier","hand_modifier","attraction_lights","color_indicator","content_warning"], axis=1)

df = df[df['legalities'].apply(lambda x: x.get('vintage', '') == 'legal')]

In [5]:
df.head()

Unnamed: 0,object,id,name,layout,mana_cost,type_line,oracle_text,keywords,legalities,border_color,power,toughness,card_faces,loyalty
0,card,86bf43b1-8d4e-4759-bb2d-0b2e03ba7012,Static Orb,normal,{3},Artifact,"As long as Static Orb is untapped, players can...",[],"{'standard': 'not_legal', 'future': 'not_legal...",white,,,,
1,card,7050735c-b232-47a6-a342-01795bfd0d46,Sensory Deprivation,normal,{U},Enchantment — Aura,Enchant creature\nEnchanted creature gets -3/-0.,[Enchant],"{'standard': 'not_legal', 'future': 'not_legal...",black,,,,
2,card,e718b21b-46d1-4844-985c-52745657b1ac,Road of Return,normal,{G}{G},Sorcery,Choose one —\n• Return target permanent card f...,[Entwine],"{'standard': 'not_legal', 'future': 'not_legal...",black,,,,
3,card,036ef8c9-72ac-46ce-af07-83b79d736538,Storm Crow,normal,{1}{U},Creature — Bird,Flying (This creature can't be blocked except ...,[Flying],"{'standard': 'not_legal', 'future': 'not_legal...",white,1.0,2.0,,
4,card,b125d1e7-5d9b-4997-88b0-71bdfc19c6f2,Walking Sponge,normal,{1}{U},Creature — Sponge,{T}: Target creature loses your choice of flyi...,[],"{'standard': 'not_legal', 'future': 'not_legal...",black,1.0,1.0,,


In [None]:
df.describe()

## Card types

In *Magic: The Gathering* we can identify two main types of cards:


*   Permanents: As the name says, these are cards that stay on the battlefield until a specific effect removes them.
Permanent card types are:

            1. Creatures
            2. Artifacts
            3. Enchantments
            4. Lands
            5. Planeswalkers
            6. Battles


*   Non-permanents: These are cards that go to the graveyard as soon as they resolve. Non-permanents are:

            1. Sorceries
            2. Instants




### Permanent
> 110.1 <br> A permanent is a card or token on the battlefield. A permanent remains on the battlefield indefinitely. A card or token becomes a permanent as it enters the battlefield and it stops being a permanent as it’s moved to another zone by an effect or rule.


For more information about the rules on permanents, see section [110](https://media.wizards.com/2023/downloads/MagicCompRules%2020231013.pdf#%5B%7B%22num%22%3A37%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C88%2C571%2C0%5D) of the Comprehensive Rules. <br>
Below is an example of each type of permanent.

This is an example of a creature card. Like almost all of the following cards (planeswalkers aside), it has a blank loyalty value. You can find more rules about creatures [here](https://media.wizards.com/2023/downloads/MagicCompRules%2020231013.pdf#%5B%7B%22num%22%3A118%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C88%2C375%2C0%5D).

In [6]:
from IPython.display import Image, display
from tabulate import tabulate

url_images = 'https://api.scryfall.com/cards/named'
columns_to_display = ["name","layout","mana_cost","type_line","oracle_text", "power","toughness","loyalty"]

creature = requests.get(f"{url_images}?fuzzy=korvold+cursed&format=image&version=normal")

df[df['name']=='Korvold, Fae-Cursed King'][columns_to_display]

Unnamed: 0,name,layout,mana_cost,type_line,oracle_text,power,toughness,loyalty
18230,"Korvold, Fae-Cursed King",normal,{2}{B}{R}{G},Legendary Creature — Dragon Noble,"Flying\nWhenever Korvold, Fae-Cursed King ente...",4,4,


In [None]:
display(Image(data=creature.content))

This is an example of an artifact. Artifacts can have mixed types, for example artifact creatures, so the P/T columns contain a mix of filled and empty values (from now on I'll use P/T for Power/Toughness). More about the rules in section [301](https://media.wizards.com/2023/downloads/MagicCompRules%2020231013.pdf#%5B%7B%22num%22%3A116%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C88%2C522%2C0%5D) of the Comprehensive Rules.



In [None]:
artifact = requests.get(f"{url_images}?fuzzy=sol+ring&format=image&version=normal")
df[df['name']=='Sol Ring'][columns_to_display]

In [None]:
display(Image(data=artifact.content))

This is an example of an enchantment. As with artifacts, there can be mixed data such as enchantment creatures. For more about the rules on enchantments, see section [303](https://media.wizards.com/2023/downloads/MagicCompRules%2020231013.pdf#%5B%7B%22num%22%3A120%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C88%2C571%2C0%5D) of the Comprehensive Rules.

In [None]:
enchantment = requests.get(f"{url_images}?fuzzy=doubling+season&format=image&version=normal")
df[df['name']=='Doubling Season'][columns_to_display]

In [None]:
display(Image(data=enchantment.content))

This is an example of a land. For more information, see section [305](https://media.wizards.com/2023/downloads/MagicCompRules%2020231013.pdf#%5B%7B%22num%22%3A124%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C88%2C651%2C0%5D) of the Comprehensive Rules.

In [None]:
land = requests.get(f"{url_images}?fuzzy=forest&format=image&version=normal&set=who")
df[df['name']=='Forest'][columns_to_display]

In [None]:
display(Image(data=land.content))

This is an example of a planeswalker. In this case some of the data is missing, because this type of card has neither power nor toughness. For the rules, see section [306](https://media.wizards.com/2023/downloads/MagicCompRules%2020231013.pdf#%5B%7B%22num%22%3A126%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C88%2C709%2C0%5D) of the Comprehensive Rules.

In [None]:
planeswalker = requests.get(f"{url_images}?fuzzy=nissa+shadowed&format=image&version=normal")
df[df['name']=='Nissa of Shadowed Boughs'][columns_to_display]


In [None]:
display(Image(data=planeswalker.content))

Here's an example of a battle. This card type was introduced in the expansion *March of the Machine*. For more information about battles, see section [310](https://media.wizards.com/2023/downloads/MagicCompRules%2020231013.pdf#%5B%7B%22num%22%3A130%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C88%2C329%2C0%5D) of the Comprehensive Rules.

In [None]:
battle_face_front = requests.get(f"{url_images}?fuzzy=invasion+azgol&format=image&version=normal")
battle_face_back = requests.get(f"{url_images}?fuzzy=ashen+reaper&format=image&version=normal&face=back")
battle_face = pd.concat([pd.DataFrame(x) for x in df[df['name'].str.contains('Invasion of Azgol', case=False, na=False)]['card_faces']],
    ignore_index=True)

battle_face = battle_face.drop(["object","artist", 'artist_id', 'illustration_id', 'image_uris', 'flavor_name', 'color_indicator', 'flavor_text'], axis=1 )
battle_face

In [None]:
display(Image(data=battle_face_front.content))
display(Image(data=battle_face_back.content))

### Instants and sorceries (Non Permanents)
Instants, like sorceries, represent one-shot or short-term magical spells. They are never put onto the battlefield; instead, they take effect when their mana cost is paid and the spell resolves, and then are immediately put into the player's graveyard.
The difference between the two is their speed: sorceries can be cast only during the main phases, while instants can be cast at any time.


> 610.1. <br> A one-shot effect does something just once and doesn’t have a duration. Examples include
dealing damage, destroying a permanent, creating a token, and moving an object from one zone to
another.


Here is an example of an instant. For more information, see section [304](https://media.wizards.com/2023/downloads/MagicCompRules%2020231013.pdf#%5B%7B%22num%22%3A122%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C88%2C260%2C0%5D) of the Comprehensive Rules.

In [None]:
instant = requests.get(f"{url_images}?fuzzy=growth+spiral&format=image&version=normal&set=who")
df[df['name']=='Growth Spiral'][columns_to_display]

In [None]:
display(Image(data=instant.content))

Here is an example of a sorcery. For more information, see section [307](https://media.wizards.com/2023/downloads/MagicCompRules%2020231013.pdf#%5B%7B%22num%22%3A126%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C88%2C145%2C0%5D) of the Comprehensive Rules.

In [None]:
sorcery = requests.get(f"{url_images}?fuzzy=explore&format=image&version=normal&set=sld")
df[df['name']=='Explore'][columns_to_display]

In [None]:
display(Image(data=sorcery.content))

## Card Frame
Let's print the distinct layouts in the dataset, then look at an example of each one.

In [None]:
unique_layout = df['layout'].unique()
print(unique_layout)

### Normal Frame
For this layout, see any of the cards above (except the battle).

### Sagas
Saga is an enchantment type introduced in *Dominaria*. Each Saga tells the story of a key event from the past as it unfolds during each of your turns. Each separate step in the story is called a chapter, and is marked by a Roman numeral (I, II, III, etc.). Saga cards are historic.

In [None]:
saga = requests.get(f"{url_images}?fuzzy=urza+saga&format=image&version=normal")
display(Image(data=saga.content))

### Meld
Meld is a keyword action that means to turn two meld cards on the back side into one oversized card, if you control the proper pair. It was introduced in *Eldritch Moon* with three pairs. Meld returned with three more meld pairs in *The Brothers' War*.

In [None]:
meld_face_1 = requests.get(f"{url_images}?fuzzy=mishra+claimed&format=image&version=normal")
meld_face_2 = requests.get(f"{url_images}?fuzzy=phyrexian+dragon+engine&format=image&version=normal")
meld_huge = requests.get(f"{url_images}?fuzzy=mishra+lost&format=image&version=normal")
display(Image(data=meld_face_1.content))
display(Image(data=meld_face_2.content))
display(Image(data=meld_huge.content))

### Prototype

Prototype is a keyword ability introduced in *The Brothers' War* which allows an alternate version of a card to be cast for less than its normal mana cost. It is unique to artifacts and artifact creatures. Each prototype card has two sets of characteristics:

    1. Its default mana cost, power, and toughness – printed in their normal positions on the card. The mana cost is entirely colorless and greater than 4.
    2. A secondary set of color, mana cost, power and toughness printed in the prototype ability.

In [None]:
prototype = requests.get(f"{url_images}?fuzzy=arcane+proxy&format=image&version=normal")
df[df['name']=='Arcane Proxy'][columns_to_display]


In [None]:
display(Image(data=prototype.content))

### Split Card
Split cards are Magic cards with two card faces on the front side. A split card is literally "split" into two separate cards each with its card name, art, mana cost, text, etc. Split cards can only be instants and sorceries, not permanents.


Regular split cards are named with a "__________ and __________" convention while Aftermath cards use a "__________ to __________" convention. In *Guilds of Ravnica*, the card halves have alliterative names, starting with the same three letters.

In [None]:
split1 = requests.get(f"{url_images}?fuzzy=discovery+dispersal&format=image&version=normal")
split2 = requests.get(f"{url_images}?fuzzy=heaven+earth&format=image&version=normal")
display(Image(data=split1.content))
display(Image(data=split2.content))

### Adventures
Adventure is a spell type, a subtype seen on instants and sorceries attached to permanent cards, primarily appearing on creatures. It was introduced in *Throne of Eldraine*. Permanents with Adventures are called adventurer cards in the rules, and are referred to as cards that "have an Adventure" when this attribute is significant to other cards.

Initially, Adventures only appeared on creature cards, though the rules did not specify any type restrictions. As such, they could, and eventually did, appear on other permanent types. They currently appear on artifacts and enchantments in addition to creatures.

In [None]:
adventure = requests.get(f"{url_images}?fuzzy=mosswood&format=image&version=normal")
display(Image(data=adventure.content))

### Flip Cards
Flip cards are two cards in one. When a condition is met, the card is flipped and becomes the "other" part of the card. These cards, like *transform* and *modal double-faced* cards, have a specific structure in the `card_faces` column.

#### *Rulings*
You ignore the information on the bottom half of the card until the creature in play "flips" when certain heroic conditions are met. When you flip a hero, you turn it upside down and play with the other half of the card. All of the flipped versions are legendary and have powerful abilities.
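
The battle, flip, and double-faced cells in this notebook all flatten `card_faces` with the same `pd.concat` expression. A small helper could do this in one place. This is only a sketch: the name `faces_to_frame` and the toy rows are hypothetical, and `errors="ignore"` is my choice so that the drop does not fail when a face lacks one of the listed keys.

```python
import pandas as pd


def faces_to_frame(cards, name_fragment, drop=()):
    """Flatten the card_faces of every card whose name contains name_fragment."""
    mask = cards["name"].str.contains(name_fragment, case=False, na=False, regex=False)
    frames = [
        pd.DataFrame(faces)
        for faces in cards.loc[mask, "card_faces"]
        if isinstance(faces, list)  # single-faced cards have no face list
    ]
    if not frames:
        return pd.DataFrame()
    faces = pd.concat(frames, ignore_index=True)
    # errors="ignore" skips keys that this particular card does not have.
    return faces.drop(columns=list(drop), errors="ignore")


# Toy rows with illustrative values only; the real data comes from the bulk file.
toy_cards = pd.DataFrame({
    "name": ["Akki Lavarunner // Tok-Tok, Volcano Born", "Storm Crow"],
    "card_faces": [
        [
            {"object": "card_face", "name": "Akki Lavarunner"},
            {"object": "card_face", "name": "Tok-Tok, Volcano Born"},
        ],
        None,
    ],
})

flip_faces = faces_to_frame(toy_cards, "akki lavarunner", drop=["object", "flavor_name"])
print(flip_faces)
```

With the real dataframe the call would look like `faces_to_frame(df, "Akki Lavarunner", drop=["object", "artist"])`.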

In [None]:
flip_face_front = requests.get(f"{url_images}?fuzzy=akki+lavarunner&format=image&version=normal")

flip_face = pd.concat([pd.DataFrame(x) for x in df[df['name'].str.contains('Akki Lavarunner', case=False, na=False)]['card_faces']],
    ignore_index=True)

flip_face = flip_face.drop(["object","artist", 'artist_id', 'illustration_id', 'flavor_name'], axis=1 )
flip_face

In [None]:
from PIL import Image as AcquireImage
from io import BytesIO
display(Image(data=flip_face_front.content))
# A flipped card is read upside down, so rotate the image by 180 degrees.
rotated = AcquireImage.open(BytesIO(flip_face_front.content)).rotate(180)
display(rotated)

### Double Faced card
Double-faced cards (DFCs) in Magic have a regular card frame on each side, and no card back. Each face has a symbol to denote the front from the back. Traditional DFCs can be transformed or converted from their front face to their back face while modal DFCs can be played as either face but cannot transform or convert.
They can be literally any kind of card.
<br>
More information about the most iconic layout at the section [712](https://media.wizards.com/2023/downloads/MagicCompRules%2020231013.pdf#%5B%7B%22num%22%3A5%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C88%2C605%2C0%5D) of the Comprehensive Rules.

In [None]:
transform_face_front = requests.get(f"{url_images}?fuzzy=teachings+kirin&format=image&version=normal")
transform_face_back = requests.get(f"{url_images}?fuzzy=kirin-touched&format=image&version=normal&face=back")
transform_face = pd.concat([pd.DataFrame(x) for x in df[df['name'].str.contains('Kirin-Touched Orochi', case=False, na=False)]['card_faces']],
    ignore_index=True)

transform_face = transform_face.drop(["object","artist", 'artist_id', 'illustration_id', 'image_uris', 'flavor_name', 'color_indicator'], axis=1 )
transform_face

In [None]:
display(Image(data=transform_face_front.content))
display(Image(data=transform_face_back.content))


### Modal double-faced cards

Like previously released double-faced cards, modal double-faced cards have two card faces, one on each side of the card. But these cards don't transform. When you play a modal double-faced card, you choose which face you're playing.

Modal double-faced cards fit in the same design space as split cards, but the latter can only be instants and sorceries. This means MDFCs tend to have at least one side be permanent. Technically, there could be an MDFC with two instants and/or sorceries with text that couldn’t fit on a split card.


In [None]:
modal_face_front = requests.get(f"{url_images}?fuzzy=jorn+winter&format=image&version=normal")
modal_face_back = requests.get(f"{url_images}?fuzzy=jorn+winter&format=image&version=normal&face=back")
modal_face = pd.concat([pd.DataFrame(x) for x in df[df['name'].str.contains('Jorn, God of Winter', case=False, na=False)]['card_faces']],
    ignore_index=True)

modal_face = modal_face.drop(["object","artist", 'artist_id', 'illustration_id', 'image_uris', 'flavor_name'], axis=1 )
modal_face

In [None]:
display(Image(data=modal_face_front.content))
display(Image(data=modal_face_back.content))

### Leveler Card
Leveler cards feature striated text boxes, three power/toughness boxes and use the level up keyword. They are usually referred to as Levelers.

In [None]:
leveler = requests.get(f"{url_images}?fuzzy=hexdrinker&format=image&version=normal")
pd.options.display.max_colwidth = 1000
df[df['name']=='Hexdrinker'][columns_to_display]

In [None]:
display(Image(data=leveler.content))

### Class Card
Classes act as they would during a game of D&D. They are similar to the Level up mechanic, and have effects that stack as you level up.

Each Class has three abilities in sections of its text box, called Class abilities. The abilities are arranged vertically like Sagas and have vertical artwork of the D&D symbol of the creature class in its art. The first Class ability is active as long as you control the Class. The next two are activated abilities that allow it to level up. Class abilities are activated at sorcery speed, meaning during your main phase if the stack is empty. As mana is paid for the second ability, the Class will become level 2 and the first two class abilities are active. If a Class is level 2, you can activate the level 3 ability. Note that you can only activate a Class's level 3 ability if the Class is level 2.

Class abilities can be anything — static abilities, activated abilities, or triggered abilities.

A Class's level isn't tracked with or represented by counters. A Class's level is just something true about the permanent.

In [None]:
class_card = requests.get(f"{url_images}?fuzzy=rogue+class&format=image&version=normal")
df[df['name']=='Rogue Class'][columns_to_display]

In [None]:
display(Image(data=class_card.content))

## Stats and Info


In [None]:
df.info()
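
The cells below call row predicates such as `is_permanent_card` and `is_saga`, which are not defined anywhere in this notebook. A minimal sketch of what they might look like follows. The bodies are my assumption, not the original definitions: they inspect only the `layout`, `type_line`, and `oracle_text` columns, and for multi-face cards they look at the front face's type line.

```python
import pandas as pd

PERMANENT_TYPES = ("Creature", "Artifact", "Enchantment", "Land", "Planeswalker", "Battle")
NONPERMANENT_TYPES = ("Instant", "Sorcery")


def front_type_line(row):
    """Type line of the front face, or '' when the value is missing."""
    type_line = row.get("type_line")
    return type_line.split(" // ")[0] if isinstance(type_line, str) else ""


def is_permanent_card(row):
    return any(card_type in front_type_line(row) for card_type in PERMANENT_TYPES)


def is_nonpermanent_card(row):
    return any(card_type in front_type_line(row) for card_type in NONPERMANENT_TYPES)


def is_planeswalker_card(row):
    return "Planeswalker" in front_type_line(row)


def is_vanilla(row):
    # A vanilla creature is a normal-layout creature with no rules text.
    text = row.get("oracle_text")
    has_text = isinstance(text, str) and text.strip() != ""
    return row.get("layout") == "normal" and "Creature" in front_type_line(row) and not has_text


def layout_is(*layouts):
    """Build a row predicate that checks the Scryfall `layout` value."""
    return lambda row: row.get("layout") in layouts


is_double_faced = layout_is("transform", "modal_dfc")
is_meld = layout_is("meld")
is_flip = layout_is("flip")
is_split = layout_is("split")
is_adventure = layout_is("adventure")
is_saga = layout_is("saga")
is_leveler = layout_is("leveler")
is_class = layout_is("class")
is_prototype = layout_is("prototype")

# Toy rows with illustrative values, only to exercise the predicates.
toy = pd.DataFrame([
    {"name": "Storm Crow", "layout": "normal", "type_line": "Creature — Bird", "oracle_text": "Flying"},
    {"name": "Grizzly Bears", "layout": "normal", "type_line": "Creature — Bear", "oracle_text": ""},
    {"name": "Growth Spiral", "layout": "normal", "type_line": "Instant", "oracle_text": "Draw a card."},
    {"name": "Front // Back", "layout": "modal_dfc", "type_line": "Creature — God // Artifact", "oracle_text": None},
])
print(toy[toy.apply(is_vanilla, axis=1)]["name"].tolist())
```

Matching on substrings of `type_line` is crude but keeps the predicates short; a stricter version would split the type line into words first.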

#### Permanents

In [None]:
permanent_cards = df[df.apply(is_permanent_card, axis=1)]
permanent_cards.info()

In [None]:
permanent_cards.isna().sum()

#### Nonpermanent

In [None]:
nonpermanent_cards = df[df.apply(is_nonpermanent_card, axis=1)]
nonpermanent_cards.info()

In [None]:
nonpermanent_cards.isna().sum()

#### Vanilla Creatures
Vanilla creatures have an empty `oracle_text`, so they are not really interesting to analyze.

In [None]:
vanilla_cards = df[df.apply(is_vanilla, axis=1)]
vanilla_cards.info()

#### Double faced cards

In [None]:
double_faced_cards = df[df.apply(is_double_faced, axis=1)]
double_faced_cards.info()

In [None]:
double_faced_cards.isna().sum()

#### Meld Cards

In [None]:
meld_cards = df[df.apply(is_meld, axis=1)]
meld_cards.info()

In [None]:
meld_cards.isna().sum()

#### Flip Cards
As analyzed here, flip cards are stored in the dataframe in the same way as transform cards, so we could classify all of them as `multi_face_cards`.

In [None]:
flip_cards = df[df.apply(is_flip, axis=1)]
flip_cards.info()

In [None]:
flip_cards.isna().sum()

#### Split Cards
As analyzed here, split cards are stored in the dataframe in the same way as transform cards, so we could classify all of them as `multi_face_cards`.

In [None]:
split_cards = df[df.apply(is_split, axis=1)]
split_cards.info()

In [None]:
split_cards.isna().sum()

#### Adventure cards
As analyzed here, adventure cards are stored in the dataframe in the same way as transform cards, so we could classify all of them as `multi_face_cards`.

In [None]:
adventure_cards = df[df.apply(is_adventure, axis=1)]
adventure_cards.info()

In [None]:
adventure_cards.isna().sum()

#### Sagas
An interesting observation is that the `saga` layout seems to cover only Sagas that are not double-faced, unlike the ones in the *Kamigawa: Neon Dynasty* expansion.

In [None]:
saga_cards = df[df.apply(is_saga, axis=1)]
saga_cards.info()

In [None]:
saga_cards.isna().sum()

#### Levelers

In [None]:
leveler_cards = df[df.apply(is_leveler, axis=1)]
leveler_cards.info()

In [None]:
leveler_cards.isna().sum()

#### Class

In [None]:
class_cards = df[df.apply(is_class, axis=1)]
class_cards.info()

In [None]:
class_cards.isna().sum()

#### Prototype

In [None]:
prototype_cards = df[df.apply(is_prototype, axis=1)]
prototype_cards.info()

In [None]:
prototype_cards.isna().sum()

#### Planeswalker
For this specific case, cards with `oracle_text` and `mana_cost` which are NaN are from `multi_face_cards`
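
Since the top-level `oracle_text` is NaN for these multi-face cards, one option is to fall back to the text stored on each face. The sketch below assumes that every face dictionary may carry its own `oracle_text` key; the function name `oracle_text_with_faces` and the toy rows are hypothetical.

```python
import pandas as pd


def oracle_text_with_faces(row, separator="\n"):
    """Return the card's oracle text, falling back to the joined text of its faces."""
    text = row.get("oracle_text")
    if isinstance(text, str):
        return text
    faces = row.get("card_faces")
    if isinstance(faces, list):
        return separator.join(face.get("oracle_text", "") for face in faces)
    return None


# Placeholder rows, not real card text.
toy = pd.DataFrame({
    "name": ["Single-faced card", "Front face // Back face"],
    "oracle_text": ["Text of the single face.", None],
    "card_faces": [None, [{"oracle_text": "Front text."}, {"oracle_text": "Back text."}]],
})
toy["full_text"] = toy.apply(oracle_text_with_faces, axis=1)
print(toy[["name", "full_text"]])
```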

In [None]:
planeswalker_cards = df[df.apply(is_planeswalker_card, axis=1)]
planeswalker_cards.info()

In [None]:
planeswalker_cards.isna().sum()

In [None]:
print("Number of unique card names in the DataFrame:", df['name'].nunique())

In [None]:
df.isna().sum()

## Data Preparation

In [9]:
import nltk
import re

Let's start by identifying all cards' names

In [10]:
name_connector = ' // '
df['name_length'] = df['name'].apply(len)
card_names = df.sort_values(by='name_length', ascending=False)[['name', 'name_length']]
#card_names = df[~df['name'].str.contains(name_connector)]['name']
#card_with_multiple_names = df[df['name'].str.contains(name_connector)]['name']

#split_card_names = []

#for card in card_with_multiple_names:
#    card_splitted = card.split(name_connector)
#    split_card_names.extend(card_splitted)

#card_names = card_names.append(pd.Series(split_card_names))

# Keep only single-name cards (multi-face cards are joined with ' // ').
card_names = card_names[~card_names['name'].str.contains(name_connector, regex=False)]

# Write one regex-escaped name per line.
with open('card_names.txt', 'w') as f:
    for name in card_names['name']:
        f.write(re.escape(name) + '\n')


In [11]:
card_names

Unnamed: 0,name,name_length
6877,"Okina, Temple to the Grandfathers",33
6311,Narrow-Minded Baloney Fireworks,31
14301,"Liberator, Urza's Battlethopter",31
3746,Asmoranomardicadaistinaculdacar,31
7713,"Oviya Pashiri, Sage Lifecrafter",31
...,...,...
25720,Fly,3
29993,Nix,3
26814,Dub,3
16157,Hex,3


Next, we identify the symbols used on the cards.

In [12]:
#https://api.scryfall.com/symbology
symbols_request = requests.get('https://api.scryfall.com/symbology')
symbols = pd.DataFrame(symbols_request.json().get('data'))
symbols = symbols[symbols['funny'] == False]
symbols = symbols[['symbol', 'english']]
symbols


Unnamed: 0,symbol,english
0,{T},tap this permanent
1,{Q},untap this permanent
2,{E},an energy counter
3,{PW},planeswalker
4,{CHAOS},chaos
...,...,...
70,{B},one black mana
71,{R},one red mana
72,{G},one green mana
73,{C},one colorless mana


Now I'll take all `oracle_text` from the cards and put in a variable called `raw_oracle_text`.

In [13]:
raw_oracle_text = df[~df['oracle_text'].isna()].sort_values(by='name_length', ascending=False)
raw_oracle_text = raw_oracle_text[['oracle_text','name_length']]
raw_oracle_text

Unnamed: 0,oracle_text,name_length
6877,"{T}: Add {G}.\n{G}, {T}: Target legendary crea...",33
14301,Flash\nFlying\nYou may cast colorless spells a...,31
7713,"{2}{G}, {T}: Create a 1/1 colorless Servo arti...",31
6311,"{TK}{TK} — Whenever this creature attacks, you...",31
5353,{2}: The next time an artifact source of your ...,31
...,...,...
16157,Destroy six target creatures.,3
26814,Enchant creature\nEnchanted creature gets +2/+...,3
29303,Tek gets +0/+2 as long as you control a Plains...,3
9731,Convoke (Your creatures can help cast this spe...,3


From now on we'll work with only the first 1,000 cards, ordered by name length, because the replacement algorithm below takes too long on the full dataset.

In [14]:
NUMBER_OF_SAMPLES = 1000
RANDOM_SEED = 12

#raw_text_sample = raw_oracle_text.sample(n=NUMBER_OF_SAMPLES, random_state=RANDOM_SEED)
#card_names_sample = card_names.sample(n=NUMBER_OF_SAMPLES, random_state=RANDOM_SEED)

raw_text_sample = raw_oracle_text.head(NUMBER_OF_SAMPLES)
card_names_sample = card_names.head(NUMBER_OF_SAMPLES)



raw_text_sample = raw_text_sample.sort_values('name_length', ascending=False)
card_names_sample = card_names_sample.sort_values('name_length', ascending=False)

In [44]:
raw_text_sample

Unnamed: 0,oracle_text,name_length
6877,tap this permanent : Add one green mana .\none...,33
7713,"two generic mana one green mana , tap this per...",31
6311,a ticket counter a ticket counter — Whenever ...,31
5353,two generic mana : The next time an artifact s...,31
3746,"As long as you've discarded a card this turn, ...",31
...,...,...
18461,"When CARD_NAME dies, AMASS_ACT Zombies 2. (Put...",24
23449,Room abilities of dungeons you own trigger an ...,24
10827,+1: CREATE_ACT a 1/1 black Vampire creature to...,24
12406,tap this permanent : UNTAP_ACT another target ...,24


In [45]:
card_names_sample

Unnamed: 0,name,name_length
6877,"Okina, Temple to the Grandfathers",33
14301,"Liberator, Urza's Battlethopter",31
3746,Asmoranomardicadaistinaculdacar,31
7713,"Oviya Pashiri, Sage Lifecrafter",31
5353,Circle of Protection: Artifacts,31
...,...,...
17614,"Niv-Mizzet, the Firemind",24
9514,Grasp of the Hieromancer,24
19724,"Ayula, Queen Among Bears",24
27263,"Tocasia, Dig Site Mentor",24


In [15]:
# https://api.scryfall.com/catalog/ability-words
# https://api.scryfall.com/catalog/keyword-actions
# https://api.scryfall.com/catalog/keyword-abilities
ability_words = pd.DataFrame(requests.get('https://api.scryfall.com/catalog/ability-words').json().get('data'))
keyword_actions = pd.DataFrame(requests.get('https://api.scryfall.com/catalog/keyword-actions').json().get('data'))
keyword_abilities  = pd.DataFrame(requests.get('https://api.scryfall.com/catalog/keyword-abilities').json().get('data'))

In [46]:
ability_words

Unnamed: 0,0
0,Battalion
1,Bloodrush
2,Channel
3,Chroma
4,Cohort
5,Constellation
6,Converge
7,Delirium
8,Domain
9,Fateful hour


In [47]:
keyword_actions

Unnamed: 0,0
0,Seek
1,Activate
2,Attach
3,Cast
4,Counter
5,Create
6,Destroy
7,Discard
8,Double
9,Exchange


In [None]:
keyword_abilities

In [None]:
raw_oracle_text

In [17]:
def prepare_mask_ka(word):
    mask = re.sub(r"'s", "", word, flags=re.IGNORECASE)
    mask = mask.replace('-', '_')
    mask = mask.replace(' ', '_')
    mask = mask.upper()
    mask = f"{mask}_KA"
    return mask

def prepare_mask_action(word):
    mask = re.sub(r"'s", "", word, flags=re.IGNORECASE)
    mask = mask.replace('-', '_')
    mask = mask.replace(' ', '_')
    mask = mask.upper()
    mask = f"{mask}_ACT"
    return mask

def prepare_mask_aw(word):
    mask = re.sub(r"'s", "", word, flags=re.IGNORECASE)
    mask = mask.replace(' ', '_')
    mask = mask.upper()
    mask = f"{mask}_AW"
    return mask
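
The three `prepare_mask_*` helpers above differ only in their suffix, except that `prepare_mask_aw` leaves hyphens untouched. A single parameterized helper could cover all three cases. This is a sketch, and the name `prepare_mask` and the `replace_hyphens` flag are hypothetical.

```python
import re


def prepare_mask(word, suffix, replace_hyphens=True):
    """Turn a keyword such as 'First strike' into a token such as 'FIRST_STRIKE_KA'."""
    mask = re.sub(r"'s", "", word, flags=re.IGNORECASE)  # drop possessives
    if replace_hyphens:
        mask = mask.replace("-", "_")
    mask = mask.replace(" ", "_").upper()
    return f"{mask}_{suffix}"


print(prepare_mask("Cycling", "KA"))                              # CYCLING_KA
print(prepare_mask("First strike", "KA"))                         # FIRST_STRIKE_KA
print(prepare_mask("Create", "ACT"))                              # CREATE_ACT
print(prepare_mask("Fateful hour", "AW", replace_hyphens=False))  # FATEFUL_HOUR_AW
```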

In [20]:
def replace_symbols_with_english(text, symbols):
    for index, symbol in symbols.iterrows():
        text = text.replace(symbol['symbol'], f"{symbol['english']} ")

    return text

def replace_keyword_action(text, keyword_actions):
    for action in keyword_actions:
        mask = prepare_mask_action(action)
        pattern = re.compile(r'\b' + re.escape(action) + r'\b', re.IGNORECASE)
        text = pattern.sub(mask, text)

    return text

def replace_keyword_ability(text, keyword_abilities):
    for keyword in keyword_abilities:
        mask = prepare_mask_ka(keyword)
        pattern = re.compile(r'\b' + re.escape(keyword) + r'\b', re.IGNORECASE)
        text = pattern.sub(mask, text)

    return text

def replace_ability_word(text, ability_words):
    for ability_word in ability_words:
        mask = prepare_mask_aw(ability_word)
        # Ability words are followed by an em dash or, occasionally, an en dash.
        text = text.replace(f"{ability_word} —", mask)
        text = text.replace(f"{ability_word} –", mask)
    return text

def replace_card_names(text, card_names):
    for card_name in card_names['name']:
        text = text.replace(card_name, 'CARD_NAME')
        if "," in card_name:
            # Cards may refer to themselves by the part of the name before the comma.
            text = text.replace(card_name.split(',')[0], 'CARD_NAME')
    return text
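
The name list passed to `replace_card_names` is sorted by `name_length` in descending order. One likely reason (my assumption; the notebook does not state it) is that plain `str.replace` would otherwise let a short name clobber part of a longer one. The sketch below shows the effect with two names from this dataset, "Hex" and "Hexdrinker"; the helper `mask_names` and the sample sentence are illustrative, not real oracle text.

```python
def mask_names(text, names, token="CARD_NAME"):
    """Replace each name in order with a placeholder token."""
    for name in names:
        text = text.replace(name, token)
    return text


names = ["Hex", "Hexdrinker"]
text = "Hexdrinker has protection from instants."  # illustrative sentence

shortest_first = mask_names(text, sorted(names, key=len))
longest_first = mask_names(text, sorted(names, key=len, reverse=True))

print(shortest_first)  # CARD_NAMEdrinker has protection from instants.
print(longest_first)   # CARD_NAME has protection from instants.
```

Replacing longest names first avoids the partial match, although a short name can still match inside an ordinary word; word-boundary regexes, as used for keywords above, would be stricter.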



In [42]:
split_oracle_text_df = raw_text_sample['oracle_text'].str.split('\n', expand=True).stack().reset_index(level=1, drop=True)

split_oracle_text_df = split_oracle_text_df.rename('split_oracle_text')

split_oracle_text_df.reset_index(inplace=True, drop=True)
split_oracle_text_df


0               tap this permanent : Add one green mana .
1       one green mana , tap this permanent : Target l...
2       two generic mana one green mana , tap this per...
3       four generic mana one green mana , tap this pe...
4       a ticket counter a ticket counter  — Whenever ...
                              ...                        
2291    CYCLING_KA one generic mana one blue mana  (on...
2292    When you cycle CARD_NAME, UNTAP_ACT target per...
2293    +1: REVEAL_ACT the top five cards of your libr...
2294    −3: You may put a green creature card from you...
2295    −7: You get an emblem with "Whenever you CAST_...
Name: split_oracle_text, Length: 2296, dtype: object

In [43]:
distinct_split_oracle_text_df = split_oracle_text_df.drop_duplicates().reset_index(drop=True)
distinct_split_oracle_text_df

0               tap this permanent : Add one green mana .
1       one green mana , tap this permanent : Target l...
2       two generic mana one green mana , tap this per...
3       four generic mana one green mana , tap this pe...
4       a ticket counter a ticket counter  — Whenever ...
                              ...                        
1867    CYCLING_KA one generic mana one blue mana  (on...
1868    When you cycle CARD_NAME, UNTAP_ACT target per...
1869    +1: REVEAL_ACT the top five cards of your libr...
1870    −3: You may put a green creature card from you...
1871    −7: You get an emblem with "Whenever you CAST_...
Name: split_oracle_text, Length: 1872, dtype: object

Now let's replace every occurrence of a card name with a token called `CARD_NAME`.


In [24]:
clean_text_sample = raw_text_sample
clean_text_sample

Unnamed: 0,oracle_text,name_length
6877,"{T}: Add {G}.\n{G}, {T}: Target legendary crea...",33
7713,"{2}{G}, {T}: Create a 1/1 colorless Servo arti...",31
6311,"{TK}{TK} — Whenever this creature attacks, you...",31
5353,{2}: The next time an artifact source of your ...,31
3746,"As long as you've discarded a card this turn, ...",31
...,...,...
18461,"When Herald of the Dreadhorde dies, amass Zomb...",24
23449,Room abilities of dungeons you own trigger an ...,24
10827,+1: Create a 1/1 black Vampire creature token ...,24
12406,{T}: Untap another target permanent.\nCycling ...,24


In [25]:
clean_text_sample['oracle_text'] = clean_text_sample['oracle_text'].apply(replace_keyword_ability, keyword_abilities=keyword_abilities[0])
clean_text_sample

Unnamed: 0,oracle_text,name_length
6877,"{T}: Add {G}.\n{G}, {T}: Target legendary crea...",33
7713,"{2}{G}, {T}: Create a 1/1 colorless Servo arti...",31
6311,"{TK}{TK} — Whenever this creature attacks, you...",31
5353,{2}: The next time an artifact source of your ...,31
3746,"As long as you've discarded a card this turn, ...",31
...,...,...
18461,"When Herald of the Dreadhorde dies, amass Zomb...",24
23449,Room abilities of dungeons you own trigger an ...,24
10827,+1: Create a 1/1 black Vampire creature token ...,24
12406,{T}: Untap another target permanent.\nCYCLING_...,24


In [26]:
clean_text_sample['oracle_text'] = clean_text_sample['oracle_text'].apply(replace_ability_word, ability_words=ability_words[0])
clean_text_sample

Unnamed: 0,oracle_text,name_length
6877,"{T}: Add {G}.\n{G}, {T}: Target legendary crea...",33
7713,"{2}{G}, {T}: Create a 1/1 colorless Servo arti...",31
6311,"{TK}{TK} — Whenever this creature attacks, you...",31
5353,{2}: The next time an artifact source of your ...,31
3746,"As long as you've discarded a card this turn, ...",31
...,...,...
18461,"When Herald of the Dreadhorde dies, amass Zomb...",24
23449,Room abilities of dungeons you own trigger an ...,24
10827,+1: Create a 1/1 black Vampire creature token ...,24
12406,{T}: Untap another target permanent.\nCYCLING_...,24


In [27]:
clean_text_sample['oracle_text'] = clean_text_sample['oracle_text'].apply(replace_card_names, card_names=card_names_sample)
clean_text_sample

Unnamed: 0,oracle_text,name_length
6877,"{T}: Add {G}.\n{G}, {T}: Target legendary crea...",33
7713,"{2}{G}, {T}: Create a 1/1 colorless Servo arti...",31
6311,"{TK}{TK} — Whenever this creature attacks, you...",31
5353,{2}: The next time an artifact source of your ...,31
3746,"As long as you've discarded a card this turn, ...",31
...,...,...
18461,"When CARD_NAME dies, amass Zombies 2. (Put two...",24
23449,Room abilities of dungeons you own trigger an ...,24
10827,+1: Create a 1/1 black Vampire creature token ...,24
12406,{T}: Untap another target permanent.\nCYCLING_...,24


In [28]:
clean_text_sample['oracle_text'] = clean_text_sample['oracle_text'].apply(replace_keyword_action, keyword_actions=keyword_actions[0])
clean_text_sample

Unnamed: 0,oracle_text,name_length
6877,"{T}: Add {G}.\n{G}, {T}: Target legendary crea...",33
7713,"{2}{G}, {T}: CREATE_ACT a 1/1 colorless Servo ...",31
6311,"{TK}{TK} — Whenever this creature attacks, you...",31
5353,{2}: The next time an artifact source of your ...,31
3746,"As long as you've discarded a card this turn, ...",31
...,...,...
18461,"When CARD_NAME dies, AMASS_ACT Zombies 2. (Put...",24
23449,Room abilities of dungeons you own trigger an ...,24
10827,+1: CREATE_ACT a 1/1 black Vampire creature to...,24
12406,{T}: UNTAP_ACT another target permanent.\nCYCL...,24


In [29]:
clean_text_sample['oracle_text'] = clean_text_sample['oracle_text'].apply(replace_symbols_with_english, symbols=symbols)
clean_text_sample

Unnamed: 0,oracle_text,name_length
6877,tap this permanent : Add one green mana .\none...,33
7713,"two generic mana one green mana , tap this per...",31
6311,a ticket counter a ticket counter — Whenever ...,31
5353,two generic mana : The next time an artifact s...,31
3746,"As long as you've discarded a card this turn, ...",31
...,...,...
18461,"When CARD_NAME dies, AMASS_ACT Zombies 2. (Put...",24
23449,Room abilities of dungeons you own trigger an ...,24
10827,+1: CREATE_ACT a 1/1 black Vampire creature to...,24
12406,tap this permanent : UNTAP_ACT another target ...,24


In [None]:
clean_text_sample

Now we need to add the oracle text of all multi-faced cards.

In [None]:
multi_faced_cards = df[df.apply(is_multi_faced, axis=1)].sort_values(by='name_length', ascending=False)[['card_faces', 'name_length']]
print(multi_faced_cards)
face_oracle_text = []
for index, row in multi_faced_cards.iterrows():
    card_faces = row['card_faces']
    name_length = row['name_length']

    for face in card_faces:
        oracle_text = face['oracle_text']
        face_oracle_text.append((oracle_text, name_length))
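
The same flattening can be sketched with pandas' `explode`, which yields one row per face. The tiny DataFrame here is made-up data standing in for `multi_faced_cards`, and `face.get("oracle_text", "")` is an assumption that guards against a face without that key.

```python
import pandas as pd

# Made-up stand-in for `multi_faced_cards`: one row per card, a list of face dicts.
demo_cards = pd.DataFrame({
    "card_faces": [
        [{"oracle_text": "Front text"}, {"oracle_text": "Back text"}],
        [{"oracle_text": "Left half"}, {"name": "face without text"}],
    ],
    "name_length": [12, 9],
})

# One row per face; name_length is repeated for every face of the same card.
faces = demo_cards.explode("card_faces", ignore_index=True)

# Pull the text out of each face dict, falling back to an empty string.
faces["oracle_text"] = faces["card_faces"].apply(lambda face: face.get("oracle_text", ""))
face_text_df = faces[["oracle_text", "name_length"]]
print(face_text_df)
```

The resulting frame has the same two columns as the list of tuples built in the loop, so it could be concatenated in the same way.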

Time to append all the `face_oracle_text` entries to `raw_oracle_text`.

In [None]:
raw_oracle_text = pd.concat([raw_oracle_text,pd.DataFrame(face_oracle_text, columns=['oracle_text', 'name_length'])])
raw_oracle_text = raw_oracle_text.sort_values(by='name_length', ascending=False)
raw_oracle_text

Now I'll save all `oracle_text` to a CSV file and perform a quick replacement with the `sed` command.

In [None]:
raw_oracle_text.to_csv('raw_oracle_text.csv', index=False, header=False)

In [None]:
import re

# Keep only cards that actually have oracle text.
filtered_df = df[~df['oracle_text'].isna()]

# Names of Vintage-legal cards, sorted alphabetically.
card_names = filtered_df[filtered_df['legalities'].apply(lambda x: x.get('vintage', '') == 'legal')].sort_values('name', ascending=True)['name']

raw_oracle_text = filtered_df['oracle_text']

# Report every card name that appears as a whole word in some oracle text.
for card_name in card_names:
    escaped_card_name = re.escape(card_name)
    mask = raw_oracle_text.str.contains(fr'\b{escaped_card_name}\b', regex=True)

    if mask.any():
        matching_indexes = filtered_df[mask].index
        print(f"{card_name} is contained as a whole word in the following rows with index: {matching_indexes}")

In [None]:
# sed -i 's|$(cat card_names.txt | sed 's/$/\\|/' | tr -d '\n')'|CARD_NAME|g' raw_oracle_text.csv
sed_command = "sed -i 's|" + "$(cat card_names.txt | sed 's/$/\\|/' | tr -d '\n')" + "|CARD_NAME|g' raw_oracle_text.csv"

with open('replace.sed', 'w') as f:
    f.write(sed_command)


In [None]:
!sed 's/.*/s\/\\b&\\b\/CARD_NAME\/g/' card_names.txt > replace_cards.sed

In [None]:
!sed -n '/card name/p' replace_cards.sed

In [None]:
!sed -E -f replace_cards.sed raw_oracle_text.csv > processed_oracle_text.csv
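
Card names can contain characters that are special to `sed`, such as `/` in split-card names or `.` and `+`. One way to build the script more defensively is to generate it from Python. The sketch below is an alternative under stated assumptions, not the notebook's method: `sed_rule` and `write_sed_script` are hypothetical helpers, the `\b` word boundary is a GNU sed extension, and rules are written longest name first so that shorter names cannot break up longer ones.

```python
import re

# Characters that are special in a sed extended regex, plus the `/` delimiter.
_SED_SPECIAL = re.compile(r"([\\.^$*+?()\[\]{}|/])")

def sed_rule(card_name, token="CARD_NAME"):
    """Build one `s/\\bNAME\\b/TOKEN/g` rule (hypothetical helper)."""
    # Prefix every special character with a backslash.
    escaped = _SED_SPECIAL.sub(r"\\\1", card_name)
    return f"s/\\b{escaped}\\b/{token}/g"

def write_sed_script(card_names, path="replace_cards.sed"):
    """Write one rule per name, longest names first, and return the rules."""
    rules = [sed_rule(name) for name in sorted(card_names, key=len, reverse=True)]
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(rules) + "\n")
    return rules

print(sed_rule("Fire // Ice"))
```

A script written this way could be applied with the same `sed -E -f replace_cards.sed` call used above.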

In [None]:
card_names

In [None]:
df.head()

## Function Definitions

#### functions for cards

In [7]:
def is_multi_faced(row):
  return is_double_faced(row) or is_flip(row) or is_split(row) or is_adventure(row)

def is_saga(row):
  return 'saga' in row['layout'].lower()

def is_adventure(row):
  return 'adventure' in row['layout'].lower()

def is_prototype(row):
  return 'prototype' in row['layout'].lower()

def is_class(row):
  return 'class' in row['layout'].lower()

def is_leveler(row):
  return 'leveler' in row['layout'].lower()

def is_split(row):
  return 'split' in row['layout'].lower()

def is_double_faced(row):
  return is_transform(row) or is_modal_dfc(row)

def is_flip(row):
  return 'flip' in row['layout'].lower()

def is_meld(row):
  return 'meld' in row['layout'].lower()

def is_transform(row):
  return 'transform' in row['layout'].lower()

def is_modal_dfc(row):
  return 'modal_dfc' in row['layout'].lower()

def is_vanilla(row):
  return is_creature(row['type_line']) and is_null_or_empty(row['oracle_text']) and has_normal_layout(row)

def has_normal_layout(row):
  return 'normal' in row['layout'].lower()

def is_permanent_card(row):
  return (is_artifact_card(row)
          or is_creature_card(row)
          or is_enchantment_card(row)
          or is_land_card(row)
          or is_planeswalker_card(row)
          or is_battle_card(row))

def is_nonpermanent_card(row):
  return (is_instant_card(row)
          or is_sorcery_card(row))

def is_planeswalker_card(row):
  return is_planeswalker(row['type_line'])

def is_sorcery_card(row):
  return is_sorcery(row['type_line'])

def is_instant_card(row):
  return is_instant(row['type_line'])

def is_enchantment_card(row):
  return is_enchantment(row['type_line'])

def is_creature_card(row):
  return is_creature(row['type_line'])

def is_land_card(row):
  return is_land(row['type_line'])

def is_artifact_card(row):
  return is_artifact(row['type_line'])

def is_basic_land_card(row):
  return is_basic_land(row['type_line'])

def is_battle_card(row):
  return is_battle(row['type_line'])

def is_battle(type_line):
  return 'battle' in type_line.lower()

def is_basic_land(type_line):
  return 'basic' in type_line.lower() and 'land' in type_line.lower()

def is_artifact(type_line):
  return 'artifact' in type_line.lower()

def is_land(type_line):
  return 'land' in type_line.lower()

def is_planeswalker(type_line):
  return 'planeswalker' in type_line.lower()

def is_sorcery(type_line):
  return 'sorcery' in type_line.lower()

def is_instant(type_line):
  return 'instant' in type_line.lower()

def is_enchantment(type_line):
  return 'enchantment' in type_line.lower()

def is_creature(type_line):
  return 'creature' in type_line.lower()
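
Row-wise predicates like these are typically applied with `df.apply(..., axis=1)`. For larger frames a vectorized variant may be faster. The sketch below shows one possible alternative on made-up data, where `MULTI_FACED_LAYOUTS` is a hypothetical constant. It uses exact matching on `layout`, whereas the functions above use substring checks, so the two are only equivalent when `layout` holds a single layout name.

```python
import pandas as pd

# Hypothetical constant mirroring is_multi_faced:
# transform, modal_dfc, flip, split, and adventure layouts.
MULTI_FACED_LAYOUTS = {"transform", "modal_dfc", "flip", "split", "adventure"}

# Made-up rows for illustration.
demo = pd.DataFrame({
    "name": ["A", "B", "C"],
    "layout": ["normal", "Transform", "adventure"],
})

# Vectorized mask: lower-case the column once, then test membership.
multi_faced_mask = demo["layout"].str.lower().isin(MULTI_FACED_LAYOUTS)
print(demo[multi_faced_mask])
```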

#### other functions

In [8]:
def is_null_or_empty(input_string):
    # pd.isna covers both None and NaN, which is how pandas stores a missing oracle_text.
    return pd.isna(input_string) or input_string == ''
