# MTGJSON Exploration

In [1]:
import json
with open('../data/AllPrintings.json', 'r', encoding = 'UTF-8') as f:
    data = json.load(f)

card_dict = data['data']


In [2]:
def preview_dict(d, n=50):
    for i, (k, v) in enumerate(d.items()):
        if i >= n:
            break
        print(f"{i+1}. Key: {k}")
        print(f"   Value (type: {type(v)}):", repr(v)[:100], "...\n")  # Truncated preview


Let's have a look at the dictionary.

In [3]:
preview_dict(card_dict)

1. Key: 10E
   Value (type: <class 'dict'>): {'baseSetSize': 383, 'block': 'Core Set', 'booster': {'draft': {'boosters': [{'contents': {'basic':  ...

2. Key: 2ED
   Value (type: <class 'dict'>): {'baseSetSize': 302, 'block': 'Core Set', 'booster': {'default': {'boosters': [{'contents': {'common ...

3. Key: 2X2
   Value (type: <class 'dict'>): {'baseSetSize': 331, 'booster': {'collector': {'boosters': [{'contents': {'commonUncommonShowcase':  ...

4. Key: 2XM
   Value (type: <class 'dict'>): {'baseSetSize': 332, 'booster': {'box-topper': {'boosters': [{'contents': {'boxtopper': 1}, 'weight' ...

5. Key: 30A
   Value (type: <class 'dict'>): {'baseSetSize': 594, 'booster': {'draft': {'boosters': [{'contents': {'a30Basic': 2, 'a30Common': 7, ...

6. Key: 3ED
   Value (type: <class 'dict'>): {'baseSetSize': 306, 'block': 'Core Set', 'booster': {'default': {'boosters': [{'contents': {'common ...

7. Key: 40K
   Value (type: <class 'dict'>): {'baseSetSize': 617, 'block': 'Commander', 'cards

Okay, so the top level is sets. Let's look at one of the sets. I'll do one I'm familiar with.

In [4]:
set_dict = card_dict['TDM']

for key in set_dict.keys():
    print(key)

baseSetSize
booster
cards
cardsphereSetId
code
isFoilOnly
isOnlineOnly
keyruneCode
languages
mcmId
mcmIdExtras
mcmName
mtgoCode
name
releaseDate
sealedProduct
tcgplayerGroupId
tokenSetCode
tokens
totalSetSize
translations
type


Some of that is self-explanatory, but a lot of it isn't. I'm going to explore each of those keys.
## baseSetSize

In [5]:
print(set_dict['baseSetSize'])

291


Presumably that's the number of cards in the set. But that includes basic lands, which are numbered 272-291 once all the different printings are counted: 5 lands × 4 styles each = 20 cards. So this number is only roughly meaningful. At least it doesn't include alternate art styles, which would inflate the number to 426. The real number of unique cards in the set, including the 5 basic lands, is 291 - 20 + 5 = 276. Most sets are going to have some extra styles for each basic land, so I think this number is useful even if it's not a precise count of the number of unique cards.
## booster

In [6]:
print(set_dict['booster'])

{'play': {'boosters': [{'contents': {'common': 7, 'foil': 1, 'land': 1, 'rareMythicWithBoosterfun': 1, 'uncommon': 3, 'wildcard': 1}, 'weight': 63}, {'contents': {'common': 6, 'foil': 1, 'land': 1, 'rareMythicWithBoosterfun': 1, 'specialGuest': 1, 'uncommon': 3, 'wildcard': 1}, 'weight': 1}], 'boostersTotalWeight': 64, 'name': 'Tarkir: Dragonstorm Play Booster', 'sheets': {'common': {'cards': {'016939df-525f-58dc-85e1-3edf49a5cb6a': 1, '03619678-7720-578e-b980-96beb524ced1': 1, '0d0c0f15-2645-50a6-94a0-3c345b34988b': 1, '1e058df0-8555-5e70-8ab3-855ece3f0388': 1, '1e4a898c-eb1d-56cc-8f55-46ac93efcb43': 1, '220afd9d-cbb5-5e56-8f58-cedde65bed6f': 1, '28943f30-e152-53f4-83d8-d370c4684837': 1, '2925f044-bca0-5bb2-a342-b3a70d7ff41c': 1, '2aeff58b-6d33-5faf-8941-c778c073e8df': 1, '2e3b9f6b-f1de-5d43-ae9a-8d87f9c88b9b': 1, '32283932-026d-5cbb-aec6-10c6e91e0f19': 1, '33af6e3b-1dfd-513b-8ca1-1567740a6d7e': 1, '33da33e2-ff60-5206-97af-b5a0241b2fe0': 1, '363e307a-d1a6-5812-823d-6cc70273be10': 1, '

Well, that's not intuitive. Let's try another way of looking at it.

In [7]:
for key in set_dict['booster'].keys():
    print(key)

play


Okay, there's only one type of booster. I wonder why that is. Seems like there should be set and collector boosters listed, too. I'm curious whether this is different in different sets.

In [8]:
for key in card_dict['SNC']['booster'].keys():
    print(key)

arena
collector
collector-sample
draft
prerelease-brokers
prerelease-cabaretti
prerelease-maestros
prerelease-obscura
prerelease-riveteers
set
theme-brokers
theme-cabaretti
theme-maestros
theme-obscura
theme-riveteers


Weird. Maybe TDM just isn't up to date since it's the newest set. Let's see what's in the play dict.

In [9]:
for key, value in set_dict['booster']['play'].items():
    print(key, value)

boosters [{'contents': {'common': 7, 'foil': 1, 'land': 1, 'rareMythicWithBoosterfun': 1, 'uncommon': 3, 'wildcard': 1}, 'weight': 63}, {'contents': {'common': 6, 'foil': 1, 'land': 1, 'rareMythicWithBoosterfun': 1, 'specialGuest': 1, 'uncommon': 3, 'wildcard': 1}, 'weight': 1}]
boostersTotalWeight 64
name Tarkir: Dragonstorm Play Booster
sheets {'common': {'cards': {'016939df-525f-58dc-85e1-3edf49a5cb6a': 1, '03619678-7720-578e-b980-96beb524ced1': 1, '0d0c0f15-2645-50a6-94a0-3c345b34988b': 1, '1e058df0-8555-5e70-8ab3-855ece3f0388': 1, '1e4a898c-eb1d-56cc-8f55-46ac93efcb43': 1, '220afd9d-cbb5-5e56-8f58-cedde65bed6f': 1, '28943f30-e152-53f4-83d8-d370c4684837': 1, '2925f044-bca0-5bb2-a342-b3a70d7ff41c': 1, '2aeff58b-6d33-5faf-8941-c778c073e8df': 1, '2e3b9f6b-f1de-5d43-ae9a-8d87f9c88b9b': 1, '32283932-026d-5cbb-aec6-10c6e91e0f19': 1, '33af6e3b-1dfd-513b-8ca1-1567740a6d7e': 1, '33da33e2-ff60-5206-97af-b5a0241b2fe0': 1, '363e307a-d1a6-5812-823d-6cc70273be10': 1, '3768817d-2372-5cb6-8a90-b29

Okay, I think I get the picture. This could be useful if I want to look at the history of the distribution of rarities in booster packs. I wonder what 'boostersTotalWeight' is. MTGJSON just says, "The weight of total booster pack configurations." Not very helpful. Maybe it's a commonly used term.

Well, I've tried looking it up, and all I've found is that some people literally weigh packs, believing that a heavier pack is more likely to have a mythic rare. That sounds like nonsense to me.

After looking back at the 'boosters' item, I think it's just one term of a ratio. In this case, 63/64 packs have the first distribution, and 1/64 packs have a Special Guest instead of one of the commons. So this just saves having to add together the weights of the different booster distributions.
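
To make the ratio reading concrete, here is a minimal sketch that turns each configuration's weight into a probability. The helper name `booster_probabilities` is my own, and the `play` dict is just the two TDM configurations printed above, retyped so the snippet runs without the full file.

```python
from fractions import Fraction

def booster_probabilities(booster):
    """Return (contents, probability) pairs, one per pack configuration."""
    total = booster['boostersTotalWeight']
    return [(config['contents'], Fraction(config['weight'], total))
            for config in booster['boosters']]

# The two TDM play-booster configurations shown in the output above.
play = {
    'boosters': [
        {'contents': {'common': 7, 'foil': 1, 'land': 1, 'rareMythicWithBoosterfun': 1,
                      'uncommon': 3, 'wildcard': 1}, 'weight': 63},
        {'contents': {'common': 6, 'foil': 1, 'land': 1, 'rareMythicWithBoosterfun': 1,
                      'specialGuest': 1, 'uncommon': 3, 'wildcard': 1}, 'weight': 1},
    ],
    'boostersTotalWeight': 64,
}

for contents, probability in booster_probabilities(play):
    # Prints the share of packs and whether that pack holds a Special Guest.
    print(probability, 'specialGuest' in contents)
```

If this reading is right, the probabilities for any booster type should sum to 1, which is an easy sanity check to run across all sets.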

sourceSetCodes seems to list the set codes that can possibly appear on cards from boosters in the set. You might think that would just be the set code for the set, but there are things like Special Guests that have their own set code (presumably to make it unambiguous that they do not have the same legality as in-set cards outside of Limited).

By the way, I have confirmed that Special Guests are not included in the baseSetSize, which makes sense. But I might want to count them when considering how many unique cards can appear in a Limited game in the set. I wonder whether that's tracked anywhere in MTGJSON. It might not be important because it has been consistently about a dozen cards in each set.
## cards
This is obviously going to be the main source of data. This will have to be unpacked in detail later, but let's take a quick look at one of the cards.

In [10]:
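for card in set_dict['cards']:
    # Restrict to base-set numbers because I only want to see one copy.
    # <= rather than <, since the card numbered baseSetSize is still in the base set.
    if (card['name'] == 'Boulderborn Dragon'
            and int(card['number']) <= set_dict['baseSetSize']):
        for key, value in card.items():
            print(key, value)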

artist Alexander Ostrowski
artistIds ['3c278fce-4a9d-4b16-8c84-d29addc394f5']
availability ['arena', 'mtgo', 'paper']
boosterTypes ['default']
borderColor black
colorIdentity []
colors []
convertedManaCost 5.0
edhrecRank 16550
finishes ['nonfoil', 'foil']
flavorText The draconic power that flowed out from the dragonstorms imbued the landscape with draconic features—scales, claws, and appetites.
foreignData [{'flavorText': 'Die drakonische Macht, die aus den Drachenstürmen entwich, füllte die Landschaft mit drakonischen Eigenschaften — Schuppen, Klauen und Appetit.', 'identifiers': {'multiverseId': '693990', 'scryfallId': '7b36a6da-3a7b-45d6-a850-a3f0cbd963af'}, 'language': 'German', 'multiverseId': 693990, 'name': 'Felsgeborener Drache', 'text': 'Fliegend, Wachsamkeit\nImmer wenn diese Kreatur angreift, wende Überwachen 1 an. (Schaue dir die oberste Karte deiner Bibliothek an. Du kannst sie auf deinen Friedhof legen.)', 'type': 'Artefaktkreatur — Drache'}, {'flavorText': 'El poder drac

I can see that some of this is going to be less interesting, and some is going to be crucial.
### artist
Sure, it might be interesting to see if there are trends in how many different artists contribute to a set and who sticks around over time.
### artistIds
I wonder why this is a list when there's only one artist. Can one artist have multiple IDs? Why? There should be a way for me to search for a card where this list is longer than 1.

In [11]:
for i in card_dict.values():
    for card in i['cards']:
        if len(card['artistIds']) > 1:
            print(card['name'])

Benalish Knight
Benalish Knight
Serra's Embrace
Serra's Embrace
Wall of Swords
Wall of Swords
Cryoclasm
Incinerate
Femeref Archers
Mentor of the Meek
Weathered Wayfarer
Aethersnipe
Grand Arbiter Augustin IV
Teneb, the Harvester
Weathered Wayfarer
Grand Arbiter Augustin IV
Teneb, the Harvester
Braids, Conjurer Adept
Phyrexian Metamorph
Avenger of Zendikar
Throne of Geth
Academy Ruins
Mishra's Factory
Mishra's Factory
Fold into Aether
Nim Grotesque
Ferocious Charge
Opaline Bracers
Relentless Assault
Spitting Drake
Circle of Protection: Blue
Circle of Protection: Blue
Eager Cadet
Eager Cadet
Healing Salve
Healing Salve
Knighthood
Knighthood
Master Healer
Master Healer
Final Fortune
Final Fortune
Sudden Impact
Sudden Impact
Reclaim
Reclaim
Eager Cadet
Aven Flock
Aven Flock
Circle of Protection: Blue
Circle of Protection: Blue
Healing Salve
Healing Salve
Master Healer
Master Healer
Concentrate
Concentrate
Death Pits of Rath
Sudden Impact
Sudden Impact
Aven Flock
Aven Flock
Weathered Wayfare

KeyError: 'artistIds'

Apparently, not every card even has this artistIds key. That's good to know. I'll have to look into what cards don't and why. For now, let's take a look at Benalish Knight. I see that there are two artists: Zoltan Boros & Gabor Szikszai. So, the artist key is not going to be useful by itself. The artistIds will be more meaningful for aggregate information, especially if there are artists who sometimes work independently and sometimes collaborate. For now, let's find out which cards don't have this key and why:
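
A minimal sketch of that search, assuming only the structure seen so far (sets at the top level, each with a 'cards' list). The helper name `cards_missing_key` and the tiny `sample` dict are mine; the artist IDs and the second card in it are placeholders, not real data.

```python
def cards_missing_key(card_dict, key):
    """Yield (set code, number, name) for every card that lacks `key`."""
    for set_code, set_dict in card_dict.items():
        for card in set_dict.get('cards', []):
            if key not in card:
                yield set_code, card.get('number'), card.get('name')

# Tiny stand-in for card_dict; IDs and the second card are hypothetical.
sample = {
    'XXX': {'cards': [
        {'name': 'Benalish Knight', 'number': '1', 'artistIds': ['id-1', 'id-2']},
        {'name': 'Card Without Art', 'number': '2'},
    ]}
}

for set_code, number, name in cards_missing_key(sample, 'artistIds'):
    print(set_code, number, name)
```

Run against the real `card_dict`, the same generator works for any other optional key, and using `in` instead of indexing avoids the KeyError above.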

Oh, okay. Some cards just don't have art! So if I want to do analysis involving artists, I have to remember to exclude those cards and focus on artistIds rather than artist.

### availability
Where the card exists (MTGA, MTGO, paper... is there anything else? Let's find out!)

In [None]:
for i in card_dict.values():
    for card in i['cards']:
        for a in card['availability']:
            if a not in ['arena', 'mtgo', 'paper']:
                print(' '.join([card['name'], card['setCode'], card['number'], ':', a]))

Aswan Jaguar PAST 1 : shandalar
Call from the Grave PAST 2 : shandalar
Faerie Dragon PAST 3 : shandalar
Goblin Polka Band PAST 4 : shandalar
Necropolis of Azar PAST 5 : shandalar
Orcish Catapult PAST 6 : shandalar
Power Struggle PAST 7 : shandalar
Prismatic Dragon PAST 8 : shandalar
Rainbow Knights PAST 9 : shandalar
Whimsy PAST 10 : shandalar
Pandora's Box PAST 11 : shandalar
Gem Bazaar PAST 12 : shandalar
Arden Angel PSDG 1 : dreamcast
Ashuza's Breath PSDG 2 : dreamcast
Camato Scout PSDG 3 : dreamcast
Hapato's Might PSDG 4 : dreamcast
Lydari Druid PSDG 5 : dreamcast
Lydari Elephant PSDG 6 : dreamcast
Murgish Cemetery PSDG 7 : dreamcast
Saji's Torrent PSDG 8 : dreamcast
Tornellan Protector PSDG 9 : dreamcast
Velukan Dragon PSDG 10 : dreamcast


Ah, interesting. There were two video games. These lists seem incomplete. Surely there were more than 12 cards in Shandalar and 10 in the Dreamcast game. Basic lands at least! Anyway, these cards are a curiosity. Shandalar seems like a precursor to Alchemy with mechanics that are obviously designed for a digital format. But I think for most purposes, these cards should be excluded.
### boosterTypes
Okay, this one just says, 'default' on Boulderborn Dragon. Can that be right? Let's look at all the cards:

In [None]:
for set_dict in card_dict.values():
    for card in set_dict['cards']:
        if 'boosterTypes' in card.keys():
            if card['boosterTypes'] != ['default']:
                print(card['name'], card['boosterTypes'])

Eager Cadet ['deck']
Vengeance ['deck']
Giant Octopus ['deck']
Sea Eagle ['deck']
Vizzerdrix ['deck']
Vizzerdrix ['deck']
Enormous Baloth ['deck']
Silverback Ape ['deck']
Eager Cadet ['deck']
Vengeance ['deck']
Coral Eel ['deck']
Giant Octopus ['deck']
Index ['deck']
Vizzerdrix ['deck']
Vizzerdrix ['deck']
Goblin Raider ['deck']
Enormous Baloth ['deck']
Spined Wurm ['deck']
Dawnfeather Eagle ['deck']
Alley Strangler ['deck']
Wrangle ['deck']
Ajani, Valiant Protector ['deck']
Inspiring Roar ['deck']
Ajani's Comrade ['deck']
Ajani's Aid ['deck']
Tranquil Expanse ['deck']
Tezzeret, Master of Metal ['deck']
Tezzeret's Betrayal ['deck']
Pendulum of Patterns ['deck']
Tezzeret's Simulacrum ['deck']
Submerged Boneyard ['deck']
Gideon, Martial Paragon ['deck']
Companion of the Trials ['deck']
Gideon's Resolve ['deck']
Graceful Cat ['deck']
Stone Quarry ['deck']
Liliana, Death Wielder ['deck']
Desiccated Naga ['deck']
Liliana's Influence ['deck']
Tattered Mummy ['deck']
Foul Orchard ['deck']
Cin

So, if it exists, it's either 'deck' or 'default'. I can't imagine what this means. I thought at first that 'deck' was for cards that can only be acquired in pre-built decks. But that doesn't make sense. Serra Redeemer is a regular set card. The documentation isn't helpful. It just says, "A list of types this card is in a booster pack." What does it mean for a card to be a 'deck' type in a booster pack?
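
Scrolling through printed names is a shaky way to confirm which values exist, so here is a sketch that tallies every distinct boosterTypes list instead. The function name `count_booster_types` and the `sample` dict are my own; the third sample card is a made-up stand-in for a card without the key.

```python
from collections import Counter

def count_booster_types(card_dict):
    """Count how often each distinct boosterTypes list occurs across all sets."""
    counts = Counter()
    for set_dict in card_dict.values():
        for card in set_dict.get('cards', []):
            if 'boosterTypes' in card:
                # Lists are unhashable, so use a tuple as the Counter key.
                counts[tuple(card['boosterTypes'])] += 1
    return counts

# Stand-in data: two values seen above plus one hypothetical card without the key.
sample = {
    'AAA': {'cards': [
        {'name': 'Eager Cadet', 'boosterTypes': ['deck']},
        {'name': 'Boulderborn Dragon', 'boosterTypes': ['default']},
        {'name': 'Card With No boosterTypes'},
    ]}
}

for types, n in count_booster_types(sample).most_common():
    print(types, n)
```

On the full data this would also show whether any card carries more than one value in the list, which the name-only printout can't rule out.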
### borderColor
Examples: "black", "borderless", "gold", "silver", "white", "yellow"
### cardParts
This sounds important, but apparently not every card has them.

In [None]:
for set_dict in card_dict.values():
    for card in set_dict['cards']:
        if 'cardParts' in card.keys():
            print(card['name'], card['setCode'], card['number'], card['cardParts'])

Phyrexian Dragon Engine // Mishra, Lost to Phyrexia BRO 163a ['Mishra, Claimed by Gix', 'Phyrexian Dragon Engine', 'Mishra, Lost to Phyrexia']
Mishra, Lost to Phyrexia BRO 163b ['Mishra, Claimed by Gix', 'Phyrexian Dragon Engine', 'Mishra, Lost to Phyrexia']
Titania, Voice of Gaea // Titania, Gaea Incarnate BRO 193 ['Titania, Voice of Gaea', 'Argoth, Sanctum of Nature', 'Titania, Gaea Incarnate']
Mishra, Claimed by Gix // Mishra, Lost to Phyrexia BRO 216 ['Mishra, Claimed by Gix', 'Phyrexian Dragon Engine', 'Mishra, Lost to Phyrexia']
Urza, Lord Protector // Urza, Planeswalker BRO 225 ['Urza, Lord Protector', 'The Mightstone and Weakstone', 'Urza, Planeswalker']
The Mightstone and Weakstone // Urza, Planeswalker BRO 238a ['Urza, Lord Protector', 'The Mightstone and Weakstone', 'Urza, Planeswalker']
Urza, Planeswalker BRO 238b ['Urza, Lord Protector', 'The Mightstone and Weakstone', 'Urza, Planeswalker']
Argoth, Sanctum of Nature // Titania, Gaea Incarnate BRO 256a ['Titania, Voice of G

Got it: cardParts is for cards that are played by melding other cards. Cards like Mishra, Lost to Phyrexia don't have a casting cost because they are played by melding two DFCs, each of which has half the melded card. That makes it tricky to decide whether to include the melded cards in analyses. For example, if I am interested in how many cards a set includes with a certain color identity, do I include the melded cards or not? I'm leaning towards no, because each melded card is the combined reverse faces of two other cards. But if I want to know how many Planeswalkers there are in a set, then I think Urza, Planeswalker should be included.
### colorIdentity
This is important for Commander deckbuilding rules. I need to know the data type.

In [None]:
# set_dict was reassigned by the loops above, so index TDM explicitly.
print(card_dict['TDM']['cards'][1]['colorIdentity'])

['W']


Okay, that's what I was expecting: a list of strings. I think it should be a set, but I can convert it as needed.
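
A quick sketch of that conversion, with a hypothetical helper `color_identity_set`. The two cards are trimmed-down versions of ones that appear in this notebook; the subset test is my shorthand for the Commander rule that a card's identity has to fit inside the commander's.

```python
def color_identity_set(card):
    """Return the card's color identity as a frozenset (empty if colorless)."""
    return frozenset(card.get('colorIdentity', []))

# Trimmed-down records with values taken from the outputs in this notebook.
brutal_cathar = {'name': 'Brutal Cathar // Moonrage Brute', 'colorIdentity': ['R', 'W']}
boulderborn = {'name': 'Boulderborn Dragon', 'colorIdentity': []}

# Order no longer matters once the list is a set.
print(color_identity_set(brutal_cathar) == {'W', 'R'})
# A colorless card fits inside any commander's identity.
print(color_identity_set(boulderborn) <= {'W'})
# A red-white card does not fit in a mono-white deck.
print(color_identity_set(brutal_cathar) <= {'W'})
```

A frozenset is hashable, so it can also serve as a dictionary key or a groupby value later on.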
### colorIndicator
I don't immediately understand what this is, and the documentation is once again not super helpful: "A list of all the colors in the color indicator. This is the symbol prefixed to a card's types." What symbol? Where? Oh, I see, this is the little circle that gives an unambiguous color identity to a card face without a casting cost, like the back of a TDFC. Won't this always be the same as colorIdentity? Let's find out:

In [None]:
for set_dict in card_dict.values():
    for card in set_dict['cards']:
        if 'colorIndicator' in card.keys() and card['colorIdentity'] != card['colorIndicator']:
            print(card['name'], card['setCode'], card['number'], card['colorIdentity'], card['colorIndicator'])

Brutal Cathar // Moonrage Brute DBL 7 ['R', 'W'] ['R']
Suspicious Stowaway // Seafaring Werewolf DBL 80 ['G', 'U'] ['G']
Panicked Bystander // Cackling Culprit DBL 295 ['B', 'W'] ['B']
Loyal Cathar // Unhallowed Cathar DKA 13 ['B', 'W'] ['B']
Cecil, Dark Knight // Cecil, Redeemed Paladin FIN 91 ['B', 'W'] ['W']
Terra, Magical Adept // Esper Terra FIN 245 ['B', 'G', 'R', 'U', 'W'] ['G', 'R']
Terra, Magical Adept // Esper Terra FIN 323 ['B', 'G', 'R', 'U', 'W'] ['G', 'R']
Cecil, Dark Knight // Cecil, Redeemed Paladin FIN 380 ['B', 'W'] ['W']
Cecil, Dark Knight // Cecil, Redeemed Paladin FIN 445 ['B', 'W'] ['W']
Terra, Magical Adept // Esper Terra FIN 511 ['B', 'G', 'R', 'U', 'W'] ['G', 'R']
Cecil, Dark Knight // Cecil, Redeemed Paladin FIN 525 ['B', 'W'] ['W']
Archangel Avacyn // Avacyn, the Purifier INR 11 ['R', 'W'] ['R']
Town Gossipmonger // Incited Rabble INR 46 ['R', 'W'] ['R']
Archangel Avacyn // Avacyn, the Purifier INR 449 ['R', 'W'] ['R']
Town Gossipmonger // Incited Rabble INR 

Shows what I know. The color identity is important for Commander deck-building, but the color indicator determines the color of the permanent. So Brutal Cathar is not a red permanent for the purposes of triggering Ajani, Nacatl Avenger, but Moonrage Brute is. And when Brutal Cathar is on the stack, it can't be targeted by a card that targets red spells. Since I've decided to treat each DFC as a single card for most analyses, neither colorIdentity nor colorIndicator is going to work perfectly for color distribution analysis. I might need to put together casting cost and colorIndicator...
### colors
Ah, there we are. That's exactly what 'colors' does. So this is the data to use for color analysis, except for Commander deckbuilding purposes. Again, it's a list of strings.
### convertedManaCost
I see that this is deprecated, and I should use manaValue instead. "Mana value" is the same thing as CMC, so this is not a problem.
### defense
This is something that will only show up on Battle cards, which are only in 1 set, which means there can't be any trends about them. If I'm ever interested in one-off mechanics that haven't come back, I'll flag Battles by the style, not this attribute.
### duelDeck
### edhrecRank
### edhrecSaltiness
### faceConvertedManaCost
### faceFlavorName
This refers to cards like Godzilla, King of the Monsters, which is just a fun wrapper on Zilortha, Strength Incarnate. Cards with this key can be left out of analyses the same way as alternate art cards, by requiring
    int(card['number']) <= set_dict['baseSetSize']
### faceManaValue
This is the "mana value of the face for either half or part of the card." How is this going to be different from manaValue?

In [None]:
for set_dict in card_dict.values():
    for card in set_dict['cards']:
        if 'faceManaValue' in card.keys():
            print('{}, {} ({}): {} vs {}'.format(card['faceName'], card['setCode'], card['number'], card['manaValue'], card['faceManaValue']))

Realm-Cloaked Giant, AFC (70): 7.0 vs 7.0
Cast Off, AFC (70): 7.0 vs 5.0
Dusk, AKH (210): 9.0 vs 4.0
Dawn, AKH (210): 9.0 vs 5.0
Commit, AKH (211): 10.0 vs 4.0
Memory, AKH (211): 10.0 vs 6.0
Never, AKH (212): 7.0 vs 3.0
Return, AKH (212): 7.0 vs 4.0
Insult, AKH (213): 6.0 vs 3.0
Injury, AKH (213): 6.0 vs 3.0
Mouth, AKH (214): 7.0 vs 3.0
Feed, AKH (214): 7.0 vs 4.0
Start, AKH (215): 6.0 vs 3.0
Finish, AKH (215): 6.0 vs 3.0
Reduce, AKH (216): 6.0 vs 3.0
Rubble, AKH (216): 6.0 vs 3.0
Destined, AKH (217): 6.0 vs 2.0
Lead, AKH (217): 6.0 vs 4.0
Onward, AKH (218): 6.0 vs 3.0
Victory, AKH (218): 6.0 vs 3.0
Spring, AKH (219): 9.0 vs 3.0
Mind, AKH (219): 9.0 vs 6.0
Prepare, AKH (220): 6.0 vs 2.0
Fight, AKH (220): 6.0 vs 4.0
Failure, AKH (221): 3.0 vs 2.0
Comply, AKH (221): 3.0 vs 1.0
Rags, AKH (222): 11.0 vs 4.0
Riches, AKH (222): 11.0 vs 7.0
Cut, AKH (223): 4.0 vs 2.0
Ribbons, AKH (223): 4.0 vs 2.0
Heaven, AKH (224): 3.0 vs 1.0
Earth, AKH (224): 3.0 vs 2.0
Dusk, AKR (16): 9.0 vs 4.0
Dawn, AKR 

Okay, interesting. Every MDFC and double card is registered twice, once per side/half. Do TDFCs not have this attribute? How can I find out?

In [None]:
for set_dict in card_dict.values():
    for card in set_dict['cards']:
        if 'faceName' in card.keys():
            if card['faceName'] in ['Ambitious Farmhand','Seasoned Cathar']:
                print('{}, {} ({}): {}'.format(card['faceName'], card['setCode'], card['number'], card['faceManaValue']))

Ambitious Farmhand, DBL (2): 2.0
Seasoned Cathar, DBL (2): 0.0
Ambitious Farmhand, INR (8): 2.0
Seasoned Cathar, INR (8): 0.0
Ambitious Farmhand, INR (448): 2.0
Seasoned Cathar, INR (448): 0.0
Ambitious Farmhand, MID (2): 2.0
Seasoned Cathar, MID (2): 0.0


Okay, they've got it, too, and as expected, the transformed face has a mana value of 0. So all DFCs and double cards are duplicated per side. This complicates my decision to treat these cases as one card. If it's safe to assume that the front comes first (and I can always shore that up with a string comparison to the beginning of 'name'), a card can be constructed by taking the name from the first half... The more I think about this, the more dissatisfied I am with the Scryfall style. But there are problems with the other approach, too: it leads to incorrect calculation of the number of unique cards in the set, which is relevant for color distribution. I guess each approach has its virtues for different questions, so I should be prepared to use both approaches. For some questions, I can use the data as it is; for others, I need to flatten it.
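
Here is a rough sketch of the flattening, assuming that the faces of one card share 'setCode' and 'number', as the outputs above suggest for split cards and DFCs. `flatten_faces` is a name I made up, and the front-face check is the string comparison against the start of 'name' mentioned above. Meld pieces with numbers like 163a and 163b would stay separate under this rule.

```python
def flatten_faces(cards):
    """Collapse per-face records into one record per (setCode, number)."""
    merged = {}
    for card in cards:
        key = (card['setCode'], card['number'])
        entry = merged.setdefault(key, {
            'name': card['name'],
            'setCode': card['setCode'],
            'number': card['number'],
            'faces': [],
        })
        face = card.get('faceName', card['name'])
        # Put the front face first: it is the one the full name starts with.
        if card['name'].startswith(face):
            entry['faces'].insert(0, face)
        else:
            entry['faces'].append(face)
    return list(merged.values())

# Stand-in records; the back face is deliberately listed first.
cards = [
    {'name': 'Ambitious Farmhand // Seasoned Cathar', 'faceName': 'Seasoned Cathar',
     'setCode': 'DBL', 'number': '2'},
    {'name': 'Ambitious Farmhand // Seasoned Cathar', 'faceName': 'Ambitious Farmhand',
     'setCode': 'DBL', 'number': '2'},
    {'name': 'Ugin, Eye of the Storms', 'setCode': 'TDM', 'number': '1'},
]

for entry in flatten_faces(cards):
    print(entry['setCode'], entry['number'], entry['faces'])
```

Keeping the unflattened list around as well leaves both views available, one per face and one per physical card.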
### faceName
Just what it sounds like: "The name on the face of the card."
### flavorName
Again, just a silly piece of information about alternate styles.
### flavorText
To me, this isn't as silly because it says something about the attitude of the canonical game flavor (the main version of each card), which is an important part of the experience. I'm interested in trends in the style, tone, length, and syntactical complexity of the flavor text, and questions like whether the qualities of the flavor text correlate with the saltiness of the card. That's a linguistic analysis project beyond my current means, but I can dream.
### foreignData
A big old mess of information about foreign language printings. I'm not interested in foreign language cards right now, so there's no point digging through this structure.
### frameEffects
### frameVersion
### hand
Wow, I didn't know about the Vanguard supplement. This is so cool, IMO! Why don't people play this way, except your opponent picks two characters you can use, and you choose one. That would add an interesting layer of strategy and meta-strategy to the game! Especially if you can force someone to change up the character in game 2. Anyway, I should ignore cards with this attribute for most purposes. I should start keeping a list of attributes to guard against.
### hasAlternativeDeckLimit
Cards that break the deck-building rules by allowing any number in the deck. Another interesting thing to track.
### hasContentWarning
The fact that these cards exist is part of the history of the game. I wonder how far back in time you have to go before Hasbro is willing to acknowledge content problems.

In [None]:
for set_dict in card_dict.values():
    for card in set_dict['cards']:
        if 'hasContentWarning' in card.keys() and card['hasContentWarning']:
            print(card['name'])

Crusade
Crusade
Crusade
Pradesh Gypsies
Crusade
Pradesh Gypsies
Crusade
Pradesh Gypsies
Crusade
Pradesh Gypsies
Jihad
Stone-Throwing Devils
Stone-Throwing Devils
Crusade
Crusade
Crusade
Crusade
Crusade
Crusade
Cleanse
Invoke Prejudice
Imprison
Pradesh Gypsies
Crusade
Cleanse
Crusade
Crusade
Pradesh Gypsies
Crusade


That's a short list! I'm surprised it doesn't include Earthbind. To see how far back these cards go, I'll need to take just the earliest printing of each one. This will be easier to do when I get the cards into a dataframe.
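
Until the dataframe exists, a plain-dict sketch can do the same job. It assumes each set's 'releaseDate' is an ISO-style 'YYYY-MM-DD' string, so that string comparison matches date order; I haven't checked that format. The function name, set codes, and dates in `sample` are all placeholders.

```python
def earliest_printings(card_dict, flag='hasContentWarning'):
    """Map each flagged card name to the (releaseDate, set code) of its earliest set."""
    earliest = {}
    for set_code, set_dict in card_dict.items():
        date = set_dict.get('releaseDate')
        if date is None:
            continue  # no date to compare, so skip the set
        for card in set_dict.get('cards', []):
            if card.get(flag):
                name = card['name']
                if name not in earliest or date < earliest[name][0]:
                    earliest[name] = (date, set_code)
    return earliest

# Placeholder set codes and dates, listed newest first on purpose.
sample = {
    'NEW': {'releaseDate': '2001-01-01',
            'cards': [{'name': 'Crusade', 'hasContentWarning': True},
                      {'name': 'Serra Angel'}]},
    'OLD': {'releaseDate': '2000-01-01',
            'cards': [{'name': 'Crusade', 'hasContentWarning': True}]},
}

print(earliest_printings(sample))
```

The same function works for any boolean flag, so it could be reused for other per-card attributes.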
### hasFoil
Deprecated. This is covered by 'finishes'.
### hasNonFoil
Deprecated. This is covered by 'finishes'.
### identifiers
Unique IDs that different systems use to track Magic cards. I know most of these, but I see some that are new to me: abu, csi, mcm, miniature market, multiverse, scg, and tnt. I'll have to look those up sometime and see if they have anything to offer.
### isAlternative
Ah, that gives me a simpler way to exclude flavor-named cards and other alternative cards.
### isFullArt
This will be one type of card with the same art as the main version, just a different presentation.
### isFunny
This is another one that might be excluded from a lot of analyses, though cards from these sets are not necessarily banned in all formats.
### isOnlineOnly
Seems like an alternative to filtering by whether or not 'paper' is in card['availability'].
### isOversized
This is something I'll almost always want to exclude.
### isPromo
Promo cards are like alternative cards as far as I'm concerned. I'll generally exclude them unless I'm specifically interested in the game outside the canonical core.
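
Since the list of attributes to guard against keeps growing, here is a sketch of a single filter that collects them. The name `is_canonical` and the choice of flags are mine, and it assumes these boolean keys are simply absent (or false) on ordinary cards, the way hasContentWarning was handled above. The sample cards and the 'hand' value are made up.

```python
# Flags named in the notes above; adjust per analysis (isFunny may be worth keeping).
GUARD_FLAGS = ('isAlternative', 'isOversized', 'isPromo', 'isFunny')

def is_canonical(card, guard_flags=GUARD_FLAGS):
    """Return True if the card carries none of the guarded attributes."""
    if 'hand' in card:  # Vanguard cards carry a hand modifier
        return False
    return not any(card.get(flag, False) for flag in guard_flags)

# Hypothetical records, each tripping a different guard.
cards = [
    {'name': 'Regular Card'},
    {'name': 'Promo Card', 'isPromo': True},
    {'name': 'Vanguard Card', 'hand': '+1'},
    {'name': 'Showcase Card', 'isAlternative': True},
]

kept = [card['name'] for card in cards if is_canonical(card)]
print(kept)
```

Passing a shorter `guard_flags` tuple makes it easy to rerun an analysis with, say, promos included.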
...
### isReprint
This is a convenient way for me to identify reprints. I can use this as an on/off switch to see if trends vary depending on whether reprints are included.
...
### keywords
This will be a really interesting thing to track, but it might not include cards that basically have the mechanic but not the keyword. Vigilance would be a good example. Will vigilance show up as a keyword for Serra Angel?

In [None]:
for set_dict in card_dict.values():
    for card in set_dict['cards']:
        if card['name'] == 'Serra Angel':
            print(card['keywords'])

['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigilance']
['Flying', 'Vigi

Yes! So 'keywords' is going to be pretty reliable for tracking mechanics, even before they were keyworded.
...
### legalities
This is an important characteristic, but it isn't useful for tracking trends because the original legality of the card will in many cases not be the same as its current legality.
...
### loyalty
Planeswalker loyalty is worth tracking. Has the typical starting loyalty changed much over time?
### manaCost
This will be sticky to work with. Example: "{1}{B}"

In [None]:
def show_attribute(card_dict, attribute, condition=None):
    """Print `attribute` for every card in every set that has it."""
    for set_dict in card_dict.values():
        show_attribute_set(set_dict, attribute, condition=condition)

def show_attribute_set(set_dict, attribute, condition=None):
    """Print `attribute` for each card in one set.

    `condition` is an optional function that takes a card and returns True to keep it.
    """
    for card in set_dict['cards']:
        if attribute in card and (condition is None or condition(card)):
            print('{}, {} ({}): {}'.format(card['name'], card['setCode'], card['number'], card[attribute]))

show_attribute_set(card_dict['TDM'], 'manaCost')

Ugin, Eye of the Storms, TDM (1): {7}
Anafenza, Unyielding Lineage, TDM (2): {2}{W}
Arashin Sunshield, TDM (3): {3}{W}
Bearer of Glory, TDM (4): {1}{W}
Clarion Conqueror, TDM (5): {2}{W}
Coordinated Maneuver, TDM (6): {1}{W}
Dalkovan Packbeasts, TDM (7): {2}{W}
Descendant of Storms, TDM (8): {W}
Dragonback Lancer, TDM (9): {3}{W}
Duty Beyond Death, TDM (10): {1}{W}
Elspeth, Storm Slayer, TDM (11): {3}{W}{W}
Fortress Kin-Guard, TDM (12): {1}{W}
Furious Forebear, TDM (13): {1}{W}
Lightfoot Technique, TDM (14): {1}{W}
Loxodon Battle Priest, TDM (15): {4}{W}
Mardu Devotee, TDM (16): {W}
Osseous Exhale, TDM (17): {1}{W}
Poised Practitioner, TDM (18): {2}{W}
Rally the Monastery, TDM (19): {3}{W}
Rebellious Strike, TDM (20): {1}{W}
Riling Dawnbreaker // Signaling Roar, TDM (21): {4}{W}
Riling Dawnbreaker // Signaling Roar, TDM (21): {1}{W}
Sage of the Skies, TDM (22): {2}{W}
Salt Road Packbeast, TDM (23): {5}{W}
Smile at Death, TDM (24): {3}{W}{W}
Starry-Eyed Skyrider, TDM (25): {2}{W}
Static

It really is just a string. So if I want to make any use of this, I'll have to parse it. Seems like it could have been a list. I wonder why it's done this way.
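
Parsing it doesn't have to be painful, though. A minimal sketch with a regular expression, assuming every symbol is wrapped in its own pair of braces as in the output above (`parse_mana_cost` is my own name):

```python
import re
from collections import Counter

# One symbol is whatever sits between a matching pair of braces.
SYMBOL = re.compile(r'\{([^{}]+)\}')

def parse_mana_cost(mana_cost):
    """Split a string like '{3}{W}{W}' into a list of symbols."""
    return SYMBOL.findall(mana_cost)

# Costs copied from the TDM output above.
for cost in ['{7}', '{2}{W}', '{3}{W}{W}']:
    symbols = parse_mana_cost(cost)
    print(cost, symbols, Counter(symbols))
```

Anything between the braces comes back as a single symbol, so a cost written with a slash inside one pair of braces would stay one item in the list.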
### manaValue
The mana value of the card. What used to be called converted mana cost.

In [None]:
show_attribute_set(card_dict['TDM'], 'manaValue')

Ugin, Eye of the Storms, TDM (1): 7.0
Anafenza, Unyielding Lineage, TDM (2): 3.0
Arashin Sunshield, TDM (3): 4.0
Bearer of Glory, TDM (4): 2.0
Clarion Conqueror, TDM (5): 3.0
Coordinated Maneuver, TDM (6): 2.0
Dalkovan Packbeasts, TDM (7): 3.0
Descendant of Storms, TDM (8): 1.0
Dragonback Lancer, TDM (9): 4.0
Duty Beyond Death, TDM (10): 2.0
Elspeth, Storm Slayer, TDM (11): 5.0
Fortress Kin-Guard, TDM (12): 2.0
Furious Forebear, TDM (13): 2.0
Lightfoot Technique, TDM (14): 2.0
Loxodon Battle Priest, TDM (15): 5.0
Mardu Devotee, TDM (16): 1.0
Osseous Exhale, TDM (17): 2.0
Poised Practitioner, TDM (18): 3.0
Rally the Monastery, TDM (19): 4.0
Rebellious Strike, TDM (20): 2.0
Riling Dawnbreaker // Signaling Roar, TDM (21): 5.0
Riling Dawnbreaker // Signaling Roar, TDM (21): 5.0
Sage of the Skies, TDM (22): 3.0
Salt Road Packbeast, TDM (23): 6.0
Smile at Death, TDM (24): 5.0
Starry-Eyed Skyrider, TDM (25): 3.0
Static Snare, TDM (26): 5.0
Stormbeacon Blade, TDM (27): 2.0
Stormplain Detainmen

Interesting that there aren't two mana values for Riling Dawnbreaker // Signaling Roar. I thought it would be like a split card, where the mana value is the sum of the costs of both halves. It's more like a card with an adventure, obviously. So the mana value is based on the cost of the "main" part of the card, but it can have a different mana value on the stack.
### name
The important thing to note here is that split cards, DFCs, and other cards with two halves have both halves represented in the name, divided by '//'.
### number
Ironically enough, the number is a string. That's because it can actually contain letters and even symbols representing promotional qualities. That means I can't always use my trick of avoiding alternates by restricting analysis to cards with numbers within the baseSetSize, because converting the number to an integer will fail if the card's number has letters or symbols in it.
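
One way around that is to test the string before converting it. This is only a sketch with a hypothetical helper name, `in_base_set`; it treats any number containing letters or symbols as outside the base set, which would wrongly drop lettered base-set cards such as the 163a and 163b meld pieces.

```python
def in_base_set(card, base_set_size):
    """Return True if the card's number is a plain integer within the base set."""
    number = card['number']
    # isascii() guards against digit-like characters that int() cannot parse.
    if not (number.isascii() and number.isdigit()):
        return False
    return 1 <= int(number) <= base_set_size

# Numbers seen in this notebook, plus 292 as a made-up number just past TDM's 291.
for number in ['21', '291', '292', '163a']:
    print(number, in_base_set({'number': number}, 291))
```

For lettered numbers, stripping a trailing letter before the comparison might be a better rule, depending on the question.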
...
### originalText
It could be interesting to see how many reprints have text different from the original text, but for most purposes this is not an important attribute.
...
### power
This is self-explanatory.
### printings
This seems like it might come in handy. It's a list of the set codes of every set in which the card was printed.
...
### rarity
This seems self-explanatory, but I don't actually know the history of rarity. Were there even "rare" cards in the original sets? Or did cards become rare because they weren't reprinted? I'll look into this. 
...
### rulings
It might be interesting to see whether there's a relationship between length of oracle text and number of rulings. This looks like a list of dictionaries.

In [None]:
show_attribute_set(card_dict['DSK'], 'rulings')

Acrobatic Cheerleader, DSK (1): [{'date': '2024-09-20', 'text': "If Acrobatic Cheerleader leaves the battlefield and then returns, it's a new object with no memory of its previous existence. Its ability will be able to trigger again."}, {'date': '2024-09-20', 'text': "If a creature with a survival ability isn't tapped when your second main phase begins, the ability won't trigger at all. You won't be able to tap it during your second main phase in time to have that ability trigger."}, {'date': '2024-09-20', 'text': "If a creature's survival ability triggers but that creature is untapped when the ability begins to resolve, that ability won't do anything."}, {'date': '2024-09-20', 'text': "If a creature's survival ability triggers but the creature leaves the battlefield before the ability resolves, use its tapped or untapped status as it last existed on the battlefield to determine whether or not the ability will do anything."}, {'date': '2024-09-20', 'text': "Once Acrobatic Cheerleader's

Each dictionary in the list represents a ruling, so the number of rulings can be calculated as len(card['rulings']).
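To look at the relationship between text length and number of rulings, something like the sketch below could collect one pair per card. It assumes that 'text' and 'rulings' may be missing on some cards, so it falls back to empty values. The helper name and the sample cards are made up for illustration.

```python
def rulings_vs_text_length(cards):
    """Return (text_length, ruling_count) pairs, one per card."""
    pairs = []
    for card in cards:
        text = card.get('text', '')        # assumed to be optional
        rulings = card.get('rulings', [])  # assumed to be optional
        pairs.append((len(text), len(rulings)))
    return pairs


# Made-up sample cards, not real MTGJSON entries.
sample_cards = [
    {'name': 'Hypothetical Bear', 'text': 'Trample',
     'rulings': [{'date': '2024-09-20', 'text': 'Example ruling.'}]},
    {'name': 'Hypothetical Vanilla'},
]
pairs = rulings_vs_text_length(sample_cards)
print(pairs)
```

On the real data this would be called once per set, for example with `set_dict['cards']`.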
...
### setCode
The card is part of the value of 'cards' in the set dictionary, so this isn't strictly necessary, but it's convenient. It means I can pretty easily make a flat dataframe of all cards with their setCodes. I'll just need to grab the release date for each set.
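A rough sketch of that flattening step, using plain dictionaries so it runs without any extra libraries. `flatten_cards` is a hypothetical helper, the choice of columns is just an example, and the sample data is made up.

```python
def flatten_cards(card_dict):
    """Return one row (a dict) per card, with the set's release date attached."""
    rows = []
    for set_dict in card_dict.values():
        release_date = set_dict.get('releaseDate')  # set-level attribute
        for card in set_dict.get('cards', []):
            rows.append({
                'name': card['name'],
                'setCode': card['setCode'],
                'number': card['number'],
                'releaseDate': release_date,
            })
    return rows


# Made-up sample in the same shape as data['data'].
sample = {
    'AAA': {'releaseDate': '2020-01-01',
            'cards': [{'name': 'Sample Card', 'setCode': 'AAA', 'number': '1'}]},
    'BBB': {'releaseDate': '2021-06-15',
            'cards': [{'name': 'Other Card', 'setCode': 'BBB', 'number': '7a'}]},
}
rows = flatten_cards(sample)
print(rows)
```

A list of dictionaries like this can be passed straight to `pandas.DataFrame(rows)` to get the flat dataframe.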
### subtypes
A list of all the subtypes found after the em-dash on the type line. Things like "Rogue," "Adventure," and "Siege."
### supertypes
These are the supertypes found before the em-dash on the type line. Things like "Basic" and "Legendary."
### text
Just a string with all the oracle text from the card.
### toughness
Creatures and vehicles (and maybe some other cards?) will have toughness and power. I'm curious whether there are any cards with power and toughness that are not creatures or vehicles. I would think any non-vehicles that can become creatures would handle power and toughness with counters. But let's see:

In [None]:
for set_dict in card_dict.values():
    for card in set_dict['cards']:
        if 'toughness' in card.keys() and 'Creature' not in card['types'] and 'Vehicle' not in card['subtypes']:
            print('{}, {} ({}): {}'.format(card['name'], card['setCode'], card['number'], card['type']))

Xyru Specter, CMB1 (51): Summon — Specter
Throat Wolf, CMB1 (65): Summon Wolf
Xyru Specter, CMB2 (51): Summon — Specter
Throat Wolf, CMB2 (65): Summon Wolf
Aswan Jaguar, PAST (1): Summon Jaguar
Faerie Dragon, PAST (3): Summon Dragon
Goblin Polka Band, PAST (4): Summon Goblin
Prismatic Dragon, PAST (8): Summon Dragon
Rainbow Knights, PAST (9): Summon Knights
1996 World Champion, PCEL (1): Summon Legend
Shichifukujin Dragon, PCEL (2): Summon Dragon
Aswan Jaguar, PMIC (1): Summon Jaguar
Marang River Regent // Coil and Catch // Marang River Regent, TDM (378): Instant — Omen
Scavenger Regent // Exude Toxin // Scavenger Regent, TDM (379): Sorcery — Omen
Bloomvine Regent // Claim Territory // Bloomvine Regent, TDM (381): Sorcery — Omen
Old Fogey, ULST (39): Summon — Dinosaur
Old Fogey, UND (67): Summon — Dinosaur
Atinlay Igpay, UNH (1): Eaturecray — Igpay
Old Fogey, UNH (106): Summon — Dinosaur
Old Fogey, UNH (106★): Summon — Dinosaur


Okay, it looks like a few creature cards still have the old Summon type, and a few (but not all) Omens have toughness listed for the spell half. Then there's Atinlay Igpay!
### uuid
The universally unique identifier (UUID v5) generated by MTGJSON.

## cardsphereSetId
This seems pretty self-explanatory. It's not going to be useful for trend analysis.
## code

In [None]:
print(set_dict['code'])

TDM


Okay, this is just the set code again, but as a key/value pair instead of just the key.
...
## isOnlineOnly and isPaperOnly
These are booleans I can use if I want to focus on paper Magic or on online-only sets.
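A small sketch of how the paper filter might look. It assumes either flag can be missing from a set (the TDM keys listed earlier include isOnlineOnly but not isPaperOnly), so a missing flag is treated as False. The helper name and the sample sets are made up.

```python
def paper_set_codes(card_dict):
    """Return the codes of sets that are not flagged as online only."""
    return [
        code
        for code, set_dict in card_dict.items()
        if not set_dict.get('isOnlineOnly', False)  # missing flag treated as False
    ]


# Made-up sample sets, for illustration only.
sample = {
    'AAA': {'isOnlineOnly': False},
    'BBB': {'isOnlineOnly': True},
    'CCC': {},  # flag missing entirely
}
paper_codes = paper_set_codes(sample)
print(paper_codes)
```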
...
## releaseDate
The release date in ISO 8601 format for the set. I need to add this to each card in the dataframe.
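Because the date is an ISO 8601 string, it should be possible to parse it with the standard library and sort sets chronologically. This sketch assumes every date is in plain YYYY-MM-DD form and skips sets without a 'releaseDate' key. The helper name and the sample data are made up.

```python
from datetime import date


def sets_by_release_date(card_dict):
    """Return set codes ordered from oldest to newest release date."""
    dated = [
        (date.fromisoformat(set_dict['releaseDate']), code)
        for code, set_dict in card_dict.items()
        if 'releaseDate' in set_dict  # skip sets without a date
    ]
    return [code for _, code in sorted(dated)]


# Made-up sample sets, for illustration only.
sample = {
    'NEW': {'releaseDate': '2025-04-11'},
    'OLD': {'releaseDate': '1994-04-01'},
    'MID': {'releaseDate': '2007-07-13'},
}
ordered = sets_by_release_date(sample)
print(ordered)
```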
...
## type
The expansion type of the set. Some examples:

In [13]:
set_types = []
# Collect each set type once, in order of first appearance.
for set_dict in card_dict.values():
    if set_dict['type'] not in set_types:
        set_types.append(set_dict['type'])
print(set_types)

['core', 'masters', 'memorabilia', 'commander', 'expansion', 'draft_innovation', 'starter', 'archenemy', 'box', 'masterpiece', 'arsenal', 'funny', 'promo', 'duel_deck', 'from_the_vault', 'premium_deck', 'alchemy', 'planechase', 'token', 'minigame', 'vanguard', 'treasure_chest', 'spellbook']
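Beyond listing the types, it might be useful to know how many sets fall under each one. A minimal sketch with `collections.Counter`, assuming every set has a 'type' key. The helper name and the sample sets are made up.

```python
from collections import Counter


def count_set_types(card_dict):
    """Count how many sets there are of each set type."""
    return Counter(set_dict['type'] for set_dict in card_dict.values())


# Made-up sample sets, for illustration only.
sample = {
    'AAA': {'type': 'core'},
    'BBB': {'type': 'expansion'},
    'CCC': {'type': 'core'},
}
type_counts = count_set_types(sample)
print(type_counts.most_common())
```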
