# All Printings Table

## Introduction

This notebook processes all card data from MTGJSON and uploads it into the PostgreSQL database mtg_db. This is done through the following steps:
- Download the json file from MTGJSON's file server
- Check the version and date of the json file
- Pre-process the dictionary and convert it into a dataframe
- Push the resulting dataframe to the "raw_data" schema of the database
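
The upload step itself is not shown in this notebook. As a rough sketch of the idea only, the snippet below uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, with an attached in-memory database named `raw_data` playing the role of the schema. The table name `all_printings` and the three columns are assumptions made for illustration, not the project's actual table definition.

```python
import sqlite3

# Toy rows standing in for the processed dataframe (illustrative only)
rows = [
    ("LEA", "Limited Edition Alpha", "1993-08-05"),
    ("LEB", "Limited Edition Beta", "1993-10-04"),
]

# SQLite stand-in: an attached database named raw_data mimics a schema
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS raw_data")

# Full refresh: drop and recreate the hypothetical target table, then insert
with conn:
    conn.execute("DROP TABLE IF EXISTS raw_data.all_printings")
    conn.execute(
        "CREATE TABLE raw_data.all_printings "
        "(SET_CODE TEXT, SET_NAME TEXT, RELEASE_DATE TEXT)"
    )
    conn.executemany("INSERT INTO raw_data.all_printings VALUES (?, ?, ?)", rows)

# Confirm how many rows landed in the table
uploaded = conn.execute("SELECT COUNT(*) FROM raw_data.all_printings").fetchone()[0]
```

Against PostgreSQL the same drop, create, and insert pattern would run through a PostgreSQL driver and connection instead of `sqlite3`.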

## Schemas

Set List Schema - Main table

| Column            | Renamed         | Datatype   | Description                                                                |
| ---               | ---             | ---        | ---                                                                        |
| code              | SET_CODE        | STRING     | The set code                                                               |
| name              | SET_NAME        | STRING     | Name of the set                                                            |
| baseSetSize       | BASE_SET_SIZE   | INTEGER    | The number of cards in the base set without promos or supplements          |
| booster           | BOOSTERS        |            |                                                                            |
| cardsphereSetId   | CS_SET_ID       | FLOAT      | ID for set in Cardsphere                                                   |
| mcmId             | CM_ID           | FLOAT      | Card Market set ID                                                         |
| mcmIdExtras       | CM_ID_ADD       | FLOAT      | If the set is split into two sets this is the additional Card Market ID    |
| mcmName           | CM_NAME         | STRING     | Name of the set on Card Market                                             |
| isFoilOnly        | FOIL_FLAG       | BOOLEAN    | Flag whether the set is only available as foils                            |
| isForeignOnly     | FOREIGN_FLAG    | BOOLEAN    | Flag whether the set is only available outside the US                      |
| keyruneCode       | KEYRUNE_CODE    | STRING     | ID for the keyrune database of set icons                                   |
| languages         | LANGUAGES       | LIST       | List of languages the set was printed in                                   |
| mtgoCode          | MTGO_SET_CODE   | STRING     | Set code on Magic The Gathering Online                                     |
| isNonFoilOnly     | NON_FOIL_FLAG   | BOOLEAN    | Flag whether the set is only available as non-foils                        |
| isOnlineOnly      | ONLINE_FLAG     | BOOLEAN    | Flag whether the set is only available in online formats                   |
| isPartialPreview  | PREVIEW_FLAG    | BOOLEAN    | Flag whether the set is still in preview and not complete                  |
| sealedProduct     | PRODUCT_INFO    | LIST       | Information about the purchasable sealed product                           |
| releaseDate       | RELEASE_DATE    | STRING     | Date the set was released, in format YYYY-MM-DD                            |
| block             | SET_BLOCK_NAME  | STRING     | Block the set is in, e.g. Kaladesh                                         |
| decks             | SET_DECKS       | LIST       | All decks associated with the set                                          |
| parentCode        | SET_PARENT_CODE | STRING     | Code of the parent set for set variations, e.g. promotions, guild kits etc |
| tokenSetCode      | SET_TOKEN_CODE  | STRING     | Code for the set's tokens                                                  |
| type              | SET_TYPE        | STRING     | The type of set, e.g. alchemy, commander, funny                            |
| tcgplayerGroupId  | TCGPG_ID        | INTEGER    | ID for the set on TCGplayer                                                |
| totalSetSize      | TOTAL_SET_SIZE  | INTEGER    | The number of cards in the set with promos and supplements                 |
| translations      | TRANSLATIONS    | DICTIONARY | The translated name of the set                                             |

## Python Libraries

In [2]:
import json
import requests
import lzma
from   tqdm                           import tqdm
import numpy                          as     np
import pandas                         as     pd

## Modular functions
# Setting the root path for finding the modules directory
import sys, os
sys.path.append(os.path.abspath(".."))
# Loading Modular functions
from   modules.data_recency import data_recency_check

# Clean-up
del sys, os

In [3]:
# Show all columns instead of truncating with "..."
pd.set_option("display.max_columns", None)

# (Optional) also show all rows
pd.set_option("display.max_rows", None)

# (Optional) widen the display area so columns don’t wrap badly
pd.set_option("display.width", None)

## Input

In [4]:
# URL for the MTGJSON file (example: AllPrintings)
url = "https://mtgjson.com/api/v5/AllPrintings.json.xz"

# Stream download the file to track progress
response = requests.get(url, stream=True)
response.raise_for_status()

# Prepare to track total size and read in chunks
total_size = int(response.headers.get('content-length', 0))  # total bytes; 0 if the header is missing
chunk_size = 1024 * 1024  # 1 MB per chunk
compressed_data = bytearray()  # store the downloaded bytes

# Clean-Up
del url, requests

In [5]:
# Iterate over response chunks, updating progress bar
with tqdm(total=total_size, unit='B', unit_scale=True, desc="Downloading") as pbar:
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:  # filter out keep-alive chunks
            compressed_data.extend(chunk)
            pbar.update(len(chunk))

# Clean-Up
del total_size, pbar, chunk, chunk_size, response

Downloading: 100%|██████████| 72.5M/72.5M [00:22<00:00, 3.18MB/s]


In [6]:
# Decompress the .xz file from the bytes you collected
decompressed_bytes = lzma.decompress(compressed_data)

# Parse JSON into a dictionary
dict__all_printings = json.loads(decompressed_bytes)

# Clean Up
del compressed_data, decompressed_bytes, json, lzma
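
As an alternative to decompressing into a separate bytes object first, the two steps can be combined by wrapping the downloaded bytes in a file-like object. This is a minimal sketch, assuming the payload is UTF-8 encoded JSON. `load_xz_json` is a hypothetical helper name and the sample payload is a toy stand-in for the real download. Note that `json.load` still reads the whole decoded text at once.

```python
import io
import json
import lzma

# Toy stand-in for the downloaded bytes: a tiny MTGJSON-shaped payload
sample = {"meta": {"date": "2025-09-30", "version": "5.2.2+20250930"}, "data": {}}
compressed_sample = lzma.compress(json.dumps(sample).encode("utf-8"))


def load_xz_json(compressed):
    """Decompress .xz bytes (bytes or bytearray) and parse the JSON inside."""
    # lzma.open accepts a file object; text mode decodes while decompressing
    with lzma.open(io.BytesIO(compressed), mode="rt", encoding="utf-8") as handle:
        return json.load(handle)


parsed_sample = load_xz_json(compressed_sample)
```

In the notebook this would be called as `load_xz_json(compressed_data)` before the clean-up `del`.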

## Pre-processing

In [7]:
# Checking the latest version of the input data
df__data_recency = data_recency_check(dict__all_printings, 'all printings')
display(df__data_recency)

# Clean-Up
del df__data_recency, data_recency_check

Unnamed: 0,json_type,latest_date,latest_version
0,all printings,2025-09-30,5.2.2+20250930
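
`data_recency_check` is loaded from the project's `modules` directory and its implementation is not shown here. The sketch below shows what a comparable check could look like, assuming the file carries a top-level `meta` block with `date` and `version` keys. `summarise_meta` and `days_since` are hypothetical names, not the project's functions.

```python
from datetime import date


def summarise_meta(payload, json_type):
    """Collect the date and version reported in the payload's meta block."""
    meta = payload.get("meta", {})
    return {
        "json_type": json_type,
        "latest_date": meta.get("date"),
        "latest_version": meta.get("version"),
    }


def days_since(iso_date, today):
    """Number of days between an ISO date string (YYYY-MM-DD) and today."""
    return (today - date.fromisoformat(iso_date)).days


# Toy payload mirroring the values displayed above
example_payload = {"meta": {"date": "2025-09-30", "version": "5.2.2+20250930"}, "data": {}}
example_summary = summarise_meta(example_payload, "all printings")
example_age = days_since(example_summary["latest_date"], date(2025, 10, 7))
```

A check like `days_since(...)` could be used to flag a file that is older than expected before processing continues.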


In [8]:
## Converting the first layer of JSON dictionary into dataframe
# Empty list for storing dataframes
list__set_data = []

# Listing the set codes
list__set_codes = list(dict__all_printings['data'].keys())

# Looping through the set codes making individual dataframes
for set_code in tqdm(list__set_codes, desc="Processing sets"):
    df__set = pd.json_normalize(dict__all_printings['data'][set_code], max_level=0)
    list__set_data.append(df__set)

# Concatenate sets into single DataFrame
df__sets = pd.concat(list__set_data, ignore_index=True)

# Clean Up
del list__set_data, list__set_codes, set_code, df__set, tqdm, dict__all_printings

Processing sets: 100%|██████████| 830/830 [00:13<00:00, 59.46it/s] 


In [9]:
# Making a copy of the cards data
df__cards = df__sets[['code'
                     ,'name'
                     ,'releaseDate'
                     ,'cards']].copy()

df__cards = df__cards.rename(columns={'code'         : 'SET_CODE'
                                     ,'name'         : 'SET_NAME'
                                     ,'releaseDate'  : 'RELEASE_DATE'
                                     ,'cards'        : 'CARDS'}).sort_values(by = 'RELEASE_DATE').reset_index(drop = True)

# Making a copy of the tokens data
df__tokens = df__sets[['code'
                      ,'name'
                      ,'releaseDate'
                      ,'tokenSetCode'
                      ,'tokens']].copy()

df__tokens = df__tokens.rename(columns={'code'         : 'SET_CODE'
                                       ,'name'         : 'SET_NAME'
                                       ,'releaseDate'  : 'RELEASE_DATE'
                                       ,'tokenSetCode' : 'SET_TOKEN_CODE'
                                       ,'tokens'       : 'TOKENS'}).sort_values(by = 'RELEASE_DATE').reset_index(drop = True)

# Clean-Up
del df__sets

## Main Code

In [10]:
df__cards.head()

Unnamed: 0,SET_CODE,SET_NAME,RELEASE_DATE,CARDS
0,LEA,Limited Edition Alpha,1993-08-05,"[{'artist': 'Dan Frazier', 'artistIds': ['059b..."
1,LEB,Limited Edition Beta,1993-10-04,"[{'artist': 'Dan Frazier', 'artistIds': ['059b..."
2,2ED,Unlimited Edition,1993-12-01,"[{'artist': 'Dan Frazier', 'artistIds': ['059b..."
3,CEI,Intl. Collectors' Edition,1993-12-10,"[{'artist': 'Dan Frazier', 'artistIds': ['059b..."
4,CED,Collectors' Edition,1993-12-10,"[{'artist': 'Dan Frazier', 'artistIds': ['059b..."


In [11]:
## Listing all the keys in the cards data model

all_keys = set()

for cards in df__cards['CARDS']:
    for card in cards:
        all_keys.update(card.keys())

for key in sorted(all_keys):
    print(key)

artist
artistIds
asciiName
attractionLights
availability
boosterTypes
borderColor
cardParts
colorIdentity
colorIndicator
colors
convertedManaCost
defense
duelDeck
edhrecRank
edhrecSaltiness
faceConvertedManaCost
faceFlavorName
faceManaValue
faceName
facePrintedName
finishes
flavorName
flavorText
foreignData
frameEffects
frameVersion
hand
hasAlternativeDeckLimit
hasFoil
hasNonFoil
identifiers
isAlternative
isFullArt
isFunny
isGameChanger
isOnlineOnly
isOversized
isPromo
isRebalanced
isReprint
isReserved
isStarter
isStorySpotlight
isTextless
isTimeshifted
keywords
language
layout
leadershipSkills
legalities
life
loyalty
manaCost
manaValue
name
number
originalPrintings
originalReleaseDate
originalText
otherFaceIds
power
printedName
printedText
printedType
printings
promoTypes
purchaseUrls
rarity
rebalancedPrintings
relatedCards
rulings
securityStamp
setCode
side
signature
sourceProducts
subsets
subtypes
supertypes
text
toughness
type
types
uuid
variations
watermark
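
Not every card carries every key listed above, so it can help to know how often each key appears before deciding which columns must allow missing values. The following is a minimal sketch using plain Python on toy data. `key_coverage` is a hypothetical helper and the sample cards are illustrative, not real records.

```python
from collections import Counter


def key_coverage(set_card_lists):
    """Return the share of cards (0.0 to 1.0) that carry each key."""
    counts = Counter()
    total = 0
    for cards in set_card_lists:
        for card in cards:
            counts.update(card.keys())
            total += 1
    if total == 0:
        return {}
    return {key: counts[key] / total for key in sorted(counts)}


# Toy input: two sets, three cards in total
toy_sets = [
    [{"name": "Card A", "uuid": "a"}, {"name": "Card B", "uuid": "b", "flavorText": "..."}],
    [{"name": "Card C", "uuid": "c"}],
]
coverage = key_coverage(toy_sets)
```

In the notebook the same function could be applied as `key_coverage(df__cards['CARDS'])`.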


In [12]:
df__cards[df__cards['SET_CODE'] == 'LEA']

Unnamed: 0,SET_CODE,SET_NAME,RELEASE_DATE,CARDS
0,LEA,Limited Edition Alpha,1993-08-05,"[{'artist': 'Dan Frazier', 'artistIds': ['059b..."


In [13]:
df__cards[df__cards['SET_CODE'] == 'LEA']['CARDS'].values[0][0]

{'artist': 'Dan Frazier',
 'artistIds': ['059bba56-5feb-42e4-8c2e-e2f1e6ba11f9'],
 'availability': ['paper'],
 'boosterTypes': ['default'],
 'borderColor': 'black',
 'colorIdentity': ['W'],
 'colors': ['W'],
 'convertedManaCost': 1.0,
 'edhrecRank': 20960,
 'finishes': ['nonfoil'],
 'foreignData': [],
 'frameVersion': '1993',
 'hasFoil': False,
 'hasNonFoil': True,
 'identifiers': {'cardKingdomId': '64004',
  'cardsphereId': '24523',
  'deckboxId': '4850',
  'mcmId': '5418',
  'mtgjsonV4Id': '5b4a162f-c574-5f7e-a883-375aa3ba6642',
  'multiverseId': '232',
  'scryfallCardBackId': '0aeebaf5-8c7d-4636-9e82-8c27447861f7',
  'scryfallId': 'd5c83259-9b90-47c2-b48e-c7d78519e792',
  'scryfallIllustrationId': '6757e04d-7bfc-4bdc-9dcb-02059a2d4e60',
  'scryfallOracleId': 'c7a6a165-b709-46e0-ae42-6f69a17c0621',
  'tcgplayerProductId': '1029'},
 'keywords': ['Enchant'],
 'language': 'English',
 'layout': 'normal',
 'legalities': {'commander': 'Legal',
  'duel': 'Legal',
  'legacy': 'Legal',
  'oat

In [14]:
pd.json_normalize(df__cards[df__cards['SET_CODE'] == 'LEA']['CARDS'].values[0][0], max_level=0)

Unnamed: 0,artist,artistIds,availability,boosterTypes,borderColor,colorIdentity,colors,convertedManaCost,edhrecRank,finishes,foreignData,frameVersion,hasFoil,hasNonFoil,identifiers,keywords,language,layout,legalities,manaCost,manaValue,name,number,originalText,printings,purchaseUrls,rarity,rulings,setCode,sourceProducts,subtypes,supertypes,text,type,types,uuid
0,Dan Frazier,[059bba56-5feb-42e4-8c2e-e2f1e6ba11f9],[paper],[default],black,[W],[W],1.0,20960,[nonfoil],[],1993,False,True,"{'cardKingdomId': '64004', 'cardsphereId': '24...",[Enchant],English,normal,"{'commander': 'Legal', 'duel': 'Legal', 'legac...",{W},1.0,Animate Wall,1,Target wall can now attack. Target wall's powe...,"[2ED, 30A, 3ED, 4BB, 4ED, 5ED, 6ED, CED, CEI, ...",{'cardKingdom': 'https://mtgjson.com/links/5ba...,rare,"[{'date': '2007-09-16', 'text': 'This is a cha...",LEA,{'nonfoil': ['7e17487a-7a6a-51ff-927e-7063226b...,[Aura],[],Enchant Wall\nEnchanted Wall can attack as tho...,Enchantment — Aura,[Enchantment],2b304dc1-8d7d-50a7-a310-2d0e5427935f


In [15]:
# Normalizing all LEA cards into one dataframe (one row per card)
df__cards_lea = pd.json_normalize(df__cards[df__cards['SET_CODE'] == 'LEA']['CARDS'].values[0], max_level=0)

In [16]:
# Replacing missing flavor text (NaN) with None
df__cards_lea['flavorText'] = df__cards_lea['flavorText'].where(pd.notna(df__cards_lea['flavorText']), None)

In [17]:
summary = []

for col in df__cards_lea.columns:
    type_counts = df__cards_lea[col].apply(lambda x: type(x).__name__).value_counts()
    n_types = len(type_counts)   # number of distinct datatypes in this column
    
    for dtype, count in type_counts.items():
        summary.append({
            "column_name": col,
            "datatype": dtype,
            "count": count,
            "datatype_count_per_column": n_types
        })

df_summary = pd.DataFrame(summary)

# Optional: sort for easier review
df_summary = df_summary.sort_values(
    ["column_name", "datatype"],
    ascending=[True, True]
).reset_index(drop=True)

df_summary

Unnamed: 0,column_name,datatype,count,datatype_count_per_column
0,artist,str,295,1
1,artistIds,list,295,1
2,availability,list,295,1
3,boosterTypes,list,295,1
4,borderColor,str,295,1
5,colorIdentity,list,295,1
6,colors,list,295,1
7,convertedManaCost,float,295,1
8,edhrecRank,float,295,1
9,edhrecSaltiness,float,295,1


In [21]:
# Checking for cards that do not have exactly one artist ID
df__cards_lea[df__cards_lea['artistIds'].apply(len) != 1]

Unnamed: 0,artist,artistIds,availability,boosterTypes,borderColor,colorIdentity,colors,convertedManaCost,edhrecRank,finishes,foreignData,frameVersion,hasFoil,hasNonFoil,identifiers,keywords,language,layout,legalities,manaCost,manaValue,name,number,originalText,printings,purchaseUrls,rarity,rulings,setCode,sourceProducts,subtypes,supertypes,text,type,types,uuid,edhrecSaltiness,flavorText,power,toughness,isReserved,hasContentWarning,isGameChanger,variations


In [32]:
# Normalizing the cards of the first set in df__cards (LEA, the earliest release)
cards = pd.json_normalize(df__cards['CARDS'].values[0], max_level=0)

In [28]:
df__cards.shape

(830, 4)

In [27]:
cards.shape

(295, 44)

In [25]:
cards.head()

Unnamed: 0,artist,artistIds,availability,boosterTypes,borderColor,colorIdentity,colors,convertedManaCost,edhrecRank,finishes,foreignData,frameVersion,hasFoil,hasNonFoil,identifiers,keywords,language,layout,legalities,manaCost,manaValue,name,number,originalText,printings,purchaseUrls,rarity,rulings,setCode,sourceProducts,subtypes,supertypes,text,type,types,uuid,edhrecSaltiness,flavorText,power,toughness,isReserved,hasContentWarning,isGameChanger,variations
0,Dan Frazier,[059bba56-5feb-42e4-8c2e-e2f1e6ba11f9],[paper],[default],black,[W],[W],1.0,20960.0,[nonfoil],[],1993,False,True,"{'cardKingdomId': '64004', 'cardsphereId': '24...",[Enchant],English,normal,"{'commander': 'Legal', 'duel': 'Legal', 'legac...",{W},1.0,Animate Wall,1,Target wall can now attack. Target wall's powe...,"[2ED, 30A, 3ED, 4BB, 4ED, 5ED, 6ED, CED, CEI, ...",{'cardKingdom': 'https://mtgjson.com/links/5ba...,rare,"[{'date': '2007-09-16', 'text': 'This is a cha...",LEA,{'nonfoil': ['7e17487a-7a6a-51ff-927e-7063226b...,[Aura],[],Enchant Wall\nEnchanted Wall can attack as tho...,Enchantment — Aura,[Enchantment],2b304dc1-8d7d-50a7-a310-2d0e5427935f,,,,,,,,
1,Jesper Myrfors,[c011318e-8503-48c1-a990-46e50aff48a0],[paper],[default],black,[W],[W],4.0,3928.0,[nonfoil],[],1993,False,True,"{'cardKingdomId': '64006', 'cardsphereId': '24...",,English,normal,"{'commander': 'Legal', 'duel': 'Legal', 'legac...",{3}{W},4.0,Armageddon,2,All lands in play are destroyed.,"[2ED, 30A, 3ED, 4BB, 4ED, 5ED, 6ED, A25, ATH, ...",{'cardKingdom': 'https://mtgjson.com/links/373...,rare,,LEA,{'nonfoil': ['7e17487a-7a6a-51ff-927e-7063226b...,[],[],Destroy all lands.,Sorcery,[Sorcery],22bf9c94-5e17-5b7d-ae82-e18f992a2ffb,2.83,,,,,,,
2,Mark Poole,[bfdeaf09-f915-4058-8e8b-bcac3bc43c33],[paper],[default],black,[W],[W],2.0,,[nonfoil],[],1993,False,True,"{'cardKingdomId': '64010', 'cardsphereId': '24...",,English,normal,"{'commander': 'Banned', 'duel': 'Banned', 'leg...",{1}{W},2.0,Balance,3,Whichever player has more lands in play must d...,"[2ED, 30A, 3ED, 4BB, 4ED, CED, CEI, EMA, FBB, ...",{'cardKingdom': 'https://mtgjson.com/links/0de...,rare,"[{'date': '2016-06-08', 'text': 'Balance doesn...",LEA,{'nonfoil': ['7e17487a-7a6a-51ff-927e-7063226b...,[],[],Each player chooses a number of lands they con...,Sorcery,[Sorcery],a1aa90b2-1c25-5c8f-8fed-46c295ef03b2,,,,,,,,
3,Douglas Shuler,[a9ddb513-51c7-455c-ab8f-5b90aae9f75b],[paper],[default],black,[W],[W],1.0,20248.0,[nonfoil],[],1993,False,True,"{'cardKingdomId': '64013', 'cardsphereId': '24...",[Banding],English,normal,"{'commander': 'Legal', 'duel': 'Legal', 'legac...",{W},1.0,Benalish Hero,4,Bands,"[2ED, 30A, 3ED, 4BB, 4ED, 5ED, CED, CEI, FBB, ...",{'cardKingdom': 'https://mtgjson.com/links/d16...,common,"[{'date': '2008-10-01', 'text': 'A maximum of ...",LEA,{'nonfoil': ['7e17487a-7a6a-51ff-927e-7063226b...,"[Human, Soldier]",[],"Banding (Any creatures with banding, and up to...",Creature — Human Soldier,[Creature],e645151c-9c01-5246-890d-becf25c794c8,0.36,Benalia has a complex caste system that change...,1.0,1.0,,,,
4,Dan Frazier,[059bba56-5feb-42e4-8c2e-e2f1e6ba11f9],[paper],[default],black,[W],[W],1.0,21632.0,[nonfoil],[],1993,False,True,"{'cardKingdomId': '64019', 'cardsphereId': '24...",[Enchant],English,normal,"{'commander': 'Legal', 'duel': 'Legal', 'legac...",{W},1.0,Black Ward,5,Target creature gains protection from black.,"[2ED, 30A, 3ED, 4BB, 4ED, CED, CEI, FBB, LEA, ...",{'cardKingdom': 'https://mtgjson.com/links/785...,uncommon,,LEA,{'nonfoil': ['7e17487a-7a6a-51ff-927e-7063226b...,[Aura],[],Enchant creature\nEnchanted creature has prote...,Enchantment — Aura,[Enchantment],5d993530-bac0-5533-b005-d098ea6071ab,0.11,,,,,,,
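
The cells above normalise the cards of a single set. One way to extend this to every set is to flatten the nested lists into one record per card, tagged with the columns of its parent set. This is a sketch only, written in plain Python on toy rows. `flatten_cards` is a hypothetical helper and the sample rows are illustrative rather than real records.

```python
def flatten_cards(set_rows):
    """Return one flat dict per card, tagged with its parent set's columns."""
    records = []
    for set_row in set_rows:
        for card in set_row["CARDS"]:
            record = dict(card)  # shallow copy so the source card is untouched
            record["SET_CODE"] = set_row["SET_CODE"]
            record["SET_NAME"] = set_row["SET_NAME"]
            record["RELEASE_DATE"] = set_row["RELEASE_DATE"]
            records.append(record)
    return records


# Toy rows shaped like df__cards (one row per set, cards as a list of dicts)
toy_rows = [
    {"SET_CODE": "LEA", "SET_NAME": "Limited Edition Alpha", "RELEASE_DATE": "1993-08-05",
     "CARDS": [{"name": "Animate Wall", "number": "1"}, {"name": "Armageddon", "number": "2"}]},
    {"SET_CODE": "LEB", "SET_NAME": "Limited Edition Beta", "RELEASE_DATE": "1993-10-04",
     "CARDS": [{"name": "Animate Wall", "number": "1"}]},
]
all_cards = flatten_cards(toy_rows)
```

In the notebook the input could come from `df__cards.to_dict("records")`, and the returned list could be passed to `pd.DataFrame` to build a single card-level dataframe.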
