# All Printings Table

## Introduction

This notebook processes all card data from MTGJSON and uploads it to the PostgreSQL database `mtg_db`, in the following steps:
- Download the json file from MTGJSON's file server
- Check the version and date of the json file
- Pre-process the dictionary and convert it into a dataframe
- Push the resulting dataframe to the `raw_data` schema of the database

## Schemas

Set List Schema - Main table

| Column            | Renamed         | Datatype   | Description                                                                |
| ---               | ---             | ---        | ---                                                                        |
| code              | SET_CODE        | STRING     | The set code                                                               |
| name              | SET_NAME        | STRING     | Name of the set                                                            |
| baseSetSize       | BASE_SET_SIZE   | INTEGER    | The number of cards in the base set without promos or supplements          |
| booster           | BOOSTERS        |            |                                                                            |
| cardsphereSetId   | CS_SET_ID       | FLOAT      | ID for set in Cardsphere                                                   |
| mcmId             | CM_ID           | FLOAT      | Card Market set ID                                                         |
| mcmIdExtras       | CM_ID_ADD       | FLOAT      | If the set is split into two sets this is the additional Card Market ID    |
| mcmName           | CM_NAME         | STRING     | Name of the set on Card Market                                             |
| isFoilOnly        | FOIL_FLAG       | BOOLEAN    | Flag whether the set is only available as foils                            |
| isForeignOnly     | FOREIGN_FLAG    | BOOLEAN    | Flag whether the set is only available outside the US                      |
| keyruneCode       | KEYRUNE_CODE    | STRING     | ID for the keyrune database of set icons                                   |
| languages         | LANGUAGES       | LIST       | List of languages the set was printed in                                   |
| mtgoCode          | MTGO_SET_CODE   | STRING     | Set code on Magic The Gathering Online                                     |
| isNonFoilOnly     | NON_FOIL_FLAG   | BOOLEAN    | Flag whether the set is only available as non-foils                        |
| isOnlineOnly      | ONLINE_FLAG     | BOOLEAN    | Flag whether the set is only available in online formats                   |
| isPartialPreview  | PREVIEW_FLAG    | BOOLEAN    | Flag whether the set is still in preview and not complete                  |
| sealedProduct     | PRODUCT_INFO    | LIST       | Information about the purchasable sealed product                           |
| releaseDate       | RELEASE_DATE    | STRING     | Date the set was released, in format YYYY-MM-DD                            |
| block             | SET_BLOCK_NAME  | STRING     | Block the set is in, e.g. Kaladesh                                         |
| decks             | SET_DECKS       | LIST       | All decks associated with the set                                          |
| parentCode        | SET_PARENT_CODE | STRING     | Code of the parent set for set variations, e.g. promotions, guild kits etc |
| tokenSetCode      | SET_TOKEN_CODE  | STRING     | Code for the set's tokens                                                  |
| type              | SET_TYPE        | STRING     | The type of set, e.g. alchemy, commander, funny                            |
| tcgplayerGroupId  | TCGPG_ID        | INTEGER    | ID for the set on TCGplayer                                                |
| totalSetSize      | TOTAL_SET_SIZE  | INTEGER    | The number of cards in the set with promos and supplements                 |
| translations      | TRANSLATIONS    | DICTIONARY | The translated name of the set                                             |
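
The Column-to-Renamed mapping in the table can be held in a plain dictionary and applied to a set record. The sketch below is a minimal illustration that uses only a few rows of the table and a made-up partial record; `RENAME_MAP` and `rename_keys` are hypothetical names, not part of the notebook's modules.

```python
# Partial mapping copied from the schema table (original key -> renamed column)
RENAME_MAP = {
    "code": "SET_CODE",
    "name": "SET_NAME",
    "baseSetSize": "BASE_SET_SIZE",
    "releaseDate": "RELEASE_DATE",
    "totalSetSize": "TOTAL_SET_SIZE",
}


def rename_keys(record, mapping):
    """Return a copy of record with keys renamed; unmapped keys are kept as-is."""
    return {mapping.get(key, key): value for key, value in record.items()}


# Illustrative record only; the values are placeholders, not taken from MTGJSON
example_set = {"code": "XXX", "name": "Example Set", "baseSetSize": 3, "type": "core"}
renamed = rename_keys(example_set, RENAME_MAP)
print(renamed)
```

With a dataframe, the same dictionary could be passed to `DataFrame.rename(columns=...)`, which is how the cards and tokens copies are renamed later in this notebook.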

## Python Libraries

In [None]:
import sys
import json
import requests
import lzma
from   tqdm                           import tqdm
import numpy                          as     np
import pandas                         as     pd
from   sqlalchemy                     import create_engine, inspect, Table, Column, MetaData, Text, Date, text
from   sqlalchemy.dialects.postgresql import insert

## Modular functions
# Setting the root path for finding the modules directory
import sys, os
sys.path.append(os.path.abspath(".."))
# Loading Modular functions
from   modules.data_recency import data_recency_check, recency_check_upload
from   modules.utils_dict   import print_dict_structure
from   modules.utils_memory import list_variables_memory

# Clean-up
del sys, os

In [2]:
# Show all columns instead of truncating with "..."
pd.set_option("display.max_columns", None)

# (Optional) also show all rows
pd.set_option("display.max_rows", None)

# (Optional) widen the display area so columns don’t wrap badly
pd.set_option("display.width", None)

## Input

### Database Connection

In [3]:
## Setting up credentials for accessing postgresql "mtg_db" database

# Credentials for setting up connection to postgresql
user     = "postgres"
password = "as:123bpostgresql"
host     = "localhost"
port     = "5432"
database = "mtg_db"

# Engine connection to postgresql
engine = create_engine(f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}")
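
Hard-coding the password in the notebook exposes it to anyone who can read the file. One alternative, sketched here under assumptions, is to read the credentials from environment variables and percent-encode the user and password before building the URL. The variable names `MTG_DB_USER`, `MTG_DB_PASSWORD`, `MTG_DB_HOST`, `MTG_DB_PORT` and `MTG_DB_NAME`, and the helper `build_postgres_url`, are hypothetical and not part of this project.

```python
import os
from urllib.parse import quote_plus


def build_postgres_url(env=os.environ):
    """Assemble a postgresql+psycopg2 URL from environment variables."""
    user = env.get("MTG_DB_USER", "postgres")
    password = env.get("MTG_DB_PASSWORD", "")
    host = env.get("MTG_DB_HOST", "localhost")
    port = env.get("MTG_DB_PORT", "5432")
    database = env.get("MTG_DB_NAME", "mtg_db")
    # Percent-encode user and password so characters such as ":" or "@"
    # cannot be mistaken for URL delimiters
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Example with an explicit mapping instead of the real environment
url = build_postgres_url({"MTG_DB_PASSWORD": "p@ss:word"})
print(url)
```

The resulting string could then be passed to `create_engine` in place of the f-string used above.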

In [4]:
## Creating the empty data_recency table if it does not exist
query = """
        CREATE TABLE IF NOT EXISTS raw_data.data_recency (
         json_type      TEXT PRIMARY KEY
        ,latest_date    DATE
        ,latest_version TEXT);
        """
with engine.begin() as conn:
    conn.execute(text(query))

### Input Data

In [5]:
# URL for the MTGJSON file (example: AllPrintings)
url = "https://mtgjson.com/api/v5/AllPrintings.json.xz"

# Stream download the file to track progress
response = requests.get(url, stream=True)
response.raise_for_status()

# Prepare to track total size and read in chunks
total_size = int(response.headers.get('content-length', 0))  # total bytes; 0 if the header is missing
chunk_size = 1024 * 1024  # 1 MB per chunk
compressed_data = bytearray()  # store the downloaded bytes

# Iterate over response chunks, updating progress bar
with tqdm(total=total_size, unit='B', unit_scale=True, desc="Downloading") as pbar:
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:  # filter out keep-alive chunks
            compressed_data.extend(chunk)
            pbar.update(len(chunk))

# Decompress the collected .xz bytes
decompressed_bytes = lzma.decompress(compressed_data)

# Parse JSON into a dictionary
dict__all_printings = json.loads(decompressed_bytes)

# Clean Up
del url, response, total_size, chunk_size, compressed_data
del pbar, chunk, decompressed_bytes

Downloading: 100%|██████████| 72.5M/72.5M [00:23<00:00, 3.08MB/s]
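
The download cell keeps the whole compressed file in memory before decompressing it. A possible variation, shown only as a sketch, feeds each chunk to `lzma.LZMADecompressor` as it arrives so that the compressed bytes never need to be accumulated. `decompress_chunks` is a hypothetical helper, and the small in-memory payload stands in for `response.iter_content()`.

```python
import json
import lzma


def decompress_chunks(chunks):
    """Decompress an iterable of .xz byte chunks incrementally."""
    decompressor = lzma.LZMADecompressor()
    output = bytearray()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            output.extend(decompressor.decompress(chunk))
    # eof is only True once the end-of-stream marker has been seen
    if not decompressor.eof:
        raise ValueError("Truncated .xz stream")
    return bytes(output)


# Stand-in for response.iter_content(): a small compressed JSON payload split into pieces
payload = lzma.compress(json.dumps({"meta": {}, "data": {"LEA": {}}}).encode("utf-8"))
pieces = [payload[i:i + 16] for i in range(0, len(payload), 16)]
parsed = json.loads(decompress_chunks(pieces))
print(parsed)
```

The decompressed JSON still has to fit in memory for `json.loads`, so this only avoids holding the compressed and decompressed copies at the same time.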


## Pre-processing

In [6]:
# Checking the latest version of the input data
df__data_recency = data_recency_check(dict__all_printings, 'all printings')
df__data_recency

|   | json_type     | latest_date | latest_version |
| --- | ---         | ---         | ---            |
| 0 | all printings | 2025-09-28  | 5.2.2+20250928 |
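
`data_recency_check` lives in the notebook's own `modules` package and its implementation is not shown here. As a rough sketch of the idea, assuming the file follows MTGJSON's top-level layout of a `meta` object with `date` and `version` keys, the check could read those two values into a single record. `extract_recency` is a hypothetical name and is not the author's function.

```python
def extract_recency(mtgjson_dict, json_type):
    """Build one recency record from the `meta` block of an MTGJSON dictionary."""
    meta = mtgjson_dict.get("meta", {})
    return {
        "json_type": json_type,
        "latest_date": meta.get("date"),
        "latest_version": meta.get("version"),
    }


# Sample values mirror the output of the cell above
sample = {"meta": {"date": "2025-09-28", "version": "5.2.2+20250928"}, "data": {}}
record = extract_recency(sample, "all printings")
print(record)
```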


In [10]:
## Converting the first layer of JSON dictionary into dataframe
# Empty list for storing dataframes
list__set_data = []

# Listing the set codes
list__set_codes = list(dict__all_printings['data'].keys())

# Looping through the set codes making individual dataframes
for set_code in tqdm(list__set_codes, desc="Processing sets"):
    df__set = pd.json_normalize(dict__all_printings['data'][set_code], max_level=0)
    list__set_data.append(df__set)

# Concatenate sets into single DataFrame
df__sets = pd.concat(list__set_data, ignore_index=True)

# Clean Up
del list__set_data, list__set_codes, set_code, df__set

Processing sets: 100%|██████████| 830/830 [00:07<00:00, 109.63it/s]


In [27]:
# Making a copy of the cards data
df__cards = df__sets[['code'
                     ,'name'
                     ,'releaseDate'
                     ,'cards']].copy()

df__cards = df__cards.rename(columns={'code'         : 'SET_CODE'
                                     ,'name'         : 'SET_NAME'
                                     ,'releaseDate'  : 'RELEASE_DATE'
                                     ,'cards'        : 'CARDS'}).sort_values(by = 'RELEASE_DATE').reset_index(drop = True)

# Making a copy of the tokens data
df__tokens = df__sets[['code'
                      ,'name'
                      ,'releaseDate'
                      ,'tokenSetCode'
                      ,'tokens']].copy()

df__tokens = df__tokens.rename(columns={'code'         : 'SET_CODE'
                                       ,'name'         : 'SET_NAME'
                                       ,'releaseDate'  : 'RELEASE_DATE'
                                       ,'tokenSetCode' : 'SET_TOKEN_CODE'
                                       ,'tokens'       : 'TOKENS'}).sort_values(by = 'RELEASE_DATE').reset_index(drop = True)

## Main Code

In [31]:
df__cards[df__cards['SET_CODE'] == 'LEA']

|   | SET_CODE | SET_NAME              | RELEASE_DATE | CARDS                                              |
| --- | ---    | ---                   | ---          | ---                                                |
| 0 | LEA      | Limited Edition Alpha | 1993-08-05   | [{'artist': 'Dan Frazier', 'artistIds': ['059b...  |


In [35]:
df__cards[df__cards['SET_CODE'] == 'LEA']['CARDS'].values[0][0]

{'artist': 'Dan Frazier',
 'artistIds': ['059bba56-5feb-42e4-8c2e-e2f1e6ba11f9'],
 'availability': ['paper'],
 'boosterTypes': ['default'],
 'borderColor': 'black',
 'colorIdentity': ['W'],
 'colors': ['W'],
 'convertedManaCost': 1.0,
 'edhrecRank': 20938,
 'finishes': ['nonfoil'],
 'foreignData': [],
 'frameVersion': '1993',
 'hasFoil': False,
 'hasNonFoil': True,
 'identifiers': {'cardKingdomId': '64004',
  'cardsphereId': '24523',
  'deckboxId': '4850',
  'mcmId': '5418',
  'mtgjsonV4Id': '5b4a162f-c574-5f7e-a883-375aa3ba6642',
  'multiverseId': '232',
  'scryfallCardBackId': '0aeebaf5-8c7d-4636-9e82-8c27447861f7',
  'scryfallId': 'd5c83259-9b90-47c2-b48e-c7d78519e792',
  'scryfallIllustrationId': '6757e04d-7bfc-4bdc-9dcb-02059a2d4e60',
  'scryfallOracleId': 'c7a6a165-b709-46e0-ae42-6f69a17c0621',
  'tcgplayerProductId': '1029'},
 'keywords': ['Enchant'],
 'language': 'English',
 'layout': 'normal',
 'legalities': {'commander': 'Legal',
  'duel': 'Legal',
  'legacy': 'Legal',
  'oat

In [37]:
pd.json_normalize(df__cards[df__cards['SET_CODE'] == 'LEA']['CARDS'].values[0][0], max_level=0)

|   | artist | artistIds | availability | boosterTypes | borderColor | colorIdentity | colors | convertedManaCost | edhrecRank | finishes | foreignData | frameVersion | hasFoil | hasNonFoil | identifiers | keywords | language | layout | legalities | manaCost | manaValue | name | number | originalText | printings | purchaseUrls | rarity | rulings | setCode | sourceProducts | subtypes | supertypes | text | type | types | uuid |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Dan Frazier | [059bba56-5feb-42e4-8c2e-e2f1e6ba11f9] | [paper] | [default] | black | [W] | [W] | 1.0 | 20938 | [nonfoil] | [] | 1993 | False | True | {'cardKingdomId': '64004', 'cardsphereId': '24... | [Enchant] | English | normal | {'commander': 'Legal', 'duel': 'Legal', 'legac... | {W} | 1.0 | Animate Wall | 1 | Target wall can now attack. Target wall's powe... | [2ED, 30A, 3ED, 4BB, 4ED, 5ED, 6ED, CED, CEI, ... | {'cardKingdom': 'https://mtgjson.com/links/5ba... | rare | [{'date': '2007-09-16', 'text': 'This is a cha... | LEA | {'nonfoil': ['7e17487a-7a6a-51ff-927e-7063226b... | [Aura] | [] | Enchant Wall\nEnchanted Wall can attack as tho... | Enchantment — Aura | [Enchantment] | 2b304dc1-8d7d-50a7-a310-2d0e5427935f |
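
The cell above normalises a single card. One way to extend this to every card in every set, sketched here as an assumption about the next step rather than the notebook's final approach, is to walk the per-set `cards` lists and emit one flat row per card. `flatten_cards` is a hypothetical helper, and the sample dictionary is a tiny stand-in for `dict__all_printings['data']`.

```python
def flatten_cards(sets):
    """Yield one row per card, tagged with the code of the set it belongs to."""
    for set_code, set_data in sets.items():
        for card in set_data.get("cards", []):
            row = {"SET_CODE": set_code}
            # Copy top-level fields only; nested dicts and lists are left untouched,
            # similar in spirit to json_normalize(..., max_level=0)
            row.update(card)
            yield row


# Tiny stand-in for dict__all_printings['data']
sample_sets = {
    "LEA": {"cards": [{"name": "Animate Wall", "rarity": "rare", "colors": ["W"]}]},
    "XXX": {"cards": []},
}
rows = list(flatten_cards(sample_sets))
print(rows)
```

The resulting list of dictionaries could be passed to `pd.DataFrame(rows)` to obtain a single cards dataframe.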


## Output

In [28]:
recency_check_upload(schema_name = "raw_data"
                    ,table_name  = "data_recency"
                    ,dataframe   = df__data_recency)

In [25]:
# Uploading the boosters dataframe to PostgreSQL
df__boosters.to_sql(name      = 'boosters'
                   ,con       = engine
                   ,schema    = 'raw_data'
                   ,if_exists = 'replace'
                   ,index     = False)

187
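
If a column holds Python dicts or lists, one option before uploading is to encode those values as JSON strings so that they are stored as valid JSON text. This is a sketch under that assumption and not necessarily how `df__boosters` was prepared; `serialise_nested` is a hypothetical helper and the row content is a placeholder.

```python
import json


def serialise_nested(row):
    """Return a copy of row with dict and list values encoded as JSON strings."""
    return {
        key: json.dumps(value, sort_keys=True) if isinstance(value, (dict, list)) else value
        for key, value in row.items()
    }


# Illustrative row; the nested content is a placeholder, not real booster data
row = {
    "SET_CODE": "XXX",
    "BOOSTER_TYPE": "draft",
    "BOOSTER_CONTENT": {"boosters": [{"weight": 1}]},
}
encoded = serialise_nested(row)
print(encoded["BOOSTER_CONTENT"])
```

Values encoded this way can be restored with `json.loads` after reading the table back.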

## Checks

### Boosters

In [30]:
# Check the json file date and version
query = """
        SELECT *
        FROM raw_data.data_recency
        """
pd.read_sql_query(query, con=engine)

|   | json_type     | latest_date | latest_version |
| --- | ---         | ---         | ---            |
| 0 | keyword       | 2025-08-20  | 5.2.2+20250820 |
| 1 | all printings | 2025-09-08  | 5.2.2+20250908 |


In [31]:
# Check the first 10 rows of the uploaded table
query = """
        SELECT *
        FROM raw_data.boosters
        LIMIT 10
        """
pd.read_sql_query(query, con=engine)

|   | SET_CODE | BOOSTER_TYPE | BOOSTER_CONTENT                                    |
| --- | ---    | ---          | ---                                                |
| 0 | 10E      | draft        | {'boosters': [{'contents': {'basic': 1, 'commo...  |
| 1 | 2ED      | default      | {'boosters': [{'contents': {'common': 11, 'rar...  |
| 2 | 2ED      | starter      | {'boosters': [{'contents': {'commonWithDuplica...  |
| 3 | 2X2      | collector    | {'boosters': [{'contents': {'commonUncommonSho...  |
| 4 | 2X2      | draft        | {'boosters': [{'contents': {'commonWithShowcas...  |
| 5 | 2XM      | box-topper   | {'boosters': [{'contents': {'boxtopper': 1}, '...  |
| 6 | 2XM      | draft        | {'boosters': [{'contents': {'common': 8, 'dedi...  |
| 7 | 2XM      | vip          | {'boosters': [{'contents': {'foilBasic': 2, 'f...  |
| 8 | 30A      | draft        | {'boosters': [{'contents': {'a30Basic': 2, 'a3...  |
| 9 | 3ED      | default      | {'boosters': [{'contents': {'common': 11, 'rar...  |


### Cards