Let's take a look at some card-level data, courtesy of MTGJSON.
Here's where to find the file I'm using (AtomicCards):
https://mtgjson.com/downloads/all-files/

In [26]:
import pandas as pd

# data_path = '/Users/connorkenehan/Documents/GitHub/gathering-data/data/AtomicCards.json.zip'
data_path = '/Users/connorkenehan/Downloads/AtomicCards.json'
data = pd.read_json(data_path)

In [27]:
# let's see how the whole dataframe looks
data.head()

Unnamed: 0,meta,data
date,2023-08-24,
version,5.2.1+20230824,
"""Ach! Hans, Run!""",,"[{'colorIdentity': ['G', 'R'], 'colors': ['G',..."
"""Brims"" Barone, Midway Mobster",,"[{'colorIdentity': ['B', 'W'], 'colors': ['B',..."
"""Lifetime"" Pass Holder",,"[{'colorIdentity': ['B'], 'colors': ['B'], 'co..."


In [31]:
# that data column looks funky, let's check out an individual row
data.iloc[4]['data']

[{'colorIdentity': ['B'],
  'colors': ['B'],
  'convertedManaCost': 1.0,
  'edhrecRank': 11700,
  'firstPrinting': 'UNF',
  'foreignData': [],
  'identifiers': {'scryfallOracleId': '7bf6f13a-3c90-4bda-bc84-e026828bf4d1'},
  'keywords': ['Open an Attraction', 'Roll to Visit Your Attractions'],
  'layout': 'normal',
  'legalities': {'commander': 'Legal',
   'duel': 'Legal',
   'legacy': 'Legal',
   'oathbreaker': 'Legal',
   'vintage': 'Legal'},
  'manaCost': '{B}',
  'manaValue': 1.0,
  'name': '"Lifetime" Pass Holder',
  'power': '2',
  'printings': ['UNF'],
  'purchaseUrls': {'cardKingdom': 'https://mtgjson.com/links/9e1631dd345f664d',
   'cardKingdomFoil': 'https://mtgjson.com/links/67667a01d3ca6254',
   'cardmarket': 'https://mtgjson.com/links/4814d0ee0bfce13a',
   'tcgplayer': 'https://mtgjson.com/links/e13efc749614f1a8'},
  'subtypes': ['Zombie', 'Guest'],
  'supertypes': [],
  'text': '"Lifetime" Pass Holder enters the battlefield tapped.\nWhen "Lifetime" Pass Holder dies, open a

A quick look at the raw dataframe makes it clear that we need to do some
additional work to transform it into something we can easily work with and
learn from.

Some observations from eyeballing the first few rows:
1. We have three fields: the index plus two columns.
2. The first field, the index, holds the card name (apart from the first two rows, date and version).
3. The second field, called meta, appears to be metadata, which we won't actually need while analyzing cards.
4. The third field, called data, seems to have everything we need in it... it's just all smooshed into a dictionary inside of a list. So we'll have to unpack that a bit.
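
To make that structure concrete, here's a tiny made-up stand-in for the file. The card names and card values below are invented, not real MTGJSON data, and toy_raw and toy are just names I'm using for illustration. It's a minimal sketch of why the date and version rows come out empty in the data column, and of how to reach into the list-wrapped dictionary.

```python
import io
import json

import pandas as pd

# Invented miniature of the file's layout: two top-level keys, 'meta' and 'data'.
toy_raw = {
    "meta": {"date": "2023-08-24", "version": "5.2.1+20230824"},
    "data": {
        "Toy Card A": [{"name": "Toy Card A", "manaValue": 1.0, "colors": ["B"]}],
        "Toy Card B": [{"name": "Toy Card B", "manaValue": 3.0, "colors": ["G", "R"]}],
    },
}

# read_json turns each top-level key into a column and the inner keys into the index.
toy = pd.read_json(io.StringIO(json.dumps(toy_raw)))

# Each populated 'data' cell is a list, so index into it to get the card's dict.
first_card = toy.loc["Toy Card A", "data"][0]
print(first_card["name"], first_card["manaValue"])

# The 'date' and 'version' rows only exist for 'meta', so 'data' is null there.
print(toy["data"].isna().sum(), "null rows in data")
```

The same two-step access (pick a row, then take element 0 of the list) is what we'd do on the real dataframe.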

Let's get into it and clean that data!

We know that there are some null values in this dataset.  How many, and in which fields?

In [42]:
cleaned_data = data.reset_index()
print(len(cleaned_data),'total rows')
for column in cleaned_data.columns:
    na_count = cleaned_data[column].isna().sum()
    print('\n',column,':',na_count,'null rows')


27352 total rows

 index : 0 null rows

 meta : 27350 null rows

 data : 2 null rows
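
As an aside, the same counts can be had without the loop: calling isna().sum() on the whole dataframe returns one count per column. Here's a minimal sketch on an invented three-row frame (toy_frame is a stand-in I made up, not the real data):

```python
import numpy as np
import pandas as pd

# Invented stand-in shaped like the reset-index dataframe above.
toy_frame = pd.DataFrame({
    "index": ["date", "Toy Card A", "Toy Card B"],
    "meta": ["2023-08-24", np.nan, np.nan],
    "data": [np.nan, [{"name": "Toy Card A"}], [{"name": "Toy Card B"}]],
})

# One null count per column, returned as a Series indexed by column name.
null_counts = toy_frame.isna().sum()
print(len(toy_frame), "total rows")
print(null_counts)
```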


OK, so these counts tell us that the meta column is populated in only two rows, the date and version rows we saw at the top of the dataset.  We also learn that our main (data) field has only two nulls, which are those same two rows.  Let's drop the meta column and those two null rows.

In [43]:
keep_list = ['index','data']
cleaned_data = cleaned_data[keep_list].dropna()
cleaned_data.head()

Unnamed: 0,index,data
2,"""Ach! Hans, Run!""","[{'colorIdentity': ['G', 'R'], 'colors': ['G',..."
3,"""Brims"" Barone, Midway Mobster","[{'colorIdentity': ['B', 'W'], 'colors': ['B',..."
4,"""Lifetime"" Pass Holder","[{'colorIdentity': ['B'], 'colors': ['B'], 'co..."
5,"""Rumors of My Death . . .""","[{'colorIdentity': ['B'], 'colors': ['B'], 'co..."
6,+2 Mace,"[{'colorIdentity': ['W'], 'colors': ['W'], 'co..."


In [56]:
# each 'data' cell is a list of dictionaries, so grab the first dictionary from one row
test_data = cleaned_data.iloc[3]['data'][0]
# json_normalize flattens nested keys into dotted column names
pd.json_normalize(test_data).columns

Index(['colorIdentity', 'colors', 'convertedManaCost', 'firstPrinting',
       'foreignData', 'isFunny', 'layout', 'manaCost', 'manaValue', 'name',
       'printings', 'subtypes', 'supertypes', 'text', 'type', 'types',
       'identifiers.scryfallOracleId', 'purchaseUrls.cardKingdom',
       'purchaseUrls.cardKingdomFoil', 'purchaseUrls.cardmarket',
       'purchaseUrls.tcgplayer'],
      dtype='object')
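
That covers a single card. One way to get every card into a flat table (a sketch, not necessarily the best approach) is to explode the data column so that each dictionary gets its own row, and then hand the resulting list of dictionaries to json_normalize. The toy frame below is invented. I'm also assuming that some cells can hold more than one dictionary, which I haven't verified here, and that is why explode comes first.

```python
import pandas as pd

# Invented stand-in for cleaned_data: one single-entry card, one two-entry card.
toy_cleaned = pd.DataFrame({
    "index": ["Toy Card A", "Toy Card B // Toy Card C"],
    "data": [
        [{"name": "Toy Card A", "manaValue": 1.0,
          "identifiers": {"scryfallOracleId": "aaa"}}],
        [{"name": "Toy Card B", "manaValue": 2.0,
          "identifiers": {"scryfallOracleId": "bbb"}},
         {"name": "Toy Card C", "manaValue": 3.0,
          "identifiers": {"scryfallOracleId": "bbb"}}],
    ],
})

# One row per dictionary; ignore_index gives a fresh 0..n-1 index.
exploded = toy_cleaned.explode("data", ignore_index=True)

# Flatten the list of dictionaries; nested keys become dotted column names.
flat = pd.json_normalize(exploded["data"].tolist())

# Carry the original lookup name along as the first column.
flat.insert(0, "card", exploded["index"])
print(flat)
```

On the real data the same three steps would run against cleaned_data, though cards that lack a given key will simply show NaN in that column.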