In this notebook, we work with JSON data that has a nested structure. We also build a small pandas-like class that can query these lists of dictionaries.

In [1]:
import json
import pathlib

In [5]:
poke_dict = json.loads(pathlib.Path("data/pokemon.json").read_text())

This is the JSON data we will be working with. Despite its name, poke_dict is a list of dictionaries, one per pokemon.

In [6]:
poke_dict[:3]

[{'name': 'Bulbasaur',
  'type': ['Grass', 'Poison'],
  'total': 318,
  'hp': 45,
  'attack': 49},
 {'name': 'Ivysaur',
  'type': ['Grass', 'Poison'],
  'total': 405,
  'hp': 60,
  'attack': 62},
 {'name': 'Venusaur',
  'type': ['Grass', 'Poison'],
  'total': 525,
  'hp': 80,
  'attack': 82}]

We want to be able to query it easily.

Let's try to query all Grass-type pokemon.

In [7]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob
    
    def keep(self, func):
        return [d for d in self.blob if func(d)]

This class can now give me all the pokemon whose type list includes 'Grass'.

In [8]:
Clumper(poke_dict).keep(lambda d: 'Grass' in d['type'])

[{'name': 'Bulbasaur',
  'type': ['Grass', 'Poison'],
  'total': 318,
  'hp': 45,
  'attack': 49},
 {'name': 'Ivysaur',
  'type': ['Grass', 'Poison'],
  'total': 405,
  'hp': 60,
  'attack': 62},
 {'name': 'Venusaur',
  'type': ['Grass', 'Poison'],
  'total': 525,
  'hp': 80,
  'attack': 82},
 {'name': 'VenusaurMega Venusaur',
  'type': ['Grass', 'Poison'],
  'total': 625,
  'hp': 80,
  'attack': 100},
 {'name': 'Oddish',
  'type': ['Grass', 'Poison'],
  'total': 320,
  'hp': 45,
  'attack': 50},
 {'name': 'Gloom',
  'type': ['Grass', 'Poison'],
  'total': 395,
  'hp': 60,
  'attack': 65},
 {'name': 'Vileplume',
  'type': ['Grass', 'Poison'],
  'total': 490,
  'hp': 75,
  'attack': 80},
 {'name': 'Paras',
  'type': ['Bug', 'Grass'],
  'total': 285,
  'hp': 35,
  'attack': 70},
 {'name': 'Parasect',
  'type': ['Bug', 'Grass'],
  'total': 405,
  'hp': 60,
  'attack': 95},
 {'name': 'Bellsprout',
  'type': ['Grass', 'Poison'],
  'total': 300,
  'hp': 50,
  'attack': 75},
 {'name': 'Weepin

Now let's make it possible to chain several keep() calls together. The issue is that keep() currently returns a plain list, and a list has no keep method to call next.
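To make the problem concrete, here is a minimal sketch using the first version of the class (renamed ListClumper here purely for illustration) and a tiny hand-written sample rather than the pokemon file. The first keep() works, but the second one is called on a list and fails.

```python
class ListClumper:
    """First version of the class: keep() returns a plain list."""

    def __init__(self, blob):
        self.blob = blob

    def keep(self, func):
        return [d for d in self.blob if func(d)]


# Hypothetical sample data, for illustration only.
sample = [
    {"name": "a", "type": ["Grass"], "hp": 45},
    {"name": "b", "type": ["Fire"], "hp": 39},
]

first = ListClumper(sample).keep(lambda d: "Grass" in d["type"])
print(type(first).__name__)  # list

try:
    # A list has no keep() method, so this second call cannot work.
    first.keep(lambda d: d["hp"] < 60)
    chained = True
except AttributeError as err:
    chained = False
    print("chaining failed:", err)
```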

We can fix this by making the keep method return a Clumper object, so that the calls can be chained.

In [10]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob
    
    def keep(self, func):
        return Clumper([d for d in self.blob if func(d)])

I can now get all pokemon that are Grass type and have hp less than 60.

In [11]:
(Clumper(poke_dict)
 .keep(lambda d: 'Grass' in d['type'])
 .keep(lambda d: d['hp'] < 60)
 .blob)

[{'name': 'Bulbasaur',
  'type': ['Grass', 'Poison'],
  'total': 318,
  'hp': 45,
  'attack': 49},
 {'name': 'Oddish',
  'type': ['Grass', 'Poison'],
  'total': 320,
  'hp': 45,
  'attack': 50},
 {'name': 'Paras',
  'type': ['Bug', 'Grass'],
  'total': 285,
  'hp': 35,
  'attack': 70},
 {'name': 'Bellsprout',
  'type': ['Grass', 'Poison'],
  'total': 300,
  'hp': 50,
  'attack': 75},
 {'name': 'Chikorita',
  'type': ['Grass'],
  'total': 318,
  'hp': 45,
  'attack': 49},
 {'name': 'Hoppip',
  'type': ['Grass', 'Flying'],
  'total': 250,
  'hp': 35,
  'attack': 35},
 {'name': 'Skiploom',
  'type': ['Grass', 'Flying'],
  'total': 340,
  'hp': 55,
  'attack': 45},
 {'name': 'Sunkern', 'type': ['Grass'], 'total': 180, 'hp': 30, 'attack': 30},
 {'name': 'Treecko', 'type': ['Grass'], 'total': 310, 'hp': 40, 'attack': 45},
 {'name': 'Grovyle', 'type': ['Grass'], 'total': 405, 'hp': 50, 'attack': 65},
 {'name': 'Lotad',
  'type': ['Water', 'Grass'],
  'total': 220,
  'hp': 40,
  'attack': 30},

We can further modify the code so that we don't have to use multiple keep() calls.

In [13]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob
    
    def keep(self, *funcs):
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)
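Passing several functions to keep() behaves like a logical AND: a record survives only if every function returns a truthy value. The sketch below checks this on a small hypothetical sample by comparing one keep() call, two chained calls, and a plain comprehension using all().

```python
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, *funcs):
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)


# Hypothetical sample data, for illustration only.
sample = [
    {"name": "a", "type": ["Grass"], "hp": 45},
    {"name": "b", "type": ["Grass"], "hp": 80},
    {"name": "c", "type": ["Fire"], "hp": 39},
]

preds = [lambda d: "Grass" in d["type"], lambda d: d["hp"] < 60]

one_call = Clumper(sample).keep(*preds).blob
chained = Clumper(sample).keep(preds[0]).keep(preds[1]).blob
with_all = [d for d in sample if all(p(d) for p in preds)]

print(one_call == chained == with_all)  # True
print([d["name"] for d in one_call])    # ['a']
```

Calling keep() with no functions at all simply returns every record, since the loop body never runs.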

Now we only need one keep() call.

In [14]:
(Clumper(poke_dict)
 .keep(lambda d: 'Grass' in d['type'],
       lambda d: d['hp'] < 60)
 .blob)

[{'name': 'Bulbasaur',
  'type': ['Grass', 'Poison'],
  'total': 318,
  'hp': 45,
  'attack': 49},
 {'name': 'Oddish',
  'type': ['Grass', 'Poison'],
  'total': 320,
  'hp': 45,
  'attack': 50},
 {'name': 'Paras',
  'type': ['Bug', 'Grass'],
  'total': 285,
  'hp': 35,
  'attack': 70},
 {'name': 'Bellsprout',
  'type': ['Grass', 'Poison'],
  'total': 300,
  'hp': 50,
  'attack': 75},
 {'name': 'Chikorita',
  'type': ['Grass'],
  'total': 318,
  'hp': 45,
  'attack': 49},
 {'name': 'Hoppip',
  'type': ['Grass', 'Flying'],
  'total': 250,
  'hp': 35,
  'attack': 35},
 {'name': 'Skiploom',
  'type': ['Grass', 'Flying'],
  'total': 340,
  'hp': 55,
  'attack': 45},
 {'name': 'Sunkern', 'type': ['Grass'], 'total': 180, 'hp': 30, 'attack': 30},
 {'name': 'Treecko', 'type': ['Grass'], 'total': 310, 'hp': 40, 'attack': 45},
 {'name': 'Grovyle', 'type': ['Grass'], 'total': 405, 'hp': 50, 'attack': 65},
 {'name': 'Lotad',
  'type': ['Water', 'Grass'],
  'total': 220,
  'hp': 40,
  'attack': 30},

Another thing we want to do is take a subset from the start or the end of the data. Once the data is sorted, this gives us ordered maximums or minimums.

In [15]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob
    
    def keep(self, *funcs):
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)
    
    def head(self, n):
        return Clumper([self.blob[i] for i in range(n)])
    
    def tail(self, n):
        return Clumper([self.blob[-i] for i in range(1, n+1)])
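Two behaviours of these methods are easy to miss: tail(n) walks backwards from the end, so the last record comes first, and head(n) indexes directly, so asking for more records than exist raises IndexError. The sketch below shows both on a hypothetical five-item list, next to the equivalent slices, which are only a point of comparison and not part of the class.

```python
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def head(self, n):
        return Clumper([self.blob[i] for i in range(n)])

    def tail(self, n):
        return Clumper([self.blob[-i] for i in range(1, n + 1)])


items = [{"id": i} for i in range(5)]  # hypothetical sample data

print(Clumper(items).head(2).blob)  # [{'id': 0}, {'id': 1}]
print(Clumper(items).tail(2).blob)  # [{'id': 4}, {'id': 3}]  (last item first)

# Slices for comparison:
print(items[:2])   # same records as head(2)
print(items[-2:])  # last two in original order: [{'id': 3}, {'id': 4}]

# Slicing tolerates n > len(items); head(10) does not.
print(len(items[:10]))  # 5
try:
    Clumper(items).head(10)
    head_raised = False
except IndexError:
    head_raised = True
print(head_raised)  # True
```

One caveat if slices were used inside tail: items[-0:] is the whole list, so n == 0 would need special handling.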

Now let's get the first 2 pokemon that pass our previous filters.

In [16]:
(Clumper(poke_dict)
 .keep(lambda d: 'Grass' in d['type'],
       lambda d: d['hp'] < 60)
 .head(2)
 .blob)

[{'name': 'Bulbasaur',
  'type': ['Grass', 'Poison'],
  'total': 318,
  'hp': 45,
  'attack': 49},
 {'name': 'Oddish',
  'type': ['Grass', 'Poison'],
  'total': 320,
  'hp': 45,
  'attack': 50}]

Let's get the final two. Note that tail builds its list from the end backwards, so the last pokemon comes first.

In [17]:
(Clumper(poke_dict)
 .keep(lambda d: 'Grass' in d['type'],
       lambda d: d['hp'] < 60)
 .tail(2)
 .blob)

[{'name': 'GourgeistSmall Size',
  'type': ['Ghost', 'Grass'],
  'total': 494,
  'hp': 55,
  'attack': 85},
 {'name': 'PumpkabooSuper Size',
  'type': ['Ghost', 'Grass'],
  'total': 335,
  'hp': 59,
  'attack': 66}]

Now let's add the option to select only certain keys instead of every key in the dictionary.

In [20]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob
    
    def keep(self, *funcs):
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)
    
    def head(self, n):
        return Clumper([self.blob[i] for i in range(n)])
    
    def tail(self, n):
        return Clumper([self.blob[-i] for i in range(1, n+1)])
    
    def select(self, *keys):
        return Clumper([{k: d[k] for k in keys} for d in self.blob])

Now let's test this by getting only the name and hp of the pokemon.

In [21]:
(Clumper(poke_dict)
 .keep(lambda d: 'Grass' in d['type'],
       lambda d: d['hp'] < 60)
 .tail(2)
 .select('name', 'hp')
 .blob)

[{'name': 'GourgeistSmall Size', 'hp': 55},
 {'name': 'PumpkabooSuper Size', 'hp': 59}]
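Because select looks up each key with d[k], it assumes every record has every requested key. The sketch below reproduces the same comprehension as a standalone function on hypothetical sample data and shows the KeyError that results when a key is missing. The select_present helper is a hypothetical lenient variant, not something the class above provides.

```python
# Hypothetical sample data: the second record has no 'attack' key.
sample = [
    {"name": "a", "hp": 45, "attack": 49},
    {"name": "b", "hp": 60},
]


def select(blob, *keys):
    """Same comprehension as Clumper.select, written as a function."""
    return [{k: d[k] for k in keys} for d in blob]


def select_present(blob, *keys):
    """Hypothetical lenient variant: skip keys a record does not have."""
    return [{k: d[k] for k in keys if k in d} for d in blob]


print(select(sample, "name", "hp"))
# [{'name': 'a', 'hp': 45}, {'name': 'b', 'hp': 60}]

try:
    select(sample, "name", "attack")
    missing = None
except KeyError as err:
    missing = err.args[0]
    print("missing key:", missing)  # missing key: attack

print(select_present(sample, "name", "attack"))
# [{'name': 'a', 'attack': 49}, {'name': 'b'}]
```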

Let's add the option to add new keys to the dictionary and also update existing key:value pairs.

In [23]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob
    
    def keep(self, *funcs):
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)
    
    def head(self, n):
        return Clumper([self.blob[i] for i in range(n)])
    
    def tail(self, n):
        return Clumper([self.blob[-i] for i in range(1, n+1)])
    
    def select(self, *keys):
        return Clumper([{k: d[k] for k in keys} for d in self.blob])
    
    def mutate(self, **kwargs):
        # Copy each dictionary so the original records are not modified in place.
        data = [dict(d) for d in self.blob]
        for key, func in kwargs.items():
            for i in range(len(data)):
                data[i][key] = func(data[i])
        return Clumper(data)

In the code above, the keyword arguments (**kwargs) let mutate accept any number of key=function pairs. Each keyword is the key to set, and its value is a function that computes the new value from a dictionary. The method iterates over the keyword arguments and, for each one, loops over all the data and sets that key to the function's result, adding the key if it does not exist yet.
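One detail worth spelling out is that keyword arguments are applied in the order they are given, so a later function sees values written by an earlier one. The sketch below shows this on a one-record hypothetical sample. It uses a version of mutate that copies each dictionary first, so the input list is left untouched. The copy is shallow, so nested values such as a type list would still be shared.

```python
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def mutate(self, **kwargs):
        # Shallow-copy each record so the caller's dictionaries are not changed.
        data = [dict(d) for d in self.blob]
        for key, func in kwargs.items():
            for d in data:
                d[key] = func(d)
        return Clumper(data)


sample = [{"name": "a", "hp": 10}]  # hypothetical sample data

out = Clumper(sample).mutate(hp=lambda d: d["hp"] * 2,
                             hp4=lambda d: d["hp"] * 4).blob

# hp is doubled first (10 -> 20), then hp4 is computed from the new hp (20 * 4).
print(out)     # [{'name': 'a', 'hp': 20, 'hp4': 80}]
print(sample)  # [{'name': 'a', 'hp': 10}]
```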

In [24]:
(Clumper(poke_dict)
 .keep(lambda d: 'Grass' in d['type'],
       lambda d: d['hp'] < 60)
 .tail(2)
 .select('name', 'hp')
 .mutate(hp=lambda d: d['hp'] * 2,
         hp4=lambda d: d['hp'] * 4)
 .blob)

[{'name': 'GourgeistSmall Size', 'hp': 110, 'hp4': 440},
 {'name': 'PumpkabooSuper Size', 'hp': 118, 'hp4': 472}]

Python itself also allows us to be creative in different ways, such as with the sorted function below

In [25]:
tuple_list = [(4, 3), (1, 2), (5, 10), (10, 2)]

By default, the sorted function compares tuples element by element, so they are ordered by the first value, with ties broken by the second.

In [26]:
sorted(tuple_list)

[(1, 2), (4, 3), (5, 10), (10, 2)]

Now let's try to sort by the second value in the tuples.

In [27]:
sorted(tuple_list, key=lambda t: t[1])

[(1, 2), (10, 2), (4, 3), (5, 10)]
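In the output above, (1, 2) and (10, 2) tie on the key and stay in their input order, because sorted is stable. The short sketch below repeats the sort with operator.itemgetter from the standard library, which is an alternative to the lambda, and then uses a tuple key to break ties explicitly.

```python
from operator import itemgetter

tuple_list = [(4, 3), (1, 2), (5, 10), (10, 2)]

# itemgetter(1) does the same job as lambda t: t[1].
by_second = sorted(tuple_list, key=itemgetter(1))
print(by_second)
# [(1, 2), (10, 2), (4, 3), (5, 10)]

# Tuple key: sort by the second value, then by the first value descending.
tie_broken = sorted(tuple_list, key=lambda t: (t[1], -t[0]))
print(tie_broken)
# [(10, 2), (1, 2), (4, 3), (5, 10)]
```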

Now let's apply sorting with a lambda key in our class.

In [29]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob
    
    def keep(self, *funcs):
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)
    
    def head(self, n):
        return Clumper([self.blob[i] for i in range(n)])
    
    def tail(self, n):
        return Clumper([self.blob[-i] for i in range(1, n+1)])
    
    def select(self, *keys):
        return Clumper([{k: d[k] for k in keys} for d in self.blob])
    
    def mutate(self, **kwargs):
        # Copy each dictionary so the original records are not modified in place.
        data = [dict(d) for d in self.blob]
        for key, func in kwargs.items():
            for i in range(len(data)):
                data[i][key] = func(data[i])
        return Clumper(data)
    
    def sort(self, key, reverse=False):
        return Clumper(sorted(self.blob, key=key, reverse=reverse))

In [34]:
(Clumper(poke_dict)
 .keep(lambda d: 'Grass' in d['type'],
       lambda d: d['hp'] < 60)
 .select('name', 'hp')
 .head(10)
 .sort(lambda d: d['hp'])
 .blob)

[{'name': 'Sunkern', 'hp': 30},
 {'name': 'Paras', 'hp': 35},
 {'name': 'Hoppip', 'hp': 35},
 {'name': 'Treecko', 'hp': 40},
 {'name': 'Bulbasaur', 'hp': 45},
 {'name': 'Oddish', 'hp': 45},
 {'name': 'Chikorita', 'hp': 45},
 {'name': 'Bellsprout', 'hp': 50},
 {'name': 'Grovyle', 'hp': 50},
 {'name': 'Skiploom', 'hp': 55}]

Now let's try the reverse.

In [33]:
(Clumper(poke_dict)
 .keep(lambda d: 'Grass' in d['type'],
       lambda d: d['hp'] < 60)
 .select('name', 'hp')
 .head(10)
 .sort(lambda d: d['hp'], reverse=True)
 .blob)

[{'name': 'Skiploom', 'hp': 55},
 {'name': 'Bellsprout', 'hp': 50},
 {'name': 'Grovyle', 'hp': 50},
 {'name': 'Bulbasaur', 'hp': 45},
 {'name': 'Oddish', 'hp': 45},
 {'name': 'Chikorita', 'hp': 45},
 {'name': 'Treecko', 'hp': 40},
 {'name': 'Paras', 'hp': 35},
 {'name': 'Hoppip', 'hp': 35},
 {'name': 'Sunkern', 'hp': 30}]
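Pokemon with equal hp keep their incoming order in both outputs, since sorted stays stable even with reverse=True. If a specific tie-break is wanted, the key function can return a tuple. The sketch below, using a few records copied from the output above and only the sort method of the class, orders by highest hp first and then alphabetically by name.

```python
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def sort(self, key, reverse=False):
        return Clumper(sorted(self.blob, key=key, reverse=reverse))


# A few records copied from the output above.
rows = [
    {"name": "Paras", "hp": 35},
    {"name": "Hoppip", "hp": 35},
    {"name": "Grovyle", "hp": 50},
    {"name": "Bellsprout", "hp": 50},
    {"name": "Skiploom", "hp": 55},
]

# Negating hp sorts it descending while name stays ascending.
result = Clumper(rows).sort(lambda d: (-d["hp"], d["name"])).blob
print([d["name"] for d in result])
# ['Skiploom', 'Bellsprout', 'Grovyle', 'Hoppip', 'Paras']
```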

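We can look at other ways to do this, such as the list comprehension below, but it is less intuitive to read and not as well organized. Note that it sorts the whole filtered subset before slicing, whereas the chain above takes the first 10 records and then sorts them, so the two results differ. The class-based code is also much more flexible and easier to modify.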

In [35]:
subset = [{'hp': d['hp'], 'name': d['name']}
          for d in poke_dict
          if 'Grass' in d['type'] and d['hp'] < 60]
sorted_subset = sorted(subset, key=lambda d: d['hp'], reverse=True)
sorted_subset[:18]

[{'hp': 59, 'name': 'PumpkabooSuper Size'},
 {'hp': 56, 'name': 'Chespin'},
 {'hp': 55, 'name': 'Skiploom'},
 {'hp': 55, 'name': 'Turtwig'},
 {'hp': 55, 'name': 'Swadloon'},
 {'hp': 55, 'name': 'GourgeistSmall Size'},
 {'hp': 54, 'name': 'PumpkabooLarge Size'},
 {'hp': 50, 'name': 'Bellsprout'},
 {'hp': 50, 'name': 'Grovyle'},
 {'hp': 50, 'name': 'Roselia'},
 {'hp': 50, 'name': 'Cacnea'},
 {'hp': 50, 'name': 'RotomMow Rotom'},
 {'hp': 50, 'name': 'Pansage'},
 {'hp': 49, 'name': 'PumpkabooAverage Size'},
 {'hp': 45, 'name': 'Bulbasaur'},
 {'hp': 45, 'name': 'Oddish'},
 {'hp': 45, 'name': 'Chikorita'},
 {'hp': 45, 'name': 'Cherubi'}]
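This output starts at hp 59, while the chained version earlier peaked at 55, because the position of head in the chain matters. A minimal sketch on hypothetical sample data, using only the head and sort methods of the class, shows the difference between slicing before and after sorting.

```python
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def head(self, n):
        return Clumper([self.blob[i] for i in range(n)])

    def sort(self, key, reverse=False):
        return Clumper(sorted(self.blob, key=key, reverse=reverse))


# Hypothetical sample data, for illustration only.
rows = [{"name": n, "hp": hp}
        for n, hp in [("a", 45), ("b", 30), ("c", 59), ("d", 50)]]


def by_hp(d):
    return d["hp"]


# head first: only the first two input records are sorted.
head_then_sort = Clumper(rows).head(2).sort(by_hp, reverse=True).blob
# sort first: the two highest-hp records overall.
sort_then_head = Clumper(rows).sort(by_hp, reverse=True).head(2).blob

print([d["name"] for d in head_then_sort])  # ['a', 'b']
print([d["name"] for d in sort_then_head])  # ['c', 'd']
```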