In [2]:
!wget https://calmcode.io/static/data/pokemon.json

--2024-04-15 18:36:14--  https://calmcode.io/static/data/pokemon.json
Resolving calmcode.io (calmcode.io)... 172.66.0.96, 162.159.140.98, 2606:4700:7::60, ...
Connecting to calmcode.io (calmcode.io)|172.66.0.96|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 59991 (59K) [application/json]
Saving to: ‘pokemon.json’


2024-04-15 18:36:15 (141 MB/s) - ‘pokemon.json’ saved [59991/59991]



We have downloaded the dataset from the website directly into the Google Colab runtime.

In [6]:
import json

file_path = 'pokemon.json'

# json.load parses the file into Python objects; here the result is a list of dictionaries.
with open(file_path, 'r') as file:
    poke_dict = json.load(file)


In [9]:
poke_dict[:4]

[{'name': 'Bulbasaur',
  'type': ['Grass', 'Poison'],
  'total': 318,
  'hp': 45,
  'attack': 49},
 {'name': 'Ivysaur',
  'type': ['Grass', 'Poison'],
  'total': 405,
  'hp': 60,
  'attack': 62},
 {'name': 'Venusaur',
  'type': ['Grass', 'Poison'],
  'total': 525,
  'hp': 80,
  'attack': 82},
 {'name': 'VenusaurMega Venusaur',
  'type': ['Grass', 'Poison'],
  'total': 625,
  'hp': 80,
  'attack': 100}]

The dataset is now loaded as a list of dictionaries, which is not a convenient format to analyze directly. Each record has attributes such as name, type, total stats, hp and attack.
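A quick way to confirm that structure is to inspect the top-level type and the keys of one record. The sketch below uses two records copied from the output above as a stand-in for pokemon.json, so it runs without the file; the variable names are only for illustration.

```python
import json

# Two records copied from the output above, standing in for pokemon.json.
sample_text = '''[
  {"name": "Bulbasaur", "type": ["Grass", "Poison"], "total": 318, "hp": 45, "attack": 49},
  {"name": "Ivysaur", "type": ["Grass", "Poison"], "total": 405, "hp": 60, "attack": 62}
]'''

records = json.loads(sample_text)

top_level_type = type(records).__name__    # the JSON array becomes a Python list
record_keys = sorted(records[0].keys())    # each JSON object becomes a dict

print(top_level_type)
print(record_keys)
```

Note that the 'type' value is itself a list, which is what makes the data nested rather than flat.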

The pandas module is awkward to use with this kind of nested data, so let us use a new topic: "method chains". It is a mix of functional programming and object-oriented programming, and it is very effective for performing operations on datasets like this one.
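As a tiny illustration of the idea, Python's built-in strings already support method chaining, because each string method returns a new object that the next method can act on. The sample string below is made up for illustration.

```python
# Each call returns a new object, so the next method can be applied directly.
raw = "  Grass, Poison  "

types = (raw
         .strip()        # remove surrounding whitespace
         .lower()        # convert to lowercase
         .split(", "))   # split into a list of type names

print(types)
```

The Clumper class built in this notebook follows the same pattern for lists of dictionaries.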

In [10]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, func):
        return [d for d in self.blob if func(d)]

In [11]:
Clumper(poke_dict).keep(lambda d: 'Dragon' in d['type'])

[{'name': 'CharizardMega Charizard X',
  'type': ['Fire', 'Dragon'],
  'total': 634,
  'hp': 78,
  'attack': 130},
 {'name': 'Dratini', 'type': ['Dragon'], 'total': 300, 'hp': 41, 'attack': 64},
 {'name': 'Dragonair',
  'type': ['Dragon'],
  'total': 420,
  'hp': 61,
  'attack': 84},
 {'name': 'Dragonite',
  'type': ['Dragon', 'Flying'],
  'total': 600,
  'hp': 91,
  'attack': 134},
 {'name': 'AmpharosMega Ampharos',
  'type': ['Electric', 'Dragon'],
  'total': 610,
  'hp': 90,
  'attack': 95},
 {'name': 'Kingdra',
  'type': ['Water', 'Dragon'],
  'total': 540,
  'hp': 75,
  'attack': 95},
 {'name': 'SceptileMega Sceptile',
  'type': ['Grass', 'Dragon'],
  'total': 630,
  'hp': 70,
  'attack': 110},
 {'name': 'Vibrava',
  'type': ['Ground', 'Dragon'],
  'total': 340,
  'hp': 50,
  'attack': 70},
 {'name': 'Flygon',
  'type': ['Ground', 'Dragon'],
  'total': 520,
  'hp': 80,
  'attack': 100},
 {'name': 'Altaria',
  'type': ['Dragon', 'Flying'],
  'total': 490,
  'hp': 75,
  'attack': 70

This Clumper class is flexible and lets us filter the data according to our requirements. It takes the dataset as input, and keep returns the list of records for which the given function returns a true value.

Say we want to work only with "Dragon" type pokemon from the dataset, so we want to extract just those. The code above does that.

We use a lambda function to express the condition without defining a named function.

There we go: we get all Dragon-type pokemon as a list.
Everything looks fine, but there is one issue.

The keep method returns a plain list, and a list has no keep method, so we cannot apply a second filter to the result. Let us modify the code so that keep returns a Clumper object; then we can stack as many keep calls as we like to narrow the data down.
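To make the lambda less mysterious, here is a small sketch showing that it behaves the same as a named function. The helper keep and the function is_dragon are hypothetical names used only for this comparison, and the two sample records are taken from the outputs above.

```python
def keep(blob, func):
    # Same filtering logic as Clumper.keep, written as a plain function.
    return [d for d in blob if func(d)]

def is_dragon(d):
    # Named equivalent of: lambda d: 'Dragon' in d['type']
    return 'Dragon' in d['type']

sample = [
    {'name': 'Dratini', 'type': ['Dragon'], 'hp': 41},
    {'name': 'Bulbasaur', 'type': ['Grass', 'Poison'], 'hp': 45},
]

with_named = keep(sample, is_dragon)
with_lambda = keep(sample, lambda d: 'Dragon' in d['type'])

print(with_named == with_lambda)
```

The lambda simply saves us from naming a function that is used only once.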
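The problem can be reproduced with a short sketch. It reuses the first version of the class and two sample records from the outputs above, and catches the error that a second keep call raises.

```python
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, func):
        # First version: returns a plain list, not a Clumper.
        return [d for d in self.blob if func(d)]

sample = [
    {'name': 'Dratini', 'type': ['Dragon'], 'hp': 41},
    {'name': 'Bulbasaur', 'type': ['Grass', 'Poison'], 'hp': 45},
]

first = Clumper(sample).keep(lambda d: 'Dragon' in d['type'])

try:
    # A list has no keep method, so chaining fails here.
    first.keep(lambda d: d['hp'] > 100)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(type(first).__name__)
print(error_message)
```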

In [12]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, func):
        return Clumper([d for d in self.blob if func(d)])

In [14]:
(Clumper(poke_dict)
  .keep(lambda d: 'Dragon' in d['type'])
  .keep(lambda d: d['hp'] > 100)
  .blob)

[{'name': 'Rayquaza',
  'type': ['Dragon', 'Flying'],
  'total': 680,
  'hp': 105,
  'attack': 150},
 {'name': 'RayquazaMega Rayquaza',
  'type': ['Dragon', 'Flying'],
  'total': 780,
  'hp': 105,
  'attack': 180},
 {'name': 'Garchomp',
  'type': ['Dragon', 'Ground'],
  'total': 600,
  'hp': 108,
  'attack': 130},
 {'name': 'GarchompMega Garchomp',
  'type': ['Dragon', 'Ground'],
  'total': 700,
  'hp': 108,
  'attack': 170},
 {'name': 'GiratinaAltered Forme',
  'type': ['Ghost', 'Dragon'],
  'total': 680,
  'hp': 150,
  'attack': 100},
 {'name': 'GiratinaOrigin Forme',
  'type': ['Ghost', 'Dragon'],
  'total': 680,
  'hp': 150,
  'attack': 120},
 {'name': 'Kyurem',
  'type': ['Dragon', 'Ice'],
  'total': 660,
  'hp': 125,
  'attack': 130},
 {'name': 'KyuremBlack Kyurem',
  'type': ['Dragon', 'Ice'],
  'total': 700,
  'hp': 125,
  'attack': 170},
 {'name': 'KyuremWhite Kyurem',
  'type': ['Dragon', 'Ice'],
  'total': 700,
  'hp': 125,
  'attack': 120},
 {'name': 'Zygarde50% Forme',
  '

Here we made two small modifications compared with the previous code.
Instead of returning a list, keep now returns a Clumper object, so we can keep applying methods with conditions to it as a chain. We read the final list from the .blob attribute.

The previous code gave us all the "Dragon" pokemon as a list of dictionaries; now we apply a second condition to filter that result further. Say we want all the Dragon pokemon whose hp is greater than 100.

The code above gives us the list after applying two conditions: the type must include "Dragon" and hp must be greater than 100.

So with this Clumper object we can apply multiple conditions as a chain to get more advanced filtering.

The code above satisfies our requirement, but it would be convenient to pass two conditions in a single call. That keeps large programs with many conditions shorter and easier to read.

There is a way to do so. The code can be modified into:

In [15]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, *funcs):
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)

In [16]:
(Clumper(poke_dict)
  .keep(lambda d: 'Dragon' in d['type'],
        lambda d: d['hp'] > 100)
  .blob)

[{'name': 'Rayquaza',
  'type': ['Dragon', 'Flying'],
  'total': 680,
  'hp': 105,
  'attack': 150},
 {'name': 'RayquazaMega Rayquaza',
  'type': ['Dragon', 'Flying'],
  'total': 780,
  'hp': 105,
  'attack': 180},
 {'name': 'Garchomp',
  'type': ['Dragon', 'Ground'],
  'total': 600,
  'hp': 108,
  'attack': 130},
 {'name': 'GarchompMega Garchomp',
  'type': ['Dragon', 'Ground'],
  'total': 700,
  'hp': 108,
  'attack': 170},
 {'name': 'GiratinaAltered Forme',
  'type': ['Ghost', 'Dragon'],
  'total': 680,
  'hp': 150,
  'attack': 100},
 {'name': 'GiratinaOrigin Forme',
  'type': ['Ghost', 'Dragon'],
  'total': 680,
  'hp': 150,
  'attack': 120},
 {'name': 'Kyurem',
  'type': ['Dragon', 'Ice'],
  'total': 660,
  'hp': 125,
  'attack': 130},
 {'name': 'KyuremBlack Kyurem',
  'type': ['Dragon', 'Ice'],
  'total': 700,
  'hp': 125,
  'attack': 170},
 {'name': 'KyuremWhite Kyurem',
  'type': ['Dragon', 'Ice'],
  'total': 700,
  'hp': 125,
  'attack': 120},
 {'name': 'Zygarde50% Forme',
  '

What we did here is add '*' before the parameter name, so keep(self, *funcs) accepts any number of functions and collects them into the tuple funcs.

We then loop over funcs, and for each function we rebuild the list, keeping only the records 'd' for which the function returns a true value. After each iteration, data holds the filtered result.

So the loop runs twice with the same line of code. The first pass applies the first filter, and the second pass applies the second filter to what is left. In this way we apply both conditions, just like the previous code, and we do it in fewer lines.

So far the results come back from the start of the list, in dataset order. We can also take records from either the top or the bottom of the filtered data.

That can be done using the head and tail methods.
Let us see a practical demonstration.
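The sketch below checks that passing two functions to one keep call gives the same result as two chained keep calls. It also adds keep_all, a hypothetical single-pass variant based on the built-in all, which is not part of the original class. The sample records are taken from the outputs above.

```python
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, *funcs):
        # funcs is a tuple holding every positional argument that was passed.
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)

    def keep_all(self, *funcs):
        # Hypothetical variant: one pass, a record stays only if every func accepts it.
        return Clumper([d for d in self.blob if all(f(d) for f in funcs)])

sample = [
    {'name': 'Dratini', 'type': ['Dragon'], 'hp': 41},
    {'name': 'Rayquaza', 'type': ['Dragon', 'Flying'], 'hp': 105},
    {'name': 'Garchomp', 'type': ['Dragon', 'Ground'], 'hp': 108},
    {'name': 'Venusaur', 'type': ['Grass', 'Poison'], 'hp': 80},
]

is_dragon = lambda d: 'Dragon' in d['type']
high_hp = lambda d: d['hp'] > 100

chained = Clumper(sample).keep(is_dragon).keep(high_hp).blob
combined = Clumper(sample).keep(is_dragon, high_hp).blob
single_pass = Clumper(sample).keep_all(is_dragon, high_hp).blob

print([d['name'] for d in combined])
```

All three approaches select the same records; the choice is mostly about readability.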

In [18]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, *funcs):
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)

    def head(self, n):
        return Clumper([self.blob[i] for i in range(n)])

    def tail(self, n):
        return Clumper([self.blob[-i] for i in range(1, n+1)])


In [19]:
(Clumper(poke_dict)
  .keep(lambda d: 'Dragon' in d['type'],
        lambda d: d['hp'] > 100)
  .head(10)
  .tail(10)
  .blob)

[{'name': 'Zygarde50% Forme',
  'type': ['Dragon', 'Ground'],
  'total': 600,
  'hp': 108,
  'attack': 100},
 {'name': 'KyuremWhite Kyurem',
  'type': ['Dragon', 'Ice'],
  'total': 700,
  'hp': 125,
  'attack': 120},
 {'name': 'KyuremBlack Kyurem',
  'type': ['Dragon', 'Ice'],
  'total': 700,
  'hp': 125,
  'attack': 170},
 {'name': 'Kyurem',
  'type': ['Dragon', 'Ice'],
  'total': 660,
  'hp': 125,
  'attack': 130},
 {'name': 'GiratinaOrigin Forme',
  'type': ['Ghost', 'Dragon'],
  'total': 680,
  'hp': 150,
  'attack': 120},
 {'name': 'GiratinaAltered Forme',
  'type': ['Ghost', 'Dragon'],
  'total': 680,
  'hp': 150,
  'attack': 100},
 {'name': 'GarchompMega Garchomp',
  'type': ['Dragon', 'Ground'],
  'total': 700,
  'hp': 108,
  'attack': 170},
 {'name': 'Garchomp',
  'type': ['Dragon', 'Ground'],
  'total': 600,
  'hp': 108,
  'attack': 130},
 {'name': 'RayquazaMega Rayquaza',
  'type': ['Dragon', 'Flying'],
  'total': 780,
  'hp': 105,
  'attack': 180},
 {'name': 'Rayquaza',
  '

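In the code above, head(10) keeps the first 10 items of the filtered list, and tail(10) then takes the last 10 of those. Because tail indexes backwards from the end of the list, it returns its items in reverse order, which is why the output starts with Zygarde and ends with Rayquaza. Both methods raise an IndexError when n is larger than the number of items.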
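If the original order should be preserved, one possible alternative is to use list slicing. This is a sketch of a different implementation, not the one used in this notebook: slices keep the items in order and do not raise an error when n exceeds the length of the list.

```python
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def head(self, n):
        # Slicing stops at the end of the list, so a large n is safe.
        return Clumper(self.blob[:n])

    def tail(self, n):
        # blob[-0:] would return the whole list, so n == 0 is handled separately.
        return Clumper(self.blob[-n:] if n > 0 else [])

names = ['Rayquaza', 'Garchomp', 'Kyurem', 'Zygarde50% Forme']

first_two = Clumper(names).head(2).blob
last_two = Clumper(names).tail(2).blob
oversized = Clumper(names).head(10).blob
empty = Clumper(names).tail(0).blob

print(first_two)
print(last_two)
```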

In [21]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, *funcs):
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)

    def head(self, n):
        return Clumper([self.blob[i] for i in range(n)])

    def tail(self, n):
        return Clumper([self.blob[-i] for i in range(1, n+1)])

    def select(self, *keys):
        return Clumper([{k: d[k] for k in keys} for d in self.blob])

In [22]:
(Clumper(poke_dict)
  .keep(lambda d: 'Dragon' in d['type'],
        lambda d: d['hp'] > 100)
  .select('name', 'hp')
  .head(10)
  .blob)

[{'name': 'Rayquaza', 'hp': 105},
 {'name': 'RayquazaMega Rayquaza', 'hp': 105},
 {'name': 'Garchomp', 'hp': 108},
 {'name': 'GarchompMega Garchomp', 'hp': 108},
 {'name': 'GiratinaAltered Forme', 'hp': 150},
 {'name': 'GiratinaOrigin Forme', 'hp': 150},
 {'name': 'Kyurem', 'hp': 125},
 {'name': 'KyuremBlack Kyurem', 'hp': 125},
 {'name': 'KyuremWhite Kyurem', 'hp': 125},
 {'name': 'Zygarde50% Forme', 'hp': 108}]

Along with controlling which rows we get, we can also apply a select method to keep only the columns we need.

It uses a dictionary comprehension over the requested keys, nested inside a list comprehension that visits every record, so each dictionary is rebuilt with only those keys.
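One consequence of the d[k] lookup is that select fails with a KeyError if a record lacks a requested key. The sketch below shows this and adds select_existing, a hypothetical lenient variant that is not part of the original class. The second sample record is made up to trigger the error.

```python
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def select(self, *keys):
        # Strict: raises KeyError if any record lacks a requested key.
        return Clumper([{k: d[k] for k in keys} for d in self.blob])

    def select_existing(self, *keys):
        # Hypothetical lenient variant: skip keys that a record does not have.
        return Clumper([{k: d[k] for k in keys if k in d} for d in self.blob])

sample = [
    {'name': 'Kyurem', 'hp': 125, 'attack': 130},
    {'name': 'Incomplete', 'hp': 1},  # made-up record with no 'attack' key
]

try:
    Clumper(sample).select('name', 'attack')
    strict_failed = False
except KeyError:
    strict_failed = True

lenient = Clumper(sample).select_existing('name', 'attack').blob

print(strict_failed)
print(lenient)
```

For the pokemon dataset shown here every record has the same keys, so the strict version is sufficient.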

In [23]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, *funcs):
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)

    def head(self, n):
        return Clumper([self.blob[i] for i in range(n)])

    def tail(self, n):
        return Clumper([self.blob[-i] for i in range(1, n+1)])

    def select(self, *keys):
        return Clumper([{k: d[k] for k in keys} for d in self.blob])

    def mutate(self, **kwargs):
        data = self.blob
        for key, func in kwargs.items():
            for i in range(len(data)):
                data[i][key] = func(data[i])
        return Clumper(data)

In [24]:
(Clumper(poke_dict)
  .keep(lambda d: 'Dragon' in d['type'],
        lambda d: d['hp'] > 100)
  .select('name', 'hp')
  .mutate(hp = lambda d: d['hp'] * 2,
          hp4 = lambda d: d['hp'] * 4)
  .blob)

[{'name': 'Rayquaza', 'hp': 210, 'hp4': 840},
 {'name': 'RayquazaMega Rayquaza', 'hp': 210, 'hp4': 840},
 {'name': 'Garchomp', 'hp': 216, 'hp4': 864},
 {'name': 'GarchompMega Garchomp', 'hp': 216, 'hp4': 864},
 {'name': 'GiratinaAltered Forme', 'hp': 300, 'hp4': 1200},
 {'name': 'GiratinaOrigin Forme', 'hp': 300, 'hp4': 1200},
 {'name': 'Kyurem', 'hp': 250, 'hp4': 1000},
 {'name': 'KyuremBlack Kyurem', 'hp': 250, 'hp4': 1000},
 {'name': 'KyuremWhite Kyurem', 'hp': 250, 'hp4': 1000},
 {'name': 'Zygarde50% Forme', 'hp': 216, 'hp4': 864}]

In the code above, mutate gives the user the freedom to overwrite existing columns or add new columns to the dataset.

Because each new value is described by a function, the user states how a column is computed without changing the Clumper class itself.

We pass hp and hp4 to the method as keyword arguments (kwargs). For each key and function in kwargs.items(), we call the function on every record and store the result under that key, then return the Clumper object. This is the code in the mutate method.

Here hp is overwritten with double its value, and hp4 is added as four times hp. The keyword arguments are applied in order, so hp4 is computed from the already doubled hp: for Rayquaza, 105 becomes 210 and hp4 is 840.
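One thing to watch is that mutate writes into the dictionaries held in self.blob, so when it is called without a preceding select, the original records are modified as well. The sketch below is one possible variant, not the notebook's version: it makes a shallow copy of each record first, so the input list is left untouched.

```python
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def mutate(self, **kwargs):
        # Shallow-copy every record so the caller's dictionaries are not changed.
        data = [dict(d) for d in self.blob]
        # Keyword arguments are applied in the order they were written.
        for key, func in kwargs.items():
            for d in data:
                d[key] = func(d)
        return Clumper(data)

original = [{'name': 'Rayquaza', 'hp': 105}]

result = (Clumper(original)
          .mutate(hp=lambda d: d['hp'] * 2,     # 105 -> 210
                  hp4=lambda d: d['hp'] * 4)    # uses the doubled hp: 210 * 4
          .blob)

print(result)
print(original)
```

The copy is shallow, so nested values such as the 'type' list would still be shared between the original and the copy.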





In [25]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, *funcs):
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)

    def head(self, n):
        return Clumper([self.blob[i] for i in range(n)])

    def tail(self, n):
        return Clumper([self.blob[-i] for i in range(1, n+1)])

    def select(self, *keys):
        return Clumper([{k: d[k] for k in keys} for d in self.blob])

    def mutate(self, **kwargs):
        data = self.blob
        for key, func in kwargs.items():
            for i in range(len(data)):
                data[i][key] = func(data[i])
        return Clumper(data)

    def sort(self, key, reverse=False):
        return Clumper(sorted(self.blob, key=key, reverse=reverse))


In [26]:
(Clumper(poke_dict)
    .keep(lambda d: 'Dragon' in d['type'],
          lambda d: d['hp'] > 100)
    .select('name', 'hp')
    .sort(lambda d: d['hp'], reverse=True)
    .blob)


[{'name': 'GiratinaAltered Forme', 'hp': 150},
 {'name': 'GiratinaOrigin Forme', 'hp': 150},
 {'name': 'Kyurem', 'hp': 125},
 {'name': 'KyuremBlack Kyurem', 'hp': 125},
 {'name': 'KyuremWhite Kyurem', 'hp': 125},
 {'name': 'Garchomp', 'hp': 108},
 {'name': 'GarchompMega Garchomp', 'hp': 108},
 {'name': 'Zygarde50% Forme', 'hp': 108},
 {'name': 'Rayquaza', 'hp': 105},
 {'name': 'RayquazaMega Rayquaza', 'hp': 105}]

By now, we can see that the object-oriented approach and the functional approach are combined to get the desired outputs.

Now we can even sort the records. Python's built-in sorted function does the work: by default, a list of tuples is sorted by the first element of each tuple.

To sort by the second element instead, we pass a lambda function as the key argument; sorted calls it on every item and orders the items by the values it returns. Our sort method forwards key and reverse to sorted, so the lambda receives each dictionary and returns its hp.

As before, we specify what we are doing and how we are doing it via functions.
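The behaviour of the built-in sorted function can be checked with a short sketch. The tuples are made-up sample values, and the three records are taken from the outputs above.

```python
pairs = [(2, 'b'), (1, 'c'), (3, 'a')]

# Default ordering compares tuples element by element, starting with the first.
by_first = sorted(pairs)

# A key function picks the value to sort by, here the second element.
by_second = sorted(pairs, key=lambda t: t[1])

records = [
    {'name': 'Kyurem', 'hp': 125},
    {'name': 'Rayquaza', 'hp': 105},
    {'name': 'GiratinaAltered Forme', 'hp': 150},
]

# The same idea applied to dictionaries, highest hp first.
by_hp_desc = sorted(records, key=lambda d: d['hp'], reverse=True)

print(by_first)
print(by_second)
print([d['name'] for d in by_hp_desc])
```

sorted is stable, so records with equal keys keep their original relative order, even with reverse=True.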

In [None]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, *funcs):
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)

    def head(self, n):
        return Clumper([self.blob[i] for i in range(n)])

    def tail(self, n):
        return Clumper([self.blob[-i] for i in range(1, n+1)])

    def select(self, *keys):
        return Clumper([{k: d[k] for k in keys} for d in self.blob])

    def mutate(self, **kwargs):
        data = self.blob
        for key, func in kwargs.items():
            for i in range(len(data)):
                data[i][key] = func(data[i])
        return Clumper(data)

    def sort(self, key, reverse=False):
        return Clumper(sorted(self.blob, key=key, reverse=reverse))

In [None]:
(Clumper(poke_dict)
  .keep(lambda d: 'Grass' in d['type'],
        lambda d: d['hp'] < 60)
  .mutate(ratio=lambda d: d['attack']/d['hp'])
  .select('name', 'ratio')
  .sort(lambda d: d['ratio'], reverse=True)
  .head(15)
  .blob)

It's easy to make changes to the analysis here because the chain reads from left to right and top to bottom. We can reorder the lines to reorder the steps, which makes it easy to reason about what is being applied to our data.

For nested, list-of-dictionaries data like this, a method-chain API can be more flexible than the pandas module, because every change is specified with a plain function. Its speed could be improved with further work on the class, though as pure Python code it stays within the limits of Python itself.
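Reordering steps can change the result, not just the layout. The sketch below uses a trimmed-down class with slice-based head (an assumption made to keep the example short) and four records from the outputs above to compare sorting before and after head.

```python
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def head(self, n):
        # Slice-based head, used here only to keep the sketch short.
        return Clumper(self.blob[:n])

    def sort(self, key, reverse=False):
        return Clumper(sorted(self.blob, key=key, reverse=reverse))

sample = [
    {'name': 'Rayquaza', 'hp': 105},
    {'name': 'Garchomp', 'hp': 108},
    {'name': 'GiratinaAltered Forme', 'hp': 150},
    {'name': 'Kyurem', 'hp': 125},
]

by_hp = lambda d: d['hp']

# Sort everything, then take the top two: the two highest hp values overall.
sort_then_head = Clumper(sample).sort(by_hp, reverse=True).head(2).blob

# Take the first two records, then sort just those two.
head_then_sort = Clumper(sample).head(2).sort(by_hp, reverse=True).blob

print([d['name'] for d in sort_then_head])
print([d['name'] for d in head_then_sort])
```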