![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fdata-science-and-artificial-intelligence&branch=main&subPath=06b-getting-data.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Getting Pokémon Data

This is an optional notebook to demonstrate how we can query the [PokéAPI](https://pokeapi.co/) site to get a large Pokémon data set and store it as a [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) file for use in the [Pokémon Data Analysis notebook](06-data-analysis-pokemon.ipynb).

We'll start by getting data about all Pokémon.
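PokéAPI list endpoints such as `https://pokeapi.co/api/v2/pokemon` return their results one page at a time. Each response has `count`, `next`, `previous` and `results` keys, where `next` is the URL of the following page (or `null` on the last page). The cell below avoids paging by requesting a very large `limit`. As an alternative, here is a minimal sketch that follows the `next` links instead. The function name `fetch_all_results` and its `get_json` parameter are hypothetical. They let the paging logic be tried on fake pages without a network connection.

```python
from typing import Callable

def fetch_all_results(first_url: str, get_json: Callable[[str], dict]) -> list:
    """Collect the 'results' from every page of a paginated PokéAPI list.

    get_json is any function that takes a URL and returns the parsed JSON,
    for example: lambda url: requests.get(url, timeout=30).json()
    """
    results = []
    url = first_url
    while url is not None:            # 'next' is None (null) on the last page
        page = get_json(url)
        results.extend(page['results'])
        url = page['next']
    return results

# try the paging logic on two fake pages shaped like PokéAPI list responses
fake_pages = {
    'page1': {'count': 3, 'next': 'page2', 'previous': None,
              'results': [{'name': 'bulbasaur', 'url': 'https://pokeapi.co/api/v2/pokemon/1/'},
                          {'name': 'ivysaur', 'url': 'https://pokeapi.co/api/v2/pokemon/2/'}]},
    'page2': {'count': 3, 'next': None, 'previous': 'page1',
              'results': [{'name': 'venusaur', 'url': 'https://pokeapi.co/api/v2/pokemon/3/'}]},
}
all_results = fetch_all_results('page1', fake_pages.get)
print([item['name'] for item in all_results])  # ['bulbasaur', 'ivysaur', 'venusaur']
```

With a network connection, passing a real URL such as `'https://pokeapi.co/api/v2/pokemon?limit=100'` and `lambda url: requests.get(url, timeout=30).json()` should gather the same list through several smaller responses.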

In [None]:
import requests
import pandas as pd

# get the name and detail URL of every Pokémon in one (large) page
r = requests.get('https://pokeapi.co/api/v2/pokemon?limit=100000')
df = pd.DataFrame(r.json()['results'])

# fields to keep from each Pokémon's detail page
fields = ['id', 'height', 'weight', 'base_experience', 'abilities', 'forms',
          'species', 'is_default', 'location_area_encounters', 'order', 'stats', 'types']
data = {field: [] for field in fields}

for i in range(len(df)):
    print(df['name'][i])
    try:
        resp = requests.get(df['url'][i], timeout=30).json()  # parse the response once
        row = {field: resp[field] for field in fields}
    except (requests.RequestException, ValueError, KeyError):
        # if the request or any lookup fails, record a row of None values
        # so every list stays aligned with the rows of df
        row = {field: None for field in fields}
    for field in fields:
        data[field].append(row[field])

# add the collected values as new columns
for field in fields:
    df[field] = data[field]
df = df.drop('url', axis=1) # drop url column

# split the stats column into multiple columns
df_stats = pd.DataFrame(df['stats'].tolist())
df_stats.columns = ['hp','attack','defense','special-attack','special-defense','speed']
for column in df_stats.columns:
    df_stats[column] = df_stats[column].apply(lambda x: x['base_stat'] if x is not None else None)
# join with the main dataframe
df = df.join(df_stats)
df = df.drop('stats', axis=1)
df = df.drop('location_area_encounters', axis=1)

# split the abilities into three columns (None where a Pokémon has fewer abilities)
abilities_lists = [[],[],[]]
for i in range(len(df)):
    for n in range(3):
        try:
            ability = df['abilities'][i][n]['ability']['name']
        except (TypeError, IndexError, KeyError):
            ability = None
        abilities_lists[n].append(ability)
df['ability1'] = abilities_lists[0]
df['ability2'] = abilities_lists[1]
df['ability3'] = abilities_lists[2]
df = df.drop('abilities', axis=1)

# species column is a dictionary, so we need to extract the name
df['species'] = df['species'].apply(lambda x: x['name'] if x is not None else None)

# the forms column is a list of dictionaries, so keep the name of the first form
df['forms'] = df['forms'].apply(lambda x: x[0]['name'] if x else None)

# split the types into two columns (type2 is None for single-type Pokémon)
types_lists = [[],[]]
for i in range(len(df)):
    for n in range(2):
        try:
            type_name = df['types'][i][n]['type']['name']
        except (TypeError, IndexError, KeyError):
            type_name = None
        types_lists[n].append(type_name)
df['type1'] = types_lists[0]
df['type2'] = types_lists[1]
df = df.drop('types', axis=1)

# convert the moves column to a stringified list
#df['moves'] = df['moves'].apply(lambda x: [move['move']['name'] for move in x] if x is not None else None)

# reorder the columns
df = df[['id','name','base_experience','height','weight','speed','hp','attack','defense','special-attack','special-defense','forms','species','is_default','order','type1','type2','ability1','ability2','ability3']]

df.head(3)
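The `abilities` and `types` columns hold lists of nested dictionaries, which is why the cell above digs through `df['types'][i][n]['type']['name']` and falls back to `None`. The same lookup can be written as a small reusable function. This is a sketch: `nth_name` is a hypothetical helper, and the sample is a trimmed, hand-written version of Bulbasaur's `types` entry (grass and poison) with the URLs left out.

```python
def nth_name(items, n, key):
    """Return items[n][key]['name'], or None if that entry doesn't exist.

    items is a PokéAPI-style list such as a Pokémon's 'types' or 'abilities';
    key is the nested dictionary to look in ('type' or 'ability').
    """
    try:
        return items[n][key]['name']
    except (TypeError, IndexError, KeyError):
        # TypeError: items is None (the request failed)
        # IndexError: fewer than n+1 entries (e.g. a single-type Pokémon)
        # KeyError: the expected nested key is missing
        return None

# trimmed sample of a 'types' entry (urls omitted)
bulbasaur_types = [{'slot': 1, 'type': {'name': 'grass'}},
                   {'slot': 2, 'type': {'name': 'poison'}}]
print(nth_name(bulbasaur_types, 0, 'type'))      # grass
print(nth_name(bulbasaur_types, 1, 'type'))      # poison
print(nth_name(bulbasaur_types[:1], 1, 'type'))  # None
print(nth_name(None, 0, 'type'))                 # None
```

Applied to the raw column, `df['types'].apply(lambda t: nth_name(t, 1, 'type'))` would give the same values as the `type2` loop.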

Now that we have a data set stored in `df`, we can save it to a CSV file. The `index=False` argument means the dataframe's index won't be written as a column in the file.

In [None]:
df.to_csv('data/pokemon.csv', index=False)
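To see what `index=False` changes, here is a small sketch on a made-up two-row DataFrame rather than the full data set. Calling `to_csv()` without a file path returns the CSV text as a string, which makes the difference easy to print.

```python
import io
import pandas as pd

sample = pd.DataFrame({'name': ['bulbasaur', 'ivysaur'], 'id': [1, 2]})

# with no path, to_csv returns the CSV text instead of writing a file
with_index = sample.to_csv()
without_index = sample.to_csv(index=False)
print(with_index.splitlines())     # [',name,id', '0,bulbasaur,1', '1,ivysaur,2']
print(without_index.splitlines())  # ['name,id', 'bulbasaur,1', 'ivysaur,2']

# reading the first version back turns the unnamed index into an extra column
print(list(pd.read_csv(io.StringIO(with_index)).columns))     # ['Unnamed: 0', 'name', 'id']
print(list(pd.read_csv(io.StringIO(without_index)).columns))  # ['name', 'id']
```

That leftover `Unnamed: 0` column is the usual sign that a file was saved with its index.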

## Pokémon Images

We can also display the official artwork of each Pokémon using its ID number. For example, let's display the first five Pokémon in our dataframe.

In [None]:
from IPython.display import Image, display

def display_pokemon(pokemon_id):
    # official artwork images are named by Pokémon ID, e.g. 25.png
    image_url = 'https://raw.githubusercontent.com/PokeAPI/sprites/master/sprites/pokemon/other/official-artwork/'+str(int(pokemon_id))+'.png'
    display(Image(url=image_url, width=200, height=200))

for n in range(5):
    display_pokemon(df['id'][n])
    print(df['name'][n])
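The artwork file names are Pokémon IDs, so the image should be looked up by `id` rather than by row position. If you only have the list endpoint's `url` column, the ID is the last part of that URL. A sketch with the hypothetical helper names `id_from_url` and `artwork_url`:

```python
BASE = ('https://raw.githubusercontent.com/PokeAPI/sprites/master/'
        'sprites/pokemon/other/official-artwork/')

def id_from_url(url):
    """Extract the numeric ID from a URL like 'https://pokeapi.co/api/v2/pokemon/25/'."""
    return int(url.rstrip('/').split('/')[-1])

def artwork_url(pokemon_id):
    """Build the official-artwork image URL used by display_pokemon above."""
    # int() turns a float ID such as 25.0 (pandas may store IDs as floats
    # if some values are missing) into '25' rather than '25.0'
    return BASE + str(int(pokemon_id)) + '.png'

pikachu_id = id_from_url('https://pokeapi.co/api/v2/pokemon/25/')
print(pikachu_id)                                # 25
print(artwork_url(pikachu_id).endswith('/25.png'))  # True
```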

## Species Characteristics

To retrieve Pokémon species characteristics, we can query the endpoint `https://pokeapi.co/api/v2/pokemon-species/{id or name}/`. We'll do that for every unique species name in our `df` data set and then clean up the data.

In [None]:
import requests
import pandas as pd
def get_species_characteristics(species):
    r = requests.get('https://pokeapi.co/api/v2/pokemon-species/'+species)
    resp = r.json()
    sc_dict = {}
    sc_dict['base_happiness'] = resp['base_happiness']
    sc_dict['capture_rate'] = resp['capture_rate']
    sc_dict['color'] = resp['color']['name']
    sc_dict['egg_groups'] = [egg_group['name'] for egg_group in resp['egg_groups']]
    sc_dict['evolution_chain'] = resp['evolution_chain']['url']
    sc_dict['evolves_from_species'] = resp['evolves_from_species']['name'] if resp['evolves_from_species'] is not None else None
    #sc_dict['flavor_text_entries'] = [flavor_text['flavor_text'] for flavor_text in resp['flavor_text_entries'] if flavor_text['language']['name'] == 'en']
    sc_dict['form_descriptions'] = [form_description['description'] for form_description in resp['form_descriptions'] if form_description['language']['name'] == 'en']
    sc_dict['forms_switchable'] = resp['forms_switchable']
    sc_dict['gender_rate'] = resp['gender_rate']
    sc_dict['generation'] = resp['generation']['name']
    sc_dict['growth_rate'] = resp['growth_rate']['name']
    sc_dict['habitat'] = resp['habitat']['name'] if resp['habitat'] is not None else None
    sc_dict['has_gender_differences'] = resp['has_gender_differences']
    sc_dict['hatch_counter'] = resp['hatch_counter']
    sc_dict['is_baby'] = resp['is_baby']
    sc_dict['is_legendary'] = resp['is_legendary']
    sc_dict['is_mythical'] = resp['is_mythical']
    sc_dict['name'] = resp['name']
    sc_dict['names'] = [name['name'] for name in resp['names'] if name['language']['name'] == 'en']
    sc_dict['order'] = resp['order']
    sc_dict['pal_park_encounters'] = [pal_park_encounter['area']['name'] for pal_park_encounter in resp['pal_park_encounters']]
    sc_dict['shape'] = resp['shape']['name'] if resp['shape'] is not None else None
    sc_dict['varieties'] = [variety['pokemon']['name'] for variety in resp['varieties']]
    return sc_dict

species_characteristics = {}
for species in df['species'].dropna().unique():  # skip rows whose Pokémon request failed
    print(species)
    species_characteristics[species] = get_species_characteristics(species)
sc = pd.DataFrame(species_characteristics).T

# use a numeric index, since we already have a name column
sc = sc.reset_index()
sc = sc.drop('index', axis=1)

# rename the name column to species
sc = sc.rename(columns={'name': 'species'})

# fill null values with 0 for capture_rate and base_happiness, and convert to integers
sc['base_happiness'] = sc['base_happiness'].fillna(0)
sc['base_happiness'] = sc['base_happiness'].astype(int)
sc['capture_rate'] = sc['capture_rate'].fillna(0)
sc['capture_rate'] = sc['capture_rate'].astype(int)

# split the egg_groups column into two columns
sc['egg_group1'] = sc['egg_groups'].apply(lambda x: x[0] if len(x) > 0 else None)
sc['egg_group2'] = sc['egg_groups'].apply(lambda x: x[1] if len(x) > 1 else None)
sc = sc.drop('egg_groups', axis=1)

# drop evolution_chain, form_descriptions, and names columns
sc = sc.drop(['evolution_chain', 'form_descriptions', 'names'], axis=1)

# remove 'generation-' and convert the Roman numeral to an integer
roman_numerals = {'i': 1, 'ii': 2, 'iii': 3, 'iv': 4, 'v': 5, 'vi': 6, 'vii': 7, 'viii': 8, 'ix': 9}
sc['generation'] = sc['generation'].apply(lambda x: roman_numerals[x.replace('generation-', '')])
sc['generation'] = sc['generation'].astype(int)

# calculate female and male percentages
sc['%_female'] = sc['gender_rate'].apply(lambda x: 0 if x == -1 else float(12.5 * x))
sc['%_male'] = sc['gender_rate'].apply(lambda x: 0 if x == -1 else 100 - float(12.5 * x))
sc = sc.drop('gender_rate', axis=1)

# convert 'pal_park_encounters' column to a string
sc['pal_park_encounters'] = sc['pal_park_encounters'].apply(lambda x: ', '.join(x))

# reorder columns
#sc = sc[['species','order','base_happiness','capture_rate','color','evolves_from_species','forms_switchable','generation','growth_rate','habitat','shape','%_female','%_male','has_gender_differences','hatch_counter','egg_group1','egg_group2','is_baby','is_legendary','is_mythical','pal_park_encounters','varieties']]

sc
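In PokéAPI, `gender_rate` is the chance of a Pokémon being female in eighths, with `-1` meaning genderless. That is where the factor of 12.5 (= 100 / 8) in the cell above comes from. A small standalone version of the same conversion, under the hypothetical name `gender_percentages`:

```python
def gender_percentages(gender_rate):
    """Convert a PokéAPI gender_rate (eighths female, -1 = genderless)
    to (percent female, percent male), using 0 and 0 for genderless
    species as the notebook does."""
    if gender_rate == -1:
        return 0.0, 0.0
    female = 12.5 * gender_rate   # each eighth is 12.5 percent
    return female, 100 - female

print(gender_percentages(1))   # (12.5, 87.5)
print(gender_percentages(4))   # (50.0, 50.0)
print(gender_percentages(8))   # (100.0, 0.0)
print(gender_percentages(-1))  # (0.0, 0.0)
```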

This new set of data has a `species` column that we can use to merge it into the original Pokémon data set from above.

Both data sets have an `order` column, so pandas will label them `order_x` and `order_y` in the merged data. We'll rename those after the merge.

In [None]:
df = df.merge(sc, on='species')
df = df.rename(columns={'order_x': 'pokemon_order', 'order_y': 'species_order'})
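Pandas adds the suffixes `_x` (left) and `_y` (right) to every non-key column that appears in both DataFrames, which is why the cell above renames `order_x` and `order_y`. The `suffixes` argument of `merge` can choose the endings directly, although it only appends text, so it gives names like `order_pokemon` rather than `pokemon_order`. A sketch on two tiny made-up DataFrames (the `order` values are illustrative only):

```python
import pandas as pd

pokemon = pd.DataFrame({'name': ['bulbasaur', 'ivysaur'],
                        'species': ['bulbasaur', 'ivysaur'],
                        'order': [1, 2]})
species = pd.DataFrame({'species': ['bulbasaur', 'ivysaur'],
                        'order': [1, 2]})

# default suffixes: the shared 'order' column becomes order_x and order_y
default = pokemon.merge(species, on='species')
print(list(default.columns))  # ['name', 'species', 'order_x', 'order_y']

# custom suffixes name both columns in one step
named = pokemon.merge(species, on='species', suffixes=('_pokemon', '_species'))
print(list(named.columns))    # ['name', 'species', 'order_pokemon', 'order_species']
```

Note that `merge` defaults to an inner join, so any row of `df` whose species has no match in `sc` is left out of the result.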

Once again, we can export to a CSV file.

In [None]:
df.to_csv('data/pokemon.csv', index=False)

## Types

There are also various Pokémon types. We can get a list of them and clean it up as well.

In [1]:
import requests
import pandas as pd
r = requests.get('https://pokeapi.co/api/v2/type?limit=100000')
types = pd.DataFrame(r.json()['results'])

damage_relations = []
game_indices = []
generation = []
move_damage_class = []
moves = []
pokemon = []
name = []
names = []
for i in range(len(types)):
    r = requests.get(types['url'][i])
    resp = r.json()
    damage_relations.append(resp['damage_relations'])
    game_indices.append(resp['game_indices'])
    generation.append(resp['generation'])
    move_damage_class.append(resp['move_damage_class'])
    moves.append(resp['moves'])
    pokemon.append(resp['pokemon'])
    name.append(resp['name'])
    names.append(resp['names'])
types['damage_relations'] = damage_relations
types['game_indices'] = game_indices
types['generation'] = generation
types['move_damage_class'] = move_damage_class
types['moves'] = moves
types['pokemon'] = pokemon
types['name'] = name
types['names'] = names

# drop the url, game_indices, and names columns
types = types.drop(['url', 'game_indices', 'names'], axis=1)

# keep the generation name, remove 'generation-', and convert the Roman numeral to an integer
roman_numerals = {'i': 1, 'ii': 2, 'iii': 3, 'iv': 4, 'v': 5, 'vi': 6, 'vii': 7, 'viii': 8, 'ix': 9}
types['generation'] = types['generation'].apply(lambda x: roman_numerals[x['name'].replace('generation-', '')])
types['generation'] = types['generation'].astype(int)

# keep only the move names (a list for each type)
types['moves'] = types['moves'].apply(lambda x: [item['name'] for item in x])

# convert 'move_damage_class' column to a string
types['move_damage_class'] = types['move_damage_class'].apply(lambda x: x['name'] if x else None)

# keep only the Pokémon names (a list for each type)
types['pokemon'] = types['pokemon'].apply(lambda x: [item['pokemon']['name'] for item in x])

# split the damage_relations column into 6 columns
types['double_damage_from'] = types['damage_relations'].apply(lambda x: [item['name'] for item in x['double_damage_from']])
types['double_damage_to'] = types['damage_relations'].apply(lambda x: [item['name'] for item in x['double_damage_to']])
types['half_damage_from'] = types['damage_relations'].apply(lambda x: [item['name'] for item in x['half_damage_from']])
types['half_damage_to'] = types['damage_relations'].apply(lambda x: [item['name'] for item in x['half_damage_to']])
types['no_damage_from'] = types['damage_relations'].apply(lambda x: [item['name'] for item in x['no_damage_from']])
types['no_damage_to'] = types['damage_relations'].apply(lambda x: [item['name'] for item in x['no_damage_to']])
types = types.drop('damage_relations', axis=1)

types

| | name | generation | move_damage_class | moves | pokemon | double_damage_from | double_damage_to | half_damage_from | half_damage_to | no_damage_from | no_damage_to |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | normal | 1 | physical | [pound, double-slap, comet-punch, mega-punch, ... | [pidgey, pidgeotto, pidgeot, rattata, raticate... | [fighting] | [] | [] | [rock, steel] | [ghost] | [ghost] |
| 1 | fighting | 1 | physical | [karate-chop, double-kick, jump-kick, rolling-... | [mankey, primeape, poliwrath, machop, machoke,... | [flying, psychic, fairy] | [normal, rock, steel, ice, dark] | [rock, bug, dark] | [flying, poison, bug, psychic, fairy] | [] | [ghost] |
| 2 | flying | 1 | physical | [gust, wing-attack, fly, peck, drill-peck, mir... | [charizard, butterfree, pidgey, pidgeotto, pid... | [rock, electric, ice] | [fighting, bug, grass] | [fighting, bug, grass] | [rock, steel, electric] | [ground] | [] |
| 3 | poison | 1 | physical | [poison-sting, acid, poison-powder, toxic, smo... | [bulbasaur, ivysaur, venusaur, weedle, kakuna,... | [ground, psychic] | [grass, fairy] | [fighting, poison, bug, grass, fairy] | [poison, ground, rock, ghost] | [] | [steel] |
| 4 | ground | 1 | physical | [sand-attack, earthquake, fissure, dig, bone-c... | [sandshrew, sandslash, nidoqueen, nidoking, di... | [water, grass, ice] | [poison, rock, steel, fire, electric] | [poison, rock] | [bug, grass] | [electric] | [flying] |
| 5 | rock | 1 | physical | [rock-throw, rock-slide, sandstorm, rollout, a... | [geodude, graveler, golem, onix, rhyhorn, rhyd... | [fighting, ground, steel, water, grass] | [flying, bug, fire, ice] | [normal, flying, poison, fire] | [fighting, ground, steel] | [] | [] |
| 6 | bug | 1 | physical | [twineedle, pin-missile, string-shot, leech-li... | [caterpie, metapod, butterfree, weedle, kakuna... | [flying, rock, fire] | [grass, psychic, dark] | [fighting, ground, grass] | [fighting, flying, poison, ghost, steel, fire,... | [] | [] |
| 7 | ghost | 1 | physical | [night-shade, confuse-ray, lick, nightmare, cu... | [gastly, haunter, gengar, misdreavus, shedinja... | [ghost, dark] | [ghost, psychic] | [poison, bug] | [dark] | [normal, fighting] | [normal] |
| 8 | steel | 2 | physical | [steel-wing, iron-tail, metal-claw, meteor-mas... | [magnemite, magneton, forretress, steelix, sci... | [fighting, ground, fire] | [rock, ice, fairy] | [normal, flying, rock, bug, steel, grass, psyc... | [steel, fire, water, electric] | [poison] | [] |
| 9 | fire | 1 | special | [fire-punch, ember, flamethrower, fire-spin, f... | [charmander, charmeleon, charizard, vulpix, ni... | [ground, rock, water] | [bug, steel, grass, ice] | [bug, steel, fire, grass, ice, fairy] | [rock, fire, water, dragon] | [] | [] |
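The `double_damage_to`, `half_damage_to` and `no_damage_to` columns are enough to work out how effective an attacking type is against a defender. In the main-series games the multiplier for each of the defender's types (2, 0.5 or 0, otherwise 1) is multiplied together. A minimal sketch, using the fire row from the output above typed in by hand. `effectiveness` is a hypothetical helper, not part of PokéAPI or pandas.

```python
def effectiveness(attack_relations, defending_types):
    """Multiply the damage multipliers of one attacking type against
    each of the defender's types (1 if no relation is listed)."""
    multiplier = 1.0
    for defending_type in defending_types:
        if defending_type in attack_relations['no_damage_to']:
            multiplier *= 0
        elif defending_type in attack_relations['double_damage_to']:
            multiplier *= 2
        elif defending_type in attack_relations['half_damage_to']:
            multiplier *= 0.5
    return multiplier

# the fire row from the output above
fire = {'double_damage_to': ['bug', 'steel', 'grass', 'ice'],
        'half_damage_to': ['rock', 'fire', 'water', 'dragon'],
        'no_damage_to': []}

print(effectiveness(fire, ['grass']))           # 2.0
print(effectiveness(fire, ['water']))           # 0.5
print(effectiveness(fire, ['grass', 'steel']))  # 4.0
print(effectiveness(fire, ['rock', 'bug']))     # 1.0
```

With the full `types` DataFrame, a row such as `types[types['name'] == 'fire'].iloc[0]` has these same keys and could be passed in directly.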


Finally, we export the types data to a CSV file.

In [None]:
types.to_csv('data/pokemon_types.csv', index=False)
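One thing to watch when reading these files back: columns such as `moves`, `pokemon` and `double_damage_from` here (and `varieties` in `pokemon.csv`) hold Python lists, and `to_csv` writes each list as its text form, for example `['fighting']`. After `pd.read_csv` those cells are plain strings. One way to turn them back into lists is the standard library's `ast.literal_eval`, sketched here on a tiny in-memory example rather than the real file:

```python
import ast
import io
import pandas as pd

sample = pd.DataFrame({'name': ['normal', 'ghost'],
                       'no_damage_from': [['ghost'], ['normal', 'fighting']]})
csv_text = sample.to_csv(index=False)

loaded = pd.read_csv(io.StringIO(csv_text))
print(type(loaded['no_damage_from'][0]))  # <class 'str'>

# parse each string like "['normal', 'fighting']" back into a list
loaded['no_damage_from'] = loaded['no_damage_from'].apply(ast.literal_eval)
print(loaded['no_damage_from'][1])        # ['normal', 'fighting']
```

`ast.literal_eval` only accepts Python literals, so unlike `eval` it won't run arbitrary code found in a file.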

You can now use these data sets in the [Data Analysis notebook](06-data-analysis.ipynb), or go on to the [next notebook](07-data-logging.ipynb), which introduces recording and using your own data.

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)