In [1]:
# This is Lecture03 - Exercise 1
# of the "Data Science" class at Technische Hochschule Rosenheim

# Pokemon Dataset

In this exercise you will be analysing a dataset about Pokemon.
<img src="figures/pokemon.png" alt="Pikachu" align="right" width="200"/>

### Intro to Pokemon
Pokémon is a media franchise managed by The Pokémon Company, a Japanese consortium between Nintendo, Game Freak, and Creatures. While the franchise copyright is shared by all three companies, Nintendo is the sole owner of the trademark. The franchise was created by Satoshi Tajiri in 1995, and is centered on fictional creatures called "Pokémon", which humans, known as Pokémon Trainers, catch and train to battle each other for sport.

The name Pokémon is the romanized contraction of the Japanese brand Pocket Monsters. The term Pokémon, in addition to referring to the Pokémon franchise itself, also collectively refers to the 721 known fictional species that have made appearances in Pokémon media as of the release of the sixth generation titles Pokémon X and Y. "Pokémon" is identical in both the singular and plural, as is each individual species name; it is grammatically correct to say "one Pokémon" and "many Pokémon", as well as "one Pikachu" and "many Pikachu". (source: wikipedia.com)

### The Dataset
You can find the data in three files in the 'data' directory. These are the raw attributes that are used for calculating how much damage an attack will do in the games. This dataset is about the Pokemon games (NOT Pokemon cards or Pokemon Go).

1) 'pokemon_ids.npy' contains an int ndarray of the ID for each Pokemon

2) 'pokemon_names.npy' contains a string ndarray of the name of each Pokemon (Hint: to `load` this file, you need to set the named parameter `allow_pickle` to `True`, as strings are objects)

3) 'pokemon_stats.npy' contains an ndarray with one row per Pokemon and 6 columns, in this order:

- HP: hit points, or health, defines how much damage a Pokemon can withstand before fainting
- Attack: the base modifier for normal attacks (e.g. Scratch, Punch)
- Defense: the base damage resistance against normal attacks
- SP Atk: special attack, the base modifier for special attacks (e.g. fire blast, bubble beam)
- SP Def: the base damage resistance against special attacks
- Speed: determines which Pokemon attacks first each round


### Exercises 1

* 1a) load the dataset into three numpy arrays 'ids', 'names' and 'stats'. Verify the datatype of each array.

* 1b) how many rows do you expect the arrays to have? verify your assumption!

* 1c) inspect the first 10 rows and the last 10 rows - do you notice anything important? Find an explanation for your observation!


In [2]:
## ---------- SOLUTION 1a

In [3]:
import numpy as np

In [4]:
ids = np.load('data/pokemon_ids.npy')

In [5]:
names = np.load('data/pokemon_names.npy', allow_pickle=True)

In [6]:
stats = np.load('data/pokemon_stats.npy')

In [7]:
print(ids.dtype, names.dtype, stats.dtype)

int64 object int64
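
The `object` dtype of `names` is the reason for the `allow_pickle=True` hint. The following is a minimal sketch of that behaviour, not part of the exercise data: `sample_names` and the temporary file `sample_names.npy` are hypothetical stand-ins for 'pokemon_names.npy', and it assumes a NumPy version in which `np.load` defaults to `allow_pickle=False`.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for 'pokemon_names.npy': a small object array of strings.
sample_names = np.array(['Bulbasaur', 'Ivysaur', 'Venusaur'], dtype=object)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample_names.npy')
    np.save(path, sample_names)  # object arrays are stored with pickle

    try:
        np.load(path)  # no allow_pickle given, so loading the object array is refused
        load_failed = False
    except ValueError:
        load_failed = True

    # Explicitly opting in makes the same file loadable.
    loaded = np.load(path, allow_pickle=True)

print(load_failed)
print(loaded.dtype, loaded)
```

Pickled files can execute arbitrary code when loaded, so `allow_pickle=True` is best reserved for files from a source you trust, such as the course data here.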


In [8]:
## ---------- SOLUTION 1b

In [9]:
# we expect 721 rows, one for each of the 721 known species mentioned in the intro

In [10]:
print(len(names), len(stats), len(ids))

800 800 800


In [11]:
print(names.shape, stats.shape, ids.shape)

(800,) (800, 6) (800,)


In [12]:
# oops, we got 800 rows instead of 721, that is 79 more than expected. this is strange!

In [13]:
## ---------- SOLUTION 1c

In [14]:
print(names[:10])
print(names[-10:])

['Bulbasaur' 'Ivysaur' 'Venusaur' 'VenusaurMega Venusaur' 'Charmander'
 'Charmeleon' 'Charizard' 'CharizardMega Charizard X'
 'CharizardMega Charizard Y' 'Squirtle']
['Noibat' 'Noivern' 'Xerneas' 'Yveltal' 'Zygarde50% Forme' 'Diancie'
 'DiancieMega Diancie' 'HoopaHoopa Confined' 'HoopaHoopa Unbound'
 'Volcanion']


In [15]:
print(ids[:10])
print(ids[-10:])

[1 2 3 3 4 5 6 6 6 7]
[714 715 716 717 718 719 719 720 720 721]


In [16]:
print(stats[:10])
print(stats[-10:])

[[ 45  49  49  65  65  45]
 [ 60  62  63  80  80  60]
 [ 80  82  83 100 100  80]
 [ 80 100 123 122 120  80]
 [ 39  52  43  60  50  65]
 [ 58  64  58  80  65  80]
 [ 78  84  78 109  85 100]
 [ 78 130 111 130  85 100]
 [ 78 104  78 159 115 100]
 [ 44  48  65  50  64  43]]
[[ 40  30  35  45  40  55]
 [ 85  70  80  97  80 123]
 [126 131  95 131  98  99]
 [126 131  95 131  98  99]
 [108 100 121  81  95  95]
 [ 50 100 150 100 150  50]
 [ 50 160 110 160 110 110]
 [ 80 110  60 150 130  70]
 [ 80 160  60 170 130  80]
 [ 80 110 120 130  90  70]]


In [17]:
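# to notice: the ID is not unique; several rows share an ID, and their names start with the
# same species name (e.g. 'Venusaur' and 'VenusaurMega Venusaur').
# We do have 721 IDs as expected, but 800 rows.
# Doing some research, we find out that some Pokemon can be temporarily changed to a different form
# (for one fight) with a "mega stone"; after the fight, they revert to the original form. These temporary
# forms ("mega evolutions") have the same ID as the original form.
# The last rows show that other alternate forms share an ID in the same way
# (e.g. 'HoopaHoopa Confined' and 'HoopaHoopa Unbound', both with ID 720).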
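
The observation about repeated IDs can also be checked programmatically. The sketch below uses only the first ten IDs and names copied from the printed output above, so it runs without the data files; the names `sample_ids`, `sample_names`, `repeated_ids` and `has_repeated_id` are introduced here for illustration.

```python
import numpy as np

# First ten IDs and names, copied from the printed output above.
sample_ids = np.array([1, 2, 3, 3, 4, 5, 6, 6, 6, 7])
sample_names = np.array(
    ['Bulbasaur', 'Ivysaur', 'Venusaur', 'VenusaurMega Venusaur', 'Charmander',
     'Charmeleon', 'Charizard', 'CharizardMega Charizard X',
     'CharizardMega Charizard Y', 'Squirtle'],
    dtype=object,
)

# Distinct IDs and how often each one occurs.
unique_ids, counts = np.unique(sample_ids, return_counts=True)

# IDs that occur more than once, and a boolean mask of the rows that carry them.
repeated_ids = unique_ids[counts > 1]
has_repeated_id = np.isin(sample_ids, repeated_ids)

print(len(sample_ids), len(unique_ids))  # number of rows vs. number of distinct IDs
print(repeated_ids)
print(sample_names[has_repeated_id])
```

Applied to the full arrays, `len(np.unique(ids))` would be the direct way to compare the number of distinct IDs with the 800 rows.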

### Exercises 2

* 2a) find the ids and names of all Pokemon with hit points (HP) above 150
* 2b) find the names of all Pokemon that have a higher Attack than Defense (the "attackers")

In [18]:
## ---------- SOLUTION 2a

In [19]:
hp = stats[:, 0] # create a 1-dim array of the hp values

In [20]:
large_hp_idx = (hp>150) # create a boolean 1-dim array of the hps above 150 for indexing

In [21]:
print(ids[large_hp_idx])
print(names[large_hp_idx])

[113 143 202 242 321 594]
['Chansey' 'Snorlax' 'Wobbuffet' 'Blissey' 'Wailord' 'Alomomola']


In [22]:
## ---------- SOLUTION 2b

In [23]:
# concise solution
attacker_idx = stats[:, 1] > stats[:, 2]

In [24]:
# easier to understand solution
attack = stats[:, 1]
defense = stats[:, 2]
attacker_idx = attack > defense

In [25]:
print(f'We have {attacker_idx.sum()} attackers, the first 20 are: {names[attacker_idx][:20]}')

We have 433 attackers, the first 20 are: ['Charmander' 'Charmeleon' 'Charizard' 'CharizardMega Charizard X'
 'CharizardMega Charizard Y' 'Weedle' 'Beedrill' 'BeedrillMega Beedrill'
 'Pidgey' 'Pidgeotto' 'Pidgeot' 'Rattata' 'Raticate' 'Spearow' 'Fearow'
 'Ekans' 'Arbok' 'Pikachu' 'Raichu' 'Nidoqueen']
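
To see how the boolean mask in 2b works row by row, here is a small sketch on the first ten rows of `stats` and `names`, copied from the output of exercise 1c. The column constants `ATTACK` and `DEFENSE` and the `sample_*` names are introduced here for readability and are not part of the dataset.

```python
import numpy as np

# Column positions as described in "The Dataset" (names chosen for this sketch).
ATTACK, DEFENSE = 1, 2

# First ten rows of stats and names, copied from the output of exercise 1c.
sample_stats = np.array([
    [45, 49, 49, 65, 65, 45],
    [60, 62, 63, 80, 80, 60],
    [80, 82, 83, 100, 100, 80],
    [80, 100, 123, 122, 120, 80],
    [39, 52, 43, 60, 50, 65],
    [58, 64, 58, 80, 65, 80],
    [78, 84, 78, 109, 85, 100],
    [78, 130, 111, 130, 85, 100],
    [78, 104, 78, 159, 115, 100],
    [44, 48, 65, 50, 64, 43],
])
sample_names = np.array(
    ['Bulbasaur', 'Ivysaur', 'Venusaur', 'VenusaurMega Venusaur', 'Charmander',
     'Charmeleon', 'Charizard', 'CharizardMega Charizard X',
     'CharizardMega Charizard Y', 'Squirtle'],
    dtype=object,
)

# Element-wise comparison of two columns gives one boolean per row.
sample_attacker_idx = sample_stats[:, ATTACK] > sample_stats[:, DEFENSE]

print(sample_attacker_idx)
print(sample_attacker_idx.sum())  # True counts as 1, so the sum is the number of attackers
print(sample_names[sample_attacker_idx])
```

Note that Bulbasaur (Attack 49, Defense 49) is not selected, because the comparison is strict.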


---