![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Pokémon

Do you know them all?

<img src="https://upload.wikimedia.org/wikipedia/commons/9/98/International_Pok%C3%A9mon_logo.svg" alt="Pokemon Logo" style="width: 730px;"/>
<div style="font-size:10px;"><a href="https://en.wikipedia.org/wiki/File:International_Pok%C3%A9mon_logo.svg">en.wikipedia.org/wiki/File:International_Pok%C3%A9mon_logo.svg</a></div>
    
[Pokémon](https://en.wikipedia.org/wiki/Pok%C3%A9mon), an abbreviation of **Pocket Monsters**, is a media franchise managed by a Japanese consortium. The name refers both to the franchise itself and, collectively, to the fictional species that appear in its digital and print publications, films, and television series.

On a regular day you might be collecting Pokémon in your video games, playing cards, or watching the animated series. In this hackathon, however, we will try to learn more about various Pokémon using data. Hopefully you will make some interesting findings while learning some new skills.

## Getting Ready

This section sets up things behind the scenes that the rest of this notebook needs. Most of the code cells in this section are ready to run, so you won't have to modify them. You don't need to understand everything these cells do to complete the challenges, but feel free to ask mentors about anything that makes you curious.

### Importing Libraries

`▸Run` the cell below to import the required Python libraries.

In [None]:
%pip install -q pyodide_http plotly
import pyodide_http
pyodide_http.patch_all()
import pandas as pd
import plotly.express as px
print('Setup Complete')

### Importing Data

This Pokémon dataset is from [Kaggle](https://www.kaggle.com/rounakbanik/pokemon), an online community of data scientists and machine learning practitioners. We've stored a copy of the dataset on GitHub.

In [None]:
pokemon = pd.read_csv('https://raw.githubusercontent.com/callysto/data-files/main/hackathon/pokemon.csv')
pokemon

Let's have a look at the column names.

In [None]:
pokemon.columns

Here is the description for some of the columns from Kaggle:

- **name**: The English name of the Pokemon
- **japanese_name**: The Original Japanese name of the Pokemon
- **pokedex_number**: The entry number of the Pokemon in the National Pokedex
- **percentage_male**: The percentage of the species that are male. Blank if the Pokemon is genderless.
- **type1**: The Primary Type of the Pokemon
- **type2**: The Secondary Type of the Pokemon
- **classfication**: The Classification of the Pokemon as described by the Sun and Moon Pokedex
- **height_m**: Height of the Pokemon in metres
- **weight_kg**: The Weight of the Pokemon in kilograms
- **capture_rate**: Capture Rate of the Pokemon
- **base_egg_steps**: The number of steps required to hatch an egg of the Pokemon
- **abilities**: A stringified list of abilities that the Pokemon is capable of having
- **experience_growth**: The Experience Growth of the Pokemon
- **base_happiness**: Base Happiness of the Pokemon
- **against_?**: Eighteen features that denote the amount of damage taken against an attack of a particular type
- **hp**: The Base HP of the Pokemon
- **attack**: The Base Attack of the Pokemon
- **defense**: The Base Defense of the Pokemon
- **sp_attack**: The Base Special Attack of the Pokemon
- **sp_defense**: The Base Special Defense of the Pokemon
- **speed**: The Base Speed of the Pokemon
- **generation**: The numbered generation in which the Pokemon was first introduced
- **is_legendary**: Denotes if the Pokemon is legendary.
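
Several of these descriptions mention blank values, for example `percentage_male` for genderless Pokémon. As a minimal sketch of how you might count blanks, the cell below uses a small hand-typed `sample` table (an illustrative stand-in, not the real dataset) and the pandas `isna` method. On the full data you could try the same call on `pokemon`.

```python
import pandas as pd

# Hypothetical three-row sample that mimics a few columns of the dataset.
# The values are typed in by hand for illustration only.
sample = pd.DataFrame({
    'name': ['Bulbasaur', 'Magnemite', 'Pikachu'],
    'type2': ['poison', 'steel', None],       # None stands for "no secondary type"
    'percentage_male': [88.1, None, 50.0],    # None stands for "genderless"
})

# isna() marks each missing cell as True; sum() counts the True values per column
missing_counts = sample.isna().sum()
print(missing_counts)
```

Columns with a count above zero contain blanks, which is worth knowing before you calculate averages or draw graphs with them.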

## Sorting the Data

As an example, let's get to know more about Pokémon from their physical characteristics.

First we will select the columns `name`, `weight_kg`, and `height_m`, then sort by `weight_kg` from heaviest to lightest and display the top 10.

In [None]:
pokemon_by_weight = pokemon[['name','weight_kg','height_m']]
pokemon_by_weight = pokemon_by_weight.sort_values(by=['weight_kg'],ascending=False)
pokemon_by_weight.head(10)
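
As an alternative sketch, pandas also has `nlargest` and `nsmallest`, which sort and take the top rows in one step. The cell below assumes a small hand-typed `sample` table rather than the full dataset, so the names and numbers are only there for illustration.

```python
import pandas as pd

# Hypothetical sample with the same three columns used above
sample = pd.DataFrame({
    'name': ['Pikachu', 'Snorlax', 'Cosmoem', 'Gastly'],
    'weight_kg': [6.0, 460.0, 999.9, 0.1],
    'height_m': [0.4, 2.1, 0.1, 1.3],
})

# The two heaviest rows, heaviest first
heaviest = sample.nlargest(2, 'weight_kg')
# The two lightest rows, lightest first
lightest = sample.nsmallest(2, 'weight_kg')

print(heaviest)
print(lightest)
```

With the real data, something like `pokemon_by_weight.nlargest(10, 'weight_kg')` should give the same ten Pokémon as sorting and then calling `head(10)`, although rows with equal weights may come out in a different order.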

## Visualizing the Data

Let's make a bar graph of the ten heaviest Pokémon.

In [None]:
px.bar(pokemon_by_weight.head(10), x='name', y='weight_kg', title='Heaviest Pokemon by Weight')

We can also make a scatterplot of height vs weight.

In [None]:
px.scatter(pokemon, x='weight_kg', y='height_m', hover_data=['name'], title='Pokemon Height versus Weight')

To make it more interesting, let's color-code them by primary type.

In [None]:
px.scatter(pokemon, x='weight_kg', y='height_m', color='type1', hover_data=['name'], title='Pokemon Height versus Weight')

Interestingly, [Cosmoem](https://www.pokemon.com/us/pokedex/cosmoem) is very heavy for its small height.

In the graph that you just made you can also click on the primary types in the legend at the right to hide or show those Pokémon.
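
One way to check the Cosmoem observation with numbers instead of by eye is to divide weight by height. The sketch below does this on a small hand-typed `sample` table. The column name `kg_per_m` is made up for this example and is not part of the dataset.

```python
import pandas as pd

# Hypothetical sample, typed in by hand for illustration
sample = pd.DataFrame({
    'name': ['Pikachu', 'Snorlax', 'Cosmoem', 'Celesteela'],
    'weight_kg': [6.0, 460.0, 999.9, 999.9],
    'height_m': [0.4, 2.1, 0.1, 9.2],
})

# kg_per_m is an invented helper column: kilograms per metre of height
sample['kg_per_m'] = sample['weight_kg'] / sample['height_m']

# Sort so the Pokémon that is heaviest for its height comes first
ranked = sample.sort_values(by='kg_per_m', ascending=False)
print(ranked)
```

The same two lines could be tried on the full `pokemon` table to see which other Pokémon are unusually heavy for their height.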

### Counting Pokémon by Type

Let's list the primary types of Pokémon.

In [None]:
pokemon_types1 = pokemon['type1'].unique()
for pokemon_type in pokemon_types1:
    print(pokemon_type)

We can create a [histogram](https://en.wikipedia.org/wiki/Histogram), which is like a bar chart showing how many Pokémon there are of each type.

In [None]:
px.histogram(pokemon, x='type1', title='Number of Pokemon by Primary Type')

Here is a similar histogram by secondary type.

In [None]:
px.histogram(pokemon, x='type2', title='Number of Pokemon by Secondary Type')
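
If you would like the counts behind a histogram as a table, the pandas `value_counts` method is one option. This is a minimal sketch on a hand-typed `sample` table, not the full dataset.

```python
import pandas as pd

# Hypothetical sample, typed in by hand for illustration
sample = pd.DataFrame({
    'name': ['Squirtle', 'Psyduck', 'Charmander', 'Pikachu', 'Magikarp'],
    'type1': ['water', 'water', 'fire', 'electric', 'water'],
})

# Count how many rows have each primary type, most common first
type_counts = sample['type1'].value_counts()
print(type_counts)
```

By default `value_counts` skips missing values, which matters for `type2` because not every Pokémon has a secondary type.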

### Attack and Defense by Primary Type

Let's plot the attack and defense values of Pokémon, color-coded by their primary type.

In [None]:
px.scatter(pokemon, x='attack', y='defense', color='type1', hover_data=['name'], title='Pokemon Defence versus Attack')

Once again, you can filter out types by clicking on the labels in the legend.

### Average Attack and Defense by Primary Type

To compare Pokémon with different primary types we can calculate their [average](https://en.wikipedia.org/wiki/Average) attack and defense capabilities and visualize them.

In [None]:
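# Average attack and defense for each primary type
attack_defence = pokemon[['type1', 'attack', 'defense']].groupby('type1').mean()
# Number of Pokémon of each primary type
number_of_pokemon = pokemon[['type1', 'name']].groupby('type1').count()
# Combine the two tables and rename the count column to 'number'
attack_defence = attack_defence.join(number_of_pokemon, on='type1').rename(columns={'name':'number'})
types = attack_defence.index
# Marker size shows how many Pokémon have that primary type
px.scatter(attack_defence, x='attack', y='defense', color=types, size='number', title='Pokemon Defence versus Attack by Type')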
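
The averages and the counts can also be calculated in a single step with the pandas `agg` method. The sketch below is an alternative, not a replacement for the cell above, and it uses a small hand-typed `sample` table so you can check the arithmetic yourself.

```python
import pandas as pd

# Hypothetical sample, typed in by hand for illustration
sample = pd.DataFrame({
    'name': ['Squirtle', 'Psyduck', 'Charmander', 'Vulpix'],
    'type1': ['water', 'water', 'fire', 'fire'],
    'attack': [48, 52, 52, 41],
    'defense': [65, 48, 43, 40],
})

# For each primary type: mean attack, mean defense, and how many rows there are.
# Each keyword is the new column name, followed by (source column, calculation).
summary = sample.groupby('type1').agg(
    attack=('attack', 'mean'),
    defense=('defense', 'mean'),
    number=('name', 'count'),
)
print(summary)
```

For example, the two water types have attack values of 48 and 52, so their average attack is 50.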

There are a lot of other columns of data to explore in this dataset. You can continue your own analysis in the [next notebook](pokemon-challenge.ipynb).

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)