![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Pokémon - Do you know them all?

**Submitted by: A, B, C, D**

<table><tr>
<td> <img src="data/pokemon.png" alt="Pokemon Logo" style="width: 730px;"/> </td>
<td> <img src="data/cosmoem.jpeg" alt="Cosmoem" style="width: 270px;"/> </td>
</tr></table>
    
[Pokémon](https://en.wikipedia.org/wiki/Pok%C3%A9mon), an abbreviation of **Pocket Monsters**, is a media franchise managed by a Japanese consortium. The name refers to the franchise itself, but it also collectively refers to the fictional species that appear in the franchise's digital and print publications, films, and television series.

On a regular day you might be collecting Pokémon in your video games, playing cards, or watching the animated series. In this hackathon, however, we will use data to learn more about various Pokémon. Hopefully you will come across some interesting findings while learning some new coding/hacking skills.

## Getting ready

This section sets up several things behind the scenes that the rest of this notebook requires. Most of the code cells in this section are ready to run, so you won't need to modify them. You don't need to understand every task these cells accomplish to complete the challenges, but feel free to ask mentors about anything you are curious about.

### 1. Install/Import libraries

Run the cell below to download and install the required Python libraries. It may take a few minutes for the cell to finish running.

In [None]:
!pip install cufflinks ipywidgets chart-studio

Run the next few cells to load libraries and helper functions that will help us complete the challenges later.

In [None]:
# load libraries and helper code
import pandas as pd
import chart_studio.plotly as py
import cufflinks as cf
from IPython.display import display, HTML
from plotly.offline import init_notebook_mode
cf.go_offline()

# color palette with 22 distinct colors (more than the number of primary types)
colors20 = ['#e6194b', '#3cb44b', '#ffe119', '#4363d8', '#f58231', '#911eb4', '#46f0f0', 
          '#f032e6', '#bcf60c', '#fabebe', '#008080', '#e6beff', '#9a6324', '#fffac8', 
          '#800000', '#aaffc3', '#808000', '#ffd8b1', '#000075', '#808080', '#ffffff', '#000000']

# to enable plotting in Colab: load require.js and plotly before every cell runs
# (*args accepts the cell-info argument that newer IPython versions pass to pre_run_cell callbacks)
def enable_plotly_in_cell(*args):
    display(HTML('''<script src="/static/components/requirejs/require.js"></script>'''))
    init_notebook_mode(connected=False)
get_ipython().events.register('pre_run_cell', enable_plotly_in_cell)

### 2. Import data and create a dataframe

The Pokémon dataset is available on [Kaggle](https://www.kaggle.com/rounakbanik/pokemon), an online community of data scientists and machine learning practitioners and a well-known platform for predictive modeling and analytics competitions.

For this hackathon the dataset is stored in cloud storage so that we can import it into this notebook. Running the cells below will import the data, create a dataframe, and show you some basic facts about the dataset.

In [None]:
# location of the CSV file in cloud object storage
target_url="https://swift-yeg.cloud.cybera.ca:8080/v1/AUTH_d22d1e3f28be45209ba8f660295c84cf/hackaton/pokemon.csv"

In [None]:
# reading the input file and creating dataframe
pokemon = pd.read_csv(target_url) 

In [None]:
# how many rows and columns does the dataframe have?
pokemon.shape

In [None]:
# what are the column names?
pokemon.columns

Now you know which columns are in the dataset, but what do they mean? Here are Kaggle's descriptions of some of the columns:

- **name**: The English name of the Pokemon
- **japanese_name**: The Original Japanese name of the Pokemon
- **pokedex_number**: The entry number of the Pokemon in the National Pokedex
- **percentage_male**: The percentage of the species that are male. Blank if the Pokemon is genderless.
- **type1**: The Primary Type of the Pokemon
- **type2**: The Secondary Type of the Pokemon
- **classfication**: The Classification of the Pokemon as described by the Sun and Moon Pokedex
- **height_m**: Height of the Pokemon in metres
- **weight_kg**: The Weight of the Pokemon in kilograms
- **capture_rate**: Capture Rate of the Pokemon
- **base_egg_steps**: The number of steps required to hatch an egg of the Pokemon
- **abilities**: A stringified list of abilities that the Pokemon is capable of having
- **experience_growth**: The Experience Growth of the Pokemon
- **base_happiness**: Base Happiness of the Pokemon
- **against_?**: Eighteen features that denote the amount of damage taken against an attack of a particular type
- **hp**: The Base HP of the Pokemon
- **attack**: The Base Attack of the Pokemon
- **defense**: The Base Defense of the Pokemon
- **sp_attack**: The Base Special Attack of the Pokemon
- **sp_defense**: The Base Special Defense of the Pokemon
- **speed**: The Base Speed of the Pokemon
- **generation**: The numbered generation in which the Pokemon was first introduced
- **is_legendary**: Denotes if the Pokemon is legendary.
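Selecting the eighteen `against_?` columns one by one would be tedious. As a sketch, pandas' `DataFrame.filter(like=...)` keeps only the columns whose names contain a given piece of text. The small dataframe below is a made-up stand-in for `pokemon`, so its values are only illustrative.

```python
import pandas as pd

# made-up stand-in for the real `pokemon` dataframe (illustrative values only)
toy = pd.DataFrame({
    "name": ["Bulbasaur", "Charmander"],
    "against_fire": [2.0, 0.5],
    "against_water": [0.5, 2.0],
    "hp": [45, 39],
})

# keep only the columns whose name contains "against_"
against_cols = toy.filter(like="against_")
print(list(against_cols.columns))  # ['against_fire', 'against_water']
```

On the real data, `pokemon.filter(like="against_")` should return all eighteen damage columns, assuming no other column name contains that text.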

Now everything is set up for crunching the Pokémon dataset. Your group can go through the rest of the notebook and work on challenges.

**While working on the challenges, feel free to add new code/markdown cells as needed.**

## Part A: Display the dataset

Let's have a quick look at the first 20 columns and the first few rows of the dataset.

In [None]:
# display first 20 columns to explore what the data look like
# 0:20 - means from column 0 up to column 19
pokemon.iloc[:,0:20].head()
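The `0:20` slice works by position: `iloc` counts columns starting from 0, and the end of a slice is left out. Here is a quick sketch on a made-up five-column dataframe (not the Pokémon data) to show how that plays out.

```python
import pandas as pd

# made-up dataframe with five columns, used only to illustrate slicing
toy = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["a", "b", "c", "d", "e"])

# iloc[:, start:stop] includes position `start` but stops *before* `stop`
first_two = toy.iloc[:, 0:2]    # positions 0 and 1
from_two_on = toy.iloc[:, 2:]   # position 2 up to the last column

print(list(first_two.columns))    # ['a', 'b']
print(list(from_two_on.columns))  # ['c', 'd', 'e']
```

Leaving out the number after the colon means "go all the way to the end".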

### Challenge:

 - Out of 41 columns in total, only the first 20 are shown here. Can you tweak the code cell to display only the remaining columns (i.e. the 21st to the 41st column; remember that `iloc` positions start at 0)?

## Part B: Weight and height 

Let's get to know more about Pokémon by assessing their physical characteristics.

In [None]:
# let's select only two columns "weight_kg" and "name"
pokemon_by_weight = pokemon[["weight_kg","name"]]

# we will order by "weight_kg" in descending order
pokemon_by_weight = pokemon_by_weight.sort_values(by=['weight_kg'],ascending=False)

# print out the top 10 heaviest pokemon
pokemon_by_weight.head(10)
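Sorting and then taking the top rows is one way to find the heaviest Pokémon; pandas also offers `nlargest` (and `nsmallest`) as a shortcut. Below is a sketch with made-up weights that include a missing value (`NaN`): `sort_values` places missing values last by default and `nlargest` skips them, so both approaches agree here.

```python
import pandas as pd

# made-up names and weights, for illustration only (None becomes NaN)
toy = pd.DataFrame({"name": ["A", "B", "C", "D"],
                    "weight_kg": [6.9, 999.9, 85.5, None]})

# approach used in the notebook: sort in descending order, then take the top rows
via_sort = toy.sort_values(by=["weight_kg"], ascending=False).head(2)

# shortcut: the n rows with the largest values in one column
via_nlargest = toy.nlargest(2, "weight_kg")

print(via_sort["name"].tolist())      # ['B', 'C']
print(via_nlargest["name"].tolist())  # ['B', 'C']
```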

In [None]:
# create a bar chart

# set_index("name"): we need to set index of rows to "name" so that the graph has pokemon names as labels
pokemon_by_weight.head(10).set_index("name").iplot(kind="bar",yTitle="Weight (kg)")

### Challenges:
- Find the 20 shortest Pokémon (use **height_m** column).
- Create a horizontal bar chart or a line chart for the 20 shortest Pokémon by changing `iplot(kind="bar")` to `iplot(kind="barh")` or `iplot(kind="line")`, respectively. Which graph helps you better understand the results?
- Try two new kinds of plots: [boxplots](https://www.mathsisfun.com/definitions/box-and-whisker-plot.html) and [histograms](https://www.mathsisfun.com/data/histograms.html). Can you figure out how to interpret them?
     - use `iplot(kind="box")`
     - use `iplot(kind="histogram")`
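To interpret a box plot, it helps to compute the numbers it draws. The sketch below uses made-up heights: the box spans the first to the third quartile, the line inside is the median, and a common whisker rule flags points more than 1.5 times the box height (the interquartile range, IQR) away from the box. The plotting library may compute quartiles with a slightly different method, so treat this as an approximation.

```python
import pandas as pd

# made-up heights in metres (illustrative values, not taken from the dataset)
heights = pd.Series([0.3, 0.4, 0.6, 0.7, 1.0, 1.7, 14.5], name="height_m")

# the box runs from Q1 to Q3; the line inside the box is the median
q1, median, q3 = heights.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1  # interquartile range: the height of the box

# a common rule: points beyond 1.5 * IQR from the box are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = heights[(heights < lower_fence) | (heights > upper_fence)]

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}")
print("possible outliers:", outliers.tolist())
```

A histogram answers a different question: it splits the values into ranges (bins) and shows how many values fall into each one.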

## Part C: For the top 10 heaviest Pokémon, what will be the height?

We have identified the heaviest Pokémon in the dataset. Are the heaviest ones also the tallest? Let's find out.

In [None]:
# this time we are interested in weight and height - we select three columns "weight_kg","height_m" and "name"
pokemon_by_weight_height = pokemon[["weight_kg","height_m","name"]]

# we will order by weight_kg in descending order
pokemon_by_weight_height = pokemon_by_weight_height.sort_values(by=['weight_kg'],ascending=False)

# print on the screen the top 10 heaviest pokemon
pokemon_by_weight_height.head(10)

Interestingly, [Cosmoem](https://www.pokemon.com/us/pokedex/cosmoem) has a huge weight but a tiny height!

Let's try to visualize the height and weight for the ten heaviest Pokémon using a bar chart.

In [None]:
# create a bar chart
pokemon_by_weight_height.head(10).set_index("name").iplot(kind="bar")

Do you think a bar chart is a good option for visualizing the height and weight of all Pokémon? Perhaps not. Let's try a [scatter plot](https://en.wikipedia.org/wiki/Scatter_plot) instead.

In [None]:
# Scatter plot
pokemon_by_weight_height.iplot(kind="scatter", # type of plot
                               mode='markers', # show only markers(dots), not lines
                               x='weight_kg', # column used for the x-values
                               y='height_m', # column used for the y-values
                               text="name", # the Pokémon's name is shown when you hover your mouse over its marker
                               xTitle="Weight (kg)", # x-axis title
                               yTitle="Height (m)", # y-axis title
                               title="Physical attributes of Pokémon") # title of the plot

### Challenges:
 - Find the ten strongest Pokémon (use the **attack** column; a higher base attack means a stronger Pokémon).
     - What is the base defense of these Pokémon? (use the **attack** and **defense** columns)
 - Draw a scatter plot of defense (on the y-axis) against attack (on the x-axis) for the ten strongest Pokémon.

## Part D: Pokémon by primary type

Let's analyze Pokémon by their primary type.

In [None]:
# unique primary types
pokemon_types1 = pokemon["type1"].unique()

# how many primary types exist
print(len(pokemon_types1), "types")

# print the actual type names
print(pokemon_types1)

Would it be possible to find and list these 18 primary types of Pokémon manually?

Which type of Pokémon do you like the most?

In [None]:
# calculate how many of the Pokémon belong to every type
pokemon_by_type = pokemon.groupby("type1").size()

# create additional column "count" to store the number of Pokémon 
pokemon_by_type = pokemon_by_type.reset_index(name='count')

# sort by number of Pokémon - what type has the largest number
pokemon_by_type = pokemon_by_type.sort_values(['count'], ascending=False)

# print the results on the screen
pokemon_by_type
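The group, count, and sort steps above can also be done with `value_counts`, which counts each distinct value and sorts the counts in descending order. A sketch on made-up primary types, checking that both approaches give the same counts:

```python
import pandas as pd

# made-up primary types for five Pokémon
toy = pd.DataFrame({"type1": ["water", "fire", "water", "grass", "water"]})

# the notebook's approach: group, count rows, name the column, then sort
by_group = toy.groupby("type1").size().reset_index(name="count")
by_group = by_group.sort_values(["count"], ascending=False)

# value_counts does the counting and descending sort in one call
by_counts = toy["type1"].value_counts()

# the most common type comes first in both results
print(by_group)
print(by_counts)
```

`groupby` gives you a dataframe with a named `count` column, which is handy for plotting with `labels` and `values` as in the pie chart below.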

In [None]:
# this time we use pie chart for visualization
pokemon_by_type.iplot(kind="pie", # type of chart
                      labels = "type1", # column used for the slice labels
                      values = "count", # column used for the slice sizes
                      colors = colors20) # use the extended color palette with 22 colors

### Challenges:

Analyze Pokémon by secondary type available in **type2** column.

 - How many secondary types exist?
 - Provide the number of Pokémon by secondary type.
 - How many secondary types are there for water-type Pokémon? (Use `water_type = pokemon[pokemon["type1"]=="water"]` to subset only water-type Pokémon)
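A hint for counting secondary types: many Pokémon have only one type, and, assuming their **type2** cell is blank in the CSV, `pd.read_csv` loads it as `NaN` (missing). The sketch below, on made-up values, shows that `unique()` keeps `NaN`, while `nunique()` and `groupby()` skip it by default.

```python
import numpy as np
import pandas as pd

# made-up secondary types; NaN means "no secondary type"
toy = pd.DataFrame({"type2": ["poison", np.nan, "flying", "poison", np.nan]})

# unique() keeps NaN, so its length also counts "no secondary type"
print(len(toy["type2"].unique()))  # 3

# nunique() skips NaN by default
print(toy["type2"].nunique())      # 2

# groupby() also drops NaN keys by default
print(toy.groupby("type2").size())

# value_counts(dropna=False) additionally reports how many have no secondary type
print(toy["type2"].value_counts(dropna=False))
```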

## Part E: Attack and defense by primary type

Let's find the attack and defense capabilities of Pokémon by their primary type.

In [None]:
# we will use scatterplot again to plot attack on x-axis and defense on y-axis for all the Pokémon
# we will color by primary type using categories ="type1"

pokemon.iplot(kind="scatter",mode='markers', x='attack', y='defense', categories ="type1",
              text ="name", color=colors20, xTitle="Attack", yTitle="Defence")

This looks a little messy. Let's choose a Pokémon type manually using the `input()` function in the next code cell.


**Note**: Type names are lowercase, exactly as printed in Part D (e.g. `water`). If you enter multiple types or a type that doesn't exist, the code will give an error. Run the cell again to start over.

In [None]:
print("Enter Pokémon primary type: ")

# read the user's input into the "input1" variable
input1 = input()

# we do the same scatter plot but subset the data by input
pokemon[(pokemon["type1"]==input1)].iplot(kind="scatter",mode='markers', x='attack', y='defense', 
              categories ="type1",text ="name",color=colors20,xTitle="Attack",yTitle="Defence")
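One way to make the input step more forgiving is to tidy the text and check it against the known types before plotting. The helper `pick_type` below is hypothetical (not part of pandas or cufflinks), and the dataframe is a made-up stand-in; in the notebook you could call it as `pick_type(pokemon, input())`.

```python
import pandas as pd

def pick_type(df, user_text):
    """Hypothetical helper: return the rows whose type1 matches user_text."""
    # drop surrounding spaces and use lowercase, like the type names in the data
    wanted = user_text.strip().lower()
    known = set(df["type1"].unique())
    if wanted not in known:
        raise ValueError(f"Unknown type {wanted!r}; choose one of {sorted(known)}")
    return df[df["type1"] == wanted]

# made-up stand-in for the real dataframe
toy = pd.DataFrame({"name": ["Squirtle", "Charmander"], "type1": ["water", "fire"]})
print(pick_type(toy, "  Water ")["name"].tolist())  # ['Squirtle']
```

The error message lists the valid choices, which is friendlier than a long traceback from the plotting code.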

### Challenge:

Use code from the cell above to add two user inputs and plot two Pokémon types simultaneously.
  - Use `pokemon[(pokemon["type1"]==input1) | (pokemon["type1"]==input2)]` to select two types at the same time.
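The parentheses in the hint matter: in Python, `|` binds more tightly than `==`, so each comparison must be wrapped. A sketch on a made-up dataframe showing that `isin` gives the same rows and scales to any number of types:

```python
import pandas as pd

# made-up stand-in for the real dataframe
toy = pd.DataFrame({"name": ["Squirtle", "Charmander", "Bulbasaur"],
                    "type1": ["water", "fire", "grass"]})

input1, input2 = "water", "fire"  # stand-ins for the two input() values

# the "or" condition from the hint (each comparison in its own parentheses)
with_or = toy[(toy["type1"] == input1) | (toy["type1"] == input2)]

# isin checks whether each value is in a list of allowed values
with_isin = toy[toy["type1"].isin([input1, input2])]

print(with_or["name"].tolist())    # ['Squirtle', 'Charmander']
print(with_isin["name"].tolist())  # ['Squirtle', 'Charmander']
```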

## Part F: Average attack and defense by primary type (optional)

What would be the best way to compare Pokémon with different primary types? Let's take the mean of their attack and defense capabilities and visualize them. 

**Note that this section is not mandatory.**

In [None]:
# calculate average attack and defense by primary type
avg_by_type = pokemon[['type1', 'attack', 'defense']].groupby('type1').mean()

# how many Pokémon belong to each primary type
counts = pokemon.groupby('type1').size()

# combine Pokémon counts and average attack/defense
avg_by_type["count"] = counts

# reset index to have "type1" as a column
avg_by_type = avg_by_type.reset_index()

avg_by_type
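The averages and counts can also be built in a single step with pandas' named aggregation in `groupby(...).agg(...)`. This is a sketch on made-up values; it should give the same columns as the `mean()` plus `size()` approach above.

```python
import pandas as pd

# made-up attack/defense values for three Pokémon
toy = pd.DataFrame({
    "type1": ["water", "water", "fire"],
    "attack": [48, 65, 52],
    "defense": [65, 64, 43],
})

# each keyword becomes an output column: name=(input column, aggregation)
summary = (toy.groupby("type1")
              .agg(attack=("attack", "mean"),
                   defense=("defense", "mean"),
                   count=("attack", "size"))  # "size" counts rows, like groupby().size()
              .reset_index())
print(summary)
```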

In [None]:
# create a bubble chart
# the size of bubble is set to "count" - the bigger the bubble is the more Pokémon belong to this primary type
avg_by_type.iplot(kind="bubble",mode='markers', x='attack', y='defense', size="count", 
              categories ="type1",color=colors20)

### Challenge:
- Compare the Pokémon with different secondary types. Create a bubble chart as above.

## Summary

This notebook analyzes the **Pokémon** dataset from Kaggle with the help of Python code cells. Various physical attributes of Pokémon are analyzed to identify the heaviest and shortest Pokémon. Their attack and defense capabilities are also visualized by primary type, with related challenges along the way.

By taking part in this hackathon and completing these challenges, you learned how to analyze a dataset that would be impractical to explore manually, created visualizations, and, most importantly, developed [*computational thinking*](https://en.wikipedia.org/wiki/Computational_thinking) skills that can be used to solve a wide range of problems.

## Hackathon Reflections
Write about some or all of the following questions, either individually in separate markdown cells or as a group.
- What is something you learned through this process?
- How well did your group work together? Why do you think that is?
- What were some of the hardest parts?
- What are you proud of? What would you like to show others?
- Are you curious about anything else related to this? Did anything surprise you?
- How can you apply your learning to future activities?

![Callysto.ca Banner](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banners_Bottom_06.06.18.jpg?raw=true)