![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

In [None]:
#from IPython.display import HTML, display
#display(HTML("<table><tr><td><img src='data/pokemon.png' width='730'></td><td><img src='data/cosmoem.jpeg' width='270'></td></tr></table>"))

### Prep work

In [None]:
#the libraries should already be installed; uncomment the line below if they are not
#!pip install cufflinks ipywidgets

Run the next cell to load the libraries and pre-defined functions:

In [None]:
# load libraries and helper code
#from helper_code.pokemon import *
import pandas as pd
import chart_studio.plotly as py
import cufflinks as cf
from IPython import get_ipython
from IPython.display import HTML, display
from plotly.offline import init_notebook_mode

# make cufflinks' .iplot() draw charts locally (offline mode)
cf.go_offline()

# extended color palette (it actually holds 22 colors, despite the variable name)
colors20 = ['#e6194b', '#3cb44b', '#ffe119', '#4363d8', '#f58231', '#911eb4', '#46f0f0', 
          '#f032e6', '#bcf60c', '#fabebe', '#008080', '#e6beff', '#9a6324', '#fffac8', 
          '#800000', '#aaffc3', '#808000', '#ffd8b1', '#000075', '#808080', '#ffffff', '#000000']

#to enable plotting in Colab: load require.js and initialize offline plotly before every cell runs
def enable_plotly_in_cell(*args):
    # *args accepts the "info" object that newer IPython versions pass to pre_run_cell callbacks
    display(HTML('''
        <script src="/static/components/requirejs/require.js"></script>
    '''))
    init_notebook_mode(connected=False)

get_ipython().events.register('pre_run_cell', enable_plotly_in_cell)

### Getting data
The Pokemon dataset was downloaded from [Kaggle](https://www.kaggle.com/rounakbanik/pokemon).

**Kaggle** is an online community of data scientists and machine learning practitioners, and a well-known platform for predictive modeling and analytics competitions.


In [None]:
#URL of the dataset in cloud object storage
target_url="https://swift-yeg.cloud.cybera.ca:8080/v1/AUTH_d22d1e3f28be45209ba8f660295c84cf/hackaton/pokemon.csv"

In [None]:
#reading the input file and creating dataframe
pokemon = pd.read_csv(target_url) 

In [None]:
#how many rows and columns does the dataframe have?
pokemon.shape

In [None]:
#what are the column names?
pokemon.columns

Here is the column description from Kaggle:

- **name**: The English name of the Pokemon
- **japanese_name**: The Original Japanese name of the Pokemon
- **pokedex_number**: The entry number of the Pokemon in the National Pokedex
- **percentage_male**: The percentage of the species that are male. Blank if the Pokemon is genderless.
- **type1**: The Primary Type of the Pokemon
- **type2**: The Secondary Type of the Pokemon
- **classfication**: The Classification of the Pokemon as described by the Sun and Moon Pokedex
- **height_m**: Height of the Pokemon in metres
- **weight_kg**: The Weight of the Pokemon in kilograms
- **capture_rate**: Capture Rate of the Pokemon
- **base_egg_steps**: The number of steps required to hatch an egg of the Pokemon
- **abilities**: A stringified list of abilities that the Pokemon is capable of having
- **experience_growth**: The Experience Growth of the Pokemon
- **base_happiness**: Base Happiness of the Pokemon
- **against_?**: Eighteen features that denote the amount of damage taken against an attack of a particular type
- **hp**: The Base HP of the Pokemon
- **attack**: The Base Attack of the Pokemon
- **defense**: The Base Defense of the Pokemon
- **sp_attack**: The Base Special Attack of the Pokemon
- **sp_defense**: The Base Special Defense of the Pokemon
- **speed**: The Base Speed of the Pokemon
- **generation**: The numbered generation in which the Pokemon was first introduced
- **is_legendary**: Denotes if the Pokemon is legendary.

In [None]:
#display the first 20 columns to explore what the data look like
#0:20 means from column 0 up to (but not including) column 20, i.e. columns 0 to 19
pokemon.iloc[:, 0:20].head()
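`iloc` selects rows and columns by their position, and the end of a slice is excluded. Here is a minimal sketch on a tiny made-up DataFrame (the column names mimic the Pokemon data; the values are hypothetical):

```python
import pandas as pd

# Tiny made-up DataFrame standing in for the Pokemon data (values are hypothetical)
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "type1": ["grass", "fire", "water"],
    "hp": [10, 20, 30],
    "attack": [11, 21, 31],
})

# iloc[rows, columns] selects by position; ":" means "all rows"
# The slice end is excluded, so 0:2 keeps the columns at positions 0 and 1
first_two = df.iloc[:, 0:2]
print(first_two.columns.tolist())

# 2:4 keeps the columns at positions 2 and 3 (the third and fourth columns)
next_two = df.iloc[:, 2:4]
print(next_two.columns.tolist())
```

Because positions start at 0, the 21st column of a DataFrame sits at position 20.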

### Challenge

 - Display columns from 21 to 40.

### Weight and height 

In [None]:
#let's select only two columns "weight_kg" and "name"
pokemon_by_weight = pokemon[["weight_kg","name"]]

#we will order by "weight_kg" in descending order
pokemon_by_weight = pokemon_by_weight.sort_values(by=['weight_kg'],ascending=False)

#print out the top 10 heaviest pokemon
pokemon_by_weight.head(10)
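Sorting in descending order and then calling `head()` is a common way to get the "top N" rows. pandas also offers `nlargest`, which selects the same rows in one call. A sketch with made-up names and weights:

```python
import pandas as pd

# Made-up names and weights (kg), for illustration only
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "weight_kg": [7.0, 950.0, 90.5, 460.0],
})

# Notebook approach: sort from heaviest to lightest, then keep the first rows
top_sorted = df.sort_values(by=["weight_kg"], ascending=False).head(2)

# nlargest selects the same rows in a single call
top_nlargest = df.nlargest(2, "weight_kg")

print(top_nlargest)
```

For the lightest or shortest Pokemon, `nsmallest` works the same way.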

In [None]:
#set the row index to "name" so that the chart uses Pokemon names as axis labels
#.iplot(kind="bar") creates a bar chart

pokemon_by_weight.head(10).set_index("name").iplot(kind="bar",yTitle="Weight (kg)")

### Challenge
- Using the example above, create new cell(s) and find the 20 shortest Pokemon (use the **height_m** column).
- Create a horizontal bar chart or a line chart for the 20 shortest Pokemon by changing `iplot(kind="bar")` to `iplot(kind="barh")` or `iplot(kind="line")`, respectively.
    -  Which graph helps you better understand the results?

### Bonus
 - Run the cell below and change `iplot(kind="histogram")` to `iplot(kind="box")`.
 
 - Do you know how to interpret [histograms](https://www.mathsisfun.com/data/histograms.html) and [boxplots](https://www.mathsisfun.com/definitions/box-and-whisker-plot.html)?

In [None]:
pokemon_by_weight.set_index("name").iplot(kind="histogram")
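A box plot summarizes a column with its quartiles. The sketch below computes those numbers with pandas on a made-up list of weights, assuming the common convention that points more than 1.5 times the interquartile range (IQR) beyond the box are drawn as outliers. Plotting libraries may compute quartiles slightly differently, so treat this as an approximation of what the chart shows:

```python
import pandas as pd

# Made-up weights (kg) for illustration only; not values from the Kaggle dataset
weights = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100], name="weight_kg")

# Quartiles: about 25% of values lie below q1, 50% below the median, 75% below q3
q1 = weights.quantile(0.25)
median = weights.median()
q3 = weights.quantile(0.75)

# The box spans q1 to q3; its length is the interquartile range (IQR)
iqr = q3 - q1

# Common convention: points beyond 1.5 * IQR from the box are shown as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = weights[(weights < lower_fence) | (weights > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print("Outliers:", outliers.tolist())
```

One very heavy value is enough to stretch a histogram's x-axis, while a box plot shows it as a single outlier dot.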

### What is the height of the 10 heaviest Pokemon?

In [None]:
#this time we are interested in weight and height - we select three columns "weight_kg","height_m" and "name"
pokemon_by_weight_height = pokemon[["weight_kg","height_m","name"]]

#we will order by weight_kg in descending order
pokemon_by_weight_height = pokemon_by_weight_height.sort_values(by=['weight_kg'],ascending=False)

#print on the screen the top 10 heaviest pokemon
pokemon_by_weight_height.head(10)

Interestingly, [Cosmoem](https://www.pokemon.com/us/pokedex/cosmoem) is extremely heavy but has the minimum height!

Let's visualize the height and weight of the 10 heaviest Pokemon using a bar chart.

In [None]:
pokemon_by_weight_height.head(10).set_index("name").iplot(kind="bar")

Is a bar chart a good option if we want to visualize the height and weight of all Pokemon? Maybe not. Let's try a [scatter plot](https://en.wikipedia.org/wiki/Scatter_plot) instead.

In [None]:
# Scatter plot

pokemon_by_weight_height.iplot(kind="scatter", # type of plot
                               mode='markers', # show only markers (dots), not lines
                               x='weight_kg', # column used for the x-values
                               y='height_m', # column used for the y-values
                               text="name", # the Pokemon's name is displayed when you hover the mouse over a dot
                               xTitle="Weight (kg)", # x-axis title
                               yTitle="Height (m)") # y-axis title

### Challenge
 - Using the example above, create new cell(s) and find the 10 strongest Pokemon (those with the highest base attack).
     - What is the base defense of these Pokemon? (Use the **attack** and **defense** columns.)
 - Plot defense and attack for the 10 strongest Pokemon.

### Pokemon by primary type

In [None]:
#unique primary types
pokemon_types1 = pokemon["type1"].unique()

#how many primary types exist
print(len(pokemon_types1), "types")

#print the actual type names
print(pokemon_types1)

In [None]:
#calculate how many of the pokemons belong to every type
pokemon_by_type = pokemon.groupby("type1").size()

#create additional column "count" to store the number of pokemons 
pokemon_by_type = pokemon_by_type.reset_index(name='count')

#sort by number of pokemons - what type has the largest number
pokemon_by_type = pokemon_by_type.sort_values(['count'], ascending=False)

#print the results on the screen
pokemon_by_type
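For a single column, `value_counts()` gives the same counts as `groupby(...).size()` followed by a descending sort. A minimal sketch on made-up types, comparing the two approaches:

```python
import pandas as pd

# Made-up primary types for illustration
df = pd.DataFrame({"type1": ["water", "fire", "water", "grass", "water", "fire"]})

# Notebook approach: group, count rows, turn the result into a "count" column, sort
by_group = (df.groupby("type1").size()
              .reset_index(name="count")
              .sort_values(["count"], ascending=False))

# value_counts counts each distinct value and sorts the counts in descending order
by_value_counts = df["type1"].value_counts()

print(by_group)
print(by_value_counts)
```

`groupby` is more flexible (you can aggregate other columns at the same time), while `value_counts` is shorter when you only need counts.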

In [None]:
#this time we use a pie chart for visualization

pokemon_by_type.iplot(kind="pie", # type of chart
                      labels = "type1", # column used for the slice labels
                      values = "count", # column used for the slice sizes
                      colors = colors20) # use the extended color palette (22 colors)

### Challenge
Using the example above, create new cell(s) and analyze Pokemon by secondary type, stored in the **type2** column.
 - How many secondary types exist?
 - Provide the number of pokemons by secondary type.
 - How many secondary types are there for water type pokemons? (Use `water_type = pokemon[pokemon["type1"]=="water"]` to subset only water-type Pokemons)
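Pokemon with only one type have no secondary type, so expect missing values (NaN) in **type2**. This affects the counts: `unique()` lists NaN as if it were a value, while `nunique()` and `groupby()` skip it by default. A sketch on made-up rows:

```python
import numpy as np
import pandas as pd

# Made-up rows; NaN in type2 stands for a Pokemon with no secondary type
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "type1": ["water", "water", "fire", "water"],
    "type2": ["ice", np.nan, "flying", "ice"],
})

# unique() includes NaN in its result; nunique() ignores NaN by default
print(df["type2"].unique())
print(df["type2"].nunique())

# groupby drops NaN keys by default, so single-type rows are not counted here
print(df.groupby("type2").size())

# Boolean filtering keeps only the water-type rows
water_type = df[df["type1"] == "water"]
print(water_type["type2"].nunique())
```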

### Attack and defense by primary type

In [None]:
#we will use a scatter plot again to plot attack on the x-axis and defense on the y-axis for all Pokemon
#categories="type1" colors the dots by primary type

pokemon.iplot(kind="scatter",mode='markers', x='attack', y='defense', categories ="type1",
              text ="name",color=colors20,xTitle="Attack",yTitle="Defense")

This looks a little messy. Let's choose a single primary type with the `input()` function.


**Note**: Type names are stored in lowercase (for example, `water`). If you enter more than one type, or a type that doesn't exist, the code will give an error. Run the cell again to start over.

In [None]:
print("Enter Pokemon primary type: ")

# read the user's input into the variable input1
input1 = input()

#draw the same graph, but only for Pokemon of the chosen primary type
pokemon[(pokemon["type1"]==input1)].iplot(kind="scatter",mode='markers', x='attack', y='defense', 
              categories ="type1",text ="name",color=colors20,xTitle="Attack",yTitle="Defense")
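To make the cell above more forgiving, the input can be cleaned and checked before filtering. The sketch below uses a hypothetical helper, `select_type`, which is not part of the notebook; it assumes type names are stored in lowercase and raises a readable error listing the valid types instead of plotting an empty selection:

```python
import pandas as pd

def select_type(df, type_name):
    """Hypothetical helper: return the rows whose type1 matches type_name.

    The input is trimmed and lowercased (assumes the data uses lowercase type
    names); an unknown type raises a ValueError that lists the valid choices.
    """
    cleaned = type_name.strip().lower()
    valid = sorted(df["type1"].dropna().unique())
    if cleaned not in valid:
        raise ValueError(f"Unknown type {type_name!r}; choose one of: {', '.join(valid)}")
    return df[df["type1"] == cleaned]

# Made-up data standing in for the Pokemon DataFrame
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "type1": ["water", "fire", "water"],
    "attack": [10, 20, 30],
    "defense": [15, 25, 35],
})

# Extra spaces and capital letters are tolerated
subset = select_type(df, "  Water ")
print(subset["name"].tolist())
```

In the notebook, the result could replace the manual filter, e.g. `select_type(pokemon, input())` followed by the same `.iplot(...)` call.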

### Challenge

Create a new code cell and reuse the code from the cell above to read two user inputs and plot two Pokemon types at the same time.
  - Use `pokemon[(pokemon["type1"]==input1) | (pokemon["type1"]==input2)]` to select two types at the same time
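The `|` ("or") filter in the hint can also be written with `isin`, which checks whether each value is in a list. A sketch on made-up rows, with fixed strings standing in for the two `input()` values:

```python
import pandas as pd

# Made-up rows standing in for the Pokemon DataFrame
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "type1": ["water", "fire", "grass", "water"],
})

# Stand-ins for the two values you would read with input()
input1, input2 = "water", "fire"

# Two comparisons joined with | ("or"); the parentheses are needed because
# | binds more tightly than == in Python
with_or = df[(df["type1"] == input1) | (df["type1"] == input2)]

# isin tests membership in a list, which extends to any number of types
with_isin = df[df["type1"].isin([input1, input2])]

print(with_isin["name"].tolist())
```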


## Extra

### Average attack and defense by primary type

In [None]:
#calculate average attack and defense by primary type
avg_by_type = pokemon[['type1', 'attack', 'defense']].groupby('type1').mean()

#how many pokemon belong to each primary type
counts = pokemon.groupby('type1').size()

#combine pokemon counts and average attack/defense
avg_by_type["count"] = counts

#reset index to have "type1" as a column
avg_by_type = avg_by_type.reset_index()

avg_by_type
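The cell above builds the averages and the counts in separate steps. pandas "named aggregation" (available since pandas 0.25) can produce the same table in one `agg` call, where each keyword becomes an output column. A sketch on made-up numbers:

```python
import pandas as pd

# Made-up attack/defense values for illustration
df = pd.DataFrame({
    "type1": ["water", "fire", "water", "grass"],
    "attack": [10, 30, 20, 40],
    "defense": [20, 10, 40, 30],
})

# Each keyword is an output column: (input column, aggregation function)
# "size" counts all rows in the group, like groupby(...).size()
avg_by_type = (df.groupby("type1")
                 .agg(attack=("attack", "mean"),
                      defense=("defense", "mean"),
                      count=("attack", "size"))
                 .reset_index())

print(avg_by_type)
```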

In [None]:
# use a bubble chart here
# the bubble size is set to "count": the bigger the bubble, the more Pokemon belong to that primary type

avg_by_type.iplot(kind="bubble",mode='markers', x='attack', y='defense', size="count", 
              categories ="type1",color=colors20)

![alt text](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banners_Bottom_06.06.18.jpg?raw=true)