<a href="https://colab.research.google.com/github/tinkercademy/ml-notebooks/blob/main/Data Science/05_Plotting_Pokemon.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Plotting in Python

In [None]:
# If you are mounting from Google Drive, uncomment and run the lines below. Otherwise, you can ignore this.
# from google.colab import drive
# drive.mount('/content/drive')

In [None]:
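# If you are importing from Colab's Files, uncomment and use the lines below. Otherwise, you can ignore this.
# pandas is only imported in a later cell, so import it here too if you run this cell first.
# import pandas as pd
# data = pd.read_csv("pokemon.basestats.csv", index_col = 'Name')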

## Matplotlib

In this notebook, we'll start plotting things out using the `matplotlib` library.

Most notebooks import their modules in a cell near the top, like the one below. This keeps things tidy and tells others which modules they need to run your code. Even if you realise later on that you need another module, you should still come all the way back to the top and import it there.

`%matplotlib inline` is a special command that configures the plotting library, matplotlib, to show its results inline.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
data = pd.read_csv("https://raw.githubusercontent.com/iamamangosteen/pythonforbusiness/master/pokemon.basestats.csv", index_col = 'Name')
data = data.loc[:,'HP':'Speed'].head(13) #slice out a subset of the dataset that only contains numerical information. Here we'll only keep the first 13 rows for testing purposes.
data

<hr>

To plot, we just use the handy `.plot()` function, built into each DataFrame:
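The code cell for this step is not shown here. As a minimal sketch of what such a call can look like, the example below selects one row by name and calls `.plot()` on it. It assumes a small hand-typed table called `stats` (a hypothetical stand-in for `data`, using the same column names) so that it runs without downloading the CSV.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hand-typed stand-in for `data` (assumption: same column names as the CSV).
stats = pd.DataFrame(
    {
        "HP": [45, 39],
        "Attack": [49, 52],
        "Defense": [49, 43],
        "Sp. Atk": [65, 60],
        "Sp. Def": [65, 50],
        "Speed": [45, 65],
    },
    index=pd.Index(["Bulbasaur", "Charmander"], name="Name"),
)

# Selecting one row gives a Series; .plot() draws it as a line,
# with the stat names along the x-axis.
ax = stats.loc["Bulbasaur"].plot()
```

With `%matplotlib inline` active, the chart appears directly under the cell.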

However, the graph looks like someone trampled all over some leaves. Our secondary school maths teachers would not be pleased. Let's fix things up a bit by making adjustments to the `plt` module. See if you can figure out what each line means.

Note the ordering: you plot _first_, on the first line, and only then adjust the settings!
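The original cell with the adjustments is not shown here, so the following is only a sketch of the plot-then-adjust pattern. The `bulbasaur` Series is a hand-typed stand-in for `data.loc['Bulbasaur']`, and the particular title, labels, and axis limit are illustrative choices rather than the notebook's own settings.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hand-typed stand-in for data.loc['Bulbasaur'] (assumption).
bulbasaur = pd.Series(
    [45, 49, 49, 65, 65, 45],
    index=["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"],
    name="Bulbasaur",
)

bulbasaur.plot(marker="o")          # 1. plot first
plt.title("Base stats: Bulbasaur")  # 2. then adjust the current figure
plt.xlabel("Stat")
plt.ylabel("Value")
plt.ylim(0, 100)                    # fix the y-axis range so it starts at zero
plt.grid(True)                      # light grid lines make values easier to read
plt.legend()                        # uses the Series name as the label
```

Each `plt.` call acts on the most recently drawn figure, which is why the plot has to exist before the settings are applied.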

Now, how about showing more than one pokemon at a time? You can just plot each of them separately.
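As a sketch of plotting each Pokemon separately, the example below draws two rows onto the same axes. The `stats` table is again a hand-typed stand-in for `data`. Passing `ax=ax` is a choice made here to make the sharing explicit; in a notebook, successive Series plots in one cell usually end up on the same chart anyway.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hand-typed stand-in for `data` (assumption: same column names as the CSV).
stats = pd.DataFrame(
    {
        "HP": [45, 39],
        "Attack": [49, 52],
        "Defense": [49, 43],
        "Sp. Atk": [65, 60],
        "Sp. Def": [65, 50],
        "Speed": [45, 65],
    },
    index=pd.Index(["Bulbasaur", "Charmander"], name="Name"),
)

# One .plot() call per Pokemon; the second call reuses the first call's axes.
ax = stats.loc["Bulbasaur"].plot()
stats.loc["Charmander"].plot(ax=ax)
ax.legend()  # one legend entry per Pokemon
```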

### <font color="red">Exercise 1: Plotting Pokemon</font>

Plot the stats of Venusaur, Charizard, and Blastoise.

Extra challenge: But their plot line colours don't match their types! See if you can fix this. Look through the `.plot` [documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html) to make Venusaur's plot line green, Charizard's plot line orange, and Blastoise's plot line blue.

### <font color="red">Exercise 2: Minimum & Maximum Stats</font>

Plot the minimum stats for all the Pokemon in the dataset. Add a second plot line that shows the maximum stats. Hint: remember the `.min()` and `.max()` operations we could call on DataFrames?
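As a reminder of what those two operations return, here is a small sketch on a hand-typed toy table (`toy` is a hypothetical name, not part of the dataset). It stops short of plotting, so the exercise itself is left to you.

```python
import pandas as pd

# Toy table with two stat columns (hand-typed, for illustration only).
toy = pd.DataFrame(
    {"HP": [45, 39, 44], "Speed": [45, 65, 43]},
    index=["Bulbasaur", "Charmander", "Squirtle"],
)

lowest = toy.min()   # Series with one minimum per column
highest = toy.max()  # Series with one maximum per column
print(lowest)
print(highest)
```

Both results are Series indexed by column name, so they can be plotted in the same way as a single Pokemon's row.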

### <font color="red">Exercise 3: Other plots</font>

Try this out. What does this plot? What does the `s` parameter refer to? What is `5e4`?

In [None]:
data = pd.read_csv('https://raw.githubusercontent.com/iamamangosteen/pythonforbusiness/master/pokedex_(Update_2021.04).csv', index_col="name")
data["totaldefense"] = data["hp"]*data["defense"]*data["sp_defense"]
data.plot(kind='scatter', x='attack', y='sp_attack', s=data['totaldefense']/5e4)

## Plot.ly

There are other plotting options available too. One popular choice is Plotly (plot.ly), a data visualization toolbox that is compatible with Jupyter notebooks. It has an offline mode that lets you save plots locally inside your notebooks. Plotly is nice because you can dynamically zoom in on sections of a plot and hover over it to identify specific data points.

Note: saving your notebooks with plots results in a much larger file size than without. To save without plots, simply clear your output by going to Cell > All Output > Clear.

First, we must import required libraries:

In [None]:
#import modules
import pandas as pd
import plotly
import plotly.graph_objs as go

In [None]:
#let's first setup a dataframe to use, same as before, except this time we store it in a variable called df
df = pd.read_csv("https://raw.githubusercontent.com/iamamangosteen/pythonforbusiness/master/pokemon.basestats.csv", index_col = 'Name')
df = df.loc[:, "HP":"Speed"]
df.head()

### Plotting with Plotly
Plotly has a few different ways to create plots. One of these is Plotly Express, which is designed for plotting data quickly with minimal code. The downside of Plotly Express is that it gives you less fine-grained control over customisation. We've already seen a basic plotting option with matplotlib, so we'll skip Plotly Express and jump straight into Plotly's full plotting options, the graph objects in `plotly.graph_objs`.

So far we've only plotted scatter plots with Plotly, using `plotly.graph_objs.Scatter`. What about a bar graph? For that, we use `plotly.graph_objs.Bar`. Using the same CSV file as above, plot the number of each type of Pokemon with a bar graph.

We can also plot histograms that track the number of occurrences in a dataset. Here we use the `.count()` and `.groupby()` methods to count the number of each Type of Pokemon (e.g. Fire, Water, etc.)

We also create a list of colours to customise our graph and make each Type display in an appropriate colour (e.g. orange for Fire, yellow for Electric, etc.)

Plotly also has a way to create attractive tables out of dataframes:

Plotly also interfaces with some pretty neat extensions:

### <font color="red">Exercise 10: Mapbox</font>

One of these extensions is Mapbox, which allows us to plot longitude and latitude coordinates on a map. To plot on Mapbox maps with Plotly you'll need a Mapbox account and a Mapbox Access Token.

### Radar Charts

In [None]:
# Visualizing single Pokemon statistics
# source: https://www.kaggle.com/lakshyaag/data-visualization-pokemon-data

# Defining colors for graphs
colors = {"Bug": "#A6B91A","Dark": "#705746","Dragon": "#6F35FC","Electric": "#F7D02C","Fairy": "#D685AD","Fighting": "#C22E28","Fire": "#EE8130","Flying": "#A98FF3","Ghost": "#735797","Grass": "#7AC74C","Ground": "#E2BF65","Ice": "#96D9D6","Normal": "#A8A77A","Poison": "#A33EA1","Psychic": "#F95587","Rock": "#B6A136","Steel": "#B7B7CE","Water": "#6390F0",}

def polar_pokemon_stats(pkmn_name):
    poke = pd.read_csv("https://raw.githubusercontent.com/iamamangosteen/pythonforbusiness/master/pokemon.basestats.csv")
    pkmn = poke[poke.Name == pkmn_name]
    obj = go.Scatterpolar(
        r=[
            pkmn['HP'].values[0],
            pkmn['Attack'].values[0],
            pkmn['Defense'].values[0],
            pkmn['Sp. Atk'].values[0],
            pkmn['Sp. Def'].values[0],
            pkmn['Speed'].values[0],
            pkmn['HP'].values[0]
        ],
        theta=[
            'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'HP'
        ],
        fill='toself',
        marker=dict(
            color=colors[pkmn['Type 1'].values[0]]
        ),
        name=pkmn['Name'].values[0]
    )

    return obj


def plot_single_pokemon(name):
    layout = go.Layout(
        polar=dict(
            radialaxis=dict(
                visible=True,
                range=[0, 250]
            )
        ),
        showlegend=False,
        title="Stats of {}".format(name)
    )

    pokemon_figure = go.Figure(data=[polar_pokemon_stats(name)], layout=layout)
    pokemon_figure.show()

name = 'Charizard'
plot_single_pokemon(name)

### Histograms (Distribution Plots)

In [None]:
import plotly.figure_factory as ff
df4 = pd.read_csv("https://raw.githubusercontent.com/iamamangosteen/pythonforbusiness/master/gdp_pop_all.csv", index_col = "country")
life_exp = ff.create_distplot([df4.lifeExp_1952, df4.lifeExp_2007], ['Life Expectancy 1952', 'Life Expectancy 2007'], bin_size=2)
life_exp.show()

### Semi-Logarithmic Plots

In [None]:
df = pd.read_csv('http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt', sep='\t')
df2007 = df[df.year==2007]
df1952 = df[df.year==1952]
df.head(2)

fig = {
    'data': [
        {
            'x': df2007.gdpPercap,
            'y': df2007.lifeExp,
            'text': df2007.country,
            'mode': 'markers',
            'name': '2007'},
        {
            'x': df1952.gdpPercap,
            'y': df1952.lifeExp,
            'text': df1952.country,
            'mode': 'markers',
            'name': '1952'}
    ],
    'layout': {
        'xaxis': {'title': 'GDP per Capita', 'type': 'log'},
        'yaxis': {'title': "Life Expectancy"}
    }
}

fig = go.Figure(fig)
fig.show()

### Stacked Bar Graphs

In [None]:
import numpy as np

N = 20
x = np.linspace(1, 10, N)
y = np.random.randn(N)+3
y2 = np.random.randn(N)+6
y3 = np.random.randn(N)+9
y4 = np.random.randn(N)+12
df = pd.DataFrame({'x': x, 'y': y, 'y2':y2, 'y3':y3, 'y4':y4})
df.head()

data = [
    go.Bar(
        x=df['x'], # assign x as the dataframe column 'x'
        y=df['y']
    ),
    go.Bar(
        x=df['x'],
        y=df['y2']
    ),
    go.Bar(
        x=df['x'],
        y=df['y3']
    ),
    go.Bar(
        x=df['x'],
        y=df['y4']
    )

]

layout = go.Layout(
    barmode='stack',
    title='Stacked Bar Graph'
)

fig = go.Figure(data=data, layout=layout)
fig.show()


### Scatter Plots

In [None]:
# Attack vs. Defense of pokemon over generations, sized by HP
# source: https://www.kaggle.com/lakshyaag/data-visualization-pokemon-data

poke = pd.read_csv("https://raw.githubusercontent.com/iamamangosteen/pythonforbusiness/master/pokemon.basestats.csv")

def attack_vs_def(type):
    type_data = poke[poke['Type 1'] == type]
    data = []

    for i in range(1, 7):
        gen = type_data[type_data['Generation'] == i]
        if gen.empty:
            # No Pokemon of this type in this generation; max() below would fail on an empty column
            continue
        trace = go.Scatter(
            x=gen['Attack'],
            y=gen['Defense'],
            mode='markers',
            marker=dict(
                symbol='circle',
                sizemode='area',
                size=gen['HP'],
                sizeref=2. * max(gen['HP']) / (2000),
                line=dict(
                    width=2
                ),
            ),
            name='Generation {}'.format(i),
            text=gen['Name']  # hover labels must come from the same rows as x and y
        )
        data.append(trace)

    layout = go.Layout(
        showlegend=True,
        xaxis=dict(
            title="Attack"
        ),
        yaxis=dict(
            title="Defense"
        ),
        title="Attack vs. Defense of {} pokemon over generations, sized by HP".format(type)
    )

    fig = go.Figure(data=data, layout=layout)
    return fig


attack_vs_def('Electric').show()

### Line Charts

In [None]:
# Visualizing the trend of stats by type and generation.
# source: https://www.kaggle.com/lakshyaag/data-visualization-pokemon-data

poke = pd.read_csv("https://raw.githubusercontent.com/iamamangosteen/pythonforbusiness/master/pokemon.basestats.csv")

def stats_by(classifier):
    data = []
    stats_names = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']
    stats = poke.groupby(classifier)[stats_names].mean().reset_index()
    for stat in stats_names:
        stat_line = go.Scatter(
            x=stats[classifier],
            y=stats[stat],
            name=stat,
            line=dict(
                width=3,
            ),
        )

        data.append(stat_line)

    layout = go.Layout(
        title='Trend of stats by {}'.format(classifier),
        xaxis=dict(title=classifier),
        yaxis=dict(title='Values')
    )

    trend = go.Figure(data=data, layout=layout)
    trend.show()


stats_by('Generation')
stats_by('Type 1')

### We'll stop introducing Plot.ly here
- There's still plenty more that can be done with Plotly, but this should serve as a good introduction to creating basic plots.
- For additional resources, do check out the [plotly documentation](https://plotly.com/python/)!
