# Advanced plots

There are several libraries that produce amazing plots, such as:

- [Plotly](https://plot.ly/)
- [Bokeh](https://bokeh.pydata.org)
- [Dash](https://plot.ly/products/dash/)
- [Folium](http://python-visualization.github.io/folium/)
- [Matplotlib](https://matplotlib.org/)
- [Seaborn](https://seaborn.pydata.org/)
- [ggplot](https://ggplot2.tidyverse.org/)
- [Pygal](http://www.pygal.org)
- [Geoplotlib](https://github.com/andrea-cuttone/geoplotlib/wiki/User-Guide)
- [missingno](https://github.com/ResidentMario/missingno)   


In [0]:
# install additional packages
!pip install plotly --upgrade

# Restart the runtime after the upgrade so the new version is loaded

In [0]:
# execute this cell before and after updating plotly
# check the version
from plotly import __version__
__version__ 


Let's use the first one, Plotly, because it is one of the most widely used libraries in data science and can create complex charts with a few lines of code.


In [0]:
# Importing libraries
import pandas as pd
import numpy as np
#import plotly as py
import plotly.offline as pyo
import plotly.graph_objs as go

#from __future__ import division

In [0]:
# Cell Configuration for plotly over google colab
# This method pre-populates the outputframe with the configuration that Plotly
# expects and must be executed for every cell which is displaying a Plotly graph.
def configure_plotly_browser_state():
  import IPython
  display(IPython.core.display.HTML('''
        <script src="/static/components/requirejs/require.js"></script>
        <script>
          requirejs.config({
            paths: {
              base: '/static/base',
              plotly: 'https://cdn.plot.ly/plotly-1.5.1.min.js?noext',
            },
          });
        </script>
        '''))

Before coding advanced plots with Plotly, let's plot some basic charts like those we created with pandas, this time using Plotly.

# Basic Charts

## 1 - Bar Chart

The idea is to create a *Bar* object containing at least the x-axis and y-axis values.

In [0]:
# basic functions to run on google colab
configure_plotly_browser_state()
pyo.init_notebook_mode(connected=False)

bar = go.Bar(x = ['A','B','C','D','E'],y = np.random.randint(30,100,5))
data = [bar]

# Type of bar variable
print(type(bar))

# Plotting data
pyo.iplot(data)

## 2 - Line Chart

To create a line chart, use a *Scatter* object containing at least the x-axis and y-axis values, as was done for the bar chart.

In [0]:
# basic functions to run on google colab
configure_plotly_browser_state()
pyo.init_notebook_mode(connected=False)

values = np.random.rand(2,10)

# Create a trace
trace = go.Scatter(
    x = np.arange(0,100,10),
    y = values[0]
)

data = [trace]

pyo.iplot(data)

## 3 - Area Chart

The area chart uses the same structure as the line chart, but it requires setting the **fill** property of the *Scatter* object. The accepted values are the strings "none", "tozeroy", "tozerox", "tonexty", "tonextx", "toself" and "tonext".

In [0]:
# basic functions to run on google colab
configure_plotly_browser_state()
pyo.init_notebook_mode(connected=False)

trace1 = go.Scatter(
    x = np.arange(0,100,10),
    y = values[0],
    fill = 'tozeroy'
)

data = [trace1]
pyo.iplot(data)

It's also possible to plot multiple areas, because each area is defined by its own *Scatter* object. See the example below:

In [0]:
# basic functions to run on google colab
configure_plotly_browser_state()
pyo.init_notebook_mode(connected=False)

trace2 = go.Scatter(
    x = np.arange(0,100,10),
    y = values[1],
    fill = 'tozeroy'
)

data = [trace1, trace2]
pyo.iplot(data)
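
With two traces, `fill = 'tonexty'` on the second one shades only the band between the two lines instead of going down to zero. The following is a minimal sketch using plain dictionaries, the same dict-based figure form that the map section of this notebook passes to `pyo.iplot`. The y values and the name `band_fig` are invented for illustration.

```python
# Two line traces described as plain dicts; the values are made up
lower = dict(type='scatter', x=[0, 10, 20], y=[1, 2, 1], fill='none', name='lower')
# 'tonexty' fills from this trace to the previous trace in the data list
upper = dict(type='scatter', x=[0, 10, 20], y=[3, 4, 3], fill='tonexty', name='upper')

band_fig = dict(data=[lower, upper], layout=dict(title='Band between two lines'))

# In a notebook this could be displayed with: pyo.iplot(band_fig)
print([t['fill'] for t in band_fig['data']])
```

The order of the traces matters here, because the fill always targets the trace that comes just before it in the list.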

## 4 - Scatter chart

By default the **mode** property of a *Scatter* object draws lines (Plotly uses *lines+markers* for traces with fewer than 20 points and *lines* otherwise). To create a simple scatter chart without these lines, set **mode** to *markers*.

In [0]:
# basic functions to run on google colab
configure_plotly_browser_state()
pyo.init_notebook_mode(connected=False)


# Create a trace
trace = go.Scatter(
    x = np.random.randn(300),
    y = np.random.randn(300),
    mode = 'markers'
)

data = [trace]

pyo.iplot(data)

# Advanced Charts

In [0]:
# Reading dataset
global_power = pd.read_csv('https://github.com/cs-ufrn/intro-python-data-science/raw/master/datasets/global_power_plant_database.csv')

# Preview of dataset
global_power.head()
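
The charts in this section rank countries by how many plant rows they have in the dataset. As a minimal sketch of that counting pattern, the toy frame below (invented rows, not taken from the dataset) groups by country, counts, and keeps the largest groups. It also shows `reindex` as one way to get a zero for a fuel that a country lacks.

```python
import pandas as pd

# Toy data, made up for illustration only
toy = pd.DataFrame({
    'country': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB', 'CCC'],
    'fuel1':   ['Hydro', 'Hydro', 'Wind', 'Gas', 'Hydro', 'Solar'],
})

# Count rows per country and keep the two largest groups
top_2 = (toy.groupby('country').count()
            .sort_values(by='fuel1', ascending=False)
            .head(2).index.tolist())
print(top_2)

# Count rows per fuel for one country; reindex fills missing fuels with 0
toy_fuels = ['Hydro', 'Wind', 'Gas']
per_fuel = (toy[toy['country'] == 'AAA']
            .groupby('fuel1')['country'].count()
            .reindex(toy_fuels, fill_value=0)
            .tolist())
print(per_fuel)
```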

## 1 - Polar Chart: Top 5 producers

In [0]:
# basic functions to run on google colab
configure_plotly_browser_state()
pyo.init_notebook_mode(connected=False)

# Filtering dataset by country and fuel
countries_fuel = global_power[['country', 'fuel1', 'country_long']]

# Getting the 5 countries with the most plants (rows) in the dataset
top_5 = countries_fuel.groupby('country').count().sort_values(by=['fuel1'], ascending=False).head(5).index.tolist()

# Choosing energy sources
energies = ['Hydro', 'Wind', 'Oil', 'Gas', 'Solar']
data  = []

index = 1
for country in top_5:
    c = countries_fuel.loc[countries_fuel['country'] == country].groupby('fuel1').count()
    # reindex returns 0 for a fuel the country lacks (.loc would raise KeyError)
    c = c.reindex(energies, fill_value=0)['country'].tolist()
    data.append(
        go.Scatterpolar(
            r = c,
            theta = energies,
            fill = 'toself',
            name = country,
            subplot = "polar" + str(index)
       )
    )
    index += 1

layout = go.Layout(
    title = 'Energy grid in the major countries',
    polar1 = dict(
        domain = dict(
            x = [0, .2],
            y = [0, .5]
        ),
        radialaxis = dict(
            visible = True,
            range = [0, 2000]
        )
    ),
    polar2 = dict(
        domain = dict(
            x = [.2, .4],
            y = [.5, 1]
        ),
        radialaxis = dict(
            visible = True,
            range = [0, 1000]
        )
    ),
    polar3 = dict(
        domain = dict(
            x = [.4, .6],
            y = [0, .5]
        ),
        radialaxis = dict(
            visible = True,
            range = [0, 1200]
        )
    ),
    polar4 = dict(
        domain = dict(
            x = [.6, .8],
            y = [.5, 1]
        ),
        radialaxis = dict(
            visible = True,
            range = [0, 800]
        )
    ),
    polar5 = dict(
        domain = dict(
            x = [.8, 1],
            y = [0, .5]
        ),
        radialaxis = dict(
            visible = True,
            range = [0, 800]
        )
    ),
)

fig = go.Figure(data=data, layout=layout)
pyo.iplot(fig)
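
The five `polar1` to `polar5` entries in the layout above differ only in their domain and radial range. As a sketch, the same dictionaries could be generated in a loop. The helper name `polar_layouts` is hypothetical; the ranges are the ones used above.

```python
def polar_layouts(ranges):
    """Build polar1..polarN layout entries, alternating lower and upper rows."""
    n = len(ranges)
    step = 1 / n
    entries = {}
    for i, r_max in enumerate(ranges):
        # even positions go in the bottom half, odd positions in the top half
        y = [0, .5] if i % 2 == 0 else [.5, 1]
        entries['polar' + str(i + 1)] = dict(
            domain=dict(x=[round(i * step, 2), round((i + 1) * step, 2)], y=y),
            radialaxis=dict(visible=True, range=[0, r_max]),
        )
    return entries

polars = polar_layouts([2000, 1000, 1200, 800, 800])
print(polars['polar2']['domain'])
```

The resulting dictionary holds the same keys as the hand-written layout, so it could presumably be unpacked into `go.Layout` as keyword arguments alongside the title.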

In [0]:
# basic functions to run on google colab
configure_plotly_browser_state()
pyo.init_notebook_mode(connected=False)

data_ = []

for country in top_5:
    c = countries_fuel.loc[countries_fuel['country'] == country].groupby('fuel1').count()
    # reindex returns 0 for a fuel the country lacks (.loc would raise KeyError)
    c = c.reindex(energies, fill_value=0)['country'].tolist()
    data_.append(
        go.Scatterpolar(
            r = c,
            theta = energies,
            fill = 'toself',
            name = country,
            # all traces share the single 'polar' subplot
            subplot = "polar"
       )
    )

layout_ = go.Layout(
    title = 'Energy grid in the major countries',
    polar = dict(
        domain = dict(
            x = [0, 1],
            y = [0, 1]
        ),
        radialaxis = dict(
            visible = True,
            range = [0, 2000]
        )
    ),
)

fig = go.Figure(data=data_, layout=layout_)
pyo.iplot(fig)

## 2 - Map: Places of energy production in Brazil

In [0]:
# Limiting the dataset to Brazil
countries_fuel_pos = global_power.loc[global_power['country'] == 'BRA'][['fuel1', 'latitude', 'longitude', 'capacity_mw']]

# Filtering by energies
countries_fuel_pos = countries_fuel_pos[countries_fuel_pos.fuel1.isin(energies)]

# Preview of data
countries_fuel_pos.head()

In [0]:
# basic functions to run on google colab
configure_plotly_browser_state()
pyo.init_notebook_mode(connected=False)

energies = ['Hydro', 'Wind', 'Oil', 'Gas', 'Solar']
colors  = [
            [[0,"rgb(0,191,255)"], [1,"rgb(0,191,255)"]], 
            [[0,"rgb(173,255,47)"], [1,"rgb(173,255,47)"]], 
            [[0,"rgb(105,105,105)"], [1,"rgb(105,105,105)"]], 
            [[0,"rgb(255,69,0)"], [1,"rgb(255,69,0)"]], 
            [[0,"rgb(255,215,0)"], [1,"rgb(255,215,0)"]],
          ]
markers = ['circle', 'square', 'diamond-tall-dot', 'hexagram', 'triangle-up']

data_map = []
index = 0
for energy in energies:
    c = countries_fuel_pos.loc[countries_fuel_pos['fuel1'] == energy]
    data_map.append(
        dict(
            type = 'scattergeo',
            locationmode = 'country names',
            lon = c['longitude'],
            lat = c['latitude'],
            mode = 'markers',
            name = energy,
            marker = dict(
                size = 8,
                opacity = 1,
                symbol = markers[index],
                colorscale = colors[index],
                cmin = 0,
                # colour values must align with this trace's rows, so use the subset c
                color = c['capacity_mw'],
                cmax = countries_fuel_pos['capacity_mw'].max(),
                line = dict (
                    color = 'rgb(0,0,0)',
                    width = 1
                )
            )
        )
    )
    index += 1

layout_map = dict(
        title = 'Sources of energy production<br>in Brazil',
        width = 800,
        height = 800,
        geo = dict(
            scope='south america',
            projection=dict( type='mercator' ),
            showland = True,
            landcolor = "rgb(250, 250, 250)",
            subunitcolor = "rgb(217, 217, 217)",
            countrycolor = "rgb(217, 217, 217)",
            countrywidth = 1,
            subunitwidth = 1
        ),
    )

fig = dict( data=data_map, layout=layout_map )
pyo.iplot( fig )
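
Each entry of `colors` above is a two-stop colorscale whose start and end are the same colour, so every marker of a trace is drawn in one flat colour whatever its `capacity_mw` is. Below is a small sketch of a helper that builds such scales. The names `flat_colorscale`, `demo_palette` and `demo_scales` are hypothetical.

```python
def flat_colorscale(rgb):
    """Return a two-stop colorscale that maps every value to one colour."""
    return [[0, rgb], [1, rgb]]

demo_palette = ['rgb(0,191,255)', 'rgb(173,255,47)', 'rgb(105,105,105)']
demo_scales = [flat_colorscale(c) for c in demo_palette]
print(demo_scales[0])
```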

## 3 - Funnel Chart: Oil production in Brazil in relation to the world

In [0]:
# Reading a new dataset of countries
all_countries = pd.read_csv('https://raw.githubusercontent.com/cs-ufrn/intro-python-data-science/master/datasets/all_countries.csv')

# Preview of new dataset
all_countries.head()

In [0]:
# Filtering the dataset to limit only oil production
global_oil = global_power.loc[global_power['fuel1'] == 'Oil'][['country']]

# Number of places that produce oil in the world
global_oil_size = global_oil.size
print("Global: ", global_oil_size)
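
Note that `DataFrame.size` counts cells (rows times columns), so it equals the number of rows here only because `global_oil` keeps a single column. A quick illustration on an invented frame:

```python
import pandas as pd

# Invented rows, for illustration only
toy_oil = pd.DataFrame({'country': ['AAA', 'BBB', 'CCC'],
                        'fuel1': ['Oil', 'Oil', 'Oil']})

print(toy_oil.size)                # rows * columns
print(toy_oil[['country']].size)   # one column, so this equals the row count
print(len(toy_oil))                # row count regardless of column count
```

Using `len()` would keep the count correct even if more columns were selected later.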

In [0]:
# Filtering the dataset to limit only American countries
american_countries      = all_countries.loc[all_countries['region'] == 'Americas']
american_countries_code = american_countries['alpha-3'].unique().tolist()
american_oil = global_oil[global_oil.country.isin(american_countries_code)]

# Number of places that produce oil in the Americas
american_oil_size = american_oil.size
print("America: ", american_oil_size)

In [0]:
# Filtering the dataset to limit only South-American countries
south_american_countries      = american_countries.loc[american_countries['intermediate-region'] == 'South America']
south_american_countries_code = south_american_countries['alpha-3'].unique().tolist()
south_american_oil = american_oil[american_oil.country.isin(south_american_countries_code)]

# Number of places that produce oil in South America
south_american_oil_size = south_american_oil.size
print("South-America: ", south_american_oil_size)

In [0]:
# Filtering the dataset to limit only Brazil
brazil_oil = south_american_oil.loc[south_american_oil['country'] == 'BRA']

# Number of places that produce oil in Brazil
brazil_oil_size = brazil_oil.size
print("Brazil: ", brazil_oil_size)

<div class="alert alert-info">
<b>Creating the funnel chart</b>
</div>

In [0]:
# Chart data
values = [global_oil_size, american_oil_size, south_american_oil_size, brazil_oil_size]
phases  = ['World', 'America', 'S.America', 'Brazil']

# color of each funnel section
colors = ['rgb(32,155,160)', 'rgb(253,93,124)', 'rgb(28,119,139)', 'rgb(182,231,235)']

In [0]:
n_phase = len(phases)
plot_width = 400

# height of a section and gap between sections
section_h = 100
section_d = 10

# multiplication factor to calculate the width of other sections
unit_width = plot_width / max(values)

# width of each funnel section relative to the plot width
phase_w = [int(value * unit_width) for value in values]

# plot height based on the number of sections and the gap in between them
height = section_h * n_phase + section_d * (n_phase + 1)
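
To make the arithmetic concrete, here is the same calculation on invented counts (they are not the dataset's numbers). The `demo_` names are used only so the notebook's own variables are not overwritten.

```python
# Invented counts for four funnel phases
demo_values = [800, 400, 100, 20]
demo_plot_width = 400
demo_section_h, demo_section_d = 100, 10

# pixels per counted unit, so the largest phase spans the full plot width
demo_unit_width = demo_plot_width / max(demo_values)
demo_phase_w = [int(v * demo_unit_width) for v in demo_values]

# four sections plus five gaps (one above, one below, three in between)
demo_height = demo_section_h * len(demo_values) + demo_section_d * (len(demo_values) + 1)

print(demo_phase_w)
print(demo_height)
```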

In [0]:
# list containing all the plot shapes
shapes = []

# list containing the Y-axis location for each section's name and value text
label_y = []

for i in range(n_phase):
    if (i == n_phase-1):
        points = [phase_w[i] / 2, height, phase_w[i] / 2, height - section_h]
    else:
        points = [phase_w[i] / 2, height, phase_w[i+1] / 2, height - section_h]
    
    # SVG code to draw polygons
    path = 'M {0} {1} L {2} {3} L -{2} {3} L -{0} {1} Z'.format(*points)

    shape = {
        'type': 'path',
        'path': path,
        'fillcolor': colors[i],
        'line': {
            'width': 1,
            'color': colors[i]
        }
    }
    shapes.append(shape)

    # Y-axis location for this section's details (text)
    label_y.append(height - (section_h / 2))

    height = height - (section_h + section_d)
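
Each section is a trapezoid, symmetric about x = 0, drawn as an SVG path: move to the top-right corner, draw lines to the bottom-right, bottom-left and top-left corners, then close the shape. A sketch for one section with invented numbers:

```python
# Top half-width, top y, bottom half-width, bottom y (invented numbers)
demo_points = [200, 450, 100, 350]

# M = move to, L = line to, Z = close path; the negated x mirrors the right side
demo_path = 'M {0} {1} L {2} {3} L -{2} {3} L -{0} {1} Z'.format(*demo_points)
print(demo_path)
```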

In [0]:
# basic functions to run on google colab
configure_plotly_browser_state()
pyo.init_notebook_mode(connected=False)

# For phase names
label_trace = go.Scatter(
    x=[-350]*n_phase,
    y=label_y,
    mode='text',
    text=phases,
    textfont=dict(
        color='rgb(200,200,200)',
        size=15
    )
)
 
# For phase values
value_trace = go.Scatter(
    x=[350]*n_phase,
    y=label_y,
    mode='text',
    text=values,
    textfont=dict(
        color='rgb(200,200,200)',
        size=15
    )
)

data = [label_trace, value_trace]
 
layout = go.Layout(
    title="<b>Funnel Chart</b>",
    titlefont=dict(
        size=20,
        color='rgb(203,203,203)'
    ),
    shapes=shapes,
    height=560,
    width=900,
    showlegend=False,
    paper_bgcolor='rgba(44,58,71,1)',
    plot_bgcolor='rgba(44,58,71,1)',
    xaxis=dict(
        showticklabels=False,
        zeroline=False,
    ),
    yaxis=dict(
        showticklabels=False,
        zeroline=False
    )
)
 
fig = go.Figure(data=data, layout=layout)
pyo.iplot(fig)

## 4 - Treemap: Top 12 countries by number of energy production places

In [0]:
!pip install squarify

In [0]:
import squarify

# basic functions to run on google colab
configure_plotly_browser_state()
pyo.init_notebook_mode(connected=False)


# Grouping the countries by number of production places
countries_fuel_sorted = countries_fuel.groupby('country').count().sort_values(by=['fuel1'], ascending=False).head(12)

x = 0.
y = 0.
width = 500.
height = 500.

values      = countries_fuel_sorted['fuel1'].tolist()
values_names = countries_fuel_sorted.index.tolist()
values_fullnames = countries_fuel[countries_fuel.country.isin(values_names)][['country','country_long']].drop_duplicates()

normed = squarify.normalize_sizes(values, width, height)
rects = squarify.squarify(normed, x, y, width, height)

# Choose colors
color_brewer = ['rgb(141,211,199)','rgb(255,255,179)','rgb(190,186,218)',
                'rgb(251,128,114)','rgb(128,177,211)','rgb(253,180,98)',
                'rgb(179,222,105)','rgb(252,205,229)','rgb(217,217,217)',
                'rgb(188,128,189)','rgb(204,235,197)','rgb(255,237,111)']
shapes = []
annotations = []
counter = 0

index = 0
for r in rects:
    shapes.append( 
        dict(
            type = 'rect', 
            x0 = r['x'], 
            y0 = r['y'], 
            x1 = r['x']+r['dx'], 
            y1 = r['y']+r['dy'],
            line = dict( width = 2 ),
            fillcolor = color_brewer[counter]
        ) 
    )
    
    txt  = str(values_fullnames[values_fullnames['country'] == values_names[index]]['country_long'].tolist()[0])
    txt += '<br>'
    txt += 'Places : '
    txt += str(values[index])
    
    annotations.append(
        dict(
            x = r['x']+(r['dx']/2),
            y = r['y']+(r['dy']/2),
            text = txt,
            showarrow = False
        )
    )
    index += 1
    counter = counter + 1
    if counter >= len(color_brewer):
        counter = 0
        
layout = dict(
    title="Top 12 countries by number of energy production places",
    width=950,
    height=600,
    xaxis=dict(showgrid=False,zeroline=False,showticklabels=False),
    yaxis=dict(showgrid=False,zeroline=False,showticklabels=False),
    shapes=shapes,
    annotations=annotations,
    hovermode='closest'
)

figure = dict(data=[go.Scatter()], layout=layout)
pyo.iplot(figure)
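
The manual `counter` reset in the loop above simply cycles through the palette. As a sketch, the modulo operator gives the same wrap-around. The three-colour palette here is shortened for illustration.

```python
short_palette = ['rgb(141,211,199)', 'rgb(255,255,179)', 'rgb(190,186,218)']

# Index i wraps back to 0 once it reaches the palette length
picked = [short_palette[i % len(short_palette)] for i in range(5)]
print(picked)
```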