# Plotly

This lab, along with the accompanying Bokeh lab, is a walkthrough of the [Plotly](https://plot.ly) and [Bokeh](https://bokeh.pydata.org/en/latest/) packages, and can serve as a reference as you build your visualizations for projects and challenges.

Plotly and Bokeh excel at interactive visualizations, advanced graphs, and multidimensional or multi-selectable variations in viewing data. At the end, we'll summarize how the major packages compare; our focus here is to showcase some common intermediate and advanced graphs you might want to build.

Let's start with Plotly: it's a private company that sells licenses to products like its [Dash suite](https://github.com/plotly/dash/), but it also maintains open-source graphing libraries for Python, R, and JavaScript. We'll use the [Python package](https://github.com/plotly/plotly.py/).

In [None]:
# load packages
import plotly.figure_factory as ff
import plotly.graph_objs as go

from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot


import pandas as pd
import numpy as np

In [None]:
init_notebook_mode(connected=True)

In [None]:
# load iris dataset
from sklearn import datasets
iris = datasets.load_iris()

# create raw numpy arrays
X = iris.data
y = iris.target

In [None]:
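# DESCR is the data dictionary; feature_names gives us our column headers
print(iris.DESCR)  # print so the description shows even though it is not the cell's last line

iris.feature_names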

In [None]:
# optional: create DataFrame version
# df = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
#                     columns= iris['feature_names'] + ['target'])

# or load a DataFrame with nicer labeled headers and targets!
df = pd.read_csv('./iris.csv') # this dataset has labeled targets, an improvement on df above
df.head()

In [None]:
# interactive plotly table of data
table = ff.create_table(df)
iplot(table)

>Play around with the table above, utilizing the Plotly toolbar at the top right. How might this functionality be useful?

### Scatter Plots

The iris dataset has four feature dimensions, so Plotly's 3D scatter plot is a good fit: it lets us view how three of them interact at once.

In [None]:
# iris 3d using Pandas

data = []
colors = ['rgb(228,26,28)','rgb(55,126,184)','rgb(77,175,74)'] # set our dot colors

for i, name in enumerate(df['Name'].unique()): # allows us to split our data into three distinct groups
    subset = df[df['Name'] == name]  # rows for this species only (avoids overwriting X and y above)

    trace = dict(  # a trace is how we "trace" or draw one group of data on the canvas
        name = name,
        x = subset['SepalLength'],
        y = subset['SepalWidth'],
        z = subset['PetalLength'],
        type = "scatter3d",
        mode = 'markers',
        marker = dict( size=3, color=colors[i], line=dict(width=0) ) )
    data.append( trace )

layout = dict( # we modify our canvas here, including initial layout and styles
    width=800,
    height=550,
    autosize=False,
    title='Iris dataset',
    scene=dict(
        xaxis=dict(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)',
            title='Sepal Length',  # set titles, very important
            titlefont=dict(
            family='Courier New',
            size=14,
            color='#2f2f2f'),  # we can use hex, rgba, or other color variants
        ),
        yaxis=dict(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)',
            title='Sepal Width',  # set titles, very important
            titlefont=dict(
            family='Courier New',
            size=14,
            color='#4f4f4f'),
        ),
        zaxis=dict(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)',
            title='Petal Length',  # set titles, very important
            titlefont=dict(
            family='Courier New',
            size=14,
            color='#7f7f7f'),
        ),
        aspectratio = dict( x=1, y=1, z=1 ), # we can compress large dimensions this way
        aspectmode = 'manual'        
    ),
)

fig = dict(data=data, layout=layout) # this finally compiles our figure

# run locally in notebook
iplot(fig)

# run on site, may need an account and access key, depending on usage
# (`py` is Plotly's online plotting module, which is not imported in this lab)
# url = py.plot(fig, filename='pandas-3d-iris', validate=False)

>What seems to be the native structure of Plotly?

>Dictionaries, everything is designed in nested dictionaries!
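
Because a figure is just nested dictionaries, repeated pieces such as the three near-identical axis blocks in the layout above can be produced by an ordinary Python function. The sketch below is one way to do that. `make_axis` is a hypothetical helper name, not part of Plotly, and it reuses the same keys and values as the layout above.

```python
def make_axis(title, font_color='#2f2f2f'):
    """Return one scene-axis dict in the style used in the layout above.

    make_axis is a hypothetical helper for illustration, not a Plotly function.
    """
    return dict(
        gridcolor='rgb(255, 255, 255)',
        zerolinecolor='rgb(255, 255, 255)',
        showbackground=True,
        backgroundcolor='rgb(230, 230,230)',
        title=title,
        titlefont=dict(family='Courier New', size=14, color=font_color),
    )


# Same layout as before, with each axis built by the helper
layout = dict(
    width=800,
    height=550,
    autosize=False,
    title='Iris dataset',
    scene=dict(
        xaxis=make_axis('Sepal Length', '#2f2f2f'),
        yaxis=make_axis('Sepal Width', '#4f4f4f'),
        zaxis=make_axis('Petal Length', '#7f7f7f'),
        aspectratio=dict(x=1, y=1, z=1),
        aspectmode='manual',
    ),
)
```

This `layout` can be dropped into `dict(data=data, layout=layout)` in place of the hand-written one.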

### Bar Charts

We can create some [interesting bar chart variations](https://plot.ly/python/bar-charts/):

In [None]:
x_data = ['Product<br>Revenue', 'Services<br>Revenue',
          'Total<br>Revenue', 'Fixed<br>Costs',
          'Variable<br>Costs', 'Total<br>Costs', 'Total'] # our categories, notice use of <br> to break into a new line

# this formatting allows Plotly to push to 

y_data = [400, 660, 660, 590, 400, 400, 340]

text = ['$430K', '$260K', '$690K', '$-120K', '$-200K', '$-320K', '$370K'] # our labels

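# Base: a fully transparent bar that lifts each visible bar to its starting height
trace0 = go.Bar(  # notice we have one trace per series (base, revenue, costs, profit)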
    x=x_data,
    y=[0, 430, 0, 570, 370, 370, 0],
    marker=dict(
        color='rgba(1,1,1, 0.0)',
    )
)
# Revenue
trace1 = go.Bar(
    x=x_data,
    y=[430, 260, 690, 0, 0, 0, 0],
    marker=dict(
        color='rgba(55, 128, 191, 0.7)',
        line=dict(
            color='rgba(55, 128, 191, 1.0)',
            width=2,
        )
    )
)
# Costs
trace2 = go.Bar(
    x=x_data,
    y=[0, 0, 0, 120, 200, 320, 0],
    marker=dict(
        color='rgba(219, 64, 82, 0.7)',
        line=dict(
            color='rgba(219, 64, 82, 1.0)',
            width=2,
        )
    )
)
# Profit
trace3 = go.Bar(
    x=x_data,
    y=[0, 0, 0, 0, 0, 0, 370],
    marker=dict(
        color='rgba(50, 171, 96, 0.7)',
        line=dict(
            color='rgba(50, 171, 96, 1.0)',
            width=2,
        )
    )
)
data = [trace0, trace1, trace2, trace3]
layout = go.Layout(
    title='Annual Profit- 2019',
    barmode='stack',
    paper_bgcolor='rgba(245, 246, 249, 1)',
    plot_bgcolor='rgba(245, 246, 249, 1)',
    showlegend=False
)

annotations = []

for i in range(len(x_data)):
    annotations.append(dict(x=x_data[i], y=y_data[i], text=text[i],
                            font=dict(family='Arial', size=14,
                                      color='rgba(245, 246, 249, 1)'),
                            showarrow=False))

layout['annotations'] = annotations  # assign once, after the loop has finished

fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='waterfall-bar-profit')
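
The chart above is a waterfall built from stacked bars: the transparent base trace lifts each visible bar so that it starts where the running total left off. The base heights were typed in by hand, but they can be derived from the signed changes. Here is a minimal sketch, where `waterfall_bars` is a hypothetical helper and the input is the four revenue and cost steps from the chart (the total bars start at zero and are left out):

```python
def waterfall_bars(deltas):
    """Return (bases, heights) for a list of signed changes.

    Each bar starts at the lower of the running totals before and after
    the change, and its height is the size of the change.
    """
    bases, heights = [], []
    running = 0
    for delta in deltas:
        after = running + delta
        bases.append(min(running, after))  # invisible base bar
        heights.append(abs(delta))         # visible bar stacked on top
        running = after
    return bases, heights


# Product revenue, services revenue, fixed costs, variable costs (in $K)
bases, heights = waterfall_bars([430, 260, -120, -200])
print(bases)    # [0, 430, 570, 370]
print(heights)  # [430, 260, 120, 200]
```

These bases match the corresponding entries of the `y` list in `trace0`.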

### Aggregations
We can also build in various aggregation functions (aggfuncs) and switch between them using the dropdown in the example below:

In [None]:
df = pd.read_csv("https://raw.githubusercontent.com/bcdunbar/datasets/master/worldhappiness.csv")

aggs = ["count","sum","avg","median","mode","rms","stddev","min","max","first","last"]

agg_func = []  # one dropdown button per aggregation function
for func in aggs:
    agg_func.append(dict(
        args=['transforms[0].aggregations[0].func', func],
        label=func,
        method='restyle'
    ))

data = [dict(
  type = 'choropleth',
  locationmode = 'country names',
  locations = df['Country'],
  z = df['HappinessScore'],
  autocolorscale = False,
  colorscale = 'Portland', # pick a color scheme
  reversescale = True,
  transforms = [dict(
    type = 'aggregate',
    groups = df['Country'],
    aggregations = [dict(
        target = 'z', func = 'sum', enabled = True)
    ]
  )]
)]

layout = dict(
  title = '<b>Plotly Aggregations</b><br>Use dropdown to change aggregation:',
  xaxis = dict(title = 'Subject'),
  yaxis = dict(title = 'Score', range = [0,22]),
  height = 600,
  width = 900,
  updatemenus = [dict(
        x = 0.85,
        y = 1.15,
        xref = 'paper',
        yref = 'paper',
        yanchor = 'top',
        active = 1,
        showactive = False,
        buttons = agg_func
  )]
)

iplot({'data': data,'layout': layout}, validate=False)
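
In the layout above, `active = 1` selects the second dropdown entry, `sum`, which matches the initial `func = 'sum'` on the trace. A small sketch of deriving that index instead of hard-coding it, so the two stay in sync; `default_func` and `buttons` are illustrative names, not Plotly parameters:

```python
aggs = ["count", "sum", "avg", "median", "mode", "rms",
        "stddev", "min", "max", "first", "last"]
default_func = "sum"  # illustrative name: the aggregation applied on first render

# One dropdown button per aggregation, in the same shape as agg_func above
buttons = [
    dict(
        args=['transforms[0].aggregations[0].func', func],
        label=func,
        method='restyle',
    )
    for func in aggs
]

# Position of the default in the button list; this is the value to use for `active`
active = aggs.index(default_func)
print(active, buttons[active]['label'])  # 1 sum
```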

# Graphing Package Comparison

1. Matplotlib
    - the original, and the foundation for many of the packages below
    - resembles MATLAB
    - highly customizable
    - lots of code
    - with customization comes huge, tedious documentation
    - takes a while to make viz look good, but has skins and styles that can be applied
    
    
2. Seaborn
    - very fast
    - built on matplotlib
    - looks good out of the box (but need to know matplotlib to modify)
    - lots of aggregate graphs with only one line of code
    
    
3. Pandas
    - super easy and fast, if your data is in Pandas format
    - limited options
    - looks like Matplotlib, stuck in the '90s


4. ggplot
    - comes from R
    - tight integration with Pandas, for better or worse
    - fast but low customization
    
    
5. Bokeh
    - interactive
    - easy output to JSON, HTML, or web apps
    - handles streaming data
    - verbose, lots of code
    
    
6. Plotly
    - interactive
    - web hosted graphics
    - has rare graphs like contour plots and dendrograms
    - benefits from being for-profit with robust documentation, but can't access top functionality for free


7. geoplotlib
    - map focused
    - offers choropleths, heatmaps, and dot density maps
    
    
8. D3.js
    - not Python, but plays well via a number of tools like Flask
    - extremely customizable
    - native to web: fast, responsive, and easy to share
    - have to know JavaScript
    - thousands of examples, documentation can be tough to navigate

These are the main players to consider for your projects. Each has pros and cons, so choose wisely to maximize your options. In addition, seek to graph in a way that speaks organically to the widest audience.

>How do you do this?

>What are essential things we should do when sharing graphs?