# Plotly

### Handy hints 

* Some of the plotting libraries we use need to communicate a lot of data to the browser. Depending on which version of Jupyter you are running, you may need to launch this notebook with a higher data rate limit: `jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000`

* In general, we are using plotting libraries that return objects encapsulating the plot. You can check the type of these returned objects with `type()`. Jupyter's tools for exploring objects and methods will also be useful: the `?` and `??` operators, and tab autocompletion.

## Setup 

In [2]:
import pandas as pd
import numpy as np

In [3]:
# We may want to use some colours etc from other libraries
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
# Use the plotly.offline module to use plotly without a cloud account
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
from plotly import tools
import plotly

In [5]:
# init_notebook_mode() will inject the Plotly javascript into our notebook
# so we can display plots inline using iplot()
init_notebook_mode(connected=True)

### Toy data 

Let's create a toy dataset to use for illustration and problem-posing purposes. It's very small, so we can easily see what's going on. In most cases we'll ask you to do exercises using the more complex housing data.

In [6]:
from io import StringIO

data_string = """name	number	engine_type	colour	wheels	top_speed_mph	weight_tons
Thomas	1	Tank	Blue	6	40	52
Edward	2	Tender	Blue	14	70	41
Henry	3	Tender	Green	18	90	72.2
Gordon	4	Tender	Blue	18	100	91.35
James	5	Tender	Red	14	70	46
Percy	6	Tank	Green	4	40	22.85
Toby	7	Tank	Brown	6	20	27
Emily	12	Tender	Green	8	85	45
Rosie	37	Tank	Purple	6	65	37
Hiro	51	Tender	Black	20	55	76.8"""

trains = pd.read_table(StringIO(data_string))
trains['size'] = pd.cut(trains['weight_tons'], 3, labels=['Small','Medium','Big'])

trains

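| | name | number | engine_type | colour | wheels | top_speed_mph | weight_tons | size |
|---|---|---|---|---|---|---|---|---|
| 0 | Thomas | 1 | Tank | Blue | 6 | 40 | 52.0 | Medium |
| 1 | Edward | 2 | Tender | Blue | 14 | 70 | 41.0 | Small |
| 2 | Henry | 3 | Tender | Green | 18 | 90 | 72.2 | Big |
| 3 | Gordon | 4 | Tender | Blue | 18 | 100 | 91.35 | Big |
| 4 | James | 5 | Tender | Red | 14 | 70 | 46.0 | Medium |
| 5 | Percy | 6 | Tank | Green | 4 | 40 | 22.85 | Small |
| 6 | Toby | 7 | Tank | Brown | 6 | 20 | 27.0 | Small |
| 7 | Emily | 12 | Tender | Green | 8 | 85 | 45.0 | Small |
| 8 | Rosie | 37 | Tank | Purple | 6 | 65 | 37.0 | Small |
| 9 | Hiro | 51 | Tender | Black | 20 | 55 | 76.8 | Big |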
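
To see where the `size` labels come from: given an integer bin count, `pd.cut` splits the range between the smallest and largest value into equal-width, right-closed intervals (nudging the outer edge slightly so that the minimum is included). The sketch below redoes that arithmetic in plain Python for the weights above. `size_of` is an illustrative helper, not part of pandas.

```python
# Weights from the toy table, in row order
weights = [52, 41, 72.2, 91.35, 46, 22.85, 27, 45, 37, 76.8]
labels = ['Small', 'Medium', 'Big']

lo, hi = min(weights), max(weights)
width = (hi - lo) / len(labels)
# Upper edge of each bin; the last one is pinned to the maximum
upper_edges = [lo + width * (i + 1) for i in range(len(labels) - 1)] + [hi]

def size_of(weight):
    """Return the label of the first bin whose upper edge is >= weight."""
    for label, edge in zip(labels, upper_edges):
        if weight <= edge:
            return label
    raise ValueError("weight is above the largest bin edge")

sizes = [size_of(w) for w in weights]
print([round(e, 2) for e in upper_edges])
print(sizes)
```

The resulting labels should match the `size` column in the table, which is a quick way to check your understanding of how the bins were drawn.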


### Housing data 

In [7]:
sales = pd.read_csv("housing-data-10000.csv", 
                    usecols=['id','date','price','zipcode','lat','long',
                             'waterfront','view','grade','sqft_living','sqft_lot'],
                    parse_dates=['date'], 
                    dtype={'zipcode': 'category',
                           'waterfront': 'bool'})

In [8]:
sales.head()

| | id | date | price | sqft_living | sqft_lot | waterfront | view | grade | zipcode | lat | long |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1446800660 | 2015-03-16 | 276500.0 | 1400 | 6650 | False | 0 | 6 | 98168 | 47.4888 | -122.332 |
| 1 | 1453601502 | 2015-02-26 | 303697.0 | 2520 | 7334 | False | 0 | 7 | 98125 | 47.7263 | -122.291 |
| 2 | 9523103990 | 2014-12-08 | 611000.0 | 1850 | 5000 | False | 0 | 7 | 98103 | 47.6727 | -122.351 |
| 3 | 7308600040 | 2014-07-23 | 769995.0 | 3360 | 12080 | False | 0 | 9 | 98011 | 47.7757 | -122.173 |
| 4 | 1562200240 | 2014-09-18 | 550000.0 | 2160 | 15360 | False | 0 | 8 | 98007 | 47.6232 | -122.138 |


In [9]:
sales.dtypes

id                      int64
date           datetime64[ns]
price                 float64
sqft_living             int64
sqft_lot                int64
waterfront               bool
view                    int64
grade                   int64
zipcode              category
lat                   float64
long                  float64
dtype: object

## Plotly 

Plotly is a JavaScript library with APIs in several languages: Python, R and MATLAB. It has a wide range of built-in plot types and works well with the Jupyter Notebook. Plots made through any of these APIs are rendered with JavaScript and are all interactive.

Some useful references:

- Plotly Python reference home https://plot.ly/python/
- Plotly Python introductory user guide https://plot.ly/python/user-guide/
- Plotly online Graph Maker https://plot.ly/create/ 

Plotly was originally designed to work with hosted plots in the cloud, and that is still possible, but it is now also possible to work entirely offline without an account by using the `plotly.offline` module.

Cufflinks is a companion library that makes Plotly Pandas-aware. It provides a syntax much like Seaborn's, where DataFrames and column names can be passed to plotting functions. We won't use Cufflinks today, as more advanced Plotly plots currently aren't possible in Cufflinks syntax. You can learn about it [here](https://plot.ly/ipython-notebooks/cufflinks/).

### Simple interactive plotting

Here's a simple example of a Plotly plot. Try mouseover on the data to see the hover info, and try out the zoom/pan/select tools in the upper right. Also try clicking on the legend, which is interactive.

In [10]:
x = [1,2,3,4,5]
y = [2,5,10,17,26]
y2 = [1,4,9,11,9]

trace0 = go.Scatter(x=x, y=y, mode='lines', 
                    line=dict(color='blue'),
                    name='Projected')

trace1 = go.Scatter(x=x, y=y2, mode='markers', 
                    marker=dict(color='red', size=10),
                    name='Actual')

iplot([trace0, trace1])

An individual mapping of data to coordinates in Plotly is called a _trace_. Above, we created a plot specification by making a list of our two traces, and passed it to the `iplot()` function. More generally, the `iplot()` function can take either just data (as above), or a Figure, which incorporates both Data and Layout.

Layouts are optional, and specify axis behaviours, titles and labels, annotations, etc. They are also used to manage subplots. 


In [11]:
x = [1,2,3,4,5]
y = [2,5,10,17,26]
y2 = [1,4,9,11,9]

trace0 = go.Scatter(x=x, y=y, mode='lines', 
                    line=dict(color='blue'),
                    name='Projected')

trace1 = go.Scatter(x=x, y=y2, mode='markers', 
                    marker=dict(color='red', size=10),
                    name='Actual')

layout = go.Layout(title="An example plot",
                   width=600, 
                   height=400,
                   xaxis=dict(title='Month', range=[0,6]),
                   showlegend=True,
                   annotations=[dict(x=3, y=10,
                                     text="where it all went wrong", 
                                     showarrow=True)
                               ]
                  )

fig = go.Figure(data=[trace0, trace1], layout=layout)

iplot(fig)

Let's examine this figure object:

In [12]:
fig

{'data': [{'line': {'color': 'blue'},
   'mode': 'lines',
   'name': 'Projected',
   'type': 'scatter',
   'x': [1, 2, 3, 4, 5],
   'y': [2, 5, 10, 17, 26]},
  {'marker': {'color': 'red', 'size': 10},
   'mode': 'markers',
   'name': 'Actual',
   'type': 'scatter',
   'x': [1, 2, 3, 4, 5],
   'y': [1, 4, 9, 11, 9]}],
 'layout': {'annotations': [{'showarrow': True,
    'text': 'where it all went wrong',
    'x': 3,
    'y': 10}],
  'height': 400,
  'showlegend': True,
  'title': 'An example plot',
  'width': 600,
  'xaxis': {'range': [0, 6], 'title': 'Month'}}}

Our plot is represented by a declarative data structure. It closely parallels the JSON object sent to the JavaScript library for rendering. We could have declared this structure directly without using `plotly.graph_objs` classes, but the classes give us documentation, error checking, and utility functions.

In [13]:
# Plot using only a data structure
iplot({'data': [{'type': 'scatter',
                 'mode': 'lines+markers',
                 'x' : [0,2,4,6],
                 'y' : [2,5,5,2]}]})
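
Because the specification is ordinary dictionaries and lists, it can be serialised with the standard library alone. The sketch below round-trips a small figure through JSON. Nothing here needs Plotly, and the `iplot()` call is left as a comment since it assumes the notebook setup above. The `layout` values are illustrative.

```python
import json

# A figure specification written as plain Python data
fig_spec = {
    'data': [{'type': 'scatter',
              'mode': 'lines+markers',
              'x': [0, 2, 4, 6],
              'y': [2, 5, 5, 2]}],
    'layout': {'title': 'Declared as a dict', 'width': 600, 'height': 400},
}

# Serialise to JSON text and read it back
as_json = json.dumps(fig_spec, indent=2, sort_keys=True)
restored = json.loads(as_json)

print(as_json)
# iplot(restored)  # in the notebook, this would render the same plot
```

If the traces hold NumPy arrays or pandas Series, `json.dumps` will reject them; convert them with `.tolist()` first.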

Plots can be saved to HTML, in which case they retain their interactivity:

In [71]:
plot(fig, filename="example_plot.html")

'file:///Users/clare/Dropbox/LSCC_working/ASPP_viz_workshop/example_plot.html'

Plots can also be exported as images, either using the interactive toolbar on the plot itself, or via the `image` argument to `plot()`.

**Exercise:** Recreate, in Plotly, the scatter plot of house sales with x-coordinates given by longitude (the `long` column) and y-coordinates given by latitude (the `lat` column). Your plot may work better if you work with a sample of the data, e.g. `sample = sales.sample(4000)`, although it's not critical. You can use `go.Scattergl()` as a drop-in replacement for `go.Scatter()`: this is a WebGL implementation that handles large numbers of points better.

In [14]:
sample = sales.sample(4000)  # not subsampling is fine

scatter = go.Scattergl(x=sample['long'], y=sample['lat'], 
                       mode='markers',
                       marker={'opacity':0.5, 'size':5})

iplot([scatter])

### Subplots 

There are a couple of ways to make subplots in Plotly. A convenient option is `tools.make_subplots()`, a utility function to generate a figure with multiple subplots. We can then attach traces to whichever subplot we like.

In [15]:

histogram = go.Histogram(x=trains['weight_tons'])
scatter = go.Scatter(x=trains['weight_tons'], y=trains['top_speed_mph'], 
                     mode='markers',
                     marker=dict(size=10,
                                 color=trains['colour'])
                    )

# Make figure and attach subplots

fig = tools.make_subplots(rows=2, cols=1)

fig.append_trace(trace=histogram, row=1, col=1)
fig.append_trace(trace=scatter, row=2, col=1)

fig.update({'layout': go.Layout(showlegend=False,
                                width=600, height=600)})

iplot(fig)


This is the format of your plot grid:
[ (1,1) x1,y1 ]
[ (2,1) x2,y2 ]



**Exercise:** Try adding the argument `shared_xaxes=True` to the `make_subplots()` call above, and observe the pan and zoom behaviour before and after.

See [here](https://plot.ly/python/subplots/) for more on subplots.
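
Under the hood, a subplot is just a trace pointed at its own pair of axes, with each axis given a slice of the figure through `domain`. The following sketch writes out by hand roughly what the printed plot grid describes, as a plain dictionary. The domain values and the `two_row_spec` helper are illustrative assumptions, not necessarily what `make_subplots()` emits.

```python
def two_row_spec(top_trace, bottom_trace, shared_x=False):
    """Stack two traces vertically by assigning them to separate axes."""
    top = dict(top_trace, xaxis='x', yaxis='y')
    # With a shared x-axis both traces reference 'x'; otherwise the
    # bottom trace gets its own 'x2'
    bottom = dict(bottom_trace, xaxis='x' if shared_x else 'x2', yaxis='y2')

    layout = {
        'yaxis':  {'domain': [0.55, 1.0]},   # top 45% of the figure
        'yaxis2': {'domain': [0.0, 0.45]},   # bottom 45% of the figure
        'showlegend': False,
    }
    if shared_x:
        layout['xaxis'] = {'anchor': 'y2'}   # draw the single x-axis at the bottom
    else:
        layout['xaxis'] = {'anchor': 'y'}
        layout['xaxis2'] = {'anchor': 'y2'}
    return {'data': [top, bottom], 'layout': layout}

# Weights and top speeds of the first five toy trains
histogram = {'type': 'histogram', 'x': [52, 41, 72.2, 91.35, 46]}
scatter = {'type': 'scatter', 'mode': 'markers',
           'x': [52, 41, 72.2, 91.35, 46], 'y': [40, 70, 90, 100, 70]}

spec = two_row_spec(histogram, scatter)
print(spec['data'][1]['xaxis'], spec['data'][1]['yaxis'])
# iplot(spec)  # in the notebook, renders the two stacked panels
```

Calling `two_row_spec(histogram, scatter, shared_x=True)` points both traces at the same x-axis, which is why panning or zooming one panel then moves the other.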

### Colour

If we want to use colour to show some categorical variable, we draw multiple traces onto one plot:

In [74]:
tanks_df = trains[trains['engine_type']=='Tank']
tenders_df = trains[trains['engine_type']=='Tender']

tanks_scatter = go.Scatter(x=tanks_df['weight_tons'], y=tanks_df['top_speed_mph'], 
                           mode='markers', 
                           marker=dict(size=10),
                           name="Tanks")
tenders_scatter = go.Scatter(x=tenders_df['weight_tons'], y=tenders_df['top_speed_mph'], 
                             mode='markers', 
                             marker=dict(size=10),
                             name="Tenders")

layout = go.Layout(xaxis=dict(rangemode='tozero',
                              title='Weight'),
                   yaxis=dict(rangemode='tozero',
                              title='Top speed')
                  )

fig = go.Figure(data=[tanks_scatter, tenders_scatter], layout=layout)

iplot(fig)

**Exercise:** Colour your scatter plot of house sales according to whether the property is or is not a waterfront property. You should end up with a legend that you can use to, for example, hide all non-waterfront properties.

In [54]:
traces = []

for (label, df) in sales.groupby('waterfront'):
    traces.append(go.Scattergl(x=df['long'], y=df['lat'], 
                               mode='markers',
                               # label is a bool, so give the legend a readable string
                               name='Waterfront' if label else 'Not waterfront'))

layout = go.Layout(xaxis=dict(title='Longitude'),
                   yaxis=dict(title='Latitude')
                  )

fig = go.Figure(data=traces, layout=layout)

iplot(fig)

If we want to use colour to map a *continuous* variable, we use the `color` parameter - in this case, to `marker`. Previously, we set the `color` to a single value (like "red" or "blue"), but we can set it to an array of numbers to colour every point differently.

In [76]:
# Speed against wheels, coloured by weight

scatter = go.Scatter(x=trains['wheels'], y=trains['top_speed_mph'], 
                     mode='markers', 
                     marker=dict(size=10, 
                                 color=trains['weight_tons'],     # colour by weight
                                 colorscale='Reds',               # choose colormap
                                 cmin=0, cmax=100,                # map range (default is min to max of the colour values, here weight)
                                 showscale=True,                  # display colorbar
                                 colorbar=dict(title="Weight")),  # colorbar properties
                     name="Trains")

layout = go.Layout(xaxis=dict(rangemode='tozero',
                              title='Wheels'),
                   yaxis=dict(rangemode='tozero',
                              title='Top speed'))

fig = go.Figure(data=[scatter], layout=layout)

iplot(fig)
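
The effect of `cmin` and `cmax` is easiest to see as arithmetic: each colour value is rescaled to a position between 0 and 1 on the colorscale, and anything outside the range saturates at the end colours. A minimal sketch of that mapping follows (`colorscale_position` is an illustrative helper, not a Plotly function).

```python
def colorscale_position(value, cmin, cmax):
    """Rescale value to [0, 1], clamping values outside [cmin, cmax]."""
    t = (value - cmin) / (cmax - cmin)
    return min(1.0, max(0.0, t))

# Weights of the toy trains, in row order
weights = [52, 41, 72.2, 91.35, 46, 22.85, 27, 45, 37, 76.8]

# Fixed range, as in the plot above
fixed = [colorscale_position(w, 0, 100) for w in weights]
# Data-driven range, which is what you get when cmin/cmax are left out
auto = [colorscale_position(w, min(weights), max(weights)) for w in weights]

for w, f, a in zip(weights, fixed, auto):
    print("{:6.2f}  fixed={:.2f}  auto={:.2f}".format(w, f, a))
```

With the fixed 0 to 100 range, the lightest engine sits a little under a quarter of the way up the scale rather than at its very start, which keeps colours comparable between plots that use the same range.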

**Exercise:** 

* Colour your scatter plot of house sales according to the sale price.
* An example of a logarithmic colour scale is shown [here](https://plot.ly/python/logarithmic-color-scale/). If you have time, see if you can get your house prices to map to a logarithmic colour scale. (Of course, an easier alternative is just to set the `color` value to the log of the price.) 

In [55]:
sample = sales.sample(4000)  

scatter = go.Scattergl(x=sample['long'], y=sample['lat'], 
                       mode='markers',
                       marker=dict(color=sample['price'].apply(np.log),
                                   colorscale='Reds',
                                   showscale=True)
                      )

iplot([scatter])
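
One drawback of colouring by the log of the price is that the colorbar is then labelled in log units. A possible fix, sketched below as a plain `marker` dictionary, is to place ticks at the logs of round prices and label them with the prices themselves, using the colorbar's `tickvals` and `ticktext` attributes. The tick prices chosen here are arbitrary assumptions; adjust them to the range of your data.

```python
import math

# Round prices to show on the colorbar (an arbitrary illustrative choice)
tick_prices = [100000, 200000, 500000, 1000000, 2000000, 5000000]

colorbar = {
    'title': 'Price',
    'tickmode': 'array',
    'tickvals': [math.log(p) for p in tick_prices],        # positions in log units
    'ticktext': ['${:,}'.format(p) for p in tick_prices],  # labels in dollars
}

marker = {
    # 'color': sample['price'].apply(np.log),  # supplied in the notebook
    'colorscale': 'Reds',
    'showscale': True,
    'colorbar': colorbar,
}

print(colorbar['ticktext'])
```

In the notebook, this dictionary (with `color` filled in) could be passed as the `marker` argument of `go.Scattergl()`.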

In [19]:
# NB prices are skewed: log distribution is more symmetric

hist = go.Histogram(x=sales['price'].apply(np.log))

iplot([hist])

In [53]:
sample = sales.sample(4000)  

scatter = go.Scattergl(x=sample['long'], y=sample['lat'], 
                       mode='markers',
                       marker=dict(color=sample['price'],
                                   colorscale=[[0, 'rgb(250, 250, 250)'], 
                                                [1./10000, 'rgb(210, 210, 250)'], 
                                                [1./1000, 'rgb(170, 170, 250)'],  
                                                [1./100, 'rgb(130, 130, 250)'], 
                                                [1./10, 'rgb(90, 90, 250)'],  
                                                [1., 'rgb(50, 50, 250)']],
                                   showscale=True)
                      )

iplot([scatter])
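
The hand-written colorscale above follows a pattern: stops at 0 and at successive powers of ten up to 1, with the colour stepping evenly from near-white to blue. Here is a sketch that generates the same list programmatically, so that the number of decades or the end colours can be changed in one place (`log_colorscale` is an illustrative helper, not part of Plotly).

```python
def log_colorscale(decades, start_rgb, end_rgb):
    """Build [position, 'rgb(...)'] stops at 0, 10**-decades, ..., 0.1, 1."""
    positions = [0.0] + [1.0 / 10 ** k for k in range(decades, -1, -1)]
    last = len(positions) - 1
    scale = []
    for i, pos in enumerate(positions):
        # Step each channel linearly from start_rgb to end_rgb
        r, g, b = (round(s + (e - s) * i / last)
                   for s, e in zip(start_rgb, end_rgb))
        scale.append([pos, 'rgb({}, {}, {})'.format(r, g, b)])
    return scale

colorscale = log_colorscale(4, (250, 250, 250), (50, 50, 250))
for stop in colorscale:
    print(stop)
```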

### Hover text

We can set the text to be shown on hover with the `text` attribute:

In [79]:

scatter = go.Scatter(x=trains['weight_tons'], y=trains['top_speed_mph'], 
                     mode='markers', 
                     marker=dict(size=10),
                     text=trains['name'],   # <--
                     name="Trains")

layout = go.Layout(xaxis=dict(rangemode='tozero',
                              title='Weight'),
                   yaxis=dict(rangemode='tozero',
                              title='Top speed'),
                   hovermode="closest")

fig = go.Figure(data=[scatter], layout=layout)

iplot(fig)

Note that we also set `hovermode="closest"` in the layout, so that we see information for the point closest to the cursor. You can switch back with the "Compare on hover" option in the plot toolbar; check the difference in behaviour.

**Exercise:** 
* On your scatter plot of house location, set the hover info to display the sale price.
* Now try to set the hover info to display the house grade, sale date, and price. This will require some Pandas wrangling.

In [56]:
sample = sales.sample(4000)  

scatter = go.Scattergl(x=sample['long'], y=sample['lat'], 
                       mode='markers',
                       marker=dict(color=sample['price'].apply(np.log),
                                   colorscale='Reds',
                                   showscale=True),
                       text=sample['price']
                      )

iplot([scatter])

In [58]:
sample = sales.sample(4000)  

text = sample['grade'].apply(lambda x:"Grade {}".format(x)).str.cat([sample['date'].astype(str),
                                                                     sample['price'].apply(lambda x: "${}".format(x))],
                                                                     sep="; ")

scatter = go.Scattergl(x=sample['long'], y=sample['lat'], 
                       mode='markers',
                       marker=dict(color=sample['price'].apply(np.log),
                                   colorscale='Reds',
                                   showscale=True),
                       text=text
                      )

iplot([scatter])
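
Hover text accepts a small subset of HTML, so `<br>` can put each field on its own line, and a format specifier tidies up the price. Below is a sketch using three rows copied from the `sales.head()` output above; in the notebook the same formatting would be applied across the sampled DataFrame. `hover_label` is an illustrative helper.

```python
from datetime import date

# (grade, sale date, price) for three rows shown by sales.head()
rows = [(6, date(2015, 3, 16), 276500.0),
        (7, date(2015, 2, 26), 303697.0),
        (7, date(2014, 12, 8), 611000.0)]

def hover_label(grade, sold, price):
    """Format one multi-line hover label."""
    return "Grade {}<br>{}<br>${:,.0f}".format(grade, sold.isoformat(), price)

text = [hover_label(*row) for row in rows]

trace = {'type': 'scattergl', 'mode': 'markers',
         'x': [-122.332, -122.291, -122.351],   # long
         'y': [47.4888, 47.7263, 47.6727],      # lat
         'text': text,
         'hoverinfo': 'text'}                   # show only our label on hover

print(text[0])
# iplot([trace])  # in the notebook, renders the three points
```

Setting `hoverinfo` to `'text'` hides the raw coordinates, which are rarely what a reader wants to see on a map-like plot.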