# Plotly

## Installing the dependencies

First, make sure that all of the required Python packages are installed. If you haven't installed them already, you can do so from inside this notebook:

In [None]:
!pip install plotly numpy pandas

By default, plotly wants to use its online cloud product for displaying visualizations, so in a lot of tutorials you will see a line like:

```
import plotly.plotly as py
```

If you do the above and then try to plot in your notebook, plotly will complain about not having an API key.

But we don't want that: we want to show our visualizations in a notebook. To do so, we do the following:

In [None]:
import plotly.offline as py
py.init_notebook_mode(connected=False)

Users of Colab need one more thing for Plotly version 4 to run inside cells in notebooks:

In [None]:
import sys

def enable_plotly_in_cell():
  import IPython
  from plotly.offline import init_notebook_mode
  display(IPython.core.display.HTML('''
        <script src="/static/components/requirejs/require.js"></script>
  '''))
  init_notebook_mode(connected=False)

if 'google.colab' in sys.modules:
    get_ipython().events.register('pre_run_cell', enable_plotly_in_cell)

### Creating random points

We'll create some random data points: one array for the x values, one array for y values, a hundred numbers in each.

[`numpy.random.randn`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.randn.html) samples points from a standard normal distribution (mean 0, standard deviation 1).

In [None]:
# Create random data with numpy
import numpy as np

N = 100

random_x = np.random.randn(N)
random_y = np.random.randn(N)

# Poke around with one of these
print("Length:", len(random_x))
print("First ten values:", random_x[0:10])

The `graph_objs` submodule of plotly has functions that will create our graph objects for us.

A **trace** is just the name we give to a collection of data together with the specification of how we want that data plotted. This terminology is used a lot in the plotly documentation. We can have any number of traces in a plot.

## Scatter plots

Here we will create a trace for a basic scatter plot, with the data coming from our generated data above.

In [None]:
import plotly.graph_objs as go

# Create a trace
trace = go.Scatter(
    x = random_x,
    y = random_y,
    mode = 'markers'
)

Note that nothing was plotted yet!

We can now plot the data in the notebook using the function `py.iplot`. Notice that the first argument is for supplying data. This can be a list of traces to plot (a list because we might want to plot more than one trace in a single plot).

In [None]:
data = [trace]
py.iplot(data)

Play around with the user interface: hover over a point to see its values, click and drag to zoom in on a region, and double-click to reset the view.

Rather than embedding the plot, we can use the `py.plot` function to export it to an HTML file that we can put online or email to a colleague. The function writes the HTML file to the location you provide in the `filename` keyword parameter and returns the URL of that file. Frequently the plot will also open up in another browser window.

In [None]:
# Using `plot` instead of `iplot`

plot_url = py.plot(data, filename='my-first-scatter.html')
print("URL to my plot is", plot_url)

A warning though: the HTML file produced will be pretty huge, because by default it embeds a full copy of the plotly.js library.

Users of Colab need some extra code to get the HTML file:

In [None]:
if 'google.colab' in sys.modules:
  from google.colab import files
  files.download('my-first-scatter.html')

There are a lot of options to modify the visual appearance of the scatter plot. For example, we can connect the data points in the order they appear in the arrays:

In [None]:
trace = go.Scatter(
    x = random_x,
    y = random_y,
    mode = 'lines'
)
data = [trace]
py.iplot(data)

Or we can have both dots and lines:

In [None]:
py.iplot([go.Scatter(x = random_x,
                     y = random_y,
                     mode = 'markers+lines')])

Notice that we reduced three separate python statements into one statement above. We did not create a `trace` variable or a `data` variable as intermediate results.

If this approach matches the style you prefer, then this is perfectly valid code. It can make things harder to read, but it might make things easier to modify in some situations.

### What about those scatterplots where the dots change size because of another variable?

This is something we can easily do by creating another random numpy array to hold size information, and setting that as a configuration option for the marker:

In [None]:
# Size must be positive and be big enough to see!
random_size = np.absolute(np.random.randn(N)) * 30

trace = go.Scatter(
    x = random_x,
    y = random_y,
    mode = 'markers',
    marker = {'size': random_size}
)
data = [trace]
py.iplot(data)

### But I want a graph ..., you know, a line graph ...

Let's create a hundred points in the x direction, evenly spaced between zero and one. The function [`numpy.linspace`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html) does this for us.

In [None]:
linear_x = np.linspace(0, 1, N)

# Poke around ...
print("Length:", len(linear_x))
print("First ten values:", linear_x[0:10])
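With a smaller number of points it is easier to see what `numpy.linspace` returns. This sketch uses five points instead of a hundred:

```python
import numpy as np

points = np.linspace(0, 1, 5)

print(points)           # both end points, 0 and 1, are included
print(np.diff(points))  # constant spacing of 1 / (5 - 1)
```

Because both end points are included, `N` points are separated by `N - 1` equal gaps.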

We can now make something that looks like a traditional graph:

In [None]:
trace = go.Scatter(
    x = linear_x,
    y = random_y,
    mode = 'lines'
)
data = [trace]
py.iplot(data)

### But I want two graphs...

Let's create two random sets of Y values to plot:

In [None]:
# Mean 0, standard deviation 1
random_y0 = np.random.randn(N)

# Mean 2, standard deviation 1
# Nudge the values up for these guys so the mean is 2 ...
random_y1 = np.random.randn(N) + 2
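Adding a constant to standard normal samples shifts the mean without changing the spread. A rough check, using a much larger sample than the hundred points above so that the sample mean settles close to the true mean (the seed and the sample size `M` are arbitrary choices for this sketch):

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, only to make the sketch repeatable
M = 100_000         # large sample so the estimates are stable

shifted = np.random.randn(M) + 2
direct = np.random.normal(loc=2, scale=1, size=M)  # same distribution, stated directly

print("Shifted sample mean:", shifted.mean())
print("Shifted sample std: ", shifted.std())
print("Direct sample mean: ", direct.mean())
```

Both means should come out close to 2, and the standard deviation close to 1.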

As mentioned earlier, we can plot multiple data sets by adding more traces to the list we pass to `iplot`:

In [None]:
trace0 = go.Scatter(
    x = linear_x,
    y = random_y0,
    mode = 'lines'
)
trace1 = go.Scatter(
    x = linear_x,
    y = random_y1,
    mode = 'lines+markers'
)

data = [trace0, trace1]
py.iplot(data)

Check out the cool hover effects ...

### Exercise

The code above is repeated below.

Modify the code:
* Generate another random data set called `random_y2`. This data should be sampled from a distribution with mean `-2`.
* Create a third trace called `trace2` using this data.
* Plot all three datasets. We want `trace2` to be represented as markers only.

In [None]:
# Your code here ...
# Modify this code as described above

trace0 = go.Scatter(
    x = linear_x,
    y = random_y0,
    mode = 'lines'
)
trace1 = go.Scatter(
    x = linear_x,
    y = random_y1,
    mode = 'lines+markers'
)

data = [trace0, trace1]
py.iplot(data)

In [None]:
%load solutions/plotly-scatter-add-trace2.py

# SOLUTION
# Run this cell once to load the solution
# Run this cell a second time to actually run the code
# ... or download from https://raw.githubusercontent.com/ualberta-rcg/python-plotting/master/notebooks/solutions/plotly-scatter-add-trace2.py

We can modify traces after the fact. Let's make the dottiest graph ever.

Notice that we don't have to redefine `data` below if we don't want to ... it is a list of **references** to the traces.

In [None]:
trace0.mode = "markers"
trace1.mode = "markers"
py.iplot(data)

### We can control many more aspects of the visual representation of our plots

Some obvious choices are:
* color
* marker size
* marker color
* line width
* line color
* name (that shows up in the legend)

Some of the options to do this become a bit arcane, so the [scatter documentation](https://plot.ly/python/line-and-scatter/) will be your friend...

Here is an example:

In [None]:
trace0 = go.Scatter(
    x = linear_x,
    y = random_y0,
    mode = 'lines+markers',
    name = 'Size of thing',
    marker = dict(
      size = 10,
      color = 'rgba(255, 182, 193, .9)',
      line = dict(
        width = 2,
      )
    ),
)
trace1 = go.Scatter(
    x = linear_x,
    y = random_y1,
    mode = 'lines',
    name = 'Size of other thing',
    line = dict(
      width = 5,
      color = 'rgba(50, 50, 255, .5)',
    )
)

data = [trace0, trace1]
py.iplot(data)

### No, I want a green line instead

Again, we can edit these styles after the fact, then re-plot ...

In [None]:
trace1.line['color'] = 'rgba(50, 255, 50, .5)'
py.iplot(data)

### Exercise

Change the name of `trace1` to be `Size of stuff` and replot.

In [None]:
# Your code here ...

In [None]:
%load solutions/plotly-scatter-size-of-stuff.py

# SOLUTION
# Run this cell once to load the solution
# Run this cell a second time to actually run the code
# ... or download from https://raw.githubusercontent.com/ualberta-rcg/python-plotting/master/notebooks/solutions/plotly-scatter-size-of-stuff.py

### Yeah, but you should always label your axes and have a title

To add a title and axis labels, we need to pass `iplot` a **figure**: a `dict` that describes both the data and the layout.
Here's an example ...

In [None]:
layout = dict(title = 'The size of a couple of things',
              xaxis = dict(title = 'Time (Seconds)'),
              yaxis = dict(title = 'Width (inches)')
             )
# Create the figure
fig = dict(data=data, layout=layout)

# Plot the figure
py.iplot(fig)

### Yeah, but my data is in a Pandas dataframe

Let's grab one of those Gapminder datasets (Europe), and convert the columns to nice numeric values.

Colab users will need to download the data first:

In [None]:
if 'google.colab' in sys.modules:
    !mkdir -p data
    !wget -P data https://raw.githubusercontent.com/ualberta-rcg/python-plotting/master/notebooks/data/gapminder_gdp_europe.csv

Now load the data into a dataframe:

In [None]:
import pandas as pd

df = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')

print("Yucky columns:", df.columns )

# Extract year from last 4 characters of each column name
# (slicing is used because str.strip() removes a set of characters, not a prefix)
years = df.columns.str[-4:]

# Convert year values to integers, saving results back to dataframe
df.columns = years.astype(int)
print("Nice columns:", df.columns )

# Look at the first five rows...
df.head()
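The year can be pulled out of a column name such as `gdpPercap_1952` in more than one way. The sketch below compares two options on a small hand-made frame (the numbers and the country labels `A` and `B` are invented), and shows why `str.strip` should be used with care for this kind of job:

```python
import pandas as pd

# Made-up numbers, only to mimic the shape of the Gapminder columns
demo = pd.DataFrame(
    {'gdpPercap_1952': [1.0, 2.0], 'gdpPercap_2007': [3.0, 4.0]},
    index=pd.Index(['A', 'B'], name='country'),
)

# Option 1: slice the last four characters of each name
by_slice = demo.columns.str[-4:].astype(int)

# Option 2: remove the known prefix explicitly
by_replace = demo.columns.str.replace('gdpPercap_', '', regex=False).astype(int)

print(list(by_slice))
print(list(by_replace))

# str.strip() removes any of the listed characters from both ends,
# so it is not a general "remove this prefix" tool
leftover = 'gdpPercap_pop'.strip('gdpPercap_')
print(repr(leftover))
```

Stripping happens to work for the year columns because digits are not among the characters being removed, but the last line shows that other suffixes can be eaten away.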

We can now slice our dataframe to plot it. Here we plot GDP per-capita data for the Netherlands and France.

In [None]:
trace0 = go.Scatter(
    x = df.columns,
    y = df.loc['Netherlands'],
    mode = 'lines'
)
trace1 = go.Scatter(
    x = df.columns,
    y = df.loc['France'],
    mode = 'lines+markers'
)

data = [trace0, trace1]
py.iplot(data)

### Exercise

That last graph was awesome ... BUT:

* The graph should be titled 'GDP per-capita for the Netherlands and France'
* The Netherlands graph is called 'trace 0', and the France graph is called 'trace 1'. They should be named after the country.
* The x-axis should be labeled 'Year'
* The y-axis should be labeled 'GDP per-capita'
* The France graph should be blue (`'rgb(0, 0, 255)'`) and the Netherlands graph should be orange (`'rgb(255, 127, 0)'`)

Apply what you have learned in this notebook to fix the graph!

In [None]:
# Your code here ...

In [None]:
%load solutions/plotly-scatter-netherlands-france.py

# SOLUTION
# Run this cell once to load the solution
# Run this cell a second time to actually run the code
# ... or download from https://raw.githubusercontent.com/ualberta-rcg/python-plotting/master/notebooks/solutions/plotly-scatter-netherlands-france.py

### Summary

Here are some key points:

* Data is organized in traces, which hold x, y data and style information
* `iplot` puts a plot in a notebook, `plot` generates a web page
* One or more traces can be plotted as data
* Data can be combined with layout to make a figure

We will see this same overall pattern with other plotly plot types.

What we have seen so far is the tip of the iceberg: consult the Plotly documentation (and Stack Overflow) to find out all the options.