# Intro to Plotly

**Plotly** is a library for creating interactive, deployable, publication-ready graphs. It can be used from several languages, including Python, R, MATLAB and JavaScript. The library was originally written in JavaScript as plotly.js; support for the other languages comes through language-specific APIs built on top of it.

In this micro-course we will learn the Python implementation for building interactive plots using **plotly**.

Simple graphs, 3D plots, Dashboards, APIs, Animations etc. can be constructed using Plotly. We'll start with simple graphs, 3D plots and some basic dashboards.

<b>A note before we begin: Plotly is a very well documented library. All the example code (other than the exercises) you see in this course is taken from the Plotly documentation. Refer to Plotly documentation to learn more about various graphs available (https://plot.ly/python).</b>

## Importing Plotly

A simple import statement imports the plotly library. The version that is installed can be checked with the ```plotly.__version__``` attribute.

```python
import plotly
print(plotly.__version__)

# Example output (your version may differ):
# 4.1.1
```

## Plotting graphs with Plotly

There is one module in Plotly that is important for creating graphs. It is:
* ```graph_objects``` - Each graph is an object. This module contains functions which generate graph objects.

There are 3 parts to every graph:
1. Data or Trace: This is usually a Python list object and contains all the data that we would want to plot. A trace is a collection of data points and their specifications that we would want to plot.
2. Layout: This object is used to change the features of the graph like axis titles, spacing, fonts etc. which are unrelated to the data itself.
3. Figure: This is a dictionary-like object which contains both the data object and the layout object and this defines the graph.
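
The three parts can be pictured as a nested dictionary. The following is a minimal sketch in plain Python rather than Plotly's own classes; the values and the title text are made up for illustration.

```python
# A trace: the data points plus the options that describe how to draw them
trace = {"type": "scatter", "x": [1, 2, 3], "y": [4, 5, 6], "mode": "markers"}

# The layout: everything about the graph that is unrelated to the data itself
layout = {"title": {"text": "Three parts of a graph"},
          "xaxis": {"title": {"text": "x"}},
          "yaxis": {"title": {"text": "y"}}}

# The figure: one dictionary-like object holding the data (a list of traces) and the layout
figure = {"data": [trace], "layout": layout}

print(list(figure.keys()))    # ['data', 'layout']
print(len(figure["data"]))    # 1
```

A dictionary of this shape can also be passed to ```go.Figure(...)``` once ```graph_objects``` is imported, although this course builds figures from the classes instead.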

To start creating graphs with plotly, we need to import the ```graph_objects``` module.

```python
import plotly.graph_objects as go
```

#### Exercise

* Import plotly library and find out what version of plotly is installed.
* Import graph_objects module from the plotly library.

## Simple graphs in Plotly

There are many types of simple plots that can be built using plotly. Some of the plots that are commonly used by Data Scientists, which we will learn in this micro-course, are:

* Scatter plots
* Line charts
* Bar charts
* Bubble charts

### Scatter plots

Scatter plots are simple point graphs which are used to plot observations on a coordinate system. If the coordinate system has two dimensions it is a 2D scatter plot, and if it has three dimensions it is a 3D scatter plot. Observations with more than three dimensions cannot be plotted directly as points in a coordinate system, because of the limits of display technology and human perception.

Let us see how we can construct a scatter plot with plotly.

#### Step 1 - Defining the trace object

The trace object consists of the data and some plot formatting options. We can specify the observations for each feature as an array. For example, if $x_1, x_2, x_3$ are 3 predictors and $y$ is the predicted variable, then the data may be specified as:

$x_1 = [1,2,3]$,
$x_2 = [4,5,6]$,
$x_3 = [7,8,9]$,
$y = [12,15,18]$

In the above scenario, observation $O_1$ would be $x_1=1,\ x_2=4,\ x_3=7$ and $y=12$; $O_2$ would be $x_1=2,\ x_2=5,\ x_3=8$ and $y=15$, and so on.
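
To make the feature-versus-observation view concrete, here is a small sketch in plain Python that regroups the four arrays above into observations. The variable names are chosen for illustration.

```python
# Each list holds the values of one feature across all observations
x1 = [1, 2, 3]
x2 = [4, 5, 6]
x3 = [7, 8, 9]
y = [12, 15, 18]

# zip() pairs up the i-th entry of every list, giving one tuple per observation
observations = list(zip(x1, x2, x3, y))

for i, obs in enumerate(observations, start=1):
    print(f"O_{i}: x1={obs[0]}, x2={obs[1]}, x3={obs[2]}, y={obs[3]}")
```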

Now, let us assume there is only one predictor $x$ and one predicted variable $y$. Let $x=[1,2,3]$ and $y=[4,5,6]$.
The code to create a trace with the data given above would be:

```python
trace1 = go.Scatter(x=[1,2,3], y=[4,5,6], marker={'color': 'red', 'symbol': 104, 'size': 10}, 
                    mode="markers",  text=["one","two","three"], name='1st Trace')
```

Note: Multi-dimensional data is difficult to visualize using traditional plotting.

In the above code, ```trace1``` is a scatter plot object. It holds both the data (x and y) and the formatting options that control how the data is drawn: the marker color, symbol and size, the mode, the hover text and the trace name.

#### Exercise

Create a trace with $x$ as integer numbers from 1 to 30 and $y$ to be the 'square of $x$' ($y=x^2$) using a scatter plot object. You may use the same format options (apart from $x$ and $y$) as above example code. (Hint: use ```Numpy``` to generate integers from 1 to 30. If you are not familiar with Numpy, feel free to check out our micro-course "Intro_to_Numpy".)

#### Step 2 - Creating the Data and Layout objects

The data object holds the traces we want to plot, in a format that the Plotly engine can consume. We can wrap the trace object in a list and use that list as the data object, because the Figure object we create in Step 3 expects its data as a list or tuple of traces. To do this, simply enclose the trace object in square brackets:

```python
data = [trace1]
```
Alternatively, we can wrap the trace in a list directly in the call that creates the Figure object. This syntax is shown in Step 3.


The layout object can be created by instantiating the 'Layout' class of the ```graph_objects``` module. A 'layout' object accepts attributes such as title ('title of the graph'), xaxis (title/label for x-axis), yaxis (title/label for y-axis) etc. The syntax for creating a simple layout object is:

```python
layout = go.Layout(title="First Plot", xaxis={'title':'x'}, yaxis={'title':'y'})
```

#### Exercise

Create a data object and a layout object for the trace created in the previous exercise.

#### Step 3 - Creating the figure object and plotting the graph

The third step is to create a figure object by instantiating the 'Figure' class of the ```graph_objects``` module. A figure object accepts the data and layout objects as parameters and builds the plot from them. Once the figure object is created, the plot can be displayed by calling its ```.show()``` method. (This applies to Plotly version 4 and above, so make sure you have a recent version of plotly installed. If you want to learn more about how to display figures, refer to the official [documentation](https://plot.ly/python/renderers/).)


```python
figure = go.Figure(data=data, layout=layout)
figure.show()

# or, skipping the extra data variable and wrapping the trace in a list inline:
figure = go.Figure(data=[trace1], layout=layout)
# In a notebook, a figure on the last line of a cell is displayed
# without an explicit call to ".show()"
figure
```

#### Exercise

Create a figure object using the data and layout objects created in above exercises. Just use the example code given above. Also, use the ```.show``` method to visualize the scatter plot.

### Line charts

The line chart follows the same workflow: create the trace, create the data and layout objects, create the figure object and visualize the plot. In fact, a line chart uses the same syntax as the scatter plot shown above. The trace is an instance of the Scatter class from the ```graph_objects``` module, and only the format options change the way the points are drawn. Three commonly used values for 'mode' in the trace object are:

* **markers** - shows observations as dots or symbols
* **markers+lines** - shows observations as dots or symbols and draws a line connecting them
* **lines** - shows only the line connecting the observations, which makes trends easier to see

The syntax to create a trace object which would render a line graph is given below:

```python
trace2 = go.Scatter(x=[1,2,3], y=[4,5,6], marker={'color': 'red', 'symbol': 104, 'size': 10}, 
                    mode="lines",  text=["one","two","three"], name='2nd Trace')
```

As you can see, the only differences between the scatter trace above and this line trace are the ```mode``` and the trace name.

#### Exercise

Recreate your above Scatter plot (where $y=x^2$) as a line graph. 

* Use 'lines' as the mode in the trace object
* Also, create layout, figure objects and visualize the plot

### Multiple series in same chart

Can we add multiple data series and visualize them together on a single plot? This is a common need in any realistic visualization from which inferences are to be drawn.

It is useful to show two or more related data series in a single graph so that we can compare them. A single figure can contain multiple traces, which together make up the data object.

Let us look at an example to create a line graph with multiple data series:

```python
# Create the data x, y0, y1 and y2. Here x is the independent variable and y0, y1, y2 are dependent variables.
import numpy as np
import plotly.graph_objects as go
N = 100
random_x = np.linspace(0, 1, N)
random_y0 = np.random.randn(N)+5
random_y1 = np.random.randn(N)
random_y2 = np.random.randn(N)-5

# Create multiple traces
trace0 = go.Scatter(x = random_x,
                    y = random_y0,
                    mode = 'lines',
                    name = 'lines')
trace1 = go.Scatter(x = random_x,
                    y = random_y1,
                    mode = 'lines+markers',
                    name = 'lines+markers')
trace2 = go.Scatter(x = random_x,
                    y = random_y2,
                    mode = 'markers',
                    name = 'markers')

# Create data object with multiple traces
data = [trace0, trace1, trace2]

# Visualize the plot using Figure function and ".show()" method
figure = go.Figure(data=data)
figure.show()

# Output
```

<img src="multi-line-graph.png" style="width:50vw">

<br>

#### Exercise

Create a multi-series line graph using the following data inputs:
* x = [1,2,3,...30]
* y_1 = a random integer each, between 5 and 10 - both inclusive, for as many observations as x has
* y_2 = sine of x
* y_3 = y_1 + 5

Note that you are required to create 3 traces:
* trace_1 with x and y_1 and markers as mode
* trace_2 with x and y_2 and lines as mode
* trace_3 with x and y_3 and lines+markers as mode


### Bar Charts

Bar charts are generally used to depict an aggregated metric across multiple categories, so that the categories can be compared. The independent variable, x, is generally the categorical variable and the dependent variable, y, is the numeric metric.

Suppose we have a simple count for each category. For example, a zoo exhibits 20 giraffes, 14 orangutans and 23 monkeys. This data can be represented with a simple bar chart: the animal categories are the x values (the independent categories) and the count of each animal is the y value (the dependent values).

```python
trace = go.Bar(x=['giraffes', 'orangutans', 'monkeys'],
               y=[20, 14, 23])
data = [trace]

go.Figure(data)

# Output
```
<img src="simple-bar.png" style="width:40vw">
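
Bar heights are usually aggregates of raw records. The sketch below, in plain Python, shows one way the x and y lists could be derived from a list of individual records; the ```animals``` list is hypothetical data made up for illustration.

```python
from collections import Counter

# Hypothetical raw records: one entry per animal in the zoo
animals = ["giraffes", "monkeys", "orangutans", "monkeys", "giraffes", "monkeys"]

# Aggregate the records into one count per category
counts = Counter(animals)

# Categories become the x values, counts become the y values
x = list(counts.keys())
y = list(counts.values())

print(x)
print(y)
```

The resulting lists could then be passed to ```go.Bar(x=x, y=y)```.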


#### Representing multiple series in a single bar chart

In bar charts too, there is a provision to represent multiple series on the same plot. The way we represent multiple series in a bar chart is:
1. Create multiple traces that you want to plot and add them to the data object.
2. While creating the layout object, use a parameter called 'barmode'. You may set barmode to 'group' or 'stack'.
3. Create figure object using data and layout objects and visualize the plot.

Let's say we want to compare the giraffe, orangutan and monkey counts of the San Francisco Zoo with those of the Los Angeles Zoo. We can construct two traces, one for each data series, and visualize both series as a grouped bar chart or a stacked bar chart.

Example code for **grouped chart**:

```python
# Create trace 1 for San Francisco Zoo
trace1 = go.Bar(x=['giraffes', 'orangutans', 'monkeys'], 
                y=[20, 14, 23], 
                name='SF Zoo')

# Create trace 2 for Los Angeles Zoo
trace2 = go.Bar(x=['giraffes', 'orangutans', 'monkeys'],
                y=[12, 18, 29],
                name='LA Zoo')

# Create data object with both traces
data = [trace1, trace2]

# Create layout object with barmode set to group
layout = go.Layout(barmode='group')

# Create figure object and visualize plot
fig = go.Figure(data=data, layout=layout)
fig.show()

# Output
```

<img src="grouped-bar.png" style="width:40vw">

<br>

Example code for **stacked chart**:

```python
# Only the layout object changes: set barmode to 'stack' and keep the rest of the code the same.
layout = go.Layout(barmode='stack')

# Output
```

<img src="stacked-bar.png" style="width:40vw">

<br>

Note that the only difference in code for a grouped and stacked chart is the barmode parameter which is set to either 'group' or 'stack'.

#### Exercise

Create a grouped bar chart with the following data:
* x is type of employee in a company and x = ['Full-Time','Intern','Contractor']
* y_Corp_A is the count of employees in Company A and y_Corp_A = [345,17,43]
* y_Corp_B is the count of employees in Company B and y_Corp_B = [568,6,27]

Create separate traces - one using x and y_Corp_A and the other using x and y_Corp_B. Refer to the example code above to create your plot.

#### Formatting a bar chart

The bar chart can be customized in several ways:
* Text labels can be shown on hover or displayed permanently, using the 'text' and 'textposition' parameters.
* Individual bars can be emphasized with different colors, using 'color' as one of the keys inside the marker dictionary.
* Transparency can be added to the bar colors, using the 'opacity' parameter with a value between 0 (fully transparent) and 1 (fully opaque).

<b>Custom Text Labels</b>
Assume that in the giraffes, orangutans and monkeys example above, we would like to convey that the giraffes were imported from Africa, the orangutans were brought in from Asia and the monkeys were brought in from South America. We can show this as text. By default, the labels are displayed upon hovering. If we use the 'textposition' parameter, the labels are always shown and we can decide where they are positioned.

The syntax to do each of the above is given below:

```python
# Syntax to set custom hover text
data = [go.Bar(
               x=['giraffes', 'orangutans', 'monkeys'],
               y=[20, 14, 23],
               text=['Africa','Asia','South America']
        )]

go.Figure(data)

# Output
```

<img src="custom-tags.png" style="width:50vw">

<br>

```python
# Syntax to set label at specific position as always showing
data = [go.Bar(
               x=['giraffes', 'orangutans', 'monkeys'],
               y=[20, 14, 23],
               text=['Africa','Asia','South America'],
               textposition='auto'
       )]

go.Figure(data)

# Output
```

<img src="custom-tag-always.png" style="width:50vw">

<br>

<b>Custom Bar Colors</b>
Now let us say we want to color the giraffes bar 'yellow', the orangutans bar 'orange' and the monkeys bar 'brown'. We can do so using the color parameter in the marker dictionary.

```python
# Setting color to each bar
data = [go.Bar(
               x=['giraffes', 'orangutans', 'monkeys'],
               y=[20, 14, 23],
               marker=dict(
                   color=['yellow','orange','brown'])
       )]

go.Figure(data)

# Output
```

<img src="custom-colors.png" style="width:50vw">

<br>

```python
# You can also use rgba notation for colors - red, green, blue, alpha. Here alpha is opacity of the color
data = [go.Bar(
               x=['giraffes', 'orangutans', 'monkeys'],
               y=[20, 14, 23],
               marker=dict(
                   color=['rgba(255, 255, 0, 0.3)','rgba(255, 165, 0, 0.6)','rgba(165, 42, 42, 0.9)'])
       )]

go.Figure(data)

# Output
```

<img src="custom-colors-trans.png" style="width:50vw">

<br>

<b>Custom Overall Opacity</b>

We can also set overall color opacity for all bars using the opacity parameter.

```python
# The opacity parameter is set on the trace itself (outside the marker dictionary) and applies to all bars
data = [go.Bar(
            x=['giraffes', 'orangutans', 'monkeys'],
            y=[20, 14, 23],
            marker=dict(
                color=['yellow','orange','brown']),
            opacity=0.75
    )]

go.Figure(data)

# Output
```

<img src="custom-colors-all-trans.png" style="width:50vw">

<br>

#### Exercise

Use the same bar chart that you constructed in the grouped bar chart exercise above. The data is repeated below:
* x is type of employee in a company and x = ['Full-Time','Intern','Contractor']
* y_Corp_A is the count of employees in Company A and y_Corp_A = [345,17,43]
* y_Corp_B is the count of employees in Company B and y_Corp_B = [568,6,27]

In this bar chart,
* Add custom hover labels 'medium-pay', 'low-pay' and 'high-pay' for Full-Time, Intern and Contractor respectively.
* Color the Full-Time bar red for Corp_A and green for Corp_B. The other bars of Corp_A can be blue and the other bars of Corp_B can be orange. Use an overall opacity of 60% (i.e., opacity=0.6).

### Bubble charts

The bubble chart is a specialized scatter plot. In a typical 2-dimensional scatter plot, observations have 2 dimensions/features, x and y. A bubble chart can present additional features of the observations in the same plot. Instead of drawing every observation as a dot of the same size, each observation is drawn as a circle whose size is determined by a 'third feature' of the observation. The color of the circle may be determined by a 'fourth feature' of the observation or data set. The result is a set of circles of varying sizes scattered across the X-Y coordinate system. This is called a bubble chart.

A bubble chart can be created from a scatter plot object by modifying the 'marker' dictionary. The parameters that can be modified to create a bubble chart are:
1. size - this parameter/key can take a list of values. The list should be the same length as the number of observations. It determines the size of each bubble.
2. color - this parameter/key can take a list of values. The list should be the same length as the number of observations. It determines the color of each bubble.

<b>Note: In this course, the fourth feature represented by the color of the bubble is treated as a categorical variable, with one color per category (Plotly can also map numeric values to a color scale). The third feature, which is represented by the size of the bubble, is generally numerical in nature.</b>

An example bubble chart code is:

```python
# Create scatter plot object, but add color and size parameters to the 'marker' dictionary
trace0 = go.Scatter(
    x=[1, 2, 3, 4],
    y=[10, 11, 12, 13],
    mode='markers',
    marker=dict(
        color=['rgb(93, 164, 214)', 'rgb(255, 144, 14)',
               'rgb(44, 160, 101)', 'rgb(255, 65, 54)'],
        opacity=[1, 0.8, 0.6, 0.4],
        size=[40, 60, 80, 100],
    )
)

# Visualize plot
go.Figure([trace0])

# Output
```

<img src="basic-bubble.png" style="width:50vw">

<br>
Another example with overlapping bubbles:

```python
# Create scatter plot object, but add color and size parameters to the 'marker' dictionary
trace0 = go.Scatter(
    x=[1, 1.5, 1.48, 1.75, 1.85, 1.95, 2, 2.25, 2.25, 2.28, 2.5, 2.5, 3],
    y=[12, 12, 12.3, 13, 12.1, 12.2, 12, 11.5, 12.5, 12.25, 11.2, 11.5, 13.5],
    mode='markers',
    marker=dict(
        color=['rgb(93, 164, 214)', 'rgb(255, 144, 14)',
               'rgb(44, 160, 101)', 'rgb(255, 65, 54)',
               'rgb(52, 63, 123)','rgb(135, 12, 73)',
               'rgb(65, 108, 25)', 'rgb(125, 230, 20)',
               'rgb(56, 35, 222)','rgb(130,30,30)',
               'rgb(46, 123, 123)','rgb(124, 24, 1)',
               'rgb(235, 141, 55)'],
        size=[20, 50, 35, 40, 45, 40, 45, 50, 55, 60, 40, 60, 10],
    )
)

# Visualize plot
go.Figure([trace0])

# Output
```

<img src="overlap-bubble.png" style="width:50vw">

<br>

#### Exercise

Create a bubble chart with the following data:
* x is 10 random numbers between 10 and 25. They can take floating point values.
* y is 10 random numbers between 5 and 20. They can also take floating point values.
* z is the third feature (size): a list of 10 random integers between 20 and 65.

You may add any colors you like to the bubbles. This is optional.

## Learn more about Plotly

In this notebook, we have covered the basics of Plotly: the trace, data, layout, and figure objects. With these objects we can create scatter plots, line charts, bar charts, and bubble charts. Plotly also supports many more advanced charts, such as donut charts, gauge charts, histograms, box plots, heatmaps and 3D plots. If you want to learn more about Plotly, check out our course at https://refactored.ai. Our course covers everything from introductory Python to Pandas, Plotly, Bokeh, and machine learning techniques.