# Plotly Workshop Tutorial
 
Welcome to CodeRATS's Plotly workshop! This consists of two parts:
- Part 1: Code-together
    - Part 1a: Basics of python and plotly.express
    - Part 1b: Advanced Plotly features with Plotly Graph Objects
- Part 2: Bring your own code 


# Part 1a: Basics of python and plotly

Plotly builds a figure in 4 steps:
1. Make a "canvas" or "figure" to draw on
2. Add your data points
3. Customize the visual (colors, point symbols, axis names and titles, etc.)
4. Annotate the visual

Plotly offers two interfaces:
- Plotly Express: super easy to use
- Plotly Graph Objects (GO): more customizable

We will start by using Plotly Express to make a scatter plot.

In [1]:
# import packages
import pandas as pd
import plotly.express as px

In [2]:
# Step 1: load your data 
df = px.data.iris()  # reads in data as pandas dataframe (like a table)
display(df)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species,species_id
0,5.1,3.5,1.4,0.2,setosa,1
1,4.9,3.0,1.4,0.2,setosa,1
2,4.7,3.2,1.3,0.2,setosa,1
3,4.6,3.1,1.5,0.2,setosa,1
4,5.0,3.6,1.4,0.2,setosa,1
...,...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,virginica,3
146,6.3,2.5,5.0,1.9,virginica,3
147,6.5,3.0,5.2,2.0,virginica,3
148,6.2,3.4,5.4,2.3,virginica,3


Let's compare petal width (`petal_width`) to petal length (`petal_length`) in a scatter plot. Run the following code block, and notice that you made a scatter plot in two lines of code!

Try out different viewing methods:
- Hover your mouse over a point to see the exact data used to plot it
- Select the 'Zoom' option in the top right menu (magnifying glass) and zoom in on a region you want to view by clicking and dragging your mouse
- Select the 'Pan' option in the top right menu (coordinate axes), and move around the plot by clicking and dragging
- Reset the view by clicking 'Autoscale' in the top right menu (next to the home icon)

In [6]:
# 1. Make the figure
fig = px.scatter(df,               # df: a pandas dataframe with the data you want to plot. Rows are "samples" and columns are "features" you want to plot, categorize, etc.
                 x="petal_width",  # x: name of column in df you want to use in the x-axis of the scatter plot
                 y="petal_length") # y: name of column in df you want to use in the y-axis of the scatter plot

# 2. Show the figure in the notebook
fig.show()

Say you want to color these points by `species`. It is as simple as adding the `color=` parameter to `px.scatter(...)`. plotly.express will automatically color the points according to the data under the `species` column in your dataframe, and include a legend.

Try:
- Clicking on one of the species in the legend to hide that group of points. Click again to show it again
- Instead of `color`, use `symbol` to discriminate the points coming from different `species`. 
- Add `hover_data=["sepal_length", "sepal_width"]` into `px.scatter(...)` so when you hover over the points, you can also see the `sepal_length` and `sepal_width` values for that point.
- Add marginal plots to supplement your scatter plot by adding `marginal_y="violin"` and `marginal_x="box"` into `px.scatter(...)`. More documentation and options described at https://plotly.com/python/marginal-plots/
- Add a title and change the default axis names (find how to do it at https://plotly.com/python/figure-labels/)


In [9]:
# 1. Make the figure
fig = px.scatter(df,               
                 x="petal_width",  
                 y="petal_length",
                 # color=
                ) 

# 2. Show the figure in the notebook
fig.show()

Try some of the other plotly.express plots! View some of the basic options:
- https://www.geeksforgeeks.org/python-plotly-tutorial/
- https://plotly.com/python/plotly-express/

# Part 1b: Advanced Plotly features with Plotly Graph Objects

Plotly express is great for quickly making a plot to explore your data. Sometimes you need a cleaner figure, or you want to plot something more complicated. Plotly Graph Objects lets you work directly with the components of a plotly figure, so you can customize each of them.

Let's make a scatter plot again, but this time with plotly graph objects. We will be loosely following this tutorial: https://towardsdatascience.com/tutorial-on-building-professional-scatter-graphs-in-plotly-python-abe33923f557

In [11]:
# import the plotly graph objects package
import plotly.graph_objects as go

Step 1: Initialize your figure and add data to your plot. To do this, we use `add_trace(...)`. A `trace` is like a layer of data (or a graph object) to add to the figure. You can call `fig.add_trace(...)` multiple times to add multiple `traces` (say a scatter plot overlaying a bar plot).

For now, we will only add one trace, which will be a scatter plot. To make this, we will call `go.Scatter(...)` to make a scatter plot graph object. Then we will add this scatter graph object to our figure `my_fig` using `my_fig.add_trace(...)`.

In [25]:
# Step 1: make the figure
my_fig = go.Figure()

# Step 2: Add data.
scatter_graph_object = go.Scatter(x = df["petal_width"], 
                                  y = df["petal_length"],
                                  mode = "markers",
                                 )
print(f'scatter_graph_object is a {type(scatter_graph_object)}')

my_fig.add_trace(scatter_graph_object)

scatter_graph_object is a <class 'plotly.graph_objs._scatter.Scatter'>


Note that unlike plotly express, plotly graph objects do not generate axis titles automatically, and neither interface adds a graph title by default. These need to be explicitly defined.

To do this, call `update_layout(...)`. Check out the documentation at https://plotly.com/python/figure-labels/, and make the following updates:

1. Add in graph title and axis titles
1. Put the y-axis ticks in a better format
1. Ensure the measurements of each axis are clear
1. Change the line color of the axes from white to light grey
1. Change the background color to white
1. Change the color of all text on the graph to light grey
1. Change the color of the data points to a darker shade of blue, so they stand out more



In [26]:
my_fig.update_layout(
    autosize=False,
    width=1200,
    height=500,
    title={'text':'Iris Petal Sizes',
           'font_size':30,
           'xanchor':'center',
          'x':0.35}
)
my_fig.show()

To give different categories different colors, we have to add them one at a time as separate traces. Plotly express does this automatically if you pass a column name to the `color` or `symbol` parameters. However, sometimes you want to do something specific to each category other than just changing the color or symbol of the marker.

The approach here is to make a `for` loop. For each unique category, make a new trace (scatter plot graph object) with the corresponding data from your table and add it to your figure.

In [29]:
unique_species = pd.unique(df['species']).tolist()  # Get the unique values in the species column of your data
print(unique_species)

['setosa', 'versicolor', 'virginica']


You will need to specify the colors to use to plot each category. For now, we will make a dictionary with the species as the `key` and the color (in hexcode) as the `value`. The hexcodes correspond to different colors. View more colors at https://htmlcolorcodes.com/.

Then, make the figure!
1. Initialize figure
2. Add traces
3. Update layout

In [38]:
species_colors_dict = {'setosa': "#2471A3", 
                       'versicolor': "#BA4A00", 
                       'virginica': "#884EA0"
                      }

# 1. initialize figure
fig_species_colored = go.Figure()

# 2. add traces
for species in unique_species:
    # Get only the rows in df where the species value matches the current species
    species_df = df[df['species'] == species]  
    
    # Add a scatter graph object
    fig_species_colored.add_trace(go.Scatter(x = species_df["petal_width"], # Using only the data corresponding to the current species
                                             y = species_df["petal_length"],
                                             mode = "markers",
                                             name = species, # label the points with the species name. Default will be "trace 0", "trace 1", etc.
                                             marker = dict(color = species_colors_dict[species]) # Using the species_colors_dict to get the corresponding color for this species
                                            )
                                 )
#####

# 3. update figure layout
fig_species_colored.update_layout(plot_bgcolor = "white",  # background color
                                  font = dict(color = "#909497"),
                                  title = dict(text = "Iris Petal Sizes"),
                                  xaxis = dict(title = "Petal Width", linecolor = "#909497"),
                                  yaxis = dict(title = "Petal Length", tickformat = ",", linecolor = "#909497"))
    

Maybe it would be easier to view the graph if each species had its own plot. However, we still want to be able to compare them. A subplot will allow us to arrange multiple plots on the same figure.

In [40]:
from plotly.subplots import make_subplots

In [47]:
#create the blank graph object with make_subplots(...) instead of go.Figure()
fig_subplots = make_subplots(rows = 1, 
                             cols = 3, #provide the dimensions of the subplot
                             subplot_titles= unique_species) #give each subplot a title


# like before, iterate through each category (species). Looping over the index i makes it easy to choose the subplot column
for i in range(len(unique_species)): # i = 0, 1, 2
    species = unique_species[i]
    # Get only the rows in df where the species value matches the current species
    species_df = df[df['species'] == species]  
    
    # Add a scatter graph object
    fig_subplots.add_trace(go.Scatter(x = species_df["petal_width"],
                                      y = species_df["petal_length"],
                                      mode = "markers",
                                      name = species, 
                                      marker = dict(color = species_colors_dict[species])
                                     ),
                           row = 1,     # subplots use 1-indexing convention
                           col = i + 1  # which column in subplot to add this trace
                          )
#####

# Update layout
fig_subplots.update_layout(plot_bgcolor = "white",  # background color
                           font = dict(color = "#909497"),
                           title = dict(text = "Iris Petal Sizes"),
                           )

# Update all the subplots' axes at the same time
fig_subplots.update_xaxes(title = "Petal Width", linecolor = "#909497")
fig_subplots.update_yaxes(title = "Petal Length", tickformat = ",", linecolor = "#909497")



The plot still looks a little busy with all the redundant axis labels. The legend is also redundant since we already have the data separated out. To make it look nicer, we will:

1. Remove the legend
1. Remove the duplicated axis titles
1. Remove axes lines
1. Ensure all subplots are displaying a consistent range for both axes

In [61]:

fig_subplots_clean = make_subplots(rows = 1, 
                                   cols = 3, 
                                   subplot_titles= unique_species,
                                   shared_yaxes = True             # Have all plots in the row share the same y-axis
                                  ) 


# like before, iterate through each category (species). Looping over the index i makes it easy to choose the subplot column
for i in range(len(unique_species)): # i = 0, 1, 2
    species = unique_species[i]
    # Get only the rows in df where the species value matches the current species
    species_df = df[df['species'] == species]  
    
    # Add a scatter graph object
    fig_subplots_clean.add_trace(go.Scatter(x = species_df["petal_width"],
                                            y = species_df["petal_length"],
                                            mode = "markers",
                                            name = species, 
                                            marker = dict(color = species_colors_dict[species])
                                           ),
                                 row = 1,    
                                 col = i + 1 
                                )
#####

# Update layout
fig_subplots_clean.update_layout(plot_bgcolor = "white", 
                                 font = dict(color = "#909497"),
                                 title = dict(text = "Iris Petal Sizes"),
                                 showlegend = False     # Remove legend 
                                )

# fix the x-axis range manually, because shared_xaxes = True only links subplots in the same column
fig_subplots_clean.update_xaxes(linecolor = "white", 
                                range = [0, 3],          # [xaxis minimum, xaxis maximum]
                                tickvals = [0, 1, 2, 3]) # list of values to add tick marks
fig_subplots_clean.update_yaxes(tickformat = ",", linecolor = "white") 

# use add_annotation() to create a single x-axis title and a single y-axis title, instead of update_xaxes(title = ...) and update_yaxes(title = ...)

#x axis title
fig_subplots_clean.add_annotation(text = "Petal Width",
                                  xref = "paper",
                                  yref = "paper",
                                  x = 0.5,
                                  y = -0.1,
                                  showarrow = False)

#y axis title
fig_subplots_clean.add_annotation(text = "Petal Length",
                                  xref = "paper",
                                  yref = "paper",
                                  x = -0.08,
                                  y = 0.5,
                                  showarrow = False,
                                  textangle = -90)

fig_subplots_clean.show()


Lastly, let's annotate these plots.

1. Add all data points to each subplot
1. Add commentary to each plot to give more insight into the data
1. Fix the hover info label (for the interactive version)
1. Make the titles look better
1. Reduce the font size of axes ticks
1. Add a line of best fit

In [59]:

fig_subplots_fancy = make_subplots(rows = 1, 
                                   cols = 3, 
                                   subplot_titles= unique_species,
                                   shared_yaxes = True             # Have all plots in the row share the same y-axis
                                  ) 


# like before, iterate through each category (species). Looping over the index i makes it easy to choose the subplot column
for i in range(len(unique_species)): # i = 0, 1, 2
    species = unique_species[i]
    # Get only the rows in df where the species value matches the current species
    species_df = df[df['species'] == species]  

    
    # Plot only the points for current species with corresponding color
    fig_subplots_fancy.add_trace(go.Scatter(x = species_df["petal_width"], # just species_df
                                            y = species_df["petal_length"],
                                            mode = "markers",
                                            name = species, 
                                            marker = dict(color = species_colors_dict[species])  # colored for current category
                                           ),
                                 row = 1,    
                                 col = i + 1 
                                )
    
    # Plot all the points in dataframe in gray
    fig_subplots_fancy.add_trace(go.Scatter(x = df["petal_width"],  # full df
                                            y = df["petal_length"],
                                            mode = "markers",
                                            name = species, 
                                            marker = dict(color = "#909497"),  # gray color
                                            opacity = 0.3 # Setting this trace to be more transparent
                                           ),
                                 row = 1,    
                                 col = i + 1 
                                )
    
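    # advanced: add a line of best fit for the current species data.
    # This block is left commented out because best_fit_y is not computed in this notebook;
    # define it (the fitted y value for each x) before uncommenting.
#     fig_subplots_fancy.add_trace(go.Scattergl(x = species_df["petal_width"],
#                                               y = best_fit_y,
#                                               mode = "lines",
#                                               line = dict(color = species_colors_dict[species]),
#                                               hoverinfo = "skip"
#                                              ),
#                                  row = 1,    
#                                  col = i + 1 
#                                 )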
#####

# Update layout
fig_subplots_fancy.update_layout(plot_bgcolor = "white", 
                                 font = dict(color = "#909497"),
                                 showlegend = False     # Remove legend 
                                )

# fix the x-axis range manually, because shared_xaxes = True only links subplots in the same column
fig_subplots_fancy.update_xaxes(linecolor = "white", 
                                range = [0, 3],          # [xaxis minimum, xaxis maximum]
                                tickvals = [0, 1, 2, 3]) # list of values to add tick marks
fig_subplots_fancy.update_yaxes(tickformat = ",", linecolor = "white") 

# use add_annotation() to create a single x-axis title and a single y-axis title, instead of update_xaxes(title = ...) and update_yaxes(title = ...)

#x axis title
fig_subplots_fancy.add_annotation(text = "Petal Width",
                                  xref = "paper",
                                  yref = "paper",
                                  x = 0.5,
                                  y = -0.1,
                                  showarrow = False)

#y axis title
fig_subplots_fancy.add_annotation(text = "Petal Length",
                                  xref = "paper",
                                  yref = "paper",
                                  x = -0.08,
                                  y = 0.5,
                                  showarrow = False,
                                  textangle = -90)

# Title annotation
fig_subplots_fancy.add_annotation(text = "Iris Petal Sizes",
                                  xref = "paper",
                                  yref = "paper",
                                  x = -0.08,
                                  y = 1.10,
                                  showarrow = False,
                                  xanchor = "left",
                                  font = dict(color = "#404647", size = 16))

#sub-title annotation
i = 1
for species in unique_species:
    fig_subplots_fancy.add_annotation(text = species,
                                      xref = f'x{i}',
                                      yref = "paper",
                                      x = 0,   # left edge of the fixed x-axis range [0, 3]; x = 20 would fall outside the plot
                                      y = 1.02,
                                      showarrow = False,
                                      xanchor = "left",
                                      font = dict(size = 14, color = "#404647")
                                     )
    i += 1

    
# credit the author of the graph
fig_subplots_fancy.add_annotation(text = "Author: My name", # add your name!
                                  xref = "paper",
                                  yref = "paper",
                                  x = 1.005,
                                  y = -0.145,
                                  showarrow = False,
                                  font = dict(size = 12),
                                  align = "right",
                                  xanchor = "right")

fig_subplots_fancy.show()


# Part 2: BYOC

Now it's your turn! Keep playing around with the iris dataset and try other plots, or load in your own data you want to visualize.

In [60]:
# Step 1: Load your data

# Step 2: Format or annotate data

# Step 3: Initialize your figure

# Step 4: Update layout

# Step 5: Annotate