# Plotly Workshop Tutorial
 
Welcome to CodeRATS's Plotly workshop! This consists of two parts:
- Part 1: Code-together
    - Part 1a: Basics of Python and plotly.express
    - Part 1b: Advanced Plotly features with Plotly Graph Objects
- Part 2: Bring your own code 


# Part 1a: Basics of Python and plotly.express

Plotly makes figures in 4 steps:
1. Make a "canvas" or "figure" to draw on
2. Add your data points
3. Customize your visual (colors, point symbols, axis names and titles, etc.)
4. Annotate the visual

There are two versions of Plotly:
- Plotly Express: super easy to use
- Plotly Graph Objects (GO): more customizable

We will start by using Plotly Express to make a scatter plot.

In [None]:
# import packages
import pandas as pd
import plotly.express as px

In [None]:
# Step 1: load your data 
df = px.data.iris()  # reads in data as pandas dataframe (like a table)
display(df)

Let's compare petal width (`petal_width`) to petal length (`petal_length`) in a scatter plot. Run the following code block, and notice that you made a scatter plot in two lines of code!

Try out different viewing methods:
- Hover your mouse over the points to see the exact data used to plot them
- Select the 'Zoom' option in the top right menu (magnifying glass) and zoom in on a region you want to view by clicking and dragging your mouse
- Select the 'Pan' option in the top right menu (coordinate axes), and move around the plot by clicking and dragging
- Reset the view by clicking 'Autoscale' in the top right menu (next to the home icon)

In [None]:
# 1. Make the figure
fig = px.scatter(df,               # df: a pandas dataframe with the data you want to plot. Rows are "samples" and columns are "features" you want to plot, categorize, etc.
                 x="petal_width",  # x: name of column in df you want to use in the x-axis of the scatter plot
                 y="petal_length") # y: name of column in df you want to use in the y-axis of the scatter plot

# 2. Show the figure in the notebook
fig.show()

Say you want to color these points by `species`. It is as simple as adding the `color=` parameter to `px.scatter(...)`. plotly.express will automatically color the points according to the data under the `species` column in your dataframe, and include a legend.

**Try:**
- Clicking on one of the entries in the legend to remove that group of points from the plot. Click again to make it reappear
- Instead of `color`, use `symbol` to distinguish the points coming from different `species`.
    - BONUS: specify which symbols to use with `symbol_sequence` or `symbol_map` - can you figure out the difference between the two arguments?
        - Symbol types described at https://plotly.com/python/marker-style/#custom-marker-symbols
- Add `hover_data=["sepal_length", "sepal_width"]` into `px.scatter(...)` so when you hover over the points, you can also see the `sepal_length` and `sepal_width` values for that point.
- Add marginal plots to supplement your scatter plot by adding `marginal_y="violin"` and `marginal_x="box"` into `px.scatter(...)`. More documentation and options described at https://plotly.com/python/marginal-plots/
- Add a title and change the default axis names (find how to do it at https://plotly.com/python/figure-labels/)


In [None]:
# 1. Make the figure
fig = px.scatter(df,               
                 x="petal_width",  
                 y="petal_length",
                 # color=
                ) 

# 2. Show the figure in the notebook
fig.show()

Try some of the other plotly.express plots! View some of the basic options:
- https://www.geeksforgeeks.org/python-plotly-tutorial/
- https://plotly.com/python/plotly-express/

# Part 1b: Advanced Plotly features with Plotly Graph Objects

Plotly Express is great for quickly making a plot to explore your data. Sometimes, though, you need a cleaner figure, or you want to plot something more complicated. Plotly Graph Objects lets you work directly with the components of a Plotly figure, so you can customize each of them.

Let's make a scatter plot again, but this time with plotly graph objects. We will be loosely following this tutorial: https://towardsdatascience.com/tutorial-on-building-professional-scatter-graphs-in-plotly-python-abe33923f557

In [None]:
# import the plotly graph objects package
import plotly.graph_objects as go

Step 1: Initialize your figure and add data to your plot. To do this, we use `add_trace(...)`. A `trace` is like a layer of data (or a graph object) to add to the figure. You can call `fig.add_trace(...)` multiple times to add multiple `traces` (say a scatter plot overlaying a bar plot). Other helper methods exist such as `add_shape(...)` and `add_hline(...)`; we will get to those later.

For now, we will only add one trace, which will be a scatter plot. To make this, we will call `go.Scatter(...)` to make a scatter plot graph object. Then we will add this scatter graph object to our figure `my_fig` using `my_fig.add_trace(...)`.

In [None]:
# Step 1: make the figure
my_fig = go.Figure()

# Step 2: Add data.
scatter_graph_object = go.Scatter(x = df["petal_width"], 
                                  y = df["petal_length"],
                                  
                                  # mode can be one of "markers", "lines", "lines+markers", "lines+markers+text"
                                  # what do the others look like?
                                  mode = "markers",  
                                 )
print(f'scatter_graph_object is a {type(scatter_graph_object)}')

In [None]:
my_fig.add_trace(scatter_graph_object)
my_fig.show()

Note that the graph object does not take in a pandas dataframe (like in plotly express); instead the data is defined directly as an array (or a pandas Series). This is slightly inconvenient but also much more flexible. Also note that unlike plotly express, plotly graph objects do not generate a graph title or axis title. These need to be explicitly defined.

To do this, call `update_layout(...)`. Check out the documentation at https://plotly.com/python/figure-labels/ and https://plotly.com/python/axes, and make the following updates:

1. Add in graph title and axis titles with specified font sizes
1. Adjust position of graph title
1. Change the background color to white
1. Change the line color of the axes from white to gray
1. Change the color of all text on the graph to gray
1. Change the color of the data points to a darker shade of blue, so they stand out more

All the CSS named colors can be found at https://developer.mozilla.org/en-US/docs/Web/CSS/color_value



In [None]:
my_fig.update_layout(
    # title={ ... }
    # plot_bgcolor = 
    # xaxis = { ... }  # The x-axis can be updated in the update_layout() method or in its own update_xaxes() method
    # yaxis = { ... }  # Similar for the y-axis
)

# The color of the data points is tied to the figure trace; so the update_traces() method must be used to change the marker_color
my_fig.update_traces()

my_fig.show()

Let's look at the underlying data structure of a graph_object Figure. Try examining `my_fig.data` and `my_fig.layout`

In [None]:
my_fig.data

In [None]:
my_fig.layout

Was this what you expected? Plotly figures are built upon nested dictionaries (and lists to store multiple traces) and can be inspected and modified just like normal dicts and lists. I don't recommend trying to create/modify a Figure from scratch; methods exist for a reason. But viewing the underlying data can be helpful to remember an attribute name or understand the current data state.

Next we want to distinguish the points by category. Plotly express does this automatically if you pass a column name to the `color` or `symbol` parameters. Using graph_objects, similarly to how you updated the `marker_color` above with a single value, you could pass a list instead, giving the desired color of each point (based on its category). Similar techniques could be used to change the marker size or symbol as well. However, adding in a _separate trace_ for each category is usually easier to deal with when making other updates later on.

The approach here is to use a `for` loop. For each unique category, make a new trace (scatter plot graph object) with the corresponding data from your table and add it to your figure.

In [None]:
unique_species = pd.unique(df['species']).tolist()  # Get the unique values in the species column of your data
print(unique_species)

You will need to specify the colors to use to plot each category. For now, we will make a dictionary with the species as the `key` and the color (this time as a hex code) as the `value`. Replace the `#color_1`, `#color_2`, and `#color_3` placeholders in the next cell with real hex codes; view more colors at https://htmlcolorcodes.com/.

In [None]:
# specify colors
species_colors_dict = {'setosa': "#color_1", 
                       'versicolor': "#color_2", 
                       'virginica': "#color_3"
                      }

# initialize figure
fig_species_colored = go.Figure()

# add traces
for species in unique_species:
    # Get only the rows in df where the species value matches the current species
    species_df = df  #.loc[...]  
    
    # Add a scatter graph object
    fig_species_colored.add_trace(go.Scatter(x = species_df["petal_width"], # Using only the data corresponding to the current species
                                             y = species_df["petal_length"],
                                             mode = "markers",
                                             name = species, # label the points with the species name. Default will be "trace 0", "trace 1", etc.
                                             marker = dict(color = species_colors_dict[species]) # Using the species_colors_dict to get the corresponding color for this species
                                            )
                                 )
#####

# update figure layout
fig_species_colored.update_layout(plot_bgcolor = "white",  # background color
                                  font = dict(color = "#909497"),
                                  title = dict(text = "Iris Petal Sizes", font_size=30),
                                  xaxis = dict(title = "Petal Width", linecolor = "#909497"),
                                  yaxis = dict(title = "Petal Length", linecolor = "#909497"))
    

Maybe it would be easier to view the graph if each species had its own plot. However, we still want to be able to compare them. Subplots allow us to arrange multiple plots on the same figure. Check out this link before continuing: https://plotly.com/python/subplots/#subplots-with-shared-yaxes

In [None]:
from plotly.subplots import make_subplots

In [None]:
#create the blank graph object with make_subplots(...) instead of go.Figure()
fig_subplots = make_subplots(rows = 1, 
                             # cols = ,           # provide the dimensions of the subplot
                             shared_yaxes=True,
                             # subplot_titles= ,  # give each subplot a title
                             horizontal_spacing=0.07)  


# like before, iterate through each category (species)
for i, species in enumerate(unique_species): # i = 0, 1, 2
    # Get only the rows in df where the species value matches the current species
    species_df = df.loc[df['species'] == species]  
    
    # Add a scatter graph object
    fig_subplots.add_trace(  # add the same go.Scatter() object as before
                             # this time, specify which row and col to assign the trace to
                             # notice that subplots use 1-indexing convention
                          )
#####

# Update layout
fig_subplots.update_layout(plot_bgcolor = "white",  # background color
                           font = dict(color = "#909497"),
                           title = dict(text = "Iris Petal Sizes", font_size=30),
                           )

# Update all the subplots' axes at the same time
fig_subplots.update_xaxes(title = "Petal Width", linecolor = "#909497")
fig_subplots.update_yaxes(title = "Petal Length", linecolor = "#909497")



The plot still looks a little busy with all the redundant axis labels. The legend is also redundant since we already have the data separated out. To make it look nicer, we will:

1. Remove the legend
1. Remove the duplicated axis titles
1. Remove axes lines
1. Ensure all subplots are displaying a consistent range for both axes

In [None]:
fig_subplots_clean = go.Figure(fig_subplots)

# Remove legend (set showlegend to False)
fig_subplots_clean.update_layout()     

# manually set the x-axis range and remove axis title and line
fig_subplots_clean.update_xaxes(title_text= '',
                                # showline = , 
                                # range = ,          # [xaxis minimum, xaxis maximum]
                                # tickvals =         # list of values to add tick marks
                               )
# remove y-axis title and line
fig_subplots_clean.update_yaxes(title_text= '',
                                #showline =  
                                )

# use the add_annotation() method to generate both the x-axis and y-axis titles instead of update_xaxes(title = ...) and update_yaxes(title = ...)
# this allows for more precise control of placement

#x axis title
fig_subplots_clean.add_annotation(text = "Petal Width",
                                  xref = "paper",
                                  yref = "paper",
                                  x = 0.8,    # modify values to place correctly
                                  y = -0.03,
                                  showarrow = False)

#y axis title
fig_subplots_clean.add_annotation(text = "Petal Length",
                                  xref = "paper",
                                  yref = "paper",
                                  x = -0.03,   # modify values to place correctly
                                  y = 0.2,
                                  showarrow = False,
                                  textangle = -90)

fig_subplots_clean.show()

Lastly, let's add a few finishing touches:

1. Add all data points to each subplot
1. Add additional information to the hover labels and fix the format (read more here: https://plotly.com/python/hover-text-and-formatting/#customizing-hover-text-with-a-hovertemplate)
1. Make the titles look better
1. Add signature

Other than adding your name in the signature, the code in the next notebook cell (the one that builds `fig_subplots_fancy`) is complete and correct. Feel free to change some values to see what happens, or try adding additional layout specifications.

In [None]:
import numpy as np
fig_subplots_fancy = go.Figure(fig_subplots_clean)

# iterate through the columns to add all the points in gray
for i in range(len(unique_species)): # i = 0, 1, 2
    species = unique_species[i]
    
    # Plot all the points in dataframe in gray
    fig_subplots_fancy.add_trace(go.Scatter(x = df["petal_width"],  # full df
                                            y = df["petal_length"],
                                            mode = "markers",
                                            name = "all_points", 
                                            marker = dict(color = "#909497"),  # gray color
                                            opacity = 0.3, # Setting this trace to be more transparent
                                            
                                            # we can provide additional data to reference in the hover labels
                                            customdata = np.stack((df['sepal_width'],
                                                                   df['sepal_length'],
                                                                   df['species']),
                                                                  axis=-1),
                                            
                                            # this template defines the structure of the hover labels
                                            hovertemplate='Petal Width: %{x:.2f} <br>' +
                                                          'Petal Length: %{y:.2f} <br>' +
                                                          'Sepal Width: %{customdata[0]:.2f} <br>' +
                                                          'Sepal Length: %{customdata[1]:.2f} <br>' +
                                                          '<extra>%{customdata[2]}</extra>',
                                            hoverlabel={'bgcolor': 'white'}
                                           ),
                                 row = 1,    
                                 col = i + 1,
                                )
    
#sub-title annotation
for i, species in enumerate(unique_species):
    fig_subplots_fancy.add_annotation(text = species,
                                      xref = f'x{i+1}',
                                      yref = "paper",
                                      x = 0,  # x position in data coordinates (petal widths in this dataset are below 3); adjust to taste
                                      y = 1.02,
                                      showarrow = False,
                                      xanchor = "left",
                                      font = dict(size = 14, color = "#404647")
                                     )

    
#create author of the graph
fig_subplots_fancy.add_annotation(text = "Author: My name", # add your name!
                                  xref = "paper",
                                  yref = "paper",
                                  x = 1.005,
                                  y = -0.145,
                                  showarrow = False,
                                  font = dict(size = 12),
                                  align = "right",
                                  xanchor = "right")

fig_subplots_fancy.show()


Finally, let's save your figure! You can easily download your plot as a PNG by clicking on the camera icon in the top right. If you want your figure saved as an SVG or PDF (or another image format), use the `write_image(...)` method. To use this functionality, you may need to install an additional dependency, `kaleido`. There can be some issues in getting this package to work, however. Let us know if you need help!

In [None]:
fig_subplots_fancy.write_image('final_image.svg', height=500, width=800)

# Part 2: Bring your own code (BYOC)

Now it's your turn! Keep playing around with the iris dataset and try other plots (we would especially recommend boxplots https://plotly.com/python/box-plots/ or heatmaps https://plotly.com/python/heatmaps/). 

**OR** 

Load in your own data you want to visualize.

In [None]:
# Step 1: Load your data

# Step 2: Format or annotate data

# Step 3: Initialize your figure

# Step 4: Update layout

# Step 5: Annotate