# DSFP Session 22: Introduction to Plotly

### Based on a Tutorial by Aaron Geller (CIERA/Northwestern). 

Some wisdom from Aaron:

If you already know Python and you don't really want to learn another coding language, but you do want to create interactive figures (e.g., within a Jupyter notebook and/or for use on a website), you should look into Plotly.  

In particular, [Plotly express](https://plotly.com/python/plotly-express/) is a fantastic tool for generating quick interactive figures without much code.  Plotly express covers a good amount of ground, and you may be able to do all or most of your work within it, depending on your specific needs.  In this workshop, I'll show you Plotly express, but then move beyond it for the majority of the content.  

Though you can do a lot with Plotly, it definitely has limitations (some of which we'll see in this workshop). Also, as with all of the ready-made interactive plot solutions (e.g., [Bokeh](https://docs.bokeh.org/en/latest/), [Altair](https://altair-viz.github.io/), [Glue](https://glueviz.org/), etc.), Plotly has a specific look, which can only be tweaked to a certain extent.  If you like the look well enough and you don't mind the limitations, then it's a good choice. 

In [1]:
## Preliminaries

import pandas as pd
import numpy as np
import scipy.stats

import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px



In the Session 22 repository you will find two data files. They are:

1. plasticc_train_metadata.csv
2. climate_data.csv

# Part 1: Simulated Rubin Dataset

We will start by using Plotly express to generate plots of simulated Rubin data. 

Create a couple of simple Plotly figures using [Plotly express](https://plotly.com/python/plotly-express/).

Plotly express is a simplified version of the Plotly interface for Python that allows users to create many types of Plotly figures with single lines of code.  This greatly simplifies the workflow for some kinds of Plotly figures.  We will start with Plotly express (and for some of your use cases, that may be enough), but we will move on to full-blown Plotly for the rest of this tutorial.

## 0) read in the data and examine it as a table

In [5]:
df = pd.read_csv('plasticc_train_metadata.csv')
df.columns

Index(['object_id', 'ra', 'decl', 'ddf_bool', 'hostgal_specz',
       'hostgal_photoz', 'hostgal_photoz_err', 'distmod', 'mwebv', 'target',
       'true_target', 'true_submodel', 'true_z', 'true_distmod',
       'true_lensdmu', 'true_vpec', 'true_rv', 'true_av', 'true_peakmjd',
       'libid_cadence', 'tflux_u', 'tflux_g', 'tflux_r', 'tflux_i', 'tflux_z',
       'tflux_y'],
      dtype='object')

## a) Create a photo-z spec-z scatter plot with px.scatter(data, x = "", y = "")

In [7]:
fig = px.scatter(df, x = df['hostgal_photoz'], y = df['hostgal_specz'])
fig.show()

## b) Now pick a graphical encoding such that your data points express some feature of the dataset. Plotly express supports size and color (or both).

In [14]:
#encoding = ## encoding goes here 
fig = px.scatter(df, x = df['hostgal_photoz'], y = df['hostgal_specz'], size=df['true_z'], color=df['true_vpec'])
fig.show()

## c) Now add a hover menu that will display some properties of the data as you move over datapoints. You can do this with "hover_name = " and "hover_data = "

In [17]:
fig = px.scatter(df, x = df['hostgal_photoz'], y = df['hostgal_specz'], size=df['true_z'], color=df['true_vpec'],
                hover_name = df['object_id'],
                hover_data = [df['ra'], df['decl']])

fig.show()

## d) Change the data range with range = [min, max] in fig.update_xaxes/fig.update_yaxes

In [30]:
xmin, xmax = 0,3.2
ymin, ymax = 0,4.0

fig.update_yaxes(title = r'$z_{spec}$', range = [ymin, ymax])
fig.update_xaxes(title = r'$z_{photo}$', range = [xmin, xmax])

## e) Now create a bar chart/histogram of some of the meta variables with px.histogram(data, x="",...)
Pick a color encoding such that your histogram conveys a meaningful message about the types of sources in the PLAsTiCC dataset.

In [35]:
fig = px.histogram(df, x=df['true_vpec'], color=df['target'])
fig.show()

## f) Create the plot using the standard Plotly [Graph Object](https://plotly.com/python/graph-objects/).

For the remainder of the tutorial we will use Graph Objects for our Plotly figures.  One motivation here is so that I can create multiple panels in one figure, which can be downloaded to an html file.  (Plotly express will only make an individual figure, and does not support arbitrary subplots.) 

First you create a <b>"trace"</b>, which holds the data.  There are many kinds of traces available in Plotly. (e.g., bar, scatter, etc.).  For this example, we will use a scatter trace.  (Interestingly, the scatter trace object also includes line traces, accessed by changing the "mode" key.  I will show the line version later on.)

Then you create a figure and add the trace to that figure.  A single figure can have multiple traces.

In [41]:
# Create a plot using Plotly Graph Objects

# Note: We imported the plotly.graph_objects as go.
# create the trace (scatter plot)
trace1 = go.Scatter(
    x=df['hostgal_photoz'], 
    y=df['hostgal_specz'], 
    mode='markers',
    marker=dict(
        size=df['true_z'],    # size of the markers based on true_z
        color=df['true_vpec'], # color of the markers based on true_vpec
        colorscale='Viridis',  # choose a colorscale, e.g., 'Viridis'
        showscale=True         # display the color scale
    ),
    text=df['object_id'],  # object_id for hover information
    hovertemplate='<b>Object ID:</b> %{text}<br><b>RA:</b> %{customdata[0]}<br><b>DEC:</b> %{customdata[1]}',  # customize hover info
    customdata=np.stack([df['ra'], df['decl']], axis=-1)  # pass ra and decl as additional hover data
)

# create the figure 
fig = go.Figure()

# add the trace to the figure
fig.add_trace(trace1)

# update axes properties (e.g., setting axis titles)
fig.update_xaxes(title_text='Photometric Redshift (hostgal_photoz)')
fig.update_yaxes(title_text='Spectroscopic Redshift (hostgal_specz)')

fig.show()

# Part 2: Chicago Transit Authority Dataset

In this part, we will explore a time-series dataset over multiple populations or clusters.

## 0) read in the data and visualize as a table
Hint: for the next part to work, you will need to convert the dates in the table using the pd.to_datetime() function.
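
A minimal sketch of the date conversion, followed by a selection of one station with `df.loc`. The column names (`station_id`, `date`, `daytype`, `rides`), the date format, and all values are assumptions made for this illustration; check them against the actual table.

```python
import pandas as pd

# Made-up rows with assumed column names; the real table may differ.
cta_demo = pd.DataFrame({
    "station_id": [40900, 42180, 40900, 42180],
    "date": ["01/02/2001", "01/02/2001", "01/01/2001", "01/01/2001"],
    "daytype": ["W", "W", "U", "U"],
    "rides": [6000, 5500, 2500, 2100],
})

# Convert the date strings to datetimes so that Plotly draws a proper time axis.
cta_demo["date"] = pd.to_datetime(cta_demo["date"], format="%m/%d/%Y")

# Select a single station and order its rows in time.
one_station = cta_demo.loc[cta_demo["station_id"] == 40900].sort_values("date")
```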

## a) Create time series traces (entries over time) for entries at stations 42180 (blue) and 40900 (red) organized by daytype.

Pick a reasonable color mapping for your data markers, and define a hovertemplate that provides the date, station name, and daytype when you hover over each point. Then create two plots side by side that enable comparison of the ridership data at both stations.

In [None]:
# do this for the first station 
# you may need pandas df.loc, something like data.loc[data['feature'] == #####]

trace_CTA = go.Scatter(x = , 
                y = , 
                
                hovertemplate = ,
                text = ,
                hoverlabel=dict(
                    bgcolor = 'white',
                )
)



In [None]:
# Create the trace(s) for the 2nd station

## b) Now create a side-by-side figure showing the data with a shared y-axis

In [None]:
# Create the figure and add the traces
# I will use Plotly's "make_subplots" method (imported above).
# Define the number of rows and columns, the column_widths, spacing, and here I will share the y axis.

# Sharing the y axis means that if you zoom/pan on one plot, the other will also zoom/pan.
fig = make_subplots(rows = , cols = , column_widths = , horizontal_spacing = , shared_yaxes = True)

# Add the first trace and update the axes.
# Note that I specify which row and column within each of these commands.
fig.add_trace(, row = 1, col = 1)

fig.update_xaxes(title = 'Date', row = 1, col = 1)
fig.update_yaxes(title = 'Entries', row = 1, col = 1)

# Add the second trace and update the axes.


# Provide an overall title to the figure.
fig.update_layout(title_text = 'Blue Line (Jefferson Park Branch) Terminus vs Red (North Side) Terminus')

# Show the final result
fig.show()

## c) What trends do you notice over time and by daytype?

# Part 3: Climate Data

## 0) Read in the data and visualize it as a table

In [42]:
data = pd.read_csv('climate_data.csv')
data.columns

Index(['Date', 'Temp', 'Temp Anomaly', 'Cooling days', 'Cooling Days Anomaly',
       'Drought Index', 'Drought Anomaly', 'Precipitation',
       'Precipitation Anomaly', 'Region'],
      dtype='object')

## b) create two functions, createTraces and createButtons that will loop through the data in our dataset and allow you to select which is plotted.

In [58]:
columns = ['Temp', 'Temp Anomaly', 'Cooling days', 'Cooling Days Anomaly',
       'Drought Index', 'Drought Anomaly', 'Precipitation',
       'Precipitation Anomaly']
region = 'Southwest'
rollingAve = 3

data['Date'] = pd.to_datetime(data['Date'], format='%Y%m')
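
The plots below smooth each column with a pandas rolling mean. As a quick reminder of what that does, here is a sketch on a made-up series: the first `window - 1` entries have no complete window, so they come back as NaN.

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Average over a sliding window of 3 consecutive entries.
smoothed = values.rolling(3).mean()

n_missing = int(smoothed.isna().sum())   # entries without a full window
valid = smoothed.dropna().tolist()       # means of (1,2,3), (2,3,4), (3,4,5)
```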

In [63]:
regional_data = data[data['Region'] == region]  # subselect the data to one region

# I'm going to write this as a function so that I can reuse it below
def createTraces(columns):
    # For this scenario, I am going to add one trace per column to the plot but only show one at a time
    # Add traces for each column
    
    traces = [
        go.Scatter(x = regional_data['Date'], y = regional_data[c].rolling(rollingAve).mean(), # rolling mean here improves the visualization
            mode = 'lines', # Set the mode to lines (rather than markers) to show a line.
            opacity = 1, 
            marker_color = 'black',
            fill = 'tozeroy',  # This will fill between the line and y=0.
            showlegend = False,
            name = 'Climate Metric',
            hovertemplate = 'Date: %{x}<br>Value: %{y}<extra></extra>', # Note: the <extra></extra> removes the trace label.
            visible = (i == 0)  # only the first trace is visible on load
        ) for i, c in enumerate(columns)
    ]

    
    return traces


# I'm going to write this as a function so that I can reuse it below
# x,y args to position the buttons
def createButtons(columns, x = 0.0, y = 1.13):
    # create an "updatemenu" with buttons for choosing the data to plot that I will add to the figure later

    updatemenu = dict(
            type = 'buttons',
            direction = 'left', # This defines the orientation of the buttons.  'left' shows them in one row.
            buttons = list([
                dict(
                    # 'args' tells the button what to do when clicked.  
                    #     In this case it will change the visibility of the traces
                    # 'label' is the text that will be displayed on the button
                    # 'method' is the type of action the button will take.
                    #    method = 'restyle' allows you to redefine certain preset plot styles (including the visible key).  
                    #    See  https://plotly.com/python/custom-buttons/ for different methods and their uses
                    args = [{'visible': [i == j for j in range(len(columns))]}], 
                    label = label.replace('_', ' '),
                    method = 'restyle' 
                ) for i, label in enumerate(columns)]),
        
            showactive = True, # Highlight the active button
            # Below is for positioning
            x = x, 
            xanchor = 'left',
            y = y,
            yanchor = 'top'
        )
    
    return updatemenu
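
The `args` entry in `createButtons` is a list of booleans with exactly one `True`, which turns on the trace matching the clicked button and hides the rest. The sketch below builds the same masks for a shortened, illustrative column list (`columns_demo`) so that you can inspect them.

```python
# Shortened column list, for illustration only.
columns_demo = ['Temp', 'Drought Index', 'Precipitation']

# Button i makes trace i visible and hides every other trace.
masks = [[i == j for j in range(len(columns_demo))] for i in range(len(columns_demo))]

for label, mask in zip(columns_demo, masks):
    print(label, mask)
```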

## c) Now create the figure

In [64]:
# Create the figure.
fig = go.Figure()

# create the traces
traces = createTraces(columns)

# add the traces to the figure
for t in traces:
    fig.add_trace(t)
    
# create the buttons and add them to the figure below
buttons = createButtons(columns)

# Update a few parameters for the axes and add the buttons
#   Note: I added a margin to the top ('t') of the plot within fig.update_layout to make room for the buttons.
fig.update_xaxes(title = 'Date')#, range = [np.datetime64('2020-03-01'), np.datetime64('2022-01-12')])
fig.update_yaxes(title = 'Value/Temp/Index')
fig.update_layout(
    title_text = 'Climate Data Explorer : '+ region + '<br>(' + 'July-to-July ' + str(rollingAve) +'-year rolling average)',
    margin = dict(t = 150),
    updatemenus = [buttons]
)

fig.show()

## d) Now add a drop down that allows you to change between the regions. Call this createDropdown

In [65]:
# I am going to create the dropdown list here and then add it to the figure below
# I will need to update the x and y data for the time series plot 

# Identify the regions to use 
# I will put the region selected above first so that it can be the default region on load (the first dropdown entry)
availableRegions = list(data['Region'].unique()) # get a list of the unique regions 
availableRegions.insert(0, availableRegions.pop(availableRegions.index(region)))

# I will write this as a function as well and then create a new figure in the next cell that uses this function
# x,y args to position the dropdown
def createDropdown(availableRegions, columns, x = 0.0, y = 1.1):
    # create an "updatemenu" with a dropdown for choosing the data to plot that I will add to the figure later

    dropdown = []
    for c in availableRegions:
        subset = data.loc[data['Region'] == c] # rows for this region only
        dropdown.append(dict(
            args = [{'x': [subset['Date']]*len(columns), # the same x values for each trace
                     'y': [subset[col].rolling(rollingAve).mean() for col in columns],
            }],
            label = c,
            method = 'update'
        ))

    updatemenu = dict(
        buttons = dropdown,
        direction = 'down',
        showactive = True,
        x = x,
        xanchor = 'left',
        y = y,
        yanchor = 'top'
    )

    return updatemenu

In [None]:
# Create the figure.
fig = go.Figure()

# create the traces
traces = createTraces(columns)

# add the traces to the figure
for t in traces:
    fig.add_trace(t)

# generate the menus to be added to the figure below
updatemenus = [createButtons(columns, 0, 1.3), createDropdown(availableRegions, columns, 0, 1.15)]

# Update a few parameters for the axes and add the buttons and dropdown
fig.update_xaxes(title = 'Date')#, range = [np.datetime64('2020-03-01'), np.datetime64('2022-01-12')])
fig.update_yaxes(title = 'Value/Temp/Index')
fig.update_layout(
    title_text = 'Climate Explorer : '+ region + '<br>(' + str(rollingAve) +'-year rolling average)',
    title_y = 0.97,
    margin = dict(t = 140),
    updatemenus = updatemenus
)

fig.show()

## Optional Problem

### Create Derived Features from the information in the metadata. Use these derived features to visualize class boundaries.
