### Visualization Lab

Welcome to today's lab! Today we'll explore our dataset visually using the plot.ly visualization library. You can find its documentation here: https://plotly.com/python/

This lab is longer than usual. It walks you through the fundamentals of plot.ly's `plotly.express` and `plotly.graph_objects` modules, which you can use to build and refine your visualizations.

However, it is **not** meant to be an exhaustive introduction to everything plot.ly can offer, since that would take too much time. By the end, though, you should have a solid understanding of how to use it to visually capture information in your data.

### Section 1:  Working With Time

To begin, we'll use line charts to look at how values change over time, along with a few different ways to visualize those dynamics.

The simplest possible way to do this is with `px.line` charts, as we'll see below.

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px

# replace this file path with your own!
df = pd.read_csv('../data/restaurant data/master.csv', parse_dates=['visit_date'])

In [None]:
# the basic syntax for a line chart
# you pass in your dataframe as the first argument
# then name the columns to use for the x- and y-axis
# notice we do the grouping -- this is used to create a clean, one-dimensional line chart
data = df.groupby('visit_date')['visitors'].sum().reset_index()
px.line(data, x='visit_date', y='visitors', title='Daily Visitors Over Time')

**Your Turn**: See if you can create a line chart showing reservations (`reserve_visitors`) over time.

In [None]:
# your answer here

#### Stratified Line Charts

Often we want to see the trend for individual units, such as restaurants.

This is easy enough to do using the `color` argument.

In [None]:
# notice we are not doing a grouping -- this would remove the data at the restaurant level
px.line(df, x='visit_date', y='visitors', color='id', title='Daily Visitors For Individual Restaurants')

This is okay -- but maybe still a bit crowded.  Let's see if we can stratify a bit more using the `facet_row` and `facet_col` arguments, which allow you to further break down a chart into small subcomponents based on categorical values from another column.

In this example, we'll break everything down by `genre`.

In [None]:
# total attendance per date and genre -- one line per genre keeps the faceted chart readable
data = df.groupby(['visit_date', 'genre'])['visitors'].sum().reset_index()
# and then we go ahead and create a line chart out of it
px.line(data, x='visit_date', y='visitors', facet_row='genre',
        title='Restaurant Attendance Broken Down by Genre',
        height=2000)

One problem that frequently comes up with subplots containing many items is that it can be difficult to get the spacing right. The above chart is a little messy to my eye. One way to get more control is the `make_subplots` function, which returns a `plotly.graph_objects.Figure` whose layout you can tune in detail.

Importantly, `plotly.express` functions build their own figures and can't draw directly into a `make_subplots` layout, so we use the more powerful but more verbose `graph_objects` module to configure the plot.

Let's see how it works.

In [None]:
# notice we create a graph_object -- NOT a plotly.express line chart
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# number of rows is the number of unique genres (one subplot per genre)
genres = data.genre.unique().tolist()
fig = make_subplots(cols=1, rows=len(genres),
                    # this controls how much space shows up between each subplot -- useful for controlling layout
                    # the smaller this is, the less space appears between figures -- horizontal_spacing does the same for columns
                    vertical_spacing=.02,
                    # this will cause the title of each subplot to show up on top
                    subplot_titles=genres)

# we'll loop through each unique value of genre and add it as a trace
for idx, genre in enumerate(genres, start=1):
    subset = data[data.genre == genre]
    # go.Scatter is plotly's low-level trace for both line and scatter charts; mode='lines' draws a line
    fig.add_trace(go.Scatter(x=subset['visit_date'], y=subset['visitors'], mode='lines'), col=1, row=idx)

# the update_layout() function is very useful for controlling small visual details of what shows up in your chart
# in this case, we are making the legend disappear and adding the title and height
fig.update_layout(showlegend=False, title='Attendance By Genre', height=2800)

The above chart looks better to my eye: it's laid out cleanly, and each genre gets its own y-axis, which makes the individual trends easier to read. It's also a good example of how you may occasionally need to drop down from `plotly.express` to `graph_objects` to get the result you want.

**Your Turn:** Using what was covered in the previous cells, see if you can create your own line chart, using some combination of `facet_col`, `facet_row`, `color`, and `make_subplots`.  Take about 10-15 minutes.

In [None]:
# your answer here

### Scatter Plots

Because the syntax is nearly identical across `plotly.express` chart types, we won't need to repeat ourselves much to go from one visual to the next.

Instead, it might be easiest to run a few cells and see how everything works.

In [None]:
# create a grouping for the chart
data = df.groupby('visit_date')[['visitors', 'reserve_visitors']].sum().reset_index()
# create a column for day of the week to use for later charts
data['day'] = data.visit_date.dt.day_name()

# simple scatter chart
px.scatter(data, x='visitors', y='reserve_visitors')

In [None]:
# scatter chart with colors
px.scatter(data, x='visitors', y='reserve_visitors', color='day', title='Visits vs. Reservations, by Date')

In [None]:
# one subplot per day, using facet_col
px.scatter(data, x='visitors', y='reserve_visitors', facet_col='day', height=350)

In [None]:
# scatter chart w/ trendline + colors
# there is a trendline for each unique day -- if you removed the color, there would be a single trendline
# note: trendline='ols' requires the statsmodels package to be installed
px.scatter(data, x='visitors', y='reserve_visitors', color='day', trendline='ols', title='Visits vs. Reservations, by Date')

In [None]:
# color by day as before, and also scale each point's size by visitors
px.scatter(data, x='visitors', y='reserve_visitors',
           color='day', 
           size='visitors', 
           title='Visits vs. Reservations, by Date, Scaled By Visits')

**Your Turn:** In a similar vein to what was done above, try re-creating a scatter plot to capture some aspect of the data set that piques your interest.

In [None]:
# your answer here

### Bar Charts

We know them, we love them: they're old-fashioned, but only because they've been proven to work. Bar charts are the quintessential way of visualizing all sorts of data, so let's see how they work in plot.ly.

In [None]:
# simple bar chart -- we're going to use the same grouping from the previous cells -- recreate it if you need to
px.bar(data, x='day', y='visitors', barmode='group')

In [None]:
# a different barmode gives a slightly different appearance
px.bar(data, x='day', y='visitors', barmode='overlay')

In [None]:
# we'll add color -- in a similar way that we did before
data['year'] = data.visit_date.dt.year
# and use this as a color overlay
px.bar(data, x='day', y='visitors', barmode='group', color='year')

Notice this uses a continuous color palette, since the year column is technically an integer. A discrete palette is probably more appropriate here, and we can get one simply by converting the column to a string.

In [None]:
# convert year to a string so plotly treats it as a category
data['year'] = data.visit_date.dt.year.astype(str)
# and use this as a color overlay
px.bar(data, x='day', y='visitors', barmode='group', color='year', title='Visitors by Day, Year Over Year')

For the next chart we'll try something a little fancier: facet columns and facet rows together in the same chart.

In [None]:
# create a new grouping -- this time using holidays as part of our grouping
data = df.groupby(['visit_date', 'holiday'])[['visitors']].mean().reset_index()
# extract the time parts
data['day'] = data.visit_date.dt.day_name()
data['year'] = data.visit_date.dt.year.astype(str)

# and create a new bar chart with some additional options
px.bar(data, x='day', y='visitors', facet_col='holiday', facet_row='year', height=800, color='day')

**Your Turn:** Try your hand at creating one of these, using whichever data categories you like.

In [None]:
# your answer here

### Spatial Maps

A big benefit of Plotly compared to other graphing libraries such as seaborn or matplotlib is its wide out-of-the-box support for modern chart types.

In particular, it provides comprehensive support for spatial data, which neither seaborn nor matplotlib offers natively.

The dataset we have been using has latitude and longitude columns, but so far these have not been used for anything.  Let's see how we can use these to visualize our data spatially.

Plotly offers several kinds of map, but if all you have is latitude and longitude columns, `scatter_mapbox` is the easiest to work with. (Recent Plotly releases deprecate the `*_mapbox` functions in favor of `scatter_map`, which takes `map_style` instead of `mapbox_style`.)

In [None]:
px.scatter_mapbox(df, lat="latitude", lon="longitude", color="reserve_visitors", size="visitors",
                  color_continuous_scale=px.colors.cyclical.IceFire, size_max=15, zoom=10,
                  mapbox_style="carto-positron")

What's useful here is that spatial charts work just like the others: you provide `lat` and `lon` columns instead of `x` and `y`.

**Your Turn:** Create a similar map, except only include restaurants that are located in `'Fukuoka-ken'` by extracting that information from the `area` column.

In [None]:
# your answer here

### Distribution Charts

The final major category of charts we're going to look at is distribution charts: charts that give you some idea of the numeric shape of your data.

These would be histograms, boxplots, and the like.  

The syntax for these works in a manner analogous to others, so let's just make note of how these charts render and some of their more interesting options.

**Boxplots**

In [None]:
# visitors by day, broken down by holiday
px.box(data, x="day", y="visitors", color="holiday")

Here's a violin plot, which shows the same comparison, except that each shape bulges where observations are denser.

In [None]:
# visitors by day, broken down by holiday 
px.violin(data, x="day", y="visitors", color="holiday")

And here's our trusted friend, the histogram.

In [None]:
# standard
px.histogram(data, x='visitors')

In [None]:
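# adding a 'color' column splits the histogram by group; by default the groups are stacked,
# so pass barmode='overlay' (with some opacity) to overlay them instead
px.histogram(data, x='visitors', color='holiday', barmode='overlay', opacity=0.6)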

In [None]:
# or of course, if we wanted to break this down into smaller components we could
# notice facet_col_wrap -- allows you to stack subplots across multiple rows + columns
px.histogram(data, x='visitors', color='holiday', facet_col='day', height=1200, facet_col_wrap=2)

**Your Turn:** Take 10-15 minutes to try and re-create some distribution plots of your own.

In [None]:
# your answer here

### Animation Frames

Another really nice feature of plotly compared to its peers is that it lets you build interactivity into your charts, similar to quick filters in Tableau or other BI tools. The `animation_frame` argument makes it easy to see how a chart changes across the values of another column.

As an example, here's the scatter plot that demonstrates `visitors` vs `reserve_visitors`:

In [None]:
# create a grouping for the chart
data = df.groupby('visit_date')[['visitors', 'reserve_visitors']].sum().reset_index()
# create a column for day of the week to use for later charts
data['day'] = data.visit_date.dt.day_name()

# simple scatter chart
px.scatter(data, x='visitors', y='reserve_visitors')

Earlier, we used the day of the week to color the points. However, that didn't provide much insight into how the relationship between the two actually changed from day to day, even though the chart was prettier. An animation frame makes it much easier to isolate the same chart for each day of the week.

In [None]:
# scatter chart with an animation frame
px.scatter(data, x='visitors', y='reserve_visitors', animation_frame='day',
           title='Visits vs. Reservations, By Day',
           height=500)

As you can see, this is an intuitive way to analyze data that's much more dynamic than a regular chart. Animation frames are also useful for building timelines, and they let you mimic features of BI tools that require costly licenses.

**Your Turn:** Take 10-15 minutes, and try and re-do some of your previous charts using animation frames to make them more interactive.

In [None]:
# your answer here

### Using Plotly as a Backend for Pandas

If you want to use plotly directly from pandas via the `.plot()` method, you can do so easily enough.

Just use the following code:  

In [None]:
# run this to set plotly as the default back end
pd.options.plotting.backend = "plotly"

In [None]:
# and now we get our plotly charts automatically
df.groupby(['visit_date'])['visitors'].sum().plot()

In [None]:
# we can do other fun things with it as well
df.groupby(['visit_date', 'genre'])['visitors'].sum().reset_index().plot(x='visit_date', y='visitors', animation_frame='genre')