# Lecture 24 – Fun with Plotly

## Data 6, Summer 2022

In [None]:
from datascience import *
import numpy as np
Table.interactive_plots()
import plotly.express as px
import seaborn as sns

In this lecture we will introduce additional visualization techniques available to us through the `plotly` library. All of these visualization methods are out of scope for this course, but they're still pretty cool to look at.

## Animated Scatter Plots

Our data today comes from the [Gapminder Foundation](https://www.gapminder.org/data/), which explores data on poverty, inequality and health around the world.

In [None]:
world = Table.from_df(px.data.gapminder())
world

Use the code below to generate an animated scatter plot of GDP per capita and life expectancy over time. You can play the animation or drag the slider to a specific year.

In [None]:
px.scatter(world.to_df(),
           x = 'gdpPercap',
           y = 'lifeExp', 
           hover_name = 'country',
           color = 'continent',
           size = 'pop',
           size_max = 60,
           log_x = True,
           range_y = [30, 90],
           animation_frame = 'year',
           title = 'Life Expectancy, GDP Per Capita, and Population over Time'
          )

## Animated Histograms

We can do the same with `px.histogram`, using the optional argument `animation_frame`.

In [None]:
px.histogram(world.to_df(),
            x = 'lifeExp',
            animation_frame = 'year',
            range_x = [20, 90],
            range_y = [0, 50],
            title = 'Distribution of Life Expectancy over Time')
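To build intuition for what `animation_frame` does in the cells above, here is a minimal sketch in plain Python. It assumes the argument conceptually splits the rows into one group per distinct value of the chosen column and draws each group as a separate frame. The helper `split_into_frames` and the rows are made up for illustration; they are not part of `plotly` or the Gapminder data.

```python
# Made-up rows for illustration only; these are not real Gapminder values.
rows = [
    {"country": "A", "year": 2002, "lifeExp": 70.0},
    {"country": "B", "year": 2002, "lifeExp": 55.0},
    {"country": "A", "year": 2007, "lifeExp": 72.0},
    {"country": "B", "year": 2007, "lifeExp": 58.0},
]

def split_into_frames(rows, frame_column):
    """Hypothetical helper: group rows by the value in frame_column.

    Each group corresponds to one frame of the animation. Frames are kept
    in the order in which their values first appear in the data.
    """
    frames = {}
    for row in rows:
        frames.setdefault(row[frame_column], []).append(row)
    return frames

frames = split_into_frames(rows, "year")
for year, frame_rows in frames.items():
    # Each frame holds only the rows for that year.
    print(year, [r["lifeExp"] for r in frame_rows])
```

Each frame here contains only the rows for a single year, which is why the slider in the plots above steps through one year at a time.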

## Box Plots

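Box plots, also called "box and whisker plots," show the rough distribution of a numerical variable, often split across the categories of another variable. The box marks the 25th, 50th (median), and 75th percentiles. The whiskers extend to the most extreme points that lie within 1.5 times the interquartile range (IQR, the 75th percentile minus the 25th) of the box. Points beyond the whiskers are drawn individually, which is helpful for identifying outliers.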
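The quantities behind a box plot can be computed by hand. The sketch below uses only Python's standard library and made-up numbers; the helper `box_plot_summary` is hypothetical, and its percentile method (linear interpolation, via `method='inclusive'`) is an assumption that may differ slightly from the one a plotting library uses.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Hypothetical helper: compute the pieces a box plot displays."""
    data = sorted(values)
    # Quartiles with linear interpolation between data points.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences sit 1.5 * IQR beyond the edges of the box.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme points still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Anything outside the fences is flagged as a potential outlier.
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

# Made-up values for illustration; 30 is far from the rest.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(summary)
```

In this made-up example the box spans 3 to 7, the upper whisker stops at 8, and 30 is flagged as an outlier because it lies beyond the upper fence of 13.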

In [None]:
world_latest = world.where('year', 2007)
world_latest

In [None]:
px.box(world_latest.to_df(),
       y = 'lifeExp',
       x = 'continent',
       color = 'continent',
       hover_name = 'country',
       title = 'Distribution of Life Expectancy in 2007 by Continent'
      )

## Pie Charts

Pie charts look cool visually, but they are often hard to analyze.

In [None]:
world_latest.where('continent', 'Americas')

In [None]:
px.pie(world_latest.where('continent', 'Americas').to_df(),
       values = 'pop',
       names = 'country',
       title = 'Population of the Americas'
)

In [None]:
world_for_pie = world_latest \
     .group('continent', sum) \
     .select('continent', 'pop sum')

world_for_pie

In [None]:
px.pie(world_for_pie.to_df(),
      values = 'pop sum',
      names = 'continent',
      title = 'World Population by Continent')
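One reason pie charts can be hard to read is that each slice encodes a share of the total as an angle, and angles are difficult to compare by eye. The sketch below, using made-up numbers and a hypothetical helper `pie_slices`, shows the arithmetic a pie chart performs on the `values` column.

```python
def pie_slices(values_by_name):
    """Hypothetical helper: convert raw values to shares and slice angles."""
    total = sum(values_by_name.values())
    slices = {}
    for name, value in values_by_name.items():
        slices[name] = {
            "share": value / total,        # fraction of the whole
            "angle": 360 * value / total,  # degrees of the circle
        }
    return slices

# Made-up populations for illustration only.
slices = pie_slices({"X": 50, "Y": 30, "Z": 20})
for name, info in slices.items():
    print(name, info["share"], info["angle"])
```

Slices whose angles differ by only a few degrees look nearly identical, so a table of the shares can be easier to analyze than the chart itself.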

## Timelines (Gantt Charts)

The following code creates a Gantt chart showing the timeline of the life of the illustrious [Suraj Rampure](https://rampure.org/), creator of Data 6.

In [None]:
phases = [
 ['Newborn', '1998-11-26', '1999-11-26', 'Canada'],
 ['Toddler, Preschooler', '1999-11-26', '2005-09-03', 'US'],
 ['Elementary School Student', '2005-09-03', '2009-06-30', 'Canada'],
 ['Middle School Student', '2009-09-15', '2012-06-15', 'Canada'],
 ['High School Student', '2012-09-05', '2016-05-30', 'Canada'],
 ['Undergrad @ UC Berkeley', '2016-08-22','2020-05-15', 'US'],
 ['Masters @ UC Berkeley', '2020-08-25', '2021-05-14', 'Canada'],
 ['Teaching Data 94', '2021-01-20', '2021-05-14', 'Canada']]

phases_table = Table(labels = ['Phase', 'Start', 'End', 'Location']).with_rows(phases)
phases_table

In [None]:
px.timeline(phases_table.to_df(),
           x_start = 'Start',
           x_end = 'End',
           y = 'Phase',
           text = 'Location',
           title = "Suraj's Life Trajectory") \
.update_yaxes(autorange='reversed')

## Choropleth Maps

We have seen choropleth maps before:

In [None]:
world_latest

In [None]:
px.choropleth(world_latest.to_df(),
              locations = 'iso_alpha',
              color = 'lifeExp',
              hover_name = 'country',
              title = 'Life Expectancy Per Country',
              color_continuous_scale = px.colors.sequential.tempo
)

But now we also know how to make animated choropleth maps!

In [None]:
px.choropleth(world.to_df(), 
             locations="iso_alpha",
             color="lifeExp",
             animation_frame="year",
             color_continuous_scale = px.colors.diverging.RdYlGn,
             title = "Life Expectancy Over Time",
             range_color=(30,80))

## 3D Scatter Plots

It is also possible to plot points along three dimensions (i.e. with three coordinates).

In [None]:
penguins = Table.from_df(sns.load_dataset('penguins'))
penguins

Try dragging the graph to move around the camera.

In [None]:
px.scatter_3d(penguins.to_df(),
             x = 'bill_length_mm',
             y = 'bill_depth_mm',
             z = 'flipper_length_mm',
             color = 'species',
             hover_name = 'island',
             title = 'Flipper Length vs. Bill Depth vs. Bill Length')

But always ask yourself whether you really need a 3D scatter plot when a 2D one would suffice.

In [None]:
px.scatter(penguins.to_df(),
             x = 'bill_length_mm',
             y = 'bill_depth_mm',
             color = 'species',
             hover_name = 'island',
             title = 'Bill Depth vs. Bill Length')