# STA 141B Lecture 16

The class website is <https://github.com/2019-winter-ucdavis-sta141b/notes>

### Announcements


### Topics

* Presentation skills
* A few more SQL examples
* Intro to interactive visualizations

### Datasets

* The [Stock Market Database](http://anson.ucdavis.edu/~clarkf/sql/stocks.sqlite)
* The Gapminder Dataset (included in this repository)
* The [Yolo County Restaurants Dataset](http://anson.ucdavis.edu/~nulle/yolo_food.feather)

### References

* SQL
    + [W3 Schools SQL Tutorial](https://www.w3schools.com/sql/)
    + [SQL Cheatsheet](http://anson.ucdavis.edu/~clarkf/sql/sql_cheatsheet.pdf)
* JavaScript (for web visualizations)
    + [Learn X in Y Minutes, X = JavaScript][js-intro] -- a brief intro
    + [MDN JavaScript Guide][js-guide] -- a detailed guide
    + [MDN Learning Materials][web-intro] -- more information about web development

[PDSH]: https://jakevdp.github.io/PythonDataScienceHandbook/
[ProGit]: https://git-scm.com/book/
[nlpp]: https://www.nltk.org/book/
[atap]: https://search.library.ucdavis.edu/primo-explore/fulldisplay?docid=01UCD_ALMA51320822340003126&context=L&vid=01UCD_V1&search_scope=everything_scope&tab=default_tab&lang=en_US
[js-intro]: https://learnxinyminutes.com/docs/javascript/
[js-guide]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide
[web-intro]: https://developer.mozilla.org/en-US/docs/Learn

In [2]:
# NEW PACKAGES
import bokeh.io       # conda install bokeh
import imageio        # conda install -c conda-forge imageio
import folium         # conda install -c conda-forge folium
# For feather files:  # conda install -c conda-forge pyarrow

# DATA SCIENCE TOOLKIT
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

%matplotlib inline

import sqlite3 as sql

# Presentation Skills

See the [presentation description](https://github.com/2019-winter-ucdavis-sta141b/notes/blob/master/presentation.md).

# SQL Examples

The questions below use the [stock market database](http://anson.ucdavis.edu/~clarkf/sql/stocks.sqlite). You can find additional practice problems [here](http://anson.ucdavis.edu/~clarkf/sql/).


1. Write a query that produces a table with columns for state, SIC code, SIC description, and count of companies located in that state with that SIC code.

In [3]:
db = sql.connect("../data/stocks.sqlite")

In [4]:
pd.read_sql("SELECT * FROM sqlite_master", db)

|   | type | name | tbl_name | rootpage | sql |
| - | ---- | ---- | -------- | -------- | --- |
| 0 | table | company_name | company_name | 2 | `CREATE TABLE "company_name" (\n "ticker" TEXT...` |
| 1 | table | financial_ratios | financial_ratios | 16 | `CREATE TABLE "financial_ratios" (\n "ticker" ...` |
| 2 | table | company_info | company_info | 54 | `CREATE TABLE "company_info" (\n "ticker" TEXT...` |
| 3 | table | daily_share_prices | daily_share_prices | 111 | `CREATE TABLE "daily_share_prices" (\n "date" ...` |
| 4 | table | sic | sic | 39180 | `CREATE TABLE "sic" (\n "Division" TEXT,\n "M...` |
| 5 | table | state_populations | state_populations | 39285 | `CREATE TABLE "state_populations" (\n "state" ...` |
| 6 | table | company_locations | company_locations | 39237 | `CREATE TABLE "company_locations" (\n "rank201...` |
| 7 | table | fang_info | fang_info | 39240 | `CREATE TABLE fang_info(\n ticker TEXT,\n com...` |
| 8 | table | fang_prices | fang_prices | 39241 | `CREATE TABLE fang_prices(\n ticker TEXT,\n d...` |
| 9 | table | fang_locations | fang_locations | 39242 | `CREATE TABLE fang_locations(\n ticker TEXT,\n...` |


In [5]:
pd.read_sql("SELECT * FROM company_locations LIMIT 3", db)

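|   | rank2015 | rank2014 | trend | company | street | city | state | zip | xcoord | ycoord | ticker |
| - | -------- | -------- | ----- | ------- | ------ | ---- | ----- | --- | ------ | ------ | ------ |
| 0 | 1 | 1 | Neutral | Walmart | 702 S.W. Eighth St. | Bentonville | Arkansas | 72716 | -94.217629 | 36.365378 | WMT |
| 1 | 2 | 2 | Neutral | Exxon Mobil | 5959 Las Colinas Blvd. | Irving | Texas | 75039 | -96.949909 | 32.890006 | XOM |
| 2 | 3 | 3 | Neutral | Chevron | 6001 Bollinger Canyon Rd. | San Ramon | California | 94583 | -121.958096 | 37.758251 | CVX |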


In [6]:
pd.read_sql("SELECT * FROM sic LIMIT 3", db)

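|   | Division | Major.Group | Industry.Group | SIC | Description |
| - | -------- | ----------- | -------------- | --- | ----------- |
| 0 | A | 1 | 11 | 111 | Wheat |
| 1 | A | 1 | 11 | 112 | Rice |
| 2 | A | 1 | 11 | 115 | Corn |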


In [7]:
pd.read_sql("SELECT * FROM company_info LIMIT 3", db)

|   | ticker | company_name | industry | sector | sic_code | web_page | asset | revenue | net_income | earning_per_share |
| - | ------ | ------------ | -------- | ------ | -------- | -------- | ----- | ------- | ---------- | ----------------- |
| 0 | WMT | WAL MART STORES INC | Food Retail & Distribution | Consumer Non-Cyclicals | 5331.0 | www.walmartstores.com | 198825000000.0 | 485873000000.0 | 13643000000.0 | 4.38 |
| 1 | XOM | EXXON MOBIL CORP | Oil & Gas Refining and Marketing | Energy | 2911.0 | http://www.exxonmobil.com/ | 330314000000.0 | 218608000000.0 | 7840000000.0 | 1.88 |
| 2 | MCK | MCKESSON CORP | Drug Retailers | Consumer Non-Cyclicals | 5122.0 | www.mckesson.com | 60969000000.0 | 198533000000.0 | 5070000000.0 | 22.73 |


In [13]:
pd.read_sql("""
    SELECT a.state, a.sic_code, b.description, COUNT(*) AS total FROM
    (
        SELECT l.ticker, l.state, l.company, r.company_name, r.sic_code FROM
            company_locations AS l
        INNER JOIN
            company_info AS r
        ON l.ticker = r.ticker
    ) AS a
    LEFT JOIN
        sic AS b
    ON a.sic_code = b.sic
    GROUP BY state, sic_code
    ORDER BY total DESC
""", db)

|   | state | sic_code | Description | total |
| - | ----- | -------- | ----------- | ----- |
| 0 | Texas | 1311.0 | Crude Petroleum and Natural Gas | 4 |
| 1 | Texas | 2911.0 | Petroleum Refining | 4 |
| 2 | California | 7372.0 | Prepackaged Software | 3 |
| 3 | New York | 6211.0 | Security Brokers, Dealers, and Flotation Compa... | 3 |
| 4 | California | 2836.0 | Biological Products, Except Diagnostic Substances | 2 |
| 5 | California | 3572.0 | Computer Storage Devices | 2 |
| 6 | California | 3674.0 | Semiconductors and Related Devices | 2 |
| 7 | California | 5651.0 | Family Clothing Stores | 2 |
| 8 | California | 7389.0 | Business Services, Not Elsewhere Classified | 2 |
| 9 | Connecticut | 6324.0 | Hospital and Medical Service Plans | 2 |


2. Focusing only on 2014 and the daily share prices table, find the names of the companies which had any closing price that exceeded the average closing price of AAPL in 2014.

3. Did the Brexit vote affect share prices for commercial banks in the United States? How?
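
One possible approach to question 2 is a scalar subquery: compute AAPL's average 2014 closing price once, then keep every company with at least one 2014 close above it. The sketch below runs against a tiny in-memory stand-in rather than `stocks.sqlite`. The `ticker` and `close` columns of `daily_share_prices`, the `company_name` column, the `YYYY-MM-DD` date format, and all of the rows are assumptions made for illustration, so check the real schema before adapting the query.

```python
import sqlite3 as sql

# Toy stand-in for the stock market database. Column names and rows are
# ASSUMED for illustration; they are not taken from stocks.sqlite.
toy = sql.connect(":memory:")
toy.executescript("""
    CREATE TABLE daily_share_prices (date TEXT, ticker TEXT, close REAL);
    CREATE TABLE company_name (ticker TEXT, company_name TEXT);

    INSERT INTO daily_share_prices VALUES
        ('2014-01-02', 'AAPL', 80.0),
        ('2014-06-02', 'AAPL', 100.0),
        ('2015-01-02', 'AAPL', 500.0),
        ('2014-03-03', 'GOOG', 550.0),
        ('2014-03-03', 'F', 15.0),
        ('2013-05-01', 'F', 999.0);

    INSERT INTO company_name VALUES
        ('AAPL', 'APPLE INC'),
        ('GOOG', 'GOOGLE INC'),
        ('F', 'FORD MOTOR CO');
""")

query = """
    SELECT DISTINCT n.company_name
    FROM daily_share_prices AS p
    INNER JOIN company_name AS n
        ON p.ticker = n.ticker
    WHERE p.date >= '2014-01-01' AND p.date < '2015-01-01'
        AND p.close > (
            -- Scalar subquery: AAPL's average closing price in 2014.
            SELECT AVG(close)
            FROM daily_share_prices
            WHERE ticker = 'AAPL'
                AND date >= '2014-01-01' AND date < '2015-01-01'
        )
    ORDER BY n.company_name
"""

names = [row[0] for row in toy.execute(query)]
print(names)
```

In the toy data AAPL's 2014 average is 90, so only companies with a 2014 close above 90 are returned. The same query string could be passed to `pd.read_sql()` with the real connection once the column names are confirmed.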

# Intro to Interactive Visualizations

There are a lot of Python packages for visualization:

![Python visualization landscape](img/visualization_landscape.png)
_Image from [Jake VanderPlas](https://speakerdeck.com/jakevdp/pythons-visualization-landscape-pycon-2017). See [here](https://rougier.github.io/python-visualization-landscape/landscape-colors.html) for a version with links to all of the packages!_

When you choose a visualization package, there are three major decisions to make. Will your visualization:

* Be interactive, animated, or static?
* Display two dimensions or three?
* Be an image, a video, a web page, or something else?

So far we've made visualizations with packages based on __matplotlib__. These tend to be static 2-dimensional images.

Now we're going to study how to make other kinds of visualizations.

## Animated Visualizations

__matplotlib__ can also make animated videos. The `matplotlib.animation` submodule ([docs](https://matplotlib.org/api/animation_api.html)) provides support for animation.

There are examples online:

* [Drawing animated GIFs with matplotlib](https://eli.thegreenplace.net/2016/drawing-animated-gifs-with-matplotlib/)
* [How to Create Animated Graphs in Python](https://towardsdatascience.com/how-to-create-animated-graphs-in-python-bb619cc2dec1)
* [Animated histogram](https://matplotlib.org/gallery/animation/animated_histogram.html)
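
As a rough sketch of how `matplotlib.animation` is typically used, the example below animates a moving sine wave with `FuncAnimation`. The data, the `update` function name, and the frame count are made up for illustration. The figure is attached to an Agg canvas directly so the code also runs without a display; in a notebook you would more likely create the figure with `plt.subplots()`.

```python
import numpy as np
from matplotlib.animation import FuncAnimation
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

N_FRAMES = 20  # arbitrary choice for this sketch

# Build a figure with an off-screen (Agg) canvas.
fig = Figure(figsize=(4, 2), dpi=100)
FigureCanvasAgg(fig)
ax = fig.add_subplot(111)

x = np.linspace(0, 2 * np.pi, 200)
(line,) = ax.plot(x, np.sin(x))
ax.set_ylim(-1.1, 1.1)


def update(frame):
    """Shift the sine wave's phase for the given frame number."""
    line.set_ydata(np.sin(x + 2 * np.pi * frame / N_FRAMES))
    return (line,)


# FuncAnimation calls update(0), update(1), ... once per frame.
anim = FuncAnimation(fig, update, frames=N_FRAMES, interval=100)

# To save as a GIF (uses the Pillow package):
# anim.save("wave.gif", writer="pillow")
# To display in a Jupyter notebook:
# from IPython.display import HTML; HTML(anim.to_jshtml())
```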

### Flipbook Strategy

There's another simple strategy for making animated visualizations. The strategy is the same as an old-fashioned pen-and-paper flipbook: create lots of still images and flip through them quickly.

A good thing about this strategy is that it works with any package that can make static visualizations. The tradeoff is that you have to write code to create the images -- but usually this isn't too hard.

To use this strategy, you'll need a Python package that can save animated images. Let's look at an example using the __imageio__ package to save a GIF image. We'll use the Gapminder Dataset, which contains statistics for countries from 1800 to 2015. This dataset is based on data from the [Gapminder Project](https://www.gapminder.org/).

In [None]:
import imageio    # conda install -c conda-forge imageio


def render_frame(year, data):
    """Render a single frame (plot) in an animated visualization.
    
    Adapted from: https://ndres.me/post/matplotlib-animated-gifs-easily/
    """
    # Create a matplotlib figure to plot into.
    fig = plt.figure(figsize = (10, 5))
    
    # -------------------- Visualization Code
    
    # FILL THIS IN
    
    # -------------------- End Visualization Code

    # Draw the figure and then convert it to a Numpy array.
    fig.canvas.draw()
    # buffer_rgba() is used here because tostring_rgb() is deprecated in
    # recent matplotlib releases. Drop the alpha channel to keep RGB only.
    image = np.asarray(fig.canvas.buffer_rgba())[:, :, :3].copy()
    
    # Close the matplotlib figure (we're done with it)
    plt.close(fig)
    return image

In [None]:
# Save a list of images as a GIF file. `plots` must already be a list of
# frames, one array per call to render_frame().
imageio.mimsave("gapminder.gif", plots, fps = 3)
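
To see the whole flipbook strategy end to end, here is a minimal sketch that uses synthetic data instead of the Gapminder Dataset. The function name `render_demo_frame`, the sine-wave data, and the frame count are all invented for illustration. It draws each frame on an off-screen Agg canvas and collects the frames in a list, which is the form that `imageio.mimsave()` expects.

```python
import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def render_demo_frame(step, n_steps):
    """Render one frame of a moving sine wave as an RGB Numpy array."""
    fig = Figure(figsize=(4, 2), dpi=100)
    canvas = FigureCanvasAgg(fig)
    ax = fig.add_subplot(111)

    # Visualization code: this is the part that changes from frame to frame.
    x = np.linspace(0, 2 * np.pi, 200)
    ax.plot(x, np.sin(x + 2 * np.pi * step / n_steps))
    ax.set_ylim(-1.1, 1.1)
    ax.set_title(f"Frame {step}")

    # Draw the figure, then copy the pixels and drop the alpha channel.
    canvas.draw()
    rgba = np.asarray(canvas.buffer_rgba())
    return rgba[:, :, :3].copy()


# The "flipbook": one still image per step.
n_steps = 12
frames = [render_demo_frame(step, n_steps) for step in range(n_steps)]

# Saving works the same way as in the notes:
# imageio.mimsave("wave.gif", frames, fps = 3)
```

The same pattern should carry over to the Gapminder example: loop over the years, call the frame-rendering function once per year, and pass the resulting list to imageio.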

## Web Visualizations

Web browsers are ubiquitous and support interactivity via JavaScript, so the web is an excellent platform for visualizations. Web visualizations are powered by a few important JavaScript libraries:

*   __[D3.js][]__: Short for Data-Driven Documents, D3 allows you to bind data
    to HTML tags. In other words, you can use data to control the structure and
    style of a web page.


*   __[Vega][]__ & __[Vega Lite][]__: A visualization grammar (the same idea as
    ggplot) built on top of D3. You write a description of what you want in
    JSON, and Vega produces a D3 visualization. Vega Lite adds support for
    common statistical graphics.


*   __[three.js][]__: A 3-dimensional graphics library.


*   __[Leaflet][]__: An interactive maps library.
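
To make the Vega-Lite idea of "a description in JSON" concrete, here is a small sketch that builds a scatter plot specification as a Python dictionary and serializes it with the standard library. The field names and data values are made up. Packages such as Altair generate specifications along these lines for you, so you rarely write them by hand.

```python
import json

# A Vega-Lite specification is plain JSON: data, a mark type, and encodings
# that map data fields to visual channels. All values below are invented.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "description": "Hypothetical scatter plot of two made-up variables.",
    "data": {
        "values": [
            {"country": "A", "gdp": 1.2, "life_exp": 52.0},
            {"country": "B", "gdp": 9.8, "life_exp": 68.3},
            {"country": "C", "gdp": 41.0, "life_exp": 80.2},
        ]
    },
    "mark": "point",
    "encoding": {
        "x": {"field": "gdp", "type": "quantitative"},
        "y": {"field": "life_exp", "type": "quantitative"},
        "tooltip": {"field": "country", "type": "nominal"},
    },
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```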

There are many more JavaScript libraries, but the ones listed here are among the most popular. They are also well supported by Python packages. Packages for creating web visualizations from Python include:

Package     | JS Library    | Description
----------  | ------------- | -----------
[mpld3][]   | [D3.js][]     | Matplotlib-like interface to D3
[bqplot][]  | [D3.js][]     | Bloomberg's interface to D3
[plotly][]  | [D3.js][]     | Unified interface for interactive visualization across multiple languages
[altair][]  | [Vega Lite][] | Declarative interface to Vega & Vega Lite
[bokeh][]   | [BokehJS][]   | Unified interface for interactive visualization across multiple languages
[hvPlot][]  | [BokehJS][]   | Pandas-like interface to Bokeh
[Toyplot][] | -             | Interactive visualizations for Python
[folium][]  | [Leaflet][]   | Interface to Leaflet

Also worth mentioning is the [pygal](http://www.pygal.org/en/stable/) package, which produces SVG plots that can be viewed in a web browser but do not require any JavaScript library.

[D3.js]: https://d3js.org/
[Vega]: https://vega.github.io/vega/
[Vega Lite]: https://vega.github.io/vega-lite/
[three.js]: https://threejs.org/
[BokehJS]: http://bokeh.pydata.org/en/latest/docs/dev_guide/bokehjs.html
[Leaflet]: http://leafletjs.com/

[mpld3]: http://mpld3.github.io/
[altair]: https://altair-viz.github.io/
[plotly]: https://plot.ly/python/
[bokeh]: http://bokeh.pydata.org/
[folium]: https://github.com/python-visualization/folium
[hvPlot]: https://hvplot.pyviz.org/
[bqplot]: https://github.com/bloomberg/bqplot
[Toyplot]: https://toyplot.readthedocs.io/en/stable/

### Basic Interactivity

Let's start by looking at Bokeh. We'll make a scatter plot with the simplest kind of interactivity: a pan tool and a zoom tool.

To display Bokeh plots in a Jupyter notebook, first you must call the setup function `output_notebook()`. You don't have to do this if you're going to save your plots to HTML instead.

In [None]:
bokeh.io.output_notebook()

Now we can make a plot. Bokeh's plotting functions work with data frames in [tidy](http://vita.had.co.nz/papers/tidy-data.pdf) form.

In [None]:
from bokeh.plotting import figure, show


# Optional: save the plot to a standalone HTML file.
#bokeh.io.output_file("MY_PLOT.html")

Bokeh is a relatively low-level plotting package. It does not provide built-in functions for many basic statistical plots.
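
As a sketch of the pan-and-zoom scatter plot described in this section, the example below plots a small made-up data frame. The column names and values are hypothetical. It renders the plot to an HTML string with `file_html()` so that it runs outside a notebook too; in a notebook you would call `show(p)` instead.

```python
import pandas as pd
from bokeh.embed import file_html
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.resources import CDN

# Hypothetical tidy data: one row per observation, one column per variable.
df = pd.DataFrame({
    "gdp_per_capita": [1.2, 3.4, 9.8, 22.5, 41.0],
    "life_expectancy": [52.0, 61.5, 68.3, 75.1, 80.2],
})
source = ColumnDataSource(df)

# Request only the simplest interactive tools: pan, wheel zoom, and reset.
p = figure(
    title="Made-up data",
    tools="pan,wheel_zoom,reset",
    x_axis_label="GDP per capita",
    y_axis_label="Life expectancy",
)

# Column names refer to columns of the data source.
p.scatter("gdp_per_capita", "life_expectancy", source=source, size=10)

# Render to a standalone HTML string (JavaScript is loaded from Bokeh's CDN).
html = file_html(p, CDN, "Bokeh scatter sketch")

# In a Jupyter notebook, after bokeh.io.output_notebook():
# from bokeh.plotting import show; show(p)
```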

### Maps

Maps are another example of easy interactivity. Here's an example of setting up a map with the __folium__ package:

In [None]:
import folium    # conda install -c conda-forge folium

# Make a map.
m = folium.Map(location = [45.5236, -122.6750])

# Optional: set up a Figure to control the size of the map.
fig = folium.Figure(width = 600, height = 200)
fig.add_child(m)

# Optional: save the map to a standalone HTML file.
# fig.save("MY_MAP.html")

Let's plot some points on the map. We can use this [Yolo County Restaurants Dataset](http://anson.ucdavis.edu/~nulle/yolo_food.feather).

The dataset is in a _feather_ file, a format we haven't seen before. Feather is a format designed to make it easy to transfer data between R and Python. You can find out more [here](https://github.com/wesm/feather). In order to read a feather file, you need the __pyarrow__ package.

In [None]:
# conda install pyarrow -c conda-forge

food = pd.read_feather("yolo_food.feather")
food.head()

Folium doesn't support missing values, so we first have to remove the rows with missing coordinates, for example with the data frame's `.dropna()` method.

We can start by making a map as usual. Use the `zoom_start` parameter to adjust the zoom on the map.

Next, let's add a marker to the map for each restaurant. To do this, we have to iterate over the rows, creating markers and adding them to the map.

The function to create a marker is `folium.Marker()`. Markers have an `.add_to()` method to add them to a map.

Folium can also display boundaries stored in GeoJSON format. See [the documentation](https://python-visualization.github.io/folium/index.html) for more info.

You can convert shapefiles to GeoJSON with the __geopandas__ package.


### How Interactive Visualizations Work

In order to make a visualization interactive, you need to run some code when the user clicks on a widget. The code can run _client-side_ on the user's machine, or _server-side_ on your server.

All of the examples we've seen so far were of client-side interactivity, and all of them used JavaScript code.

For client-side interactivity:

* Your code usually must be written in JavaScript. A few years from now, this might not be true anymore.
* You can host your visualization on any web server. No special setup is needed.
* Your visualization will use the user's CPU and memory.

For server-side interactivity:

* Your code can be written in any language the server supports. This may require special setup.
* Your visualization will use the server's CPU and memory.
* You can update the data in real-time.
* You can save data submitted by the user.
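
To illustrate what "server-side" means, here is a minimal sketch that uses only Python's standard library `http.server` module rather than one of the frameworks mentioned in these notes. The data, the `/summary` path, and the names `SummaryHandler` and `demo_request` are invented for illustration. The point is that the data and the computation stay on the server, and the client only receives the result.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Made-up data that lives only on the server.
DATA = {"2013": [3, 5, 8], "2014": [2, 7, 11]}


class SummaryHandler(BaseHTTPRequestHandler):
    """Answer GET /summary?year=YYYY with the mean for that year as JSON."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        year = query.get("year", [""])[0]
        values = DATA.get(year)
        if values is None:
            self.send_error(404, "unknown year")
            return
        # The computation happens here, using the server's CPU and memory.
        result = {"year": year, "mean": sum(values) / len(values)}
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def demo_request(year):
    """Start a local server, make one request as a client, then shut down."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SummaryHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", f"/summary?year={year}")
        payload = conn.getresponse().read()
        conn.close()
        return json.loads(payload)
    finally:
        server.shutdown()
        server.server_close()


result = demo_request("2014")
print(result)
```

In a real interactive visualization, the request would come from JavaScript running in the user's browser, and a framework would handle most of the plumbing shown here.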

Shiny is a server-side framework for R. There are lots of server-side frameworks for Python. Two of the most popular are [Django][django] and [Flask][flask].

[Panel][panel] and [Dash][dash] are relatively new server-side frameworks designed specifically for creating dashboards. These are more like Shiny than the other frameworks are.

[django]: https://www.djangoproject.com/
[flask]: http://flask.pocoo.org/
[panel]: https://panel.pyviz.org/
[dash]: https://plot.ly/products/dash/