# STA 141B Lecture 18

The class website is <https://github.com/2019-winter-ucdavis-sta141b/notes>

### Announcements

* Submit project report via project repository (`.md`, `.ipynb`, `.pdf`, `.doc`)
* Report due Wed of Finals Week
* Links to peer feedback for presentation will be sent out today or tomorrow
* Assignment 3 regrade posted
* Assignment 4 grades coming soon
* Send me a link to your slides the day before you present (or earlier)

### Topics

* Interactive visualizations
* End-of-quarter summary

### Datasets

* The Gapminder Dataset (included in this repository)
* The [Yolo County Restaurants Dataset](http://anson.ucdavis.edu/~nulle/yolo_food.feather)

### References

* [The Best Stats You've Ever Seen (Gapminder)](https://www.youtube.com/watch?v=hVimVzgtD6w&t=338s)
* JavaScript (for web visualizations)
    + [Learn X in Y Minutes, X = JavaScript][js-intro] -- a brief intro
    + [MDN JavaScript Guide][js-guide] -- a detailed guide
    + [MDN Learning Materials][web-intro] -- more information about web development

[js-intro]: https://learnxinyminutes.com/docs/javascript/
[js-guide]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide
[web-intro]: https://developer.mozilla.org/en-US/docs/Learn

In [1]:
import bokeh.io       # conda install bokeh
import folium         # conda install -c conda-forge folium
# For feather files:  # conda install -c conda-forge pyarrow

# DATA SCIENCE TOOLKIT
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

%matplotlib inline

In [2]:
bokeh.io.output_notebook()

# Custom Interactive Visualizations

In order to make a visualization interactive, you need to run some code when the user clicks on a widget. The code can run _client-side_ on the user's machine, or _server-side_ on your server.

For client-side interactivity:

* Your code must be written in JavaScript.
* You can host your visualization on any web server. No special setup is needed.
* Your visualization will use the user's CPU and memory.

For server-side interactivity:

* Your code can be written in any language the server supports. This may require special setup.
* Your visualization will use the server's CPU and memory.
* You can update the data in real-time.
* You can save data submitted by the user.

There are lots of server-side frameworks for Python. Two of the most popular are [Django][django] and [Flask][flask].

[Panel][panel] and [Dash][dash] are relatively new server-side frameworks designed specifically for creating dashboards for data analytics. Their purpose and functionality are similar to those of R's Shiny package.

[Bokeh][bokeh] is unique because it provides both a client-side and a server-side API.

[django]: https://www.djangoproject.com/
[flask]: http://flask.pocoo.org/
[panel]: https://panel.pyviz.org/
[dash]: https://plot.ly/products/dash/
[bokeh]: http://bokeh.pydata.org/

## Client-side

Let's use __bokeh__ to make a client-side interactive version of the Gapminder plot. The x-axis will still show fertility rate, and the y-axis will still show life expectancy. We'll use a slider widget to let the user control the year.

In [3]:
gapminder = pd.read_csv("../data/gapminder/gapminder.csv")
gapminder.head()

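| Unnamed: 0 | country | year | life_expectancy | population | fertility_rate |
|---|---|---|---|---|---|
| 0 | Abkhazia | 1800 | NaN | NaN | NaN |
| 1 | Afghanistan | 1800 | 28.21 | 3280000.0 | 7.0 |
| 2 | Akrotiri and Dhekelia | 1800 | NaN | NaN | NaN |
| 3 | Albania | 1800 | 35.4 | 410445.0 | 4.6 |
| 4 | Algeria | 1800 | 28.82 | 2503218.0 | 6.99 |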


To create the plot, we need several tools from `bokeh.models`:

* `widgets.Slider` creates a slider widget.
* `CustomJS` adds custom JavaScript code to a plot.
* `ColumnDataSource` creates a data source for a plot that can be manipulated from JavaScript code.
* `CDSView` creates a "view" of a data source based on some kind of filter.
* `GroupFilter` filters a data source based on a category.

We can also use `bokeh.layouts` to organize our slider and plot on the page.
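
To make the idea of a filtered "view" concrete, here is a minimal conceptual sketch in plain Python. It does not use Bokeh at all: the `group_filter` and `view` helpers are hypothetical stand-ins for what `GroupFilter` and `CDSView` do, and the table holds toy values rather than rows from the Gapminder file.

```python
# Conceptual sketch only: plain Python standing in for Bokeh objects.
# A column-oriented table (column name -> list of values) with toy data.
data = {
    "country": ["A", "B", "A", "B"],
    "year": ["1800", "1800", "1810", "1810"],  # stored as strings
    "fertility_rate": [7.0, 4.6, 6.9, 4.5],
}

def group_filter(data, column_name, group):
    """Return the row indices where data[column_name] equals group."""
    return [i for i, value in enumerate(data[column_name]) if value == group]

def view(data, indices):
    """Return only the selected rows; the original table is left untouched."""
    return {name: [column[i] for i in indices] for name, column in data.items()}

rows_1800 = group_filter(data, "year", "1800")
print(rows_1800)
print(view(data, rows_1800)["country"])
```

The point of the design is that the full table is stored once, and changing the filter's group only changes which rows are shown.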

We need to write some JavaScript code to make the visualization work. You can quickly learn the syntax and basic semantics of JavaScript (and many other languages) from [Learn X in Y Minutes, X = JavaScript](https://learnxinyminutes.com/docs/javascript/). See the references at the top of this notebook for more thorough, in-depth tutorials.

In [41]:
# Find the closest year in Python. Note that argmin() returns the index
# of the closest year in the array of unique years, not the year itself.
np.abs(gapminder["year"].unique() - 1803).argmin()

0
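
The result above is a position, not a year. As a small sketch of the remaining step, the snippet below indexes back into the array to recover the year. The `years` array here is an assumed toy stand-in for `gapminder["year"].unique()`, not the real data.

```python
import numpy as np

# Toy stand-in for gapminder["year"].unique(); values assumed for illustration.
years = np.array([1800, 1810, 1820, 1830])
target = 1803

# argmin() gives the position of the smallest distance, not the year itself.
idx = np.abs(years - target).argmin()

# Index back into the array to recover the closest year.
closest_year = years[idx]
print(idx, closest_year)
```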

In [38]:
import bokeh.layouts
from bokeh.models import ColumnDataSource, CustomJS, CDSView, GroupFilter
from bokeh.models.widgets import Slider
from bokeh.plotting import figure, show

# Set up the slider.
start = gapminder["year"].min()
end = gapminder["year"].max()
slider = Slider(start = start, end = end, step = 1, value = start)

# Set up figure.
p = figure(title = str(start), width = 300, height = 300, x_range = (0, 10), y_range = (10, 100))
p.xaxis.axis_label = "Fertility Rate"
p.yaxis.axis_label = "Life Expectancy"

# Set up data sources.
df = gapminder.copy()
df["log_pop"] = np.log1p(df["population"])
df["year"] = df["year"].astype(str)
source = ColumnDataSource(df)
view = CDSView(source = source, filters = [GroupFilter(column_name = "year", group = str(start))] )

years = gapminder["year"].unique()

# Add the plot.
p.scatter("fertility_rate", "life_expectancy", size = "log_pop", source = source, view = view, fill_alpha = 0.2)

# ------------------------------------------------------------
# Set up the JavaScript callback.
# Use cb_obj to refer to the caller widget from a JavaScript callback.
callback = CustomJS(args = {"source": source, "view": view, "figure": p, "years": years}, code = """
    // This is the JavaScript code that will run whenever the slider is changed.
    // You can use the console.log() function to print values.
    
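    // Start from the first year and an infinite distance, so the search
    // works no matter how far apart the years in the data are.
    var year = years[0];
    var best_d = Infinity;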
    for (var y of years) {
        var d = Math.abs(y - cb_obj.value);
        
        if (d < best_d) {
            year = y;
            best_d = d;
        }
    }
    
    var value = year.toString();
    view.filters[0].group = value;
    figure.title.text = value;
    
    // Let Bokeh know the data source has changed.
    source.change.emit();
""")

slider.js_on_change("value", callback)

# ------------------------------------------------------------

# Finally, set up the layout and show everything.
layout = bokeh.layouts.column(slider, p)
show(layout)

## Server-side

Now let's use __bokeh__ to make a server-side interactive version of the Gapminder plot. You can use your own computer as a server to test the visualization.

The core of a server-side visualization (or "app") is a script that controls what's displayed on the page.

A code skeleton for our visualization is included in this repo, in the file `myapp.py`. Notice that most of the code is identical to what we wrote for the client-side visualization. The main difference is that now we can write our callback in Python instead of JavaScript.
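
As a rough sketch of what a Python callback might look like (this is not the contents of `myapp.py`), Bokeh calls Python callbacks registered with `on_change()` with three arguments: the attribute name, the old value, and the new value. The `make_year_callback` helper and the `set_year` function below are hypothetical names chosen for illustration, and the years are toy values.

```python
def make_year_callback(years, set_year):
    """Build a callback with the (attr, old, new) signature that Bokeh
    passes to Python callbacks registered via on_change()."""
    def callback(attr, old, new):
        # Snap the slider value to the closest year present in the data.
        year = min(years, key=lambda y: abs(y - new))
        set_year(str(year))
    return callback

# Stand-in for the code that would update the plot; here it only records
# the selected year so the sketch runs without a Bokeh server.
selected = []
callback = make_year_callback([1800, 1810, 1820], selected.append)

# In a Bokeh server script, this would be registered with something like:
#     slider.on_change("value", callback)
# Here we call it by hand to mimic the slider moving from 1800 to 1807.
callback("value", 1800, 1807)
print(selected)
```

In a real app, `set_year` would presumably update the filter and the plot title, just as the JavaScript callback does in the client-side version.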

### Running the App

In order to see the visualization, we have to run a __bokeh__ server. You can start a __bokeh__ server for `myapp.py` in an Anaconda Prompt (or Terminal) with the command:

```sh
python -m bokeh serve --show myapp.py
```

Be careful to specify the path to `myapp.py` correctly relative to the working directory!

# It's Week 9!

Major topics from this class:

* Scientific Python (Numpy, Pandas, Matplotlib)
* Web APIs
* Web Scraping
* Text Mining
* Databases

Where can you go from here?

* Learn...
    * Applied statistics (STA 108, 135) to match your computing knowledge.
    * A systems programming language (Java, C, C++, Rust) for performance-critical programming.
    * JavaScript (and D3.js) to make interactive visualizations.
    * Machine learning techniques (ECS 171).
    * [How to implement a neural network](https://victorzhou.com/blog/intro-to-neural-networks/).


* Get involved with the UC Davis [Data Science Initiative](http://dsi.ucdavis.edu/).


* Get involved with an open-source project.


* Start reading and learning from [Hacker News](https://news.ycombinator.com/).