In [None]:
%matplotlib inline
import matplotlib
import seaborn as sns
sns.set()
matplotlib.rcParams['figure.dpi'] = 144

In [None]:
import altair as alt
from vega_datasets import data
alt.renderers.enable('notebook')

# Explanatory Visualization

Data scientists often need to derive and communicate actionable conclusions from large, complex data sets. Effective explanatory visualizations are central to this task.

Although we focus in this notebook on *explanatory* visualization, many effective explanatory visualizations give the viewer the ability to interactively *explore* the results themselves. Interactivity in explanatory visualization should allow the viewer to track the narrative uncovered through the scientist's analysis. 

## Multiple interactive plots

Often, a single visual cannot describe all the information we want to communicate to the viewer. Interactivity allows the viewer to make intuitive links between multiple features in data and reach conclusions more quickly. 

In [None]:
df_cars = data.cars()
df_cars['Year'] = df_cars["Year"].dt.year

### Repeat Plots

Altair excels at creating multiple interactive views of data and linking them together. Below, we use Altair's `repeat` functionality, which lets us specify one encoding across multiple features of the data and display the results in separate charts side by side.

* We set the x axis to `alt.repeat("column")`
* We call the `repeat` method, specifying which columns to repeat the chart over

In addition, we've created a selection tool that links the selections across the charts.

* We define an `alt.selection_interval` restricted to the x encoding (`encodings=['x']`).
* We use `alt.condition` to set the color and opacity of each of the marks based on the value of that selection.
* We can reference the same selection object in another plot -- because the underlying data is the same, the selection applies to this plot as well!
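
To make the selection logic concrete, here is a plain-Python sketch of the rule that `alt.condition(brush, ...)` applies to each mark. It only illustrates the semantics and is not how Altair or Vega-Lite implements it; the function name `style_for`, the sample rows, and the brush extent are all hypothetical.

```python
def style_for(row, x_field, extent):
    """Return (color_key, opacity) for one point under an x-only interval brush.

    `extent` is a (low, high) pair, or None when no brush has been drawn.
    By default an empty selection counts every point as selected.
    """
    if extent is None:
        selected = True
    else:
        low, high = extent
        selected = low <= row[x_field] <= high
    # Selected points are colored by Origin; the rest fall back to light gray.
    color_key = row["Origin"] if selected else "lightgray"
    opacity = 1.0 if selected else 0.1
    return color_key, opacity


# Hypothetical rows standing in for df_cars
rows = [
    {"Horsepower": 130, "Origin": "USA"},
    {"Horsepower": 95, "Origin": "Japan"},
    {"Horsepower": 46, "Origin": "Europe"},
]
styles = [style_for(r, "Horsepower", (90, 140)) for r in rows]
print(styles)
```

Because the rule depends only on the data values, the same brush can drive any chart built from the same rows.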

In [None]:
brush = alt.selection_interval(encodings=['x'])

repeat_chart = alt.Chart(df_cars).mark_point().encode(
    alt.X(alt.repeat("column"), type='quantitative'),
    alt.Y('Miles_per_Gallon:Q'),
    color=alt.condition(brush, 'Origin:N', alt.value('lightgray')),
    opacity = alt.condition(brush, alt.value(1), alt.value(0.1))
).properties(
    width=150,
    height=150
).add_selection(
    brush           
).repeat(
    column=['Weight_in_lbs', 'Acceleration', 'Horsepower']
)

year_chart = alt.Chart(df_cars, height=151, width = 200).mark_point().encode(
    alt.X("Year:N"),
    alt.Y("Miles_per_Gallon:Q"),
    color=alt.condition(brush, "Origin:N", alt.value("lightgray")),
    opacity = alt.condition(brush, alt.value(1), alt.value(0.1))

).add_selection(brush)



repeat_chart | year_chart

### Faceted plots

We can also easily split one plot into several panels using Altair's `facet` method. This is as simple as calling `facet` and specifying which column to facet on.

In [None]:
brush = alt.selection_interval()

facet_chart = alt.Chart(df_cars).mark_point().encode(
    alt.X('Year:N'),
    alt.Y('Miles_per_Gallon:Q'),
    color=alt.condition(brush, 'Origin:N', alt.value('lightgray'))
).properties(
    width=150,
    height=150
).add_selection(
    brush           
).facet(column="Origin")


facet_chart

## Data transformations: filtering and aggregating

We can also filter and aggregate data directly in Altair. Below, in addition to the faceted plot, we show a bar chart of the number of cars from each origin within the current selection.

* Specifying `alt.Y("count()")` in the bar chart (`hist`) counts the rows for each origin.
* Adding `.transform_filter(brush)` to the chart restricts its data to the points inside the selection.
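
As a rough mental model, the filter-then-count pipeline can be written in plain Python. This is a sketch of what the bar chart computes, not Altair's implementation; `count_by_origin`, the sample rows, and the brushed years and range are hypothetical.

```python
from collections import Counter


def count_by_origin(rows, years, mpg_range):
    """Keep rows inside the brushed region, then count them per origin."""
    low, high = mpg_range
    kept = [
        r for r in rows
        if r["Year"] in years and low <= r["Miles_per_Gallon"] <= high
    ]
    return Counter(r["Origin"] for r in kept)


# Hypothetical rows standing in for df_cars
rows = [
    {"Year": 1970, "Miles_per_Gallon": 18.0, "Origin": "USA"},
    {"Year": 1971, "Miles_per_Gallon": 27.0, "Origin": "Japan"},
    {"Year": 1971, "Miles_per_Gallon": 25.0, "Origin": "Europe"},
    {"Year": 1972, "Miles_per_Gallon": 13.0, "Origin": "USA"},
]
counts = count_by_origin(rows, years={1970, 1971}, mpg_range=(15, 30))
print(counts)
```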

In [None]:
brush = alt.selection_interval()

facet_chart = alt.Chart(df_cars).mark_point().encode(
    alt.X('Year:N'),
    alt.Y('Miles_per_Gallon:Q'),
    color=alt.condition(brush, 'Origin:N', alt.value('lightgray'))
).properties(
    width=150,
    height=150
).add_selection(
    brush           
).facet(column="Origin")

hist = alt.Chart(df_cars, height=151).mark_bar().encode(
    alt.X('Origin'),
    alt.Y("count()"),
    color="Origin"
).transform_filter(brush)

facet_chart | hist


### Creating tooltips

Setting tooltips in Altair is extremely easy -- simply specify `tooltip = [list of columns]` in the encoding.

In [None]:
chart = alt.Chart(df_cars).mark_point().encode(
    x = 'Horsepower',
    y = "Miles_per_Gallon",
    color= "Origin",
    tooltip = ['Name', 'Year', 'Origin']

)
chart

## Layout and Design

### Chart configurations

Plots in Altair are highly customizable, within the options defined by the Vega-Lite JSON specification. Configuration is applied through methods on `alt.Chart` objects. Like other methods in Altair, these return chart objects, so they can be chained together.

Some common options include:

* `chart.configure()`
    * `background=`: background color of the chart

* `chart.configure_title()`
    * `fontSize=`: font size of the chart title

* `chart.configure_axis()`, `chart.configure_axisX()`, `chart.configure_axisY()`
    * `labelFontSize=`
    * `titleFontSize=`

* `chart.configure_legend()`
    * `orient=`, e.g. `top-right`, `bottom-left`
    * `labelFontSize=`
    * `symbolSize=`

A more comprehensive treatment of chart configuration can be found in the Altair [documentation](https://altair-viz.github.io/user_guide/configuration.html#config-chart).

In [None]:
chart = alt.Chart(df_cars, title="Miles per Gallon by Horsepower and Origin").mark_point().encode(
    x = 'Horsepower',
    y = "Miles_per_Gallon",
    color= "Origin",
    tooltip = ['Name', 'Year', 'Origin']

)

chart = chart.configure(
).configure_title(
    fontSize=16
).configure_axis(
    titleFontSize=14,
    labelFontSize=12
).configure_legend(
    titleFontSize=14,
    labelFontSize=12,
    symbolSize=400
)

chart

### Themes

Once we've gotten a chart perfectly configured for a project, there's a good chance that configuration will be useful for us again for subsequent charts in the same project. Rather than having to apply the same configuration methods to each new chart, we can specify a **theme** and automatically apply it to each new chart. 

A theme in Altair is simply a Python function that returns a dictionary holding the `config` part of a specification. For a chart we have already generated, we can read this configuration from `chart.to_dict()['config']`.

In [None]:
chart.to_dict()['config']

In [None]:
def extract_theme(chart):
    return {'config': chart.to_dict()['config']}

In [None]:
extract_theme(chart)

In [None]:
def big_text():
    return {
        'config': {
            'view': {'width': 400, 'height': 300},
            'mark': {'tooltip': None},
            'axis': {'labelFontSize': 12, 'titleFontSize': 14},
            'legend': {'labelFontSize': 12, 'symbolSize': 400, 'titleFontSize': 14},
            'title': {'fontSize': 16},
        }
    }

In [None]:
# register the custom theme under a chosen name
alt.themes.register('big_text', big_text)

# enable the newly registered theme
# revert changes by alt.themes.enable('default')
alt.themes.enable('big_text')
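
Since a theme is just a function returning a dictionary, related themes can be generated rather than written out by hand. The factory below is a minimal sketch; `make_text_theme` and `huge_text` are hypothetical names, and the chosen font sizes are arbitrary.

```python
def make_text_theme(title_size=16, axis_title_size=14, label_size=12):
    """Build a theme function with the given font sizes."""
    def theme():
        return {
            "config": {
                "title": {"fontSize": title_size},
                "axis": {
                    "labelFontSize": label_size,
                    "titleFontSize": axis_title_size,
                },
                "legend": {
                    "labelFontSize": label_size,
                    "titleFontSize": axis_title_size,
                },
            }
        }
    return theme


huge_text = make_text_theme(title_size=24, axis_title_size=20, label_size=16)
print(huge_text()["config"]["title"])
```

The resulting function could then be registered and enabled in the same way as `big_text`, for example under the name `'huge_text'`.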

In [None]:
chart = alt.Chart(df_cars).mark_point().encode(
    x = 'Horsepower',
    y = "Miles_per_Gallon",
    color= "Origin",
    tooltip = ['Name', 'Year', 'Origin']

)
chart

## Using Altair with large data sets

In [None]:
diam = sns.load_dataset("diamonds")
print("Rows: {}, Columns: {}".format(diam.shape[0], diam.shape[1]))
diam.head()

The `diamonds` data set has over 50k rows. What happens if we try to plot its values in Altair?

In [None]:
chart = alt.Chart(diam).mark_point().encode(
    x = 'carat',
    y = 'price',
    color='cut'
)

# Displaying this chart raises a MaxRowsError: by default, Altair refuses
# to embed data sets with more than 5000 rows.

#chart

### Using external JSON files

To pass more than 5000 rows into an Altair chart, we can initialize the Chart with a link to a JSON file instead of directly supplying it a pandas DataFrame.

Let's sample 10000 points from the `diamonds` data set randomly, save this to a JSON file, then construct an Altair chart by linking to the JSON file on our local directory.

In [None]:
url = 'diam_data.json'
diam.sample(10000).to_json(url, orient='records')
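
The file written with `orient='records'` is a JSON array with one object per row, which is the shape a chart linked to a data URL reads. The standard-library sketch below shows the same sample-and-save pattern without pandas; the rows are synthetic and the file name `sample_data.json` is hypothetical.

```python
import json
import os
import random
import tempfile

# Synthetic rows standing in for the diamonds data (values are made up)
rows = [{"carat": round(0.2 + 0.01 * i, 2), "price": 300 + 5 * i}
        for i in range(20000)]

random.seed(0)                        # make the sample reproducible
sample = random.sample(rows, 10000)   # sample without replacement

# Write a records-oriented file: a JSON array of row objects
path = os.path.join(tempfile.mkdtemp(), "sample_data.json")
with open(path, "w") as f:
    json.dump(sample, f)

# Reading it back gives a list of dictionaries, one per row
with open(path) as f:
    loaded = json.load(f)
print(type(loaded).__name__, len(loaded), sorted(loaded[0]))
```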


In [None]:
diam_chart = alt.Chart(url).mark_point(filled= True).encode(
    x = 'carat:Q',
    y = 'price:Q',
    color = "clarity:N"
)

diam_chart.save('diam_chart.json')
diam_chart.save('diam_chart.html')

diam_chart

In [None]:
import json
from pprint import pprint
with open('diam_chart.json', 'rb') as f:
    diam_chart_json = json.load(f)
    pprint(diam_chart_json)

## Exporting an Altair chart as HTML

Note that Altair doesn't actually *render* charts itself, so its output is not an image. Rather, Altair produces JSON (JavaScript Object Notation) specifications, which are interpreted by other packages and then rendered with JavaScript in the browser.

We've been using Altair with Jupyter Notebook so far, but for explanatory visuals, you will often want to display charts in a web browser, either in standalone form or (more likely) embedded in a web page. Altair makes it very easy to do so. 

Altair can easily output an HTML file which functions as a standalone version of the chart, or a JSON file which can be embedded using Vega-Embed.

In [None]:
import pandas as pd
df = pd.DataFrame(data={'x':[1,2,0,1, 1.4, 1.2, 1.1, 2], 'y':[2,1,1,0, 1.1, 0.5, 1.2, 1]})

chart = alt.Chart(df).mark_line( size=2).encode(
    alt.X('x'),
    alt.Y("y")
)
chart.save("chart.json")
chart.save("chart.html")
chart

### The exported HTML file

The exported HTML file does three things:

* Imports Vega, Vega-Lite, and Vega-Embed
* Creates a `<div>` with `id="vis"`
* Runs a script that ends with a call to the Vega-Embed function, pointed at `#vis`

In [None]:
with open("chart.html", "rb") as f:
    for line in f.readlines():
        print(line.decode("utf-8").strip('\n'))

### The exported JSON file

If we examine the JSON file we produced with `chart.save("chart.json")`, we can see that it is identical to the JSON object appearing in the `<script>` at the bottom of the HTML document above.

This file is specified according to Vega-Lite, and is interpreted by Vega-Lite in the above HTML. 

In [None]:
import json
with open("chart.json", "rb") as f:
    j = json.load(f)
    #print(j)
    print(json.dumps(j, indent=2))

For the most part, the JSON file above looks a lot like the Altair commands we wrote. Note, however, that the entire data set is included at the end of the JSON file. Looking back at the exported JSON for the `diam_chart`, we see that the specification simply gives a link to the data set.
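
The difference between the two exported specifications can also be checked programmatically. The helper below is a minimal sketch: `data_source` is a hypothetical name, the two spec fragments are hand-written and abbreviated, and it assumes the layout Altair usually writes for DataFrame input (a named entry under a top-level `datasets` key).

```python
def data_source(spec):
    """Report whether a Vega-Lite spec links to its data or embeds it."""
    data = spec.get("data", {})
    if "url" in data:
        return ("url", data["url"])
    if "values" in data:
        return ("inline", len(data["values"]))
    # Named data sets are stored under the top-level "datasets" key
    name = data.get("name")
    if name is not None and name in spec.get("datasets", {}):
        return ("inline", len(spec["datasets"][name]))
    return ("unknown", None)


# Abbreviated, hand-written stand-ins for the two exported files
linked_spec = {"data": {"url": "diam_data.json"}, "mark": "point"}
embedded_spec = {
    "data": {"name": "data-example"},  # hypothetical data set name
    "mark": "line",
    "datasets": {"data-example": [{"x": 1, "y": 2}, {"x": 2, "y": 1}]},
}

print(data_source(linked_spec))
print(data_source(embedded_spec))
```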

## Embedding a chart in an HTML document

Altair's HTML export capability is easy and useful for producing a standalone HTML document containing only a plot, but we also want to be able to embed plots side-by-side with other information. 

Luckily, embedding a plot created by Altair in HTML is also very easy! Recall that Altair outputs JSON according to the Vega-Lite specification, which Vega-Lite in turn compiles to a Vega specification. The JavaScript library Vega-Embed displays these charts in the browser and requires only a few steps:

**1. Import Vega, Vega-Lite, and Vega-Embed**

Place these in the `<head>` of your document:

```
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega@4"></script>
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega-lite@3.2.1"></script>
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega-embed@3"></script>
```

**2. Create a container for the plot**

In the `<body>` of your document, create the `<div>` that will contain the Altair plot.

`<div id='vis'></div>`

* It doesn't matter what you specify for the `id`, but you will use this `id` to tell Vega-Embed where to embed the plot.


**3. Call a script that embeds the plot**

Then, at the bottom of `<body>`, place the following script (replace `'chart.json'` with the path to your Altair JSON file):

```
<script>
  var spec = 'chart.json';
  vegaEmbed("#vis", spec);
</script>
```

**NOTE:** For this embedding to work, `spec` must be either a valid URL pointing to the `chart.json` file or the JSON specification itself. Moreover, if `chart.json` specifies a *link* to the underlying data source, this link must be valid as well.
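
The three steps can be assembled from Python as well. The following is a minimal sketch rather than Altair's own exporter: `PAGE_TEMPLATE` and `build_page` are hypothetical names, and the library versions simply mirror the ones listed in step 1. Using `json.dumps` means `spec` may be either a URL string or a specification dictionary.

```python
import json

PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <script src="https://cdn.jsdelivr.net/npm/vega@4"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-lite@3.2.1"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-embed@3"></script>
</head>
<body>
  <div id="{div_id}"></div>
  <script>
    var spec = {spec_js};
    vegaEmbed("#{div_id}", spec);
  </script>
</body>
</html>
"""


def build_page(spec, div_id="vis"):
    """Return an HTML page that embeds `spec` in the <div> with id `div_id`.

    A string is emitted as a quoted JavaScript string (treated as a URL by
    Vega-Embed); a dict is emitted as an inline object literal.
    """
    return PAGE_TEMPLATE.format(div_id=div_id, spec_js=json.dumps(spec))


page = build_page("chart.json")
print(page)
```

In a real page, the other content would sit alongside the `<div>` inside `<body>`.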

## Plotting Geographic Data

### Folium


GitHub: https://github.com/python-visualization/folium

Doc: http://python-visualization.github.io/folium/

In [None]:
import folium
# Named `m` rather than `map` to avoid shadowing Python's built-in map()
m = folium.Map(location=[38.9071923, -77.0368707])
m

In [None]:
location = '1600 Pennsylvania Ave NW, Washington DC'

In [None]:
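import requests

def geocode(address):
    """Return (lat, lon) for the first Nominatim match of `address`."""
    params = {'format': 'json',
              'addressdetails': 1,
              'q': address}
    # Nominatim asks clients to identify themselves; this name is a placeholder.
    headers = {'User-Agent': 'explanatory-viz-notebook'}
    r = requests.get('https://nominatim.openstreetmap.org/search',
                     params=params, headers=headers, timeout=10)
    r.raise_for_status()
    results = r.json()
    return (float(results[0]['lat']), float(results[0]['lon']))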
latlng = geocode(location)
latlng
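
Note that `geocode` takes the first result, so an address with no match (an empty result list) raises an `IndexError`. A defensive variant of the parsing step might look like the sketch below; `first_latlng` is a hypothetical helper, and the sample response is hand-written with made-up coordinates in the shape of a search result (`lat` and `lon` as strings).

```python
def first_latlng(results):
    """Return (lat, lon) from the first search result, or None if no match."""
    if not results:
        return None
    top = results[0]
    # Coordinates arrive as strings, so convert them to floats
    return (float(top["lat"]), float(top["lon"]))


# Hand-written sample in the shape of a search response (values are made up)
sample_response = [
    {"lat": "38.8976", "lon": "-77.0365", "display_name": "Example result"}
]
print(first_latlng(sample_response))
print(first_latlng([]))
```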

In [None]:
# Named `m` rather than `map` to avoid shadowing Python's built-in map()
m = folium.Map(location=latlng, zoom_start=15)
folium.Marker(latlng, popup=folium.Popup(location, parse_html=True)).add_to(m)
folium.CircleMarker(latlng, popup=folium.Popup(location, parse_html=True), radius=5, color='#3186cc', fill_color='#3186cc').add_to(m)
m

**Exercise**: You can add markers to the map programmatically. Try using requests to get lat/long data from an API or adding the location of tweets as they come in.

*Copyright &copy; 2019 The Data Incubator.  All rights reserved.*