In [1]:
# For this notebook you should have the following packages installed: pandas, altair, vega_datasets, matplotlib
# You can uncomment the line below and run this cell to install the required packages.

# !pip install pandas altair vega_datasets matplotlib

In [2]:
# importing libraries once
import altair as alt
import pandas as pd
from vega_datasets import data

In [3]:
# Get all datasets
iris_df = data.iris()
cars_df = data.cars()

# Vega-Altair - Declarative Visualization in Python

__Vega-Altair__ is a declarative visualization library in the Python data visualization ecosystem. We can use it to create interactive visualizations that improve exploratory data analysis.

## Interaction Grammar

The three core components for specifying interactions in __Vega-Lite__, and hence in __Vega-Altair__, are:

- Parameters
- Filters & Conditions
- Widgets

### Parameter

Parameters are the basic building blocks of the __Vega-Altair__ interaction grammar. Parameters in a chart specification are analogous to variables in our Python code.

We can directly declare Python variables to control some aspects of a chart. We will use the Iris flower dataset for our examples.

In [4]:
iris_df.head()

| | sepalLength | sepalWidth | petalLength | petalWidth | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


In [15]:
mark_size = 30

alt.Chart(iris_df).mark_point(size=mark_size).encode(
    x="sepalLength:Q",
    y="sepalWidth:Q",
    color="species:N"
)

When we change the value of the `mark_size` variable and re-run the cell, the chart is redrawn with the new mark size.

Let us repeat the above example using Vega-Altair parameters. We can create a new parameter using `alt.param(value=<initial value>)`.

In [12]:
mark_size_param = alt.param(value=50)
mark_size_param

Parameter('param_3', VariableParameter({
  name: 'param_3',
  value: 50
}))

To use this parameter in a Vega-Altair plot, we call the chart's `add_params` method to register the parameter with the chart. We can then use the parameter anywhere in our specification.

In [16]:
alt.Chart(iris_df).mark_point(size=mark_size_param).encode(
    x="sepalLength:Q",
    y="sepalWidth:Q",
    color="species:N"
).add_params(
    mark_size_param
)

The above approach is overkill for something as simple as reusing a value in multiple places. The real utility of parameters appears when we want to bind something in our chart to user input.

#### Binding parameters

Vega-Altair comes with a series of input widgets that we can use to add interactivity to our charts. The widgets are connected to our Vega-Altair plots through parameters. Here we focus on binding a parameter to a widget; later in the notebook we will look at the different widgets available in Vega-Altair.

We will specify a slider widget, which allows us to select a number within a range.

In [18]:
slider = alt.binding_range(min=1, max=100, step=1)
slider

BindRange({
  input: 'range',
  max: 100,
  min: 1,
  step: 1
})

We can now bind the slider to our parameter and then use the parameter in the plot specification.

In [23]:
slider = alt.binding_range(min=1, max=500, step=1, name="Mark size:  ")
mark_size_param = alt.param(value=30, bind=slider)
mark_size_param

Parameter('param_6', VariableParameter({
  bind: BindRange({
    input: 'range',
    max: 500,
    min: 1,
    name: 'Mark size:  ',
    step: 1
  }),
  name: 'param_6',
  value: 30
}))

In [24]:
alt.Chart(iris_df).mark_point(size=mark_size_param).encode(
    x="sepalLength:Q",
    y="sepalWidth:Q",
    color="species:N"
).add_params(
    mark_size_param
)

The parameters we discussed in the above examples are called variable parameters. They store a value and can be bound to input elements. Now we will look at selection parameters.

### Selection Parameters

Widgets provide a powerful way to add interactivity to our chart. However, widgets don't let us interact directly with the chart. Selection parameters let us query the dataset by directly manipulating the chart, using mouse clicks and even keyboard events. We will look at how to define a selection parameter using an interval selection as an example. We will look at the other types of selections later in the notebook.

In [29]:
selection_param = alt.param(select="interval")
selection_param

Parameter('param_9', SelectionParameter({
  name: 'param_9',
  select: 'interval'
}))

We can also use the `alt.selection_interval` function to create such parameters. Moving forward, we will use the `selection_*` functions rather than the `param(select="interval")` style. Both styles are equivalent, since the `selection_*` functions are just convenience wrappers around `param`.

In [26]:
selection = alt.selection_interval()
selection

Parameter('param_8', SelectionParameter({
  name: 'param_8',
  select: IntervalSelectionConfig({
    type: 'interval'
  })
}))

We can now add the interval selection to our plot.

In [28]:
alt.Chart(iris_df).mark_point().encode(
    x="sepalLength:Q",
    y="sepalWidth:Q",
    color="species:N"
).add_params(
    selection
)

Similar to the `mark_size` parameter earlier, we created a `selection` parameter and bound it to mouse interactions on the chart. However, the selection does nothing useful yet: the chart needs to update in response to it, just as it did for the slider we added earlier. We can achieve this by using selection parameters in conjunction with conditional encodings and/or the filter transforms we discussed earlier.

### Conditional Encodings

We will look at a new way of specifying an encoding in a Vega-Altair plot: conditional encoding. Using the `condition` function, we can specify two different values for a particular encoding, depending on whether the condition is satisfied or not. For example:

```python
encode(
    color=alt.condition(predicate, alt.value("red"), alt.value("blue"))
)
```

Here we specified that if `predicate` evaluates to true for a data point, its color should be red; otherwise it should be blue. One possible `predicate` is a selection parameter. When we use a selection parameter as the predicate, the _true_ condition is met for points that lie within the selection. We will update our interactive plot to show species colors only for points within the rectangular brush.

In [40]:
selection = alt.selection_interval()

alt.Chart(iris_df).mark_point().encode(
    x="sepalLength:Q",
    y="sepalWidth:Q",
    color=alt.condition(selection, "species:N", alt.value("gray"))
).add_params(
    selection
)

We can use chart composition to create multiple views that are linked together by a brush. We will use a scatterplot matrix (SPLOM) as an example of linked multiple views.

In [48]:
cols = ["sepalLength", "petalLength", "sepalWidth", "petalWidth"]

alt.Chart(iris_df, width=200, height=200).mark_point().encode(
    x=alt.X(alt.repeat("row"), type="quantitative"),
    y=alt.Y(alt.repeat("column"), type="quantitative"),
    color="species:N", 
    tooltip="species:N"
).repeat(
    row=cols,
    column=cols
)

In a static SPLOM, it is very hard to follow a single point across charts. We can use interactivity to select points in one chart and have them highlighted in the others.

In [54]:
cols = ["sepalLength", "petalLength", "sepalWidth", "petalWidth"]

selection = alt.selection_interval()

alt.Chart(iris_df, width=200, height=200).mark_point().encode(
    x=alt.X(alt.repeat("row"), type="quantitative"),
    y=alt.Y(alt.repeat("column"), type="quantitative"),
    color=alt.condition(selection, "species:N", alt.value("gray")), 
    opacity=alt.condition(selection, alt.value(0.7), alt.value(0.1)), 
    tooltip="species:N"
).add_params(
    selection
).repeat(
    row=cols,
    column=cols
)

### Interactive Filtering

Similar to conditional encoding, we can use selection parameters as predicates for our filter transforms. We will recreate the example from the very beginning of our previous lecture, which showed a composite chart with a scatterplot and a bar chart for the _cars_ dataset. The scatterplot is interactive and controls what data is shown in the bar chart. We will first create the static composite plot.

In [55]:
df = data.cars()

base_plot = alt.Chart(df)

scatterplot = base_plot.mark_point().encode(
    x="Miles_per_Gallon:Q",
    y="Weight_in_lbs:Q",
    color="Origin:N",    
)

histogram = base_plot.mark_bar().encode(
    y="Origin:N",
    color="Origin:N",
    x="count():Q",
)

scatterplot & histogram

We will now add a selection parameter so that brushing the scatterplot filters the bar chart.

Combining parameters and selections, bound to various chart properties and encodings, with chart composition lets us build dynamic charts that make data exploration easy.

In [56]:
df = data.cars()

brush_selection = alt.selection_interval()

base_plot = alt.Chart(df)

scatterplot = base_plot.mark_point().encode(
    x="Miles_per_Gallon:Q",
    y="Weight_in_lbs:Q",
    color=alt.condition(brush_selection, "Origin:N", alt.value("gray")),
    opacity=alt.condition(brush_selection, alt.value(0.7), alt.value(0.3))
    
).add_params(
    brush_selection
)

histogram = base_plot.mark_bar().encode(
    y="Origin:N",
    color="Origin:N",
    x="count():Q",
).transform_filter(
    brush_selection
)

scatterplot & histogram