### Resources

In [1]:
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
import altair as alt

import ipywidgets as widgets
from ipywidgets import interact


### Load Vega Datasets

To list all the available datasets, use list_datasets:

In [2]:
from vega_datasets import data
data.list_datasets()

['7zip',
 'airports',
 'annual-precip',
 'anscombe',
 'barley',
 'birdstrikes',
 'budget',
 'budgets',
 'burtin',
 'cars',
 'climate',
 'co2-concentration',
 'countries',
 'crimea',
 'disasters',
 'driving',
 'earthquakes',
 'ffox',
 'flare',
 'flare-dependencies',
 'flights-10k',
 'flights-200k',
 'flights-20k',
 'flights-2k',
 'flights-3m',
 'flights-5k',
 'flights-airport',
 'gapminder',
 'gapminder-health-income',
 'gimp',
 'github',
 'graticule',
 'income',
 'iowa-electricity',
 'iris',
 'jobs',
 'la-riots',
 'londonBoroughs',
 'londonCentroids',
 'londonTubeLines',
 'lookup_groups',
 'lookup_people',
 'miserables',
 'monarchs',
 'movies',
 'normal-2d',
 'obesity',
 'ohlc',
 'points',
 'population',
 'population_engineers_hurricanes',
 'seattle-temps',
 'seattle-weather',
 'sf-temps',
 'sp500',
 'stocks',
 'udistrict',
 'unemployment',
 'unemployment-across-industries',
 'uniform-2d',
 'us-10m',
 'us-employment',
 'us-state-capitals',
 'volcano',
 'weather',
 'weball26',
 'wheat',

### Create a Dummy Dataset

To build the visual, we don't really need data from Power BI. We can use some dummy data to build the visual, which can then be transferred to Deneb. I will use Pandas for creating the dataframe.

The cell below creates a dummy dataframe with columns X, Y, and Z and 50 observations.

In [3]:
df = pd.DataFrame({
    'X': range(50),
    'Y': np.random.rand(50).cumsum(),
    'Z': np.random.rand(50)*100
}
).round(0)
    
df.head()


   X    Y     Z
0  0  1.0  45.0
1  1  1.0   7.0
2  2  2.0  90.0
3  3  2.0  73.0
4  4  3.0  98.0
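
Because np.random.rand is not seeded, the dummy values change on every run. If you want the same dataframe each time, one option is a seeded generator. This is a minimal sketch: the seed 42 is an arbitrary choice and is not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Seeded generator so the dummy data is identical on every run.
# The seed value 42 is an arbitrary assumption for this sketch.
rng = np.random.default_rng(42)

df = pd.DataFrame({
    'X': range(50),                  # 0..49
    'Y': rng.random(50).cumsum(),    # running total, never decreases
    'Z': rng.random(50) * 100        # values between 0 and 100
}).round(0)

print(df.head())
```

The columns have the same meaning as in the cell above, so the rest of the notebook should work unchanged with this dataframe.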


### Building Base Selections

I will build two visuals and show how to combine them into a composite visual. Try interacting with the visuals by zooming and hovering. The code below is annotated in case you are not familiar with Altair; even if you have never used it, you should be able to follow along. The code we generate from the visual can be applied to any data in Power BI.

In this example, we want to build an interactive, composite visual to analyze multivariate data. The bar chart shows X vs. Z and the scatter plot shows X vs. Y. We are interested in analyzing the Z variable. By adding interactivity, we can see how Z behaves in both the bar chart (X vs. Z) and the scatter plot (X vs. Y, colored by Z) in a single visual.

First I will create the base visuals with conditional formatting and then show how to add the widget.

#### Bar Chart

In [4]:
# Create a bar chart using the df dataframe
bar = (alt.Chart(df).mark_bar().encode(
    x='X',                             # x axis shows X
    y='Z',                             # y axis shows Z
    tooltip=['X', 'Y', 'Z'],           # add a tooltip
    color=alt.condition(
        alt.datum.Z < 20,              # conditional formatting, threshold 20
        # if Z < 20, blue; otherwise gray
        alt.value('#477998'), alt.value('hsla(232, 7%, 20%, 0.25)')
    )
).properties(width=1024, height=512, title="Bar Chart"))

text = bar.mark_text(
    align='right',       # horizontal alignment of the label
    baseline='middle',   # vertical alignment of the label
    dx=3, dy=-5          # x and y offsets of the label
).encode(
    text='Z:Q',          # label each bar with its Z value
    color=alt.condition(
        alt.datum.Z < 20,   # conditional formatting, threshold 20
        # only show the label if Z < 20; the final 0 is the alpha (fully transparent)
        alt.value('#477998'), alt.value('hsla(232, 7%, 20%, 0)'))
)

# layer the text labels on top of the bar chart
bartext = (bar + text)
bartext


#### Scatter Plot

In [5]:
# create a scatterplot using df dataframe
scatter = (alt.Chart(df).mark_circle().encode(
    x='X',      # x axis is X
    y='Y',      # y axis uses Y
    tooltip=['X','Y','Z'], 
    color=alt.condition(
        alt.datum.Z < 20,
        alt.value('red'), alt.value('hsla(232, 7%, 20%, 0.25)')
    )
).properties(width=1024, height=512, title="Scatterplot"))

# display the scatter plot
scatter


Let's combine these two to make a composite chart. This is a single visual now with two chart types.

In [6]:
bartext | scatter

A few things to notice in the code above:

* For the bar chart, I used alt.condition so that bars with Z below 20 are blue and all other bars are gray.
* I added the text as another chart layered on top of the bar chart. This allows us to create data-driven labels. In the options, notice that I defined the fallback text color as hsla(232, 7%, 20%, 0). The last value, 0, is the alpha, which controls the transparency. If Z is 20 or more, the label becomes fully transparent, so only values below 20 appear.
* For the scatter plot, points with Z below 20 are red and all other points are gray.
* I combined the two charts side by side using the | operator.

In the base visuals, I hard-coded the color threshold of 20. Now we want to add a slicer widget so the user can control that threshold. To do that, we have to define a selector and bind it to the visuals above.

Below, I define the minimum and maximum of the slicer range, the name of the slicer, and its default value.

### Building Complex Selections

Selection values can be accessed directly and used in expressions that affect the chart. For example, here we create a slider to choose a cutoff value, and color points based on whether they are smaller or larger than the value:

In [7]:
# nbi:hide_in
rand = np.random.RandomState(42)

df = pd.DataFrame({
    'xval': range(100),
    'yval': rand.randn(100).cumsum()
})

slider = alt.binding_range(min=0, max=100, step=1, name='CutOff:')
selector = alt.selection_single(
    name="SelectorName", fields=['cutoff'],
    bind=slider, init={'cutoff': 50})

alt.Chart(df).mark_point().encode(
    x='xval',
    y='yval',
    color=alt.condition(
        alt.datum.xval < selector.cutoff,
        alt.value('red'), alt.value('blue')
    )
).add_selection(
    selector
).interactive().properties(width=1024)
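
To make the condition concrete, here is a plain pandas sketch of what the color rule evaluates to for a given slider position. The helper colors_for is hypothetical and only mimics the comparison xval < cutoff; in the real chart the comparison is evaluated in the browser each time the slider moves.

```python
import numpy as np
import pandas as pd

# Same data as the cell above
rand = np.random.RandomState(42)
df = pd.DataFrame({
    'xval': range(100),
    'yval': rand.randn(100).cumsum()
})

def colors_for(cutoff):
    """Hypothetical helper: color each row the way the chart condition would."""
    return np.where(df['xval'] < cutoff, 'red', 'blue')

# At the initial slider value of 50, the points with xval 0..49 are red
at_default = colors_for(50)
print((at_default == 'red').sum())   # 50
```

Moving the slider only changes the cutoff passed into the comparison; the data itself is never filtered.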


#### Interactive Average

The plot below uses an interval selection, which causes the chart to include an interactive brush (shown in grey). The brush selection parameterizes the red guideline, which visualizes the average value within the selected interval.

In [8]:
source = data.seattle_weather()
brush = alt.selection(type='interval', encodings=['x'])

bars = alt.Chart().mark_bar().encode(
    x='month(date):O',
    y='mean(precipitation):Q',
    opacity=alt.condition(brush, alt.OpacityValue(1), alt.OpacityValue(0.7)),
).add_selection(
    brush
)

line = alt.Chart().mark_rule(color='firebrick').encode(
    y='mean(precipitation):Q',
    size=alt.SizeValue(3)
).transform_filter(
    brush
)

alt.layer(bars, line, data=source)
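
The pandas sketch below mimics the two calculations in this chart on a small made-up table (not the Seattle weather data): the bar heights are per-month means, while the rule is the mean over all rows that fall inside the brush. The brushed months are an assumed selection.

```python
import pandas as pd

# Small synthetic stand-in for the weather data (values are made up)
source = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-05', '2020-01-20', '2020-02-10',
                            '2020-03-15', '2020-03-25', '2020-04-02']),
    'precipitation': [4.0, 6.0, 5.0, 1.0, 3.0, 8.0],
})
source['month'] = source['date'].dt.month

# Height of each bar: mean precipitation per month
bar_heights = source.groupby('month')['precipitation'].mean()

# Suppose the brush covers February and March (an assumed selection)
brushed_months = [2, 3]
in_brush = source['month'].isin(brushed_months)

# Height of the rule: mean over the individual rows inside the brush
rule_height = source.loc[in_brush, 'precipitation'].mean()

print(bar_heights.to_dict())
print(rule_height)
```

Note that the rule is computed from the filtered rows, so it is not necessarily the average of the brushed bar heights: here the bars for February and March are 5.0 and 2.0, while the rule sits at 3.0.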

#### Interactive Chart with Cross-Highlight

This example shows an interactive chart where selections in one portion of the chart affect what is shown in other panels. Click on the bar chart to see a detail of the distribution in the upper panel.

In [9]:
source = data.movies.url

pts = alt.selection(type="single", encodings=['x'])

rect = alt.Chart(source).mark_rect().encode(
    alt.X('IMDB_Rating:Q', bin=True),
    alt.Y('Rotten_Tomatoes_Rating:Q', bin=True),
    alt.Color('count()',
        scale=alt.Scale(scheme='greenblue'),
        legend=alt.Legend(title='Total Records')
    )
)

circ = rect.mark_point().encode(
    alt.ColorValue('grey'),
    alt.Size('count()',
        legend=alt.Legend(title='Records in Selection')
    )
).transform_filter(
    pts
)

bar = alt.Chart(source).mark_bar().encode(
    x='Major_Genre:N',
    y='count()',
    color=alt.condition(pts, alt.ColorValue("steelblue"), alt.ColorValue("grey"))
).properties(
    width=550,
    height=200
).add_selection(pts)

alt.vconcat(
    rect + circ,
    bar
).resolve_legend(
    color="independent",
    size="independent"
)


#### Interactive Crossfilter

This example shows a multi-panel view of the same data, where you can interactively select a portion of the data in any of the panels to highlight that portion in any of the other panels.

In [10]:
source = alt.UrlData(
    data.flights_2k.url,
    format={'parse': {'date': 'date'}}
)

brush = alt.selection(type='interval', encodings=['x'])

# Define the base chart, with the common parts of the
# background and highlights
base = alt.Chart().mark_bar().encode(
    x=alt.X(alt.repeat('column'), type='quantitative', bin=alt.Bin(maxbins=20)),
    y='count()'
).properties(
    width=160,
    height=130
)

# gray background with selection
background = base.encode(
    color=alt.value('#ddd')
).add_selection(brush)

# blue highlights on the transformed data
highlight = base.transform_filter(brush)

# layer the two charts & repeat
alt.layer(
    background,
    highlight,
    data=source
).transform_calculate(
    "time",
    "hours(datum.date)"
).repeat(column=["distance", "delay", "time"])

#### Interactive Legend

The following shows how to create a chart with an interactive legend, by binding the selection to "legend". Such a binding only works with selection_single or selection_multi when projected over a single field or encoding.



In [11]:
source = data.unemployment_across_industries.url

selection = alt.selection_multi(fields=['series'], bind='legend')

alt.Chart(source).mark_area().encode(
    alt.X('yearmonth(date):T', axis=alt.Axis(domain=False, format='%Y', tickSize=0)),
    alt.Y('sum(count):Q', stack='center', axis=None),
    alt.Color('series:N', scale=alt.Scale(scheme='category20b')),
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
).add_selection(
    selection
)

#### Interactive Rectangular Brush

This example shows how to add a simple rectangular brush to a scatter plot. By clicking and dragging on the plot, you can highlight points within the range.

In [12]:
source = data.cars()
brush = alt.selection(type='interval')

alt.Chart(source).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=alt.condition(brush, 'Cylinders:O', alt.value('grey')),
).add_selection(brush)

#### Interactive Scatter Plot and Linked Layered Histogram

This example shows how to link a scatter plot and a histogram together such that clicking on a point in the scatter plot will isolate the distribution corresponding to that point, and vice versa.

In [13]:
# generate fake data
source = pd.DataFrame({'gender': ['M']*1000 + ['F']*1000,
               'height':np.concatenate((np.random.normal(69, 7, 1000),
                                       np.random.normal(64, 6, 1000))),
               'weight': np.concatenate((np.random.normal(195.8, 144, 1000),
                                        np.random.normal(167, 100, 1000))),
               'age': np.concatenate((np.random.normal(45, 8, 1000),
                                        np.random.normal(51, 6, 1000)))
        })

selector = alt.selection_single(empty='all', fields=['gender'])

color_scale = alt.Scale(domain=['M', 'F'],
                        range=['#1FC3AA', '#8624F5'])

base = alt.Chart(source).properties(
    width=250,
    height=250
).add_selection(selector)

points = base.mark_point(filled=True, size=200).encode(
    x=alt.X('mean(height):Q',
            scale=alt.Scale(domain=[0,84])),
    y=alt.Y('mean(weight):Q',
            scale=alt.Scale(domain=[0,250])),
    color=alt.condition(selector,
                        'gender:N',
                        alt.value('lightgray'),
                        scale=color_scale),
)

hists = base.mark_bar(opacity=0.5, thickness=100).encode(
    x=alt.X('age',
            bin=alt.Bin(step=5), # step keeps bin size the same
            scale=alt.Scale(domain=[0,100])),
    y=alt.Y('count()',
            stack=None,
            scale=alt.Scale(domain=[0,350])),
    color=alt.Color('gender:N',
                    scale=color_scale)
).transform_filter(
    selector
)

points | hists


#### Multi-Line Highlight

This multi-line chart uses an invisible Voronoi tessellation to handle mouseover to identify the nearest point and then highlight the line on which the point falls.

In [14]:
source = data.stocks()

highlight = alt.selection(type='single', on='mouseover',
                          fields=['symbol'], nearest=True)

base = alt.Chart(source).encode(
    x='date:T',
    y='price:Q',
    color='symbol:N'
)

points = base.mark_circle().encode(
    opacity=alt.value(0)
).add_selection(
    highlight
).properties(
    width=600
)

lines = base.mark_line().encode(
    size=alt.condition(~highlight, alt.value(1), alt.value(3))
)

points + lines
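
As a rough illustration of what the nearest-point lookup does, the sketch below finds the data point closest to a cursor position and widens the matching line. The data, the helper nearest_symbol, and the cursor coordinates are all made up for this example, and distances are measured in data units, whereas the chart works in screen pixels.

```python
import numpy as np
import pandas as pd

# Made-up prices for two symbols (not the real stocks dataset)
source = pd.DataFrame({
    'symbol': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB', 'BBB'],
    'x': [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
    'price': [10.0, 12.0, 11.0, 30.0, 29.0, 33.0],
})

def nearest_symbol(cursor_x, cursor_y):
    """Hypothetical helper: symbol of the data point closest to the cursor."""
    dist = np.hypot(source['x'] - cursor_x, source['price'] - cursor_y)
    return source.loc[dist.idxmin(), 'symbol']

# A cursor hovering near the second line
highlighted = nearest_symbol(1.2, 27.0)

# Same rule as the size condition: 3 for the highlighted line, 1 for the others
widths = np.where(source['symbol'] == highlighted, 3, 1)
print(highlighted, widths.tolist())
```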

#### Multi-Line Tooltip

This example shows how you can use selections and layers to create a tooltip-like behavior tied to the x position of the cursor. If you are looking for more standard tooltips, it is recommended to use the tooltip encoding channel as shown in the Scatter Plot With Tooltips example.

The following example employs a little trick to isolate the x-position of the cursor: we add some transparent points with only an x encoding (no y encoding) and tie a nearest selection to these, tied to the “x” field.



In [15]:
np.random.seed(42)
source = pd.DataFrame(np.cumsum(np.random.randn(100, 3), 0).round(2),
                    columns=['A', 'B', 'C'], index=pd.RangeIndex(100, name='x'))
source = source.reset_index().melt('x', var_name='category', value_name='y')

# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['x'], empty='none')

# The basic line
line = alt.Chart(source).mark_line(interpolate='basis').encode(
    x='x:Q',
    y='y:Q',
    color='category:N'
)

# Transparent selectors across the chart. This is what tells us
# the x-value of the cursor
selectors = alt.Chart(source).mark_point().encode(
    x='x:Q',
    opacity=alt.value(0),
).add_selection(
    nearest
)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'y:Q', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(source).mark_rule(color='gray').encode(
    x='x:Q',
).transform_filter(
    nearest
)

# Put the five layers into a chart and bind the data
alt.layer(
    line, selectors, points, rules, text
).properties(
    width=600, height=300
)
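
The trick relies on the long-form layout of the data: every x value appears once per category, so selecting a single x yields one row for each line. The sketch below rebuilds the same dataframe and picks the rows nearest to an assumed cursor position; rows_at_cursor is a hypothetical helper used only for illustration.

```python
import numpy as np
import pandas as pd

# Same data preparation as the cell above
np.random.seed(42)
wide = pd.DataFrame(np.cumsum(np.random.randn(100, 3), 0).round(2),
                    columns=['A', 'B', 'C'],
                    index=pd.RangeIndex(100, name='x'))
source = wide.reset_index().melt('x', var_name='category', value_name='y')

def rows_at_cursor(cursor_x):
    """Hypothetical helper: rows whose x value is nearest to the cursor."""
    position = (source['x'] - cursor_x).abs().argmin()
    nearest_x = source['x'].iloc[position]
    return source[source['x'] == nearest_x]

# A cursor at x = 41.7 snaps to x = 42 and returns one row per category
picked = rows_at_cursor(41.7)
print(picked)
```

These are the rows that the points, text, and rule layers draw once the selection is active.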

#### Multi-panel Scatter Plot with Linked Brushing

This is an example of using an interval selection to control the color of points across multiple panels.

In [16]:
source = data.cars()

brush = alt.selection(type='interval', resolve='global')

base = alt.Chart(source).mark_point().encode(
    y='Miles_per_Gallon',
    color=alt.condition(brush, 'Origin', alt.ColorValue('gray')),
).add_selection(
    brush
).properties(
    width=250,
    height=250
)

base.encode(x='Horsepower') | base.encode(x='Acceleration')
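
One way to think about linked brushing is that the brush produces a row-level mask, and every panel colors its points from that same mask. The pandas sketch below illustrates the idea with four made-up rows and an assumed brush drawn in the Horsepower panel.

```python
import pandas as pd

# Made-up rows standing in for the cars dataset
cars = pd.DataFrame({
    'Horsepower': [60, 90, 130, 200],
    'Acceleration': [20.0, 16.0, 12.0, 9.0],
    'Miles_per_Gallon': [35.0, 28.0, 18.0, 12.0],
    'Origin': ['Japan', 'Europe', 'USA', 'USA'],
})

# Assumed brush drawn in the Horsepower panel: an x range and a y range
hp_range = (80, 150)
mpg_range = (15, 30)

# A row is selected when it falls inside the rectangle
selected = (cars['Horsepower'].between(*hp_range)
            & cars['Miles_per_Gallon'].between(*mpg_range))

# The same mask decides the color in every panel: Origin if selected, gray otherwise
cars['color'] = cars['Origin'].where(selected, 'gray')
print(cars[['Horsepower', 'Acceleration', 'color']])
```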

#### Multiple Interactions

This example shows how multiple user inputs can be layered onto a chart. The four inputs have functionality as follows:

* Slider: Filters the movies by release year
* Dropdown: Filters the movies by genre
* Radio Buttons: Highlights films by MPAA rating
* Checkbox: Switches the point size between a fixed value and a size based on whether the production budget exceeds 100 million



In [17]:
movies = alt.UrlData(
    data.movies.url,
    format=alt.DataFormat(parse={"Release_Date":"date"})
)
ratings = ['G', 'NC-17', 'PG', 'PG-13', 'R']
genres = ['Action', 'Adventure', 'Black Comedy', 'Comedy',
       'Concert/Performance', 'Documentary', 'Drama', 'Horror', 'Musical',
       'Romantic Comedy', 'Thriller/Suspense', 'Western']

base = alt.Chart(movies, width=200, height=200).mark_point(filled=True).transform_calculate(
    Rounded_IMDB_Rating = "floor(datum.IMDB_Rating)",
    Hundred_Million_Production =  "datum.Production_Budget > 100000000.0 ? 100 : 10",
    Release_Year = "year(datum.Release_Date)"
).transform_filter(
    alt.datum.IMDB_Rating > 0
).transform_filter(
    alt.FieldOneOfPredicate(field='MPAA_Rating', oneOf=ratings)
).encode(
    x=alt.X('Worldwide_Gross:Q', scale=alt.Scale(domain=(100000,10**9), clamp=True)),
    y='IMDB_Rating:Q',
    tooltip="Title:N"
)

# A slider filter
year_slider = alt.binding_range(min=1969, max=2018, step=1)
slider_selection = alt.selection_single(bind=year_slider, fields=['Release_Year'], name="Release Year_")


filter_year = base.add_selection(
    slider_selection
).transform_filter(
    slider_selection
).properties(title="Slider Filtering")

# A dropdown filter
genre_dropdown = alt.binding_select(options=genres)
genre_select = alt.selection_single(fields=['Major_Genre'], bind=genre_dropdown, name='Genre')

filter_genres = base.add_selection(
    genre_select
).transform_filter(
    genre_select
).properties(title="Dropdown Filtering")

#color changing marks
rating_radio = alt.binding_radio(options=ratings)

rating_select = alt.selection_single(fields=['MPAA_Rating'], bind=rating_radio, name='Rating')
rating_color_condition = alt.condition(rating_select,
                      alt.Color('MPAA_Rating:N', legend=None),
                      alt.value('lightgray'))

highlight_ratings = base.add_selection(
    rating_select
).encode(
    color=rating_color_condition
).properties(title="Radio Button Highlighting")

# Boolean selection for format changes
input_checkbox = alt.binding_checkbox()
checkbox_selection = alt.selection_single(bind=input_checkbox, name='Budget Films')

size_checkbox_condition = alt.condition(checkbox_selection,
                                        alt.SizeValue(25),
                                        alt.Size('Hundred_Million_Production:Q')
                                       )

budget_sizing = base.add_selection(
    checkbox_selection
).encode(
    size=size_checkbox_condition
).properties(title="Checkbox Formatting")

( filter_year | filter_genres) & (highlight_ratings | budget_sizing  )
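
The three fields created by transform_calculate are Vega expressions evaluated in the browser. For readers more comfortable with pandas, the sketch below shows roughly equivalent calculations on three made-up films, followed by the kind of filter the slider applies once it has a value. The slider value 2008 is an assumption for the example.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the movies dataset
movies = pd.DataFrame({
    'Title': ['Film A', 'Film B', 'Film C'],
    'IMDB_Rating': [7.8, 5.2, 6.9],
    'Production_Budget': [150_000_000, 20_000_000, 100_000_000],
    'Release_Date': pd.to_datetime(['1999-06-01', '2008-11-21', '2015-03-13']),
})

# floor(datum.IMDB_Rating)
movies['Rounded_IMDB_Rating'] = np.floor(movies['IMDB_Rating'])

# datum.Production_Budget > 100000000.0 ? 100 : 10
movies['Hundred_Million_Production'] = np.where(
    movies['Production_Budget'] > 100_000_000.0, 100, 10)

# year(datum.Release_Date)
movies['Release_Year'] = movies['Release_Date'].dt.year

# Assumed slider position: keep only films released in that year
slider_value = 2008
filtered = movies[movies['Release_Year'] == slider_value]
print(filtered['Title'].tolist())
```

The budget comparison is strict, so a film with a budget of exactly 100 million falls into the smaller size class.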


#### Scatter Plot and Histogram with Interval Selection

This example shows how to link a scatter plot and a histogram together such that an interval selection in the histogram will plot the selected values in the scatter plot.

Note that both subplots need to know about the mbin field created by the transform_bin method. In order to achieve this, the data is not passed to the Chart() instances creating the subplots, but directly in the hconcat() function, which joins the two plots together.

In [18]:
x = np.random.normal(size=100)
y = np.random.normal(size=100)

m = np.random.normal(15, 1, size=100)

source = pd.DataFrame({"x": x, "y":y, "m":m})

# interval selection, attached to the histogram below
pts = alt.selection(type="interval", encodings=["x"])

# left panel: scatter plot
points = alt.Chart().mark_point(filled=True, color="#424242").encode(
    x='x',
    y='y'
).transform_filter(
    pts
).properties(
    width=300,
    height=300
)

# right panel: histogram
mag = alt.Chart().mark_bar().encode(
    x='mbin:N',
    y="count()",
    color=alt.condition(pts, alt.value("#616161"), alt.value("#BDBDBD"))
).properties(
    width=300,
    height=300
).add_selection(pts)

# build the chart:
alt.hconcat(
    points,
    mag,
    data=source
).transform_bin(
    "mbin",
    field="m",
    bin=alt.Bin(maxbins=20)
)
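
The reason both panels need the mbin field can be seen in a pandas sketch: the histogram counts rows per bin, and the scatter plot keeps only the rows whose bin falls inside the brush. The bin width of 0.5, the seed, and the brushed range are assumptions made for this illustration; Vega-Lite chooses its own bin step from maxbins.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # arbitrary seed for this sketch
source = pd.DataFrame({
    'x': rng.normal(size=100),
    'y': rng.normal(size=100),
    'm': rng.normal(15, 1, size=100),
})

# Computed field shared by both panels: the start of the bin each row falls in
step = 0.5                       # assumed bin width
source['mbin'] = np.floor(source['m'] / step) * step

# Histogram panel: number of rows per bin
counts = source.groupby('mbin').size()

# Assumed brush covering the bins that start at 14.5, 15.0, and 15.5
brushed = source['mbin'].between(14.5, 15.5)

# Scatter panel: only the rows inside the brushed bins are drawn
scatter_rows = source.loc[brushed, ['x', 'y']]
print(len(scatter_rows), 'of', len(source), 'points shown')
```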


#### Selection Detail Example

This example shows a selection that links two views of data: the left panel contains one point per object, and the right panel contains one line per object. Clicking on either the points or lines will select the corresponding objects in both views of the data.

The challenge lies in expressing such hierarchical data in a way that Altair can handle. We do this by merging the data into a “long form” dataframe, and aggregating identical metadata for the final plot.

In [19]:
np.random.seed(0)

n_objects = 20
n_times = 50

# Create one (x, y) pair of metadata per object
locations = pd.DataFrame({
    'id': range(n_objects),
    'x': np.random.randn(n_objects),
    'y': np.random.randn(n_objects)
})

# Create a 50-element time-series for each object
timeseries = pd.DataFrame(np.random.randn(n_times, n_objects).cumsum(0),
                          columns=locations['id'],
                          index=pd.RangeIndex(0, n_times, name='time'))

# Melt the wide-form timeseries into a long-form view
timeseries = timeseries.reset_index().melt('time')

# Merge the (x, y) metadata into the long-form view
timeseries['id'] = timeseries['id'].astype(int)  # make merge not complain
# named merged so that it does not shadow the data object imported from vega_datasets
merged = pd.merge(timeseries, locations, on='id')

# Data is prepared, now make a chart

selector = alt.selection_single(empty='all', fields=['id'])

base = alt.Chart(merged).properties(
    width=250,
    height=250
).add_selection(selector)

points = base.mark_point(filled=True, size=200).encode(
    x='mean(x)',
    y='mean(y)',
    color=alt.condition(selector, 'id:O', alt.value('lightgray'), legend=None),
)

timeseries = base.mark_line().encode(
    x='time',
    y=alt.Y('value', scale=alt.Scale(domain=(-15, 15))),
    color=alt.Color('id:O', legend=None)
).transform_filter(
    selector
)

points | timeseries
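
To see why aggregating with mean(x) and mean(y) recovers one point per object, the sketch below repeats the data preparation with smaller sizes (3 objects and 4 time steps, chosen for readability) and checks that the per-id mean of the repeated metadata equals the original location. The names long_form, merged, and recovered are used only in this sketch.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
n_objects, n_times = 3, 4   # smaller than the example above, for readability

# One (x, y) pair of metadata per object
locations = pd.DataFrame({
    'id': range(n_objects),
    'x': np.random.randn(n_objects),
    'y': np.random.randn(n_objects),
})

# One short time series per object, in wide form
wide = pd.DataFrame(np.random.randn(n_times, n_objects).cumsum(0),
                    columns=locations['id'],
                    index=pd.RangeIndex(0, n_times, name='time'))

# Long form: one row per (time, id), with the metadata repeated on every row
long_form = wide.reset_index().melt('time')
long_form['id'] = long_form['id'].astype(int)
merged = pd.merge(long_form, locations, on='id')

# Averaging identical x and y values per id gives back the original locations
recovered = merged.groupby('id')[['x', 'y']].mean().reset_index()
print(merged.shape)
print(recovered)
```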