# Interactive Plotting with Altair

Static visualizations are useful, but they have their limitations. For example, it is cumbersome to work out which data point an outlier corresponds to in a static chart.

Several interactive plotting libraries have emerged in recent years, in particular Bokeh, plot.ly, and most recently Altair.

Altair is more declarative; by that I mean that the mapping from data to visuals is more natural. There is a specific grammar for composing charts, and with it you can go far quickly. Note that Altair is a Python library that is, in essence, a builder for Vega/Vega-Lite JSON specifications.

Visualizations are composed of marks (points, lines, shapes), with data mapped to visual properties such as position, size, and opacity.


In [None]:
import altair as alt
import pandas as pd

In [None]:
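# 'notebook' targets the classic Jupyter Notebook in older Altair releases;
# JupyterLab works with the default renderer.
alt.renderers.enable('notebook')
# max_rows=None lifts Altair's default 5000-row limit on embedded data.
alt.data_transformers.enable('default', max_rows=None)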

In [None]:
fifa = pd.read_csv('../data/fifa_player_data.csv.gz')

In [None]:
# Collapse detailed positions into broad groups; anything not listed (e.g. GK) is kept as-is.
mapping = {'RCB': 'DEF', 'LCB': 'DEF', 'CB': 'DEF',
           'LB': 'DEF', 'RB': 'DEF',
           'LWB': 'DEF', 'RWB': 'DEF',
           'CDM': 'MID', 'RM': 'MID',
           'LCM': 'MID', 'RCM': 'MID', 'LM': 'MID', 'CAM': 'MID',
           'LDM': 'MID', 'RDM': 'MID',
           'LAM': 'MID', 'RAM': 'MID',
           'CM': 'MID', 'LW': 'MID', 'RW': 'MID',
           'LS': 'ST', 'RS': 'ST', 'RF': 'ST', 'LF': 'ST', 'CF': 'ST'
          }
fifa['GeneralPosition'] = fifa.Position.apply(lambda x: mapping.get(x, x))

In [None]:
fifa = fifa[~fifa.GeneralPosition.isna()]

## Looking at 1D distributions

In [None]:
sample = fifa.sample(5000)

Creating a chart in Altair is pretty simple.

Let's take this task:

> **We want to view the distribution of player Accelerations on a line so that we can see where individual players sit**

Since it's just one line, we only need a single x encoding: each player's Acceleration is plotted along that line using some visual mark (e.g. a circle or a tick).

In [None]:
alt.Chart(sample).mark_point().encode(
    x='Acceleration',
)

We can improve this by lowering the opacity of the points, so that dense regions appear darker and we get a better idea of the distribution.

In [None]:
alt.Chart(sample).mark_point(opacity=0.05).encode(
    x='Acceleration',
)

The point can be changed to a tick (a short line).

In [None]:
alt.Chart(sample).mark_tick(
    opacity=0.03, 
    thickness=5).encode(
    x='Acceleration',
)

Or, just as easily, to a filled circle.

In [None]:
alt.Chart(sample).mark_circle(opacity=0.03).encode(
    x='Acceleration',
)

Our users have now come back with the following task. 

> **We want to view the distribution of player Accelerations for each position**


In [None]:
alt.Chart(sample).mark_tick(opacity=0.1, thickness=5).encode(
    x='Acceleration',
    y='Position'
)

## Adding Interactivity

### Zooming

In [None]:
alt.Chart(sample).mark_tick(opacity=0.1, thickness=5).encode(
    x='Acceleration',
    y='Position'
).interactive()

### Tooltips

In [None]:
alt.Chart(sample).mark_tick(opacity=0.1, thickness=5).encode(
    x='Acceleration',
    y='Position',
    tooltip=['Name', 'Nationality']
)

## Customising Plots

Similar to what we did in Part 2, we can change how the chart looks in terms of colours, axes, grids, etc.

In [None]:
alt.Chart(sample).mark_tick(opacity=0.1, thickness=5).encode(
    x='Acceleration',
    y='Position',
    tooltip=['Name', 'Nationality']
).configure_axis(
    grid=False
).configure_view(
    strokeWidth=0
)

Of course, a box plot would be a better way to show these distributions.

In [None]:
alt.Chart(sample).mark_boxplot().encode(
    x='Acceleration',
    y='Position',
    color='GeneralPosition'
)

Our users have now come back with the following task. 

> **We want to view the distribution of player Accelerations for each position broken down by Nationality**


In [None]:
alt.Chart(sample).mark_boxplot().encode(
    x='Acceleration',
    y='Nationality',
    color='GeneralPosition'
)

In [None]:
alt.Chart(sample).mark_circle().encode(
    x='Acceleration',
    y=alt.Y('Nationality', sort=alt.EncodingSortField(field='Acceleration', op='max', order='descending')),
    color='GeneralPosition',
    tooltip=['Name']
)

## Histograms

### One Group

In [None]:
alt.Chart(fifa.sample(1000)).mark_bar().encode(
    x='Nationality',
    y='average(Acceleration)'
)

#### Sorting Values

But, as shown in our lectures, sorting makes comparison easier. How does the chart look if we sort the nationalities by average Acceleration?

In [None]:
alt.Chart(fifa.sample(1000)).mark_bar().encode(
    x=alt.X('Nationality', sort=alt.EncodingSortField(field='Acceleration', op='average', order='descending')),
    y='average(Acceleration)'
)

### Comparing Distributions

#### Superimposed

In [None]:
alt.Chart(fifa).mark_area(
    opacity=0.5,
    interpolate='step',
).encode(
    alt.X('Reactions', bin=alt.Bin(maxbins=20)),
    alt.Y('count()', stack=None, axis=alt.Axis(title='Number of Players')),
    color='GeneralPosition',
    tooltip=['GeneralPosition']
)

#### Juxtaposed

In [None]:
alt.Chart(fifa).mark_area(
    interpolate='step'
).encode(
    alt.X('ShortPassing', bin=alt.Bin(maxbins=25)),
    alt.Y('count()', stack=None, axis=alt.Axis(title='Number of Players')),
    alt.Color(
        'GeneralPosition',
    ),
    tooltip=['GeneralPosition']
).properties(height=100, width=100).facet(facet='GeneralPosition', columns=2)

In [None]:
alt.Chart(sample).mark_boxplot().encode(
    x='GeneralPosition',
    y='ShortPassing',
    color='GeneralPosition',
    tooltip=['Name', 'Position', 'Nationality', 'Acceleration', 'SprintSpeed']
).interactive().properties(width=150) | alt.Chart(sample).mark_boxplot().encode(
    x='GeneralPosition',
    y='Finishing',
    color='GeneralPosition',
    tooltip=['Name', 'Position', 'Nationality', 'Acceleration', 'SprintSpeed']
).interactive().properties(width=150)

## Scatter Charts

In [None]:
alt.Chart(sample).mark_circle().encode(
    x='Acceleration',
    y='SprintSpeed',
    color='GeneralPosition',
    tooltip=['Name', 'Position', 'Nationality', 'Acceleration', 'SprintSpeed']
).interactive()

In [None]:
import altair as alt

brush = alt.selection_interval()

alt.Chart(fifa.query('Nationality=="Romania"')).mark_circle().encode(
    alt.X(alt.repeat("column"), type='quantitative'),
    alt.Y(alt.repeat("row"), type='quantitative'),
    color=alt.condition(brush, 'Position:N', alt.value('lightgray')),
    tooltip=['Name', 'Position']
).properties(
    width=200,
    height=200
).add_selection(
    brush
).repeat(
    row=['Acceleration', 'SprintSpeed'],
    column=['Finishing', 'Strength']
)

In [None]:
alt.Chart(sample).mark_circle().encode(
    x='Nationality',
    y='average(Acceleration)',
    size='count()',
    color='GeneralPosition',
    tooltip=['GeneralPosition', 'average(Acceleration)', 'count()']
).interactive().properties(width=900)

In [None]:
alt.Chart(sample).mark_line().encode(
    x='Nationality',
    y='average(Acceleration)',
    size='count()',
    color='GeneralPosition',
    tooltip=['GeneralPosition', 'average(Acceleration)', 'count()']
).interactive().properties(width=900)

## Small Multiples with Facets

Sometimes we also want to create many small juxtaposed plots to show distributions of values split by some feature, such as Nationality.

Again, in Altair, this is really straightforward.

In [None]:
alt.Chart(sample).mark_boxplot().encode(
    x='Acceleration',
    y='GeneralPosition',
    color='GeneralPosition'
).properties(width=100).facet(facet='Nationality', columns=7)

In [None]:
alt.Chart(sample).mark_tick(opacity=0.4, thickness=2).encode(
    x='Acceleration',
    y='GeneralPosition',
    color='GeneralPosition',
    tooltip=['Name']
).properties(width=100).facet(facet='Nationality', columns=7)

## Layering Charts

## Building Cross-Linked Plots

It's often more interesting to be able to interrogate your data interactively, for example by seeing how distributions change based on some selection.

Luckily, we can do this directly in our notebook without switching to a different tool, and it's rather easy.

Key here are selections and transform filters.

We add selections to a plot, and that selection can be applied to some other plot with a transform_filter.

In [None]:
brush = alt.selection(type='interval')
nation_select = alt.selection(type='single', fields=['Nationality'])

nationality_count = alt.Chart(sample).mark_bar().encode(
    y='count(Nationality)',
    x=alt.X('Nationality',
        # Sort bars by the number of players per nationality.
        sort=alt.EncodingSortField(field='Nationality', op='count', order='descending')
    ),
    color=alt.condition(nation_select, alt.value('blue'), alt.value('lightgray')),
).transform_filter(
    brush
).add_selection(
    nation_select
).properties(
    width=800,
    height=350,
    title='Players by Nationality'
)

acceleration_hist = alt.Chart(sample).mark_bar().encode(
    y='count(Acceleration)',
    x='Acceleration',
    color=alt.condition(brush, alt.value('blue'), alt.value('lightgray')),
).add_selection(
    brush
).transform_filter(
    nation_select
).properties(
    width=800,
    height=300,
    title='Acceleration Distribution'
)

name = alt.Chart(sample).mark_circle().encode(
    y=alt.Y('Name', sort=alt.EncodingSortField(field='Acceleration', order='descending')),
    x=alt.X('Acceleration', scale=alt.Scale(domain=[0,100])),
    tooltip=['Name', 'Position', 'Acceleration', 'Nationality'],
    color=alt.value('blue')
).transform_filter(
    brush
).transform_filter(
    nation_select
).transform_window(
    rank='rank(Acceleration)',
    sort=[alt.SortField('Acceleration', order='descending')]
).transform_filter(
    (alt.datum.rank <= 50)
).properties(
    width=90,
    height=800,
    title='Top 50 Players'
)

alt.hconcat(name, alt.vconcat(nationality_count, acceleration_hist))

In [None]:
brush = alt.selection(type='interval')
# Clicking a bar selects one GeneralPosition.
position_select = alt.selection(type='single', fields=['GeneralPosition'])

# Fix the colour of each position so it does not change as the data is filtered.
fixed_colour_scale = alt.Scale(domain=['GK', 'DEF', 'MID', 'ST'], range=['#34495e', '#1abc9c', '#d35400', '#8e44ad'])
color = alt.Color('GeneralPosition:N', scale=fixed_colour_scale)

points = alt.Chart().mark_circle().encode(
    x='Acceleration',
    y='SprintSpeed',
    color=alt.condition(brush, color, alt.value('lightgray')),
    tooltip=['GeneralPosition', 'Name', 'Nationality']
).add_selection(
    brush
).transform_filter(
    position_select
)

bars_position = alt.Chart().mark_bar().encode(
    x='count(GeneralPosition)',
    y=alt.Y('GeneralPosition',
        sort=alt.EncodingSortField(field='GeneralPosition', op='count', order='descending')
    ),
    color=alt.condition(position_select, color, alt.value('lightgray')),
).add_selection(
    position_select
).transform_filter(
    brush
)

# Both charts share one sample, passed once at the top level.
alt.vconcat(points, bars_position, data=fifa.sample(2000))