# Bar Plots with Altair

## Objectives

- Demonstrate the use of Altair to create bar charts for visual data analysis.
- Showcase various bar chart modifications, including color, size, rotation, and conditional formatting.
- Utilize the Automobile Dataset to explore categorical and continuous data distributions.
- Implement conditional coloring and aggregation functions to enhance understanding of the data.

## Background

This notebook introduces bar charts in Altair, a declarative visualization library for Python. It uses the Automobile Dataset from the UCI Machine Learning Repository to illustrate how Altair can visualize distributions of categorical data and aggregate statistics, such as counts, averages, and maximum values. The notebook also explores how to enhance visualizations with features like color encoding, axis label rotation, and conditional formatting.

## Datasets Used

**Automobile Dataset from UCI**: This dataset contains attributes of automobiles such as make, fuel type, body style, and price. It is used to demonstrate different visual encoding strategies and explore the distribution and aggregation of data through visualizations.

## Automobile Dataset

In [1]:
import pandas as pd
pd.set_option('display.max_columns', 8)
import altair as alt

We will use the [Automobile Data Set](https://archive.ics.uci.edu/ml/datasets/automobile) from the [UCI Machine Learning Repository](https://archive-beta.ics.uci.edu/). It includes both categorical and continuous variables.

Defining the headers

In [2]:
# Defining the headers
headers = ["symboling", "normalized_losses", "make", "fuel_type", "aspiration", "num_doors", "body_style", 
        "drive_wheels", "engine_location", "wheel_base", "length", "width", "height", "curb_weight", 
        "engine_type", "num_cylinders", "engine_size", "fuel_system", "bore", "stroke", "compression_ratio", 
        "horsepower", "peak_rpm", "city_mpg", "highway_mpg", "price"]

In [3]:
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data",
                  header=None, names=headers, na_values="?" )
df.head()

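| | symboling | normalized_losses | make | fuel_type | ... | peak_rpm | city_mpg | highway_mpg | price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | NaN | alfa-romero | gas | ... | 5000.0 | 21 | 27 | 13495.0 |
| 1 | 3 | NaN | alfa-romero | gas | ... | 5000.0 | 21 | 27 | 16500.0 |
| 2 | 1 | NaN | alfa-romero | gas | ... | 5000.0 | 19 | 26 | 16500.0 |
| 3 | 2 | 164.0 | audi | gas | ... | 5500.0 | 24 | 30 | 13950.0 |
| 4 | 2 | 164.0 | audi | gas | ... | 5500.0 | 18 | 22 | 17450.0 |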


Altair lets you specify a variable's name, aggregate, and type in a compact short-hand string. The table below shows short-hand forms and their long-hand equivalents:

Short-hand | Long-hand 
---------|-----
x='varname' | alt.X('varname')
x='varname:Q' | alt.X('varname', type='quantitative')
x='sum(varname)' | alt.X('varname', aggregate='sum')
x='sum(varname):Q' | alt.X('varname', aggregate='sum', type='quantitative')
x='count():Q' | alt.X(aggregate='count', type='quantitative')

## Bar Charts

A bar chart is a standard tool for displaying quantities of several categories.

### Vertical Bar Charts

In [4]:
alt.Chart(df).mark_bar().encode(
    alt.X('make:N', title='Make'),
    alt.Y('count()', title='Count')
)

Let's analyze the variable `fuel_type`, which has only two distinct values.

In [5]:
# Bar Chart of fuel_type
alt.Chart(df).mark_bar().encode(
    alt.X('fuel_type:N', title='Fuel type'),
    alt.Y('count()', title='Count')
)

We get a tiny plot with bars filled in blue.

We can change the size of the chart using the `properties()` method.

In [6]:
alt.Chart(df).mark_bar().encode(
    alt.X('fuel_type:N', title='Fuel type'),
    alt.Y('count()', title='Count')
).properties(width=300)

Rotating x-axis labels

In [7]:
alt.Chart(df).mark_bar().encode(
    alt.X('fuel_type:N', axis=alt.Axis(title='Fuel type', labelAngle=0)),
    alt.Y('count()', title='Count')
).properties(width=300)

### Introducing `Color`

In [8]:
# Introducing color with a quantitative variable
alt.Chart(df).mark_bar().encode(
    alt.X('make:N', title='Make'),
    alt.Y('count()', title='Count'),
    alt.Color('count()', legend=alt.Legend(title='Count'))
)

Adding a quantitative variable on `Color`

In [9]:
# Introducing color with a quantitative variable
alt.Chart(df).mark_bar().encode(
    alt.X('make:N', title='Make'),
    alt.Y('count()', title='Count'),
    alt.Color('highway_mpg', title='Highway (MPG)')
)

Notice that we were using quantitative variables in color. Now, let's use a nominal variable in the Color field: `body_style`.

Introducing `Color` with a categorical variable

In [10]:
# Introducing color with a categorical variable
alt.Chart(df).mark_bar().encode(
    alt.X('make:N', title='Make'),
    alt.Y('count()', title='Count'),
    alt.Color('body_style', title='Body Style')
)

### Rows and columns

Let's map `fuel_type` to the `Color` channel of the `Make vs Count()` chart.

In [11]:
alt.Chart(df).mark_bar().encode(
    alt.X('make:N',  title='Make'),
    alt.Y('count()', title='Count'),
    alt.Color('fuel_type', title='Fuel type')
).properties(
    title={"text":"Make & Fuel Type","fontSize":20}
)

Splitting the data into two columns

In [12]:
# Splitting the data into two columns
alt.Chart(df).mark_bar().encode(
    alt.X('make:N',  title='Make'),
    alt.Y('count()', title='Count'),
    alt.Color('fuel_type', title='Fuel type'),
    column='fuel_type'
).properties(
    title={"text":"Make & Fuel Type","fontSize":20}
)

Splitting the data into two rows

In [13]:
# Splitting the data into two rows
alt.Chart(df).mark_bar().encode(
    alt.X('make:N',  title='Make'),
    alt.Y('count()', title='Count'),
    alt.Color('fuel_type', title='Fuel type'),
    row='fuel_type'
).properties(
    title={"text":"Make & Fuel Type","fontSize":20}
)

### Including a Conditional Statement

This example shows a basic bar chart with a single bar highlighted.

`alt.condition(predicate, if_true, if_false)`
- `predicate`: the test or selection predicate for the condition
- `if_true`: the object to use if the predicate is true
- `if_false`: the object to use if the predicate is false

In [14]:
alt.Chart(df).mark_bar().encode(
    x=alt.X('make:N', title='Make'),
    y=alt.Y('count():Q', title='Count'),
    # The highlight will be set on the result of a conditional statement
    color=alt.condition(
        alt.datum.make == 'toyota', # If make == toyota this test returns True,
        alt.value('steelblue'),     # which sets the bar steelblue
        alt.value('silver')         # and it sets the bar silver if false
    )
).properties(
    title={"text":"Count by Make (Toyota Highlighted)","fontSize":20},
    width=600
)    

### Using `transform_aggregate`

Suppose we want to plot the average `price` by `make` using a bar chart. One option is to compute the average price beforehand and feed the result to the chart, but Altair can do everything by itself with `transform_aggregate`. Let's see it!

In [15]:
alt.Chart(df).mark_bar().encode(
    x=alt.X('make:N', title='Make'),
    y=alt.Y('avg_price:Q', title='Average Price', 
            scale=alt.Scale(domain=[0, 50000])), 
    color=alt.value("limegreen")         
).transform_aggregate(
    avg_price = 'mean(price)', groupby=['make']
).properties(
    title={"text":"Average Price by Make","fontSize":20}
) 

Now let's plot two bar charts showing the maximum and the minimum prices.

In [16]:
maxP = alt.Chart(df).mark_bar().encode(
    x=alt.X('make:N', title='Make'),
    y=alt.Y('max_price:Q', title='Max Price', 
            scale=alt.Scale(domain=[0, 50000])),
    color=alt.value("seagreen")                 
).transform_aggregate(
    max_price = 'max(price)', groupby=['make']
).properties(
    title={"text":"Max Price by Make","fontSize":20}
)  
maxP

In [17]:
minP = alt.Chart(df).mark_bar().encode(
    x=alt.X('make:N', title='Make'),
    y=alt.Y('min_price:Q', title='Min Price', 
            scale=alt.Scale(domain=[0, 50000])),
    color=alt.value("palegreen")                
).transform_aggregate(
    min_price = 'min(price)', groupby=['make']
).properties(
    title={"text":"Min Price by Make","fontSize":20}
)  
minP

Horizontal concatenation

In [18]:
minP | maxP

Vertical concatenation

In [19]:
minP & maxP

### Horizontal Bar Charts

Switching x and y 

In [20]:
# Switching x and y 
alt.Chart(df).mark_bar().encode(
    alt.Y('make:N', axis=alt.Axis(title='Make')),
    alt.X('count()', axis=alt.Axis(title='Count'))
)

### Adding data labels

We want to add data labels to the previous chart. The counts being plotted can be obtained with the pandas method `value_counts`.

In [21]:
df.make.value_counts(sort=False)

alfa-romero       3
audi              7
bmw               8
chevrolet         3
dodge             9
honda            13
isuzu             4
jaguar            3
mazda            17
mercedes-benz     8
mercury           1
mitsubishi       13
nissan           18
peugot           11
plymouth          7
porsche           5
renault           2
saab              6
subaru           12
toyota           32
volkswagen       12
volvo            11
Name: make, dtype: int64

For simplicity, let's put the previous information into a DataFrame.

In [22]:
make_data = df.make.value_counts(sort=False)
mk = pd.DataFrame({'make': make_data.index, 'count': make_data.values})
mk

| | make | count |
|---|---|---|
| 0 | alfa-romero | 3 |
| 1 | audi | 7 |
| 2 | bmw | 8 |
| 3 | chevrolet | 3 |
| 4 | dodge | 9 |
| 5 | honda | 13 |
| 6 | isuzu | 4 |
| 7 | jaguar | 3 |
| 8 | mazda | 17 |
| 9 | mercedes-benz | 8 |


Assigning the chart to a variable

In [23]:
# Assigning the chart to a variable
hBar = alt.Chart(mk).mark_bar().encode(
    alt.Y('make:N', axis=alt.Axis(title='Make')),
    alt.X('count:Q', axis=alt.Axis(title='Count'))
).properties(
    height = 500,
    title='Make'    
)

In [24]:
hBar

Writing the labels in the proper position

In [25]:
# Writing the labels in the proper position
labels = hBar.mark_text(
    align='left',
    baseline='middle',
    dx=5    
).encode(
    text='count:Q'
).properties(
    height = 500
)

The text layer on its own (`labels`):

In [26]:
labels

Layering the bars and the labels with `+` (`hBar + labels`):

In [27]:
hBar + labels

## Conclusions

- Altair simplifies the creation of bar charts with its concise syntax and powerful encoding capabilities.
- The library's integration with Pandas DataFrames allows for seamless data manipulation and visualization.
- Enhancements such as color coding, conditional formatting, and axis adjustments provide clear and informative visual representations.

## References

- https://altair-viz.github.io/