# Altair Axes and Scales

In addition to graphical marks, a chart needs reference elements, or guides, that allow readers to decode it. Guides such as axes (which visualize scales with spatial extent) and legends (which visualize scales for color, size, or shape) are basic elements of an effective visualization.

In this notebook, we explore the options Altair provides for customizing scale mappings, axes, and legends, using an example on global health and population.

In [1]:
import pandas as pd
import altair as alt

alt.data_transformers.enable('default', max_rows=None)

DataTransformerRegistry.enable('default')

## Dataset

To experiment with axes and scales, we will visualize global health and population data for a number of countries over the period from 1964 to 2013. The data was collected by the Gapminder Foundation and shared in Hans Rosling's popular [TED talk](https://www.ted.com/talks/hans_rosling_global_population_growth_box_by_box?language=en). If you haven't seen the talk, we encourage you to watch it.

In [2]:
data=pd.read_csv('data/gapminder_tidy_scales.csv')

In [3]:
data.head()

Unnamed: 0,Country,Year,fertility,life,population,child_mortality,gdp,region
0,Afghanistan,1964,7.671,33.639,10474903.0,339.7,1182.0,South Asia
1,Afghanistan,1965,7.671,34.152,10697983.0,334.1,1182.0,South Asia
2,Afghanistan,1966,7.671,34.662,10927724.0,328.7,1168.0,South Asia
3,Afghanistan,1967,7.671,35.17,11163656.0,323.3,1173.0,South Asia
4,Afghanistan,1968,7.671,35.674,11411022.0,318.1,1187.0,South Asia


## Scales and Axes

We start with a simple plot of the population data for the last available year (2013).

- The pandas way

In [4]:
data_2013=data[data.Year==data.Year.max()]

In [5]:
alt.Chart(data_2013).mark_circle(tooltip=True).encode(
    alt.X('population:Q'),
).properties(width=600)

- The Altair Way

In [6]:
alt.Chart(data).mark_circle(tooltip=True).encode(
    alt.X('population:Q'),
).transform_filter(
    'datum.Year === 2013'
).properties(width=600)

By default, Altair uses a linear mapping between domain values (population) and range values (pixels). To get a better overview of the data, we can apply a different scale transformation.

To change the scale type, we set the `scale` attribute using `alt.Scale` and its `type` parameter. Here is the result of using a logarithmic scale type (`log`). Positions in the pixel range now correspond to the logarithm of the values in the data domain, so equal distances along the axis represent equal ratios between values.

Note: the scale types supported for continuous values include `linear`, `log`, `pow`, `sqrt`, and `symlog`.

In [7]:
alt.Chart(data_2013).mark_circle().encode(
    alt.X('population:Q', scale=alt.Scale(type='log')),
).properties(width=600)

**Individual values are now easier to tell apart.**
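
To make the difference between the two scale types concrete, here is a minimal sketch in plain Python of how a linear and a logarithmic scale might map a population value to a horizontal pixel position. The helper names `linear_scale` and `log_scale`, the domain, and the 600-pixel range are illustrative assumptions, not Altair's internals.

```python
import math

def linear_scale(value, domain, pixel_range):
    """Map value to pixels in proportion to its distance from the domain start."""
    d0, d1 = domain
    r0, r1 = pixel_range
    fraction = (value - d0) / (d1 - d0)
    return r0 + fraction * (r1 - r0)

def log_scale(value, domain, pixel_range):
    """Map value to pixels in proportion to the logarithm of the value."""
    d0, d1 = domain
    r0, r1 = pixel_range
    fraction = (math.log10(value) - math.log10(d0)) / (math.log10(d1) - math.log10(d0))
    return r0 + fraction * (r1 - r0)

# Hypothetical domain and pixel range, chosen only for illustration.
domain = (1e4, 1e10)
pixel_range = (0, 600)

for population in (1e5, 1e6, 1e7, 1e8, 1e9):
    x_linear = linear_scale(population, domain, pixel_range)
    x_log = log_scale(population, domain, pixel_range)
    print(f"{population:>15,.0f}  linear: {x_linear:7.2f} px  log: {x_log:7.2f} px")
```

Under these assumptions, the linear scale squeezes every value up to a hundred million into roughly the first 6 pixels, while the log scale moves a point by the same number of pixels for each tenfold increase.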

---

## Axis Customization

By default, Altair places the x-axis along the bottom of the chart. To change this default, we can add an `axis` attribute with `orient='top'`.

In [8]:
alt.Chart(data_2013).mark_circle().encode(
    alt.X('population:Q',
          scale=alt.Scale(type='log'),
          axis=alt.Axis(orient='top')),
)

Similarly, the y-axis defaults to a `left` orientation, but can be set to `right`.

Now let's reverse the order of the values on the x-axis and customize the title.

In [9]:
alt.Chart(data_2013).mark_circle().encode(
    alt.X('population:Q',
          sort='descending',
          scale=alt.Scale(type='log'),
          axis=alt.Axis(orient='top'),
          title='Population (reverse log scale)'
          )
)
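
Altair charts are compiled to a Vega-Lite JSON specification, so it can help to see roughly what the encoding above turns into. The dictionary below is a hand-written approximation, built with the standard library only; the actual output of `chart.to_dict()` also contains keys such as the data and schema, and may differ between Altair versions.

```python
import json

# Approximate Vega-Lite x-encoding described by the alt.X(...) call above.
# This is a hand-written sketch, not output captured from Altair.
x_encoding = {
    "field": "population",
    "type": "quantitative",
    "sort": "descending",              # reverses the axis direction
    "scale": {"type": "log"},          # logarithmic scale
    "axis": {"orient": "top"},         # draw the axis above the plot
    "title": "Population (reverse log scale)",
}

spec_fragment = {
    "mark": {"type": "circle"},
    "encoding": {"x": x_encoding},
}

print(json.dumps(spec_fragment, indent=2))
```

Each keyword argument passed to `alt.X` ends up as one key of the `x` channel definition, which is why `scale`, `axis`, `sort`, and `title` can be combined freely.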

## Cartesian Diagram with Different Scales

In [10]:
italy = data[data['Country']=='Italy']
italy.head()

Unnamed: 0,Country,Year,fertility,life,population,child_mortality,gdp,region
4350,Italy,1964,2.5,70.4,51054762.0,43.1,12343.0,Europe & Central Asia
4351,Italy,1965,2.517,70.27,51453513.0,41.0,12599.0,Europe & Central Asia
4352,Italy,1966,2.524,71.02,51839323.0,39.3,13279.0,Europe & Central Asia
4353,Italy,1967,2.524,71.06,52215095.0,37.8,13997.0,Europe & Central Asia
4354,Italy,1968,2.515,70.88,52584254.0,36.4,14929.0,Europe & Central Asia


In [11]:
alt.Chart(italy).mark_line(
    opacity=0.8,
).encode(
    alt.X('Year:Q', title='Year'),
    alt.Y('population:Q'),
).properties(
    width=200,
    height=200
)

In [12]:
alt.Chart(italy).mark_line(
    opacity=0.8,
).encode(
    alt.X('Year:Q', title='Year'),
    alt.Y('population:Q', scale=alt.Scale(zero=False)),
).properties(
    width=200,
    height=200
)
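
The only difference between the two charts above is `zero=False`, which stops the y-scale from being extended down to zero. The sketch below illustrates the effect on the domain in plain Python. The helper names are hypothetical, and the real scale may additionally round the bounds to "nice" values, so treat the numbers as an approximation.

```python
def quantitative_domain(values, zero=True):
    """Return (lower, upper) bounds, optionally extended to include zero."""
    lower, upper = min(values), max(values)
    if zero:
        lower = min(lower, 0)
        upper = max(upper, 0)
    return lower, upper

def share_of_axis_used(values, domain):
    """Fraction of the axis length actually covered by the data."""
    return (max(values) - min(values)) / (domain[1] - domain[0])

# First five population values for Italy, taken from the table above.
italy_population = [51054762.0, 51453513.0, 51839323.0, 52215095.0, 52584254.0]

with_zero = quantitative_domain(italy_population, zero=True)
without_zero = quantitative_domain(italy_population, zero=False)

print("domain with zero:   ", with_zero)
print("domain without zero:", without_zero)
print(f"axis used with zero:    {share_of_axis_used(italy_population, with_zero):.1%}")
print(f"axis used without zero: {share_of_axis_used(italy_population, without_zero):.1%}")
```

With zero included, the line occupies only a thin band at the top of the chart; without it, the variation fills the available height, at the cost of a baseline that no longer starts at zero.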

## Axis Domain

In [13]:
fdata = data[data['Year'].isin([2013,1964])]

In [14]:
fdata2=((fdata[['Country','Year','population','region']]
        .pivot(index=['Country','region'],columns='Year',values='population'))
        .reset_index()
        .rename_axis(None, axis=1)
        )
fdata2

Unnamed: 0,Country,region,1964,2013
0,Afghanistan,South Asia,10474903.0,34499915.0
1,Albania,Europe & Central Asia,1817098.0,3238316.0
2,Algeria,Middle East & North Africa,11654905.0,36983924.0
3,Angola,Sub-Saharan Africa,5337063.0,20714494.0
4,Antigua and Barbuda,America,58653.0,91404.0
...,...,...,...,...
197,West Bank and Gaza,Middle East & North Africa,1181587.0,4393572.0
198,Western Sahara,Middle East & North Africa,46308.0,585270.0
199,"Yemen, Rep.",Middle East & North Africa,5527652.0,26358020.0
200,Zambia,Sub-Saharan Africa,3430747.0,14314515.0


In [15]:
fdata2.columns = fdata2.columns.astype(str)

In [16]:
alt.Chart(fdata2).mark_circle().encode(
    alt.X('2013:Q', scale=alt.Scale(type='log')),
    alt.Y('1964:Q',scale=alt.Scale(type='log')),
    tooltip=['Country','1964','2013']
).properties(
    width=300,
    height=300
)

In [17]:
domainMax=max(fdata2['2013'].max(),fdata2['1964'].max())
domainMin=min(fdata2['2013'].min(),fdata2['1964'].min())
domainMin,domainMax

(np.float64(29175.0), np.float64(1359368470.0))

To make the chart symmetric, we give it equal width and height and assign the same domain to both axes using `alt.Scale(domain=[min, max])`.
We also reduce the number of ticks and grid lines with `tickCount` to make the chart more readable.

In [18]:
scatter_chart = alt.Chart(fdata2).mark_circle().encode(
    alt.X('2013:Q',

          scale=alt.Scale(type='log',domain=[domainMin,domainMax]),
          title='2013 population',
          axis=alt.Axis(tickCount=4)
          ),
    alt.Y(
        '1964:Q',
        scale=alt.Scale(type='log',domain=[domainMin,domainMax]),
        title='1964 population',
        axis=alt.Axis(tickCount=4)
        ),tooltip=['Country']
).properties(
    width=300,
    height=300
)
# Create a DataFrame with two points representing the corners of the plot
# To use values instead of datum, we need to create actual data
line_data = pd.DataFrame({
    '2013': [domainMin, domainMax],
    '1964': [domainMin, domainMax]
})

diagonal_line = alt.Chart(line_data).mark_line(
    color='red',
    strokeDash=[4, 4]  # Dashed line to make it easier to distinguish
).encode(
    x=alt.X('2013:Q', scale=alt.Scale(type='log', domain=[domainMin, domainMax])),
    y=alt.Y('1964:Q', scale=alt.Scale(type='log', domain=[domainMin, domainMax]))
)

# Chart combination
final_chart = scatter_chart + diagonal_line

# Visualize it!
final_chart
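
The diagonal is only a meaningful "no change" reference because both axes share the same log domain and the same pixel length. The following sketch, in plain Python with a hypothetical helper `log_fraction`, checks this: a value that is identical in both years sits at the same fraction of each axis, while a differing y-domain (a made-up one, for illustration) breaks that property.

```python
import math

def log_fraction(value, domain):
    """Return the fraction of the axis length at which value sits on a log scale."""
    lower, upper = domain
    return (math.log10(value) - math.log10(lower)) / (math.log10(upper) - math.log10(lower))

# Shared domain reported by the notebook (overall minimum and maximum).
shared_domain = (29175.0, 1359368470.0)

# A hypothetical country with the same population in both years.
unchanged = 1_000_000.0
same_x = log_fraction(unchanged, shared_domain)
same_y = log_fraction(unchanged, shared_domain)
print("unchanged population lies on the diagonal:", same_x == same_y)

# Afghanistan's populations in 1964 and 2013, from the pivoted table.
x_2013 = log_fraction(34499915.0, shared_domain)
y_1964 = log_fraction(10474903.0, shared_domain)
print(f"Afghanistan: x={x_2013:.3f}, y={y_1964:.3f}, below diagonal: {y_1964 < x_2013}")

# With a different (hypothetical) domain on the y-axis, the unchanged
# population no longer falls on the 45-degree line.
other_domain = (10_000.0, 1_000_000_000.0)
print("still on the diagonal:", log_fraction(unchanged, other_domain) == same_x)
```

Points below the diagonal therefore correspond to countries whose population was larger in 2013 than in 1964.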

In [19]:
scatter_chart = alt.Chart(fdata2).mark_circle(
    fillOpacity=0.5,
    strokeOpacity=0.7,
    strokeWidth = 1,
    stroke = 'black'
).encode(
    alt.X('2013:Q',

          scale=alt.Scale(type='log',domain=[domainMin,domainMax]),
          title='2013 population',
          axis=alt.Axis(tickCount=4)
          ),
    alt.Y(
        '1964:Q',
        scale=alt.Scale(type='log',domain=[domainMin,domainMax]),
        title='1964 population',
        axis=alt.Axis(tickCount=4)
        ),
    size= alt.Size('2013:Q',scale=alt.Scale(domain=[domainMin,domainMax],range=[15,1000])),
    tooltip=['Country']
).properties(
    width=300,
    height=300
)
# Create a DataFrame with two points representing the corners of the plot
# To use values instead of datum, we need to create actual data
line_data = pd.DataFrame({
    '2013': [domainMin, domainMax],
    '1964': [domainMin, domainMax]
})

diagonal_line = alt.Chart(line_data).mark_line(
    color='black',
    strokeWidth=1,
    strokeDash=[4, 4]  # Dashed line to make it easier to distinguish
).encode(
    x=alt.X('2013:Q', scale=alt.Scale(type='log', domain=[domainMin, domainMax])),
    y=alt.Y('1964:Q', scale=alt.Scale(type='log', domain=[domainMin, domainMax]))
)

# Chart combination
final_chart = scatter_chart + diagonal_line

# Visualize it!
final_chart
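
The `size` channel in the cell above maps population to circle area, with the range `[15, 1000]` given in square pixels. As a rough sketch, assuming the default linear scale for a quantitative size encoding, the mapping can be reproduced in plain Python (the helper `size_scale` is hypothetical):

```python
import math

def size_scale(value, domain, size_range):
    """Linearly interpolate a value from the data domain to a mark area."""
    lower, upper = domain
    smallest, largest = size_range
    fraction = (value - lower) / (upper - lower)
    return smallest + fraction * (largest - smallest)

# Domain and range used in the chart above.
domain = (29175.0, 1359368470.0)
size_range = (15, 1000)

# 2013 populations for a few countries, taken from the pivoted table.
populations_2013 = {
    "Antigua and Barbuda": 91404.0,
    "Albania": 3238316.0,
    "Afghanistan": 34499915.0,
    "Algeria": 36983924.0,
}

for country, population in populations_2013.items():
    area = size_scale(population, domain, size_range)
    diameter = 2 * math.sqrt(area / math.pi)  # circle area -> diameter
    print(f"{country:<20} area: {area:7.2f} px^2  diameter: {diameter:5.2f} px")
```

Because this size scale is linear while the positional scales are logarithmic, most countries end up with bubbles close to the minimum size, and only the most populous ones stand out.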

---

# Exercise

Compare the population in the first and last available year (use the Y-axis to display years).

Set the chart with a height of 800 and a width of 500, assign a fairly high opacity to the markers and make them larger.

Add information about the country and population when hovering over each marker.

In [33]:
data

Unnamed: 0,Country,Year,fertility,life,population,child_mortality,gdp,region
0,Afghanistan,1964,7.671,33.639,10474903.0,339.7,1182.0,South Asia
1,Afghanistan,1965,7.671,34.152,10697983.0,334.1,1182.0,South Asia
2,Afghanistan,1966,7.671,34.662,10927724.0,328.7,1168.0,South Asia
3,Afghanistan,1967,7.671,35.170,11163656.0,323.3,1173.0,South Asia
4,Afghanistan,1968,7.671,35.674,11411022.0,318.1,1187.0,South Asia
...,...,...,...,...,...,...,...,...
10106,Åland,2002,,81.800,26257.0,,,Europe & Central Asia
10107,Åland,2003,,80.630,26347.0,,,Europe & Central Asia
10108,Åland,2004,,79.880,26530.0,,,Europe & Central Asia
10109,Åland,2005,,80.000,26766.0,,,Europe & Central Asia


In [34]:
fdata = data[data['Year'].isin([2013,1964])]

In [35]:
# calculate delta pop

# 1. Sort by Country and Year
fdata_sorted = fdata.sort_values(['Country', 'Year'])

# 2. Calculate first and last population for each Country
fdata_sorted['first_population'] = fdata_sorted.groupby('Country')['population'].transform('first')
fdata_sorted['last_population'] = fdata_sorted.groupby('Country')['population'].transform('last')

# 3. Express the last population as a percentage of the first (100 = unchanged)
fdata_sorted['percent_change'] = (fdata_sorted['last_population'] / fdata_sorted['first_population']) * 100

# 4. Find the maximum year for each Country
fdata_sorted['max_year'] = fdata_sorted.groupby('Country')['Year'].transform('max')

# 5. Now you have a table like this:
# Columns: Country, Year, population, first_population, last_population, percent_change, max_year, ...

# If you want, you can also filter only the necessary columns:
materialized = fdata_sorted[[
   'Country', 'Year', 'population', 'percent_change'
]]

In [36]:
materialized

Unnamed: 0,Country,Year,population,percent_change
0,Afghanistan,1964,10474903.0,329.357847
49,Afghanistan,2013,34499915.0,329.357847
50,Albania,1964,1817098.0,178.213613
99,Albania,2013,3238316.0,178.213613
100,Algeria,1964,11654905.0,317.324972
...,...,...,...,...
10000,"Yemen, Rep.",2013,26358020.0,476.839352
10001,Zambia,1964,3430747.0,417.241930
10050,Zambia,2013,14314515.0,417.241930
10051,Zimbabwe,1964,4279524.0,311.434753
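
Note that the `percent_change` column holds the last population expressed as a percentage of the first, so a value of 100 means no change. A short check in plain Python, using Afghanistan's figures from the table above, shows how this relates to the growth relative to the starting value:

```python
# Afghanistan's population in 1964 and 2013, from the table above.
first_population = 10474903.0
last_population = 34499915.0

# What the notebook stores in `percent_change`: last as a percentage of first.
ratio_percent = last_population / first_population * 100

# The growth relative to the first value, which is exactly 100 points lower.
growth_percent = (last_population - first_population) / first_population * 100

print(f"last as % of first: {ratio_percent:.2f}")
print(f"growth in %:        {growth_percent:.2f}")
```

A threshold such as 400 applied to this column therefore selects countries whose population has more than quadrupled, which corresponds to growth of more than 300%.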


In [38]:
# Create the chart
base = alt.Chart(materialized).encode(
    y=alt.Y(
        'population:Q',
        scale=alt.Scale(type='log',nice=False),
        axis=alt.Axis(
            gridDash=[1, 0],
            values=[1, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000]
        )
    ),
    x=alt.X('Year:O',
            axis=alt.Axis(
                labelFlush=True,
                labelAngle=0
            )
        ),
    tooltip=['Country', 'percent_change:Q','population:Q'],
    detail='Country:N',
)

thr = 400

# Bubble chart
pop_bubbles_r = base.transform_filter(
    alt.datum.percent_change > thr,
).mark_circle(
    opacity=1,  # High opacity
    size=100,
    stroke='black',
    fill='red',
)

pop_bubbles_g = base.transform_filter(
    alt.datum.percent_change < thr,
).mark_circle(
    opacity=.3,  # Low opacity
    size=50,
    stroke='black',
    fill='gray',
)

# Line chart
pop_lines_r = base.transform_filter(
    alt.datum.percent_change > thr,
).mark_line(
    strokeWidth=1,
    stroke='red'
)

# Line chart
pop_lines_g = base.transform_filter(
    alt.datum.percent_change < thr,
).mark_line(
    strokeWidth=1,
    stroke='lightgray'
)

# Combine the layers
(pop_lines_g + pop_lines_r + pop_bubbles_g + pop_bubbles_r).properties(height=800,width=500)
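
Hand-typed tick lists for a log axis are easy to get wrong, for example by repeating or skipping a power of ten. One option is to generate the list instead. The helper below is a hypothetical sketch in plain Python; its result could be passed to the `values` parameter of `alt.Axis`.

```python
def powers_of_ten(min_exponent, max_exponent):
    """Return [10**min_exponent, ..., 10**max_exponent] as integers."""
    return [10 ** exponent for exponent in range(min_exponent, max_exponent + 1)]

# Ticks from one thousand to one billion, covering the population range.
tick_values = powers_of_ten(3, 9)
print(tick_values)
```

Generating the ticks from the exponent range keeps the list free of duplicates and makes it easy to adjust when the data range changes.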