# Altair Axis and Scales

In [1]:
import pandas as pd
import altair as alt

alt.data_transformers.enable('default', max_rows=None)

DataTransformerRegistry.enable('default')

## Dataset

To experiment with axes and scales, we will visualize global health and population data for a number of countries over the period from 1964 to 2013. The data was collected by the Gapminder Foundation and shared in Hans Rosling's popular [TED Talk](https://www.ted.com/talks/hans_rosling_global_population_growth_box_by_box?language=en). If you haven't seen the talk, we encourage you to watch it as soon as possible!

In [2]:
data=pd.read_csv('data/gapminder_tidy_scales.csv')

In [3]:
data.head()

Unnamed: 0,Country,Year,fertility,life,population,child_mortality,gdp,region
0,Afghanistan,1964,7.671,33.639,10474903.0,339.7,1182.0,South Asia
1,Afghanistan,1965,7.671,34.152,10697983.0,334.1,1182.0,South Asia
2,Afghanistan,1966,7.671,34.662,10927724.0,328.7,1168.0,South Asia
3,Afghanistan,1967,7.671,35.17,11163656.0,323.3,1173.0,South Asia
4,Afghanistan,1968,7.671,35.674,11411022.0,318.1,1187.0,South Asia


## Scales and Axis

`data_2013`: the rows for the last available year (2013).

In [4]:
data_2013=data[data.Year==data.Year.max()]

`fdata2`: the Gapminder population data in wide form, with `Country`, `region`, `1964`, and `2013` as columns.

In [5]:
fdata = data[data['Year'].isin([2013,1964])]
fdata2=((fdata[['Country','Year','population','region']]
        .pivot(index=['Country','region'],columns='Year',values='population'))
        .reset_index()
        .rename_axis(None, axis=1)
        )
fdata2.columns = fdata2.columns.astype(str)
fdata2

Unnamed: 0,Country,region,1964,2013
0,Afghanistan,South Asia,10474903.0,34499915.0
1,Albania,Europe & Central Asia,1817098.0,3238316.0
2,Algeria,Middle East & North Africa,11654905.0,36983924.0
3,Angola,Sub-Saharan Africa,5337063.0,20714494.0
4,Antigua and Barbuda,America,58653.0,91404.0
...,...,...,...,...
197,West Bank and Gaza,Middle East & North Africa,1181587.0,4393572.0
198,Western Sahara,Middle East & North Africa,46308.0,585270.0
199,"Yemen, Rep.",Middle East & North Africa,5527652.0,26358020.0
200,Zambia,Sub-Saharan Africa,3430747.0,14314515.0


## Color Scales

### Qualitative Scales

We encode the *region* attribute on the color channel (the data is of nominal type):

In [6]:
alt.Chart(data_2013).mark_circle().encode(
    alt.X('population:Q',
          scale=alt.Scale(type='log'),
          axis=alt.Axis(tickCount=5),
          title='Population (log scale)',
          ),
    color=alt.Color('region:N'),
).properties(
    width=700,
    height=30
)

You can reposition the legend with the `orient` argument, lay it out horizontally with `direction='horizontal'`, and use any of the color schemes available in Altair.

For a complete list of available color schemes, refer to the [Vega](https://vega.github.io/vega/docs/schemes/) documentation.

In [7]:
alt.Chart(data_2013).mark_circle().encode(
    alt.X('population:Q',
          scale=alt.Scale(type='log'),
          axis=alt.Axis(tickCount=5),
          title='Population (log scale)',
          ),
    tooltip=['Country','region'],
    color=alt.Color(
        'region:N',
        legend=alt.Legend(direction='horizontal',orient='top'),
        scale=alt.Scale(scheme='dark2')
        ),
).properties(
    width=700,
    height=30
)

To use custom colors, we can update the color encoding scale property. One option is to explicitly provide a domain and range to indicate the *data:color* mapping.

In [8]:
regions=fdata2['region'].unique()
print(regions)

['South Asia' 'Europe & Central Asia' 'Middle East & North Africa'
 'Sub-Saharan Africa' 'America' 'East Asia & Pacific']


In [9]:
colorsScale=alt.Scale(
    domain=data_2013['region'].unique(),
    range=['#e41a1c','#377eb8','#4daf4a','#984ea3','#ff7f00','#f5df4d'], # taken from the ColorBrewer site
        )

alt.Chart(data_2013).mark_circle(size=80, opacity=0.5).encode(
    alt.Y('region:N',title=None),
    alt.X('population:Q',
          scale=alt.Scale(type='log'),
          axis=alt.Axis(tickCount=5),
          title='Population (log scale)',
          ),
    tooltip=['Country','region','population'],
    color=alt.Color(
        'region:N',
        legend=alt.Legend(direction='horizontal',orient='top'),
        scale=colorsScale
        ),
).properties(
    width=700
)
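
As a rough illustration of the explicit *data:color* mapping described above (plain Python, not Altair's internals): the entries of `domain` and `range` are paired by position, so the first region gets the first color, and so on. The sketch assumes the regions appear in the order printed by the notebook; `color_for` is a hypothetical name used only here.

```python
# Regions in the order printed by the notebook, and the colors from the cell above.
domain = [
    "South Asia",
    "Europe & Central Asia",
    "Middle East & North Africa",
    "Sub-Saharan Africa",
    "America",
    "East Asia & Pacific",
]
color_range = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00", "#f5df4d"]

# Pair domain and range entries by position: domain[i] -> color_range[i].
color_for = dict(zip(domain, color_range))

for region, color in color_for.items():
    print(f"{region:28s} {color}")
```

Because the pairing is positional, building the domain with `unique()` ties each color to the order in which the regions first appear in the data. Writing the domain as an explicit list keeps the assignment stable if the data is re-sorted.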

### Customize legend

In [26]:
colorsScale=alt.Scale(
    domain=data_2013['region'].unique(),
    range=['#e41a1c','#377eb8','#4daf4a','#984ea3','#ff7f00','#f5df4d'], # taken from the ColorBrewer site
        )

alt.Chart(data_2013).mark_circle(size=80, opacity=0.5).encode(
    alt.Y('region:N',title=None),
    alt.X('population:Q',
          scale=alt.Scale(type='log'),
          axis=alt.Axis(tickCount=5),
          title='Population (log scale)',
          ),
    tooltip=['Country','region','population'],
    color=alt.Color(
        'region:N',
        legend=alt.Legend(direction='horizontal',orient='right', columns=2, rowPadding=10, title='Continents'),
        scale=colorsScale
        ),
).properties(
    width=500
)

### Quantitative Scales

The `size` of a mark that has an area is mapped to that area, not to a linear dimension (for `mark_circle`, for example, not to the radius). The scale range is therefore also expressed as an area.

In [10]:
alt.Chart(data_2013.sort_values(by=['population'], ascending=False)).mark_circle(tooltip=True).encode(
    x=alt.X('fertility:Q',scale=alt.Scale(zero=False)),
    y= alt.Y('life:Q', scale=alt.Scale(zero=False)),
    size=alt.Size(
        'population:Q',
        scale=alt.Scale(range=[10,4000]) # the size refers to the area (as it should)
        ),
    color='region:N',
    tooltip= ['Country','region','population'],
).properties(
    width=400,
    height=400
)
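
To see what scaling by area means in practice, here is a small sketch in plain Python. It assumes a circle whose area equals the `size` value, so that the radius is `sqrt(size / pi)`; `radius_for_area` is a hypothetical helper, not an Altair function.

```python
import math

def radius_for_area(area):
    """Radius of a circle with the given area."""
    return math.sqrt(area / math.pi)

# The two ends of the size range used in the chart above.
small, large = 10, 4000

r_small = radius_for_area(small)
r_large = radius_for_area(large)

print(f"area ratio:   {large / small:.0f}x")
print(f"radius ratio: {r_large / r_small:.0f}x")
```

A 400-fold difference in area corresponds to only a 20-fold difference in radius. Mapping the value to the radius instead would make the largest circles look far bigger than the data justifies.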

## Increase in GDP from 1980 to 2000

Let's calculate the increase in GDP from 1980 to 2000 by creating a new column that we will call `'delta_perc'`

In [11]:
fdata=data[data.Year.isin([1980,2000])]
gdpData=fdata[['Country','Year','gdp','region']]\
.pivot(index=['Country','region'],columns='Year',values='gdp').reset_index()
gdpData.columns=gdpData.columns.astype(str)
gdpData['delta_perc']=((gdpData['2000']/gdpData['1980'])*100-100)
gdpData.head()

Year,Country,region,1980,2000,delta_perc
0,Afghanistan,South Asia,1158.0,962.0,-16.925734
1,Albania,Europe & Central Asia,4218.0,5305.0,25.770507
2,Algeria,Middle East & North Africa,10166.0,9885.0,-2.764116
3,Angola,Sub-Saharan Africa,4443.0,3387.0,-23.767725
4,Antigua and Barbuda,America,8169.0,19319.0,136.491615


Here's the step-by-step explanation:

1. `gdpData['2000']/gdpData['1980']` divides the GDP value for 2000 by the GDP value for 1980. This ratio tells us by what factor the GDP has grown (or shrunk) from 1980 to 2000.
2. `*100` multiplies the ratio by 100 to express it as a percentage. If the GDP has doubled, this value is 200%.
3. `-100` subtracts 100 to get the net percentage change. If the GDP has doubled (200%), the percentage change is 100%.

The final result, `delta_perc`, is the percentage change in GDP between 1980 and 2000.
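
The same three steps can be checked by hand on single values. The sketch below is plain Python; `percent_change` is a hypothetical helper name, and the inputs are the GDP figures for Afghanistan and Albania from the table above.

```python
def percent_change(old, new):
    """Net percentage change from old to new, following the three steps above."""
    ratio = new / old          # step 1: ratio between the two values
    as_percent = ratio * 100   # step 2: express the ratio as a percentage
    return as_percent - 100    # step 3: subtract 100 to get the net change

# GDP values (1980, 2000) for Afghanistan and Albania, taken from the table above.
print(round(percent_change(1158.0, 962.0), 6))
print(round(percent_change(4218.0, 5305.0), 6))

# A value that doubles corresponds to a 100% increase.
print(percent_change(100.0, 200.0))
```

The printed values should agree with the `delta_perc` column for the first two rows. The formula is equivalent to `(new - old) / old * 100`.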

Simple **linear scale**

In [12]:
alt.Chart(gdpData[gdpData['region']=='Middle East & North Africa']).mark_rect().encode(
    x=alt.X('Country:N' , sort=alt.EncodingSortField(field="delta_perc", order='descending')),
    color=alt.Color('delta_perc:Q',scale=alt.Scale(
            scheme='viridis',
            reverse=True
            )
    )

)

**Quantize scale**: similar to the linear scale, but the values are discretized into bins of equal width. The `nice` attribute rounds the domain to nice values, so the thresholds between bins are round numbers too.

In [13]:
alt.Chart(gdpData[gdpData['region']=='Middle East & North Africa']).mark_rect().encode(
    x=alt.X('Country:N' , sort=alt.EncodingSortField(field="delta_perc", order='descending')),
    color=alt.Color('delta_perc:Q',scale=alt.Scale(
            scheme='viridis',
            reverse=True,
            type='quantize',
            nice=True
            )
    )

)

**Quantile scale**: quantile scales map a sample of values from the input domain to a discrete range, based on the boundaries of the computed quantiles. If the range is not specified, the domain is segmented into 4 quantiles (quartiles) by default.

In [14]:
alt.Chart(gdpData[gdpData['region']=='Middle East & North Africa']).mark_rect().encode(
    x=alt.X('Country:N' , sort=alt.EncodingSortField(field="delta_perc", order='descending')),
    color=alt.Color('delta_perc:Q',scale=alt.Scale(
            scheme='viridis',
            reverse=True,
            type='quantile'
            )
    )

)

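**Threshold scale**: the bin boundaries are given explicitly through `domain`. The five thresholds used below split the values into six classes.

In [15]: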
alt.Chart(gdpData[gdpData['region']=='Middle East & North Africa']).mark_rect().encode(
    x=alt.X('Country:N' , sort=alt.EncodingSortField(field="delta_perc", order='descending')),
    color=alt.Color('delta_perc:Q',scale=alt.Scale(
            scheme='viridis',
            reverse=True,
            type='threshold',
            domain= [-20,0,20,50,80]
            )
    )

)
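
To compare how the three discretizing scales assign values to bins, here is a minimal sketch in plain Python. It only approximates the binning logic and is not how Altair or Vega implement it; the helper names are hypothetical, and the sample is the rounded `delta_perc` column of `gdpData.head()`, used purely for illustration.

```python
from bisect import bisect_right
from statistics import quantiles

def quantize_bin(value, lo, hi, n_bins):
    """Equal-width bins over [lo, hi]; returns a bin index from 0 to n_bins - 1."""
    position = (value - lo) / (hi - lo) * n_bins
    return min(max(int(position), 0), n_bins - 1)

def quantile_bin(value, sample, n_bins=4):
    """Bins bounded by the sample quantiles."""
    cuts = quantiles(sample, n=n_bins, method="inclusive")
    return bisect_right(cuts, value)

def threshold_bin(value, thresholds):
    """Bins bounded by explicitly chosen thresholds (n thresholds give n + 1 bins)."""
    return bisect_right(thresholds, value)

# Illustrative values: the delta_perc column of gdpData.head(), rounded.
sample = [-24, -17, -3, 26, 136]
thresholds = [-20, 0, 20, 50, 80]  # same thresholds as the chart above

# Columns: value, quantize bin, quantile bin, threshold bin.
for v in sample:
    print(v,
          quantize_bin(v, min(sample), max(sample), 4),
          quantile_bin(v, sample),
          threshold_bin(v, thresholds))
```

With equal-width bins, the single large value stretches the domain and most values fall into the first bin. Quantile bins spread the values more evenly, and threshold bins follow whatever boundaries you consider meaningful.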

### Diverging Quantitative Scales

In [16]:
fdata=data[data.Year.isin([1980,2000])]
gdpData=fdata[['Country','Year','gdp','region']]\
.pivot(index=['Country','region'],columns='Year',values='gdp').reset_index()
gdpData.columns=gdpData.columns.astype(str)
gdpData['delta_perc']=((gdpData['2000']/gdpData['1980'])*100-100)
gdpData.head()

Year,Country,region,1980,2000,delta_perc
0,Afghanistan,South Asia,1158.0,962.0,-16.925734
1,Albania,Europe & Central Asia,4218.0,5305.0,25.770507
2,Algeria,Middle East & North Africa,10166.0,9885.0,-2.764116
3,Angola,Sub-Saharan Africa,4443.0,3387.0,-23.767725
4,Antigua and Barbuda,America,8169.0,19319.0,136.491615


In [17]:
alt.Chart(gdpData[gdpData['region']=='Middle East & North Africa']).mark_bar().encode(
    x='Country:N',
    y='delta_perc:Q'
)

How can we improve it?

In [18]:
alt.Chart(gdpData[gdpData['region']=='Middle East & North Africa']).mark_bar().encode(
    y=alt.Y('Country:N',sort=alt.EncodingSortField(field="delta_perc", order='descending')),
    x=alt.X('delta_perc:Q',axis=alt.Axis(title='Δ GDP (%)')),
    color=alt.Color(
        'delta_perc:Q',
        # bin=True,
        # bin= alt.BinParams(maxbins=10),
        scale=alt.Scale(
            scheme='blueorange',
            # domainMid=0,
            domain=[-100,100]
            ),
        legend=alt.Legend(title='Δ GDP (%)')
        ),
)
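
One reason for the symmetric `domain=[-100,100]` in the chart above is that it places zero at the middle of the diverging scheme. The sketch below is plain Python and assumes the color scheme is traversed linearly from the lower to the upper bound of the domain; `scale_position` is a hypothetical helper, and the second domain is an illustrative data-driven one.

```python
def scale_position(value, lo, hi):
    """Linear position of value within [lo, hi]: 0.0 at lo, 1.0 at hi."""
    return (value - lo) / (hi - lo)

# Symmetric domain, as in the chart above: zero lands on the middle of the scheme.
print(scale_position(0, -100, 100))

# Hypothetical data-driven domain: zero no longer maps to the middle color.
print(scale_position(0, -24, 136))
```

With the symmetric domain, the neutral middle color marks no change, so the two hues separate growth from decline. The commented-out `domainMid=0` option is an alternative way to pin the midpoint to zero when the domain is not symmetric.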



---

## References:

To see how far axes and legends can be customized in Altair, we recommend consulting the following [link](https://observablehq.com/@vega/a-guide-to-guides-axes-legends-in-vega) (the code is Vega-Lite, but the functionality is very similar).
