![Astrofisica Computacional](../../../logo.PNG)

# Altair Visualization Example 1
---
## Eduard Larrañaga

Observatorio Astronómico Nacional\
Facultad de Ciencias\
Universidad Nacional de Colombia

---

### About this notebook

In this worksheet, we use a real dataset in CSV format to illustrate data visualization with the `Altair` package. Plots made with Altair can be explored and manipulated interactively.

For installation instructions and a detailed description of the package, see

https://altair-viz.github.io/

---

In [None]:
import numpy as np
import altair as alt
import pandas as pd


# This line enables the renderer of Altair in the notebook. (It is needed only for local work)
# alt.renderers.enable('notebook')


### Reading the dataset
Since the dataset is a .csv file, we use pandas to read it and take a look at the first rows.

In [None]:
path = ''  # Empty prefix, used when working locally

In [None]:
# Working in Google Colab requires mounting Google Drive first
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# we define the path to the files
path = '/content/drive/MyDrive/Colab Notebooks/CA2021/04. Visualization/presentation/05.AltairExample01/'
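
The local and Colab path definitions above can be merged into a single cell that runs in both environments. The following is a minimal sketch, assuming that a failed import of `google.colab` is a reliable sign of a local session; the Drive folder is the one used in this notebook.

```python
# Choose the data path depending on where the notebook runs.
try:
    # google.colab can only be imported inside a Colab runtime
    from google.colab import drive
    drive.mount('/content/drive')
    path = ('/content/drive/MyDrive/Colab Notebooks/CA2021/'
            '04. Visualization/presentation/05.AltairExample01/')
except ImportError:
    # Local session: the data file is expected next to the notebook
    path = ''

print(repr(path))
```

With this cell, `pd.read_csv(path + 'data.csv')` needs no manual change when switching between environments.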

In [None]:
df = pd.read_csv(path+'data.csv')  

In [None]:
df.head()

Unnamed: 0,Name,z,sigma*,e_sigma*,n_sigma*,FWHM,e_FWHM,logL,e_logL,logM,E_logM,e_logM
0,SDSS J000805.62+145023.4,0.0454,140.0,27.0,,7610.0,380.0,41.13,0.04,7.7,,0.1
1,SDSS J004236.86-104921.8,0.0419,78.4,10.0,,1960.0,97.0,41.58,0.14,6.7,,0.1
2,SDSS J011703.58+000027.3,0.0456,98.8,16.0,,2270.0,110.0,41.45,0.08,6.8,,0.1
3,SDSS J020459.25-080816.0,0.0772,121.0,9.4,a,3720.0,180.0,41.13,0.05,7.0,,0.1
4,SDSS J020615.99-001729.1,0.0426,216.0,30.0,,3860.0,190.0,41.91,0.07,7.5,,0.1
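
The `Unnamed: 0` column in the output above usually appears when a CSV file was written together with its row index. As a sketch, assuming that is the case here, passing `index_col=0` to `pd.read_csv` uses that column as the index instead. The small in-memory CSV below (`csv_text`, a name chosen for this illustration) copies three rows and three columns from the output above.

```python
import io
import pandas as pd

# Three rows copied from the df.head() output; the first header field is empty
csv_text = """,Name,z,logM
0,SDSS J000805.62+145023.4,0.0454,7.7
1,SDSS J004236.86-104921.8,0.0419,6.7
2,SDSS J011703.58+000027.3,0.0456,6.8
"""

# Default read: the unnamed first column becomes 'Unnamed: 0'
raw = pd.read_csv(io.StringIO(csv_text))

# index_col=0 turns the first column into the row index
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(raw.columns))
print(list(clean.columns))
```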


In [None]:
df.describe()

Unnamed: 0,z,sigma*,e_sigma*,FWHM,e_FWHM,logL,e_logL,logM,E_logM,e_logM
count,88.0,88.0,88.0,71.0,71.0,71.0,71.0,88.0,15.0,88.0
mean,0.048665,117.142045,11.805682,3206.056338,210.760563,41.504225,0.078028,6.86625,0.140667,0.189886
std,0.032562,48.285108,5.308383,1759.679743,191.219953,0.663268,0.0417,0.72825,0.074303,0.17247
min,0.000947,30.0,2.9,810.0,41.0,40.1,0.03,4.9,0.02,0.02
25%,0.02775,87.025,7.75,1905.0,110.0,41.155,0.05,6.3,0.1,0.1
50%,0.04225,113.5,12.0,2970.0,160.0,41.51,0.07,7.0,0.12,0.1
75%,0.0622,139.25,15.0,3870.0,210.0,41.86,0.09,7.4075,0.17,0.2
max,0.184,268.0,30.0,8240.0,1190.0,43.61,0.2,8.52,0.31,1.06


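The dataframe contains data for 88 supermassive black holes. For each numeric column, the summary lists the number of non-missing entries, the mean, the standard deviation, the minimum and maximum values, and the quartiles. Note that `count` is below 88 for some columns (71 for `FWHM`, `e_FWHM`, `logL` and `e_logL`; 15 for `E_logM`), so these columns have missing values.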
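
Missing entries can also be counted directly per column with `isna().sum()`. The sketch below uses only the five rows printed by `df.head()` (the frame name `sample` is chosen for this illustration); on the full dataset the same call would be `df.isna().sum()`.

```python
import numpy as np
import pandas as pd

# Values copied from the df.head() output; empty cells are stored as NaN
sample = pd.DataFrame({
    'logM':     [7.7, 6.7, 6.8, 7.0, 7.5],
    'E_logM':   [np.nan, np.nan, np.nan, np.nan, np.nan],
    'e_logM':   [0.1, 0.1, 0.1, 0.1, 0.1],
    'n_sigma*': [np.nan, np.nan, np.nan, 'a', np.nan],
})

# Number of missing entries in each column
missing = sample.isna().sum()
print(missing)
```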

The columns correspond to

**z** : Redshift \
**sigma**\* : Stellar velocity dispersion \
**e_sigma**\* : Formal uncertainty in sigma* \
**FWHM** : H<sub>$\alpha$</sub> Full-Width at Half Maximum \
**e_FWHM** : Formal uncertainty in FWHM \
**logL** : $\log_{10}$ of H<sub>$\alpha$</sub> luminosity in erg/s \
**e_logL** : Formal uncertainty in logL \
**logM** : $\log_{10}$ of the Black Hole mass \
**E_logM** : Formal (upper limit) uncertainty in logM \
**e_logM** : Formal (lower limit) uncertainty in logM 

---

### Plotting the Data

We will use the `Altair` package to plot the dataframe.

---

The first chart looks like this

In [None]:
alt.Chart(df).mark_point()

This chart draws every row of the dataframe at the same position, so all the data appear as a single point, because we have not defined any axis yet. To spread the data out, we map a column to one of the axes. For example, we set the x-axis to the redshift, $z$.

In [None]:
# put some data along the x axis
alt.Chart(df).mark_point().encode(
    x = 'z'
)

Instead of points, we can use ticks to visualize the data.

See https://altair-viz.github.io/user_guide/marks.html

In [None]:
# Make the chart clearer by changing the mark from point to tick
alt.Chart(df).mark_tick().encode(
    x = 'z'
)

Now, we use the vertical direction for another feature, for example $\log M$.

In [None]:
alt.Chart(df).mark_point().encode(
    x = 'z',
    y = 'logM'
)

Another way to assign columns to the axes is through the `alt.X` and `alt.Y` channel objects:

In [None]:
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('z'),
    alt.Y('logM')
)

Defining the axes this way makes it simple to change the scale.

In [None]:
alt.Chart(df).mark_point(filled = True, color = 'black').encode(
    alt.X('z',scale=alt.Scale(type='log', base=10)),
    alt.Y('logM')
)
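
A logarithmic scale is only defined for strictly positive values, so it is worth checking a column before applying one. In this dataset the minimum redshift reported by `describe()` is 0.000947, so `z` qualifies. The helper `is_log_safe` below is a hypothetical name introduced for this sketch, not part of Altair or pandas.

```python
import pandas as pd

def is_log_safe(series):
    """Return True if every non-missing value is strictly positive."""
    values = series.dropna()
    return bool((values > 0).all())

# Redshift values copied from the df.head() output
z = pd.Series([0.0454, 0.0419, 0.0456, 0.0772, 0.0426], name='z')

print(is_log_safe(z))                     # all values are positive
print(is_log_safe(pd.Series([0.0, 1.0]))) # a zero cannot go on a log axis
```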

Now, let's plot other features.

In [None]:
alt.Chart(df).mark_point().encode(
    alt.X('sigma*',scale=alt.Scale(type='log', base=10)),
    alt.Y('logM')
)

And now, we will add some color according to the values of another feature.

In [None]:
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('sigma*', scale=alt.Scale(type='log', base=10)),
    alt.Y('logM'),
    color = 'z'
)

What about using the mass as the color of the points?

In [None]:
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('z'),
    alt.Y('sigma*'),
    color = 'logM'
)

Or we can use the mass as the size of the points.

In [None]:
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('z'),
    alt.Y('sigma*'),
    size = 'logM'
)

### Interactivity

Now the interesting part: interactivity. All the details can be found at

https://altair-viz.github.io/user_guide/interactions.html

and 

https://altair-viz.github.io/gallery/index.html#gallery-category-interactive-charts

We begin with the simple plot

In [None]:
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('sigma*', scale=alt.Scale(type='log', base=10)),
    alt.Y('logM'),
    color = 'z'
)

First, add the `tooltip` encoding to show the `Name` of an object when the cursor hovers over its point.

In [None]:
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('sigma*', scale=alt.Scale(type='log', base=10)),
    alt.Y('logM'),
    color = 'z',
    tooltip = 'Name' 
)

Next, add a selection to the chart:

In [None]:
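# Create an interval selection (click and drag to select a region).
# Note: Altair 5 deprecates add_selection in favor of add_params.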
brush = alt.selection_interval()

alt.Chart(df).mark_point(filled=True).encode(
    alt.X('sigma*', scale=alt.Scale(type='log', base=10)),
    alt.Y('logM'),
    color = 'z',
    tooltip = 'Name' 
).add_selection(
    brush
)

By clicking and dragging with the mouse, it is possible to select a region. However, we have not yet defined what to do with the selected points. Now, we will color only the selected points and draw the rest in light gray:

In [None]:
brush = alt.selection_interval()

alt.Chart(df).mark_point(filled=True).encode(
    alt.X('sigma*', scale=alt.Scale(type='log', base=10)),
    alt.Y('logM'),
    color = alt.condition(brush,'z', alt.value('lightgray')),
    tooltip = 'Name' 
).add_selection(
    brush
)

With just one modification, restricting the brush to the x encoding, the selection becomes a vertical band:

In [None]:
brush = alt.selection_interval(encodings=['x'])

alt.Chart(df).mark_point(filled=True).encode(
    alt.X('sigma*', scale=alt.Scale(type='log', base=10)),
    alt.Y('logM'),
    color = alt.condition(brush,'z', alt.value('lightgray')),
    tooltip = 'Name' 
).add_selection(
    brush
)

Binding an interval selection to the scales lets us pan and zoom the plot:

In [None]:
scales = alt.selection_interval(bind='scales')

alt.Chart(df).mark_point(filled=True).encode(
    alt.X('sigma*', scale=alt.Scale(type='log', base=10)),
    alt.Y('logM'),
    color = 'z',
    tooltip = 'Name' 
).add_selection(
    scales
)

Finally, we can create a slider to choose a cutoff value and color the points according to whether their redshift is below or above this value.

In [None]:
# Definition of the slider
slider = alt.binding_range(min=0, max=0.184, step=0.001, name='z cutoff:')
selector = alt.selection_single(name='SelectorName',fields=['cutoff'],
                               bind=slider, init={'cutoff':0.09})

alt.Chart(df).mark_point().encode(
    alt.X('sigma*', scale=alt.Scale(type='log', base=10)),
    alt.Y('logM'),
    tooltip = 'Name',
    color = alt.condition(
        alt.datum.z < selector.cutoff,
        alt.value('blue'),
        alt.value('red')
    )
).add_selection(
    selector
)