# Visualization tools in Python

There are many packages for visualization in Python, which can be grouped into several broad categories:

* General plotting libraries where users control all elements in the plot and add different kinds of visual elements individually. [Matplotlib](https://matplotlib.org) is an example of such a package (although it does also have higher-level functionality).
* High-level plotting libraries that let the user focus on specifying conceptually what to show, leaving the details of the plotting to the library. [Seaborn](https://seaborn.pydata.org/) and [Altair](https://altair-viz.github.io/) are examples of this.
* More specialized visualization libraries that focus on a particular, possibly domain-specific, kind of visualization, for example [PyWWT](https://pywwt.readthedocs.io/en/stable/), which provides an interface to [WorldWide Telescope](http://worldwidetelescope.org).
* Visualization applications that provide additional functionality (such as data management) or user interfaces beyond the visualizations themselves. An example of such an application is [glue](http://www.glueviz.org).

The [PyViz](https://pyviz.org/) website was recently set up as a portal for finding out about all the different visualization tools in Python. In this tutorial, we look at examples of packages in the first three categories above.

To run this notebook, make sure you have the following packages installed:

* [NumPy](https://numpy.org)
* [Astropy](https://www.astropy.org)
* [Matplotlib](https://matplotlib.org)
* [seaborn](https://seaborn.pydata.org/)
* [Altair](https://altair-viz.github.io/index.html)
* [PyWWT](https://pywwt.readthedocs.io)

You can install these with:

    conda install -c conda-forge numpy matplotlib astropy seaborn altair vega pywwt notebook
    
or

    pip install numpy matplotlib astropy seaborn altair vega pywwt notebook
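Before running the rest of the notebook, you may want to check which of these packages the current environment can import. The sketch below uses only the standard library. The helper name `missing_packages` is hypothetical. It assumes that the import names match the conda/pip package names in the commands above, which is the case for this particular list.

```python
import importlib.util

# Import names of the packages installed by the commands above
# (for this list, the package names and import names coincide).
PACKAGES = ("numpy", "matplotlib", "astropy", "seaborn", "altair",
            "vega", "pywwt", "notebook")


def missing_packages(names=PACKAGES):
    """Return the names the import system cannot find, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing packages:", missing_packages() or "none")
```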
    
For clarity, we hide warnings:

In [None]:
import warnings
warnings.simplefilter('ignore')
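The call above silences every warning for the rest of the session, including warnings that might point to real problems. If you would rather hide warnings only around one operation, `warnings.catch_warnings()` restores the previous filters when its `with` block exits. Here is a minimal sketch. The helper name `quiet_log10` is hypothetical. NumPy's own `np.errstate(divide='ignore')` context manager is an alternative for this specific case.

```python
import warnings

import numpy as np


def quiet_log10(values):
    """Base-10 logarithm that hides NumPy's RuntimeWarning (e.g. for log10(0))."""
    with warnings.catch_warnings():
        # This filter only applies inside the with-block; the previous
        # filters are restored automatically when the block exits.
        warnings.simplefilter("ignore", RuntimeWarning)
        return np.log10(np.asarray(values, dtype=float))


print(quiet_log10([0.0, 1.0, 100.0]))  # log10(0) gives -inf, without a warning
```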

## Dataset

For this notebook, we use the table of confirmed exoplanets from the [NASA Exoplanet Archive](https://exoplanetarchive.ipac.caltech.edu/) as of 20 August 2019, stored as an IPAC-format table file:

In [None]:
from astropy.table import Table

In [None]:
catalog = Table.read('planets_2019.08.20_09.21.47.tbl', format='ascii.ipac')

In [None]:
len(catalog)

In [None]:
catalog[:3]

## General plotting libraries

One of the most popular plotting libraries in Python is [Matplotlib](https://matplotlib.org). It provides a very wide range of functionality and, when needed, lets you drop down to a low level and modify virtually any element of the plot. Let's start by making a simple scatter plot of planet mass against orbital period on logarithmic axes:

In [None]:
import matplotlib.pyplot as plt

In [None]:
ax = plt.subplot(1, 1, 1)
ax.loglog(catalog['pl_orbper'], catalog['pl_bmassj'], '.')
ax.set_xlabel('Orbital Period [days]')
ax.set_ylabel('Planet Mass [Mjup]')
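The cell above goes through `pyplot`, which keeps track of a "current" figure for you. Below is a minimal sketch of the same kind of plot built with Matplotlib's object-oriented API. Because it creates a `matplotlib.figure.Figure` directly, it does not depend on the active backend. The function name `period_mass_figure` is hypothetical, and `ax.scatter` is used in place of `ax.loglog(..., '.')`, with the log scales set explicitly.

```python
from matplotlib.figure import Figure


def period_mass_figure(period, mass):
    """Return (fig, ax) with planet mass against orbital period on log-log axes."""
    fig = Figure(figsize=(6, 4))
    ax = fig.add_subplot()
    # scatter() draws markers only, like the '.' format string in loglog()
    ax.scatter(period, mass, s=4)
    ax.set_xscale("log")
    ax.set_yscale("log")
    ax.set_xlabel("Orbital Period [days]")
    ax.set_ylabel("Planet Mass [Mjup]")
    return fig, ax
```

In this notebook, it could be called as `fig, ax = period_mass_figure(catalog['pl_orbper'], catalog['pl_bmassj'])` and saved with `fig.savefig('period_mass.png')`, where the filename is only an example.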

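Matplotlib is great when you want full control over a plot. It also supports interactive plots in the classic Jupyter Notebook through the ``%matplotlib notebook`` magic command. In JupyterLab, the equivalent is ``%matplotlib widget``, which requires the ipympl package: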

In [None]:
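%matplotlib notebook

# Only use scientific notation for log-axis tick labels whose exponent is 3 or
# more in magnitude (so 0.01 and 100 appear as plain numbers). Set this before
# plotting; note that it also applies to all later plots in this session.
plt.rcParams['axes.formatter.min_exponent'] = 3

ax = plt.subplot(1, 1, 1)

# Planet mass against orbital period on log-log axes
ax.loglog(catalog['pl_orbper'], catalog['pl_bmassj'], '.')

ax.annotate('Hot Jupiters', xy=(5, 1),
            xytext=(0.06, 200), arrowprops={'arrowstyle': '->'})

ax.set_xlim(0.01, 1e6)
ax.set_ylim(1e-4, 1e3)

ax.set_title('All confirmed exoplanets to date', weight='bold')

ax.grid()

ax.set_xlabel('Orbital Period [days]', weight='bold')
ax.set_ylabel('Planet Mass [Mjup]', weight='bold')

# Mark the Earth: orbital period of 365.25 days, mass of ~0.00314 Jupiter masses
ax.plot([365.25], [0.00314], 'o', mec='black', mfc='lightgreen', markersize=10)

ax.annotate('Earth', xy=(500, 0.00314),
            xytext=(3000, 0.00314), arrowprops={'arrowstyle': '->'})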

Matplotlib is extremely versatile, and you can find many examples and types of plots in the [example gallery](https://matplotlib.org/gallery.html).
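The Earth marker in the interactive plot uses a mass of about 0.00314 Jupiter masses, because the y-axis is in units of Jupiter masses. That value is simply the ratio of the two planetary masses. The sketch below checks it with rounded reference masses in kilograms. With Astropy, `(astropy.constants.M_earth / astropy.constants.M_jup).decompose()` should give the same ratio.

```python
# Rounded reference masses in kilograms
M_EARTH_KG = 5.972e24
M_JUP_KG = 1.898e27

# Earth's mass expressed in Jupiter masses (the unit of the y-axis)
earth_mass_in_mjup = M_EARTH_KG / M_JUP_KG
print(f"1 Earth mass = {earth_mass_in_mjup:.5f} Jupiter masses")
```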

## High-level plotting libraries

### Seaborn

First, we look at [Seaborn](https://seaborn.pydata.org/), which is built on top of Matplotlib. Seaborn aims to make it easier to create various kinds of statistical plots and to provide nicer default styles than Matplotlib. Seaborn works with pandas DataFrame objects. We therefore first convert any byte-string columns in the catalog to Unicode strings, and then convert the catalog to a DataFrame:

In [None]:
catalog.convert_bytestring_to_unicode()

In [None]:
import numpy as np
df = catalog.to_pandas()

Since setting up log scales in these plots can be a little verbose, we pre-compute the base-10 logarithm of a few useful columns:

In [None]:
df['log_pl_orbper'] = np.log10(df['pl_orbper'])
df['log_pl_radj'] = np.log10(df['pl_radj'])
df['log_pl_bmassj'] = np.log10(df['pl_bmassj'])
df['log_st_teff'] = np.log10(df['st_teff'])
df['log_gaia_dist'] = np.log10(df['gaia_dist'])
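With the assignments above, `np.log10` turns a zero into `-inf` (with a warning) and a negative value into `NaN`, and both can later upset plots or fits. Below is a minimal sketch of a helper that computes the same `log_`-prefixed columns but turns non-positive values into `NaN` first. The function name `add_log_columns` is hypothetical, and the toy DataFrame contains made-up values for illustration only.

```python
import numpy as np
import pandas as pd


def add_log_columns(df, columns, prefix="log_"):
    """Return a copy of df with log10 versions of `columns` added.

    Values that are zero or negative have no real logarithm, so they are
    replaced by NaN before taking the log (missing values stay NaN).
    """
    out = df.copy()
    for name in columns:
        values = out[name].astype(float)
        out[prefix + name] = np.log10(values.where(values > 0))
    return out


# Toy values for illustration only
toy = add_log_columns(pd.DataFrame({"pl_orbper": [1.0, 10.0, 0.0, np.nan]}),
                      ["pl_orbper"])
print(toy)
```

In this notebook, the five assignments above could then become `df = add_log_columns(df, ['pl_orbper', 'pl_radj', 'pl_bmassj', 'st_teff', 'gaia_dist'])`.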

Next, we switch the Matplotlib backend back to the non-interactive inline backend, import seaborn, and apply seaborn's default plot style:

In [None]:
%matplotlib inline
import seaborn as sns
sns.set()

We can then easily make a variety of statistical plots, for example a joint distribution plot of orbital period and planet mass (both as log10 values):

In [None]:
sns.jointplot(data=df, x='log_pl_orbper', y='log_pl_bmassj')

A grid of pairwise scatter plots between several columns, with the distribution of each column on the diagonal:

In [None]:
sns.pairplot(data=df, vars=['log_pl_orbper', 'log_pl_bmassj', 'log_st_teff', 'log_gaia_dist'])
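The pair plot shows the relationships between columns visually. To put numbers on them, pandas can compute a correlation matrix directly. Below is a minimal sketch. The helper name `pairwise_correlations` is hypothetical, and the toy DataFrame contains made-up values chosen so that the correlations are exactly +1 and -1.

```python
import pandas as pd


def pairwise_correlations(df, columns, method="pearson"):
    """Correlation matrix for `columns`.

    pandas drops missing values pair by pair, so each entry may be based on
    a different number of rows.
    """
    return df[columns].corr(method=method)


# Toy values for illustration only
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],   # increases with a
    "c": [4.0, 3.0, 2.0, 1.0],   # decreases with a
})
toy_corr = pairwise_correlations(toy, ["a", "b", "c"])
print(toy_corr)
```

In this notebook, it could be applied as `pairwise_correlations(df, ['log_pl_orbper', 'log_pl_bmassj', 'log_st_teff', 'log_gaia_dist'])`.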

A violin plot showing the distribution of planet masses for some of the discovery methods:

In [None]:
sns.violinplot(x="log_pl_bmassj", y="pl_discmethod", data=df, orient='h', width=1.5,
               order=['Radial Velocity', 'Transit', 'Microlensing', 'Imaging'])

And scatter plots with linear regression fits, one panel per discovery method:

In [None]:
sns.lmplot(x="log_pl_bmassj", y="log_pl_radj", hue='pl_discmethod', col='pl_discmethod',
           col_order=['Radial Velocity', 'Transit'], data=df);
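`lmplot` draws an ordinary least-squares straight line for each panel, together with a bootstrapped confidence band. To get the fitted slope and intercept as numbers, a similar line can be fitted with `np.polyfit`. Below is a minimal sketch, not seaborn's own code. The helper name `fit_lines` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def fit_lines(df, x, y, by, groups):
    """Least-squares fit y = slope * x + intercept for each group.

    Returns a dict mapping each group to (slope, intercept), or to None when
    fewer than two rows have finite x and y values.
    """
    results = {}
    for group in groups:
        sub = df.loc[df[by] == group, [x, y]].astype(float)
        # Drop rows where either value is NaN or infinite (e.g. log10 of 0)
        sub = sub[np.isfinite(sub[x]) & np.isfinite(sub[y])]
        if len(sub) < 2:
            results[group] = None
            continue
        # np.polyfit returns coefficients from the highest power down
        slope, intercept = np.polyfit(sub[x], sub[y], deg=1)
        results[group] = (float(slope), float(intercept))
    return results


# Toy values chosen so the fits are exact, for illustration only
toy = pd.DataFrame({
    "method": ["A", "A", "A", "B", "B", "B", "B"],
    "x": [0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 3.0],
    "y": [1.0, 3.0, 5.0, 0.0, -1.0, -2.0, np.nan],
})
toy_fits = fit_lines(toy, "x", "y", "method", ["A", "B", "C"])
print(toy_fits)
```

In this notebook, the equivalent call would be `fit_lines(df, 'log_pl_bmassj', 'log_pl_radj', 'pl_discmethod', ['Radial Velocity', 'Transit'])`.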

You can find many more examples in the [example gallery](https://seaborn.pydata.org/examples/index.html).

### Altair

Another example is the [Altair](https://altair-viz.github.io/) package, which is based on [Vega](https://vega.github.io/vega/) and [Vega-Lite](https://vega.github.io/vega-lite/). These are 'grammars' for visualization, that is, ways of specifying the content of a visualization without worrying about exactly how it is rendered. Vega and Vega-Lite specifications are written in JSON, which can be used by a variety of packages. Altair is a Python package that makes it easy to produce such visualizations and, optionally, to serialize them to JSON for exchange with other applications.
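To make the idea of a JSON-based grammar concrete, here is a minimal sketch of a Vega-Lite specification written by hand as a Python dictionary. It is similar in spirit to what Altair generates for the point charts in this section. Altair's `Chart.to_dict()` and `Chart.to_json()` return the specification it actually generates. The helper name `scatter_spec`, the schema version in the `$schema` URL, and the placeholder row are assumptions for illustration.

```python
import json


def scatter_spec(rows, x_field, y_field, log_x=True):
    """Build a minimal Vega-Lite point-chart specification as a plain dict."""
    x_encoding = {"field": x_field, "type": "quantitative"}
    if log_x:
        x_encoding["scale"] = {"type": "log"}
    return {
        # Schema version is an assumption; Altair writes the one it targets
        "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
        "data": {"values": rows},
        "mark": "point",
        "encoding": {
            "x": x_encoding,
            "y": {"field": y_field, "type": "nominal"},
        },
    }


# Placeholder row for illustration only, not taken from the catalog
example_rows = [{"pl_bmassj": 1.0, "pl_discmethod": "Transit"}]
spec = scatter_spec(example_rows, "pl_bmassj", "pl_discmethod")
print(json.dumps(spec, indent=2))
```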

To start off, we produce a ``Chart`` object:

In [None]:
import altair as alt
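# Renderer for the classic Jupyter Notebook used by Altair 3 (needs the vega package);
# newer Altair releases display charts with their default renderer instead
alt.renderers.enable('notebook')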
chart = alt.Chart(df)

This doesn't show anything yet, because we haven't specified how the data should be drawn. We can tell Altair to draw each row as a point with:

In [None]:
chart.mark_point()

We haven't told Altair which columns to put on which axis, so by default it shows all the points at the same location. We can now specify which column to show on the y-axis:

In [None]:
chart.mark_point().encode(
    y='pl_discmethod'
)

and similarly for the x-axis, here using a logarithmic scale for the planet mass:

In [None]:
chart.mark_point().encode(
    x=alt.X('pl_bmassj', scale=alt.Scale(type='log')),
    y='pl_discmethod'
)

We can also aggregate the data, for example to show the average planet mass for each discovery method:

In [None]:
chart.mark_point().encode(
    x=alt.X('average(pl_bmassj)', scale=alt.Scale(type='log')),
    y='pl_discmethod'
)
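To check the numbers behind this chart, the same per-method averages can be computed with pandas. Below is a minimal sketch. It assumes that, like pandas, Vega-Lite's `average` skips missing values. The average is an ordinary linear mean; the log scale only affects how it is displayed. The helper name `summarize_by_group` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def summarize_by_group(df, value, by):
    """Mean and number of non-missing values of `value` for each category in `by`.

    pandas skips NaN when computing the mean, and 'count' counts only
    non-missing values, so both columns describe the same rows.
    """
    summary = df.groupby(by)[value].agg(["mean", "count"])
    return summary.sort_values("mean")


# Toy values for illustration only
toy = pd.DataFrame({
    "method": ["Radial Velocity", "Radial Velocity", "Transit", "Transit"],
    "mass": [1.0, 3.0, 0.5, np.nan],
})
toy_summary = summarize_by_group(toy, "mass", "method")
print(toy_summary)
```

In this notebook, it could be applied as `summarize_by_group(df, 'pl_bmassj', 'pl_discmethod')`.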

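Altair also supports interactive plots. Here, ``.interactive()`` enables panning and zooming, and ``tooltip`` lists the columns shown when hovering over a point: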

In [None]:
alt.Chart(df).mark_circle(size=60).encode(
    x=alt.X('pl_orbper', scale=alt.Scale(type='log'), axis=alt.Axis(grid=False)),
    y=alt.Y('pl_bmassj', scale=alt.Scale(type='log'), axis=alt.Axis(grid=False)),
    tooltip=['pl_name', 'pl_discmethod', 'pl_orbper', 'pl_bmassj'],
).interactive()

and many more kinds of plots, which you can see in the [example gallery](https://altair-viz.github.io/gallery/index.html).

## Specialized visualizations

The third main category consists of visualization packages that offer specialized, in some cases domain-specific, types of plots. Here we try out the [PyWWT](https://pywwt.readthedocs.io/en/stable/) package, which provides an interface to [WorldWide Telescope](http://worldwidetelescope.org).

In [None]:
from pywwt.jupyter import WWTJupyterWidget

In [None]:
wwt = WWTJupyterWidget()
wwt

In [None]:
wwt.layer_controls

Note: to zoom, hold down the Shift key while scrolling (for example, with a two-finger scroll on a trackpad).

PyWWT makes it possible to plot tabular data (and soon image data) using:

In [None]:
layer = wwt.layers.add_data_layer(table=catalog, lon_att='ra', lat_att='dec', frame='Sky')

We can change the size of the points with:

In [None]:
layer.size_scale = 100

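We can also color the points according to a column, here the stellar effective temperature ``st_teff`` (in K), with the color scale running from 3000 K to 6000 K: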

In [None]:
layer.cmap_att = 'st_teff'
layer.cmap_vmin = 3000
layer.cmap_vmax = 6000
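A common way to read `cmap_vmin` and `cmap_vmax` is as the limits of a linear color scale. Temperatures at or below 3000 K get the first color of the colormap, temperatures at or above 6000 K get the last, and values in between are interpolated. The sketch below assumes that linear, clipped mapping, which matches Matplotlib's `matplotlib.colors.Normalize(vmin, vmax, clip=True)`. It illustrates the idea and is not WWT's internal code. The function name `normalize_for_colormap` is hypothetical.

```python
import numpy as np


def normalize_for_colormap(values, vmin, vmax):
    """Map values linearly onto [0, 1], clipping anything outside [vmin, vmax].

    NaN values stay NaN.
    """
    values = np.asarray(values, dtype=float)
    return np.clip((values - vmin) / (vmax - vmin), 0.0, 1.0)


# Example temperatures in K, mapped with the limits used above
print(normalize_for_colormap([2000.0, 3000.0, 4500.0, 6000.0, 8000.0], 3000, 6000))
```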

WorldWide Telescope also has a full 3D mode, which can be enabled as follows:

In [None]:
wwt.set_view('solar system')
wwt.solar_system.cosmos = False  # disable large-scale structure for performance

We can now tell the data layer to use the Gaia distance (``gaia_dist``, in parsecs) as the third dimension:

In [None]:
layer.alt_att = 'gaia_dist'
layer.alt_unit = 'pc'
layer.far_side_visible = True
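Conceptually, placing each host star in 3D from its right ascension, declination, and distance is a spherical-to-Cartesian conversion. WWT does this internally, and its axis conventions may differ from the ones below. The sketch uses standard equatorial axes centered on the Sun (x towards RA = 0, Dec = 0; z towards the north celestial pole) and only illustrates the idea. It is not PyWWT's implementation. The function name `sky_to_cartesian` is hypothetical.

```python
import numpy as np


def sky_to_cartesian(ra_deg, dec_deg, distance):
    """Convert RA/Dec (degrees) and distance to Cartesian x, y, z.

    The output is in the same unit as `distance` (e.g. parsecs for gaia_dist).
    """
    ra = np.radians(ra_deg)
    dec = np.radians(dec_deg)
    d = np.asarray(distance, dtype=float)
    x = d * np.cos(dec) * np.cos(ra)
    y = d * np.cos(dec) * np.sin(ra)
    z = d * np.sin(dec)
    return x, y, z


# Three points at 10 pc along the x, y, and z axes respectively
x, y, z = sky_to_cartesian([0.0, 90.0, 0.0], [0.0, 0.0, 90.0], [10.0, 10.0, 10.0])
print(np.round(x, 6), np.round(y, 6), np.round(z, 6))
```

Applied to the catalog, this would be `sky_to_cartesian(catalog['ra'], catalog['dec'], catalog['gaia_dist'])`.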