# Session 03: Intro to Matplotlib Plotting on the Science Platform

<br>Owner(s): **Keith Bechtol** ([@bechtol](https://github.com/LSSTScienceCollaborations/StackClub/issues/new?body=@bechtol))
<br>Last Verified to Run: **2020-05-19**
<br>Verified Stack Release: **w_2020_19**

This notebook is intended as a warm-up to the Visualization lesson (Lesson 3), providing a brief introduction to data visualization with [matplotlib](https://matplotlib.org/). Matplotlib is one of the most widely used libraries for data visualization in astrophysics, and extensive documentation as well as many [examples](https://matplotlib.org/gallery/index.html) can be found with a quick web search. Matplotlib is also part of the [PyViz](https://pyviz.org/) suite of visualization tools for Python. This notebook walks through a few examples to get you started quickly plotting data from Rubin Observatory and precursor datasets.

Today we'll cover:
* How to create a few common types of plots for tabular data (histograms, scatter plots)
* How to customize plot style, e.g., colors, marker style, axis labels, legends, etc.

We'll use the same datasets 

In [None]:
# What version of the Stack am I using?
! echo $HOSTNAME
! eups list -s lsst_distrib

## Preliminaries

Let's begin by importing plotting packages. Right away, we are faced with a choice as to which [backend](https://matplotlib.org/faq/usage_faq.html#what-is-a-backend) to use for plotting. For this demo, we'll use the [ipympl](https://github.com/matplotlib/ipympl) backend that allows us to create interactive plots (e.g., pan, zoom, and resize canvas capability) in a JupyterLab notebook. This option is enabled with the line

```%matplotlib widget```

Once the backend is set, one needs to restart the kernel to select a different backend. Alternatively, one could use the *inline* backend, if no user interactivity is required.

```%matplotlib inline```

It appears that the *inline* backend is used by default on the Science Platform.
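
To confirm which backend is actually active in a given session, matplotlib can report it directly. The snippet below is a minimal sketch; the variable name `active_backend` is only illustrative, and the string returned depends on the environment in which it is run.

```python
import matplotlib

# Ask matplotlib which backend is currently selected.
# The exact string depends on the environment (inline, widget/ipympl, Agg, ...).
active_backend = matplotlib.get_backend()
print(active_backend)
```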

The relationship between matplotlib, pyplot, and pylab is discussed [here](https://matplotlib.org/faq/usage_faq.html#matplotlib-pyplot-and-pylab-how-are-they-related).

In [None]:
# Non-interactive plots
#%matplotlib inline
# Enable interactive plots
%matplotlib widget 
#%matplotlib ipympl

import numpy as np
import matplotlib
import matplotlib.pyplot as plt

We can [customize plotting style with matplotlib](https://matplotlib.org/3.2.1/tutorials/introductory/customizing.html) by setting default parameters. This is an optional step if you are fine with the default style.

In [None]:
matplotlib.rcParams["figure.figsize"] = (6, 4)
matplotlib.rcParams["font.size"] = 10
matplotlib.rcParams["figure.dpi"] = 120
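
Setting `rcParams` as above changes the defaults for the rest of the session. If only one figure needs a different style, the overrides can be scoped with a context manager instead. The following is a minimal sketch using `plt.rc_context`; the specific values (font size 14, a 4 x 3 inch figure) are arbitrary choices for illustration.

```python
import matplotlib
import matplotlib.pyplot as plt

# Remember the current default so we can confirm it is restored afterwards
default_font_size = matplotlib.rcParams["font.size"]

# Inside the "with" block the overrides apply; outside, the defaults return
with plt.rc_context({"font.size": 14, "figure.figsize": (4, 3)}):
    font_size_inside = matplotlib.rcParams["font.size"]
    fig, ax = plt.subplots()
    ax.set_title("Temporary style")

font_size_after = matplotlib.rcParams["font.size"]
print(font_size_inside, font_size_after == default_font_size)
```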

## Abstract Example

Let's do one completely abstract example just for illustration purposes. First, make some simple data.

In [None]:
x = np.linspace(0, 2 * np.pi, 100)
y1 = x
y2 = x**3
y3 = np.cos(x)
y4 = np.sin(x)

Now, we can very quickly get started. 

**Exercise:** Use the interactive widgets to pan and zoom, and to adjust the canvas size. The "Home" button should bring you back to the original figure.

In [None]:
plt.figure()
plt.scatter(x, y4)

We can enhance the figure with various labels and adjust the visual appearance.

**Exercise:** modify the cell below to change the plotting style.

In [None]:
plt.figure()
plt.plot(x, y1, 
         label='y1')
plt.plot(x, y2, 
         lw=2, label='y2')
plt.plot(x, y3, 
         ls='--', label='y3')
plt.scatter(x, y4, 
            c='black')
plt.plot(x, y4, label='y4')
plt.xlabel('The horizontal axis')
plt.ylabel('The vertical axis')
plt.title('My Plot')
plt.xlim(0., 2 * np.pi)
plt.ylim(-2, 2.)
plt.legend(loc='upper right')

The MATLAB-like pyplot style used above is suitable for quick plotting, but for more involved applications, the more verbose object-oriented style, in which plotting methods are called on explicit `Figure` and `Axes` objects, is advised. The example below is directly inspired by the example [here](https://matplotlib.org/faq/usage_faq.html#coding-styles).

In [None]:
def my_plotter(ax, data1, data2, param_dict):
    """
    A helper function to make a graph

    Parameters
    ----------
    ax : Axes
        The axes to draw to

    data1 : array
       The x data

    data2 : array
       The y data

    param_dict : dict
       Dictionary of kwargs to pass to ax.plot

    Returns
    -------
    out : list
        list of artists added
    """
    out = ax.plot(data1, data2, **param_dict)
    ax.legend()
    return out

Re-create the earlier example in the object-oriented style, calling methods on an explicit `Axes` object.

In [None]:
fig, ax = plt.subplots(1, 1)
my_plotter(ax, x, y4, {'marker':'o', 'label':'y4'})
ax.set_xlabel('The Horizontal Axis')
ax.set_ylabel('The Vertical Axis')
ax.set_xlim(0, 2. * np.pi)

In the example above, we have specified which `axes` to draw the plot on. A single figure can own multiple axes, as can be seen in the example below. The object-oriented style shows its power as we create more complex visualizations.

In [None]:
fig, ax = plt.subplots(2, 2)
my_plotter(ax[0][0], x, y1, {'marker':'x', 'label':'y1'})
my_plotter(ax[0][1], x, y2, {'marker':'o', 'label':'y2'})
my_plotter(ax[1][0], x, y3, {'color':'red', 'label':'y3'})
my_plotter(ax[1][1], x, y4, {'ls':'--', 'label':'y4'})
#plt.subplots_adjust(hspace=0, left=0.25) # Optionally adjust the layout
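
When the panels follow a common pattern, the array of axes returned by `plt.subplots` can also be looped over rather than indexed one panel at a time. The sketch below assumes nothing beyond NumPy and matplotlib; the names `x_demo`, `curves`, and `panel` are illustrative, and `sharex=True` is an optional choice that links the horizontal axes.

```python
import numpy as np
import matplotlib.pyplot as plt

x_demo = np.linspace(0, 2 * np.pi, 100)
curves = {
    "x": x_demo,
    "x**3": x_demo**3,
    "cos(x)": np.cos(x_demo),
    "sin(x)": np.sin(x_demo),
}

# sharex=True links the horizontal axes of all panels
fig, axes = plt.subplots(2, 2, sharex=True)

# axes is a 2x2 NumPy array of Axes objects; .flat iterates over all four
for panel, (name, values) in zip(axes.flat, curves.items()):
    panel.plot(x_demo, values, label=name)
    panel.legend()

# Reduce overlap between tick labels of neighboring panels
fig.tight_layout()
```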

Before moving on, a quick histogram example.

In [None]:
# Generate 1M random points from a normal distribution
z = np.random.normal(size=1000000)

# Specify the binning
bins = np.linspace(-5., 5, 101)

# Now create the figure
plt.figure()
plt.hist(z, bins=bins)
plt.xlabel('Value')
plt.ylabel('Counts')
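
The histogram above shows raw counts per bin. To compare a sample with an analytic probability density, `hist` can normalize the bar heights with `density=True`. This is a sketch on synthetic data (a seeded random sample, smaller than the one above to keep it quick); the variable names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)  # seeded only so the sketch is repeatable
z_demo = rng.normal(size=100_000)
bins_demo = np.linspace(-5.0, 5.0, 101)

fig, ax = plt.subplots()
# density=True rescales the bar heights so the histogram integrates to 1
heights, edges, _ = ax.hist(z_demo, bins=bins_demo, density=True, label='Samples')

# Overlay the analytic unit Gaussian evaluated at the bin centers
centers = 0.5 * (edges[:-1] + edges[1:])
ax.plot(centers, np.exp(-0.5 * centers**2) / np.sqrt(2 * np.pi), label='Unit Gaussian')
ax.set_xlabel('Value')
ax.set_ylabel('Probability density')
ax.legend()
```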

## Access HSC Data

Now let's access some data. Specifically, we'll use a utility function to assemble a catalog of good-quality coadd objects from a few neighboring patches in the HSC RC2 dataset used for continuous integration testing. The function returns a Python dictionary of pandas [DataFrames](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html), with one DataFrame per band. DataFrames are general-purpose tools for working with tabular data and have no specific connection to astronomy or the software stack. The process to create this catalog is inspired by the Stack tutorial [here](https://pipelines.lsst.io/getting-started/multiband-analysis.html).

The cell below typically takes roughly a minute to run.

In [None]:
%%time

import utils
data = utils.getData()

In [None]:
# In case we need to modify the utility function
#import importlib
#importlib.reload(utils)

Let's look at the dictionary object returned. Notice that we added columns for the PSF and CModel magnitudes.

In [None]:
# Show the keys
print(data.keys())
# Show one of the DataFrames, in this case the one corresponding to the HSC i band
data['HSC-I'].head()
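
For readers new to pandas, the sketch below mimics the structure of the returned object with a tiny synthetic dictionary of DataFrames. The values are random numbers, not HSC measurements, and `toy`, `toy_gr`, and `bright` are hypothetical names; only the `cm_mag` and `psf_mag` column names are borrowed from the real catalog. It illustrates the two operations used repeatedly in the rest of this notebook: element-wise column arithmetic and boolean-mask selection.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
n_obj = 5

# Synthetic stand-in for the catalog: one DataFrame per band,
# with rows in the same order (same objects) in every band
toy = {band: pd.DataFrame({'cm_mag': rng.uniform(18., 26., n_obj),
                           'psf_mag': rng.uniform(18., 26., n_obj)})
       for band in ['HSC-G', 'HSC-R', 'HSC-I']}

# Column arithmetic is element-wise and aligned on the row index
toy_gr = toy['HSC-G']['cm_mag'] - toy['HSC-R']['cm_mag']

# A boolean Series selects rows; "~" inverts the selection
bright = toy['HSC-I']['cm_mag'] < 22.
print(len(toy_gr[bright]), len(toy_gr[~bright]))
```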

If you are curious to explore the coadd forced object catalog further, here's the forced source catalog for a single band and single patch. Remembering what Yusra showed last week, we can access a specific data product by using the Butler with a specific `dataId` (which is similar to a SQL `where` statement).

In [None]:
from lsst.daf.persistence import Butler
REPO = '/datasets/hsc/repo/rerun/RC/w_2020_19/DM-24822'
butler = Butler(REPO)

# Available tracts: 9615  9697  9813
dataid = {'filter':'HSC-I', 'tract':9697, 'patch':'0,0'}
coadd_forced_src = butler.get('deepCoadd_forced_src', dataId=dataid)

# Full list of columns
#coadd_forced_src.getSchema().getNames()

## Scatter Plot Example

First a sanity check plot to show that the same set of matched sources is found in the three bands. Notice that `coord_ra` and `coord_dec` are given in radians. We can zoom in and pan around to confirm that the measurements in the three bands correspond to a single matched set of objects.

In [None]:
plt.figure()
plt.scatter(data['HSC-G']['coord_ra'], data['HSC-G']['coord_dec'], marker='+', label='G')
plt.scatter(data['HSC-R']['coord_ra'], data['HSC-R']['coord_dec'], marker='x', label='R')
plt.scatter(data['HSC-I']['coord_ra'], data['HSC-I']['coord_dec'], marker='2', label='I')
plt.legend()
plt.xlabel('RA')
plt.ylabel('Dec')

Often when exploring a multidimensional space, it is helpful to visualize three or more quantities simultaneously using color-coded markers on scatter plots. Note that it is also possible to pass an array of marker sizes to the `scatter` function to plot points with different size values. Below is a color-color diagram with the points color-coded according to their consistency with a PSF model: unresolved stars will have concentration values near zero, while morphologically extended objects like resolved galaxies will have positive concentration values.

In [None]:
# Variables to plot
concentration = data['HSC-I']['psf_mag'] - data['HSC-I']['cm_mag']
gr = data['HSC-G']['cm_mag'] - data['HSC-R']['cm_mag']
ri = data['HSC-R']['cm_mag'] - data['HSC-I']['cm_mag']

plt.figure()
# The vmin and vmax control the colorbar range 
# We're using a smaller point-like marker "."
# The "s" keyword argument controls the marker size
plt.scatter(gr, ri, 
            c=concentration, 
            vmin=-0.02, vmax=0.2,
            marker='.',s=1)
# Notice that we can use LaTeX math syntax in the plot labels
plt.xlabel('$g - r$')
plt.ylabel('$r - i$')
plt.colorbar(label='Concentration')
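
As noted above, `scatter` also accepts an array of marker sizes, which allows a fourth quantity to be encoded in the same plot. The following sketch uses synthetic random data rather than the HSC catalog, and the mapping from the fourth quantity to a size range of 5 to 100 is an arbitrary choice for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=2)
n_pts = 200
x_pts = rng.normal(size=n_pts)
y_pts = rng.normal(size=n_pts)
third = rng.uniform(0., 1., n_pts)   # quantity shown as marker color
fourth = rng.uniform(0., 1., n_pts)  # quantity shown as marker size

# "s" is the marker area in points**2; here it is one value per point
sizes = 5. + 95. * fourth

fig, ax = plt.subplots()
points = ax.scatter(x_pts, y_pts, c=third, s=sizes, alpha=0.6, cmap='viridis')
fig.colorbar(points, ax=ax, label='Third quantity')
```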

Let's separate the stars and galaxies and compare their colors and morphology.

In [None]:
ext = (data['HSC-I']['base_ClassificationExtendedness_value'] == 1.)

# Single-panel figure
#plt.figure()
#plt.scatter(gr[ext], ri[ext],
#            marker='.', label='Galaxies')
#plt.scatter(gr[~ext], ri[~ext],
#            marker='.', label='Stars')
#plt.legend(loc='upper left')
#plt.xlabel('$g - r$')
#plt.ylabel('$r - i$')

# Two-panel figure
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].scatter(gr[ext], ri[ext],
              marker='.', label='Galaxies')
ax[0].scatter(gr[~ext], ri[~ext],
              marker='.', label='Stars')
ax[0].legend(loc='upper left')
ax[0].set_xlabel('$g - r$')
ax[0].set_ylabel('$r - i$')
ax[1].scatter(data['HSC-I']['cm_mag'][ext], 
              concentration[ext],
              marker='.', label='Galaxies')
ax[1].scatter(data['HSC-I']['cm_mag'][~ext], 
              concentration[~ext],
              marker='.', label='Stars')
ax[1].legend(loc='upper left')
ax[1].set_xlabel('$i$')
ax[1].set_ylabel('Concentration')
ax[1].set_ylim(-0.2, 1.)
plt.subplots_adjust(wspace=0.3, left=0.075)

## Histogram Example

Next we compare the brightness distributions of stars and galaxies using histograms of their $i$-band CModel magnitudes.

In [None]:
bins = np.arange(16., 30., 0.5)

plt.figure()
plt.yscale('log')
kwargs = {'bins': bins,
          'histtype': 'step',
          'lw': 2}
plt.hist(data['HSC-I']['cm_mag'], **kwargs, label='All')
plt.hist(data['HSC-I']['cm_mag'][ext], **kwargs, label='Galaxies')
plt.hist(data['HSC-I']['cm_mag'][~ext], **kwargs, label='Stars')
plt.xlim(18., 26.)
plt.legend(loc='upper left')

## Two-dimensional Histograms

Two-dimensional histograms are useful as we increase the number of data points to plot. The example below (based on this [demo](https://matplotlib.org/examples/pylab_examples/hist2d_log_demo.html)) shows how to use a logarithmic colorscale. In this example, we have used one of the perceptually uniform [colormaps](https://matplotlib.org/tutorials/colors/colormaps.html?highlight=colormaps) that are more colorblind friendly and convert better to grayscale.

In [None]:
plt.figure()
plt.hist2d(gr[~ext], ri[~ext], 
           norm=matplotlib.colors.LogNorm(),
           bins=51, cmap='plasma')
plt.colorbar(label='Counts')
plt.xlabel('$g - r$')
plt.ylabel('$r - i$')
plt.xlim(-0.5, 2.5)
plt.ylim(-0.5, 2.5)

When we want to compare two different distributions, it is sometimes useful to draw two sets of contours instead of a two-dimensional histogram. First define a helper function to draw the contours. (Notice that we apply a Gaussian KDE to the data so that the contours are smooth.)

In [None]:
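from scipy.stats import gaussian_kde

def contour(ax, x, y, nbins=51, **kwargs):
    # Estimate a smooth 2D density from the points with a Gaussian KDE
    data = np.vstack([x, y])
    k = gaussian_kde(data)
    # Evaluate the density on a regular nbins x nbins grid spanning the data
    xi, yi = np.mgrid[x.min():x.max():nbins*1j, y.min():y.max():nbins*1j]
    zi = k(np.vstack([xi.flatten(), yi.flatten()]))
    # Extra keyword arguments are passed through to ax.contour
    ax.contour(xi, yi, zi.reshape(xi.shape), **kwargs)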

Now create the figure and plot.

In [None]:
fig, ax = plt.subplots(1,1)
#contour(ax, gr[ext], ri[ext], cmap='Reds')
#contour(ax, gr[~ext], ri[~ext], cmap='Blues')
ax.scatter(gr[ext], ri[ext], s=1, edgecolor='none', c='black', alpha=0.5)
ax.scatter(gr[~ext], ri[~ext], s=1, edgecolor='none', c='red', alpha=0.5)
contour(ax, gr[ext], ri[ext], colors='black')
contour(ax, gr[~ext], ri[~ext], colors='red')
#ax.legend()
ax.set_xlabel('$g - r$')
ax.set_ylabel('$r - i$')
ax.set_xlim(-0.5, 2.5)
ax.set_ylim(-0.5, 2.5)
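
The `ax.legend()` call is commented out in the cell above because none of the plotted artists carries a label. One possible way to label contour sets is to pass the legend a list of proxy handles. The sketch below uses two synthetic Gaussian surfaces in place of the KDE output; the colors and labels follow the convention of the cell above (black for galaxies, red for stars), and all variable names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# Synthetic surfaces, only for illustration
gx, gy = np.mgrid[-3:3:51j, -3:3:51j]
surface_a = np.exp(-0.5 * (gx**2 + gy**2))
surface_b = np.exp(-0.5 * ((gx - 1.)**2 + (gy + 1.)**2))

fig, ax = plt.subplots()
ax.contour(gx, gy, surface_a, colors='black')
ax.contour(gx, gy, surface_b, colors='red')

# Build empty "proxy" lines that exist only to populate the legend
handles = [Line2D([], [], color='black', label='Galaxies'),
           Line2D([], [], color='red', label='Stars')]
legend = ax.legend(handles=handles, loc='upper left')
```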

## Wrap-up

There is much more to learn about visualization with matplotlib, but that should be enough to get started exploring precursor and simulated Rubin Observatory datasets for this Stack Club course. If you want to see more examples of cool matplotlib figures, check out the [matplotlib image gallery](https://matplotlib.org/gallery/index.html).

**Exercise:** Create your own figure to explore this small HSC dataset. For example, try creating a stellar color-magnitude diagram with [error bars](https://matplotlib.org/3.2.1/api/_as_gen/matplotlib.pyplot.errorbar.html) in the object-oriented style.

Hint: The magnitude uncertainty can be computed as 

$\sigma_{\rm mag} \approx 2.5 \log_{10} \left(1 + \rm{SNR}^{-1} \right)$

with signal-to-noise evaluated as

$\rm{SNR} = \frac{\rm{flux}}{\sigma_{\rm flux}}$.

In [None]:
def magnitudeError(flux, flux_err):
    # Use the function arguments rather than reaching into the global catalog
    snr = flux / flux_err
    mag_err = 2.5 * np.log10(1. + snr**-1)
    return mag_err

In [None]:
magnitudeError(data['HSC-I']['base_PsfFlux_instFlux'], data['HSC-I']['base_PsfFlux_instFluxErr'])
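
As a starting point for the exercise, here is a minimal sketch of an error-bar plot in the object-oriented style. It runs on synthetic fluxes, not HSC data: the helper `magnitude_error_sketch` is a hypothetical stand-in that applies the hint formula above, the zero point of 27 is arbitrary, and the colors are random numbers. Replacing the synthetic arrays with catalog columns is left as the exercise.

```python
import numpy as np
import matplotlib.pyplot as plt

def magnitude_error_sketch(flux, flux_err):
    """Hypothetical helper applying the hint: 2.5 log10(1 + 1/SNR)."""
    snr = flux / flux_err
    return 2.5 * np.log10(1. + 1. / snr)

# Synthetic inputs, NOT HSC measurements
rng = np.random.default_rng(seed=3)
flux_demo = rng.uniform(50., 5000., 50)
flux_err_demo = np.full_like(flux_demo, 10.)
zero_point = 27.  # arbitrary value, for illustration only
mag_demo = zero_point - 2.5 * np.log10(flux_demo)
color_demo = rng.normal(0.5, 0.2, 50)
mag_err_demo = magnitude_error_sketch(flux_demo, flux_err_demo)

fig, ax = plt.subplots()
# fmt='.' draws point markers without a connecting line
ax.errorbar(color_demo, mag_demo, yerr=mag_err_demo, fmt='.', lw=1)
ax.invert_yaxis()  # brighter objects (smaller magnitudes) toward the top
ax.set_xlabel('Color')
ax.set_ylabel('Magnitude')
```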