# Session 03: Intro to Matplotlib Plotting on the Science Platform

<br>Owner(s): **Keith Bechtol** ([@bechtol](https://github.com/LSSTScienceCollaborations/StackClub/issues/new?body=@bechtol))
<br>Last Verified to Run: **2020-05-17?**
<br>Verified Stack Release: **19.0?**

This notebook is intended as a warm-up to the Visualization lesson (Lesson 3), providing a brief introduction to data visualization with matplotlib. Matplotlib is one of the most widely used libraries for data visualization in astrophysics, and extensive documentation and many examples can be found with a quick web search. This notebook walks through a few examples to get you started quickly.

Today we'll cover:
* How to create a few common types of plots (histograms, scatter plots)
* How to modify plot style, e.g., colors, marker style, axis labels, legends, etc.

We'll use the same datasets that we accessed in Lesson 2.

In [None]:
# What version of the Stack am I using?
! echo $HOSTNAME
! eups list -s lsst_distrib

## Preliminaries

Let's begin by importing plotting packages. Right away, we are faced with a choice as to which [backend](https://matplotlib.org/faq/usage_faq.html#what-is-a-backend) to use for plotting. For this demo, we'll use the [ipympl](https://github.com/matplotlib/ipympl) backend that allows us to create interactive plots (e.g., pan, zoom, and resize canvas capability) in a JupyterLab notebook. This option is enabled with the line

```%matplotlib widget```

Once the backend is set, one needs to restart the kernel to select a different backend. Alternatively, one could use the *inline* backend, if no user interactivity is required.

```%matplotlib inline```

It appears that the *inline* backend is used by default on the Science Platform.
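
If you are unsure which backend is active in your session, matplotlib can report it. The snippet below is a minimal sketch; the variable name `backend_name` is ours, and the string you see depends on your environment and on which `%matplotlib` magic (if any) has been run.

```python
import matplotlib

# Ask matplotlib which backend it is currently configured to use.
backend_name = matplotlib.get_backend()
print("Active backend:", backend_name)
```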

For a discussion of the relationship between matplotlib, pyplot, and pylab, see [here](https://matplotlib.org/faq/usage_faq.html#matplotlib-pyplot-and-pylab-how-are-they-related).

In [None]:
# Non-interactive plots
#%matplotlib inline 
# Enable interactive plots (ipympl backend)
#%matplotlib ipympl
%matplotlib widget 

import numpy as np
import matplotlib
import matplotlib.pyplot as plt

#%matplotlib widget
#%matplotlib notebook
#plt.ion()
#import pylab
#pylab.ion()

We can [customize plotting style with matplotlib](https://matplotlib.org/3.2.1/tutorials/introductory/customizing.html) by setting default parameters. This is an optional step if you are fine with the default style.

In [None]:
matplotlib.rcParams["figure.figsize"] = (6, 4)
matplotlib.rcParams["font.size"] = 10
matplotlib.rcParams["figure.dpi"] = 120
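
Setting `rcParams` as above changes the defaults for the rest of the session. If you only want a different style for one figure, a context manager can limit the change. This is a sketch under our own choice of values (font size 14 is arbitrary), not part of the original lesson.

```python
import matplotlib
import matplotlib.pyplot as plt

default_size = matplotlib.rcParams["font.size"]

# Temporarily override a default; it is restored when the block exits.
with plt.rc_context({"font.size": 14}):
    inside_size = matplotlib.rcParams["font.size"]
    fig, ax = plt.subplots()
    ax.set_title("Temporary style")

restored_size = matplotlib.rcParams["font.size"]
print(default_size, inside_size, restored_size)
```

Text created inside the `with` block uses the larger font, while figures made afterwards fall back to the session default.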

## Abstract Example

Let's do one completely abstract example just for illustration purposes. First, make some simple data.

In [None]:
x = np.linspace(0, 2 * np.pi, 100)
y1 = x
y2 = x**3
y3 = np.cos(x)
y4 = np.sin(x)

Now, we can very quickly get started. 

**Exercise:** Use the interactive widgets to pan and zoom, and to adjust the canvas size. The "Home" button should bring you back to the original figure.

In [None]:
plt.figure()
plt.scatter(x, y4)

We can enhance the figure with various labels and adjust the visual appearance.

**Exercise:** Modify the cell below to change the plotting style.

In [None]:
plt.figure()
plt.plot(x, y1, 
         label='y1')
plt.plot(x, y2, 
         lw=2, label='y2')
plt.plot(x, y3, 
         ls='--', label='y3')
plt.scatter(x, y4, 
            c='black')
plt.plot(x, y4, label='y4')
plt.xlabel('The horizontal axis')
plt.ylabel('The vertical axis')
plt.title('My Plot')
plt.xlim(0., 2 * np.pi)
plt.ylim(-2, 2.)
plt.legend(loc='upper right')
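
As a starting point for the exercise, here is one possible variation using a few common style keywords (`color`, `ls`, `lw`, `alpha`, `marker`, `ms`). The specific values are arbitrary choices for illustration, and the data are regenerated so that the snippet runs on its own.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

fig = plt.figure()
# Thick, dashed, semi-transparent red line
(cos_line,) = plt.plot(x, np.cos(x), color="tab:red", ls="--", lw=2,
                       alpha=0.7, label="cos(x)")
# Square markers only (no connecting line), every fifth point
(sin_line,) = plt.plot(x[::5], np.sin(x[::5]), color="black", marker="s",
                       ls="none", ms=4, label="sin(x)")
plt.xlabel("x")
plt.ylabel("y")
legend = plt.legend(loc="lower left", frameon=False)
```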

The MATLAB style used above is suitable for quick plotting, but for more involved applications, it is advised to use the more verbose, object-oriented pyplot style, in which plotting calls are methods of explicit Figure and Axes objects. The example below is directly inspired by the example [here](https://matplotlib.org/faq/usage_faq.html#coding-styles).

In [None]:
def my_plotter(ax, data1, data2, param_dict):
    """
    A helper function to make a graph

    Parameters
    ----------
    ax : Axes
        The axes to draw to

    data1 : array
       The x data

    data2 : array
       The y data

    param_dict : dict
       Dictionary of kwargs to pass to ax.plot

    Returns
    -------
    out : list
        list of artists added
    """
    out = ax.plot(data1, data2, **param_dict)
    ax.legend()
    return out

Re-create the earlier example in the pyplot style.

In [None]:
fig, ax = plt.subplots(1, 1)
my_plotter(ax, x, y4, {'marker':'o', 'label':'y4'})
ax.set_xlabel('The Horizontal Axis')
ax.set_ylabel('The Vertical Axis')
ax.set_xlim(0, 2. * np.pi)

We see the power of the more object-oriented pyplot style as we create more complex visualizations.

In [None]:
fig, ax = plt.subplots(2, 2)
my_plotter(ax[0][0], x, y1, {'marker':'x', 'label':'y1'})
my_plotter(ax[0][1], x, y2, {'marker':'o', 'label':'y2'})
my_plotter(ax[1][0], x, y3, {'color':'red', 'label':'y3'})
my_plotter(ax[1][1], x, y4, {'ls':'--', 'label':'y4'})
#plt.subplots_adjust(hspace=0, left=0.25) # Optionally adjust the layout
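
The same grid can also be filled with a loop over `axes.flat`, which avoids repeating one call per panel. The sketch below is a variation on the cell above, not a replacement for it; the dictionary `curves` and the use of `sharex` and `tight_layout` are our own choices.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
curves = {"x": x, "x**3": x**3, "cos(x)": np.cos(x), "sin(x)": np.sin(x)}

# One panel per curve; all panels share the same horizontal axis.
fig, axes = plt.subplots(2, 2, sharex=True)
for ax, (name, y) in zip(axes.flat, curves.items()):
    ax.plot(x, y, label=name)
    ax.set_title(name)
    ax.legend()

# Label only the bottom row, since the x axis is shared.
for ax in axes[-1, :]:
    ax.set_xlabel("x")

fig.tight_layout()
```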

In [None]:
# Probably not needed
#import lsst.afw.display as afw_display

## Access HSC Data

Now let's access some data, specifically the same source catalog we accessed in Lesson 2.

In [None]:
%%time

import utils
data = utils.getData()

In [None]:
import importlib
importlib.reload(utils)

In [None]:
REPO = '/datasets/hsc/repo/rerun/RC/w_2020_19/DM-24822'  
from lsst.daf.persistence import Butler
butler = Butler(REPO)

VISIT = 34464
CCD = 81
src = butler.get('src', visit=VISIT, ccd=CCD)
src

In [None]:
#9615  9697  9813
dataid = {'filter':'HSC-I', 'tract':9697, 'patch':'0,0'}
coadd_meas = butler.get('deepCoadd_meas', dataId=dataid)

In [None]:
#coadd_meas.getSchema().getNames()

In [None]:
#src.getSchema().getNames()

In [None]:
#help(coadd_meas.asAstropy().to_pandas().loc)

In [None]:
#columns = ['coord_ra','coord_dec']
columns = ['id',
           'coord_ra', 
           'coord_dec',
           'modelfit_CModel_instFlux',
           'modelfit_CModel_instFluxErr',
           'base_PsfFlux_instFlux',
           'base_PsfFlux_instFluxErr',
           'base_ClassificationExtendedness_flag',
           'base_ClassificationExtendedness_value'
          ]
           #'base_SdssCentroid_x',
           #'base_SdssCentroid_y',
           #'calib_psfCandidate',
           #'calib_psfUsed',
           #'base_ClassificationExtendedness_value',
           #'base_ClassificationExtendedness_flag',
           #'slot_Centroid_x',
           #'slot_Centroid_y',
           #'slot_Shape_xx',
           #'slot_Shape_yy',
           #'slot_Shape_xy',
           #'slot_PsfShape_xx',
           #'slot_PsfShape_yy',
           #'slot_PsfShape_xy',
           #'slot_ModelFlux_flux',
           #'slot_ModelFlux_fluxSigma',
           #'slot_PsfFlux_flux',
           #'slot_PsfFlux_fluxSigma']
#coadd_meas.asAstropy().to_pandas().loc[:,['coord_ra','coord_dec']]
coadd_meas.asAstropy().to_pandas().loc[:,columns]

Options for using matplotlib in a notebook are described [here](https://github.com/matplotlib/ipympl).

In [None]:
import pandas as pd

# Set SHORTCUT = True for quick evaluation but lower statistics, 
# SHORTCUT = False to get all the objects from all the patches in the tract (~10 mins)
# If you don't take the shortcut, you'll need to do some uncommenting below as well.

SHORTCUT = True

skymap = butler.get('deepCoadd_skyMap')

# Pick a random tract and collect all the patches
TRACT = 9615
patch_array = []
for ii in range(skymap.generateTract(TRACT).getNumPatches()[0]):
    for jj in range(skymap.generateTract(TRACT).getNumPatches()[1]):
        patch_array.append('%s,%s'%(ii, jj))
tract_array = np.tile(TRACT, len(patch_array))

if SHORTCUT:
    # Only get three patches
    df_tract_patch = pd.DataFrame({'tract': [TRACT, TRACT, TRACT],
                                   'patch': ['0,0', '0,1', '0,2']})
else:
    # Get all the object catalogs from one tract
    df_tract_patch = pd.DataFrame({'tract': tract_array,
                                   'patch': patch_array})
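
The nested loop that builds the list of patch names can be written more compactly with `itertools.product`. This is a standalone sketch: the grid size below is a made-up value, whereas in the notebook it comes from the skymap.

```python
from itertools import product

# Hypothetical grid size for illustration only.
n_patches_x, n_patches_y = 3, 2

# Patch names have the form 'x,y', with x varying slowest.
patch_names = [f"{ii},{jj}"
               for ii, jj in product(range(n_patches_x), range(n_patches_y))]
print(patch_names)
```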

In [None]:
df_tract_patch

In [None]:
"""
selected_columns = ['id',
                    'coord_ra', 
                    'coord_dec',
                    'base_SdssCentroid_x',
                    'base_SdssCentroid_y',
                    'calib_psfCandidate',
                    'calib_psfUsed',
                    'base_ClassificationExtendedness_value',
                    'base_ClassificationExtendedness_flag',
                    'slot_Centroid_x',
                    'slot_Centroid_y',
                    'slot_Shape_xx',
                    'slot_Shape_yy',
                    'slot_Shape_xy',
                    'slot_PsfShape_xx',
                    'slot_PsfShape_yy',
                    'slot_PsfShape_xy',
                    'slot_ModelFlux_flux',
                    'slot_ModelFlux_fluxSigma',
                    'slot_PsfFlux_flux',
                    'slot_PsfFlux_fluxSigma']
"""

In [None]:
#select_columns = ['coord_ra', 
#                  'coord_dec']
selected_columns = ['id',
                    'coord_ra', 
                    'coord_dec',
                    'modelfit_CModel_instFlux',
                    'modelfit_CModel_instFluxErr',
                    'base_PsfFlux_instFlux',
                    'base_PsfFlux_instFluxErr',
                    'base_ClassificationExtendedness_flag',
                    'base_ClassificationExtendedness_value',
                    'base_SdssCentroid_flag',
                    'base_PixelFlags_flag_interpolated',
                    'base_PixelFlags_flag_saturated',
                    'base_PsfFlux_flag',
                    'modelfit_CModel_flag']

In [None]:
%%time

FILTERS = ['HSC-G', 'HSC-R', 'HSC-I']

data = {}
for band in FILTERS:
    coadd_array = []
    selection_array = []
    for ii in range(0, len(df_tract_patch)):
        tract, patch = df_tract_patch['tract'][ii], df_tract_patch['patch'][ii] 
        print(band, tract, patch)
        dataid = {'filter':band, 'tract':tract, 'patch':patch}
        coadd_ref = butler.get('deepCoadd_ref', dataId=dataid)
        #coadd_meas = butler.get('deepCoadd_meas', dataId=dataid)
        coadd_meas = butler.get('deepCoadd_forced_src', dataId=dataid)
        coadd_calib = butler.get('deepCoadd_calexp_photoCalib', dataId=dataid)
    
        selected_rows = coadd_ref['detect_isPrimary']
        #selected_rows = (coadd_ref['detect_isPrimary']
        #                 & ~coadd_meas['base_SdssCentroid_flag']
        #                 & ~coadd_meas['base_PixelFlags_flag_interpolated']
        #                 & ~coadd_meas['base_PixelFlags_flag_saturated']
        #                 & ~coadd_meas['base_PsfFlux_flag']
        #                 & ~coadd_meas['modelfit_CModel_flag'])
    
        print(np.sum(selected_rows))
        coadd_array.append(coadd_meas.asAstropy().to_pandas().loc[selected_rows, selected_columns])
        #coadd_array[-1]['detect_isPrimary'] = coadd_ref['detect_isPrimary'][selected_rows]
        coadd_array[-1]['psf_mag'] = coadd_calib.instFluxToMagnitude(coadd_meas[selected_rows], 'base_PsfFlux')[:,0]
        coadd_array[-1]['cm_mag'] = coadd_calib.instFluxToMagnitude(coadd_meas[selected_rows], 'modelfit_CModel')[:,0]
    
    #df_coadd = pd.concat(coadd_array)
    data[band] = pd.concat(coadd_array)

In [None]:
# Require good quality measurements in all bands
selected_rows = []
for band in FILTERS:
    snr = data[band]['modelfit_CModel_instFlux'] / data[band]['modelfit_CModel_instFluxErr']
    print(snr)
    selected_rows.append(~data[band]['base_SdssCentroid_flag'] 
                         & ~data[band]['base_PixelFlags_flag_interpolated']
                         & ~data[band]['base_PixelFlags_flag_saturated']
                         & ~data[band]['base_PsfFlux_flag']
                         & ~data[band]['modelfit_CModel_flag']
                         & (snr > 10.))
    print(np.sum(data[band]['base_SdssCentroid_flag']))
    print(np.sum(data[band]['base_PixelFlags_flag_interpolated']))
    print(np.sum(data[band]['base_PixelFlags_flag_saturated']))
    print(np.sum(data[band]['base_PsfFlux_flag']))
    print(np.sum(data[band]['modelfit_CModel_flag']))
    print(band, np.sum(selected_rows[-1]))

selected_rows = np.all(selected_rows, axis=0)

for band in FILTERS:
    data[band] = data[band].loc[selected_rows]
    print(len(data[band]))
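
The key step above is `np.all(selected_rows, axis=0)`, which keeps an object only if it passes the cuts in every band. A toy example with made-up masks for five objects shows the behavior:

```python
import numpy as np

# Hypothetical per-band quality masks (True = passes the cuts in that band).
good_g = np.array([True, True, False, True, True])
good_r = np.array([True, False, True, True, True])
good_i = np.array([True, True, True, True, False])

# Element-wise AND across bands: an object must be good in all three.
good_all = np.all([good_g, good_r, good_i], axis=0)
print(good_all)
print(good_all.sum())
```

Only the first and fourth objects pass in all three bands, so two objects survive.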

In [None]:
#df_coadd
data['HSC-I'].shape

In [None]:
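# 'detect_isPrimary' is not among selected_columns, so it cannot be checked here;
# the rows were already restricted to detect_isPrimary when the catalogs were read.
#np.all(data['HSC-I']['detect_isPrimary'])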

In [None]:
butler.get('deepCoadd_forced_src', dataId=dataid)

In [None]:
# Use plt here: pylab is never imported in this notebook
plt.figure()
plt.scatter(data['HSC-G']['coord_ra'], data['HSC-G']['coord_dec'], marker='+')
plt.scatter(data['HSC-R']['coord_ra'], data['HSC-R']['coord_dec'], marker='x')
plt.scatter(data['HSC-I']['coord_ra'], data['HSC-I']['coord_dec'], marker='2')

In [None]:
ext = (data['HSC-I']['base_ClassificationExtendedness_value'] == 1.)

gr = data['HSC-G']['cm_mag'] - data['HSC-R']['cm_mag']
ri = data['HSC-R']['cm_mag'] - data['HSC-I']['cm_mag']

# Color-color diagram, split by extendedness (plt replaces the unimported pylab)
plt.figure()
plt.scatter(gr[ext],
            ri[ext])
plt.scatter(gr[~ext],
            ri[~ext])
    
#pylab.scatter(data['HSC-G']['cm_mag'] - data['HSC-R']['cm_mag'], 
#              data['HSC-R']['cm_mag'] - data['HSC-I']['cm_mag'],
#              marker='.', edgecolor='none')

In [None]:
ext == 1.
np.sum(ext)
#len(ext)
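
A color-color diagram is easier to read with axis labels and a legend. The sketch below uses synthetic random numbers in place of the catalog (the means, widths, and the 60% extended fraction are invented), so it only illustrates the plotting calls, not real HSC colors.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins for the g-r and r-i colors and the extendedness mask.
rng = np.random.default_rng(0)
n = 500
gr = rng.normal(0.8, 0.3, n)
ri = rng.normal(0.4, 0.2, n)
extended = rng.random(n) < 0.6

fig, ax = plt.subplots()
ax.scatter(gr[extended], ri[extended], s=4, alpha=0.5, label="extended")
ax.scatter(gr[~extended], ri[~extended], s=4, alpha=0.5, label="point-like")
ax.set_xlabel("g - r")
ax.set_ylabel("r - i")
ax.legend()
```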

## Histogram Example

In [None]:
import numpy as np
x = np.random.random(size=1000)
y = np.random.random(size=1000)

In [None]:
bins = np.arange(18., 30., 0.5)

plt.figure()
plt.yscale('log')
#plt.hist(df_coadd['psf_mag'], bins=bins)
plt.hist(data['HSC-I']['cm_mag'], bins=bins, histtype='step')
plt.hist(data['HSC-I']['cm_mag'][ext], bins=bins, histtype='step')
plt.hist(data['HSC-I']['cm_mag'][~ext], bins=bins, histtype='step')
#plt.show()
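
The histogram call also returns the bin counts and edges, which is handy for sanity checks. Here is a self-contained sketch with synthetic magnitudes (drawn from an arbitrary normal distribution, not from the catalog) and the same binning as above:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic magnitudes for illustration only.
rng = np.random.default_rng(1)
mags = rng.normal(24.0, 1.5, 2000)
bins = np.arange(18., 30., 0.5)

fig, ax = plt.subplots()
# hist returns the counts per bin, the bin edges, and the drawn artists.
counts, edges, _ = ax.hist(mags, bins=bins, histtype="step", label="all objects")
ax.set_yscale("log")
ax.set_xlabel("magnitude")
ax.set_ylabel("objects per bin")
ax.legend()

# Values outside the bin range are silently dropped from the counts.
print(int(counts.sum()), "of", len(mags), "objects fall inside the bins")
```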

In [None]:
#pylab.figure()
#pylab.hist(x)

## Scatter Plot Example

In [None]:
plt.figure()
plt.scatter(x,y)
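
Scatter plots can encode a third quantity through marker color. The following sketch colors random points by their distance from the center of the unit square; the quantity, colormap, and marker size are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = rng.random(1000)
y = rng.random(1000)
# Third quantity to map onto color.
distance = np.hypot(x - 0.5, y - 0.5)

fig, ax = plt.subplots()
points = ax.scatter(x, y, c=distance, s=8, cmap="viridis")
colorbar = fig.colorbar(points, ax=ax, label="distance from center")
ax.set_xlabel("x")
ax.set_ylabel("y")
```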