# WALLABY Example Workflow Tutorial

Here we provide an example of how Django can be used in a typical astronomy workflow. We will interact with the database from Python through the Django ORM (see the [query documentation](https://docs.djangoproject.com/en/3.1/topics/db/queries/)), use plots to compare the WALLABY detections, and identify a subset of interest to explore further, possibly with other tools.

In this notebook we assume that users are familiar with the Django ORM query syntax, so we won't describe in detail what each query is doing. An introductory notebook is provided for those looking to work through more examples.

In [None]:
# ------- RUN THIS CELL FIRST -------

# Import Python standard libraries and Django
import os
import sys
import json
import django
from datetime import datetime

# Django setup
sys.path.append('src/')
django.setup()

# Import Django models
from run.models import Run
from instance.models import Instance
from detection.models import Detection
from products.models import Products

from sources.models import Sources
from comments.models import Comments
from tag.models import Tag

# Searching for sources

In this example let's analyse the detections in a certain patch of sky. These detections have been written into the WALLABY database, so we will write a query to load that information into an appropriate Python data type. First we'll import the libraries that we need for this.

**NOTE**: If you need libraries other than those imported below, you can install them by running `! pip install <library>` in an empty cell. The `!` before `pip install` is required: it tells the notebook to run the command in the terminal rather than in the Python interpreter.

In [None]:
# Data analysis libraries

import pandas as pd
from astropy.io import fits
import mmap

In [None]:
# Visualisation libraries

import matplotlib.pyplot as plt
import ipywidgets as widgets
from matplotlib import figure
from ipywidgets import interact

## Column names for tables

Databases store objects in tables, where rows represent entries and columns represent the data fields. We want to find the RA and Dec of the detections. These fields may be named differently in the database, so let's list all of the fields in the detections table and find their names.

In [None]:
# List column names and types for the detections table

Detection._meta.fields

Great, so we see that the table has two columns named `ra` and `dec`, both stored as floats (numerical values).

## Filter by parameter

Now that we know the name of the parameter(s) of interest, we can construct queries based on these fields. Let's find all of the detections in the following region of sky:

* 120 ≤ RA ≤ 140
* -10 ≤ Dec ≤ 10

In [None]:
# Filter the detections by RA and Dec

detections_subset = Detection.objects.filter(ra__gte=120.0, ra__lte=140.0).filter(dec__gte=-10.0, dec__lte=10.0)

In [None]:
# How many detections in our subset of interest?

detections_subset.count()
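
The `__gte` and `__lte` lookups are inclusive, so detections lying exactly on the boundary of the region are kept. As a rough illustration, the sketch below applies the same bounds in plain Python to a few made-up rows. The `rows` list and the helper `in_region` are hypothetical and do not come from the WALLABY database.

```python
# Hypothetical rows standing in for Detection records (not real WALLABY data).
rows = [
    {"name": "A", "ra": 120.0, "dec": -10.0},  # on the boundary: kept
    {"name": "B", "ra": 130.5, "dec": 2.3},    # inside the region: kept
    {"name": "C", "ra": 140.1, "dec": 0.0},    # RA just outside: dropped
    {"name": "D", "ra": 125.0, "dec": 10.5},   # Dec just outside: dropped
]


def in_region(row, ra_min=120.0, ra_max=140.0, dec_min=-10.0, dec_max=10.0):
    """Return True if the row lies inside the box, bounds included."""
    return ra_min <= row["ra"] <= ra_max and dec_min <= row["dec"] <= dec_max


# Keep only the names of the rows that fall inside the region.
selected = [row["name"] for row in rows if in_region(row)]
print(selected)
```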

## Print parameters for each of these detections

For this subset of detections let's print the parameter values in a table. We'll store it as a data frame using [pandas](https://pandas.pydata.org/) for more convenient access.

In [None]:
# Print detection table

detections_subset_df = pd.DataFrame(list(detections_subset.values()))
print(detections_subset_df)

## Plotting

Let's plot the peak flux of each source against its major axis. This might reveal an interesting pattern in the detections. We'll use `matplotlib` for these visualisations.

In [None]:
# Create plot of detection subset (peak flux x major axis)

plt.rcParams['figure.figsize'] = (20, 10)

fig = plt.figure()
ax = fig.add_subplot(111)

plt.scatter(detections_subset_df['ell_maj'], detections_subset_df['f_max'])

for d in detections_subset:
    point = (float(d.ell_maj), float(d.f_max))
    ax.annotate(d.name, xy=point, textcoords='data')

plt.axhline(0.02, color='red')
plt.axvline(15, color='red')
plt.ylabel("Peak Flux [Jy]")
plt.xlabel("Major Axis [pixels]")
plt.grid()
plt.show()

## A subset of the subset

Suppose we find the sources with `f_max >= 0.02` and with `ell_maj >= 15` interesting. In the plot above this represents the top right quadrant of the red grid. We can perform a filter on our subset of data again to get the data points of interest. Let's not create a data frame for these since there probably won't be any need for it.

In [None]:
# Print the unique id, name, peak flux and major axis of the sources of interest.

interesting_subset = detections_subset.filter(f_max__gte=0.02, ell_maj__gte=15)

print("unique id | name | max flux | major axis ")
for d in interesting_subset:
    print(f"{d.id} | {d.name} | {d.f_max} | {d.ell_maj}")

So we have the names of the detections (which match the names shown in the plot) as well as the peak flux, major axis and unique id. With the unique id, we can always retrieve that exact row from the database.
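
The selection can also be checked by hand. The sketch below, using made-up values, keeps the rows in the top right quadrant defined by the two thresholds and prints them in the same layout as above. The names `sample` and `is_interesting` are hypothetical and used for illustration only.

```python
F_MAX_MIN = 0.02   # peak flux threshold [Jy], the horizontal red line
ELL_MAJ_MIN = 15   # major axis threshold [pixels], the vertical red line

# Hypothetical (id, name, f_max, ell_maj) tuples, not real detections.
sample = [
    (1, "src-1", 0.050, 22.0),  # above both thresholds
    (2, "src-2", 0.010, 30.0),  # too faint
    (3, "src-3", 0.030, 9.0),   # too small
    (4, "src-4", 0.020, 15.0),  # exactly on both thresholds: kept
]


def is_interesting(f_max, ell_maj):
    """Return True for points in the top right quadrant, edges included."""
    return f_max >= F_MAX_MIN and ell_maj >= ELL_MAJ_MIN


interesting = [row for row in sample if is_interesting(row[2], row[3])]

print("unique id | name | max flux | major axis")
for uid, name, f_max, ell_maj in interesting:
    print(f"{uid} | {name} | {f_max} | {ell_maj}")
```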

## Visualisations

Now that we have identified detections of interest we can explore the products of the detection in the database. This includes information about the run, the SoFiA parameters used, and the output products from the observation. For example, if we wanted to visualise the moment 0 map and data cube, we could do that with the following section of code. Let's do that for the really bright detection: `WALLABY J085143-020813`.

**NOTE**: The cube data we're looking to visualise is stored in [FITS](https://heasarc.gsfc.nasa.gov/docs/heasarc/fits.html) format. `astropy` has functions for reading FITS files that we will use to open this data. Here we read the data from disk, so we first save the FITS bytes returned by the database query to a temporary file. A separate temporary file is needed for each product you wish to investigate.
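
One way to manage such temporary files is Python's standard `tempfile` module, which picks a unique file name for you. The sketch below is only an illustration: `write_temp_fits` is a hypothetical helper, and the bytes used here are a placeholder rather than a valid FITS file.

```python
import os
import tempfile


def write_temp_fits(data, suffix=".fits"):
    """Write raw bytes to a new temporary file and return its path."""
    # delete=False keeps the file on disk after it is closed,
    # so that another reader can open it by name afterwards.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as f:
        f.write(data)
        return f.name


payload = b"placeholder bytes, not a real FITS file"
path = write_temp_fits(payload)
try:
    # In the notebook, this is where fits.open(path) would be called.
    with open(path, "rb") as f:
        round_trip = f.read()
finally:
    os.remove(path)  # clean up once the data has been read
```

Removing the file in a `finally` block means it is cleaned up even if reading fails part-way.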

In [None]:
# View fields of the Products table

Products._meta.fields

### Moment 0 Map

In [None]:
# Write FITS data to local file

detection_id = 2600
filename = 'tmp.fits'
bright_detection = Products.objects.get(id=detection_id)
moment_0_bytes = b''.join(list(bright_detection.moment0[:]))

with open(filename, "wb") as f:
    f.write(moment_0_bytes)

hdul = fits.open(filename)
image = hdul[0].data
hdul.close()

In [None]:
# Visualise moment 0 map.

plt.rcParams['figure.figsize'] = (10, 20)
plt.imshow(image)

### Data cube

The data cube is a 3D array. Rather than visualising the entire cube we will compare the 2D slices with a slider that allows the user to change the frequency channel. 

**NOTE**: Don't move the slider too quickly, or the plot will become very laggy.

In [None]:
# Read cube into np array.

detection_id = 2600
filename = 'cube.fits'
bright_detection = Products.objects.get(id=detection_id)
cube_bytes = b''.join(list(bright_detection.cube[:]))

with open(filename, "wb") as f:
    f.write(cube_bytes)

# Read the cube data into a NumPy array
hdul = fits.open(filename)
cube = hdul[0].data
hdul.close()

In [None]:
# Interactive plot of cube with slider for channel.

def cube_visualisation(c = 0):
    plt.imshow(cube[c, :, :])
    plt.show()

# The slider's max is inclusive, so the last valid channel index is shape[0] - 1
interact(cube_visualisation, c=widgets.IntSlider(min=0, max=cube.shape[0] - 1, step=1, value=0))
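
Both ends of an `IntSlider` range are inclusive, while the valid channel indices of a cube with `n` channels run from 0 to `n - 1`. The helpers below are a hypothetical sketch of computing safe slider limits and clamping a requested channel to them; neither function is part of `ipywidgets`.

```python
def channel_bounds(n_channels):
    """Return the inclusive (min, max) channel indices for a cube."""
    if n_channels < 1:
        raise ValueError("cube must contain at least one channel")
    return 0, n_channels - 1


def clamp_channel(c, n_channels):
    """Clamp a requested channel index to the valid range."""
    lo, hi = channel_bounds(n_channels)
    return max(lo, min(c, hi))


# Example with a hypothetical cube of 64 channels.
print(channel_bounds(64))
print(clamp_channel(64, 64))
```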

# Conclusion

In this workflow we have:

* Queried the database for WALLABY detection data
* Created a plot comparing properties of the detections
* Identified interesting data points from the plot
* Inspected products from the detections of interest

Hopefully it has been useful to run through this example workflow.