# WALLABY user download notebook

A notebook pre-filled with cells and scripts for downloading WALLABY tables and data products.

The notebook has the following sections:

1. Initialise Django
2. List sources
3. View properties for a source
4. Download sources table
5. Download source associated data products 
6. Download summary plots for sources

## 1. Initialise Django

The following cells must be run first. They are used to import `django`, set up a connection to the database and import the Django models. Once these cells are run, you are able to use the Django model objects for access to the database.

In [None]:
# Essential Python libraries for using the ORM

import sys
import os
import django

In [None]:
# Libraries used later in the notebook for building tables, reading data products and making summary plots

import io
import math
import numpy as np

import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

from django.forms.models import model_to_dict

import astropy.units as u
from astropy.io import fits
from astropy.wcs import WCS
from astropy.visualization import PercentileInterval
from astroquery.skyview import SkyView
from astropy.utils.data import clear_download_cache

from astropy.table import Table

In [None]:
# Database access environment variables

os.environ["DJANGO_SECRET_KEY"] = "-=(gyah-@e$-ymbz02mhwu6461zv&1&8uojya413ylk!#bwa-l"
os.environ["DJANGO_SETTINGS_MODULE"] = "api.settings"
os.environ["DJANGO_ALLOW_ASYNC_UNSAFE"] = "True"
os.environ["DATABASE_HOST"] = "146.118.67.204"
os.environ["DATABASE_NAME"] = "wallabydb"
os.environ["DATABASE_USER"] = "wallaby_user"
os.environ["DATABASE_PASSWORD"] = "LKaRsQrNtXZ7vN8L*6"

In [None]:
# Connect with SoFiAX_services Django ORM

sys.path.append('/mnt/shared/wallaby/apps/SoFiAX_services/api/')
django.setup()

In [None]:
# Import models

from tables.models import Run, Instance, Detection, Product, Source, SourceDetection

## Setup

The intention for this notebook is to allow users to easily download all of the products and properties for all sources of a given run. To do this, you run the notebook to completion ("Run" > "Run All Cells" in the menu). If you are using it in this capacity, you can set the Run and whether to download summary plots for all sources in the cells below.

In [None]:
# Get all runs

Run.objects.all()

In [None]:
# Set run name

run_name = 'NGC4636_DR1'

In [None]:
# Save summary plots

save_plots = False

## 2. List Sources

Now that we have imported the Django models, we are able to use them for viewing tables in the database. We'll provide a few examples for how to get objects from the database, but for a more comprehensive guide you can look at the documentation [here](https://docs.djangoproject.com/en/3.2/topics/db/queries/#retrieving-objects).

To get a list of the sources we first need to decide which run we are interested in. A run corresponds to a region of sky. We can query the database to look at the runs that it contains. With Django, the rows of the database are retrieved in the form of Django objects called [QuerySets](https://docs.djangoproject.com/en/3.2/ref/models/querysets/). We can essentially treat them as lists, and they can be converted directly to a native Python list by calling `list()` on the object.

### Table structure

Since we are interested in retrieving sources, we need to know how to get them for a given run. The diagram below shows the relationship between the run and sources tables. You can see that there is an intermediate source_detection table that maps a source to a detection, and that there is a relationship between the detection and the run.

![sources_table.drawio.png](attachment:84c38a9f-7f74-4efb-9135-e4e90e15e8b6.png)

So, starting from the run, we must do the following to get access to the sources using foreign key relationships:

1. Get Detections of interest in a Run
2. Get SourceDetection map from Detections
3. Get Sources from SourceDetection source_id field
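
The three steps above amount to following two foreign keys in sequence. As a rough illustration that does not touch the database, the sketch below reproduces the same lookup with plain Python lists of dictionaries. The rows, ids and the `sources_for_run` helper are made up for this example; only the field names `run_id`, `detection_id` and `source_id` follow the cells in this notebook.

```python
# Toy stand-ins for the run, detection, source_detection and source tables.
# All rows and ids here are invented for illustration only.
runs = [{"id": 1, "name": "NGC4636_DR1"}, {"id": 2, "name": "other_run"}]
detections = [
    {"id": 10, "run_id": 1},
    {"id": 11, "run_id": 1},
    {"id": 12, "run_id": 2},
]
source_detections = [
    {"source_id": 100, "detection_id": 10},
    {"source_id": 101, "detection_id": 11},
    {"source_id": 102, "detection_id": 12},
]
sources = [
    {"id": 100, "name": "source_A"},
    {"id": 101, "name": "source_B"},
    {"id": 102, "name": "source_C"},
]


def sources_for_run(run_name):
    """Return the names of the sources linked to a run (hypothetical helper)."""
    run = next(r for r in runs if r["name"] == run_name)
    # 1. Detections that belong to the run
    detection_ids = {d["id"] for d in detections if d["run_id"] == run["id"]}
    # 2. source_detection rows for those detections
    source_ids = {
        sd["source_id"] for sd in source_detections if sd["detection_id"] in detection_ids
    }
    # 3. Sources referenced by those rows
    return sorted(s["name"] for s in sources if s["id"] in source_ids)


print(sources_for_run("NGC4636_DR1"))
```

The cells below do the same thing against the real tables, with each step expressed as a Django `filter()` call.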

In [None]:
# 0. List all runs in the database

Run.objects.all()

In [None]:
# 1. Get detections of interest in a run

detections = Detection.objects.filter(run_id = Run.objects.get(name=run_name))

In [None]:
# 2. Get the source_detection map for detections to get the source ids

detections_to_sources = SourceDetection.objects.filter(detection_id__in=[d.id for d in detections])

In [None]:
# 3. Get the sources from the source_detection ids

sources = Source.objects.filter(id__in=[d.source_id for d in detections_to_sources])
print([s.name for s in sources])

Great! Now we have access to all of the sources for a run.

## 3. View source properties

Seeing only the names of the sources is probably not very useful. We want to look at the source properties obtained from the source finding application, or at the data products associated with each source. To access these we first need to know the structure of the tables of interest. Let's look at the schema for the two tables in the cells below.

In [None]:
# Source table fields

Source._meta.fields

In [None]:
# Detection table fields

Detection._meta.fields

So we now know that all of the properties that are useful for science are in the detections table. To get them for a source, we need to map the source to its detection.

The functions below take the name of a source and the name of a run, and retrieve the detection properties. The `Source` object is looked up from its name inside `get_detection`. We also need to provide the name of the run, since there can be multiple detections for a given source.

In [None]:
# Functions for getting the properties of a source

def get_detection(source_name, run_name):
    """Get the detection for a given source and run.
    
    """
    run = Run.objects.get(name=run_name)
    source = Source.objects.get(name=source_name)
    detection_ids = [sd.detection_id for sd in SourceDetection.objects.filter(source_id=source.id)]
    
    # Only one detection for this source, so return it directly
    if len(detection_ids) == 1:
        return Detection.objects.get(id=detection_ids[0])

    # Several detections: select the one that belongs to the requested run
    detection = Detection.objects.filter(id__in=detection_ids).filter(run_id=run.id).first()
    return detection
    

def get_property(source_name, run_name, attribute):
    """Retrieve a single property (name) for a source in a given run.
    
    """
    detection = get_detection(source_name, run_name)
    return getattr(detection, attribute)


def get_all_properties(source_name, run_name):
    """Retrieve the value of all properties for a source in a given run.
    
    """
    detection = get_detection(source_name, run_name)
    return model_to_dict(detection)


print(f"f_max for WALLABY_J122708+055255: {get_property(source_name='WALLABY_J122708+055255', run_name=run_name, attribute='f_max')}")
print(f"All property values for WALLABY_J122708+055255: {get_all_properties(source_name='WALLABY_J122708+055255', run_name=run_name)}")

## 4. Download source table

The functions above can be useful for interacting with single detections. Suppose you want to work with all sources and their properties for a given Run, or some subset of the sources. The dictionary above needs more manipulation before it can be useful. 

We provide a function below that collects the sources of a given run, together with their detection properties, into an astropy table. This table format will be more familiar for many astronomers. From this format you are also able to save the table as a file, first to the file system of the notebook server, and then download it to your own computer from the menu on the left.

In [None]:
# Function to get query result into an astropy table

def sources_table(run_name):
    """Return the sources of a run, with their detection properties, as an astropy table.
    
    """
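    run = Run.objects.get(name=run_name)
    run_detections = Detection.objects.filter(run_id=run.id)
    detections_to_sources = SourceDetection.objects.filter(detection_id__in=[d.id for d in run_detections])
    sources = Source.objects.filter(id__in=[d.source_id for d in detections_to_sources])

    # Keep only detections that map to a source, so that every column has the
    # same length and row i of each column refers to the same source
    source_names = {s.id: s.name for s in sources}
    source_of_detection = {sd.detection_id: sd.source_id for sd in detections_to_sources}
    detections = [d for d in run_detections if d.id in source_of_detection]

    # Manually create numpy arrays of properties
    name_s = [source_names[source_of_detection[d.id]] for d in detections]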
    x_d = np.array([d.x for d in detections])
    y_d = np.array([d.y for d in detections])
    z_d = np.array([d.z for d in detections])
    xmin_d = np.array([int(d.x_min) for d in detections])
    xmax_d = np.array([int(d.x_max) for d in detections])
    ymin_d = np.array([int(d.y_min) for d in detections])
    ymax_d = np.array([int(d.y_max) for d in detections])
    zmin_d = np.array([int(d.z_min) for d in detections])
    zmax_d = np.array([int(d.z_max) for d in detections])
    npix_d = np.array([int(d.n_pix) for d in detections])
    fmin_d = np.array([d.f_min for d in detections])
    fmax_d = np.array([d.f_max for d in detections])
    fsum_d = np.array([d.f_sum for d in detections])
    rel_d = np.array([d.rel for d in detections])
    rms_d = np.array([d.rms for d in detections])
    w20_d = np.array([d.w20 for d in detections])
    w50_d = np.array([d.w50 for d in detections])
    ellmaj_d = np.array([d.ell_maj for d in detections])
    ellmin_d = np.array([d.ell_min for d in detections])
    ellpa_d = np.array([d.ell_pa for d in detections])
    ellsmaj_d = np.array([d.ell3s_maj for d in detections])
    ellsmin_d = np.array([d.ell3s_min for d in detections])
    ellspa_d = np.array([d.ell3s_pa for d in detections])
    kinpa_d = np.array([d.kin_pa for d in detections])
    errx_d = np.array([d.err_x for d in detections])
    erry_d = np.array([d.err_y for d in detections])
    errz_d = np.array([d.err_z for d in detections])
    errfsum_d = np.array([d.err_f_sum for d in detections])
    ra_d = np.array([d.ra for d in detections])
    dec_d = np.array([d.dec for d in detections])
    freq_d = np.array([d.freq for d in detections])
    flag_d = np.array([d.flag for d in detections])
    l_d = np.array([d.l for d in detections])
    b_d = np.array([d.b for d in detections])
    vrad_d = np.array([d.v_rad for d in detections])
    vopt_d = np.array([d.v_opt for d in detections])
    vapp_d = np.array([d.v_app for d in detections])
    unres_d = np.array([d.unresolved for d in detections])
    wm50_d = np.array([d.wm50 for d in detections])
    xpeak_d = np.array([d.x_peak for d in detections])
    ypeak_d = np.array([d.y_peak for d in detections])
    zpeak_d = np.array([d.z_peak for d in detections])
    rapeak_d = np.array([d.ra_peak for d in detections])
    decpeak_d = np.array([d.dec_peak for d in detections])
    freqpeak_d = np.array([d.freq_peak for d in detections])
    lpeak_d = np.array([d.l_peak for d in detections])
    bpeak_d = np.array([d.b_peak for d in detections])
    vradpeak_d = np.array([d.v_rad_peak for d in detections])
    voptpeak_d = np.array([d.v_opt_peak for d in detections])
    vapppeak_d = np.array([d.v_app_peak for d in detections])

    # Array of values     
    matrix = [name_s, x_d, y_d, z_d, xmin_d, xmax_d, ymin_d, ymax_d, zmin_d, zmax_d, 
         npix_d, fmin_d, fmax_d, fsum_d, rel_d, rms_d, w20_d, w50_d, ellmaj_d,
         ellmin_d, ellpa_d, ellsmaj_d, ellsmin_d, ellspa_d, kinpa_d, errx_d,
         erry_d, errz_d, errfsum_d, ra_d, dec_d, freq_d, flag_d, l_d, b_d,
         vrad_d, vopt_d, vapp_d, unres_d, wm50_d, xpeak_d, ypeak_d, zpeak_d, 
         rapeak_d, decpeak_d, freqpeak_d, lpeak_d, bpeak_d, vradpeak_d, voptpeak_d,
         vapppeak_d]
    
    # Array names
    col_names = ('name', 'x', 'y', 'z', 'x_min', 'x_max', 'y_min', 'y_max', 'z_min', 'z_max', 
         'n_pix', 'f_min', 'f_max', 'f_sum', 'rel', 'rms', 'w20', 'w50', 'ell_maj', 
         'ell_min', 'ell_pa', 'ell3s_maj', 'ell3s_min', 'ell3s_pa', 'kin_pa',
         'err_x', 'err_y', 'err_z', 'err_f_sum', 'ra', 'dec', 'freq', 'flag',
         'l', 'b', 'v_rad', 'v_opt', 'v_app', 'unresolved', 'wm50', 'x_peak', 
         'y_peak', 'z_peak', 'ra_peak', 'dec_peak', 'freq_peak', 'l_peak', 'b_peak',
         'v_rad_peak', 'v_opt_peak', 'v_app_peak')
    
    return Table(matrix, names=col_names)

In [None]:
# Functions for getting the properties of a source

def get_detection(source_name, run_name):
    """Get the detection for a given source and run.
    
    """
    run = Run.objects.get(name=run_name)
    source = Source.objects.get(name=source_name)
    detection_ids = [sd.detection_id for sd in SourceDetection.objects.filter(source_id=source.id)]
    
    # where there is only 1 detection     
    if (len(detection_ids) == 1):
        detection = Detection.objects.get(id=detection_ids[0])
        return detection

    # Need to specify the run    
    detection = Detection.objects.filter(id__in=detection_ids).filter(run_id=run.id).first()
    return detection
    

def get_property(source_name, run_name, attribute):
    """Retrieve a single property (name) for a source in a given run.
    
    """
    detection = get_detection(source_name, run_name)
    return getattr(detection, attribute)


def get_all_properties(source_name, run_name):
    """Retrieve the value of all properties for a source in a given run.
    
    """
    detection = get_detection(source_name, run_name)
    return model_to_dict(detection)


print(f"f_max for WALLABY_J122708+055255: {get_property(source_name='WALLABY_J122708+055255', run_name=run_name, attribute='f_max')}")
print(f"All property values for WALLABY_J122708+055255: {get_all_properties(source_name='WALLABY_J122708+055255', run_name=run_name)}")

Cool! Now let's save the sources table as a CSV file.

In [None]:
# Write table to csv (overwrite=True replaces a file left over from a previous run of this cell)

table = sources_table(run_name=run_name)
table.write("sources_%s.csv" % run_name, format='csv', overwrite=True)
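
Once the CSV is on disk it can be read by any tool. As a small sketch of selecting a subset of rows without astropy, the example below parses CSV text with Python's built-in `csv` module and keeps the sources above a flux threshold. The three rows, their values, the threshold and the `select_bright` helper are invented for illustration; only the column names `name`, `f_sum` and `w20` come from the table above.

```python
import csv
import io

# Made-up CSV content with the same header style as the sources table.
csv_text = """name,f_sum,w20
source_A,12.5,150.0
source_B,3.2,80.0
source_C,7.9,210.0
"""


def select_bright(csv_file, min_f_sum):
    """Return the names of rows whose f_sum is at least min_f_sum."""
    reader = csv.DictReader(csv_file)
    return [row["name"] for row in reader if float(row["f_sum"]) >= min_f_sum]


# io.StringIO stands in for an opened CSV file on disk
bright = select_bright(io.StringIO(csv_text), min_f_sum=5.0)
print(bright)
```

With a real file you would pass `open("sources_%s.csv" % run_name, newline="")` in place of the `io.StringIO` object.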

## 5. Download source associated data products

We can also download the data products (moment maps, spectra) associated with the sources of interest. These are stored in the `Product` table, which we haven't interacted with just yet. All of these data are stored as bytes in the database table, so downloading them allows us to interact with them as `.fits` files (which might be easier for some astronomers). There are also methods for visualising the content directly in these notebooks, which can be found in the `user.ipynb` notebook template.

This script will write a bunch of files for each of the different data products. It's inconvenient to download each of these individually, so we will also zip the content up.

### Save to file

Here we will show you how to save the output products for a given source to a folder. Each of the .fits products will be compressed.

In [None]:
def write_bytesio_to_file(filename, bytesio):
    """Write the contents of the given BytesIO to a file.
    Creates the file if it does not exist yet, or
    overwrites it if it does.
    
    """
    with open(filename, "wb") as outfile:
        # Copy the BytesIO stream to the output file
        outfile.write(bytesio.getbuffer())
            
def write_zipped_fits_file(filename, product):
    """Write a tar.gz file for a .fits data product.
    
    """
    with io.BytesIO() as buf:
        buf.write(product)
        buf.seek(0)
        zip_file = filename.replace('.fits', '.tar.gz')
        if not os.path.isfile(zip_file):
            write_bytesio_to_file(filename, buf)
            os.system(f'tar -czf {zip_file} {filename}')
            os.system(f'rm -rf {filename}')
    

def save_data_products(source_name, run_name, output_dir):
    """Save the data products associated with a source for a given run to output_dir.
    
    """
    source = Source.objects.get(name=source_name)
    detection = get_detection(source_name, run_name)
    products = Product.objects.get(detection=detection)
    
    # Write fits files
    write_zipped_fits_file('%s/%s_cube.fits' % (output_dir, source.name), products.cube)
    write_zipped_fits_file('%s/%s_chan.fits' % (output_dir, source.name), products.chan)
    write_zipped_fits_file('%s/%s_mask.fits' % (output_dir, source.name), products.mask)
    write_zipped_fits_file('%s/%s_mom0.fits' % (output_dir, source.name), products.mom0)
    write_zipped_fits_file('%s/%s_mom1.fits' % (output_dir, source.name), products.mom1)
    write_zipped_fits_file('%s/%s_mom2.fits' % (output_dir, source.name), products.mom2)

    # Open spectrum
    with io.BytesIO() as buf:
        buf.write(b''.join(products.spec))
        buf.seek(0)
        spec_file  = '%s/%s_spec.txt' % (output_dir, source.name)
        if not os.path.isfile(spec_file):
            write_bytesio_to_file(spec_file, buf)

In [None]:
# Save data products to a directory called 'products_<run_name>'

if not os.path.isdir('products_%s' % run_name):
    os.mkdir('products_%s' % run_name)

save_data_products(source_name="WALLABY_J122708+055255", run_name=run_name, output_dir='products_%s' % run_name)
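
The functions above shell out to `tar` and `rm`, which assumes a Unix-like system. As an alternative sketch (not what this notebook does), Python's standard `tarfile` module can write the bytes straight into a `.tar.gz` archive without creating an intermediate `.fits` file. The helper name `write_tar_gz`, the file names and the placeholder payload are assumptions for illustration.

```python
import io
import os
import tarfile
import tempfile


def write_tar_gz(archive_path, member_name, payload):
    """Write `payload` (bytes) into a gzip-compressed tar archive as `member_name`."""
    info = tarfile.TarInfo(name=member_name)
    info.size = len(payload)
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.addfile(info, io.BytesIO(payload))


# Placeholder bytes standing in for a product such as `products.mom0`
payload = b"SIMPLE  =                    T"

with tempfile.TemporaryDirectory() as tmp_dir:
    archive = os.path.join(tmp_dir, "example_mom0.tar.gz")
    write_tar_gz(archive, "example_mom0.fits", payload)

    # Read the archive back to confirm the round trip
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        restored = tar.extractfile("example_mom0.fits").read()

print(names, restored == payload)
```

One difference from the shell version is that the member name inside the archive is chosen explicitly, so it does not carry the output directory prefix.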

### Zipped

If you are downloading the data products for an entire run rather than a single source, the script below will add them to a zipped folder. To do this we create a temporary directory where the files will be stored, zip this directory and then remove the temporary directory.

In [None]:
# Zip the products

def save_data_products_run(run_name, filename):
    """Save data products for all sources of a given run into a zipped folder.
    
    """
    run = Run.objects.get(name=run_name)
    detections = Detection.objects.filter(run_id=run.id)
    detections_to_sources = SourceDetection.objects.filter(detection_id__in=[d.id for d in detections])
    sources = Source.objects.filter(id__in=[d.source_id for d in detections_to_sources])

    tmp = 'tmp_products'
    if not os.path.isdir(tmp):
        os.mkdir(tmp)

    for s in sources:
        save_data_products(s.name, run.name, tmp)
        
    os.system(f'tar -czf {filename} {tmp}')
    os.system(f'rm -rf {tmp}')

In [None]:
save_data_products_run(run_name=run_name, filename='zipped_%s.tar.gz' % run_name)
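
The "create a temporary directory, archive it, then remove it" pattern can also be written with the standard library alone. The sketch below is one possible variant, with hypothetical file names, contents and helper name (`archive_files`). `tempfile.TemporaryDirectory` deletes the directory on exit even if an error occurs part-way through, which a trailing `rm -rf` call would not guarantee.

```python
import os
import tarfile
import tempfile


def archive_files(files, archive_path, folder="products"):
    """Write each (name, bytes) pair to a temporary folder and tar.gz the folder."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        staging = os.path.join(tmp_dir, folder)
        os.mkdir(staging)
        for name, payload in files.items():
            with open(os.path.join(staging, name), "wb") as outfile:
                outfile.write(payload)
        # arcname keeps paths inside the archive relative to `folder`
        with tarfile.open(archive_path, "w:gz") as tar:
            tar.add(staging, arcname=folder)
    # The temporary directory no longer exists at this point


# Hypothetical file names and contents
example_files = {
    "source_A_spec.txt": b"# spectrum\n",
    "source_B_spec.txt": b"# spectrum\n",
}

with tempfile.TemporaryDirectory() as out_dir:
    archive = os.path.join(out_dir, "zipped_example.tar.gz")
    archive_files(example_files, archive)
    with tarfile.open(archive, "r:gz") as tar:
        members = sorted(tar.getnames())

print(members)
```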

## 6. Summary plots

We provide summary plots that show the moment 0, moment 1 maps, spectra and optical counterpart in a single figure. An example of such a plot is shown below.

In this subsection we provide some code for creating these summary plots for each of the sources of a given run. Each plot takes a few seconds to generate, so this part may take a few minutes. There are also a lot of warnings in the plotting code so you will have to just ignore those.

In [None]:
# Import libraries for plotting

import io
import math
import numpy as np

import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

import astropy.units as u
from astropy.io import fits
from astropy.wcs import WCS
from astropy.visualization import PercentileInterval
from astroquery.skyview import SkyView
from astropy.utils.data import clear_download_cache

def retrieve_dss_image(longitude, latitude, width, height):
    hdulist = SkyView.get_images(position="{}, {}".format(longitude, latitude), survey=["DSS2 Blue"], coordinates="J2000", projection="Tan", pixels="{}, {}".format(str(int(2400 * width)), str(int(2400 * height))), width=width*u.deg, height=height*u.deg);
    return hdulist[0][0]

def plot_source(source_name, run_name, save=False, output_dir=None):
    # Plot figure size    
    interval = PercentileInterval(95.0)
    plt.rcParams["figure.figsize"] = (12,12)
    dss_image = True
    
    # Retrieve products from database
    detection = get_detection(source_name, run_name)
    products = Product.objects.get(detection=detection)
    
    # Open moment 0 image
    with io.BytesIO() as buf:
        buf.write(products.mom0)
        buf.seek(0)
        hdu_mom0 = fits.open(buf)[0]
        wcs = WCS(hdu_mom0.header)
        mom0 = hdu_mom0.data

    # Open moment 1 image
    with io.BytesIO() as buf:
        buf.write(products.mom1)
        buf.seek(0)
        hdu_mom1 = fits.open(buf)[0]
        mom1 = hdu_mom1.data

    with io.BytesIO() as buf:
        buf.write(b''.join(products.spec))
        buf.seek(0)
        spectrum = np.loadtxt(buf, dtype="float", comments="#", unpack=True)

    # Extract coordinate information
    nx = hdu_mom0.header["NAXIS1"]
    ny = hdu_mom0.header["NAXIS2"]
    clon, clat = wcs.all_pix2world(nx/2, ny/2, 0)
    tmp1, tmp3 = wcs.all_pix2world(0, ny/2, 0)
    tmp2, tmp4 = wcs.all_pix2world(nx, ny/2, 0)
    width = np.rad2deg(math.acos(math.sin(np.deg2rad(tmp3)) * math.sin(np.deg2rad(tmp4)) + math.cos(np.deg2rad(tmp3)) * math.cos(np.deg2rad(tmp4)) * math.cos(np.deg2rad(tmp1 - tmp2))))
    tmp1, tmp3 = wcs.all_pix2world(nx/2, 0, 0)
    tmp2, tmp4 = wcs.all_pix2world(nx/2, ny, 0)
    height = np.rad2deg(math.acos(math.sin(np.deg2rad(tmp3)) * math.sin(np.deg2rad(tmp4)) + math.cos(np.deg2rad(tmp3)) * math.cos(np.deg2rad(tmp4)) * math.cos(np.deg2rad(tmp1 - tmp2))))
    
    # Download DSS image from SkyView
    try:
        hdu_opt = retrieve_dss_image(clon, clat, width, height)
        wcs_opt = WCS(hdu_opt.header)
    except Exception as e:
        dss_image = False
    
    # Plot moment 0
    ax2 = plt.subplot(2, 2, 1, projection=wcs);
    ax2.imshow(mom0, origin="lower");
    ax2.grid(color="grey", ls="solid");
    ax2.set_xlabel("Right ascension (J2000)");
    ax2.set_ylabel("Declination (J2000)");
    ax2.tick_params(axis="x", which="both", left=False, right=False);
    ax2.tick_params(axis="y", which="both", top=False, bottom=False);
    ax2.set_title("moment 0");
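    # angle is passed by keyword because recent matplotlib releases no longer accept it positionally
    e = Ellipse((5, 5), 5, 5, angle=0, edgecolor='peru', facecolor='peru');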
    ax2.add_patch(e);

    # Plot DSS image with HI contours
    if dss_image:
        bmin, bmax = interval.get_limits(hdu_opt.data);
        ax = plt.subplot(2, 2, 2, projection=wcs_opt);
        ax.imshow(hdu_opt.data, origin="lower");
        ax.contour(hdu_mom0.data, transform=ax.get_transform(wcs), levels=np.logspace(2.0, 5.0, 10), colors="lightgrey", alpha=1.0);
        ax.grid(color="grey", ls="solid");
        ax.set_xlabel("Right ascension (J2000)");
        ax.set_ylabel("Declination (J2000)");
        ax.tick_params(axis="x", which="both", left=False, right=False);
        ax.tick_params(axis="y", which="both", top=False, bottom=False);
        ax.set_title("DSS + moment 0");

    # Plot moment 1
    bmin, bmax = interval.get_limits(mom1);
    ax3 = plt.subplot(2, 2, 3, projection=wcs);
    ax3.imshow(hdu_mom1.data, origin="lower", vmin=bmin, vmax=bmax, cmap=plt.get_cmap("gist_rainbow"));
    ax3.grid(color="grey", ls="solid");
    ax3.set_xlabel("Right ascension (J2000)");
    ax3.set_ylabel("Declination (J2000)");
    ax3.tick_params(axis="x", which="both", left=False, right=False);
    ax3.tick_params(axis="y", which="both", top=False, bottom=False);
    ax3.set_title("moment 1");

    # Plot spectrum
    xaxis = spectrum[1] / 1e+6;
    data  = 1000.0 * np.nan_to_num(spectrum[2]);
    xmin = np.nanmin(xaxis);
    xmax = np.nanmax(xaxis);
    ymin = np.nanmin(data);
    ymax = np.nanmax(data);
    # Pad both ends of the y-axis by the same amount (10% of the data range)
    ypad = 0.1 * (ymax - ymin);
    ymin -= ypad;
    ymax += ypad;
    ax4 = plt.subplot(2, 2, 4);
    ax4.step(xaxis, data, where="mid", color="royalblue");
    ax4.set_xlabel("Frequency (MHz)");
    ax4.set_ylabel("Flux density (mJy)");
    ax4.set_title("spectrum");
    ax4.grid(True);
    ax4.set_xlim([xmin, xmax]);
    ax4.set_ylim([ymin, ymax]);

    plt.suptitle(detection.name.replace("_", " ").replace("-", "−"), fontsize=16);
    if save:
        plt.savefig("%s/%s.png" % (output_dir, source_name))
        plt.close()
    else:
        plt.show();

    # Clean up
    clear_download_cache(pkgname="astroquery");
    clear_download_cache();

In [None]:
%matplotlib inline

plot_source(source_name="WALLABY_J123244+000656", run_name=run_name)
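
The `width` and `height` passed to SkyView in `plot_source` are the angular separations between opposite edges of the moment 0 map, computed with the spherical law of cosines: `cos(d) = sin(dec1) sin(dec2) + cos(dec1) cos(dec2) cos(ra1 - ra2)`. The standalone sketch below isolates that calculation. The function name `angular_separation_deg` is introduced here for illustration, and the clamp is an addition (not in `plot_source`) that guards against rounding pushing the cosine slightly outside [-1, 1].

```python
import math


def angular_separation_deg(lon1, lat1, lon2, lat2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    cos_d = (math.sin(lat1) * math.sin(lat2)
             + math.cos(lat1) * math.cos(lat2) * math.cos(lon1 - lon2))
    # Clamp to the valid domain of acos to avoid errors from rounding
    cos_d = max(-1.0, min(1.0, cos_d))
    return math.degrees(math.acos(cos_d))


# Two points on the equator, 1 degree apart in longitude
sep_equator = angular_separation_deg(10.0, 0.0, 11.0, 0.0)
# The same longitude offset at latitude 60 degrees spans about half the angle
sep_dec60 = angular_separation_deg(10.0, 60.0, 11.0, 60.0)
print(sep_equator, sep_dec60)
```

This is why the image width is computed from the map edges rather than taken directly from the difference in right ascension: away from the equator the two are not the same.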

In [None]:
# Function to save summary plots in a specified output directory

def save_run_figures(run_name, output_dir):
    """Plot all sources for a given run.
    
    """
    detections = Detection.objects.filter(run_id = Run.objects.get(name=run_name))
    detections_to_sources = SourceDetection.objects.filter(detection_id__in=[d.id for d in detections])
    sources = Source.objects.filter(id__in=[d.source_id for d in detections_to_sources])
    for s in sources:
        try:
            plot_source(s.name, run_name, output_dir=output_dir, save=True)
        except Exception as e:
            print("Failed to make plots for %s" % s.name)

In [None]:
# Save figures for a given run as a zip file.

if save_plots:
    output_dir = 'plots_%s' % run_name
    if not os.path.isdir(output_dir):
        os.mkdir(output_dir)

    save_run_figures(run_name=run_name, output_dir=output_dir)
    os.system(f'tar -czf {"%s.tar.gz" % output_dir} {output_dir}')
    os.system(f'rm -rf {output_dir}')

# That's it!

If you have any thoughts, or anything you think we should add to this notebook, please contact [austin.shen@csiro.au](mailto:austin.shen@csiro.au).