# WALLABY internal data access download

A notebook pre-filled with cells and scripts for downloading WALLABY internal release data. The notebook has the following sections:

1. Authentication
2. Get source finding catalog and products
3. Get kinematic model table and products

---

In [None]:
import os
import getpass
import requests
import pyvo as vo
from pyvo.auth import authsession, securitymethods
from astropy.io.votable import from_table, parse_single_table

# 1. Connect

We access the internally released WALLABY data via TAP (Table Access Protocol). The URL of the TAP service is given below. The password for the `wallaby_user` account is circulated internally by the WALLABY project team and is required to access any of the data.

### Authenticate

<span style="font-weight: bold; color: #FF0000;">⚠ Update the cell below with your username and enter your password</span>

In [None]:
# Enter the WALLABY username and password

username = 'wallaby_user'
password = getpass.getpass('Enter your password: ')

In [None]:
# Connect with TAP service

URL = "https://wallaby.aussrc.org/tap"
auth = vo.auth.AuthSession()
auth.add_security_method_for_url(URL, vo.auth.securitymethods.BASIC)
auth.credentials.set_password(username, password)
tap = vo.dal.TAPService(URL, session=auth)
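
For context, `BASIC` refers to HTTP Basic authentication, in which the username and password travel in an `Authorization` header with each request. The standard-library sketch below shows how such a header is formed. `basic_auth_header` is a hypothetical helper for illustration and is not part of `pyvo`, and the credentials are placeholders. Because the value is only base64-encoded, not encrypted, it relies on the `https` URL for protection.

```python
import base64

def basic_auth_header(username, password):
    """Build the Authorization header value used by HTTP Basic authentication."""
    token = base64.b64encode(f'{username}:{password}'.encode('utf-8')).decode('ascii')
    return f'Basic {token}'

# Placeholder credentials, for illustration only
example_header = basic_auth_header('wallaby_user', 'not-a-real-password')
print(example_header)
```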

# 2. Source finding

First we need to identify which internal release to access. The WALLABY team uses tags to mark source finding detections as internally released. Run the cell below to list all of the tags, then set the `tag_name` variable in the cell after it. This value is used later in the notebook, so update it once you know which data you would like to access.

In [None]:
# Get all tags

query = "SELECT * FROM wallaby.tag"
votable = tap.search(query)
table = votable.to_table()
table

<span style="font-weight: bold; color: #FF0000;">⚠ Update the `tag_name` value here. Include the description with format: "name: description"</span>

In [None]:
# SELECT TAG

tag_name = "WALLABY: Full WALLABY survey"

In [None]:
# Retrieve catalog as Astropy table

query = """SELECT d.*, ivo_string_agg(t.name || ': ' || t.description, '; ') AS tags, ivo_string_agg(c.comment, '; ') AS comments
            FROM wallaby.detection d
            FULL JOIN wallaby.tag_detection td ON d.id = td.detection_id 
            LEFT JOIN wallaby.tag t ON t.id = td.tag_id
            LEFT JOIN wallaby.comment c ON d.id = c.detection_id
            WHERE d.source_name IS NOT NULL
            GROUP BY d.id
            HAVING ivo_string_agg(t.name || ': ' || t.description, '; ') LIKE '%$TAG_NAME%'"""
query = query.replace('$TAG_NAME', tag_name)
result = tap.search(query)
table = result.to_table()
table['comments'] = ['; '.join(list(set([ci.strip() for ci in c.split(';')]))) for c in table['comments']]  # make comments distinct
table
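
The comment clean-up in the last step of the cell above uses `set`, which removes duplicates but does not keep the comments in their original order. If order matters, a minimal alternative is sketched below. `distinct_comments` is a hypothetical helper, and it assumes each entry is a plain string; unlike the original line, it also drops empty fragments.

```python
def distinct_comments(raw, sep=';'):
    """Split on sep, strip whitespace, drop blanks and repeats, keep first-seen order."""
    parts = (part.strip() for part in raw.split(sep))
    # dict.fromkeys keeps insertion order, so the first occurrence of each comment wins
    return '; '.join(dict.fromkeys(part for part in parts if part))

example_comments = distinct_comments('faint; edge of cube ;faint')
print(example_comments)
```

It could be applied in the same way as the original line, for example `[distinct_comments(c) for c in table['comments']]`.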

## Download catalog

The catalog (an `astropy` Table object) can be written to a number of file formats. Below we export the table as a `.xml` VOTable file by first converting the astropy table to a VOTable. Astropy tables also support a number of other write options:

https://docs.astropy.org/en/stable/io/ascii/write.html

<span style="font-weight: bold; color: #FF0000;">⚠ Update the `download_filename` value here to specify where to download the catalog and products</span>

In [None]:
# Base name for the catalog file and folder name for the product downloads

download_filename = "WALLABY"

In [None]:
# Download catalog table

votable = from_table(table)
votable_filename = f'{download_filename}.xml'
votable.to_xml(votable_filename)

## Download products

### Download

The function `download_table_products` below downloads the WALLABY source products for an astropy table containing a list of detections. Its arguments are:

* `table`: the astropy table of detections for which you would like to download product files
* `directory`: the write directory for the products
* `chunk_size`: size (bytes) of each chunk while streaming the download [default 8192 B]
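
To illustrate what `chunk_size` controls, the sketch below copies a binary stream in fixed-size pieces, so that only one chunk is held in memory at a time. It uses in-memory buffers in place of a network response, and `copy_in_chunks` is a hypothetical helper for illustration, not one of the notebook's download functions.

```python
import io

def copy_in_chunks(source, destination, chunk_size=8192):
    """Copy a binary stream to another in fixed-size chunks; return the bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the source is exhausted
            break
        destination.write(chunk)
        total += len(chunk)
    return total

# Stand-ins for a remote file and a local file
src = io.BytesIO(b'x' * 20000)
dst = io.BytesIO()
written = copy_in_chunks(src, dst)
print(f'{written} bytes copied')
```

A larger `chunk_size` means fewer write calls at the cost of more memory per chunk.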

In [None]:
# useful function for downloading table products (requires authentication)

def download_products(row, products_filename, chunk_size=8192):
    """Download products for a row of the table (a detection entry)
    
    """
    name = row['source_name']
    access_url = row['access_url']
    votable = parse_single_table(access_url)
    product_table = votable.to_table()
    url = product_table[product_table['description'] == 'SoFiA-2 Detection Products'][0]['access_url']
    with requests.get(url, auth=(username, password), stream=True) as r:
        r.raise_for_status()
        with open(products_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    print(f'Download completed for {name}')
    return

def download_table_products(table, directory, chunk_size=8192):
    """Download WALLABY products from ADQL queried table

    """
    # Create the output directory (and any missing parents) if needed
    os.makedirs(directory, exist_ok=True)
    print(f'Saving products to {directory}')
    for row in table:
        name = row['source_name']
        products_filename = os.path.join(directory, f'{name}.tar')
        download_products(row, products_filename, chunk_size)
    print('Downloads complete')
    return

In [None]:
# Download products for the first five detections in the table

download_table_products(table[0:5], download_filename)

---

# 3. Kinematic models

This notebook also allows users to download the kinematic model table and products. First, we select the kinematic models by tag and retrieve the table.

In [None]:
# Select the list of available team release tags

query = "SELECT DISTINCT team_release_kin FROM wallaby.kinematic_model"
result = tap.search(query)
result

We can then select the kinematic models corresponding to a specific release tag.

In [None]:
# Set the kinematic tag desired
kin_tag = "NGC5044 Kin TR3"

# The generic query
query = """SELECT d.*
        FROM wallaby.kinematic_model d
        WHERE d.team_release_kin IN ('$TAG_NAME')"""
query = query.replace('$TAG_NAME', kin_tag)

# Run the tap query
result = tap.search(query)

# Get the resulting table
kin_table = result.to_table()

# Print off the table
kin_table[0:2]
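
The query above inserts the tag into the ADQL text by plain string replacement, which would break if a tag ever contained a single quote. ADQL string literals escape a single quote by doubling it, so a small helper along the following lines could be used. `adql_string_literal`, `example_tag` and `example_query` are hypothetical names used for illustration only.

```python
def adql_string_literal(value):
    """Return value as a quoted ADQL string literal, doubling any single quotes."""
    return "'" + value.replace("'", "''") + "'"

example_tag = "NGC5044 Kin TR3"
example_query = (
    "SELECT d.* FROM wallaby.kinematic_model d "
    f"WHERE d.team_release_kin IN ({adql_string_literal(example_tag)})"
)
print(example_query)
```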

Once the table is put together, it can be downloaded in an XML format.

In [None]:
# Download catalog table

votable = from_table(kin_table)
votable_filename = f"{kin_tag.replace(' ', '_')}_catalogue.xml"
votable.to_xml(votable_filename)

In [None]:
# Download products

def download_wkapp_products(row, products_filename, chunk_size=8192):
    """Download products for a row of the table (a kinematic model entry)

    """
    id = row['id']
    URL = 'https://wallaby.aussrc.org/survey/vo/kinematic_model_dl/dlmeta?ID='
    access_url = f'{URL}{id}'
    votable = parse_single_table(access_url)
    product_table = votable.to_table()
    url = product_table[product_table['description'] == 'Kinematic model Products'][0]['access_url']
    with requests.get(url, auth=(username, password), stream=True) as r:
        r.raise_for_status()
        with open(products_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    print(f'Download completed for kinematic_model id={id}')
    return

def download_table_wkapp_products(table, directory, chunk_size=8192):
    """Download WALLABY WKAPP products from ADQL queried table

    """
    # Create the output directory (and any missing parents) if needed
    os.makedirs(directory, exist_ok=True)
    print(f'Saving products to {directory}')
    for row in table:
        id = row['id']
        kin_release = row['team_release_kin'].replace(' ', '_')
        products_filename = os.path.join(directory, f'{kin_release}_{id}.tar')
        download_wkapp_products(row, products_filename, chunk_size)
    print('Downloads complete')
    return

In [None]:
# Download WKAPP products

download_wkapp_products(kin_table[0], f"{kin_tag.replace(' ', '_')}.tar")
download_table_wkapp_products(kin_table[0:5], kin_tag.replace(' ', '_'))

We can do the same for the 3KIDNAS tables.

<span style="font-weight: bold; color: #FF0000;">⚠ Update the kinematic model tag here</span>

In [None]:
query = "SELECT DISTINCT team_release_kin FROM wallaby.kinematic_model_3kidnas"
result = tap.search(query)
result

In [None]:
kin_tag = "Vela E KinTR1"

In [None]:
# The generic query
query = """SELECT k.*, d.source_name
        FROM wallaby.kinematic_model_3kidnas k
        LEFT JOIN wallaby.detection d ON d.id = k.detection_id
        WHERE k.team_release_kin IN ('$TAG_NAME')"""
query = query.replace('$TAG_NAME', kin_tag)

# Run the tap query
result = tap.search(query)

# Get the resulting table
kin_table_3kidnas = result.to_table()

# Print off the table
kin_table_3kidnas[0:2]

In [None]:
# Download catalog table

votable = from_table(kin_table_3kidnas)
votable_filename = f"{kin_tag.replace(' ', '_')}_catalogue.xml"
votable.to_xml(votable_filename)

csvtable_filename = f"{kin_tag.replace(' ', '_')}_catalogue.csv"
kin_table_3kidnas.write(csvtable_filename, overwrite=True, format='csv')

In [None]:
# Download products

def download_wrkp_products(row, products_filename, chunk_size=8192):
    """Download products for a row of the table (a 3KIDNAS kinematic model entry)

    """
    id = row['id']
    source_name = row['source_name']
    URL = 'https://wallaby.aussrc.org/survey/vo/kinematic_model_3kidnas_dl/dlmeta?ID='
    access_url = f'{URL}{id}'
    votable = parse_single_table(access_url)
    product_table = votable.to_table()
    url = product_table[product_table['description'] == 'Kinematic model 3KIDNAS Products'][0]['access_url']
    with requests.get(url, auth=(username, password), stream=True) as r:
        r.raise_for_status()
        with open(products_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    print(f'Download completed for kinematic_model_3kidnas id={id} ({source_name})')
    return

def download_table_wrkp_products(table, directory, chunk_size=8192):
    """Download WALLABY WRKP products from ADQL queried table

    """
    # Create the output directory (and any missing parents) if needed
    os.makedirs(directory, exist_ok=True)
    print(f'Saving products to {directory}')
    for row in table:
        name = row['source_name'].replace(' ', '_')
        kin_release = row['team_release_kin'].replace(' ', '_')
        products_filename = os.path.join(directory, f'{kin_release}_{name}.tar')
        download_wrkp_products(row, products_filename, chunk_size)
    print('Downloads complete')
    return

In [None]:
# Download WRKP products

download_wrkp_products(kin_table_3kidnas[0], 'test.tar')
download_table_wrkp_products(kin_table_3kidnas[0:5], 'test')