## Getting Started

[Astroquery](https://astroquery.readthedocs.io/en/latest/index.html) is a Python package that can help us obtain specific FITS files from the [ALMA Science Archive](https://almascience.nrao.edu/aq/). If this package is not already installed, run the following command in the Anaconda terminal.

```
conda install -c conda-forge astroquery
```

In [1]:
# Imports
import os
import numpy as np
import pandas as pd
import tarfile
import shutil

import astroquery
from astroquery.alma import Alma
from astropy.table import Table

In [2]:
# Setup
alma = Alma()
alma.archive_url = 'https://almascience.nrao.edu'

## FITS Files from a Single Project

Data in the ALMA Science Archive are grouped under a "member ous id" (MOUS), which uniquely identifies a set of observations within a project. Given a MOUS, the following series of cells will extract the continuum FITS files associated with it. For additional information, please see the Astroquery [documentation](https://astroquery.readthedocs.io/en/latest/alma/alma.html#downloading-data) on downloading data.

In [3]:
# Specify MOUS.
MOUS = 'uid://A001/X2f7/Xe9'

# Specify path to store extracted FITS files.
PATH = './data'

# All FITS files must have these link attributes:
INCLUDE = ['pbcor', 'fits']

# All FITS files must have at least one of these link attributes:
OPTIONAL = ['cont', 'mfs'] 

# No FITS file may have any of these link attributes:
EXCLUDE = ['cube']

In [4]:
# Stage data
links = alma.stage_data([MOUS], expand_tarfiles=True)['URL']

# Process links
links = [link for link in links if all(term in link for term in INCLUDE)]
links = [link for link in links if any(term in link for term in OPTIONAL)]
links = [link for link in links if all(term not in link for term in EXCLUDE)]

# Count of FITS files
len(links)

21
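The three filtering passes above can be tried offline on a few made-up link names. The sketch below is only an illustration: `filter_links` is a hypothetical helper, and the sample names are shortened stand-ins rather than real archive links.

```python
def filter_links(links, include, optional, exclude):
    """Keep links that contain every `include` term, at least one
    `optional` term, and none of the `exclude` terms."""
    return [
        link for link in links
        if all(term in link for term in include)
        and any(term in link for term in optional)
        and not any(term in link for term in exclude)
    ]


# Hypothetical link names, shortened for illustration.
sample_links = [
    "member.uid___X.target.spw25.mfs.I.pbcor.fits",   # kept: pbcor + fits + mfs
    "member.uid___X.target.spw25.cube.I.pbcor.fits",  # dropped: no cont/mfs, has cube
    "member.uid___X.target.spw25.mfs.I.pb.fits.gz",   # dropped: lacks pbcor
    "member.uid___X.target.cont.I.pbcor.fits",        # kept: pbcor + fits + cont
]

kept = filter_links(sample_links,
                    include=["pbcor", "fits"],
                    optional=["cont", "mfs"],
                    exclude=["cube"])
print(kept)
```

These are plain substring tests, so a short term such as `cont` also matches any link whose target name happens to contain those letters. It is worth checking the resulting list by eye before downloading.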

In [5]:
# Create the folder (and any missing parent folders) if it does not exist.
os.makedirs(PATH, exist_ok=True)

# Download FITS files.
alma.download_files(links, 
                    savedir=PATH, 
                    cache=True) # Turn off cache to re-download FITS files.

Downloading URL https://almascience.nrao.edu/dataPortal/member.uid___A001_X2f7_Xe9.J1832-2039_ph.spw25.mfs.I.pbcor.fits to ./data\member.uid___A001_X2f7_Xe9.J1832-2039_ph.spw25.mfs.I.pbcor.fits ... [Done]
Downloading URL https://almascience.nrao.edu/dataPortal/member.uid___A001_X2f7_Xe9.J1832-2039_ph.spw27.mfs.I.pbcor.fits to ./data\member.uid___A001_X2f7_Xe9.J1832-2039_ph.spw27.mfs.I.pbcor.fits ... [Done]
Downloading URL https://almascience.nrao.edu/dataPortal/member.uid___A001_X2f7_Xe9.J1832-2039_ph.spw29.mfs.I.pbcor.fits to ./data\member.uid___A001_X2f7_Xe9.J1832-2039_ph.spw29.mfs.I.pbcor.fits ... [Done]
Downloading URL https://almascience.nrao.edu/dataPortal/member.uid___A001_X2f7_Xe9.J1832-2039_ph.spw31.mfs.I.pbcor.fits to ./data\member.uid___A001_X2f7_Xe9.J1832-2039_ph.spw31.mfs.I.pbcor.fits ... [Done]
Downloading URL https://almascience.nrao.edu/dataPortal/member.uid___A001_X2f7_Xe9.J1838-1853_chk.spw25.mfs.I.pbcor.fits to ./data\member.uid___A001_X2f7_Xe9.J1838-1853_chk.spw25.m

['./data\\member.uid___A001_X2f7_Xe9.J1832-2039_ph.spw25.mfs.I.pbcor.fits',
 './data\\member.uid___A001_X2f7_Xe9.J1832-2039_ph.spw27.mfs.I.pbcor.fits',
 './data\\member.uid___A001_X2f7_Xe9.J1832-2039_ph.spw29.mfs.I.pbcor.fits',
 './data\\member.uid___A001_X2f7_Xe9.J1832-2039_ph.spw31.mfs.I.pbcor.fits',
 './data\\member.uid___A001_X2f7_Xe9.J1838-1853_chk.spw25.mfs.I.pbcor.fits',
 './data\\member.uid___A001_X2f7_Xe9.J1838-1853_chk.spw27.mfs.I.pbcor.fits',
 './data\\member.uid___A001_X2f7_Xe9.J1838-1853_chk.spw29.mfs.I.pbcor.fits',
 './data\\member.uid___A001_X2f7_Xe9.J1838-1853_chk.spw31.mfs.I.pbcor.fits',
 './data\\member.uid___A001_X2f7_Xe9.J1845-2200_chk.spw25.mfs.I.pbcor.fits',
 './data\\member.uid___A001_X2f7_Xe9.J1845-2200_chk.spw27.mfs.I.pbcor.fits',
 './data\\member.uid___A001_X2f7_Xe9.J1845-2200_chk.spw29.mfs.I.pbcor.fits',
 './data\\member.uid___A001_X2f7_Xe9.J1845-2200_chk.spw31.mfs.I.pbcor.fits',
 './data\\member.uid___A001_X2f7_Xe9.J1911-2006_ph.spw25.mfs.I.pbcor.fits',
 './

## FITS Files from Multiple Projects

To programmatically download FITS files from multiple projects, you can follow a three-step process:

1. Get a list of all MOUS IDs for a given science keyword

2. Get a list of file links associated with those MOUS IDs

3. Download the FITS file from each link

We wrote functions for each step with options based on our desired data, but some slight modifications to these functions may be needed depending on intended usage.

### Get MOUS IDs for a Science Keyword

Note that the science keyword must exactly match one of the keywords listed in the archive's dropdown menu. For example, "Outflows, jets and ionized winds" is a valid science keyword, but "outflows" alone will not work. A full list of science keywords as of this writing is provided at the bottom of this page.
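Because the keyword is matched as an exact string, it helps to look at the query text before sending it. The sketch below uses a hypothetical helper, `build_keyword_query`, to build the same `ivoa.obscore` query used on this page. It also doubles any single quote in the keyword, which is how ADQL string literals escape quotes and matters for keywords such as the Sunyaev-Zel'dovich entry.

```python
def build_keyword_query(science_keyword):
    """Return an ADQL query that matches the science keyword exactly."""
    # ADQL string literals escape a single quote by doubling it.
    escaped = science_keyword.replace("'", "''")
    return "select * from ivoa.obscore where science_keyword = '{}'".format(escaped)


plain_query = build_keyword_query("Outflows, jets and ionized winds")
quoted_query = build_keyword_query(
    "Cosmic Microwave Background (CMB)/Sunyaev-Zel'dovich Effect (SZE)")

print(plain_query)
print(quoted_query)
```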

In our code, we also include an option to keep only results from a specified year onward, based on the year at the start of each proposal ID.
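The year filter relies on the first four characters of the proposal ID being the year. A minimal sketch of that comparison follows; the proposal IDs are made up for illustration and `proposal_year` is a hypothetical helper.

```python
def proposal_year(proposal_id):
    """Return the leading four-digit year of a proposal ID as an int."""
    return int(proposal_id[:4])


# Hypothetical proposal IDs, for illustration only.
sample_ids = ["2016.1.00123.S", "2018.1.00456.S", "2019.2.00789.S"]

min_year = 2018
recent = [pid for pid in sample_ids if proposal_year(pid) >= min_year]
print(recent)
```

The comparison is inclusive, so a `min_year` of 2018 keeps proposals from 2018 itself.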

In [None]:
def get_mous(science_keyword, save_file_path=None, min_year=None):
    """Get all MOUS IDs for a given science keyword.
    Returns a list of MOUS IDs as strings.
    science_keyword: string search keyword (must match an archive keyword exactly)
    save_file_path: optional path to save a CSV of the results
    min_year: optional; keep only results whose proposal year is min_year or later;
        can be string or int;
        current min year in archive is 2011
    """

    # Query ALMA. A single quote inside the keyword is escaped by doubling it (ADQL).
    escaped_keyword = science_keyword.replace("'", "''")
    full_query = "select * from ivoa.obscore where science_keyword = '{}'".format(escaped_keyword)
    query_results = alma.query_tap(full_query)

    # Convert results to a DataFrame and decode any byte strings.
    result_df = query_results.to_table().to_pandas()
    for col in result_df.select_dtypes(include=object).columns:
        result_df[col] = result_df[col].apply(
            lambda x: x.decode('utf-8') if isinstance(x, bytes) else x)

    # Filter results if desired (proposal IDs start with a four-digit year).
    if min_year is not None:
        min_year = int(min_year)
        result_df = result_df[result_df['proposal_id'].str[0:4].astype(int) >= min_year]

    # Save results if desired.
    if save_file_path is not None:
        result_df.to_csv(save_file_path, index=False)

    return result_df['member_ous_uid'].unique().tolist()

In [None]:
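# Path for the saved CSV of query results (example value; change as needed).
FILE_PATH = './mous_results.csv'
mous = get_mous('Outflows, jets and ionized winds', FILE_PATH, min_year=2018)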

### Get FITS Links

Next, we gather the FITS file links associated with each MOUS ID. This follows the same process as the single MOUS download, but uses a loop to gather all file links.

In [None]:
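def get_fits_links(mous_list, trim=True):
    """Get file links from MOUS IDs.
    Returns a list of file links and a list of MOUS IDs that could not be staged.
    mous_list: list of MOUS IDs (strings)
    trim: if True, keep only links to continuum FITS tar files
    """

    link_frames = []
    error_ids = []
    for mous_id in mous_list:
        try:
            mous_links = alma.stage_data([mous_id], expand_tarfiles=True)['URL']
            link_frames.append(pd.DataFrame({'URL': list(mous_links)}))
        except Exception:
            # Record the failing ID and continue with the rest.
            error_ids.append(mous_id)

    if not link_frames:
        return [], error_ids

    # DataFrame.append was removed in pandas 2.0, so combine the pieces with concat.
    all_links = pd.concat(link_frames, ignore_index=True)

    if trim:
        urls = all_links['URL']
        # regex=False so the '.' in 'fits.tar' is matched literally.
        is_continuum = urls.str.contains('cont', regex=False)
        is_fits_tar = urls.str.contains('fits.tar', regex=False)
        return urls[is_continuum & is_fits_tar].tolist(), error_ids
    else:
        return all_links['URL'].tolist(), error_ids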

In [None]:
links, _ = get_fits_links(mous)
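The second value returned by `get_fits_links` lists the MOUS IDs whose lookup raised an error, which is easy to overlook when it is discarded with `_`. The loop-and-collect pattern can be tried offline with a stand-in for the archive call. Everything in this sketch (`gather_links`, `fake_fetch`, the IDs and the link names) is hypothetical and only mimics the shape of the real function.

```python
def gather_links(mous_ids, fetch_links):
    """Collect links for each ID; record IDs whose lookup raised an error."""
    all_links, error_ids = [], []
    for mous_id in mous_ids:
        try:
            all_links.extend(fetch_links(mous_id))
        except Exception:
            error_ids.append(mous_id)
    return all_links, error_ids


# Hypothetical stand-in for the archive lookup, so the example runs offline.
FAKE_ARCHIVE = {
    "uid://A001/X1/X1": ["a_cont.fits.tar", "a_cube.fits.tar"],
    "uid://A001/X1/X2": ["b_cont.fits.tar"],
}


def fake_fetch(mous_id):
    return FAKE_ARCHIVE[mous_id]  # raises KeyError for unknown IDs


found, failed = gather_links(
    ["uid://A001/X1/X1", "uid://A001/X9/X9", "uid://A001/X1/X2"], fake_fetch)
print(found)
print(failed)
```

Keeping the failed IDs makes it possible to retry them later instead of silently losing part of the sample.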

### Download FITS Files from Links

Finally, we download the FITS files. The files are initially downloaded as zipped tarballs. When unzipped, the actual .fits file is buried a couple of levels down in a nested folder structure. We added some clean-up functionality to optionally pull the .fits files out of the subfolders into the root of the specified directory and to delete the .tar files after unzipping. If using this functionality, be very careful about your current working directory (or specified cache location), as this will affect all .tar files in the given location.
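The extract-and-unnest step can be rehearsed on a throwaway archive before pointing it at real downloads. The sketch below builds a small tar file with a made-up nested layout in a temporary directory, extracts it, and moves the nested file to the top level. `extract_and_flatten` is a hypothetical helper and the folder names are assumptions, not the archive's actual layout.

```python
import os
import shutil
import tarfile
import tempfile


def extract_and_flatten(tar_path, dest_dir):
    """Extract tar_path into dest_dir, move every nested file to the top
    level of dest_dir, and delete the emptied top-level folders."""
    with tarfile.open(tar_path) as tar:
        members = tar.getmembers()
        tar.extractall(dest_dir)

    moved, top_dirs = [], set()
    for member in members:
        if member.isfile() and "/" in member.name:
            target = os.path.join(dest_dir, os.path.basename(member.name))
            shutil.move(os.path.join(dest_dir, member.name), target)
            moved.append(target)
            top_dirs.add(member.name.split("/", 1)[0])
    for top_dir in top_dirs:
        shutil.rmtree(os.path.join(dest_dir, top_dir))
    return moved


# Build a throwaway archive with a hypothetical nested layout.
work_dir = tempfile.mkdtemp()
nested = os.path.join(work_dir, "src", "project", "group", "member")
os.makedirs(nested)
with open(os.path.join(nested, "image_cont.fits"), "wb") as handle:
    handle.write(b"placeholder")  # not a real FITS file

tar_path = os.path.join(work_dir, "example.fits.tar")
with tarfile.open(tar_path, "w") as tar:
    tar.add(os.path.join(work_dir, "src", "project"), arcname="project")

out_dir = os.path.join(work_dir, "out")
os.makedirs(out_dir)
moved_files = extract_and_flatten(tar_path, out_dir)
flattened = sorted(os.listdir(out_dir))
print(flattened)

shutil.rmtree(work_dir)  # clean up the temporary files
```

Working inside a dedicated directory, as here, limits the clean-up to files that came from the archive being processed.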

In [None]:
def download_all_fits(fits_links, cache_location=None, unzip=True, unnest=True, del_tar=False):
    """Download and optionally unzip all FITS files from a list of links.
    Returns a list of links that failed to download.
    fits_links: list of links to FITS tar files (e.g. from get_fits_links)
    cache_location: path to save downloaded files to (default current working directory)
    unzip: if true, unzip the tar files
    unnest: if true, move fits files out of nested subpath
    del_tar: if true, remove tar files after unzipping
    """

    # Set location to download to.
    if cache_location is None:
        cache_location = os.getcwd()
    alma.cache_location = cache_location

    error_links = []
    # Download FITS files to that location.
    for link in fits_links:
        try:
            alma.download_files([link], savedir=cache_location)
        except Exception:
            error_links.append(link)

    # Unzip files.
    if unzip:
        # Get list of tar files in the download directory (not just the working directory).
        tar_files = [os.path.join(cache_location, s)
                     for s in os.listdir(cache_location) if s.endswith('.tar')]

        for tar_file in tar_files:
            # Unzip file.
            with tarfile.open(tar_file) as tar:
                members = tar.getmembers()
                tar.extractall(cache_location)

            if unnest:
                top_dirs = set()
                for member in members:
                    # Move each nested file to the root of cache_location.
                    if member.isfile() and '/' in member.name:
                        shutil.move(os.path.join(cache_location, member.name),
                                    os.path.join(cache_location, os.path.basename(member.name)))
                        top_dirs.add(member.name.split('/', 1)[0])
                # Delete the leftover subfolders only after all files are moved.
                for top_dir in top_dirs:
                    shutil.rmtree(os.path.join(cache_location, top_dir))
            if del_tar:
                # Delete tar file.
                os.remove(tar_file)

    return error_links

In [None]:
download_all_fits(links)

## List of ALMA Science Keywords

As of June 2021, the full list of ALMA science keywords is as follows:

Active galaxies
* Active Galactic Nuclei (AGN)/Quasars (QSO)
* Galactic centres/nuclei
* High-z Active Galactic Nuclei (AGN)
* Outflows, jets, feedback
* Starburst galaxies
* Starbursts, star formation

Cosmology
* Cosmic Microwave Background (CMB)/Sunyaev-Zel'dovich Effect (SZE)
* Damped Lyman Alpha (DLA) systems
* Galaxy Clusters
* Galaxy groups and clusters
* Gamma Ray Bursts (GRB)
* Gravitational lenses

Disks and planet formation
* Debris disks
* Disks around high-mass stars
* Disks around low-mass stars
* Exo-planets

Galaxy evolution
* Early-type galaxies
* Galaxy chemistry
* Galaxy structure & evolution
* Luminous and Ultra-Luminous Infra-Red Galaxies (LIRG & ULIRG)
* Lyman Alpha Emitters/Blobs (LAE/LAB)
* Lyman Break Galaxies (LBG)
* Merging and interacting galaxies
* Sub-mm Galaxies (SMG)
* Surveys of galaxies
 
ISM and star formation
* Astrochemistry
* Giant Molecular Clouds (GMC) properties
* High-mass star formation
* HII regions
* Inter-Stellar Medium (ISM)/Molecular clouds
* Intermediate-mass star formation
* Low-mass star formation
* Outflows, jets and ionized winds
* Photon-Dominated Regions (PDR)/X-Ray Dominated Regions (XDR)
* Pre-stellar cores, Infra-Red Dark Clouds (IRDC)

Local Universe
* Dwarf/metal-poor galaxies
* Magellanic Clouds
* Spiral galaxies

Solar system
* Solar system - Asteroids
* Solar system - Comets
* Solar system - Planetary atmospheres
* Solar system - Planetary surfaces
* Solar system - Trans-Neptunian Objects (TNOs)

Stars and stellar evolution
* Asymptotic Giant Branch (AGB) stars
* Black holes
* Brown dwarfs
* Cataclysmic stars
* Evolved stars - Chemistry
* Evolved stars - Shaping/physical structure
* Hypergiants
* Luminous Blue Variables (LBV)
* Main sequence stars
* Post-AGB stars
* Pulsars and neutron stars
* Supernovae (SN) ejecta
* Transients
* White dwarfs

Sun
* The Sun