# How fast can you access data from NASA's TESS Data Archive?

*Prepared by Geert Barentsen on Feb 3, 2021.*

## Purpose of this notebook

This notebook compares the speed of downloading TESS Full Frame Images (FFIs) in three different ways:
* from MAST via HTTP;
* from AWS S3 via HTTP;
* from AWS S3 via the `boto3` client library.

## Step 1: Select 30 random TESS FFI images (~1 GB)

Below we use TAP to obtain the URIs of randomly selected TESS FFIs from camera 1, CCD 1 of a random sector. We use TAP rather than the `Observations.get_product_list()` mechanism provided by `astroquery.mast` because the latter can take more than 10 minutes to list the FFI images for a sector.

In [1]:
# Install dependencies
!pip install -q --upgrade pip
!pip install -q --upgrade numpy astroquery httpx boto3

In [2]:
from astroquery.utils.tap.core import TapPlus
from astroquery.mast import utils
import httpx
import numpy as np

In [3]:
# We use TAP because accessing product URIs via Observations.get_product_list() can take >10 minutes
# Pick a random sector between 1 and 30 (TESS sector numbering starts at 1)
sector = np.random.randint(1, 31)
mast_tap = TapPlus(url="https://vao.stsci.edu/caomtap/tapservice.aspx")
adql = f"""SELECT * FROM obscore
           WHERE obs_collection='TESS' AND dataproduct_type = 'image'
           AND obs_id LIKE 'tess%-s{sector:04d}-1-1-%'"""
job = mast_tap.launch_job_async(adql)
images = job.get_results()

Created TAP+ (v1.2.1) - Connection:
	Host: vao.stsci.edu
	Use HTTPS: True
	Port: 443
	SSL Port: 443
INFO: Query finished. [astroquery.utils.tap.core]


In [4]:
n_images = 30
# Sample without replacement so that no image is downloaded twice
uri_list = tuple(images[np.random.choice(len(images), n_images, replace=False)]["access_url"])
print(f"We randomly selected {n_images} images from TESS Sector {sector}:\n" + "\n".join(uri_list))

We randomly selected 30 images from TESS Sector 9:
https://mast.stsci.edu/portal/Download/file?uri=mast:TESS/product/tess2019081055934-s0009-1-1-0139-s_ffic.fits
https://mast.stsci.edu/portal/Download/file?uri=mast:TESS/product/tess2019068135934-s0009-1-1-0139-s_ffic.fits
https://mast.stsci.edu/portal/Download/file?uri=mast:TESS/product/tess2019065065934-s0009-1-1-0139-s_ffic.fits
https://mast.stsci.edu/portal/Download/file?uri=mast:TESS/product/tess2019083125934-s0009-1-1-0139-s_ffic.fits
https://mast.stsci.edu/portal/Download/file?uri=mast:TESS/product/tess2019068115934-s0009-1-1-0139-s_ffic.fits
https://mast.stsci.edu/portal/Download/file?uri=mast:TESS/product/tess2019074172934-s0009-1-1-0139-s_ffic.fits
https://mast.stsci.edu/portal/Download/file?uri=mast:TESS/product/tess2019064035934-s0009-1-1-0139-s_ffic.fits
https://mast.stsci.edu/portal/Download/file?uri=mast:TESS/product/tess2019060142935-s0009-1-1-0139-s_ffic.fits
https://mast.stsci.edu/portal/Download/file?uri=mast:TESS/pro

## Experiment A: Download from MAST via HTTP

In [5]:
%%time
r = [httpx.get(uri, timeout=None) for uri in uri_list]

CPU times: user 53.1 s, sys: 9.74 s, total: 1min 2s
Wall time: 14min 5s
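
The `%%time` magic reports one total for the whole cell, which hides how much individual downloads vary. A minimal sketch of a per-file timer is given below; `time_downloads` and its `fetch` argument are hypothetical names introduced here for illustration and are not part of the original notebook.

```python
import time
from typing import Callable, Iterable, List, Tuple


def time_downloads(
    fetch: Callable[[str], bytes], uris: Iterable[str]
) -> List[Tuple[str, int, float]]:
    """Call fetch(uri) for each URI and record (uri, n_bytes, seconds)."""
    results = []
    for uri in uris:
        start = time.perf_counter()
        payload = fetch(uri)  # fetch must return the downloaded bytes
        elapsed = time.perf_counter() - start
        results.append((uri, len(payload), elapsed))
    return results
```

With the names used in this notebook, a call could look like `time_downloads(lambda u: httpx.get(u, timeout=None).content, uri_list)`.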


## Experiment B: Download from AWS S3 via HTTP

In [6]:
aws_bucket = "stpubdata"
relative_path = [utils.mast_relative_path(uri.split("uri=")[1]) for uri in uri_list]
aws_uri_list = [f"http://s3.amazonaws.com/{aws_bucket}{rp}" for rp in relative_path]
aws_uri_list

['http://s3.amazonaws.com/stpubdata/tess/public/ffi/s0009/2019/081/1-1/tess2019081055934-s0009-1-1-0139-s_ffic.fits',
 'http://s3.amazonaws.com/stpubdata/tess/public/ffi/s0009/2019/068/1-1/tess2019068135934-s0009-1-1-0139-s_ffic.fits',
 'http://s3.amazonaws.com/stpubdata/tess/public/ffi/s0009/2019/065/1-1/tess2019065065934-s0009-1-1-0139-s_ffic.fits',
 'http://s3.amazonaws.com/stpubdata/tess/public/ffi/s0009/2019/083/1-1/tess2019083125934-s0009-1-1-0139-s_ffic.fits',
 'http://s3.amazonaws.com/stpubdata/tess/public/ffi/s0009/2019/068/1-1/tess2019068115934-s0009-1-1-0139-s_ffic.fits',
 'http://s3.amazonaws.com/stpubdata/tess/public/ffi/s0009/2019/074/1-1/tess2019074172934-s0009-1-1-0139-s_ffic.fits',
 'http://s3.amazonaws.com/stpubdata/tess/public/ffi/s0009/2019/064/1-1/tess2019064035934-s0009-1-1-0139-s_ffic.fits',
 'http://s3.amazonaws.com/stpubdata/tess/public/ffi/s0009/2019/060/1-1/tess2019060142935-s0009-1-1-0139-s_ffic.fits',
 'http://s3.amazonaws.com/stpubdata/tess/public/ffi/s000
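
The paths printed above suggest that the S3 key of a calibrated FFI can be derived from its file name alone. The sketch below encodes that pattern; it is inferred only from the output shown here, not from any official specification, and `ffi_s3_key` is a hypothetical helper rather than an `astroquery` function.

```python
import re

# File name pattern inferred from the listing above, e.g.
# tess2019081055934-s0009-1-1-0139-s_ffic.fits
_FFI_NAME = re.compile(
    r"^tess(?P<year>\d{4})(?P<day>\d{3})\d{6}"
    r"-s(?P<sector>\d{4})-(?P<camera>\d)-(?P<ccd>\d)-\d{4}-s_ffic\.fits$"
)


def ffi_s3_key(filename: str) -> str:
    """Build the assumed S3 key for a calibrated FFI file name."""
    m = _FFI_NAME.match(filename)
    if m is None:
        raise ValueError(f"Not a recognised calibrated FFI file name: {filename}")
    return (
        f"tess/public/ffi/s{m['sector']}/{m['year']}/{m['day']}/"
        f"{m['camera']}-{m['ccd']}/{filename}"
    )


print(ffi_s3_key("tess2019081055934-s0009-1-1-0139-s_ffic.fits"))
```

For the first file in the list this reproduces the key shown above (without the leading `/`). Unlike `utils.mast_relative_path`, this approach breaks silently if the archive layout changes, so treat it as a convenience rather than a replacement.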

In [7]:
%%time
r = [httpx.get(uri, timeout=None) for uri in aws_uri_list]

CPU times: user 33 s, sys: 6.51 s, total: 39.6 s
Wall time: 11min 31s


## Experiment C: Download from AWS S3 via `boto3`

In [8]:
import boto3
from botocore import UNSIGNED
from botocore.config import Config

In [9]:
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

In [10]:
%%time
for key in relative_path:
    # Strip the leading "/" to obtain the S3 key; each download overwrites the same temporary file
    s3.download_file(aws_bucket, key[1:], "/tmp/tmp-aws.fits")

CPU times: user 14 s, sys: 9.82 s, total: 23.9 s
Wall time: 5min 49s

In [11]:
!ls -lh /tmp/tmp-aws.fits

-rw-r--r-- 1 gb wheel 34M Feb  3 18:32 /tmp/tmp-aws.fits
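
The three wall times can be converted into approximate throughput. The sketch below assumes that every one of the 30 files is about 34 MB, as reported by `ls` for the last download; that uniform size is an assumption, not something measured for each file.

```python
N_FILES = 30
FILE_SIZE_MB = 34  # assumption: every FFI is about the size shown by `ls -lh`

# Wall times reported by %%time in Experiments A, B and C, in seconds
wall_times_s = {
    "MAST via HTTP": 14 * 60 + 5,
    "S3 via HTTP": 11 * 60 + 31,
    "S3 via boto3": 5 * 60 + 49,
}

total_mb = N_FILES * FILE_SIZE_MB
throughput = {name: total_mb / seconds for name, seconds in wall_times_s.items()}

for name, rate in throughput.items():
    print(f"{name}: {rate:.1f} MB/s")
```

Under that assumption the rates come out at roughly 1.2, 1.5 and 2.9 MB/s. Each experiment was run only once, and Experiments A and B kept the responses in memory while Experiment C wrote to disk, so these figures are indicative rather than a controlled benchmark.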
