# Using queryIRSA: building the databases

This notebook demonstrates how queryIRSA works and how to use it to get an overview of ZTF observations and to download data. The package is built around a MongoDB database called **IRSAmeta**. It contains several collections, one per data product type, each holding the corresponding IRSA metadata table:
 - 'sci' science images (calibrated) and catalogs (PSF and aperture)
 - 'cal' calibration products (calibrated)
 - 'raw' raw files
 - 'ref' reference images (calibrated)

As a first step, the user populates these collections. They can then be queried with standard MongoDB syntax to isolate the metadata describing the files of interest. Once these metadata are available, they can be used to download the actual files.

For more details on IRSA metadata and data access, see:
 - https://irsa.ipac.caltech.edu/docs/program_interface/ztf_metadata.html
 - https://irsa.ipac.caltech.edu/docs/program_interface/ztf_api.html
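
To give an idea of what "standard MongoDB syntax" means here, the following is a minimal sketch of a filter document for the 'sci' collection. It assumes that the records keep the IRSA column names (`nid`, `filtercode`, `seeing`, `field`, `ccdid`, `qid`) unchanged; the helper `science_image_query` is a hypothetical name and not part of queryIRSA.

```python
def science_image_query(nid_min, nid_max, filtercode=None, max_seeing=None):
    """Build a MongoDB filter document for science-image metadata.

    Assumption: the database stores the IRSA columns 'nid', 'filtercode'
    and 'seeing' under those same names.
    """
    # inclusive range of night IDs
    query = {"nid": {"$gte": nid_min, "$lte": nid_max}}
    if filtercode is not None:
        query["filtercode"] = filtercode
    if max_seeing is not None:
        query["seeing"] = {"$lt": max_seeing}
    return query


query = science_image_query(498, 500, filtercode="zr", max_seeing=2.5)
print(query)

# With a populated database, the filter could be passed to the collection, e.g.
#   for doc in meta.coll.find(query, {"_id": 0, "field": 1, "ccdid": 1, "qid": 1}):
#       print(doc)
```

The filter is a plain dictionary, so it can be built, inspected, and reused without a running database.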


## Inserting metadata in the DB

Initially, the database is empty. There are several methods that can be used to fill it up and update it. They are all based on requesting data for particular time ranges. There are two basic options:
 * query for time ranges (strings or astropy.time.Time)
 * query for a night ID or a range of those. The night ID is an integer counting nights from 2017-01-01 onwards.

Both methods query the IRSA website for the desired metadata, download it as a `pandas.DataFrame`, and then insert its records into the database.

In [2]:
from queryIRSA.metadata import metaDB
meta = metaDB('sci')
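# assuming meta.coll is a PyMongo collection: count() was removed in PyMongo 4,
# count_documents({}) is its replacement
meta.coll.count_documents({})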

INFO:queryIRSA.metadata:Initialized metadata for type: sci


0

In [3]:
# inserting the metadata for data taken in a specific time range
# valid inputs are strings or (scalar) astropy.time.Time objects 
meta.insert_for_time_range('2018-07-11', '2018-07-12')
meta.coll.count_documents({})

INFO:queryIRSA.metadata:querying metadata of type sci for time range (2018-07-11 00:00:00.000 2018-07-12 00:00:00.000)


16930

In [4]:
# inserting data for a particular night. We use the 'night ID' (nid)
# field. To convert from nid to human readable format we can use
# the following function
from queryIRSA.utils.time import nid_to_fits
nid = 498
print("night ID %d secretly corresponds to %s" % (nid, nid_to_fits(nid)))
meta.insert_for_nid(nid)
meta.coll.count_documents({})

night ID 498 secretly corresponds to 2018-05-14


32292
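
The conversion above can be cross-checked with the standard library alone. This is a sketch under the assumption that the night ID simply counts calendar days from 2017-01-01 (night ID 0), which agrees with the output of `nid_to_fits` shown above; `date_from_nid` and `nid_from_date` are hypothetical helpers, not queryIRSA functions.

```python
from datetime import date, timedelta

# Assumption: night ID 0 is 2017-01-01 and the counter advances by one per day.
NID_ZERO = date(2017, 1, 1)


def date_from_nid(nid):
    """Return the calendar date corresponding to a night ID."""
    return NID_ZERO + timedelta(days=nid)


def nid_from_date(day):
    """Return the night ID corresponding to a calendar date."""
    return (day - NID_ZERO).days


print(date_from_nid(498).isoformat())  # 2018-05-14, as in the cell above
print(nid_from_date(date(2018, 5, 14)))  # 498
```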

## A small note on indexing

Thanks to a patented, ultra-clever way of indexing the database, you won't get duplicates, even if you accidentally try to insert the same metadata twice. Realizing this won't be immediate, but eventually the code will find out and dutifully notify the grateful user.

In [5]:
meta.insert_for_nid(nid)
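
How queryIRSA implements this is not shown in this notebook. As an illustration only, one common way to make inserts idempotent in MongoDB is an upsert keyed on the fields that identify a file. The sketch below assumes that `coll` behaves like a PyMongo collection; the function name `insert_if_new` and the key fields in the usage comment are hypothetical.

```python
def insert_if_new(coll, records, key_fields):
    """Insert each record unless one with the same key fields already exists.

    `coll` is expected to behave like a PyMongo Collection.
    Returns the number of records that were actually new.
    """
    n_new = 0
    for rec in records:
        key = {k: rec[k] for k in key_fields}
        # $setOnInsert only writes the record when the upsert creates a new
        # document; an existing document with the same key is left untouched
        result = coll.update_one(key, {"$setOnInsert": rec}, upsert=True)
        if result.upserted_id is not None:
            n_new += 1
    return n_new


# Possible usage (the key fields are an assumption about what identifies a file):
#   n_new = insert_if_new(meta.coll, records, ["filefracday", "ccdid", "qid"])
```

Calling the function twice with the same records would report zero new documents the second time, which mirrors the behaviour described above.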



## Now for real

Let's populate the database with all the metadata available up to the present moment (which is actually in the future, since it will be evaluated inside the method call). The `build_database` method queries for chunks of metadata, each covering 7 nights (using night IDs), and runs these queries in parallel threads (8 by default). It might take a while and consume quite some memory, so better go get a coffee.

In [6]:
meta.build_database()

INFO:queryIRSA.metadata:building the IRSA metadatabase for product: sci using 130 queries
100%|██████████| 130/130 [1:00:54<00:00, 28.11s/it]
INFO:queryIRSA.metadata:finished building the database
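
The chunk-and-thread pattern described for `build_database` can be sketched with the standard library. This is not the package's implementation: `nid_chunks` and `run_in_threads` are hypothetical helpers, and the worker here is a placeholder where a real one would query IRSA and insert the records.

```python
from concurrent.futures import ThreadPoolExecutor


def nid_chunks(first_nid, last_nid, chunk_size=7):
    """Split [first_nid, last_nid] into inclusive (start, stop) ranges."""
    return [
        (start, min(start + chunk_size - 1, last_nid))
        for start in range(first_nid, last_nid + 1, chunk_size)
    ]


def run_in_threads(chunks, worker, n_threads=8):
    """Call worker(start, stop) for every chunk using a pool of threads."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map returns results in the order of the input chunks
        return list(pool.map(lambda chunk: worker(*chunk), chunks))


chunks = nid_chunks(1, 20)
print(chunks)  # [(1, 7), (8, 14), (15, 20)]

# Placeholder worker: just reports how many nights each chunk covers.
sizes = run_in_threads(chunks, lambda start, stop: stop - start + 1)
print(sizes)  # [7, 7, 6]
```

Threads suit this kind of job because each worker spends most of its time waiting on the network rather than computing.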
