# ALeRCE LSST, database tables

```Author: Alejandra Muñoz Arancibia. Last updated: 260224```

This notebook introduces the Rubin alert stream data that are stored in the ALeRCE database, showing all tables that are relevant for LSST. We use the pyvo package to execute Astronomical Data Query Language (ADQL) queries to the ALeRCE Table Access Protocol (TAP) service, and display output tables as pandas dataframes.

*It is highly recommended that you try this notebook in Google Colab using the following [link](https://colab.research.google.com/github/alercebroker/usecases/blob/master/notebooks/LSST/ALeRCE_LSST_Tables.ipynb).* This saves you from having to sort out library installation problems and lets you focus on the contents of the tutorial. You can try installing the dependencies later on your own system.

### Some settings

Load libraries

In [None]:
#!pip install pyvo

In [1]:
import pandas as pd
import pyvo as vo

In [2]:
pd.set_option('display.max_colwidth', 1000)

### Connect to ALeRCE data via TAP

We connect to the [ALeRCE TAP service](https://tap.alerce.online/) as

In [3]:
alerce_tap = vo.dal.TAPService('https://tap.alerce.online/tap')

## The ALeRCE multisurvey database

We save Rubin alert stream data in a multisurvey database. We distinguish among data sources by using the field ```tid```, for "telescope identifier". LSST (and more generally, all Rubin data observed with the Simonyi telescope) has ```tid = 1```.

In [4]:
tid = 1

We store both data from the alerts (e.g. detections, forced photometry points) and quantities computed by the ALeRCE pipeline (e.g. mean coordinates, first and last detection dates, probabilities). We distinguish between three types of tables:

1. Tables that are common to all surveys:

In [5]:
common_tables = [
    'object',
    'detection',
    'forced_photometry',
    'probability',
]

2. Tables that are survey-specific. For LSST, these are:

In [6]:
lsst_tables = [
    'lsst_detection',
    'lsst_ss_detection', # ssSource in alert schema
    'lsst_forced_photometry',
    'lsst_dia_object', # diaObject in alert schema
    'lsst_mpc_orbits', # mpc_orbits in alert schema
]

3. Lookup tables (LUTs) that store mappings between integer identifiers and original names:

In [7]:
lookup_tables = [
    'sid_lut',
    'band',
    'catalog_id_lut',
    'classifier',
    'taxonomy',
]

The following TAP query provides a description for each of these tables:

In [8]:
data_tables = common_tables + lsst_tables + lookup_tables
query = '''
SELECT
    table_name, description
FROM
    tap_schema.tables
WHERE
    table_name in (%s)
''' % ','.join(["'alerce_tap.%s'" % table_name for table_name in data_tables])

result = alerce_tap.search(query).to_table().to_pandas()
# regex=False so that the dot in the prefix is matched literally
result['index'] = result['table_name'].str.replace('alerce_tap.', '', regex=False)
result = result.set_index('index').loc[
   data_tables].reset_index().drop(columns='index').copy()
result['description'] = result['description'].str.replace('\n', ' ')
display(result)

Unnamed: 0,table_name,description
0,alerce_tap.object,"This table contains aggregated information about astronomical objects observed in multiple surveys. Each record corresponds to a unique object identified by its position and survey. The table includes astrometric properties (mean RA/Dec), detection statistics (number of detections, forced photometry measurements, non-detections), and temporal coverage (first and last observation dates)."
1,alerce_tap.detection,"Generic detection table containing basic information for all detections across multiple surveys, including position, observation time, and photometric band. For detailed survey-specific detection properties, see the corresponding survey-specific detection tables."
2,alerce_tap.forced_photometry,"Generic forced photometry measurements across multiple surveys, containing basic position and timing information. For detailed survey-specific forced photometry measurements, see the corresponding survey-specific tables."
3,alerce_tap.probability,"Classification probabilities for each object assigned by machine learning classifiers, including probability values, rankings among all classes, and the timestamp of the classification."
4,alerce_tap.lsst_detection,"Individual LSST Difference Image Analysis sources, containing detailed measurements from difference imaging including PSF photometry, aperture photometry, trail and dipole fitting for moving objects, shape moments, forced photometry on science and template images, and extensive quality flags for pixels and measurements."
5,alerce_tap.lsst_ss_detection,"Individual LSST detections associated with solar system objects, including observing geometry (phase angle, elongation, ranges), ephemeris predictions, ephemeris offsets (O-C residuals), and heliocentric/topocentric state vectors in ICRS coordinates."
6,alerce_tap.lsst_forced_photometry,"LSST forced photometry measurements extracted at DIA object positions on individual visit images, including PSF flux and forced science flux with uncertainties, and temporal processing metadata."
7,alerce_tap.lsst_dia_object,"Summary catalog of transient and variable objects detected in LSST difference imaging. Contains aggregated photometric measurements across all bands (u, g, r, i, z, y), including PSF flux statistics, forced photometry fluxes, and temporal information for each detected DIA object."
8,alerce_tap.lsst_mpc_orbits,"Complete orbital solutions from the Minor Planet Center for solar system objects, including Keplerian orbital elements, non-gravitational parameters (Yarkovsky, SRP, A1-A3), observational arc statistics, fit quality metrics, and Earth MOID calculations."
9,alerce_tap.sid_lut,Lookup table mapping survey identifiers to survey names and associated telescope identifiers.


## Rubin object identifiers

In LSST alerts, astrophysical objects have different identifiers depending on what they are associated with (see the Data Products Definition Document, [DPDD](https://ls.st/lse-163), for details). Non-moving objects (e.g. supernovae, active galactic nuclei, variable stars) are associated with diaObjects, while known Solar System moving objects (e.g. asteroids) are associated with ssObjects. Identifiers for diaObjects and ssObjects are generated independently of each other by the LSST pipelines. We distinguish between them using the field ```sid```, for "survey identifier", so that

```sid = 1``` for diaObjects, and

```sid = 2``` for ssObjects. These are listed in the ```sid_lut``` table:

In [9]:
table_name = 'sid_lut'

In [10]:
query = '''
SELECT
    sid, tid, survey_name
FROM
    alerce_tap.%s
WHERE
    tid = %d
''' % (table_name, tid)

result = alerce_tap.search(query).to_table().to_pandas()
display(result)

Unnamed: 0,sid,tid,survey_name
0,1,1,LSST DIA Object
1,2,1,LSST SS Object


It is expected that still-unknown moving objects will initially be associated with diaObjects. Note that at times a known moving object may be identified at a small angular distance from a non-moving object, so that both identifiers are present (see below).

## Differences with respect to the Rubin alert schema

LSST data are stored in the ALeRCE database following the [Alert Production Database (APDB) schema](https://sdm-schemas.lsst.io/apdb.html) for most of the fields, with some exceptions detailed below:

### Object identifiers

We use the field ```oid``` as the identifier for each object. This ```oid``` is defined in different ways depending on the availability of diaObject information in the alert, so that

```oid = diaObjectId``` and ```sid = 1``` for objects that have diaObject information, and

```oid = ssObjectId``` and ```sid = 2``` if they only have ssObject information. In case an object has both identifiers (e.g. a known Solar System object was identified at a small angular distance from a non-moving object), we adopt ```oid = diaObjectId``` and ```sid = 1``` while still saving both their diaObject and ssSource data in the respective tables.
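
The rule above can be sketched as a small function. This is only an illustration of the logic described here, not the ALeRCE pipeline code: the name `resolve_oid` and the use of `None` for an identifier that is missing from the alert are assumptions.

```python
SID_DIA_OBJECT = 1  # sid for diaObjects
SID_SS_OBJECT = 2   # sid for ssObjects


def resolve_oid(dia_object_id, ss_object_id):
    """Return the (oid, sid) pair for an alert, following the rule above.

    Hypothetical helper: None stands for an identifier missing in the alert.
    """
    if dia_object_id is not None:
        # diaObject information takes precedence, even when an ssObject
        # identifier is also present.
        return dia_object_id, SID_DIA_OBJECT
    if ss_object_id is not None:
        return ss_object_id, SID_SS_OBJECT
    raise ValueError('alert has neither diaObjectId nor ssObjectId')


# Placeholder identifiers, not real objects.
print(resolve_oid(101, None))  # (101, 1)
print(resolve_oid(None, 202))  # (202, 2)
print(resolve_oid(101, 202))   # (101, 1)
```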

It is very unlikely that an object's diaObjectId is identical to another object's ssObjectId, but we still suggest always querying for the pair ```(oid, sid)``` instead of using ```oid``` alone.
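
As a minimal sketch of this suggestion, the helper below builds an ADQL query that filters the ```object``` table on both fields. The function name `object_query` and the identifier value are placeholders chosen for illustration; the selected columns are the ones listed for the ```object``` table later in this notebook.

```python
def object_query(oid, sid):
    """Build an ADQL query that selects one object by its (oid, sid) pair."""
    return '''
SELECT
    oid, sid, meanra, meandec, firstmjd, lastmjd
FROM
    alerce_tap.object
WHERE
    oid = %d
    AND sid = %d
''' % (oid, sid)


# Placeholder identifier, not a real object.
query = object_query(oid=123456789, sid=1)
print(query)

# With the TAP connection defined earlier in this notebook:
# result = alerce_tap.search(query).to_table().to_pandas()
```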

The field ```oid``` appears in the following tables: ```object```, ```detection```, ```lsst_detection```, ```forced_photometry```, ```lsst_forced_photometry```, ```lsst_dia_object```, and ```probability```.

### Photometric band identifiers

Instead of saving band names as they come in alerts, we use integer identifiers and save them in the field ```band```. The mapping between band name and identifier for each ```sid``` is in the ```band``` table:

In [11]:
table_name = 'band'

In [12]:
query = '''
SELECT
    sid, tid, band, band_name, band_order
FROM
    alerce_tap.%s
WHERE
    tid = %d
ORDER BY
    sid ASC, band_order ASC
''' % (table_name, tid)

result = alerce_tap.search(query).to_table().to_pandas()
display(result)

Unnamed: 0,sid,tid,band,band_name,band_order
0,1,1,6,u,0
1,1,1,1,g,1
2,1,1,2,r,2
3,1,1,3,i,3
4,1,1,4,z,4
5,1,1,5,y,5
6,2,1,6,u,0
7,2,1,1,g,1
8,2,1,2,r,2
9,2,1,3,i,3


The field ```band``` appears in the following tables: ```band```, ```detection```, ```lsst_detection```, ```forced_photometry```, and ```lsst_forced_photometry```.
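
To read band names back from the integer identifiers, a plain dictionary is enough. The sketch below copies the ```sid = 1``` rows from the output above; in real use it is safer to build the mapping from a fresh query to the ```band``` table. The names `LSST_BAND_NAMES` and `band_names` are illustrative only.

```python
# Band identifiers for sid = 1, copied from the band table output above.
LSST_BAND_NAMES = {6: 'u', 1: 'g', 2: 'r', 3: 'i', 4: 'z', 5: 'y'}


def band_names(band_ids, mapping=LSST_BAND_NAMES):
    """Translate integer band identifiers into band names."""
    return [mapping[band_id] for band_id in band_ids]


print(band_names([1, 2, 2, 6]))  # ['g', 'r', 'r', 'u']
```

For a pandas dataframe with a ```band``` column, the same dictionary can be passed to `Series.map`, e.g. `result['band'].map(LSST_BAND_NAMES)`.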

### Measurement identifiers, split tables

We call the identifier for each epoch in every survey ```measurement_id```. For LSST, this means that both the field ```diaSourceId``` for detections and the field ```diaForcedSourceId``` for forced photometry points are renamed ```measurement_id``` in our database.

Note that we split the diaSource alert field into two tables: ```detection``` (common to all surveys) and ```lsst_detection```. The full table can be recovered by joining these two on ```(oid, sid, measurement_id)```. Similarly, we split the prvDiaForcedSources alert field into the tables ```forced_photometry``` (common to all surveys) and ```lsst_forced_photometry```.
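
The join on ```(oid, sid, measurement_id)``` could look like the sketch below. It assumes that the TAP service accepts standard ADQL `JOIN ... ON` syntax with table aliases, which is not confirmed here. The function name and the identifier are placeholders, and only columns shown in the table listings of this notebook are selected.

```python
def full_detection_query(oid, sid):
    """Build an ADQL query joining detection and lsst_detection for one object."""
    return '''
SELECT
    d.oid, d.sid, d.measurement_id, d.mjd, d.ra, d.dec, d.band,
    ld.visit, ld.detector
FROM
    alerce_tap.detection AS d
JOIN
    alerce_tap.lsst_detection AS ld
    ON d.oid = ld.oid
    AND d.sid = ld.sid
    AND d.measurement_id = ld.measurement_id
WHERE
    d.oid = %d
    AND d.sid = %d
ORDER BY
    d.mjd ASC
''' % (oid, sid)


# Placeholder identifier, not a real object.
query = full_detection_query(oid=123456789, sid=1)
print(query)

# With the TAP connection defined earlier in this notebook:
# result = alerce_tap.search(query).to_table().to_pandas()
```

The same pattern applies to ```forced_photometry``` and ```lsst_forced_photometry```.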

The field ```measurement_id``` appears in the following tables: ```detection```, ```lsst_detection```, ```lsst_ss_detection```, ```forced_photometry```, and ```lsst_forced_photometry```.

### Tables that store latest record only

For each object, we only keep the latest record ingested by our pipeline for alert fields diaObject and mpc_orbits. These are saved in tables ```lsst_dia_object``` and ```lsst_mpc_orbits``` respectively.

## Important remarks

### About data types

Note that identifiers for objects and epochs (i.e. fields ```oid``` and ```measurement_id```) are stored as long integers, so caution is advised when manipulating these columns. In particular, some operations may implicitly convert long integer data to float if NaN values are also present (e.g., when converting a dictionary to a pandas dataframe column), with the unintended consequence of modifying the last digits of the original values. For the pandas package, we suggest using the pyarrow backend when reading input files (by adding ```dtype_backend='pyarrow'``` as a parameter), and converting pandas columns from ```int64``` type to ```Int64``` (or ```int64[pyarrow]```) when needed.
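
The effect can be reproduced with a toy series. In this sketch the identifier is a made-up value above 2**53, chosen so that a 64-bit float cannot represent it exactly; reindexing is used here only as one convenient way to introduce a missing value.

```python
import pandas as pd

# Hypothetical identifier, chosen above 2**53 so that float64 cannot hold it exactly.
oid = 2**58 + 1

ids = pd.Series([oid], dtype='int64')

# Reindexing with a missing label introduces NaN and silently casts to float64.
as_float = ids.reindex([0, 1])

# The nullable Int64 dtype keeps integers and marks the gap as <NA>.
as_nullable = ids.astype('Int64').reindex([0, 1])

print(as_float.dtype, int(as_float[0]) == oid)        # float64 False
print(as_nullable.dtype, int(as_nullable[0]) == oid)  # Int64 True
```

Converting to ```Int64``` before any operation that can introduce missing values keeps the identifiers intact.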

### About column names in ADQL queries

Note that while we store LSST schema names with the same capitalization as the APDB (e.g. we save column ```ssObjectId```), ADQL is case-insensitive by default, so ADQL queries to the ALeRCE TAP service return all fields in lower case (e.g. we retrieve the column as ```ssobjectid```). Caution is advised when working with tables obtained by mixed means (e.g. using ADQL outputs together with RSP catalogs, or with the ALeRCE python client outputs). We suggest converting all relevant column names to lower case, or mapping ADQL output columns to their original (mixed case) names instead.
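
One way to map ADQL output columns back to mixed case is to keep a short list of the original names that matter for the analysis. The helper below is a sketch: `restore_case` and `APDB_NAMES` are hypothetical names, and the list only contains a few fields mentioned in this notebook.

```python
# Original (mixed-case) names needed for the analysis; extend as required.
APDB_NAMES = ['diaObjectId', 'ssObjectId', 'psfFlux', 'scienceFlux']


def restore_case(columns, original_names=APDB_NAMES):
    """Map lower-case ADQL column names back to their mixed-case originals.

    Columns without a known original name are left unchanged.
    """
    lookup = {name.lower(): name for name in original_names}
    return {column: lookup.get(column, column) for column in columns}


mapping = restore_case(['oid', 'diaobjectid', 'ssobjectid'])
print(mapping)
# {'oid': 'oid', 'diaobjectid': 'diaObjectId', 'ssobjectid': 'ssObjectId'}
```

The resulting dictionary can be passed to `DataFrame.rename(columns=mapping)`. For the opposite direction, `df.columns.str.lower()` gives lower-case names for a table obtained by other means.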

## ALeRCE data tables, their columns and indexes

Here we use TAP queries to get table columns and indexes as

In [13]:
def show_table_info(table_name):
    query = '''
    SELECT
        table_name, column_name, description,
        unit, datatype, indexed
    FROM
        tap_schema.columns
    WHERE
        table_name = 'alerce_tap.%s'
    ''' % table_name
    
    result = alerce_tap.search(query).to_table().to_pandas()

    print(table_name)
    pd.set_option('display.max_rows', None)
    display(result)
    pd.set_option('display.max_rows', 30)

The main table and starting point in the ALeRCE database is the ```object``` table. We query this table to find objects by their ```oid```'s in a given survey, or to perform cone searches when such ```oid```'s are unknown. This table also allows filtering by properties like number of detections, as well as by MJD dates of first and last detections in the alert stream. Note that the ```object``` table is updated to the latest alert ingested by our pipeline, so there is always one row per ```oid```.
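
A cone search on the ```object``` table might be written as in the sketch below. It assumes that the TAP service supports the standard ADQL geometry functions `CONTAINS`, `POINT` and `CIRCLE`, which is not confirmed here. The function name, the coordinates and the MJD value are placeholders.

```python
def cone_search_query(ra, dec, radius_arcsec, sid=1, min_lastmjd=None):
    """Build an ADQL cone search on the object table.

    ra and dec are in degrees; the radius is given in arcseconds and
    converted to degrees, the unit used by the ADQL CIRCLE function.
    """
    radius_deg = radius_arcsec / 3600.0
    query = '''
SELECT
    oid, sid, meanra, meandec, firstmjd, lastmjd
FROM
    alerce_tap.object
WHERE
    sid = %d
    AND CONTAINS(
        POINT('ICRS', meanra, meandec),
        CIRCLE('ICRS', %.8f, %.8f, %.8f)
    ) = 1
''' % (sid, ra, dec, radius_deg)
    if min_lastmjd is not None:
        # Optional filter on the date of the last detection.
        query += '    AND lastmjd >= %.5f\n' % min_lastmjd
    return query


# Placeholder position and date, for illustration only.
query = cone_search_query(150.0, 2.2, 3.6, min_lastmjd=61000.0)
print(query)

# With the TAP connection defined earlier in this notebook:
# result = alerce_tap.search(query).to_table().to_pandas()
```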

In [14]:
table_name = 'object'
show_table_info(table_name)

object


Unnamed: 0,table_name,column_name,description,unit,datatype,indexed
0,alerce_tap.object,oid,Internal object identifier,,long,1
1,alerce_tap.object,tid,Telescope identifier,,short,0
2,alerce_tap.object,sid,Survey identifier,,short,1
3,alerce_tap.object,meanra,Mean right ascension (ICRS),deg,double,1
4,alerce_tap.object,meandec,Mean declination (ICRS),deg,double,1
5,alerce_tap.object,sigmara,Standard deviation of right ascension,deg,double,0
6,alerce_tap.object,sigmadec,Standard deviation of declination,deg,double,0
7,alerce_tap.object,firstmjd,First observation MJD,d,double,1
8,alerce_tap.object,lastmjd,Last observation MJD,d,double,1
9,alerce_tap.object,deltamjd,Time span between first and last observation,d,double,0


We can use known object identifiers to find epoch data in the following tables:

In [15]:
data_tables = [
    'detection',
    'lsst_detection',
    'lsst_ss_detection',
    'forced_photometry',
    'lsst_forced_photometry',
]

In [16]:
for table_name in data_tables:
    show_table_info(table_name)

detection


Unnamed: 0,table_name,column_name,description,unit,datatype,indexed
0,alerce_tap.detection,oid,Object identifier,,long,1
1,alerce_tap.detection,sid,Survey identifier,,short,1
2,alerce_tap.detection,measurement_id,Measurement identifier,,long,1
3,alerce_tap.detection,mjd,Modified Julian Date,d,double,0
4,alerce_tap.detection,ra,Right ascension,deg,double,1
5,alerce_tap.detection,dec,Declination,deg,double,1
6,alerce_tap.detection,band,Photometric band,,short,0
7,alerce_tap.detection,created_date,Date record was created,,char,0


lsst_detection


Unnamed: 0,table_name,column_name,description,unit,datatype,indexed
0,alerce_tap.lsst_detection,oid,Object identifier,,long,1
1,alerce_tap.lsst_detection,sid,Survey identifier,,short,0
2,alerce_tap.lsst_detection,measurement_id,Unique identifier of this DiaSource,,long,1
3,alerce_tap.lsst_detection,parentdiasourceid,"Id of the parent diaSource this diaSource has been deblended from, if any",,long,0
4,alerce_tap.lsst_detection,visit,Visit number,,long,0
5,alerce_tap.lsst_detection,detector,Detector number,,int,0
6,alerce_tap.lsst_detection,diaobjectid,"Id of the diaObject this source was associated with, if any. If not, it is set to NULL (each diaSource will be associated with either a diaObject or ssObject)",,long,0
7,alerce_tap.lsst_detection,ssobjectid,"Id of the ssObject this source was associated with, if any. If not, it is set to NULL (each diaSource will be associated with either a diaObject or ssObject)",,long,0
8,alerce_tap.lsst_detection,raerr,Uncertainty of ra,deg,float,0
9,alerce_tap.lsst_detection,decerr,Uncertainty of dec,deg,float,0


lsst_ss_detection


Unnamed: 0,table_name,column_name,description,unit,datatype,indexed
0,alerce_tap.lsst_ss_detection,measurement_id,Unique identifier of this DIA source observation,,long,1
1,alerce_tap.lsst_ss_detection,ssobjectid,LSST solar system object identifier,,long,1
2,alerce_tap.lsst_ss_detection,designation,Unpacked primary provisional designation for this solar system object,,char,0
3,alerce_tap.lsst_ss_detection,ecllambda,Ecliptic longitude,deg,double,0
4,alerce_tap.lsst_ss_detection,eclbeta,Ecliptic latitude,deg,double,0
5,alerce_tap.lsst_ss_detection,gallon,Galactic longitude,deg,double,0
6,alerce_tap.lsst_ss_detection,gallat,Galactic latitude,deg,double,0
7,alerce_tap.lsst_ss_detection,elongation,"Solar elongation, the angular separation of the object from the Sun on the celestial sphere",deg,float,0
8,alerce_tap.lsst_ss_detection,phaseangle,"Phase angle, the angle between the Sun and observer as seen from the object",deg,float,0
9,alerce_tap.lsst_ss_detection,toporange,"Topocentric range, the distance from the observer to the object",AU,float,0


forced_photometry


Unnamed: 0,table_name,column_name,description,unit,datatype,indexed
0,alerce_tap.forced_photometry,oid,Object identifier,,long,1
1,alerce_tap.forced_photometry,sid,Survey identifier,,short,1
2,alerce_tap.forced_photometry,measurement_id,Measurement identifier,,long,1
3,alerce_tap.forced_photometry,mjd,Modified Julian Date,d,double,0
4,alerce_tap.forced_photometry,ra,Right ascension,deg,double,1
5,alerce_tap.forced_photometry,dec,Declination,deg,double,1
6,alerce_tap.forced_photometry,band,Photometric band,,short,0
7,alerce_tap.forced_photometry,created_date,Date record was created,,char,0


lsst_forced_photometry


Unnamed: 0,table_name,column_name,description,unit,datatype,indexed
0,alerce_tap.lsst_forced_photometry,oid,Object identifier,,long,1
1,alerce_tap.lsst_forced_photometry,sid,Survey identifier,,short,0
2,alerce_tap.lsst_forced_photometry,measurement_id,Unique id,,long,1
3,alerce_tap.lsst_forced_photometry,visit,Id of the visit where this forcedSource was measured,,long,0
4,alerce_tap.lsst_forced_photometry,detector,Id of the detector where this forcedSource was measured,,int,0
5,alerce_tap.lsst_forced_photometry,psfflux,Point Source model flux,,float,0
6,alerce_tap.lsst_forced_photometry,psffluxerr,Uncertainty of psfFlux,,float,0
7,alerce_tap.lsst_forced_photometry,scienceflux,Forced photometry flux for a point source model measured on the visit image centered at the DiaObject position,,float,0
8,alerce_tap.lsst_forced_photometry,sciencefluxerr,Uncertainty of scienceFlux,,float,0
9,alerce_tap.lsst_forced_photometry,timeprocessedmjdtai,"Time when this record was generated, expressed as Modified Julian Date, International Atomic Time",d,double,0


Latest record data for both diaObjects and MPC designations can be found in the following tables:

In [17]:
data_tables = [
    'lsst_dia_object',
    'lsst_mpc_orbits',
]

In [18]:
for table_name in data_tables:
    show_table_info(table_name)

lsst_dia_object


Unnamed: 0,table_name,column_name,description,unit,datatype,indexed
0,alerce_tap.lsst_dia_object,oid,Unique identifier of this DiaObject,,long,1
1,alerce_tap.lsst_dia_object,validitystartmjdtai,"Processing time when validity of this diaObject starts, expressed as Modified Julian Date, International Atomic Time",d,double,0
2,alerce_tap.lsst_dia_object,ra,Right ascension coordinate of the position of the object,deg,double,1
3,alerce_tap.lsst_dia_object,raerr,Uncertainty of ra,deg,float,0
4,alerce_tap.lsst_dia_object,dec,Declination coordinate of the position of the object,deg,double,1
5,alerce_tap.lsst_dia_object,decerr,Uncertainty of dec,deg,float,0
6,alerce_tap.lsst_dia_object,ra_dec_cov,Covariance between ra and dec,deg**2,float,0
7,alerce_tap.lsst_dia_object,u_psffluxmean,Weighted mean point-source model magnitude for u filter,,float,0
8,alerce_tap.lsst_dia_object,u_psffluxmeanerr,Standard error of u_psfFluxMean,,float,0
9,alerce_tap.lsst_dia_object,u_psffluxsigma,Standard deviation of the distribution of u_psfFlux,,float,0


lsst_mpc_orbits


Unnamed: 0,table_name,column_name,description,unit,datatype,indexed
0,alerce_tap.lsst_mpc_orbits,ssobjectid,Solar system object identifier,,long,1
1,alerce_tap.lsst_mpc_orbits,designation,The unpacked primary provisional designation for this object,,char,0
2,alerce_tap.lsst_mpc_orbits,packed_primary_provisional_designation,The primary provisional designation in packed form (e.g. K08A00B),,char,0
3,alerce_tap.lsst_mpc_orbits,unpacked_primary_provisional_designation,The primary provisional designation in unpacked form (e.g. 2008 AB),,char,0
4,alerce_tap.lsst_mpc_orbits,mpc_orb_jsonb,Details of the orbit solution in JSON form,,char,0
5,alerce_tap.lsst_mpc_orbits,created_at,When this row was created,,char,0
6,alerce_tap.lsst_mpc_orbits,updated_at,When this row was updated,,char,0
7,alerce_tap.lsst_mpc_orbits,orbit_type_int,Orbit type code,,int,0
8,alerce_tap.lsst_mpc_orbits,u_param,U parameter,,int,0
9,alerce_tap.lsst_mpc_orbits,nopp,Number of oppositions,,int,0


The ALeRCE pipeline computes probabilities for different classifiers. At this time, we classify LSST objects using the ALeRCE stamp classifier, which is a deep learning classifier based on each object's first alert image cutouts (manuscript in preparation; see [Carrasco-Davis et al. 2021](https://ui.adsabs.harvard.edu/abs/2021AJ....162..231C/abstract) for a similar version using the ZTF public alert stream). This classifier taxonomy includes five classes, and is used to pre-select transient candidates (e.g. supernovae, tidal disruption events) early after their first light.

In [19]:
lookup_tables = [
    'classifier',
    'taxonomy',
]

In [20]:
table_name = 'classifier'

query = '''
SELECT
    classifier_id, classifier_name, classifier_version, tid
FROM
    alerce_tap.%s
WHERE
    tid = %d
''' % (table_name, tid)

result = alerce_tap.search(query).to_table().to_pandas()
display(result)

Unnamed: 0,classifier_id,classifier_name,classifier_version,tid
0,1,stamp_classifier_rubin,2.0.1,1


In [21]:
classifier_id = 1

In [22]:
table_name = 'taxonomy'

query = '''
SELECT
    class_id, class_name, classifier_id
FROM
    alerce_tap.%s
WHERE
    classifier_id = %d
''' % (table_name, classifier_id)

result = alerce_tap.search(query).to_table().to_pandas()
display(result)

Unnamed: 0,class_id,class_name,classifier_id
0,0,SN,1
1,1,AGN,1
2,2,VS,1
3,3,asteroid,1
4,4,bogus,1


In [23]:
table_name = 'probability'
show_table_info(table_name)

probability


Unnamed: 0,table_name,column_name,description,unit,datatype,indexed
0,alerce_tap.probability,class_id,Class identifier,,short,1
1,alerce_tap.probability,oid,Object identifier,,long,1
2,alerce_tap.probability,sid,Survey identifier,,short,1
3,alerce_tap.probability,classifier_id,Classifier identifier,,short,1
4,alerce_tap.probability,classifier_version,Classifier version,,short,1
5,alerce_tap.probability,probability,Classification probability,,float,1
6,alerce_tap.probability,ranking,Classification ranking,,short,1
7,alerce_tap.probability,lastmjd,Last observation MJD,d,double,0
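
The ```probability``` and ```taxonomy``` tables can be combined to read class names next to the probabilities of a given object. The sketch below assumes that the TAP service accepts standard ADQL `JOIN ... ON` syntax, and that a lower ```ranking``` value means a more probable class; neither is confirmed in this notebook. The function name and the identifier are placeholders.

```python
def class_probabilities_query(oid, sid, classifier_id=1):
    """Build an ADQL query listing class names and probabilities for one object."""
    return '''
SELECT
    p.oid, p.sid, t.class_name, p.probability, p.ranking
FROM
    alerce_tap.probability AS p
JOIN
    alerce_tap.taxonomy AS t
    ON p.class_id = t.class_id
    AND p.classifier_id = t.classifier_id
WHERE
    p.oid = %d
    AND p.sid = %d
    AND p.classifier_id = %d
ORDER BY
    p.ranking ASC
''' % (oid, sid, classifier_id)


# Placeholder identifier, not a real object.
query = class_probabilities_query(oid=123456789, sid=1)
print(query)

# With the TAP connection defined earlier in this notebook:
# result = alerce_tap.search(query).to_table().to_pandas()
```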
