# Querying the ZTF ALeRCE Python client
```Author: Eden Girma, Last updated 20210422```

# Table of contents:
* [Setting up ALeRCE Python client](#setup)
* [Querying objects from the past 12 hr](#query)
* [Understanding retrieved object data](#data)
* [Exporting data to .xml](#export)

In [1]:
import sys

# Packages for direct database access
# %pip install psycopg2
import psycopg2
import json

# Packages for data and number handling
import numpy as np
import pandas as pd
import math

# Packages for calculating current time and extracting ZTF data to VOTable
from astropy.time import Time
from astropy.table import Table, unique
from astropy.io.votable import from_table, writeto
from datetime import datetime

# Packages for display and data plotting, if desired
from IPython.display import HTML
from IPython.display import display
import matplotlib.pyplot as plt
%matplotlib inline

## Setting up ALeRCE Python client <a class="anchor" id="setup"></a>

In [2]:
# Set up ALeRCE python client
from alerce.core import Alerce
client = Alerce()

## Querying objects from the past 12 hr <a class="anchor" id="query"></a>

In [5]:
# Querying the ALeRCE client for objects detected within the past 12 hours
min_lastmjd = Time(datetime.today(), scale='utc').mjd - 1

objects = client.query_objects(lastmjd = [min_lastmjd, None],
                           page_size = int(1e6),
                           format='votable')

Note that when we specify a ```votable``` format, the output of our ```query_objects``` is an  ```astropy.table.table.Table``` object:

In [6]:
print(type(objects))

<class 'astropy.table.table.Table'>


We can also re-order and sort the resulting table, depending on how we want the final exported information to be organized.

In [7]:
# Re-ordering votable so that the 'OID' column is in the front
nc = ['oid'] + [c for c in objects.columns if c != 'oid']
objects = objects[nc]

# Sort objects by lastmjd then first mjd, in descending order
objects.sort(['lastmjd','firstmjd'])
objects = objects[::-1]

## Understanding retrieved object data <a class="anchor" id="data"></a>

To quickly get a sense of what we're looking at, let's print out the first 5 rows of our table:

In [8]:
objects[0:5]

oid,class,classifier,corrected,deltajd,firstmjd,g_r_max,g_r_max_corr,g_r_mean,g_r_mean_corr,lastmjd,meandec,meanra,mjdendhist,mjdstarthist,ncovhist,ndet,ndethist,probability,sigmadec,sigmara,stellar,step_id_corr
str12,object,object,bool,float64,float64,object,object,object,object,float64,float64,float64,float64,float64,int64,int64,str4,object,object,object,bool,str16
ZTF21aaxbzdb,,,False,0.0,59325.50875000004,,,,,59325.50875000004,40.7263525,352.3253321,59325.50875000004,59325.50875000004,1368,1,4,,,,False,correction_0.0.1
ZTF21aaxbzcz,,,False,0.0,59325.50875000004,,,,,59325.50875000004,40.7264797,352.3255966,59325.50875000004,58767.27603009995,1368,1,5,,,,False,correction_0.0.1
ZTF21aaxbzdg,,,False,0.0,59325.50875000004,,,,,59325.50875000004,41.9033927,351.7398069,59325.50875000004,58507.158391200006,1213,1,8,,,,False,correction_0.0.1
ZTF21aaxbzcl,,,False,0.0,59325.50875000004,,,,,59325.50875000004,37.5465427,353.4647511,59325.50875000004,58854.13714119978,1378,1,2,,,,False,correction_0.0.1
ZTF21aaxbzch,,,False,0.0,59325.50875000004,,,,,59325.50875000004,42.0446682,356.995874,59325.50875000004,59325.50875000004,1385,1,1,,,,False,correction_0.0.1


Below we'll see the total number of entries and names of each column in the data table.

In [9]:
objects.info

<Table length=78472>
     name      dtype 
------------- -------
          oid   str12
        class  object
   classifier  object
    corrected    bool
      deltajd float64
     firstmjd float64
      g_r_max  object
 g_r_max_corr  object
     g_r_mean  object
g_r_mean_corr  object
      lastmjd float64
      meandec float64
       meanra float64
   mjdendhist float64
 mjdstarthist float64
     ncovhist   int64
         ndet   int64
     ndethist    str4
  probability  object
     sigmadec  object
      sigmara  object
      stellar    bool
 step_id_corr   str16

A note: there are some entries that correspond to objects which haven't yet been classified. We know this as their ```class```, ```classifier```, and ```probability``` attributes are all equal to ```None```.

In [10]:
none_indices=[]

for i, c in enumerate(objects['probability']):
    if c == None:
        none_indices.append(i)

# Example row of an object with 'None' values:
idx = none_indices[0]
objects[idx]

oid,class,classifier,corrected,deltajd,firstmjd,g_r_max,g_r_max_corr,g_r_mean,g_r_mean_corr,lastmjd,meandec,meanra,mjdendhist,mjdstarthist,ncovhist,ndet,ndethist,probability,sigmadec,sigmara,stellar,step_id_corr
str12,object,object,bool,float64,float64,object,object,object,object,float64,float64,float64,float64,float64,int64,int64,str4,object,object,object,bool,str16
ZTF21aaxbzdb,,,False,0.0,59325.50875000004,,,,,59325.50875000004,40.7263525,352.3253321,59325.50875000004,59325.50875000004,1368,1,4,,,,False,correction_0.0.1


There can also be some rows with duplicate OIDs:

In [11]:
# Identify duplicate OID entries - rows with same OID but different classes and probabilities
obsgroup = objects.group_by(['oid'])
duplicates = []
for key, group in zip(obsgroup.groups.keys, obsgroup.groups):
    if len(group) > 1:
        oid = group['oid'][1]
        duplicates.append(oid)

print('Number of duplicate OIDs: %i' % (len(duplicates)))

# Print example rows with duplicate OIDs
oid = duplicates[0]
mask = (objects['oid'] == oid)
objects[mask]

Number of duplicate OIDs: 101


oid,class,classifier,corrected,deltajd,firstmjd,g_r_max,g_r_max_corr,g_r_mean,g_r_mean_corr,lastmjd,meandec,meanra,mjdendhist,mjdstarthist,ncovhist,ndet,ndethist,probability,sigmadec,sigmara,stellar,step_id_corr
str12,object,object,bool,float64,float64,object,object,object,object,float64,float64,float64,float64,float64,int64,int64,str4,object,object,object,bool,str16
ZTF17aaaeart,RRL,lc_classifier,True,1043.0584028000012,58282.42708329996,0.622361,0.516841,0.4271347,0.45453137,59325.48548609996,48.02049198565122,335.263424599117,59325.48548609996,58101.119618100114,1247,453,546,0.224016,9.188696327157331e-05,0.0001068833937816,True,correction_0.0.1
ZTF17aaaeart,E,lc_classifier,True,1043.0584028000012,58282.42708329996,0.622361,0.516841,0.4271347,0.45453137,59325.48548609996,48.02049198565122,335.263424599117,59325.48548609996,58101.119618100114,1247,453,546,0.224016,9.188696327157331e-05,0.0001068833937816,True,correction_0.0.1


However it seems that these duplicate cases occur with objects that are classified as having the same probability in different classes.

In [14]:
same_probability = True

for oid in duplicates:
    mask = (objects['oid']==oid)
    c = [c for c in objects[mask]['class']]
    p = [c for c in objects[mask]['probability']]
    
    if all(el == p[0] for el in p):
        if oid == duplicates[-1]:
            if same_probability:
                print('All duplicate cases have the same probability in 2 or more classes.')
        continue
    else:
        print('OID: %s' % (oid))
        print('Classes: %i' % (c))
        print('Probabilities: %l' % (p))
        print('--------------------------')
        same_probability = False

All duplicate cases have the same probability in 2 or more classes.


We can make a new table that removes duplicate OID rows:

In [15]:
unique_objects = unique(objects, keys=['oid'])
unique_objects.info

<Table length=78370>
     name      dtype 
------------- -------
          oid   str12
        class  object
   classifier  object
    corrected    bool
      deltajd float64
     firstmjd float64
      g_r_max  object
 g_r_max_corr  object
     g_r_mean  object
g_r_mean_corr  object
      lastmjd float64
      meandec float64
       meanra float64
   mjdendhist float64
 mjdstarthist float64
     ncovhist   int64
         ndet   int64
     ndethist    str4
  probability  object
     sigmadec  object
      sigmara  object
      stellar    bool
 step_id_corr   str16

## Exporting data to .xml <a class="anchor" id="export"></a>

```astropy``` offers a simple method in which to export their Table object as an .xml file.

In [16]:
# Export our queried objects table to an .xml file
writeto(unique_objects, "ztf_API_output.xml")