# Exploring the FINK dataset

This notebook contains a few working examples of how to explore the FINK dataset.

It assumes the data were already transferred from the FINK download service:
* Use the FINK API: https://fink-portal.org/download
* Selection keys are given by https://fink-broker.readthedocs.io/en/latest/science/added_values/ and https://zwickytransientfacility.github.io/ztf-avro-alert/schema.html
* Credentials are needed to actually download the data


## 1) Loading local data to pandas

#### Import packages for this part

In [1]:
import pandas as pd

Data downloaded from FINK are stored in Parquet format under a top-level directory, hereafter named 'topdir'.

**Make sure you edit the next cell to match your own setup.**

In [2]:
#datapath = "<topdir>/ftransfer_ztf_2024-02-01_689626"
datapath="/Users/gangler/data/FINK/MatrixProfile/ftransfer_ztf_2024-02-01_689626"
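Before loading, it can be useful to confirm that the path really points at a transferred dataset. The sketch below is one possible check, not part of FINK or pandas: `list_parquet_files` is a hypothetical helper, and it assumes that the transferred files carry a `.parquet` extension.

```python
from pathlib import Path

def list_parquet_files(topdir):
    """Return the Parquet files found under topdir, searched recursively.

    Hypothetical helper: assumes the files end with '.parquet'.
    """
    root = Path(topdir)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} is not an existing directory")
    # sorted() gives a reproducible order across runs
    return sorted(root.rglob("*.parquet"))
```

If `len(list_parquet_files(datapath))` is zero, the path in the cell above most likely needs editing.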

#### Load data into Pandas

In [3]:
pdf=pd.read_parquet(datapath)

## 2) Select an alert of interest

Here we get the record corresponding to the latest alert (largest candidate jd) of the object identified by objectId.

In [4]:
objectId='ZTF18adbmoft'

In [5]:
idx=(pdf[pdf['objectId']==objectId].apply(lambda x: x.candidate['jd'],axis=1)).idxmax()
pdf_obj=pdf.loc[idx]
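The selection above can be tried on a toy table first. This is a minimal sketch under stated assumptions: the object names and jd values are invented, `latest_alert_index` is a hypothetical helper, and each `candidate` entry is assumed to behave like a dict, as in the cell above.

```python
import pandas as pd

# Toy alerts: each row holds a nested 'candidate' dict, as in the alert data.
toy = pd.DataFrame({
    "objectId": ["ZTF_A", "ZTF_B", "ZTF_A", "ZTF_A"],
    "candidate": [{"jd": 2460000.5}, {"jd": 2460009.5},
                  {"jd": 2460003.5}, {"jd": 2460001.5}],
})

def latest_alert_index(pdf, object_id):
    """Return the index label of the most recent alert (largest jd) of object_id."""
    selected = pdf.loc[pdf["objectId"] == object_id, "candidate"]
    if selected.empty:
        raise KeyError(f"no alert found for {object_id}")
    # extract the nested jd of each alert, then locate the maximum
    jd = selected.map(lambda cand: cand["jd"])
    return jd.idxmax()

toy_idx = latest_alert_index(toy, "ZTF_A")
print(toy_idx, toy.loc[toy_idx, "candidate"]["jd"])
```

Filtering on the single `candidate` column before extracting jd avoids a row-wise `apply` over the whole frame, and the explicit check gives a clearer error when the object is absent.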


In [6]:
# individual fields can then be read as attributes, for example
print(pdf_obj.candid)

1356527403015015001


## 3) Build a new dataframe from the candidate and prv_candidates data

Other ways of extracting this information could be explored, but in any case the candidate and prv_candidates data need to be combined into a single table.

In [7]:
def panda_candidate(pdf,idx):
    """builds a new data frame out of the candidate and prv_candidates information for record idx"""
    pdf_obj=pdf.loc[idx]

    # make sure candidate is put into the prv_candidates format (same keys)
    minicand=dict()
    for col in pdf_obj.prv_candidates[0].keys():
        minicand[col]=pdf_obj.candidate[col]

    # add candidate as the last record of prv_candidates
    all_candidates=list(pdf_obj.prv_candidates)
    all_candidates.append(minicand)

    # turn and return as a dataframe
    return pd.DataFrame.from_records(all_candidates)

In [8]:
pdf_cand=panda_candidate(pdf,idx)

In [9]:
pdf_cand

Unnamed: 0,aimage,aimagerat,bimage,bimagerat,candid,chinr,chipsf,classtar,clrcoeff,clrcounc,...,sigmagnr,sigmapsf,sky,ssdistnr,ssmagnr,ssnamenr,sumrat,tblid,xpos,ypos
0,0.834,0.698080,0.788,0.659577,1.327451e+18,0.506,38.930698,0.019,0.089630,0.000006,...,0.018,0.081736,0.174667,,,,0.921578,3.0,264.931000,552.168030
1,,,,,,,,,-0.067182,0.000014,...,,,,,,,,,,
2,1.128,0.484120,1.038,0.445494,1.328446e+18,0.966,1.545020,0.648,-0.041911,0.000025,...,0.019,0.140106,-0.024658,,,,1.000000,1.0,264.778015,522.242004
3,,,,,,,,,0.087819,0.000005,...,,,,,,,,,,
4,0.915,0.731859,0.835,0.667871,1.329476e+18,0.506,23.880400,0.996,0.091457,0.000006,...,0.018,0.085136,0.029418,,,,1.000000,5.0,283.713013,580.533997
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
73,1.125,0.354890,0.993,0.313249,1.356489e+18,0.564,13.679300,0.748,0.092013,0.000004,...,0.013,0.044373,0.551492,,,,1.000000,3.0,828.994019,853.330017
74,0.841,0.426904,0.750,0.380711,1.356506e+18,0.701,5.281990,0.953,-0.076709,0.000007,...,0.017,0.091629,0.735663,,,,1.000000,10.0,787.593994,853.427979
75,0.795,0.315476,0.721,0.286111,1.356507e+18,0.966,3.318680,0.887,-0.077541,0.000011,...,0.019,0.125647,0.210901,,,,1.000000,2.0,239.983994,829.258972
76,0.832,0.597514,0.772,0.554424,1.356527e+18,0.966,1.181300,0.671,-0.047331,0.000025,...,0.019,0.130945,0.984812,,,,1.000000,2.0,251.404999,821.200989
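The key-matching step in panda_candidate can be illustrated on a toy record. The sketch below uses invented values and a made-up `extra` key; it only shows why the candidate is restricted to the keys of the previous candidates before being appended.

```python
import pandas as pd

# Toy alert: the candidate carries one key ('extra') that the
# previous candidates do not have.
candidate = {"jd": 3.0, "magpsf": 18.2, "extra": 42}
prv_candidates = [
    {"jd": 1.0, "magpsf": 18.5},
    {"jd": 2.0, "magpsf": float("nan")},  # upper limit only: no PSF magnitude
]

# Keep only the keys shared with the previous candidates, then append
# the candidate as the last (most recent) record.
shared = {key: candidate[key] for key in prv_candidates[0]}
history = pd.DataFrame.from_records(prv_candidates + [shared])
print(history)
```

Without the restriction, the resulting frame would gain an `extra` column that is empty for every previous candidate.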


## 4) Explore the relation between sigmapsf and diffmaglim

#### Import packages for this section

In [10]:
import numpy as np

#### A few useful definitions

In [11]:
jansky=3631  # zero-point flux of the AB magnitude system, in Jy

def magnitude(flux):
    """returns magnitude from flux"""
    return -2.5*np.log10(flux/jansky)

def flux(mag):
    """returns flux from magnitude"""
    return jansky*10**(-0.4*mag)

def sigmaflux(mag,sigmag):
    """returns the sigma of the flux from the magnitude and sigma in magnitude"""
    return np.log(10)/2.5*flux(mag)*sigmag

def mag_dc(magpsf,magnr,sign=1):
    """returns the magnitude of the total flux sign*flux(magpsf) + flux(magnr)"""
    return magnitude(sign*flux(magpsf) + flux(magnr))

def sigmag_dc(magpsf,sigmapsf,magnr,sigmagnr,sign=1):
    """returns the sigma on the total magnitude from 2 magnitudes and their sigmas"""
    sigmaflux_calc=np.sqrt(sigmaflux(magpsf,sigmapsf)**2 + sigmaflux(magnr,sigmagnr) **2)
    return 2.5/np.log(10) *sigmaflux_calc / (sign*flux(magpsf) + flux(magnr))

def sigmaflux_from_upper(diffmaglim,rescale_factor=1):
    """infer the sigma of the flux from diffmaglim, taken as a 5-sigma limit (hence the 1/5 factor);
        rescale_factor is an optional empirical correction"""
    return 1/5*flux(diffmaglim)*rescale_factor

def sigmaglim_dc(magnr,sigmagnr,diffmaglim):
    """infer the sigma in magnitude from the underlying object and the 5-sigma limit"""
    return 2.5/np.log(10) *sigmaflux_from_upper(diffmaglim) / flux(magnr)
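As a sanity check of these definitions, the sketch below verifies numerically that magnitude and flux are inverse of each other, and that sigmaflux matches a finite-difference estimate of the flux change per magnitude. The functions are redefined so the block runs on its own; `ZERO_POINT_JY` is a name introduced here, and the magnitude and error values are illustrative, not taken from the data.

```python
import numpy as np

ZERO_POINT_JY = 3631.0  # same value as `jansky` in the definitions above

def flux(mag):
    return ZERO_POINT_JY * 10 ** (-0.4 * mag)

def magnitude(f):
    return -2.5 * np.log10(f / ZERO_POINT_JY)

def sigmaflux(mag, sigmag):
    return np.log(10) / 2.5 * flux(mag) * sigmag

mag, sigmag = 19.0, 0.05  # illustrative values only

# Round trip: magnitude -> flux -> magnitude
round_trip = magnitude(flux(mag))

# Central finite-difference estimate of |dF/dm| * sigmag
eps = 1e-6
numeric = abs(flux(mag + eps) - flux(mag - eps)) / (2 * eps) * sigmag

print(round_trip, sigmaflux(mag, sigmag), numeric)
```

The last two printed numbers should agree closely, which confirms that sigmaflux is the first-order propagation of the magnitude error.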
    

#### Working in flux

Let's compare the sigma of the flux from the PSF photometry on the difference image with the sigma of the flux inferred from the 5-sigma upper limit in magnitude.

For this object (and maybe for others), the sigma inferred from the 5-sigma limit is systematically smaller than the sigma from the PSF photometry.

The doc https://irsa.ipac.caltech.edu/data/ZTF/docs/ztf_explanatory_supplement.pdf explains that the 5-sigma limit is an a priori estimate, made without forced photometry extraction, while sigmapsf is the result of an actual fit. The difference seemingly comes from there.
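The two quantities compared in the next cell can be written out explicitly. This is a sketch of the reasoning behind sigmaflux and sigmaflux_from_upper as defined above, with $F_0 = 3631$ Jy the zero point used in the definitions. The symbols $F$, $m$, $\sigma_F$ and $m_{\rm lim}$ are notation introduced here, and reading diffmaglim as the magnitude of a source detected at exactly 5 sigma is an assumption.

```latex
F(m) = F_0 \, 10^{-0.4 m}
\quad\Rightarrow\quad
\sigma_F = \left|\frac{\mathrm{d}F}{\mathrm{d}m}\right| \sigma_m
         = \frac{\ln 10}{2.5}\, F(m)\, \sigma_m

F(m_{\rm lim}) = 5\, \sigma_F^{\rm lim}
\quad\Rightarrow\quad
\sigma_F^{\rm lim} = \frac{F(m_{\rm lim})}{5}
```

The first line is evaluated with magpsf and sigmapsf, the second with diffmaglim.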

In [12]:
pdf_cand['sigmaflux']=sigmaflux(pdf_cand['magpsf'],pdf_cand['sigmapsf'])
pdf_cand['sigmafluxupper']=sigmaflux_from_upper(pdf_cand['diffmaglim'])
pdf_cand[['sigmaflux','sigmafluxupper']]

Unnamed: 0,sigmaflux,sigmafluxupper
0,0.000018,0.000005
1,,0.000005
2,0.000013,0.000008
3,,0.000005
4,0.000023,0.000006
...,...,...
73,0.000017,0.000005
74,0.000008,0.000004
75,0.000012,0.000004
76,0.000011,0.000010


#### Scale factor
Here we compute the rescale factor to be applied in sigmaflux_from_upper. Note that it seems to be filter dependent.

It may also be object dependent.


In [13]:
def sigmalim_rescale_factor(pdf_cand):
    """provides an estimate of the rescale factor as the root mean square of the ratio sigmaflux/sigmafluxupper"""

    # compute the two columns only if they are not already there
    if 'sigmaflux' not in pdf_cand.columns or 'sigmafluxupper' not in pdf_cand.columns:
        pdf_cand['sigmaflux']=sigmaflux(pdf_cand['magpsf'],pdf_cand['sigmapsf'])
        pdf_cand['sigmafluxupper']=sigmaflux_from_upper(pdf_cand['diffmaglim'])

    # keep only the records with an actual PSF fit (finite sigmaflux)
    ratio=(pdf_cand['sigmaflux']/pdf_cand['sigmafluxupper'])[np.isfinite(pdf_cand['sigmaflux'])]
    return np.sqrt(np.mean(ratio**2))

In [14]:
# not sure how to get rid of the warnings...
# rescale factor for all records, then for fid==1 (g band) and fid==2 (r band)
print(sigmalim_rescale_factor(pdf_cand),
      sigmalim_rescale_factor(pdf_cand[pdf_cand['fid']==1]),
      sigmalim_rescale_factor(pdf_cand[pdf_cand['fid']==2]))

3.725030096799761 1.9973023835917523 4.654601721082196
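The per-filter comparison can also be written with a groupby, which scales to more filters without repeating the selection. The sketch below runs on a synthetic table with invented values; `rms` is a hypothetical helper, and the column names follow those used above.

```python
import numpy as np
import pandas as pd

# Synthetic light curve, for illustration only: two filters, one row without
# a PSF fit (NaN sigmaflux), as happens for upper-limit-only records.
toy = pd.DataFrame({
    "fid":            [1, 1, 1, 2, 2],
    "sigmaflux":      [2.0e-5, 4.0e-5, np.nan, 3.0e-5, 6.0e-5],
    "sigmafluxupper": [1.0e-5, 2.0e-5, 1.0e-5, 1.0e-5, 2.0e-5],
})

def rms(values):
    """Root mean square of the finite entries."""
    values = np.asarray(values, dtype=float)
    values = values[np.isfinite(values)]
    return np.sqrt(np.mean(values ** 2))

# Ratio per record, then one root mean square per filter
ratio = toy["sigmaflux"] / toy["sigmafluxupper"]
per_filter = ratio.groupby(toy["fid"]).apply(rms)
print(per_filter)
```

This variant computes the ratio once on the full table and never writes new columns into a filtered selection.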
