# Time Correlation Filter Function

The purpose of this notebook is to plan and test a time correlation function that filters sources according to how well their optical and radio data line up in time.
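As a starting point, one possible measure (a hypothetical sketch, not the function this notebook will end up using) is how much the optical and radio observing windows overlap. This assumes each source is summarised by the first and last dates of its optical data and of its radio data, all expressed in days (for example MJD); the function name `time_overlap_fraction` is illustrative only.

```python
def time_overlap_fraction(opt_start: float, opt_end: float,
                          radio_start: float, radio_end: float) -> float:
    """Fraction of the shorter time window that is shared by both windows.

    All four arguments are times in days (e.g. MJD). Returns 0.0 when the
    optical and radio windows do not overlap at all.
    """
    overlap = min(opt_end, radio_end) - max(opt_start, radio_start)
    if overlap <= 0:
        return 0.0
    # A positive overlap implies both windows have a positive length,
    # so the division below is safe.
    shorter = min(opt_end - opt_start, radio_end - radio_start)
    return overlap / shorter


# Optical data from 58800 to 59200, radio data from 59000 to 59600:
# they share 200 days out of the shorter 400-day window.
example = time_overlap_fraction(58800, 59200, 59000, 59600)
```

A filter could then keep only the sources whose fraction exceeds some chosen cut; normalising by the shorter window means a short radio campaign fully inside a long optical record still scores 1.0.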

The goal so far is to collect the optical and radio data for all of the eta-V filtered sources (206 of them) and put them into two DataFrames: fsd for the FINK (optical) data and vsd for the VAST (radio) data.

I've managed to construct vsd without problems, but I'm having issues constructing fsd. When I run the portal request for the full list of sources, even with batching, the kernel appears to die and the notebook is reset. This also happens when the request is wrapped in a function I've called 'query_fink_db', which you can find in Projecttools.py. The kernel restart doesn't happen if the ID list is sufficiently small (I tested 12 IDs with batching and it worked fine).

In [2]:
#here are the necessary imports
import os
import sys
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from io import StringIO
from vasttools.pipeline import Pipeline
from vasttools.query import Query
import Projecttools as pro #brand new module for frequently used code!

%matplotlib inline

In [3]:
cms = pd.read_pickle('Fink_2020_sources_matched_to_VAST_all_sources.pickle')
pro.family_sort(cms)
cms.groupby('family').size().sort_values(ascending=False)

family
AGN                827
Unknown            516
Galaxy             167
Solar System        81
Radio               70
Supernova           51
Multiwavelength     39
Star                21
dtype: int64

In [4]:
#Pipeline() automatically finds the pipeline base directory, so it doesn't need to be specified
pipe=Pipeline()
#this way, we can also load specific runs from the VAST pipeline:
my_run=pipe.load_run('tiles_corrected')



In [5]:
#I'm hard-coding the eta and V thresholds here because the eta-V analysis takes a very long time to run and I already
#have the values:
eta_thresh=2.315552652171963
v_thresh=0.2878888414273631

In [7]:
cms_candidates = pro.eta_v_candidate_filter(cms,my_run,eta_thresh,v_thresh)
cms_candidates.groupby('family').size().sort_values(ascending=False)

There are 213 candidate sources:


family
AGN                93
Unknown            53
Solar System       30
Galaxy             15
Radio               9
Star                5
Multiwavelength     4
Supernova           4
dtype: int64
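The body of `pro.eta_v_candidate_filter` is not shown here. For reference, a minimal sketch of the kind of cut it presumably applies is below: keep the sources whose eta and V metrics both exceed their thresholds. The function name `eta_v_filter` and the column names `eta_peak_flux` / `v_peak_flux` are assumptions (they follow the VAST pipeline source-table naming), not the module's actual code.

```python
import pandas as pd


def eta_v_filter(sources: pd.DataFrame, eta_thresh: float, v_thresh: float,
                 eta_col: str = "eta_peak_flux",  # assumed column name
                 v_col: str = "v_peak_flux"       # assumed column name
                 ) -> pd.DataFrame:
    """Return only the rows whose eta AND V values exceed the thresholds."""
    mask = (sources[eta_col] > eta_thresh) & (sources[v_col] > v_thresh)
    return sources[mask]


# Toy source table (made-up values) indexed by source id.
toy = pd.DataFrame({"eta_peak_flux": [1.0, 3.0, 5.0],
                    "v_peak_flux": [0.5, 0.1, 0.4]}, index=[1, 2, 3])
candidates = eta_v_filter(toy, 2.315552652171963, 0.2878888414273631)
```

Here source 1 fails the eta cut and source 2 fails the V cut, so only source 3 survives.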

I will be testing this function on the eta-V filtered sources. For it to work, I need both the radio and the optical data for each source. Since the FINK broker limits how many sources can be queried at a time, I will have to either take a random sample of sources from the filtered catalogue or apply the function to a sample from each family separately.

Alternatively, I can use batching: break the ID list into batches, run the portal query on each batch, and stitch the results together into a single DataFrame.
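The batching idea can be sketched as a small, network-free helper. The `fetch` argument is a hypothetical stand-in for whatever performs one portal request (for example a wrapper around the `requests.post` call used below, or `pro.query_fink_db`); `query_in_batches` is an illustrative name, not an existing function.

```python
from typing import Callable, Iterable, List

import pandas as pd


def query_in_batches(ids: Iterable[str],
                     fetch: Callable[[List[str]], pd.DataFrame],
                     batch_size: int = 10) -> pd.DataFrame:
    """Split `ids` into batches, call `fetch` on each batch and concatenate the results."""
    ids = list(ids)  # accept a pandas Series or any other iterable of IDs
    frames = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]  # slice end is exclusive
        frames.append(fetch(batch))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Hypothetical fetch function that records batch sizes instead of querying FINK.
batch_sizes = []


def fake_fetch(batch: List[str]) -> pd.DataFrame:
    batch_sizes.append(len(batch))
    return pd.DataFrame({"i:objectId": batch})


result = query_in_batches([f"ZTF{i}" for i in range(23)], fake_fetch, batch_size=10)
```

With 23 IDs and a batch size of 10 the helper makes three calls (10, 10 and 3 IDs). Batching keeps each individual response small, but note that the concatenated DataFrame at the end is still as large as a single full query would be.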

In [8]:
#objectIds to query: here, every eta-V candidate source
Idlist=cms_candidates['objectId']

In [42]:
#split the ID list into chunks so that each portal request stays small
num_elem=len(Idlist) #length of id list
chunk_size=10 #number of IDs per chunk
num_chunks=int(np.ceil(num_elem/chunk_size)) #number of chunks needed to cover every ID
list_chunks=np.array_split(np.arange(num_elem), num_chunks) #np.arange(num_elem) makes an ordered array of positions, from 0 to (num_elem - 1).
                                                            #np.array_split divides that array into num_chunks nearly equal pieces
                                                            #each piece is one element of 'list_chunks'

list_df=[] #empty list to hold the FINK results of each chunk

#defining column array for cutouts (only used if the 'cols' line below is switched back on)
cutouts=[
'b:cutoutScience_stampData',
'b:cutoutTemplate_stampData',
'b:cutoutDifference_stampData'
]

for chunk_idx in list_chunks: #for each chunk in list_chunks
    start,end=chunk_idx[0],chunk_idx[-1]+1 #starting and ending positions for the given chunk
    #this is the request made to the fink portal to pull out the info for each source
    r = requests.post(
        'https://fink-portal.org/api/v1/objects',
        json={
        'objectId': ','.join(Idlist.iloc[start:end]), #slice ends are exclusive, which is why 'end' is chunk_idx[-1]+1
        'output-format': 'json',
        'withcutouts': 'False',
        'columns': 'i:objectId,v:firstdate,v:lastdate',
        #'cols': ','.join(cutouts),
        'withupperlim': 'False' #important for lightcurve plotting
        }
    )
    r.raise_for_status() #stop with an HTTP error rather than a confusing JSON parsing error
    df_tmp=pd.read_json(StringIO(r.content.decode())) #temporary DataFrame holding the queried sources from this chunk
    #df_tmp=pro.query_fink_db(Idlist.iloc[start:end])
    list_df.append(df_tmp)
fsd=pd.concat(list_df, ignore_index=True)

ValueError: Expected object or value
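The `ValueError: Expected object or value` above is what `pd.read_json` typically raises when the text it is given is not JSON at all, for example an empty body or a plain-text/HTML error page from the server. A small guard (the name `parse_fink_json` is hypothetical) makes that failure explicit and shows the start of the offending response:

```python
from io import StringIO

import pandas as pd


def parse_fink_json(text: str) -> pd.DataFrame:
    """Parse a JSON response body into a DataFrame, failing with a readable message."""
    stripped = text.strip()
    if not stripped:
        raise ValueError("Empty response body: nothing to parse")
    if stripped[0] not in "[{":
        # Show the first 200 characters so the server's error message is visible.
        raise ValueError(f"Response does not look like JSON: {stripped[:200]!r}")
    return pd.read_json(StringIO(stripped))


df_ok = parse_fink_json('[{"i:objectId": "ZTF1"}, {"i:objectId": "ZTF2"}]')
```

In the loop, `df_tmp=parse_fink_json(r.content.decode())` would replace the bare `pd.read_json` call.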

In [14]:
fsd.keys().tolist()

['b:cutoutDifference_stampData',
 'b:cutoutScience_stampData',
 'b:cutoutTemplate_stampData',
 'd:DR3Name',
 'd:Plx',
 'd:cdsxmatch',
 'd:e_Plx',
 'd:gcvs',
 'd:mulens',
 'd:nalerthist',
 'd:rf_kn_vs_nonkn',
 'd:rf_snia_vs_nonia',
 'd:roid',
 'd:snn_sn_vs_all',
 'd:snn_snia_vs_nonia',
 'd:vsx',
 'i:aimage',
 'i:aimagerat',
 'i:bimage',
 'i:bimagerat',
 'i:candid',
 'i:chinr',
 'i:chipsf',
 'i:classtar',
 'i:clrcoeff',
 'i:clrcounc',
 'i:clrmed',
 'i:clrrms',
 'i:dec',
 'i:decnr',
 'i:diffmaglim',
 'i:distnr',
 'i:distpsnr1',
 'i:distpsnr2',
 'i:distpsnr3',
 'i:drb',
 'i:drbversion',
 'i:dsdiff',
 'i:dsnrms',
 'i:elong',
 'i:exptime',
 'i:fid',
 'i:field',
 'i:fink_broker_version',
 'i:fink_science_version',
 'i:fwhm',
 'i:isdiffpos',
 'i:jd',
 'i:jdendhist',
 'i:jdendref',
 'i:jdstarthist',
 'i:jdstartref',
 'i:magap',
 'i:magapbig',
 'i:magdiff',
 'i:magfromlim',
 'i:maggaia',
 'i:maggaiabright',
 'i:magnr',
 'i:magpsf',
 'i:magzpsci',
 'i:magzpscirms',
 'i:magzpsciunc',
 'i:mindtoedg
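To compare the optical and radio timelines, the FINK times need to be on the same scale as the radio ones. The `i:jd` column listed above is a Julian Date; assuming the VAST measurements carry an ordinary (timezone-naive, UTC) datetime `time` column, a minimal conversion subtracts the Julian Date of the Unix epoch:

```python
import pandas as pd

# Julian Date of the Unix epoch, 1970-01-01T00:00:00 UTC.
JD_UNIX_EPOCH = 2440587.5


def jd_to_datetime(jd: pd.Series) -> pd.Series:
    """Convert Julian Dates (e.g. FINK's 'i:jd') to pandas datetimes."""
    return pd.to_datetime(jd - JD_UNIX_EPOCH, unit="D")


converted = jd_to_datetime(pd.Series([2440587.5, 2451545.0]))
```

pandas can do the same with `pd.to_datetime(jd, unit="D", origin="julian")`; the explicit subtraction just makes the offset visible.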

In [None]:
#cms_candidates_selection = cms_candidates.query('objectId == @Special_IDS_1')

#This resets the index of the candidate selection, so that specific row ranges can be pulled out.
#cms_candidates_selection = cms_candidates_selection.reset_index()
#len(cms_candidates_selection)

In [None]:
#This selects rows by index label, from a start label to a stop label (inclusive) in increments of step
#test.loc[start:stop:step]

#This selects a small sample: rows 0 to 11 inclusive, i.e. 12 rows
#candidate_sample = cms_candidates_selection.loc[0:11:1]
#len(candidate_sample)
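The sample size depends on whether rows are selected by label or by position. After `reset_index()` the labels are 0, 1, 2, ..., and `.loc` includes the stop label while `.iloc` excludes the stop position, as this toy example shows:

```python
import pandas as pd

# Toy frame with a default 0..29 integer index (stands in for the reset selection).
df = pd.DataFrame({"objectId": [f"ZTF{i}" for i in range(30)]})

by_label = df.loc[0:11]      # label-based: includes label 11 -> rows 0..11
by_position = df.iloc[0:11]  # position-based: stops before position 11 -> rows 0..10
```

So `.loc[0:11:1]` returns 12 rows, while `.iloc[0:11]` returns 11.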

In [None]:
#defining column array for cutouts
#cutouts=[
#'b:cutoutScience_stampData',
#'b:cutoutTemplate_stampData',
#'b:cutoutDifference_stampData'
#]

#this is the request made to the fink portal to pull out the info for each source
#r = requests.post(
#'https://fink-portal.org/api/v1/objects',
# json={
#    'objectId': ','.join(Idlist), 
#    'output-format': 'json',
#    'withcutouts': 'True',
#    'cols': ','.join(cutouts),
#    'withupperlim': 'True' #important for lightcurve plotting
#  }
#)

In [None]:
#reads in json file data as DataFrame. fsd stands for 'FINK source data'
#fsd=pd.read_json(StringIO(r.content.decode()))

In [None]:
#fsd object should contain the optical time data for the selected objects
#fsd

In [None]:
#vsd stands for 'vast source data'
#vsd=[]
#for x in Special_IDS_1:
    #y=cms_candidates_selection[cms_candidates_selection['objectId'] == x]['matched_id'].astype(int).values[0]
    #z=my_run.get_source(y).measurements
    #vsd.append(z)

As you can see below, the radio data is there.

In [None]:
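vsi=[] #VAST source ids matched to each FINK objectId
for i in Idlist:
    #take the first VAST match for this objectId
    y=cms_candidates[cms_candidates['objectId'] == i]['matched_id'].astype(int).values[0]
    vsi.append(y)
meas=my_run.measurements #all radio measurements in the run
vsd=meas[meas.source.isin(vsi)] #keep only the measurements of the matched sources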
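The loop that builds `vsi` can also be written without filtering the whole table once per ID, and the radio data can then be reduced to one time window per source for the correlation step. This is a sketch on toy data: `matched_vast_ids` and `radio_time_range` are hypothetical helpers, and the `time` column in the measurements is an assumption.

```python
import pandas as pd


def matched_vast_ids(candidates: pd.DataFrame, object_ids) -> list:
    """VAST source ids for the given objectIds, taking the first match per objectId."""
    first = candidates.drop_duplicates("objectId").set_index("objectId")["matched_id"]
    return first.loc[list(object_ids)].astype(int).tolist()


def radio_time_range(measurements: pd.DataFrame) -> pd.DataFrame:
    """First and last radio measurement time for each VAST source (assumes a 'time' column)."""
    return measurements.groupby("source")["time"].agg(["min", "max"])


# Toy inputs with made-up values.
cands = pd.DataFrame({"objectId": ["ZTF_a", "ZTF_b", "ZTF_a"],
                      "matched_id": [10.0, 20.0, 10.0]})
ids = matched_vast_ids(cands, ["ZTF_a", "ZTF_b", "ZTF_a"])
meas_toy = pd.DataFrame({
    "source": [10, 10, 20, 30],
    "time": pd.to_datetime(["2020-01-01", "2020-06-01", "2020-03-01", "2020-04-01"]),
})
ranges = radio_time_range(meas_toy[meas_toy.source.isin(ids)])
```

The per-source `min`/`max` table is exactly the kind of summary the overlap idea needs on the radio side.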

In [None]:
#vsd holds every radio measurement of the matched sources, so it has several rows per source;
#vsd['source'].nunique() should be 206, since isin() ignores duplicate ids in vsi
vsd