# Time Correlation Filter Function

The purpose of this notebook is to plan and test a time correlation function that filters sources according to how well their optical and radio data line up in time.

The goal so far is to collect the optical and radio data for all the eta-v filtered sources (206 of them) and put them into two DataFrames: fsd for FINK data and vsd for VAST data.

I've managed to construct vsd fine, but I'm having issues constructing fsd. When I run the portal request for the full list of sources, even with batching, the kernel appears to die and the notebook resets. This happens even when the request is wrapped in a function I've called 'query_fink_db', which you can find in Projecttools.py. The kernel restart doesn't happen if the ID list is sufficiently small (I tested 12 IDs with batching and it worked fine).

In [1]:
#here are the necessary imports
import os
import sys
import gc
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from io import StringIO
from vasttools.pipeline import Pipeline
from vasttools.query import Query
import Projecttools as pro #brand new module for frequently used code!
import datetime as date

%matplotlib inline

In [2]:
cms = pd.read_pickle('Fink_2020_sources_matched_to_VAST_all_sources.pickle')
pro.family_sort(cms)
cms.groupby('family').size().sort_values(ascending=False)

family
AGN                827
Unknown            516
Galaxy             167
Solar System        81
Radio               70
Supernova           51
Multiwavelength     39
Star                21
dtype: int64

In [3]:
#This will automatically find the base directory that needed to be specified
pipe=Pipeline()
#this way, we can also load specific runs from the VAST pipeline:
my_run=pipe.load_run('tiles_corrected')



In [4]:
#I'm just hardcoding the eta and v thresholds because the eta-v analysis takes an actual eternity to complete and I already
#have the values here:
eta_thresh=2.315552652171963
v_thresh=0.2878888414273631

In [5]:
cms_candidates = pro.eta_v_candidate_filter(cms,my_run,eta_thresh,v_thresh)
cms_candidates.groupby('family').size().sort_values(ascending=False)

There are 213 candidate sources:


family
AGN                93
Unknown            53
Solar System       30
Galaxy             15
Radio               9
Star                5
Multiwavelength     4
Supernova           4
dtype: int64

I will be testing this function on the eta-v filtered sources. For this to work, I need both the radio and optical data for each source. Since the FINK broker limits how many sources can be queried at a time, I've done some batching: breaking the ID list into batches, running the portal query on each batch, and stitching the results together into a DataFrame.
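
The batching idea can be sketched on its own, without touching the portal. This is a minimal illustration rather than the code used below: `iter_batches` and the toy IDs are hypothetical names, and the batch size of 3 is chosen only to keep the example small.

```python
from typing import Iterator, List, Sequence


def iter_batches(ids: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of `ids`, each holding at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(ids), batch_size):
        # Python slices exclude the end index, so this takes up to batch_size items.
        yield list(ids[start:start + batch_size])


# Toy IDs standing in for the real object IDs (hypothetical values).
example_ids = [f"ZTF_example_{n:03d}" for n in range(7)]
batches = list(iter_batches(example_ids, batch_size=3))

# Stitching the batches back together recovers the original list.
stitched = [obj_id for batch in batches for obj_id in batch]
```

If the IDs live in a pandas Series, converting with `list(...)` first keeps the slicing purely positional.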

In [10]:
#These IDs are selected from the curated list of interesting sources (lightcurves can be seen via powerpoint.)
Idlist=cms_candidates['objectId']

In [46]:
num_elem=len(Idlist) #length of the ID list
num_chunks=num_elem//30+1 #number of chunks; this gives at most 30 IDs per chunk
list_chunks=np.array_split(np.arange(num_elem), num_chunks)
#np.arange(num_elem) makes an ordered array from 0 to (num_elem - 1).
#np.array_split splits that array into num_chunks roughly equal pieces;
#each piece is an element of the list 'list_chunks'.
for i in range(len(list_chunks)):
    list_chunks[i]=list_chunks[i].tolist()

#defining column array for cutouts
cutouts=[
'b:cutoutScience_stampData',
'b:cutoutTemplate_stampData',
'b:cutoutDifference_stampData'
]

for chunk_idx in list_chunks: #for each chunk in list_chunks
    start,end=chunk_idx[0],chunk_idx[-1]+1 #define the starting and ending indexes for the given chunk

    #this is the request made to the fink portal to pull out the info for each source
    #df_tmp=pro.query_fink_db(Idlist[start:end])
    r = requests.post(
        'https://fink-portal.org/api/v1/objects',
        json={
        'objectId': ','.join(Idlist[start:end]), #slices exclude the end index, which is why 'end' is defined as 'chunk_idx[-1]+1'.
        'output-format': 'json'
        #'withcutouts': 'False',
        #'columns': 'i:objectId,v:firstdate,v:lastdate',
        #'cols': ','.join(cutouts),
        #'withupperlim': 'True' #important for lightcurve plotting
        }
    )
    df_tmp=pd.read_json(StringIO(r.content.decode()))#define a temporary dataframe that holds the queried sources from the chunk
    #saves the temporary dataframe to a folder as a .pkl file. the naming is based on which batch we're looking at
    df_tmp.to_pickle('/home/jovyan/work/Project_VAST_FINK/FINK_Batches/Batch_{}.pkl'.format(list_chunks.index(chunk_idx)+1))
    #clears memory from jupyter to help it not get stuck.
    gc.collect()

list_df=[] #empty array to hold fink sources.

for chunk_idx in list_chunks:
    #now we load all the saved batches back in and concatenate them into one dataframe: fsd_load
    df_tmp=pd.read_pickle('/home/jovyan/work/Project_VAST_FINK/FINK_Batches/Batch_{}.pkl'.format(list_chunks.index(chunk_idx)+1))
    list_df.append(df_tmp)
fsd_load=pd.concat(list_df)

Alternatively, if you've already got fsd saved as a pickle file, load it here:

In [6]:
fsd_load=pd.read_pickle('FINK_Batches/FSD_No_Upperlim.pkl')

In [122]:
#fsd object should contain the optical time data for the selected objects
fsd

Unnamed: 0,b:cutoutDifference_stampData,b:cutoutScience_stampData,b:cutoutTemplate_stampData,d:DR3Name,d:Plx,d:cdsxmatch,d:e_Plx,d:gcvs,d:mulens,d:nalerthist,...,v:g-r,v:rate(g-r),v:dg,v:rate(dg),v:dr,v:rate(dr),v:lastdate,v:firstdate,v:lapse,v:constellation
0,binary:ZTF18acvcfsc_2459840.9871528:cutoutDiff...,binary:ZTF18acvcfsc_2459840.9871528:cutoutScie...,binary:ZTF18acvcfsc_2459840.9871528:cutoutTemp...,Gaia DR3 2495240965204584960,0.0794,Seyfert_1,0.0781,Unknown,0.0,14,...,,,-0.016734,-0.273729,0.000000,0.000000,2022-09-18 11:41:30.002,2018-11-07 07:56:53.002,1411.155984,Cetus
1,binary:ZTF18acvcfsc_2459840.9260185:cutoutDiff...,binary:ZTF18acvcfsc_2459840.9260185:cutoutScie...,binary:ZTF18acvcfsc_2459840.9260185:cutoutTemp...,Gaia DR3 2495240965204584960,0.0794,Seyfert_1,0.0781,Unknown,0.0,13,...,,,0.053758,0.004914,0.000000,0.000000,2022-09-18 10:13:27.998,2018-11-07 07:56:53.002,1411.094849,Cetus
2,binary:ZTF19abxtqqt_2459840.8890972:cutoutDiff...,binary:ZTF19abxtqqt_2459840.8890972:cutoutScie...,binary:ZTF19abxtqqt_2459840.8890972:cutoutTemp...,Gaia DR3 2546694574627257344,-0.0933,QSO,0.2442,Unknown,0.0,15,...,,,-0.001322,-0.000261,0.000000,0.000000,2022-09-18 09:20:17.998,2019-01-08 03:09:47.002,1349.257303,Pisces
3,binary:ZTF19abxtqqt_2459840.7953356:cutoutDiff...,binary:ZTF19abxtqqt_2459840.7953356:cutoutScie...,binary:ZTF19abxtqqt_2459840.7953356:cutoutTemp...,Gaia DR3 2546694574627257344,-0.0933,QSO,0.2442,Unknown,0.0,15,...,,,0.000000,0.000000,0.026254,0.008643,2022-09-18 07:05:16.996,2019-01-08 03:09:47.002,1349.163542,Pisces
4,binary:ZTF19acscbee_2459840.7874769:cutoutDiff...,binary:ZTF19acscbee_2459840.7874769:cutoutScie...,binary:ZTF19acscbee_2459840.7874769:cutoutTemp...,Gaia DR3 2684762616252942080,0.3254,QSO,0.1993,Unknown,0.0,16,...,,,0.000000,0.000000,0.060176,0.030405,2022-09-18 06:53:58.004,2018-11-21 02:31:39.000,1397.182164,Aquarius
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
764,binary:ZTF19aavprpp_2458791.646794:cutoutDiffe...,binary:ZTF19aavprpp_2458791.646794:cutoutScien...,binary:ZTF19aavprpp_2458791.646794:cutoutTempl...,,,Galaxy,,,0.0,3,...,,,0.000000,0.000000,0.000000,0.000000,2019-11-04 03:31:23.002,2018-11-13 02:03:26.001,356.061076,Aquarius
765,binary:ZTF19aazdsff_2458790.7058333:cutoutDiff...,binary:ZTF19aazdsff_2458790.7058333:cutoutScie...,binary:ZTF19aazdsff_2458790.7058333:cutoutTemp...,,,QSO,,,0.0,14,...,,,0.000000,0.000000,-0.035954,-0.005989,2019-11-03 04:56:23.997,2018-09-27 08:55:40.999,401.833831,Pisces
766,binary:ZTF19aazdsff_2458789.7295718:cutoutDiff...,binary:ZTF19aazdsff_2458789.7295718:cutoutScie...,binary:ZTF19aazdsff_2458789.7295718:cutoutTemp...,,,QSO,,,0.0,14,...,,,0.000000,0.000000,0.000000,0.000000,2019-11-02 05:30:35.004,2018-09-27 08:55:40.999,400.857570,Pisces
767,binary:ZTF18adcbdbj_2458789.7245718:cutoutDiff...,binary:ZTF18adcbdbj_2458789.7245718:cutoutScie...,binary:ZTF18adcbdbj_2458789.7245718:cutoutTemp...,,,QSO,,,0.0,12,...,,,0.000000,0.000000,0.000000,0.000000,2019-11-02 05:23:23.004,2018-12-30 02:39:19.002,307.113935,Aquarius


As you can see below, the radio data is there.

In [79]:
#at the end, I turn the vaex DataFrameLocal object into a pandas DataFrame directly (our list of sources is not that large)
vsi=[]
for i in Idlist:
    y=cms_candidates[cms_candidates['objectId'] == i]['matched_id'].astype(int).values[0]
    vsi.append(y)
meas=my_run.measurements
vsd=meas[meas.source.isin(vsi)].to_pandas_df()
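
The lookup of one 'matched_id' per 'objectId' can also be written without an explicit loop. The sketch below uses a toy DataFrame in place of cms_candidates (the IDs and values are made up) and keeps the first match per object, as `.values[0]` does in the loop. Unlike the loop, it returns each source ID only once, which makes no difference to an `isin` filter.

```python
import pandas as pd

# Toy stand-in for cms_candidates; the IDs and values are hypothetical.
candidates = pd.DataFrame({
    "objectId": ["ZTF_a", "ZTF_b", "ZTF_b", "ZTF_c"],
    "matched_id": [101.0, 202.0, 202.0, 303.0],
})

# Keep the first row per objectId, then read off matched_id as integers.
first_match = (
    candidates.drop_duplicates(subset="objectId", keep="first")
    .set_index("objectId")["matched_id"]
    .astype(int)
)
source_ids = first_match.tolist()
```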

In [89]:
#that should have a length of 206, because we're neglecting duplicate lines
vsd

Unnamed: 0,source,island_id,component_id,local_rms,ra,ra_err,dec,dec_err,flux_peak,flux_peak_err,...,ns_sys_err,error_radius,uncertainty_ew,uncertainty_ns,weight_ew,weight_ns,forced,flux_int_isl_ratio,flux_peak_isl_ratio,id
0,3363670,CS_0918-06A_island_3703,CS_0918-06A_component_3703a,0.286000,138.088654,0.000572,-3.426351,0.000431,1.536000,0.303659,...,0.000278,0.000715,0.000767,0.000767,1698801.50,1698801.50,False,1.0,1.0,16300152
1,3363670,SB10908_island_4202,SB10908_component_4202a,0.294000,138.088440,0.000344,-3.426282,0.000302,1.581000,0.311606,...,0.000278,0.000457,0.000535,0.000535,3492637.25,3492637.25,False,1.0,1.0,17974207
2,3363670,15670_island_4273,15670_component_4273a,0.300000,138.088715,0.000241,-3.426245,0.000239,1.570000,0.293329,...,0.000278,0.000339,0.000438,0.000438,5208737.50,5208737.50,False,1.0,1.0,21152927
3,3363670,SB10336_VAST_0918-06A_island_5046,SB10336_VAST_0918-06A_component_5046a,0.378446,138.088608,0.000003,-3.426275,0.000003,1.530748,0.378446,...,0.000278,0.000000,0.000278,0.000278,12958704.00,12958704.00,True,1.0,1.0,21673913
4,3363670,SB11215_island_6860,SB11215_component_6860a,0.513548,138.088608,0.000003,-3.426275,0.000003,-1.667265,0.513548,...,0.000278,0.000000,0.000278,0.000278,12958704.00,12958704.00,True,1.0,1.0,21688501
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2062,3336055,SB10335_VAST_2233-06A_island_5241,SB10335_VAST_2233-06A_component_5241a,0.308610,341.022888,0.000003,-3.569260,0.000003,7.582205,0.308610,...,0.000278,0.000000,0.000278,0.000278,12958704.00,12958704.00,True,1.0,1.0,23843205
2063,3336055,SB10342_VAST_2233-06A_island_5293,SB10342_VAST_2233-06A_component_5293a,0.319335,341.022888,0.000003,-3.569260,0.000003,6.803195,0.319335,...,0.000278,0.000000,0.000278,0.000278,12958704.00,12958704.00,True,1.0,1.0,23846686
2064,3336055,SB11263_island_4806,SB11263_component_4806a,0.377347,341.022888,0.000003,-3.569260,0.000003,9.691784,0.377347,...,0.000278,0.000000,0.000278,0.000278,12958704.00,12958704.00,True,1.0,1.0,23850162
2065,3336055,SB11439_island_5435,SB11439_component_5435a,0.307101,341.022888,0.000003,-3.569260,0.000003,6.414573,0.307101,...,0.000278,0.000000,0.000278,0.000278,12958704.00,12958704.00,True,1.0,1.0,23853767


In [78]:
#This is pretty much how you turn the datetime from vsd into MJD. The difference between JD and MJD is 2400000.5
vsd.time[0].to_julian_date()-2400000.5

58602.12381995376
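
As a sanity check on the 2400000.5 offset, the same conversion can be tried on two dates whose Julian dates are well known: the J2000 epoch (JD 2451545.0) and the MJD zero point (1858-11-17 00:00). This is only a sketch; `timestamp_to_mjd` and `JD_MINUS_MJD` are names introduced here for illustration.

```python
import pandas as pd

JD_MINUS_MJD = 2400000.5  # MJD = JD - 2400000.5


def timestamp_to_mjd(ts: pd.Timestamp) -> float:
    """Convert a pandas Timestamp to a Modified Julian Date."""
    return ts.to_julian_date() - JD_MINUS_MJD


j2000 = pd.Timestamp("2000-01-01 12:00:00")      # JD 2451545.0
mjd_epoch = pd.Timestamp("1858-11-17 00:00:00")  # JD 2400000.5

j2000_mjd = timestamp_to_mjd(j2000)
epoch_mjd = timestamp_to_mjd(mjd_epoch)
```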

In [120]:
#This will convert the 'time' column in vsd into MJD
vsd['time'].apply(pd.Timestamp.to_julian_date)-2400000.5

0       58602.123820
1       58836.791302
2       59090.090720
3       58785.778001
4       58859.728603
            ...     
2062    58785.550827
2063    58786.476804
2064    58860.286485
2065    58866.313027
2066    58601.975029
Name: time, Length: 2067, dtype: float64

In [121]:
#and this will convert the 'i:jd' column into MJD. I'll rename the column afterwards to 'i:mjd' using the .rename() function
fsd['i:jd']-2400000.5

0      59840.487153
1      59840.426018
2      59840.389097
3      59840.295336
4      59840.287477
           ...     
764    58791.146794
765    58790.205833
766    58789.229572
767    58789.224572
768    58789.206968
Name: i:jd, Length: 10760, dtype: float64
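
Putting the two conversions together and storing the results might look like the sketch below. It runs on toy frames rather than the real fsd and vsd: the two JD values are taken from the fsd output above, the timestamps and the 'mjd' column name for the radio side are assumptions made for illustration.

```python
import pandas as pd

JD_MINUS_MJD = 2400000.5  # MJD = JD - 2400000.5

# Toy stand-ins for fsd (optical) and vsd (radio).
fsd_toy = pd.DataFrame({
    "i:objectId": ["ZTF18acvcfsc", "ZTF19aavprpp"],
    "i:jd": [2459840.9871528, 2458791.646794],
})
vsd_toy = pd.DataFrame({
    "source": [1, 2],
    "time": pd.to_datetime(["2019-04-29 00:00:00", "2019-04-29 12:00:00"]),
})

# Optical: subtract the offset, then rename the column to match its new unit.
fsd_toy["i:jd"] = fsd_toy["i:jd"] - JD_MINUS_MJD
fsd_toy = fsd_toy.rename(columns={"i:jd": "i:mjd"})

# Radio: convert each Timestamp to a Julian date, then apply the same offset.
vsd_toy["mjd"] = vsd_toy["time"].apply(pd.Timestamp.to_julian_date) - JD_MINUS_MJD
```

With both frames on the same MJD axis, the optical and radio epochs of a source can be compared directly.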