# FRBID prediction phase on new candidate files

    Authors : Zafiirah Hosenie
    Email : zafiirah.hosenie@gmail.com or zafiirah.hosenie@postgrad.manchester.ac.uk
    Affiliation : The University of Manchester, UK.
    License : MIT
    Status : Under Development
    Description : Python implementation for FRBID: Fast Radio Burst Intelligent Distinguisher.
    This code was tested with Python 3.6.


In [1]:
import warnings
warnings.filterwarnings("ignore")
import os
import numpy as np
import pandas as pd
from frbid_code.model import compile_model,model_save 
import matplotlib.pylab as plt
from keras.utils import np_utils
from time import gmtime, strftime
from frbid_code.util import makedirs, ensure_dir
from frbid_code.prediction_phase import load_candidate, FRB_prediction


Using TensorFlow backend.


In [2]:
# Parameters to change
data_dir = './data/testing_set/' # The directory where the hdf5 candidates are located
result_dir = './data/results_csv/' # The directory where the csv file after prediction will be saved
model_cnn_name = 'MULTIINPUT' # The network name; the only option listed is 'MULTIINPUT'
probability = 0.5 # The detection threshold

# Load the new candidates
- data_dir: The directory that contains the hdf5 files
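- n_images: A str selecting which images to load. The listed options are 'dm_fq_time', 'dm_time' and 'fq_time'; note that the call below passes 'dm_time_fq_time' and returns both the DM-time and the frequency-time images.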

In [3]:
# test, ID_test = load_candidate(data_dir=data_dir ,n_images=n_images)
test_dm, test_freq, ID_test    = load_candidate(data_dir = data_dir, n_images = 'dm_time_fq_time')
# test_freq, ID_test    = load_candidate(data_dir = data_dir, n_images = 'fq_time')

print("Total number of candidate instances: {}".format(str(len(ID_test))))
print("The Shape of the DM test set is {}".format(test_dm.shape))
print("The Shape of the Freq test set is {}".format(test_freq.shape))

Total number of candidate instances: 1291
The Shape of the DM test set is (1291, 256, 256, 1)
The Shape of the Freq test set is (1291, 256, 256, 1)
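
Before running the prediction, it can help to confirm that the two image arrays and the ID array line up. The following is a minimal sketch, assuming the shapes printed above; `check_candidate_arrays` is a hypothetical helper written for illustration and is not part of `frbid_code`.

```python
import numpy as np


def check_candidate_arrays(dm, freq, ids, image_shape=(256, 256, 1)):
    """Raise ValueError if the arrays are inconsistent; return the candidate count.

    Hypothetical helper for illustration, not part of frbid_code.
    """
    n = len(ids)
    expected = (n,) + tuple(image_shape)
    for name, arr in (("DM", dm), ("Freq", freq)):
        if arr.shape != expected:
            raise ValueError(
                "{} array has shape {}, expected {}".format(name, arr.shape, expected)
            )
    return n


# Small synthetic stand-ins for test_dm, test_freq and ID_test
dummy_dm = np.zeros((3, 256, 256, 1), dtype=np.float32)
dummy_freq = np.zeros((3, 256, 256, 1), dtype=np.float32)
dummy_ids = np.array(["a.hdf5", "b.hdf5", "c.hdf5"])
n_candidates = check_candidate_arrays(dummy_dm, dummy_freq, dummy_ids)
```

With the real data, the same call would be `check_candidate_arrays(test_dm, test_freq, ID_test)`.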


# Prediction on new candidate files
Here we load the pre-trained model and run it on the new candidates, using the parameters below.
INPUT:
- model_name: 'MULTIINPUT'
- X_test_dm, X_test_freq: Image data, each with shape (Nimages,256,256,1). The arrays available depend on the value chosen for n_images.
- ID: The candidate filenames
- result_dir: The directory in which to save the csv prediction file
- probability: The detection threshold

OUTPUT:
- overall_real_prob: An array with the probability that each source is an FRB. Values range from 0 to 1.0.
- overall_dataframe: A table listing the candidate name of every source, its probability of being an FRB source, and its label
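
The relation between the probabilities, the detection threshold and the labels can be sketched as follows. This is an illustration only: the column names follow the output table shown in this notebook, but `label_candidates` is a hypothetical helper, and the assumption that a label of 1 means `probability >= threshold` may not match the comparison used inside `FRB_prediction`.

```python
import numpy as np
import pandas as pd


def label_candidates(ids, probs, threshold=0.5):
    """Build a candidate/probability/label table from predicted probabilities.

    Assumption: label 1 means probability >= threshold; the comparison used
    inside FRB_prediction may differ.
    """
    probs = np.asarray(probs, dtype=np.float32)
    labels = (probs >= threshold).astype(int)
    return pd.DataFrame({"candidate": ids, "probability": probs, "label": labels})


# Made-up IDs and probabilities, for illustration only
example = label_candidates(["cand_a", "cand_b", "cand_c"], [0.98, 0.02, 0.5])
```

Raising the threshold trades completeness for purity: fewer candidates are labelled as FRBs, and those that remain carry higher predicted probabilities.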


In [4]:
overall_real_prob, overall_dataframe = FRB_prediction(
    model_name=model_cnn_name,
    X_test_dm=test_dm,
    X_test_freq=test_freq,
    ID=ID_test,
    result_dir=result_dir,
    probability=probability,
)


Instructions for updating:
keep_dims is deprecated, use keepdims instead
Loaded model:MULTIINPUT from disk


In [5]:
# The transient ID for each candidate
ID_test

array(['59049.5164386894_DM_42.98_beam_149C_frbid.hdf5',
       '59049.532366364_DM_114.20_beam_666C_frbid.hdf5',
       '59049.5356864285_DM_279.06_beam_477C_frbid.hdf5', ...,
       '59149.261510809105_DM_48.81_beam_0I_frbid.hdf5',
       '59149.2615524036_DM_36.84_beam_209C_frbid.hdf5',
       '59149.2615802313_DM_23.64_beam_206C_frbid.hdf5'], dtype='<U50')
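
The candidate filenames appear to encode a timestamp, a DM value and a beam identifier. The sketch below splits a name into those fields with a regular expression. The layout `<timestamp>_DM_<dm>_beam_<beam>_frbid.hdf5` is inferred from the names printed above and is an assumption, as is the meaning of each field; `parse_candidate_name` is a hypothetical helper, not part of `frbid_code`.

```python
import re

# Assumed layout: <timestamp>_DM_<dm>_beam_<beam>_frbid.hdf5
CANDIDATE_PATTERN = re.compile(
    r"^(?P<timestamp>\d+(?:\.\d+)?)"
    r"_DM_(?P<dm>\d+(?:\.\d+)?)"
    r"_beam_(?P<beam>\w+?)"
    r"_frbid\.hdf5$"
)


def parse_candidate_name(filename):
    """Split a candidate filename into timestamp, DM and beam fields.

    Returns None when the name does not match the assumed layout.
    """
    match = CANDIDATE_PATTERN.match(filename)
    if match is None:
        return None
    return {
        "timestamp": float(match.group("timestamp")),
        "dm": float(match.group("dm")),
        "beam": match.group("beam"),
    }


parsed = parse_candidate_name("59049.5164386894_DM_42.98_beam_149C_frbid.hdf5")
```

Applied to every entry of `ID_test`, this would give columns that can be joined to the prediction table, for example to inspect probability as a function of DM.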

In [6]:
# The probability that each candidate is a real source; it ranges from 0 to 1
overall_real_prob

array([1.00000000e+00, 1.71089638e-02, 4.81271098e-04, ...,
       3.27687982e-26, 0.00000000e+00, 1.04439086e-23], dtype=float32)

In [7]:
# A dataframe that contains the transient ID and the probability that it is a real source
# Note: with 1291 candidates, iloc[1900:] selects no rows, so only the column header is displayed
overall_dataframe.iloc[1900:,:]

Unnamed: 0,candidate,probability,label
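
The `Unnamed: 0` column is what pandas produces when a CSV written together with its row index is read back without declaring that index. Whether `FRB_prediction` round-trips its table through a CSV in this way is an assumption. The sketch below only reproduces the effect on made-up data and shows that `index_col=0` avoids it.

```python
import io

import pandas as pd

# Made-up table with the same columns as the prediction output
table = pd.DataFrame(
    {"candidate": ["cand_a", "cand_b"], "probability": [0.98, 0.02], "label": [1, 0]}
)

# Writing with the default index=True stores the row index as an unnamed first column
buffer = io.StringIO()
table.to_csv(buffer)

# Reading without index_col turns that column into "Unnamed: 0"
buffer.seek(0)
with_extra_column = pd.read_csv(buffer)

# Declaring the first column as the index restores the original layout
buffer.seek(0)
clean = pd.read_csv(buffer, index_col=0)
```

The same `index_col=0` argument can be used when reading a saved prediction file from `result_dir`.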
