# Hands-on Session: Writing a Filter

Welcome! In this notebook, you will learn to write a simple filter to search for transients in a sample set of alerts. 


As you learned this morning, the alert packets are stored in .avro files. For this session, a set of .avro files has been prepared for you. There are some real transients in this set of files (!) but there are way too many files to search by eye -- you will have to write a filter! As you do so, you will find it useful to refer to the alert schema, since these are the fields you will be able to use:
https://zwickytransientfacility.github.io/ztf-avro-alert/schema.html

We start by importing fastavro (to read the alert packets)
and checking out how many alerts there are in the dataset.

In [None]:
import fastavro
import glob

# You will have to modify this path to wherever you stored the alerts
# in the last session
files = glob.glob("../alerts/20180713/*.avro")
print("There are %s alerts in this dataset" %len(files))

As you can see, there are 835 alerts in this dataset. In practice, ZTF produces nearly a million alerts per night -- and LSST will produce nearly 10 million alerts per night!

We will put our filter into a function called my_filter. Ultimately, our filter will have several components. The first few components are important for almost any filter. (1) We want the transient to be real and not bogus, (2) we want to make sure that it's not an artifact from a bright star, and (3) we want to make sure it's a positive subtraction. Below is a filter that will apply these cuts.

In [None]:
def my_filter(current_observation):
    """ A filter to reduce the 835 alerts into a much smaller number! """
    
    # First part: initialize the filtering criteria.
    # Notice that these are initially all set to False.
    # If the source passes our criteria, 
    # we will set each of these accordingly and then check them at the end.
    passes_filter = False
    positivesubtraction = False
    real = False
    brightstar = False
    
    # If a source is a positive subtraction, then the field 'isdiffpos'
    # is set to either 't' (True) or '1' (another way to say True)
    isdiffpos = current_observation['candidate']['isdiffpos']
    if (isdiffpos and (isdiffpos == 't' or isdiffpos == '1')):
        positivesubtraction = True

    # To decide that a source is real, we use the real-bogus (RB) score
    # as well as some metrics that describe the shape of the PSF.
    rbscore = current_observation['candidate']['rb']
    fwhm = current_observation['candidate']['fwhm']
    nbad = current_observation['candidate']['nbad']
    m_now = current_observation['candidate']['magpsf']
    m_app = current_observation['candidate']['magap']
    psfminap = m_now - m_app
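    # Require |magpsf - magap| < 0.75: both bounds must hold at once
    # (joining them with "or" would be true for every value).
    if (rbscore and rbscore > 0.3 and fwhm > 0.5 and nbad < 5 and (-0.75 < psfminap < 0.75)):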
        real = True

    # Here are some complicated criteria to get rid of artifacts from bright stars.
    # We developed this as part of the ZTF commissioning period...
    # don't worry too much about it.
    sgscore = current_observation['candidate']['sgscore1']
    sgscore2 = current_observation['candidate']['sgscore2']
    sgscore3 = current_observation['candidate']['sgscore3']
    distpsnr1 = current_observation['candidate']['distpsnr1']
    distpsnr2 = current_observation['candidate']['distpsnr2']
    distpsnr3 = current_observation['candidate']['distpsnr3']
    srmag = current_observation['candidate']['srmag1']
    srmag2 = current_observation['candidate']['srmag2']
    srmag3 = current_observation['candidate']['srmag3']
    if (
        (distpsnr1 and srmag and distpsnr1 < 20 and srmag < 15.0 and srmag > 0 and sgscore > 0.49) 
        or (distpsnr2 and srmag2 and distpsnr2 < 20 and srmag2 < 15.0 and srmag2 > 0 and sgscore2 > 0.49) 
        or (distpsnr3 and srmag3 and distpsnr3 < 20 and srmag3 < 15.0 and srmag3 > 0 and sgscore3 > 0.49)):
        brightstar = True
    
    # Now that we have a basic filter, we can return whether or not
    # a source passed our criteria.
    passes_filter = real and positivesubtraction and not brightstar
    return passes_filter
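
The bright-star cut above repeats the same test for each of the three nearest catalog sources. As a minimal sketch, the same logic can be written as a loop over the suffixes 1, 2 and 3. The helper name `near_bright_star`, its keyword arguments and the mock values are assumptions made for illustration; they are not part of any ZTF tooling. Unlike the truthiness checks in `my_filter`, this version skips only values that are `None`, so a distance of exactly 0 is still tested.

```python
def near_bright_star(cand, max_dist=20.0, max_mag=15.0, min_sgscore=0.49):
    """Return True if any of the three nearest catalog sources looks like a bright star.

    `cand` is the inner 'candidate' dictionary of an alert packet.
    The default thresholds are copied from the cut in my_filter.
    """
    for i in (1, 2, 3):
        dist = cand.get('distpsnr%d' % i)   # distance to the i-th nearest source
        mag = cand.get('srmag%d' % i)       # r-band magnitude of that source
        sg = cand.get('sgscore%d' % i)      # star/galaxy score (near 1 = star-like)
        # A missing value cannot trigger the cut
        if dist is None or mag is None or sg is None:
            continue
        if dist < max_dist and 0 < mag < max_mag and sg > min_sgscore:
            return True
    return False


# Hand-made records (hypothetical values, not real alerts)
star_nearby = {'distpsnr1': 5.0, 'srmag1': 12.3, 'sgscore1': 0.98}
faint_neighbor = {'distpsnr1': 5.0, 'srmag1': 19.0, 'sgscore1': 0.98}

print(near_bright_star(star_nearby))     # True
print(near_bright_star(faint_neighbor))  # False
```

Trying a cut on a few hand-made records like these is a quick way to check that it behaves as intended before running it over the whole dataset.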

Let's try out the filter, and count how many survive out of the original 835.

In [None]:
count = 0
for f in files:
    with open(f, 'rb') as fo:
        reader = fastavro.reader(fo)
        candidate = next(reader, None)
        result = my_filter(candidate)
        if result:
            count += 1
print("%s candidates passed the filter" %count)

About 1/4 passed the filter. That's still a lot to look through by eye.
To make it more manageable, let's start introducing some astrophysical requirements.
In particular, we're going to try to find some supernovae. Supernovae are located in galaxies,
so one requirement is that our source shouldn't be co-located with (say, within 2 arcseconds of) a star.
Below is the same filter as above, with an additional requirement to not be a star.

In [None]:
def my_filter(current_observation):
    """ A filter to reduce the 835 alerts into a much smaller number! """
    
    # First part: initialize the filtering criteria.
    # Most of these start as False and are set to True when the source
    # passes the corresponding cut. The exception is nopointunderneath,
    # which starts as True and is set to False if a star lies underneath.
    # All of them are checked together at the end.
    passes_filter = False
    positivesubtraction = False
    real = False
    brightstar = False
    nopointunderneath = True
    
    # If a source is a positive subtraction, then the field 'isdiffpos'
    # is set to either 't' (True) or '1' (another way to say True)
    isdiffpos = current_observation['candidate']['isdiffpos']
    if (isdiffpos and (isdiffpos == 't' or isdiffpos == '1')):
        positivesubtraction = True

    # To decide that a source is real, we use the real-bogus (RB) score
    # as well as some metrics that describe the shape of the PSF.
    rbscore = current_observation['candidate']['rb']
    fwhm = current_observation['candidate']['fwhm']
    nbad = current_observation['candidate']['nbad']
    m_now = current_observation['candidate']['magpsf']
    m_app = current_observation['candidate']['magap']
    psfminap = m_now - m_app
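    # Require |magpsf - magap| < 0.75: both bounds must hold at once
    # (joining them with "or" would be true for every value).
    if (rbscore and rbscore > 0.3 and fwhm > 0.5 and nbad < 5 and (-0.75 < psfminap < 0.75)):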
        real = True

    # Here are some complicated criteria to get rid of artifacts from bright stars.
    # We developed this as part of the ZTF commissioning period...
    # don't worry too much about it.
    sgscore = current_observation['candidate']['sgscore1']
    sgscore2 = current_observation['candidate']['sgscore2']
    sgscore3 = current_observation['candidate']['sgscore3']
    distpsnr1 = current_observation['candidate']['distpsnr1']
    distpsnr2 = current_observation['candidate']['distpsnr2']
    distpsnr3 = current_observation['candidate']['distpsnr3']
    srmag = current_observation['candidate']['srmag1']
    srmag2 = current_observation['candidate']['srmag2']
    srmag3 = current_observation['candidate']['srmag3']
    if (
        (distpsnr1 and srmag and distpsnr1 < 20 and srmag < 15.0 and srmag > 0 and sgscore > 0.49) 
        or (distpsnr2 and srmag2 and distpsnr2 < 20 and srmag2 < 15.0 and srmag2 > 0 and sgscore2 > 0.49) 
        or (distpsnr3 and srmag3 and distpsnr3 < 20 and srmag3 < 15.0 and srmag3 > 0 and sgscore3 > 0.49)):
        brightstar = True
    
    # Not within 2 arcseconds of a star
    if (sgscore and distpsnr1 and sgscore > 0.76 and distpsnr1 < 2):
        nopointunderneath = False
    
    # Now that we have a basic filter, we can return whether or not
    # a source passed our criteria.
    passes_filter = real and positivesubtraction and not brightstar and nopointunderneath
    return passes_filter

How many candidates pass the filter this time?

In [None]:
count = 0
for f in files:
    with open(f, 'rb') as fo:
        reader = fastavro.reader(fo)
        candidate = next(reader, None)
        result = my_filter(candidate)
        if result:
            count += 1
print("%s candidates passed the filter" %count)

This is getting more manageable. Now, try copying the filter code above and adding an additional
criterion: the duration of the transient. Let's say that we expect a supernova to last less than a month.
Hint: use the fields 'jdendhist' and 'jdstarthist' (see the alert schema info page linked above). 
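
As a starting point for the hint, the duration can be sketched as the difference between the two Julian dates, which are both given in days. The helper name `duration_days` and the mock values are assumptions for illustration only; turning this into a cut inside your filter is left as the exercise.

```python
def duration_days(cand):
    """Days between the first and last detection recorded in the alert history.

    `cand` is the inner 'candidate' dictionary of an alert packet.
    Returns None if either Julian date is missing.
    """
    start = cand.get('jdstarthist')
    end = cand.get('jdendhist')
    if start is None or end is None:
        return None
    return end - start


# Hand-made records (hypothetical Julian dates, not real alerts)
mock_short = {'jdstarthist': 2458300.5, 'jdendhist': 2458312.5}
mock_long = {'jdstarthist': 2458200.5, 'jdendhist': 2458312.5}

print(duration_days(mock_short))  # 12.0
print(duration_days(mock_long))   # 112.0
```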


In [None]:
def supernovae(current_observation):
    """ A filter to reduce the 835 alerts into a much smaller number of candidate supernovae! """
    
    # copy the code from above, but this time add the criterion of the duration of the transient

How many sources were left this time? What are their ZTF names?

In [None]:
count = 0
for f in files:
    with open(f, 'rb') as fo:
        reader = fastavro.reader(fo)
        candidate = next(reader, None)
        result = supernovae(candidate)
        if result:
            print('%s passed the filter' %candidate['objectId'])
            count += 1
print("%s candidates passed the filter" %count)

What other criteria can you come up with?
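
One possibility, offered only as an illustration: require more than one detection at the same position, which can help reject moving objects and one-off artifacts. The sketch below assumes the schema field 'ndethist' (the number of spatially coincident detections in the survey history). The helper name `enough_detections` and the threshold of 2 are hypothetical choices, not recommended values.

```python
def enough_detections(cand, min_detections=2):
    """Return True if the source has been detected at least `min_detections` times.

    `cand` is the inner 'candidate' dictionary of an alert packet.
    A missing 'ndethist' value counts as failing the cut.
    """
    ndethist = cand.get('ndethist')
    return ndethist is not None and ndethist >= min_detections


# Hand-made records (hypothetical values, not real alerts)
print(enough_detections({'ndethist': 5}))  # True
print(enough_detections({'ndethist': 1}))  # False
```

A cut like this can be combined with the others in the same way as before, by and-ing its result into `passes_filter`.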