# Introduction to filtering an alert stream

For the hackathon closing out ZTF Summer School 2024, your challenge is to create your own alert broker which can consume a stream of alerts and filter for interesting objects.

This is a starter notebook for the hackathon. It will show you how to read a simulated ZTF alert stream and give some examples of how you can build your own filtering for those alerts. Previous notebooks from this week, for example "ztf_alert_filtering" from Day 1, provide additional examples that you may find useful to apply to this challenge.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Part 1 - Reading the alert stream

Note: here we simulate our Kafka consumer by reading the data with an iterator function. Some computers can run a true Kafka consumer over these couple of hundred thousand alerts in a matter of minutes, but we use the iterator to keep run times feasible for all users.

In [None]:
# TO ADD: count how many files are in data/20240707, i.e. how many alerts there were on 2024/07/07 that you will be processing in this challenge
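
One possible way to do the count, as a minimal sketch: the helper name `count_alert_files` is made up for illustration, and the demo runs on a throwaway directory of empty files so that it works even without the hackathon data.

```python
import pathlib
import tempfile

def count_alert_files(directory, pattern="*.avro"):
    """Count the files in `directory` whose names match `pattern`."""
    return sum(1 for _ in pathlib.Path(directory).glob(pattern))

# Demonstration on a temporary directory: three fake alert files plus one
# unrelated file that the "*.avro" pattern should ignore.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.avro", "b.avro", "c.avro", "notes.txt"):
        (pathlib.Path(tmp) / name).write_bytes(b"")
    n_demo = count_alert_files(tmp)

print(n_demo)
```

For the challenge data, the equivalent call would be `count_alert_files("data/20240707")`, assuming the alerts sit in that directory as `.avro` files.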



In [4]:
# run this cell to create the iterator
import pathlib
import time

def get_iterator(date):
    path = pathlib.Path("data") / date

    # grab the full list of alert files, sorted by name
    alert_files = sorted(path.glob("*.avro"))

    # yield the raw bytes of one alert file at a time
    for file in alert_files:
        yield file.read_bytes()

iterator = get_iterator("20240707")

In [17]:
# run this to get a sample of alerts from the night of 2024/07/07

import time

def get_n_alerts(alerts_iterator, n_events):
    start = time.time()
    events = []
    try:
        for _ in range(n_events):
            events.append(next(alerts_iterator))
    except StopIteration:
        print("No more events available.")
    
    print(f"Processed {len(events)} alerts in {time.time() - start} seconds")

    return events

test_alerts = get_n_alerts(iterator, 1000)  

Processed 1000 alerts in 1.368638515472412 seconds


In [None]:
# TO ADD: get all of the alerts from the night of 2024/07/07

In [18]:
import io
from fastavro import reader

def decode_alerts(message):
    # wrap the raw bytes in a file-like object so fastavro can read them
    bytes_io = io.BytesIO(message)
    # one Avro message can hold several records, so return them as a list
    return list(reader(bytes_io))

# test_alerts is a list of raw Avro messages (bytes)
test_dict = [decode_alerts(alert) for alert in test_alerts]

# test_dict now holds, for each message, the list of decoded alert records (dicts)

In [20]:
len(test_dict)

1000
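
Since each entry of `test_dict` is itself a list of decoded records, it can be convenient to flatten everything into one table before filtering. The sketch below shows one way to do that with `pd.json_normalize`, which turns nested fields into dotted column names such as `candidate.drb`. The toy records and their values are invented for illustration; real alerts carry many more fields.

```python
import pandas as pd

# Toy stand-in for test_dict: one list of decoded records per Avro message.
# Object IDs and values are made up for illustration only.
toy_decoded = [
    [{"objectId": "ZTF00aaaaaaa", "candid": 1,
      "candidate": {"jd": 2460498.7, "magpsf": 18.2, "drb": 0.97}}],
    [{"objectId": "ZTF00bbbbbbb", "candid": 2,
      "candidate": {"jd": 2460498.8, "magpsf": 19.5, "drb": 0.12}}],
]

# Flatten the list of lists into a single list of records ...
records = [record for message in toy_decoded for record in message]

# ... and expand nested dicts into dotted columns (e.g. 'candidate.drb').
alerts_demo = pd.json_normalize(records)

print(alerts_demo.columns.tolist())
```

Note that `json_normalize` only expands nested dictionaries. A field that holds a list, such as `prv_candidates`, stays as a single column of lists unless you handle it separately.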

In [None]:
# TO ADD: get the id for one of the alerts, and look it up on your choice of ZTF alert broker, for example ALeRCE (https://alerce.online)
# you would be very lucky to find something interesting by chance, but this will be a helpful reference once you have designed a filter



# Part 2 - Filtering the alert stream

The next cells will start with some general suggested steps in building your filter. Some suggested techniques for filtering are:

- Make cuts on [alert parameters](https://zwickytransientfacility.github.io/ztf-avro-alert/schema.html) (as you did in ztf_summer_school_2024/lectures/01/ztf_alert_filtering.ipynb)

- Use prv_candidates to understand the alert history, and do things like compute the rate at which transients are rising and falling (hint: np.polyfit)

- Use a classifier trained in a previous notebook or train a new one

- Try something new!
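
As a hedged illustration of the `np.polyfit` hint above, the sketch below fits a straight line to magnitude versus time and returns the slope in mag/day. The function name `mag_rate` and the demo light curve are invented for illustration. Because brighter sources have smaller magnitudes, a negative slope means the source is rising.

```python
import numpy as np

def mag_rate(jd, mag):
    """Least-squares slope of magnitude versus time, in mag/day.

    Entries that are None or NaN (e.g. non-detections) are ignored.
    Returns NaN if fewer than two usable points remain.
    """
    jd = np.asarray(jd, dtype=float)
    mag = np.asarray(mag, dtype=float)
    good = np.isfinite(jd) & np.isfinite(mag)
    if good.sum() < 2:
        return np.nan
    # Shift times to start at zero to keep the fit well conditioned.
    slope, _intercept = np.polyfit(jd[good] - jd[good].min(), mag[good], deg=1)
    return slope

# Made-up light curve: brightening by 0.25 mag per day.
jd_demo = [2460490.0, 2460492.0, 2460494.0, 2460496.0]
mag_demo = [20.0, 19.5, 19.0, 18.5]
rate = mag_rate(jd_demo, mag_demo)
print(rate)
```

On real alerts you would probably want to fit each filter band separately, since magnitudes in different bands are not directly comparable.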

In [None]:
# To Add: it is recommended to start by printing all of the fields in the alert schema, and looking up anything you don't understand in the alert parameters
# link above.
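
One way to list the fields, sketched under the assumption that a decoded alert is a nested dictionary: walk the dictionary recursively and build dotted names. The helper `list_fields` and the toy alert are invented for illustration.

```python
def list_fields(record, prefix=""):
    """Return the dotted names of all fields in a nested dict, depth first."""
    names = []
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested dicts such as 'candidate'.
            names.extend(list_fields(value, prefix=f"{name}."))
        else:
            # Lists (e.g. 'prv_candidates') are reported as a single field.
            names.append(name)
    return names

# Toy alert with made-up values; a real alert has many more fields.
toy_alert = {
    "objectId": "ZTF00aaaaaaa",
    "candidate": {"jd": 2460498.7, "drb": 0.97},
    "prv_candidates": [{"jd": 2460495.7}],
}

for field in list_fields(toy_alert):
    print(field)
```

With the decoded sample from Part 1, `list_fields(test_dict[0][0])` would list the fields of the first alert.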


In [None]:
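# To Add: As you did on the first day, one important first step in your filter will be to filter out "bogus" alerts.
# Science groups have a wide range of ways to do this, from a simple cut on drb to much longer filters that use several fields.
# Either (a) decide what drb value to cut at, or (b) design your own set of conditions to remove bogus alerts.

# Assumption: `alerts` is a flattened table of the decoded records, built here with pd.json_normalize
# so that nested fields get dotted column names such as 'candidate.drb'
alerts = pd.json_normalize([record for message in test_dict for record in message])
alerts_real = alerts[alerts['candidate.drb'] > 0.5]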

In [None]:
# To Add: Another good first step of your filter would be to filter out solar system objects. 
# Hint: try using the fields ssdistnr and magnr to create such a filter. How many objects are you cutting out with this filter?
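
A minimal sketch of one such cut, using only `ssdistnr`. It assumes that `ssdistnr` is the distance in arcseconds to the nearest known solar system object and that it is set to -999 when there is no match; check the alert parameters link to confirm. The 10 arcsecond threshold and the toy table are arbitrary choices for illustration, not recommended values.

```python
import pandas as pd

# Toy table with made-up values, using the dotted column naming
# that a flattened alert table would have.
toy = pd.DataFrame({
    "objectId": ["ZTF00aaaaaaa", "ZTF00bbbbbbb", "ZTF00ccccccc"],
    "candidate.ssdistnr": [-999.0, 2.0, 45.0],
})

max_sep_arcsec = 10.0  # hypothetical threshold, tune for your own filter

# An alert is flagged as a likely solar system object if a known object lies
# within the threshold. Negative sentinel values and NaN are never flagged.
near_sso = toy["candidate.ssdistnr"].between(0, max_sep_arcsec)

alerts_not_sso = toy[~near_sso]
n_cut = int(near_sso.sum())
print(f"Cut {n_cut} of {len(toy)} alerts")
```

Adding a magnitude condition, as the hint suggests, would be a natural next step.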



# Time to start!

Reminder: your challenge is to find one or more of these kinds of objects in the alert stream, with filters that take minimal time to run.

1. Supernovae
2. AGN
3. Variable stars
4. Aliens ??
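
Because run time is part of the challenge, it can help to time each filter as you build it. The sketch below is one way to do that with `time.perf_counter`. The helper `time_filter`, the toy filter, and the toy alerts are all invented for illustration.

```python
import time

def time_filter(filter_func, alerts, repeats=3):
    """Run `filter_func(alerts)` several times.

    Returns the result of the last run and the best wall-clock time in seconds.
    """
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = filter_func(alerts)
        best = min(best, time.perf_counter() - start)
    return result, best

def toy_filter(alerts):
    # Stand-in for a real filter: keep alerts with drb above 0.5.
    return [alert for alert in alerts if alert["drb"] > 0.5]

# 1000 made-up alerts with drb values spread evenly between 0 and 1.
toy_alerts = [{"drb": i / 1000} for i in range(1000)]

kept, seconds = time_filter(toy_filter, toy_alerts)
print(f"Kept {len(kept)} alerts in {seconds:.6f} seconds")
```

Taking the best of a few runs reduces the effect of other activity on your machine, which makes it easier to compare two versions of a filter.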