# Mini Spark Broker: LSST alert system & ZTF alerts

author: **Julien Peloton** [@JulienPeloton](https://github.com/JulienPeloton)  
Last Verified to Run: 2019-02-04  

Welcome to the first part of this broker bootcamp!
The purpose of this notebook is to describe the LSST alert system.
For this bootcamp, we will be using the ZTF public alerts. This repo contains all you need to download the alerts, make a stream from them, and play with the stream!

**Useful Links:**

* https://github.com/lsst-dm/alert_stream
* https://docs.docker.com/compose/
* https://kafka.apache.org/

**Reference documents mentioning the LSST alert system:**

* LDM-612: Plans and Policies for LSST Alert Distribution: [repo](https://github.com/lsst/LDM-612)
* DMTN-028: Benchmarking a distribution system for LSST alerts: [webpage](https://dmtn-028.lsst.io/)
* DMTN-081: Deploying an alert stream mini-broker prototype: [webpage](https://dmtn-081.lsst.io/)
* DMTN-092: Alert Production Pipeline Interfaces: [webpage](https://dmtn-092.lsst.io/)
* DMTN-093: Design of the LSST Alert Distribution System: [webpage](https://dmtn-093.lsst.io/)

## LSST alert system

### Definition

After each visit, the LSST alert-time Prompt Processing pipelines will report alerts, based mainly on image differencing. The LSST alert system will collect these alerts and issue them as a stream. Third-party community brokers will then receive the full stream and refine the selection of events of interest, extracting the scientifically relevant ones. However, given the high volume of data to be transferred, only a limited number of brokers will be allowed to connect to the full stream. LSST will therefore also provide a simple filtering service for smaller-scale use of the alert stream. In this note, we focus only on a community broker receiving the full stream of alerts.

Data processing leading to alerts will occur in the LSST Data Facility (LDF) at the National Center for Supercomputing Applications (NCSA) in Illinois, USA. The LDF also hosts the alert stream feeds to community brokers.

Note from LDM-612: _Due to the need for Data Release Production-derived templates, Alert Production cannot run at full scale and full fidelity during commissioning nor the first year of operations. LSST DM is currently investigating options for Alert Production in year one._

### Anatomy of an alert: Apache Avro

The format chosen for the alerts is Apache Avro. Each Avro file has a header containing the metadata, including the schema written in JSON, and the data is serialised in a compact binary format.
An alert contains information about the detection itself (ID, timestamp, ...) as well as the historical lightcurve, cutout images, timeseries features, and other contextual information. An alert is typically O(100) KB.
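
To make "compact binary format" more concrete, here is a minimal sketch that hand-encodes a toy record following the Avro binary encoding rules: zig-zag variable-length integers for `long`, length-prefixed UTF-8 for `string`, and 8 little-endian bytes for `double`. The schema `toy_alert`, its three fields, and the values are hypothetical and far smaller than a real alert schema. In practice you would use an Avro library rather than this hand-rolled encoder.

```python
import json
import struct

# Hypothetical, heavily reduced schema; real alert schemas have many more fields.
SCHEMA = {
    "type": "record",
    "name": "toy_alert",
    "fields": [
        {"name": "objectId", "type": "string"},
        {"name": "candid", "type": "long"},
        {"name": "jd", "type": "double"},
    ],
}


def encode_long(n):
    """Avro long: zig-zag mapping, then 7 bits per byte, least significant first.

    Assumes n fits in a signed 64-bit integer.
    """
    z = (n << 1) ^ (n >> 63)  # maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s):
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_double(x):
    """Avro double: 8 bytes, little-endian IEEE 754."""
    return struct.pack("<d", x)


ENCODERS = {"string": encode_string, "long": encode_long, "double": encode_double}


def encode_record(schema, record):
    """A record is its fields concatenated in schema order, without field names."""
    return b"".join(ENCODERS[f["type"]](record[f["name"]]) for f in schema["fields"])


# Made-up values, for illustration only.
alert = {"objectId": "ZTF18abcdefg", "candid": 1, "jd": 2458518.5}
binary = encode_record(SCHEMA, alert)
as_json = json.dumps(alert).encode("utf-8")
print(len(binary), "bytes in Avro binary vs", len(as_json), "bytes as JSON")
```

Because the field names live only in the schema, each encoded record carries values alone, which is where most of the size saving over JSON comes from.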

### Alert distribution: Apache Kafka

Apache Kafka is a distributed streaming platform. From our perspective, that of a client of a Kafka cluster, its most relevant aspects are (complete documentation [here](https://kafka.apache.org/documentation/)):

* Kafka publishes streams of records (= the alerts). 
* Kafka stores streams of records in a fault-tolerant, durable way (= alerts are kept for some time, with the guarantee that they will not disappear).
* For each stream of records, the Kafka cluster maintains a partitioned log (= alerts are distributed among several machines).

In addition, two important properties:

* Communications between Kafka servers and the outside world use a simple, language-agnostic TCP protocol.
* The Kafka cluster durably persists all published records, whether or not they have been consumed, for a configurable retention period.


Kafka has its own terminology. Here is an attempt to define the most commonly used terms and relate them to LSST alert system components:

| Kafka name | Kafka definition | Corresponding LSST alert system object |
|------------|------------------|----------------------------------------|
| topic      | category or feed name to which records are published (= stream of records) | category or feed name to which alerts are published (= stream of alerts) |
| record     | an entry to the commit log. Each record consists of a key, a value, and a timestamp. | alert |

Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it.
Each partition of a topic is an ordered, immutable sequence of records that is continually appended to, forming a structured commit log. Each record in a partition is assigned a sequential `id` number, called the `offset`, that uniquely identifies it within the partition.
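
The notions of topic, partition, offset, and multi-subscriber consumption can be illustrated with a small in-memory model. This is only a sketch: `ToyTopic` and `ToyConsumer` are hypothetical classes, not part of any Kafka client library, and the byte-sum partitioning is a stand-in for whatever partitioner a real client uses.

```python
class ToyTopic:
    """In-memory stand-in for a topic: a list of append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        """Append a record to one partition and return (partition, offset)."""
        # Deterministic toy partitioner (assumption, not Kafka's actual one).
        partition = sum(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1


class ToyConsumer:
    """Each consumer keeps its own read position per partition."""

    def __init__(self, topic):
        self.topic = topic
        self.next_offsets = [0] * len(topic.partitions)

    def poll(self):
        """Return all records not yet seen by this consumer."""
        records = []
        for partition, log in enumerate(self.topic.partitions):
            start = self.next_offsets[partition]
            for offset in range(start, len(log)):
                key, value = log[offset]
                records.append((partition, offset, key, value))
            self.next_offsets[partition] = len(log)
        return records


topic = ToyTopic("toy_alerts", num_partitions=2)
for i in range(6):
    topic.publish(key=f"alert{i}", value=f"payload {i}")

broker_a = ToyConsumer(topic)
broker_b = ToyConsumer(topic)

first = broker_a.poll()   # broker_a reads everything published so far
second = broker_a.poll()  # nothing new since the last poll
late = broker_b.poll()    # broker_b still sees every record: reading does not delete

for partition, offset, key, value in first:
    print(f"partition={partition} offset={offset} key={key}")
```

Records stay in the log after being read, so a second consumer that connects later still receives the same records with the same offsets. The model leaves out retention, replication, and consumer groups.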

### Numbers and Requirements

#### Alert Stream size

Alerts are issued within 60 seconds of the shutter closure after each visit. Each visit produces O(10,000) alerts, each alert is O(100) KB, and there are O(30) seconds between two consecutive visits. For 8 hours of observation per night, this leads to a stream of O(1) TB per night.
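
The nightly volume follows directly from these order-of-magnitude numbers. A quick check, using decimal units (1 KB = 1,000 bytes):

```python
ALERTS_PER_VISIT = 10_000   # O(10,000) alerts per visit
ALERT_SIZE_KB = 100         # O(100) KB per alert
VISIT_CADENCE_S = 30        # O(30) seconds between two visits
NIGHT_HOURS = 8             # hours of observation per night

visits_per_night = NIGHT_HOURS * 3600 // VISIT_CADENCE_S
bytes_per_visit = ALERTS_PER_VISIT * ALERT_SIZE_KB * 1_000
bytes_per_night = visits_per_night * bytes_per_visit

print(f"{visits_per_night} visits per night")
print(f"{bytes_per_visit / 1e9:.1f} GB per visit")
print(f"{bytes_per_night / 1e12:.2f} TB per night")
```

This gives 960 visits, about 1 GB per visit, and about 0.96 TB per night, consistent with the O(1) TB quoted above.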

#### Number of Brokers with full stream access

An allocation of 10 Gbps is baselined for alert stream transfer from the LDF. Given the alert stream size, this constrains the number of brokers that can receive the full stream of alerts to a few.
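
A rough way to see how bandwidth limits the number of brokers is sketched below. It assumes that each broker receives its own full copy of the stream, that protocol overhead is negligible, and that all alerts of a visit must be delivered within a transfer window. That window (`transfer_window_s`) is a hypothetical free parameter used for illustration, not an LSST requirement.

```python
LINK_BITS_PER_S = 10 * 10**9               # 10 Gbps baseline allocation
BITS_PER_VISIT = 10_000 * 100 * 1_000 * 8  # O(10,000) alerts x O(100) KB, in bits


def seconds_per_copy():
    """Time to send one full copy of a visit's alerts over the whole link."""
    return BITS_PER_VISIT / LINK_BITS_PER_S


def max_full_stream_brokers(transfer_window_s):
    """Number of full copies that fit in the link within the given window."""
    return int(LINK_BITS_PER_S * transfer_window_s // BITS_PER_VISIT)


print(f"one full copy of a visit takes {seconds_per_copy():.1f} s at 10 Gbps")
for window in (1, 5, 30):  # hypothetical transfer windows, in seconds
    n = max_full_stream_brokers(window)
    print(f"window = {window:>2} s -> at most {n} full-stream brokers")
```

Spreading a visit over the full 30-second cadence would allow a few tens of copies, while a window of a few seconds allows only a handful, so the acceptable delivery latency largely sets the limit.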

#### Data retention periods

Keeping alert data available for some time is generally a good idea, as it allows a longer time window to analyse alerts if needed. All alerts will be stored in an archive in the DACs (incl. CC-IN2P3). *Need to know when this is updated/done and accessible*.

#### Shortest time to alert data products

A number of measurements will be stored in the Prompt Product Database at alert time (DIASource, etc.), and others will be available within 24 hours (e.g. forced photometry, survey precovery for a limited number of objects, processed visit images). This service will be accessible through the LSST Science Platform.


## ZTF alerts

Documentation can be found at: https://github.com/ZwickyTransientFacility/ztf-avro-alert