Currently ZTF alerts are inserted into the archive database along with the index of the Kafka partition they were read from. When reading back from the archive, each client is assigned a partition index and reads only alerts with that partition id. Since the UW Kafka mirror has 16 partitions per topic, this means that exactly 16 AlertProcessors must be used to consume alerts from the archive: using more leaves workers idle, while using fewer (e.g. 7) sees only a fraction (7/16) of the alerts.
Instead, it should be possible to split alerts from the archive across an arbitrary number of AlertProcessors. The most straightforward way to do this requires implementing a shared queue in PostgreSQL. Here's a sketch of the design:
Similar to Kafka, consumers will identify themselves by a shared group name. Unlike Kafka, though, the group name is associated with a query, so different groups can consume different sets of alerts.
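A minimal schema sketch for the two tables involved might look like the following; beyond the table names `read_queue_groups` and `read_queue` given in the text, the column names and types are assumptions, not the actual archive schema:

```sql
-- Hypothetical schema sketch. The unique constraint on group_name is what
-- lets consumers race to create a group (see the next step).
CREATE TABLE read_queue_groups (
    group_id   serial PRIMARY KEY,
    group_name text NOT NULL UNIQUE  -- shared group name consumers register under
);

CREATE TABLE read_queue (
    item_id   bigserial PRIMARY KEY,
    group_id  integer NOT NULL REFERENCES read_queue_groups (group_id),
    alert_ids bigint[] NOT NULL      -- one block of alert ids, e.g. 5000 per row
);
```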
Upon start, each consumer will attempt to insert a new row into the read_queue_groups table, which has a unique constraint on the group name. At most one consumer will succeed; it becomes the alert query executor and populates the queue as described below. All the others, as well as every consumer in the case where the queue for this group already existed, skip straight to consuming blocks from the queue.
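The registration race can be expressed with a single insert; the group name here is a placeholder:

```sql
-- Each consumer runs this on startup. Exactly one INSERT succeeds;
-- RETURNING yields a row only for the winner, which becomes the query
-- executor. Losers, and consumers joining a pre-existing group, get
-- zero rows back and go straight to consuming blocks.
INSERT INTO read_queue_groups (group_name)
VALUES ('my-group')
ON CONFLICT (group_name) DO NOTHING
RETURNING group_id;
```

Using `ON CONFLICT DO NOTHING` conveniently collapses the "lost the race" and "group already existed" cases into one: both observe an empty result.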
The alert query executor runs the query (e.g. an observation date range and/or cone search) and inserts rows of (group_id, array(alert_id)) into the read_queue table, where the size of each array is a predetermined block size, e.g. 5000. The block size should be large enough that the row is much larger than Postgres' fixed per-tuple overhead, but not so large that hours of work would be lost if a consumer were shut down before it finished processing a block.
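The executor's insert can be done in one statement by numbering the matching alerts and grouping them into blocks. The alert table name, the `jd` date-range predicate, and the literal group_id (standing in for the id returned at registration) are all assumptions for illustration:

```sql
-- Sketch: fill the queue in blocks of 5000 alert ids for group_id 42.
INSERT INTO read_queue (group_id, alert_ids)
SELECT 42, array_agg(alert_id)
FROM (
    SELECT alert_id,
           -- assign consecutive alerts to blocks of 5000
           (row_number() OVER (ORDER BY alert_id) - 1) / 5000 AS block
    FROM alert
    WHERE jd BETWEEN 2458300.5 AND 2458330.5
) numbered
GROUP BY block;
```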
Each consumer (including the executor, once it has populated the queue) then repeatedly selects a block of alert_ids from the read_queue table, using FOR UPDATE SKIP LOCKED to atomically acquire a lock on an unclaimed row. Because the lock is released automatically if the consumer dies before it finishes, a partially processed block becomes available to other consumers instead of being lost.
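The claim-and-process loop could look like the following, with the literal group_id again standing in for the registered id:

```sql
-- Claim one unprocessed block. SKIP LOCKED makes concurrent consumers
-- skip rows another transaction already holds, so each block is handed
-- out at most once at a time. If the connection dies mid-transaction,
-- the row lock is dropped and the block becomes claimable again.
BEGIN;

SELECT item_id, alert_ids
FROM read_queue
WHERE group_id = 42
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- ...process the alerts in the claimed block...

-- Remove the block in the same transaction, so the lock is held until
-- commit ($1 is the item_id returned by the SELECT above).
DELETE FROM read_queue WHERE item_id = $1;

COMMIT;
```

Deleting the claimed row in the same transaction is what makes the scheme at-least-once: a block is removed from the queue only after it has been fully processed, and a crash before COMMIT simply returns it to the pool.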