An end-to-end active learning anomaly detection pipeline for time-domain astronomy (ZTF). It uses RTF autoencoders for rapid candidate filtering, Bayesian light curve fitting for physics extraction, and LLMs for real-time scientific triage and reporting.
SkyPortal UI Integration: The frontend React widget (CYOAWidget.jsx) is being developed as a separate contribution to the main SkyPortal repository. WIP Pull Request: skyportal/skyportal#6068
The pipeline is designed to run in an HPC environment (NCSA Delta) using SLURM. It executes 4 modular stages sequentially:
Downloads nightly ZTF alert archive tarballs from ztf.uw.edu, unpacks the Avro files, and structures them into contiguous alerts.npy arrays for fast sequential reads.
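The stacking step can be sketched as below. This is a minimal illustration, not the pipeline's actual unpack code: the field names are a hypothetical flat subset of the ZTF alert schema (real packets are nested and need flattening first), and the function is decoupled from file I/O so any iterable of decoded records works.

```python
import numpy as np

def stack_alerts(records, fields=("ra", "dec", "magpsf")):
    """Stack decoded alert records into one contiguous float array.

    `records` is any iterable of dicts, e.g. the output of
    fastavro.reader; `fields` is an illustrative flat subset of the
    ZTF alert schema.
    """
    return np.asarray(
        [[float(rec[f]) for f in fields] for rec in records],
        dtype=np.float64,
    )

# Typical use with the fastavro dependency listed under installation:
# from fastavro import reader
# with open("alert.avro", "rb") as fo:
#     arr = stack_alerts(reader(fo))
# np.save("alerts.npy", arr)  # contiguous array for fast sequential reads
```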
To run manually:

```bash
sbatch ingest/slurm_unpack.sh 20260408
```

Sweeps the RTF autoencoder over 150,000+ objects per night using multi-GPU batches. Calculates anomaly scores via Isolation Forest and reconstruction thresholding to flag the top 5% of transients.
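The scoring step just described could combine the two signals roughly as follows. This is a sketch under stated assumptions: the 50/50 blend, the min-max normalization, and the function signature are illustrative, not the sweep script's actual logic; only the 5% contamination mirrors the top-5% flagging above.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def anomaly_scores(latent, recon_error, contamination=0.05):
    """Combine Isolation Forest and reconstruction-error signals.

    `latent` is an (n, d) autoencoder bottleneck matrix; `recon_error`
    is the per-object reconstruction error. The equal-weight blend is
    an assumption for illustration.
    """
    forest = IsolationForest(contamination=contamination, random_state=0)
    forest.fit(latent)
    # score_samples: higher means more normal, so negate for an anomaly score
    if_score = -forest.score_samples(latent)

    def norm(x):
        # min-max normalize each signal into [0, 1] before averaging
        return (x - x.min()) / (np.ptp(x) + 1e-12)

    return 0.5 * norm(if_score) + 0.5 * norm(recon_error)
```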
To run manually:

```bash
sbatch inference/slurm_mass_infer.sh 20260408
```

Passes anomalous targets to the Rust-based Bayesian physical modeling backend (boom-fit-batch). Extracts thermal evolution, rise times, cooling rates, and standardized constraints (e.g. dm15, TDE decay power-law slopes).
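As a rough picture of what "extracting a decay power-law slope" means, here is a least-squares stand-in for the Bayesian backend; the model form and starting values are assumptions, and the real fitting happens in the Rust boom-fit-batch code, not here.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_decay_slope(t, flux):
    """Least-squares fit of flux(t) = amp * t**(-alpha) to a decline.

    Illustrative stand-in for the Bayesian backend; for a TDE the
    canonical fallback-rate slope is alpha ~ 5/3.
    """
    def power_law(t, amp, alpha):
        return amp * np.power(t, -alpha)

    (amp, alpha), _ = curve_fit(power_law, t, flux, p0=(flux[0], 1.0))
    return alpha
```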
To run manually:

```bash
sbatch precision_fitting/slurm_precision_fit.sh 20260408
```

Transforms extracted physics parameters into expert-level scientific context prompts. Sends objects to an LLM (Llama-3.3-70B) for 10-class transient classification, confidence scoring, and scientific reasoning.
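A toy version of the prompt-building step might look like this. The wording and the `physics` keys (dm15, rise_time) are illustrative; the real template carries much richer scientific context.

```python
def build_triage_prompt(oid, physics):
    """Render extracted physics parameters into a classification prompt.

    `oid` is the object identifier; `physics` maps hypothetical
    parameter names to fitted values.
    """
    lines = [
        f"Object {oid}: classify into one of 10 transient classes.",
        "Extracted physics:",
    ]
    for key, value in sorted(physics.items()):
        lines.append(f"  - {key}: {value:.3g}")
    lines.append("Return a class, a confidence in [0, 1], and brief reasoning.")
    return "\n".join(lines)

# The prompt is then sent through the Groq SDK (model id may differ):
# from groq import Groq
# client = Groq()  # reads GROQ_API_KEY from the environment
# resp = client.chat.completions.create(
#     model="llama-3.3-70b-versatile",
#     messages=[{"role": "user", "content": build_triage_prompt(oid, physics)}],
# )
```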
To run manually:

```bash
export GROQ_API_KEY="your_api_key_here"
python triage/triage.py \
    --json-dir /work/hdd/bcrv/kmajithia/sweep_results/20260408/applecider/json \
    --anomaly-csv /work/hdd/bcrv/kmajithia/sweep_results/20260408/threshold_anomalies.csv \
    --output /work/hdd/bcrv/kmajithia/sweep_results/20260408/triage_verdicts.jsonl \
    --api-key $GROQ_API_KEY \
    --top-n 50
```

Pushes vetted anomalies into the Fritz SkyPortal database via POST /api/sources/{oid}/annotations. Human reviewers provide feedback natively inside the Fritz interface. The retrain.py loop periodically sweeps those annotations to synthesize new training datasets (human clicks, LLM reviews, and statistical pseudo-labels) and updates the per-group classifiers using HistGradientBoosting.
To run manually:

```bash
export FRITZ_TOKEN="your_fritz_token"
python active_learning/push_to_skyportal.py --verdicts triage_verdicts.jsonl
python active_learning/retrain.py --group-id 1947
```

Set your environment variables:

```bash
export GROQ_API_KEY="your_groq_key"
export FRITZ_TOKEN="your_fritz_token"
```

Then execute the master orchestrator for the current date:

```bash
chmod +x run_nightly_cron.sh
./run_nightly_cron.sh
```

This script acts as a SLURM dispatcher, waiting for each HPC job to complete before launching the next stage.
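One common way to express such stage-by-stage sequencing is SLURM job dependencies. The helper below is hypothetical (the actual script may instead block on each job): it only builds the sbatch command strings, chaining stages with `--parsable` and `--dependency=afterok`.

```python
def chain_sbatch_commands(scripts, date):
    """Build sbatch commands where each stage waits on the previous one.

    `scripts` lists the per-stage SLURM scripts in order; `date` is the
    nightly date argument they all take. Returns shell command strings.
    """
    commands = []
    prev_jid_var = None
    for i, script in enumerate(scripts):
        # afterok: only start this stage if the previous job exited cleanly
        dep = f" --dependency=afterok:${prev_jid_var}" if prev_jid_var else ""
        jid_var = f"JID{i}"
        # --parsable makes sbatch print just the job ID for capture
        commands.append(f"{jid_var}=$(sbatch --parsable{dep} {script} {date})")
        prev_jid_var = jid_var
    return commands
```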
```bash
pip install numpy pandas scikit-learn pyarrow groq requests fastavro
```

Cargo (Rust) is required for compiling the AppleCiDEr fitting engine in the precision fitting stage.