Probabilistic Autoencoder (PAE) implementation for SED modeling and redshift estimation of SPHEREx spectrophotometry. This software release corresponds to Feder+2026 (arXiv:xx).
This repository focuses on the main code to train and execute PAESpec on multi-band photometry. Large data products are not bundled and must be provided locally by users.
We recommend creating a dedicated Python environment for PAESpec rather than installing into your base environment. This avoids version conflicts with other JAX/ML projects.
# Choose any environment directory name you like.
ENV_NAME=.venv-paespec
python -m venv "$ENV_NAME"
source "$ENV_NAME/bin/activate"
pip install -r requirements.txt

The main dependency groups are:
- JAX ecosystem: jax, flax, optax, blackjax, jaxopt
- Flow and model tooling: flowjax, equinox, distrax, paramax
- Scientific Python stack: numpy, scipy, pandas, astropy, matplotlib
- Utilities: PyYAML (YAML config support)
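After installing, a quick check that the listed packages are importable can catch a broken environment before a long training run. The sketch below is illustrative and not part of the repository: the helper name `missing_modules` and the list `PAESPEC_MODULES` are hypothetical, and the import names are assumed to match the package names above, with PyYAML imported as `yaml`.

```python
from importlib.util import find_spec

# Import names for the packages listed above (assumed to match the package
# names). PyYAML is the exception: it is imported as "yaml".
PAESPEC_MODULES = [
    "jax", "flax", "optax", "blackjax", "jaxopt",
    "flowjax", "equinox", "distrax", "paramax",
    "numpy", "scipy", "pandas", "astropy", "matplotlib",
    "yaml",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(PAESPEC_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

This only checks that each module can be located; it does not verify versions or that JAX sees an accelerator.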
- scripts/train_pae_autoencoder.py
- scripts/redshift_job_mock_batched.py
- scripts/generate_redshift_plots.py
Train PAE model (YAML-driven):
python scripts/train_pae_autoencoder.py \
--config-yaml configs/public_mock_template.yaml

Run mock redshift evaluation:
python scripts/redshift_job_mock_batched.py \
--config-yaml configs/public_mock_template.yaml

Generate summary redshift plots:
python scripts/generate_redshift_plots.py \
--datestr my_first_paespec_eval \
--no-show

Collate per-batch outputs (if needed):
python scripts/collate_batched_results.py \
--datestr my_first_paespec_eval

If you are reading or extending the code, these are the central call points:
- models/pae_jax.py, initialize_PAE(...): main model setup used by redshift inference scripts
- sampling/sample_pae_batch_refactor.py, sample_mclmc_wrapper(...): core MCLMC sampling wrapper
- training/train_ae_jax.py, run_ae_sed_fit_jax(...): AE training loop
- models/flow_jax.py, fit_flow_to_latents_jax(...): flow fitting used after AE latent extraction
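The argument lists of these functions are not reproduced here, so one way to explore them is to print their signatures with the standard library. The helper below, `describe_callable`, is a hypothetical name introduced for illustration. It is demonstrated on `json.dumps` so that it runs anywhere; pointing it at the PAESpec entry points assumes you run it from the repository root with the environment active and that `models`, `sampling`, and `training` are importable as packages.

```python
import importlib
import inspect


def describe_callable(module_name, attr_name):
    """Return 'module.attr(signature)' for a callable looked up by name."""
    module = importlib.import_module(module_name)
    func = getattr(module, attr_name)
    return f"{module_name}.{attr_name}{inspect.signature(func)}"


# Demonstrated on a standard-library function so the snippet runs anywhere.
print(describe_callable("json", "dumps"))

# From the repository root, the same helper could be pointed at the entry
# points listed above (assumes the directories import as packages), e.g.:
#   describe_callable("models.pae_jax", "initialize_PAE")
#   describe_callable("models.flow_jax", "fit_flow_to_latents_jax")
```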
The training and mock-redshift scripts support --config-yaml.
Example:
python scripts/train_pae_autoencoder.py \
--config-yaml configs/public_mock_template.yaml
python scripts/redshift_job_mock_batched.py \
--config-yaml configs/public_mock_template.yaml

Use configs/public_mock_template.yaml as a starting point. Required knobs such as run_name and sources_per_task can be set in the YAML file instead of on the command line.
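If you prefer to drive the scripts from Python, for example in a notebook or a batch submission helper, the commands can be assembled as argument lists. This is a minimal sketch rather than something shipped with the repository: `build_commands` and `run_all` are hypothetical names, and only the flags shown in this README (`--config-yaml`, `--datestr`, `--no-show`) are used. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys


def build_commands(config_path, datestr=None):
    """Build argv lists for training, mock redshift evaluation, and plots."""
    commands = [
        [sys.executable, "scripts/train_pae_autoencoder.py",
         "--config-yaml", config_path],
        [sys.executable, "scripts/redshift_job_mock_batched.py",
         "--config-yaml", config_path],
    ]
    if datestr is not None:
        # The plotting step is optional and keyed on the evaluation datestr.
        commands.append(
            [sys.executable, "scripts/generate_redshift_plots.py",
             "--datestr", datestr, "--no-show"]
        )
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(build_commands("configs/public_mock_template.yaml",
                           datestr="my_first_paespec_eval"))
```

Call `run_all(..., dry_run=False)` from the repository root to actually execute the steps in order.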
Minimal working YAML example:
common:
  filter_set: spherex_filters102/
  sig_level_norm: 0.01
training:
  run_name: my_first_paespec_run
redshift_mock:
  run_name: my_first_paespec_run
  sources_per_task: 1000
  datestr: my_first_paespec_eval

If you are starting from scratch, use this flow:
- Copy configs/public_mock_template.yaml and edit it for your run.
- Set required fields:
  - training.run_name
  - redshift_mock.sources_per_task
  - redshift_mock.run_name (usually the same model run name as training)
- Optionally set redshift_mock.datestr to control output naming.
- Run the two commands above with your edited YAML file.
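Before launching a run, it can help to confirm that the edited config contains the required fields. The sketch below checks an already-loaded config dictionary for the three required keys named above. The names `REQUIRED_KEYS`, `missing_required_keys`, and `run_names_match` are hypothetical, and the scripts may perform their own validation that differs from this.

```python
# Required (section, key) pairs, taken from the list above.
REQUIRED_KEYS = [
    ("training", "run_name"),
    ("redshift_mock", "run_name"),
    ("redshift_mock", "sources_per_task"),
]


def missing_required_keys(config):
    """Return dotted names of required keys that are absent or None."""
    missing = []
    for section, key in REQUIRED_KEYS:
        section_values = config.get(section) or {}
        if section_values.get(key) is None:
            missing.append(f"{section}.{key}")
    return missing


def run_names_match(config):
    """True when training and redshift_mock refer to the same run name."""
    training = config.get("training") or {}
    redshift = config.get("redshift_mock") or {}
    return training.get("run_name") == redshift.get("run_name")


# The minimal example config, written as the dictionary a YAML loader returns.
example = {
    "common": {"filter_set": "spherex_filters102/", "sig_level_norm": 0.01},
    "training": {"run_name": "my_first_paespec_run"},
    "redshift_mock": {
        "run_name": "my_first_paespec_run",
        "sources_per_task": 1000,
        "datestr": "my_first_paespec_eval",
    },
}

print(missing_required_keys(example), run_names_match(example))
```

In practice the dictionary would come from `yaml.safe_load` applied to your edited file. A mismatch in run names is not necessarily an error, since the two are only usually the same.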
For details on what each section means, see docs/PUBLIC_RELEASE_GUIDE.md
(especially "YAML-first usage" and "Required keys").
For full parameter definitions, start at docs/CONFIG_REFERENCE.md.
The shell wrapper scripts/run_redshift_job_mock.sh now supports configurable
output roots via environment variables:
- RESULTS_BASE_DIR or SPAE_RESULTS_BASE_DIR
- FIGURES_BASE_DIR or SPAE_FIGURES_BASE_DIR
Defaults are local repo paths under results/ and figures/.
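If you post-process outputs in Python, you may want to locate the same directories the wrapper wrote to. The sketch below mirrors the lookup under one stated assumption: the unprefixed variable is checked first and the SPAE_-prefixed one second. The actual precedence in scripts/run_redshift_job_mock.sh is not documented here and may differ. `resolve_base_dir` is a hypothetical helper, not a function from the repository.

```python
import os


def resolve_base_dir(names, default, env=None):
    """Return the first non-empty value among `names` in `env`, else `default`."""
    env = os.environ if env is None else env
    for name in names:
        value = env.get(name)
        if value:
            return value
    return default


# Assumed precedence: unprefixed name first, SPAE_-prefixed name second.
results_dir = resolve_base_dir(
    ["RESULTS_BASE_DIR", "SPAE_RESULTS_BASE_DIR"], "results"
)
figures_dir = resolve_base_dir(
    ["FIGURES_BASE_DIR", "SPAE_FIGURES_BASE_DIR"], "figures"
)
print(results_dir, figures_dir)
```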
- docs/PUBLIC_RELEASE_GUIDE.md for public workflow guidance
- docs/CONFIG_REFERENCE.md as the reference hub for all parameter definitions
- docs/config/README.md index for split config docs
- docs/config/YAML_SCHEMA.md for YAML sections and required keys
- docs/config/TRAINING_REFERENCE.md for training arguments
- docs/config/INFERENCE_REFERENCE.md for mock redshift arguments
- docs/config/PLOTTING_REFERENCE.md for plotting and diagnostics arguments
- docs/config/ENVIRONMENT_REFERENCE.md for shell/environment variables
- configs/public_mock_template.yaml for config schema template