sys_mapping

PyPI: https://pypi.org/project/sys-mapping/0.9/ Docs: https://sys-mapping.readthedocs.io/en/latest/#

Joint inference of multiplicative and additive systematics in galaxy clustering. Implements six decontamination methods with a shared contamination model, template normalisation, noise debiasing, and two-point function correction infrastructure.

Six decontamination methods

Method	Model	Parameter estimation
OLS	Additive	Ordinary least-squares pixel regression
ElasticNet	Additive	ℓ₁+ℓ₂-regularised regression, cross-validated
ISD-1	Additive–multiplicative	Iterative reweighted OLS, poly order 1
ISD-3	Additive–multiplicative	Iterative reweighted OLS, poly order 3
MCMC-add	Additive	Bayesian posterior sampling (emcee), b=0
MCMC-comb	Combined	Bayesian posterior sampling (emcee), free a, b

All methods produce per-galaxy weights (WEIGHT_SYS / WEIGHT_COMB) for use downstream in two-point function estimators.

Pipeline overview

Raw catalogs / randoms
        │
        ▼
scripts/build_systematic_maps.py   ← build HEALPix template maps (GAIA DR2, LS10)
        │
        ▼
    Template FITS files  (LS10_EBV_NSIDE_0064.fits, GAIA_nstar_faint_NSIDE_00064.fits, …)
        │
        ├──► scripts/run_ls10_analysis.py              ← LS10 BGS per-method weights + w(θ)
        │         (or scripts/run_all_methods_sequential.sh for phased multi-method run)
        ├──► scripts/run_mock_analysis.py              ← parameter recovery on mocks
        ├──► scripts/run_validation.py                 ← all methods × all scenarios
        ├──► scripts/run_systematic_tests.py           ← all methods × template configs
        └──► scripts/run_paper_validation.py           ← reproduce Berlfein 2024 Figs 2–8
                │
                ▼
        scripts/compute_sys_weights.py    ← batch per-galaxy weights → WEIGHT_SYS, WEIGHT_ADD, WEIGHT_COMB
                │
                ▼
        data/sys_weights/  ← consumed by sum_stat package

Method

Core method: Berlfein et al. 2024 (MNRAS 531, 4954).

Contamination model (Eq. 11–13)

The observed galaxy overdensity in pixel p is modelled as the linear combination of three nested models:

Model	Equation	Free params
Additive	`δ̂_g,p = δ_g,p + Σ_i a_i δ_{ti,p}`	a
Multiplicative	`δ̂_g,p = δ_g,p (1 + Σ_i a_i δ_{ti,p})`	a (b=a)
Combined	`δ̂_g,p = δ_g,p (1 + Σ_i b_i δ_{ti,p}) + Σ_i a_i δ_{ti,p}`	a, b

Likelihoods (Eq. 17–18)

Integrating out the true δ_g (assumed Gaussian with dispersion σ) gives:

ln L = −(N/2) ln(2πσ²) − Σ_p ln|1 + Σ_i b_i δ_{ti,p}|
       − (1/2σ²) Σ_p [(δ̂_g,p − Σ_i a_i δ_{ti,p}) / (1 + Σ_i b_i δ_{ti,p})]²

An optional skew-normal extension (Eq. 18) adds N ln 2 + Σ log Φ(γ δ_g,p / σ).

Template rotation (Appendix A)

When templates are correlated, the PCA-rotated basis δ'_t = Vᵀ δ_t (where V diagonalises the template covariance C = VDVᵀ) makes each contamination parameter independently identifiable. Parameters are transformed back to the original basis after inference.

Noise debiasing (Eq. 21)

Because E[â²] = a² + Var[â], the squared MLE is noise-inflated. The debiased estimate is:

ã²_i = max(â²_i − Var[â_i], 0)

Two-point correction (Eq. 15–16)

After debiasing, the observed angular correlation function is corrected:

ŵ_corr(θ) = [ŵ(θ) − Σ_i ã²_i C_{ti}(θ)] / [1 + Σ_i b̃²_i C_{ti}(θ)]

Model selection (Eq. 19)

Nested models are compared with the likelihood ratio test:

λ_LR = 2 [ln L(Θ̂) − ln L(Θ̂₀)] ~ χ²(r)

where r is the number of additional free parameters.

Scripts

`scripts/run_all_methods_sequential.sh`

Main orchestration script: runs all six methods on LS10 BGS in phases (fast methods first), writing incremental Sphinx documentation after each phase. Environment variables control behaviour:

# Full run (nohup so the terminal can be closed)
nohup bash scripts/run_all_methods_sequential.sh > logs/run_all.log 2>&1 &

# OLS-only quick preview (seconds)
METHODS="OLS" bash scripts/run_all_methods_sequential.sh

# Resume from MCMC-add (OLS and ElasticNet already done)
METHODS="MCMC-add MCMC-comb" bash scripts/run_all_methods_sequential.sh

# Force re-run all bins
FORCE=1 METHODS="OLS" bash scripts/run_all_methods_sequential.sh

Key environment variables: BINS, DEVICE (cpu/gpu), METHODS, FORCE, SKIP_LS10, CATALOG_DIR, LS10_NSIDE.

`scripts/build_systematic_maps.py`

Build HEALPix systematic template maps from GAIA DR2 or LS10 BGS randoms.

# LS10 maps (EBV, GALDEPTH, PSFSIZE, NOBS)
python scripts/build_systematic_maps.py --source ls10 --nside 32 64 128 256

# GAIA DR2 maps (G/BP/RP flux, star counts by magnitude)
python scripts/build_systematic_maps.py --source gaia --nside 64

`scripts/compute_sys_weights.py`

Batch pipeline: processes every *_DATA.fits / *_RAND.fits pair, runs all methods, writes per-galaxy systematic weights for the sum_stat package.

# All samples, default settings (NSIDE=64)
python scripts/compute_sys_weights.py

# GPU — vectorised walker evaluation
python scripts/compute_sys_weights.py --device gpu

# Single sample
python scripts/compute_sys_weights.py \
    --sample LS10_VLIM_ANY_10.5_Mstar_12.0_0.05_z_0.26_N_3263228

Weighting scheme

Column	Model	Formula
`WEIGHT_ADD`	Additive	`1 / max(1 + Σ_i a_i · t_i(p), 0.01)`
`WEIGHT_COMB`	Combined	`1 / max(1 + Σ_i b_i · t_i(p), 0.01)`
`WEIGHT_SYS`	Combined (recommended)	identical to `WEIGHT_COMB`

`scripts/run_ls10_analysis.py`

Full pipeline on LS10 BGS catalogs: overdensity → all methods → correction → w(θ) plots.

python scripts/run_ls10_analysis.py \
    --catalog-dir /path/to/BGS_VLIM_Mstar \
    --template-dir /path/to/systematics/ \
    --nside 64 --output-dir results/ls10/

`scripts/run_validation.py`

Multi-method, multi-scenario validation on synthetic mocks (none / additive / multiplicative / combined contamination). Applies all six methods and compares against known ground truth.

python scripts/run_validation.py --nside 64 --n-mocks 50 \
    --output-dir results/validation/

`scripts/run_systematic_tests.py`

Systematic test matrix: all methods × all template configurations (7 synthetic templates, tiers of single and combined contamination types).

python scripts/run_systematic_tests.py --nside 64

`scripts/run_mock_analysis.py`

Parameter recovery on mock galaxy catalogs (LRT statistics).

# Synthetic mocks (no data files required)
python scripts/run_mock_analysis.py --synthetic --n-mocks 20 --nside 32

# Real mocks
python scripts/run_mock_analysis.py \
    --mock-dir /path/to/mocks --rand-file /path/to/randoms.fits \
    --n-mocks 100 --output-dir results/mock_analysis/

`scripts/run_mock_analysis_diagnostic.py`

Deep-dive diagnostic figures for one synthetic mock (sky maps, template maps, weight histograms, per-template S/N).

python scripts/run_mock_analysis_diagnostic.py \
    --nside 64 --n-sys 5 --seed 0 \
    --output-dir results/mock_analysis_diagnostic/

`scripts/run_mock_analysis_progressive.py`

Progressive template contamination study: varies the number of contaminated templates (k=1,2,3) and contamination mode across mocks.

`scripts/run_mock_analysis_real_templates.py`

Validation with real GAIA stellar and LS10 depth templates on the actual LS10 south footprint (~22 000 pixels at NSIDE=64).

`scripts/run_paper_validation.py`

Reproduce Berlfein 2024 Figures 2–8 and Tables 2–3 on lognormal synthetic mocks.

# Quick test (NSIDE=32, 2 realisations)
python scripts/run_paper_validation.py --nside 32 --n-real 2 --output-dir /tmp/val

# Full paper settings (use HPC)
python scripts/run_paper_validation.py --nside 512 --n-real 119 \
    --n-walkers 250 --n-steps 1500 --n-burn 300

Documentation-generation scripts

Script	Output
`scripts/generate_results_ls10_summary.py`	`docs/results_ls10.rst` from `*_params.json` files
`scripts/generate_sample_pages.py`	Per-sample RST pages in `docs/`
`scripts/generate_recommendations.py`	`docs/results_ls10_recommendations.rst`
`scripts/plot_ls10_wtheta_corrected.py`	`docs/_static/results_ls10/wtheta_corrected_nside64.png`
`scripts/plot_runtime_scaling.py`	`docs/_static/runtime_scaling.png`
`scripts/patch_params_from_partials.py`	Patches `*_params.json` from per-method partial JSON files

`scripts/benchmark_corrfunc_vs_treecorr.py`

Performance comparison between Corrfunc and TreeCorr for the angular two-point function estimator.

Server execution (bash/ — background, all cores)

bash/ provides scripts that launch each pipeline in the background with nohup, writing timestamped logs to logs/ and a .pid file for process tracking. Each script sets OMP_NUM_THREADS=$(nproc) and JAX XLA flags.

# LS10 analysis
bash/ls10_analysis.sh
CATALOG_DIR=/path/to/BGS bash/ls10_analysis.sh --sample LS10_VLIM_ANY_10.5_...

# Paper validation
bash/paper_validation.sh
bash/paper_validation.sh --nside 512 --n-real 119   # full paper settings

# Mock analysis
bash/mock_analysis.sh --synthetic --n-mocks 20
bash/mock_analysis.sh --mock-dir /path/to/mocks --rand-file /path/to/randoms.fits

# Build template maps
bash/build_maps.sh --source ls10
bash/build_maps.sh --source gaia --nside 64 128

Monitoring:

tail -f logs/ls10_analysis_20260505_143200.log
kill -0 $(cat logs/ls10_analysis_20260505_143200.pid) && echo running || echo done
kill $(cat logs/ls10_analysis_20260505_143200.pid)

GPU notes

--device gpu lets JAX auto-detect the GPU backend. To enable GPU:

pip install 'jax[cuda12_pip]' \
    -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Installation

Install from PyPI:

pip install sys-mapping==0.9

Or install from source:

cd ~/sys_mapping
mamba env create -f environment.yml
mamba activate sys_map
pip install -e .

Tests

pytest tests/ -v

Test modules: test_contamination, test_correction, test_likelihood, test_maps, test_inference, test_model_selection, test_bootstrap, test_power_spectrum, test_regression, test_diagnostics, test_mocks, test_real_templates, test_accuracy, test_benchmarks, test_timing.

Documentation

Full Sphinx documentation is in docs/, covering: overview, installation, quickstart, methods with paper references, validation results, LS10 BGS analysis results (9 samples × 4 NSIDEs), and full API reference.

cd docs && make html
# open docs/_build/html/index.html

Package layout

Module	Public symbols
`contamination`	`apply_contamination`, `invert_contamination`, `compute_two_point_correction`, `pack_params`, `unpack_params`, `n_free_params`
`likelihood`	`make_log_likelihood` — factory returning a `@jax.jit` log-likelihood (Gaussian or skew-normal)
`maps`	`systematic_power_spectrum`, `generate_systematic_map`, `generate_systematic_maps`, `load_real_template`, `load_real_templates`, `pixelize_catalog`, `compute_overdensity`, `assign_template_values`
`inference`	`make_log_prob`, `run_mcmc`, `get_mle_params`, `get_param_variance_from_chain`, `get_param_covariance_from_chain`
`correction`	`debias_params` (Eq. 21), `rotate_templates` (App. A), `transform_params_from_rotated`, `correct_two_point_function`, `correct_power_spectrum_harmonic`
`model_selection`	`likelihood_ratio_test` → `LikelihoodRatioResult` (Eq. 19)
`bootstrap`	`block_bootstrap_variance` — spatial block bootstrap via HEALPix coarsening (Sec. 6.2); `jackknife_covariance`
`regression`	`elasticnet_contamination_fit`, `iterative_systematics_decontamination`, `method_comparison`, `run_decontamination`
`diagnostics`	`null_test_cross_correlations`, `snr_template_ranking`, `footprint_mask_diagnostics`
`mocks`	`generate_lognormal_field`, `make_galactic_mask`, `make_mock_catalog`, `make_mock_suite`, `MockCatalog`
`power_spectrum`	`measure_pseudo_cl`, `subtract_template_cl`, `harmonic_bias`, `mode_projection_bias`
`utils`	`compute_covariance_matrix` (Eq. 24), `compute_amplitude_bias` (Eq. 25-26), `measure_two_point_function` (Landy-Szalay via TreeCorr), `measure_two_point_function_corrfunc`, `measure_kk_correlation_treecorr`, `measure_kk_correlation_corrfunc`
`plotting`	`METHOD_COLORS`, `METHOD_LINESTYLES`, `METHOD_MARKERS`, `METHOD_LABELS`, `METHOD_ORDER` — shared colorblind-safe palette constants

All symbols above are importable directly from sys_mapping. LikelihoodRatioResult must be imported as from sys_mapping.model_selection import LikelihoodRatioResult.

Berlfein 2024 equation cross-reference

Equation	Description	Implementation
Eq. 11	Additive model	`contamination.py:apply_contamination` (model='additive')
Eq. 12	Multiplicative model	`contamination.py:apply_contamination` (model='multiplicative')
Eq. 13	Combined model	`contamination.py:apply_contamination` (model='combined')
Eq. 15-16	2PCF correction	`contamination.py:compute_two_point_correction`; `correction.py:correct_two_point_function`
Eq. 17	Gaussian log-likelihood + Jacobian	`likelihood.py:make_log_likelihood` (skew=False)
Eq. 18	Skew-normal log-likelihood	`likelihood.py:make_log_likelihood` (skew=True)
Eq. 19	Likelihood ratio test	`model_selection.py:likelihood_ratio_test`
Eq. 21	Noise debiasing	`correction.py:debias_params`
Eq. 24	Template covariance matrix	`utils.py:compute_covariance_matrix`
Eq. 25-26	Amplitude bias ΔĀ	`utils.py:compute_amplitude_bias`
App. A	PCA template rotation C = VDVᵀ	`correction.py:rotate_templates` + `transform_params_from_rotated`

Data files

Generated outputs (`data/sys_weights/` — git-ignored)

Produced by scripts/compute_sys_weights.py or scripts/run_ls10_analysis.py:

File	Contents
`{sample_id}_NSIDE{NNNN}_WEIGHTS.fits`	Per-galaxy systematic weights (`WEIGHT_SYS`, `WEIGHT_ADD`, `WEIGHT_COMB`)
`{sample_id}_NSIDE{NNNN}_params.json`	MAP parameters and chain diagnostics for all methods
`{sample_id}_NSIDE{NNNN}_partial_*.json`	Partial results from fast-method phases

Paper tables and simulation configs (`data/` — tracked)

File	Contents
`data/paper_tables/table1_mock_properties.csv`	KiDS-HOD mock survey properties
`data/paper_tables/table2_lrt_results.csv`	LRT model-selection fractions
`data/paper_tables/table3_cosmological_impact.csv`	Amplitude bias ΔĀ per model
`data/simulation_config/contamination_params_25sys.csv`	25 contamination parameters used in the paper
`data/simulation_config/angular_bins.csv`	Log-spaced angular bins (0.5–60 arcmin)

Equations vs. implementation — notes

Combined model: the code implements the exact paper form δ̂_g = δ_g(1+b·t) + a·t (Eq. 13). The full product (δ_g + a·t)(1+b·t) differs by the second-order term a·t·b·t, which is negligible for |a|, |b| ≪ 1.
Jacobian sign: the Gaussian likelihood subtracts Σ ln|1+b·t| (positive Jacobian of the forward map), consistent with Eq. 17.
Skew-normal: the CDF term is Σ log Φ(γ·r/σ) = Σ log_ndtr(z), which includes the N·ln 2 from the 2/σ prefactor. A previous version used Σ log_ndtr(√2·z) (off by √2 in the erf argument); this is corrected in commit 10c01d0.
Variance propagation through PCA: Var[a_orig] = Vᵀ · diag(Var[a_rot]) · V is correct when the rotated parameters are approximately uncorrelated.

Reference

Berlfein et al. 2024, Joint inference of multiplicative and additive systematics in galaxy clustering, MNRAS 531, 4954, arXiv:2401.12293.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.github/workflows		.github/workflows
bash		bash
docs		docs
scripts		scripts
slurm		slurm
sys_mapping		sys_mapping
tests		tests
.coverage		.coverage
.gitignore		.gitignore
.readthedocs.yaml		.readthedocs.yaml
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

sys_mapping

Six decontamination methods

Pipeline overview

Method

Contamination model (Eq. 11–13)

Likelihoods (Eq. 17–18)

Template rotation (Appendix A)

Noise debiasing (Eq. 21)

Two-point correction (Eq. 15–16)

Model selection (Eq. 19)

Scripts

scripts/run_all_methods_sequential.sh

scripts/build_systematic_maps.py

scripts/compute_sys_weights.py

scripts/run_ls10_analysis.py

scripts/run_validation.py

scripts/run_systematic_tests.py

scripts/run_mock_analysis.py

scripts/run_mock_analysis_diagnostic.py

scripts/run_mock_analysis_progressive.py

scripts/run_mock_analysis_real_templates.py

scripts/run_paper_validation.py

Documentation-generation scripts

scripts/benchmark_corrfunc_vs_treecorr.py

Server execution (bash/ — background, all cores)

Installation

Tests

Documentation

Package layout

Berlfein 2024 equation cross-reference

Data files

Generated outputs (data/sys_weights/ — git-ignored)

Paper tables and simulation configs (data/ — tracked)

Equations vs. implementation — notes

Reference

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`scripts/run_all_methods_sequential.sh`

`scripts/build_systematic_maps.py`

`scripts/compute_sys_weights.py`

`scripts/run_ls10_analysis.py`

`scripts/run_validation.py`

`scripts/run_systematic_tests.py`

`scripts/run_mock_analysis.py`

`scripts/run_mock_analysis_diagnostic.py`

`scripts/run_mock_analysis_progressive.py`

`scripts/run_mock_analysis_real_templates.py`

`scripts/run_paper_validation.py`

`scripts/benchmark_corrfunc_vs_treecorr.py`

Generated outputs (`data/sys_weights/` — git-ignored)

Paper tables and simulation configs (`data/` — tracked)

Packages