
feat: CSV I/O for PointDataset for cluster-scale workflows#447

Merged
Jammy2211 merged 1 commit into main from claude/point-dataset-csv-Y9Jvn on Apr 19, 2026

Conversation

@Jammy2211
Collaborator

Summary

Strong-lens cluster workflows can involve tens or hundreds of multiply-imaged
background sources. Editing that many per-source JSON files by hand is
unwieldy; a spreadsheet with one row per image, grouped by name, is a far
better hand-editing surface.

This PR adds a CSV I/O path for PointDataset alongside the existing JSON
path. JSON remains the canonical exact-round-trip format used by the
modeling scripts; CSV is the hand-editable cluster format. No new runtime
dependencies: the implementation uses only the csv stdlib, no pandas.

Closes #446.

API Changes

  • Added: PointDataset.to_csv(file_path), classmethod
    PointDataset.from_csv(file_path, name=None), and module-level
    autolens.output_to_csv(datasets, file_path) /
    autolens.list_from_csv(file_path) — CSV I/O, one row per observed image,
    grouped by name; optional flux / time_delay column groups dropped
    when blank for every row.
  • Docstring fix: drop the stale PointDict bullet in
    autolens/point/dataset.py module docstring (that class no longer
    exists); document the JSON-vs-CSV surfaces instead.
  • No removals, renames, or signature changes. Purely additive.

See full details below.
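The one-row-per-image layout can be sketched with the csv stdlib alone. This is an illustrative standalone sketch of the schema this PR describes, not the merged `PointDataset.to_csv`: the function `rows_for_dataset`, its argument names, and the flat per-image noise values are all hypothetical simplifications.

```python
import csv
import io

def rows_for_dataset(name, positions, positions_noise, fluxes=None, fluxes_noise=None):
    """Yield one CSV row dict per observed image (hypothetical helper
    mirroring the PR's one-row-per-image layout)."""
    for i, ((y, x), noise) in enumerate(zip(positions, positions_noise)):
        row = {"name": name, "y": y, "x": x, "positions_noise": noise}
        if fluxes is not None:
            # Optional flux column group, written only when fluxes exist.
            row["flux"] = fluxes[i]
            row["flux_noise"] = fluxes_noise[i]
        yield row

# Write a positions-only dataset: only the four mandatory columns appear.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "y", "x", "positions_noise"])
writer.writeheader()
for row in rows_for_dataset("source_0", [(1.0, 2.0), (3.0, 4.0)], [0.1, 0.2]):
    writer.writerow(row)

csv_text = buf.getvalue()
```

Each multiply-imaged source becomes a run of rows sharing one `name` value, which is what makes the file practical to edit in a spreadsheet.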

Test Plan

  • test_autolens/point/test_dataset.py — round-trip tests for:
    • positions-only
    • positions + fluxes + fluxes_noise_map
    • positions + time_delays + time_delays_noise_map
    • all optional fields populated
    • heterogeneous list (one dataset with fluxes, one without) — exercises
      optional-column handling at the list level
    • multi-group from_csv requires explicit name=, single-group
      auto-picks
  • Module-level manual round-trip of every case (pytest could not run in
    the sandbox: the installed autogalaxy predates LensCalc, so the conftest
    import fails before collection; this is unrelated to the new code).
  • CI / maintainer-side: pytest test_autolens/point/test_dataset.py
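The multi-group name= rule exercised in the test plan can be sketched as a small selector. Here `groups` is a plain dict mapping group name to rows, and `pick_group` is a hypothetical stand-in for the selection logic inside `from_csv`, not the merged implementation.

```python
def pick_group(groups, name=None):
    """Return one group's rows: auto-pick when the CSV holds exactly one
    group, otherwise require an explicit name= (sketch of the from_csv
    selection rule described in the PR)."""
    if name is not None:
        return groups[name]
    if len(groups) != 1:
        raise ValueError("CSV contains multiple groups; pass name= to pick one")
    return next(iter(groups.values()))

single = {"source_0": ["row_a", "row_b"]}
multi = {"source_0": ["row_a"], "source_1": ["row_b"]}
```

Failing loudly on an ambiguous multi-group file is safer than silently picking the first group, which could feed the wrong source into a model fit.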
Full API Changes (for automation & release notes)

Added

  • autolens.PointDataset.to_csv(file_path: str) -> None — write a single
    dataset as one CSV, one row per observed image.
  • classmethod autolens.PointDataset.from_csv(file_path: str, name: Optional[str] = None) -> PointDataset
    — load a single dataset; name= required when the CSV has multiple
    groups, auto-picked when it has exactly one.
  • autolens.output_to_csv(datasets: List[PointDataset], file_path: str) -> None
    — write a list of datasets to a single CSV; optional flux / time-delay
    column groups are included when any dataset carries those values and
    left blank for datasets that do not.
  • autolens.list_from_csv(file_path: str) -> List[PointDataset] — load a
    list of datasets grouped by the name column in order of first
    appearance; raises ValueError on partially-populated optional columns
    within a group.
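The grouping and all-or-nothing validation that `list_from_csv` performs can be sketched with `csv.DictReader`. This standalone version checks only the flux column and uses hypothetical names throughout; it is a sketch of the rules stated above, not the merged loader.

```python
import csv
import io

def groups_from_csv_text(text):
    """Group rows by `name` in order of first appearance and reject a
    group whose optional flux column is only partially populated
    (illustrative sketch of the PR's list_from_csv rules)."""
    groups = {}
    for row in csv.DictReader(io.StringIO(text)):
        groups.setdefault(row["name"], []).append(row)  # dicts keep insertion order
    for name, rows in groups.items():
        filled = [r["flux"] != "" for r in rows]
        if any(filled) and not all(filled):
            raise ValueError(f"group {name!r}: flux column partially populated")
    return groups

csv_text = (
    "name,y,x,positions_noise,flux,flux_noise\n"
    "A,1.0,2.0,0.1,10.0,1.0\n"
    "A,3.0,4.0,0.1,12.0,1.0\n"
    "B,5.0,6.0,0.1,,\n"  # B carries no fluxes: all blanks, which is allowed
)
groups = groups_from_csv_text(csv_text)
```

A heterogeneous file (one group with fluxes, one without) parses cleanly, while a half-filled flux column within a single group raises, matching the behaviour documented above.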

CSV schema (one row per image)

| name | y | x | positions_noise | flux | flux_noise | time_delay | time_delay_noise |

  • name, y, x, positions_noise always written.
  • flux, flux_noise added iff any dataset has fluxes.
  • time_delay, time_delay_noise added iff any dataset has time_delays.
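The column rules above can be sketched as a header builder. In this sketch `datasets` are plain dicts standing in for PointDataset objects and `csv_header` is a hypothetical name; it illustrates the schema rules, not the merged writer.

```python
def csv_header(datasets):
    """Column list for a batch write: optional column groups appear only
    when at least one dataset carries those values (sketch of the CSV
    schema rules; `datasets` are plain dicts, not real PointDataset objects)."""
    cols = ["name", "y", "x", "positions_noise"]
    if any(d.get("fluxes") is not None for d in datasets):
        cols += ["flux", "flux_noise"]
    if any(d.get("time_delays") is not None for d in datasets):
        cols += ["time_delay", "time_delay_noise"]
    return cols

# One dataset with fluxes is enough to pull in the flux column group;
# datasets without fluxes simply leave those cells blank.
mixed = [{"fluxes": [10.0], "time_delays": None}, {"fluxes": None, "time_delays": None}]
```

Keying the header on "any dataset has values" rather than "all datasets have values" is what makes heterogeneous lists writable to a single file.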

Changed Behaviour

  • None for existing code paths.

Removed

  • None.

Renamed

  • None.

Migration

  • No migration needed — JSON paths (output_to_json / from_json) are
    untouched and remain the canonical modeling input. CSV is opt-in for
    cluster-scale hand-editing workflows.

Non-goals (explicitly deferred to follow-up issues)

  • FitPointDataset.to_csv residual / chi-squared export.
  • Introducing a PointDatasetList class (list-level helpers stay as plain
    functions for now).
  • Adding pandas as a dependency.
  • Promoting CSV I/O to autoconf alongside fits / json (discussed with
    @Jammy2211 — reasonable future cleanup, but out of scope here).

https://claude.ai/code/session_01WUVZQGGLQhVKcprvUMf1Sa

Adds PointDataset.to_csv / from_csv plus module-level output_to_csv and
list_from_csv helpers so lists of point-source datasets can be edited as a
single spreadsheet (one row per image, grouped by name), which is far more
practical than hundreds of per-source JSON files in strong-lens cluster work.
JSON remains the canonical exact-round-trip format.

- autolens/point/dataset.py: csv stdlib-only I/O (no pandas dep); optional
  flux/time_delay columns dropped when blank for every row; loader groups by
  name and enforces all-or-nothing population of optional columns per group
- autolens/__init__.py: re-export output_to_csv, list_from_csv
- test_autolens/point/test_dataset.py: round-trip tests for positions-only,
  positions+fluxes, positions+time_delays, all fields, heterogeneous list,
  multi-group from_csv name selection
- Drop the stale PointDict bullet in the module docstring and document the
  JSON vs CSV surfaces

Refs: #446

https://claude.ai/code/session_01WUVZQGGLQhVKcprvUMf1Sa
@Jammy2211 Jammy2211 added the pending-release Tracked for next release build label Apr 19, 2026 — with Claude
@Jammy2211 Jammy2211 merged commit da0ab64 into main Apr 19, 2026
5 checks passed
@Jammy2211 Jammy2211 deleted the claude/point-dataset-csv-Y9Jvn branch April 19, 2026 15:02

Development

Successfully merging this pull request may close these issues.

Add CSV I/O for PointDataset (cluster-friendly alternative to JSON)
