
feat: CSV I/O for PointDataset for cluster-scale workflows#447

Merged
Jammy2211 merged 1 commit into main from claude/point-dataset-csv-Y9Jvn on Apr 19, 2026

Conversation

@Jammy2211
Collaborator

Summary

Strong-lens cluster workflows can involve tens or hundreds of multiply-imaged
background sources. Editing that many per-source JSON files by hand is
unwieldy; a spreadsheet with one row per image, grouped by name, is a far
better hand-editing surface.

This PR adds a CSV I/O path for PointDataset alongside the existing JSON
path. JSON remains the canonical exact-round-trip format used by the
modeling scripts; CSV is the hand-editable cluster format. No new runtime
dependencies: the implementation uses only the csv stdlib, no pandas.

Closes #446.

API Changes

  • Added: PointDataset.to_csv(file_path), classmethod
    PointDataset.from_csv(file_path, name=None), and module-level
    autolens.output_to_csv(datasets, file_path) /
    autolens.list_from_csv(file_path) — CSV I/O, one row per observed image,
    grouped by name; optional flux / time_delay column groups dropped
    when blank for every row.
  • Docstring fix: drop the stale PointDict bullet in
    autolens/point/dataset.py module docstring (that class no longer
    exists); document the JSON-vs-CSV surfaces instead.
  • No removals, renames, or signature changes. Purely additive.

See full details below.
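The one-row-per-image layout can be sketched with the csv stdlib alone. This is an illustrative standalone sketch of the schema this PR describes, not the merged `PointDataset.to_csv`: the function `rows_for_dataset`, its argument names, and the flat per-image noise values are all hypothetical simplifications.

```python
import csv
import io

def rows_for_dataset(name, positions, positions_noise, fluxes=None, fluxes_noise=None):
    """Yield one CSV row dict per observed image (hypothetical helper
    mirroring the PR's one-row-per-image layout)."""
    for i, ((y, x), noise) in enumerate(zip(positions, positions_noise)):
        row = {"name": name, "y": y, "x": x, "positions_noise": noise}
        if fluxes is not None:
            # Optional flux column group, written only when fluxes exist.
            row["flux"] = fluxes[i]
            row["flux_noise"] = fluxes_noise[i]
        yield row

# Write a positions-only dataset: only the four mandatory columns appear.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "y", "x", "positions_noise"])
writer.writeheader()
for row in rows_for_dataset("source_0", [(1.0, 2.0), (3.0, 4.0)], [0.1, 0.2]):
    writer.writerow(row)

csv_text = buf.getvalue()
```

Each multiply-imaged source becomes a run of rows sharing one `name` value, which is what makes the file practical to edit in a spreadsheet.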

Test Plan

  • test_autolens/point/test_dataset.py — round-trip tests for:
    • positions-only
    • positions + fluxes + fluxes_noise_map
    • positions + time_delays + time_delays_noise_map
    • all optional fields populated
    • heterogeneous list (one dataset with fluxes, one without) — exercises
      optional-column handling at the list level
    • multi-group from_csv requires explicit name=, single-group
      auto-picks
  • Module-level manual round-trip of every case (pytest could not run in
    the sandbox: the installed autogalaxy predates LensCalc, so the conftest
    import fails before collection; this is unrelated to the new code).
  • CI / maintainer-side: pytest test_autolens/point/test_dataset.py
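The multi-group name= rule exercised in the test plan can be sketched as a small selector. Here `groups` is a plain dict mapping group name to rows, and `pick_group` is a hypothetical stand-in for the selection logic inside `from_csv`, not the merged implementation.

```python
def pick_group(groups, name=None):
    """Return one group's rows: auto-pick when the CSV holds exactly one
    group, otherwise require an explicit name= (sketch of the from_csv
    selection rule described in the PR)."""
    if name is not None:
        return groups[name]
    if len(groups) != 1:
        raise ValueError("CSV contains multiple groups; pass name= to pick one")
    return next(iter(groups.values()))

single = {"source_0": ["row_a", "row_b"]}
multi = {"source_0": ["row_a"], "source_1": ["row_b"]}
```

Failing loudly on an ambiguous multi-group file is safer than silently picking the first group, which could feed the wrong source into a model fit.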
Full API Changes (for automation & release notes)

Added

  • autolens.PointDataset.to_csv(file_path: str) -> None — write a single
    dataset as one CSV, one row per observed image.
  • classmethod autolens.PointDataset.from_csv(file_path: str, name: Optional[str] = None) -> PointDataset
    — load a single dataset; name= required when the CSV has multiple
    groups, auto-picked when it has exactly one.
  • autolens.output_to_csv(datasets: List[PointDataset], file_path: str) -> None
    — write a list of datasets to a single CSV; optional flux / time-delay
    column groups are included when any dataset carries those values and
    left blank for datasets that do not.
  • autolens.list_from_csv(file_path: str) -> List[PointDataset] — load a
    list of datasets grouped by the name column in order of first
    appearance; raises ValueError on partially-populated optional columns
    within a group.
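The grouping and all-or-nothing validation that `list_from_csv` performs can be sketched with `csv.DictReader`. This standalone version checks only the flux column and uses hypothetical names throughout; it is a sketch of the rules stated above, not the merged loader.

```python
import csv
import io

def groups_from_csv_text(text):
    """Group rows by `name` in order of first appearance and reject a
    group whose optional flux column is only partially populated
    (illustrative sketch of the PR's list_from_csv rules)."""
    groups = {}
    for row in csv.DictReader(io.StringIO(text)):
        groups.setdefault(row["name"], []).append(row)  # dicts keep insertion order
    for name, rows in groups.items():
        filled = [r["flux"] != "" for r in rows]
        if any(filled) and not all(filled):
            raise ValueError(f"group {name!r}: flux column partially populated")
    return groups

csv_text = (
    "name,y,x,positions_noise,flux,flux_noise\n"
    "A,1.0,2.0,0.1,10.0,1.0\n"
    "A,3.0,4.0,0.1,12.0,1.0\n"
    "B,5.0,6.0,0.1,,\n"  # B carries no fluxes: all blanks, which is allowed
)
groups = groups_from_csv_text(csv_text)
```

A heterogeneous file (one group with fluxes, one without) parses cleanly, while a half-filled flux column within a single group raises, matching the behaviour documented above.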

CSV schema (one row per image)

| name | y | x | positions_noise | flux | flux_noise | time_delay | time_delay_noise |

  • name, y, x, positions_noise always written.
  • flux, flux_noise added iff any dataset has fluxes.
  • time_delay, time_delay_noise added iff any dataset has time_delays.
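The column rules above can be sketched as a header builder. In this sketch `datasets` are plain dicts standing in for PointDataset objects and `csv_header` is a hypothetical name; it illustrates the schema rules, not the merged writer.

```python
def csv_header(datasets):
    """Column list for a batch write: optional column groups appear only
    when at least one dataset carries those values (sketch of the CSV
    schema rules; `datasets` are plain dicts, not real PointDataset objects)."""
    cols = ["name", "y", "x", "positions_noise"]
    if any(d.get("fluxes") is not None for d in datasets):
        cols += ["flux", "flux_noise"]
    if any(d.get("time_delays") is not None for d in datasets):
        cols += ["time_delay", "time_delay_noise"]
    return cols

# One dataset with fluxes is enough to pull in the flux column group;
# datasets without fluxes simply leave those cells blank.
mixed = [{"fluxes": [10.0], "time_delays": None}, {"fluxes": None, "time_delays": None}]
```

Keying the header on "any dataset has values" rather than "all datasets have values" is what makes heterogeneous lists writable to a single file.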

Changed Behaviour

  • None for existing code paths.

Removed

  • None.

Renamed

  • None.

Migration

  • No migration needed — JSON paths (output_to_json / from_json) are
    untouched and remain the canonical modeling input. CSV is opt-in for
    cluster-scale hand-editing workflows.

Non-goals (explicitly deferred to follow-up issues)

  • FitPointDataset.to_csv residual / chi-squared export.
  • Introducing a PointDatasetList class (list-level helpers stay as plain
    functions for now).
  • Adding pandas as a dependency.
  • Promoting CSV I/O to autoconf alongside fits / json (discussed with
    @Jammy2211 — reasonable future cleanup, but out of scope here).

https://claude.ai/code/session_01WUVZQGGLQhVKcprvUMf1Sa

Adds PointDataset.to_csv / from_csv plus module-level output_to_csv and
list_from_csv helpers so lists of point-source datasets can be edited as a
single spreadsheet (one row per image, grouped by name), which is far more
practical than hundreds of per-source JSON files in strong-lens cluster work.
JSON remains the canonical exact-round-trip format.

- autolens/point/dataset.py: csv stdlib-only I/O (no pandas dep); optional
  flux/time_delay columns dropped when blank for every row; loader groups by
  name and enforces all-or-nothing population of optional columns per group
- autolens/__init__.py: re-export output_to_csv, list_from_csv
- test_autolens/point/test_dataset.py: round-trip tests for positions-only,
  positions+fluxes, positions+time_delays, all fields, heterogeneous list,
  multi-group from_csv name selection
- Drop the stale PointDict bullet in the module docstring and document the
  JSON vs CSV surfaces

Refs: #446

https://claude.ai/code/session_01WUVZQGGLQhVKcprvUMf1Sa
@Jammy2211 Jammy2211 added the pending-release Tracked for next release build label Apr 19, 2026 — with Claude
@Jammy2211 Jammy2211 merged commit da0ab64 into main Apr 19, 2026
5 checks passed
@Jammy2211 Jammy2211 deleted the claude/point-dataset-csv-Y9Jvn branch April 19, 2026 15:02

Development

Successfully merging this pull request may close these issues.

Add CSV I/O for PointDataset (cluster-friendly alternative to JSON)
