Extract GeoDataFrameExecutor as shared base class for GDF-based executors #128

@profsergiocosta

Summary

Currently, every GeoDataFrame-based executor (LUCCVectorExecutor, future
raster/vector variants) reimplements the same boilerplate for load(),
validate(), and save(). The only genuinely domain-specific method is
run(). This issue proposes extracting a shared GeoDataFrameExecutor base
class to eliminate that duplication.

Motivation

The load(), validate(), save(), and _check_columns() methods of
LUCCVectorExecutor are already nearly identical to what any future GDF
executor would need:

  • load() — calls load_dataset(), captures checksum, applies column_map
  • validate() — checks column_map keys against model spec
  • save() — calls save_dataset(), records output_sha256, sets status
  • _check_columns() — post-load column presence check

These four concerns are infrastructure, not domain logic. A second executor
(e.g., a raster variant or a new LUCC model) would duplicate them verbatim —
which is the signal that a base class is warranted.
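
The two column checks listed above can be sketched without any geospatial dependency. This is only an illustration: the helper names `validate_column_map` and `check_columns`, the example column names, and the assumption that column_map keys are the names the model spec expects are all hypothetical, not the existing dissmodel API.

```python
from typing import Iterable, Mapping


def validate_column_map(column_map: Mapping[str, str], spec_columns: Iterable[str]) -> None:
    """Pre-load check: every column_map key must be a name the model spec knows."""
    unknown = sorted(set(column_map) - set(spec_columns))
    if unknown:
        raise ValueError(f"column_map keys not in model spec: {unknown}")


def check_columns(present: Iterable[str], required: Iterable[str]) -> None:
    """Post-load check: every required column must exist in the loaded data."""
    missing = sorted(set(required) - set(present))
    if missing:
        raise KeyError(f"missing columns after load: {missing}")


# Hypothetical spec and mapping, for illustration only.
spec = ["land_use", "slope"]
validate_column_map({"land_use": "uso"}, spec)          # passes silently
check_columns(["land_use", "geometry"], ["land_use"])   # passes silently
```

Neither function depends on the model being run, which is the sense in which these checks are infrastructure rather than domain logic.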

Proposed Design

New file: dissmodel/executor/geodataframe.py

class GeoDataFrameExecutor(ModelExecutor):
    """
    Base class for executors that operate on GeoDataFrames.

    Provides default implementations of load(), validate(), and save()
    that cover the standard GDF contract (load_dataset, column_map,
    checksum, save_dataset). Subclasses only need to implement run().
    """

    output_ext: str = "gpkg"  # subclasses may override (e.g. "geojson")

    def load(self, record: ExperimentRecord) -> gpd.GeoDataFrame: ...
    def validate(self, record: ExperimentRecord) -> None: ...
    def save(self, result: gpd.GeoDataFrame, record: ExperimentRecord) -> ExperimentRecord: ...

    def run(self, data: gpd.GeoDataFrame, record: ExperimentRecord) -> gpd.GeoDataFrame:
        raise NotImplementedError
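
To make the division of labour concrete, here is a minimal stdlib-only sketch of the same template-method shape. It is not the proposed implementation: `Record` and `TableExecutor` are stand-ins for ExperimentRecord and GeoDataFrameExecutor, JSON files and lists of dicts replace load_dataset/save_dataset and GeoDataFrames, and the field names, the "completed" status value, the column_map direction, and the validate/load/run/save call order are all assumptions.

```python
import hashlib
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Record:
    """Stand-in for ExperimentRecord; field names are assumptions."""
    input_path: Path
    output_dir: Path
    column_map: dict = field(default_factory=dict)
    input_sha256: str = ""
    output_sha256: str = ""
    status: str = "pending"


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


class TableExecutor:
    """Stand-in for GeoDataFrameExecutor, using lists of dicts instead of GDFs."""

    name = "base"
    output_ext = "json"        # subclasses may override
    spec_columns: tuple = ()   # names the model expects (assumed attribute)

    def validate(self, record):
        unknown = set(record.column_map) - set(self.spec_columns)
        if unknown:
            raise ValueError(f"unknown column_map keys: {sorted(unknown)}")

    def load(self, record):
        record.input_sha256 = sha256_of(record.input_path)
        rows = json.loads(record.input_path.read_text())
        # Assumed direction: column_map maps model name -> dataset column name.
        rename = {src: dst for dst, src in record.column_map.items()}
        return [{rename.get(k, k): v for k, v in row.items()} for row in rows]

    def run(self, data, record):
        raise NotImplementedError

    def save(self, result, record):
        out = record.output_dir / f"{self.name}.{self.output_ext}"
        out.write_text(json.dumps(result))
        record.output_sha256 = sha256_of(out)
        record.status = "completed"
        return record


class DoubleArea(TableExecutor):
    """Toy subclass: only the domain step is written here."""
    name = "double_area"
    spec_columns = ("area",)

    def run(self, data, record):
        return [{**row, "area": row["area"] * 2} for row in data]


def execute(executor, record):
    """Driver showing the assumed call order: validate, load, run, save."""
    executor.validate(record)
    data = executor.load(record)
    result = executor.run(data, record)
    return executor.save(result, record)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "in.json"
    src.write_text(json.dumps([{"a_ha": 1.5}]))
    rec = execute(DoubleArea(), Record(src, Path(tmp), {"area": "a_ha"}))
    saved = json.loads((Path(tmp) / "double_area.json").read_text())
```

The toy subclass defines only `name`, the expected columns, and `run()`; checksums, renaming, output naming, and status bookkeeping all come from the base class.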

Reduced executor definition

After this change, LUCCVectorExecutor becomes:

class LUCCVectorExecutor(GeoDataFrameExecutor):
    name = "lucc_vector"

    def run(self, data: gpd.GeoDataFrame, record: ExperimentRecord) -> gpd.GeoDataFrame:
        # only domain logic: Environment, Demand, Potential, Allocation
        ...

The executor no longer overrides load(), validate(), or save(); only run() remains.

Scope

In scope:

  • Extract GeoDataFrameExecutor into dissmodel/executor/geodataframe.py
  • Migrate LUCCVectorExecutor (in dissluc) to inherit from it
  • Export GeoDataFrameExecutor from dissmodel.executor
  • Update docstrings and type hints

Out of scope (tracked separately):

  • Declarative pipeline via TOML (model.pipeline driving component
    instantiation) — more powerful but trades flexibility for convention;
    worth evaluating post-JOSS
  • Raster base class (XarrayExecutor or similar) — follow-up issue once
    a second raster executor exists

Acceptance Criteria

  • GeoDataFrameExecutor exists in dissmodel/executor/geodataframe.py
  • LUCCVectorExecutor inherits from it with only run() overridden
  • All existing tests pass without modification
  • GeoDataFrameExecutor is exported from dissmodel.executor.__init__
  • Docstring explains the output_ext override pattern
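
The second criterion could be enforced by a small test along these lines. This is a sketch only: the classes below are empty stand-ins so the snippet runs on its own (a real test would import them from dissmodel and dissluc), and `overridden_methods` is a hypothetical helper.

```python
class ModelExecutor:
    """Stand-in for the existing base class."""


class GeoDataFrameExecutor(ModelExecutor):
    output_ext = "gpkg"

    def load(self, record): ...
    def validate(self, record): ...
    def save(self, result, record): ...

    def run(self, data, record):
        raise NotImplementedError


class LUCCVectorExecutor(GeoDataFrameExecutor):
    name = "lucc_vector"

    def run(self, data, record):
        return data  # placeholder for the domain logic


def overridden_methods(cls, base):
    """Names of base-class callables that cls redefines in its own body."""
    return sorted(
        name for name, attr in vars(cls).items()
        if callable(attr) and name in vars(base)
    )


assert issubclass(LUCCVectorExecutor, GeoDataFrameExecutor)
assert overridden_methods(LUCCVectorExecutor, GeoDataFrameExecutor) == ["run"]
```

Checking `vars(cls)` rather than `hasattr` matters here: inherited methods are visible through `hasattr`, while `vars` lists only what the subclass body itself defines.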

Notes

The right moment to implement this is when a second GDF executor is
introduced — the duplication cost becomes concrete and the abstraction boundary
is validated by two real use cases. This issue can be picked up speculatively
before that point if bandwidth allows.
