GatewayRequestBase allows arbitrary fields and forwards them to the worker #450

## Summary

`GatewayRequestBase` sets `extra="allow"`, so any extra field in a request body is forwarded to the worker. An attacker can inject internal-only options (for example, overriding `data:` with a URI they control) and spawn simulations against arbitrary datasets. No payload size cap is enforced either.

## Location

- `projects/policyengine-api-simulation/src/modal/gateway/models.py:23-33`
- `projects/policyengine-api-simulation/src/modal/gateway/endpoints.py:157-163, 214-222`

## What goes wrong

```python
class GatewayRequestBase(BaseModel):
    country: str
    version: Optional[str] = None
    telemetry: TelemetryEnvelope | None = None

    model_config = ConfigDict(
        extra="allow",
        populate_by_name=True,
    )  # Pass through all other fields
```

Then in `endpoints.py`:

```python
payload = request.model_dump(
    exclude={"version", "telemetry"},
    mode="json",
)
...
sim_func = modal.Function.from_name(app_name, "run_simulation")
call = sim_func.spawn(payload)
```

All un-modeled fields flow verbatim into `run_simulation_impl`, which passes them to `SimulationOptions.model_validate(simulation_params)` (`simulation.py:90`). A caller can:

- Supply `data: "gs://attacker-bucket/malicious.h5"` to point the worker at a dataset they control (see `_build_policyengine_bundle`, `endpoints.py:65-68`, which already treats any `"://"` string as a trusted URI).
- Inject internal-only `SimulationOptions` fields that were never meant to be user-controllable.
- Supply arbitrarily large blobs (no `max_length` anywhere); Modal has no built-in body size cap for `@modal.asgi_app()`.
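A minimal, self-contained reproduction of the passthrough, assuming Pydantic v2. The model below is a simplified stand-in for `GatewayRequestBase`, not the project's code: `telemetry` is typed as a plain `dict` instead of `TelemetryEnvelope`, and the request body values are illustrative. It shows the undeclared `data` key surviving both validation and the `model_dump(exclude=...)` call that builds the worker payload.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class GatewayRequestBase(BaseModel):
    # Simplified stand-in: telemetry is a plain dict instead of TelemetryEnvelope.
    country: str
    version: Optional[str] = None
    telemetry: Optional[dict] = None

    model_config = ConfigDict(extra="allow", populate_by_name=True)


# Illustrative request body; "data" is not declared on the model.
body = {
    "country": "us",
    "version": "1.0.0",
    "data": "gs://attacker-bucket/malicious.h5",
}

request = GatewayRequestBase.model_validate(body)

# extra="allow" keeps the unknown key in model_extra ...
print(request.model_extra)

# ... and model_dump includes it, so it reaches the worker unchanged.
payload = request.model_dump(exclude={"version", "telemetry"}, mode="json")
print(payload)
```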

## Suggested fix

- Change `extra="allow"` to `extra="forbid"` and add the real required simulation fields to `SimulationRequest` / `BudgetWindowBatchRequest` as typed attributes.
- If passthrough is genuinely needed, define an explicit `extra_simulation_options: dict[str, AllowedOption]` with a closed-set discriminator.
- Reject `data:` values that do not match an allowlist of known URIs (the `DATASET_URIS` table is already a natural allowlist).
- Add a payload size cap via FastAPI middleware.
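A sketch of the first and third suggestions together, assuming Pydantic v2. `ALLOWED_DATASET_URIS` and its single entry are hypothetical placeholders for whatever `DATASET_URIS` actually contains, and `SimulationRequest` declares only `data`; the real model would declare every supported `SimulationOptions` field explicitly. With `extra="forbid"` inherited from the base class, an unknown key fails validation, and the validator rejects any `data` value outside the allowlist.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError, field_validator

# Hypothetical allowlist; in the project this could be derived from DATASET_URIS.
ALLOWED_DATASET_URIS = frozenset({"gs://example-bucket/known-dataset.h5"})


class GatewayRequestBase(BaseModel):
    country: str
    version: Optional[str] = None
    # telemetry omitted for brevity.

    # Unknown keys now fail validation instead of being forwarded to the worker.
    model_config = ConfigDict(extra="forbid", populate_by_name=True)


class SimulationRequest(GatewayRequestBase):
    # Only `data` is shown; other supported options would be declared here
    # as typed attributes rather than passed through implicitly.
    data: Optional[str] = None

    @field_validator("data")
    @classmethod
    def data_must_be_allowlisted(cls, value: Optional[str]) -> Optional[str]:
        if value is not None and value not in ALLOWED_DATASET_URIS:
            raise ValueError("data must be one of the allowlisted dataset URIs")
        return value


def is_rejected(body: dict) -> bool:
    """Return True if the gateway model refuses the request body."""
    try:
        SimulationRequest.model_validate(body)
    except ValidationError:
        return True
    return False
```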
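If some passthrough really is required, the `extra_simulation_options` suggestion can be expressed with a Pydantic v2 discriminated union. `FlagOption`, `LimitOption`, their `kind` values, and the bounds on `value` are hypothetical examples, not options the worker actually supports; the point is that only a closed set of `kind` values validates, and each option model forbids extra keys.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class FlagOption(BaseModel):
    # Hypothetical boolean option.
    model_config = ConfigDict(extra="forbid")
    kind: Literal["flag"]
    enabled: bool


class LimitOption(BaseModel):
    # Hypothetical bounded integer option.
    model_config = ConfigDict(extra="forbid")
    kind: Literal["limit"]
    value: int = Field(gt=0, le=10_000)


# The "kind" discriminator restricts values to this closed set of models.
AllowedOption = Annotated[Union[FlagOption, LimitOption], Field(discriminator="kind")]


class SimulationExtras(BaseModel):
    model_config = ConfigDict(extra="forbid")
    extra_simulation_options: dict[str, AllowedOption] = Field(default_factory=dict)


def accepts(options: dict) -> bool:
    """Return True if the options dict validates against the closed set."""
    try:
        SimulationExtras.model_validate({"extra_simulation_options": options})
    except ValidationError:
        return False
    return True
```

An attempt to smuggle `data` through as a new `kind`, or as an extra key on an existing option, fails validation instead of reaching the worker.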
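For the size cap, one option is a pure ASGI middleware written with only the standard library, which FastAPI (via Starlette) can register with `app.add_middleware(BodySizeLimitMiddleware, max_bytes=...)`. The class name, the 1 MB default, and the JSON error body are assumptions. The sketch rejects early on an oversized `Content-Length`, and otherwise buffers the body up to the limit so chunked uploads without that header are bounded too. `echo_length_app` and `simulate` are hypothetical helpers that exist only to exercise the middleware in memory.

```python
import asyncio
import json


class BodySizeLimitMiddleware:
    """Pure ASGI middleware that answers 413 when the body exceeds max_bytes."""

    def __init__(self, app, max_bytes: int = 1_000_000):
        self.app = app
        self.max_bytes = max_bytes

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        # Fast path: trust a declared Content-Length to reject early.
        for name, value in scope.get("headers", []):
            if name == b"content-length":
                try:
                    declared = int(value)
                except ValueError:
                    declared = self.max_bytes + 1  # treat malformed as too large
                if declared > self.max_bytes:
                    await self._reject(send)
                    return

        # Slow path: count bytes as they arrive so chunked bodies are bounded too.
        chunks, received, more_body = [], 0, True
        while more_body:
            message = await receive()
            if message["type"] == "http.disconnect":
                return
            chunk = message.get("body", b"")
            received += len(chunk)
            if received > self.max_bytes:
                await self._reject(send)
                return
            chunks.append(chunk)
            more_body = message.get("more_body", False)

        body = b"".join(chunks)
        replayed = False

        async def replay_receive():
            # Hand the buffered body to the app once, then defer to the real channel.
            nonlocal replayed
            if not replayed:
                replayed = True
                return {"type": "http.request", "body": body, "more_body": False}
            return await receive()

        await self.app(scope, replay_receive, send)

    @staticmethod
    async def _reject(send):
        payload = json.dumps({"detail": "Request body too large"}).encode()
        headers = [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(payload)).encode()),
        ]
        await send({"type": "http.response.start", "status": 413, "headers": headers})
        await send({"type": "http.response.body", "body": payload})


async def echo_length_app(scope, receive, send):
    # Stand-in downstream app: responds 200 with the body length.
    message = await receive()
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": str(len(message.get("body", b""))).encode()})


def simulate(body: bytes, max_bytes: int, chunk_size: int = 4) -> int:
    """Drive the middleware with a chunked in-memory request; return the status."""
    middleware = BodySizeLimitMiddleware(echo_length_app, max_bytes=max_bytes)
    chunks = [body[i:i + chunk_size] for i in range(0, len(body), chunk_size)] or [b""]
    messages = [
        {"type": "http.request", "body": c, "more_body": i < len(chunks) - 1}
        for i, c in enumerate(chunks)
    ]
    sent = []

    async def receive():
        return messages.pop(0) if messages else {"type": "http.disconnect"}

    async def send(message):
        sent.append(message)

    # No Content-Length header, so only the streaming check applies.
    asyncio.run(middleware({"type": "http", "headers": []}, receive, send))
    return sent[0]["status"]
```

Buffering is acceptable here because memory use is capped at `max_bytes` per request before the app ever sees the body.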

## Severity

High, security. SSRF-like dataset substitution and data exfiltration, plus denial of service via unbounded payloads.