Budget-window job status flaps between failed and submitted on poll #448

@MaxGhenis

Summary

When a budget-window parent job fails on the Modal side, the gateway's polling endpoint mutates the seed BudgetWindowBatchState in memory only. The mutation is never persisted, so the next poll reloads an untouched seed and reports status="submitted" again. Clients see the status flap between failed and submitted.

Location

projects/policyengine-api-simulation/src/modal/gateway/endpoints.py:298-310

What goes wrong

try:
    result = call.get(timeout=0)
except TimeoutError:
    return batch_status_response(build_batch_status_response(seed_state))
except Exception as e:
    seed_state.status = "failed"
    seed_state.error = str(e)
    return batch_status_response(build_batch_status_response(seed_state))

seed_state is an instance loaded from the seed store via get_batch_job_seed (line 292). Neither put_batch_job_seed(seed_state) nor put_batch_job_state(seed_state) is called before returning. On the next /budget-window-jobs/{batch_job_id} poll:

  1. get_batch_job_state returns None, because the worker never wrote to the main store.
  2. get_batch_job_seed returns the original seed, whose status is still "submitted".
  3. call.get(timeout=0) either succeeds or raises again.
  4. The client therefore alternates between "failed" and "submitted" across polls.
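
The sequence above can be reproduced in isolation. The following is a minimal sketch, not the gateway code: BatchState, SEED_STORE, ScriptedCall, and poll are hypothetical stand-ins, the store is assumed to hand back a deserialized copy on every read, and the second poll is assumed to land in the TimeoutError branch.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchState:  # hypothetical stand-in for BudgetWindowBatchState
    batch_job_id: str
    status: str = "submitted"
    error: Optional[str] = None


SEED_STORE: dict = {}  # hypothetical stand-in for the seed store


def get_batch_job_seed(batch_job_id: str) -> BatchState:
    # Assumption: a remote store returns a fresh copy, so in-place edits
    # to the returned object never reach the stored value.
    return copy.deepcopy(SEED_STORE[batch_job_id])


class ScriptedCall:
    """Fake call handle whose get() raises the scripted exceptions in order."""

    def __init__(self, outcomes):
        self._outcomes = list(outcomes)

    def get(self, timeout=0):
        raise self._outcomes.pop(0)


def poll(batch_job_id: str, call: ScriptedCall) -> str:
    seed_state = get_batch_job_seed(batch_job_id)
    try:
        call.get(timeout=0)
    except TimeoutError:
        return seed_state.status
    except Exception as e:
        seed_state.status = "failed"  # mutated in memory only, never stored
        seed_state.error = str(e)
        return seed_state.status
    return seed_state.status  # success handling omitted in this sketch


SEED_STORE["job-1"] = BatchState("job-1")
call = ScriptedCall([RuntimeError("parent job crashed"), TimeoutError()])
first = poll("job-1", call)
second = poll("job-1", call)
stored = SEED_STORE["job-1"].status
print(first, second, stored)  # failed submitted submitted
```

The stored seed never leaves "submitted", so any poll that returns the reloaded seed unmodified reports the job as still in flight.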

Suggested fix

Persist state transitions before returning:

except Exception as e:
    seed_state.status = "failed"
    seed_state.error = str(e)
    put_batch_job_seed(seed_state)  # or put_batch_job_state
    return batch_status_response(build_batch_status_response(seed_state))
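
To illustrate the effect of the added put_batch_job_seed call, here is a minimal sketch under the same kind of assumptions: BatchState, SEED_STORE, poll, crash, and still_pending are hypothetical, and the store is modeled as a dict that copies on read and write. It is not the gateway implementation.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchState:  # hypothetical stand-in for BudgetWindowBatchState
    batch_job_id: str
    status: str = "submitted"
    error: Optional[str] = None


SEED_STORE: dict = {}  # hypothetical stand-in for the seed store


def get_batch_job_seed(batch_job_id: str) -> BatchState:
    return copy.deepcopy(SEED_STORE[batch_job_id])


def put_batch_job_seed(state: BatchState) -> None:
    SEED_STORE[state.batch_job_id] = copy.deepcopy(state)


def poll(batch_job_id: str, get_result) -> str:
    seed_state = get_batch_job_seed(batch_job_id)
    try:
        get_result()
    except TimeoutError:
        return seed_state.status
    except Exception as e:
        seed_state.status = "failed"
        seed_state.error = str(e)
        put_batch_job_seed(seed_state)  # persist before returning
        return seed_state.status
    return seed_state.status  # success handling omitted in this sketch


def crash():
    raise RuntimeError("parent job crashed")


def still_pending():
    raise TimeoutError()


SEED_STORE["job-1"] = BatchState("job-1")
statuses = [poll("job-1", crash), poll("job-1", still_pending)]
print(statuses)  # ['failed', 'failed']
```

Once the failure is written back, a later poll that hits the TimeoutError branch reloads the persisted "failed" state instead of the original seed.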

Consider consolidating on a single store so the gateway and worker cannot diverge on which dict is authoritative.
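
One possible shape for such a consolidation, offered only as a sketch: BatchStateStore and its methods are hypothetical names, and the in-process dict stands in for whatever shared backend the gateway and worker actually use. Making the state object immutable is an extra design choice here, so that a transition can only happen through the store.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class BatchState:  # hypothetical stand-in for BudgetWindowBatchState
    batch_job_id: str
    status: str = "submitted"
    error: Optional[str] = None


class BatchStateStore:
    """Hypothetical single store shared by the gateway and the worker."""

    def __init__(self) -> None:
        self._states: Dict[str, BatchState] = {}

    def get(self, batch_job_id: str) -> Optional[BatchState]:
        return self._states.get(batch_job_id)

    def put(self, state: BatchState) -> None:
        self._states[state.batch_job_id] = state

    def mark_failed(self, batch_job_id: str, error: str) -> BatchState:
        # Build the new state and persist it in one step, so a caller
        # cannot return a failure that was never stored.
        failed = replace(self._states[batch_job_id], status="failed", error=error)
        self.put(failed)
        return failed


store = BatchStateStore()
store.put(BatchState("job-1"))                    # gateway records the submission
store.mark_failed("job-1", "parent job crashed")  # gateway observes the failure
reloaded = store.get("job-1")
print(reloaded.status)  # failed
```

Because BatchState is frozen, an assignment such as reloaded.status = "submitted" raises dataclasses.FrozenInstanceError, which rules out the in-memory-only mutation described in this issue.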

Severity

High. This breaks the polling contract and any retry logic that keys off "failed".

Labels

bug