Add schema migration to supervisor-child comm#67235
Conversation
7cd6fc3 to
ec65643
Compare
There was a problem hiding this comment.
Do we need this file committed? I'd been liking the fact that we don't have generated files committed in the execution API to date.
There was a problem hiding this comment.
Hm, I guess we don’t since we would publish this somewhere for the Java SDK to consume.
On the other hand it’d be a while for us to get this merged and then have the infrastructure ready. Maybe have this for now before we set things up on airflow.apache.org so the Java SDK has somewhere to point to, and remove it after setup and Java SDK points to it instead?
There was a problem hiding this comment.
There's JSON schema for Dag serialization format airflow-core/src/airflow/serialization/schema.json, no harm to introduce explicit JSON schema snapshot for the migration schema IMHO.
There was a problem hiding this comment.
I sort of see why we have this, but it feels weird to have a the version history "duplicated" on the client side.
Don't a lot of the schemas/models implicitly have a dependency on the Exec API side versioning too?
There was a problem hiding this comment.
Yes, but there’s not an established way to forward those migrations here, especially since there are additional supervisor-only models that may change as well but aren’t tracked in the Execution API spec.
I’m open to ideas; the currently way doesn’t really provide a good way to keep shared migrations in sync.
There was a problem hiding this comment.
especially since there are additional supervisor-only models that may change as well but aren’t tracked in the Execution API spec
Exactly, this is what the migrator and the versions bundle here trying to resolve.
Yes, there're partial of data models are directly from Execution API spec. but most of them aren't define in Execution API, like all the data models defined in task-sdk/src/airflow/sdk/execution_time/comms.py.
From my perspective, no matter the data models define in Execution API or not, all the data models consumed by CommsDecoder should be versioned somewhere.
ec65643 to
d057c20
Compare
| # mode="json")`` -- an unexpected kwarg that Pydantic rejects. The | ||
| # corrected version below calls ``downgrade`` with no extra args and | ||
| # lets the resulting versioned model handle serialisation. | ||
| def _corrected_serialize_response(self, msg, **dump_opts): |
There was a problem hiding this comment.
This monkeypatch documents the supervisor bug rather than testing the real code path.
The fixture installs a corrected _serialize_response that calls m.downgrade(msg, version) without the broken dump_kwargs=dump_opts kwarg (see comment on supervisor.py:743). With this patch in place the integration test suite never exercises the actual _serialize_response -- so the production code can ship broken and the suite stays green.
Once the supervisor call site is fixed (**dump_opts instead of dump_kwargs=dump_opts), delete the monkeypatch.setattr(WatchedSubprocess, "_serialize_response", ...) and the _corrected_serialize_response definition above. The fixture still needs the migrator + registry patches, just not the _serialize_response override.
This comment was marked as resolved.
This comment was marked as resolved.
d057c20 to
71e177d
Compare
With foreign language SDKs, it may be possible the two sides of supervisor comm have different versions. This adds a migration layer at the supervisor (server) side, so an SDK (client) using a lower version of the schema may be able to communicate to the server.
71e177d to
6ea9922
Compare
With foreign language SDKs, it may be possible the two sides of supervisor comm have different versions. This adds a migration layer at the supervisor (server) side, so an SDK (client) using a lower version of the schema may be able to communicate to the server.
This adds: (subpoints describe what each component is for, but those are not covered in this PR)
execution_time.