Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensures DAG params order regardless of backend #40156

Merged
merged 4 commits into from
Jun 13, 2024

Conversation

Usiel
Copy link
Contributor

@Usiel Usiel commented Jun 10, 2024

Fixes #40154

This change adds an extra attribute to the serialized DAG param objects which helps us decide the order of the deserialized params dictionary later even if the backend messes with us.

I decided not to limit this just to MySQL since the operation is inexpensive and may turn out to be helpful (edit: Postgres is also affected).

Tests

I made sure the new test fails with the old implementation + MySQL/Postgres. I assume this test will be executed with MySQL somewhere in the build actions?

@uranusjr
Copy link
Member

Instead of needing an extra key and extra nesting, should we just serialise params as list of key-value pairs instead?

@uranusjr
Copy link
Member

Alternatively maybe we can serialise the key order under a separate key? This is proably slightly more compact in most cases as well.

@Usiel
Copy link
Contributor Author

Usiel commented Jun 11, 2024

Instead of needing an extra key and extra nesting, should we just serialise params as list of key-value pairs instead?

I like this. I disregarded it previously, because I thought it would imply changing the params field on DAG. But if we just do it for serialization purposes the only changes will be in the (de)serialization function. I can do that change in a bit.

MySQL at least seems to hold true to the standard :)

An array is an ordered sequence of zero or more values.
https://www.rfc-editor.org/rfc/rfc7159.txt

@Usiel
Copy link
Contributor Author

Usiel commented Jun 11, 2024

Serializing as a list would require some additional handling of already serialized DAGs (using a dict) though

Usiel and others added 2 commits June 11, 2024 12:52
Fixes apache#40154

This change adds an extra attribute to the serialized DAG param objects which helps us decide
the order of the deserialized params dictionary later even if the backend messes with us.

I decided not to limit this just to MySQL since the operation is inexpensive and may turn
out to be helpful.

I made sure the new test fails with the old implementation + MySQL. I assume this test will be
executed with MySQL somewhere in the build actions?
Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
@Usiel Usiel force-pushed the usiel/fix-dag-param-order-with-mysql branch from aa2af24 to 6da5cbe Compare June 11, 2024 04:55
@Usiel Usiel force-pushed the usiel/fix-dag-param-order-with-mysql branch from 6da5cbe to 8a0c5fb Compare June 11, 2024 05:13
Alternative to previous approach: We serialize the DAG params dict as a list of tuples which _should_ keep their ordering regardless of backend.

Backwards compatibility is ensured because if `encoded_params` is a `dict` (not the expected `list`) then `dict(encoded_params)` still works.
@Usiel
Copy link
Contributor Author

Usiel commented Jun 11, 2024

Apologies for the noise, was fighting with a test. I'll push another version in a bit with the incorporated comments.

Based on suggestions by @uranusjr with an additional fix to make mypy happy.
@Usiel Usiel force-pushed the usiel/fix-dag-param-order-with-mysql branch from 8a0c5fb to a1c7e31 Compare June 11, 2024 05:45
@Usiel
Copy link
Contributor Author

Usiel commented Jun 11, 2024

Ok, done. Latest push is rebased on main, includes the serialization as list and the additional suggestions.

@eladkal eladkal added this to the Airflow 2.9.3 milestone Jun 13, 2024
@eladkal eladkal added the type:bug-fix Changelog: Bug Fixes label Jun 13, 2024
@eladkal eladkal merged commit 2149b4d into apache:main Jun 13, 2024
51 checks passed
jannisko pushed a commit to jannisko/airflow that referenced this pull request Jun 15, 2024
* Ensures DAG params order regardless of backend

Fixes apache#40154

This change adds an extra attribute to the serialized DAG param objects which helps us decide
the order of the deserialized params dictionary later even if the backend messes with us.

I decided not to limit this just to MySQL since the operation is inexpensive and may turn
out to be helpful.

I made sure the new test fails with the old implementation + MySQL. I assume this test will be
executed with MySQL somewhere in the build actions?

* Removes GitHub reference

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Serialize DAG params as array of tuples to ensure ordering

Alternative to previous approach: We serialize the DAG params dict as a list of tuples which _should_ keep their ordering regardless of backend.

Backwards compatibility is ensured because if `encoded_params` is a `dict` (not the expected `list`) then `dict(encoded_params)` still works.

* Make backwards compatibility more explicit

Based on suggestions by @uranusjr with an additional fix to make mypy happy.

---------

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
utkarsharma2 pushed a commit that referenced this pull request Jul 2, 2024
* Ensures DAG params order regardless of backend

Fixes #40154

This change adds an extra attribute to the serialized DAG param objects which helps us decide
the order of the deserialized params dictionary later even if the backend messes with us.

I decided not to limit this just to MySQL since the operation is inexpensive and may turn
out to be helpful.

I made sure the new test fails with the old implementation + MySQL. I assume this test will be
executed with MySQL somewhere in the build actions?

* Removes GitHub reference

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Serialize DAG params as array of tuples to ensure ordering

Alternative to previous approach: We serialize the DAG params dict as a list of tuples which _should_ keep their ordering regardless of backend.

Backwards compatibility is ensured because if `encoded_params` is a `dict` (not the expected `list`) then `dict(encoded_params)` still works.

* Make backwards compatibility more explicit

Based on suggestions by @uranusjr with an additional fix to make mypy happy.

---------

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
(cherry picked from commit 2149b4d)
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024
* Ensures DAG params order regardless of backend

Fixes apache#40154

This change adds an extra attribute to the serialized DAG param objects which helps us decide
the order of the deserialized params dictionary later even if the backend messes with us.

I decided not to limit this just to MySQL since the operation is inexpensive and may turn
out to be helpful.

I made sure the new test fails with the old implementation + MySQL. I assume this test will be
executed with MySQL somewhere in the build actions?

* Removes GitHub reference

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Serialize DAG params as array of tuples to ensure ordering

Alternative to previous approach: We serialize the DAG params dict as a list of tuples which _should_ keep their ordering regardless of backend.

Backwards compatibility is ensured because if `encoded_params` is a `dict` (not the expected `list`) then `dict(encoded_params)` still works.

* Make backwards compatibility more explicit

Based on suggestions by @uranusjr with an additional fix to make mypy happy.

---------

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DAG params ordering goes by key string length for some backends
5 participants