prevent templated field logic checks in operators __init__ in BigQueryToPostgresOperator operator #36491

Merged
merged 1 commit into from
Dec 29, 2023

Conversation

romsharon98
Collaborator

related: #36484

Fix BigQueryToPostgresOperator for this cherry-pick: #33786



@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Dec 29, 2023
@potiuk potiuk merged commit f070efa into apache:main Dec 29, 2023
52 checks passed
@romsharon98 romsharon98 deleted the fix/bigqeury branch January 2, 2024 12:26
@@ -36,8 +34,6 @@ class BigQueryToPostgresOperator(BigQueryToSqlBaseOperator):
:param postgres_conn_id: Reference to :ref:`postgres connection id <howto/connection:postgres>`.
"""

template_fields: Sequence[str] = (*BigQueryToSqlBaseOperator.template_fields, "dataset_id", "table_id")
Contributor

Isn't this a breaking change?

I think the real issue is with:

    try:
        self.dataset_id, self.table_id = dataset_table.split(".")
    except ValueError:
        raise ValueError(f"Could not parse {dataset_table} as <dataset>.<table>") from None

and it will affect all operators that inherit from the base class
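
To illustrate why a split in `__init__` and templating do not mix, here is a minimal sketch. The helper name `parse_dataset_table` and the sample values are hypothetical; only the parsing logic is taken from the snippet quoted above. It assumes the value is still an unrendered Jinja string when the constructor runs.

```python
from __future__ import annotations


def parse_dataset_table(dataset_table: str) -> tuple[str, str]:
    """Split "<dataset>.<table>" the same way the quoted snippet does."""
    try:
        dataset_id, table_id = dataset_table.split(".")
    except ValueError:
        raise ValueError(f"Could not parse {dataset_table} as <dataset>.<table>") from None
    return dataset_id, table_id


# A plain value contains exactly one dot and splits cleanly.
plain = parse_dataset_table("my_dataset.my_table")

# An unrendered Jinja expression carries extra dots of its own, so the same
# parse raises if it runs before the template has been rendered.
templated = "{{ params.dataset }}.{{ params.table }}"
try:
    parse_dataset_table(templated)
    templated_failed = False
except ValueError:
    templated_failed = True
```

Here the plain value parses, while the templated one fails at construction time, before any rendering could take place.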

Collaborator

@shahar1 shahar1 Jan 2, 2024

Isn't this a breaking change?

I think the real issue is with:

    try:
        self.dataset_id, self.table_id = dataset_table.split(".")
    except ValueError:
        raise ValueError(f"Could not parse {dataset_table} as <dataset>.<table>") from None

and it will affect all operators that inherit from the base class

Come to think of it, it might indeed be breaking, as fields that don't exist in the parent's template_fields are removed from the child's definition.
I suggest reverting it for now.
@romsharon98 instead of deleting this line, try to hardcode all of the values that should be templated, and see if it works (a bit ugly, but I don't have a better idea for now).

Collaborator Author

Thanks for your insight @eladkal!
Can you help me understand why this is a breaking change?

This is how I understand it:
Let's assume I revert the PR, so both "dataset_id" and "table_id" are templated fields for BigQueryToPostgresOperator.

But the parent constructor (BigQueryToSqlBaseOperator) always overwrites them via the line you mentioned.

So, as I understand it, the reverted line has no effect.

Member

@potiuk potiuk Jan 5, 2024

SCRATCH_THAT:

I think we should not treat it as breaking (or at least as I understand it here).

I think the only scenario where it would matter is:

  1. Someone creates a custom operator derived from BigQueryToPostgresOperator
  2. The same someone adds new fields there "dataset_id" and "table_id"
  3. And expects them to be templated.

Even if it worked previously, that was accidental and unintended, and we should treat this change as a bug fix. If someone adds new fields in a derived operator, it's their responsibility to add those fields to the templated fields.

While this change might technically break someone's implementation, IMHO we should treat it as a bug fix because:

a) it's a very low chance this will happen
b) while we are technically breaking something, we are bringing things back to how they were intended to work. Having those fields in this operator was accidental, not intentional

I will repeat it for as long as it takes to stick: SemVer and the "breaking" classification are not about whether something is "technically" broken but about whether our intentions changed. If we applied the "breaking change" label to every change that alters behaviour, then pretty much every single bug fix would be "technically" breaking, because it changes behaviour.

UPDATE: I just realized I missed the parent class. Let me revise it.

Member

@eladkal is right - but we should not revert it - instead we should add those two fields to the base class.

Collaborator

@shahar1 shahar1 Jan 5, 2024

@eladkal is right - but we should not revert it - instead we should add those two fields to the base class.

Sounds good to me.
A note from the technical perspective of the validation pre-commit:
As the validation is currently based on very simplified AST parsing, it would be better for now to define the fields directly (i.e., template_fields = ['a', 'b']), rather than relying on the parent's fields (i.e., template_fields = (*ParentClass.template_fields, 'b')); otherwise the validation might fail.
The cost would be a minor abuse of inheritance, which can be fixed later.
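
As a rough illustration of why a simplified AST-based check prefers literal definitions, the sketch below reads `template_fields` straight from source text. This is not the project's actual pre-commit hook; the function `literal_template_fields` and the classes in `SOURCE` are hypothetical, and the sketch assumes the check can only evaluate plain literals.

```python
import ast

# Two hypothetical class bodies: one literal, one unpacking a parent's fields.
SOURCE = '''
class Explicit:
    template_fields = ("dataset_id", "table_id")

class Derived(Parent):
    template_fields = (*Parent.template_fields, "dataset_id", "table_id")
'''


def literal_template_fields(source):
    """Map class name -> template_fields tuple, or None if not a plain literal."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        for stmt in node.body:
            # Handle both `x = ...` and annotated `x: T = ...` assignments.
            if isinstance(stmt, ast.Assign):
                targets, value = stmt.targets, stmt.value
            elif isinstance(stmt, ast.AnnAssign) and stmt.value is not None:
                targets, value = [stmt.target], stmt.value
            else:
                continue
            if not any(isinstance(t, ast.Name) and t.id == "template_fields" for t in targets):
                continue
            try:
                result[node.name] = tuple(ast.literal_eval(value))
            except ValueError:
                # Starred unpacking of a parent's attribute is not a literal,
                # so a purely static check cannot resolve it.
                result[node.name] = None
    return result


fields = literal_template_fields(SOURCE)
```

The literal definition is resolved without importing anything, while the definition that unpacks the parent's fields cannot be evaluated statically and comes back as `None`.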

Member

Correct. This is really what I also proposed: to move template_fields = ['dataset_id', 'table_id'] to BigQueryToSqlBaseOperator. This is where they belong, and this is what will make them consistent with the AST check.
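
A minimal sketch of that proposal, using stand-in classes rather than the real Airflow operators. The class names, the constructor signature (the real operator takes a combined dataset_table value), and `render_template_fields` are all hypothetical, and `string.Template` stands in for Jinja rendering purely for illustration.

```python
from string import Template
from typing import Sequence


class BigQueryToSqlBaseSketch:
    # Declared once, as a plain literal, on the base class.
    template_fields: Sequence[str] = ("dataset_id", "table_id")

    def __init__(self, dataset_id: str, table_id: str) -> None:
        self.dataset_id = dataset_id
        self.table_id = table_id


class BigQueryToPostgresSketch(BigQueryToSqlBaseSketch):
    """Inherits template_fields without redeclaring it."""


def render_template_fields(op, context):
    # Toy renderer: substitute $name placeholders in every templated attribute.
    for name in op.template_fields:
        setattr(op, name, Template(getattr(op, name)).substitute(context))


op = BigQueryToPostgresSketch(dataset_id="$dataset", table_id="events")
render_template_fields(op, {"dataset": "sales"})
```

With the fields declared on the base class, every subclass picks them up through ordinary inheritance, and the definition stays a literal that a static check can read.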

Contributor

Followup PR #36663

Labels
area:providers provider:google Google (including GCP) related issues

4 participants