Allow for templated args to be passed to the result_processor function in the GithubSensor fixing templated tags in GithubTagSensor#29842
Allow for templated args to be passed to the result_processor function in the GithubSensor fixing templated tags in GithubTagSensor#29842TJaniF wants to merge 27 commits intoapache:mainfrom TJaniF:githubsensor_update
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
|
What are the incompatibilities if this is enabled by default, or always enabled? Since this is from a provider, we can break backward compatibility more freely if we think the new behaviour is significantly better. |
|
@uranusjr if I change the default of For example this is the error that happens if the current GitHubTagSensor inherits from the BaseGithubRepositorySensor/GitHubSensor with the changes. |
|
The breakage sounds reasonable if we bump the GitHub provider to 3.0.0. So the question would be, is having the kwarg default to true a good idea? Let’s bump if it is. |
so lets do a major release. also please add changelog entry with breaking change under 3.0.0 version in You can see example here: |
|
@uranusjr @eladkal Thank you for the reviews! I just realized there is a second thing to consider if the default for Users who are using the GitHubSensor directly and pass a I have an idea for a solution that would not break any current direct use of the GithubSensor while making the GitHubTagSensor I think this change would not be breaking anything, at least what I can think of. I pushed that version now to compare, let me know which version you think works better. :) |
Only if they use inputs that look like templates but actually are not, I think? Because otherwise the input can still be template-rendered and would not change anyway. (It might be ever so slightly slower but shouldn’t break.) That feels like a very specific edge case to me. |
I'm sorry I think I explained this poorly. 😅 With the first version of this PR (commit fe97f18 the following task fails: def max_10_branches(repo):
result = None
try:
if repo is not None:
branches = repo.get_branches()
if branches.totalCount <= 10:
result = True
else:
result = False
except Exception as e:
raise AirflowException(f"Github sensor error: {str(e)}")
return result
using_first_version_githubsensor = FIRST_VERSION_GithubSensor(
task_id="first_version_sensor",
github_conn_id='github_default',
method_name="get_repo",
method_params={'full_name_or_id': GITHUB_REPOSITORY},
result_processor=lambda repo: max_10_branches(repo),
allow_templates_in_result_processor=True
)with this is why I think setting The current version of the PR prevents this from happening and means the GithubSensor can be used exactly the same as it is right now, the only change is that operators inheriting from it can now use templated args in their processor function if the parameter |
|
There’s precedence of using |
…ult_processor function
|
Thank you @uranusjr ! I made the changes, now |
| ): | ||
| return self.result_processor(github_result, templated_fields=templated_fields) | ||
| if self.allow_templates_in_result_processor: | ||
| self.log.info("To use templated fields in your `result_processor` function, provide them as a dict to a `templated_fields` parameter in your `result_processor` function.") |
There was a problem hiding this comment.
This line is too long thus failing the static checks
There was a problem hiding this comment.
Thank you!
I did the changelog and provider.yaml entry now for a 2.2.2 version since at least as far as I can tell this change should now not break anything :)
Is there anything else I should be doing? (this is my first time committing to Airflow)
| 2.2.2 | ||
| ..... | ||
|
|
||
| Bug Fixes | ||
| ~~~~~~~~~ | ||
|
|
||
| * ``Allow for 'tag_name' in GithubTagSensor to be templated (#29842)`` | ||
|
|
There was a problem hiding this comment.
We are adding manual entry to changelog only in breaking changes (as in that case the commit message is not enough to explain what action users should take) since now the change is backward compatible you can remove this
|
|
||
| suspended: false | ||
| versions: | ||
| - 2.2.2 |
|
Thank you @eladkal! I removed the patch version bump and the changelog entry. :) |
| self.method_params = method_params | ||
|
|
||
| def poke(self, context: Context) -> bool: | ||
| def poke(self, context: Context, templated_fields: dict | None = None) -> bool: |
There was a problem hiding this comment.
Question: Maybe I do not understand someting, but how do you plan to use the "templated_fields" here? From what I understand, poke method is called by airflow without those optional "templated_fields", and when you use sensor in your DAG "as is" you cannot really pass those templated_fields here..
Am I missing something?
potiuk
left a comment
There was a problem hiding this comment.
I have a question about the usage of the sensor - I cannot understand how the new parameter will be used.
Also - regardless - this PR does not have any tests and it adds substantial code, this gives no regression protection capabilities so it needs some tests to be mergable.
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
|
Note: I did not forget about this, just trying to find the time to figure out enough about testing to add the requested tests :) and to remember/retest what I did here in the first place 👀 |
|
Closing this because at this point it is far behind main and I expect it will take a while until I have time to sit down and properly learn how to write tests for this and assess if this is really the best of of solving it, sorry about that! 😓 I'll re-PR it once I do and link to this PR. |
When using a templated
tag_namein the current GithubTagSensor, the templated argument will not correctly be passed to the call to the GithubSensor.poke method to be used in thetag_checkermethod that is passed in as theresult_processorfunction.The result is that the sensor is looking for a tag of the format of the string template as shown in the logs:
This PR adds an
allow_templates_in_result_processorparameter to the GithubSensor, False by default to allow for backwards compatibility for users who have built custom operators on top of the GithubSensor. Ifallow_templates_in_result_processoris set to True as the GithubTagSensor now does by default, a dict containing the values of all templated arguments can be passed to the call toGithubSensor.pokeand within that method toself.result_processor.Within the GithubTagSensor the
tag_checkermethod now pulls thetag_namefrom this passed dictionary instead of using self.tag_name, which was retrieving the template-string before.I also added
(BaseGithubRepositorySensor can't be used directly to create Airflow tasks)to the AirflowException raised when using the BaseGithubRepositorySensor directly in a DAG, because theOverride me.message alone might be confusing for new Airflow users accidentally using this sensor directly.I'm of course very happy to take suggestions on how to better solve this (and better naming 😅 ). I did not make any changes to the test running on the GitHubTagSensor yet because I am not sure how to mock templating in a unittest.