
add missing read for K8S config file from conn in deferred KubernetesPodOperator #29498

Merged
merged 7 commits into apache:main from fix/deferrable_k8s_pod_op on Apr 22, 2023

Conversation

hussein-awala
Member

closes: #29488


The async execute method of KubernetesPodOperator doesn't check whether the config_path is provided in the connection extra; this PR fixes this by extracting the config path in order to read it and convert it to a dictionary.
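
For illustration, a minimal sketch of the idea (a hypothetical helper, not the exact PR code), assuming PyYAML is available: resolve the config path - from the operator argument or the connection extra - and convert the file to a dict that can be shipped to the trigger.

    from typing import Optional

    import yaml

    def config_file_to_dict(config_path: Optional[str]) -> Optional[dict]:
        # No path resolved: nothing to convert.
        if not config_path:
            return None
        # Read the kubeconfig file and parse it into a plain dict.
        with open(config_path) as f:
            return yaml.safe_load(f)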

@hussein-awala hussein-awala changed the title [WIP] add missing read for K8S config file from conn in deferred KubernetesPodOperator add missing read for K8S config file from conn in deferred KubernetesPodOperator Feb 13, 2023
@@ -565,7 +565,16 @@ def execute_async(self, context: Context):

     def convert_config_file_to_dict(self):
         """Converts passed config_file to dict format."""
-        config_file = self.config_file if self.config_file else os.environ.get(KUBE_CONFIG_ENV_VAR)
+        config_file = None
Contributor

Thanks @hussein-awala for proposing this fix.

Why does the async path need the function convert_config_file_to_dict and not the sync one?

It looks like the async path was implemented without fully following this pattern -> #20578

Your PR fixes the problem for the extra config_path, but there is a risk that another option is missing, or a new one added in the future, would need a "manual" fix like this.

Member Author

I am not sure about the initial reason to convert the file into a dictionary before creating the trigger; it may be to avoid copying the config file to the triggerer, since the pod is created on the worker using the sync hook while the waiting task runs on the triggerer and uses the async hook.

there is a risk that another option is missing, or a new one added in the future, would need a "manual" fix like this

With this fix, we cover all options currently available to provide the configuration file, and yes, if we add a new one in the future, we must add it to the sync hook and to this method.

@VladaZakharova can you please explain the motivation for converting the config file to a dictionary before creating the trigger?

Contributor

Hi Team!
This was implemented so that the config file would be converted to a dict to be passed to the trigger, and then to the hook to establish the connection.

Contributor

What do you mean by lighten the credential management?

Isn't the hook re-instantiated at every run of the trigger?

Contributor

We needed a way to pass the config file to the trigger to create a Kubernetes client, but using the file system to communicate with the trigger was not a good solution. So we added the possibility to pass all config file parameters as a dict.

Member Author

To respect the pattern mentioned by @raphaelauv, I will try loading the config file in the async hook; this should work since the triggerer is initiated once.
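
A sketch of what that could look like, assuming the kubernetes_asyncio client (illustrative, not the merged code): pass the file path to the trigger and load it there.

    from kubernetes_asyncio import client, config

    async def get_api_client(config_file: str) -> client.ApiClient:
        # load_kube_config is a coroutine in kubernetes_asyncio, so the
        # config file is loaded on the triggerer side, inside the event loop.
        await config.load_kube_config(config_file=config_file)
        return client.ApiClient()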

Contributor

Please mind that all FS operations are blocking side effects. They violate the asyncio contract and can cause additional error logs warning about blocking code.
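
One common way to avoid that (a sketch, not the PR's code) is to push the blocking read onto a thread executor so the event loop stays responsive:

    import asyncio

    def _read_file(path: str) -> str:
        # Blocking I/O is fine here: it runs in an executor thread.
        with open(path) as f:
            return f.read()

    async def read_file_async(path: str) -> str:
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, _read_file, path)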

@potiuk
Member

potiuk commented Feb 20, 2023

@hussein-awala I guess you will still be changing the config access pattern on that one? Do I understand correctly?

@hussein-awala
Member Author

I guess you will still be changing the config access pattern on that one? Do I understand correctly?

Yes, I'm testing loading the config file in the triggerer instead of loading it in the worker and passing it as a dict.

I've converted the PR to a draft until I finish testing.

@hussein-awala hussein-awala marked this pull request as draft February 20, 2023 09:56
@VladaZakharova
Contributor

VladaZakharova commented Feb 20, 2023

Hi!
May I ask in which format you will pass the config file to the trigger? Will it just be a file passed as a parameter to the trigger? Or how?

@hussein-awala
Member Author

Hi! May I ask in which format you will pass the config file to the trigger? Will it just be a file passed as a parameter to the trigger? Or how?

@VladaZakharova - Yes, I pass the file path and let the triggerer load it. Can you check my last commit?

BTW, I am not sure whether loading the config file from the env var KUBECONFIG is a good idea, because it's difficult to decide when we need to load it and when we don't.

@raphaelauv
Contributor

Loading the config file from the env var KUBECONFIG is deprecated in the latest provider version.

@hussein-awala hussein-awala marked this pull request as ready for review February 22, 2023 00:55
Contributor

@raphaelauv left a comment

LGTM

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Apr 11, 2023
@raphaelauv
Contributor

@hussein-awala the PR has conflicts; could you rebase on main? Thank you 👍

@github-actions github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Apr 14, 2023
@hussein-awala hussein-awala force-pushed the fix/deferrable_k8s_pod_op branch 4 times, most recently from 839b3c1 to 59d76b8 Compare April 14, 2023 23:40
Comment on lines 566 to 573
def convert_config_file_to_dict(self):
"""Converts passed config_file to dict format."""
config_file = self.config_file if self.config_file else os.environ.get(KUBE_CONFIG_ENV_VAR)
if config_file:
with open(config_file) as f:
self._config_dict = yaml.safe_load(f)
else:
self._config_dict = None
Contributor

Is removing this function considered a breaking change?

Member Author

In my opinion, this method is meant to be a private method, since it only updates some attributes on the class instance without returning any value. However, it's possible that someone could extend the operator class and use it. Should we deprecate it and remove it in the next major release, or should we add a breaking-change note?

Contributor

So let's deprecate it first, just to be on the safe side.
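
A sketch of the agreed approach (illustrative, not the merged code): keep the method, but emit a DeprecationWarning so extenders get notice before removal in the next major release.

    import warnings

    def convert_config_file_to_dict(self):
        """Converts passed config_file to dict format (deprecated)."""
        warnings.warn(
            "convert_config_file_to_dict is deprecated and will be removed "
            "in a future major release.",
            DeprecationWarning,
            stacklevel=2,
        )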

@potiuk
Member

potiuk commented Apr 22, 2023

LGTM. @eladkal ?

@potiuk potiuk merged commit b5296b7 into apache:main Apr 22, 2023
43 checks passed
howardyoo pushed a commit to howardyoo/airflow that referenced this pull request Mar 31, 2024
This is safe as Jinja template will not validate with s3 rules

Small quotation fix (#30448)

Co-authored-by: bugraozturk <bugra.ozturk@mollie.com>

Merge DbtCloudJobRunAsyncSensor logic to DbtCloudJobRunSensor (#30227)

* feat(providers/dbt): move the async execution logic from DbtCloudJobRunAsyncSensor to DbtCloudJobRunSensor

* test(providers/dbt): add test cases for DbtCloudJobRunSensor when its deferrable attribute is set to True

* docs(providers/dbt): update the doc for DbtCloudJobRunSensor deferrable mode and DbtCloudJobRunAsyncSensor deprecation

* refactor(providers/dbt): deprecate poll_interval argument

* docs(providers/dbt): add deprecation note as DbtCloudJobRunAsyncSensor docstring

* fix(providers/dbt): check whether timeout is in kwargs

* docs(providers/dbt): add missing deferrable=True in howto_operator_dbt_cloud_run_job_sensor_defered

Collect test types in CI in parallel (#30450)

One of the steps in our CI is to collect tests, which is to prevent
parallel tests from even starting when we know test collection will
fail (it is a terrible waste of resources to start 20 test jobs and
initialize databases etc. when we know it is not needed).

This however introduced a single point of delay in the CI process,
which, with the recent collection protection implemented in #30315, piled
up to more than 5 minutes occasionally on our CI machines,
especially on public runners.

This PR utilises our existing test framework to be able to parallelise
test collection (Pytest does not have a parallel collection
mechanism) - also, for localised PRs it will only run test collection
for the test types that are going to be executed, which will speed it
up quite a lot.

This might get further sped up if we split Provider tests into
smaller groups to parallelise them even more.

remove stray parenthesis in spark provider docs (#30454)

Rename --test-types to --parallel-test-types parameters (#30424)

The --test-type and --test-types parameters were very similar, but
they have enough differences to differentiate them even more:

The --test-type is specifically to run a single test type, and it might
include not only the regular "test-types" but also allow for
cross-selection of tests from different other types (for example
--test-type Postgres will run tests for the Postgres database, and
they might come from Providers, Others, CLI etc.).

Whereas --test-types was generally foreseen to be able to split
the tests into "separated" groups that could be run in
parallel.

The parameters have different defaults and even a different choice
of test types that you could choose from (and --test-types is a
space-separated one to make it easier to pass around in CI,
where rather than passing multiple (a variable number of) parameters,
it's easier to pass a single, even space-separated, list of tests
to run).

This change is good to show the difference between the parameters
and to stress that they are really quite different; it also makes
it easier to avoid confusion people might have, especially since the
name was easy to make a typo in.

In a way (but different than in the original issue) it
Fixes: #30407

Fix cloud build async credentials (#30441)

Fix bad merge conflict on test-name-parameter-change (#30456)

We've added a new reference to test-types in #30450 and it clashed
with parameter rename in #30424. This resulted in bad merge
(not too dangerous, just causing missing optimisation in collection
elapsed time in case only a subset of test types were to be executed.

Add description of the provider suspension process (#30359)

Following discussion at the devlist, we are adding description
of the suspension process for providers that hold us back from
upgrading old dependencies. Discussion here:

https://lists.apache.org/thread/j98bgw9jo7xr4fvjh27d6bfoyxr1omcm

Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com>

fix: upgrade moment-timezone package to fix Tehran tz (#30455)

Discovery safe mode toggle comment clarification (#30459)

Fix Breeze failing with error on Windows (#30464)

When breeze is run on Windows it fails with FileNotFoundException
when running uname during the emulation check. This is now fixed, alongside
fixing a TimeoutError misplacement - after moving it to a local import,
an exception triggered before importing it caused an UnboundLocalError.

Related: https://github.com/apache/airflow/pull/30405#issuecomment-1496414377

Fixes: #30465

Update MANIFEST_TEMPLATE.in.jinja2 (#30431)

* Update MANIFEST_TEMPLATE.in.jinja2

* remove google

* remove README.md

Add mechanism to suspend providers (#30422)

As agreed in https://lists.apache.org/thread/g8b3k028qhzgw6c3yz4jvmlc67kcr9hj
we introduce a mechanism to suspend providers from our suite of providers
when they are holding us back on older versions of dependencies.

A provider's suspension is controlled from a single `suspend` flag in
`provider.yaml` - this flag is used to generate
providers_dependencies.json in the generated folders (a provider is skipped
if it has the `suspend` flag set to `true`). This is enough to exclude the
provider from the extras of airflow and (automatically) from being
used when the CI image is built and constraints are being generated,
as well as from provider documentation generation.
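
A hypothetical sketch of how such a flag can drive generation (not the actual Airflow tooling; the iter_active_providers helper is invented here for illustration), assuming PyYAML:

    import yaml

    def iter_active_providers(provider_yaml_paths):
        for path in provider_yaml_paths:
            with open(path) as f:
                meta = yaml.safe_load(f)
            # Suspended providers are excluded from generated metadata.
            if meta.get("suspend", False):
                continue
            yield meta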

Also several parts of the CI build use the flag to filter out
suspended providers:

* verification of provider.yaml files in pre-commit is skipped
  in terms of importing and checking if classes are defined and
  listed in the provider.yaml
* the "tests" folders for providers are skipped automatically
  if the provider has "suspend" = true set
* in case a PR aims to modify a suspended provider's directory
  tree (when it is not a global provider refactor), selective checks
  will detect it and fail such a PR with an appropriate message suggesting
  to fix the reason for the suspension first
* documentation build is skipped for suspended providers
* mypy static checks will skip suspended provider folders, while we
  will still run ruff checks on them (unlike mypy, ruff does not
  expect the libraries that it imports to be available, and we are
  running ruff in a separate environment where no airflow dependencies
  are installed anyway)

Add docs for livy deferrable operator (#30397)

* Add docs for livy deferrable

* Add docs for livy deferrable

* Apply review suggestions

* Fix example DAG

add clarification about timezone aware dags (#30467)

* add clarification about timezone aware dags

Fix typo on index.rst file (#30481)

A duplicate word has been removed.

add template field for s3 bucket (#30472)

Fix typo in outputs in parallel-test-types (#30482)

The typo causes unnecessary delays on building regular PRs :(
It was introduced in #30424

Support serialization to Pydantic models in Internal API (#30282)

* Support serialization to Pydantic models in Internal API.

* Added BaseJobPydantic support and more tests

Add a new parameter for base sensor to catch the exceptions in poke method (#30293)

* add a new parameter for base sensor to catch the exception in poke method

* add unit test for soft_fail parameter

Allow to set limits for XCOM container (#28125)

Add AWS deferrable BatchOperator (#29300)

This PR donates the BatchOperator deferrable mode developed in the [astronomer-providers](https://github.com/astronomer/astronomer-providers) repo to Apache Airflow.

Update dead link in Sentry integration document (#30486)

* Update dead link in Sentry integration document

* fix

Add more info to quicksight error messages (#30466)

Revert "Add AWS deferrable BatchOperator (#29300)" (#30489)

This reverts commit 77c272e6e8ecda0ce48917064e58ba14f6a15844.

Fix output to outputs typos in ci.yaml everywhere (#30490)

(Facepalm) The typo of output -> outputs from #30482 was also in
ci.yaml, where it was used, and it was missed in that PR.

I can blame GitHub Actions' stupid choice of accepting typoed
names of outputs and replacing them with blank strings (which I
raised as an issue a long time ago)

Reformat chart templates part 3 (#30312)

Move Pydantic classes for ORM objects to serialization (#30484)

The Pydantic classes are really part of the serialization
mechanism and they should be moved there, rather than kept in
the core packages they serialize, following our serialization
approach.

Separate mypy pre-commit checks (#30502)

Previously all mypy pre-commit checks were run as one "run-mypy"
check, but that did not allow running them separately for a specific
part of the sources when trying to fix some of them.

This PR splits them into "dev", "core", "providers" and "docs".

Avoid logging sensitive information in triggerer job log (#30110)

* Change trigger name to task id instead of repr(trigger) to avoid logging sensitive information

Allow specifying a `max_depth` to the `redact()` call (#30505)

The default was hard-coded as 5, which is suitable for log
redacting, but the OpenLineage PR would like to be able to use a greater
depth.
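
An illustrative sketch of depth-limited redaction (not the Airflow implementation; the sensitive-key check here is invented for the example):

    def redact(item, depth=0, max_depth=5):
        # Beyond max_depth, return the value as-is instead of recursing further.
        if depth > max_depth:
            return item
        if isinstance(item, dict):
            return {
                k: "***" if k.lower() in {"password", "secret"}
                else redact(v, depth + 1, max_depth)
                for k, v in item.items()
            }
        if isinstance(item, (list, tuple)):
            return type(item)(redact(v, depth + 1, max_depth) for v in item)
        return item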

Fix deprecation warning in `example_sensor_decorator` DAG (#30513)

Put AIP-44 internal API behind feature flag (#30510)

This includes:

* configurable setting with defaults taken from env variable
* raising exception if config variables are used with feature
  flag not enabled
* hiding config values (adding mechanism to hide config values
  that are set for the future versions)
* skipping tests

Summarize skipped tests after tests are run (#30520)

When Pytest runs tests it provides a summary of the tests. We are
running a lot of tests, so we are really interested only in cases
that are "interesting". So far we were not showing "skipped" tests
in the summary, because there were cases where a lot of tests
were skipped (mostly when integration tests were run - we collected
tests from the "tests" folder and ran only those tests that were not
skipped by the @integration mark).

This however changed in #28170 as we moved all integration
tests to the "integration" subfolder, and now instead of a large number of
skipped tests we run them selectively for each integration.

This should help in verifying that the skipped tests were skipped
for a good reason (and that we actually see which tests have been
skipped).

Add more type hints to the code base (#30503)

* Fully type Pool

Also fix a bug where create_or_update_pool silently fails when an empty
name is given. An error is raised instead now.

* Add types to 'airflow dags'

* Add types to 'airflow task' and 'airflow job'

* Improve KubernetesExecutor typing

* Add types to BackfillJob

This triggers an existing typing bug that pickle_id is incorrectly typed
as str in executors, while it should be int in practice. This is fixed
to keep things straight.

* Add types to job classes

* Fix missing DagModel case in SchedulerJob

* Add types to DagCode

* Add more types to DagRun

* Add types to serialized DAG model

* Add more types to TaskInstance and TaskReschedule

* Add types to Trigger

* Add types to MetastoreBackend

* Add types to external task sensor

* Add types to AirflowSecurityManager

This uncovers a couple of incorrect type hints in the base
SecurityManager (in fab_security), which are also fixed.

* Add types to views

This slightly improves how view functions are typechecked and should
prevent some trivial bugs.

Type related import optimization for Executors (#30361)

Move some expensive typing related imports to be under TYPE_CHECKING

Fix link to pre-commit-hook section (#30522)

* Change static link

* Update LOCAL_VIRTUALENV.rst

Do not use template literals to construct html elements (#30447)

Enable AIP-44 and AIP-52 by default for development and CI on main (#30521)

* Enable AIP-44 and AIP-52 by default for development and CI on main

The AIP-44 and AIP-52 features are controlled now by environment variables;
however, those variables were not passed by default inside the
docker-compose environment, so they had no effect when set in
ci.yaml. This PR fixes it, but it also sets the variables to
be enabled by default in the Breeze environment and when the tests
are run locally on main using a local venv, so that contributors
are not surprised when they try to reproduce local failures.

In the 2.6 branch, we will set both variables to "false" by default
in ci.yml, so that the tests are not run when we cherry-pick the changes.

* Update scripts/ci/docker-compose/devcontainer.env

Run "api_internal" tests in CI (#30518)

* Run "api_internal" tests in CI

While adding a feature flag for AIP-44 I noticed that, due to some
weird naming we have in tests, the "api_internal" tests were
actually excluded from running - this was due to a combination of
factors:

* When API tests are run, only "api" and "api_connexion" were
  added to API_tests
* We already have an "api" folder in "tests" (for the experimental api)
* finding "Other" tests should cover it, but it excluded "api" tests:
  the way it is implemented, it took the "api" prefix and excluded
  all the test directories that were starting with "api" (including
  the "api_internal/*" ones)

This change addresses it twofold:

* The "api_internal" tests are added explicitly to the "API"
  test type
* The "tests/api" folder with tests for the experimental API
  has been renamed to "api_experimental" (including integration
  tests)

This should set the "internal_api" tests to run in the API test
type, and renaming "api" to api_experimental should avoid
accidentally skipping the tests in case someone adds
"tests/api_SOMETHING" in the future.

* Update Dockerfile.ci

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

* Update entrypoint_ci.sh

---------

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

`EmailOperator`: fix wrong assignment of `from_email` (#30524)

* `EmailOperator`: fix wrong assignment of `from_email`

Add asgiref as a core dependency (#30527)

We added asgiref to core a few months back for the `sync_to_async` in
airflow.triggers.external_task.

Although the core http provider has depended on asgiref since v4.2, it is
possible to have an older version of http installed, meaning that you end up
without asgiref, which leads to every dag failing to parse, as the
"dependency detector" code inside the DAG Serializer ends up importing
this module!

improve first PR bot comment (#30529)

Put AIP-52 setup/teardown tasks behind feature flag (#30509)

We aren't going to land AIP-52 in time for 2.6, so put the authoring api
behind a feature flag. I've chosen to put it in `airflow.settings` so
users can set it in `airflow_local_settings`, or set it via env var.

Add tests to PythonOperator (#30362)

* Add tests to airflow/operators/python.py

* Convert log error of _BasePythonVirtualenvOperator._read_result() into a custom exception class

* Improve deserialization error handling

---------

Co-authored-by: Shahar Epstein <shahar1@live.com>
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Make AWS RDS operator page title consistent (#30536)

This PR changes the title of the RDS documentation page from "Amazon Relational Database Service Documentation (RDS)" to "Amazon Relational Database Service (RDS)". This page was the only one with the word "Documentation" in its title, and several other services had a similar title format of ("Amazon <full service name> (<acronym>)"), for example ["Amazon Simple Notification Service (SNS)"](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/operators/sns.html).

Revert "Add tests to PythonOperator (#30362)" (#30540)

This reverts commit b4f3efd36a0566ef9d34baf071d935c0655a02ef.

Add new known warnings after dependencies upgrade. (#30539)

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Fix AzureDataFactoryPipelineRunLink get_link method (#30514)

Use default connection id for KubernetesPodOperator (#28848)

* Use default connection id for KubernetesPodOperator

---------

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Fix dynamic imports in google ads vendored in library (#30544)

This is a further fix to vendor-in the google-ads library:

* moving from _vendor to google_vendor is needed to keep
  the tree structure separate from the `google` package used
  in the google provider. If we do not do that, the ads library
  will find the "google" package when walking up the tree
  and will determine that it is the "top" google package

* dynamic imports of the ads library have been updated
  to also import from the vendored-in library

* only the v12 version is supported

Closes: #30526

BigQueryHook list_rows/get_datasets_list can return iterator (#30543)

Add deferrable mode to GKEStartPodOperator (#29266)

* Add deferrable mode to GKEStartPodOperator

* Change naming for GKEHook and add comments

* Rebase main, revert unrelated changes

* Add review suggestions + rebase

* Add deprecation warning for deleted method + rebase

Accept None for `EmailOperator.from_email` to load it from smtp connection (#30533)

* Add default None for  to load it from smtp connection

Update DV360 operators to use API v2 (#30326)

* Update DV360 operators to use API v2

* Update display_video.rst

* fixup! Update display_video.rst

* fixup! Update display_video.rst

---------

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>

Prepare docs for ad hoc release of Providers (#30545)

* Prepare docs for ad hoc release of Providers

* add smtp provider

* add google

Add --one-pass-only parameter to breeze docs build command (#30555)

The parameter was previously supported in the docs-build script
but it was not exposed via breeze commands.

It allows to iterate faster on docs building, by default docs building
runs up to 3 passes in order to account for new cross-references
between multiple providers, this flag makes it one pass, which makes
it faster to summarize the errors when you try to nail down a problem
with docs.

Load subscription_id from extra__azure__subscriptionId (#30556)

Use the engine provided in the session (#29804)

Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>

fix release doc for providers (#30559)

Change timer type back to float (#30532)

Decouple "job runner" from BaseJob ORM model (#30255)

* Decouple "job runner" from BaseJob ORM model

Originally the BaseJob ORM model was extended and polymorphism was
used to tie different execution logic to different job types. This
has proven to be difficult to handle during the AIP-44 implementation
(internal API), because LocalTaskJob, DagProcessorJob and TriggererJob
are all going to not use the ORM BaseJob model; they should use
BaseJobPydantic instead. In order to make this possible, we introduce
a new type of object, BaseJobRunner, and make BaseJob use the runners
instead.

This way, the BaseJobRunners are used for the logic of each of the
jobs, while a single, non-polymorphic BaseJob is used to keep the
records in the database - as a follow-up this will allow completely
decoupling the job database operations and moving them to the internal_api
component when db-less mode is enabled.

Closes: #30294

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Fix one more dynamic import needed for vendored-in google ads (#30564)

Continuation of #30544

Move Pydantic class serialization under AIP-44 feature flag (#30560)

The Pydantic representation of the ORM models is only used
in the in-progress AIP-44 feature, and we are moving to a new
serialization implementation (more modular) in the near future,
so in order to not unnecessarily extend features in the old
serialization, but still allow testing AIP-44, we are moving the
use_pydantic_models parameter and its implementation under the
_ENABLE_AIP_44 feature flag, so that it is not used accidentally.

We will eventually remove it and add Pydantic serialization to
the new serialization implementation.

Add podAnnotations to PgBouncer (#30168)

Added support for using SHA digest of Docker images (#30214)

Bump json5 to 1.0.2 and eslint-plugin-import to 2.27.5 in /airflow/www (#30568)

Bumping from 1.0.1 for json5 and 2.26.0

Update dataproc.rst (#30566)

Making the statement more contextual; the change proposed here is "a provide" to "to provide".

Quieter output during asset compilation (#30565)

The "Still waiting ....." message was emitted every second, which can be
quite noisy even on moderate machines. This reduces the message to once
every 5 seconds.

Rename JobRunner modules to *_job_runner and base_job* to job (#30302)

#30255 introduced the "JobRunner" concept and decoupled the job logic
from the ORM polymorphic *Job objects. The change was implemented
in a way that minimised the review effort needed, so it avoided renaming
the modules for the runners (from `_job` to `_job_runner`).

Also, BaseJob lost its "polymorphism" properties, so the package and class name
can be renamed to simply job.

This PR completes the JobRunner concept introduction by applying the
renames.

Closes: #30296

Speed up dag runs deletion (#30330)

* Provide custom deletion for dag runs to speed things up when a dag run has a lot of related task instances
---------

Co-authored-by: Zhyhimont Dmitry <zhyhimont.d@profitero.com>
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Adding taskflow API example for sensors (#30344)

Use connection URI in SqliteHook (#28721)

* Use connection URI in SqliteHook

This allows the user to define more sqlite args such as mode. See https://docs.sqlalchemy.org/en/14/dialects/sqlite.html#uri-connections for details.
- remove unsupported schema, login and password fields in docs
- add info about host field to docs
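
A sketch of the kind of URI this enables, per the SQLAlchemy docs linked above (the path and args are illustrative): extra sqlite options such as read-only mode can be encoded directly in the connection URI.

    from sqlalchemy import create_engine

    # The file: URI form lets sqlite accept extra args such as mode=ro (read-only).
    engine = create_engine("sqlite:///file:/tmp/example.db?mode=ro&uri=true")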

Release notes for helm chart 1.9.0 (#30570)

Do not remove docker provider for Airflow 2.3 check (#30483)

This removal is a remnant of the old docker provider for 2.2 and should
not be happening.

Separate and split run job method into prepare/execute/complete steps (#30308)

* Separate and split run job method into prepare/execute/complete steps

As a follow-up after decoupling the job logic from the BaseJob
ORM object (#30255), the `run` method of BaseJob should also be
decoupled from it (allowing BaseJobPydantic to be passed) as well
as split into three steps, in order to allow db-less mode.

The "prepare" and "complete" steps of the `run` method modify the
BaseJob ORM-mapped object, so they should be called over the
internal-api from LocalTask, DagFileProcessor and Triggerer running
in db-less mode. The "execute" step, however, does not need the
database and should be run locally.

This is not yet the full AIP-44 conversion; this is a prerequisite to do
so, and the AIP-44 conversion will be done as a follow-up after this one.

However, we added a mermaid diagram showing the job lifecycle with and
without the Internal API to make it easier to reason about it.

Closes: #30295

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

Update SQLAlchemy `select()` to new style (#30515)

SQLAlchemy has a new style for `select()` that is standard for 2.0. This
updates our uses of it to avoid `RemovedIn20Warning` warnings.

https://docs.sqlalchemy.org/en/20/errors.html#select-construct-created-in-legacy-mode-keyword-arguments-etc
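
The gist of the change, as a generic example (not a specific Airflow query):

    from sqlalchemy import column, select

    # Legacy 1.x style (emits RemovedIn20Warning on 1.4, removed in 2.0):
    #   stmt = select([column("x")])
    # The new 2.0-compatible style takes positional arguments:
    stmt = select(column("x"))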

Remove JobRunners back reference from Job (#30376)

This is the final step of decoupling the job runner from the ORM-
based BaseJob. After this change we finally reach the state where
BaseJob is just the state of the job being run, while all
the logic is kept in a separate "JobRunner" entity which just
keeps a reference to the job. It also makes sure that the
job in each runner is defined as appropriate for each job type:

* SchedulerJobRunner, BackfillJobRunner can only use BaseJob
* DagProcessorJobRunner, TriggererJobRunner and especially the
  LocalTaskJobRunner can keep both BaseJob and its Pydantic
  BaseJobPydantic representation - for AIP-44 usage.

The highlights of this change:

* Job does not have a job_runner reference any more
* Job is a mandatory parameter when creating each JobRunner
* the run_job method takes as parameters the job (i.e. where the state
  of the job is kept) and executor_callable - i.e. the method
  to run when the job gets executed
* the heartbeat callback is also passed a generic callable in order
  to execute the post-heartbeat operation of each job
  type
* there is no more need to specify job_type when you create a
  BaseJob; the job gets its type simply by creating a runner
  with the job

This is the final stage of a refactoring that was split into
reviewable stages: #30255 -> #30302 -> #30308 -> this PR.

Closes: #30325

Cast binding +1 in helm chart release vote email (#30590)

We will assume that the release manager for the helm chart wants to cast
a binding +1 vote :)

Databricks SQL sensor (#30477)

* Renamed example DAG

Add Hussein to committers (#30589)

Add support in AWS Batch Operator for multinode jobs (#29522)

picking up #28321 after it's been somewhat abandoned by the original author.
Addressed my own comment about empty array, and it should be good to go I think.

Initial description from @camilleanne:

Adds support for AWS Batch multinode jobs by allowing a node_overrides json object to be passed through to the boto3 submit_job method.

Adds support for multinode jobs by properly parsing the output of describe_jobs (which is different for container vs multinode) to extract the log stream name.
closes: #25522

Fix CONTRIBUTORS_QUICK_START Doc (#30549)

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Use custom validator for OpenAPI request body (#30596)

* Use custom validator for OpenAPI request body

The default error message for an empty request body from Connexion
is quite unhelpful (taken directly from JSONSchema). This custom
validator emits a more helpful message for this particular context.

* Add test for custom request body validator

Co-Authored-By: maahir22 <56473490+maahir22@users.noreply.github.com>

---------

Co-authored-by: maahir22 <56473490+maahir22@users.noreply.github.com>

Remove 'run-' prefix from pre-commit jobs (#30597)

* Remove 'run-' prefix from pre-commit jobs

The job ID already implies 'run', and having the additional prefix
results in weird CLI, e.g. 'pre-commit run run-mypy-core'. This changes
the CLI to 'pre-commit run mypy-core', which reads better.

* Fix table marker

* Fix outdated pre-commit hook ID references

Add ability to override waiter delay in EcsRunTaskOperator (#30586)

Prepare docs for RC2 of provider wave (#30606)

Deactivate DAGs deleted from within zipfiles (#30608)

DagBag: Use dag.fileloc instead of dag.full_filepath in exception message (#30610)

Co-authored-by: Douglas Staple <staple.douglas@gmail.com>

Remove gauge scheduler.tasks.running (#30374)

* Remove gauge scheduler.tasks.running

* Add significant.rst file

* Update newsfragments/30374.significant.rst

---------

Co-authored-by: Niko Oliveira <onikolas@amazon.com>

Recover from `too old resource version exception` by retrieving the latest `resource_version` (#30425)

* Recover from `too old resource version exception` by retrieving the latest `resource_version`

* Update airflow/executors/kubernetes_executor.py

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

---------

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

docs: use correct import path for Dataset (#30617)

Speed up TaskGroups with caching property of group_id (#30284)

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Fix `TriggerDagRunOperator` with deferrable parameter (#30406)

* readding after borked it

* pre-commit

* finally fixing after the github issue last week

* push fix

* feedback from hussein

Fix failing SQS tests on moto upgrade (#30625)

The new moto (4.1.7) performs additional validation on the queues
created during tests, and it fails the tests when content
deduplication is not specified.

Explicitly setting the deduplication mode fixes the problem and
allows the new moto to be installed.
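
A sketch of what explicitly setting deduplication looks like with boto3 (the queue name is illustrative):

    import boto3

    sqs = boto3.client("sqs", region_name="us-east-1")
    # FIFO queues require a deduplication strategy; setting it explicitly
    # avoids relying on defaults that the new moto validates.
    sqs.create_queue(
        QueueName="example-queue.fifo",
        Attributes={"FifoQueue": "true", "ContentBasedDeduplication": "true"},
    )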

fix possible race condition when refreshing DAGs (#30392)

* fix possible race condition when refreshing DAGs

* merge the two queries into one

* Remove provide_session from internal function

Since get_latest_version_hash_and_updated_datetime is internal and we
always pass in the session anyway, the provide_session decorator is
redundant and only introduces the possibility of developer errors.

---------

Co-authored-by: Sébastien Brochet <sebastien.brochet@nielsen.com>
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Remove Norm and Hussein from the triage group (#30627)

Hussein is now a committer and Norm has completed building out the
initial AIP-52 tasks.

Remove mysql-connector-python (#30487)

* Turn the package 'mysql-connector-python' into an optional feature

* Update airflow/providers/mysql/provider.yaml

* Update airflow/providers/mysql/CHANGELOG.rst

---------

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Better error message where non-compatible providers are not excluded (#30629)

When the compatibility check is performed for an old version of Airflow,
we attempt to install all providers for the old version. However, if
one of the providers has a >= limit on Airflow for a newer version of
Airflow, this installation leads to attempting to upgrade airflow
rather than failing, which could lead to misleading errors.

This change adds "airflow==x.y.z", taken from the --use-airflow-version
flag, to the `pip install` command, which should in this case fail
with a much more accurate message: that the provider conflicts with the
airflow version.

Updating the links to the Dataform product documentation to fix 404 redirect error (#30631)

Updating the links to the Dataform product documentation to fix 404 redirect error

New AWS sensor — DynamoDBValueSensor (#28338)

Remove duplicate param docstring in EksPodOperator (#30634)

In `DockerOperator`, adding an attribute `tls_verify` to choose whether to validate certificate (#30309) (#30310)

* add `tls_verify` to choose whether to validate certificate (#30309)

---------

Co-authored-by: Hussein Awala <hussein@awala.fr>

Add `max_active_tis_per_dagrun` for Dynamic Task Mapping (#29094)

* add max_active_tis_per_dagrun param to BaseOperator

* set has_task_concurrency_limits when max_active_tis_per_dagrun is not None

* check if max_active_tis_per_dagrun is reached in the task deps

* check if all the tasks have None max_active_tis_per_dagrun before auto schedule the dagrun

* check if the max_active_tis_per_dagrun is reached before queuing the ti

* check max_active_tis_per_dagrun in backfill job

* fix current tests and ensure everything is ok before adding new tests

* refactor TestTaskConcurrencyDep

* fix a bug in TaskConcurrencyDep

* test max_active_tis_per_dagrun in TaskConcurrencyDep

* tests max_active_tis_per_dagrun in TestTaskInstance

* test dag_file_processor with max_active_tis_per_dagrun

* test scheduling with max_active_tis_per_dagrun on different DAG runs

* test scheduling mapped task with max_active_tis_per_dagrun

* test max_active_tis_per_dagrun with backfill CLI

* add new starved_tasks filter to avoid affecting the scheduling perf

* unify the usage of TaskInstance filters and use TI

* refactor concurrency map type and create a new dataclass

* move docstring to ConcurrencyMap class and create a method for default_factory

* move concurrency_map creation to ConcurrencyMap class

* replace default dicts by counters

* replace all default dicts by counters in the scheduler_job_runner module

* suggestions from review

Simplify logic to resolve tasks stuck in queued despite stalled_task_timeout (#30375)

* simplify and consolidate logic for tasks stuck in queued

* simplify and consolidate logic for tasks stuck in queued

* simplify and consolidate logic for tasks stuck in queued

* fixed tests; updated fail stuck tasks to use run_with_db_retries

* mypy; fixed tests

* fix task_adoption_timeout in celery integration test

* addressing comments

* remove useless print

* fix typo

* move failure logic to executor

* fix scheduler job test

* adjustments for new scheduler job

* appeasing static checks

* fix test for new scheduler job paradigm

* Updating docs for deprecations

* news & small changes

* news & small changes

* Update newsfragments/30375.significant.rst

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Update newsfragments/30375.significant.rst

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* added cleanup stuck task functionality to base executor

* fix sloppy mistakes & mypy

* removing self.fail from base_executor

* Update airflow/jobs/scheduler_job_runner.py

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Update airflow/jobs/scheduler_job_runner.py

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Fix job_id filter

* Don't even run query if executor doesn't support timing out queued tasks

* Add support for LocalKubernetesExecutor and CeleryKubernetesExecutor

* Add config option to control how often it runs - we want it quicker than
the timeout

* Fixup newsfragment

* mark old KE pending pod check interval as deprecated by new check interval

* Fixup deprecation warnings

This more closely mirrors how deprecations are raised for "normal"
deprecations.

I've removed the depth, as moving up the stack doesn't really help the
user at all in this situation.

* Another deprecation cleanup

* Remove db retries

* Fix test

---------

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
Co-authored-by: Jed Cunningham <jedcunningham@apache.org>
Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com>

Display Video 360 cleanup v1 API usage (#30577)

* Display Video 360 cleanup v1 API usage

* Update docs

Fix mapped tasks partial arguments when DAG default args are provided (#29913)

* Add a failing test to make it pass

* use partial_kwargs when they are provided and override only None values with dag default values

* update the test and check if the values are filled in the right order

* fix overriding retry_delay with default value when it is equal to 0

* add missing default value for inlets and outlets

* set partial_kwargs dict type to dict[str, Any] and remove type ignore comments

* create a dict for default values and use NotSet instead of None to support None as accepted value

* update partial typing by removing None type from some args and set NotSet for all args

* Tweak kwarg merging slightly

This should improve iteration a bit, I think.

* Fix unit tests

---------

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

First commit of OpenLineage provider. (#29940)

This PR consists mostly of code that was created in the OpenLineage project. It
consists of

- Provider wiring
- OpenLineageListener that uses the Listener API to get notifications about changes
  to TaskInstance and Dag states
- Extractor framework, which is used to extract lineage information from
  particular operators. It's meant to be replaced by a direct implementation of
  lineage features in a later phase, extracting them using DefaultExtractor.
  This PR does not include actual extractors, but code around using and registering them.
- OpenLineageAdapter that translates extracted information to OpenLineage events.
- Utils around specific Airflow OL facets and features

This is a base implementation that's not meant to be released yet, but to add
code modified to be consistent with Airflow standards, get early feedback and
provide a canvas to add later features, docs and tests on.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>

Add v2-6-test and v2-6-stable to codecov and protected branches (#30640)

Adding configuration to control retry parameters for k8s api client (#29809)

* Adding configuration to control retry parameters for k8s api client

* Handling review comments

* Fixing code bug

* Fixing failing tests

* Temporary commit with UT wip

* Fixing unit test

* Fixing the strict checks

* Handling review comments from Hussein

* Revert "Handling review comments from Hussein"

This reverts commit fa3bc260f7462c42620f694ee97b7f15c0b0b9c3.

* Fixing failing ut

* Reverting bad hack

* Updating logic in kube_client.py

Co-authored-by: Hussein Awala <hussein@awala.fr>

* Fixing unit tests

* Fixing unit tests

* Handling review comments from Ash

* Fix loading mock call args for python3.7

* Apply suggestions from code review

* fix static check

* add in 2.6.0

---------

Co-authored-by: Amogh <adesai@cloudera.com>
Co-authored-by: Hussein Awala <houssein.awala.96@gmail.com>

fix(chart): webserver probes timeout and period. (#30609)

* fix(chart): webserver probes timeout and period

* Update default values in JSON schema to reflect values.yaml

* remove defautl templated values

Clarify release announcements on social media (#30639)

DynamoDBHook - waiter_path() to consider `resource_type` or `client_type` (#30595)

* Add  while initializing

* Add  while initializing

* Add logic to pick either client_type or resource_type

* Add test case

* Assert expected path

Improve task & run actions ux in grid view (#30373)

* update run clear+mark, update task clear

* add mark as tasks and include list of affected tasks

* Add support for mapped tasks, add shared modal component

* Clean up styling, restore warning for past/future tg clear

Add command to get DAG Details via CLI (#30432)

---------

Co-authored-by: Hussein Awala <hussein@awala.fr>
Co-authored-by: Hussein Awala <houssein.awala.96@gmail.com>

When clearing task instances try to get associated DAGs from database (#29065)

* When clearing task instances try to get associated DAGs from database.

This fixes problems when recursively clearing task instances across multiple DAGs:
  * Task instances in downstream DAGs weren't having their `max_tries` property incremented, which could cause downstream external task sensors in reschedule mode to instantly time out (issue #29049).
  * Task instances in downstream DAGs could have some of their properties overridden by an unrelated task in the upstream DAG if they had the same task ID.

* Use session fixture for new `test_clear_task_instances_without_dag_param` test.

* Use session fixture for new `test_clear_task_instances_in_multiple_dags` test.

---------

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Organize Amazon providers docs index (#30541)

preload airflow imports before dag parsing to save time (#30495)

---------

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com>

Add delete inactive run functionality to databricks provider (#30646)

Create audit_logs.rst (#30405)

---------

Co-authored-by: Josh Fell <48934154+josh-fell@users.noreply.github.com>

Present affected task instances as table (#30633)

Helm chart 1.9.0 has been released (#30649)

Add 2.6.0b1 to issue template (#30652)

add missing project_id in BigQueryGetDataOperator (#30651)

Properly classify google_vendor package to google provider (#30659)

We've recently added the google_vendor package to vendor-in the ads library,
and we had to do it outside of the regular google provider package,
because internally the library assumed our google package was the
top-level package when discovering the right relative imports (#30544).

This confused the pre-commit that updates provider dependencies into
not recognising the package and printing warnings about bad classification.

Special-case handling will classify it to the google provider.

Make pandas optional in workday calendar example (#30660)

The workday calendar expected pandas to be available, and it is part
of our examples; however, Airflow does not have pandas as a core
dependency, so in case someone does not have pandas installed, importing
the workday example would fail.

This change makes pandas optional and falls back to regular working
days for the example in case it is not available (including a warning
about it). It also fixes a slight inefficiency where the
USFederalHoliday calendar was created every time the next workday
was calculated.
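
The fallback pattern described above, sketched (names are illustrative, not the exact example code):

    try:
        from pandas.tseries.holiday import USFederalHolidayCalendar
        # Create the calendar once, not on every next-workday calculation.
        _calendar = USFederalHolidayCalendar()
    except ImportError:
        # pandas not installed: fall back to plain Monday-Friday workdays.
        _calendar = None

    def is_workday(date) -> bool:
        if _calendar is not None and len(_calendar.holidays(start=date, end=date)):
            return False
        return date.weekday() < 5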

Update Google Campaign Manager360 operators to use API v4 (#30598)

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Skip KubernetesPodOperator task when it returns a provided exit code (#29000)

* Skip KubernetesPodOperator task when it returns a provided exit code

* set default value to None, and get exit code only when skip_exit_code is not None

* get the exit code for the base container and check if everything is ok

* add unit test for the operator

* add a test for deferred mode

* apply change requests

---------

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Upgrade Pip to 23.1 (#30663)

Fix docs building for workday example. (#30664)

#30660 was merged too quickly, as it resulted in a doc building
failure. This PR fixes it.

docker compose doc changes (#30662)

Add suspended providers to pytest collection test (#30668)

Pytest collection has recently been extracted to a separate job,
and the SUSPENDED_PROVIDERS_FOLDERS variable was not set in the new
job - which caused suspended provider tests to be attempted by
pytest collection, leading to import errors when suspended providers
have some dependencies removed from our image.

Workaround type-incompatibility with new attrs in openlineage (#30674)

The new attrs released today (11 hours ago) added typing
information, and it caused OpenLineageRedactor to fail mypy checks.

Temporarily adding type: ignore should allow upgrading to the new
attrs and stop PRs changing dependencies from failing.

Related: #30673

Update the release note (#30680)

* Update the release note

During the beta release, I observed some minor things that need fixing.
Here's the PR

* Use local import

Correctly pass a type to attrs.has() (#30677)

Merge WasbBlobAsyncSensor to WasbBlobSensor (#30488)

Updated app to support configuring the caching hash method for FIPS v2 (#30675)

Install twine with --force for package verification (#30683)

In some cases when the machine has been reused across builds,
pipx-installed twine might seem both installed and removed (this happens
when builds are cancelled while installing twine).

Installing twine with --force should fix the problem.

Fix docs: add an "apache" prefix to pip install (#30681)

Remove unittests.TestCase from tests/test_utils (#30685)

Introduce consistency of package sequence for "Other" test type (#30682)

When packages for "Other" test type are calculated, the list of
all test folders is generated and they are compared with the
packages previously selected by the "predefined" test types. This
is done via `find` method that returns the folders in arbitrary
order, mostly depending on the sequence the folders were created.

In case the tests from some packages have some side-effects that
impact tests in other packages (obviously not something that is
desired), this might end up that the tests succeed in one
environment, but fail in another. This happened for example
in case of #30362 that had cross-package side-effect later
fixed in #30588. There - results of "Other" test type depended
on where the tests were executed.

This PR sorts the find output so it is always in consistent order.
we are using ASCII for package names and the test types are
derived in the same Docker CI image with the same LOCALE, so it
should guarantee that the output of packages for "Other" test type
should be always consistent.

Add missing version val to caching_hash_method config (#30688)

Upgrade to MyPy 1.2.0 (#30687)

Upgrading to the latest (released a week ago) MyPy in the hope it
will fix some more problems with attrs after upgrading new packages,
but it seems that even the latest MyPy does not know about the
new typing changes introduced in attrs (traditionally mypy has an
attrs plugin that injects appropriate typing, but apparently it
needs to catch up with those changes).

Parallelize Helm tests with multiple job runners (#30672)

Helm unit tests use template rendering, and the rendering
uses a lot of CPU for the `helm template` command. We have a lot of
those rendering tests (>800), so even running the tests in parallel
on a multi-CPU machine does not lead to a decreased elapsed time
to execute the tests.

However, each of the tests runs entirely independently, and we
should be able to achieve a much faster elapsed time if we run
a subset of tests on a separate multi-CPU machine. This will not
lower the job build time; however, it might speed up elapsed time
and thus give faster feedback.

This PR achieves that.

Skip PythonVirtualenvOperator task when it returns a provided exit code (#30690)

* Add a new argument to raise a skip exception when the python callable exits with the same value

* add unit tests for skip_exit_code
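
The mechanism, sketched generically (not the exact operator code):

    from typing import Optional

    from airflow.exceptions import AirflowSkipException

    def handle_return_code(return_code: int, skip_on_exit_code: Optional[int]) -> None:
        # A matching exit code marks the task as skipped rather than failed.
        if skip_on_exit_code is not None and return_code == skip_on_exit_code:
            raise AirflowSkipException(
                f"Process exited with code {return_code}. Skipping."
            )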

[OTel Integration] Add tagging to existing stats (#30496)

rename skip_exit_code to skip_on_exit_code and allow providing multiple codes (#30692)

* rename skip_exit_code to skip_on_exit_code and allow providing multiple codes

* replace list type by Container

Fix d3 dependencies (#30702)

Update system test example_emr to have logs (#30715)

Fixed logging issue (#30703)

Co-authored-by: Mark Richman <mrkrchm@amazon.com>

Separate out and clarify policies for providers (#30657)

This change separates out the policies we have for providers into
a separate PROVIDERS.rst file. It also documents clearly the process
and policy we have for accepting new community-managed providers,
explaining the conditions that have to be fulfilled and stating
a very strong preference for keeping providers maintained by
3rd parties when there are 3rd-party teams that manage
the providers.

SqlToS3Operator - Add feature to partition SQL table (#30460)

Optimize parallel test execution for unit tests (#30705)

We are running the tests in parallel test types in order to speed
up their execution. However, some test types and subsets of tests
take far longer to execute than other test types.

The longest tests to run are Providers and WWW tests, and the
longest tests from Providers are by far Amazon tests, then
Google. "All Other" Provider tests take about the same time
as Amazon tests - also, after splitting the provider tests,
Core tests take the longest time.

When we are running tests in parallel on multiple CPUs, often
the longest running tests remain running on their own while the
other CPUs are not busy. We could run a separate test type
per provider, but the overhead of starting the database and collecting
and initializing tests for them is too big to achieve
speedups - especially for public runners, having 80 separate
databases with 80 subsequent container runs is slower than
running all Provider tests together.

However, we can split the Provider tests into a smaller number of
chunks and prioritize running the long chunks first. This
should improve the effect of parallelisation and improve utilization of
our multi-CPU machines.

This PR aims to do that:

* Split Provider tests (if amazon or google are part of the
  provider tests) into amazon, google, all-other chunks

* Move sorting of the test types to selective_check, to sort the
  test types according to expected longest running time (the longest
  tests to run are added first)

This should improve the CPU utilization of our multi-CPU runners
and make the tests involving the complete Provider set (or even sets
containing amazon, google and a few other providers)
execute quite a few minutes faster on average.

We could also get rid of some sequential processing for the public PRs
because each test type we run will be less demanding overall. We
used to get a lot of 137 exit codes (memory errors), but with splitting
out Providers, the risk of exhausting resources by two test types
running in parallel is low.

Deprecate `skip_exit_code` in `BashOperator` (#30734)

Add explicit information about how to write task logs (#30732)

There was no explicit information in our documentation on how to
write logs from your tasks. While for classic operators that is
easy and straightforward, as they all have a log property which
is the right logger coming from LoggingMixin, for taskflow code
and custom classes it is not obvious that you have to
use the `airflow.task` logger (or a child of it), or that you have to
extend LoggingMixin to use the built-in logging configuration.
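
In short, the documented approach (a sketch; the function name is illustrative):

    import logging

    # The `airflow.task` logger (or a child of it) routes messages to the
    # task log under the built-in logging configuration.
    logger = logging.getLogger("airflow.task")

    def my_taskflow_function():
        logger.info("this message ends up in the task log")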

Suspend Yandex provider due to protobuf limitation (#30667)

Yandex provider brings protobuf dependency down to <4 and we are gearing
up to updating it everywhere else. Protobuf3 support ends in Q2 2023
for Python https://protobuf.dev/support/version-support/#python

Yandex is the last provider that we do not closely collaborate with on fixing
* Gogle provider dependencies are actively upgraded to latest version
  by Google led team: #30067 (some of the libraries are already updated)
  with target to update all dependencies by mid-May
* Apache-Beam has already merged protobuf4 support
  https://github.com/apache/beam/pull/25874 with the target of
  releasing it in 2.47.0 mid-May
* The mysql-connector-python in MySQL provider is already turned into
  optional dependency: #30487

The only remaining dependency limiting us to protobuf 3 (<3.21) is
yandexcloud. We opened an issue with yandexcloud
(https://github.com/yandex-cloud/python-sdk/issues/71) 3 weeks ago
and, while there was initial interest, there has been no progress on
the issue. Therefore - in order to prepare for running all
the tests and the final migration to protobuf 4 - we need to suspend
the Yandex provider, following the suspension process we agreed on
and got a LAZY CONSENSUS for in the
https://lists.apache.org/thread/g8b3k028qhzgw6c3yz4jvmlc67kcr9hj
mailing list discussion.

The Yandex provider can be taken out of suspension once the
yandexcloud dependency removes the protobuf limitation in a release;
a PR reverting this change (and fixing whatever tests and static
checks need it) is the way to do that.

Add a collapse grid button (#30711)

Add skip_on_exit_code also to ExternalPythonOperator (#30738)

The changes #30690 and #30692 added skip_on_exit_code to the
PythonVirtualenvOperator, but skipped the - very closely related
- ExternalPythonOperator.

This change brings the same functionality to ExternalPythonOperator
and moves it to the base class of both operators. It also adds a
separate test class for ExternalPythonOperator, introducing
a common base class and moving the test methods that are common
to both operators there.
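
A hedged sketch of the new flag, assuming it keeps the same name and semantics as on PythonVirtualenvOperator (interpreter path and exit code are illustrative):

```python
from airflow.operators.python import ExternalPythonOperator


def maybe_skip():
    # Runs in the external interpreter, so import inside the callable.
    import sys

    # Exiting with the configured code should mark the task as
    # skipped instead of failed.
    sys.exit(100)


skip_demo = ExternalPythonOperator(
    task_id="skip_demo",
    python="/opt/venvs/other/bin/python",  # hypothetical interpreter path
    python_callable=maybe_skip,
    skip_on_exit_code=100,  # assumption: same semantics as the virtualenv operator
)
```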

Add multiple exit code handling in skip logic for BashOperator (#30739)

Follow-up after #30734
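
A sketch of the list form this follow-up enables, assuming `skip_on_exit_code` now accepts an iterable of codes:

```python
from airflow.operators.bash import BashOperator

maybe_skip = BashOperator(
    task_id="maybe_skip",
    bash_command="exit 101",
    # Assumption per this change: any exit code in the list marks the
    # task as skipped rather than failed.
    skip_on_exit_code=[99, 100, 101],
)
```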

Deprecate `skip_exit_code` in `DockerOperator` and `KubernetesPodOperator` (#30733)

* Deprecate `skip_exit_code` in `DockerOperator` and `KubernetesPodOperator`

* satisfy mypy

Remove protobuf limitation from eager upgrade (#30182)

Protobuf limitation was added to help pip resolve eager upgrade
dependencies, however it is not needed any more.

Fix misc grid/graph view UI bugs (#30752)

add a stop operator to emr serverless (#30720)

* add a stop operator to emr serverless

* update doc
---------

Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>

Better explanation on how to log from tasks (#30746)

* Better explanation on how to log from tasks

After Daniel's explanation this should provide a better description
on how to log from tasks.

Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com>
Co-authored-by: Niko Oliveira <onikolas@amazon.com>

Skip suspended providers when generating providers summary index (#30763)

When the providers' summary index gets generated, it should not
include suspended providers. This was missed in #30422

Fix when OpenLineage plugin has listener disabled. (#30708)

Add parametrized test for disabling OL listener in plugin.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>

Split installation of sdist providers into parallel chunks (#29223)

Sdist provider installation takes a lot of time because pip cannot
parallelise the sdist package building. But we still want to test the
installation of all our providers as sdist packages.

This can be achieved by running N parallel installations with only
subset of providers being installed in each chunk.

This is what we do in this PR.

Speed up package wheel job in CI (#30766)

After recent improvements, the package wheel job has become one
of the longest jobs to run. So far it sequentially built airflow,
prepared documentation for the packages, built the packages,
installed both airflow and the packages and tested imports for them;
then it removed the installed airflow and ran the same tests
with the 2.3 airflow version to check for compatibility.

This change splits it into two parallel jobs. There is a small
duplication (3 minutes of preparing the whl packages), but the
"compatibility" job does not need Airflow and a few other
steps (such as preparing docs or building airflow). Overall
we spend a few extra minutes repeating the wheel package
preparation, but each of the two jobs takes a bit more
than half the time of the original, which greatly improves
feedback time for the users (in most cases the two jobs complete
under 12 minutes, where the original job needed 21 minutes).

Use material icons for dag import error banner (#30771)

* Use material icons for dag import error banner

* fix message caret direction

Update DataprocCreateCluster operator to use 'label' parameter properly (#30741)

Add multiple exit code handling in skip logic for `DockerOperator` and `KubernetesPodOperator` (#30769)

remove delegate_to from GCP operators and hooks (#30748)

Remove @poke_mode_only from EmrStepSensor (#30774)

* Remove @poke_mode_only from EmrStepSensor

* Add EmrStepSensor to system test and documentation

* Fix test

add pod status phase to KPO test mock (#30782)

Export SUSPENDED_PROVIDERS_FOLDERS for breeze testing commands (#30780)

Export the SUSPENDED_PROVIDERS_FOLDERS env var in breeze directly
instead of in Airflow CI workflows. This will fix the issue for users
executing `breeze testing ...` commands locally.

Add openlineage to boring-cyborg.yml (#30772)

Improve url detection (#30779)

Adapt to better resolver of pip (#30758)

We used to have helper limits for the eager upgrade of our packages,
but with 23.1, updated in #30663, pip has a much improved resolver that
does not need that much help and can resolve our dependencies
pretty fast on its own, so we can remove all the dependency limits
that aimed to keep the dependency resolution time down.

Also, we used to have a mechanism to track backtracking issues and
find out which of the new dependencies caused excessive backtracking.
This no longer seems to be needed, so we can remove it from CI and breeze.

Add explanation on why we have two local pre-commit groups (#30795)

Remove skip_exit_code from KubernetesPodOperator (#30788)

Since the parameter was not released we can safely remove it without deprecation.

Better message on deserialization error (#30588)

Previously, a deserialization error threw a pretty mysterious ValueError;
for example, when there was a Python version mismatch, a Python
object serialized in one version of Python produced a "version error"
message. This change turns such a ValueError into a specific
DeserializationError with a better message explaining the possible
reason, without losing the cause.

Co-authored-by: Shahar Epstein <shahar1@live.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
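
The underlying pattern - wrapping the mysterious ValueError in a dedicated exception while keeping the cause chained - looks roughly like this (names are illustrative, not the exact ones in the codebase):

```python
import pickle


class DeserializationError(Exception):
    """Illustrative stand-in for the dedicated deserialization error."""


def deserialize(blob: bytes):
    try:
        return pickle.loads(blob)
    except ValueError as err:
        # `from err` keeps the original cause in the traceback,
        # so no information is lost while the message improves.
        raise DeserializationError(
            "Could not deserialize value; a Python version mismatch "
            "between serializer and deserializer is a common cause"
        ) from err
```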

AWS logs. Exit fast when 3 consecutive responses are returned from AWS Cloudwatch logs (#30756)

* AWS logs. Exit fast when 3 consecutive responses are returned from AWS Cloudwatch logs

---------

Co-authored-by: Niko Oliveira <onikolas@amazon.com>

Add provider for Apache Kafka (#30175)

* Add provider for Apache Kafka

Pulls in a series of integrations to Kafka from airflow-provider-kafka (https://pypi.org/project/airflow-provider-kafka/) to core airflow.

---------

Co-authored-by: Tamara Janina Fingerlin <90063506+TJaniF@users.noreply.github.com>
Co-authored-by: Josh Fell <48934154+josh-fell@users.noreply.github.com>

Remove deprecated code from Amazon provider (#30755)

* Remove deprecated code from Amazon provider

Prepare docs for adhoc release of providers (#30787)

* Prepare docs for adhoc release of providers

Speed up test collection (#30801)

Test collection had the default setting for parallel test types because
the TEST_TYPES variable had not been renamed to PARALLEL_TEST_TYPES.
Also, test collection can be run in the "Wait for CI images" job, which
should save around a minute for setting up Breeze and pulling the
images.

This should speed up the pytest collection step by around a minute and a half.

Removed Mandatory Encryption in Neo4jHook (#30418)

* Removed mandatory encryption in neo4jhook

* Added unit tests and altered existing ones

* Added unit-test and fixed existing ones.

* Changed the implementation of get_client

* Changed test for encrypted param

* fix unit test and check if encrypted arg is provided or not

* fix static checks

* fix unit tests for python 3.7

---------

Co-authored-by: Hussein Awala <hussein@awala.fr>
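
A hedged sketch of how the now-optional encryption setting might be supplied through the connection extra (the `encrypted` key name is an assumption; when it is omitted, the driver default should apply):

```python
from airflow.models.connection import Connection

# Assumption: only when "encrypted" is present in the extra does the hook
# pass it to the Neo4j driver; leaving it out keeps the driver's default.
neo4j_conn = Connection(
    conn_id="neo4j_default",
    conn_type="neo4j",
    host="localhost",
    extra='{"encrypted": false}',
)
```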

restore fallback to empty connection behavior (#30806)

Also remove restored behavior from changelog

fixes to system tests following obsolete cleanup (#30804)

Co-authored-by: Niko Oliveira <onikolas@amazon.com>

Add deferrable mode to `WasbPrefixSensor` (#30252)
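
Deferrable sensors in providers conventionally expose a `deferrable` flag; a sketch under that assumption (container and prefix are illustrative):

```python
from airflow.providers.microsoft.azure.sensors.wasb import WasbPrefixSensor

wait_for_blobs = WasbPrefixSensor(
    task_id="wait_for_blobs",
    container_name="landing",  # illustrative container
    prefix="incoming/",        # illustrative blob prefix
    deferrable=True,  # assumption: flag name follows the usual provider convention
)
```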

Add licence to the __init__.py in google_vendored (#30807)

This is not a problem (an empty __init__.py has 0 creativity, so
the licence can be skipped), but it confuses the RAT tool when
verifying the sources.

add sentry transport configuration option (#30419)

Upgrade to pip 23.1.1 (#30808)

Just released, fresh off-the-press bugfix version of pip.

Clean bigquery operator tests (#30550)

Add deferrable mode to `GCSObjectUpdateSensor` (#30579)

Fix removed delegate_to parameter in deferrable GCS sensor (#30810)

Two PRs crossed, and the result of #30748 caused #30579 to fail,
as the delegate_to parameter had been removed.

Upgrade ruff to 0.0.262 (#30809)

Fix dev index building for suspended providers (#30812)

This is a follow-up after #30422 and #30763 - it turns out that
building the providers index locally failed when some providers
are suspended. It only impacts the local dev workflow.

Add instructions on how to avoid accidental airflow upgrade/downgrade (#30813)

Some of our users raised issues that when extending the image, airflow
suddenly started reporting problems with database versions and migrations
not applied or out-of-sync. This almost always turns out to be a
dependency conflict that leads to an automated downgrade or upgrade of
the installed airflow version. This is - obviously - undesired (you
should be upgrading airflow consciously rather than accidentally).
However, there is no way to prevent this implicitly - `pip` might decide
to upgrade or downgrade airflow as it sees fit. From `pip`'s point of
view, airflow is just one of the packages and has no special meaning.

The only way to "keep" airflow version is to specify it together with
other requirements, pinned to the specific version. This PR updates
our examples to do this and explains why airflow is added there.

There is - of course - the remaining risk that the user will forget to
update the version of airflow when they upgrade. However, since this
is an explicit action performed during image extension, it is much easier
to diagnose and notice. We also warn the users that they should update
the pin when airflow is upgraded.

Make eager upgrade additional dependencies optional (#30811)

In case additional dependencies are installed in the Docker image
customisation path, the eager upgrade additional dependencies became
empty after #30758, which made the installation of extra dependencies
fail.

This PR makes them optional.

Include sequoia.com in INTHEWILD (#30814)

Reenable clear on TaskInstanceModelView for role User (#30415)

* Reenable clear on TaskInstanceModelView for role User

The action was disabled in https://github.com/apache/airflow/pull/20659,
which resolved https://github.com/apache/airflow/issues/20655. The issue
only mentions that the edit action is broken and should be disabled, so
it seems that disabling the clear action was unintentional.

The discussion in
https://github.com/apache/airflow/issues/20655 further reinforces this.
That the author believed it still worked can be explained by the fact
that for a user with the `Admin` role the action was still available,
so one could easily be misled into believing it still worked as expected.

This PR re-enables the action and modifies an existing test case to also
verify that clearing is possible using a user with the role `User`.

* Add back other set state actions

* fix static checks

---------

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

`ExternalTaskSensor`: add `external_task_group_id` to `template_fields` (#30401)

* Add missing info in external_task.py

Add the missing external_task_group_id parameter to the ExternalTaskSensor docstring and template_fields.
As suggested, to match other operator classes, add `(templated)` to the templated fields.
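
With `external_task_group_id` in `template_fields`, a Jinja expression in that argument now renders at runtime; for example:

```python
from airflow.sensors.external_task import ExternalTaskSensor

wait_for_group = ExternalTaskSensor(
    task_id="wait_for_group",
    external_dag_id="upstream_dag",
    # Now templated, so Jinja renders this per run.
    external_task_group_id="group_{{ ds_nodash }}",
)
```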

add missing read for K8S config file from conn in deferred `KubernetesPodOperator`  (#29498)

* restore convert_config_file_to_dict method and deprecate it
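
A sketch of the setup this fix targets - a deferrable KubernetesPodOperator pointed at a Kubernetes connection whose extra carries the kubeconfig location (the connection id and extra key name are assumptions, not taken from the PR):

```python
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

# Assumption: the connection's extra holds something like
# {"kube_config_path": "/path/to/kubeconfig"}; with this fix the deferred
# code path reads it as well, not only the synchronous one.
run_pod = KubernetesPodOperator(
    task_id="run_pod",
    kubernetes_conn_id="k8s_with_config_path",  # hypothetical connection id
    name="demo-pod",
    image="busybox",
    cmds=["sh", "-c", "echo hello"],
    deferrable=True,
)
```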

Update log level in scheduler critical section edge case (#30694)

This log message can be useful if the scheduler ends up needing to query
TIs more than once per scheduler loop, so make it INFO vs DEBUG to
increase discoverability.

Validate `executor` and `config.core.executor` match (#30693)

The chart expects the executor to be set in `executor`; however, if a
user only sets `config.core.executor`, this is difficult to diagnose,
as the chart deploys the wrong rbac resources. This change tries to
catch that situation.

Count mapped upstreams only if all are finished (#30641)

* Fix Pydantic TI handling in XComArg.resolve()

* Count mapped upstreams only if all are finished

An XComArg's get_task_map_length() should only return an integer when
the *entire* task has finished. However, before this patch, it may
attempt to count a mapped upstream even when some (or all!) of its
expanded tis are still unfinished, causing its downstream to be
expanded prematurely.

This patch adds an additional check before we count upstream results to
ensure all the upstreams are actually finished.

* Use SQL IN to find unfinished TI instead

This needs a special workaround for a NULL quirk in SQL.
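
For context, the shape of the affected scenario - a mapped task feeding an aggregate downstream via an XComArg; before the fix, `summarize` could be expanded while some `process` instances were still unfinished:

```python
from airflow.decorators import task


@task
def make_batches():
    return [1, 2, 3]


@task
def process(batch):
    return batch * 2


@task
def summarize(results):
    # get_task_map_length() must only count the upstream once *all*
    # expanded `process` instances have finished.
    return sum(results)


summarize(process.expand(batch=make_batches()))
```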

Optimize performance of scheduling mapped tasks (#30372)

* Optimize performance of scheduling mapped tasks

* Provide max_tis_per_query as a parameter for the schedule_tis method

* Add max_tis_per_query to the JobPydantic class

---------

Co-authored-by: Zhyhimont Dmitry <zhyhimont.d@profitero.com>
Co-authored-by: Zhyhimont Dmitry <dzhigimont@gmail.com>

Update the user-facing documentation of providers (#30816)

We've recently clarified and described our policies for accepting
providers to be maintained by the community (#30657) - this was
directed towards the Airflow developers and contributors. This PR
reviews the user-facing part of the providers documentation,
removing some obsolete or not very useful documentation and pointing
to the new policy where appropriate.

Small refactors in ClusterGenerator of dataproc (#30714)

Rename most pod_id usage to pod_name in KubernetesExecutor (#29147)

We were using pod_id in a lot of places where really it is just the pod
name. I've renamed it where it is easy to do so, so things are easier
to follow.

Deprecate databricks async operator (#30761)

detailed docs (#30729)

fixed some errant strings in the kafka example dags (#30818)

* fixed some errant strings in the kafka example dags

Add repair job functionality to databricks hook (#30786)

* add repair job run functionality

* Add tests
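
A hedged sketch of calling the new functionality (the method name `repair_run` and the payload shape are assumptions based on the Databricks `jobs/runs/repair` API, not verified against the PR):

```python
from airflow.providers.databricks.hooks.databricks import DatabricksHook

hook = DatabricksHook(databricks_conn_id="databricks_default")

# Assumption: mirrors the Jobs API 2.1 `jobs/runs/repair` payload.
hook.repair_run({"run_id": 455644833, "rerun_tasks": ["extract", "load"]})
```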

Use template comments for the chart license header (#30569)

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

allow multiple prefixes in gcs delete/list hooks and operators (#30815)
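
A sketch under the assumption that the existing `prefix` argument now also accepts a list:

```python
from airflow.providers.google.cloud.operators.gcs import GCSDeleteObjectsOperator

cleanup = GCSDeleteObjectsOperator(
    task_id="cleanup",
    bucket_name="my-bucket",
    # Assumption per this change: a list deletes objects under
    # every listed prefix in one task.
    prefix=["staging/", "tmp/"],
)
```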

Update the error message for invalid use of poke-only sensors (#30821)

Fix XCom deserialization when it contains nonprimitive values (#30819)

* Add testcase to show issue with deserialization

* fix XCom deserialization

---------

Co-authored-by: utkarsh sharma <utkarsharma2@gmail.com>

Add Fail Fast feature for DAGs (#29406)

Improve nested_dict serialization test (#30823)

---------

Co-authored-by: bolkedebruin <bolkedebruin@users.noreply.github.com>

Improve Quick Start instructions (#30820)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>

Add retry param in databricks async operator (#30744)

* Add retry param in databricks async operator

* Apply review suggestions

Optimize docs building in CI (#30825)

* Optimize docs building in CI

Docs building is the longest build for regular PRs - it takes 30 minutes
for any PR that touches any of the docs or python files.

This PR optimises it - only the affected packages will be built when
the PR touches only some of the files.

* fixup! Optimize docs building in CI

* fixup! fixup! Optimize docs building in CI

* fixup! fixup! fixup! Optimize docs building in CI

Optimize away pytest collection steps (#30824)

The Pytest collection steps are only needed if there are any tests
about to be run. There are cases where we build CI images but
we do not expect to run any tests (for doc-only …