
add missing read for K8S config file from conn in deferred KubernetesPodOperator #29498

Merged
merged 7 commits into apache:main from fix/deferrable_k8s_pod_op on Apr 22, 2023

Conversation

hussein-awala
Member

closes: #29488


The async execute method of KubernetesPodOperator doesn't check whether the config_path is provided in the connection extra; this PR fixes this by extracting the config path in order to read it and convert it to a dictionary.
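
For illustration, a minimal sketch of the idea (a hypothetical helper, not the exact PR code), assuming PyYAML is available: resolve the config path - from the operator argument or the connection extra - and convert the file to a dict that can be shipped to the trigger.

    from typing import Optional

    import yaml

    def config_file_to_dict(config_path: Optional[str]) -> Optional[dict]:
        # No path resolved: nothing to convert.
        if not config_path:
            return None
        # Read the kubeconfig file and parse it into a plain dict.
        with open(config_path) as f:
            return yaml.safe_load(f)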

@hussein-awala hussein-awala changed the title [WIP] add missing read for K8S config file from conn in deferred KubernetesPodOperator add missing read for K8S config file from conn in deferred KubernetesPodOperator Feb 13, 2023
@@ -565,7 +565,16 @@ def execute_async(self, context: Context):

     def convert_config_file_to_dict(self):
         """Converts passed config_file to dict format."""
-        config_file = self.config_file if self.config_file else os.environ.get(KUBE_CONFIG_ENV_VAR)
+        config_file = None
Contributor

Thanks @hussein-awala for proposing this fix.

Why does the async path need the function convert_config_file_to_dict and not the sync one?

It looks like the async path was implemented without fully following this pattern -> #20578

Your PR fixes the problem for the extra config_path, but there is a risk that another option is missing, or a new one added in the future, would need a "manual" fix like this.

Member Author

I am not sure about the initial reason to convert the file into a dictionary before creating the trigger; it may be to avoid copying the config file to the triggerer, since the pod is created on the worker using the sync hook while the waiting task runs on the triggerer and uses the async hook.

there is a risk that another option is missing, or a new one added in the future, would need a "manual" fix like this

With this fix, we cover all options currently available to provide the configuration file, and yes, if we add a new one in the future, we must add it to the sync hook and to this method.

@VladaZakharova can you please explain the motivation for converting the config file to a dictionary before creating the trigger?

Contributor

Hi Team!
This was implemented so that the config file would be converted to a dict to be passed to the trigger, and then to the hook to establish the connection.

Contributor

What do you mean by lighten the credential management?

Isn't the hook re-instantiated at every run of the trigger?

Contributor

We needed a way to pass the config file to the trigger to create a Kubernetes client, but using the file system to communicate with the trigger was not a good solution. So we added the possibility to pass all config file parameters as a dict.

Member Author

To respect the pattern mentioned by @raphaelauv, I will try loading the config file in the async hook; this should work since the triggerer is initiated once.
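
A sketch of what that could look like, assuming the kubernetes_asyncio client (illustrative, not the merged code): pass the file path to the trigger and load it there.

    from kubernetes_asyncio import client, config

    async def get_api_client(config_file: str) -> client.ApiClient:
        # load_kube_config is a coroutine in kubernetes_asyncio, so the
        # config file is loaded on the triggerer side, inside the event loop.
        await config.load_kube_config(config_file=config_file)
        return client.ApiClient()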

Contributor

Please mind that all FS operations are blocking side effects. They violate the asyncio contract and can cause additional error logs warning about blocking code.
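
One common way to avoid that (a sketch, not the PR's code) is to push the blocking read onto a thread executor so the event loop stays responsive:

    import asyncio

    def _read_file(path: str) -> str:
        # Blocking I/O is fine here: it runs in an executor thread.
        with open(path) as f:
            return f.read()

    async def read_file_async(path: str) -> str:
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, _read_file, path)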

@potiuk
Member

potiuk commented Feb 20, 2023

@hussein-awala I guess you will still be changing the config access pattern on that one? Do I understand correctly?

@hussein-awala
Member Author

I guess you will still be changing the config access pattern on that one? Do I understand correctly?

Yes, I'm testing loading the config file in the triggerer instead of loading it in the worker and passing it as a dict.

I've converted the PR to a draft until I finish testing.

@hussein-awala hussein-awala marked this pull request as draft February 20, 2023 09:56
@VladaZakharova
Contributor

VladaZakharova commented Feb 20, 2023

Hi!
May I ask in which format you will pass the config file to the trigger? Will it just be a file passed as a parameter to the trigger? Or how?

@hussein-awala
Member Author

Hi! May I ask in which format you will pass the config file to the trigger? Will it just be a file passed as a parameter to the trigger? Or how?

@VladaZakharova - Yes, I pass the file path and let the triggerer load it. Can you check my last commit?

BTW, I am not sure whether loading the config file from the env var KUBECONFIG is a good idea, because it's difficult to decide when we need to load it and when we don't.

@raphaelauv
Contributor

Loading the config file from the env var KUBECONFIG is deprecated in the latest provider version.

@hussein-awala hussein-awala marked this pull request as ready for review February 22, 2023 00:55
Contributor

@raphaelauv left a comment

LGTM

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Apr 11, 2023
@raphaelauv
Contributor

@hussein-awala the PR has conflicts; could you rebase on main? Thank you 👍

@github-actions github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Apr 14, 2023
@hussein-awala hussein-awala force-pushed the fix/deferrable_k8s_pod_op branch 4 times, most recently from 839b3c1 to 59d76b8 Compare April 14, 2023 23:40
Comment on lines 566 to 573
def convert_config_file_to_dict(self):
"""Converts passed config_file to dict format."""
config_file = self.config_file if self.config_file else os.environ.get(KUBE_CONFIG_ENV_VAR)
if config_file:
with open(config_file) as f:
self._config_dict = yaml.safe_load(f)
else:
self._config_dict = None
Contributor

Is removing this function considered a breaking change?

Member Author

In my opinion, this method is meant to be a private method, since it only updates some attributes on the class instance without returning any value. However, it's possible that someone could extend the operator class and use it. Should we deprecate it and remove it in the next major release, or should we add a breaking-change note?

Contributor

So let's deprecate it first, just to be on the safe side.
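
A sketch of the agreed approach (illustrative, not the merged code): keep the method, but emit a DeprecationWarning so extenders get notice before removal in the next major release.

    import warnings

    def convert_config_file_to_dict(self):
        """Converts passed config_file to dict format (deprecated)."""
        warnings.warn(
            "convert_config_file_to_dict is deprecated and will be removed "
            "in a future major release.",
            DeprecationWarning,
            stacklevel=2,
        )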

@potiuk
Member

potiuk commented Apr 22, 2023

LGTM. @eladkal ?

@potiuk potiuk merged commit b5296b7 into apache:main Apr 22, 2023
43 checks passed
howardyoo pushed a commit to howardyoo/airflow that referenced this pull request Mar 31, 2024
This is safe as Jinja template will not validate with s3 rules

Small quotation fix (#30448)

Co-authored-by: bugraozturk <bugra.ozturk@mollie.com>

Merge DbtCloudJobRunAsyncSensor logic to DbtCloudJobRunSensor (#30227)

* feat(providers/dbt): move the async execution logic from DbtCloudJobRunAsyncSensor to DbtCloudJobRunSensor

* test(providers/dbt): add test cases for DbtCloudJobRunSensor when its deferrable attribute is set to True

* docs(providers/dbt): update the doc for DbtCloudJobRunSensor deferrable mode and DbtCloudJobRunAsyncSensor deprecation

* refactor(providers/dbt): deprecate poll_interval argument

* docs(providers/dbt): add deprecation note as DbtCloudJobRunAsyncSensor docstring

* fix(providers/dbt): check whether timeout is in kwargs

* docs(providers/dbt): add missing deferrable=True in howto_operator_dbt_cloud_run_job_sensor_defered

Collect test types in CI in parallel (#30450)

One of the steps in our CI is to collect tests, which is to prevent
parallel tests from even starting when we know test collection will
fail (it is a terrible waste of resources to start 20 test jobs and
initialize databases etc. when we know it is not needed).

This however introduced a single point of delay in the CI process,
which, with the recent collection protection implemented in #30315, piled
up to more than 5 minutes occasionally on our CI machines,
especially on public runners.

This PR utilises our existing test framework to be able to parallelise
test collection (Pytest does not have a parallel collection
mechanism) - also, for localised PRs it will only run test collection
for the test types that are going to be executed, which will speed it
up quite a lot.

This might get further sped up if we split Provider tests into
smaller groups to parallelise them even more.

remove stray parenthesis in spark provider docs (#30454)

Rename --test-types to --parallel-test-types parameters (#30424)

The --test-type and --test-types parameters were very similar, but
they have enough differences to differentiate them even more:

The --test-type is specifically to run a single test type, and it might
include not only the regular "test-types" but also allow for
cross-selection of tests from different other types (for example
--test-type Postgres will run tests for the Postgres database, and
they might come from Providers, Others, CLI etc.).

Whereas --test-types was generally foreseen to be able to split
the tests into "separated" groups that could be run in
parallel.

The parameters have different defaults and even a different choice
of test types that you could choose from (and --test-types is a
space-separated one to make it easier to pass around in CI,
where rather than passing multiple (a variable number of) parameters,
it's easier to pass a single, even space-separated, list of tests
to run).

This change is good to show the difference between the parameters
and to stress that they are really quite different; it also makes
it easier to avoid confusion people might have, especially since the
name was easy to make a typo in.

In a way (but different than in the original issue) it
Fixes: #30407

Fix cloud build async credentials (#30441)

Fix bad merge conflict on test-name-parameter-change (#30456)

We've added a new reference to test-types in #30450 and it clashed
with parameter rename in #30424. This resulted in bad merge
(not too dangerous, just causing missing optimisation in collection
elapsed time in case only a subset of test types were to be executed.

Add description of the provider suspension process (#30359)

Following discussion at the devlist, we are adding description
of the suspension process for providers that hold us back from
upgrading old dependencies. Discussion here:

https://lists.apache.org/thread/j98bgw9jo7xr4fvjh27d6bfoyxr1omcm

Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com>

fix: upgrade moment-timezone package to fix Tehran tz (#30455)

Discovery safe mode toggle comment clarification (#30459)

Fix Breeze failing with error on Windows (#30464)

When breeze is run on Windows it fails with FileNotFoundException
when running uname during the emulation check. This is now fixed, alongside
fixing a TimeoutError misplacement - after moving it to a local import,
an exception triggered before importing it caused an UnboundLocalError.

Related: https://github.com/apache/airflow/pull/30405#issuecomment-1496414377

Fixes: #30465

Update MANIFEST_TEMPLATE.in.jinja2 (#30431)

* Update MANIFEST_TEMPLATE.in.jinja2

* remove google

* remove README.md

Add mechanism to suspend providers (#30422)

As agreed in https://lists.apache.org/thread/g8b3k028qhzgw6c3yz4jvmlc67kcr9hj
we introduce a mechanism to suspend providers from our suite of providers
when they are holding us back on older versions of dependencies.

A provider's suspension is controlled from a single `suspend` flag in
`provider.yaml` - this flag is used to generate
providers_dependencies.json in the generated folders (a provider is skipped
if it has the `suspend` flag set to `true`). This is enough to exclude the
provider from the extras of airflow and (automatically) from being
used when the CI image is built and constraints are being generated,
as well as from provider documentation generation.
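
A hypothetical sketch of how such a flag can drive generation (not the actual Airflow tooling; the iter_active_providers helper is invented here for illustration), assuming PyYAML:

    import yaml

    def iter_active_providers(provider_yaml_paths):
        for path in provider_yaml_paths:
            with open(path) as f:
                meta = yaml.safe_load(f)
            # Suspended providers are excluded from generated metadata.
            if meta.get("suspend", False):
                continue
            yield meta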

Also several parts of the CI build use the flag to filter out
suspended providers:

* verification of provider.yaml files in pre-commit is skipped
  in terms of importing and checking if classes are defined and
  listed in the provider.yaml
* the "tests" folders for providers are skipped automatically
  if the provider has "suspend" = true set
* in case a PR aims to modify a suspended provider's directory
  tree (when it is not a global provider refactor), selective checks
  will detect it and fail such a PR with an appropriate message suggesting
  to fix the reason for the suspension first
* documentation build is skipped for suspended providers
* mypy static checks will skip suspended provider folders, while we
  will still run ruff checks on them (unlike mypy, ruff does not
  expect the libraries that it imports to be available, and we are
  running ruff in a separate environment where no airflow dependencies
  are installed anyway)

Add docs for livy deferrable operator (#30397)

* Add docs for livy deferrable

* Add docs for livy deferrable

* Apply review suggestions

* Fix example DAG

add clarification about timezone aware dags (#30467)

* add clarification about timezone aware dags

Fix typo on index.rst file (#30481)

A duplicate word has been removed.

add template field for s3 bucket (#30472)

Fix typo in outputs in parallel-test-types (#30482)

The typo causes unnecessary delays on building regular PRs :(
It was introduced in #30424

Support serialization to Pydantic models in Internal API (#30282)

* Support serialization to Pydantic models in Internal API.

* Added BaseJobPydantic support and more tests

Add a new parameter for base sensor to catch the exceptions in poke method (#30293)

* add a new parameter for base sensor to catch the exception in poke method

* add unit test for soft_fail parameter

Allow to set limits for XCOM container (#28125)

Add AWS deferrable BatchOperator (#29300)

This PR donates the BatchOperator deferrable mode developed in the [astronomer-providers](https://github.com/astronomer/astronomer-providers) repo to Apache Airflow.

Update dead link in Sentry integration document (#30486)

* Update dead link in Sentry integration document

* fix

Add more info to quicksight error messages (#30466)

Revert "Add AWS deferrable BatchOperator (#29300)" (#30489)

This reverts commit 77c272e6e8ecda0ce48917064e58ba14f6a15844.

Fix output to outputs typos in ci.yaml everywhere (#30490)

(Facepalm) The typo of output -> outputs from #30482 was also in
ci.yaml, where it was used, and it was missed in that PR.

I can blame GitHub Actions' stupid choice of accepting typoed
names of outputs and replacing them with blank strings (which I
raised as an issue a long time ago)

Reformat chart templates part 3 (#30312)

Move Pydantic classes for ORM objects to serialization (#30484)

The Pydantic classes are really part of the serialization
mechanism and they should be moved there, rather than kept in
the core packages they serialize, following our serialization
approach.

Separate mypy pre-commit checks (#30502)

Previously all mypy pre-commit checks were run as one "run-mypy"
check, but that did not allow running them separately for a specific
part of the sources when trying to fix some of them.

This PR splits them into "dev", "core", "providers" and "docs".

Avoid logging sensitive information in triggerer job log (#30110)

* Change trigger name to task id instead of repr(trigger) to avoid logging sensitive information

Allow specifying a `max_depth` to the `redact()` call (#30505)

The default was hard-coded as 5, which is suitable for log
redacting, but the OpenLineage PR would like to be able to use a greater
depth.
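
An illustrative sketch of depth-limited redaction (not the Airflow implementation; the sensitive-key check here is invented for the example):

    def redact(item, depth=0, max_depth=5):
        # Beyond max_depth, return the value as-is instead of recursing further.
        if depth > max_depth:
            return item
        if isinstance(item, dict):
            return {
                k: "***" if k.lower() in {"password", "secret"}
                else redact(v, depth + 1, max_depth)
                for k, v in item.items()
            }
        if isinstance(item, (list, tuple)):
            return type(item)(redact(v, depth + 1, max_depth) for v in item)
        return item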

Fix deprecation warning in `example_sensor_decorator` DAG (#30513)

Put AIP-44 internal API behind feature flag (#30510)

This includes:

* configurable setting with defaults taken from env variable
* raising exception if config variables are used with feature
  flag not enabled
* hiding config values (adding mechanism to hide config values
  that are set for the future versions)
* skipping tests

Summarize skipped tests after tests are run (#30520)

When Pytest runs tests it provides a summary of the tests. We are
running a lot of tests, so we are really interested only in cases
that are "interesting". So far we were not showing "skipped" tests
in the summary, because there were cases where a lot of tests
were skipped (mostly when integration tests were run - we collected
tests from the "tests" folder and ran only those tests that were not
skipped by the @integration mark).

This however changed in #28170 as we moved all integration
tests to the "integration" subfolder, and now instead of a large number of
skipped tests we run them selectively for each integration.

This should help in verifying that the skipped tests were skipped
for a good reason (and that we actually see which tests have been
skipped).

Add more type hints to the code base (#30503)

* Fully type Pool

Also fix a bug where create_or_update_pool silently fails when an empty
name is given. An error is raised instead now.

* Add types to 'airflow dags'

* Add types to 'airflow task' and 'airflow job'

* Improve KubernetesExecutor typing

* Add types to BackfillJob

This triggers an existing typing bug that pickle_id is incorrectly typed
as str in executors, while it should be int in practice. This is fixed
to keep things straight.

* Add types to job classes

* Fix missing DagModel case in SchedulerJob

* Add types to DagCode

* Add more types to DagRun

* Add types to serialized DAG model

* Add more types to TaskInstance and TaskReschedule

* Add types to Trigger

* Add types to MetastoreBackend

* Add types to external task sensor

* Add types to AirflowSecurityManager

This uncovers a couple of incorrect type hints in the base
SecurityManager (in fab_security), which are also fixed.

* Add types to views

This slightly improves how view functions are typechecked and should
prevent some trivial bugs.

Type related import optimization for Executors (#30361)

Move some expensive typing related imports to be under TYPE_CHECKING

Fix link to pre-commit-hook section (#30522)

* Change static link

* Update LOCAL_VIRTUALENV.rst

Do not use template literals to construct html elements (#30447)

Enable AIP-44 and AIP-52 by default for development and CI on main (#30521)

* Enable AIP-44 and AIP-52 by default for development and CI on main

The AIP-44 and AIP-52 features are controlled now by environment variables;
however, those variables were not passed by default inside the
docker-compose environment, so they had no effect when set in
ci.yaml. This PR fixes it, but it also sets the variables to
be enabled by default in the Breeze environment and when the tests
are run locally on main using a local venv, so that contributors
are not surprised when they try to reproduce local failures.

In the 2.6 branch, we will set both variables to "false" by default
in ci.yml, so that the tests are not run when we cherry-pick the changes.

* Update scripts/ci/docker-compose/devcontainer.env

Run "api_internal" tests in CI (#30518)

* Run "api_internal" tests in CI

While adding a feature flag for AIP-44 I noticed that, due to some
weird naming we have in tests, the "api_internal" tests were
actually excluded from running - this was due to a combination of
factors:

* When API tests are run, only "api" and "api_connexion" were
  added to API_tests
* We already have an "api" folder in "tests" (for the experimental api)
* finding "Other" tests should cover it, but it excluded "api" tests:
  the way it is implemented, it took the "api" prefix and excluded
  all the test directories that were starting with "api" (including
  the "api_internal/*" ones)

This change addresses it twofold:

* The "api_internal" tests are added explicitly to the "API"
  test type
* The "tests/api" folder with tests for the experimental API
  has been renamed to "api_experimental" (including integration
  tests)

This should set the "internal_api" tests to run in the API test
type, and renaming "api" to api_experimental should avoid
accidentally skipping the tests in case someone adds
"tests/api_SOMETHING" in the future.

* Update Dockerfile.ci

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

* Update entrypoint_ci.sh

---------

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

`EmailOperator`: fix wrong assignment of `from_email` (#30524)

* `EmailOperator`: fix wrong assignment of `from_email`

Add asgiref as a core dependency (#30527)

We added asgiref to core a few months back for the `sync_to_async` in
airflow.triggers.external_task.

Although the core http provider has depended on asgiref since v4.2, it is
possible to have an older version of http installed, meaning that you end up
without asgiref, which leads to every dag failing to parse, as the
"dependency detector" code inside the DAG Serializer ends up importing
this module!

improve first PR bot comment (#30529)

Put AIP-52 setup/teardown tasks behind feature flag (#30509)

We aren't going to land AIP-52 in time for 2.6, so put the authoring api
behind a feature flag. I've chosen to put it in `airflow.settings` so
users can set it in `airflow_local_settings`, or set it via env var.

Add tests to PythonOperator (#30362)

* Add tests to airflow/operators/python.py

* Convert log error of _BasePythonVirtualenvOperator._read_result() into a custom exception class

* Improve deserialization error handling

---------

Co-authored-by: Shahar Epstein <shahar1@live.com>
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Make AWS RDS operator page title consistent (#30536)

This PR changes the title of the RDS documentation page from "Amazon Relational Database Service Documentation (RDS)" to "Amazon Relational Database Service (RDS)". This page was the only one with the word "Documentation" in its title, and several other services had a similar title format of ("Amazon <full service name> (<acronym>)"), for example ["Amazon Simple Notification Service (SNS)"](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/operators/sns.html).

Revert "Add tests to PythonOperator (#30362)" (#30540)

This reverts commit b4f3efd36a0566ef9d34baf071d935c0655a02ef.

Add new known warnings after dependencies upgrade. (#30539)

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Fix AzureDataFactoryPipelineRunLink get_link method (#30514)

Use default connection id for KubernetesPodOperator (#28848)

* Use default connection id for KubernetesPodOperator

---------

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Fix dynamic imports in google ads vendored in library (#30544)

This is a further fix to vendor-in the google-ads library:

* moving from _vendor to google_vendor is needed to keep
  the tree structure separate from the `google` package used
  in the google provider. If we do not do that, the ads library
  will find the "google" package when walking up the tree
  and will determine that it is the "top" google package

* dynamic imports of the ads library have been updated
  to also import from the vendored-in library

* only the v12 version is supported

Closes: #30526

BigQueryHook list_rows/get_datasets_list can return iterator (#30543)

Add deferrable mode to GKEStartPodOperator (#29266)

* Add deferrable mode to GKEStartPodOperator

* Change naming for GKEHook and add comments

* Rebase main, revert unrelated changes

* Add review suggestions + rebase

* Add deprecation warning for deleted method + rebase

Accept None for `EmailOperator.from_email` to load it from smtp connection (#30533)

* Add default None for  to load it from smtp connection

Update DV360 operators to use API v2 (#30326)

* Update DV360 operators to use API v2

* Update display_video.rst

* fixup! Update display_video.rst

* fixup! Update display_video.rst

---------

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>

Prepare docs for ad hoc release of Providers (#30545)

* Prepare docs for ad hoc release of Providers

* add smtp provider

* add google

Add --one-pass-only parameter to breeze docs build command (#30555)

The parameter was previously supported in the docs-build script
but it was not exposed via breeze commands.

It allows to iterate faster on docs building, by default docs building
runs up to 3 passes in order to account for new cross-references
between multiple providers, this flag makes it one pass, which makes
it faster to summarize the errors when you try to nail down a problem
with docs.

Load subscription_id from extra__azure__subscriptionId (#30556)

Use the engine provided in the session (#29804)

Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>

fix release doc for providers (#30559)

Change timer type back to float (#30532)

Decouple "job runner" from BaseJob ORM model (#30255)

* Decouple "job runner" from BaseJob ORM model

Originally the BaseJob ORM model was extended and polymorphism was
used to tie different execution logic to different job types. This
has proven to be difficult to handle during the AIP-44 implementation
(internal API), because LocalTaskJob, DagProcessorJob and TriggererJob
are all going to not use the ORM BaseJob model; they should use
BaseJobPydantic instead. In order to make this possible, we introduce
a new type of object, BaseJobRunner, and make BaseJob use the runners
instead.

This way, the BaseJobRunners are used for the logic of each of the
jobs, while a single, non-polymorphic BaseJob is used to keep the
records in the database - as a follow-up this will allow completely
decoupling the job database operations and moving them to the internal_api
component when db-less mode is enabled.

Closes: #30294

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Fix one more dynamic import needed for vendored-in google ads (#30564)

Continuation of #30544

Move Pydantic class serialization under AIP-44 feature flag (#30560)

The Pydantic representation of the ORM models is only used
in the in-progress AIP-44 feature, and we are moving to a new
serialization implementation (more modular) in the near future,
so in order to not unnecessarily extend features in the old
serialization, but still allow testing AIP-44, we are moving the
use_pydantic_models parameter and its implementation under the
_ENABLE_AIP_44 feature flag, so that it is not used accidentally.

We will eventually remove it and add Pydantic serialization to
the new serialization implementation.

Add podAnnotations to PgBouncer (#30168)

Added support for using SHA digest of Docker images (#30214)

Bump json5 to 1.0.2 and eslint-plugin-import to 2.27.5 in /airflow/www (#30568)

Bumping from 1.0.1 for json5 and 2.26.0

Update dataproc.rst (#30566)

Making the statement more contextual; the change proposed here is "a provide" to "to provide".

Quieter output during asset compilation (#30565)

The "Still waiting ....." message was emitted every second, which can be
quite noisy even on moderate machines. This reduces the message to once
every 5 seconds.

Rename JobRunner modules to *_job_runner and base_job* to job (#30302)

#30255 introduced the "JobRunner" concept and decoupled the job logic
from the ORM polymorphic *Job objects. The change was implemented
in a way that minimised the review effort needed, so it avoided renaming
the modules for the runners (from `_job` to `_job_runner`).

Also, BaseJob lost its "polymorphism" properties, so the package and class name
can be renamed to simply job.

This PR completes the JobRunner concept introduction by applying the
renames.

Closes: #30296

Speed up dag runs deletion (#30330)

* Provide custom deletion for dag runs to speed things up when a dag run has a lot of related task instances
---------

Co-authored-by: Zhyhimont Dmitry <zhyhimont.d@profitero.com>
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Adding taskflow API example for sensors (#30344)

Use connection URI in SqliteHook (#28721)

* Use connection URI in SqliteHook

This allows the user to define more sqlite args such as mode. See https://docs.sqlalchemy.org/en/14/dialects/sqlite.html#uri-connections for details.
- remove unsupported schema, login and password fields in docs
- add info about host field to docs
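
A sketch of the kind of URI this enables, per the SQLAlchemy docs linked above (the path and args are illustrative): extra sqlite options such as read-only mode can be encoded directly in the connection URI.

    from sqlalchemy import create_engine

    # The file: URI form lets sqlite accept extra args such as mode=ro (read-only).
    engine = create_engine("sqlite:///file:/tmp/example.db?mode=ro&uri=true")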

Release notes for helm chart 1.9.0 (#30570)

Do not remove docker provider for Airflow 2.3 check (#30483)

This removal is a remnant of the old docker provider for 2.2 and should
not be happening.

Separate and split run job method into prepare/execute/complete steps (#30308)

* Separate and split run job method into prepare/execute/complete steps

As a follow-up after decoupling the job logic from the BaseJob
ORM object (#30255), the `run` method of BaseJob should also be
decoupled from it (allowing BaseJobPydantic to be passed) as well
as split into three steps, in order to allow db-less mode.

The "prepare" and "complete" steps of the `run` method modify the
BaseJob ORM-mapped object, so they should be called over the
internal-api from LocalTask, DagFileProcessor and Triggerer running
in db-less mode. The "execute" step, however, does not need the
database and should be run locally.

This is not yet the full AIP-44 conversion; this is a prerequisite to do
so, and the AIP-44 conversion will be done as a follow-up after this one.

However, we added a mermaid diagram showing the job lifecycle with and
without the Internal API to make it easier to reason about it.

Closes: #30295

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

Update SQLAlchemy `select()` to new style (#30515)

SQLAlchemy has a new style for `select()` that is standard for 2.0. This
updates our uses of it to avoid `RemovedIn20Warning` warnings.

https://docs.sqlalchemy.org/en/20/errors.html#select-construct-created-in-legacy-mode-keyword-arguments-etc
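
The gist of the change, as a generic example (not a specific Airflow query):

    from sqlalchemy import column, select

    # Legacy 1.x style (emits RemovedIn20Warning on 1.4, removed in 2.0):
    #   stmt = select([column("x")])
    # The new 2.0-compatible style takes positional arguments:
    stmt = select(column("x"))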

Remove JobRunners back reference from Job (#30376)

This is the final step of decoupling the job runner from the ORM-
based BaseJob. After this change we finally reach the state where
BaseJob is just the state of the job being run, while all
the logic is kept in a separate "JobRunner" entity which just
keeps a reference to the job. It also makes sure that the
job in each runner is defined as appropriate for each job type:

* SchedulerJobRunner, BackfillJobRunner can only use BaseJob
* DagProcessorJobRunner, TriggererJobRunner and especially the
  LocalTaskJobRunner can keep both BaseJob and its Pydantic
  BaseJobPydantic representation - for AIP-44 usage.

The highlights of this change:

* Job does not have a job_runner reference any more
* Job is a mandatory parameter when creating each JobRunner
* the run_job method takes as parameters the job (i.e. where the state
  of the job is kept) and executor_callable - i.e. the method
  to run when the job gets executed
* the heartbeat callback is also passed a generic callable in order
  to execute the post-heartbeat operation of each job
  type
* there is no more need to specify job_type when you create a
  BaseJob; the job gets its type simply by creating a runner
  with the job

This is the final stage of a refactoring that was split into
reviewable stages: #30255 -> #30302 -> #30308 -> this PR.

Closes: #30325

Cast binding +1 in helm chart release vote email (#30590)

We will assume that the release manager for the helm chart wants to cast
a binding +1 vote :)

Databricks SQL sensor (#30477)

* Renamed example DAG

Add Hussein to committers (#30589)

Add support in AWS Batch Operator for multinode jobs (#29522)

picking up #28321 after it's been somewhat abandoned by the original author.
Addressed my own comment about empty array, and it should be good to go I think.

Initial description from @camilleanne:

Adds support for AWS Batch multinode jobs by allowing a node_overrides json object to be passed through to the boto3 submit_job method.

Adds support for multinode jobs by properly parsing the output of describe_jobs (which is different for container vs multinode) to extract the log stream name.
closes: #25522

Fix CONTRIBUTORS_QUICK_START Doc (#30549)

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Use custom validator for OpenAPI request body (#30596)

* Use custom validator for OpenAPI request body

The default error message for an empty request body from Connexion
is quite unhelpful (taken directly from JSONSchema). This custom
validator emits a more helpful message for this particular context.

* Add test for custom request body validator

Co-Authored-By: maahir22 <56473490+maahir22@users.noreply.github.com>

---------

Co-authored-by: maahir22 <56473490+maahir22@users.noreply.github.com>

Remove 'run-' prefix from pre-commit jobs (#30597)

* Remove 'run-' prefix from pre-commit jobs

The job ID already implies 'run', and having the additional prefix
results in weird CLI, e.g. 'pre-commit run run-mypy-core'. This changes
the CLI to 'pre-commit run mypy-core', which reads better.

* Fix table marker

* Fix outdated pre-commit hook ID references

Add ability to override waiter delay in EcsRunTaskOperator (#30586)

Prepare docs for RC2 of provider wave (#30606)

Deactivate DAGs deleted from within zipfiles (#30608)

DagBag: Use dag.fileloc instead of dag.full_filepath in exception message (#30610)

Co-authored-by: Douglas Staple <staple.douglas@gmail.com>

Remove gauge scheduler.tasks.running (#30374)

* Remove gauge scheduler.tasks.running

* Add significant.rst file

* Update newsfragments/30374.significant.rst

---------

Co-authored-by: Niko Oliveira <onikolas@amazon.com>

Recover from `too old resource version exception` by retrieving the latest `resource_version` (#30425)

* Recover from `too old resource version exception` by retrieving the latest `resource_version`

* Update airflow/executors/kubernetes_executor.py

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

---------

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

docs: use correct import path for Dataset (#30617)

Speed up TaskGroups with caching property of group_id (#30284)

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Fix `TriggerDagRunOperator` with deferrable parameter (#30406)

* readding after borked it

* pre-commit

* finally fixing after the github issue last week

* push fix

* feedback from hussein

Fix failing SQS tests on moto upgrade (#30625)

The new moto (4.1.7) performs additional validation on the queues
created during tests, and it fails the tests when content
deduplication is not specified.

Explicitly setting the deduplication mode fixes the problem and
allows the new moto to be installed.
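
A sketch of what explicitly setting deduplication looks like with boto3 (the queue name is illustrative):

    import boto3

    sqs = boto3.client("sqs", region_name="us-east-1")
    # FIFO queues require a deduplication strategy; setting it explicitly
    # avoids relying on defaults that the new moto validates.
    sqs.create_queue(
        QueueName="example-queue.fifo",
        Attributes={"FifoQueue": "true", "ContentBasedDeduplication": "true"},
    )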

fix possible race condition when refreshing DAGs (#30392)

* fix possible race condition when refreshing DAGs

* merge the two queries into one

* Remove provide_session from internal function

Since get_latest_version_hash_and_updated_datetime is internal and we
always pass in the session anyway, the provide_session decorator is
redundant and only introduces the possibility of developer errors.

---------

Co-authored-by: Sébastien Brochet <sebastien.brochet@nielsen.com>
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Remove Norm and Hussein from the triage group (#30627)

Hussein is now a committer and Norm has completed building out the
initial AIP-52 tasks.

Remove mysql-connector-python (#30487)

* Turn the package 'mysql-connector-python' into an optional feature

* Update airflow/providers/mysql/provider.yaml

* Update airflow/providers/mysql/CHANGELOG.rst

---------

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Better error message where non-compatible providers are not excluded (#30629)

When the compatibility check is performed for an old version of Airflow,
we attempt to install all providers for the old version. However, if
one of the providers has a >= limit on Airflow for a newer version of
Airflow, this installation leads to attempting to upgrade airflow
rather than failing, which could lead to misleading errors.

This change adds "airflow==x.y.z", taken from the --use-airflow-version
flag, to the `pip install` command, which should in this case fail
with a much more accurate message: that the provider conflicts with the
airflow version.

Updating the links to the Dataform product documentation to fix 404 redirect error (#30631)

Updating the links to the Dataform product documentation to fix 404 redirect error

New AWS sensor — DynamoDBValueSensor (#28338)

Remove duplicate param docstring in EksPodOperator (#30634)

In `DockerOperator`, adding an attribute `tls_verify` to choose whether to validate certificate (#30309) (#30310)

* add `tls_verify` to choose whether to validate certificate (#30309)

---------

Co-authored-by: Hussein Awala <hussein@awala.fr>

Add `max_active_tis_per_dagrun` for Dynamic Task Mapping (#29094)

* add max_active_tis_per_dagrun param to BaseOperator

* set has_task_concurrency_limits when max_active_tis_per_dagrun is not None

* check if max_active_tis_per_dagrun is reached in the task deps

* check if all the tasks have None max_active_tis_per_dagrun before auto schedule the dagrun

* check if the max_active_tis_per_dagrun is reached before queuing the ti

* check max_active_tis_per_dagrun in backfill job

* fix current tests and ensure everything is ok before adding new tests

* refactor TestTaskConcurrencyDep

* fix a bug in TaskConcurrencyDep

* test max_active_tis_per_dagrun in TaskConcurrencyDep

* tests max_active_tis_per_dagrun in TestTaskInstance

* test dag_file_processor with max_active_tis_per_dagrun

* test scheduling with max_active_tis_per_dagrun on different DAG runs

* test scheduling mapped task with max_active_tis_per_dagrun

* test max_active_tis_per_dagrun with backfill CLI

* add new starved_tasks filter to avoid affecting the scheduling perf

* unify the usage of TaskInstance filters and use TI

* refactor concurrency map type and create a new dataclass

* move docstring to ConcurrencyMap class and create a method for default_factory

* move concurrency_map creation to ConcurrencyMap class

* replace default dicts by counters

* replace all default dicts by counters in the scheduler_job_runner module

* suggestions from review

Simplify logic to resolve tasks stuck in queued despite stalled_task_timeout (#30375)

* simplify and consolidate logic for tasks stuck in queued

* simplify and consolidate logic for tasks stuck in queued

* simplify and consolidate logic for tasks stuck in queued

* fixed tests; updated fail stuck tasks to use run_with_db_retries

* mypy; fixed tests

* fix task_adoption_timeout in celery integration test

* addressing comments

* remove useless print

* fix typo

* move failure logic to executor

* fix scheduler job test

* adjustments for new scheduler job

* appeasing static checks

* fix test for new scheduler job paradigm

* Updating docs for deprecations

* news & small changes

* news & small changes

* Update newsfragments/30375.significant.rst

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Update newsfragments/30375.significant.rst

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* added cleanup stuck task functionality to base executor

* fix sloppy mistakes & mypy

* removing self.fail from base_executor

* Update airflow/jobs/scheduler_job_runner.py

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Update airflow/jobs/scheduler_job_runner.py

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Fix job_id filter

* Don't even run query if executor doesn't support timing out queued tasks

* Add support for LocalKubernetesExecutor and CeleryKubernetesExecutor

* Add config option to control how often it runs - we want it quicker than
the timeout

* Fixup newsfragment

* mark old KE pending pod check interval as deprecated by new check interval

* Fixup deprecation warnings

This more closely mirrors how deprecations are raised for "normal"
deprecations.

I've removed the depth, as moving up the stack doesn't really help the
user at all in this situation.

* Another deprecation cleanup

* Remove db retries

* Fix test

---------

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
Co-authored-by: Jed Cunningham <jedcunningham@apache.org>
Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com>

Display Video 360 cleanup v1 API usage (#30577)

* Display Video 360 cleanup v1 API usage

* Update docs

Fix mapped tasks partial arguments when DAG default args are provided (#29913)

* Add a failing test to make it pass

* use partial_kwargs when they are provided and override only None values with dag default values

* update the test and check if the values are filled in the right order

* fix overriding retry_delay with default value when it is equal to 0

* add missing default value for inlets and outlets

* set partial_kwargs dict type to dict[str, Any] and remove type ignore comments

* create a dict for default values and use NotSet instead of None to support None as accepted value

* update partial typing by removing None type from some args and set NotSet for all args

* Tweak kwarg merging slightly

This should improve iteration a bit, I think.

* Fix unit tests

---------

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

First commit of OpenLineage provider. (#29940)

This PR consists mostly of code that was created in the OpenLineage project. It
consists of

- Provider wiring
- OpenLineageListener that uses the Listener API to get notifications about changes
  to TaskInstance and Dag states
- Extractor framework, which is used to extract lineage information from
  particular operators. It's meant to be replaced by a direct implementation of
  lineage features in a later phase, extracting them using DefaultExtractor.
  This PR does not include actual extractors, but code around using and registering them.
- OpenLineageAdapter that translates extracted information to OpenLineage events.
- Utils around specific Airflow OL facets and features

This is a base implementation that's not meant to be released yet, but to add
code modified to be consistent with Airflow standards, get early feedback and
provide a canvas to add later features, docs and tests on.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>

Add v2-6-test and v2-6-stable to codecov and protected branches (#30640)

Adding configuration to control retry parameters for k8s api client (#29809)

* Adding configuration to control retry parameters for k8s api client

* Handling review comments

* Fixing code bug

* Fixing failing tests

* Temporary commit with UT wip

* Fixing unit test

* Fixing the strict checks

* Handling review comments from Hussein

* Revert "Handling review comments from Hussein"

This reverts commit fa3bc260f7462c42620f694ee97b7f15c0b0b9c3.

* Fixing failing ut

* Reverting bad hack

* Updating logic in kube_client.py

Co-authored-by: Hussein Awala <hussein@awala.fr>

* Fixing unit tests

* Fixing unit tests

* Handling review comments from Ash

* Fix loading mock call args for python3.7

* Apply suggestions from code review

* fix static check

* add in 2.6.0

---------

Co-authored-by: Amogh <adesai@cloudera.com>
Co-authored-by: Hussein Awala <houssein.awala.96@gmail.com>

fix(chart): webserver probes timeout and period. (#30609)

* fix(chart): webserver probes timeout and period

* Update default values in JSON schema to reflect values.yaml

* remove defautl templated values

Clarify release announcements on social media (#30639)

DynamoDBHook - waiter_path() to consider `resource_type` or `client_type` (#30595)

* Add  while initializing

* Add  while initializing

* Add logic to pick either client_type or resource_type

* Add test case

* Assert expected path

Improve task & run actions ux in grid view (#30373)

* update run clear+mark, update task clear

* add mark as tasks and include list of affected tasks

* Add support for mapped tasks, add shared modal component

* Clean up styling, restore warning for past/future tg clear

Add command to get DAG Details via CLI (#30432)

---------

Co-authored-by: Hussein Awala <hussein@awala.fr>
Co-authored-by: Hussein Awala <houssein.awala.96@gmail.com>

When clearing task instances try to get associated DAGs from database (#29065)

* When clearing task instances try to get associated DAGs from database.

This fixes problems when recursively clearing task instances across multiple DAGs:
  * Task instances in downstream DAGs weren't having their `max_tries` property incremented, which could cause downstream external task sensors in reschedule mode to instantly time out (issue #29049).
  * Task instances in downstream DAGs could have some of their properties overridden by an unrelated task in the upstream DAG if they had the same task ID.

* Use session fixture for new `test_clear_task_instances_without_dag_param` test.

* Use session fixture for new `test_clear_task_instances_in_multiple_dags` test.

---------

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Organize Amazon providers docs index (#30541)

preload airflow imports before dag parsing to save time (#30495)

---------

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com>

Add delete inactive run functionality to databricks provider (#30646)

Create audit_logs.rst (#30405)

---------

Co-authored-by: Josh Fell <48934154+josh-fell@users.noreply.github.com>

Present affected task instances as table (#30633)

Helm chart 1.9.0 has been released (#30649)

Add 2.6.0b1 to issue template (#30652)

add missing project_id in BigQueryGetDataOperator (#30651)

Properly classify google_vendor package to google provider (#30659)

We've recently added the google_vendor package to vendor-in the ads library,
and we had to do it outside of the regular google provider package,
because internally the library assumed our google package was the
top-level package when discovering the right relative imports (#30544).

This confused the pre-commit that updates provider dependencies into
not recognising the package and printing warnings about bad classification.

Special-case handling will classify it to the google provider.

Make pandas optional in workday calendar example (#30660)

The workday calendar expected pandas to be available, and it is part
of our examples; however, Airflow does not have pandas as a core
dependency, so in case someone does not have pandas installed, importing
the workday example would fail.

This change makes pandas optional and falls back to regular working
days for the example in case it is not available (including a warning
about it). It also fixes a slight inefficiency where the
USFederalHoliday calendar was created every time the next workday
was calculated.
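
The fallback pattern described above, sketched (names are illustrative, not the exact example code):

    try:
        from pandas.tseries.holiday import USFederalHolidayCalendar
        # Create the calendar once, not on every next-workday calculation.
        _calendar = USFederalHolidayCalendar()
    except ImportError:
        # pandas not installed: fall back to plain Monday-Friday workdays.
        _calendar = None

    def is_workday(date) -> bool:
        if _calendar is not None and len(_calendar.holidays(start=date, end=date)):
            return False
        return date.weekday() < 5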

Update Google Campaign Manager360 operators to use API v4 (#30598)

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Skip KubernetesPodOperator task when it returns a provided exit code (#29000)

* Skip KubernetesPodOperator task when it returns a provided exit code

* set default value to None, and get exit code only when skip_exit_code is not None

* get the exit code for the base container and check if everything is ok

* add unit test for the operator

* add a test for deferred mode

* apply change requests

---------

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Upgrade Pip to 23.1 (#30663)

Fix docs building for workday example. (#30664)

#30660 was merged too quickly, as it resulted in a doc building
failure. This PR fixes it.

docker compose doc changes (#30662)

Add suspended providers to pytest collection test (#30668)

Pytest collection has recently been extracted to a separate job,
and the SUSPENDED_PROVIDERS_FOLDERS variable was not set in the new
job - which caused suspended provider tests to be attempted by
pytest collection, leading to import errors when suspended providers
have some dependencies removed from our image.

Workaround type-incompatibility with new attrs in openlineage (#30674)

The new attrs released today (11 hours ago) added typing
information, and it caused OpenLineageRedactor to fail mypy checks.

Temporarily adding type: ignore should allow upgrading to the new
attrs and stop PRs changing dependencies from failing.

Related: #30673

Update the release note (#30680)

* Update the release note

During the beta release, I observed some minor things that need fixing.
Here's the PR

* Use local import

Correctly pass a type to attrs.has() (#30677)

Merge WasbBlobAsyncSensor to WasbBlobSensor (#30488)

Updated app to support configuring the caching hash method for FIPS v2 (#30675)

Install twine with --force for package verification (#30683)

In some cases when the machine has been reused across builds,
pipx-installed twine might seem both installed and removed (this happens
when builds are cancelled while installing twine).

Installing twine with --force should fix the problem.

Fix docs: add an "apache" prefix to pip install (#30681)

Remove unittests.TestCase from tests/test_utils (#30685)

Introduce consistency of package sequence for "Other" test type (#30682)

When packages for "Other" test type are calculated, the list of
all test folders is generated and they are compared with the
packages previously selected by the "predefined" test types. This
is done via `find` method that returns the folders in arbitrary
order, mostly depending on the sequence the folders were created.

In case the tests from some packages have some side-effects that
impact tests in other packages (obviously not something that is
desired), this might end up that the tests succeed in one
environment, but fail in another. This happened for example
in case of #30362 that had cross-package side-effect later
fixed in #30588. There - results of "Other" test type depended
on where the tests were executed.

This PR sorts the find output so it is always in consistent order.
we are using ASCII for package names and the test types are
derived in the same Docker CI image with the same LOCALE, so it
should guarantee that the output of packages for "Other" test type
should be always consistent.

Add missing version val to caching_hash_method config (#30688)

Upgrade to MyPy 1.2.0 (#30687)

Upgrading to the latest (released a week ago) MyPy in the hope it
will fix some more problems with attrs after upgrading new packages,
but it seems that even the latest MyPy does not know about the
new typing changes introduced in attrs (traditionally mypy has an
attrs plugin that injects appropriate typing, but apparently it
needs to catch up with those changes).

Parallelize Helm tests with multiple job runners (#30672)

Helm unit tests use template rendering, and the rendering
uses a lot of CPU for the `helm template` command. We have a lot of
those rendering tests (>800), so even running the tests in parallel
on a multi-CPU machine does not lead to a decreased elapsed time
to execute the tests.

However, each of the tests runs entirely independently, and we
should be able to achieve a much faster elapsed time if we run
a subset of tests on a separate multi-CPU machine. This will not
lower the job build time; however, it might speed up elapsed time
and thus give faster feedback.

This PR achieves that.

Skip PythonVirtualenvOperator task when it returns a provided exit code (#30690)

* Add a new argument to raise a skip exception when the python callable exits with the same value

* add unit tests for skip_exit_code
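
The mechanism, sketched generically (not the exact operator code):

    from typing import Optional

    from airflow.exceptions import AirflowSkipException

    def handle_return_code(return_code: int, skip_on_exit_code: Optional[int]) -> None:
        # A matching exit code marks the task as skipped rather than failed.
        if skip_on_exit_code is not None and return_code == skip_on_exit_code:
            raise AirflowSkipException(
                f"Process exited with code {return_code}. Skipping."
            )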

[OTel Integration] Add tagging to existing stats (#30496)

rename skip_exit_code to skip_on_exit_code and allow providing multiple codes (#30692)

* rename skip_exit_code to skip_on_exit_code and allow providing multiple codes

* replace list type by Container

Fix d3 dependencies (#30702)

Update system test example_emr to have logs (#30715)

Fixed logging issue (#30703)

Co-authored-by: Mark Richman <mrkrchm@amazon.com>

Separate out and clarify policies for providers (#30657)

This change separates out the policies we have for providers into
a separate PROVIDERS.rst file. It also documents clearly the process
and policy we have for accepting new community-managed providers,
explaining the conditions that have to be fulfilled and stating
a very strong preference for keeping providers maintained by
3rd parties when there are 3rd-party teams that manage
the providers.

SqlToS3Operator - Add feature to partition SQL table (#30460)

Optimize parallel test execution for unit tests (#30705)

We are running the tests in parallel test types in order to speed
up their execution. However, some test types and subsets of tests
take far longer to execute than other test types.

The longest tests to run are Providers and WWW tests, and the
longest tests from Providers are by far Amazon tests, then
Google. "All Other" Provider tests take about the same time
as Amazon tests - also, after splitting the provider tests,
Core tests take the longest time.

When we are running tests in parallel on multiple CPUs, often
the longest running tests remain running on their own while the
other CPUs are not busy. We could run a separate test type
per provider, but the overhead of starting the database and collecting
and initializing tests for them is too big to achieve
speedups - especially for public runners, having 80 separate
databases with 80 subsequent container runs is slower than
running all Provider tests together.

However, we can split the Provider tests into a smaller number of
chunks and prioritize running the long chunks first. This
should improve the effect of parallelisation and improve utilization of
our multi-CPU machines.

This PR aims to do that:

* Split Provider tests (if amazon or google are part of the
  provider tests) into amazon, google, all-other chunks

* Move sorting of the test types to selective_check, to sort the
  test types according to expected longest running time (the longest
  tests to run are added first)

This should improve the CPU utilization of our multi-CPU runners
and make the tests involving the complete Provider set (or even sets
containing amazon, google and a few other providers)
execute quite a few minutes faster on average.

We could also get rid of some sequential processing for the public PRs
because each test type we run will be less demanding overall. We
used to get a lot of 137 exit codes (memory errors), but with splitting
out Providers, the risk of exhausting resources by two test types
running in parallel is low.

Deprecate `skip_exit_code` in `BashOperator` (#30734)

Add explicit information about how to write task logs (#30732)

There was no explicit information in our documentation on how to
write logs from your tasks. While for classic operators that is
easy and straightforward, as they all have a log property which
is the right logger coming from LoggingMixin, for taskflow code
and custom classes it is not obvious that you have to
use the `airflow.task` logger (or a child of it), or that you have to
extend LoggingMixin to use the built-in logging configuration.
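
In short, the documented approach (a sketch; the function name is illustrative):

    import logging

    # The `airflow.task` logger (or a child of it) routes messages to the
    # task log under the built-in logging configuration.
    logger = logging.getLogger("airflow.task")

    def my_taskflow_function():
        logger.info("this message ends up in the task log")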

Suspend Yandex provider due to protobuf limitation (#30667)

Yandex provider brings protobuf dependency down to <4 and we are gearing
up to updating it everywhere else. Protobuf3 support ends in Q2 2023
for Python https://protobuf.dev/support/version-support/#python

Yandex is the last provider that we do not closely collaborate with on fixing
* Gogle provider dependencies are actively upgraded to latest version
  by Google led team: #30067 (some of the libraries are already updated)
  with target to update all dependencies by mid-May
* Apache-Beam has already merged protobuf4 support
  https://github.com/apache/beam/pull/25874 with the target of
  releasing it in 2.47.0 mid-May
* The mysql-connector-python in MySQL provider is already turned into
  optional dependency: #30487

The only remaining dependency limiting us to protobuf 3 (<3.21) is
yandexcloud. We opened an issue with yandexcloud
(https://github.com/yandex-cloud/python-sdk/issues/71) 3 weeks ago
and, while there was initial interest, there has been no progress on
the issue. Therefore - in order to prepare for running all
the tests and the final migration to protobuf 4 - we need to suspend
the Yandex provider, following the suspension process we agreed on
and got a LAZY CONSENSUS for in the
https://lists.apache.org/thread/g8b3k028qhzgw6c3yz4jvmlc67kcr9hj
mailing list discussion.

The Yandex provider can be taken out of suspension once the
yandexcloud dependency removes the protobuf limitation in a release;
a PR reverting this change (and fixing whatever tests and static
checks need it) is the way to do that.

Add a collapse grid button (#30711)

Add skip_on_exit_code also to ExternalPythonOperator (#30738)

The changes #30690 and #30692 added skip_on_exit_code to the
PythonVirtualenvOperator, but skipped the - very closely related
- ExternalPythonOperator.

This change brings the same functionality to ExternalPythonOperator
and moves it to the base class of both operators. It also adds a
separate test class for ExternalPythonOperator, introducing
a common base class and moving the test methods that are common
to both operators there.
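
A hedged sketch of the new flag, assuming it keeps the same name and semantics as on PythonVirtualenvOperator (interpreter path and exit code are illustrative):

```python
from airflow.operators.python import ExternalPythonOperator


def maybe_skip():
    # Runs in the external interpreter, so import inside the callable.
    import sys

    # Exiting with the configured code should mark the task as
    # skipped instead of failed.
    sys.exit(100)


skip_demo = ExternalPythonOperator(
    task_id="skip_demo",
    python="/opt/venvs/other/bin/python",  # hypothetical interpreter path
    python_callable=maybe_skip,
    skip_on_exit_code=100,  # assumption: same semantics as the virtualenv operator
)
```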

Add multiple exit code handling in skip logic for BashOperator (#30739)

Follow-up after #30734
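
A sketch of the list form this follow-up enables, assuming `skip_on_exit_code` now accepts an iterable of codes:

```python
from airflow.operators.bash import BashOperator

maybe_skip = BashOperator(
    task_id="maybe_skip",
    bash_command="exit 101",
    # Assumption per this change: any exit code in the list marks the
    # task as skipped rather than failed.
    skip_on_exit_code=[99, 100, 101],
)
```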

Deprecate `skip_exit_code` in `DockerOperator` and `KubernetesPodOperator` (#30733)

* Deprecate `skip_exit_code` in `DockerOperator` and `KubernetesPodOperator`

* satisfy mypy

Remove protobuf limitation from eager upgrade (#30182)

Protobuf limitation was added to help pip resolve eager upgrade
dependencies, however it is not needed any more.

Fix misc grid/graph view UI bugs (#30752)

add a stop operator to emr serverless (#30720)

* add a stop operator to emr serverless

* update doc
---------

Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>

Better explanation on how to log from tasks (#30746)

* Better explanation on how to log from tasks

After Daniel's explanation this should provide a better description
on how to log from tasks.

Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com>
Co-authored-by: Niko Oliveira <onikolas@amazon.com>

Skip suspended providers when generating providers summary index (#30763)

When the providers' summary index gets generated, it should not
include suspended providers. This was missed in #30422

Fix when OpenLineage plugin has listener disabled. (#30708)

Add parametrized test for disabling OL listener in plugin.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>

Split installation of sdist providers into parallel chunks (#29223)

Sdist provider installation takes a lot of time because pip cannot
parallelise the sdist package building. But we still want to test the
installation of all our providers as sdist packages.

This can be achieved by running N parallel installations with only
subset of providers being installed in each chunk.

This is what we do in this PR.

Speed up package wheel job in CI (#30766)

After recent improvements, the package wheel job has become one
of the longest jobs to run. So far it sequentially built airflow,
prepared documentation for the packages, built the packages,
installed both airflow and the packages and tested imports for them;
then it removed the installed airflow and ran the same tests
with the 2.3 airflow version to check for compatibility.

This change splits it into two parallel jobs. There is a small
duplication (3 minutes of preparing the whl packages), but the
"compatibility" job does not need Airflow and a few other
steps (such as preparing docs or building airflow). Overall
we spend a few extra minutes repeating the wheel package
preparation, but each of the two jobs takes a bit more
than half the time of the original, which greatly improves
feedback time for the users (in most cases the two jobs complete
under 12 minutes, where the original job needed 21 minutes).

Use material icons for dag import error banner (#30771)

* Use material icons for dag import error banner

* fix message caret direction

Update DataprocCreateCluster operator to use 'label' parameter properly (#30741)

Add multiple exit code handling in skip logic for `DockerOperator` and `KubernetesPodOperator` (#30769)

remove delegate_to from GCP operators and hooks (#30748)

Remove @poke_mode_only from EmrStepSensor (#30774)

* Remove @poke_mode_only from EmrStepSensor

* Add EmrStepSensor to system test and documentation

* Fix test

add pod status phase to KPO test mock (#30782)

Export SUSPENDED_PROVIDERS_FOLDERS for breeze testing commands (#30780)

Export the SUSPENDED_PROVIDERS_FOLDERS env var in breeze directly
instead of in Airflow CI workflows. This will fix the issue for users
executing `breeze testing ...` commands locally.

Add openlineage to boring-cyborg.yml (#30772)

Improve url detection (#30779)

Adapt to better resolver of pip (#30758)

We used to have helper limits for the eager upgrade of our packages,
but with 23.1, updated in #30663, pip has a much improved resolver that
does not need that much help and can resolve our dependencies
pretty fast on its own, so we can remove all the dependency limits
that aimed to keep the dependency resolution time down.

Also, we used to have a mechanism to track backtracking issues and
find out which of the new dependencies caused excessive backtracking.
This no longer seems to be needed, so we can remove it from CI and breeze.

Add explanation on why we have two local pre-commit groups (#30795)

Remove skip_exit_code from KubernetesPodOperator (#30788)

Since the parameter was not released we can safely remove it without deprecation.

Better message on deserialization error (#30588)

Previously, a deserialization error threw a pretty mysterious ValueError;
for example, when there was a Python version mismatch, a Python
object serialized in one version of Python produced a "version error"
message. This change turns such a ValueError into a specific
DeserializationError with a better message explaining the possible
reason, without losing the cause.

Co-authored-by: Shahar Epstein <shahar1@live.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
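
The underlying pattern - wrapping the mysterious ValueError in a dedicated exception while keeping the cause chained - looks roughly like this (names are illustrative, not the exact ones in the codebase):

```python
import pickle


class DeserializationError(Exception):
    """Illustrative stand-in for the dedicated deserialization error."""


def deserialize(blob: bytes):
    try:
        return pickle.loads(blob)
    except ValueError as err:
        # `from err` keeps the original cause in the traceback,
        # so no information is lost while the message improves.
        raise DeserializationError(
            "Could not deserialize value; a Python version mismatch "
            "between serializer and deserializer is a common cause"
        ) from err
```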

AWS logs. Exit fast when 3 consecutive responses are returned from AWS Cloudwatch logs (#30756)

* AWS logs. Exit fast when 3 consecutive responses are returned from AWS Cloudwatch logs

---------

Co-authored-by: Niko Oliveira <onikolas@amazon.com>

Add provider for Apache Kafka (#30175)

* Add provider for Apache Kafka

Pulls in a series of integrations to Kafka from airflow-provider-kafka (https://pypi.org/project/airflow-provider-kafka/) to core airflow.

---------

Co-authored-by: Tamara Janina Fingerlin <90063506+TJaniF@users.noreply.github.com>
Co-authored-by: Josh Fell <48934154+josh-fell@users.noreply.github.com>

Remove deprecated code from Amazon provider (#30755)

* Remove deprecated code from Amazon provider

Prepare docs for adhoc release of providers (#30787)

* Prepare docs for adhoc release of providers

Speed up test collection (#30801)

Test collection had the default setting for parallel test types because
the TEST_TYPES variable had not been renamed to PARALLEL_TEST_TYPES.
Also, test collection can be run in the "Wait for CI images" job, which
should save around a minute for setting up Breeze and pulling the
images.

This should speed up the pytest collection step by around a minute and a half.

Removed Mandatory Encryption in Neo4jHook (#30418)

* Removed mandatory encryption in neo4jhook

* Added unit tests and altered existing ones

* Added unit-test and fixed existing ones.

* Changed the implementation of get_client

* Changed test for encrypted param

* fix unit test and check if encrypted arg is provided or not

* fix static checks

* fix unit tests for python 3.7

---------

Co-authored-by: Hussein Awala <hussein@awala.fr>
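
A hedged sketch of how the now-optional encryption setting might be supplied through the connection extra (the `encrypted` key name is an assumption; when it is omitted, the driver default should apply):

```python
from airflow.models.connection import Connection

# Assumption: only when "encrypted" is present in the extra does the hook
# pass it to the Neo4j driver; leaving it out keeps the driver's default.
neo4j_conn = Connection(
    conn_id="neo4j_default",
    conn_type="neo4j",
    host="localhost",
    extra='{"encrypted": false}',
)
```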

restore fallback to empty connection behavior (#30806)

Also remove restored behavior from changelog

fixes to system tests following obsolete cleanup (#30804)

Co-authored-by: Niko Oliveira <onikolas@amazon.com>

Add deferrable mode to `WasbPrefixSensor` (#30252)
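
Deferrable sensors in providers conventionally expose a `deferrable` flag; a sketch under that assumption (container and prefix are illustrative):

```python
from airflow.providers.microsoft.azure.sensors.wasb import WasbPrefixSensor

wait_for_blobs = WasbPrefixSensor(
    task_id="wait_for_blobs",
    container_name="landing",  # illustrative container
    prefix="incoming/",        # illustrative blob prefix
    deferrable=True,  # assumption: flag name follows the usual provider convention
)
```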

Add licence to the __init__.py in google_vendored (#30807)

This is not a problem (an empty __init__.py has 0 creativity, so
the licence can be skipped), but it confuses the RAT tool when
verifying the sources.

add sentry transport configuration option (#30419)

Upgrade to pip 23.1.1 (#30808)

Just released, fresh off-the-press bugfix version of pip.

Clean bigquery operator tests (#30550)

Add deferrable mode to `GCSObjectUpdateSensor` (#30579)

Fix removed delegate_to parameter in deferrable GCS sensor (#30810)

Two PRs crossed, and the result of #30748 caused #30579 to fail,
as the delegate_to parameter had been removed.

Upgrade ruff to 0.0.262 (#30809)

Fix dev index building for suspended providers (#30812)

This is a follow-up after #30422 and #30763 - it turns out that
building the providers index locally failed when some providers
are suspended. It only impacts the local dev workflow.

Add instructions on how to avoid accidental airflow upgrade/downgrade (#30813)

Some of our users raised issues that when extending the image, airflow
suddenly started reporting problems with database versions and migrations
not applied or out-of-sync. This almost always turns out to be a
dependency conflict that leads to an automated downgrade or upgrade of
the installed airflow version. This is - obviously - undesired (you
should be upgrading airflow consciously rather than accidentally).
However, there is no way to prevent this implicitly - `pip` might decide
to upgrade or downgrade airflow as it sees fit. From `pip`'s point of
view, airflow is just one of the packages and has no special meaning.

The only way to "keep" airflow version is to specify it together with
other requirements, pinned to the specific version. This PR updates
our examples to do this and explains why airflow is added there.

There is - of course - the remaining risk that the user will forget to
update the version of airflow when they upgrade. However, since this
is an explicit action performed during image extension, it is much easier
to diagnose and notice. We also warn the users that they should update
the pin when airflow is upgraded.

Make eager upgrade additional dependencies optional (#30811)

In case additional dependencies are installed in the Docker image
customisation path, the eager upgrade additional dependencies became
empty after #30758, which made the installation of extra dependencies
fail.

This PR makes them optional.

Include sequoia.com in INTHEWILD (#30814)

Reenable clear on TaskInstanceModelView for role User (#30415)

* Reenable clear on TaskInstanceModelView for role User

The action was disabled in https://github.com/apache/airflow/pull/20659,
which resolved https://github.com/apache/airflow/issues/20655. The issue
only mentions that the edit action is broken and should be disabled, so
it seems that disabling the clear action was unintentional.

The discussion in
https://github.com/apache/airflow/issues/20655 further reinforces this.
That the author believed it still worked can be explained by the fact
that for a user with the `Admin` role the action was still available,
so one could easily be misled into believing it still worked as expected.

This PR re-enables the action and modifies an existing test case to also
verify that clearing is possible using a user with the role `User`.

* Add back other set state actions

* fix static checks

---------

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

`ExternalTaskSensor`: add `external_task_group_id` to `template_fields` (#30401)

* Add missing info in external_task.py

Add the missing external_task_group_id parameter to the ExternalTaskSensor docstring and template_fields.
As suggested, to match other operator classes, add `(templated)` to the templated fields.
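
With `external_task_group_id` in `template_fields`, a Jinja expression in that argument now renders at runtime; for example:

```python
from airflow.sensors.external_task import ExternalTaskSensor

wait_for_group = ExternalTaskSensor(
    task_id="wait_for_group",
    external_dag_id="upstream_dag",
    # Now templated, so Jinja renders this per run.
    external_task_group_id="group_{{ ds_nodash }}",
)
```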

add missing read for K8S config file from conn in deferred `KubernetesPodOperator`  (#29498)

* restore convert_config_file_to_dict method and deprecate it
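
A sketch of the setup this fix targets - a deferrable KubernetesPodOperator pointed at a Kubernetes connection whose extra carries the kubeconfig location (the connection id and extra key name are assumptions, not taken from the PR):

```python
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

# Assumption: the connection's extra holds something like
# {"kube_config_path": "/path/to/kubeconfig"}; with this fix the deferred
# code path reads it as well, not only the synchronous one.
run_pod = KubernetesPodOperator(
    task_id="run_pod",
    kubernetes_conn_id="k8s_with_config_path",  # hypothetical connection id
    name="demo-pod",
    image="busybox",
    cmds=["sh", "-c", "echo hello"],
    deferrable=True,
)
```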

Update log level in scheduler critical section edge case (#30694)

This log message can be useful if the scheduler ends up needing to query
TIs more than once per scheduler loop, so make it INFO vs DEBUG to
increase discoverability.

Validate `executor` and `config.core.executor` match (#30693)

The chart expects the executor to be set in `executor`; however, if a
user only sets `config.core.executor`, this is difficult to diagnose,
as the chart deploys the wrong rbac resources. This change tries to
catch that situation.

Count mapped upstreams only if all are finished (#30641)

* Fix Pydantic TI handling in XComArg.resolve()

* Count mapped upstreams only if all are finished

An XComArg's get_task_map_length() should only return an integer when
the *entire* task has finished. However, before this patch, it may
attempt to count a mapped upstream even when some (or all!) of its
expanded tis are still unfinished, causing its downstream to be
expanded prematurely.

This patch adds an additional check before we count upstream results to
ensure all the upstreams are actually finished.

* Use SQL IN to find unfinished TI instead

This needs a special workaround for a NULL quirk in SQL.
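
For context, the shape of the affected scenario - a mapped task feeding an aggregate downstream via an XComArg; before the fix, `summarize` could be expanded while some `process` instances were still unfinished:

```python
from airflow.decorators import task


@task
def make_batches():
    return [1, 2, 3]


@task
def process(batch):
    return batch * 2


@task
def summarize(results):
    # get_task_map_length() must only count the upstream once *all*
    # expanded `process` instances have finished.
    return sum(results)


summarize(process.expand(batch=make_batches()))
```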

Optimize performance of scheduling mapped tasks (#30372)

* Optimize performance of scheduling mapped tasks

* Provide max_tis_per_query as a parameter for the schedule_tis method

* Add max_tis_per_query to the JobPydantic class

---------

Co-authored-by: Zhyhimont Dmitry <zhyhimont.d@profitero.com>
Co-authored-by: Zhyhimont Dmitry <dzhigimont@gmail.com>

Update the user-facing documentation of providers (#30816)

We've recently clarified and described our policies for accepting
providers to be maintained by the community (#30657) - this was
directed towards the Airflow developers and contributors. This PR
reviews the user-facing part of the providers documentation,
removing some obsolete or not very useful documentation and pointing
to the new policy where appropriate.

Small refactors in ClusterGenerator of dataproc (#30714)

Rename most pod_id usage to pod_name in KubernetesExecutor (#29147)

We were using pod_id in a lot of places where really it is just the pod
name. I've renamed it where it is easy to do so, so things are easier
to follow.

Deprecate databricks async operator (#30761)

detailed docs (#30729)

fixed some errant strings in the kafka example dags (#30818)

* fixed some errant strings in the kafka example dags

Add repair job functionality to databricks hook (#30786)

* add repair job run functionality

* Add tests
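
A hedged sketch of calling the new functionality (the method name `repair_run` and the payload shape are assumptions based on the Databricks `jobs/runs/repair` API, not verified against the PR):

```python
from airflow.providers.databricks.hooks.databricks import DatabricksHook

hook = DatabricksHook(databricks_conn_id="databricks_default")

# Assumption: mirrors the Jobs API 2.1 `jobs/runs/repair` payload.
hook.repair_run({"run_id": 455644833, "rerun_tasks": ["extract", "load"]})
```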

Use template comments for the chart license header (#30569)

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

allow multiple prefixes in gcs delete/list hooks and operators (#30815)
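
A sketch under the assumption that the existing `prefix` argument now also accepts a list:

```python
from airflow.providers.google.cloud.operators.gcs import GCSDeleteObjectsOperator

cleanup = GCSDeleteObjectsOperator(
    task_id="cleanup",
    bucket_name="my-bucket",
    # Assumption per this change: a list deletes objects under
    # every listed prefix in one task.
    prefix=["staging/", "tmp/"],
)
```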

Update the error message for invalid use of poke-only sensors (#30821)

Fix XCom deserialization when it contains nonprimitive values (#30819)

* Add testcase to show issue with deserialization

* fix XCom deserialization

---------

Co-authored-by: utkarsh sharma <utkarsharma2@gmail.com>

Add Fail Fast feature for DAGs (#29406)

Improve nested_dict serialization test (#30823)

---------

Co-authored-by: bolkedebruin <bolkedebruin@users.noreply.github.com>

Improve Quick Start instructions (#30820)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>

Add retry param in databricks async operator (#30744)

* Add retry param in databricks async operator

* Apply review suggestions

Optimize docs building in CI (#30825)

* Optimize docs building in CI

Docs building is the longest build for regular PRs - it takes 30 minutes
for any PR that touches any of the docs or python files.

This PR optimises it - only the affected packages will be built when
the PR touches only some of the files.

* fixup! Optimize docs building in CI

* fixup! fixup! Optimize docs building in CI

* fixup! fixup! fixup! Optimize docs building in CI

Optimize away pytest collection steps (#30824)

The Pytest collection steps are only needed if there are any tests
about to be run. There are cases where we build CI images but
we do not expect to run any tests (for doc-only …