
Update SDKs for google provider package #30067

Merged
31 commits merged into apache:main on May 17, 2023

Conversation

@lwyszomi
Contributor

lwyszomi commented Mar 13, 2023

As everyone knows, the google provider package has a lot of old dependencies. I would like to start the migration to the latest versions of the SDKs. For now we are blocked by some other dependencies because they still require protobuf<4:

apache-beam
mysql-connector-python
yandexcloud

Also, the google SDKs had a lot of breaking changes, so after updating we need to adjust the broken operators. I investigated how big this problem is, and I'm attaching the list of services where some of the operators are broken:

  • AutoML -> need investigation
  • BigQuery -> need investigation
  • BigTable -> need investigation
  • CloudBuild -> need investigation
  • CloudFunctions -> need investigation
  • CloudMemorystore -> OK
  • CloudSQL -> need investigation
  • Composer -> adjust system tests
  • Compute -> need adjustments in system tests
  • DataLossPrevention -> OK
  • Dataflow -> need investigation
  • Dataform -> OK
  • DataFusion -> OK
  • Dataplex -> need investigation
  • Dataprep -> need investigation
  • Dataproc -> need investigation
  • DataprocMetastore -> need investigation
  • Datastore -> need adjustments in system tests
  • GCS -> need investigation
  • KubernetesEngine -> OK
  • LifeSciences -> OK
  • MLEngine -> need investigation
  • NaturalLanguage -> need investigation
  • PubSub -> OK
  • Spanner -> OK
  • SpeechToText -> need investigation
  • SQLToSheets -> need investigation
  • StackDriver -> need investigation
  • StorageTransfer -> need investigation
  • Tasks -> need investigation
  • TextToSpeech -> need investigation
  • Transfers -> need investigation
  • Translate -> OK
  • TranslateSpeech -> need investigation
  • VertexAI -> need investigation
  • VideoIntelligence -> need investigation
  • Vision -> need investigation
  • Workflows -> OK

Fixes: #27292



@felicienveldema

felicienveldema commented Mar 21, 2023

I was referred here from #27292.
I'm experiencing an issue where I am not able to upgrade the deprecated Google Ads API client (v18).

I eventually get stuck with apache-airflow-providers-google depending on google-cloud-secret-manager < 2.x, which depends on protobuf 3 and causes my predicament. Higher versions depend on protobuf > 4.5.x.
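
For anyone debugging the same conflict, a quick standard-library sketch to list which installed distributions pin protobuf (output depends on your environment):

```python
# List installed distributions that declare a requirement on protobuf,
# to surface pins like google-cloud-secret-manager's protobuf 3
# constraint mentioned above. Standard library only (Python 3.8+).
from importlib.metadata import distributions

for dist in distributions():
    for req in dist.requires or []:
        if req.lower().startswith("protobuf"):
            print(f"{dist.metadata['Name']}: {req}")
```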

Is there progress on this ticket? Just wondering, but keep up the good work!

@lwyszomi
Contributor Author

@felicienveldema We are still working on the changes in the google provider package, but we still have problems with the dependencies of 3 packages because they still depend on protobuf<4.0; updates will probably land in May. So I think we will have a new google provider version supporting the latest SDK versions at the end of May or later.

@r-richmond
Contributor

@potiuk

I got curious about the 3 packages listed above that are causing issues. One of them, mysql-connector-python, has its latest version pinned to protobuf 3.20.3. How do you anticipate this conflict will be resolved? (After reviewing the commit activity and Oracle's ownership, I'm assuming they will be slow to update, if they update at all; apologies if this is a bad assumption.)

@potiuk
Member

potiuk commented Mar 27, 2023

I got curious about the 3 packages listed above that are causing issues. One of them, mysql-connector-python, has its latest version pinned to protobuf 3.20.3. How do you anticipate this conflict will be resolved? (After reviewing the commit activity and Oracle's ownership, I'm assuming they will be slow to update, if they update at all; apologies if this is a bad assumption.)

I have not looked at it yet. Do you have some ideas?

@potiuk
Member

potiuk commented Mar 27, 2023

Generally the options are:

  • replace the library with something else
  • exclude such provider (and stop releasing it) that holds us back
  • vendor-in the library and bump the dependency
  • make the dependency optional and skip tests for it (see the sketch below)
  • work with the maintainers and actively help them to upgrade

So we have a number of options we can follow.
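
To illustrate the "optional dependency" option above, a minimal, hypothetical sketch (not the actual Airflow code): guard the import and let pytest skip the tests when the library is absent.

```python
# Hypothetical sketch of "make the dependency optional and skip tests":
# pytest.importorskip() skips this whole test module when the optional
# library is not installed.
import pytest

connector = pytest.importorskip("mysql.connector")


def test_connector_importable():
    # Runs only when mysql-connector-python is installed.
    assert hasattr(connector, "connect")
```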

@potiuk
Member

potiuk commented Mar 27, 2023

I just opened an issue for yandex-cloud: yandex-cloud/python-sdk#71, and I will prepare support for disabling providers and excluding them if they are holding us back (cc: @eladkal).

I will also raise this to our devlist.

@potiuk
Member

potiuk commented Mar 27, 2023

Devlist discussion started: https://lists.apache.org/thread/j98bgw9jo7xr4fvjh27d6bfoyxr1omcm (CC: @eladkal especially). I am curious what you think.

@potiuk
Member

potiuk commented Mar 27, 2023

FYI: We have no problem with apache-beam: apache/beam#24599 - 2 weeks ago they merged the protobuf bump, so we just need to wait for the next release.

@potiuk
Member

potiuk commented Mar 27, 2023

I also asked Oracle/MySQL in their forums (the only way we can do it): https://forums.mysql.com/read.php?50,708413 - we will see what they say. But I am also all for disabling the mysql provider if they don't respond.

@potiuk
Member

potiuk commented Mar 27, 2023

We have a first reaction: yandex-cloud/python-sdk#71 (comment)

@r-richmond
Contributor

But I am also all for disabling the mysql provider if they don't respond.

I'm also 100% for this (but I've got some bias in that I always use Postgres over MySQL). I hesitated to suggest it since I was unsure whether that would impact offering MySQL as one of the Airflow meta-DB backends.

@potiuk
Member

potiuk commented Mar 27, 2023

I'm also 100% for this (but I've got some bias in that I always use Postgres over MySQL). I hesitated to suggest it since I was unsure whether that would impact offering MySQL as one of the Airflow meta-DB backends.

I have to check, but I think this actually has nothing to do with the MySQL metadata backend. For that we are using SQLAlchemy, and it has a few drivers it can choose from. And I think our driver for CI/tests is mysqlclient, not mysql-connector-python.

BTW, this is another possibility: rewrite the hooks to use mysqlclient. I might actually take a look at that.

@cgadam

cgadam commented Mar 29, 2023

Hi, is it too risky for Airflow to just update from google-ads v18.0.0 to v18.2.0? See: #30353

Today v11 of the Google Ads API is sunsetting: https://developers.google.com/google-ads/api/docs/sunset-dates, which means that the current latest version of Airflow won't be officially compatible (due to its constraint file: https://raw.githubusercontent.com/apache/airflow/constraints-2.5.2/constraints-3.7.txt) with any google-ads package that can actually interact with the Google Ads API. (API calls will start failing.)

The latest compatibility with a new API version was added in https://github.com/googleads/google-ads-python/pull/672/files#diff-91c5b46dc84a94604a4e4d0caed9bf85590a2eddbb12d2e8dc80badf324a9dfbR9 (v17.0.0), and it added support for v11 of the API.

v18.2.0 actually added support for v12 of the API. See here.

@cgadam

cgadam commented Mar 30, 2023

Hi, is it too risky for Airflow to just update from google-ads v18.0.0 to v18.2.0? See: #30353

Today v11 of the Google Ads API is sunsetting: https://developers.google.com/google-ads/api/docs/sunset-dates, which means that the current latest version of Airflow won't be officially compatible (due to its constraint file: https://raw.githubusercontent.com/apache/airflow/constraints-2.5.2/constraints-3.7.txt) with any google-ads package that can actually interact with the Google Ads API. (API calls will start failing.)

The latest compatibility with a new API version was added in https://github.com/googleads/google-ads-python/pull/672/files#diff-91c5b46dc84a94604a4e4d0caed9bf85590a2eddbb12d2e8dc80badf324a9dfbR9 (v17.0.0), and it added support for v11 of the API.

v18.2.0 actually added support for v12 of the API. See here.

We're in the dark of night now; sunset has passed 😅 We're now getting the error: "Version v11 is deprecated. Requests to this version will be blocked."
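
A quick local check of what you are running (a sketch; per the links above, 18.2.0 is the first google-ads release that supports API v12, so anything pinned below that tops out at the now-blocked v11):

```python
# Print the installed google-ads client version; compare it against
# 18.2.0, the first release supporting Google Ads API v12.
from importlib.metadata import version

print("google-ads client:", version("google-ads"))
```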

@moiseenkov
Contributor

moiseenkov commented Mar 31, 2023

Hi everyone,
Speaking about disabling mysql-connector-python, I found that the current MySqlHook implementation allows users to choose which library to use in an Airflow connection: mysql-connector-python or mysqlclient (the default). What is the reason for this?

I'm wondering because, after removing mysql-connector-python, this feature will no longer be needed and could be removed as well. However, new libraries might appear in the future and we would probably need it back, so it might be nice to keep it for future use even if there is only one option available. WDYT @potiuk, should we keep it?
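
For context, the driver-selection feature being discussed looks roughly like the sketch below; the key names and defaults here are illustrative, and MySqlHook.get_conn holds the real logic.

```python
# Rough sketch of a hook choosing between two MySQL client libraries
# based on the Airflow connection's extras. Illustrative only.
def get_mysql_conn(connection):
    client = (connection.extra_dejson or {}).get("client", "mysqlclient")
    if client == "mysqlclient":
        import MySQLdb  # provided by the mysqlclient package

        return MySQLdb.connect(
            host=connection.host, user=connection.login,
            passwd=connection.password or "",
        )
    if client == "mysql-connector-python":
        import mysql.connector

        return mysql.connector.connect(
            host=connection.host, user=connection.login,
            password=connection.password or "",
        )
    raise ValueError(f"Unknown MySQL client library: {client!r}")
```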

@potiuk
Member

potiuk commented Mar 31, 2023

I'm wondering because, after removing mysql-connector-python, this feature will no longer be needed and could be removed as well. However, new libraries might appear in the future and we would probably need it back, so it might be nice to keep it for future use even if there is only one option available. WDYT @potiuk, should we keep it?

I have not thought about it yet. I am waiting for a response from Oracle (if it comes) for a week - according to our new policy that is "lazy consensus" - and then I will take a closer look. There is also an option to turn mysql-connector-python into an ACTUALLY optional feature (which I think is the best option) - i.e., make it an extra (we already have a few of those). In that case we should keep it.
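
A minimal sketch of what "make it an extra" could look like in packaging metadata; the package name, extra name and version pins here are hypothetical, not the provider's actual metadata.

```python
# setup.py sketch: the default driver stays in install_requires while
# mysql-connector-python moves to an opt-in extra.
from setuptools import setup

setup(
    name="example-mysql-provider",  # hypothetical name
    version="0.1",
    install_requires=["mysqlclient>=1.4"],
    extras_require={
        "mysql-connector-python": ["mysql-connector-python>=8.0.29"],
    },
)
```

Users would then opt in explicitly, e.g. pip install "example-mysql-provider[mysql-connector-python]".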

@potiuk
Member

potiuk commented Mar 31, 2023

@cgadam: It is likely we will have a proposal for how to solve this soon - would you be willing to test it if I give you access to a beta/pre-release of the google provider with the workaround implemented (with the intention of making it into the next release)?

Beata Kossakowska and others added 11 commits May 17, 2023 20:57
Changes:
- update train model that is used for prediction
- update version and runner for ApacheBeam in utils for MLEngine
- update connection inside async hook
Changes:
- fix tests/system/providers/google/cloud/dataprep/example_dataprep.py
- Secret Manager was missing the update to v2; it now expects a request dict
- Compute ssh had a bug when no cmd_timeout was passed
- Cloud Build tests were improved/refactored in community, so deleting
old ones
- googleapiclient.errors.HttpError was incorrectly used in our tests; it
didn't matter before, but a change in the class makes HttpError()
raise an error in initialization the way we were using it before
- fix static checks

```
$ pytest tests/providers/google/cloud/
...
===== 2763 passed, 71 skipped, 21 warnings in 193.46s (0:03:13) =====
```
@potiuk
Member

potiuk commented May 17, 2023

👀 👀 👀 👀

@kristopherkane
Contributor

I wanted to say thanks for all this work and I've been tracking it from a distance. I'm looking forward to the updated Dataproc libs for further enhancements to the Dataproc serverless operator.

@potiuk
Member

potiuk commented May 17, 2023

I wanted to say thanks for all this work and I've been tracking it from a distance. I'm looking forward to the updated Dataproc libs for further enhancements to the Dataproc serverless operator.

Thanks in the name of all the people who worked on this (I was really just helping) - it's rare to get unsolicited positive feedback and a thank-you note. So rare :).

@potiuk
Member

potiuk commented May 17, 2023

Those are only intermittent errors (I need to make them more stable). Merging.

potiuk merged commit 28d1bf8 into apache:main on May 17, 2023
58 of 61 checks passed
@potiuk
Member

potiuk commented May 17, 2023

🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉
🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉
🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉

@potiuk
Member

potiuk commented May 20, 2023

CC: @ephraimbuddy -> I just realized we will need this - I marked this one also for 2.6.2. While the "code" changes aren't used in the 2.6.2 release, the "dependency" part (provider.yaml and generated/provider_dependencies.json) will be needed to properly build CI once we release the new google provider with all its deps.

@eladkal
Contributor

eladkal commented May 20, 2023

CC: @ephraimbuddy -> I just realized we will need this - I marked this one also for 2.6.2. While the "code" changes aren't used in the 2.6.2 release, the "dependency" part (provider.yaml and generated/provider_dependencies.json) will be needed to properly build CI once we release the new google provider with all its deps.

I'm hoping that we will release 2.6.2 as followup right after provider wave is released.

eladkal added the changelog:skip label (changes that should be skipped from the changelog: CI, tests, etc.) on Jun 9, 2023
potiuk pushed a commit that referenced this pull request Jun 9, 2023
* Update SDK versions for Google provider

* Adjust google ads operators to v12

Changes:
- fix tests/system/providers/google/cloud/bigquery/example_bigquery_queries.py
- fix tests/system/providers/google/cloud/bigquery/example_bigquery_queries_async.py

* Fix GCS system tests

* Fix CloudBuild unit test

* Update BigTable operators to accommodate new dependencies.

* Fix Cloud Tasks System tests

Tasks dag was quite flaky without the retry option in the run_task step,
but it's consistently green with the option set.

We also add a GCP_APP_ENGINE_LOCATION env variable, since this depends on
the GCP project's App Engine location.

* Add setup docstring to Tasks system tests.

* Update Vision operators to accommodate new dependencies.

Changes:
- fix methods for CloudVisionHook
- fix Vision Operators
- fix tests/providers/google/cloud/hooks/test_vision.py
- fix tests/providers/google/cloud/operators/test_vision.py
- fix tests/system/providers/google/cloud/vision/example_vision_annotate_image.py
- fix tests/system/providers/google/cloud/vision/example_vision_autogenerated.py
- fix tests/system/providers/google/cloud/vision/example_vision_explicit.py

* Update SpeechToText operators to accommodate new dependencies.

Changes:
- fix synthesize_speech method for CloudTextToSpeechHook
- fix CloudSpeechToTextRecognizeSpeechOperator
- fix tests/providers/google/cloud/operators/test_speech_to_text.py
- fix tests/providers/google/cloud/hooks/test_text_to_speech.py
- fix tests/providers/google/cloud/hooks/test_speech_to_text.py

* Update Translate Speech operators to accommodate new dependencies.

Changes:
- fix synthesize_speech method for CloudTextToSpeechHook
- fix CloudTranslateSpeechOperator
- tests/providers/google/cloud/operators/test_translate_speech.py

* Update VideoIntelligence operators to accommodate new dependencies.

Changes:
- fix annotate_video method for CloudVideoIntelligenceHook
- fix VideoIntelligence Operators
- fix tests/providers/google/cloud/hooks/test_video_intelligence.py
- fix tests/providers/google/cloud/operators/test_video_intelligence.py

* Update Compute Engine operators to accommodate new dependencies.

Changes:
- added wait_for_operation_complete() method to check the execution flow
- added new attribute cmd_timeout for ComputeEngineSSHHook

* Fix Stackdriver system test

This test has not worked because the Slack channel and credentials were
not set up. We now test the same operators by creating notification
channels and alert policies against pubsub topics, which don't need to
exist before the test is run, making the test self-contained.

* Update Natural Language operators to accommodate new dependencies.

Changes:
- fix airflow/providers/google/cloud/operators/natural_language.py
- fix airflow/providers/google/cloud/hooks/natural_language.py
- fix tests/providers/google/cloud/hooks/test_natural_language.py
- fix tests/providers/google/cloud/operators/test_natural_language.py
- fix tests/system/providers/google/cloud/natural_language/example_natural_language.py

* Update Composer system tests.

Fix environment id to contain underscores.

* Update AutoML operators to accommodate new dependencies.

Changes:
- add timeout parameter to all long-running operations for operators
- fix tests/system/providers/google/cloud/automl/example_automl_dataset.py
- fix tests/system/providers/google/cloud/automl/example_automl_model.py
- fix tests/system/providers/google/cloud/automl/example_automl_nl_text_extraction.py
- fix tests/system/providers/google/cloud/automl/example_automl_vision_classification.py

* Fix Cloud SQL delete operator

For some delete instance operations, the operation stops being available ~9 seconds after completion, so we need a shorter sleep time to make sure we don't miss the DONE status.

* Update VertexAI operators to accommodate new dependencies.

* Add SQL to Sheets Test instructions

* Update Dataproc Metastore operators to accommodate new dependencies.

* Update Dataproc operators to accommodate new dependencies.

* Update Dataflow sys tests to new sdk

* Update Dataproc on gke operators to accommodate new dependencies.

* Update MLEngine operators to accommodate new dependencies.

Changes:
- update train model that is used for prediction
- update version and runner for ApacheBeam in utils for MLEngine
- update connection inside async hook

* Update Dataprep operators to accommodate new dependencies.

Changes:
- fix tests/system/providers/google/cloud/dataprep/example_dataprep.py

* Add Dataflow Go system test

* Update providers.yaml for google

* fixup! Update providers.yaml for google

* Google SDK Fixes after rebase

- Secret Manager was missing the update to v2; it now expects a request dict
- Compute ssh had a bug when no cmd_timeout was passed
- Cloud Build tests were improved/refactored in community, so deleting
old ones
- googleapiclient.errors.HttpError was incorrectly used in our tests; it
didn't matter before, but a change in the class makes HttpError()
raise an error in initialization the way we were using it before
- fix static checks
* Fix Google providers type errors

---------

Co-authored-by: Lukasz Wyszomirski <wyszomirski@google.com>
Co-authored-by: Maksim Moiseenkov <maksim_moiseenkov@epam.com>
Co-authored-by: Eugene Kostieiev <kosteev@google.com>
Co-authored-by: Augusto Hidalgo <augustoh@google.com>
Co-authored-by: Beata Kossakowska <bkossakowska@google.com>
Co-authored-by: Ulada Zakharava <uladaz@google.com>
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
(cherry picked from commit 28d1bf8)
howardyoo pushed a commit to howardyoo/airflow that referenced this pull request Mar 31, 2024
This is safe as Jinja template will not validate with s3 rules

Small quotation fix (#30448)

Co-authored-by: bugraozturk <bugra.ozturk@mollie.com>

Merge DbtCloudJobRunAsyncSensor logic to DbtCloudJobRunSensor (#30227)

* feat(providers/dbt): move the async execution logic from DbtCloudJobRunAsyncSensor to DbtCloudJobRunSensor

* test(providers/dbt): add test cases for DbtCloudJobRunSensor when its deferrable attribute is set to True

* docs(providers/dbt): update the doc for DbtCloudJobRunSensor deferrable mode and DbtCloudJobRunAsyncSensor deprecation

* refactor(providers/dbt): deprecate poll_interval argument

* docs(providers/dbt): add deprecation note as DbtCloudJobRunAsyncSensor docstring

* fix(providers/dbt): check whether timeout is in kwargs

* docs(providers/dbt): add missing deferrable=True in howto_operator_dbt_cloud_run_job_sensor_defered

Collect test types in CI in parallel (#30450)

One of the steps in our CI is to collect tests, which prevents
parallel tests from even starting when we know test collection will
fail (it is a terrible waste of resources to start 20 test jobs and
initialize databases etc. when we know it is not needed).

This however introduced a single point of delay in the CI process,
which, with the recent collection protection implemented in #30315,
occasionally piled up to more than 5 minutes on our CI machines,
especially on public runners.

This PR utilises our existing test framework to parallelise
test collection (Pytest does not have a parallel collection
mechanism) - also, for localised PRs it will only run test collection
for the test types that are going to be executed, which will speed it
up quite a lot.

This might get further sped up if we split Provider tests into
smaller groups to parallelise them even more.

remove stray parenthesis in spark provider docs (#30454)

Rename --test-types to --parallel-test-types parameters (#30424)

The --test-type and --test-types parameters were very similar, but
they have enough differences to differentiate them even more:

The --test-type parameter is specifically for running a single test type,
and it might include not only the regular "test-types" but also allow for
cross-selection of tests from different other types (for example,
--test-type Postgres will run tests for the Postgres database, and
they might come from Providers, Others, CLI, etc.).

Whereas --test-types was generally foreseen as a way to split
the tests into "separated" groups that could be run in
parallel.

The parameters have different defaults and even a different choice
of test types that you can choose from (and --test-types is a
space-separated one to make it easier to pass around in CI,
where rather than passing a variable number of parameters,
it's easier to pass a single, space-separated list of tests
to run).

This change is good to show the difference between the parameters
and to stress that they are really quite different; it also makes
it easier to avoid the confusion people might have, especially since
the old name was easy to typo.

In a way (but differently than in the original issue) it
Fixes: #30407

Fix cloud build async credentials (#30441)

Fix bad merge conflict on test-name-parameter-change (#30456)

We've added a new reference to test-types in #30450 and it clashed
with the parameter rename in #30424. This resulted in a bad merge
(not too dangerous, just causing a missing optimisation in collection
elapsed time in case only a subset of test types were to be executed).

Add description of the provider suspension process (#30359)

Following discussion on the devlist, we are adding a description
of the suspension process for providers that hold us back from
upgrading old dependencies. Discussion here:

https://lists.apache.org/thread/j98bgw9jo7xr4fvjh27d6bfoyxr1omcm

Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com>

fix: upgrade moment-timezone package to fix Tehran tz (#30455)

Discovery safe mode toggle comment clarification (#30459)

Fix Breeze failing with error on Windows (#30464)

When Breeze is run on Windows it fails with FileNotFoundException
when running uname during the emulation check. This is now fixed alongside
fixing the TimeoutError misplacement - after moving it to a local import,
an exception triggered before importing it caused an UnboundLocalError.

Related: https://github.com/apache/airflow/pull/30405#issuecomment-1496414377

Fixes: #30465

Update MANIFEST_TEMPLATE.in.jinja2 (#30431)

* Update MANIFEST_TEMPLATE.in.jinja2

* remove google

* remove README.md

Add mechanism to suspend providers (#30422)

As agreed in https://lists.apache.org/thread/g8b3k028qhzgw6c3yz4jvmlc67kcr9hj
we introduce a mechanism to suspend providers from our suite of providers
when they are holding us back on older versions of dependencies.

A provider's suspension is controlled from a single `suspend` flag in
`provider.yaml` - this flag is used to generate
providers_dependencies.json in generated folders (a provider is skipped
if it has the `suspended` flag set to `true`). This is enough to exclude
the provider from the extras of airflow and (automatically) from being
used when the CI image is built and constraints are being generated,
as well as from provider documentation generation.

Also, several parts of the CI build use the flag to filter out
such suspended providers:

* verification of provider.yaml files in pre-commit is skipped
  in terms of importing and checking if classes are defined and
  listed in the provider.yaml
* the "tests" folders for providers are skipped automatically
  if the provider has "suspend" = true set
* in case a PR aims to modify a suspended provider's directory
  tree (when it is not a global provider refactor), selective checks
  will detect it and fail the PR with an appropriate message suggesting
  to fix the reason for the suspension first
* documentation build is skipped for suspended providers
* mypy static checks will skip suspended provider folders, while we
  will still run ruff checks on them (unlike mypy, ruff does not
  expect the libraries it imports to be available, and we are
  running ruff in a separate environment where no airflow dependencies
  are installed anyway)

Add docs for livy deferrable operator (#30397)

* Add docs for livy deferrable

* Add docs for livy deferrable

* Apply review suggestions

* Fix example DAG

add clarification about timezone aware dags (#30467)

* add clarification about timezone aware dags

Fix typo on index.rst file (#30481)

A duplicate word has been removed.

add template field for s3 bucket (#30472)

Fix typo in outputs in parallel-test-types (#30482)

The typo causes unnecessary delays in building regular PRs :(
It was introduced in #30424

Support serialization to Pydantic models in Internal API (#30282)

* Support serialization to Pydantic models in Internal API.

* Added BaseJobPydantic support and more tests

Add a new parameter for base sensor to catch the exceptions in poke method (#30293)

* add a new parameter for base sensor to catch the exception in poke method

* add unit test for soft_fail parameter

Allow to set limits for XCOM container (#28125)

Add AWS deferrable BatchOperator (#29300)

This PR donates the deferrable BatchOperator developed in the [astronomer-providers](https://github.com/astronomer/astronomer-providers) repo to apache airflow.

Update dead link in Sentry integration document (#30486)

* Update dead link in Sentry integration document

* fix

Add more info to quicksight error messages (#30466)

Revert "Add AWS deferrable BatchOperator (#29300)" (#30489)

This reverts commit 77c272e6e8ecda0ce48917064e58ba14f6a15844.

Fix output to outputs typos in ci.yaml everywhere (#30490)

(Facepalm) The output -> outputs typo from #30482 was also in the
ci.yaml where it was used, and it was missed in that PR.

I can blame GitHub Actions' stupid choice of accepting typoed
names of outputs and replacing them with blank strings (which I
raised as an issue a long time ago)

Reformat chart templates part 3 (#30312)

Move Pydantic classes for ORM objects to serialization (#30484)

The Pydantic classes are really part of the serialization
mechanism and they should be moved there, rather than kept in
the core packages they serialize, following our serialization
approach.

Separate mypy pre-commit checks (#30502)

Previously all mypy pre-commit checks were run as one "run-mypy"
check, but that did not allow running them separately when trying
to fix some of them, only for a specific part of the sources.

This PR splits them into "dev", "core", "providers" and "docs".

Avoid logging sensitive information in triggerer job log (#30110)

* Change trigger name to task id instead of repr(trigger) to avoid logging sensitive information

Allow specifying a `max_depth` to the `redact()` call (#30505)

The default was hard-coded as 5, which is suitable for log
redacting, but the OpenLineage PR would like to be able to use a deeper
depth.

Fix deprecation warning in `example_sensor_decorator` DAG (#30513)

Put AIP-44 internal API behind feature flag (#30510)

This includes:

* configurable setting with defaults taken from env variable
* raising exception if config variables are used with feature
  flag not enabled
* hiding config values (adding mechanism to hide config values
  that are set for the future versions)
* skipping tests

Summarize skipped tests after tests are run (#30520)

When Pytest runs tests it provides a summary of the tests. We are
running a lot of tests, so we are really only interested in the cases
that are "interesting". So far we were not showing "skipped" tests
in the summary, because there were cases where a lot of tests
were skipped (mostly when integration tests were run - we collected
tests from the "tests" folder and ran only those tests that were not
skipped by the @integration mark).

This however changed in #28170, as we moved all integration
tests to the "integration" subfolder, and now instead of a large number of
skipped tests we run them selectively for each integration.

This should help in verifying that the skipped tests were skipped
for a good reason (and that we actually see which tests have been
skipped).

Add more type hints to the code base (#30503)

* Fully type Pool

Also fix a bug where create_or_update_pool silently fails when an empty
name is given. An error is raised instead now.

* Add types to 'airflow dags'

* Add types to 'airflow task' and 'airflow job'

* Improve KubernetesExecutor typing

* Add types to BackfillJob

This triggers an existing typing bug that pickle_id is incorrectly typed
as str in executors, while it should be int in practice. This is fixed
to keep things straight.

* Add types to job classes

* Fix missing DagModel case in SchedulerJob

* Add types to DagCode

* Add more types to DagRun

* Add types to serialized DAG model

* Add more types to TaskInstance and TaskReschedule

* Add types to Trigger

* Add types to MetastoreBackend

* Add types to external task sensor

* Add types to AirflowSecurityManager

This uncovers a couple of incorrect type hints in the base
SecurityManager (in fab_security), which are also fixed.

* Add types to views

This slightly improves how view functions are typechecked and should
prevent some trivial bugs.

Type related import optimization for Executors (#30361)

Move some expensive typing related imports to be under TYPE_CHECKING

Fix link to pre-commit-hook section (#30522)

* Change static link

* Update LOCAL_VIRTUALENV.rst

Do not use template literals to construct html elements (#30447)

Enable AIP-44 and AIP-52 by default for development and CI on main (#30521)

* Enable AIP-44 and AIP-52 by default for development and CI on main

AIP-44 and AIP-52 are now controlled by environment variables;
however, those variables were not passed by default into the
docker-compose environment, so they had no effect when set in
ci.yaml. This PR fixes that, and it also sets the variables to
be enabled by default in the Breeze environment and when the tests
are run locally on main using a local venv, so that contributors
are not surprised when they try to reproduce local failures.

In 2.6 branch, we will set both variables to "false" by default
in ci.yml, so that the tests are not run when we cherry-pick the changes.

* Update scripts/ci/docker-compose/devcontainer.env

Run "api_internal" tests in CI (#30518)

* Run "api_internal" tests in CI

While adding a feature flag for AIP-44 I noticed that, due to the
weird naming we have in tests, the "api_internal" tests were
actually excluded from running - this was due to a combination of
factors:

* When API tests are run, only "api" and "api_connexion" were
  added to the API test type
* We already have an "api" folder in "tests" (for the experimental api)
* Finding "Other" tests should cover it, but it excluded "api" tests;
  the way it is implemented, it took the "api" prefix and excluded
  all the test directories starting with "api" (including the
  "api_internal/*" ones)

This change addresses it twofold:

* The "api_internal" tests are added explicitly to the "API"
  test type
* The "tests/api" folder with tests for the experimental API
  has been renamed to "api_experimental" (including integration
  tests)

This should set the "internal_api" tests to run in the API test
type, and renaming "api" to "api_experimental" should avoid
accidentally skipping the tests in case someone adds
"tests/api_SOMETHING" in the future.

* Update Dockerfile.ci

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

* Update entrypoint_ci.sh

---------

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

`EmailOperator`: fix wrong assignment of `from_email` (#30524)

* `EmailOperator`: fix wrong assignment of `from_email`

Add asgiref as a core dependency (#30527)

We added asgiref to core a few months back for the `sync_to_async` in
airflow.triggers.external_task.

Although the core http provider has depended on asgiref since v4.2, it is
possible to have an older version of the http provider installed, meaning
that you end up without asgiref, which leads to every dag failing to parse
as the "dependency detector" code inside the DAG Serializer ends up
importing this module!

improve first PR bot comment (#30529)

Put AIP-52 setup/teardown tasks behind feature flag (#30509)

We aren't going to land AIP-52 in time for 2.6, so put the authoring api
behind a feature flag. I've chosen to put it in `airflow.settings` so
users can set it in `airflow_local_settings`, or set it via env var.

Add tests to PythonOperator (#30362)

* Add tests to airflow/operators/python.py

* Convert log error of _BasePythonVirtualenvOperator._read_result() into a custom exception class

* Improve deserialization error handling

---------

Co-authored-by: Shahar Epstein <shahar1@live.com>
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Make AWS RDS operator page title consistent (#30536)

This PR changes the title of the RDS documentation page from "Amazon Relational Database Service Documentation (RDS)" to "Amazon Relational Database Service (RDS)". This page was the only one with the word "Documentation" in its title, and several other services had a similar title format of ("Amazon <full service name> (<acronym>)"), for example ["Amazon Simple Notification Service (SNS)"](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/operators/sns.html).

Revert "Add tests to PythonOperator (#30362)" (#30540)

This reverts commit b4f3efd36a0566ef9d34baf071d935c0655a02ef.

Add new known warnings after dependencies upgrade. (#30539)

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Fix AzureDataFactoryPipelineRunLink get_link method (#30514)

Use default connection id for KubernetesPodOperator (#28848)

* Use default connection id for KubernetesPodOperator

---------

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Fix dynamic imports in google ads vendored in library (#30544)

This is a further fix to vendor-in the google-ads library:

* moving from _vendor to google_vendor is needed to keep
  the tree structure separate from the `google` package used
  in the google provider. If we do not do that, the ads library
  will find the "google" package when walking up the tree
  and will determine that it is the "top" google package

* dynamic imports of the ads library have been updated
  to also import from the vendored-in library

* only the v12 version is supported

Closes: #30526

BigQueryHook list_rows/get_datasets_list can return iterator (#30543)

Add deferrable mode to GKEStartPodOperator (#29266)

* Add deferrable mode to GKEStartPodOperator

* Change naming for GKEHook and add comments

* Rebase main, revert unrelated changes

* Add review suggestions + rebase

* Add deprecation warning for deleted method + rebase

Accept None for `EmailOperator.from_email` to load it from smtp connection (#30533)

* Add default None for  to load it from smtp connection

Update DV360 operators to use API v2 (#30326)

* Update DV360 operators to use API v2

* Update display_video.rst

* fixup! Update display_video.rst

* fixup! Update display_video.rst

---------

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>

Prepare docs for ad hoc release of Providers (#30545)

* Prepare docs for ad hoc release of Providers

* add smtp provider

* add google

Add --one-pass-only parameter to breeze docs build command (#30555)

The parameter was previously supported in the docs-build script
but it was not exposed via breeze commands.

It allows faster iteration on docs building: by default docs building
runs up to 3 passes in order to account for new cross-references
between multiple providers; this flag makes it one pass, which makes
it faster to summarize the errors when you try to nail down a problem
with the docs.

Load subscription_id from extra__azure__subscriptionId (#30556)

Use the engine provided in the session (#29804)

Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>

fix release doc for providers (#30559)

Change timer type back to float (#30532)

Decouple "job runner" from BaseJob ORM model (#30255)

* Decouple "job runner" from BaseJob ORM model

Originally the BaseJob ORM model was extended and polymorphism was
used to tie different execution logic to different job types. This
has proven difficult to handle during the AIP-44 implementation
(internal API) because LocalTaskJob, DagProcessorJob and TriggererJob
are all going to stop using the ORM BaseJob model and should use
BaseJobPydantic instead. In order to make this possible, we introduce
a new type of object, BaseJobRunner, and make BaseJob use the runners
instead.

This way, the BaseJobRunners are used for the logic of each of the
jobs, while a single, non-polymorphic BaseJob is used to keep the
records in the database - as a follow-up this will allow completely
decoupling the job database operations and moving them to the
internal_api component when db-less mode is enabled.

Closes: #30294

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Fix one more dynamic import needed for vendored-in google ads (#30564)

Continuation of #30544

Move Pydantic class serialization under AIP-44 feature flag (#30560)

The Pydantic representation of the ORM models is only used
in the in-progress AIP-44 feature, and we are moving to a new
serialization implementation (more modular) in the near future,
so in order not to unnecessarily extend features in the old
serialization, but still allow testing AIP-44, we are moving the
use_pydantic_models parameter and its implementation under the
_ENABLE_AIP_44 feature flag, so that it is not used accidentally.

We will eventually remove it and add Pydantic serialization to
the new serialization implementation.

Add podAnnotations to PgBouncer (#30168)

Added support for using SHA digest of Docker images (#30214)

Bump json5 to 1.0.2 and eslint-plugin-import to 2.27.5 in /airflow/www (#30568)

Bumping json5 from 1.0.1 and eslint-plugin-import from 2.26.0

Update dataproc.rst (#30566)

Making the statement more contextual; the change proposed here is "a provide" -> "to provide".

Quieter output during asset compilation (#30565)

The "Still waiting ....." message was emitted every second, which can be
quite noisy even on moderate machines. This reduces the message to once
every 5 seconds.

Rename JobRunner modules to *_job_runner and base_job* to job (#30302)

#30255 introduced the "JobRunner" concept and decoupled the job logic
from the ORM polymorphic *Job objects. The change was implemented
in a way that minimised the review effort needed, so it avoided renaming
the modules for the runners (from `_job` to `_job_runner`).

Also, BaseJob lost its "polymorphism" properties, so the package and
class name can be renamed to simply `job`.

This PR completes the JobRunner concept introduction by applying the
renames.

Closes: #30296

Speed up dag runs deletion (#30330)

* Provide custom deletion for dag runs to speed up when a dag run has a lot of related task instances
---------

Co-authored-by: Zhyhimont Dmitry <zhyhimont.d@profitero.com>
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Adding taskflow API example for sensors (#30344)

Use connection URI in SqliteHook (#28721)

* Use connection URI in SqliteHook

This allows the user to define more sqlite args such as mode. See https://docs.sqlalchemy.org/en/14/dialects/sqlite.html#uri-connections for details.
- remove unsupported schema, login and password fields in docs
- add info about host field to docs
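
For illustration, the URI form this enables (following the SQLAlchemy docs linked above), e.g. a read-only sqlite connection:

```python
# sqlite URI connection with extra args (read-only mode); the path is
# illustrative.
from sqlalchemy import create_engine

engine = create_engine("sqlite:///file:/tmp/example.db?mode=ro&uri=true")
```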

Release notes for helm chart 1.9.0 (#30570)

Do not remove docker provider for Airflow 2.3 check (#30483)

This removal is a remnant of old docker provider for 2.2 and should
not be happening.

Separate and split run job method into prepare/execute/complete steps (#30308)

* Separate and split run job method into prepare/execute/complete steps

As a follow-up to decoupling the job logic from the BaseJob
ORM object (#30255), the `run` method of BaseJob should also be
decoupled from it (allowing BaseJobPydantic to be passed) as well
as split into three steps, in order to allow db-less mode.

The "prepare" and "complete" steps of the `run` method modify the
BaseJob ORM-mapped object, so they should be called over the
internal API from LocalTask, DagFileProcessor and Triggerer running
in db-less mode. The "execute" step, however, does not need the
database and should be run locally.

This is not yet the full AIP-44 conversion; it is a prerequisite for it,
and the AIP-44 conversion will be done as a follow-up after this one.

However, we added a mermaid diagram showing the job lifecycle with and
without the Internal API to make it easier to reason about.

Closes: #30295

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

Update SQLAlchemy `select()` to new style (#30515)

SQLAlchemy has a new style for `select()` that is standard for 2.0. This
updates our uses of it to avoid `RemovedIn20Warning` warnings.

https://docs.sqlalchemy.org/en/20/errors.html#select-construct-created-in-legacy-mode-keyword-arguments-etc
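
For reference, the style difference in a generic SQLAlchemy example (not code from this commit):

```python
# Old 1.x style passed a list and emits RemovedIn20Warning:
#   stmt = select([User.id]).where(User.name == "x")
# New 2.0 style takes entities positionally:
from sqlalchemy import Column, Integer, String, select
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)


stmt = select(User.id).where(User.name == "x")
print(stmt)
```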

Remove JobRunners back reference from Job (#30376)

This is the final step of decoupling the job runner from the ORM-based
BaseJob. After this change we finally reach the state where BaseJob is
just the state of the job being run, while all the logic is kept in a
separate "JobRunner" entity which just keeps a reference to the job.
It also makes sure that the job in each runner is defined as
appropriate for each job type:

* SchedulerJobRunner and BackfillJobRunner can only use BaseJob
* DagProcessorJobRunner, TriggererJobRunner and especially the
  LocalTaskJobRunner can keep both BaseJob and its Pydantic
  BaseJobPydantic representation - for AIP-44 usage.

The highlights of this change:

* Job does not have a job_runner reference any more
* Job is a mandatory parameter when creating each JobRunner
* the run_job method takes as parameters the job (i.e. where the state
  of the job is kept) and executor_callable - i.e. the method
  to run when the job gets executed
* the heartbeat callback is also passed a generic callable in order
  to execute the post-heartbeat operation of each job
  type
* there is no more need to specify job_type when you create
  BaseJob; the job gets its type by simply creating a runner
  with the job

This is the final stage of refactoring that was split into
reviewable stages: #30255 -> #30302 -> #30308 -> this PR.

Closes: #30325

Cast binding +1 in helm chart release vote email (#30590)

We will assume that the release manager for the helm chart wants to cast
a binding +1 vote :)

Databricks SQL sensor (#30477)

* Renamed example DAG

Add Hussein to committers (#30589)

Add support in AWS Batch Operator for multinode jobs (#29522)

Picking up #28321 after it was somewhat abandoned by the original author.
I addressed my own comment about the empty array, and it should be good to go, I think.

Initial description from @camilleanne:

Adds support for AWS Batch multinode jobs by allowing a node_overrides json object to be passed through to the boto3 submit_job method.

Adds support for multinode jobs by properly parsing the output of describe_jobs (which is different for container vs multinode) to extract the log stream name.
closes: #25522

Fix CONTRIBUTORS_QUICK_START Doc (#30549)

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Use custom validator for OpenAPI request body (#30596)

* Use custom validator for OpenAPI request body

The default error message for an empty request body from Connexion
is quite unhelpful (taken directly from JSONSchema). This custom
validator emits a more helpful message for this particular context.

* Add test for custom request body validator

Co-Authored-By: maahir22 <56473490+maahir22@users.noreply.github.com>

---------

Co-authored-by: maahir22 <56473490+maahir22@users.noreply.github.com>

Remove 'run-' prefix from pre-commit jobs (#30597)

* Remove 'run-' prefix from pre-commit jobs

The job ID already implies 'run', and having the additional prefix
results in weird CLI, e.g. 'pre-commit run run-mypy-core'. This changes
the CLI to 'pre-commit run mypy-core', which reads better.

* Fix table marker

* Fix outdated pre-commit hook ID references

Add ability to override waiter delay in EcsRunTaskOperator (#30586)

Prepare docs for RC2 of provider wave (#30606)

Deactivate DAGs deleted from within zipfiles (#30608)

DagBag: Use dag.fileloc instead of dag.full_filepath in exception message (#30610)

Co-authored-by: Douglas Staple <staple.douglas@gmail.com>

Remove gauge scheduler.tasks.running (#30374)

* Remove gauge scheduler.tasks.running

* Add significant.rst file

* Update newsfragments/30374.significant.rst

---------

Co-authored-by: Niko Oliveira <onikolas@amazon.com>

Recover from `too old resource version exception` by retrieving the latest `resource_version` (#30425)

* Recover from `too old resource version exception` by retrieving the latest `resource_version`

* Update airflow/executors/kubernetes_executor.py

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

---------

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
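
An illustrative recovery pattern for the "too old resource version" (HTTP 410) failure this commit addresses: re-list to obtain a fresh resource_version, then resume watching. This is a generic kubernetes-client sketch, not the executor's actual code.

```python
# Watch pods, refreshing the resource_version after a 410 Gone error.
from kubernetes import client, config, watch
from kubernetes.client.rest import ApiException


def watch_pods(namespace: str = "default") -> None:
    config.load_kube_config()  # assumes a local kubeconfig
    v1 = client.CoreV1Api()
    rv = v1.list_namespaced_pod(namespace).metadata.resource_version
    w = watch.Watch()
    while True:
        try:
            for event in w.stream(v1.list_namespaced_pod, namespace,
                                  resource_version=rv):
                rv = event["object"].metadata.resource_version
                print(event["type"], event["object"].metadata.name)
        except ApiException as exc:
            if exc.status == 410:  # stale resource_version: re-list, retry
                rv = v1.list_namespaced_pod(namespace).metadata.resource_version
            else:
                raise
```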

docs: use correct import path for Dataset (#30617)

Speed up TaskGroups with caching property of group_id (#30284)

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Fix `TriggerDagRunOperator` with deferrable parameter (#30406)

* readding after borked it

* pre-commit

* finally fixing after the github issue last week

* push fix

* feedback from hussein

Fix failing SQS tests on moto upgrade (#30625)

The new moto (4.1.7) performs additional validation on the queues
created during tests, and it fails the tests when content
deduplication is not specified.

Explicitly setting the deduplication mode fixes the problem and
allows the new moto to be installed.

fix possible race condition when refreshing DAGs (#30392)

* fix possible race condition when refreshing DAGs

* merge the two queries into one

* Remove provide_session from internal function

Since get_latest_version_hash_and_updated_datetime is internal and we
always pass in the session anyway, the provide_session decorator is
redundant and only introduces the possibility of developer errors.

---------

Co-authored-by: Sébastien Brochet <sebastien.brochet@nielsen.com>
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Remove Norm and Hussein from the triage group (#30627)

Hussein is now a committer and Norm has completed building out the
initial AIP-52 tasks.

Remove mysql-connector-python (#30487)

* Turn the package 'mysql-connector-python' into an optional feature

* Update airflow/providers/mysql/provider.yaml

* Update airflow/providers/mysql/CHANGELOG.rst

---------

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Better error message where non-compatible providers are not excluded (#30629)

When the compatibility check is performed for an old version of Airflow,
we attempt to install all providers for that old version. However, if
one of the providers has a >= limit on Airflow for a newer version of
Airflow, this installation leads to attempting to upgrade airflow
rather than failing, which could lead to misleading errors.

This change adds "airflow==x.y.z", taken from the --use-airflow-version
flag, to the `pip install` command, which should in this case fail
with a much more accurate message that the provider conflicts with the
airflow version.

Updating the links to the Dataform product documentation to fix 404 redirect error (#30631)

Updating the links to the Dataform product documentation to fix 404 redirect error

New AWS sensor — DynamoDBValueSensor (#28338)

Remove duplicate param docstring in EksPodOperator (#30634)

In `DockerOperator`, adding an attribute `tls_verify` to choose whether to validate certificate (#30309) (#30310)

* add `tls_verify` to choose whether to validate certificate (#30309)

---------

Co-authored-by: Hussein Awala <hussein@awala.fr>

Add `max_active_tis_per_dagrun` for Dynamic Task Mapping (#29094)

* add max_active_tis_per_dagrun param to BaseOperator

* set has_task_concurrency_limits when max_active_tis_per_dagrun is not None

* check if max_active_tis_per_dagrun is reached in the task deps

* check if all the tasks have None max_active_tis_per_dagrun before auto schedule the dagrun

* check if the max_active_tis_per_dagrun is reached before queuing the ti

* check max_active_tis_per_dagrun in backfill job

* fix current tests and ensure everything is ok before adding new tests

* refactor TestTaskConcurrencyDep

* fix a bug in TaskConcurrencyDep

* test max_active_tis_per_dagrun in TaskConcurrencyDep

* tests max_active_tis_per_dagrun in TestTaskInstance

* test dag_file_processor with max_active_tis_per_dagrun

* test scheduling with max_active_tis_per_dagrun on different DAG runs

* test scheduling mapped task with max_active_tis_per_dagrun

* test max_active_tis_per_dagrun with backfill CLI

* add new starved_tasks filter to avoid affecting the scheduling perf

* unify the usage of TaskInstance filters and use TI

* refactor concurrency map type and create a new dataclass

* move docstring to ConcurrencyMap class and create a method for default_factory

* move concurrency_map creation to ConcurrencyMap class

* replace default dicts by counters

* replace all default dicts by counters in the scheduler_job_runner module

* suggestions from review
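
A usage sketch for the parameter added above (a generic TaskFlow example, assuming Airflow 2.6+; names are illustrative):

```python
# Cap how many mapped task instances run concurrently *within one DAG
# run* via max_active_tis_per_dagrun. Sketch only.
import pendulum
from airflow.decorators import dag, task


@dag(start_date=pendulum.datetime(2023, 5, 1), schedule=None, catchup=False)
def mapped_with_per_run_cap():
    @task(max_active_tis_per_dagrun=2)
    def process(item: int) -> int:
        return item * 2

    process.expand(item=[1, 2, 3, 4, 5])


mapped_with_per_run_cap()
```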

Simplify logic to resolve tasks stuck in queued despite stalled_task_timeout (#30375)

* simplify and consolidate logic for tasks stuck in queued

* simplify and consolidate logic for tasks stuck in queued

* simplify and consolidate logic for tasks stuck in queued

* fixed tests; updated fail stuck tasks to use run_with_db_retries

* mypy; fixed tests

* fix task_adoption_timeout in celery integration test

* addressing comments

* remove useless print

* fix typo

* move failure logic to executor

* fix scheduler job test

* adjustments for new scheduler job

* appeasing static checks

* fix test for new scheduler job paradigm

* Updating docs for deprecations

* news & small changes

* news & small changes

* Update newsfragments/30375.significant.rst

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Update newsfragments/30375.significant.rst

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* added cleanup stuck task functionality to base executor

* fix sloppy mistakes & mypy

* removing self.fail from base_executor

* Update airflow/jobs/scheduler_job_runner.py

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Update airflow/jobs/scheduler_job_runner.py

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Fix job_id filter

* Don't even run query if executor doesn't support timing out queued tasks

* Add support for LocalKubernetesExecutor and CeleryKubernetesExecutor

* Add config option to control how often it runs - we want it quicker than
the timeout

* Fixup newsfragment

* mark old KE pending pod check interval as deprecated by new check interval

* Fixup deprecation warnings

This more closely mirrors how deprecations are raised for "normal"
deprecations.

I've removed the depth, as moving up the stack doesn't really help the
user at all in this situation.

* Another deprecation cleanup

* Remove db retries

* Fix test

---------

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
Co-authored-by: Jed Cunningham <jedcunningham@apache.org>
Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com>

Display Video 360 cleanup v1 API usage (#30577)

* Display Video 360 cleanup v1 API usage

* Update docs

Fix mapped tasks partial arguments when DAG default args are provided (#29913)

* Add a failing test to make it pass

* use partial_kwargs when they are provided and override only None values with dag default values

* update the test and check if the values are filled in the right order

* fix overriding retry_delay with default value when it is equal to 0

* add missing default value for inlets and outlets

* set partial_kwargs dict type to dict[str, Any] and remove type ignore comments

* create a dict for default values and use NotSet instead of None to support None as accepted value

* update partial typing by removing None type from some args and set NotSet for all args

* Tweak kwarg merging slightly

This should improve iteration a bit, I think.

* Fix unit tests

---------

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

First commit of OpenLineage provider. (#29940)

This PR consists mostly of code that was created in the OpenLineage project. It
includes:

- Provider wiring
- OpenLineageListener that uses Listener API to get notification about changes
  to TaskInstance and Dag states
- Extractor framework, which is used to extract lineage information from
  particular operators. It's meant to be replaced by a direct implementation of
  lineage features in a later phase, extracting them using DefaultExtractor.
  This PR does not include actual extractors, but code around using and registering them.
- OpenLineageAdapter that translates extracted information to OpenLineage events.
- Utils around specific Airflow OL facets and features

This is a base implementation that's not meant to be released yet, but to add
code modified to be consistent with Airflow standards, get early feedback and
provide a canvas to add later features, docs and tests on.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>

Add v2-6-test and v2-6-stable to codecov and protected branches (#30640)

Adding configuration to control retry parameters for k8s api client (#29809)

* Adding configuration to control retry parameters for k8s api client

* Handling review comments

* Fixing code bug

* Fixing failing tests

* Temporary commit with UT wip

* Fixing unit test

* Fixing the strict checks

* Handling review comments from Hussein

* Revert "Handling review comments from Hussein"

This reverts commit fa3bc260f7462c42620f694ee97b7f15c0b0b9c3.

* Fixing failing ut

* Reverting bad hack

* Updating logic in kube_client.py

Co-authored-by: Hussein Awala <hussein@awala.fr>

* Fixing unit tests

* Fixing unit tests

* Handling review comments from Ash

* Fix loading mock call args for python3.7

* Apply suggestions from code review

* fix static check

* add in 2.6.0

---------

Co-authored-by: Amogh <adesai@cloudera.com>
Co-authored-by: Hussein Awala <houssein.awala.96@gmail.com>

fix(chart): webserver probes timeout and period. (#30609)

* fix(chart): webserver probes timeout and period

* Update default values in JSON schema to reflect values.yaml

* remove default templated values

Clarify release announcements on social media (#30639)

DynamoDBHook - waiter_path() to consider `resource_type` or `client_type` (#30595)

* Add  while initializing

* Add  while initializing

* Add logic to pick either client_type or resource_type

* Add test case

* Assert expected path

Improve task & run actions ux in grid view (#30373)

* update run clear+mark, update task clear

* add mark as tasks and include list of affected tasks

* Add support for mapped tasks, add shared modal component

* Clean up styling, restore warning for past/future tg clear

Add command to get DAG Details via CLI (#30432)

---------

Co-authored-by: Hussein Awala <hussein@awala.fr>
Co-authored-by: Hussein Awala <houssein.awala.96@gmail.com>

When clearing task instances try to get associated DAGs from database (#29065)

* When clearing task instances try to get associated DAGs from database.

This fixes problems when recursively clearing task instances across multiple DAGs:
  * Task instances in downstream DAGs weren't having their `max_tries` property incremented, which could cause downstream external task sensors in reschedule mode to instantly time out (issue #29049).
  * Task instances in downstream DAGs could have some of their properties overridden by an unrelated task in the upstream DAG if they had the same task ID.

* Use session fixture for new `test_clear_task_instances_without_dag_param` test.

* Use session fixture for new `test_clear_task_instances_in_multiple_dags` test.

---------

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Organize Amazon providers docs index (#30541)

preload airflow imports before dag parsing to save time (#30495)

---------

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com>

Add delete inactive run functionality to databricks provider (#30646)

Create audit_logs.rst (#30405)

---------

Co-authored-by: Josh Fell <48934154+josh-fell@users.noreply.github.com>

Present affected task instances as table (#30633)

Helm chart 1.9.0 has been released (#30649)

Add 2.6.0b1 to issue template (#30652)

add missing project_id in BigQueryGetDataOperator (#30651)
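
A hedged usage sketch of the newly added parameter (the project, dataset and table names are made up):

```python
from airflow.providers.google.cloud.operators.bigquery import BigQueryGetDataOperator

get_rows = BigQueryGetDataOperator(
    task_id="get_rows",
    project_id="my-gcp-project",  # hypothetical; previously inferred from the connection
    dataset_id="my_dataset",      # hypothetical dataset
    table_id="my_table",          # hypothetical table
    max_results=10,
)
```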

Properly classify google_vendor package to google provider (#30659)

We've recently added the google_vendor package to vendor-in the ads library,
and we had to do it outside of the regular google provider package,
because internally the library assumed our google package is a top
level package when discovering the right relative imports (#30544).

This confused the pre-commit that updates provider dependencies into
not recognising the package and printing warnings about bad classification.

Special-case handling will classify it under the google provider.

Make pandas optional in workday calendar example (#30660)

The workday calendar expected pandas to be available, and it is part
of our examples; however, Airflow does not have pandas as a core
dependency, so in case someone did not have pandas installed, importing
the workday example would fail.

This change makes pandas optional and falls back to regular working
days for the example in case it is not available (including a warning
about it). It also fixes a slight inefficiency where the
USFederalHoliday calendar was created every time the next workday
was calculated.
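
A minimal sketch of the optional-import pattern described above (the names are illustrative, not the exact example code):

```python
import warnings
from datetime import datetime, timedelta

try:
    from pandas.tseries.holiday import USFederalHolidayCalendar

    holiday_calendar = USFederalHolidayCalendar()  # created once, not per call
except ImportError:
    warnings.warn("pandas is not installed; US holidays will not be skipped.")
    holiday_calendar = None


def next_workday(day: datetime) -> datetime:
    """Return the next Mon-Fri day, skipping US federal holidays when known."""
    next_day = day + timedelta(days=1)
    while next_day.weekday() >= 5 or (  # 5/6 are Saturday/Sunday
        holiday_calendar is not None
        and next_day in holiday_calendar.holidays(start=next_day, end=next_day)
    ):
        next_day += timedelta(days=1)
    return next_day
```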

Update Google Campaign Manager360 operators to use API v4 (#30598)

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Skip KubernetesPodOperator task when it returns a provided exit code (#29000)

* Skip KubernetesPodOperator task when it returns a provided exit code

* set default value to None, and get exit code only when skip_exit_code is not None

* get the exit code for the base container and check if everything is ok

* add unit test for the operator

* add a test for deferred mode

* apply change requests

---------

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Upgrade Pip to 23.1 (#30663)

Fix docs building for workday example. (#30664)

#30660 was merged too quickly, as it results in a doc-building
failure. This PR fixes it.

docker compose doc changes (#30662)

Add suspended providers to pytest collection test (#30668)

Pytest collection has been extracted recently to a separate job,
and the SUSPENDED_PROVIDERS_FOLDERS variable was not set in the new
job - which caused suspended provider tests to be attempted by
pytest collection, leading to import errors when suspended providers
have some dependencies removed from our image.

Workaround type-incompatibility with new attrs in openlineage (#30674)

The new attrs released today (11 hours ago) added typing
information, which caused OpenLineageRedactor to fail MyPy checks.

Temporarily adding type: ignore should allow us to upgrade to the new
attrs and stop PRs changing dependencies from failing.

Related: #30673

Update the release note (#30680)

* Update the release note

During the beta release, I observed some minor things that need fixing.
Here's the PR

* Use local import

Correctly pass a type to attrs.has() (#30677)

Merge WasbBlobAsyncSensor to WasbBlobSensor (#30488)

Updated app to support configuring the caching hash method for FIPS v2 (#30675)

Install twine with --force for package verification (#30683)

In some cases when the machine has been reused across builds, pipx-
installed twine might seem both installed and removed (this happens
when builds are cancelled while installing twine).

Installing twine with --force should fix the problem.

Fix docs: add an "apache" prefix to pip install (#30681)

Remove unittests.TestCase from tests/test_utils (#30685)

Introduce consistency of package sequence for "Other" test type (#30682)

When packages for the "Other" test type are calculated, the list of
all test folders is generated and compared with the
packages previously selected by the "predefined" test types. This
is done via the `find` method, which returns the folders in arbitrary
order, mostly depending on the sequence in which the folders were created.

In case the tests from some packages have side-effects that
impact tests in other packages (obviously not something that is
desired), the tests might succeed in one
environment but fail in another. This happened, for example,
in the case of #30362, which had a cross-package side-effect later
fixed in #30588. There, the results of the "Other" test type depended
on where the tests were executed.

This PR sorts the find output so it is always in a consistent order.
We are using ASCII for package names, and the test types are
derived in the same Docker CI image with the same locale, so this
should guarantee that the output of packages for the "Other" test type
is always consistent.

Add missing version val to caching_hash_method config (#30688)

Upgrade to MyPy 1.2.0 (#30687)

Upgrading to the latest (released a week ago) MyPy in the hope it
will fix some more problems with attrs after upgrading new packages,
but it seems that even the latest MyPy does not know about the
new typing changes introduced in attrs (traditionally MyPy has an
attrs plugin that injects appropriate typing, but apparently it
needs to catch up with those changes).

Parallelize Helm tests with multiple job runners (#30672)

Helm unit tests use template rendering, and the rendering
uses a lot of CPU for the `helm template` command. We have a lot of
those rendering tests (>800), so even running the tests in parallel
on a multi-CPU machine does not decrease the elapsed time
to execute the tests.

However, each of the tests runs entirely independently, and we
should be able to achieve a much faster elapsed time if we run
a subset of tests on a separate, multi-CPU machine. This will not
lower the job build time, however it might speed up elapsed time
and thus give faster feedback.

This PR achieves that.

Skip PythonVirtualenvOperator task when it returns a provided exit code (#30690)

* Add a new argument to raise a skip exception when the python callable exits with a matching value

* add unit tests for skip_exit_code

[OTel Integration] Add tagging to existing stats (#30496)

rename skip_exit_code to skip_on_exit_code and allow providing multiple codes (#30692)

* rename skip_exit_code to skip_on_exit_code and allow providing multiple codes

* replace list type by Container
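
A hedged sketch of the renamed parameter on PythonVirtualenvOperator (the task id and codes are made up); a matching sys.exit() code marks the task skipped rather than failed:

```python
from airflow.operators.python import PythonVirtualenvOperator


def maybe_skip():
    import sys

    sys.exit(100)  # matches one of the codes below -> task ends up SKIPPED


skippable = PythonVirtualenvOperator(
    task_id="skippable",           # hypothetical task id
    python_callable=maybe_skip,
    requirements=[],
    skip_on_exit_code=(100, 101),  # a single int or a container of ints
)
```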

Fix d3 dependencies (#30702)

Update system test example_emr to have logs (#30715)

Fixed logging issue (#30703)

Co-authored-by: Mark Richman <mrkrchm@amazon.com>

Separate out and clarify policies for providers (#30657)

This change separates out the policies we have for providers into
a separate PROVIDERS.rst file. It also clearly documents the process
and policy we have for accepting new community-managed providers,
explaining the conditions that have to be fulfilled and stating
a very strong preference for keeping providers maintained by
3rd parties when there are 3rd-party teams that manage
the providers.

SqlToS3Operator - Add feature to partition SQL table (#30460)

Optimize parallel test execution for unit tests (#30705)

We run the tests in parallel test types in order to speed
up their execution. However, some test types and subsets of tests
take far longer to execute than other test types.

The longest tests to run are Providers and WWW tests, and the
longest tests from Providers are by far Amazon tests, then
Google. "All Other" Provider tests take about the same time
as Amazon tests - also after splitting the provider tests,
Core tests take the longest time.

When we run tests in parallel on multiple CPUs, the
longest-running tests often remain running on their own while the
other CPUs are not busy. We could run a separate test type
per provider, but the overhead of starting the database and collecting
and initializing tests for each of them is too big to achieve
speedups - especially for public runners, having 80 separate
databases with 80 subsequent container runs is slower than
running all Provider tests together.

However, we can split the Provider tests into a smaller number of
chunks and prioritize running the long chunks first. This
should improve the effect of parallelisation and improve utilization of
our multi-CPU machines.

This PR aims to do that:

* Split Provider tests (if amazon or google are part of the
  provider tests) into amazon, google, all-other chunks

* Move sorting of the test types to selective_check, to sort the
  test types according to expected longest running time (the longest
  tests to run are added first)

This should improve the CPU utilization of our multi-CPU runners
and make the tests involving the complete Provider set (or even sets
containing amazon, google and a few other providers)
execute quite a few minutes faster on average.

We can also get rid of some sequential processing for the public PRs,
because each test type we run will be less demanding overall. We
used to get a lot of 137 exit codes (memory errors), but with splitting
out Providers, the risk of exhausting resources by two test types
running in parallel is low.

Deprecate `skip_exit_code` in `BashOperator` (#30734)

Add explicit information about how to write task logs (#30732)

There was no explicit information in our documentation on how to
write logs from your tasks. While for classic operators that is
easy and straightforward, as they all have a log property which
is the right logger coming from LoggingMixin, for taskflow code
and custom classes it is not obvious that you have to
use the `airflow.task` logger (or a child of it) or that you have to
extend LoggingMixin to use the built-in logging configuration.
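
A short sketch of the guidance added by this change (the task itself is made up): taskflow code can write to the task log through the `airflow.task` logger or a child of it.

```python
import logging

from airflow.decorators import task

logger = logging.getLogger("airflow.task")  # or a child, e.g. "airflow.task.my_module"


@task
def transform():
    logger.info("this message ends up in the task's own log file")
```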

Suspend Yandex provider due to protobuf limitation (#30667)

The Yandex provider brings the protobuf dependency down to <4, and we are gearing
up to update it everywhere else. Protobuf 3 support for Python ends in Q2 2023:
https://protobuf.dev/support/version-support/#python

Yandex is the last provider whose maintainers we do not closely collaborate with on fixing this:
* Google provider dependencies are actively upgraded to the latest versions
  by a Google-led team: #30067 (some of the libraries are already updated)
  with the target of updating all dependencies by mid-May
* Apache-Beam has already merged protobuf4 support
  https://github.com/apache/beam/pull/25874 with the target of
  releasing it in 2.47.0 mid-May
* The mysql-connector-python in MySQL provider is already turned into
  optional dependency: #30487

The only remaining dependency limiting us to protobuf 3 (<3.21) is
yandexcloud. We opened an issue with yandexcloud
(https://github.com/yandex-cloud/python-sdk/issues/71) 3 weeks ago,
and while there was initial interest, there has been no progress on
the issue. Therefore, in order to prepare for running all
the tests and the final migration to protobuf 4, we need to suspend the
Yandex provider - following the suspension process we agreed on
and got a LAZY CONSENSUS for in
the https://lists.apache.org/thread/g8b3k028qhzgw6c3yz4jvmlc67kcr9hj
mailing list discussion.

The Yandex provider can be taken out of suspension once the yandexcloud
dependency removes the protobuf limitation in a release; a PR reverting
this change (and fixing all tests and static checks that will be needed)
is the way to do it.

Add a collapse grid button (#30711)

Add skip_on_exit_code also to ExternalPythonOperator (#30738)

Changes #30690 and #30692 added skip_on_exit_code to the
PythonVirtualenvOperator, but skipped the - very closely related
- ExternalPythonOperator.

This change brings the same functionality to ExternalPythonOperator and
moves it to the base class for both operators. It also adds a
separate test class for ExternalPythonOperator, introducing
a common base class and moving the test methods that are common
to both operators there.

Add multiple exit code handling in skip logic for BashOperator (#30739)

Follow-up after #30734
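
A hedged sketch of the extended skip logic (the command and codes are made up):

```python
from airflow.operators.bash import BashOperator

maybe_skip = BashOperator(
    task_id="maybe_skip",
    bash_command="exit 99",       # hypothetical command
    skip_on_exit_code=[99, 100],  # any listed code -> SKIPPED instead of FAILED
)
```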

Deprecate `skip_exit_code` in `DockerOperator` and `KubernetesPodOperator` (#30733)

* Deprecate `skip_exit_code` in `DockerOperator` and `KubernetesPodOperator`

* satisfy mypy

Remove protobuf limitation from eager upgrade (#30182)

The protobuf limitation was added to help pip resolve eager-upgrade
dependencies; however, it is not needed any more.

Fix misc grid/graph view UI bugs (#30752)

add a stop operator to emr serverless (#30720)

* add a stop operator to emr serverless

* update doc

---------

Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>

Better explanation on how to log from tasks (#30746)

* Better explanation on how to log from tasks

After Daniel's explanation this should provide a better description
on how to log from tasks.

Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com>
Co-authored-by: Niko Oliveira <onikolas@amazon.com>

Skip suspended providers when generating providers summary index (#30763)

When the providers' summary index gets generated, it should not
include suspended providers. This was missed in #30422.

Fix OpenLineage plugin when its listener is disabled. (#30708)

Add parametrized test for disabling OL listener in plugin.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>

Split installation of sdist providers into parallel chunks (#29223)

Sdist provider installation takes a lot of time because pip cannot
parallelise sdist package building. But we still want to test the
installation of all our providers as sdist packages.

This can be achieved by running N parallel installations with only
subset of providers being installed in each chunk.

This is what we do in this PR.

Speed up package wheel job in CI (#30766)

After recent improvements, the package wheel job has become one
of the longest jobs to run. So far it sequentially built airflow,
prepared documentation for packages, built the packages,
installed both airflow and the packages and tested imports for them,
then it removed the installed airflow and ran the same tests
against airflow 2.3 to check for compatibility.

This change splits it into two parallel jobs. There is a small
duplication (3 minutes of preparing the whl packages), but then
the "compatibility" job does not need Airflow and a few other
steps to be run (such as preparing docs or airflow), and overall
we just spend a few minutes longer to repeat the wheel package
preparation, but then each of the two jobs takes a bit more
than half the time of the original way, which will greatly improve
feedback time for the users (in most cases the two jobs will complete
under 12 minutes, where the original job needed 21 minutes to complete).

Use material icons for dag import error banner (#30771)

* Use material icons for dag import error banner

* fix message caret direction

Update DataprocCreateCluster operator to use 'label' parameter properly (#30741)

Add multiple exit code handling in skip logic for `DockerOperator` and `KubernetesPodOperator` (#30769)
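
A hedged sketch using DockerOperator (the image, command and codes are made up); KubernetesPodOperator accepts the same skip_on_exit_code parameter:

```python
from airflow.providers.docker.operators.docker import DockerOperator

maybe_skip = DockerOperator(
    task_id="maybe_skip",
    image="alpine:3.17",         # hypothetical image
    command="sh -c 'exit 42'",
    skip_on_exit_code=[42, 43],  # any listed code skips instead of failing
)
```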

remove delegate_to from GCP operators and hooks (#30748)

Remove @poke_mode_only from EmrStepSensor (#30774)

* Remove @poke_mode_only from EmrStepSensor

* Add EmrStepSensor to system test and documentation

* Fix test

add pod status phase to KPO test mock (#30782)

Export SUSPENDED_PROVIDERS_FOLDERS for breeze testing commands (#30780)

Export the SUSPENDED_PROVIDERS_FOLDERS env var in breeze directly
instead of in Airflow CI workflows. This will fix the issue for users
executing `breeze testing ...` commands locally.

Add openlineage to boring-cyborg.yml (#30772)

Improve url detection (#30779)

Adapt to better resolver of pip (#30758)

We used to have helper limits for eager upgrades of our packages,
but with 23.1, updated in #30663, pip has a much improved resolver that
does not need that much help and can resolve our dependencies
pretty fast on its own, so we can remove all the dependency limits that
aimed to cap the dependency resolution time.

We also used to have a mechanism to track backtracking issues and
find out which of the new dependencies caused excessive backtracking.
This no longer seems needed, so we can remove it from CI and breeze.

Add explanation on why we have two local pre-commit groups (#30795)

Remove skip_exit_code from KubernetesPodOperator (#30788)

Since the parameter was not released we can safely remove it without deprecation.

Better message on deserialization error (#30588)

Previously, a deserialization error threw a pretty mysterious ValueError
when, for example, there was a Python version mismatch - a Python
object serialized in one version of Python produced a "version error"
message. This change turns such a ValueError into a specific
deserialization error, with a better message explaining the possible
reason, but also without losing the cause.

Co-authored-by: Shahar Epstein <shahar1@live.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>

AWS logs. Exit fast when 3 consecutive responses are returned from AWS Cloudwatch logs (#30756)

* AWS logs. Exit fast when 3 consecutive responses are returned from AWS Cloudwatch logs

---------

Co-authored-by: Niko Oliveira <onikolas@amazon.com>

Add provider for Apache Kafka (#30175)

* Add provider for Apache Kafka

Pulls a series of Kafka integrations from airflow-provider-kafka (https://pypi.org/project/airflow-provider-kafka/) into core Airflow.

---------

Co-authored-by: Tamara Janina Fingerlin <90063506+TJaniF@users.noreply.github.com>
Co-authored-by: Josh Fell <48934154+josh-fell@users.noreply.github.com>
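
A hedged sketch of one of the pulled-in operators, assuming the provider keeps the layout of the original airflow-provider-kafka package (the connection id and topic are made up):

```python
from airflow.providers.apache.kafka.operators.produce import ProduceToTopicOperator


def hello_producer():
    # yields (key, value) pairs to publish
    yield "greeting", "hello kafka"


produce = ProduceToTopicOperator(
    task_id="produce",
    kafka_config_id="kafka_default",  # Kafka connection configured in Airflow
    topic="example_topic",            # hypothetical topic
    producer_function=hello_producer,
)
```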

Remove deprecated code from Amazon provider (#30755)

* Remove deprecated code from Amazon provider

Prepare docs for adhoc release of providers (#30787)

* Prepare docs for adhoc release of providers

Speed up test collection (#30801)

Test collection had the default setting for parallel test types because
the TEST_TYPES variable had not been renamed to PARALLEL_TEST_TYPES.
Also, test collection can be run in the "Wait for CI images" job, which
should save around a minute for setting up Breeze and pulling the
images.

This should speed up the pytest collection test by around a minute and a half.

Removed Mandatory Encryption in Neo4jHook (#30418)

* Removed mandatory encryption in neo4jhook

* Added unit tests and altered existing ones

* Added unit-test and fixed existing ones.

* Changed the implementation of get_client

* Changed test for encrypted param

* fix unit test and check if encrypted arg is provided or not

* fix static checks

* fix unit tests for python 3.7

---------

Co-authored-by: Hussein Awala <hussein@awala.fr>

restore fallback to empty connection behavior (#30806)

Also remove restored behavior from changelog

fixes to system tests following obsolete cleanup (#30804)

Co-authored-by: Niko Oliveira <onikolas@amazon.com>

Add deferrable mode to `WasbPrefixSensor` (#30252)

Add licence to The __init__.py in google_vendored (#30807)

This is not a problem (as an empty __init__.py has zero creativity, so
the licence can be skipped), but it confuses the RAT tool when
verifying the sources.

add sentry transport configuration option (#30419)

Upgrade to pip 23.1.1 (#30808)

Just released, fresh off-the-press bugfix version of pip.

Clean bigquery operator tests (#30550)

Add deferrable mode to `GCSObjectUpdateSensor` (#30579)
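
A hedged usage sketch (the bucket and object are made up), assuming the new mode is switched on via a `deferrable` flag as in other deferrable sensors:

```python
from airflow.providers.google.cloud.sensors.gcs import GCSObjectUpdateSensor

wait_for_update = GCSObjectUpdateSensor(
    task_id="wait_for_update",
    bucket="my-bucket",      # hypothetical bucket
    object="data/file.csv",  # hypothetical object
    deferrable=True,         # hands the wait over to the triggerer
)
```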

Fix removed delegate_to parameter in deferrable GCS sensor (#30810)

Two PRs crossed, and the result of #30748 caused #30579 to fail,
as the delegate_to parameter had been removed.

Upgrade ruff to 0.0.262 (#30809)

Fix dev index building for suspended providers (#30812)

This is a follow-up after #30422 and #30763 - it turns out that
building the provider index locally failed when some providers
are suspended. It only impacts the local dev workflow.

Add instructions on how to avoid accidental airflow upgrade/downgrade (#30813)

Some of our users raised issues that, when extending the image, airflow
suddenly started reporting problems with database versions and migrations
not applied or out-of-sync. This almost always turns out to be a
dependency conflict that leads to an automated downgrade or upgrade of
the installed airflow version. This is - obviously - undesired (you should
be upgrading airflow consciously rather than accidentally). However,
there is no way to prevent it implicitly - `pip` might decide to upgrade
or downgrade airflow as it sees fit. From pip's point of view, airflow is
just one of the packages and has no special meaning.

The only way to "keep" airflow version is to specify it together with
other requirements, pinned to the specific version. This PR updates
our examples to do this and explains why airflow is added there.

There is - of course - another risk: that the user will forget to
update the version of airflow when they upgrade. However, since this
is an explicit action performed during image extension, it is much easier
to diagnose and notice. We also warn the users that they should update
it when airflow is upgraded.

Make eager upgrade additional dependencies optional (#30811)

In case additional dependencies are installed in the customisation path
of Docker, the eager-upgrade dependency list is now empty after #30758,
which made the installation of extra dependencies fail.

This PR makes it optional.

Include sequoia.com in INTHEWILD (#30814)

Reenable clear on TaskInstanceModelView for role User (#30415)

* Reenable clear on TaskInstanceModelView for role User

The action was disabled in https://github.com/apache/airflow/pull/20659,
which resolved https://github.com/apache/airflow/issues/20655. The issue
only mentions that the edit action is broken and should be disabled, so it
seems the disabling of the clear action was unintentional.

The discussion in the PR
https://github.com/apache/airflow/issues/20655 further reinforces this.
That the author believed it still worked can be explained by the fact
that, with a user with the `Admin` role, the action was still available,
so one could easily be misled into believing it still worked as expected.

This PR re-enables the action and modifies an existing test case to also
verify that clearing is possible using a user with the role `User`.

* Add back other set state actions

* fix static checks

---------

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

`ExternalTaskSensor`: add `external_task_group_id` to `template_fields` (#30401)

* Add missing info in external_task.py

Add missing external_task_group_id parameter to the ExternalTaskSensor docstring and template_fields.
As suggested, to match other operator classes add `(templated)` to templated fields.
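
A hedged sketch of the now-templated field (the dag and group ids are made up):

```python
from airflow.sensors.external_task import ExternalTaskSensor

wait_for_group = ExternalTaskSensor(
    task_id="wait_for_group",
    external_dag_id="upstream_dag",                 # hypothetical dag id
    external_task_group_id="load_{{ ds_nodash }}",  # rendered at runtime
)
```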

add missing read for K8S config file from conn in deferred `KubernetesPodOperator`  (#29498)

* restore convert_config_file_to_dict method and deprecate it

Update log level in scheduler critical section edge case (#30694)

This log message can be useful if the scheduler ends up needing to query
TIs more than once per scheduler loop, so make it INFO vs DEBUG to
increase discoverability.

Validate `executor` and `config.core.executor` match (#30693)

The chart expects the executor to be set in `executor`, however if a
user only sets `config.core.executor` it is difficult to diagnose as the
chart deploys the wrong rbac resources. This tries to catch that
situation.

Count mapped upstreams only if all are finished (#30641)

* Fix Pydantic TI handling in XComArg.resolve()

* Count mapped upstreams only if all are finished

An XComArg's get_task_map_length() should only return an integer when
the *entire* task has finished. However, before this patch, it might
attempt to count a mapped upstream even when some (or all!) of its
expanded tis were still unfinished, resulting in its downstream being
expanded prematurely.

This patch adds an additional check before we count upstream results to
ensure all the upstreams are actually finished.

* Use SQL IN to find unfinished TI instead

This needs a special workaround for a NULL quirk in SQL.

Optimize performance of scheduling mapped tasks (#30372)

* Optimize performance of scheduling mapped tasks

* Provide max_tis_per_query as a parameter for the schedule_tis method

* Add max_tis_per_query to the JobPydantic class

---------

Co-authored-by: Zhyhimont Dmitry <zhyhimont.d@profitero.com>
Co-authored-by: Zhyhimont Dmitry <dzhigimont@gmail.com>

Update the user-facing documentation of providers (#30816)

We've recently clarified and described our policies for accepting
providers to be maintained by the community (#30657) - this was
directed towards the Airflow developers and contributors. This PR
reviews user-facing part of the documentation for providers by
removing some obsolete/not very useful documentation and pointing
to the new policy where appropriate.

Small refactors in ClusterGenerator of dataproc (#30714)

Rename most pod_id usage to pod_name in KubernetesExecutor (#29147)

We were using pod_id in a lot of places where really it is just the pod
name. I've renamed it where it is easy to do so, so things are easier
to follow.

Deprecate databricks async operator (#30761)

detailed docs (#30729)

fixed some errant strings in the kafka example dags (#30818)

* fixed some errant strings in the kafka example dags

Add repair job functionality to databricks hook (#30786)

* add repair job run functionality

* Add tests

Use template comments for the chart license header (#30569)

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

allow multiple prefixes in gcs delete/list hooks and operators (#30815)
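
A hedged sketch, assuming the commit title means `prefix` now also accepts a list of prefixes (the bucket and prefixes are made up):

```python
from airflow.providers.google.cloud.operators.gcs import GCSDeleteObjectsOperator

cleanup = GCSDeleteObjectsOperator(
    task_id="cleanup",
    bucket_name="my-bucket",      # hypothetical bucket
    prefix=["staging/", "tmp/"],  # objects under any listed prefix are deleted
)
```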

Update the error message for invalid use of poke-only sensors (#30821)

Fix XCom deserialization when it contains nonprimitive values (#30819)

* Add testcase to show issue with deserialization

* fix XCom deserialization

---------

Co-authored-by: utkarsh sharma <utkarsharma2@gmail.com>

Add Fail Fast feature for DAGs (#29406)
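
A hedged sketch, assuming the feature is exposed as a DAG-level `fail_stop` flag (the dag id and date are made up): when one task fails, the remaining running tasks of the same run are stopped.

```python
import datetime

from airflow import DAG

with DAG(
    dag_id="fail_fast_demo",  # hypothetical dag id
    start_date=datetime.datetime(2023, 1, 1),
    fail_stop=True,           # assumed flag name added by this change
):
    ...
```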

Improve nested_dict serialization test (#30823)

---------

Co-authored-by: bolkedebruin <bolkedebruin@users.noreply.github.com>

Improve Quick Start instructions (#30820)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>

Add retry param in databricks async operator (#30744)

* Add retry param in databricks async operator

* Apply review suggestions

Optimize docs building in CI (#30825)

* Optimize docs building in CI

Docs building is the longest build for regular PRs - it takes 30 minutes
for any PR that touches any of the docs or python files.

This PR optimises it - only the affected packages will be built when
the PR touches only some of the files.

* fixup! Optimize docs building in CI

* fixup! fixup! Optimize docs building in CI

* fixup! fixup! fixup! Optimize docs building in CI

Optimize away pytest collection steps (#30824)

The Pytest collection steps are only needed if there are any tests
about to be run. There are cases where we build CI images but
we do not expect to run any tests (for doc-only …
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Successfully merging this pull request may close these issues.

Support for Python 3.11 for Google Provider (upgrading all dependencies)