Releases: apache/airflow

Apache Airflow Helm Chart 1.11.0

02 Oct 23:29
helm-chart/1.11.0

Significant Changes

Support naming customization on helm chart resources, some resources may be renamed during upgrade (#31066)

A new opt-in switch, useStandardNaming, enables the standard naming convention, which allows full use of fullnameOverride and nameOverride in all resources. It defaults to false for backwards compatibility.

With the default useStandardNaming=false, the following resources are renamed when upgrading to 1.11.0 or a higher version:

  • ConfigMap {release}-airflow-config to {release}-config
  • Secret {release}-airflow-metadata to {release}-metadata
  • Secret {release}-airflow-result-backend to {release}-result-backend
  • Ingress {release}-airflow-ingress to {release}-ingress
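
To check an existing installation before upgrading, the renames listed above can be expressed as a small lookup. This is only an illustrative sketch: the helper name renamed_resources and the release name "gta" are hypothetical, and the mapping covers just the four resources named in this note.

```python
# Old -> new name suffixes per resource kind, copied from the list above.
RENAMES = {
    "ConfigMap": [("airflow-config", "config")],
    "Secret": [
        ("airflow-metadata", "metadata"),
        ("airflow-result-backend", "result-backend"),
    ],
    "Ingress": [("airflow-ingress", "ingress")],
}


def renamed_resources(release):
    """Return (kind, old_name, new_name) for each resource renamed on upgrade."""
    return [
        (kind, f"{release}-{old}", f"{release}-{new}")
        for kind, pairs in RENAMES.items()
        for old, new in pairs
    ]


if __name__ == "__main__":
    # "gta" is a hypothetical release name used only for illustration.
    for kind, old_name, new_name in renamed_resources("gta"):
        print(f"{kind}: {old_name} -> {new_name}")
```

Anything that references the old names outside the chart (for example, externally managed secrets) would need to be updated to the new names.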

For existing installations, all your resources will be recreated with a new name and Helm will delete the previous resources.

This won't delete existing PVCs for logs used by StatefulSets/Deployments, but the recreated resources will get brand new PVCs.
If you want to preserve log history, you'll need to manually copy the data from the old volumes into the new volumes after
deployment. The procedure may vary depending on the storage backend/class you're using. If you don't mind starting
with fresh logs/redis volumes, you can just delete the old PVCs, which will be named, for example:

kubectl delete pvc -n airflow logs-gta-triggerer-0
kubectl delete pvc -n airflow logs-gta-worker-0
kubectl delete pvc -n airflow redis-db-gta-redis-0

If you do not change useStandardNaming or fullnameOverride after the upgrade, you can proceed as usual and no unexpected behaviour should occur.

bitnami/postgresql subchart updated to 12.10.0 (#33747)

The PostgreSQL subchart that is used with the Chart is now 12.10.0, previously it was 12.1.9.

Default git-sync image is updated to 3.6.9 (#33748)

The default git-sync image that is used with the Chart is now 3.6.9, previously it was 3.6.3.

Default Airflow image is updated to 2.7.1 (#34186)

The default Airflow image that is used with the Chart is now 2.7.1, previously it was 2.6.2.

New Features

  • Add support for scheduler name to PODs templates (#33843)
  • Support KEDA scaling for triggerer (#32302)
  • Add support for container lifecycle hooks (#32349, #34677)
  • Support naming customization on helm chart resources (#31066)
  • Adding startupProbe to scheduler and webserver (#33107)
  • Allow disabling token mounts using automountServiceAccountToken (#32808)
  • Add support for defining custom priority classes (#31615)
  • Add support for runtimeClassName (#31868)
  • Add support for custom query in workers KEDA trigger (#32308)

Improvements

  • Add containerSecurityContext for cleanup job (#34351)
  • Add existing secret support for PGBouncer metrics exporter (#32724)
  • Allow templating in webserver ingress hostnames (#33142)
  • Allow templating in flower ingress hostnames (#33363)
  • Add configmap annotations to StatsD and webserver (#33340)
  • Add pod security context to PgBouncer (#32662)
  • Add an option to use a direct DB connection in KEDA when PgBouncer is enabled (#32608)
  • Allow templating in cleanup.schedule (#32570)
  • Template dag processor waitformigration containers extraVolumeMounts (#32100)
  • Ability to inject extra containers into PgBouncer (#33686)
  • Allowing ability to add custom env into PgBouncer container (#33438)
  • Add support for env variables in the StatsD container (#33175)

Bug Fixes

  • Add airflow db migrate command to database migration job (#34178)
  • Pass workers.terminationGracePeriodSeconds into KubeExecutor pod template (#33514)
  • CeleryExecutor namespace depends on Airflow version (#32753)
  • Fix dag processor not including webserver config volume (#32644)
  • Dag processor liveness probe include --local and --job-type args (#32426)
  • Revising flower_url_prefix considering default value (#33134)

Doc only changes

  • Add more explicit "embedded postgres" exclusion for production (#33034)
  • Update git-sync description (#32181)

Misc

  • Default Airflow version to 2.7.1 (#34186)
  • Update PostgreSQL subchart to 12.10.0 (#33747)
  • Update git-sync to 3.6.9 (#33748)
  • Remove unnecessary loops to load env from helm values (#33506)
  • Replace common.tplvalues.render with tpl in ingress template files (#33384)
  • Remove K8S 1.23 support (#32899)
  • Fix chart named template comments (#32681)
  • Remove outdated comment from chart values in the workers KEDA conf section (#32300)
  • Remove unnecessary or function in template files (#34415)

Apache Airflow 2.7.1

07 Sep 18:08
2.7.1
b8c4166

Significant Changes

CronTriggerTimetable is now less aggressive when trying to skip a run (#33404)

When setting catchup=False, CronTriggerTimetable no longer skips a run if
the scheduler does not query the timetable immediately after the previous run
has been triggered.

This should not affect scheduling in most cases, but can change the behaviour if
a DAG is paused-unpaused to manually skip a run. Previously, the timetable (with
catchup=False) would only start a run after a DAG is unpaused, but with this
change, the scheduler will try to look a little bit back to schedule the
previous run that covers a part of the period when the DAG was paused. This
means you will need to keep a DAG paused longer (namely, for the entire cron
period to pass) to really skip a run.

Note that this is also the behaviour exhibited by various other cron-based
scheduling tools, such as anacron.

conf.set() becomes case insensitive to match conf.get() behavior (#33452)

Also, conf.get() will now break if used with non-string parameters.

conf.set(section, key, value) used to be case sensitive, i.e. conf.set("SECTION", "KEY", value)
and conf.set("section", "key", value) were stored as two distinct configurations.
This was inconsistent with the behavior of conf.get(section, key), which was always converting the section and key to lower case.

As a result, configuration options set with upper case characters in the section or key were unreachable.
That's why we are now converting section and key to lower case in conf.set too.

We also slightly changed the behavior of conf.get(). It used to allow objects that are not strings in the section or key.
Doing this will now result in an exception. For instance, conf.get("section", 123) needs to be replaced with conf.get("section", "123").
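
The behaviour described above can be modelled with a small stand-in class. This is a toy sketch, not Airflow's implementation: the class name CaseInsensitiveConf is hypothetical, and the exception type raised for non-string arguments is an assumption (the note only says that an exception is raised).

```python
class CaseInsensitiveConf:
    """Toy model of the described conf.set()/conf.get() behaviour."""

    def __init__(self):
        self._values = {}

    def set(self, section, key, value):
        # Section and key are lower-cased on write, matching get().
        self._values[(section.lower(), key.lower())] = value

    def get(self, section, key):
        # Non-string section/key is rejected. The exception type here is
        # an assumption made for this sketch.
        if not isinstance(section, str) or not isinstance(key, str):
            raise TypeError("section and key must be strings")
        return self._values[(section.lower(), key.lower())]


conf = CaseInsensitiveConf()
conf.set("SECTION", "KEY", "first")
conf.set("section", "key", "second")  # overwrites the entry above
print(conf.get("Section", "Key"))
```

With the old behaviour, the two set() calls would have produced two distinct entries, and the upper-case one would have been unreachable through get().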

Bug Fixes

  • Ensure that tasks wait for running indirect setup (#33903)
  • Respect "soft_fail" for core async sensors (#33403)
  • Differentiate 0 and unset as a default param values (#33965)
  • Raise 404 from Variable PATCH API if variable is not found (#33885)
  • Fix MappedTaskGroup tasks not respecting upstream dependency (#33732)
  • Add limit 1 if required first value from query result (#33672)
  • Fix UI DAG counts including deleted DAGs (#33778)
  • Fix cleaning zombie RESTARTING tasks (#33706)
  • SECURITY_MANAGER_CLASS should be a reference to class, not a string (#33690)
  • Add back get_url_for_login in security manager (#33660)
  • Fix 2.7.0 db migration job errors (#33652)
  • Set context inside templates (#33645)
  • Treat dag-defined access_control as authoritative if defined (#33632)
  • Bind engine before attempting to drop archive tables (#33622)
  • Add a fallback in case no first name and last name are set (#33617)
  • Sort data before groupby in TIS duration calculation (#33535)
  • Stop adding values to rendered templates UI when there is no dagrun (#33516)
  • Set strict to True when parsing dates in webserver views (#33512)
  • Use dialect.name in custom SA types (#33503)
  • Do not return ongoing dagrun when a end_date is less than utcnow (#33488)
  • Fix a bug in formatDuration method (#33486)
  • Make conf.set case insensitive (#33452)
  • Allow timetable to slightly miss catchup cutoff (#33404)
  • Respect soft_fail argument when poke is called (#33401)
  • Create a new method used to resume the task in order to implement specific logic for operators (#33424)
  • Fix DagFileProcessor interfering with dags outside its processor_subdir (#33357)
  • Remove the unnecessary <br> text in Provider's view (#33326)
  • Respect soft_fail argument when ExternalTaskSensor runs in deferrable mode (#33196)
  • Fix handling of default value and serialization of Param class (#33141)
  • Check if the dynamically-added index is in the table schema before adding (#32731)
  • Fix rendering the mapped parameters when using expand_kwargs method (#32272)
  • Fix dependencies for celery and opentelemetry for Python 3.8 (#33579)

Misc/Internal

Doc only changes

  • Add documentation explaining template_ext (and how to override it) (#33735)
  • Explain how users can check if python code is top-level (#34006)
  • Clarify that DAG authors can also run code in DAG File Processor (#33920)
  • Fix broken link in Modules Management page (#33499)
  • Fix secrets backend docs (#33471)
  • Fix config description for base_log_folder (#33388)

Apache Airflow 2.7.0

18 Aug 16:40
2.7.0
c08c82e

Significant Changes

Remove Python 3.7 support (#30963)

As of now, Python 3.7 is no longer supported by the Python community.
Therefore, to use Airflow 2.7.0, you must ensure your Python version is
either 3.8, 3.9, 3.10, or 3.11.

Old Graph View is removed (#32958)

The old Graph View is removed. The new Graph View is the default view now.

The trigger UI form is skipped in web UI if no parameters are defined in a DAG (#33351)

If you are using dag_run.conf dictionary and web UI JSON entry to run your DAG you should either:

  • Add params to your DAG (see https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/params.html#use-params-to-provide-a-trigger-ui-form)
  • Enable the new configuration show_trigger_form_if_no_params to bring back old behaviour

The "db init", "db upgrade" commands and "[database] load_default_connections" configuration options are deprecated (#33136).

Instead, you should use the "airflow db migrate" command to create or upgrade the database. This command will not create default connections.
To create default connections, you need to run "airflow connections create-default-connections" explicitly,
after running "airflow db migrate".
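
For deployment scripts, the order of the two commands could be captured as follows. This is a sketch only: the helper names db_setup_commands and run_db_setup are hypothetical, and the commands themselves are the ones quoted above.

```python
import subprocess


def db_setup_commands(create_default_connections=False):
    """Return the CLI invocations in the order they should run."""
    commands = [["airflow", "db", "migrate"]]
    if create_default_connections:
        # Default connections are no longer created by the migration itself.
        commands.append(["airflow", "connections", "create-default-connections"])
    return commands


def run_db_setup(create_default_connections=False):
    """Run each command and stop at the first failure."""
    for command in db_setup_commands(create_default_connections):
        subprocess.run(command, check=True)
```

Calling run_db_setup(create_default_connections=True) requires the airflow CLI to be available on PATH.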

In case of SMTP SSL connection, the context now uses the "default" context (#33070)

The "default" context is Python's default_ssl_context instead of the previously used "none". The
default_ssl_context provides a balance between security and compatibility, but in some cases,
when certificates are old, self-signed or misconfigured, it might not work. This can be configured
by setting "ssl_context" in the "email" configuration of Airflow.

Setting it to "none" brings back the "none" setting that was used in Airflow 2.6 and before,
but it is not recommended for security reasons, as this setting disables validation of certificates and allows MITM attacks.
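
The difference between the two settings can be illustrated with Python's standard ssl module. This is a minimal sketch, not Airflow's code: the helper name resolve_smtp_ssl_context is hypothetical, and returning None for "none" is an assumption made here to represent "no verifying context is supplied".

```python
import ssl


def resolve_smtp_ssl_context(setting="default"):
    """Map an ssl_context setting to an SSLContext (or None)."""
    if setting == "default":
        # Verifies certificates and checks hostnames.
        return ssl.create_default_context()
    if setting == "none":
        # No verifying context; certificates are not validated.
        return None
    raise ValueError(f"unsupported ssl_context setting: {setting!r}")


context = resolve_smtp_ssl_context("default")
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

A context created by ssl.create_default_context() requires a valid certificate chain and a matching hostname, which is why old, self-signed or misconfigured certificates may stop working.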

Disable default allowing the testing of connections in UI, API and CLI (#32052)

For security reasons, the test connection functionality is disabled by default across Airflow UI,
API and CLI. The availability of the functionality can be controlled by the
test_connection flag in the core section of the Airflow
configuration (airflow.cfg). It can also be controlled by the
environment variable AIRFLOW__CORE__TEST_CONNECTION.

The following values are accepted for this config param:

  1. Disabled: Disables the test connection functionality and
    disables the Test Connection button in the UI.
    This is also the default value set in the Airflow configuration.
  2. Enabled: Enables the test connection functionality and
    activates the Test Connection button in the UI.
  3. Hidden: Disables the test connection functionality and
    hides the Test Connection button in UI.

For more information on capabilities of users, see the documentation:
https://airflow.apache.org/docs/apache-airflow/stable/security/security_model.html#capabilities-of-authenticated-ui-users
It is strongly advised to not enable the feature until you make sure that only
highly trusted UI/API users have "edit connection" permissions.
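
The environment variable named above follows Airflow's AIRFLOW__{SECTION}__{KEY} convention. A small sketch of that mapping is shown below; the helper name airflow_env_var is hypothetical.

```python
def airflow_env_var(section, key):
    """Build the environment variable name for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Environment for a deployment that opts in to the feature.
# Only do this once "edit connection" permissions are restricted.
env = {airflow_env_var("core", "test_connection"): "Enabled"}
print(env)
```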

The xcomEntries API disables support for the deserialize flag by default (#32176)

For security reasons, the /dags/*/dagRuns/*/taskInstances/*/xcomEntries/*
API endpoint now disables the deserialize option to deserialize arbitrary
XCom values in the webserver. For backward compatibility, server admins may set
the [api] enable_xcom_deserialize_support config to True to re-enable the
flag.

However, it is strongly advised to not enable the feature, and perform
deserialization at the client side instead.

Change of the default Celery application name (#32526)

Default name of the Celery application changed from airflow.executors.celery_executor to airflow.providers.celery.executors.celery_executor.

You should change both your configuration and Health check command to use the new name:

  • in configuration (celery_app_name configuration in celery section) use airflow.providers.celery.executors.celery_executor
  • in your Health check command use airflow.providers.celery.executors.celery_executor.app

The default value for scheduler.max_tis_per_query is changed from 512 to 16 (#32572)

This change is expected to make the Scheduler more responsive.

scheduler.max_tis_per_query needs to be lower than core.parallelism.
If both were left to their default value previously, the effective default value of scheduler.max_tis_per_query was 32
(because it was capped at core.parallelism).

To keep the behavior as close as possible to the old config, one can set scheduler.max_tis_per_query = 0,
in which case it'll always use the value of core.parallelism.
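
The interaction between the two options can be summarised in a few lines. This is a sketch of the behaviour described above, not the scheduler's code; the function name effective_max_tis_per_query is hypothetical.

```python
def effective_max_tis_per_query(max_tis_per_query, parallelism):
    """Return the value the scheduler would effectively use."""
    if max_tis_per_query == 0:
        # 0 means "always use core.parallelism".
        return parallelism
    # Otherwise the value is capped at core.parallelism.
    return min(max_tis_per_query, parallelism)


# Old default (512) with core.parallelism = 32 was effectively 32.
print(effective_max_tis_per_query(512, 32))
# New default (16) is used as-is.
print(effective_max_tis_per_query(16, 32))
# Setting 0 restores the old effective value.
print(effective_max_tis_per_query(0, 32))
```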

Some executors have been moved to corresponding providers (#32767)

In order to use the executors, you need to install the providers:

  • for Celery executors you need to install apache-airflow-providers-celery package >= 3.3.0
  • for Kubernetes executors you need to install apache-airflow-providers-cncf-kubernetes package >= 7.4.0
  • for Dask executors you need to install apache-airflow-providers-daskexecutor package in any version

You can also achieve this by installing Airflow with the [celery], [cncf.kubernetes], or [daskexecutor] extras, respectively.

Users who base their images on the apache/airflow reference image (not slim) should be unaffected - the base
reference image comes with all three providers installed.

Improvement Changes

PostgreSQL only improvement: Added index on taskinstance table (#30762)

This index seems to have a great positive effect in setups with tens of millions of such rows.

New Features

  • Add OpenTelemetry to Airflow AIP-49
  • Trigger Button - Implement Part 2 of AIP-50 (#31583)
  • Removing Executor Coupling from Core Airflow AIP-51
  • Automatic setup and teardown tasks AIP-52
  • OpenLineage in Airflow AIP-53
  • Experimental: Add a cache to Variable and Connection when called at dag parsing time (#30259)
  • Enable pools to consider deferred tasks (#32709)
  • Allows to choose SSL context for SMTP connection (#33070)
  • New gantt tab (#31806)
  • Load plugins from providers (#32692)
  • Add BranchExternalPythonOperator (#32787, #33360)
  • Add option for storing configuration description in providers (#32629)
  • Introduce Heartbeat Parameter to Allow Per-LocalTaskJob Configuration (#32313)
  • Add Executors discovery and documentation (#32532)
  • Add JobState for job state constants (#32549)
  • Add config to disable the 'deserialize' XCom API flag (#32176)
  • Show task instance in web UI by custom operator name (#31852)
  • Add default_deferrable config (#31712)
  • Introducing AirflowClusterPolicySkipDag exception (#32013)
  • Use reactflow for datasets graph (#31775)
  • Add an option to load the dags from db for command tasks run (#32038)
  • Add version of chain which doesn't require matched lists (#31927)
  • Use operator_name instead of task_type in UI (#31662)
  • Add --retry and --retry-delay to airflow db check (#31836)
  • Allow skipped task state task_instance_schema.py (#31421)
  • Add a new config for celery result_backend engine options (#30426)
  • UI Add Cluster Activity Page (#31123, #32446)
  • Adding keyboard shortcuts to common actions (#30950)
  • Adding more information to kubernetes executor logs (#29929)
  • Add support for configuring custom alembic file (#31415)
  • Add running and failed status tab for DAGs on the UI (#30429)
  • Add multi-select, proposals and labels for trigger form (#31441)
  • Making webserver config customizable (#29926)
  • Render DAGCode in the Grid View as a tab (#31113)
  • Add rest endpoint to get option of configuration (#31056)
  • Add section query param in get config rest API (#30936)
  • Create metrics to track Scheduled->Queued->Running task state transition times (#30612)
  • Mark Task Groups as Success/Failure (#30478)
  • Add CLI command to list the provider trigger info (#30822)
  • Add Fail Fast feature for DAGs (#29406)

Improvements

  • Improve graph nesting logic (#33421)
  • Configurable health check threshold for triggerer (#33089, #33084)
  • add dag_run_ids and task_ids filter for the batch task instance API endpoint (#32705)
  • Ensure DAG-level references are filled on unmap (#33083)
  • Add support for arrays of different data types in the Trigger Form UI (#32734)
  • Always show gantt and code tabs (#33029)
  • Move listener success hook to after SQLAlchemy commit (#32988)
  • Rename db upgrade to db migrate and add connections create-default-connections (#32810, #33136)
  • Remove old gantt chart and redirect to grid views gantt tab (#32908)
  • Adjust graph zoom based on selected task (#32792)
  • Call listener on_task_instance_running after rendering templates (#32716)
  • Display execution_date in graph view task instance tooltip. (#32527)
  • Allow configuration to be contributed by providers (#32604, #32755, #32812)
  • Reduce default for max TIs per query, enforce <= parallelism (#32572)
  • Store config description in Airflow configuration object (#32669)
  • Use isdisjoint instead of not intersection (#32616)
  • Speed up calculation of leaves and roots for task groups (#32592)
  • Kubernetes Executor Load Time Optimizations (#30727)
  • Save DAG parsing time if dag is not schedulable ...

Apache Airflow 2.6.3

10 Jul 23:02
2.6.3
eb24742

Bug Fixes

  • Use linear time regular expressions (#32303)
  • Fix triggerers alive check and add a new conf for triggerer heartbeat rate (#32123)
  • Catch the exception that triggerer initialization failed (#31999)
  • Hide sensitive values from extra in connection edit form (#32309)
  • Sanitize DagRun.run_id and allow flexibility (#32293)
  • Add triggerer canceled log (#31757)
  • Fix try number shown in the task view (#32361)
  • Retry transactions on occasional deadlocks for rendered fields (#32341)
  • Fix behaviour of LazyDictWithCache when import fails (#32248)
  • Remove executor_class from Job - fixing backfill for custom executors (#32219)
  • Fix bugged singleton implementation (#32218)
  • Use mapIndex to display extra links per mapped task. (#32154)
  • Ensure that main triggerer thread exits if the async thread fails (#32092)
  • Use re2 for matching untrusted regex (#32060)
  • Render list items in rendered fields view (#32042)
  • Fix hashing of dag_dependencies in serialized dag (#32037)
  • Return None if an XComArg fails to resolve in a multiple_outputs Task (#32027)
  • Check for DAG ID in query param from url as well as kwargs (#32014)
  • Flash an error message instead of failure in rendered-templates when map index is not found (#32011)
  • Fix ExternalTaskSensor when there is no task group TIs for the current execution date (#32009)
  • Fix number param html type in trigger template (#31980, #31946)
  • Fix masking nested variable fields (#31964)
  • Fix operator_extra_links property serialization in mapped tasks (#31904)
  • Decode old-style nested Xcom value (#31866)
  • Add a check for trailing slash in webserver base_url (#31833)
  • Fix connection uri parsing when the host includes a scheme (#31465)
  • Fix database session closing with xcom_pull and inlets (#31128)
  • Fix DAG's on_failure_callback is not invoked when task failed during testing dag. (#30965)
  • Fix airflow module version check when using ExternalPythonOperator and debug logging level (#30367)

Misc/Internal

  • Fix task.sensor annotation in type stub (#31954)
  • Limit Pydantic to < 2.0.0 until we solve 2.0.0 incompatibilities (#32312)
  • Fix Pydantic 2 pickiness about model definition (#32307)

Doc only changes

  • Add explanation about tag creation and cleanup (#32406)
  • Minor updates to docs (#32369, #32315, #32310, #31794)
  • Clarify Listener API behavior (#32269)
  • Add information for users who ask for requirements (#32262)
  • Add links to DAGRun / DAG / Task in Templates Reference (#32245)
  • Add comment to warn off a potential wrong fix (#32230)
  • Add a note that we'll need to restart triggerer to reflect any trigger change (#32140)
  • Adding missing hyperlink to the tutorial documentation (#32105)
  • Added difference between Deferrable and Non-Deferrable Operators (#31840)
  • Add comments explaining need for special "trigger end" log message (#31812)
  • Documentation update on Plugin updates. (#31781)
  • Fix SemVer link in security documentation (#32320)
  • Update security model of Airflow (#32098)
  • Update references to restructured documentation from Airflow core (#32282)
  • Separate out advanced logging configuration (#32131)
  • Add to Airflow in prominent places (#31977)

Apache Airflow Helm Chart 1.10.0

27 Jun 14:34
helm-chart/1.10.0

Significant Changes

Default Airflow image is updated to 2.6.2 (#31979)

The default Airflow image that is used with the Chart is now 2.6.2, previously it was 2.5.3.

New Features

  • Add support for container security context (#31043)

Improvements

  • Validate executor and config.core.executor match (#30693)
  • Support minAvailable property for PodDisruptionBudget (#30603)
  • Add volumeMounts to dag processor waitForMigrations (#30990)
  • Template extra volumes (#30773)

Bug Fixes

  • Fix webserver probes timeout and period (#30609)
  • Add missing waitForMigrations for workers (#31625)
  • Add missing priorityClassName to K8S worker pod template (#31328)
  • Adding log groomer sidecar to dag processor (#30726)
  • Do not propagate global security context to statsd and redis (#31865)

Misc

  • Default Airflow version to 2.6.2 (#31979)
  • Use template comments for the chart license header (#30569)
  • Align apiVersion and kind order in chart templates (#31850)
  • Cleanup Kubernetes < 1.23 support (#31847)

Apache Airflow 2.6.2

17 Jun 10:19
d2f0d10

Bug Fixes

  • Cascade update of TaskInstance to TaskMap table (#31445)
  • Fix Kubernetes executors detection of deleted pods (#31274)
  • Use keyword parameters for migration methods for mssql (#31309)
  • Control permissibility of driver config in extra from airflow.cfg (#31754)
  • Fixing broken links in openapi/v1.yaml (#31619)
  • Hide old alert box when testing connection with different value (#31606)
  • Add TriggererStatus to OpenAPI spec (#31579)
  • Resolving issue where Grid won't un-collapse when Details is collapsed (#31561)
  • Fix sorting of tags (#31553)
  • Add the missing map_index to the xcom key when skipping downstream tasks (#31541)
  • Fix airflow users delete CLI command (#31539)
  • Include triggerer health status in Airflow /health endpoint (#31529)
  • Remove dependency already registered for this task warning (#31502)
  • Use kube_client over default CoreV1Api for deleting pods (#31477)
  • Ensure min backoff in base sensor is at least 1 (#31412)
  • Fix max_active_tis_per_dagrun for Dynamic Task Mapping (#31406)
  • Fix error handling when pre-importing modules in DAGs (#31401)
  • Fix dropdown default and adjust tutorial to use 42 as default for proof (#31400)
  • Fix crash when clearing run with task from normal to mapped (#31352)
  • Make BaseJobRunner a generic on the job class (#31287)
  • Fix url_for_asset fallback and 404 on DAG Audit Log (#31233)
  • Don't present an undefined execution date (#31196)
  • Added spinner activity while the logs load (#31165)
  • Include rediss to the list of supported URL schemes (#31028)
  • Optimize scheduler by skipping "non-schedulable" DAGs (#30706)
  • Save scheduler execution time during search for queued dag_runs (#30699)
  • Fix ExternalTaskSensor to work correctly with task groups (#30742)
  • Fix DAG.access_control can't sync when clean access_control (#30340)
  • Fix failing get_safe_url tests for latest Python 3.8 and 3.9 (#31766)
  • Fix typing for POST user endpoint (#31767)
  • Fix wrong update for nested group default args (#31776)
  • Fix overriding default_args in nested task groups (#31608)
  • Mark [secrets] backend_kwargs as a sensitive config (#31788)
  • Executor events are not always "exited" here (#30859)
  • Validate connection IDs (#31140)

Misc/Internal

  • Add Python 3.11 support (#27264)
  • Replace unicodecsv with standard csv library (#31693)
  • Bring back unicodecsv as dependency of Airflow (#31814)
  • Remove found_descendents param from get_flat_relative_ids (#31559)
  • Fix typing in external task triggers (#31490)
  • Wording the next and last run DAG columns better (#31467)
  • Skip auto-document things with :meta private: (#31380)
  • Add an example for sql_alchemy_connect_args conf (#31332)
  • Convert dask upper-binding into exclusion (#31329)
  • Upgrade FAB to 4.3.1 (#31203)
  • Added metavar and choices to --state flag in airflow dags list-jobs CLI for suggesting valid state arguments. (#31308)
  • Use only one line for tmp dir log (#31170)
  • Rephrase comment in setup.py (#31312)
  • Add fullname to owner on logging (#30185)
  • Make connection id validation consistent across interface (#31282)
  • Use single source of truth for sensitive config items (#31820)

Doc only changes

  • Add docstring and signature for _read_remote_logs (#31623)
  • Remove note about triggerer being 3.7+ only (#31483)
  • Fix version support information (#31468)
  • Add missing BashOperator import to documentation example (#31436)
  • Fix task.branch error caused by incorrect initial parameter (#31265)
  • Update callbacks documentation (errors and context) (#31116)
  • Add an example for dynamic task mapping with non-TaskFlow operator (#29762)
  • Few doc fixes - links, grammar and wording (#31719)
  • Add description in a few more places about adding airflow to pip install (#31448)
  • Fix table formatting in docker build documentation (#31472)
  • Update documentation for constraints installation (#31882)

Apache Airflow 2.6.1

16 May 14:38
2.6.1
58fca5e

Significant Changes

Clarifications of the external Health Check mechanism and using Job classes (#31277).

In the past, SchedulerJob and other *Job classes are known to have been used to perform
external health checks for Airflow components. Those are, however, Airflow DB ORM related classes.
The DB models and database structure of Airflow are considered an internal implementation detail (following the
public interface, https://airflow.apache.org/docs/apache-airflow/stable/public-airflow-interface.html).
Therefore, they should not be used for external health checks. Instead, you should use the
airflow jobs check CLI command (introduced in Airflow 2.1) for that purpose.

Bug Fixes

  • Fix calculation of health check threshold for SchedulerJob (#31277)
  • Fix timestamp parse failure for k8s executor pod tailing (#31175)
  • Make sure that DAG processor job row has filled value in job_type column (#31182)
  • Fix section name reference for api_client_retry_configuration (#31174)
  • Ensure the KPO runs pod mutation hooks correctly (#31173)
  • Remove worrying log message about redaction from the OpenLineage plugin (#31149)
  • Move interleave_timestamp_parser config to the logging section (#31102)
  • Ensure that we check worker for served logs if no local or remote logs found (#31101)
  • Fix MappedTaskGroup import in taskinstance file (#31100)
  • Format DagBag.dagbag_report() Output (#31095)
  • Mask task attribute on task detail view (#31125)
  • Fix template error when iterating None value and fix params documentation (#31078)
  • Fix apache-hive extra so it installs the correct package (#31068)
  • Fix issue with zip files in DAGs folder when pre-importing Airflow modules (#31061)
  • Move TaskInstanceKey to a separate file to fix circular import (#31033, #31204)
  • Fix deleting DagRuns and TaskInstances that have a note (#30987)
  • Fix airflow providers get command output (#30978)
  • Fix Pool schema in the OpenAPI spec (#30973)
  • Add support for dynamic tasks with template fields that contain pandas.DataFrame (#30943)
  • Use the Task Group explicitly passed to 'partial' if any (#30933)
  • Fix order_by request in list DAG rest api (#30926)
  • Include node height/width in center-on-task logic (#30924)
  • Remove print from dag trigger command (#30921)
  • Improve task group UI in new graph (#30918)
  • Fix mapped states in grid view (#30916)
  • Fix problem with displaying graph (#30765)
  • Fix backfill KeyError when try_number out of sync (#30653)
  • Re-enable clear and setting state in the TaskInstance UI (#30415)
  • Prevent DagRun's state and start_date from being reset when clearing a task in a running DagRun (#30125)

Misc/Internal

  • Upper bind dask until they solve a side effect in their test suite (#31259)
  • Show task instances affected by clearing in a table (#30633)
  • Fix missing models in API documentation (#31021)

Doc only changes

  • Improve description of the dag_processing.processes metric (#30891)
  • Improve Quick Start instructions (#30820)
  • Add section about missing task logs to the FAQ (#30717)
  • Mount the config directory in docker compose (#30662)
  • Update version_added config field for might_contain_dag and metrics_allow_list (#30969)

Apache Airflow 2.6.0

30 Apr 13:06
2.6.0
ab54c63

Significant Changes

Default permissions of file task handler log directories and files have been changed to "owner + group" writeable (#29506).

The default setting handles the case where impersonation is needed and both users (airflow and the impersonated user)
have the same group set as their main group. Previously the default was also other-writeable; users who want the
other-writeable setting can still choose it by configuring file_task_handler_new_folder_permissions
and file_task_handler_new_file_permissions in the logging section.

SLA callbacks no longer add files to the dag processor manager's queue (#30076)

This stops SLA callbacks from keeping the dag processor manager permanently busy. It means reduced CPU,
and fixes issues where SLAs stop the system from seeing changes to existing dag files. Additional metrics added to help track queue state.

The cleanup() method in BaseTrigger is now defined as asynchronous (following the async/await pattern) (#30152).

This is potentially a breaking change for any custom trigger implementations that override the cleanup()
method and use synchronous code. However, using synchronous operations in cleanup was technically wrong,
because the method was executed in the main loop of the Triggerer and it was introducing unnecessary delays
impacting other triggers. The change is unlikely to affect any existing trigger implementations.
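
A custom trigger that previously did blocking work in cleanup() could be adapted along these lines. This is a sketch under assumptions: TriggerLike is a hypothetical stand-in for the real base class (only the asynchronous cleanup() is modelled), and offloading to a thread via run_in_executor is one possible approach, not something the change prescribes.

```python
import asyncio
import time


class TriggerLike:
    """Hypothetical stand-in for a trigger base class."""

    async def cleanup(self):
        """Default implementation: nothing to clean up."""


class ResourceTrigger(TriggerLike):
    def __init__(self):
        self.cleaned_up = False

    def _blocking_release(self):
        # Stands in for slow synchronous work, e.g. closing a client.
        time.sleep(0.01)
        self.cleaned_up = True

    async def cleanup(self):
        # Run the blocking part in a worker thread so the event loop
        # (shared with other triggers) is not stalled.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, self._blocking_release)


trigger = ResourceTrigger()
asyncio.run(trigger.cleanup())
print(trigger.cleaned_up)
```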

The gauge scheduler.tasks.running no longer exists (#30374)

The gauge never worked and its value was always 0. Computing an accurate
value for this metric is complex, so the gauge was removed rather than
fixed with no certainty that its value would be correct.

Consolidate handling of tasks stuck in queued under new task_queued_timeout config (#30375)

Logic for handling tasks stuck in the queued state has been consolidated, and all the configurations
responsible for timing out stuck queued tasks have been deprecated and merged into
[scheduler] task_queued_timeout. The configurations that have been deprecated are
[kubernetes] worker_pods_pending_timeout, [celery] stalled_task_timeout, and
[celery] task_adoption_timeout. If any of these configurations are set, the longest timeout will be
respected. For example, if [celery] stalled_task_timeout is 1200, and [scheduler] task_queued_timeout
is 600, Airflow will set [scheduler] task_queued_timeout to 1200.
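
The "longest timeout wins" rule described above can be sketched as follows. The function name and signature are illustrative only and are not Airflow's implementation.

```python
from typing import Optional


def effective_task_queued_timeout(
    task_queued_timeout: float,
    worker_pods_pending_timeout: Optional[float] = None,
    stalled_task_timeout: Optional[float] = None,
    task_adoption_timeout: Optional[float] = None,
) -> float:
    """Return the longest timeout among the new option and any deprecated ones that are set."""
    deprecated = [
        worker_pods_pending_timeout,
        stalled_task_timeout,
        task_adoption_timeout,
    ]
    # Unset deprecated options (None) are ignored.
    candidates = [task_queued_timeout] + [t for t in deprecated if t is not None]
    return max(candidates)


# The example from the text: stalled_task_timeout=1200, task_queued_timeout=600.
print(effective_task_queued_timeout(600, stalled_task_timeout=1200))  # 1200
```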

Improvement Changes

Display only the running configuration in configurations view (#28892)

The configurations view now only displays the running configuration. Previously, the default configuration
was displayed at the top but it was not obvious whether this default configuration was overridden or not.
Consequently, the undocumented endpoint /configuration?raw=true is deprecated and will be removed in
Airflow 3.0. The HTTP response now returns an additional Deprecation header. The /config endpoint on
the REST API is the standard way to fetch Airflow configuration programmatically.

Explicit skipped states list for ExternalTaskSensor (#29933)

ExternalTaskSensor now has an explicit skipped_states list.

Miscellaneous Changes

Handle OverflowError on exponential backoff in next_run_calculation (#28172)

The maximum task retry delay defaults to 24h (86400s). You can change it globally via the core.max_task_retry_delay
parameter.
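
To show why a cap helps here, the following simplified sketch computes an exponentially growing delay and falls back to the maximum when the arithmetic overflows. It is an illustration under assumed names (capped_retry_delay, base_delay_s), not Airflow's actual retry calculation.

```python
def capped_retry_delay(
    base_delay_s: float,
    try_number: int,
    max_delay_s: float = 86400.0,  # 24h, the default mentioned above
) -> float:
    """Exponential backoff that never exceeds max_delay_s."""
    try:
        # Float exponentiation raises OverflowError for very large try numbers.
        delay = base_delay_s * (2.0 ** (try_number - 1))
    except OverflowError:
        delay = max_delay_s
    return min(delay, max_delay_s)


print(capped_retry_delay(300, 1))     # 300.0
print(capped_retry_delay(300, 3))     # 1200.0
print(capped_retry_delay(300, 5000))  # 86400.0
```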

Move Hive macros to the provider (#28538)

The Hive Macros (hive.max_partition, hive.closest_ds_partition) are available only when Hive Provider is
installed. Please install Hive Provider > 5.1.0 when using those macros.

New Features

  • Skip PythonVirtualenvOperator task when it returns a provided exit code (#30690)
  • Rename skip_exit_code to skip_on_exit_code and allow providing multiple codes (#30692)
  • Add skip_on_exit_code also to ExternalPythonOperator (#30738)
  • Add max_active_tis_per_dagrun for Dynamic Task Mapping (#29094)
  • Add serializer for pandas dataframe (#30390)
  • Deferrable TriggerDagRunOperator (#30292)
  • Add command to get DAG Details via CLI (#30432)
  • Adding ContinuousTimetable and support for @continuous schedule_interval (#29909)
  • Allow customized rules to check if a file has dag (#30104)
  • Add a new Airflow conf to specify a SSL ca cert for Kubernetes client (#30048)
  • Bash sensor has an explicit retry code (#30080)
  • Add filter task upstream/downstream to grid view (#29885)
  • Add testing a connection via Airflow CLI (#29892)
  • Support deleting the local log files when using remote logging (#29772)
  • Blocklist to disable specific metric tags or metric names (#29881)
  • Add a new graph inside of the grid view (#29413)
  • Add database check_migrations config (#29714)
  • Add output format arg for cli.dags.trigger (#29224)
  • Make json and yaml available in templates (#28930)
  • Enable tagged metric names for existing Statsd metric publishing events | influxdb-statsd support (#29093)
  • Add arg --yes to db export-archived command. (#29485)
  • Make the policy functions pluggable (#28558)
  • Add airflow db drop-archived command (#29309)
  • Enable individual trigger logging (#27758)
  • Implement new filtering options in graph view (#29226)
  • Add triggers for ExternalTask (#29313)
  • Add command to export purged records to CSV files (#29058)
  • Add FileTrigger (#29265)
  • Emit DataDog statsd metrics with metadata tags (#28961)
  • Add some statsd metrics for dataset (#28907)
  • Add --overwrite option to connections import CLI command (#28738)
  • Add general-purpose "notifier" concept to DAGs (#28569)
  • Add a new conf to wait past_deps before skipping a task (#27710)
  • Add Flink on K8s Operator (#28512)
  • Allow Users to disable SwaggerUI via configuration (#28354)
  • Show mapped task groups in graph (#28392)
  • Log FileTaskHandler to work with KubernetesExecutor's multi_namespace_mode (#28436)
  • Add a new config for adapting masked secrets to make it easier to prevent secret leakage in logs (#28239)
  • List specific config section and its values using the cli (#28334)
  • KubernetesExecutor multi_namespace_mode can use namespace list to avoid requiring cluster role (#28047)
  • Automatically save and allow restore of recent DAG run configs (#27805)
  • Added exclude_microseconds to cli (#27640)

Improvements

  • Rename most pod_id usage to pod_name in KubernetesExecutor (#29147)
  • Update the error message for invalid use of poke-only sensors (#30821)
  • Update log level in scheduler critical section edge case (#30694)
  • AIP-51 Removing Executor Coupling from Core Airflow (AIP-51: https://github.com/apache/airflow/pulls?q=is%3Apr+is%3Amerged+label%3AAIP-51+milestone%3A%22Airflow+2.6.0%22)
  • Add multiple exit code handling in skip logic for BashOperator (#30739)
  • Updated app to support configuring the caching hash method for FIPS v2 (#30675)
  • Preload airflow imports before dag parsing to save time (#30495)
  • Improve task & run actions UX in grid view (#30373)
  • Speed up TaskGroups with caching property of group_id (#30284)
  • Use the engine provided in the session (#29804)
  • Type related import optimization for Executors (#30361)
  • Add more type hints to the code base (#30503)
  • Always use self.appbuilder.get_session in security managers (#30233)
  • Update SQLAlchemy select() to new style (#30515)
  • Refactor out xcom constants from models (#30180)
  • Add exception class name to DAG-parsing error message (#30105)
  • Rename statsd_allow_list and statsd_block_list to metrics_*_list (#30174)
  • Improve serialization of tuples and sets (#29019)
  • Make cleanup method in trigger an async one (#30152)
  • Lazy load serialization modules (#30094)
  • SLA callbacks no longer add files to the dag_processing manager queue (#30076)
  • Add task.trigger rule to grid_data (#30130)
  • Speed up log template sync by avoiding ORM (#30119)
  • Separate cli_parser.py into two modules (#29962)
  • Explicit skipped states list for ExternalTaskSensor (#29933)
  • Add task state hover highlighting to new graph (#30100)
  • Store grid tabs in url params (#29904)
  • Use custom Connexion resolver to load lazily (#29992)
  • Delay Kubernetes import in secret masker (#29993)
  • Delay ConnectionModelView init until it's accessed (#29946)
  • Scheduler, make stale DAG deactivation threshold configurable instead of using dag processing timeout (#29446)
  • Improve grid view height calculations (#29563)
  • Avoid importing executor during conf validation (#29569)
  • Make permissions for FileTaskHandler group-writeable and configurable (#29506)
  • Add colors in help outputs of Airflow CLI commands #28789 (#29116)
  • Add a param for get_dags endpoint to list only unpaused dags (#28713)
  • Expose updated_at filter for dag run and task instance endpoints (#28636)
  • Increase length of user identifier columns (#29061)
  • Update gantt chart UI to display queued state of tasks (#28686)
  • Add index on log.dttm (#28944)
  • Display only the running configuration in configurations view (#28892)
  • Cap dropdown menu size dynamically (#28736)
  • Added JSON linter to connection edit / add UI for field extra. On connection edit screen, existing extra data will be displayed indented (#28583)
  • Use labels instead of pod name for pod log read in k8s exec (#28546)
  • Use time not tries for queued & running re-checks. (#28586)
  • CustomTTYColoredFormatter should inherit TimezoneAware formatter (#28439)
  • Improve past depends handling in Airflow CLI tasks.run command (#28113)
  • Support using a list of callbacks in on_*_callback/sla_miss_callbacks (#28469)
  • Better table name validation for db clean (#28246)
  • Use object instead of array in config.yml for config template (#28417)
  • Add markdown rendering for task notes. (#28245)
  • Show mapped task groups in grid view (#28208)
  • Add renamed and ``pre...

Apache Airflow Helm Chart 1.9.0

14 Apr 21:02
helm-chart/1.9.0

Significant Changes

Default PgBouncer and PgBouncer Exporter images have been updated (#29919)

The PgBouncer and PgBouncer Exporter images are based on newer software/OS. They are also multi-platform AMD/ARM images:

  • pgbouncer: 1.16.1 based on alpine 3.14 (airflow-pgbouncer-2023.02.24-1.16.1)
  • pgbouncer-exporter: 0.14.0 based on alpine 3.17 (apache/airflow:airflow-pgbouncer-exporter-2023.02.21-0.14.0)

Default Airflow image is updated to 2.5.3 (#30411)

The default Airflow image that is used with the Chart is now 2.5.3, previously it was 2.5.1.

New Features

  • Add support for hostAliases for Airflow webserver and scheduler (#30051)
  • Add support for annotations on StatsD Deployment and cleanup CronJob (#30126)
  • Add support for annotations in logs PVC (#29270)
  • Add support for annotations in extra ConfigMap and Secrets (#30303)
  • Add support for pod annotations to PgBouncer (#30168)
  • Add support for ttlSecondsAfterFinished on migrateDatabaseJob and createUserJob (#29314)
  • Add support for using SHA digest of Docker images (#30214)

Improvements

  • Template extra volumes in Helm Chart (#29357)
  • Make Liveness/Readiness Probe timeouts configurable for PgBouncer Exporter (#29752)
  • Enable individual trigger logging (#29482)

Bug Fixes

  • Add config.kubernetes_executor to values (#29818)
  • Block extra properties in image config (#30217)
  • Remove replicas if KEDA is enabled (#29838)
  • Mount kerberos.keytab to worker when enabled (#29526)
  • Fix adding annotations for dag persistence PVC (#29622)
  • Fix bitnami/postgresql default username and password (#29478)
  • Add global volumes in pod template file (#29295)
  • Add log groomer sidecar to triggerer service (#29392)
  • Helm deployment fails when postgresql.nameOverride is used (#29214)

Doc only changes

  • Add gitSync optional env description (#29378)
  • Add webserver NodePort example (#29460)
  • Include Rancher in Helm chart install instructions (#28416)
  • Change RSA SSH host key to reflect update from GitHub (#30286)

Misc

  • Update Airflow version to 2.5.3 (#30411)
  • Switch to newer versions of PgBouncer and PgBouncer Exporter in chart (#29919)
  • Reformat chart templates (#29917)
  • Reformat chart templates part 2 (#29941)
  • Reformat chart templates part 3 (#30312)
  • Replace deprecated k8s registry references (#29938)
  • Fix airflow_dags_mount formatting (#29296)
  • Fix webserver.service.ports formatting (#29297)

Apache Airflow 2.5.3

01 Apr 09:33
2.5.3
cb842dd

Significant Changes

No significant changes.

Bug Fixes

  • Fix DagProcessorJob integration for standalone dag-processor (#30278)
  • Fix proper termination of gunicorn when it hangs (#30188)
  • Fix XCom.get_one exactly one exception text (#30183)
  • Correct the VARCHAR size to 250. (#30178)
  • Revert fix for on_failure_callback when task receives a SIGTERM (#30165)
  • Move read only property to DagState to fix generated docs (#30149)
  • Ensure that dag.partial_subset doesn't mutate task group properties (#30129)
  • Fix inconsistent returned value of airflow dags next-execution cli command (#30117)
  • Fix www/utils.dag_run_link redirection (#30098)
  • Fix TriggerRuleDep when the mapped tasks count is 0 (#30084)
  • Dag processor manager, add retry_db_transaction to _fetch_callbacks (#30079)
  • Fix db clean command for mysql db (#29999)
  • Avoid considering EmptyOperator in mini scheduler (#29979)
  • Fix some long known Graph View UI problems (#29971, #30355, #30360)
  • Fix dag docs toggle icon initial angle (#29970)
  • Fix tags selection in DAGs UI (#29944)
  • Including airflow/example_dags/sql/sample.sql in MANIFEST.in (#29883)
  • Fixing broken filter in /taskinstance/list view (#29850)
  • Allow generic param dicts (#29782)
  • Fix update_mask in patch variable route (#29711)
  • Strip markup from app_name if instance_name_has_markup = True (#28894)

Misc/Internal

  • Revert "Also limit importlib on Python 3.9 (#30069)" (#30209)
  • Add custom_operator_name to @task.sensor tasks (#30131)
  • Bump webpack from 5.73.0 to 5.76.0 in /airflow/www (#30112)
  • Formatted config (#30103)
  • Remove upper bound limit of astroid (#30033)
  • Remove accidentally merged vendor daemon patch code (#29895)
  • Fix warning in airflow tasks test command regarding absence of data_interval (#27106)

Doc only changes

  • Adding more information regarding top level code (#30040)
  • Update workday example (#30026)
  • Fix some typos in the DAGs docs (#30015)
  • Update set-up-database.rst (#29991)
  • Fix some typos on the kubernetes documentation (#29936)
  • Fix some punctuation and grammar (#29342)