Skip to content

Comments

providers-fab: Handle database errors in cleanup_session_middleware Session.remove()#62336

Merged
vincbeck merged 7 commits intoapache:mainfrom
8silvergun:fix/fab-cleanup-session-middleware-error-handling
Feb 24, 2026
Merged

providers-fab: Handle database errors in cleanup_session_middleware Session.remove()#62336
vincbeck merged 7 commits intoapache:mainfrom
8silvergun:fix/fab-cleanup-session-middleware-error-handling

Conversation

@8silvergun
Copy link
Contributor

@8silvergun 8silvergun commented Feb 23, 2026

Summary

Fix unhandled MySQLdb.OperationalError: (2006, 'Server has gone away') in cleanup_session_middleware when Session.remove() encounters a dead database connection.

Closes: #62335

Problem

PR #61480 introduced cleanup_session_middleware in providers-fab 3.3.0 to fix PendingRollbackError (#59349). The middleware calls Session.remove() in a bare finally block:

finally:
    from airflow import settings
    if settings.Session:
        settings.Session.remove()  # can raise if DB connection is dead

When the underlying database connection has been closed server-side (MySQL timeout, Aurora failover, network interruption), Session.remove() internally attempts a ROLLBACK on the dead connection, raising OperationalError. This unhandled exception propagates as a 500 Internal Server Error — even though the original request completed successfully.

Production error log:

[error] Exception in ASGI application [airflow.providers.fab.auth_manager.fab_auth_manager] loc=fab_auth_manager.py:243
  File ".../fab_auth_manager.py", line 243, in cleanup_session_middleware
    settings.Session.remove()
  ...
MySQLdb.OperationalError: (2006, 'Server has gone away')

Solution

Wrap Session.remove() in a try-except that catches and logs the error as a warning:

finally:
    from airflow import settings
    if settings.Session:
        try:
            settings.Session.remove()
        except Exception:
            log.warning("Failed to remove session during cleanup", exc_info=True)

This is consistent with session cleanup patterns elsewhere in Airflow (e.g., airflow/utils/session.py).

Why this is safe

  • Session.remove() is a cleanup operation — if it fails because the connection is already dead, the session will be discarded anyway on the next request
  • The warning log preserves visibility for debugging
  • The except Exception is intentionally broad since any error during cleanup should not affect the HTTP response

Testing

Added TestFabAuthManagerSessionCleanupErrorHandling with 2 tests:

  • test_session_remove_db_error_does_not_propagate: Verifies OperationalError (MySQL 'Server has gone away') is caught
  • test_session_remove_generic_error_does_not_propagate: Verifies other exceptions (e.g., RuntimeError) are also caught

Related

AI Disclosure

This PR was developed with AI assistance.

When Session.remove() is called in the finally block of
cleanup_session_middleware, it may raise an OperationalError
if the underlying database connection is already dead (e.g.,
MySQL 'Server has gone away', Aurora failover, network timeout).

The unhandled exception propagates as a 500 Internal Server
Error to the client, even when the original request succeeded.

This commit wraps Session.remove() in a try-except block that
catches and logs the error as a warning, consistent with how
session cleanup is handled elsewhere in Airflow.

Fixes follow-up to apache#61480.
@8silvergun 8silvergun requested a review from vincbeck as a code owner February 23, 2026 01:09
@boring-cyborg
Copy link

boring-cyborg bot commented Feb 23, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@vincbeck
Copy link
Contributor

Static checks are failing, running prek should solve them

nanaones and others added 6 commits February 24, 2026 00:29
Define module logger for cleanup warnings and drop unused test response assignment to satisfy ruff.
Move logger definition below imports to satisfy static checks and assert warning logging in session cleanup tests so failures are surfaced clearly.
@vincbeck vincbeck merged commit 856c3ec into apache:main Feb 24, 2026
86 checks passed
@boring-cyborg
Copy link

boring-cyborg bot commented Feb 24, 2026

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

providers-fab: Unhandled MySQLdb.OperationalError in cleanup_session_middleware Session.remove() causes 500 on api-server

3 participants