Fix race condition in auth manager initialization #62431

Open
kimyoungi99 wants to merge 3 commits into apache:main from kimyoungi99:fix/thread-safe-auth-manager-init-v2

kimyoungi99 (Contributor) commented Feb 24, 2026

Closes #61108

This is a follow-up to #62214 (reverted in #62404).

Problem

Concurrent requests to /auth/token cause intermittent 500 errors:

AttributeError: 'AirflowAppBuilder' object has no attribute 'sm'

create_auth_manager() creates a new instance on every call. Under concurrent requests, one thread overwrites _AuthManagerState.instance while the instance created by another thread is still initializing.

Previous approach (#62214) and why it was reverted

The previous fix added purge_cached_app() in get_application_builder(), but that function is called at runtime by FAB FastAPI routes (login, user/role management). Clearing the singleton on every call broke subsequent core API requests with KeyError: 'AUTH_USER_REGISTRATION'.

This fix

  1. create_auth_manager(): Double-checked locking with isinstance validation. The singleton is created once and replaced only when the auth manager class changes (e.g. SimpleAuthManager → FabAuthManager).

  2. init_appbuilder.py: Clears the cached security_manager (@cached_property) when init_app() is called with a new Flask app, so _init_config() runs against the current app context.

No changes to get_application_builder() or test fixtures.

Testing

Added test_create_auth_manager_thread_safety — verifies singleton behavior under 10 concurrent threads.
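
A check of this kind might be structured roughly as follows. This is only a sketch of the general shape, not the test added in the PR: `DemoAuthManager`, `get_manager`, and `collect_instances` are hypothetical names, and the barrier is just one way to make the threads hit the factory at about the same moment.

```python
import threading


class DemoAuthManager:
    """Hypothetical stand-in for an auth manager class."""


_state = {"instance": None}
_lock = threading.Lock()


def get_manager():
    # Double-checked locking: a cheap unlocked check, then a locked re-check.
    if _state["instance"] is None:
        with _lock:
            if _state["instance"] is None:
                _state["instance"] = DemoAuthManager()
    return _state["instance"]


def collect_instances(n_threads=10):
    """Call get_manager() from n_threads threads released together."""
    barrier = threading.Barrier(n_threads)
    results = [None] * n_threads

    def worker(index):
        barrier.wait()  # block until every thread is ready, then go at once
        results[index] = get_manager()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

A test can then assert that every entry in the returned list is the same object.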

eladkal added the area:core and backport-to-v3-1-test labels and removed the area:providers and provider:fab labels on Feb 25, 2026
eladkal added this to the Airflow 3.1.8 milestone on Feb 25, 2026

jason810496 (Member) left a comment

Hi @kimyoungi99, thanks for raising the PR again!

Would you mind starting Airflow locally to verify the system behavior, in case the situation described in #62404 happens again?

If you haven’t installed Breeze yet, you can run: uv tool install -e ./dev/breeze --force
Then run: breeze start-airflow --mount-sources providers-and-tests --auth-manager FabAuthManager to verify that the updated FabAuthManager and FastAPI app work as expected.

Thanks!

kimyoungi99 force-pushed the fix/thread-safe-auth-manager-init-v2 branch from f9e3584 to 9d0d69a on February 26, 2026 08:41

kimyoungi99 (Contributor, Author) commented Feb 26, 2026

Hi @jason810496, thanks for the suggestion!

I ran breeze start-airflow --auth-manager FabAuthManager and verified the system behavior:

Sequential requests — all working correctly:

  • Token generation (POST /auth/token/cli) → 201
  • FAB routes (/auth/fab/v1/users, /auth/fab/v1/roles) → 200
  • Core API (/api/v2/dags) after FAB routes → 200
  • No KeyError: AUTH_USER_REGISTRATION or AttributeError

Concurrent requests — while testing concurrent FAB + Core requests, I discovered an additional race condition in init_app(). get_application_builder() is called per-request by FAB FastAPI routes, and concurrent calls interleave mutations on the singleton auth_manager's appbuilder and security_manager with view registration reads, causing KeyError: 'AUTH_USER_REGISTRATION' and AttributeError: 'NoneType' object has no attribute '__module__'.

Added a new commit (ad1324f) that serializes init_app() with a threading.Lock around the critical section. After the fix, concurrent token generation (5 simultaneous) and mixed FAB + Core requests (5 rounds of 3 concurrent) all return 200/201 consistently.
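
For readers unfamiliar with the pattern, serializing an init routine with a lock and dropping a stale @cached_property can be sketched as below. This is an illustrative toy, not the commit's code: `DemoAppBuilder` and the dictionary it returns are hypothetical, and only the `_init_app_lock` name is taken from the commit message.

```python
import threading
from functools import cached_property


class DemoAppBuilder:
    """Hypothetical stand-in; not Airflow's AirflowAppBuilder."""

    # One lock shared by all callers, so init_app() runs one call at a time.
    _init_app_lock = threading.Lock()

    def __init__(self):
        self.app = None

    @cached_property
    def security_manager(self):
        # Computed from whichever app is current at first access, then cached.
        return {"configured_for": self.app}

    def init_app(self, app):
        # The lock covers both the mutation and the read that follows it, so
        # no other thread can observe a half-updated builder.
        with self._init_app_lock:
            if app is not self.app:
                # cached_property stores its value in the instance __dict__;
                # removing the entry forces recomputation on next access.
                self.__dict__.pop("security_manager", None)
            self.app = app
            return self.security_manager
```

Holding one lock across the whole mutate-then-read sequence trades some concurrency for simplicity. Whether that cost matters depends on how often the init path runs per request.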

kimyoungi99 force-pushed the fix/thread-safe-auth-manager-init-v2 branch 2 times, most recently from d7a80ca to 2f82360 on February 27, 2026 02:41
…races

FAB FastAPI routes call get_application_builder() on every request,
which creates a new Flask app and invokes init_app(). Concurrent calls
race on the singleton auth_manager's appbuilder and security_manager,
causing KeyError: 'AUTH_USER_REGISTRATION' and AttributeError.

Add _init_app_lock around the critical section in init_app() that
mutates the singleton auth_manager state and registers views, so
concurrent get_application_builder() calls are serialized.
kimyoungi99 force-pushed the fix/thread-safe-auth-manager-init-v2 branch from 2f82360 to ad1324f on February 27, 2026 07:57
Labels

area:API, area:core, backport-to-v3-1-test

Development

Successfully merging this pull request may close these issues.

Race condition causes "AirflowAppBuilder has no attribute 'sm'" on concurrent auth requests

4 participants