Skip to content

Add prek hook to enforce keyword-only session on @provide_session#67150

Open
jason810496 wants to merge 9 commits into
apache:mainfrom
jason810496:prek-hook-provide-session-kwargs
Open

Add prek hook to enforce keyword-only session on @provide_session#67150
jason810496 wants to merge 9 commits into
apache:mainfrom
jason810496:prek-hook-provide-session-kwargs

Conversation

@jason810496
Copy link
Copy Markdown
Member

Why

The project convention is that any function decorated with @provide_session must declare session as keyword-only (after a bare *), so callers cannot pass it positionally by accident. Today this is only documented in contributing-docs/05_pull_requests.rst#database-session-handling, so regressions slip in during review.

What

  • New prek hook check-no-new-provide-session-positional scans changed files (or --all-files) for @provide_session functions whose session parameter is still positional.
  • Existing offenders are recorded in scripts/ci/prek/known_provide_session_positional.txt as path::N entries (max count per file). The hook fails when a file exceeds its limit, and auto-tightens the entry when the count decreases so the allowlist only shrinks over time. (Following convention of build(prek-hook): Check whether new "raise AirflowException" is added #55416 )
  • Supports --all-files, --cleanup (drop stale entries), and --generate (rebuild the allowlist from scratch).

Was generative AI tooling used to co-author this PR?

@boring-cyborg boring-cyborg Bot added area:dev-tools backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch labels May 19, 2026
@jason810496 jason810496 requested a review from Copilot May 19, 2026 04:51
@jason810496 jason810496 self-assigned this May 19, 2026
@jason810496 jason810496 removed the backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch label May 19, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new prek/pre-commit hook intended to prevent regressions where functions decorated with @provide_session accept session positionally, enforcing the documented convention that session should be keyword-only.

Changes:

  • Add check_provide_session_kwargs.py prek hook that AST-scans Python files for @provide_session functions with positional session.
  • Introduce a shrinking allowlist (known_provide_session_positional.txt) to grandfather existing violations while preventing increases.
  • Add unit tests for decorator detection, violation counting, and allowlist tightening/cleanup behavior; wire the hook into .pre-commit-config.yaml.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
scripts/ci/prek/check_provide_session_kwargs.py Implements the new AST-based checker, allowlist manager, and CLI modes (--all-files, --cleanup, --generate).
scripts/ci/prek/known_provide_session_positional.txt Initial allowlist of existing positional-session offenders with per-file maximum counts.
scripts/tests/ci/prek/test_check_provide_session_kwargs.py Unit tests for the new hook’s core logic and allowlist behavior.
.pre-commit-config.yaml Registers the new hook so it runs on relevant Python paths (and the allowlist/script paths).

Comment thread scripts/ci/prek/check_provide_session_kwargs.py Outdated
Comment thread scripts/ci/prek/check_provide_session_kwargs.py
Comment thread scripts/ci/prek/check_provide_session_kwargs.py Outdated
Comment thread scripts/tests/ci/prek/test_check_provide_session_kwargs.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread scripts/ci/prek/check_provide_session_kwargs.py Outdated
Comment thread scripts/tests/ci/prek/test_check_provide_session_kwargs.py
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread scripts/ci/prek/check_provide_session_kwargs.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment thread scripts/tests/ci/prek/test_check_provide_session_kwargs.py
Comment thread scripts/ci/prek/check_provide_session_kwargs.py
Comment thread scripts/ci/prek/check_provide_session_kwargs.py
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread scripts/ci/prek/check_provide_session_kwargs.py Outdated
@jason810496 jason810496 force-pushed the prek-hook-provide-session-kwargs branch from e8fdddb to cccdbc6 Compare May 19, 2026 06:12
@jason810496 jason810496 requested a review from Copilot May 19, 2026 06:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread scripts/tests/ci/prek/test_check_provide_session_kwargs.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread scripts/ci/prek/check_provide_session_kwargs.py
Comment thread scripts/tests/ci/prek/test_check_provide_session_kwargs.py
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

@jason810496 jason810496 marked this pull request as ready for review May 19, 2026 07:28
- Detect positional-only `session` parameters (`def f(session, /, ...)`)
  by also scanning `args.posonlyargs`.
- Close the allowlist-edit bypass: when
  `known_provide_session_positional.txt` is among the changed files, the
  hook now re-validates every listed file so contributors cannot raise
  counts without triggering the check.
- Clarify that `--all-files` / `--generate` scan the project source
  roots (airflow-core, airflow-ctl, task-sdk, providers, shared), not
  the entire repository, and centralise that list in
  `_PROJECT_SOURCE_ROOTS`.
- Update the test annotation for `find_violations` to include
  `ast.AsyncFunctionDef` so async functions are typed correctly.
- Add tests covering positional-only detection and the allowlist-edit
  re-validation path.
Resolve both sides of the allowlist-file comparison in
`_expand_for_allowlist_edits` so the detection is robust to symlinks
and to callers passing unresolved paths. Without this, a symlinked or
non-normalized invocation could skip the re-validation path and let a
loosened allowlist sail through.

Update tests to pass resolved paths (matching what `main()` does in
production) and add a dedicated test that points the helper at a
symlink to the allowlist file.
The previous message said "Scanning <REPO_ROOT>" but the helper only
walks the project source roots
(airflow-core, airflow-ctl, task-sdk, providers, shared). Spell out
those roots in the log so the output matches what is actually scanned.
- Reject allowlist entries that escape `REPO_ROOT` (absolute paths or
  `..` segments) at load time. Without this guard, `relative_to()` in
  the violation reporter could crash on a maliciously or accidentally
  malformed allowlist line. Skipped entries are logged.
- Detect when the allowlist file itself is among the changed paths but
  no longer exists on disk (deletion or rename in the same commit) and
  fail with a clear "restore or regenerate" message instead of silently
  passing.
- Move the `check_provide_session_kwargs` module alias import to the
  top of the test module rather than inside the `fake_repo` fixture.

Tests cover the missing-file failure path and the unsafe-entry skip.
When only the allowlist file is changed, the hook now unions the previous
allowlist (from `git show HEAD:<file>`) with the current one, so removing
an entry can't silently drop coverage for a file that still has positional
`session` arguments.
@jason810496 jason810496 force-pushed the prek-hook-provide-session-kwargs branch from 184add7 to 19f2e2a Compare May 22, 2026 08:39
return 0

def cleanup(self) -> int:
allowlist = self.load()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by loading, I thought we're to save it as a class vairable 🤔

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can refactor as you mentioned.

@@ -0,0 +1,89 @@
airflow-core/src/airflow/api/common/delete_dag.py::1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these things we want to exclude or do? Or both?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will prefer to keep the existing one and avoid the mistake usage in the future.
Most of them should be fine to fix as keyword-only, but not sure will we introduce regression by changing it to keyword-only.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, not asking for doing the change. just want to make sure whether we have understanding on these (perfectly fine if we're not)

Comment thread .pre-commit-config.yaml
entry: ./scripts/ci/prek/check_provide_session_kwargs.py
language: python
pass_filenames: true
files: ^(airflow-core|airflow-ctl|task-sdk|providers|shared)/.*\.py$|^scripts/ci/prek/known_provide_session_positional\.txt$|^scripts/ci/prek/check_provide_session_kwargs\.py$
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we're not suppose to have provide_session in task-sdk

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like so.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants