Add prek hook to enforce keyword-only session on @provide_session#67150
Add prek hook to enforce keyword-only session on @provide_session#67150jason810496 wants to merge 9 commits into
session on @provide_session#67150Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a new prek/pre-commit hook intended to prevent regressions where functions decorated with @provide_session accept session positionally, enforcing the documented convention that session should be keyword-only.
Changes:
- Add
check_provide_session_kwargs.pyprek hook that AST-scans Python files for@provide_sessionfunctions with positionalsession. - Introduce a shrinking allowlist (
known_provide_session_positional.txt) to grandfather existing violations while preventing increases. - Add unit tests for decorator detection, violation counting, and allowlist tightening/cleanup behavior; wire the hook into
.pre-commit-config.yaml.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
scripts/ci/prek/check_provide_session_kwargs.py |
Implements the new AST-based checker, allowlist manager, and CLI modes (--all-files, --cleanup, --generate). |
scripts/ci/prek/known_provide_session_positional.txt |
Initial allowlist of existing positional-session offenders with per-file maximum counts. |
scripts/tests/ci/prek/test_check_provide_session_kwargs.py |
Unit tests for the new hook’s core logic and allowlist behavior. |
.pre-commit-config.yaml |
Registers the new hook so it runs on relevant Python paths (and the allowlist/script paths). |
e8fdddb to
cccdbc6
Compare
df2fab1 to
184add7
Compare
… `session` positionally
- Detect positional-only `session` parameters (`def f(session, /, ...)`) by also scanning `args.posonlyargs`. - Close the allowlist-edit bypass: when `known_provide_session_positional.txt` is among the changed files, the hook now re-validates every listed file so contributors cannot raise counts without triggering the check. - Clarify that `--all-files` / `--generate` scan the project source roots (airflow-core, airflow-ctl, task-sdk, providers, shared), not the entire repository, and centralise that list in `_PROJECT_SOURCE_ROOTS`. - Update the test annotation for `find_violations` to include `ast.AsyncFunctionDef` so async functions are typed correctly. - Add tests covering positional-only detection and the allowlist-edit re-validation path.
Resolve both sides of the allowlist-file comparison in `_expand_for_allowlist_edits` so the detection is robust to symlinks and to callers passing unresolved paths. Without this, a symlinked or non-normalized invocation could skip the re-validation path and let a loosened allowlist sail through. Update tests to pass resolved paths (matching what `main()` does in production) and add a dedicated test that points the helper at a symlink to the allowlist file.
The previous message said "Scanning <REPO_ROOT>" but the helper only walks the project source roots (airflow-core, airflow-ctl, task-sdk, providers, shared). Spell out those roots in the log so the output matches what is actually scanned.
- Reject allowlist entries that escape `REPO_ROOT` (absolute paths or `..` segments) at load time. Without this guard, `relative_to()` in the violation reporter could crash on a maliciously or accidentally malformed allowlist line. Skipped entries are logged. - Detect when the allowlist file itself is among the changed paths but no longer exists on disk (deletion or rename in the same commit) and fail with a clear "restore or regenerate" message instead of silently passing. - Move the `check_provide_session_kwargs` module alias import to the top of the test module rather than inside the `fake_repo` fixture. Tests cover the missing-file failure path and the unsafe-entry skip.
When only the allowlist file is changed, the hook now unions the previous allowlist (from `git show HEAD:<file>`) with the current one, so removing an entry can't silently drop coverage for a file that still has positional `session` arguments.
184add7 to
19f2e2a
Compare
| return 0 | ||
|
|
||
| def cleanup(self) -> int: | ||
| allowlist = self.load() |
There was a problem hiding this comment.
by loading, I thought we're to save it as a class vairable 🤔
There was a problem hiding this comment.
Sure, I can refactor as you mentioned.
| @@ -0,0 +1,89 @@ | |||
| airflow-core/src/airflow/api/common/delete_dag.py::1 | |||
There was a problem hiding this comment.
Are these things we want to exclude or do? Or both?
There was a problem hiding this comment.
I will prefer to keep the existing one and avoid the mistake usage in the future.
Most of them should be fine to fix as keyword-only, but not sure will we introduce regression by changing it to keyword-only.
There was a problem hiding this comment.
yep, not asking for doing the change. just want to make sure whether we have understanding on these (perfectly fine if we're not)
| entry: ./scripts/ci/prek/check_provide_session_kwargs.py | ||
| language: python | ||
| pass_filenames: true | ||
| files: ^(airflow-core|airflow-ctl|task-sdk|providers|shared)/.*\.py$|^scripts/ci/prek/known_provide_session_positional\.txt$|^scripts/ci/prek/check_provide_session_kwargs\.py$ |
There was a problem hiding this comment.
I thought we're not suppose to have provide_session in task-sdk
Why
The project convention is that any function decorated with
@provide_sessionmust declaresessionas keyword-only (after a bare*), so callers cannot pass it positionally by accident. Today this is only documented incontributing-docs/05_pull_requests.rst#database-session-handling, so regressions slip in during review.What
check-no-new-provide-session-positionalscans changed files (or--all-files) for@provide_sessionfunctions whosesessionparameter is still positional.scripts/ci/prek/known_provide_session_positional.txtaspath::Nentries (max count per file). The hook fails when a file exceeds its limit, and auto-tightens the entry when the count decreases so the allowlist only shrinks over time. (Following convention of build(prek-hook): Check whether new "raise AirflowException" is added #55416 )--all-files,--cleanup(drop stale entries), and--generate(rebuild the allowlist from scratch).Was generative AI tooling used to co-author this PR?