Fix license header validation to compare against the correct base branch#24197
Conversation
When a file is present in a release branch but deleted from master, the validator was treating it as a new file and enforcing the current-year copyright template, causing false failures in PRs targeting that release branch. The fix detects the appropriate base ref (GITHUB_BASE_REF in CI, the current branch name when it is master or a release branch, or the closest ancestor via git merge-base otherwise) so the comparison is always made against the actual parent branch.
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: 60b6cd8 | Docs | Datadog PR Page | Give us feedback! |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00f750806a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| relpath = path.relative_to(get_root()) | ||
| try: | ||
| return git_show_file(str(relpath), "origin/master") | ||
| return git_show_file(str(relpath), get_base_ref()) |
There was a problem hiding this comment.
Cache the base ref before per-file lookups
When ddev validate license-headers runs outside the GitHub env vars, this calls get_base_ref() once for every Python file being scanned. In that fallback path get_base_ref() runs for-each-ref and then merge-base/show for every release branch, so local changed/all validation becomes O(files × release branches) extra git processes and can become very slow or time out on large integrations. Resolve the base ref once per validation run and reuse it for each git_show_file call.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed that by having the get_base_ref being called with a lru cache of one element. Not derived at the beginning because if there is nothing to validate we will be doing all git operations no matter what. Keeping it lazily executed we ensure we only derive it when we have something to actually validate.
validate_license_headers resolved the base ref for every scanned file, and the CLI invokes it once per check, so a local full run triggered a git for-each-ref + per-branch merge-base/show storm O(files x checks) times. A shared build_get_previous() resolver now resolves the base ref lazily at most once and is reused across every check.
9f857ba to
7a7da23
Compare
…ixture - Use `X | None`/`list[...]` on the edited validate_license_headers signature, matching the new helpers; trim unused typing imports. - Move the restore_root fixture into tests/tooling/conftest.py so test_git and test_license_headers share it. - Clarify build_get_previous's docstring: the returned instance caches the base ref; callers should share one to amortize across checks.
…e-fail
- Have the three mocked _find_closest_base_ref tests take the restore_root fixture so set_root('/foo/') no longer leaks into the session; drop the unused monkeypatch param.
- Add a test for the branch where for-each-ref succeeds but every merge-base fails, which must fall back to origin/master.
- Pre-build a sha-to-timestamp dict in the _find_closest_base_ref mock helper for an O(1) lookup. - Remove the inline comment in the license-headers command; the factory and its docstring already convey the intent.
|
@codex review |
|
Codex Review: Didn't find any major issues. Can't wait for the next one! Reviewed commit: ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 60b6cd88db
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| @click.option('--fix', is_flag=True, help='Attempt to fix errors') | ||
| @click.pass_context | ||
| def license_headers(ctx, check, fix): | ||
| def license_headers(ctx: click.Context, check: str | None, fix: bool) -> None: |
There was a problem hiding this comment.
Remove unsolicited type hints from refactored command
/workspace/integrations-core/AGENTS.md says under Python Type Hinting > Refactoring code to "never add type hints" to existing untyped methods unless explicitly asked. This PR is a license-header base-ref fix, but it annotates the pre-existing Click command signature here without a requested typing refactor, so it violates the repo guidance; please keep this existing signature untyped or move typing cleanup to an explicitly requested change.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
says under Python Type Hinting > Refactoring code to "never add type hints" to existing untyped methods unless explicitly asked
I would say AGENTS.md needs to be changed. I explicitly asked.
Validation ReportAll 21 validations passed. Show details
|
…nch (#24197) (#24198) * Fix license header validation to compare against the correct base branch When a file is present in a release branch but deleted from master, the validator was treating it as a new file and enforcing the current-year copyright template, causing false failures in PRs targeting that release branch. The fix detects the appropriate base ref (GITHUB_BASE_REF in CI, the current branch name when it is master or a release branch, or the closest ancestor via git merge-base otherwise) so the comparison is always made against the actual parent branch. * Add changelog entry * Add real-git integration tests for base ref detection * Resolve license-header base ref once per run instead of per file validate_license_headers resolved the base ref for every scanned file, and the CLI invokes it once per check, so a local full run triggered a git for-each-ref + per-branch merge-base/show storm O(files x checks) times. A shared build_get_previous() resolver now resolves the base ref lazily at most once and is reused across every check. * Add type annotations to the license-header methods touched here * Modernize validate_license_headers hints and share the restore_root fixture - Use `X | None`/`list[...]` on the edited validate_license_headers signature, matching the new helpers; trim unused typing imports. - Move the restore_root fixture into tests/tooling/conftest.py so test_git and test_license_headers share it. - Clarify build_get_previous's docstring: the returned instance caches the base ref; callers should share one to amortize across checks. * Use restore_root in the mocked base-ref tests and cover all-merge-base-fail - Have the three mocked _find_closest_base_ref tests take the restore_root fixture so set_root('/foo/') no longer leaks into the session; drop the unused monkeypatch param. - Add a test for the branch where for-each-ref succeeds but every merge-base fails, which must fall back to origin/master. * Simplify base-ref test mock lookup and drop a redundant comment - Pre-build a sha-to-timestamp dict in the _find_closest_base_ref mock helper for an O(1) lookup. - Remove the inline comment in the license-headers command; the factory and its docstring already convey the intent. (cherry picked from commit 30053d6) Co-authored-by: Juanpe Araque <juanpedro.araque@datadoghq.com>
…nch (#24197) (#24199) * Fix license header validation to compare against the correct base branch When a file is present in a release branch but deleted from master, the validator was treating it as a new file and enforcing the current-year copyright template, causing false failures in PRs targeting that release branch. The fix detects the appropriate base ref (GITHUB_BASE_REF in CI, the current branch name when it is master or a release branch, or the closest ancestor via git merge-base otherwise) so the comparison is always made against the actual parent branch. * Add changelog entry * Add real-git integration tests for base ref detection * Resolve license-header base ref once per run instead of per file validate_license_headers resolved the base ref for every scanned file, and the CLI invokes it once per check, so a local full run triggered a git for-each-ref + per-branch merge-base/show storm O(files x checks) times. A shared build_get_previous() resolver now resolves the base ref lazily at most once and is reused across every check. * Add type annotations to the license-header methods touched here * Modernize validate_license_headers hints and share the restore_root fixture - Use `X | None`/`list[...]` on the edited validate_license_headers signature, matching the new helpers; trim unused typing imports. - Move the restore_root fixture into tests/tooling/conftest.py so test_git and test_license_headers share it. - Clarify build_get_previous's docstring: the returned instance caches the base ref; callers should share one to amortize across checks. * Use restore_root in the mocked base-ref tests and cover all-merge-base-fail - Have the three mocked _find_closest_base_ref tests take the restore_root fixture so set_root('/foo/') no longer leaks into the session; drop the unused monkeypatch param. - Add a test for the branch where for-each-ref succeeds but every merge-base fails, which must fall back to origin/master. * Simplify base-ref test mock lookup and drop a redundant comment - Pre-build a sha-to-timestamp dict in the _find_closest_base_ref mock helper for an O(1) lookup. - Remove the inline comment in the license-headers command; the factory and its docstring already convey the intent. (cherry picked from commit 30053d6) Co-authored-by: Juanpe Araque <juanpedro.araque@datadoghq.com>
What does this PR do?
Makes the license-header validator compare each file against its actual base branch instead of always against
origin/master.A new
get_base_ref()indatadog_checks_dev/.../tooling/git.pyresolves the comparison ref by priority:GITHUB_BASE_REF(GitHub Actions PR context: the target branch).GITHUB_REF_NAMEwhen it ismasteror a release branch (N.N.x) for push events.masterand release branches, picked by most recentgit merge-basetimestamp (_find_closest_base_ref()), falling back toorigin/master.license_headers._get_previous()now usesget_base_ref()instead of the hardcodedorigin/master.Motivation
When a file exists on a release branch but has been deleted from
master, the validator compared againstorigin/master, found nothing, and treated the file as new, enforcing the current-year copyright template. Files written in earlier years then failed validation in PRs targeting the release branch with "file does not match expected license format". Comparing against the real base branch fixes these false failures.Testing
get_base_ref()env-var precedence and the_find_closest_base_ref()candidate filtering/selection and fallbacks.refs/remotes/origin/*) and exercise_find_closest_base_ref()/get_base_ref()end to end with no mocking, so the actualfor-each-ref/merge-base/%ctplumbing and output capture are verified rather than simulated.The real-git tests hand-roll their setup (a local
_git()subprocess wrapper plus arestore_rootfixture) becausedatadog_checks_devships no git-in-tests helpers.ddevalready does: aClonedRepohelper andlocal_clone/repositoryfixtures (ddev/tests/helpers/git.py,ddev/tests/conftest.py) that clone the repo, set git identity, and manage branches/worktrees viaPLATFORM.check_command_outputandPath.as_cwd. When this validation is migrated toddev, these tests are better placed there and can reuse that machinery instead of reimplementing it.Review checklist (to be filled by reviewers)
qa/requiredif this PR needs QA validation, orqa/skip-qaif it does not. Exactly one of the two is required.backport/<branch-name>label to the PR and it will automatically open a backport PR once this one is merged