Fix license header validation to compare against the correct base branch #24197

Merged
AAraKKe merged 8 commits into master from aarakke/fix-license-header-base-ref-detection
Jun 26, 2026

AAraKKe commented Jun 26, 2026

Collaborator

What does this PR do?

Makes the license-header validator compare each file against its actual base branch instead of always against origin/master.

A new get_base_ref() in datadog_checks_dev/.../tooling/git.py resolves the comparison ref by priority:

  1. GITHUB_BASE_REF (GitHub Actions PR context: the target branch).
  2. GITHUB_REF_NAME when it is master or a release branch (N.N.x) for push events.
  3. The closest ancestor among master and release branches, picked by most recent git merge-base timestamp (_find_closest_base_ref()), falling back to origin/master.

license_headers._get_previous() now uses get_base_ref() instead of the hardcoded origin/master.
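
The resolution order above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the name `resolve_base_ref`, its `env` and `find_closest` parameters, the `origin/` prefixing, and the exact release-branch regex are all assumptions made for the example.

```python
import re
from typing import Callable, Mapping

# Assumed shape of a release branch name ("N.N.x", e.g. "7.60.x").
RELEASE_BRANCH = re.compile(r"^\d+\.\d+\.x$")


def resolve_base_ref(env: Mapping[str, str], find_closest: Callable[[], str]) -> str:
    """Pick the ref to compare against, in the priority order described above."""
    # 1. Pull request context: GitHub exposes the target branch.
    base = env.get("GITHUB_BASE_REF")
    if base:
        return f"origin/{base}"

    # 2. Push to master or a release branch: compare against that branch.
    ref_name = env.get("GITHUB_REF_NAME", "")
    if ref_name == "master" or RELEASE_BRANCH.match(ref_name):
        return f"origin/{ref_name}"

    # 3. Anything else (for example a local run): defer to the closest-ancestor lookup.
    return find_closest()
```

Passing the environment and the fallback in as arguments is a choice made here so the precedence can be exercised without touching real environment variables or git; a caller would typically pass `os.environ` and the closest-ancestor function.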

Motivation

When a file exists on a release branch but has been deleted from master, the validator compared against origin/master, found nothing, and treated the file as new, enforcing the current-year copyright template. Files written in earlier years then failed validation in PRs targeting the release branch with "file does not match expected license format". Comparing against the real base branch fixes these false failures.

Testing

  • Mocked unit tests cover get_base_ref() env-var precedence and the _find_closest_base_ref() candidate filtering/selection and fallbacks.
  • Real-git integration tests build throwaway repos (real commits, dated history, populated refs/remotes/origin/*) and exercise _find_closest_base_ref() / get_base_ref() end to end with no mocking, so the actual for-each-ref / merge-base / %ct plumbing and output capture are verified rather than simulated.

The real-git tests hand-roll their setup (a local _git() subprocess wrapper plus a restore_root fixture) because datadog_checks_dev ships no git-in-tests helpers. ddev already has them: a ClonedRepo helper and local_clone/repository fixtures (ddev/tests/helpers/git.py, ddev/tests/conftest.py) that clone the repo, set the git identity, and manage branches/worktrees via PLATFORM.check_command_output and Path.as_cwd. When this validation is migrated to ddev, these tests will be better placed there, where they can reuse that machinery instead of reimplementing it.
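
For readers unfamiliar with this style of test, a throwaway repository with a dated commit and a populated refs/remotes/origin/* namespace can be built along these lines. This is a minimal sketch, not the PR's test code: `make_dated_repo`, `demo`, the file name, and the identity values are hypothetical, and this `_git()` wrapper only shares its name with the one mentioned above. It needs a `git` binary on the PATH.

```python
import os
import subprocess
import tempfile
from pathlib import Path
from typing import Optional


def _git(repo: Path, *args: str, date: Optional[str] = None) -> str:
    """Run git inside `repo` with a fixed identity and, optionally, a fixed commit date."""
    env = dict(
        os.environ,
        GIT_AUTHOR_NAME="Test",
        GIT_AUTHOR_EMAIL="test@example.com",
        GIT_COMMITTER_NAME="Test",
        GIT_COMMITTER_EMAIL="test@example.com",
    )
    if date is not None:
        env["GIT_AUTHOR_DATE"] = env["GIT_COMMITTER_DATE"] = date
    result = subprocess.run(["git", *args], cwd=repo, env=env, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def make_dated_repo(repo: Path, timestamp: int) -> str:
    """Create a one-commit repo whose commit time is `timestamp` and fake origin/master."""
    _git(repo, "init", "--quiet")
    (repo / "check.py").write_text("# placeholder\n")
    _git(repo, "add", "check.py")
    # "<unix seconds> +0000" is git's raw date format; signing is disabled so a
    # global commit.gpgsign setting cannot break the throwaway commit.
    _git(repo, "-c", "commit.gpgsign=false", "commit", "--quiet", "-m", "initial", date=f"{timestamp} +0000")
    # Populate refs/remotes/origin/* without needing a real remote.
    _git(repo, "update-ref", "refs/remotes/origin/master", "HEAD")
    return _git(repo, "rev-parse", "HEAD")


def demo() -> bool:
    """Build a throwaway repo and check the merge-base / %ct plumbing against it."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        sha = make_dated_repo(repo, 1577836800)
        same_base = _git(repo, "merge-base", "HEAD", "origin/master") == sha
        same_time = _git(repo, "show", "-s", "--format=%ct", sha) == "1577836800"
        return same_base and same_time
```

Pinning the commit date through the environment keeps the timestamps deterministic, which matters when the code under test ranks candidates by commit time.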

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add qa/required if this PR needs QA validation, or qa/skip-qa if it does not. Exactly one of the two is required.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

When a file is present in a release branch but deleted from master, the
validator was treating it as a new file and enforcing the current-year
copyright template, causing false failures in PRs targeting that release
branch.

The fix detects the appropriate base ref (GITHUB_BASE_REF in CI, the
current branch name when it is master or a release branch, or the
closest ancestor via git merge-base otherwise) so the comparison is
always made against the actual parent branch.
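
The merge-base fallback described in this commit message could look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's `_find_closest_base_ref()`: the function names, the injectable `git` parameter, and the candidate regex are invented for the example. Only the git subcommands (`for-each-ref`, `merge-base`, `show` with `%ct`) come from the PR description.

```python
import re
import subprocess
from typing import Callable

# Assumed candidate filter: origin/master plus release branches shaped like N.N.x.
CANDIDATE = re.compile(r"^origin/(master|\d+\.\d+\.x)$")


def run_git(*args: str) -> str:
    """Run git and return stripped stdout; raises CalledProcessError on failure."""
    result = subprocess.run(["git", *args], check=True, capture_output=True, text=True)
    return result.stdout.strip()


def find_closest_base_ref(git: Callable[..., str] = run_git, default: str = "origin/master") -> str:
    """Return the candidate whose merge-base with HEAD has the newest commit time."""
    try:
        refs = git("for-each-ref", "--format=%(refname:short)", "refs/remotes/origin").splitlines()
    except subprocess.CalledProcessError:
        return default

    best_ref, best_ts = default, -1
    for ref in refs:
        if not CANDIDATE.match(ref):
            continue
        try:
            sha = git("merge-base", "HEAD", ref)
            # %ct is the committer date as a Unix timestamp.
            ts = int(git("show", "-s", "--format=%ct", sha))
        except (subprocess.CalledProcessError, ValueError):
            # Unrelated history or unexpected output: skip this candidate.
            continue
        if ts > best_ts:
            best_ref, best_ts = ref, ts
    return best_ref
```

If every candidate fails, `best_ref` is never reassigned and the function returns the default, which mirrors the origin/master fallback the PR describes.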

datadog-official Bot commented Jun 26, 2026

Contributor

Tests  Code Coverage

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 94.06%
Overall Coverage: 67.07% (-21.00%)

Commit SHA: 60b6cd8

AAraKKe added the qa/skip-qa label (Automatically skip this PR for the next QA) Jun 26, 2026

AAraKKe commented Jun 26, 2026

Collaborator Author

@codex review

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00f750806a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

  relpath = path.relative_to(get_root())
  try:
-     return git_show_file(str(relpath), "origin/master")
+     return git_show_file(str(relpath), get_base_ref())


P2: Cache the base ref before per-file lookups

When ddev validate license-headers runs outside the GitHub env vars, this calls get_base_ref() once for every Python file being scanned. In that fallback path get_base_ref() runs for-each-ref and then merge-base/show for every release branch, so local changed/all validation becomes O(files × release branches) extra git processes and can become very slow or time out on large integrations. Resolve the base ref once per validation run and reuse it for each git_show_file call.

Collaborator Author

Fixed by calling get_base_ref through an LRU cache of one element. The base ref is not derived up front because that would run all the git operations even when there is nothing to validate. Keeping it lazy ensures we only derive it when there is actually something to validate.

validate_license_headers resolved the base ref for every scanned file, and the CLI invokes it once per check, so a local full run triggered a git for-each-ref + per-branch merge-base/show storm O(files x checks) times. A shared build_get_previous() resolver now resolves the base ref lazily at most once and is reused across every check.
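
The "resolve lazily, at most once, and share across checks" idea can be sketched as a small factory. The PR only names `build_get_previous()`; the parameters `resolve_base_ref` and `show_file` and the closure-based caching below are hypothetical stand-ins chosen for the example, not the merged implementation.

```python
from typing import Callable, Optional


def build_get_previous(
    resolve_base_ref: Callable[[], str],
    show_file: Callable[[str, str], Optional[str]],
) -> Callable[[str], Optional[str]]:
    """Return a lookup that resolves the base ref on first use and then reuses it."""
    cached_ref: Optional[str] = None

    def get_previous(relpath: str) -> Optional[str]:
        nonlocal cached_ref
        if cached_ref is None:
            # Deferred until a file actually needs validating, so a run with
            # nothing to check never pays for the git calls.
            cached_ref = resolve_base_ref()
        return show_file(relpath, cached_ref)

    return get_previous
```

Building one `get_previous` and handing the same object to every check is what amortizes the cost; creating a new one per check would resolve the base ref again each time.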
AAraKKe force-pushed the aarakke/fix-license-header-base-ref-detection branch from 9f857ba to 7a7da23 Compare June 26, 2026 09:12
AAraKKe added 4 commits June 26, 2026 11:29
…ixture

- Use `X | None`/`list[...]` on the edited validate_license_headers signature, matching the new helpers; trim unused typing imports.
- Move the restore_root fixture into tests/tooling/conftest.py so test_git and test_license_headers share it.
- Clarify build_get_previous's docstring: the returned instance caches the base ref; callers should share one to amortize across checks.
…e-fail

- Have the three mocked _find_closest_base_ref tests take the restore_root fixture so set_root('/foo/') no longer leaks into the session; drop the unused monkeypatch param.
- Add a test for the branch where for-each-ref succeeds but every merge-base fails, which must fall back to origin/master.
- Pre-build a sha-to-timestamp dict in the _find_closest_base_ref mock helper for an O(1) lookup.
- Remove the inline comment in the license-headers command; the factory and its docstring already convey the intent.

AAraKKe commented Jun 26, 2026

Collaborator Author

@codex review

chatgpt-codex-connector

Codex Review: Didn't find any major issues. Can't wait for the next one!

Reviewed commit: 60b6cd88db


AAraKKe marked this pull request as ready for review June 26, 2026 11:11
AAraKKe requested a review from a team as a code owner June 26, 2026 11:11

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 60b6cd88db


  @click.option('--fix', is_flag=True, help='Attempt to fix errors')
  @click.pass_context
- def license_headers(ctx, check, fix):
+ def license_headers(ctx: click.Context, check: str | None, fix: bool) -> None:


P2: Remove unsolicited type hints from refactored command

/workspace/integrations-core/AGENTS.md says under Python Type Hinting > Refactoring code to "never add type hints" to existing untyped methods unless explicitly asked. This PR is a license-header base-ref fix, but it annotates the pre-existing Click command signature here without a requested typing refactor, so it violates the repo guidance; please keep this existing signature untyped or move typing cleanup to an explicitly requested change.

Collaborator Author

> says under Python Type Hinting > Refactoring code to "never add type hints" to existing untyped methods unless explicitly asked

I would say AGENTS.md needs to be changed. I explicitly asked.

AAraKKe added this pull request to the merge queue Jun 26, 2026
Merged via the queue into master with commit 30053d6 Jun 26, 2026
55 of 58 checks passed
AAraKKe deleted the aarakke/fix-license-header-base-ref-detection branch June 26, 2026 12:15

dd-octo-sts Bot commented Jun 26, 2026

Contributor

Validation Report

All 21 validations passed.

| Validation | Description | Status |
| --- | --- | --- |
| agent-reqs | Verify check versions match the Agent requirements file | Passed |
| ci | Validate CI configuration and code coverage settings | Passed |
| codeowners | Validate every integration has a CODEOWNERS entry | Passed |
| config | Validate default configuration files against spec.yaml | Passed |
| dep | Verify dependency pins are consistent and Agent-compatible | Passed |
| http | Validate integrations use the HTTP wrapper correctly | Passed |
| imports | Validate check imports do not use deprecated modules | Passed |
| integration-style | Validate check code style conventions | Passed |
| jmx-metrics | Validate JMX metrics definition files and config | Passed |
| labeler | Validate PR labeler config matches integration directories | Passed |
| legacy-signature | Validate no integration uses the legacy Agent check signature | Passed |
| license-headers | Validate Python files have proper license headers | Passed |
| licenses | Validate third-party license attribution list | Passed |
| metadata | Validate metadata.csv metric definitions | Passed |
| models | Validate configuration data models match spec.yaml | Passed |
| openmetrics | Validate OpenMetrics integrations disable the metric limit | Passed |
| package | Validate Python package metadata and naming | Passed |
| qa-label | Validate the pull request declares whether it needs QA for the next Agent release | Passed |
| readmes | Validate README files have required sections | Passed |
| saved-views | Validate saved view JSON file structure and fields | Passed |
| version | Validate version consistency between package and changelog | Passed |

AAraKKe added a commit that referenced this pull request Jun 26, 2026
…nch (#24197) (#24198)

* Fix license header validation to compare against the correct base branch

When a file is present in a release branch but deleted from master, the
validator was treating it as a new file and enforcing the current-year
copyright template, causing false failures in PRs targeting that release
branch.

The fix detects the appropriate base ref (GITHUB_BASE_REF in CI, the
current branch name when it is master or a release branch, or the
closest ancestor via git merge-base otherwise) so the comparison is
always made against the actual parent branch.

* Add changelog entry

* Add real-git integration tests for base ref detection

* Resolve license-header base ref once per run instead of per file

validate_license_headers resolved the base ref for every scanned file, and the CLI invokes it once per check, so a local full run triggered a git for-each-ref + per-branch merge-base/show storm O(files x checks) times. A shared build_get_previous() resolver now resolves the base ref lazily at most once and is reused across every check.

* Add type annotations to the license-header methods touched here

* Modernize validate_license_headers hints and share the restore_root fixture

- Use `X | None`/`list[...]` on the edited validate_license_headers signature, matching the new helpers; trim unused typing imports.
- Move the restore_root fixture into tests/tooling/conftest.py so test_git and test_license_headers share it.
- Clarify build_get_previous's docstring: the returned instance caches the base ref; callers should share one to amortize across checks.

* Use restore_root in the mocked base-ref tests and cover all-merge-base-fail

- Have the three mocked _find_closest_base_ref tests take the restore_root fixture so set_root('/foo/') no longer leaks into the session; drop the unused monkeypatch param.
- Add a test for the branch where for-each-ref succeeds but every merge-base fails, which must fall back to origin/master.

* Simplify base-ref test mock lookup and drop a redundant comment

- Pre-build a sha-to-timestamp dict in the _find_closest_base_ref mock helper for an O(1) lookup.
- Remove the inline comment in the license-headers command; the factory and its docstring already convey the intent.

(cherry picked from commit 30053d6)

Co-authored-by: Juanpe Araque <juanpedro.araque@datadoghq.com>
AAraKKe added a commit that referenced this pull request Jun 26, 2026
…nch (#24197) (#24199)
