[ci](fix) license check#62534
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
|
skip buildall |
|
/review |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
There was a problem hiding this comment.
Findings
- : changed-file discovery now relies on . GitHub documents that endpoint as returning a maximum of 3000 files total, so and are truncated on very large PRs. In that case the incremental license config silently skips files after the cutoff, and the workflow can falsely pass unlicensed additions. The previous local path did not have this cap.
- : the previous implementation intentionally fell back to a full scan when changed-file discovery failed. Under the default Actions shell (), a transient failure now aborts the step before any fallback is written, turning a recoverable discovery failure into a hard CI failure.
Critical Checkpoint Conclusions
- Goal of the change: fix license-check failures by adding missing headers to Cloud , exempting non-source artifacts, and tightening PR changed-file detection. The header and ignore-list updates look correct for the touched files, but the workflow rewrite does not fully achieve the goal because it can miss changed files on large PRs and is less robust on discovery failures.
- Small, clear, focused: mostly yes, but the workflow change introduces a repo-wide CI behavior regression beyond the touched Cloud files.
- Concurrency: not applicable for these YAML, CMake, and config-file changes.
- Lifecycle and static initialization: not applicable.
- Configuration items: no runtime configuration added.
- Compatibility and incompatible changes: no FE/BE/storage compatibility impact.
- Parallel code paths: the push and manual paths still run the full scan; the PR path now becomes less complete than those paths.
- Special conditional checks: the changed-file filtering now depends on a REST API contract with a hard file-count limit and needs an uncapped or fallback approach.
- Test coverage: I did not see automated coverage or validation for the new changed-file enumeration path.
- Test result changes: none.
- Observability: sufficient for this scope.
- Transaction, persistence, data writes, and FE-BE variable passing: not applicable.
- Performance: the new API approach is lighter than fetching the base ref, but correctness regresses; correctness should take priority here.
- Other issues: I did not find problems in the added Cloud license headers or in the new ignore entries themselves.
There was a problem hiding this comment.
Blocking findings:
.github/workflows/license-eyes.yml: changed-file discovery now relies onGET /pulls/{number}/files, which GitHub documents as returning at most 3000 files total even with pagination. Large PRs will silently skip license checks on files past that cap, and the previous fallback to a full.licenserc.yamlscan on enumeration failure is also gone..licenserc.yaml:conf/ubsan_ignorelist.txtandextension/dbt-doris/dev-requirements.txtare Doris-owned text configs that support#comments. Ignoring them weakens header enforcement instead of fixing the missing headers in-place.
Critical checkpoints:
- Goal of current task: Partially achieved. The PR adds missing ASF headers to Cloud
CMakeLists.txtfiles and updates the workflow/config, but the new workflow regresses correctness for large PRs and the config broadens ignore coverage for first-party files. - Minimality/focus: Mostly focused on license-check remediation, but the two new ignores above are broader than necessary.
- Concurrency: N/A for these CMake/CI/config changes; no concurrency-sensitive runtime code is touched.
- Lifecycle/static initialization: N/A.
- Configuration changes: Yes. The workflow/config changes are not fully safe because changed-file enumeration can now be incomplete or fail hard.
- Compatibility/incompatible change: No runtime, storage, or FE/BE protocol compatibility impact.
- Parallel code paths: The same
pull_request_targetlicense-check path is used for future PRs, so the changed-file enumeration regression affects all large PRs on that path. - Special conditional checks: The
.licenserc.yamlshortcut itself is fine, but the underlying file list source is now bounded by the API limit. - Test coverage: No tests were added. For CI-only changes that can be acceptable, but there is no validation here for large-PR or API-failure behavior.
- Test result changes: None.
- Observability: No additional observability is needed; the main issue is correctness/reliability.
- Transaction/persistence/data writes/FE-BE variable passing: N/A.
- Performance: Acceptable; the concern is correctness and robustness, not runtime cost.
- Other issues: None beyond the two findings above.
Requesting changes.
| echo "config_file=.licenserc.yaml" >> "$GITHUB_OUTPUT" | ||
| exit 0 | ||
| fi | ||
| all=$(gh api repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files \ |
There was a problem hiding this comment.
Using GET /pulls/{number}/files here introduces a correctness regression. GitHub documents that this endpoint returns at most 3000 files total even with pagination, so on any larger PR all/acmr becomes incomplete and skywalking-eyes will silently skip license checking for files past that cap. It also removes the previous fallback to a full .licenserc.yaml scan when enumeration fails, because a transient gh api error now aborts the step under bash -e. Could we keep the local git diff "${base_sha}...HEAD" path (or another uncapped source) and only use the API as a fallback?
| - "docker/thirdparties/docker-compose/iceberg/spark-defaults.conf.tpl" | ||
| - "docker/thirdparties/docker-compose/postgresql/certs/**" | ||
| - "conf/mysql_ssl_default_certificate/*" | ||
| - "conf/mysql_ssl_default_certificate/client_certificate/ca.pem" |
There was a problem hiding this comment.
I do not think these first-party text configs should be excluded from header checking. Both conf/ubsan_ignorelist.txt and extension/dbt-doris/dev-requirements.txt accept # comments, and comparable files in-tree already carry ASF headers (pytest/requirements.txt, tools/cost_model_evaluate/requirements.txt, extension/dbt-doris/tox.ini, etc.). Adding them to paths-ignore weakens license coverage instead of fixing the missing headers in-place.
|
/review |
There was a problem hiding this comment.
Blocking finding:
.github/workflows/license-eyes.yml: the PR path now makes a secondgh api pulls/{number}/filescall to computeacmr, but unlike the first call it has no fallback. GitHub Actions runsrun:steps underbash -e, so a transient API failure here (for example a 5xx or secondary rate limit) aborts the job even though we already had enough information to fall back to the full.licenserc.yamlscan. The old localgit diff --diff-filter=ACMRpath did not add this extra remote-failure point.
Critical checkpoint conclusions:
- Goal of current task: Mostly achieved. The PR adds the missing ASF headers and the new ignore entries for the touched data/license files look reasonable, but the workflow rewrite still leaves the PR license-check path less robust than intended.
- Minimality/focus: Yes. The changes are focused on license-check remediation.
- Concurrency: N/A for these workflow/config/CMake/text-file changes.
- Lifecycle/static initialization: N/A.
- Configuration changes: Yes. The workflow/config behavior changed, and the remaining issue is in that CI configuration path.
- Compatibility/incompatible change: No runtime, storage, or FE/BE compatibility impact.
- Parallel code paths: The
pull_request_targetincremental-check path is affected; push/manual paths still use the full scan. - Special conditional checks: The new
.licenserc.yamland>= 3000 filesfallbacks are good. The remaining concern is the unguarded second API call. - Test coverage: No dedicated validation was added for the new workflow branch or its failure handling.
- Test result changes: None.
- Observability: Existing workflow logs are sufficient for this scope.
- Transaction/persistence/data writes/FE-BE variable passing: N/A.
- Performance: Acceptable overall, but the extra API call adds avoidable failure surface.
- Other issues: I did not find problems in the added ASF headers themselves.
Requesting changes.
|
/review |
|
PR approved by at least one committer and no changes requested. |
There was a problem hiding this comment.
No blocking findings.
Critical checkpoint conclusions:
- Goal of the current task: Achieved. The PR adds the missing ASF headers to the reported files and updates the
pull_request_targetlicense workflow to enumerate changed files through the GitHub API, with full-scan fallbacks for API failure,.licenserc.yamlchanges, and the documented 3000-file API cap. The currentLicense Checkrun passed on this PR. - Minimality/focus: Yes. The change stays narrowly scoped to license-check remediation.
- Concurrency: Not applicable for these workflow, config, CMake, and text-file changes.
- Lifecycle/static initialization: Not applicable.
- Configuration changes: Yes, CI configuration changed. The new behavior is dynamically effective without restart and retains a safe full-scan fallback when incremental enumeration is incomplete or unavailable.
- Compatibility/incompatible changes: No runtime, FE/BE protocol, storage-format, or rolling-upgrade impact.
- Parallel code paths: The
pull_request_targetincremental path remains aligned with the push/manual full-scan behavior through the fallback path; I did not find another parallel path that also needs updating. - Special conditional checks: The
.licenserc.yamland>= 3000 filesguards are appropriate because they prevent running an incomplete incremental scan. - Test coverage: No dedicated tests were added, which is acceptable for this CI-only fix. Residual risk is limited to fallback branches that are not directly exercised by this PR.
- Test result changes: None.
- Observability: Existing workflow logs are sufficient for this scope.
- Transaction/persistence/data writes/FE-BE variable passing: Not applicable.
- Performance: Acceptable. The API-based file enumeration avoids the extra base-branch fetch and is not on a product hot path.
- Other issues: I did not find blocking problems in the new
.licenserc.yamlignore entries or the added ASF headers.
From an automated review perspective, this PR is approvable.
|
skip buildall |
### What problem does this PR solve? Issue Number: Related PR: apache#62536 Problem Summary: Empty commit to retrigger GitHub Actions so License Check uses the updated workflow merged in apache#62534. ### Release note None ### Check List (For Author) - Test: No need to test (CI) - Behavior changed: No - Does this need documentation: No
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)