Skip to content

[chore](build) Add code style enforcement documentation, skills, and clang-tidy script#62307

Merged
BiteTheDDDDt merged 1 commit intoapache:masterfrom
BiteTheDDDDt:add-code-style-skills-and-clang-tidy
Apr 10, 2026
Merged

[chore](build) Add code style enforcement documentation, skills, and clang-tidy script#62307
BiteTheDDDDt merged 1 commit intoapache:masterfrom
BiteTheDDDDt:add-code-style-skills-and-clang-tidy

Conversation

@BiteTheDDDDt
Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: N/A

Problem Summary:
The AGENTS.md and Copilot skills had no documentation about code style enforcement workflows for BE (C++ clang-format/clang-tidy) or FE (Java checkstyle). This made it unclear how and when to run formatting and static analysis tools before committing code changes.

This PR adds:

  1. A "Code Style Enforcement" subsection under "Coding Standards" in AGENTS.md covering BE clang-format, BE clang-tidy, and FE checkstyle.
  2. Three new skills: be-code-style, fe-code-style, clang-tidy-check, each providing step-by-step procedures for fixing style issues.
  3. A new build-support/run-clang-tidy.sh script that runs clang-tidy incrementally on only changed lines (not entire files) by parsing git diff, avoiding noise from pre-existing code.

Release note

None

Check List (For Author)

  • Test: Manual test
    • Verified check-format.sh, run-clang-tidy.sh (--help, --files, --base, --full, line-level filtering logic), and FE checkstyle all work correctly
  • Behavior changed: No
  • Does this need documentation: No (this IS documentation)

Copilot AI review requested due to automatic review settings April 10, 2026 03:00
@BiteTheDDDDt BiteTheDDDDt requested a review from zclllyybb as a code owner April 10, 2026 03:00
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@BiteTheDDDDt
Copy link
Copy Markdown
Contributor Author

run buildall

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds documentation and “skills” guidance for enforcing code style in Apache Doris, and introduces a helper script to run clang-tidy on changed C++ code with reduced noise.

Changes:

  • Document BE clang-format/clang-tidy and FE Checkstyle workflows in AGENTS.md.
  • Add three new Copilot skills for BE formatting, FE Checkstyle, and clang-tidy usage.
  • Add build-support/run-clang-tidy.sh to run clang-tidy and (intended) filter warnings to changed lines.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
build-support/run-clang-tidy.sh New clang-tidy runner with intended changed-line warning filtering and optional --fix/--full modes
AGENTS.md Adds “Code Style Enforcement” section referencing the new workflows/skills
.claude/skills/be-code-style/SKILL.md Documents BE clang-format usage via existing build-support scripts
.claude/skills/fe-code-style/SKILL.md Documents FE Checkstyle usage and common violation fixes
.claude/skills/clang-tidy-check/SKILL.md Documents running clang-tidy via the new script and interpreting results

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +60 to +74
usage() {
echo "Usage: $(basename "$0") [options]"
echo ""
echo "Run clang-tidy on newly added/modified C++ code."
echo "By default, only reports warnings on changed lines."
echo ""
echo "Options:"
echo " --base <ref> Git ref to diff against (default: auto-detect)"
echo " --files <f1> ... Check specific files (all lines, no line filter)"
echo " --build-dir <dir> Path to build dir with compile_commands.json"
echo " --fix Apply clang-tidy auto-fixes (changed lines only)"
echo " --full Report all warnings in changed files"
echo " -h, --help Show this help"
exit 0
}
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usage() always exits with code 0. That means argument errors (e.g. unknown option) will still return success, which can hide mis-invocations in automation. Consider making usage accept an exit code (default 0) and calling it with non-zero for error cases, or explicitly exit 1 on unknown options.

Copilot uses AI. Check for mistakes.
Comment on lines +128 to +140
# Find compile_commands.json
if [[ -z "${BUILD_DIR}" ]]; then
for candidate in \
"${DORIS_HOME}/be/build_ASAN" \
"${DORIS_HOME}/be/build_Release" \
"${DORIS_HOME}/be/ut_build_ASAN" \
"${DORIS_HOME}/be/ut_build_Release"; do
if [[ -f "${candidate}/compile_commands.json" ]]; then
BUILD_DIR="${candidate}"
break
fi
done
fi
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-detection of compile_commands.json only checks be/build_* / be/ut_build_*, but this script also targets cloud/src and cloud/test. A BE compilation database typically won’t contain Cloud compile commands, so clang-tidy may fail to analyze Cloud files unless the user manually passes --build-dir (and the error is currently easy to miss). Consider detecting Cloud build dirs too (e.g. cloud/build_*, cloud/ut_build_*) or splitting checks by module/build-dir.

Copilot uses AI. Check for mistakes.
while IFS= read -r line; do
if [[ "$line" =~ ^diff\ --git\ a/(.+)\ b/(.+)$ ]]; then
current_file="${BASH_REMATCH[2]}"
elif [[ "$line" =~ ^@@.*\+([0-9]+)(,([0-9]+))?.*@@ ]]; then
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff hunk-header regex appears to require trailing spaces after the closing @@ (.*@@ ). Standard git diff hunk headers are @@ ... @@ or @@ ... @@ <context> (typically 0 or 1 space), so this likely won’t match and CHANGED_LINES_MAP will stay empty. That breaks line-level filtering (it will effectively treat everything as changed). Update the regex to match @@ regardless of trailing whitespace / context text.

Suggested change
elif [[ "$line" =~ ^@@.*\+([0-9]+)(,([0-9]+))?.*@@ ]]; then
elif [[ "$line" =~ ^@@.*\+([0-9]+)(,([0-9]+))?.*@@([[:space:]].*)?$ ]]; then

Copilot uses AI. Check for mistakes.
Comment on lines +246 to +254
is_line_changed() {
local file="$1"
local line_num="$2"
local ranges="${CHANGED_LINES_MAP["${file}"]}"

if [[ -z "${ranges}" ]]; then
# No range info (e.g., new file or --files mode) — treat all lines as changed
return 0
fi
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_line_changed() treats missing/empty range info as “all lines changed”. For diagnostics coming from headers or other included files that weren’t modified (and thus have no entry in CHANGED_LINES_MAP), this will incorrectly pass the filter and re-introduce noise. Consider treating unknown files as “not changed” by default, and/or using clang-tidy’s native -line-filter so diagnostics are only produced for the intended changed ranges.

Copilot uses AI. Check for mistakes.
Comment on lines +286 to +290
for f in "${FILTERED_FILES[@]}"; do
echo "=== Checking: ${f} ==="
OUTPUT=$("${CLANG_TIDY}" "${TIDY_ARGS[@]}" "${f}" 2>&1) || true

if echo "${OUTPUT}" | grep -q "warning:"; then
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy failures are currently ignored (... ) || true), and the script only looks for warning: in output. If clang-tidy emits error: (e.g. missing compile command for a file), this will be reported as “No warnings” and the script may exit 0, which defeats enforcement. Capture the clang-tidy exit status and/or scan for error:/fatal error: and fail the script when analysis couldn’t be performed.

Copilot uses AI. Check for mistakes.
# --base <ref> Git ref to diff against (default: auto-detect)
# --files <f1> <f2> Check specific files (all lines, no filtering)
# --build-dir <dir> Path to build dir with compile_commands.json
# --fix Apply clang-tidy auto-fixes (only on changed lines)
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The help text claims --fix applies auto-fixes “(changed lines only)”, but the script doesn’t pass a -line-filter (or similar) to clang-tidy. In practice, clang-tidy may apply fixes anywhere in the file for diagnostics it emits, which can modify unrelated lines. Either implement a proper line filter (so both diagnostics and fixes are constrained) or adjust the --fix wording to avoid promising changed-line-only edits.

Suggested change
# --fix Apply clang-tidy auto-fixes (only on changed lines)
# --fix Apply clang-tidy auto-fixes for reported diagnostics

Copilot uses AI. Check for mistakes.
Comment on lines +9 to +21
Run clang-tidy static analysis on **only the changed lines** of newly added or modified C++ files in the BE and Cloud modules, using the project's `.clang-tidy` configuration. The script parses `git diff` to identify changed line ranges and filters clang-tidy output accordingly, avoiding noise from pre-existing code.

## When to use me

- After building BE, before committing C++ code changes
- When CI reports clang-tidy warnings on your PR
- When you want to proactively check new code for common bugs and style issues

## Prerequisites

1. **BE must be built first** — clang-tidy needs `compile_commands.json` generated during the CMake build.
2. **clang-tidy must be installed** — the project uses clang-tidy from the LDB toolchain.

Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This skill claims clang-tidy runs on BE and Cloud changes, but the prerequisites/instructions assume only a BE build/BE compile_commands.json. In practice Cloud files need a Cloud compilation database (cloud/build_* or cloud/ut_build_*), and a single -p directory usually can’t cover both BE and Cloud at once. Please clarify the workflow (e.g., run twice with different --build-dir) or update the script to auto-handle Cloud build dirs.

Copilot uses AI. Check for mistakes.
Comment on lines +57 to +62
# Specify build directory explicitly
build-support/run-clang-tidy.sh --build-dir be/build_ASAN

# Apply auto-fixes where possible (only on changed lines)
build-support/run-clang-tidy.sh --fix
```
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The --fix example says auto-fixes apply “only on changed lines”, but the current script doesn’t constrain clang-tidy diagnostics/fixes to a line filter; clang-tidy may rewrite other parts of the file. Either implement a real line filter in the script or adjust this wording so users aren’t surprised by broader edits.

Copilot uses AI. Check for mistakes.
Comment on lines +158 to +165
else
if [[ -n "${BASE_REF}" ]]; then
mapfile -t FILES_TO_CHECK < <(git diff --name-only --diff-filter=ACMR "${BASE_REF}" -- 'be/src/' 'be/test/' 'cloud/src/' 'cloud/test/' 2>/dev/null || true)
else
mapfile -t STAGED < <(git diff --cached --name-only --diff-filter=ACMR -- 'be/src/' 'be/test/' 'cloud/src/' 'cloud/test/' 2>/dev/null || true)
mapfile -t UNSTAGED < <(git diff --name-only --diff-filter=ACMR -- 'be/src/' 'be/test/' 'cloud/src/' 'cloud/test/' 2>/dev/null || true)
mapfile -t FILES_TO_CHECK < <(printf '%s\n' "${STAGED[@]}" "${UNSTAGED[@]}" | sort -u | grep -v '^$')
fi
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several git diff calls suppress errors (2>/dev/null || true). If --base is invalid (typo, missing remote ref), the script will silently act as if there are no changed files and exit 0. Consider surfacing git errors (at least for --base mode) and failing fast so users don’t get a false pass.

Copilot uses AI. Check for mistakes.
zclllyybb
zclllyybb previously approved these changes Apr 10, 2026
Copy link
Copy Markdown
Contributor

@zclllyybb zclllyybb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 10, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

@BiteTheDDDDt BiteTheDDDDt force-pushed the add-code-style-skills-and-clang-tidy branch from bff5c2b to ff6635c Compare April 10, 2026 05:27
@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Apr 10, 2026
@BiteTheDDDDt
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found 2 issues in build-support/run-clang-tidy.sh that make the new incremental clang-tidy workflow report the wrong diagnostics in common review flows.\n\nFindings:\n1. build-support/run-clang-tidy.sh:172 and build-support/run-clang-tidy.sh:228 use git diff "${BASE_REF}" directly. In the documented --base origin/master workflow, that compares against the current tip of the base branch instead of the merge-base, so once origin/master moves forward the script starts analyzing files/lines changed only on the base branch rather than only this branch's changes.\n2. build-support/run-clang-tidy.sh:336-337 falls back to is_line_changed "${f}" "${local_line}" when the emitted diagnostic path does not match the diff map. That ignores the diagnostic's file path entirely, so a warning from an included header is incorrectly reported whenever the current source file also changed the same line number. This breaks the script's stated guarantee that header noise is filtered out.\n\nCritical checkpoint conclusions:\n- Goal and proof: The PR goal is to document and enforce code-style workflows, but the new clang-tidy helper does not yet reliably scope diagnostics to the current change set. The manual testing described in the PR did not cover these cases.\n- Scope/minimality: The change set is small and focused.\n- Concurrency: Not applicable; no concurrent logic added.\n- Lifecycle/static init: Not applicable.\n- Configuration: No new runtime config added.\n- Compatibility: No FE/BE protocol or storage compatibility impact.\n- Parallel code paths: The problematic --base logic is duplicated in both file discovery and line-range construction, so both paths need the same fix.\n- Special conditions/checks: The conditional path normalization logic is not sufficient; the fallback currently admits false positives.\n- Test coverage: Insufficient for the new script behavior. Cases missing include a branch that has diverged from the chosen base ref and a clang-tidy diagnostic emitted from an included header.\n- Observability: Existing output is adequate for a helper script.\n- Transaction/persistence/data-write concerns: Not applicable.\n- FE/BE variable passing: Not applicable.\n- Performance: No material concern found beyond the correctness issues above.\n- Other issues: None beyond the two findings above.

echo "Error: invalid git ref '${BASE_REF}'. Check for typos or fetch the remote first."
exit 1
fi
mapfile -t FILES_TO_CHECK < <(git diff --name-only --diff-filter=ACMR "${BASE_REF}" -- 'be/src/' 'be/test/' 'cloud/src/' 'cloud/test/')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using git diff "${BASE_REF}" here compares the working tree directly against the tip of the base ref, not against the branch point. In the documented --base origin/master PR-review workflow, that means once origin/master advances, this starts picking up files changed only on the base branch and can report warnings for code the current branch never touched. Please diff from git merge-base "${BASE_REF}" HEAD instead, and reuse the same merge-base in build_changed_lines_map().

local_line="${BASH_REMATCH[2]}"
# Normalize: clang-tidy may emit absolute or relative paths
local_file_rel="${local_file#"${DORIS_HOME}/"}"
if is_line_changed "${local_file_rel}" "${local_line}" || \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback is_line_changed "${f}" "${local_line}" is too broad. If clang-tidy emits a diagnostic for an included header at line N, this branch still accepts it whenever the current source file also changed line N, even though the header itself is not in the diff. That reintroduces exactly the header noise the script is trying to filter out. The fallback needs to normalize and compare the file path, not ignore it.

@BiteTheDDDDt BiteTheDDDDt force-pushed the add-code-style-skills-and-clang-tidy branch from ff6635c to 52fbf44 Compare April 10, 2026 05:39
@BiteTheDDDDt
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 1 issue in the new run-clang-tidy.sh flow.

Critical checkpoints:

  • Goal of the task: Partially met. The documentation and skills are added, but the new clang-tidy helper does not correctly handle the repository's default .clang-tidy behavior, so the advertised workflow is not reliable yet.
  • Small, clear, focused change: Yes.
  • Concurrency: Not applicable.
  • Lifecycle / static initialization: Not applicable.
  • Configuration changes: No runtime configuration added.
  • Compatibility / storage / symbol changes: Not applicable.
  • Parallel code paths: Applicable. The problem affects the main diff-based review path and any workflow that relies on the shared diagnostic parser.
  • Special conditional checks: The fatal-error detection is too broad and conflates promoted diagnostics with real analysis failures.
  • Test coverage: Insufficient. Manual verification did not cover the common case where .clang-tidy promotes warnings to errors.
  • Observability: Sufficient for this helper script.
  • Transaction / persistence / data-write correctness: Not applicable.
  • FE/BE variable passing: Not applicable.
  • Performance: No significant concern found in this review.
  • Other issues: None beyond the inline finding.

OUTPUT=$("${CLANG_TIDY}" "${TIDY_ARGS[@]}" "${f}" 2>&1) || TIDY_EXIT=$?

# Detect fatal analysis errors (missing compile command, broken AST, etc.)
if echo "${OUTPUT}" | grep -qE "(^|: )(error|fatal error):"; then
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.clang-tidy in this repo sets WarningsAsErrors: "*", so ordinary findings are emitted as error: diagnostics with a [-warnings-as-errors] suffix. That means this branch will treat a normal clang-tidy hit as a fatal analysis failure, print Error: clang-tidy could not analyze ..., and skip the changed-line filtering/counting path entirely.

I reproduced this locally with clang-tidy --warnings-as-errors='*' on a trivial file: a modernize-use-nullptr diagnostic is printed as error: even though analysis succeeded. Please distinguish real clang/AST failures from promoted diagnostics instead of keying off any error: line.

…clang-tidy script

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: N/A

Problem Summary:
The AGENTS.md and Copilot skills had no documentation about code style
enforcement workflows for BE (C++ clang-format/clang-tidy) or FE (Java
checkstyle). This made it unclear how and when to run formatting and
static analysis tools before committing code changes.

This PR adds:
1. A "Code Style Enforcement" subsection under "Coding Standards" in
   AGENTS.md covering BE clang-format, BE clang-tidy, and FE checkstyle.
2. Three new skills: be-code-style, fe-code-style, clang-tidy-check,
   each providing step-by-step procedures for fixing style issues.
3. A new build-support/run-clang-tidy.sh script that runs clang-tidy
   incrementally on only changed lines (not entire files) by parsing
   git diff, avoiding noise from pre-existing code.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - Verified check-format.sh, clang-tidy script (--help, --files,
      --base, --full, line-level filtering), and FE checkstyle all work
- Behavior changed: No
- Does this need documentation: No (this IS documentation)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@BiteTheDDDDt BiteTheDDDDt force-pushed the add-code-style-skills-and-clang-tidy branch from 52fbf44 to c974ff9 Compare April 10, 2026 05:52
@BiteTheDDDDt
Copy link
Copy Markdown
Contributor Author

run buildall

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 10, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@BiteTheDDDDt BiteTheDDDDt merged commit 3d75942 into apache:master Apr 10, 2026
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants