From 42c479c05b3502b8c83029ec0b23d221049d9c03 Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Tue, 7 Apr 2026 13:04:33 -0400 Subject: [PATCH 1/8] Add analyze-duplicates skill for jscpd-based duplication detection New skill that scans codebases and documentation for duplicated content using jscpd (token-based detection). Produces a consolidated Markdown report with an overview table, collapsible per-cluster details showing the duplicated fragments, and inline mediation recommendations (difficulty rating + refactoring strategy). Also generates an interactive HTML report via @jscpd/html-reporter. Includes generate-report.py helper that converts jscpd JSON output into the GitHub/Gitea-friendly report format. Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 --- README.md | 1 + analyze-duplicates/SKILL.md | 203 +++++++++++ analyze-duplicates/generate-report.py | 489 ++++++++++++++++++++++++++ 3 files changed, 693 insertions(+) create mode 100644 analyze-duplicates/SKILL.md create mode 100644 analyze-duplicates/generate-report.py diff --git a/README.md b/README.md index 3ed0805..d3b89ae 100644 --- a/README.md +++ b/README.md @@ -7,6 +7,7 @@ software project maintenance, triage, and automation. | Skill | Description | |-------|-------------| +| [analyze-duplicates](analyze-duplicates/) | Detect code and documentation duplication using jscpd, generate a Markdown report with collapsible `
` sections (suitable for GitHub/Gitea issues), and propose a mediation plan with refactoring strategies. | | [github-project-status](github-project-status/) | Assess whether a GitHub project is healthy, in maintenance mode, stagnant, or abandoned. Checks commits, releases, issues, PRs, forks, and package registries to produce a structured status report. | | [introduce-codespell](introduce-codespell/) | Add [codespell](https://github.com/codespell-project/codespell) spell-checking to a project end-to-end: config, GitHub Actions workflow, pre-commit hook, exclusion tuning, ambiguous-typo review, and automated fixes via `datalad run`. | | [introduce-git-bug](introduce-git-bug/) | Set up [git-bug](https://github.com/git-bug/git-bug) distributed issue tracking: configure GitHub bridge, sync issues, push `refs/bugs/*`, and document the workflow in DEVELOPMENT.md / CLAUDE.md. | diff --git a/analyze-duplicates/SKILL.md b/analyze-duplicates/SKILL.md new file mode 100644 index 0000000..476f911 --- /dev/null +++ b/analyze-duplicates/SKILL.md @@ -0,0 +1,203 @@ +--- +name: analyze-duplicates +description: Analyze codebase or documentation for code/text duplication using jscpd. Generates a Markdown report with collapsible sections (suitable for GitHub/Gitea issues) showing duplicate clusters, statistics, and a mediation plan proposing refactoring strategies. +allowed-tools: Bash, Read, Write, Glob, Grep, Agent +user-invocable: true +--- + +# Analyze Duplicates + +Detect code and documentation duplication in one or more paths, produce a +Markdown report with `
` sections for posting as a GitHub/Gitea issue, +and propose a concrete mediation plan. + +## When to Use + +- User wants to find duplicated code or documentation in a project +- User asks to "check for duplicates", "find copy-paste code", "DRY audit" +- User mentions "jscpd", "duplicate detection", or "code clones" +- User runs `/analyze-duplicates` + +## Configuration + +| Variable | Default | Description | +|----------|---------|-------------| +| `MIN_LINES` | `6` | Minimum duplicate block size in lines | +| `MIN_TOKENS` | `50` | Minimum duplicate block size in tokens | +| `THRESHOLD` | `5` | Duplication percentage that flags a warning | +| `FORMATS` | (auto-detect) | Comma-separated jscpd format list (e.g., `python,markdown`) | + +## Arguments + +The skill accepts one or more paths to scan. If none are provided, scan the +current working directory. + +Optional flags (passed as part of the argument string): +- `--formats python,markdown` — override auto-detected formats +- `--min-lines N` — override MIN_LINES +- `--min-tokens N` — override MIN_TOKENS +- `--threshold N` — override warning threshold percentage +- `--output PATH` — where to write the report (default: `.jscpd-report.md` in first scanned path) +- `--cross-project` — when multiple paths given, also run a combined scan to find cross-project duplicates +- `--no-html` — skip generating the HTML report (default: generate it) +- `--badge` — also generate an SVG badge and embed it in the report (default: off) + +## Execution Steps + +### Step 0: Parse Arguments + +Parse the argument string. Extract paths (any arg not starting with `--`), +and optional flags. Apply defaults from Configuration for anything not specified. + +If no paths provided, use the current working directory. + +Create the `.tmp/` directory in the current working directory for intermediate +output. If `.tmp` is not already in `.gitignore`, add it (or warn the user). + +### Step 1: Ensure jscpd is Available + +Check if jscpd is available: + +```bash +command -v jscpd || npx --yes jscpd@latest --version +``` + +If neither works, report the error and stop: +> jscpd not found. Install via `npm install -g jscpd` or ensure `npx` is available. + +### Step 2: Detect Project Context + +For each scan path: +1. Check if it is a git repository (`git -C PATH rev-parse --is-inside-work-tree`) +2. Detect primary languages by file extension counts (`.py` -> python, `.js/.ts` -> javascript/typescript, `.md` -> markdown, etc.) +3. If `--formats` was specified, use that instead of auto-detection +4. Note the project name from the directory basename (or git remote if available) + +### Step 3: Run jscpd + +For each scan path, run jscpd with JSON, HTML, and badge reporters: + +```bash +npx --yes jscpd@latest \ + --min-lines MIN_LINES \ + --min-tokens MIN_TOKENS \ + --reporters "json,html" \ + --output .tmp/jscpd-PROJECTNAME \ + --ignore "**/.tox/**,**/venv*/**,**/.venv/**,**/node_modules/**,**/__pycache__/**,**/build/**,**/dist/**,**/.eggs/**,**/.git/**,**/.npm/**,**/.tmp/**" \ + PATH +``` + +If `--no-html` is set, omit `html` from reporters. If `--badge` is set, add `badge` to reporters. +If `--formats` is set, add `--format FORMATS`. 
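
For concreteness, the conditional flag handling can be assembled programmatically.
A minimal sketch follows (the `opts` dict, its keys, and `build_jscpd_cmd` are
hypothetical illustrations, not part of the skill's required tooling):

```python
def build_jscpd_cmd(opts, path, project_name):
    """Assemble the jscpd argv from parsed skill options."""
    reporters = ["json"]
    if not opts.get("no_html"):
        reporters.append("html")
    if opts.get("badge"):
        reporters.append("badge")
    cmd = [
        "npx", "--yes", "jscpd@latest",
        "--min-lines", str(opts.get("min_lines", 6)),
        "--min-tokens", str(opts.get("min_tokens", 50)),
        "--reporters", ",".join(reporters),
        "--output", f".tmp/jscpd-{project_name}",
        "--ignore", opts["ignore_patterns"],  # comma-separated glob list
    ]
    if opts.get("formats"):
        cmd += ["--format", opts["formats"]]
    cmd.append(path)
    return cmd  # e.g. pass to subprocess.run(...)
```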
+ +This produces: +- `.tmp/jscpd-PROJECTNAME/jscpd-report.json` — structured data for the markdown report +- `.tmp/jscpd-PROJECTNAME/html/index.html` — interactive HTML report with syntax highlighting +- `.tmp/jscpd-PROJECTNAME/jscpd-badge.svg` — shields.io-style badge showing duplication % (only with `--badge`) + +If `--cross-project` and multiple paths: after individual scans, create a +temporary parent directory with symlinks to all paths and run one combined scan. + +### Step 4: Parse Results and Generate Report + +Read each `.tmp/jscpd-PROJECTNAME/jscpd-report.json` and generate the report +using the helper script: + +```bash +python3 SKILL_DIR/generate-report.py \ + --threshold THRESHOLD \ + --output REPORT_PATH \ + --jscpd-version "$(npx --yes jscpd@latest --version 2>/dev/null)" \ + [--cross-project .tmp/jscpd-combined/jscpd-report.json] \ + .tmp/jscpd-PROJECT1/jscpd-report.json \ + [.tmp/jscpd-PROJECT2/jscpd-report.json ...] +``` + +Where `SKILL_DIR` is the directory containing this SKILL.md file. Resolve it +by searching for `generate-report.py` in `~/.claude/skills/analyze-duplicates/`. + +If `--badge` was requested and a badge was generated, pass `--badge-path` with +a relative path to the SVG. Copy the badge SVG to the output directory so both +files are co-located. + +### Step 5: Review and Enhance Mediation Plan + +The `generate-report.py` script already produces a `## Mediation Plan` section +with heuristic classifications (trivial/easy/moderate/hard) and strategies +for each cluster. After the report is generated: + +1. Read the generated report and the duplicated fragments +2. For each cluster, **verify** the heuristic recommendation makes sense in + context — read the actual source files around the duplicated lines if needed +3. For **easy/trivial** clusters: add a concrete diff or pseudo-diff showing + the proposed refactoring (extract function, parametrize test, etc.) +4. For **moderate/hard** clusters: enhance the description with specifics + about what the shared abstraction should look like +5. Adjust difficulty ratings if the heuristic got it wrong (e.g., what looks + like a simple extract may actually involve different signatures) + +### Step 6: Present Results + +1. Print a brief summary to the console: + - Total duplication percentage per project + - Number of clone clusters found + - Whether threshold was exceeded +2. Print paths to all generated artifacts: + - Markdown report (the primary deliverable, suitable for GitHub issues) + - HTML report directory (interactive browser view with syntax highlighting) + - Badge SVG path (only if `--badge` was used) +3. If duplication exceeds the threshold, note this prominently + +## Report Format + +The report MUST be a Markdown file using `
` blocks so it +renders well when posted as a GitHub/Gitea issue. Structure: + +```markdown +# Duplication Analysis Report + +> Generated: YYYY-MM-DD | Tool: jscpd VERSION | Threshold: N% + +## Summary + +| Project | Files | Lines | Clones | Duplicated Lines | Percentage | +|------------|------:|------:|-------:|-----------------:|-----------:| +| my-project | 42 | 12000 | 5 | 83 | 0.69% | + +> Duplication is within the 5% threshold for all projects. + +## Duplicate Clusters + +| # | Lines | Difficulty | Strategy | Files | +|---|-------|-------------------|-------------------------------|---------| +| 1 | 8 | Trivial | Extract local helper function | file.py | + +
<summary>Cluster 1: [Trivial] `file.py` lines 10-18
↔ `file.py` lines 30-38 (8 lines)</summary>

**Files involved:**
- `file.py` (lines 10-18)
- `file.py` (lines 30-38)

**Duplicated fragment:**
~~~python
# ...the duplicated code fragment is shown here...
~~~

**Mediation** (Trivial): Extract local helper function

> Duplicated logic within `file.py`. Extract into a private function
> in the same module.

</details>
+``` + +## Commit Co-Authorship + +All commits created during this workflow MUST include a `Co-Authored-By` trailer. +Get the version via `claude --version`. Format: + +``` +Co-Authored-By: Claude Code / Claude +``` diff --git a/analyze-duplicates/generate-report.py b/analyze-duplicates/generate-report.py new file mode 100644 index 0000000..28722da --- /dev/null +++ b/analyze-duplicates/generate-report.py @@ -0,0 +1,489 @@ +#!/usr/bin/env python3 +"""Generate a Markdown duplication report from jscpd JSON output. + +Reads one or more jscpd-report.json files and produces a GitHub/Gitea-friendly +Markdown report with
sections for each duplicate cluster. + +Usage: + python3 generate-report.py [OPTIONS] REPORT_JSON [REPORT_JSON ...] + +Options: + --threshold N Duplication warning threshold percentage (default: 5) + --output PATH Output markdown file (default: stdout) + --cross-project PATH Include cross-project scan results from this JSON file +""" + +import argparse +import json +import sys +from datetime import datetime, timezone +from pathlib import Path + + +def load_report(path): + """Load a jscpd JSON report.""" + try: + with open(path) as f: + return json.load(f) + except (OSError, json.JSONDecodeError) as exc: + print(f"Error loading {path}: {exc}", file=sys.stderr) + sys.exit(1) + + +def guess_project_name(report_path): + """Infer project name from the report's parent directory name.""" + parent = Path(report_path).parent.name + # Strip jscpd- prefix if present + if parent.startswith("jscpd-"): + return parent[6:] + return parent + + +def format_language(fmt): + """Map jscpd format names to markdown code fence language hints.""" + mapping = { + "python": "python", + "javascript": "javascript", + "typescript": "typescript", + "markup": "html", + "markdown": "markdown", + "yaml": "yaml", + "json": "json", + "css": "css", + "go": "go", + "rust": "rust", + "java": "java", + "csharp": "csharp", + "ruby": "ruby", + "bash": "bash", + "shell": "bash", + } + return mapping.get(fmt, "") + + +def render_summary_table(projects): + """Render the summary table with human-aligned columns.""" + headers = ["Project", "Files", "Lines", "Clones", "Duplicated Lines", "Percentage"] + rows = [] + for p in projects: + stats = p["stats"] + rows.append([ + p["name"], + str(stats.get("sources", 0)), + str(stats.get("lines", 0)), + str(stats.get("clones", 0)), + str(stats.get("duplicatedLines", 0)), + f"{stats.get('percentage', 0.0):.2f}%", + ]) + + # Compute column widths (max of header and all row values) + widths = [len(h) for h in headers] + for row in rows: + for i, cell in enumerate(row): + widths[i] = max(widths[i], len(cell)) + + # First column left-aligned, rest right-aligned + def fmt_row(cells): + parts = [] + for i, cell in enumerate(cells): + if i == 0: + parts.append(f" {cell:<{widths[i]}} ") + else: + parts.append(f" {cell:>{widths[i]}} ") + return "|" + "|".join(parts) + "|" + + def fmt_sep(): + parts = [] + for i, w in enumerate(widths): + if i == 0: + parts.append("-" * (w + 2)) + else: + parts.append("-" * (w + 1) + ":") + return "|" + "|".join(parts) + "|" + + lines = [fmt_row(headers), fmt_sep()] + for row in rows: + lines.append(fmt_row(row)) + return "\n".join(lines) + + +def truncate_fragment(fragment, max_lines=30): + """Truncate long code fragments for readability.""" + lines = fragment.splitlines() + if len(lines) <= max_lines: + return fragment + kept = lines[:max_lines] + kept.append(f"... ({len(lines) - max_lines} more lines)") + return "\n".join(kept) + + +def classify_cluster(dup): + """Classify a duplicate cluster and propose mediation strategy. + + Returns (difficulty, strategy, rationale) where difficulty is one of: + trivial, easy, moderate, hard. 
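+
+    Heuristics, in the order checked below: documentation files, test
+    files, same-file clones, same-directory clones, then cross-package
+    clones, with difficulty scaled up for longer fragments.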
+ """ + first_name = dup["firstFile"]["name"] + second_name = dup["secondFile"]["name"] + n_lines = dup.get("lines", 0) + fmt = dup.get("format", "") + same_file = first_name == second_name + + # Detect test files + is_test = any( + t in n for n in (first_name, second_name) + for t in ("test_", "tests/", "_test.", "conftest", "spec.", "spec/") + ) + + # Detect documentation / markdown + is_docs = fmt in ("markdown", "markup") or any( + n.endswith((".md", ".rst", ".adoc")) for n in (first_name, second_name) + ) + + # Same directory? + first_dir = "/".join(first_name.split("/")[:-1]) + second_dir = "/".join(second_name.split("/")[:-1]) + same_dir = first_dir == second_dir + + if is_docs: + if same_file: + return ( + "easy", + "Consolidate repeated sections within this document", + "Same content appears multiple times in one file. " + "Merge into a single section and add internal cross-references.", + ) + return ( + "moderate", + "Create a canonical section and cross-reference", + "Duplicated documentation across files. Extract shared content " + "into a single authoritative location and reference it " + "(e.g., includes, links, or shortcodes).", + ) + + if is_test: + if same_file: + difficulty = "easy" if n_lines <= 10 else "moderate" + return ( + difficulty, + "Extract test fixture or parametrize", + "Duplicated test setup/assertions within one test file. " + "Use `@pytest.fixture`, `@pytest.mark.parametrize`, " + "or a helper function to share the common pattern.", + ) + return ( + "moderate", + "Extract shared test fixture to conftest.py", + "Duplicated test code across files. Move common setup into " + "`conftest.py` as a shared fixture, or into a test utilities module.", + ) + + if same_file: + difficulty = "trivial" if n_lines <= 8 else "easy" + return ( + difficulty, + "Extract local helper function", + f"Duplicated logic within `{first_name.split('/')[-1]}`. " + "Extract into a private function in the same module.", + ) + + if same_dir: + difficulty = "easy" if n_lines <= 10 else "moderate" + return ( + difficulty, + "Extract shared function into sibling module", + f"Duplicated code in same package (`{first_dir or './'}`). " + "Extract into a shared utility module within the package.", + ) + + # Different directories / packages + difficulty = "moderate" if n_lines <= 15 else "hard" + return ( + difficulty, + "Extract into shared library or utils package", + "Duplicated code across different packages. 
Consider a shared " + "utility module or library that both can import.", + ) + + +DIFFICULTY_LABELS = { + "trivial": "Trivial", + "easy": "Easy", + "moderate": "Moderate", + "hard": "Hard", +} + + +def render_overview_table(all_dups): + """Render a compact overview table of all clusters with mediation info.""" + headers = ["#", "Lines", "Difficulty", "Strategy", "Files"] + rows = [] + for i, (_proj, dup) in enumerate(all_dups, 1): + difficulty, strategy, _rationale = classify_cluster(dup) + first_short = dup["firstFile"]["name"].rsplit("/", 1)[-1] + second_short = dup["secondFile"]["name"].rsplit("/", 1)[-1] + if first_short == second_short: + files_str = first_short + else: + files_str = f"{first_short} / {second_short}" + label = DIFFICULTY_LABELS.get(difficulty, difficulty) + rows.append([ + str(i), + str(dup.get("lines", 0)), + label, + strategy, + files_str, + ]) + + widths = [len(h) for h in headers] + for row in rows: + for j, cell in enumerate(row): + widths[j] = max(widths[j], len(cell)) + + def fmt_row(cells): + parts = [] + for j, cell in enumerate(cells): + parts.append(f" {cell:<{widths[j]}} ") + return "|" + "|".join(parts) + "|" + + lines = [ + fmt_row(headers), + "|" + "|".join("-" * (w + 2) for w in widths) + "|", + ] + for row in rows: + lines.append(fmt_row(row)) + return "\n".join(lines) + + +def render_cluster(idx, dup, prefix=""): + """Render a single duplicate cluster as a
block with mediation.""" + fmt = dup.get("format", "") + lang = format_language(fmt) + first = dup["firstFile"] + second = dup["secondFile"] + n_lines = dup.get("lines", 0) + + first_name = first["name"] + second_name = second["name"] + first_range = f"lines {first['start']}-{first['end']}" + second_range = f"lines {second['start']}-{second['end']}" + + difficulty, strategy, rationale = classify_cluster(dup) + diff_label = DIFFICULTY_LABELS.get(difficulty, difficulty) + + label = f"{prefix}Cluster {idx}" + summary = ( + f"[{diff_label}] " + f"`{first_name}` {first_range} " + f"↔ `{second_name}` {second_range} " + f"({n_lines} lines)" + ) + + fragment = dup.get("fragment", "") + fragment = truncate_fragment(fragment) + + block = [ + "
", + f"{label}: {summary}", + "", + "**Files involved:**", + f"- `{first_name}` ({first_range})", + f"- `{second_name}` ({second_range})", + "", + ] + + if fragment.strip(): + fence = "~~~" + while fence in fragment: + fence += "~" + block.extend([ + "**Duplicated fragment:**", + f"{fence}{lang}", + fragment, + fence, + "", + ]) + + # Inline mediation recommendation + block.extend([ + f"**Mediation** ({diff_label}): {strategy}", + "", + f"> {rationale}", + "", + ]) + + block.extend(["
", ""]) + return "\n".join(block) + + +def render_report(projects, threshold, cross_project=None, jscpd_version=None, + badge_path=None): + """Render the full Markdown report.""" + now = datetime.now(timezone.utc).strftime("%Y-%m-%d") + version_str = jscpd_version or "unknown" + + parts = [ + "# Duplication Analysis Report", + "", + f"> Generated: {now} | Tool: jscpd {version_str} | Threshold: {threshold}%", + "", + ] + + if badge_path: + parts.append(f"![Copy/Paste]({badge_path})") + parts.append("") + + parts.extend([ + "## Summary", + "", + render_summary_table(projects), + "", + ]) + + # Status badge + any_over = any(p["stats"]["percentage"] > threshold for p in projects) + if any_over: + over = [p for p in projects if p["stats"]["percentage"] > threshold] + names = ", ".join(p["name"] for p in over) + parts.append( + f"> **WARNING**: Duplication exceeds {threshold}% threshold in: {names}" + ) + else: + parts.append( + f"> Duplication is within the {threshold}% threshold for all projects." + ) + parts.append("") + + # Collect all duplicates for the overview table + all_dups = [] + for p in projects: + for dup in sorted( + p.get("duplicates", []), + key=lambda d: d.get("lines", 0), + reverse=True, + ): + all_dups.append((p["name"], dup)) + if cross_project: + for dup in sorted( + cross_project.get("duplicates", []), + key=lambda d: d.get("lines", 0), + reverse=True, + ): + all_dups.append(("cross-project", dup)) + + # Duplicate Clusters section with overview table at the top + parts.append("## Duplicate Clusters") + parts.append("") + + if not all_dups: + parts.append("No duplicates found.") + parts.append("") + return "\n".join(parts) + + # Overview table + parts.append(render_overview_table(all_dups)) + parts.append("") + + # Per-project cluster details + global_idx = 1 + for p in projects: + if len(projects) > 1: + parts.append(f"### {p['name']}") + parts.append("") + + duplicates = p.get("duplicates", []) + if not duplicates: + parts.append("No duplicates found.") + parts.append("") + continue + + duplicates = sorted(duplicates, key=lambda d: d.get("lines", 0), reverse=True) + + for dup in duplicates: + parts.append(render_cluster(global_idx, dup)) + global_idx += 1 + + # Cross-project section + if cross_project: + cross_dups = cross_project.get("duplicates", []) + if cross_dups: + parts.append("### Cross-Project Duplicates") + parts.append("") + cross_dups = sorted( + cross_dups, key=lambda d: d.get("lines", 0), reverse=True + ) + for i, dup in enumerate(cross_dups, 1): + parts.append(render_cluster(global_idx, dup, prefix="Cross-project ")) + global_idx += 1 + + return "\n".join(parts) + + +def main(): + parser = argparse.ArgumentParser( + description="Generate Markdown duplication report from jscpd JSON" + ) + parser.add_argument( + "reports", nargs="+", help="Path(s) to jscpd-report.json files" + ) + parser.add_argument( + "--threshold", + type=float, + default=5.0, + help="Duplication warning threshold percentage (default: 5)", + ) + parser.add_argument( + "--output", + default=None, + help="Output markdown file (default: stdout)", + ) + parser.add_argument( + "--cross-project", + default=None, + help="Path to cross-project jscpd-report.json", + ) + parser.add_argument( + "--jscpd-version", + default=None, + help="jscpd version string for the report header", + ) + parser.add_argument( + "--badge-path", + default=None, + help="Relative path to the jscpd-badge.svg for embedding in report", + ) + args = parser.parse_args() + + projects = [] + for rpath in args.reports: + data = 
load_report(rpath) + name = guess_project_name(rpath) + projects.append({ + "name": name, + "stats": data.get("statistics", {}).get("total", {}), + "duplicates": data.get("duplicates", []), + }) + + cross_project_data = None + if args.cross_project: + cross_project_data = load_report(args.cross_project) + + report = render_report( + projects, + args.threshold, + cross_project=cross_project_data, + jscpd_version=args.jscpd_version, + badge_path=args.badge_path, + ) + + if args.output: + Path(args.output).parent.mkdir(parents=True, exist_ok=True) + with open(args.output, "w") as f: + f.write(report) + print(f"Report written to: {args.output}", file=sys.stderr) + else: + print(report) + + +if __name__ == "__main__": + main() From 4453aea47c6420f3a3e7ec849f5e7f8a0f215e8f Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Tue, 7 Apr 2026 13:08:54 -0400 Subject: [PATCH 2/8] analyze-duplicates: use local .tmp/, link files to remote, C not # - Use .tmp/ in the scanned project instead of /tmp/ for intermediate output (jscpd JSON, HTML reports) - Use "C" instead of "#" as the cluster column header to avoid GitHub auto-linking to issues/PRs - File references in cluster details are now hyperlinks to the file on the tracked remote (e.g., GitHub blob URL with line anchors) - Auto-detects repo URL and branch from git remote; also accepts --repo-url, --branch, --scan-path overrides Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 --- analyze-duplicates/SKILL.md | 11 +-- analyze-duplicates/generate-report.py | 100 +++++++++++++++++++++++--- 2 files changed, 95 insertions(+), 16 deletions(-) diff --git a/analyze-duplicates/SKILL.md b/analyze-duplicates/SKILL.md index 476f911..027b148 100644 --- a/analyze-duplicates/SKILL.md +++ b/analyze-duplicates/SKILL.md @@ -108,6 +108,7 @@ python3 SKILL_DIR/generate-report.py \ --threshold THRESHOLD \ --output REPORT_PATH \ --jscpd-version "$(npx --yes jscpd@latest --version 2>/dev/null)" \ + --scan-path PATH \ [--cross-project .tmp/jscpd-combined/jscpd-report.json] \ .tmp/jscpd-PROJECT1/jscpd-report.json \ [.tmp/jscpd-PROJECT2/jscpd-report.json ...] @@ -168,17 +169,17 @@ renders well when posted as a GitHub/Gitea issue. Structure: ## Duplicate Clusters -| # | Lines | Difficulty | Strategy | Files | -|---|-------|-------------------|-------------------------------|---------| -| 1 | 8 | Trivial | Extract local helper function | file.py | +| C | Lines | Difficulty | Strategy | Files | +|---|-------|------------|-------------------------------|---------| +| 1 | 8 | Trivial | Extract local helper function | file.py |
Cluster 1: [Trivial] `file.py` lines 10-18 ↔ `file.py` lines 30-38 (8 lines) **Files involved:** -- `file.py` (lines 10-18) -- `file.py` (lines 30-38) +- [`file.py` (lines 10-18)](https://github.com/owner/repo/blob/main/file.py#L10-L18) +- [`file.py` (lines 30-38)](https://github.com/owner/repo/blob/main/file.py#L30-L38) **Duplicated fragment:** ~~~python diff --git a/analyze-duplicates/generate-report.py b/analyze-duplicates/generate-report.py index 28722da..2276ae3 100644 --- a/analyze-duplicates/generate-report.py +++ b/analyze-duplicates/generate-report.py @@ -15,11 +15,54 @@ import argparse import json +import re +import subprocess import sys from datetime import datetime, timezone from pathlib import Path +def detect_git_info(scan_path): + """Try to detect the remote browse URL and branch for a git repo.""" + try: + branch = subprocess.run( + ["git", "-C", scan_path, "rev-parse", "--abbrev-ref", "HEAD"], + capture_output=True, text=True, timeout=5, + ).stdout.strip() + # Find the remote that the branch tracks, fall back to origin + tracking_remote = subprocess.run( + ["git", "-C", scan_path, "config", + f"branch.{branch}.remote"], + capture_output=True, text=True, timeout=5, + ).stdout.strip() or "origin" + remote_url = subprocess.run( + ["git", "-C", scan_path, "remote", "get-url", tracking_remote], + capture_output=True, text=True, timeout=5, + ).stdout.strip() + if not remote_url or not branch: + return None, None + # Convert git@ or https:// URL to browse URL + browse_url = remote_url + browse_url = re.sub(r"\.git$", "", browse_url) + browse_url = re.sub( + r"^git@([^:]+):", r"https://\1/", browse_url + ) + return browse_url, branch + except (subprocess.SubprocessError, OSError): + return None, None + + +def file_link(name, start, end, repo_url, branch): + """Format a file reference, as a hyperlink if repo info is available.""" + label = f"`{name}` (lines {start}-{end})" + if repo_url and branch: + # Strip leading ./ or ../ — jscpd paths are relative to scan dir + clean = re.sub(r"^(\.\./?)+" , "", name) + url = f"{repo_url}/blob/{branch}/{clean}#L{start}-L{end}" + return f"[{label}]({url})" + return label + + def load_report(path): """Load a jscpd JSON report.""" try: @@ -216,7 +259,7 @@ def classify_cluster(dup): def render_overview_table(all_dups): """Render a compact overview table of all clusters with mediation info.""" - headers = ["#", "Lines", "Difficulty", "Strategy", "Files"] + headers = ["C", "Lines", "Difficulty", "Strategy", "Files"] rows = [] for i, (_proj, dup) in enumerate(all_dups, 1): difficulty, strategy, _rationale = classify_cluster(dup) @@ -255,7 +298,7 @@ def fmt_row(cells): return "\n".join(lines) -def render_cluster(idx, dup, prefix=""): +def render_cluster(idx, dup, prefix="", repo_url=None, branch=None): """Render a single duplicate cluster as a
block with mediation.""" fmt = dup.get("format", "") lang = format_language(fmt) @@ -265,17 +308,21 @@ def render_cluster(idx, dup, prefix=""): first_name = first["name"] second_name = second["name"] - first_range = f"lines {first['start']}-{first['end']}" - second_range = f"lines {second['start']}-{second['end']}" + + first_link = file_link(first_name, first["start"], first["end"], + repo_url, branch) + second_link = file_link(second_name, second["start"], second["end"], + repo_url, branch) difficulty, strategy, rationale = classify_cluster(dup) diff_label = DIFFICULTY_LABELS.get(difficulty, difficulty) label = f"{prefix}Cluster {idx}" + # Summary line uses plain text (no links — they don't work inside ) summary = ( f"[{diff_label}] " - f"`{first_name}` {first_range} " - f"↔ `{second_name}` {second_range} " + f"`{first_name}` lines {first['start']}-{first['end']} " + f"↔ `{second_name}` lines {second['start']}-{second['end']} " f"({n_lines} lines)" ) @@ -287,8 +334,8 @@ def render_cluster(idx, dup, prefix=""): f"{label}: {summary}", "", "**Files involved:**", - f"- `{first_name}` ({first_range})", - f"- `{second_name}` ({second_range})", + f"- {first_link}", + f"- {second_link}", "", ] @@ -317,7 +364,7 @@ def render_cluster(idx, dup, prefix=""): def render_report(projects, threshold, cross_project=None, jscpd_version=None, - badge_path=None): + badge_path=None, repo_url=None, branch=None): """Render the full Markdown report.""" now = datetime.now(timezone.utc).strftime("%Y-%m-%d") version_str = jscpd_version or "unknown" @@ -400,7 +447,8 @@ def render_report(projects, threshold, cross_project=None, jscpd_version=None, duplicates = sorted(duplicates, key=lambda d: d.get("lines", 0), reverse=True) for dup in duplicates: - parts.append(render_cluster(global_idx, dup)) + parts.append(render_cluster(global_idx, dup, + repo_url=repo_url, branch=branch)) global_idx += 1 # Cross-project section @@ -413,7 +461,9 @@ def render_report(projects, threshold, cross_project=None, jscpd_version=None, cross_dups, key=lambda d: d.get("lines", 0), reverse=True ) for i, dup in enumerate(cross_dups, 1): - parts.append(render_cluster(global_idx, dup, prefix="Cross-project ")) + parts.append(render_cluster(global_idx, dup, + prefix="Cross-project ", + repo_url=repo_url, branch=branch)) global_idx += 1 return "\n".join(parts) @@ -452,8 +502,34 @@ def main(): default=None, help="Relative path to the jscpd-badge.svg for embedding in report", ) + parser.add_argument( + "--repo-url", + default=None, + help="Repository browse URL (e.g., https://github.com/owner/repo). " + "Auto-detected from git remote if not specified.", + ) + parser.add_argument( + "--branch", + default=None, + help="Branch name for file links. 
Auto-detected from git if not specified.", + ) + parser.add_argument( + "--scan-path", + default=".", + help="Path that was scanned (used for git auto-detection, default: .)", + ) args = parser.parse_args() + # Auto-detect repo URL and branch if not provided + repo_url = args.repo_url + branch = args.branch + if not repo_url or not branch: + auto_url, auto_branch = detect_git_info(args.scan_path) + if not repo_url: + repo_url = auto_url + if not branch: + branch = auto_branch + projects = [] for rpath in args.reports: data = load_report(rpath) @@ -474,6 +550,8 @@ def main(): cross_project=cross_project_data, jscpd_version=args.jscpd_version, badge_path=args.badge_path, + repo_url=repo_url, + branch=branch, ) if args.output: From 5c0ff9c3b47cb95afef46183181ba742c53e60df Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Tue, 7 Apr 2026 13:35:07 -0400 Subject: [PATCH 3/8] analyze-duplicates: smarter ignores, asset detection, better file labels MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - SKILL.md: don't blindly ignore build/ and dist/ — check if they're git-tracked first; find and ignore symlinks to avoid false positives - classify_cluster: detect asset files (SVG, images, fonts) and suggest keeping one copy + symlink/reference - Overview table: when two files share the same basename but live in different directories, show parent/file (e.g., configure-domain/SKILL.md / configure-domain/SKILL.md) to distinguish copies across build/dist Tested on smestern/sciagent which has diverged skill copies across build/, dist/, and templates/ directories. Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 --- analyze-duplicates/SKILL.md | 14 ++++++++++++- analyze-duplicates/generate-report.py | 29 ++++++++++++++++++++++++--- 2 files changed, 39 insertions(+), 4 deletions(-) diff --git a/analyze-duplicates/SKILL.md b/analyze-duplicates/SKILL.md index 027b148..b5226b0 100644 --- a/analyze-duplicates/SKILL.md +++ b/analyze-duplicates/SKILL.md @@ -83,13 +83,25 @@ npx --yes jscpd@latest \ --min-tokens MIN_TOKENS \ --reporters "json,html" \ --output .tmp/jscpd-PROJECTNAME \ - --ignore "**/.tox/**,**/venv*/**,**/.venv/**,**/node_modules/**,**/__pycache__/**,**/build/**,**/dist/**,**/.eggs/**,**/.git/**,**/.npm/**,**/.tmp/**" \ + --ignore "IGNORE_PATTERNS" \ PATH ``` If `--no-html` is set, omit `html` from reporters. If `--badge` is set, add `badge` to reporters. If `--formats` is set, add `--format FORMATS`. +**Building the ignore list** — start with these safe defaults: +`**/.tox/**,**/venv*/**,**/.venv/**,**/node_modules/**,**/__pycache__/**,**/.eggs/**,**/.git/**,**/.npm/**,**/.tmp/**` + +Then for each of `build/`, `dist/`, `.eggs/`: +- Check if the directory is **tracked by git** (`git ls-files --error-unmatch DIR/ 2>/dev/null`) +- If tracked: do NOT ignore it (it's intentionally committed content) +- If untracked: add it to the ignore list + +Additionally, find all **symlinks** in the scan path (`find PATH -type l`) and +add ignore patterns for them (e.g., `**/symlinked-dir/**`). Symlinked content +is intentionally shared — duplicates from symlinks are noise, not bugs. 
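+
+A rough sketch of that assembly in Python (helper name and structure are
+illustrative only; plain shell works just as well):
+
+```python
+import subprocess
+from pathlib import Path
+
+# Abbreviated; start from the full safe-default list above.
+DEFAULTS = ["**/.tox/**", "**/node_modules/**", "**/__pycache__/**",
+            "**/.git/**", "**/.tmp/**"]
+
+def build_ignore_list(scan_path):
+    patterns = list(DEFAULTS)
+    for d in ("build", "dist", ".eggs"):
+        tracked = subprocess.run(
+            ["git", "-C", scan_path, "ls-files", "--error-unmatch", f"{d}/"],
+            capture_output=True).returncode == 0
+        if not tracked:  # untracked build output, safe to ignore
+            patterns.append(f"**/{d}/**")
+    for p in Path(scan_path).rglob("*"):
+        if p.is_symlink():  # symlinked content is shared on purpose
+            patterns.append(f"**/{p.relative_to(scan_path)}/**")
+    return ",".join(patterns)
+```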
+ This produces: - `.tmp/jscpd-PROJECTNAME/jscpd-report.json` — structured data for the markdown report - `.tmp/jscpd-PROJECTNAME/html/index.html` — interactive HTML report with syntax highlighting diff --git a/analyze-duplicates/generate-report.py b/analyze-duplicates/generate-report.py index 2276ae3..332f7e3 100644 --- a/analyze-duplicates/generate-report.py +++ b/analyze-duplicates/generate-report.py @@ -178,6 +178,22 @@ def classify_cluster(dup): for t in ("test_", "tests/", "_test.", "conftest", "spec.", "spec/") ) + # Detect generated / binary-like artifacts (SVG, images, configs) + is_asset = any( + n.endswith((".svg", ".png", ".jpg", ".ico", ".woff", ".woff2", ".eot", ".ttf")) + for n in (first_name, second_name) + ) + if is_asset: + if same_file: + return ("trivial", "Internal duplication in asset file", + "Repeated content within an asset file. Usually harmless.") + return ( + "easy", + "Deduplicate asset — keep one copy and reference it", + "Same asset committed in multiple locations. " + "Keep a single canonical copy and reference/symlink from other locations.", + ) + # Detect documentation / markdown is_docs = fmt in ("markdown", "markup") or any( n.endswith((".md", ".rst", ".adoc")) for n in (first_name, second_name) @@ -263,10 +279,17 @@ def render_overview_table(all_dups): rows = [] for i, (_proj, dup) in enumerate(all_dups, 1): difficulty, strategy, _rationale = classify_cluster(dup) - first_short = dup["firstFile"]["name"].rsplit("/", 1)[-1] - second_short = dup["secondFile"]["name"].rsplit("/", 1)[-1] - if first_short == second_short: + first_name = dup["firstFile"]["name"] + second_name = dup["secondFile"]["name"] + first_short = first_name.rsplit("/", 1)[-1] + second_short = second_name.rsplit("/", 1)[-1] + if first_name == second_name: files_str = first_short + elif first_short == second_short: + # Same filename in different dirs — show parent/file + first_ctx = "/".join(first_name.rsplit("/", 2)[-2:]) + second_ctx = "/".join(second_name.rsplit("/", 2)[-2:]) + files_str = f"{first_ctx} / {second_ctx}" else: files_str = f"{first_short} / {second_short}" label = DIFFICULTY_LABELS.get(difficulty, difficulty) From 42abc552c3384304c61a83ec2d17766da5b15f15 Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Tue, 7 Apr 2026 15:14:54 -0400 Subject: [PATCH 4/8] analyze-duplicates: add %file column showing clone coverage Add a %file column to the overview table showing what percentage of the smaller involved file is covered by the clone. This distinguishes full-file copies (100%) from partial overlaps (e.g., 29%), making it immediately clear which clusters are diverged copies vs shared snippets. For clusters where %file >= 50%, the detail summary also shows "N% of file" inline. 
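
For example, an 18-line clone between an 18-line file and a 120-line
file scores 18/18 = 100% (the smaller file is wholly a copy), while the
same clone measured against the larger file would read only 15%.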
Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 --- analyze-duplicates/generate-report.py | 53 ++++++++++++++++++++++----- 1 file changed, 44 insertions(+), 9 deletions(-) diff --git a/analyze-duplicates/generate-report.py b/analyze-duplicates/generate-report.py index 332f7e3..d522120 100644 --- a/analyze-duplicates/generate-report.py +++ b/analyze-duplicates/generate-report.py @@ -63,6 +63,32 @@ def file_link(name, start, end, repo_url, branch): return label +def build_file_lines_map(data): + """Build a {filepath: total_lines} map from jscpd statistics.""" + file_lines = {} + for _fmt_name, fmt_data in data.get("statistics", {}).get("formats", {}).items(): + for fpath, finfo in fmt_data.get("sources", {}).items(): + file_lines[fpath] = finfo.get("lines", 0) + return file_lines + + +def clone_file_percent(dup, file_lines): + """Compute what % of the smaller file is covered by the clone. + + Returns an int 0-100. Uses the smaller of the two files as denominator + so that "95%" means the clone is nearly the entire file. + """ + first_name = dup["firstFile"]["name"] + second_name = dup["secondFile"]["name"] + first_total = file_lines.get(first_name, 0) + second_total = file_lines.get(second_name, 0) + smaller = min(first_total, second_total) if first_total and second_total else 0 + if smaller == 0: + return 0 + clone_lines = dup.get("lines", 0) + return min(100, round(100 * clone_lines / smaller)) + + def load_report(path): """Load a jscpd JSON report.""" try: @@ -273,9 +299,9 @@ def classify_cluster(dup): } -def render_overview_table(all_dups): +def render_overview_table(all_dups, file_lines): """Render a compact overview table of all clusters with mediation info.""" - headers = ["C", "Lines", "Difficulty", "Strategy", "Files"] + headers = ["C", "Lines", "%file", "Difficulty", "Strategy", "Files"] rows = [] for i, (_proj, dup) in enumerate(all_dups, 1): difficulty, strategy, _rationale = classify_cluster(dup) @@ -286,16 +312,17 @@ def render_overview_table(all_dups): if first_name == second_name: files_str = first_short elif first_short == second_short: - # Same filename in different dirs — show parent/file first_ctx = "/".join(first_name.rsplit("/", 2)[-2:]) second_ctx = "/".join(second_name.rsplit("/", 2)[-2:]) files_str = f"{first_ctx} / {second_ctx}" else: files_str = f"{first_short} / {second_short}" label = DIFFICULTY_LABELS.get(difficulty, difficulty) + pct = clone_file_percent(dup, file_lines) rows.append([ str(i), str(dup.get("lines", 0)), + f"{pct}%", label, strategy, files_str, @@ -321,7 +348,8 @@ def fmt_row(cells): return "\n".join(lines) -def render_cluster(idx, dup, prefix="", repo_url=None, branch=None): +def render_cluster(idx, dup, prefix="", repo_url=None, branch=None, + file_lines=None): """Render a single duplicate cluster as a
block with mediation.""" fmt = dup.get("format", "") lang = format_language(fmt) @@ -339,14 +367,16 @@ def render_cluster(idx, dup, prefix="", repo_url=None, branch=None): difficulty, strategy, rationale = classify_cluster(dup) diff_label = DIFFICULTY_LABELS.get(difficulty, difficulty) + pct = clone_file_percent(dup, file_lines or {}) label = f"{prefix}Cluster {idx}" + pct_str = f" {pct}% of file" if pct >= 50 else "" # Summary line uses plain text (no links — they don't work inside ) summary = ( f"[{diff_label}] " f"`{first_name}` lines {first['start']}-{first['end']} " f"↔ `{second_name}` lines {second['start']}-{second['end']} " - f"({n_lines} lines)" + f"({n_lines} lines{pct_str})" ) fragment = dup.get("fragment", "") @@ -387,7 +417,7 @@ def render_cluster(idx, dup, prefix="", repo_url=None, branch=None): def render_report(projects, threshold, cross_project=None, jscpd_version=None, - badge_path=None, repo_url=None, branch=None): + badge_path=None, repo_url=None, branch=None, file_lines=None): """Render the full Markdown report.""" now = datetime.now(timezone.utc).strftime("%Y-%m-%d") version_str = jscpd_version or "unknown" @@ -451,7 +481,7 @@ def render_report(projects, threshold, cross_project=None, jscpd_version=None, return "\n".join(parts) # Overview table - parts.append(render_overview_table(all_dups)) + parts.append(render_overview_table(all_dups, file_lines or {})) parts.append("") # Per-project cluster details @@ -471,7 +501,8 @@ def render_report(projects, threshold, cross_project=None, jscpd_version=None, for dup in duplicates: parts.append(render_cluster(global_idx, dup, - repo_url=repo_url, branch=branch)) + repo_url=repo_url, branch=branch, + file_lines=file_lines)) global_idx += 1 # Cross-project section @@ -486,7 +517,8 @@ def render_report(projects, threshold, cross_project=None, jscpd_version=None, for i, dup in enumerate(cross_dups, 1): parts.append(render_cluster(global_idx, dup, prefix="Cross-project ", - repo_url=repo_url, branch=branch)) + repo_url=repo_url, branch=branch, + file_lines=file_lines)) global_idx += 1 return "\n".join(parts) @@ -554,9 +586,11 @@ def main(): branch = auto_branch projects = [] + file_lines = {} for rpath in args.reports: data = load_report(rpath) name = guess_project_name(rpath) + file_lines.update(build_file_lines_map(data)) projects.append({ "name": name, "stats": data.get("statistics", {}).get("total", {}), @@ -575,6 +609,7 @@ def main(): badge_path=args.badge_path, repo_url=repo_url, branch=branch, + file_lines=file_lines, ) if args.output: From cfcc9115546585be77837bfe35dd50bf72e3e22e Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Wed, 29 Apr 2026 08:39:14 -0400 Subject: [PATCH 5/8] github-project-status: derive repo from git remote when no arg given Adds a "no argument" code path: run `git remote -v` and pick upstream by priority (`upstream` -> `origin` -> first GitHub remote), extracting `owner/repo` from either HTTPS or SSH URL forms. Lets the skill be invoked from inside a checkout without retyping the slug. 
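
For illustration, the slug extraction amounts to (hypothetical helper;
the skill itself is prose instructions, not code):

    import re

    def extract_slug(remote_url):
        # Handles https://github.com/owner/repo(.git) and
        # git@github.com:owner/repo(.git) remote URL forms.
        m = re.search(r"github\.com[:/]([^/]+)/([^/\s]+?)(?:\.git)?$",
                      remote_url)
        return f"{m.group(1)}/{m.group(2)}" if m else None
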
Co-Authored-By: Claude Code 2.1.123 / Claude Opus 4.7 --- github-project-status/SKILL.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/github-project-status/SKILL.md b/github-project-status/SKILL.md index 545e193..f39d8aa 100644 --- a/github-project-status/SKILL.md +++ b/github-project-status/SKILL.md @@ -23,6 +23,12 @@ Accept GitHub project references in these formats: - Full URL: `https://github.com/owner/repo` - Short form: `owner/repo` - Organization URL: `https://github.com/org` (analyze main repos) +- **No argument given**: Run `git remote -v` in the current directory and identify the upstream repository. Use the first match from this priority order: + 1. Remote named `upstream` + 2. Remote named `origin` + 3. First remote with a GitHub URL + + Extract `owner/repo` from the remote URL (handles both `https://github.com/owner/repo` and `git@github.com:owner/repo` formats). ## Analysis Steps From 2bab6d4fb2f0f3f31090ff52d22e84a442e81beb Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Wed, 29 Apr 2026 08:39:42 -0400 Subject: [PATCH 6/8] pr-feedback-review: add companion-account auth and red/green fix workflow - Add AI_COMPANION_TOKEN_FILE config (default `~/.claude/gh-token`) so reply scripts can post as a dedicated bot account (e.g. `yarikoptic-gitmate`) instead of the user's personal account. Reply scripts source the token file and print the posting account for verification. - For [ADDRESSED] comments, prescribe a red/green TDD loop: extend a failing test, apply the minimal fix, run the suite, commit, and reference the commit SHA in the reply so reviewers can click through. Falls back to "propose only" when a fix is too risky to apply immediately. - Update Step 9 prompt to reflect that fixes are committed earlier and the remaining decision is whether to push and post replies. Co-Authored-By: Claude Code 2.1.123 / Claude Opus 4.7 --- pr-feedback-review/SKILL.md | 45 +++++++++++++++++++++++++++++++++---- 1 file changed, 41 insertions(+), 4 deletions(-) diff --git a/pr-feedback-review/SKILL.md b/pr-feedback-review/SKILL.md index f1384a3..af87edf 100644 --- a/pr-feedback-review/SKILL.md +++ b/pr-feedback-review/SKILL.md @@ -19,6 +19,12 @@ This skill uses the following values. Adjust for your setup by editing this sect - **SCAN_DIRS**: `~/proj` — comma-separated parent directories to scan for git repos - **GITHUB_USER**: `yarikoptic` — your GitHub username - **MAX_SCAN_DEPTH**: `3` — how deep to recurse when scanning for repos +- **AI_COMPANION_TOKEN_FILE**: `~/.claude/gh-token` — path to a shell-sourceable + file that exports `GH_TOKEN` for an AI companion GitHub account (e.g. + `yarikoptic-gitmate`). When present, reply scripts use this token so + responses are posted from the companion account rather than the user's + personal account. The file should contain `export GH_TOKEN=github_pat_...`. + Set to empty string to disable and post as yourself. Throughout this document, these names refer to the configured values above. @@ -230,7 +236,17 @@ recommendation based on its type and actionability: - Show the comment text and the relevant code context - If a ` ```suggestion ` block exists, show exactly what it would change (before/after) -- Propose a specific code edit to apply +- **Fix the issue directly** using red/green TDD: + 1. **Red**: Write or extend a test that exposes the bug/missing behavior. + Prefer extending an existing test over creating a new one. Run it to + confirm it fails. + 2. **Green**: Apply the minimal code fix. Run the test to confirm it passes. + 3. 
**Verify**: Run the broader test suite to ensure no regressions. + 4. **Commit**: Create a commit with a message referencing the review comment + (e.g. "Spotted by Copilot review on PR #NNN"). + 5. Include the commit SHA in the reply so the reviewer can verify the fix. +- If the fix is too complex or risky to apply immediately, propose the edit + and note it as a follow-up instead of committing. **Dismissible comments** (actionable=no): - Draft a concise response explaining why no change is needed @@ -321,9 +337,30 @@ selectively run replies. -f body="" > /dev/null && echo " replied to COMMENT_ID" \ || echo " FAILED to reply to COMMENT_ID" ``` + For `[ADDRESSED]` comments that were fixed via commit (Step 7), include + the short commit SHA and first line of the commit message in the reply + body, e.g.: "Fixed in abc1234 `BF: forward recursion_limit ...` — ..." + so the reviewer can click through to verify the fix. 3. Script format requirements: - - Header with `#!/bin/bash`, `set -e`, and `REPO`/`PR` variables + - Header with `#!/bin/bash`, `set -e`, `REPO`/`PR` variables, and + companion token setup. When `$AI_COMPANION_TOKEN_FILE` is configured + and the file exists, source it at the top of the script so all `gh api` + calls authenticate as the companion account: + ```bash + #!/bin/bash + set -e + REPO="OWNER/REPO" + PR=NUMBER + + # Authenticate as AI companion account + source ~/.claude/gh-token # exports GH_TOKEN + export GH_TOKEN + ``` + Also add a verification line that prints which account is posting: + ```bash + echo "Posting replies as: $(gh api user --jq .login)" + ``` - Each comment block has: - A comment line with file, line number, short description, and `[ADDRESSED]`, `[DISMISSED]`, or `[DISCUSS]` tag @@ -346,10 +383,10 @@ selectively run replies. ### Step 9 — Interactive Follow-up +Actionable issues should already be fixed and committed in Step 7. After presenting the report, ask the user: -- "Should I apply the suggested code changes?" (if any actionable suggestions exist) -- "Should I post the reply script?" (if the script was generated) +- "Should I push and post the reply script?" (if fixes were committed) - "Any comments you want to re-classify or handle differently?" Wait for the user's response before taking any further action. From 73742816da5c3ea5b7c956d2c70774f43921b7fd Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Wed, 29 Apr 2026 08:53:46 -0400 Subject: [PATCH 7/8] Add reuse-compliance skill Move the REUSE-compliance helper out of `~/.claude/commands/` into a proper skill in this repo so it can be shared (and symlinked to `~/.claude/skills/`) like the other CON skills. Content is preserved from the original command, with two adjustments: - Skill-format frontmatter (`name`, `allowed-tools`, `user-invocable`) with a more discovery-friendly description listing concrete trigger scenarios. - Added a "When to Use" section and a "Commit Co-Authorship" section matching the convention used by introduce-codespell and friends. The proposed-structure example no longer mentions a `.reuseignore` file (REUSE 3.x deprecated it; the body of the skill already explains this and recommends `.gitignore` instead). 
Co-Authored-By: Claude Code 2.1.123 / Claude Opus 4.7 --- README.md | 1 + reuse-compliance/SKILL.md | 507 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 508 insertions(+) create mode 100644 reuse-compliance/SKILL.md diff --git a/README.md b/README.md index d3b89ae..8423ab7 100644 --- a/README.md +++ b/README.md @@ -14,6 +14,7 @@ software project maintenance, triage, and automation. | [issue-triage](issue-triage/) | Triage open GitHub issues by cross-referencing the codebase and git history. Detects duplicates, drafts proposed comments, and serves results in a local web dashboard. Includes Python helper scripts for gathering and serving data. | | [pr-feedback-review](pr-feedback-review/) | Load a PR's review feedback (human + bot), classify each comment by type and actionability, and recommend what to address vs dismiss — with draft code changes and responses. Works from a local repo or a PR URL. | | [pr-review-update](pr-review-update/) | Scan an [improveit-dashboard](https://github.com/yarikoptic/improveit-dashboard) for PRs awaiting your response, assess confidence, auto-rebase codespell PRs, and produce copy-paste-ready push commands. | +| [reuse-compliance](reuse-compliance/) | Set up and validate [REUSE](https://reuse.software/) licensing compliance: `LICENSES/`, `REUSE.toml`, SPDX headers, and integration with tox / pre-commit / Makefile / GitHub Actions. Handles BIDS dataset data-vs-code separation, [DUO](https://github.com/EBISPOT/DUO) data-use ontology codes, and DEP-3 patch tagging for vendoring repos. | | [scan-projects](scan-projects/) | Walk subdirectories of git repos, collect metadata (language, license, commit dates, remote URL), and generate concise LLM-produced summaries into a `projects.tsv` file. Ships with helper scripts for batch updates. | | [tinuous-analyzer](tinuous-analyzer/) | Analyze CI log collections gathered by [con/tinuous](https://github.com/con/tinuous/) to pinpoint when a test started failing, diff environment/dependency changes between passing and failing runs, and recommend investigation steps. | diff --git a/reuse-compliance/SKILL.md b/reuse-compliance/SKILL.md new file mode 100644 index 0000000..07857d9 --- /dev/null +++ b/reuse-compliance/SKILL.md @@ -0,0 +1,507 @@ +--- +name: reuse-compliance +description: Set up and validate REUSE specification compliance (LICENSES/ directory, REUSE.toml, SPDX headers) for software projects and BIDS datasets. Covers BIDS data-vs-code separation, DUO (Data Use Ontology) integration, DEP-3 patch tagging for vendoring repos, and integration with tox / pre-commit / Makefile / GitHub Actions. Use when adding licensing metadata to a project, fixing `reuse lint` failures, licensing a BIDS dataset, or annotating patches in a vendoring repo. +allowed-tools: Bash, Read, Edit, Write, Glob, Grep, AskUserQuestion +user-invocable: true +--- + +# REUSE Compliance Skill + +Implement the [REUSE specification](https://reuse.software/) for clear, +machine-readable licensing and copyright information. Includes special +support for BIDS datasets, Data Use Ontology (DUO) integration, and +DEP-3 patch tagging for vendoring repositories. 
+ +## When to Use + +- User wants to add REUSE / SPDX licensing metadata to a project +- User asks to "introduce REUSE" or runs `/reuse-compliance` +- `reuse lint` is failing and needs to be brought to 100% compliance +- User is licensing a BIDS dataset (data + code + docs separately) +- User is annotating `*.patch` files in a vendoring repo with DEP-3 headers +- User wants to integrate REUSE checks into tox / pre-commit / CI + +## Overview + +**REUSE Specification** provides standardized practices for declaring copyright and licensing information in software projects and datasets. + +**DUO (Data Use Ontology)** from GA4GH provides machine-readable codes for data use restrictions and conditions, particularly for health/biomedical research data. + +**Integration**: REUSE handles copyright/licensing (legal permissions), while DUO handles consent-based data use restrictions (ethical/regulatory constraints). + +## Key Concepts + +### REUSE Core Components + +1. **LICENSES/** directory: Contains full license texts (e.g., Apache-2.0.txt, CC0-1.0.txt) +2. **REUSE.toml**: Configuration file with copyright and license annotations +3. **`.gitignore`**: REUSE 3.x honors `.gitignore` for excluding build artifacts/caches. + (`.reuseignore` was deprecated; do not create new `.reuseignore` files.) +4. **SPDX headers**: In-file copyright/license declarations + +### REUSE.toml `precedence` field + +Each `[[annotations]]` block takes a `precedence` value that controls how the +block-level annotation interacts with in-file SPDX headers: + +- `"aggregate"` — block annotation + any in-file SPDX header are combined + (good default for most blocks). +- `"closest"` — in-file SPDX header wins if present; otherwise the block + applies. **Use this whenever per-file overrides are expected** (e.g. + patch files with DEP-3/SPDX headers, vendored sub-trees with mixed + authorship). +- `"override"` — block always wins, even over in-file SPDX. Rarely the + right choice; use only when you cannot trust file headers (e.g. + generated/vendored files with stale or missing tags). + +### REUSE scope: per-working-tree, not per-branch + +`REUSE.toml` describes the working tree it lives in. If a repository has +substantially different content across branches (e.g. a vendoring repo +with an `upstream/` branch tracking unmodified upstream alongside a +`master` branch with local patches), state this in the README and let +each branch carry its own `REUSE.toml` (or none, deferring to upstream's +own copyright file). This is uncommon — most projects only need one +`REUSE.toml` on the default branch. + +### BIDS Dataset Considerations + +Per [bids-specification#2015](https://github.com/bids-standard/bids-specification/issues/2015): +- **dataset_description.json**: Contains `License` field for data portion +- **Multiple licenses**: Code components may need separate licensing from data +- **REUSE.toml in BIDS**: Should clarify data vs. 
code licensing +- **DUO annotations**: Can supplement licenses with data use conditions + +### DUO Integration + +Per [bids-specification#2078](https://github.com/bids-standard/bids-specification/issues/2078) and [reuse-tool#1148](https://github.com/fsfe/reuse-tool/issues/1148): +- DUO codes describe data use conditions beyond licensing +- Examples: "no re-identification" (DUO:0000028), "general research use" (DUO:0000042) +- Can be included in REUSE.toml or dataset_description.json +- See: https://github.com/EBISPOT/DUO + +## Commit Co-Authorship + +All commits created during this workflow MUST include a `Co-Authored-By` trailer identifying +both Claude Code version and the model used. Get the version via `claude --version` and +use the model name from the environment. Format: + +``` +Co-Authored-By: Claude Code / Claude +``` + +Example: +``` +Co-Authored-By: Claude Code 2.1.123 / Claude Opus 4.7 +``` + +## Execution Steps + +When this skill is invoked, follow these steps: + +### 1. Assess Current State + +**Check for existing REUSE infrastructure:** +- Look for LICENSES/ directory +- Check if REUSE.toml or .reuse/dep5 exists +- Check `.gitignore` for build-artifact exclusions (REUSE 3.x honors it; + if a legacy `.reuseignore` exists, plan to migrate its entries to + `.gitignore` and remove it) +- Scan for SPDX headers in files + +**Check for BIDS dataset:** +- Look for dataset_description.json +- Check if it contains License field +- Identify data files vs. code files (scripts/, code/) + +**Check for build system integration:** +- Check if tox.ini exists → suggest adding [testenv:reuse] +- Check if .pre-commit-config.yaml exists → suggest adding reuse hook +- Check if Makefile exists → suggest adding reuse target +- Check if .github/workflows/ exists → suggest adding reuse check + +### 2. Propose REUSE Structure + +**For general projects:** +``` +LICENSES/ +├── Apache-2.0.txt # Main code license +├── CC-BY-4.0.txt # Documentation license +└── CC0-1.0.txt # Public domain data + +REUSE.toml # License annotations +``` + +**For BIDS datasets:** +``` +LICENSES/ +├── CC0-1.0.txt # Data license (if public domain) +├── CC-BY-4.0.txt # Data license (if attribution required) +└── MIT.txt # Code/scripts license + +REUSE.toml # Separate annotations for data vs code +dataset_description.json # License field + optional DUO codes +``` + +### 3. 
+
+### 3. Create REUSE.toml
+
+Generate appropriate annotations:
+
+**Standard Project Template:**
+```toml
+version = 1
+
+[[annotations]]
+path = [
+    "src/**",
+    "tests/**",
+    "*.py",
+    "*.md",
+    ".github/**",
+]
+precedence = "aggregate"
+SPDX-FileCopyrightText = "YEAR AUTHOR <email>"
+SPDX-License-Identifier = "LICENSE-ID"
+
+[[annotations]]
+path = ["data/**"]
+precedence = "aggregate"
+SPDX-FileCopyrightText = "YEAR DATA-PROVIDER"
+SPDX-License-Identifier = "CC0-1.0"
+```
+
+**BIDS Dataset Template:**
+```toml
+version = 1
+
+# BIDS data files
+[[annotations]]
+path = [
+    "sub-*/**/*.nii.gz",
+    "sub-*/**/*.json",
+    "sub-*/**/*.tsv",
+    "participants.tsv",
+    "participants.json",
+    "*.tsv",
+    "*.json",
+]
+precedence = "aggregate"
+SPDX-FileCopyrightText = "YEAR DATA-COLLECTORS"
+SPDX-License-Identifier = "CC0-1.0"
+# Optional DUO annotation (if applicable)
+# DataUseOntology = ["DUO:0000042"]  # General research use
+
+# BIDS code/derivatives
+[[annotations]]
+path = [
+    "code/**",
+    "derivatives/**/*.py",
+    "derivatives/**/*.sh",
+]
+precedence = "aggregate"
+SPDX-FileCopyrightText = "YEAR DEVELOPERS"
+SPDX-License-Identifier = "MIT"
+
+# Documentation
+[[annotations]]
+path = ["README*", "CHANGES*", "dataset_description.json"]
+precedence = "aggregate"
+SPDX-FileCopyrightText = "YEAR AUTHORS"
+SPDX-License-Identifier = "CC-BY-4.0"
+```
+
+### 4. Handle BIDS dataset_description.json
+
+**Current format (BIDS 1.x):**
+```json
+{
+  "Name": "Dataset Name",
+  "BIDSVersion": "1.9.0",
+  "License": "CC0"
+}
+```
+
+**Proposed enhanced format (per bids-spec#2015 and #2078):**
+```json
+{
+  "Name": "Dataset Name",
+  "BIDSVersion": "1.9.0",
+  "License": "CC0",
+  "DataUseOntology": [
+    "DUO:0000042",
+    "DUO:0000028"
+  ],
+  "DataUseDescription": "General research use; No re-identification"
+}
+```
+
+**Common DUO codes:**
+- `DUO:0000042` - General research use
+- `DUO:0000028` - No re-identification
+- `DUO:0000006` - Health or medical or biomedical research
+- `DUO:0000007` - Disease-specific research
+- `DUO:0000021` - Ethics approval required
+- `DUO:0000043` - Clinical care use
+
+### 5. Exclude build artifacts via `.gitignore`
+
+REUSE 3.x honors `.gitignore` — anything matched there is automatically
+skipped by `reuse lint`. **Do not create a `.reuseignore` file** (it is
+deprecated). Add build artifacts and caches to `.gitignore` if not
+already there:
+
+```gitignore
+# Build artifacts and caches
+.tox/
+.venv*/
+__pycache__/
+*.egg-info/
+build/
+dist/
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
+node_modules/
+```
+
+**BIDS-specific excludes:**
+```gitignore
+# BIDS working directories (if any)
+sourcedata/
+work/
+.bidsignore
+.datalad/
+```
+
+If a legacy `.reuseignore` exists, migrate its entries to `.gitignore`
+and delete the file. Large generated/binary artifacts that you
+intentionally want tracked but excluded from REUSE (rare) should instead
+be covered by an `[[annotations]]` block in `REUSE.toml` with
+appropriate SPDX tags.
+
+### 6. Integrate with Build Systems
+
+**A. tox.ini Integration:**
+```ini
+[testenv:reuse]
+skip_install = true
+deps = reuse
+description = Check REUSE specification compliance
+commands =
+    reuse lint
+
+[gh-actions]
+python =
+    3.12: py312, lint, type, reuse
+```
+
+**B. pre-commit Integration:**
+```yaml
+repos:
+  - repo: https://github.com/fsfe/reuse-tool
+    rev: v4.0.3
+    hooks:
+      - id: reuse
+```
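+
+After adding the hook, run it once across the whole tree to see the
+current state (assumes `pre-commit` itself is installed; `reuse` is the
+hook id declared by fsfe/reuse-tool):
+
+```bash
+pre-commit run reuse --all-files
+```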
+
+**C. Makefile Integration:**
+```makefile
+.PHONY: reuse-lint reuse-download reuse-annotate
+
+reuse-lint:
+	@echo "=== Checking REUSE compliance ==="
+	reuse lint
+
+reuse-download:
+	@echo "=== Downloading missing licenses ==="
+	reuse download --all
+
+reuse-annotate:
+	@echo "=== Annotating file with license header ==="
+	@read -p "File to annotate: " file; \
+	reuse annotate --license Apache-2.0 --copyright "YEAR AUTHOR" $$file
+```
+
+**D. GitHub Actions Integration:**
+```yaml
+- name: Check REUSE compliance
+  uses: fsfe/reuse-action@v5
+```
+
+### 7. Validate and Report
+
+Run validation:
+```bash
+reuse lint
+```
+
+Expected output sections:
+- **Bad licenses**: License files with issues
+- **Missing licenses**: Referenced but not in LICENSES/
+- **Files with copyright information**: X / Y
+- **Files with license information**: X / Y
+
+Goal: 100% compliance (all files have both copyright and license info)
+
+### 8. DUO Validation (BIDS Datasets)
+
+If DUO codes are present, validate them:
+1. Check that the codes exist in the DUO ontology: https://www.ebi.ac.uk/ols/ontologies/duo
+2. Ensure the codes are consistent with the `License` field
+3. Verify DataUseDescription matches the codes
+4. Check for conflicting restrictions
+
+**Common patterns:**
+- CC0 + DUO:0000042 → "Open data, general research use"
+- CC-BY-4.0 + DUO:0000028 → "Attribution required, no re-identification"
+- Custom + DUO:0000021 → "Restricted access, ethics approval required"
+
+## Optional: Patches against external upstream + DEP-3
+
+**Skip this section unless** the repository carries `*.patch` files that
+modify some other project's source (e.g. a vendoring/CI repo with
+`patches/` applied at build time). This is a relatively rare setup —
+most projects do not need it.
+
+When it does apply, REUSE alone is not enough: each patch should also
+carry a [DEP-3](https://dep-team.pages.debian.net/deps/dep3/) header so
+its provenance, upstream-forwarding status, and license are documented
+in-band.
+
+### Licensing of patch files
+
+Patches are derivative works of the upstream they modify and must
+inherit the upstream license. Choose the SPDX identifier from upstream's
+license:
+- git-annex / GPL upstreams → `AGPL-3.0-or-later` / `GPL-2.0-or-later` / etc.
+- BSD/MIT upstreams → match exactly.
+
+In `REUSE.toml`, use `precedence = "closest"` on the patches subtree so
+the per-patch SPDX header (added below) wins over the block-level
+fallback:
+
+```toml
+[[annotations]]
+path = "patches/**"
+precedence = "closest"
+SPDX-FileCopyrightText = "YEAR PROJECT TEAM <email>"
+SPDX-License-Identifier = "AGPL-3.0-or-later"  # match upstream
+```
+
+### DEP-3 + SPDX header template
+
+Prepend the following RFC-2822-style block to every `*.patch`. The
+trailing `---` line terminates the metadata; everything after it is the
+ordinary `git diff` content. Patch tools (`git apply`, `git apply -R
+--check`, `quilt`, `patch`) accept and ignore the preamble.
+
+```
+Description: <short summary of the change>
+ <optional longer explanation, indented by one space>
+Origin: vendor, https://<URL of this repository or the patch>
+Author: First Last <email>
+Forwarded: not-needed  # OR: <upstream PR/issue URL>; OR: no
+Last-Update: YYYY-MM-DD
+Bug: <upstream bug URL, if any>
+Applied-Upstream: <upstream version or commit, if merged>
+SPDX-FileCopyrightText: YEAR First Last <email>
+SPDX-License-Identifier: <LICENSE-ID matching upstream>
+---
+diff --git a/...
+```
+
+Field reference (DEP-3):
+- `Description` (required) — short summary on first line, longer
+  explanation indented on following lines.
+- `Origin` (required unless `Author`) — `upstream`, `backport`,
+  `vendor`, or `other`, optionally with a URL.
+- `Author` / `From` — patch author(s).
+- `Forwarded` — `yes`/URL, `no`, or `not-needed`.
+- `Last-Update` — ISO date the metadata was last revised.
+- `Bug`, `Bug-<Vendor>`, `Reviewed-by`, `Applied-Upstream` — optional.
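+
+As a worked example, a header for a hypothetical timeout patch might
+look like this (every name, URL, and date below is illustrative):
+
+```
+Description: Relax timeout in flaky network test
+ Upstream's 5s timeout is routinely exceeded on our CI runners.
+Origin: vendor, https://example.org/vendoring-repo
+Author: Jane Doe <jane@example.com>
+Forwarded: https://github.com/upstream/project/pull/123
+Last-Update: 2026-04-07
+SPDX-FileCopyrightText: 2026 Jane Doe <jane@example.com>
+SPDX-License-Identifier: GPL-2.0-or-later
+---
+diff --git a/tests/test_net.py b/tests/test_net.py
+```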
+
+### Verify patch tooling tolerates the preamble
+
+Before committing, sanity-check the project's actual patch-application
+path (not just `git apply`). For example:
+```bash
+git apply --check patches/<name>.patch
+git apply -R --check patches/<name>.patch  # if reverse-check is used
+```
+If the project uses `quilt`, `patch -p1`, or a custom script, run that
+too. Most tools ignore the preamble, but confirm before assuming.
+
+### Document it in the README
+
+Add a brief Licensing section pointing at `REUSE.toml` and `LICENSES/`,
+and extend any "Submitting Patches" / contributing guidance with the
+DEP-3 + SPDX template so new patches are compliant out of the gate.
+
+## Decision Points
+
+### License Selection
+
+**For code:**
+- Apache-2.0: Permissive, patent grant
+- MIT: Simple, permissive
+- GPL-3.0-or-later: Copyleft
+
+**For data:**
+- CC0-1.0: Public domain dedication
+- CC-BY-4.0: Attribution required
+- PDDL-1.0: Open Data Commons Public Domain
+
+**For documentation:**
+- CC-BY-4.0: Standard for documentation
+- CC-BY-SA-4.0: Share-alike for wikis
+
+### DUO Code Selection (BIDS)
+
+Ask the user about data use restrictions:
+1. Is this general research use? → DUO:0000042
+2. Can data be used for re-identification? → If no, add DUO:0000028
+3. Is ethics approval required? → DUO:0000021
+4. Disease-specific restrictions? → DUO:0000007 + specific disease
+5. Collaboration required? → DUO:0000020
+6. Time limit? → DUO:0000024 + duration
+
+### Build System Priority
+
+If multiple systems exist, suggest:
+1. **Primary**: tox (Python standard)
+2. **Developer workflow**: pre-commit (catches issues early)
+3. **CI/CD**: GitHub Actions (automated checks)
+4. **Make**: For projects already using it
+
+## Output
+
+Provide:
+1. **Status report**: Current compliance level
+2. **Action items**: What needs to be done
+3. **File changes**: Specific files to create/modify
+4. **Integration steps**: How to add to build systems
+5. **Validation command**: How to check compliance
+
+For BIDS datasets, additionally provide:
+- Suggested dataset_description.json updates
+- DUO code recommendations based on data type
+- Explanation of REUSE + DUO synergy
+
+## References
+
+- REUSE Specification: https://reuse.software/spec/
+- REUSE Tutorial: https://reuse.software/tutorial/
+- DEP-3 (Patch Tagging Guidelines): https://dep-team.pages.debian.net/deps/dep3/
+- BIDS REUSE Issue: https://github.com/bids-standard/bids-specification/issues/2015
+- BIDS DUO Issue: https://github.com/bids-standard/bids-specification/issues/2078
+- DUO Ontology: https://github.com/EBISPOT/DUO
+- GA4GH DUO Standard: https://www.ga4gh.org/product/data-use-ontology-duo/
+
+## Notes
+
+- Always preserve existing licensing information when adding REUSE compliance
+- For BIDS: the `License` field in dataset_description.json should match the REUSE.toml data annotations
+- DUO codes are complementary to licenses, not replacements
+- REUSE handles "can you legally use this?", DUO handles "under what conditions?"
+- When in doubt about DUO codes, consult institutional review board or data governance team From 5ada2e1912587e5c823a3cde248d1d460b6600a3 Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Wed, 29 Apr 2026 09:03:45 -0400 Subject: [PATCH 8/8] Rename reuse-compliance -> introduce-reuse-compliance Match the `introduce-*` naming used by the other "set this up in a project" skills (introduce-codespell, introduce-git-bug, introduce-mailmap). Updates the directory, the `name:` field in the frontmatter, the heading, the example slash-command in "When to Use", and the README entry (also re-sorted into its alphabetical slot). The `~/.claude/skills/` symlink is repointed locally; not part of this commit. Co-Authored-By: Claude Code 2.1.123 / Claude Opus 4.7 --- README.md | 2 +- {reuse-compliance => introduce-reuse-compliance}/SKILL.md | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) rename {reuse-compliance => introduce-reuse-compliance}/SKILL.md (96%) diff --git a/README.md b/README.md index 8423ab7..c83f79f 100644 --- a/README.md +++ b/README.md @@ -11,10 +11,10 @@ software project maintenance, triage, and automation. | [github-project-status](github-project-status/) | Assess whether a GitHub project is healthy, in maintenance mode, stagnant, or abandoned. Checks commits, releases, issues, PRs, forks, and package registries to produce a structured status report. | | [introduce-codespell](introduce-codespell/) | Add [codespell](https://github.com/codespell-project/codespell) spell-checking to a project end-to-end: config, GitHub Actions workflow, pre-commit hook, exclusion tuning, ambiguous-typo review, and automated fixes via `datalad run`. | | [introduce-git-bug](introduce-git-bug/) | Set up [git-bug](https://github.com/git-bug/git-bug) distributed issue tracking: configure GitHub bridge, sync issues, push `refs/bugs/*`, and document the workflow in DEVELOPMENT.md / CLAUDE.md. | +| [introduce-reuse-compliance](introduce-reuse-compliance/) | Introduce [REUSE](https://reuse.software/) licensing compliance to a project: `LICENSES/`, `REUSE.toml`, SPDX headers, and integration with tox / pre-commit / Makefile / GitHub Actions. Handles BIDS dataset data-vs-code separation, [DUO](https://github.com/EBISPOT/DUO) data-use ontology codes, and DEP-3 patch tagging for vendoring repos. | | [issue-triage](issue-triage/) | Triage open GitHub issues by cross-referencing the codebase and git history. Detects duplicates, drafts proposed comments, and serves results in a local web dashboard. Includes Python helper scripts for gathering and serving data. | | [pr-feedback-review](pr-feedback-review/) | Load a PR's review feedback (human + bot), classify each comment by type and actionability, and recommend what to address vs dismiss — with draft code changes and responses. Works from a local repo or a PR URL. | | [pr-review-update](pr-review-update/) | Scan an [improveit-dashboard](https://github.com/yarikoptic/improveit-dashboard) for PRs awaiting your response, assess confidence, auto-rebase codespell PRs, and produce copy-paste-ready push commands. | -| [reuse-compliance](reuse-compliance/) | Set up and validate [REUSE](https://reuse.software/) licensing compliance: `LICENSES/`, `REUSE.toml`, SPDX headers, and integration with tox / pre-commit / Makefile / GitHub Actions. Handles BIDS dataset data-vs-code separation, [DUO](https://github.com/EBISPOT/DUO) data-use ontology codes, and DEP-3 patch tagging for vendoring repos. 
| | [scan-projects](scan-projects/) | Walk subdirectories of git repos, collect metadata (language, license, commit dates, remote URL), and generate concise LLM-produced summaries into a `projects.tsv` file. Ships with helper scripts for batch updates. | | [tinuous-analyzer](tinuous-analyzer/) | Analyze CI log collections gathered by [con/tinuous](https://github.com/con/tinuous/) to pinpoint when a test started failing, diff environment/dependency changes between passing and failing runs, and recommend investigation steps. | diff --git a/reuse-compliance/SKILL.md b/introduce-reuse-compliance/SKILL.md similarity index 96% rename from reuse-compliance/SKILL.md rename to introduce-reuse-compliance/SKILL.md index 07857d9..563eb51 100644 --- a/reuse-compliance/SKILL.md +++ b/introduce-reuse-compliance/SKILL.md @@ -1,11 +1,11 @@ --- -name: reuse-compliance -description: Set up and validate REUSE specification compliance (LICENSES/ directory, REUSE.toml, SPDX headers) for software projects and BIDS datasets. Covers BIDS data-vs-code separation, DUO (Data Use Ontology) integration, DEP-3 patch tagging for vendoring repos, and integration with tox / pre-commit / Makefile / GitHub Actions. Use when adding licensing metadata to a project, fixing `reuse lint` failures, licensing a BIDS dataset, or annotating patches in a vendoring repo. +name: introduce-reuse-compliance +description: Introduce REUSE specification compliance (LICENSES/ directory, REUSE.toml, SPDX headers) to a software project or BIDS dataset, then validate it. Covers BIDS data-vs-code separation, DUO (Data Use Ontology) integration, DEP-3 patch tagging for vendoring repos, and integration with tox / pre-commit / Makefile / GitHub Actions. Use when adding licensing metadata to a project, fixing `reuse lint` failures, licensing a BIDS dataset, or annotating patches in a vendoring repo. allowed-tools: Bash, Read, Edit, Write, Glob, Grep, AskUserQuestion user-invocable: true --- -# REUSE Compliance Skill +# Introduce REUSE Compliance to a Project Implement the [REUSE specification](https://reuse.software/) for clear, machine-readable licensing and copyright information. Includes special @@ -15,7 +15,7 @@ DEP-3 patch tagging for vendoring repositories. ## When to Use - User wants to add REUSE / SPDX licensing metadata to a project -- User asks to "introduce REUSE" or runs `/reuse-compliance` +- User asks to "introduce REUSE" or runs `/introduce-reuse-compliance` - `reuse lint` is failing and needs to be brought to 100% compliance - User is licensing a BIDS dataset (data + code + docs separately) - User is annotating `*.patch` files in a vendoring repo with DEP-3 headers