From 42c479c05b3502b8c83029ec0b23d221049d9c03 Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Tue, 7 Apr 2026 13:04:33 -0400 Subject: [PATCH 1/8] Add analyze-duplicates skill for jscpd-based duplication detection New skill that scans codebases and documentation for duplicated content using jscpd (token-based detection). Produces a consolidated Markdown report with an overview table, collapsible per-cluster details showing the duplicated fragments, and inline mediation recommendations (difficulty rating + refactoring strategy). Also generates an interactive HTML report via @jscpd/html-reporter. Includes generate-report.py helper that converts jscpd JSON output into the GitHub/Gitea-friendly report format. Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 --- README.md | 1 + analyze-duplicates/SKILL.md | 203 +++++++++++ analyze-duplicates/generate-report.py | 489 ++++++++++++++++++++++++++ 3 files changed, 693 insertions(+) create mode 100644 analyze-duplicates/SKILL.md create mode 100644 analyze-duplicates/generate-report.py diff --git a/README.md b/README.md index 3ed0805..d3b89ae 100644 --- a/README.md +++ b/README.md @@ -7,6 +7,7 @@ software project maintenance, triage, and automation. | Skill | Description | |-------|-------------| +| [analyze-duplicates](analyze-duplicates/) | Detect code and documentation duplication using jscpd, generate a Markdown report with collapsible `
` sections (suitable for GitHub/Gitea issues), and propose a mediation plan with refactoring strategies. | | [github-project-status](github-project-status/) | Assess whether a GitHub project is healthy, in maintenance mode, stagnant, or abandoned. Checks commits, releases, issues, PRs, forks, and package registries to produce a structured status report. | | [introduce-codespell](introduce-codespell/) | Add [codespell](https://github.com/codespell-project/codespell) spell-checking to a project end-to-end: config, GitHub Actions workflow, pre-commit hook, exclusion tuning, ambiguous-typo review, and automated fixes via `datalad run`. | | [introduce-git-bug](introduce-git-bug/) | Set up [git-bug](https://github.com/git-bug/git-bug) distributed issue tracking: configure GitHub bridge, sync issues, push `refs/bugs/*`, and document the workflow in DEVELOPMENT.md / CLAUDE.md. | diff --git a/analyze-duplicates/SKILL.md b/analyze-duplicates/SKILL.md new file mode 100644 index 0000000..476f911 --- /dev/null +++ b/analyze-duplicates/SKILL.md @@ -0,0 +1,203 @@ +--- +name: analyze-duplicates +description: Analyze codebase or documentation for code/text duplication using jscpd. Generates a Markdown report with collapsible sections (suitable for GitHub/Gitea issues) showing duplicate clusters, statistics, and a mediation plan proposing refactoring strategies. +allowed-tools: Bash, Read, Write, Glob, Grep, Agent +user-invocable: true +--- + +# Analyze Duplicates + +Detect code and documentation duplication in one or more paths, produce a +Markdown report with `
` sections for posting as a GitHub/Gitea issue, +and propose a concrete mediation plan. + +## When to Use + +- User wants to find duplicated code or documentation in a project +- User asks to "check for duplicates", "find copy-paste code", "DRY audit" +- User mentions "jscpd", "duplicate detection", or "code clones" +- User runs `/analyze-duplicates` + +## Configuration + +| Variable | Default | Description | +|----------|---------|-------------| +| `MIN_LINES` | `6` | Minimum duplicate block size in lines | +| `MIN_TOKENS` | `50` | Minimum duplicate block size in tokens | +| `THRESHOLD` | `5` | Duplication percentage that flags a warning | +| `FORMATS` | (auto-detect) | Comma-separated jscpd format list (e.g., `python,markdown`) | + +## Arguments + +The skill accepts one or more paths to scan. If none are provided, scan the +current working directory. + +Optional flags (passed as part of the argument string): +- `--formats python,markdown` — override auto-detected formats +- `--min-lines N` — override MIN_LINES +- `--min-tokens N` — override MIN_TOKENS +- `--threshold N` — override warning threshold percentage +- `--output PATH` — where to write the report (default: `.jscpd-report.md` in first scanned path) +- `--cross-project` — when multiple paths given, also run a combined scan to find cross-project duplicates +- `--no-html` — skip generating the HTML report (default: generate it) +- `--badge` — also generate an SVG badge and embed it in the report (default: off) + +## Execution Steps + +### Step 0: Parse Arguments + +Parse the argument string. Extract paths (any arg not starting with `--`), +and optional flags. Apply defaults from Configuration for anything not specified. + +If no paths provided, use the current working directory. + +Create the `.tmp/` directory in the current working directory for intermediate +output. If `.tmp` is not already in `.gitignore`, add it (or warn the user). + +### Step 1: Ensure jscpd is Available + +Check if jscpd is available: + +```bash +command -v jscpd || npx --yes jscpd@latest --version +``` + +If neither works, report the error and stop: +> jscpd not found. Install via `npm install -g jscpd` or ensure `npx` is available. + +### Step 2: Detect Project Context + +For each scan path: +1. Check if it is a git repository (`git -C PATH rev-parse --is-inside-work-tree`) +2. Detect primary languages by file extension counts (`.py` -> python, `.js/.ts` -> javascript/typescript, `.md` -> markdown, etc.) +3. If `--formats` was specified, use that instead of auto-detection +4. Note the project name from the directory basename (or git remote if available) + +### Step 3: Run jscpd + +For each scan path, run jscpd with JSON, HTML, and badge reporters: + +```bash +npx --yes jscpd@latest \ + --min-lines MIN_LINES \ + --min-tokens MIN_TOKENS \ + --reporters "json,html" \ + --output .tmp/jscpd-PROJECTNAME \ + --ignore "**/.tox/**,**/venv*/**,**/.venv/**,**/node_modules/**,**/__pycache__/**,**/build/**,**/dist/**,**/.eggs/**,**/.git/**,**/.npm/**,**/.tmp/**" \ + PATH +``` + +If `--no-html` is set, omit `html` from reporters. If `--badge` is set, add `badge` to reporters. +If `--formats` is set, add `--format FORMATS`. 
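
For concreteness, the conditional flag handling can be assembled programmatically.
A minimal sketch follows (the `opts` dict, its keys, and `build_jscpd_cmd` are
hypothetical illustrations, not part of the skill's required tooling):

```python
def build_jscpd_cmd(opts, path, project_name):
    """Assemble the jscpd argv from parsed skill options."""
    reporters = ["json"]
    if not opts.get("no_html"):
        reporters.append("html")
    if opts.get("badge"):
        reporters.append("badge")
    cmd = [
        "npx", "--yes", "jscpd@latest",
        "--min-lines", str(opts.get("min_lines", 6)),
        "--min-tokens", str(opts.get("min_tokens", 50)),
        "--reporters", ",".join(reporters),
        "--output", f".tmp/jscpd-{project_name}",
        "--ignore", opts["ignore_patterns"],  # comma-separated glob list
    ]
    if opts.get("formats"):
        cmd += ["--format", opts["formats"]]
    cmd.append(path)
    return cmd  # e.g. pass to subprocess.run(...)
```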
+ +This produces: +- `.tmp/jscpd-PROJECTNAME/jscpd-report.json` — structured data for the markdown report +- `.tmp/jscpd-PROJECTNAME/html/index.html` — interactive HTML report with syntax highlighting +- `.tmp/jscpd-PROJECTNAME/jscpd-badge.svg` — shields.io-style badge showing duplication % (only with `--badge`) + +If `--cross-project` and multiple paths: after individual scans, create a +temporary parent directory with symlinks to all paths and run one combined scan. + +### Step 4: Parse Results and Generate Report + +Read each `.tmp/jscpd-PROJECTNAME/jscpd-report.json` and generate the report +using the helper script: + +```bash +python3 SKILL_DIR/generate-report.py \ + --threshold THRESHOLD \ + --output REPORT_PATH \ + --jscpd-version "$(npx --yes jscpd@latest --version 2>/dev/null)" \ + [--cross-project .tmp/jscpd-combined/jscpd-report.json] \ + .tmp/jscpd-PROJECT1/jscpd-report.json \ + [.tmp/jscpd-PROJECT2/jscpd-report.json ...] +``` + +Where `SKILL_DIR` is the directory containing this SKILL.md file. Resolve it +by searching for `generate-report.py` in `~/.claude/skills/analyze-duplicates/`. + +If `--badge` was requested and a badge was generated, pass `--badge-path` with +a relative path to the SVG. Copy the badge SVG to the output directory so both +files are co-located. + +### Step 5: Review and Enhance Mediation Plan + +The `generate-report.py` script already produces a `## Mediation Plan` section +with heuristic classifications (trivial/easy/moderate/hard) and strategies +for each cluster. After the report is generated: + +1. Read the generated report and the duplicated fragments +2. For each cluster, **verify** the heuristic recommendation makes sense in + context — read the actual source files around the duplicated lines if needed +3. For **easy/trivial** clusters: add a concrete diff or pseudo-diff showing + the proposed refactoring (extract function, parametrize test, etc.) +4. For **moderate/hard** clusters: enhance the description with specifics + about what the shared abstraction should look like +5. Adjust difficulty ratings if the heuristic got it wrong (e.g., what looks + like a simple extract may actually involve different signatures) + +### Step 6: Present Results + +1. Print a brief summary to the console: + - Total duplication percentage per project + - Number of clone clusters found + - Whether threshold was exceeded +2. Print paths to all generated artifacts: + - Markdown report (the primary deliverable, suitable for GitHub issues) + - HTML report directory (interactive browser view with syntax highlighting) + - Badge SVG path (only if `--badge` was used) +3. If duplication exceeds the threshold, note this prominently + +## Report Format + +The report MUST be a Markdown file using `
` blocks so it +renders well when posted as a GitHub/Gitea issue. Structure: + +```markdown +# Duplication Analysis Report + +> Generated: YYYY-MM-DD | Tool: jscpd VERSION | Threshold: N% + +## Summary + +| Project | Files | Lines | Clones | Duplicated Lines | Percentage | +|------------|------:|------:|-------:|-----------------:|-----------:| +| my-project | 42 | 12000 | 5 | 83 | 0.69% | + +> Duplication is within the 5% threshold for all projects. + +## Duplicate Clusters + +| # | Lines | Difficulty | Strategy | Files | +|---|-------|-------------------|-------------------------------|---------| +| 1 | 8 | Trivial | Extract local helper function | file.py | + +
<summary>Cluster 1: [Trivial] `file.py` lines 10-18
↔ `file.py` lines 30-38 (8 lines)</summary>

**Files involved:**
- `file.py` (lines 10-18)
- `file.py` (lines 30-38)

**Duplicated fragment:**
~~~python
# ...the duplicated code fragment is shown here...
~~~

**Mediation** (Trivial): Extract local helper function

> Duplicated logic within `file.py`. Extract into a private function
> in the same module.

</details>
+``` + +## Commit Co-Authorship + +All commits created during this workflow MUST include a `Co-Authored-By` trailer. +Get the version via `claude --version`. Format: + +``` +Co-Authored-By: Claude Code / Claude +``` diff --git a/analyze-duplicates/generate-report.py b/analyze-duplicates/generate-report.py new file mode 100644 index 0000000..28722da --- /dev/null +++ b/analyze-duplicates/generate-report.py @@ -0,0 +1,489 @@ +#!/usr/bin/env python3 +"""Generate a Markdown duplication report from jscpd JSON output. + +Reads one or more jscpd-report.json files and produces a GitHub/Gitea-friendly +Markdown report with
sections for each duplicate cluster. + +Usage: + python3 generate-report.py [OPTIONS] REPORT_JSON [REPORT_JSON ...] + +Options: + --threshold N Duplication warning threshold percentage (default: 5) + --output PATH Output markdown file (default: stdout) + --cross-project PATH Include cross-project scan results from this JSON file +""" + +import argparse +import json +import sys +from datetime import datetime, timezone +from pathlib import Path + + +def load_report(path): + """Load a jscpd JSON report.""" + try: + with open(path) as f: + return json.load(f) + except (OSError, json.JSONDecodeError) as exc: + print(f"Error loading {path}: {exc}", file=sys.stderr) + sys.exit(1) + + +def guess_project_name(report_path): + """Infer project name from the report's parent directory name.""" + parent = Path(report_path).parent.name + # Strip jscpd- prefix if present + if parent.startswith("jscpd-"): + return parent[6:] + return parent + + +def format_language(fmt): + """Map jscpd format names to markdown code fence language hints.""" + mapping = { + "python": "python", + "javascript": "javascript", + "typescript": "typescript", + "markup": "html", + "markdown": "markdown", + "yaml": "yaml", + "json": "json", + "css": "css", + "go": "go", + "rust": "rust", + "java": "java", + "csharp": "csharp", + "ruby": "ruby", + "bash": "bash", + "shell": "bash", + } + return mapping.get(fmt, "") + + +def render_summary_table(projects): + """Render the summary table with human-aligned columns.""" + headers = ["Project", "Files", "Lines", "Clones", "Duplicated Lines", "Percentage"] + rows = [] + for p in projects: + stats = p["stats"] + rows.append([ + p["name"], + str(stats.get("sources", 0)), + str(stats.get("lines", 0)), + str(stats.get("clones", 0)), + str(stats.get("duplicatedLines", 0)), + f"{stats.get('percentage', 0.0):.2f}%", + ]) + + # Compute column widths (max of header and all row values) + widths = [len(h) for h in headers] + for row in rows: + for i, cell in enumerate(row): + widths[i] = max(widths[i], len(cell)) + + # First column left-aligned, rest right-aligned + def fmt_row(cells): + parts = [] + for i, cell in enumerate(cells): + if i == 0: + parts.append(f" {cell:<{widths[i]}} ") + else: + parts.append(f" {cell:>{widths[i]}} ") + return "|" + "|".join(parts) + "|" + + def fmt_sep(): + parts = [] + for i, w in enumerate(widths): + if i == 0: + parts.append("-" * (w + 2)) + else: + parts.append("-" * (w + 1) + ":") + return "|" + "|".join(parts) + "|" + + lines = [fmt_row(headers), fmt_sep()] + for row in rows: + lines.append(fmt_row(row)) + return "\n".join(lines) + + +def truncate_fragment(fragment, max_lines=30): + """Truncate long code fragments for readability.""" + lines = fragment.splitlines() + if len(lines) <= max_lines: + return fragment + kept = lines[:max_lines] + kept.append(f"... ({len(lines) - max_lines} more lines)") + return "\n".join(kept) + + +def classify_cluster(dup): + """Classify a duplicate cluster and propose mediation strategy. + + Returns (difficulty, strategy, rationale) where difficulty is one of: + trivial, easy, moderate, hard. 
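+
+    Heuristics, in the order checked below: documentation files, test
+    files, same-file clones, same-directory clones, then cross-package
+    clones, with difficulty scaled up for longer fragments.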
+ """ + first_name = dup["firstFile"]["name"] + second_name = dup["secondFile"]["name"] + n_lines = dup.get("lines", 0) + fmt = dup.get("format", "") + same_file = first_name == second_name + + # Detect test files + is_test = any( + t in n for n in (first_name, second_name) + for t in ("test_", "tests/", "_test.", "conftest", "spec.", "spec/") + ) + + # Detect documentation / markdown + is_docs = fmt in ("markdown", "markup") or any( + n.endswith((".md", ".rst", ".adoc")) for n in (first_name, second_name) + ) + + # Same directory? + first_dir = "/".join(first_name.split("/")[:-1]) + second_dir = "/".join(second_name.split("/")[:-1]) + same_dir = first_dir == second_dir + + if is_docs: + if same_file: + return ( + "easy", + "Consolidate repeated sections within this document", + "Same content appears multiple times in one file. " + "Merge into a single section and add internal cross-references.", + ) + return ( + "moderate", + "Create a canonical section and cross-reference", + "Duplicated documentation across files. Extract shared content " + "into a single authoritative location and reference it " + "(e.g., includes, links, or shortcodes).", + ) + + if is_test: + if same_file: + difficulty = "easy" if n_lines <= 10 else "moderate" + return ( + difficulty, + "Extract test fixture or parametrize", + "Duplicated test setup/assertions within one test file. " + "Use `@pytest.fixture`, `@pytest.mark.parametrize`, " + "or a helper function to share the common pattern.", + ) + return ( + "moderate", + "Extract shared test fixture to conftest.py", + "Duplicated test code across files. Move common setup into " + "`conftest.py` as a shared fixture, or into a test utilities module.", + ) + + if same_file: + difficulty = "trivial" if n_lines <= 8 else "easy" + return ( + difficulty, + "Extract local helper function", + f"Duplicated logic within `{first_name.split('/')[-1]}`. " + "Extract into a private function in the same module.", + ) + + if same_dir: + difficulty = "easy" if n_lines <= 10 else "moderate" + return ( + difficulty, + "Extract shared function into sibling module", + f"Duplicated code in same package (`{first_dir or './'}`). " + "Extract into a shared utility module within the package.", + ) + + # Different directories / packages + difficulty = "moderate" if n_lines <= 15 else "hard" + return ( + difficulty, + "Extract into shared library or utils package", + "Duplicated code across different packages. 
Consider a shared " + "utility module or library that both can import.", + ) + + +DIFFICULTY_LABELS = { + "trivial": "Trivial", + "easy": "Easy", + "moderate": "Moderate", + "hard": "Hard", +} + + +def render_overview_table(all_dups): + """Render a compact overview table of all clusters with mediation info.""" + headers = ["#", "Lines", "Difficulty", "Strategy", "Files"] + rows = [] + for i, (_proj, dup) in enumerate(all_dups, 1): + difficulty, strategy, _rationale = classify_cluster(dup) + first_short = dup["firstFile"]["name"].rsplit("/", 1)[-1] + second_short = dup["secondFile"]["name"].rsplit("/", 1)[-1] + if first_short == second_short: + files_str = first_short + else: + files_str = f"{first_short} / {second_short}" + label = DIFFICULTY_LABELS.get(difficulty, difficulty) + rows.append([ + str(i), + str(dup.get("lines", 0)), + label, + strategy, + files_str, + ]) + + widths = [len(h) for h in headers] + for row in rows: + for j, cell in enumerate(row): + widths[j] = max(widths[j], len(cell)) + + def fmt_row(cells): + parts = [] + for j, cell in enumerate(cells): + parts.append(f" {cell:<{widths[j]}} ") + return "|" + "|".join(parts) + "|" + + lines = [ + fmt_row(headers), + "|" + "|".join("-" * (w + 2) for w in widths) + "|", + ] + for row in rows: + lines.append(fmt_row(row)) + return "\n".join(lines) + + +def render_cluster(idx, dup, prefix=""): + """Render a single duplicate cluster as a
block with mediation.""" + fmt = dup.get("format", "") + lang = format_language(fmt) + first = dup["firstFile"] + second = dup["secondFile"] + n_lines = dup.get("lines", 0) + + first_name = first["name"] + second_name = second["name"] + first_range = f"lines {first['start']}-{first['end']}" + second_range = f"lines {second['start']}-{second['end']}" + + difficulty, strategy, rationale = classify_cluster(dup) + diff_label = DIFFICULTY_LABELS.get(difficulty, difficulty) + + label = f"{prefix}Cluster {idx}" + summary = ( + f"[{diff_label}] " + f"`{first_name}` {first_range} " + f"↔ `{second_name}` {second_range} " + f"({n_lines} lines)" + ) + + fragment = dup.get("fragment", "") + fragment = truncate_fragment(fragment) + + block = [ + "
", + f"{label}: {summary}", + "", + "**Files involved:**", + f"- `{first_name}` ({first_range})", + f"- `{second_name}` ({second_range})", + "", + ] + + if fragment.strip(): + fence = "~~~" + while fence in fragment: + fence += "~" + block.extend([ + "**Duplicated fragment:**", + f"{fence}{lang}", + fragment, + fence, + "", + ]) + + # Inline mediation recommendation + block.extend([ + f"**Mediation** ({diff_label}): {strategy}", + "", + f"> {rationale}", + "", + ]) + + block.extend(["
", ""]) + return "\n".join(block) + + +def render_report(projects, threshold, cross_project=None, jscpd_version=None, + badge_path=None): + """Render the full Markdown report.""" + now = datetime.now(timezone.utc).strftime("%Y-%m-%d") + version_str = jscpd_version or "unknown" + + parts = [ + "# Duplication Analysis Report", + "", + f"> Generated: {now} | Tool: jscpd {version_str} | Threshold: {threshold}%", + "", + ] + + if badge_path: + parts.append(f"![Copy/Paste]({badge_path})") + parts.append("") + + parts.extend([ + "## Summary", + "", + render_summary_table(projects), + "", + ]) + + # Status badge + any_over = any(p["stats"]["percentage"] > threshold for p in projects) + if any_over: + over = [p for p in projects if p["stats"]["percentage"] > threshold] + names = ", ".join(p["name"] for p in over) + parts.append( + f"> **WARNING**: Duplication exceeds {threshold}% threshold in: {names}" + ) + else: + parts.append( + f"> Duplication is within the {threshold}% threshold for all projects." + ) + parts.append("") + + # Collect all duplicates for the overview table + all_dups = [] + for p in projects: + for dup in sorted( + p.get("duplicates", []), + key=lambda d: d.get("lines", 0), + reverse=True, + ): + all_dups.append((p["name"], dup)) + if cross_project: + for dup in sorted( + cross_project.get("duplicates", []), + key=lambda d: d.get("lines", 0), + reverse=True, + ): + all_dups.append(("cross-project", dup)) + + # Duplicate Clusters section with overview table at the top + parts.append("## Duplicate Clusters") + parts.append("") + + if not all_dups: + parts.append("No duplicates found.") + parts.append("") + return "\n".join(parts) + + # Overview table + parts.append(render_overview_table(all_dups)) + parts.append("") + + # Per-project cluster details + global_idx = 1 + for p in projects: + if len(projects) > 1: + parts.append(f"### {p['name']}") + parts.append("") + + duplicates = p.get("duplicates", []) + if not duplicates: + parts.append("No duplicates found.") + parts.append("") + continue + + duplicates = sorted(duplicates, key=lambda d: d.get("lines", 0), reverse=True) + + for dup in duplicates: + parts.append(render_cluster(global_idx, dup)) + global_idx += 1 + + # Cross-project section + if cross_project: + cross_dups = cross_project.get("duplicates", []) + if cross_dups: + parts.append("### Cross-Project Duplicates") + parts.append("") + cross_dups = sorted( + cross_dups, key=lambda d: d.get("lines", 0), reverse=True + ) + for i, dup in enumerate(cross_dups, 1): + parts.append(render_cluster(global_idx, dup, prefix="Cross-project ")) + global_idx += 1 + + return "\n".join(parts) + + +def main(): + parser = argparse.ArgumentParser( + description="Generate Markdown duplication report from jscpd JSON" + ) + parser.add_argument( + "reports", nargs="+", help="Path(s) to jscpd-report.json files" + ) + parser.add_argument( + "--threshold", + type=float, + default=5.0, + help="Duplication warning threshold percentage (default: 5)", + ) + parser.add_argument( + "--output", + default=None, + help="Output markdown file (default: stdout)", + ) + parser.add_argument( + "--cross-project", + default=None, + help="Path to cross-project jscpd-report.json", + ) + parser.add_argument( + "--jscpd-version", + default=None, + help="jscpd version string for the report header", + ) + parser.add_argument( + "--badge-path", + default=None, + help="Relative path to the jscpd-badge.svg for embedding in report", + ) + args = parser.parse_args() + + projects = [] + for rpath in args.reports: + data = 
load_report(rpath) + name = guess_project_name(rpath) + projects.append({ + "name": name, + "stats": data.get("statistics", {}).get("total", {}), + "duplicates": data.get("duplicates", []), + }) + + cross_project_data = None + if args.cross_project: + cross_project_data = load_report(args.cross_project) + + report = render_report( + projects, + args.threshold, + cross_project=cross_project_data, + jscpd_version=args.jscpd_version, + badge_path=args.badge_path, + ) + + if args.output: + Path(args.output).parent.mkdir(parents=True, exist_ok=True) + with open(args.output, "w") as f: + f.write(report) + print(f"Report written to: {args.output}", file=sys.stderr) + else: + print(report) + + +if __name__ == "__main__": + main() From 4453aea47c6420f3a3e7ec849f5e7f8a0f215e8f Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Tue, 7 Apr 2026 13:08:54 -0400 Subject: [PATCH 2/8] analyze-duplicates: use local .tmp/, link files to remote, C not # - Use .tmp/ in the scanned project instead of /tmp/ for intermediate output (jscpd JSON, HTML reports) - Use "C" instead of "#" as the cluster column header to avoid GitHub auto-linking to issues/PRs - File references in cluster details are now hyperlinks to the file on the tracked remote (e.g., GitHub blob URL with line anchors) - Auto-detects repo URL and branch from git remote; also accepts --repo-url, --branch, --scan-path overrides Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 --- analyze-duplicates/SKILL.md | 11 +-- analyze-duplicates/generate-report.py | 100 +++++++++++++++++++++++--- 2 files changed, 95 insertions(+), 16 deletions(-) diff --git a/analyze-duplicates/SKILL.md b/analyze-duplicates/SKILL.md index 476f911..027b148 100644 --- a/analyze-duplicates/SKILL.md +++ b/analyze-duplicates/SKILL.md @@ -108,6 +108,7 @@ python3 SKILL_DIR/generate-report.py \ --threshold THRESHOLD \ --output REPORT_PATH \ --jscpd-version "$(npx --yes jscpd@latest --version 2>/dev/null)" \ + --scan-path PATH \ [--cross-project .tmp/jscpd-combined/jscpd-report.json] \ .tmp/jscpd-PROJECT1/jscpd-report.json \ [.tmp/jscpd-PROJECT2/jscpd-report.json ...] @@ -168,17 +169,17 @@ renders well when posted as a GitHub/Gitea issue. Structure: ## Duplicate Clusters -| # | Lines | Difficulty | Strategy | Files | -|---|-------|-------------------|-------------------------------|---------| -| 1 | 8 | Trivial | Extract local helper function | file.py | +| C | Lines | Difficulty | Strategy | Files | +|---|-------|------------|-------------------------------|---------| +| 1 | 8 | Trivial | Extract local helper function | file.py |
Cluster 1: [Trivial] `file.py` lines 10-18 ↔ `file.py` lines 30-38 (8 lines) **Files involved:** -- `file.py` (lines 10-18) -- `file.py` (lines 30-38) +- [`file.py` (lines 10-18)](https://github.com/owner/repo/blob/main/file.py#L10-L18) +- [`file.py` (lines 30-38)](https://github.com/owner/repo/blob/main/file.py#L30-L38) **Duplicated fragment:** ~~~python diff --git a/analyze-duplicates/generate-report.py b/analyze-duplicates/generate-report.py index 28722da..2276ae3 100644 --- a/analyze-duplicates/generate-report.py +++ b/analyze-duplicates/generate-report.py @@ -15,11 +15,54 @@ import argparse import json +import re +import subprocess import sys from datetime import datetime, timezone from pathlib import Path +def detect_git_info(scan_path): + """Try to detect the remote browse URL and branch for a git repo.""" + try: + branch = subprocess.run( + ["git", "-C", scan_path, "rev-parse", "--abbrev-ref", "HEAD"], + capture_output=True, text=True, timeout=5, + ).stdout.strip() + # Find the remote that the branch tracks, fall back to origin + tracking_remote = subprocess.run( + ["git", "-C", scan_path, "config", + f"branch.{branch}.remote"], + capture_output=True, text=True, timeout=5, + ).stdout.strip() or "origin" + remote_url = subprocess.run( + ["git", "-C", scan_path, "remote", "get-url", tracking_remote], + capture_output=True, text=True, timeout=5, + ).stdout.strip() + if not remote_url or not branch: + return None, None + # Convert git@ or https:// URL to browse URL + browse_url = remote_url + browse_url = re.sub(r"\.git$", "", browse_url) + browse_url = re.sub( + r"^git@([^:]+):", r"https://\1/", browse_url + ) + return browse_url, branch + except (subprocess.SubprocessError, OSError): + return None, None + + +def file_link(name, start, end, repo_url, branch): + """Format a file reference, as a hyperlink if repo info is available.""" + label = f"`{name}` (lines {start}-{end})" + if repo_url and branch: + # Strip leading ./ or ../ — jscpd paths are relative to scan dir + clean = re.sub(r"^(\.\./?)+" , "", name) + url = f"{repo_url}/blob/{branch}/{clean}#L{start}-L{end}" + return f"[{label}]({url})" + return label + + def load_report(path): """Load a jscpd JSON report.""" try: @@ -216,7 +259,7 @@ def classify_cluster(dup): def render_overview_table(all_dups): """Render a compact overview table of all clusters with mediation info.""" - headers = ["#", "Lines", "Difficulty", "Strategy", "Files"] + headers = ["C", "Lines", "Difficulty", "Strategy", "Files"] rows = [] for i, (_proj, dup) in enumerate(all_dups, 1): difficulty, strategy, _rationale = classify_cluster(dup) @@ -255,7 +298,7 @@ def fmt_row(cells): return "\n".join(lines) -def render_cluster(idx, dup, prefix=""): +def render_cluster(idx, dup, prefix="", repo_url=None, branch=None): """Render a single duplicate cluster as a
block with mediation.""" fmt = dup.get("format", "") lang = format_language(fmt) @@ -265,17 +308,21 @@ def render_cluster(idx, dup, prefix=""): first_name = first["name"] second_name = second["name"] - first_range = f"lines {first['start']}-{first['end']}" - second_range = f"lines {second['start']}-{second['end']}" + + first_link = file_link(first_name, first["start"], first["end"], + repo_url, branch) + second_link = file_link(second_name, second["start"], second["end"], + repo_url, branch) difficulty, strategy, rationale = classify_cluster(dup) diff_label = DIFFICULTY_LABELS.get(difficulty, difficulty) label = f"{prefix}Cluster {idx}" + # Summary line uses plain text (no links — they don't work inside ) summary = ( f"[{diff_label}] " - f"`{first_name}` {first_range} " - f"↔ `{second_name}` {second_range} " + f"`{first_name}` lines {first['start']}-{first['end']} " + f"↔ `{second_name}` lines {second['start']}-{second['end']} " f"({n_lines} lines)" ) @@ -287,8 +334,8 @@ def render_cluster(idx, dup, prefix=""): f"{label}: {summary}", "", "**Files involved:**", - f"- `{first_name}` ({first_range})", - f"- `{second_name}` ({second_range})", + f"- {first_link}", + f"- {second_link}", "", ] @@ -317,7 +364,7 @@ def render_cluster(idx, dup, prefix=""): def render_report(projects, threshold, cross_project=None, jscpd_version=None, - badge_path=None): + badge_path=None, repo_url=None, branch=None): """Render the full Markdown report.""" now = datetime.now(timezone.utc).strftime("%Y-%m-%d") version_str = jscpd_version or "unknown" @@ -400,7 +447,8 @@ def render_report(projects, threshold, cross_project=None, jscpd_version=None, duplicates = sorted(duplicates, key=lambda d: d.get("lines", 0), reverse=True) for dup in duplicates: - parts.append(render_cluster(global_idx, dup)) + parts.append(render_cluster(global_idx, dup, + repo_url=repo_url, branch=branch)) global_idx += 1 # Cross-project section @@ -413,7 +461,9 @@ def render_report(projects, threshold, cross_project=None, jscpd_version=None, cross_dups, key=lambda d: d.get("lines", 0), reverse=True ) for i, dup in enumerate(cross_dups, 1): - parts.append(render_cluster(global_idx, dup, prefix="Cross-project ")) + parts.append(render_cluster(global_idx, dup, + prefix="Cross-project ", + repo_url=repo_url, branch=branch)) global_idx += 1 return "\n".join(parts) @@ -452,8 +502,34 @@ def main(): default=None, help="Relative path to the jscpd-badge.svg for embedding in report", ) + parser.add_argument( + "--repo-url", + default=None, + help="Repository browse URL (e.g., https://github.com/owner/repo). " + "Auto-detected from git remote if not specified.", + ) + parser.add_argument( + "--branch", + default=None, + help="Branch name for file links. 
Auto-detected from git if not specified.", + ) + parser.add_argument( + "--scan-path", + default=".", + help="Path that was scanned (used for git auto-detection, default: .)", + ) args = parser.parse_args() + # Auto-detect repo URL and branch if not provided + repo_url = args.repo_url + branch = args.branch + if not repo_url or not branch: + auto_url, auto_branch = detect_git_info(args.scan_path) + if not repo_url: + repo_url = auto_url + if not branch: + branch = auto_branch + projects = [] for rpath in args.reports: data = load_report(rpath) @@ -474,6 +550,8 @@ def main(): cross_project=cross_project_data, jscpd_version=args.jscpd_version, badge_path=args.badge_path, + repo_url=repo_url, + branch=branch, ) if args.output: From 5c0ff9c3b47cb95afef46183181ba742c53e60df Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Tue, 7 Apr 2026 13:35:07 -0400 Subject: [PATCH 3/8] analyze-duplicates: smarter ignores, asset detection, better file labels MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - SKILL.md: don't blindly ignore build/ and dist/ — check if they're git-tracked first; find and ignore symlinks to avoid false positives - classify_cluster: detect asset files (SVG, images, fonts) and suggest keeping one copy + symlink/reference - Overview table: when two files share the same basename but live in different directories, show parent/file (e.g., configure-domain/SKILL.md / configure-domain/SKILL.md) to distinguish copies across build/dist Tested on smestern/sciagent which has diverged skill copies across build/, dist/, and templates/ directories. Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 --- analyze-duplicates/SKILL.md | 14 ++++++++++++- analyze-duplicates/generate-report.py | 29 ++++++++++++++++++++++++--- 2 files changed, 39 insertions(+), 4 deletions(-) diff --git a/analyze-duplicates/SKILL.md b/analyze-duplicates/SKILL.md index 027b148..b5226b0 100644 --- a/analyze-duplicates/SKILL.md +++ b/analyze-duplicates/SKILL.md @@ -83,13 +83,25 @@ npx --yes jscpd@latest \ --min-tokens MIN_TOKENS \ --reporters "json,html" \ --output .tmp/jscpd-PROJECTNAME \ - --ignore "**/.tox/**,**/venv*/**,**/.venv/**,**/node_modules/**,**/__pycache__/**,**/build/**,**/dist/**,**/.eggs/**,**/.git/**,**/.npm/**,**/.tmp/**" \ + --ignore "IGNORE_PATTERNS" \ PATH ``` If `--no-html` is set, omit `html` from reporters. If `--badge` is set, add `badge` to reporters. If `--formats` is set, add `--format FORMATS`. +**Building the ignore list** — start with these safe defaults: +`**/.tox/**,**/venv*/**,**/.venv/**,**/node_modules/**,**/__pycache__/**,**/.eggs/**,**/.git/**,**/.npm/**,**/.tmp/**` + +Then for each of `build/`, `dist/`, `.eggs/`: +- Check if the directory is **tracked by git** (`git ls-files --error-unmatch DIR/ 2>/dev/null`) +- If tracked: do NOT ignore it (it's intentionally committed content) +- If untracked: add it to the ignore list + +Additionally, find all **symlinks** in the scan path (`find PATH -type l`) and +add ignore patterns for them (e.g., `**/symlinked-dir/**`). Symlinked content +is intentionally shared — duplicates from symlinks are noise, not bugs. 
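+
+A rough sketch of that assembly in Python (helper name and structure are
+illustrative only; plain shell works just as well):
+
+```python
+import subprocess
+from pathlib import Path
+
+# Abbreviated; start from the full safe-default list above.
+DEFAULTS = ["**/.tox/**", "**/node_modules/**", "**/__pycache__/**",
+            "**/.git/**", "**/.tmp/**"]
+
+def build_ignore_list(scan_path):
+    patterns = list(DEFAULTS)
+    for d in ("build", "dist", ".eggs"):
+        tracked = subprocess.run(
+            ["git", "-C", scan_path, "ls-files", "--error-unmatch", f"{d}/"],
+            capture_output=True).returncode == 0
+        if not tracked:  # untracked build output, safe to ignore
+            patterns.append(f"**/{d}/**")
+    for p in Path(scan_path).rglob("*"):
+        if p.is_symlink():  # symlinked content is shared on purpose
+            patterns.append(f"**/{p.relative_to(scan_path)}/**")
+    return ",".join(patterns)
+```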
+ This produces: - `.tmp/jscpd-PROJECTNAME/jscpd-report.json` — structured data for the markdown report - `.tmp/jscpd-PROJECTNAME/html/index.html` — interactive HTML report with syntax highlighting diff --git a/analyze-duplicates/generate-report.py b/analyze-duplicates/generate-report.py index 2276ae3..332f7e3 100644 --- a/analyze-duplicates/generate-report.py +++ b/analyze-duplicates/generate-report.py @@ -178,6 +178,22 @@ def classify_cluster(dup): for t in ("test_", "tests/", "_test.", "conftest", "spec.", "spec/") ) + # Detect generated / binary-like artifacts (SVG, images, configs) + is_asset = any( + n.endswith((".svg", ".png", ".jpg", ".ico", ".woff", ".woff2", ".eot", ".ttf")) + for n in (first_name, second_name) + ) + if is_asset: + if same_file: + return ("trivial", "Internal duplication in asset file", + "Repeated content within an asset file. Usually harmless.") + return ( + "easy", + "Deduplicate asset — keep one copy and reference it", + "Same asset committed in multiple locations. " + "Keep a single canonical copy and reference/symlink from other locations.", + ) + # Detect documentation / markdown is_docs = fmt in ("markdown", "markup") or any( n.endswith((".md", ".rst", ".adoc")) for n in (first_name, second_name) @@ -263,10 +279,17 @@ def render_overview_table(all_dups): rows = [] for i, (_proj, dup) in enumerate(all_dups, 1): difficulty, strategy, _rationale = classify_cluster(dup) - first_short = dup["firstFile"]["name"].rsplit("/", 1)[-1] - second_short = dup["secondFile"]["name"].rsplit("/", 1)[-1] - if first_short == second_short: + first_name = dup["firstFile"]["name"] + second_name = dup["secondFile"]["name"] + first_short = first_name.rsplit("/", 1)[-1] + second_short = second_name.rsplit("/", 1)[-1] + if first_name == second_name: files_str = first_short + elif first_short == second_short: + # Same filename in different dirs — show parent/file + first_ctx = "/".join(first_name.rsplit("/", 2)[-2:]) + second_ctx = "/".join(second_name.rsplit("/", 2)[-2:]) + files_str = f"{first_ctx} / {second_ctx}" else: files_str = f"{first_short} / {second_short}" label = DIFFICULTY_LABELS.get(difficulty, difficulty) From 42abc552c3384304c61a83ec2d17766da5b15f15 Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Tue, 7 Apr 2026 15:14:54 -0400 Subject: [PATCH 4/8] analyze-duplicates: add %file column showing clone coverage Add a %file column to the overview table showing what percentage of the smaller involved file is covered by the clone. This distinguishes full-file copies (100%) from partial overlaps (e.g., 29%), making it immediately clear which clusters are diverged copies vs shared snippets. For clusters where %file >= 50%, the detail summary also shows "N% of file" inline. 
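
For example, an 18-line clone between an 18-line file and a 120-line
file scores 18/18 = 100% (the smaller file is wholly a copy), while the
same clone measured against the larger file would read only 15%.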
Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 --- analyze-duplicates/generate-report.py | 53 ++++++++++++++++++++++----- 1 file changed, 44 insertions(+), 9 deletions(-) diff --git a/analyze-duplicates/generate-report.py b/analyze-duplicates/generate-report.py index 332f7e3..d522120 100644 --- a/analyze-duplicates/generate-report.py +++ b/analyze-duplicates/generate-report.py @@ -63,6 +63,32 @@ def file_link(name, start, end, repo_url, branch): return label +def build_file_lines_map(data): + """Build a {filepath: total_lines} map from jscpd statistics.""" + file_lines = {} + for _fmt_name, fmt_data in data.get("statistics", {}).get("formats", {}).items(): + for fpath, finfo in fmt_data.get("sources", {}).items(): + file_lines[fpath] = finfo.get("lines", 0) + return file_lines + + +def clone_file_percent(dup, file_lines): + """Compute what % of the smaller file is covered by the clone. + + Returns an int 0-100. Uses the smaller of the two files as denominator + so that "95%" means the clone is nearly the entire file. + """ + first_name = dup["firstFile"]["name"] + second_name = dup["secondFile"]["name"] + first_total = file_lines.get(first_name, 0) + second_total = file_lines.get(second_name, 0) + smaller = min(first_total, second_total) if first_total and second_total else 0 + if smaller == 0: + return 0 + clone_lines = dup.get("lines", 0) + return min(100, round(100 * clone_lines / smaller)) + + def load_report(path): """Load a jscpd JSON report.""" try: @@ -273,9 +299,9 @@ def classify_cluster(dup): } -def render_overview_table(all_dups): +def render_overview_table(all_dups, file_lines): """Render a compact overview table of all clusters with mediation info.""" - headers = ["C", "Lines", "Difficulty", "Strategy", "Files"] + headers = ["C", "Lines", "%file", "Difficulty", "Strategy", "Files"] rows = [] for i, (_proj, dup) in enumerate(all_dups, 1): difficulty, strategy, _rationale = classify_cluster(dup) @@ -286,16 +312,17 @@ def render_overview_table(all_dups): if first_name == second_name: files_str = first_short elif first_short == second_short: - # Same filename in different dirs — show parent/file first_ctx = "/".join(first_name.rsplit("/", 2)[-2:]) second_ctx = "/".join(second_name.rsplit("/", 2)[-2:]) files_str = f"{first_ctx} / {second_ctx}" else: files_str = f"{first_short} / {second_short}" label = DIFFICULTY_LABELS.get(difficulty, difficulty) + pct = clone_file_percent(dup, file_lines) rows.append([ str(i), str(dup.get("lines", 0)), + f"{pct}%", label, strategy, files_str, @@ -321,7 +348,8 @@ def fmt_row(cells): return "\n".join(lines) -def render_cluster(idx, dup, prefix="", repo_url=None, branch=None): +def render_cluster(idx, dup, prefix="", repo_url=None, branch=None, + file_lines=None): """Render a single duplicate cluster as a
block with mediation.""" fmt = dup.get("format", "") lang = format_language(fmt) @@ -339,14 +367,16 @@ def render_cluster(idx, dup, prefix="", repo_url=None, branch=None): difficulty, strategy, rationale = classify_cluster(dup) diff_label = DIFFICULTY_LABELS.get(difficulty, difficulty) + pct = clone_file_percent(dup, file_lines or {}) label = f"{prefix}Cluster {idx}" + pct_str = f" {pct}% of file" if pct >= 50 else "" # Summary line uses plain text (no links — they don't work inside ) summary = ( f"[{diff_label}] " f"`{first_name}` lines {first['start']}-{first['end']} " f"↔ `{second_name}` lines {second['start']}-{second['end']} " - f"({n_lines} lines)" + f"({n_lines} lines{pct_str})" ) fragment = dup.get("fragment", "") @@ -387,7 +417,7 @@ def render_cluster(idx, dup, prefix="", repo_url=None, branch=None): def render_report(projects, threshold, cross_project=None, jscpd_version=None, - badge_path=None, repo_url=None, branch=None): + badge_path=None, repo_url=None, branch=None, file_lines=None): """Render the full Markdown report.""" now = datetime.now(timezone.utc).strftime("%Y-%m-%d") version_str = jscpd_version or "unknown" @@ -451,7 +481,7 @@ def render_report(projects, threshold, cross_project=None, jscpd_version=None, return "\n".join(parts) # Overview table - parts.append(render_overview_table(all_dups)) + parts.append(render_overview_table(all_dups, file_lines or {})) parts.append("") # Per-project cluster details @@ -471,7 +501,8 @@ def render_report(projects, threshold, cross_project=None, jscpd_version=None, for dup in duplicates: parts.append(render_cluster(global_idx, dup, - repo_url=repo_url, branch=branch)) + repo_url=repo_url, branch=branch, + file_lines=file_lines)) global_idx += 1 # Cross-project section @@ -486,7 +517,8 @@ def render_report(projects, threshold, cross_project=None, jscpd_version=None, for i, dup in enumerate(cross_dups, 1): parts.append(render_cluster(global_idx, dup, prefix="Cross-project ", - repo_url=repo_url, branch=branch)) + repo_url=repo_url, branch=branch, + file_lines=file_lines)) global_idx += 1 return "\n".join(parts) @@ -554,9 +586,11 @@ def main(): branch = auto_branch projects = [] + file_lines = {} for rpath in args.reports: data = load_report(rpath) name = guess_project_name(rpath) + file_lines.update(build_file_lines_map(data)) projects.append({ "name": name, "stats": data.get("statistics", {}).get("total", {}), @@ -575,6 +609,7 @@ def main(): badge_path=args.badge_path, repo_url=repo_url, branch=branch, + file_lines=file_lines, ) if args.output: From cfcc9115546585be77837bfe35dd50bf72e3e22e Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Wed, 29 Apr 2026 08:39:14 -0400 Subject: [PATCH 5/8] github-project-status: derive repo from git remote when no arg given Adds a "no argument" code path: run `git remote -v` and pick upstream by priority (`upstream` -> `origin` -> first GitHub remote), extracting `owner/repo` from either HTTPS or SSH URL forms. Lets the skill be invoked from inside a checkout without retyping the slug. 
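
For illustration, the slug extraction amounts to (hypothetical helper;
the skill itself is prose instructions, not code):

    import re

    def extract_slug(remote_url):
        # Handles https://github.com/owner/repo(.git) and
        # git@github.com:owner/repo(.git) remote URL forms.
        m = re.search(r"github\.com[:/]([^/]+)/([^/\s]+?)(?:\.git)?$",
                      remote_url)
        return f"{m.group(1)}/{m.group(2)}" if m else None
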
Co-Authored-By: Claude Code 2.1.123 / Claude Opus 4.7 --- github-project-status/SKILL.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/github-project-status/SKILL.md b/github-project-status/SKILL.md index 545e193..f39d8aa 100644 --- a/github-project-status/SKILL.md +++ b/github-project-status/SKILL.md @@ -23,6 +23,12 @@ Accept GitHub project references in these formats: - Full URL: `https://github.com/owner/repo` - Short form: `owner/repo` - Organization URL: `https://github.com/org` (analyze main repos) +- **No argument given**: Run `git remote -v` in the current directory and identify the upstream repository. Use the first match from this priority order: + 1. Remote named `upstream` + 2. Remote named `origin` + 3. First remote with a GitHub URL + + Extract `owner/repo` from the remote URL (handles both `https://github.com/owner/repo` and `git@github.com:owner/repo` formats). ## Analysis Steps From 2bab6d4fb2f0f3f31090ff52d22e84a442e81beb Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Wed, 29 Apr 2026 08:39:42 -0400 Subject: [PATCH 6/8] pr-feedback-review: add companion-account auth and red/green fix workflow - Add AI_COMPANION_TOKEN_FILE config (default `~/.claude/gh-token`) so reply scripts can post as a dedicated bot account (e.g. `yarikoptic-gitmate`) instead of the user's personal account. Reply scripts source the token file and print the posting account for verification. - For [ADDRESSED] comments, prescribe a red/green TDD loop: extend a failing test, apply the minimal fix, run the suite, commit, and reference the commit SHA in the reply so reviewers can click through. Falls back to "propose only" when a fix is too risky to apply immediately. - Update Step 9 prompt to reflect that fixes are committed earlier and the remaining decision is whether to push and post replies. Co-Authored-By: Claude Code 2.1.123 / Claude Opus 4.7 --- pr-feedback-review/SKILL.md | 45 +++++++++++++++++++++++++++++++++---- 1 file changed, 41 insertions(+), 4 deletions(-) diff --git a/pr-feedback-review/SKILL.md b/pr-feedback-review/SKILL.md index f1384a3..af87edf 100644 --- a/pr-feedback-review/SKILL.md +++ b/pr-feedback-review/SKILL.md @@ -19,6 +19,12 @@ This skill uses the following values. Adjust for your setup by editing this sect - **SCAN_DIRS**: `~/proj` — comma-separated parent directories to scan for git repos - **GITHUB_USER**: `yarikoptic` — your GitHub username - **MAX_SCAN_DEPTH**: `3` — how deep to recurse when scanning for repos +- **AI_COMPANION_TOKEN_FILE**: `~/.claude/gh-token` — path to a shell-sourceable + file that exports `GH_TOKEN` for an AI companion GitHub account (e.g. + `yarikoptic-gitmate`). When present, reply scripts use this token so + responses are posted from the companion account rather than the user's + personal account. The file should contain `export GH_TOKEN=github_pat_...`. + Set to empty string to disable and post as yourself. Throughout this document, these names refer to the configured values above. @@ -230,7 +236,17 @@ recommendation based on its type and actionability: - Show the comment text and the relevant code context - If a ` ```suggestion ` block exists, show exactly what it would change (before/after) -- Propose a specific code edit to apply +- **Fix the issue directly** using red/green TDD: + 1. **Red**: Write or extend a test that exposes the bug/missing behavior. + Prefer extending an existing test over creating a new one. Run it to + confirm it fails. + 2. **Green**: Apply the minimal code fix. Run the test to confirm it passes. + 3. 
**Verify**: Run the broader test suite to ensure no regressions. + 4. **Commit**: Create a commit with a message referencing the review comment + (e.g. "Spotted by Copilot review on PR #NNN"). + 5. Include the commit SHA in the reply so the reviewer can verify the fix. +- If the fix is too complex or risky to apply immediately, propose the edit + and note it as a follow-up instead of committing. **Dismissible comments** (actionable=no): - Draft a concise response explaining why no change is needed @@ -321,9 +337,30 @@ selectively run replies. -f body="" > /dev/null && echo " replied to COMMENT_ID" \ || echo " FAILED to reply to COMMENT_ID" ``` + For `[ADDRESSED]` comments that were fixed via commit (Step 7), include + the short commit SHA and first line of the commit message in the reply + body, e.g.: "Fixed in abc1234 `BF: forward recursion_limit ...` — ..." + so the reviewer can click through to verify the fix. 3. Script format requirements: - - Header with `#!/bin/bash`, `set -e`, and `REPO`/`PR` variables + - Header with `#!/bin/bash`, `set -e`, `REPO`/`PR` variables, and + companion token setup. When `$AI_COMPANION_TOKEN_FILE` is configured + and the file exists, source it at the top of the script so all `gh api` + calls authenticate as the companion account: + ```bash + #!/bin/bash + set -e + REPO="OWNER/REPO" + PR=NUMBER + + # Authenticate as AI companion account + source ~/.claude/gh-token # exports GH_TOKEN + export GH_TOKEN + ``` + Also add a verification line that prints which account is posting: + ```bash + echo "Posting replies as: $(gh api user --jq .login)" + ``` - Each comment block has: - A comment line with file, line number, short description, and `[ADDRESSED]`, `[DISMISSED]`, or `[DISCUSS]` tag @@ -346,10 +383,10 @@ selectively run replies. ### Step 9 — Interactive Follow-up +Actionable issues should already be fixed and committed in Step 7. After presenting the report, ask the user: -- "Should I apply the suggested code changes?" (if any actionable suggestions exist) -- "Should I post the reply script?" (if the script was generated) +- "Should I push and post the reply script?" (if fixes were committed) - "Any comments you want to re-classify or handle differently?" Wait for the user's response before taking any further action. From 73742816da5c3ea5b7c956d2c70774f43921b7fd Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Wed, 29 Apr 2026 08:53:46 -0400 Subject: [PATCH 7/8] Add reuse-compliance skill Move the REUSE-compliance helper out of `~/.claude/commands/` into a proper skill in this repo so it can be shared (and symlinked to `~/.claude/skills/`) like the other CON skills. Content is preserved from the original command, with two adjustments: - Skill-format frontmatter (`name`, `allowed-tools`, `user-invocable`) with a more discovery-friendly description listing concrete trigger scenarios. - Added a "When to Use" section and a "Commit Co-Authorship" section matching the convention used by introduce-codespell and friends. The proposed-structure example no longer mentions a `.reuseignore` file (REUSE 3.x deprecated it; the body of the skill already explains this and recommends `.gitignore` instead). 
Co-Authored-By: Claude Code 2.1.123 / Claude Opus 4.7 --- README.md | 1 + reuse-compliance/SKILL.md | 507 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 508 insertions(+) create mode 100644 reuse-compliance/SKILL.md diff --git a/README.md b/README.md index d3b89ae..8423ab7 100644 --- a/README.md +++ b/README.md @@ -14,6 +14,7 @@ software project maintenance, triage, and automation. | [issue-triage](issue-triage/) | Triage open GitHub issues by cross-referencing the codebase and git history. Detects duplicates, drafts proposed comments, and serves results in a local web dashboard. Includes Python helper scripts for gathering and serving data. | | [pr-feedback-review](pr-feedback-review/) | Load a PR's review feedback (human + bot), classify each comment by type and actionability, and recommend what to address vs dismiss — with draft code changes and responses. Works from a local repo or a PR URL. | | [pr-review-update](pr-review-update/) | Scan an [improveit-dashboard](https://github.com/yarikoptic/improveit-dashboard) for PRs awaiting your response, assess confidence, auto-rebase codespell PRs, and produce copy-paste-ready push commands. | +| [reuse-compliance](reuse-compliance/) | Set up and validate [REUSE](https://reuse.software/) licensing compliance: `LICENSES/`, `REUSE.toml`, SPDX headers, and integration with tox / pre-commit / Makefile / GitHub Actions. Handles BIDS dataset data-vs-code separation, [DUO](https://github.com/EBISPOT/DUO) data-use ontology codes, and DEP-3 patch tagging for vendoring repos. | | [scan-projects](scan-projects/) | Walk subdirectories of git repos, collect metadata (language, license, commit dates, remote URL), and generate concise LLM-produced summaries into a `projects.tsv` file. Ships with helper scripts for batch updates. | | [tinuous-analyzer](tinuous-analyzer/) | Analyze CI log collections gathered by [con/tinuous](https://github.com/con/tinuous/) to pinpoint when a test started failing, diff environment/dependency changes between passing and failing runs, and recommend investigation steps. | diff --git a/reuse-compliance/SKILL.md b/reuse-compliance/SKILL.md new file mode 100644 index 0000000..07857d9 --- /dev/null +++ b/reuse-compliance/SKILL.md @@ -0,0 +1,507 @@ +--- +name: reuse-compliance +description: Set up and validate REUSE specification compliance (LICENSES/ directory, REUSE.toml, SPDX headers) for software projects and BIDS datasets. Covers BIDS data-vs-code separation, DUO (Data Use Ontology) integration, DEP-3 patch tagging for vendoring repos, and integration with tox / pre-commit / Makefile / GitHub Actions. Use when adding licensing metadata to a project, fixing `reuse lint` failures, licensing a BIDS dataset, or annotating patches in a vendoring repo. +allowed-tools: Bash, Read, Edit, Write, Glob, Grep, AskUserQuestion +user-invocable: true +--- + +# REUSE Compliance Skill + +Implement the [REUSE specification](https://reuse.software/) for clear, +machine-readable licensing and copyright information. Includes special +support for BIDS datasets, Data Use Ontology (DUO) integration, and +DEP-3 patch tagging for vendoring repositories. 
+ +## When to Use + +- User wants to add REUSE / SPDX licensing metadata to a project +- User asks to "introduce REUSE" or runs `/reuse-compliance` +- `reuse lint` is failing and needs to be brought to 100% compliance +- User is licensing a BIDS dataset (data + code + docs separately) +- User is annotating `*.patch` files in a vendoring repo with DEP-3 headers +- User wants to integrate REUSE checks into tox / pre-commit / CI + +## Overview + +**REUSE Specification** provides standardized practices for declaring copyright and licensing information in software projects and datasets. + +**DUO (Data Use Ontology)** from GA4GH provides machine-readable codes for data use restrictions and conditions, particularly for health/biomedical research data. + +**Integration**: REUSE handles copyright/licensing (legal permissions), while DUO handles consent-based data use restrictions (ethical/regulatory constraints). + +## Key Concepts + +### REUSE Core Components + +1. **LICENSES/** directory: Contains full license texts (e.g., Apache-2.0.txt, CC0-1.0.txt) +2. **REUSE.toml**: Configuration file with copyright and license annotations +3. **`.gitignore`**: REUSE 3.x honors `.gitignore` for excluding build artifacts/caches. + (`.reuseignore` was deprecated; do not create new `.reuseignore` files.) +4. **SPDX headers**: In-file copyright/license declarations + +### REUSE.toml `precedence` field + +Each `[[annotations]]` block takes a `precedence` value that controls how the +block-level annotation interacts with in-file SPDX headers: + +- `"aggregate"` — block annotation + any in-file SPDX header are combined + (good default for most blocks). +- `"closest"` — in-file SPDX header wins if present; otherwise the block + applies. **Use this whenever per-file overrides are expected** (e.g. + patch files with DEP-3/SPDX headers, vendored sub-trees with mixed + authorship). +- `"override"` — block always wins, even over in-file SPDX. Rarely the + right choice; use only when you cannot trust file headers (e.g. + generated/vendored files with stale or missing tags). + +### REUSE scope: per-working-tree, not per-branch + +`REUSE.toml` describes the working tree it lives in. If a repository has +substantially different content across branches (e.g. a vendoring repo +with an `upstream/` branch tracking unmodified upstream alongside a +`master` branch with local patches), state this in the README and let +each branch carry its own `REUSE.toml` (or none, deferring to upstream's +own copyright file). This is uncommon — most projects only need one +`REUSE.toml` on the default branch. + +### BIDS Dataset Considerations + +Per [bids-specification#2015](https://github.com/bids-standard/bids-specification/issues/2015): +- **dataset_description.json**: Contains `License` field for data portion +- **Multiple licenses**: Code components may need separate licensing from data +- **REUSE.toml in BIDS**: Should clarify data vs. 
code licensing +- **DUO annotations**: Can supplement licenses with data use conditions + +### DUO Integration + +Per [bids-specification#2078](https://github.com/bids-standard/bids-specification/issues/2078) and [reuse-tool#1148](https://github.com/fsfe/reuse-tool/issues/1148): +- DUO codes describe data use conditions beyond licensing +- Examples: "no re-identification" (DUO:0000028), "general research use" (DUO:0000042) +- Can be included in REUSE.toml or dataset_description.json +- See: https://github.com/EBISPOT/DUO + +## Commit Co-Authorship + +All commits created during this workflow MUST include a `Co-Authored-By` trailer identifying +both Claude Code version and the model used. Get the version via `claude --version` and +use the model name from the environment. Format: + +``` +Co-Authored-By: Claude Code / Claude +``` + +Example: +``` +Co-Authored-By: Claude Code 2.1.123 / Claude Opus 4.7 +``` + +## Execution Steps + +When this skill is invoked, follow these steps: + +### 1. Assess Current State + +**Check for existing REUSE infrastructure:** +- Look for LICENSES/ directory +- Check if REUSE.toml or .reuse/dep5 exists +- Check `.gitignore` for build-artifact exclusions (REUSE 3.x honors it; + if a legacy `.reuseignore` exists, plan to migrate its entries to + `.gitignore` and remove it) +- Scan for SPDX headers in files + +**Check for BIDS dataset:** +- Look for dataset_description.json +- Check if it contains License field +- Identify data files vs. code files (scripts/, code/) + +**Check for build system integration:** +- Check if tox.ini exists → suggest adding [testenv:reuse] +- Check if .pre-commit-config.yaml exists → suggest adding reuse hook +- Check if Makefile exists → suggest adding reuse target +- Check if .github/workflows/ exists → suggest adding reuse check + +### 2. Propose REUSE Structure + +**For general projects:** +``` +LICENSES/ +├── Apache-2.0.txt # Main code license +├── CC-BY-4.0.txt # Documentation license +└── CC0-1.0.txt # Public domain data + +REUSE.toml # License annotations +``` + +**For BIDS datasets:** +``` +LICENSES/ +├── CC0-1.0.txt # Data license (if public domain) +├── CC-BY-4.0.txt # Data license (if attribution required) +└── MIT.txt # Code/scripts license + +REUSE.toml # Separate annotations for data vs code +dataset_description.json # License field + optional DUO codes +``` + +### 3. 
+
+### 3. Create REUSE.toml
+
+Generate appropriate annotations:
+
+**Standard Project Template:**
+```toml
+version = 1
+
+[[annotations]]
+path = [
+    "src/**",
+    "tests/**",
+    "*.py",
+    "*.md",
+    ".github/**",
+]
+precedence = "aggregate"
+SPDX-FileCopyrightText = "YEAR AUTHOR <email>"
+SPDX-License-Identifier = "LICENSE-ID"
+
+[[annotations]]
+path = ["data/**"]
+precedence = "aggregate"
+SPDX-FileCopyrightText = "YEAR DATA-PROVIDER"
+SPDX-License-Identifier = "CC0-1.0"
+```
+
+**BIDS Dataset Template:**
+```toml
+version = 1
+
+# BIDS data files
+[[annotations]]
+path = [
+    "sub-*/**/*.nii.gz",
+    "sub-*/**/*.json",
+    "sub-*/**/*.tsv",
+    "participants.tsv",
+    "participants.json",
+    "*.tsv",
+    "*.json",
+]
+precedence = "aggregate"
+SPDX-FileCopyrightText = "YEAR DATA-COLLECTORS"
+SPDX-License-Identifier = "CC0-1.0"
+# Optional DUO annotation (if applicable)
+# DataUseOntology = ["DUO:0000042"]  # General research use
+
+# BIDS code/derivatives
+[[annotations]]
+path = [
+    "code/**",
+    "derivatives/**/*.py",
+    "derivatives/**/*.sh",
+]
+precedence = "aggregate"
+SPDX-FileCopyrightText = "YEAR DEVELOPERS"
+SPDX-License-Identifier = "MIT"
+
+# Documentation
+[[annotations]]
+path = ["README*", "CHANGES*", "dataset_description.json"]
+precedence = "aggregate"
+SPDX-FileCopyrightText = "YEAR AUTHORS"
+SPDX-License-Identifier = "CC-BY-4.0"
+```
+
+### 4. Handle BIDS dataset_description.json
+
+**Current format (BIDS 1.x):**
+```json
+{
+  "Name": "Dataset Name",
+  "BIDSVersion": "1.9.0",
+  "License": "CC0"
+}
+```
+
+**Proposed enhanced format (per bids-spec#2015 and #2078):**
+```json
+{
+  "Name": "Dataset Name",
+  "BIDSVersion": "1.9.0",
+  "License": "CC0",
+  "DataUseOntology": [
+    "DUO:0000042",
+    "DUO:0000028"
+  ],
+  "DataUseDescription": "General research use; No re-identification"
+}
+```
+
+**Common DUO codes:**
+- `DUO:0000042` - General research use
+- `DUO:0000028` - No re-identification
+- `DUO:0000006` - Health or medical or biomedical research
+- `DUO:0000007` - Disease-specific research
+- `DUO:0000021` - Ethics approval required
+- `DUO:0000043` - Clinical care use
+
+### 5. Exclude build artifacts via `.gitignore`
+
+REUSE 3.x honors `.gitignore` — anything matched there is automatically
+skipped by `reuse lint`. **Do not create a `.reuseignore` file** (it is
+deprecated). Add build artifacts and caches to `.gitignore` if not
+already there:
+
+```gitignore
+# Build artifacts and caches
+.tox/
+.venv*/
+__pycache__/
+*.egg-info/
+build/
+dist/
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
+node_modules/
+```
+
+**BIDS-specific excludes:**
+```gitignore
+# BIDS working directories (if any)
+sourcedata/
+work/
+.bidsignore
+.datalad/
+```
+
+If a legacy `.reuseignore` exists, migrate its entries to `.gitignore`
+and delete the file. Large generated/binary artifacts that you
+intentionally want tracked but excluded from REUSE (rare) should instead
+be covered by an `[[annotations]]` block in `REUSE.toml` with
+appropriate SPDX tags.
+
+### 6. Integrate with Build Systems
+
+**A. tox.ini Integration:**
+```ini
+[testenv:reuse]
+skip_install = true
+deps = reuse
+description = Check REUSE specification compliance
+commands =
+    reuse lint
+
+[gh-actions]
+python =
+    3.12: py312, lint, type, reuse
+```
+
+**B. pre-commit Integration:**
+```yaml
+repos:
+  - repo: https://github.com/fsfe/reuse-tool
+    rev: v4.0.3
+    hooks:
+      - id: reuse
+```
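+
+After adding the hook, run it once across the whole tree to see the
+current state (assumes `pre-commit` itself is installed; `reuse` is the
+hook id declared by fsfe/reuse-tool):
+
+```bash
+pre-commit run reuse --all-files
+```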
+
+**C. Makefile Integration:**
+```makefile
+.PHONY: reuse-lint reuse-download reuse-annotate
+
+reuse-lint:
+	@echo "=== Checking REUSE compliance ==="
+	reuse lint
+
+reuse-download:
+	@echo "=== Downloading missing licenses ==="
+	reuse download --all
+
+reuse-annotate:
+	@echo "=== Annotating file with license header ==="
+	@read -p "File to annotate: " file; \
+	reuse annotate --license Apache-2.0 --copyright "YEAR AUTHOR" $$file
+```
+
+**D. GitHub Actions Integration:**
+```yaml
+- name: Check REUSE compliance
+  uses: fsfe/reuse-action@v5
+```
+
+### 7. Validate and Report
+
+Run validation:
+```bash
+reuse lint
+```
+
+Expected output sections:
+- **Bad licenses**: License files with issues
+- **Missing licenses**: Referenced but not in LICENSES/
+- **Files with copyright information**: X / Y
+- **Files with license information**: X / Y
+
+Goal: 100% compliance (all files have both copyright and license info)
+
+### 8. DUO Validation (BIDS Datasets)
+
+If DUO codes are present, validate them:
+1. Check that the codes exist in the DUO ontology: https://www.ebi.ac.uk/ols/ontologies/duo
+2. Ensure the codes are consistent with the `License` field
+3. Verify DataUseDescription matches the codes
+4. Check for conflicting restrictions
+
+**Common patterns:**
+- CC0 + DUO:0000042 → "Open data, general research use"
+- CC-BY-4.0 + DUO:0000028 → "Attribution required, no re-identification"
+- Custom + DUO:0000021 → "Restricted access, ethics approval required"
+
+## Optional: Patches against external upstream + DEP-3
+
+**Skip this section unless** the repository carries `*.patch` files that
+modify some other project's source (e.g. a vendoring/CI repo with
+`patches/` applied at build time). This is a relatively rare setup —
+most projects do not need it.
+
+When it does apply, REUSE alone is not enough: each patch should also
+carry a [DEP-3](https://dep-team.pages.debian.net/deps/dep3/) header so
+its provenance, upstream-forwarding status, and license are documented
+in-band.
+
+### Licensing of patch files
+
+Patches are derivative works of the upstream they modify and must
+inherit the upstream license. Choose the SPDX identifier from upstream's
+license:
+- git-annex / GPL upstreams → `AGPL-3.0-or-later` / `GPL-2.0-or-later` / etc.
+- BSD/MIT upstreams → match exactly.
+
+In `REUSE.toml`, use `precedence = "closest"` on the patches subtree so
+the per-patch SPDX header (added below) wins over the block-level
+fallback:
+
+```toml
+[[annotations]]
+path = "patches/**"
+precedence = "closest"
+SPDX-FileCopyrightText = "YEAR PROJECT TEAM <email>"
+SPDX-License-Identifier = "AGPL-3.0-or-later"  # match upstream
+```
+
+### DEP-3 + SPDX header template
+
+Prepend the following RFC-2822-style block to every `*.patch`. The
+trailing `---` line terminates the metadata; everything after it is the
+ordinary `git diff` content. Patch tools (`git apply`, `git apply -R
+--check`, `quilt`, `patch`) accept and ignore the preamble.
+
+```
+Description: <short summary of the change>
+ <optional longer explanation, indented by one space>
+Origin: vendor, https://<URL of this repository or the patch>
+Author: First Last <email>
+Forwarded: not-needed  # OR: <upstream PR/issue URL>; OR: no
+Last-Update: YYYY-MM-DD
+Bug: <upstream bug URL, if any>
+Applied-Upstream: <upstream version or commit, if merged>
+SPDX-FileCopyrightText: YEAR First Last <email>
+SPDX-License-Identifier: <LICENSE-ID matching upstream>
+---
+diff --git a/...
+```
+
+Field reference (DEP-3):
+- `Description` (required) — short summary on first line, longer
+  explanation indented on following lines.
+- `Origin` (required unless `Author`) — `upstream`, `backport`,
+  `vendor`, or `other`, optionally with a URL.
+- `Author` / `From` — patch author(s).
+- `Forwarded` — `yes`/URL, `no`, or `not-needed`.
+- `Last-Update` — ISO date the metadata was last revised.
+- `Bug`, `Bug-<Vendor>`, `Reviewed-by`, `Applied-Upstream` — optional.
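+
+As a worked example, a header for a hypothetical timeout patch might
+look like this (every name, URL, and date below is illustrative):
+
+```
+Description: Relax timeout in flaky network test
+ Upstream's 5s timeout is routinely exceeded on our CI runners.
+Origin: vendor, https://example.org/vendoring-repo
+Author: Jane Doe <jane@example.com>
+Forwarded: https://github.com/upstream/project/pull/123
+Last-Update: 2026-04-07
+SPDX-FileCopyrightText: 2026 Jane Doe <jane@example.com>
+SPDX-License-Identifier: GPL-2.0-or-later
+---
+diff --git a/tests/test_net.py b/tests/test_net.py
+```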
+
+### Verify patch tooling tolerates the preamble
+
+Before committing, sanity-check the project's actual patch-application
+path (not just `git apply`). For example:
+```bash
+git apply --check patches/<name>.patch
+git apply -R --check patches/<name>.patch  # if reverse-check is used
+```
+If the project uses `quilt`, `patch -p1`, or a custom script, run that
+too. Most tools ignore the preamble, but confirm before assuming.
+
+### Document it in the README
+
+Add a brief Licensing section pointing at `REUSE.toml` and `LICENSES/`,
+and extend any "Submitting Patches" / contributing guidance with the
+DEP-3 + SPDX template so new patches are compliant out of the gate.
+
+## Decision Points
+
+### License Selection
+
+**For code:**
+- Apache-2.0: Permissive, patent grant
+- MIT: Simple, permissive
+- GPL-3.0-or-later: Copyleft
+
+**For data:**
+- CC0-1.0: Public domain dedication
+- CC-BY-4.0: Attribution required
+- PDDL-1.0: Open Data Commons Public Domain
+
+**For documentation:**
+- CC-BY-4.0: Standard for documentation
+- CC-BY-SA-4.0: Share-alike for wikis
+
+### DUO Code Selection (BIDS)
+
+Ask the user about data use restrictions:
+1. Is this general research use? → DUO:0000042
+2. Can data be used for re-identification? → If no, add DUO:0000028
+3. Is ethics approval required? → DUO:0000021
+4. Disease-specific restrictions? → DUO:0000007 + specific disease
+5. Collaboration required? → DUO:0000020
+6. Time limit? → DUO:0000024 + duration
+
+### Build System Priority
+
+If multiple systems exist, suggest:
+1. **Primary**: tox (Python standard)
+2. **Developer workflow**: pre-commit (catches issues early)
+3. **CI/CD**: GitHub Actions (automated checks)
+4. **Make**: For projects already using it
+
+## Output
+
+Provide:
+1. **Status report**: Current compliance level
+2. **Action items**: What needs to be done
+3. **File changes**: Specific files to create/modify
+4. **Integration steps**: How to add to build systems
+5. **Validation command**: How to check compliance
+
+For BIDS datasets, additionally provide:
+- Suggested dataset_description.json updates
+- DUO code recommendations based on data type
+- Explanation of REUSE + DUO synergy
+
+## References
+
+- REUSE Specification: https://reuse.software/spec/
+- REUSE Tutorial: https://reuse.software/tutorial/
+- DEP-3 (Patch Tagging Guidelines): https://dep-team.pages.debian.net/deps/dep3/
+- BIDS REUSE Issue: https://github.com/bids-standard/bids-specification/issues/2015
+- BIDS DUO Issue: https://github.com/bids-standard/bids-specification/issues/2078
+- DUO Ontology: https://github.com/EBISPOT/DUO
+- GA4GH DUO Standard: https://www.ga4gh.org/product/data-use-ontology-duo/
+
+## Notes
+
+- Always preserve existing licensing information when adding REUSE compliance
+- For BIDS: the `License` field in dataset_description.json should match the REUSE.toml data annotations
+- DUO codes are complementary to licenses, not replacements
+- REUSE handles "can you legally use this?", DUO handles "under what conditions?"
+- When in doubt about DUO codes, consult institutional review board or data governance team From 5ada2e1912587e5c823a3cde248d1d460b6600a3 Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Wed, 29 Apr 2026 09:03:45 -0400 Subject: [PATCH 8/8] Rename reuse-compliance -> introduce-reuse-compliance Match the `introduce-*` naming used by the other "set this up in a project" skills (introduce-codespell, introduce-git-bug, introduce-mailmap). Updates the directory, the `name:` field in the frontmatter, the heading, the example slash-command in "When to Use", and the README entry (also re-sorted into its alphabetical slot). The `~/.claude/skills/` symlink is repointed locally; not part of this commit. Co-Authored-By: Claude Code 2.1.123 / Claude Opus 4.7 --- README.md | 2 +- {reuse-compliance => introduce-reuse-compliance}/SKILL.md | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) rename {reuse-compliance => introduce-reuse-compliance}/SKILL.md (96%) diff --git a/README.md b/README.md index 8423ab7..c83f79f 100644 --- a/README.md +++ b/README.md @@ -11,10 +11,10 @@ software project maintenance, triage, and automation. | [github-project-status](github-project-status/) | Assess whether a GitHub project is healthy, in maintenance mode, stagnant, or abandoned. Checks commits, releases, issues, PRs, forks, and package registries to produce a structured status report. | | [introduce-codespell](introduce-codespell/) | Add [codespell](https://github.com/codespell-project/codespell) spell-checking to a project end-to-end: config, GitHub Actions workflow, pre-commit hook, exclusion tuning, ambiguous-typo review, and automated fixes via `datalad run`. | | [introduce-git-bug](introduce-git-bug/) | Set up [git-bug](https://github.com/git-bug/git-bug) distributed issue tracking: configure GitHub bridge, sync issues, push `refs/bugs/*`, and document the workflow in DEVELOPMENT.md / CLAUDE.md. | +| [introduce-reuse-compliance](introduce-reuse-compliance/) | Introduce [REUSE](https://reuse.software/) licensing compliance to a project: `LICENSES/`, `REUSE.toml`, SPDX headers, and integration with tox / pre-commit / Makefile / GitHub Actions. Handles BIDS dataset data-vs-code separation, [DUO](https://github.com/EBISPOT/DUO) data-use ontology codes, and DEP-3 patch tagging for vendoring repos. | | [issue-triage](issue-triage/) | Triage open GitHub issues by cross-referencing the codebase and git history. Detects duplicates, drafts proposed comments, and serves results in a local web dashboard. Includes Python helper scripts for gathering and serving data. | | [pr-feedback-review](pr-feedback-review/) | Load a PR's review feedback (human + bot), classify each comment by type and actionability, and recommend what to address vs dismiss — with draft code changes and responses. Works from a local repo or a PR URL. | | [pr-review-update](pr-review-update/) | Scan an [improveit-dashboard](https://github.com/yarikoptic/improveit-dashboard) for PRs awaiting your response, assess confidence, auto-rebase codespell PRs, and produce copy-paste-ready push commands. | -| [reuse-compliance](reuse-compliance/) | Set up and validate [REUSE](https://reuse.software/) licensing compliance: `LICENSES/`, `REUSE.toml`, SPDX headers, and integration with tox / pre-commit / Makefile / GitHub Actions. Handles BIDS dataset data-vs-code separation, [DUO](https://github.com/EBISPOT/DUO) data-use ontology codes, and DEP-3 patch tagging for vendoring repos. 
| | [scan-projects](scan-projects/) | Walk subdirectories of git repos, collect metadata (language, license, commit dates, remote URL), and generate concise LLM-produced summaries into a `projects.tsv` file. Ships with helper scripts for batch updates. | | [tinuous-analyzer](tinuous-analyzer/) | Analyze CI log collections gathered by [con/tinuous](https://github.com/con/tinuous/) to pinpoint when a test started failing, diff environment/dependency changes between passing and failing runs, and recommend investigation steps. | diff --git a/reuse-compliance/SKILL.md b/introduce-reuse-compliance/SKILL.md similarity index 96% rename from reuse-compliance/SKILL.md rename to introduce-reuse-compliance/SKILL.md index 07857d9..563eb51 100644 --- a/reuse-compliance/SKILL.md +++ b/introduce-reuse-compliance/SKILL.md @@ -1,11 +1,11 @@ --- -name: reuse-compliance -description: Set up and validate REUSE specification compliance (LICENSES/ directory, REUSE.toml, SPDX headers) for software projects and BIDS datasets. Covers BIDS data-vs-code separation, DUO (Data Use Ontology) integration, DEP-3 patch tagging for vendoring repos, and integration with tox / pre-commit / Makefile / GitHub Actions. Use when adding licensing metadata to a project, fixing `reuse lint` failures, licensing a BIDS dataset, or annotating patches in a vendoring repo. +name: introduce-reuse-compliance +description: Introduce REUSE specification compliance (LICENSES/ directory, REUSE.toml, SPDX headers) to a software project or BIDS dataset, then validate it. Covers BIDS data-vs-code separation, DUO (Data Use Ontology) integration, DEP-3 patch tagging for vendoring repos, and integration with tox / pre-commit / Makefile / GitHub Actions. Use when adding licensing metadata to a project, fixing `reuse lint` failures, licensing a BIDS dataset, or annotating patches in a vendoring repo. allowed-tools: Bash, Read, Edit, Write, Glob, Grep, AskUserQuestion user-invocable: true --- -# REUSE Compliance Skill +# Introduce REUSE Compliance to a Project Implement the [REUSE specification](https://reuse.software/) for clear, machine-readable licensing and copyright information. Includes special @@ -15,7 +15,7 @@ DEP-3 patch tagging for vendoring repositories. ## When to Use - User wants to add REUSE / SPDX licensing metadata to a project -- User asks to "introduce REUSE" or runs `/reuse-compliance` +- User asks to "introduce REUSE" or runs `/introduce-reuse-compliance` - `reuse lint` is failing and needs to be brought to 100% compliance - User is licensing a BIDS dataset (data + code + docs separately) - User is annotating `*.patch` files in a vendoring repo with DEP-3 headers