Skip to content

fix(skilify): plug the three holes that produced cross-author duplicate skills#119

Merged
efenocchi merged 6 commits into
mainfrom
fix/skilify-dedup-and-bugs
May 12, 2026
Merged

fix(skilify): plug the three holes that produced cross-author duplicate skills#119
efenocchi merged 6 commits into
mainfrom
fix/skilify-dedup-and-bugs

Conversation

@efenocchi
Copy link
Copy Markdown
Collaborator

@efenocchi efenocchi commented May 11, 2026

Summary

Three tactical fixes for the cross-author and cross-project duplicate skills problem on the skills Deeplake table. Each commit is independent and focused; landing all three closes the gap that produced the duplicates we currently have in prod (e.g. meta-harness-continual-learningcontinual-learning-meta-harness, the three near-identical hivemind-*-testing skills).

What was actually broken

  1. created_at was reset on every MERGE. All 27 rows in the prod skills table have created_at == updated_at because the worker stamped now() on both fields at every INSERT. The local SKILL.md frontmatter already preserved the v=1 creation date correctly — only the DB row didn't see it.

  2. deriveProjectKey hashed the raw git remote URL. Five surface forms of the same repo (git@..., https://..., with/without .git, with embedded credentials) produced five different project_key values. Two devs cloning the same repo with different URL styles ended up with disjoint state.

  3. The gate only read <cwd>/.claude/skills/, never ~/.claude/skills/. Autopull lands skills under the global root regardless of scope-config; the worker only looked at the project root. With the default install=project, the gate saw zero existing skills no matter how many had been autopulled — so every mining run started from a blank slate and re-created near-identical skills.

Commits

  • fix(skilify): preserve created_at across MERGE in skills table — thread the v=1 created_at through SkillWriteResult instead of stamping now().
  • fix(skilify): normalize git remote URL before hashing for projectKey — collapse SSH/HTTPS/.git/cred variants to a canonical host/owner/repo form.
  • fix(skilify): gate reads project + global skills, only [project] mergeable — extract the existing-skills logic into a testable module, read both roots, tag entries [project] (MERGE-eligible) or [global, read-only] (reference only). Cross-author MERGE intentionally stays forbidden in this PR.

Out of scope (filed separately)

  • Cross-author MERGE with a contributors column and auto-promote me → team on cross-author edits — see skills table: add contributors column for team/org scope edits #118. The interim restriction in this PR (MERGE only into your own project skills) avoids silently overwriting teammates' work until that lands.
  • Compaction of old version rows (e.g. 7 skills-autopull-fanout-symlinks versions) — purely a storage-bloat concern, not a correctness one.

Test plan

  • npm run build clean
  • npm test — 2189 passing (was 2175 before; +14 from new tests)
  • New tests: skilify-state.test.ts (URL normalization, 4 cases), skilify-skill-writer.test.ts (createdAt threading through SkillWriteResult), skilify-existing-skills.test.ts (7 cases: both roots, dedupe project-wins, source tagging, char-cap placeholder)
  • Bundle-scan tests updated for new gate-prompt heading
  • Manual smoke: run a session in a project with autopulled global skills, confirm the gate prompt now mentions them as [global, read-only]

Summary by CodeRabbit

Release Notes

  • New Features

    • Skills now include creation and modification timestamps for better tracking.
    • Enhanced existing skills display showing both project and global read-only skills with clear source labels.
  • Improvements

    • Clarified merge eligibility rules in skill curation prompts—only project skills are valid merge targets; global skills are reference-only references.
    • Improved project identification through consistent normalization of Git remote URLs.

Review Change Stack

efenocchi added 3 commits May 11, 2026 22:53
The worker stamped `new Date().toISOString()` on both createdAt and
updatedAt at every INSERT to the skills table — so after MERGE, the
v+1 row reported the merge timestamp as creation date, losing the
original creation date entirely. Across the 27 rows in prod today,
every single one has created_at == updated_at.

The local SKILL.md frontmatter already preserved created_at
correctly across merges (skill-writer.ts mergeSkill inherits the
v=1 timestamp). This change threads that value through
SkillWriteResult so the worker can pass it to insertSkillRow
instead of stamping a fresh now().

Tests assert that mergeSkill returns the v=1 createdAt unchanged
and a fresh updatedAt, matching the frontmatter behavior.
Two developers cloning the same repo with different URL styles
(SSH vs HTTPS, with/without .git suffix, with embedded credentials)
were getting different project_key values for the same project,
because deriveProjectKey just sha1'd the raw output of
`git config --get remote.origin.url`.

Concrete cost: 5 distinct keys for the same repo across realistic
clone variants — which means the dedup gate, the per-project state
file, and the skills table all treated the same logical project as
multiple separate ones. Cross-cloner skill dedup didn't work at all.

normalizeGitRemoteUrl now collapses all common surface forms down
to `host/owner/repo` (lowercased) before hashing:
  - strip URL scheme (https://, git://, ssh://, …)
  - convert SCP-style git@host:path → host/path
  - drop user@ / user:pass@ credentials
  - drop trailing .git and trailing slashes
  - lowercase the result

Tests exercise the full set of clone styles seen in the wild plus
a fallback case for non-URL inputs (the cwd-hash path still works
when the cwd isn't a git repo).
…eable

The autopull lands pulled skills under ~/.claude/skills/<name>--<author>/
regardless of scope-config, while the worker was reading only
<cwd>/.claude/skills/. With the default install=project setting (no
~/.deeplake/state/skilify/config.json present), the gate saw zero
existing skills no matter how many had been autopulled — root cause of
every cross-author duplicate currently in the table.

This change moves the existing-skills enumeration into a dedicated
testable module that reads from BOTH roots, dedupes by name (project
wins), and tags each entry with its source so the gate prompt can
distinguish them. Skills are now rendered as either:

  --- existing skill [project]: <name> ---
  --- existing skill [global, read-only]: <name> ---

The prompt restricts MERGE targets to [project] only. Globally-pulled
skills are reference-only so the gate can avoid duplicating work
already covered there, but cannot silently overwrite teammates' rows.
A full "edit teammates' skills" model (contributors column,
auto-promote scope on cross-author MERGE) is tracked separately in
#118.

The refactor is mechanical — the prompt-rendering logic moved verbatim
into src/skilify/existing-skills.ts. New unit tests cover:
  - both roots scanned, source tagged correctly
  - same name in both roots: project wins
  - MERGE target list excludes [global] entries
  - char-cap path produces the omitted-count placeholder

Bundle-scan test updated to look for the new gate-prompt heading.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 11, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 7ed24656-8887-4dc6-8d85-6858ce1bc17c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds Git remote URL normalization for stable project keys, introduces a new existing-skills management system that aggregates project and global skills with separate merge eligibility, and extends skill metadata to include creation and update timestamps that propagate through the persistence pipeline.

Changes

Skilify System Enhancement: Git Normalization, Existing Skills, and Metadata Timestamps

Layer / File(s) Summary
Git Remote URL Normalization
src/skilify/state.ts
New normalizeGitRemoteUrl(url) helper canonicalizes git remote URLs by stripping schemes, converting SCP-style syntax, removing .git and trailing slashes, and lowercasing. deriveProjectKey() now uses the normalized remote URL for project key hashing instead of the raw URL.
Existing Skills Types and Functions
src/skilify/existing-skills.ts
New module with TaggedSkill and ExistingSkillsBlock interfaces. listAllExistingSkills(cwd) enumerates skills from both project (.claude/skills) and global (~/.claude/skills) roots, tags each by origin, and deduplicates by name with project taking precedence. renderExistingSkillsBlock(cwd, charCap) renders a character-capped prompt block labeling skills as [project] or [global, read-only] and returns merge-eligible project skill names.
Skill Writer Timestamps
src/skilify/skill-writer.ts
SkillWriteResult interface extended with createdAt and updatedAt fields. writeNewSkill() and mergeSkill() now extract and return these timestamps from the written SKILL.md frontmatter instead of relying on caller-supplied values.
Skilify Worker Integration
src/skilify/skilify-worker.ts
Imports renderExistingSkillsBlock from the new existing-skills module. buildPrompt() now calls this function to generate the "EXISTING SKILLS" section, restricts MERGE targets to project skills via mergeTargetNames, and explicitly marks global skills as read-only reference-only. recordToDeeplake() updated to use skill-writer-provided createdAt/updatedAt instead of generating fresh timestamps at insert time.
Tests
claude-code/tests/skilify-*.test.ts, codex/tests/...
New skilify-existing-skills.test.ts suite covering enumeration, deduplication, merge-eligibility restriction, and character-cap rendering. Updated skilify-skill-writer.test.ts to assert timestamp fields. Updated skilify-state.test.ts with normalizeGitRemoteUrl tests covering URL variant canonicalization and case-insensitive normalization. Updated bundle scan tests for new "EXISTING SKILLS" heading and "[project] are MERGE-eligible" marker.
Bundled Agent Code
claude-code/bundle/*, codex/bundle/*, cursor/bundle/*, hermes/bundle/*, pi/bundle/*
All five agent bundles updated to include the normalizeGitRemoteUrl() helper in their bundled capture.js, session-end.js, and stop.js; bundled skilify-worker.js updated with existing-skills aggregation logic and timestamp propagation.

Sequence Diagram(s)

sequenceDiagram
  participant Agent as Skilify Agent
  participant Writer as Skill Writer
  participant Existing as Existing Skills
  participant Prompt as Prompt Builder
  participant Deeplake as Deeplake Table
  Agent->>Existing: listAllExistingSkills(cwd)
  Existing->>Existing: load project + global skills
  Existing->>Existing: deduplicate, tag by source
  Existing-->>Agent: [TaggedSkill[], mergeTargetNames]
  Agent->>Prompt: buildPrompt(pairs, existing)
  Prompt->>Prompt: render "EXISTING SKILLS" block
  Prompt->>Prompt: restrict MERGE to mergeTargetNames
  Prompt-->>Agent: curated_prompt
  Agent->>Writer: writeNewSkill() or mergeSkill()
  Writer->>Writer: extract createdAt, updatedAt from frontmatter
  Writer-->>Agent: {path, action, version, createdAt, updatedAt}
  Agent->>Deeplake: insertSkillRow(..., createdAt, updatedAt)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

  • activeloopai/hivemind#98: Both modify deriveProjectKey() in skilify/state.ts by adding the normalizeGitRemoteUrl() helper for stable project key derivation.
  • activeloopai/hivemind#116: Related at the code level—both touch the same skilify/state.ts module and project-key derivation logic that PR #116 renames and augments.

Suggested reviewers

  • kaghni

Poem

🐰 A rabbit hops through Git remotes so fine,
Normalizing URLs in a single line,
Global skills read-only, project skills bold,
Timestamps preserved as stories unfold,
Skills that merge true, in bundles they shine!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title directly summarizes the main change: fixing three specific bugs (cross-author duplicate skills) with clear specificity about the problem being addressed.
Description check ✅ Passed The description includes all required template sections: a comprehensive summary explaining what was broken, version bump instructions, and a detailed test plan with all checkboxes completed except the manual smoke test (which is pending as expected).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/skilify-dedup-and-bugs

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude Bot commented May 11, 2026

Claude encountered an error —— View job


I'll analyze this and get back to you.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 11, 2026

Coverage Report

Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via vitest.config.ts).

Status Category Percentage Covered / Total
🟢 Lines 95.19% (🎯 90%) 198 / 208
🟢 Statements 93.67% (🎯 90%) 222 / 237
🟢 Functions 100.00% (🎯 90%) 34 / 34
🔴 Branches 86.09% (🎯 90%) 99 / 115
File Coverage — 4 files changed
File Stmts Branches Functions Lines
src/skillify/existing-skills.ts 🟢 100.0% 🟢 100.0% 🟢 100.0% 🟢 100.0%
src/skillify/skill-writer.ts 🟢 98.9% 🔴 88.3% 🟢 100.0% 🟢 100.0%
src/skillify/skillify-worker.ts
src/skillify/state.ts 🔴 87.7% 🔴 80.8% 🟢 100.0% 🔴 89.8%

Generated for commit 310d60d.

@coderabbitai coderabbitai Bot requested a review from kaghni May 11, 2026 23:03
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
claude-code/tests/skilify-state.test.ts (1)

70-84: ⚡ Quick win

Add regression cases for explicit default ports.

The canonicalization suite currently misses ssh://...:22/... and https://...:443/... variants, which are common enough to reintroduce project-key fragmentation if behavior regresses.

✅ Suggested test additions
   it("collapses all common git URL forms to one canonical string", () => {
     const variants = [
       "git@github.com:activeloopai/hivemind.git",
       "git@github.com:activeloopai/hivemind",
       "https://github.com/activeloopai/hivemind.git",
       "https://github.com/activeloopai/hivemind",
+      "https://github.com:443/activeloopai/hivemind.git",
       "https://github.com/activeloopai/hivemind/",
       "https://emanuele@github.com/activeloopai/hivemind.git",
       "https://emanuele:secret@github.com/activeloopai/hivemind.git",
       "ssh://git@github.com/activeloopai/hivemind.git",
+      "ssh://git@github.com:22/activeloopai/hivemind.git",
     ];
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@claude-code/tests/skilify-state.test.ts` around lines 70 - 84, Add regression
cases for explicit default ports to the canonicalization test: update the
variants array in skilify-state.test.ts (the test that calls
normalizeGitRemoteUrl) to include SSH with :22 (e.g.,
"ssh://git@github.com:22/activeloopai/hivemind.git" and
"git@github.com:22:activeloopai/hivemind.git" or equivalent scp-style if
supported) and HTTPS with :443 (e.g.,
"https://github.com:443/activeloopai/hivemind.git" and credentialed forms like
"https://user:pass@github.com:443/activeloopai/hivemind.git"), then assert they
normalize to the same canonical "github.com/activeloopai/hivemind".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@claude-code/bundle/session-end.js`:
- Around line 356-369: The normalizeGitRemoteUrl function currently preserves
explicit default ports (e.g. :22, :443) so equivalent remotes normalize to
different strings; update normalizeGitRemoteUrl to detect and strip default
ports for common schemes (SSH -> :22, HTTPS -> :443, HTTP -> :80) before
removing .git and trailing slashes. Specifically, when parsing the input in
normalizeGitRemoteUrl, capture the scheme (if any) and host:port, and if the
port equals the scheme's default, remove the :port fragment (while keeping
non-default ports intact); ensure this logic works both for normal URL forms and
for scp-style remotes handled by the existing scp branch so projectKey
generation becomes stable.

In `@src/skilify/state.ts`:
- Around line 75-102: normalizeGitRemoteUrl currently leaves explicit default
ports (e.g., :22 or :443) in the host path which causes duplicate project keys;
update normalizeGitRemoteUrl to strip default SSH and HTTPS ports from the host
portion (remove trailing :22 and :443 when they appear right after the hostname
and before a slash or end) while preserving non-default ports, so
deriveProjectKey sees canonical signatures; modify the hostname normalization
logic inside normalizeGitRemoteUrl (the code handling scp/scheme stripping and
host/path assembly) to run a regex that removes :22 and :443 only for
default-scheme cases and not arbitrary numeric suffixes.

---

Nitpick comments:
In `@claude-code/tests/skilify-state.test.ts`:
- Around line 70-84: Add regression cases for explicit default ports to the
canonicalization test: update the variants array in skilify-state.test.ts (the
test that calls normalizeGitRemoteUrl) to include SSH with :22 (e.g.,
"ssh://git@github.com:22/activeloopai/hivemind.git" and
"git@github.com:22:activeloopai/hivemind.git" or equivalent scp-style if
supported) and HTTPS with :443 (e.g.,
"https://github.com:443/activeloopai/hivemind.git" and credentialed forms like
"https://user:pass@github.com:443/activeloopai/hivemind.git"), then assert they
normalize to the same canonical "github.com/activeloopai/hivemind".
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8941a04d-60c0-419a-8972-2f20c45b9c5f

📥 Commits

Reviewing files that changed from the base of the PR and between ed7634b and f875f15.

📒 Files selected for processing (20)
  • claude-code/bundle/capture.js
  • claude-code/bundle/session-end.js
  • claude-code/bundle/skilify-worker.js
  • claude-code/tests/skilify-bundle-scan.test.ts
  • claude-code/tests/skilify-existing-skills.test.ts
  • claude-code/tests/skilify-skill-writer.test.ts
  • claude-code/tests/skilify-state.test.ts
  • codex/bundle/skilify-worker.js
  • codex/bundle/stop.js
  • cursor/bundle/capture.js
  • cursor/bundle/session-end.js
  • cursor/bundle/skilify-worker.js
  • hermes/bundle/capture.js
  • hermes/bundle/session-end.js
  • hermes/bundle/skilify-worker.js
  • pi/bundle/skilify-worker.js
  • src/skilify/existing-skills.ts
  • src/skilify/skilify-worker.ts
  • src/skilify/skill-writer.ts
  • src/skilify/state.ts

Comment thread claude-code/bundle/session-end.js
Comment thread src/skillify/state.ts Outdated
efenocchi added 2 commits May 12, 2026 00:02
…-bugs

# Conflicts:
#	claude-code/bundle/skillify-worker.js
#	claude-code/tests/skillify-bundle-scan.test.ts
#	codex/bundle/skillify-worker.js
#	cursor/bundle/skillify-worker.js
#	hermes/bundle/skillify-worker.js
#	pi/bundle/skilify-worker.js
#	src/skillify/existing-skills.ts
Equivalent remotes were still producing different project_keys when the
URL carried the scheme's default port explicitly:

  https://github.com:443/org/repo
  ssh://git@github.com:22/org/repo
  git://github.com:9418/org/repo
  http://github.com:80/org/repo

vs the same URLs without the explicit port. Rare in hand-typed config
but realistic for automation / hosting that emits canonical URLs with
ports, so worth folding in.

Non-default ports (e.g. :8443 for an internal git server) stay — they
identify the remote and must not be stripped.
Comment thread src/skillify/state.ts
return join(STATE_DIR, `${projectKey}.lock`);
}

/**
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have this?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where do we use git for skills and why

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deriveProjectKey produces a stable per-project identifier used in three places:

  1. skills table partition key — every row carries project_key. Combined with name it's the dedup key the gate uses to decide KEEP vs MERGE, and what pull.ts uses to namespace skills cross-project (<root>/<project_key>/... in the legacy layout, now flattened to <name>--<author>/).
  2. Per-project state file~/.deeplake/state/skillify/<project_key>.json (Stop counter, watermark UUID, last mined date). Two projects mustn't share this file.
  3. Autopull manifestpulled.json records each pulled skill with its project_key so unpull knows which entries belong to which project.

Why git remote URL specifically as the input to the hash: it's the most reliable cross-machine identifier of "the same logical project". A repo cloned by Alice into /Users/alice/work/foo and by me into /home/emanuele/code/foo shares git remote.origin.url — so both their skills get the same project_key, the dedup gate can reason about them as one project, and pulling Alice's skill lands it in my namespace correctly.

If we hashed the absolute cwd instead, every clone of the same repo on every machine would have a different key → duplicate skills across cloners, no cross-cloner dedup. The cwd is only a fallback when there's no git remote (e.g. mining in a non-repo working dir like /home/ec2-user) — and that fallback is exactly what produced some of the duplicates we cleaned up in this PR (the three hivemind-*-testing skills emanuele had: one project cwd had git remote, one didn't, so they hashed differently).

The fix in this PR (normalizeGitRemoteUrl) makes the git-URL path actually deliver on that promise — before the fix, SSH vs HTTPS vs .git-suffix variants of the same URL were producing 5 different hashes for the same repo (test cases in claude-code/tests/skillify-state.test.ts).

@kaghni
Copy link
Copy Markdown
Collaborator

kaghni commented May 12, 2026

openclaw is missing

openclaw was the only agent ever excluded from the per-agent
bundle-scan iteration, even though its `dist/skillify-worker.js`
ships exactly the same compiled worker code as the others. That
left a real gap: if a build regression dropped the gate prompt or
the migration helper specifically from openclaw's bundle (esbuild
config drift, cache staleness), no test would have caught it.

Adds "openclaw" to AGENTS and maps it to `dist/` via a small
special-case in `bundlePath` (openclaw is a separate npm sub-package
whose build output is conventionally `dist/`, gitignored, and
regenerated by `npm run build`). Same assertions now apply:

- skillify-worker bundle ships, contains EXISTING SKILLS heading
  and the [project] are MERGE-eligible clause
- UPDATE-on-skills anti-pattern check
- legacy state-dir migration helper present and called from readState

The dedicated openclaw test at the bottom (inlined migration helper
inside index.js) stays — that helper is genuinely a different shape
(`migrateOpenclawSkillifyLegacyStateDir`, not the shared
`migrateLegacyStateDir`) and lives in a different bundle.

Test count: 19 -> 21.
@efenocchi
Copy link
Copy Markdown
Collaborator Author

Good catch — addressed in 529a26f (pushed right before your comment landed). openclaw is now in the AGENTS const of skillify-bundle-scan.test.ts, with a small special-case in bundlePath() to point at openclaw/dist/ instead of openclaw/bundle/ (openclaw is a separate npm sub-package whose build output is conventionally dist/, gitignored, regenerated by npm run build). All three describe-loops that iterate over AGENTS (worker bundle shipped, UPDATE-on-skills anti-pattern, legacy state-dir migration) now cover openclaw too. Test count: 19 → 21.

The dedicated test on openclaw/dist/index.js (the inlined migrateOpenclawSkillifyLegacyStateDir helper) stays separate — it's checking a different bundle with a different helper shape.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants