chore(skills): refresh from context-hub (HTML scrapes → clean markdown) #430

Open
kelsonpw wants to merge 9 commits into main from kelsonpw/skills-html-strip

kelsonpw (Collaborator) commented Apr 30, 2026

Summary

Refreshes wizard's bundled skills/ from context-hub origin/main + amplitude/context-hub#59 (the HTML→markdown conversion in the build pipeline). Skill payloads sent to the agent on every run drop substantially — reference files are no longer raw HTML scrapes of amplitude.com/docs.

Why this exists

Per-skill references like `references/browser-sdk-2.md` were full HTML pages: `<head>` meta tags, cookie consent JavaScript, top/side nav, inline SVG logos with raw path data, and the "Copy as Markdown" widget. Roughly 80% of each file was page chrome the agent had to wade through before reaching the SDK reference content (init signatures, options, examples). The wizard paid for that in first-turn latency on every skill load.

The HTML stripping happens at the source in context-hub#59 (cheerio + turndown in the build pipeline). This wizard PR consumes the resulting cleaner skills.

Impact (measured on integration-nextjs-app-router)

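| file                       | before | after | reduction |
|----------------------------|--------|-------|-----------|
| `amplitude-quickstart.md`  | 1844   | 209   | 8.8x      |
| `browser-sdk-2.md`         | 4726   | 1834  | 2.6x      |
| `browser-unified-sdk.md`   | 2211   | 322   | 6.9x      |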

Across all 32 integration skills + 5 instrumentation skills + 2 taxonomy skills, the refresh is 241K lines removed / 45K added. No skill directories added or removed — just regenerated content. Critical SDK content (init signatures, fenced code blocks, option tables) verified intact in spot-checks.
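
The reduction column follows directly from the before/after figures. A minimal sketch of that arithmetic, assuming the counts can be treated as plain numbers (the table does not state their unit) and using a hypothetical `reduction` helper that is not part of the wizard:

```typescript
// Before/after figures copied from the table above.
const measured = [
  { file: "amplitude-quickstart.md", before: 1844, after: 209 },
  { file: "browser-sdk-2.md", before: 4726, after: 1834 },
  { file: "browser-unified-sdk.md", before: 2211, after: 322 },
];

// Hypothetical helper: size ratio, formatted the way the table shows it.
function reduction(before: number, after: number): string {
  return `${(before / after).toFixed(1)}x`;
}

for (const { file, before, after } of measured) {
  console.log(`${file}: ${reduction(before, after)}`);
}
```

The three ratios come out as 8.8x, 2.6x, and 6.9x, matching the table.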

Source

context-hub commit base: 435c0f2 (origin/main, 2026-04-27) + the HTML→markdown converter from amplitude/context-hub#59. Once #59 merges and a release ships, this state will be reproducible via pnpm skills:refresh. Until then, this is an out-of-band refresh — flagging that explicitly in the commit message.

Test plan

  • pnpm test — 2466 tests pass / 14 skipped (no regressions from skill content changes)
  • pnpm lint clean
  • Manual: re-run wizard against the nextjs-app-router test app and verify perceived first-turn latency improves; confirm the agent still produces correct SDK init code and event tracking

Related

cc @amplitude/growth

🤖 Generated with Claude Code


Note

Low Risk
Primarily documentation/reference regeneration and link/path adjustments; low runtime risk, with the main risk being broken references or missing doc content if the conversion output is incomplete.

Overview
Refreshes bundled skills/ content to use cleaned, markdown-only documentation instead of raw amplitude.com/docs HTML scrapes, substantially shrinking reference payloads (e.g., integration-android docs like amplitude-quickstart.md and android.md).

Updates several integration skill SKILL.md files to point workflow steps at references/* paths (instead of root-relative filenames), and adjusts instrumentation best-practices guidance to explicitly treat Browser SDK Autocapture as a fixed excluded event set (with a canonical source to keep in sync).

Reviewed by Cursor Bugbot for commit da82fb5. Bugbot is set up for automated code reviews on this repo.

Refreshes wizard's bundled `skills/` from context-hub origin/main +
amplitude/context-hub#59 (the HTML→markdown conversion in the build
pipeline). The skill payloads sent to the agent on every run drop
substantially because reference files are no longer raw HTML scrapes
of amplitude.com/docs.

## Why

Per-skill references like `references/browser-sdk-2.md` were full
HTML pages including `<head>` meta tags, cookie consent JavaScript,
top/side nav, inline SVG logos with raw path data, and the
"Copy as Markdown" widget. ~80% of those files were page chrome
the agent had to swim through before reaching the SDK reference
content (init signatures, options, examples). First-turn latency
in the wizard was paying for that on every skill load.

## Impact (measured on integration-nextjs-app-router)

| file                       | before | after | reduction |
|----------------------------|--------|-------|-----------|
| `amplitude-quickstart.md`  | 1844   | 209   | 8.8x      |
| `browser-sdk-2.md`         | 4726   | 1834  | 2.6x      |
| `browser-unified-sdk.md`   | 2211   | 322   | 6.9x      |

Across all 32 integration skills + 5 instrumentation skills + 2
taxonomy skills, the refresh is **241K lines removed / 45K
added**. No skill directories added or removed, only regenerated
content. Critical SDK content (init signatures, fenced code blocks,
option tables) verified intact in spot-checks.

## Source

context-hub commit base: `435c0f2` (origin/main, 2026-04-27) + the
HTML→markdown converter from amplitude/context-hub#59. Once #59
merges and a release ships, this state is reproducible via
`pnpm skills:refresh`.

## Test plan

- [x] `pnpm test` — 2466 tests pass / 14 skipped (no regressions)
- [x] `pnpm lint` clean
- [ ] Manual: re-run wizard against the nextjs-app-router test app
  and verify perceived first-turn latency improves; verify the
  agent still produces correct SDK init code and event tracking

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kelsonpw requested a review from a team on April 30, 2026 03:54
@github-actions
Contributor

🧙 Wizard CI

Run the Wizard CI and test your changes against wizard-workbench example apps by replying with a GitHub comment using one of the following commands:

Test all apps:

  • /wizard-ci all

Test all apps in a directory:

  • /wizard-ci django
  • /wizard-ci fastapi
  • /wizard-ci flask
  • /wizard-ci javascript-node
  • /wizard-ci javascript-web
  • /wizard-ci next-js
  • /wizard-ci python
  • /wizard-ci react-router
  • /wizard-ci vue

Test an individual app:

  • /wizard-ci django/django3-saas
  • /wizard-ci fastapi/fastapi3-ai-saas
  • /wizard-ci flask/flask3-social-media
  • /wizard-ci javascript-node/express-todo
  • /wizard-ci javascript-node/fastify-blog
  • /wizard-ci javascript-node/hono-links
  • /wizard-ci javascript-node/koa-notes
  • /wizard-ci javascript-node/native-http-contacts
  • /wizard-ci javascript-web/saas-dashboard
  • /wizard-ci next-js/15-app-router-saas
  • /wizard-ci next-js/15-app-router-todo
  • /wizard-ci next-js/15-pages-router-saas
  • /wizard-ci next-js/15-pages-router-todo
  • /wizard-ci python/meeting-summarizer
  • /wizard-ci react-router/react-router-v7-project
  • /wizard-ci react-router/rrv7-starter
  • /wizard-ci react-router/saas-template
  • /wizard-ci react-router/shopper
  • /wizard-ci vue/movies

Results will be posted here when complete.

Contributor

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Broken relative path to taxonomy SKILL.md
    • Restored the correct relative path ../../taxonomy/amplitude-quickstart-taxonomy-agent/SKILL.md in all three instrumentation skill files, replacing the broken ../taxonomy/SKILL.md that resolved to a non-existent location.

Push these changes by commenting:

@cursor push a8ab29aad3
Preview (a8ab29aad3)
diff --git a/skills/instrumentation/discover-analytics-patterns/SKILL.md b/skills/instrumentation/discover-analytics-patterns/SKILL.md
--- a/skills/instrumentation/discover-analytics-patterns/SKILL.md
+++ b/skills/instrumentation/discover-analytics-patterns/SKILL.md
@@ -25,7 +25,7 @@
 When determining naming conventions in this skill, use the following sources in strict order of preference:
 1. Events and properties observed from the Amplitude MCP server
 2. Real tracking call sites in the codebase
-3. The `taxonomy` skill at `../taxonomy/SKILL.md`
+3. The `taxonomy` skill at `../../taxonomy/amplitude-quickstart-taxonomy-agent/SKILL.md`
 
 ---
 

diff --git a/skills/instrumentation/discover-event-surfaces/SKILL.md b/skills/instrumentation/discover-event-surfaces/SKILL.md
--- a/skills/instrumentation/discover-event-surfaces/SKILL.md
+++ b/skills/instrumentation/discover-event-surfaces/SKILL.md
@@ -23,7 +23,7 @@
 that mirror implementation details. Aim for **breadth and quality** — a
 downstream skill will narrow the list.
 
-Read the `taxonomy` skill at `../taxonomy/SKILL.md` to understand core
+Read the `taxonomy` skill at `../../taxonomy/amplitude-quickstart-taxonomy-agent/SKILL.md` to understand core
 analytics philosophy and naming standards.
 
 ---

diff --git a/skills/instrumentation/instrument-events/SKILL.md b/skills/instrumentation/instrument-events/SKILL.md
--- a/skills/instrumentation/instrument-events/SKILL.md
+++ b/skills/instrumentation/instrument-events/SKILL.md
@@ -24,7 +24,7 @@
 with existing patterns, minimal footprint, and properties that actually power
 dashboards — not vanity fields nobody queries.
 
-Read the `taxonomy` skill at `../taxonomy/SKILL.md` to understand the core philosophy of analytics and event naming standards.
+Read the `taxonomy` skill at `../../taxonomy/amplitude-quickstart-taxonomy-agent/SKILL.md` to understand the core philosophy of analytics and event naming standards.
 
 ---
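
Why the old link failed can be checked by resolving it against the directory of the file that contains it. A minimal sketch, assuming POSIX-style repo paths; `resolveLink` is a hypothetical helper written for illustration, not code from the wizard (in Node, `path.posix.join` on the file's directory does the same job):

```typescript
// Hypothetical helper: resolve a relative link against the file that contains it.
function resolveLink(fromFile: string, link: string): string {
  const parts = fromFile.split("/");
  parts.pop(); // drop the filename, keep its directory
  for (const segment of link.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") parts.pop();
    else parts.push(segment);
  }
  return parts.join("/");
}

const skill = "skills/instrumentation/discover-analytics-patterns/SKILL.md";

// Old link: stays inside skills/instrumentation/.
console.log(resolveLink(skill, "../taxonomy/SKILL.md"));
// New link: climbs to skills/ before descending into taxonomy/.
console.log(
  resolveLink(skill, "../../taxonomy/amplitude-quickstart-taxonomy-agent/SKILL.md"),
);
```

The old link resolves to `skills/instrumentation/taxonomy/SKILL.md`, the location Bugbot reports as non-existent, while the new one resolves to `skills/taxonomy/amplitude-quickstart-taxonomy-agent/SKILL.md`.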

Comment thread skills/instrumentation/discover-analytics-patterns/SKILL.md Outdated
cursoragent and others added 2 commits April 30, 2026 03:56
)

The IntroScreen's "Change directory" picker freezes after the user
submits a new path. `store.changeInstallDir(newDir)` resets
`detectionComplete = false` (so the spinner reappears) and then
fires the registered redetector callback — but in production the
redetector was never registered, so the spinner spun forever.

Same gap meant `store.autoEnableInlineAddons('auto-tui')` never
fired in TUI mode. It's only called inside `runFrameworkDetection`,
the helper extracted from bin.ts for re-runnability — and that
helper had no production caller.

`default.ts`'s detection task ran inline as a one-shot IIFE that
captured `installDir` from closure and never offered cancellation
or re-run. This commit replaces the inline version with the shared
`runFrameworkDetection` helper and wires up both:

  - `setFrameworkRedetector(...)` — closes the IntroScreen loop so
    submitting a new path actually re-runs detection against it.
  - `registerActiveDetection(controller)` — lets a directory swap
    that lands mid-detection cancel the in-flight run instead of
    leaving a stale `setDetectionComplete()` to fire after the
    state reset.

Removes ~83 lines of duplicated detection logic from default.ts
and drops the `FRAMEWORK_REGISTRY` / `detectAllFrameworks` /
`DETECTION_TIMEOUT_MS` dynamic imports that fed it.

Existing store tests at `src/ui/tui/__tests__/store.test.ts` already
cover the `changeInstallDir` + redetector contract end-to-end
(lines 1717-1803). All 2469 tests pass.
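
The contract described above can be sketched in isolation. This is a simplified stand-in and not the wizard's actual store: `SketchStore`, `DetectionController`, `startDetection`, and `detectedDir` are hypothetical, and only the method names `changeInstallDir`, `setFrameworkRedetector`, `registerActiveDetection`, and `setDetectionComplete` are taken from the commit message.

```typescript
// Hypothetical cancellation handle for one detection run.
class DetectionController {
  cancelled = false;
  cancel(): void {
    this.cancelled = true;
  }
}

// Hypothetical, heavily simplified stand-in for the TUI store.
class SketchStore {
  installDir: string;
  detectedDir: string | null = null;
  detectionComplete = false;
  private redetector: ((dir: string) => void) | null = null;
  private active: DetectionController | null = null;

  constructor(installDir: string) {
    this.installDir = installDir;
  }
  setFrameworkRedetector(fn: (dir: string) => void): void {
    this.redetector = fn;
  }
  registerActiveDetection(controller: DetectionController): void {
    this.active = controller;
  }
  setDetectionComplete(): void {
    this.detectionComplete = true;
  }
  changeInstallDir(newDir: string): void {
    this.active?.cancel(); // cancel any in-flight run
    this.active = null;
    this.installDir = newDir;
    this.detectionComplete = false; // the spinner reappears
    this.redetector?.(newDir); // if nothing is registered, the spinner never clears
  }
}

// Starts a run and returns the callback that fires when it finishes.
function startDetection(store: SketchStore, dir: string): () => void {
  const controller = new DetectionController();
  store.registerActiveDetection(controller);
  return () => {
    if (controller.cancelled) return; // stale run: leave the reset state alone
    store.detectedDir = dir;
    store.setDetectionComplete();
  };
}

const store = new SketchStore("/app-a");
const finishFirst = startDetection(store, store.installDir);
let finishSecond: () => void = () => {};
store.setFrameworkRedetector((dir) => {
  finishSecond = startDetection(store, dir);
});

store.changeInstallDir("/app-b"); // cancels the first run, starts a second
finishFirst();
console.log(store.detectionComplete); // false: the stale completion was ignored
finishSecond();
console.log(store.detectionComplete); // true: detection ran against /app-b
```

Without the `setFrameworkRedetector` call, `changeInstallDir` would reset `detectionComplete` and nothing would ever set it again, which is the frozen-spinner symptom the commit describes.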

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kelsonpw
Collaborator Author

@cursor push a8ab29a

Contributor

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Android reference contains only navigation chrome, no documentation
    • Removed 33 lines of site-wide navigation links and breadcrumb chrome that the HTML-to-markdown conversion failed to strip, leaving only the actual SDK catalog content.

Push these changes by commenting:

@cursor push 8d69f87ff3
Preview (8d69f87ff3)
diff --git a/skills/integration/integration-android/references/android.md b/skills/integration/integration-android/references/android.md
--- a/skills/integration/integration-android/references/android.md
+++ b/skills/integration/integration-android/references/android.md
@@ -1,40 +1,6 @@
-[documentation](/docs)
-
-[Get Started](/docs/get-started)
-
-[Data](/docs/data)
-
-[Analytics](/docs/analytics)
-
-[Amplitude AI](/docs/amplitude-ai)
-
-[Session Replay](/docs/session-replay)
-
-[Guides and Surveys](/docs/guides-and-surveys)
-
-[AI Assistant](/docs/assistant)
-
-[Experiment](/docs/experiment-home)
-
-[Admin](/docs/admin)
-
-[Partners](/docs/partners)
-
-[FAQ](/docs/faq)
-
-[SDKs](/docs/sdks)
-
-/
-
-[Amplitude Analytics SDK Catalog](/docs/sdks/analytics)
-
-/
-
-[Android](/docs/sdks/analytics/android)
-
 # Android
 
-`current`  [## ![](/docs/assets/icons/java.svg) ![](/docs/assets/icons/kotlin.svg) Android-Kotlin SDK](/docs/sdks/analytics/android/android-kotlin-sdk)
+`current` [Android-Kotlin SDK](/docs/sdks/analytics/android/android-kotlin-sdk)
 
 [![](https://img.shields.io/maven-central/v/com.amplitude/analytics-android.svg?label=Maven%20Central)](https://mvnrepository.com/artifact/com.amplitude/analytics-android)
 
@@ -42,10 +8,10 @@
 -   [Releases](https://github.com/amplitude/Amplitude-Kotlin/releases)
 -   [Ampli](/docs/sdks/analytics/android/ampli-for-android-kotlin-sdk)
 
-`maintenance`  [## ![](/docs/assets/icons/java.svg) ![](/docs/assets/icons/kotlin.svg) Android SDK](/docs/sdks/analytics/android/android-sdk)
+`maintenance` [Android SDK](/docs/sdks/analytics/android/android-sdk)
 
 [![](https://img.shields.io/maven-central/v/com.amplitude/android-sdk.svg?label=Maven%20Central&versionPrefix=2)](https://mvnrepository.com/artifact/com.amplitude/android-sdk)
 
 -   [GitHub](https://github.com/amplitude/Amplitude-Android)
 -   [Releases](https://github.com/amplitude/Amplitude-Android/releases)
\ No newline at end of file
--   [Migrate to the Android-Kotlin SDK](/docs/sdks/analytics/android/migrate-to-the-android-kotlin-sdk)
+-   [Migrate to the Android-Kotlin SDK](/docs/sdks/analytics/android/migrate-to-the-android-kotlin-sdk)

Reviewed by Cursor Bugbot for commit b3e688d.

<div class="w-full pt-5 pl-5">
<div class="box-border">

[FAQ](/docs/faq)
Contributor

Android reference contains only navigation chrome, no documentation

Medium Severity

The HTML-to-markdown conversion for android.md failed to strip site-wide navigation chrome. Lines 1–23 are top-nav links ([Get Started], [Data], [Analytics], etc.) that provide zero value to the agent. The remaining content (lines 25–52) is just an SDK catalog index with GitHub links — no actual SDK initialization, configuration, or API reference content survives the conversion.
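
A guard against this kind of leftover could look roughly like the following. It is a minimal sketch, not the Bugbot autofix and not the context-hub converter: `stripLeadingNavChrome` is a hypothetical helper that assumes nav chrome appears only before the first top-level heading and consists solely of link-only lines, `/` breadcrumb separators, and blank lines.

```typescript
// Matches a line that is nothing but one markdown link, e.g. "[Data](/docs/data)".
const LINK_ONLY = /^\[[^\]]*\]\([^)]*\)$/;

// Hypothetical helper: drop everything before the first "# " heading,
// but only when that preamble is pure navigation chrome.
function stripLeadingNavChrome(markdown: string): string {
  const lines = markdown.split("\n");
  const firstHeading = lines.findIndex((line) => line.startsWith("# "));
  if (firstHeading <= 0) return markdown; // no heading, or nothing precedes it

  const preambleIsChrome = lines.slice(0, firstHeading).every((line) => {
    const text = line.trim();
    return text === "" || text === "/" || LINK_ONLY.test(text);
  });
  return preambleIsChrome ? lines.slice(firstHeading).join("\n") : markdown;
}

const sample = [
  "[documentation](/docs)",
  "",
  "[SDKs](/docs/sdks)",
  "",
  "/",
  "",
  "[Android](/docs/sdks/analytics/android)",
  "",
  "# Android",
  "",
  "SDK catalog content",
].join("\n");

console.log(stripLeadingNavChrome(sample));
```

The all-or-nothing check is deliberate: a preamble containing any prose is left untouched, so a real introduction above the heading is not lost. It does not address the second half of the finding, that the page is only a catalog index.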

Reviewed by Cursor Bugbot for commit b3e688d.

…gration SKILL.md

The HTML→markdown regen at PR #430 produced workflow lists like:

    1. `basic-integration-1.0-begin.md` - Amplitude Setup - Begin

The actual files live at references/basic-integration-1.X-*.md. Without
the prefix, the agent can't resolve the path when it tries to load the
workflow step. The same skill files already use the references/ prefix
in their ## Reference files section — this just aligns the workflow
section with that convention.

128 refs fixed across 32 integration SKILL.md files.
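
The rewrite described above can be sketched as a single regex pass. This is an illustrative guess at the transformation, not the script used for the commit: `prefixWorkflowRefs` is a hypothetical name, and the sketch assumes each workflow step is a numbered list item that starts with a backticked bare `.md` filename.

```typescript
// Hypothetical helper: add the references/ prefix to workflow steps that lack it.
function prefixWorkflowRefs(skillMd: string): string {
  // Numbered item, opening backtick, then a filename with no directory part.
  // Entries that already contain a slash (e.g. references/...) are left alone.
  return skillMd.replace(
    /^(\s*\d+\.\s+`)(?!references\/)([^`\/]+\.md`)/gm,
    "$1references/$2",
  );
}

const before = "1. `basic-integration-1.0-begin.md` - Amplitude Setup - Begin";
console.log(prefixWorkflowRefs(before));
// → 1. `references/basic-integration-1.0-begin.md` - Amplitude Setup - Begin
```

Because already-prefixed entries do not match, running the function twice gives the same result as running it once.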
…egen

Three categories of broken references missed by the prior taxonomy fix:

1. discover-analytics-patterns/SKILL.md had a SECOND occurrence of
   `../taxonomy/SKILL.md` (line 130) — only the first was patched
   in fd36663. Updates it to `../../taxonomy/amplitude-quickstart-taxonomy-agent/SKILL.md`
   matching the canonical layout.

2. EXAMPLE.md cross-skill links pointed at non-existent siblings:
   - integration-javascript_web → `../javascript-node`
   - integration-astro-static  → `../astro-ssr`
   - integration-nuxt-4        → `../nuxt-3.6`

   The actual sibling skill dirs are `integration-javascript_node`,
   `integration-astro-ssr`, and `integration-nuxt-3.6` — and they
   live one more level up. Updates each link to `../../integration-<name>/`.