Skip to content

Add opt-in publish to understand-quickly registry#517

Open
amacsmith wants to merge 2 commits into
AsyncFuncAI:mainfrom
amacsmith:feat/uq-publish
Open

Add opt-in publish to understand-quickly registry#517
amacsmith wants to merge 2 commits into
AsyncFuncAI:mainfrom
amacsmith:feat/uq-publish

Conversation

@amacsmith
Copy link
Copy Markdown

Why

looptech-ai/understand-quickly is a public, machine-readable registry of code-knowledge graphs. It ships an MCP server and a stable registry.json API, so AI agents (Claude, Codex, Cursor via MCP) and human readers can discover graphs and fetch them by URL. Wiring an opt-in publish path into DeepWiki's existing /export/wiki endpoint lets any DeepWiki user land in the registry with one extra request body field — and gets their generated wiki addressable by any agent without any infrastructure on either side.

  • Discoverability. Every published DeepWiki graph appears at https://looptech-ai.github.io/understand-quickly/ with status, schema validation, and drift detection.
  • No infrastructure on our side. Graphs stay in the user's repo / wherever they store the export; the registry only stores pointers and fetches via raw.githubusercontent.com.
  • Agent-consumable. The registry's MCP server reads the generic@1 schema we emit out of the box.

What changes

  • Add "graph" to the WikiExportRequest.format literal alongside the existing "markdown" / "json". When set, /export/wiki emits a generic@1 knowledge graph: each WikiPage becomes a node (kind="wiki-page"), each relatedPages reference becomes a kind="related" edge, and filePaths / importance ride along under node.data so consumers can map nodes back to source files.
  • Add a publish: bool = False flag (and an optional repo: "owner/repo" override) to the same request. With publish=true, after writing the export the API fires a repository_dispatch sync-entry event at looptech-ai/understand-quickly. The dispatch is gated on UNDERSTAND_QUICKLY_TOKEN in the server env — without it, the graph is still produced and the dispatch is silently skipped (no network call, no exit-1).
  • Embed metadata.{tool: "deepwiki-open", tool_version, generated_at, repo_url} on the emitted graph; embed metadata.commit when the helper is given one (the orchestrator passes through git rev-parse HEAD from the cloned source repo when available — the helper is happy to skip it cleanly when not).
  • New self-contained api/publish.py (stdlib only — urllib.request, subprocess, re, json). No new dependencies.
  • README gets a short "Publishing to understand-quickly (opt-in)" subsection under API Server Details.

No-op default

Default behaviour is unchanged. format still accepts markdown and json. publish defaults to false. With publish=true but no UNDERSTAND_QUICKLY_TOKEN, the endpoint emits the graph file as usual and logs a single informational line — no network call, no 5xx.

Token setup

The user adds a fine-grained GitHub PAT to the server env (or repo secrets, when run from CI):

  • Repository access: looptech-ai/understand-quickly only.
  • Permissions: Repository dispatches: write. Nothing else.

A drop-in CI workflow snippet lives at docs/integrations/sample-publish-workflow.yml.

Test plan

15 new unit tests in test/test_publish.py (self-contained — they don't import api.data_pipeline, so they don't need adalflow or any AI provider keys to run):

  • derive_owner_repo parses HTTPS, HTTPS-with-.git, HTTPS-with-trailing-slash, and SSH (git@github.com:...) GitHub URLs; returns None for unknown shapes.
  • build_graph_payload is generic@1-compatible ({nodes, edges}).
  • Metadata fields include tool=="deepwiki-open", tool_version, ISO-8601 generated_at, optional commit (40-hex), repo_url.
  • Nodes carry label, data.filePaths, data.importance.
  • Dangling relatedPages references (pointing at unknown ids) are dropped, not emitted as edges.
  • publish() with no token: returns {dispatched: false, reason: "no-token"}, never calls urlopen.
  • publish() with no owner_repo: returns {dispatched: false, reason: "no-owner-repo"}, never calls urlopen.
  • dispatch_sync() happy path: single POST to https://api.github.com/repos/looptech-ai/understand-quickly/dispatches with body {"event_type":"sync-entry","client_payload":{"id":"<owner>/<repo>"}} and Authorization: Bearer ... header (mocked urlopen).
  • dispatch_sync() HTTPError (e.g. 422 from an unregistered repo) is soft-failed: (False, "HTTP 422: ...").
  • publish() with token + dispatch failure: returns reason: "dispatch-failed" with a message pointing at npx @understand-quickly/cli add.
test/test_publish.py ...............    [100%]
15 passed in 0.03s

The pre-existing test/test_extract_repo_name.py is left untouched. (It currently fails to collect on a fresh checkout because it imports api.data_pipeline, which requires adalflow to be installed — that's pre-existing behaviour, not a side effect of this PR.)

Notes for maintainers

  • The registry is in early adoption; this integration is opt-in for early users. Nothing breaks if you don't merge — users can still register their wikis manually via the wizard (https://looptech-ai.github.io/understand-quickly/add.html) or npx @understand-quickly/cli add.
  • DeepWiki currently emits its persisted wiki shape under ~/.adalflow/wikicache/. This PR does not touch that path or the cache contract. The new format=graph reuses the existing pages: List[WikiPage] request body, which is what the existing Markdown / JSON exports already consume.
  • Schema choice: I picked the generic@1 fallback rather than authoring a deepwiki-open@1 first-class format, on the theory that landing the integration with the broadly-applicable schema is the lowest-friction path. If you'd like a dedicated deepwiki-open@1 schema (so consumers can dispatch on metadata.tool), happy to follow up with a schema PR on the registry side per §7 Format authoring once this lands.
  • Once a few users land in the registry, we can add AsyncFuncAI/deepwiki-open to the verified-publisher allowlist for auto-merge of registry-only PRs.
  • License: I noticed the project is MIT-licensed, same as the registry — no incompatibility to flag here.

Links

Add a `format=graph` choice and an opt-in `publish` flag to the existing
`/export/wiki` endpoint, plus a small `api/publish.py` helper, so a
generated DeepWiki can land in the public knowledge-graph registry at
`looptech-ai/understand-quickly` with no extra infrastructure.

- `format=graph` emits a `generic@1`-shaped graph (pages -> nodes,
  `relatedPages` -> edges) with `metadata.{tool, tool_version,
  generated_at, repo_url}`. Existing `markdown` / `json` exports are
  unchanged.
- `publish=true` (default false) fires a `repository_dispatch`
  `sync-entry` event at the registry, gated on `UNDERSTAND_QUICKLY_TOKEN`
  in the server env. With the token unset the graph is still produced
  and the dispatch is skipped — no network call, no failure.
- Owner/repo derives from `repo_url` (HTTPS + SSH GitHub shapes) with an
  explicit `repo: "owner/repo"` override on the request.
- `api/publish.py` is stdlib-only (`urllib`, `subprocess`, `re`,
  `json`); no new dependencies. 15 unit tests cover URL parsing, graph
  shape, dangling-edge handling, no-op paths, and a mocked dispatch
  request + soft failure on HTTP error.
- README has a short opt-in section under "API Server Details".

Protocol reference:
https://github.com/looptech-ai/understand-quickly/blob/main/docs/integrations/protocol.md
Copilot AI review requested due to automatic review settings May 8, 2026 04:47
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an opt-in path for exporting DeepWiki wiki pages as a generic@1 knowledge graph and (optionally) publishing the repo entry to the looptech-ai/understand-quickly registry via a GitHub repository_dispatch event.

Changes:

  • Extend /export/wiki to support format="graph" and add publish + optional repo (owner/repo) override to the request model.
  • Add a new stdlib-only api/publish.py module to build the graph payload and best-effort dispatch a sync-entry event.
  • Add unit tests for the publish helpers and update README with opt-in publishing instructions.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
api/api.py Adds graph export format plus optional publish/dispatch behavior and response headers.
api/publish.py Implements graph payload creation and GitHub repository_dispatch triggering (best-effort, token-gated).
test/test_publish.py Adds self-contained unit tests for URL parsing, graph payload shape/metadata, and dispatch behavior.
README.md Documents how to export format=graph and enable opt-in publishing with UNDERSTAND_QUICKLY_TOKEN.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread api/api.py
Comment on lines +296 to +305
if request.publish:
owner_repo = request.repo or derive_owner_repo(request.repo_url)
publish_status = publish_to_registry(payload, owner_repo=owner_repo)
headers["X-Understand-Quickly-Dispatched"] = (
"true" if publish_status.get("dispatched") else "false"
)
if publish_status.get("reason"):
headers["X-Understand-Quickly-Reason"] = str(
publish_status["reason"]
)
Comment thread api/api.py Outdated
media_type = "application/json"

if request.publish:
owner_repo = request.repo or derive_owner_repo(request.repo_url)
Comment thread api/api.py
# Get current timestamp for the filename
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

publish_status: Optional[Dict[str, Any]] = None
Comment thread api/publish.py
Comment on lines +200 to +216
def publish(
payload: Mapping[str, Any],
*,
owner_repo: Optional[str] = None,
token: Optional[str] = None,
) -> Dict[str, Any]:
"""
Best-effort publish path.

Always returns a small status dict. Never raises — callers can wire
this in next to a normal export and trust that a failure here will
not knock over the parent request.

``payload`` is the full graph dict (used here only for log lines /
sanity). ``owner_repo`` is an explicit ``owner/repo`` to register
against; if omitted, no dispatch is attempted.
"""
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an opt-in feature to export wiki content as a knowledge graph and publish it to the understand-quickly registry. It updates the /export/wiki endpoint to support a new graph format and adds a dedicated publish module for handling GitHub repository dispatches. Feedback focuses on preventing the FastAPI event loop from blocking during synchronous network I/O by using asyncio.to_thread. Other suggestions include adding a commit field to the export request to enhance graph metadata and removing the unused git_head_sha function.

Comment thread api/api.py Outdated

if request.publish:
owner_repo = request.repo or derive_owner_repo(request.repo_url)
publish_status = publish_to_registry(payload, owner_repo=owner_repo)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The publish_to_registry function performs synchronous network I/O using urllib.request.urlopen. Calling this directly within an async def endpoint will block the FastAPI event loop, which can lead to performance degradation or unresponsiveness under load. Since asyncio is already imported and used elsewhere in this file, you should offload this blocking call to a separate thread using asyncio.to_thread.

                publish_status = await asyncio.to_thread(
                    publish_to_registry, payload, owner_repo=owner_repo
                )

Comment thread api/api.py
Comment on lines +136 to +142
repo: Optional[str] = Field(
None,
description=(
"Optional 'owner/repo' override for the registry id. If "
"omitted, derived from `repo_url`."
),
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To fully support the commit metadata field in the generated graph (as mentioned in the PR description and implemented in build_graph_payload), consider adding a commit field to the WikiExportRequest model. This allows the caller (e.g., a CI workflow or an orchestrator) to provide the specific git SHA associated with the wiki content.

    repo: Optional[str] = Field(
        None,
        description=(
            "Optional 'owner/repo' override for the registry id. If "
            "omitted, derived from `repo_url`."
        ),
    )
    commit: Optional[str] = Field(
        None,
        description="Optional 40-hex git commit SHA to embed in the graph metadata.",
    )

Comment thread api/api.py
Comment on lines +288 to +291
payload = build_graph_payload(
[page.model_dump() for page in request.pages],
repo_url=request.repo_url,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Pass the commit from the request to the build_graph_payload function so it can be included in the graph metadata.

Suggested change
payload = build_graph_payload(
[page.model_dump() for page in request.pages],
repo_url=request.repo_url,
)
payload = build_graph_payload(
[page.model_dump() for page in request.pages],
repo_url=request.repo_url,
commit=request.commit,
)

Comment thread api/publish.py
Comment on lines +63 to +85
def git_head_sha(repo_path: Optional[str] = None) -> Optional[str]:
"""
Return the 40-hex SHA of HEAD in ``repo_path`` (or cwd), or ``None``
if not a git checkout / git is unavailable.
"""
try:
result = subprocess.run(
["git", "rev-parse", "HEAD"],
cwd=repo_path or None,
capture_output=True,
text=True,
timeout=5,
check=False,
)
except (OSError, subprocess.SubprocessError) as exc: # pragma: no cover
logger.debug("git rev-parse failed: %s", exc)
return None
if result.returncode != 0:
return None
sha = result.stdout.strip()
if re.fullmatch(r"[0-9a-f]{40}", sha):
return sha
return None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The git_head_sha function is defined but appears to be unused in both this module and api/api.py. If it's intended for future use or by external callers, it's fine to keep, but otherwise, it should be removed to avoid dead code.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an opt-in feature to export wiki content as a knowledge graph and register it with the understand-quickly registry. It adds a new graph format to the /export/wiki endpoint, implements a publishing module for GitHub repository dispatches, and includes corresponding unit tests. Feedback indicates that the git_head_sha helper, though implemented, is not currently called in the API endpoint; it is recommended to use this helper to include commit metadata in the graph payload and update the imports accordingly.

Comment thread api/api.py
Comment on lines +288 to +291
payload = build_graph_payload(
[page.model_dump() for page in request.pages],
repo_url=request.repo_url,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The PR description mentions that metadata.commit is embedded when available, and api/publish.py includes a git_head_sha helper for this purpose. However, build_graph_payload is currently called without the commit argument, and git_head_sha is never invoked. To fulfill the intended functionality, you should attempt to resolve the current commit SHA.

Suggested change
payload = build_graph_payload(
[page.model_dump() for page in request.pages],
repo_url=request.repo_url,
)
payload = build_graph_payload(
[page.model_dump() for page in request.pages],
repo_url=request.repo_url,
commit=git_head_sha(),
)

Comment thread api/api.py
Comment on lines +282 to +286
from api.publish import (
build_graph_payload,
derive_owner_repo,
publish as publish_to_registry,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If you apply the suggestion to include the commit SHA, ensure you also import the git_head_sha helper here.

Suggested change
from api.publish import (
build_graph_payload,
derive_owner_repo,
publish as publish_to_registry,
)
from api.publish import (
build_graph_payload,
derive_owner_repo,
git_head_sha,
publish as publish_to_registry,
)

- Wrap blocking publish_to_registry() in asyncio.to_thread so the
  /export/wiki async endpoint does not stall the FastAPI event loop
  while urllib does network I/O (Copilot HIGH, Gemini HIGH).
- Reject WikiExportRequest.repo overrides that disagree with
  derive_owner_repo(repo_url). Without this, an unauthenticated
  caller could trigger sync-entry dispatches for any registry id
  once UNDERSTAND_QUICKLY_TOKEN is configured (Copilot HIGH).
- Add WikiExportRequest.commit and thread it (or git_head_sha() as
  a best-effort fallback) into build_graph_payload so the
  metadata.commit field actually populates instead of leaving the
  git_head_sha helper as dead code (Gemini MEDIUM x3).

Existing publish unit tests still pass (15/15).
@amacsmith
Copy link
Copy Markdown
Author

Thanks @gemini-code-assist and @copilot — addressed the actionable review items in 5d69957:

File / line Reviewer Issue Fix
api/api.py:298 Copilot HIGH / Gemini HIGH publish_to_registry() does blocking urllib I/O inside an async endpoint, stalling the FastAPI event loop. Now wrapped in await asyncio.to_thread(publish_to_registry, payload, owner_repo=owner_repo). asyncio was already imported.
api/api.py:297 Copilot HIGH request.repo could override the registry id with anything once a token is set (spam vector for repository_dispatch). Mismatches now return HTTP 400. The override is only honoured when it matches derive_owner_repo(request.repo_url).
api/api.py:142 Gemini MEDIUM WikiExportRequest has no commit field, so metadata.commit was never populated. Added commit: Optional[str] to the model.
api/api.py:286,291 Gemini MEDIUM x2 git_head_sha was implemented but never imported / called. Now imported and used as a best-effort fallback when the caller omits commit.

Existing self-contained publish tests still pass (test/test_publish.py — 15/15) and api.py parses cleanly.

I left the two LOW-severity nits (publish_status local + the payload docstring in publish.py) alone for now — happy to fold them in if you'd like.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Thanks for the update, @amacsmith. The fixes look solid and address the concerns raised during the review, particularly regarding the blocking I/O in the async endpoint and the metadata population. The implementation of asyncio.to_thread for the registry dispatch is the correct approach here to avoid stalling the event loop. Everything looks good to go.

@amacsmith
Copy link
Copy Markdown
Author

Quick update from upstream — the registry just hit v0.3.0 with the full toolchain live:

  • @looptech-ai/understand-quickly-cli@0.1.2 (npm)
  • @looptech-ai/understand-quickly-mcp@0.1.2 (npm, also listed at io.github.looptech-ai/understand-quickly on the MCP Registry)
  • understand-quickly@0.1.1 (PyPI)
  • looptech-ai/uq-publish-action@v0.1.0 (GitHub Marketplace)

This PR keeps DeepWiki's in-API publish path (format=graph + publish=true) since the FastAPI server is the natural producer — no CI/Action indirection needed for users already calling /export/wiki. The Marketplace Action is mainly for tools that only run in CI.

The generic@1 payload shape this PR emits is now governed by the Code-Knowledge Graph Protocol v1 spec. If a deepwiki-open@1 first-class schema would be preferable down the line, happy to follow up on the registry side per protocol §7 once this lands.

Already rebased, the asyncio.to_thread fix is in, and gemini-code-assist signed off on the changes. Happy to address any further feedback whenever you have a moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants