Add opt-in publish to understand-quickly registry #517
Add a `format=graph` choice and an opt-in `publish` flag to the existing
`/export/wiki` endpoint, plus a small `api/publish.py` helper, so a
generated DeepWiki can land in the public knowledge-graph registry at
`looptech-ai/understand-quickly` with no extra infrastructure.
- `format=graph` emits a `generic@1`-shaped graph (pages -> nodes,
`relatedPages` -> edges) with `metadata.{tool, tool_version,
generated_at, repo_url}`. Existing `markdown` / `json` exports are
unchanged.
- `publish=true` (default false) fires a `repository_dispatch`
`sync-entry` event at the registry, gated on `UNDERSTAND_QUICKLY_TOKEN`
in the server env. With the token unset the graph is still produced
and the dispatch is skipped — no network call, no failure.
- Owner/repo derives from `repo_url` (HTTPS + SSH GitHub shapes) with an
explicit `repo: "owner/repo"` override on the request.
- `api/publish.py` is stdlib-only (`urllib`, `subprocess`, `re`,
`json`); no new dependencies. 15 unit tests cover URL parsing, graph
shape, dangling-edge handling, no-op paths, and a mocked dispatch
request + soft failure on HTTP error.
- README has a short opt-in section under "API Server Details".
Protocol reference:
https://github.com/looptech-ai/understand-quickly/blob/main/docs/integrations/protocol.md
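For reviewers who want the shape at a glance, the pages → nodes / `relatedPages` → edges mapping sketches out roughly as below (illustrative only — the helper name and exact field handling here are assumptions, not the shipped `api/publish.py`):

```python
from datetime import datetime, timezone


def build_graph_sketch(pages, repo_url):
    """Map wiki pages to a generic@1-shaped {nodes, edges} graph.

    Sketch of the mapping described in the PR summary; field details
    are illustrative.
    """
    known_ids = {p["id"] for p in pages}
    nodes = [
        {
            "id": p["id"],
            "kind": "wiki-page",
            "label": p.get("title", p["id"]),
            "data": {
                "filePaths": p.get("filePaths", []),
                "importance": p.get("importance"),
            },
        }
        for p in pages
    ]
    edges = [
        {"source": p["id"], "target": rel, "kind": "related"}
        for p in pages
        for rel in p.get("relatedPages", [])
        if rel in known_ids  # dangling references are dropped, not emitted
    ]
    return {
        "nodes": nodes,
        "edges": edges,
        "metadata": {
            "tool": "deepwiki-open",
            "generated_at": datetime.now(timezone.utc).isoformat(),
            "repo_url": repo_url,
        },
    }
```

Dangling `relatedPages` ids are filtered rather than emitted, matching the dangling-edge handling the unit tests cover.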
Pull request overview
This PR adds an opt-in path for exporting DeepWiki wiki pages as a generic@1 knowledge graph and (optionally) publishing the repo entry to the looptech-ai/understand-quickly registry via a GitHub repository_dispatch event.
Changes:
- Extend `/export/wiki` to support `format="graph"` and add `publish` + an optional `repo` (`owner/repo`) override to the request model.
- Add a new stdlib-only `api/publish.py` module to build the graph payload and best-effort dispatch a `sync-entry` event.
- Add unit tests for the publish helpers and update README with opt-in publishing instructions.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| api/api.py | Adds graph export format plus optional publish/dispatch behavior and response headers. |
| api/publish.py | Implements graph payload creation and GitHub repository_dispatch triggering (best-effort, token-gated). |
| test/test_publish.py | Adds self-contained unit tests for URL parsing, graph payload shape/metadata, and dispatch behavior. |
| README.md | Documents how to export format=graph and enable opt-in publishing with UNDERSTAND_QUICKLY_TOKEN. |
```python
if request.publish:
    owner_repo = request.repo or derive_owner_repo(request.repo_url)
    publish_status = publish_to_registry(payload, owner_repo=owner_repo)
    headers["X-Understand-Quickly-Dispatched"] = (
        "true" if publish_status.get("dispatched") else "false"
    )
    if publish_status.get("reason"):
        headers["X-Understand-Quickly-Reason"] = str(
            publish_status["reason"]
        )
```
```python
media_type = "application/json"
```

```python
if request.publish:
    owner_repo = request.repo or derive_owner_repo(request.repo_url)
```
```python
# Get current timestamp for the filename
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
```

```python
publish_status: Optional[Dict[str, Any]] = None
```
```python
def publish(
    payload: Mapping[str, Any],
    *,
    owner_repo: Optional[str] = None,
    token: Optional[str] = None,
) -> Dict[str, Any]:
    """
    Best-effort publish path.

    Always returns a small status dict. Never raises — callers can wire
    this in next to a normal export and trust that a failure here will
    not knock over the parent request.

    ``payload`` is the full graph dict (used here only for log lines /
    sanity). ``owner_repo`` is an explicit ``owner/repo`` to register
    against; if omitted, no dispatch is attempted.
    """
```
Code Review
This pull request introduces an opt-in feature to export wiki content as a knowledge graph and publish it to the understand-quickly registry. It updates the /export/wiki endpoint to support a new graph format and adds a dedicated publish module for handling GitHub repository dispatches. Feedback focuses on preventing the FastAPI event loop from blocking during synchronous network I/O by using asyncio.to_thread. Other suggestions include adding a commit field to the export request to enhance graph metadata and removing the unused git_head_sha function.
```python
if request.publish:
    owner_repo = request.repo or derive_owner_repo(request.repo_url)
    publish_status = publish_to_registry(payload, owner_repo=owner_repo)
```
The publish_to_registry function performs synchronous network I/O using urllib.request.urlopen. Calling this directly within an async def endpoint will block the FastAPI event loop, which can lead to performance degradation or unresponsiveness under load. Since asyncio is already imported and used elsewhere in this file, you should offload this blocking call to a separate thread using asyncio.to_thread.
```python
publish_status = await asyncio.to_thread(
    publish_to_registry, payload, owner_repo=owner_repo
)
```

```python
repo: Optional[str] = Field(
    None,
    description=(
        "Optional 'owner/repo' override for the registry id. If "
        "omitted, derived from `repo_url`."
    ),
)
```
To fully support the commit metadata field in the generated graph (as mentioned in the PR description and implemented in build_graph_payload), consider adding a commit field to the WikiExportRequest model. This allows the caller (e.g., a CI workflow or an orchestrator) to provide the specific git SHA associated with the wiki content.
```python
repo: Optional[str] = Field(
    None,
    description=(
        "Optional 'owner/repo' override for the registry id. If "
        "omitted, derived from `repo_url`."
    ),
)
commit: Optional[str] = Field(
    None,
    description="Optional 40-hex git commit SHA to embed in the graph metadata.",
)
```

```python
payload = build_graph_payload(
    [page.model_dump() for page in request.pages],
    repo_url=request.repo_url,
)
```
Pass the commit from the request to the build_graph_payload function so it can be included in the graph metadata.
```diff
 payload = build_graph_payload(
     [page.model_dump() for page in request.pages],
     repo_url=request.repo_url,
+    commit=request.commit,
 )
```
```python
def git_head_sha(repo_path: Optional[str] = None) -> Optional[str]:
    """
    Return the 40-hex SHA of HEAD in ``repo_path`` (or cwd), or ``None``
    if not a git checkout / git is unavailable.
    """
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_path or None,
            capture_output=True,
            text=True,
            timeout=5,
            check=False,
        )
    except (OSError, subprocess.SubprocessError) as exc:  # pragma: no cover
        logger.debug("git rev-parse failed: %s", exc)
        return None
    if result.returncode != 0:
        return None
    sha = result.stdout.strip()
    if re.fullmatch(r"[0-9a-f]{40}", sha):
        return sha
    return None
```
Code Review
This pull request introduces an opt-in feature to export wiki content as a knowledge graph and register it with the understand-quickly registry. It adds a new graph format to the /export/wiki endpoint, implements a publishing module for GitHub repository dispatches, and includes corresponding unit tests. Feedback indicates that the git_head_sha helper, though implemented, is not currently called in the API endpoint; it is recommended to use this helper to include commit metadata in the graph payload and update the imports accordingly.
```python
payload = build_graph_payload(
    [page.model_dump() for page in request.pages],
    repo_url=request.repo_url,
)
```
The PR description mentions that metadata.commit is embedded when available, and api/publish.py includes a git_head_sha helper for this purpose. However, build_graph_payload is currently called without the commit argument, and git_head_sha is never invoked. To fulfill the intended functionality, you should attempt to resolve the current commit SHA.
```diff
 payload = build_graph_payload(
     [page.model_dump() for page in request.pages],
     repo_url=request.repo_url,
+    commit=git_head_sha(),
 )
```
```python
from api.publish import (
    build_graph_payload,
    derive_owner_repo,
    publish as publish_to_registry,
)
```
If you apply the suggestion to include the commit SHA, ensure you also import the git_head_sha helper here.
```diff
 from api.publish import (
     build_graph_payload,
     derive_owner_repo,
+    git_head_sha,
     publish as publish_to_registry,
 )
```
- Wrap blocking `publish_to_registry()` in `asyncio.to_thread` so the `/export/wiki` async endpoint does not stall the FastAPI event loop while `urllib` does network I/O (Copilot HIGH, Gemini HIGH).
- Reject `WikiExportRequest.repo` overrides that disagree with `derive_owner_repo(repo_url)`. Without this, an unauthenticated caller could trigger `sync-entry` dispatches for any registry id once `UNDERSTAND_QUICKLY_TOKEN` is configured (Copilot HIGH).
- Add `WikiExportRequest.commit` and thread it (or `git_head_sha()` as a best-effort fallback) into `build_graph_payload` so the `metadata.commit` field actually populates instead of leaving the `git_head_sha` helper as dead code (Gemini MEDIUM x3).

Existing publish unit tests still pass (15/15).
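The repo-override guard described above amounts to something like this (names illustrative; `derive` stands in for `derive_owner_repo`, and the exception type is an assumption):

```python
from typing import Callable, Optional


def check_repo_override(
    repo_override: Optional[str],
    repo_url: str,
    derive: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Reject an explicit 'owner/repo' that disagrees with repo_url.

    Sketch of the validation only; the shipped code may map this to an
    HTTP 4xx instead of a bare ValueError.
    """
    derived = derive(repo_url)
    if repo_override is None:
        return derived
    if derived is not None and repo_override != derived:
        raise ValueError(
            f"repo override {repo_override!r} does not match "
            f"repo_url-derived id {derived!r}"
        )
    return repo_override
```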
Thanks @gemini-code-assist and @copilot — addressed the actionable review items in
Existing self-contained publish tests still pass ( I left the two LOW-severity nits (
Thanks for the update, @amacsmith. The fixes look solid and address the concerns raised during the review, particularly regarding the blocking I/O in the async endpoint and the metadata population. The implementation of
Quick update from upstream — the registry just hit
This PR keeps DeepWiki's in-API publish path ( The Already rebased, the
Why
`looptech-ai/understand-quickly` is a public, machine-readable registry of code-knowledge graphs. It ships an MCP server and a stable `registry.json` API, so AI agents (Claude, Codex, Cursor via MCP) and human readers can discover graphs and fetch them by URL. Wiring an opt-in publish path into DeepWiki's existing `/export/wiki` endpoint lets any DeepWiki user land in the registry with one extra request body field — and gets their generated wiki addressable by any agent without any infrastructure on either side.
- `raw.githubusercontent.com`.
- `generic@1` schema we emit out of the box.

What changes
"graph"to theWikiExportRequest.formatliteral alongside the existing"markdown"/"json". When set,/export/wikiemits ageneric@1knowledge graph: eachWikiPagebecomes a node (kind="wiki-page"), eachrelatedPagesreference becomes akind="related"edge, andfilePaths/importanceride along undernode.dataso consumers can map nodes back to source files.publish: bool = Falseflag (and an optionalrepo: "owner/repo"override) to the same request. Withpublish=true, after writing the export the API fires arepository_dispatchsync-entryevent atlooptech-ai/understand-quickly. The dispatch is gated onUNDERSTAND_QUICKLY_TOKENin the server env — without it, the graph is still produced and the dispatch is silently skipped (no network call, no exit-1).metadata.{tool: "deepwiki-open", tool_version, generated_at, repo_url}on the emitted graph; embedmetadata.commitwhen the helper is given one (the orchestrator passes throughgit rev-parse HEADfrom the cloned source repo when available — the helper is happy to skip it cleanly when not).api/publish.py(stdlib only —urllib.request,subprocess,re,json). No new dependencies.No-op default
Default behaviour is unchanged.
- `format` still accepts `markdown` and `json`.
- `publish` defaults to `false`. With `publish=true` but no `UNDERSTAND_QUICKLY_TOKEN`, the endpoint emits the graph file as usual and logs a single informational line — no network call, no `5xx`.

Token setup
The user adds a fine-grained GitHub PAT to the server env (or repo secrets, when run from CI):
- `looptech-ai/understand-quickly` only.
- Repository dispatches: write. Nothing else.

A drop-in CI workflow snippet lives at `docs/integrations/sample-publish-workflow.yml`.

Test plan
15 new unit tests in `test/test_publish.py` (self-contained — they don't import `api.data_pipeline`, so they don't need `adalflow` or any AI provider keys to run):

- `derive_owner_repo` parses HTTPS, HTTPS-with-`.git`, HTTPS-with-trailing-slash, and SSH (`git@github.com:...`) GitHub URLs; returns `None` for unknown shapes.
- `build_graph_payload` is `generic@1`-compatible (`{nodes, edges}`).
- `tool == "deepwiki-open"`, `tool_version`, ISO-8601 `generated_at`, optional `commit` (40-hex), `repo_url`.
- `label`, `data.filePaths`, `data.importance`.
- Dangling `relatedPages` references (pointing at unknown ids) are dropped, not emitted as edges.
- `publish()` with no token: returns `{dispatched: false, reason: "no-token"}`, never calls `urlopen`.
- `publish()` with no `owner_repo`: returns `{dispatched: false, reason: "no-owner-repo"}`, never calls `urlopen`.
- `dispatch_sync()` happy path: single POST to `https://api.github.com/repos/looptech-ai/understand-quickly/dispatches` with body `{"event_type":"sync-entry","client_payload":{"id":"<owner>/<repo>"}}` and an `Authorization: Bearer ...` header (mocked `urlopen`).
- `dispatch_sync()` `HTTPError` (e.g. 422 from an unregistered repo) is soft-failed: `(False, "HTTP 422: ...")`.
- `publish()` with token + dispatch failure: returns `reason: "dispatch-failed"` with a message pointing at `npx @understand-quickly/cli add`.
test/test_extract_repo_name.pyis left untouched. (It currently fails to collect on a fresh checkout because it importsapi.data_pipeline, which requiresadalflowto be installed — that's pre-existing behaviour, not a side effect of this PR.)Notes for maintainers
- `npx @understand-quickly/cli add`.
- `~/.adalflow/wikicache/`. This PR does not touch that path or the cache contract. The new `format=graph` reuses the existing `pages: List[WikiPage]` request body, which is what the existing Markdown / JSON exports already consume.
- `generic@1` fallback rather than authoring a `deepwiki-open@1` first-class format, on the theory that landing the integration with the broadly-applicable schema is the lowest-friction path. If you'd like a dedicated `deepwiki-open@1` schema (so consumers can dispatch on `metadata.tool`), happy to follow up with a schema PR on the registry side per §7 Format authoring once this lands.
- `AsyncFuncAI/deepwiki-open` to the verified-publisher allowlist for auto-merge of registry-only PRs.

Links
- `generic@1` schema: https://github.com/looptech-ai/understand-quickly/blob/main/schemas/generic@1.json