Skip to content

feat: auto-suggest tags based on content analysis (#243)#273

Merged
RobertLD merged 1 commit intomainfrom
feat/auto-suggest-tags
Mar 2, 2026
Merged

feat: auto-suggest tags based on content analysis (#243)#273
RobertLD merged 1 commit intomainfrom
feat/auto-suggest-tags

Conversation

@RobertLD
Copy link
Owner

@RobertLD RobertLD commented Mar 2, 2026

Closes #243

Adds tag suggestion via TF-IDF-like keyword extraction:

  • Core: suggestTags() in tags.ts — tokenize, stopword filter, TF scoring, known-tag boosting
  • MCP: suggest-tags tool
  • CLI: libscope tag suggest <docId> [--limit <n>]
  • REST: GET /api/v1/documents/:id/suggest-tags?limit=5
  • 6 new tests

Copilot AI review requested due to automatic review settings March 2, 2026 20:40
@vercel
Copy link

vercel bot commented Mar 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
libscope Ignored Ignored Preview Mar 2, 2026 8:47pm

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds content-based tag suggestions to LibScope, exposing the functionality through core APIs and multiple user-facing interfaces (CLI, REST, MCP) to improve document discoverability.

Changes:

  • Implemented suggestTags() in src/core/tags.ts using tokenization, stopword filtering, TF scoring, and boosting for existing system tags.
  • Exposed tag suggestion via MCP tool (suggest-tags), CLI command (libscope tag suggest), and REST endpoint (GET /api/v1/documents/:id/suggest-tags).
  • Added unit tests covering suggestTags() behavior and error handling.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/core/tags.ts Adds stopword list, tokenizer, and suggestTags() implementation.
src/core/index.ts Re-exports suggestTags from the core entrypoint.
src/mcp/server.ts Adds MCP tool for suggesting tags for a document.
src/cli/index.ts Adds libscope tag suggest command to print suggested tags.
src/api/routes.ts Adds REST route matcher + handler for suggest-tags endpoint.
tests/unit/tags.test.ts Adds unit tests for suggestTags() including boosting, exclusion, limits, and missing-doc error.

const suggestDocId = matchDocumentSuggestTags(segments);
if (suggestDocId && method === "GET") {
const limitRaw = url.searchParams.get("limit");
const limit = limitRaw ? parseInt(limitRaw, 10) : 5;
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

limit is parsed with parseInt but not validated for NaN/non-numeric values (unlike other endpoints in this file that convert NaN to undefined). As written, ?limit=foo results in limit = NaN and an empty suggestions list rather than a 400. Validate the query param (finite integer >= 1, and optionally cap it) and return a VALIDATION_ERROR on invalid input.

Suggested change
const limit = limitRaw ? parseInt(limitRaw, 10) : 5;
let limit = 5;
if (limitRaw !== null) {
const parsedLimit = Number.parseInt(limitRaw, 10);
if (!Number.isFinite(parsedLimit) || !Number.isInteger(parsedLimit) || parsedLimit < 1) {
sendError(res, 400, "VALIDATION_ERROR", "Query parameter 'limit' must be a positive integer");
return;
}
const MAX_LIMIT = 100;
limit = Math.min(parsedLimit, MAX_LIMIT);
}

Copilot uses AI. Check for mistakes.
Comment on lines +380 to +388
// Suggest tags for a document
const suggestDocId = matchDocumentSuggestTags(segments);
if (suggestDocId && method === "GET") {
const limitRaw = url.searchParams.get("limit");
const limit = limitRaw ? parseInt(limitRaw, 10) : 5;
const suggestions = suggestTags(db, suggestDocId, limit);
const took = Math.round(performance.now() - start);
sendJson(res, 200, { documentId: suggestDocId, suggestions }, took);
return;
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds a new REST endpoint (GET /api/v1/documents/:id/suggest-tags) but there are existing API route tests in tests/unit/api.test.ts for other /api/v1/documents/* endpoints. Add unit coverage for the new route (success case, invalid limit, and 404 for missing document) to prevent regressions.

Copilot uses AI. Check for mistakes.
Comment on lines +330 to +331
const limit = maxSuggestions ?? 5;

Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maxSuggestions is used directly to compute limit, but callers can pass NaN, 0, negative numbers, or very large values (e.g., REST/CLI parseInt failures). This can lead to surprising behavior (e.g., slice(0, NaN) returns an empty list). Validate maxSuggestions inside suggestTags (finite integer, >= 1, and ideally a reasonable max) and throw ValidationError when invalid.

Suggested change
const limit = maxSuggestions ?? 5;
const defaultLimit = 5;
const maxAllowedLimit = 100;
let limit: number;
if (maxSuggestions === undefined) {
limit = defaultLimit;
} else {
if (
!Number.isFinite(maxSuggestions) ||
!Number.isInteger(maxSuggestions) ||
maxSuggestions < 1 ||
maxSuggestions > maxAllowedLimit
) {
throw new ValidationError(
`maxSuggestions must be an integer between 1 and ${maxAllowedLimit}, got ${String(
maxSuggestions,
)}`,
);
}
limit = maxSuggestions;
}

Copilot uses AI. Check for mistakes.
Comment on lines +365 to +368
// Get all known tags in the system for boosting
const knownTags = new Set(
(db.prepare("SELECT name FROM tags").all() as Array<{ name: string }>).map((r) => r.name),
);
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

knownTags is built by selecting all tag names on every suggestTags call. If the tag table grows, this becomes an O(totalTags) read per request/tool invocation. Consider querying only the subset relevant to the extracted terms (e.g., SELECT name FROM tags WHERE name IN (...)) or otherwise caching known tag names for the lifetime of the process.

Suggested change
// Get all known tags in the system for boosting
const knownTags = new Set(
(db.prepare("SELECT name FROM tags").all() as Array<{ name: string }>).map((r) => r.name),
);
// Get known tags among the candidate terms for boosting
const candidateTerms = [...tf.keys()];
const knownTags =
candidateTerms.length === 0
? new Set<string>()
: new Set(
(
db
.prepare(
`SELECT name FROM tags WHERE name IN (${candidateTerms
.map(() => "?")
.join(", ")})`,
)
.all(...candidateTerms) as Array<{ name: string }>
).map((r) => r.name),
);

Copilot uses AI. Check for mistakes.
Comment on lines +84 to +100
"its",
"our",
"their",
"your",
"his",
"her",
"our",
"out",
"then",
"there",
"these",
"those",
"them",
"they",
"you",
"your",
"only",
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

STOPWORDS includes duplicates (e.g., "our" and "your" appear twice). This is harmless at runtime because it's a Set, but it adds noise and makes future maintenance harder. Remove the duplicates to keep the list clean.

Copilot uses AI. Check for mistakes.
.action((documentId: string, opts: { limit: string }) => {
const { db } = initializeApp();
try {
const limit = parseInt(opts.limit, 10);
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This command parses --limit with parseInt directly, but the CLI already has parseIntOption() which validates numeric options and exits with a clear error. Using raw parseInt here allows NaN (or negative/zero) to flow into suggestTags, producing confusing results. Use parseIntOption(opts.limit, "--limit") (and enforce >= 1) for consistent CLI behavior.

Suggested change
const limit = parseInt(opts.limit, 10);
const limit = parseIntOption(opts.limit, "--limit");
if (limit < 1) {
console.error("--limit must be at least 1");
process.exitCode = 1;
return;
}

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@RobertLD RobertLD force-pushed the feat/auto-suggest-tags branch from f93c920 to 43146e4 Compare March 2, 2026 20:47
@RobertLD RobertLD merged commit da158ff into main Mar 2, 2026
4 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Auto-suggest tags for documents using content analysis

2 participants