@afterrburn afterrburn commented Jun 15, 2025

AI Agent Documentation Sync

Overview

This PR implements an AI agent that automatically syncs documentation changes with a vector database for enhanced search and retrieval capabilities.

Workflow

Automated Sync (Main Branch)

  • Trigger: GitHub Action activated when PR is merged into main branch
  • Process:
    • GitHub Action sends a curl request to the agent with the file changes from git diff
    • Agent receives list of changed/deleted file paths
    • For each changed file:
      • Search vector store by metadata (file path)
      • Delete existing vectors if found
      • Re-embed updated document content
      • Store new vectors back to database
    • Deleted files are automatically removed from vector store
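
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the VectorStore interface, findByPath, and the EmbedFn stand-in for chunking plus embedding are hypothetical names chosen for the example.

```typescript
// Hypothetical minimal vector-store surface; the real SDK may differ.
interface VectorRecord {
  key: string;
  embedding: number[];
  metadata: { path: string };
}

interface VectorStore {
  findByPath(path: string): Promise<VectorRecord[]>;
  delete(key: string): Promise<void>;
  upsert(record: VectorRecord): Promise<void>;
}

interface ChangedFile {
  path: string;
  content: string;
}

// Stand-in for chunking + embedding; assumed to return one record per chunk.
type EmbedFn = (file: ChangedFile) => Promise<VectorRecord[]>;

// Deletes every vector whose metadata path matches, returning the count.
async function removeVectorsForPath(store: VectorStore, path: string): Promise<number> {
  const existing = await store.findByPath(path);
  for (const record of existing) {
    await store.delete(record.key);
  }
  return existing.length;
}

async function syncDocs(
  store: VectorStore,
  embed: EmbedFn,
  changed: ChangedFile[],
  removed: string[],
): Promise<{ upserted: number; deleted: number }> {
  let upserted = 0;
  let deleted = 0;

  for (const file of changed) {
    // Drop stale vectors first so old chunks never sit next to new ones.
    deleted += await removeVectorsForPath(store, file.path);
    for (const record of await embed(file)) {
      await store.upsert(record);
      upserted += 1;
    }
  }

  // Removed files only need their vectors deleted.
  for (const path of removed) {
    deleted += await removeVectorsForPath(store, path);
  }

  return { upserted, deleted };
}
```

Deleting before re-embedding keeps the store free of orphaned chunks when an updated file produces fewer chunks than before.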

Document Processing

  • Chunking: Documents are chunked based on content type for optimal retrieval
  • Embedding: Uses OpenAI text-embedding-3-small model
    • Note: May upgrade to larger model for improved accuracy based on quality assessment

Manual Operations

  • Full Refresh: Agent supports on-demand vector store clearing and complete re-upload of all documentation

Benefits

  • Real-time documentation sync with vector database
  • Efficient incremental updates (only changed files processed)
  • Maintains search accuracy with latest document versions
  • Flexible manual override capabilities

Summary by CodeRabbit

  • New Features

    • Added multiple GitHub Actions workflows for manual, push-triggered, and full synchronization of documentation files to an external vector store.
    • Introduced modules for content-aware chunking, keyword extraction using LLMs, embedding generation, and orchestration of document synchronization without filesystem dependency.
    • Enhanced the agent to process JSON payloads for syncing documentation changes with validation and detailed error handling.
  • Documentation

    • Added comprehensive design, user stories, and TODO documents detailing the Retrieval-Augmented Generation (RAG) system architecture and implementation plan.
  • Tests

    • Added extensive tests covering content type detection and document chunking to ensure accurate and reliable processing.
  • Chores

    • Updated project dependencies to include new libraries supporting document parsing, embedding, and testing.


coderabbitai bot commented Jun 15, 2025

Warning

Rate limit exceeded

@afterrburn has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 9 minutes and 9 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 48600eb and e48edbd.

📒 Files selected for processing (3)
  • .github/workflows/sync-docs-full.yml (1 hunks)
  • agent-docs/src/agents/doc-processing/docs-orchestrator.ts (1 hunks)
  • agent-docs/src/agents/doc-processing/embed-chunks.ts (1 hunks)

Walkthrough

This update introduces a comprehensive Retrieval-Augmented Generation (RAG) document processing pipeline for documentation. It adds new modules for chunking, embedding, keyword extraction, and orchestrating synchronization with a vector store, along with supporting design/user story documents, workflow automation via GitHub Actions, and initial test coverage for chunking logic.

Changes

File(s) | Change Summary
.github/workflows/@sync-docs.yml, .github/workflows/sync-docs.yml, .github/workflows/sync-docs-full.yml | Added GitHub Actions workflows to detect documentation changes and sync them to an external vector store via webhook, supporting manual triggers, push events, and full syncs of all docs.
agent-docs/RAG-TODO.md, agent-docs/RAG-design.md, agent-docs/RAG-user-stories.md | Added design, TODO, and user story documents outlining requirements, architecture, flows, and success criteria for the RAG documentation system.
agent-docs/package.json | Added gray-matter, langchain, and vitest as dependencies for document parsing, language processing, and testing.
agent-docs/src/agents/doc-processing/chunk-mdx.ts | Added content-aware MDX chunking logic, content type detection, and document enrichment functions for downstream processing.
agent-docs/src/agents/doc-processing/config.ts | Added export of the VECTOR_STORE_NAME constant for vector store configuration.
agent-docs/src/agents/doc-processing/docs-orchestrator.ts | Added logic to sync documentation from a webhook payload, handling changed and removed files, and managing vector store updates.
agent-docs/src/agents/doc-processing/docs-processor.ts | Added pipeline to chunk, enrich, and embed document content, producing vector upsert parameters for storage.
agent-docs/src/agents/doc-processing/embed-chunks.ts | Added embedding utility to generate vector representations of text chunks using OpenAI models.
agent-docs/src/agents/doc-processing/index.ts | Added main agent handler for documentation sync requests, with validation, error handling, and orchestration.
agent-docs/src/agents/doc-processing/keyword-extraction.ts | Added LLM-based keyword extraction module for document chunks, with configurable options and structured results.
agent-docs/src/agents/doc-processing/test/chunk-mdx.test.ts | Added comprehensive tests for chunking and content type detection, covering a variety of Markdown structures and edge cases.
agent-docs/src/agents/doc-processing/types.ts | Added TypeScript interfaces for file payloads, sync payloads, and sync statistics to type synchronization data and results.

Sequence Diagram(s)

sequenceDiagram
    participant GitHub
    participant Workflow
    participant Webhook
    participant Agent
    participant VectorStore

    GitHub->>Workflow: Push or PR event (docs changed)
    Workflow->>Webhook: POST sync payload (changed/removed files)
    Webhook->>Agent: Forward payload
    Agent->>Agent: Validate payload
    Agent->>Agent: For each changed file:
    Agent->>Agent: - Decode content, chunk, enrich
    Agent->>Agent: - Embed chunks
    Agent->>VectorStore: Upsert vectors with metadata
    Agent->>Agent: For each removed file:
    Agent->>VectorStore: Delete vectors by file path
    Agent-->>Webhook: Respond with sync stats
    Webhook-->>Workflow: Sync result

Suggested reviewers

  • rblalock

Poem

A rabbit hops through docs anew,
Chunking, embedding, syncing too.
Keywords found with LLM’s might,
Vectors stored for future insight.
Workflows dance with every push,
Tests ensure there’s not a mush.
🥕—RAG leaps forward, what a sight!


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai auto-generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.


cloudflare-workers-and-pages bot commented Jun 15, 2025

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status | Name | Latest Commit | Preview URL | Updated (UTC)
✅ Deployment successful! (View logs) | docs | e48edbd | Commit Preview URL | Jun 18 2025, 04:26 AM

Base automatically changed from seng/create-agent-docs to main June 17, 2025 14:01
@coderabbitai coderabbitai bot requested review from jhaynie and rblalock June 17, 2025 14:04

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 17

🧹 Nitpick comments (17)
agent-docs/.gitignore (1)

15-15: Correct log file ignore pattern

The current entry _.log matches only a file literally named _.log, because the underscore is not a wildcard in gitignore patterns. To ignore all .log files, use *.log instead:

- _.log
+ *.log
agent-docs/src/agents/doc-processing/config.ts (1)

1-1: Add explicit type annotation and use nullish coalescing

For clarity, annotate VECTOR_STORE_NAME explicitly and switch to ??, so that the 'docs' default applies only when the variable is unset rather than also when it is set to an empty string:

-export const VECTOR_STORE_NAME = process.env.VECTOR_STORE_NAME || 'docs';
+export const VECTOR_STORE_NAME: string = process.env.VECTOR_STORE_NAME ?? 'docs';
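
To make the difference between the two operators concrete, here is a small sketch. The helper names nameWithOr and nameWithNullish are hypothetical and take the raw environment value as a parameter, so the example does not depend on process.env.

```typescript
// Mirrors `process.env.VECTOR_STORE_NAME || 'docs'`.
function nameWithOr(envValue: string | undefined): string {
  return envValue || "docs"; // falls back for undefined AND for ""
}

// Mirrors `process.env.VECTOR_STORE_NAME ?? 'docs'`.
function nameWithNullish(envValue: string | undefined): string {
  return envValue ?? "docs"; // falls back only for undefined (or null)
}
```

The two differ only for an empty string: || replaces it with the default, while ?? passes it through. Which behavior is preferable depends on whether an empty VECTOR_STORE_NAME should be treated as a deliberate value.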
agent-docs/README.md (2)

5-7: Missing alt text on badge image

<img src="https://app.agentuity.com/img/deploy.svg" /> lacks an alt attribute, triggering MD045 and reducing accessibility.

-            <img src="https://app.agentuity.com/img/deploy.svg" /> 
+            <img src="https://app.agentuity.com/img/deploy.svg" alt="Deploy to Agentuity" />

65-71: Specify a language for the fenced code block

Markdownlint MD040 warns when language identifiers are omitted.

-```
+```text
agent-docs/agentuity.yaml (1)

75-78: Typos & missing metadata

  • description: (l 15) is empty – fill this so the project is discoverable.
  • "An applicaiton that process documents" (l 77) ➜ “An application that processes documents”.
-    description: An applicaiton that process documents
+    description: An application that processes documents
agent-docs/RAG-TODO.md (1)

1-1: Grammar nit: use “To-Dos” (hyphen)

Heading should read “RAG System Implementation To-Dos” for correctness.

agent-docs/.cursor/rules/agent.mdc (1)

11-15: Consistent casing for “TypeScript”

Update “Typescript” ➜ “TypeScript” to match official spelling.

agent-docs/src/agents/doc-processing/docs-orchestrator.ts (1)

63-75: Avoid mutating loop variable & ensure metadata merge

chunk is reused; mutating chunk.metadata inside the for-loop can cause hidden side-effects if the object is reused elsewhere.

Prefer creating a new object for upsert:

-        chunk.metadata = {
-          ...chunk.metadata,
-          path: logicalPath,
-        };
-        await ctx.vector.upsert(VECTOR_STORE_NAME, chunk);
+        await ctx.vector.upsert(VECTOR_STORE_NAME, {
+          ...chunk,
+          metadata: { ...chunk.metadata, path: logicalPath },
+        });
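
As a standalone illustration of the non-mutating approach, the sketch below copies the chunk instead of writing to it. The Chunk shape and the withPath helper are assumptions made for this example, not the types used in the PR.

```typescript
// Hypothetical chunk shape for illustration only.
interface Chunk {
  key: string;
  document: string;
  metadata?: Record<string, unknown>;
}

// Returns a copy carrying the logical path; the input chunk is left untouched.
function withPath(chunk: Chunk, logicalPath: string): Chunk {
  return {
    ...chunk,
    metadata: { ...chunk.metadata, path: logicalPath },
  };
}
```

Because both the chunk and its metadata are copied one level deep, any other holder of the original chunk never observes the added path field.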
agent-docs/index.ts (1)

3-9: Ambient-type augmentation is too broad

Adding isBun to the global Process interface inside a top-level file pollutes every consumer that imports this module, risking declaration collisions in downstream packages/tests. Prefer a dedicated types/global.d.ts (referenced via tsconfig.json -> files) so the augmentation is scoped to the package instead of the compiled JS bundle.

agent-docs/src/agents/doc-processing/index.ts (1)

37-60: Validation logic is correct but repetitive – consider a schema validator

Three hand-rolled loops check structure and types. A small Zod/Valibot schema would:

  1. Collapse 20 LOC into 2.
  2. Produce consolidated, descriptive errors.
  3. Future-proof the endpoint as payload evolves.

Not blocking, yet improves reliability & readability.

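import { z } from "zod";

// Shape of the sync payload; commit and repo are optional.
const SyncSchema = z.object({
  commit: z.string().optional(),
  repo: z.string().optional(),
  changed: z.array(z.object({ path: z.string(), content: z.string() })),
  removed: z.array(z.string())
});
// parse() throws a ZodError describing the invalid fields.
const payload = SyncSchema.parse(await req.data.json());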
agent-docs/src/agents/doc-processing/test/chunk-mdx.test.ts (1)

5-6: Use the Document constructor to retain prototype helpers

Creating a plain object misses Document methods (e.g., .split()) that some downstream utilities rely on.

-const makeDoc = (content: string): Document => ({ pageContent: content, metadata: { contentType: "text" } });
+const makeDoc = (content: string) =>
+  new Document({ pageContent: content, metadata: { contentType: "text" } });
agent-docs/RAG-user-stories.md (1)

63-63: Minor wording duplication

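LanguageTool flags the repeated word "Answers" where the heading "For Quick Answers" is followed by the bullet "Answers are accurate and up-to-date". This looks like a false positive.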

agent-docs/.cursor/rules/sdk.mdc (1)

30-38: Consider using string[] instead of string for keywords in examples.

Throughout the documentation, keywords are treated conceptually as a list; showing them as a plain string (single value) may confuse users and diverge from the actual implementation in DocumentMetadata & the RAG pipeline, where keywords are an array.
Updating the sample types keeps docs and code aligned.

.github/workflows/@sync-docs.yml (2)

11-11: Upgrade to actions/checkout@v4 to avoid deprecation warnings.

v3 still works, but it runs on the deprecated Node 16 runtime and triggers deprecation warnings in CI. v4 runs on Node 20, is a drop-in replacement here, and future-proofs the workflow.


77-77: Add a newline at EOF & strip trailing spaces.
Resolves YAML-lint errors and keeps tooling quiet.

agent-docs/RAG-design.md (1)

16-23: keywords should be string[] for consistency.

All later sections treat keywords as an array (boosting, highlighting, etc.). Updating the interface prevents downstream type confusion.

agent-docs/src/agents/doc-processing/chunk-mdx.ts (1)

98-108: Avoid any[] – preserve chunk typing for downstream safety.

Use Document[] (or a dedicated MarkdownChunk interface) instead of any[] for finalChunks; this prevents silent shape drift later in the pipeline.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e84056 and b52980b.

⛔ Files ignored due to path filters (1)
  • agent-docs/bun.lock is excluded by !**/*.lock
📒 Files selected for processing (25)
  • .github/workflows/@sync-docs.yml (1 hunks)
  • .github/workflows/sync-docs.yml (1 hunks)
  • agent-docs/.cursor/rules/agent.mdc (1 hunks)
  • agent-docs/.cursor/rules/agentuity.mdc (1 hunks)
  • agent-docs/.cursor/rules/sdk.mdc (1 hunks)
  • agent-docs/.editorconfig (1 hunks)
  • agent-docs/.gitignore (1 hunks)
  • agent-docs/RAG-TODO.md (1 hunks)
  • agent-docs/RAG-design.md (1 hunks)
  • agent-docs/RAG-user-stories.md (1 hunks)
  • agent-docs/README.md (1 hunks)
  • agent-docs/agentuity.yaml (1 hunks)
  • agent-docs/biome.json (1 hunks)
  • agent-docs/index.ts (1 hunks)
  • agent-docs/package.json (1 hunks)
  • agent-docs/src/agents/doc-processing/chunk-mdx.ts (1 hunks)
  • agent-docs/src/agents/doc-processing/config.ts (1 hunks)
  • agent-docs/src/agents/doc-processing/docs-orchestrator.ts (1 hunks)
  • agent-docs/src/agents/doc-processing/docs-processor.ts (1 hunks)
  • agent-docs/src/agents/doc-processing/embed-chunks.ts (1 hunks)
  • agent-docs/src/agents/doc-processing/index.ts (1 hunks)
  • agent-docs/src/agents/doc-processing/keyword-extraction.ts (1 hunks)
  • agent-docs/src/agents/doc-processing/test/chunk-mdx.test.ts (1 hunks)
  • agent-docs/tsconfig.json (1 hunks)
  • tsconfig.json (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
agent-docs/src/agents/doc-processing/index.ts (1)
agent-docs/src/agents/doc-processing/docs-orchestrator.ts (1)
  • syncDocsFromPayload (49-102)
agent-docs/src/agents/doc-processing/docs-orchestrator.ts (2)
agent-docs/src/agents/doc-processing/config.ts (1)
  • VECTOR_STORE_NAME (1-1)
agent-docs/src/agents/doc-processing/docs-processor.ts (1)
  • processDoc (21-25)
🪛 LanguageTool
agent-docs/RAG-TODO.md

[grammar] ~1-~1: It appears that a hyphen is missing in the plural noun “to-dos”?
Context: # RAG System Implementation TODOs ## 1. Document Chunking & Metadata - [...

(TO_DO_HYPHEN)

agent-docs/RAG-design.md

[style] ~94-~94: This phrase is redundant. Consider writing “relevant”.
Context: ...yword matches. Why? - Ensures that highly relevant technical results (e.g., containing exa...

(HIGHLY_RELEVANT)


[uncategorized] ~217-~217: You might be missing the article “the” here.
Context: ...tion" } ``` --- ## 12. Summary - Only main content is embedded; keywords and metad...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

agent-docs/RAG-user-stories.md

[duplication] ~63-~63: Possible typo: you repeated a word.
Context: ...les ## Success Criteria ### For Quick Answers - Answers are accurate and up-to-date - Responses...

(ENGLISH_WORD_REPEAT_RULE)

🪛 markdownlint-cli2 (0.17.2)
agent-docs/README.md

6-6: Images should have alternate text (alt text)
null

(MD045, no-alt-text)


65-65: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

🪛 actionlint (1.7.7)
.github/workflows/@sync-docs.yml

11-11: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

.github/workflows/sync-docs.yml

14-14: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 YAMLlint (1.37.1)
.github/workflows/@sync-docs.yml

[error] 29-29: trailing spaces

(trailing-spaces)


[error] 33-33: trailing spaces

(trailing-spaces)


[error] 36-36: trailing spaces

(trailing-spaces)


[error] 64-64: trailing spaces

(trailing-spaces)


[error] 73-73: trailing spaces

(trailing-spaces)


[error] 77-77: no new line character at the end of file

(new-line-at-end-of-file)


[error] 77-77: trailing spaces

(trailing-spaces)

.github/workflows/sync-docs.yml

[error] 20-20: trailing spaces

(trailing-spaces)


[error] 24-24: trailing spaces

(trailing-spaces)


[error] 27-27: trailing spaces

(trailing-spaces)


[error] 55-55: trailing spaces

(trailing-spaces)


[error] 66-66: trailing spaces

(trailing-spaces)


[error] 71-71: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Workers Builds: docs
🔇 Additional comments (13)
agent-docs/.gitignore (1)

1-40: Comprehensive ignore rules for agent-docs directory

The .gitignore covers common artifacts—dependencies, build outputs, coverage, logs, env files, caches, IDE files, macOS metadata, and Agentuity-specific files. This establishes a clean repo surface and prevents accidental check-ins of transient or sensitive files.

agent-docs/biome.json (1)

1-27: Validated Biome configuration

The Biome setup is well-structured: imports are organized, linting is enabled with recommended rules, formatting rules enforce 2-space indentation, single quotes, ES5 trailing commas, and mandatory semicolons for JS, and the .agentuity folder is correctly excluded.

tsconfig.json (1)

28-28: Confirm exclusion of the agent-docs subproject

Excluding "agent-docs" from the root TS build is correct since it has its own tsconfig.json. Ensure that the subproject’s config covers all intended files.

agent-docs/.cursor/rules/agentuity.mdc (1)

1-10: File is configuration-only – no issues detected

Nothing actionable surfaced in this cursor rule.

agent-docs/package.json (1)

3-5: main points to a file that is never emitted

tsconfig.json sets "noEmit": true, and your runtime entry in the start script is .agentuity/index.js. The "main": "index.js" field therefore advertises a file that doesn’t exist in the published package, breaking consumers that import/require it.

Options:

  1. Remove the main field entirely if the package is private and only invoked through agentuity.
  2. Or point it to the bundled artifact (.agentuity/index.js) created in prestart.
agent-docs/tsconfig.json (1)

20-22: Non-existent type reference

"types": ["@types/bun", "@agentuity/sdk"] – only the first entry resolves to a definitely-typed package. Unless the SDK ships its own types entry-point, this directive will fail type-resolution in editors.

Confirm the presence of node_modules/@agentuity/sdk/index.d.ts; otherwise, drop it or add a types export in the SDK.

agent-docs/agentuity.yaml (1)

53-57: Resource limits look suspiciously low for embedding + vector-store workloads

350 Mi memory / 500 m CPU may be insufficient when:

  • loading the OpenAI SDK & streaming embeddings;
  • holding ~1 k embeddings in memory during upserts.

Please benchmark a realistic sync (e.g. full docs refresh) and tune these values to avoid OOM kills or throttling.
Consider starting at 1 Gi memory & 1 CPU.

agent-docs/src/agents/doc-processing/docs-processor.ts (1)

43-47: Verify VectorUpsertParams field name

Most vector DB SDKs expect the embedding under values or vector, not embeddings.
Double-check the Agentuity SDK; otherwise upserts will fail at runtime.

agent-docs/src/agents/doc-processing/index.ts (1)

66-75: Error response leaks nothing sensitive – good practice

Catching unknown and returning the message while still logging stack traces keeps the external API clean. Nice.

agent-docs/src/agents/doc-processing/test/chunk-mdx.test.ts (1)

1-4: Test suite is tied to Bun – portability concern

bun:test is great locally, but CI & most developers default to Node + Vitest/Jest. Unless the entire repo standardises on Bun, consider exporting test helpers so the logic can be executed under any runner, or add a Node-based parallel config to avoid fragmenting the toolchain.

agent-docs/src/agents/doc-processing/keyword-extraction.ts (2)

31-38: Prompt may exceed model context for large chunks.

chunkContent is injected verbatim; a large chunk could breach token limits and hard-fail the request. Consider truncating or recursively splitting very long chunks before calling the LLM.
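
One way to apply the truncation idea is a simple character budget, sketched below. The function name truncateForPrompt and the four-characters-per-token ratio are assumptions; the ratio is a rough heuristic for English text, not a tokenizer count.

```typescript
// Rough heuristic, not an exact tokenizer: about 4 characters per token.
const APPROX_CHARS_PER_TOKEN = 4;

function truncateForPrompt(chunkContent: string, maxTokens: number): string {
  const maxChars = Math.max(0, Math.floor(maxTokens * APPROX_CHARS_PER_TOKEN));
  if (chunkContent.length <= maxChars) {
    return chunkContent;
  }
  const slice = chunkContent.slice(0, maxChars);
  // Cut at the last whitespace inside the budget to avoid ending mid-word.
  const lastBreak = slice.search(/\s\S*$/);
  return lastBreak > 0 ? slice.slice(0, lastBreak) : slice;
}
```

A tokenizer-based limit would be more precise. This version only guarantees that the text passed to the model stays within a fixed character count.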


68-71: Return keywords even when extraction fails to keep schema stable.

If no keywords are extracted, return an empty array rather than undefined to avoid downstream undefined.map errors.
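
A sketch of a parser that keeps the result shape stable is shown below. It assumes the model replies with a comma-separated list, which may not match the actual prompt format in keyword-extraction.ts; parseKeywords and KeywordResult are hypothetical names.

```typescript
interface KeywordResult {
  keywords: string[];
}

// Always returns an array, so callers can map or filter without undefined checks.
function parseKeywords(raw: string | null | undefined): KeywordResult {
  if (!raw) {
    return { keywords: [] };
  }
  const keywords: string[] = [];
  for (const part of raw.split(",")) {
    const keyword = part.trim().toLowerCase();
    // Skip empty fragments and duplicates.
    if (keyword !== "" && keywords.indexOf(keyword) === -1) {
      keywords.push(keyword);
    }
  }
  return { keywords };
}
```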

agent-docs/src/agents/doc-processing/chunk-mdx.ts (1)

44-46: List-item regex has incorrect precedence, causing false positives.

/^[-*+]\s+|^\d+\.\s+/ means “bullet or start-of-line digit” – the anchors only apply to the first alternative. Wrap the alternation:

- const listLines = lines.filter(line => /^[-*+]\s+|^\d+\.\s+/.test(line.trim()));
+ const listLines = lines.filter(line =>
+   /^([-*+]|\d+\.)\s+/.test(line.trim())
+ );

Likely an incorrect or invalid review comment.
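
A quick check supports treating this comment as invalid: both alternatives in the original pattern carry their own ^ anchor, so the two expressions accept the same lines. The sample lines below are made up for the comparison.

```typescript
const originalPattern = /^[-*+]\s+|^\d+\.\s+/;
const suggestedPattern = /^([-*+]|\d+\.)\s+/;

const samples = [
  "- bullet",
  "* bullet",
  "+ bullet",
  "1. numbered",
  "42. numbered",
  "plain text",
  "see step 1. here",
  "-no space",
  "3.14 is pi",
];

// Lines on which the two patterns give different answers.
const disagreements = samples.filter(
  (line) => originalPattern.test(line) !== suggestedPattern.test(line),
);
```

For these samples the disagreements array stays empty, so the rewrite is a readability change rather than a bug fix.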

afterrburn added 2 commits June 17, 2025 08:14
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Seng Rith <50646727+afterrburn@users.noreply.github.com>
@coderabbitai coderabbitai bot requested a review from rblalock June 17, 2025 14:45

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
.github/workflows/sync-docs.yml (2)

65-71: Fix JSON payload quoting to avoid syntax errors with embedded quotes.

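Interpolating ${{ steps.files.outputs.payload }} directly into the script breaks whenever the JSON contains the surrounding quote character: single quotes break on any embedded single quote, and double quotes break on every JSON string. Pass the payload through an environment variable instead (for example, set PAYLOAD: ${{ steps.files.outputs.payload }} under the step's env:, where PAYLOAD is an illustrative name) and quote the variable:

- echo '${{ steps.files.outputs.payload }}' | jq '.'
+ printf '%s\n' "$PAYLOAD" | jq '.'

...
- curl https://agentuity.ai/... \
-   -d '${{ steps.files.outputs.payload }}'
+ curl https://agentuity.ai/... \
+   --data "$PAYLOAD"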

34-41: Safely iterate over changed files to handle spaces in filenames.

The for f in $CHANGED_FILES; do loop splits on IFS (spaces) and will break on filenames containing spaces or special characters. Consider switching to a null-terminated approach:

- CHANGED_FILES=$(git diff --name-only ${{ github.event.before }} ${{ github.sha }} -- 'content/**/*.mdx' | sed 's|^content/||')
+ CHANGED_FILES=$(git diff --name-only -z ${{ github.event.before }} ${{ github.sha }} -- 'content/**/*.mdx' \
+   | sed -z 's|^content/||' \
+   | tr '\0' '\n')

...
-   for f in $CHANGED_FILES; do
+   while IFS= read -r f; do
🧹 Nitpick comments (2)
.github/workflows/sync-docs.yml (2)

20-27: Remove trailing whitespace in YAML.

YAML-lint flags trailing spaces on lines 20, 24, 27, 55, and 66. Please trim these to comply with YAML standards.

Also applies to: 55-66


71-71: Ensure newline at end of file.

Add a newline character at EOF to satisfy the new-line-at-end-of-file rule.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1599923 and c3f2303.

📒 Files selected for processing (1)
  • .github/workflows/sync-docs.yml (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
.github/workflows/sync-docs.yml

[error] 20-20: trailing spaces

(trailing-spaces)


[error] 24-24: trailing spaces

(trailing-spaces)


[error] 27-27: trailing spaces

(trailing-spaces)


[error] 55-55: trailing spaces

(trailing-spaces)


[error] 66-66: trailing spaces

(trailing-spaces)


[error] 71-71: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Workers Builds: docs


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
agent-docs/src/agents/doc-processing/docs-orchestrator.ts (1)

9-24: Still capped at 1 000 vectors – previous feedback not addressed
The hard‐coded limit: 1000 means vectors beyond that window are silently left behind. This was already pointed out in an earlier review; please loop until no results remain or use a metadata bulk-delete API if available.
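
The "loop until no results remain" approach could look roughly like this. PathSearchableStore and searchByPath are hypothetical stand-ins for whatever search-by-metadata call the vector store exposes; the maxPages guard is an added safety measure, not something from the PR.

```typescript
// Hypothetical store surface: search returns at most `limit` matches per call.
interface PathSearchableStore {
  searchByPath(path: string, limit: number): Promise<{ key: string }[]>;
  delete(key: string): Promise<void>;
}

async function deleteAllForPath(
  store: PathSearchableStore,
  path: string,
  pageSize = 1000,
  maxPages = 10000,
): Promise<number> {
  let deleted = 0;
  for (let page = 0; page < maxPages; page += 1) {
    const matches = await store.searchByPath(path, pageSize);
    if (matches.length === 0) {
      return deleted; // nothing left for this path
    }
    for (const match of matches) {
      await store.delete(match.key);
    }
    deleted += matches.length;
  }
  // Guards against looping forever if deletes silently fail.
  throw new Error(`Gave up after ${maxPages} pages while deleting vectors for ${path}`);
}
```

Each pass re-queries from the start instead of paging with an offset, because the previous page has already been deleted.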

🧹 Nitpick comments (3)
agent-docs/src/agents/doc-processing/docs-orchestrator.ts (2)

19-21: Delete vectors in parallel or by bulk API for big docs
Sequential await ctx.vector.delete(...) can take minutes when thousands of vectors exist. If the store supports batch deletion or if you can Promise.all() deletes in chunks (e.g. groups of 50-100), overall sync time will drop dramatically.
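
A minimal sketch of chunked parallel deletion follows. The deleteOne callback stands in for the store's delete call and deleteInBatches is a hypothetical helper name.

```typescript
async function deleteInBatches(
  keys: string[],
  deleteOne: (key: string) => Promise<void>,
  batchSize = 50,
): Promise<void> {
  if (batchSize < 1) {
    throw new RangeError("batchSize must be at least 1");
  }
  for (let start = 0; start < keys.length; start += batchSize) {
    const batch = keys.slice(start, start + batchSize);
    // Requests within a batch run concurrently; batches run one after another.
    await Promise.all(batch.map((key) => deleteOne(key)));
  }
}
```

Promise.all rejects as soon as one delete fails, which stops further batches. Promise.allSettled is an alternative if the remaining deletes should continue and failures be collected afterwards.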


64-71: Consider bulk upsert for throughput
Upserting each chunk individually increases latency and API overhead. If the vector store supports it, collect chunks and issue a single upsertMany/batch call.

agent-docs/src/agents/doc-processing/index.ts (1)

23-25: Blind type-cast skips structural validation
await req.data.json() as unknown as SyncPayload forces the compiler to trust the payload shape. Using a runtime schema (e.g. zod, io-ts) would remove the need for manual field checks below and catch extra/invalid properties early.
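
If adding a schema library is not desired, a hand-written type guard gives similar runtime protection. The sketch below assumes the payload fields named in the schema suggested earlier in this review (commit, repo, changed, removed); the interface and function names are illustrative.

```typescript
interface FilePayload {
  path: string;
  content: string;
}

interface SyncPayload {
  commit?: string;
  repo?: string;
  changed: FilePayload[];
  removed: string[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Narrows an unknown JSON value to SyncPayload after checking every field.
function isSyncPayload(value: unknown): value is SyncPayload {
  if (!isRecord(value)) return false;
  const { commit, repo, changed, removed } = value;
  if (commit !== undefined && typeof commit !== "string") return false;
  if (repo !== undefined && typeof repo !== "string") return false;
  if (!Array.isArray(changed) || !Array.isArray(removed)) return false;
  const changedOk = changed.every(
    (file: unknown) =>
      isRecord(file) && typeof file.path === "string" && typeof file.content === "string",
  );
  const removedOk = removed.every((path: unknown) => typeof path === "string");
  return changedOk && removedOk;
}
```

With the guard in place, the handler can take the parsed JSON as unknown and reject the request when isSyncPayload returns false, instead of casting through unknown.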

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3f2303 and a0b7960.

📒 Files selected for processing (3)
  • agent-docs/src/agents/doc-processing/docs-orchestrator.ts (1 hunks)
  • agent-docs/src/agents/doc-processing/index.ts (2 hunks)
  • agent-docs/src/agents/doc-processing/types.ts (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • agent-docs/src/agents/doc-processing/types.ts
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Workers Builds: docs

Comment on lines +64 to +69
for (const chunk of chunks) {
chunk.metadata = {
...chunk.metadata,
path: logicalPath,
};
await ctx.vector.upsert(VECTOR_STORE_NAME, chunk);

⚠️ Potential issue

chunk.metadata may be undefined: make the fallback explicit
If processDoc returns chunks without a metadata object, the object spread does not throw (spreading undefined into an object literal copies no properties), so this is not a runtime crash. An explicit default still documents the expectation for readers and for later refactors:

-        chunk.metadata = {
-          ...chunk.metadata,
-          path: logicalPath,
-        };
+        chunk.metadata = {
+          ...(chunk.metadata ?? {}),
+          path: logicalPath,
+        };
🤖 Prompt for AI Agents
In agent-docs/src/agents/doc-processing/docs-orchestrator.ts around lines 64 to
69, the code spreads chunk.metadata, which may be undefined. Object spread
tolerates undefined, so nothing crashes, but defaulting to an empty object
before spreading makes the handling of missing metadata explicit.
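
A small check of how object spread treats a missing metadata object, as a sketch: in standard JavaScript, spreading undefined or null into an object literal contributes no properties, so the two forms below produce the same result. mergePath and mergePathExplicit are hypothetical helper names.

```typescript
type Metadata = Record<string, unknown>;

// Plain spread: undefined metadata simply adds nothing.
function mergePath(metadata: Metadata | undefined, path: string): Metadata {
  return { ...metadata, path };
}

// Explicit default: same result, with the fallback spelled out.
function mergePathExplicit(metadata: Metadata | undefined, path: string): Metadata {
  return { ...(metadata ?? {}), path };
}
```

The explicit form is therefore a readability choice. It would matter at runtime only for array spread or for code that reads properties off metadata directly.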


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (2)
.github/workflows/@sync-docs.yml (2)

17-26: Diff logic misses branch and PR changes
Comparing only the last two commits on origin/main ignores feature-branch and PR diffs. Use GitHub-provided SHAs (github.event.before/github.event.after or PR head.sha/base.sha) to capture changes on the current ref.


73-76: Leak-prone hard-coded webhook URL
Store the webhook URL in an encrypted secret (e.g., AGENTUITY_WEBHOOK_URL) and reference ${{ secrets.AGENTUITY_WEBHOOK_URL }} to avoid exposing credentials.
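
If the request were sent from a script step instead of inline curl, reading the secret could look like the sketch below. It assumes the workflow maps secrets.AGENTUITY_WEBHOOK_URL to an environment variable of the same name; getWebhookUrl is a hypothetical helper, and the URL in the example is a placeholder.

```typescript
// Hypothetical helper: reads the webhook URL from an environment-like map
// (in a Node script this would be process.env) and fails fast when the
// secret is missing or is not an https URL.
function getWebhookUrl(env: Record<string, string | undefined>): string {
  const url = env["AGENTUITY_WEBHOOK_URL"]?.trim();
  if (!url) {
    throw new Error("AGENTUITY_WEBHOOK_URL is not set");
  }
  if (!url.startsWith("https://")) {
    throw new Error("AGENTUITY_WEBHOOK_URL must be an https URL");
  }
  return url;
}

// Placeholder value for illustration only.
const exampleUrl = getWebhookUrl({
  AGENTUITY_WEBHOOK_URL: "https://example.invalid/webhook/placeholder",
});
console.log(exampleUrl.length > 0);
```

Taking the environment as a parameter keeps the helper easy to test without touching the real process environment.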

🧹 Nitpick comments (3)
.github/workflows/sync-docs-full.yml (2)

3-5: Restrict full sync trigger
Triggering a full re-sync on every push can overload CI and the vector store; scope push events to main or use a schedule for periodic full syncs.


18-18: Clean up trailing whitespace and EOF newline
Multiple lines have trailing spaces and there's no final newline, causing YAML lint errors. Please remove trailing whitespace and add a newline at EOF.

Also applies to: 22-22, 29-29, 33-33, 41-41, 48-48, 53-53, 62-62, 65-65, 68-68, 72-72, 82-82, 87-87
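
Both lint errors can be fixed mechanically. The sketch below shows one way to do it; normalizeWhitespace is a hypothetical helper, not part of the PR, and it also converts CRLF line endings to LF as a side effect.

```typescript
// Hypothetical helper: strips trailing spaces and tabs from every line and
// makes the text end with exactly one newline.
function normalizeWhitespace(source: string): string {
  const lines = source
    .split(/\r?\n/)
    .map((line) => line.replace(/[ \t]+$/, ""));
  // Drop empty lines at the end so exactly one newline is appended below.
  while (lines.length > 0 && lines[lines.length - 1] === "") {
    lines.pop();
  }
  return lines.length === 0 ? "" : lines.join("\n") + "\n";
}

const messy = "on:  \n  push:\t\n    branches: [main]  ";
const clean = normalizeWhitespace(messy);
console.log(JSON.stringify(clean));
```

Leading indentation is left untouched, which matters for YAML because indentation is significant there.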

.github/workflows/@sync-docs.yml (1)

28-28: Remove trailing spaces and ensure EOF newline
YAML lint reports multiple trailing spaces and a missing final newline. Please clean up whitespace and add a newline at EOF.

Also applies to: 32-32, 35-35, 63-63, 72-72, 76-76

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a0b7960 and 48600eb.

📒 Files selected for processing (2)
  • .github/workflows/@sync-docs.yml (1 hunks)
  • .github/workflows/sync-docs-full.yml (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/sync-docs-full.yml

10-10: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue (action)

.github/workflows/@sync-docs.yml

10-10: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue (action)

🪛 YAMLlint (1.37.1)
.github/workflows/sync-docs-full.yml

[error] 18-18: trailing spaces (trailing-spaces)
[error] 22-22: trailing spaces (trailing-spaces)
[error] 29-29: trailing spaces (trailing-spaces)
[error] 33-33: trailing spaces (trailing-spaces)
[error] 41-41: trailing spaces (trailing-spaces)
[error] 48-48: trailing spaces (trailing-spaces)
[error] 53-53: trailing spaces (trailing-spaces)
[error] 62-62: trailing spaces (trailing-spaces)
[error] 65-65: trailing spaces (trailing-spaces)
[error] 68-68: trailing spaces (trailing-spaces)
[error] 72-72: trailing spaces (trailing-spaces)
[error] 82-82: trailing spaces (trailing-spaces)
[error] 87-87: no new line character at the end of file (new-line-at-end-of-file)
[error] 87-87: trailing spaces (trailing-spaces)

.github/workflows/@sync-docs.yml

[error] 28-28: trailing spaces (trailing-spaces)
[error] 32-32: trailing spaces (trailing-spaces)
[error] 35-35: trailing spaces (trailing-spaces)
[error] 63-63: trailing spaces (trailing-spaces)
[error] 72-72: trailing spaces (trailing-spaces)
[error] 76-76: no new line character at the end of file (new-line-at-end-of-file)
[error] 76-76: trailing spaces (trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Workers Builds: docs

name: Sync Docs to Vector Store (PR & Push)

on:
  workflow_dispatch

⚠️ Potential issue

Fix workflow_dispatch syntax
The workflow_dispatch event declaration must include a colon (workflow_dispatch:). Without it, the workflow won’t trigger.

🤖 Prompt for AI Agents
In .github/workflows/@sync-docs.yml at line 4, the workflow_dispatch event is
missing a colon. Add a colon after workflow_dispatch to correct the syntax,
changing it from "workflow_dispatch" to "workflow_dispatch:" so the workflow
triggers properly.

afterrburn and others added 7 commits June 17, 2025 22:04
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Seng Rith <50646727+afterrburn@users.noreply.github.com>
@afterrburn afterrburn merged commit 8d70955 into main Jun 19, 2025
2 checks passed
@afterrburn afterrburn deleted the srith/agent-391-doc-processor branch June 19, 2025 01:34
@coderabbitai coderabbitai bot mentioned this pull request Jun 19, 2025
This was referenced Jun 28, 2025
@coderabbitai coderabbitai bot mentioned this pull request Sep 5, 2025
@coderabbitai coderabbitai bot mentioned this pull request Sep 24, 2025