Conversation

afterrburn
@afterrburn afterrburn commented Jun 19, 2025

Summary by CodeRabbit

  • New Features
    • Introduced a new "doc-qa" agent capable of answering questions based on developer documentation.
    • Added support for retrieving and displaying relevant documentation in response to user prompts.
  • Bug Fixes
    • Cleaned up extraneous code and improved error handling in document processing and embedding logic.
  • Chores
    • Modularized and externalized documentation sync workflows, improving reliability and maintainability.
    • Added multiple utility scripts for collecting, validating, building payloads, and sending documentation files.
  • Documentation
    • Added and updated type definitions to improve clarity and maintainability of code related to document processing and retrieval.
  • Configuration
    • Added a new configuration constant to control the number of vector search results returned.

coderabbitai bot commented Jun 19, 2025

Walkthrough

This update introduces a new "doc-qa" agent for question answering over developer documentation, adds supporting types and configuration, and modularizes the document sync process by replacing inline shell logic in GitHub Actions workflows with standalone scripts for file collection, validation, payload building, and webhook sending. Several TypeScript types and import paths are refactored for clarity and maintainability.

Changes

File(s) | Change Summary
--- | ---
.github/workflows/sync-docs.yml, .github/workflows/sync-docs-full.yml | Refactored workflows to use external scripts for file collection, validation, payload building, and webhook sending; removed inline shell logic and JSON construction.
bin/collect-all-files.sh, bin/collect-changed-files.sh, bin/validate-files.sh | Added scripts to collect all or changed .mdx files, and validate file paths for sync operations.
bin/build-payload.sh, bin/send-webhook.sh | Added scripts to construct JSON payloads from file lists and send them to a webhook with retry logic.
agent-docs/agentuity.yaml | Added a new agent entry: doc-qa, for answering questions based on developer documentation.
agent-docs/src/agents/doc-qa/index.ts, agent-docs/src/agents/doc-qa/types.ts | Implemented the doc-qa agent: retrieves relevant docs from vector store, streams GPT-4o answers, and defines a RelevantDoc interface.
agent-docs/src/agents/doc-processing/docs-orchestrator.ts | Updated import path for VECTOR_STORE_NAME, removed unused import, and added logging of vector upsert results.
agent-docs/src/agents/doc-processing/docs-processor.ts, agent-docs/src/agents/doc-processing/types.ts | Moved ChunkMetadata type to a separate file and updated imports accordingly.
agent-docs/src/agents/doc-processing/embed-chunks.ts | Removed an extraneous closing brace; no functional change.
agent-docs/src/agents/doc-processing/index.ts | Removed unused type import.
config.ts | Added exported constant vectorSearchNumber (value: 20).

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant doc-qa Agent
    participant VectorStore
    participant OpenAI GPT-4o

    User->>doc-qa Agent: Sends question prompt
    doc-qa Agent->>VectorStore: retrieveRelevantDocs(prompt)
    VectorStore-->>doc-qa Agent: Returns relevant document vectors
    doc-qa Agent->>VectorStore: retrieveDocumentBasedOnPath(path) (for each doc)
    VectorStore-->>doc-qa Agent: Returns full document content(s)
    doc-qa Agent->>OpenAI GPT-4o: Streams answer with system prompt and docs
    OpenAI GPT-4o-->>doc-qa Agent: Streams answer JSON
    doc-qa Agent-->>User: Returns streamed answer and doc paths

Possibly related PRs

  • Create document processor #195: Introduced the initial document processor agent and a GitHub workflow for syncing docs, which this PR now refactors to use external scripts for modularity and maintainability.

Suggested reviewers

  • jhaynie
  • rblalock

Poem

In the warren where code and docs entwine,
A new doc-qa agent hops in line.
Scripts now scurry, neat and spry,
Gathering files as workflows fly.
With answers streamed both swift and bright—
This bunny’s work feels just right!
🐇✨

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

npm error Exit handler never called!
npm error This is an error with npm itself. Please report this error at:
npm error https://github.com/npm/cli/issues
npm error A complete log of this run can be found in: /.npm/_logs/2025-06-20T14_10_31_458Z-debug-0.log


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d07da8 and f1d2d28.

📒 Files selected for processing (1)
  • agent-docs/src/agents/doc-qa/index.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • agent-docs/src/agents/doc-qa/index.ts
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Workers Builds: docs

cloudflare-workers-and-pages bot commented Jun 19, 2025

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status | Name | Latest Commit | Preview URL | Updated (UTC)
--- | --- | --- | --- | ---
✅ Deployment successful! (View logs) | docs | b2c4f50 | Commit Preview URL | Jun 20 2025, 02:32 PM

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
.github/workflows/sync-docs-full.yml (3)

3-5: Consider re-adding manual invocation
Might be useful to retain workflow_dispatch for emergency or ad-hoc runs:

on:
  push:
  workflow_dispatch:

55-57: Include cleanup of temporary file list
mdx_files.txt remains after the run. Consider removing it as well to keep the workspace clean:

+ rm -f mdx_files.txt

25-25: Remove trailing spaces
YAML lint reports trailing spaces on these blank lines. Please strip them to satisfy the linter.

Also applies to: 40-40, 43-43, 54-54
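
A minimal sketch of one way to clear this kind of finding in a single pass: it strips the trailing whitespace flagged here and also appends a final newline if one is missing. `normalize_file` is a hypothetical helper name and is not part of this PR; the sketch assumes bash with standard `sed`, `tail`, and `mktemp`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# normalize_file is a hypothetical helper, not part of this PR.
normalize_file() {
  local file="$1" tmp
  tmp="$(mktemp)"
  # Strip trailing whitespace (spaces, tabs, carriage returns) from every line.
  sed -e 's/[[:space:]]*$//' "$file" > "$tmp"
  # Command substitution drops a trailing newline, so a non-empty result
  # means the last byte is not a newline and one has to be appended.
  if [ -s "$tmp" ] && [ -n "$(tail -c 1 "$tmp")" ]; then
    printf '\n' >> "$tmp"
  fi
  # Write back in place so the original file mode is preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

It could be run locally as `normalize_file .github/workflows/sync-docs-full.yml` before committing.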

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 8d70955 and ef7bfb6.

📒 Files selected for processing (1)
  • .github/workflows/sync-docs-full.yml (2 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
.github/workflows/sync-docs-full.yml

[error] 25-25: trailing spaces

(trailing-spaces)


[error] 40-40: trailing spaces

(trailing-spaces)


[error] 43-43: trailing spaces

(trailing-spaces)


[error] 54-54: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Workers Builds: docs
🔇 Additional comments (4)
.github/workflows/sync-docs-full.yml (4)

3-5: Automatic trigger on push
Switching from workflow_dispatch to push ensures docs sync occurs on every push without manual steps.


21-39: Iterative JSON assembly improves readability
Replacing a complex jq pipeline with a shell loop makes it easier to follow and handle file-level errors. Skips empty lines and non-existent paths gracefully.


41-42: Consistent removed files generation
Mirroring the changed-files approach to build removed_files.json keeps the full-sync logic symmetric and clear.


44-53: Clear final payload construction
Using --slurpfile for both changed and removed arrays with a default fallback ensures the payload is always valid JSON.

@coderabbitai coderabbitai bot requested review from jhaynie and rblalock June 19, 2025 02:28
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
agent-docs/src/agents/doc-processing/docs-orchestrator.ts (1)

79-80: Verify the return type of vector upsert operation.

The logging addition improves observability, but please verify that the ctx.vector.upsert() method returns an object with a length property. If the return type is not an array or array-like object, this will cause a runtime error.

Run the following script to verify the upsert method signature and return type:

#!/bin/bash
# Description: Search for the vector upsert method definition and its return type
# Expected: Find the method signature and return type documentation

# Search for upsert method definition in the codebase
ast-grep --pattern 'upsert($_, $_) {
  $$$
}'

# Search for upsert method type definitions
rg -A 5 "upsert.*:" --type=ts

# Search for vector context interface or type definitions
rg -A 10 -B 2 "interface.*Vector|type.*Vector" --type=ts

Consider the log level appropriateness:

Info-level logging for individual chunk operations might be verbose in production. Consider using debug level instead:

-        ctx.logger.info('Upserted chunk: %o', result.length);
+        ctx.logger.debug('Upserted chunk: %o', result.length);
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between ef7bfb6 and 2ec0e46.

📒 Files selected for processing (2)
  • .github/workflows/sync-docs-full.yml (2 hunks)
  • agent-docs/src/agents/doc-processing/docs-orchestrator.ts (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
agent-docs/src/agents/doc-processing/docs-orchestrator.ts (1)
agent-docs/src/agents/doc-processing/config.ts (1)
  • VECTOR_STORE_NAME (1-1)
🪛 YAMLlint (1.37.1)
.github/workflows/sync-docs-full.yml

[error] 25-25: trailing spaces

(trailing-spaces)


[error] 40-40: trailing spaces

(trailing-spaces)


[error] 43-43: trailing spaces

(trailing-spaces)


[error] 54-54: trailing spaces

(trailing-spaces)


[error] 73-73: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Workers Builds: docs

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
.github/workflows/sync-docs-full.yml (1)

25-25: Address the formatting issues that were previously flagged.

The static analysis tools are still flagging the same trailing spaces and missing newline issues that were identified in previous reviews. These formatting issues need to be resolved.

Also applies to: 40-40, 43-43, 54-54, 57-57, 76-76

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 2ec0e46 and afec57f.

📒 Files selected for processing (3)
  • .github/workflows/sync-docs-full.yml (2 hunks)
  • agent-docs/agentuity.yaml (1 hunks)
  • agent-docs/src/agents/doc-qa/index.ts (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • agent-docs/agentuity.yaml
🧰 Additional context used
🪛 YAMLlint (1.37.1)
.github/workflows/sync-docs-full.yml

[error] 25-25: trailing spaces

(trailing-spaces)


[error] 40-40: trailing spaces

(trailing-spaces)


[error] 43-43: trailing spaces

(trailing-spaces)


[error] 54-54: trailing spaces

(trailing-spaces)


[error] 57-57: trailing spaces

(trailing-spaces)


[error] 76-76: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Workers Builds: docs
🔇 Additional comments (1)
.github/workflows/sync-docs-full.yml (1)

63-64: Good addition of authentication.

The addition of the Authorization header with the bearer token from GitHub secrets properly secures the webhook endpoint communication.

Also applies to: 74-74

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (2)
.github/workflows/sync-docs-full.yml (2)

16-16: Remove trailing spaces.
Eliminate trailing whitespace on this blank line to satisfy YAMLlint.


26-26: Add newline at end-of-file.
Include a trailing newline to comply with POSIX text-file conventions and the YAMLlint new-line-at-end-of-file rule.

🧹 Nitpick comments (12)
bin/collect-all-files.sh (2)

15-23: Handle filenames with spaces and special characters
A line-based loop breaks on filenames that contain newlines, and read without IFS= trims leading and trailing whitespace. Consider a null-delimited approach:

-find content -type f -name "*.mdx" | \
-    sed 's|^content/||' | \
-    sort | \
-    while read -r file; do
+find content -type f -name "*.mdx" -print0 | \
+    sed -z 's|^content/||' | \
+    sort -z | \
+    while IFS= read -r -d '' file; do
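
For reference, a self-contained sketch of the null-delimited pattern. It is an illustration under assumptions, not the script from this PR: `list_mdx` is a hypothetical helper name, and the prefix is stripped with bash parameter expansion instead of `sed -z` and `sort -z`, which are GNU extensions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# list_mdx is a hypothetical helper; it prints .mdx paths relative to the given root.
list_mdx() {
  local root="$1" file
  # -print0 and read -d '' keep each path intact, including embedded spaces.
  find "$root" -type f -name '*.mdx' -print0 |
    while IFS= read -r -d '' file; do
      # Drop the leading "<root>/" prefix only; the rest of the path is untouched.
      printf '%s\n' "${file#"$root"/}"
    done
}
```

Usage would look like `list_mdx content | LC_ALL=C sort`. The output is still newline-delimited, so a filename containing a newline would need NUL-delimited handling downstream as well.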

26-27: Avoid scanning twice
You’re calling find again to count files. Capture the list once and reuse it to improve performance:

files=$(find content -type f -name "*.mdx")
echo "$files" | sed ... 
file_count=$(echo "$files" | wc -l)
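
One caveat with the snippet above: `echo "$files" | wc -l` reports 1 even when the list is empty. A hedged alternative is to collect the paths into a bash array once and read the count from the array length. `collect_mdx` is a hypothetical name used only for this sketch.

```shell
#!/usr/bin/env bash
set -euo pipefail

# collect_mdx is a hypothetical helper: one find pass, count taken from the array.
collect_mdx() {
  local root="$1" f
  local -a files=()
  while IFS= read -r -d '' f; do
    files+=("${f#"$root"/}")
  done < <(find "$root" -type f -name '*.mdx' -print0)
  # The count goes to stderr so stdout carries only the file list.
  echo "Found ${#files[@]} MDX files" >&2
  # Guard the expansion: older bash versions reject an empty array under set -u.
  if [ "${#files[@]}" -gt 0 ]; then
    printf '%s\n' "${files[@]}"
  fi
}
```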
bin/validate-files.sh (2)

27-31: Quote file paths in warnings
To avoid ambiguity when a filename contains spaces or special chars, wrap the path in quotes in warning messages:

-echo "Warning: Unsafe path detected, skipping: $file" >&2
+echo "Warning: Unsafe path detected, skipping: '$file'" >&2

47-50: Graceful exit when no valid files
Exiting with an error if there are zero valid files can break incremental syncs with no changes. Consider exiting with status 0 and emitting an informational message instead:

-if [ "$valid_count" -eq 0 ]; then
-   echo "Error: No valid files found" >&2
-   exit 1
-fi
+if [ "$valid_count" -eq 0 ]; then
+   echo "No valid files to process, exiting gracefully" >&2
+   exit 0
+fi
bin/send-webhook.sh (2)

25-30: Stream payload directly to curl
Storing the entire payload in a variable can be memory-intensive and may alter newlines. You can pipe stdin directly into curl:

- # Read payload from stdin
- payload=$(cat)
+ # Stream payload from stdin

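Then add --data-binary @- to curl_args so that curl reads the payload from stdin, and invoke it without the intermediate variable:

curl "${curl_args[@]}" "$WEBHOOK_URL"

Note that stdin can be read only once, so a retried request would still need the payload buffered in a variable or a temporary file.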

33-36: Ensure jq is available
Before using jq for JSON validation, verify it’s installed to fail fast:

command -v jq >/dev/null 2>&1 || { echo "Error: jq is required but not found" >&2; exit 1; }
bin/build-payload.sh (2)

57-76: Use a JSON tool to build payload
Manually concatenating JSON risks quoting and escaping bugs. Consider using jq:

jq -Rn \
  --arg repo "$REPO_NAME" \
  --argjson changed "$(printf '%s\n' "${changed_files[@]}" | jq -R . | jq -s '.')" \
  --argjson removed "$(printf '%s\n' "${removed_files[@]}" | jq -R . | jq -s '.')" \
  '{repo: $repo, changed: $changed, removed: $removed}'
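
A variant of the idea above that also copes with empty lists, offered as a sketch only. `build_payload` is a hypothetical function name, the two lists are assumed to be newline-separated strings, and the key names `repo`, `changed`, and `removed` are taken from the suggestion above, not verified against the webhook's actual schema.

```shell
#!/usr/bin/env bash
set -euo pipefail

# build_payload is a hypothetical helper; arguments are the repo name and
# two newline-separated path lists (either may be empty).
build_payload() {
  local repo="$1" changed_list="$2" removed_list="$3"
  # jq performs all quoting and escaping, so paths may contain quotes or spaces.
  jq -cn \
    --arg repo "$repo" \
    --arg changed "$changed_list" \
    --arg removed "$removed_list" \
    'def nonempty_lines: split("\n") | map(select(length > 0));
     {repo: $repo,
      changed: ($changed | nonempty_lines),
      removed: ($removed | nonempty_lines)}'
}
```

An empty list becomes `[]`, which avoids the single empty string that `printf '%s\n'` emits for an empty array.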

46-51: Verify mode consistency
Currently, in full mode you treat all changed files as removed but still include them in the changed array (so you delete and then re-upload). Confirm this is intentional; if you only want a purge, you may want to clear changed_files in full mode:

 if [ "$MODE" = "full" ]; then
     removed_files=("${changed_files[@]}")
+    changed_files=()
     echo "Full mode: treating all files as removed for refresh" >&2
 fi
.github/workflows/sync-docs-full.yml (1)

24-26: Simplify input handling and catch errors.
Replace the redundant cat with direct input redirection and enable pipefail so any failure bubbles up immediately.

 run: |
+  set -o pipefail
-  cat all-files.txt | \
-    ./bin/build-payload.sh "${{ github.repository }}" full | \
-    ./bin/send-webhook.sh "https://agentuity.ai/webhook/f61d5ce9d6ed85695cc992c55ccdc2a6" "Bearer $AGENTUITY_TOKEN"
+  ./bin/build-payload.sh "${{ github.repository }}" full < all-files.txt | \
+    ./bin/send-webhook.sh "https://agentuity.ai/webhook/f61d5ce9d6ed85695cc992c55ccdc2a6" "Bearer $AGENTUITY_TOKEN"
.github/workflows/sync-docs.yml (3)

29-31: Remove redundant cat and enforce error exit.
Streamline the payload step by reading changed-files.txt directly and using pipefail to catch failures without extra subshells.

 run: |
+  set -o pipefail
-  cat changed-files.txt | \
-    ./bin/build-payload.sh "${{ github.repository }}" incremental | \
-    ./bin/send-webhook.sh "https://agentuity.ai/webhook/f61d5ce9d6ed85695cc992c55ccdc2a6" "Bearer $AGENTUITY_TOKEN"
+  ./bin/build-payload.sh "${{ github.repository }}" incremental < changed-files.txt | \
+    ./bin/send-webhook.sh "https://agentuity.ai/webhook/f61d5ce9d6ed85695cc992c55ccdc2a6" "Bearer $AGENTUITY_TOKEN"

21-21: Remove trailing spaces.
Clear any trailing whitespace here to resolve YAMLlint errors.


31-31: Add newline at end-of-file.
Ensure a final newline to comply with POSIX text-file conventions and the YAMLlint new-line-at-end-of-file rule.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between afec57f and bd0ec71.

📒 Files selected for processing (7)
  • .github/workflows/sync-docs-full.yml (1 hunks)
  • .github/workflows/sync-docs.yml (1 hunks)
  • bin/build-payload.sh (1 hunks)
  • bin/collect-all-files.sh (1 hunks)
  • bin/collect-changed-files.sh (1 hunks)
  • bin/send-webhook.sh (1 hunks)
  • bin/validate-files.sh (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Full Docs Sync to Vector Store
bin/collect-all-files.sh

[error] 20-20: Shell script error: 'echo' write error: Broken pipe. The script exited with code 1.

🪛 actionlint (1.7.7)
.github/workflows/sync-docs.yml

28-28: shellcheck reported issue in this script: SC2002:style:1:5: Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead

(shellcheck)

.github/workflows/sync-docs-full.yml

23-23: shellcheck reported issue in this script: SC2002:style:1:5: Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead

(shellcheck)

🪛 YAMLlint (1.37.1)
.github/workflows/sync-docs.yml

[error] 21-21: trailing spaces

(trailing-spaces)


[error] 31-31: no new line character at the end of file

(new-line-at-end-of-file)

.github/workflows/sync-docs-full.yml

[error] 16-16: trailing spaces

(trailing-spaces)


[error] 26-26: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Workers Builds: docs
🔇 Additional comments (4)
bin/send-webhook.sh (1)

1-4: Well-structured retry logic
The script cleanly handles retries with exponential backoff and error reporting.
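
The script body is not reproduced in this thread, so the following is only a generic sketch of retry with exponential backoff, not the code in bin/send-webhook.sh. `retry_with_backoff` and its argument order (maximum attempts, initial delay in seconds, then the command) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# retry_with_backoff is a hypothetical helper:
#   retry_with_backoff <max_attempts> <initial_delay_seconds> <command> [args...]
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while true; do
    # Run the command; success ends the loop immediately.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts" >&2
      return 1
    fi
    echo "Attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    # Double the wait before the next attempt.
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

A call such as `retry_with_backoff 3 2 curl --fail --silent --show-error --data-binary @payload.json "$WEBHOOK_URL"` would wait 2s and then 4s between attempts; `payload.json` and `WEBHOOK_URL` are placeholders here.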

bin/collect-changed-files.sh (1)

1-55: Solid implementation for detecting changed MDX files
The script correctly parses added, modified, and deleted files with clear logging and safe Git refs validation.

bin/build-payload.sh (1)

1-3: Clear usage and strict error handling
Shebang, set -euo pipefail, and the usage function are well-defined.

.github/workflows/sync-docs-full.yml (1)

4-4: Verify full-sync trigger scope.
The workflow now runs on every push across all branches, triggering a full vector sync each time. Confirm this broad trigger aligns with your performance and usage expectations.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
.github/workflows/sync-docs-full.yml (1)

13-15: Enable fast-fail on collection pipeline
Include set -o pipefail at the start of this step so any failure in collect-all-files.sh or validate-files.sh causes the job to fail immediately.

 run: |
+  set -o pipefail
   ./bin/collect-all-files.sh | \
     ./bin/validate-files.sh > all-files.txt
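
To illustrate why this matters, the short demonstration below compares a pipeline's exit status with and without pipefail. It uses `false | cat` as a stand-in for a failing collector piped into a succeeding validator; the variable names are illustrative only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Without pipefail, a pipeline reports the status of its last command only,
# so the failure of the first stage is hidden.
status_default="$(bash -c 'false | cat; echo $?')"

# With pipefail, the pipeline reports the failing stage's non-zero status.
status_pipefail="$(bash -c 'set -o pipefail; false | cat; echo $?')"

echo "default: $status_default, pipefail: $status_pipefail"
# prints "default: 0, pipefail: 1"
```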
🧹 Nitpick comments (2)
.github/workflows/sync-docs-full.yml (2)

16-16: Remove trailing whitespace
This blank line contains trailing spaces and triggers a lint error.

-          
+

26-26: Add newline at end of file
POSIX standards and linters expect a trailing newline.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 3cbb4c9 and ff284c1.

📒 Files selected for processing (1)
  • .github/workflows/sync-docs-full.yml (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
.github/workflows/sync-docs-full.yml

[error] 16-16: trailing spaces

(trailing-spaces)


[error] 26-26: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Workers Builds: docs

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
.github/workflows/sync-docs-full.yml (1)

28-28: Secure webhook endpoint and add newline at EOF.
Replace the hard-coded URL with a GitHub secret and add a terminating newline:

-          ./bin/send-webhook.sh "https://agentuity.ai/webhook/…" "Bearer $AGENTUITY_TOKEN"
+          ./bin/send-webhook.sh "${{ secrets.SYNC_DOCS_WEBHOOK_URL }}" "Bearer $AGENTUITY_TOKEN"

+# (ensure newline at end of file)
🧹 Nitpick comments (3)
.github/workflows/sync-docs-full.yml (1)

26-27: Avoid useless use of cat.
ShellCheck SC2002 warns against cat all-files.txt | cmd. Use input redirection instead:

-          cat all-files.txt | \
-            ./bin/build-payload.sh "${{ github.repository }}" full | \
+          ./bin/build-payload.sh "${{ github.repository }}" full < all-files.txt | \
             ./bin/send-webhook.sh "https://agentuity.ai/webhook/…" "Bearer $AGENTUITY_TOKEN"
.github/workflows/sync-docs.yml (2)

33-34: Eliminate unnecessary cat.
Use input redirection to feed changed-files.txt:

-          cat changed-files.txt | \
-            ./bin/build-payload.sh "${{ github.repository }}" incremental | \
+          ./bin/build-payload.sh "${{ github.repository }}" incremental < changed-files.txt | \
             ./bin/send-webhook.sh "https://agentuity.ai/webhook/…" "Bearer $AGENTUITY_TOKEN"

24-24: Remove trailing whitespace.
YAMLlint flags trailing spaces on this line.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between ff284c1 and 8928aa3.

📒 Files selected for processing (3)
  • .github/workflows/sync-docs-full.yml (1 hunks)
  • .github/workflows/sync-docs.yml (1 hunks)
  • bin/collect-all-files.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • bin/collect-all-files.sh
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/sync-docs-full.yml

24-24: shellcheck reported issue in this script: SC2002:style:2:5: Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead

(shellcheck)

.github/workflows/sync-docs.yml

31-31: shellcheck reported issue in this script: SC2002:style:2:5: Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead

(shellcheck)

🪛 YAMLlint (1.37.1)
.github/workflows/sync-docs-full.yml

[error] 17-17: trailing spaces

(trailing-spaces)


[error] 28-28: no new line character at the end of file

(new-line-at-end-of-file)

.github/workflows/sync-docs.yml

[error] 24-24: trailing spaces

(trailing-spaces)


[error] 35-35: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Workers Builds: docs
🔇 Additional comments (7)
.github/workflows/sync-docs-full.yml (3)

12-12: Step definition looks good.
This clearly separates file collection and validation into its own job step.


14-16: Error handling with set -euo pipefail is correctly implemented.
It ensures failures in either script abort the workflow immediately.


18-19: Clarity of output is adequate.
Logging the list of files to sync aids debugging.

.github/workflows/sync-docs.yml (4)

15-16: Full-history checkout added.
fetch-depth: 0 ensures diffs include the full commit history needed for comparisons.


20-23: Pipeline error handling & fetch context are solid.
Combining set -euo pipefail with git fetch origin provides robust failure detection and proper diff base.


26-27: Logging of changed files is clear.
Echoing and displaying changed-files.txt aids troubleshooting.


31-31: Repeated error handling is consistent.
Reapplying set -euo pipefail in the second step is good practice.

set -euo pipefail
cat changed-files.txt | \
./bin/build-payload.sh "${{ github.repository }}" incremental | \
./bin/send-webhook.sh "https://agentuity.ai/webhook/f61d5ce9d6ed85695cc992c55ccdc2a6" "Bearer $AGENTUITY_TOKEN"

⚠️ Potential issue

Secure webhook URL and add newline at EOF.
Replace the hard-coded endpoint with a GitHub secret and append a newline:

-            ./bin/send-webhook.sh "https://agentuity.ai/webhook/…"
+            ./bin/send-webhook.sh "${{ secrets.SYNC_DOCS_WEBHOOK_URL }}"

+# (ensure newline at end of file)

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 35-35: no new line character at the end of file

(new-line-at-end-of-file)

🤖 Prompt for AI Agents
In .github/workflows/sync-docs.yml at line 35, the webhook URL is hard-coded,
which is insecure. Replace the URL string with a reference to a GitHub secret
(e.g., ${{ secrets.WEBHOOK_URL }}) to secure the endpoint. Also, ensure the file
ends with a newline character to comply with POSIX standards.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (1)
agent-docs/src/agents/doc-qa/index.ts (1)

89-95: Clarify the empty space query and arbitrary limit.

The query uses a single space ' ' and a limit of 10000, which seems arbitrary and potentially inefficient for large document collections.

Consider using a more explicit approach:

async function retrieveDocumentBasedOnPath(ctx: AgentContext, path: string): Promise<string> {
  const dbQuery = {
-   query: ' ',
-   limit: 10000,
+   query: '', // Empty query to match all chunks for this path
+   limit: 1000, // More reasonable limit, adjust based on your max document size
    metadata: {
      path: path
    }
  }

Or better yet, if the vector store supports it, use a metadata-only query without requiring a text query.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 8928aa3 and 4dccc27.

📒 Files selected for processing (7)
  • agent-docs/src/agents/doc-processing/docs-orchestrator.ts (2 hunks)
  • agent-docs/src/agents/doc-processing/docs-processor.ts (1 hunks)
  • agent-docs/src/agents/doc-processing/embed-chunks.ts (1 hunks)
  • agent-docs/src/agents/doc-processing/types.ts (1 hunks)
  • agent-docs/src/agents/doc-qa/config.ts (1 hunks)
  • agent-docs/src/agents/doc-qa/index.ts (1 hunks)
  • agent-docs/src/agents/doc-qa/types.ts (1 hunks)
✅ Files skipped from review due to trivial changes (5)
  • agent-docs/src/agents/doc-processing/embed-chunks.ts
  • agent-docs/src/agents/doc-processing/types.ts
  • agent-docs/src/agents/doc-processing/docs-processor.ts
  • agent-docs/src/agents/doc-qa/types.ts
  • agent-docs/src/agents/doc-qa/config.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • agent-docs/src/agents/doc-processing/docs-orchestrator.ts
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Workers Builds: docs
🔇 Additional comments (2)
agent-docs/src/agents/doc-qa/index.ts (2)

1-7: LGTM: Clean imports and proper typing.

The imports are well-organized and include proper TypeScript types.


88-116: Good error handling implementation.

The error handling in retrieveDocumentBasedOnPath is well-implemented with proper logging and graceful degradation.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
agent-docs/src/agents/doc-qa/index.ts (1)

25-39: JSON response format is requested but not validated or parsed.

The system prompt explicitly requests a JSON response with "answer" and "documents" fields, but the streaming response bypasses any JSON validation. This could lead to malformed responses being sent to clients if the LLM doesn't follow the JSON format correctly.
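
A minimal sketch of the kind of check this comment has in mind, assuming the full model output is buffered into a string before it is returned (validation of this kind cannot be applied to a response that is streamed straight through). `QaResponse` and `parseQaResponse` are hypothetical names, and the assumption that `documents` is an array of strings is not confirmed by the PR.

```typescript
// Shape the system prompt asks the model to return (assumed field types).
interface QaResponse {
  answer: string;
  documents: string[];
}

// Parse raw model output; return null when it is not the expected JSON.
function parseQaResponse(raw: string): QaResponse | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not valid JSON at all
  }
  if (typeof value !== 'object' || value === null) {
    return null; // JSON, but not an object
  }
  const { answer, documents } = value as Record<string, unknown>;
  if (typeof answer !== 'string') return null;
  if (!Array.isArray(documents)) return null;
  if (!documents.every((d): d is string => typeof d === 'string')) return null;
  return { answer, documents };
}
```

Returning `null` instead of throwing leaves the caller free to decide how to degrade, for example by sending the raw text as the answer with an empty document list.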

🧹 Nitpick comments (3)
agent-docs/src/agents/doc-qa/index.ts (3)

58-60: Clean up formatting and variable declaration.

The stray empty lines should be removed and the var declaration replaced with const for consistency.

  try {
-
-
-    var vectors = await ctx.vector.search(VECTOR_STORE_NAME, dbQuery);
+    const vectors = await ctx.vector.search(VECTOR_STORE_NAME, dbQuery);

93-94: Consider using a more explicit query for document retrieval.

Using a single space ' ' as the query parameter seems unusual. Consider using an empty string '' or a more explicit approach like omitting the query parameter entirely when filtering by metadata only.

  const dbQuery = {
-    query: ' ',
+    query: '',
    limit: 10000,
    metadata: {
      path: path
    }
  }

104-108: Optimize redundant type casting.

The metadata is cast to ChunkMetadata twice in consecutive lines, which is redundant and can be simplified.

    const sortedVectors = vectors
-      .map(vec => ({
-        metadata: vec.metadata as ChunkMetadata,
-        index: (vec.metadata as ChunkMetadata).chunkIndex
-      }))
+      .map(vec => {
+        const metadata = vec.metadata as ChunkMetadata;
+        return {
+          metadata,
+          index: metadata.chunkIndex
+        };
+      })
      .sort((a, b) => a.index - b.index);
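
For context, the sort-by-`chunkIndex` step can be sketched end to end as follows. This is an illustration under assumptions rather than the PR's implementation: the `text` field on `ChunkMetadata`, the `VectorResult` shape, and the `assembleDocument` helper are hypothetical, and only `chunkIndex` appears in the diff above.

```typescript
// Assumed subset of the PR's ChunkMetadata; `text` is a hypothetical field.
interface ChunkMetadata {
  chunkIndex: number;
  text: string;
}

// Hypothetical minimal shape of one vector search result.
interface VectorResult {
  metadata?: unknown;
}

// Cast each result's metadata once, order by chunk index, and join the text.
function assembleDocument(vectors: VectorResult[]): string {
  return vectors
    .map(vec => vec.metadata as ChunkMetadata) // single cast per result
    .sort((a, b) => a.chunkIndex - b.chunkIndex)
    .map(meta => meta.text)
    .join('\n\n');
}
```

Because `map` returns a new array, the in-place `sort` does not reorder the caller's `vectors`.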
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5efc854 and 7d07da8.

📒 Files selected for processing (1)
  • agent-docs/src/agents/doc-qa/index.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
agent-docs/src/agents/doc-qa/index.ts (1)
Learnt from: afterrburn
PR: agentuity/docs#198
File: agent-docs/src/agents/doc-qa/index.ts:9-50
Timestamp: 2025-06-20T13:09:14.197Z
Learning: In the agent-docs codebase, the user prefers to let exceptions throw from Agent functions rather than adding local try-catch blocks, allowing errors to be handled at a higher level in the application.
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Workers Builds: docs
🔇 Additional comments (1)
agent-docs/src/agents/doc-qa/index.ts (1)

94-94: Verify the high limit for document chunk retrieval.

The limit of 10000 for retrieving document chunks seems very high. This could potentially cause performance issues or memory constraints for large documents.

Consider whether this limit is necessary or if it could be reduced to a more reasonable value based on typical document sizes in your system.
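
If the limit is lowered, it may help to detect when a result set reaches it, since that suggests the document has more chunks than were returned. A small sketch, assuming hypothetical names (`hitLimit`, `CHUNK_LIMIT`) and a made-up limit of 500:

```typescript
// Returns true when the result count reaches the limit, which suggests
// that more chunks may exist than were returned.
function hitLimit<T>(results: readonly T[], limit: number): boolean {
  return results.length >= limit;
}

// Hypothetical usage with an assumed, lower limit and placeholder results.
const CHUNK_LIMIT = 500;
const fakeResults = new Array<number>(CHUNK_LIMIT).fill(0);
if (hitLimit(fakeResults, CHUNK_LIMIT)) {
  console.warn(`Retrieved ${fakeResults.length} chunks; the document may be truncated`);
}
```

Logging at this point turns a silent truncation into something visible, which makes it easier to choose a limit from observed document sizes.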

model: openai('gpt-4o'),
system: systemPrompt,
prompt: prompt,
maxTokens: 2048,
Member


This model's context window is 128,000 tokens. Why is the limit so low here?

Contributor Author


Oh yeah, I'll raise the limit in the next PR. I mainly wanted to keep the generation short and concise for now.

@afterrburn afterrburn merged commit c581ebc into main Jun 22, 2025
2 checks passed
@afterrburn afterrburn deleted the seng/fix-gh-action branch June 22, 2025 16:01
@coderabbitai coderabbitai bot mentioned this pull request Jun 28, 2025
@coderabbitai coderabbitai bot mentioned this pull request Sep 8, 2025
@coderabbitai coderabbitai bot mentioned this pull request Sep 16, 2025
2 participants