
feat: adds workflow to ingest all docs #24

Merged
semmet95 merged 1 commit into main from feat/all-docs-ingestion
May 10, 2026

Conversation

@semmet95
Contributor

@semmet95 semmet95 commented May 10, 2026

Fixes: #17

Summary by CodeRabbit

  • Chores
    • Added a manual documentation ingestion workflow to let maintainers trigger doc processing.
    • Ingestion now collects and processes documentation parts in a fixed order and exits early when nothing new is found.
    • The workflow only runs post-processing when new files are detected.
    • Updated CI Python setup to the latest supported version for improved workflow reliability.

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ea9a7a99-6aa5-4f61-b7ed-902e38276305

📥 Commits

Reviewing files that changed from the base of the PR and between d94102b and 4002779.

📒 Files selected for processing (2)
  • .github/workflows/ingest_all.yml
  • .github/workflows/post_on_merge.yml
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/post_on_merge.yml
  • .github/workflows/ingest_all.yml

📝 Walkthrough

Adds a new manual "Ingest all docs" workflow and updates the post-merge workflow to produce a deterministic ADDED_FILES ordering (sources → claims → proofs). Both workflows install Python deps, discover files in those directories, export an ordered list to GITHUB_ENV, and conditionally run the ingestion script when files exist.
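
As a rough illustration of the flow described above, the following sketch collects files in the fixed order and writes them to GITHUB_ENV as a multi-line value. The `collect_ordered` helper, the throwaway demo directory, and the fallback GITHUB_ENV path are assumptions made for the example; they are not taken from the PR's workflow files.

```shell
# Sketch only: collect_ordered is a hypothetical helper, not code from the PR.
NL='
'

# Print files under sources/, claims/, proofs/ (relative to the current
# directory) in that fixed order, one path per line.
collect_ordered() {
  out=""
  for dir in sources claims proofs; do
    # A missing or empty directory contributes nothing.
    found=$(find "$dir" -type f 2>/dev/null | LC_ALL=C sort)
    if [ -n "$found" ]; then
      # Add a separating newline only when something was collected already.
      out="${out:+$out$NL}$found"
    fi
  done
  printf '%s' "$out"
}

# Demo in a throwaway directory (assumed layout: no claims/ directory).
demo_root=$(mktemp -d)
mkdir -p "$demo_root/sources" "$demo_root/proofs"
touch "$demo_root/sources/a.md" "$demo_root/proofs/p.md"
ADDED_FILES=$(cd "$demo_root" && collect_ordered)

# Outside GitHub Actions GITHUB_ENV is unset, so fall back to a temp file.
GITHUB_ENV="${GITHUB_ENV:-$demo_root/github_env}"
if [ -n "$ADDED_FILES" ]; then
  # Multi-line values use the NAME<<DELIMITER form.
  {
    echo "ADDED_FILES<<EOF"
    echo "$ADDED_FILES"
    echo "EOF"
  } >> "$GITHUB_ENV"
else
  echo "No files found, nothing to ingest"
fi
```

Because each group is appended only when it is non-empty, a missing `claims/` directory does not stop `proofs/` from being collected, and the final list stays empty when nothing is found.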

Changes

Documentation Ingestion Workflows

| Layer / File(s) | Summary |
| --- | --- |
| Workflow Definition & Setup: .github/workflows/ingest_all.yml, .github/workflows/post_on_merge.yml | New manual-trigger workflow ingest_all added; job checkout and Python setup established; post_on_merge.yml updated to actions/setup-python@v6. |
| Dependencies: .github/workflows/ingest_all.yml | Upgrades pip and installs packages from requirements.txt. |
| File Discovery & Ordering: .github/workflows/ingest_all.yml, .github/workflows/post_on_merge.yml | Discovers files in sources/, claims/, and proofs/ and builds a deterministic ordered list exported as ADDED_FILES/ordered_files to GITHUB_ENV (sources → claims → proofs). |
| Ingestion Execution: .github/workflows/ingest_all.yml | Runs python scripts/post_requests.py only when ADDED_FILES is non-empty, supplying API_BASE_URL and API_KEY via workflow env/secrets. |

sequenceDiagram
  participant User as User (manual trigger / merge)
  participant GH as GitHub Actions
  participant Runner as Runner (checkout, setup, install)
  participant Script as post_requests.py
  participant API as External API

  User->>GH: trigger workflow (manual or on merge)
  GH->>Runner: checkout, setup Python, install deps
  Runner->>Runner: discover files (sources → claims → proofs)
  Runner->>GH: export ADDED_FILES
  GH->>Script: run script when ADDED_FILES non-empty
  Script->>API: POST requests per file

🎯 3 (Moderate) | ⏱️ ~20 minutes

"I hop through files in ordered rows,
sources first where knowledge grows,
claims then proofs, I post with glee,
a rabbit's curl of CI harmony 🐇✨"

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: adds workflow to ingest all docs' directly and clearly describes the main change: adding a new GitHub Actions workflow for ingesting documentation. |
| Linked Issues check | ✅ Passed | The pull request successfully implements the requirements from issue #17: adds a new workflow to ingest all docs in the specified order (sources → claims → proofs). |
| Out of Scope Changes check | ✅ Passed | All changes are in-scope: the new ingest_all.yml workflow directly addresses issue #17, and the post_on_merge.yml update improves file ordering consistency which supports the primary objective. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/all-docs-ingestion

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/ingest_all.yml:
- Around line 21-24: The step currently exits early when a single directory has
no files (e.g., the checks using variables like sources_files and proofs_files),
which prevents collecting later groups; instead remove the immediate exit calls,
detect each directory (sources_files, proofs_files, claims_files) and append any
found files into a shared ADDED_FILES variable incrementally, and after checking
all three directories perform one final conditional: if ADDED_FILES is empty
then echo a generic message (for example "No files found") and exit 0, otherwise continue; update the
checks surrounding the variables referenced in this diff so they only add to
ADDED_FILES and do not call exit until after all directories are processed.

In @.github/workflows/post_on_merge.yml:
- Around line 33-35: The current construction of ordered_files always appends
newline separators even when grep returns nothing, causing ordered_files to be
non-empty and triggering post_requests.py; update the logic that builds
ordered_files (the ordered_files variable assignment block) to only append each
section when the grep result is non-empty (i.e., check the output of echo
"$files" | grep '^sources/' etc. before concatenating or use conditional
appends), and ensure the final ordered_files is tested for non-empty before
invoking post_requests.py so the script only runs when there are actual added
docs.
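
The conditional-append approach described for post_on_merge.yml might look like the sketch below. The `order_files` helper and the sample paths are hypothetical; the original workflow code is not shown in this thread, so this only illustrates why unconditional newline separators leave the variable non-empty.

```shell
# Sketch only: order_files and the sample paths are hypothetical.
NL='
'

# Naive concatenation: the two separators survive even when no group matches.
changed='README.md'
naive="$(printf '%s\n' "$changed" | grep '^sources/')${NL}$(printf '%s\n' "$changed" | grep '^claims/')${NL}$(printf '%s\n' "$changed" | grep '^proofs/')"

# Conditional append: a group is added only when grep matched something.
order_files() {
  ordered=""
  for prefix in sources/ claims/ proofs/; do
    # grep exits with status 1 on no match; "|| true" keeps a step that
    # runs under "set -e" from aborting here.
    group=$(printf '%s\n' "$1" | grep "^$prefix" || true)
    if [ -n "$group" ]; then
      ordered="${ordered:+$ordered$NL}$group"
    fi
  done
  printf '%s' "$ordered"
}

none=$(order_files "$changed")
some=$(order_files "proofs/p1.md${NL}README.md${NL}sources/s1.md")

# Run the POST step only when the ordered list has real entries.
if [ -n "$some" ]; then
  echo "would ingest:"
  echo "$some"
fi
```

Here `naive` holds two bare newlines and therefore passes a non-empty test, while `none` is truly empty and `some` lists the sources entry before the proofs entry.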
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c641bec2-e95b-4137-aba2-370ca520a802

📥 Commits

Reviewing files that changed from the base of the PR and between 0b79e2b and 8b5453c.

📒 Files selected for processing (2)
  • .github/workflows/ingest_all.yml
  • .github/workflows/post_on_merge.yml


@cubic-dev-ai cubic-dev-ai Bot left a comment

2 issues found across 2 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".github/workflows/ingest_all.yml">

<violation number="1" location=".github/workflows/ingest_all.yml:21">
P2: The script returns early when `sources/` is empty, which skips ingesting `claims/` and `proofs/` files. For an "ingest all docs" workflow, this causes incomplete ingestion.</violation>
</file>

<file name=".github/workflows/post_on_merge.yml">

<violation number="1" location=".github/workflows/post_on_merge.yml:33">
P2: The new ordered-files concatenation makes `ADDED_FILES` non-empty (`\n\n`) when no files were added, so the POST step is triggered unnecessarily.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@semmet95 semmet95 force-pushed the feat/all-docs-ingestion branch from 8b5453c to d94102b Compare May 10, 2026 13:55
@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
.github/workflows/ingest_all.yml (1)

21-36: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove early exits so later document groups are always considered.

Line 21 and Line 30 still short-circuit the step, so claims/ or proofs/ can be skipped even when files exist. Build ADDED_FILES incrementally across all three groups, then do a single final empty-check.

Suggested minimal fix
-          sources_files=$(find sources -type f 2>/dev/null | sort)
-          if [ -z "$sources_files" ]; then
-            echo "No sources files found"
-            exit 0
-          fi
-          
-          ADDED_FILES="$sources_files"
-          echo "Sources files added to ADDED_FILES"
+          ADDED_FILES=""
+          sources_files=$(find sources -type f 2>/dev/null | sort)
+          if [ -n "$sources_files" ]; then
+            ADDED_FILES="$sources_files"
+            echo "Sources files added to ADDED_FILES"
+          else
+            echo "No sources files found"
+          fi
           
           claims_files=$(find claims -type f 2>/dev/null | sort)
-          if [ -z "$claims_files" ]; then
-            echo "No claims files found, skipping proofs"
-            echo "ADDED_FILES<<EOF" >> "$GITHUB_ENV"
-            echo "$ADDED_FILES" >> "$GITHUB_ENV"
-            echo "EOF" >> "$GITHUB_ENV"
-            exit 0
-          fi
-          
-          ADDED_FILES="$ADDED_FILES"$'\n'"$claims_files"
-          echo "Claims files added to ADDED_FILES"
+          if [ -n "$claims_files" ]; then
+            ADDED_FILES="${ADDED_FILES:+$ADDED_FILES$'\n'}$claims_files"
+            echo "Claims files added to ADDED_FILES"
+          else
+            echo "No claims files found"
+          fi
           
           proofs_files=$(find proofs -type f 2>/dev/null | sort)
           if [ -n "$proofs_files" ]; then
-            ADDED_FILES="$ADDED_FILES"$'\n'"$proofs_files"
+            ADDED_FILES="${ADDED_FILES:+$ADDED_FILES$'\n'}$proofs_files"
             echo "Proofs files added to ADDED_FILES"
           else
             echo "No proofs files found"
           fi
+
+          if [ -z "$ADDED_FILES" ]; then
+            echo "No request documents found to ingest"
+            exit 0
+          fi
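
The `${ADDED_FILES:+$ADDED_FILES$'\n'}` expansion in the suggested fix inserts a newline separator only when the list already has content. A small sketch of that behavior follows, using a hypothetical `append_group` helper and a portable `NL` variable in place of bash's `$'\n'`.

```shell
# Sketch only: append_group is a hypothetical helper, not part of the PR.
NL='
'

# Print list $1 with group $2 appended. ${1:+...} expands to nothing when $1
# is empty, so no leading blank line is produced.
append_group() {
  printf '%s' "${1:+$1$NL}$2"
}

first=$(append_group "" "claims/c1.md")
both=$(append_group "sources/s1.md" "claims/c1.md")

printf 'first:\n%s\n' "$first"
printf 'both:\n%s\n' "$both"
```

With an empty list the result is just the new group; with existing content the two parts are joined by exactly one newline.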
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ingest_all.yml around lines 21 - 36, The step currently
exits early when sources_files or claims_files are empty, preventing later
groups from being processed; instead, remove those early exit branches and
always append each group's found files to the ADDED_FILES variable (use
ADDED_FILES="${ADDED_FILES:+$ADDED_FILES$'\n'}$sources_files" when the group is non-empty, and similarly for claims_files and
proofs_files), only writing ADDED_FILES to GITHUB_ENV and performing a single
emptiness check at the end of the step; update the blocks that now echo to
GITHUB_ENV (currently inside the claims_files empty branch) to be executed once
after all groups are aggregated, and ensure you no longer call exit 0 in the
intermediate checks so all groups (sources, claims, proofs) are considered
before finalizing ADDED_FILES.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: df716271-877c-40b4-9fd7-5e85f5389d48

📥 Commits

Reviewing files that changed from the base of the PR and between 8b5453c and d94102b.

📒 Files selected for processing (2)
  • .github/workflows/ingest_all.yml
  • .github/workflows/post_on_merge.yml
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/post_on_merge.yml

@semmet95
Contributor Author

@cubic-dev-ai

@cubic-dev-ai

cubic-dev-ai Bot commented May 10, 2026

@cubic-dev-ai

@semmet95 I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment

3 issues found across 2 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".github/workflows/ingest_all.yml">

<violation number="1" location=".github/workflows/ingest_all.yml:21">
P1: Do not exit early when `sources/` is empty; it prevents ingesting claims/proofs in the same run.</violation>

<violation number="2" location=".github/workflows/ingest_all.yml:30">
P2: Removing the early `exit 0` in the claims check is needed so proofs can still be discovered and ingested.</violation>
</file>

<file name=".github/workflows/post_on_merge.yml">

<violation number="1" location=".github/workflows/post_on_merge.yml:37">
P2: The new file-ordering normalization corrupts filenames containing spaces by splitting on spaces. Keep `ADDED_FILES` newline-delimited without `tr ' ' '\n'`.</violation>
</file>
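
To illustrate the filename issue flagged for post_on_merge.yml, the sketch below contrasts splitting on spaces with reading a newline-delimited list line by line. The sample paths are made up for the example.

```shell
# Sketch only: the paths below are hypothetical sample data.
ADDED_FILES='sources/my notes.md
claims/c1.md'

# Converting spaces to newlines splits "sources/my notes.md" into two
# entries, so the list appears to hold three paths instead of two.
broken_count=$(printf '%s\n' "$ADDED_FILES" | tr ' ' '\n' | wc -l | tr -d ' ')

# Reading one line at a time keeps each path intact.
count=0
while IFS= read -r path; do
  [ -n "$path" ] || continue   # skip blank lines
  count=$((count + 1))
  echo "would process: $path"
done <<EOF
$ADDED_FILES
EOF

echo "split on spaces: $broken_count entries; line by line: $count entries"
```

`IFS= read -r` preserves leading whitespace and backslashes in each line, which is why it is a common choice for path lists of this kind.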

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Signed-off-by: Amit Singh <singhamitch@outlook.com>
@semmet95 semmet95 force-pushed the feat/all-docs-ingestion branch from d94102b to 4002779 Compare May 10, 2026 14:11
@semmet95
Contributor Author

@cubic-dev-ai

@cubic-dev-ai

cubic-dev-ai Bot commented May 10, 2026

@cubic-dev-ai

@semmet95 I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 2 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".github/workflows/ingest_all.yml">

<violation number="1" location=".github/workflows/ingest_all.yml:23">
P2: The early `exit 0` when `claims/` is empty makes the proofs scan unreachable, so this workflow can skip ingesting existing `proofs/` files.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@semmet95 semmet95 merged commit e10f2f6 into main May 10, 2026
2 checks passed
@semmet95 semmet95 deleted the feat/all-docs-ingestion branch May 10, 2026 14:39

Development

Successfully merging this pull request may close these issues.

Add a workflow to ingest all the docs at once

1 participant