Skip to content

feat: adds automation to create PR to add popular sources#20

Merged
semmet95 merged 8 commits into
mainfrom
feat/new-sources-worflow
May 8, 2026
Merged

feat: adds automation to create PR to add popular sources#20
semmet95 merged 8 commits into
mainfrom
feat/new-sources-worflow

Conversation

@semmet95
Copy link
Copy Markdown
Contributor

@semmet95 semmet95 commented May 8, 2026

Fixes: #19

Summary by CodeRabbit

  • New Features

    • Automated workflow to fetch and add new sources on a weekly schedule or manual trigger; creates a pull request only when changes are detected.
    • New CLI utilities to scrape external sites, process content via an external LLM service, and generate sanitized source entries while avoiding duplicates.
  • Chores

    • Added a scraping-related dependency.
    • Updated ignore patterns to exclude virtualenvs, editor settings, and Python bytecode caches.

semmet95 added 6 commits May 7, 2026 19:02
Signed-off-by: Amit Singh <singhamitch@outlook.com>
Signed-off-by: Amit Singh <singhamitch@outlook.com>
Signed-off-by: Amit Singh <singhamitch@outlook.com>
Signed-off-by: Amit Singh <singhamitch@outlook.com>
Signed-off-by: Amit Singh <singhamitch@outlook.com>
Signed-off-by: Amit Singh <singhamitch@outlook.com>
@qodo-code-review
Copy link
Copy Markdown

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds an automated pipeline: Firecrawl scraper, OpenRouter LLM processing (extract/validate/generate YAML), Bash orchestration, scheduled GitHub Actions workflow that conditionally commits and opens PRs, plus dependency and .gitignore updates.

Changes

Source Scraping Pipeline

Layer / File(s) Summary
Dependencies & Configuration
requirements.txt, .gitignore
firecrawl-py dependency added; .venv/ and .vscode/ are now ignored alongside __pycache__.
Web Scraper (Firecrawl)
scripts/scrape_firecrawl.py
CLI validates a URL arg and FIRECRAWL_API_KEY, scrapes the URL to Markdown via Firecrawl, validates response, and prints Markdown to stdout.
Bash Orchestration Script
scripts/source_scraper.sh
Enables strict mode, reads TOP_SOURCES_LIST, validates FIRECRAWL_API_KEY and OPENROUTER_API_KEY, creates a temp markdown file, runs the Firecrawl scraper, and invokes scripts/openrouter.py.
LLM Processing (OpenRouter)
scripts/openrouter.py
Adds network helpers and three LLM phases: extract raw URLs from Markdown, validate/format URLs with openrouter:web_search, and generate YAML source documents; filters duplicates by name/uri and writes unique sources/*.yaml.
GitHub Actions Workflow
.github/workflows/fetch_new_sources.yml
Workflow "Fetch and add new sources" triggers weekly and manually, creates a run-specific branch from main, runs scripts/source_scraper.sh with retries (max 3, 120s delay), conditionally commits/pushes, and opens a PR to main only when changes occur.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hop through feeds at break of dawn,
Firecrawl digs deep where links are drawn,
OpenRouter hums and fashions YAML light,
I plant new branches in the quiet night,
PRs bloom softly — a rabbit's delight.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Linked Issues check ⚠️ Warning Issue #19 requires updating README.md and AGENTS.md Workflows sections, but the PR only adds automation code without documentation updates. Update README.md and AGENTS.md to document the new 'Fetch and add new sources' workflow as specified in issue #19.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: automation for creating a PR to add popular sources via a new GitHub Actions workflow.
Out of Scope Changes check ✅ Passed All changes are in-scope: workflow file, scripts, dependencies, and gitignore align with the automation objective; no unrelated modifications present.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/new-sources-worflow

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/fetch_new_sources.yml:
- Around line 19-31: The workflow is missing Python setup and dependency
installation which causes ModuleNotFoundError when running Python scripts;
update the job steps (e.g., after the "Checkout" and "Setup Git" steps and
before any Python script runs) to add an actions/setup-python@v4 step to install
the required Python version and a step that runs pip install -r requirements.txt
(optionally with pip cache) so dependencies like firecrawl and pyyaml are
available for subsequent steps such as the "Create branch from main" related
scripts.
- Around line 33-53: The workflow step that runs bash scripts/source_scraper.sh
is missing required environment variables (TOP_SOURCES_LIST, FIRECRAWL_API_KEY,
OPENROUTER_API_KEY) so the script fails with "env var not set"; update the "Run
source scraper (with retries)" step to provide these variables (either by adding
an env block for TOP_SOURCES_LIST, FIRECRAWL_API_KEY, OPENROUTER_API_KEY or
exporting them before invoking scripts/source_scraper.sh) so the script receives
them during each attempt and retries behave correctly.

In `@requirements.txt`:
- Around line 1-4: The requirements file uses an incorrect package name and
lacks pinned versions; change "firecrawl" to the correct PyPI name
"firecrawl-py" and add explicit version pins for all packages (e.g., pyyaml,
jsonschema, referencing, firecrawl-py) to ensure reproducible installs—update
the lines for "pyyaml", "jsonschema", "referencing", and the renamed
"firecrawl-py" to include the chosen version specifiers (== or ~=) consistent
with your compatibility policy.

In `@scripts/openrouter.py`:
- Around line 35-37: The urlopen calls in fetch_text and post_openrouter lack
timeouts and can hang indefinitely; update both functions to pass a reasonable
timeout (e.g., timeout=10) to urlopen and handle socket.timeout/URLError
exceptions accordingly so the workflow fails fast and logs or raises a clear
error; locate the urlopen usage in fetch_text and the corresponding call in
post_openrouter and add the timeout argument and appropriate exception handling
around the network call.
- Around line 287-292: The code uses parsed.get('name') and immediately calls
.strip() which will raise AttributeError if the YAML lacks a name; update the
logic around the filename assignment (the variable filename and the call to
re.sub) to first check for None/empty, e.g. obtain raw_name = parsed.get('name')
and if raw_name is falsy use a safe default (like "unnamed-{uuid4()}" or a
timestamp) before calling .strip() and re.sub, then append ".yaml"; ensure this
change is applied where filename is computed so downstream code receives a valid
string.

In `@scripts/scrape_firecrawl.py`:
- Around line 22-30: Update the misleading comment above the fc.scrape call:
change the text that says the SDK "returns a dict with a 'markdown' field" to
state that fc.scrape returns an object (or remove the type reference entirely),
since the code uses doc.markdown; ensure the comment references fc.scrape and
doc.markdown so it accurately reflects the SDK returns an object with a markdown
attribute.

In `@scripts/source_scraper.sh`:
- Around line 4-9: The script crashes under set -u because URL is assigned from
an unset TOP_SOURCES_LIST; change the assignment URL="$TOP_SOURCES_LIST" to use
the safe default expansion URL="${TOP_SOURCES_LIST:-}" so the script can run the
custom error check (the TOP_SOURCES_LIST variable is the unique symbol to
update), leaving the subsequent checks that already use the :- pattern
unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ceee9ef7-f81f-4391-a31d-513f19b7d2a0

📥 Commits

Reviewing files that changed from the base of the PR and between 9c228ac and 4e50063.

📒 Files selected for processing (6)
  • .github/workflows/fetch_new_sources.yml
  • .gitignore
  • requirements.txt
  • scripts/openrouter.py
  • scripts/scrape_firecrawl.py
  • scripts/source_scraper.sh

Comment thread .github/workflows/fetch_new_sources.yml
Comment thread .github/workflows/fetch_new_sources.yml
Comment thread requirements.txt Outdated
Comment thread scripts/openrouter.py
Comment thread scripts/scrape_firecrawl.py
Comment thread scripts/source_scraper.sh Outdated
Signed-off-by: Amit Singh <singhamitch@outlook.com>
@semmet95 semmet95 force-pushed the feat/new-sources-worflow branch from 4e50063 to d90143f Compare May 8, 2026 12:07
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 6 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="scripts/openrouter.py">

<violation number="1" location="scripts/openrouter.py:287">
P1: `parsed.get('name')` can return `None` if the YAML document lacks a `name` field, causing `filename.strip()` to raise `AttributeError`. Add a guard to skip documents without a valid `name`.</violation>
</file>

<file name="scripts/source_scraper.sh">

<violation number="1" location="scripts/source_scraper.sh:4">
P2: Handle `TOP_SOURCES_LIST` without a bare expansion here; under `set -u`, an unset variable exits before the explicit error message runs.</violation>
</file>

<file name=".github/workflows/fetch_new_sources.yml">

<violation number="1" location=".github/workflows/fetch_new_sources.yml:23">
P0: The workflow runs Python scripts that depend on packages from `requirements.txt` (firecrawl, pyyaml, etc.), but never sets up Python or installs dependencies. Add `actions/setup-python` and `pip install -r requirements.txt` steps before executing the scripts, otherwise they will fail with `ModuleNotFoundError`.</violation>

<violation number="2" location=".github/workflows/fetch_new_sources.yml:33">
P0: The `source_scraper.sh` script requires `TOP_SOURCES_LIST`, `FIRECRAWL_API_KEY`, and `OPENROUTER_API_KEY` environment variables, but none are set in this workflow step. The script will fail immediately with "env var not set" errors. Add an `env` block to pass these secrets.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread .github/workflows/fetch_new_sources.yml
Comment thread .github/workflows/fetch_new_sources.yml
Comment thread scripts/openrouter.py
Comment thread scripts/source_scraper.sh
@semmet95
Copy link
Copy Markdown
Contributor Author

semmet95 commented May 8, 2026

@cubic-dev-ai

@cubic-dev-ai
Copy link
Copy Markdown

cubic-dev-ai Bot commented May 8, 2026

@cubic-dev-ai

@semmet95 I have started the AI code review. It will take a few minutes to complete.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
.github/workflows/fetch_new_sources.yml (2)

33-40: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Pass required secrets/env vars into the scraper step.

source_scraper.sh depends on TOP_SOURCES_LIST, FIRECRAWL_API_KEY, and OPENROUTER_API_KEY; without them this step fails deterministically.

🔧 Proposed fix
       - name: Run source scraper (with retries)
+        env:
+          TOP_SOURCES_LIST: ${{ secrets.TOP_SOURCES_LIST }}
+          FIRECRAWL_API_KEY: ${{ secrets.FIRECRAWL_API_KEY }}
+          OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
         run: |
           set -e
           attempt=1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/fetch_new_sources.yml around lines 33 - 40, The workflow
step "Run source scraper (with retries)" calls scripts/source_scraper.sh but
doesn't pass the required environment variables; update that step to export or
pass TOP_SOURCES_LIST, FIRECRAWL_API_KEY, and OPENROUTER_API_KEY (sourced from
repository or organization secrets) into the step environment so the script can
access them (e.g., using env: or by exporting them before invoking bash) and
ensure secret values use GitHub Secrets to avoid plaintext exposure.

20-33: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Set up Python and install requirements before running scraper scripts.

The workflow invokes Python tooling but does not provision Python packages first, so runs can fail with missing-module errors.

🔧 Proposed fix
       - name: Checkout
         uses: actions/checkout@v4

+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+
+      - name: Install dependencies
+        run: pip install -r requirements.txt
+
       - name: Setup Git
         run: |
           git config user.name "github-actions[bot]"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/fetch_new_sources.yml around lines 20 - 33, The workflow
is missing a Python setup and dependency install before running the scraper; add
a step after "Setup Git" (and before "Run source scraper (with retries)") to use
actions/setup-python to install the required Python version and then run pip
install -r requirements.txt (or the project's install command) so the scraper
has its modules available; ensure the new step is clearly named (e.g., "Set up
Python and install requirements") and placed between the "Create branch from
main" and "Run source scraper (with retries)" steps so dependencies are
installed prior to executing the scraper.
scripts/source_scraper.sh (1)

4-4: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard TOP_SOURCES_LIST before expansion under strict mode.

On Line 4, set -u can abort before your intended validation flow if TOP_SOURCES_LIST is unset. Use safe expansion plus an explicit check so the failure is deterministic and readable.

🔧 Proposed fix
-URL="$TOP_SOURCES_LIST"
+URL="${TOP_SOURCES_LIST:-}"
+
+if [[ -z "$URL" ]]; then
+  echo "Error: TOP_SOURCES_LIST is not set." >&2
+  exit 1
+fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/source_scraper.sh` at line 4, The script uses strict mode (set -u)
but expands TOP_SOURCES_LIST directly into URL which can cause an immediate
unhelpful abort; change the assignment to use a safe expansion (e.g.,
default-empty or parameter expansion) and add an explicit check for
TOP_SOURCES_LIST before using it so failures are deterministic and
readable—update the line that sets URL and add a subsequent conditional that
tests TOP_SOURCES_LIST (referencing the TOP_SOURCES_LIST variable and the URL
assignment) and emits a clear error and exit if it is unset or empty.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/fetch_new_sources.yml:
- Around line 60-65: The workflow uses unquoted $GITHUB_OUTPUT in the echo
redirections (the echo "changed=false" >> $GITHUB_OUTPUT and echo "changed=true"
>> $GITHUB_OUTPUT lines), which risks word-splitting; update those redirections
to quote the variable (use >> "$GITHUB_OUTPUT") so the shell treats the path as
a single token and avoids SC2086 warnings.

---

Duplicate comments:
In @.github/workflows/fetch_new_sources.yml:
- Around line 33-40: The workflow step "Run source scraper (with retries)" calls
scripts/source_scraper.sh but doesn't pass the required environment variables;
update that step to export or pass TOP_SOURCES_LIST, FIRECRAWL_API_KEY, and
OPENROUTER_API_KEY (sourced from repository or organization secrets) into the
step environment so the script can access them (e.g., using env: or by exporting
them before invoking bash) and ensure secret values use GitHub Secrets to avoid
plaintext exposure.
- Around line 20-33: The workflow is missing a Python setup and dependency
install before running the scraper; add a step after "Setup Git" (and before
"Run source scraper (with retries)") to use actions/setup-python to install the
required Python version and then run pip install -r requirements.txt (or the
project's install command) so the scraper has its modules available; ensure the
new step is clearly named (e.g., "Set up Python and install requirements") and
placed between the "Create branch from main" and "Run source scraper (with
retries)" steps so dependencies are installed prior to executing the scraper.

In `@scripts/source_scraper.sh`:
- Line 4: The script uses strict mode (set -u) but expands TOP_SOURCES_LIST
directly into URL which can cause an immediate unhelpful abort; change the
assignment to use a safe expansion (e.g., default-empty or parameter expansion)
and add an explicit check for TOP_SOURCES_LIST before using it so failures are
deterministic and readable—update the line that sets URL and add a subsequent
conditional that tests TOP_SOURCES_LIST (referencing the TOP_SOURCES_LIST
variable and the URL assignment) and emits a clear error and exit if it is unset
or empty.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 9a358d31-10ee-4745-aa50-1d4d1e3d0245

📥 Commits

Reviewing files that changed from the base of the PR and between 4e50063 and d90143f.

📒 Files selected for processing (2)
  • .github/workflows/fetch_new_sources.yml
  • scripts/source_scraper.sh

Comment thread .github/workflows/fetch_new_sources.yml Outdated
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 6 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".github/workflows/fetch_new_sources.yml">

<violation number="1" location=".github/workflows/fetch_new_sources.yml:16">
P2: Branch naming uses only `github.run_id`, so reruns reuse the same branch name and can fail when pushing updates. Include `github.run_attempt` (or another unique suffix) in the branch name.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread .github/workflows/fetch_new_sources.yml
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 4 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="scripts/openrouter.py">

<violation number="1" location="scripts/openrouter.py:289">
P1: Don't reference `e` here; this branch runs outside the exception handler and will crash when `name` is missing.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread scripts/openrouter.py Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (2)
.github/workflows/fetch_new_sources.yml (1)

71-77: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Quote $GITHUB_OUTPUT in the redirections.

These writes should use >> "$GITHUB_OUTPUT" so the path is treated as a single token and the step stays shell-safe.

🔧 Suggested change
-            echo "changed=false" >> $GITHUB_OUTPUT
+            echo "changed=false" >> "$GITHUB_OUTPUT"
...
-            echo "changed=true" >> $GITHUB_OUTPUT
+            echo "changed=true" >> "$GITHUB_OUTPUT"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/fetch_new_sources.yml around lines 71 - 77, The
redirections currently append to an unquoted $GITHUB_OUTPUT which can break with
spaces or special chars; update the redirect usages in the shell block that
contains the git check/commit (the lines using git diff --cached --quiet and git
commit -m "feat(sources): adds new most popular sources") to append to the
output variable using >> "$GITHUB_OUTPUT" (replace all occurrences of >>
$GITHUB_OUTPUT with >> "$GITHUB_OUTPUT") so the path is treated as a single
token and the step remains shell-safe.
scripts/openrouter.py (1)

190-197: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate parsed YAML is a mapping before using .get().

Both code paths assume yaml.safe_load(...) returns a dict. LLM output can still parse into a scalar/list/None, which makes .get(...) blow up. Also, the missing-name guard still references e on Line 289, so that fallback path raises NameError instead of skipping the bad doc.

🛡️ Suggested change
     for doc_str in source_docs:
         try:
             doc = yaml.safe_load(doc_str)
         except Exception as e:
             print(f"Warning: failed to parse source_doc; not adding it to the source list it. Error: {e}", file=sys.stderr)
             continue
+        if not isinstance(doc, dict):
+            print("Warning: source_doc is not a YAML mapping; skipping", file=sys.stderr)
+            continue
         duplicate = False
         for src in existing:
             if doc.get('name') == src.get('name') or doc.get('uri') == src.get('uri'):
                 duplicate = True
                 break
...
     for doc_str in unique_src_docs:
         try:
             parsed = yaml.safe_load(doc_str)
         except Exception as e:
             print(f"Warning: failed to parse src_doc : {doc_str}: {e}", file=sys.stderr)
             continue
+        if not isinstance(parsed, dict):
+            print(f"Warning: parsed source doc is not a mapping, skipping: {doc_str}", file=sys.stderr)
+            continue
 
         filename = parsed.get('name')
-        if filename == None:
-            print(f"Warning: failed to extract name field from : {doc_str}: {e}", file=sys.stderr)
+        if not filename:
+            print(f"Warning: failed to extract name field from: {doc_str}", file=sys.stderr)
             continue
Can PyYAML's yaml.safe_load return a scalar, list, or None instead of a dict depending on the YAML document content?

Also applies to: 281-293

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/openrouter.py` around lines 190 - 197, The YAML loader may return
non-mapping types, so before using doc.get(...) validate that the parsed doc is
a dict/mapping (e.g., isinstance(doc, dict)) and skip/log a warning and continue
if it's None, a scalar, or a list; update the duplicate/name checks that
reference doc.get('name') or doc.get('uri') to only run when doc is a mapping.
Also fix the missing-name guard that incorrectly references the exception
variable e (causing NameError) — replace that reference with a safe message or
use the parsed doc/state instead so the fallback path cleanly skips bad docs
(adjust logic around doc_str/doc and existing to avoid referencing e).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/fetch_new_sources.yml:
- Around line 15-20: The workflow uses a repo-specific secret GH_TOKEN which may
be absent; update the env entry that sets GH_TOKEN to use the built-in Actions
token instead (replace the GH_TOKEN value ${ { secrets.GH_TOKEN } } with the
default token expression such as ${{ secrets.GITHUB_TOKEN }} or ${{ github.token
}}), so steps that run gh pr create can use the provided workflow token;
specifically modify the env block where BRANCH and GH_TOKEN are defined to set
GH_TOKEN to the built-in token.

In `@scripts/openrouter.py`:
- Around line 145-155: The current selection of latest = max(files, key=lambda
p: os.path.getmtime(p)) is unreliable in CI because mtime reflects checkout
time; replace this logic to choose a deterministic or VCS-derived "latest"
instead: either query git for the most recently added/committed source under
sources_dir (e.g., using git log/git ls-files to pick the file with the newest
commit that added or modified it) and set latest to that path, or pick a
deterministic fixed sample (e.g., sorted(files)[0] or a specific filename) so
behavior is stable in CI; update any error handling around entries/files/latest
to report failures if git lookup or deterministic selection returns nothing.
- Line 26: MD_PROCESSING_SKILL_URL currently points to the mutable
refs/heads/main raw URL; replace it with an immutable reference (either the raw
URL pinned to a specific commit SHA for the upstream file or vendored into this
repo and point MD_PROCESSING_SKILL_URL to the local path). Update the constant
MD_PROCESSING_SKILL_URL in scripts/openrouter.py to use the chosen fixed
location and ensure any tests/consumers still read from that constant.
- Around line 54-64: The code in post_openrouter currently only catches
HTTPError which misses transport errors (URLError) like timeouts or DNS failures
and causes the run to abort; update the exception handling around urlopen (the
try/except block that reads status/body) to also catch urllib.error.URLError (or
a combined except HTTPError, URLError as e) and set status and body similar to
the HTTPError path (e.g., set status to a sentinel or None and body to the
readable message from e or empty string) so the caller can continue fallback to
the next model; locate the try block that opens urlopen(req, timeout=10) and
modify its except clauses to handle URLError alongside HTTPError.

---

Duplicate comments:
In @.github/workflows/fetch_new_sources.yml:
- Around line 71-77: The redirections currently append to an unquoted
$GITHUB_OUTPUT which can break with spaces or special chars; update the redirect
usages in the shell block that contains the git check/commit (the lines using
git diff --cached --quiet and git commit -m "feat(sources): adds new most
popular sources") to append to the output variable using >> "$GITHUB_OUTPUT"
(replace all occurrences of >> $GITHUB_OUTPUT with >> "$GITHUB_OUTPUT") so the
path is treated as a single token and the step remains shell-safe.

In `@scripts/openrouter.py`:
- Around line 190-197: The YAML loader may return non-mapping types, so before
using doc.get(...) validate that the parsed doc is a dict/mapping (e.g.,
isinstance(doc, dict)) and skip/log a warning and continue if it's None, a
scalar, or a list; update the duplicate/name checks that reference
doc.get('name') or doc.get('uri') to only run when doc is a mapping. Also fix
the missing-name guard that incorrectly references the exception variable e
(causing NameError) — replace that reference with a safe message or use the
parsed doc/state instead so the fallback path cleanly skips bad docs (adjust
logic around doc_str/doc and existing to avoid referencing e).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 44ba98e9-2b0a-4fe4-8f92-681932d269e0

📥 Commits

Reviewing files that changed from the base of the PR and between d90143f and a7bacfe.

📒 Files selected for processing (4)
  • .github/workflows/fetch_new_sources.yml
  • requirements.txt
  • scripts/openrouter.py
  • scripts/scrape_firecrawl.py
✅ Files skipped from review due to trivial changes (1)
  • requirements.txt

Comment thread .github/workflows/fetch_new_sources.yml
Comment thread scripts/openrouter.py
Comment thread scripts/openrouter.py
Comment thread scripts/openrouter.py
@semmet95 semmet95 force-pushed the feat/new-sources-worflow branch 2 times, most recently from a812e98 to 6b7f3d9 Compare May 8, 2026 13:20
@semmet95
Copy link
Copy Markdown
Contributor Author

semmet95 commented May 8, 2026

@cubic-dev-ai

@cubic-dev-ai
Copy link
Copy Markdown

cubic-dev-ai Bot commented May 8, 2026

@cubic-dev-ai

@semmet95 I have started the AI code review. It will take a few minutes to complete.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (4)
scripts/openrouter.py (3)

54-62: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Initialize status on the failure path.

If urlopen raises here, Line 62 returns an undefined status, so the script dies with UnboundLocalError before it can fall back to the next model.

🩹 Suggested change
 def post_openrouter(api_key: str, payload: dict) -> Tuple[int, str]:
     data = json.dumps(payload).encode("utf-8")
+    status = 0
+    body = ""
     req = Request(
         "https://openrouter.ai/api/v1/chat/completions",
         data=data,
@@
     try:
         with urlopen(req, timeout=60) as r:
             status = r.getcode()
             body = r.read().decode("utf-8")
     except Exception as e:
         print(f"Error making request to OpenRouter API: {e}", file=sys.stderr)
-        body = ""
 
     return status, body
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/openrouter.py` around lines 54 - 62, The except block after the
urlopen call leaves `status` undefined which causes an UnboundLocalError on
return; in the except branch inside the function that uses urlopen(req,
timeout=60) set `status` to a safe sentinel (e.g., None or -1) before returning
along with `body` (keep the existing error print), so change the except handler
to assign status = None (or -1) and body = "" so the final `return status, body`
always has both variables defined.

26-26: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pin MD_PROCESSING_SKILL_URL to an immutable revision.

Using refs/heads/main means upstream prompt edits can silently change scheduled output without any PR in this repo.

🔒 Suggested change
-MD_PROCESSING_SKILL_URL = "https://raw.githubusercontent.com/semmet95/agent-skills/refs/heads/main/md-processing/SKILL.md"
+MD_PROCESSING_SKILL_URL = "https://raw.githubusercontent.com/semmet95/agent-skills/<pinned-commit-sha>/md-processing/SKILL.md"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/openrouter.py` at line 26, MD_PROCESSING_SKILL_URL currently points
to a mutable branch ref ("refs/heads/main") which allows upstream changes to
silently alter behavior; update MD_PROCESSING_SKILL_URL to use an immutable
commit or tag reference (SHA or annotated tag) instead, e.g., replace the
"refs/heads/main" segment with the specific commit SHA or a release tag so the
URL always fetches the exact same SKILL.md; locate the constant
MD_PROCESSING_SKILL_URL in scripts/openrouter.py and change its value to the
pinned immutable revision.

141-152: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

mtime is not a stable “latest source” signal in CI.

On a fresh GitHub Actions checkout, file mtimes reflect checkout timing, not when a source was added. This makes the schema sample effectively arbitrary between runs.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/openrouter.py` around lines 141 - 152, The current selection of
latest source uses filesystem mtime (latest = max(files, key=lambda p:
os.path.getmtime(p))) which is unstable in CI; replace this with a
Git-commit-based timestamp comparison. Add a helper (e.g.,
get_git_commit_time(path)) that runs git log --format=%ct -1 -- <path> via
subprocess and returns an integer timestamp (fall back to os.path.getmtime if
git fails), then compute latest = max(files, key=get_git_commit_time) so
selection is deterministic across checkouts; update the code paths that build
files and the latest assignment to use this helper.
.github/workflows/fetch_new_sources.yml (1)

15-19: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Export a token for gh pr create.

This job has the right permissions, but the GitHub CLI will not use them unless GH_TOKEN or GITHUB_TOKEN is present in the environment. The PR step will fail on the first run that actually has changes.

🔑 Suggested change
     env:
       BRANCH: src-auto-add-${{ github.run_id }}
+      GH_TOKEN: ${{ github.token }}
       TOP_SOURCES_LIST: ${{ vars.TOP_SOURCES_LIST }}
       FIRECRAWL_API_KEY: ${{ secrets.FIRECRAWL_API_KEY }}
       OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/fetch_new_sources.yml around lines 15 - 19, The job sets
env vars (BRANCH, TOP_SOURCES_LIST, FIRECRAWL_API_KEY, OPENROUTER_API_KEY) but
does not export a GitHub token for the GitHub CLI; add GITHUB_TOKEN (or
GH_TOKEN) to the environment so `gh pr create` can authenticate (e.g., set
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} in the env block) and ensure the job
will succeed on first run that creates a PR.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/fetch_new_sources.yml:
- Line 16: The branch name currently set in the BRANCH env ("src-auto-add-${{
github.run_id }}") can collide on workflow reruns; update the BRANCH value to
include the run attempt (e.g., append `${{ github.run_attempt }}`) so the BRANCH
variable (BRANCH: src-auto-add-...) becomes unique per attempt and avoids
non-fast-forward push failures or PR conflicts.

In `@scripts/openrouter.py`:
- Around line 187-194: The code calls doc.get(...) and src.get(...) but
yaml.safe_load(doc_str) can return None or non-mapping types; update the loop in
scripts/openrouter.py to first check that doc is a mapping (e.g.,
isinstance(doc, dict)) and skip with a warning if not, and likewise ensure each
existing item (src) is a mapping before calling src.get() to avoid
AttributeError; keep the duplicate detection logic the same once both doc and
src are guaranteed dicts.
- Around line 284-292: The sanitized filename may be empty (e.g., when
parsed.get('name') is only punctuation/whitespace), so after performing the
re.sub on parsed.get('name') and before appending ".yaml" you must validate the
resulting basename (the variable filename) is non-empty and safe; if it is
empty, either skip the document or replace it with a default safe basename
(e.g., "untitled-<unique>" or derived from another field) and log a warning to
stderr. Update the code that sets filename (uses parsed.get('name'),
re.sub(...), and then filename = filename + ".yaml") to perform this check and
take one of the safe actions (skip/write with default) so you never write to
"sources/.yaml".

---

Duplicate comments:
In @.github/workflows/fetch_new_sources.yml:
- Around line 15-19: The job sets env vars (BRANCH, TOP_SOURCES_LIST,
FIRECRAWL_API_KEY, OPENROUTER_API_KEY) but does not export a GitHub token for
the GitHub CLI; add GITHUB_TOKEN (or GH_TOKEN) to the environment so `gh pr
create` can authenticate (e.g., set GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} in
the env block) and ensure the job will succeed on first run that creates a PR.

In `@scripts/openrouter.py`:
- Around line 54-62: The except block after the urlopen call leaves `status`
undefined which causes an UnboundLocalError on return; in the except branch
inside the function that uses urlopen(req, timeout=60) set `status` to a safe
sentinel (e.g., None or -1) before returning along with `body` (keep the
existing error print), so change the except handler to assign status = None (or
-1) and body = "" so the final `return status, body` always has both variables
defined.
- Line 26: MD_PROCESSING_SKILL_URL currently points to a mutable branch ref
("refs/heads/main") which allows upstream changes to silently alter behavior;
update MD_PROCESSING_SKILL_URL to use an immutable commit or tag reference (SHA
or annotated tag) instead, e.g., replace the "refs/heads/main" segment with the
specific commit SHA or a release tag so the URL always fetches the exact same
SKILL.md; locate the constant MD_PROCESSING_SKILL_URL in scripts/openrouter.py
and change its value to the pinned immutable revision.
- Around line 141-152: The current selection of latest source uses filesystem
mtime (latest = max(files, key=lambda p: os.path.getmtime(p))) which is unstable
in CI; replace this with a Git-commit-based timestamp comparison. Add a helper
(e.g., get_git_commit_time(path)) that runs git log --format=%ct -1 -- <path>
via subprocess and returns an integer timestamp (fall back to os.path.getmtime
if git fails), then compute latest = max(files, key=get_git_commit_time) so
selection is deterministic across checkouts; update the code paths that build
files and the latest assignment to use this helper.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 08bd737e-8b65-4167-a277-f99e1427603b

📥 Commits

Reviewing files that changed from the base of the PR and between a7bacfe and 6b7f3d9.

📒 Files selected for processing (4)
  • .github/workflows/fetch_new_sources.yml
  • requirements.txt
  • scripts/openrouter.py
  • scripts/scrape_firecrawl.py
✅ Files skipped from review due to trivial changes (1)
  • requirements.txt

Comment thread .github/workflows/fetch_new_sources.yml
Comment thread scripts/openrouter.py
Comment thread scripts/openrouter.py
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 6 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".github/workflows/fetch_new_sources.yml">

<violation number="1" location=".github/workflows/fetch_new_sources.yml:82">
P2: Authenticate `gh pr create` with `GH_TOKEN`; otherwise PR creation can fail in CI due to missing GitHub CLI auth.</violation>
</file>

<file name="scripts/openrouter.py">

<violation number="1" location="scripts/openrouter.py:60">
P1: `status` can be undefined on request errors, causing `UnboundLocalError` when returning from `post_openrouter`.</violation>

<violation number="2" location="scripts/openrouter.py:290">
P1: If the `name` field is all whitespace or punctuation, `re.sub` sanitizes it to an empty string, resulting in writing to `sources/.yaml`. Add a guard after sanitization to skip empty basenames. Also, use `is None` instead of `== None`.</violation>
</file>

<file name="scripts/source_scraper.sh">

<violation number="1" location="scripts/source_scraper.sh:4">
P2: Validate `TOP_SOURCES_LIST` before using it. An empty repo variable currently becomes a blank URL and causes a downstream scrape failure.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread scripts/openrouter.py
Comment thread scripts/openrouter.py
Comment thread .github/workflows/fetch_new_sources.yml
Comment thread scripts/source_scraper.sh
Signed-off-by: Amit Singh <singhamitch@outlook.com>
@semmet95 semmet95 force-pushed the feat/new-sources-worflow branch from 6b7f3d9 to e73294b Compare May 8, 2026 13:40
@semmet95
Copy link
Copy Markdown
Contributor Author

semmet95 commented May 8, 2026

@cubic-dev-ai

@cubic-dev-ai
Copy link
Copy Markdown

cubic-dev-ai Bot commented May 8, 2026

@cubic-dev-ai

@semmet95 I have started the AI code review. It will take a few minutes to complete.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
scripts/openrouter.py (3)

26-26: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pin MD_PROCESSING_SKILL_URL to an immutable revision.

Fetching the prompt from refs/heads/main makes scheduled runs non-deterministic and lets upstream changes alter generation behavior without any PR in this repo. Use a commit SHA or vendor the prompt locally.

Suggested change
-MD_PROCESSING_SKILL_URL = "https://raw.githubusercontent.com/semmet95/agent-skills/refs/heads/main/md-processing/SKILL.md"
+MD_PROCESSING_SKILL_URL = "https://raw.githubusercontent.com/semmet95/agent-skills/<pinned-commit-sha>/md-processing/SKILL.md"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/openrouter.py` at line 26, MD_PROCESSING_SKILL_URL currently points
to refs/heads/main making prompt fetches non-deterministic; change it to an
immutable reference by replacing the branch URL with a commit SHA URL (use the
raw.githubusercontent.com URL with the specific commit hash for the SKILL.md) or
vendor the SKILL.md into the repo and update MD_PROCESSING_SKILL_URL to a local
relative path; update any code that fetches this constant (scripts/openrouter.py
-> MD_PROCESSING_SKILL_URL) and add a short comment explaining why we pin the
revision.

188-198: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard non-mapping YAML before calling .get().

yaml.safe_load(doc_str) can return a list or scalar. In that case, Line 198 raises AttributeError on doc.get(...) and aborts the run instead of skipping malformed model output.

Suggested change
     for doc_str in source_docs:
         try:
             doc = yaml.safe_load(doc_str)
         except Exception as e:
             print(f"Warning: failed to parse source_doc; not adding it to the source list it. Error: {e}", file=sys.stderr)
             continue
-        if doc == None:
-            print(f"Warning: skipping the following yaml string as it failed to load: {doc_str}", file=sys.stderr)
+        if not isinstance(doc, dict):
+            print(f"Warning: generated source doc is not a YAML mapping; skipping: {doc_str}", file=sys.stderr)
             continue
         duplicate = False
         for src in existing:
             if doc.get('name') == src.get('name') or doc.get('uri') == src.get('uri'):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/openrouter.py` around lines 188 - 198, The YAML loader can return
non-mapping types so calling doc.get(...) can raise; update the block after
yaml.safe_load to verify the parsed doc is a mapping (e.g., isinstance(doc,
dict)) and if not, warn to stderr and continue; also guard the loop over
existing by ensuring each src is a mapping before calling src.get(...) (skip or
coerce non-mappings) so the duplicate check in the for src in existing loop (the
doc.get(...) and src.get(...) calls) never invokes .get on non-dict objects.

138-153: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use a deterministic schema sample instead of checkout mtime.

On GitHub Actions checkouts, file mtimes reflect checkout timing rather than when a source was added or last updated in git, so max(files, key=os.path.getmtime) can pick an arbitrary sample file. That makes the schema prompt unstable across runs.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/openrouter.py` around lines 138 - 153, Replace the filesystem
mtime-based selection in get_latest_source with a git-commit-time-based
selection so results are deterministic in CI: for each candidate file in files
(the list built from sources_dir), call git to get the last commit timestamp for
that path (e.g. via subprocess.run(["git","log","-1","--format=%ct","--", path],
...)), parse the epoch seconds, and use that timestamp as the key to select
latest instead of os.path.getmtime; update error handling around the subprocess
calls and ensure the variable latest is assigned from this git-timestamp-based
max selection (still using sources_dir, files and latest identifiers).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/openrouter.py`:
- Around line 196-202: The current dedupe loop only compares each new document
(doc) against on-disk files (existing), so duplicate docs generated in the same
run can slip through; update the logic in the dedup block (where variables
duplicate, existing, cleaned, doc, doc_str are used) to compare against both
existing and already-collected cleaned entries (e.g., iterate over existing +
cleaned or check cleaned entries by parsing their names/uris) and skip adding
doc_str if any entry in either set has the same doc.get('name') or
doc.get('uri'), preserving current behavior for disk dedupe while preventing
intra-run duplicates.

---

Duplicate comments:
In `@scripts/openrouter.py`:
- Line 26: MD_PROCESSING_SKILL_URL currently points to refs/heads/main making
prompt fetches non-deterministic; change it to an immutable reference by
replacing the branch URL with a commit SHA URL (use the
raw.githubusercontent.com URL with the specific commit hash for the SKILL.md) or
vendor the SKILL.md into the repo and update MD_PROCESSING_SKILL_URL to a local
relative path; update any code that fetches this constant (scripts/openrouter.py
-> MD_PROCESSING_SKILL_URL) and add a short comment explaining why we pin the
revision.
- Around line 188-198: The YAML loader can return non-mapping types so calling
doc.get(...) can raise; update the block after yaml.safe_load to verify the
parsed doc is a mapping (e.g., isinstance(doc, dict)) and if not, warn to stderr
and continue; also guard the loop over existing by ensuring each src is a
mapping before calling src.get(...) (skip or coerce non-mappings) so the
duplicate check in the for src in existing loop (the doc.get(...) and
src.get(...) calls) never invokes .get on non-dict objects.
- Around line 138-153: Replace the filesystem mtime-based selection in
get_latest_source with a git-commit-time-based selection so results are
deterministic in CI: for each candidate file in files (the list built from
sources_dir), call git to get the last commit timestamp for that path (e.g. via
subprocess.run(["git","log","-1","--format=%ct","--", path], ...)), parse the
epoch seconds, and use that timestamp as the key to select latest instead of
os.path.getmtime; update error handling around the subprocess calls and ensure
the variable latest is assigned from this git-timestamp-based max selection
(still using sources_dir, files and latest identifiers).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 8766f742-99ae-4ebb-a7f7-3d318d2cd19c

📥 Commits

Reviewing files that changed from the base of the PR and between 6b7f3d9 and e73294b.

📒 Files selected for processing (4)
  • .github/workflows/fetch_new_sources.yml
  • requirements.txt
  • scripts/openrouter.py
  • scripts/scrape_firecrawl.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • requirements.txt
  • .github/workflows/fetch_new_sources.yml

Comment thread scripts/openrouter.py
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 6 files

@semmet95 semmet95 merged commit 0bc8524 into main May 8, 2026
2 checks passed
@semmet95 semmet95 deleted the feat/new-sources-worflow branch May 8, 2026 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update Workflows section in Readme.md and AGENTS.md

1 participant