
Add boost_collector_runner app with YAML-driven Celery schedule (#76)#96

Merged
snowfox1003 merged 24 commits into cppalliance:develop from snowfox1003:dev-76
Mar 16, 2026

Conversation


@snowfox1003 snowfox1003 commented Mar 5, 2026

Summary

Introduces the boost_collector_runner app so collector runs are driven by a YAML schedule instead of code. Celery Beat is built from config/boost_collector_schedule.yaml, so adding, reordering, or rescheduling collectors only requires editing the YAML.

Changes

New app: boost_collector_runner

  • schedule_config.py – Loads and validates the YAML; exposes get_beat_schedule() for Celery Beat and get_tasks_for_schedule() for the runner. Supports schedule types: daily, weekly, monthly, interval, on_release. One Beat entry per group (group batch at default_time); interval tasks get separate Beat entries.
  • run_scheduled_collectors – Management command that runs collectors for a given schedule (and optional --group). For group runs, it executes daily + weekly (today) + monthly (today) + on_release (if there is a new Boost release) together, so all tasks in a group run in a single batch rather than in separate batches. Interval and standalone on_release runs are still supported.
  • run_collectors – Legacy command (fixed list); YAML-driven runs use run_scheduled_collectors instead.
  • tasks.py – Celery task run_scheduled_collectors_task invoked by Beat; forwards schedule/group/interval args to the management command.
  • Tests – Tests for run_collectors and run_scheduled_collectors (including YAML with default_time per group).
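The validation rules described above can be sketched roughly as follows. This is a minimal illustration, not the actual schedule_config.py; the key names (groups, default_time, tasks, command, schedule, interval_minutes) come from this PR description, while the function name and error messages are hypothetical:

```python
# Minimal sketch of YAML schedule validation as described in this PR.
# Key names follow the PR description; everything else is illustrative.
VALID_SCHEDULES = {"daily", "weekly", "monthly", "interval", "on_release"}

def validate_config(config: dict) -> None:
    """Raise ValueError if a parsed schedule config is malformed."""
    groups = config.get("groups")
    if not isinstance(groups, dict) or not groups:
        raise ValueError("config must define a non-empty 'groups' mapping")
    for group_name, group in groups.items():
        if "default_time" not in group:
            raise ValueError(f"group '{group_name}' is missing 'default_time'")
        for task in group.get("tasks", []):
            command = task.get("command")
            if not isinstance(command, str) or not command:
                raise ValueError(f"group '{group_name}': 'command' must be a non-empty string")
            schedule = task.get("schedule")
            if schedule not in VALID_SCHEDULES:
                raise ValueError(f"{command}: unknown schedule '{schedule}'")
            if schedule == "interval" and "interval_minutes" not in task:
                raise ValueError(f"{command}: interval tasks need 'interval_minutes'")
```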

Boost library tracker: on_release support

  • release_check.py – has_new_boost_release() checks the GitHub API against BoostVersion; only stable tags boost-X.Y.Z count (no beta/rc); minimum version 1.16.1.
  • check_new_boost_release – Management command (exit 0 if new release, 1 otherwise) for automation.
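The stable-tag filtering described above amounts to something like the following sketch. The rules (boost-X.Y.Z only, no beta/rc, minimum 1.16.1) come from this PR description; the function names are illustrative, not the real release_check.py API:

```python
import re

# Sketch of the stable-tag filtering described above: only tags of the
# form boost-X.Y.Z match (beta/rc tags fail the regex), with a minimum
# version of 1.16.1. Function names are illustrative.
STABLE_TAG = re.compile(r"^boost-(\d+)\.(\d+)\.(\d+)$")
MIN_VERSION = (1, 16, 1)

def parse_stable_version(tag: str):
    """Return (major, minor, patch) for a qualifying stable tag, else None."""
    m = STABLE_TAG.match(tag)
    if not m:
        return None
    version = tuple(int(part) for part in m.groups())
    return version if version >= MIN_VERSION else None

def has_new_release(tags, known_versions) -> bool:
    """True if any stable tag is not already among the known versions."""
    known = set(known_versions)
    return any(
        (v := parse_stable_version(tag)) is not None and v not in known
        for tag in tags
    )
```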

Config and settings

  • config/boost_collector_schedule.yaml – Defines groups (e.g. github, slack, mailing_list) with default_time and tasks (command, schedule, args). Times are in America/Los_Angeles (CELERY_TIMEZONE).
  • config/settings.py – Adds boost_collector_runner to INSTALLED_APPS; replaces hardcoded CELERY_BEAT_SCHEDULE with get_beat_schedule() from the YAML. Adds BOOST_COLLECTOR_SCHEDULE_YAML.
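For illustration, a schedule file using these keys might look like the fragment below. Only the key structure (groups, default_time, command, schedule, args-style selectors such as day_of_week and interval_minutes) is taken from this PR; the command names other than run_boost_library_tracker are hypothetical:

```yaml
groups:
  github:
    default_time: "02:00"          # group batch time, America/Los_Angeles
    tasks:
      - command: run_boost_library_tracker
        schedule: daily
      - command: collect_github_issues     # hypothetical command name
        schedule: weekly
        day_of_week: monday
  slack:
    default_time: "03:30"
    tasks:
      - command: collect_slack_activity    # hypothetical command name
        schedule: interval
        interval_minutes: 30
```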

Docs and deps

  • docs/Workflow.md – Documents YAML schedule, schedule types, group batch behavior, and Beat.
  • docs/Development_guideline.md – Updated for boost_collector_runner.
  • requirements.txt – Adds the PyYAML dependency.
  • conftest.py – Fixture(s) for runner tests.

Behaviour

  • Groups: One Celery Beat entry per group at that group’s default_time. When it runs, all non-interval tasks for that group run sequentially (daily, weekly for today, monthly for today, on_release if new release).
  • Interval: Separate Beat entries per interval (e.g. every N minutes); not tied to a group.
  • on_release: Runs with the group batch; release check is done before adding on_release tasks.
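The group-batch selection above boils down to a filter like the following. This is a simplified sketch assuming each task dict carries the keys shown in the YAML; the real command additionally handles args, failures, and the release check itself:

```python
from datetime import date

def tasks_for_group_batch(tasks, today: date, new_release: bool):
    """Pick the non-interval tasks that belong in today's group batch.

    Sketch of the batching rules described above (daily always; weekly
    and monthly only when today matches; on_release only if a new Boost
    release was detected; interval tasks are excluded from the batch).
    """
    weekday = today.strftime("%A").lower()   # e.g. "monday"
    selected = []
    for task in tasks:
        kind = task["schedule"]
        if kind == "daily":
            selected.append(task)
        elif kind == "weekly" and task.get("day_of_week") == weekday:
            selected.append(task)
        elif kind == "monthly" and task.get("day_of_month") == today.day:
            selected.append(task)
        elif kind == "on_release" and new_release:
            selected.append(task)
    return selected
```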

Summary by CodeRabbit

  • New Features

    • YAML-driven collector scheduling (daily/weekly/monthly/interval/on-release) with group-based default times.
    • Management command and a Celery task to run scheduled collectors; Beat schedule now built from YAML.
    • On-release detection to trigger release-only tasks.
  • Documentation

    • Workflow and development guides updated for YAML-driven scheduling.
  • Chores

    • Added PyYAML dependency, app registered, example YAML and .gitignore entry for the schedule.

@snowfox1003 snowfox1003 self-assigned this Mar 5, 2026

coderabbitai Bot commented Mar 5, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 99dc29d5-b490-40f2-8ff9-b26b78b793fc

📝 Walkthrough

Adds a new YAML-driven Boost collector scheduler: a Django app that loads/validates YAML schedules, generates Celery Beat entries, provides a management command and Celery task to run scheduled collectors, adds release-detection utilities, tests, docs, and wiring in settings/requirements.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Core app setup**<br>`boost_collector_runner/__init__.py`, `boost_collector_runner/admin.py`, `boost_collector_runner/apps.py`, `boost_collector_runner/models.py` | App metadata, AppConfig, small doc/comments; no models registered. |
| **Schedule loader & validator**<br>`boost_collector_runner/schedule_config.py` | New YAML loader/validator/normalizer; exposes load_config, get_groups_and_tasks, get_tasks_for_schedule, get_beat_schedule; builds Celery Beat entries and enforces schedule rules. |
| **Runner command & Celery task**<br>`boost_collector_runner/management/commands/run_scheduled_collectors.py`, `boost_collector_runner/tasks.py` | New management command and @shared_task wrapper that fetch tasks from YAML, validate schedule args, optionally check releases, execute collector commands sequentially, support stop-on-failure and aggregate results. |
| **Tests for runner & config**<br>`boost_collector_runner/tests/test_commands.py`, `boost_collector_runner/tests/test_schedule_config.py`, `boost_collector_runner/tests/__init__.py`, `boost_collector_runner/tests/fixtures.py` | Comprehensive unit/command tests for schedule parsing, filtering, beat generation, and command execution behavior. |
| **Release detection**<br>`boost_library_tracker/release_check.py`, `boost_library_tracker/management/commands/check_new_boost_release.py` | New has_new_boost_release() that checks GitHub tags against DB versions and a management command to expose that check (exit 0/1). |
| **Settings & CI/test wiring**<br>`config/settings.py`, `conftest.py`, `requirements.txt`, `.gitignore` | Registers app in INSTALLED_APPS, adds BOOST_COLLECTOR_SCHEDULE_YAML, loads CELERY_BEAT_SCHEDULE via schedule_config.get_beat_schedule() with fallback, enables CELERY_ENABLE_UTC, adds PyYAML dependency, adds pytest fixture plugin and ignores local YAML. |
| **Example config & docs**<br>`config/boost_collector_schedule.yaml.example`, `docs/Workflow.md`, `docs/Development_guideline.md` | Example YAML for groups/tasks/schedules; docs updated to describe dual-mode execution (fixed workflow vs YAML), Beat generation and CLI usage. |
| **GitHub token handling**<br>`github_ops/tokens.py` | Adds GITHUB_TOKENS_SCRAPING support with cycling/strip; consolidates write token resolution and clearer errors. |
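The token cycling mentioned for github_ops/tokens.py could look roughly like this. The environment variable name comes from the walkthrough; the function name is hypothetical:

```python
import itertools
import os

# Sketch of comma-separated token cycling with whitespace stripping, as
# described for github_ops/tokens.py. GITHUB_TOKENS_SCRAPING is the
# variable named in the walkthrough; the function name is illustrative.
def scraping_token_cycle(env_var: str = "GITHUB_TOKENS_SCRAPING"):
    """Return an endless iterator over the configured tokens."""
    raw = os.environ.get(env_var, "")
    tokens = [t.strip() for t in raw.split(",") if t.strip()]
    if not tokens:
        raise ValueError(f"no GitHub tokens configured in {env_var}")
    return itertools.cycle(tokens)
```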

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Beat as Celery Beat
    participant Task as run_scheduled_collectors_task
    participant Cmd as run_scheduled_collectors (management cmd)
    participant Config as schedule_config
    participant YAML as YAML file
    participant GitHub as boost_library_tracker.release_check
    participant Collector as collector command (call_command)

    Beat->>Task: trigger (schedule_kind, params)
    Task->>Cmd: call_command("run_scheduled_collectors", args)
    Cmd->>Config: get_tasks_for_schedule(schedule_kind, filters...)
    Config->>YAML: load_config()
    YAML-->>Config: YAML content
    Config-->>Cmd: list of (group, task) to run

    alt schedule_kind is on_release or default with release check
        Cmd->>GitHub: has_new_boost_release()
        GitHub-->>Cmd: True/False
    end

    Cmd->>Collector: call_command(task_name, task_args) [sequential]
    Collector-->>Cmd: success/failure
    Cmd-->>Task: aggregated exit/status
    Task-->>Beat: completion
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • jonathanMLDev

Poem

🐇 I nibble YAML leaves at dawn,

Groups and times neatly drawn,
Beat drums tap, commands take flight,
Collectors hum from morn to night,
Hooray—no hardcode in my burrow bright!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: introducing a new Django app (boost_collector_runner) with a YAML-driven Celery schedule system. It directly reflects the core feature addition across multiple files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 96.72% which is sufficient. The required threshold is 80.00%. |



@snowfox1003
Collaborator Author

@CodeRabbit full review


coderabbitai Bot commented Mar 5, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (3)
docs/Development_guideline.md (1)

102-102: Reword this list item for readability.

Line 102 is a bit repetitive and dense; a small rewrite will make it easier to scan.

✏️ Suggested wording tweak
```diff
-4. Add the command to the list run by `run_all_collectors` (workflow app) or add it to `config/boost_collector_schedule.yaml` under the right group with the right schedule (see [Workflow.md](Workflow.md#2-boost-collector-runner-and-yaml-schedule)).
+4. Register the command in either execution path:
+   - add it to `run_all_collectors` (workflow app), or
+   - place it in `config/boost_collector_schedule.yaml` under the appropriate group/schedule (see [Workflow.md](Workflow.md#2-boost-collector-runner-and-yaml-schedule)).
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/Development_guideline.md` at line 102, Rewrite the list item to be
clearer and less repetitive: mention adding the command either to the
run_all_collectors list used by the workflow app (referencing
run_all_collectors) or to the boost schedule file
(config/boost_collector_schedule.yaml) under the appropriate group and schedule,
and add a brief parenthetical pointer to the relevant docs
(Workflow.md#2-boost-collector-runner-and-yaml-schedule); keep the sentence
concise and parallel in structure so it reads like two simple alternatives.
boost_collector_runner/tests/test_commands.py (1)

130-139: Remove redundant import of call_command.

call_command is already imported at line 7. The local import at line 133 is unnecessary.

♻️ Proposed fix
```diff
 @pytest.mark.django_db
 def test_run_scheduled_collectors_requires_schedule():
     """run_scheduled_collectors without --schedule raises CommandError."""
-    from django.core.management import call_command
-
     out = StringIO()
     err = StringIO()
     with pytest.raises(CommandError) as exc_info:
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/tests/test_commands.py` around lines 130 - 139, Remove
the redundant local import of call_command inside the test function
test_run_scheduled_collectors_requires_schedule; since call_command is already
imported at module scope, delete the line "from django.core.management import
call_command" within the function so the test uses the existing module-level
import and avoids a duplicate import.
boost_collector_runner/management/commands/run_scheduled_collectors.py (1)

160-180: Rename unused loop variable task_group_id to _task_group_id.

The loop variable is unpacked but never used within the loop body. Per Python convention, prefix with underscore to indicate it's intentionally unused.

♻️ Proposed fix
```diff
-        for task_group_id, task in tasks:
+        for _task_group_id, task in tasks:
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/management/commands/run_scheduled_collectors.py`
around lines 160 - 180, The loop unpacks (task_group_id, task) but never uses
task_group_id; rename it to _task_group_id in the for loop (`for _task_group_id,
task in tasks:`) to follow Python convention for intentionally unused variables
(in file run_scheduled_collectors.py around the loop that calls call_command and
appends to results), leaving the rest of the loop body unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@boost_collector_runner/schedule_config.py`:
- Around line 247-260: Validate weekly/monthly selector values immediately after
parsing: when schedule_kind == "weekly" call _normalize_day_of_week(day_of_week)
and if it returns an invalid/None result raise ValueError("invalid day_of_week
for schedule_kind='weekly'"); when schedule_kind == "monthly" convert
day_of_month to int (day_of_month_int) and assert 1 <= day_of_month_int <= 31,
otherwise raise ValueError("day_of_month must be 1-31 for
schedule_kind='monthly'"); keep the existing interval validation for
interval_minutes and reuse day_of_week_full and day_of_month_int identifiers so
invalid selectors fail fast instead of silently producing no tasks.

In `@boost_library_tracker/release_check.py`:
- Around line 53-70: The code compares raw GitHub tag strings like
"boost-1.90.0" to DB values like "1.90.0", causing false positives; update the
release loop in the client.rest_request pagination to normalize tags via
_parse_stable_version and compare that parsed_version (e.g., the "1.90.0"
returned by _parse_stable_version) against the existing set before deciding it's
new. Specifically, in the loop around BoostVersion and tag handling, call
parsed_version = _parse_stable_version(tag), skip if parsed_version is None or
parsed_version in existing, and only log/return True when parsed_version is not
in existing.
- Around line 48-52: The call to get_github_token() must be moved inside the
existing try block so its ValueError is caught; update the function so you
instantiate token = get_github_token() and client = GitHubAPIClient(token)
inside the try, add an except ValueError branch that logs a warning (same
message used originally) and returns False, and keep other exceptions handled as
before so failures fall back gracefully.

In `@config/settings.py`:
- Around line 287-296: The try/except around from
boost_collector_runner.schedule_config import get_beat_schedule leaves
CELERY_BEAT_SCHEDULE undefined on failure; catch the exception, set
CELERY_BEAT_SCHEDULE to a safe explicit fallback (e.g., an empty dict or default
schedule) and log the full exception/traceback by using
logging.getLogger(__name__).exception or including exc_info=True in the error
call so startup behavior is deterministic; update the except block that
references get_beat_schedule and CELERY_BEAT_SCHEDULE accordingly.

In `@docs/Workflow.md`:
- Line 37: Update the "daily" description to match the implementation: remove
the phrase "(or task's)" so the line under the **daily** schedule reads that it
runs every day at the group's time (tasks use the group's default_time). Change
the text associated with **daily** to reference the group's/default_time only to
avoid implying tasks have their own time.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2de00502-33a1-4a9f-8afa-2fcc133e6e83

📥 Commits

Reviewing files that changed from the base of the PR and between 2a2e0ea and 6fda649.

📒 Files selected for processing (22)
  • boost_collector_runner/__init__.py
  • boost_collector_runner/admin.py
  • boost_collector_runner/apps.py
  • boost_collector_runner/management/__init__.py
  • boost_collector_runner/management/commands/__init__.py
  • boost_collector_runner/management/commands/run_collectors.py
  • boost_collector_runner/management/commands/run_scheduled_collectors.py
  • boost_collector_runner/migrations/__init__.py
  • boost_collector_runner/models.py
  • boost_collector_runner/schedule_config.py
  • boost_collector_runner/tasks.py
  • boost_collector_runner/tests/__init__.py
  • boost_collector_runner/tests/fixtures.py
  • boost_collector_runner/tests/test_commands.py
  • boost_library_tracker/management/commands/check_new_boost_release.py
  • boost_library_tracker/release_check.py
  • config/boost_collector_schedule.yaml
  • config/settings.py
  • conftest.py
  • docs/Development_guideline.md
  • docs/Workflow.md
  • requirements.txt

@snowfox1003
Collaborator Author

@coderabbitai full review


coderabbitai Bot commented Mar 5, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (4)
boost_collector_runner/tasks.py (1)

50-59: Address static analysis hints for logging.

The error handling logic is correct, but the logging can be improved:

  1. Line 52-55: logger.error won't include the traceback. Using logger.exception would provide more diagnostic context.
  2. Line 58: The exception object e is redundant in logger.exception since it automatically captures the exception info.
♻️ Proposed fix
```diff
     except SystemExit as e:
         if e.code != 0:
-            logger.error(
-                "run_scheduled_collectors_task: command exited with code %s",
-                e.code,
-            )
+            logger.exception(
+                "run_scheduled_collectors_task: command exited with code %s",
+                e.code,
+            )
             raise
-    except Exception as e:
-        logger.exception("run_scheduled_collectors_task failed: %s", e)
+    except Exception:
+        logger.exception("run_scheduled_collectors_task failed")
         raise
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/tasks.py` around lines 50 - 59, In the except
SystemExit block inside run_scheduled_collectors_task, replace the logger.error
call with logger.exception so the traceback is captured when e.code != 0 (keep
the same message wording but use logger.exception and still re-raise); in the
generic except Exception block remove the redundant exception argument from
logger.exception and call it with just the message
"run_scheduled_collectors_task failed" (logger.exception already captures
exception info) and then re-raise.
docs/Development_guideline.md (1)

8-20: Consider explicitly listing run_collectors in fixed-list examples.

These sections mention fixed-list execution via run_all_collectors, but this PR also introduces/retains run_collectors for fixed-order runs. Adding it here would reduce contributor confusion during local command testing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/Development_guideline.md` around lines 8 - 20, Update the documentation
to include the alternate fixed-list management command name by adding
`run_collectors` alongside `run_all_collectors` in the examples and
descriptions; mention that both `run_collectors` and `run_all_collectors` are
valid fixed-order entry points implemented as Django management commands in
`management/commands/` (e.g., `run_boost_library_tracker.py`), and keep the
existing reference to `boost_collector_runner` and
`config/boost_collector_schedule.yaml` for YAML-driven scheduling.
config/boost_collector_schedule.yaml (1)

9-29: Add optional validation to catch command name typos at config load time rather than execution time.

Lines 9–29 define scheduled commands by string name. The current validation in schedule_config.py (lines 117–121) only verifies that command is a non-empty string; it does not check whether the corresponding management command module actually exists. A typo in any command name will only surface at execution time when Celery Beat attempts to invoke it.

Since all six commands referenced in this file have matching management command modules, this is not a current blocker. However, adding an import check during config load would prevent future misconfigurations and improve reliability.
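For illustration, the suggested opt-in check could be sketched as follows. The helper name and the enable flag are hypothetical; a Django project might instead resolve command names via django.core.management.get_commands():

```python
import importlib.util

# Illustrative sketch of an opt-in import check for scheduled command
# names; the module path pattern ("<app>.management.commands.<command>")
# and the enable flag are assumptions, not the project's actual layout.
def validate_command_module(module: str, enabled: bool = False) -> None:
    """Raise ValueError if a dotted command-module path is not importable."""
    if not enabled:
        return  # check disabled by default
    try:
        found = importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        found = False  # a parent package is missing entirely
    if not found:
        raise ValueError(f"command module '{module}' is not importable")
```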

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/boost_collector_schedule.yaml` around lines 9 - 29, Update the config
load validation in schedule_config.py (the block that currently checks the
`command` key at lines ~117–121) to optionally verify that the named management
command can be imported at load time: for each scheduled task entry with a
`command` string, attempt to import the corresponding Django management command
module (e.g.
importlib.import_module(f"{app_namespace}.management.commands.{command}") or
otherwise resolve your project's management command lookup) and raise a clear
validation error if the import fails; make this import-check opt-in (via a flag
or environment variable) so it can be disabled in contexts where imports are
undesirable, and ensure the error message includes the `command` value so typos
are easy to spot.
boost_collector_runner/schedule_config.py (1)

322-323: Remove duplicated docstring sentence.

The same fallback sentence is repeated twice; trimming it improves readability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/schedule_config.py` around lines 322 - 323, Remove the
duplicated sentence in the docstring that reads "If the YAML file does not exist
or is invalid, returns {} (no beat schedule)." — keep a single instance of that
fallback sentence in the docstring for the function handling the schedule/YAML
parsing (the docstring containing that exact line) to improve readability.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@boost_collector_runner/management/commands/run_collectors.py`:
- Around line 53-70: Add an explicit SystemExit handler around the call_command
block to guard against nested sys.exit() calls: catch SystemExit (before the
generic Exception clause), extract a numeric exit code from the exception (use
getattr(e, "code", 1) or default to 1), log the failure via logger (similar
message format used for CommandError), append the (name, code) entry to results,
set exit_code accordingly and respect stop_on_failure by breaking if set;
reference the existing call_command invocation, CommandError handling, logger,
results, exit_code and stop_on_failure to implement this defensive branch.

In `@boost_collector_runner/management/commands/run_scheduled_collectors.py`:
- Around line 167-183: The child collector commands can raise SystemExit which
currently bypasses your except Exception/CommandError handlers; add an explicit
except SystemExit as e block around the call_command(name, *args) invocation to
capture e.code (or 1 if falsy), append (name, code) to results, set exit_code
accordingly, log the failure (use logger.error or logger.exception with the
command name), and honor stop_on_failure just like the other handlers so result
accounting, summary display, and final exit handling remain correct; reference
the call_command invocation, CommandError handling, results list, exit_code
variable, stop_on_failure flag, and logger used in this function.
- Around line 67-69: Replace the naive date.today() usage with a timezone-aware
date computed from the configured CELERY_TIMEZONE so weekly/monthly batching
uses the Celery/Django timezone; e.g., read settings.CELERY_TIMEZONE, build a tz
object (pytz.timezone or zoneinfo.ZoneInfo), compute today =
datetime.now(tz).date(), then derive today_weekday =
today.strftime("%A").lower() and today_day = today.day (update the code around
the today/today_weekday/today_day assignments in run_scheduled_collectors.py to
use this timezone-aware today).

In `@boost_collector_runner/schedule_config.py`:
- Around line 337-345: The current try/except around load_config only catches
YAML errors during parsing but allows ValueError from the normalization/build
phase (e.g., in _collect_distinct_schedules or schedule construction) to
propagate; expand the try block in get_beat_schedule to wrap both
load_config(...) and the subsequent normalization/build steps that call
_collect_distinct_schedules(...) and the loop that constructs schedule so any
ValueError or yaml.YAMLError raised during normalization will be caught, log the
same warning (including the exception) and return {} as the graceful fallback.

In `@docs/Workflow.md`:
- Line 54: Update the sentence about interval tasks to clarify that although
"Interval tasks" are configured under "groups" (and inherit group-level
configuration like "default_time"), they are not executed as part of the group's
sequential run; instead each interval task gets its own Celery Beat entry and
runs independently at execution time. Keep references to "groups",
"default_time", "Interval tasks", and "Celery Beat" in the revised wording so
readers understand configuration vs. runtime behavior.
- Line 10: Update the schedule-type list in the docs for the
boost_collector_runner app to include "interval" for consistency with the
supported types; specifically, in the paragraph that describes
config/boost_collector_schedule.yaml and the valid schedule types (daily,
weekly, monthly, on_release), add "interval" so the example and the description
of the python manage.py run_scheduled_collectors command reflect that interval
schedules are supported by the YAML and Celery Beat.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 90b5a9ab-8305-474a-8970-bcb0f1851a2b

📥 Commits

Reviewing files that changed from the base of the PR and between 2a2e0ea and e91de6d.

📒 Files selected for processing (22)
  • boost_collector_runner/__init__.py
  • boost_collector_runner/admin.py
  • boost_collector_runner/apps.py
  • boost_collector_runner/management/__init__.py
  • boost_collector_runner/management/commands/__init__.py
  • boost_collector_runner/management/commands/run_collectors.py
  • boost_collector_runner/management/commands/run_scheduled_collectors.py
  • boost_collector_runner/migrations/__init__.py
  • boost_collector_runner/models.py
  • boost_collector_runner/schedule_config.py
  • boost_collector_runner/tasks.py
  • boost_collector_runner/tests/__init__.py
  • boost_collector_runner/tests/fixtures.py
  • boost_collector_runner/tests/test_commands.py
  • boost_library_tracker/management/commands/check_new_boost_release.py
  • boost_library_tracker/release_check.py
  • config/boost_collector_schedule.yaml
  • config/settings.py
  • conftest.py
  • docs/Development_guideline.md
  • docs/Workflow.md
  • requirements.txt

@snowfox1003
Collaborator Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Mar 5, 2026

✅ Actions performed

Full review triggered.

@snowfox1003
Collaborator Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Mar 5, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (3)
boost_library_tracker/release_check.py (1)

77-79: Preserve traceback on unexpected failures

Line 77-79 logs only the exception message; that makes production debugging harder. Prefer logger.exception(...) here to keep stack traces while still returning False.

♻️ Proposed diff
-    except Exception as e:
-        logger.warning("Failed to check for new Boost release: %s", e)
+    except Exception:
+        logger.exception("Failed to check for new Boost release")
         return False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_tracker/release_check.py` around lines 77 - 79, Replace the
logger.warning call in the except block of release_check.py so the full
traceback is preserved: instead of logger.warning("Failed to check for new Boost
release: %s", e) call logger.exception with a similar message (e.g., "Failed to
check for new Boost release") and keep the surrounding except Exception as e
handler and the subsequent return False; refer to the existing except block and
the logger variable to locate where to change the call.
boost_collector_runner/tests/test_commands.py (1)

46-59: Strengthen the execution assertion with mock call verification.

Current assertions rely on stdout text only. Consider also asserting the patched call_command invocation (e.g., called once with run_boost_library_tracker) so task dispatch behavior is validated directly, not just logs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/tests/test_commands.py` around lines 46 - 59, The test
currently only checks stdout text; also capture and verify the patched
call_command invocation to ensure task dispatch happened: when patching
"boost_collector_runner.management.commands.run_scheduled_collectors.call_command"
assign it to a mock (e.g., mock_call_command) and after calling
call_command("run_scheduled_collectors", "--schedule", "daily", ...) add an
assertion like mock_call_command.assert_any_call("run_boost_library_tracker",
ANY, ANY) or mock_call_command.assert_any_call("run_boost_library_tracker") (use
appropriate args/kwargs matching the real dispatcher) to confirm the dispatcher
invoked run_boost_library_tracker.
boost_collector_runner/management/commands/run_scheduled_collectors.py (1)

83-89: Don’t silently swallow missing boost_library_tracker in group batching.

This except ImportError: pass makes on-release exclusion opaque at runtime. Please log a warning when import fails so skipped behavior is observable.

Based on learnings, this project prefers graceful degradation with monitored errors, not silent suppression.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/management/commands/run_scheduled_collectors.py`
around lines 83 - 89, Replace the silent except ImportError: pass in the block
that imports boost_library_tracker.release_check (the has_new_boost_release
check that conditionally extends tasks with get_tasks_for_schedule("on_release",
group_id=group_id)) with an explicit warning log: catch ImportError as e and
call logger.warning (or create module logger via logging.getLogger(__name__) if
none exists) including context (e.g., "boost_library_tracker not available,
skipping on_release checks" and the group_id and exception message) so the
missing dependency is visible while preserving the current fallback behavior of
skipping on-release tasks.
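The pattern the reviewer asks for can be sketched as follows. This is a minimal illustration, not the actual command code: `gather_on_release_tasks` is a hypothetical helper name, and the `get_tasks_for_schedule` callable is passed in rather than imported, purely to keep the sketch self-contained.

```python
import logging

logger = logging.getLogger(__name__)


def gather_on_release_tasks(group_id, get_tasks_for_schedule):
    """Return on-release tasks for a group, logging (not hiding) a missing app.

    Hypothetical helper illustrating the review suggestion; the real logic
    lives inline in run_scheduled_collectors.
    """
    try:
        from boost_library_tracker.release_check import has_new_boost_release
    except ImportError as e:
        # Graceful degradation, but observable: warn instead of a silent pass.
        logger.warning(
            "boost_library_tracker not available, skipping on_release checks "
            "for group %r: %s",
            group_id,
            e,
        )
        return []
    if has_new_boost_release():
        return get_tasks_for_schedule("on_release", group_id=group_id)
    return []
```

The fallback behavior (skip on-release tasks) is unchanged; the only difference is that the skip leaves a trace in the logs.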
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@boost_collector_runner/schedule_config.py`:
- Around line 127-133: The monthly schedule block currently calls int(on_val)
directly which raises an unscoped ValueError for non-numeric values; update the
logic in the "monthly" branch (where schedule == "monthly", on_val, task,
command, group_id are used) to wrap the int(on_val) conversion in a try/except
like the interval parsing does, and on any exception raise the same task-scoped
ValueError message used now (f"Task {command!r} in group {group_id!r}:
'schedule: monthly' requires 'on' (1-31, day of month)"). Ensure you still
validate the numeric range (1-31) after successful conversion.
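A sketch of the scoped validation the comment describes, reusing the task-scoped error message quoted above. `validate_monthly_on` is a hypothetical standalone helper; in the real module the logic sits inline in the `monthly` branch.

```python
def validate_monthly_on(task, command, group_id):
    """Validate a monthly task's 'on' day with a task-scoped error message."""
    on_val = task.get("on", task.get("day_of_month"))
    try:
        # int() raises TypeError for None and ValueError for e.g. "15th".
        on_int = int(on_val)
    except (TypeError, ValueError):
        on_int = None
    if on_int is None or not (1 <= on_int <= 31):
        raise ValueError(
            f"Task {command!r} in group {group_id!r}: "
            f"'schedule: monthly' requires 'on' (1-31, day of month)"
        )
    return on_int
```

With this shape, a non-numeric value like `on: "15th"` produces the same clear, task-scoped message as a missing or out-of-range value, instead of a raw conversion error.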

In `@config/settings.py`:
- Line 285: Update the inline comment that currently says "fallback to workflow
daily" to accurately reflect the runtime behavior: the code falls back to an
empty beat schedule ({} ) when the YAML for boost_collector_runner is missing;
locate the comment near the boost_collector_runner schedule setup and change
phrasing to indicate an empty/default beat schedule fallback so operators aren't
misled.

In `@docs/Workflow.md`:
- Line 100: The docs incorrectly state a fallback to the workflow app’s single
daily task; update docs/Workflow.md to reflect the actual behavior in
config/settings.py where YAML load failures result in CELERY_BEAT_SCHEDULE = {},
i.e., no autogenerated schedule is used when boost_collector_schedule.yaml fails
to load; mention the boost_collector_runner and boost_collector_schedule.yaml
and that the system will end up with an empty CELERY_BEAT_SCHEDULE rather than
the workflow app’s daily task.
- Around line 5-10: Update the overview sentence that currently asserts “no
parallel execution” to clarify the scope: change it to state there is no
parallel execution within a single command run/batch (e.g., when running python
manage.py run_all_collectors or run_scheduled_collectors), while noting that
Celery Beat entries (workflow.tasks.run_all_collectors_task and the YAML-driven
schedule in config/boost_collector_schedule.yaml) can cause different groups or
scheduled jobs to run in parallel across workers; reference the commands
run_all_collectors and run_scheduled_collectors and the YAML groups when editing
the text to avoid contradiction.

In `@requirements.txt`:
- Line 10: Update the PyYAML requirement in requirements.txt from "PyYAML>=6.0"
to "PyYAML>=6.0.1" to avoid build failures with Cython 3.x; locate the existing
requirement line (PyYAML>=6.0) and change it to PyYAML>=6.0.1 so installs pick
up the fixed packaging metadata in 6.0.1+.

---

Nitpick comments:
In `@boost_collector_runner/management/commands/run_scheduled_collectors.py`:
- Around line 83-89: Replace the silent except ImportError: pass in the block
that imports boost_library_tracker.release_check (the has_new_boost_release
check that conditionally extends tasks with get_tasks_for_schedule("on_release",
group_id=group_id)) with an explicit warning log: catch ImportError as e and
call logger.warning (or create module logger via logging.getLogger(__name__) if
none exists) including context (e.g., "boost_library_tracker not available,
skipping on_release checks" and the group_id and exception message) so the
missing dependency is visible while preserving the current fallback behavior of
skipping on-release tasks.

In `@boost_collector_runner/tests/test_commands.py`:
- Around line 46-59: The test currently only checks stdout text; also capture
and verify the patched call_command invocation to ensure task dispatch happened:
when patching
"boost_collector_runner.management.commands.run_scheduled_collectors.call_command"
assign it to a mock (e.g., mock_call_command) and after calling
call_command("run_scheduled_collectors", "--schedule", "daily", ...) add an
assertion like mock_call_command.assert_any_call("run_boost_library_tracker",
ANY, ANY) or mock_call_command.assert_any_call("run_boost_library_tracker") (use
appropriate args/kwargs matching the real dispatcher) to confirm the dispatcher
invoked run_boost_library_tracker.

In `@boost_library_tracker/release_check.py`:
- Around line 77-79: Replace the logger.warning call in the except block of
release_check.py so the full traceback is preserved: instead of
logger.warning("Failed to check for new Boost release: %s", e) call
logger.exception with a similar message (e.g., "Failed to check for new Boost
release") and keep the surrounding except Exception as e handler and the
subsequent return False; refer to the existing except block and the logger
variable to locate where to change the call.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: be40bd87-f825-4d61-ba8a-3d9020784441

📥 Commits

Reviewing files that changed from the base of the PR and between 2a2e0ea and 68732a6.

📒 Files selected for processing (21)
  • boost_collector_runner/__init__.py
  • boost_collector_runner/admin.py
  • boost_collector_runner/apps.py
  • boost_collector_runner/management/__init__.py
  • boost_collector_runner/management/commands/__init__.py
  • boost_collector_runner/management/commands/run_scheduled_collectors.py
  • boost_collector_runner/migrations/__init__.py
  • boost_collector_runner/models.py
  • boost_collector_runner/schedule_config.py
  • boost_collector_runner/tasks.py
  • boost_collector_runner/tests/__init__.py
  • boost_collector_runner/tests/fixtures.py
  • boost_collector_runner/tests/test_commands.py
  • boost_library_tracker/management/commands/check_new_boost_release.py
  • boost_library_tracker/release_check.py
  • config/boost_collector_schedule.yaml
  • config/settings.py
  • conftest.py
  • docs/Development_guideline.md
  • docs/Workflow.md
  • requirements.txt

Comment thread boost_collector_runner/schedule_config.py
Comment thread config/settings.py Outdated
Comment thread docs/Workflow.md Outdated
Comment thread docs/Workflow.md Outdated
Comment thread requirements.txt Outdated
@coderabbitai

coderabbitai Bot commented Mar 5, 2026

✅ Actions performed

Full review triggered.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (5)
requirements.txt (1)

10-10: ⚠️ Potential issue | 🟡 Minor

Bump minimum PyYAML to 6.0.1+.

PyYAML>=6.0 still permits 6.0.0, which is the problematic build range already discussed in earlier review rounds. Recommend setting minimum to 6.0.1.

🔧 Proposed dependency fix
-PyYAML>=6.0
+PyYAML>=6.0.1
#!/bin/bash
set -euo pipefail

python - <<'PY'
import json, urllib.request
url = "https://pypi.org/pypi/PyYAML/json"
data = json.load(urllib.request.urlopen(url))
latest = data["info"]["version"]
has_600 = "6.0" in data["releases"]
has_601 = "6.0.1" in data["releases"]
print(f"Latest PyYAML on PyPI: {latest}")
print(f"6.0 available: {has_600}")
print(f"6.0.1 available: {has_601}")
PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@requirements.txt` at line 10, Update the PyYAML version spec in
requirements.txt to avoid allowing the problematic 6.0 release: replace the
existing "PyYAML>=6.0" entry with "PyYAML>=6.0.1" so the package resolver will
exclude 6.0 and allow 6.0.1 or newer.
docs/Workflow.md (2)

100-100: ⚠️ Potential issue | 🟡 Minor

Update fallback behavior to match settings runtime.

This line says runner falls back to the workflow app’s daily task, but current settings fallback is an empty CELERY_BEAT_SCHEDULE.

📝 Proposed fix
-- Task scheduling – Celery and Celery Beat run tasks on a schedule (e.g. daily at 1:00 AM PST). When using **boost_collector_runner**, the Beat schedule is generated from `config/boost_collector_schedule.yaml`; otherwise the **workflow** app’s single daily task is used.
+- Task scheduling – Celery and Celery Beat run tasks on a schedule (e.g. daily at 1:00 AM PST). When using **boost_collector_runner**, the Beat schedule is generated from `config/boost_collector_schedule.yaml`; if YAML loading fails, `CELERY_BEAT_SCHEDULE` falls back to `{}` (no autogenerated schedule).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/Workflow.md` at line 100, Update the docs to reflect current runtime
settings: change the statement that using boost_collector_runner falls back to
the workflow app’s single daily task and instead state that when
boost_collector_runner is not used the fallback is an empty CELERY_BEAT_SCHEDULE
(i.e., no automatic scheduled tasks), and mention that explicit schedule files
like config/boost_collector_schedule.yaml are required or that management
commands (python manage.py run_all_collectors / run_scheduled_collectors
--schedule daily) must be used to trigger collectors; reference
boost_collector_runner, config/boost_collector_schedule.yaml,
CELERY_BEAT_SCHEDULE and the workflow app in the updated sentence.

5-10: ⚠️ Potential issue | 🟡 Minor

Clarify the scope of “no parallel execution.”

This currently conflicts with later text describing parallel runs across workers for different Beat entries/groups. Scope the statement to a single command run/batch.

📝 Proposed fix
-Collectors run one after another—no parallel execution.
+Within a single command run/batch, collectors run one after another (no in-batch parallel execution). Separate Celery Beat entries can still run in parallel across workers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/Workflow.md` around lines 5 - 10, Clarify that "no parallel execution"
refers only to the execution within a single management-command invocation/batch
(e.g., a single run_all_collectors or a single run_scheduled_collectors run) or
a single Beat job: update the Workflow.md text to state that collectors executed
by a single command (run_all_collectors) or a single scheduled
run_scheduled_collectors group are run sequentially, but note that different
Beat entries/groups (from config/boost_collector_schedule.yaml) or separate
workers may run in parallel across groups/entries.
config/settings.py (1)

285-285: ⚠️ Potential issue | 🟡 Minor

Align inline fallback comment with actual runtime fallback.

The comment states fallback to workflow daily, but the actual fallback is an empty beat schedule ({}).

📝 Proposed fix
-# Schedule from YAML (boost_collector_runner); fallback to workflow daily if YAML missing
+# Schedule from YAML (boost_collector_runner); fallback to empty beat schedule if YAML load fails
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/settings.py` at line 285, Update the inline comment next to the
"Schedule from YAML (boost_collector_runner)" note to reflect the actual runtime
behavior: it falls back to an empty beat schedule ({}) rather than "workflow
daily". Locate the comment near the boost_collector_runner YAML schedule
handling in config/settings.py and change the wording to state that the fallback
is an empty beat schedule (or alternatively update the fallback logic to use the
workflow daily schedule if that behavior is intended).
boost_collector_runner/schedule_config.py (1)

127-133: ⚠️ Potential issue | 🟡 Minor

Handle non-numeric monthly on values with scoped validation.

Line 129 can raise a raw conversion ValueError, bypassing your task-scoped message.

🔧 Proposed fix
     if schedule == "monthly":
         on_val = task.get("on") if "on" in task else task.get("day_of_month")
-        if on_val is None or not (1 <= int(on_val) <= 31):
+        try:
+            on_val_int = int(on_val)
+        except (TypeError, ValueError):
+            on_val_int = None
+        if on_val_int is None or not (1 <= on_val_int <= 31):
             raise ValueError(
                 f"Task {command!r} in group {group_id!r}: "
                 f"'schedule: monthly' requires 'on' (1-31, day of month)"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/schedule_config.py` around lines 127 - 133, The
monthly schedule block can raise a raw ValueError when converting non-numeric on
values; update the validation in the schedule == "monthly" branch (the on_val
check around variables on_val, task, command, group_id) to first verify on_val
is numeric (e.g., using str.isdigit() or a safe cast check) before calling
int(on_val), and if it's missing or non-numeric or out of range raise a
ValueError that includes the task-scoped message (the same f"Task {command!r} in
group {group_id!r}: 'schedule: monthly' requires 'on' (1-31, day of month)").
🧹 Nitpick comments (1)
boost_collector_runner/tests/test_commands.py (1)

46-59: Assert the mocked sub-command call explicitly.

Current assertions rely on stdout text only. Add a direct assertion on the patched call_command invocation to harden this test against output-only false positives.

✅ Proposed enhancement
-    with patch(
+    with patch(
         "boost_collector_runner.management.commands.run_scheduled_collectors.call_command",
         return_value=None,
-    ):
+    ) as call_command_mock:
         call_command(
             "run_scheduled_collectors",
             "--schedule",
             "daily",
             stdout=out,
             stderr=err,
         )
+    call_command_mock.assert_called_once_with("run_boost_library_tracker")
     content = out.getvalue()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/tests/test_commands.py` around lines 46 - 59, Add an
explicit assertion that the mocked call_command was invoked with the expected
sub-command instead of relying solely on stdout: inside the patch context that
mocks
boost_collector_runner.management.commands.run_scheduled_collectors.call_command,
assert the mock (the patched call_command) was called and that one of the call
args or calls includes "run_boost_library_tracker" (and optionally the
"--schedule", "daily" args) after invoking
call_command("run_scheduled_collectors", ...); use the mock's assert_called /
assert_any_call / call_args to validate the sub-command invocation in the
test_commands.py test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@boost_collector_runner/management/commands/run_scheduled_collectors.py`:
- Around line 83-89: The try/except around importing has_new_boost_release
should not silently swallow ImportError; modify the block in
run_scheduled_collectors so that when ImportError is raised you log or report
the exception (e.g., using the module logger or existing error reporting like
Sentry) including the exception details and context, then continue to allow
other tasks to run; specifically, catch ImportError as e and call
logger.exception or error-reporting API before skipping the
get_tasks_for_schedule("on_release", group_id=group_id) call.

In `@boost_collector_runner/schedule_config.py`:
- Around line 265-268: When schedule_kind == "interval", avoid letting
int(interval_minutes) raise raw exceptions: first attempt to convert
interval_minutes to an int (e.g., minutes_int) inside a try/except that catches
ValueError/TypeError and raises a clear ValueError like "interval_minutes must
be an integer"; then perform the range check 1 <= minutes_int <=
INTERVAL_MINUTES_MAX and raise the existing ValueError message if out of range.
Use the symbols schedule_kind, interval_minutes, minutes_int (converted value)
and INTERVAL_MINUTES_MAX to locate and update the logic.
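The interval branch can follow the same shape the prompt describes. A minimal sketch, assuming an `INTERVAL_MINUTES_MAX` bound of 1440 (the real constant lives in `schedule_config.py` and may differ):

```python
INTERVAL_MINUTES_MAX = 1440  # assumed bound for this sketch


def parse_interval_minutes(interval_minutes):
    """Convert and range-check interval_minutes with clear, scoped errors."""
    try:
        minutes_int = int(interval_minutes)
    except (TypeError, ValueError):
        # Non-numeric or missing input gets a readable message, not a raw
        # conversion traceback.
        raise ValueError("interval_minutes must be an integer")
    if not (1 <= minutes_int <= INTERVAL_MINUTES_MAX):
        raise ValueError(
            f"interval_minutes must be between 1 and {INTERVAL_MINUTES_MAX}"
        )
    return minutes_int
```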

In `@boost_library_tracker/release_check.py`:
- Around line 74-76: The current except ValueError block is too broad and treats
any ValueError as a token problem; narrow the try/except to only the token
acquisition call (the get_github_token or equivalent token retrieval function)
so that only failures from that call trigger logger.warning("No GitHub token;
cannot check for new Boost release: %s", e) and return False, while other
ValueError exceptions in the rest of release_check.py are allowed to propagate
(or be handled separately); move the try/except so it wraps only the token
retrieval expression and keep the logger.warning and return False inside that
narrowed except.
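The narrowing the comment asks for can be sketched like this. `get_github_token` and `check_with_token` are stand-ins passed as parameters so the sketch is self-contained; the real function in `release_check.py` calls its own helpers directly.

```python
import logging

logger = logging.getLogger(__name__)


def check_release(get_github_token, check_with_token):
    """Wrap only token acquisition in try/except ValueError."""
    try:
        token = get_github_token(use="scraping")
    except ValueError as e:
        # Only token-acquisition failures are treated as "no token".
        logger.warning(
            "No GitHub token; cannot check for new Boost release: %s", e
        )
        return False
    # ValueErrors raised past this point (e.g. while parsing version tags)
    # propagate instead of being mislabeled as token problems.
    return check_with_token(token)
```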

In `@config/boost_collector_schedule.yaml`:
- Around line 1-3: The header comment contains hard-coded conversion examples
that are now stale; update the comment around default_time and CELERY_TIMEZONE
to remove or correct the specific conversion examples (e.g., "04:10 Pacific =
12:10 UTC") and instead state that default_time is interpreted in
America/Los_Angeles (CELERY_TIMEZONE) and that UTC offsets vary with DST,
advising operators to convert times using a reliable timezone-aware tool; ensure
references to default_time and CELERY_TIMEZONE remain in the comment for
clarity.
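The reason hard-coded conversions go stale is that the Pacific/UTC offset changes with DST. A short stdlib sketch (hypothetical helper, using `zoneinfo`) shows the same wall-clock `default_time` mapping to different UTC times depending on the date:

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def pacific_to_utc(date_str, time_str):
    """Convert a wall-clock time in America/Los_Angeles to UTC 'HH:MM'.

    The result depends on the date because the offset is UTC-8 under PST
    and UTC-7 under PDT.
    """
    local = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M")
    local = local.replace(tzinfo=ZoneInfo("America/Los_Angeles"))
    return local.astimezone(ZoneInfo("UTC")).strftime("%H:%M")
```

For example, 04:10 Pacific is 12:10 UTC in January (PST) but 11:10 UTC in July (PDT), which is exactly why the YAML header should describe the timezone rather than pin a single conversion.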

---

Duplicate comments:
In `@boost_collector_runner/schedule_config.py`:
- Around line 127-133: The monthly schedule block can raise a raw ValueError
when converting non-numeric on values; update the validation in the schedule ==
"monthly" branch (the on_val check around variables on_val, task, command,
group_id) to first verify on_val is numeric (e.g., using str.isdigit() or a safe
cast check) before calling int(on_val), and if it's missing or non-numeric or
out of range raise a ValueError that includes the task-scoped message (the same
f"Task {command!r} in group {group_id!r}: 'schedule: monthly' requires 'on'
(1-31, day of month)").

In `@config/settings.py`:
- Line 285: Update the inline comment next to the "Schedule from YAML
(boost_collector_runner)" note to reflect the actual runtime behavior: it falls
back to an empty beat schedule ({}) rather than "workflow daily". Locate the
comment near the boost_collector_runner YAML schedule handling in
config/settings.py and change the wording to state that the fallback is an empty
beat schedule (or alternatively update the fallback logic to use the workflow
daily schedule if that behavior is intended).

In `@docs/Workflow.md`:
- Line 100: Update the docs to reflect current runtime settings: change the
statement that using boost_collector_runner falls back to the workflow app’s
single daily task and instead state that when boost_collector_runner is not used
the fallback is an empty CELERY_BEAT_SCHEDULE (i.e., no automatic scheduled
tasks), and mention that explicit schedule files like
config/boost_collector_schedule.yaml are required or that management commands
(python manage.py run_all_collectors / run_scheduled_collectors --schedule
daily) must be used to trigger collectors; reference boost_collector_runner,
config/boost_collector_schedule.yaml, CELERY_BEAT_SCHEDULE and the workflow app
in the updated sentence.
- Around line 5-10: Clarify that "no parallel execution" refers only to the
execution within a single management-command invocation/batch (e.g., a single
run_all_collectors or a single run_scheduled_collectors run) or a single Beat
job: update the Workflow.md text to state that collectors executed by a single
command (run_all_collectors) or a single scheduled run_scheduled_collectors
group are run sequentially, but note that different Beat entries/groups (from
config/boost_collector_schedule.yaml) or separate workers may run in parallel
across groups/entries.

In `@requirements.txt`:
- Line 10: Update the PyYAML version spec in requirements.txt to avoid allowing
the problematic 6.0 release: replace the existing "PyYAML>=6.0" entry with
"PyYAML>=6.0.1" so the package resolver will exclude 6.0 and allow 6.0.1 or
newer.

---

Nitpick comments:
In `@boost_collector_runner/tests/test_commands.py`:
- Around line 46-59: Add an explicit assertion that the mocked call_command was
invoked with the expected sub-command instead of relying solely on stdout:
inside the patch context that mocks
boost_collector_runner.management.commands.run_scheduled_collectors.call_command,
assert the mock (the patched call_command) was called and that one of the call
args or calls includes "run_boost_library_tracker" (and optionally the
"--schedule", "daily" args) after invoking
call_command("run_scheduled_collectors", ...); use the mock's assert_called /
assert_any_call / call_args to validate the sub-command invocation in the
test_commands.py test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ce55a03a-ede9-48ab-91ad-f0050b1b5519

📥 Commits

Reviewing files that changed from the base of the PR and between 2a2e0ea and 68732a6.

📒 Files selected for processing (21)
  • boost_collector_runner/__init__.py
  • boost_collector_runner/admin.py
  • boost_collector_runner/apps.py
  • boost_collector_runner/management/__init__.py
  • boost_collector_runner/management/commands/__init__.py
  • boost_collector_runner/management/commands/run_scheduled_collectors.py
  • boost_collector_runner/migrations/__init__.py
  • boost_collector_runner/models.py
  • boost_collector_runner/schedule_config.py
  • boost_collector_runner/tasks.py
  • boost_collector_runner/tests/__init__.py
  • boost_collector_runner/tests/fixtures.py
  • boost_collector_runner/tests/test_commands.py
  • boost_library_tracker/management/commands/check_new_boost_release.py
  • boost_library_tracker/release_check.py
  • config/boost_collector_schedule.yaml
  • config/settings.py
  • conftest.py
  • docs/Development_guideline.md
  • docs/Workflow.md
  • requirements.txt

Comment thread boost_collector_runner/management/commands/run_scheduled_collectors.py Outdated
Comment thread boost_collector_runner/schedule_config.py Outdated
Comment thread boost_library_tracker/release_check.py Outdated
Comment thread config/boost_collector_schedule.yaml Outdated
@snowfox1003
Collaborator Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Mar 5, 2026

✅ Actions performed

Full review triggered.

@snowfox1003
Collaborator Author

@coderabbitai review resume

@coderabbitai

coderabbitai Bot commented Mar 5, 2026

@snowfox1003 Sure! I'll resume automated reviews for this PR right away.


🧠 Learnings used
Learnt from: snowfox1003
Repo: CppDigest/boost-data-collector PR: 67
File: boost_library_tracker/management/commands/collect_boost_libraries.py:89-91
Timestamp: 2026-02-25T15:36:20.460Z
Learning: In the boost_library_tracker management command (collect_boost_libraries.py), the `_fetch_releases()` function intentionally uses `break` (not `raise`) when pagination fails. This partial-progress-on-error pattern is acceptable because: (1) the job runs daily and is idempotent—processed releases are never reprocessed; (2) partial progress today allows tomorrow's run to continue from where it left off; (3) errors are monitored and sent to maintainers. This design choice prioritizes incremental progress over all-or-nothing consistency.
✅ Actions performed

Reviews resumed.

@snowfox1003
Collaborator Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 5, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
boost_library_tracker/release_check.py (2)

54-56: Unreachable code: if not token branch.

Per github_ops/tokens.py (context snippet lines 21-58), get_github_token(use="scraping") raises ValueError when no token is configured—it never returns a falsy value. Since the ValueError is now caught at lines 50-52, this if not token: check is dead code.

♻️ Proposed cleanup
     try:
         token = get_github_token(use="scraping")
     except ValueError as e:
         logger.warning("No GitHub token; cannot check for new Boost release: %s", e)
         return False

-    if not token:
-        logger.warning("No GitHub token; cannot check for new Boost release")
-        return False
-
     try:
         client = GitHubAPIClient(token)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_tracker/release_check.py` around lines 54 - 56, The
if-not-token branch in release_check.py is dead because
get_github_token(use="scraping") raises ValueError instead of returning falsy;
remove the unreachable block (the "if not token: logger.warning(...); return
False") and rely on the existing try/except around get_github_token (or, if you
prefer explicitness, change the try/except to handle a None return instead of
ValueError). Update only the conditional branch and any related early return so
code paths use the exception handling from get_github_token and avoid redundant
checks.

80-82: Consider narrowing exception handling for better observability.

The broad except Exception masks distinct failure modes (rate limits, connection errors, HTTP errors) that GitHubAPIClient.rest_request() can raise. While the "return False on failure" behavior is appropriate for graceful degradation, logging the exception type would aid debugging.

♻️ Optional: catch specific exceptions for clarity
-    except Exception as e:
-        logger.warning("Failed to check for new Boost release: %s", e)
+    except (requests.exceptions.RequestException, ConnectionError) as e:
+        logger.warning("Failed to check for new Boost release (%s): %s", type(e).__name__, e)
         return False

Add import at top:

import requests.exceptions
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_tracker/release_check.py` around lines 80 - 82, The broad
except in the release_check.py function handling the call to
GitHubAPIClient.rest_request() should be narrowed and made more informative:
catch specific exceptions (e.g., requests.exceptions.RequestException,
HTTPError, ConnectionError, and any custom errors thrown by GitHubAPIClient) and
log both the exception type and message in the logger.warning call, while
preserving the current return False behavior for graceful degradation; add the
requests.exceptions import and update the exception handlers around the call to
GitHubAPIClient.rest_request() (and the logger.warning invocation) to include
the exception class name in the log.
boost_collector_runner/tests/test_commands.py (1)

46-59: Strengthen this test with a direct mock assertion.

The stdout assertion is good, but adding a call assertion would better guarantee the intended collector command is actually dispatched.

🔧 Suggested test hardening
-    with patch(
+    with patch(
         "boost_collector_runner.management.commands.run_scheduled_collectors.call_command",
         return_value=None,
-    ):
+    ) as mocked_call:
         call_command(
             "run_scheduled_collectors",
             "--schedule",
             "daily",
             stdout=out,
             stderr=err,
         )
+    mocked_call.assert_called_once_with("run_boost_library_tracker")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/tests/test_commands.py` around lines 46 - 59, The test
should explicitly assert that the scheduled collector dispatch invoked the
intended collector command: change the patch of
"boost_collector_runner.management.commands.run_scheduled_collectors.call_command"
to use an as-bound mock (e.g., with patch(... ) as mock_call) and after
call_command("run_scheduled_collectors", ...) add
mock_call.assert_any_call("run_boost_library_tracker") (or assert_called_with
with expected args if exact signature is known) to ensure the collector command
was actually dispatched.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@boost_collector_runner/schedule_config.py`:
- Around line 199-201: The loop over groups doesn't validate keys; ensure each
group_id is a non-empty string before using it by adding a check in the same
loop (for group_id, group_data in groups.items()) to raise a ValueError if not
isinstance(group_id, str) or if group_id.strip() == "" with a clear message
referencing group_id and expected type; keep the existing group_data dict-type
check (raise ValueError(f"Group {group_id!r} must be a dict")) after the key
validation so invalid keys are caught early during YAML load.
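A minimal sketch of the suggested key validation. The `groups` dict shape and the exact error messages are assumptions inferred from the comment above, not the actual `schedule_config.py` code:

```python
def validate_groups(groups):
    """Validate group keys before group bodies, as suggested above.

    `groups` is assumed to be the dict parsed from
    config/boost_collector_schedule.yaml (hypothetical shape).
    """
    for group_id, group_data in groups.items():
        # Reject non-string or empty/whitespace-only group keys early.
        if not isinstance(group_id, str) or not group_id.strip():
            raise ValueError(
                f"Group key {group_id!r} must be a non-empty string"
            )
        # Keep the existing body-type check after the key validation.
        if not isinstance(group_data, dict):
            raise ValueError(f"Group {group_id!r} must be a dict")
```

Checking the key before the body means a malformed YAML key fails with a message about the key itself rather than a confusing dict-type error.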

---

Nitpick comments:
In `@boost_collector_runner/tests/test_commands.py`:
- Around line 46-59: The test should explicitly assert that the scheduled
collector dispatch invoked the intended collector command: change the patch of
"boost_collector_runner.management.commands.run_scheduled_collectors.call_command"
to use an as-bound mock (e.g., with patch(... ) as mock_call) and after
call_command("run_scheduled_collectors", ...) add
mock_call.assert_any_call("run_boost_library_tracker") (or assert_called_with
with expected args if exact signature is known) to ensure the collector command
was actually dispatched.

In `@boost_library_tracker/release_check.py`:
- Around line 54-56: The if-not-token branch in release_check.py is dead because
get_github_token(use="scraping") raises ValueError instead of returning falsy;
remove the unreachable block (the "if not token: logger.warning(...); return
False") and rely on the existing try/except around get_github_token (or, if you
prefer explicitness, change the try/except to handle a None return instead of
ValueError). Update only the conditional branch and any related early return so
code paths use the exception handling from get_github_token and avoid redundant
checks.
- Around line 80-82: The broad except in the release_check.py function handling
the call to GitHubAPIClient.rest_request() should be narrowed and made more
informative: catch specific exceptions (e.g.,
requests.exceptions.RequestException, HTTPError, ConnectionError, and any custom
errors thrown by GitHubAPIClient) and log both the exception type and message in
the logger.warning call, while preserving the current return False behavior for
graceful degradation; add the requests.exceptions import and update the
exception handlers around the call to GitHubAPIClient.rest_request() (and the
logger.warning invocation) to include the exception class name in the log.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c6326805-23e8-4941-867f-962138636a54

📥 Commits

Reviewing files that changed from the base of the PR and between 2a2e0ea and e9ac8e1.

📒 Files selected for processing (21)
  • boost_collector_runner/__init__.py
  • boost_collector_runner/admin.py
  • boost_collector_runner/apps.py
  • boost_collector_runner/management/__init__.py
  • boost_collector_runner/management/commands/__init__.py
  • boost_collector_runner/management/commands/run_scheduled_collectors.py
  • boost_collector_runner/migrations/__init__.py
  • boost_collector_runner/models.py
  • boost_collector_runner/schedule_config.py
  • boost_collector_runner/tasks.py
  • boost_collector_runner/tests/__init__.py
  • boost_collector_runner/tests/fixtures.py
  • boost_collector_runner/tests/test_commands.py
  • boost_library_tracker/management/commands/check_new_boost_release.py
  • boost_library_tracker/release_check.py
  • config/boost_collector_schedule.yaml
  • config/settings.py
  • conftest.py
  • docs/Development_guideline.md
  • docs/Workflow.md
  • requirements.txt

Comment thread boost_collector_runner/schedule_config.py
@snowfox1003
Collaborator Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Mar 5, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (7)
boost_collector_runner/tests/fixtures.py (1)

1-3: Empty fixtures module registered as pytest plugin.

This module is registered in conftest.py but defines no fixtures. If fixtures are planned for the future, this is fine as a placeholder. Otherwise, consider either adding the intended fixtures or removing the registration from conftest.py until fixtures are needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/tests/fixtures.py` around lines 1 - 3, The fixtures
module (fixtures.py) is an empty pytest plugin currently registered from
conftest.py; either implement the intended pytest fixtures inside fixtures.py
(add fixture functions decorated with `@pytest.fixture` and any helper imports) so
the registration is meaningful, or unregister/remove the plugin registration
from conftest.py until fixtures are needed to avoid a no-op plugin being loaded.
docs/Development_guideline.md (1)

99-103: Minor: Consider varying sentence structure.

Lines 99-103 have four consecutive steps starting with "Add" or "Create". This is a minor readability concern flagged by static analysis. Consider rephrasing one or two steps for variety, e.g., "Include the command in the list..." or "Register the command...".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/Development_guideline.md` around lines 99 - 103, Update the phrasing in
the listed steps in docs/Development_guideline.md to avoid repetitive sentence
starts: reword at least two of the consecutive "Add"/"Create" steps (for example
change step 3 to "Implement a management command (e.g.
my_app/management/commands/run_my_app.py) that runs the app logic and returns
the correct exit code" and step 4 to "Register the command with
run_all_collectors or include it in config/boost_collector_schedule.yaml under
the appropriate group/schedule (see Workflow.md)"); keep the same referenced
artifacts (run_all_collectors, config/boost_collector_schedule.yaml,
Workflow.md, management command path) and preserve the original instructions and
links while only altering sentence openings for variety and readability.
boost_collector_runner/tests/test_commands.py (1)

65-65: Remove duplicate import.

call_command is already imported at the module level (line 7). This local import is redundant.

♻️ Suggested fix
 `@pytest.mark.django_db`
 def test_run_scheduled_collectors_requires_schedule():
     """run_scheduled_collectors without --schedule raises CommandError."""
-    from django.core.management import call_command
-
     out = StringIO()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/tests/test_commands.py` at line 65, Remove the
duplicate local import of call_command in the test file: delete the redundant
line "from django.core.management import call_command" near the test at
tests/test_commands.py so the module-level import (call_command already imported
at top) is the only import; ensure no other references are changed and run tests
to confirm.
boost_collector_runner/management/commands/run_scheduled_collectors.py (1)

182-182: Rename unused loop variable to _task_group_id.

The task_group_id variable is unpacked but not used in the loop body. Prefix with underscore to indicate it's intentionally unused.

♻️ Suggested fix
-        for task_group_id, task in tasks:
+        for _task_group_id, task in tasks:
             name = task.get("command")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/management/commands/run_scheduled_collectors.py` at
line 182, The loop unpacks a pair into task_group_id and task but never uses
task_group_id; rename the unused variable to _task_group_id in the loop header
(for _task_group_id, task in tasks:) to follow Python conventions and silence
linters in the run_scheduled_collectors.py loop that currently reads "for
task_group_id, task in tasks:".
boost_collector_runner/tests/test_schedule_config.py (1)

260-265: Use raw strings for regex patterns in match=.

The patterns contain regex metacharacters (.*). While they work correctly, using raw strings (r"...") is clearer and avoids potential issues with escape sequences.

♻️ Suggested fix
 def test_validate_task_not_dict():
     """Task that is not a dict raises ValueError."""
-    with pytest.raises(ValueError, match="Task in group .* must be a dict"):
+    with pytest.raises(ValueError, match=r"Task in group .* must be a dict"):
         _validate_task([], "g1")
-    with pytest.raises(ValueError, match="Task in group .* must be a dict"):
+    with pytest.raises(ValueError, match=r"Task in group .* must be a dict"):
         _validate_task("task", "g1")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/tests/test_schedule_config.py` around lines 260 - 265,
Update the test_validate_task_not_dict assertions to use raw-string regex
literals for the pytest.raises match patterns so backslashes and metacharacters
are handled literally; specifically change the match arguments to raw strings
(e.g., match=r"Task in group .* must be a dict") in the two calls that invoke
_validate_task([], "g1") and _validate_task("task", "g1") to make the regex
intent explicit.
boost_collector_runner/tasks.py (1)

57-59: Remove redundant exception object from logging.exception.

logging.exception automatically includes the exception information from the current context. Passing e explicitly is redundant.

♻️ Suggested fix
     except Exception as e:
-        logger.exception("run_scheduled_collectors_task failed: %s", e)
+        logger.exception("run_scheduled_collectors_task failed")
         raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/tasks.py` around lines 57 - 59, The logger.exception
call in the exception handler for run_scheduled_collectors_task is passing the
exception object redundantly; remove the "%s" format and the "e" argument so it
relies on logger.exception's built‑in exception context (i.e., change the call
to logger.exception with only a descriptive message) and keep the subsequent
raise unchanged.
boost_collector_runner/schedule_config.py (1)

292-296: Minor: Redundant int() conversions.

day_of_month is already converted to int at line 277, and interval_minutes_int is created at line 286. The conversions at lines 293-296 are redundant.

♻️ Suggested cleanup
-    day_of_week_full = _normalize_day_of_week(day_of_week) if day_of_week else None
-    day_of_month_int = int(day_of_month) if day_of_month is not None else None
-    interval_minutes_int = (
-        int(interval_minutes) if interval_minutes is not None else None
-    )
+    day_of_week_full = _normalize_day_of_week(day_of_week) if day_of_week else None
+    # day_of_month already converted to int at line 277 if schedule_kind == "monthly"
+    day_of_month_int = day_of_month
+    # interval_minutes_int already set at line 286 if schedule_kind == "interval"
+    if schedule_kind != "interval":
+        interval_minutes_int = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/schedule_config.py` around lines 292 - 296, The code
performs redundant int() conversions for day_of_month and interval_minutes;
remove the extra conversions and use the already-converted variables
instead—i.e., stop re-casting day_of_month to int (use the previously assigned
day_of_month_int) and reuse interval_minutes_int rather than calling int()
again; update the block that sets
day_of_week_full/day_of_month_int/interval_minutes_int (and any references to
those names) so only one conversion occurs and
_normalize_day_of_week(day_of_week) is still used for day_of_week.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@boost_collector_runner/schedule_config.py`:
- Around line 352-354: Remove the duplicated sentence in the docstring that
repeats "If the YAML file does not exist or is invalid, returns {} (no beat
schedule)." in schedule_config.py; edit the docstring for the function that
returns the beat schedule so the sentence appears only once and the docstring
remains a single clear description of behavior.

In `@boost_library_tracker/release_check.py`:
- Around line 54-56: Remove the redundant falsy-token guard: the call to
get_github_token(use="scraping") never returns a falsy value (it raises
ValueError instead), so delete the "if not token:" block that logs
logger.warning("No GitHub token; cannot check for new Boost release") and the
subsequent return False; rely on the existing ValueError handling around
get_github_token to handle missing tokens.
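A sketch of the exception-driven pattern the comment recommends. The `_get_token` helper is a hypothetical stand-in for `github_ops.tokens.get_github_token(use="scraping")`, which (per the comment) raises `ValueError` rather than returning a falsy value:

```python
import logging

logger = logging.getLogger(__name__)


def _get_token():
    # Hypothetical stand-in for get_github_token(use="scraping"):
    # it raises ValueError when no token is configured, never returns falsy.
    raise ValueError("No scraping token: set GITHUB_TOKENS_SCRAPING or GITHUB_TOKEN.")


def has_new_boost_release():
    # Rely on the ValueError path instead of a dead "if not token" guard.
    try:
        token = _get_token()
    except ValueError as e:
        logger.warning("Cannot check for new Boost release: %s", e)
        return False
    # ... query the GitHub API with `token` here (omitted in this sketch) ...
    return False
```

With this shape there is exactly one missing-token code path, so the redundant falsy check can be deleted without changing behavior.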

---

Nitpick comments:
In `@boost_collector_runner/management/commands/run_scheduled_collectors.py`:
- Line 182: The loop unpacks a pair into task_group_id and task but never uses
task_group_id; rename the unused variable to _task_group_id in the loop header
(for _task_group_id, task in tasks:) to follow Python conventions and silence
linters in the run_scheduled_collectors.py loop that currently reads "for
task_group_id, task in tasks:".

In `@boost_collector_runner/schedule_config.py`:
- Around line 292-296: The code performs redundant int() conversions for
day_of_month and interval_minutes; remove the extra conversions and use the
already-converted variables instead—i.e., stop re-casting day_of_month to int
(use the previously assigned day_of_month_int) and reuse interval_minutes_int
rather than calling int() again; update the block that sets
day_of_week_full/day_of_month_int/interval_minutes_int (and any references to
those names) so only one conversion occurs and
_normalize_day_of_week(day_of_week) is still used for day_of_week.

In `@boost_collector_runner/tasks.py`:
- Around line 57-59: The logger.exception call in the exception handler for
run_scheduled_collectors_task is passing the exception object redundantly;
remove the "%s" format and the "e" argument so it relies on logger.exception's
built‑in exception context (i.e., change the call to logger.exception with only
a descriptive message) and keep the subsequent raise unchanged.

In `@boost_collector_runner/tests/fixtures.py`:
- Around line 1-3: The fixtures module (fixtures.py) is an empty pytest plugin
currently registered from conftest.py; either implement the intended pytest
fixtures inside fixtures.py (add fixture functions decorated with
`@pytest.fixture` and any helper imports) so the registration is meaningful, or
unregister/remove the plugin registration from conftest.py until fixtures are
needed to avoid a no-op plugin being loaded.

In `@boost_collector_runner/tests/test_commands.py`:
- Line 65: Remove the duplicate local import of call_command in the test file:
delete the redundant line "from django.core.management import call_command" near
the test at tests/test_commands.py so the module-level import (call_command
already imported at top) is the only import; ensure no other references are
changed and run tests to confirm.

In `@boost_collector_runner/tests/test_schedule_config.py`:
- Around line 260-265: Update the test_validate_task_not_dict assertions to use
raw-string regex literals for the pytest.raises match patterns so backslashes
and metacharacters are handled literally; specifically change the match
arguments to raw strings (e.g., match=r"Task in group .* must be a dict") in the
two calls that invoke _validate_task([], "g1") and _validate_task("task", "g1")
to make the regex intent explicit.

In `@docs/Development_guideline.md`:
- Around line 99-103: Update the phrasing in the listed steps in
docs/Development_guideline.md to avoid repetitive sentence starts: reword at
least two of the consecutive "Add"/"Create" steps (for example change step 3 to
"Implement a management command (e.g. my_app/management/commands/run_my_app.py)
that runs the app logic and returns the correct exit code" and step 4 to
"Register the command with run_all_collectors or include it in
config/boost_collector_schedule.yaml under the appropriate group/schedule (see
Workflow.md)"); keep the same referenced artifacts (run_all_collectors,
config/boost_collector_schedule.yaml, Workflow.md, management command path) and
preserve the original instructions and links while only altering sentence
openings for variety and readability.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 32962207-77da-448b-9af7-05ab9eac9995

📥 Commits

Reviewing files that changed from the base of the PR and between 2a2e0ea and b6c0e94.

📒 Files selected for processing (22)
  • boost_collector_runner/__init__.py
  • boost_collector_runner/admin.py
  • boost_collector_runner/apps.py
  • boost_collector_runner/management/__init__.py
  • boost_collector_runner/management/commands/__init__.py
  • boost_collector_runner/management/commands/run_scheduled_collectors.py
  • boost_collector_runner/migrations/__init__.py
  • boost_collector_runner/models.py
  • boost_collector_runner/schedule_config.py
  • boost_collector_runner/tasks.py
  • boost_collector_runner/tests/__init__.py
  • boost_collector_runner/tests/fixtures.py
  • boost_collector_runner/tests/test_commands.py
  • boost_collector_runner/tests/test_schedule_config.py
  • boost_library_tracker/management/commands/check_new_boost_release.py
  • boost_library_tracker/release_check.py
  • config/boost_collector_schedule.yaml
  • config/settings.py
  • conftest.py
  • docs/Development_guideline.md
  • docs/Workflow.md
  • requirements.txt

Comment thread boost_collector_runner/schedule_config.py
Comment thread boost_library_tracker/release_check.py
@snowfox1003
Collaborator Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Mar 5, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (4)
boost_library_tracker/release_check.py (1)

80-82: Improve failure diagnostics by logging traceback.

Line 80 currently logs only the exception message. Keep returning False, but switch to traceback logging for faster prod triage.

🔧 Suggested change
-    except Exception as e:
-        logger.warning("Failed to check for new Boost release: %s", e)
+    except Exception:
+        logger.exception("Failed to check for new Boost release")
         return False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_tracker/release_check.py` around lines 80 - 82, The except
block in release_check.py currently logs only the exception message; replace
that logger.warning call in the except Exception as e handler with a
traceback-aware log (e.g., use logger.exception or log traceback.format_exc())
so the full stack trace is recorded while preserving the return False; update
the handler around the same except Exception as e block that follows the Boost
release check logic to call logger.exception("Failed to check for new Boost
release") (or include the formatted traceback) and then return False.
docs/Development_guideline.md (1)

102-102: Minor doc-style cleanup on Line 102.
Line 102 repeats sentence starts similarly to adjacent steps; a quick reword improves readability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/Development_guideline.md` at line 102, Reword the repeated sentence on
Line 102 to improve readability by varying the phrasing while keeping the same
meaning: update the sentence that currently mentions adding the command to
"run_all_collectors" (workflow app) or to "config/boost_collector_schedule.yaml"
under the right group and schedule (see
Workflow.md#2-boost-collector-runner-and-yaml-schedule) so it avoids duplicating
the adjacent step's wording—use an alternative phrasing like "Either include the
command in the run_all_collectors workflow or add it to
config/boost_collector_schedule.yaml under the appropriate group and schedule
(see Workflow.md...)" while preserving the referenced symbols
run_all_collectors, config/boost_collector_schedule.yaml, and Workflow.md.
boost_collector_runner/schedule_config.py (2)

299-321: Consider caching YAML data to avoid redundant loads.

get_tasks_for_schedule calls get_groups_and_tasks() without passing data, causing a fresh YAML load on every invocation. In _get_group_batch_tasks (context snippet 2), this is called 3-4 times per group batch, resulting in redundant file I/O.

♻️ Potential optimization

Either accept an optional data parameter to allow callers to reuse loaded config:

 def get_tasks_for_schedule(
     schedule_kind,
     day_of_week=None,
     day_of_month=None,
     interval_minutes=None,
     group_id=None,
     month=None,
     year=None,
+    data=None,
 ):
     ...
     out = []
-    for gid, tasks in get_groups_and_tasks():
+    for gid, tasks in get_groups_and_tasks(data=data):

Or cache the loaded config at module level (with appropriate invalidation strategy).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/schedule_config.py` around lines 299 - 321,
get_tasks_for_schedule repeatedly calls get_groups_and_tasks() which reloads
YAML each time causing redundant I/O; modify get_tasks_for_schedule to accept an
optional data parameter (or use a module-level cached config) and have callers
such as _get_group_batch_tasks pass the already-loaded config into
get_tasks_for_schedule (or read from the cache) so the YAML is loaded once per
operation instead of on every call; update call sites of get_tasks_for_schedule
and get_groups_and_tasks to thread the data through (or use the cache) and
ensure behavior is unchanged when data is not provided.

383-390: Truthiness checks may inadvertently skip valid values in edge cases.

Lines 385 and 386 use if day_of_month: which would skip day_of_month=0. While 0 is invalid (1-31 range), using explicit is not None checks aligns better with the pattern used for interval_minutes (line 387) and group_id (line 389).

♻️ Consistency improvement
         if day_of_week:
             kwargs["day_of_week"] = day_of_week
-        if day_of_month:
+        if day_of_month is not None:
             kwargs["day_of_month"] = day_of_month
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/schedule_config.py` around lines 383 - 390, The
truthiness check for day_of_month can skip valid explicit values; update the
conditional that sets kwargs["day_of_month"] to use an explicit None check
(i.e., if day_of_month is not None) to match the pattern used for
interval_minutes and group_id and ensure valid falsy values aren't accidentally
omitted when populating kwargs in the schedule-building code that handles
day_of_week, day_of_month, interval_minutes, and group_id.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@boost_collector_runner/management/commands/run_scheduled_collectors.py`:
- Around line 190-199: The SystemExit handling currently maps non-int codes to
1, causing SystemExit(None) to be treated as failure; update the except
SystemExit block (where e is caught and code, results, exit_code,
stop_on_failure are used) to treat e.code == None as a success (code 0), keep
integer codes as-is, and only map other non-int/non-None codes to 1 so success
messages (self.stdout.write(self.style.SUCCESS(...))) and exit_code logic behave
correctly.
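The `SystemExit` normalization the comment asks for can be isolated in a small helper. This is a sketch of the suggested mapping, not the command's actual code; `sys.exit()` with no argument sets `code` to `None`, which shells treat as success:

```python
def normalize_exit_code(exc: SystemExit) -> int:
    """Map a SystemExit to a shell-style exit code, as suggested above.

    None -> 0 (success), ints pass through unchanged, and any other
    payload (e.g. a message string) maps to failure (1).
    """
    code = exc.code
    if code is None:
        return 0
    if isinstance(code, int):
        return code
    return 1
```

Sharing one helper between `run_scheduled_collectors.py` and `tasks.py` would also keep the two `except SystemExit` blocks from drifting apart.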

In `@boost_collector_runner/tasks.py`:
- Around line 50-56: The SystemExit handling in the except SystemExit as e block
of run_scheduled_collectors_task treats e.code == None as failure; normalize the
exit code before testing by mapping None to 0 (e.g., code = 0 if e.code is None
else e.code) and then check if code != 0 before logging the error via
logger.error and re-raising; update the condition around the logger.error call
that currently references e.code so it uses the normalized code value.

In `@boost_collector_runner/tests/test_commands.py`:
- Around line 57-60: The test in tests/test_commands.py is asserting the wrong
success string ("succeeded") whereas the command emits "success"; update the
assertion in the test (the lines around content = out.getvalue() and the asserts
referencing "Running" and "run_boost_library_tracker") to check for "success"
(or accept both "success" and "succeeded") so it matches the actual output from
run_scheduled_collectors.py.

---

Nitpick comments:
In `@boost_collector_runner/schedule_config.py`:
- Around line 299-321: get_tasks_for_schedule repeatedly calls
get_groups_and_tasks() which reloads YAML each time causing redundant I/O;
modify get_tasks_for_schedule to accept an optional data parameter (or use a
module-level cached config) and have callers such as _get_group_batch_tasks pass
the already-loaded config into get_tasks_for_schedule (or read from the cache)
so the YAML is loaded once per operation instead of on every call; update call
sites of get_tasks_for_schedule and get_groups_and_tasks to thread the data
through (or use the cache) and ensure behavior is unchanged when data is not
provided.
- Around line 383-390: The truthiness check for day_of_month can skip valid
explicit values; update the conditional that sets kwargs["day_of_month"] to use
an explicit None check (i.e., if day_of_month is not None) to match the pattern
used for interval_minutes and group_id and ensure valid falsy values aren't
accidentally omitted when populating kwargs in the schedule-building code that
handles day_of_week, day_of_month, interval_minutes, and group_id.

In `@boost_library_tracker/release_check.py`:
- Around line 80-82: The except block in release_check.py currently logs only
the exception message; replace that logger.warning call in the except Exception
as e handler with a traceback-aware log (e.g., use logger.exception or log
traceback.format_exc()) so the full stack trace is recorded while preserving the
return False; update the handler around the same except Exception as e block
that follows the Boost release check logic to call logger.exception("Failed to
check for new Boost release") (or include the formatted traceback) and then
return False.

In `@docs/Development_guideline.md`:
- Line 102: Reword the repeated sentence on Line 102 to improve readability by
varying the phrasing while keeping the same meaning: update the sentence that
currently mentions adding the command to "run_all_collectors" (workflow app) or
to "config/boost_collector_schedule.yaml" under the right group and schedule
(see Workflow.md#2-boost-collector-runner-and-yaml-schedule) so it avoids
duplicating the adjacent step's wording—use an alternative phrasing like "Either
include the command in the run_all_collectors workflow or add it to
config/boost_collector_schedule.yaml under the appropriate group and schedule
(see Workflow.md...)" while preserving the referenced symbols
run_all_collectors, config/boost_collector_schedule.yaml, and Workflow.md.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0584db93-2775-47fd-8112-ddb52ef345ff

📥 Commits

Reviewing files that changed from the base of the PR and between 2a2e0ea and 362e57a.

📒 Files selected for processing (22)
  • boost_collector_runner/__init__.py
  • boost_collector_runner/admin.py
  • boost_collector_runner/apps.py
  • boost_collector_runner/management/__init__.py
  • boost_collector_runner/management/commands/__init__.py
  • boost_collector_runner/management/commands/run_scheduled_collectors.py
  • boost_collector_runner/migrations/__init__.py
  • boost_collector_runner/models.py
  • boost_collector_runner/schedule_config.py
  • boost_collector_runner/tasks.py
  • boost_collector_runner/tests/__init__.py
  • boost_collector_runner/tests/fixtures.py
  • boost_collector_runner/tests/test_commands.py
  • boost_collector_runner/tests/test_schedule_config.py
  • boost_library_tracker/management/commands/check_new_boost_release.py
  • boost_library_tracker/release_check.py
  • config/boost_collector_schedule.yaml
  • config/settings.py
  • conftest.py
  • docs/Development_guideline.md
  • docs/Workflow.md
  • requirements.txt

Comment thread boost_collector_runner/tasks.py Outdated
Comment thread boost_collector_runner/tests/test_commands.py Outdated
@snowfox1003
Collaborator Author

@coderabbitai full review


@coderabbitai

coderabbitai Bot commented Mar 6, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
github_ops/tokens.py (1)

36-44: ⚠️ Potential issue | 🟡 Minor

Strip the fallback token before validating it.

A whitespace-only GITHUB_TOKEN bypasses if not token, then returns "". That leaves get_github_client() free to construct a client with an unusable token instead of failing fast here.

🔧 Proposed fix
         if not tokens:
-            token = getattr(settings, "GITHUB_TOKEN", None) or os.environ.get(
-                "GITHUB_TOKEN", ""
-            )
+            token = (
+                getattr(settings, "GITHUB_TOKEN", None)
+                or os.environ.get("GITHUB_TOKEN", "")
+                or ""
+            ).strip()
             if not token:
                 raise ValueError(
                     "No scraping token: set GITHUB_TOKENS_SCRAPING or GITHUB_TOKEN."
                 )
-            return (token or "").strip()
+            return token
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@github_ops/tokens.py` around lines 36 - 44, The fallback GITHUB_TOKEN should
be stripped before the empty check so whitespace-only values don't pass
validation; update the token retrieval in the function containing the shown
block (the branch that executes when not tokens) to strip the fallback result
immediately (i.e., apply .strip() to the getattr/os.environ fallback) and then
raise ValueError if the stripped token is empty, ensuring get_github_client()
never receives a whitespace-only token.
🧹 Nitpick comments (3)
docs/Development_guideline.md (1)

97-103: Consider varying sentence structure for readability.

Static analysis flags that lines 99–102 have three successive sentences beginning with the same word pattern. This is a minor style nit.

📝 Optional wording adjustment
 ### Adding a new Django app to the project
 
-1. Create the app (e.g. `python manage.py startapp my_app` or add the app folder to the project).
-2. Add the app to `INSTALLED_APPS` in settings.
-3. Add a management command (e.g. in `my_app/management/commands/run_my_app.py`) that runs the app logic and returns the correct exit code.
-4. Add the command to the list run by `run_all_collectors` (workflow app) or add it to `config/boost_collector_schedule.yaml` under the right group with the right schedule (see [Workflow.md](Workflow.md#2-boost-collector-runner-and-yaml-schedule)).
+1. Create the app (e.g. `python manage.py startapp my_app` or add the app folder to the project).
+2. Register the app in `INSTALLED_APPS` in settings.
+3. Implement a management command (e.g. in `my_app/management/commands/run_my_app.py`) that runs the app logic and returns the correct exit code.
+4. Include the command in the list run by `run_all_collectors` (workflow app) or in `config/boost_collector_schedule.yaml` under the right group with the right schedule (see [Workflow.md](Workflow.md#2-boost-collector-runner-and-yaml-schedule)).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/Development_guideline.md` around lines 97 - 103, The three consecutive
list items under the "Adding a new Django app to the project" section are
stylistically repetitive; rewrite one or two of the sentences to vary their
openings and flow—for example, change "Add the app to INSTALLED_APPS in
settings." to "Register the app in your settings under INSTALLED_APPS." or
combine steps by linking "Add a management command..." with how to include it in
run_all_collectors or config/boost_collector_schedule.yaml, and ensure
references to python manage.py startapp, INSTALLED_APPS, run_all_collectors,
config/boost_collector_schedule.yaml, and Workflow.md remain clear.
boost_collector_runner/schedule_config.py (2)

281-287: Normalize month/year before the monthly fallback.

day_of_month and interval_minutes are coerced up front, but Lines 316-317 pass month/year straight into the fallback branch. A caller passing CLI-style strings or month=13 will leak a raw calendar exception instead of this module’s usual ValueError contract.

🔧 Suggested fix
     if schedule_kind == "monthly":
         try:
             day_of_month = int(day_of_month)
         except (TypeError, ValueError):
             raise ValueError("day_of_month must be an integer") from None
         if not (1 <= day_of_month <= 31):
             raise ValueError("day_of_month must be 1-31")
+    month_int = year_int = None
+    if month is not None and year is not None:
+        try:
+            month_int = int(month)
+            year_int = int(year)
+        except (TypeError, ValueError):
+            raise ValueError("month and year must be integers") from None
+        if not (1 <= month_int <= 12):
+            raise ValueError("month must be 1-12")
 
     day_of_week_full = _normalize_day_of_week(day_of_week) if day_of_week else None
     day_of_month_int = int(day_of_month) if day_of_month is not None else None
@@
             if schedule_kind == "monthly":
                 task_day = int(t.get("day_of_month", 0))
-                if month is not None and year is not None:
-                    _, last_day = calendar.monthrange(year, month)
+                if month_int is not None and year_int is not None:
+                    _, last_day = calendar.monthrange(year_int, month_int)
                     effective_day = min(task_day, last_day)
                 else:
                     effective_day = task_day

Also applies to: 314-317

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/schedule_config.py` around lines 281 - 287, In the
monthly schedule branch (where schedule_kind == "monthly") normalize and
validate month and year before falling back to the calendar logic: coerce month
and year to int (wrap in try/except to raise ValueError on TypeError/ValueError)
and validate month is in 1-12 (and year is a reasonable int if you have
constraints) so callers passing CLI strings or invalid month values get this
module's ValueError contract instead of a raw calendar exception; do this
validation alongside the existing day_of_month coercion (variables:
day_of_month, month, year, interval_minutes, and the monthly fallback branch).

364-376: Returning {} here clears the whole beat schedule on config/read errors.

config/settings.py Lines 285-297 assign CELERY_BEAT_SCHEDULE = get_beat_schedule() directly, so this fallback leaves the process up with no periodic collectors at all. Consider failing fast outside local/dev, or keeping a last-known-good schedule, so a bad YAML does not become a silent scheduler outage.
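
One way to realize the "fail fast outside local/dev" suggestion is an environment-gated loader. The helper below is a hypothetical sketch: the name `build_beat_schedule` and the `debug` flag are illustrative, not the project's actual settings code.

```python
import logging

logger = logging.getLogger(__name__)


def build_beat_schedule(loader, *, debug):
    """Call loader() for the beat schedule; tolerate failures only in debug.

    loader would be e.g. schedule_config.get_beat_schedule. In production a
    bad YAML raises at startup instead of silently emptying the schedule.
    """
    try:
        return loader()
    except Exception:
        logger.exception("Failed to build Celery beat schedule from YAML")
        if debug:
            return {}  # dev/test: run with no periodic collectors
        raise  # production: fail fast


# In settings.py one might then write (assuming DEBUG is defined above):
# CELERY_BEAT_SCHEDULE = build_beat_schedule(get_beat_schedule, debug=DEBUG)
```

The last-known-good alternative would instead catch the error and return a cached copy of the previous schedule, at the cost of extra state.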

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@boost_collector_runner/management/commands/run_scheduled_collectors.py`:
- Around line 143-156: The on_release branch currently calls
has_new_boost_release() directly and will let any exception propagate; wrap the
call in a try/except that catches Exception around the has_new_boost_release()
invocation (inside the existing schedule_kind == "on_release" block) and raise a
django.core.management.base.CommandError with a clear message (including the
original exception as the cause using "from e") so failures are normalized like
the YAML/load errors; keep the existing ImportError handling and the
no-new-release logger logic intact.
- Around line 97-105: Avoid calling has_new_boost_release() unconditionally;
first call get_tasks_for_schedule("on_release", group_id=group_id, data=data) to
see if there are any on_release tasks for this scope, and only if that returned
list is non-empty invoke has_new_boost_release() and then extend tasks; update
the block around has_new_boost_release and get_tasks_for_schedule to fetch
on_release_tasks first and guard the external probe on that non-empty result.

In `@boost_collector_runner/schedule_config.py`:
- Around line 78-84: In _normalize_task, stop silently overwriting a task-level
"time": if the incoming task dict contains a "time" key (and the task is
enabled), raise a clear configuration error (e.g., ValueError) that includes the
group_id and task identifier so the caller knows the offending entry, and do not
replace it with group_default_time; preserve current behavior for disabled tasks
(return early when enabled is False). Apply the same validation logic to the
other task-normalization code paths referenced in the 106-179 region so any
task-provided "time" is rejected consistently rather than ignored.

In `@boost_library_tracker/release_check.py`:
- Around line 49-83: The function currently collapses token/API failures and "no
new release" into False; change the control flow so get_github_token failures
and any GitHub API exception return None (not False), while keeping False for
the "no new release" case and True when a new stable tag is found; specifically
update the try/except around get_github_token, the early token check, and the
outer exception handler (the GitHubAPIClient / rest_request loop that catches
Exception) to return None on operational errors, and leave the loop's normal "no
new tag" exit returning False; ensure callers (e.g., check_new_boost_release
invocation) are updated to map True→exit 0, False→exit 1, None→distinct failure
exit code.
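
The True/False/None contract above implies a small exit-code mapping in the calling command. A hypothetical sketch (the helper name and the code `2` are illustrative — the prompt only requires that None map to a distinct failure code):

```python
def release_exit_code(check_result):
    """Map has_new_boost_release()'s tri-state result to an exit code.

    True  -> 0 (new stable tag found)
    False -> 1 (checked successfully, no new tag)
    None  -> 2 (token/API failure; distinct so automation can retry)
    """
    if check_result is True:
        return 0
    if check_result is False:
        return 1
    return 2  # None or anything unexpected: operational failure
```

Keeping the operational-failure code distinct lets cron or CI distinguish "nothing to do" from "the check itself broke".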


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f9b56bd8-9300-4c2f-b92d-509aa8485747

📥 Commits

Reviewing files that changed from the base of the PR and between 8ce6ec1 and ff860f9.

📒 Files selected for processing (23)
  • boost_collector_runner/__init__.py
  • boost_collector_runner/admin.py
  • boost_collector_runner/apps.py
  • boost_collector_runner/management/__init__.py
  • boost_collector_runner/management/commands/__init__.py
  • boost_collector_runner/management/commands/run_scheduled_collectors.py
  • boost_collector_runner/migrations/__init__.py
  • boost_collector_runner/models.py
  • boost_collector_runner/schedule_config.py
  • boost_collector_runner/tasks.py
  • boost_collector_runner/tests/__init__.py
  • boost_collector_runner/tests/fixtures.py
  • boost_collector_runner/tests/test_commands.py
  • boost_collector_runner/tests/test_schedule_config.py
  • boost_library_tracker/management/commands/check_new_boost_release.py
  • boost_library_tracker/release_check.py
  • config/boost_collector_schedule.yaml
  • config/settings.py
  • conftest.py
  • docs/Development_guideline.md
  • docs/Workflow.md
  • github_ops/tokens.py
  • requirements.txt

Comment thread boost_collector_runner/management/commands/run_scheduled_collectors.py Outdated
Comment thread boost_collector_runner/management/commands/run_scheduled_collectors.py Outdated
Comment thread boost_collector_runner/schedule_config.py Outdated
Comment thread boost_library_tracker/release_check.py
@snowfox1003
Collaborator Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Mar 6, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (2)
boost_collector_runner/tests/test_commands.py (1)

57-60: ⚠️ Potential issue | 🟠 Major

Drop the "succeeded" assertion.

Line 60 will fail even when the command behaves correctly, because the command emits "success", not "succeeded".

🔧 Minimal fix
     content = out.getvalue()
     assert "Running" in content and "run_boost_library_tracker" in content
     assert "success" in content
-    assert "succeeded" in content
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/tests/test_commands.py` around lines 57 - 60, Remove
the incorrect assertion that checks for the string "succeeded" in the test;
locate the test where the command output is read into the variable content (the
assertions that check "Running" and "run_boost_library_tracker" and "success")
and delete the line asserting "succeeded" so the test only asserts the actual
emitted "success" value.
boost_collector_runner/management/commands/run_scheduled_collectors.py (1)

146-179: ⚠️ Potential issue | 🟠 Major

Resolve matching on_release tasks before probing the release check.

This branch still imports/calls has_new_boost_release() before it knows whether the requested scope has any on_release tasks at all. That turns a no-op run into an unnecessary external dependency and can fail on import/network errors before the YAML/no-tasks path is even reached.

🔧 Proposed fix
         else:
-            if schedule_kind == "on_release":
-                try:
-                    from boost_library_tracker.release_check import (
-                        has_new_boost_release,
-                    )
-
-                    if not has_new_boost_release():
-                        logger.info(
-                            "run_scheduled_collectors: no new Boost release; skipping on_release tasks."
-                        )
-                        return
-                except ImportError as e:
-                    raise CommandError(
-                        "on_release requires boost_library_tracker (install and add to INSTALLED_APPS)."
-                    ) from e
-                except Exception as e:
-                    raise CommandError(
-                        f"Failed to check for new Boost release: {e}"
-                    ) from e
             kwargs = dict(
                 schedule_kind=schedule_kind,
                 day_of_week=day_of_week,
                 day_of_month=day_of_month,
                 interval_minutes=interval_minutes,
                 group_id=group_id,
             )
             if schedule_kind == "monthly" and day_of_month is not None:
                 tz_name = getattr(settings, "CELERY_TIMEZONE", "UTC")
                 today = datetime.now(ZoneInfo(tz_name)).date()
                 kwargs["month"] = today.month
                 kwargs["year"] = today.year
             try:
                 tasks = get_tasks_for_schedule(**kwargs)
             except (FileNotFoundError, ValueError, yaml.YAMLError) as e:
                 raise CommandError(str(e)) from e
+            if schedule_kind == "on_release":
+                if not tasks:
+                    return
+                try:
+                    from boost_library_tracker.release_check import (
+                        has_new_boost_release,
+                    )
+
+                    if not has_new_boost_release():
+                        logger.info(
+                            "run_scheduled_collectors: no new Boost release; skipping on_release tasks."
+                        )
+                        return
+                except ImportError as e:
+                    raise CommandError(
+                        "on_release requires boost_library_tracker (install and add to INSTALLED_APPS)."
+                    ) from e
+                except Exception as e:
+                    raise CommandError(
+                        f"Failed to check for new Boost release: {e}"
+                    ) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/management/commands/run_scheduled_collectors.py`
around lines 146 - 179, The code calls/imports has_new_boost_release() as soon
as schedule_kind == "on_release" before building/inspecting the task list,
causing unnecessary dependency import/failure even when no on_release tasks
exist; refactor so you first call get_tasks_for_schedule(...) (using the kwargs
construction and the get_tasks_for_schedule symbol) to see if any tasks with
schedule_kind "on_release" exist and only then import and call
boost_library_tracker.release_check.has_new_boost_release() if those tasks are
present; keep the existing exception handling for ImportError/Exception around
the has_new_boost_release() call and ensure the early return/skip logic remains
unchanged when no new release is detected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@boost_collector_runner/management/commands/run_scheduled_collectors.py`:
- Around line 60-65: The help text for the "--group" argparse option is stale
because handle() now forwards group_id for interval and on_release runs; update
the parser.add_argument(help=...) string for "--group" to accurately describe
that it limits tasks by group across all run modes (including interval and
on_release), or alternatively change handle() to reject using --group with
interval/on_release; locate the "--group" argument definition in
run_scheduled_collectors.py and either broaden the help text to state it applies
to daily/weekly/monthly/interval/on_release, or add validation in handle() to
raise a clear error when group_id is supplied with interval or on_release so the
CLI behavior remains unambiguous.

In `@boost_collector_runner/schedule_config.py`:
- Around line 67-75: The _parse_time function currently allows non-numeric parts
like "aa:10" to raise a raw int() ValueError instead of the canonical validation
message; wrap the integer conversions for h and m in a try/except (or validate
parts with str.isdigit()/regex) inside _parse_time to catch conversion errors
and re-raise a ValueError with the same user-facing message ("Invalid time
{s!r}; use HH:MM"), keeping the existing check for range (0-23, 0-59) and the
initial parts length validation so all invalid inputs consistently produce the
intended validation error.

In `@boost_collector_runner/tasks.py`:
- Around line 50-64: The catch for SystemExit in run_scheduled_collectors_task
should not re-raise SystemExit (which is a BaseException) because it escapes
Celery's normal failure handling; instead, after computing code and logging via
logger.error, raise a regular exception (e.g., RuntimeError or Exception) that
includes the exit code and chain the original SystemExit (use "from e") so the
traceback is preserved. Update the except SystemExit as e block in
run_scheduled_collectors_task to log the error as currently done and then raise
a new RuntimeError(f"run_scheduled_collectors exited with code {code}") from e
(do not re-raise the SystemExit).

In `@config/boost_collector_schedule.yaml`:
- Around line 9-11: The current schedule overrides the default tasks by adding
args ["--task","monitor_content"] to the run_boost_usage_tracker entry, which
prevents monitor_stars from running; either remove the args override so
run_boost_usage_tracker runs its default daily jobs, or add a second daily entry
for the same command with args ["--task","monitor_stars"] to restore the missing
monitor_stars run (refer to the command name run_boost_usage_tracker and task
names monitor_content and monitor_stars).

In `@config/settings.py`:
- Around line 287-297: The current except block swallows all import/YAML errors
and sets CELERY_BEAT_SCHEDULE = {} unconditionally; change it so that after
logging the exception you only fall back to an empty schedule for explicit
dev/test environments and re-raise (or raise a ConfigurationError) in
normal/production runs. Locate the try importing
boost_collector_runner.schedule_config and the CELERY_BEAT_SCHEDULE assignment
(symbols: get_beat_schedule, CELERY_BEAT_SCHEDULE) and replace the unconditional
fallback with an environment check (e.g., DEBUG/ENV/SETTINGS flag); if not in a
dev/test mode re-raise the original exception so startup fails instead of
silently running with no schedule. Ensure the logged message still records the
full exception details before re-raising.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 13c2832e-bd55-4078-80e9-3d4ee0ee4a6e

📥 Commits

Reviewing files that changed from the base of the PR and between 8ce6ec1 and 8268c0d.

📒 Files selected for processing (23)

Comment thread boost_collector_runner/schedule_config.py
Comment thread boost_collector_runner/tasks.py
Comment thread config/boost_collector_schedule.yaml Outdated
Comment thread config/settings.py
@snowfox1003
Collaborator Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Mar 6, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
boost_library_tracker/release_check.py (1)

55-57: ⚠️ Potential issue | 🟡 Minor

Harden token guard to reject whitespace-only values.

The current guard if not token: catches empty strings but allows whitespace-only tokens (e.g., " ") to pass through, which would cause a downstream authentication failure with less clear error context.

Per the learning: get_github_token(use="scraping") can return whitespace-only strings when GITHUB_TOKENS_SCRAPING contains such values.

🔧 Proposed fix
-    if not token:
+    if not token or not token.strip():
         logger.warning("No GitHub token; cannot check for new Boost release")
         return False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_tracker/release_check.py` around lines 55 - 57, The guard that
checks the GitHub token currently allows whitespace-only values; update the
check around the token obtained from get_github_token(use="scraping") so you
reject tokens that are empty or only whitespace (e.g., use token = token.strip()
then verify token is truthy) and log the same warning via logger.warning("No
GitHub token; cannot check for new Boost release") before returning False;
reference the token variable and the get_github_token call to locate where to
trim and validate the value.
🧹 Nitpick comments (1)
conftest.py (1)

39-39: Remove the empty pytest plugin registration until fixtures are defined.

boost_collector_runner.tests.fixtures is registered in pytest_plugins but contains only a docstring with no fixture or helper definitions. This adds a global import with no benefit. Either populate the module with shared fixtures or remove it from the plugin list.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@conftest.py` at line 39, pytest_plugins currently includes
"boost_collector_runner.tests.fixtures" but that module contains only a
docstring; remove that string from the pytest_plugins list in conftest.py to
avoid an unnecessary global import, or alternatively add actual fixture
definitions to the module (define pytest fixtures/functions in
boost_collector_runner.tests.fixtures) and then keep the registration; refer to
the pytest_plugins variable and the module name
"boost_collector_runner.tests.fixtures" when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6c437a4f-643e-4d20-b6d2-e1dadaaae904

📥 Commits

Reviewing files that changed from the base of the PR and between 8ce6ec1 and f60bf08.

📒 Files selected for processing (23)

Collaborator

@jonathanMLDev jonathanMLDev left a comment


I think you should fix some of the logic here

Comment thread github_ops/tokens.py
Comment thread config/boost_collector_schedule.yaml Outdated
Comment thread boost_collector_runner/management/commands/run_scheduled_collectors.py Outdated
Comment thread boost_collector_runner/schedule_config.py
Comment thread boost_collector_runner/schedule_config.py
@snowfox1003
Collaborator Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Mar 16, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (1)
boost_collector_runner/management/commands/run_scheduled_collectors.py (1)

117-137: ⚠️ Potential issue | 🟡 Minor

Skip the release probe when this run has no on_release tasks.

If tasks contains only daily/weekly/monthly work, this unconditional import/probe still adds a GitHub dependency to the batch and can fail the whole run even though there is nothing release-triggered to execute.

Suggested fix
         run_on_release_tasks = False
+        has_on_release_tasks = any(
+            task.get("schedule") == "on_release" for _, task in tasks
+        )

-        if schedule_kind == "on_release" or schedule_kind == "default":
+        if has_on_release_tasks:
             try:
                 from boost_library_tracker.release_check import (
                     has_new_boost_release,
                 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/management/commands/run_scheduled_collectors.py`
around lines 117 - 137, Only import and call
boost_library_tracker.release_check.has_new_boost_release when there are
actually collectors that need an on_release probe: inspect the tasks collection
for any entry whose schedule requires "on_release" (or whose schedule_kind ==
"on_release"/"default") and if none exist skip the import and leave
run_on_release_tasks = False; if at least one exists then perform the try/except
import and call to has_new_boost_release (preserving the existing
ImportError/Exception handling and CommandError messages). Use the existing
variables schedule_kind, run_on_release_tasks and the tasks iterable to locate
where to add this conditional check so the GitHub-dependent import is avoided
when no on_release work is present.
🧹 Nitpick comments (4)
boost_library_tracker/management/commands/check_new_boost_release.py (1)

1-5: Consider aligning wording with tag-based detection semantics.

This command currently says “release” in docs/help, while the underlying check is explicitly /tags-based. Updating phrasing to “stable Boost tag” would reduce operator ambiguity.

Also applies to: 15-20

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_tracker/management/commands/check_new_boost_release.py` around
lines 1 - 5, Update the docstring and any user-facing help text in the
check_new_boost_release management command to refer to "stable Boost tag"
instead of "release" (the underlying check queries /tags), so change the
module-level docstring and the Command.help/description strings used by the
management command (e.g., any "check_new_boost_release" Command class help or
usage text) to mention "stable Boost tag" and adjust the exit-condition wording
accordingly (e.g., "Exits 0 if a new stable Boost tag exists (GitHub /tags)..."
vs "else 1").
config/boost_collector_schedule.yaml.example (1)

1-10: Clarify timezone semantics in example comments.

Line 2 states default_time is in UTC, but docs/Workflow.md line 45 says "(project timezone America/Los_Angeles)". While the code does convert UTC times to CELERY_TIMEZONE, the documentation could confuse operators about whether to specify times in UTC or local timezone.

Consider aligning the example comment with the conversion behavior:

-# default_time is in UTC (24h "HH:MM"). The app converts to CELERY_TIMEZONE when building the Beat schedule.
+# default_time is in UTC (24h "HH:MM"). The app converts to CELERY_TIMEZONE (America/Los_Angeles) when building the Beat schedule.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/boost_collector_schedule.yaml.example` around lines 1 - 10, Update the
example header to clearly state that default_time is specified in UTC and will
be converted to the app's CELERY_TIMEZONE when building the Beat schedule;
mention that operators should specify times in UTC (or, if you prefer the other
UX, change the comment to say "specify times in project/local timezone and the
app will convert from CELERY_TIMEZONE to UTC") so the behavior around
default_time and CELERY_TIMEZONE matches docs/Workflow.md and avoids ambiguity
for operators.
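The ambiguity matters because a UTC-to-`America/Los_Angeles` conversion can shift the calendar day, which in turn affects weekly/monthly eligibility. A minimal illustration (the date and time here are arbitrary examples, not values from the config):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# A default_time of 01:00 read as UTC...
utc_slot = datetime(2026, 3, 5, 1, 0, tzinfo=ZoneInfo("UTC"))
# ...lands on the previous calendar day once converted to the Celery timezone.
local = utc_slot.astimezone(ZoneInfo("America/Los_Angeles"))
print(local.isoformat())  # 2026-03-04T17:00:00-08:00
```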
boost_collector_runner/tests/test_commands.py (1)

113-117: Strengthen the assertion for group batch test.

The assertion assert "run_foo" in call_names or "run_bar" in call_names is weak — it passes if only one task runs. Since this tests a group batch that should run daily tasks (and potentially weekly if today matches), consider asserting that at least run_foo (the daily task) is always called:

-    assert "run_foo" in call_names or "run_bar" in call_names
+    # Daily task should always run in a group batch
+    assert "run_foo" in call_names
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/tests/test_commands.py` around lines 113 - 117, The
test currently allows either "run_foo" or "run_bar" to have been called which is
too permissive for a group batch that should always run the daily task; update
the assertion to require that "run_foo" appears in call_names (derived from
mock_call.call_args_list) — e.g. replace the or-check with assert "run_foo" in
call_names — and keep any additional assertions about "run_bar" optional if
needed.
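For context, `call_names` in these tests is presumably derived from the mock's recorded calls. A minimal sketch of the tightened assertion, with an illustrative mock setup that is not the test's actual fixture:

```python
from unittest import mock

call_command = mock.Mock()
# Simulate a group batch invoking its collectors via call_command(name, ...):
call_command("run_foo")  # daily task
call_command("run_bar")  # weekly task (may or may not run on a given day)

call_names = [c.args[0] for c in call_command.call_args_list]
# The daily task should always run in a group batch:
assert "run_foo" in call_names
```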
boost_collector_runner/tests/test_schedule_config.py (1)

642-652: Unused loop variable time_str.

The loop unpacks time_str but never uses it. Rename to _time_str to indicate it's intentionally unused.

-    for group_id, time_str, key_suffix in [
+    for group_id, _time_str, key_suffix in [
         ("github", "16-10", "16-10"),
         ("slack", "16-30", "16-30"),
     ]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/tests/test_schedule_config.py` around lines 642 - 652,
The loop unpacks a variable named time_str that is never used; update the loop
to rename time_str to _time_str (or simply _ ) to indicate it's intentionally
unused wherever the tuple is unpacked (the for loop that currently reads for
group_id, time_str, key_suffix in [...] and uses group_key, schedule,
DEFAULT_GROUP_BATCH_SCHEDULE_KIND, group_id) so linters won't flag the unused
variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@boost_collector_runner/management/commands/run_scheduled_collectors.py`:
- Around line 97-111: The current kwargs population patch in
run_scheduled_collectors.py sets month/year for schedule_kind "default" but
doesn't populate day_of_week or day_of_month, causing weekly/monthly collectors
to be dropped; update the kwargs block in the function that builds args for
get_tasks_for_schedule so that when schedule_kind == "default" you also set
kwargs["day_of_week"] and kwargs["day_of_month"] (using today's values computed
from today = datetime.now(ZoneInfo(tz_name)).date()) in addition to month and
year so weekly and monthly selectors are preserved for the "default" batch.

In `@boost_collector_runner/schedule_config.py`:
- Around line 1-13: The code currently converts YAML default_time from UTC to
CELERY_TIMEZONE via _utc_time_to_celery_tz(), causing day-shift bugs for zones
like America/Los_Angeles; instead treat the YAML default_time as a wall-clock
time in CELERY_TIMEZONE end-to-end: remove _utc_time_to_celery_tz(), update the
module docstring to state default_time is in CELERY_TIMEZONE (not UTC), and have
the functions that build the Beat schedule (where default_time is read and where
Beat entries are created) use default_time directly without converting; ensure
run_scheduled_collectors continues to decide weekly/monthly eligibility using
the same CELERY_TIMEZONE local date so no day offsets occur.
- Around line 29-36: SCHEDULE_TYPES currently includes
DEFAULT_GROUP_BATCH_SCHEDULE_KIND (the internal "default" sentinel), so
_validate_task() accepts user YAML values like "default"; change validation so
the internal sentinel is not allowed for user-provided schedules: either remove
DEFAULT_GROUP_BATCH_SCHEDULE_KIND from the public SCHEDULE_TYPES used for YAML
validation, or add a separate USER_SCHEDULE_TYPES (excluding
DEFAULT_GROUP_BATCH_SCHEDULE_KIND) and switch _validate_task() to validate
against that; update any other validation sites (the similar check around the
block that corresponds to lines ~141-155) to use the user-facing set so
"default" is rejected.

In `@docs/Development_guideline.md`:
- Line 8: Update the examples to use the Beat-equivalent invocation and clarify
the YAML-driven production path: replace the single example `python manage.py
run_scheduled_collectors --schedule daily` with the per-group Beat-style flags
(`--schedule default --group <group_id>` and separate `--schedule interval
--interval-minutes ...`) and mention that `boost_collector_runner` uses
`config/boost_collector_schedule.yaml`; also adjust guidance around
`workflow`/`run_all_collectors` and `run_boost_library_tracker` to note they
differ from how Celery Beat invokes per-group schedules.
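The kwargs fix described for the "default" batch can be sketched like this; `build_default_kwargs` is a hypothetical helper for illustration, not the command's actual structure:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def build_default_kwargs(tz_name="America/Los_Angeles"):
    # Populate every selector so weekly/monthly collectors are not
    # silently dropped when the "default" group batch resolves its tasks.
    today = datetime.now(ZoneInfo(tz_name)).date()
    return {
        "month": today.month,
        "year": today.year,
        "day_of_week": today.strftime("%A").lower(),  # e.g. "monday"
        "day_of_month": today.day,
    }

print(sorted(build_default_kwargs()))
```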


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 1f517aee-cef8-48a7-9bf7-645f270bfd22

📥 Commits

Reviewing files that changed from the base of the PR and between 8ce6ec1 and b86ed2e.

📒 Files selected for processing (24)
  • .gitignore
  • boost_collector_runner/__init__.py
  • boost_collector_runner/admin.py
  • boost_collector_runner/apps.py
  • boost_collector_runner/management/__init__.py
  • boost_collector_runner/management/commands/__init__.py
  • boost_collector_runner/management/commands/run_scheduled_collectors.py
  • boost_collector_runner/migrations/__init__.py
  • boost_collector_runner/models.py
  • boost_collector_runner/schedule_config.py
  • boost_collector_runner/tasks.py
  • boost_collector_runner/tests/__init__.py
  • boost_collector_runner/tests/fixtures.py
  • boost_collector_runner/tests/test_commands.py
  • boost_collector_runner/tests/test_schedule_config.py
  • boost_library_tracker/management/commands/check_new_boost_release.py
  • boost_library_tracker/release_check.py
  • config/boost_collector_schedule.yaml.example
  • config/settings.py
  • conftest.py
  • docs/Development_guideline.md
  • docs/Workflow.md
  • github_ops/tokens.py
  • requirements.txt

@snowfox1003
Collaborator Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Mar 16, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
boost_collector_runner/management/commands/run_scheduled_collectors.py (1)

123-141: ⚠️ Potential issue | 🟠 Major

Gate release probing on presence of on_release tasks.

has_new_boost_release() is called even when the resolved task set has no on_release entries. That adds avoidable external I/O and can fail the run for no functional reason.

🔧 Proposed fix
-        run_on_release_tasks = False
+        run_on_release_tasks = False
+        has_on_release_tasks = any(
+            task.get("schedule") == "on_release" for _, task in tasks
+        )

-        if schedule_kind == "on_release" or schedule_kind == "default":
+        if (schedule_kind == "on_release" or schedule_kind == "default") and has_on_release_tasks:
             try:
                 from boost_library_tracker.release_check import (
                     has_new_boost_release,
                 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/management/commands/run_scheduled_collectors.py`
around lines 123 - 141, The code calls has_new_boost_release() whenever
schedule_kind is "on_release" or "default", which triggers unnecessary I/O even
if there are no on_release tasks; change the logic so you first check the
resolved task collection for any tasks with schedule_kind == "on_release" (e.g.
any(task.schedule_kind == "on_release" for task in resolved_tasks) or the
appropriate variable holding resolved tasks) and only import and call
has_new_boost_release() if that check is true; if there are no on_release tasks,
set run_on_release_tasks = False and avoid the import/call, otherwise keep the
existing try/except around has_new_boost_release() and set run_on_release_tasks
accordingly.
boost_library_tracker/release_check.py (1)

55-57: ⚠️ Potential issue | 🟡 Minor

Reject whitespace-only tokens before creating GitHubAPIClient.

The check `if not token` misses values like `" "`, which then fail later with less clear auth errors.

🛡️ Proposed fix
-    if not token:
+    if not token or not token.strip():
         logger.warning("No GitHub token; cannot check for new Boost release")
         return False
+    token = token.strip()

Based on learnings: get_github_token(use="scraping") can return empty/whitespace strings when cycling GITHUB_TOKENS_SCRAPING entries.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_tracker/release_check.py` around lines 55 - 57, The current
check `if not token` in release_check.py misses whitespace-only tokens returned
by get_github_token(use="scraping"); update the logic to reject tokens that are
empty after trimming (e.g., token = token.strip() or test token.strip()) before
creating the GitHubAPIClient so you log the same warning and return False for
whitespace-only values, and ensure the value passed into GitHubAPIClient is the
trimmed token.
🧹 Nitpick comments (3)
docs/Development_guideline.md (1)

102-102: Optional wording polish: reduce repeated sentence starts (“Add … Add …”).

Low-impact doc readability improvement only.

✍️ Suggested wording tweak
-4. Add the command to the list run by `run_all_collectors` (workflow app) or add it to `config/boost_collector_schedule.yaml` under the right group with the right schedule (see [Workflow.md](Workflow.md#2-boost-collector-runner-and-yaml-schedule)).
+4. Register the command either in the `run_all_collectors` list (workflow app) or in `config/boost_collector_schedule.yaml` under the appropriate group/schedule (see [Workflow.md](Workflow.md#2-boost-collector-runner-and-yaml-schedule)).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/Development_guideline.md` at line 102, The sentence repeats the phrase
"Add …" twice; reword the guideline to avoid repeated sentence starts by
combining or varying phrasing — for example, reference the run_all_collectors
list and the alternative config/boost_collector_schedule.yaml path in one
sentence (mentioning run_all_collectors and config/boost_collector_schedule.yaml
and linking to Workflow.md#2-boost-collector-runner-and-yaml-schedule) so the
instruction reads smoothly without duplicate "Add" openings.
boost_collector_runner/tests/fixtures.py (1)

1-3: Empty fixtures module registered in pytest_plugins.

This file is registered in conftest.py via pytest_plugins but currently contains no fixtures. If fixtures are planned for later, consider adding a TODO comment; otherwise, you may defer registering this module until fixtures are actually needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/tests/fixtures.py` around lines 1 - 3, The module
boost_collector_runner/tests/fixtures.py is an empty pytest_plugins-registered
fixtures file; either add a brief TODO comment explaining that fixtures will be
added later (e.g., "TODO: add shared pytest fixtures for
boost_collector_runner") or remove it from the pytest_plugins list in
conftest.py to avoid registering an empty module—edit the fixtures.py module (or
conftest.py registration) accordingly to reflect the intended state.
boost_collector_runner/tests/test_schedule_config.py (1)

642-645: Rename unused loop variable to avoid B007 lint noise.

time_str is declared but unused in the loop tuple.

♻️ Proposed cleanup
-    for group_id, time_str, key_suffix in [
+    for group_id, _time_str, key_suffix in [
         ("github", "16-10", "16-10"),
         ("slack", "16-30", "16-30"),
     ]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_collector_runner/tests/test_schedule_config.py` around lines 642 - 645,
Rename the unused loop variable time_str to a throwaway name (e.g., _ or
_time_str) in the loop that unpacks (group_id, time_str, key_suffix) so the
tuple becomes (group_id, _time_str, key_suffix) or (group_id, _, key_suffix);
update the loop where group_id and key_suffix are used (symbols: group_id,
time_str, key_suffix) to avoid B007 lint noise while keeping behavior unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/Workflow.md`:
- Line 45: Update the docs line describing groups/default_time to state that
default_time is interpreted in UTC (not America/Los_Angeles); locate the
sentence referencing "groups" and "default_time" in Workflow.md and replace the
timezone note so it says required 24h "HH:MM" in UTC (project timezone mention
removed or clarified) to match the scheduler contract and avoid operator
misconfiguration.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8d02ea55-4a04-4c47-a14d-4faa826c60c7

📥 Commits

Reviewing files that changed from the base of the PR and between 656acc4 and 37ee484.

📒 Files selected for processing (24)
  • .gitignore
  • boost_collector_runner/__init__.py
  • boost_collector_runner/admin.py
  • boost_collector_runner/apps.py
  • boost_collector_runner/management/__init__.py
  • boost_collector_runner/management/commands/__init__.py
  • boost_collector_runner/management/commands/run_scheduled_collectors.py
  • boost_collector_runner/migrations/__init__.py
  • boost_collector_runner/models.py
  • boost_collector_runner/schedule_config.py
  • boost_collector_runner/tasks.py
  • boost_collector_runner/tests/__init__.py
  • boost_collector_runner/tests/fixtures.py
  • boost_collector_runner/tests/test_commands.py
  • boost_collector_runner/tests/test_schedule_config.py
  • boost_library_tracker/management/commands/check_new_boost_release.py
  • boost_library_tracker/release_check.py
  • config/boost_collector_schedule.yaml.example
  • config/settings.py
  • conftest.py
  • docs/Development_guideline.md
  • docs/Workflow.md
  • github_ops/tokens.py
  • requirements.txt

@snowfox1003 snowfox1003 merged commit 7d428ae into cppalliance:develop Mar 16, 2026
3 checks passed
