Add clang_github_tracker app in boost-data-collector project #82 by leostar0412 · Pull Request #84 · cppalliance/boost-data-collector

leostar0412 · 2026-03-02T21:35:38Z

Fetches GitHub commits, issues, and PRs for one repo (default: llvm/llvm-project) and saves to raw JSON under workspace/raw/github_activity_tracker/<owner>/<repo>/ — no DB.
Repo is configurable via CLANG_GITHUB_OWNER and CLANG_GITHUB_REPO in .env.
Resume uses workspace/clang_github_activity/state.json; if missing or invalid, state is recomputed from raw (or written with null dates) with one retry, then the command exits on error.
Run: python manage.py run_clang_github_tracker [--dry-run] [--from-date ISO] [--to-date ISO].

Summary by CodeRabbit

New Features
- Added Clang GitHub tracker to fetch and store raw GitHub activity and a management command with --dry-run, --from-date, and --to-date.
Configuration
- New settings and environment variables to configure the GitHub owner and repository; tracker app enabled.
Documentation
- Workspace docs updated to describe raw data layout and state file location.
Tests
- Added tests for command behavior and state parsing.
Chores
- Included tracker in the collectors execution list.

coderabbitai · 2026-03-02T21:35:46Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9608e2c8-0d04-4428-833e-c447bec68ccb

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a new clang_github_tracker Django app to fetch and store raw GitHub activity for a configurable owner/repo, including workspace path helpers, state management, sync logic, a management command, tests, settings, and documentation.

Changes

Cohort / File(s)	Summary
Configuration & Env `\.env.example`, `config/settings.py`, `config/test_settings.py`	Added CLANG_GITHUB_OWNER and CLANG_GITHUB_REPO env/settings (defaults: `llvm` / `llvm-project`) and registered `clang_github_tracker` in INSTALLED_APPS.
App bootstrap `clang_github_tracker/__init__.py`, `clang_github_tracker/management/__init__.py`, `clang_github_tracker/management/commands/__init__.py`	Added package docstrings and package initializers for the new app and its management commands.
Workspace paths `clang_github_tracker/workspace.py`	New helpers to compute and validate workspace root, raw repo dir, and state path; creates directories as needed.
State management `clang_github_tracker/state.py`	New state layer: load/save state.json, ISO parsing, compute state from raw files, ensure state file exists, and resolve per-entity start/end dates.
Sync logic `clang_github_tracker/sync_raw.py`	New sync implementation to fetch commits/issues/PRs from GitHub, save raw items, update state, and return counts.
Management command `clang_github_tracker/management/commands/run_clang_github_tracker.py`	New Django command with `--dry-run`, `--from-date`, `--to-date`; resolves dates via state and calls `sync_raw_only`.
Tests `clang_github_tracker/tests/__init__.py`, `clang_github_tracker/tests/test_commands.py`, `clang_github_tracker/tests/test_state.py`	Added tests for command behavior (dry-run, date filters, sync invocation) and state parsing/computation.
Docs & integration `docs/Workspace.md`, `workflow/management/commands/run_all_collectors.py`	Documented raw/github_activity_tracker layout and added `run_clang_github_tracker` to collector run order.

Sequence Diagram

sequenceDiagram
    participant User
    participant Cmd as Management Command
    participant State as State Manager
    participant Sync as Sync Engine
    participant GH as GitHub API
    participant Raw as Raw Storage

    User->>Cmd: run_clang_github_tracker [--from-date][--to-date][--dry-run]
    Cmd->>State: resolve_start_end_dates(from,to)
    State->>State: ensure_state_file_exists()
    State->>Raw: compute_state_from_raw()
    Raw-->>State: latest timestamps
    State-->>Cmd: (start_commit, start_issue, start_pr, end_date)

    alt dry-run
        Cmd-->>User: log resolved dates and dry-run
    else
        Cmd->>Sync: sync_raw_only(start_*, end_date)
        Sync->>GH: fetch commits/issues/PRs
        GH-->>Sync: activity payloads
        Sync->>Raw: save raw items (commits/issues/prs)
        Sync->>State: save_state(latest observed dates)
        State->>Raw: persist state.json
        Sync-->>Cmd: (commits_saved, issues_saved, prs_saved)
        Cmd-->>User: log sync results
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

#20-add boost-usage-tracker app #71: Also updates COLLECTOR_COMMANDS in workflow/management/commands/run_all_collectors.py to register a new collector.
#22-add boost-mailinglist-tracker app #72: Adds another collector registration to the same COLLECTOR_COMMANDS list.

Suggested reviewers

snowfox1003

Poem

🐇 A rabbit hops through workspace lanes,
Saving commits, PRs, and issue refrains.
State files whisper what we've just seen,
Raw JSON rows, tidy and clean.
Hooray for trackers—let history be gleaned!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly summarizes the main change: adding a new clang_github_tracker app to the boost-data-collector project, which aligns with the changeset's primary objective.
Docstring Coverage	✅ Passed	Docstring coverage is 96.43% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

clang_github_tracker/workspace.py (1)

44-50: Validate owner and repo path segments before composing filesystem paths.

These values are environment-driven; validating against separators/relative traversal prevents accidental directory escape outside the intended workspace subtree.

Suggested hardening

@@
 _APP_SLUG = "clang_github_activity"
 _RAW_APP_SLUG = "github_activity_tracker"
@@
 OWNER = settings.CLANG_GITHUB_OWNER
 REPO = settings.CLANG_GITHUB_REPO
+
+
+def _validate_segment(value: str, field: str) -> str:
+    if (
+        not value
+        or value in {".", ".."}
+        or "/" in value
+        or "\\" in value
+        or ".." in value
+    ):
+        raise ValueError(f"Invalid {field} path segment: {value!r}")
+    return value
@@
 def get_raw_repo_dir(
     owner: str = OWNER, repo: str = REPO, *, create: bool = True
 ) -> Path:
     """Return workspace/raw/github_activity_tracker/<owner>/<repo>/; creates dirs if missing when create=True."""
-    path = get_raw_root() / owner / repo
+    owner = _validate_segment(owner, "owner")
+    repo = _validate_segment(repo, "repo")
+    path = get_raw_root() / owner / repo

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@clang_github_tracker/workspace.py` around lines 44 - 50, The get_raw_repo_dir
function builds filesystem paths from environment-driven owner and repo;
validate and sanitize these inputs to prevent path-separator or traversal
attacks by: (1) reject or normalize any owner/repo containing path separators or
traversal tokens (e.g., '/', '\\', '..') — raise ValueError if invalid; (2)
construct the candidate path using get_raw_root() / owner / repo, then resolve
it (Path.resolve(strict=False)) and ensure the resolved path is inside the
resolved get_raw_root() (use Path.is_relative_to or compare parents) to prevent
escape; (3) only call path.mkdir(parents=True, exist_ok=True) after validation.
Reference get_raw_repo_dir and get_raw_root in your changes.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@clang_github_tracker/management/commands/run_clang_github_tracker.py`:
- Around line 80-83: The handler currently returns early when
clang_state.resolve_start_end_dates(from_date, to_date) yields None, causing a
zero-exit success; change this to signal failure by raising a Django
CommandError (or calling sys.exit(1)) with a clear message instead of returning.
Locate the management command's handle method (class Command in
run_clang_github_tracker.py) and replace the "if resolved is None: return"
branch with a failure path like "if resolved is None: raise CommandError('Failed
to resolve start/end dates: ...')" so schedulers/monitoring receive a non-zero
exit.

In `@clang_github_tracker/state.py`:
- Around line 57-63: load_state is treating an empty dict as a valid state
(returning a dict of keys with None) which prevents ensure_state_file_exists
from triggering recompute; update load_state to treat empty or missing-key
objects as invalid by returning {} unless the loaded dict is non-empty and
contains at least one of KEY_LAST_COMMIT_DATE, KEY_LAST_ISSUE_DATE, or
KEY_LAST_PR_DATE with a non-None value. Concretely, in load_state check that
isinstance(data, dict) and that any(data.get(KEY_LAST_COMMIT_DATE),
data.get(KEY_LAST_ISSUE_DATE), data.get(KEY_LAST_PR_DATE)) is truthy before
returning the mapped dict; otherwise return {} so ensure_state_file_exists will
recompute from raw.

In `@clang_github_tracker/sync_raw.py`:
- Around line 98-148: The try covers all phases so a failure during issues/PRs
loses already-fetched commits; after each successful phase (commits, issues,
prs) call clang_state.save_state(...) with the corresponding
latest_commit/latest_issue/latest_pr (use merge=True) to persist progress;
locate the loops using fetcher.fetch_commits_from_github,
fetcher.fetch_issues_from_github, fetcher.fetch_pull_requests_from_github and
the save_* helpers (save_commit_raw_source, save_issue_raw_source,
save_pr_raw_source) and insert a clang_state.save_state call immediately after
each loop using the current latest_* values so partial progress is preserved.

In `@docs/Workspace.md`:
- Around line 16-17: Update the workspace layout note under
github_activity_tracker to clarify that the tracked repository is configurable:
mention the CLANG_GITHUB_OWNER and CLANG_GITHUB_REPO environment variables (or
config keys) as controlling which <owner>/<repo> is stored, and replace the
hardcoded "llvm/llvm-project" wording with an explanatory phrase like "e.g.,
llvm/llvm-project — configurable via CLANG_GITHUB_OWNER/CLANG_GITHUB_REPO" so
operators know the repo can be changed; update the sentence that references
llvm/llvm-project accordingly in docs/Workspace.md.

---

Nitpick comments:
In `@clang_github_tracker/workspace.py`:
- Around line 44-50: The get_raw_repo_dir function builds filesystem paths from
environment-driven owner and repo; validate and sanitize these inputs to prevent
path-separator or traversal attacks by: (1) reject or normalize any owner/repo
containing path separators or traversal tokens (e.g., '/', '\\', '..') — raise
ValueError if invalid; (2) construct the candidate path using get_raw_root() /
owner / repo, then resolve it (Path.resolve(strict=False)) and ensure the
resolved path is inside the resolved get_raw_root() (use Path.is_relative_to or
compare parents) to prevent escape; (3) only call path.mkdir(parents=True,
exist_ok=True) after validation. Reference get_raw_repo_dir and get_raw_root in
your changes.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ceab7c and 14116d1.

📒 Files selected for processing (15)

.env.example
clang_github_tracker/__init__.py
clang_github_tracker/management/__init__.py
clang_github_tracker/management/commands/__init__.py
clang_github_tracker/management/commands/run_clang_github_tracker.py
clang_github_tracker/state.py
clang_github_tracker/sync_raw.py
clang_github_tracker/tests/__init__.py
clang_github_tracker/tests/test_commands.py
clang_github_tracker/tests/test_state.py
clang_github_tracker/workspace.py
config/settings.py
config/test_settings.py
docs/Workspace.md
workflow/management/commands/run_all_collectors.py

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (4)

clang_github_tracker/sync_raw.py (1)

146-148: Remove redundant exception object from logger.exception.

logger.exception() automatically includes the exception info from the current context. Passing e explicitly is redundant.

Proposed fix

     except (ConnectionException, RateLimitException) as e:
-        logger.exception("clang_github_tracker sync failed: %s", e)
+        logger.exception("clang_github_tracker sync failed")
         raise

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@clang_github_tracker/sync_raw.py` around lines 146 - 148, The except block
catching ConnectionException and RateLimitException passes the caught exception
variable e to logger.exception redundantly; update the handler in sync_raw.py to
call logger.exception with just the message (e.g.,
logger.exception("clang_github_tracker sync failed")) and remove the extra
argument and the unused formatting, leaving the raise as-is so the exception
context is preserved; reference the except (ConnectionException,
RateLimitException) as e block, logger.exception, and the variable e when making
the change.

clang_github_tracker/tests/test_commands.py (1)

16-27: Consider adding cleanup/fixture for state file management.

The test manually deletes the state file if it exists, but doesn't clean up after itself. This could affect other tests if run order changes. Consider using a pytest fixture with cleanup.

Example fixture approach

`@pytest.fixture`
def clean_state_file():
    """Ensure state file is removed before and after test."""
    workspace = get_workspace_path("clang_github_activity")
    state_file = workspace / "state.json"
    if state_file.exists():
        state_file.unlink()
    yield state_file
    if state_file.exists():
        state_file.unlink()


`@pytest.mark.django_db`
def test_run_clang_github_tracker_dry_run_creates_state_if_missing(caplog, clean_state_file):
    """With --dry-run and no state file, command creates state from raw scan and resolves dates."""
    with caplog.at_level(logging.INFO):
        call_command(CMD_NAME, "--dry-run", stdout=StringIO(), stderr=StringIO())
    assert any("Resolved:" in r.getMessage() for r in caplog.records)
    assert any("Dry run" in r.getMessage() for r in caplog.records)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@clang_github_tracker/tests/test_commands.py` around lines 16 - 27, Test
leaves a workspace state file behind which can cause order-dependent failures;
create a pytest fixture (e.g., clean_state_file) that locates the same
state_file via get_workspace_path("clang_github_activity")/ "state.json",
removes it before yielding and removes it again after the test, then update
test_run_clang_github_tracker_dry_run_creates_state_if_missing to accept the
fixture (clean_state_file) and drop the manual unlink logic so the file is
cleaned up both before and after the call_command invocation.

clang_github_tracker/management/commands/run_clang_github_tracker.py (1)

109-111: Remove redundant exception object from logger.exception.

logger.exception() automatically captures and logs the current exception context. Passing e explicitly is redundant.

Proposed fix

         except Exception as e:
-            logger.exception("run_clang_github_tracker failed: %s", e)
+            logger.exception("run_clang_github_tracker failed")
             raise

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@clang_github_tracker/management/commands/run_clang_github_tracker.py` around
lines 109 - 111, The except block in run_clang_github_tracker.py currently calls
logger.exception("run_clang_github_tracker failed: %s", e) which redundantly
passes the exception object; change the call to
logger.exception("run_clang_github_tracker failed") so that logger.exception
uses the implicit exception context, leaving the subsequent raise intact (locate
the except Exception as e: handler in the run_clang_github_tracker command code
and remove the explicit e argument from logger.exception).

clang_github_tracker/state.py (1)

247-254: Retry logic calls ensure_state_file_exists() twice but may not be effective.

Lines 247-249 call ensure_state_file_exists() and immediately retry if empty. However, if the first call failed due to a transient error (e.g., file system issue), the second call is unlikely to succeed without any delay or backoff. Consider adding a brief delay or removing the redundant retry if it's not needed.

Option 1: Add a brief delay between retries

     state = ensure_state_file_exists()
     if not state:
+        import time
+        time.sleep(0.1)  # Brief delay before retry
         state = ensure_state_file_exists()  # retry once

Option 2: Document the retry intent

     state = ensure_state_file_exists()
     if not state:
-        state = ensure_state_file_exists()  # retry once
+        # Retry once in case of transient filesystem issues
+        state = ensure_state_file_exists()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@clang_github_tracker/state.py` around lines 247 - 254, The current retry
calls ensure_state_file_exists() twice back-to-back which is ineffective for
transient failures; replace the immediate second call with a simple retry loop:
call ensure_state_file_exists() once, and if it returns falsy retry up to 2 more
times with a short sleep/backoff (e.g., time.sleep(0.5) then 1.0) before logging
via logger.error and returning None; ensure you import time and update the state
variable inside the loop so the checks reference the latest result and keep the
existing logger.error message if finally unsuccessful.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@config/settings.py`:
- Around line 143-146: The CLANG_GITHUB_OWNER and CLANG_GITHUB_REPO assignments
accept whitespace-only environment values because .strip() is applied after the
fallback, producing empty strings; change them to strip the env value first and
then fallback if the stripped result is empty — i.e., call env(...) (with the
same default), strip the returned string, and if that stripped string is falsy
set it to "llvm" for CLANG_GITHUB_OWNER and "llvm-project" for
CLANG_GITHUB_REPO; update the assignments for the symbols CLANG_GITHUB_OWNER and
CLANG_GITHUB_REPO accordingly.

---

Nitpick comments:
In `@clang_github_tracker/management/commands/run_clang_github_tracker.py`:
- Around line 109-111: The except block in run_clang_github_tracker.py currently
calls logger.exception("run_clang_github_tracker failed: %s", e) which
redundantly passes the exception object; change the call to
logger.exception("run_clang_github_tracker failed") so that logger.exception
uses the implicit exception context, leaving the subsequent raise intact (locate
the except Exception as e: handler in the run_clang_github_tracker command code
and remove the explicit e argument from logger.exception).

In `@clang_github_tracker/state.py`:
- Around line 247-254: The current retry calls ensure_state_file_exists() twice
back-to-back which is ineffective for transient failures; replace the immediate
second call with a simple retry loop: call ensure_state_file_exists() once, and
if it returns falsy retry up to 2 more times with a short sleep/backoff (e.g.,
time.sleep(0.5) then 1.0) before logging via logger.error and returning None;
ensure you import time and update the state variable inside the loop so the
checks reference the latest result and keep the existing logger.error message if
finally unsuccessful.

In `@clang_github_tracker/sync_raw.py`:
- Around line 146-148: The except block catching ConnectionException and
RateLimitException passes the caught exception variable e to logger.exception
redundantly; update the handler in sync_raw.py to call logger.exception with
just the message (e.g., logger.exception("clang_github_tracker sync failed"))
and remove the extra argument and the unused formatting, leaving the raise as-is
so the exception context is preserved; reference the except
(ConnectionException, RateLimitException) as e block, logger.exception, and the
variable e when making the change.

In `@clang_github_tracker/tests/test_commands.py`:
- Around line 16-27: Test leaves a workspace state file behind which can cause
order-dependent failures; create a pytest fixture (e.g., clean_state_file) that
locates the same state_file via get_workspace_path("clang_github_activity")/
"state.json", removes it before yielding and removes it again after the test,
then update test_run_clang_github_tracker_dry_run_creates_state_if_missing to
accept the fixture (clean_state_file) and drop the manual unlink logic so the
file is cleaned up both before and after the call_command invocation.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ceab7c and 14116d1.

📒 Files selected for processing (15)

.env.example
clang_github_tracker/__init__.py
clang_github_tracker/management/__init__.py
clang_github_tracker/management/commands/__init__.py
clang_github_tracker/management/commands/run_clang_github_tracker.py
clang_github_tracker/state.py
clang_github_tracker/sync_raw.py
clang_github_tracker/tests/__init__.py
clang_github_tracker/tests/test_commands.py
clang_github_tracker/tests/test_state.py
clang_github_tracker/workspace.py
config/settings.py
config/test_settings.py
docs/Workspace.md
workflow/management/commands/run_all_collectors.py

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

clang_github_tracker/management/commands/run_clang_github_tracker.py (1)

109-111: Consider removing redundant exception object from logging.exception.

logger.exception() automatically includes the exception traceback, so passing e as a format argument is redundant.

♻️ Suggested simplification

         except Exception as e:
-            logger.exception("run_clang_github_tracker failed: %s", e)
+            logger.exception("run_clang_github_tracker failed")
             raise

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@clang_github_tracker/management/commands/run_clang_github_tracker.py` around
lines 109 - 111, In the except block in run_clang_github_tracker.py (inside the
run_clang_github_tracker command handler), remove the redundant exception object
passed to logger.exception and log only a descriptive message (e.g.,
"run_clang_github_tracker failed") since logger.exception already includes the
traceback; keep the subsequent raise to re-raise the exception.

clang_github_tracker/tests/test_commands.py (1)

48-67: Consider verifying all resolved date parameters for completeness.

The test checks start_commit and end_date but doesn't verify start_issue and start_pr. Since sync_raw_only accepts separate start dates for each entity type, asserting all four parameters would strengthen coverage.

♻️ Suggested enhancement

         sync_mock.assert_called_once()
         call_kw = sync_mock.call_args[1]
         assert "start_commit" in call_kw
+        assert "start_issue" in call_kw
+        assert "start_pr" in call_kw
         assert "end_date" in call_kw
         assert any("saved commits=" in r.getMessage() for r in caplog.records)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@clang_github_tracker/tests/test_commands.py` around lines 48 - 67, Update the
test_run_clang_github_tracker_calls_sync_raw_only_when_not_dry_run to also
assert that sync_raw_only was called with the other resolved date parameters:
after obtaining call_kw from sync_mock.call_args[1], add assertions that
"start_issue" and "start_pr" are present (and optionally that their values match
the resolved start date or expected parsed dates), in addition to the existing
checks for "start_commit" and "end_date" so all four entity start parameters are
verified for completeness.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@clang_github_tracker/state.py`:
- Around line 60-73: The state loader currently accepts any truthy last_*_date
values (e.g., ints/dicts) which can masquerade as valid state; update the
validation in the function that contains this snippet (ensure_state_file_exists
/ the state-parsing helper) to only accept strings that are valid ISO8601 dates:
for each of KEY_LAST_COMMIT_DATE, KEY_LAST_ISSUE_DATE, KEY_LAST_PR_DATE check
isinstance(value, str) and attempt to parse with datetime.fromisoformat (or
equivalent) and treat parse failures as invalid; only return the dict when at
least one key contains a successfully parsed ISO string, otherwise return {} so
recompute is triggered.

In `@clang_github_tracker/workspace.py`:
- Around line 44-50: The get_raw_repo_dir function currently composes filesystem
paths from owner and repo without validation; before creating or joining the
path in get_raw_repo_dir, validate both owner and repo (e.g., in the
get_raw_repo_dir function) and reject values that are absolute paths, contain
path separators or traversal segments ("/", "\\", "..") or otherwise do not
match an allowed pattern (for example only alphanumeric, hyphen, underscore); if
invalid, raise ValueError; after validation, continue to build path =
get_raw_root() / owner / repo and mkdir when create is True.

---

Nitpick comments:
In `@clang_github_tracker/management/commands/run_clang_github_tracker.py`:
- Around line 109-111: In the except block in run_clang_github_tracker.py
(inside the run_clang_github_tracker command handler), remove the redundant
exception object passed to logger.exception and log only a descriptive message
(e.g., "run_clang_github_tracker failed") since logger.exception already
includes the traceback; keep the subsequent raise to re-raise the exception.

In `@clang_github_tracker/tests/test_commands.py`:
- Around line 48-67: Update the
test_run_clang_github_tracker_calls_sync_raw_only_when_not_dry_run to also
assert that sync_raw_only was called with the other resolved date parameters:
after obtaining call_kw from sync_mock.call_args[1], add assertions that
"start_issue" and "start_pr" are present (and optionally that their values match
the resolved start date or expected parsed dates), in addition to the existing
checks for "start_commit" and "end_date" so all four entity start parameters are
verified for completeness.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ceab7c and a3a3253.

📒 Files selected for processing (15)

.env.example
clang_github_tracker/__init__.py
clang_github_tracker/management/__init__.py
clang_github_tracker/management/commands/__init__.py
clang_github_tracker/management/commands/run_clang_github_tracker.py
clang_github_tracker/state.py
clang_github_tracker/sync_raw.py
clang_github_tracker/tests/__init__.py
clang_github_tracker/tests/test_commands.py
clang_github_tracker/tests/test_state.py
clang_github_tracker/workspace.py
config/settings.py
config/test_settings.py
docs/Workspace.md
workflow/management/commands/run_all_collectors.py

coderabbitai

♻️ Duplicate comments (1)

clang_github_tracker/workspace.py (1)

44-50: ⚠️ Potential issue | 🟠 Major

Harden owner/repo path segments before path composition.

get_raw_repo_dir() currently accepts raw path components. Values like ../x or embedded separators can redirect writes outside workspace/raw/github_activity_tracker.

🔧 Proposed fix

 from pathlib import Path
+import re
@@
 STATE_FILENAME = "state.json"
+_SEGMENT_RE = re.compile(r"^[A-Za-z0-9_.-]+$")
@@
 OWNER = settings.CLANG_GITHUB_OWNER
 REPO = settings.CLANG_GITHUB_REPO
+
+
+def _validate_repo_segment(value: str, label: str) -> str:
+    segment = (value or "").strip()
+    if (
+        not segment
+        or segment in {".", ".."}
+        or "/" in segment
+        or "\\" in segment
+        or not _SEGMENT_RE.fullmatch(segment)
+    ):
+        raise ValueError(f"Invalid GitHub {label}: {value!r}")
+    return segment
@@
 def get_raw_repo_dir(
     owner: str = OWNER, repo: str = REPO, *, create: bool = True
 ) -> Path:
     """Return workspace/raw/github_activity_tracker/<owner>/<repo>/; creates dirs if missing when create=True."""
-    path = get_raw_root() / owner / repo
+    safe_owner = _validate_repo_segment(owner, "owner")
+    safe_repo = _validate_repo_segment(repo, "repo")
+    path = get_raw_root() / safe_owner / safe_repo
     if create:
         path.mkdir(parents=True, exist_ok=True)
     return path

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@clang_github_tracker/workspace.py` around lines 44 - 50, get_raw_repo_dir
currently concatenates raw owner/repo strings into a filesystem path, allowing
path-traversal (e.g., "../x" or embedded separators); sanitize and validate the
owner and repo before composing the path: in get_raw_repo_dir, reject or
normalize inputs by ensuring owner and repo are non-empty, do not contain path
separators or ".." (check against "/" and os.path.sep and the literal ".."), or
replace them with their basename (e.g., Path(owner).name) and raise ValueError
on invalid input; then build the path using the sanitized values and proceed
with path.mkdir as before; optionally, after creating the directory, assert that
the resolved path is a descendant of get_raw_root().resolve() to be extra-safe.

🧹 Nitpick comments (1)

clang_github_tracker/tests/test_commands.py (1)

48-67: Consider adding test for state resolution failure.

The current tests cover the happy path. Per project learnings, when resolve_start_end_dates() returns None, the command returns early (exits with 0) rather than raising CommandError. Consider adding a test to verify this graceful degradation behavior.

💡 Example test case

`@pytest.mark.django_db`
def test_run_clang_github_tracker_exits_gracefully_on_state_resolution_failure(caplog):
    """When resolve_start_end_dates returns None, command exits gracefully (no sync, no error)."""
    with patch(
        "clang_github_tracker.management.commands.run_clang_github_tracker.resolve_start_end_dates",
        return_value=None,
    ):
        with patch(
            "clang_github_tracker.management.commands.run_clang_github_tracker.sync_raw_only"
        ) as sync_mock:
            # Should not raise
            call_command(CMD_NAME, stdout=StringIO(), stderr=StringIO())
            sync_mock.assert_not_called()

Based on learnings: "when resolve_start_end_dates() returns None (state resolution failure), the command returns early (exits with 0) rather than raising CommandError."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@clang_github_tracker/tests/test_commands.py` around lines 48 - 67, Add a unit
test that verifies the management command returns early (does not call
sync_raw_only and does not raise) when resolve_start_end_dates() fails (returns
None); specifically patch resolve_start_end_dates in the
run_clang_github_tracker module to return None, patch sync_raw_only to ensure it
is not called, invoke call_command(CMD_NAME, stdout=StringIO(),
stderr=StringIO()), and assert sync_mock.assert_not_called() and no exception
was raised. Reference resolve_start_end_dates, sync_raw_only, CMD_NAME, and
call_command in the test to locate the right functions to patch.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@clang_github_tracker/workspace.py`:
- Around line 44-50: get_raw_repo_dir currently concatenates raw owner/repo
strings into a filesystem path, allowing path-traversal (e.g., "../x" or
embedded separators); sanitize and validate the owner and repo before composing
the path: in get_raw_repo_dir, reject or normalize inputs by ensuring owner and
repo are non-empty, do not contain path separators or ".." (check against "/"
and os.path.sep and the literal ".."), or replace them with their basename
(e.g., Path(owner).name) and raise ValueError on invalid input; then build the
path using the sanitized values and proceed with path.mkdir as before;
optionally, after creating the directory, assert that the resolved path is a
descendant of get_raw_root().resolve() to be extra-safe.

---

Nitpick comments:
In `@clang_github_tracker/tests/test_commands.py`:
- Around line 48-67: Add a unit test that verifies the management command
returns early (does not call sync_raw_only and does not raise) when
resolve_start_end_dates() fails (returns None); specifically patch
resolve_start_end_dates in the run_clang_github_tracker module to return None,
patch sync_raw_only to ensure it is not called, invoke call_command(CMD_NAME,
stdout=StringIO(), stderr=StringIO()), and assert sync_mock.assert_not_called()
and no exception was raised. Reference resolve_start_end_dates, sync_raw_only,
CMD_NAME, and call_command in the test to locate the right functions to patch.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ceab7c and a3a3253.

📒 Files selected for processing (15)

.env.example
clang_github_tracker/__init__.py
clang_github_tracker/management/__init__.py
clang_github_tracker/management/commands/__init__.py
clang_github_tracker/management/commands/run_clang_github_tracker.py
clang_github_tracker/state.py
clang_github_tracker/sync_raw.py
clang_github_tracker/tests/__init__.py
clang_github_tracker/tests/test_commands.py
clang_github_tracker/tests/test_state.py
clang_github_tracker/workspace.py
config/settings.py
config/test_settings.py
docs/Workspace.md
workflow/management/commands/run_all_collectors.py

leostar0412 · 2026-03-03T01:48:54Z

@coderabbitai full review

coderabbitai · 2026-03-03T01:49:05Z

✅ Actions performed

Full review triggered.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

clang_github_tracker/workspace.py (1)

31-36: ⚠️ Potential issue | 🟠 Major

Reject . and .. path segments to close traversal escape routes.

Current validation allows "."/"..", which can move writes outside the intended <owner>/<repo> subtree.

🔒 Suggested hardening

 def _sanitize_segment(value: str, label: str) -> str:
     """Validate owner/repo for use as a path segment; reject traversal, separators, empty."""
     value = (value or "").strip()
-    if not value or not _SEGMENT_RE.fullmatch(value):
+    if value in {".", ".."} or not _SEGMENT_RE.fullmatch(value):
         raise ValueError(f"Invalid GitHub {label}: {value!r}")
     return value

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@clang_github_tracker/workspace.py` around lines 31 - 36, The
_sanitize_segment function currently allows "." and ".." which enable path
traversal; update _sanitize_segment to explicitly reject "." and ".." after
trimming and before using _SEGMENT_RE (i.e., if value in {".", ".."}: raise
ValueError(...)); keep the existing ValueError message and continue to validate
with _SEGMENT_RE to preserve other checks.

🧹 Nitpick comments (2)

config/settings.py (1)

142-148: Consider aligning runtime workspace pre-creation with test settings.

Since the new tracker uses workspace/clang_github_activity/state.json, adding clang_github_activity to _WORKSPACE_APP_SLUGS would keep runtime behavior consistent with config/test_settings.py and make workspace layout more predictable.

♻️ Suggested consistency tweak

 _WORKSPACE_APP_SLUGS = (
     "github_activity_tracker",
     "boost_library_tracker",
     "boost_usage_tracker",
+    "clang_github_activity",
     "cppa_slack_transcript_tracker",
     "discord_activity_tracker",
     "boost_mailing_list_tracker",
     "shared",
 )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@config/settings.py` around lines 142 - 148, Add the new tracker app slug to
the runtime workspace slug list: update the _WORKSPACE_APP_SLUGS constant to
include "clang_github_activity" so the runtime pre-creates
workspace/clang_github_activity/state.json the same way tests expect; modify the
list where _WORKSPACE_APP_SLUGS is defined to append or include
"clang_github_activity" alongside existing slugs to align runtime workspace
layout with config/test_settings.py.

clang_github_tracker/tests/test_commands.py (1)

29-67: Add one test for invalid date input fallback behavior.

A dedicated case with invalid --from-date/--to-date would lock in the intended warning-and-fallback flow and prevent future regressions.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@clang_github_tracker/tests/test_commands.py` around lines 29 - 67, Add a new
test in the same tests module that patches
clang_github_tracker.management.commands.run_clang_github_tracker.sync_raw_only,
calls call_command(CMD_NAME, "--from-date=invalid", "--to-date=also-bad", ...)
and asserts that (1) caplog captured a warning/error message about invalid dates
(e.g., contains "invalid" or "Resolved:" warning) and (2) sync_raw_only is still
invoked with the fallback/resolved date arguments (check call_kw contains
"start_commit" or "start_date"/"end_date" as used by the command). Ensure the
test mirrors the existing tests' structure (use `@pytest.mark.django_db`,
caplog.at_level(logging.INFO), StringIO for stdout/stderr) and uses
sync_raw_only.assert_called_once() and inspects sync_mock.call_args[1] for the
resolved-date keys.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@clang_github_tracker/tests/test_commands.py`:
- Around line 17-27: The test
test_run_clang_github_tracker_dry_run_creates_state_if_missing currently only
asserts logs; update it to assert the actual side-effect by checking the state
file created by call_command: after calling call_command(CMD_NAME, "--dry-run",
...) verify state_file.exists() is True and optionally open/read state_file to
assert expected keys/contents (e.g., that resolved dates are present) so the
test fails if the file is not written; keep references to
get_workspace_path("clang_github_activity"), state_file, and the test function
name when making the change.

---

Duplicate comments:
In `@clang_github_tracker/workspace.py`:
- Around line 31-36: The _sanitize_segment function currently allows "." and
".." which enable path traversal; update _sanitize_segment to explicitly reject
"." and ".." after trimming and before using _SEGMENT_RE (i.e., if value in
{".", ".."}: raise ValueError(...)); keep the existing ValueError message and
continue to validate with _SEGMENT_RE to preserve other checks.

---

Nitpick comments:
In `@clang_github_tracker/tests/test_commands.py`:
- Around line 29-67: Add a new test in the same tests module that patches
clang_github_tracker.management.commands.run_clang_github_tracker.sync_raw_only,
calls call_command(CMD_NAME, "--from-date=invalid", "--to-date=also-bad", ...)
and asserts that (1) caplog captured a warning/error message about invalid dates
(e.g., contains "invalid" or "Resolved:" warning) and (2) sync_raw_only is still
invoked with the fallback/resolved date arguments (check call_kw contains
"start_commit" or "start_date"/"end_date" as used by the command). Ensure the
test mirrors the existing tests' structure (use `@pytest.mark.django_db`,
caplog.at_level(logging.INFO), StringIO for stdout/stderr) and uses
sync_raw_only.assert_called_once() and inspects sync_mock.call_args[1] for the
resolved-date keys.

In `@config/settings.py`:
- Around line 142-148: Add the new tracker app slug to the runtime workspace
slug list: update the _WORKSPACE_APP_SLUGS constant to include
"clang_github_activity" so the runtime pre-creates
workspace/clang_github_activity/state.json the same way tests expect; modify the
list where _WORKSPACE_APP_SLUGS is defined to append or include
"clang_github_activity" alongside existing slugs to align runtime workspace
layout with config/test_settings.py.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ceab7c and 115cdeb.

📒 Files selected for processing (15)

.env.example
clang_github_tracker/__init__.py
clang_github_tracker/management/__init__.py
clang_github_tracker/management/commands/__init__.py
clang_github_tracker/management/commands/run_clang_github_tracker.py
clang_github_tracker/state.py
clang_github_tracker/sync_raw.py
clang_github_tracker/tests/__init__.py
clang_github_tracker/tests/test_commands.py
clang_github_tracker/tests/test_state.py
clang_github_tracker/workspace.py
config/settings.py
config/test_settings.py
docs/Workspace.md
workflow/management/commands/run_all_collectors.py

…ate_if_missing cppalliance#82

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

clang_github_tracker/tests/test_commands.py (1)

17-27: ⚠️ Potential issue | 🟡 Minor

Assert the state-file side effect in the dry-run bootstrap test.

Lines [25]-[26] only validate logs, so this test still passes if state.json creation regresses.

Suggested test hardening

     with caplog.at_level(logging.INFO):
         call_command(CMD_NAME, "--dry-run", stdout=StringIO(), stderr=StringIO())
     assert any("Resolved:" in r.getMessage() for r in caplog.records)
     assert any("Dry run" in r.getMessage() for r in caplog.records)
+    assert state_file.exists()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@clang_github_tracker/tests/test_commands.py` around lines 17 - 27, The test
test_run_clang_github_tracker_dry_run_creates_state_if_missing currently only
checks logs and can miss regressions where state.json is not created; after
calling call_command(CMD_NAME, "--dry-run", ...) assert that the state_file Path
(variable state_file) now exists and optionally contains valid JSON or expected
keys (e.g., load with json.load and assert type/dates) to verify the
side-effect; use the existing state_file variable and CMD_NAME/call_command to
locate where to add these assertions.

🧹 Nitpick comments (2)

clang_github_tracker/tests/test_commands.py (1)

49-67: Strengthen sync-call assertions to verify resolved values, not just kwarg presence.

Lines [64]-[66] can produce false positives even if wrong dates are passed to sync_raw_only.

Suggested assertion upgrade

         sync_mock.assert_called_once()
-        call_kw = sync_mock.call_args[1]
-        assert "start_commit" in call_kw
-        assert "end_date" in call_kw
+        call_kw = sync_mock.call_args.kwargs
+        assert call_kw["start_commit"].isoformat() == "2024-01-01T00:00:00+00:00"
+        assert call_kw["start_issue"].isoformat() == "2024-01-01T00:00:00+00:00"
+        assert call_kw["start_pr"].isoformat() == "2024-01-01T00:00:00+00:00"
+        assert call_kw["end_date"].isoformat() == "2024-01-02T00:00:00+00:00"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@clang_github_tracker/tests/test_commands.py` around lines 49 - 67, The test
test_run_clang_github_tracker_calls_sync_raw_only_when_not_dry_run only asserts
that sync_raw_only was called with keys present, which can pass with wrong
values; update the assertions on the mock (sync_raw_only) to assert the actual
resolved values for the relevant kwargs (e.g., verify call_kw["start_commit"]
equals the expected resolved start value and call_kw["end_date"] equals the
expected resolved end value) after calling call_command(CMD_NAME,
"--from-date=2024-01-01", "--to-date=2024-01-02", ...); keep the existing caplog
check for "saved commits=" but replace the generic "in call_kw" checks with
explicit equality checks against the expected parsed/converted dates or commit
identifiers returned by the command.

clang_github_tracker/management/commands/run_clang_github_tracker.py (1)

75-78: Consider permissive fallback for inverted date ranges to match project command behavior.

Line [76] currently hard-fails; this is stricter than the repo’s graceful date-input handling pattern.

Consistency-oriented alternative

-        if from_date and to_date and from_date > to_date:
-            raise CommandError(
-                "Invalid date range: from_date must be before or equal to to_date."
-            )
+        if from_date and to_date and from_date > to_date:
+            logger.warning(
+                "Invalid date range (--from-date > --to-date); ignoring both and using default date resolution."
+            )
+            from_date = None
+            to_date = None

Based on learnings: In Django management commands in this project, invalid start/end date inputs are intentionally treated permissively and fall back to default logic.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@clang_github_tracker/management/commands/run_clang_github_tracker.py` around
lines 75 - 78, The current strict check in
run_clang_github_tracker.Command.handle that raises CommandError when from_date
> to_date should be made permissive: instead of raising, swap the two values
(set from_date, to_date = to_date, from_date) and emit a warning via
self.stderr.write or the command logger to note the correction; update the block
referencing from_date and to_date so downstream logic uses the swapped values
(look for the if from_date and to_date check and replace the raise with the swap
+ warning).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@clang_github_tracker/tests/test_state.py`:
- Around line 27-34: The test patches the wrong function: update the mock to
patch get_raw_repo_dir used by compute_state_from_raw instead of get_raw_root;
specifically patch "clang_github_tracker.state.get_raw_repo_dir" (the function
imported/used by clang_state.compute_state_from_raw), set its return_value to
tmp_path / "raw" / "github_activity_tracker" (or an appropriate Path) and keep
the rest of the assertions unchanged so compute_state_from_raw exercises the
intended no-repo path handling.

---

Duplicate comments:
In `@clang_github_tracker/tests/test_commands.py`:
- Around line 17-27: The test
test_run_clang_github_tracker_dry_run_creates_state_if_missing currently only
checks logs and can miss regressions where state.json is not created; after
calling call_command(CMD_NAME, "--dry-run", ...) assert that the state_file Path
(variable state_file) now exists and optionally contains valid JSON or expected
keys (e.g., load with json.load and assert type/dates) to verify the
side-effect; use the existing state_file variable and CMD_NAME/call_command to
locate where to add these assertions.

---

Nitpick comments:
In `@clang_github_tracker/management/commands/run_clang_github_tracker.py`:
- Around line 75-78: The current strict check in
run_clang_github_tracker.Command.handle that raises CommandError when from_date
> to_date should be made permissive: instead of raising, swap the two values
(set from_date, to_date = to_date, from_date) and emit a warning via
self.stderr.write or the command logger to note the correction; update the block
referencing from_date and to_date so downstream logic uses the swapped values
(look for the if from_date and to_date check and replace the raise with the swap
+ warning).

In `@clang_github_tracker/tests/test_commands.py`:
- Around line 49-67: The test
test_run_clang_github_tracker_calls_sync_raw_only_when_not_dry_run only asserts
that sync_raw_only was called with keys present, which can pass with wrong
values; update the assertions on the mock (sync_raw_only) to assert the actual
resolved values for the relevant kwargs (e.g., verify call_kw["start_commit"]
equals the expected resolved start value and call_kw["end_date"] equals the
expected resolved end value) after calling call_command(CMD_NAME,
"--from-date=2024-01-01", "--to-date=2024-01-02", ...); keep the existing caplog
check for "saved commits=" but replace the generic "in call_kw" checks with
explicit equality checks against the expected parsed/converted dates or commit
identifiers returned by the command.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ceab7c and 115cdeb.

📒 Files selected for processing (15)

.env.example
clang_github_tracker/__init__.py
clang_github_tracker/management/__init__.py
clang_github_tracker/management/commands/__init__.py
clang_github_tracker/management/commands/run_clang_github_tracker.py
clang_github_tracker/state.py
clang_github_tracker/sync_raw.py
clang_github_tracker/tests/__init__.py
clang_github_tracker/tests/test_commands.py
clang_github_tracker/tests/test_state.py
clang_github_tracker/workspace.py
config/settings.py
config/test_settings.py
docs/Workspace.md
workflow/management/commands/run_all_collectors.py

leostar0412 · 2026-03-03T02:27:18Z

@coderabbitai full review

coderabbitai · 2026-03-03T02:27:30Z

✅ Actions performed

Full review triggered.

coderabbitai

🧹 Nitpick comments (2)

clang_github_tracker/sync_raw.py (1)

143-145: Minor: Redundant exception object in logging.exception call.

Same as in the management command—logger.exception() already captures the exception context.

♻️ Optional simplification

     except (ConnectionException, RateLimitException) as e:
-        logger.exception("clang_github_tracker sync failed: %s", e)
+        logger.exception("clang_github_tracker sync failed")
         raise

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@clang_github_tracker/sync_raw.py` around lines 143 - 145, The except block
catching ConnectionException and RateLimitException in sync_raw.py uses
logger.exception("clang_github_tracker sync failed: %s", e) which redundantly
passes the exception object; change the call to
logger.exception("clang_github_tracker sync failed") (leave the exception
classes ConnectionException and RateLimitException and the subsequent raise
intact) so the exception context is logged automatically by logger.exception.

clang_github_tracker/management/commands/run_clang_github_tracker.py (1)

113-115: Minor: Redundant exception object in logging.exception call.

logger.exception() automatically includes the exception information from the current context; passing e explicitly is redundant.

♻️ Optional simplification

         except Exception as e:
-            logger.exception("run_clang_github_tracker failed: %s", e)
+            logger.exception("run_clang_github_tracker failed")
             raise

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@clang_github_tracker/management/commands/run_clang_github_tracker.py` around
lines 113 - 115, Remove the redundant exception object from the logger.exception
call: in the exception handler where the code currently does
logger.exception("run_clang_github_tracker failed: %s", e) (inside the
run_clang_github_tracker/handle exception block), change it to call
logger.exception("run_clang_github_tracker failed") so the exception context is
logged automatically, and keep the subsequent raise as-is.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@clang_github_tracker/management/commands/run_clang_github_tracker.py`:
- Around line 113-115: Remove the redundant exception object from the
logger.exception call: in the exception handler where the code currently does
logger.exception("run_clang_github_tracker failed: %s", e) (inside the
run_clang_github_tracker/handle exception block), change it to call
logger.exception("run_clang_github_tracker failed") so the exception context is
logged automatically, and keep the subsequent raise as-is.

In `@clang_github_tracker/sync_raw.py`:
- Around line 143-145: The except block catching ConnectionException and
RateLimitException in sync_raw.py uses logger.exception("clang_github_tracker
sync failed: %s", e) which redundantly passes the exception object; change the
call to logger.exception("clang_github_tracker sync failed") (leave the
exception classes ConnectionException and RateLimitException and the subsequent
raise intact) so the exception context is logged automatically by
logger.exception.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ceab7c and e3f1d5d.

📒 Files selected for processing (15)

.env.example
clang_github_tracker/__init__.py
clang_github_tracker/management/__init__.py
clang_github_tracker/management/commands/__init__.py
clang_github_tracker/management/commands/run_clang_github_tracker.py
clang_github_tracker/state.py
clang_github_tracker/sync_raw.py
clang_github_tracker/tests/__init__.py
clang_github_tracker/tests/test_commands.py
clang_github_tracker/tests/test_state.py
clang_github_tracker/workspace.py
config/settings.py
config/test_settings.py
docs/Workspace.md
workflow/management/commands/run_all_collectors.py

jonathanMLDev

Please read the comments and update the code if it is reasonable.

leostar0412 added 2 commits March 2, 2026 12:33

Add clang_github_tracker app v1 cppalliance#82

43df3f3

Update fetching state logic cppalliance#82

14116d1

leostar0412 self-assigned this Mar 2, 2026

coderabbitai Bot reviewed Mar 2, 2026

View reviewed changes

Comment thread clang_github_tracker/management/commands/run_clang_github_tracker.py

Comment thread clang_github_tracker/state_manager.py

Comment thread clang_github_tracker/sync_raw.py

Comment thread docs/Workspace.md

coderabbitai Bot reviewed Mar 2, 2026

View reviewed changes

Comment thread config/settings.py Outdated

Update the state saving logic cppalliance#82

a3a3253

coderabbitai Bot reviewed Mar 2, 2026

View reviewed changes

Comment thread clang_github_tracker/state_manager.py

Comment thread clang_github_tracker/workspace.py

coderabbitai Bot reviewed Mar 3, 2026

View reviewed changes

Fix: file location error fix - cppalliance#82

115cdeb

coderabbitai Bot reviewed Mar 3, 2026

View reviewed changes

Comment thread clang_github_tracker/tests/test_commands.py

Add new test case in test_run_clang_github_tracker_dry_run_creates_st…

2d236a4

…ate_if_missing cppalliance#82

coderabbitai Bot reviewed Mar 3, 2026

View reviewed changes

Comment thread clang_github_tracker/tests/test_state_manager.py

leostar0412 added 2 commits March 2, 2026 18:13

Fix: test in test_compute_state_from_raw_empty_dir cppalliance#82

25e63f3

Add description in clang_github_activity cppalliance#82

e3f1d5d

coderabbitai Bot reviewed Mar 3, 2026

View reviewed changes

leostar0412 mentioned this pull request Mar 3, 2026

Add clang-github-tracker app in the project #82

Closed

leostar0412 requested a review from jonathanMLDev March 3, 2026 03:44

jonathanMLDev approved these changes Mar 6, 2026

View reviewed changes

Comment thread clang_github_tracker/management/commands/run_clang_github_tracker.py

Comment thread clang_github_tracker/sync_raw.py Outdated

Comment thread clang_github_tracker/state_manager.py

Comment thread clang_github_tracker/state_manager.py

Comment thread clang_github_tracker/sync_raw.py

leostar0412 added 2 commits March 9, 2026 11:16

Refactor state manager and fix UTC datetime conversion cppalliance#82

b872bbc

Fix: remove retry logic in getting state cppalliance#82

2865a96

leostar0412 requested a review from jonathanMLDev March 9, 2026 19:00

jonathanMLDev approved these changes Mar 9, 2026

View reviewed changes

snowfox1003 merged commit 236de33 into cppalliance:develop Mar 9, 2026
3 checks passed

This was referenced Mar 11, 2026

Add WG21 paper tracker with fetch, download, GCS upload, and tests (#24) #104

Merged

Refactor GitHub client, stabilize ETag cache, and fix client tests (#106) #108

Merged

This was referenced Mar 16, 2026

Issue and PR MD create and upload logic added (#113) #114

Merged

#15-create git is&pr preprocessor, update pinecone sync logic #118

Merged

coderabbitai Bot mentioned this pull request Apr 2, 2026

Clang GitHub tracker: DB-backed sync, markdown publish, chunked backfill (#136) #137

Merged

Conversation

leostar0412 commented Mar 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

leostar0412 commented Mar 3, 2026

Uh oh!

coderabbitai Bot commented Mar 3, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

leostar0412 commented Mar 3, 2026

Uh oh!

coderabbitai Bot commented Mar 3, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

jonathanMLDev left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

leostar0412 commented Mar 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 2, 2026 •

edited

Loading