Skip to content

addressed review-library tool's results#5

Merged
henry0816191 merged 1 commit intocppalliance:developfrom
henry0816191:bugfix/address-issues-from-library-review
May 4, 2026
Merged

addressed review-library tool's results#5
henry0816191 merged 1 commit intocppalliance:developfrom
henry0816191:bugfix/address-issues-from-library-review

Conversation

@henry0816191
Copy link
Copy Markdown
Collaborator

@henry0816191 henry0816191 commented May 4, 2026

Summary by CodeRabbit

  • New Features

    • Probe requests now include automatic retry logic with exponential backoff (up to 3 attempts) to improve resilience against temporary network failures.
  • Refactor

    • Updated internal tier classification system for improved consistency.
    • Simplified dependency management by removing unnecessary external scheduler dependency.

@henry0816191 henry0816191 self-assigned this May 4, 2026
@henry0816191 henry0816191 added the bug Something isn't working label May 4, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 4, 2026

📝 Walkthrough

Walkthrough

This PR removes the APScheduler dependency, refactors type safety by introducing a Tier enum to replace string tier values, converts the ProbeState.discovered property to a get_all_discovered() method, and adds exponential backoff retry logic to the probe mechanism.

Changes

APScheduler Removal

Layer / File(s) Summary
Dependency Removal
README.md, pyproject.toml
APScheduler dependency documentation and declaration removed from both user-facing and package configuration.
Logging Configuration
src/paperscout/__main__.py
APScheduler removed from the noisy third-party library list forced to WARNING level.

Type Safety & API Refactoring

Layer / File(s) Summary
Enum Definition & Probe Retry
src/paperscout/sources.py
New Tier enum introduced (WATCHLIST, FRONTIER, RECENT, COLD); ProbeHit.tier type changed from str to Tier; _probe_one enhanced with up-to-3 exponential backoff retries on httpx.HTTPError; known_p_numbers() method added to WG21Index.
Storage Layer API
src/paperscout/storage.py
ProbeState.discovered property replaced with get_all_discovered() method; method includes docstring advising full table scan cost.
Scout Integration
src/paperscout/scout.py
Tier comparisons updated from string literals ("frontier") to enum members (Tier.FRONTIER); discovered count and timestamp retrieval switched to use get_all_discovered().
Health & Monitor Integration
src/paperscout/health.py, src/paperscout/monitor.py
Both updated to use get_all_discovered() instead of discovered property; explanatory comment added for async-to-thread dispatch in watchlist matching.
Callback Refactoring
src/paperscout/__main__.py
Scheduler notify callback refactored from lambda returning tuple to named _on_poll_result function.
Test Updates
tests/test_health.py, tests/test_storage.py
Test helpers and assertions updated to use get_all_discovered() method; _FakeState stores discovered data privately and exposes via new method.

Sequence Diagram(s)

No sequence diagrams generated. The changes are primarily type safety improvements, API refactoring, and resilience enhancements rather than new control flow or multi-component interactions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A scholar hops through code so bright,
With enums now in proper sight—
Tier names replace the strings of old,
Retries fierce and brave and bold,
The storage API shines anew! ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 30.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'addressed review-library tool's results' is vague and generic, using a non-descriptive reference to 'review-library tool's results' that does not convey meaningful information about the actual changeset. Replace with a specific, descriptive title that summarizes the main change, such as 'Replace apscheduler with internal scheduler' or 'Refactor discovered state access pattern and add Tier enum' to clearly indicate the primary modifications.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Review rate limit: 4/5 reviews remaining, refill in 12 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
src/paperscout/sources.py (2)

518-519: 💤 Low value

Consider logging malformed Last-Modified headers.

The static analyzer flags the try-except-pass pattern (S110). While verbose logging of every malformed header may not be desired, a debug-level log could help diagnose issues with specific servers.

♻️ Optional: Add debug logging
                 try:
                     last_modified = parsedate_to_datetime(lm_str)
                     threshold = timedelta(hours=self.cfg.alert_modified_hours)
                     is_recent = (
                         datetime.now(timezone.utc) - last_modified
                     ) <= threshold
-                except Exception:
-                    pass
+                except Exception as exc:
+                    log.debug("Failed to parse Last-Modified '%s' for %s: %s", lm_str, url, exc)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/paperscout/sources.py` around lines 518 - 519, The silent except block
swallowing errors when parsing Last-Modified headers should be replaced with a
debug-level log so malformed headers can be diagnosed; locate the try/except
around Last-Modified parsing (the except Exception: pass) in
src/paperscout/sources.py and change it to log the header value and error (e.g.,
logger.debug with the header content and exception info or use exc_info=True)
instead of passing, keeping behavior non-failure but recording details for
troubleshooting.

487-553: 💤 Low value

Unreachable code after retry loop.

The else clause on line 499-500 and return None on line 553 are unreachable. The for-loop either exits via break (line 491) or return None (line 498), so the else block never executes. Similarly, line 553 is outside the async with sem: block but can never be reached.

♻️ Proposed cleanup
             async with sem:
                 _max_retries = 3
                 for _attempt in range(_max_retries):
                     try:
                         resp = await client.head(url)
                         break
                     except httpx.HTTPError as exc:
                         if _attempt < _max_retries - 1:
                             await asyncio.sleep(0.5 * (2 ** _attempt))
                             continue
                         log.debug("ERR   %s  %s (after %d attempts)", url, exc, _max_retries)
                         self._stats["error"] += 1
                         return None
-            else:
-                return None
 
                 if resp.status_code != 200:
                     # ... rest of the logic ...
 
                 return ProbeHit(...)
-        return None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/paperscout/sources.py` around lines 487 - 553, The for-loop's else and
the trailing return None are unreachable; remove the for ... else block and the
final return None, and instead after the retry loop (after the for _attempt in
range(_max_retries): ... break/except) add a simple presence check for resp
(e.g. if resp is None or "resp" not in locals(): return None) so the subsequent
code that inspects resp.status_code, parses last-modified and eventually returns
ProbeHit (and calls _fetch_front_text) only runs when resp is valid; reference
symbols: _max_retries, resp, for _attempt, ProbeHit, _fetch_front_text.
src/paperscout/scout.py (1)

416-418: 💤 Low value

Redundant local import.

datetime and timezone are already imported at the top of the file (line 7). The local import with aliases _dt and _tz is unnecessary.

♻️ Remove redundant import
 def _handle_status(state: ProbeState, paper_count_fn, say, reply_opts: dict) -> None:
-    from datetime import datetime as _dt, timezone as _tz
     last = state.last_poll
-    last_str = _dt.fromtimestamp(last, tz=_tz.utc).strftime("%Y-%m-%d %H:%M:%S UTC") if last else "never"
+    last_str = datetime.fromtimestamp(last, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC") if last else "never"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/paperscout/scout.py` around lines 416 - 418, Remove the redundant local
import "from datetime import datetime as _dt, timezone as _tz" and change the
usage to the module-level imports already present at the top of the file;
specifically update the expression that builds last_str (which reads
state.last_poll into last and formats it) to call datetime.fromtimestamp(last,
tz=timezone.utc). Ensure you keep the same conditional ("if last else 'never'")
and the same strftime format while referencing datetime and timezone (not
_dt/_tz) so the duplicate import and aliased names are eliminated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/paperscout/scout.py`:
- Around line 416-418: Remove the redundant local import "from datetime import
datetime as _dt, timezone as _tz" and change the usage to the module-level
imports already present at the top of the file; specifically update the
expression that builds last_str (which reads state.last_poll into last and
formats it) to call datetime.fromtimestamp(last, tz=timezone.utc). Ensure you
keep the same conditional ("if last else 'never'") and the same strftime format
while referencing datetime and timezone (not _dt/_tz) so the duplicate import
and aliased names are eliminated.

In `@src/paperscout/sources.py`:
- Around line 518-519: The silent except block swallowing errors when parsing
Last-Modified headers should be replaced with a debug-level log so malformed
headers can be diagnosed; locate the try/except around Last-Modified parsing
(the except Exception: pass) in src/paperscout/sources.py and change it to log
the header value and error (e.g., logger.debug with the header content and
exception info or use exc_info=True) instead of passing, keeping behavior
non-failure but recording details for troubleshooting.
- Around line 487-553: The for-loop's else and the trailing return None are
unreachable; remove the for ... else block and the final return None, and
instead after the retry loop (after the for _attempt in range(_max_retries): ...
break/except) add a simple presence check for resp (e.g. if resp is None or
"resp" not in locals(): return None) so the subsequent code that inspects
resp.status_code, parses last-modified and eventually returns ProbeHit (and
calls _fetch_front_text) only runs when resp is valid; reference symbols:
_max_retries, resp, for _attempt, ProbeHit, _fetch_front_text.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d1bdd593-cbe6-4172-b875-d896895b2344

📥 Commits

Reviewing files that changed from the base of the PR and between 9236231 and 3575f3e.

📒 Files selected for processing (10)
  • README.md
  • pyproject.toml
  • src/paperscout/__main__.py
  • src/paperscout/health.py
  • src/paperscout/monitor.py
  • src/paperscout/scout.py
  • src/paperscout/sources.py
  • src/paperscout/storage.py
  • tests/test_health.py
  • tests/test_storage.py
💤 Files with no reviewable changes (2)
  • README.md
  • pyproject.toml

@henry0816191 henry0816191 requested review from wpak-ai May 4, 2026 19:05
@henry0816191 henry0816191 merged commit 0a41701 into cppalliance:develop May 4, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants