Onboarding docs + handoff guide#9
Conversation
📝 WalkthroughWalkthroughRepository-wide infrastructure, documentation, and code-hygiene setup: adds CI/CD workflows, issue templates, pre-commit configuration, dependency management, and comprehensive developer onboarding and maintainer handoff guides; includes formatting cleanup across source and test files and Ruff linting configuration. ChangesRepository Infrastructure & Automation
Documentation & Contribution Guidelines
Development Tooling & Code Cleanup
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes The PR is broad in scope (40+ files) but primarily homogeneous—extensive documentation, straightforward configuration, and repetitive formatting changes. However, the substantial narrative content in Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (4)
.github/ISSUE_TEMPLATE/config.yml (1)
1-2: ⚡ Quick winConsider disabling blank issues to enforce the new intake flow.
With Line 1 enabled and Line 2 empty, reporters can bypass templates (including security guidance), which weakens triage consistency.
Suggested config tweak
-blank_issues_enabled: true -contact_links: [] +blank_issues_enabled: false +contact_links: + - name: Security vulnerability report + url: https://github.com/cppalliance/paperscout/blob/develop/SECURITY.md + about: Please follow SECURITY.md for private vulnerability reporting.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/ISSUE_TEMPLATE/config.yml around lines 1 - 2, The repo currently allows blank issues via the blank_issues_enabled setting and leaves contact_links empty; change blank_issues_enabled from true to false to prevent reporters bypassing templates and either populate contact_links with appropriate intake/contact entries or remove the empty contact_links array so the config enforces the intended intake flow and security guidance; update the keys blank_issues_enabled and contact_links accordingly.CODE_OF_CONDUCT.md (1)
39-39: ⚡ Quick winReplace template placeholder text with a concrete reporting channel.
Line 39 still includes “Replace this paragraph…”, which reads as unfinished policy text. Please finalize this section with the canonical contact route.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CODE_OF_CONDUCT.md` at line 39, In CODE_OF_CONDUCT.md replace the placeholder sentence "Replace this paragraph with a dedicated conduct email or ticketing process when your organization publishes one." with the project's canonical reporting channel (e.g., a dedicated conduct email address, a link to your internal ticketing form, or a specific maintainer contact) so the paragraph reads as a concrete, actionable reporting instruction; update the sentence that currently mentions "repository maintainers" to reference the chosen channel and ensure the contact method is clearly labeled (e.g., "To report abusive behavior, contact: conduct@example.org" or "Submit incidents via: https://support.example.com/conduct").src/paperscout/sources.py (1)
520-525: ⚡ Quick winConsider logging the Last-Modified parsing exception.
The exception is silently caught, making it harder to debug if the server sends malformed date headers. Adding a debug-level log would improve observability without adding noise.
📝 Suggested improvement
try: last_modified = parsedate_to_datetime(lm_str) threshold = timedelta(hours=self.cfg.alert_modified_hours) is_recent = (datetime.now(timezone.utc) - last_modified) <= threshold - except Exception: - pass + except Exception as exc: + log.debug("Failed to parse Last-Modified '%s' for %s: %s", lm_str, url, exc)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/paperscout/sources.py` around lines 520 - 525, The Last-Modified parsing exception is currently swallowed; update the except block that surrounds parsedate_to_datetime/last_modified/is_recent to log the failure at debug level including the raw lm_str and the exception (e.g., "Failed to parse Last-Modified header: {lm_str} - {err}"), using the module/class logger (self.logger if present or logging.getLogger(__name__)) so you capture malformed date headers for debugging without noisy production logs.tests/conftest.py (1)
208-208: ⚡ Quick winConsider cleaning up temporary test directories.
tempfile.mkdtemp()creates a directory that persists after the test run. Consider usingtempfile.TemporaryDirectory()context manager or pytest'stmp_pathfixture for automatic cleanup.🗑️ Suggested improvement using pytest's built-in fixture
def make_test_settings(**overrides) -> Settings: """Build a Settings instance with safe test defaults (no I/O, no credentials).""" + # Note: To use tmp_path, convert this to a pytest fixture instead of a factory function base: dict = dict( slack_signing_secret="test-secret", slack_bot_token="xoxb-test", port=3000, database_url="", poll_interval_minutes=30, poll_overrun_cooldown_seconds=300, enable_bulk_wg21=True, enable_bulk_openstd=True, enable_iso_probe=True, probe_prefixes=["D", "P"], probe_extensions=[".pdf"], frontier_window_above=3, frontier_window_below=1, frontier_explicit_ranges=[], frontier_gap_threshold=50, hot_lookback_months=6, hot_revision_depth=2, cold_revision_depth=1, cold_cycle_divisor=48, gap_max_rev=1, alert_modified_hours=24, http_concurrency=5, http_timeout_seconds=5, http_use_http2=False, notification_channel="", notify_on_frontier_hit=True, notify_on_any_draft=True, notify_on_dp_transition=True, - data_dir=Path(tempfile.mkdtemp(prefix="paperscout-test-")), + data_dir=Path(tempfile.mkdtemp(prefix="paperscout-test-")), # TODO: Use pytest tmp_path fixture cache_ttl_hours=1, ) base.update(overrides) return Settings.model_construct(**base)Alternatively, if this must remain a factory function, ensure callers clean up or use:
data_dir=Path(tempfile.mkdtemp(prefix="paperscout-test-", dir=pytest.TempPathFactory.getbasetemp()))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/conftest.py` at line 208, The test fixture currently sets data_dir using tempfile.mkdtemp(...), which leaves temp directories behind; change it to use a context-managed TemporaryDirectory or pytest's tmp_path/tmp_path_factory so directories are removed automatically — replace the tempfile.mkdtemp call used to construct Path(data_dir=Path(tempfile.mkdtemp(prefix="paperscout-test-"))) with either a TemporaryDirectory() context (or tempfile.TemporaryDirectory() returned path) or use the pytest tmp_path/tmp_path_factory to create the directory for the fixture so cleanup is automatic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/CODEOWNERS:
- Around line 1-8: Replace the commented-out lines in the CODEOWNERS file with
at least one active ownership rule: add a global fallback owner (e.g., "*
`@org/team`" or "* `@githubusername`") and any path-specific owners needed (e.g.,
"/docs/ `@docsteam`" or "src/** `@backend-team`") using valid GitHub usernames or
`@org/team` names; ensure the file contains no only-comment lines so GitHub will
route reviews to the specified owners.
In @.github/workflows/codeql.yml:
- Around line 5-7: Update the CodeQL workflow triggers so they include the
develop branch in addition to main: modify the two branches arrays under the
workflow's push and pull_request triggers (the YAML keys "branches:" shown on
the lines with the CodeQL trigger entries) to list both main and develop so PRs
and pushes against develop also run CodeQL analysis.
In `@CONTRIBUTING.md`:
- Around line 19-22: Update CONTRIBUTING.md wording that currently instructs
contributors to create feature branches from `main` and claims CI runs on “every
push/pull request”: change the branch guidance to reflect the actual workflow by
mentioning both `main` and `develop` (e.g., “create a feature branch from main
or develop as appropriate”) and update the CI statement to specify that
workflows run on pushes and pull requests that target the `main` and `develop`
branches (or are triggered by pushes to those branches), ensuring references to
`main` are expanded to include `develop` where the workflows are scoped.
In `@docs/handoff.md`:
- Around line 59-64: Replace the generic placeholder contacts in the table rows
titled "Slack app", "PostgreSQL", and "GitHub Environments" with concrete
CppAlliance-specific maintainer channels and escalation points: insert the
official Slack channel or app-admin role (e.g., `#cppalliance-ops` or on-call
alias) for "Slack app", the specific infrastructure/DBA owner or mailing list
for "PostgreSQL" (including backup/restore contact), and the GitHub environment
owners or service account contact for "GitHub Environments"; ensure each cell
includes a direct contact (Slack channel, email alias, or on-call rotation) and
update the surrounding sentence that instructs replacement to confirm these
entries are approved for the public/internal wiki.
In `@docs/onboarding.md`:
- Line 207: The doc text is confusing about whether health_port is already in
Settings and when to set HEALTH_PORT; update the sentence to state clearly that
the Settings variable health_port (default 8080) is already defined in the
Settings module and therefore you only need to set the environment variable
HEALTH_PORT if you want to override that default, and if you expect contributors
to use HEALTH_PORT by default, add it to .env.example; reference the Settings
symbol health_port and the env variable name HEALTH_PORT in the revised sentence
so readers know which code-setting and which .env key are being discussed.
- Line 10: The in-page anchor uses `#deployment` but the actual heading is
"Deployment (summary)"; update the link target to match the heading's slug
(e.g., change the link from "#deployment" to "#deployment-summary") or rename
the heading to "Deployment" so the fragment and heading match; locate the
markdown line containing the link text "Optional: **Docker** + Docker Compose
for containerized runs (see [Deployment](`#deployment`))" and make the
corresponding change.
In `@src/paperscout/config.py`:
- Line 26: poll_overrun_cooldown_seconds can be zero or negative from env which
allows tight-looping in monitor where sleep_for = max(interval - elapsed,
cooldown); enforce a positive cooldown by validating/clamping this config value
on load: in config.py, where poll_overrun_cooldown_seconds is read/initialized,
either raise a ValueError for <=0 or coerce with poll_overrun_cooldown_seconds =
max(1, parsed_value) (or another sensible minimum), and add a short comment and
unit test to cover the validation so monitor's sleep calculation always gets a
positive cooldown.
---
Nitpick comments:
In @.github/ISSUE_TEMPLATE/config.yml:
- Around line 1-2: The repo currently allows blank issues via the
blank_issues_enabled setting and leaves contact_links empty; change
blank_issues_enabled from true to false to prevent reporters bypassing templates
and either populate contact_links with appropriate intake/contact entries or
remove the empty contact_links array so the config enforces the intended intake
flow and security guidance; update the keys blank_issues_enabled and
contact_links accordingly.
In `@CODE_OF_CONDUCT.md`:
- Line 39: In CODE_OF_CONDUCT.md replace the placeholder sentence "Replace this
paragraph with a dedicated conduct email or ticketing process when your
organization publishes one." with the project's canonical reporting channel
(e.g., a dedicated conduct email address, a link to your internal ticketing
form, or a specific maintainer contact) so the paragraph reads as a concrete,
actionable reporting instruction; update the sentence that currently mentions
"repository maintainers" to reference the chosen channel and ensure the contact
method is clearly labeled (e.g., "To report abusive behavior, contact:
conduct@example.org" or "Submit incidents via:
https://support.example.com/conduct").
In `@src/paperscout/sources.py`:
- Around line 520-525: The Last-Modified parsing exception is currently
swallowed; update the except block that surrounds
parsedate_to_datetime/last_modified/is_recent to log the failure at debug level
including the raw lm_str and the exception (e.g., "Failed to parse Last-Modified
header: {lm_str} - {err}"), using the module/class logger (self.logger if
present or logging.getLogger(__name__)) so you capture malformed date headers
for debugging without noisy production logs.
In `@tests/conftest.py`:
- Line 208: The test fixture currently sets data_dir using
tempfile.mkdtemp(...), which leaves temp directories behind; change it to use a
context-managed TemporaryDirectory or pytest's tmp_path/tmp_path_factory so
directories are removed automatically — replace the tempfile.mkdtemp call used
to construct Path(data_dir=Path(tempfile.mkdtemp(prefix="paperscout-test-")))
with either a TemporaryDirectory() context (or tempfile.TemporaryDirectory()
returned path) or use the pytest tmp_path/tmp_path_factory to create the
directory for the fixture so cleanup is automatic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9c35171a-65c5-4842-b626-2d87df3794c3
📒 Files selected for processing (35)
.gitattributes.github/CODEOWNERS.github/ISSUE_TEMPLATE/bug_report.yml.github/ISSUE_TEMPLATE/config.yml.github/ISSUE_TEMPLATE/feature_request.yml.github/dependabot.yml.github/pull_request_template.md.github/workflows/ci.yml.github/workflows/codeql.yml.pre-commit-config.yaml.vscode/settings.jsonCHANGELOG.mdCODE_OF_CONDUCT.mdCONTRIBUTING.mdREADME.mdSECURITY.mddocs/handoff.mddocs/onboarding.mdpyproject.tomlsrc/paperscout/__init__.pysrc/paperscout/__main__.pysrc/paperscout/config.pysrc/paperscout/db.pysrc/paperscout/health.pysrc/paperscout/monitor.pysrc/paperscout/scout.pysrc/paperscout/sources.pysrc/paperscout/storage.pytests/conftest.pytests/test_health.pytests/test_models.pytests/test_monitor.pytests/test_scout.pytests/test_sources.pytests/test_storage.py
|
You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool. What Enabling Code Scanning Means:
For more information about GitHub Code Scanning, check out the documentation. |
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
docs/onboarding.md (2)
10-10:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix the in-page deployment link target syntax.
Line 10 wraps the fragment in backticks, which breaks the anchor target for the “Deployment (summary)” section.
Suggested patch
-- Optional: **Docker** + Docker Compose for containerized runs (see [Deployment](`#deployment-summary`)) +- Optional: **Docker** + Docker Compose for containerized runs (see [Deployment](`#deployment-summary`))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/onboarding.md` at line 10, The in-page anchor link currently uses backticks around the fragment ([Deployment](`#deployment-summary`)) which breaks the target; update the markdown link in the line containing "Optional: **Docker** + Docker Compose..." to remove the backticks so the fragment becomes `#deployment-summary` (i.e., change [Deployment](`#deployment-summary`) to [Deployment](`#deployment-summary`)) ensuring it correctly links to the "Deployment (summary)" section.
80-80:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate CI branch wording to match current workflow scope.
Line 80 says
mainonly, which conflicts with the branch guidance elsewhere (main+develop) and can confuse new contributors.Suggested patch
-CI configuration: [.github/workflows/ci.yml](../.github/workflows/ci.yml) (Python 3.10–3.12 on `ubuntu-latest`, `main` only). +CI configuration: [.github/workflows/ci.yml](../.github/workflows/ci.yml) (Python 3.10–3.12 on `ubuntu-latest`, for `main` and `develop`).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/onboarding.md` at line 80, Update the CI branch wording in the onboarding sentence that currently reads "CI configuration: [.github/workflows/ci.yml] (Python 3.10–3.12 on `ubuntu-latest`, `main` only)" to reflect the actual workflow scope (e.g., "`main` and `develop`" or whatever branches the CI targets), ensuring the phrase precisely matches the project's CI configuration; edit that sentence in the onboarding documentation so it no longer contradicts other branch guidance.CONTRIBUTING.md (1)
45-56:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRemove contradictory CI-trigger wording in this section.
Line 45 already scopes behavior to configured branches, but Line 56 reintroduces “every push and pull request.” Please keep one consistent branch-scoped statement.
Suggested patch
-CI runs `pre-commit run --all-files` on every push and pull request (see the `lint` job in `.github/workflows/ci.yml`). +CI runs `pre-commit run --all-files` for pushes/PRs on configured branches (currently `main` and `develop`; see `.github/workflows/ci.yml`).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CONTRIBUTING.md` around lines 45 - 56, The section currently contains two contradictory CI descriptions; keep the branch-scoped wording mentioned earlier (CI runs `pre-commit run --all-files` for configured branches as defined in `.github/workflows/ci.yml`) and remove or change the later sentence that states CI runs "on every push and pull request" to match the branch-scoped behavior. Edit the paragraph that references the `lint` job in `ci.yml` (and the final sentence that repeats CI behavior) so only the branch-scoped statement using `pre-commit run --all-files` and the configured branches is present.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/CODEOWNERS:
- Around line 6-8: Remove the Markdown backticks and any trailing spaces from
the CODEOWNERS entries so owners are plain `@tokens` and lines end cleanly;
specifically update the entries for the global pattern (*) and the directory
patterns (/docs/ and /src/) to read with plain owner tokens (e.g., `@will`
`@henry0816191`) without surrounding backticks or trailing whitespace so the
CODEOWNERS parser and the trailing-whitespace pre-commit hook both pass.
---
Duplicate comments:
In `@CONTRIBUTING.md`:
- Around line 45-56: The section currently contains two contradictory CI
descriptions; keep the branch-scoped wording mentioned earlier (CI runs
`pre-commit run --all-files` for configured branches as defined in
`.github/workflows/ci.yml`) and remove or change the later sentence that states
CI runs "on every push and pull request" to match the branch-scoped behavior.
Edit the paragraph that references the `lint` job in `ci.yml` (and the final
sentence that repeats CI behavior) so only the branch-scoped statement using
`pre-commit run --all-files` and the configured branches is present.
In `@docs/onboarding.md`:
- Line 10: The in-page anchor link currently uses backticks around the fragment
([Deployment](`#deployment-summary`)) which breaks the target; update the
markdown link in the line containing "Optional: **Docker** + Docker Compose..."
to remove the backticks so the fragment becomes `#deployment-summary` (i.e.,
change [Deployment](`#deployment-summary`) to [Deployment](`#deployment-summary`))
ensuring it correctly links to the "Deployment (summary)" section.
- Line 80: Update the CI branch wording in the onboarding sentence that
currently reads "CI configuration: [.github/workflows/ci.yml] (Python 3.10–3.12
on `ubuntu-latest`, `main` only)" to reflect the actual workflow scope (e.g.,
"`main` and `develop`" or whatever branches the CI targets), ensuring the phrase
precisely matches the project's CI configuration; edit that sentence in the
onboarding documentation so it no longer contradicts other branch guidance.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3b538974-7c37-496b-a142-38f698f00f8b
📒 Files selected for processing (7)
.github/CODEOWNERS.github/workflows/codeql.ymlCONTRIBUTING.mdREADME.mddocs/handoff.mddocs/onboarding.mdsrc/paperscout/config.py
✅ Files skipped from review due to trivial changes (2)
- .github/workflows/codeql.yml
- docs/handoff.md
There was a problem hiding this comment.
🧹 Nitpick comments (1)
.github/CODEOWNERS (1)
6-6: ⚡ Quick winAdd a second fallback owner to avoid review-routing bottlenecks.
*@wpak-ai`` is valid, but a single global owner creates a handoff risk if that account is unavailable. Add at least one additional maintainer/team to align with the “second maintainer” objective.Suggested minimal update
-* `@wpak-ai` +* `@wpak-ai` `@will`🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/CODEOWNERS at line 6, The CODEOWNERS entry currently lists a single global owner "* `@wpak-ai`" which creates a review-routing bottleneck; update that line in .github/CODEOWNERS to include at least one additional owner (either a second user handle or a team handle like `@org/team-name` or `@org/team`) as a fallback so reviews will be routed to both the primary and a secondary maintainer; ensure you follow CODEOWNERS syntax (space-separated owners) and pick an actual active account/team (e.g., add a known maintainer or on-call team) so the entry becomes "* `@wpak-ai` <second-owner>".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In @.github/CODEOWNERS:
- Line 6: The CODEOWNERS entry currently lists a single global owner "*
`@wpak-ai`" which creates a review-routing bottleneck; update that line in
.github/CODEOWNERS to include at least one additional owner (either a second
user handle or a team handle like `@org/team-name` or `@org/team`) as a fallback so
reviews will be routed to both the primary and a secondary maintainer; ensure
you follow CODEOWNERS syntax (space-separated owners) and pick an actual active
account/team (e.g., add a known maintainer or on-call team) so the entry becomes
"* `@wpak-ai` <second-owner>".
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ff0f59e7-9fd2-421f-87e1-2c198a81862c
📒 Files selected for processing (3)
.github/CODEOWNERSCONTRIBUTING.mddocs/handoff.md
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/handoff.md
- CONTRIBUTING.md
close #8
Summary by CodeRabbit
New Features
Documentation
Chores