Probe Volume Envelope is Undocumented Institutional Knowledge #39

@henry0816191

Problem

The probe cycle's performance envelope — what constitutes a normal cycle (request count, timing, error rate), what thresholds indicate degradation, and what operational responses are appropriate — exists only as the sole contributor's institutional knowledge. With a bus factor of one, this knowledge has no durable representation. A future operator inheriting the service would have no quantitative reference for the system's most operationally sensitive component: the HEAD request volume against isocpp.org.

Acceptance Criteria

  • Create docs/probe-operations.md documenting: normal request count per cycle (1,600-2,000), expected cycle duration, hot/cold split ratio, configurable parameters and their effects, and degradation indicators
  • Document the HTTP_CONCURRENCY, POLL_INTERVAL_SECONDS, and POLL_OVERRUN_COOLDOWN_SECONDS settings with their operational implications and recommended ranges
  • Include a "What to do if..." troubleshooting section: cycle takes >X minutes, error rate exceeds Y%, isocpp.org returns 429s
  • Add structured logging (or Prometheus metrics if applicable) that emits per-cycle summary stats: request count, success/error counts, wall-clock duration
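One way to make the degradation indicators concrete is a small check that runs over each cycle's summary. This is only a sketch: the 1,600-2,000 request range comes from the criteria above, but `CycleSummary`, `degradation_indicators`, the finding names, and the two threshold parameters (standing in for the still-undecided X and Y) are hypothetical and not part of paperscout today.

```python
from dataclasses import dataclass

# Normal request count per cycle, taken from the acceptance criteria (1,600-2,000).
NORMAL_REQUESTS = range(1600, 2001)


@dataclass(frozen=True)
class CycleSummary:
    """Hypothetical per-cycle record; field names are illustrative."""
    requests: int
    errors: int
    duration_s: float
    rate_limited: int = 0  # number of HTTP 429 responses seen in the cycle


def degradation_indicators(summary, max_duration_s, max_error_rate):
    """Return a list of finding names; an empty list means the cycle looks normal.

    max_duration_s and max_error_rate are supplied by the caller because the
    issue has not fixed those thresholds yet.
    """
    findings = []
    if summary.requests not in NORMAL_REQUESTS:
        findings.append("request_count_out_of_range")
    if summary.duration_s > max_duration_s:
        findings.append("cycle_too_slow")
    # Guard against division by zero when a cycle issued no requests.
    error_rate = summary.errors / summary.requests if summary.requests else 0.0
    if error_rate > max_error_rate:
        findings.append("error_rate_high")
    if summary.rate_limited > 0:
        findings.append("rate_limited_429")
    return findings
```

Each finding name could map directly to one entry in the "What to do if..." section of docs/probe-operations.md, so the runbook and the check stay in sync.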

Bugfix bundle — Last-Modified handling in probes (paperscout_bugfix_bundle_27f91caa.plan.md §2)

  • In ISOProber._probe_one (sources.py): after parsedate_to_datetime, if last_modified.tzinfo is None, normalize to UTC (replace(tzinfo=timezone.utc)) before comparing to datetime.now(timezone.utc) (document in a short comment).
  • On parse failure or post-parse errors in that block: set last_modified = None, is_recent = True, bucket as hit_no_lm (same as absent header; comment that bad LM is intentionally merged with no-LM to avoid silent drops).
  • Update ProbeHit docstring in models.py so “recent” explicitly includes missing or unusable Last-Modified.
  • Invert tests/test_sources.py test_probe_one_bad_last_modified_header to expect is_recent is True and last_modified is None; add a test for naive-but-parseable LM within the alert window if practical.
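The Last-Modified rules above can be sketched as a standalone helper. This is not the actual body of `ISOProber._probe_one`: the function name `classify_last_modified`, the `alert_window` parameter, and the `hit_recent` / `hit_old` bucket names are assumptions for illustration. Only `hit_no_lm` and the normalize-to-UTC behaviour come from the items above.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def classify_last_modified(header, alert_window, now=None):
    """Return (last_modified, is_recent, bucket) for a Last-Modified header value."""
    now = now or datetime.now(timezone.utc)
    if not header:
        # Absent header: treated as recent so the hit is not silently dropped.
        return None, True, "hit_no_lm"
    try:
        last_modified = parsedate_to_datetime(header)
        if last_modified.tzinfo is None:
            # Naive result (e.g. a "-0000" offset): assume UTC so it can be
            # compared with the timezone-aware `now`.
            last_modified = last_modified.replace(tzinfo=timezone.utc)
        is_recent = (now - last_modified) <= alert_window
    except (TypeError, ValueError, OverflowError):
        # A bad header is intentionally merged with the no-header case.
        return None, True, "hit_no_lm"
    # "hit_recent" and "hit_old" are hypothetical bucket names.
    return last_modified, is_recent, "hit_recent" if is_recent else "hit_old"
```

Catching both `TypeError` and `ValueError` covers the different exceptions that `parsedate_to_datetime` raises for unparseable input across Python versions.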

Implementation Notes

This item complements item 2 (benchmark) but focuses on operational documentation rather than automated testing. The structured logging addition is the highest-value code change: emit a single log line per cycle with {"cycle_requests": N, "cycle_duration_s": X, "errors": M, "hot_probes": H, "cold_probes": C}. This gives operators (and future Grafana/Datadog integrations) a machine-readable performance record without requiring a full observability stack.
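A minimal sketch of that per-cycle log line, using only the standard library. The JSON keys are the ones proposed above. The logger name `paperscout.probe` and the helpers `format_cycle_summary` and `log_cycle_summary` are assumptions, and where the counters get collected inside monitor.py is left open.

```python
import json
import logging

# Hypothetical logger name; use whatever logger the monitor module already has.
logger = logging.getLogger("paperscout.probe")


def format_cycle_summary(requests, duration_s, errors, hot_probes, cold_probes):
    """Serialize one probe cycle's stats as a single-line JSON string."""
    payload = {
        "cycle_requests": requests,
        "cycle_duration_s": round(duration_s, 3),
        "errors": errors,
        "hot_probes": hot_probes,
        "cold_probes": cold_probes,
    }
    # sort_keys keeps the line stable, which helps with grep and diffing.
    return json.dumps(payload, sort_keys=True)


def log_cycle_summary(requests, duration_s, errors, hot_probes, cold_probes):
    """Emit exactly one INFO line per cycle."""
    logger.info(
        format_cycle_summary(requests, duration_s, errors, hot_probes, cold_probes)
    )
```

Keeping the formatting separate from the logging call makes the payload easy to unit test and easy to reuse if Prometheus metrics are added later.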

References

  • Eval finding: Compound-5 (Unobservable Probe Volume), T33 + T38
  • Related files: src/paperscout/sources.py (ISOProber), src/paperscout/config.py, src/paperscout/monitor.py
