
Implicit Single-Thread Contract Breaks on Extension #37

@henry0816191

Description


Problem

ISOProber._stats is mutated from coroutines dispatched via asyncio.gather without explicit synchronization. This is safe today because asyncio's cooperative scheduling runs every coroutine on the event loop's single thread and switches tasks only at suspension points (await), so an update to _stats that contains no await cannot interleave with another. That guarantee is implicit and undocumented, however. The project already uses asyncio.to_thread() for matches_for_users in monitor.py:233, which sets a precedent a future contributor could follow when adding a new blocking data source. If that contributor wraps the new call in asyncio.to_thread() and the code path touches _stats, the mutation runs on a worker thread and the implicit safety invariant breaks without any change to ISOProber itself.
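A minimal sketch of the safe pattern versus the hazard, assuming a hypothetical `MiniProber` class and a hypothetical blocking `fetch_blocking()` source (neither is the real ISOProber code). The blocking call runs in `asyncio.to_thread()` but only returns data; `_stats` is updated after the `await`, back on the loop thread.

```python
import asyncio
import time
from collections import Counter


def fetch_blocking(n: int) -> list[int]:
    """Hypothetical blocking data source (stands in for e.g. an HTTP scrape)."""
    time.sleep(0.01)  # simulate blocking I/O
    return list(range(n))


class MiniProber:
    """Hypothetical miniature of ISOProber; not the real class."""

    def __init__(self) -> None:
        # Invariant: mutated only by coroutines on the event loop thread.
        self._stats: Counter[str] = Counter()

    async def probe_one(self, n: int) -> int:
        # SAFE: the blocking call runs in a worker thread but only returns
        # data; _stats is updated back on the loop after the await, with no
        # await between the read and the write of each counter.
        items = await asyncio.to_thread(fetch_blocking, n)
        self._stats["probed"] += 1
        self._stats["items"] += len(items)
        return len(items)

    # UNSAFE (the hazard described in this issue): handing a callable such as
    #     lambda: self._stats.update(probed=1)
    # to asyncio.to_thread() mutates _stats on a worker thread, where
    # cooperative scheduling no longer serializes the updates.

    async def probe_all(self, sizes: list[int]) -> list[int]:
        return await asyncio.gather(*(self.probe_one(n) for n in sizes))


prober = MiniProber()
results = asyncio.run(prober.probe_all([1, 2, 3, 4]))
print(results, dict(prober._stats))  # [1, 2, 3, 4] {'probed': 4, 'items': 10}
```

The rule this illustrates for new data sources: threads produce values, and only the loop consumes them into shared state.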

Acceptance Criteria

  • Add a clear docstring or code comment on ISOProber._stats documenting the single-thread invariant: "This dict is mutated from async coroutines on the event loop. Thread-safety depends on asyncio cooperative scheduling. Do NOT access from asyncio.to_thread() or thread pool executors."
  • Add a CONTRIBUTING.md or docs/architecture.md section documenting the concurrency model: what runs on the event loop, what runs in threads, and the rules for adding new data sources
  • Consider adding a threading.Lock guard around _stats mutations as defense-in-depth (low overhead for dict updates, eliminates the implicit contract)
  • Add a test that verifies _stats integrity when probe_all() processes concurrent batches
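The docstring, lock, and integrity-test criteria above could be combined roughly as follows. This is a sketch under assumptions: `GuardedProber`, `_stats_lock`, `_bump()`, `stats_snapshot()`, and `_blocking_source()` are hypothetical names, not the existing ISOProber API. The check mixes loop-side updates (interleaved via `asyncio.sleep(0)`) with worker-thread updates and asserts that no increment is lost.

```python
import asyncio
import threading
from collections import Counter


class GuardedProber:
    """Hypothetical sketch of ISOProber with a lock-guarded _stats.

    _stats is normally mutated from coroutines on the event loop. The
    lock keeps it consistent even if a future data source updates it from
    asyncio.to_thread() or a thread-pool executor.
    """

    def __init__(self) -> None:
        self._stats: Counter[str] = Counter()
        self._stats_lock = threading.Lock()  # hypothetical attribute name

    def _bump(self, key: str, n: int = 1) -> None:
        # Single choke point for mutations, so the lock cannot be bypassed.
        with self._stats_lock:
            self._stats[key] += n

    def stats_snapshot(self) -> dict[str, int]:
        # Copy under the lock so readers never observe a partial update.
        with self._stats_lock:
            return dict(self._stats)

    def _blocking_source(self, batch: list[int]) -> None:
        # Hypothetical blocking data source that updates stats from a
        # worker thread; safe only because _bump() takes the lock.
        for _ in batch:
            self._bump("thread_items")

    async def _probe_batch(self, batch: list[int]) -> None:
        for _ in batch:
            self._bump("loop_items")
            await asyncio.sleep(0)  # yield so concurrent batches interleave
        await asyncio.to_thread(self._blocking_source, batch)
        self._bump("batches")

    async def probe_all(self, batches: list[list[int]]) -> None:
        await asyncio.gather(*(self._probe_batch(b) for b in batches))


def check_stats_integrity() -> dict[str, int]:
    """Integrity check in the spirit of the requested probe_all() test."""
    prober = GuardedProber()
    batches = [list(range(500)) for _ in range(20)]
    asyncio.run(prober.probe_all(batches))
    stats = prober.stats_snapshot()
    assert stats == {"loop_items": 10_000, "thread_items": 10_000, "batches": 20}
    return stats


check_stats_integrity()
```

A plain `threading.Lock` (rather than `asyncio.Lock`) fits here because the guarded section contains no `await` and may be entered from worker threads; `asyncio.Lock` only coordinates coroutines on one loop and is not thread-safe.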

Bugfix bundle — WG21Index / self.papers contract (paperscout_bugfix_bundle_27f91caa.plan.md §5)

  • Add a one-line comment on WG21Index's self.papers attribute (or immediately below the class docstring in sources.py): "self.papers is replaced wholesale on every refresh(); never mutate in place." This contract is why reading len(index.papers) from Bolt / health threads is safe today; no separate paper_count() API is required unless the implementation later mutates self.papers in place.
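A minimal sketch of why wholesale replacement keeps cross-thread `len(index.papers)` reads safe, assuming a hypothetical `MiniIndex` class (not the real WG21Index) and placeholder paper IDs. `refresh()` builds the new dict privately and publishes it with one attribute assignment, so a reader thread only ever sees a fully built dict.

```python
import threading


class MiniIndex:
    """Hypothetical miniature of WG21Index; not the real class."""

    def __init__(self) -> None:
        # self.papers is replaced wholesale on every refresh(); never mutate
        # in place.
        self.papers: dict[str, str] = {}

    def refresh(self, fetched: list[tuple[str, str]]) -> None:
        # Build the new mapping privately, then publish it with one attribute
        # assignment: readers see either the old dict or the new one.
        new_papers = {pid: title for pid, title in fetched}
        self.papers = new_papers

    def refresh_in_place_bad(self, fetched: list[tuple[str, str]]) -> None:
        # Violates the contract: a reader thread can observe a half-filled
        # dict, and iterating self.papers elsewhere may raise RuntimeError.
        self.papers.clear()
        self.papers.update(fetched)


index = MiniIndex()
seen: set[int] = set()
stop = threading.Event()


def reader() -> None:
    # Stands in for a Bolt / health thread polling len(index.papers).
    while not stop.is_set():
        seen.add(len(index.papers))


t = threading.Thread(target=reader)
t.start()
for size in (100, 200, 300):
    index.refresh([(f"paper-{i}", f"title {i}") for i in range(size)])
stop.set()
t.join()
# Every observed length belongs to a fully built dict.
assert seen <= {0, 100, 200, 300}
```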

Implementation Notes

The lightweight fix is documentation + a defensive lock. The _stats dict is small and updated infrequently (once per probe batch), so lock contention is negligible. The WG21Index docstring at sources.py:33-41 already warns against cross-thread access — extend this pattern to ISOProber. The open-std.org scraper at sources.py:607-649 is the most likely extension point; ensure its integration pattern is documented.
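As a possible complement to documenting the invariant and adding the lock, the warning could also be enforced at runtime so that a new data source wired through `asyncio.to_thread()` fails fast in tests. A sketch, assuming a hypothetical `LoopOnlyStats` helper: it relies on the fact that `asyncio.get_running_loop()` raises `RuntimeError` in a thread with no running event loop, which includes `asyncio.to_thread()` workers.

```python
import asyncio


class LoopOnlyStats:
    """Hypothetical fail-fast guard for ISOProber-style _stats updates."""

    def __init__(self) -> None:
        self._stats: dict[str, int] = {}

    def bump(self, key: str) -> None:
        # asyncio.get_running_loop() raises RuntimeError when no event loop is
        # running in the current OS thread, which is the case inside a
        # function executed by asyncio.to_thread().
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            raise RuntimeError(
                "_stats must only be mutated on the event loop thread"
            ) from None
        self._stats[key] = self._stats.get(key, 0) + 1


async def main() -> tuple[dict[str, int], str]:
    stats = LoopOnlyStats()
    stats.bump("probed")  # allowed: called from a coroutine on the loop
    try:
        await asyncio.to_thread(stats.bump, "probed")  # runs on a worker thread
    except RuntimeError as exc:
        return dict(stats._stats), str(exc)
    return dict(stats._stats), "no error"


print(asyncio.run(main()))
# ({'probed': 1}, '_stats must only be mutated on the event loop thread')
```

This check cannot tell the main loop apart from a different thread that runs its own event loop; comparing `threading.get_ident()` against the loop thread's ID would be stricter if that case matters.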

References

  • Eval finding: Compound-6 (Implicit Concurrency / Extension), T13 residual + T23
  • Related files: src/paperscout/sources.py (ISOProber._stats, WG21Index), src/paperscout/monitor.py (asyncio.to_thread usage)
