
[repo-monitor] Medium: Checkpoint on pause loses in-flight request results #9

@Liohtml


Summary

When request_pause() is called, the crawl loop breaks immediately without waiting for in-flight tokio::spawn tasks to finish. The checkpoint is saved before those tasks complete, permanently losing any URLs they would have enqueued.

Location

  • File: src/spiders/engine.rs
  • Line(s): 147–219

Severity

Medium

Details

Spawned tasks keep running after the loop breaks, but the checkpoint is written immediately. Any URLs those in-flight tasks enqueue afterwards never reach the checkpoint, so a crawl resumed from it will never visit them.
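The race can be reproduced outside the engine with a small std-only model. The sketch below works under these assumptions: OS threads stand in for the `tokio::spawn` tasks, a `VecDeque` stands in for the crawl frontier, and a `Barrier` holds the workers back so the immediate checkpoint deterministically runs while they are still in flight. `simulate_pause`, `frontier` and the example.com URLs are hypothetical and do not come from engine.rs.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Barrier, Mutex};
use std::thread;
use std::time::Duration;

/// Returns (URLs in an immediate checkpoint, URLs in a checkpoint taken after draining).
fn simulate_pause(workers: usize) -> (usize, usize) {
    let frontier = Arc::new(Mutex::new(VecDeque::<String>::new()));
    let active = Arc::new(AtomicUsize::new(0));
    // Test device only: keeps workers "in flight" until the immediate
    // checkpoint has been taken, so the race is reproduced deterministically.
    let gate = Arc::new(Barrier::new(workers + 1));

    for i in 0..workers {
        // Count the task before spawning it, so a drain loop cannot see 0
        // while a freshly spawned task has not started yet.
        active.fetch_add(1, Ordering::SeqCst);
        let (frontier, active, gate) = (frontier.clone(), active.clone(), gate.clone());
        thread::spawn(move || {
            gate.wait(); // simulated request still in flight
            frontier
                .lock()
                .unwrap()
                .push_back(format!("https://example.com/page/{i}"));
            active.fetch_sub(1, Ordering::SeqCst);
        });
    }

    // Current behaviour: pause breaks the loop and snapshots the frontier at once.
    let immediate = frontier.lock().unwrap().len();

    gate.wait(); // let the in-flight tasks finish
    // Suggested fix: poll until every task has decremented the counter.
    while active.load(Ordering::SeqCst) > 0 {
        thread::sleep(Duration::from_millis(5));
    }
    let drained = frontier.lock().unwrap().len();
    (immediate, drained)
}

fn main() {
    let (immediate, drained) = simulate_pause(4);
    println!("immediate checkpoint: {immediate} URLs, drained checkpoint: {drained} URLs");
}
```

Running it prints `immediate checkpoint: 0 URLs, drained checkpoint: 4 URLs`: the snapshot taken straight after the pause misses every URL the in-flight workers enqueue, while the snapshot taken after the drain loop contains all of them.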

Suggested Fix

Wait for all active tasks to drain before saving the checkpoint:

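```rust
use std::sync::atomic::Ordering;
use std::time::Duration;

// After the loop breaks on pause, wait until every spawned task has
// decremented `active_tasks` (it must be incremented before each spawn).
while self.active_tasks.load(Ordering::SeqCst) > 0 {
    tokio::time::sleep(Duration::from_millis(50)).await;
}
// Then save checkpoint
```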
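The polling loop relies on two invariants the issue leaves implicit: `active_tasks` must be incremented before `tokio::spawn` rather than inside the task (otherwise the loop can observe 0 before a just-spawned task starts), and every exit path of a task, including an early return or a panic, must decrement it. A minimal sketch of a guard-based counter that enforces both is shown here, assuming `std::thread` and `Condvar` in place of tokio tasks so it runs without dependencies. `ActiveTasks`, `TaskGuard`, `begin`, `wait_idle` and `crawl_then_checkpoint` are hypothetical names, not taken from engine.rs.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical in-flight task counter that can be waited on without polling.
#[derive(Default)]
struct ActiveTasks {
    count: Mutex<usize>,
    idle: Condvar,
}

/// RAII guard: decrements the counter when dropped, including when the task
/// returns early or unwinds from a panic.
struct TaskGuard(Arc<ActiveTasks>);

impl ActiveTasks {
    /// Register a task. Call this *before* spawning so the task is counted
    /// even if the waiter runs before the task is scheduled.
    fn begin(tasks: &Arc<ActiveTasks>) -> TaskGuard {
        *tasks.count.lock().unwrap() += 1;
        TaskGuard(Arc::clone(tasks))
    }

    /// Block until the counter reaches zero (replaces the 50 ms sleep loop).
    fn wait_idle(&self) {
        let mut count = self.count.lock().unwrap();
        while *count > 0 {
            count = self.idle.wait(count).unwrap();
        }
    }
}

impl Drop for TaskGuard {
    fn drop(&mut self) {
        // Tolerate a poisoned lock so a drop during unwinding cannot itself panic.
        let mut count = self.0.count.lock().unwrap_or_else(|e| e.into_inner());
        *count -= 1;
        if *count == 0 {
            self.0.idle.notify_all();
        }
    }
}

/// Spawn `n` stand-in crawl tasks, wait for them to drain, then report how
/// many results the checkpoint would contain.
fn crawl_then_checkpoint(n: usize) -> usize {
    let tasks = Arc::new(ActiveTasks::default());
    let results = Arc::new(Mutex::new(Vec::new()));
    for i in 0..n {
        let guard = ActiveTasks::begin(&tasks); // counted before spawn
        let results = Arc::clone(&results);
        thread::spawn(move || {
            let _guard = guard; // released when this closure returns
            results.lock().unwrap().push(i);
        });
    }
    tasks.wait_idle(); // only now is it safe to save the checkpoint
    let recorded = results.lock().unwrap().len();
    recorded
}

fn main() {
    println!("results in checkpoint: {}", crawl_then_checkpoint(3));
}
```

In the async engine itself a blocking `Condvar::wait` would stall a runtime worker thread, so the same guard pattern would more likely be paired with the sleep loop above or an async primitive such as `tokio::sync::Notify`. Another option would be to keep the task handles in a `tokio::task::JoinSet` and await `join_next()` until it returns `None`, which removes the separate counter altogether.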

Automated finding by repo-monitor

Metadata


Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
