Summary
When request_pause() is called, the crawl loop breaks immediately without waiting for in-flight tokio::spawn tasks to finish. The checkpoint is saved before those tasks complete, permanently losing any URLs they would have enqueued.
Location
- File: src/spiders/engine.rs
- Line(s): 147–219
Severity
Medium
Details
Tasks that were already spawned keep running after the loop breaks. Because the checkpoint is written immediately, any URLs those tasks enqueue arrive after the save and are never persisted. A crawl resumed from that checkpoint will not visit those URLs.
Suggested Fix
Wait for all active tasks to drain before saving the checkpoint:
// After loop break:
while self.active_tasks.load(Ordering::SeqCst) > 0 {
    tokio::time::sleep(Duration::from_millis(50)).await;
}
// Then save checkpoint
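One caveat with an unbounded polling loop is that a single stuck task would block the pause forever. The sketch below shows the same drain with a deadline. It is a minimal illustration under stated assumptions, not the engine's actual code: it uses std threads in place of tokio::spawn so it runs without dependencies, and `drain_active_tasks` and `spawn_workers` are hypothetical names. It also assumes each task decrements the counter only after it has finished enqueuing its URLs; otherwise a zero count would not guarantee a complete checkpoint.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Polls `active_tasks` until it reaches zero or `timeout` elapses.
/// Returns true if every task finished, false if the deadline passed first.
fn drain_active_tasks(active_tasks: &AtomicUsize, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while active_tasks.load(Ordering::SeqCst) > 0 {
        if Instant::now() >= deadline {
            return false;
        }
        // In the async engine this would be tokio::time::sleep(..).await.
        thread::sleep(Duration::from_millis(5));
    }
    true
}

/// Hypothetical stand-in for in-flight fetch tasks: each worker sleeps for
/// `work`, then decrements the counter, as a real task would on completion.
fn spawn_workers(
    active_tasks: &Arc<AtomicUsize>,
    n: usize,
    work: Duration,
) -> Vec<thread::JoinHandle<()>> {
    (0..n)
        .map(|_| {
            // Count the task before it starts so the drain cannot miss it.
            active_tasks.fetch_add(1, Ordering::SeqCst);
            let counter = Arc::clone(active_tasks);
            thread::spawn(move || {
                thread::sleep(work);
                counter.fetch_sub(1, Ordering::SeqCst);
            })
        })
        .collect()
}
```

If the drain returns false, the caller can decide whether to save a partial checkpoint and log the number of tasks still running, or to keep waiting.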
Automated finding by repo-monitor