fix: Keep named request queues across runs #2015

Merged: vdusek merged 2 commits into master from fix/keep-named-request-queues on Jul 3, 2026

vdusek commented Jul 3, 2026

Collaborator

Problem

A second crawler.run() with the default purge_request_queue=True purged the request manager unconditionally, including a user-supplied named RequestQueue. Named storages are documented as persistent, and StorageClient._purge_if_needed already exempts them from implicit purging.

Changes

The implicit purge in run() now skips named queues, including a named queue wrapped in a ThrottlingRequestManager.
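
The rule described above can be sketched as a small decision function. This is a minimal model under stated assumptions, not crawlee's actual code: `FakeQueue`, `FakeThrottlingManager`, its `inner` attribute, and `should_purge` are hypothetical stand-ins for illustration. Only the rule itself (skip the implicit purge for a named queue, also when it is wrapped) comes from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FakeQueue:
    """Hypothetical stand-in for a request queue; `name` is None for unnamed queues."""

    name: str | None = None


@dataclass
class FakeThrottlingManager:
    """Hypothetical stand-in for a throttling wrapper around an inner queue."""

    inner: FakeQueue


def should_purge(manager: object, *, purge_request_queue: bool = True) -> bool:
    """Return True if an implicit purge should happen before a repeated run."""
    if not purge_request_queue:
        return False
    # Unwrap the throttling wrapper so the check applies to the effective queue.
    queue = manager.inner if isinstance(manager, FakeThrottlingManager) else manager
    # Named queues are persistent, so the implicit purge skips them.
    if isinstance(queue, FakeQueue) and queue.name is not None:
        return False
    return True
```

In this model an unnamed queue is still purged by default, while a named queue is left alone whether it is passed directly or wrapped.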

Verification

Added two regression tests that fail without the fix: a named queue survives a second run(), and the same holds when the named queue is wrapped in a ThrottlingRequestManager.
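
The shape of such a regression check can be illustrated with a toy model. Everything here is an assumption for illustration: `ToyQueue`, `ToyCrawler`, and `second_run_result` are hypothetical and do not reflect the real tests or crawlee's API. The sketch only shows the observable difference between a named and an unnamed queue on a second run; the wrapped case is omitted for brevity.

```python
import asyncio


class ToyQueue:
    """Hypothetical in-memory queue; a non-None `name` marks it as named."""

    def __init__(self, name=None):
        self.name = name
        self.handled = set()  # URLs already processed, kept across runs unless purged

    def purge(self):
        self.handled.clear()


class ToyCrawler:
    """Hypothetical crawler whose run() applies the named-queue exemption."""

    def __init__(self, queue):
        self.queue = queue
        self.runs = 0

    async def run(self, urls, purge_request_queue=True):
        # Implicit purge only on repeated runs, and never for a named queue.
        if self.runs > 0 and purge_request_queue and self.queue.name is None:
            self.queue.purge()
        self.runs += 1
        processed = [u for u in urls if u not in self.queue.handled]
        self.queue.handled.update(processed)
        return processed


async def second_run_result(queue):
    crawler = ToyCrawler(queue)
    await crawler.run(["https://example.com/a"])
    # Same URL again: it is re-processed only if the queue state was purged.
    return await crawler.run(["https://example.com/a"])
```

With a named queue the second run finds the URL already handled and processes nothing; with an unnamed queue the state is purged and the URL is processed again.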

Behavior change

This intentionally changes observable behavior, since the old behavior was the bug: named request queues survive repeated runs.

vdusek added the t-tooling (issues owned by the tooling team) and adhoc (ad-hoc unplanned task added during the sprint) labels Jul 3, 2026
vdusek self-assigned this Jul 3, 2026
codecov Bot commented Jul 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.33%. Comparing base (7780e78) to head (0cc5929).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2015      +/-   ##
==========================================
- Coverage   93.35%   93.33%   -0.02%     
==========================================
  Files         179      179              
  Lines       12482    12488       +6     
==========================================
+ Hits        11652    11656       +4     
- Misses        830      832       +2     
Flag Coverage Δ
unit 93.33% <100.00%> (-0.02%) ⬇️

Copilot AI left a comment

Contributor

Pull request overview

Fixes BasicCrawler.run() so the default implicit purge on consecutive runs does not wipe a user-supplied named RequestQueue (including when wrapped by ThrottlingRequestManager), aligning run() behavior with the “named storages are persistent” contract used elsewhere in the storage layer.

Changes:

  • Update BasicCrawler.run() purge logic to skip purging when the effective queue is a named RequestQueue (with special handling for ThrottlingRequestManager).
  • Clarify the purge_request_queue docstring to document the named-queue exemption.
  • Add two regression tests covering consecutive runs with a named queue directly and with a named queue wrapped by ThrottlingRequestManager.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/crawlee/crawlers/_basic/_basic_crawler.py | Skips implicit purge for named request queues (including when wrapped in ThrottlingRequestManager). |
| tests/unit/crawlers/_basic/test_basic_crawler.py | Adds regression tests ensuring named queues survive consecutive run() calls. |

Comment thread src/crawlee/crawlers/_basic/_basic_crawler.py Outdated
vdusek merged commit 3b9f8c6 into master Jul 3, 2026
37 checks passed
vdusek deleted the fix/keep-named-request-queues branch July 3, 2026 09:45
Labels

adhoc: Ad-hoc unplanned task added during the sprint.
t-tooling: Issues with this label are in the ownership of the tooling team.

4 participants