refactor: use ContextPipeline to initialize BasicCrawler's context idiomatically #3388
Pull request overview
This pull request refactors the context initialization logic in Crawlee's crawler architecture by moving all CrawlingContext setup into the ContextPipeline. This change provides tighter control over context construction and prepares the codebase for the upcoming session pool exclusivity changes in PR #3380.
Changes:
- Introduces a new `buildContextPipeline()` method in `BasicCrawler` that handles all core context initialization (helpers, request fetching, session management, etc.)
- Moves context pipeline invocation from `runRequestHandler()` to the `runTaskFunction` level in `AutoscaledPool`
- Updates subclasses (`HttpCrawler`, `BrowserCrawler`, `FileDownload`) to call `super.buildContextPipeline()` and extend the pipeline idiomatically
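The subclass extension pattern described above might be sketched like this. All class and method bodies are simplified stand-ins, not Crawlee's actual `ContextPipeline` or crawler implementations:

```typescript
type Step<In, Out> = (input: In) => Promise<Out>;

// Simplified stand-in for a context pipeline: an async step chain that can be
// extended with compose(). Not Crawlee's actual ContextPipeline implementation.
class ContextPipeline<In, Out> {
    constructor(private readonly steps: Step<In, Out>) {}

    compose<Next>(step: Step<Out, Next>): ContextPipeline<In, Next> {
        return new ContextPipeline(async (input: In) => step(await this.steps(input)));
    }

    async call(input: In): Promise<Out> {
        return this.steps(input);
    }
}

interface BasicContext { request: { url: string }; sessionId: string }
interface HttpContext extends BasicContext { body: string }

class BasicCrawlerSketch {
    // All core context initialization lives in one overridable method.
    buildContextPipeline(): ContextPipeline<{ url: string }, BasicContext> {
        return new ContextPipeline(async ({ url }) => ({
            request: { url },
            sessionId: 'session-1', // placeholder for real session management
        }));
    }
}

class HttpCrawlerSketch extends BasicCrawlerSketch {
    // Subclasses call super.buildContextPipeline() and extend it idiomatically.
    override buildContextPipeline(): ContextPipeline<{ url: string }, HttpContext> {
        return super.buildContextPipeline().compose(async (ctx) => ({
            ...ctx,
            body: `<fetched ${ctx.request.url}>`, // placeholder for the real HTTP request
        }));
    }
}
```

Each subclass only adds its own steps, so the core initialization stays in one place.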
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/basic-crawler/src/internals/basic-crawler.ts | Adds buildContextPipeline() method for idiomatic context initialization; refactors runTaskFunction to invoke the pipeline at a higher level with improved error handling |
| packages/browser-crawler/src/internals/browser-crawler.ts | Updates to call super.buildContextPipeline() and adds override keyword for type safety |
| packages/http-crawler/src/internals/http-crawler.ts | Updates to call super.buildContextPipeline() instead of creating a new pipeline; moves ContextPipeline import to type-only import |
| packages/http-crawler/src/internals/file-download.ts | Updates to call this.buildContextPipeline() for consistency with the new architecture |
| packages/playwright-crawler/src/internals/adaptive-playwright-crawler.ts | Updates to apply result-bound helpers after pipeline execution to avoid being overwritten by base crawler helpers |
**janbuchar** left a comment:

this is more of a refactor, I'd say...
…extHelpers

The `enqueueLinks` helper was accidentally removed from the `resultBoundContextHelpers`, causing links not to be enqueued correctly through the `RequestHandlerResult` in the adaptive crawler.

…line building

Start context pipelines from `{}` instead of lying about an empty object being a `CrawlingContext`. The pipeline gradually extends the type through `compose()` calls until it reaches the final `CrawlingContext` shape.
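The typing idea in this commit message can be illustrated with a minimal sketch. The `Pipeline` class below is illustrative only, not Crawlee's implementation:

```typescript
// Sketch: the pipeline starts from an honest `{}` and each compose() step
// widens the context type, instead of casting an empty object to the final
// CrawlingContext shape up front.
class Pipeline<Ctx> {
    constructor(private readonly build: () => Ctx) {}

    static empty(): Pipeline<{}> {
        return new Pipeline(() => ({}));
    }

    compose<Extra>(step: (ctx: Ctx) => Ctx & Extra): Pipeline<Ctx & Extra> {
        return new Pipeline(() => step(this.build()));
    }

    call(): Ctx {
        return this.build();
    }
}

// The type parameter grows with each step until it reaches the final shape:
const pipeline = Pipeline.empty()
    .compose((ctx) => ({ ...ctx, request: { url: 'https://example.com' } }))
    .compose((ctx) => ({ ...ctx, session: { id: 's-1' } }));

const context = pipeline.call();
```

The compiler tracks exactly which fields exist after each step, so no step can read a field that an earlier step has not produced.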
**janbuchar** left a comment:

Just a bunch of nits, good stuff overall!
**janbuchar** left a comment:

Only three comments, two of them are fairly important.
Force-pushed from c3c8474 to 4f5d471.
```diff
  * then retries them in a case of an error, etc.
  */
-protected async _runTaskFunction() {
+protected async _runTaskFunction(crawlingContext: ExtendedContext) {
```
Calling contextPipeline.call with this from autoscaledPoolOptions in run makes the split between runTaskFunction and runRequestHandler awkward. Do we even need both?
Also, originally, adaptive crawler would override runRequestHandler with the two-pipeline mechanism. Now, this mechanism is a part of one more, "outer" context pipeline. Is that intentional? I guess it shouldn't have any unforeseeable consequences, but still, it is an unexpected pattern.
Updated in the last three commits. The basic pipeline is now separate, and all the crawler "subclass" pipelines expect its output as their input.
This is further enabled by the new .chain() API from 797003b.
This means (among other things) that the AdaptivePlaywrightCrawler http / browser approach will both run on the same "base" context, but won't run the BasicCrawler's pipeline again.
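A minimal sketch of what such a chaining operation could look like, using a hypothetical `Pipe` class (the real `.chain()` API from commit 797003b may differ):

```typescript
// Sketch: chaining joins two pipelines so the second consumes the first one's
// output, letting subclass pipelines start from the BasicCrawler context
// without re-running its steps.
class Pipe<In, Out> {
    constructor(readonly run: (input: In) => Out) {}

    chain<Next>(next: Pipe<Out, Next>): Pipe<In, Next> {
        return new Pipe((input: In) => next.run(this.run(input)));
    }
}

// Hypothetical base pipeline producing the shared "basic" context:
const basic = new Pipe((url: string) => ({ url, session: 's-1' }));

// Hypothetical crawler-specific extensions, each expecting the basic context:
const browser = new Pipe((ctx: { url: string; session: string }) => ({
    ...ctx,
    page: `page for ${ctx.url}`, // placeholder for a real browser page
}));
const http = new Pipe((ctx: { url: string; session: string }) => ({
    ...ctx,
    body: `body of ${ctx.url}`, // placeholder for a real HTTP response
}));

// Both modes branch off the same base pipeline, sharing request/session state:
const browserPipeline = basic.chain(browser);
const httpPipeline = basic.chain(http);
```

This mirrors the adaptive crawler situation described above: both branches build on one base context without repeating the `BasicCrawler` steps.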
Okay, I like the changes, buuut... BasicCrawler._runTaskFunction is called by the arrow function passed to AutoscaledPool, and delegates to BasicCrawler.runRequestHandler, right? And the chained pipeline is between the AutoscaledPool arrow function and BasicCrawler._runTaskFunction. So when AdaptivePlaywrightCrawler overrides runRequestHandler, it still runs the two pipelines "inside" another pipeline?
I'm not sure I fully understand, but I believe what you're describing is correct and in line with the recent changes.
BasicCrawler will create the basic context (turquoise), which should be the same for the entirety of the request processing (request data, session (proxy, etc.), helpers...). This context is then used as the base for both the static and browser processing (note that staticContextPipeline / browserContextPipeline now start with the BasicContext and only add the crawler-specific bits). These are the green areas. Note that both modes share the same request/session, etc.
AdaptivePlaywrightCrawler doesn't have its own pipeline extension (buildContextPipeline implementation), so its native crawling context === base crawling context - even after the .chain() call.
What am I missing? 👀
Oh I think I finally understand how the adaptive crawler did not break with your changes 😁 See, previously, it did not call the "outer" pipeline at all - https://github.com/apify/crawlee/pull/3388/changes#diff-f409afe36a2511464bd45cfdf042c4f0a2e47717f2a55f951b4457757c95ff58R309 just returned a null that would crash it if it did. Instead, it put its context pipeline logic in the runRequestHandler override.
Your changes substitute the "basic" pipeline in case the contextPipelineBuilder returns null. This is problematic from a type safety perspective (the crawler uses a pipeline that works with a smaller context type than what its type parameters require).
Any ideas what to do about that?
Can we use AdaptivePlaywrightCrawler.buildContextPipeline to return a bunch of bogus Proxy objects, throwing errors on property access/call (in case the user somehow gets to these directly)? All of them would then get overridden in the inner pipelines with the correct equivalents. wdyt?
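The suggested Proxy trick could look roughly like this. The `bogus()` helper below is hypothetical, not existing Crawlee code:

```typescript
// Sketch: a placeholder object that throws on any property access or call,
// so accidental direct use fails loudly until the inner pipelines replace it
// with a real implementation.
function bogus<T extends object>(name: string): T {
    const fail = (detail: string): never => {
        throw new Error(`${name}${detail} is a placeholder and not available in this context`);
    };
    // A function target keeps the proxy callable, so the apply trap works too.
    return new Proxy(function () {}, {
        get: (_target, prop) => fail(`.${String(prop)}`),
        apply: () => fail('()'),
    }) as unknown as T;
}

// Hypothetical stand-in for a browser page, later overridden by an inner pipeline:
const page = bogus<{ goto(url: string): Promise<void> }>('page');
```

Any read like `page.goto` would throw immediately with a message naming the missing helper, instead of failing later with an opaque `undefined` error.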
Force-pushed from 9f17fb8 to 075af99.
```diff
-const subCrawlerContext = { ...context, ...resultBoundContextHelpers };
+const subCrawlerContext = { ...context };
+
+for (const [key, descriptor] of Object.entries(Object.getOwnPropertyDescriptors(resultBoundContextHelpers))) {
```
Can you add a comment to explain this please?
Extracts all `CrawlingContext` initialization to `ContextPipeline` steps to tighten the control over the `CrawlingContext` contents.

Blocks #3380