Firecrawl - update Crawl URL to v2 #18280
Walkthrough
Firecrawl actions updated: crawl-url props revised (added prompt, renamed several fields), and its version set to 1.1.0. The Firecrawl app now parameterizes API versioning, routing crawl to v2 while other actions default to v1. Other actions received version bumps only. Package version incremented to 1.3.1.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor User
participant App as Firecrawl App
participant APIv2 as Firecrawl API /v2
participant APIv1 as Firecrawl API /v1
User->>App: Run Crawl URL (with optional prompt)
App->>APIv2: POST /v2/crawl (payload incl. prompt/props)
APIv2-->>App: Crawl job response
App-->>User: Return job info
User->>App: Other actions (scrape, map, status, extract)
App->>APIv1: Calls default /v1 endpoints
APIv1-->>App: Responses
App-->>User: Results
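The routing in the diagram can be sketched as a small URL builder. This is only an illustration, not the component's code: `buildUrl` and the `https://api.example.com` host are hypothetical placeholders. Only the `/${version}${path}` shape and the `v1` default are taken from the review comments.

```javascript
// Hypothetical helper: compose a versioned endpoint URL.
// The "v1" default mirrors the behavior described in the review.
function buildUrl(baseUrl, path, version = "v1") {
  return `${baseUrl}/${version}${path}`;
}

const BASE_URL = "https://api.example.com"; // placeholder, not the real API host

// Crawl opts in to v2; other actions fall back to the v1 default.
const crawlUrl = buildUrl(BASE_URL, "/crawl", "v2");
const mapUrl = buildUrl(BASE_URL, "/map");

console.log(crawlUrl); // https://api.example.com/v2/crawl
console.log(mapUrl);   // https://api.example.com/v1/map
```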
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/firecrawl/firecrawl.app.mjs (1)
51-58: Bug risk: getCrawlStatus still defaults to v1 while crawl jobs are created on v2.
v2 crawls typically expect status reads at /v2/crawl/{id}. Keeping this on v1 can produce 404s or mismatched behavior. Recommend pinning status reads to v2.
Apply:
  getCrawlStatus({ crawlId, ...opts }) {
    return this._makeRequest({
-     path: `/crawl/${crawlId}`,
+     path: `/crawl/${crawlId}`,
+     version: "v2",
      ...opts,
    });
  },
Refs: v2 “Get Crawl Status” and examples returning v2 status URLs. (docs.firecrawl.dev)
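To make the mismatch concrete, here is a minimal sketch with a stubbed `_makeRequest` that only records the URL it would call. The `app` object, the placeholder host, and the `abc123` job ID are assumptions for illustration; the method shapes follow the snippets quoted in this review.

```javascript
// Stub app: _makeRequest returns the request it would send instead of sending it.
const app = {
  _baseUrl: () => "https://api.example.com", // placeholder host
  _makeRequest({ path, version = "v1", ...opts }) {
    return { url: `${this._baseUrl()}/${version}${path}`, ...opts };
  },
  crawl(opts = {}) {
    return this._makeRequest({ method: "POST", path: "/crawl", version: "v2", ...opts });
  },
  // Status reads pinned to the same version that created the job.
  getCrawlStatus({ crawlId, ...opts }) {
    return this._makeRequest({ path: `/crawl/${crawlId}`, version: "v2", ...opts });
  },
};

const created = app.crawl();
const status = app.getCrawlStatus({ crawlId: "abc123" });
console.log(created.url); // https://api.example.com/v2/crawl
console.log(status.url);  // https://api.example.com/v2/crawl/abc123
```

Without the `version: "v2"` line in `getCrawlStatus`, the same stub would produce a `/v1/crawl/abc123` URL, which is the situation the comment warns about.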
🧹 Nitpick comments (6)
components/firecrawl/actions/map-url/map-url.mjs (2)
35-40: Explicitly pin API version to v1 for forward-compat.
Map is expected to remain on v1; pinning avoids surprises if the app’s default changes later.
Apply this diff:
  const response = await firecrawl._makeRequest({
    $,
    path: "/map",
    method: "POST",
-   data,
+   data,
+   version: "v1",
  });
30-33: Minor grammar nit in comment.
Use “it’s” instead of “its”.
- // Including search parameter in payload only when its not empty
+ // Include the search parameter in the payload only when it's not empty
components/firecrawl/firecrawl.app.mjs (2)
35-39: Nice: _makeRequest now supports explicit API versioning.
Consider setting a sane default timeout for external calls to avoid hanging workflows.
Apply:
- return axios($, {
-   url: `${this._baseUrl()}/${version}${path}`,
-   headers: this._headers(),
-   ...opts,
- });
+ return axios($, {
+   url: `${this._baseUrl()}/${version}${path}`,
+   headers: this._headers(),
+   timeout: 30_000,
+   ...opts, // allow overrides
+ });
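The override behavior in this suggestion depends on spread order. A minimal sketch, assuming a hypothetical `buildRequestConfig` helper and placeholder URLs (neither is part of the component):

```javascript
// Hypothetical helper: merge a default timeout with caller options.
// Because ...opts is spread last, a caller-supplied timeout wins.
function buildRequestConfig(url, opts = {}) {
  return {
    url,
    timeout: 30_000, // suggested default, in milliseconds
    ...opts,
  };
}

const withDefault = buildRequestConfig("https://api.example.com/v1/map");
const overridden = buildRequestConfig("https://api.example.com/v2/crawl", { timeout: 5_000 });

console.log(withDefault.timeout); // 30000
console.log(overridden.timeout);  // 5000
```

Placing `timeout` after `...opts` instead would make the default impossible to override, so the ordering is the design choice worth checking.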
43-50: Optional: expose v2 “params-preview” helper.
Firecrawl offers /v2/crawl/params-preview to translate a natural-language prompt into crawl options. Handy for UIs and debugging.
Apply:
  crawl(opts = {}) {
    return this._makeRequest({
      method: "POST",
      path: "/crawl",
      version: "v2",
      ...opts,
    });
  },
+ crawlParamsPreview(opts = {}) {
+   return this._makeRequest({
+     method: "POST",
+     path: "/crawl/params-preview",
+     version: "v2",
+     ...opts,
+   });
+ },
Docs: Params Preview (v2). (docs.firecrawl.dev)
components/firecrawl/actions/crawl-url/crawl-url.mjs (1)
70-75: Optional: add allowSubdomains prop for completeness.
It’s supported by v2; exposing it avoids forcing users to drop down to additionalOptions.
Apply:
  allowExternalLinks: {
    type: "boolean",
    label: "Allow External Links",
    description: "Allows the crawler to follow links to external websites",
    optional: true,
  },
+ allowSubdomains: {
+   type: "boolean",
+   label: "Allow Subdomains",
+   description: "Follow links to subdomains of the main domain",
+   optional: true,
+ },
Docs (v2 crawl options list). (docs.firecrawl.dev)
components/firecrawl/actions/extract-data/extract-data.mjs (1)
79-89: Add a max wait and avoid extra sleep after terminal state.
Prevent indefinite polling and skip the extra 3s delay once the job is no longer “processing”.
Apply:
- if (this.waitForCompletion) {
-   const id = response.id;
-   const timer = (ms) => new Promise((res) => setTimeout(res, ms));
-   do {
-     response = await this.firecrawl.getExtractStatus({
-       $,
-       id,
-     });
-     await timer(3000);
-   } while (response.status === "processing");
- }
+ if (this.waitForCompletion) {
+   const id = response.id;
+   const timer = (ms) => new Promise((res) => setTimeout(res, ms));
+   const maxAttempts = 100; // ~5 minutes
+   let attempts = 0;
+   do {
+     response = await this.firecrawl.getExtractStatus({ $, id });
+     if (response.status !== "processing") break;
+     await timer(3000);
+   } while (++attempts < maxAttempts);
+ }
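The bounded-polling idea can be tried in isolation. The sketch below is an assumption-laden illustration: `pollUntilDone` and `makeFakeStatus` are hypothetical names, and the fake status source stands in for `getExtractStatus` so the loop can run without network access.

```javascript
// Hypothetical helper: poll until the status is terminal or attempts run out.
async function pollUntilDone(getStatus, { intervalMs = 3000, maxAttempts = 100 } = {}) {
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  let response;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    response = await getStatus();
    if (response.status !== "processing") return response; // no sleep after terminal state
    await sleep(intervalMs);
  }
  return response; // still "processing" after maxAttempts
}

// Fake status source: reports "processing" for the first N calls, then "completed".
function makeFakeStatus(pendingCalls) {
  let calls = 0;
  return async () => ({ status: ++calls > pendingCalls ? "completed" : "processing", calls });
}

pollUntilDone(makeFakeStatus(2), { intervalMs: 10 })
  .then((result) => console.log(result.status, result.calls)); // completed 3
```

With the defaults of 3000 ms and 100 attempts, the loop gives up after roughly five minutes, matching the comment in the suggested diff.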
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (9)
- components/firecrawl/actions/crawl-url/crawl-url.mjs (4 hunks)
- components/firecrawl/actions/extract-data/extract-data.mjs (1 hunks)
- components/firecrawl/actions/get-crawl-status/get-crawl-status.mjs (1 hunks)
- components/firecrawl/actions/get-extract-status/get-extract-status.mjs (1 hunks)
- components/firecrawl/actions/map-url/map-url.mjs (1 hunks)
- components/firecrawl/actions/scrape-page/scrape-page.mjs (1 hunks)
- components/firecrawl/actions/search/search.mjs (1 hunks)
- components/firecrawl/firecrawl.app.mjs (2 hunks)
- components/firecrawl/package.json (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pnpm publish
- GitHub Check: Lint Code Base
- GitHub Check: Publish TypeScript components
- GitHub Check: Verify TypeScript components
🔇 Additional comments (16)
components/firecrawl/actions/search/search.mjs (1)
7-7: Version bump LGTM.
No functional changes introduced here. Safe to publish.
components/firecrawl/package.json (1)
3-3: Package version bump looks good.
Aligned with action/version updates across the component set.
components/firecrawl/actions/scrape-page/scrape-page.mjs (1)
10-10: Version bump only: OK.
No behavior changes; existing scrape flow remains intact.
components/firecrawl/actions/get-crawl-status/get-crawl-status.mjs (1)
7-7: Status endpoint correctly uses default v1.
Default _makeRequest in firecrawl.app.mjs sets version = "v1" and getCrawlStatus does not override it, so the status call routes to v1 as expected.
components/firecrawl/actions/map-url/map-url.mjs (1)
7-7: Version bump only: OK.
No functional changes tied to this file’s metadata.
components/firecrawl/actions/get-extract-status/get-extract-status.mjs (1)
7-7: LGTM: version bump only.
No behavioral changes; safe metadata update.
components/firecrawl/firecrawl.app.mjs (2)
26-28: Good: base URL no longer hardcodes version.
This enables per-endpoint versioning downstream.
44-49: Correct: crawl now targets v2.
Matches Firecrawl’s v2 crawl POST docs and supports the new prompt-based crawl. (docs.firecrawl.dev)
components/firecrawl/actions/crawl-url/crawl-url.mjs (5)
8-8: LGTM: action version bumped to 1.1.0.
18-23: Good addition: prompt prop surfaces v2’s NL-driven crawl config.
This aligns with v2 crawl’s support for a top-level prompt. (docs.firecrawl.dev)
36-41: Rename to maxDiscoveryDepth matches v2.
Accords with v2 guidance that discovery depth replaces the old maxDepth usage. (docs.firecrawl.dev)
42-51: Sitemap control matches v2 (include|skip).
This mirrors v2 crawl’s improved sitemap handling. (docs.firecrawl.dev)
64-69: crawlEntireDomain exposed: nice parity with v2.
This option is documented for v2 crawls. (docs.firecrawl.dev)
components/firecrawl/actions/extract-data/extract-data.mjs (3)
9-9: Version bump looks good.
Action metadata updated to 0.0.3 without functional change. No concerns.
8-10: Confirm docs URL and API version alignment.
Given this PR moves Crawl to v2, please double-check that the extract docs link and this action still target the correct API/version (extract may still be v1). Update the link or pin version if needed.
66-77: Verify extract endpoint version routing.
The app-level versioning changed in this PR. Ensure firecrawl.extract() is explicitly routed to the intended version (likely v1) and that all params here (enableWebSearch, ignoreSitemap, includeSubdomains, showSources) are supported for that version to avoid 400s.
Hi @michelle0927, LGTM! Ready for QA!
Resolves #18112