
Conversation

@michelle0927
Collaborator

@michelle0927 michelle0927 commented Sep 4, 2025

Resolves #18112

Summary by CodeRabbit

  • New Features

    • Added an optional Prompt field to generate crawler options from natural language.
    • Introduced Sitemap mode with “skip” or “include” options (replaces prior sitemap toggle).
    • Renamed and clarified Max Discovery Depth with discovery-order behavior.
    • Replaced backward-link setting with Crawl Entire Domain for broader domain traversal.
    • Crawl requests now use the v2 endpoint by default; other endpoints remain on v1.
  • Chores

    • Version bumps across crawl, extract, status, map, scrape, search actions and package.

@coderabbitai
Contributor

coderabbitai bot commented Sep 4, 2025

Walkthrough

The Firecrawl actions are updated: the crawl-url props are revised (a prompt is added and several fields are renamed) and the action version is set to 1.1.0. The Firecrawl app now parameterizes the API version, routing crawl to v2 while other endpoints default to v1. The remaining actions receive version bumps only, and the package version is incremented to 1.3.1.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Crawl URL action props update: components/firecrawl/actions/crawl-url/crawl-url.mjs | Version 1.1.0; add optional prompt; rename maxDepth→maxDiscoveryDepth; replace ignoreSitemap→sitemap ("skip","include"); rename allowBackwardLinks→crawlEntireDomain; run logic unchanged. |
| Firecrawl app versioned requests: components/firecrawl/firecrawl.app.mjs | Base URL drops /v1; _makeRequest gains version param defaulting to "v1"; crawl endpoint explicitly uses v2; other endpoints remain at v1 unless overridden. |
| Action version bumps (no logic changes): components/firecrawl/actions/extract-data/extract-data.mjs, .../get-crawl-status/get-crawl-status.mjs, .../get-extract-status/get-extract-status.mjs, .../map-url/map-url.mjs, .../scrape-page/scrape-page.mjs, .../search/search.mjs | Version fields incremented only; no behavioral or API surface changes. |
| Package version bump: components/firecrawl/package.json | Package version updated from 1.3.0 to 1.3.1. |
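
For anyone carrying older crawl-url configurations forward, the renames listed above can be sketched as a small translation step. This is only an illustration: `migrateCrawlProps` is a hypothetical helper that is not part of this PR, and the mapping of `ignoreSitemap: true` to `"skip"` (and `false` to `"include"`) is an assumption based on the prop names. The walkthrough also notes that the depth field's description changed, so a one-to-one copy of the value may not preserve the old behavior exactly.

```javascript
// Hypothetical helper (not in the PR): translate legacy crawl-url props
// to the names introduced by this change.
function migrateCrawlProps(legacy) {
  const { maxDepth, ignoreSitemap, allowBackwardLinks, ...rest } = legacy;
  const next = { ...rest };

  // Renamed field; the value is copied as-is (semantics may differ in v2).
  if (maxDepth !== undefined) next.maxDiscoveryDepth = maxDepth;

  // Assumption: the old boolean maps onto the new two-value string.
  if (ignoreSitemap !== undefined) {
    next.sitemap = ignoreSitemap ? "skip" : "include";
  }

  // Renamed field; the value is copied as-is.
  if (allowBackwardLinks !== undefined) {
    next.crawlEntireDomain = allowBackwardLinks;
  }

  return next;
}

console.log(
  migrateCrawlProps({
    url: "https://example.com",
    maxDepth: 2,
    ignoreSitemap: true,
    allowBackwardLinks: false,
  }),
);
```

Props that are absent from the input stay absent in the output, so the sketch does not introduce defaults of its own.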

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant App as Firecrawl App
  participant APIv2 as Firecrawl API /v2
  participant APIv1 as Firecrawl API /v1

  User->>App: Run Crawl URL (with optional prompt)
  App->>APIv2: POST /v2/crawl (payload incl. prompt/props)
  APIv2-->>App: Crawl job response
  App-->>User: Return job info

  User->>App: Other actions (scrape, map, status, extract)
  App->>APIv1: Calls default /v1 endpoints
  APIv1-->>App: Responses
  App-->>User: Results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Assessment against linked issues

Objective Addressed Explanation
Add prompt field to crawl endpoint UI/action props (#18112)
Route crawl requests to Firecrawl v2 endpoint (#18112)
Keep non-crawl endpoints functional with current default versioning (#18112)
Align with Firecrawl changelog/docs for new crawl release (#18112) Docs parity beyond prompt and v2 not verifiable from diff.

Assessment against linked issues: Out-of-scope changes

| Code Change | Explanation |
| --- | --- |
| Rename maxDepth to maxDiscoveryDepth and update description (components/firecrawl/actions/crawl-url/crawl-url.mjs) | Not specified in #18112; unclear if required by API change vs. internal labeling. |
| Replace ignoreSitemap (boolean) with sitemap (string: "skip","include") (components/firecrawl/actions/crawl-url/crawl-url.mjs) | Not mentioned in #18112; may reflect product changes but not explicitly requested. |
| Rename allowBackwardLinks to crawlEntireDomain with broader description (components/firecrawl/actions/crawl-url/crawl-url.mjs) | Outside the stated objective; unclear linkage to v2 crawl requirement. |

Poem

I nudge the dials from v1 to v2,
A hop, a skip—new prompts to pursue.
Depth renamed, sitemaps in view,
I twitch my whiskers at endpoints anew.
Thump-thump! The crawl goes through—
Carrots cached, results in queue. 🥕✨

@michelle0927 michelle0927 marked this pull request as ready for review September 4, 2025 16:08
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/firecrawl/firecrawl.app.mjs (1)

51-58: Bug risk: getCrawlStatus still defaults to v1 while crawl jobs are created on v2.

v2 crawls typically expect status reads at /v2/crawl/{id}. Keeping this on v1 can produce 404s or mismatched behavior. Recommend pinning status reads to v2.

Apply:

   getCrawlStatus({
     crawlId, ...opts
   }) {
     return this._makeRequest({
       path: `/crawl/${crawlId}`,
+      version: "v2",
       ...opts,
     });
   },

Refs: v2 “Get Crawl Status” and examples returning v2 status URLs. (docs.firecrawl.dev)
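
To make the version mismatch concrete, here is a minimal sketch of the routing with the suggested pin applied. It is a stand-in, not the component's code: `firecrawlStub` is a hypothetical object, its `_makeRequest` only returns the method and URL it would call instead of sending anything, and the base URL is an assumption.

```javascript
// Hypothetical stub: mirrors the shape of the app methods discussed above,
// but _makeRequest only reports what it would request.
const firecrawlStub = {
  _baseUrl() {
    return "https://api.firecrawl.dev"; // assumed base URL without a version segment
  },
  _makeRequest({ path, version = "v1", method = "GET" }) {
    return { method, url: `${this._baseUrl()}/${version}${path}` };
  },
  crawl(opts = {}) {
    // Crawl jobs are created on v2.
    return this._makeRequest({ method: "POST", path: "/crawl", version: "v2", ...opts });
  },
  getCrawlStatus({ crawlId, ...opts }) {
    // Pinned to v2 so status reads hit the same API version as job creation.
    return this._makeRequest({ path: `/crawl/${crawlId}`, version: "v2", ...opts });
  },
};

console.log(firecrawlStub.crawl().url);
console.log(firecrawlStub.getCrawlStatus({ crawlId: "abc123" }).url);
```

Without the `version: "v2"` line in `getCrawlStatus`, the default would apply and the second URL would carry `/v1/` while the first carries `/v2/`.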

🧹 Nitpick comments (6)
components/firecrawl/actions/map-url/map-url.mjs (2)

35-40: Explicitly pin API version to v1 for forward-compat.

Map is expected to remain on v1; pinning avoids surprises if the app’s default changes later.

Apply this diff:

     const response = await firecrawl._makeRequest({
       $,
       path: "/map",
       method: "POST",
       data,
+      version: "v1",
     });

30-33: Minor grammar nit in comment.

Use “it’s” instead of “its”.

-    // Including search parameter in payload only when its not empty
+    // Include the search parameter in the payload only when it's not empty
components/firecrawl/firecrawl.app.mjs (2)

35-39: Nice: _makeRequest now supports explicit API versioning.

Consider setting a sane default timeout for external calls to avoid hanging workflows.

Apply:

-      return axios($, {
-        url: `${this._baseUrl()}/${version}${path}`,
-        headers: this._headers(),
-        ...opts,
-      });
+      return axios($, {
+        url: `${this._baseUrl()}/${version}${path}`,
+        headers: this._headers(),
+        timeout: 30_000,
+        ...opts, // allow overrides
+      });
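
The "allow overrides" comment relies on object spread order: keys spread later replace keys written earlier. A minimal sketch of just that behavior, with `buildRequestConfig` as a hypothetical stand-in for the object passed to axios:

```javascript
// Hypothetical stand-in for the config object built in _makeRequest.
function buildRequestConfig(opts = {}) {
  return {
    timeout: 30_000, // default, in milliseconds
    ...opts, // spread last, so a caller-supplied timeout replaces the default
  };
}

console.log(buildRequestConfig().timeout);
console.log(buildRequestConfig({ timeout: 5_000 }).timeout);
```

One caveat with this pattern: a caller that passes `timeout: undefined` explicitly also overwrites the default, because spread copies the key even when its value is `undefined`.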

43-50: Optional: expose v2 “params-preview” helper.

Firecrawl offers /v2/crawl/params-preview to translate a natural-language prompt into crawl options. Handy for UIs and debugging.

Apply:

   crawl(opts = {}) {
     return this._makeRequest({
       method: "POST",
       path: "/crawl",
       version: "v2",
       ...opts,
     });
   },
+  crawlParamsPreview(opts = {}) {
+    return this._makeRequest({
+      method: "POST",
+      path: "/crawl/params-preview",
+      version: "v2",
+      ...opts,
+    });
+  },

Docs: Params Preview (v2). (docs.firecrawl.dev)

components/firecrawl/actions/crawl-url/crawl-url.mjs (1)

70-75: Optional: add allowSubdomains prop for completeness.

It’s supported by v2; exposing it avoids forcing users to drop down to additionalOptions.

Apply:

     allowExternalLinks: {
       type: "boolean",
       label: "Allow External Links",
       description: "Allows the crawler to follow links to external websites",
       optional: true,
     },
+    allowSubdomains: {
+      type: "boolean",
+      label: "Allow Subdomains",
+      description: "Follow links to subdomains of the main domain",
+      optional: true,
+    },

Docs (v2 crawl options list). (docs.firecrawl.dev)

components/firecrawl/actions/extract-data/extract-data.mjs (1)

79-89: Add a max wait and avoid extra sleep after terminal state.

Prevent indefinite polling and skip the extra 3s delay once the job is no longer “processing”.

Apply:

-    if (this.waitForCompletion) {
-      const id = response.id;
-      const timer = (ms) => new Promise((res) => setTimeout(res, ms));
-      do {
-        response = await this.firecrawl.getExtractStatus({
-          $,
-          id,
-        });
-        await timer(3000);
-      } while (response.status === "processing");
-    }
+    if (this.waitForCompletion) {
+      const id = response.id;
+      const timer = (ms) => new Promise((res) => setTimeout(res, ms));
+      const maxAttempts = 100; // ~5 minutes
+      let attempts = 0;
+      do {
+        response = await this.firecrawl.getExtractStatus({ $, id });
+        if (response.status !== "processing") break;
+        await timer(3000);
+      } while (++attempts < maxAttempts);
+    }
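
The same bounded-polling idea can be sketched as a standalone function, which makes the stop conditions easier to test in isolation. This is an illustration under assumptions: `pollUntilDone` and its option names are hypothetical, and the status fetcher is injected rather than calling `getExtractStatus` directly.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: poll getStatus until the job leaves "processing"
// or maxAttempts is reached, whichever comes first.
async function pollUntilDone(getStatus, { intervalMs = 3000, maxAttempts = 100 } = {}) {
  let response;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    response = await getStatus();
    // Return immediately on a terminal state; no extra sleep.
    if (response.status !== "processing") return response;
    // Only wait if another attempt will follow.
    if (attempt < maxAttempts - 1) await sleep(intervalMs);
  }
  // Attempts exhausted: the caller receives the last "processing" response.
  return response;
}

// Usage with a fake status source that finishes on the third call.
let calls = 0;
const fakeStatus = async () => ({ status: ++calls < 3 ? "processing" : "completed" });
pollUntilDone(fakeStatus, { intervalMs: 10 }).then((res) => console.log(res.status, calls));
```

With the defaults, the loop gives up after roughly five minutes (100 attempts at 3 seconds each), matching the comment in the suggested diff. Callers may want to treat a returned `"processing"` status as a timeout.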
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between cc81de2 and df5d759.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (9)
  • components/firecrawl/actions/crawl-url/crawl-url.mjs (4 hunks)
  • components/firecrawl/actions/extract-data/extract-data.mjs (1 hunks)
  • components/firecrawl/actions/get-crawl-status/get-crawl-status.mjs (1 hunks)
  • components/firecrawl/actions/get-extract-status/get-extract-status.mjs (1 hunks)
  • components/firecrawl/actions/map-url/map-url.mjs (1 hunks)
  • components/firecrawl/actions/scrape-page/scrape-page.mjs (1 hunks)
  • components/firecrawl/actions/search/search.mjs (1 hunks)
  • components/firecrawl/firecrawl.app.mjs (2 hunks)
  • components/firecrawl/package.json (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pnpm publish
  • GitHub Check: Lint Code Base
  • GitHub Check: Publish TypeScript components
  • GitHub Check: Verify TypeScript components
🔇 Additional comments (16)
components/firecrawl/actions/search/search.mjs (1)

7-7: Version bump LGTM.

No functional changes introduced here. Safe to publish.

components/firecrawl/package.json (1)

3-3: Package version bump looks good.

Aligned with action/version updates across the component set.

components/firecrawl/actions/scrape-page/scrape-page.mjs (1)

10-10: Version bump only — OK.

No behavior changes; existing scrape flow remains intact.

components/firecrawl/actions/get-crawl-status/get-crawl-status.mjs (1)

7-7: Status endpoint correctly uses default v1. Default _makeRequest in firecrawl.app.mjs sets version = "v1" and getCrawlStatus does not override it, so the status call routes to v1 as expected.

components/firecrawl/actions/map-url/map-url.mjs (1)

7-7: Version bump only — OK.

No functional changes tied to this file’s metadata.

components/firecrawl/actions/get-extract-status/get-extract-status.mjs (1)

7-7: LGTM: version bump only.

No behavioral changes; safe metadata update.

components/firecrawl/firecrawl.app.mjs (2)

26-28: Good: base URL no longer hardcodes version.

This enables per-endpoint versioning downstream.


44-49: Correct: crawl now targets v2.

Matches Firecrawl’s v2 crawl POST docs and supports the new prompt-based crawl. (docs.firecrawl.dev)

components/firecrawl/actions/crawl-url/crawl-url.mjs (5)

8-8: LGTM: action version bumped to 1.1.0.


18-23: Good addition: prompt prop surfaces v2’s NL-driven crawl config.

This aligns with v2 crawl’s support for a top-level prompt. (docs.firecrawl.dev)


36-41: Rename to maxDiscoveryDepth matches v2.

Accords with v2 guidance that discovery depth replaces the old maxDepth usage. (docs.firecrawl.dev)


42-51: Sitemap control matches v2 (include | skip).

This mirrors v2 crawl’s improved sitemap handling. (docs.firecrawl.dev)


64-69: crawlEntireDomain exposed — nice parity with v2.

This option is documented for v2 crawls. (docs.firecrawl.dev)

components/firecrawl/actions/extract-data/extract-data.mjs (3)

9-9: Version bump looks good.

Action metadata updated to 0.0.3 without functional change. No concerns.


8-10: Confirm docs URL and API version alignment.

Given this PR moves Crawl to v2, please double-check that the extract docs link and this action still target the correct API/version (extract may still be v1). Update the link or pin version if needed.


66-77: Verify extract endpoint version routing.

The app-level versioning changed in this PR. Ensure firecrawl.extract() is explicitly routed to the intended version (likely v1) and that all params here (enableWebSearch, ignoreSitemap, includeSubdomains, showSources) are supported for that version to avoid 400s.

Collaborator

@luancazarine luancazarine left a comment

Hi @michelle0927, LGTM! Ready for QA!

@michelle0927 michelle0927 merged commit 99d800d into master Sep 6, 2025
10 checks passed
@michelle0927 michelle0927 deleted the issue-18112 branch September 6, 2025 17:47

Development

Successfully merging this pull request may close these issues.

[ACTION] Firecrawl update API for new release

3 participants