
Conversation

@michelle0927 michelle0927 commented Aug 28, 2025

The current version of the Scrape Single URL action isn't returning results. This PR updates the component to use the "Crawl a Single URL" example from Apify's documentation.

Summary by CodeRabbit

  • New Features
    • Single-URL scraping now fetches and returns the page’s HTML directly for easier consumption.
  • Refactor
    • Simplified configuration by removing the crawler type option; a single straightforward fetch is used.
    • Output format changed from a complex response to raw HTML content.
  • Documentation
    • Updated action description to clarify HTML output and added a reference to documentation.
  • Chores
    • Bumped component version and added a new dependency to support the updated scraping approach.

coderabbitai bot commented Aug 28, 2025

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including `**/dist/**` overrides the default block on the `dist` directory by removing the pattern from both lists.

You can disable this status message by setting `reviews.review_status` to `false` in the CodeRabbit configuration file.

Walkthrough

Replaced Apify actor-based scraping with a direct HTTP fetch via got-scraping in the scrape-single-url action. Updated action metadata (description, version, props) to reflect HTML output and simplified inputs. Incremented the package version and added got-scraping as a dependency.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Scrape action refactor**<br>`components/apify/actions/scrape-single-url/scrape-single-url.mjs` | Switched from `this.apify.runActor` to `gotScraping({ url })`; the return value is now the HTML body. Updated the description (now references HTML output and the docs) and the version (0.0.4 → 0.1.0). Removed the `crawlerType` prop; simplified the `url` prop (dropped the explicit optional flag). |
| **Package management updates**<br>`components/apify/package.json` | Bumped the package version 0.2.2 → 0.3.0. Added dependency `got-scraping@^4.1.2`. Existing dependency versions are unchanged. |
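The refactor above can be sketched as follows. This is a simplified, offline approximation, not the actual component source: `gotScraping` is stubbed here so the snippet runs without network access or the got-scraping package, and `scrapeSingleUrl` is a hypothetical stand-in for the action's `run()` method.

```javascript
// Sketch of the simplified action logic after the refactor.
// In the real component, gotScraping is imported from the
// got-scraping package; it is stubbed here so the example
// runs offline (an assumption for illustration only).
const gotScraping = async ({ url }) => ({
  body: `<html><body>Fetched ${url}</body></html>`,
});

// Replaces the old this.apify.runActor call: one direct fetch,
// returning the raw HTML body instead of an actor-run payload.
async function scrapeSingleUrl(url) {
  const response = await gotScraping({ url });
  return response.body;
}

scrapeSingleUrl("https://example.com").then((html) => {
  console.log(html); // the page HTML as a string
});
```

Callers that previously unpacked an actor-run response now receive a plain HTML string, which is the breaking behavioral change behind the version bumps.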

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as Caller
  participant A as scrape-single-url Action
  participant G as got-scraping
  participant W as Target Website

  Note over A: New flow: direct fetch via got-scraping
  U->>A: invoke({ url })
  A->>G: gotScraping({ url })
  G->>W: HTTP GET url
  W-->>G: HTML response
  G-->>A: { body: "<html>..." }
  A-->>U: HTML body

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I twitched my nose at the changing breeze,
Swapped actors for scraping with elegant ease.
One hop to the page, HTML in paw,
A simpler trail, no crawler to draw.
Version bumps made, dependencies tight—
Carrot-commit: crisp, clean, light. 🥕✨


vercel bot commented Aug 28, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments

  • pipedream-docs — Ignored (Aug 28, 2025 9:09pm UTC)
  • pipedream-docs-redirect-do-not-edit — Ignored (Aug 28, 2025 9:09pm UTC)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
components/apify/package.json (1)

17-18: Dependency addition is correct; confirm runtime compatibility

got-scraping requires Node ≥16. Ensure the Pipedream runtime supports Node ≥16, or optionally add an engines constraint:

 {
   "name": "@pipedream/apify",
   "version": "0.3.0",
   "description": "Pipedream Apify Components",
   "main": "apify.app.mjs",
+  "engines": {
+    "node": ">=16"
+  },
   "keywords": [

Run to verify got-scraping’s engine requirement:

npm view got-scraping@4.1.2 engines
# returns { node: '>=16' }
components/apify/actions/scrape-single-url/scrape-single-url.mjs (1)

7-8: Clarify description to reflect implementation (no Actor/proxy by default)

Recommend making it explicit that this uses got-scraping directly and does not invoke an Apify Actor or proxy unless configured.

-  description: "Executes a scraper on a specific website and returns its content as HTML. This action is perfect for extracting content from a single page. [See the documentation](https://docs.apify.com/sdk/js/docs/examples/crawl-single-url)",
+  description: "Fetches a single URL using got-scraping and returns the page HTML. Does not invoke an Apify Actor or use Apify Proxy by default. [See the documentation](https://docs.apify.com/sdk/js/docs/examples/crawl-single-url)",
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 577bd0f and 84938b5.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (2)
  • components/apify/actions/scrape-single-url/scrape-single-url.mjs (1 hunks)
  • components/apify/package.json (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Verify TypeScript components
  • GitHub Check: Publish TypeScript components
  • GitHub Check: pnpm publish
  • GitHub Check: Lint Code Base
🔇 Additional comments (2)
components/apify/package.json (1)

3-3: Semver bump looks right given behavioral change

Returning plain HTML instead of an actor response is a breaking behavioral change at the action level; bumping the package to 0.3.0 is appropriate.

components/apify/actions/scrape-single-url/scrape-single-url.mjs (1)

2-2: Using got-scraping is appropriate for simple single-URL fetches

Import looks good and matches the dependency added in package.json.

@lcaresia lcaresia moved this from Ready for PR Review to Ready for QA in Component (Source and Action) Backlog Aug 28, 2025
@vunguyenhung vunguyenhung moved this from Ready for QA to In QA in Component (Source and Action) Backlog Aug 29, 2025
@vunguyenhung vunguyenhung moved this from In QA to Ready for Release in Component (Source and Action) Backlog Aug 29, 2025
@vunguyenhung

Hi everyone, all test cases passed! Ready for release!

Test report
https://vunguyenhung.notion.site/Apify-API-Key-update-scrape-single-url-25dbf548bb5e818891cbea4a8e2babd0

@michelle0927 michelle0927 merged commit 59b21cc into master Aug 29, 2025
10 checks passed
@michelle0927 michelle0927 deleted the issue-18167-2 branch August 29, 2025 14:30
@github-project-automation github-project-automation bot moved this from Ready for Release to Done in Component (Source and Action) Backlog Aug 29, 2025