Apify (API Key) - update scrape-single-url #18210
Walkthrough
Replaced Apify actor-based scraping with a direct HTTP fetch via got-scraping in the scrape-single-url action. Updated action metadata (description, version, props) to reflect HTML output and simplified inputs. Incremented the package version and added got-scraping as a dependency.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant U as Caller
participant A as scrape-single-url Action
participant G as got-scraping
participant W as Target Website
Note over A: New flow: direct fetch via got-scraping
U->>A: invoke({ url })
A->>G: gotScraping({ url })
G->>W: HTTP GET url
W-->>G: HTML response
G-->>A: { body: "<html>..." }
A-->>U: HTML body
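The flow in the diagram can be sketched roughly as follows. This is a minimal illustration rather than the code from the PR: the function name `scrapeSingleUrl` and the optional `fetchPage` parameter are assumptions added here so the logic can be exercised with a stub instead of a live request. Only the `gotScraping({ url })` call and the `body` field come from the diagram.

```javascript
// Sketch of the new flow: fetch one URL and return the HTML body.
// `scrapeSingleUrl` and `fetchPage` are hypothetical names (not from the PR);
// `fetchPage` is an injection point so a stub can replace the real request.
async function scrapeSingleUrl({ url }, fetchPage) {
  // Validate early: `new URL` throws a TypeError on a malformed URL,
  // so no request is made for bad input.
  new URL(url);

  // Default to got-scraping, loaded lazily so callers that pass a stub
  // do not need the dependency installed.
  const fetcher = fetchPage ?? (await import("got-scraping")).gotScraping;

  // got-scraping resolves to a response object; the HTML is in `body`.
  const response = await fetcher({ url });
  return response.body;
}
```

Passing a stub such as `async ({ url }) => ({ body: "<html>...</html>" })` as the second argument exercises the same path without touching the network; omitting it falls back to got-scraping.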
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 1
🧹 Nitpick comments (2)
components/apify/package.json (1)
17-18: Dependency addition is correct; confirm runtime compatibility
got-scraping requires Node ≥16. Ensure the Pipedream runtime supports Node ≥16, or optionally add an engines constraint:
 {
   "name": "@pipedream/apify",
   "version": "0.3.0",
   "description": "Pipedream Apify Components",
   "main": "apify.app.mjs",
+  "engines": {
+    "node": ">=16"
+  },
   "keywords": [
Run to verify got-scraping’s engine requirement:
npm view got-scraping@4.1.2 engines # returns { node: '>=16' }
components/apify/actions/scrape-single-url/scrape-single-url.mjs (1)
7-8: Clarify description to reflect implementation (no Actor/proxy by default)
Recommend making it explicit that this uses got-scraping directly and does not invoke an Apify Actor or proxy unless configured.
- description: "Executes a scraper on a specific website and returns its content as HTML. This action is perfect for extracting content from a single page. [See the documentation](https://docs.apify.com/sdk/js/docs/examples/crawl-single-url)",
+ description: "Fetches a single URL using got-scraping and returns the page HTML. Does not invoke an Apify Actor or use Apify Proxy by default. [See the documentation](https://docs.apify.com/sdk/js/docs/examples/crawl-single-url)",
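As a complement to the engines constraint suggested in the package.json nitpick, a runtime guard is one possible way to surface an unsupported Node version early. The sketch below is an assumption for illustration, not part of the PR; `meetsMinNodeMajor` is a hypothetical helper, and the minimum of 16 is taken from the engine requirement quoted above.

```javascript
// Hypothetical helper (not from the PR): returns true when a Node.js version
// string such as "18.17.0" or "v18.17.0" has a major version >= minMajor.
function meetsMinNodeMajor(version, minMajor) {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

// Fail fast if the current runtime is older than the Node 16 minimum
// that the review quotes for got-scraping.
if (!meetsMinNodeMajor(process.versions.node, 16)) {
  throw new Error(
    `got-scraping needs Node >= 16, found ${process.versions.node}`,
  );
}
```

An engines field is checked at install time by the package manager, whereas a guard like this runs when the module loads; which one fits depends on how the runtime installs components.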
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (2)
components/apify/actions/scrape-single-url/scrape-single-url.mjs (1 hunks)
components/apify/package.json (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Verify TypeScript components
- GitHub Check: Publish TypeScript components
- GitHub Check: pnpm publish
- GitHub Check: Lint Code Base
🔇 Additional comments (2)
components/apify/package.json (1)
3-3: Semver bump looks right given behavioral change
Returning plain HTML instead of an actor response is a breaking behavioral change at the action level; bumping the package to 0.3.0 is appropriate.
components/apify/actions/scrape-single-url/scrape-single-url.mjs (1)
2-2: Using got-scraping is appropriate for simple single-URL fetches
Import looks good and matches the dependency added in package.json.
Hi everyone, all test cases passed! Ready for release! Test report
The current version of the Scrape Single URL action isn't returning results. This PR updates the component to use the "Crawl a Single URL" example from Apify's documentation.
Summary by CodeRabbit