Releases: apify/crawlee

v0.16.0

30 Sep 09:51
  • Update @apify/http-request to version 1.1.2.
  • Update CheerioCrawler to use requestAsBrowser() so that its requests better resemble those of a real browser.

v0.15.5

19 Aug 07:46
  • This release just updates some dependencies (not Puppeteer).

v0.15.4

02 Aug 10:22
  • DEPRECATED: dataset.delete(), keyValueStore.delete() and requestQueue.delete() methods have been deprecated in favor of *.drop() methods, because the drop name more clearly communicates the fact that those methods drop / delete the storage itself, not individual elements in the storage.
  • Added Apify.utils.requestAsBrowser() helper function that enables you to make HTTP(S) requests disguised as a browser (Firefox). This may help in overcoming certain anti-scraping and anti-bot protections.
  • Added options.gotoTimeoutSecs to PuppeteerCrawler to enable easier setting of navigation timeouts.
  • PuppeteerPool options that were deprecated from the PuppeteerCrawler constructor were finally removed. Please use maxOpenPagesPerInstance, retireInstanceAfterRequestCount, instanceKillerIntervalSecs, killInstanceAfterSecs and proxyUrls via the puppeteerPoolOptions object.
  • On the Apify Platform a warning will now be printed when using an outdated apify package version.
  • Apify.utils.puppeteer.enqueueLinksByClickingElements() will now print a warning when the nodes it
    tries to click have been modified (detached from the DOM). This is useful for debugging unexpected behavior.
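
For code that still passes the removed options directly to the PuppeteerCrawler constructor, the migration amounts to nesting them under puppeteerPoolOptions. The following is a minimal sketch in plain JavaScript. migrateCrawlerOptions() is a hypothetical helper, not part of the SDK, and the proxy URL and numeric values are placeholders.

```javascript
// Option names taken from the v0.15.4 notes above.
const POOL_OPTION_NAMES = [
    'maxOpenPagesPerInstance',
    'retireInstanceAfterRequestCount',
    'instanceKillerIntervalSecs',
    'killInstanceAfterSecs',
    'proxyUrls',
];

// Hypothetical helper (not part of the SDK): returns a copy of the options
// with the removed top-level keys nested under puppeteerPoolOptions.
function migrateCrawlerOptions(options) {
    const migrated = { ...options };
    const poolOptions = { ...(options.puppeteerPoolOptions || {}) };
    for (const name of POOL_OPTION_NAMES) {
        if (name in migrated) {
            // If both forms are present, keep the explicitly nested value.
            if (!(name in poolOptions)) poolOptions[name] = migrated[name];
            delete migrated[name];
        }
    }
    if (Object.keys(poolOptions).length > 0) {
        migrated.puppeteerPoolOptions = poolOptions;
    }
    return migrated;
}

// Placeholder values for illustration only.
const legacyOptions = {
    gotoTimeoutSecs: 60,
    maxOpenPagesPerInstance: 5,
    proxyUrls: ['http://proxy.example.com:8000'],
};
const crawlerOptions = migrateCrawlerOptions(legacyOptions);
console.log(crawlerOptions);
```

The resulting object keeps gotoTimeoutSecs at the top level, since that is a PuppeteerCrawler option, and could then be merged with the remaining constructor options such as the request source and page handler.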

v0.15.3

29 Jul 12:15
  • Apify.launchPuppeteer() now accepts proxyUrl with the https, socks4
    and socks5 schemes, as long as it doesn't contain username or password.
    This is to fix Issue #420.
  • Added desiredConcurrency option to the AutoscaledPool constructor and removed
    an unnecessary bounds check from the corresponding setter.
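
The proxyUrl rule above can be checked before calling Apify.launchPuppeteer(). The following is a minimal sketch, not the SDK's own validation. isProxyUrlAccepted() is a hypothetical helper, the host names are placeholders, and the treatment of plain http proxies is an assumption that the release note does not state.

```javascript
// Schemes named in the v0.15.3 note; accepted only without credentials.
const CREDENTIAL_FREE_SCHEMES = ['https:', 'socks4:', 'socks5:'];

// Hypothetical helper (not part of the SDK): returns true if a proxyUrl
// satisfies the rule described in the note above.
// ASSUMPTION: plain http proxies are accepted with or without credentials.
function isProxyUrlAccepted(proxyUrl) {
    let parsed;
    try {
        parsed = new URL(proxyUrl);
    } catch (err) {
        return false; // not a parseable URL
    }
    if (parsed.protocol === 'http:') return true;
    if (!CREDENTIAL_FREE_SCHEMES.includes(parsed.protocol)) return false;
    // The listed schemes must not carry a username or password.
    return parsed.username === '' && parsed.password === '';
}

console.log(isProxyUrlAccepted('socks5://proxy.example.com:1080'));
console.log(isProxyUrlAccepted('socks5://user:secret@proxy.example.com:1080'));
```

The first call passes the check and the second does not, because the URL embeds a username and password.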

v0.15.2

11 Jul 11:31
  • Fix error where Puppeteer would fail to launch when pipes are turned off.
  • Switch back to default Web Socket transport for Puppeteer due to upstream issues.

v0.15.1

09 Jul 08:03
  • BREAKING CHANGE: Removed support for Web Driver (Selenium) since no further updates are planned.
    If you wish to continue using Web Driver, please stay on Apify SDK version ^0.14.15.
  • BREAKING CHANGE: Dataset.getData() now throws an error if the user provides an unsupported option
    when using local disk storage.
  • DEPRECATED: options.userData of Apify.utils.enqueueLinks() is deprecated.
    Use options.transformRequestFunction instead.
  • Improve logging of memory overload errors.
  • Improve error message in Apify.call().
  • Fix multiple log lines appearing when a crawler was about to finish.
  • Add Apify.utils.puppeteer.enqueueLinksByClickingElements() function which enables you
    to add requests to the queue from pure JavaScript navigations, form submissions etc.
  • Add Apify.utils.puppeteer.infiniteScroll() function which helps you with scrolling to the bottom
    of websites that auto-load new content.
  • The RequestQueue.handledCount() function has been resurrected from deprecation
    to keep the interface compatible with RequestList.
  • Add useExtendedUniqueKey option to Request constructor to include method and payload
    in the Request's computed uniqueKey.
  • Updated Puppeteer to 1.18.1.
  • Updated apify-client to 0.5.22.
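
To move away from the deprecated options.userData of Apify.utils.enqueueLinks(), the same data can be attached in a transformRequestFunction. The following is a minimal sketch, assuming the function receives the options of each request about to be enqueued and returns them. addLabel(), the 'DETAIL' label and the example URL are illustrative names, not taken from the release notes.

```javascript
// Illustrative transform: attaches userData to every request that
// enqueueLinks() is about to add, preserving any userData already present.
function addLabel(request) {
    request.userData = { ...request.userData, label: 'DETAIL' };
    return request;
}

// Before (deprecated): enqueueLinks({ ..., userData: { label: 'DETAIL' } })
// After:               enqueueLinks({ ..., transformRequestFunction: addLabel })

// Standalone check of the transform on a plain request-like object.
const transformed = addLabel({ url: 'https://example.com/item/1' });
console.log(transformed.userData);
```

Because the transform sees each request individually, it can also set different userData per URL, which the single static options.userData object could not do.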