Releases: apify/crawlee

v0.21.9

03 Nov 17:55
  • Fix various issues in stealth.
  • Fix SessionPool not retiring sessions immediately when they become unusable. This fixes a problem where PuppeteerPool would not retire browsers with bad sessions.

v0.21.8

08 Oct 09:14
  • Make PuppeteerCrawler safe against malformed Puppeteer responses.
  • Update default user agent to Chrome 86.
  • Bump Puppeteer to 5.3.1 with Chromium 86.

v0.21.7

03 Oct 20:56
  • Fix an error in PuppeteerCrawler caused by page.goto() randomly returning null.
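
A defensive check along the following lines avoids dereferencing a missing response in your own page handlers. This is only a sketch: getStatusCode is a hypothetical helper, not part of the SDK, and it assumes the response object exposes a status() method as Puppeteer responses do.

```javascript
// Hypothetical helper (not an SDK function): read the HTTP status code from
// the value that page.goto() resolved to, without throwing when it is null.
function getStatusCode(response) {
    // No response object, or an object without status(): report "unknown".
    if (!response || typeof response.status !== 'function') return null;
    return response.status();
}

// Intended use inside a page handler (not executed here):
//   const response = await page.goto(request.url);
//   const statusCode = getStatusCode(response);
//   if (statusCode === null) { /* decide whether to retry or continue */ }
```

Returning null instead of throwing leaves the decision to the caller, who can retry the request or carry on with the loaded page.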

v0.21.6

02 Oct 15:55

It appears that CheerioCrawler was correctly retiring sessions on timeouts
and blocked status codes (401, 403, 429), but PuppeteerCrawler was not.
Apologies for the omission; this release fixes the problem.

  • Fix sessions not being retired on blocked status codes in PuppeteerCrawler.
  • Fix sessions not being marked bad on navigation timeouts in PuppeteerCrawler.
  • Update apify-shared to version 0.5.0.
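
The behavior described above can be sketched roughly as follows. This is not the SDK's implementation: createFakeSession and updateSessionAfterNavigation are hypothetical names, and the fake session only mimics the idea of retiring a session or marking it bad.

```javascript
// Status codes the release note names as signs of blocking.
const BLOCKED_STATUS_CODES = [401, 403, 429];

// Hypothetical stand-in for an SDK session; it only records what happened to it.
function createFakeSession() {
    return {
        retired: false,
        errorScore: 0,
        retire() { this.retired = true; },
        markBad() { this.errorScore += 1; },
    };
}

// Decide what to do with a session after a navigation attempt.
function updateSessionAfterNavigation(session, { statusCode, timedOut }) {
    if (timedOut) {
        // A navigation timeout counts against the session but does not retire it outright.
        session.markBad();
        return 'marked-bad';
    }
    if (BLOCKED_STATUS_CODES.includes(statusCode)) {
        // A blocked status code means the session should not be reused.
        session.retire();
        return 'retired';
    }
    return 'kept';
}
```

For example, calling updateSessionAfterNavigation(session, { statusCode: 403, timedOut: false }) retires the session, while a timeout only raises its error score.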

v0.21.5

30 Sep 12:23

This is a very minor release that fixes some issues that were preventing
use of the SDK with Node 14.

  • Update the request serialization process which is used in RequestList
    to work with Node 10+ and not only 10 and 12.
  • Update some TypeScript types that were preventing build due to changes
    in typed dependencies.

v0.21.4

02 Sep 20:06

The request statistics that you may remember from logs are now persisted in the key-value store,
so you won't lose count when your actor restarts. We've also added more stats
that you can inspect after a run finishes. Besides that,
we fixed some bugs and annoyances and improved the TypeScript experience a bit.

  • Add persistence to Statistics class and automatically persist it in BasicCrawler.
  • Fix issue where inaccessible Apify Proxy would cause ProxyConfiguration to throw
    a timeout error.
  • Update default user agent to Chrome 85.
  • Bump Puppeteer to 5.2.1, which uses Chromium 85.
  • TypeScript: Fix RequestAsBrowserOptions missing some values and add RequestQueueInfo
    as a return value from requestQueue.getInfo().

v0.21.3

27 Jul 18:09
  • Fix useless logging in Session.

v0.21.2

27 Jul 17:24
  • Fix cookies with leading dot in domain (as extracted from Puppeteer) not being correctly added to Sessions.
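
To illustrate the kind of mismatch a leading dot can cause, here is a simplified sketch of domain matching that ignores the dot. It is not the SDK's code: normalizeCookieDomain and cookieMatchesHost are hypothetical helpers, and the matching deliberately skips details such as host-only cookies and public suffixes.

```javascript
// Hypothetical helper: drop a single leading dot from a cookie domain,
// so ".example.com" and "example.com" are treated the same way.
function normalizeCookieDomain(domain) {
    return domain.startsWith('.') ? domain.slice(1) : domain;
}

// Simplified check of whether a cookie applies to a hostname:
// either the exact domain or any subdomain of it.
function cookieMatchesHost(cookie, hostname) {
    const domain = normalizeCookieDomain(cookie.domain).toLowerCase();
    const host = hostname.toLowerCase();
    return host === domain || host.endsWith(`.${domain}`);
}
```

With this normalization, a cookie whose domain is ".example.com" matches both "example.com" and "www.example.com", but not "notexample.com".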

v0.21.1

21 Jul 12:59
0017b47

We fixed some bugs, improved a few things and bumped Puppeteer to match latest Chrome 84.

  • Allow Apify.createProxyConfiguration to be used seamlessly with the proxy component
    of Actor Input UI.
  • Fix integration of plugins into CheerioCrawler with the crawler.use() function.
  • Fix a race condition which caused RequestQueueLocal to fail handling requests.
  • Fix broken debug logging in SessionPool.
  • Improve ProxyConfiguration error message for missing password / token.
  • Update Puppeteer to 5.2.0.
  • Improve docs, update packages and so on.

v0.21.0

06 Jun 14:30

This release comes with breaking changes that will affect most, if not all, of your projects. See the migration guide for more information and examples.

The first large change is a redesigned proxy configuration. Cheerio and Puppeteer crawlers now accept a proxyConfiguration parameter, which is an instance of ProxyConfiguration. This class now exclusively manages both Apify Proxy and custom proxies. See the new proxy management guide for details.

We also removed Apify.utils.getRandomUserAgent() as it was no longer effective in avoiding bot detection and changed the default values for empty properties in Request instances.

  • BREAKING: Removed Apify.getApifyProxyUrl(). To get an Apify Proxy URL, use proxyConfiguration.newUrl([sessionId]).
  • BREAKING: Removed useApifyProxy, apifyProxyGroups and apifyProxySession parameters from all applications in the SDK. Use proxyConfiguration in crawlers and proxyUrl in requestAsBrowser and Apify.launchPuppeteer.
  • BREAKING: Removed Apify.utils.getRandomUserAgent() as it was no longer effective in avoiding bot detection.
  • BREAKING: Request instances no longer initialize empty properties with null, which means that:
    • empty errorMessages are now represented by [], and
    • empty loadedUrl, payload and handledAt are undefined.
  • Add Apify.createProxyConfiguration() async function to create ProxyConfiguration instances. ProxyConfiguration itself is not exposed.
  • Add proxyConfiguration to CheerioCrawlerOptions and PuppeteerCrawlerOptions.
  • Add proxyInfo to CheerioHandlePageInputs and PuppeteerHandlePageInputs. You can use this object to retrieve information about the currently used proxy in Puppeteer and Cheerio crawlers.
  • Add click buttons and scroll up options to Apify.utils.puppeteer.infiniteScroll().
  • Fix a bug where intercepted requests would never continue.
  • Fix a bug where Apify.utils.requestAsBrowser() would get into redirect loops.
  • Fix Apify.utils.getMemoryInfo() crashing the process on AWS Lambda and on systems running in Docker without memory cgroups enabled.
  • Update Puppeteer to 3.3.0.
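
As a rough aid for the proxy-related breaking changes above, the sketch below converts the removed options into a shape that fits the new API. convertLegacyProxyOptions is a hypothetical helper, not part of the SDK, and the groups option name passed to Apify.createProxyConfiguration() is an assumption to verify against the migration guide.

```javascript
// Hypothetical migration helper (not an SDK function): translate the removed
// useApifyProxy / apifyProxyGroups / apifyProxySession options into
// options for Apify.createProxyConfiguration() plus a session ID for newUrl().
function convertLegacyProxyOptions(legacy = {}) {
    const { useApifyProxy, apifyProxyGroups, apifyProxySession } = legacy;

    // Without useApifyProxy the old code did not use Apify Proxy at all.
    if (!useApifyProxy) return null;

    const options = {};
    if (Array.isArray(apifyProxyGroups) && apifyProxyGroups.length > 0) {
        // Assumed option name: "groups".
        options.groups = [...apifyProxyGroups];
    }
    return { options, sessionId: apifyProxySession };
}

// Intended use with the 0.21 API (not executed here; requires the apify package):
//   const converted = convertLegacyProxyOptions(legacyOptions);
//   if (converted) {
//       const proxyConfiguration = await Apify.createProxyConfiguration(converted.options);
//       const proxyUrl = proxyConfiguration.newUrl(converted.sessionId);
//   }
```

The resulting proxyConfiguration can be passed to crawlers, while the URL from newUrl() suits requestAsBrowser and Apify.launchPuppeteer through their proxyUrl option.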