node-webcrawler ChangeLog


  • #278 Added filestream require to download section (@swosko)
  • Use nock to mock testing instead of httpbin
  • Replace jshint by eslint
  • Fix code to pass eslint rules


  • Tolerate incorrect Content-Type header #270, #193
  • Added examples #272, #267
  • Fixed incompatibility between the "skipDuplicates" and "retries" options #261
  • Fix typo in README #268


  • Upgraded request.js and lodash


  • Recognize all XML MIME types to inject jQuery #245
  • Allow options to specify the Agent for Request #246
  • Added logo


  • added a way to replace keys in the global options.headers with the options.headers passed when queuing #241
  • fix bug where the last jar object was reused when the current options do not contain a jar option #240
  • fix encoding bug #233
  • added seenreq options #208
  • added preRequest, setLimiterProperty, direct request functions


  • fix missing debugging messages #213
  • fix bug where 'drain' was never called #210


  • fix charset detection bug #203
  • keep node version up to date in travis scripts


  • fix bug where skipDuplicate and rotateUA did not work even when set to true


  • upgrade jsdom to 9.6.x
  • remove 0.10 and 0.12 support #170
  • control dependency versions using ^ and ~ #169
  • remove node-pool
  • notify bottleneck until a task is completed
  • replace bottleneck by bottleneckp, which has priority
  • change default log function
  • use event listener on request and drain instead of global function #144
  • set forceUTF8 to true by default
  • detect ESOCKETTIMEDOUT instead of ETIMEDOUT on timeout in tests
  • add done function in callback to avoid async trap
  • do not convert response body to string if encoding is null #118
  • add result document #68 #116
  • add schedule event, which is emitted when a task is added to the scheduler
  • in callback, move $ into res because of weird API
  • change rateLimits to rateLimit
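
A minimal sketch of a callback written against the entries above. It assumes the callback receives (error, res, done), that $ is exposed as res.$, and that the option is spelled rateLimit. The names onPage and options are hypothetical, and the value 1000 is only illustrative.

```javascript
// Hypothetical handler; assumes the (error, res, done) callback shape
// described in the entries above.
function onPage(error, res, done) {
  if (error) {
    console.error(error);
    return done(); // always call done so the task is released
  }
  const $ = res.$; // $ is read from res rather than passed separately
  const title = $('title').text();
  console.log(title);
  done();
  return title;
}

// Hypothetical options object; rateLimit is the renamed rateLimits key.
const options = {
  rateLimit: 1000, // illustrative value
  callback: onPage
};
```

Calling done on both the error path and the success path is the point of the sketch: a handler that returns early without calling it would leave the task unfinished.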


  • delete entity in options before copy, and assign it after; jar is a typical property that is an Entity with functions #177
  • upgrade request to version 2.74.0


  • change debug option to instance level instead of options
  • update to detail error handling
  • call onDrain with scope of this
  • upgrade seenreq version to 0.1.7


  • cancel recursion in queue
  • upgrade request version to v2.67.0


  • use bottleneckConcurrent instead of maxConnections, default 10000
  • add debug info


  • fix a serious, deep-seated bug in Pool initialization that could lead to sequential execution. #2
  • print log of Pool status


  • you can also get result.options in the callback even when errors occurred #127 #86
  • add test for bottleneck


  • add bottleneck to implement rate limiting; one can set a limit for each connection at the same time.


  • you can manually terminate all the resources in your pool when onDrain is called, before their timeouts have been reached
  • add a read-only property queueSize to crawler #148 #76 #107


  • remove cache feature, it's useless
  • add localAddress, time, tunnel, proxyHeaderWhiteList, proxyHeaderExclusiveList properties to pass to request #155


  • parse charset from content-type in http headers or meta tag in html, then convert
  • big5 charset is available, as iconv-lite already supports it
  • enable gzip in request header by default
  • remove unzip code in crawler since request will do this
  • body is returned as a Buffer if encoding is null, which is an option in request
  • remove cache and skip duplicate request for GET, POST(only for type urlencode), HEAD
  • add log feature: set logger:winston to use winston; otherwise crawler will output to console
  • rotate user-agent in case some sites ban your requests
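
The charset entry above can be illustrated with a small sketch. This is not the crawler's actual code. It is a hypothetical, self-contained version of the idea: look for a charset in the Content-Type header first, then fall back to a meta tag in the HTML. The function names are made up for illustration, and the regular expressions cover only common cases.

```javascript
// Hypothetical helper: extract the charset parameter from a Content-Type value.
function charsetFromContentType(contentType) {
  if (typeof contentType !== 'string') return null;
  const match = /charset\s*=\s*"?([^\s;"]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : null;
}

// Hypothetical helper: fall back to a charset declared in a <meta> tag.
function charsetFromMeta(html) {
  if (typeof html !== 'string') return null;
  const match = /<meta[^>]+charset\s*=\s*["']?([^\s"'>;\/]+)/i.exec(html);
  return match ? match[1].toLowerCase() : null;
}

// Header takes precedence; the meta tag is consulted only if the header has no charset.
function detectCharset(headers, html) {
  return charsetFromContentType(headers['content-type']) || charsetFromMeta(html);
}
```

The detected name would then be handed to a decoder such as iconv-lite to convert the body. That step is left out here so the sketch stays dependency-free.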