Use Buffers when decoding

Alexander Shtuchkin edited this page Jun 11, 2014 · 5 revisions

Decoding a string is probably the most common mistake when working with resources in legacy encodings. Why? Let's see.

Problem

This is wrong:

var http = require('http'),
    iconv = require('iconv-lite');

http.get("http://website.com/", function(res) {
  var body = '';
  res.on('data', function(chunk) {
    body += chunk; // chunk is a Buffer; += implicitly converts it to a UTF-8 string
  });
  res.on('end', function() {
    var decodedBody = iconv.decode(body, 'win1252');
    console.log(decodedBody);
  });
});

Before the body ever reaches iconv.decode, it has already been (unintentionally) decoded by JavaScript's implicit type conversion in body += chunk: chunk is a Buffer, and adding it to a string calls chunk.toString(), which decodes the bytes as UTF-8. What really happens here is:

  res.on('data', function(chunkBuffer) {
    body += chunkBuffer.toString('utf8');
  });

The same conversion happens behind the scenes if you call res.setEncoding('utf8').
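The implicit conversion is easy to confirm with nothing but Node's built-in Buffer. The sketch below assumes the server sent "café" in win1252, where 'é' is the single byte 0xE9; the variable names are illustrative only.

```javascript
// 'café' encoded in win1252: 'é' is the single byte 0xE9.
const chunk = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// What the buggy handler does: append a Buffer to a string.
let body = '';
body += chunk;

// What actually happens: Buffer#toString() defaults to UTF-8.
const explicit = chunk.toString('utf8');

console.log(body === explicit); // true
// 0xE9 on its own is not valid UTF-8, so it was replaced with U+FFFD.
console.log(body.codePointAt(3).toString(16)); // fffd
console.log(body.length); // 4
```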

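Double-decoding not only produces wrong results; it also makes the original bytes nearly impossible to recover, because UTF-8 conversion is lossy: every byte sequence that is not valid UTF-8 (such as a lone win1252 byte 0xE9 for 'é') is replaced with the same character, U+FFFD. So even iconv.decode(new Buffer(body, 'utf8'), 'win1252') will not help.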
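To see why the damage cannot be undone, here is a small sketch using only Node's Buffer: two different win1252 characters, 'é' (0xE9) and 'è' (0xE8), collapse into the same string once they pass through UTF-8 decoding, and re-encoding yields the bytes of U+FFFD rather than the original byte.

```javascript
// Two different win1252 characters: 'é' (0xE9) and 'è' (0xE8).
const eAcute = Buffer.from([0xe9]);
const eGrave = Buffer.from([0xe8]);

// Implicit UTF-8 decoding turns both invalid bytes into U+FFFD...
const s1 = '' + eAcute;
const s2 = '' + eGrave;
console.log(s1 === s2); // true: the distinction is gone

// ...so re-encoding cannot bring the original byte back.
const reencoded = Buffer.from(s1, 'utf8');
console.log(reencoded); // <Buffer ef bf bd>
console.log(reencoded.equals(eAcute)); // false
```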

Note: in theory, if you first convert the chunks to strings with the 'binary' encoding and then feed those strings to iconv.decode, you get correct results. This is bad practice: it is slower, it mixes up bytes and text, and the 'binary' encoding is deprecated.
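The reason the 'binary' trick works at all is that 'binary' (an alias of 'latin1' in Node) maps each byte to exactly one character in the range U+0000 to U+00FF, so the bytes survive a round trip. The sketch below only illustrates that round trip with Node's Buffer; it is not a recommendation to use it.

```javascript
// Bytes that are not valid UTF-8 on their own, e.g. win1252 'é' (0xE9) and '€' (0x80).
const original = Buffer.from([0x63, 0x61, 0x66, 0xe9, 0x80]);

// 'binary' maps each byte to one character U+0000..U+00FF...
const asBinary = original.toString('binary');
console.log(asBinary.length); // 5, one character per byte

// ...so the original bytes can be recovered exactly before the real decoding.
const restored = Buffer.from(asBinary, 'binary');
console.log(restored.equals(original)); // true
```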

Solution

Keep the original Buffers and pass them to iconv.decode. Use Buffer.concat() to join the chunks if needed.

In general, keep in mind that all JavaScript strings are already decoded and should not be decoded again.
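Concatenating before decoding also matters for multi-byte encodings, where a single character can be split across two chunks. The sketch below assumes Node.js 12 or later (for Readable.from) and uses UTF-8 with buf.toString('utf8') as a stand-in for iconv.decode, because Node can decode UTF-8 without iconv-lite; collectBuffers is a hypothetical helper, not part of any library.

```javascript
const { Readable } = require('stream');

// UTF-8 'é' is two bytes (0xC3 0xA9). Here it is split across two chunks,
// as can happen on any network stream.
const chunks = [Buffer.from([0x63, 0x61, 0x66, 0xc3]), Buffer.from([0xa9])];

// Hypothetical helper: collect raw Buffers and join them once the stream ends.
function collectBuffers(stream) {
  return new Promise((resolve, reject) => {
    const parts = [];
    stream.on('data', (chunk) => parts.push(chunk));
    stream.on('error', reject);
    stream.on('end', () => resolve(Buffer.concat(parts)));
  });
}

async function main() {
  // Wrong: each chunk is decoded on its own, turning 'é' into two U+FFFD.
  let naive = '';
  for (const chunk of chunks) naive += chunk;

  // Right: concatenate first, then decode once
  // (toString('utf8') stands in for iconv.decode here).
  const whole = await collectBuffers(Readable.from(chunks));
  const decoded = whole.toString('utf8');
  return { naive, decoded };
}

main().then(({ naive, decoded }) => {
  console.log(naive.length); // 5: 'caf' plus two replacement characters
  console.log(decoded); // café
});
```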

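var http = require('http'),
    iconv = require('iconv-lite');

http.get("http://website.com/", function(res) {
  var chunks = [];
  res.on('data', function(chunk) {
    chunks.push(chunk); // keep the raw Buffer; do not convert it to a string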
  });
  res.on('end', function() {
    var decodedBody = iconv.decode(Buffer.concat(chunks), 'win1252');
    console.log(decodedBody);
  });
});

// Or, with iconv-lite@0.4 and Node v0.10+, use the streaming API with the `collect` helper
http.get("http://website.com/", function(res) {
  res.pipe(iconv.decodeStream('win1252')).collect(function(err, decodedBody) {
    console.log(decodedBody);
  });
});
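A streaming decoder has to remember incomplete byte sequences between chunks. The sketch below mimics that idea with Node's built-in StringDecoder; makeDecodeStream and collectText are hypothetical stand-ins for iconv.decodeStream() and collect(), they only handle encodings StringDecoder knows (such as 'utf8', 'utf16le', 'latin1'), and they are not iconv-lite's actual implementation.

```javascript
const { Readable, Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical stand-in for iconv.decodeStream().
function makeDecodeStream(encoding) {
  const decoder = new StringDecoder(encoding);
  return new Transform({
    readableObjectMode: true, // emit decoded strings as-is
    transform(chunk, _enc, callback) {
      // write() returns only complete characters and keeps any trailing
      // partial sequence buffered until the next chunk arrives.
      const text = decoder.write(chunk);
      if (text) this.push(text);
      callback();
    },
    flush(callback) {
      // end() returns whatever is left (U+FFFD for an incomplete UTF-8 sequence).
      const rest = decoder.end();
      if (rest) this.push(rest);
      callback();
    },
  });
}

// Hypothetical equivalent of the collect() helper: gather all decoded text.
function collectText(stream) {
  return new Promise((resolve, reject) => {
    let text = '';
    stream.on('data', (s) => { text += s; });
    stream.on('error', reject);
    stream.on('end', () => resolve(text));
  });
}

// 'é' (0xC3 0xA9 in UTF-8) is split across the two chunks.
const source = Readable.from([Buffer.from([0x63, 0x61, 0x66, 0xc3]), Buffer.from([0xa9])]);
const result = collectText(source.pipe(makeDecodeStream('utf8')));
result.then((text) => console.log(text)); // café
```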

iconv-lite prints a warning when iconv.decode() is given a string instead of a Buffer. What if you know what you're doing and just want to mute that warning?

iconv.skipDecodeWarning = true;