Convert from/to encodings with iconv-lite only #36

Closed
alexandernst opened this issue Oct 4, 2013 · 7 comments

Comments

@alexandernst

Take this example:

buffer = new Buffer("base64 string with ISO-8859-1 encoding goes here", "base64").toString();

buffer = new Buffer(buffer, "ISO-8859-1");  // <--- this will fail, as Buffer doesn't support that encoding

buffer = iconv.decode(buffer, "ISO-8859-1");
buffer = iconv.encode(buffer, "utf8").toString("utf8");

This code should be able to convert a string from ISO-8859-1 (or any other encoding) to UTF-8, but it fails because of the second line.

Can that be done using iconv-lite?

@ashtuchkin
Owner

Yes, sure. iconv.decode(buf, encoding) takes binary data and decodes it into a JS string. So, what you need here is:

originalData = new Buffer("base64 string with ISO-8859-1 encoding goes here", "base64");  // Notice, no .toString().
jsStr = iconv.decode(originalData, "ISO-8859-1");

// Here you can use jsStr as a regular JavaScript string (replace returns a new string):
jsStr = jsStr.replace("hello", "world");

// If you need a buffer with this string UTF-8 encoded, you can use Buffer like this:
utf8EncodedStringBuf = new Buffer(jsStr);  // utf8 is the default encoding.

// Or, equivalently:
utf8EncodedStringBuf = iconv.encode(jsStr, "utf8");
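
For anyone who wants to try the steps above without installing anything, here is a minimal sketch. It assumes a current Node.js, where `Buffer.from` is the replacement for the older `new Buffer`, and it uses Node's built-in `"latin1"` encoding as a stand-in for `iconv.decode(buf, "ISO-8859-1")`, which should give the same result for this input. The sample base64 value is made up for illustration.

```javascript
// "Y2Fm6Q==" is the base64 form of the bytes 63 61 66 e9,
// i.e. "café" encoded as ISO-8859-1 (sample data, not from the thread).
const originalData = Buffer.from("Y2Fm6Q==", "base64"); // keep it as bytes, no .toString()

// Decode the bytes into a JS string. With iconv-lite this step would be:
//   iconv.decode(originalData, "ISO-8859-1")
const jsStr = originalData.toString("latin1");

// Encode the JS string as UTF-8 bytes. With iconv-lite this step would be:
//   iconv.encode(jsStr, "utf8")
const utf8EncodedStringBuf = Buffer.from(jsStr, "utf8");

console.log(jsStr);                                // café
console.log(utf8EncodedStringBuf.toString("hex")); // 636166c3a9
```

The single byte e9 becomes the two bytes c3 a9, which is how UTF-8 represents "é".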

@alexandernst
Author

Ok, but what if I get a string in ISO-8859-1 instead of base64?

@ashtuchkin
Owner

It depends on how that string was converted from the actual bytes.

If you have a JS string which was UTF-8-decoded from a buffer of ISO-8859-1-encoded data, then the recovery is possible only if the original string was all ASCII. UTF-8 is (obviously) not compatible with ISO-8859-1 and will produce all sorts of weird chars and decoding errors otherwise.

In general, you should try to get the original bytes (as Buffers); that way it's easier and more robust.
If that's not possible, then the best you can do is to encode the string back as UTF-8 and then decode it as ISO-8859-1, but that will lose information for any non-ASCII characters.

str = iconv.decode(new Buffer(str), "ISO-8859-1");
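
To see why this recovery only works for ASCII, here is a small sketch using only Node built-ins. It assumes a current Node.js and again uses the built-in `"latin1"` encoding in place of iconv-lite's ISO-8859-1; the sample bytes are made up for illustration.

```javascript
// "café" as ISO-8859-1 bytes: the last byte, e9, is not valid UTF-8 on its own.
const latin1Bytes = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// Wrongly decoding as UTF-8 replaces the invalid byte with U+FFFD.
const wronglyDecoded = latin1Bytes.toString("utf8");

// Re-encode as UTF-8, then decode as ISO-8859-1 (the recovery attempt).
const reencoded = Buffer.from(wronglyDecoded, "utf8");
const recovered = reencoded.toString("latin1");

console.log(reencoded.toString("hex"));  // 636166efbfbd (the original e9 is gone)
console.log(recovered === "caf\u00e9");  // false

// The same round trip with ASCII-only data comes back intact.
const asciiBytes = Buffer.from("cafe", "latin1");
const asciiRecovered = Buffer.from(asciiBytes.toString("utf8"), "utf8").toString("latin1");
console.log(asciiRecovered === "cafe");  // true
```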

@alexandernst
Author

I'm getting the string from this library: https://github.com/mscdex/node-imap

I fetch a mail part with:

f.on("message", function(msg, seqno){

    msg.on("body", function(stream, info){
        var buffer = "";
        stream.on("data", function(chunk){
            buffer += chunk;
        });

        stream.once("end", function(){
            // convert buffer from whatever encoding it has to utf8
        });

..........

I'll ask that there too :)

@ashtuchkin
Owner

You're inadvertently converting the chunk to string by doing buffer += chunk. You should keep it as a buffer, something like this:

        var buffer = new Buffer(0);
        stream.on("data", function(chunk) {
            buffer = Buffer.concat([buffer, chunk]);
        });

        stream.once("end", function(){
            str = iconv.decode(buffer, "ISO-8859-1");
        });
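
A variation on the same idea, as a rough sketch: push each chunk into an array and call `Buffer.concat` once at the end, so the data is not recopied on every chunk. `collectBody` is a hypothetical helper name, not part of node-imap or iconv-lite, and the sketch assumes the stream emits Buffer chunks.

```javascript
// Hypothetical helper: gather a stream's chunks as raw bytes and
// pass one combined Buffer to the callback when the stream ends.
function collectBody(stream, done) {
  const chunks = [];
  stream.on("data", function (chunk) {
    // Keep the chunk as a Buffer. Writing `text += chunk` here would
    // implicitly call chunk.toString(), which decodes the bytes as UTF-8.
    chunks.push(chunk);
  });
  stream.once("end", function () {
    done(Buffer.concat(chunks));
  });
}

// Possible usage inside the "body" handler from the thread:
//   collectBody(stream, function (body) {
//     var str = iconv.decode(body, "ISO-8859-1");
//   });
```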

@alexandernst
Author

That seems reasonable. I'll try it Monday morning as soon as I get to the office :)

@alexandernst
Author

I tried concat-ing the buffer instead of converting it to a string and now it works as expected, thank you! :)
