
Almost complete TextEncoder polyfill module (utf8, utf16, utf32, ascii, latin1, cp1250, etc)


0r4nd/TextEncoderLite


TextEncoderLite

This is a polyfill that can replace the JavaScript TextEncoder and TextDecoder APIs.
Why write this library when the API already exists? Because I didn't notice that the TextEncoder API already existed!

Performance

  • The encoder/decoder are 4x to 10x slower than the native API (depending on the charset and on whether the engine is V8 or SpiderMonkey)
  • The module takes ~150 KB in memory (the tables have been optimized with this tool)

Implementation status

TextDecoderLite() - label:

  • The recommended encoding for the Web: ✅
    "utf-8"

  • The legacy single-byte encodings: ✅
    "ibm866", "iso-8859-2", "iso-8859-3", "iso-8859-4", "iso-8859-5", "iso-8859-6", "iso-8859-7", "iso-8859-8", "iso-8859-8i", "iso-8859-10", "iso-8859-13", "iso-8859-14", "iso-8859-15", "iso-8859-16", "koi8-r", "koi8-u", "macintosh", "windows-874", "windows-1250", "windows-1251", "windows-1252", "windows-1253", "windows-1254", "windows-1255", "windows-1256", "windows-1257", "windows-1258", "x-mac-cyrillic"

  • The legacy multi-byte Chinese (simplified) encodings: ⭕
    "gbk", "gb18030"

  • The legacy multi-byte Chinese (traditional) encoding: ⭕
    "big5"

  • The legacy multi-byte Japanese encodings: ⭕
    "euc-jp", "iso-2022-jp", "shift_jis"

  • The legacy multi-byte Korean encodings: ⭕
    "euc-kr"

  • The legacy miscellaneous encodings: ⭕
    "utf-16be", "utf-16le", "x-user-defined"

  • A special encoding: ⭕
    "replacement". This decodes empty input into empty output and any other arbitrary-length input into a single replacement character. It is used to prevent attacks that mismatch encodings between the client and server. The following encodings also map to the replacement encoding: "iso-2022-cn", "iso-2022-cn-ext", "iso-2022-kr", and "hz-gb-2312".
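The ✅ single-byte decoders rely on lookup tables, as the Performance notes mention. The sketch below illustrates the idea only and does not reflect TextDecoderLite's internal table format. A single-byte decoder can be a 256-entry table that maps each byte to a code point. The example builds one for "iso-8859-15", which matches Latin-1 except at eight byte values. The names `ISO_8859_15_TABLE` and `decodeSingleByte` are hypothetical.

```javascript
// Hypothetical sketch (not TextDecoderLite's code): a table-driven decoder
// for "iso-8859-15". That charset maps byte N to code point N (like Latin-1)
// except at the eight positions listed here.
const ISO_8859_15_OVERRIDES = new Map([
  [0xa4, 0x20ac], // €
  [0xa6, 0x0160], // Š
  [0xa8, 0x0161], // š
  [0xb4, 0x017d], // Ž
  [0xb8, 0x017e], // ž
  [0xbc, 0x0152], // Œ
  [0xbd, 0x0153], // œ
  [0xbe, 0x0178], // Ÿ
]);

// Build the full byte -> code point table once, up front.
const ISO_8859_15_TABLE = new Uint16Array(256);
for (let b = 0; b < 256; b++) {
  ISO_8859_15_TABLE[b] = ISO_8859_15_OVERRIDES.get(b) ?? b;
}

// Decoding is then a single table lookup per byte.
function decodeSingleByte(bytes, table) {
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(table[b]);
  }
  return out;
}
```

With this table, the bytes `31 30 20 A4` decode to "10 €", where Latin-1 would give "10 ¤".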

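Some ⭕ entries have simple decoders. The following is a sketch under stated assumptions: input is a plain `Uint8Array`, and the helper names `decodeUtf16`, `decodeXUserDefined`, `isReplacementLabel` and `decodeReplacement` are hypothetical, not part of TextDecoderLite.

- UTF-16 reads 16-bit code units in the chosen byte order.
- "x-user-defined" maps high bytes into the U+F780..U+F7FF private-use range.
- "replacement" follows the rule described above.

```javascript
// Hypothetical sketches for labels still marked ⭕; illustrations only.

// "utf-16le" / "utf-16be": read 16-bit code units in the given byte order.
// Simplifications: no BOM stripping, and lone surrogates are passed through
// instead of being replaced with U+FFFD as a WHATWG decoder would do.
function decodeUtf16(bytes, littleEndian) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let out = "";
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    out += String.fromCharCode(view.getUint16(i, littleEndian));
  }
  // A dangling odd byte at the end becomes U+FFFD (non-fatal behaviour).
  if (bytes.length % 2 === 1) out += "\uFFFD";
  return out;
}

// "x-user-defined": ASCII bytes map to themselves, 0x80..0xFF to U+F780..U+F7FF.
function decodeXUserDefined(bytes) {
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b < 0x80 ? b : 0xf780 + (b - 0x80));
  }
  return out;
}

// "replacement" and the labels that map to it (per the WHATWG Encoding Standard).
const REPLACEMENT_LABELS = new Set([
  "csiso2022kr", "hz-gb-2312", "iso-2022-cn", "iso-2022-cn-ext", "iso-2022-kr", "replacement",
]);

function isReplacementLabel(label) {
  // Labels are compared case-insensitively after trimming surrounding whitespace.
  return REPLACEMENT_LABELS.has(label.trim().toLowerCase());
}

// Empty input -> empty output; any other input -> one replacement character.
function decodeReplacement(bytes) {
  return bytes.length === 0 ? "" : "\uFFFD";
}
```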
TextEncoderLite() - experimental:

  • The official TextEncoder API can only encode to "utf-8".
  • If {experimental: true} is passed in the options, the module can also encode to the labels that are already implemented (✅ above).
var utf8Enc = new TextEncoderLite(); // default is "utf-8", like the native API
var win1252Enc = new TextEncoderLite("windows-1252", {experimental:true});
var res = new Uint8Array(20);

utf8Enc.encodeInto("hell⚽", res);
console.log(res);
res.fill(0); // clear bytes left over from the previous call before reusing the buffer
win1252Enc.encodeInto("hell⚽", res); // "⚽" has no windows-1252 byte; see errorMode below
console.log(res);

// Encoding can fail; "errorMode" selects how data that cannot be encoded is handled:
// "strict"             Throws an exception if the data cannot be converted.
// "replace"            Substitutes a marker character ("�" or "?") for data that cannot be encoded.
// "ignore"             Skips the data.
// "xmlcharrefreplace"  Emits an XML character reference (example: "&#65533;&#65533;") (encoding only)
// "backslashreplace"   Emits an escape sequence (example: "\\uFFFD\\uFFFD") (encoding only)
// "greek8" is an alias label for "iso-8859-7".
var greekEnc = new TextEncoderLite("greek8", {experimental:true, errorMode:"backslashreplace"});
res.fill(0);
greekEnc.encodeInto("hell⚽", res);
console.log(res);
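The sketch below makes the five error modes concrete. It is a hypothetical encoder for a Latin-1-style charset, in which code points U+0000..U+00FF map to the same byte value.

The mode names match Python's codec error handlers. The escape formats here (`&#N;`, `\uXXXX`, `\UXXXXXXXX`) follow Python's conventions. They are assumptions, not taken from TextEncoderLite, and `encodeLatin1` is an illustrative name. For "replace", the sketch uses "?" because "�" cannot be represented in a single-byte charset.

```javascript
// Hypothetical sketch of the errorMode values, modeled on Python's codec
// error handlers; not TextEncoderLite's implementation.
function asciiBytes(s) {
  return Array.from(s, (ch) => ch.charCodeAt(0));
}

function encodeLatin1(str, errorMode = "strict") {
  const out = [];
  for (const ch of str) { // for...of iterates by code point, not UTF-16 unit
    const cp = ch.codePointAt(0);
    if (cp <= 0xff) {
      out.push(cp); // representable: byte value equals code point
      continue;
    }
    switch (errorMode) {
      case "strict":
        throw new RangeError(`U+${cp.toString(16).toUpperCase()} cannot be encoded`);
      case "replace":
        out.push(0x3f); // "?"
        break;
      case "ignore":
        break; // drop the character
      case "xmlcharrefreplace":
        out.push(...asciiBytes(`&#${cp};`)); // decimal reference, e.g. "&#9917;"
        break;
      case "backslashreplace":
        out.push(...asciiBytes(cp > 0xffff
          ? "\\U" + cp.toString(16).padStart(8, "0")
          : "\\u" + cp.toString(16).padStart(4, "0")));
        break;
      default:
        throw new TypeError(`unknown errorMode: ${errorMode}`);
    }
  }
  return Uint8Array.from(out);
}
```

For example, "hell⚽" becomes the bytes of "hell?" under "replace" and of "hell&#9917;" under "xmlcharrefreplace".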

Usage

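var str = "a😆b😆c";
var res = new Uint8Array(14); // "a😆b😆c" needs 11 UTF-8 bytes, so the last 3 bytes stay 0

var encoder = new TextEncoderLite();
var decoder = new TextDecoderLite("utf-8", {fatal:true, ignoreBOM:true});

var tst = encoder.encodeInto(str, res);
// Decode only the bytes that were written (assuming encodeInto returns {read, written}
// like the native API); if tst.written is undefined, subarray() keeps the whole buffer.
console.log(decoder.decode(res.subarray(0, tst.written)), res, tst);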
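The Usage steps can be checked against the native TextEncoder and TextDecoder, which TextEncoderLite and TextDecoderLite are meant to replace. The byte counts below hold for any conforming UTF-8 encoder. It is an assumption that the Lite versions return the same {read, written} object from encodeInto.

```javascript
// Native TextEncoder/TextDecoder (globals in modern browsers and Node.js).
const str = "a😆b😆c";
const buf = new Uint8Array(14);

const { read, written } = new TextEncoder().encodeInto(str, buf);
// Each "😆" is 2 UTF-16 code units but 4 UTF-8 bytes,
// so read = 7 and written = 11; buf[11..13] are still 0.

const decoder = new TextDecoder("utf-8", { fatal: true, ignoreBOM: true });
const whole = decoder.decode(buf);                        // ends with three U+0000
const trimmed = decoder.decode(buf.subarray(0, written)); // exactly "a😆b😆c"

console.log(read, written, JSON.stringify(whole), trimmed);
```

The buffer is larger than the 11 bytes needed. Decoding all of it therefore yields three trailing U+0000 characters, and slicing to `written` avoids them.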
 
