Skip to content
Character encoding auto-detection in JavaScript (port of python's chardet)
Branch: master
Clone or download
aadsm Version 2.1.0
- Add support for UTF-8 emoji
Latest commit 0addfaa Jan 22, 2019
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
dist Version 2.1.0 Jan 22, 2019
src
tests
.gitignore
CONTRIBUTORS
LICENSE
README.md Hide logger behind enableDebug Jan 22, 2019
index.js
package.json

README.md

NPM

JsChardet

Port of python's chardet (https://github.com/chardet/chardet).

License

LGPL

How To Use It

Node

npm install jschardet
var jschardet = require("jschardet")

// "àíàçã" in UTF-8
jschardet.detect("\xc3\xa0\xc3\xad\xc3\xa0\xc3\xa7\xc3\xa3")
// { encoding: "UTF-8", confidence: 0.9690625 }

// "次常用國字標準字體表" in Big5
jschardet.detect("\xa6\xb8\xb1\x60\xa5\xce\xb0\xea\xa6\x72\xbc\xd0\xb7\xc7\xa6\x72\xc5\xe9\xaa\xed")
// { encoding: "Big5", confidence: 0.99 }

Browser

Copy and include jschardet.min.js in your web page.

This library is also available in cdnjs at https://cdnjs.cloudflare.com/ajax/libs/jschardet/1.4.1/jschardet.min.js

Options

// See all information related to the confidence levels of each encoding.
// This is useful to see why you're not getting the expected encoding.
jschardet.enableDebug();

// Default minimum accepted confidence level is 0.20 but sometimes this is not
// enough, specially when dealing with files mostly with numbers.
// To change this to 0 to always get something or any other value that can
// work for you.
jschardet.detect(str, { minimumThreshold: 0 });

Supported Charsets

  • Big5, GB2312/GB18030, EUC-TW, HZ-GB-2312, and ISO-2022-CN (Traditional and Simplified Chinese)
  • EUC-JP, SHIFT_JIS, and ISO-2022-JP (Japanese)
  • EUC-KR and ISO-2022-KR (Korean)
  • KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, and windows-1251 (Russian)
  • ISO-8859-2 and windows-1250 (Hungarian)
  • ISO-8859-5 and windows-1251 (Bulgarian)
  • windows-1252
  • ISO-8859-7 and windows-1253 (Greek)
  • ISO-8859-8 and windows-1255 (Visual and Logical Hebrew)
  • TIS-620 (Thai)
  • UTF-32 BE, LE, 3412-ordered, or 2143-ordered (with a BOM)
  • UTF-16 BE or LE (with a BOM)
  • UTF-8 (with or without a BOM)
  • ASCII

Technical Information

I haven't been able to create tests to correctly detect:

  • ISO-2022-CN
  • windows-1250 in Hungarian
  • windows-1251 in Bulgarian
  • windows-1253 in Greek
  • EUC-CN

Development

Use npm run dist to update the distribution files. They're available at https://github.com/aadsm/jschardet/tree/master/dist.

Authors

Ported from python to JavaScript by António Afonso (https://github.com/aadsm/jschardet)

Transformed into an npm package by Markus Ast (https://github.com/brainafk)

You can’t perform that action at this time.