universal character encoding detector
Latest commit 217e2b2 Nov 4, 2016 @PyYoshi add tiemstamp


cChardet

cChardet is a high-speed universal character encoding detector. It is a binding to charsetdetect.


Supported codecs

  • Big5
  • EUC-JP
  • EUC-KR
  • GB18030
  • HZ-GB-2312
  • IBM855
  • IBM866
  • ISO-2022-CN
  • ISO-2022-JP
  • ISO-2022-KR
  • ISO-8859-2
  • ISO-8859-5
  • ISO-8859-7
  • ISO-8859-8
  • KOI8-R
  • Shift_JIS
  • TIS-620
  • UTF-8
  • UTF-16BE
  • UTF-16LE
  • UTF-32BE
  • UTF-32LE
  • WINDOWS-1250
  • WINDOWS-1251
  • WINDOWS-1252
  • WINDOWS-1253
  • WINDOWS-1255
  • EUC-TW
  • X-ISO-10646-UCS-4-2143
  • X-ISO-10646-UCS-4-3412
  • x-mac-cyrillic

Requirements

Example

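# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the file as raw bytes; detection works on bytes, not decoded text.
with open(r"src/tests/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# detect() returns a dict with the guessed encoding and a confidence score.
result = chardet.detect(msg)
print(result)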
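Once an encoding has been detected, the usual next step is to decode the bytes with it. The following is a minimal sketch of that step, not part of cChardet: `decode_detected` is a hypothetical helper name, and it assumes the result is a dict whose "encoding" value may be None when nothing could be detected. Some names in the codec list above (for example X-ISO-10646-UCS-4-2143) are not codecs Python knows, so the sketch falls back to a default encoding in that case.

```python
def decode_detected(data, result, fallback="utf-8"):
    """Decode bytes using a detect()-style result dict (hypothetical helper).

    `result` is assumed to look like {"encoding": ..., "confidence": ...}.
    """
    # Use the fallback when no encoding was detected (None or empty string).
    encoding = result.get("encoding") or fallback
    try:
        return data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # LookupError: Python has no codec with that name.
        # UnicodeDecodeError: the guess was wrong for these bytes.
        return data.decode(fallback, errors="replace")
```

With the example above, this could be called as `text = decode_detected(msg, chardet.detect(msg))`.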

Benchmark

$ cd src/
$ pip install chardet
$ python tests/bench.py
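For readers who want to measure call rates on their own data, a calls-per-second figure can be approximated as sketched below. This is not the contents of tests/bench.py; `calls_per_second` and its `min_seconds` parameter are hypothetical names chosen for illustration, and the sketch assumes Python 3 for `time.perf_counter`.

```python
import time


def calls_per_second(func, data, min_seconds=0.2):
    """Call func(data) repeatedly for at least min_seconds; return calls per second."""
    calls = 0
    elapsed = 0.0
    start = time.perf_counter()
    while elapsed < min_seconds:
        func(data)
        calls += 1
        elapsed = time.perf_counter() - start
    # elapsed is at least min_seconds here, so the division is safe.
    return calls / elapsed
```

With both libraries installed, `calls_per_second(cchardet.detect, msg)` and `calls_per_second(chardet.detect, msg)` could then be compared on the same input.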

Results

CPU: Intel(R) Core(TM) i3-4170 CPU @ 3.70GHz

RAM: DDR3 1600 MHz, 16 GB

Platform: Ubuntu 16.04 amd64

Python 2.7.12

Library    Request (call/s)
chardet    0.26
cchardet   1408.73

Python 3.5.2

Library    Request (call/s)
chardet    0.28
cchardet   1380.40

License

  • src/cchardet: The MIT License
  • Other libraries: see the src/ext directory for their licenses.

Thanks

Contact

Issues