universal character encoding detector
Latest commit 2cec051 Jan 8, 2017 @PyYoshi update setup.py

cChardet

cChardet is a high-speed universal character encoding detector, implemented as a binding to charsetdetect.

Supported codecs

  • Big5
  • EUC-JP
  • EUC-KR
  • GB18030
  • HZ-GB-2312
  • IBM855
  • IBM866
  • ISO-2022-CN
  • ISO-2022-JP
  • ISO-2022-KR
  • ISO-8859-2
  • ISO-8859-5
  • ISO-8859-7
  • ISO-8859-8
  • KOI8-R
  • Shift_JIS
  • TIS-620
  • UTF-8
  • UTF-16BE
  • UTF-16LE
  • UTF-32BE
  • UTF-32LE
  • WINDOWS-1250
  • WINDOWS-1251
  • WINDOWS-1252
  • WINDOWS-1253
  • WINDOWS-1255
  • EUC-TW
  • X-ISO-10646-UCS-4-2143
  • X-ISO-10646-UCS-4-3412
  • x-mac-cyrillic
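
The names above are the labels the detector can report, and not every label is necessarily a codec name that Python's own `codecs` module accepts. A minimal sketch, assuming you want to check a reported label before decoding with it; `python_codec_for` is a hypothetical helper for illustration and is not part of cChardet:

```python
import codecs


def python_codec_for(label):
    """Return Python's canonical codec name for a detector label, or None.

    Hypothetical helper: it only asks the standard library whether it
    recognizes the label; it does not call cChardet.
    """
    if not label:
        return None
    try:
        return codecs.lookup(label).name
    except LookupError:
        # The standard library has no codec registered under this label.
        return None


for label in ("Shift_JIS", "WINDOWS-1252", "X-ISO-10646-UCS-4-2143"):
    print(label, "->", python_codec_for(label))
```

A `None` result means the bytes cannot be decoded with `bytes.decode(label)` directly, so the caller needs a fallback.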

Requirements

Example

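# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample file as raw bytes; detect() works on bytes, not str.
with open(r"src/tests/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# detect() returns a dict with the guessed "encoding" and a "confidence" score.
result = chardet.detect(msg)
print(result)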
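
The detection result is usually only a first step; the bytes still have to be decoded. The following is a minimal sketch of one way to do that, assuming the detector returns a dict with `encoding` and `confidence` keys as `detect()` does. The helper name `decode_detected`, the `min_confidence` threshold of 0.5, and the UTF-8 fallback are illustrative choices, not part of cChardet:

```python
def decode_detected(data, detect, min_confidence=0.5, fallback="utf-8"):
    """Decode bytes using a detector's guess, falling back when unsure.

    `detect` is any callable that takes bytes and returns a dict with
    "encoding" and "confidence" keys (for example cchardet.detect).
    The threshold and fallback are assumptions made for this sketch.
    """
    result = detect(data)
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0
    if not encoding or confidence < min_confidence:
        encoding = fallback
    try:
        return data.decode(encoding, errors="replace")
    except LookupError:
        # The reported label is not a codec name that Python knows.
        return data.decode(fallback, errors="replace")


if __name__ == "__main__":
    try:
        import cchardet
    except ImportError:
        cchardet = None
    if cchardet is not None:
        sample = "こんにちは、世界".encode("shift_jis")
        print(decode_detected(sample, cchardet.detect))
```

Passing the detector in as an argument keeps the helper usable with either chardet or cchardet, since both are imported under the same name in the example above.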

Benchmark

$ cd src/
$ pip install chardet
$ python tests/bench.py

Results

CPU: Intel(R) Core(TM) i3-4170 CPU @ 3.70GHz

RAM: DDR3 1600 MHz, 16 GB

Platform: Ubuntu 16.04 amd64

Python 2.7.12

Library     Requests (calls/s)
chardet     0.26
cchardet    1408.73

Python 3.5.2

Library     Requests (calls/s)
chardet     0.28
cchardet    1380.40
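
The figures above are throughput values in calls per second. As a rough illustration of how such a number can be measured, here is a minimal sketch; it is not the repository's `tests/bench.py`, and the function name `calls_per_second`, the repeat count, and the stand-in workload are all assumptions made for this example:

```python
import time


def calls_per_second(func, data, repeat=100):
    """Call func(data) `repeat` times and return the observed calls per second."""
    start = time.perf_counter()
    for _ in range(repeat):
        func(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return repeat / elapsed if elapsed > 0 else float("inf")


# Stand-in workload so the sketch runs without either library installed.
payload = ("character encoding detection " * 200).encode("utf-8")
rate = calls_per_second(lambda b: b.decode("utf-8"), payload)
print("%.2f calls/s" % rate)
```

To compare the two libraries, pass `chardet.detect` and `cchardet.detect` as `func` with the same input bytes.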

License

  • The MIT License: src/cchardet
  • Other libraries: see the src/ext directory for their licenses.

Thanks

Contact

Issues