
cChardet

cChardet is a high-speed universal character encoding detector: a Python binding to charsetdetect (libcharsetdetect).

Supported codecs

  • Big5
  • EUC-JP
  • EUC-KR
  • GB18030
  • HZ-GB-2312
  • IBM855
  • IBM866
  • ISO-2022-CN
  • ISO-2022-JP
  • ISO-2022-KR
  • ISO-8859-2
  • ISO-8859-5
  • ISO-8859-7
  • ISO-8859-8
  • KOI8-R
  • Shift_JIS
  • TIS-620
  • UTF-8
  • UTF-16BE
  • UTF-16LE
  • UTF-32BE
  • UTF-32LE
  • WINDOWS-1250
  • WINDOWS-1251
  • WINDOWS-1252
  • WINDOWS-1253
  • WINDOWS-1255
  • EUC-TW
  • X-ISO-10646-UCS-4-2143
  • X-ISO-10646-UCS-4-3412
  • x-mac-cyrillic
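The names above are the labels the detector reports, and not every one is spelled the way Python's codec registry expects (or exists there at all). A minimal sketch, assuming Python 3 and only the standard `codecs` module, of resolving a reported label to a Python codec; the helper name `python_codec_for` is hypothetical and not part of cChardet:

```python
import codecs


def python_codec_for(detected_name):
    """Return Python's canonical codec name for a detector label, or None.

    Hypothetical helper: codecs.lookup() handles case and hyphen/underscore
    differences (e.g. "WINDOWS-1252" -> "cp1252") via Python's alias table.
    """
    if not detected_name:
        return None
    try:
        return codecs.lookup(detected_name).name
    except LookupError:
        # Some labels (e.g. X-ISO-10646-UCS-4-2143) have no Python codec.
        return None


if __name__ == "__main__":
    for label in ["SHIFT_JIS", "UTF-8", "WINDOWS-1252", "X-ISO-10646-UCS-4-2143"]:
        print(label, "->", python_codec_for(label))
```

Labels without a Python codec come back as None, so the caller decides on its own fallback instead of hitting a LookupError at decode time.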

Requirements

For example, on Ubuntu 12.04:

$ sudo apt-get install build-essential python-dev cython

Installation

$ cd /tmp
$ git clone git://github.com/PyYoshi/cChardet.git
$ cd cChardet
$ python setup.py build
$ sudo python setup.py install

or

$ sudo easy_install cchardet

Example

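# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample file as raw bytes; detection works on bytes, not text.
with open(r"test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# detect() returns a dict with the detected encoding and its confidence.
result = chardet.detect(msg)
print(result)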
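The detection result can be fed straight back into `bytes.decode`. The sketch below assumes Python 3 and a `detect` callable that returns a chardet-style dict with `encoding` and `confidence` keys (as in the example above); the helper `decode_bytes`, the 0.5 threshold and the UTF-8 fallback are illustrative choices, not part of cChardet's API. The detector is passed in as a parameter, so the same helper works with chardet's `detect` too.

```python
import codecs


def decode_bytes(data, detect, min_confidence=0.5, fallback="utf-8"):
    """Decode `data` using the encoding reported by `detect` (hypothetical helper).

    Returns (text, encoding_used). Falls back to `fallback` when the guess is
    missing, below `min_confidence`, or has no Python codec.
    """
    result = detect(data) or {}
    encoding = result.get("encoding")
    confidence = result.get("confidence")
    # Treat a missing confidence value as "accept the guess".
    if confidence is not None and confidence < min_confidence:
        encoding = None
    if encoding:
        try:
            codecs.lookup(encoding)
        except LookupError:
            encoding = None  # label has no Python codec
    used = encoding or fallback
    # Replace undecodable bytes instead of raising UnicodeDecodeError.
    return data.decode(used, errors="replace"), used


if __name__ == "__main__":
    import cchardet as chardet

    with open("test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
        text, used = decode_bytes(f.read(), chardet.detect)
    print(used, text[:40])
```

Using `errors="replace"` keeps a wrong guess from crashing the caller; drop it if you would rather fail loudly on undecodable input.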

Test

$ sudo pip install -U chardet nose

or

$ sudo easy_install -U chardet nose
$ cd test
$ nosetests --nocapture tests.py

Benchmark

Code: tests.TestCchardetSpeed

Sample: test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt

Environment:

CPU: Intel Core i7 860 2.8GHz

RAM: DDR3-1333 16GB

Platform: Kubuntu 12.04 amd64, Python 2.7.3 64-bit

Result:

Library     Request (call/s)
chardet     0.32
cchardet    975.46
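A comparable calls-per-second figure can be measured with a simple timing loop. This is a hypothetical sketch assuming Python 3 (`time.perf_counter`), not the code of tests.TestCchardetSpeed, so its numbers are not directly comparable with the table; the helper name `calls_per_second` is illustrative.

```python
import importlib
import time


def calls_per_second(func, arg, min_seconds=1.0):
    """Call func(arg) repeatedly for at least min_seconds and return calls/s.

    Always makes at least one call, so slow detectors (like pure-Python
    chardet on a large file) still produce a result.
    """
    calls = 0
    start = time.perf_counter()
    while True:
        func(arg)
        calls += 1
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds and elapsed > 0:
            return calls / elapsed


if __name__ == "__main__":
    path = "test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt"
    with open(path, "rb") as f:
        data = f.read()
    for name in ("chardet", "cchardet"):
        try:
            module = importlib.import_module(name)
        except ImportError:
            print(name, "not installed")
            continue
        print(name, round(calls_per_second(module.detect, data), 2))
```

Timing by wall-clock duration rather than a fixed call count keeps the run short for the fast binding while still giving the slow library enough time for a stable average.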

License

  • The MIT License: src/cchardet

  • Other libraries: see the license files in the src/ext directory.

Thanks

Contact

My blog

Issues

Sorry for my poor English :)
