Determine the charset of the input data with Mozilla Universal Charset Detection PHP extension

mod_chardet php extension

License

Copyright © 2016 JoungKyun.Kim. All rights reserved.

This program is licensed under MPL 1.1 or GPL v2.

Abstract

Determine the charset of input data with the Mozilla Universal Charset Detection C/C++ library.

This PHP extension is a frontend for libchardet.

libchardet is based on the Mozilla Universal Charset Detection C/C++ library and detects the character set used to encode data.

Because this module is a C binding, it is much faster than chardet packages written in pure PHP.

The mod_chardet extension supports three methods for detecting a charset. The supported methods and the libraries they require are as follows:

  • libchardet - Mozilla Universal Charset Detect C/C++ library
  • ICU - IBM International Components for Unicode
  • python-chardet - Mozilla Universal Charset Detect in pure Python

For CJKV (Chinese, Japanese, Korean, Vietnamese) languages, MUCD (Mozilla Universal Charset Detect) is recommended and gives the best results. For single-byte languages, MUCD and ICU both work well.

The python-chardet mode also uses MUCD, but its call performance is very poor. This mode exists for testing, so it is disabled unless you enable it with a configure option.

For more information, see the Reference document.

Downloads

Installation

1. Requirements

  • mod_chardet versions
    • PHP 7 and later : mod_chardet >= 1.0.0
    • PHP 5 and earlier : mod_chardet < 1.0.0
  • PHP >= 4.1
  • libchardet >= 1.0.5
  • libicu (optional)
  • python-chardet (optional)

2. Build

First, check which of libchardet, libicu and python-chardet are installed.

You must install at least one of libchardet or libicu.

The python-chardet feature exists only for comparing results against python-chardet. Its performance is poor, and we do not recommend using it.

[root@host mod_chardet]$ phpize
[root@host mod_chardet]$ ./configure --help
  ...
  --enable-moz-chardet    Support Mozilla chardet [default=yes]
  --enable-icu-chardet    Support ICU chardet [default=yes]
  --enable-py-chardet     Support python chardet [default=no]
  ...
[root@host mod_chardet]$ ./configure
[root@host mod_chardet]$ make && make install
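
If only libchardet is installed, the ICU backend can presumably be switched off at configure time. The following is a minimal sketch, not a tested recipe: the `--enable-*` names come from the `./configure --help` output above, while `--disable-icu-chardet` is assumed to work as the usual autoconf negation of `--enable-icu-chardet`. The `CONFIGURE_FLAGS` variable is only an illustrative name.

```shell
#!/bin/sh
# Sketch: build mod_chardet with only the Mozilla (libchardet) backend.
# --enable-moz-chardet is listed by ./configure --help;
# --disable-icu-chardet is ASSUMED to be its standard autoconf counterpart.
CONFIGURE_FLAGS="--enable-moz-chardet --disable-icu-chardet"

# Only attempt the build inside the source tree and when phpize is available.
if [ -f config.m4 ] && command -v phpize >/dev/null 2>&1; then
    phpize &&
    ./configure $CONFIGURE_FLAGS &&
    make
else
    echo "skipped: run this from the mod_chardet source directory with phpize installed"
fi
```

Run `make install` afterwards, as in the session above, once the build succeeds.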

3. Configuration

Add the DSO extension setting to your php.ini:

extension = chardet.so
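
To confirm that PHP picked up the setting, you can check the module list printed by `php -m`. This is a small sketch that assumes the module registers itself under the name `chardet` (inferred from `chardet.so`, not confirmed here); `has_module` is a hypothetical helper defined only for this example.

```shell
#!/bin/sh
# has_module NAME: succeed if NAME appears as a whole line on stdin
# (case-insensitive), which is how `php -m` lists loaded modules.
has_module() {
    grep -qix -- "$1"
}

# The module name "chardet" is an assumption based on the chardet.so file name.
if command -v php >/dev/null 2>&1; then
    if php -m | has_module chardet; then
        echo "chardet extension is loaded"
    else
        echo "chardet extension is not loaded; check the extension line in php.ini"
    fi
fi
```

If the module is reported as not loaded, verify that `chardet.so` was installed into the directory named by `extension_dir` in your php.ini.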

Usage

See the sample scripts in the repository (sample.php and sample-oop.php).
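
A quick way to try the extension is to run those sample scripts from the command line. The sketch below assumes you are in the repository root and that the extension is already enabled; `list_samples` is a hypothetical helper for this example, and what the scripts print is not shown because it depends on the scripts themselves.

```shell
#!/bin/sh
# list_samples DIR: print the repository's sample scripts
# (sample.php, sample-oop.php) that actually exist in DIR.
list_samples() {
    for name in sample.php sample-oop.php; do
        [ -f "$1/$name" ] && printf '%s\n' "$1/$name"
    done
    return 0
}

# Run each sample that is present, if a php binary is on the PATH.
if command -v php >/dev/null 2>&1; then
    list_samples . | while IFS= read -r script; do
        echo "== $script =="
        php "$script" </dev/null
    done
fi
```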