cburgmer@ira.uka.de (author)
Fri Dec 04 13:37:37 -0800 2009
cjklib /
| name | age | message |
|---|---|---|
| COPYING | Sat Nov 08 07:16:36 -0800 2008 | |
| DEVELOPMENT | | |
| MANIFEST.in | | |
| README | | |
| THANKS | | |
| TODO | | |
| changelog | | |
| cjklib.conf | | |
| cjklib.egg-info/ | | |
| cjklib/ | | |
| epydoc_config | | |
| examples/ | | |
| pylintrc | | |
| scripts/ | | |
| setup.py | | |
| test/ | | |
README
Cjklib is a library providing Han character related methods for CJKV languages (Chinese, Japanese, Korean and Vietnamese).

Introduction
============
Cjklib provides language routines related to Han characters (characters based on Chinese characters, named Hanzi, Kanji, Hanja and chu Han respectively) used in writing Chinese, Japanese, infrequently Korean, and formerly Vietnamese. Functionality is included for character pronunciations, radicals, glyph components, stroke decomposition and variant information.

Installing
==========
If you are installing from the source package, you need to deploy the library on your system:

    $ python setup.py install

Documentation
=============
The API reference at http://cburgmer.nfshost.com/cjklib/ includes a lot of documentation. Also see the project page.

Usage
=====
The main components of this package are accessible through the Python library. In addition, a small command line tool 'cjknife' offers some of the library's functions. See "cjknife --help" for an overview.

This tool also offers simple access to dictionaries, which have to be built before use. Currently supported are EDICT, CEDICT, HanDeDict, CFDICT and CEDICTGR. To create the needed tables and indices for e.g. the CEDICT dictionary, run:

    $ buildcjkdb build fullCEDICT --dataPath=PathToCedictFile

Database
========
Packaged versions of the library ship with a pre-built SQLite database file. If you want to, you can rebuild the database. First download the newest Unihan file:

    $ wget ftp://ftp.unicode.org/Public/UNIDATA/Unihan.zip

Then start the build process:

    $ buildcjkdb build cjklibData

Alternatively use e.g. 'fullMandarin' for data needed by the library's functions for the Mandarin Chinese language.

SQLite
------
Currently only characters from the Basic Multilingual Plane (BMP) of Unicode are supported, due to missing support in MySQL (see below). To enable full support, set wideBuild to True in cjklib.conf.
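To make the BMP restriction concrete, the following minimal sketch (plain Python 3, not part of cjklib) shows how a character inside the BMP differs from one outside it. The helper names is_outside_bmp and utf8_length are hypothetical and chosen only for this illustration. Characters outside the BMP need four bytes in UTF-8, which is why a storage backend limited to three bytes per character cannot hold them.

```python
def is_outside_bmp(char):
    """Return True if the single character lies outside the BMP (above U+FFFF)."""
    return ord(char) > 0xFFFF


def utf8_length(char):
    """Return the number of bytes the character occupies in UTF-8."""
    return len(char.encode("utf-8"))


# U+4E2D lies inside the BMP; U+20000 is the first code point of plane 2.
samples = ["\u4e2d", "\U00020000"]

for char in samples:
    print("U+%04X: %d UTF-8 bytes, outside BMP: %s"
          % (ord(char), utf8_length(char), is_outside_bmp(char)))
```

The first sample fits into three bytes, while the second needs four.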
SQLite offers full-text search through the FTS3 extension, which needs to be compiled in to be used. Cjknife can use the full-text capabilities for the dictionary search; if the extension is not available, it performs a full table scan in fuzzy search. To enable it, set enableFTS3 to True in cjklib.conf. No full-text support is currently given for MySQL.

MySQL
-----
With MySQL 5 the following CREATE command creates a database with utf8 as character set and the binary collation utf8_bin:

    CREATE DATABASE cjklib DEFAULT CHARACTER SET utf8 COLLATE utf8_bin;

You might need to set access rights, too (substitute user_name and host_name):

    GRANT ALL ON cjklib.* TO 'user_name'@'host_name';

Now you need to change cjklib.conf to tell cjklib to use MySQL.

MySQL < 6 doesn't support true UTF-8 and uses a version with at most 3 bytes per character, so characters outside the Basic Multilingual Plane (BMP) can't be encoded. Building the Unihan database thus might result in warnings, and characters above 0x20000 can't be built at all.

Contributing
============
If you are interested in contributing to cjklib, join cjklib-devel@googlegroups.com (http://groups.google.com/group/cjklib-devel). Please report bugs to http://code.google.com/p/cjklib/issues/list.
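
As a supplement to the SQLite section above: whether FTS3 is compiled into the local SQLite build can be probed from Python's standard sqlite3 module before setting enableFTS3. The sketch below is only one possible check and is not how cjklib itself detects the extension. The function name has_fts3 and the table name fts_probe are hypothetical.

```python
import sqlite3


def has_fts3():
    """Return True if the linked SQLite library accepts an FTS3 virtual table."""
    # Use a throwaway in-memory database so nothing is written to disk.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE fts_probe USING fts3(content)")
        return True
    except sqlite3.OperationalError:
        # Raised when the fts3 module is not compiled into SQLite.
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("FTS3 available:", has_fts3())
```

If the probe reports False, leaving enableFTS3 unset keeps the fallback behaviour described in the SQLite section.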








