buganini / bsdconv

BSD licensed charset/encoding converter library with more functionality than libiconv

name age message
file Makefile Sun Dec 27 08:42:32 -0800 2009 make "all" the first target [buganini]
file Makefile.win Sun Sep 20 12:37:19 -0700 2009 put target meta back for header file [buganini]
file README Sat Jan 02 04:16:39 -0800 2010 add support for BOM decoding [buganini]
directory addons/ Sun Dec 20 15:26:00 -0800 2009 improve irssi script, dont do failed conversion [Buganini]
directory codecs/ Sun Jan 03 13:06:33 -0800 2010 enhance readability [buganini]
directory src/ Sun Jan 03 12:03:55 -0800 2010 allow using empty data [buganini]
directory tools/ Mon Dec 28 12:26:23 -0800 2009 add script generating ZH_COMP/ZH_DECOMP from ht... [buganini]
directory wrapper/ Thu Dec 31 08:00:01 -0800 2009 happy new year [Buganini]
README
Table format:
from<tab>to
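
A minimal sketch of reading such a table, assuming one from<tab>to pair per
line and that blank lines can be skipped. The function name parse_table and
the sample entries are hypothetical, not taken from the bsdconv sources.

```python
def parse_table(text):
    """Parse lines of the form from<tab>to into a list of (from, to) pairs."""
    pairs = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines carry no mapping
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected from<tab>to, got {line!r}")
        pairs.append((parts[0], parts[1]))
    return pairs


if __name__ == "__main__":
    # Hypothetical entries, for illustration only.
    sample = "A1\t0141\nB2\t0142\n"
    for src, dst in parse_table(sample):
        print(src, "->", dst)
```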

Internal encoding:
* Should be as close to Unicode as possible
* BSDCONV special chars are prefixed with 00
  * BOM is 0000
* UNICODE is prefixed with 01
* CNS11643 is prefixed with 02
* RAW is prefixed with 03
* Chinese components are prefixed with 04
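
The prefixes above can be sketched as a small lookup. This assumes the prefix
is a single leading byte and that an internal value is handled as a byte
string; the names PREFIXES and split_internal are hypothetical and are not
part of the bsdconv API.

```python
# Prefix byte -> kind of internal value, as listed in this README.
PREFIXES = {
    0x00: "BSDCONV special",
    0x01: "UNICODE",
    0x02: "CNS11643",
    0x03: "RAW",
    0x04: "Chinese components",
}


def split_internal(data: bytes):
    """Return (kind, payload) for one internally encoded value."""
    if not data:
        raise ValueError("empty internal value")
    kind = PREFIXES.get(data[0])
    if kind is None:
        raise ValueError(f"unknown prefix 0x{data[0]:02x}")
    return kind, data[1:]  # payload is left uninterpreted


if __name__ == "__main__":
    # 0000 is the BOM marker: prefix 00 followed by 00.
    print(split_internal(bytes.fromhex("0000")))
```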

NOTICE:
Though many charset encodings are ASCII-compatible,
 ASCII is excluded from their codecs to provide flexibility;
 list ascii explicitly where it is needed.

--
Compiling & Installation (FreeBSD):
make
sudo make install

--
Run:
Convert traditional chinese big5 to simplified chinese utf-8
> bsdconv big5,ascii:chs:utf-8,ascii in.txt out.txt

Convert traditional chinese utf-8 to simplified chinese utf-8
> bsdconv utf-8,ascii:zhtw_normalize:cht:utf-8,ascii in.txt out.txt

Convert Big5 data from Traditional Chinese to Simplified Chinese,
normalize CRLF/CR/LF line endings to CRLF, and write Big5 (CP950) output;
Simplified Chinese characters that are not in Big5 become HTML entities
> bsdconv big5,ascii:chs,win:cp950,ascii,htmlentity in.txt out.txt

Very useful for migrating MySQL DB from Big5 to UTF-8
> bsdconv htmlentity,big5-5c,big5,ascii:utf-8,ascii in.sql out.sql

More examples:
> bsdconv big5,ascii:nl2br:ascii,html-img in.txt out.htm
#html-img is actually ASCII-HTML-UNICODE-IMG

> bsdconv ascii,utf-8:ascii,ascii-html-cns11643-img in.txt out.htm
#if you prefer to use glyph images from http://www.cns11643.gov.tw

Maintain inter map:
> bsdconv bsdconv_keyword,bsdconv:bsdconv_keyword,utf-8,ascii MAP.txt edit.tmp
> vi edit.tmp
> bsdconv bsdconv_keyword,ascii,utf-8:bsdconv_keyword,bsdconv edit.tmp MAP.txt
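
Judging from the commands above, the conversion argument looks like a list of
phases separated by ":", each phase being a comma-separated list of codec
names: the first phase decodes the input, the last encodes the output, and
anything in between is an inter conversion. The sketch below only splits the
string along those lines; it is an assumption drawn from the examples, not
bsdconv's own parser, and parse_conversion is a hypothetical name. How bsdconv
treats several names inside one phase is not described here, so the sketch
does not interpret them.

```python
def parse_conversion(spec):
    """Split a conversion string into from / inter / to phases."""
    phases = [phase.split(",") for phase in spec.split(":")]
    if len(phases) < 2:
        raise ValueError("need at least a from phase and a to phase")
    if any(not name for phase in phases for name in phase):
        raise ValueError(f"empty codec name in {spec!r}")
    return {
        "from": phases[0],      # codecs tried when decoding the input
        "inter": phases[1:-1],  # zero or more inter phases, in order
        "to": phases[-1],       # codecs tried when encoding the output
    }


if __name__ == "__main__":
    print(parse_conversion("big5,ascii:chs,win:cp950,ascii,htmlentity"))
```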

--
WINDOWS:
http://github.com/buganini/bsdconv/downloads
Download and extract it, then go to build/ and run mk_table.bat,
then copy everything in build/ to c:\bsdconv\

--
To install to a directory other than the default path,
set the BSDCONV_PATH environment variable to your path.