Documentation & Support

http://www.slideshare.net/buganini/bsdconv

http://www.slideshare.net/Buganini/journey-of-bsdconv

API Reference: http://buganini.github.io/bsdconv/

Use bsdconv-man to show the manual page for each module.

IRC: irc://irc.freenode.net#bsdconv

Compilation & Installation

make PREFIX=${prefix} # PREFIX defaults to /usr/local
sudo make install PREFIX=${prefix} # PREFIX defaults to /usr/local
sudo ldconfig ${prefix}/lib # Linux
sudo ldconfig -m ${prefix}/lib # FreeBSD

Add codec alias

Update modules/{from,inter,to}/alias
make alias

Example

Convert Traditional Chinese (Big5) to Simplified Chinese (UTF-8)

bsdconv big5:zhcn:utf-8 in.txt > out.txt
bsdconv big5:zhcn:utf-8 -i in.txt # convert in place

Convert Traditional Chinese (UTF-8) to Simplified Chinese (GB2312), with transliteration

bsdconv utf-8:zhcn:cp936,cp936-trans in.txt > out.txt

Convert Simplified Chinese to Traditional Chinese

bsdconv utf-8:zhtw:zhtw-words:utf-8

The same conversion, ignoring whitespace mixed into words

bsdconv utf-8:whitespace-derail:zhtw:zhtw-words:whitespace-rerail:utf-8

Read Big5 data, convert Traditional Chinese to Simplified Chinese, normalize CRLF/CR/LF line endings to CRLF, uppercase the ASCII characters, and write Big5 data, emitting HTML entities for Simplified Chinese characters that Big5 cannot represent.

bsdconv big5:zhcn:win:upper:big5,htmlentity in.txt > out.txt
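
As a rough illustration of what the later stages in that chain amount to, here is a minimal Python sketch using only the standard library, not bsdconv itself. The function name `win_upper_big5` is hypothetical, and the zhcn step is omitted because the standard library has no equivalent. Python's `xmlcharrefreplace` handler emits decimal numeric references, which may differ from what the htmlentity codec produces.

```python
import re

def win_upper_big5(text):
    """Hypothetical stand-in for the win:upper:big5,htmlentity stages."""
    # Normalize CRLF, lone CR, and lone LF to CRLF.
    text = re.sub(r"\r\n|\r|\n", "\r\n", text)
    # Uppercase ASCII letters only; leave all other characters untouched.
    text = "".join(c.upper() if c.isascii() else c for c in text)
    # Encode to Big5; characters Big5 cannot represent become &#NNNN; references.
    return text.encode("big5", errors="xmlcharrefreplace")

print(win_upper_big5("abc\n\u4e2d\u6587\r"))
```

The fallback after the comma plays the same role as the error handler here: it is only consulted for characters the primary output codec rejects.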

Counting character width

echo -n "aａ" | bsdconv utf-8:width:null
FULL: 1
HALF: 1

echo -n "aａˇ" | bsdconv utf-8:width:null
FULL: 1
HALF: 1
AMBI: 1
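
The three labels line up with the Unicode East Asian Width property. The sketch below approximates the count in Python with `unicodedata`, using a fullwidth a (U+FF41) as the full-width character and a caron (U+02C7) as the ambiguous one. The mapping from width classes to FULL/HALF/AMBI is an assumption made for illustration; bsdconv's own classification may differ.

```python
import unicodedata
from collections import Counter

# Assumed mapping from East Asian Width classes to the labels shown above.
_LABELS = {"F": "FULL", "W": "FULL", "A": "AMBI"}

def width_counts(text):
    """Count characters per width label; anything not F/W/A counts as HALF."""
    return Counter(
        _LABELS.get(unicodedata.east_asian_width(c), "HALF") for c in text
    )

print(width_counts("a\uff41"))        # ASCII a plus FULLWIDTH LATIN SMALL LETTER A
print(width_counts("a\uff41\u02c7"))  # the same, plus CARON
```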

Useful for migrating a MySQL database from Big5 to UTF-8

bsdconv htmlentity,big5-5c,big5:utf-8 in.sql > out.sql
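
This README does not spell out what the big5-5c codec does. As background only, and on the assumption that the name refers to the well-known Big5 quirk where some characters have 0x5C (the ASCII backslash) as their second byte, the Python sketch below shows why byte-oriented escaping can damage Big5 text in a dump. The helper `has_5c_trail` is hypothetical.

```python
def has_5c_trail(ch):
    """True if the character's Big5 encoding ends in byte 0x5C (backslash)."""
    return ch.encode("big5").endswith(b"\x5c")

for ch in "\u8a31\u529f\u84cb":  # three commonly cited examples
    print(ch.encode("big5").hex(), has_5c_trail(ch))

# A byte-oriented escaper that doubles backslashes corrupts such text:
raw = "\u8a31".encode("big5")            # second byte is 0x5C
escaped = raw.replace(b"\\", b"\\\\")    # now two 0x5C bytes follow the lead byte
print(escaped.decode("big5") == "\u8a31\\")  # a stray backslash follows the character
```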

Recover from mis-decoding (Big5 data mistakenly treated as ISO-8859-1 and converted to UTF-8)

bsdconv 'utf-8:iso-8859-1|big5:utf-8'
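
The recovery works by running the mistake backwards: turn the UTF-8 text back into the ISO-8859-1 bytes it was built from, then decode those bytes as the Big5 they always were. A minimal sketch of the round trip with Python's standard codecs follows; the function names are made up for illustration.

```python
def garble(text):
    """Reproduce the mistake: Big5 bytes read as ISO-8859-1, saved as UTF-8."""
    return text.encode("big5").decode("iso-8859-1").encode("utf-8")

def recover(data):
    """Undo it: UTF-8 back to ISO-8859-1 bytes, then decode those as Big5."""
    return data.decode("utf-8").encode("iso-8859-1").decode("big5")

original = "\u4e2d\u6587"
broken = garble(original)
print(broken)                       # the mis-encoded UTF-8 bytes
print(recover(broken) == original)  # True
```

This is lossless because ISO-8859-1 maps every byte value to exactly one code point, so no information is destroyed by the wrong decode.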

Decode escaped data that mixes byte and Unicode escapes, such as %u9644%20

bsdconv 'escape,byte:unicode,byte|skip,ascii:utf-8'
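
To make the input format concrete, here is a small Python sketch that decodes the same kind of mixed escapes with a regular expression. It is a simplification, not a description of how bsdconv handles it: `%uXXXX` is taken as a code point, `%XX` as a single ASCII byte, and non-ASCII byte escapes are left untouched rather than guessing a charset. The name `unescape_mixed` is hypothetical.

```python
import re

_ESCAPE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")

def unescape_mixed(text):
    """Decode %uXXXX as a code point and %XX as a single ASCII byte."""
    def repl(match):
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        value = int(match.group(2), 16)
        # Leave non-ASCII byte escapes as-is rather than guess their charset.
        return chr(value) if value < 0x80 else match.group(0)
    return _ESCAPE.sub(repl, text)

print(unescape_mixed("%u9644%20") == "\u9644 ")  # True
```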

Generate a string for fuzzy comparison

echo ¼ℌăDžⓐ⁹灣湾ド鬒鬒æß | bsdconv UTF-8:ZH-FUZZY-TW:KANA-PHONETIC:NFKD-CASEFOLD:UTF-8
    1⁄4hădža9灣灣do鬒鬒æss
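
Part of that result can be reproduced with Python's standard library, assuming NFKD-CASEFOLD means NFKD normalization followed by case folding, as the name suggests. The ZH-FUZZY-TW and KANA-PHONETIC stages have no standard-library counterpart and are not covered by this sketch.

```python
import unicodedata

def nfkd_casefold(text):
    """Compatibility-decompose, then case-fold, for loose comparison."""
    return unicodedata.normalize("NFKD", text).casefold()

# Black-letter H, circled a, superscript 9, sharp s
print(nfkd_casefold("\u210c\u24d0\u2079\u00df"))  # ha9ss
print(nfkd_casefold("Stra\u00dfe") == nfkd_casefold("STRASSE"))  # True
```

Comparing the folded keys instead of the raw strings treats compatibility variants and case differences as equal.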

Translate text to HTML

bsdconv big5:nl2br:ascii,html-img in.txt > out.htm

Use glyph images from http://www.cns11643.gov.tw

bsdconv utf-8:ascii,ascii-html-cns11643-img in.txt > out.htm

Maintain inter map:

bsdconv bsdconv-keyword,bsdconv:bsdconv-keyword,utf-8 inter/FOO.txt > edit.tmp
vi edit.tmp
bsdconv bsdconv-keyword,utf-8:bsdconv-keyword,bsdconv edit.tmp > inter/FOO.txt

Windows

Build with MinGW using Makefile.win, then copy everything in build/ to c:\bsdconv.
The executable will then be at c:\bsdconv\bsdconv.exe.

To install to a directory other than the default path, set the BSDCONV_PATH environment variable to that directory.

Running setEnvVar.bat as administrator can set the proper environment variables for you.

Bindings

Python

Perl

PHP

Ruby

Go

Java

Haskell

Elasticsearch

PostgreSQL

MySQL

About

A simple but powerful DSL for charset/encoding conversion and transformation; pure C implementation with no extra dependencies.
