detenc /
| name | age | message |
|---|---|---|
| COPYING | Thu Jan 08 07:54:17 -0800 2009 | |
| README | Thu Jan 08 07:54:17 -0800 2009 | |
| Rakefile | Thu Jan 08 09:04:25 -0800 2009 | |
| src/ | Tue Jan 27 07:38:37 -0800 2009 | |
| test/ | Fri Jan 23 06:40:22 -0800 2009 | |
README
detenc

A lightweight, low-memory character encoding detector.

Paul Battley <pbattley@gmail.com>
http://www.reevoo.com/

detenc is a fast character encoding detector for Western European text. It can determine whether a file is encoded in US-ASCII, UTF-8, ISO-8859-15, WINDOWS-1252, or something else. It can distinguish ISO-8859-15 and WINDOWS-1252 where there is enough information: this means that Euro signs are handled correctly.

The program was written to help normalise the encoding of very large data feeds (of the order of several gigabytes) at Reevoo. It uses very little memory and can determine the encoding of a two-gigabyte file in under a minute.

Build

The program is written in C and uses standard libraries. The test suite is written in Ruby. Additionally, the build process requires Rake:

    rake         # builds binary
    rake test    # runs test suite against binary
    rake install # installs binary to /usr/local/bin

It is also possible to build the binary manually: use something like:

    cc -o bin/detenc src/*.c

Use

    detenc FILENAME (FILENAME ...)

This will output the filename and encoding, one per line. To print just the encoding, use the -q switch.
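
Sketch: single-pass detection

The classification described above (US-ASCII, UTF-8, ISO-8859-15, WINDOWS-1252, or something else) can be illustrated with a small single-pass scanner. This is only a sketch under stated assumptions, not the code in src/: the names scan_state, scan_feed, detect_buffer and detect_stream are hypothetical, the UTF-8 check is simplified, and the rule "bytes 0x80-0x9F imply WINDOWS-1252" is one plausible heuristic. It relies on the fact that the Euro sign is byte 0x80 in WINDOWS-1252 and byte 0xA4 in ISO-8859-15.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical scanner state; a handful of flags, so memory use is constant. */
typedef struct {
    int high_seen;  /* a byte >= 0x80 was seen */
    int utf8_ok;    /* input is still plausible UTF-8 */
    int pending;    /* UTF-8 continuation bytes still expected */
    int c1_seen;    /* a byte in 0x80..0x9F was seen */
    int undefined;  /* a byte with no WINDOWS-1252 mapping was seen */
} scan_state;

static void scan_init(scan_state *s)
{
    assert(s != NULL);
    memset(s, 0, sizeof *s);
    s->utf8_ok = 1;
}

/* Feed one chunk of input; may be called repeatedly on successive chunks. */
static void scan_feed(scan_state *s, const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char b = buf[i];
        if (b < 0x80) {
            if (s->pending) s->utf8_ok = 0;  /* sequence cut short */
            s->pending = 0;
            continue;
        }
        s->high_seen = 1;
        if (b <= 0x9F) s->c1_seen = 1;
        /* 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in WINDOWS-1252. */
        if (b == 0x81 || b == 0x8D || b == 0x8F || b == 0x90 || b == 0x9D)
            s->undefined = 1;
        if (!s->utf8_ok) continue;
        /* Simplified UTF-8 check: lead/continuation structure only
           (does not reject every overlong form or surrogate). */
        if (s->pending) {
            if ((b & 0xC0) == 0x80) s->pending--;
            else s->utf8_ok = 0;
        } else if (b >= 0xC2 && b <= 0xDF) s->pending = 1;
        else if ((b & 0xF0) == 0xE0) s->pending = 2;
        else if (b >= 0xF0 && b <= 0xF4) s->pending = 3;
        else s->utf8_ok = 0;
    }
}

static const char *scan_result(const scan_state *s)
{
    if (!s->high_seen) return "US-ASCII";
    if (s->utf8_ok && s->pending == 0) return "UTF-8";
    if (s->undefined) return "UNKNOWN";
    /* Assumed heuristic: 0x80..0x9F are control codes in ISO-8859-15 but
       printable in WINDOWS-1252, so their presence suggests the latter. */
    if (s->c1_seen) return "WINDOWS-1252";
    /* Without such bytes the two are ambiguous; default to ISO-8859-15. */
    return "ISO-8859-15";
}

/* Classify an in-memory buffer. */
const char *detect_buffer(const char *data, size_t n)
{
    scan_state s;
    scan_init(&s);
    scan_feed(&s, (const unsigned char *)data, n);
    return scan_result(&s);
}

/* Classify a stream using a fixed-size buffer, whatever the file size. */
const char *detect_stream(FILE *fp)
{
    unsigned char buf[65536];
    scan_state s;
    size_t n;
    scan_init(&s);
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        scan_feed(&s, buf, n);
    return scan_result(&s);
}
```

Because the state is a few integers and the input is read in fixed-size chunks, memory use does not grow with file size, which is the property the README emphasises for multi-gigabyte feeds. A caller could open a file with fopen(path, "rb") and pass the handle to detect_stream.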







