A Ruby command-line utility for translating files using a YAML wordlist
Ruby
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
bin
lib
spec
.autotest
.gitignore
.rspec
Gemfile
LICENSE
README.rdoc
Rakefile
konjac.gemspec

README.rdoc

Konjac

A Ruby command-line utility for translating files using a YAML wordlist

Homepage

www.brymck.com

Documentation

brymck.github.com/konjac/

Author

Bryan McKelvey

Copyright

© 2012 Bryan McKelvey

License

MIT

Features

  • Fuzzy matching - Konjac can make suggestions for similar words based on their similarity.

  • Whitespace handling - It's still pretty lazy but at least functional

  • One-way translation - For example, you would always convert full-width letters and numbers in Japanese to their half-width counterparts in English. The converse is not necessarily true.

  • Regular expressions - Stuff like /(\d+)年(\d+)月(\d+)日/\2\/\3\/\1/ (i.e. 1984年11月23日 # => 11/23/1984)

  • Importing/exporting text from/to Office documents - Currently only working on Mac (support for Word planned, but for *nix it's probably too difficult).

Installation

Stable

With Ruby installed, run the following in your terminal:

gem install konjac

Development

With Ruby, Git and Bundler installed, navigate in your command line to a directory of your choice, then run:

git clone git://github.com/brymck/konjac.git
cd konjac
bundle update
rake install

Usage

Translate all text files in the current directory from Japanese into English:

konjac translate *.txt --from japanese --to english
konjac translate *.txt -f ja -t en

Use multiple dictionaries:

konjac translate financial_report_en.txt --to japanese --using {finance,fluffery}
konjac translate financial_report_en.txt -t ja -u {finance,fluffery}

Extract text from a .docx? document (creates a plain-text test.konjac file from test.docx):

konjac export test.doc
konjac export test.docx

Extract text from a .docx document and process with a dictionary

konjac export test.docx --from japanese --to english --using pre
konjac export test.docx -f ja -t en -u pre

Import tags file back into .docx document (for .doc files, this opens the file in word and imports the changes; for .docx files this outputs a new file named test_imported.docx):

konjac import test.doc
konjac import test.docx

Add a word to your dictionary:

konjac add --original dog --from english --translation 犬 --to japanese
konjac add -o dog -f en -r 犬 -t ja

Translate a word using your dictionary:

konjac translate dog --from english --to japanese --word
konjac translate dog -f en -t ja -w

Suggest a word using your dictionary:

konjac suggest dog --from english --to japanese
konjac suggest dog -f en -t ja

Ruby

Create a Suggestor object:

require "konjac"
Konjac::Dictionary.add_word :from => :en, :original => "word",
                            :to => :ja, :translation => "言葉"
s = Konjac::Suggestor.new(:en, :ja)
s.suggest "word" # => [[1.0, "word", "言葉"]]

Dictionary Format

Store terms in ~/.konjac/dict.yml.

Simple (two-way equivalent terms) - English “I” is equivalent to Spanish “yo”:

-
  en: I
  es: yo

Not as simple - Japanese lacks a plural, therefore both “dog” and “dogs” translate as 犬:

-
  en: dog
  ja:
    ja: 犬
    en: dogs?
    regex: true  # i.e. the regular expression /dogs?/

Documentation

Should be simple enough to generate yourself:

rm -rf konjac
git clone git://github.com/brymck/konjac
cd konjac
bundle update
rake rdoc
rm -rf !(doc)
mv doc/rdoc/* .
rm -rf doc

Supplementary Stuff

Name

Hon'yaku means “translation” in Japanese. This utility relies on a YAML wordlist. Konnyaku (Japanese for “konjac”) rhymes with hon'yaku and is a type of yam. Also, Doraemon had something called a hon'yaku konnyaku that allowed him to speak every language. IIRC it worked with animals too. But I digress.