Japanese tokenizer and misc #55

Merged 14 commits on Sep 10, 2012