rogerbraun edited this page Nov 24, 2011 · 7 revisions

WaDokuJT dictionary search

Picky is used to find dictionary entries in WaDokuJT, the largest Japanese-German dictionary.

Amount of Data

There are around 250,000 entries in the WaDokuJT file, each with around five main fields. The file is only 60 MB, but the field holding the German side of an entry has a lot of internal structure that would in other cases often be modeled with database relations. These relations carry a lot of semantic information and have to be remodeled in the indexing step.

Features

All categories are indexed with full partial search. In addition, several virtual fields are created at indexing time, such as the romaji field, which is generated directly from the Japanese characters. More virtual fields, such as headwords and place names, are planned. Picky makes this very easy, as you can just write these virtual fields with standard Ruby code.
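To make the idea of a virtual field concrete, here is a minimal sketch in plain Ruby. It is not the code used on wadoku.eu: the `Entry` struct, the tiny `KANA_TO_ROMAJI` table, and the sample entry are all hypothetical, and wiring the method into a Picky index is not shown. A real converter would need the full syllabary plus rules for long vowels, small kana, and kanji readings.

```ruby
# encoding: utf-8

# Hypothetical, deliberately incomplete kana table (assumption for this sketch).
KANA_TO_ROMAJI = {
  'あ' => 'a',  'い' => 'i',   'う' => 'u',   'え' => 'e',  'お' => 'o',
  'か' => 'ka', 'き' => 'ki',  'く' => 'ku',  'け' => 'ke', 'こ' => 'ko',
  'さ' => 'sa', 'し' => 'shi', 'す' => 'su',  'せ' => 'se', 'そ' => 'so',
  'た' => 'ta', 'ち' => 'chi', 'つ' => 'tsu', 'て' => 'te', 'と' => 'to',
  'な' => 'na', 'に' => 'ni',  'ぬ' => 'nu',  'ね' => 'ne', 'の' => 'no',
  'ん' => 'n'
}.freeze

# Hypothetical entry type; the real WaDokuJT entries have more fields.
Entry = Struct.new(:id, :kana, :german) do
  # Virtual field: not stored in the source file, computed from the kana.
  # Characters missing from the table are passed through unchanged.
  def romaji
    kana.each_char.map { |char| KANA_TO_ROMAJI.fetch(char, char) }.join
  end
end

entry = Entry.new(1, 'さかな', 'Fisch')
puts entry.romaji
```

Because the field is just a method on the entry object, adding another virtual field later (for example a headword) would only mean adding another method and re-running the indexer.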

Speed

Queries are often just one-word searches and not very complex. Picky can usually serve a request in under a millisecond. A "LIKE %"-based SQLite search on the same data took around 2 to 3 seconds.
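The gap between the two approaches can be illustrated with a toy example. The sketch below is an assumption-laden illustration in plain Ruby, not Picky's internal data structure and not the SQLite setup that was measured: the three sample words and both helper methods are hypothetical. It contrasts a prefix index built ahead of time with a scan that has to test every row, which is roughly what a LIKE-style query does.

```ruby
# Hypothetical sample data: entry id => indexed word.
words = { 1 => 'fisch', 2 => 'fischer', 3 => 'haus' }

# Indexing time: map every prefix of every word to the ids that match it.
prefix_index = Hash.new { |hash, key| hash[key] = [] }
words.each do |id, word|
  (1..word.length).each { |len| prefix_index[word[0, len]] << id }
end

# Query time with an index: one hash lookup, no pass over the entries.
def indexed_search(index, query)
  index.fetch(query, [])
end

# Query time without an index: every row has to be tested.
def scan_search(words, query)
  words.select { |_id, word| word.start_with?(query) }.keys
end

p indexed_search(prefix_index, 'fisch')
p scan_search(words, 'fisch')
```

Both calls return the same ids. The trade-off is that the index does its work once, at indexing time, and takes more memory, while the scan repeats its work on every query.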

Indexing is very fast, too. The server is an Xserve3,1 with a 2.26 GHz quad-core Xeon.

edv@rokuhara:~/Sites/picky_speed_test$ time bundle exec rake index
Loaded picky with environment 'development' in /Users/edv/Sites/picky_speed_test on Ruby 1.9.2.
Application  loaded.
[...]
real    8m39.234s
user    11m46.977s
sys     1m11.744s

Final remarks

Using Picky is one of the things that make wadoku.eu good and easy to use. Having just one search field instead of the usual "advanced search" is great, and we expect it to be a real advantage. This still has to be tested by our users, though.

Having your search completely separated from your database design is a huge relief and makes it easy to change and optimize search and database functions independently. Picky can also serve as a lightweight search API for third-party services that want to use our data, without any additional work on our part.