Commits on Sep 14, 2011
  1. @gpeterson2

    Update to use the observer pattern.

    gpeterson2 authored
    As print can only get you so far.
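
A minimal sketch of what replacing print calls with the observer pattern might look like. The names `Subject`, `attach`, `notify`, and `Importer` are illustrative assumptions, not the repository's actual classes.

```python
class Subject:
    """Keeps a list of observer callables and forwards messages to them."""

    def __init__(self):
        self._observers = []

    def attach(self, observer):
        # An observer is any callable that accepts one message argument.
        self._observers.append(observer)

    def notify(self, message):
        for observer in self._observers:
            observer(message)


class Importer(Subject):
    """Hypothetical stand-in for the parser/writer; reports its progress."""

    def run(self, entries):
        for count, _entry in enumerate(entries, start=1):
            self.notify("processed %d" % count)


received = []
importer = Importer()
importer.attach(received.append)  # collect messages instead of printing them
importer.run(["a", "b"])
```

Attaching `print` as the observer would reproduce the old behaviour, while a GUI or logger could subscribe without the importer changing.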
Commits on Sep 13, 2011
  1. Updates to get this running on Windows.

    Greg Peterson authored
    - Removed instances of sqlalchemy from main. May add it in again later, but
    not in the main program.
    - Removed some unnecessary sections from the parser.
    - Updated the data writer to handle the new entries list. Not 100% working yet;
    still running into encoding issues, either when writing to the db or in the
    Windows shell.
Commits on Aug 15, 2011
  1. Fix for parser being a built-in on Windows.

    Greg Peterson authored
    It may be from some other libraries I installed, but on Windows 7
    running "from parser import Parser" led to a name collision. This
    actually makes sense to do in general, though.
    Also added a .gz ignore because I downloaded the JMdict file locally.
Commits on Aug 4, 2011
  1. @gpeterson2

    Add unit tests for the parser.

    gpeterson2 authored
    Also required updating the parser to return a list of Entry objects rather
    than the combination of items from before. Probably a better decision overall,
    but this means that the database insert no longer works.
    Also created a gloss class due to the annoyance of dealing with tuples.
    Finally, moved the JMdict file to the main directory as it tends to be easier
    to type when testing, and added it to the ignore file.
Commits on Mar 13, 2011
  1. @gpeterson2

    Add "warehouse" table.

    gpeterson2 authored
    Basically a simple collection of all of the data, un-normalized,
    but faster to search and display.
    The sad fact of the matter is that it's probably exactly what I
    want, aside from the glosses not containing all of their information.
    Well I should say I do still want to have a normalized table to
    do random querying against, but for the other immediate tools I want to
    build this will probably work perfectly (i.e. it's fast).
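
A rough sketch of the "warehouse" idea, assuming a simplified schema. The table and column names here (`entry`, `kanji`, `kana`, `gloss`, `warehouse`) are placeholders rather than the project's real ones. The normalized tables are joined once, and the flat result is stored so that later lookups need no joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entry (id INTEGER PRIMARY KEY);
    CREATE TABLE kanji (entry_id INTEGER, text TEXT);
    CREATE TABLE kana  (entry_id INTEGER, text TEXT);
    CREATE TABLE gloss (entry_id INTEGER, text TEXT);

    INSERT INTO entry VALUES (1);
    INSERT INTO kanji VALUES (1, '猫');
    INSERT INTO kana  VALUES (1, 'ねこ');
    INSERT INTO gloss VALUES (1, 'cat');

    -- Flat copy of the joined data: redundant, but no joins at query time.
    CREATE TABLE warehouse AS
        SELECT e.id AS entry_id, kj.text AS kanji, kn.text AS kana, g.text AS gloss
        FROM entry e
        JOIN kanji kj ON kj.entry_id = e.id
        JOIN kana  kn ON kn.entry_id = e.id
        JOIN gloss g  ON g.entry_id = e.id;
""")

rows = conn.execute(
    "SELECT kanji, kana, gloss FROM warehouse WHERE gloss = ?", ("cat",)
).fetchall()
```

The normalized tables stay available for ad hoc querying; the flat table only serves the fast search-and-display path.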
  2. @gpeterson2

    Added comments and redid the message system to support Unicode.

    gpeterson2 authored
    Also added a to string to entries, for testing purposes. It
    doesn't handle the glosses very well yet.
  3. @gpeterson2

    Add comments

    gpeterson2 authored
  4. @gpeterson2

    Changed how output messages are handled.

    gpeterson2 authored
    As using print statements is nice for debugging, but not for much else.
  5. @gpeterson2

    Split data parsing and data saving function.

    gpeterson2 authored
    Should have done this a while ago. I'm kind of toying with the
    idea of going back and setting up the ORM again, but I'm still
    concerned with performance issues.
Commits on Feb 10, 2011
  1. @gpeterson2

    Cleaned up code a bit more.

    gpeterson2 authored
    Still fast, and seems to get all the correct pairings. But querying
    everything is still incredibly slow.
Commits on Jan 14, 2011
  1. @gpeterson2

    Tried setting up a couple specific dictionaries.

    gpeterson2 authored
    Using those instead to track joins. Not sure it's really helping.
    Things seem to be missing from the final listing, but I can't
    quite seem to track them down.
    I wonder if it might be easier to keep a list of joins, and read
    the ids back out from the table? Then again the big problem seems
    to be a lack of kana/kanji link and speed of reading the list back
    out more than anything else.
Commits on Dec 2, 2010
  1. @gpeterson2

    Added kana/kanji ids.

    gpeterson2 authored
    Still slow to query the entire thing.
  2. @gpeterson2

    Add ids for join table.

    gpeterson2 authored
    Incrementing my own ids as a solution.
    I don't like how I figured out how to do this, and I certainly
    don't like that it's valid sqlite syntax...but such is life.
    The sub selects weren't working, so I guess for now this works.
    Then again I was trying to cut out everything I could to increase
    the speed of the conversion.
  3. @gpeterson2

    Add gloss entry join table.

    gpeterson2 authored
    Thinking about it, I'm not entirely sure that the sense element
    was entirely necessary. I couldn't find a unique way to identify
    them aside from the entire element, so it made sense to just move
    the part of speech into the gloss and run with that.
    Who knows, I may be completely wrong about it.
    It is becoming increasingly obvious that I need to reference ids
    somehow. This join table is noticeably slow, so figuring that out
    will be the first step to try fixing it.
  4. @gpeterson2

    Added kanji/kana to entry join tables.

    gpeterson2 authored
    I really need a way to get ids at this point. It seems that querying
    the table as is right now is noticeably slower than before. Or maybe
    it's just my imagination.
    Working on the pos now. Simplified the error checking before
    thinking that it should be a separate commit.
  5. @gpeterson2

    Fix to add lists directly to database.

    gpeterson2 authored
    Rather than using intermediary files.
    Ran into an odd issue where it seemed the cursor was trying to
    iterate over the characters in the text rather than taking them
    as a single item. Actually that may be the correct way of handling
    these lists...either way, I just ran a list comprehension on them
    to get everything into the correct values.
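
The character-iteration issue can be reproduced with the standard `sqlite3` module, assuming that is what sat underneath the cursor. The `kana` table and the sample words are made up for illustration. `executemany` expects a sequence of parameter sequences, so a bare string is read as one parameter per character.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kana (text TEXT)")

words = ["ねこ", "いぬ"]

# Passing the bare strings makes sqlite3 treat each one as a sequence of
# parameters (one per character), which does not match the single "?".
try:
    conn.executemany("INSERT INTO kana (text) VALUES (?)", words)
    failed = False
except sqlite3.Error:
    failed = True

# Wrapping each value in a one-element tuple binds it as a single parameter.
rows = [(word,) for word in words]
conn.executemany("INSERT INTO kana (text) VALUES (?)", rows)

stored = [r[0] for r in conn.execute("SELECT text FROM kana ORDER BY rowid")]
```

So the list comprehension is the expected way to feed a flat list of strings to `executemany`, not a workaround.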
  6. @gpeterson2

    Redid the parser to not use sqlalchemy for inserts.

    gpeterson2 authored
    The files (while not yet quite complete) insert all of this data
    in seconds rather than hours.
    Perhaps there are things that can be done with sqlalchemy to speed
    things up. Although I doubt it will ever match a bulk insert.
    That's not a huge priority, I started using it mostly to make
    querying easier not necessarily insertions.
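
A sketch of the kind of bulk insert described above, using the standard `sqlite3` module directly. The `gloss` table and the generated rows are assumptions for illustration, and no timing is claimed here. The point is only that the whole batch goes through one `executemany` call and one commit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gloss (entry_id INTEGER, text TEXT)")

# Hypothetical parsed data: (entry_id, gloss text) pairs.
parsed = [(i, "gloss %d" % i) for i in range(1000)]

# "with conn" wraps the batch in one transaction and commits once at the end,
# instead of paying for a commit per row.
with conn:
    conn.executemany("INSERT INTO gloss (entry_id, text) VALUES (?, ?)", parsed)

count = conn.execute("SELECT COUNT(*) FROM gloss").fetchone()[0]
```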
Commits on Nov 30, 2010
  1. @gpeterson2

    Fix for missing pos in dictionary.

    gpeterson2 authored
    Seems to be "none" although I'm not sure why that would be included
    in the xml file...
    Well I should remember to do further testing assuming I run it against
    the whole file.
  2. @gpeterson2
  3. @gpeterson2

    Add part of speech table.

    gpeterson2 authored
    Hardcoded it into the parser because I don't think it's possible
    to easily read out of the xml file (at least not with lxml). I
    doubt these will change that much at this point, but you never know.
    The next step is to alter the parser so this information can be
    linked to the sense element.
Commits on Nov 29, 2010
  1. @gpeterson2

    Add language attribute into the gloss table.

    gpeterson2 authored
    It's being grabbed in a roundabout way simply because I didn't want
    to hardcode the whole namespace definition. Yeah, that's a little
    lazy, but for now it works.
    I also don't like how I'm displaying these, but then again using
    the __str__ method was only meant to be used for debugging purposes.
    I'll have to write more robust print code again eventually
    anyway, so this can wait for now.
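
For reference, a sketch of both ways to read the language attribute. It uses the standard library's `xml.etree.ElementTree` rather than lxml so that it runs without extra packages; lxml uses the same `{namespace}name` key notation. The sample `gloss` element is made up.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

gloss = ET.fromstring('<gloss xml:lang="fre">chat</gloss>')

# Explicit form: the attribute key is the namespace URI in braces plus "lang".
explicit = gloss.get("{%s}lang" % XML_NS)

# Roundabout form: match any attribute whose local name is "lang".
roundabout = next(
    (value for key, value in gloss.attrib.items() if key.endswith("}lang")),
    None,
)
```

The `xml` prefix is bound to that namespace URI by the XML Namespaces specification, so hardcoding the constant is less fragile than it looks.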
  2. @gpeterson2
  3. @gpeterson2

    Rename model to models

    gpeterson2 authored
    That's another thing which has been bothering me for a while.
    It makes more sense as a plural given that there is more than
    one model.
  4. @gpeterson2

    Fix for dropping and creating tables before insert.

    gpeterson2 authored
    That's been bugging me for a while.
    It probably shouldn't need to be called explicitly, although the
    option is nice.
    Also I'm not 100% sure if this is the best way to handle setting
    the sqlalchemy objects, but the globals weren't being set correctly
    the way I was importing and initializing them before.
  5. @gpeterson2

    Add command line args.

    gpeterson2 authored
    Default is nothing, a file may be imported, or contents listed.
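
One possible shape for those arguments, sketched with `argparse`. The flag names `--import` and `--list` and the helper `describe` are hypothetical, since the actual options are not shown in the log.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="JMdict import tool (sketch)")
    # Hypothetical flag names; the real script's options are not shown here.
    parser.add_argument("--import", dest="import_file", metavar="FILE",
                        help="path of the JMdict file to import")
    parser.add_argument("--list", dest="list_entries", action="store_true",
                        help="list the stored entries")
    return parser


def describe(argv):
    """Return which action the given argument list would trigger."""
    args = build_parser().parse_args(argv)
    if args.import_file:
        return "import " + args.import_file
    if args.list_entries:
        return "list"
    return "nothing"
```

Calling `describe([])` falls through to the "nothing" default, matching the behaviour described in the commit.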
Commits on Nov 18, 2010
  1. @gpeterson2

    Revert to many-to-one relationship

    gpeterson2 authored
    If I was smart I would have actually reverted...
    Anyway, it makes more sense the way it was, but I could come
    back to this in the future.
  2. @gpeterson2
  3. @gpeterson2
  4. @gpeterson2

    Almost complete redesign of the database.

    gpeterson2 authored
    I wanted to move the ids in the entry, although this still seems
    like it should be many-to-many.
    Also renamed the kana/kanji elements as k_ele and r_ele were
    getting annoying to remember.
    - Still need to rename the columns in the database to something
    easier to understand.
Commits on Nov 17, 2010
  1. @gpeterson2

    Add kana, kanji, and start to gloss values to parser.

    gpeterson2 authored
    Still incredibly slow, and there seems to be an issue
    with duplicate entries.
    Still, that might be fine if all I care about is filling
    the db, and only use it later. The one test I tried
    showed that I could get results fast enough.
Commits on Sep 28, 2010
  1. @gpeterson2

    Add entry entries.

    gpeterson2 authored
    The commits still take forever.
    The reb element isn't complete either, if it's inside a r_elem(?)
    element then it has a slightly different meaning which should
    be preserved.
    Also this will just blindly add the elements to the database, it
    should be deleted/dropped prior to this.
  2. @gpeterson2

    Fix so that an element will only be added on start.

    gpeterson2 authored
    Instead of trying it again on the end tag as well.
Commits on Aug 13, 2010
  1. @gpeterson2

    Add an incomplete parser and main model classes.

    gpeterson2 authored
     * The names could use some clarification in order to make the objects
        easier to use.
     * It takes quite a while to read the file right now; the code should
        be profiled, but the item lookup is a likely cause. Maybe it would
        be possible to sacrifice memory for speed?
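
If item lookup does turn out to be the bottleneck (unverified without profiling), one way to trade memory for speed is to keep every seen item in an in-memory dict instead of looking it up again. The helper `make_id_lookup` below is a hypothetical illustration, not code from the repository.

```python
def make_id_lookup():
    """Return a function that maps each distinct text to a stable integer id."""
    ids = {}

    def lookup(text):
        # A dict lookup is O(1) on average; the whole mapping stays in memory.
        if text not in ids:
            ids[text] = len(ids) + 1
        return ids[text]

    return lookup


kana_id = make_id_lookup()
first = kana_id("ねこ")
second = kana_id("いぬ")
again = kana_id("ねこ")  # repeated text reuses the id assigned earlier
```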
  2. @gpeterson2

    Add sqlalchemy init

    gpeterson2 authored
  3. @gpeterson2

    Initially adding.

    gpeterson2 authored