Commits on Sep 14, 2011
  1. @gpeterson2

    Update to use the observer pattern.

    As print can only get you so far.
    gpeterson2 committed Sep 14, 2011
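A minimal sketch of the observer approach this commit describes, assuming a parser that reports progress to subscribed callbacks instead of calling print directly. The names `Parser`, `subscribe`, and `notify` are hypothetical and not taken from the repository.

```python
class Parser:
    """Hypothetical stand-in for the project's parser."""

    def __init__(self):
        self._observers = []

    def subscribe(self, callback):
        """Register a callable that receives each status message."""
        self._observers.append(callback)

    def notify(self, message):
        """Pass a status message to every registered observer."""
        for callback in self._observers:
            callback(message)

    def parse(self, items):
        """Pretend to parse items, reporting progress to observers."""
        parsed = []
        for number, item in enumerate(items, start=1):
            parsed.append(item)
            self.notify("parsed entry %d: %s" % (number, item))
        return parsed


messages = []
parser = Parser()
parser.subscribe(messages.append)  # collect messages, e.g. for a log or GUI
parser.subscribe(print)            # printing becomes just one more observer
result = parser.parse(["neko", "inu"])
```

The parser no longer needs to know where its messages go, so a test can subscribe a list and a command line tool can subscribe print.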
Commits on Sep 13, 2011
  1. Updates to get this running on Windows.

    - Removed instances of sqlalchemy from main. May add it in again later, but
    not in the main program.
    - Removed some unnecessary sections from the parser.
    - Updated the data writer to handle the new entries list. Not 100% working yet;
    still running into encoding issues, either when writing to the db or in the
    Windows shell.
    Greg Peterson committed Sep 13, 2011
Commits on Aug 15, 2011
  1. Fix for parser being a built-in on Windows.

    It may be from some other libraries I installed, but on Windows 7
    running "from parser import Parser" led to a name collision. This
    actually makes sense to do in general, though.
    
    Also added a .gz ignore because I downloaded the JMdict file locally
    here.
    Greg Peterson committed Aug 15, 2011
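One way to check for this kind of collision, sketched with the standard `sys.builtin_module_names` tuple. The standard library shipped its own `parser` module through Python 3.9, and built-in modules are found before anything on `sys.path`, so a local `parser.py` could lose out to it. Whether `parser` is compiled in as a built-in depends on the Python version and build, so the sketch assumes nothing about that result. The helper name `is_builtin_module` is made up for illustration.

```python
import sys


def is_builtin_module(name):
    """Return True if `name` is compiled into this interpreter."""
    return name in sys.builtin_module_names


# `sys` is always built in; the answer for `parser` varies by
# Python version and platform, so it is only printed here.
print(is_builtin_module("sys"))
print(is_builtin_module("parser"))
```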
Commits on Aug 4, 2011
  1. @gpeterson2

    Add unit tests for the parser.

    Also required updating the parser to return a list of Entry objects rather
    than the combination of items from before. Probably a better decision overall,
    but this means that the database insert no longer works.
    
    Also created a gloss class due to the annoyance of dealing with tuples.
    
    Finally moved the JMdict file to the main directory as it tends to be easier
    to type when testing. So added it to the ignore file.
    gpeterson2 committed Aug 3, 2011
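A small sketch of what replacing tuples with a gloss class can look like. The field names (`text`, `lang`, `pos`, `kanji`, `kana`, `glosses`) are assumptions for illustration, and `dataclasses` is a modern convenience that did not exist when this commit was made.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Gloss:
    """One translation of an entry (hypothetical fields)."""
    text: str
    lang: str = "eng"
    pos: str = ""


@dataclass
class Entry:
    """One dictionary entry holding its readings and glosses."""
    kanji: List[str] = field(default_factory=list)
    kana: List[str] = field(default_factory=list)
    glosses: List[Gloss] = field(default_factory=list)


entry = Entry(kanji=["猫"], kana=["ねこ"], glosses=[Gloss("cat", pos="n")])
first = entry.glosses[0]
print(first.text, first.lang, first.pos)
```

Named attributes such as `first.lang` replace positional access like `gloss[1]`, which is the main annoyance with tuples.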
Commits on Mar 13, 2011
  1. @gpeterson2

    Add "warehouse" table.

    Basically a simple collection of all of the data, un-normalized,
    but faster to search and display.
    
    The sad fact of the matter is that it's probably exactly what I
    want, aside from the glosses not containing all of their information.
    
    Well I should say I do still want to have a normalized table to
    do random querying against, but for the other immediate tools I want to
    build this will probably work perfectly (i.e. it's fast).
    gpeterson2 committed Mar 13, 2011
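A rough sketch of the "warehouse" idea using the standard `sqlite3` module: the normalized tables stay in place and a flat copy is built from them for fast lookups. All table and column names here are assumptions, not the project's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalized tables.
    CREATE TABLE entry (id INTEGER PRIMARY KEY);
    CREATE TABLE kanji (entry_id INTEGER, text TEXT);
    CREATE TABLE kana  (entry_id INTEGER, text TEXT);
    CREATE TABLE gloss (entry_id INTEGER, text TEXT);

    INSERT INTO entry VALUES (1);
    INSERT INTO kanji VALUES (1, '猫');
    INSERT INTO kana  VALUES (1, 'ねこ');
    INSERT INTO gloss VALUES (1, 'cat');

    -- Flat, un-normalized copy: one row per kanji/kana/gloss combination.
    CREATE TABLE warehouse AS
        SELECT e.id AS entry_id,
               kj.text AS kanji,
               kn.text AS kana,
               g.text AS gloss
        FROM entry e
        LEFT JOIN kanji kj ON kj.entry_id = e.id
        LEFT JOIN kana  kn ON kn.entry_id = e.id
        LEFT JOIN gloss g  ON g.entry_id  = e.id;
""")

# Searching the flat table needs no joins at query time.
rows = conn.execute(
    "SELECT kanji, kana, gloss FROM warehouse WHERE gloss = ?", ("cat",)
).fetchall()
print(rows)
```

The trade-off is duplicated data and a table that has to be rebuilt after each import, in exchange for join-free reads.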
  2. @gpeterson2

    Add comments and redo the message system to support unicode.

    Also added a to string to entries, for testing purposes. It
    doesn't handle the glosses very well yet.
    gpeterson2 committed Mar 13, 2011
  3. @gpeterson2

    Add comments

    gpeterson2 committed Mar 13, 2011
  4. @gpeterson2

    Changed how output messages are handled.

    As using print statements is nice for debugging, but not for much
    else.
    gpeterson2 committed Mar 13, 2011
  5. @gpeterson2

    Split data parsing and data saving function.

    Should have done this a while ago. I'm kind of toying with the
    idea of going back and setting up the ORM again, but I'm still
    concerned with performance issues.
    gpeterson2 committed Mar 13, 2011
Commits on Feb 10, 2011
  1. @gpeterson2

    Cleaned up code a bit more.

    Still fast, and seems to get all the correct pairings. But querying
    everything is still incredibly slow.
    gpeterson2 committed Feb 9, 2011
Commits on Jan 14, 2011
  1. @gpeterson2

    Tried setting up a couple specific dictionaries.

    Using those instead to track joins. Not sure it's really helping.
    Things seem to be missing from the final listing, but I can't
    quite seem to track them down.
    
    I wonder if it might be easier to keep a list of joins, and read
    the ids back out from the table? Then again the big problem seems
    to be the lack of a kana/kanji link and the speed of reading the list
    back out more than anything else.
    gpeterson2 committed Jan 13, 2011
Commits on Dec 2, 2010
  1. @gpeterson2

    Added kana/kanji ids.

    Still slow to query the entire thing.
    gpeterson2 committed Dec 1, 2010
  2. @gpeterson2

    Add ids for join table.

    Incrementing my own ids as a solution.
    
    I don't like how I figured out how to do this, and I certainly
    don't like that it's valid sqlite syntax...but such is life.
    
    The sub selects weren't working, so I guess for now this works.
    Then again I was trying to cut out everything I could to increase
    the speed of the conversion.
    gpeterson2 committed Dec 1, 2010
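The commit does not show the SQL it used, so the following is a swapped-in technique rather than the author's: ids are assigned in Python before the insert, which lets join rows be built without sub-selects. The helper name `assign_ids` and the sample data are hypothetical.

```python
from itertools import count


def assign_ids(values, start=1):
    """Map each distinct value to an incrementing integer id."""
    ids = {}
    counter = count(start)
    for value in values:
        if value not in ids:
            ids[value] = next(counter)
    return ids


# Duplicate readings share one id.
kana_ids = assign_ids(["ねこ", "いぬ", "ねこ"])

# (entry_id, reading) pairs as they might come out of a parser.
pairs = [(1, "ねこ"), (2, "いぬ")]

# Join rows reference the pre-assigned ids directly.
join_rows = [(entry_id, kana_ids[reading]) for entry_id, reading in pairs]
print(join_rows)
```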
  3. @gpeterson2

    Add gloss entry join table.

    Thinking about it, I'm not entirely sure that the sense element
    was entirely necessary. I couldn't find a unique way to identify
    them aside from the entire element, so it made sense to just move
    the part of speech into the gloss and run with that.
    
    Who knows, I may be completely wrong about it.
    
    It is becoming increasingly obvious that I need to reference ids
    somehow. This join table is noticeably slow, so figuring that out
    will be the first step to try fixing it.
    gpeterson2 committed Dec 1, 2010
  4. @gpeterson2

    Added kanji/kana to entry join tables.

    I really need a way to get ids at this point. It seems that querying
    the table as is right now is noticeably slower than before. Or maybe
    it's just my imagination.
    
    Working on the pos now. Simplified the error checking before
    thinking that it should be a separate commit.
    gpeterson2 committed Dec 1, 2010
  5. @gpeterson2

    Fix to add lists directly to database.

    Rather than using intermediary files.
    
    Ran into an odd issue where it seemed the cursor was trying to
    iterate over the characters in the text rather than taking them
    as a single item. Actually that may be the correct way of handling
    these lists...either way, I just ran a list comprehension on them
    to get everything into the correct values.
    gpeterson2 committed Dec 1, 2010
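The behaviour described in this commit can be reproduced with the standard `sqlite3` module: each item passed to `executemany` is treated as a sequence of parameters, and a plain string is a sequence of characters. Wrapping every value in a one-element tuple, as a list comprehension does, is the usual fix. The table and column names below are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kana (text TEXT)")  # hypothetical table

readings = ["ねこ", "いぬ"]

# Passing bare strings: each two-character string is read as two
# parameters for a statement that has only one placeholder.
error = None
try:
    conn.executemany("INSERT INTO kana (text) VALUES (?)", readings)
except sqlite3.Error as exc:
    error = exc

# Wrapping each value in a one-element tuple gives one parameter per row.
rows = [(reading,) for reading in readings]
conn.executemany("INSERT INTO kana (text) VALUES (?)", rows)

count = conn.execute("SELECT COUNT(*) FROM kana").fetchone()[0]
print(type(error).__name__, count)
```

Note that a one-character string would bind without any error, which makes this mistake easy to miss on small test data.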
  6. @gpeterson2

    Redid the parser to not use sqlalchemy for inserts.

    The files (while not yet quite complete) insert all of this data
    in seconds rather than hours.
    
    Perhaps there are things that can be done with sqlalchemy to speed
    things up. Although I doubt it will ever match a bulk insert.
    
    That's not a huge priority, I started using it mostly to make
    querying easier not necessarily insertions.
    gpeterson2 committed Dec 1, 2010
Commits on Nov 30, 2010
  1. @gpeterson2

    Fix for missing pos in dictionary.

    Seems to be "none" although I'm not sure why that would be included
    in the xml file...
    
    Well I should remember to do further testing assuming I run it against
    the whole file.
    gpeterson2 committed Nov 30, 2010
  2. @gpeterson2
  3. @gpeterson2

    Add part of speech table.

    Hardcoded it into the parser because I don't think it's possible
    to easily read out of the xml file (at least not with lxml). I
    doubt these will change that much at this point, but you never know.
    
    The next step is to alter the parser so this information can be
    linked to the sense element.
    gpeterson2 committed Nov 30, 2010
Commits on Nov 29, 2010
  1. @gpeterson2

    Add language attribute into the gloss table.

    It's being grabbed in a roundabout way simply because I didn't want
    to hardcode the whole namespace definition. Yeah, that's a little
    lazy, but for now it works.
    
    I also don't like how I'm displaying these, but then again the
    __str__ method was only meant to be used for debugging purposes.
    I'll have to write more robust print code eventually anyway, so
    this can wait for now.
    gpeterson2 committed Nov 29, 2010
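For reference, a sketch of both ways to read the language attribute. It uses the standard library's `xml.etree.ElementTree` instead of lxml to stay self-contained; both expose `xml:lang` under the same namespace-qualified name. The function names, and the `"eng"` fallback for glosses without the attribute, are assumptions.

```python
import xml.etree.ElementTree as ET

# Fully qualified name of the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def gloss_lang(element, default="eng"):
    """Read xml:lang using the hard-coded namespace."""
    return element.get(XML_LANG, default)


def gloss_lang_roundabout(element, default="eng"):
    """Read xml:lang without spelling out the namespace."""
    for key, value in element.attrib.items():
        if key.endswith("}lang"):
            return value
    return default


german = ET.fromstring('<gloss xml:lang="ger">Katze</gloss>')
english = ET.fromstring("<gloss>cat</gloss>")
print(gloss_lang(german), gloss_lang_roundabout(german), gloss_lang(english))
```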
  2. @gpeterson2
  3. @gpeterson2

    Rename model to models

    That's another thing which has been bothering me for a while.
    It makes more sense as a plural given that there is more than
    one model.
    gpeterson2 committed Nov 29, 2010
  4. @gpeterson2

    Fix for dropping and creating tables before insert.

    That's been bugging me for a while.
    
    It probably shouldn't need to be called explicitly, although the
    option is nice.
    
    Also I'm not 100% sure if this is the best way to handle setting
    the sqlalchemy objects, but the globals weren't being set correctly
    the way I was importing and initializing them before.
    gpeterson2 committed Nov 29, 2010
  5. @gpeterson2

    Add command line args.

    Default is nothing, a file may be imported, or contents listed.
    gpeterson2 committed Nov 29, 2010
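A possible shape for these arguments, sketched with the standard `argparse` module. The flag names `--import` and `--list` and the helper functions are hypothetical; the commit only says that a file may be imported, contents may be listed, and the default does nothing.

```python
import argparse


def build_parser():
    """Build a command line parser with hypothetical flags."""
    parser = argparse.ArgumentParser(description="JMdict import tool (sketch)")
    parser.add_argument("--import", dest="import_file", metavar="FILE",
                        help="import the given JMdict file")
    parser.add_argument("--list", action="store_true",
                        help="list the stored contents")
    return parser


def describe(argv):
    """Return the action selected by the given argument list."""
    args = build_parser().parse_args(argv)
    if args.import_file:
        return "import %s" % args.import_file
    if args.list:
        return "list"
    return "nothing"


print(describe([]))
print(describe(["--import", "JMdict"]))
print(describe(["--list"]))
```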
Commits on Nov 18, 2010
  1. @gpeterson2

    Revert to many-to-one relationship

    If I were smart I would have actually reverted...
    
    Anyway, it makes more sense the way it was, but I could come
    back to this in the future.
    gpeterson2 committed Nov 17, 2010
  2. @gpeterson2
  3. @gpeterson2
  4. @gpeterson2

    Almost complete redesign of the database.

    I wanted to move the ids in the entry, although this still seems
    like it should be many-to-many.
    
    Also renamed the kana/kanji elements as k_ele and r_ele were
    getting annoying to remember.
    
    TODO:
    - Still need to rename the columns in the database to something
    easier to understand.
    gpeterson2 committed Nov 17, 2010
Commits on Nov 17, 2010
  1. @gpeterson2

    Add kana, kanji, and start to gloss values to parser.

    Still incredibly slow, and there seems to be an issue
    with duplicate entries.
    
    Still, that might be fine if all I care about is filling
    the db and only using it later. The one test I tried
    showed that I could get results fast enough.
    gpeterson2 committed Nov 16, 2010
Commits on Sep 28, 2010
  1. @gpeterson2

    Add entry entries.

    The commits still take forever.
    
    The reb element isn't complete either, if it's inside a r_elem(?)
    element then it has a slightly different meaning which should
    be preserved.
    
    Also this will just blindly add the elements to the database; it
    should be deleted/dropped prior to this.
    gpeterson2 committed Sep 27, 2010
  2. @gpeterson2

    Fix so that an element will only be added on start.

    Instead of trying it again on the end tag as well.
    gpeterson2 committed Sep 27, 2010
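A sketch of the double-handling problem, assuming an event-based parse. With both "start" and "end" events requested, every element is seen twice, so adding on each event duplicates it. The sample XML is made up, and the standard library's `iterparse` is used here in place of lxml.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = (b"<JMdict>"
          b"<entry><ent_seq>1</ent_seq></entry>"
          b"<entry><ent_seq>2</ent_seq></entry>"
          b"</JMdict>")

# Acting on every event handles each <entry> twice.
seen_twice = [
    element.tag
    for event, element in ET.iterparse(io.BytesIO(SAMPLE), events=("start", "end"))
    if element.tag == "entry"
]

# Acting only on the start event handles each <entry> once.
seen_once = [
    element.tag
    for event, element in ET.iterparse(io.BytesIO(SAMPLE), events=("start", "end"))
    if event == "start" and element.tag == "entry"
]

print(len(seen_twice), len(seen_once))
```

One caveat: an element's text and children are only guaranteed to be complete at its "end" event, so code that needs the content usually filters on "end" instead.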
Commits on Aug 13, 2010
  1. @gpeterson2

    Add an incomplete parser and main model classes.

    TODO:
     * The names could use some clarification in order to make the objects
        easier to use.
    
     * It takes quite a while to read the file right now, the code should
        be profiled, but the item lookup is a likely cause. Maybe it would
        be possible to sacrifice memory for speed?
    gpeterson2 committed Aug 12, 2010
  2. @gpeterson2

    Add sqlalchemy init

    gpeterson2 committed Aug 12, 2010
  3. @gpeterson2

    Initially adding.

    gpeterson2 committed Aug 12, 2010