Commits on Jun 19, 2011
  1. @piskvorky
  2. @piskvorky
  3. @piskvorky

    up version: 0.8.0rc1

    piskvorky committed Jun 20, 2011
  4. @piskvorky
  5. @piskvorky
Commits on Jun 18, 2011
  1. @piskvorky

    improved doc strings

    piskvorky committed Jun 18, 2011
Commits on Jun 16, 2011
  1. @piskvorky
  2. @piskvorky

    Added chunking for lsi[corpus] transformation (about 3x faster)

    * before, lsi[corpus] was just syntactic sugar for (lsi[doc] for doc in corpus)
    * now, lsi[corpus] proceeds in chunks of documents (256 by default) and transforms each entire chunk at once
    * the reason is that transforming a whole chunk is a single matrix * matrix multiplication, which is faster than 256 separate single-document transforms (matrix * vector multiplications), thanks to better cache use etc.
    piskvorky committed Jun 16, 2011
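The chunking idea in this commit can be sketched with NumPy. This is a minimal illustration, not gensim's actual LSI code: it assumes a dense projection matrix of shape (num_topics, num_terms) and dense document vectors, and the function names are hypothetical. Both functions return the same topic vectors; the chunked one replaces many matrix * vector products with one matrix * matrix product per chunk.

```python
import numpy as np

def transform_one_by_one(projection, docs):
    """Project each document separately: one matrix * vector multiply per doc."""
    return [projection @ doc for doc in docs]

def transform_chunked(projection, docs, chunksize=256):
    """Project documents in chunks: one matrix * matrix multiply per chunk."""
    results = []
    for start in range(0, len(docs), chunksize):
        # stack the chunk's document vectors as columns: shape (num_terms, len(chunk))
        chunk = np.column_stack(docs[start:start + chunksize])
        topics = projection @ chunk  # shape (num_topics, len(chunk))
        # split the result back into one topic vector per document
        results.extend(topics.T)
    return results

rng = np.random.default_rng(0)
projection = rng.standard_normal((10, 1000))  # hypothetical: 10 topics, 1000 terms
docs = [rng.standard_normal(1000) for _ in range(600)]
slow = transform_one_by_one(projection, docs)
fast = transform_chunked(projection, docs)
```

The results agree up to floating-point rounding; how much faster the chunked version runs depends on the BLAS library and hardware.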
Commits on Jun 15, 2011
  1. @piskvorky
  2. @piskvorky
  3. @piskvorky
  4. @piskvorky
Commits on Jun 13, 2011
  1. @piskvorky
  2. @piskvorky
  3. @piskvorky
  4. @piskvorky

    updated docs to reflect PEP8 changes

    * also fixed and updated several doc strings and comments, esp. docsim.py
    piskvorky committed Jun 13, 2011
  5. @piskvorky

    added chunking to Similarity

    piskvorky committed Jun 13, 2011
  6. @piskvorky

    added unit tests for similarities

    * 1st working version of sharded Similarity
    piskvorky committed Jun 13, 2011
Commits on Jun 12, 2011
  1. @piskvorky
Commits on Jun 10, 2011
  1. @piskvorky

    removed threaded chunking

    * users reported problems and the speed gain was small...
    * now uses simple itertools.groupby to chunk again, like in 0.7.7
    piskvorky committed Jun 9, 2011
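Chunking a stream with itertools.groupby, as this commit describes, can be sketched as follows. It is a simplified illustration, not gensim's exact code: each document is keyed by its position integer-divided by the chunk size, so consecutive documents sharing a key form one chunk.

```python
import itertools

def chunkize_serial(corpus, chunksize):
    """Yield successive lists of at most `chunksize` documents from `corpus`."""
    # key = position // chunksize, so docs 0..chunksize-1 share key 0, and so on
    for _, group in itertools.groupby(enumerate(corpus), key=lambda item: item[0] // chunksize):
        # materialize the group before groupby advances to the next key
        yield [doc for _, doc in group]

chunks = list(chunkize_serial(range(7), 3))
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6]]
```

No threads are involved: each chunk is built lazily in the calling thread, which avoids the stalls reported with the threaded version.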
Commits on Jun 9, 2011
  1. @piskvorky
  2. @piskvorky
  3. @piskvorky

    re #10: PEP8-fied function/variable names

    * backwards incompatible, breaks all existing code!
    * but the changes are straightforward: numTopics => num_topics, addDocuments => add_documents etc.
    * documentation to be updated in a separate commit
    piskvorky committed Jun 7, 2011
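The renaming convention in this commit (camelCase to PEP 8 snake_case for function and variable names) can be illustrated with a small hypothetical helper, `camel_to_snake`, which is not part of gensim. It only shows the general rule; the authoritative new names are whatever the commit actually chose, and a few may not follow this mechanical conversion.

```python
import re

def camel_to_snake(name):
    """Convert a camelCase name to PEP 8 style snake_case (hypothetical helper)."""
    # insert "_" between a lowercase letter/digit and the following capital, then lowercase everything
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()

for old in ("numTopics", "addDocuments"):
    print(old, "=>", camel_to_snake(old))
# numTopics => num_topics
# addDocuments => add_documents
```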
Commits on Jun 7, 2011
  1. @piskvorky

    * backwards incompatible, breaks all existing code!
    * but the changes are straightforward: numTopics => num_topics, addDocuments => add_documents etc.
    piskvorky committed Jun 7, 2011
  2. @piskvorky
Commits on Jun 6, 2011
  1. @piskvorky

    deleted old unused SVD algos

    piskvorky committed Jun 6, 2011
Commits on Jun 1, 2011
  1. @piskvorky

    turn off threading in chunking by default

    * users reported problems (gensim stalling indefinitely, possibly a deadlock), see http://groups.google.com/group/gensim/browse_thread/thread/c834e0c61eb50548
    piskvorky committed Jun 1, 2011
Commits on May 25, 2011
  1. @piskvorky
  2. @piskvorky

    unitVec returns scipy.sparse output for scipy.sparse input (was: returns dense numpy array)
    piskvorky committed May 25, 2011
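The behaviour change in this commit (sparse in, sparse out) can be sketched with a hypothetical `unit_vec` function; this is an illustration under that assumption, not gensim's actual unitVec. Sparse input is normalized without densifying, and dense input yields a dense numpy array.

```python
import numpy as np
import scipy.sparse

def unit_vec(vec):
    """Scale `vec` to unit Euclidean length, keeping sparse input sparse (hypothetical)."""
    if scipy.sparse.issparse(vec):
        vec = vec.tocsr()
        # sum of squares over the stored non-zeros only
        norm = np.sqrt(vec.multiply(vec).sum())
        # dividing a sparse matrix by a scalar returns a sparse matrix
        return vec / norm if norm > 0 else vec
    vec = np.asarray(vec, dtype=float)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```

Returning sparse output avoids allocating a full dense vector for high-dimensional, mostly-zero inputs.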
Commits on May 21, 2011
  1. @piskvorky
Commits on May 20, 2011
  1. @piskvorky
  2. @piskvorky
  3. @piskvorky

    more efficient sparse matrix generation

    When the sparse properties (#documents, #terms, #non-zeroes) are known in advance, a much more efficient code path is taken.
    This is the case with MmCorpus, so pass a MmCorpus object to SparseSimilarityIndex whenever possible.
    Eligibility for the fast code path is determined by duck-typing, so any corpus supporting self.numDocs, self.numTerms and self.numElements will do (MmCorpus is one such example).
    piskvorky committed May 20, 2011
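The fast path described here can be sketched as follows, assuming a bag-of-words corpus where each document is a list of (term_id, weight) pairs. This is an illustration, not gensim's actual implementation: the duck-typed attribute names `numDocs`, `numTerms` and `numElements` come from the commit message, while `corpus_to_csc` and `KnownSizeCorpus` are hypothetical names. When all three sizes are known, the CSC arrays are allocated once at their final size; otherwise entries are collected in growing Python lists.

```python
import numpy as np
import scipy.sparse

def corpus_to_csc(corpus):
    """Build a sparse (terms x documents) matrix from a bag-of-words corpus."""
    if all(hasattr(corpus, attr) for attr in ("numDocs", "numTerms", "numElements")):
        # fast path: sizes known in advance, preallocate the CSC arrays
        indptr = np.zeros(corpus.numDocs + 1, dtype=np.int64)
        indices = np.empty(corpus.numElements, dtype=np.int64)
        data = np.empty(corpus.numElements, dtype=np.float64)
        pos = 0
        for doc_no, doc in enumerate(corpus):
            for term_id, weight in doc:
                indices[pos] = term_id
                data[pos] = weight
                pos += 1
            indptr[doc_no + 1] = pos
        shape = (corpus.numTerms, corpus.numDocs)
    else:
        # slow path: sizes unknown, so grow Python lists and convert at the end
        indptr, indices, data = [0], [], []
        for doc in corpus:
            for term_id, weight in doc:
                indices.append(term_id)
                data.append(weight)
            indptr.append(len(indices))
        shape = (max(indices, default=-1) + 1, len(indptr) - 1)
    return scipy.sparse.csc_matrix((data, indices, indptr), shape=shape)


class KnownSizeCorpus:
    """Hypothetical in-memory corpus that advertises its sizes, as MmCorpus does."""

    def __init__(self, docs, num_terms):
        self.docs = docs
        self.numDocs = len(docs)
        self.numTerms = num_terms
        self.numElements = sum(len(doc) for doc in docs)

    def __iter__(self):
        return iter(self.docs)
```

Both paths produce the same matrix; the fast path simply avoids repeated list growth and the final list-to-array conversion.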
Commits on May 19, 2011
  1. @piskvorky

    changed default dense chunk size to 256 in indexing (was: 100)

    * powers of 2 give the best performance, I guess due to better cache alignment
    piskvorky committed May 19, 2011
  2. @piskvorky

    fixed bug where scipy.sparse arrays cannot be sliced beyond their end (unlike plain lists or numpy arrays)
    piskvorky committed May 19, 2011
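One way to guard against the slicing issue this commit fixes is to clip the slice end to the matrix's row count before slicing. The sketch below assumes row-wise chunking of a scipy.sparse matrix; the helper names are hypothetical and this is not necessarily how the commit fixed it. Clipping explicitly gives the same result whether or not a given scipy version tolerates out-of-range slice ends.

```python
import numpy as np
import scipy.sparse

def safe_row_slice(matrix, start, stop):
    """Return rows [start, stop) of `matrix`, clipping `stop` to the number of rows."""
    stop = min(stop, matrix.shape[0])
    return matrix[start:stop]

def iter_row_chunks(matrix, chunksize):
    """Yield consecutive row blocks of at most `chunksize` rows."""
    for start in range(0, matrix.shape[0], chunksize):
        # the last chunk's nominal end may lie past the final row; clip it
        yield safe_row_slice(matrix, start, start + chunksize)

m = scipy.sparse.csr_matrix(np.eye(5))
print([chunk.shape for chunk in iter_row_chunks(m, 2)])  # [(2, 5), (2, 5), (1, 5)]
```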