@menshikh-iv released this on Jul 6, 2018

3.5.0, 2018-07-06

This release comprises a glorious 38 pull requests from 28 contributors. Most of the effort went into improving the documentation—hence the release code name "Docs 💬"!

Apart from the massive overhaul of all Gensim documentation (including docstring style and examples—you asked for it), we also managed to sneak in some new functionality and a number of bug fixes. As usual, see the notes below for a complete list, with links to pull requests for more details.

Huge thanks to all contributors! Nobody loves working on documentation. 3.5.0 is a result of several months of laborious, unglamorous, and sometimes invisible work. Enjoy!

📚 Documentation improvements

🌟 New features:

  • Add add_entity method to KeyedVectors to allow adding word vectors manually (@persiyanov, #1957)
  • Add inference for new unseen author to AuthorTopicModel (@Stamenov, #1766)
  • Add evaluate_word_analogies (will replace accuracy) method to KeyedVectors (@akutuzov, #1935)
  • Add Pivot Normalization to TfidfModel (@markroxor, #1780)
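
For readers unfamiliar with pivot normalization, here is a minimal sketch of the usual formulation, in which the plain document norm is replaced by a blend of the norm and a fixed pivot value. This is an illustration only and is not taken from the Gensim source: the function names `pivoted_norm` and `normalize_pivoted` are hypothetical, and the choice of the L2 norm as the "old" norm is an assumption.

```python
import math


def pivoted_norm(old_norm, pivot, slope):
    """Blend the original norm with a fixed pivot.

    At old_norm == pivot the result equals the original norm; vectors with a
    larger norm are penalized less than under plain normalization, and
    vectors with a smaller norm are penalized more (for 0 < slope < 1).
    """
    return (1.0 - slope) * pivot + slope * old_norm


def normalize_pivoted(weights, pivot, slope):
    """Divide each weight by the pivoted norm instead of the plain L2 norm.

    Assumption: the "old" norm is the L2 norm of the weight vector.
    """
    old_norm = math.sqrt(sum(w * w for w in weights))
    if old_norm == 0.0:
        # Nothing to normalize in an all-zero vector.
        return list(weights)
    norm = pivoted_norm(old_norm, pivot, slope)
    return [w / norm for w in weights]
```

With `slope=1.0` the pivot has no effect and this reduces to ordinary L2 normalization, which makes the sketch easy to sanity-check.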

👍 Improvements

🔴 Bug fixes:

  • Fix Doc2Vec.infer_vector + notebook cleanup (@gojomo, #2103)
  • Fix linear decay for learning rate in Doc2Vec.infer_vector (@umangv, #2063)
  • Fix negative sampling floating-point error for gensim.models.Poincare (@jayantj, #1959)
  • Fix loading word2vec and doc2vec models saved using old Gensim versions (@manneshiva, #2012)
  • Fix SoftCosineSimilarity.get_similarities on corpora (issue #1955) (@Witiko, #1972)
  • Fix return dtype for matutils.unitvec according to input dtype (@o-P-o, #1992)
  • Fix passing empty dictionary to gensim.corpora.WikiCorpus (@steremma, #2042)
  • Fix bug in Similarity.query_shards in multiprocessing case (@bohea, #2044)
  • Fix SMART from TfidfModel for case when df == "n" (@PeteBleackley, #2021)
  • Fix OverflowError when loading a large term-document matrix in compiled MatrixMarket format (@arlenk, #2001)
  • Update rules for removing table markup from Wikipedia dumps (@chaitaliSaini, #1954)
  • Fix _is_single from Phrases for case when corpus is a NumPy array (@rmalouf, #1987)
  • Fix tests for EuclideanKeyedVectors.similarity_matrix (@Witiko, #1984)
  • Fix deprecated parameters in D2VTransformer and W2VTransformer (@MritunjayMohitesh, #1945)
  • Fix Doc2Vec.infer_vector after loading old Doc2Vec (gensim<=3.2) (@manneshiva, #1974)
  • Fix inheritance chain for load_word2vec_format (@DennisChen0307, #1968)
  • Update Keras version (avoid bug from keras==2.1.5) (@menshikh-iv, #1963)
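
To illustrate what "linear decay for learning rate" means in the Doc2Vec.infer_vector fix above, here is a minimal sketch of a linear schedule that moves from a starting rate to a minimum rate in equal steps. It is not the Gensim implementation: the function name `linear_decay_schedule` and its parameter names are hypothetical, chosen only for this example.

```python
def linear_decay_schedule(start_alpha, min_alpha, steps):
    """Return `steps` learning rates falling linearly from start_alpha to min_alpha.

    The step size is constant, so consecutive rates differ by the same amount.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if steps == 1:
        # A single step has nothing to decay towards.
        return [start_alpha]
    delta = (start_alpha - min_alpha) / (steps - 1)
    return [start_alpha - i * delta for i in range(steps)]
```

A quick check of any such schedule is that the first value equals the starting rate, the last value equals the minimum rate, and all consecutive differences are equal.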

⚠️ Deprecations (will be removed in the next major release)

  • Remove

    • gensim.models.wrappers.fasttext (obsoleted by the new native gensim.models.fasttext implementation)
    • gensim.examples
    • gensim.nosy
    • gensim.scripts.word2vec_standalone
    • gensim.scripts.make_wiki_lemma
    • gensim.scripts.make_wiki_online
    • gensim.scripts.make_wiki_online_lemma
    • gensim.scripts.make_wiki_online_nodebug
    • gensim.scripts.make_wiki (all of these obsoleted by the new native gensim.scripts.segment_wiki implementation)
    • "deprecated" functions and attributes
  • Move

    • gensim.scripts.make_wikicorpus → gensim.scripts.make_wiki.py
    • gensim.summarization → gensim.models.summarization
    • gensim.topic_coherence → gensim.models._coherence
    • gensim.utils → gensim.utils.utils (old imports will continue to work)
    • gensim.parsing.* → gensim.utils.text_utils