Refactor documentation for `gensim.similarities.docsim`. #1910

CLearERR · 2018-02-15T19:33:45Z

menshikh-iv · 2018-02-20T04:49:31Z

gensim/corpora/textcorpus.py

+        >>>
+        >>> corpus = CorpusMiislita(datapath('head500.noblanks.cor.bz2'))
+        >>> corpus.get_texts()
+        <generator object get_texts at 0x7fa932f397d0>


Bad output: can you show a concrete line of the dataset instead, e.g. `next(iter(corpus.get_texts()))`?
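A minimal sketch of the point, assuming a hypothetical `get_texts` generator standing in for the corpus method (the sample sentences are made up, not taken from the dataset): the repr of a generator only shows a memory address, while pulling the first item shows real content.

```python
def get_texts():
    """Hypothetical stand-in for a corpus method that lazily yields tokenized documents."""
    for line in ["Human machine interface", "Graph of trees"]:
        yield line.lower().split()


# The repr of a generator is only a type name and a memory address,
# and the address differs from run to run.
gen_repr = repr(get_texts())

# Pulling the first item shows concrete, reproducible content instead.
first_doc = next(iter(get_texts()))
print(first_doc)
```

Output like this is also stable across runs, which the address in a generator repr is not.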

menshikh-iv · 2018-02-20T04:51:13Z

gensim/corpora/textcorpus.py

+        >>>                 if word not in CorpusMiislita.stoplist]
+        >>>
+        >>>     def __len__(self):
+        >>>         if 'length' not in self.__dict__:


No need to write anything with a logger here; this should be a small, simple example.

menshikh-iv · 2018-02-20T04:51:29Z

gensim/corpora/textcorpus.py

+        >>>
+        >>>     def get_texts(self):
+        >>>         for doc in self.getstream():
+        >>>             yield [word for word in utils.to_unicode(doc).lower().split()


some issues with formatting

menshikh-iv · 2018-02-20T04:51:36Z

gensim/corpora/textcorpus.py

+        >>> corpus = CorpusMiislita(datapath('head500.noblanks.cor.bz2'))
+        >>> corpus.get_texts()
+        <generator object get_texts at 0x7fa932f397d0>
+        >>> corpus.__len__()


Please use `len(corpus)` instead of this; calling "magic" methods directly is a bad pattern (justified only in specific cases).
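A small sketch of why the built-in is preferable, using two hypothetical classes (`TinyCorpus` and `BrokenLength` are illustrations, not part of gensim): `len()` dispatches to `__len__` and also validates the result, while a direct call does not.

```python
class TinyCorpus:
    """Hypothetical stand-in for a corpus class; not gensim's API."""

    def __init__(self, docs):
        self.docs = list(docs)

    def __len__(self):
        return len(self.docs)


class BrokenLength:
    """Returns an invalid length to show the check that len() performs."""

    def __len__(self):
        return -1


corpus = TinyCorpus([["human", "interface"], ["graph", "trees"]])

# Preferred: the built-in dispatches to __len__ for us.
n_docs = len(corpus)

# A direct call skips validation and silently returns a bad value.
raw = BrokenLength().__len__()

# len() rejects it: a negative result raises ValueError.
try:
    len(BrokenLength())
    rejected = False
except ValueError:
    rejected = True
```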

menshikh-iv · 2018-02-20T04:52:36Z

gensim/similarities/docsim.py

+
+        Return
+        ------
+        {:class: `~scipy.sparse.csr_matrix`, :class: `~numpy.array`}


`numpy.array` -> `numpy.ndarray`, here and everywhere. Also, in this case the link shouldn't be rendered, so don't use `~` for numpy/scipy.

menshikh-iv · 2018-02-20T04:53:59Z

gensim/similarities/docsim.py

+            Size of shards should be chosen so that a `shardsize x chunksize` matrix of floats fits comfortably into
+            main memory.
+        norm : str, optional
+            Normalization to use. Accepted values: {l1, l2}.


In this case, instead of

    norm : str, optional
        Normalization to use. Accepted values: {l1, l2}.

it should be

    norm : {'l1', 'l2'}, optional
        Normalization to use.

This is the better notation when there are several predefined string values.
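For context, here is a sketch of how the set notation reads inside a complete numpydoc-style docstring. The `normalize` function is a hypothetical pure-Python helper written for illustration, not the code under review.

```python
import math


def normalize(vec, norm='l2'):
    """Scale a dense vector to unit length.

    Parameters
    ----------
    vec : list of float
        Input vector.
    norm : {'l1', 'l2'}, optional
        Normalization to use.

    Returns
    -------
    list of float
        Normalized copy of `vec`; a zero vector is returned unchanged.

    """
    if norm == 'l1':
        length = sum(abs(x) for x in vec)
    elif norm == 'l2':
        length = math.sqrt(sum(x * x for x in vec))
    else:
        raise ValueError("norm must be 'l1' or 'l2', got %r" % (norm,))
    if length == 0:
        return list(vec)
    return [x / length for x in vec]


unit_l2 = normalize([3.0, 4.0])
unit_l1 = normalize([1.0, 3.0], norm='l1')
```

Listing the accepted values in the type slot keeps them next to the parameter name, so the description line only has to say what the parameter does.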

CLearERR added 2 commits February 16, 2018 00:28

Added example for text_corpus.py

cb092aa

Fix for example

6bf32bd

menshikh-iv added the "incubator project" label (PR is RaRe incubator project) Feb 16, 2018

Updated docstrings for docsim.py

abe79ce

menshikh-iv suggested changes Feb 20, 2018

CLearERR and others added 10 commits February 21, 2018 02:19

Beta_docstrings for docsim.py

1cdb6e8

Gamma_docstrings for docsim.py

8d25d65

Merge remote-tracking branch 'upstream/develop' into docsim

6cd1c86

Massive package of different files.

6063cba

fix build (PEP8, rst)

5fe4ae5

retranslate _mmreader.pyx with cython==0.27.3

e92d2d7

fix matutils

262bc1c

fix textcorpus

7139e6a

fix mmcorpus

f2cb977

fix mmreader[2]

5ceda90

menshikh-iv mentioned this pull request Feb 22, 2018

Fix file-like closing bug from gensim.corpora.MmCorpus. Fix #1869 #1911

Merged

menshikh-iv added 5 commits February 23, 2018 05:32

fix docsim[1]

70f2de4

fix docsim[2]

91d1ee1

fix docsim[3]

b84535d

fix docsim[4]

0909472

fix docsim[5]

67e445b

menshikh-iv merged commit 5355c06 into piskvorky:develop Feb 23, 2018

menshikh-iv mentioned this pull request Mar 9, 2018

Refactor API reference gensim.similarities #1666

Closed

3 tasks

piskvorky mentioned this pull request Apr 30, 2018

Documentation fixes #2037

Open
