Refactor code with PEP8 and additional limitations. Fix #1521 #1569

menshikh-iv · 2017-09-05T18:01:31Z

No description provided.

zsef123 · 2017-09-06T22:55:05Z

gensim/models/wrappers/ldamallet.py

-                    doc = [(id_, weight)
-                           for id_, weight in enumerate(map(float, parts))
-                           if abs(weight) > eps]
+                    doc = [(id_, weight) for id_, weight in enumerate(float(x) for x in parts) if abs(weight) > eps]


@menshikh-iv
line 308:

doc = [(int(id_), float(weight)) for id_, weight in zip(*[iter(parts)] * 2) if abs(float(weight)) > eps]

line 310:

doc = [(id_, float(weight)) for id_, weight in enumerate(parts) if abs(float(weight)) > eps]

How about the code above?
It needs fewer loops.

Ok, accepted.

@zsef123 Both examples above are incorrect. The correct formatting uses a hanging indent:

doc = [
    (id_, float(weight)) for id_, weight in enumerate(parts)
    if abs(float(weight)) > eps
]

Actually, this line is not too long, so simply:

doc = [(id_, float(weight)) for id_, weight in enumerate(parts) if abs(float(weight)) > eps]

would work too.

Already done, using your last variant, @piskvorky.
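
The two parsing variants discussed in this thread can be compared with a minimal sketch. The sample `parts` lists and the `eps` value below are made up for illustration, and the two input layouts are assumptions, since the thread does not show the actual file format being parsed.

```python
eps = 1e-6  # hypothetical threshold, chosen only for this example

# Assumed layout A: alternating "topic_id weight" tokens.
parts_pairs = ["0", "0.7", "1", "0.0", "2", "0.3"]
# zip(*[iter(parts)] * 2) draws two items at a time from ONE shared iterator,
# so it yields consecutive (id, weight) pairs.
doc_pairs = [
    (int(id_), float(weight))
    for id_, weight in zip(*[iter(parts_pairs)] * 2)
    if abs(float(weight)) > eps
]

# Assumed layout B: one weight per topic, where the position is the topic id.
parts_dense = ["0.7", "0.0", "0.3"]
doc_dense = [
    (id_, float(weight))
    for id_, weight in enumerate(parts_dense)
    if abs(float(weight)) > eps
]

print(doc_pairs)
print(doc_dense)
```

Both comprehensions make a single pass over `parts` and drop the near-zero weight for topic 1.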

zsef123 · 2017-09-06T23:42:24Z

gensim/test/test_sklearn_api.py

@@ -363,7 +363,7 @@ def testPipeline(self):
        data = cache
        test_data = data.data[0:2]
        test_target = data.target[0:2]
-        id2word = Dictionary(map(lambda x: x.split(), test_data))
+        id2word = Dictionary([x.split() for x in data.data])


data.data needs to be changed to test_data.

Thanks, fixed.

zsef123 · 2017-09-06T23:44:05Z

gensim/test/test_atmodel.py

@@ -436,7 +436,7 @@ def testPasses(self):
            for test_rhot in test_rhots:
                model.update(corpus, author2doc)

-                msg = ", ".join(map(str, [passes, model.num_updates, model.state.numdocs]))
+                msg = ", ".join(str(x) for x in [passes, model.num_updates, model.state.numdocs])


How about msg = "%d, %d, %d" % (passes, model.num_updates, model.state.numdocs)?
This is simpler.

Agree, fixed
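
As a quick check that the two ways of building `msg` agree, here is a small sketch with made-up integer values standing in for `passes`, `model.num_updates` and `model.state.numdocs`.

```python
# Made-up values, used only to compare the two formatting styles.
passes, num_updates, numdocs = 2, 4, 10

msg_join = ", ".join(str(x) for x in [passes, num_updates, numdocs])
msg_percent = "%d, %d, %d" % (passes, num_updates, numdocs)

print(msg_join)
print(msg_percent)
```

One caveat: `%d` truncates floats (`"%d" % 2.5` gives `"2"`), so the two forms only match when all three values are integers.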

piskvorky

Epic! I reviewed the diff and left some minor comments inside. I will look at the non-diffed (unchanged) parts next.

piskvorky · 2017-09-07T12:35:48Z

gensim/corpora/dictionary.py

-            self.dfs = dict((tokenid, freq)
-                            for tokenid, freq in iteritems(self.dfs)
-                            if tokenid not in bad_ids)
+            self.token2id = dict((token, tokenid) for token, tokenid in iteritems(self.token2id) if tokenid not in bad_ids)


We no longer support py2.6, so this can be a dict comprehension: {token: tokenid for ...}. Here and elsewhere.
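
A minimal sketch of the suggested rewrite, using a toy `token2id` mapping and `bad_ids` set that are invented for this example. It uses the built-in `.items()` instead of `iteritems` from `six` so that the snippet stays self-contained.

```python
# Toy data, not taken from gensim.
token2id = {"human": 0, "computer": 1, "graph": 2}
bad_ids = {1}

# Old style: dict() over a generator of (key, value) tuples.
old_style = dict((token, tokenid) for token, tokenid in token2id.items() if tokenid not in bad_ids)

# Dict comprehension, available since Python 2.7.
new_style = {token: tokenid for token, tokenid in token2id.items() if tokenid not in bad_ids}

print(new_style)
```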

piskvorky · 2017-09-07T12:36:20Z

gensim/corpora/dictionary.py

-        logger.info(
-            "keeping %i tokens which were in no less than %i and no more than %i (=%.1f%%) documents",
-            len(good_ids), no_below, no_above_abs, 100.0 * no_above)
+        logger.info("keeping %i tokens which were in no less than %i and no more than %i (=%.1f%%) documents", len(good_ids), no_below, no_above_abs, 100.0 * no_above)


Line too long; the args should be on their own separate line. Here and elsewhere.
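
One possible layout for such a call is sketched below, with made-up values for the arguments. Passing the arguments to `logger.info` separately, rather than pre-formatting with `%`, also lets the `logging` module defer string formatting until the record is actually emitted.

```python
import logging

logger = logging.getLogger(__name__)

# Made-up values standing in for the real ones in filter_extremes.
good_ids = [1, 2, 3]
no_below, no_above_abs, no_above = 5, 500, 0.5

# Format string on one line, arguments on their own line.
logger.info(
    "keeping %i tokens which were in no less than %i and no more than %i (=%.1f%%) documents",
    len(good_ids), no_below, no_above_abs, 100.0 * no_above
)
```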

piskvorky · 2017-09-07T12:38:38Z

gensim/corpora/dictionary.py

@@ -333,7 +322,7 @@ def merge_with(self, other):
            old2new[other_id] = new_id
            try:
                self.dfs[new_id] += other.dfs[other_id]
-            except Exception:
+            except AttributeError:


Why this change? What will happen if other.dfs exists, but is not a dictionary?
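
To illustrate the concern, the sketch below shows which exception the lookup `other.dfs[other_id]` raises for different kinds of `other`. The classes `NoDfs` and `WithDfs` and the helper `lookup_error` are hypothetical stand-ins, not gensim code.

```python
class NoDfs(object):
    """Hypothetical object with no `dfs` attribute at all."""


class WithDfs(object):
    """Hypothetical object whose `dfs` can be anything."""

    def __init__(self, dfs):
        self.dfs = dfs


def lookup_error(other, other_id=0):
    """Return the name of the exception raised by other.dfs[other_id], or None."""
    try:
        other.dfs[other_id]
    except Exception as err:
        return type(err).__name__
    return None


print(lookup_error(NoDfs()))          # missing attribute
print(lookup_error(WithDfs(None)))    # dfs exists but is not a dictionary
print(lookup_error(WithDfs({})))      # dfs is a dictionary without the id
print(lookup_error(WithDfs({0: 3})))  # lookup succeeds
```

Only the first case is an `AttributeError`; a narrowed `except AttributeError` would let the `TypeError` and `KeyError` cases propagate.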

piskvorky · 2017-09-07T12:39:50Z

gensim/corpora/sharded_corpus.py

@@ -254,20 +252,14 @@ def init_shards(self, output_prefix, corpus, shardsize=4096, dtype=_default_dtyp
        """Initialize shards from the corpus."""

        if not gensim.utils.is_corpus(corpus):
-            raise ValueError('Cannot initialize shards without a corpus to read'
-                             ' from! (Got corpus type: {0})'.format(type(corpus)))
+            raise ValueError('Cannot initialize shards without a corpus to read from! (Got corpus type: {0})'.format(type(corpus)))


I'd prefer to use one type of formatting consistently (%s instead of .format). Here and elsewhere.

In my opinion both variants are fine, so we shouldn't restrict ourselves to only one.

piskvorky · 2017-09-07T12:41:01Z

gensim/corpora/sharded_corpus.py

@@ -484,8 +456,7 @@ def resize_shards(self, shardsize):
            # If something happens when we're in this stage, we're screwed.
            except Exception as e:
                print(e)
-                raise RuntimeError('Resizing completely failed for some reason.'
-                                   ' Sorry, dataset is probably ruined...')
+                raise RuntimeError('Resizing completely failed for some reason. Sorry, dataset is probably ruined...')


We should use logger.exception to log the exception, and then simply re-raise the original error, instead of printing it.
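
A minimal sketch of that pattern follows. The functions `resize` and `failing_step` are hypothetical stand-ins for the risky part of `resize_shards`, and the log message is invented for the example.

```python
import logging

logger = logging.getLogger(__name__)


def resize(step):
    """Hypothetical wrapper around a risky operation."""
    try:
        step()
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("resizing failed; the dataset may be in an inconsistent state")
        raise  # a bare raise re-raises the original exception unchanged


def failing_step():
    raise IOError("disk full")


try:
    resize(failing_step)
except IOError as err:
    caught = err

print(repr(caught))
```

The caller still sees the original `IOError`, and the traceback ends up in the log instead of on stdout.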

piskvorky · 2017-09-07T12:41:55Z

gensim/corpora/sharded_corpus.py

-                raise TypeError('Couldn\'t find number of features, '
-                                 'refusing to guess (dimension set to {0},'
-                                 'type of corpus: {1}).'.format(self.dim, type(corpus)))
+                raise TypeError('Couldn\'t find number of features, refusing to guess (dimension set to {0}, type of corpus: {1}).'.format(self.dim, type(corpus)))


Code style: Better use " to delimit the string, to avoid the \' inside, to improve readability. Here and elsewhere.

Fixed (\' occurred several times, but only in this file).
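
The two quoting styles produce the same string, as this small sketch shows. The message is shortened from the one in the diff.

```python
# Single-quoted literal needs an escaped apostrophe.
escaped = 'Couldn\'t find number of features'

# Double-quoted literal reads more naturally.
plain = "Couldn't find number of features"

print(escaped == plain)
```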

piskvorky · 2017-09-07T12:49:58Z

gensim/examples/dmlcz/dmlcorpus.py

        self.dictionary = dictionary.Dictionary()
        numPositions = 0
        for docNo, (sourceId, docUri) in enumerate(self.documents):
            if docNo % 1000 == 0:
-                logger.info("PROGRESS: at document #%i/%i (%s, %s)" %
-                             (docNo, len(self.documents), sourceId, docUri))
+                logger.info("PROGRESS: at document #%i/%i (%s, %s)", docNo, len(self.documents), sourceId, docUri)


docNo and resultFile are Java-style. I think this file (package) should be simply removed, instead of updated.

Completely agree, will remove it soon (but not in this PR).

piskvorky · 2017-09-07T12:51:46Z

gensim/models/wrappers/ldavowpalwabbit.py

-            LOG.warning('no word id mapping provided; initializing from '
-                        'corpus, assuming identity')
+                raise ValueError('at least one of corpus/id2word must be specified, to establish input space dimensionality')
+            LOG.warning('no word id mapping provided; initializing from corpus, assuming identity')


What is this LOG? We use logger in gensim, by convention.
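
The convention referred to here is usually a module-level logger named `logger`. A minimal sketch, where the function `warn_no_id2word` is a hypothetical wrapper added only to show the call site:

```python
import logging

# Module-level logger, named after the module it lives in.
logger = logging.getLogger(__name__)


def warn_no_id2word():
    """Hypothetical helper; only demonstrates the call with the renamed logger."""
    logger.warning("no word id mapping provided; initializing from corpus, assuming identity")


warn_no_id2word()
```

`logging.getLogger` returns the same object for the same name, so the rename from `LOG` to `logger` changes only the variable, not the logger that receives the messages.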

menshikh-iv added 2 commits September 5, 2017 22:59

Replace map(..) to comprehensions

394913d

Fix logging (remove '%'/'.format' + longer lines)

62e92fc

menshikh-iv mentioned this pull request Sep 5, 2017

Refactor current gensim code by PEP8 #1521

Closed

4 tasks

menshikh-iv added this to In progress in Code cleanup Sep 5, 2017

menshikh-iv added 4 commits September 6, 2017 01:14

style-check[1]

6c6213b

Small fix for bash scripts

f34e7b3

style-check[2] (corpora)

ab34c2e

flake8 check

7bbd7a6

zsef123 reviewed Sep 6, 2017

menshikh-iv added 6 commits September 7, 2017 12:14

Fix shared_corpus API + resolve comment from review

bb1a07b

Remove legacy "endclass" from corpora

d7bc17a

style-check[3]

49a3bfb

style-check[4]

630c390

Rename test_base_tm to basetmtest (for preventing direct running with nose) + small changes for models

aaeda8e

style-check[5]

7e94ce0

piskvorky requested changes Sep 7, 2017

menshikh-iv added 8 commits September 8, 2017 12:36

Replace LOG -> logger

9d2e473

Return broad exception to dictionary

a242e06

Replace "dict((" -> dict comprehension

cccf8d9

Replace "print(e)" -> "logger.exception(e)"

a6370af

Fix quotation

c9b138f

Reduce long lines

614fb7c

missed PEP8

2a72161

style-check[6]

b28b6c3

menshikh-iv merged commit db9e230 into develop Sep 8, 2017

menshikh-iv moved this from In progress to Done in Code cleanup Sep 8, 2017

menshikh-iv deleted the pep8-continue branch October 6, 2017 10:30
