Add style checks and refactor suggestions #121

yogeshg · 2018-04-24T00:17:41Z

In this PR, we

add python/.pylint/suggested.rc adapted from the default configuration generated by pylint
allow both camelCase and snake_case using regexes lifted from pylint source code
increase thresholds for number of arguments, local, variables
disable checks that are used often in this project: unused-argument, too-many-arguments, no-member, missing-docstring, no-init, protected-access, misplaced-comparison-constant, no-else-return, fixme
escape some code with # pylint: disable=... because it was hard to refactor without thorough testing

Some style decisions that were discussed are:

disables are acceptable if there is no other way to do this, in which case a comment must be left explaining that
other disables should be removed and should be considered similar to todos
we allow todo marks in code because these are acceptable for this project and should be taken care of in future
there are 50 todos, fixmes or pylint disables currently, we should aim to bring this down
find python/sparkdl | grep ".*\.py$" | xargs egrep -ino --color=auto "(TODO|FIXME|# pylint).*"

function calls and function defintions that span more than 1 line are left to committer and reviewer's discretion

pep8 style:

long_function_name(
    long_argument_one = "one",
    long_argument_two = "two",
    long_argument_three = "three",
    long_argument_four = "four",
    long_argument_five = "five")

MLlib style:

long_function_name(
    long_argument_one = "one", long_argument_two = "two", long_argument_three = "three",
    long_argument_four = "four", long_argument_five = "five")

* remove unused type * rename globals * ignore a variable name

codecov-io · 2018-04-24T01:07:44Z

Codecov Report

Merging #121 into master will decrease coverage by 0.01%.
The diff coverage is 93.93%.

@@            Coverage Diff             @@
##           master     #121      +/-   ##
==========================================
- Coverage   85.38%   85.37%   -0.02%     
==========================================
  Files          33       33              
  Lines        1923     1921       -2     
  Branches       44       44              
==========================================
- Hits         1642     1640       -2     
  Misses        281      281

Impacted Files	Coverage Δ
...n/sparkdl/estimators/keras_image_file_estimator.py	`88.97% <ø> (ø)`	⬆️
python/sparkdl/__init__.py	`100% <ø> (ø)`	⬆️
python/sparkdl/graph/input.py	`98.05% <ø> (ø)`	⬆️
python/sparkdl/graph/pieces.py	`96.55% <0%> (ø)`	⬆️
python/sparkdl/transformers/keras_applications.py	`85.71% <100%> (ø)`	⬆️
python/sparkdl/transformers/named_image.py	`90.06% <100%> (-0.07%)`	⬇️
python/sparkdl/graph/builder.py	`93.85% <100%> (+0.1%)`	⬆️
python/sparkdl/transformers/keras_tensor.py	`100% <100%> (ø)`	⬆️
python/sparkdl/transformers/keras_image.py	`100% <100%> (ø)`	⬆️
python/sparkdl/param/shared_params.py	`86.13% <100%> (-0.27%)`	⬇️
... and 12 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9f05fc2...e90d3ad. Read the comment docs.

smurching

@yogeshg this is great, thanks for making this PR :)

Had a few Qs in comments below. Was also wondering - it looks like there are some pylint-disabling annotations that appear frequently & it'd be nice to find a way to have slightly fewer style annotations in our code:

# pylint: disable=too-few-public-methods. This shows up a lot but it looks like it actually catches non-Pythonic code more than scenarios where we simply extend a class & so don't have many public methods, so I'm fine keeping it. I guess the alternative would be to disable it or reduce the min number of public methods in .pylint.
pylint: disable=fixme. Is it possible to remove the need for this annotation by just allowing TODOs etc as per https://stackoverflow.com/questions/33157982/how-do-i-disable-todo-warnings-in-pylint?
# pylint: disable=invalid-name. Can we get rid of this by renaming variables?

Thanks again, happy you made this PR party-hat-emoji

smurching · 2018-04-24T01:07:14Z

python/.pylint/accepted.rc

@@ -0,0 +1,556 @@
+[MASTER]


QQ: how'd you pick the settings in this file (& in .pylint/suggested.rc)? is there a default PEP8 pylint file that we can/should diff this one against?

I generated the default rc file that pylint uses and made incremental changes to suggested.rc and accepted.rc. I am adding a diff too, so that larger audience can see that. Thanks for pointing out!

(dev3) yo@C02VM7GBHTD5:~/code/spark-deep-learning$ pylint --generate-rcfile > python/.pylint/default.rc No config file found, using default configuration

Note: on my local python2 and 3 generate sections in different order, so use python2.7.

smurching · 2018-04-24T01:10:40Z

python/sparkdl/estimators/keras_image_file_estimator.py

@@ -264,7 +265,7 @@ def fitMultiple(self, dataset, paramMaps):
                     existence of a sufficiently large (and writable) file system, users are
                     advised to not train too many models in a single Spark job.
        """
-        [self._validateParams(pm) for pm in paramMaps]
+        assert all([self._validateParams(pm) for pm in paramMaps])


Q: why the assert? looks like we already raise a ValueError if a paramMap is invalid in _validateParams

yeah I agree, you could move the assert above to signal that it is always expected to be there

assert any([...]) was changed to _ = [...] for readability, because it suggested that the error would be thrown because of the assert, but in reality the error was thrown in _validateParams

smurching · 2018-04-24T01:12:04Z

python/sparkdl/graph/builder.py

@@ -83,15 +84,15 @@ def asGraphFunction(self, inputs, outputs, strip_and_freeze=True):

        :param inputs: list, graph elements representing the inputs
        :param outputs: list, graph elements representing the outputs
-        :param strip_and_freeze: bool, should we remove unused part of the graph and freee its values
+        :param strip_and_freeze: bool, should we remove unused part of the graph and free its values


lol I think it's actually supposed to be freeze instead of free :P

no idea how this happened :P

smurching · 2018-04-24T01:25:43Z

python/sparkdl/transformers/named_image.py

-        transformer = _NamedImageTransformer(inputCol=self.getInputCol(),
-                                             outputCol=self._getIntermediateOutputCol(),
-                                             modelName=self.getModelName(), featurize=False)
+        transformer = _NamedImageTransformer(


one of the style things we're inconsistent about in this repo is how to handle wrapping function calls that span > 1 line. i think we'd decided to use multiple lines but try to fit as many arguments as possible onto each line, e.g:

transformer = _NamedImageTransformer( inputCol=self.getInputCol(), outputCol=self._getIntermediateOutputCol(), modelName=self.getModelName(), featurize=False)

i'll let @sueann confirm, but i'm fine with this as is too :) (especially since it seems like it'd be a pretty substantial change to make the style consistent everywhere)

I went with the path of least resistance while making changes, however I agree we should decide and document this change.

Yeah I think it's better to compact the code as much as possible. It looks fine in this case but this pattern wastes a lot of space elsewhere and makes it harder to read in my opinion.

yogeshg · 2018-04-24T04:17:02Z

Thanks for the review @smurching and @tomasatdatabricks !
I update with following changes:

freeze instead of free: python/sparkdl/graph/builder.py
assert any([...]) was changed to _ = [...] for readability, because it suggested that the error would be thrown because of the assert, but in reality the error was thrown in _validateParams
disable=fixme was removed from accepted.rc because todos are acceptable for this project and should be taken care of in future

Following things were discussed offline:

Function calls (and definitions) that span > 1 line are left on developer and reviewer's discretion
the settings were chosen incrementally after generating a default rc from the tool, diffs added to PR
too-few-public-methods, invalid-name and other disables that are good to keep but hard to change (50 such) are left in the code with the intent to decrease them

tomasatdatabricks

lgtm with two minor comments. Thanks Yogesh!

tomasatdatabricks · 2018-04-24T19:06:35Z

python/sparkdl/transformers/named_image.py

-        transformer = _NamedImageTransformer(inputCol=self.getInputCol(),
-                                             outputCol=self._getIntermediateOutputCol(),
-                                             modelName=self.getModelName(), featurize=False)
+        transformer = _NamedImageTransformer(


Yeah I think it's better to compact the code as much as possible. It looks fine in this case but this pattern wastes a lot of space elsewhere and makes it harder to read in my opinion.

tomasatdatabricks · 2018-04-24T19:14:28Z

python/sparkdl/estimators/keras_image_file_estimator.py

@@ -264,7 +265,7 @@ def fitMultiple(self, dataset, paramMaps):
                     existence of a sufficiently large (and writable) file system, users are
                     advised to not train too many models in a single Spark job.
        """
-        [self._validateParams(pm) for pm in paramMaps]
+        assert all([self._validateParams(pm) for pm in paramMaps])


yeah I agree, you could move the assert above to signal that it is always expected to be there

…z -> resize_image

yogeshg · 2018-04-24T21:57:43Z

difference in accepted.rc vs default rc file generated by pylint.

129,137c129
<         dict-values-not-iterating,
<         unused-argument, too-many-arguments,
<         no-member,
<         missing-docstring,
<         no-init,
<         protected-access,
<         misplaced-comparison-constant,
<         no-else-return,
<         fixme
---
>         dict-values-not-iterating
215,217c207
<       TODO,
<       fixme,
<       todo
---
>       TODO
353c343
< #argument-naming-style=
---
> argument-naming-style=snake_case
357c347
< argument-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
---
> #argument-rgx=
360c350
< #attr-naming-style=
---
> attr-naming-style=snake_case
364c354
< attr-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
---
> #attr-rgx=
388c378
< #const-naming-style=
---
> const-naming-style=UPPER_CASE
392c382
< const-rgx=(([A-Z_][A-Z0-9_]*)|(__.*__))$|logger
---
> #const-rgx=
399c389
< #function-naming-style=
---
> function-naming-style=snake_case
403c393
< function-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
---
> #function-rgx=
411,415c401
<            _,
<            x, y, X, Y,
<            sc,
<            df,
<            PIL_decode, PIL_decode_and_resize, PIL_to_imageStruct
---
>            _
428c414
< #method-naming-style=
---
> method-naming-style=snake_case
432,433c418
< method-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$|(test_[a-zA-Z0-9_]{1,63})$
< #              [_]camelCase 3-30     |  __camelCase__ >1        |  _snake_case 3-30      | _snake_case | __snake_case__        | test_whatEva_h
---
> #method-rgx=
436c421
< #module-naming-style=
---
> module-naming-style=snake_case
440c425
< module-rgx=([a-z_][a-z0-9_]*)$|([a-z_][a-zA-Z0-9]*)$
---
> #module-rgx=
455c440
< #variable-naming-style=
---
> variable-naming-style=snake_case
459c444
< variable-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
---
> #variable-rgx=
465c450
< max-args=15
---
> max-args=5
477c462
< max-locals=31
---
> max-locals=15
480c465
< max-parents=15
---
> max-parents=7

yogeshg · 2018-04-24T21:59:56Z

diff between suggested and accepted file.

131c131,137
<         no-member 
---
>         no-member,
>         missing-docstring,
>         no-init,
>         protected-access,
>         misplaced-comparison-constant,
>         no-else-return,
>         fixme
209c215,217
<       TODO
---
>       TODO,
>       fixme,
>       todo
356c364
< #attr-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
---
> attr-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
380c388
< const-naming-style=UPPER_CASE
---
> #const-naming-style=
384c392
< #const-rgx=
---
> const-rgx=(([A-Z_][A-Z0-9_]*)|(__.*__))$|logger
397d404
< 
406c413,415
<            sc
---
>            sc,
>            df,
>            PIL_decode, PIL_decode_and_resize, PIL_to_imageStruct
423c432,433
< method-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
---
> method-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$|(test_[a-zA-Z0-9_]{1,63})$
> #              [_]camelCase 3-30     |  __camelCase__ >1        |  _snake_case 3-30      | _snake_case | __snake_case__        | test_whatEva_h
426c436
< module-naming-style=snake_case
---
> #module-naming-style=
430c440
< #module-rgx=
---
> module-rgx=([a-z_][a-z0-9_]*)$|([a-z_][a-zA-Z0-9]*)$

smurching

Thanks for the updates, LGTM!

smurching · 2018-04-25T18:42:34Z

Thanks for the PR! Merging with master :)

yogeshg added 30 commits April 20, 2018 15:33

[ML-3487] add accepted and suggested pylint rc files

4062166

space, comments, messages

f3b8f63

refactor: rename or ignore invalid-names; remove unused import

9edd45b

* remove unused type * rename globals * ignore a variable name

bugfix: variables named without updating in error message

b7f8c5a

space, comment, alignment changes

4681189

pylint disable protected-access

5683cd7

ignore stlye case wise

6328650

bugfix

a8a5062

pylint snake_case and camelCase attribute names allowed

6a92e56

ignore stlye case wise

e6e2ce2

add spaces, indent, expressions to simplify reading

50010e0

pylint disable what can be kept

6e69803

if len if -> if any; imports, ignore importing issues

fb35644

add spaces, indent, group imports

1637a09

requires refactoring

50b6dba

Undefined variable name 'imageType' in __all__

06013ec

add spaces, indent, group imports

65c2e2e

add and ignore todos

92c20d1

no len as comparsion, indent, simple expressions

852625b

add spaces, indent

a66e6f6

no-else-return seems weird on local machine, disabling

07ba657

group imports, ignore g, op names

ecb4181

remove trailing white spaces

f92fc4a

ignore import error, add refactor todo, lazy logging

018e36d

fix imports, disable some

2054849

use output of _validateParam, ignore too few in ThreadSafeIterator

6df2b2e

optimize imports, ignore fixme, no-self-use

862246b

optimize imports

5cbcd7c

optimize imports, space

9c170e8

fix cyclic import (sparkdl.param -> sparkdl.param.image_params)

5cd8161

optimize imports

5ce5b56

smurching reviewed Apr 24, 2018

View reviewed changes

smurching requested review from smurching and removed request for smurching April 24, 2018 02:01

smurching self-assigned this Apr 24, 2018

tomasatdatabricks approved these changes Apr 24, 2018

View reviewed changes

fixmes in code are acceptable; throw away value of validate_params; s…

e90d3ad

…z -> resize_image

smurching approved these changes Apr 25, 2018

View reviewed changes

smurching merged commit 16415f7 into databricks:master Apr 25, 2018

yogeshg deleted the ML-3487 branch April 25, 2018 19:44

yogeshg mentioned this pull request Apr 26, 2018

Dev scripts #118

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add style checks and refactor suggestions #121

Add style checks and refactor suggestions #121

yogeshg commented Apr 24, 2018 •

edited

codecov-io commented Apr 24, 2018 •

edited

smurching left a comment

smurching Apr 24, 2018

yogeshg Apr 24, 2018 •

edited

smurching Apr 24, 2018

tomasatdatabricks Apr 24, 2018

yogeshg Apr 24, 2018

smurching Apr 24, 2018

yogeshg Apr 24, 2018

smurching Apr 24, 2018 •

edited

yogeshg Apr 24, 2018

tomasatdatabricks Apr 24, 2018

yogeshg commented Apr 24, 2018 •

edited

tomasatdatabricks left a comment

tomasatdatabricks Apr 24, 2018

tomasatdatabricks Apr 24, 2018

yogeshg commented Apr 24, 2018

yogeshg commented Apr 24, 2018

smurching left a comment

smurching commented Apr 25, 2018

Add style checks and refactor suggestions #121

Add style checks and refactor suggestions #121

Conversation

yogeshg commented Apr 24, 2018 • edited

codecov-io commented Apr 24, 2018 • edited

Codecov Report

smurching left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yogeshg Apr 24, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

smurching Apr 24, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yogeshg commented Apr 24, 2018 • edited

tomasatdatabricks left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yogeshg commented Apr 24, 2018

yogeshg commented Apr 24, 2018

smurching left a comment

Choose a reason for hiding this comment

smurching commented Apr 25, 2018

yogeshg commented Apr 24, 2018 •

edited

codecov-io commented Apr 24, 2018 •

edited

yogeshg Apr 24, 2018 •

edited

smurching Apr 24, 2018 •

edited

yogeshg commented Apr 24, 2018 •

edited