Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add style checks and refactor suggestions #121

Merged
merged 32 commits into from Apr 25, 2018
Merged

Conversation

yogeshg
Copy link
Contributor

@yogeshg yogeshg commented Apr 24, 2018

In this PR, we

  • add python/.pylint/suggested.rc adapted from the default configuration generated by pylint
  • allow both camelCase and snake_case using regexes lifted from pylint source code
  • increase thresholds for number of arguments, local, variables
  • disable checks that are used often in this project: unused-argument, too-many-arguments, no-member, missing-docstring, no-init, protected-access, misplaced-comparison-constant, no-else-return, fixme
  • escape some code with # pylint: disable=... because it was hard to refactor without thorough testing

Some style decisions that were discussed are:

  • disables are acceptable if there is no other way to do this, in which case a comment must be left explaining that
  • other disables should be removed and should be considered similar to todos
  • we allow todo marks in code because these are acceptable for this project and should be taken care of in future
  • there are 50 todos, fixmes or pylint disables currently, we should aim to bring this down
    find python/sparkdl | grep ".*\.py$" | xargs egrep -ino --color=auto "(TODO|FIXME|# pylint).*"
  • function calls and function defintions that span more than 1 line are left to committer and reviewer's discretion
    • pep8 style:
    long_function_name(
        long_argument_one = "one",
        long_argument_two = "two",
        long_argument_three = "three",
        long_argument_four = "four",
        long_argument_five = "five")
    
    • MLlib style:
    long_function_name(
        long_argument_one = "one", long_argument_two = "two", long_argument_three = "three",
        long_argument_four = "four", long_argument_five = "five")
    

 * remove unused type
 * rename globals
 * ignore a variable name
@codecov-io
Copy link

codecov-io commented Apr 24, 2018

Codecov Report

Merging #121 into master will decrease coverage by 0.01%.
The diff coverage is 93.93%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #121      +/-   ##
==========================================
- Coverage   85.38%   85.37%   -0.02%     
==========================================
  Files          33       33              
  Lines        1923     1921       -2     
  Branches       44       44              
==========================================
- Hits         1642     1640       -2     
  Misses        281      281
Impacted Files Coverage Δ
...n/sparkdl/estimators/keras_image_file_estimator.py 88.97% <ø> (ø) ⬆️
python/sparkdl/__init__.py 100% <ø> (ø) ⬆️
python/sparkdl/graph/input.py 98.05% <ø> (ø) ⬆️
python/sparkdl/graph/pieces.py 96.55% <0%> (ø) ⬆️
python/sparkdl/transformers/keras_applications.py 85.71% <100%> (ø) ⬆️
python/sparkdl/transformers/named_image.py 90.06% <100%> (-0.07%) ⬇️
python/sparkdl/graph/builder.py 93.85% <100%> (+0.1%) ⬆️
python/sparkdl/transformers/keras_tensor.py 100% <100%> (ø) ⬆️
python/sparkdl/transformers/keras_image.py 100% <100%> (ø) ⬆️
python/sparkdl/param/shared_params.py 86.13% <100%> (-0.27%) ⬇️
... and 12 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9f05fc2...e90d3ad. Read the comment docs.

Copy link
Collaborator

@smurching smurching left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yogeshg this is great, thanks for making this PR :)

Had a few Qs in comments below. Was also wondering - it looks like there are some pylint-disabling annotations that appear frequently & it'd be nice to find a way to have slightly fewer style annotations in our code:

  • # pylint: disable=too-few-public-methods. This shows up a lot but it looks like it actually catches non-Pythonic code more than scenarios where we simply extend a class & so don't have many public methods, so I'm fine keeping it. I guess the alternative would be to disable it or reduce the min number of public methods in .pylint.
  • pylint: disable=fixme. Is it possible to remove the need for this annotation by just allowing TODOs etc as per https://stackoverflow.com/questions/33157982/how-do-i-disable-todo-warnings-in-pylint?
  • # pylint: disable=invalid-name. Can we get rid of this by renaming variables?

Thanks again, happy you made this PR party-hat-emoji

@@ -0,0 +1,556 @@
[MASTER]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QQ: how'd you pick the settings in this file (& in .pylint/suggested.rc)? is there a default PEP8 pylint file that we can/should diff this one against?

Copy link
Contributor Author

@yogeshg yogeshg Apr 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generated the default rc file that pylint uses and made incremental changes to suggested.rc and accepted.rc. I am adding a diff too, so that larger audience can see that. Thanks for pointing out!

(dev3) yo@C02VM7GBHTD5:~/code/spark-deep-learning$ pylint --generate-rcfile > python/.pylint/default.rc
No config file found, using default configuration

Note: on my local python2 and 3 generate sections in different order, so use python2.7.

@@ -264,7 +265,7 @@ def fitMultiple(self, dataset, paramMaps):
existence of a sufficiently large (and writable) file system, users are
advised to not train too many models in a single Spark job.
"""
[self._validateParams(pm) for pm in paramMaps]
assert all([self._validateParams(pm) for pm in paramMaps])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: why the assert? looks like we already raise a ValueError if a paramMap is invalid in _validateParams

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I agree, you could move the assert above to signal that it is always expected to be there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert any([...]) was changed to _ = [...] for readability, because it suggested that the error would be thrown because of the assert, but in reality the error was thrown in _validateParams

@@ -83,15 +84,15 @@ def asGraphFunction(self, inputs, outputs, strip_and_freeze=True):

:param inputs: list, graph elements representing the inputs
:param outputs: list, graph elements representing the outputs
:param strip_and_freeze: bool, should we remove unused part of the graph and freee its values
:param strip_and_freeze: bool, should we remove unused part of the graph and free its values
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol I think it's actually supposed to be freeze instead of free :P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no idea how this happened :P

transformer = _NamedImageTransformer(inputCol=self.getInputCol(),
outputCol=self._getIntermediateOutputCol(),
modelName=self.getModelName(), featurize=False)
transformer = _NamedImageTransformer(
Copy link
Collaborator

@smurching smurching Apr 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one of the style things we're inconsistent about in this repo is how to handle wrapping function calls that span > 1 line. i think we'd decided to use multiple lines but try to fit as many arguments as possible onto each line, e.g:

        transformer = _NamedImageTransformer(
            inputCol=self.getInputCol(), outputCol=self._getIntermediateOutputCol(),
            modelName=self.getModelName(), featurize=False)

i'll let @sueann confirm, but i'm fine with this as is too :) (especially since it seems like it'd be a pretty substantial change to make the style consistent everywhere)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with the path of least resistance while making changes, however I agree we should decide and document this change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think it's better to compact the code as much as possible. It looks fine in this case but this pattern wastes a lot of space elsewhere and makes it harder to read in my opinion.

@smurching smurching requested review from smurching and removed request for smurching April 24, 2018 02:01
@smurching smurching self-assigned this Apr 24, 2018
@yogeshg
Copy link
Contributor Author

yogeshg commented Apr 24, 2018

Thanks for the review @smurching and @tomasatdatabricks !
I update with following changes:

  • freeze instead of free: python/sparkdl/graph/builder.py
  • assert any([...]) was changed to _ = [...] for readability, because it suggested that the error would be thrown because of the assert, but in reality the error was thrown in _validateParams
  • disable=fixme was removed from accepted.rc because todos are acceptable for this project and should be taken care of in future

Following things were discussed offline:

  • Function calls (and definitions) that span > 1 line are left on developer and reviewer's discretion
  • the settings were chosen incrementally after generating a default rc from the tool, diffs added to PR
  • too-few-public-methods, invalid-name and other disables that are good to keep but hard to change (50 such) are left in the code with the intent to decrease them

Copy link
Collaborator

@tomasatdatabricks tomasatdatabricks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with two minor comments. Thanks Yogesh!

transformer = _NamedImageTransformer(inputCol=self.getInputCol(),
outputCol=self._getIntermediateOutputCol(),
modelName=self.getModelName(), featurize=False)
transformer = _NamedImageTransformer(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think it's better to compact the code as much as possible. It looks fine in this case but this pattern wastes a lot of space elsewhere and makes it harder to read in my opinion.

@@ -264,7 +265,7 @@ def fitMultiple(self, dataset, paramMaps):
existence of a sufficiently large (and writable) file system, users are
advised to not train too many models in a single Spark job.
"""
[self._validateParams(pm) for pm in paramMaps]
assert all([self._validateParams(pm) for pm in paramMaps])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I agree, you could move the assert above to signal that it is always expected to be there

@yogeshg
Copy link
Contributor Author

yogeshg commented Apr 24, 2018

difference in accepted.rc vs default rc file generated by pylint.

129,137c129
<         dict-values-not-iterating,
<         unused-argument, too-many-arguments,
<         no-member,
<         missing-docstring,
<         no-init,
<         protected-access,
<         misplaced-comparison-constant,
<         no-else-return,
<         fixme
---
>         dict-values-not-iterating
215,217c207
<       TODO,
<       fixme,
<       todo
---
>       TODO
353c343
< #argument-naming-style=
---
> argument-naming-style=snake_case
357c347
< argument-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
---
> #argument-rgx=
360c350
< #attr-naming-style=
---
> attr-naming-style=snake_case
364c354
< attr-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
---
> #attr-rgx=
388c378
< #const-naming-style=
---
> const-naming-style=UPPER_CASE
392c382
< const-rgx=(([A-Z_][A-Z0-9_]*)|(__.*__))$|logger
---
> #const-rgx=
399c389
< #function-naming-style=
---
> function-naming-style=snake_case
403c393
< function-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
---
> #function-rgx=
411,415c401
<            _,
<            x, y, X, Y,
<            sc,
<            df,
<            PIL_decode, PIL_decode_and_resize, PIL_to_imageStruct
---
>            _
428c414
< #method-naming-style=
---
> method-naming-style=snake_case
432,433c418
< method-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$|(test_[a-zA-Z0-9_]{1,63})$
< #              [_]camelCase 3-30     |  __camelCase__ >1        |  _snake_case 3-30      | _snake_case | __snake_case__        | test_whatEva_h
---
> #method-rgx=
436c421
< #module-naming-style=
---
> module-naming-style=snake_case
440c425
< module-rgx=([a-z_][a-z0-9_]*)$|([a-z_][a-zA-Z0-9]*)$
---
> #module-rgx=
455c440
< #variable-naming-style=
---
> variable-naming-style=snake_case
459c444
< variable-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
---
> #variable-rgx=
465c450
< max-args=15
---
> max-args=5
477c462
< max-locals=31
---
> max-locals=15
480c465
< max-parents=15
---
> max-parents=7

@yogeshg
Copy link
Contributor Author

yogeshg commented Apr 24, 2018

diff between suggested and accepted file.

131c131,137
<         no-member 
---
>         no-member,
>         missing-docstring,
>         no-init,
>         protected-access,
>         misplaced-comparison-constant,
>         no-else-return,
>         fixme
209c215,217
<       TODO
---
>       TODO,
>       fixme,
>       todo
356c364
< #attr-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
---
> attr-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
380c388
< const-naming-style=UPPER_CASE
---
> #const-naming-style=
384c392
< #const-rgx=
---
> const-rgx=(([A-Z_][A-Z0-9_]*)|(__.*__))$|logger
397d404
< 
406c413,415
<            sc
---
>            sc,
>            df,
>            PIL_decode, PIL_decode_and_resize, PIL_to_imageStruct
423c432,433
< method-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
---
> method-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$|(test_[a-zA-Z0-9_]{1,63})$
> #              [_]camelCase 3-30     |  __camelCase__ >1        |  _snake_case 3-30      | _snake_case | __snake_case__        | test_whatEva_h
426c436
< module-naming-style=snake_case
---
> #module-naming-style=
430c440
< #module-rgx=
---
> module-rgx=([a-z_][a-z0-9_]*)$|([a-z_][a-zA-Z0-9]*)$

Copy link
Collaborator

@smurching smurching left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates, LGTM!

@smurching
Copy link
Collaborator

Thanks for the PR! Merging with master :)

@smurching smurching merged commit 16415f7 into databricks:master Apr 25, 2018
@yogeshg yogeshg deleted the ML-3487 branch April 25, 2018 19:44
@yogeshg yogeshg mentioned this pull request Apr 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants