Add style checks and refactor suggestions #121
* remove unused type * rename globals * ignore a variable name
Codecov Report
@@ Coverage Diff @@
## master #121 +/- ##
==========================================
- Coverage 85.38% 85.37% -0.02%
==========================================
Files 33 33
Lines 1923 1921 -2
Branches 44 44
==========================================
- Hits 1642 1640 -2
Misses 281 281
@yogeshg this is great, thanks for making this PR :)
Had a few Qs in comments below. Was also wondering - it looks like there are some pylint-disabling annotations that appear frequently & it'd be nice to find a way to have slightly fewer style annotations in our code:
* # pylint: disable=too-few-public-methods - This shows up a lot but it looks like it actually catches non-Pythonic code more than scenarios where we simply extend a class & so don't have many public methods, so I'm fine keeping it. I guess the alternative would be to disable it or reduce the min number of public methods in .pylint.
* # pylint: disable=fixme - Is it possible to remove the need for this annotation by just allowing TODOs etc as per https://stackoverflow.com/questions/33157982/how-do-i-disable-todo-warnings-in-pylint?
* # pylint: disable=invalid-name - Can we get rid of this by renaming variables?
Thanks again, happy you made this PR!
@@ -0,0 +1,556 @@
+[MASTER]
QQ: how'd you pick the settings in this file (& in .pylint/suggested.rc)? Is there a default PEP8 pylint file that we can/should diff this one against?
I generated the default rc file that pylint uses and made incremental changes to suggested.rc and accepted.rc. I am adding a diff too, so that a larger audience can see it. Thanks for pointing this out!
(dev3) yo@C02VM7GBHTD5:~/code/spark-deep-learning$ pylint --generate-rcfile > python/.pylint/default.rc
No config file found, using default configuration
Note: on my machine, Python 2 and Python 3 generate the sections in a different order, so use Python 2.7.
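For anyone who prefers to compare a customized rc file against the generated default from Python rather than the shell, something like the following could work. This is only a sketch: the helper name `diff_rc_text` is hypothetical, and it takes file contents as strings so that it does not assume any particular path layout.

```python
import difflib


def diff_rc_text(default_text, custom_text):
    """Return unified-diff lines between two pylint rc file contents."""
    return list(difflib.unified_diff(
        default_text.splitlines(), custom_text.splitlines(),
        fromfile="default.rc", tofile="suggested.rc", lineterm=""))


if __name__ == "__main__":
    # Tiny inline samples; in practice the strings would be read from the rc files.
    default_rc = "[DESIGN]\nmax-args=5\nmax-locals=15\n"
    custom_rc = "[DESIGN]\nmax-args=15\nmax-locals=31\n"
    print("\n".join(diff_rc_text(default_rc, custom_rc)))
```

Because section order can differ between Python versions (as noted above), both files should be generated with the same interpreter before diffing.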
@@ -264,7 +265,7 @@ def fitMultiple(self, dataset, paramMaps):
 existence of a sufficiently large (and writable) file system, users are
 advised to not train too many models in a single Spark job.
 """
-[self._validateParams(pm) for pm in paramMaps]
+assert all([self._validateParams(pm) for pm in paramMaps])
Q: why the assert? looks like we already raise a ValueError if a paramMap is invalid in _validateParams
yeah I agree, you could move the assert above to signal that it is always expected to be there
assert any([...]) was changed to _ = [...] for readability: the assert suggested that the error would be thrown by the assert itself, when in reality it is thrown in _validateParams.
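The point about the assert can be illustrated with a small sketch. Here `validate_params` is a hypothetical stand-in for `_validateParams`; it is assumed to raise `ValueError` on bad input and to return nothing otherwise, which may differ from the real method.

```python
def validate_params(param_map):
    """Hypothetical validator: raises ValueError on invalid input, returns None otherwise."""
    if "lr" not in param_map:
        raise ValueError("missing required param: lr")


def validate_all(param_maps):
    """Validate every map; any ValueError comes from validate_params, not from an assert."""
    for pm in param_maps:
        validate_params(pm)


if __name__ == "__main__":
    validate_all([{"lr": 0.1}, {"lr": 0.01}])  # passes silently
    try:
        validate_all([{"lr": 0.1}, {}])
    except ValueError as err:
        print("rejected:", err)
```

Under that assumption, wrapping the calls in `assert all([...])` would even fail on valid input, because a list of `None` values is not all-truthy, and asserts are skipped entirely when Python runs with `-O`. A plain loop (or `_ = [...]`) keeps the source of the error obvious.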
python/sparkdl/graph/builder.py
Outdated
@@ -83,15 +84,15 @@ def asGraphFunction(self, inputs, outputs, strip_and_freeze=True):
 :param inputs: list, graph elements representing the inputs
 :param outputs: list, graph elements representing the outputs
-:param strip_and_freeze: bool, should we remove unused part of the graph and freee its values
+:param strip_and_freeze: bool, should we remove unused part of the graph and free its values
lol I think it's actually supposed to be freeze instead of free :P
no idea how this happened :P
-transformer = _NamedImageTransformer(inputCol=self.getInputCol(),
-                                     outputCol=self._getIntermediateOutputCol(),
-                                     modelName=self.getModelName(), featurize=False)
+transformer = _NamedImageTransformer(
one of the style things we're inconsistent about in this repo is how to handle wrapping function calls that span > 1 line. i think we'd decided to use multiple lines but try to fit as many arguments as possible onto each line, e.g:
transformer = _NamedImageTransformer(
    inputCol=self.getInputCol(), outputCol=self._getIntermediateOutputCol(),
    modelName=self.getModelName(), featurize=False)
i'll let @sueann confirm, but i'm fine with this as is too :) (especially since it seems like it'd be a pretty substantial change to make the style consistent everywhere)
I went with the path of least resistance while making changes, however I agree we should decide and document this change.
Yeah I think it's better to compact the code as much as possible. It looks fine in this case but this pattern wastes a lot of space elsewhere and makes it harder to read in my opinion.
Thanks for the review @smurching and @tomasatdatabricks !
Following things were discussed offline:
lgtm with two minor comments. Thanks Yogesh!
…z -> resize_image
difference in
129,137c129
< dict-values-not-iterating,
< unused-argument, too-many-arguments,
< no-member,
< missing-docstring,
< no-init,
< protected-access,
< misplaced-comparison-constant,
< no-else-return,
< fixme
---
> dict-values-not-iterating
215,217c207
< TODO,
< fixme,
< todo
---
> TODO
353c343
< #argument-naming-style=
---
> argument-naming-style=snake_case
357c347
< argument-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
---
> #argument-rgx=
360c350
< #attr-naming-style=
---
> attr-naming-style=snake_case
364c354
< attr-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
---
> #attr-rgx=
388c378
< #const-naming-style=
---
> const-naming-style=UPPER_CASE
392c382
< const-rgx=(([A-Z_][A-Z0-9_]*)|(__.*__))$|logger
---
> #const-rgx=
399c389
< #function-naming-style=
---
> function-naming-style=snake_case
403c393
< function-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
---
> #function-rgx=
411,415c401
< _,
< x, y, X, Y,
< sc,
< df,
< PIL_decode, PIL_decode_and_resize, PIL_to_imageStruct
---
> _
428c414
< #method-naming-style=
---
> method-naming-style=snake_case
432,433c418
< method-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$|(test_[a-zA-Z0-9_]{1,63})$
< # [_]camelCase 3-30 | __camelCase__ >1 | _snake_case 3-30 | _snake_case | __snake_case__ | test_whatEva_h
---
> #method-rgx=
436c421
< #module-naming-style=
---
> module-naming-style=snake_case
440c425
< module-rgx=([a-z_][a-z0-9_]*)$|([a-z_][a-zA-Z0-9]*)$
---
> #module-rgx=
455c440
< #variable-naming-style=
---
> variable-naming-style=snake_case
459c444
< variable-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
---
> #variable-rgx=
465c450
< max-args=15
---
> max-args=5
477c462
< max-locals=31
---
> max-locals=15
480c465
< max-parents=15
---
> max-parents=7
diff between suggested and accepted file.
131c131,137
< no-member
---
> no-member,
> missing-docstring,
> no-init,
> protected-access,
> misplaced-comparison-constant,
> no-else-return,
> fixme
209c215,217
< TODO
---
> TODO,
> fixme,
> todo
356c364
< #attr-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
---
> attr-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
380c388
< const-naming-style=UPPER_CASE
---
> #const-naming-style=
384c392
< #const-rgx=
---
> const-rgx=(([A-Z_][A-Z0-9_]*)|(__.*__))$|logger
397d404
<
406c413,415
< sc
---
> sc,
> df,
> PIL_decode, PIL_decode_and_resize, PIL_to_imageStruct
423c432,433
< method-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$
---
> method-rgx=(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$|(test_[a-zA-Z0-9_]{1,63})$
> # [_]camelCase 3-30 | __camelCase__ >1 | _snake_case 3-30 | _snake_case | __snake_case__ | test_whatEva_h
426c436
< module-naming-style=snake_case
---
> #module-naming-style=
430c440
< #module-rgx=
---
> module-rgx=([a-z_][a-z0-9_]*)$|([a-z_][a-zA-Z0-9]*)$
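To see what the naming regex in these diffs accepts, it can be tried directly with Python's `re` module. The pattern below is copied from the `variable-rgx` / `function-rgx` / `argument-rgx` lines; the helper name `is_accepted_name` is hypothetical, and matching from the start of the name is an assumption about how the pattern is applied.

```python
import re

# Copied from the variable-rgx / function-rgx / argument-rgx lines in the diff.
NAME_RGX = re.compile(
    r"(([a-z_][a-zA-Z0-9]{2,30})|(__[a-z][a-zA-Z0-9_]+__))$"
    r"|(([a-z_][a-z0-9_]{2,30})|(_[a-z0-9_]*)|(__[a-z][a-z0-9_]+__))$"
)


def is_accepted_name(name):
    """True if the whole name matches, anchored at the start (assumed usage)."""
    return NAME_RGX.match(name) is not None


if __name__ == "__main__":
    for name in ["fitMultiple", "resize_image", "_", "x", "mixed_caseName", "Foo"]:
        print(name, is_accepted_name(name))
```

The first alternative covers camelCase and the second covers snake_case, so a name has to commit to one style: `fitMultiple` and `resize_image` both pass, while `mixed_caseName` does not. Short names such as `x` fail the length bound, which is presumably why they appear in the good-names list instead.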
Thanks for the updates, LGTM!
Thanks for the PR! Merging with master :)
In this PR, we
* add python/.pylint/suggested.rc, adapted from the default configuration generated by pylint
* allow both camelCase and snake_case, using regexes lifted from pylint source code
* disable unused-argument, too-many-arguments, no-member, missing-docstring, no-init, protected-access, misplaced-comparison-constant, no-else-return, fixme
* add # pylint: disable=... annotations where it was hard to refactor without thorough testing
Some style decisions that were discussed are:
* allow todo marks in code, because these are acceptable for this project and should be taken care of in future
find python/sparkdl | grep ".*\.py$" | xargs egrep -ino --color=auto "(TODO|FIXME|# pylint).*"
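A rough Python equivalent of the find/grep pipeline above, for environments without those tools, might look like this. The function name `find_markers` is hypothetical; the pattern mirrors the egrep expression, with `re.IGNORECASE` standing in for the `-i` flag.

```python
import re
from pathlib import Path

# Same alternatives as the egrep pattern; IGNORECASE mirrors the -i flag.
MARKER = re.compile(r"(TODO|FIXME|# pylint).*", re.IGNORECASE)


def find_markers(root):
    """Yield (path, line_number, matched_text) for each marker in .py files under root."""
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = MARKER.search(line)
            if match:
                yield str(path), lineno, match.group(0)


if __name__ == "__main__":
    for path, lineno, text in find_markers("python/sparkdl"):
        print(f"{path}:{lineno}:{text}")
```

Like `egrep -o`, this reports only the matched part of each line, starting at the marker and running to the end of the line.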