[SPARK-8706] [PySpark] [Project infra] Add pylint checks to PySpark #7241

Closed
wants to merge 9 commits into from

MechCoder
Contributor

This adds Pylint checks to PySpark.

For now, this lazily installs Pylint with easy_install into /dev/pylint (similar to the pep8 script).
We still need to decide which rules to allow.

@MechCoder
Contributor Author

ping @JoshRosen

For now, it gives a score of 5.5 out of 10:

+-------------------------------+------------+
|message id |occurrences |
+===============================+============+
|invalid-name |2480 |
+-------------------------------+------------+
|missing-docstring |812 |
+-------------------------------+------------+
|protected-access |579 |
+-------------------------------+------------+
|unused-argument |364 |
+-------------------------------+------------+
|no-member |132 |
+-------------------------------+------------+
|unused-wildcard-import |110 |
+-------------------------------+------------+
|redefined-builtin |110 |
+-------------------------------+------------+
|too-many-arguments |74 |
+-------------------------------+------------+
|unused-variable |72 |
+-------------------------------+------------+
|too-few-public-methods |62 |
+-------------------------------+------------+
|bad-continuation |57 |
+-------------------------------+------------+
|duplicate-code |47 |
+-------------------------------+------------+
|redefined-outer-name |44 |
+-------------------------------+------------+
|too-many-ancestors |41 |
+-------------------------------+------------+
|import-error |37 |
+-------------------------------+------------+
|superfluous-parens |30 |
+-------------------------------+------------+
|unused-import |28 |
+-------------------------------+------------+
|line-too-long |25 |
+-------------------------------+------------+
|no-name-in-module |24 |
+-------------------------------+------------+
|unnecessary-lambda |23 |
+-------------------------------+------------+
|import-self |23 |
+-------------------------------+------------+
|no-self-use |22 |
+-------------------------------+------------+
|unidiomatic-typecheck |20 |
+-------------------------------+------------+
|fixme |20 |
+-------------------------------+------------+
|too-many-locals |19 |
+-------------------------------+------------+
|cyclic-import |19 |
+-------------------------------+------------+
|bad-builtin |19 |
+-------------------------------+------------+
|too-many-branches |15 |
+-------------------------------+------------+
|bare-except |14 |
+-------------------------------+------------+
|wildcard-import |13 |
+-------------------------------+------------+
|dangerous-default-value |13 |
+-------------------------------+------------+
|broad-except |13 |
+-------------------------------+------------+
|too-many-public-methods |9 |
+-------------------------------+------------+
|deprecated-lambda |9 |
+-------------------------------+------------+
|anomalous-backslash-in-string |9 |
+-------------------------------+------------+
|too-many-lines |8 |
+-------------------------------+------------+
|reimported |8 |
+-------------------------------+------------+
|too-many-statements |7 |
+-------------------------------+------------+
|bad-whitespace |7 |
+-------------------------------+------------+
|unpacking-non-sequence |6 |
+-------------------------------+------------+
|too-many-instance-attributes |6 |
+-------------------------------+------------+
|abstract-method |6 |
+-------------------------------+------------+
|old-style-class |5 |
+-------------------------------+------------+
|global-statement |5 |
+-------------------------------+------------+
|attribute-defined-outside-init |5 |
+-------------------------------+------------+
|arguments-differ |5 |
+-------------------------------+------------+
|undefined-all-variable |4 |
+-------------------------------+------------+
|no-init |4 |
+-------------------------------+------------+
|useless-else-on-loop |3 |
+-------------------------------+------------+
|super-init-not-called |3 |
+-------------------------------+------------+
|notimplemented-raised |3 |
+-------------------------------+------------+
|blacklisted-name |3 |
+-------------------------------+------------+
|trailing-whitespace |2 |
+-------------------------------+------------+
|too-many-return-statements |2 |
+-------------------------------+------------+
|pointless-string-statement |2 |
+-------------------------------+------------+
|global-variable-undefined |2 |
+-------------------------------+------------+
|bad-classmethod-argument |2 |
+-------------------------------+------------+
|too-many-format-args |1 |
+-------------------------------+------------+
|pointless-statement |1 |
+-------------------------------+------------+
|parse-error |1 |
+-------------------------------+------------+
|no-self-argument |1 |
+-------------------------------+------------+
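For context on how a number like 5.5 arises: in the Pylint versions of this era, the default `evaluation` setting (visible in a generated rcfile) is `10.0 - ((float(5 * error + warning + refactor + convention) / statement) * 10)`. Errors weigh five times as much as other messages, and the total is normalized per analyzed statement. The function below is a minimal sketch of that expression; the counts in the usage line are made-up inputs, not the figures behind the table above.

```python
def pylint_score(error, warning, refactor, convention, statement):
    """Mirror Pylint's (old) default ``evaluation`` expression.

    Errors count five times; warning, refactor and convention messages
    count once each. The weighted total is divided by the number of
    statements and scaled to 10. This older form can go below zero.
    """
    if statement <= 0:
        raise ValueError("statement count must be positive")
    weighted = 5 * error + warning + refactor + convention
    return 10.0 - (float(weighted) / statement) * 10


# Hypothetical counts, for illustration only.
print(pylint_score(error=2, warning=30, refactor=10, convention=50, statement=400))  # prints 7.5
```

Because the denominator is the statement count, the score can improve either by fixing messages or by disabling checks in the configuration, which is why the threshold alone says little about code quality.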

@JoshRosen
Contributor

I think that we should start by creating a PyLint configuration file which disables checks that we're not interested in fixing or that are too noisy to be useful. I'd start by disabling the checks which warn about excessive complexity, since those will be the hardest to fix and are some of the least helpful warnings (e.g. too-many-return-statements, too-many-instance-attributes, too-many-ancestors, etc.).
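A minimal sketch of such a configuration, assuming Pylint's INI-style rcfile with a `[MESSAGES CONTROL]` section whose `disable` option takes a comma-separated list of message ids. The ids below are the complexity-related ones from the report above; which of them to disable is a judgment call, and the helper name `build_rcfile` is hypothetical.

```python
import configparser
import io

# Complexity-related message ids taken from the report above.
COMPLEXITY_CHECKS = [
    "too-many-return-statements",
    "too-many-instance-attributes",
    "too-many-ancestors",
    "too-many-arguments",
    "too-many-locals",
    "too-many-branches",
    "too-many-statements",
    "too-many-public-methods",
    "too-few-public-methods",
]


def build_rcfile(disabled):
    """Return the text of a minimal rcfile that disables the given checks."""
    # interpolation=None so '%' characters are never treated specially.
    config = configparser.ConfigParser(interpolation=None)
    config["MESSAGES CONTROL"] = {"disable": ",".join(disabled)}
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


print(build_rcfile(COMPLEXITY_CHECKS))
```

Redirecting that output to a file (for example a hypothetical `dev/pylintrc`) and passing it with `--rcfile` would keep the list of disabled checks under version control, next to the lint script.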

@MechCoder
Contributor Author

Okay. Could you please check quickly if the general approach is okay?

@JoshRosen
Contributor

The overall approach seems fine to me. Can you post the more detailed PyLint output, perhaps as a Gist? I'd like to see whether we're getting false-positives due to PYTHONPATH import issues.

@MechCoder
Contributor Author

@JoshRosen
Contributor

A few off-the-cuff thoughts (will have time to comment in more detail later):

  • We should exclude the heapq3 module for now.
  • We need to redefine the regex used for matching names so that it doesn't complain about Java-style camelCased variable and method names.
  • We might want to disable the single-character variable name warning; let's leave it up to reviewer judgement to handle those.
  • We should disable the "too many lines in module" warning.

Once we disable the noisy variable name regex warnings, we'll be able to get a better sense of how many legitimate warnings we have.
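Pylint checks names against per-kind regular expressions (rcfile options such as `method-rgx`, `variable-rgx`, `argument-rgx` and `attr-rgx`). The sketch below compares a snake_case-only pattern in the style of Pylint's old defaults with a relaxed pattern that also accepts Java-style camelCase and single-character names. Both patterns are illustrative assumptions, not the values Spark ended up using.

```python
import re

# Snake_case-only pattern in the style of Pylint's old defaults:
# 3 to 31 characters of lowercase letters, digits and underscores.
STRICT_NAME = re.compile(r"[a-z_][a-z0-9_]{2,30}$")

# Relaxed pattern (an assumption for illustration): must still start with a
# lowercase letter or underscore, but allows uppercase letters afterwards
# (camelCase) and names as short as one character.
RELAXED_NAME = re.compile(r"[a-z_][a-zA-Z0-9_]{0,30}$")


def accepted(pattern, name):
    # re.match anchors at the start; the trailing $ anchors the end.
    return pattern.match(name) is not None


for candidate in ["num_partitions", "numPartitions", "x", "NumPartitions"]:
    print(candidate, accepted(STRICT_NAME, candidate), accepted(RELAXED_NAME, candidate))
```

With the relaxed pattern, `numPartitions` and `x` pass while a capitalized `NumPartitions` is still flagged, which covers both the camelCase and single-character points above.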

@MechCoder
Contributor Author

Thanks for the quick comments. I just wanted to make sure I was on the right track. I shall ping you after I do the configuration changes.

@MechCoder
Contributor Author

@JoshRosen I have added a configuration file that disables a few warnings and raises the Pylint score to 8.59. However, I suggest we disable all warnings and enable only those which we feel are very important right now (and maybe add others incrementally). wdyt?

The updated gist can be seen here (https://gist.github.com/MechCoder/0c4433ff12f8f156d6f1)
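The "disable everything, then opt in" approach maps onto Pylint's `--disable=all` and `--enable=<ids>` command-line options. A minimal sketch of building such an invocation; the starting set of message ids and the target path `python/pyspark` are hypothetical choices, and running it assumes `pylint` is on the PATH.

```python
import subprocess


def pylint_command(enabled, paths):
    """Build a Pylint invocation that starts from nothing and opts in.

    --disable=all turns every message off; the following --enable
    re-enables only the listed message ids, so further checks can be
    added one follow-up patch at a time.
    """
    return [
        "pylint",
        "--disable=all",
        "--enable=" + ",".join(enabled),
        *paths,
    ]


def run_pylint(enabled, paths):
    # Returns Pylint's exit status; requires Pylint to be installed.
    return subprocess.run(pylint_command(enabled, paths)).returncode


# Hypothetical starting set and target path.
print(pylint_command(["unused-import", "bare-except"], ["python/pyspark"]))
```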

@SparkQA

SparkQA commented Jul 6, 2015

Test build #36589 has finished for PR 7241 at commit d28109f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 6, 2015

Test build #36595 has finished for PR 7241 at commit 892ac22.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 6, 2015

Test build #36596 has finished for PR 7241 at commit 24e13f9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

I think that PyLint has a feature for writing out a nicely formatted default configuration file which captures the values of all settings, including the defaults. Do you mind using that to generate a template configuration, then editing that configuration to reflect your changes? This is in line with what we did for scalastyle.xml.

@SparkQA

SparkQA commented Jul 8, 2015

Test build #36808 has finished for PR 7241 at commit 3464666.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • # mixin class is detected if its name ends with "mixin" (case insensitive).
    • # Maximum number of parents for a class (see R0901).
    • # Maximum number of attributes for a class (see R0902).
    • # Minimum number of public methods for a class (see R0903).
    • # Maximum number of public methods for a class (see R0904).

@MechCoder
Contributor Author

@JoshRosen Sorry for the delay. I have added the default configuration file. Now the question is which checks we would like to enable.

The entire list is here (http://docs.pylint.org/features.html)

@JoshRosen
Contributor

I like your suggestion of disabling most of the warnings for now, then gradually re-enabling them in followup patches.

Is pylint currently configured to treat all warnings as errors (i.e. fail the linter if a warning is printed)? If not, I think we should go ahead and switch to that fail-fast behavior so that we don't unknowingly introduce new violations. Once we get that basic infra in place, we can commit this and iterate on enabling more warnings.
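Fail-fast behavior mostly comes down to not ignoring Pylint's exit status. Pylint ORs bit flags into its exit code to report which message categories were emitted (1 fatal, 2 error, 4 warning, 8 refactor, 16 convention, 32 usage error). A minimal sketch of decoding that status; the policy that any non-zero status fails the build is the fail-fast rule proposed here, and the function names are hypothetical.

```python
# Bit flags Pylint ORs together into its exit status.
PYLINT_STATUS_BITS = {
    1: "fatal",
    2: "error",
    4: "warning",
    8: "refactor",
    16: "convention",
    32: "usage error",
}


def decode_status(code):
    """List the message categories encoded in a Pylint exit status."""
    return [name for bit, name in PYLINT_STATUS_BITS.items() if code & bit]


def should_fail(code):
    # Fail-fast policy: any emitted message, even a convention one, fails.
    return code != 0


print(decode_status(6), should_fail(6))  # prints ['error', 'warning'] True
print(decode_status(0), should_fail(0))  # prints [] False
```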

@SparkQA

SparkQA commented Jul 8, 2015

Test build #36811 has finished for PR 7241 at commit f9dac65.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • # mixin class is detected if its name ends with "mixin" (case insensitive).
    • # Maximum number of parents for a class (see R0901).
    • # Maximum number of attributes for a class (see R0902).
    • # Minimum number of public methods for a class (see R0903).
    • # Maximum number of public methods for a class (see R0904).

@@ -70,4 +70,26 @@ fi
# rm "$PEP8_SCRIPT_PATH"
rm "$PYTHON_LINT_REPORT_PATH"

# Easy install pylint in /dev/pylint. To easy_install into a directory, the PYTHONPATH should
Contributor

Can we group this installation with the pep8 installation above?

@MechCoder
Contributor Author

@JoshRosen Some updates:

  • I disabled only the errors that pylint was already throwing (instead of all of them), so that new errors cannot creep in.
  • ./dev/lint-python now fails with an error instead of silently printing a report.
  • In the last commit, I quickly fixed 2 pylint errors, just for starters.
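Disabling exactly the messages that already fire can be derived from the report itself. A minimal sketch, assuming the report rows look like the `|invalid-name |2480 |` lines in the table above; the helper names are hypothetical. Note the trade-off: every currently clean check stays enforced, while the disabled ones stay off until they are fixed and re-enabled.

```python
import re

# One data row of the report table, e.g. "|invalid-name |2480 |".
ROW = re.compile(r"^\|([a-z0-9-]+)\s*\|(\d+)\s*\|$")


def existing_violations(report_text):
    """Map each message id in the report table to its occurrence count."""
    counts = {}
    for line in report_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def disable_line(report_text):
    # Disable exactly what already fires; every other check stays enabled,
    # so a violation of a currently clean check fails the build.
    return "disable=" + ",".join(sorted(existing_violations(report_text)))


SAMPLE = """\
|message id |occurrences |
+===============================+============+
|invalid-name |2480 |
+-------------------------------+------------+
|bare-except |14 |
"""
print(disable_line(SAMPLE))  # prints disable=bare-except,invalid-name
```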

done

if [ "${PIPESTATUS[0]}" -ne 0 ]; then
lint_status=0
Contributor

Shouldn't this be assigning a non-zero value in order to trigger a build failure?
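The point of this comment is that the failure branch must leave a non-zero status behind for the script to exit with. In the shell script, `${PIPESTATUS[0]}` is the exit status of the first command in a pipeline; in the Python sketch below, `subprocess` returns it directly. The child commands are hypothetical stand-ins for linter runs, not the real Pylint invocation.

```python
import subprocess
import sys


def run_linter(cmd):
    """Run one linter command and return 1 if it failed, else 0."""
    returncode = subprocess.run(cmd).returncode
    # The issue under review: the failure branch must record a NON-zero
    # value, otherwise a failing lint run is reported as success.
    return 1 if returncode != 0 else 0


def lint_all(commands):
    lint_status = 0
    for cmd in commands:
        # Once any linter fails, the overall status stays non-zero.
        lint_status |= run_linter(cmd)
    return lint_status


# Stand-ins for real linter invocations (hypothetical, for illustration).
PASSING = [sys.executable, "-c", "raise SystemExit(0)"]
FAILING = [sys.executable, "-c", "raise SystemExit(4)"]

print(lint_all([PASSING, FAILING]))  # prints 1
```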

@@ -238,7 +238,6 @@ def test_basic_functions(self):
df = self.sqlCtx.jsonRDD(rdd)
df.count()
df.collect()
df.schema
Contributor

What was the pylint error for this line?

Contributor Author

Pointless statement.

Contributor

I suppose that this was testing whether df.schema threw an error or not. Can you re-enable this statement and try adding a comment to exclude this from Pylint to confirm that the exclusion comment works?

Contributor Author

Ah, I see.
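Pylint accepts inline pragmas of the form `# pylint: disable=<message-id>` on a statement, which suppress that message for that line only. A minimal sketch of the suggested change, using a hypothetical `FakeDataFrame` stand-in rather than a real Spark DataFrame: the bare attribute access stays in the test because evaluating the property (and checking that it does not raise) is the point.

```python
import unittest


class FakeDataFrame:
    """Hypothetical stand-in for a DataFrame whose schema is computed lazily."""

    def __init__(self, rows):
        self._rows = rows

    @property
    def schema(self):
        # Accessing the property runs code that could raise.
        if not self._rows:
            raise ValueError("cannot infer schema from empty data")
        return sorted(self._rows[0])


class BasicFunctionsTest(unittest.TestCase):
    def test_schema_access(self):
        df = FakeDataFrame([{"a": 1, "b": "x"}])
        # Evaluated only for its side effect (it must not raise), so the
        # pointless-statement message is suppressed for this line only.
        df.schema  # pylint: disable=pointless-statement

    def test_schema_error(self):
        df = FakeDataFrame([])
        with self.assertRaises(ValueError):
            df.schema  # pylint: disable=pointless-statement
```

Keeping the pragma on the same line, rather than disabling `pointless-statement` for the whole module, leaves the check active for any accidental no-op statements elsewhere.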

@MechCoder
Contributor Author

Thanks for your quick feedback. I have addressed your comments. Btw, is easy_install available on Jenkins?

@MechCoder
Contributor Author

Actually, I cherry-picked the fix onto this branch. That should work, right?

@MechCoder
Contributor Author

I think the code that fails the pylint checks was merged after Jenkins had reported the tests as passing, hence the test failure in master.

@MechCoder
Contributor Author

@JoshRosen If this passes tests, it would be great to merge it as quickly as possible to avoid future test failures.

@davies
Contributor

davies commented Jul 13, 2015

Sounds good to me.

@JoshRosen
Contributor

After we merge this, let's kick off Jenkins re-builds on all of the open PySpark PRs.

@SparkQA

SparkQA commented Jul 13, 2015

Test build #37151 has finished for PR 7241 at commit 2fc7291.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MechCoder
Contributor Author

retest this please

@SparkQA

SparkQA commented Jul 14, 2015

Test build #37196 has finished for PR 7241 at commit 2fc7291.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

Jenkins, retest this please.

@JoshRosen
Contributor

Master's tests might be in a bad state right now. If all else fails, I can try manually running this tomorrow on my local machine to make sure that tests pass, then can merge and re-trigger tests on all Python PRs. Thanks for your patience while we've been fighting build / test break issues.

@SparkQA

SparkQA commented Jul 14, 2015

Test build #8 has finished for PR 7241 at commit 2fc7291.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 14, 2015

Test build #37211 has finished for PR 7241 at commit 2fc7291.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MechCoder
Contributor Author

test this please

@SparkQA

SparkQA commented Jul 15, 2015

Test build #22 has finished for PR 7241 at commit 2fc7291.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 15, 2015

Test build #37345 has finished for PR 7241 at commit 2fc7291.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

Ping @davies, since this is now passing tests do you think that we should try merging it in again and hotfixing any problems?

@davies
Contributor

davies commented Jul 15, 2015

Yes, merging this into master!

@asfgit closed this in 20bb10f on Jul 15, 2015
@MechCoder deleted the pylint branch on July 15, 2015 16:03
@MechCoder
Contributor Author

Thanks for the review!

4 participants