New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-8706] [PySpark] [Project infra] Add pylint checks to PySpark #7241
Conversation
ping @JoshRosen For now, it gives a score of 5.5 on 10 +-------------------------------+------------+ |
I think that we should start by creating a PyLint configuration file which disables checks that we're not interested in fixing / are too noisey to be useful. I'd start by trying to disable the checks which warn about excessive complexity, since those will be the hardest to fix and are some of the least-helpful warnings (e.g. |
Okay. Could you please check quickly if the general approach is okay? |
The overall approach seems fine to me. Can you post the more detailed PyLint output, perhaps as a Gist? I'd like to see whether we're getting false-positives due to PYTHONPATH import issues. |
A few off-the-cuff thoughts (will have time to comment in more detail later):
Once we disable the noisy variable name regex warnings, we'll be able to get a better sense of how many legitimate warnings we have. |
Thanks for the quick comments. I just wanted to make sure I was on the right track. I shall ping you after I do the configuration changes. |
@JoshRosen I have added a configuration file that disables a few warnings and increases the code quality to 8.59 . However, I suggest we disable all warnings and enable only those which we feel are very important right now (and maybe add others incrementally). wdyt? The updated gist can be seen here (https://gist.github.com/MechCoder/0c4433ff12f8f156d6f1) |
Test build #36589 has finished for PR 7241 at commit
|
Test build #36595 has finished for PR 7241 at commit
|
Test build #36596 has finished for PR 7241 at commit
|
I think that PyLint has a feature for writing out a nicely-formatted default configuration file which captures the values of all settings, including the defaults. Do you mind using that to generate a template configuration, then edit that configuration to reflect your changes? This is in line with what we did for scalastyle.xml. |
Test build #36808 has finished for PR 7241 at commit
|
@JoshRosen Sorry for the delay. I have added the default configuration file. Now the question is what tests we would like to enable. The entire list is here (http://docs.pylint.org/features.html) |
I like your suggestion of disabling most of the warnings for now, then gradually re-enabling them in followup patches. Is pylint currently configured to treat all warnings as errors (e.g. fail the linter if a warning is printed)? If not, I think we should go ahead and switch to that fail-fast behavior so that we don't unknowingly introduce new violations. Once we get that basic infra in place, we can commit this and iterate on enabling more warnings. |
Test build #36811 has finished for PR 7241 at commit
|
@@ -70,4 +70,26 @@ fi | |||
# rm "$PEP8_SCRIPT_PATH" | |||
rm "$PYTHON_LINT_REPORT_PATH" | |||
|
|||
# Easy install pylint in /dev/pylint. To easy_install into a directory, the PYTHONPATH should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we group this installation with the pep8 installation above?
@JoshRosen Some updates I disabled only the errors that pylint was already throwing (instead of all), so that it prevents creep of new errors. |
done | ||
|
||
if [ "${PIPESTATUS[0]}" -ne 0 ]; then | ||
lint_status=0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be assigning a non-zero value in order to trigger a build failure?
@@ -238,7 +238,6 @@ def test_basic_functions(self): | |||
df = self.sqlCtx.jsonRDD(rdd) | |||
df.count() | |||
df.collect() | |||
df.schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the pylint error for this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pointless statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose that this was testing whether df.schema
threw an error or not. Can you re-enable this statement and try adding a comment to exclude this from Pylint to confirm that the exclusion comment works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see.
Thanks for your quick feedback. I have addressed your comments. Btw is easy_install made available on Jenkins? |
Actually I cherry-picked the fix onto this branch. That should work right? |
I think the code that failed the pylint tests, was merged after the test status showed passed by Jenkins. Hence the test failure in master. |
@JoshRosen If this does pass tests, it would be great to merge it as quickly to avoid future test failures. |
Sounds good to me. |
After we merge this, let's kick off Jenkins re-builds on all of the open PySpark PRs. |
Test build #37151 has finished for PR 7241 at commit
|
retest this please |
Test build #37196 has finished for PR 7241 at commit
|
Jenkins, retest this please. |
Master's tests might be in a bad state right now. If all else fails, I can try manually running this tomorrow on my local machine to make sure that tests pass, then can merge and re-trigger tests on all Python PRs. Thanks for your patience while we've been fighting build / test break issues. |
Test build #8 has finished for PR 7241 at commit
|
Test build #37211 has finished for PR 7241 at commit
|
test this please |
Test build #22 has finished for PR 7241 at commit
|
Test build #37345 has finished for PR 7241 at commit
|
Ping @davies, since this is now passing tests do you think that we should try merging it in again and hotfixing any problems? |
Yes, merging this into master! |
thanks for the review ! |
This adds Pylint checks to PySpark.
For now this lazy installs using easy_install to /dev/pylint (similar to the pep8 script).
We still need to figure out what rules to be allowed.