Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-2627] [PySpark] have the build enforce PEP 8 automatically #1744

Closed
wants to merge 28 commits into from
Closed

[SPARK-2627] [PySpark] have the build enforce PEP 8 automatically #1744

wants to merge 28 commits into from

Conversation

nchammas
Copy link
Contributor

@nchammas nchammas commented Aug 2, 2014

As described in SPARK-2627, we'd like Python code to automatically be checked for PEP 8 compliance by Jenkins. This pull request aims to do that.

Notes:

  • We may need to install pep8 on the build server.
  • I'm expecting tests to fail now that PEP 8 compliance is being checked as part of the build. I'm fine with cleaning up any remaining PEP 8 violations as part of this pull request.
  • I did not understand why the RAT and scalastyle reports are saved to text files. I did the same for the PEP 8 check, but only so that the console output style can match those for the RAT and scalastyle checks. The PEP 8 report is removed right after the check is complete.
  • Updates to the "Contributing to Spark" guide will be submitted elsewhere, as I don't believe that text is part of the Spark repo.

The RAT and PEP8 checks don’t print a blank line after successful runs.
The scalastyle check shouldn’t either.
This guy just runs the pep8 utility on all code in the python
directory, minus cloudpickle, which is a 3rd-party library.
This guy just calls scalastyle.
@SparkQA
Copy link

SparkQA commented Aug 2, 2014

QA tests have started for PR 1744. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17788/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 2, 2014

QA results for PR 1744:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17788/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 3, 2014

QA tests have started for PR 1744. This patch DID NOT merge cleanly!
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17800/consoleFull

@pwendell
Copy link
Contributor

pwendell commented Aug 3, 2014

Hey nick - thanks for taking a crack at this. It's great to see us adding more automated code quality checks. Couple things:

  1. Could you add [PySpark] to the title of this PR? We are using tags like that to do sorting amongst the committership and it will get noticed that way.
  2. In terms of the dependency on pep8, we've tried really hard to avoid having exogenous dependencies in Spark. It makes porting things like our QA environment very difficult. So one idea - could this have a script that just lazily fetches the pep8 library directly? For instance, this is what we do with our sbt tool - we just wget the sbt jar... it seems like you could do something similar for pep8. Not sure if that totally works, but just an idea.

@SparkQA
Copy link

SparkQA commented Aug 3, 2014

QA results for PR 1744:
- This patch PASSES unit tests.

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17800/consoleFull

@nchammas nchammas changed the title [SPARK-2627] have the build enforce PEP 8 automatically [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically Aug 3, 2014
*fingers crossed*

I admit I’m not exactly sure how this works… Let’s see if I did the
right thing.
@nchammas
Copy link
Contributor Author

nchammas commented Aug 3, 2014

Hey Pat!

  1. I've edited the title accordingly.
  2. Makes sense. I'll take a crack at fetching pep8 lazily as you describe. Is there something you can point me to that I can use as a reference for how to do this?

By the way, I'm not sure I resolved my merge conflicts in the best way. Does it look good to you?

@SparkQA
Copy link

SparkQA commented Aug 3, 2014

QA tests have started for PR 1744. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17808/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 3, 2014

QA results for PR 1744:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17808/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 3, 2014

QA tests have started for PR 1744. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17809/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 3, 2014

QA tests have started for PR 1744. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17810/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 3, 2014

QA results for PR 1744:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17809/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 3, 2014

QA results for PR 1744:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17810/consoleFull

Everything looks cramped and it’s hard to tell at a glance where
sections begin. Adding a blank line between sections should fix that.
See the discussion here:
#1744 (comment)

Get the pep8 utility at runtime so that it’s not required to be
installed on the build server.
@nchammas
Copy link
Contributor Author

nchammas commented Aug 3, 2014

Patrick, I believe I've addressed the issues you called out. I'm not sure if the approach I took to getting pep8 at runtime is the best. Let me know what you think.

@SparkQA
Copy link

SparkQA commented Aug 3, 2014

QA tests have started for PR 1744. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17816/consoleFull

Don’t just assume curl got the file alright. Check and exit properly if
there were any problems.
@rxin
Copy link
Contributor

rxin commented Aug 6, 2014

This looks good to me. What do you think, @JoshRosen / @davies ?

@davies
Copy link
Contributor

davies commented Aug 6, 2014

One question (maybe too late), could we customize the lint tool to relax some rules? Such as change line width to 100, or change indent to 2 spaces. Recently, I found that it's not easy to write the code to fit with 4 spaces and 80 chars, always needed to break the line, and finally, the code became not easy to read, especially when you have several levels of structures, such as class, function, function, if and so on.

If we can have a tool to format the code to follow pep8 that will even better.

@rxin
Copy link
Contributor

rxin commented Aug 6, 2014

The line length is already 100 in this PR.

"""General PySpark tests that depend on numpy """

def test_statcounter_array(self):
x = self.sc.parallelize([np.array([1.0,1.0]), np.array([2.0,2.0]), np.array([3.0,3.0])])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line has 96 characters, need break?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, no need for the break. I'll fix this. (It's actually 99 characters without the break, I believe.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and it's now 99 characters (as opposed to the original 96) because PEP 8 wants you to put spaces after your commas:

./python/pyspark/tests.py:1018:47: E231 missing whitespace after ','

@nchammas
Copy link
Contributor Author

nchammas commented Aug 6, 2014

@davies The line length to check for is set to 100 in tox.ini, and pep8 checks there for configuration settings.

@davies
Copy link
Contributor

davies commented Aug 6, 2014

Oh, sorry, I saw it break some lines which does not have more than 100 characters.

LTGM now, but i'm have some questions about the changes in this PR. Maybe it's not enforced by pep8, just changed to make the code looks better?

@davies
Copy link
Contributor

davies commented Aug 6, 2014

LGTM. no more comments.

@nchammas
Copy link
Contributor Author

nchammas commented Aug 6, 2014

Maybe it's not enforced by pep8, just changed to make the code looks better?

The only changes that should be like that are the ones related to the indentation of some of the newAPIHadoop...() calls in tests.py, and I believe there are only a handful of them.

The remainder of the whitespace changes are indeed mandated by PEP 8.

@nchammas
Copy link
Contributor Author

nchammas commented Aug 6, 2014

Oh, sorry, I saw it break some lines which does not have more than 100 characters.

Ah yeah, that happened because of a mistake I made when I called autopep8 on some files with the default setting of breaking lines at 80 characters. I tried to undo all those changes. You found one that I missed, but it should be fixed now.

@nchammas
Copy link
Contributor Author

nchammas commented Aug 6, 2014

I've made one new commit to address @davies's comments.

@nchammas
Copy link
Contributor Author

nchammas commented Aug 6, 2014

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA tests have started for PR 1744. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18038/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA results for PR 1744:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18038/consoleFull

@rxin
Copy link
Contributor

rxin commented Aug 6, 2014

Ok I'm merging this in master. Thanks, @nchammas.

@rxin
Copy link
Contributor

rxin commented Aug 6, 2014

And branch-1.1 too.

asfgit pushed a commit that referenced this pull request Aug 6, 2014
As described in [SPARK-2627](https://issues.apache.org/jira/browse/SPARK-2627), we'd like Python code to automatically be checked for PEP 8 compliance by Jenkins. This pull request aims to do that.

Notes:
* We may need to install [`pep8`](https://pypi.python.org/pypi/pep8) on the build server.
* I'm expecting tests to fail now that PEP 8 compliance is being checked as part of the build. I'm fine with cleaning up any remaining PEP 8 violations as part of this pull request.
* I did not understand why the RAT and scalastyle reports are saved to text files. I did the same for the PEP 8 check, but only so that the console output style can match those for the RAT and scalastyle checks. The PEP 8 report is removed right after the check is complete.
* Updates to the ["Contributing to Spark"](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) guide will be submitted elsewhere, as I don't believe that text is part of the Spark repo.

Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Author: nchammas <nicholas.chammas@gmail.com>

Closes #1744 from nchammas/master and squashes the following commits:

274b238 [Nicholas Chammas] [SPARK-2627] [PySpark] minor indentation changes
983d963 [nchammas] Merge pull request #5 from apache/master
1db5314 [nchammas] Merge pull request #4 from apache/master
0e0245f [Nicholas Chammas] [SPARK-2627] undo erroneous whitespace fixes
bf30942 [Nicholas Chammas] [SPARK-2627] PEP8: comment spacing
6db9a44 [nchammas] Merge pull request #3 from apache/master
7b4750e [Nicholas Chammas] merge upstream changes
91b7584 [Nicholas Chammas] [SPARK-2627] undo unnecessary line breaks
44e3e56 [Nicholas Chammas] [SPARK-2627] use tox.ini to exclude files
b09fae2 [Nicholas Chammas] don't wrap comments unnecessarily
bfb9f9f [Nicholas Chammas] [SPARK-2627] keep up with the PEP 8 fixes
9da347f [nchammas] Merge pull request #2 from apache/master
aa5b4b5 [Nicholas Chammas] [SPARK-2627] follow Spark bash style for if blocks
d0a83b9 [Nicholas Chammas] [SPARK-2627] check that pep8 downloaded fine
dffb5dd [Nicholas Chammas] [SPARK-2627] download pep8 at runtime
a1ce7ae [Nicholas Chammas] [SPARK-2627] space out test report sections
21da538 [Nicholas Chammas] [SPARK-2627] it's PEP 8, not PEP8
6f4900b [Nicholas Chammas] [SPARK-2627] more misc PEP 8 fixes
fe57ed0 [Nicholas Chammas] removing merge conflict backups
9c01d4c [nchammas] Merge pull request #1 from apache/master
9a66cb0 [Nicholas Chammas] resolving merge conflicts
a31ccc4 [Nicholas Chammas] [SPARK-2627] miscellaneous PEP 8 fixes
beaa9ac [Nicholas Chammas] [SPARK-2627] fail check on non-zero status
723ed39 [Nicholas Chammas] always delete the report file
0541ebb [Nicholas Chammas] [SPARK-2627] call Python linter from run-tests
12440fa [Nicholas Chammas] [SPARK-2627] add Scala linter
61c07b9 [Nicholas Chammas] [SPARK-2627] add Python linter
75ad552 [Nicholas Chammas] make check output style consistent

(cherry picked from commit d614967)
Signed-off-by: Reynold Xin <rxin@apache.org>
@asfgit asfgit closed this in d614967 Aug 6, 2014
@nchammas
Copy link
Contributor Author

nchammas commented Aug 6, 2014

Thank you @davies @pwendell @JoshRosen and @rxin for reviewing this PR.

xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
As described in [SPARK-2627](https://issues.apache.org/jira/browse/SPARK-2627), we'd like Python code to automatically be checked for PEP 8 compliance by Jenkins. This pull request aims to do that.

Notes:
* We may need to install [`pep8`](https://pypi.python.org/pypi/pep8) on the build server.
* I'm expecting tests to fail now that PEP 8 compliance is being checked as part of the build. I'm fine with cleaning up any remaining PEP 8 violations as part of this pull request.
* I did not understand why the RAT and scalastyle reports are saved to text files. I did the same for the PEP 8 check, but only so that the console output style can match those for the RAT and scalastyle checks. The PEP 8 report is removed right after the check is complete.
* Updates to the ["Contributing to Spark"](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) guide will be submitted elsewhere, as I don't believe that text is part of the Spark repo.

Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Author: nchammas <nicholas.chammas@gmail.com>

Closes apache#1744 from nchammas/master and squashes the following commits:

274b238 [Nicholas Chammas] [SPARK-2627] [PySpark] minor indentation changes
983d963 [nchammas] Merge pull request apache#5 from apache/master
1db5314 [nchammas] Merge pull request apache#4 from apache/master
0e0245f [Nicholas Chammas] [SPARK-2627] undo erroneous whitespace fixes
bf30942 [Nicholas Chammas] [SPARK-2627] PEP8: comment spacing
6db9a44 [nchammas] Merge pull request apache#3 from apache/master
7b4750e [Nicholas Chammas] merge upstream changes
91b7584 [Nicholas Chammas] [SPARK-2627] undo unnecessary line breaks
44e3e56 [Nicholas Chammas] [SPARK-2627] use tox.ini to exclude files
b09fae2 [Nicholas Chammas] don't wrap comments unnecessarily
bfb9f9f [Nicholas Chammas] [SPARK-2627] keep up with the PEP 8 fixes
9da347f [nchammas] Merge pull request apache#2 from apache/master
aa5b4b5 [Nicholas Chammas] [SPARK-2627] follow Spark bash style for if blocks
d0a83b9 [Nicholas Chammas] [SPARK-2627] check that pep8 downloaded fine
dffb5dd [Nicholas Chammas] [SPARK-2627] download pep8 at runtime
a1ce7ae [Nicholas Chammas] [SPARK-2627] space out test report sections
21da538 [Nicholas Chammas] [SPARK-2627] it's PEP 8, not PEP8
6f4900b [Nicholas Chammas] [SPARK-2627] more misc PEP 8 fixes
fe57ed0 [Nicholas Chammas] removing merge conflict backups
9c01d4c [nchammas] Merge pull request apache#1 from apache/master
9a66cb0 [Nicholas Chammas] resolving merge conflicts
a31ccc4 [Nicholas Chammas] [SPARK-2627] miscellaneous PEP 8 fixes
beaa9ac [Nicholas Chammas] [SPARK-2627] fail check on non-zero status
723ed39 [Nicholas Chammas] always delete the report file
0541ebb [Nicholas Chammas] [SPARK-2627] call Python linter from run-tests
12440fa [Nicholas Chammas] [SPARK-2627] add Scala linter
61c07b9 [Nicholas Chammas] [SPARK-2627] add Python linter
75ad552 [Nicholas Chammas] make check output style consistent
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants