Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-9175] Introduce an autoformatting tool to Python SDK #10684

Merged
merged 5 commits into from
Feb 6, 2020

Conversation

kamilwu
Copy link
Contributor

@kamilwu kamilwu commented Jan 24, 2020

This PR adds yapf configuration and all the changes made after running it over the entire Python SDK codebase.
The result is fully compliant with PEP8 (lint check passes, the issues were manually resolved).

Some of the examples and rules of the newly enforced style (they can be put under discussion and reviewed):

  • Multiple dict entries can be put in the same line.
    before:

    RECORDS = [
    {'name': 'Thomas',
    'favorite_number': 1,
    'favorite_color': 'blue'},
    {'name': 'Henry',
    'favorite_number': 3,
    'favorite_color': 'green'},
    {'name': 'Toby',
    'favorite_number': 7,
    'favorite_color': 'brown'},
    {'name': 'Gordon',
    'favorite_number': 4,
    'favorite_color': 'blue'},
    {'name': 'Emily',
    'favorite_number': -1,
    'favorite_color': 'Red'},
    {'name': 'Percy',
    'favorite_number': 6,
    'favorite_color': 'Green'}
    ]

    after:
    RECORDS = [{'name': 'Thomas', 'favorite_number': 1, 'favorite_color': 'blue'},
    {'name': 'Henry', 'favorite_number': 3, 'favorite_color': 'green'},
    {'name': 'Toby', 'favorite_number': 7, 'favorite_color': 'brown'},
    {'name': 'Gordon', 'favorite_number': 4, 'favorite_color': 'blue'},
    {'name': 'Emily', 'favorite_number': -1, 'favorite_color': 'Red'},
    {'name': 'Percy', 'favorite_number': 6, 'favorite_color': 'Green'}]

    With this setting, it seems the result conforms more to the already used style (lower number of lines changed).

  • Yapf tries to put the first parameter on the same line where the function begins:

    self.check_annotation(warning=w,
    warning_type=BeamDeprecationWarning,
    obj_name='fnc_test_deprecated_with_since_current',
    annotation_type='deprecated',
    label_check_list=[('since', True),
    ('instead', True)])

    def __init__(self,
    table=None,
    dataset=None,
    project=None,
    query=None,
    validate=False,
    coder=None,
    use_standard_sql=False,
    flatten_results=True,
    kms_key=None):

...unless some of parameters have type annotation or are too long (then it does the split before the first argument):

def _update_pending(
self,
input_committed_bundle,
applied_ptransform, # type: AppliedPTransform
completed_timers,
output_committed_bundles, # type: Iterable[_Bundle]
unprocessed_bundles # type: Iterable[_Bundle]
):

Indent width is 2 and continuation indent is 4.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@mwalenia
Copy link
Member

retest this please

@chadrik
Copy link
Contributor

chadrik commented Jan 27, 2020

We should add a pre-commit lint job to run yapf, otherwise the codebase will drift from the proper yapf formatting over time, and will be rectified all at once in some unrelated PR when a contributor finally runs it. I think we can all agree we don't want to be reviewing unrelated style changes in a PR.

From the docs:

If --diff is supplied, YAPF returns zero when no changes were necessary, non-zero otherwise (including program error). You can use this in a CI workflow to test that code has been YAPF-formatted.

Once this is in place we can dismantle the pylint and flake8 checks that overlap with yapf.

@robertwb
Copy link
Contributor

I would suggest COALESCE_BRACKETS , SPLIT_BEFORE_FIRST_ARGUMENT, SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES, and possibly SPLIT_BEFORE_EXPRESSION_AFTER_OPENING_PAREN to reduce churn.

SPLIT_ARGUMENTS_WHEN_COMMA_TERMINATED could also give authors a bit more freedom.

@robertwb
Copy link
Contributor

Why was SPLIT_BEFORE_LOGICAL_OPERATOR disabled? Also, +1 to a precommit.

@udim
Copy link
Member

udim commented Jan 28, 2020

@kamilwu
Copy link
Contributor Author

kamilwu commented Jan 28, 2020

@robertwb I managed to add all knobs you asked for. I think the code looks better now (much less weird lines, that's for sure).

I also added a pre-commit job that runs yapf with --diff option.

@kamilwu
Copy link
Contributor Author

kamilwu commented Jan 28, 2020

There is still a number of pylint issues Wrong continued indentation. Most of them appear because of how lambda is formatted. For example:

table=lambda x,
tables:
(tables['table1'] if 'language' in x else tables['table2']),

I don't know yet how to deal with it. Unless there is a knob for this, I'll just put a # yapf: disable comments in these places.

@Ardagan
Copy link
Contributor

Ardagan commented Jan 28, 2020

Is it possible to split change to one that adds formatter and one that shows what formatting actually looks like?

@Ardagan
Copy link
Contributor

Ardagan commented Jan 28, 2020

+1 for running yapf on pre-commit. (Can be done in separate PR)

@udim
Copy link
Member

udim commented Jan 28, 2020

There is still a number of pylint issues Wrong continued indentation. Most of them appear because of how lambda is formatted. For example:

table=lambda x,
tables:
(tables['table1'] if 'language' in x else tables['table2']),

I don't know yet how to deal with it. Unless there is a knob for this, I'll just put a # yapf: disable comments in these places.

If we adopt yapf's indentation style, then we can disable the same check in pylint.

@Ardagan
Copy link
Contributor

Ardagan commented Jan 28, 2020

There is still a number of pylint issues Wrong continued indentation. Most of them appear because of how lambda is formatted. For example:

table=lambda x,
tables:
(tables['table1'] if 'language' in x else tables['table2']),

I don't know yet how to deal with it. Unless there is a knob for this, I'll just put a # yapf: disable comments in these places.

If we adopt yapf's indentation style, then we can disable the same check in pylint.

If we utilize autoformatter, especially on pre-commit, it would be feasible to disable all formatting checks on linter. They become obsolete.

@kamilwu
Copy link
Contributor Author

kamilwu commented Jan 29, 2020

Is it possible to split change to one that adds formatter and one that shows what formatting actually looks like?

It's already split like this. The first commit only adds the formatter, and the second commit is the one that applies formatting changes to the code. Or do you mean creating a separate PR?

@kamilwu
Copy link
Contributor Author

kamilwu commented Jan 29, 2020

If we adopt yapf's indentation style, then we can disable the same check in pylint.

That's true. If that's so, I can disable that particular pylint check which causes problems. The question is, are we OK with how some lambdas are formatted?

@mwalenia
Copy link
Member

retest this please

2 similar comments
@mwalenia
Copy link
Member

retest this please

@mwalenia
Copy link
Member

retest this please

@kamilwu
Copy link
Contributor Author

kamilwu commented Jan 29, 2020

R: @udim @chadrik @robertwb Please take a look once again.

@mwalenia
Copy link
Member

Run Java_Examples_Dataflow PreCommit

1 similar comment
@mwalenia
Copy link
Member

Run Java_Examples_Dataflow PreCommit

@mwalenia
Copy link
Member

Run seed job

@mwalenia
Copy link
Member

run seed job

@mwalenia
Copy link
Member

Run PythonFormatter PreCommit

@robertwb
Copy link
Contributor

I'm not a huge fan of how lambdas are formatted (tempted to file a bug against yapf), but there aren't too many of them so I'd be fine either disabling lint or disabling yapf for those lines.

@kamilwu
Copy link
Contributor Author

kamilwu commented Jan 31, 2020

Pylint shows 67 bad-continuation warnings, but some of them refer to the same malformed lambda. This means that we have 30-40 malformed lambdas in the whole project. I agree this isn't a large number.

@robertwb I've already disable lint for those lines in this PR

@kamilwu
Copy link
Contributor Author

kamilwu commented Jan 31, 2020

Java_Examples_Dataflow failure is unrelated (this check is flaky)

@udim
Copy link
Member

udim commented Jan 31, 2020

LGTM

@mwalenia
Copy link
Member

mwalenia commented Feb 3, 2020

Run seed job

@mwalenia
Copy link
Member

mwalenia commented Feb 3, 2020

retest this please

@mwalenia
Copy link
Member

mwalenia commented Feb 3, 2020

Run PythonFormatter PreCommit

@mwalenia
Copy link
Member

mwalenia commented Feb 3, 2020

Run Python PreCommit

1 similar comment
@mwalenia
Copy link
Member

mwalenia commented Feb 3, 2020

Run Python PreCommit

@aaltay
Copy link
Member

aaltay commented Feb 5, 2020

Is it possible to update some docs/wiki, explaining how users could run this locally?

@kamilwu
Copy link
Contributor Author

kamilwu commented Feb 6, 2020

Is it possible to update some docs/wiki, explaining how users could run this locally?

Yes, luckily I have edit access to the wiki. I'll prepare an instruction. I think Python Tips is the best place for this.

@mwalenia
Copy link
Member

mwalenia commented Feb 6, 2020

run seed job

@mwalenia
Copy link
Member

mwalenia commented Feb 6, 2020

retest this please

@mwalenia
Copy link
Member

mwalenia commented Feb 6, 2020

run seed job

1 similar comment
@mwalenia
Copy link
Member

mwalenia commented Feb 6, 2020

run seed job

@mwalenia
Copy link
Member

mwalenia commented Feb 6, 2020

retest this please

@mwalenia
Copy link
Member

mwalenia commented Feb 6, 2020

Run PythonFormatter PreCommit

@mwalenia mwalenia merged commit b91560c into apache:master Feb 6, 2020
@kamilwu kamilwu deleted the yapf branch February 6, 2020 11:42
@kamilwu
Copy link
Contributor Author

kamilwu commented Feb 6, 2020

Thanks everyone!

I've added an appropriate instruction here: https://cwiki.apache.org/confluence/display/BEAM/Python+Tips

@@ -35,6 +35,9 @@ lint.dependsOn lintPy37
toxTask "mypyPy37", "py37-mypy"
lint.dependsOn mypyPy37

toxTask "formatter", "py3-yapf-check"
check.dependsOn formatter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this in my review. The lint precommit runs :pythonLintPreCommit, which depends on lint tasks.
Suggestion:

lint.dependsOn formatter

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I see there's a new job named PythonFormatter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants