
[pycodestyle] Add blank line(s) rules (E301, E302, E303, E304, E305, E306) #4694

Closed
wants to merge 49 commits

Conversation

hoel-bagard
Contributor

Summary

This PR is part of #2402; it adds the E301, E302, E303, E304, E305, and E306 error rules along with their fixes.

A first attempt at implementing E305 using physical lines was done here; however, since some operations are expensive on physical lines, this implementation uses logical lines.

To be able to use NonLogicalNewline tokens to detect blank lines, I removed the check that explicitly skipped them, and modified the existing rules so that their behavior remains unchanged.
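
For illustration, a minimal sketch of that detection with a stand-in token type (in ruff the tokens come from the lexer; the blank-line assumption is spelled out in the comments):

#[derive(Debug, PartialEq)]
enum Tok {
    Newline,           // terminates a logical line
    NonLogicalNewline, // newline on a blank line or inside brackets
    Other,
}

// Assumption this PR relies on: each trailing NonLogicalNewline before the
// current token corresponds to one blank physical line.
fn blank_lines_before(tokens: &[Tok]) -> usize {
    tokens
        .iter()
        .rev()
        .take_while(|tok| **tok == Tok::NonLogicalNewline)
        .count()
}

fn main() {
    // `a = 1` followed by two blank lines.
    let tokens = [Tok::Other, Tok::Newline, Tok::NonLogicalNewline, Tok::NonLogicalNewline];
    assert_eq!(blank_lines_before(&tokens), 2);
}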

Test Plan

The test fixture reuses the one from pycodestyle as-is, but the notes in that file about which lines should trigger which errors do not fully match either the documented rules or pycodestyle's implementation. I therefore did my best to manually verify that the behavior here matches the rules and the implementation in a way that makes sense.

For example, the example given in E306's documentation is not actually flagged by pycodestyle, but this false positive is.

charliermarsh self-requested a review June 13, 2023 02:08
@charliermarsh
Member

I looked into merging this but it doesn't seem to properly handle nested definitions -- this snippet is clean under pycodestyle, but yields violations here:

class C:
    def f():
        pass


def f():
    def f():
        pass

Yields:

foo.py:2:5: E306 [*] Expected 1 blank line before a nested definition, found 0
foo.py:7:5: E302 [*] Expected 2 blank lines, found 0
foo.py:7:5: E306 [*] Expected 1 blank line before a nested definition, found 0

@hoel-bagard
Contributor Author

hoel-bagard commented Jun 13, 2023

@charliermarsh Yes, pycodestyle's documented rules and examples do not match its actual implementation.

For example, the example from E306's documentation is not flagged by pycodestyle (as you've pointed out above), but this false positive is.

@hoel-bagard
Contributor Author

hoel-bagard commented Jun 13, 2023

I haven't had time to look into tracking the number of blank lines in the logical lines builder yet, but I'll try to do it once I have time.

@hoel-bagard
Contributor Author

hoel-bagard commented Jun 16, 2023

I've moved the tracking of blank lines into the logical lines builder, so blank lines are no longer emitted as logical lines.
However, I'm not sure how to count the blank characters there. I used to use line.text().chars().count(), but I don't think I can access the actual text from within the builder, so I used the tokens instead.
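
For illustration, a minimal sketch of that accounting (stand-in types, not ruff's actual builder API), where the length of each NonLogicalNewline token's range stands in for what line.text().chars().count() used to measure:

#[derive(Clone, Copy)]
struct TextRange { start: u32, end: u32 }

impl TextRange {
    fn len(self) -> u32 { self.end - self.start }
}

enum Tok { NonLogicalNewline, Other }

// State the logical lines builder could keep, reset whenever a non-blank
// token arrives.
#[derive(Default)]
struct BlankLines { count: u32, characters: u32 }

impl BlankLines {
    fn visit(&mut self, tok: &Tok, range: TextRange) {
        match tok {
            Tok::NonLogicalNewline => {
                self.count += 1;
                // Byte length of the token's range; this matches
                // chars().count() only for ASCII whitespace.
                self.characters += range.len();
            }
            Tok::Other => {
                self.count = 0;
                self.characters = 0;
            }
        }
    }
}

fn main() {
    let mut blank = BlankLines::default();
    blank.visit(&Tok::NonLogicalNewline, TextRange { start: 6, end: 7 });
    assert_eq!((blank.count, blank.characters), (1, 1));
}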

@hoel-bagard
Contributor Author

I started reviewing the output on the pycodestyle fixture and found a major issue with this PR: pycodestyle's E3 rules take comment-only lines into account. For example:

a = 1




# a



# a
class A:
    pass

Pycodestyle output:

example.py:6:1: E303 too many blank lines (4)
example.py:10:1: E303 too many blank lines (3)
example.py:11:1: E302 expected 2 blank lines, found 4

However, since comment-only lines are not logical lines, I can't reproduce this output. Moreover, this causes the autofix implementation to mangle the code.

Without making empty lines/comment-only lines into logical lines (or using physical lines), I don't think these rules can be implemented.
@charliermarsh Would you have a recommendation on how I could proceed?

@MichaReiser
Member

If the rules instead test for proper spacing between all statements, then I think it's best if we build out a new TokenVisitor infrastructure that allows running some logic for every token in the program. Both LogicalLines and your new rule could implement a TokenVisitor trait that has a single visit_token(&mut self, token: &Tok, range: TextRange) method. The visitor can track its own state internally and can trigger its rules when reaching a certain token (end of a logical line, end of a blank line). I would prefer this over changing the semantics of logical lines because it still visits the tokens only once and gives us a more extensible solution that could potentially even allow the other physical-line rules to work on tokens rather than using the UniversalNewlinesIterator.

@charliermarsh what do you think, would you have time to build such infrastructure?
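
A rough sketch of what that could look like (only visit_token's shape comes from the description above; the token type, range type, example visitor, and driver loop are hypothetical stand-ins):

#[derive(Clone, Copy)]
struct TextRange { start: u32, end: u32 }

enum Tok { Newline, NonLogicalNewline, Comment, Other }

trait TokenVisitor {
    fn visit_token(&mut self, token: &Tok, range: TextRange);
}

// Example visitor: counts logical lines by watching for Newline tokens,
// keeping its state internally as described above.
#[derive(Default)]
struct LogicalLineCount(u32);

impl TokenVisitor for LogicalLineCount {
    fn visit_token(&mut self, token: &Tok, _range: TextRange) {
        if matches!(token, Tok::Newline) {
            self.0 += 1;
        }
    }
}

// One pass over the token stream drives every registered visitor, so
// LogicalLines and the blank-line rules could share a single traversal.
fn visit_tokens(tokens: &[(Tok, TextRange)], visitors: &mut [&mut dyn TokenVisitor]) {
    for (tok, range) in tokens {
        for visitor in visitors.iter_mut() {
            visitor.visit_token(tok, *range);
        }
    }
}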

@akx
Contributor

akx commented Jul 24, 2023

Could we move this forward somehow? I was about to start implementing these rules but luckily checked whether there was momentum on it already :)

@hoel-bagard
Contributor Author

I'll finish cleaning up the fixture to make it easier to check whether the implementation works.

But without something like the proposed TokenVisitor, it's difficult to make progress.

@hoel-bagard
Contributor Author

@MichaReiser Did something like the TokenVisitor get implemented since you mentioned it? I would like to finish this PR, but I can't do it using logical lines, since some cases contain no logical lines at all and should still emit an error (see the example below).

# comment




# comment

@MichaReiser
Member

@MichaReiser Did something like the TokenVisitor get implemented since you mentioned it? I would like to finish this PR, but I can't do it using logical lines, since some cases contain no logical lines at all and should still emit an error (see the example below).

# comment




# comment

Not the way I described it, but there is a set of token-based rules that operate only on the tokens/indexer. Would that meet your needs?

pub(crate) fn check_tokens(
    tokens: &[LexResult],
    path: &Path,
    locator: &Locator,
    indexer: &Indexer,
    settings: &LinterSettings,
    is_stub: bool,
) -> Vec<Diagnostic> {
    let mut diagnostics: Vec<Diagnostic> = vec![];
    if settings.rules.enabled(Rule::BlanketNOQA) {
        pygrep_hooks::rules::blanket_noqa(&mut diagnostics, indexer, locator);
    }
    if settings.rules.enabled(Rule::BlanketTypeIgnore) {
        pygrep_hooks::rules::blanket_type_ignore(&mut diagnostics, indexer, locator);
    }
    if settings.rules.any_enabled(&[
        Rule::AmbiguousUnicodeCharacterString,
        Rule::AmbiguousUnicodeCharacterDocstring,
        Rule::AmbiguousUnicodeCharacterComment,
    ]) {
        let mut state_machine = StateMachine::default();
        for &(ref tok, range) in tokens.iter().flatten() {
            let is_docstring = state_machine.consume(tok);
            if matches!(tok, Tok::String { .. } | Tok::Comment(_)) {
                ruff::rules::ambiguous_unicode_character(
                    &mut diagnostics,
                    locator,
                    range,
                    if tok.is_string() {
                        if is_docstring {
                            Context::Docstring
                        } else {
                            Context::String
                        }
                    } else {
                        Context::Comment
                    },
                    settings,
                );
            }
        }
    }
    if settings.rules.enabled(Rule::CommentedOutCode) {
        eradicate::rules::commented_out_code(&mut diagnostics, locator, indexer, settings);
    }
    if settings.rules.enabled(Rule::UTF8EncodingDeclaration) {
        pyupgrade::rules::unnecessary_coding_comment(&mut diagnostics, locator, indexer, settings);
    }
    if settings.rules.enabled(Rule::InvalidEscapeSequence) {
        for (tok, range) in tokens.iter().flatten() {
            if tok.is_string() {
                pycodestyle::rules::invalid_escape_sequence(
                    &mut diagnostics,
                    locator,
                    *range,
                    settings.rules.should_fix(Rule::InvalidEscapeSequence),
                );
            }
        }
    }
    if settings.rules.enabled(Rule::TabIndentation) {
        pycodestyle::rules::tab_indentation(&mut diagnostics, tokens, locator, indexer);
    }
    if settings.rules.any_enabled(&[
        Rule::InvalidCharacterBackspace,
        Rule::InvalidCharacterSub,
        Rule::InvalidCharacterEsc,
        Rule::InvalidCharacterNul,
        Rule::InvalidCharacterZeroWidthSpace,
    ]) {
        for (tok, range) in tokens.iter().flatten() {
            if tok.is_string() {
                pylint::rules::invalid_string_characters(&mut diagnostics, *range, locator);
            }
        }
    }
    if settings.rules.any_enabled(&[
        Rule::MultipleStatementsOnOneLineColon,
        Rule::MultipleStatementsOnOneLineSemicolon,
        Rule::UselessSemicolon,
    ]) {
        pycodestyle::rules::compound_statements(
            &mut diagnostics,
            tokens,
            locator,
            indexer,
            settings,
        );
    }
    if settings.rules.any_enabled(&[
        Rule::BadQuotesInlineString,
        Rule::BadQuotesMultilineString,
        Rule::BadQuotesDocstring,
        Rule::AvoidableEscapedQuote,
    ]) {
        flake8_quotes::rules::from_tokens(&mut diagnostics, tokens, locator, settings);
    }
    if settings.rules.any_enabled(&[
        Rule::SingleLineImplicitStringConcatenation,
        Rule::MultiLineImplicitStringConcatenation,
    ]) {
        flake8_implicit_str_concat::rules::implicit(
            &mut diagnostics,
            tokens,
            &settings.flake8_implicit_str_concat,
            locator,
        );
    }
    if settings.rules.any_enabled(&[
        Rule::MissingTrailingComma,
        Rule::TrailingCommaOnBareTuple,
        Rule::ProhibitedTrailingComma,
    ]) {
        flake8_commas::rules::trailing_commas(&mut diagnostics, tokens, locator, settings);
    }
    if settings.rules.enabled(Rule::ExtraneousParentheses) {
        pyupgrade::rules::extraneous_parentheses(&mut diagnostics, tokens, locator, settings);
    }
    if is_stub && settings.rules.enabled(Rule::TypeCommentInStub) {
        flake8_pyi::rules::type_comment_in_stub(&mut diagnostics, locator, indexer);
    }
    if settings.rules.any_enabled(&[
        Rule::ShebangNotExecutable,
        Rule::ShebangMissingExecutableFile,
        Rule::ShebangLeadingWhitespace,
        Rule::ShebangNotFirstLine,
        Rule::ShebangMissingPython,
    ]) {
        flake8_executable::rules::from_tokens(tokens, path, locator, settings, &mut diagnostics);
    }
    if settings.rules.any_enabled(&[
        Rule::InvalidTodoTag,
        Rule::MissingTodoAuthor,
        Rule::MissingTodoLink,
        Rule::MissingTodoColon,
        Rule::MissingTodoDescription,
        Rule::InvalidTodoCapitalization,
        Rule::MissingSpaceAfterTodoColon,
        Rule::LineContainsFixme,
        Rule::LineContainsXxx,
        Rule::LineContainsTodo,
        Rule::LineContainsHack,
    ]) {
        let todo_comments: Vec<TodoComment> = indexer
            .comment_ranges()
            .iter()
            .enumerate()
            .filter_map(|(i, comment_range)| {
                let comment = locator.slice(*comment_range);
                TodoComment::from_comment(comment, *comment_range, i)
            })
            .collect();
        flake8_todos::rules::todos(&mut diagnostics, &todo_comments, locator, indexer, settings);
        flake8_fixme::rules::todos(&mut diagnostics, &todo_comments);
    }
    diagnostics.retain(|diagnostic| settings.rules.enabled(diagnostic.kind.rule()));
    diagnostics
}

@hoel-bagard
Contributor Author

I think so, thank you.
Is there something I should keep in mind while using the tokens? For example, my earlier attempt using physical lines ended up too slow to be usable.

@MichaReiser
Member

Is there something I should keep in mind while using the tokens? For example, my earlier attempt using physical lines ended up too slow to be usable.

Try to perform as many operations as possible on the tokens directly. Only fall back to inspecting the source code (or token content) if you must.
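
As a sketch of that advice (stand-in types, not ruff's actual APIs): filter on token kind and range first, and only slice the source for the few tokens that survive the cheap checks.

use std::ops::Range;

struct Locator<'a> { source: &'a str }

impl<'a> Locator<'a> {
    fn slice(&self, range: Range<usize>) -> &'a str {
        &self.source[range]
    }
}

enum Tok { Comment, Other }

// Cheap pre-filter on token kind; the (comparatively expensive) source slice
// happens only for comment tokens.
fn empty_comments(tokens: &[(Tok, Range<usize>)], locator: &Locator) -> Vec<Range<usize>> {
    tokens
        .iter()
        .filter(|(tok, _)| matches!(tok, Tok::Comment))
        .filter(|(_, range)| locator.slice(range.clone()).trim_start_matches('#').trim().is_empty())
        .map(|(_, range)| range.clone())
        .collect()
}

fn main() {
    let locator = Locator { source: "x = 1  #\n" };
    let tokens = vec![(Tok::Other, 0..5), (Tok::Comment, 7..8)];
    assert_eq!(empty_comments(&tokens, &locator), vec![7..8]);
}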

@hoel-bagard
Contributor Author

Closed in favor of #8720.
