
[pycodestyle] Add blank line(s) rules (E301, E302, E303, E304, E305, E306) #4694

Closed
wants to merge 49 commits

Conversation

hoel-bagard
Contributor

Summary

This PR is part of #2402; it adds the E301, E302, E303, E304, E305, and E306 error rules along with their fixes.

A first attempt at implementing E305 using physical lines was done here; however, since some operations are expensive on physical lines, this implementation uses logical lines.

To be able to use NonLogicalNewline tokens to detect blank lines, I removed the check that explicitly skipped them, and modified the existing rules so that their behavior remains unchanged.
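
For illustration, a minimal sketch of that detection with a stand-in token type (in ruff the tokens come from the lexer; the blank-line assumption is spelled out in the comments):

#[derive(Debug, PartialEq)]
enum Tok {
    Newline,           // terminates a logical line
    NonLogicalNewline, // newline on a blank line or inside brackets
    Other,
}

// Assumption this PR relies on: each trailing NonLogicalNewline before the
// current token corresponds to one blank physical line.
fn blank_lines_before(tokens: &[Tok]) -> usize {
    tokens
        .iter()
        .rev()
        .take_while(|tok| **tok == Tok::NonLogicalNewline)
        .count()
}

fn main() {
    // `a = 1` followed by two blank lines.
    let tokens = [Tok::Other, Tok::Newline, Tok::NonLogicalNewline, Tok::NonLogicalNewline];
    assert_eq!(blank_lines_before(&tokens), 2);
}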

Test Plan

The test fixture reuses the one from pycodestyle as-is, but the notes in that file about which lines should trigger which errors do not fully match either the documented rules or pycodestyle's implementation. I therefore did my best to manually verify that the behavior here matches the rules and the implementation in a way that makes sense.

For example, the example given in E306's documentation is not actually flagged by pycodestyle, but this false positive is.

charliermarsh self-requested a review June 13, 2023 02:08
@charliermarsh
Member

I looked into merging this but it doesn't seem to properly handle nested definitions -- this snippet is clean under pycodestyle, but yields violations here:

class C:
    def f():
        pass


def f():
    def f():
        pass

Yields:

foo.py:2:5: E306 [*] Expected 1 blank line before a nested definition, found 0
foo.py:7:5: E302 [*] Expected 2 blank lines, found 0
foo.py:7:5: E306 [*] Expected 1 blank line before a nested definition, found 0

@hoel-bagard
Contributor Author

hoel-bagard commented Jun 13, 2023

@charliermarsh Yes, pycodestyle's documented rules and examples do not match its actual implementation.

For example, the example from E306's documentation is not flagged by pycodestyle (as you've pointed out above), but this false positive is.

@hoel-bagard
Contributor Author

hoel-bagard commented Jun 13, 2023

I haven't had time to look into tracking the number of blank lines in the logical lines builder yet, but I'll try to do it once I have time.

@hoel-bagard
Contributor Author

hoel-bagard commented Jun 16, 2023

I've moved the tracking of blank lines into the logical lines builder, so blank lines are no longer emitted as logical lines.
However, I'm not sure how to count the blank characters there. I used to use line.text().chars().count(), but I don't think I can access the actual text from within the builder, so I used the tokens instead.
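
For illustration, a minimal sketch of that accounting (stand-in types, not ruff's actual builder API), where the length of each NonLogicalNewline token's range stands in for what line.text().chars().count() used to measure:

#[derive(Clone, Copy)]
struct TextRange { start: u32, end: u32 }

impl TextRange {
    fn len(self) -> u32 { self.end - self.start }
}

enum Tok { NonLogicalNewline, Other }

// State the logical lines builder could keep, reset whenever a non-blank
// token arrives.
#[derive(Default)]
struct BlankLines { count: u32, characters: u32 }

impl BlankLines {
    fn visit(&mut self, tok: &Tok, range: TextRange) {
        match tok {
            Tok::NonLogicalNewline => {
                self.count += 1;
                // Byte length of the token's range; this matches
                // chars().count() only for ASCII whitespace.
                self.characters += range.len();
            }
            Tok::Other => {
                self.count = 0;
                self.characters = 0;
            }
        }
    }
}

fn main() {
    let mut blank = BlankLines::default();
    blank.visit(&Tok::NonLogicalNewline, TextRange { start: 6, end: 7 });
    assert_eq!((blank.count, blank.characters), (1, 1));
}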

@hoel-bagard
Contributor Author

I started reviewing the output on the pycodestyle fixture and found a major issue with this PR: pycodestyle's E3 rules take comment-only lines into account. For example:

a = 1




# a



# a
class A:
    pass

Pycodestyle output:

example.py:6:1: E303 too many blank lines (4)
example.py:10:1: E303 too many blank lines (3)
example.py:11:1: E302 expected 2 blank lines, found 4

However, since comment-only lines are not logical lines, I can't reproduce this output. Moreover, this causes the autofix implementation to mangle the code.

Without making empty lines/comment-only lines into logical lines (or using physical lines), I don't think these rules can be implemented.
@charliermarsh Would you have a recommendation on how I could proceed?

@MichaReiser
Member

If the rules instead test for proper spacing between all statements, then I think it's best if we build out a new TokenVisitor infrastructure that allows running some logic for every token in the program. Both LogicalLines and your new rule could implement a TokenVisitor trait that has a single visit_token(&mut self, token: &Tok, range: TextRange) method. The visitor can track its own state internally and can trigger its rules when reaching a certain token (end of a logical line, end of a blank line). I would prefer this over changing the semantics of logical lines because it still visits the tokens only once and gives us a more extensible solution that could potentially even allow the other physical-line rules to work on tokens rather than using the UniversalNewlinesIterator.

@charliermarsh what do you think, would you have time to build such infrastructure?
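
A rough sketch of what that could look like (only visit_token's shape comes from the description above; the token type, range type, example visitor, and driver loop are hypothetical stand-ins):

#[derive(Clone, Copy)]
struct TextRange { start: u32, end: u32 }

enum Tok { Newline, NonLogicalNewline, Comment, Other }

trait TokenVisitor {
    fn visit_token(&mut self, token: &Tok, range: TextRange);
}

// Example visitor: counts logical lines by watching for Newline tokens,
// keeping its state internally as described above.
#[derive(Default)]
struct LogicalLineCount(u32);

impl TokenVisitor for LogicalLineCount {
    fn visit_token(&mut self, token: &Tok, _range: TextRange) {
        if matches!(token, Tok::Newline) {
            self.0 += 1;
        }
    }
}

// One pass over the token stream drives every registered visitor, so
// LogicalLines and the blank-line rules could share a single traversal.
fn visit_tokens(tokens: &[(Tok, TextRange)], visitors: &mut [&mut dyn TokenVisitor]) {
    for (tok, range) in tokens {
        for visitor in visitors.iter_mut() {
            visitor.visit_token(tok, *range);
        }
    }
}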

@akx
Contributor

akx commented Jul 24, 2023

Could we move this forward somehow? I was about to start implementing these rules but luckily checked whether there was momentum on it already :)

@hoel-bagard
Contributor Author

I'll finish cleaning up the fixture to make it easier to check whether the implementation works.

But without something like the proposed TokenVisitor, it's difficult to make progress.

@hoel-bagard
Contributor Author

@MichaReiser Did something like the TokenVisitor get implemented since you mentioned it? I would like to finish this PR, but I can't do it using logical lines, since some cases contain no logical lines at all and should still emit an error (see the example below).

# comment




# comment

@MichaReiser
Member

@MichaReiser Did something like the TokenVisitor get implemented since you mentioned it? I would like to finish this PR, but I can't do it using logical lines, since some cases contain no logical lines at all and should still emit an error (see the example below).

# comment




# comment

Not the way I described it, but there is a set of token-based rules that operate only on the tokens/indexer. Would that meet your needs?

pub(crate) fn check_tokens(
    tokens: &[LexResult],
    path: &Path,
    locator: &Locator,
    indexer: &Indexer,
    settings: &LinterSettings,
    is_stub: bool,
) -> Vec<Diagnostic> {
    let mut diagnostics: Vec<Diagnostic> = vec![];
    if settings.rules.enabled(Rule::BlanketNOQA) {
        pygrep_hooks::rules::blanket_noqa(&mut diagnostics, indexer, locator);
    }
    if settings.rules.enabled(Rule::BlanketTypeIgnore) {
        pygrep_hooks::rules::blanket_type_ignore(&mut diagnostics, indexer, locator);
    }
    if settings.rules.any_enabled(&[
        Rule::AmbiguousUnicodeCharacterString,
        Rule::AmbiguousUnicodeCharacterDocstring,
        Rule::AmbiguousUnicodeCharacterComment,
    ]) {
        let mut state_machine = StateMachine::default();
        for &(ref tok, range) in tokens.iter().flatten() {
            let is_docstring = state_machine.consume(tok);
            if matches!(tok, Tok::String { .. } | Tok::Comment(_)) {
                ruff::rules::ambiguous_unicode_character(
                    &mut diagnostics,
                    locator,
                    range,
                    if tok.is_string() {
                        if is_docstring {
                            Context::Docstring
                        } else {
                            Context::String
                        }
                    } else {
                        Context::Comment
                    },
                    settings,
                );
            }
        }
    }
    if settings.rules.enabled(Rule::CommentedOutCode) {
        eradicate::rules::commented_out_code(&mut diagnostics, locator, indexer, settings);
    }
    if settings.rules.enabled(Rule::UTF8EncodingDeclaration) {
        pyupgrade::rules::unnecessary_coding_comment(&mut diagnostics, locator, indexer, settings);
    }
    if settings.rules.enabled(Rule::InvalidEscapeSequence) {
        for (tok, range) in tokens.iter().flatten() {
            if tok.is_string() {
                pycodestyle::rules::invalid_escape_sequence(
                    &mut diagnostics,
                    locator,
                    *range,
                    settings.rules.should_fix(Rule::InvalidEscapeSequence),
                );
            }
        }
    }
    if settings.rules.enabled(Rule::TabIndentation) {
        pycodestyle::rules::tab_indentation(&mut diagnostics, tokens, locator, indexer);
    }
    if settings.rules.any_enabled(&[
        Rule::InvalidCharacterBackspace,
        Rule::InvalidCharacterSub,
        Rule::InvalidCharacterEsc,
        Rule::InvalidCharacterNul,
        Rule::InvalidCharacterZeroWidthSpace,
    ]) {
        for (tok, range) in tokens.iter().flatten() {
            if tok.is_string() {
                pylint::rules::invalid_string_characters(&mut diagnostics, *range, locator);
            }
        }
    }
    if settings.rules.any_enabled(&[
        Rule::MultipleStatementsOnOneLineColon,
        Rule::MultipleStatementsOnOneLineSemicolon,
        Rule::UselessSemicolon,
    ]) {
        pycodestyle::rules::compound_statements(
            &mut diagnostics,
            tokens,
            locator,
            indexer,
            settings,
        );
    }
    if settings.rules.any_enabled(&[
        Rule::BadQuotesInlineString,
        Rule::BadQuotesMultilineString,
        Rule::BadQuotesDocstring,
        Rule::AvoidableEscapedQuote,
    ]) {
        flake8_quotes::rules::from_tokens(&mut diagnostics, tokens, locator, settings);
    }
    if settings.rules.any_enabled(&[
        Rule::SingleLineImplicitStringConcatenation,
        Rule::MultiLineImplicitStringConcatenation,
    ]) {
        flake8_implicit_str_concat::rules::implicit(
            &mut diagnostics,
            tokens,
            &settings.flake8_implicit_str_concat,
            locator,
        );
    }
    if settings.rules.any_enabled(&[
        Rule::MissingTrailingComma,
        Rule::TrailingCommaOnBareTuple,
        Rule::ProhibitedTrailingComma,
    ]) {
        flake8_commas::rules::trailing_commas(&mut diagnostics, tokens, locator, settings);
    }
    if settings.rules.enabled(Rule::ExtraneousParentheses) {
        pyupgrade::rules::extraneous_parentheses(&mut diagnostics, tokens, locator, settings);
    }
    if is_stub && settings.rules.enabled(Rule::TypeCommentInStub) {
        flake8_pyi::rules::type_comment_in_stub(&mut diagnostics, locator, indexer);
    }
    if settings.rules.any_enabled(&[
        Rule::ShebangNotExecutable,
        Rule::ShebangMissingExecutableFile,
        Rule::ShebangLeadingWhitespace,
        Rule::ShebangNotFirstLine,
        Rule::ShebangMissingPython,
    ]) {
        flake8_executable::rules::from_tokens(tokens, path, locator, settings, &mut diagnostics);
    }
    if settings.rules.any_enabled(&[
        Rule::InvalidTodoTag,
        Rule::MissingTodoAuthor,
        Rule::MissingTodoLink,
        Rule::MissingTodoColon,
        Rule::MissingTodoDescription,
        Rule::InvalidTodoCapitalization,
        Rule::MissingSpaceAfterTodoColon,
        Rule::LineContainsFixme,
        Rule::LineContainsXxx,
        Rule::LineContainsTodo,
        Rule::LineContainsHack,
    ]) {
        let todo_comments: Vec<TodoComment> = indexer
            .comment_ranges()
            .iter()
            .enumerate()
            .filter_map(|(i, comment_range)| {
                let comment = locator.slice(*comment_range);
                TodoComment::from_comment(comment, *comment_range, i)
            })
            .collect();
        flake8_todos::rules::todos(&mut diagnostics, &todo_comments, locator, indexer, settings);
        flake8_fixme::rules::todos(&mut diagnostics, &todo_comments);
    }
    diagnostics.retain(|diagnostic| settings.rules.enabled(diagnostic.kind.rule()));
    diagnostics
}

@hoel-bagard
Contributor Author

I think so, thank you.
Is there something I should keep in mind while using the tokens? For example, my earlier attempt using physical lines ended up too slow to be usable.

@MichaReiser
Member

Is there something I should keep in mind while using the tokens? For example, my earlier attempt using physical lines ended up too slow to be usable.

Try to perform as many operations as possible on the tokens directly. Only fall back to inspecting the source code (or token content) if you must.
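
As a sketch of that advice (stand-in types, not ruff's actual APIs): filter on token kind and range first, and only slice the source for the few tokens that survive the cheap checks.

use std::ops::Range;

struct Locator<'a> { source: &'a str }

impl<'a> Locator<'a> {
    fn slice(&self, range: Range<usize>) -> &'a str {
        &self.source[range]
    }
}

enum Tok { Comment, Other }

// Cheap pre-filter on token kind; the (comparatively expensive) source slice
// happens only for comment tokens.
fn empty_comments(tokens: &[(Tok, Range<usize>)], locator: &Locator) -> Vec<Range<usize>> {
    tokens
        .iter()
        .filter(|(tok, _)| matches!(tok, Tok::Comment))
        .filter(|(_, range)| locator.slice(range.clone()).trim_start_matches('#').trim().is_empty())
        .map(|(_, range)| range.clone())
        .collect()
}

fn main() {
    let locator = Locator { source: "x = 1  #\n" };
    let tokens = vec![(Tok::Other, 0..5), (Tok::Comment, 7..8)];
    assert_eq!(empty_comments(&tokens, &locator), vec![7..8]);
}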

@hoel-bagard
Contributor Author

Closed in favor of #8720.
