New AST nodes for f-string elements (#8835)
Rebase of #6365 authored by @davidszotten.

## Summary

This PR updates the AST structure for f-string elements.

The main **motivation** behind this change is to have a dedicated node
for the string part of an f-string. Previously, the existing
`ExprStringLiteral` node was used for this purpose, which isn't exactly
correct: the range of an `ExprStringLiteral` node should include the
quotes, but the literal element of an f-string doesn't include them
because it's only one part within the f-string. For example,

```python
f"foo {x}"
# ^^^^
# This is the literal part of an f-string
```

The new `FStringElement` enum represents either the literal part or
the expression part of an f-string.
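
To make the range difference concrete, here is a minimal sketch in Python (the language of the fixtures) rather than the actual Rust definitions. The class names mirror the node names in this PR, but the fields, the `TextRange` stand-in, and the hand-computed offsets are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class TextRange:
    """Half-open [start, end) offsets into the source (hypothetical stand-in)."""
    start: int
    end: int

    def text_in(self, source: str) -> str:
        return source[self.start:self.end]


@dataclass(frozen=True)
class FStringLiteralElement:
    """Literal text inside an f-string; its range excludes the quotes."""
    value: str
    range: TextRange


@dataclass(frozen=True)
class FStringExpressionElement:
    """A replacement field such as `{x}`; the expression is kept as text here."""
    expression: str
    range: TextRange


# Stand-in for the `FStringElement` enum: an element is one variant or the other.
FStringElement = Union[FStringLiteralElement, FStringExpressionElement]


def elements_of_example() -> Tuple[str, List[FStringElement]]:
    """Return the example f-string and its elements, with offsets counted by hand."""
    source = 'f"foo {x}"'
    elements: List[FStringElement] = [
        FStringLiteralElement("foo ", TextRange(2, 6)),   # starts after `f"`
        FStringExpressionElement("x", TextRange(6, 9)),   # assumed to cover `{x}`
    ]
    return source, elements
```

Slicing the source with the literal element's range yields `foo ` without the `f"` prefix or the closing quote, which is the part marked with carets in the example above.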

### Rule Updates

This means that there are now two nodes representing a string, depending
on the context: one for a normal string literal and one for a string
literal within an f-string. The AST checker is updated to accommodate
this change. The rules that work on string literals are updated to check
the literal part of an f-string as well.
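
The rule changes in the diff follow one pattern: a rule takes the string content and a range instead of an `ExprStringLiteral` node, so both kinds of node can feed it. The following is a rough Python sketch of that pattern for S104. The `diagnostics` list stands in for `checker.diagnostics`, and the `(start, end)` tuples are a hypothetical stand-in for `TextRange`.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end) offsets; hypothetical stand-in for `TextRange`


def hardcoded_bind_all_interfaces(
    diagnostics: List[Range], value: str, range_: Range
) -> None:
    """Record `range_` when the string content is exactly "0.0.0.0"."""
    if value == "0.0.0.0":
        diagnostics.append(range_)


def check_source_fragments() -> List[Range]:
    """Feed the rule from a plain string literal and from an f-string literal."""
    diagnostics: List[Range] = []
    # Plain string literal `"0.0.0.0"` at offset 0: the range covers the quotes.
    hardcoded_bind_all_interfaces(diagnostics, "0.0.0.0", (0, 9))
    # f-string `f"0.0.0.0"` at offset 0: the literal element starts after `f"`.
    hardcoded_bind_all_interfaces(diagnostics, "0.0.0.0", (2, 9))
    # Non-matching content produces no diagnostic.
    hardcoded_bind_all_interfaces(diagnostics, "127.0.0.1", (0, 11))
    return diagnostics
```

The second range starts two characters in, which is consistent with the updated S104 snapshot reporting the f-string case at column 3 with carets under the content only.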

#### Notes

1. The `Expr::is_literal_expr` method checks for `ExprStringLiteral` and
returns true if so. Now that the literal part of an f-string is no
longer represented by that node, the method is confined to actual
expressions, which improves its behavior. The
`FStringElement::is_literal` method is available for the f-string case.
2. We no longer check whether we're in an f-string context before adding
to `string_type_definitions` because the f-string literal is now a
dedicated node and not part of `Expr`.
3. Annotations cannot use f-strings, so rules that work on annotations
and check for `ExprStringLiteral` are left unchanged.

## Test Plan

- All references to `Expr::StringLiteral` were checked to see if any of
the rules require updating to account for the f-string literal element
node.
- New test cases are added for rules which check against the literal
part of an f-string.
- Check the ecosystem results and ensure they remain unchanged.

## Performance

There's a performance penalty in the parser. The root cause remains
unknown, although the generated assembly code for the `__reduce154`
function now appears to be different. The body of that reduce function
just pops the `ParenthesizedExpr` on top of the stack and pushes it back
with the new location.

- The size of `FStringElement` enum is the same as `Expr` which is what
it replaces in `FString::format_spec`
- The size of `FStringExpressionElement` is the same as
`ExprFormattedValue` which is what it replaces

I tried reducing the `Expr` enum from 80 bytes to 72 bytes but it hardly
resulted in any performance gain. The difference can be seen here:
- Original profile: https://share.firefox.dev/3Taa7ES
- Profile after boxing some node fields:
https://share.firefox.dev/3GsNXpD

### Backtracking

I tried backtracking the changes to see whether any isolated change
produced this regression. The problem is that the overall change is
so small that there's only a single checkpoint to backtrack to, and
that checkpoint results in the same regression. This checkpoint is to
revert to using `Expr` for the `FString::format_spec` field. Past this
point, the change would be back at the original implementation.

## Review process

The review process is similar to #7927. The first set of commits updates
the node structure, parser, and related AST files. Further commits then
update the linter and formatter to account for the AST change.

---------

Co-authored-by: David Szotten <davidszotten@gmail.com>
dhruvmanila and davidszotten committed Dec 7, 2023
1 parent fcc0889 commit cdac90e
Showing 77 changed files with 1,711 additions and 1,922 deletions.
@@ -8,6 +8,7 @@ def func(address):
# Error
"0.0.0.0"
'0.0.0.0'
f"0.0.0.0"


# Error
@@ -5,6 +5,9 @@
with open("/tmp/abc", "w") as f:
f.write("def")

with open(f"/tmp/abc", "w") as f:
f.write("def")

with open("/var/tmp/123", "w") as f:
f.write("def")

@@ -32,6 +32,7 @@ def f8(x: bytes = b"50 character byte stringgggggggggggggggggggggggggg\xff") ->

foo: str = "50 character stringggggggggggggggggggggggggggggggg"
bar: str = "51 character stringgggggggggggggggggggggggggggggggg"
baz: str = f"51 character stringgggggggggggggggggggggggggggggggg"

baz: bytes = b"50 character byte stringgggggggggggggggggggggggggg"

@@ -29,6 +29,10 @@ baz: bytes = b"50 character byte stringgggggggggggggggggggggggggg" # OK

qux: bytes = b"51 character byte stringggggggggggggggggggggggggggg\xff" # Error: PYI053

ffoo: str = f"50 character stringggggggggggggggggggggggggggggggg" # OK

fbar: str = f"51 character stringgggggggggggggggggggggggggggggggg" # Error: PYI053

class Demo:
"""Docstrings are excluded from this rule. Some padding.""" # OK

53 changes: 42 additions & 11 deletions crates/ruff_linter/src/checkers/ast/analyze/expression.rs
@@ -4,6 +4,7 @@ use ruff_python_literal::cformat::{CFormatError, CFormatErrorType};
use ruff_diagnostics::Diagnostic;

use ruff_python_ast::types::Node;
use ruff_python_ast::AstNode;
use ruff_python_semantic::analyze::typing;
use ruff_python_semantic::ScopeKind;
use ruff_text_size::Ranged;
@@ -1006,6 +1007,30 @@ pub(crate) fn expression(expr: &Expr, checker: &mut Checker) {
pyupgrade::rules::unicode_kind_prefix(checker, string_literal);
}
}
for literal in value.elements().filter_map(|element| element.as_literal()) {
if checker.enabled(Rule::HardcodedBindAllInterfaces) {
flake8_bandit::rules::hardcoded_bind_all_interfaces(
checker,
&literal.value,
literal.range,
);
}
if checker.enabled(Rule::HardcodedTempFile) {
flake8_bandit::rules::hardcoded_tmp_directory(
checker,
&literal.value,
literal.range,
);
}
if checker.source_type.is_stub() {
if checker.enabled(Rule::StringOrBytesTooLong) {
flake8_pyi::rules::string_or_bytes_too_long(
checker,
literal.as_any_node_ref(),
);
}
}
}
}
Expr::BinOp(ast::ExprBinOp {
left,
@@ -1270,30 +1295,36 @@ pub(crate) fn expression(expr: &Expr, checker: &mut Checker) {
refurb::rules::math_constant(checker, number_literal);
}
}
Expr::BytesLiteral(_) => {
Expr::BytesLiteral(bytes_literal) => {
if checker.source_type.is_stub() && checker.enabled(Rule::StringOrBytesTooLong) {
flake8_pyi::rules::string_or_bytes_too_long(checker, expr);
flake8_pyi::rules::string_or_bytes_too_long(
checker,
bytes_literal.as_any_node_ref(),
);
}
}
Expr::StringLiteral(string) => {
Expr::StringLiteral(string_literal @ ast::ExprStringLiteral { value, range }) => {
if checker.enabled(Rule::HardcodedBindAllInterfaces) {
if let Some(diagnostic) =
flake8_bandit::rules::hardcoded_bind_all_interfaces(string)
{
checker.diagnostics.push(diagnostic);
}
flake8_bandit::rules::hardcoded_bind_all_interfaces(
checker,
value.to_str(),
*range,
);
}
if checker.enabled(Rule::HardcodedTempFile) {
flake8_bandit::rules::hardcoded_tmp_directory(checker, string);
flake8_bandit::rules::hardcoded_tmp_directory(checker, value.to_str(), *range);
}
if checker.enabled(Rule::UnicodeKindPrefix) {
for string_part in string.value.parts() {
for string_part in value.parts() {
pyupgrade::rules::unicode_kind_prefix(checker, string_part);
}
}
if checker.source_type.is_stub() {
if checker.enabled(Rule::StringOrBytesTooLong) {
flake8_pyi::rules::string_or_bytes_too_long(checker, expr);
flake8_pyi::rules::string_or_bytes_too_long(
checker,
string_literal.as_any_node_ref(),
);
}
}
}
19 changes: 2 additions & 17 deletions crates/ruff_linter/src/checkers/ast/mod.rs
@@ -815,8 +815,7 @@ where

fn visit_expr(&mut self, expr: &'b Expr) {
// Step 0: Pre-processing
if !self.semantic.in_f_string()
&& !self.semantic.in_typing_literal()
if !self.semantic.in_typing_literal()
&& !self.semantic.in_deferred_type_definition()
&& self.semantic.in_type_definition()
&& self.semantic.future_annotations()
@@ -1238,10 +1237,7 @@ where
}
}
Expr::StringLiteral(ast::ExprStringLiteral { value, .. }) => {
if self.semantic.in_type_definition()
&& !self.semantic.in_typing_literal()
&& !self.semantic.in_f_string()
{
if self.semantic.in_type_definition() && !self.semantic.in_typing_literal() {
self.deferred.string_type_definitions.push((
expr.range(),
value.to_str(),
@@ -1326,17 +1322,6 @@ where
self.semantic.flags = flags_snapshot;
}

fn visit_format_spec(&mut self, format_spec: &'b Expr) {
match format_spec {
Expr::FString(ast::ExprFString { value, .. }) => {
for expr in value.elements() {
self.visit_expr(expr);
}
}
_ => unreachable!("Unexpected expression for format_spec"),
}
}

fn visit_parameters(&mut self, parameters: &'b Parameters) {
// Step 1: Binding.
// Bind, but intentionally avoid walking default expressions, as we handle them
@@ -1,6 +1,8 @@
use ruff_diagnostics::{Diagnostic, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_ast::ExprStringLiteral;
use ruff_text_size::TextRange;

use crate::checkers::ast::Checker;

/// ## What it does
/// Checks for hardcoded bindings to all network interfaces (`0.0.0.0`).
@@ -34,10 +36,10 @@ impl Violation for HardcodedBindAllInterfaces {
}

/// S104
pub(crate) fn hardcoded_bind_all_interfaces(string: &ExprStringLiteral) -> Option<Diagnostic> {
if string.value.to_str() == "0.0.0.0" {
Some(Diagnostic::new(HardcodedBindAllInterfaces, string.range))
} else {
None
pub(crate) fn hardcoded_bind_all_interfaces(checker: &mut Checker, value: &str, range: TextRange) {
if value == "0.0.0.0" {
checker
.diagnostics
.push(Diagnostic::new(HardcodedBindAllInterfaces, range));
}
}
@@ -1,4 +1,5 @@
use ruff_python_ast::{self as ast, Expr};
use ruff_text_size::TextRange;

use ruff_diagnostics::{Diagnostic, Violation};
use ruff_macros::{derive_message_formats, violation};
@@ -51,13 +52,13 @@ impl Violation for HardcodedTempFile {
}

/// S108
pub(crate) fn hardcoded_tmp_directory(checker: &mut Checker, string: &ast::ExprStringLiteral) {
pub(crate) fn hardcoded_tmp_directory(checker: &mut Checker, value: &str, range: TextRange) {
if !checker
.settings
.flake8_bandit
.hardcoded_tmp_directory
.iter()
.any(|prefix| string.value.to_str().starts_with(prefix))
.any(|prefix| value.starts_with(prefix))
{
return;
}
@@ -76,8 +77,8 @@ pub(crate) fn hardcoded_tmp_directory(checker: &mut Checker, string: &ast::ExprS

checker.diagnostics.push(Diagnostic::new(
HardcodedTempFile {
string: string.value.to_string(),
string: value.to_string(),
},
string.range,
range,
));
}
@@ -7,6 +7,7 @@ S104.py:9:1: S104 Possible binding to all interfaces
9 | "0.0.0.0"
| ^^^^^^^^^ S104
10 | '0.0.0.0'
11 | f"0.0.0.0"
|

S104.py:10:1: S104 Possible binding to all interfaces
@@ -15,21 +16,30 @@ S104.py:10:1: S104 Possible binding to all interfaces
9 | "0.0.0.0"
10 | '0.0.0.0'
| ^^^^^^^^^ S104
11 | f"0.0.0.0"
|

S104.py:14:6: S104 Possible binding to all interfaces
S104.py:11:3: S104 Possible binding to all interfaces
|
13 | # Error
14 | func("0.0.0.0")
9 | "0.0.0.0"
10 | '0.0.0.0'
11 | f"0.0.0.0"
| ^^^^^^^ S104
|

S104.py:15:6: S104 Possible binding to all interfaces
|
14 | # Error
15 | func("0.0.0.0")
| ^^^^^^^^^ S104
|

S104.py:18:9: S104 Possible binding to all interfaces
S104.py:19:9: S104 Possible binding to all interfaces
|
17 | def my_func():
18 | x = "0.0.0.0"
18 | def my_func():
19 | x = "0.0.0.0"
| ^^^^^^^^^ S104
19 | print(x)
20 | print(x)
|


@@ -10,22 +10,31 @@ S108.py:5:11: S108 Probable insecure usage of temporary file or directory: "/tmp
6 | f.write("def")
|

S108.py:8:11: S108 Probable insecure usage of temporary file or directory: "/var/tmp/123"
S108.py:8:13: S108 Probable insecure usage of temporary file or directory: "/tmp/abc"
|
6 | f.write("def")
7 |
8 | with open("/var/tmp/123", "w") as f:
| ^^^^^^^^^^^^^^ S108
8 | with open(f"/tmp/abc", "w") as f:
| ^^^^^^^^ S108
9 | f.write("def")
|

S108.py:11:11: S108 Probable insecure usage of temporary file or directory: "/dev/shm/unit/test"
S108.py:11:11: S108 Probable insecure usage of temporary file or directory: "/var/tmp/123"
|
9 | f.write("def")
10 |
11 | with open("/dev/shm/unit/test", "w") as f:
| ^^^^^^^^^^^^^^^^^^^^ S108
11 | with open("/var/tmp/123", "w") as f:
| ^^^^^^^^^^^^^^ S108
12 | f.write("def")
|

S108.py:14:11: S108 Probable insecure usage of temporary file or directory: "/dev/shm/unit/test"
|
12 | f.write("def")
13 |
14 | with open("/dev/shm/unit/test", "w") as f:
| ^^^^^^^^^^^^^^^^^^^^ S108
15 | f.write("def")
|


@@ -10,30 +10,39 @@ S108.py:5:11: S108 Probable insecure usage of temporary file or directory: "/tmp
6 | f.write("def")
|

S108.py:8:11: S108 Probable insecure usage of temporary file or directory: "/var/tmp/123"
S108.py:8:13: S108 Probable insecure usage of temporary file or directory: "/tmp/abc"
|
6 | f.write("def")
7 |
8 | with open("/var/tmp/123", "w") as f:
| ^^^^^^^^^^^^^^ S108
8 | with open(f"/tmp/abc", "w") as f:
| ^^^^^^^^ S108
9 | f.write("def")
|

S108.py:11:11: S108 Probable insecure usage of temporary file or directory: "/dev/shm/unit/test"
S108.py:11:11: S108 Probable insecure usage of temporary file or directory: "/var/tmp/123"
|
9 | f.write("def")
10 |
11 | with open("/dev/shm/unit/test", "w") as f:
| ^^^^^^^^^^^^^^^^^^^^ S108
11 | with open("/var/tmp/123", "w") as f:
| ^^^^^^^^^^^^^^ S108
12 | f.write("def")
|

S108.py:14:11: S108 Probable insecure usage of temporary file or directory: "/dev/shm/unit/test"
|
12 | f.write("def")
13 |
14 | with open("/dev/shm/unit/test", "w") as f:
| ^^^^^^^^^^^^^^^^^^^^ S108
15 | f.write("def")
|

S108.py:15:11: S108 Probable insecure usage of temporary file or directory: "/foo/bar"
S108.py:18:11: S108 Probable insecure usage of temporary file or directory: "/foo/bar"
|
14 | # not ok by config
15 | with open("/foo/bar", "w") as f:
17 | # not ok by config
18 | with open("/foo/bar", "w") as f:
| ^^^^^^^^^^ S108
16 | f.write("def")
19 | f.write("def")
|


@@ -1083,7 +1083,7 @@ pub(crate) fn fix_unnecessary_map(
// If the expression is embedded in an f-string, surround it with spaces to avoid
// syntax errors.
if matches!(object_type, ObjectType::Set | ObjectType::Dict) {
if parent.is_some_and(Expr::is_formatted_value_expr) {
if parent.is_some_and(Expr::is_f_string_expr) {
content = format!(" {content} ");
}
}