Skip to content

Commit

Permalink
Update string nodes for implicit concatenation (#7927)
Browse files Browse the repository at this point in the history
## Summary

This PR updates the string nodes (`ExprStringLiteral`,
`ExprBytesLiteral`, and `ExprFString`) to account for implicit string
concatenation.

### Motivation

In Python, implicit string concatenation are joined while parsing
because the interpreter doesn't require the information for each part.
While that's feasible for an interpreter, it falls short for a static
analysis tool where having such information is more useful. Currently,
various parts of the code uses the lexer to get the individual string
parts.

One of the main challenge this solves is that of string formatting.
Currently, the formatter relies on the lexer to get the individual
string parts, and formats them including the comments accordingly. But,
with PEP 701, f-string can also contain comments. Without this change,
it becomes very difficult to add support for f-string formatting.

### Implementation

The initial proposal was made in this discussion:
#6183 (comment).
There were various AST designs which were explored for this task which
are available in the linked internal document[^1].

The selected variant was the one where the nodes were kept as it is
except that the `implicit_concatenated` field was removed and instead a
new struct was added to the `Expr*` struct. This would be a private
struct would contain the actual implementation of how the AST is
designed for both single and implicitly concatenated strings.

This implementation is achieved through an enum with two variants:
`Single` and `Concatenated` to avoid allocating a vector even for single
strings. There are various public methods available on the value struct
to query certain information regarding the node.

The nodes are structured in the following way:

```
ExprStringLiteral - "foo" "bar"
|- StringLiteral - "foo"
|- StringLiteral - "bar"

ExprBytesLiteral - b"foo" b"bar"
|- BytesLiteral - b"foo"
|- BytesLiteral - b"bar"

ExprFString - "foo" f"bar {x}"
|- FStringPart::Literal - "foo"
|- FStringPart::FString - f"bar {x}"
  |- StringLiteral - "bar "
  |- FormattedValue - "x"
```

[^1]: Internal document:
https://www.notion.so/astral-sh/Implicit-String-Concatenation-e036345dc48943f89e416c087bf6f6d9?pvs=4

#### Visitor

The way the nodes are structured is that the entire string, including
all the parts that are implicitly concatenation, is a single node
containing individual nodes for the parts. The previous section has a
representation of that tree for all the string nodes. This means that
new visitor methods are added to visit the individual parts of string,
bytes, and f-strings for `Visitor`, `PreorderVisitor`, and
`Transformer`.

## Test Plan

- `cargo insta test --workspace --all-features --unreferenced reject`
- Verify that the ecosystem results are unchanged
  • Loading branch information
dhruvmanila committed Nov 24, 2023
1 parent 2590aa3 commit 017e829
Show file tree
Hide file tree
Showing 121 changed files with 14,809 additions and 12,644 deletions.
16 changes: 9 additions & 7 deletions crates/ruff_linter/src/checkers/ast/analyze/definitions.rs
Original file line number Diff line number Diff line change
Expand Up @@ -158,21 +158,23 @@ pub(crate) fn definitions(checker: &mut Checker) {
}

// Extract a `Docstring` from a `Definition`.
let Some(expr) = docstring else {
let Some(string_literal) = docstring else {
pydocstyle::rules::not_missing(checker, definition, *visibility);
continue;
};

let contents = checker.locator().slice(expr);
let contents = checker.locator().slice(string_literal);

let indentation = checker.locator().slice(TextRange::new(
checker.locator.line_start(expr.start()),
expr.start(),
checker.locator.line_start(string_literal.start()),
string_literal.start(),
));

if expr.implicit_concatenated {
if string_literal.value.is_implicit_concatenated() {
#[allow(deprecated)]
let location = checker.locator.compute_source_location(expr.start());
let location = checker
.locator
.compute_source_location(string_literal.start());
warn_user!(
"Docstring at {}:{}:{} contains implicit string concatenation; ignoring...",
relativize_path(checker.path),
Expand All @@ -186,7 +188,7 @@ pub(crate) fn definitions(checker: &mut Checker) {
let body_range = raw_contents_range(contents).unwrap();
let docstring = Docstring {
definition,
expr,
expr: string_literal,
contents,
body_range,
indentation,
Expand Down
20 changes: 16 additions & 4 deletions crates/ruff_linter/src/checkers/ast/analyze/expression.rs
Original file line number Diff line number Diff line change
Expand Up @@ -988,15 +988,22 @@ pub(crate) fn expression(expr: &Expr, checker: &mut Checker) {
pylint::rules::await_outside_async(checker, expr);
}
}
Expr::FString(fstring @ ast::ExprFString { values, .. }) => {
Expr::FString(f_string_expr @ ast::ExprFString { value, .. }) => {
if checker.enabled(Rule::FStringMissingPlaceholders) {
pyflakes::rules::f_string_missing_placeholders(fstring, checker);
pyflakes::rules::f_string_missing_placeholders(checker, f_string_expr);
}
if checker.enabled(Rule::ExplicitFStringTypeConversion) {
for f_string in value.f_strings() {
ruff::rules::explicit_f_string_type_conversion(checker, f_string);
}
}
if checker.enabled(Rule::HardcodedSQLExpression) {
flake8_bandit::rules::hardcoded_sql_expression(checker, expr);
}
if checker.enabled(Rule::ExplicitFStringTypeConversion) {
ruff::rules::explicit_f_string_type_conversion(checker, expr, values);
if checker.enabled(Rule::UnicodeKindPrefix) {
for string_literal in value.literals() {
pyupgrade::rules::unicode_kind_prefix(checker, string_literal);
}
}
}
Expr::BinOp(ast::ExprBinOp {
Expand Down Expand Up @@ -1278,6 +1285,11 @@ pub(crate) fn expression(expr: &Expr, checker: &mut Checker) {
if checker.enabled(Rule::HardcodedTempFile) {
flake8_bandit::rules::hardcoded_tmp_directory(checker, string);
}
if checker.enabled(Rule::UnicodeKindPrefix) {
for string_part in string.value.parts() {
pyupgrade::rules::unicode_kind_prefix(checker, string_part);
}
}
if checker.source_type.is_stub() {
if checker.enabled(Rule::StringOrBytesTooLong) {
flake8_pyi::rules::string_or_bytes_too_long(checker, expr);
Expand Down
6 changes: 3 additions & 3 deletions crates/ruff_linter/src/checkers/ast/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1303,9 +1303,9 @@ where

fn visit_format_spec(&mut self, format_spec: &'b Expr) {
match format_spec {
Expr::FString(ast::ExprFString { values, .. }) => {
for value in values {
self.visit_expr(value);
Expr::FString(ast::ExprFString { value, .. }) => {
for expr in value.elements() {
self.visit_expr(expr);
}
}
_ => unreachable!("Unexpected expression for format_spec"),
Expand Down
4 changes: 0 additions & 4 deletions crates/ruff_linter/src/checkers/tokens.rs
Original file line number Diff line number Diff line change
Expand Up @@ -94,10 +94,6 @@ pub(crate) fn check_tokens(
pycodestyle::rules::tab_indentation(&mut diagnostics, tokens, locator, indexer);
}

if settings.rules.enabled(Rule::UnicodeKindPrefix) {
pyupgrade::rules::unicode_kind_prefix(&mut diagnostics, tokens);
}

if settings.rules.any_enabled(&[
Rule::InvalidCharacterBackspace,
Rule::InvalidCharacterSub,
Expand Down
29 changes: 26 additions & 3 deletions crates/ruff_linter/src/cst/matchers.rs
Original file line number Diff line number Diff line change
@@ -1,9 +1,10 @@
use crate::fix::codemods::CodegenStylist;
use anyhow::{bail, Result};
use libcst_native::{
Arg, Attribute, Call, Comparison, CompoundStatement, Dict, Expression, FunctionDef,
GeneratorExp, If, Import, ImportAlias, ImportFrom, ImportNames, IndentedBlock, Lambda,
ListComp, Module, Name, SmallStatement, Statement, Suite, Tuple, With,
Arg, Attribute, Call, Comparison, CompoundStatement, Dict, Expression, FormattedString,
FormattedStringContent, FormattedStringExpression, FunctionDef, GeneratorExp, If, Import,
ImportAlias, ImportFrom, ImportNames, IndentedBlock, Lambda, ListComp, Module, Name,
SmallStatement, Statement, Suite, Tuple, With,
};
use ruff_python_codegen::Stylist;

Expand Down Expand Up @@ -153,6 +154,28 @@ pub(crate) fn match_lambda<'a, 'b>(expression: &'a Expression<'b>) -> Result<&'a
}
}

pub(crate) fn match_formatted_string<'a, 'b>(
expression: &'a mut Expression<'b>,
) -> Result<&'a mut FormattedString<'b>> {
if let Expression::FormattedString(formatted_string) = expression {
Ok(formatted_string)
} else {
bail!("Expected Expression::FormattedString");
}
}

pub(crate) fn match_formatted_string_expression<'a, 'b>(
formatted_string_content: &'a mut FormattedStringContent<'b>,
) -> Result<&'a mut FormattedStringExpression<'b>> {
if let FormattedStringContent::Expression(formatted_string_expression) =
formatted_string_content
{
Ok(formatted_string_expression)
} else {
bail!("Expected FormattedStringContent::Expression")
}
}

pub(crate) fn match_function_def<'a, 'b>(
statement: &'a mut Statement<'b>,
) -> Result<&'a mut FunctionDef<'b>> {
Expand Down
1 change: 0 additions & 1 deletion crates/ruff_linter/src/registry.rs
Original file line number Diff line number Diff line change
Expand Up @@ -297,7 +297,6 @@ impl Rule {
| Rule::TabIndentation
| Rule::TrailingCommaOnBareTuple
| Rule::TypeCommentInStub
| Rule::UnicodeKindPrefix
| Rule::UselessSemicolon
| Rule::UTF8EncodingDeclaration => LintSource::Tokens,
Rule::IOError => LintSource::Io,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -81,7 +81,7 @@ pub(crate) fn variable_name_task_id(
let ast::ExprStringLiteral { value: task_id, .. } = keyword.value.as_string_literal_expr()?;

// If the target name is the same as the task_id, no violation.
if id == task_id {
if task_id == id {
return None;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -508,7 +508,6 @@ fn check_dynamically_typed<F>(
if let Expr::StringLiteral(ast::ExprStringLiteral {
range,
value: string,
..
}) = annotation
{
// Quoted annotations
Expand Down
2 changes: 1 addition & 1 deletion crates/ruff_linter/src/rules/flake8_bandit/helpers.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ static PASSWORD_CANDIDATE_REGEX: Lazy<Regex> = Lazy::new(|| {

pub(super) fn string_literal(expr: &Expr) -> Option<&str> {
match expr {
Expr::StringLiteral(ast::ExprStringLiteral { value, .. }) => Some(value),
Expr::StringLiteral(ast::ExprStringLiteral { value, .. }) => Some(value.as_str()),
_ => None,
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ impl Violation for HardcodedBindAllInterfaces {

/// S104
pub(crate) fn hardcoded_bind_all_interfaces(string: &ExprStringLiteral) -> Option<Diagnostic> {
if string.value == "0.0.0.0" {
if string.value.as_str() == "0.0.0.0" {
Some(Diagnostic::new(HardcodedBindAllInterfaces, string.range))
} else {
None
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,13 +5,12 @@ use ruff_python_ast::{self as ast, Expr, Operator};
use ruff_diagnostics::{Diagnostic, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_ast::helpers::any_over_expr;
use ruff_python_semantic::SemanticModel;
use ruff_python_ast::str::raw_contents;
use ruff_source_file::Locator;
use ruff_text_size::Ranged;

use crate::checkers::ast::Checker;

use super::super::helpers::string_literal;

static SQL_REGEX: Lazy<Regex> = Lazy::new(|| {
Regex::new(r"(?i)\b(select\s.+\sfrom\s|delete\s+from\s|(insert|replace)\s.+\svalues\s|update\s.+\sset\s)")
.unwrap()
Expand Down Expand Up @@ -46,53 +45,77 @@ impl Violation for HardcodedSQLExpression {
}
}

fn has_string_literal(expr: &Expr) -> bool {
string_literal(expr).is_some()
}

fn matches_sql_statement(string: &str) -> bool {
SQL_REGEX.is_match(string)
/// Concatenates the contents of an f-string, without the prefix and quotes,
/// and escapes any special characters.
///
/// ## Example
///
/// ```python
/// "foo" f"bar {x}" "baz"
/// ```
///
/// becomes `foobar {x}baz`.
fn concatenated_f_string(expr: &ast::ExprFString, locator: &Locator) -> String {
expr.value
.parts()
.filter_map(|part| {
raw_contents(locator.slice(part)).map(|s| s.escape_default().to_string())
})
.collect()
}

fn matches_string_format_expression(expr: &Expr, semantic: &SemanticModel) -> bool {
match expr {
/// S608
pub(crate) fn hardcoded_sql_expression(checker: &mut Checker, expr: &Expr) {
let content = match expr {
// "select * from table where val = " + "str" + ...
// "select * from table where val = %s" % ...
Expr::BinOp(ast::ExprBinOp {
op: Operator::Add | Operator::Mod,
..
op: Operator::Add, ..
}) => {
// Only evaluate the full BinOp, not the nested components.
if semantic
if !checker
.semantic()
.current_expression_parent()
.map_or(true, |parent| !parent.is_bin_op_expr())
{
if any_over_expr(expr, &has_string_literal) {
return true;
}
return;
}
if !any_over_expr(expr, &Expr::is_string_literal_expr) {
return;
}
false
checker.generator().expr(expr)
}
// "select * from table where val = %s" % ...
Expr::BinOp(ast::ExprBinOp {
left,
op: Operator::Mod,
..
}) => {
let Some(string) = left.as_string_literal_expr() else {
return;
};
string.value.escape_default().to_string()
}
Expr::Call(ast::ExprCall { func, .. }) => {
let Expr::Attribute(ast::ExprAttribute { attr, value, .. }) = func.as_ref() else {
return false;
return;
};
// "select * from table where val = {}".format(...)
attr == "format" && string_literal(value).is_some()
if attr != "format" {
return;
}
let Some(string) = value.as_string_literal_expr() else {
return;
};
string.value.escape_default().to_string()
}
// f"select * from table where val = {val}"
Expr::FString(_) => true,
_ => false,
}
}
Expr::FString(f_string) => concatenated_f_string(f_string, checker.locator()),
_ => return,
};

/// S608
pub(crate) fn hardcoded_sql_expression(checker: &mut Checker, expr: &Expr) {
if matches_string_format_expression(expr, checker.semantic()) {
if matches_sql_statement(&checker.generator().expr(expr)) {
checker
.diagnostics
.push(Diagnostic::new(HardcodedSQLExpression, expr.range()));
}
if SQL_REGEX.is_match(&content) {
checker
.diagnostics
.push(Diagnostic::new(HardcodedSQLExpression, expr.range()));
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -76,7 +76,7 @@ pub(crate) fn hardcoded_tmp_directory(checker: &mut Checker, string: &ast::ExprS

checker.diagnostics.push(Diagnostic::new(
HardcodedTempFile {
string: string.value.clone(),
string: string.value.to_string(),
},
string.range,
));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -854,11 +854,11 @@ pub(crate) fn suspicious_function_call(checker: &mut Checker, call: &ExprCall) {
["six", "moves", "urllib", "request", "urlopen" | "urlretrieve" | "Request"] => {
// If the `url` argument is a string literal, allow `http` and `https` schemes.
if call.arguments.args.iter().all(|arg| !arg.is_starred_expr()) && call.arguments.keywords.iter().all(|keyword| keyword.arg.is_some()) {
if let Some(Expr::StringLiteral(ast::ExprStringLiteral { value: url, .. })) = &call.arguments.find_argument("url", 0) {
let url = url.trim_start();
if url.starts_with("http://") || url.starts_with("https://") {
return None;
}
if let Some(Expr::StringLiteral(ast::ExprStringLiteral { value, .. })) = &call.arguments.find_argument("url", 0) {
let url = value.trim_start();
if url.starts_with("http://") || url.starts_with("https://") {
return None;
}
}
}
Some(SuspiciousURLOpenUsage.into())
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -111,7 +111,7 @@ pub(crate) fn unnecessary_dict_kwargs(checker: &mut Checker, expr: &Expr, kwargs
fn as_kwarg(key: &Expr) -> Option<&str> {
if let Expr::StringLiteral(ast::ExprStringLiteral { value, .. }) = key {
if is_identifier(value) {
return Some(value);
return Some(value.as_str());
}
}
None
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,7 @@ pub(crate) fn unrecognized_platform(checker: &mut Checker, test: &Expr) {
if !matches!(value.as_str(), "linux" | "win32" | "cygwin" | "darwin") {
checker.diagnostics.push(Diagnostic::new(
UnrecognizedPlatformName {
platform: value.clone(),
platform: value.to_string(),
},
right.range(),
));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -55,8 +55,13 @@ pub(super) fn is_empty_or_null_string(expr: &Expr) -> bool {
match expr {
Expr::StringLiteral(ast::ExprStringLiteral { value, .. }) => value.is_empty(),
Expr::NoneLiteral(_) => true,
Expr::FString(ast::ExprFString { values, .. }) => {
values.iter().all(is_empty_or_null_string)
Expr::FString(ast::ExprFString { value, .. }) => {
value.parts().all(|f_string_part| match f_string_part {
ast::FStringPart::Literal(literal) => literal.is_empty(),
ast::FStringPart::FString(f_string) => {
f_string.values.iter().all(is_empty_or_null_string)
}
})
}
_ => false,
}
Expand Down
Loading

0 comments on commit 017e829

Please sign in to comment.