Explicit as_str (no deref), add no allocation methods #8826

Merged
dhruvmanila merged 16 commits into main from dhruv/custom-methods
Nov 25, 2023

Conversation

dhruvmanila
Member

Summary

This PR is a follow-up to the AST refactor. It does the following:

  • Remove the Deref implementation on StringLiteralValue and use explicit as_str calls instead. The Deref implementation would implicitly allocate for implicitly concatenated strings. The change makes that allocation explicit.
  • Without Deref, certain methods can be implemented with zero allocations. This PR adds:
    • is_empty
    • len
    • chars
    • Custom PartialEq implementation to compare each character
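
As a rough illustration of the methods listed above, here is a minimal sketch. The `StringLiteralValue` below is a simplified, hypothetical stand-in (a plain `Vec<String>` of parts), not the type from this PR. It only shows how such methods can iterate over the parts instead of building a joined string.

```rust
/// Hypothetical, simplified stand-in for `StringLiteralValue`: one entry per
/// implicitly concatenated part, e.g. `"foo" "bar"` becomes ["foo", "bar"].
struct StringLiteralValue {
    parts: Vec<String>,
}

impl StringLiteralValue {
    /// True if every part is empty; no joined string is built.
    fn is_empty(&self) -> bool {
        self.parts.iter().all(|part| part.is_empty())
    }

    /// Total length in bytes, summed over the parts.
    fn len(&self) -> usize {
        self.parts.iter().map(|part| part.len()).sum()
    }

    /// Iterates over the characters of all parts, in order.
    fn chars(&self) -> impl Iterator<Item = char> + '_ {
        self.parts.iter().flat_map(|part| part.chars())
    }
}

/// Compares character by character against a plain `str`, without joining.
impl PartialEq<str> for StringLiteralValue {
    fn eq(&self, other: &str) -> bool {
        self.len() == other.len() && self.chars().eq(other.chars())
    }
}

fn main() {
    let value = StringLiteralValue {
        parts: vec!["foo".to_string(), "bar".to_string()],
    };
    assert!(!value.is_empty());
    assert_eq!(value.len(), 6);
    assert!(value == *"foobar");
    assert!(value != *"foobaz");
}
```

None of these methods needs the concatenated `String`, which is the point of dropping `Deref` here.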

Test Plan

Run the linter test suite and make sure all tests pass.

This commit implements the AST design to account for implicit
concatenation in string nodes, specifically the `ExprFString`,
`ExprStringLiteral`, and `ExprBytesLiteral` nodes.

This commit adds the new variants for the string parts to `AnyNode` and
`AnyNodeRef` enums. These parts are `StringLiteral` (part of
`ExprStringLiteral`), `BytesLiteral` (part of `ExprBytesLiteral`), and
`FString` (part of `ExprFString`).

The reason for this is to add visitor methods for these parts. This is
done in the following commit. So, the visitor would visit the string as
a whole first and then visit each part.

```
ExprStringLiteral - "foo" "bar"
|- StringLiteral - "foo"
|- StringLiteral - "bar"
```

The tree above shows how the visitor works.

The visitor implementations are updated to visit each part node of the
respective string nodes.

The following example better highlights this:
```
ExprStringLiteral - "foo" "bar"
|- StringLiteral - "foo"
|- StringLiteral - "bar"
```

The `visit_expr` method would be used to visit the `ExprStringLiteral`
while the `visit_string_literal` method would be used for the
`StringLiteral` node. Similar methods are added for bytes and f-strings.
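
The visiting order described above can be sketched as follows. All types here are hypothetical, stripped-down stand-ins for the real AST nodes, and `walk_expr` and `Recorder` are names invented for this illustration. Only the method names `visit_expr` and `visit_string_literal` come from the commit message.

```rust
/// Hypothetical, simplified nodes; the real AST types carry more fields.
struct StringLiteral {
    value: String,
}

struct ExprStringLiteral {
    parts: Vec<StringLiteral>,
}

enum Expr {
    StringLiteral(ExprStringLiteral),
}

trait Visitor {
    /// Called for the string expression as a whole.
    fn visit_expr(&mut self, expr: &Expr) {
        walk_expr(self, expr);
    }

    /// Called once per implicitly concatenated part.
    fn visit_string_literal(&mut self, _literal: &StringLiteral) {}
}

/// Descends from the whole expression into each of its parts.
fn walk_expr<V: Visitor + ?Sized>(visitor: &mut V, expr: &Expr) {
    match expr {
        Expr::StringLiteral(string) => {
            for part in &string.parts {
                visitor.visit_string_literal(part);
            }
        }
    }
}

/// Records the order in which nodes are visited.
#[derive(Default)]
struct Recorder {
    events: Vec<String>,
}

impl Visitor for Recorder {
    fn visit_expr(&mut self, expr: &Expr) {
        self.events.push("ExprStringLiteral".to_string());
        walk_expr(self, expr);
    }

    fn visit_string_literal(&mut self, literal: &StringLiteral) {
        self.events.push(format!("StringLiteral({})", literal.value));
    }
}

/// Visits the expression `"foo" "bar"` and returns the recorded events.
fn visit_foo_bar() -> Vec<String> {
    let expr = Expr::StringLiteral(ExprStringLiteral {
        parts: vec![
            StringLiteral { value: "foo".to_string() },
            StringLiteral { value: "bar".to_string() },
        ],
    });
    let mut recorder = Recorder::default();
    recorder.visit_expr(&expr);
    recorder.events
}

fn main() {
    for event in visit_foo_bar() {
        println!("{event}");
    }
}
```

The recorder sees the whole expression first and then each part, matching the tree shown earlier.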
The generator now preserves implicit concatenation. Earlier, for an
implicitly concatenated string, it would produce the joined form. Consider:

```python
"foo" "bar" "baz"
```

For the above example, the generator would give us:
```python
"foobarbaz"
```

Now that we have the information for each part, we produce the
exact code back.
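
A toy comparison of the two behaviors, assuming a hypothetical `StringLiteral` part type and two invented functions, `generate_joined` and `generate_parts`. Quote style and escaping are ignored, so this is only a sketch of the idea and not the generator's actual code.

```rust
/// Hypothetical part type: the content of one literal in a concatenation.
struct StringLiteral {
    value: String,
}

/// Old behavior: only the joined value is known, so one literal is emitted.
fn generate_joined(parts: &[StringLiteral]) -> String {
    let joined: String = parts.iter().map(|part| part.value.as_str()).collect();
    format!("\"{joined}\"")
}

/// New behavior: each part is emitted as its own literal, separated by a space.
fn generate_parts(parts: &[StringLiteral]) -> String {
    parts
        .iter()
        .map(|part| format!("\"{}\"", part.value))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let parts: Vec<StringLiteral> = ["foo", "bar", "baz"]
        .iter()
        .map(|value| StringLiteral { value: value.to_string() })
        .collect();
    println!("{}", generate_joined(&parts));
    println!("{}", generate_parts(&parts));
}
```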
`Expr` is a general type for all expressions while
`LiteralExpressionRef` is a type which includes only the literal
expressions. The method is better suited to this type.

This will also help with the formatter change.

As highlighted in the review:

> If you have two `ConcatenatedStringLiteral` values where both have
> equivalent values for `strings` but where one has `value` initialized
> and the other does not, would you expect them to compare equal?
> Semantically I think I would, since the alternative is that equality is
> dependent on whether `as_str()` has been called, which seems incidental.

#7927 (comment)
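
One way to get the equality semantics described in that review comment is to compare only the parts and ignore the lazily computed cache. The sketch below assumes a `ConcatenatedStringLiteral` with a `strings` field and a `value` cache held in a `std::sync::OnceLock`. The field names follow the quote, but the layout and the `new` constructor are assumptions made for illustration.

```rust
use std::sync::OnceLock;

/// Hypothetical shape based on the field names in the review comment.
struct ConcatenatedStringLiteral {
    strings: Vec<String>,
    /// Lazily computed joined value; filled on the first `as_str()` call.
    value: OnceLock<String>,
}

impl ConcatenatedStringLiteral {
    fn new(strings: Vec<String>) -> Self {
        Self { strings, value: OnceLock::new() }
    }

    /// Joins the parts on first use (this allocates) and caches the result.
    fn as_str(&self) -> &str {
        self.value.get_or_init(|| self.strings.concat())
    }
}

/// Equality looks only at `strings`, so it does not depend on whether
/// `as_str()` has already been called on either side.
impl PartialEq for ConcatenatedStringLiteral {
    fn eq(&self, other: &Self) -> bool {
        self.strings == other.strings
    }
}

fn main() {
    let a = ConcatenatedStringLiteral::new(vec!["foo".to_string(), "bar".to_string()]);
    let b = ConcatenatedStringLiteral::new(vec!["foo".to_string(), "bar".to_string()]);
    assert_eq!(a.as_str(), "foobar"); // initializes the cache of `a` only
    assert!(a == b);
}
```

A derived `PartialEq` would also compare the cache, which is the incidental behavior the comment warns about.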
Contributor

github-actions bot commented Nov 23, 2023

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

Base automatically changed from dhruv/implicit-str-concat-node to main November 24, 2023 23:55
dhruvmanila enabled auto-merge (squash) November 24, 2023 23:58
dhruvmanila merged commit 626b057 into main Nov 25, 2023
16 checks passed
dhruvmanila deleted the dhruv/custom-methods branch November 25, 2023 00:04
Member

MichaReiser commented Nov 28, 2023

What has been the main motivation for the change (just asking)? I assume it's for better performance but codespeed isn't loading for me right now.

Could we keep the Deref implementation in addition to as_str for cases where the Deref lifetimes work? I'm asking because some of the code seems more complicated now than when using Deref.

@dhruvmanila
Member Author

> What has been the main motivation for the change (just asking)? I assume it's for better performance but codespeed isn't loading for me right now.

These methods are just low-hanging fruit: they are simple to implement so that they work on each character instead of (probably) allocating a String. That said, I haven't seen any regressions or improvements.

> Could we keep the Deref implementation in addition to as_str for cases where the Deref lifetimes work? I'm asking because some of the code seems more complicated now than when using Deref.

Can you provide an example where you think the code is complicated? That would help me understand your perspective.

Comment on lines 379 to 385
      if checker.enabled(Rule::StaticJoinToFString) {
-         flynt::rules::static_join_to_fstring(checker, expr, string);
+         flynt::rules::static_join_to_fstring(
+             checker,
+             expr,
+             string_value.as_str(),
+         );
      }
Member

I think the previous code was easier to read than the new version with the explicit as_str call, and using Deref seemed to work just fine?

Member

The motivation here is that the Deref implementation includes an allocation -- the conversion here can allocate if the string is implicitly concatenated. So Deref is essentially hiding an allocation. I prefer that we do something explicit over an implicit allocation. It's not about the lifetimes or anything like that. It's intentionally more verbose.
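
To make the "hidden allocation" concrete, here is a small sketch of a `Deref` implementation that joins the parts on first use. The struct layout (a `parts` vector plus a `joined` cache) and the helper `byte_len` are assumptions for illustration, not the code removed in this PR.

```rust
use std::ops::Deref;
use std::sync::OnceLock;

/// Hypothetical layout: the parts plus a lazily filled joined string.
struct StringLiteralValue {
    parts: Vec<String>,
    joined: OnceLock<String>,
}

impl Deref for StringLiteralValue {
    type Target = str;

    fn deref(&self) -> &str {
        // Hidden cost: joining the parts allocates a new `String`.
        self.joined.get_or_init(|| self.parts.concat())
    }
}

/// Any function taking `&str` accepts `&StringLiteralValue` via deref coercion.
fn byte_len(s: &str) -> usize {
    s.len()
}

fn main() {
    let value = StringLiteralValue {
        parts: vec!["foo".to_string(), "bar".to_string()],
        joined: OnceLock::new(),
    };
    assert!(value.joined.get().is_none());
    // Looks like a free borrow at the call site, but it triggers the join.
    assert_eq!(byte_len(&value), 6);
    assert!(value.joined.get().is_some());
}
```

Nothing at the call site `byte_len(&value)` hints at the allocation, which is the argument for requiring an explicit method call instead.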

Member

MichaReiser Nov 29, 2023

I think we should then call the method to_str rather than as_str, according to https://rust-lang.github.io/api-guidelines/naming.html#ad-hoc-conversions-follow-as_-to_-into_-conventions-c-conv. I assume that a function prefixed with as_ is free (or extremely cheap), whereas to_ might be cheap, but it depends, and into_ consumes self.
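
For reference, the three prefixes can be seen side by side in the standard library. This short sketch uses only `str::as_bytes`, `str::to_lowercase`, and `String::into_bytes`; the variable names are arbitrary.

```rust
fn main() {
    let text = String::from("Foo");

    // `as_`: borrowed to borrowed, free. `str::as_bytes` views the same memory.
    let bytes: &[u8] = text.as_bytes();
    assert_eq!(bytes, b"Foo");

    // `to_`: may be expensive. `str::to_lowercase` allocates a new `String`.
    let lower: String = text.to_lowercase();
    assert_eq!(lower, "foo");

    // `into_`: owned to owned, consumes `self`. `String::into_bytes` takes
    // over the existing buffer, so `text` cannot be used afterwards.
    let owned: Vec<u8> = text.into_bytes();
    assert_eq!(owned, vec![b'F', b'o', b'o']);
}
```

Under that convention, a method that may join and allocate fits the `to_` prefix better than `as_`.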

Member Author

Yeah, I think using to_str makes more sense. I was not aware of that. I'll create a follow-up PR to rename that.

@@ -799,7 +799,7 @@ where
         if let Expr::StringLiteral(ast::ExprStringLiteral { value, .. }) = expr {
             self.deferred.string_type_definitions.push((
                 expr.range(),
-                value,
+                value.as_str(),
Member

Same for all these changes. If we could keep the Deref implementation in addition to the explicit as_str method, then code that doesn't need the more relaxed lifetime could be left as is.

Labels
internal An internal refactor or improvement
3 participants