Skip to content
This repository was archived by the owner on Jul 27, 2023. It is now read-only.

Remove asdl #21

Merged
merged 3 commits into from
Jul 5, 2023
Merged

Remove asdl #21

merged 3 commits into from
Jul 5, 2023

Conversation

konstin
Copy link
Member

@konstin konstin commented Jun 30, 2023

This removes the ASDL code generation in favor of handwriting the AST.

The motivations for moving away from the ASDL are:

  • CPython compatibility is no longer a goal
  • The ASDL grammar isn't as expressive as we would like
  • The codegen scripts have a high complexity which makes extensions time consuming
  • We don't make heavy use of code generation (compared to e.g. RustPython that generates Pyo3 bindings, a fold implementation etc).

We may want to revisit a grammar based code generation in the future, e.g. by using ungrammar

@konstin konstin mentioned this pull request Jun 30, 2023
@konstin
Copy link
Member Author

konstin commented Jun 30, 2023

Current dependencies on/for this PR:

This comment was auto-generated by Graphite.

@konstin konstin requested a review from MichaReiser June 30, 2023 19:45
Base automatically changed from impl_ranged_for_textrange to main July 2, 2023 08:11
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please extend the PR summary with why using the ASDL was more trouble for us than it is worth (CPython AST compatibility is a non-goal, limited grammar, complicated code gen scripts, ..). It would help a future us help understand the reasoning if we reconsider that choice in the future.

@konstin konstin merged commit 2d1f69c into main Jul 5, 2023
@konstin konstin deleted the remove_asdl branch July 5, 2023 12:25
zanieb pushed a commit that referenced this pull request Jul 10, 2023
This removes the ASDL code generation in favor of handwriting the AST. 

The motivations for moving away from the ASDL are:

* CPython compatibility is no longer a goal
* The ASDL grammar isn't as expressive as we would like
* The codegen scripts have a high complexity which makes extensions time
consuming
* We don't make heavy use of code generation (compared to e.g.
RustPython that generates Pyo3 bindings, a fold implementation etc).

We may want to revisit a grammar based code generation in the future,
e.g. by using [ungrammar](https://github.com/rust-analyzer/ungrammar)
@davidszotten
Copy link

👋 what's the plan for integrating this into ruff? i'm trying to add an unrelated commit to this repo but this commit fails the ruff test suite when i bring it in there

         "RUF012",
         "RUF013",
<        "RUF014",
         "RUF015",
         "RUF016",

[...]
failures:
    generate_json_schema::tests::test_generate_json_schema

@konstin
Copy link
Member Author

konstin commented Jul 17, 2023

@davidszotten it's not related to this PR, the problem is astral-sh/ruff#5832 (sorry for the disruption). --all-features should hopefully do as a workaround in the meantime

zanieb pushed a commit that referenced this pull request Jul 17, 2023
This removes the ASDL code generation in favor of handwriting the AST.

The motivations for moving away from the ASDL are:

* CPython compatibility is no longer a goal
* The ASDL grammar isn't as expressive as we would like
* The codegen scripts have a high complexity which makes extensions time
consuming
* We don't make heavy use of code generation (compared to e.g.
RustPython that generates Pyo3 bindings, a fold implementation etc).

We may want to revisit a grammar based code generation in the future,
e.g. by using [ungrammar](https://github.com/rust-analyzer/ungrammar)
zanieb pushed a commit that referenced this pull request Jul 17, 2023
This removes the ASDL code generation in favor of handwriting the AST.

The motivations for moving away from the ASDL are:

* CPython compatibility is no longer a goal
* The ASDL grammar isn't as expressive as we would like
* The codegen scripts have a high complexity which makes extensions time
consuming
* We don't make heavy use of code generation (compared to e.g.
RustPython that generates Pyo3 bindings, a fold implementation etc).

We may want to revisit a grammar based code generation in the future,
e.g. by using [ungrammar](https://github.com/rust-analyzer/ungrammar)
konstin added a commit to astral-sh/ruff that referenced this pull request Jul 18, 2023
## Summary

Previously, `StmtIf` was defined recursively as
```rust
pub struct StmtIf {
    pub range: TextRange,
    pub test: Box<Expr>,
    pub body: Vec<Stmt>,
    pub orelse: Vec<Stmt>,
}
```
Every `elif` was represented as an `orelse` with a single `StmtIf`. This
means that this representation couldn't differentiate between
```python
if cond1:
    x = 1
else:
    if cond2:
        x = 2
```
and 
```python
if cond1:
    x = 1
elif cond2:
    x = 2
```
It also makes many checks harder than they need to be because we have to
recurse just to iterate over an entire if-elif-else and because we're
lacking nodes and ranges on the `elif` and `else` branches.

We change the representation to a flat

```rust
pub struct StmtIf {
    pub range: TextRange,
    pub test: Box<Expr>,
    pub body: Vec<Stmt>,
    pub elif_else_clauses: Vec<ElifElseClause>,
}

pub struct ElifElseClause {
    pub range: TextRange,
    pub test: Option<Expr>,
    pub body: Vec<Stmt>,
}
```
where `test: Some(_)` represents an `elif` and `test: None` an else.

This representation is different tradeoff, e.g. we need to allocate the
`Vec<ElifElseClause>`, the `elif`s are now different than the `if`s
(which matters in rules where want to check both `if`s and `elif`s) and
the type system doesn't guarantee that the `test: None` else is actually
last. We're also now a bit more inconsistent since all other `else`,
those from `for`, `while` and `try`, still don't have nodes. With the
new representation some things became easier, e.g. finding the `elif`
token (we can use the start of the `ElifElseClause`) and formatting
comments for if-elif-else (no more dangling comments splitting, we only
have to insert the dangling comment after the colon manually and set
`leading_alternate_branch_comments`, everything else is taken of by
having nodes for each branch and the usual placement.rs fixups).

## Merge Plan

This PR requires coordination between the parser repo and the main ruff
repo. I've split the ruff part, into two stacked PRs which have to be
merged together (only the second one fixes all tests), the first for the
formatter to be reviewed by @MichaReiser and the second for the linter
to be reviewed by @charliermarsh.

* MH: Review and merge
astral-sh/RustPython-Parser#20
* MH: Review and merge or move later in stack
astral-sh/RustPython-Parser#21
* MH: Review and approve
astral-sh/RustPython-Parser#22
* MH: Review and approve formatter PR
#5459
* CM: Review and approve linter PR
#5460
* Merge linter PR in formatter PR, fix ecosystem checks (ecosystem
checks can't run on the formatter PR and won't run on the linter PR, so
we need to merge them first)
 * Merge astral-sh/RustPython-Parser#22
 * Create tag in the parser, update linter+formatter PR
 * Merge linter+formatter PR #5459

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
evanrittenhouse pushed a commit to evanrittenhouse/ruff that referenced this pull request Jul 19, 2023
## Summary

Previously, `StmtIf` was defined recursively as
```rust
pub struct StmtIf {
    pub range: TextRange,
    pub test: Box<Expr>,
    pub body: Vec<Stmt>,
    pub orelse: Vec<Stmt>,
}
```
Every `elif` was represented as an `orelse` with a single `StmtIf`. This
means that this representation couldn't differentiate between
```python
if cond1:
    x = 1
else:
    if cond2:
        x = 2
```
and 
```python
if cond1:
    x = 1
elif cond2:
    x = 2
```
It also makes many checks harder than they need to be because we have to
recurse just to iterate over an entire if-elif-else and because we're
lacking nodes and ranges on the `elif` and `else` branches.

We change the representation to a flat

```rust
pub struct StmtIf {
    pub range: TextRange,
    pub test: Box<Expr>,
    pub body: Vec<Stmt>,
    pub elif_else_clauses: Vec<ElifElseClause>,
}

pub struct ElifElseClause {
    pub range: TextRange,
    pub test: Option<Expr>,
    pub body: Vec<Stmt>,
}
```
where `test: Some(_)` represents an `elif` and `test: None` an else.

This representation is different tradeoff, e.g. we need to allocate the
`Vec<ElifElseClause>`, the `elif`s are now different than the `if`s
(which matters in rules where want to check both `if`s and `elif`s) and
the type system doesn't guarantee that the `test: None` else is actually
last. We're also now a bit more inconsistent since all other `else`,
those from `for`, `while` and `try`, still don't have nodes. With the
new representation some things became easier, e.g. finding the `elif`
token (we can use the start of the `ElifElseClause`) and formatting
comments for if-elif-else (no more dangling comments splitting, we only
have to insert the dangling comment after the colon manually and set
`leading_alternate_branch_comments`, everything else is taken of by
having nodes for each branch and the usual placement.rs fixups).

## Merge Plan

This PR requires coordination between the parser repo and the main ruff
repo. I've split the ruff part, into two stacked PRs which have to be
merged together (only the second one fixes all tests), the first for the
formatter to be reviewed by @MichaReiser and the second for the linter
to be reviewed by @charliermarsh.

* MH: Review and merge
astral-sh/RustPython-Parser#20
* MH: Review and merge or move later in stack
astral-sh/RustPython-Parser#21
* MH: Review and approve
astral-sh/RustPython-Parser#22
* MH: Review and approve formatter PR
astral-sh#5459
* CM: Review and approve linter PR
astral-sh#5460
* Merge linter PR in formatter PR, fix ecosystem checks (ecosystem
checks can't run on the formatter PR and won't run on the linter PR, so
we need to merge them first)
 * Merge astral-sh/RustPython-Parser#22
 * Create tag in the parser, update linter+formatter PR
 * Merge linter+formatter PR astral-sh#5459

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
konstin added a commit to astral-sh/ruff that referenced this pull request Jul 19, 2023
## Summary

Previously, `StmtIf` was defined recursively as
```rust
pub struct StmtIf {
    pub range: TextRange,
    pub test: Box<Expr>,
    pub body: Vec<Stmt>,
    pub orelse: Vec<Stmt>,
}
```
Every `elif` was represented as an `orelse` with a single `StmtIf`. This
means that this representation couldn't differentiate between
```python
if cond1:
    x = 1
else:
    if cond2:
        x = 2
```
and 
```python
if cond1:
    x = 1
elif cond2:
    x = 2
```
It also makes many checks harder than they need to be because we have to
recurse just to iterate over an entire if-elif-else and because we're
lacking nodes and ranges on the `elif` and `else` branches.

We change the representation to a flat

```rust
pub struct StmtIf {
    pub range: TextRange,
    pub test: Box<Expr>,
    pub body: Vec<Stmt>,
    pub elif_else_clauses: Vec<ElifElseClause>,
}

pub struct ElifElseClause {
    pub range: TextRange,
    pub test: Option<Expr>,
    pub body: Vec<Stmt>,
}
```
where `test: Some(_)` represents an `elif` and `test: None` an else.

This representation is different tradeoff, e.g. we need to allocate the
`Vec<ElifElseClause>`, the `elif`s are now different than the `if`s
(which matters in rules where want to check both `if`s and `elif`s) and
the type system doesn't guarantee that the `test: None` else is actually
last. We're also now a bit more inconsistent since all other `else`,
those from `for`, `while` and `try`, still don't have nodes. With the
new representation some things became easier, e.g. finding the `elif`
token (we can use the start of the `ElifElseClause`) and formatting
comments for if-elif-else (no more dangling comments splitting, we only
have to insert the dangling comment after the colon manually and set
`leading_alternate_branch_comments`, everything else is taken of by
having nodes for each branch and the usual placement.rs fixups).

## Merge Plan

This PR requires coordination between the parser repo and the main ruff
repo. I've split the ruff part, into two stacked PRs which have to be
merged together (only the second one fixes all tests), the first for the
formatter to be reviewed by @MichaReiser and the second for the linter
to be reviewed by @charliermarsh.

* MH: Review and merge
astral-sh/RustPython-Parser#20
* MH: Review and merge or move later in stack
astral-sh/RustPython-Parser#21
* MH: Review and approve
astral-sh/RustPython-Parser#22
* MH: Review and approve formatter PR
#5459
* CM: Review and approve linter PR
#5460
* Merge linter PR in formatter PR, fix ecosystem checks (ecosystem
checks can't run on the formatter PR and won't run on the linter PR, so
we need to merge them first)
 * Merge astral-sh/RustPython-Parser#22
 * Create tag in the parser, update linter+formatter PR
 * Merge linter+formatter PR #5459

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants