Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support parsing IPython escape commands #8354

Closed
maximlt opened this issue Oct 30, 2023 · 10 comments
Closed

Support parsing IPython escape commands #8354

maximlt opened this issue Oct 30, 2023 · 10 comments
Labels
parser Related to the parser

Comments

@maximlt
Copy link

maximlt commented Oct 30, 2023

With ruff 0.1.3, running ruff check test.ipynb on a notebook whose content is:

%time x = 2
print(x)

returns:

test.ipynb:cell 2:1:7: F821 Undefined name `x`
Found 1 error.

While I would expect no error to be found. It seems the IPython %time magic isn't supported. I haven't tested other magics.

@konstin konstin added the parser Related to the parser label Oct 30, 2023
@konstin
Copy link
Member

konstin commented Oct 30, 2023

Currently, we don't yet parse the contents of ipython, we merely ignore them.

@charliermarsh
Copy link
Member

@dhruvmanila - What would it take for us to selectively support common magics, i.e., parse the bodies?

@dhruvmanila
Copy link
Member

Related #8094. I'll close that one in favor of this and bring any relevant information here.

I'm a bit hesitant to support parsing magics as there's no clear grammar for it. There are other reasons as well which I'll outline here.

Magic arguments

Each magic command can have any number of CLI type of arguments which needs to be skipped as they aren't valid Python syntax. For example, the %timeit documentation usage is:

%timeit [-n<N> -r<R> [-t|-c] -q -p<P> -o] statement

But then how do we differentiate between a CLI type of argument and an actual Python code. That would probably require having an understanding of the supported magic commands and basically adding the logic for parsing such commands in Ruff.

Syntax

The linked issue (#8094) provides a real world example for a false positive. I've narrowed it down to the following lines:

versions = {'Latest': 1, 'v7.2-112': 2, 'v6': 3}
# ^^^^^^ F841 unused variable
!sed -i '/^GPUOWL_PRP_PM1=/c\GPUOWL_PRP_PM1={versions[gpuowl_prp_pm1]}' '{file}'
#                                            ^^^^^^^^ used here

Here, the warning we raise is that the versions variable is defined but unused (F841). The problem here is that the variable is used in bash syntax and not Python.

That said, if we do decide to parse the escape commands, the following would be required:

  1. Understanding the said magic command (in case of %), it's CLI type arguments and the position of Python code in the command itself
  2. Integrating a shell parser (in case of !), updating our AST structure to include that.

Even in the case of (2), we would further need to add an ability to parse arguments for the shell commands (see the above sed example).

@dhruvmanila dhruvmanila added the needs-decision Awaiting a decision from a maintainer label Oct 31, 2023
@dhruvmanila dhruvmanila changed the title IPython %time magic not supported Support parsing IPython escape commands Oct 31, 2023
@dhruvmanila
Copy link
Member

(Converting this into a general issue to discuss parsing IPython escape commands.)

@tdulcet
Copy link

tdulcet commented Oct 31, 2023

2. Integrating a shell parser (in case of !), updating our AST structure to include that.

I am not sure if Ruff would actually need a full shell parser, as least not for that example from my notebook. Ruff would only need to look for matching pairs of curly brackets and check if there is valid Python syntax inside. This would need to be done before running a shell parser anyway, as it could be invalid Bash syntax, especially if it is unquoted. Jupyter seems to be smart enough to silently ignore any curly brackets with invalid Python syntax inside, such as with Bash brace expansion or an awk command for example.

Supporting my first example in #8094 likely would require a shell parser, but that would only affect a single Ruff rule (F841) compared to the bracket syntax that could affect dozens of rules since it can include arbitrary Python expressions. Note that Python does include a (partial) shell parser with the shlex library.

@henryiii
Copy link
Contributor

The simplest starting point for cell magics would be to mark some cell magics as transparent (possibly a customizable list to allow third party magics to be added?). That would allow this:

y = 1
%%timeit
x = y

to no longer mark y as unused. It's only a first order fix - %%timeit doesn't add variables to the environment, it only captures it. So x isn't set to y after this statement. But that's probably a lot better than all the errors that pop up about unused imports and variables; I'm having to deactivate related checks due to this. Some other cell magics, like %%time, are truly transparent, they are just like normal code.

I think these are the built-in transparent cell magics (based on looking at the details given by %magic):

  • %%capture
  • %%debug
  • %%prun
  • %%pypy
  • %%python
  • %%python3
  • %%time
  • %%timeit (captures environment, but doesn't set new variables)
  • %%writefile/%%file (only with a .py extension).

Supporting inline magics seems like a second order issue, and a tricker one, since the arguments are inline with the body. And you can rewrite the above example into a cell magic easily.

@henryiii
Copy link
Contributor

@charliermarsh
Copy link
Member

I'm very interested in supporting this!

@dhruvmanila
Copy link
Member

The simplest starting point for cell magics would be to mark some cell magics as transparent (possibly a customizable list to allow third party magics to be added?).

Correct me if I'm wrong but by transparent, I'm assuming you mean that the variables / imports defined in that cell are available in the global scope to be accessed by other code in other cells.

That would allow this:

y = 1
%%timeit
x = y

to no longer mark y as unused.

I see what you mean here. I think this should be trivial to implement. We would just need to update the is_magic_cell function to consider the cells as valid if it's one of the listed magics. The parser then should take care of the rest.

/// Returns `true` if a cell should be ignored due to the use of cell magics.
fn is_magic_cell<'a>(lines: impl Iterator<Item = &'a str>) -> bool {

For example, taking the above code:

Module(
    ModModule {
        range: 0..20,
        body: [
            Assign(
                StmtAssign { .. },  // y = 1
            ),
            IpyEscapeCommand(
                StmtIpyEscapeCommand {
                    range: 6..13,
                    kind: Magic2,
                    value: "timeit",
                },
            ),
            Assign(
                StmtAssign { .. },  // x = y
            ),
        ],
    },
)

This means we wouldn't mark y as unused then.

dhruvmanila added a commit that referenced this issue Dec 7, 2023
## Summary

This PR updates the logic for `is_magic_cell` to include certain cell
magics. These cell magics would contain Python code following the line
defining the command. The code could define a variable which can then be
referenced in other cells. Currently, we would ignore the cell
completely leading to undefined-name violation.

As discussed in
#8354 (comment)

## Test Plan

Add new test case to validate this scenario.
@charliermarsh
Copy link
Member

I believe we now support these transparent / simple cases, which is probably the best we can do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parser Related to the parser
Projects
None yet
Development

No branches or pull requests

6 participants