Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make calling ruff programmatically slightly faster #3694

Open
samuelcolvin opened this issue Mar 23, 2023 · 23 comments
Open

Make calling ruff programmatically slightly faster #3694

samuelcolvin opened this issue Mar 23, 2023 · 23 comments
Labels
core Related to core functionality needs-decision Awaiting a decision from a maintainer

Comments

@samuelcolvin
Copy link

(this is low priority, but maybe interesting)

I using ruff in pytest-examples to lint and fix code examples.

Currently I have to write a ruff.toml config file, then write the python file, then call ruff by subprocess.

I guess if ruff had someway to set all config via cli arguments, then receive the python code via stdin, I could avoid needing to write to files.

@charliermarsh
Copy link
Member

...then receive the python code via stdin.

If helpful in the interim: Ruff can take code via stdin:

ruff on main [$!?] is 📦 v0.0.259 via 🐍 v3.11.0 (.venv) via 🦀 v1.67.1
❯ cat foo.py
───────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: foo.py
───────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ def f():
   2   │     x = 1
───────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

ruff on main [$!?] is 📦 v0.0.259 via 🐍 v3.11.0 (.venv) via 🦀 v1.67.1
❯ cat foo.py | ruff - --stdin-filename=foo.py
foo.py:2:5: F841 [*] Local variable `x` is assigned to but never used
Found 1 error.
[*] 1 potentially fixable with the --fix option.

ruff on main [$!?] is 📦 v0.0.259 via 🐍 v3.11.0 (.venv) via 🦀 v1.67.1
❯ cat foo.py | ruff - --stdin-filename=foo.py --fix
def f():
    pass

@charliermarsh charliermarsh added core Related to core functionality wish Not on the current roadmap; maybe in the future and removed wish Not on the current roadmap; maybe in the future labels Mar 23, 2023
@charliermarsh
Copy link
Member

(But there's no way to provide configuration programmatically. You could write to a tempfile and pass it in as --config /path/to/file.toml if easier, but nothing better than that sadly.)

@samuelcolvin
Copy link
Author

Thanks for the pointer on --stdin-filename.

(But there's no way to provide configuration programmatically. You could write to a tempfile and pass it in as --config /path/to/file.toml if easier, but nothing better than that sadly.)

yup, that's what I'm doing now.

BTW, it's slightly surprising that line-length can be set via the command line.

@charliermarsh
Copy link
Member

Haha it's hidden! We deprecated it but it's still possible :D

@samuelcolvin
Copy link
Author

samuelcolvin commented Mar 23, 2023

ha, thanks for the pointer about stdin, that's working well (seems with stdin, you have to manually define the config file).

It would amazing if config could be set via an (albeit) very ugly command line flag, perhaps base64 encoded to avoid newline weirdnesses.

My solution still involves writing the config on every test which is slowing things down.

@DanCardin
Copy link
Contributor

DanCardin commented Mar 24, 2023

Idk if this sounds crazy or not, but if ruff were to support file-level configuration comments akin to the noqa ones (but also for selection) i.e. # ruff: ignore: N, E5 / # ruff: select: N, E5, then for this particular usecase you could(?) pipe some comment configuration ahead of the code you're piping in.

With the extra benefit that the new behavior would be cross functional for things outside this usecase.

@MichaReiser
Copy link
Member

I would refrain from adding new features to ruff only to make programmatically calling ruff via the CLI slightly faster. The proper solution in my view is to add a python API to lint a file given its content as a string and the settings

@DanCardin
Copy link
Contributor

DanCardin commented Mar 24, 2023

I would think file-level configuration comments (which enable both selecting and ignoring) would be a generally useful feature. given that you can already noqa specific violation types.

...and it just so happens that it would enable you to avoid writing files (even if it's somewhat of a hack for you). Although i absolutely agree, an actual programmatic API would make more sense trying to address this specific problem.

@MichaReiser
Copy link
Member

I would think file-level configuration comments (which enable both selecting and ignoring) would be a generally useful feature. given that you can already noqa specific violation types.

I'm sorry. My intention wasn't to say that this feature isn't useful or that I consider it a hack. The only thing I wanted to convey is that I don't recommend any new features with no practical use other than speeding up ruff's programmatic CLI invocation.

@samuelcolvin
Copy link
Author

I would refrain from adding new features to ruff only to make programmatically calling ruff via the CLI slightly faster. The proper solution in my view is to add a python API to lint a file given its content as a string and the settings

That's exactly what I'm trying to do, but since Ruff doesn't build with pyo3 whatsoever, adding a formal python interface is a massive change.

Hence allowing full functionality via subprocess without touching the filesystem might be the best solution IMHO.

Also @charliermarsh, I've run into a problem using stdin - when using --fix I get back the reformatted content which is great, but there's no way to get error messages in this case.

Surely the formatted python is going to stdout, error messages should be piped to stderr?

@charliermarsh
Copy link
Member

Hmm yeah, perhaps they should be included as stderr -- it's just a bit strange because there's no other context in which we write diagnostics to stderr (then again, there's no other context in which we write "fixed" content to stdout).

If you're looking to parse the diagnostics, you could also consider using --format=json?

@samuelcolvin
Copy link
Author

Currently I'm just using a regex to fix the file name and line number, when printing.

Would be great if stderr could get the diagnostics in this case.

@samuelcolvin
Copy link
Author

I've run into a problem using stdin - when using --fix I get back the reformatted content which is great, but there's no way to get error messages in this case.

For anyone else running into this problem, my fix is to run ruff again without --fix and use the error from that run, see pydantic/pytest-examples#5.

@MichaReiser
Copy link
Member

MichaReiser commented Jul 10, 2023

See #5102 for another use case where an option to pass the configuration (they propose to pass the config as JSON string) would enable more advanced workflows.

@MichaReiser MichaReiser added the needs-decision Awaiting a decision from a maintainer label Jul 10, 2023
@MichaReiser
Copy link
Member

It probably doesn't satisfy all your needs and we don't make any stability guarantees but an alternative is to run ruff's wasm version. I don't know how fast https://github.com/wasmerio/wasmer-python is... could as well be slower than spawning a new process

@Finkregh
Copy link

This would also help when providing pre-commit hooks to others. I'd like to allow others to use prepared hooks to get a consistant linting/formatting without having to add additional config files to their repository.
Currently this excludes setting e.g. pydocstyle as pre-commit only allows setting arguments, not copying files from other repositories.

This also applies to CI jobs, but there you could just copy a config file from somewhere else, of course.

(Thanks for this great tool!)

@samuelcolvin
Copy link
Author

@charliermarsh any progress on this?

This problem get's worse now that we have ruff format which has to be run separately from ruff check --fix - instead of invoking one process for each snippet, we now have to invoke two to use just ruff.

  1. It would be awesome if there was a way to run code through ruff without having to create a new process each time - for me this doesn't need to be a python library, just a way to stream multiple files to ruff, then later stream another etc.
  2. (and I assume this is already on your roadmap) it would be nice if there was a way to get all the stuff done by check --fix in the same process as format.

🙏 thank you.

@zanieb
Copy link
Member

zanieb commented Dec 21, 2023

Hey Samuel!

  1. We're interested in porting ruff-lsp to Rust which would give you a persistent process you could call (Is there interest in moving ruff-lsp implementation into rust? ruff-lsp#300), I wonder if that would work for you.
  2. We are planning a single command; see Unified command for linting and formatting #8232

@charliermarsh
Copy link
Member

I don’t know that the LSP rewrite will help here — would need to look deeper… Since you’ll still have subprocess overhead when calling from Python.

@samuelcolvin
Copy link
Author

I think ruff-lsp might work, creating one subprocess is absolutely fine if we can use it for all tests.

The problem in our case is creating one process (two now we're using ruff format not black for formatting) for every code example in markdown and docstrings while running tersts.

Timing examples:

  • running pdm run pytest tests/test_docs.py runs 478 tests in 7.93s (16ms/test) - that's basically running, linting and formatting for ever code snippet in the code base using pytest-examples, I think much of that time is spent spinning up the (currently) one ruff process per case
  • for comparison, running pdm run pytest tests/test_types.py runs 754 tersts in 1.42s (1.9ms/test)

Those timings roughly match my intuition that launching a rust process takes ~10ms.

@zanieb
Copy link
Member

zanieb commented Dec 21, 2023

We do support formatting of docstring code snippets now :) perhaps what you really want is linter support for code snippets without extracting them yourself.

@samuelcolvin
Copy link
Author

ye, quite possibly.

A few questions:

  • can this work in markdown files too?
  • do you provide a way for magic markdown classes to disable linting, like this
  • (I guess this is possible by disabling some linting rule), we use #> to identify print output which is automatically inserted into snippets

Sorry to derail this discussions, just interested in whether we could get rid of the linting inside pytest-examples, it would make it much simpler.

cc @davidhewitt

@MichaReiser
Copy link
Member

MichaReiser commented Feb 23, 2024

The latest release provides a way to set arbitrary configuration options via the CLI, and you can pass the file content on via stdin. Could you let me know if this is enough for your use case?

Also, @snowsignal is working on our LSP rewrite in rust. Stay tuned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Related to core functionality needs-decision Awaiting a decision from a maintainer
Projects
None yet
Development

No branches or pull requests

6 participants