Write a script to aggregate OSS data on Ruff configuration #3365
Comments
Labeling this as "good first issue", not because it's well-scoped, but because it's a relatively independent project (and could be done without writing any Rust, if anyone is eager to help out). |
I could take a look – https://github.com/search?q=path%3A**%2Fpyproject.toml+ruff&type=code seems like a good starting point (along with |
FWIW, it seems that GitHub doesn't expose an API for its dependency graph, so I'd also go with that code search (or something more API-friendly if GitHub's query limits are too tight). |
@konstin Dependency graph probably wouldn't directly pick up e.g. |
Alright, here goes with some initial data 😁
The GitHub Search API doesn't seem to be using the new Code Search stuff and is heavily rate-limited, so the dataset may not be great just yet (PRs welcome, of course). Let me know if you want the files I have so far! |
What about the |
If you mean the Similarly, I guess a more holistic view (that knew about all of the codes Ruff knows about) would be an interesting next step! |
Oooh this is so useful! Thank you @akx! The other piece of data that would be really useful to see, though not sure whether it can fit into this paradigm, is how often various codes are used in Another question I'd have (though not your responsibility to answer, only if you're curious): I know the data is based on 306 TOML files. I'd be interested to know how often various fields are set vs. unset. (E.g., the |
@charliermarsh Sure! I updated the gist with "Fields set in configuration" (since I already had that data anyway, it just wasn't aggregated). It's also based on a slightly larger dataset (I realized I didn't process Finding noqas will require a bit more magic, but now that I do have a good starter dataset of which repos may have Ruff enabled, we can go from there (but not right now since I have 9% laptop battery left 😂) |
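For the noqa question, a rough first pass could be a plain regex scan over each repo's Python sources. The sketch below is only an illustration under stated assumptions: `count_noqa_codes` and the sample text are hypothetical names, not part of the aggregation script, and a line-based regex will also count `# noqa` text that appears inside string literals.

```python
import re
from collections import Counter

# Matches "# noqa" (any case), optionally followed by ": CODE, CODE, ...".
NOQA_RE = re.compile(
    r"#\s*(?i:noqa)\b(?:\s*:\s*(?P<codes>[A-Z]+[0-9]+(?:[\s,]+[A-Z]+[0-9]+)*))?"
)
CODE_RE = re.compile(r"[A-Z]+[0-9]+")


def count_noqa_codes(source: str) -> Counter:
    """Count rule codes suppressed via noqa comments in one source file."""
    counts = Counter()
    for line in source.splitlines():
        match = NOQA_RE.search(line)
        if not match:
            continue
        codes = CODE_RE.findall(match.group("codes") or "")
        if codes:
            counts.update(codes)
        else:
            # A bare "# noqa" suppresses everything on the line.
            counts["(blanket)"] += 1
    return counts


# Hypothetical sample input, for illustration only.
SAMPLE = (
    "import os  # noqa: F401\n"
    "x = 1  # noqa: E501, F841\n"
    "y = 2  # noqa\n"
    "z = 3  # NOQA:F401\n"
)

if __name__ == "__main__":
    print(count_noqa_codes(SAMPLE))
```

Summing these counters across repositories would give a per-code suppression frequency; a tokenizer-based scan would be more accurate than the regex if string-literal false matches turn out to matter.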
Welp, turns out there was a bug or two that conflated "unset" with "empty set" 🙄 – I think this data is now more or less correct:
Also updated into the gist, with all of the details... |
Updated with some 90+ new repos in a fortnight: https://gist.github.com/akx/211308a4d2b31aaf4412558af6fe62a1
|
Awesome, thank you for this! (Gonna close issue :)) |
https://gist.github.com/akx/817f5dc5663b80ae1315e108393b11a5 126 new unique configuration files in a week 🎉
|
I love these updates. Thank you @akx, for updating them. |
https://gist.github.com/akx/291e96a3cb4f085d86e4830eecb5375e
|
https://gist.github.com/akx/01c0d37eedd921c0b88d06262812413a
|
Thank you for the continued data collection! Would it be feasible to filter out forks? I think we're seeing quite a bit of duplication in the dataset from them; there are e.g. 111 OpenBBTerminal entries in https://github.com/akx/ruff-usage-aggregate/blob/master/data/known-github-tomls.jsonl . |
@konstin Forks might be hard to find without consulting the GitHub API more, but we're already only considering unique TOMLs: |
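As an illustration of what "unique TOMLs" could mean in practice (this is a sketch, not the project's actual code), deduplicating by content hash drops byte-identical copies without any extra GitHub API calls. The function name `unique_by_content` and the sample paths are hypothetical.

```python
import hashlib


def unique_by_content(files):
    """Keep the first (path, content) pair for each distinct file content.

    `files` is an iterable of (path, bytes) pairs.
    """
    seen = set()
    unique = []
    for path, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            continue  # identical TOML already counted, e.g. from a fork
        seen.add(digest)
        unique.append((path, content))
    return unique


# Hypothetical sample data, for illustration only.
FILES = [
    ("upstream/pyproject.toml", b'[tool.ruff]\nselect = ["E"]\n'),
    ("fork-a/pyproject.toml", b'[tool.ruff]\nselect = ["E"]\n'),
    ("other/pyproject.toml", b"[tool.ruff]\nline-length = 100\n"),
]

if __name__ == "__main__":
    for path, _ in unique_by_content(FILES):
        print(path)
```

This only removes exact duplicates: a fork that has edited its configuration still counts as a separate file, and two unrelated projects with identical configs collapse into one.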
Right now, it's hard for us to make data-informed decisions. It'd be nice to leverage our open-source usage to help understand questions like:
# noqa
ignored (i.e., false positives)?
This is good prior art (h/t @konstin): rust-lang/rust-clippy#7666