Description
Currently, for compatibility with clap, each language provides static "prepared queries". For example, the definition for Python is:
srgn/src/scoping/langs/python.rs
Lines 36 to 77 in 635839b

    /// Prepared tree-sitter queries for Python.
    #[derive(Debug, Clone, Copy, ValueEnum)]
    pub enum PreparedQuery {
        /// Comments.
        Comments,
        /// Strings (raw, byte, f-strings; interpolation not included).
        Strings,
        /// Module names in imports (incl. periods; excl. `import`/`from`/`as`/`*`).
        Imports,
        /// Docstrings (not including multi-line strings).
        DocStrings,
        /// Function names, at the definition site.
        FunctionNames,
        /// Function calls.
        FunctionCalls,
        /// Class definitions (in their entirety).
        Class,
        /// Function definitions (*all* `def` block in their entirety).
        Def,
        /// Async function definitions (*all* `async def` block in their entirety).
        AsyncDef,
        /// Function definitions inside `class` bodies.
        Methods,
        /// Function definitions decorated as `classmethod` (excl. the decorator).
        ClassMethods,
        /// Function definitions decorated as `staticmethod` (excl. the decorator).
        StaticMethods,
        /// `with` blocks (in their entirety).
        With,
        /// `try` blocks (in their entirety).
        Try,
        /// `lambda` statements (in their entirety).
        Lambda,
        /// Global, i.e. module-level variables.
        Globals,
        /// Identifiers for variables (left-hand side of assignments).
        VariableIdentifiers,
        /// Types in type hints.
        Types,
        /// Identifiers (variable names, ...).
        Identifiers,
    }

Notice that the enum is a unit enum, i.e. variants do not have associated data (i.e. they aren't tuple or struct variants). The enum is later mapped to tree-sitter queries like:
srgn/src/scoping/langs/python.rs
Lines 79 to 83 in 635839b

    impl PreparedQuery {
        #[allow(clippy::too_many_lines)]
        const fn as_str(self) -> &'static str {
            match self {
                Self::Comments => "(comment) @comment",

Notice how the result is actually a &'static str. It'd be super useful to have this be more dynamic. For example, a definition more like (abbreviated for the example):

    #[derive(Debug, ValueEnum)]
    enum PreparedQuery {
        Strings,
        Class(Option<String>),
    }

which means:
- we can query for a `String` in Python, such as `"hello world"`: it does not have a concept of "namedness", so it remains a unit variant
- Python `class`es however do have a name: `class TheName: ...`. The `Option<String>` now says:
  - if it's `None`, query for all classes, of any name
  - if it's `Some(name_pattern)`, query only classes whose name matches the pattern

The concept of "can be named" extends to functions, modules etc., while things like "comments" remain unnamed.

Note: some things could carry multiple names. E.g., an assignment like `x = 3` could be an enum variant of roughly `Assignment(Option<String>, Option<String>)`, to say "left side of the equals sign has to match `.0`, right side `.1`". `None` would mean "matches anything" again. This would be a nice-to-have.

Note: the `Option<String>` could also be just `String`, with a default value of `.*`, aka the "matches anything" regex pattern. I use this style here (Line 1063 in 635839b):

    default_value = GLOBAL_SCOPE,

aka the CLI argument is a `String` with a `default_value`, instead of an `Option<String>` with more logic attached to it. The former style is simpler and works.
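
To make the equivalence of the two styles concrete, here is a minimal sketch, not srgn's actual code. The `name_pattern` helper and the `MATCH_ANYTHING` constant are hypothetical names introduced only for illustration, and the enum is the abbreviated one from above without the clap derive:

```rust
/// Hypothetical, abbreviated query enum with an optional name pattern.
#[derive(Debug, Clone, PartialEq)]
enum PreparedQuery {
    /// Strings have no name, so this stays a unit variant.
    Strings,
    /// `None` means "any class"; `Some(p)` restricts to names matching `p`.
    Class(Option<String>),
}

/// Regex matching any name (assumed default for "no pattern given").
const MATCH_ANYTHING: &str = ".*";

impl PreparedQuery {
    /// The effective name pattern, or `None` for variants that cannot be named.
    fn name_pattern(&self) -> Option<&str> {
        match self {
            Self::Strings => None,
            // An absent pattern behaves exactly like the catch-all regex.
            Self::Class(None) => Some(MATCH_ANYTHING),
            Self::Class(Some(pattern)) => Some(pattern.as_str()),
        }
    }
}
```

Seen this way, `Class(None)` and a plain `String` defaulting to `.*` carry the same information; the `Option` only moves the defaulting from the CLI layer into the code.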
So when we extract a tree-sitter query later on, it would look more like:

    impl PreparedQuery {
        fn as_str(&self) -> String {
            match self {
                Self::Strings => "(string_content) @string".into(),
                // ANY class
                Self::Class(None) => "(class_definition) @class".into(),
                // only classes whose `name` matches the `pattern`
                Self::Class(Some(pattern)) => format!(
                    r#"(class_definition name: (identifier) @x (#match? @x "{pattern}"))"#
                ),
            }
        }
    }

which would open a whole new level of usage. Ideally, this would be a drop-in replacement for
Line 1505 in 635839b

    python: Vec<python::PreparedQuery>,

which would continue to "just work", just with added benefits. The CLI would then look like:

    $ srgn --python strings # find all strings in Python source code
    $ srgn --python class # find all `class`es, anywhere
    $ srgn --python class 'Test.+' # find all `class`es whose name matches this regex
    $ srgn --python class -- 'hello .+' # find the regex 'hello .+' in _any_ class; `--` disambiguates positional arg
    $ srgn --python class 'Meta.+' -- 'bye .+' # find the regex 'bye .+' _only_ inside of classes matching the regex

This seems pretty dynamic, so I'm not sure it could work. It mainly hinges on clap-rs/clap#2621.
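
Independent of whether clap can express this, the intended reading of the tokens can be sketched by hand. This is only an illustration under assumptions: `Parsed` and `parse_python_args` are hypothetical names, only a single query is handled, and the input is the list of tokens following `--python`:

```rust
/// Hypothetical result of interpreting `<query> [name_pattern] [-- <scope>]`.
#[derive(Debug, PartialEq)]
struct Parsed {
    query: String,
    name_pattern: Option<String>,
    scope: Option<String>,
}

/// Interprets the tokens following `--python`. Everything after a literal
/// `--` is treated as the positional scope regex.
fn parse_python_args(args: &[&str]) -> Result<Parsed, String> {
    // Split at the first `--`, if there is one.
    let (before, after) = match args.iter().position(|a| *a == "--") {
        Some(i) => (&args[..i], &args[i + 1..]),
        None => (args, &args[..0]),
    };
    // Before `--`: the query name, optionally followed by a name pattern.
    let (query, name_pattern) = match before {
        [query] => (query, None),
        [query, pattern] => (query, Some(pattern.to_string())),
        _ => return Err(format!("expected `<query> [name_pattern]`, got {before:?}")),
    };
    // After `--`: at most one scope regex.
    let scope = match after {
        [] => None,
        [scope] => Some(scope.to_string()),
        _ => return Err(format!("expected at most one scope, got {after:?}")),
    };
    Ok(Parsed { query: query.to_string(), name_pattern, scope })
}
```

Without the `--`, a lone second token is always read as the name pattern, which is why `srgn --python class -- 'hello .+'` needs the separator.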
Workarounds
All queries as individual flags
Example usage:

    $ srgn --python-class
    $ srgn --python-class -- bla
    $ srgn --python-class 'Test.+' -- bla

with the same logic as above. In source code, it would be something like:

    #[derive(Parser, Debug, Clone)]
    #[group(required = false, multiple = false)]
    struct PythonScope {
        /// A Python class.
        #[arg(long, env, verbatim_doc_comment, default_missing_value = "", num_args = 0..=1)]
        python_class: Option<python::Class>,
        // ... one field per remaining `python_<whatever>` query
    }

with a custom `impl FromStr for Class`. A bit of a lackluster solution:
- lots of boilerplate
- manual mapping of the different `python_<whatever>` options
- can no longer do pipelining: for `srgn --python-class 'Test.+' --python-string`, the current logic of `srgn` is to look for strings only inside of bodies of classes (of name `'Test.+'`). With Rust fields like `python_class`, I don't think we'll be able to access the order of arguments; we just get the fact that they are present or not.
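
The custom `impl FromStr for Class` could look roughly like the sketch below. Modelling `Class` as a newtype around `Option<String>` is an assumption made for illustration; the only part taken from the snippet above is that `default_missing_value = ""` makes a bare `--python-class` arrive as the empty string:

```rust
use std::convert::Infallible;
use std::str::FromStr;

/// Hypothetical per-query type: a class query with an optional name pattern.
#[derive(Debug, Clone, PartialEq)]
struct Class(Option<String>);

impl FromStr for Class {
    // Parsing cannot fail in this sketch; the pattern is not validated here.
    type Err = Infallible;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if s.is_empty() {
            // Bare flag (the `default_missing_value`): match any class.
            Ok(Self(None))
        } else {
            // Flag with a value: keep it as the name pattern.
            Ok(Self(Some(s.to_owned())))
        }
    }
}
```

One such type (or a shared generic one) would be needed per nameable query, which is where the boilerplate mentioned above comes from.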