Allow language extensions to have consistent symbol matching for document and workspace #34605

Closed
gilmoreorless opened this Issue Sep 18, 2017 · 4 comments

gilmoreorless commented Sep 18, 2017

There’s an inconsistency in how language extensions are able to match symbols for a single document versus a workspace.

I’ll explain with an example. Assume for the sake of simplicity that all files in a workspace are the same language, and there is only one extension registered for that language.

When a user chooses “Go to Symbol in File”:

  1. If the language extension has registered a DocumentSymbolProvider, the provideDocumentSymbols method is called with the currently-focused TextDocument.
  2. The language extension returns a list of all symbols in the document.
  3. VS Code filters the full list to match what the user has typed.
  4. VS Code ranks and highlights matched string parts.
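The document flow above can be sketched roughly as follows. This is only an illustration: `SimpleSymbol`, `provideAllDocumentSymbols` and `editorSideFilter` are hypothetical stand-ins rather than the real vscode types, and the substring test is an assumed placeholder for whatever matcher VS Code actually applies.

```typescript
// Hypothetical, simplified stand-in for a document symbol; not the real vscode type.
interface SimpleSymbol {
  name: string;
  line: number;
}

// Steps 1-2: the provider receives the document and returns every symbol, unfiltered.
function provideAllDocumentSymbols(documentText: string): SimpleSymbol[] {
  const symbols: SimpleSymbol[] = [];
  documentText.split("\n").forEach((text, line) => {
    // Toy rule for illustration only: a symbol is a line starting with "function <name>".
    const match = /^function\s+(\w+)/.exec(text);
    if (match) {
      symbols.push({ name: match[1], line });
    }
  });
  return symbols;
}

// Step 3: the editor, not the extension, filters by what the user typed.
// A case-insensitive substring test is assumed here in place of the real matcher.
function editorSideFilter(symbols: SimpleSymbol[], typed: string): SimpleSymbol[] {
  const needle = typed.toLowerCase();
  return symbols.filter((s) => s.name.toLowerCase().indexOf(needle) !== -1);
}

const sampleSource = [
  "function parseZone() {}",
  "const x = 1;",
  "function formatZone() {}",
].join("\n");

const allDocumentSymbols = provideAllDocumentSymbols(sampleSource);
const shownInPicker = editorSideFilter(allDocumentSymbols, "parse");
console.log(shownInPicker.map((s) => s.name)); // prints [ 'parseZone' ]
```

The point is that the extension never sees the typed text in this flow.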

When a user chooses “Go to Symbol in Workspace”:

  1. If the language extension has registered a WorkspaceSymbolProvider, the provideWorkspaceSymbols method is called with a query string.
  2. The language extension returns a list of symbols in the workspace that match the query string. In this case, the extension filters the list to match what the user has typed.
  3. VS Code ranks and highlights matched string parts.
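For comparison, a minimal sketch of the workspace flow, where the extension does the filtering itself. `IndexedSymbol`, `toyWorkspaceIndex` and `provideMatchingWorkspaceSymbols` are hypothetical names, and the substring test is just one choice an extension author might make.

```typescript
// Hypothetical, simplified stand-in for a workspace symbol; not the real vscode type.
interface IndexedSymbol {
  name: string;
  file: string;
}

// Assumed in-memory index of every symbol the extension knows about.
const toyWorkspaceIndex: IndexedSymbol[] = [
  { name: "parseZone", file: "a.txt" },
  { name: "formatZone", file: "a.txt" },
  { name: "offset", file: "b.txt" },
];

// Steps 1-2: the provider receives only the query string and must decide
// for itself which symbols count as a match before returning them.
function provideMatchingWorkspaceSymbols(
  query: string,
  index: IndexedSymbol[]
): IndexedSymbol[] {
  const needle = query.toLowerCase();
  return index.filter((s) => s.name.toLowerCase().indexOf(needle) !== -1);
}

const workspaceHits = provideMatchingWorkspaceSymbols("zone", toyWorkspaceIndex);
console.log(workspaceHits.map((s) => s.name)); // prints [ 'parseZone', 'formatZone' ]
```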

So the main difference is that for a document VS Code is doing the string matching, but for a workspace the extension is doing the string matching. This produces inconsistencies because they are not using the same matching algorithm. The problem is the same regardless of whether the language extension is doing the symbol matching within the extension process or as part of a client/server model.
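To make the mismatch concrete, here is a small sketch with two plausible matchers run over the same names. Neither is claimed to be the algorithm VS Code or any particular extension uses; `prefixMatch` and `substringMatch` are hypothetical.

```typescript
// One side might only accept names that start with the query...
function prefixMatch(query: string, name: string): boolean {
  return name.toLowerCase().indexOf(query.toLowerCase()) === 0;
}

// ...while the other accepts the query anywhere in the name.
function substringMatch(query: string, name: string): boolean {
  return name.toLowerCase().indexOf(query.toLowerCase()) !== -1;
}

const sampleNames = ["zoneInfo", "parseZone", "offset"];

const viaPrefix = sampleNames.filter((n) => prefixMatch("zone", n));
const viaSubstring = sampleNames.filter((n) => substringMatch("zone", n));

console.log(viaPrefix);    // prints [ 'zoneInfo' ]
console.log(viaSubstring); // prints [ 'zoneInfo', 'parseZone' ]
```

The same query yields different result lists, which is the kind of inconsistency a user sees when switching between the two pickers.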

This difference in behaviour has led to multiple issues in the past (e.g. #20039 – "Go to symbol" looks like only take care of upper cases). Within my vscode-zoneinfo extension, I’m using a knowingly-naïve string matching method for the provideWorkspaceSymbols method, because I know that I’m almost guaranteed to have different results from the single-file symbol matching.

I don’t have a definitive solution, but I do have some suggestions.

Option 1: Change provideDocumentSymbols to accept a query string

Pro:
If the provideDocumentSymbols method accepted an extra argument for the query string, an extension could then use the same internal methods for matching document and workspace symbols consistently.

Con:
Of course, this continues to have the current downside that different extensions will match symbols in different ways, so there will still be inconsistent results across languages.

Option 2: Provide a shared API for symbol name matching

There have been many issues raised previously about the fuzzy-ish string matching behaviour in VS Code, collected in a single meta-issue at #27317. If there are going to be changes to the string matching behaviour, it would be a good opportunity to provide the matcher as an API for extensions.

Pro:
If an extension’s provideWorkspaceSymbols method could call something like vscode.matchString(symbolName, query), there would be consistency between document and workspace symbol matching.

Con:
This would only work for language extensions that find symbols within the extension process. I presume that most language extensions use the language server protocol instead, so they won’t be able to use a vscode.* API.
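A rough sketch of what Option 2 could look like from the extension's side. No such API exists as far as this issue is concerned: `sharedMatchString` is a hypothetical stand-in for the suggested vscode.matchString(symbolName, query), and its substring logic is an assumption made purely for illustration.

```typescript
// Signature mirrors the suggested matchString(symbolName, query).
type NameMatcher = (symbolName: string, query: string) => boolean;

// Hypothetical shared matcher; the real one would be supplied by the editor.
const sharedMatchString: NameMatcher = (symbolName, query) =>
  symbolName.toLowerCase().indexOf(query.toLowerCase()) !== -1;

interface NamedSymbol {
  name: string;
  file: string;
}

// Document-level filtering: restrict to one file, then apply the shared matcher.
function filterDocumentSymbols(
  symbols: NamedSymbol[],
  file: string,
  query: string,
  matcher: NameMatcher
): NamedSymbol[] {
  return symbols.filter((s) => s.file === file && matcher(s.name, query));
}

// Workspace-level filtering: every file, same matcher.
function filterWorkspaceSymbols(
  symbols: NamedSymbol[],
  query: string,
  matcher: NameMatcher
): NamedSymbol[] {
  return symbols.filter((s) => matcher(s.name, query));
}

const knownSymbols: NamedSymbol[] = [
  { name: "parseZone", file: "a.txt" },
  { name: "offset", file: "a.txt" },
  { name: "zoneInfo", file: "b.txt" },
];

const docMatches = filterDocumentSymbols(knownSymbols, "a.txt", "zone", sharedMatchString);
const wsMatches = filterWorkspaceSymbols(knownSymbols, "zone", sharedMatchString);
console.log(docMatches.map((s) => s.name)); // prints [ 'parseZone' ]
console.log(wsMatches.map((s) => s.name));  // prints [ 'parseZone', 'zoneInfo' ]
```

Because both paths call the same matcher, any symbol matched in a document is also matched in the workspace, and vice versa for symbols in that document.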

A partial solution to this could be to provide the string matcher as a separate Node.js module (or use one of a few existing ones), which can then be included in VS Code as well as a language server. Unfortunately that only helps language servers that use the Node.js ecosystem. Thinking out loud... maybe provide an open abstract description of the matching algorithm, so that other language servers could implement it as well? Though that has veered wildly into massively over-engineering a solution to a relatively minor problem...

Member

jrieken commented Sep 19, 2017

> So the main difference is that for a document VS Code is doing the string matching, but for a workspace the extension is doing the string matching.

Yeah, the reason for that is that workspace symbols can number in the tens of thousands, and sending them all back and forth might be too expensive for a language service. That's why we provide the query string; ideally we would have full access to all models.

For document symbols we expect fewer symbols, and we also display all symbols (in their order, and in future in a hierarchy). So there we take on the filtering and highlighting.

I think these constraints remain and that we should spec how we expect language servers to interpret/match that query. Many do a "starts-with" or "indexOf" match, but we would favour a more lax subsequence string matching. So, foo matches For you because all letters f, o, o appear in that order, case-insensitive, in the target string.
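A minimal sketch of that kind of lax, case-insensitive, in-order matching. `isSubsequenceMatch` is a hypothetical helper written for illustration, not the matcher VS Code uses internally.

```typescript
// Returns true when every character of `query` appears in `target`
// in the same order, ignoring case. Characters need not be adjacent.
function isSubsequenceMatch(query: string, target: string): boolean {
  const q = query.toLowerCase();
  const t = target.toLowerCase();
  let qi = 0; // index of the next query character still to be found
  for (let ti = 0; ti < t.length && qi < q.length; ti++) {
    if (t[ti] === q[qi]) {
      qi++;
    }
  }
  return qi === q.length;
}

console.log(isSubsequenceMatch("foo", "For you")); // prints true
console.log(isSubsequenceMatch("foo", "For"));     // prints false
console.log(isSubsequenceMatch("oof", "For you")); // prints false
```

A single left-to-right pass is enough to answer yes or no; ranking and highlighting would need more than this.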

gilmoreorless commented Sep 19, 2017

Yep, the constraints make sense, especially regarding a workspace (which is why I suggested making provideDocumentSymbols work more like provideWorkspaceSymbols, rather than the other way around).

> Many do a "starts-with" or "indexOf" match, but we would favour a more lax subsequence string matching. So, foo matches For you because all letters f, o, o appear in that order, case-insensitive, in the target string.

That’s the approach I ended up taking with my extension in lieu of proper fuzzy searching, but a lot of the results don’t get the matched parts highlighted in the picker:

[Image: screenshot of the symbol picker results]

That’s not to say that the matching and/or highlighting is wrong, just inconsistent due to the different processes involved.

Member

jrieken commented Sep 19, 2017

> That’s not to say that the matching and/or highlighting is wrong, just inconsistent due to the different processes involved.

Yeah, we have a strong matching algorithm that we use for IntelliSense and we should also use it here.

jrieken added a commit that referenced this issue Sep 22, 2017

Member

jrieken commented Dec 11, 2017

Closing this as we have updated the docs about this

@jrieken jrieken closed this Dec 11, 2017

@vscodebot vscodebot bot locked and limited conversation to collaborators Jan 25, 2018
