Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support semantic highlighting #18

Closed
hackwaly opened this issue Jun 28, 2016 · 122 comments
Closed

Support semantic highlighting #18

hackwaly opened this issue Jun 28, 2016 · 122 comments
Milestone

Comments

@hackwaly
Copy link

@hackwaly hackwaly commented Jun 28, 2016

Like WebStorm and VS does: eg. Symbol is a type or parameter or namespace or unresolved ...

Textmate based grammars are hard to do this. Since we did support Diagnostics, why not support semantic highlighting?

@dbaeumer dbaeumer added this to the 3.0 milestone Jun 29, 2016
@dbaeumer
Copy link
Member

@dbaeumer dbaeumer commented Jun 29, 2016

Fully agree and we already discussed it since we would like to have it in VS Code as well.

@daviwil
Copy link
Member

@daviwil daviwil commented Jun 29, 2016

I'd love to have this for the PowerShell extension also. We provide the ability to create "dynamic keywords" for the purpose of writing domain-specific languages. Semantic highlighting would allow us to colorize those keywords in VS Code even though they aren't part of the PowerShell language spec.

/cc @BrucePay

@smarr
Copy link

@smarr smarr commented Jul 7, 2016

I was looking for that as well.

What's exactly the meaning of the current 'highlighting' for read/write/text (see document highlights)?

I implemented that part but didn't see any for of visual feedback in the editor.

@cdietrich
Copy link

@cdietrich cdietrich commented Jul 7, 2016

DocumentHighlight is for "mark occurrences"

@smarr
Copy link

@smarr smarr commented Jul 7, 2016

@cdietrich sorry for hijacking the topic, but this is very unclear to me. How is document highlights used to realize a 'mark occurrences" feature? How does the read/write/text distinction fit in, and why does it only expect a single result in return? For mark occurrences, I'd expect to be able to return a collection, no?

Would be great if that could be clarified also in protocol.md.

@cdietrich
Copy link

@cdietrich cdietrich commented Jul 7, 2016

@smarr yes you are right. this makes no sense. i asume the return type should be an array.
if i have a look at vscode i can find

export interface DocumentHighlightProvider {
        provideDocumentHighlights(model: editor.IReadOnlyModel, position: Position, token: CancellationToken): DocumentHighlight[] | Thenable<DocumentHighlight[]>;
    }

thus looks like a bug in the protocol

@smarr
Copy link

@smarr smarr commented Jul 7, 2016

@cdietrich thanks, I'll open a separate issue.

@svenefftinge
Copy link
Contributor

@svenefftinge svenefftinge commented Nov 15, 2016

Please find a proposal in PR #124

@jpike88
Copy link

@jpike88 jpike88 commented May 11, 2018

I’m guessing this is dead... anyone got any updates?

@svenefftinge
Copy link
Contributor

@svenefftinge svenefftinge commented May 11, 2018

I am working on a new proposal for semantic coloring.

@axelson
Copy link
Contributor

@axelson axelson commented Jun 22, 2020

As you can see with the new proposed protocol there is now a format and capabilities to indicate what requests the client will sent.

Is there a link where the updated protocol can be reviewed? Is it still primarily in the typescript server?

@dbaeumer
Copy link
Member

@dbaeumer dbaeumer commented Jun 22, 2020

@axelson the proposed version is here in terms of implementation. I am in the process of writing the markdown. https://github.com/microsoft/vscode-languageserver-node/blob/master/protocol/src/common/protocol.semanticTokens.proposed.ts#L1

@dbaeumer
Copy link
Member

@dbaeumer dbaeumer commented Jun 23, 2020

And here is a first version of the spec. No word polish and no spell check :-)

https://microsoft.github.io/language-server-protocol/specifications/specification-3-16/#textDocument_semanticTokens

@kjeremy
Copy link
Contributor

@kjeremy kjeremy commented Jun 23, 2020

@dbaeumer When will we need to move our LSP server implementation over to this one from the current semantic tokens implementation supported in vscode?

@dbaeumer
Copy link
Member

@dbaeumer dbaeumer commented Jun 24, 2020

@kjeremy Are you relying on the implementation in the next version of the LSP libs ?

@kjeremy
Copy link
Contributor

@kjeremy kjeremy commented Jun 24, 2020

@dbaeumer Our server-side implementation is in rust and based on https://github.com/gluon-lang/lsp-types which will need to be updated. The client-side opts in via https://github.com/rust-analyzer/rust-analyzer/blob/master/editors/code/src/client.ts#L151

@ghost
Copy link

@ghost ghost commented Jun 24, 2020

@dbaeumer I really think a remark that clients are expected to cache locally with ranges would be quite important unless you want to make deltas de-facto required for servers. Right now, I can't see such a remark, it should probably go into "Implementation considerations." (Unless you don't think it should be optional @ deltas, but you did sound like you wanted to keep it in but as optional.)

If you don't write that in, some clients will just not do such caching and then servers will be required to implement deltas to guarantee good performance or users will blame it on them, making deltas basically a must.

Edit: also, right now the general concepts part is mixed with the pretty specific integer encoding in one block, I find that this is suboptimal for readability.

Suggestion in detail to split up text:

I think the General Concepts section would benefit from the integer encoding part split into a separate section placed separately. In detail, I am suggesting to split at the following start/end points:

Split start at:

On the capability level types and modifiers are defined using strings. However the real encoding happens using numbers to save bandwidth. [SPLIT HERE]

I would start to cut out parts for a new section after this, named Integer Encoding for Tokens, which contains the text up to BEFORE this paragraph:

Split end at:

[SPLIT HERE] The protocol defines an additional token format capability to allow future extensions of the format. The only format that is currently specified is relative expressing that the tokens are described using relative positions.

So the new General Concepts starting section would go like:

[... General Concepts as before ...]
On the capability level types and modifiers are defined using strings. However the real encoding happens using numbers to save bandwidth, see Integer Encoding (link to section).
The protocol also defines an additional token format capability to allow future extensions of the format. The only format that is currently specified is relative expressing that the tokens are described using relative positions.
[... General Concepts ending parts as before...]

... with the Integer Encoding for Tokens section separated out to below based on above cut out starting with:

The server therefore needs to let the client know which numbers it is using for which types and modifiers. They do so using a legend, which is defined as follows: [...remaining Integer Encoding concepts up to split end...]

This new section could be additionally prefixed with an introducing sentence like: This section describes the integer encoding used for data transfer.

@kjeremy
Copy link
Contributor

@kjeremy kjeremy commented Jul 24, 2020

@dbaeumer

I'm trying to figure out the delta behavior and have some questions:

  1. Does the server fill in the resultId for the range request? That seems odd to me (I'm not sure what I would do with that information).

  2. For the delta I am holding onto all the tokens for the file and computing a diff against that and returning it like how SemanticTokensBuilder works. Does the resultId returned from the delta request represent the state of ALL the tokens in the file?

I guess ultimately I'm trying to figure out what I need to hold onto to compute the deltas. I am assuming that each deltas really asks for the delta between "now" and the previous delta or full request. It does not appear to be spec'd that way however and it could be that a client asks for a delta between "now" and many revisions ago.

@puremourning
Copy link

@puremourning puremourning commented Aug 8, 2020

@dbaeumer i'm reading through the initial draft spec (thanks for writing it up!). I have a handful of comments/questions/clarifications. What's the best way to provide feedback? would you like comments on a PR or something like that?

@dbaeumer
Copy link
Member

@dbaeumer dbaeumer commented Aug 19, 2020

@puremourning I think the comments are best provided as a PR.

@dbaeumer
Copy link
Member

@dbaeumer dbaeumer commented Aug 19, 2020

@kjeremy you are correct a delta for range makes no sense. I am pretty sure VS Code will basically drop a previous result ID.

For the delta is should be always reported against the last result independent whether this was a full or a delta response. So in an easy implementation the id is simply incremented. I have clarified this in the spec.

@dbaeumer
Copy link
Member

@dbaeumer dbaeumer commented Aug 19, 2020

@etc0de thanks for the suggestions. I made them in the 3.16 version of the spec.

@MarFren
Copy link

@MarFren MarFren commented Aug 27, 2020

Regarding #18 (comment)

The idea is that the client pulls for semantic tokens for all visible files. The one not having the focus should be pulled with less frequency. Assuming that the server implements the delta mechanism a pull for a file that has no semantic color changes will not produce any additional payload. This model was chosen to not sync any UI state. Otherwise the server might do an expensive impact analysis about which files need to refresh semantic tokens although non of them are visible in the UI.

If I understand it correctly, the basic idea is to let the client pull semantic tokens for all open files on a regular basis, the one having the focus with high frequency, the other files with low frequency (but automatically). That would allow for updating the semantic highlighting in all open files, even if file A has the focus and a change here does induce a change in highlighting in another file B - am I correct? If so, the current reference implementation in VSCode Insiders (1,49.0-insiders) does not implement automated pulls of semantic tokens of open files, right? Wouldn't it be good to describe the expected pull behavior in the 3.16 specs of LSP?

@kaby76
Copy link
Contributor

@kaby76 kaby76 commented Sep 1, 2020

Now that my LSP server is more stable, I decided to update the extension for VSCode and give the semantic highlighting feature a spin. On the server, I decided to first try out SemanticTokensOptions { range = false, full = true}. However, I see the following:

[Error - 8:40:58 PM] Request textDocument/semanticTokens failed.
  Message: No method by the name 'textDocument/semanticTokens' is found.
  Code: -32601 

Well, yeah, that's right. There is no "textDocument/semanticTokens" message because it only mentions in the spec "textDocument/semanticTokens/range", "textDocument/semanticTokens/full", and "textDocument/semanticTokens/full/delta", but no "textDocument/semanticTokens". Is this spec up to date? I can debug VSCode, figure out what is it expecting, and program to the implementation, but it would be nice to know what's going on.

@DanTup
Copy link
Contributor

@DanTup DanTup commented Sep 1, 2020

@kaby76 I think the request is constructed by the vscode-languageclient package so the issue might be that the extension is not using the latest version of that package (7.0.0-next.9 is the one I've been using to test in the Dart extension and it seems to work well - though I've only implemented /full so far).

@dbaeumer
Copy link
Member

@dbaeumer dbaeumer commented Sep 1, 2020

@MarFren adding this as a recommendation to the spec makes sense. A PR is welcome. However the LSP spec never enforces this. If a client decides to not do this it should still be fine.

@DanTup
Copy link
Contributor

@DanTup DanTup commented Sep 1, 2020

@dbaeumer in the spec for semanticToken/range it says the result is:

result: SemanticTokens | null where SemanticTokensDelta

It seems like the line is incomplete (or the last part shouldn't be there).

If the return value is SemanticTokens, is the lineDelta from the range that was requested, or from the start of the document? (I couldn't find this mentioned anywhere).

@kaby76
Copy link
Contributor

@kaby76 kaby76 commented Sep 2, 2020

@DanTup That worked. I had to also set up a few other dependencies

    "vscode-jsonrpc": "^6.0.0-next.5",
    "vscode-languageclient": "^7.0.0-next.9",
    "vscode-languageserver": "^7.0.0-next.7",
    "vscode-languageserver-protocol": "^3.16.0-next.7",
    "vscode-languageserver-types": "^3.16.0-next.3"

use the "Insiders" VSCode, and change an import because LanguageClient was moved around:

import * as vscodelc from 'vscode-languageclient/node';

But, "textDocument/semanticTokens/full" starting to work. Computing the start line/col deltas a little challenging. And VSCode does not work the same as the LSP client in VS2019 with edits.

@DanTup
Copy link
Contributor

@DanTup DanTup commented Sep 2, 2020

I had to make the /node import changes, though I don't think I needed to use insiders (nor add the other dependencies).

Computing the start line/col deltas a little challenging

I've been refactoring my server work to collect the tokens using absolute data initially (line/cols, enum types) and then at the end do the conversion to the LSP format. This made it much simpler than when I was also doing things like splitting up multiline/nested tokens at the same time. The final conversion is now relatively simple:

var lastLine = 0;
var lastColumn = 0;

_tokens.sort(
  (t1, t2) => t1.line == t2.line
      ? t1.column.compareTo(t2.column)
      : t1.line.compareTo(t2.line),
);

for (final token in _tokens) {
  var relativeLine = token.line - lastLine;
  var relativeColumn = relativeLine == 0
      ? token.column - lastColumn
      : token.column;

  encodedTokens.addAll([
    relativeLine,
    relativeColumn,
    token.length,
    semanticTokenLegend.indexForType(token.type),
    semanticTokenLegend.bitmaskForModifiers(token.modifiers) ?? 0
  ]);

  lastLine = token.line;
  lastColumn = token.column;
}

@kaby76
Copy link
Contributor

@kaby76 kaby76 commented Sep 2, 2020

@DanTup Yes, my code looks more or less just like your code after I noticed initially that only the first symbol was being colored. I tried out some hardwired values, then understood what had to be done (i.e., sort + compute diffs).

@dbaeumer
Copy link
Member

@dbaeumer dbaeumer commented Nov 3, 2020

I will close the issue since SC is now part of the upcoming 3.16 spec.

@dbaeumer dbaeumer closed this Nov 3, 2020
@vscodebot vscodebot bot locked and limited conversation to collaborators Dec 18, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests