Skip to content

Conversation

@stefanvanburen
Copy link
Member

The existing semantic tokens implementation had a couple of issues - for example, message MyMessage was all considered a semantic token of type "struct", when only "MyMessage" should be considered a struct and "message" should be a keyword.

Iterating on that, I've added some more testcases and handling for the various protobuf types. To handle things that aren't directly in the file's symbol's IR, I reworked the main loop to just collect the tokens up front, and then figure out sorting and encoding once we've grabbed all of the tokens.


I'm fairly certain this can be improved more, but think it's already a decent improvement on what we have, and lays the groundwork for future tweaks with additional testing.

For testing this, I highly recommend checking out this PR, then running make installbuf, and in neovim, using the :Inspect command on different tokens under the cursor, which should show a "Semantic Tokens" section that contains the semantic token under the cursor.

The existing semantic tokens implementation had a couple of issues - for
example, `message MyMessage` was all considered a semantic token of type
"struct", when only "MyMessage" should be considered a struct and
"message" should be a keyword.

Iterating on that, I've added some more testcases and handling for the
various protobuf types. To handle things that aren't directly in the
file's symbol's IR, I reworked the main loop to just collect the tokens
up front, and then figure out sorting and encoding once we've grabbed
all of the tokens.

---

I'm fairly certain this can be improved more, but think it's already
a decent improvement on what we have, and lays the groundwork for future
tweaks with additional testing.

For testing this, I highly recommend checking out this PR, then running
`make installbuf`, and in neovim, using the `:Inspect` command on
different tokens under the cursor, which should show a "Semantic Tokens"
section that contains the semantic token under the cursor.
@github-actions
Copy link
Contributor

github-actions bot commented Jan 9, 2026

The latest Buf updates on your PR. Results from workflow Buf CI / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedJan 12, 2026, 5:47 PM

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the impl in server.go was getting unwieldy, and the constants only pertained to semantic tokens, so I figured a separate file was reasonable now.

Comment on lines +126 to +137
// Collect all comments and certain keywords that can't be fetched in the IR from token stream
for tok := range astFile.Stream().All() {
if tok.Kind() == token.Comment {
collectToken(tok.Span(), semanticTypeComment, 0, keyword.Unknown)
}
kw := tok.Keyword()
switch kw {
// These keywords seemingly are not easy to reach via the IR.
case keyword.Option, keyword.Reserved, keyword.To, keyword.Returns:
collectToken(tok.Span(), semanticTypeKeyword, 0, kw)
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

originally I had this grabbing all the keywords, but it's too greedy: syntax operators like =, ;, { and } and all considered keywords in the taxa, and it makes the highlighting too confusing (everything becomes bolded as a keyword).

Instead, I tried to be more specific and go through the individual elements below. Again, this could probably be improved; I'm doubtless missing elements that we could probably add.

Comment on lines +173 to +174
// Invalid import should not have string token (only resolved imports get string tokens)
{6, 12, 13, semanticTypeString, "invalid import path should not have string token"},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a test-case with an import to a file that doesn't exist, so the IR doesn't have it; seems fine to me — the alternative is uglier.

Comment on lines +167 to +172
{0, 7, 1, semanticTypeKeyword, "'=' should not be keyword"},
{0, 17, 1, semanticTypeKeyword, "';' should not be keyword"},
{10, 20, 1, semanticTypeKeyword, "'=' should not be keyword"},
{10, 26, 1, semanticTypeKeyword, "';' should not be keyword"},
{13, 14, 1, semanticTypeKeyword, "'=' should not be keyword"},
{13, 17, 1, semanticTypeKeyword, "';' should not be keyword"},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any of these "operators"(?) ought to be considered keywords; just leaving these negative tests around.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be too noisy to consider the operators in proto for keyword tokens, since it's mostly = for alignment and ; for breaking decl statements. I like this as-is.

// decorator (option)
{10, 9, 10, semanticTypeDecorator, "'deprecated' as decorator"},
// built-in type
{13, 2, 6, semanticTypeProperty, "'string' as property"},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to revisit this; not sure that in string name = 1;, the string is a "property" (name is the property); maybe more just a type or keyword?:

https://microsoft.github.io/language-server-protocol/specifications/lsp/3.18/specification/#textDocument_semanticTokens

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea of having built-in scalars being distinct from types declared for protos, but also, they aren't exactly keywords (e.g. I think bool is a type, and true and false are keywords). Since there isn't really a way to differentiate the types, maybe the scalar types should just be type... based on the spec.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think type is reasonable; I wish there was a scalarType or something in the spec - oh well!: 66a7a88

Included the "syntax highlighting" bit because some users may not know
what semantic tokens actually does from a feature perspective.
@stefanvanburen stefanvanburen merged commit c7390bb into main Jan 12, 2026
10 checks passed
@stefanvanburen stefanvanburen deleted the svanburen/semantic-tokens-improvement branch January 12, 2026 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants