Improve LSP semantic tokens #4270

stefanvanburen · 2026-01-09T21:46:04Z

The existing semantic tokens implementation had a couple of issues. For example, all of `message MyMessage` was treated as a single semantic token of type "struct", when only `MyMessage` should be a struct and `message` should be a keyword.

Iterating on that, I've added some more test cases and handling for the various protobuf types. To handle things that aren't directly available in the IR for the file's symbols, I reworked the main loop to collect all of the tokens up front, and then sort and encode them once we've grabbed everything.
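For readers unfamiliar with the wire format, here is a minimal sketch of the "collect first, then sort and encode" shape. It is not the code in this PR: `semanticToken` and `encodeTokens` are hypothetical names, and the token type indices are placeholders. The only part taken from the LSP specification is the encoding itself, which is five integers per token (delta line, delta start character, length, token type, token modifiers), relative to the previous token.

```go
package main

import (
	"fmt"
	"sort"
)

// semanticToken is a hypothetical absolute-position token; the struct used
// in the PR may look different.
type semanticToken struct {
	line, startChar, length uint32
	tokenType, modifiers    uint32
}

// encodeTokens sorts a copy of the tokens by position and emits the LSP
// relative encoding: deltaLine, deltaStartChar, length, type, modifiers.
func encodeTokens(tokens []semanticToken) []uint32 {
	sorted := append([]semanticToken(nil), tokens...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].line != sorted[j].line {
			return sorted[i].line < sorted[j].line
		}
		return sorted[i].startChar < sorted[j].startChar
	})
	encoded := make([]uint32, 0, len(sorted)*5)
	var prevLine, prevChar uint32
	for _, tok := range sorted {
		deltaLine := tok.line - prevLine
		deltaChar := tok.startChar
		if deltaLine == 0 {
			// Same line as the previous token: the column is relative to it.
			deltaChar = tok.startChar - prevChar
		}
		encoded = append(encoded, deltaLine, deltaChar, tok.length, tok.tokenType, tok.modifiers)
		prevLine, prevChar = tok.line, tok.startChar
	}
	return encoded
}

func main() {
	// `message MyMessage` on line 0 (keyword at column 0, name at column 8)
	// plus a comment on line 2, collected out of order on purpose.
	// Type indices 0, 1, 2 are placeholders for keyword, struct, comment.
	tokens := []semanticToken{
		{line: 2, startChar: 2, length: 10, tokenType: 2},
		{line: 0, startChar: 8, length: 9, tokenType: 1},
		{line: 0, startChar: 0, length: 7, tokenType: 0},
	}
	fmt.Println(encodeTokens(tokens))
	// prints [0 0 7 0 0 0 8 9 1 0 2 2 10 2 0]
}
```

Because sorting happens once at the end, the collection passes can run in any order (symbols from the IR, then comments and keywords from the token stream) without worrying about keeping deltas valid.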

I'm fairly certain this can be improved more, but think it's already a decent improvement on what we have, and lays the groundwork for future tweaks with additional testing.

For testing this, I highly recommend checking out this PR, running `make installbuf`, and then, in neovim, using the `:Inspect` command on different tokens. It should show a "Semantic Tokens" section that contains the semantic token under the cursor.


github-actions · 2026-01-09T21:46:16Z

The latest Buf updates on your PR. Results from workflow Buf CI / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| `✅ passed` | `✅ passed` | `✅ passed` | `✅ passed` | Jan 12, 2026, 5:47 PM |

stefanvanburen · 2026-01-09T21:46:42Z

private/buf/buflsp/semantic_tokens.go

the impl in server.go was getting unwieldy, and the constants only pertained to semantic tokens, so I figured a separate file was reasonable now.

stefanvanburen · 2026-01-09T21:48:25Z

private/buf/buflsp/semantic_tokens.go

+	// Collect all comments and certain keywords that can't be fetched in the IR from token stream
+	for tok := range astFile.Stream().All() {
+		if tok.Kind() == token.Comment {
+			collectToken(tok.Span(), semanticTypeComment, 0, keyword.Unknown)
+		}
+		kw := tok.Keyword()
+		switch kw {
+		// These keywords seemingly are not easy to reach via the IR.
+		case keyword.Option, keyword.Reserved, keyword.To, keyword.Returns:
+			collectToken(tok.Span(), semanticTypeKeyword, 0, kw)
+		}
+	}


Originally I had this grabbing all the keywords, but that's too greedy: syntax operators like `=`, `;`, `{` and `}` are all considered keywords in the taxa, and it makes the highlighting too confusing (everything becomes bolded as a keyword).

Instead, I tried to be more specific and go through the individual elements below. Again, this could probably be improved; I'm doubtless missing elements that we could probably add.
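To illustrate the difference between the greedy version and the allowlist, here is a rough sketch using stand-in types. `rawToken`, `streamKeywords`, and `keywordsFromStream` are hypothetical and only mimic the shape of the loop above; the real code switches on `keyword.Option`, `keyword.Reserved`, `keyword.To`, and `keyword.Returns` from the token stream.

```go
package main

import "fmt"

// rawToken is a hypothetical stand-in for a lexer token. isKeyword is true
// for anything the lexer's keyword table recognizes, punctuation included.
type rawToken struct {
	text      string
	isKeyword bool
}

// streamKeywords is the allowlist: only these are highlighted as keywords
// when found in the raw token stream.
var streamKeywords = map[string]bool{
	"option":   true,
	"reserved": true,
	"to":       true,
	"returns":  true,
}

// keywordsFromStream returns the tokens that should get a keyword highlight.
// Checking isKeyword alone would also pick up ";", "=", "{" and "}".
func keywordsFromStream(stream []rawToken) []string {
	var out []string
	for _, tok := range stream {
		if tok.isKeyword && streamKeywords[tok.text] {
			out = append(out, tok.text)
		}
	}
	return out
}

func main() {
	// Tokens for: reserved 1 to 5;
	stream := []rawToken{
		{"reserved", true},
		{"1", false},
		{"to", true},
		{"5", false},
		{";", true},
	}
	fmt.Println(keywordsFromStream(stream))
	// prints [reserved to]
}
```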

stefanvanburen · 2026-01-09T21:49:44Z

private/buf/buflsp/semantic_tokens_test.go

+				// Invalid import should not have string token (only resolved imports get string tokens)
+				{6, 12, 13, semanticTypeString, "invalid import path should not have string token"},


There's a test-case with an import to a file that doesn't exist, so the IR doesn't have it; seems fine to me — the alternative is uglier.

stefanvanburen · 2026-01-09T21:50:12Z

private/buf/buflsp/semantic_tokens_test.go

+				{0, 7, 1, semanticTypeKeyword, "'=' should not be keyword"},
+				{0, 17, 1, semanticTypeKeyword, "';' should not be keyword"},
+				{10, 20, 1, semanticTypeKeyword, "'=' should not be keyword"},
+				{10, 26, 1, semanticTypeKeyword, "';' should not be keyword"},
+				{13, 14, 1, semanticTypeKeyword, "'=' should not be keyword"},
+				{13, 17, 1, semanticTypeKeyword, "';' should not be keyword"},


I don't think any of these "operators"(?) ought to be considered keywords; just leaving these negative tests around.

I think it would be too noisy to treat the operators in proto as keyword tokens, since it's mostly `=` for alignment and `;` for breaking decl statements. I like this as-is.
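For anyone reading the negative tests without the harness in front of them, this is roughly how such a check can work: decode the relative encoding back to absolute positions, then assert that no token matches the forbidden (line, column, length, type) tuple. The names `absToken`, `decodeTokens`, and `hasToken` are hypothetical, the type indices are placeholders, and the assumption that `syntax = "proto3";` yields exactly a keyword token and a string token is only for illustration.

```go
package main

import "fmt"

// Placeholder token type indices; the real legend order may differ.
const (
	typeKeyword uint32 = iota
	typeString
)

// absToken is a decoded token with absolute line and column.
type absToken struct {
	line, startChar, length, tokenType uint32
}

// decodeTokens reverses the LSP relative encoding (five integers per token).
func decodeTokens(data []uint32) []absToken {
	var out []absToken
	var line, char uint32
	for i := 0; i+4 < len(data); i += 5 {
		if data[i] != 0 {
			// New line: the column is absolute again.
			line += data[i]
			char = data[i+1]
		} else {
			char += data[i+1]
		}
		out = append(out, absToken{line, char, data[i+2], data[i+3]})
	}
	return out
}

// hasToken reports whether an identical token is present.
func hasToken(tokens []absToken, want absToken) bool {
	for _, tok := range tokens {
		if tok == want {
			return true
		}
	}
	return false
}

func main() {
	// Assumed encoding for: syntax = "proto3";
	// keyword `syntax` at column 0, string `"proto3"` at column 9.
	data := []uint32{0, 0, 6, typeKeyword, 0, 0, 9, 8, typeString, 0}
	tokens := decodeTokens(data)

	// Negative checks matching {0, 7, 1, ...} and {0, 17, 1, ...} above.
	fmt.Println(hasToken(tokens, absToken{0, 7, 1, typeKeyword}))  // '=' → false
	fmt.Println(hasToken(tokens, absToken{0, 17, 1, typeKeyword})) // ';' → false
	fmt.Println(hasToken(tokens, absToken{0, 0, 6, typeKeyword}))  // 'syntax' → true
}
```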

stefanvanburen · 2026-01-09T21:53:48Z

private/buf/buflsp/semantic_tokens_test.go

+				// decorator (option)
+				{10, 9, 10, semanticTypeDecorator, "'deprecated' as decorator"},
+				// built-in type
+				{13, 2, 6, semanticTypeProperty, "'string' as property"},


Need to revisit this; I'm not sure that in `string name = 1;`, the `string` is a "property" (`name` is the property); maybe it's more of a type or a keyword?

https://microsoft.github.io/language-server-protocol/specifications/lsp/3.18/specification/#textDocument_semanticTokens

I like the idea of having built-in scalars be distinct from types declared in protos, but they aren't exactly keywords either (e.g. I think `bool` is a type, and `true` and `false` are keywords). Since there isn't really a way to differentiate the types, maybe the scalar types should just be `type`, based on the spec.

Yeah, I think `type` is reasonable; I wish there was a `scalarType` or something in the spec, oh well! Done in 66a7a88.


stefanvanburen requested review from doriable and emcfarlane January 9, 2026 21:46


stefanvanburen added 3 commits January 12, 2026 12:39

Use semanticTypeType for built-in types

66a7a88

Merge branch 'main' into svanburen/semantic-tokens-improvement

7f4971f

Add CHANGELOG entry

1b48e34

Included the "syntax highlighting" bit because some users may not know what semantic tokens actually does from a feature perspective.

doriable approved these changes Jan 12, 2026


stefanvanburen merged commit c7390bb into main Jan 12, 2026
10 checks passed

stefanvanburen deleted the svanburen/semantic-tokens-improvement branch January 12, 2026 17:57
