Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarification for the Indexing workflow #54

Closed
kaloyan-raev opened this issue Aug 23, 2016 · 17 comments
Closed

Clarification for the Indexing workflow #54

kaloyan-raev opened this issue Aug 23, 2016 · 17 comments

Comments

@kaloyan-raev
Copy link

kaloyan-raev commented Aug 23, 2016

Many languages require building an index database for an efficient calculation of code completion, validation, highlighting, etc.

The language server protocol currently does not describe how the communication between the client and the server should happen. Thus, every server-client pair take their own approach. For example the Crane PHP language server uses custom message requests and notifications, while the Eclipse Java language server seems to trigger the indexing as part of the initialize request.

Please, consider specifying:

  • How the client request the server to (re-)build the index.
  • How the client tells the server which files are included and excluded for indexing.
  • How the server reports progress to the client.
  • How the server notifies the client that the index is complete.
  • How the server notifies the client for any errors.
@felixfbecker
Copy link

Isn't this covered by didChange, didChangeWatchedFiles etc? Should the client really care about how the server implements the index?

@kaloyan-raev
Copy link
Author

I guess some existing part of the protocol can be used.

Let's take as example the question "When the language server should start indexing the project?". A possible answer is "The indexing should be triggered when the client sends the initialize request.". So far so good. It still needs to be documented, so all clients and servers know this is the way to do it.

But, what if the client needs to send the server some configuration related to the indexing. For example, some clients may want to include/exclude some folders from the indexer. The didChangeConfiguration notification goes after the initialize request, which is late (the indexing is already going). VSCode has an additional initializationOptions as part of InitializeParams, but it is not documented in the protocol. See #55.

How should the settings related to the indexing should look like? There should be some JSON object defined.

Then how the server should report the indexing progress to the client? Via logMessage or showMessage notifications? Or something else?

How should the client request a full index rebuild? I don't find an appropriate existing request/notification in the protocol.

@felixfbecker
Copy link

@kaloyan-raev I'm not so sure if the server should really try to impose one model here that will fit all language servers. As long as the protocol provides all the tools I prefer to leave the server the freedom to handle this. For example, a server may not even need to index the whole project at startup. It could just do it lazily (that's what I wanted to do with the PHP language server). When a file is opened, it is parsed. When a provideCompletion is triggered at a specific cursor position, it would resolve the type before the cursor, which might be in another file. It would resolve the file and parse it aswell, adding it to what you refer to as the "index", or simply an AST cache. On a didChange request, the AST would also get re-parsed. There would be no need to exclude files because it would only ever parse files which are actually interesting. Maybe I'm missing something, but is it really needed to "index" the whole project at startup?

@kaloyan-raev
Copy link
Author

@felixfbecker what you describe may work in a local context, e.g. when you have an object already typed in and the server has to return a proposal list with the object's properties and methods.

But what about the global context? Imagine you start typing a function name (or class/interface/trait/namespace name). After entering the first 1-2-3 characters you invoke the code completion. In this case the server has to return a proposal list with all global functions, variables, constants, classes, interfaces, namespaces, etc. Probably, filtered with what you have already typed in the editor.

How do you build this list without parsing all the PHP files in the project, including the composer dependencies?

@akosyakov
Copy link
Contributor

@kaloyan-raev Could not you do full indexing on initialize request and update it on didChangeWatchedFiles requests? I would prefer to keep a protocol as simple as possible without exposing implementation details of a particular language servers, as indexing.

But documentation can be indeed better in regard to expectations after sending one or another request.

BTW I think you always can add a custom UI action via your language VS Code extension to send full build request.

@kaloyan-raev
Copy link
Author

@akosyakov Yes, this would be possible with the limitation of issue #55 I mentioned above.

The point of opening this issue was to clarify the indexing workflow. If everything can be achieved with the existing protocol capabilities then this is perfect. But it still needs to be documented, because I already see that the few language servers available take completely different approaches. See my initial comment for details.

If many languages require indexing for providing good code completion then we should have the indexing described as part of the protocol to avoid the unnecessary different custom implementations in language servers. This would make it easier to adopt language servers in different IDE, which is the whole point of having this protocol.

@akosyakov
Copy link
Contributor

@kaloyan-raev What's so bad about that different servers take different implementation approaches? If at the end they confirm to the protocol it is OK. It's even good to have such flexibility on the server side and keep IDE (tool) side simple.

I agree that documentation should be improved but not in terms of implementation details, e.g. a server should index on a change file notification, but in terms of client's assumptions, e.g. if a client send a file change notification then the client expects to get diagnostics notifications for files affected by this change, basically in terms of the protocol itself. How the server computes affected files and validates them is not really matter from the client side. It can start indexing during initialization, do it on demand first time or maybe it even does not need an index at all.

@kaloyan-raev
Copy link
Author

@akosyakov It's bad. To the point that it makes the idea of language server protocol useless.

Let me give an example.

We have two different language server implementations (A and B) for the same programming language. Both implement indexing, but using different protocol with the client:

  • A starts the indexer on the initialize request and reports progress to the client via a custom indexing/progress notification.
  • B waits for a custom indexing/start request from the client and reports progress back via the standard window/showMessage notification.

Now, let's have a tool (e.g. VS Code) that uses implementation A. If the tool switches to implementation B then indexing will stop working, because the client won't send the custom indexing/start notification. A change is required in the client to send this message. And another change to display the progress reporting properly.

If both implementations used a common way to implement the client/server communication regarding indexing, then no change on the client would be required. Which is the great flexibility that the language server protocol gives.

Any custom message on top of the protocol introduces tight coupling between the client and the server. Which makes it difficult to reuse language servers across IDEs.

Let's look at the "Indexing" topic as a use case of the protocol. We should have a clear definition how this use case should utilize the protocol and identify any gaps (like #55). Then the more language servers (which implement indexing) and IDE tools follow it, the more reusable they will be.

@felixfbecker
Copy link

@kaloyan-raev you are assuming that custom messages are used, I assumed the opposite. No language server I know takes such a long time to index that a progress would be rectified, and not IDE implementation I know displays some kind of progress that comes from a custom message. So what you are asking for is actually a way for a server to show progress on indexing, which is a feature request for an extension of the protocol. As long as the protocol does not define any message, I would just let the indexing happen in the background without any progress reports. Do you have any data that backs the need for a progress event?

@kaloyan-raev
Copy link
Author

No language server I know takes such a long time to index that a progress would be rectified, and not IDE implementation I know displays some kind of progress that comes from a custom message.

Take a look again at the HvyIndustries/crane language server.

I tried to refactor (PR here) the client/server communication as much as possible, but still needed to use a handful of custom messages not defined by the protocol.

Do you have any data that backs the need for a progress event?

Eclipse PDT, Zend Studio, PhpStorm and VSCode Crane Extension clearly display the progress of the indexing to the user. I haven't checked other IDEs.

Having an index and displaying the progress of building it comes from long years of experience working with projects of average and big sizes. The index provides faster IntelliSense. However, building the index is time consuming (may take several minutes on big projects). Thus users are confused if IntelliSense does not work as expected shortly after opening the project. Displaying the indexing progress resolves this usability issue.

@felixfbecker
Copy link

@kaloyan-raev You seem to have more experience with this than me. I only know that the TypeScript language server for example takes a few seconds until ready. When you hover over a symbol in the meantime, it will display Loading... in the hover for a few seconds, then replace it with the actual result. I never found that experience confusing.

I would welcome any contribution from you to php-language-server on how you would implement indexing.

@akosyakov
Copy link
Contributor

akosyakov commented Sep 13, 2016

@kaloyan-raev it's bad when one customizes the protocol and assumes that everybody should confirm to his customization. It makes clients and implementations of such customized protocol useless.

If A used standard window/showMessage and B used initialize there is no problem here. Both of them make the same mistake instead of understanding and leveraging the protocol first, they changed it from the start.

I am not opposed to the idea of extending the protocol, some languages and front-ends can provide more capabilities, e.g. you are right some IDEs, like IntelliJ and Eclipse, can report about progress. But as a server implementor if i make a minimal usable version for the protocol as it is, i will cover more clients, after that i can introduce customization for clients which can handle them. The same true for a tool implementor. I do think both A and B should reconsider their approaches and first make sure that their work without customizations.

I have 2 questions on my mind regarding extensibility:

  • Which capabilities should be common and defined in the protocol by default and which should be provided as extensions to it? Currently the answer is what VS Code can handle. The better is information hiding that basically says to make something stable don't expose changeable in it. For instance, you say that there are should be API to notify about progress and manage the index, because some IDEs have it, but there are examples which can work without such API, e.g. TypeScript, JSON plugins, Xtext Language Server, so it is a subject to change and to keep the protocol stable, easy to understand and make minimal use of it, i would prefer don't have such APIs exposed by default.
  • How the protocol should be customized/extended? It should not be done ad hoc but it should be possible, so i would imagine that:
    • the protocol should document typical extension points, e.g. ClientCapabilities can be extended to inform about additional capabilites, as progress notifications
    • different language dependent SDKs should allow such extension points and provide a way to define extensions
    • if a server/client implementor makes use of them he should keep in mind that a minimal version should work without them also, to not end up in the situation as A and B did

@dbaeumer
Copy link
Member

Long and great discussion! Hard to answer all aspects of the discussion so I will focus on the index case. My thinking about indexing is as follows:

  • the protocol should not define how indexing works. This will be very hard. I worked on integrating the tsserver, OmniSharp and Java Language Server into VSCode and it is hard to standardize that. For example the tsserver reads all files into memory which doesn't need a separate index whereas Java builds one. Keep this 'open' adds flexibility.
  • I fully agree we should have a document describing typically implementation patterns (e.g. How do you best do indexing in a language server and when).
  • from all the discussion we had lately with server implementers I am all in for adding a 'progress notification' to the protocol. The open question here is whether progress should be per request or whether we have a global progress notification. I am actually leaning towards adding both. Thoughts?
  • fully agree that the client capabilities should always be optional. If a client capability is mandatory we need to add it to the protocol itself.

@dbaeumer dbaeumer modified the milestone: 3.0 Sep 19, 2016
@dbaeumer
Copy link
Member

Opened #70

@mbana
Copy link

mbana commented Jan 24, 2017

hope i'm not too late to this discussion :)

Merkle tree also used in the BitTorrent protocol.
i think if this is used in a smart way might lead to a lot of performance improvements.
it does fit nicely with the files in an IDE model as well, i.e., a tree-like structure.

addressing the original questions:

"How the client request the server to (re-)build the index."

Then, the received hash tree is checked against the trusted top hash, and if the hash tree is damaged or fake, another hash tree from another source will be tried until the program finds one that matches the top hash.

refetch when the top-hash changes, or do as detailed in the algorithm for a more efficient refetch of the tree.

"How the client tells the server which files are included and excluded for indexing."

not sure i fully understand this, but i think using a hash that is derived from the contents of the file, say, will be an indicator of the files open.

not sure if i missed the discussion about this but has there been any mention a different way of doing a textDocument/didOpen? for instance, instead of sending the contents of the file upon a subsequent init, send a hash of the contents along with the file name - or whatever fits the bill. the language server will probably need to indicate what hashing algorithm it's using for the client uses that as well.

not sure if different files that have the same contents, i.e., hash to the same value, should be delt with. e.g., a duplicate file somewhere else in the tree.

how crazy does this all sound to everyone?

@ljw1004
Copy link
Contributor

ljw1004 commented Apr 6, 2017

Just to add another datapoint - for our languages Flow and Hack, and our workloads of many thousands of files in a project, we can't afford to do indexing on demand. Instead we spin up a singleton "language service" on the machine that keeps an index always up to date. Our LSP server merely connects to the existing language service for a given project.

(If the LSP server is launched but there doesn't yet exist a currently-running machine-wide language service for that project, then the LSP server will cause one to spin up. That takes 1-20 minutes depending on project size. We certainly do need some kind of progress report in the protocol!)

@dbaeumer
Copy link
Member

Version 3.15 will have support for progress reporting. I will close the issue as out of scope.

@vscodebot vscodebot bot locked and limited conversation to collaborators Dec 16, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants