-
Notifications
You must be signed in to change notification settings - Fork 8
Process requests in parallel #25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is an attempt to fix clojure-lsp/clojure-lsp#1240, where the failure to run requests in parallel leads to small freezes while typing in Emacs. In other editors it manifests differently. In Calva, we've noticed that diagnostics and semantic tokens are delivered more slowly. This commit changes lsp4clj to allow language servers to process and respond to up to 4 requests in parallel. This is a departure from the [spec](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#messageOrdering), which says "Responses to requests should be sent in roughly the same order as the requests appear". However, it is closer to what lsp4j and cloure-lsp were doing before the migration away from lsp4j. In the original implementation, lsp4j [consumed](https://github.com/eclipse/lsp4j/blob/04b0c6112f0a94140e22b8b15bb5a90d5a0ed851/org.eclipse.lsp4j.jsonrpc/src/main/java/org/eclipse/lsp4j/jsonrpc/RemoteEndpoint.java#L185-L196) messages one at a time, handing them off to the language server. The language server processed each message serially. lsp4clj fully [processed each notification](https://github.com/eclipse/lsp4j/blob/04b0c6112f0a94140e22b8b15bb5a90d5a0ed851/org.eclipse.lsp4j.jsonrpc/src/main/java/org/eclipse/lsp4j/jsonrpc/RemoteEndpoint.java#L220) before it moved on to the next message. But requests were a little different. Like notifications, requests were handed to the language server in series. But the language server [returned](https://github.com/eclipse/lsp4j/blob/04b0c6112f0a94140e22b8b15bb5a90d5a0ed851/org.eclipse.lsp4j.jsonrpc/src/main/java/org/eclipse/lsp4j/jsonrpc/RemoteEndpoint.java#L261) a CompletableFuture which could, at the server's option, be incomplete. lsp4j would [arrange](https://github.com/eclipse/lsp4j/blob/04b0c6112f0a94140e22b8b15bb5a90d5a0ed851/org.eclipse.lsp4j.jsonrpc/src/main/java/org/eclipse/lsp4j/jsonrpc/RemoteEndpoint.java#L279) for the result of that future, whenever it was done being calculated, to be sent back to the client as the response. This let the language server decide how much of the response to calculate (if any) while blocking other requests and notifications, and how much to calculate in another thread, allowing other requests and notifications to be processed in parallel. If the response creation was wrapped in `CompletableFuture/supplyAsync`, as was the case for [almost all](https://github.com/clojure-lsp/lsp4clj/blob/fe8782941fd871921d55349b9f7803ac0dc8595d/server/src/lsp4clj/core.clj#L100) of clojure-lsp's requests, then the response would be calculated in the "common" ForkJoin thread pool. This thread pool has a configurable parallelism (on my machine it's 3). The end result is that clojure-lsp processed most of its requests in parallel—up to 3 at a time. I didn't understand this arrangment as I was migrating lsp4clj away from lsp4j. I thought that lsp4j procssed every request in series, so that it could follow the spec's guidance. I followed that pattern, making all the requests and notifications serial. (Actually I started with an implementation that distributed both requests and notifications to the server in parallel. That meant the server processed requests in parallel but also that it received message out of order. That parallelism was removed in 6996fdb.) Technically, the strictly serial implementation works, but it causes the lagginess and delays observed in #1240. So, this patch is an attempt to restore the earlier behavior, though with a different implementation. In this implementation lsp4clj continues to read notifications and requests in series, and passes them to the language server with lsp4clj.server/receive-notification and lsp4clj.server/receive-request. The bodies of these functions are executed serially. Before this commit (though after the migration), the return value of receive-request was used as the response, which meant that each response was fully calculated serially. But now the language server can, optionally, return a deref-able object from receive-request. If it does, the deref-able will be derefed in a thread pool of size 4. So, a language server can indicate to lsp4clj that it would like a request to be processed in parallel with other requests by, for example, returning a `delay`. Of course, if it wants a request to block other requests, it can still do that by calculating the full response synchronously in receive-request. Or it can do some of both, calculating part of a response, then wrapping the rest of the calculation in a delay. There were four places where clojure-lsp processed responses synchronously (by using `CompletableFuture/completedFuture` instead of `CompletableFuture/supplyAsync`): `initialize`, `shutdown`, `rangeFormatting` and `executeCommand`. I think it's important that clojure-lsp continues to process the `initialize` request before it allows others through. But the others could be re-considered someday: * `executeCommand`. On the clojure-lsp side, `executeCommand` starts a `future` and returns nil. The only difference that I see from using `supplyAsync` (or now `delay`) is that the client sees a `nil` response before the command is finished executing. If this isn't important—if the client would be content with the command executing asynchronously, but getting a nil only after it had finished—then `executeCommand` could use `delay`. * `rangeFormatting`. On the clojure-lsp side, rangeFormatting is an anomaly. It calculates its response synchronously, at the exclusion of other requests. On top of this, unlike every other request, it sets and releases a global lock, presumably as a way to double-check that if multiple rangeFormatting requests come in, the later ones are dropped. I'm not sure why this happens—it seems like a bug fix from long ago, though I haven't tracked down which bug it was. It seems odd though—by blocking other requests, there can't be other simultaneous rangeFormatting requests, so isn't the lock useless? Anyway, I don't currently see any reason why this couldn't be asynchronous (especially since it already uses `process-after-changes`) though it would need more research. * `shutdown`. The clients aren't likely to send requests or notifications after sending `shutdown`, so I don't see a good reason for it to be synchronous, though it probably doesn't hurt. I've tested this with clojure-lsp, wrapping most of its requests in `delay`s. This does fix #1240, so I think it should be merged. However, in the long run I'm not sure mimicking the old behavior is an ideal solution. Part of the reason we're seeing these delays is because we're processing every request fully, even if it's become out of date. The client is capable of indicating that it no longer needs a response by sending a $/cancelRequest, but we're ignoring those. We also don't stop an in-progress analysis if we've learned that the text of a file has changed since the analysis was started. Fixing that kind of thing might make these spec-bending compromises less necessary.
Uses clojure-lsp/lsp4clj#25 to process request in parallel, fixing typing lag and other performance problems documented in #1240. Fixes #1240.
Uses clojure-lsp/lsp4clj#25 to process request in parallel, fixing typing lag and other performance problems documented in #1240. Fixes #1240.
Uses clojure-lsp/lsp4clj#25 to process request in parallel, fixing typing lag and other performance problems documented in #1240. Fixes #1240.
|
@mainej Can you walk through how you would expect these two async commands to be processed? Is the client expected to queue requests?
|
|
Thanks for the research and good explanation
I think we could change to return nil only after executing, needs a little bit of testing if that affects clients in any way though.
yeah, it was certainly related to bugs, but I can't recall what exactly. I know we had issues in the past where rangeFormatting take a long time and we block the editor with a simple paredit slurp, even so, it doesn't seem related to locking or not, I do think we could think about removing it but would be good to research indeed.
Yeah, I'd keep sync
Totally agreed, we never had fully support for cancelRequest but is something we should consider it indeed, canceling kondo calls may help a lot as you mentioned. |
ericdallo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for the stages explanation, it really helps understand
|
@mainej I gave a try with clojure-lsp and it does feel faster than the latest release, but still blocking emacs lsp-mode, I know the block on the editor is another issue to be solved probably on lsp-mode, but my point is that it doesn't block at all on previous release (lsp4j), so maybe we are not handling that parallel as before? One thing to notice is that I do get diagnostics a little bit faster than lsp4j clojure-lsp, but the tradeoff probably is not worth it. How I am testing:
ComparassionPrevious release (using lsp4j) my input is never blocked, I on purpose stop typing x 3 times to see how much time it takes for diagnostics appear: latest release (master) I never stopped inputting x, it's really lagging, I only stop at the last This branch Same as the latest release, but a little bit faster I do think this PR improves but not as before lsp4clj v1, I know the client should not block, but I wonder what we are doing differently that lsp4j was processing "faster"? I'm ok merging this PR too as an improvement, LMK what you think and thanks for the help, LMK if you need any help with any more testing/debugging! |
This is the same level of parallelism that clojure-lsp was using for requests and notifications before the migration away from lsp4j. I believe this parallelism is `# of cores - 1` by default, so this should be a better setting for users with more cores.
|
Thanks for testing @ericdallo. The improvement you're seeing is so small that IMO this PR isn't worth merging yet. Thanks for making the videos. They illustrate the problem well. Unfortunately, I can't reproduce anything like what you're seeing, despite having fewer cores than you. I started making videos to show you, but they all look like your first video. So, to continue making progress, I think I'll need assistance debugging. Would you help me with that? I've pushed updates to both repos that will enable tracing. Will you make a build with those, then send me the logs created from the moment you start typing |
|
Sure @mainej, one question, do you have semantic tokens enabled as well? Edit: I tested disabling semantic toekns, same issue, so no need to test without it |
@snoe as you're aware, version is synced with the client in The new code does the same thing—both notifications and requests are processed in series, and parallelism is only introduced if clojure-lsp requests it. I tried to explain the reasons why and the mechanics of how this works in the PR description, the code comments, and the README. Please let me know if you have suggestions for how to re-word things to make it clearer.
Renames are and always have been difficult to analyze, IMO. Quickly doing two renames in a row has always been buggy. The bugginess is unrelated to any parallelism at the lsp4clj level—it has to do with the fact that prepareRename doesn't So, I'm not sure what we're going to get out of studying rename as an example. It's going to be too hard to reason about whether we've fixed, introduced, or are continuing to see buggy behavior. I think your point though is this: Suppose a client issues two commands that would both change a document. If we process them in parallel, isn't there a race condition where they could both use the same "initial" state, then clobber the document as they each suggest different edits? The answer is, yes, in theory. But, I'm not worried for two reasons:
Together these make me think this is a hypothetical problem, but not a practical one.
No, we can't demand or expect any changes from the clients. |
|
@mainej sorry for the huge log, I tried to keep as minimal as possible getting only logs after inputting x. I didn't record the gif but it's exactly as my previous gif, so don't know if that would make any difference One thing I noticed is that the slowness doesn't happen with small buffers, so it's like the issue would be related with didChange/analysis with clj-kondo locking the uri? not sure. |
|
@ericdallo, yes, I have semantic tokens—and code lens—enabled. |
I guess I'm trying to figure out how lsp4j behaved in this way but didn't seem to exhibit this parallelism and make sure that some sequencing wasn't missed. My experience, and it sounds like you thought the same, was that |
You may be right, but I'm not 100% sure that's true. clojure-lsp gave supplyAsync a Supplier. That always happened in the same thread. But the documentation for supplyAsync doesn't say whether it's obligated to call
@ericdallo as I read the lsp4j code, that's not correct. That executor service you linked to is used for something else—starting the input stream reader thread. It's passed to a ConcurrentMessageProcessor, which uses it to start one and only one thread that reads from the input stream. In fact, to answer your question @snoe, lsp4j doesn't arrange to use a forkjoin thread pool at all. Instead, it was lsp4clj that put the request processing in a forkjoin pool via supplyAsync—the common pool, which is the default when you use supplyAsync. That is, lsp4j uses the CompletableFuture api to configure things so that whether or not the future is executed in a separate thread, then the result is put on the output. But lsp4j doesn't manage any request-processing threads itself, or even require that any asynchronous facilities be used at all. As far as it is concerned, its users have complete control over whether anything is run asynchronously. Its users can return everything in a Does that make sense how by using supplyAsync the asynchrony is our "fault", and lsp4j was only allowing that to happen? @ericdallo thanks for the logs. From them I realized that you have I added more logging to see where lsp4clj v1 is spending time. It appears that with the lower To be honest I don't understand why didn't have theses same problems before. The common ForkJoinPool has a limited amount of threads, so it seems like it would have had the same problem of the pool filling up with I pushed the logging and
I'm feeling this way too. From one perspective this PR is an attempt to replicate the CompletableFuture behavior using core.async. But it doesn't seem to be going very well, and my patience for it is waning. CompletableFuture has an interesting property which I don't know how to replicate in "pure" Clojure. When you create a CompletableFuture whose value will be calculated in another thread, you can also attach other code (via Or at least that's my current thought about why the new code isn't behaving like the CompletableFuture code. My attempts to fix that seem like adding threads on top of threads on top of threads, trying to match the sequencing and asynchrony we had before. So, I think I'll experiment with making lsp4clj v1 use CompletableFutures. That is, I want clojure-lsp to create and return CompletableFutures to lsp4clj, so that lsp4clj can do that lsp4 was doing. That'll remove one big difference between the current code and how lsp4j was doing things. It may not be exactly equivalent, but it'll at least get us a lot closer. How does that sound? Note that I'm not saying I want to remove core.async entirely. It's still a fine model for reading-from-input and writing-to-output. @ericdallo I've suggested the possibility of using |
Oh, you are right, that code is tricky, but makes sense!
Yes, good catch!
Good point, those settings are a tradeoff of smooth UI vs performance, but that used to work very well with old clojure-lsp and other LSPs
Sounds good, I'll test it and let you know!
It'd be really good to know if that difference would really "fix" the problem, so if you wanna try that, please do, it's a pity we would stop using core.async for that, but I'm fine too,
Yeah, I think it's clear now to me how it'd help us, so I'm ok giving it a try :) |
|
I'd also vote for promesa, since it gets us closer to a possible cljs version, where core.async can be problematic. |
|
@ericdallo I've pushed updates to both branches. The current code tries to stay more faithful to lsp4j. Please let me know how performance is for you. Since your last test I switched over to using CompletableFuture. It worked as I had planned, but the performance was (in hindsight, predictably) terrible, even worse than what you and @borkdude were seeing originally. After you stopped typing it would usually be a few seconds before lint and semantic tokens caught up. Eventually I figured out the slow down. When you start typing, the client sends a I knew lsp4j must be doing something to avoid this, but it took awhile to figure out what. I was surprised to realize that lsp4j does handle lsp4j handles the For me there are still times that the pool fills up (because my common pool has only 3 threads). This manifests as a non- Anyway, I'm hopeful you'll see the same improvements I have. Let me know how it goes. |
Uses clojure-lsp/lsp4clj#25 to process request in parallel, fixing typing lag and other performance problems documented in #1240. Fixes #1240.
|
Amazing deep dive @mainej, I'm quite surprised about cancelRequest as well and I can see how that was happening indeed, everything makes sense now, I tested the latest changes and I can confirm it's working the same as was with lsp4j, really nice debugging and explanation! 🎉
Sounds great, it seems process-after-changes current implementation could be improved indeed, sounds like a good high priority for a future release. I will review both PRs, thank you very much for the huge help on this! |
Uses clojure-lsp/lsp4clj#25 to process request in parallel, fixing typing lag and other performance problems documented in #1240. Fixes #1240.
|
That's great @ericdallo! This PR needed one small change to restore some tracing, which I just pushed. The clojure-lsp PR will need a bit more work. It'll have to be pinned to a release of lsp4clj, and I'll need to update the integration test client. |
deps.edn
Outdated
| camel-snake-kebab/camel-snake-kebab {:mvn/version "0.4.3"} | ||
| cheshire/cheshire {:mvn/version "5.11.0"} } | ||
| cheshire/cheshire {:mvn/version "5.11.0"} | ||
| funcool/promesa {:mvn/version "8.0.450"} } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see we using p/all for clojure-lsp code actions itself instead of bunch of futures, not sure what you think and the tradeoff considering start using promessa there just for that.
We could also consider using on process-after-changes to have a more robust/simple mechanism using a executor service for that as you mentioned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using p/let for clojure-lsp code actions itself instead of bunch of futures
Yeah, that might make sense. (I think you'd need p/plet, but I see what you mean.) I'm not sure there's much difference—both use a thread pool. The future version has the advantage that you can start creating the actions before all the futures are finished. But the promesa version would probably look a bit cleaner.
We could also consider using on process-after-changes to have a more robust/simple mechanism using a executor service
That's a great idea.
| (if (instance? CancellationException e) | ||
| (cancellation-response resp req) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 promesa really made it easier to understand :)
Feel free to merge/LMK when I can tag a new minor
432d60f to
d762cc6
Compare
|
@ericdallo merged. I have to sign off for the night, so no rush on making a release. |
|
Promesa rules! Let me know when/if I can test a binary locally and where I can get it. |
* Process requests in parallel Uses clojure-lsp/lsp4clj#25 to process request in parallel, fixing typing lag and other performance problems documented in #1240. Fixes #1240. * Turn on tracing temporarily * Process requests in the future threadpool, not the pipeline-blocking threadpool * Revert to running requests in common ForkJoin pool * Turn off tracing * Fix lint * Bump lsp4clj Co-authored-by: Eric Dallo <ericdallo06@hotmail.com>



This is an attempt to fix clojure-lsp/clojure-lsp#1240, where the failure to run requests in parallel leads to small freezes while typing in Emacs. In other editors it manifests differently. In Calva, we've noticed that diagnostics and semantic tokens are delivered more slowly.
This commit changes lsp4clj to allow language servers to process and respond to up to 4 requests in parallel. This is a departure from the spec, which says "Responses to requests should be sent in roughly the same order as the requests appear". However, it is closer to what lsp4j and cloure-lsp were doing before the migration away from lsp4j.
In the original implementation, lsp4j consumed messages one at a time, handing them off to the language server. The language server processed each message serially.
lsp4j fully processed each notification before it moved on to the next message.
But requests were a little different. Like notifications, requests were handed to the language server in series. But the language server returned a CompletableFuture which could, at the server's option, be incomplete. lsp4j would arrange for the result of that future, whenever it was done being calculated, to be sent back to the client as the response. This let the language server decide how much of the response to calculate (if any) while blocking other requests and notifications, and how much to calculate in another thread, allowing other requests and notifications to be processed in parallel.
If the response creation was wrapped in
CompletableFuture/supplyAsync, as was the case for almost all of clojure-lsp's requests, then the response would be calculated in the "common" ForkJoin thread pool. This thread pool has a configurable parallelism (on my machine it's 3). The end result is that clojure-lsp processed most of its requests in parallel—up to 3 at a time in my case.I didn't understand this arrangement as I was migrating lsp4clj away from lsp4j. I thought that lsp4j processed every request in series, so that it could follow the spec's guidance. I followed that pattern, making all the requests and notifications serial. (Actually I started with an implementation that distributed both requests and notifications to the server in parallel. That meant the server processed requests in parallel but also that it received message out of order. That parallelism was removed in 6996fdb.)
Technically, the strictly serial implementation works, but it causes the lagginess and delays observed in #1240. So, this patch is an attempt to restore the earlier behavior, though with a different implementation.
In this implementation lsp4clj continues to read notifications and requests in series, and passes them to the language server with lsp4clj.server/receive-notification and lsp4clj.server/receive-request. The bodies of these functions are executed serially. Before this commit (though after the migration), the return value of receive-request was used as the response, which meant that each response was fully calculated serially. But now the language server can, optionally, return a deref-able object from receive-request. If it does, the deref-able will be derefed in a thread pool of size 4. So, a language server can indicate to lsp4clj that it would like a request to be processed in parallel with other requests by, for example, returning a
delay. Of course, if it wants a request to block other requests, it can still do that by calculating the full response synchronously in receive-request. Or it can do some of both, calculating part of a response, then wrapping the rest of the calculation in a delay.There were four places where clojure-lsp processed responses synchronously (by using
CompletableFuture/completedFutureinstead ofCompletableFuture/supplyAsync):initialize,shutdown,rangeFormattingandexecuteCommand. I think it's important that clojure-lsp continues to process theinitializerequest before it allows others through. But the others could be re-considered someday:executeCommand. On the clojure-lsp side,executeCommandstarts afutureand returns nil. The only difference that I see from usingsupplyAsync(or nowdelay) is that the client sees anilresponse before the command is finished executing. If this isn't important—if the client would be content with the command executing asynchronously, but getting a nil only after it had finished—thenexecuteCommandcould usedelay.rangeFormatting. On the clojure-lsp side,rangeFormattingis an anomaly. It calculates its response synchronously, at the exclusion of other requests. On top of this, unlike every other request, it sets and releases a global lock, presumably as a way to double-check that if multiple rangeFormatting requests come in, the later ones are dropped. I'm not sure why this happens—it seems like a bug fix from long ago, though I haven't tracked down which bug it was. It seems odd though—by blocking other requests, there can't be other simultaneous rangeFormatting requests so isn't the lock useless? Anyway, I don't currently see any reason why this couldn't be asynchronous (especially since it already usesprocess-after-changes) though it would need more research.shutdown. The clients aren't likely to send requests or notifications after sendingshutdown, so I don't see a good reason for it to be synchronous, though it probably doesn't hurt.I've tested this with clojure-lsp, wrapping most of its requests in
delays. This does fix #1240, so I think it should be merged. However, in the long run I'm not sure mimicking the old behavior is an ideal solution. Part of the reason we're seeing these delays is because we're processing every request fully, even if it's become out of date. The client is capable of indicating that it no longer needs a response by sending a $/cancelRequest, but we're ignoring those. We also don't stop an in-progress analysis if we've learned that the text of a file has changed since the analysis was started. Fixing that kind of thing might make these spec-bending compromises less necessary.