Skip to content

Conversation

@mainej
Copy link
Contributor

@mainej mainej commented Sep 20, 2022

This is an attempt to fix clojure-lsp/clojure-lsp#1240, where the failure to run requests in parallel leads to small freezes while typing in Emacs. In other editors it manifests differently. In Calva, we've noticed that diagnostics and semantic tokens are delivered more slowly.

This commit changes lsp4clj to allow language servers to process and respond to up to 4 requests in parallel. This is a departure from the spec, which says "Responses to requests should be sent in roughly the same order as the requests appear". However, it is closer to what lsp4j and cloure-lsp were doing before the migration away from lsp4j.

In the original implementation, lsp4j consumed messages one at a time, handing them off to the language server. The language server processed each message serially.

lsp4j fully processed each notification before it moved on to the next message.

But requests were a little different. Like notifications, requests were handed to the language server in series. But the language server returned a CompletableFuture which could, at the server's option, be incomplete. lsp4j would arrange for the result of that future, whenever it was done being calculated, to be sent back to the client as the response. This let the language server decide how much of the response to calculate (if any) while blocking other requests and notifications, and how much to calculate in another thread, allowing other requests and notifications to be processed in parallel.

If the response creation was wrapped in CompletableFuture/supplyAsync, as was the case for almost all of clojure-lsp's requests, then the response would be calculated in the "common" ForkJoin thread pool. This thread pool has a configurable parallelism (on my machine it's 3). The end result is that clojure-lsp processed most of its requests in parallel—up to 3 at a time in my case.

I didn't understand this arrangement as I was migrating lsp4clj away from lsp4j. I thought that lsp4j processed every request in series, so that it could follow the spec's guidance. I followed that pattern, making all the requests and notifications serial. (Actually I started with an implementation that distributed both requests and notifications to the server in parallel. That meant the server processed requests in parallel but also that it received message out of order. That parallelism was removed in 6996fdb.)

Technically, the strictly serial implementation works, but it causes the lagginess and delays observed in #1240. So, this patch is an attempt to restore the earlier behavior, though with a different implementation.

In this implementation lsp4clj continues to read notifications and requests in series, and passes them to the language server with lsp4clj.server/receive-notification and lsp4clj.server/receive-request. The bodies of these functions are executed serially. Before this commit (though after the migration), the return value of receive-request was used as the response, which meant that each response was fully calculated serially. But now the language server can, optionally, return a deref-able object from receive-request. If it does, the deref-able will be derefed in a thread pool of size 4. So, a language server can indicate to lsp4clj that it would like a request to be processed in parallel with other requests by, for example, returning a delay. Of course, if it wants a request to block other requests, it can still do that by calculating the full response synchronously in receive-request. Or it can do some of both, calculating part of a response, then wrapping the rest of the calculation in a delay.

There were four places where clojure-lsp processed responses synchronously (by using CompletableFuture/completedFuture instead of CompletableFuture/supplyAsync): initialize, shutdown, rangeFormatting and executeCommand. I think it's important that clojure-lsp continues to process the initialize request before it allows others through. But the others could be re-considered someday:

  • executeCommand. On the clojure-lsp side, executeCommand starts a future and returns nil. The only difference that I see from using supplyAsync (or now delay) is that the client sees a nil response before the command is finished executing. If this isn't important—if the client would be content with the command executing asynchronously, but getting a nil only after it had finished—then executeCommand could use delay.
  • rangeFormatting. On the clojure-lsp side, rangeFormatting is an anomaly. It calculates its response synchronously, at the exclusion of other requests. On top of this, unlike every other request, it sets and releases a global lock, presumably as a way to double-check that if multiple rangeFormatting requests come in, the later ones are dropped. I'm not sure why this happens—it seems like a bug fix from long ago, though I haven't tracked down which bug it was. It seems odd though—by blocking other requests, there can't be other simultaneous rangeFormatting requests so isn't the lock useless? Anyway, I don't currently see any reason why this couldn't be asynchronous (especially since it already uses process-after-changes) though it would need more research.
  • shutdown. The clients aren't likely to send requests or notifications after sending shutdown, so I don't see a good reason for it to be synchronous, though it probably doesn't hurt.

I've tested this with clojure-lsp, wrapping most of its requests in delays. This does fix #1240, so I think it should be merged. However, in the long run I'm not sure mimicking the old behavior is an ideal solution. Part of the reason we're seeing these delays is because we're processing every request fully, even if it's become out of date. The client is capable of indicating that it no longer needs a response by sending a $/cancelRequest, but we're ignoring those. We also don't stop an in-progress analysis if we've learned that the text of a file has changed since the analysis was started. Fixing that kind of thing might make these spec-bending compromises less necessary.

This is an attempt to fix
clojure-lsp/clojure-lsp#1240, where the
failure to run requests in parallel leads to small freezes while typing
in Emacs. In other editors it manifests differently. In Calva, we've
noticed that diagnostics and semantic tokens are delivered more slowly.

This commit changes lsp4clj to allow language servers to process and
respond to up to 4 requests in parallel. This is a departure from the
[spec](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#messageOrdering),
which says "Responses to requests should be sent in roughly the same
order as the requests appear". However, it is closer to what lsp4j and
cloure-lsp were doing before the migration away from lsp4j.

In the original implementation, lsp4j
[consumed](https://github.com/eclipse/lsp4j/blob/04b0c6112f0a94140e22b8b15bb5a90d5a0ed851/org.eclipse.lsp4j.jsonrpc/src/main/java/org/eclipse/lsp4j/jsonrpc/RemoteEndpoint.java#L185-L196)
messages one at a time, handing them off to the language server. The
language server processed each message serially.

lsp4clj fully [processed each
notification](https://github.com/eclipse/lsp4j/blob/04b0c6112f0a94140e22b8b15bb5a90d5a0ed851/org.eclipse.lsp4j.jsonrpc/src/main/java/org/eclipse/lsp4j/jsonrpc/RemoteEndpoint.java#L220)
before it moved on to the next message.

But requests were a little different. Like notifications, requests were
handed to the language server in series. But the language server
[returned](https://github.com/eclipse/lsp4j/blob/04b0c6112f0a94140e22b8b15bb5a90d5a0ed851/org.eclipse.lsp4j.jsonrpc/src/main/java/org/eclipse/lsp4j/jsonrpc/RemoteEndpoint.java#L261)
a CompletableFuture which could, at the server's option, be incomplete.
lsp4j would
[arrange](https://github.com/eclipse/lsp4j/blob/04b0c6112f0a94140e22b8b15bb5a90d5a0ed851/org.eclipse.lsp4j.jsonrpc/src/main/java/org/eclipse/lsp4j/jsonrpc/RemoteEndpoint.java#L279)
for the result of that future, whenever it was done being calculated, to
be sent back to the client as the response. This let the language server
decide how much of the response to calculate (if any) while blocking
other requests and notifications, and how much to calculate in another
thread, allowing other requests and notifications to be processed in
parallel.

If the response creation was wrapped in `CompletableFuture/supplyAsync`,
as was the case for [almost
all](https://github.com/clojure-lsp/lsp4clj/blob/fe8782941fd871921d55349b9f7803ac0dc8595d/server/src/lsp4clj/core.clj#L100)
of clojure-lsp's requests, then the response would be calculated in the
"common" ForkJoin thread pool. This thread pool has a configurable
parallelism (on my machine it's 3). The end result is that clojure-lsp
processed most of its requests in parallel—up to 3 at a time.

I didn't understand this arrangment as I was migrating lsp4clj away from
lsp4j. I thought that lsp4j procssed every request in series, so that it
could follow the spec's guidance. I followed that pattern, making all
the requests and notifications serial. (Actually I started with an
implementation that distributed both requests and notifications to the
server in parallel. That meant the server processed requests in parallel
but also that it received message out of order. That parallelism was
removed in 6996fdb.)

Technically, the strictly serial implementation works, but it causes the
lagginess and delays observed in #1240. So, this patch is an attempt to
restore the earlier behavior, though with a different implementation.

In this implementation lsp4clj continues to read notifications and
requests in series, and passes them to the language server with
lsp4clj.server/receive-notification and lsp4clj.server/receive-request.
The bodies of these functions are executed serially. Before this commit
(though after the migration), the return value of receive-request was
used as the response, which meant that each response was fully
calculated serially. But now the language server can, optionally, return
a deref-able object from receive-request. If it does, the deref-able
will be derefed in a thread pool of size 4. So, a language server can
indicate to lsp4clj that it would like a request to be processed in
parallel with other requests by, for example, returning a `delay`. Of
course, if it wants a request to block other requests, it can still do
that by calculating the full response synchronously in receive-request.
Or it can do some of both, calculating part of a response, then
wrapping the rest of the calculation in a delay.

There were four places where clojure-lsp processed responses
synchronously (by using `CompletableFuture/completedFuture` instead of
`CompletableFuture/supplyAsync`):
`initialize`, `shutdown`, `rangeFormatting` and `executeCommand`.

I think it's important that clojure-lsp continues to process the
`initialize` request before it allows others through. But the others
could be re-considered someday:

* `executeCommand`. On the clojure-lsp side, `executeCommand` starts a
  `future` and returns nil. The only difference that I see from using
  `supplyAsync` (or now `delay`) is that the client sees a `nil`
  response before the command is finished executing. If this isn't
  important—if the client would be content with the command executing
  asynchronously, but getting a nil only after it had finished—then
  `executeCommand` could use `delay`.
* `rangeFormatting`.  On the clojure-lsp side, rangeFormatting is an
  anomaly. It calculates its response synchronously, at the exclusion of
  other requests. On top of this, unlike every other request, it sets
  and releases a global lock, presumably as a way to double-check that
  if multiple rangeFormatting requests come in, the later ones are
  dropped. I'm not sure why this happens—it seems like a bug fix from
  long ago, though I haven't tracked down which bug it was. It seems odd
  though—by blocking other requests, there can't be other simultaneous
  rangeFormatting requests, so isn't the lock useless? Anyway, I don't
  currently see any reason why this couldn't be asynchronous (especially
  since it already uses `process-after-changes`) though it would need
  more research.
* `shutdown`. The clients aren't likely to send requests or
  notifications after sending `shutdown`, so I don't see a good reason
  for it to be synchronous, though it probably doesn't hurt.

I've tested this with clojure-lsp, wrapping most of its requests in
`delay`s. This does fix #1240, so I think it should be merged. However,
in the long run I'm not sure mimicking the old behavior is an ideal
solution. Part of the reason we're seeing these delays is because we're
processing every request fully, even if it's become out of date. The
client is capable of indicating that it no longer needs a response by
sending a $/cancelRequest, but we're ignoring those. We also don't stop
an in-progress analysis if we've learned that the text of a file has
changed since the analysis was started. Fixing that kind of thing might
make these spec-bending compromises less necessary.
@mainej mainej requested a review from ericdallo September 20, 2022 15:23
mainej added a commit to clojure-lsp/clojure-lsp that referenced this pull request Sep 20, 2022
Uses clojure-lsp/lsp4clj#25 to process request
in parallel, fixing typing lag and other performance problems documented
in #1240.

Fixes #1240.
mainej added a commit to clojure-lsp/clojure-lsp that referenced this pull request Sep 20, 2022
Uses clojure-lsp/lsp4clj#25 to process request
in parallel, fixing typing lag and other performance problems documented
in #1240.

Fixes #1240.
mainej added a commit to clojure-lsp/clojure-lsp that referenced this pull request Sep 20, 2022
Uses clojure-lsp/lsp4clj#25 to process request
in parallel, fixing typing lag and other performance problems documented
in #1240.

Fixes #1240.
@snoe
Copy link

snoe commented Sep 20, 2022

@mainej
lsp has a concept of document version, did that come into play with the lsp4j parallelism at all? From the consumption side, version seemed to always increase but that seems unlikely if it was using this pool approach.

Can you walk through how you would expect these two async commands to be processed? Is the client expected to queue requests?

(def a 1) (def x |a)

rename 0,17, 'bee'
rename 0,17, 'c'

@ericdallo
Copy link
Member

Thanks for the research and good explanation

executeCommand

I think we could change to return nil only after executing, needs a little bit of testing if that affects clients in any way though.

rangeFormatting

yeah, it was certainly related to bugs, but I can't recall what exactly. I know we had issues in the past where rangeFormatting take a long time and we block the editor with a simple paredit slurp, even so, it doesn't seem related to locking or not, I do think we could think about removing it but would be good to research indeed.

shutdown

Yeah, I'd keep sync

However, in the long run I'm not sure mimicking the old behavior is an ideal solution. Part of the reason we're seeing these delays is because we're processing every request fully, even if it's become out of date. The client is capable of indicating that it no longer needs a response by sending a $/cancelRequest, but we're ignoring those. We also don't stop an in-progress analysis if we've learned that the text of a file has changed since the analysis was started. Fixing that kind of thing might make these spec-bending compromises less necessary.

Totally agreed, we never had fully support for cancelRequest but is something we should consider it indeed, canceling kondo calls may help a lot as you mentioned.
I'm ok with this PR as we weren't aware of any big concurrency issues, so it's ok to rollback to this behavior for now IMO.

Copy link
Member

@ericdallo ericdallo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for the stages explanation, it really helps understand

@ericdallo
Copy link
Member

ericdallo commented Sep 21, 2022

@mainej I gave a try with clojure-lsp and it does feel faster than the latest release, but still blocking emacs lsp-mode, I know the block on the editor is another issue to be solved probably on lsp-mode, but my point is that it doesn't block at all on previous release (lsp4j), so maybe we are not handling that parallel as before?

One thing to notice is that I do get diagnostics a little bit faster than lsp4j clojure-lsp, but the tradeoff probably is not worth it.

How I am testing:

  • clone squint-cljs/squint
  • open compiler.cljc
  • add a (def x 1) anywhere on top level
  • write at top level (str xxxxxxxxxx) where each x you take some milliseconds to write.

Comparassion

Previous release (using lsp4j)

my input is never blocked, I on purpose stop typing x 3 times to see how much time it takes for diagnostics appear:

previous-release

latest release (master)

I never stopped inputting x, it's really lagging, I only stop at the last x to wait for last diagnostics to finish.

latest-release

This branch

Same as the latest release, but a little bit faster
Note: used parallelism of 7 as mentioned here

branch

I do think this PR improves but not as before lsp4clj v1, I know the client should not block, but I wonder what we are doing differently that lsp4j was processing "faster"? I'm ok merging this PR too as an improvement, LMK what you think and thanks for the help, LMK if you need any help with any more testing/debugging!

This is the same level of parallelism that clojure-lsp was using for
requests and notifications before the migration away from lsp4j.

I believe this parallelism is `# of cores - 1` by default, so this
should be a better setting for users with more cores.
@mainej
Copy link
Contributor Author

mainej commented Sep 21, 2022

Thanks for testing @ericdallo. The improvement you're seeing is so small that IMO this PR isn't worth merging yet.

Thanks for making the videos. They illustrate the problem well. Unfortunately, I can't reproduce anything like what you're seeing, despite having fewer cores than you. I started making videos to show you, but they all look like your first video. So, to continue making progress, I think I'll need assistance debugging. Would you help me with that?

I've pushed updates to both repos that will enable tracing. Will you make a build with those, then send me the logs created from the moment you start typing xs until whenever the logs settle down after you stop? If possible, a video of the same typing session might be helpful too. I'll start with that, but we might have to enable tracing on an old build too eventually.

@ericdallo
Copy link
Member

ericdallo commented Sep 21, 2022

Sure @mainej, one question, do you have semantic tokens enabled as well?

Edit: I tested disabling semantic toekns, same issue, so no need to test without it

@mainej
Copy link
Contributor Author

mainej commented Sep 21, 2022

lsp has a concept of document version, did that come into play with the lsp4j parallelism at all? From the consumption side, version seemed to always increase but that seems unlikely if it was using this pool approach.

@snoe as you're aware, version is synced with the client in didChange. When a client sends many didChange notifications in a row, lsp4j would process each one in series, not parallel. It processed all notifications in series, and in fact all requests too. Parallelism was only introduced at the behest of clojure-lsp. So, assuming clients sent monotonically increasing version numbers in the didChange notifications (which I believe they all do), then the clojure-lsp db would contain monotonically increasing versions too.

The new code does the same thing—both notifications and requests are processed in series, and parallelism is only introduced if clojure-lsp requests it. I tried to explain the reasons why and the mechanics of how this works in the PR description, the code comments, and the README. Please let me know if you have suggestions for how to re-word things to make it clearer.

Can you walk through how you would expect these two async commands to be processed? rename rename

Renames are and always have been difficult to analyze, IMO. Quickly doing two renames in a row has always been buggy. The bugginess is unrelated to any parallelism at the lsp4clj level—it has to do with the fact that prepareRename doesn't process-after-changes, a parallelism problem in the clojure-lsp level. That is, even in the current release, which has no parallelism from lsp4clj, the bug is still present.

So, I'm not sure what we're going to get out of studying rename as an example. It's going to be too hard to reason about whether we've fixed, introduced, or are continuing to see buggy behavior.

I think your point though is this: Suppose a client issues two commands that would both change a document. If we process them in parallel, isn't there a race condition where they could both use the same "initial" state, then clobber the document as they each suggest different edits? The answer is, yes, in theory. But, I'm not worried for two reasons:

  1. Even with lsp4j this was a risk. You seem doubtful that lsp4j was actually processing commands in parallel, but the code I'm looking at and the logs I'm seeing convinces me it was. If you have evidence otherwise, please share.
  2. Humans initiate commands, which cause document edits. Most of the refactorings are fast enough that a human simply doesn't have time to submit two in quick enough succession for them to conflict with each other.

Together these make me think this is a hypothetical problem, but not a practical one.

Is the client expected to queue requests?

No, we can't demand or expect any changes from the clients.

@ericdallo
Copy link
Member

ericdallo commented Sep 21, 2022

@mainej sorry for the huge log, I tried to keep as minimal as possible getting only logs after inputting x. I didn't record the gif but it's exactly as my previous gif, so don't know if that would make any difference

clojure-lsp-branch-log.txt

One thing I noticed is that the slowness doesn't happen with small buffers, so it's like the issue would be related with didChange/analysis with clj-kondo locking the uri? not sure.

@mainej
Copy link
Contributor Author

mainej commented Sep 21, 2022

@ericdallo, yes, I have semantic tokens—and code lens—enabled.

@snoe
Copy link

snoe commented Sep 21, 2022

The new code does the same thing—both notifications and requests are processed in series, and parallelism is only introduced if clojure-lsp requests it. I tried to explain the reasons why and the mechanics of how this works in the PR description, the code comments, and the README. Please let me know if you have suggestions for how to re-word things to make it clearer.

If the response creation was wrapped in CompletableFuture/supplyAsync, as was the case for almost all of clojure-lsp's requests, then the response would be calculated in the "common" ForkJoin thread pool. This thread pool has a configurable parallelism (on my machine it's 3). The end result is that clojure-lsp processed most of its requests in parallel—up to 3 at a time in my case.

I guess I'm trying to figure out how lsp4j behaved in this way but didn't seem to exhibit this parallelism and make sure that some sequencing wasn't missed. My experience, and it sounds like you thought the same, was that CompletableFuture/supplyAsync at least started execution of the futures in sequence. My worry is that if we don't sequence things exactly as lsp4j did, we'll introduce a number of hard to see bugs. Where in lsp4j is the forkjoin thread pool created?

@ericdallo
Copy link
Member

Where in lsp4j is the forkjoin thread pool created?

@snoe here

@mainej
Copy link
Contributor Author

mainej commented Sep 22, 2022

CompletableFuture/supplyAsync at least started execution of the futures in sequence

You may be right, but I'm not 100% sure that's true. clojure-lsp gave supplyAsync a Supplier. That always happened in the same thread. But the documentation for supplyAsync doesn't say whether it's obligated to call Supplier/get in any particular order. In fact, I suspect that it passes the Supplier off to the common ForkJoinPool, which gets to decide when and in what order to call Supplier/get.

Where in lsp4j is the forkjoin thread pool created?

@snoe here

@ericdallo as I read the lsp4j code, that's not correct. That executor service you linked to is used for something else—starting the input stream reader thread. It's passed to a ConcurrentMessageProcessor, which uses it to start one and only one thread that reads from the input stream.

In fact, to answer your question @snoe, lsp4j doesn't arrange to use a forkjoin thread pool at all. Instead, it was lsp4clj that put the request processing in a forkjoin pool via supplyAsync—the common pool, which is the default when you use supplyAsync. That is, lsp4j uses the CompletableFuture api to configure things so that whether or not the future is executed in a separate thread, then the result is put on the output.

But lsp4j doesn't manage any request-processing threads itself, or even require that any asynchronous facilities be used at all. As far as it is concerned, its users have complete control over whether anything is run asynchronously. Its users can return everything in a CompletableFuture/completedFuture, in which case everything will be synchronous. Or they can use supplyAsync or any of the many other CompletableFuture facilities to calculate a value asynchronously. lsp4clj chose to use supplyAsync, which is what was introducing the asynchrony.

Does that make sense how by using supplyAsync the asynchrony is our "fault", and lsp4j was only allowing that to happen?

@ericdallo thanks for the logs. From them I realized that you have lsp-modeline-code-actions-enable on, which I didn't. So that's another process-after-changes request that your editor sends much more regularly than mine. Looking at your dotfies, you also have lsp-idle-delay set to 0.2, instead of the default of 0.5. Changing both of those things, I'm able to get lagginess more easily, though still not as much as your videos.

I added more logging to see where lsp4clj v1 is spending time. It appears that with the lower lsp-idle-delay, the editor floods clojure-lsp with requests. When enough of them use process-after-changes the threadpool that writes to the output gets completely blocked, meaning other requests that are usually very short-lived end up waiting for a turn to start. I tried solving this switching the delays that I added to clojure-lsp to be futures. That way a request can at least do all of its processing while it's waiting for a turn to put the response on the output.

To be honest I don't understand why didn't have theses same problems before. The common ForkJoinPool has a limited amount of threads, so it seems like it would have had the same problem of the pool filling up with process-after-changes threads. Understanding this better is one possible path forward, but not one I'm excited to follow.

I pushed the logging and delay->future changes, so you can experiment with them by pulling the two branches and making another build. I'd be curious to hear the outcome, but I think it's important to consider one of your points @snoe:

My worry is that if we don't sequence things exactly as lsp4j did, we'll introduce a number of hard to see bugs.

I'm feeling this way too. From one perspective this PR is an attempt to replicate the CompletableFuture behavior using core.async. But it doesn't seem to be going very well, and my patience for it is waning.

CompletableFuture has an interesting property which I don't know how to replicate in "pure" Clojure. When you create a CompletableFuture whose value will be calculated in another thread, you can also attach other code (via .then, and others) that will also run in that thread. So, if you have a bunch of CompletableFutures that are processing requests, you can ask their threads to put their results on the output using .then. I don't know how to replicate that in Clojure. Tools like future don't have facilities to attach extra code to them. Instead, if you want to deref them in parallel, you need another thread pool to do that.

Or at least that's my current thought about why the new code isn't behaving like the CompletableFuture code. My attempts to fix that seem like adding threads on top of threads on top of threads, trying to match the sequencing and asynchrony we had before.

So, I think I'll experiment with making lsp4clj v1 use CompletableFutures. That is, I want clojure-lsp to create and return CompletableFutures to lsp4clj, so that lsp4clj can do that lsp4 was doing. That'll remove one big difference between the current code and how lsp4j was doing things. It may not be exactly equivalent, but it'll at least get us a lot closer. How does that sound?

Note that I'm not saying I want to remove core.async entirely. It's still a fine model for reading-from-input and writing-to-output.

@ericdallo I've suggested the possibility of using promesa before, since it uses CompletableFutures under the hood. You've been hesitant about that. I think your hesitation was mostly that you were curious if we could find another way to do it. But if you have other concerns, please let me know.

@ericdallo
Copy link
Member

In fact, to answer your question @snoe, lsp4j doesn't arrange to use a forkjoin thread pool at all. Instead, it was lsp4clj that put the request processing in a forkjoin pool via supplyAsync—the common pool, which is the default when you use supplyAsync. That is, lsp4j uses the CompletableFuture api to configure things so that whether or not the future is executed in a separate thread, then the result is put on the output.

Oh, you are right, that code is tricky, but makes sense!

Does that make sense how by using supplyAsync the asynchrony is our "fault", and lsp4j was only allowing that to happen?

Yes, good catch!

@ericdallo thanks for the logs. From them I realized that you have lsp-modeline-code-actions-enable on, which I didn't. So that's another process-after-changes request that your editor sends much more regularly than mine. Looking at your dotfies, you also have lsp-idle-delay set to 0.2, instead of the default of 0.5. Changing both of those things, I'm able to get lagginess more easily, though still not as much as your videos.

Good point, those settings are a tradeoff of smooth UI vs performance, but that used to work very well with old clojure-lsp and other LSPs

I pushed the logging and delay->future changes, so you can experiment with them by pulling the two branches and making another build. I'd be curious to hear the outcome, but I think it's important to consider one of your points @snoe:

Sounds good, I'll test it and let you know!

So, I think I'll experiment with making lsp4clj v1 use CompletableFutures. That is, I want clojure-lsp to create and return CompletableFutures to lsp4clj, so that lsp4clj can do that lsp4 was doing. That'll remove one big difference between the current code and how lsp4j was doing things. It may not be exactly equivalent, but it'll at least get us a lot closer. How does that sound?

It'd be really good to know if that difference would really "fix" the problem, so if you wanna try that, please do, it's a pity we would stop using core.async for that, but I'm fine too, promesa should help things become "more clojure" though

@ericdallo I've suggested the possibility of using promesa before, since it uses CompletableFutures under the hood. You've been hesitant about that. I think your hesitation was mostly that you were curious if we could find another way to do it. But if you have other concerns, please let me know.

Yeah, I think it's clear now to me how it'd help us, so I'm ok giving it a try :)

@Cyrik
Copy link

Cyrik commented Sep 22, 2022

I'd also vote for promesa, since it gets us closer to a possible cljs version, where core.async can be problematic.

@mainej
Copy link
Contributor Author

mainej commented Sep 23, 2022

@ericdallo I've pushed updates to both branches. The current code tries to stay more faithful to lsp4j. Please let me know how performance is for you.

Since your last test I switched over to using CompletableFuture. It worked as I had planned, but the performance was (in hindsight, predictably) terrible, even worse than what you and @borkdude were seeing originally. After you stopped typing it would usually be a few seconds before lint and semantic tokens caught up.

Eventually I figured out the slow down. When you start typing, the client sends a did-change. A moment after that, it sends several other requests, including several process-after-changes requests. Those process-after-changes requests fill up the common pool. If you keep typing, entering characters at a certain perfect pace, the client keeps sending did-change, which keeps resetting the server's debounce timer (for seconds potentially). At the same time, the client keeps sending other requests—mostly completion and code-lens, but some others. During this, the process-after-changes requests are still siting in the pool, sleeping in a loop, waiting for A) the debounce timer to expire so that B) the clj-kondo analysis can be run, so that C) the processing-changes flag can be released. When that finally happens, the process-after-changes requests can be processed. Then all the other requests that have backed up behind them can too. But there are a lot of them, and they take awhile to finish.

I knew lsp4j must be doing something to avoid this, but it took awhile to figure out what. I was surprised to realize that lsp4j does handle $/cancelRequest notifications from the client. In these long-running scenarios the client decides that either A) it no longer needs a response, for example because the text has changed since a completion request was placed, or B) the server has taken too long and the client should give up. In the long-typing scenario from your videos, sometimes the client cancels the process-after-changes requests, meaning there are fewer things filling the pool. But mostly the client cancels requests that are waiting for a turn in the pool. When the pool finally empties out, there are only a few queued requests, and they can be processed quickly.

lsp4j handles the $/cancelRequest notification by cancelling the CompletableFuture. After adding that into lsp4clj in 20568af, I think we're finally on par with lsp4j. The performance seems a lot better after that.

For me there are still times that the pool fills up (because my common pool has only 3 threads). This manifests as a non-process-after-changes request whose trace is a lot longer than its logged duration. It happens less often when I have lsp-idle-delay set to 0.5 instead of 0.2. I get better performance when there are fewer did-change requests to reset the debounce and fewer process-after-changes requests to fill up the pool. I can imagine a few ways to fix this extra delay though. We could create a separate pool for the process-after-changes requests, so they only block each other. Or, perhaps, we could change process-after-changes to use something besides the sleep-in-a-loop pattern, so that it doesn't consume a thread.

Anyway, I'm hopeful you'll see the same improvements I have. Let me know how it goes.

mainej added a commit to clojure-lsp/clojure-lsp that referenced this pull request Sep 23, 2022
Uses clojure-lsp/lsp4clj#25 to process request
in parallel, fixing typing lag and other performance problems documented
in #1240.

Fixes #1240.
@ericdallo
Copy link
Member

Amazing deep dive @mainej, I'm quite surprised about cancelRequest as well and I can see how that was happening indeed, everything makes sense now, I tested the latest changes and I can confirm it's working the same as was with lsp4j, really nice debugging and explanation! 🎉

I can imagine a few ways to fix this extra delay though. We could create a separate pool for the process-after-changes requests, so they only block each other. Or, perhaps, we could change process-after-changes to use something besides the sleep-in-a-loop pattern, so that it doesn't consume a thread.

Sounds great, it seems process-after-changes current implementation could be improved indeed, sounds like a good high priority for a future release.

I will review both PRs, thank you very much for the huge help on this!

mainej added a commit to clojure-lsp/clojure-lsp that referenced this pull request Sep 23, 2022
Uses clojure-lsp/lsp4clj#25 to process request
in parallel, fixing typing lag and other performance problems documented
in #1240.

Fixes #1240.
@mainej
Copy link
Contributor Author

mainej commented Sep 23, 2022

That's great @ericdallo! This PR needed one small change to restore some tracing, which I just pushed. The clojure-lsp PR will need a bit more work. It'll have to be pinned to a release of lsp4clj, and I'll need to update the integration test client.

deps.edn Outdated
camel-snake-kebab/camel-snake-kebab {:mvn/version "0.4.3"}
cheshire/cheshire {:mvn/version "5.11.0"} }
cheshire/cheshire {:mvn/version "5.11.0"}
funcool/promesa {:mvn/version "8.0.450"} }
Copy link
Member

@ericdallo ericdallo Sep 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see we using p/all for clojure-lsp code actions itself instead of bunch of futures, not sure what you think and the tradeoff considering start using promessa there just for that.
We could also consider using on process-after-changes to have a more robust/simple mechanism using a executor service for that as you mentioned

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using p/let for clojure-lsp code actions itself instead of bunch of futures

Yeah, that might make sense. (I think you'd need p/plet, but I see what you mean.) I'm not sure there's much difference—both use a thread pool. The future version has the advantage that you can start creating the actions before all the futures are finished. But the promesa version would probably look a bit cleaner.

We could also consider using on process-after-changes to have a more robust/simple mechanism using a executor service

That's a great idea.

Comment on lines +245 to +246
(if (instance? CancellationException e)
(cancellation-response resp req)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

Copy link
Member

@ericdallo ericdallo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 promesa really made it easier to understand :)
Feel free to merge/LMK when I can tag a new minor

@mainej mainej force-pushed the process-requests-in-parallel branch from 432d60f to d762cc6 Compare September 23, 2022 22:58
@mainej mainej merged commit 5fe5da7 into master Sep 23, 2022
@mainej
Copy link
Contributor Author

mainej commented Sep 23, 2022

@ericdallo merged. I have to sign off for the night, so no rush on making a release.

@mainej mainej deleted the process-requests-in-parallel branch September 23, 2022 23:05
@borkdude
Copy link

Promesa rules! Let me know when/if I can test a binary locally and where I can get it.

ericdallo added a commit to clojure-lsp/clojure-lsp that referenced this pull request Sep 24, 2022
* Process requests in parallel

Uses clojure-lsp/lsp4clj#25 to process request
in parallel, fixing typing lag and other performance problems documented
in #1240.

Fixes #1240.

* Turn on tracing temporarily

* Process requests in the future threadpool, not the pipeline-blocking threadpool

* Revert to running requests in common ForkJoin pool

* Turn off tracing

* Fix lint

* Bump lsp4clj

Co-authored-by: Eric Dallo <ericdallo06@hotmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Small freezes regression during editing on big buffers

6 participants