Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clojure-lsp processes left running/orphaned if VS Code is closed while the lsp server is starting #906

Closed
bpringe opened this issue Dec 29, 2020 · 23 comments
Labels
bug Something isn't working

Comments

@bpringe
Copy link
Member

bpringe commented Dec 29, 2020

I need to investigate this more, but I noticed my system's memory was high and when I checked my processes there were three java processes, all clojure-lsp.jar, using a good bit of memory.

image

@bpringe bpringe added the bug Something isn't working label Jan 4, 2021
@bpringe
Copy link
Member Author

bpringe commented Jan 4, 2021

It seems that if VS Code is closed while clojure-lsp is starting, the lsp process is not killed. If VS Code is closed after clojure-lsp has started, the lsp process is killed.

@bpringe bpringe changed the title Clean up lsp server correctly clojure-lsp processes left running/orphaned if VS Code is closed while the lsp server is starting Jan 4, 2021
@bpringe
Copy link
Member Author

bpringe commented Jan 5, 2021

Somehow this wasn't auto-closed. It's been released in 2.0.146.

@bpringe
Copy link
Member Author

bpringe commented Jan 8, 2021

This seems to be an issue again in 2.0.148, though I couldn't reproduce after fixing before 🤔. Not sure if it's a regression or just wasn't fixed correctly before. The changes that I thought fixed it are in the current release.

If I can't get this fixed from Calva's end, it may need to be fixed in clojure-lsp, such as ending the process when the client disconnects.

@bpringe
Copy link
Member Author

bpringe commented Jan 8, 2021

@ericdallo I noticed in the logs when the clojure-lsp is left running even after Calva is shut down, it shows "Shutdown" but not "Exit" -- the exit server function is not being called. Any idea what could be going on?

This only happens if VS Code is exited right when the clojure-lsp server begins to start. I think we've done about everything we can on Calva's side to tell the server to close. I even tried to explicitly send the exit request before calling client.stop() here.

I've referenced @borkdude's clj-kondo extension and he does the same things as us to ensure the server is shut down correctly, and the kondo server always seems to be killed when it should.

I wonder if something is going on with the clojure-lsp server that prevents it from exiting properly in this case (right at startup). The logs look like this when it doesn't shut down correctly (note it doesn't have an "Exit"):

INFO  clojure-lsp.main: Starting server...
INFO  clojure-lsp.main: ====== LSP nrepl server started on port 38909
INFO  clojure-lsp.main: Initializing...
INFO  clojure-lsp.crawler: Crawling dir /home/brandon/development/clojure-test/src
INFO  clojure-lsp.crawler: Crawling dir /home/brandon/development/clojure-test/test
INFO  clojure-lsp.crawler: Crawling dir other
INFO  clojure-lsp.crawler: Crawling dir src
INFO  clojure-lsp.crawler: skipping classpath scan due to project hash match
DEBUG clojure-lsp.main: :initialize 880 ()
INFO  clojure-lsp.main: Shutting down

What makes this a more typical scenario is that people will open one project and then quickly switch to another, like when VS Code automatically opens a project but they want another. In this case they have an orphanage clojure-lsp process now.

@bpringe
Copy link
Member Author

bpringe commented Jan 8, 2021

Realizing now that exit is a notification. I'll see tomorrow if I can manually send the exit notification right after the shutdown request from Calva. I think the vscode-languageclient is supposed to do this in the stop function, but maybe the VS Code instance is being exited before the shutdown response is received, so it doesn't send the exit notification.

@ericdallo
Copy link
Contributor

Yeah, exactly, never saw that behavior with emacs lsp-mode because even if server doesn't respond, it kill the process manually.
I can take a look tomorrow, check how other server handle this, but my guess is that the server is correct

@PEZ
Copy link
Collaborator

PEZ commented Jan 8, 2021

Not sure if this is at all relevant, but anyway. For the jack-in process we had to go through some special loops and hoops on Windows to kill it properly:

export function calvaJackout() {

@bpringe
Copy link
Member Author

bpringe commented Jan 8, 2021

@PEZ Thanks. What's the purpose of the (Thread/sleep 5000) before shutdown-agents?

@PEZ
Copy link
Collaborator

PEZ commented Jan 8, 2021

I don't know. 😄 The solution is copied from clojure-emacs/cider#390 (comment) (and since it also fixed the orphan java processes we had on Windows, w/o causing any other troubles, we have let it be.)

@bpringe
Copy link
Member Author

bpringe commented Jan 8, 2021

I've spent lots of time on this now and I'm running out of ideas from the client's perspective. My only ideas left are to see if we can heartbeat the client from the server, and have the server kill itself if the heartbeat fails.

Created this issue to see if there's anything else I can try, or if there's some bug upstream: microsoft/vscode-languageserver-node#726

@bpringe
Copy link
Member Author

bpringe commented Jan 8, 2021

For the record, I think I managed to also orphan the clj-kondo lsp server once too, so I think this may indeed be a bug in vscode or the client library.

@bpringe
Copy link
Member Author

bpringe commented Jan 9, 2021

Seems the only route left here after toying with things is modifying the server, unfortunately. I've verified again that the clj-kondo process does get orphaned sometimes too, as it seemed to happen multiple times today.

@ericdallo What would you suggest be done in the server? I can think of two things, one of which I'm not sure the best implementation:

  1. We call System/exit in the shutdown method instead of exit (as well as shutdown-agents). This worked well in my tests, but I know it doesn't follow the protocol, and maybe makes some usages not work, like if shutdown could be used to restart the server without killing the process. I know this is probably not a good idea.
  2. We add a heartbeat mechanism to the server. For example (just one idea), we return the launcher from run in clojure-lsp.main, and save it in an atom, then start continuous loop/process that checks if the client is still connected, like every few seconds, and if it's not, calls the server's exit.

I'm reluctant to push forward with making Calva rely more on clojure-lsp until this is fixed. Even though it's a small window of time in which this happens, it's right during a time window where people switch projects after opening VS Code, so I think it will be experienced fairly often, plus we've had a complaint already about it.

Another thought is if the server's performance is improved, this problem could potentially be eliminated, because it seems the processing at startup is what delays the shutdown call, which occurs so late after the request is made from the client, that the client process is killed before the server responds, it seems. (Happens faster if initialization is complete, which is why we don't see the problem then.)

Long rant! 😄

@ericdallo
Copy link
Contributor

I see you really tested a lot the possibilities, but I can't see this as an issue on the server since we use this way for years for vim emacs and other editors, I'd prefer we don't touch on that code since we can add some regressions and it works well for other clients.

With the #906 (comment), I believe this is really an issue on VSCode side, right?

@ericdallo
Copy link
Contributor

More info on how the protocol suggests that implementation here

@bpringe
Copy link
Member Author

bpringe commented Jan 9, 2021

I understand your reasoning, and I've read the shutdown and exit parts of the protocol, which is why I said my # 1 didn't follow the protocol.

I have one more path to explore from the client side, and that's starting the process in detached mode, as mentioned here, which may give us more control over killing the child process (though they mention vscode not killing the server there, it's worth exploring).

Maybe I'll get some good response on the vscode issue I created 🤞.

@PEZ
Copy link
Collaborator

PEZ commented Jan 9, 2021

I can't reproduce the problem in development. Maybe VS Code is killing the dev extension host differently? Using a regular VS Code window the problem is immediate and obvious.

@bpringe
Copy link
Member Author

bpringe commented Jan 9, 2021

Hmm.. I've reproduced consistently with the extension host, and verified I can in the dev branch too. I do find that it's consistent if killed immediately after the progress notification pops up. Could be an OS difference regarding the extension host though.

@bpringe
Copy link
Member Author

bpringe commented Jan 11, 2021

@ericdallo I've got some additional info on this here. It seems that while there is a bug in VS Code, the server is also expected to monitor the client process and gracefully exit if the process no longer exists (it crashed). The processId param of the initializeRequest can be used for this. VS Code sends the processId in the request (not sure if other clients do).

I've briefly looked around at what methods are available from Java to check if a process with a given pid exists, and it seems the Java 9 Process API allows a check by pid, but the Java 8 one does not. So it seems a solution that works with Java 8 would require using shell commands, and using the right ones for the host OS.

I wonder your thoughts on a good solution to this. Clojure-lsp could also try to ping the client with a notification or something, but I think checking if the client process is still alive from the processId in the initializeRequest would be the canonical way.

@ericdallo
Copy link
Contributor

Oh, nice I didn't see that on the LSP spec, good to know.
Yeah, I think the canonical way is to check the processId indeed, so maybe we need to check the old java8 way, or if have any Clojure way for that. I'd like to avoid using shell manually, but if we have no choice, we need to test it with Linux and Windows

@bpringe
Copy link
Member Author

bpringe commented Jan 11, 2021

Is this something you want to work on or would you like me to?

@ericdallo
Copy link
Contributor

Help is always welcome :)

bpringe added a commit that referenced this issue Jan 12, 2021
@bpringe bpringe mentioned this issue Jan 12, 2021
21 tasks
@bpringe
Copy link
Member Author

bpringe commented Jan 12, 2021

Fixed in clojure-lsp here

@bpringe
Copy link
Member Author

bpringe commented Jan 12, 2021

Not sure if the auto close is delayed or didn't work, but this fix is released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants