Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issues when adding lots of files #824

Open
tsmaeder opened this issue Oct 10, 2018 · 11 comments
Open

Performance issues when adding lots of files #824

tsmaeder opened this issue Oct 10, 2018 · 11 comments

Comments

@tsmaeder
Copy link
Contributor

When we import a new project into a che workspace, we add thousands of files, leading to thousands of "didChangeWatchedFiles()" notifications. Since the notifications are handled synchronously, this may delay processing of commands by minutes, leading to timeouts on the client side.
Handling change notifications in WorkspaceEvents seems to do many things which might not be strictly necessary. We should strive to optimzed this code path because it is used a lot.
As a more permanent fix, we should think about processing the notifications asynchronously and maybe even batching them: when adding or changing many files, subsequent calls are likely concern files close to each other in the file system tree.

@tsmaeder
Copy link
Contributor Author

The main use case for this is when importing a new project into our workspace. I was thinking about not reporting changes while we add a new project. The problem with that is that changes may occur while the project is being imported by jdt.ls. If we turn off changes, these changes will be lost :-(

@tsmaeder
Copy link
Contributor Author

Discussed with @fbricon that we could put the change notifications processing in the background and have a works queue. On the work queue, we could do optimizations like

  • collapse multiple events for the same uri (create+modify+delete == nothing, for example)
  • collapse events for directories and their descendants. For example, if we hit a "create" in a directory that is not known yet, we can do a refresh(infiniteDepth) on the directory and delete all pending change notifications on that directory.
    If we combine this with some proper waiting and batching (e.g. wait for some time and/or a certain queue size before processing events), we should get far in the "add project" scenario.

The second avenue of approach is optimizing the code path for the common case (i.e. not doing excessive work int the change handler, as proposed in the issue description.

@snjeza
Copy link
Contributor

snjeza commented Apr 2, 2019

I have imported the che project (https://github.com/eclipse/che) to Che 7 in the following way:

  • started minikube
minikube start --cpus 2 --memory 8192 --vm-driver=kvm2 --extra-config=apiserver.authorization-mode=RBAC --docker-opt userland-proxy=false --disk-size 50g
  • started che
chectl server:start
  • created Che 7 with 3g and the following configuration:
"attributes": {
    "editor": "org.eclipse.che.editor.theia:next",
    "sidecar.org.eclipse.che.vscode-redhat.java.memory_limit": "1536Mi",
    "plugins": "che-machine-exec-plugin:0.0.1,org.eclipse.che.vscode-redhat.java:0.38.0"
  }
  • started Che 7 without any projects
  • logged to a Che 7 container and created the following files:
    a) /projects/settings.xml
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <localRepository>/workspace_logs/maven2</localRepository>
</settings>

The Che 7 workspace uses the '/root/.m2' maven local repository that isn't persistent. I have used /workspace_logs/maven2.
b) .theia/settings.json

{
   "java.trace.server": "messages",
   "java.configuration.maven.userSettings": "/projects/settings.xml",
   "java.jdt.ls.vmargs": "-Dlog.level=ALL -noverify -Xmx1g -XX:+UseSerialGC -XX:+UseStringDeduplication -XX:GCTimeRatio=4 -XX:ActiveProcessorCount=1 -XX:AdaptiveSizePolicyWeight=90 -XX:+TieredCompilation -XX:TieredStopAtLevel=1",
   "java.configuration.checkProjectSettingsExclusions": false
}

c) downloaded java-0.43.0.vsix from https://raw.githubusercontent.com/snjeza/vscode-test/master/java-0.43.0.vsix and copied it to the /workspace_logs directory.

cd /plugins/org.eclipse.che.vscode-redhat.java.0.38.0.redhat_java/
rm -rf *
unzip /workspace_logs/java-0.43.0.vsix

You can also use the snjeza/che-plugin-registry image and the following configuration:

"attributes": {
    "editor": "org.eclipse.che.editor.theia:next",
    "sidecar.org.eclipse.che.vscode-redhat.java.memory_limit": "1536Mi",
    "plugins": "che-machine-exec-plugin:0.0.1,http://che-plugin-registry-che.<minikube_ip>.nip.io/plugins/org.eclipse.che.vscode-redhat.java:0.43.0:"
  }

d) added the che project to the Che 7 workspace

git clone https://github.com/eclipse/che.git

e) opened the /projects workspace in the Che Theia editor
Importing the che project takes 25 minutes on my machine when a local maven repository is empty. It takes a minute to open a theia workspace when using a persistent maven local repository. If a Java LS workspace isn't saved when it is closed,rebuilding will last 1-2 minutes. See #758 and redhat-developer/vscode-java#793.
Che 7 uses the /home/theia/.theia/workspace-storage//redhat.java/jdt_ws directory as a Java LS workspace which isn't persistent. Because of that, the workspace is imported again when stopping/running the Che 7 workspace.

java-0.43.0.vsix includes the following changes:

  1. org.eclipse.jdt.ls.core.internal.managers.ProjectsManager.fileChanged(String, CHANGE_TYPE) calls findFile twice in https://github.com/eclipse/eclipse.jdt.ls/blob/master/org.eclipse.jdt.ls.core/src/org/eclipse/jdt/ls/core/internal/managers/ProjectsManager.java#L254

  2. org.eclipse.jdt.ls.core.internal.JDTUtils.resolveCompilationUnit(URI) always creates a java element. I created only compilation units.

  3. org.eclipse.jdt.ls.core.internal.handlers.WorkspaceEventsHandler.createCompilationUnit(ICompilationUnit) ignores those units that aren't on the classpath

  4. WorkspaceDiagnosticsHandler is added before JDT LS is initialized when the initialization lasts more than five minutes. See JobHelpers.waitForInitializeJobs()
    VS Code 0.42.0 sends 4800+ the publishDiagnostics events. if we add WorkspaceDiagnosticsHandler after the initialization, VS Code sends 1080 events.

  5. Java LS watches changed files before it is initialized. We can't stop watchers that are statically added - https://github.com/redhat-developer/vscode-java/blob/master/src/extension.ts#L60. I have removed those watchers from vscode-java and added them dynamically to Java LS. VS Code sends 5800 didChangeWatchedFiles events when importing the che project. If watchers are registered after initializing Java LS, VS Code won't send any events. Related issue: file system watchers in vscode-java  #926

  6. The VS Code Java client adds the '/src/' watcher. That's why VS Code sometimes sends duplicate didChangeWatchedFiles events if a project has source folders within the src directory. I have skipped all watchers containing the /src/ string in their path.

  7. org.eclipse.jdt.ls.core.internal.managers.ProjectsManager.initializeProjects(Collection, IProgressMonitor) runs as a Java runnable which is not necessary because the isAutoBuilding property is false in that moment. I have removed it.

  8. Java LS downloads all maven sources artifacts. Che 7 uses an empty maven repository when importing a project. I have added a feature that enables downloading a source code when it is required.

  9. The maven importer imports all projects in one step - https://github.com/eclipse/eclipse.jdt.ls/blob/master/org.eclipse.jdt.ls.core/src/org/eclipse/jdt/ls/core/internal/managers/MavenProjectImporter.java#L153. In this way, Java LS requires a lot of memory. The che project contains 299 maven projects. VS Code 0.42.0 requires 2g of heap when importing it using an empty maven local repository. I have added the following feature:

@fbricon
Copy link
Contributor

fbricon commented Apr 4, 2019

So I've opened several sub-issues related to @snjeza's proposal:

@tsmaeder
Copy link
Contributor Author

tsmaeder commented Apr 5, 2019

I salute the efforts to make notification processing more efficient. However, on big ticket item that I don't see here is the offloading from the message handling thread.
One big problem with change notification is that it is blocking the handling of requests from the client, because the message queue is full of file change notifications.
So what's the plan on that one?

@snjeza
Copy link
Contributor

snjeza commented Apr 5, 2019

One big problem with change notification is that it is blocking the handling of requests from the client,

I think that is an upstream issue. See redhat-developer/vscode-java#867. The issue has been fixed in VS Code 1.33.

@tsmaeder
Copy link
Contributor Author

tsmaeder commented Apr 8, 2019

I think that is an upstream issue.

From reading the issue, I don't think this is the same: the issue mentions cloning the project, then opening the cloned project folder. The scenario I'm thinking of is adding a project to an already open folder, thus creating boatloads of "file added" notifications.
Since the notifications are handled on a single thread, no requests are handled until the last of the notifications is processed.

So the test case for this issue is really this:

  1. Create a Che 7 workspace with Java enabled
  2. Make sure Theia is opened on the "/projects" folder
  3. git clone "che" into /projects

I don't see that in your test script, correct?

@snjeza
Copy link
Contributor

snjeza commented Apr 8, 2019

@tsmaeder It is related to #901. I will test it.

@nicobao
Copy link

nicobao commented Dec 9, 2020

Hi,
Thanks for your work on eclipse.jdt.ls! Thanks to you I can use vim for professional Java development 👍
When using eclipse.jdt.ls with neovim, I often experience very high CPU usage (300%) for quite a long time, especially on opening moderately large projects, leading to my machine making lots and lots of noise. So much that I feel the need to close the files.
I suppose it is related to these issues, it is a known problem and that you are already working on it?

@fbricon
Copy link
Contributor

fbricon commented Dec 9, 2020

@nicobao you can try to increase the heap in the java.jdt.ls.vmargs setting (it's 1GB by default). But I have no idea how to do that from neovim though.

@nicobao
Copy link

nicobao commented Dec 9, 2020

Thanks for the tip. I changed from the default:
"java.jdt.ls.vmargs": "-noverify -Xmx1G -XX:+UseG1GC -XX:+UseStringDeduplication"
to
"java.jdt.ls.vmargs": "-noverify -Xmx6G -XX:+UseG1GC -XX:+UseStringDeduplication"

I will see and let you know if I run into performance issue again.

For info, configuring LSP on neovim is as easy as with vscode. If you use coc.nvim, you can modify ~/.config/nvim/coc-settings.json (or calling :CocConfig from inside vim) and then modify the config with the same parameters.
Neovim develop branch has a built-in LSP client - and things work differently from coc.nvim but I don't use that yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants