Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a RefreshProvider base on the NIO API and enable autorefresh for all operating systems except MacOS #299

Open
laeubi opened this issue Dec 16, 2022 · 17 comments
Labels
future ideas Improvements that are currently not planned or can not be implemented with the current architecture

Comments

@laeubi
Copy link
Contributor

laeubi commented Dec 16, 2022

Currently Eclipse do not refresh resources by default, and even if enabled, only Win32 has support for native events.

Java contains an own platform independent API to watch local files systems. We should reuse this whenever possible as it is likely better maintained, optimized and improved than everything we can offer ourself.

Given that all operating systems except MacOS seem to support native events, we should even enable this by default (except for MacOS), one can even think about simply checking if PollingWatchService is used (e.g. by create a dummy one and check the classname).

@mickaelistria
Copy link
Contributor

Big +1 here.

@vogella
Copy link
Contributor

vogella commented Dec 16, 2022

Big +1. Just to consider, IIRC a running Maven build did render the ide useless as the notifications were triggering many updates in the ide (e.g.builds etc)

@laeubi
Copy link
Contributor Author

laeubi commented Dec 16, 2022

Just to consider, IIRC a running Maven build did render the ide useless as the notifications were triggering many updates in the ide (e.g.builds etc)

I think this should be handled on a different level, strategies might include to cancel running builds if the become useless, not triggering (automatic) builds more often or to condense repeating updates. I think it is even fair to collect an resource-refresh during a build and trigger the refresh afterwards.

@iloveeclipse
Copy link
Member

Not sure if this is changed, but in Java 8 we've evaluated that for our internal use and found a big black hole in the API: WatchService doesn't monitor files in sub directories of monitored directory, and it is supposed to explicitly add watch service recursively to every sub directory in order to monitor really entire tree.

Not sure how this would work for monitoring any decent project, not saying about multiple hundreds of projects with lot of files, because in order to watch them, the entire forest need to be traversed at startup and for every subdirectory registration needed, and of course all subtree creation/move etc need another traversal with unregister/register etc. I'm pretty sure this can't scale for bigger workspaces.

Here is the example implementation from Oracle: https://docs.oracle.com/javase/tutorial/essential/io/examples/WatchDir.java
Note the Map<WatchKey,Path> keys; - this will contain every directory in the workspace... Not even sure if that will even hit some OS limits (check google for "User limit of inotify instances reached") for workspaces over 100.000 files.

Also not sure about workspaces on NFS, see javadoc of WatchService:

If a watched file is not located on a local storage device then it is implementation specific if changes to the file can be detected. In particular, it is not required that changes to files carried out on remote systems be detected.

So in case someone going to implement that, this should be at least optional or configurable which algorithm should be used for watching.

we should even enable this by default

Let see if that works / scales first?

@laeubi
Copy link
Contributor Author

laeubi commented Dec 16, 2022

Not sure how this would work for monitoring any decent project, not saying about multiple hundreds of projects with lot of files,

Currently The MonitorManager registers all resources of interest to be watched, could be files, folder, ... and there is already a Win32 native impl that also only watches folders so I think at laest it should work by design

Let see if that works / scales first?

The Win32 impl was added 15 years ago, so what are the measures of this "see if that works / scale" ... there should be a huge amount of the no? :-)

Honestly I don't think we will get any competitive measure/feedback here for a feature disabled by default... Users will simply not understand why they need to enable this. On the other hand if there are issues and one can disable this for the ones with millions of (physical) folders in there workspace sounds valid to me.

I'm using the WatchService in my own projects all the time, its a bit cumbersome to setup/use but seems to work quite fine.

@jukzi
Copy link
Contributor

jukzi commented Dec 16, 2022

windows supports recursive updates readdirectorychangesw bWatchSubtree
https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-readdirectorychangesw
Linux does not support recursive update:
https://man7.org/linux/man-pages/man7/inotify.7.html
Inotify monitoring of directories is not recursive

It does not matter if you use the JRE wrapper around it or eclipse's. The JRE implementation is not better but only provides the least common subset.

@laeubi
Copy link
Contributor Author

laeubi commented Dec 16, 2022

It does not matter if you use the JRE wrapper around it or eclipse's.

Would you mind update/modernize the Win32 part of Eclipse in this regard? At least the javadoc claims:

/**
 * A monitor that works on Win32 platforms. Provides simple notification of
 * entire trees by reporting that the root of the tree has changed to depth
 * DEPTH_INFINITE.
 */

so for me it sound as any change simply refresh the whole tree with infinite depth, that do not sounds very efficient...

@laeubi
Copy link
Contributor Author

laeubi commented Dec 19, 2022

laeubi added a commit to laeubi/eclipse.platform that referenced this issue Dec 20, 2022
@laeubi
Copy link
Contributor Author

laeubi commented Dec 20, 2022

@iloveeclipse @jukzi thanks for pointing out the differences and pitfalls regarding linux/windows. I have now made a little prototype and checked the limit on my system that seems 65.536 watches and if I understand correctly this is one per file so it seems this limit can really ruin your day with large workspace ... I assume mac would do this very similar.

So the question is how one might can still leverage this but not hitting the limits:

  1. Maybe eclipse launcher can raise the limits? Can a user do this or is admin permission required?
  2. One could thing about only listen to "interesting" file, e.g skip everything starting with a dot.
  3. Maybe one should only watch source folders? Or even only projects that are currently expanded (but that information is not really accessible)
  4. Have a counter and if w reach a critical mass fallback to polling, also here probably one can have a metric what are "more interesting" parts to watch?
  5. Make this part more accessible to the user code and let the application decide what to watch, e.g only watch the project that is currently rebuild?
  6. ...

@jukzi
Copy link
Contributor

jukzi commented Dec 20, 2022

I don't see any reason to change something when there is no solid out of the box improvement. The OS APIs seem to be for usecases that someone watches a single folder, but not to keep some big tree synchronized. Eclipse is fine with on-access refresh + if a special builder builds something direct on the filesystem he can just call refresh for his target folders afterwards.

@iloveeclipse
Copy link
Member

I'm trying to understand ROI and complexity of a platform dependent solution that should work reliably on three different platforms, scale, don't introduce memory and resource leaks, have tests etc.

Just curious, what exactly is the goal here? Have immediate refresh of any file in the workspace? To be able to do ... what exactly, not covered by current implementation? And how "immediate" would differ from polling solution available now, considering users will most likely not notice a difference between 100 and 500 milliseconds anyway, especially the build that would be triggered could take minutes?

@HannesWell
Copy link
Member

Just curious, what exactly is the goal here? Have immediate refresh of any file in the workspace? To be able to do ... what exactly, not covered by current implementation? And how "immediate" would differ from polling solution available now, considering users will most likely not notice a difference between 100 and 500 milliseconds anyway, especially the build that would be triggered could take minutes?

The motivation for this proposal comes from M2E where we are try to improve the user experience. One big problem we have is that currently for most Maven Plugin's and their goals a 'connector' is required to run the mojo in the IDE. A connector is a plug-in that knows how to handle a particular Maven Plugin/goal.
But that concept scales badly since there is more or less a 1:1 mapping of M2E-Connector to Maven-Plugin and those connectors have to be maintained (which often does not happen and then M2E or Eclipse is blamed for being bad when actually it is someone else fault). It is also possible to built in IDE-support (in general not only for Eclipse) into Maven Plugins, but not the entire maven world is enthusiastic about this.
Since most of the time a connector only tells Eclipse to execute the Maven-Plugin and refreshes the files/directories that were modified in the execution, we thought about to just execute Maven plug-ins in the IDE by default and to detect the files the execution changed, which lead to this proposal.

A Maven-Plugin can modify any file directory in the project. Some only compile java source-files to bytecode into the target-folder, but some also generate java sources into a source folder or do something else that modifies files any where in the project. Simply refreshing the entire project multiple times in one build (since usually multiple mojos are executed for a project) is something that for certain does not scale as well.

That's the motivation behind this issue. :)
And since this is a quite basic requirement, implementing this directly into eclipse-platform and not only in M2E is probably the better way.
Actually I was optimistic that the File-Watcher would help a lot here, but the limitations mentioned by Jörg and Andrey make this much more complicated. Nevertheless thank you for mentioning them so early!

@iloveeclipse
Copy link
Member

... and enabling polling option is not an option because of ...?

@jukzi
Copy link
Contributor

jukzi commented Dec 20, 2022

Fix the root: maven should disallow to write to arbitrary directories.That would also be a requirement for incremental maven builds - like gradle. Maven plugins should only write to a specified outputDirectory, m2e would then need to update the outputDirectory only.
like <configuration><outputDirectory>mydir

@HannesWell
Copy link
Member

... and enabling polling option is not an option because of ...?

I suspect that polling/refreshing the entire project maybe 10 times per one build of it will cause a huge performance regression. But I'm happy to be proofed wrong.

Fix the root: maven should disallow to write to arbitrary directories.That would also be a requirement for incremental maven builds - like gradle. Maven plugins should only write to a specified outputDirectory, m2e would then need to update the outputDirectory only. like <configuration><outputDirectory>mydir

If you can convince the Maven devs to make that change I'll pay for all your drinks at the next EclipseCon. 😃
But to be serious, we are definitely not in the position to change that in Maven and there are good reasons to not mandate it. As said before, for example some plugins generate java sources based on some model files. And such generated sources are usually not generated into the target folder.

5. Make this part more accessible to the user code and let the application decide what to watch, e.g only watch the project that is currently rebuild?

For me this is currently the most promising solution. Register the watchers for all directory of a project when starting to build it and unregister them when a project's build has completed. This might cause a little overhead but I expect this to be faster then polling since the file-tree of a project is only traversed once/twice per build and not multiple times.

@laeubi
Copy link
Contributor Author

laeubi commented Dec 21, 2022

I think some things geeing confused here and even addressing wrong parties, so I like to structure this a bit

Motivation

There was some demand to watch for changed files (and in general it doesn't matter whats the reason for this see below) and then @mickaelistria pointed out that Eclipse already has a facility for this but it is disabled by default, because it uses polling on some OS, this then raised the question if not for some OS we could use a native approach that is even plain Java and so this ticket was created and investigated.

Why are files updated outside the IDE

Mostly this happens with any external process running, this can be a maven build, this can be issuing git command, gradle builds, download things from the internet, whatever... so we effectively can't "fix" them all.

Whats the problem with eclipse

Even if we assume that Eclipse knows that a external command was run, in the case of m2e user has clicked on a pom and choose "run as maven build" or the user is using an internal terminal (what I was told is common for vscode + inttelij users) because of the Resources Abstraction in Eclipse, even if I can connect this run with a particular project and/or root folder, I can not "just" refresh this, because the folder itself might be represented by multiple IResources in the workspace or even any subfolder and/or file of this might have a "twin". Again nothing we can solve in the individual tools.

Conclusion

If we decide polling is fine, as I assume no one is willing to enhance linux/mac in a way that it can watch folders less resource intensive, we should enable it by default as in that case at least "sometime" in the future changes will be detected. For Windows we could try to enable this by default as Windows is still a large userbase. Beside of that there should be a way for plugin-code to know if auto-update is enabled to probably act accordingly, but I think this could better be dedicated tickets and this one can just be kept as a reminder so maybe in the future if linux and/or java changes in this regard we can further work on this.

laeubi added a commit to laeubi/eclipse.platform that referenced this issue Dec 21, 2022
@laeubi laeubi added the future ideas Improvements that are currently not planned or can not be implemented with the current architecture label Dec 21, 2022
@laeubi
Copy link
Contributor Author

laeubi commented Dec 21, 2022

Just in case we want to review this in the future here is a first plain implementation on this branch:

I have tested this with a "flat" project hierarchy and this instantly updates files (adding / removal) and editor (file changes) made outside the IDE. It is currently missing a way to detect if a resource is mapped multiple times where currently even more watchers would be produced than strictly required.

So this could still be an option for small workspaces, but then we would need some kind of configuration option to the user that:

  1. Instead of a boolean flag have to select a provider, and probably show some more information to the user what the provider does
  2. probably have a way to detect the user limit and disable the provider if the workspace is "too large" (how ever this is determined, don't know if Eclipse has some way to count all IResources ...
  3. ...

but for now I think I'll leave this alone as a POC if no one else step up to help further with this ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
future ideas Improvements that are currently not planned or can not be implemented with the current architecture
Projects
None yet
Development

No branches or pull requests

6 participants