Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use NIO to set file permissions #302

Closed
laeubi opened this issue Dec 19, 2022 · 38 comments
Closed

Use NIO to set file permissions #302

laeubi opened this issue Dec 19, 2022 · 38 comments

Comments

@laeubi
Copy link
Contributor

laeubi commented Dec 19, 2022

Currently it seems we have a native wrapper (libunixfile) to set file permissions. As the NIO API supports this as well it might be a good idea to investigate if this is still required or can be replaced by plain java.

@jukzi
Copy link
Contributor

jukzi commented Jan 25, 2023

If working on this please test the performance. Especially on windows i know jdk does care about more attributes (i.e. slower) then what is needed in eclipse.

@laeubi
Copy link
Contributor Author

laeubi commented Mar 6, 2023

I have looked a bit into this now and it seems there are already NIO variants implemented but currently disabled with a comment that on linux this is faster (it seem to use <sys/stat.h>).

So this issue is actually "fixed" but I wonder if/how one could leverage the code, I doubt no one would enable a system property just for "using nio".

Question is, if platform code is really faster, can we derive a test-case to report this to the Open-JDK? Its actually strange that the JVM is slower as a custom impl with additional JNI overhead...

@akurtakov
Copy link
Member

Can you please point to this comment? At some point @iloveeclipse was looking into the filesystem fragments IIRC.

@iloveeclipse
Copy link
Member

Do you mean this: 6b66a8c and https://bugs.eclipse.org/bugs/show_bug.cgi?id=530868?

@iloveeclipse
Copy link
Member

If yes, that testing was done on Java 8, 5 years ago.

It could be, the Java 17 is now faster, but need to be tested.

We have seen both performance improvements and performance drops in low level code in recent Java releases, so one should do carefull testing before changing this basic code.

@laeubi
Copy link
Contributor Author

laeubi commented Mar 7, 2023

I mean this comment here:

* Please notice that the native implementation is significantly faster than the non-native one.
* The BenchFileStore test runs 3.1 times faster on Linux with the native code than without it.

so actually there is no need to change the code but there is a system property:

boolean nativesAllowed = Boolean.parseBoolean(System.getProperty("eclipse.filesystem.useNatives", "true")); //$NON-NLS-1$ //$NON-NLS-2$

to switch to NIO implementation that itself has three "flavors", default, posix and dos ...

The main benefit I see here would be that we then can completely drop the native impl+fragments making the relng easier, no need for special "binary" repo, special builds. What I can see at the moment is that the specific implementations mostly pass down the raw file name and NIO needs to lookup a Path first, but given that we do use Path internally (instead of String) this might improve the performance but of course this will only be beneficial if we use NIO all the way.

@laeubi
Copy link
Contributor Author

laeubi commented Mar 7, 2023

By the way I run the BenchFileStore once with default settings (0,8 sec) and once with eclipse.filesystem.useNatives=false (4,2 sec) so I think it is still true that NIO is slower, but given that the perf test do 150.000 repeating operations I think this is actually negligible ...

For me it was a POSIX filesystem probably some others can repeat the test and post results here.

@iloveeclipse
Copy link
Member

but given that the perf test do 150.000 repeating operations I think this is actually negligible ...

Why that? We have projects that contain over 100.000 files, so it would be "just" one full build. Or do you mean, benchmark should run 1.000.000 times or more to be representative?

@laeubi
Copy link
Contributor Author

laeubi commented Mar 7, 2023

@iloveeclipse if your full build runs in under a second so you can notice a performance of the FileInfo fetching from 1 sec > 4 sec then it might make a difference, most of my builds in that scale run more in the range of minutes than seconds so at least I won't spot the difference here :-)

Also I hope that build is doing something more useful than query all day long the FileInfo ... so to be representative one would need to actually capture how often this method is called in a usual life-cycle of the IDE, but I guess it is hard to obtain but obviously there where sometimes a case where someone has noticed a difference (network device?).

That's why it would be good to have more measures from different users/systems.

@laeubi
Copy link
Contributor Author

laeubi commented Mar 7, 2023

By the way I debugged that part of NIO a bit and in the end it just calls fstat what should not be much of a difference here what the native wrapper did, but the VM is performing some security and sanity checks and for the "does not exits" an exception is thrown so I don't think the operation itself is slower, but the additional checks and lookups simply accumulate here.

@laeubi
Copy link
Contributor Author

laeubi commented Mar 7, 2023

I now have made a check with just comment out the "nonexistingstore" part and then NIO requires 0,6sec and "native" requires 0.4sec, numbers vary a bit from run-to-run, so I think NIO is beating by the fact that for a non existing file the exception handling is too expensive and an in the "good case" performance is actually comparable.

@akurtakov
Copy link
Member

@laeubi Have you considered measure the same with Panama impl? It would be interesting if we can preserve the speed for all cases while at the same time droping the fragments if/when we have it by default.

@laeubi
Copy link
Contributor Author

laeubi commented Mar 7, 2023

Not yet as it requires additional implementation and the JDK already has a native binding for this. I also doubt that checking existence of hundreds of thousands non existing files is really a use-case we should optimize, so probably the effort is invested better elsewhere.

@laeubi
Copy link
Contributor Author

laeubi commented Mar 7, 2023

I now has raised the level (but still disable the "do not exits" case) to 50000 loops and 300 repeat ( = 15.000.000 access) and the result for posix is:

  • native: 37 seconds
  • nio: 48 seconds

so for 15 million of access we have about 10 seconds of difference making it about 600ms "speedup" per million access ...

Just to make clear (if it wasn't) this is all not about accessing the file content this is only used to access file attributes (e.g. is it a file/directory, how large is the file, when was it modified and so on) ...

@laeubi
Copy link
Contributor Author

laeubi commented Mar 7, 2023

I have now pushed my adjusted testcase here:

so if one want to repeat the test do the following:

  1. Run as plugin test, you will get a time that should use the "native" by default
  2. Now edit the runconfig and add -Declipse.filesystem.useNatives=false to the VM arguments, this will give a time for the "nio"
  3. post results here with information about your file-system used (you can put a breakpoint in the LocalFileNativesManager to se what is selected)

I think what can be done is that we change the default for eclipse.filesystem.useNatives from true to false in this release and see if there is any complains because of this. As it is a system-property, it is really easy to change the config in the case anything goes really bad.

Then on the next release we can remove the native bindings from being included in features and build, but keep them in the source.

If then after another release no one complains I think the can safely deleted.

@iloveeclipse
Copy link
Member

but still disable the "do not exits" case

What you are doing now is adopting benchmark in the way that it delivers "right" results.

This particular change of the benchmark #348 is incorrect, because it ignores the fact that many Eclipse IO operations check file existence and might trying accessing not existing files.

Also I hope that build is doing something more useful than query all day long the FileInfo ...

This is a benchmark for dedicated interface, it is by no means a generic Eclipse performance bench case.
Quote: "Benchmarks basic operations on the IFileStore interface". It doesn't claim to be a all purpose generic Eclipse IO performance benchmark.

so to be representative one would need to actually capture how often this method is called in a usual life-cycle of the IDE,

For that purpose one can create a "generic" benchmark and test the "representative" case. Note : there is no "usual life-cycle of the IDE", it always depends on what the users are doing with "IDE". File I/O is one aspect of the performance, and depending on how exactly and how much the applications use file I/O, it may or may not be important for a specific "life cycle".

but I guess it is hard to obtain but obviously there where sometimes a case where someone has noticed a difference (network device?).

I assume all file IO related benchmarks will show completely different picture on NFS (compared to local file system), and that picture will differ for every different NFS implementation / configuration / client/server pair.

Even using different local file system implementation on same workstation may show a big difference (ext3 vs ext4 vs xfs vs btrfs) and that may vary for different benchmarks. Add Windows with famous virus scanner to the equation, and you will see there is no black / white decision here.

So while I agree that maintaining native code is complicated, but just throwing it away because of that reason is not acceptable (as long as we don't have breaking build because of that).

@laeubi
Copy link
Contributor Author

laeubi commented Mar 7, 2023

What you are doing now is adopting benchmark in the way that it delivers "right" results.

No I just don't mangle different things to get better comparison... One might argue the benchmark was specifically crafted to show a speedup for the native approach ... the benchmark makes it even worse in repeating to fetch the same file info three times what increases the penalty here again, where a true application would most probably only fetch it once.

So there is no "right" result, there is only an observation that NIO is not slower in general, but slower for the case that an item does not exits, and this most likely due to the overhead with exceptions, so the designer of the JDK has though about query for file attributes for an non existing file to be exceptional and not usual ...

Eclipse IO operations check file existence and might trying accessing not existing files.

Then something is wrong at Eclipse I/O ... what is the use-case for operating on non existing files? For sure one needs to check if e.g. a .project file is there or some settings, but this are operations cached untill refresh time an not something one do in a repeated busy loop as the sole action.

This is a benchmark for dedicated interface, it is by no means a generic Eclipse performance bench case.

Yes and the "general purpose" for me is, that most of the time when I access a file I do so because I think it actually exits if you look at the (N)IO API everywhere it is assumed that a non existing file is an exceptional case and therefore an exception is thrown.

So while I agree that maintaining native code is complicated, but just throwing it away because of that reason is not acceptable

Just keeping it because it is there neither ... the natives have served a purpose when Java is not offering an alternative, but there is not only effort in maintaining it (e.g. new architectures arriving on the horizon), this currently also bypass any security checks of the JVM.

@laeubi
Copy link
Contributor Author

laeubi commented Mar 7, 2023

I have now improved the benchmark in a way that is tests both branches (exits / notexits) and also native vs nio:

Here are the results for posix on my system:
grafik

So NIO is 1.7 sec vs 1.4 sec here for if the file exits and 1 sec vs 11 sec for non exits, so NIO is ~10x slower when the file is non existent but one would only notice that for really many many items.

@szarnekow
Copy link

Here are the results for posix on my system

Can you please provide more information? What kind of disk (iops) / OS / file system do you use to reproduce these numbers?

@laeubi
Copy link
Contributor Author

laeubi commented Mar 7, 2023

It is a Debian 9 with PCI-e SSDs attached and I use an encrypted ext4 file system. But I think it does not matter much as we are comparing relative measures here, so maybe it will be faster/slower on another system but the relation could be different and of course the penalty for throwing an exception might getting lower if the overall disk access is slower.

What would be interesting to execute the tests in parallel but I think the perf-test do not support this.

@szarnekow
Copy link

szarnekow commented Mar 7, 2023

I'm not exactly sure what the goal is here, but recently we figured in a proprietary project, that file access on windows is dreadfully slow via java.nio.Path especially for missing files. Might be worthwhile to measure on Windows, too. (personally I'm on Mac which behaves almost the same as Linux in that regard)

@jukzi
Copy link
Contributor

jukzi commented Mar 7, 2023

On Windows the problem is not the the java layer, but the OS filesystem. That's the same on linux: it depends far more on the filesystem used then on the little java overhead.

@laeubi
Copy link
Contributor Author

laeubi commented Mar 7, 2023

@szarnekow @jukzi thats why I have enhanced the perf test to compare both and to compare existing file vs non existing file, so if you can take a short time and try out

and post your numbers here it would be useful.

@iloveeclipse
Copy link
Member

FYI: we track performance for the "build" use case in our product with a standardized "real life" application test.

We are executing this test on regular base to track performance regressions if we switch to new Eclipse or Java or Xtext versions or change runtime in any other significant way. So far this test was reliable enough for us to find various bugs in both Eclipse and Java.

We executed this test yesterday overnight 100 times for each configuration (with/without natives).

The test does a full build of our medium size application specific project (~290.000 files overall, ~1.500 Java files and ~50.000 DSL files) which involves both Java and Xtext builders (we use both Java and few DSL languages in our application). The "full build" use case runs in 232 seconds with native (Linux) IO and 243 seconds with Java NIO.

The performance regression is 4,7% and standard deviation 1,5%, so the result should be significant and not just some measurement jitter. Test was done on our regular RHEL 7.9, 16 core Xeon workstation with 128 GB RAM, SSD, xfs, max heap of 12 GB, using OpenJDK Java 17.0.4 and 4.25 Eclipse platform. I think there were no significant changes in related areas in platform master, so we can consider this test setup as "good enough" for the discussion here.

Assuming we would get such test results on our next regular platform update, it would be a major issue for us and we would be forced to investigate and fix the regression.

With that, from our point of view we can't switch to Java NIO yet. In order to do so, either Eclipse NIO provider should be improved or Java or both.

Note: this result of ~5% degradation in the "real life" "build" use case would probably be much worse on Windows, because file IO on Windows takes way more time compared to Linux (so regression relative to the application time will be also higher), but we can't run our test on Windows simply because our application is not portable and supports only Linux.

For those who would like to test "pure Java build" on Windows in a "standardized" comparable configuration with different project sizes, see https://github.com/iloveeclipse/java-project-generator where one can configure how big test Java project should be. That tool can generate "real" Java project of any size (in reproducible way) with cross references, inheritance etc.

@laeubi
Copy link
Contributor Author

laeubi commented Mar 8, 2023

@iloveeclipse thanks for sharing these numbers, so your build takes a total of 3:52 seconds and NIO adds 0:11 (11 seconds) so even if that is measurable i really wonder if one can really qualify this a major regression but actually not that bad as it is indicated in some places.

So some measures from Windows Users would be good.

@iloveeclipse
Copy link
Member

Yes, for us 5% is both measurable and a big regression. The time our users spend waiting for build adds to the application test time and this time where our tester (limited HW resource) is blocked literally costs lot of dollars for our customers.

@laeubi
Copy link
Contributor Author

laeubi commented Mar 8, 2023

This of course always depends, but for example the Tycho build (about ~ 2000 java files) runs about 4 minutes in serial mode, but runs for 20 seconds(!) in parallel build mode, so these is actually the thing where I would talk about a big difference. So one probably can save much more in your build than a few seconds with using native I/O but as said if its measurable and important to you than of course it is valid to be considered.

@iloveeclipse
Copy link
Member

So one probably can save much more in your build than a few seconds with using native I/O but as said if its measurable and important to you than of course it is valid to be considered.

Thanks for the hint, as you imagine we have a non trivial application here and we are not burning the money of our customers just because we've missed the opportunity to change some config file and to enable parallel build. Also note, doing the build in parallel will not reduce required IO work (and not reduce the observed overhead for that IO work), which is the main point of discussion here.

@laeubi
Copy link
Contributor Author

laeubi commented Mar 8, 2023

as you imagine we have a non trivial application here and we are not burning the money of our customers just because we've missed the opportunity to change some config file and to enable parallel build

I didn't implied anything like this, I just gave an example about what I consider a big performance drop ...

@laeubi
Copy link
Contributor Author

laeubi commented Mar 8, 2023

@HannesWell (or maybe @jukzi ) if anyone is on windows please share some numbers of the perftest so we can see if/how this works out on windows, the issue itself can be closed I think as obviously there already is a NIO impl, it is just not enabled...

@laeubi laeubi closed this as completed Mar 8, 2023
@jukzi
Copy link
Contributor

jukzi commented Mar 8, 2023

On windows there is a single bottleneck during build for us: writing the .class files. Everything else happens in other threads and is currently faster then that.

@HannesWell
Copy link
Member

@HannesWell (or maybe @jukzi ) if anyone is on windows please share some numbers of the perftest so we can see if/how this works out on windows, the issue itself can be closed I think as obviously there already is a NIO impl, it is just not enabled...

Which test are you referring to exactly?

Regarding if 5% are much or not: In my use-cases I would probably not start optimizing to get 5% improvements, I can probably optimize other things with greater benefit. But I can understand that if runtime is really important a 5% overall increase in runtime can be considered a regression. Although from a code readability/understandability perspective I would also prefer to have a simpler solution.

In general I also made the (subjective) observation that the file-system is slower compared to Windows. And if NIO is fetching more information than Eclipse needs and it can therefore be faster to use a custom method that fetches less, maybe we should suggest to the OpenJDK to either provide methods where one can specify what to fetch exactly or to only fetch information lazy (if the over-head is not greater then)?

@tjwatson
Copy link
Contributor

tjwatson commented Mar 8, 2023

But I can understand that if runtime is really important a 5% overall increase in runtime can be considered a regression.

Remembering the early days of Eclipse development, a 5% reduction in performance would be a major regression. Also, considering this was a "real world" scenario, I doubt that is the true full regression as far as the actual isolated NIO calls. I assume a large amount of other stuff is going on in the "real world" scenario that pad the overall measured numbers. In other words, I would expect the NIO calls, if they were benchmarked in isolation in this same environment, would be quite a bit higher than 5% regression.

@iloveeclipse
Copy link
Member

would expect the NIO calls, if they were benchmarked in isolation in this same environment, would be quite a bit higher than 5% regression

Sure, they are 3x higher, but the original benchmark was considered not relevant and there was a question regarding "real world" scenario, which is what we've answered.

@iloveeclipse
Copy link
Member

Remembering the early days of Eclipse development, a 5% reduction in performance would be a major regression.

For us (Advantest) this is a major regression. I believe everything above 1% is considered as serious, above 3% as critical and 5+ % is a blocker, but that applies to any performance aspect, not just file IO.

@tjwatson
Copy link
Contributor

tjwatson commented Mar 8, 2023

Remembering the early days of Eclipse development, a 5% reduction in performance would be a major regression.

For us (Advantest) this is a major regression. I believe everything above 1% is considered as serious, above 3% as critical and 5+ % is a blocker, but that applies to any performance aspect, not just file IO.

I agree, letting just a few things like this go in and we all of a sudden see +20% reduction in performance!

@HannesWell
Copy link
Member

But I can understand that if runtime is really important a 5% overall increase in runtime can be considered a regression.

Remembering the early days of Eclipse development, a 5% reduction in performance would be a major regression. Also, considering this was a "real world" scenario, I doubt that is the true full regression as far as the actual isolated NIO calls.

Today we often don't have the capacity to optimize that far and have to be happy that things are at least working. :/ Anyway if we have a working faster solution we should keep it then if there are users for which it is important.
Nevertheless NIO can also be a major performance boost. For example using Files.copy(Path, Path) is from my experience magnitudes faster (at least on Windows) than the old, pre NIO approach to copy FileInput/OutputStreams.

@laeubi
Copy link
Contributor Author

laeubi commented Mar 9, 2023

Sure, they are 3x higher, but the original benchmark was considered not relevant

I didn't claimed it is irrelevant, what I said it that it has put focus on the case of non existing files and then NIO seems slower than it actually is and it is hard to compare, I have now improved it here:

So now it gives (for me) much better results that are at my machine:

  • for existing files there is only a small gap
  • for missing files NIO is 10x slower

and there was a question regarding "real world" scenario, which is what we've answered.

This was a different context, you have claimed that most of the time an application is querying for non existing files what seems not realistic to me and is not answered by your measure, still its good to have some numbers from real build, so I don't understand why this again turns into a quite aggressive tone (might be a personal feeling).

Performance is important but its not the only measure, otherwise we should instantly ban any Exceptions (slow), any lambdas (proven to be slow on the large scale as well), maybe most stream usage and even collections->array and rewrite each and every method that issues a system-call.

Also if one never tries or questions things nothing will ever change, so its good from time to time check if previous assumptions still hold true.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants