Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generated Report designer 4.9 start is very slow #757

Closed
hvbtup opened this issue Nov 29, 2021 · 20 comments · Fixed by #761
Closed

Generated Report designer 4.9 start is very slow #757

hvbtup opened this issue Nov 29, 2021 · 20 comments · Fixed by #761
Milestone

Comments

@hvbtup
Copy link
Contributor

hvbtup commented Nov 29, 2021

I tested the Report Designer All-in-One (which I compiled myself) from the 4.9 master on 2021-11-26.

It works fine once it is started, however I noticed that starting the all-in-one-designer takes 2-3 minutes on my machine (Core i5, 8GB RAM, SSD) with high cpu on Windows 10 64bit.

This is before the progress bar shows.

The 4.2.1 all-in-one-designer only takes a few seconds to start on the same machine.

Colin Sutton reported the same issue on the birt-dev mailing list with the downloaded eclipse4.9-20211127, thus I don't think that something is wrong with my build process.

The second start is also slow.

@hvbtup
Copy link
Contributor Author

hvbtup commented Nov 29, 2021

I used sampling with JVisualVM during the startup (and stopped when I thought that should be enough).
This is the output:
grafik

I'm not an expert but to me it seems the "Refresh Thread" is causing most of the CPU load.

Opening it, it looks like this:

grafik

grafik

From here on, I'm lost.

@hvbtup
Copy link
Contributor Author

hvbtup commented Nov 29, 2021

A quick search for "org.apache.felix.resolver.ResolverImpl checkConsistency slow" finds https://issues.apache.org/jira/browse/FELIX-6140 which was closed only this year and https://issues.apache.org/jira/browse/FELIX-5648

Is someone able to take a look at this?

I'm running out of time for this week and I'm definitely not an OSGI expert.

@wimjongman
Copy link
Contributor

Henning, you are getting better by the minute :)

Henning, could it be that you have hit this bug: https://bugs.eclipse.org/bugs/show_bug.cgi?id=548443

My all-in-one takes a long time only the first time I have unpacked it and run it. After the first run, it is fast as greased lightning.

@hvonbargen
Copy link

Not sure.

Can't do further testing this week.

It sounds like a candidate for being the reason.

On my machine, we are using GData Antivirus, and unfortunately the second start is as slow as the first one.
BTW I don't see this issue when starting the Oomph generated eclipse for development of BIRT.
But maybe this is because those JARs are well-known to the Anti-Virus software.

And a final remark: Inside the bug you mentioned, someone created a patch to use buffered IO, but that was abandoned, which is a pity.
I think this could make a big difference performance-wise.

@gab1one
Copy link
Contributor

gab1one commented Nov 29, 2021

I think this is related to my comment on the java 11 issue: #672 (comment)
The best way to debug this, is to create a file with this content on your machine:

org.eclipse.osgi/resolver=true
org.eclipse.osgi/resolver/roots=true
org.eclipse.osgi/resolver/report=true

and then adding the line -Dosgi.debug=/path/of/the/file to the vm args of the all-in-one launch config

@hvbtup
Copy link
Contributor Author

hvbtup commented Nov 30, 2021

Sorry, I don't understand. I tried what you said, adding an entry to the birt.ini file. But then Eclipse won't start at all when I start birt.exe or birtc.exe.

@hvbtup
Copy link
Contributor Author

hvbtup commented Nov 30, 2021

"Kommando zurück!"
I forgot to setup the environment variables.
birt[c].exe will start, but where do I find the debugging output (I guess you expect a log file to be generated somewhere?)

@hvbtup
Copy link
Contributor Author

hvbtup commented Nov 30, 2021

I temporarily disabled GData AV and this had no positive effect on BIRT startup time.

I also let ProcMon watch the OS calls from javaw.exe during the start, and this didn't show anything interesting (but again, I'm not an expert here).

My gut feeling tells me this is an issue with Apache Felix and it has nothing to do with antivirus software (at least in my case).

The word "permutation" in the source code of Felix ResolverImpl makes me (coming from mathematics) think there is an inevitable general problem (regarding CPU time and memory consumption) if there are many plugins(?).

I found this: https://bugs.eclipse.org/bugs/show_bug.cgi?id=571908
and this: https://groups.google.com/g/bndtools-users/c/2u7s7xdWgTo

One of the commenters (BJ Hargrave) in the Google Groups discussion said:

"The resolver has a hard problem to solve (I believe Glyn Normington demonstrated it to be NP-complete; see SAT solvers)"

This is directly related to my concern regarding the permutations.

How can I determine which release of the Felix resolver BIRT is using?
And could we try to use a newer release?

And as shown in the Google Gruops discussion, the problem may be related to the bundling somehow (again, as JB Hargrave said: "That is also what I am talking about. Having 2 candidates for the requirement log4j doubles the solution space the resolver has to search. If you only have one version of log4j, instead of 2, as a candidate and all bundles will work with that, then you reduce the solution space in half!")

@gab1one
Copy link
Contributor

gab1one commented Nov 30, 2021

You will need to start the application from a terminal, and then you will see the debug output there.
I would however launch the application from eclipse using the debugger. This way you can pause at any time and investigate the output, you will want to, as there will be a lot of text.

@hvbtup
Copy link
Contributor Author

hvbtup commented Nov 30, 2021

In my start script I had the following command line:

start /Dws\birt %BIRT_INST_HOME%\eclipse\birtc.exe -showlocation -clean -data %CD% -vm "%JAVA_HOME%\bin\javaw.exe" -vmargs -Djava.util.logging.config.file=%CD%/birt-logging.properties %BIRT_OTHER_VM_ARGS%

By chance, I found out that I can avoid the slow startup time for subsequent starts by removing the "-clean" argument.

These observations were confirmed by a colleague with a new computer.

To sum it up:

  1. The first start takes very long.
  2. Subsequent starts take very long if the "-clean" argument is present.

The reason for 2) is the Felix ResolverImpl and its NP-complexity.
The reason for 1) is the same + possibly the antivirus software.

I never had any trouble with "-clean" with BIRT 4.2.1. It only caused 1-2 or seconds extra startup time (that's why I added this to my start script by default), whereas now it causes 2 minutes extra startup time.

@hvbtup
Copy link
Contributor Author

hvbtup commented Nov 30, 2021

The osgi.debug thing doesn't work for me.
birt.exe (and birtc.exe as well) both start javaw.exe.
I think this prevents any stdout/stderr output.

@gab1one Perhaps it would make sense to start a teamviewer session?
If you like, call me +49 234 94375 14

@wimjongman
Copy link
Contributor

Yes, -clean is the issue. Do not that.

Also, your permutation hunch has a 99.9% probability of being wrong ;)

I am sure it is the windows defender scanner. Whenever you do -clean all jars are being extracted every time. Eclipse does not save state when you do this. Only use -clean when Eclipse is corrupted (haven't used it since 2013 ;)

google for this: https://www.google.com/search?client=firefox-b-d&q=windows+defender+file+scan+disable
e.g. https://www.windowscentral.com/how-exclude-files-and-folders-windows-defender-antivirus-scans

@wimjongman
Copy link
Contributor

The osgi.debug thing doesn't work for me. birt.exe (and birtc.exe as well) both start javaw.exe. I think this prevents any stdout/stderr output.

@gab1one Perhaps it would make sense to start a teamviewer session? If you like, call me +49 234 94375 14

To trace startup times: follow this https://www.vogella.com/tutorials/EclipsePerformance/article.html#example-tracing-the-startup-time-of-plug-ins

@hvbtup
Copy link
Contributor Author

hvbtup commented Nov 30, 2021

If it was the windows defender or AV software causing this issue, then why isn't BIRT 4.2.1 (or other older BIRT versions or the one I built from hvbtup/birt with Java 8) affected as well?

@wimjongman
Copy link
Contributor

Maybe because that is signed content and what you are working with now is not?

@gab1one
Copy link
Contributor

gab1one commented Nov 30, 2021

I can reproduce the issue, I am very sure this is caused by the plugin resolver getting confused.
I just created a PR that shows how to reproduce this: #760

@wimjongman
Copy link
Contributor

Also, your permutation hunch has a 99.9% probability of being wrong ;)

I stand corrected.

@wimjongman
Copy link
Contributor

Thanks for your persistence, Henning. I merged @gab1one's patch, can you run your tests again and report?

@hvbtup
Copy link
Contributor Author

hvbtup commented Dec 1, 2021

Tested the patch and now the start with "-clean" is as fast as in the good old times again. Thanks, Gabriel!

@gab1one
Copy link
Contributor

gab1one commented Dec 2, 2021

Happy I could be of use :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants