This repository has been archived by the owner on May 7, 2020. It is now read-only.

Allow extra execution threads for use on large systems hosted on powerful hardware #5040

Closed
sheppy99 opened this issue Feb 6, 2018 · 23 comments

@sheppy99

sheppy99 commented Feb 6, 2018

I've just migrated a large openHAB system from two Raspberry Pis onto a single i3-7100U mini PC with 8 GB RAM. After consolidating the rules from both machines, I ran into rules taking many seconds to fire, which I think is because they are waiting for a spare thread. Most of the time the Java process uses less than 5% of one CPU core and less than 12% of memory, yet I seem to run out of resources for rules. There are no warnings in the log for this condition; rules just don't trigger for several seconds. I'm currently running Snapshot 1203, in case it's relevant, and had the same issue in the 2.2 stable release.
As an enhancement, can a setting be added to expand the resources available to openHAB?

Apologies if I have posted this in the wrong place!

@maggu2810
Contributor

You can set the thread pool sizes, as is already done in the demo configuration:

# Configuration of thread pool sizes
org.eclipse.smarthome.threadpool:thingHandler=3
org.eclipse.smarthome.threadpool:discovery=3
org.eclipse.smarthome.threadpool:safeCall=3

The RuleEngineImpl gets a scheduled pool from the ThreadPoolManager

    protected final ScheduledExecutorService scheduler = ThreadPoolManager
            .getScheduledPool(RuleEngine.class.getSimpleName());

Have you tried using the following?

org.eclipse.smarthome.threadpool:RuleEngine=15

@sheppy99
Author

sheppy99 commented Feb 6, 2018

@maggu2810 Thanks very much for the prompt reply, I wish I'd posted before doing a lot of work rewriting my rules. Which files are these settings in?

@maggu2810
Contributor

I don't know the directory layout of openHAB, but I am pretty sure there is some cfg file that already contains lines starting with org.eclipse.smarthome.threadpool.

@sheppy99
Author

sheppy99 commented Feb 6, 2018

A quick bit of detective work reveals /var/lib/openhab2/config/org/eclipse/smarthome/threadpool.config, which contains:

discovery="5"
safeCall="10"
service.pid="org.eclipse.smarthome.threadpool"
thingHandler="5"

Putting RuleEngine=15 in there doesn't survive a reboot.

However, ruleEngine="15" does.

Is there any way of checking that anything has changed, other than trying to run 16 or more rules at once?

@mhilbush
Contributor

mhilbush commented Feb 7, 2018

@sheppy99

Caveat: not sure changing these is recommended.

You can set these in conf/services/runtime.cfg. You would add lines like this, then change the NN to whatever values you want. I think it will pick these up without having to restart.

# Configuration of thread pool sizes
org.eclipse.smarthome.threadpool:thingHandler=NN
org.eclipse.smarthome.threadpool:discovery=NN
org.eclipse.smarthome.threadpool:safeCall=NN
org.eclipse.smarthome.threadpool:RuleEngine=NN

In the karaf console, you can type the following to see the values.

config:list "(service.pid=org.eclipse.smarthome.threadpool)"

@sheppy99
Author

sheppy99 commented Feb 7, 2018

@mhilbush Thanks Mark, I have added the section to runtime.cfg, and changes are reflected instantly in both /var/lib/openhab2/config/org/eclipse/smarthome/threadpool.config and also the Karaf Console.
I had to change the letter "R" in the setting name to lowercase:

org.eclipse.smarthome.threadpool:ruleEngine=NN

Do you think it is worth getting this option documented, or should I just close this issue now?

@mhilbush
Contributor

mhilbush commented Feb 7, 2018

@sheppy99 I don't really know. Maybe you could let this config run for a while, then report back here if you see any changes (improvements or degradation) in performance.

@mhilbush
Contributor

mhilbush commented Feb 7, 2018

@sheppy99 I actually don't think that you can change the rule threads using that config parameter.

@sheppy99
Author

sheppy99 commented Feb 7, 2018

@mhilbush That's interesting. I've just started up my old Pi 3 running OH 2.2, and it doesn't have a ruleEngine parameter in Karaf by default, so if the parameter is relevant then I guess there is a built-in value that doesn't show up until it's overridden. I also just tried changing it to 1, and then to 0, and a test rule still ran.
Do you have any suggestions about where this parameter is? Having spent a lot of the weekend rewriting my rules to ensure that they use the minimum number of threads, I don't really want to try and break things again.
EDIT: A quick Google search leads me to https://github.com/eclipse/smarthome/blob/master/bundles/core/org.eclipse.smarthome.core/src/main/java/org/eclipse/smarthome/core/common/ThreadPoolManager.java
which, to my untrained eyes, suggests on line 55 that each pool has 5 threads by default:

protected static final int DEFAULT_THREAD_POOL_SIZE = 5;

Maybe the value changes in Karaf immediately but doesn't affect the rule engine until a reboot.
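
A minimal sketch of how a per-pool size lookup with a fallback default might behave. This is not the actual ThreadPoolManager code: the class `PoolSizeLookup` and its methods `configure` and `sizeFor` are hypothetical, and only the default of 5 comes from the line quoted above. It illustrates one plausible reason a setting could appear in the configuration yet have no effect: if the key does not exactly match the name the pool is requested under, the lookup silently falls back to the default.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PoolSizeLookup {
    // Mirrors the default quoted above; everything else here is hypothetical.
    public static final int DEFAULT_THREAD_POOL_SIZE = 5;

    // Configured sizes, keyed by the exact (case-sensitive) pool name.
    private final Map<String, Integer> configured = new ConcurrentHashMap<>();

    // Records a configured size for one pool name.
    public void configure(String poolName, int size) {
        configured.put(poolName, size);
    }

    // Returns the configured size, or the default when no entry matches exactly.
    public int sizeFor(String poolName) {
        return configured.getOrDefault(poolName, DEFAULT_THREAD_POOL_SIZE);
    }

    public static void main(String[] args) {
        PoolSizeLookup lookup = new PoolSizeLookup();
        lookup.configure("ruleEngine", 15);
        System.out.println(lookup.sizeFor("ruleEngine")); // 15
        System.out.println(lookup.sizeFor("RuleEngine")); // 5, key differs in case
        System.out.println(lookup.sizeFor("safeCall"));   // 5, never configured
    }
}
```

Which spelling the real lookup expects is not something this sketch can settle; the Karaf `config:list` output and the DEBUG log of org.eclipse.smarthome.core are the places to confirm it.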

@mhilbush
Contributor

mhilbush commented Feb 7, 2018

From what I can see, it doesn't look like it can be changed from the default.

@sheppy99
Author

sheppy99 commented Feb 7, 2018

OK, further googling leads me to https://community.openhab.org/t/handler-takes-more-than-5000ms-for-processing-event/22158/20, which mentions adding
org.eclipse.smarthome.threadpool:safeCall=10

which seems to have become the default, so I've changed it to 15 and will see what happens. My first impression is that slightly more memory is in use after a reboot, which suggests that one of the two parameters may have had an effect.

I'll update this thread in case anything happens, and hopefully someone else can add anything relevant when Europe wakes up in a few hours. Thanks for your help Mark @mhilbush and Markus @maggu2810

@maggu2810
Contributor

@htreu
Contributor

htreu commented Feb 7, 2018

Also please note the observations regarding rule execution during *.rules file parsing reported here: #4716 (comment)
The issue reported there is due to internal synchronisation and will not be resolved by increasing the thread pool size.

@kaikreuzer
Contributor

Note that this is only an issue when editing and reloading rules. On an idle system, this should be unrelated.
Adding

org.eclipse.smarthome.threadpool:RuleEngine=NN

to your services/runtime.cfg file should indeed be the best thing to do (though I haven't tested it myself).

> rules just don't trigger for several seconds

Can you say whether the rule seems to be queued because other rules are already executing at the same time, or does it happen even when nothing else is running?

@mhilbush
Contributor

mhilbush commented Feb 7, 2018

Not sure why I missed that. I had set DEBUG on org.eclipse.smarthome.core to look for the update to the pool size. I just plain missed the line that showed the RuleEngine change.

@mhilbush
Contributor

mhilbush commented Feb 7, 2018

@sheppy99 Sorry for confusing you last night.

> I also just tried changing it to 1, and then to 0 and a test rule still ran.

I think this is because there's a timeout on the threadpool that removes threads from the pool after a time period (currently 65 seconds). So, you needed to wait more than 65 seconds for any existing threads to be timed out.

I just ran a test with 8 rules all triggered by the same item. Setting the RuleEngine parameter definitely constrains the number of rules that will run concurrently. Note that when you lower the RuleEngine parameter, the change takes effect, but only after the timeout of any threads in the pool above the new RuleEngine value.
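
The delayed effect of lowering the value can be reproduced with a plain JDK pool. The sketch below is not openHAB code: it assumes the framework's pool behaves like a `ScheduledThreadPoolExecutor` with core-thread timeout enabled, and it scales the 65-second timeout down to 300 ms so the run finishes quickly. The class name `PoolShrinkDemo` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolShrinkDemo {
    // Stand-in for the 65-second timeout mentioned above (assumption: scaled down).
    static final long KEEP_ALIVE_MS = 300;

    // Submits the given number of short tasks and returns how many ran at once at the peak.
    static int peakConcurrency(ScheduledThreadPoolExecutor pool, int tasks) throws Exception {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            futures.add(pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(100); // simulate a rule doing some work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
            }));
        }
        for (Future<?> f : futures) {
            f.get(); // wait for every task to finish
        }
        return peak.get();
    }

    // Returns peak concurrency: before shrinking, right after shrinking, and after the timeout.
    public static int[] run() throws Exception {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(4);
        pool.setKeepAliveTime(KEEP_ALIVE_MS, TimeUnit.MILLISECONDS);
        pool.allowCoreThreadTimeOut(true); // idle threads are removed after the keep-alive time
        try {
            int before = peakConcurrency(pool, 4);
            pool.setCorePoolSize(1);                 // lower the limit while 4 threads still exist
            int rightAfter = peakConcurrency(pool, 4);
            Thread.sleep(KEEP_ALIVE_MS * 4);         // let the surplus idle threads time out
            int afterTimeout = peakConcurrency(pool, 4);
            return new int[] { before, rightAfter, afterTimeout };
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] r = run();
        System.out.println("before=" + r[0] + " rightAfter=" + r[1] + " afterTimeout=" + r[2]);
    }
}
```

Under these assumptions, the tasks submitted right after `setCorePoolSize(1)` still overlap, because the existing idle threads keep picking up work; only once those threads have timed out does the peak drop to 1. Raising the value needs no such wait, since new threads are created on demand.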

@sheppy99
Author

sheppy99 commented Feb 7, 2018

Thanks everyone for the replies.
@mhilbush The timeout explains it.
@kaikreuzer The late-running rule was definitely queued, as it executed the moment another rule exited. I had a rule that ran a Python script via executeCommandLine, and others that were triggering on power use every 5 seconds. The delayed rule started at exactly the moment the Python script rule finished; the times in the log were how I found it. When the system is less busy, rules trigger instantly, and it is usually very fast.
I've since moved the Python script to an Exec binding item, and also changed the power use rules to trigger less often via cron, but I want to expand my resources so this doesn't happen again.
With the hardware I'm using there shouldn't be any problems with speed, but I do have a lot of items that continually update, so the potential for lots of rules to be triggered is there.
What does org.eclipse.smarthome.threadpool:safeCall affect? Is there a separate setting for bindings that run scripts, such as the Exec binding?
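
The queueing described here, where a waiting rule starts the instant a running one exits, is what a fixed-size pool does once all of its threads are busy. A rough sketch with plain JDK classes, not openHAB code; `RuleQueueDemo` and its method are hypothetical names, and the blocking task stands in for a long-running script:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RuleQueueDemo {
    // Returns true if a second task manages to start while the first is still running.
    public static boolean secondRuleStartsWhileFirstRuns(int poolSize) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize);
        CountDownLatch firstStarted = new CountDownLatch(1);
        CountDownLatch releaseFirst = new CountDownLatch(1);
        CountDownLatch secondStarted = new CountDownLatch(1);
        try {
            // First "rule": occupies a thread until it is released.
            pool.execute(() -> {
                firstStarted.countDown();
                try {
                    releaseFirst.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            firstStarted.await();
            // Second "rule": needs a free thread to start at all.
            pool.execute(secondStarted::countDown);
            return secondStarted.await(500, TimeUnit.MILLISECONDS);
        } finally {
            releaseFirst.countDown(); // let the first task finish
            pool.shutdown();          // queued work still runs before the pool ends
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("pool of 1: " + secondRuleStartsWhileFirstRuns(1));
        System.out.println("pool of 2: " + secondRuleStartsWhileFirstRuns(2));
    }
}
```

With a pool of one thread the second task stays queued for as long as the first one blocks; with two threads it starts right away. CPU and memory usage stay low in both cases, which matches the symptom of delayed rules on an otherwise idle machine.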

@sjsf sjsf added the question label Feb 9, 2018
@sheppy99
Author

sheppy99 commented Feb 13, 2018

To update this: I've been running with the revised settings for 5 days now and everything has been running smoothly. My settings in runtime.cfg are:

org.eclipse.smarthome.threadpool:safeCall=15
org.eclipse.smarthome.threadpool:ruleEngine=16

Could the developers please explain what the difference between the two settings is? And if this continues to prove a useful change, can it be added to the documentation?

@htreu
Contributor

htreu commented Feb 13, 2018

The safeCall thread pool mainly controls the concurrency for handling commands and state updates. Every ThingHandler#handleCommand and ThingHandler#handleUpdate is run through the SafeCaller in its own Thread.
The ruleEngine thread pool controls the concurrency for rule execution, basically how many rules can be executed at the same time.

@htreu
Contributor

htreu commented Feb 13, 2018

For tasks which are scheduled from a ThingHandler (like script execution from the Exec binding or periodic polling) there is a separate thread pool thingHandler which can be configured.
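
A small sketch of why separate pools matter, using plain JDK executors rather than the framework's own classes. The names `rulePool` and `handlerPool` are hypothetical stand-ins for the ruleEngine and thingHandler pools, and the 20 ms polling interval is arbitrary. The point it illustrates is that periodic work scheduled on one pool keeps running even when every thread of the other pool is occupied.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SeparatePoolsDemo {
    // Returns true if three polling ticks occur while the rule pool is fully busy.
    public static boolean pollingContinuesWhileRulePoolIsBusy() throws InterruptedException {
        ScheduledExecutorService rulePool = Executors.newScheduledThreadPool(1);
        ScheduledExecutorService handlerPool = Executors.newScheduledThreadPool(1);
        CountDownLatch ruleStarted = new CountDownLatch(1);
        CountDownLatch releaseRule = new CountDownLatch(1);
        CountDownLatch pollTicks = new CountDownLatch(3);
        try {
            // Occupy the only rule thread until released.
            rulePool.execute(() -> {
                ruleStarted.countDown();
                try {
                    releaseRule.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            ruleStarted.await();
            // Periodic polling runs on its own pool and is unaffected by the busy rule pool.
            handlerPool.scheduleWithFixedDelay(pollTicks::countDown, 0, 20, TimeUnit.MILLISECONDS);
            return pollTicks.await(2, TimeUnit.SECONDS);
        } finally {
            releaseRule.countDown();
            rulePool.shutdownNow();
            handlerPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(pollingContinuesWhileRulePoolIsBusy());
    }
}
```

The same isolation works in reverse: a slow script running on the handler pool would not take threads away from rule execution, so each pool can be sized for its own workload.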

@sheppy99
Author

@htreu Thanks very much for that, it explains a lot.
Is it possible to get these parameters added to the documentation?

@htreu
Contributor

htreu commented Feb 14, 2018

Yes, we should add documentation for this. Please see #5081.
Can this be closed then? It seems your configuration of thread pools did the trick.

@sheppy99
Author

Yes this can be closed, thanks everyone for your help!
