POC: Allow hierarchical filtering of logger namespaces #22859

ghoz · 2024-05-31T21:57:38Z

When playing with the new log_namespaced_levels #22619, I expected that 'zh:zstack:unpi' :info would match all namespaces below, such as zh:zstack:unpi:parser , writer, etc

This PR implements a quick POC to allow that.
To limit runtime penalty it uses ~~the same namespacedLevels as~~ a lazy cache: the first time a new namespace is seen, it tries to find if any namespace above is defined in the config and stores the level, So in effect

advanced:
  log_namespaced_levels:
    'zh:zstack:unpi': info

Would also apply to zh:zstack:unpi:writer , zh:zstack:unpi:parser and anything else below zh:zstack:unpi I might not have seen yet. Of course if a more specific config is defined, it would be picked up.

This would allow to add new namespace sublevels without breaking existing configuration, for example z2m:mqtt:connexion and z2m:mqtt:publish

BUGS / TODOs :

The default log level cannot be dynamically changed , the cache would need to be reset when the level is changed at runtime
Fix logger tests as the namespacedLevels are now filled (no longer necessary)
Add new tests for the feature
There is most likely a nicer / more typescripty way for doing this ... expecting @Nerivec to replace that with a one liner (close enough)
Add a note in the documentation that the config applies to all namespaces below (or not even needed, that was my initial expectation) mention the hierarchical namespace for logging zigbee2mqtt.io#2786

Nerivec · 2024-05-31T22:46:40Z

Doing any kind of looping in log, which can be called hundreds of times per second, will grow cost really fast (and block), at least until the cache is built (or re-built after any level change).
The simple 1:1 match on configured object keys limits that problem and still allows to customize as needed, albeit manually for "below namespaces"...
debug has namespace regex filtering also (that's the level most affected by deeper namespaces).

If this is really needed, will require a well optimized algorithm, and performance impact testing in various "spammy" scenario (startup, OTA, MQTT debug, etc).

ghoz · 2024-06-01T04:48:55Z

looping in log, which can be called hundreds of times per second, will grow cost really fast

Most of those calls are using the same few namespaces, so after the first hits, the cost is the same as currently, 1 key lookup per call. I don't believe the number of entries in the cache would add any measurable cost to the lookup

I don't have the means of doing perf testing but even with that crude fallback when a new namespace is encoutered, I did not notice any slowdowns on my lowy setup.
After all we are not talking about 1000s of very deep namespaces, and z2m is most likely still starting up so I suspect the potential perf impact would be hidden in the noise...

Right now the debug regex approach would need to be implemented in info level to allow spliting mqtt connection/login info messages with the flood of mqtt updates ... It would be interesting to compare the regex match cost, that is incured on every log call this time.

We have really nice new hierarchical namespace, we might as well use it 😋

Nerivec · 2024-06-01T11:18:34Z

The cost once the cache is built would not be the problem. Actually, once it's built, it's probably a tiny bit faster.
But I suspect issues might arise when a cache rebuild is triggered. Performance is likely to drop 5-10x (even if just for a short time). If this is done during a heavy operation, I wouldn't be surprised if that crashes z2m, especially on low-resource machines.
Needs to be investigated further.
It also might be counterproductive in some scenarios for devs wanting to suppress higher-level logic but keeping lower one (granted, debug regex is likely to be more fitting here).

Nerivec · 2024-06-01T14:08:56Z

Ran some quick & dirty benchmarks for reference, simple ones on just the differences in logic involved (also threw in regex one for good measure since it's adjacent to this).
The benchmarks only have the base logic (filtering), no transport, no logging at all, just adding non-filtered messages to an array local to each test to have a minimal equal cost.

Current:

for loop x1M: 189.161ms
Benchmark.js: 6,220,255 ops/sec ±0.94% (89 runs sampled)

Looping (cache built once):

for loop x1M: 162.943ms
Benchmark.js: 6,868,936 ops/sec ±0.39% (90 runs sampled)

Current / Regex:

for loop x1M: 216.387ms
Benchmark.js: 4,832,131 ops/sec ±2.18% (87 runs sampled)

Current with object assign cost (just to compare below rebuild without this cost / not significant):

for loop x1M: 196.371ms
Benchmark.js: 5,528,133 ops/sec ±0.56% (90 runs sampled)

Looping with rebuild:

for loop x1M: 652.21ms
Benchmark.js: 1,625,674 ops/sec ±0.61% (90 runs sampled)

Code

const LOG_LEVELS: readonly string[] = ['error', 'warning', 'info', 'debug'] as const;
type LogLevel = typeof LOG_LEVELS[number];
const LEVEL = 'info';
const namespaces: string[] = [
    'zh:zstack:unpi:parser',
    'zh:zstack:unpi',
    'zh:zspec',
    'z2m',
];
let logger1NamespacedLevels: {[ns: string]: string} = {
    'zh:zstack:unpi:parser': 'warning',
    'zh:zstack:unpi': 'warning',
};
let logger2NamespacedLevels: {[ns: string]: string} = {
    'zh:zstack:unpi': 'warning',
};

const log1 = (level: LogLevel, message: string, namespace: string, picked: string[]): void => {
    const nsLevel = logger1NamespacedLevels[namespace] ?? LEVEL;

    if (LOG_LEVELS.indexOf(level) <= LOG_LEVELS.indexOf(nsLevel)) {
        picked.push(message);
    }
};

const assignLevel = (namespace: string): string => {
    let ns = namespace;

    while (true) {
        const sep = ns.lastIndexOf(":");

        if (sep === -1) {
            return logger2NamespacedLevels[namespace] = LEVEL;
        }

        ns = ns.slice(0, sep);

        if (logger2NamespacedLevels[ns]) {
            return logger2NamespacedLevels[namespace] = logger2NamespacedLevels[ns];
        }
    }
};
const log2 = (level: LogLevel, message: string, namespace: string, picked: string[]): void => {
    const nsLevel = logger2NamespacedLevels[namespace] ?? assignLevel(namespace);

    if (LOG_LEVELS.indexOf(level) <= LOG_LEVELS.indexOf(nsLevel)) {
        picked.push(message);
    }
};

const log3DebugNamespaceIgnoreRegex = new RegExp(/^zh:zstack:unpi.*/);
const log3 = (level: LogLevel, message: string, namespace: string, picked: string[]): void => {
    if (log3DebugNamespaceIgnoreRegex?.test(namespace)) {
        return;
    }

    const nsLevel = logger2NamespacedLevels[namespace] ?? LEVEL;

    if (LOG_LEVELS.indexOf(level) <= LOG_LEVELS.indexOf(nsLevel)) {
        picked.push(message);
    }
};

console.log(namespaces);
console.log(JSON.stringify(logger1NamespacedLevels));
console.log(JSON.stringify(logger2NamespacedLevels));
console.log(log3DebugNamespaceIgnoreRegex);

let current = 0;
let looping = 0;
let regex = 0;
let currentRebuild = 0;
let loopingRebuild = 0;

const benchCurrent = (): void => {
    const picked: string[] = [];
    log1('info', 'Test1-0', namespaces[0], picked);
    log1('info', 'Test1-1', namespaces[1], picked);
    log1('info', 'Test1-2', namespaces[2], picked);
    log1('info', 'Test1-3', namespaces[3], picked);
    current += picked.length;
};
const benchLooping = (): void => {
    const picked: string[] = [];
    log2('info', 'Test2-0', namespaces[0], picked);
    log2('info', 'Test2-1', namespaces[1], picked);
    log2('info', 'Test2-2', namespaces[2], picked);
    log2('info', 'Test2-3', namespaces[3], picked);
    looping += picked.length;
};
const benchRegex = (): void => {
    const picked: string[] = [];
    log3('info', 'Test3-0', namespaces[0], picked);
    log3('info', 'Test3-1', namespaces[1], picked);
    log3('info', 'Test3-2', namespaces[2], picked);
    log3('info', 'Test3-3', namespaces[3], picked);
    regex += picked.length;
};
const benchCurrentRebuild = (): void => {
    logger1NamespacedLevels = {
        'zh:zstack:unpi:parser': 'warning',
        'zh:zstack:unpi': 'warning',
    };
    const picked: string[] = [];
    log1('info', 'Test10-0', namespaces[0], picked);
    log1('info', 'Test10-1', namespaces[1], picked);
    log1('info', 'Test10-2', namespaces[2], picked);
    log1('info', 'Test10-3', namespaces[3], picked);
    currentRebuild += picked.length;
};
const benchLoopingRebuild = (): void => {
    logger2NamespacedLevels = {
        'zh:zstack:unpi': 'warning',
    };
    const picked: string[] = [];
    log2('info', 'Test20-0', namespaces[0], picked);
    log2('info', 'Test20-1', namespaces[1], picked);
    log2('info', 'Test20-2', namespaces[2], picked);
    log2('info', 'Test20-3', namespaces[3], picked);
    loopingRebuild += picked.length;
};

// for (const bench of [benchCurrent, benchLooping, benchRegex, benchCurrentRebuild, benchLoopingRebuild]) {
//     console.time(bench.name);
//     for (let i = 0; i < 1000000; i++) {
//         bench();
//     }
//     console.timeEnd(bench.name);
// }
// console.log(current, looping, regex, currentRebuild, loopingRebuild);

import Benchmark from 'benchmark';
const suite = new Benchmark.Suite();

suite.add('Current', benchCurrent);
suite.add('Looping', benchLooping);
suite.add('Regex', benchRegex);
suite.add('Current with object assign cost', benchCurrentRebuild);
suite.add('Looping with rebuild', benchLoopingRebuild);

// eslint-disable-next-line @typescript-eslint/no-explicit-any
suite.on('cycle', function(event: any) {
    console.log(String(event.target));
});

suite.on('complete', function() {
    console.log(JSON.stringify(logger1NamespacedLevels));
    console.log(JSON.stringify(logger2NamespacedLevels));
    console.log(current, looping, regex, currentRebuild, loopingRebuild);
    // @ts-expect-error bench
    console.log(this.filter('successful').map((b) => b.count));
});

suite.run({'async': true});

Nerivec · 2024-06-01T20:40:56Z

That last commit forces the use of the loop, definitely worse for performance.

ghoz · 2024-06-01T20:44:11Z

Thank you for taking the time to bechmark test this out... we really don't have the same definition of "quick and dirty" 😅
To be honest looking at your results, what surprises me most might be that the results with cache version is~10% faster than the current setup of using the default level if not found...

The cache build/rebuild situation would happen only at startup and level config change, and only on the first occurence of a namespace...
With the numbers, do you still think that the x3 hit that the first log of the namespace would take could be too slow and hang/crash a loaded low end machine ?

My sample of 1 is my current setup: a small /medium network of ~60 devices on a pi 3, that also hosts a small HA and a couple of other services ... not the beefiest setup but so far so good

ghoz · 2024-06-01T20:50:57Z

That last commit forces the use of the loop, definitely worse for performance.

It might be better to use the cache indeed 😄

Nerivec · 2024-06-01T22:08:14Z

It might be better to use the cache indeed

The spread operator is a no-go here too I think. It always has terrible performance for some reason. Object.assign(target, source) likely tons better.

what surprises me most might be that the results with cache version is~10% faster than the current setup of using the default level if not found...

It doesn't fallback (??) once the cached is filled. Seems to be exacerbated in this scenario...

This is likely to be a problem if triggered (a level is changed, cache needs rebuilding) while Z2M is already having a hard time keeping up with inc/out (during an OTA or on a network with a bunch of terrible spammy Tuya... the usual troublemakers), since this will affect lower levels the most, and serial comm is the most chatty.
Only the filtering is being benchmarked here, so ops/sec are pretty high, but with actual logging/transports, it's going to drop/block very suddenly, very hard (that's nearly a 4x drop without anything else running beside it), in scenarios that are already dicey on their own...
I'd say, at best, it increases the chance of things going wrong too much... That, versus adding a couple of lines to manually cover below levels (since this is likely to be used 98% of the time only to silence serial comm while in debug)... I'd pick the manual labor 😁

ghoz · 2024-06-02T06:34:38Z

I'm probably missing some context...
I have a zstack based adapter it uses very few low level namespaces... Does the ember use a lots ?

Even then from my understanding, during a heavy OTA, the namespaces used will just repeat and hit the cache, only the first handful of serial /ota /whatever tuya will get the namespace search hit... and the rest will be very slightly sped up...

And as you said the benchmark is only of the filtering, when adding the full transport overhead, i'm assuming the difference would even be less...

I do have a bunch of the chatty tuya power consumption stuff, but no OTA ready to test this...

On my side, I want to silence namespaces branches without having to go hunt for each new subnamespace 😬
And ideally being able to see the mqtt connection info but not the flood of updates (I was one of the few liking the debug level for those 😁)

Is anyone with a big network and some devices to update on a very ressource constrained machine willing to do some tests 😅 ?

Nerivec · 2024-06-02T14:06:06Z

Have you tried changing a level during a time of chatty behavior (or maybe try changing the log_level in Logs page at a fast pace, to stress-test a bit)? What I'm more worried about is that instant where the cache starts to rebuild and it's going to block each time a new namespace comes up until all are attributed a level (and the first few are likely to block the most). But it could just be me being paranoid 😬
Between zh and zhc, there are probably around 200 namespaces (although several are device-specific, so most people will only ever see a portion of that). You have to consider various scenarios like this one:

advanced:
  log_level: info
  log_namespaced_levels:
    zh: debug
    zhc: debug

You're right though, the cost of the extra logic is probably drowned in the relative expensiveness of the logging itself, the difference won't be less, but hidden because of the drastically lower ops/sec.
I'll try to run some tests with real logging to confirm.

Nerivec · 2024-06-02T17:38:33Z

With the commit I just added, it looks like this.

Since I changed benchmark tool, I figured I should post the previous stuff back for reference.

Task Name	ops/sec	Average Time (ns)	Margin	Samples
Current	'2,811,858'	355.6367133925188	'±0.16%'	14059291
Looping	'1,728,545'	578.5211784473531	'±0.22%'	8642726
Looping with separate cache	'1,719,766'	581.4743780591276	'±0.28%'	8598832
Regex	'2,097,416'	476.7770937359666	'±2.70%'	10487081
Current with object assign cost	'2,674,783'	373.8620109610369	'±0.24%'	13373919
Looping with rebuild	'928,875'	1076.5706056596546	'±0.12%'	4644378
Looping with rebuild with separate cache - Object.assign	'877,969'	1138.9922066567713	'±0.15%'	4389846
Looping with rebuild with separate cache - spread operator	'161,892'	6176.9242479398645	'±1.16%'	809465

(That darn spread operator 😓)

Now the actual logging (exact same logger class as Z2M, just made to stand alone), 'console' only as output ('file' introduces too much randomness). Bench with 5s runtime, 1s warmup. Each run calls logger.info 8 times, with 6 filtered out.

Task Name	ops/sec	Average Time (ns)	Margin	Samples
Current namespaced	'1,847'	541393.4846830966	'±2.53%'	9236
Looping namespaced	'1,876'	532841.1613888753	'±2.87%'	9384
Looping namespaced with cache reset in middle	'1,846'	541639.9772493636	'±2.68%'	9232

ghoz · 2024-06-03T07:27:15Z

Have you tried changing a level during a time of chatty behavior (or maybe try changing the log_level in Logs page at a fast pace, to stress-test a bit)? What I'm more worried about is that instant where the cache starts to rebuild and it's going to block each time a new namespace comes up until all are attributed a level (and the first few are likely to block the most).

I didn't go as far as script log level changes requests in a fast manner, but with "quick" manual changes, I didn't see any adverse effect...

lib/util/logger.ts

Nerivec · 2024-06-03T13:37:58Z

@Koenkk Can you see any usage issue with this automatic hierarchical filtering? It would prevent setting only an upper namespace to a specific level, without affecting below namespaces (would require manually setting the below namespaces, basically reverses current behavior).

Koenkk · 2024-06-03T18:22:13Z

@Nerivec I don't see any issues with that, PR also looks OK so let me know if this can be merged.

Edit: only thing left is a PR for a docs update.

Documents Koenkk/zigbee2mqtt#22859

ghoz · 2024-06-03T22:23:48Z

only thing left is a PR for a docs update.

done: Koenkk/zigbee2mqtt.io#2786

matches the closes defined namespace log level: advanced: log_namespaced_levels: 'zh:zstack:unpi': info would also apply to zh:zstack:unpi:write and zh:zstack:unpi:parse To limit runtime penalty uses the namespacedLevels as a lazy cache

Avoid poluting the namespacedLevel in the config / UI Add cache reset on config time log level change

Nerivec · 2024-06-04T17:57:34Z

Fix for #22935 will use winston hotpath and avoid splat logic entirely (/%[scdjifoO%]/g). Seems to be a bug/weirdness in winston (doesn't pass meta), but for z2m's case, might as well use the hotpath.

Koenkk · 2024-06-04T18:17:40Z

Thanks!

* mention the hierarchical namespace for logging Documents Koenkk/zigbee2mqtt#22859 * fix wording according to review

* POC: Allow hierarchical filtering of logger namespaces matches the closes defined namespace log level: advanced: log_namespaced_levels: 'zh:zstack:unpi': info would also apply to zh:zstack:unpi:write and zh:zstack:unpi:parse To limit runtime penalty uses the namespacedLevels as a lazy cache * Move namespace levels to separate cache Avoid poluting the namespacedLevel in the config / UI Add cache reset on config time log level change * Actually use the cache * Optimize/Cleanup * Add some namespace hierarchy filter tests * More coverage. * Fix Koenkk#22935 --------- Co-authored-by: Nerivec <62446222+Nerivec@users.noreply.github.com>

Koenkk · 2024-07-03T19:34:34Z

lib/util/logger.ts

@@ -102,9 +106,6 @@ class Logger {

            const options: KeyValue = {
                app_name: 'Zigbee2MQTT',
-                format: winston.format.printf(/* istanbul ignore next */(info) => {


Why was this removed? I think this causes #23241

Seems it might not output the message without expressly returning it with printf?

format: winston.format.printf((info) => info.message),

Strange that it's undefined though, I would think it would be the stringified json of info param as default if not message.

@ghoz could you comment on this?

I could, but I would not have much of interest to say ...
I see it has been removed when going for the winston hotpath directly
827e18e#diff-be38649fab0f205ab799e4de3d3b56ad7d54397f525e8f370cda402efcece524L109
but I'm not familiar with this low level to be honest...

I don't have a syslog enabled system to test it out, but I'd suggest reverting this hunk...
@Nerivec do you remember why it was removed ?

Confirmed. It works using:

const options: KeyValue = { app_name: 'Zigbee2MQTT', format: winston.format.printf((info) => info.message), ...settings.get().advanced.log_syslog, };

Not sure why it doesn't output json by default (might be the combine during logger init), but since we don't want json anyway, definitely a bit too much "cleanup" in my last commit here 😅
We might want to add a test to cover this codepath (would have to hijack way above syslog though I guess).

ghoz commented Jun 3, 2024

View reviewed changes

lib/util/logger.ts Show resolved Hide resolved

ghoz commented Jun 3, 2024

View reviewed changes

lib/util/logger.ts Show resolved Hide resolved

ghoz added a commit to ghoz/zigbee2mqtt.io that referenced this pull request Jun 3, 2024

mention the hierarchical namespace for logging

e8f1f10

Documents Koenkk/zigbee2mqtt#22859

ghoz added a commit to ghoz/zigbee2mqtt.io that referenced this pull request Jun 3, 2024

mention the hierarchical namespace for logging

bc87341

Documents Koenkk/zigbee2mqtt#22859

ghoz mentioned this pull request Jun 3, 2024

mention the hierarchical namespace for logging Koenkk/zigbee2mqtt.io#2786

Merged

ghoz force-pushed the feat-log-namespace-hierarchy branch from b6fef7e to 41fb6f3 Compare June 3, 2024 22:31

ghoz and others added 5 commits June 4, 2024 01:59

Move namespace levels to separate cache

87540a8

Avoid poluting the namespacedLevel in the config / UI Add cache reset on config time log level change

Actually use the cache

e813cf1

Optimize/Cleanup

0caeb07

Add some namespace hierarchy filter tests

803c76b

Nerivec force-pushed the feat-log-namespace-hierarchy branch from 41fb6f3 to 803c76b Compare June 3, 2024 23:59

Nerivec added 2 commits June 4, 2024 02:31

More coverage.

8cce8a3

Fix Koenkk#22935

827e18e

Koenkk merged commit 881037c into Koenkk:dev Jun 4, 2024
11 checks passed

Koenkk pushed a commit to Koenkk/zigbee2mqtt.io that referenced this pull request Jun 4, 2024

mention the hierarchical namespace for logging (#2786)

5fbfa80

* mention the hierarchical namespace for logging Documents Koenkk/zigbee2mqtt#22859 * fix wording according to review

Koenkk mentioned this pull request Jun 4, 2024

chore(dev): release 1.39.0 #22868

Merged

Koenkk reviewed Jul 3, 2024

View reviewed changes

Nerivec mentioned this pull request Jul 5, 2024

Fix syslog output. #23269

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

POC: Allow hierarchical filtering of logger namespaces #22859

POC: Allow hierarchical filtering of logger namespaces #22859

ghoz commented May 31, 2024 •

edited

Loading

Nerivec commented May 31, 2024 •

edited

Loading

ghoz commented Jun 1, 2024

Nerivec commented Jun 1, 2024

Nerivec commented Jun 1, 2024

Nerivec commented Jun 1, 2024

ghoz commented Jun 1, 2024

ghoz commented Jun 1, 2024

Nerivec commented Jun 1, 2024

ghoz commented Jun 2, 2024

Nerivec commented Jun 2, 2024

Nerivec commented Jun 2, 2024

ghoz commented Jun 3, 2024

Nerivec commented Jun 3, 2024

Koenkk commented Jun 3, 2024 •

edited

Loading

ghoz commented Jun 3, 2024

Nerivec commented Jun 4, 2024

Koenkk commented Jun 4, 2024

Koenkk Jul 3, 2024

Nerivec Jul 3, 2024

Koenkk Jul 4, 2024

ghoz Jul 5, 2024

ghoz Jul 5, 2024

Nerivec Jul 5, 2024

POC: Allow hierarchical filtering of logger namespaces #22859

POC: Allow hierarchical filtering of logger namespaces #22859

Conversation

ghoz commented May 31, 2024 • edited Loading

Nerivec commented May 31, 2024 • edited Loading

ghoz commented Jun 1, 2024

Nerivec commented Jun 1, 2024

Nerivec commented Jun 1, 2024

Nerivec commented Jun 1, 2024

ghoz commented Jun 1, 2024

ghoz commented Jun 1, 2024

Nerivec commented Jun 1, 2024

ghoz commented Jun 2, 2024

Nerivec commented Jun 2, 2024

Nerivec commented Jun 2, 2024

ghoz commented Jun 3, 2024

Nerivec commented Jun 3, 2024

Koenkk commented Jun 3, 2024 • edited Loading

ghoz commented Jun 3, 2024

Nerivec commented Jun 4, 2024

Koenkk commented Jun 4, 2024

Koenkk Jul 3, 2024

Choose a reason for hiding this comment

Nerivec Jul 3, 2024

Choose a reason for hiding this comment

Koenkk Jul 4, 2024

Choose a reason for hiding this comment

ghoz Jul 5, 2024

Choose a reason for hiding this comment

ghoz Jul 5, 2024

Choose a reason for hiding this comment

Nerivec Jul 5, 2024

Choose a reason for hiding this comment

ghoz commented May 31, 2024 •

edited

Loading

Nerivec commented May 31, 2024 •

edited

Loading

Koenkk commented Jun 3, 2024 •

edited

Loading