feat(logging): Provide JSON-formatted structured logging as option #1308

vtaskow · 2023-07-21T10:36:45Z

Background

Structured logging is a process for generating log records in a format that is easily-read by a machine. The aims to provide analytics and insights from logs.
In the rest of our setup, components' logs are sent to fluentd and then to Elastic Search for processing.

Why

Currently, MLServer does not provide an option for producing structured logging. Enabling structured logging on initialisation will be beneficial in case users would like to create log insights.

How

There are several python libraries that support structured logging(structlog, python-json-logger), however, our case is very simple. Adding an additional dependency would be an overcomplication and the actual code written is not very much different from the proposed solution.

Formatting

In cases when model name is missing, a key-value pairs for both model name and version should not be present in the log.
In cases when model version is missing, a key-value pair for it should not be present in the log.
When mlserver is started, one of the two log formats is chosen based on the newly added flag in the settings.

agrski · 2023-07-21T14:45:39Z

mlserver/logging.py

@@ -42,7 +42,9 @@ class ModelLoggerFormatter(logging.Formatter):
    """

    _UNSTRUCTURED_FORMAT = "%(asctime)s [%(name)s]%(model)s %(levelname)s - %(message)s"
-    _STRUCTURED_FORMAT = "{'time':'%(asctime)s', 'name': '%(name)s', 'level': '%(levelname)s', 'message': '%(message)s' %(model)s}"
+    _STRUCTURED_FORMAT = "{\"time\":\"%(asctime)s\", \"name\": \"%(name)s\", " \


💭 I realise this PR is still in draft, but offering a couple of suggestions that might make things easier.

Firstly, as JSON requires double quotes, why not use single quotes to wrap the entire thing as a Python string? I.e. '{"time": "%(asctime)", ...}'.

Secondly, as this is still prone to human errors, why not represent the entire thing as a dictionary and turn this into the format string using marshalling (json.dumps)? As a tiny example:

import json d = {'time': '%(asctime)'} json.dumps(d) # '{"time": "%(asctime)"}'

Doing this also makes it easy to add key-value pairs, such as model details below, and avoids having to worry about where in a string something goes as this is deferred to the point of marshalling, rather than determining the values to include.

For greater control over validation on provided fields, a data class could be used, and those have been around since Python 3.7 I think. I suspect this may also be desirable because the field order should be fixed across invocations.

Good point about wrapping with single points, thanks! Re dumps: what we want is to not have model_name or model_version when those are not set by the ctx vars, and in the dictionary we can't have only key without value. E.g:

_STRUCTURED_FORMAT = { "time": "%(asctime)s", "name": "%(name)s", "level": "%(levelname)s", "message": "%(message)s", "model": ? } VS '{"time": "%(asctime)s", "name": "%(name)s", "level": "%(levelname)s", ' '"message": "%(message)s" %(model)s}' // model here would contain the key-v vair with commas for proper json formatting

One thing I can do is keep the dict version, and then dump it and attach the model before the closing brace in the super().init(...). But this becomes a bit messy. I think the single quotes-option is cleanest. I am open to other suggestions, of course

One thing we are trying to avoid is modifying the "private" _format on the fly in the format() method, as it's not thread-safe from what I read

For greater control over validation on provided fields, a data class could be used, and those have been around since Python 3.7 I think. I suspect this may also be desirable because the field order should be fixed across invocations.

Yeah, I was thinking about wrapping those in a class but for now it seems that we won't benefit as much from it. Possibly if we introduce further model context fields or require validation. (I think the model name and version validation should be contained in the ModelSetting class where they are initialised).

Not sure I understand the point about field order across invocations. Could you explain 🤔

Not sure I understand the point about field order across invocations. Could you explain

Python implemented reproducible dictionary iteration a while back, I'm pretty sure, but what happens if a dict grows? Expanding it means changing the hashing function, which means the iteration order changes. So you might see time, name, message, level one time then message, model, time, level, name another - hard to read across lines!

Using a dataclass might prevent that by having a set dict size to use and/or allowing custom definition of the time order, e.g. time first, etc.

One thing we are trying to avoid is modifying the "private" _format on the fly in the format() method, as it's not thread-safe from what I read

That would make sense, although coroutines for async calls are running on a single thread, so it'd need a bit more thought as to whether there are any suspending points that might cause such issues.

💭 How about instead just fully defining the format, rather than doing string joins on the fly? Slightly duplicative, but also really explicit and obvious as to what's expected.

You are right about growing the dictionary and not preserving the order. However, I would say that if the structured logging is enabled, it most likely means the logs will be fed to a pipeline for processing and then to a log management system for further inspection. And also if we keep the current structured log format which is a string, the stdout will contain strings which resemble valid json and the order of the fields will not be changed across invocations. A side note from that, in Python specifically, if we have a pipeline that loads those logs via json.loads(), then the order of fields will not be guaranteed by default, but an additional param for OrderedDict can be passed to handle this case.

Hmm, could you give an example of fully defining the format without string joining 🤔

…is passed

vtaskow · 2023-07-26T09:08:04Z

mlserver/logging.py


-        logger.addHandler(stream_handler)
+    handler.setFormatter(ModelLoggerFormatter(settings))


If the CLI is used to run mlserver start ., first the CLI will configure a logger and then start the MLServer(settings) which will then again configure a logger but with settings. This is all within the same thread. In the previous version, the handler was initialised with the wrong formatter first and then when the MLServer class was configuring the logger, it wasn't setting a formatter because there was already a handler for the logger.

adriangonz

Changes look good from my side @vtaskow ! 🚀

I've added a minor nitpick but, as a nitpick, you can totally ignore it.

mlserver/context.py

602 Introduce str logging

42c163b

vtaskow requested a review from adriangonz July 21, 2023 10:36

vtaskow self-assigned this Jul 21, 2023

vtaskow added 3 commits July 21, 2023 13:22

Verify structured logging format for model details

6941b10

Add settings flag explanation

e5c3140

Refactor formatter docstr and varsd

56ae4f1

vtaskow changed the title ~~602 Introduce str logging~~ feat(logging): Provide JSON-formatted structured logging as an option Jul 21, 2023

Test structured logging formatting

2a01680

agrski reviewed Jul 21, 2023

View reviewed changes

Remove redundant dquote escaping in json logger format

7ac1275

vtaskow added the enhancement New feature or request label Jul 21, 2023

vtaskow changed the title ~~feat(logging): Provide JSON-formatted structured logging as an option~~ feat(logging): Provide JSON-formatted structured logging as option Jul 21, 2023

vtaskow marked this pull request as ready for review July 21, 2023 16:09

vtaskow added 2 commits July 21, 2023 17:10

trigger GitHub actions

262ca74

Shorten flag description

f50f63e

vtaskow marked this pull request as draft July 24, 2023 10:39

vtaskow added 2 commits July 24, 2023 14:16

Delete unused file

4b69b44

Override formatter on subsequent configurations when settings object …

cf68aa1

…is passed

vtaskow commented Jul 26, 2023

View reviewed changes

vtaskow marked this pull request as ready for review July 26, 2023 09:11

adriangonz approved these changes Jul 26, 2023

View reviewed changes

mlserver/context.py Outdated Show resolved Hide resolved

Lowercase global vars, they are not constants

3701066

adriangonz merged commit 57208d9 into SeldonIO:master Jul 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(logging): Provide JSON-formatted structured logging as option #1308

feat(logging): Provide JSON-formatted structured logging as option #1308

vtaskow commented Jul 21, 2023 •

edited

Loading

agrski Jul 21, 2023 •

edited

Loading

agrski Jul 21, 2023 •

edited

Loading

vtaskow Jul 21, 2023

vtaskow Jul 21, 2023

vtaskow Jul 21, 2023

agrski Jul 21, 2023

vtaskow Jul 21, 2023

vtaskow Jul 21, 2023

vtaskow Jul 26, 2023

adriangonz left a comment


		logger.addHandler(stream_handler)
		handler.setFormatter(ModelLoggerFormatter(settings))

feat(logging): Provide JSON-formatted structured logging as option #1308

feat(logging): Provide JSON-formatted structured logging as option #1308

Conversation

vtaskow commented Jul 21, 2023 • edited Loading

Background

Why

How

Formatting

agrski Jul 21, 2023 • edited Loading

Choose a reason for hiding this comment

agrski Jul 21, 2023 • edited Loading

Choose a reason for hiding this comment

vtaskow Jul 21, 2023

Choose a reason for hiding this comment

vtaskow Jul 21, 2023

Choose a reason for hiding this comment

vtaskow Jul 21, 2023

Choose a reason for hiding this comment

agrski Jul 21, 2023

Choose a reason for hiding this comment

vtaskow Jul 21, 2023

Choose a reason for hiding this comment

vtaskow Jul 21, 2023

Choose a reason for hiding this comment

vtaskow Jul 26, 2023

Choose a reason for hiding this comment

adriangonz left a comment

Choose a reason for hiding this comment

vtaskow commented Jul 21, 2023 •

edited

Loading

agrski Jul 21, 2023 •

edited

Loading

agrski Jul 21, 2023 •

edited

Loading