Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: improve stdlib logger #5874

Closed
ezrast opened this issue Mar 27, 2018 · 84 comments · Fixed by #8847
Closed

Proposal: improve stdlib logger #5874

ezrast opened this issue Mar 27, 2018 · 84 comments · Fixed by #8847

Comments

@ezrast
Copy link

ezrast commented Mar 27, 2018

This is piggybacking off of some chat discussion by @RX14 and @straight-shoota earlier today.

Currently there is no great way for library authors to log without friction. A library can't just write to STDERR arbitrarily because that's a potentially unwanted side effect. It instead has to take an optional Logger? that is nil by default, which requires boilerplate and is unlikely to actually be used.

If we instead establish a class-based logging hierarchy into the stdlib, library code can log unconditionally, with downstream users selectively enabling logs from the libraries they care about. I propose improving the situation by making the following changes:

Structure loggers hierarchically, with granular level controls

A Logger should be able to take a parent, to whom the actual log writing is delegated. By default, a child Logger's level should be nil, which means it inherits the level of its parent.

main_logger = Logger.new(STDOUT)
child = Logger.new(main_logger)
noisy_child = Logger.new(main_logger)

main_logger.level = Logger::WARN
noisy_child.level = Logger::DEBUG

child.info{ "This logger uses its parent's log level and this message does not get printed" }
noisy_child.info{ "This message will get printed because noisy_child has its own log level set" }

Make logging class-aware by default

It should be trivial to define a logger that is associated with a particular class or module. That logger will inherit from the logger of its parent in the module namespace hierarchy. Parents can be created automagically.

module MyProgram
  class App
    include Loggable # creates a Logger and adds helper methods e.g. `warn`, `info`
    def initialize
      info{ "creating an app" }
    end
  end
end

# Sets the effective log level for every Loggable under the MyProgram namespace.
# Note that the Logger for MyProgram got created behind the scenes at the same time as the Logger for App
Loggable.get_logger(MyProgram).level = Logger::DEBUG

# Logs the following (note the class name is included): 
# I, [2018-03-26 14:43:17 -07:00 #25575] INFO -- MyProgram::App: creating an app
app = MyProgram::App.new

Loggable will have a root level logger that by default does not log anything, so library code will stay quiet. To up the verbosity, just do:

Loggable.get_root_logger.level = Logger::DEBUG  # Enables logging for every class and module in your program, including libraries.

Unrelated to the above points, Loggers currently only operate on IO objects. Modern applications frequently eschew traditional log files in favor of structured logging, so this behavior should be more customizeable. Thus, I also propose to

Abstract Logger away from IO

Instead of storing an IO directly, Loggers should store a LogWriter, which is a module with abstract def write on it. The standard library should include an IOLogWriter class that takes on the formatting behavior of the current Logger, but is interchangeable with custom-defined classes like so:

class GelfLogWriter
  include LogWriter

  def initialize(@transport, @host, @port: 12201)
  end

  def write(severity, datetime, progname, message)
    # build the document and send it over the network here...
  end
end

Loggable.get_root_logger.handler = (GelfLogWriter.new :udp, "graylog.example.com")

I'll be happy to work on implementing these changes if the core team thinks they'd be useful.

@straight-shoota
Copy link
Member

The biggest selling point has not even been mentioned: class-aware loggers should also be hierarchical, meaning you can configure logger settings for MyProgram which also apply to enclosed namespaces like MyProgram::App unless the setting is overwritten somewhere down the hierarchy. This allows for very fine grained control of the log level for individual components.

It would be great to have this integrated in the stdlib so every shard can and should use it as a single logging solution which integrates with every Crystal library.

Just a note: I don't think a Loggable module makes much sense. Log methods should not pollute the instance namespace but use the logger as explicit recipient. Then it comes down to essentially LOGGER = Logger.new(self.class) or something similar in the class scope, which really doesn't need to be extracted in a module.

@ysbaddaden
Copy link
Contributor

  1. How does encapsulating loggers solve anything?

For example, this requires a child logger to know the parent app or library, but that's impossible unless the parent configures the child... why bother with encapsulation, then, instead of directly passing a logger instance?

  1. IMHO a better approach would be to introduce a global Logger actor. It would unning on its own fiber, and be configured once for the whole application (level, filters, output adapter such as IO, syslog, ...). Each library could then send it messages to log, with a library identifier to filter.

That doesn't solve the hierarchical issue, but it does solve the proliferation of having many logger instances.

@RX14
Copy link
Contributor

RX14 commented Mar 27, 2018

I think a better solution is to have many loggers, but centralised configuration. Each logger will know what namespace it's at (::HTTP::Client for example), then when you change configuration, the central logging configurator will apply rules based on starts_with?, ordered by priority according to size. For example ::.log_level will configure every logger (every logger's namespace starts with ::) and ::HTTP.log_level will configure just loggers in the stdlib's HTTP library. The ::HTTP configuration will always override the :: configuration because it's a longer string.

@ezrast
Copy link
Author

ezrast commented Mar 27, 2018

For example, this requires a child logger to know the parent app or library, but that's impossible unless the parent configures the child...

If you index loggers based on class name (as given by self.class.name), you can find a parent by splitting on "::" and dropping the last element.

My reasoning for using Loggable (which needs a better name but I'll go with it for the moment) was that the system really benefits from widespread adoption, and including a module is the most low-friction, crystal-idiomatic way to indicate that a class should do a thing. LOGGER = Logger.new(self.class) is less atomic than include Loggable. Without knowing anything else about the system, the latter says to me "this is the correct way to do loggy things" more than the former.

@RX14, I don't understand what that offers over what I or @ysbaddaden have suggested. Is the configurator setting log levels on each logger one by one? If I do ::Foo::Bar::LOGGER.level = Logger::INFO and then LoggerConfigurator.set_level(::Foo, Logger::WARN), what happens?

@straight-shoota
Copy link
Member

@ysbaddaden I agree logging should be global. Local Logger instances would just be thin wrappers and delegate to the main logger, decorated with the current type/logging scope.

The design described by @RX14 would probably work out pretty similar and it seems to me that it would merely be an implementation detail whether distributing log messages to subscribed handlers happens local with global config or global.

@ezrast Well, it could also be a macro (for example def_logger or whatever) instead of include Loggable. My point is that there is no need for having a module for nothing more than defining a single class constant. And IMO it shouldn't need anymore else. That's enough for Java's excellent logging system, why would we need anything more?

@ysbaddaden
Copy link
Contributor

I think a global / centralized Logger in Crystal makes little sense. Unlike Elixir, Crystal doesn't provide solutions to structure services in an application.

Having scattered logger instances is a pain, and the solutions proposed here are really nice, but they assume a particular way to design and model an application (namespaces, hierarchical class names, central configuration). Again: Crystal's stdlib doesn't provide nor encourages anything more than spawning fibers and communicating through channels. I'm afraid assuming more in Logger will make it a pain to use in different contexts. Assuming namespaces / class names are hierarchical and meaningful out of any context is very weird.

I believe it's a framework job to come with solutions for this. See for example Earl::Logger which assumes a lot about the program, but Earl's job is precisely to structure apps.

That being said Logger must evolve. For example we could support and thus allow the configuration of:

  • one to many handlers, so we could log to STDOUT and to a syslog server for example;
  • filter mechanisms;
  • maybe formatters (but that may be strongly related to handlers).

@j8r
Copy link
Contributor

j8r commented Apr 5, 2018

@ysbaddaden You're quite right if we build a big web application, but for smaller ones like a system tool/API server aLogger will be very welcome. A framework for this will be overkill in this case, maybe a shard but using directly the stdlib's logger would be nice!

@straight-shoota
Copy link
Member

It's true that Crystal doesn't rely on strictly hierarchical namespaces for library components. But I don't think that necessary limits the usage of a hierarchical logging system. The logger namespaces don't even have to correlate to Crystal namespaces. It doesn't have to assume anything. It just provides a simple interface that is capable of logging hierarchically. If and how this feature is adapted depends on how libraries and apps use it.
For shards it is common and encouraged to put library code in a namespace. That would be at least some form of namespacing used in Crystal.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Apr 5, 2018

@j8r Application size is irrelevant. Logger should be generic enough to be useful in any environment.

@straight-shoota I'm reading on java.util.logging and I'm revising my judgement. There is a single Logger from which you can request a logger instance, reating a hierarchy; namespaces are any dot separated string; you can configure loggers from anywhere; it doesn't assume much about application design. That looks good.

@bcardiff
Copy link
Member

bcardiff commented Apr 9, 2018

WDYT about defining just the interface of the logger and leaving configuration / instantiation to the framework?

That way the shards and components could agree on how to send info to the logger.

It would be comfortable to be able to pass around the instance of a logger from a parent to a child so for example an http server can pass the loggers to the handlers.

It would be nice to support some kind of namespace/activity logging configuration. And allow them to be specified while calling the logger so the instance can be shared between parent and child.

So, from the current log(severity, message, progname) we could introduce a context that I imagine it will match usually the namespace/class of the caller.

For tracking the activity I am not sure if should lay in the stdlib. Let me show an example to ilustrate more: let's assume there is an app with and api to register a user. The user registration involves some db statement and some call to external services. If we want to control the user registration is not enough to state the the db namespace or the http api namespace should be logged. So the code base needs to say that the activity been done is user registration. How this information is passed around is either explicit in every call (IMO ugly) or the responsibility is of the framework + logging system used. If we took this later approach, then there is nothing to do regarding the tracking of activity, just to not block the possibility.

So my proposal for the stdlib is to make a module that will define the interface of logging with an additional context. Have a basic STDOUT implementation that only obey severity, and allow logging framework shards to play nice with that idiom (implementing filters by context).

@ezrast
Copy link
Author

ezrast commented Apr 10, 2018

@bcardiff, Have you looked at #5921? Over there the interaction with IO is split out into an IOAdapter class that can easily be swapped out to provide arbitrary logging behaviors, and the Handler class can be improved to allow filtering above and beyond the standard log level threshold. I think it basically lines up with what you're saying, except that Handler provides a default way of filtering selectively based on context. Are you saying that you'd rather have this functionality omitted from the stdlib, or extracted into a concrete implementation of a more basic/abstract handler class?

@bcardiff
Copy link
Member

@ezrast there I checked the PR. I do meant to leave the stdlib as plain as possible without assumptions of the lifecycle / locking of loggers. I would vouch for

  • providing a basic logger implementation that will filter by severity level
  • change reduce the interface of logger to decouple implementors (and it's configuration) from consumers
  • provide idiom to: state a component will use logging and forward loggers from parent component to child components.

More or less this is cover in https://gist.github.com/bcardiff/c5126a9c19fe36987819bc6e86ea7d3f so we can check ideas in actual code.

I would leave more advanced configuration options and loggers architecture out of the stdlib but compliant with the interface exposed in the gist.

@RX14
Copy link
Contributor

RX14 commented Apr 11, 2018

@bcardiff regarding your gist, how would you centrally configure all the outputs for every logger? I also don't like the design because it uses macros and introduces methods into the classes which want to use loggers.

My ideal logging interface would be:

class Foo::Bar
  @log = Logger.get(self.class) # (same as Logger.get("::Foo::Bar") - the :: could be replaced by . for easy parsing)

  def do_something
    @log.trace "doing something"
  end
end

I'm think that loggers should have "one way to do things". Passing around a logger instance, each instance having a namespace which by default inherits it's formatter and loglevel from it's parent, sounds flexible enough to me. And if java's had good success with the same model, why not?

Configuring loggers then becomes easy like:

Logger.get("::").level = Logger::INFO
Logger.get("::").handler = Logger::IOHandler.new(STDERR) do |severity, datetime, progname, message, io|
    label = severity.unknown? ? "ANY" : severity.to_s
    io << label[0] << ", [" << datetime << " #" << Process.pid << "] "
    io << label.rjust(5) << " -- " << progname << ": " << message
  end
Logger.get("::HTTP::Server").level = Logger::DEBUG

The first two lines would be the default of course.

@bcardiff
Copy link
Member

I won't expect by default that the loggers have a logger factory pero consumer class. But rather pass the instance from parent to child component. Whether that instance is the logger itself or if it instantiate objects in the middle it's up to the logger implementation.

But if the Logger.get(self.class) or other global registry is used, then the user won't be able to override the logger per component instance.

In the gist the first parameter of Logger.using "app.foo" would have a default value mangling the namespace of @type, definitely. So if no the component has dependencies onlye a Logger.using would be needed.

I like how calling debug info as protected method keeps the code clean, but is a personal opinion. I could go with log.debug, but in order to do that, in my proposed design where logger instances is shared, the instance returned by log (in log.debug) should be a struct that has the context information for component where it is been called. I would try to make a gist with that option to illustrate if polluting the object protected methods is a strong no.

There are many opinions on how is better to configure the logging. I think we can leave that for the framework / app, and decouple.

@RX14
Copy link
Contributor

RX14 commented Apr 11, 2018

the user won't be able to override the logger per component instance

what do you mean?

In Java's scheme you never need to "override a logger", since the Logger class becomes a shell, delegating to the handler, which contains all the logic.

the instance returned by log (in log.debug) should be a struct that has the context information for component where it is been called

It seems we've arrived at essentially the same design - except that your context struct is renamed to Logger, is mutable, and that the instances come from a registry. The instances coming from a registry allow for more flexibility than reaching straight for macros.

The problem I have is with your design is that "passing the logger from parent to child" is boilerplate, and doesn't make sense for logging from class methods.

As a final note, my example above is actually incorrect, @log should be @@log.

@bcardiff
Copy link
Member

Let's say you create two db connection, that will create statements. If you don't allow logger per instances and passing loggers from parent you child, how are you going to allow different logging configuration between the two db connection?

Maybe I am missing something that allow that to work: like registering in a global logger configuration per instances...

It's true that I was not considering class level logging. But will do.

I play a bit on how to get rid of the method pollution from the previous proposal, add default naming context and bind that context when invoking the logger. The code might seem weird but it achieves that the memory footprint is just a reference to a Logger (8 bytes).

@RX14
Copy link
Contributor

RX14 commented Apr 11, 2018

@bcardiff there's absolutely nothing stopping you creating arbitrary named loggers whenever you want and passing them wherever you want. In the above example I was just expressing the common idiom. Having a registry with custom string namespaces allows you to implement whatever idiom you want in your code - even the common idiom of 1 logger instance per class becomes a single line. Your proposal requires a macro to implement that.

@bcardiff
Copy link
Member

The difference without your suggested approach where named loggers are created and passed around is that, once passed, the context will be the one used at the creation of the logger and not where the logger is been called.

Maybe it's enough, but I was aiming for using a single logger instance in different code context (with potentially different log levels)

I will give it another spin to see what happen...

@ezrast
Copy link
Author

ezrast commented Apr 12, 2018

@bcardiff, I dislike the coupling of modules that comes along with your forward mechanism; adding or removing a child module shouldn't force you to update the implementation of the parent.

Additionally, I'm under the impression that part of the point of centralized configuration is that it makes it easy to deal with logs coming from code you don't control. Under your paradigm, if I require shards that want to log things, then to change logging behavior I have to invoke logger= on the top-level module of every single shard (either manually or through a forwarder of my own), and hope that the shard author set up all their forwarding correctly. It's not a ton of work but it's not the most streamlined.

With centralized configuration I just set behavior once at the root level, and every logger does what I want by default unless explicitly overridden. I can still have different components log at different levels or to different places; the only difference is whether the configuration is all stored in a single structure or scattered throughout all of the program's logger instances.

@RX14
Copy link
Contributor

RX14 commented Apr 13, 2018

the context will be the one used at the creation of the logger and not where the logger is been called.

But that's usually what you want. It offers the flexibility of both. If you want to specify the logger namespace at the callsite just call .get(...).log, if you want it to be per-class, put in in a class var, if you want it to be per-instance, put it in an ivar, if you want the whole logger to be passed into the class just do it.

I'd love to hear the usecase you have in mind (in code) which is difficult with this approach.

@bcardiff
Copy link
Member

My previous proposed design is more about how I see the architecture of an app, but I don't want to impose the design, but rather find a balance between flexibility and comfort.

Given the last feedback and the idea to keep things as decouple and flexible as possible I came to https://gist.github.com/bcardiff/55ebbf2e20b1670681e9352fbaa51be0 where Loggers are consumed by the user and have a context determined on instantiation and LoggersHandlers that actually emit the information to a stream configured by set_handler_factory.

The gist is more similar to #5921 and support the usecase @RX14 vouch for I think. There some differences thought: the logger itself has no configuration of level, that is up to the handler factory.

The framework/app is required to configure a handler_factory to create/return, for a given context, a handler that will receive log(severity, message, context) message. The logger, used by the user has a reference to the handler and the context at creation. So invocation to #log(severity, message) and #{{severity}}(message) will be able to be forwarded to the handler with the context.

A issue with this last gist is that class variables in the stdlib would be initialized before the framework defines the handler factory. This issue is also present in #5921 I think.

To solve that I see two options, the first, which I don't like is to assume the configuration of the handler factory is done via monkey patching (really 👎, I don't want to relay on monkey patching for this).
The second is defer the invocation of the factory until the first log event. This is done in https://gist.github.com/bcardiff/ea21543eff6437c508abb48c060116ef .

I didn't spend time deciding which should be a reasonable default value and logger for the stdlib. Once we agree with the common interface we can move forward into that and implement nice logger tweaking configurations outside the stdlib.

@RX14
Copy link
Contributor

RX14 commented Apr 16, 2018

@bcardiff I'm fine with that interface, it looks super flexible and has basically the same interface to the logger - although it seems like it's harder to configure on the app's side than just assuming inheritance. I'd personally stick to making assumptions here, but i'll wait for others to weigh in.

@ezrast
Copy link
Author

ezrast commented Apr 17, 2018

Even with lazily instantiated handlers, there's no way to gracefully change log levels after program initialization. If you want to enter "debug mode" after you've been running for a while you have to force all your modules to update their loggers somehow.

#5921 doesn't have quite the same issue in practice. It's true that you can't make libraries use any handler other than the default, but using the default handler is what you're expected to do 99% of the time anyway, and you can change its levels at any time.

That does raise the question of why I even bothered letting other Handlers be instantiable, so I probably have some simplification to do there.

@bcardiff
Copy link
Member

The limitation which affect also #5912 that I was referring to is that any logger used in the std and initialized as a class variable wont be able to use an overridden configuration invoked after the prelude is required.

Whether or not a log level change is supported in runtime or only on boot, in last proposal it will be up to the implementation since there is actual no implicit behavior in the std lib part.

@straight-shoota
Copy link
Member

@bcardiff In the end, your proposal uses a centralized instance, the logger factory. In #5921 this is called default_handler and it's a handler instance, instead of a proc.

This setup can avoid the initialization issue when Logger#log delegates to Logger.default_handler. You don't get individual handlers for each logging source by default, so this really calls for a central registry based on namespaces to configure log levels etc. This registry might or might not be included in the stdlib implementation.

@RX14
Copy link
Contributor

RX14 commented Apr 17, 2018

@bcardiff my solution doesn't have that problem since .get is memoised. There's still the problem that you can log before the logger is configured, but I don't think that's solvable. At least you can configure all loggers regardless of when they're initialized, since you can always access any logger instance through a centralised interface.

@paulcsmith
Copy link
Contributor

paulcsmith commented Feb 19, 2020

@asterite you are right. Good point. I like that idea a lot. That way you don't need to set it yourself. You could also still configure it with Crystal's type-safety goodness 👍 I dig it. I imagine the interface can stay largely as planned that way

I'd suggest changing Log.get to Log.for(self). Since it isn't really getting a new one, and that way it discourages trying to use it as a getter in other places. But this a minor thing so totally fine if no one likes that 🤣

module DB
  Log = ::Log.for(self)
end

DB::Log.level = :info

@watzon
Copy link
Contributor

watzon commented Feb 19, 2020

I do agree that using a yaml config might not be the best idea. May as well just have the log config happen in code, no?

@bcardiff
Copy link
Member

Should the yaml config be a separate shard?

We aim for a design that will allow injecting ways providing the configuration. But without having one proposal, it will be chaotic for the user experience. How do I configure logging? It depends on the shard. And there is no default. That will cause a lot of stress because you will need to learn how each app or framework decide to do things. While that will still be an option, it is not required for all the apps.

Typos and unused/incorrect arguments

How to detect typo vs not-binded source? I'm not sure there is a way to solve.

The only thing I imagine is something that will emit the paths requested by the user, maybe a Log.verbose in log source when creating a log :-P. And leave the user to notice all the paths that are potentially used. Of course it won't work on dynamic sources.

Issue in wrongly config backends

It will be up to the backend, in the initialization to complain if something is wrongly configured.

The config type will support a narrow set of types similar to JSON::Any, but without schema validation.

Using concrete constants and methods

It does not work since Log = ::Log will not create a new instance; they are making an alias. It's hard to allow users create their own loggers but also constraint all the available options.

(ok, @asterite covered)

Backends

For IO based backends there might be some code reutilization available, but not all the backends are IO and formatted oriented. Elasticsearch could store the whole entry object.

New feature suggestion: key/value data

That would be the usecase of a context. We thought of adding some overloads to extend the context at the same time the message is sent, but we didn't reach an option that make us happy enough.

We also want to enforce there is always a message per log entry. Not just context data.

@asterite
Copy link
Member

Does this mean I won't need to do require "yaml" to work with YAML because Crystal's runtime will already require it? (which also means YAML needs to be parsed and compiled on every app)

@bcardiff
Copy link
Member

Oh, the issue is the libyaml dependency.

Ok, by default it will only read the CRYSTAL_LOG_LEVEL/CRYSTAL_LOG_SOURCES env variables and if require "log/config" is included in the app the yaml configuration will be available.

@jkthorne
Copy link
Contributor

I think it would be nice to not have YAML be a big of a part of the crystal project as it is now. I YAML has kind of won out. Here is a breakdown of some config languages and I like languages written in Crystal like Con. It would be nice to not have libyaml be required by a crystal app but it might happen.

@straight-shoota
Copy link
Member

@paulcsmith It is a necessary requirement that logging can be configured at runtime. Sysadmins need to be able to configure complex logging setups without having to rebuild the binary. That also means it's simply not possible to ensure type safety.

That being said, we can obviously expose an internal interface to the configuration. This would be useful for example for setting up default config. Or if your app doesn't need runtime configuration.

@ALL Please let's not get into detail about choice of configuration format and other specifics. The primary goal right now should be to discuss and implement the general architecture.
Details of automatic configuration is two steps ahead. Contemplating on that only keeps us from discussing what's on the table right now.

@paulcsmith
Copy link
Contributor

Thanks for the response @bcardiff

Should the yaml config be a separate shard?

We aim for a design that will allow injecting ways providing the configuration. But without having one proposal, it will be chaotic for the user experience. How do I configure logging? It depends on the shard. And there is no default. That will cause a lot of stress because you will need to learn how each app or framework decide to do things. While that will still be an option, it is not required for all the apps.

I definitely agree it should not be separate. There should be one sanctioned way to do things so config is not different for every shard. My suggestion is to not use YAML at all and use code instead. You can inject configuration using Crystal code, and if you really want to you can ready YAML or ENV vars or whatever using Crystal, but that is up to the end user.

Typos and unused/incorrect arguments

How to detect typo vs not-binded source? I'm not sure there is a way to solve.

I think this could be solved by only allowing config via the constant. Then you will get compile time guarantees from Crystal, but maybe I am misunderstanding. I think the name binding is only helpful for YAML, but if that is not used then it is no longer an issue.

The only thing I imagine is something that will emit the paths requested by the user, maybe a Log.verbose in log source when creating a log :-P. And leave the user to notice all the paths that are potentially used. Of course it won't work on dynamic sources.

Issue in wrongly config backends

It will be up to the backend, in the initialization to complain if something is wrongly configured.

The config type will support a narrow set of types similar to JSON::Any, but without schema validation.

I think this is not making the best of Crystal's awesome type-system and compile time guarantees. If instead it is configured in code (not from YAML) you get nice documentation on what args and types are used, and you get nice compile time errors pointing to the exact problem and where the code is that caused it.

Using concrete constants and methods

It does not work since Log = ::Log will not create a new instance; they are making an alias. It's hard to allow users create their own loggers but also constraint all the available options.

(ok, @asterite covered)

Backends

For IO based backends there might be some code reutilization available, but not all the backends are IO and formatted oriented. Elasticsearch could store the whole entry object.

That makes sense 👍 But I still think that using a module instead of class is better. So that if you do have a base class you want to use, you can

New feature suggestion: key/value data

That would be the usecase of a context. We thought of adding some overloads to extend the context at the same time the message is sent, but we didn't reach an option that make us happy enough.

We also want to enforce there is always a message per log entry. Not just context data.

Can you explain a bit more about why you want to enforce a message per log entry? At Heroku and in my own apps we almost always use key/value data for everything: Log.info(operation_name: "SaveUser", author_id: 123) with no particular "message". We just want to log some info about how things went. Would that be possible in the proposed solution?

Under the hood I imagine it could use existing constructs:

def log(data)
  Log.context.using do
     # Set the context data
  end
end

About YAML

I think I'm confused because I don't see the advantage of using YAML over Crystal code. Crystal code is safer at compile time, way more flexible (can use ENV, can use conditionals, etc). But maybe I'm missing why it is useful. Could some elaborate?

@paulcsmith
Copy link
Contributor

paulcsmith commented Feb 19, 2020

@paulcsmith It is a necessary requirement that logging can be configured at runtime. Sysadmins need to be able to configure complex logging setups without having to rebuild the binary. That also means it's simply not possible to ensure type safety.

All my examples showed configuring logging at Runtime. I was never suggesting anything else

# At runtime
Log.level = :info
MyBackend.formatter = MyFormatter

You can also use ENV variables if needed to allow even more flexibility. But maybe I'm misunderstanding what you mean. What does YAML give us that Crystal code does not? If we add a less compile safe and less flexible way of configuring logging I think it would be good to know what the advantages are.

That being said, we can obviously expose an internal interface to the configuration. This would be useful for example for setting up default config. Or if your app doesn't need runtime configuration.

@ALL Please let's not get into detail about choice of configuration format and other specifics. The primary goal right now should be to discuss and implement the general architecture.
Details of automatic configuration is two steps ahead. Contemplating on that only keeps us from discussing what's on the table right now.

I don't believe this is true. The choice to use YAML affects the interfaces. If we use config in Crystal then we can use concrete types and arguments, and don't need to "register" backends or set a name for pools or do validation of backend args at runtime. Most all of it can be done by the compiler. It also means configuration of backends can happen without a hash type and can include a more concerete set of types and type arguments. If we can do everything in code that we can in YAML then I don't think we should use YAML. But if YAML gives us something awesome that's great! I'm curious what it gives us though

@straight-shoota
Copy link
Member

@paulcsmith Log.level = :info is not runtime configuration because it is defined at compile time. Configuration written in Crystal can't be changed without rebuilding. You could set up some flags to select between different config sets. But configuration for an entire logging setup can be pretty complex with maybe dozens of backends and rules. No in-code configuration can provide the necessary flexibility for that.

Which data format to use for that is really not the point right now. The important part is, there needs to be some kind of configuration that's not done in code while compiling an application. Instead, a user/sysadmin/devop should be able to edit a simple text format to configure the logging behaviour before executing the app. Or even change it without restarting the application.

Strictly, this all wouldn't even need to be in the default stdlib implementation (although there are good reasons for that, as per @bcardiff ). But the stlib configuration mechanism must be flexible to support this. It can't just work with compile time safety, because there is no such guarantee when configuration values come from outside the compilation process (i.e. at runtime).

@paulcsmith
Copy link
Contributor

paulcsmith commented Feb 19, 2020

@straight-shoota Ah I think I am starting to see what you mean. However even with YAML you would have to restart the app. Unless we also add some kind of file tracking that reloads configuration automatically when the YAML changes, but that seems a bit much.

Personally I've never worked with anyone making such signicant changes to a running system like you are proposing. If there are changes it is done very purposefull and an ENV var is introduced to configure particular parts without requiring a redploy (just a restart).

But also, I don't think we should shut down this conversation until we have a clear understanding on both sides, because I think that will help us come to an really great solution and better understanding of how people are using this 👍

@paulcsmith
Copy link
Contributor

After some discussion with @straight-shoota i understand the argument for yaml now, and also get where I did a bad job with my examples. Thanks for helping me work through some of that.

I think we can do some kind of approach that blends the best of both worlds and good to have concrete examples tomorrow that we can talk about and see if people like it/hate it

@RX14
Copy link
Contributor

RX14 commented Feb 20, 2020

I'm fine with the yaml config being in the stdlib if it requires an explicit require "log/config". require "log" shouldn't introduce any runtime behavior by itself. This means that shards can require log safely, always.

I'm glad we mostly seem to be agreed on the core proposal anyway.

@bcardiff bcardiff mentioned this issue Feb 24, 2020
@paulcsmith
Copy link
Contributor

paulcsmith commented Feb 25, 2020

@bcardiff After a few discussions with others I realized why people may need YAML config, and I also realized I might have misunderstood what affect YAML config would have on the Logger proposal.

My original thinking was that the YAML config would affect the backends (and possibly the Log classes). I still think that may be the case, but would love clarification.

I thought based on the examples that backends would be required to have an initializer that accepts Log::Backend::Config. Is that the case?

For example, I believe I could not do this because it does not know how to accept the YAML config:

@[Log::RegisterBackend("custom")]
class MyCustomBackend < Log::Backend
  def initialize(path : String)
  end
end

MyCustomBackend.new(path: "/foo/bar")

Can you confirm if that is correct or not? If it is I've got some ideas. If you all are set on the current approach that is ok. LMK and I'll back off :) But if you're still open to ideas I think I have some that may be pretty cool!

@bcardiff
Copy link
Member

I thought based on the examples that backends would be required to have an initializer that accepts Log::Backend::Config. Is that the case?

Yes

The yaml config would be

backends:
  custom:
    path: /foo/bar

And the code something similar to:

@[Log::RegisterBackend("custom")]
class MyCustomBackend < Log::Backend
  @path : String

  def initialize(config : Log::Backend::Config)
    @path = config["path"].as_s? || DEFAULT_PATH
  end
end

Sija referenced this issue Feb 25, 2020
Cleanup context on specs that changes it
@straight-shoota
Copy link
Member

@bcardiff IIUC, the initializer accepting Backend::Config is only required when it's supposed to be used with dynamic configuration. Also, it doesn't mean that there may not be a specialized initializer to expose a dedicated API for programmatical configuration. I suppose that's what Paul's interested in.

@Blacksmoke16
Copy link
Member

I'm assuming there's reason this couldn't be done with YAML::Serializable? That would remove the need for an "Any" style object.

Something like I mentioned #5874 (comment).

@paulcsmith
Copy link
Contributor

paulcsmith commented Feb 25, 2020

@bcardiff Thank you for clarifying, and yes @straight-shoota that is partially true. I could add an intializer that is better suited to code, but the default initializer for the Backend::Config will be there. So if I write a shard, I would need to implement both otherwise the backend would not work for people with YAML. I suppose I could just leave the YAML initializer and tell people to use the code version but then that seems unfair to those using YAML and very unexpected since the method is there but just wouldn't do anything. @straight-shoota here's kind of what I mean:

# Leaving out boilerplate
class MyBackend
  def initialize(@foo: String)
  end

  # If someone tries to set `foo: "bar"` in YAML it will not be set 
  # unless I add an initializer to handle the YAML as well:
  def initialize(config : Config)
    new(foo: config["foo"])
  end
end

I've got some ideas, but will try to write up actualy samples since I'm bad at explaining in words 🤣. Would also be happy to hop on a call sometime if I'm still talking nonsense! The rough idea is that I think YAML config should be talked about separately and Crystal should have a way to configure modules in stdlib or with a shard. So all shards can use it, not just the logger. And because of that, I think the Log implementation would be better off as code-first and the config idea I have would kind of work "automatically". This would lead to more type-safety when configuring with code, less duplication, and better docs (since a code-first approach means you can set the exact named args and types you want)

@straight-shoota
Copy link
Member

Another issue, that hasn't been touched yet: It would be great to be able to modify configuration settings (especially source/severity routing) during the runtime of a program. That doesn't need to be part of the initial implementation, but it should be possible to add later.
Right now Log is immutable, but I figure it would technically allow modifications. This should probably work through the builder revisiting its logger list. Not a public API to modify an individual instance.

The use case would be long-running applications where you want to change for example the severity level or log destination while the application is running. It shouldn't be necessary to restart the application just to update the logging configuration.

@paulcsmith
Copy link
Contributor

paulcsmith commented Feb 25, 2020

Ok I gave a shot at writing it down 🤞

My goals for this comment

  • See if we can remove the YAML config from Logger completely for this first pass
  • Propose that we discuss how to make Config first-class in Crystal for all libs not just Log

Proposal for code-first approach and no YAML (at first)

First I'd like to show why I think a code-first approach makes for safer, better documented, and more flexible code. I will address my thoughts on the YAML config later in the document.

# Leaving out boilerplate for now
class MyFileBackend < Backend
  # The advantage of this is:
  #
  #   * Great and automatic documentation of the types and args!
  #   * Awesome compile time messages if you typo, give it wrong type,
  #     forget an argument
  #   * Doesn't require YAML
  def initialize(@file_path : String)
  end

  def write(message)
    # pseudo code
    @file_path.write_to_it(message)
  end
end

# Of course if YAML is kept as proposed we *could* do this in code.
# The problem is:
#
#   * Litte type-safety. `{"fil_path" => "foo"}` would not fail at compile time
#   * Validation of args must be hand rolled by the backend instead of the compiler
#   * The author must manually write documentation on what args it accepts
#   * Usage in code is not as smooth
MyFileBackend.new({"file_path" => "some-path"})

# As opposed to this:
#
#   * Compile time guarantees
#   * Can accept procs if needed
MyFileBackend.new(file_path: "some-path")

An author could implement a code approach and a YAML approach in the Backend, but that means:

  • You need to duplicate code
  • Stil need to hand roll validations in the YAML config (for example, unkown args)
  • If third party shards just implement the YAML config you have no choice but to use YAML, or use the not-so-type-safe Hash like config object (or monkey path their backend 😜)

Ok, but what about people that do need YAML config?

  • YAML is helpful for pre-built binaries so you can configure without modifying the source code, so it'd be nice to have something for those people
  • Some people just like YAML config :)
  • YAML config should not make code worse for people not using YAML
  • If people want YAML config for Log, I'm sure they want to use it in other places too!

My proposal: a totally separate issue/PR for Crystal configuration for all libs (not just Log)

  • Shards/libs can use it for all their code :D
  • Standardized config for all projects to use so there aren't a bunch of ways to handle config (I love this about Elixir)
  • Log can use a code first approach to start. Easier to code (not so many types, no need to implement or maintain the Config alias).
  • Will make Log improvements easier to ship since we can split discussions and won't have too much stuff in one PR

How would it work

I think this can all be discussed in a separate PR and Log can move forward with a code-first approach, but I want to show a potential approach just so people can see how this might work.

This approach has been tested/used extensively in Lucky with Habitat and works wonderfully. I think we can extend it so people can use JSON/YAML with it too!

Something like this will allow:

  • Automatic documentation since we're using crystal methods and types for each arg
  • All the power of code in configuration (env vars, conditionals, procs, etc.)
  • Really nice errors and tooling to help people debug config super easily (will show examples later :D)
  • All crystal libs to share common conventions for config

The Settings module (or whatever we want to call it)

First I'll start with non-Logger code to show some examples:

class MyStripeProcessor
  Config.setup do
    # These set up class methods and instance methods (that call the class methods)
    setting private_key : String
    setting callback_url : String?, example: "https://myapp.com/strip_webhook".dump # Dump wraps it in quotes
    setting some_object : SomeObject?, example: "SomeObject.new"
  end

  def charge(amount)
    # Settings are usable in instances of an object
    Stripe.charge(amount, private_key, callback_url)
  end

  def self.charge(amount)
    # And as class methods too
    Stripe.charge(amount, private_key, callback_url)
  end
end

And to configure it in code:

MyStripeProcessor.configure do |settings|
  settings.private_key = "Something-secret" # or ENV["SO_SECRET"], etc.
  settings.some_object = SomeObject.new
end

I'll get into how we make it so you can make the types non-nilable when not using an initializer later, but suffice to say there is a pretty good solution.

Great, but what about YAML/JSON/non-code approaches

I have not done this with Habitat (what my proposal is
based on), but I think we could totally make it work in some really cool
ways.

require "yaml_config"

# Call this instead of (or in addition to) the code approach
YAMLConfig.load("config.yml") # Or ENV var, etc.

# Can still modify with code for some libs
MyShard.configure do |settings|
 settings.a_proc = -> {} # Procs can't be created from YAML so do it in code
end

# We'll talk about this later
Config.raise_if_missing_settings!

The YAML in "config.yml":

# Inferred from class name. Maybe we can allow specifying a name: `Config.create("stripe")`
my_stripe_processor:
  private_key: "123-abc"
  callback_url: "myapp.com/foo"
lucky_server:
  secret_key_base: "123-abc"
my_backend:
  file_path: /foo/bar/log
# This may require tweaking as it is not tested, but I believe we can do something like this
# Maybe by adding optional processors to YAMLConfig.load
# Or having a `Log::YAMLConfig` that just looks at `bind_logs` and loads it up. These are implementation details that I'm sure we can figure out
bind_logs: 
  sources:
    - db.*
   level: WARN
   # etc.

I think this is pretty cool. You get powerful YAML config automatically, but you also get type-safe code configuration that you can use without or in addition to YAML. You could also see a JSONConfig.load, etc.

I think the Config.create macro could have some hooks for loading in YAML/JSON config or whatever. THis extensibility means people could do all kinds of cool things. Like loading from Hashicorp Vault, or whatever they want. But they can all be seprate shards.

Some other nice stuff with this approach

  • Would allow cool tooling, like crystal config that would show all the configurable code, what the settings are, what the types are, etc. or crystal config Lucky to only return stuff for a given constant
  • Allowing YAML + Code means people can use Procs or other objects in code pretty easily in addition to YAML
  • Since we have a nice Setting DSL we can make some nice validaiton messages for YAML automatically, similar to the runtime error I showed earlier.
  • The code is actually pretty minimal (257 LOC, 60% of which are comments!). See what we've got so far: https://github.com/luckyframework/habitat/blob/master/src/habitat.cr
  • Config.raise_if_missing_settings! If you forget to set a required setting this error is raised with helpful example:

Screen Shot 2020-02-25 at 11 35 56 AM

  • Really easy/cool way to temporarily override config. Super helpful in tests:
MyStripeProcessor.temp_config(callback_url: "fake_url.com") do
  # Callback_url will be fake_url.com in this block and revert afterward.
end

Concerns

  • We'd have to think about how to make the config really flexible to support some of the config in @bcardiff's proposal. I think it is totally doable though Updated with example!
  • Might take a bit longer to get YAML config out, but I think the end product will be awesome and would be cool to have a nice
    standard way of configuring libs
  • Some people may want to set config at an instance level. In those cases I'd say they should use a regular initializer or extend the class and configure it. It can inherit the parent config and allow changing just the parts you want

Example with Backend and new setting class

class MyBackend
  Config.create do
     # I've got ideas for how to modify processors per-logger, but this is jsut a simple example for now
     setting.file_path : String
  end
end

In conclusion

I propose:

  • We modify the Log PR to use a code-first approach (no YAML config or Config alias) and focus on getting that out.
  • Open a separate PR discussion for Config in Crystal! I would be happy to start this and even work on the PR as I am very excited about it and already have a lot of code ready to use.

@bcardiff, @RX14, @straight-shoota and other core team: If this sounds good, LMK and I'll get started on the Config PR/issue. I'm confident we can make something incredibly cool and more nice to use/flexible than anything out there for config! :)

@bcardiff
Copy link
Member

.. the initializer accepting Backend::Config is only required when it's supposed to be used with dynamic configuration.

There is nothing preventing you to initialize a Log and Log::Backend manually, but they will be out of the loop of the Log.for factory method (?)

.. So if I write a shard, I would need to implement both otherwise the backend would not work for people with YAML.

My idea of the Log::Backend::Config is to be independant of the format used for configuring it. Maybe someone prefers a json o toml based configuration, but we don't want to make them reinvent the mechanism of instantiating and wireing logs and backends, hence the builder.

Another issue, that hasn't been touched yet: It would be great to be able to modify configuration settings (especially source/severity routing) during the runtime of a program. That doesn't need to be part of the initial implementation, but it should be possible to add later.

waj and I discarded that requirement. We usually restart the app on a config change, even the logger. But yeah, something like keeping track of the log to reconfigure them could be done without changing the API. The only clash is when, during development you manually changed the log level. Upon reconfiguration, which should be the final one.

I think this will be the responsibility of the builder, and signaling a reload might need some new public api to do so. Doable but discarded for now based on personal experience.

We modify the Log PR to use a code-first approach (no YAML config or Config alias) and focus on getting that out.

Even if macros are used to introduce a config notion, I don't see how you will get rid of Log::Backend::Config. You might be able to hide it, but removing it completely, I am not sure you can.

Having a Log::Backend::Config with nothing else allows you to code validation that can't be expressed by only typing rules.

I like the story of having a configuration mechanism, but it requires some design cycles still.

The current proposal does not require yaml at all, but there is an implementation that can use that format. In the same spirit, having a config macro in the backends could wire that mechanism on the loggers.

(I need to keep thinking on this topic)

@paulcsmith
Copy link
Contributor

There is nothing preventing you to initialize a Log and Log::Backend manually, but they will be out of the loop of the Log.for factory method (?)

Yes you can, I just think it is less than ideal because you have to give it a hash which is not particularly type-safe, worse documentation, etc. as mentioned in the example

My idea of the Log::Backend::Config is to be independant of the format used for configuring it. Maybe someone prefers a json o toml based configuration, but we don't want to make them reinvent the mechanism of instantiating and wireing logs and backends, hence the builder.

I love this idea! I just fear that by doing this we make things worse for documentation, and a code-first approach. If you do a code-first approach you can still do more validation outside the type-system. The settings stuff can map YAML/JSON to code without a Backend::Config. At least, I'm fairly certain it can. Can you share why you think that is necessary?

My main thought is that it can be done without Backend::Config and would still be able to work with YAML or JSON. Would be happy to chat or pair through it! I'll also take a look at the PR to see if I maybe that sparks an idea

@straight-shoota
Copy link
Member

I might be missing something here, but what do we need Backend::Config for exactly? It's basically just a facilitator between config format and implementation. Functionally it seems to not be really necessary. Instead we could just pass the YAML parser directly the backend implementation. This direct dependency might not b ideal, but I would prefer that over a hardly useful interface in between.
Supporting other config formats would be more challenging, because they would need to build custom mappings to the existing backends. But I'm not sure whether that would be such a big issue at all.
And maybe this would be another incentive to develop a cross-format serialization framework like serde =)

@paulcsmith
Copy link
Contributor

paulcsmith commented Feb 25, 2020

@straight-shoota That makes sense. I think this could be further improved however using the example above. Having a config DSL that would allow code/YAML/JSON whatever. That way you do not need to really worry about it.

I have a proof of concept here that does not need YAML or the intermediary Backend::Config type. This would greatly simplify the implementation of the Log PR because you could split the Log PR from config almost completely.

Roughly what I think it could start out as:

# First set of Log PR changes. Consider using regular Crystal:
class OriginalLogBackend
  def initialize(@file_path : String)
  end

  # or
  class_property! file_path : String?
end
 
# my_backend = OriginalLogBackend.new(file_path: "/foo/bar")
# Log.for(DB::Pool).backends << my_backend
 
# Though this doesn't have YAML log config, it is still much better than what we have today! 
# We get context, nicer 'Log' name, etc.
# We miss the fancy config (for now), but we can build that in as a separate step.

What it could be later on once we've nailed a nice config PR

Note this is UGLY and not even close to production ready but it shows how a simple config DSL could be used to handle YAML/JSON/whatever without a Config hash (just code) and instead using class properties (or could be an initializer or whatever). Also shows you can hook into it to build YamlConfig, JsonConfig, YamlLogConfig (for cool stuff like bind) etc.

This is not to get feedback on how exactly it works. I just wanted something to work as soon as possible to show that it is possible to do config without the Backend::Config type.

require "yaml"

# Proof of concept for how we can use an approach for code/yaml/json
# Similar to Lucky's Habitat, but with coolness for handling YAML config
module Config
  TYPES_WITH_CONFIG = [] of Config
  
  macro included
    {% TYPES_WITH_CONFIG << @type %}
    CONFIG_KEY_NAME = {{ @type.name.stringify.underscore }}
    
    def self.load_runtime_config(config)
    end
    
    # This is bad and inflexible, but was easy to do. Don't judge based on this :P
    def self.format_config_value(config_type : String.class, value) : String
      value.as_s
    end
    
    def self.format_config_value(config_type : Int32.class, value) : Int32
      value.as_i
    end
    
    def self.format_config_value(config_type, value)
      raise "Don't know how to handle #{config_type} with value #{value}"
    end
  end
 
  macro setting(type_declaration)
    class_property {{ type_declaration }}
    
    def self.load_runtime_config(config)
      previous_def(config)
      
      # Load known config keys and cast to the correct type automatically
      if value = config[{{ type_declaration.var.stringify }}]?
        @@{{ type_declaration.var }} = format_config_value({{ type_declaration.type }}, value).as({{ type_declaration.type }})
      end 
    end
  end
end
 
# This could also be a JSON loader, a VaultLoader, or whatever!
module YamlConfigLoader 
  macro load(yaml_string)
    yaml = YAML.parse({{ yaml_string }})
    {% for type_with_config in Config::TYPES_WITH_CONFIG %}
      if values = yaml[{{ type_with_config }}::CONFIG_KEY_NAME]?
        {{ type_with_config }}.load_runtime_config(values)
      end
    {% end %}
  end
end
 
class LogBackend
  include Config
 
  setting retries : Int32 = 5
  setting file_path : String = "foo"
end
 
puts LogBackend.file_path # foo
puts LogBackend.retries # 5
 
YamlConfigLoader.load <<-YAML
log_backend:
  file_path: "set/with/yaml"
  retries: 10
YAML
 
puts "---after yaml config is loaded"
 
puts LogBackend.file_path # set/with/yaml
puts LogBackend.retries # 10

https://play.crystal-lang.org/#/r/8mhp

So I'm thinking we could do a log implementation without the Config type at all at first and instead use regular initializer or class_property. Then tack on config later that works for logger as well as all Crystal code.

Another alternative is move forward with this YAML approach for now with the Backend::Config type, and then later on clean it up once we have Crystal config code figured out. That could also work but seems like duplicated work if work on one version now, and then have to redo parts of it later. Thoughts? I think either way could work, but IMO going without the YAML config for the first pass would be easier/simpler. But up to you of course! @bcardiff

I think either way I'll take a crack at some config code that will allow us to do this kind of thing but far more flexibly and with the other goodies mentioned before like nice error messages and such

@straight-shoota
Copy link
Member

That whole general config thing is a bit overwhelming, unfortunately. I'm not sure about what to make of it, because I see a lot of potential issues with such an aproach that tries to solve everything for everyone. It's definitely a nice idea, though. And maybe it's not that different from a general serialization framework... Anyway, it's really out of scope here.
It might be a good idea though to focus on Log first, but without the dynamic configuration part. Divide and conquer.

@paulcsmith
Copy link
Contributor

It might be a good idea though to focus on Log first, but without the dynamic configuration part. Divide and conquer.

Totally agree! Just wanted to show it is possible so that we can remove the dynamic configuration/YAML stuff in this PR (if you all want) and be confident it can be handled separately

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.