ServiceControl default log level (Warn) makes it difficult to investigate problems #1486

Closed
mikeminutillo opened this Issue Nov 6, 2018 · 7 comments

mikeminutillo commented Nov 6, 2018

The default log level for a new instance of ServiceControl is WARN. This is often insufficient to diagnose issues when customers send us log files.

It only helps when we've previously considered what could go wrong and have thought to log at the WARN level. Without any of the context of what else was happening in ServiceControl at that time, we often have to ask the customer to update their logging level (requires messing with a config file) and then restart their instance and hope that the problem occurs again.

SzymonPobiega commented Nov 6, 2018

@mikeminutillo do we need to perform some kind of load tests to validate if INFO or even more verbose is feasible? Would that be part of PoA?

SzymonPobiega commented Nov 6, 2018

@Particular/keeping-the-lights-on-squad sounds like something we can take care of? A low-hanging fruit?

mikeminutillo commented Nov 6, 2018

From memory, this was dropped down to WARN when we were concerned about Raven Perf and IO.

A long-running (24-hour) test with each transport to show how verbose the logs are is a good idea. If it's too verbose, then we should look at making better rules around what is logged at each level to try and improve the situation.

I am wondering if we should have a timed job that runs every hour or so that causes components to log vital stats at the INFO level. Something like:

INFO Audits - Last import: 2018-11-06 15:33:00
INFO Audits - Current Rate: 20 msg/sec
INFO Recoverability - Last message retried 2018-10-30 18:00:00
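
The timed job described above could be sketched roughly as follows. This is a minimal illustration in Python, not ServiceControl's actual (.NET) implementation; `VitalStatsReporter`, `register`, and `report_once` are hypothetical names introduced here, and the one-hour default interval is taken from the "every hour or so" suggestion.

```python
import logging
import threading
from typing import Callable, List, Optional, Tuple


class VitalStatsReporter:
    """Hypothetical helper: logs one INFO line per registered stat on a timer."""

    def __init__(self, interval_seconds: float = 3600.0) -> None:
        self._interval = interval_seconds
        # (component name, callable returning the text to log)
        self._providers: List[Tuple[str, Callable[[], str]]] = []
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def register(self, component: str, provider: Callable[[], str]) -> None:
        """Add a stat to report; a component may register several stats."""
        self._providers.append((component, provider))

    def report_once(self) -> None:
        """Log every registered stat at INFO, using the component as logger name."""
        for component, provider in self._providers:
            logger = logging.getLogger(component)
            try:
                logger.info(provider())
            except Exception:
                # A failing provider must not stop the other stats from being logged.
                logger.exception("Failed to collect vital stats")

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() has been called.
        while not self._stop.wait(self._interval):
            self.report_once()

    def start(self) -> None:
        """Start the background reporting thread."""
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        """Signal the reporting thread to exit and wait for it."""
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

A component would call something like `reporter.register("Audits", lambda: "Current Rate: 20 msg/sec")` once at startup. With a log format such as `%(levelname)s %(name)s - %(message)s`, the output resembles the sample lines above. Because the loop runs on its own thread, a regular heartbeat in the log also indicates that the reporter itself is still alive.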

SzymonPobiega commented Nov 6, 2018

Yeah. That would be awesome. Even something that shows that these components are still working i.e. the threads have not died would be good.

SzymonPobiega commented Nov 7, 2018

@mikeminutillo I think we can deal with at least part of this issue in a maintenance release

WilliamBZA transferred this issue from another repository Nov 12, 2018

WilliamBZA removed their assignment Nov 12, 2018

WilliamBZA commented Nov 12, 2018

Added to maintenance release list

SzymonPobiega commented Dec 6, 2018

Closed via #1518

SzymonPobiega added this to the 3.4.0 milestone Dec 6, 2018

SzymonPobiega changed the title from "ServiceControl default log level is insufficient" to "ServiceControl default log level (Info) makes it difficult to investigate problems" Dec 6, 2018

SzymonPobiega changed the title from "ServiceControl default log level (Info) makes it difficult to investigate problems" to "ServiceControl default log level (Warn) makes it difficult to investigate problems" Dec 6, 2018
