Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Low: Bump some log messages to notice #205

Closed
wants to merge 1 commit into from
Closed

Low: Bump some log messages to notice #205

wants to merge 1 commit into from

Conversation

inouekazu
Copy link
Contributor

This request is a proposal patch.

Since the loglevel of some helpful log messages is info or debug, these are not outputted via syslog.
Therefore, bump up loglevel of these helpful log messeges to notice.

@beekhof
Copy link
Member

beekhof commented Jan 10, 2013

I find it interesting that after being crucified for a decade for logging too much, the moment I turn things down people want it all back on again.

The current policy is that messages like those you want turned up are not appropriate for syslog (which is why they are at 'info' or lower). People that want this level of detail are encouraged to specify a logfile in either corosync or /etc/sysconfig/pacemaker

I will take another look, but most of these changes will probably get NACK'd.
Especially in the PE which intentionally only logs the changes that will result from the current transition.

@beekhof beekhof closed this Jan 10, 2013
@inouekazu
Copy link
Contributor Author

I expound on necessity anew about each message.

In pacemaker-1.0, I'm monitoring the message of the following events during production run.
So I need to monitor these events (messages) in pacemaker-1.1 similarly. However, because a large quantity of messages are logged when PCMK_debugfile is used, I don't want to use it.

  1. operation start of a resource
    The "operation completion" message of a resource is notice. This is subject of surveillance.

    crmd[10158]:   notice: process_lrm_event: LRM operation prmDummy_start_0 (call=8, rc=0, cib-update=55, confirmed=true) ok
    crmd[10158]:   notice: process_lrm_event: LRM operation prmDummy_start_0 (call=19, rc=1, cib-update=61, confirmed=true) unknown error
    

    However, there is no notice message of an operation start of a resource.
    I need the notice message of an "operation start" including a resource name and operation, so I want to change this message into notice.

    crmd[10158]:    debug: do_lrm_rsc_op: Performing key=7:1:0:c7dfa3fe-0746-461d-b1b0-f9b6f4cf1d56 op=prmDummy_start_0
    

    inouekazu@a727333#L1L1665

  2. change of node status
    Since I need the notice message which shows the status of a node, I want to change these messages into notice.

    1. "start" of service stop

      crmd[11520]:     info: handle_shutdown_request: Creating shutdown request for dev1 (state=S_IDLE)
      

      inouekazu@a727333#L2L884

    2. node status was determined

      pengine[11519]:     info: determine_online_status: Node dev1 is shutting down
      pengine[11519]:     info: determine_online_status: Node dev2 is online
      

      inouekazu@a727333#L7L912

    3. node became offline

      crmd[11520]:     info: peer_update_callback: Client dev1/peer now has status [offline] (DC=true)
      

      inouekazu@a727333#L0L137

    4. node failed, ...

      crmd[10895]:     info: peer_update_callback: dev1 is now lost (was member)
      

      inouekazu@a727333#L0L124

  3. process creation of pacemakerd
    I think that it's necessary to monitor it since process creation of pacemakerd is an event which consumes computer resource.

    1. child process was created

      pacemakerd[11223]:     info: start_child: Forked child 11225 for process cib
      pacemakerd[11223]:     info: start_child: Forked child 11226 for process stonith-ng
      

      inouekazu@a727333#L8L266

    2. child process was terminated

      pacemakerd[11223]:     info: pcmk_child_exit: Child process crmd exited (pid=11230, rc=0)
      pacemakerd[11223]:     info: pcmk_child_exit: Child process pengine exited (pid=11229, rc=0)
      

      inouekazu@a727333#L8L172

  4. election of DC
    The contents of the message differ by DC node and other nodes. So I need these messages, in order to recognize DC.

    crmd[11520]:    debug: update_dc: Unset DC. Was dev2
    crmd[11520]:     info: update_dc: Set DC to dev2 (3.0.7)
    

    inouekazu@a727333#L5L1050

  5. update of an attribute (CIB)
    I need to receive the notice message including the update of an attribute (node name, attribute name and value), on DC node. Because it may become the trigger of resource movement depending on location constraints.

    crmd[12131]:     info: abort_transition_graph: te_update_diff:176 - Triggered transition abort (complete=0, tag=nvpair, id=status-2506467520-default_ping_set, name=default_ping_set, value=100, magic=NA, cib=0.5.26) : Transient attribute: update
    crmd[12131]:     info: abort_transition_graph: te_update_diff:176 - Triggered transition abort (complete=0, tag=nvpair, id=status-2472913088-default_ping_set, name=default_ping_set, value=100, magic=NA, cib=0.5.28) : Transient attribute: update
    

    inouekazu@a727333#L4L329

  6. final status of resource
    Messages other than "LogActions: Leave" including Start, Move, etc. are notice. (I'm monitoring the status of a resource from these messages.)
    Since I need to monitor the final status of a resource, please change "LogActions: Leave" into notice similarly.

    pengine[12130]:   notice: LogActions: Move    prmDummy     (Started dev1 -> dev2)
    pengine[12130]:   notice: LogActions: Stop    prmPing:0    (dev1)
    pengine[12130]:     info: LogActions: Leave   prmPing:1    (Started dev2)
    pengine[12130]:     info: LogActions: Leave   prmDummy     (Started dev2)
    pengine[12130]:     info: LogActions: Leave   prmPing:0    (Started dev2)
    pengine[12130]:     info: LogActions: Leave   prmPing:1    (Stopped)
    

    inouekazu@a727333#L9L1813
    inouekazu@a727333#L6L18

  7. operation of all resources
    I want to monitor the operation of all resources of the cluster, so I'm monitoring the following messages on DC node. Please change this message into notice.

    crmd[14356]:     info: te_rsc_command: Initiating action 2: stop prmDummy_stop_0 on dev1
    crmd[14356]:     info: te_rsc_command: Initiating action 9: start prmDummy_start_0 on dev2 (local)
    crmd[14356]:     info: te_rsc_command: Initiating action 10: monitor prmDummy_monitor_10000 on dev2 (local)
    

    inouekazu@a727333#L3L395

@beekhof
Copy link
Member

beekhof commented Feb 12, 2013

1 The logs you are looking for exist, but are from the lrmd and not appropriate for syslog.

why, how long, which arguments - all the kinds of information you need to diagnose problems afterwards belong application logfiles (eg. /var/log/cluster.log) not syslog.

2.1 pacemakerd logs are a better source for this
2.2 pengine tries to log changes only. as something that runs often it would be too verbose to log this every time.
crmd logs would be a better source.
2.3 and 2.4
logs from crm_update_peer_state and crm_update_peer_proc are better suited for this

3 There is a saying in english "No news, is good news".
If something bad/unexpected happened, we'll log it, for example:

        notice: pcmk_child_exit:    Child process lrmd terminated with signal 9 (pid=23119, core=0)
        error: pcmk_child_exit:     Child process crmd exited (pid=23122, rc=2)

but the normal starting/stopping of processes isn't something worthy of showing in syslog.

6 See 2.2 (apart from the bit about crmd logs)

I'll follow up on the others shortly

@beekhof
Copy link
Member

beekhof commented Feb 12, 2013

7 Ok. We can make that change.

@beekhof
Copy link
Member

beekhof commented Feb 12, 2013

4 is very much an internal detail. You can still look for it, just enable logfiles and monitor that instead

@beekhof
Copy link
Member

beekhof commented Feb 12, 2013

5 is again a debugging item. if you want to monitor updates to the cib, use the API - thats what its for.
Parsing logfiles is very fragile and likely to break any time we change the format/contents/function name.

And again, if you really have to, you can still look for it, just enable logfiles and monitor that instead.

@inouekazu
Copy link
Contributor Author

Thank you for a reply and advice. I will watch application logfile.
And so I have a request newly.

  • I need the INFO message of "operation start".
    The "operation completion" message of a resource is output with a priority of NOTICE, but INFO (or higher) message of the "operation start" corresponding to it is not output.
    I want to know what is performed from now on. Please merge a change of either.

    inouekazu@ea2591c
    inouekazu@79761c9

  • I want the INFO message of "reset DC."
    The "set DC" message is output with a priority of INFO, but INFO (or higher) message of the "unset DC" corresponding to it is not output.
    The contents of the message differ by DC node and other nodes. So I want this INFO message.

    inouekazu@9c8cd5c

@beekhof
Copy link
Member

beekhof commented Mar 4, 2013

Applied:

  • Kazunori INOUE (3 days ago) 166c670: Low: Bump the log of an operation start of a resource to info (HEAD, master)
  • Kazunori INOUE (3 days ago) dc5f117: Low: Bump the log of unset DC to info

@inouekazu
Copy link
Contributor Author

I confirmed it. Thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants