xml/operations-identity_alarmdefinitions.xml

<?xml version="1.0"?>
<!DOCTYPE section [
 <!ENTITY % entities SYSTEM "entity-decl.ent"> %entities;
]>
<section xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="identity-alarmdefinitions">
 <title>Identity Alarms</title>
 <para>
  These alarms show under the Identity section of the &productname; &opscon;.
 </para>
 <section>
  <title>SERVICE: IDENTITY-SERVICE</title>
  <informaltable colsep="1" rowsep="1">
   <tgroup cols="2">
    <colspec colname="c1" colnum="1" colwidth="30*"/>
    <colspec colname="c2" colnum="2" colwidth="70*"/>
    <thead>
     <row>
      <entry>Alarm Information</entry>
      <entry>Mitigation Tasks</entry>
     </row>
    </thead>
    <tbody valign="top">
     <row>
      <entry>
       <para>
        <emphasis role="bold">Name: HTTP Status</emphasis>
       </para>
       <para>
        <emphasis role="bold">Description:</emphasis> This check is contacting
        the keystone public endpoint directly.
       </para>
<screen>component=keystone-api
api_endpoint=public</screen>
       <para>
        <emphasis role="bold">Likely cause:</emphasis> The keystone service is
        down on the affected node.
       </para>
      </entry>
      <entry>
       <para>
        Restart the keystone service on the affected node:
       </para>
       <procedure>
        <step>
         <para>
          Log in to the &clm;.
         </para>
        </step>
        <step>
         <para>
          Use the keystone start playbook against the affected node:
         </para>
<screen>cd ~/scratch/ansible/next/ardana/ansible
ansible-playbook -i hosts/verb_hosts keystone-start.yml \
--limit &lt;hostname&gt;</screen>
        </step>
       </procedure>
      </entry>
     </row>
     <row>
      <entry>
       <para>
        <emphasis role="bold">Name: HTTP Status</emphasis>
       </para>
       <para>
        <emphasis role="bold">Description:</emphasis> This check is contacting
        the keystone admin endpoint directly
       </para>
<screen>component=keystone-api
api_endpoint=admin</screen>
       <para>
        <emphasis role="bold">Likely cause:</emphasis> The keystone service is
        down on the affected node.
       </para>
      </entry>
      <entry>
       <para>
        Restart the keystone service on the affected node:
       </para>
       <procedure>
        <step>
         <para>
          Log in to the &clm;.
         </para>
        </step>
        <step>
         <para>
          Use the keystone start playbook against the affected node:
         </para>
<screen>cd ~/scratch/ansible/next/ardana/ansible
ansible-playbook -i hosts/verb_hosts keystone-start.yml \
--limit &lt;hostname&gt;</screen>
        </step>
       </procedure>
      </entry>
     </row>
     <row>
      <entry>
       <para>
        <emphasis role="bold">Name: HTTP Status</emphasis>
       </para>
       <para>
        <emphasis role="bold">Description:</emphasis> This check is contacting
        the keystone admin endpoint via the virtual IP address (HAProxy)
       </para>
<screen>component=keystone-api
monitored_host_type=vip</screen>
       <para>
        <emphasis role="bold">Likely cause:</emphasis> The keystone service is
        unreachable via the virtual IP address.
       </para>
      </entry>
      <entry>
       <para>
        If neither the <literal>api_endpoint=public</literal> or
        <literal>api_endpoint=admin</literal> alarms are triggering at the same
        time then there is likely a problem with haproxy.
       </para>
       <para>
        You can restart the haproxy service with these steps:
       </para>
       <procedure>
        <step>
         <para>
          Log in to the &clm;.
         </para>
        </step>
        <step>
         <para>
          Use this playbook against the affected node:
         </para>
<screen>cd ~/scratch/ansible/next/ardana/ansible
ansible-playbook -i hosts/verb_hosts FND-CLU-start.yml \
--limit &lt;hostname&gt;</screen>
        </step>
       </procedure>
      </entry>
     </row>
     <row>
      <entry>
       <para>
        <emphasis role="bold">Name: Process Check</emphasis>
       </para>
       <para>
        <emphasis role="bold">Description:</emphasis> Separate alarms for each
        of these glance services, specified by the <literal>component</literal>
        dimension:
       </para>
       <itemizedlist>
        <listitem>
         <para>
          keystone-main
         </para>
        </listitem>
        <listitem>
         <para>
          keystone admin
         </para>
        </listitem>
       </itemizedlist>
       <para>
        <emphasis role="bold">Likely cause:</emphasis> Process crashed.
       </para>
      </entry>
      <entry>
       <para>
        You can restart the keystone service with these steps:
       </para>
       <procedure>
        <step>
         <para>
          Log in to the &clm;.
         </para>
        </step>
        <step>
         <para>
          Use this playbook against the affected node:
         </para>
<screen>cd ~/scratch/ansible/next/ardana/ansible
ansible-playbook -i hosts/verb_hosts keystone-start.yml \
--limit &lt;hostname&gt;</screen>
        </step>
       </procedure>
       <para>
        Review the logs in <literal>/var/log/keystone</literal> on the affected
        node.
       </para>
      </entry>
     </row>
     <row>
      <entry>
       <para>
        <emphasis role="bold">Name: Service Log Directory Size</emphasis>
       </para>
       <para>
        <emphasis role="bold">Description:</emphasis> Service log directory
        consuming more disk than its quota.
       </para>
       <para>
        <emphasis role="bold">Likely cause:</emphasis> This could be due to a
        service set to <literal>DEBUG</literal> instead of
        <literal>INFO</literal> level. Another reason could be due to a
        repeating error message filling up the log files. Finally, it could be
        due to log rotate not configured properly so old log files are not
        being deleted properly.
       </para>
      </entry>
      <entry>Find the service that is consuming too much disk space. Look at
      the logs. If <literal>DEBUG</literal> log entries exist, set the logging
      level to <literal>INFO</literal>. If the logs are repeatedly logging an
      error message, do what is needed to resolve the error. If old log files
      exist, configure log rotate to remove them. You could also choose to
      remove old log files by hand after backing them up if needed.</entry>
     </row>
    </tbody>
   </tgroup>
  </informaltable>
 </section>
</section>