Checks number of simultaneous threads (TEST)
====
This notebook checks whether the number of simultaneous threads on any Frontier server exceeds **threadlimit**. If it does, it sends a mail to everyone subscribed to that alert. It is meant to run every half hour from a cron job (not yet set up).

In [1]:
from subscribers import subscribers
import alerts

import datetime
import re
import json
import sys

from elasticsearch import Elasticsearch
es = Elasticsearch([{'host':'atlas-kibana.mwt2.org', 'port':9200}],timeout=60)

### Variables for script

1. Minimum number of simultaneous threads beyond which we submit the alert
2. Number of hours for query interval

In [6]:
# Thread limit to trigger an alarm
threadlimit=400
# Period to check from now backwards
nhours=1

### Get starting and current time for query interval 

We need :
1. Current UTC time (as set in timestamp on ES DB)
2. Previous date stamp (**nhours** ago) obtained from a time delta

In order to subtract the time difference we need **ct** to be a datetime object

In [7]:
# Get current UTC time (as set in timestamp on ES DB)
# In order to subtract the time difference we need ct to be a datetime object
ct = datetime.datetime.utcnow()
ind = 'frontier-new-%d-%02d' % (ct.year, ct.month)
print(ind)
curtime = ct.strftime('%Y%m%dT%H%M%S.%f')[:-3]+'Z'

td = datetime.timedelta(hours=nhours)
st = ct - td
starttime = st.strftime('%Y%m%dT%H%M%S.%f')[:-3]+'Z'

print('start time', starttime)
print('current time',curtime)

frontier-new-2017-11
start time 20171106T153539.545Z
current time 20171106T163539.545Z
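
The interval computation can also be wrapped in a small function so that it can be checked against a fixed clock. This is only a sketch: the name `query_window` is hypothetical, and it assumes the same monthly index naming and `basic_date_time` timestamp format as the cell above.

```python
import datetime

def query_window(now, nhours):
    """Return (index, starttime, curtime) for the nhours ending at `now`.

    `now` is expected to be a datetime expressed in UTC.
    """
    def stamp(t):
        # %f gives microseconds; dropping the last 3 digits leaves milliseconds
        return t.strftime('%Y%m%dT%H%M%S.%f')[:-3] + 'Z'

    start = now - datetime.timedelta(hours=nhours)
    # The index name is derived from the month of `now` only
    index = 'frontier-new-%d-%02d' % (now.year, now.month)
    return index, stamp(start), stamp(now)

fixed = datetime.datetime(2017, 11, 6, 16, 35, 39, 545000)
print(query_window(fixed, 1))
```

Because the index name uses only the month of the current time, an interval that starts in the previous month would presumably miss the documents stored in the previous month's index. Whether that matters depends on how the indices are actually laid out.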


### Establish connection to ES-DB and submit query

Query the ES-DB for the highest number of simultaneous threads recorded on each Frontier server during the given time interval, then keep only the servers whose maximum exceeds **threadlimit**.

In [8]:
es = Elasticsearch(hosts=[{'host':'atlas-kibana.mwt2.org', 'port':9200}],timeout=60)

my_query={
    "size":0,
    "query": {
#        "range":{"modificationtime":{"gte": starttime,"lte": curtime}}
       "range": {
          "@timestamp": {
             "gte": starttime,
             "lte": curtime,
             "format": "basic_date_time"
          }
       }
    },
    "aggs" : {
        "servers" : {
            "terms" : {
                "size" : 20,
                "field" : "frontierserver" 
            },
            "aggs" : {
                "maxthreads" : {
                    "max" : { "field" : "initthreads" }
                }
            }
        }
    }
}

res = es.search(index=ind, body=my_query, request_timeout=600)

frontiersrvr = {}
res=res['aggregations']['servers']['buckets']
for r in res:
    print(r)
    if r['maxthreads']['value']>threadlimit:
        frontiersrvr[r['key']]=r['maxthreads']['value']

print('problematic servers:', frontiersrvr)


{'doc_count': 38827, 'maxthreads': {'value': 5.0}, 'key': 'aiatlas149.cern.ch'}
{'doc_count': 38418, 'maxthreads': {'value': 7.0}, 'key': 'aiatlas037.cern.ch'}
{'doc_count': 38040, 'maxthreads': {'value': 6.0}, 'key': 'aiatlas148.cern.ch'}
{'doc_count': 36374, 'maxthreads': {'value': 6.0}, 'key': 'aiatlas038.cern.ch'}
{'doc_count': 21513, 'maxthreads': {'value': 6.0}, 'key': 'aiatlas147.cern.ch'}
{'doc_count': 21022, 'maxthreads': {'value': 6.0}, 'key': 'aiatlas036.cern.ch'}
{'doc_count': 20771, 'maxthreads': {'value': 6.0}, 'key': 'aiatlas146.cern.ch'}
{'doc_count': 19811, 'maxthreads': {'value': 4.0}, 'key': 'frontier-atlas2.lcg.triumf.ca'}
{'doc_count': 19608, 'maxthreads': {'value': 6.0}, 'key': 'frontier-atlas1.lcg.triumf.ca'}
{'doc_count': 18414, 'maxthreads': {'value': 4.0}, 'key': 'frontier-atlas3.lcg.triumf.ca'}
{'doc_count': 12388, 'maxthreads': {'value': 5.0}, 'key': 'ccosvms0014'}
{'doc_count': 11884, 'maxthreads': {'value': 5.0}, 'key': 'aiatlas073.cern.ch'}
{'doc_count': 
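
The filtering step can be tried without a live ES connection by running it on a couple of buckets copied from the output above. This is a minimal sketch: `servers_over_limit` is a hypothetical helper, and the `None` check is an assumption about buckets in which no document carries the `initthreads` field (a `max` aggregation then reports a null value).

```python
def servers_over_limit(buckets, threadlimit):
    """Map server name -> maximum thread count, for buckets above threadlimit."""
    over = {}
    for b in buckets:
        peak = b['maxthreads']['value']
        # Skip buckets without a value; None > int would raise in Python 3
        if peak is not None and peak > threadlimit:
            over[b['key']] = peak
    return over

# Two buckets taken from the query output shown above
sample = [
    {'doc_count': 38827, 'maxthreads': {'value': 5.0}, 'key': 'aiatlas149.cern.ch'},
    {'doc_count': 38418, 'maxthreads': {'value': 7.0}, 'key': 'aiatlas037.cern.ch'},
]
print(servers_over_limit(sample, 6))
```

With a limit of 6, only `aiatlas037.cern.ch` is kept from this sample.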

### Submit alert if there are any servers showing a high number of simultaneous threads (>**threadlimit**)

The number reported for each Frontier server is the highest value recorded during the given time interval.
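
The mail text can be assembled separately from the sending step, which makes it easy to inspect before anything goes out. The sketch below is not part of the `alerts` module: `build_body`, the subscriber name, the link and the server names are made up for illustration, and sorting the servers by peak value is a presentation choice the original cell does not make.

```python
def build_body(name, link, threadlimit, servers):
    """Compose the alert text for one subscriber.

    servers maps server name -> highest thread count seen in the interval.
    """
    lines = [
        'Dear ' + name + ',',
        '',
        '\tthis mail is to let you know that the number of simultaneous '
        'threads went beyond ' + str(threadlimit) + ' on some servers',
        '',
    ]
    # Worst offenders first (illustrative choice, not in the original cell)
    for server, peak in sorted(servers.items(), key=lambda kv: kv[1], reverse=True):
        lines.append(server + ' : ' + str(peak))
    lines += [
        '',
        'Best regards,',
        'ATLAS AAS',
        '',
        'To change your alerts preferences please use the following link:',
        link,
    ]
    return '\n'.join(lines)

# Hypothetical subscriber and servers, for illustration only
body = build_body('Jane Doe', 'https://example.org/prefs', 400,
                  {'server-a': 410.0, 'server-b': 655.0})
print(body)
```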

In [9]:
if len(frontiersrvr) > 0:
    S = subscribers()
    A = alerts.alerts()

    test_name = 'Too many concurrent threads'
    users =  S.get_immediate_subscribers(test_name)
    for user in users:
        body = 'Dear ' + user.name +',\n\n'
        body += '\tthis mail is to let you know that the number of simultaneous threads went beyond '
        body += str(threadlimit) + ' on some servers \n\n' 
        for fkey in frontiersrvr:
            body += fkey
            body += ' : '
            body += str(frontiersrvr[fkey])
            body += '\n'
        body += '\nBest regards,\nATLAS AAS'
        body += '\n\n To change your alerts preferences please use the following link:\n' + user.link
        A.sendMail(test_name, user.email, body)
##        A.addAlert(test_name, user.name, str(res_page))

Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: Too many concurrent threads
From: AAAS@mwt2.org
To: ilijav@gmail.com

Dear Ilija Vukotic,

	this mail is to let you know that the number of simultaneous threads went beyond 4 on some servers 

aiatlas038.cern.ch : 6.0
aiatlas073.cern.ch : 5.0
aiatlas037.cern.ch : 7.0
ccosvms0013.in2p3.fr : 14.0
ccosvms0012.in2p3.fr : 8.0
aiatlas147.cern.ch : 6.0
aiatlas149.cern.ch : 5.0
aiatlas148.cern.ch : 6.0
frontier-atlas1.lcg.triumf.ca : 6.0
ccosvms0014 : 5.0
aiatlas036.cern.ch : 6.0
aiatlas146.cern.ch : 6.0

Best regards,
ATLAS AAS

 To change your alerts preferences please use the following link:
https://docs.google.com/forms/d/e/1FAIpQLSeedRVj0RPRadEt8eGobDeneix_vNxUkqbtdNg7rGMNOrpcug/viewform?edit2=2_ABaOnufrzSAOPoVDl6wcXDnQKk0EfkQRmlxj04nw9npJrTAK5BZPijqoLhg
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: Too many concurrent threads
Fr