## Determining the (silenced) alerts that fire most often during upgrades

### High-level plan
- Query observatorium-mst for `sre:slo:upgradeoperator_upgrade_result == 0`
  - For each `_id` (referred to below as `$CUUID`):
    - Query SL-DB for when the most recent upgrade started and ended for this `$CUUID`
    - Query telemeter-lts for the value of `alerts{_id="$CUUID"}` during the upgrade timeframe
      - Add the cluster's alerts to the histogram
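
The last step of the plan, folding each cluster's alerts into a histogram, could look roughly like the sketch below. The `alerts_by_cluster` mapping and the alert names in it are made-up placeholders rather than real query results, and counting each alert at most once per cluster is our assumption about how the histogram should be weighted.

```python
from collections import Counter

# Hypothetical sample data: cluster UUID -> alert names seen during its upgrade.
# In the real notebook this would come from the telemeter-lts query.
alerts_by_cluster = {
    "cluster-a": ["KubeNodeNotReady", "ClusterOperatorDown", "KubeNodeNotReady"],
    "cluster-b": ["ClusterOperatorDown"],
}

alert_histogram = Counter()
for cuuid, alert_names in alerts_by_cluster.items():
    # Deduplicate per cluster so one flapping alert doesn't dominate the counts.
    alert_histogram.update(set(alert_names))

# Print alerts from most to least common across clusters.
for name, count in alert_histogram.most_common():
    print(f"{name}: {count}")
```

If raw firing frequency matters more than the number of affected clusters, dropping the `set(...)` call would count every occurrence instead.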

### Querying Observatorium-MST for UUIDs of clusters that paged during their last upgrade

In [None]:
from settings import OBSERVATORIUM_URL, OBSERVATORIUM_AUTH_COOKIE
from urllib.parse import quote
import requests

In [None]:
observatorium_query = OBSERVATORIUM_URL + quote("sre:slo:upgradeoperator_upgrade_result == 0")
observatorium_results = requests.get(observatorium_query, cookies=OBSERVATORIUM_AUTH_COOKIE).json()
if observatorium_results['status'] != "success":
    raise ValueError("Observatorium query unsuccessful: " + str(observatorium_results))
observatorium_results['data']['result'][0]['metric']['_id']

In [None]:
alerting_upgrade_cluster_uuids = {r['metric']['_id'] for r in observatorium_results['data']['result']}
alerting_upgrade_cluster_uuids
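
For reference, the UUID extraction above can be tried offline against a hand-written payload. This is only a sketch: `extract_cluster_uuids` is a hypothetical helper, and `sample_response` is invented data that assumes Observatorium returns the standard Prometheus instant-query shape (`status`, then `data.result[].metric`).

```python
def extract_cluster_uuids(prom_response):
    """Return the set of `_id` labels from a Prometheus-style query response."""
    if prom_response.get("status") != "success":
        raise ValueError("Query unsuccessful: " + str(prom_response))
    # A set comprehension deduplicates clusters that appear in several series.
    return {
        r["metric"]["_id"]
        for r in prom_response["data"]["result"]
        if "_id" in r["metric"]
    }

# Invented payload for illustration only; the UUIDs are placeholders.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"_id": "uuid-1"}, "value": [1700000000, "0"]},
            {"metric": {"_id": "uuid-2"}, "value": [1700000000, "0"]},
            {"metric": {"_id": "uuid-1"}, "value": [1700000000, "0"]},
        ],
    },
}

cluster_uuids = extract_cluster_uuids(sample_response)
print(sorted(cluster_uuids))
```

Unlike indexing `['result'][0]` directly, this version returns an empty set when no cluster matched the query.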

Now that we have UUIDs for clusters that paged during their last upgrade, we'll try...
### Querying OCM service log API for upgrade time windows

In [None]:
import re
from datetime import datetime, timezone
from util import OCMClient

In [None]:
ocm_client = OCMClient()
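# Raw string avoids invalid-escape warnings for \w; the dot needs no escaping inside a character class
version_regex = re.compile(r"version '([-\w.]+)'")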

for cuuid in alerting_upgrade_cluster_uuids:
    # Fetch upgrade-related service logs for this cluster
    sldb_query = quote(f"cluster_uuid = '{cuuid}' and (summary = 'Upgrade maintenance beginning' or summary = 'Upgrade maintenance completed')")
    sldb_response = ocm_client.get("/api/service_logs/v1/cluster_logs?search=" + sldb_query).json()

    # Iterate over fetched service logs to determine latest upgrade start/end times
    upgrade_start_time = datetime.min.replace(tzinfo=timezone.utc)
    upgrade_end_time = datetime.min.replace(tzinfo=timezone.utc)
    upgrade_version = ""
    for sl in sldb_response['items']:
        sl_timestamp = datetime.fromisoformat(sl['timestamp'].replace("Z", "+00:00"))
        if sl['summary'] == "Upgrade maintenance beginning" and sl_timestamp > upgrade_start_time:
            upgrade_start_time = sl_timestamp

        if sl['summary'] == "Upgrade maintenance completed" and sl_timestamp > upgrade_end_time:
            upgrade_end_time = sl_timestamp
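            # Guard against descriptions that don't mention a version
            version_match = version_regex.search(sl['description'])
            upgrade_version = version_match.group(1) if version_match else ""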

    print(f"Start: {upgrade_start_time} | End: {upgrade_end_time} | Version: {upgrade_version}")
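
The window-finding logic in the loop above can also be pulled into a small function so it can be checked without calling OCM. The sketch below assumes the same service log fields (`summary`, `timestamp`, `description`); `latest_upgrade_window` is a hypothetical helper name, and `sample_logs`, including the wording of the descriptions, is invented test data.

```python
import re
from datetime import datetime, timezone

VERSION_REGEX = re.compile(r"version '([-\w.]+)'")
# Sentinel meaning "no matching service log seen yet".
MIN_TIME = datetime.min.replace(tzinfo=timezone.utc)

def latest_upgrade_window(service_logs):
    """Return (start, end, version) of the most recent upgrade in service_logs."""
    start, end, version = MIN_TIME, MIN_TIME, ""
    for sl in service_logs:
        ts = datetime.fromisoformat(sl["timestamp"].replace("Z", "+00:00"))
        if sl["summary"] == "Upgrade maintenance beginning" and ts > start:
            start = ts
        elif sl["summary"] == "Upgrade maintenance completed" and ts > end:
            end = ts
            match = VERSION_REGEX.search(sl.get("description", ""))
            version = match.group(1) if match else ""
    return start, end, version

# Invented service logs covering two upgrades of one cluster.
sample_logs = [
    {"summary": "Upgrade maintenance beginning", "timestamp": "2023-01-01T10:00:00Z",
     "description": "Upgrade starting"},
    {"summary": "Upgrade maintenance completed", "timestamp": "2023-01-01T11:30:00Z",
     "description": "Cluster upgraded to version '4.11.5'"},
    {"summary": "Upgrade maintenance beginning", "timestamp": "2023-02-01T09:00:00Z",
     "description": "Upgrade starting"},
    {"summary": "Upgrade maintenance completed", "timestamp": "2023-02-01T10:15:00Z",
     "description": "Cluster upgraded to version '4.12.3'"},
]

start, end, version = latest_upgrade_window(sample_logs)
print(f"Start: {start} | End: {end} | Version: {version}")
```

Note that, as in the loop above, the start and end are tracked independently, so an upgrade that began but has not yet completed would pair a new start time with the previous upgrade's end time; comparing `end < start` is one way to detect that case.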