# QA Model Runs notebook

Notebook for the pre-release QA process. It re-runs at least 2 scenarios from a previous model version on the dev version of the model and compares the results.

⚠️ TODO: In the variable `results_dict`, set the scheme code and the path to each of the two results files on Azure that you want to re-run. An example is given in the first cell.

This notebook:

1. Downloads results files from Azure Results container
2. Extracts the params from the results files, makes minor edits, saves them as model params .json files, and sends them to the API to run on the dev version of the model
3. Gives the API URL for checking the status of the runs (a manual step for now)
4. When the runs are completed, compares the dev model run results to the results downloaded from Azure.

⚠️ Note that this notebook will only work if there have been no breaking changes in the params files between the model versions being tested. If there have been breaking changes, you will need to add the corresponding edits to the cell where the parameters are edited.

The notebook produces and displays dataframes comparing results from the previous model version with the dev version of the model. You will have to use your own eyes 👀 to check for differences. 

In [None]:
# Run at least 2 scenarios from current users/schemes.
# Pick scenarios run on the most recent model version - you may have to run these yourself if they do not already exist.
# Also run at least 1 scenario using sample_params and a non-NHP scheme (these test all params, not just the ones set by schemes)

results_dict = {
    "RXX": {
        "results_path": "prod/vX.X/RXX/scenarioname-datetime.json.gz"
    }
}

In [None]:
# Set up: change working directory, import dependencies, load environment variables
%cd ../..

import os
import json
import pandas as pd
from dotenv import load_dotenv
from nhpy import az, process_params, process_results

%load_ext autoreload
%autoreload 2

# Load all environment variables
load_dotenv()
account_url = os.getenv("AZ_STORAGE_EP")
results_container = os.getenv("AZ_STORAGE_RESULTS")
api_url = os.getenv("API_URL")
api_key = os.getenv("API_KEY")

## Get parameters from Azure

In [None]:
# Get scenarios that have been run, where results are stored on Azure

results_connection = az.connect_to_container(account_url, results_container)


for trust in results_dict.keys():

    results_path = results_dict[trust]["results_path"]
    results_json = az.load_results_gzip_file(results_connection, results_path)

    results_dict[trust]["results_old"] = results_json

In [None]:
# Get params only from results JSONs, edit scenario name, save to queue folder
# ⚠️ For v3.3 there is a breaking change to params - we need to change the format of NDG

# Create the queue folder if it does not already exist
os.makedirs("queue", exist_ok=True)

filenames = []
for trust in results_dict.keys():
    params = results_dict[trust]["results_old"]["params"].copy()
    params["scenario"] = params["scenario"] + "-test"
    params_filename = f"{params['dataset']}-{params['scenario']}.json"
    params["app_version"] = "dev"
    params["user"] = "ds-team"
    params["viewable"] = False
    with open(os.path.join("queue", params_filename), "w") as f:
        json.dump(params, f)
    results_dict[trust]["new_params"] = params
    filenames.append(params_filename)
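
If a release does introduce breaking changes to the params, one way to keep the edits in one place is a small helper that copies the old params and applies an optional migration step. The sketch below is hypothetical: `prepare_params` and `migrate` are illustrative names, not part of `nhpy`, and no real migration (such as the v3.3 NDG change) is implemented here.

```python
import copy


def prepare_params(old_params, suffix="-test", migrate=None):
    """Return an edited deep copy of old_params, ready to send to the dev model.

    `migrate` is an optional callable (hypothetical hook) that takes the copied
    params dict and returns it with any breaking-change edits applied.
    """
    params = copy.deepcopy(old_params)  # deep copy so nested values are not shared
    params["scenario"] = params["scenario"] + suffix
    params["app_version"] = "dev"
    params["user"] = "ds-team"
    params["viewable"] = False
    if migrate is not None:
        params = migrate(params)
    return params


# Toy input standing in for results_dict[trust]["results_old"]["params"]
example_old = {"dataset": "RXX", "scenario": "baseline", "nested": {"a": 1}}
example_new = prepare_params(example_old)
```

The deep copy matters if a migration step edits nested values, because the `results_old` params then stay unchanged for the later comparison.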

## Send runs to API

In [None]:
import requests
import time

responses = {}
for f in filenames:
    with open(os.path.join("queue", f), "rb") as fopen:
        params = json.load(fopen)
        response = requests.post(
            url=api_url,
            params={
                "app_version": "dev",
                "code": api_key,
                "save_full_model_results": "False",
            },
            json=params,
            timeout=30,
        )
    time.sleep(3)
    responses[params["dataset"]] = response

In [None]:
responses
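
Before building the new results paths, it can help to confirm that every request was accepted. A minimal sketch, assuming each value behaves like a `requests.Response` (with `ok` and `status_code` attributes); `find_failed` is a hypothetical helper, and the stand-in objects and the `RYY` code below are for illustration only.

```python
from types import SimpleNamespace


def find_failed(responses):
    """Return {dataset: status_code} for responses that were not successful."""
    return {
        dataset: response.status_code
        for dataset, response in responses.items()
        if not response.ok
    }


# Stand-in objects mimicking requests.Response, for illustration only
example_responses = {
    "RXX": SimpleNamespace(ok=True, status_code=200),
    "RYY": SimpleNamespace(ok=False, status_code=500),
}
failed = find_failed(example_responses)
```

An empty dictionary means every run was submitted successfully.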

In [None]:
from ast import literal_eval

for provider, response in responses.items():
    # Parse the response body (public `content` attribute, not the private `_content`)
    body = literal_eval(response.content.decode("utf-8"))
    create_datetime = body["create_datetime"]
    params = results_dict[provider]["new_params"]
    # Path where the dev run results will be saved on Azure
    results_dict[provider]["new_results_path"] = (
        f"prod/{params['app_version']}/{params['dataset']}/"
        f"{params['scenario']}-{create_datetime}.json.gz"
    )

## Wait for runs to be completed ⌚

At the moment I don't know how to query the API to check whether the runs are completed. In the meantime, you can check manually by visiting the URL below in your browser.

This normally takes about 15 minutes.

In [None]:
f"{os.getenv('API_CHECKPOINT')}?code={api_key}"
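
Until there is a programmatic status check, one option is to poll for completion with whatever check is available. The sketch below is an outline under assumptions: `wait_until` is a hypothetical helper, and `is_ready` is any callable you supply (for example, a check that the file at `new_results_path` exists in the results container). The defaults of 60 seconds between checks and a 30 minute timeout are arbitrary.

```python
import time


def wait_until(is_ready, interval=60, timeout=1800, sleep=time.sleep):
    """Call is_ready() every `interval` seconds until it returns True.

    Returns True if is_ready() succeeded within `timeout` seconds, else False.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    waited = 0
    while True:
        if is_ready():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


# Illustration with a fake check that succeeds on the third call
calls = {"n": 0}


def fake_check():
    calls["n"] += 1
    return calls["n"] >= 3


finished = wait_until(fake_check, interval=1, timeout=10, sleep=lambda s: None)
```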

## Use completed dev run results

In [None]:
# Read new model runs from Azure and store in the results_dict

for trust in results_dict.keys():
    results_path = results_dict[trust]["new_results_path"]
    results_json = az.load_results_gzip_file(results_connection, results_path)
    results_dict[trust]["results_new"] = results_json
    print(results_path)

In [None]:
trusts = list(results_dict)

In [None]:
# Compare default results and save to CSV
from datetime import date

df_list = [process_results.compare_results(results_dict, t) for t in trusts]
(
    pd.concat(df_list)
    .reset_index()
    .groupby(["trust", "pod", "measure"])
    .sum()
    .to_csv(f"QA_default_results_{date.today()}.csv")
)

In [None]:
# Compare step counts and save to CSV

sc_list = [process_results.compare_stepcounts(results_dict, t) for t in trusts]
(
    pd.concat(sc_list)
    .reset_index()
    .fillna("-")
    .groupby(["trust", "change_factor", "measure", "strategy"])
    .sum(numeric_only=True)
    .to_csv(f"QA_stepcounts_{date.today()}.csv")
)
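
To support the visual check, rows where the two versions disagree can be flagged automatically. A minimal sketch, assuming a dataframe with one numeric column per model version; the column names `old` and `new` and the helper `flag_differences` are hypothetical, so substitute whatever `compare_results` actually returns.

```python
import numpy as np
import pandas as pd


def flag_differences(df, old_col, new_col, rtol=1e-6):
    """Return the rows of df where old_col and new_col differ beyond rtol."""
    close = np.isclose(df[old_col], df[new_col], rtol=rtol, equal_nan=True)
    return df.loc[~close]


# Toy data for illustration only, not real model output
example = pd.DataFrame(
    {
        "measure": ["a", "b", "c"],
        "old": [100.0, 250.0, 30.0],
        "new": [100.0, 251.0, 30.0],
    }
)
diffs = flag_differences(example, "old", "new")
```

A relative tolerance avoids flagging floating-point noise; loosen `rtol` if small differences between model versions are expected.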