# New Hospitals Model

This notebook runs the NHP model and produces the aggregated results.

In [1]:
params_file = "sample_params.json"
data_path = "data"
results_path = "results"

## Setup

Load the required packages.

In [14]:
import os
import json
import shutil

from datetime import datetime

from run_model import run_model
from combine_results import combine

from model.aae import AaEModel
from model.inpatients import InpatientsModel
from model.outpatients import OutpatientsModel

Next, load the params JSON file.

In [3]:
with open(params_file, "r", encoding="UTF-8") as pf:
  params = json.load(pf)
# extract the number of model_runs the params calls for
model_runs = params["model_runs"]
# set the create_datetime
params["create_datetime"] = f"{datetime.now():%Y%m%d_%H%M%S}"
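As a small illustration of the format spec used for `create_datetime`, the sketch below applies it to a fixed datetime instead of `datetime.now()`. The example value is chosen to match the `create_datetime` shown in the results tables further down. The variable names are illustrative only.

```python
from datetime import datetime

# fixed example value instead of datetime.now(), so the result is reproducible
example_time = datetime(2022, 5, 9, 12, 41, 20)

# same format spec as used for params["create_datetime"]
stamp = f"{example_time:%Y%m%d_%H%M%S}"

# the string can be parsed back into a datetime with the matching strptime format
parsed = datetime.strptime(stamp, "%Y%m%d_%H%M%S")
```

Because the format is fixed-width and ordered from year down to second, these strings sort chronologically when sorted as text.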

The model runs in parallel. By default, we use all available CPU cores. You can set `cpus` to a lower value to use fewer resources, but the model will take longer to run.

In [4]:
cpus = os.cpu_count()
cpus

12
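If you would rather leave some cores free for other work, one option is sketched below. The helper `pick_cpus` and its `reserve` parameter are hypothetical and not part of the model code. It also guards against `os.cpu_count()` returning `None`, which it does when the count cannot be determined.

```python
import os

def pick_cpus(reserve=1):
    """Return a core count that leaves `reserve` cores free, but never less than 1."""
    available = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, available - reserve)

# e.g. keep two cores free for other work
cpus_to_use = pick_cpus(reserve=2)
```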

When running the model in parallel, it is slightly more efficient to process model runs in batches. Batches of 4 or 8 seem to be most efficient. This value should be a power of 2.

In [5]:
batch_size = 2 ** 2
batch_size

4
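If the batch size comes from user input rather than being written as `2 ** n`, a quick check can catch mistakes early. This is a minimal sketch. `is_power_of_two` and `candidate_batch_size` are illustrative names, not part of the model code.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; False for zero, negatives and everything else."""
    # a power of two has exactly one bit set, so n & (n - 1) clears it to zero
    return n > 0 and (n & (n - 1)) == 0

candidate_batch_size = 2 ** 3
if not is_power_of_two(candidate_batch_size):
    raise ValueError(f"batch size {candidate_batch_size} is not a power of 2")
```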

## Run the model

First, we create the model runner. The `run_model()` function expects the params dictionary, the path to the data, the path where the results will be saved, which model run to start at, how many model runs to perform, the number of CPU cores to use, and the size of the batches to run.

`run_model()` returns a function that takes one of `AaEModel`, `InpatientsModel`, or `OutpatientsModel`, depending on which type of model we want to run.

Note that we pass `model_runs + 1` as the number of runs to perform. Model run 0 is the "principal" model run, and it is followed by model runs 1 to `model_runs`.

In [6]:
runner = run_model(
    params,
    data_path,
    results_path,
    0,
    model_runs + 1,
    cpus,
    batch_size
)
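The "function that returns a function" pattern can be sketched as below. This is not the actual implementation of `run_model()`. `make_runner` and `DemoModel` are hypothetical stand-ins, and the way runs are split into batches here is only an assumption about how batching might work.

```python
def make_runner(first_run, n_runs, batch_size):
    """Hypothetical stand-in for run_model(): returns a function that accepts a model class."""
    def runner(model_class):
        # the settings captured above are reused for every model class passed in
        runs = range(first_run, first_run + n_runs)
        batches = [
            list(runs[i:i + batch_size])
            for i in range(0, len(runs), batch_size)
        ]
        return model_class.__name__, batches
    return runner

class DemoModel:
    """Placeholder for a model class such as AaEModel."""

demo_runner = make_runner(0, 5, 2)
name, batches = demo_runner(DemoModel)
```

The benefit of this design is that the shared settings are supplied once, and the same runner can then be called with each model class in turn.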

Now that the runner is set up, we can run each type of model.

In [7]:
runner(AaEModel)

Running: AaEModel


260it [00:14, 18.06it/s]

Model runs completed: 257 / 257

In [8]:
runner(OutpatientsModel)

Running: OutpatientsModel


260it [00:23, 10.86it/s]                         


Model runs completed: 257 / 257


In [9]:
runner(InpatientsModel)

Running: InpatientsModel


260it [01:37,  2.68it/s]                         


Model runs completed: 257 / 257


## Combine Results

Once the model has run, we will have a Parquet file for each individual model run. We combine these into a single file containing all of the results.

In [15]:
dataset = params["input_data"]
scenario = params["name"]
create_datetime = params["create_datetime"]

In [None]:
combine(results_path, dataset, scenario, create_datetime)

We can now clean up by deleting the individual files.

In [19]:
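def clean_results(result_type):
  # delete the per-run directory for each activity type
  for activity_type in ["ip", "op", "aae"]:
    path = (
      f"{results_path}/{result_type}/activity_type={activity_type}"
      f"/dataset={dataset}/scenario={scenario}/create_datetime={create_datetime}"
    )
    shutil.rmtree(path)

# model_results and aggregated_results are only removed when the matching param is set
if params.get("save_all_results", False):
  clean_results("model_results")
if params.get("aggregate_results", True):
  clean_results("aggregated_results")
clean_results("change_factors")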
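Note that `shutil.rmtree` raises an error if the directory does not exist, for example when the cleanup cell is run a second time. A more forgiving variant could look like the sketch below. `remove_if_present` is a hypothetical helper, demonstrated here on a throwaway temporary directory rather than on real results.

```python
import shutil
import tempfile
from pathlib import Path

def remove_if_present(path):
    """Delete a directory tree if it exists; return True if something was removed."""
    path = Path(path)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False

# demonstration on a throwaway directory
demo_root = Path(tempfile.mkdtemp())
demo_dir = demo_root / "activity_type=ip"
demo_dir.mkdir()
first = remove_if_present(demo_dir)   # the directory existed and was removed
second = remove_if_present(demo_dir)  # already gone, so nothing to do
shutil.rmtree(demo_root)              # tidy up the temporary parent directory
```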

## Load Results

We can now load in our results.

In [20]:
import pyarrow.parquet as pq
import pandas as pd

In [27]:
# all files we need to load contain this chunk in the file path
file_path = f"dataset={dataset}/scenario={scenario}/{create_datetime}"
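The results paths use `key=value` directory segments, a layout often called Hive-style partitioning. As a minimal sketch, such a path can be split back into its parts as shown below. `parse_partitions` is a hypothetical helper, and the example path follows the pattern used in the cleanup step with the `synthetic` dataset and `test` scenario shown in the results tables.

```python
def parse_partitions(path):
    """Split a path into {key: value} for every 'key=value' segment."""
    parts = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            parts[key] = value
    return parts

example = (
    "results/model_results/activity_type=ip"
    "/dataset=synthetic/scenario=test/create_datetime=20220509_124120"
)
info = parse_partitions(example)
```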

In [33]:
# the selected population variant can be extracted from the "run_params" file
with open(f"{results_path}/run_params/{file_path}.json", "r", encoding="UTF-8") as rpf:
  run_params = json.load(rpf)
selected_variants = pd.DataFrame({"variant": run_params["variant"]})
selected_variants

| | variant |
|---|---|
| 0 | principal |
| 1 | principal |
| 2 | principal |
| 3 | high migration |
| 4 | principal |
| ... | ... |
| 252 | principal |
| 253 | high migration |
| 254 | high migration |
| 255 | high migration |


In [34]:
# load the aggregated model results and join them to the selected variants loaded above.
# note: model_run -1 is the baseline data, which has no variant, hence the left join.
aggregated_results = (pq
  .read_pandas(f"{results_path}/aggregated_results/combined/{file_path}.parquet")
  .to_pandas()
  .merge(selected_variants, how="left", left_on="model_run", right_index=True)
)
aggregated_results

| | age_group | sex | tretspef | pod | measure | value | model_run | dataset | activity_type | scenario | create_datetime | variant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0- 4 | 1 | 100 | ip_elective_admission | admissions | 10 | -1 | synthetic | ip | test | 20220509_124120 | |
| 1 | 0- 4 | 1 | 100 | ip_elective_daycase | admissions | 17 | -1 | synthetic | ip | test | 20220509_124120 | |
| 2 | 0- 4 | 1 | 100 | ip_non-elective_admission | admissions | 26 | -1 | synthetic | ip | test | 20220509_124120 | |
| 3 | 0- 4 | 1 | 101 | ip_elective_admission | admissions | 1 | -1 | synthetic | ip | test | 20220509_124120 | |
| 4 | 0- 4 | 1 | 101 | ip_elective_daycase | admissions | 20 | -1 | synthetic | ip | test | 20220509_124120 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 738312 | 85+ | 2 | Other | aae_type-01 | walk-in | 1135 | 99 | synthetic | aae | test | 20220509_124120 | principal |
| 738313 | 85+ | 2 | Other | aae_type-02 | walk-in | 50 | 99 | synthetic | aae | test | 20220509_124120 | principal |
| 738314 | 85+ | 2 | Other | aae_type-03 | ambulance | 10 | 99 | synthetic | aae | test | 20220509_124120 | principal |
| 738315 | 85+ | 2 | Other | aae_type-03 | walk-in | 336 | 99 | synthetic | aae | test | 20220509_124120 | principal |
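To see why a left join is needed, consider the toy example below. The frames and values are made up for illustration and are far smaller than the real results. The baseline row (`model_run` of -1) has no matching index in the variants frame, so a left join keeps it with a missing `variant`, whereas an inner join would drop it.

```python
import pandas as pd

# toy variants frame: the index (0, 1) plays the role of the model run number
variants = pd.DataFrame({"variant": ["principal", "high migration"]})

# toy results frame including the baseline row (model_run == -1)
results = pd.DataFrame({"model_run": [-1, 0, 1], "value": [10, 12, 11]})

# the left join keeps every results row; unmatched rows get a missing variant
joined = results.merge(variants, how="left", left_on="model_run", right_index=True)

# for comparison, an inner join drops the baseline row
inner = results.merge(variants, how="inner", left_on="model_run", right_index=True)
```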


In [36]:
# load the change factors. Note: row order within each model_run matters in one respect:
# the "baseline" change_factor row must always come first. The remaining rows are in the order
# the change factors were run within the model engine, but they do not have to be shown in that order.
change_factors = pd.read_csv(f"{results_path}/change_factors/combined/{file_path}.csv")
change_factors

| | change_factor | strategy | measure | value | model_run | dataset | activity_type | scenario | create_datetime |
|---|---|---|---|---|---|---|---|---|---|
| 0 | baseline | - | admissions | 128926 | 0 | synthetic | ip | test | 20220509_124120 |
| 1 | health_status_adjustment | - | admissions | -2853 | 0 | synthetic | ip | test | 20220509_124120 |
| 2 | population_factors | - | admissions | 24662 | 0 | synthetic | ip | test | 20220509_124120 |
| 3 | waiting_list_adjustment | - | admissions | 1333 | 0 | synthetic | ip | test | 20220509_124120 |
| 4 | admission_avoidance | | admissions | 0 | 0 | synthetic | ip | test | 20220509_124120 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 33915 | health_status_adjustment | - | arrivals | -1016 | 99 | synthetic | aae | test | 20220509_124120 |
| 33916 | population_factors | - | arrivals | 17281 | 99 | synthetic | aae | test | 20220509_124120 |
| 33917 | low_cost_discharged | - | arrivals | -810 | 99 | synthetic | aae | test | 20220509_124120 |
| 33918 | left_before_seen | - | arrivals | -41 | 99 | synthetic | aae | test | 20220509_124120 |
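If the rows for a model run ever get reordered, the "baseline first" requirement can be restored with a stable sort, as in the sketch below. The rows are a deliberately shuffled subset of the first few values shown above, reduced to two columns for brevity. Plain dictionaries are used here rather than the DataFrame to keep the example self-contained.

```python
# a shuffled subset of the change factors for one model run
rows = [
    {"change_factor": "population_factors", "value": 24662},
    {"change_factor": "baseline", "value": 128926},
    {"change_factor": "health_status_adjustment", "value": -2853},
]

# sorted() is stable and False sorts before True, so "baseline" moves to the
# front while the remaining rows keep their relative order
ordered = sorted(rows, key=lambda row: row["change_factor"] != "baseline")
ordered_names = [row["change_factor"] for row in ordered]
```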
