# Quick model run through

This is a very quick, rough sketch of how to use the model directly.

One option is to use the `run_model.py` script, which is what the VS Code debugger is set up to run.

However, during development I often find it useful to run the model directly and inspect the results. Below I use `InpatientsModel`, but the same steps work with `AaEModel` or `OutpatientsModel`.

First, these are the minimum packages required.

In [1]:
%cd ..

c:\Users\thomas.jemmett\dev\nhp\nhp_model


In [2]:
from model.inpatients import InpatientsModel
from model.helpers import load_params
# strictly, not needed, but I am always using these for exploration/playing with the results
import numpy as np
import pandas as pd

Now we have the packages loaded, we need to create an instance of the model. For that we need two things:

1. a parameters file
2. a path to where we store the data

In [3]:
params = load_params("sample_params.json")
model = InpatientsModel(params, "data")

We can now run the model. The `run` method takes one argument, the model run number. Model run 0 is the "principal" model run, which sets every parameter to the central point of its confidence interval.

Any value from `1` to `params["model_runs"]` will use a different set of parameters sampled from the given confidence intervals. These values are created by `model.generate_run_params`, which is called by the class's constructor, and they use `params["seed"]`, so results are reproducible.
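To make the idea of seeded, reproducible run parameters concrete, here is a minimal sketch. It is not the code behind `model.generate_run_params`: the helper `sketch_run_params`, the example interval names and values, and the choice of a uniform draw are all assumptions made for illustration.

```python
import random


def sketch_run_params(intervals, model_runs, seed):
    """Hypothetical helper: build one parameter set per model run."""
    rng = random.Random(seed)  # seeded generator, so repeated calls agree
    # run 0: the midpoint of every interval (the "principal" run)
    runs = [{name: (lo + hi) / 2 for name, (lo, hi) in intervals.items()}]
    # runs 1..model_runs: one draw per parameter from its interval
    # (uniform is an assumption; the real model may use another distribution)
    for _ in range(model_runs):
        runs.append(
            {name: rng.uniform(lo, hi) for name, (lo, hi) in intervals.items()}
        )
    return runs


# made-up intervals, purely for illustration
intervals = {"admission_avoidance": (0.8, 1.0), "los_reduction": (0.7, 0.9)}
first = sketch_run_params(intervals, model_runs=3, seed=42)
second = sketch_run_params(intervals, model_runs=3, seed=42)
print(first == second)  # same seed, same parameters
```

Generating every run's parameters up front from a single seed means run `n` always sees the same values, regardless of the order in which runs are executed.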

The `run` method returns a tuple containing the "change factors" and the model results.

In [4]:
cf, mr = model.run(0)

The "change factors" contain the average expected effect of each step in the model on the results. For example, if we start with 100 rows and our first step has a parameter value of `0.9`, then the expected change is `-10`, i.e. this step reduces the number of rows by `10` (on average).
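The arithmetic in that example can be checked with a few lines of Python. This is only a sketch: treating the parameter as the probability of keeping each row is my assumption about what "on average" means here, and `expected_change` and `simulated_change` are hypothetical names, not part of the model.

```python
import random


def expected_change(rows, factor):
    # average change in row count if each row is kept with probability `factor`
    return rows * factor - rows


def simulated_change(rows, factor, rng):
    # one random realisation: keep each row independently with probability `factor`
    kept = sum(rng.random() < factor for _ in range(rows))
    return kept - rows


rng = random.Random(0)
trials = [simulated_change(100, 0.9, rng) for _ in range(2000)]
mean_change = sum(trials) / len(trials)

print(expected_change(100, 0.9))  # the expectation from the example above
print(mean_change)  # the simulated average, which should sit close to it
```

Any single simulated step can remove more or fewer than 10 rows, which is why the change factors are described as expectations rather than exact counts.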

In [5]:
cf

| | change_factor | strategy | measure | value |
|---|---|---|---|---|
| 0 | baseline | - | admissions | 97614.000000 |
| 1 | admission_avoidance | alcohol_wholly_attributable | admissions | 72.706326 |
| 2 | admission_avoidance | ambulatory_care_conditions_acute | admissions | 191.514148 |
| 3 | admission_avoidance | ambulatory_care_conditions_chronic | admissions | 255.893772 |
| 4 | admission_avoidance | ambulatory_care_conditions_vaccine_preventable | admissions | 153.333173 |
| ... | ... | ... | ... | ... |
| 107 | los_reduction | pre-op_los_2-day | beddays | -8.000000 |
| 108 | los_reduction | bads_daycase | beddays | -747.000000 |
| 109 | los_reduction | bads_daycase_occasional | beddays | -769.000000 |
| 110 | los_reduction | bads_outpatients | beddays | -96.000000 |


The model results are a pandas dataframe containing the row-level results from running the model.

One thing to note is that the model duplicates some rows of data, and these need to be excluded. We extract one year's worth of data based on discharge date. But to calculate how many people are in a bed on any given day, we need to include rows where the patient was admitted in that year but discharged in the next year. We don't have that data available in our HES extracts, so we take the rows where the patient was admitted in the previous year but discharged in this period, add one year to the admission date, and append these copies to the end of the dataset. The `bedday_rows` column flags these copies, so filtering on it excludes them.

In [6]:
mr.loc[~mr.bedday_rows, ["rn", "speldur", "classpat"]].head()

| | rn | speldur | classpat |
|---|---|---|---|
| 0 | 24523 | 0.0 | 2 |
| 1 | 24523 | 0.0 | 2 |
| 2 | 24523 | 0.0 | 2 |
| 3 | 96870 | 0.0 | 2 |
| 4 | 38950 | 0.0 | 2 |
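For anyone who wants to see the duplication logic spelled out, here is a rough sketch in plain Python. It is not the model's actual code: `append_bedday_rows`, `add_one_year`, the `admidate`/`disdate` column names and the example dates are assumptions. Only the `bedday_rows` flag comes from the model results shown above.

```python
from datetime import date


def add_one_year(d):
    # 29 February has no counterpart in a non-leap year, so fall back to the 28th
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def append_bedday_rows(rows, period_start):
    """Hypothetical sketch: flag originals False, append shifted copies flagged True."""
    originals = [{**r, "bedday_rows": False} for r in rows]
    copies = [
        {**r, "admidate": add_one_year(r["admidate"]), "bedday_rows": True}
        for r in rows
        if r["admidate"] < period_start  # admitted before the period began
    ]
    return originals + copies


# made-up rows, all discharged within a period starting 1 April 2018
rows = [
    {"rn": 1, "admidate": date(2018, 3, 20), "disdate": date(2018, 4, 5)},
    {"rn": 2, "admidate": date(2018, 6, 1), "disdate": date(2018, 6, 3)},
]
data = append_bedday_rows(rows, period_start=date(2018, 4, 1))

# the plain-Python equivalent of `mr.loc[~mr.bedday_rows]`: drop the appended copies
real_rows = [r for r in data if not r["bedday_rows"]]
print(len(data), len(real_rows))
```

Keeping the copies in the same dataset with a flag, rather than in a separate table, means bed occupancy can use every row while other counts simply filter the flag out.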


Once we have run the model, we can aggregate the results. This step is performed when we run the model in Azure, so we don't have to store all of the individual row-level results from each Monte Carlo simulation.

In [7]:
agg = model.aggregate(mr, 0)

The aggregated results are stored in a dictionary, where each item in that dictionary is a different type of aggregation.

In [8]:
agg.keys()

dict_keys(['default', 'sex+age_group', 'sex+tretspef', 'bed_occupancy', 'theatres_available'])

Each item in the dictionary is itself a dictionary whose keys are named tuples. They are stored in this way because:
a) each aggregation can use different groupings of columns from the model results, so it's hard to combine them into a single data frame
b) we eventually want to land these results in a document database, so they need to convert to JSON.
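To illustrate point (a), here is a rough sketch of how dictionaries keyed by named tuples can hold aggregations with different grouping columns. It is not how `model.aggregate` is implemented: `sketch_aggregate` and the example rows are hypothetical, and I'm assuming each aggregation is a simple sum.

```python
from collections import defaultdict, namedtuple


def sketch_aggregate(rows, group_cols):
    """Hypothetical sketch: sum `value` for each combination of `group_cols`."""
    key_type = namedtuple("results", group_cols)
    totals = defaultdict(float)
    for row in rows:
        key = key_type(*(row[c] for c in group_cols))
        totals[key] += row["value"]
    return dict(totals)


# made-up rows, purely for illustration
rows = [
    {"pod": "ip_elective_daycase", "measure": "admissions", "sex": 1, "value": 2.0},
    {"pod": "ip_elective_daycase", "measure": "admissions", "sex": 2, "value": 3.0},
    {"pod": "ip_elective_daycase", "measure": "beddays", "sex": 1, "value": 4.0},
]
sketch_agg = {
    "default": sketch_aggregate(rows, ["pod", "measure"]),
    "sex": sketch_aggregate(rows, ["pod", "measure", "sex"]),
}
for name, aggregation in sketch_agg.items():
    print(name, aggregation)
```

Each aggregation gets keys with exactly the fields it grouped by, so nothing has to be padded with empty columns to fit a shared shape.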

In [9]:
agg["default"]

{results(pod='ip_elective_admission', measure='admissions'): 7471.0,
 results(pod='ip_elective_admission', measure='beddays'): 35793.0,
 results(pod='ip_elective_admission', measure='procedures'): 4169.0,
 results(pod='ip_elective_daycase', measure='admissions'): 56395.0,
 results(pod='ip_elective_daycase', measure='beddays'): 56395.0,
 results(pod='ip_elective_daycase', measure='procedures'): 31186.0,
 results(pod='ip_non-elective_admission', measure='admissions'): 58528.0,
 results(pod='ip_non-elective_admission', measure='beddays'): 278984.0,
 results(pod='ip_non-elective_admission', measure='procedures'): 9789.0,
 results(pod='ip_non-elective_birth-episode', measure='admissions'): 1135.0,
 results(pod='ip_non-elective_birth-episode', measure='beddays'): 1738.0,
 results(pod='ip_non-elective_birth-episode', measure='procedures'): 232.0,
 results(pod='op_procedure', measure='attendances'): 503.0}
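Since JSON object keys must be strings, a dictionary like this can't be passed to `json.dumps` as-is. One way to get it into a JSON-friendly shape is to fold each key's fields into a record. This is a sketch under that assumption, not necessarily how the results are written to the database; `to_records` is a hypothetical helper, and the two entries are copied from the output above.

```python
import json
from collections import namedtuple

# stand-in for two entries of agg["default"]
results = namedtuple("results", ["pod", "measure"])
default = {
    results(pod="ip_elective_admission", measure="admissions"): 7471.0,
    results(pod="op_procedure", measure="attendances"): 503.0,
}


def to_records(aggregation):
    # fold each named tuple key's fields into a flat record alongside its value
    return [{**key._asdict(), "value": value} for key, value in aggregation.items()]


payload = json.dumps(to_records(default))
print(payload)
```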

In [10]:
# we can see how the named tuples change if we look at different aggregations
agg["sex+age_group"]

{results(pod='ip_elective_admission', measure='admissions', sex=1, age_group=' 0- 4'): 65.0,
 results(pod='ip_elective_admission', measure='admissions', sex=1, age_group=' 5-14'): 85.0,
 results(pod='ip_elective_admission', measure='admissions', sex=1, age_group='15-34'): 203.0,
 results(pod='ip_elective_admission', measure='admissions', sex=1, age_group='35-49'): 309.0,
 results(pod='ip_elective_admission', measure='admissions', sex=1, age_group='50-64'): 712.0,
 results(pod='ip_elective_admission', measure='admissions', sex=1, age_group='65-84'): 1727.0,
 results(pod='ip_elective_admission', measure='admissions', sex=1, age_group='85+'): 373.0,
 results(pod='ip_elective_admission', measure='admissions', sex=2, age_group=' 0- 4'): 47.0,
 results(pod='ip_elective_admission', measure='admissions', sex=2, age_group=' 5-14'): 61.0,
 results(pod='ip_elective_admission', measure='admissions', sex=2, age_group='15-34'): 423.0,
 results(pod='ip_elective_admission', measure='admissions', sex=2

It can be useful to convert a single aggregation back to a pandas dataframe. This can be achieved like so:

In [11]:
pd.DataFrame(
    [
        {
            **k._asdict(),
            "value": v
        }
        for k, v in agg["default"].items()
    ]
)

| | pod | measure | value |
|---|---|---|---|
| 0 | ip_elective_admission | admissions | 7471.0 |
| 1 | ip_elective_admission | beddays | 35793.0 |
| 2 | ip_elective_admission | procedures | 4169.0 |
| 3 | ip_elective_daycase | admissions | 56395.0 |
| 4 | ip_elective_daycase | beddays | 56395.0 |
| 5 | ip_elective_daycase | procedures | 31186.0 |
| 6 | ip_non-elective_admission | admissions | 58528.0 |
| 7 | ip_non-elective_admission | beddays | 278984.0 |
| 8 | ip_non-elective_admission | procedures | 9789.0 |
| 9 | ip_non-elective_birth-episode | admissions | 1135.0 |
