### Mapping variable IDs...

# TODO
* make this able to run SQL queries directly against ICCA (to get attributes)
* add a config file that defines the name of the server and database (and possibly the table names), and instruct the user to set these first
* add instructions for setting up and running the Streamlit app (and possibly remove JupyterLab?)
* write a Python script that runs all the initial SQL queries (to get the interventions that are in use in each table) and saves the results to the `data/` directory
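
One possible shape for the config-file item is sketched below, assuming an INI-style file parsed with Python's standard `configparser`. The section name `[icca]`, the keys `server`, `database` and `driver`, the helper functions, and the use of Windows authentication (`Trusted_Connection=yes`) are all hypothetical choices, not settings taken from this project.

```python
import configparser

# Hypothetical example config; in practice this would live in a file such as config.ini
EXAMPLE_CONFIG = """
[icca]
server = my-sql-server
database = my_icca_database
driver = ODBC Driver 17 for SQL Server
"""


def load_icca_settings(text):
    """Parse the (hypothetical) [icca] section and check that the required keys are set."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    settings = dict(parser["icca"])
    missing = [key for key in ("server", "database", "driver") if not settings.get(key)]
    if missing:
        raise ValueError(f"Please set these keys in the [icca] section first: {missing}")
    return settings


def build_connection_string(settings):
    """Build an ODBC connection string, assuming Windows (trusted) authentication."""
    return (
        f"DRIVER={{{settings['driver']}}};"
        f"SERVER={settings['server']};"
        f"DATABASE={settings['database']};"
        "Trusted_Connection=yes;"
    )


settings = load_icca_settings(EXAMPLE_CONFIG)
print(build_connection_string(settings))
# DRIVER={ODBC Driver 17 for SQL Server};SERVER=my-sql-server;DATABASE=my_icca_database;Trusted_Connection=yes;
```

The resulting string could then be passed to `pyodbc.connect(...)`; whether trusted authentication is appropriate depends on how the ICCA server is set up.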

#### The procedure is:
- run script(s) to produce complete intervention tables for each db table (e.g. PtAssessment)
- use search strings to query those tables for each variable of interest, selecting relevant interventionIds manually
- for each interventionId requested, run a query to produce example data with each attribute for that intervention
- review outputs manually and select intervention/attribute pairs, giving a name to each and adding any comments as free text
- save these in a suitable format (one that allows us to check for completeness)
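
For the last step, one option is a tidy table with one row per selected intervention/attribute pair, which can then be checked against the schema for missing variables and for pairs mapped twice. This is a sketch assuming pandas; the column names (`variable`, `interventionId`, `attributeId`, `name`, `comments`), the helper functions and the IDs in `mappings` are hypothetical, while the variable names are the "Simple name" values from the schema shown further down.

```python
import pandas as pd

# "Simple name" values of the first few schema rows
schema_variables = ["age", "gender", "weight", "admit_from", "ph"]

# One row per selected intervention/attribute pair. IDs, names and comments are
# made up for illustration; attributeId is a hypothetical column.
mappings = pd.DataFrame(
    [
        {"variable": "ph", "interventionId": 1001, "attributeId": 1, "name": "ph_arterial", "comments": ""},
        {"variable": "weight", "interventionId": 1002, "attributeId": 2, "name": "weight_admission", "comments": "check units"},
    ]
)


def unmapped_variables(variables, mappings):
    """Return schema variables that have no intervention/attribute pair mapped yet."""
    mapped = set(mappings["variable"])
    return [v for v in variables if v not in mapped]


def duplicate_pairs(mappings):
    """Return rows whose (interventionId, attributeId) pair has been mapped more than once."""
    return mappings[mappings.duplicated(subset=["interventionId", "attributeId"], keep=False)]


print(unmapped_variables(schema_variables, mappings))  # ['age', 'gender', 'admit_from']
print(len(duplicate_pairs(mappings)))  # 0

# CSV keeps the mapping human-readable and easy to review under version control
csv_text = mappings.to_csv(index=False)
```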

#### Notes:

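- In some cases our search strings may pick up many unrelated or irrelevant interventions (for example, the search string `age` for "Age at admission" also matches "Temperature Management" and "Drainage Variance"). This is the price we pay for trying to ensure that our search does not miss anything!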
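
Part of this noise comes from substring matching inside longer words. If it becomes a burden, one possible refinement (an alternative, not what this notebook does) is to require each search string to match a whole word, using a regular-expression word boundary `\b`. The labels and terms below are illustrative: two labels come from the search output in this notebook and two are made up.

```python
import re

import pandas as pd

# "Temperature Management" and "Drainage Variance" appear in this notebook's search
# output; the other two labels are made up for illustration
example_labels = pd.Series(["Temperature Management", "Drainage Variance", "Age at Admission", "Patient age"])
example_terms = ["age"]

# Alternation of the escaped search terms, e.g. "age|weight"
alternatives = "|".join(re.escape(term) for term in example_terms)

# Substring match: a term may appear anywhere, including inside longer words
substring_hits = example_labels.str.contains(alternatives, case=False, regex=True)

# Word-boundary match: the term must appear as a whole word
word_hits = example_labels.str.contains(rf"\b(?:{alternatives})\b", case=False, regex=True)

print(example_labels[substring_hits].tolist())
# ['Temperature Management', 'Drainage Variance', 'Age at Admission', 'Patient age']
print(example_labels[word_hits].tolist())
# ['Age at Admission', 'Patient age']
```

This is a trade-off rather than a fix: word boundaries would also drop legitimate hits where the term is glued to another word (a label like "BodyWeight" would no longer match `weight`), which is exactly what the broad search is meant to catch.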

```python
import numpy as np
import pandas as pd
import pyodbc  # not used yet; intended for querying ICCA directly (see TODO)
```

```python
# Load the variable definitions, including the search strings for each variable
schema = pd.read_excel(
    '../schema/smartt_variable_definitions.xlsx',
    sheet_name='search_strings',
)
```

```python
schema.head()
```

|   | Variable | Simple name | Search Strings | Likely ICCA table | Possibly derived | Pacmed ontology | Snowmed CT | Notes |
|---|---|---|---|---|---|---|---|---|
| 0 | Age at admission | age | age | PtDemographics | True | General_information.Patient_characteristics | | |
| 1 | Gender | gender | gender | PtDemographics | False | General_information.Patient_characteristics | | |
| 2 | Weight at admission | weight | weight | PtDemographics | True | General_information.Patient_characteristics | | |
| 3 | Origin department | admit_from | admit, origin, admission, location | ? | True | General_information.Admission_information | | |
| 4 | pH | ph | ph | PtLabResults | False | Laboratory_results.Blood_gas_analysis | | |


```python
# Load all PtAssessment interventions (clinical units 5, 8 and 9) and drop rows without labels
interventions = pd.read_csv('../data/all_ptassessment_interventions_units_5_8_9.rpt', sep='\t')
interventions.dropna(axis=0, subset=['shortLabel', 'longLabel'], inplace=True)
interventions.head()
```

|   | interventionId | shortLabel | longLabel | conceptCode | numberOfPatients | firstChartTime | lastChartTime | numberOfRecords | minClinicalUnitId | maxClinicalUnitId |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1746 | Airway Respiratory Rate | Airway Respiratory Rate | 86290005.0 | 53345.0 | 2015-02-03 12:00:00.000 | 2016-05-30 15:00:00.000 | 53345.0 | 5.0 | 5.0 |
| 2 | 1793 | RVSWI | Right Ventricular Stroke Work Index | 277380003.0 | 875.0 | 2015-05-26 17:55:00.000 | 2023-06-08 08:00:00.000 | 875.0 | 5.0 | 8.0 |
| 3 | 1798 | LVSWI | Left Ventricular Stroke Work Index | 276898003.0 | 464.0 | 2015-05-26 17:55:00.000 | 2023-06-08 13:00:00.000 | 464.0 | 5.0 | 8.0 |
| 4 | 1830 | Glasgow Coma | Glasgow Coma Scale | 386554004.0 | 2751747.0 | 2015-02-03 10:00:00.000 | 2023-08-10 00:00:00.000 | 2751747.0 | 5.0 | 9.0 |
| 5 | 1844 | LCWI | Left Cardiac Work Index | 399266005.0 | 480.0 | 2015-05-26 17:55:00.000 | 2023-06-08 13:00:00.000 | 480.0 | 5.0 | 8.0 |


```python
variables = list(schema.Variable)
print(f"There are {len(variables)} variables in the schema.")
```

```text
There are 65 variables in the schema.
```


```python
variable_id = 0  # row of the schema to search for (0 = "Age at admission")
```

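```python
# Split the comma-separated search strings and strip surrounding whitespace,
# so that "admit, origin" gives ['admit', 'origin'] rather than ['admit', ' origin']
search_strings = [s.strip() for s in schema.loc[variable_id, 'Search Strings'].split(',')]
```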

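```python
# Flag interventions whose long label contains any of the search strings
# (case-insensitive literal substring match; missing labels count as no match)
logical_index = np.logical_or.reduce(
    [
        interventions.longLabel.str.contains(search_string, case=False, regex=False, na=False)
        for search_string in search_strings
    ]
)
```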
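
The cell above handles one `variable_id` at a time and searches `longLabel` only. To build candidate lists for every variable in one pass, the same search could be wrapped in a function and run over each row of the schema. This is a sketch under assumptions: the helpers `parse_search_strings` and `find_candidates`, and the choice to search both `shortLabel` and `longLabel`, are hypothetical; the small DataFrames stand in for `schema` and `interventions` using rows from this notebook's outputs (named differently so they do not overwrite the real ones).

```python
import numpy as np
import pandas as pd

# Stand-ins for `schema` and `interventions`, using rows from this notebook's outputs
example_schema = pd.DataFrame(
    {
        "Simple name": ["age", "ph", "admit_from"],
        "Search Strings": ["age", "ph", "admit, origin, admission, location"],
    }
)
example_interventions = pd.DataFrame(
    {
        "interventionId": [1746, 1830, 2016, 2971],
        "shortLabel": ["Airway Respiratory Rate", "Glasgow Coma", "Temp Management", "Management Plan:"],
        "longLabel": ["Airway Respiratory Rate", "Glasgow Coma Scale", "Temperature Management", "Management Plan:"],
    }
)


def parse_search_strings(value):
    """Split a comma-separated 'Search Strings' cell into clean, non-empty terms."""
    if not isinstance(value, str):  # e.g. NaN for a variable with no search strings
        return []
    return [s.strip() for s in value.split(",") if s.strip()]


def find_candidates(interventions, search_strings, label_cols=("shortLabel", "longLabel")):
    """Return interventionIds whose labels contain any search string (case-insensitive substring)."""
    if not search_strings:
        return []
    hits = np.logical_or.reduce(
        [
            interventions[col].str.contains(s, case=False, regex=False, na=False)
            for col in label_cols
            for s in search_strings
        ]
    )
    return interventions.loc[hits, "interventionId"].tolist()


candidates = {
    row["Simple name"]: find_candidates(example_interventions, parse_search_strings(row["Search Strings"]))
    for _, row in example_schema.iterrows()
}
print(candidates)  # {'age': [2016, 2971], 'ph': [], 'admit_from': []}
```

The candidate IDs still need the manual review described in the procedure; the function only narrows down where to look.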

```python
# Columns to show for the matching interventions (listed once each)
display_cols = ['interventionId', 'shortLabel', 'longLabel', 'numberOfPatients', 'firstChartTime']
interventions.loc[logical_index, display_cols]
```

|   | interventionId | shortLabel | longLabel | numberOfPatients | firstChartTime |
|---|---|---|---|---|---|
| 20 | 2016 | Temp Management | Temperature Management | 420378.0 | 2015-02-03 09:00:00.000 |
| 86 | 2912 | Respiratory Management Prescribe | Respiratory Management Prescribed | 353.0 | 2015-03-27 12:19:00.000 |
| 140 | 2971 | Management Plan: | Management Plan: | 51753.0 | 2015-02-27 17:25:00.000 |
| 145 | 2978 | Management Plan Other: | Management Plan Other: | 2872.0 | 2015-02-27 17:25:00.000 |
| 177 | 3016 | Nutrition Management (Day 3): | Nutrition Management (Day 3): | 13118.0 | 2015-03-27 16:37:00.000 |
| 179 | 3018 | Respiratory Management (Day 4): | Respiratory Management (Day 4): | 6334.0 | 2015-03-27 16:37:00.000 |
| 219 | 3078 | Drainage Variance | Drainage Variance | 627.0 | 2015-03-24 08:41:00.000 |
| 225 | 3085 | Record Drainage from Arm/Leg Dra | Record Drainage from Arm/Leg Drain? | 2664.0 | 2015-03-24 08:37:00.000 |
| 236 | 3096 | Continue Fluid Management: | Continue Fluid Management: | 3348.0 | 2015-03-24 08:37:00.000 |
| 240 | 3100 | Fluid Management Variance: | Fluid Management Variance: | 9920.0 | 2015-03-24 08:37:00.000 |


```python
interventions.loc[145]
```

```text
interventionId                          2978
shortLabel            Management Plan Other:
longLabel             Management Plan Other:
conceptCode                      305260005.0
numberOfPatients                      2872.0
firstChartTime       2015-02-27 17:25:00.000
lastChartTime        2023-08-09 16:55:00.000
numberOfRecords                       2872.0
minClinicalUnitId                        5.0
maxClinicalUnitId                        8.0
Name: 145, dtype: object
```
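
The next step of the procedure (example data for each attribute of a selected intervention) could be produced with a parameterised query once direct SQL access is in place. The sketch below only builds the query text and parameters; the helper `example_data_query` and the column names `attributeId`, `valueString`, `valueNumber` and `chartTime` are assumptions rather than confirmed ICCA names, so they should be checked against the actual table definitions. It uses `?` placeholders (the style pyodbc accepts) and `ROW_NUMBER()` to keep a few of the most recent rows per intervention/attribute pair.

```python
def example_data_query(table, intervention_ids, n_per_attribute=5):
    """Build a parameterised query returning up to n_per_attribute recent rows per
    (interventionId, attributeId) pair. Column names other than interventionId are hypothetical."""
    if not intervention_ids:
        raise ValueError("intervention_ids must not be empty")
    # Table names cannot be passed as parameters, so only accept plain identifiers
    # (schema-qualified names such as dbo.PtAssessment would need extra handling)
    if not table.isidentifier():
        raise ValueError(f"Unexpected table name: {table!r}")
    placeholders = ", ".join("?" for _ in intervention_ids)
    sql = f"""
SELECT interventionId, attributeId, valueString, valueNumber, chartTime
FROM (
    SELECT interventionId, attributeId, valueString, valueNumber, chartTime,
           ROW_NUMBER() OVER (
               PARTITION BY interventionId, attributeId ORDER BY chartTime DESC
           ) AS rowNum
    FROM {table}
    WHERE interventionId IN ({placeholders})
) AS examples
WHERE rowNum <= ?
ORDER BY interventionId, attributeId
"""
    return sql, [*intervention_ids, n_per_attribute]


sql, params = example_data_query("PtAssessment", [2978])
print(params)  # [2978, 5]
# With a pyodbc connection `conn`: rows = conn.cursor().execute(sql, params).fetchall()
```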