# Algorex HCC Library
This notebook provides a demonstration of the Algorex HCC library. In developing this library we prioritized three main features:

1. Return an accurate set of risk scores using the HCC algorithm for all available models. 
2. Provide easy inputs so that the library can be integrated in a range of analytical or other applications. 
3. Give analysts and developers a rich interface into the underlying mechanics of the risk adjustment algorithm: the codes, their mappings, and their coefficients. 


In [63]:
from bokeh.plotting import figure, output_notebook, show, output_file
from bokeh.palettes import Blues, BuGn, viridis, Paired, plasma, PuBuGn
from bokeh.models import FixedTicker, FactorRange,CustomJS, HoverTool,CategoricalAxis, LabelSet, Label, ColumnDataSource, widgets,CategoricalColorMapper,LinearInterpolator, LinearColorMapper, LogColorMapper
from bokeh.models import (GMapPlot, GMapOptions, Range1d, PanTool, WheelZoomTool, BoxSelectTool, HoverTool,  ResetTool, ZoomInTool, ZoomOutTool)
from bokeh.layouts import row, column, widgetbox
from bokeh.models.glyphs import Patches, Line, Circle
# from bkcharts import Histogram, output_file, show, Bar, color,  Scatter
from bokeh.resources import CDN
import sklearn
from sklearn.utils.validation import check_array
import random
import squarify
import pandas as pd
import numpy as np
from IPython.display import display, Markdown


output_notebook()



In [141]:

from hcc_v23 import *
cvars= community_aged_regression()

## Loading Data

For this demonstration, we have generated a condition data set using the amazing open-source library [Synthea](https://github.com/synthetichealth/synthea), which can generate "real-looking" but still fake data for use in testing libraries just like this one. 

Synthea's default vocabulary is SNOMED and the HCC models' default vocabulary is ICD, so we have mapped the SNOMED codes to roughly equivalent ICD codes in this dataset using the NLM SNOMED CT to ICD-10 Map. 
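
As a rough illustration of that mapping step, the sketch below applies a SNOMED-to-ICD lookup to a couple of condition rows. The `SNOMED_TO_ICD10` dictionary and the `map_conditions` helper are hypothetical names used only for this example; the pairs come from the sample rows shown later in this notebook, and a real run would load the full NLM map instead of hard-coding entries.

```python
# Hypothetical excerpt of a SNOMED CT -> ICD-10 lookup; a real run would
# load the full NLM map rather than hard-code pairs.
SNOMED_TO_ICD10 = {
    "162864005": "E66.9",  # Body mass index 30+ - obesity (finding)
    "44054006": "E11.9",   # Diabetes
    "90560007": "M10.9",   # Gout
}


def map_conditions(rows, lookup=SNOMED_TO_ICD10):
    """Attach an ICD-10 code to each row; None marks an unmapped SNOMED code."""
    return [{**row, "icd_10_code": lookup.get(row["CODE"])} for row in rows]


rows = [
    {"CODE": "44054006", "DESCRIPTION": "Diabetes"},
    {"CODE": "431855005", "DESCRIPTION": "Chronic kidney disease stage 1 (disorder)"},
]
mapped = map_conditions(rows)
```

Rows without an entry in the lookup keep an empty ICD code, which matches the blank `icd_10_code` values that appear for some conditions in the dataset.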


We will also create a helper function to provide labels for the HCCs. 






In [163]:
conditions_synthetic_data = pd.read_csv("/Users/luke/Projects/hcc-python/conditions.csv", dtype={'CODE':str, 'icd_10_code': str})
conditions_synthetic_data['icd_10_code'] = conditions_synthetic_data['icd_10_code'].apply(lambda x:  str(x).replace('.',''))



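def name_hcc(label):
  """Return a short (at most 8-character) display label for an HCC code."""
  # Fall back to the raw code when no label is defined for it.
  return hcc_labels.get(label, label)[0:8]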


hcc_labels = {'HCC1':"AIDS",
'HCC2':"Septicemia",
'HCC6':"Opportunistic_Infections",
'HCC8':"Metastatic_Cancer",
'HCC9':"Lung_Cancers",
'HCC10':"Lymphoma",
'HCC11':"Colorectal_Cancers",
'HCC12':"Breast_Prostate_Tumors",
'HCC17':"Diabetes_Acute_Complications",
'HCC18':"Diabetes_Chronic",
'HCC19':"Diabetes",
'HCC21':"Malnutrition",
'HCC22':"Obesity",
'HCC23':"Endocrine",
'HCC27':"ESLD",
'HCC28':"Cirrhosis",
'HCC29':"Hepatitis",
'HCC33':"Intestinal_Obstruction",
'HCC34':"Pancreatitis",
'HCC35':"IBD",
'HCC39':"BJM_Infections",
'HCC40':"Rheumatoid_Arthritis",
'HCC46':"Hematology_Disorders",
'HCC47':"Immunity_Disorders",
'HCC48':"Coagulation_Defects",
'HCC54':"Substance_Psychosis",
'HCC55':"Substance_Dependence",
'HCC57':"Schizophrenia",
'HCC58':"Depression_Bipolar",
'HCC70':"Quadriplegia",
'HCC71':"Paraplegia",
'HCC72':"Spinal_Cord_Disorders",
'HCC73':"ALS",
'HCC74':"Cerebral_Palsy",
'HCC75':"Myasthenia",
'HCC76':"Muscular_Dystrophy",
'HCC77':"Multiple_Sclerosis",
'HCC78':"Parkinson_Huntington",
'HCC79':"Seizures",
'HCC80':"Coma",
'HCC82':"Respirator_Dependence",
'HCC83':"Respiratory_Arrest",
'HCC84':"Cardio_Respiratory_Failure",
'HCC85':"CHF",
'HCC86':"AMI",
'HCC87':"IVD",
'HCC88':"Angina_Pectoris",
'HCC96':"Arrhythmias",
'HCC99':"Cerebral_Hemorrhage",
'HCC100':"Stroke",
'HCC103':"Hemiplegia",
'HCC104':"Monoplegia",
'HCC106':"Gangrene",
'HCC107':"Vascular_Disease_Complications",
'HCC108':"Vascular_Disease",
'HCC110':"Cystic_Fibrosis",
'HCC111':"COPD",
'HCC112':"Fibrosis_Lung",
'HCC114':"Pneumonias",
'HCC115':"Pneumococcal",
'HCC122':"Retinopathy",
'HCC124':"Macular_Degeneration",
'HCC134':"Dialysis_Status",
'HCC135':"Acute_Renal_Failure",
'HCC136':"CKD5",
'HCC137':"CKD4",
'HCC157':"Ulcer_Necrosis",
'HCC158':"Ulcer_Skin_Loss",
'HCC161':"Ulcer_Chronic",
'HCC162':"Burn",
'HCC166':"Head_Injury_Severe",
'HCC167':"Head_Injury",
'HCC169':"Spinal_Cord",
'HCC170':"Hip_Fracture",
'HCC173':"Amputation",
'HCC176':"Graft_Implant",
'HCC186':"Organ_Transplant",
'HCC188':"Artificial_Opening",
'HCC189':"Amputation_Complicated"}
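
One detail of the cleaning step above is worth noting: `str(x).replace('.','')` turns a missing value into the literal string `'nan'`, which then travels through the pipeline as if it were a diagnosis code. The sketch below shows one way to drop missing codes before they reach the library. `normalize_icd10` is a hypothetical helper, not part of the HCC library.

```python
import math


def normalize_icd10(raw):
    """Strip dots and whitespace from a code; return None when it is missing."""
    if raw is None or (isinstance(raw, float) and math.isnan(raw)):
        return None
    code = str(raw).strip().replace(".", "")
    # Treat empty strings and a stringified NaN as missing too.
    return code if code and code.lower() != "nan" else None


codes = ["E11.9", float("nan"), "E88.81", None, " M10.9 "]
clean = [c for c in map(normalize_icd10, codes) if c is not None]
```

Whether unmapped conditions should be dropped or reported separately depends on the analysis; this sketch simply filters them out.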

In [153]:
top_diags = conditions_synthetic_data.groupby('PATIENT').count()['icd_10_code'].sort_values().tail(2)
patient_1 = top_diags.index[0]
patient_2 = top_diags.index[1]



In [154]:
Markdown(f'''To load the data into the HCC library, you create beneficiary objects and add the diagnosis codes. Our example patient has **{conditions_synthetic_data.groupby('PATIENT').count().loc[patient_1]['CODE']}** separate diagnosis codes to be added.

The same approach applies whenever users have diagnosis codes from multiple sources, such as:
* at-home assessments from a third party
* mined medical records
* other sources based on outreach efforts.

The library allows the developer/analyst to compare the output of the sources and measure the uplift (or downlift). 

Our example patient here is named 'Jane'. 
''')


To load the data into the HCC library, you create beneficiary objects and add the diagnosis codes. Our example patient has **27** separate diagnosis codes to be added.

The same approach applies whenever users have diagnosis codes from multiple sources, such as:
* at-home assessments from a third party
* mined medical records
* other sources based on outreach efforts.

The library allows the developer/analyst to compare the output of the sources and measure the uplift (or downlift). 

Our example patient here is named 'Jane'. 
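
To make the uplift comparison concrete, here is a minimal sketch that compares two risk scores for the same beneficiary. The `uplift` helper and both score values are hypothetical; in practice each score would come from a separate beneficiary object loaded with the codes from one source.

```python
def uplift(baseline_score, augmented_score):
    """Return the absolute and relative change between two risk scores."""
    delta = augmented_score - baseline_score
    # Relative change is undefined when the baseline score is zero.
    relative = delta / baseline_score if baseline_score else None
    return delta, relative


# Hypothetical scores for one beneficiary from two code sources.
claims_only = 1.25
claims_plus_assessment = 1.5
delta, relative = uplift(claims_only, claims_plus_assessment)
```

A negative `delta` would indicate a downlift, for example when a second source fails to confirm a condition.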


In [155]:
conditions_synthetic_data[conditions_synthetic_data.PATIENT == patient_1]

Unnamed: 0,START,STOP,PATIENT,ENCOUNTER,CODE,DESCRIPTION,icd_10_code
50663,1943-08-17,,7b5ed251-b412-4a25-af47-8c55de1c1e67,53ba9dae-bb90-4009-a0ca-52b15ae29882,162864005,Body mass index 30+ - obesity (finding),E669
50664,1952-08-26,,7b5ed251-b412-4a25-af47-8c55de1c1e67,c248e591-4efe-4ef8-9170-2ff14b38783d,449868002,Smokes tobacco daily,Z720
50665,1959-06-23,,7b5ed251-b412-4a25-af47-8c55de1c1e67,79bcab05-3844-4bfb-99ef-9bf266897256,44054006,Diabetes,E119
50666,1959-06-23,,7b5ed251-b412-4a25-af47-8c55de1c1e67,79bcab05-3844-4bfb-99ef-9bf266897256,271737000,Anemia (disorder),D649
50667,1961-06-27,,7b5ed251-b412-4a25-af47-8c55de1c1e67,e8831d39-f832-416b-94b6-7ba72806fbdc,302870006,Hypertriglyceridemia (disorder),E781
50668,1961-06-27,,7b5ed251-b412-4a25-af47-8c55de1c1e67,e8831d39-f832-416b-94b6-7ba72806fbdc,237602007,Metabolic syndrome X (disorder),E8881
50669,1962-04-24,,7b5ed251-b412-4a25-af47-8c55de1c1e67,2199de83-0840-45ae-8f3f-0ebed916c37a,431855005,Chronic kidney disease stage 1 (disorder),
50670,1962-04-24,,7b5ed251-b412-4a25-af47-8c55de1c1e67,2199de83-0840-45ae-8f3f-0ebed916c37a,127013003,Diabetic renal disease (disorder),E1121
50671,1963-05-27,,7b5ed251-b412-4a25-af47-8c55de1c1e67,d3d61ac6-a2c1-4ed0-b45d-93ef8a8123aa,90560007,Gout,M109
50672,1963-06-18,,7b5ed251-b412-4a25-af47-8c55de1c1e67,768afb24-fefd-4127-920e-c37f255caa9e,431856006,Chronic kidney disease stage 2 (disorder),


In [158]:
jane = Beneficiary(hicno=patient_1, sex='female', dob='19480601')
jane_alt = Beneficiary(hicno=patient_2, sex='male', dob='19480601')

#antonio.add_diagnosis(Diagnosis(antonio,"49320",ICDType.NINE))


for row in conditions_synthetic_data[conditions_synthetic_data.PATIENT == patient_1].itertuples():
    code = row.icd_10_code
    jane.add_diagnosis(Diagnosis(jane,code, int(ICDType.TEN)))



The library can output scores across all of the CMS-HCC models, including the institutional, new enrollee, and community rated groups. For 2017 there will be nine separate models, and the library will be updated to support them. 

In [159]:
beneficiary_has_hcc(jane, X)

print(score(jane,X,Score))

X                  | Score             
-------------------|-------------------
institutional      | 2.1519999999999997
community_disabled | 1.551             
community          | 1.8969999999999998
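
Values such as `2.1519999999999997` are ordinary floating-point artifacts of summing coefficients. A small sketch of rounding the scores for display follows; `format_scores` is a hypothetical helper, and the input values are copied from the table above.

```python
def format_scores(scores, places=3):
    """Round each model's score for display; the input mapping is not modified."""
    return {model: round(value, places) for model, value in scores.items()}


raw = {
    "institutional": 2.1519999999999997,
    "community_disabled": 1.551,
    "community": 1.8969999999999998,
}
shown = format_scores(raw)
```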


## Understand the Mechanics

The score is just one data point. What about how that score was produced? The HCC models consist of close to 190 disease categories and complex disease interactions. With our library, the analyst can access the underlying coefficients that are used for this patient. 

In the example below, we take our sample patient 'Jane' and ask the system for all the indicator variables this patient is eligible for. We can also see all the parts that lead up to this score. 

![diagram](./diagram.png)

The query below is based on this idea: identify all indicators, underlying categories, and codes that are true for this patient. 

In [160]:

results = indicator(jane,CC) & beneficiary_icd(jane,ICD,Type)  & CC.in_(cvars) & coefficient("CE_"+CC,Coef)   
print(results[0:30])

[('HCC96', 'D4959', 0, 0.276), ('HCC96', 'J329', 0, 0.276), ('HCC96', 'E119', 0, 0.276), ('HCC96', 'I4891', 0, 0.276), ('HCC96', 'M109', 0, 0.276), ('HCC96', 'I639', 0, 0.276), ('HCC96', 'J029', 0, 0.276), ('HCC96', 'E1121', 0, 0.276), ('HCC96', 'Z720', 0, 0.276), ('HCC96', 'E781', 0, 0.276), ('HCC96', 'E8881', 0, 0.276), ('HCC96', 'nan', 0, 0.276), ('HCC96', 'D649', 0, 0.276), ('HCC96', 'R809', 0, 0.276), ('HCC96', 'S5290X?', 0, 0.276), ('HCC96', 'D075', 0, 0.276), ('HCC96', 'J209', 0, 0.276), ('HCC96', 'I509', 0, 0.276), ('HCC96', 'E669', 0, 0.276), ('HCC96', 'I2510', 0, 0.276), ('HCC96', 'B349', 0, 0.276), ('HCC85', 'D4959', 0, 0.361), ('HCC85', 'J329', 0, 0.361), ('HCC85', 'E119', 0, 0.361), ('HCC85', 'I4891', 0, 0.361), ('HCC85', 'M109', 0, 0.361), ('HCC85', 'I639', 0, 0.361), ('HCC85', 'J029', 0, 0.361), ('HCC85', 'E1121', 0, 0.361), ('HCC85', 'Z720', 0, 0.361)]
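
In the printed rows, each HCC appears once per diagnosis code, including codes that do not obviously belong to it (for example `HCC96` alongside `M109`, the gout code). That suggests the query pairs every indicator with every code rather than joining them. Under that assumption, the sketch below collapses such rows to one coefficient per HCC. The sample tuples are copied from the output above; `coef_by_hcc` is a name introduced for this example.

```python
# A few rows in the same (HCC, ICD code, ICD type, coefficient) shape as above.
results = [
    ("HCC96", "D4959", 0, 0.276),
    ("HCC96", "J329", 0, 0.276),
    ("HCC85", "D4959", 0, 0.361),
    ("HCC85", "J329", 0, 0.361),
]

# Keep the first coefficient seen for each HCC, ignoring the repeated ICD rows.
coef_by_hcc = {}
for hcc, _icd, _icd_type, coef in results:
    coef_by_hcc.setdefault(hcc, coef)
```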


In [168]:
x = 0
y = 0
width = 40
height = 40
jane_coefs = indicator(jane,CC)  & CC.in_(cvars) & coefficient("CE_"+CC,Coef)
jane_coefs = sorted(jane_coefs,key= lambda val:  val[1], reverse=True)

normed = squarify.normalize_sizes([value for name, value in jane_coefs], width, height)
rects = squarify.squarify(normed, x, y, width, height)
shapes = []
# Pair each treemap rectangle with the (HCC, coefficient) it represents.
for r, (name, value) in zip(rects, jane_coefs):
    shapes.append(
        dict(
            x0 = r['x'] + r['dx']/2 ,  # rectangle center (x)
            y0 = r['y'] + r['dy']/2 ,  # rectangle center (y)
            width = r['dx'],
            height = r['dy'],
            color=random.choice(Paired[12]),
            text=name,
            score=value,
            x1 = r['x'],  # lower-left corner, used to anchor the label
            y1 = r['y']
        ) 
    )

mapper = LinearColorMapper(palette=['#7fcdbb', '#41b6c4', '#1d91c0', '#225ea8', '#08589e'])

r_data = ColumnDataSource(ColumnDataSource.from_df(pd.DataFrame.from_dict(shapes)))
r_data.data['text'] = [name_hcc(label) for label in r_data.data['text']]

q = figure(x_range=Range1d(x, width), y_range=Range1d(y, height), title='Condition Contribution to Jane Risk Score in Year 1')
q.rect( x='x0', y='y0', width='width', height='height', fill_color={'field':'score', 'transform':mapper},
           line_alpha=1, line_color='#FFFFFF', source=r_data)
q.text(x='x1', y='y1', text='text', source=r_data, x_offset=3, y_offset=1
       , text_font_size='8pt')
q.xaxis.visible = False
q.yaxis.visible = False





In [169]:
show(q)


## Scaling Up

So far we have focused on a single patient, but the calculation also scales up, all the way to a full population. Here we load 100 patients and score the first 10. 

In [226]:
cohort_study = list(conditions_synthetic_data.PATIENT.value_counts().index[0:100])


In [232]:



cohort_patients = [Beneficiary(hicno=row, sex=random.choice(('male','female')), dob='19450101') for row in cohort_study]


In [233]:

diags = {}
for patient in cohort_patients:
    patient_diagnoses = [row.icd_10_code for row in conditions_synthetic_data[conditions_synthetic_data.PATIENT == patient.hicno].itertuples()]
    diags[patient] = patient_diagnoses
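
Filtering the full table once per patient works for 100 patients but rescans every row each time. For larger populations, a single pass that groups codes by patient may be preferable. The sketch below shows the idea on plain tuples; `group_codes_by_patient` and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_codes_by_patient(rows):
    """Collect diagnosis codes per patient in a single pass over the rows."""
    grouped = defaultdict(list)
    for patient_id, code in rows:
        grouped[patient_id].append(code)
    return dict(grouped)


# Hypothetical (patient id, ICD-10 code) pairs.
rows = [("p1", "E119"), ("p2", "I509"), ("p1", "M109")]
codes_by_patient = group_codes_by_patient(rows)
```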




In [234]:
# Use a distinct loop variable so the `diags` dictionary is not overwritten.
for patient, codes in diags.items():
    for code in codes:
        patient.add_diagnosis(Diagnosis(patient,code,ICDType.TEN))

In [238]:
cohort_score  = [score(pat,"community",Score) for pat in cohort_patients[0:10]]
cohort_score

[[(1.311,)],
 [],
 [(1.5510000000000002,)],
 [(0.571,)],
 [(1.02,)],
 [(0.677,)],
 [(0.124,)],
 [(1.803,)],
 [(2.0580000000000003,)],
 [(0.124,)]]
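
Each result is a list of one-element tuples, and a patient with no score comes back as an empty list. A minimal sketch for flattening that shape into plain numbers follows; `flatten_scores` is a hypothetical helper, and the sample values are taken from the output above.

```python
def flatten_scores(raw_results):
    """Turn [[(1.311,)], [], ...] into [1.311, None, ...]."""
    return [rows[0][0] if rows else None for rows in raw_results]


cohort_score = [[(1.311,)], [], [(1.5510000000000002,)], [(0.571,)]]
flat = flatten_scores(cohort_score)

# Average only over patients that actually received a score.
scored = [s for s in flat if s is not None]
mean_score = sum(scored) / len(scored)
```

Keeping `None` for unscored patients preserves the alignment with `cohort_patients`, so the missing cases can be inspected rather than silently dropped.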