# Algorex HCC Library
This notebook demonstrates the Algorex HCC library. In developing the library we prioritized three main features:

1. Return an accurate set of risk scores using the HCC algorithm for all available models.
2. Provide simple inputs so that the library can be integrated into a range of analytical or other applications.
3. Give the analyst or developer a rich interface into the underlying mechanics of the risk adjustment algorithm: the codes, their mappings, and their coefficients.


In [None]:
from bokeh.plotting import figure, output_notebook, show, output_file
from bokeh.palettes import Blues, BuGn, viridis, Paired, plasma, PuBuGn
from bokeh.models import FixedTicker, FactorRange,Color,CustomJS, HoverTool,CategoricalAxis, LabelSet, Label, ColumnDataSource, widgets,CategoricalColorMapper,LinearInterpolator, LinearColorMapper, LogColorMapper
from bokeh.models import (GMapPlot, GMapOptions, Range1d, PanTool, WheelZoomTool, BoxSelectTool, HoverTool,  ResetTool, ZoomInTool, ZoomOutTool)
from bokeh.layouts import row, column, widgetbox
from bokeh.models.glyphs import Patches, Line, Circle
from bokeh.charts import Histogram, output_file, show, Bar, color,  Scatter
from bokeh.resources import CDN
import sklearn
from sklearn.utils.validation import check_array
import random
import squarify
import pandas as pd
import numpy as np


output_notebook()


%load_ext sql
%sql sqlite:///mit-poster.db
%sql ATTACH '../jupyterdemo/claims.db' as mem;


In [1]:
import AlgorexCore as rex
from hcc import *
cvars= community_regression()



## Loading Data
For this demonstration we use a de-identified claims data warehouse embedded in this notebook. It simulates how you can integrate the library with other data warehouses or databases.

In the cell below, we select the diagnosis codes for one patient in one year, then the codes for the same patient in the following year.



In [None]:
yr1_diags = %sql SELECT DISTINCT Diagnosis  From diagnoses  where PatientID = '0220F11E0B2EC004' and ClaimFromDate like '2008%';
yr2_diags = %sql SELECT DISTINCT Diagnosis  From diagnoses  where PatientID = '0220F11E0B2EC004' and ClaimFromDate like '2009%';
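
Outside a notebook with the `%sql` magic, the same selection could be made with Python's built-in `sqlite3` module. The following is a minimal sketch, not the library's loader: it builds a tiny in-memory `diagnoses` table that borrows the column names from the query above, the rows are made up, and `codes_for_year` is a hypothetical helper.

```python
import sqlite3

def codes_for_year(conn, patient_id, year):
    """Return the distinct diagnosis codes for one patient in one calendar year."""
    rows = conn.execute(
        "SELECT DISTINCT Diagnosis FROM diagnoses "
        "WHERE PatientID = ? AND ClaimFromDate LIKE ? "
        "ORDER BY Diagnosis",
        (patient_id, f"{year}%"),
    ).fetchall()
    return [row[0] for row in rows]

# Illustrative in-memory table; the rows are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE diagnoses (PatientID TEXT, ClaimFromDate TEXT, Diagnosis TEXT)"
)
conn.executemany(
    "INSERT INTO diagnoses VALUES (?, ?, ?)",
    [
        ("P1", "20080115", "25000"),
        ("P1", "20080320", "25000"),  # duplicate code, collapsed by DISTINCT
        ("P1", "20090210", "4280"),
        ("P2", "20080501", "49320"),
    ],
)

yr1_codes = codes_for_year(conn, "P1", 2008)
yr2_codes = codes_for_year(conn, "P1", 2009)
print(yr1_codes, yr2_codes)
```

Binding the patient ID and year as parameters, rather than formatting them into the SQL string, keeps the query safe when the values come from user input.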

To load the data into the HCC library, you create beneficiary objects and add the diagnosis codes to them. Our example patient has **{{len(yr1_diags)}}** separate diagnosis codes to add in year 1 and **{{len(yr2_diags)}}** in year 2.

The same approach applies whenever diagnosis codes come from multiple sources, such as:
* at-home assessments from a third party
* mined medical records
* other sources based on outreach efforts.

The library lets the developer or analyst compare the output of each source and measure the uplift (or downlift).
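
One way to make that comparison concrete is to look at which codes a second source adds and how the score moves. The sketch below is hypothetical and independent of the library: `compare_sources` is an invented helper, and the codes and scores passed in are placeholders rather than real HCC output.

```python
def compare_sources(base_codes, alt_codes, base_score, alt_score):
    """Summarize how a second diagnosis source differs from the baseline."""
    base, alt = set(base_codes), set(alt_codes)
    return {
        "added_codes": sorted(alt - base),    # only in the alternate source
        "missing_codes": sorted(base - alt),  # only in the baseline
        "uplift": round(alt_score - base_score, 3),  # negative means downlift
    }

# Placeholder inputs for illustration only.
summary = compare_sources(
    base_codes=["25000", "4280"],
    alt_codes=["25000", "4280", "49320"],
    base_score=1.25,
    alt_score=1.5,
)
print(summary)
```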

We call our example patient 'Jane'.


In [None]:
jane = Beneficiary(hicno='0220F11E0B2EC004', sex='female', dob='19480601')
jane_alt = Beneficiary(hicno='0220F11E0B2EC004', sex='male', dob='19480601')

# jane carries only the year 1 codes.
for icd9 in yr1_diags:
    code = icd9[0]  # first (and only) column of the result row
    jane.add_diagnosis(Diagnosis(jane, code, int(ICDType.NINE)))

# jane_alt carries the codes from both years.
for icd9 in yr1_diags + yr2_diags:
    code = icd9[0]
    jane_alt.add_diagnosis(Diagnosis(jane_alt, code, int(ICDType.NINE)))

The library can output scores across all of the CMS-HCC models, including the institutional, new enrollee, and community-rated groups. For 2017 there will be nine separate models, and the library will be updated to support them.

In [None]:
beneficiary_has_hcc(jane, X)

print(score(jane,X,Score))
print(score(jane_alt,X,Score))

## Understand the Mechanics

Getting the score is just one data point. What about how that score was derived? The HCC models consist of close to 190 disease categories and complex disease interactions. With our library, the analyst can access the underlying coefficients that apply to this patient.

In the example below, we take our sample patient 'Jane' and ask the system for all the indicator variables this patient is eligible for. We can also see all the parts that lead up to the score.
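
At its core the score is additive: each indicator variable that is true for a patient contributes its coefficient to the total. The sketch below illustrates that idea in plain Python and does not use the library's internals. The indicator names and coefficient values are invented for illustration and are not CMS values, and `explain_score` is a hypothetical helper.

```python
# Made-up coefficients keyed by indicator name; real models ship their own tables.
COEFFICIENTS = {
    "DEMOGRAPHIC_CELL": 0.25,  # stands in for an age/sex indicator
    "HCC_A": 0.5,              # a disease category
    "HCC_B": 0.125,            # another disease category
    "HCC_A_x_HCC_B": 0.125,    # interaction between the two categories
}

def explain_score(active_indicators, coefficients):
    """Return each active indicator's contribution and the summed score."""
    contributions = {
        name: coefficients[name]
        for name in active_indicators
        if name in coefficients  # indicators without a coefficient add nothing
    }
    return contributions, round(sum(contributions.values()), 3)

active = ["DEMOGRAPHIC_CELL", "HCC_A", "HCC_B", "HCC_A_x_HCC_B"]
contributions, total = explain_score(active, COEFFICIENTS)

# List the largest contributors first, then the total.
for name, value in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>18}  {value:.3f}")
print("total", total)
```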

![diagram](./diagram.png)

The query below is based on this idea: identify all indicators, and their underlying categories and codes, that are true for this patient.

In [None]:

results = indicator(jane,CC) & beneficiary_icd(jane,ICD,Type)  & CC.in_(cvars) & coefficient("CE_"+CC,Coef)   
print(results[0:30])

In [None]:
x = 0
y = 0
width = 40
height = 40
jane_coefs = indicator(jane,CC)  & CC.in_(cvars) & coefficient("CE_"+CC,Coef)
jane_coefs = sorted(jane_coefs,key= lambda val:  val[1], reverse=True)

normed = squarify.normalize_sizes([value for name, value in jane_coefs], width, height)
rects = squarify.squarify(normed, x, y, width, height)
shapes = []
# rects and jane_coefs share the same order, so pair them directly.
for r, (name, value) in zip(rects, jane_coefs):
    shapes.append(
        dict(
            x0=r['x'] + r['dx'] / 2,  # rect glyph is positioned by its center
            y0=r['y'] + r['dy'] / 2,
            width=r['dx'],
            height=r['dy'],
            color=random.choice(Paired[12]),
            text=name,
            score=value,
            x1=r['x'],  # corner position, used for the text label
            y1=r['y']
        )
    )

mapper = LinearColorMapper(palette=['#7fcdbb', '#41b6c4', '#1d91c0', '#225ea8', '#08589e'])

r_data = ColumnDataSource(ColumnDataSource.from_df(pd.DataFrame.from_dict(shapes)))
r_data.data['text'] = [rex.name_hcc(label) for label in r_data.data['text']]

q = figure(x_range=Range1d(x, width), y_range=Range1d(y, height), title='Condition Contribution to Jane Risk Score in Year 1')
q.rect( x='x0', y='y0', width='width', height='height', fill_color={'field':'score', 'transform':mapper},
           line_alpha=1, line_color='#FFFFFF', source=r_data)
q.text(x='x1', y='y1', text='text', source=r_data, x_offset=3, y_offset=1
       , text_font_size='8pt')
q.xaxis.visible = False
q.yaxis.visible = False





In [None]:

janealt_coefs = indicator(jane_alt,CC)  & CC.in_(cvars) & coefficient("CE_"+CC,Coef)
janealt_coefs = sorted(janealt_coefs,key= lambda val:  val[1], reverse=True)
normed_alt = squarify.normalize_sizes([value for name, value in janealt_coefs], width, height)
rects_alt = squarify.squarify(normed_alt, x, y, width, height)
shapes_alt = []
# rects_alt and janealt_coefs share the same order, so pair them directly.
for r, (name, value) in zip(rects_alt, janealt_coefs):
    shapes_alt.append(
        dict(
            x0=r['x'] + r['dx'] / 2,  # rect glyph is positioned by its center
            y0=r['y'] + r['dy'] / 2,
            width=r['dx'],
            height=r['dy'],
            color=random.choice(Paired[12]),
            text=name,
            score=value,
            x1=r['x'],  # corner position, used for the text label
            y1=r['y']
        )
    )

z_data = ColumnDataSource(ColumnDataSource.from_df(pd.DataFrame.from_dict(shapes_alt)))
z_data.data['text'] = [rex.name_hcc(label) for label in z_data.data['text']]

z = figure(x_range=Range1d(x, width), y_range=Range1d(y, height), title="Condition Contribution to Risk Score for Jane over multiple years")
z.rect( x='x0', y='y0', width='width', height='height', fill_color={'field':'score', 'transform':mapper},
           line_alpha=1, line_color='#FFFFFF', source=z_data)
z.text(x='x1', y='y1', text='text', source=z_data, x_offset=3, y_offset=1
       , text_font_size='8pt')
z.xaxis.visible = False
z.yaxis.visible = False






In [None]:
show(row(q,z))
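
The two treemap cells rely on one geometric detail: `squarify` describes each rectangle by a corner (`x`, `y`) plus its size (`dx`, `dy`), while Bokeh's `rect` glyph is positioned by its center. The sketch below isolates that conversion in a hypothetical helper, `to_center_rect`, using a hand-written rectangle in place of real `squarify` output.

```python
def to_center_rect(rect):
    """Convert a corner-based rectangle dict into center-based plotting fields."""
    return {
        "x0": rect["x"] + rect["dx"] / 2,  # center x, used for the rect glyph
        "y0": rect["y"] + rect["dy"] / 2,  # center y
        "width": rect["dx"],
        "height": rect["dy"],
        "x1": rect["x"],  # corner x, used to anchor the text label
        "y1": rect["y"],  # corner y
    }

# Hand-written example with the same keys the treemap cells read.
example = {"x": 0.0, "y": 10.0, "dx": 20.0, "dy": 30.0}
shape = to_center_rect(example)
print(shape)
```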


## Scaling Up

So far we have scored a single patient, but the same calculation scales up, all the way to a full population. Here we do it for 100 patients.

In [None]:
cohort_study = %sql \
    SELECT d.PatientID, p.SEX, p.DOB \
        from diagnoses d join mem.patients p\
                on d.PatientID = p.PATIENT_ID\
        group by PatientID,p.SEX, p.DOB\
        having COUNT(*) > 20\
        order by Count(*) desc\
        limit 100;

In [None]:
def gender(l):
    if l == 'M':
        return 'male'
    else:
        return 'female'


cohort_patients = [Beneficiary(hicno=row[0], sex=gender(row[1]), dob=str(row[2])) for row in cohort_study[1:]]


In [None]:
hicno = [pat.hicno for pat in cohort_patients]
hicno = tuple(hicno)
diags = %sql SELECT DISTINCT PatientID, Diagnosis from diagnoses where PatientID in $hicno
diags = diags.DataFrame().set_index('PatientId')

for pat in cohort_patients:
    pat_diags = diags['Diagnosis'].loc[pat.hicno].tolist()
    for code in pat_diags:
        pat.add_diagnosis(Diagnosis(pat,code,ICDType.NINE))
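
The grouping step in the cell above can also be written without a DataFrame. This is a sketch under assumptions: rows arrive as `(patient_id, code)` pairs, as the query would return them, and `group_codes` is a hypothetical helper. A plain dictionary of lists also covers a patient with a single code, a case where a `.loc` lookup on a pandas index returns a scalar rather than a Series.

```python
from collections import defaultdict

def group_codes(rows):
    """Group (patient_id, code) pairs into {patient_id: [codes]} without duplicates."""
    grouped = defaultdict(list)
    for patient_id, code in rows:
        if code not in grouped[patient_id]:
            grouped[patient_id].append(code)
    return dict(grouped)

# Invented rows standing in for the query result.
rows = [("P1", "25000"), ("P1", "4280"), ("P2", "49320"), ("P1", "25000")]
codes_by_patient = group_codes(rows)
print(codes_by_patient)
```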

Here is the average community-model score, computed over the first five patients of this cohort.


In [None]:
cohort_score  = [score(pat,"community",Score)[0][0] for pat in cohort_patients[0:5]]
np.mean(cohort_score)

In [None]:
#indicator(X,CC)  & CC.in_(cvars) & coefficient("CE_"+CC,Coef) & X.in_(cohort_patients[0:2])

In [None]:
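# `daniel` is not defined in this notebook; query the two beneficiaries created earlier instead.
indicator(X,CC) & X.in_([jane, jane_alt])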