## A Wild Demo Grouper
```
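To make each step explicit, the logic below is deliberately redundant (the order of the code is not always sensible either); for real workloads, C or Spark is recommended.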
```

### 0. Review

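#### 0.0. DRG Basic Concepts
> **Diagnosis Related Groups (DRG)** are an important tool for measuring the quality and efficiency of medical services and for health-insurance payment. In essence, DRG is a case-mix classification scheme: patients are assigned to diagnosis groups for management according to age, diagnosis, comorbidities, complications, treatment, severity, outcome, and resource consumption.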

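#### 0.1. Scope of DRG Payment
##### Applicable

> DRG aims to partition the output of medical services (cases in the same group are expected to have the same output). It is essentially a management tool: only cases whose diagnosis and treatment significantly affect resource consumption and outcomes are suitable for DRG as a risk-adjustment tool, so it fits acute inpatients best.

##### Not applicable

> The following should be excluded: (1) outpatient cases; (2) rehabilitation cases; (3) cases requiring long-term hospitalization; (4) cases with the same diagnosis and treatment but widely varying resource consumption and outcomes (such as psychiatric disorders).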



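#### 0.2. Grouping Principles
> (1) Refine layer by layer, summarize by major category; <br>
> (2) Diagnoses, surgeries, or procedures in a group have similar clinical processes and similar resource consumption; <br>
> (3) Combine clinical experience with data validation; <br>
> (4) Balance the management requirements of insurance payment with the practical needs of medical services.<br>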

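#### 0.3. Grouping Philosophy
> DRG grouping follows the case-mix idea: different disease types are separated by diagnosis; cases of the same type but with different treatment are separated by procedure; cases of the same type and treatment but with different individual characteristics are further separated by age, complications and comorbidities, birth weight, and similar factors, which finally yields the DRG groups.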

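#### 0.4. Pre-MDC List

|DRG code|DRG name|
|-|-|
|MDCA|Organ, bone marrow, or hematopoietic stem cell transplantation|
|MDCA|Tracheostomy with ventilator support|
|MDCP|Neonates (less than 29 days old)|
|MDCY|HIV infection and related procedures|
|MDCZ|Multiple significant trauma|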

In [9]:
pre_mdc=['MDCA','MDCP','MDCY','MDCZ']

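#### 0.5. MCC/CC Handling
> List model (US): MCC/CC is determined directly by whether a secondary diagnosis is on a list.<br>
> Weight model (Australia): MCC/CC is determined by the Patient Clinical Complexity Level (PCCL).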

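#### 0.6. Criteria for Subdividing DRG Groups (all must hold)
> (1) at least 100 cases in the group;<br>
> (2) within-group coefficient of variation CV < 1 (unless clinical experts judge that the cases form a group);<br>
> (3) within one ADRG, the relative difference in mean cost between the subdivided DRG groups is at least 20%.<br>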
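
The three criteria can be checked mechanically. The following is a minimal sketch, not the official procedure: the function name `can_subdivide` is hypothetical, the CV uses the population standard deviation, and criterion (3) is interpreted as comparing groups adjacent in mean fee, relative to the lower mean. All three choices are assumptions.

```python
from statistics import mean, pstdev

def can_subdivide(groups, min_cases=100, max_cv=1.0, min_rel_diff=0.20):
    """groups: dict mapping a candidate DRG name to its list of case fees."""
    means = {}
    for name, fees in groups.items():
        if len(fees) < min_cases:            # criterion (1): enough cases
            return False
        avg = mean(fees)
        if pstdev(fees) / avg >= max_cv:     # criterion (2): CV < 1
            return False
        means[name] = avg
    ordered = sorted(means.values())
    # criterion (3), as assumed here: neighbours by mean fee differ by >= 20%
    return all((hi - lo) / lo >= min_rel_diff
               for lo, hi in zip(ordered, ordered[1:]))
```

The expert-override clause in criterion (2) is not modeled; in practice such groups would need to be whitelisted by hand.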

### 1. Start Grouping

In [10]:
import pandas as pd
import numpy as np

#### 1.1. Load Data & Clean

In [38]:
case_data=pd.read_csv('./dummy_data/dummy_cases.csv')

In [39]:
# drop cases without diag_p
case_data=case_data.loc[case_data.diag_p.notna()]

In [40]:
# see 01_Gen_Dummy_Case.ipynb for the meaning of each field
case_data.head()

Unnamed: 0,case_id,gender,age,diag_p,diag_oth,oper_p,fee,mdci,adrgi,drgii
4,ectQx,0,78,D37.019,T82.501,,389.045262,MDCD,DR1,DR13
5,URvIm,0,39,R82.500x004,T91.205,,621.433309,MDCL,LW1,LW15
7,KbqpW,0,73,N71.001,I72.400x310,,822.699113,MDCN,NS1,NS19
9,ZSFiM,0,75,O00.804,D69.500,65.0102,630.64349,MDCO,OE1,OE19
11,LkpPL,0,40,R04.800x002,I80.804,,363.638841,MDCE,EV1,EV19


In [None]:
# In practice some cases are also trimmed as outliers by length of stay and fee; this is synthetic data, so that step is lazily skipped
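
The trimming step skipped above could look roughly like the sketch below. The notebook does not say which rule is used in practice, so the 1.5 × IQR rule and the helper name `trim_outliers` are assumptions chosen only for illustration.

```python
from statistics import quantiles

def trim_outliers(cases, key, k=1.5):
    """Keep cases whose `key` value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles([case[key] for case in cases], n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [case for case in cases if low <= case[key] <= high]

# toy cases: one fee is far outside the rest
cases = [{'case_id': i, 'fee': fee}
         for i, fee in enumerate([100, 102, 98, 101, 99, 103, 97, 5000])]
kept = trim_outliers(cases, 'fee')
```

The same function could be applied a second time with a length-of-stay field, if the data had one.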

#### 1.2. Assign MDC

In [27]:
# load ICD to MDC table
icd2mdc=pd.read_csv('./LOOKUP_TABLES/1_5_ICD_to_MDC.csv',encoding='gbk')
icd2mdc.drop(columns=['Unnamed: 0','ADRG_CODE','ADRG_NAME','AND_FLAG','GROUP_FLAG','is_icd_mdc'],inplace=True)

In [28]:
icd2mdc.head()

Unnamed: 0,MDC_CODE,MDC_NAME,ICD_CODE,ICD_DESC
0,MDCB,神经系统疾病及功能障碍,A01.002+G01*,伤寒性脑膜炎
1,MDCB,神经系统疾病及功能障碍,A02.203+G01*,沙门菌脑膜炎
2,MDCB,神经系统疾病及功能障碍,A06.600+G07*,阿米巴脑脓肿
3,MDCB,神经系统疾病及功能障碍,A17.000+G01*,结核性脑膜炎
4,MDCB,神经系统疾病及功能障碍,A17.000x001+G05.0*,结核性脊膜炎


In [34]:
icd2mdc_dict={icd:mdc for icd,mdc in zip(icd2mdc.ICD_CODE,icd2mdc.MDC_CODE)}

In [30]:
icd2mdc2adrg=pd.read_csv('./LOOKUP_TABLES/2_5_MDC_ICD9_to_ADRG.csv',encoding='utf-8')

In [31]:
icd2mdc2adrg.head()

Unnamed: 0,MDC_CODE,MDC_NAME,ADRG_CODE,ADRG_NAME,ICD_CODE,ICD_DESC,AND_FLAG,GROUP_FLAG,is_icd_mdc,is_icd9
0,MDCA,先期分组疾病及相关操作,AA1,心脏移植,37.51,心脏移植术,N,0,True,True
1,MDCA,先期分组疾病及相关操作,AB1,肝移植,50.51,辅助肝移植,N,0,True,True
2,MDCA,先期分组疾病及相关操作,AB1,肝移植,50.5100x001,同种异体原位肝移植术,N,0,True,True
3,MDCA,先期分组疾病及相关操作,AB1,肝移植,50.5900x001,肝肾联合移植术,N,0,True,True
4,MDCA,先期分组疾病及相关操作,AB1,肝移植,50.5900x004,同种异体肝肾联合移植术,N,0,True,True


In [79]:
# find which pre-MDCs are determined by procedure codes alone
icd9_pre_mdc_check=icd2mdc2adrg.groupby('MDC_CODE')['is_icd9'].prod()
icd9_pre_mdc_check=icd9_pre_mdc_check.to_frame()

In [82]:
icd9_pre_mdc_check.loc[icd9_pre_mdc_check.is_icd9==True]

Unnamed: 0_level_0,is_icd9
MDC_CODE,Unnamed: 1_level_1
MDCA,True
MDCZ,True


In [124]:
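# In practice MDCZ caused problems when computed as a pre-MDC, so it is left out for now; it was presumably included for medical reasons and is consistent from a clinical point of view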
icd9mdc=icd2mdc2adrg.loc[(icd2mdc2adrg.is_icd9==True) \
                 & (\
                    (icd2mdc2adrg.MDC_CODE=='MDCA') \
#                     |(icd2mdc2adrg.MDC_CODE=='MDCP')\
#                     |(icd2mdc2adrg.MDC_CODE=='MDCY')\
#                     |(icd2mdc2adrg.MDC_CODE=='MDCZ')\
                   )].copy()

In [126]:
icd9mdc_dict={icd:mdc for icd,mdc in zip(icd9mdc.ICD_CODE,icd9mdc.MDC_CODE)}

In [127]:
# process pre_mdcs needed
# 'MDCA','MDCP','MDCY','MDCZ'
# Pre-MDCs determined by diagnosis can be handled together with regular grouping; those determined by procedure or age need separate handling
# mdc_r(eal) adrg_r(eal),etc
case_data['mdc_r']=case_data['diag_p'].apply(lambda x:icd2mdc_dict.get(x,'NA'))

# Pre-MDCs determined by procedure code are handled separately; mdc_r2 is a temporary variable
case_data['mdc_r2']=case_data['oper_p'].apply(lambda x:icd9mdc_dict.get(x,'NA'))

case_data.loc[case_data.mdc_r2!='NA','mdc_r']=case_data.mdc_r2

# Nominal handling of the age-based pre-MDC; the dummy data contains no such cases
case_data.loc[case_data.age<1,'mdc_r']='MDCP'

case_data.drop(columns=['mdc_r2'],inplace=True)
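
The order of precedence in the cell above (diagnosis first, then procedure-based pre-MDC, then the age rule) may be easier to see for a single case. This is a sketch with a hypothetical function name `assign_mdc` and two-entry toy lookup tables; the codes are taken from the table previews shown earlier.

```python
def assign_mdc(diag_p, oper_p, age, diag_to_mdc, oper_to_pre_mdc):
    mdc = diag_to_mdc.get(diag_p, 'NA')   # 1. regular lookup by principal diagnosis
    if oper_p in oper_to_pre_mdc:         # 2. a pre-MDC procedure overrides it
        mdc = oper_to_pre_mdc[oper_p]
    if age < 1:                           # 3. the age rule overrides both
        mdc = 'MDCP'
    return mdc

# toy lookup tables (illustrative subsets only)
diag_to_mdc = {'A01.002+G01*': 'MDCB'}
oper_to_pre_mdc = {'37.51': 'MDCA'}
```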

In [136]:
case_data.head()

Unnamed: 0,case_id,gender,age,diag_p,diag_oth,oper_p,fee,mdci,adrgi,drgii,mdc_r
4,ectQx,0,78,D37.019,T82.501,,389.045262,MDCD,DR1,DR13,MDCD
5,URvIm,0,39,R82.500x004,T91.205,,621.433309,MDCL,LW1,LW15,MDCL
7,KbqpW,0,73,N71.001,I72.400x310,,822.699113,MDCN,NS1,NS19,MDCN
9,ZSFiM,0,75,O00.804,D69.500,65.0102,630.64349,MDCO,OE1,OE19,MDCO
11,LkpPL,0,40,R04.800x002,I80.804,,363.638841,MDCE,EV1,EV19,MDCE


#### 1.3. Assign ADRG

In [138]:
icd2mdc2adrg.head(3)

Unnamed: 0,MDC_CODE,MDC_NAME,ADRG_CODE,ADRG_NAME,ICD_CODE,ICD_DESC,AND_FLAG,GROUP_FLAG,is_icd_mdc,is_icd9
0,MDCA,先期分组疾病及相关操作,AA1,心脏移植,37.51,心脏移植术,N,0,True,True
1,MDCA,先期分组疾病及相关操作,AB1,肝移植,50.51,辅助肝移植,N,0,True,True
2,MDCA,先期分组疾病及相关操作,AB1,肝移植,50.5100x001,同种异体原位肝移植术,N,0,True,True


In [139]:
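# Likewise, look up the ADRG by MDC + ICD-10/ICD-9 code. At this point the MDC step looks somewhat redundant, but the MDC level is what lets pre-grouping absorb some special cases
# The mapping is not unique here and needs locally refined rules; for duplicate keys the last value seen currently wins
# For example, MDCO + O00.804 maps to two ADRGs, OE1 and OT1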
mdc_icd9_adrg_dict={(mdc,icd):adrg for mdc,icd,adrg in zip(icd2mdc2adrg.MDC_CODE,icd2mdc2adrg.ICD_CODE,icd2mdc2adrg.ADRG_CODE)}

In [140]:
mdc_icd9_adrg_dict[('MDCA','50.51')]

'AB1'
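
Because a dict comprehension silently keeps the last value for a repeated key, it can help to list the ambiguous `(MDC, ICD)` pairs before deciding on local rules. A minimal sketch, with a hypothetical helper `find_ambiguous_keys` and three toy rows (the MDCO example is the one mentioned in the comment above):

```python
from collections import defaultdict

def find_ambiguous_keys(rows):
    """rows: iterable of (mdc, icd, adrg) tuples."""
    targets = defaultdict(set)
    for mdc, icd, adrg in rows:
        targets[(mdc, icd)].add(adrg)
    # keep only keys that point at more than one ADRG
    return {key: sorted(adrgs) for key, adrgs in targets.items() if len(adrgs) > 1}

rows = [
    ('MDCA', '50.51', 'AB1'),
    ('MDCO', 'O00.804', 'OE1'),
    ('MDCO', 'O00.804', 'OT1'),
]
ambiguous = find_ambiguous_keys(rows)
# ambiguous == {('MDCO', 'O00.804'): ['OE1', 'OT1']}
```

With the real table, the rows could be produced by zipping `MDC_CODE`, `ICD_CODE`, and `ADRG_CODE` exactly as in the comprehension above.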

In [149]:
# first by diag_p
case_data['adrg_r']=case_data[['mdc_r','diag_p']].apply(lambda x:mdc_icd9_adrg_dict.get(tuple(x),'NA'),axis=1)

In [151]:
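# then by oper_p; adrg_r2 is a temporary variable
# look up (mdc, oper_p) in mdc_icd9_adrg_dict; icd9mdc_dict is keyed by ICD code alone, so a tuple key would never match
case_data['adrg_r2']=case_data[['mdc_r','oper_p']].apply(lambda x:mdc_icd9_adrg_dict.get(tuple(x),'NA'),axis=1)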

In [None]:
# then check consistency; inconsistent cases are lazily flagged as error cases for now, whereas real use needs locally adapted rules
case_data['error_case']=False

case_data.loc[(case_data.adrg_r2!='NA') & (case_data.adrg_r!=case_data.adrg_r2),'error_case']=True

case_data=case_data.loc[case_data.error_case==False].copy()

case_data.drop(columns=['adrg_r2'],inplace=True)

In [163]:
case_data.head()

Unnamed: 0,case_id,gender,age,diag_p,diag_oth,oper_p,fee,mdci,adrgi,drgii,mdc_r,adrg_r,error_case
4,ectQx,0,78,D37.019,T82.501,,389.045262,MDCD,DR1,DR13,MDCD,DR1,False
5,URvIm,0,39,R82.500x004,T91.205,,621.433309,MDCL,LW1,LW15,MDCL,LW1,False
7,KbqpW,0,73,N71.001,I72.400x310,,822.699113,MDCN,NS1,NS19,MDCN,NS1,False
9,ZSFiM,0,75,O00.804,D69.500,65.0102,630.64349,MDCO,OE1,OE19,MDCO,OT1,False
11,LkpPL,0,40,R04.800x002,I80.804,,363.638841,MDCE,EV1,EV19,MDCE,EV1,False


#### 1.4. Assign DRG

##### 1.4.1 apply MCC/CC Rules

In [164]:
# Load MCC/CC list and Exclude Rules
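# The US (list) model is used here; it leans on clinical experience, while the Australian model leans on large-scale data. To be tried when time allows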
mcc_lst=pd.read_csv('./LOOKUP_TABLES/5_MCC_LIST_GB2312.csv',encoding='gbk')
cc_lst=pd.read_csv('./LOOKUP_TABLES/6_CC_LIST_GB2312.csv',encoding='gbk')
mcc_cc_ex=pd.read_csv('./LOOKUP_TABLES/7_MCC_CC_EXCLUDE_GB2312.csv',encoding='gbk')

In [175]:
mcc_rule_dict={mcc:rule for mcc,rule in zip(mcc_lst.MCC_ICD_CODE,mcc_lst.EXCLUDE)}
cc_rule_dict={cc:rule for cc,rule in zip(cc_lst.CC_ICD_CODE,cc_lst.CC_EXCLUDE)}

In [172]:
mcc_cc_ex_dict=mcc_cc_ex.groupby('TABLE_NO')['ICD_CODE'].apply(list)
mcc_cc_ex_dict=mcc_cc_ex_dict.to_dict()

In [185]:
mcc_lst=mcc_lst.MCC_ICD_CODE.to_list()
cc_lst=cc_lst.CC_ICD_CODE.to_list()

In [181]:
mcc_cc_ex_dict[cc_rule_dict['A03.200x001']][:5]

['A00.000x001', 'A00.100x001', 'A00.900', 'A00.900x002', 'A00.900x003']

In [187]:
'A02.100' not in mcc_lst#.MCC_ICD_CODE.to_list()

False

In [189]:
def is_mcc(input_data):
    diag_p,diag_oth=input_data
    rule=mcc_rule_dict.get(diag_oth,'NA')
    if rule=='NA':
        return False
    elif diag_p in mcc_cc_ex_dict.get(rule,[]):
        return False
    else:
        return True

In [192]:
# Lazily written out twice; it also reads more clearly :-P
def is_cc(input_data):
    diag_p,diag_oth=input_data
    rule=cc_rule_dict.get(diag_oth,'NA')
    if rule=='NA':
        return False
    elif diag_p in mcc_cc_ex_dict.get(rule,[]):
        return False
    else:
        return True
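
Since `is_mcc` and `is_cc` differ only in the rule dictionary they consult, one way to avoid the duplication is a small factory. This is a sketch, not the notebook's code: `make_checker` is a hypothetical name, the table id `'T1'` is made up, and the toy dictionaries only mimic the shape of `cc_rule_dict` and `mcc_cc_ex_dict`.

```python
def make_checker(rule_dict, exclusion_dict):
    """Build a predicate (diag_p, diag_oth) -> bool for one MCC or CC list."""
    def check(diag_p, diag_oth):
        rule = rule_dict.get(diag_oth)
        if rule is None:                  # secondary diagnosis is not on the list
            return False
        # on the list, unless the principal diagnosis is in its exclusion table
        return diag_p not in exclusion_dict.get(rule, [])
    return check

# toy data: 'T1' is a hypothetical exclusion-table id
toy_cc_rules = {'A03.200x001': 'T1'}
toy_exclusions = {'T1': ['A00.000x001', 'A00.900']}
is_cc_toy = make_checker(toy_cc_rules, toy_exclusions)
```

The returned function takes two arguments, so with `DataFrame.apply(..., axis=1)` it would be called as `lambda x: is_cc_toy(*x)`.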

In [None]:
# try brief lambda script failed...
# diag_p in mcc_cc_ex_dict.get(mcc_rule_dict.get(diag_oth,'NA'),[])

In [171]:
case_data['has_mcc']=False
case_data['has_cc']=False

In [191]:
case_data['has_mcc']=case_data[['diag_p','diag_oth']].apply(is_mcc,axis=1)
case_data['has_cc']=case_data[['diag_p','diag_oth']].apply(is_cc,axis=1)

In [194]:
case_data.head()

Unnamed: 0,case_id,gender,age,diag_p,diag_oth,oper_p,fee,mdci,adrgi,drgii,mdc_r,adrg_r,error_case,has_mcc,has_cc
4,ectQx,0,78,D37.019,T82.501,,389.045262,MDCD,DR1,DR13,MDCD,DR1,False,True,False
5,URvIm,0,39,R82.500x004,T91.205,,621.433309,MDCL,LW1,LW15,MDCL,LW1,False,True,False
7,KbqpW,0,73,N71.001,I72.400x310,,822.699113,MDCN,NS1,NS19,MDCN,NS1,False,True,False
9,ZSFiM,0,75,O00.804,D69.500,65.0102,630.64349,MDCO,OE1,OE19,MDCO,OT1,False,True,False
11,LkpPL,0,40,R04.800x002,I80.804,,363.638841,MDCE,EV1,EV19,MDCE,EV1,False,True,False


In [215]:
cv_check=case_data.groupby('adrg_r')['fee'].agg(['mean','std'])

In [218]:
cv_check['cv']=cv_check['std']/cv_check['mean']

In [219]:
# The CV was set too small when the data was generated -_-!
cv_check.describe()

Unnamed: 0,mean,std,cv
count,190.0,188.0,188.0
mean,461.984154,8.02092,0.017463
std,261.654113,36.91492,0.062646
min,137.441712,0.740732,0.000869
25%,212.728414,2.033367,0.004407
50%,397.113394,3.071928,0.00713
75%,627.523682,4.007857,0.012638
max,994.324898,348.667253,0.556685
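
One detail worth keeping in mind when comparing CV values: pandas `std` defaults to the sample standard deviation (`ddof=1`), while NumPy's `ndarray.std`, used by `calcCv` in 1.4.2, defaults to the population form (`ddof=0`). The toy fees below are made up to show the size of the difference; the stdlib `stdev` and `pstdev` functions stand in for the two conventions.

```python
from statistics import mean, pstdev, stdev

fees = [380.0, 390.0, 400.0]               # toy fees for one ADRG
cv_sample = stdev(fees) / mean(fees)       # ddof=1, the pandas default
cv_population = pstdev(fees) / mean(fees)  # ddof=0, the NumPy default
print(round(cv_sample, 4), round(cv_population, 4))
```

With `ddof=1`, a group that contains a single case has an undefined standard deviation, which would be consistent with the `std` count (188) being lower than the `mean` count (190) in the summary above.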


##### 1.4.2. Train DRG grouper

In [None]:
# Start DRG Tree

In [448]:
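# "1": with major complications and comorbidities (MCC);
# "3": with complications and comorbidities (CC);
# "5": without complications or comorbidities;
# "7": death or transfer;
# "9": not differentiated;
# "0": group for patients younger than 17
# Data-driven grouping can only subdivide an ADRG; which of the suffixes 1, 3, 5 each subgroup gets still needs expert review,
# or a hand-written rule based on MCC/CC, age, etc., so the subgroups are not named 1/3/5 directly; 7/9/0 can be assigned by rules on the data
# Only 9 and 0 are nominally handled here; the rest use 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (26 symbols are generally enough)
# Only a binary tree is simulated for now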

class train_drg_tree:
    def __init__(self):
        self.namestr=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
        
    @staticmethod
    def calcCv(data_set):
        if len(data_set)==0:
            return 0
        avg=data_set[:,-1].mean()
        std=data_set[:,-1].std()
        cv=std/avg
        return cv
    
    @staticmethod
    def splitDataSet(data_set, axis, value):
        tmp_l=data_set[data_set[:,axis]<=value]
        tmp_r=data_set[data_set[:,axis]>value]
        ret_l=np.concatenate((tmp_l[:,:axis],tmp_l[:,axis+1:]),axis=1)
        ret_r=np.concatenate((tmp_r[:,:axis],tmp_r[:,axis+1:]),axis=1)
        return ret_l,ret_r

    @staticmethod
    def chooseBestFeatureToSplit(data_set):
        numFeatures = len(data_set[0]) - 1   
        # static methods must be qualified with the class name here
        baseCv = train_drg_tree.calcCv(data_set)
        # RIV here only borrows the concept; in the national specification RIV is a ratio
        bestRIV = 0.0; bestFeature = -1
        # also record the best split value for the i-th feature
        bestFeaValue={}
        for i in range(numFeatures):        
            featList = data_set[:,i]
            uniqueVals = list(set(featList))
            newCv = 99
            bestFeaValue[i]=uniqueVals[0]
            for value in uniqueVals:
                subl,subr = train_drg_tree.splitDataSet(data_set, i, value)
                tmpCv=(len(subl)*train_drg_tree.calcCv(subl)+len(subr)*train_drg_tree.calcCv(subr))/len(data_set)
                if tmpCv<newCv:
                    newCv=tmpCv
                    bestFeaValue[i]=value
    #         print(newCv)
            RIV = baseCv - newCv     
            if (RIV > bestRIV):       
                bestRIV = RIV         
                bestFeature = i
        return bestFeature,bestFeaValue.get(bestFeature,'NA')

    def createTree(self,data_set,labels,tgtCv=1):
        labels_= list(labels)

        # stop as soon as the CV target is met
        if self.calcCv(data_set) < tgtCv:
            return self.namestr.pop(0)

        # or when no feature column is left to split on
        if len(data_set[0]) == 1:
            return self.namestr.pop(0)

        bestFeat,bestFeaValue = self.chooseBestFeatureToSplit(data_set)

        # stop if the remaining features cannot separate the cases any further
        if bestFeaValue=='NA':
            return self.namestr.pop(0)

        bestFeatLabel = labels_[bestFeat]
        myTree = {bestFeatLabel:{}}
        labels_.remove(bestFeatLabel)

        subl,subr = self.splitDataSet(data_set, bestFeat, bestFeaValue)

#         print(labels_)
#         print('\n')
#         print(self.namestr)

        myTree[bestFeatLabel]['<='+str(bestFeaValue)] = self.createTree(subl,labels_,tgtCv)
        myTree[bestFeatLabel]['>'+str(bestFeaValue)] = self.createTree(subr,labels_,tgtCv)

        return myTree
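
The split criterion used by `chooseBestFeatureToSplit` is the case-weighted average CV of the two halves. The sketch below reproduces that criterion on four made-up cases with one feature (age) so the numbers can be checked by hand; the helper names `cv` and `weighted_cv_after_split` are hypothetical.

```python
from statistics import mean, pstdev

def cv(fees):
    # population CV, mirroring calcCv (an empty side counts as 0)
    return pstdev(fees) / mean(fees) if fees else 0.0

def weighted_cv_after_split(cases, threshold):
    """cases: list of (feature_value, fee); split on feature_value <= threshold."""
    left = [fee for value, fee in cases if value <= threshold]
    right = [fee for value, fee in cases if value > threshold]
    return (len(left) * cv(left) + len(right) * cv(right)) / len(cases)

cases = [(30, 100.0), (40, 102.0), (70, 200.0), (80, 204.0)]  # toy (age, fee) pairs
base_cv = cv([fee for _, fee in cases])
best = min({value for value, _ in cases},
           key=lambda t: weighted_cv_after_split(cases, t))
# best == 40: both halves then have a CV of about 0.01
```

Splitting at the maximum value leaves the right side empty, so the weighted CV equals the base CV and that candidate can never win.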

In [453]:
# Test
labels=['gender','age','has_mcc','fee']

test_data_set=case_data.loc[case_data.adrg_r=='XT3',labels].copy()
test_data_set=test_data_set.values
# the stdev used when generating the data was too small, so the target CV has to be lowered
myTree = train_drg_tree().createTree(test_data_set,labels,tgtCv=0.02)
myTree

{'age': {'<=60': 'A', '>60': {'gender': {'<=0': 'B', '>0': 'C'}}}}

In [460]:
# batch training to obtain the ADRG->DRG grouping rules
drg_rule_models=case_data.groupby('adrg_r')[['gender','age','has_mcc','fee']].apply(lambda x:train_drg_tree().createTree(x.values,labels,tgtCv=0.02))

In [468]:
drg_rule_models.tail()

adrg_r
XT2                                                    A
XT3    {'age': {'<=60': 'A', '>60': {'gender': {'<=0'...
YC1                   {'age': {'<=28': 'A', '>28': 'B'}}
YR1                                                    A
YR2                                                    A
dtype: object

In [467]:
# store model
import pickle
with open('./group_models/dummy_data_model.pickle','wb') as f:
    pickle.dump(drg_rule_models,f)

In [496]:
drg_rule_models=drg_rule_models.to_dict()

##### 1.4.3. Apply DRG grouper

In [495]:
import ast #for safety eval
def drg_rule_apply(rule_model, featLabels, testVec):
    if isinstance(rule_model, str):
        return rule_model

    fea1 = list(rule_model)[0]
    featIndex = featLabels.index(fea1)
    data_value= testVec[featIndex]
    
    thres=list(rule_model[fea1])[0].split('<=')[1]
    
    if testVec[featIndex]<=ast.literal_eval(thres):
        next_rule=rule_model[fea1]['<='+thres]
    else:
        next_rule=rule_model[fea1]['>'+thres]
    
    result=drg_rule_apply(next_rule, featLabels, testVec)
    
    return result

# Test
drg_rule_apply(myTree,['gender','age','has_mcc'],[1, 70, True])

'C'
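
For expert review it can be handy to flatten a nested rule dict into one line per leaf. A minimal sketch, assuming the same dict layout that `createTree` returns; `tree_to_rules` is a hypothetical helper, and the tree literal is the XT3 model shown above.

```python
def tree_to_rules(rule_model, conditions=()):
    """Return a list of (conditions, leaf_label) pairs for a nested rule dict."""
    if isinstance(rule_model, str):       # a leaf: just the group label
        return [(list(conditions), rule_model)]
    feature = next(iter(rule_model))      # each node has exactly one feature key
    rules = []
    for branch, subtree in rule_model[feature].items():
        rules.extend(tree_to_rules(subtree, conditions + (feature + branch,)))
    return rules

tree = {'age': {'<=60': 'A', '>60': {'gender': {'<=0': 'B', '>0': 'C'}}}}
rules = tree_to_rules(tree)
# [(['age<=60'], 'A'), (['age>60', 'gender<=0'], 'B'), (['age>60', 'gender>0'], 'C')]
```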

In [514]:
# apply the ADRG->DRG grouping rules in batch
case_data['drg_r']=case_data[['adrg_r','gender','age','has_mcc']].apply(\
                                                lambda x:drg_rule_apply(drg_rule_models[x.iloc[0]],\
                                                ['gender','age','has_mcc'],\
                                               x.iloc[1:].tolist()),axis=1)

In [520]:
case_data['drg_r']=case_data[['adrg_r','drg_r']].apply(lambda x:'_'.join(list(x)),axis=1)

In [525]:
case_data['drg_fee']=case_data.groupby('drg_r')['fee'].transform('mean')

In [534]:
# Extra experiments
# drg_fee can also be used for a first-pass ranking into 5/3/1 (mild/moderate/severe)

In [563]:
drg_rank=case_data.groupby(['adrg_r','drg_r'],as_index=False)['fee'].mean()
drg_rank['rank']=drg_rank.groupby('adrg_r')['fee'].rank()
drg_rank_dict={drg_r:rank for drg_r,rank in zip(drg_rank.drg_r,drg_rank['rank'])}

In [569]:
case_data['drg_rank']=case_data['drg_r'].apply(lambda x:drg_rank_dict[x])

In [572]:
case_data.tail()

Unnamed: 0,case_id,gender,age,diag_p,diag_oth,oper_p,fee,mdci,adrgi,drgii,mdc_r,adrg_r,error_case,has_mcc,has_cc,drg_r,drg_fee,drg_rank
9994,BzKT8,0,33,C60.900,E83.003+F02.8*,60.6900x002,300.6004,MDCM,MA1,MA19,MDCM,MR1,False,True,False,MR1_A,308.809161,3.0
9995,WjqQ8,0,56,D15.200x001,A41.502,,820.916885,MDCR,RT2,RT25,MDCR,RT2,False,True,False,RT2_A,819.663718,1.0
9997,DMMRh,0,36,N76.000x003,I60.200x005,,821.484993,MDCN,NS1,NS19,MDCN,NS1,False,True,False,NS1_A,824.328389,1.0
9998,dyB7T,0,27,B22.000x001,I60.200x002,,992.849339,MDCY,YR1,YR15,MDCY,YR1,False,True,False,YR1_A,994.324898,1.0
9999,eMDS4,1,52,O04.801,C15.200,,637.617437,MDCO,OS2,OS29,MDCO,OS2,False,True,False,OS2_A,635.084427,1.0


In [573]:
case_data.to_csv('./dummy_data/grouped_data.csv',index=False)
# Next steps (omitted): map rank 1/2/3 to 1/3/5, mark single-group ADRGs as 9 and age < 17 as 0, then rename the DRG groups
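
The renaming step left out above might be sketched as follows. This is only one possible reading: it assumes the most expensive subgroup is the most severe (suffix 1), it refuses to guess when an ADRG has more than three subgroups, and it leaves the age < 17 rule (suffix 0) out. `suffix_for_groups` is a hypothetical name, and as noted in 1.4.2 the final assignment would still need expert review.

```python
def suffix_for_groups(mean_fee_by_group):
    """mean_fee_by_group: {group label: mean fee} within one ADRG."""
    if len(mean_fee_by_group) == 1:       # undivided ADRG gets suffix 9
        (only,) = mean_fee_by_group
        return {only: '9'}
    if len(mean_fee_by_group) > 3:
        raise ValueError('more than three groups need an expert-defined mapping')
    # assumption: higher mean fee means higher severity, so it gets the lower digit
    by_cost = sorted(mean_fee_by_group, key=mean_fee_by_group.get, reverse=True)
    return dict(zip(by_cost, ['1', '3', '5']))
```

For example, `suffix_for_groups({'A': 100.0, 'B': 300.0, 'C': 200.0})` maps B to 1, C to 3, and A to 5.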

#### 1.5. Group Results QC/Evaluate

In [None]:
# 0.6. Criteria for subdividing DRG groups (all must hold):
# omitted

In [None]:
# Grouping performance evaluation (RIV, etc.): omitted
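
As a rough illustration of the omitted evaluation, RIV (reduction in variance) is commonly computed as the share of total fee variation explained by the grouping: (total sum of squares minus within-group sum of squares) divided by the total. The sketch below assumes that definition; the function name `riv` and the toy inputs are hypothetical.

```python
from statistics import mean

def riv(fees_by_group):
    """fees_by_group: {DRG name: list of case fees}."""
    all_fees = [fee for fees in fees_by_group.values() for fee in fees]
    grand_mean = mean(all_fees)
    ss_total = sum((fee - grand_mean) ** 2 for fee in all_fees)
    ss_within = 0.0
    for fees in fees_by_group.values():
        group_mean = mean(fees)
        ss_within += sum((fee - group_mean) ** 2 for fee in fees)
    return (ss_total - ss_within) / ss_total

perfect = riv({'A': [100.0, 100.0], 'B': [200.0, 200.0]})  # groups explain everything
useless = riv({'A': [100.0, 200.0], 'B': [100.0, 200.0]})  # groups explain nothing
```

The function divides by zero when every fee is identical, so a real implementation would need to guard that case.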

### 2. Payment Method Design(RW Rate&Rules)

In [575]:
avg_case_fee=case_data.fee.mean()
case_data['RW']=case_data['drg_fee']/avg_case_fee
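
The RW computed above is the DRG's mean fee divided by the mean fee over all cases. A small worked sketch with made-up fees follows; the payment line assumes a simple "RW × base rate" rule, and `base_rate` is a hypothetical figure, not something defined in this notebook.

```python
def relative_weights(fees_by_drg):
    """fees_by_drg: {DRG name: list of case fees}; returns {DRG name: RW}."""
    all_fees = [fee for fees in fees_by_drg.values() for fee in fees]
    overall_mean = sum(all_fees) / len(all_fees)
    return {drg: (sum(fees) / len(fees)) / overall_mean
            for drg, fees in fees_by_drg.items()}

fees_by_drg = {'DR1_A': [380.0, 400.0], 'NS1_A': [800.0, 820.0]}  # toy fees
rw = relative_weights(fees_by_drg)        # about {'DR1_A': 0.65, 'NS1_A': 1.35}
base_rate = 500.0                         # hypothetical payment per unit of RW
payment = {drg: weight * base_rate for drg, weight in rw.items()}
```

By construction the case-weighted average RW is 1, which gives a quick sanity check on real data.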

In [576]:
case_data.columns

Index(['case_id', 'gender', 'age', 'diag_p', 'diag_oth', 'oper_p', 'fee',
       'mdci', 'adrgi', 'drgii', 'mdc_r', 'adrg_r', 'error_case', 'has_mcc',
       'has_cc', 'drg_r', 'drg_fee', 'drg_rank', 'RW'],
      dtype='object')

In [578]:
case_data[['case_id', 'gender', 'age', 'diag_p',\
           'diag_oth', 'oper_p', 'fee',\
         'drg_r', 'drg_fee', 'drg_rank', 'RW']].head()
# Other performance-related uses are omitted

Unnamed: 0,case_id,gender,age,diag_p,diag_oth,oper_p,fee,drg_r,drg_fee,drg_rank,RW
4,ectQx,0,78,D37.019,T82.501,,389.045262,DR1_A,390.064114,1.0,0.798311
5,URvIm,0,39,R82.500x004,T91.205,,621.433309,LW1_A,618.981597,1.0,1.266817
7,KbqpW,0,73,N71.001,I72.400x310,,822.699113,NS1_A,824.328389,1.0,1.687083
9,ZSFiM,0,75,O00.804,D69.500,65.0102,630.64349,OT1_A,628.042127,1.0,1.285361
11,LkpPL,0,40,R04.800x002,I80.804,,363.638841,EV1_A,363.729547,1.0,0.744414
