## Given Information and Assumptions

+ The training set (50K rows) contains the true icd10 codes for each unique case   
+ The test set (10K rows) contains a list of icd10 codes for each unique case; this list is **either missing codes or includes codes that are not related to the case**  
_(Statements in bold are not clearly defined in the task text and are treated as assumptions)_

## Target - Output    

+ Recommend **5** icd10 codes for each case in the test set 
+ Save the recommended icd10 codes to 'test_df_with_recommendations.xlsx' 


## Steps  

+ For each case in the test set, find the m most similar cases in the training set using TF-IDF vectors and cosine similarity 
+ Compare the icd10 codes given for the test case with the most common icd10 codes of the m most similar training cases, using a custom function
+ Recommend n icd10 codes (5 in this task) from the codes of the m similar training cases, as returned by the custom function

+ Note: _m is a parameter that sets how many similar cases from the training set are considered_
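
The first step can be sketched on toy data as follows. This is only an illustration under assumptions: the case strings, the variable names and `m = 2` are made up, and the notebook itself fits the vectoriser on the merged train and test rows rather than on lists like these.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: each string holds one case's comma-separated icd10 codes
train_cases = ["I1000,E1190,Z741", "P590,Z380,P922", "I1000,E1190,N390"]
test_case = ["I1000,E1190"]

# Fit on all cases so that train and test share one vocabulary
vectorizer = TfidfVectorizer()
vectorizer.fit(train_cases + test_case)
train_vectors = vectorizer.transform(train_cases)
test_vector = vectorizer.transform(test_case)

# One similarity score per training case
scores = cosine_similarity(test_vector, train_vectors).flatten()

# Indices of the m most similar training cases
m = 2
top_m = scores.argsort()[::-1][:m]
print(sorted(top_m.tolist()))
```

Training cases 0 and 2 share codes with the test case and are selected; case 1 shares none and scores zero.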

## Additional Outputs

+ Definitions of the recommended icd10 codes, to support further evaluation of the results 
+ Similarity between the recommended icd10 codes, based on a custom-trained Word2Vec word embeddings model 
+ Average similarity scores and lists of similarity indexes, to give deeper insight into the recommended codes


In [1]:
# import libraries

import numpy as np
import pandas as pd

pd.set_option('display.width', 100000)
pd.set_option('max_colwidth', 8000)
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)

from sklearn.feature_extraction.text import TfidfVectorizer 
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics.pairwise import linear_kernel

## Train Set

In [2]:
# import train set as df
train_df = pd.read_csv('train.csv')
train_df.head()

Unnamed: 0,case_id,icd
0,b7d38462-fe4b-41fe-96d6-f7988a4dc6fd,"G4731,R065"
1,9f2a1477-8c67-4eee-a69f-8f1800b0c5c5,"Z11,A150,J157,U820,Y579,Z290,Z9581,D684,E559,R11,F321,U6912,K716,E46,R636"
2,ec2c6f40-9176-4865-a40f-9972a31d79b6,"Q211,P616,R845,T813,J981,Q250,J90,P248,Z238,J120,P030,B965,P391,Z380,D648,P928,I460,P743,P284,K911,L22,Q391,B956,Z223,P788,R633,Z466,P221,P362,B962,Z258,Q390,P293,R650,L988,P243,Z431,U8021,P545,Z278,P241,P948,Z480,Q393,Z290,A081,P398,Q321,T801"
3,d693f494-4c75-4047-ae23-dfdb7ccb88fe,
4,ce23794b-d720-4039-8de5-78239235c3cf,"R32,I792,J90,R600,K528,I482,R91,D62,N390,L0311,J9609,I7025,T814,E1175,B956,I1000,G632,F03,B9542,B962,Z433,U8001,M8957,S0095,U5040,H408,K590,B952,N183,E876,K260,E1175,U8000,Z922,K250,G402"


In [3]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   case_id  50000 non-null  object
 1   icd      49233 non-null  object
dtypes: object(2)
memory usage: 781.4+ KB


## Test Set

In [5]:
test_df = pd.read_csv('test.csv')
test_df.head()

Unnamed: 0,case_id,icd
0,f44c3b96-13b1-423f-ad29-ccae8fcc737c,"K3188,R651,B368,K296,J90,K8581,J9601,E870,E880,K631,R638,Z9588,I1000,A4158,I728,B962,B371,K567,J988,R000,G7280,F328,G6280,K660,D688,K550,E876,K598,K914,R650,B956,R11,R263,Z433,U8000,E8338,N1793,L304,U8130,J81,R58,D62,T814,B952,R571,R600,B965,B3788,Z431,T810,A4151,U6912,E890,B966,K632,R18,Z904,U5050,L8906,A411,K518,Z223,Z480,Z290,Z430,K650,T818,Z741,E872,T813,D6961,K290,K560,N390"
1,0f90cb28-0936-4df9-be8a-c8da40e5f0be,"P290,K602,Z380,P708,K631,U8104,Q211,L22,P780,P833,T801,B962,Q250,P920,L989,B371,P0701,P709,P616,P363,K660,P590,H351,P391,P072,P781,P838,K914,K566,R650,P719,K4020,Z763,R628,Z291,P743,U8108,P293,K710,T814,B952,P922,U8021,T810,P610,Z258,P742,Z876,P962,Z480,Z238,P928,Z290,P284,P021,Z932,E872,P220,J00,P612,Z278,B958,P251"
2,fbee83e4-4d7d-403f-a1c0-e18493871647,"Z53,E784,E6602,I1190,F328,E1190"
3,eea2ee05-542e-4aca-b342-6f5fe4d63242,"R15,E1120,T885,T8404,E870,E880,S7201,N189,M8605,E038,S7200,I1000,E875,B962,M8665,M6226,D688,E876,K590,R650,R11,N083,R263,M966,Z988,Z11,E784,U8121,D62,Z740,M8625,B3788,I959,I8028,Z911,B9590,U6912,B966,T8902,B9591,L8914,U5050,M6005,Z466,R410,Z223,K564,Z290,Z741,B958,M8645,U5110,T813,T845,A418,M6006,E6681"
4,f98b2e70-d494-4231-98f9-277c24892b5f,"K3188,R651,N185,E880,J90,J9601,K720,E038,B962,I1000,J22,A4158,G934,M318,E875,G7280,E1151,G6280,E8358,T846,A4152,B956,M6002,K626,I792,A410,M702,L0311,R572,N1793,R131,L0310,L8924,L8911,D62,K250,B952,R571,B965,M8986,T810,L8921,M8957,R139,M7263,T8902,F058,E871,I7025,A411,U6912,Z223,T7968,D684,M8786,E872,U8150,T813,K290,Z470,T874"


In [6]:
test_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   case_id  10000 non-null  object
 1   icd      10000 non-null  object
dtypes: object(2)
memory usage: 156.4+ KB


## Merge Train and Test Sets 


In [8]:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
merged_df = pd.concat([train_df, test_df])


In [9]:
merged_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 60000 entries, 0 to 9999
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   case_id  60000 non-null  object
 1   icd      59233 non-null  object
dtypes: object(2)
memory usage: 1.4+ MB


In [10]:
# drop cases without icd10 codes and renumber the rows from 0
merged_df = merged_df.dropna().reset_index(drop=True)

In [11]:
merged_df.tail()

Unnamed: 0,case_id,icd
59228,2e4efd48-704f-46a4-bcc3-6f244e4228fd,"I1000,U5000,R060"
59229,0e05215a-ec60-4f4e-bae8-c99ad70a19a7,N23
59230,fbd19bdd-6509-4624-beb8-cb8e6f0721bf,"Z466,O641,Z882,O096,Z370,Z352"
59231,8cf55642-f8ee-4cde-a66f-70fc80a49543,"S060,S066,Z741,Z235,S0005,S0670,S0180,S6261"
59232,2920775f-8f39-4add-893c-83971a4e6599,"R522,U5030,I1001,D6835,J358,Z950,I480,Z9664,T830,Y579,U6912,E1190"


In [12]:
# icd10_codes: the comma-separated string, used as the "document" for TF-IDF
merged_df['icd10_codes'] = merged_df['icd']
# icd10_list: the same codes as a Python list
merged_df['icd10_list'] = merged_df['icd'].str.split(',')
merged_df = merged_df.drop('icd',axis=1)

In [13]:
merged_df.head()

Unnamed: 0,case_id,icd10_codes,icd10_list
0,b7d38462-fe4b-41fe-96d6-f7988a4dc6fd,"G4731,R065","[G4731, R065]"
1,9f2a1477-8c67-4eee-a69f-8f1800b0c5c5,"Z11,A150,J157,U820,Y579,Z290,Z9581,D684,E559,R11,F321,U6912,K716,E46,R636","[Z11, A150, J157, U820, Y579, Z290, Z9581, D684, E559, R11, F321, U6912, K716, E46, R636]"
2,ec2c6f40-9176-4865-a40f-9972a31d79b6,"Q211,P616,R845,T813,J981,Q250,J90,P248,Z238,J120,P030,B965,P391,Z380,D648,P928,I460,P743,P284,K911,L22,Q391,B956,Z223,P788,R633,Z466,P221,P362,B962,Z258,Q390,P293,R650,L988,P243,Z431,U8021,P545,Z278,P241,P948,Z480,Q393,Z290,A081,P398,Q321,T801","[Q211, P616, R845, T813, J981, Q250, J90, P248, Z238, J120, P030, B965, P391, Z380, D648, P928, I460, P743, P284, K911, L22, Q391, B956, Z223, P788, R633, Z466, P221, P362, B962, Z258, Q390, P293, R650, L988, P243, Z431, U8021, P545, Z278, P241, P948, Z480, Q393, Z290, A081, P398, Q321, T801]"
3,ce23794b-d720-4039-8de5-78239235c3cf,"R32,I792,J90,R600,K528,I482,R91,D62,N390,L0311,J9609,I7025,T814,E1175,B956,I1000,G632,F03,B9542,B962,Z433,U8001,M8957,S0095,U5040,H408,K590,B952,N183,E876,K260,E1175,U8000,Z922,K250,G402","[R32, I792, J90, R600, K528, I482, R91, D62, N390, L0311, J9609, I7025, T814, E1175, B956, I1000, G632, F03, B9542, B962, Z433, U8001, M8957, S0095, U5040, H408, K590, B952, N183, E876, K260, E1175, U8000, Z922, K250, G402]"
4,8921f432-47c8-4c1a-b73e-4f64a5def29f,"I792,L278,I7025,B965,H041,I2513,Z741,D688,J069,Z894,B956,U6912,T8903,G632,Z9588,B962,E784,Z867,Z950,D508,K590,Z480,I1001,E790,B9591,T876,E1175,Z740,N182,B958,T818","[I792, L278, I7025, B965, H041, I2513, Z741, D688, J069, Z894, B956, U6912, T8903, G632, Z9588, B962, E784, Z867, Z950, D508, K590, Z480, I1001, E790, B9591, T876, E1175, Z740, N182, B958, T818]"


In [14]:
merged_df.shape[0]

59233

In [15]:
merged_df['count'] = merged_df['icd10_list'].apply(len)
merged_df.describe()

Unnamed: 0,count
count,59233.0
mean,8.630578
std,6.788996
min,1.0
25%,4.0
50%,7.0
75%,12.0
max,89.0


In [16]:
merged_df[merged_df['count']<3].sample(3)

Unnamed: 0,case_id,icd10_codes,icd10_list,count
54565,ba8a6fc4-5e7f-4070-8c8e-032e44a3ea46,N132,[N132],1
24266,a645b2bd-6320-4952-95e9-4d9246a1c198,R065,[R065],1
11146,6d5e42d1-5e09-471f-919d-1ac29b91d010,"J069,S060","[J069, S060]",2


In [17]:
# flatten all code lists into one corpus of individual icd10 codes
corpus_merged = [code for codes in merged_df['icd10_list'] for code in codes]

len(corpus_merged)

511215

In [18]:
from collections import Counter

c1 = Counter(corpus_merged)
    
print( 'count_codes:',c1.most_common(20),'\n')

count_codes: [('I1000', 17595), ('U5000', 10185), ('Z741', 9350), ('Z740', 9338), ('E1190', 7121), ('E784', 6975), ('Z921', 6475), ('Z922', 5906), ('Z480', 5801), ('U5010', 5328), ('B962', 4496), ('E038', 4378), ('I480', 4227), ('N390', 4120), ('E86', 3735), ('Z290', 3578), ('Z904', 3562), ('I1190', 3348), ('Z763', 3320), ('N183', 3310)] 



In [19]:
print(merged_df.columns)

Index(['case_id', 'icd10_codes', 'icd10_list', 'count'], dtype='object')


## Get TF-IDF Vectors with scikit-learn

In [20]:
# Fit TF-IDF on the icd10 code strings; each row (one case) is treated as a document
tfidfvectoriser = TfidfVectorizer()
tfidfvectoriser.fit(merged_df['icd10_codes'])
tfidf_vectors = tfidfvectoriser.transform(merged_df['icd10_codes']) 
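
As a rough illustration of what the vectoriser computes, the sketch below rebuilds the weights by hand for three hypothetical cases. It assumes scikit-learn's default settings (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2 normalisation); the helper names `tfidf_vector` and `cosine` are made up for this example.

```python
import math
from collections import Counter

# Hypothetical toy cases; codes are lower-cased as the default tokenizer would do
cases = [["i1000", "e1190", "z741"], ["p590", "z380"], ["i1000", "e1190"]]
n_cases = len(cases)

# Document frequency: in how many cases each code occurs
doc_freq = Counter(code for case in cases for code in set(case))

def tfidf_vector(case):
    """Return {code: weight} using smoothed IDF and L2 normalisation."""
    counts = Counter(case)
    raw = {c: tf * (math.log((1 + n_cases) / (1 + doc_freq[c])) + 1)
           for c, tf in counts.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {c: w / norm for c, w in raw.items()}

def cosine(u, v):
    """Dot product of two L2-normalised sparse vectors."""
    return sum(w * v.get(c, 0.0) for c, w in u.items())

vectors = [tfidf_vector(case) for case in cases]
print(round(cosine(vectors[0], vectors[2]), 3))  # two shared codes: high score
print(cosine(vectors[0], vectors[1]))            # no shared codes: zero
```

Because every vector has unit length, the plain dot product already equals the cosine similarity, which is why `linear_kernel` can stand in for `cosine_similarity` on TF-IDF rows.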

In [26]:
# custom function to get icd10 code recommendations for one case
from collections import Counter


def most_similar(idx, number_of_similar_ids, recommended_icd10_num=5):
    """Return up to `recommended_icd10_num` (code, count) tuples for the case at row `idx`.

    idx                   : index of the target case in merged_df
    number_of_similar_ids : number (m) of most similar cases to consult
    """
    target_codes = merged_df['icd10_list'].iloc[idx]

    # TF-IDF rows are L2-normalised, so the linear kernel equals the cosine similarity
    cosine_similarities = linear_kernel(tfidf_vectors[idx], tfidf_vectors).flatten()

    # rank all other cases by similarity and keep the m most similar ones
    df_similarity = pd.DataFrame({'score': cosine_similarities}).drop([idx])
    most_similar_df = df_similarity.sort_values(by='score', ascending=False).head(number_of_similar_ids)
    similar_indexes = most_similar_df.index.values

    # icd10 code lists of the m most similar cases
    icd_10_codes = merged_df['icd10_list'][merged_df.index.isin(similar_indexes)].values

    # count how often each code occurs among the similar cases, most frequent first
    count_common_icd10 = Counter(item for items in icd_10_codes for item in items).most_common()

    # first keep the frequent codes that the target case already lists ...
    recommendation_list = [code for code in count_common_icd10 if code[0] in target_codes]

    # ... then fill up with the most frequent remaining codes if there are fewer than requested
    if len(recommendation_list) < recommended_icd10_num:
        missing = recommended_icd10_num - len(recommendation_list)
        recommendation_list.extend([x for x in count_common_icd10 if x not in recommendation_list][:missing])

    return recommendation_list[:recommended_icd10_num]
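
The selection rule inside `most_similar` can be looked at in isolation with a toy example. The sketch below is a simplified stand-in, not the notebook's function: `pick_codes` and the sample codes are hypothetical, and the similarity search is replaced by a hand-written list of neighbour cases.

```python
from collections import Counter

def pick_codes(target_codes, neighbour_code_lists, n=5):
    """Rank neighbour codes by frequency, preferring codes the target already has."""
    counts = Counter(code for codes in neighbour_code_lists for code in codes)
    ranked = [code for code, _ in counts.most_common()]
    confirmed = [code for code in ranked if code in target_codes]
    fill = [code for code in ranked if code not in target_codes]
    return (confirmed + fill)[:n]

# Hypothetical target case and its three most similar cases
target = ["I1000", "Z53"]
neighbours = [["I1000", "E1190", "Z741"], ["E1190", "Z741", "N390"], ["I1000", "E1190"]]

print(pick_codes(target, neighbours, n=3))  # ['I1000', 'E1190', 'Z741']
```

`I1000` comes first because the target already lists it, even though `E1190` is more frequent among the neighbours. `Z53` never appears among the neighbours, so it is not recommended; under the stated assumptions it is treated as not related to the case.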

In [27]:
# Try the custom function on a single case

# to look up a case by its id instead of using a fixed index:
#unq_case_id = 'b7d38462-fe4b-41fe-96d6-f7988a4dc6fd'
#given_index = merged_df[merged_df.case_id==unq_case_id].index.astype(int)[0]

given_index = 56000

recommendation_list = [x[0] for x in most_similar(given_index,20,5)]

print(recommendation_list)

['C711', 'K5909', 'Z740', 'Z741', 'Z480']


In [None]:
# Get icd10 code recommendations for the test set and write them to an Excel file
result_recommendation_list = []
number_of_similar_ids = 50
recommended_icd10_num = 5

for unq_case_id in test_df.case_id.values:
    try:
        given_index = merged_df[merged_df.case_id==unq_case_id].index.astype(int)[0]
        recommendation_list = [x[0] for x in most_similar(given_index,number_of_similar_ids,recommended_icd10_num)]
        result_recommendation_list.append(recommendation_list)
    except Exception as err:
        # still append one entry so the column stays aligned with test_df
        print('ERROR', unq_case_id, err)
        result_recommendation_list.append('No Recommendation Possible')

test_df['icd_10_recommendations'] = result_recommendation_list
test_df.to_excel('test_df_with_recommendations.xlsx')
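
The loop above ranks each test case against every row of `merged_df`, which also contains the other test cases. If the neighbours should come from the training set only, as the Steps section describes, the scores could be masked before ranking. The sketch below shows that idea on made-up numbers; `top_m_training_rows`, `is_train` and the score values are assumptions for illustration and are not part of the notebook.

```python
import numpy as np

def top_m_training_rows(scores, is_train, target_idx, m):
    """Return indices of the m highest-scoring training rows, excluding the target."""
    masked = np.where(is_train, scores, -np.inf)  # rule out non-training rows
    masked[target_idx] = -np.inf                  # never return the target case itself
    order = np.argsort(-masked, kind="stable")    # highest score first
    return [int(i) for i in order[:m] if np.isfinite(masked[i])]

# Hypothetical similarity scores of one test case (row 4) against five rows
scores = np.array([0.2, 0.9, 0.5, 0.95, 1.0])
is_train = np.array([True, True, True, False, False])  # rows 3 and 4 are test rows

print(top_m_training_rows(scores, is_train, target_idx=4, m=2))  # [1, 2]
```

Row 3 has the highest score after the target itself, but it is a test row and is skipped. In this notebook the training rows come first in `merged_df`, so a mask of this kind could be derived from the row position.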

In [65]:
df_test = pd.read_excel('test_df_with_recommendations_50.xlsx')
df_test.sample(5)

Unnamed: 0.1,Unnamed: 0,case_id,icd,icd_10_recommendations
2169,2169,4f462d5a-40fb-4899-93fb-eae519123543,"I8720,I258,C508,R296,M1901,K868,G632,E8358,C773,Z850,D62,Z740,D051,I1100,E1140,U5020,M8180,I5011,Z903,H408,Z741,L82,K5730","['C508', 'C773', 'Z740', 'G632', 'E1140']"
8439,8439,925318d9-38aa-4a22-bd9c-c96894db5c2a,"E871,Z11,E86,Z290,D472,I1100,U5010,J4409,I480","['Z11', 'Z290', 'E86', 'E871', 'J4409']"
9210,9210,f2412d7f-67ed-473e-876b-9c4a5dc1cfba,"E871,Z11,R32,Z921,J9600,J9601,I1000,U5020,E058","['J9600', 'I1000', 'J9601', 'E871', 'Z11']"
6383,6383,b82dc44c-7d35-4066-802c-f40cf075e219,"C780,U5000,J9609,Z923,Z480,E6600,I1000,C492,E1190","['U5000', 'I1000', 'E6600', 'E1190', 'Z480']"
1512,1512,9e8ac41b-d3c8-4c80-9e2e-dcf7102c9304,"I7025,S065,Z922,E86,D62,G402,I958,Z867,B965,J987,I1000,B962,I480,N390,Z953,B958,R139","['I1000', 'B962', 'N390', 'Z922', 'B965']"


## Find most similar icd10 codes  (train word embeddings model)
## Get icd10 code definitions 
## Visualize results on a df

In [32]:
## Get icd10 definitions from the official source

df = pd.read_csv('icd10gm_def_1.txt', delimiter=';', header=None)
df.columns= ['code','definiton']
df['code'] = df.code.astype(str)
df.head(2)

Unnamed: 0,code,definiton
0,UNDEF,Undefined
1,A00,Cholera
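
The definitions use dotted codes (for example `P08.1` in the final table of this notebook), while the case data stores codes without the dot (`P081`). A minimal sketch of converting between the two forms, assuming the dot always follows the third character; `add_dot` and `strip_dot` are hypothetical helper names, and unlike the notebook's own conversion this version leaves three-character codes unchanged.

```python
def add_dot(code: str) -> str:
    """Insert a dot after the three-character category, e.g. 'P081' -> 'P08.1'."""
    return code if len(code) <= 3 else f"{code[:3]}.{code[3:]}"

def strip_dot(code: str) -> str:
    """Remove the dot again, e.g. 'P08.1' -> 'P081'."""
    return code.replace(".", "")

for raw in ["P081", "Z380", "A00", "I1000"]:
    dotted = add_dot(raw)
    print(raw, "->", dotted)
    assert strip_dot(dotted) == raw  # the conversion round-trips
```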


In [35]:
# Get recommendations for a single case
given_index = 55900
number_of_similar_ids = 50
recommended_icd10_num = 5

recommendation_list = [x[0] for x in most_similar(given_index, number_of_similar_ids,recommended_icd10_num)]

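# convert given icd10 codes to the dotted format used in icd10gm_2021 (e.g. 'P081' -> 'P08.1')
def format_code(raw_code):
    converted_code = str(raw_code)[0:3]+'.'+str(raw_code)[3:]
    # three-character codes end in a bare dot; pad them to '.0'
    if converted_code[-1] == '.':
        converted_code = converted_code+'0'
    return converted_code

code_list = [format_code(x) for x in recommendation_list]
# also look up the three-character form of codes ending in '.0'
fixed_codes = [x[0:3] for x in code_list if x[-2:]=='.0']
code_list.extend(fixed_codes)
code_list_2 = list(set(code_list))

recom_df = df[df.code.isin(code_list_2)]


# keep a matched row only if the next row of df has the same first letter but a different
# numeric part; this drops, for example, a parent 'Z38' that is directly followed by 'Z38.0'
code_list_3 = []
for idx in recom_df.index:
    current_code = df.loc[idx, 'code']
    next_code = df.loc[idx+1, 'code']
    # float() replaces eval(): same comparison without executing file content as code
    if (current_code[0] == next_code[0]) and (float(current_code[1:]) != float(next_code[1:])):
        code_list_3.append(current_code)

# copy so that new columns can be added without a SettingWithCopyWarning
recom_df = recom_df[recom_df.code.isin(code_list_3)].copy()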

# Train word embeddings model to get code similarities
from gensim.models import Word2Vec

# generate word corpus
corpus =merged_df.icd10_list.tolist() 

# train model (gensim >= 4.0 uses vector_size; on gensim 3.x the argument is size=100)
model_1 = Word2Vec(corpus, vector_size=100, window=4, min_count=2, workers=4)

#save model
model_1.save("model_test.model")

#print("Total number of unique words loaded in Model : ", len(model_1.wv.key_to_index))

# load saved model
#model_1 = Word2Vec.load("model_test.model")

# find similarities between recommended icd10 codes via word embeddings similarity method
icd_similarities_dict = dict()
for item in recommendation_list:
    icd_similarities_dict[item] = []
    df_ = pd.DataFrame(model_1.wv.most_similar(item, topn=5500), columns=['icd10','sim_score'])
    sub_list = recommendation_list[:]
    sub_list.remove(item)
    for code in sub_list:
        icd_similarities_dict[item].append(df_[df_.icd10==code].reset_index().set_index(['icd10']).T.to_dict())
        
sim_list = []
for code in recom_df.code.values:
    for k,v in icd_similarities_dict.items():
        if (str(code).replace('.','')) == k :
            sim_list.append(v)
            break

recom_df['icd10_similarities'] = sim_list

# generate 2 new columns to average values from similarity evaluation
list_similarity_index = []
list_average_score = []

for data in recom_df['icd10_similarities'].values:
    average_indexes = []
    average_score = 0
    #print('len data:',len(data))
    for item in data:
        average_indexes.append(int((list(item.values())[0])['index']))
        average_score += ((list(item.values())[0])['sim_score'])
    
    list_average_score.append(average_score/len(data))
    list_similarity_index.append(average_indexes)
        
recom_df['list_similarity_indexes'] = list_similarity_index
recom_df['average_similarity_score'] = list_average_score

recom_df


Unnamed: 0,code,definiton,icd10_similarities,list_similarity_indexes,average_similarity_score
11379,P08.1,Sonstige für das Gestationsalter zu schwere Neugeborene,"[{'Z380': {'index': 23.0, 'sim_score': 0.9802700281143188}}, {'P922': {'index': 16.0, 'sim_score': 0.9835652709007263}}, {'Z763': {'index': 2441.0, 'sim_score': 0.6926903128623962}}, {'P590': {'index': 15.0, 'sim_score': 0.9843103289604187}}]","[23, 16, 2441, 15]",0.910209
11587,P59.0,Neugeborenenikterus in Verbindung mit vorzeitiger Geburt,"[{'Z380': {'index': 12.0, 'sim_score': 0.9904824495315552}}, {'P081': {'index': 20.0, 'sim_score': 0.9843103289604187}}, {'P922': {'index': 1.0, 'sim_score': 0.9963864684104919}}, {'Z763': {'index': 1504.0, 'sim_score': 0.7498672604560852}}]","[12, 20, 1, 1504]",0.930262
11685,P92.2,Trinkunlust beim Neugeborenen,"[{'Z380': {'index': 15.0, 'sim_score': 0.9861516952514648}}, {'P081': {'index': 17.0, 'sim_score': 0.9835652112960815}}, {'Z763': {'index': 1103.0, 'sim_score': 0.7560762166976929}}, {'P590': {'index': 0.0, 'sim_score': 0.9963863492012024}}]","[15, 17, 1103, 0]",0.930545
15793,Z38.0,"Einling, Geburt im Krankenhaus","[{'P081': {'index': 18.0, 'sim_score': 0.9802700281143188}}, {'P922': {'index': 7.0, 'sim_score': 0.986151933670044}}, {'Z763': {'index': 1874.0, 'sim_score': 0.7291092872619629}}, {'P590': {'index': 2.0, 'sim_score': 0.9904825091362}}]","[18, 7, 1874, 2]",0.921503
15991,Z76.3,Gesunde Begleitperson einer kranken Person,"[{'Z380': {'index': 113.0, 'sim_score': 0.7291092276573181}}, {'P081': {'index': 246.0, 'sim_score': 0.6926902532577515}}, {'P922': {'index': 64.0, 'sim_score': 0.7560762166976929}}, {'P590': {'index': 81.0, 'sim_score': 0.7498672008514404}}]","[113, 246, 64, 81]",0.731936
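
To read the table: `sim_score` is the cosine similarity between the embeddings of two codes, `index` is the position of the other code in the `most_similar` ranking (0 is the closest neighbour), and `average_similarity_score` is the mean `sim_score` over the other four recommended codes. The short check below reproduces the average for the Z76.3 row from the values printed above; the variable names are only for illustration.

```python
from statistics import mean

# sim_score values copied from the Z76.3 row of the table above
z763_scores = {
    "Z380": 0.7291092276573181,
    "P081": 0.6926902532577515,
    "P922": 0.7560762166976929,
    "P590": 0.7498672008514404,
}

average_similarity = mean(list(z763_scores.values()))
print(round(average_similarity, 6))  # 0.731936
```

The four newborn-related codes score above 0.98 against each other, while Z76.3 stays near 0.73 and ranks far down their neighbour lists, which suggests it is the weakest fit within this recommendation set.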
